Tuesday, March 5, 2013

Codegate YUT Preliminary 2013: bin500

Last weekend, we had the chance to play at the Codegate YUT 2013 preliminary match. The final challenge was in the binary category, and presented us with the following window:

This is a Win32 application that asks as input 6 lotto digits (from the set 0-9a-z), which are processed once you click the "BUY" button. 

The first hurdle to jump through is the anti-debugging and anti-disassembly. There is one thread, running at 0x00401C80, that continuously checks for changes in the main function of the application, such as breakpoints and patches. We disable this thread. There are also some sequences of code sprinkled around that confuse disassemblers:

.text:00402607    push    33h
.text:00402609    call    $+5
.text:0040260E    add     dword ptr [esp], 5
.text:00402612    retf
.text:00402613    call    $+5
.text:00402618    mov     dword ptr [esp+4], 23h
.text:00402620    add     dword ptr [esp], 0Dh
.text:00402624    retf

This is a known trick to jump between x86 and x86_64 mode under WOW64, and back again. Since no code is actually running in 64-bit mode, we can just safely nop out all those sequences and be done with it.

We are now ready to take a look at the main function, starting at 0x004046C0. The function first checks the lotto sequence for a hardcoded correct value, by checking the MD5 of each individual character. It is easy to figure out which character sequence results in the desired lotto: 58p1gt.

Once we get past this check, we see another check. This one matches the lower 8 characters of a modified SHA-1 of each lotto number to the hardcoded string 26c113b16b376f27f86192b1b6c0273aeae2066af4eb6dbb. We simply patch over this check, since the SHA-1 modification used here calls GetTickCount in its compression function; clearly it's meant to be random and to distract us.

 We finally reach the important part. The 6 lotto letters are xored with the string (at 0x0058D6AC) {0x15, 0x59, 0x1c, 0x48, 0x49, 0x6c}, and a 24-byte array is formed:
    unsigned char key[24] = 
        '5', '8', 'p', '1', 'g', 't',
        '5'^0x15, '8'^0x59, 'p'^0x1c, '1'^0x48, 'g'^0x49, 't'^0x6c,
        '5', '8', 'p', '1', 'g', 't',
        'f', '3', 'v', '1', '4', '\0'
This array is then used as the key to decrypt the hardcoded 24-byte block at 0x005B2814. The cipher used here is 192-bit block Rijndael. Not AES, Rijndael — AES is only defined for 128-bit blocks.

The winning key 58p1gt does not give out a meaningful decryption of the hardcoded block. What are we to do? Bruteforce! 6 characters from the allowed 0-9a-z 36-char dictionary means we only have to go through at most ~2^32 combinations. 

We quickly find that the correct lotto key to correctly decrypt the block is 4z9na3, and the flag is "L0tto_1s_noT_bAd_9ame!"

The bruteforcer used to get at the solution is available here.

Thursday, January 17, 2013

Cloud hashing

Hash functions are everywhere. Getting this webpage to your computer has required at least a handful of hash calculations. We use them for identification, authentication, or even data structures, such as hash tables or Bloom filters. A particular strain of hash functions, cryptographic hash functions, tends to have more narrow uses: authentication (e.g., as part of HMAC), pseudorandom number generation, password storage (e.g., as part of HMAC in PBKDF2), and so on.

Until a few years ago, everyone was happy: MD5 ruled the land, while SHA-1 sometimes made an appearance. Disaster struck in 2004, when Xiaoyun Wang successfully found collisions in MD5, and seriously damaged SHA-1's reputation. Today, the landscape is different: MD5 and SHA-1 are no longer recommended, and SHA-2 and SHA-3 are the appointed replacements.

Yet, SHA-2 and SHA-3 do not fulfill every hash function requirement adequately. In particular, they are fairly slow in software, clocking at between 10 and 20 clocks per byte on modern x86 processors. In comparison, MD5 and SHA-1 clock between 4 and 6 clocks per byte. Applications that need to hash massive amounts of data will notice this gap. Therefore, such applications will often make the conscious choice to use SHA-1 or MD5, because a potentially insecure product is often better than an unresponsive product.

There are other options to consider when fast hashing (and non-strict adherence to standards) is required. Several SHA-3 finalists were very fast in software, such as BLAKE or Skein. Both are in the same order of magnitude of speed as MD5, and have passed the exhaustive level of scrutiny needed to get to SHA-3 finalist.

The recently announced BLAKE2 function takes advantage of the analysis already done on BLAKE, and strips the design of any unnecessary computational overhead. This results on a function that is expected to be as secure as SHA-3 in every way, but actually faster than MD5. Furthermore, BLAKE2 adds support for tree (parallel) hashing, which is a godsend for big data applications where sequential hashing would be simply too slow regardless of how optimized the design got.

So what are the applications of fast (and/or parallel) hashing?

File blacklisting/whitelisting is an obvious one; matching files to known hash sets is only dependent on disk and hashing speed. Fast-enough hashing can also benefit advanced filesystems. A remarkable case is ZFS, where a strong hash is supported (SHA-256), but disabled by default for performance reasons. SHA-256 is also required for other advanced features, such as deduplication.

Ultimately, pretty much every current hash use case does benefit from a faster function: faster equals less power consumed, and less overhead for the actual application. One exception is password storage, but you should not be using a plain hash function for that anyway!