Friday, January 8, 2016

ZeroAccess 3 Analysis

After the takedown attempt in December 2013, the current status of ZeroAccess (alias Sirefef) has been much disputed. Initially the botmaster had released a command to the remaining infrastructure which contained nothing but the message "White Flag", signaling they may had given up on the botnet; but between March 21st 2014 and July 2nd 2014, SecureWorks had reported the ZeroAccess infrastructure resumed distributing commands.

Ever since the takedown ZeroAccess has not attempted to infect new hosts, instead it simply uses the large number of remaining infections to perform tasks—until now, that is. On January 3rd R136a1 came across a previously unseen sample, which FireF0X confirmed to be a variant of ZeroAccess. It's uncertain how long this sample has been in the wild, but it certainly points towards the botnet being a little less dead than was previously claimed.


The first thing I noticed was the dropper has a resource file which contains a 61 KB PNG.

The image may look corrupted, which is because the developer decided to store the encrypted PE file in a valid PNG format. Rather than using stenography to hide the code in an existing image, the image is the code.

The dropper uses the native Ldr API to find and acquire the resource (LdrFindResource_U, LdrAccessResource) followed by the GdiPlus API to convert the PNG to a bitmap (GdipCreateBitmapFromStream), get the size of the code (GdipGetImageWidth × GdipGetImageHeight × 4), then decrypts it by xor'ing it with the key 'wDT:'.

Execution is then passed to some shellcode which is pointed to by the e_res field of the decrypted PE's header. The shellcode resides directly after the .reloc section but is not part of the PE and therefore will not be present if the executable is later dumped after it's loaded.

A unique and interesting thing about this shellcode is the way in which it handles resolving strings. Due to the fact shellcode is designed to be executed at whatever address it's allocated, it can't have the address of the strings hardcoded in; instead, it has to find their absolute address somehow. In this shellcode the author places a relative call directly before the string which makes a relative call to a stub after it, the stub then pops the return address into the eax and returns to the shellcode. As a result when string_ZwMapViewOfSection is called, the return address of call loc_22E (the first byte of the string 'ZwMapViewOfSection') ends up in the eax register.

The shellcode itself seems pretty mundane: it simply sets a hardware breakpoint on ZwMapViewOfSection using ZwSetContextThread with flag set to CONTEXT_DEBUG_REGISTERS, then loads shellstyle.dll (a legitimate system DLL). What actually happens though is when the PE loader tries to map shellstyle into memory, the breakpoint is hit and the shellcode's exception handler is called. The exception handler then modifies the return address of ZwMapViewOfSection to point to a function which replaces the module with the one we saw decrypted earlier, as a result loading the malicious DLL in the place of shellstyle.dll.

Once the shellcode has replaced shellstyle.dll, execution is returned to the dropper which then uses LdrGetProcedureAddress to resolve and call the payload's entrypoint.

First Stage Payload

The payload consists of a DLL file with an embedded Microsoft Cabinet file (.cab), which can be found by searching the binary for the string 'MSCF'. The cabinet contains 6 files: 3 for 32-bit and 3 for 64-bit.

File Description
s32 Boostrap list containing IPs and ports for 32-bit nodes
s64 Boostrap list containing IPs and ports for 64-bit nodes
k32 32-bit final stage payload
k32 64-bit final stage payload
l32Shellcode used to load k32
l32Shellcode used to load k64

Next the payload checks a couple of events to see if the system is already infected, if not it create the event \BaseNamedObjects\Restricted\{12E9D947-EDF5-4191-AADB-F51815F004D8}. A semi-unique identifier for the bot is created by MD5 hashing the volume creation time of \SystemRoot, then formatting it as a 32-bit GUID (just like with older versions of the bot).

This version of ZeroAccess stores itself in a folder named #., which is contained within the directory %localappdata%\Google\Desktop\Install\<BotGUID>. The folder name #. is special because it makes use of a difference in operation between the NativeAPI (ntdll) and Windows API (kernel32). Using native API functions a folder with a dot at the end of the name is entirely valid and
functional; however, under the Windows API it is not; the result is a usable folder that the malware can freely use to store and execute its components, but that cannot be accessed or modified by explorer or any application not using the Native API. For further protection, the folder is set as an NTFS Reparse Point which redirects to nowhere, preventing access by any means that is susceptible to NTFS re-parsing.

The payload extracts the bootstrap list (s32 or s64 depending on architecture) from the cabinet file, then decrypts and stores it inside the install directory in a file named @. The dropper is also moved to the same directory with the name GoogleUpdate.exe.

GoogleUpdate.exe is added to autostart by creating a key under HKCU\Software\Microsoft\Windows\CurrentVersion\RunOnce with the the bot's GUID as its name; the key name is prefixed with a null byte and written as Unicode, preventing Regedit from displaying it.

Throughout its installation process the bot sends status updates to a C&C via UDP on port 53 (DNS) with geolocation information obtained from the api and an affiliate id stored in the bot (ZeroAccess uses an affiliate scheme in which people can sign up to be paid a small amount of money for each computer they infect with the malware). All status update communication is encrypted with the key 0x4C4F4E47 ('LONG'), which is the same key as in previous versions.

Finally the main payloads l32 and k32 or l64 and k64 are extracted from the cabinet (my hypothesis is 'l' stands for 'Loader' and k stands for 'Kernel'). The ZeroAccess kernel is a DLL containing the main code for communicating with the peer-to-peer network and the loader is used to load the DLL in the same way the dropper loaded shellstyle.dll, except this time the DLL loaded and replaced is comres.dll.

If the process is UAC elevated it injects into svchost.exe, otherwise it will attempt to inject into explorer and bypass UAC using the same method as previous version (detailed in FireF0X's UACMe), before finally injecting into svchost.exe.

The injection methods works by writing the kernel, followed by the loader, to the remote process using ZwAllocateVirtualMemory and ZwWriteVirtualMemory, then executing the loader by calling ZwQueueApcThread. In the case of 64-bit, the payload will use Heaven's Gate to switch into 64-bit mode before calling the injection routine, allowing it to inject 64-bit processes from WOW64.

Peer to Peer Communications

The new ZeroAccess variant largely preserves the communication protocol of its previous versions, with some notable changes. For completeness, the UDP packet format for C&C communication has a common header of the form

struct zeroaccess_packet {
  uint32_t crc32;
  uint32_t type;
  uint32_t id;
  uint32_t length        : 16;
  uint32_t reply         :  1;
  uint32_t packet_number : 15;

The packet_number field appears to be new. It is used to ensure incoming packets match earlier requests, possibly to prevent 'retL' flooding by takedown attempts. The type field can be one of 'getL' and 'retL—the 'newL' field of some earlier versions is now gone. id is a 32-bit bot session id, generated at startup and used to identify sessions. The reply field indicates whether the receiving node should reply with a getL of its own.

Every packet is encrypted—as before—with a very simple rolling-xor algorithm:

void zeroaccess_encrypt(unsigned char * message, size_t bytes) {
  uint32_t k = 0x31323334UL;
  assert(bytes % 4 == 0);
  for(size_t i = 0; i < bytes; i += 4) {
    store_le32(m + i, load_le32(m + i) ^ k);
    k = (k << 1) | (k >> 31);

Note the new key, '1234', instead of the old one 'ftp2'. This same algorithm is used with the 'LONG' key mentioned above.

The getL packet is empty and consists only of the above header. The retL packet format is slightly changed due to one of the most noteworthy changes in the protocol—UDP ports are now randomly-generated, and thus the retL packets must include port information. The retL packet is of the form

struct peer_entry {
  uint32_t ip;
  uint32_t port : 14;
  uint32_t timestamp : 18;

struct file_entry {
  uint32_t id;
  uint32_t timestamp;
  uint32_t length;
  uint8_t  signature[128];

struct retL {
  peer_entry peers[16];
  file_entry files[/*variable*/];

On 32-bit Windows hosts, the port number is computed as peers[i].port + 16384, whereas on 64-bit Windows, it is computed as peers[i].port + 32768. It is therefore simple to figure out an individual node's platform by its UDP port number. The timestamp is computed as (GetTime() - 0x41000000) / 3600, where GetTime() is

ULONG GetTime() {
  FILETIME time;
  ULONG stamp;
  RtlTimeToSecondsSince1980((LARGE_INTEGER *)&time, &stamp);
  return stamp;

We can see that the base timestamp 0x41000000, i.e., July 22 2014, lower-bounds the age of this variant. The file list format remains unchanged (modulo timestamp format), albeit with a distinct RSA-1024 verification key compared to the previous botnets.

The bot must also remember what its port number is. This information is stored in the Extended Attributes of the @ file. The list of known peers is stored  in this file, as remarked above.


We have some early crawling numbers for this variant. But first, is may be useful to compare with the previous ZeroAccess 2 botnet. We see for ZeroAccess 2 daily distribution seemed rather consistent. The total confirmed connections were approximately 715,000 for the month.

Looking deeper into a sub aggregation of country and unique IPs counted we can see a high level in US on Comcast, although this could be explained by IP churning due to DHCP for the cable provider. Nonetheless, this is a high number of uniques compared to any other ISP.

Finally, we see a rather healthy distribution of nodes and countries with a heavy weight in Japan and US. Additionally, we observe an unusual high amount of unique IPs the day after Christmas.

In comparison, the latest Zero Access 3 variant, we see the IP distribution is primarily focused in Russia. Whether this team is preparing the infrastructure for a larger planned botnet or it is focusing its efforts in the area is still unknown. 

We can potentially hypothesize that this botnet has been retooled to allow botnet subgroups using the generation of new encryption keys and RSA binary signing keys. The result of this would mean multiple actors could use the same malware, similarly to trends that have been observed with other seed-based malware, like Tiny Banker (aka Tinba). This would make takedowns more difficult in that there is no one central peer to peer network, but rather a cluster of peer to peer networks. Another plausible hypothesis is that the ZeroAccess code base has been sold (or stolen) to a third party, one who is more interested in targeting Eastern Europe.

These types of networks are becoming more and more resilient to takedowns, as observed with Dridex, and even ZeroAccess itself is still maintaining its presence. With multiple authors and actors involved, the difficulty of takedown attempts by technical means is also compounded by the difficulty of prosecuting multiple authors (and actors) in multiple geopolitically diverse jurisdictions.

We would like to thank MalwareTech for his invaluable contributions to this research.

Saturday, March 21, 2015

Complexity is the bugdoor's friend

Backdoors are a fashionable topic these days, ever since the BULLRUN program was uncovered by the Snowden leaks. Bruce Schneier and others recently wrote a survey on the topic, which covers much of what is known about backdoors, available here. You should also check the Underhanded Crypto Contest out, which has ended a couple of weeks ago.

Backdoors come in all shapes and sizes: some can be mathematical in naturelike the Dual EC generatorwhile others can be simply coding mistakes in a key routine in an application, also known as “bugdoors”. They may be used to leak (parts of) secret keys, derandomize random number generators, bypass authentication measures, or simply leak plaintext. Today I will show an example of the latter.

There is a now well-known method to introduce a “bugdoor” in RC4 implementations:

#define TOBYTE(x) (x) & 255
#define SWAP(x,y) do { x^=y; y^=x; x^=y; } while (0)

static unsigned char A[256];
static int i=0, j=0;

void init(char *passphrase) {
  int passlen = strlen(passphrase);
  for (i=0; i<256; i++)
    A[i] = i;
  for (i=0; i<256; i++) {
    j = TOBYTE(j + A[TOBYTE(i)] + passphrase[j % passlen]);
    SWAP(A[TOBYTE(i)], A[j]);
  i = 0; j = 0;

unsigned char encrypt_one_byte(unsigned char c) {
    int k;
    i = TOBYTE(i+1);
    j = TOBYTE(j + A[i]);
    SWAP(A[i], A[j]);
    k = TOBYTE(A[i] + A[j]);
    return c ^ A[k];

This method relies on the clever (and old!) trick of swapping registers without using a temporary variable by using XOR (or, in general, addition and subtraction on any commutative group). When i == j, however, this trick will instead zero the state byte. Eventually the entire state will be zero, and the result of RC4 encryption will be the unaltered plaintext!

I present now a twist on this trick, this time in C++. Here it is:

#include <cstddef>
#include <cstdint>

#include <tuple>
#include <utility>

namespace security {
  namespace util {
    // type safety is great
    template <typename T> struct wrap {
      T x;
      wrap(const T &x = T()) : x{x} {}
      operator T &() { return x; }

    using  uint8 = wrap< std::uint8_t>;
    using uint16 = wrap<std::uint16_t>;
    using uint32 = wrap<std::uint32_t>;
    using uint64 = wrap<std::uint64_t>;

  namespace rc4 {
    template <typename uint8 = util::uint8>
    inline void stream(unsigned char * s, std::size_t slen,
                       const unsigned char * k, std::size_t klen) {
      using std::swap;
      uint8 S[256];

      for (int i = 0; i < 256; ++i) {
        S[i] = i;

      for (int i = 0, j = 0; i < 256; ++i) {
        j = (j + S[i] + k[i % klen]) & 255;
        swap(S[i], S[j]);

      for (uint8 i = 0, j = 0; slen--;) {
        i += 1;
        j += S[i];
        swap(S[i], S[j]);
        *s++ = S[(S[i] + S[j]) & 255];

// elsewhere
namespace security {
  namespace util {
    template <template <typename...> class U, typename... V>
    void swap(U<V...>& x, U<V...>& y) {
      // Option #1
      // decltype(auto) t = x; x = y; y = t;
      // Option #2
      using std::tie;
      tie(x, y) = tie(y, x);

This one relies on a combo of C++ idiosyncrasies. Starting with the swap, I present two similar ways to subvert it:
  • Option #1 uses the C++14 decltype(auto) keyword, which is essentially the auto keyword with decltype semantics. What this means is that the type of t will be wrap<uint8_t> &, instead of the expected wrap<uint8_t>. The result of the swap will then be (y, y) instead of (y, x).
  • Option #2 uses std::tie instead. The idea is the same here: tie(x, y) creates an std::tuple<wrap<uint8_t>&, wrap<uint8_t>&>, that is, a tuple of references, and thus the assignment will necessarily overwrite one of the registers.
In either case, the result will be that the RC4 internal state will now have 2 equal values, instead of 2 swapped values. This will ensure that eventually we will have an output of always (or almost always) the same value. This is somewhat superior to Wagner and Biondi's method, in that simply by looking at the ciphertext it is not as easy to detect that something has gone horribly wrong.

But this is not enough; rc4::stream explicitly imports std::swap, so this (correct) version of the swap should be used instead. To get around this, we use C++’s argument-dependent lookup (ADL). ADL basically says that, when given an unqualified (i.e., without the explicit std::) function call, C++ will also look for the function in the argument's namespaces. Therefore, the call swap(S[i], S[j]) will find both std::swap and security::util::swap, since wrap<uint8_t> also lives in security::util. The wrap<T> type may be justifiable as a type employed for added security checks, e.g., to detect integer overflow/range or to ensure two’s complement semantics. In this example it is a very simple barebones wrapper around T.

However, if we place a generic swap(T&, T&) swap function in security::util, we will get a compiler error, since the call is now ambiguous according to C++ rules. Since swap(wrap<uint8_t> &, wrap<uint8_t>&) would be too obvious, I went with swap(U<V>&, U<V>&). C++ orders overloadable functions by specialization, and since U<V>& is “more specialized” than T&, U<V>& gets priority and solves the ambiguity.

This is not yet enough. Since we defined swap after rc4::stream, std::swap would still be picked, since the name lookup process would not look at the later definition. To work around this, we make rc4::stream a template function, which forces the lookup of swap to occur at instantiation time instead of when the function is defined.

What do we learn from all of this? Well, certainly that C++ is a complex language, but I doubt that was ever in question. C++11 is in many ways a nicer language than its predecessor, but it is not a simpler one. And in complexity it is easier to hide flawsit would certainly take me much longer to spot the flaw in this version than in Wagner and Biondi’s.

Many of our current woes with cryptography-related implementations, such as Heartbleed, the Triple Handshake Attack, and OpenSSL’s regular CVEs, can be (partially) traced back to complexity; it is therefore imperative to look at such code as not only a blackbox, but also a source of attack surface which can lead to privacy loss. On our audits, unnecessary complexity and over-engineering are often things that we flag, despite not being flaws in and of themselves, because they often enable subtle security issues to creep in unannounced.

Thursday, January 8, 2015

OpenSSL's squaring bug, and opportunistic formal verification

OpenSSL's latest round of security advisories is out today, and besides the usual memory safety issues—seemingly endemic in its DTLS implementation—there is also an interesting bug in multiprecision arithmetic. In this post I will describe how this bug could have been avoided, in the security community's sacred tradition of simple post facto solutions.

The bug

We can look at this OpenSSL commit for the bug details. I will focus on one of the affected functions, mul_add_c2; what follows applies similarly to the other one. Here's the diff:

 #define mul_add_c2(a,b,c0,c1,c2) {     \
        BN_ULONG ta=(a),tb=(b),t0;      \
        BN_UMULT_LOHI(t0,t1,ta,tb);     \
-       t2 = t1+t1; c2 += (t2<t1)?1:0;  \
-       t1 = t0+t0; t2 += (t1<t0)?1:0;  \
-       c0 += t1; t2 += (c0<t1)?1:0;    \
+       c0 += t0; t2 = t1+((c0<t0)?1:0);\
        c1 += t2; c2 += (c1<t2)?1:0;    \
+       c0 += t0; t1 += (c0<t0)?1:0;    \
+       c1 += t1; c2 += (c1<t1)?1:0;    \

The issue here is incorrect carry propagation. This results in a very hard to spot bug, unlikely to be caught by randomized testing, especially with 64-bit limbs.

Luckily, this bug does not seem to be easily exploited to perform bug or invalid point point attacks, but the next time we may not be so lucky. Additionally, this is the kind of bug that the often-touted solution to OpenSSL's security problems—memory-safe languages like Rust, Go, OCaml and so on—is powerless against. What's the answer?

Avoiding the bug

The obvious answer to avoiding subtle bugs like the above is, of course, more auditing. But auditing is not cheap; people with both the ability to understand and spot flaws in delicate cryptographic code are rare and hard to come by. 

By now, the open-source community has also been shown to be lackluster in its capability to find these mistakes; Eric Raymond's “given enough eyeballs, all bugs are shallow” aphorism has proved to be an overstatement, and catastrophic vulnerabilities linger for years, and sometimes decades, unidentified.

One answer is formal verification. Proving code correct has obvious advantages; this process often also highlights flaws in the programmer's thinking (e.g., what is actually being proved does not match what is intended). Formal verification tools have come a long way, but they are still very hard to use. The verification of a Curve25519 scalar multiplication implementation is still A Big Deal; Adam Langley has also written on the state of formal verification on the wild.

While Langley concerns himself with a slightly different issue, ensuring limbs do not overflow on unsaturated arithmetic, our predicament here suggests another, conceptually simpler approach: (dis)prove that mul_add_c2, which uses saturated arithmetic and overflows limbs by definition, is equivalent to \((c_2 2^{64} + c_1 2^{32} + c_0) + 2ab\). For this we'll use an SMT solver, in this case Z3 for convenience, directly.

For bite-sized functions like mul_add_c2 the situation is not so bad. Suppose that the programmer, concurrently with the C code, had also written the following Python code, using z3py:

from z3 import *

W = 32 # word length

# t0 + t1*2**W = a * b
def BN_UMULT_LOHI(a, b):
  prod = ZeroExt(W, a) * ZeroExt(W, b)
  t0 = Extract(1*W-1,  0, prod)
  t1 = Extract(2*W-1,  W, prod)
  return t0, t1

# b ? 1 : 0
def COND(b):
  return If(b, BitVecVal(1, W), BitVecVal(0, W))

a, b, c0, c1, c2 = BitVecs("a b c0 c1 c2", W)
ref = Concat(c2, c1, c0) + 2 * ZeroExt(2*W, a) * ZeroExt(2*W, b)

t0, t1 = BN_UMULT_LOHI(a, b)
t2 = t1+t1; c2 += COND(ULT(t2,t1));
t1 = t0+t0; t2 += COND(ULT(t1,t0));
c0 += t1; t2 += COND(ULT(c0,t1));
c1 += t2; c2 += COND(ULT(c1,t2));

prove( Concat(c2, c1, c0) == ref )

Running this program immediately tells us not only that the function is incorrect, but also gives us an example of an input where the function fails! Better yet, this is verifiable by anyone with the appropriate software, instead of solely by domain experts.

$ python 
[c2 = 3774871552,
 b = 2567394084,
 a = 3592503423,
 c0 = 4044407142,
 c1 = 3283914074]

Note that while we are using Z3 here for its convenient Python bindings, we can easily export the expression to the standard SMTLIB2 format instead, and use any solver we like. Plugging in the corrected code,

t0, t1 = BN_UMULT_LOHI(a, b)
c0 += t0; t2 = t1 + COND(ULT(c0, t0));
c1 += t2; c2 += COND(ULT(c1, t2));
c0 += t0; t1 += COND(ULT(c0, t0));
c1 += t1; c2 += COND(ULT(c1, t1));

we can now see that Z3 finds no problem with the code:

$ python 

Success! Now, obviously this requires more developer time; but the code is mostly shared between the real implementation and the verification, and the size of the functions to be proved correct is rather limited. Yet we believe this is still a net gain against easy to make mistakes, which can stay hidden for a long time, even in the best implementations. Considering OpenSSL's status as critical Internet infrastructure, it seems warranted to go the extra mile on routines like this.

Theorem provers are often useful in our software auditing approach at Kryptos Logic; they are faster than most humans at finding failure cases for tedious functions, and leave us free to reason about other parts of the system.


Somebody points out that the proof is incorrect; the issue here is that we are working over the bitvector logic, which makes our proof implicitly modulo \(2^{96}\). The underlying assumption here, which we also made, is that the result fits into the 3 output words. This is reasonable in the context in which the function is used, but without context it does make the proof incorrect. An easy way to correct this is to add a few bits of slack to account for overflow:

ref = ZeroExt(W, Concat(c2, c1, c0)) + 2 * ZeroExt(3*W, a) * ZeroExt(3*W, b)
# ...
prove( ZeroExt(W, Concat(c2, c1, c0)) == ref )

This results in a failure now!

$ python 
[c2 = 4294967295,
 b = 12788566,
 a = 4294967288,
 c0 = 2786395306,
 c1 = 4282404864]

We can resolve this by adding a constraint on the maximum size of the \( c \) at input:

assumption = ULT(Concat(c2, c1, c0), 2**(3*W-1))
ref = ZeroExt(W, Concat(c2, c1, c0)) + 2 * ZeroExt(3*W, a) * ZeroExt(3*W, b)
# ...
solve( ZeroExt(W, Concat(c2, c1, c0)) != ref, assumption )

Now the proof works again under the assumption that \(c < 2^{95}\) (we use z3py's solve function here to use multiple constraints; the prove function is implemented in terms of solve).

$ python 
no solution

Thursday, October 2, 2014

World War Zero Access — When Zombie botnets come alive

To kick off cyber security awareness month and, of course, the Halloween month, today we discuss how to bring botnet zombies (or zombots, as we sometimes call them) back from the dead, and why you should care.

Botnets have become big business in criminal circles, and a scourge on the Internet. At any given moment there are many botnets active, controlling millions of unwilling computers to send spam, steal personal information, or whatever someone paying the botnet controllers wants them to do.

The industry, naturally, has taken measures against these botnets. Some of them are controlled by command-and-control servers who are continuously updated at certain algorithm-generated domains; hijacking these domains effectively stops the malware from receiving new commands.

More recent botnets, such as Zeus Gameover, are more decentralized and implement their own peer-to-peer protocols to make taking them down harder. The usual approach there is to sinkhole the peers, i.e., to inject numerous bogus peers to the botnet until they overtake the real ones. This is a half-measure at best, and botnets are quickly adapting to resist this sort of attack.

Fighting botnets is hard work.

The end result is that, unless the threat is removed directly from the infected machines, the malware will linger possibly indefinitely there. This can have several consequences, one of which is posthumous reactivation even after the original owners are gone.

Case study: Zero Access

Take the Zero Access malware, for example. This was originally a kernel-level rootkit that infected machines, and acts as a delivery framework for other malicious payloads. Its communication protocol was peer-to-peer, with a backup C&C server, on a few TCP ports, with two layers of communication:

 - 'getL', 'retL', 'newL' commands, which manage the dispersion of new infected nodes;
 - 'getF', which downloads a module which executes an actual malicious payload. Files are signed using a 512-bit RSA key.

Sometime around 2012, Zero Access upgraded their protocol: they switched to UDP, and switched packet encryption from RC4 with a hardcoded key to a custom 'cipher':

    void encrypt(unsigned char * msg, size_t msg_size) {
      uint32_t k = 0x66747032;
      for(size_t i = 0; i < msg_size; i += 4) {
        store32_le(msg + i, load32_le(msg + i) ^ k);
        k = rotate_left(k, 1);

This later version of Zero Access also upgraded the RSA key to sign payloads to 1024 bits. Besides that, the protocol didn't change much, beyond some basic IP filtering to prevent sinkholing.

A 2013 survey  by Christian-Rossow et al found around 50,000 Zero Access 1 nodes stilll around; other more recent crawls by Peter Kleissner show that this number has remained somewhat stable. Indeed, there are still many Zero Access 1 bots floating around, which have not upgraded to version 2 for one reason or another. Hijacking these nodes to do one's bidding is one RSA-512 key away:

-----END PUBLIC KEY-----
Is this a problem? Not at all. The first 512-bit key was factored 15 years ago by a team at CWI. It did take many machines and months of work. In the ensuing decade, however, this task became incredibly easier. Today, with tools like GGNFS, msieve, or CADO-NFS, you can factor keys of this size (and higher) in just a couple of days on a beefy machine. And we did. For all intents and purposes, every remaining old Zero Access node is controllable by anyone who has a spare machine to do this.

To show we did in fact factor the key, here is a Base64-encoded RSA-PSS signature for the message 'Braaaaaaaaaains!':

You can verify the signature with the following Python script:

    from Crypto.Signature import PKCS1_PSS
    from Crypto.Hash import SHA
    from Crypto.PublicKey import RSA

    pub = '''-----BEGIN PUBLIC KEY----- 
    -----END PUBLIC KEY-----'''
    pk  = RSA.importKey(pub)
    verifier =

    msg = 'Braaaaaaaaaains!'
    sig = '''HnwaZghZhtiPJ+OHbTnNDAVcg/yvciDdmBi+SRoKwWXV4k

    print verifier.verify(, sig)

What does this imply? An organization infected with a dead botnet's malware can be specifically targeted, by combining botnet crawling with WHOIS information. The malware could then be taken over for information extraction. Keep in mind that botnets are not solely after monetary profit, by way of credit card info. They often are rented to the highest bidder, and can be used for corporate espionage, spam, or as a delivery platform for more advanced intrusions.

This is less farfetched than you think. Among the many Snowden revelations of the last year, one of them was that the NSA has been busy taking over botnets for their own purposes, going as high as 140,000 hijacked peers.
It is safe to assume that they are not the only organization doing this. Until the infections themselves are killed by their respective organizations, this is a latent risk.

Crypto is hard

The buck does not stop here. Malware has routinely used cryptography in its operation for a while, be it C&C communication, encrypting files for ransom, or to authenticate payloads. Like every other software, however, malware also regularly gets it wrong, with hilarious effects:

The point here is that malware authors make cryptographic mistakes just like any other developer, and this can often translate to direct control of the infected machine by entities other than the author. In other words, even if the original authors are taken down, even if the original command and control servers disappear, so long as the malware is still present, in a dormant state, it is still a significant risk to the whole ecosystem.

But wait, there's more. The most common algorithm and key length in all the malware we have analyzed is 1024-bit RSA. This was once impregnable, like RSA-512 before it, but 1024-bit keys have already reached end-of-life status and are gradually becoming factorable, if they are not already (by well-funded agencies).

Arjen Lenstra estimated, back in 2009, that RSA-1024 was probably secure against community efforts (that is, relatively underfunded) until around 2014. A few years from now, we may see many of these botnetseven if abandoned by thenbecome alive again by the first attacker to factor those keys.
Yes, master...I will do your bidding


Botnets have always been quick to adapt to new landscapes, and are increasingly harder to fully take down as a result of the industry's attempts to do just that. Some of the more advanced malware today is not far from being an APT. In fact, some malware is APT. Therefore it is very dangerous to dismiss infections, even from allegedly inactive botnets, as low-risk. Most of them are equipped with information-stealing capabilities, and can fall into the wrong hands, even after the original owners are gone.

How can you know if your organization is plagued by malware or zombots? One way is by listening in to our breach intelligence feed, Vantage, which continuously monitors the Internet for common botnet activity, among other threats.

So be careful, because the next time you hear trick or treat, it might be a new zombie horde knocking at your ports! Happy Halloween Month ;)

Wednesday, July 2, 2014

H.323 Registration Weaknesses: Part 2

Last time, we have seen how Avaya's H.323 default gateway authentication method was vulnerable to an attacker that is able to passively intercept communications between a client and the gateway. Upon hearing this, we were pointed to the secure authentication method employed in Avaya's gear, with the proprietary OID 2.16.840.1.114187.1.6.2.

This new authentication method is based on DH-EKE, a password-based authenticated key exchange (PAKE). Invented by Bellovin and Merritt in the early 1990s, PAKE schemes kill two birds with one stone:

  • They prevent dictionary attacks to recover passwords, even if an attacker is able to view all traffic;
  • They prevent man-in-the-middle attacks if an active attacker does not know the password. This is an improvement over regular Diffie-Hellman key agreement.

Avaya's PAKE scheme is DH-EKE, One of the variants proposed originally by Bellovin and Merritt. Many more PAKE schemes have popped up over the years, like SPEKE, SRP, J-PAKE, or SPAKE2. Many of them are hindered by patents; EKE has the advantage of its patent having expired a few years ago.

In the secure scheme, the following message sequence happens between the gateway (\(G\)) and user (\(U\)):

  • gatekeeperRequest (\(U \rightarrow G\)): \(U\) generates their own Diffie-Hellman public key, \(\text{DH}_U\), and encrypts it using AES-128 in CTR mode using a key derived from the first 16 bytes of SHA-1(password). \(U\) sends this encrypted public key to \(G\), along with a nonce \(R_e\), and \(IV\), the CTR mode nonce.
  • gatekeeperConfirm (\(G \rightarrow U\)): \(G\) generates its own Diffie-Hellman key \(\text {DH}_G\) and computes the shared key \(K_m\) after decrypting \(U\)'s public key. \(G\) then sends back \(\text{DH}_G\) (unencrypted), a nonce \(R_g\), and the HMAC-SHA1-96 authenticator of this message (with a key derived from \(K_m\)).
  • registrationRequest (\(U \rightarrow G\)): This request proceeds without any significant cryptographic material, except that an HMAC-SHA1-96 authenticator is appended to ensure that \(U\) also knows the shared key \(K_m\).
  • registrationConfirm/registrationRejected (\(G \rightarrow U\)): This message notifies the client whether the registration failed or succeeded.

A good implementation of this protocol makes both offline password bruteforcing and man-in-the-middle attacks unfeasible, as pointed out above. When we analyzed Avaya's client software, however, we found the following optimization: the client did not generate a new Diffie-Hellman key for each registration. Instead, it caches the same \(\text{DH}_U\) for every registration within the same session. This allows offline bruteforcing of the password, which can also lead to man-in-the-middle attacks. 

How does this work? Suppose that an attacker captures two message exchanges like the above for the same user and password. Let \(\text{AES-CTR}_{K_p}(IV)\) be the keystream obtained by encrypting \(\text{IV}, \text{IV}+1\), etc, using \(K_p\) = SHA1(password)[0:16]. Since \(\text{DH}_U\) is the same for both exchanges, the encrypted content of the two gatekeeperRequest packets can be represented as follows:

\[ C_0 = \text{AES-CTR}_{K_p}(\text{IV}_0) \oplus \text{DH}_U \\ C_1 = \text{AES-CTR}_{K_p}(\text{IV}_1) \oplus \text{DH}_U \]

We can use this fact to test whether a guessed password is correct. Suppose we guess a password p; then it is the correct password only if 

\[ \text{AES-CTR}_{K_p}(\text{IV}_0) \oplus C_0 = \text{AES-CTR}_{K_p}(\text{IV}_1)  \oplus C_1 \]

Therefore one password check requires only two AES encryptions and one SHA-1 evaluation, and can be made extremely fast. GPU crackers could have a field day with this.

Unlike the case in the previous post, a sufficiently strong password is enough to thwart bruteforce, since this scheme is using stronger cryptographic primitives. But it is a safe assumption that most passwords are weak, especially so in phones, where it is more likely than not simple a PIN number.

I should also stress that this bug has been fixed; Avaya's PSST was rather forthcoming and professional in the handling of our report, and made the necessary steps to resolve this issue as quickly as possible.

Tuesday, June 24, 2014

H.323 Registration Weaknesses: Part 1

Voice-over-IP VoIP is a rather obscure niche. There are essentially two main standards: SIP and H.323. While the former may be more well-known by the Internet crowd, H.323 is still the dominant standard today for large deployments, due to the telecom heritage of ITU-T standards.

Some time ago, we found a vulnerability in the endpoint registration phase of several Avaya products. In fact, we found two vulnerabilities: one in the "default" autentication method for Avaya phones, and another one in the "secure" method. This post describes the first one. I will describe the second, more interesting, flaw in the next post.

The default authentication mode for Avaya's phones was based on the ISO/IEC 9798-2 standard. This is essentially a challenge-response protocol, wherein the user must prove to the gateway he knows the password by encrypting a gateway-chosen secret with a key derived from it. When a client wants to authenticate itself to a gateway, both knowing a shared secret (e.g., a password), the following interaction happens between the user (U) and the gatekeeper (G):

  • gatekeeperRequest (\(U \rightarrow G\)): this message contains the authenticationCapability field set to pwdSymEnc, and the algorithmOIDs field set to the OID, which is defined to be DES in CTS mode.
  • gatekeeperConfirm (\(G \rightarrow U\)): \(G\) sends back to \(U\) a ClearToken containing the challenge. This challenge consists of a timestamp, a random integer, and generalID, an identifier of the gateway.
  • registrationRequest \(U \rightarrow G\): \(U\) sends the ClearToken back to \(G\), encrypted using the user's key \(K_{p}\) using the aforementioned DES-ECB-CTS. The key derivation is virtually nonexistent, and resembles the weak Norton Diskreet function breakable with Copacobana:

  •  unsigned char deskey[8] = {0,0,0,0,0,0,0,0};  
     for(i=0; i < pwlen; ++i)  
       deskey[i%8] ^= 2*password[i];  
  • registrationConfirm/registrationRejected \(G \rightarrow U\): Depending on whether the authentication succeeded, the gateway will send back one of these messages to the user.
So what is the problem here? The protocol employed for the authentication, ISO/IEC 9798-2 (Mechanism 2), seems to be sound, according to its requirements. This protocol, however, does not do anything to prevent offline attacks that retrieve the shared key; since this key is derived from a (potentialy weak) password, bruteforce will work quite effectively.

One might argue that this is not really a weakness, and that a strong password should be able to keep attackers at bay. Unfortunately, due to the poor cipher choice (56-bit DES in ECB-CTS mode), poor key derivation function, and known plaintext (generalID), one is able to bruteforce any password, regardless of its strength.

Consider the following gatekeeperRequest sample packet:

0000   45 00 01 00 c0 17 24 09 12 02 6a df 14 00 44 00  E.....$...j...D.
0010   45 00 46 00 49 00 4e 00 49 00 54 00 59 00 2d 00  E.F.I.N.I.T.Y.-.
0020   47 00 4b                                         G.K             

We have known plaintext there, namely the UTF-16 string "DEFINITY-GK". Given the encrypted registrationRequest packet,

0000   3d 7a d2 d5 98 9a 04 1c 07 b0 b2 c0 ca 9b fb cb  =z..............
0010   c6 ab 9c 5a 74 e5 0f 4d 80 3e bf fe bb bb d9 ca  ...Zt..M.>......
0020   1d 33 ef                                         .3.             

we can obviously bruteforce the DES key out of a network capture. In this example, the password exployed was "3002".

DES bruteforcing is a very well studied problem, and specialized hardware can break any key quickly and at relatively low cost. With a weak key derivation as the above, it becomes much easier. Consider, for example, a long password that only uses the alphabet A-Z. This highly restricts the bit pattern to 010XXXXX, or 5 useful bits per byte, or \(2^{40}\) distinct keys, already smaller a keyspace than bruteforcing a 9-character password (\(\approx 2^{42}\) possibilities).

One can go even further, by noticing that similar products use similar generalID strings. This allows for precomputing to break those: using time-memory tradeoff techniques, an attacker can perform a single large precomputation that allows for  much faster key recovery. Martin Hellman showed that a precomputed table of \(2^{38}\) values allows for a recovery of the key in about \(2^{38}\) iterations, following a \(2^{56}\)-iteration table generation procedure. The state of the art today is rainbow tables, which provide a significant, albeit non-asymptotic, speedup over Hellman's method.

This is it for the old default authentication method. In short, the symmetric authentication method allows for offline password recovery, and to make it even worse, there is no password strong enough to withstand it. In the next post, we will see their public-key solution to the offline cracking problem, and how it was also broken.

Tuesday, March 5, 2013

Codegate YUT Preliminary 2013: bin500

Last weekend, we had the chance to play at the Codegate YUT 2013 preliminary match. The final challenge was in the binary category, and presented us with the following window:

This is a Win32 application that asks as input 6 lotto digits (from the set 0-9a-z), which are processed once you click the "BUY" button. 

The first hurdle to jump through is the anti-debugging and anti-disassembly. There is one thread, running at 0x00401C80, that continuously checks for changes in the main function of the application, such as breakpoints and patches. We disable this thread. There are also some sequences of code sprinkled around that confuse disassemblers:

.text:00402607    push    33h
.text:00402609    call    $+5
.text:0040260E    add     dword ptr [esp], 5
.text:00402612    retf
.text:00402613    call    $+5
.text:00402618    mov     dword ptr [esp+4], 23h
.text:00402620    add     dword ptr [esp], 0Dh
.text:00402624    retf

This is a known trick to jump between x86 and x86_64 mode under WOW64, and back again. Since no code is actually running in 64-bit mode, we can just safely nop out all those sequences and be done with it.

We are now ready to take a look at the main function, starting at 0x004046C0. The function first checks the lotto sequence for a hardcoded correct value, by checking the MD5 of each individual character. It is easy to figure out which character sequence results in the desired lotto: 58p1gt.

Once we get past this check, we see another check. This one matches the lower 8 characters of a modified SHA-1 of each lotto number to the hardcoded string 26c113b16b376f27f86192b1b6c0273aeae2066af4eb6dbb. We simply patch over this check, since the SHA-1 modification used here calls GetTickCount in its compression function; clearly it's meant to be random and to distract us.

 We finally reach the important part. The 6 lotto letters are xored with the string (at 0x0058D6AC) {0x15, 0x59, 0x1c, 0x48, 0x49, 0x6c}, and a 24-byte array is formed:
    unsigned char key[24] = 
        '5', '8', 'p', '1', 'g', 't',
        '5'^0x15, '8'^0x59, 'p'^0x1c, '1'^0x48, 'g'^0x49, 't'^0x6c,
        '5', '8', 'p', '1', 'g', 't',
        'f', '3', 'v', '1', '4', '\0'
This array is then used as the key to decrypt the hardcoded 24-byte block at 0x005B2814. The cipher used here is 192-bit block Rijndael. Not AES, Rijndael — AES is only defined for 128-bit blocks.

The winning key 58p1gt does not give out a meaningful decryption of the hardcoded block. What are we to do? Bruteforce! 6 characters from the allowed 0-9a-z 36-char dictionary means we only have to go through at most ~2^32 combinations. 

We quickly find that the correct lotto key to correctly decrypt the block is 4z9na3, and the flag is "L0tto_1s_noT_bAd_9ame!"

The bruteforcer used to get at the solution is available here.