Thursday, January 8, 2015

OpenSSL's squaring bug, and opportunistic formal verification

OpenSSL's latest round of security advisories is out today, and besides the usual memory safety issues—seemingly endemic in its DTLS implementation—there is also an interesting bug in multiprecision arithmetic. In this post I will describe how this bug could have been avoided, in the security community's sacred tradition of simple post facto solutions.

The bug

We can look at this OpenSSL commit for the bug details. I will focus on one of the affected functions, mul_add_c2; what follows applies similarly to the other one. Here's the diff:

 #define mul_add_c2(a,b,c0,c1,c2) {     \
        BN_ULONG ta=(a),tb=(b),t0;      \
        BN_UMULT_LOHI(t0,t1,ta,tb);     \
-       t2 = t1+t1; c2 += (t2<t1)?1:0;  \
-       t1 = t0+t0; t2 += (t1<t0)?1:0;  \
-       c0 += t1; t2 += (c0<t1)?1:0;    \
+       c0 += t0; t2 = t1+((c0<t0)?1:0);\
        c1 += t2; c2 += (c1<t2)?1:0;    \
+       c0 += t0; t1 += (c0<t0)?1:0;    \
+       c1 += t1; c2 += (c1<t1)?1:0;    \

The issue here is incorrect carry propagation. This results in a very hard to spot bug, unlikely to be caught by randomized testing, especially with 64-bit limbs.

Luckily, this bug does not seem to be easily exploited to perform bug or invalid point point attacks, but the next time we may not be so lucky. Additionally, this is the kind of bug that the often-touted solution to OpenSSL's security problems—memory-safe languages like Rust, Go, OCaml and so on—is powerless against. What's the answer?

Avoiding the bug

The obvious answer to avoiding subtle bugs like the above is, of course, more auditing. But auditing is not cheap; people with both the ability to understand and spot flaws in delicate cryptographic code are rare and hard to come by. 

By now, the open-source community has also been shown to be lackluster in its capability to find these mistakes; Eric Raymond's “given enough eyeballs, all bugs are shallow” aphorism has proved to be an overstatement, and catastrophic vulnerabilities linger for years, and sometimes decades, unidentified.

One answer is formal verification. Proving code correct has obvious advantages; this process often also highlights flaws in the programmer's thinking (e.g., what is actually being proved does not match what is intended). Formal verification tools have come a long way, but they are still very hard to use. The verification of a Curve25519 scalar multiplication implementation is still A Big Deal; Adam Langley has also written on the state of formal verification on the wild.

While Langley concerns himself with a slightly different issue, ensuring limbs do not overflow on unsaturated arithmetic, our predicament here suggests another, conceptually simpler approach: (dis)prove that mul_add_c2, which uses saturated arithmetic and overflows limbs by definition, is equivalent to \((c_2 2^{64} + c_1 2^{32} + c_0) + 2ab\). For this we'll use an SMT solver, in this case Z3 for convenience, directly.

For bite-sized functions like mul_add_c2 the situation is not so bad. Suppose that the programmer, concurrently with the C code, had also written the following Python code, using z3py:

from z3 import *

W = 32 # word length

# t0 + t1*2**W = a * b
def BN_UMULT_LOHI(a, b):
  prod = ZeroExt(W, a) * ZeroExt(W, b)
  t0 = Extract(1*W-1,  0, prod)
  t1 = Extract(2*W-1,  W, prod)
  return t0, t1

# b ? 1 : 0
def COND(b):
  return If(b, BitVecVal(1, W), BitVecVal(0, W))

a, b, c0, c1, c2 = BitVecs("a b c0 c1 c2", W)
ref = Concat(c2, c1, c0) + 2 * ZeroExt(2*W, a) * ZeroExt(2*W, b)

t0, t1 = BN_UMULT_LOHI(a, b)
t2 = t1+t1; c2 += COND(ULT(t2,t1));
t1 = t0+t0; t2 += COND(ULT(t1,t0));
c0 += t1; t2 += COND(ULT(c0,t1));
c1 += t2; c2 += COND(ULT(c1,t2));

prove( Concat(c2, c1, c0) == ref )

Running this program immediately tells us not only that the function is incorrect, but also gives us an example of an input where the function fails! Better yet, this is verifiable by anyone with the appropriate software, instead of solely by domain experts.

$ python 
[c2 = 3774871552,
 b = 2567394084,
 a = 3592503423,
 c0 = 4044407142,
 c1 = 3283914074]

Note that while we are using Z3 here for its convenient Python bindings, we can easily export the expression to the standard SMTLIB2 format instead, and use any solver we like. Plugging in the corrected code,

t0, t1 = BN_UMULT_LOHI(a, b)
c0 += t0; t2 = t1 + COND(ULT(c0, t0));
c1 += t2; c2 += COND(ULT(c1, t2));
c0 += t0; t1 += COND(ULT(c0, t0));
c1 += t1; c2 += COND(ULT(c1, t1));

we can now see that Z3 finds no problem with the code:

$ python 

Success! Now, obviously this requires more developer time; but the code is mostly shared between the real implementation and the verification, and the size of the functions to be proved correct is rather limited. Yet we believe this is still a net gain against easy to make mistakes, which can stay hidden for a long time, even in the best implementations. Considering OpenSSL's status as critical Internet infrastructure, it seems warranted to go the extra mile on routines like this.

Theorem provers are often useful in our software auditing approach at Kryptos Logic; they are faster than most humans at finding failure cases for tedious functions, and leave us free to reason about other parts of the system.


Somebody points out that the proof is incorrect; the issue here is that we are working over the bitvector logic, which makes our proof implicitly modulo \(2^{96}\). The underlying assumption here, which we also made, is that the result fits into the 3 output words. This is reasonable in the context in which the function is used, but without context it does make the proof incorrect. An easy way to correct this is to add a few bits of slack to account for overflow:

ref = ZeroExt(W, Concat(c2, c1, c0)) + 2 * ZeroExt(3*W, a) * ZeroExt(3*W, b)
# ...
prove( ZeroExt(W, Concat(c2, c1, c0)) == ref )

This results in a failure now!

$ python 
[c2 = 4294967295,
 b = 12788566,
 a = 4294967288,
 c0 = 2786395306,
 c1 = 4282404864]

We can resolve this by adding a constraint on the maximum size of the \( c \) at input:

assumption = ULT(Concat(c2, c1, c0), 2**(3*W-1))
ref = ZeroExt(W, Concat(c2, c1, c0)) + 2 * ZeroExt(3*W, a) * ZeroExt(3*W, b)
# ...
solve( ZeroExt(W, Concat(c2, c1, c0)) != ref, assumption )

Now the proof works again under the assumption that \(c < 2^{95}\) (we use z3py's solve function here to use multiple constraints; the prove function is implemented in terms of solve).

$ python 
no solution