## Friday, March 30, 2012

### Invisible threats and RSA

There has been much fuss regarding the recent Lenstra et al's survey on X.509 certificates (see also Nadia Heninger's independent, but similar, results), where it was shown (among other things) that over 20000 out of 6 million RSA keys out there have no security, since their public keys are easily factored.

This is a visible threat: given a set of public keys, we can tell whether they are broken. Today I want to talk about invisible threats, i.e., threats that are undetectable to anyone but their perpetrator. We often know them as "backdoors."

Young and Yung formalized the concept of such backdoors as "Secretly Embedded Trapdoors with Universal Protection", or SETUPs. A SETUP is a modification of a cryptosystem that behaves very similarly, but its output is easily breakable by an attacker (and no one else!) A SETUP must also be (polynomially) indistinguishable from a valid output of said cryptosystem.

#### Small roots of polynomials

What does smooth integer finding, error correction, and code-breaking have in common? They can all be represented as finding "small" roots of some integer polynomial (modulo n) and, in turn, they all come down to a lattice reduction problem.

Coppersmith, in his 1996 paper, showed how to use lattice reduction as a tool to attack RSA. In particular, he showed how given half of the bits of a public key's ($$N = pq$$) factor (obtained via, say, a side-channel attack), you can find the full factor, and thus break the key. You do this by modelling the problem as a polynomial: given $$p'$$, an approximation of $$p$$ within a reasonable distance of $$p$$, e.g., $$<2^{256}$$, find small roots of

$f(x) = x + p' \pmod{p}.$

Since we do not actually know $$p$$, we use the next best thing, $$N$$, a multiple thereof. The algorithm itself works by "lifting" the polynomial onto the integers, and using the well-known LLL (co-authored by A. K. Lenstra, incidentally) approach to solving said integer polynomial. LLL is fast in the "polynomial-time"-sense: given a basis of dimension $$d$$ with $$n$$-dimensional vectors with norm at most $$B$$, LLL finds a short (but not the shortest; that problem, SVP, is NP-hard) basis in $$d^{3+o(1)}n(d + \log B) \log B$$ bit operations, or roughly $$O(d^5 \log B)$$. This means bases with some hundreds of vectors of (say) 1024-bit integers will not be fast to reduce in practice, and may require hours or days to reduce.

Luckily, the Coppersmith method doesn't require very large bases (usually dimension is under 100, for 1024-bit RSA; we'll get to that later), and is often pretty fast in practice. Now, let's see the actual backdoor in practice.

### An RSA backdoor

A very simple RSA SETUP was given by Crépeau and Slakmon, relying on Coppersmith's method to factor vulnerable keys. What it does is simply to embed approximately half of the bits of one of the factors of the RSA key, in the key itself. Here's some SAGE code to do this:

def rsa_bd_keygen(k):
"""Generate backdoored RSA key with k bits. Tested with k = 1024"""
slack = 32 # k//4 + slack bits stored
beta = 0.5 # p = n^beta
e = 65537
p = random_prime(2^(k//2), lbound=ceil(sqrt(2)*2^(k//2-1)), proof=False)
q_ = random_prime(2^(k//2), lbound=ceil(sqrt(2)*2^(k//2-1)), proof=False)
n_ = p*q_

tau = (n_ // 2^(7*k//8)) % 2^(k//8)
mu  = (p // 2^(k//4 - slack)) % 2^(k//4 + slack)
lam = n_ % 2^(5*k//8 - slack)

n = tau*2^(7*k//8) + mu*2^(5*k//8 - slack) + lam
q = n//p; q += (1^^q&1)

while gcd(e, q-1) > 1 or not q.is_prime():
m = ZZ.random_element(2^(k//8 - slack - 10)) & -2
q = q^^
n = p*

d = e.inverse_mod(n)
return (p,q,n,e,d)

The above code returns an RSA modulus in which the middle $$k/4 + \text{slack}$$ bits belong to the factor $$p$$, and the remaining bits are randomly generated. This is not yet a SETUP: the moduli $$n$$ are distinguishable from random, by using Coppersmith's method to obtain $$p$$ from its upper bits. To turn this into a SETUP, one has to hide the bits, by using, e.g., encryption on the partial $$p$$ information. In case our generator is a blackbox, this can be regular symmetric encryption; when the users have access to the key generation procedure, it is best to use public key methods to hide it - Young and Yung have made a living out of coming up with such methods.

To break the key generated above, we can too use SAGE:
def rsa_bd_break(n, k, eps=0.5/8):
tau = (n // 2^(7*k//8)) % 2^(k//8)
mu = (n // 2^(5*k//8 - slack)) % 2^(k//4 + slack)
print "Solving..."
P.<x> = PolynomialRing(Zmod(n))
f = x + p_*2^(k//4 - slack)
# set_verbose(2)
while True:
roots = f.small_roots(X=2^(k//4 - slack), beta=0.5, epsilon=eps)
if len(roots) > 0:
print hex(ZZ(roots[0]))
return ZZ(roots[0])
print "Failed with eps = " + str(eps)
if eps/2 < epsthresh:
print "Quitting..."
break
print "Retrying with eps = " + str(eps/2)
eps /= 2
return None

As long as we have more than half the bits of $$p$$ in $$p'$$, we can find the full $$p$$ in polynomial time. This does not mean it can't be slow. I have added the slack bits to increase the efficiency of the root finding, requiring the reduction of a smaller basis¹. The eps argument is used to control the size of the lattice to reduce --- the smaller it is, the larger the lattice basis will be (and the better the chances to finding a solution).

Here's an example execution:

Generating RSA key... Done.
Original p = 11183581128574189163983793036002083746501559137516652528752284765046234959066928998151315453810766744898339995298871471424016984721754805133501061552181043
Solving...
Failed with eps = 0.0625000000000000
Retrying with eps = 0.0312500000000000
Recovered p = 11183581128574189163983793036002083746501559137516652528752284765046234959066928998151315453810766744898339995298871471424016984721754805133501061552181043

SAGE factors RSA-1024 bit keys, with 272 bits of $$p$$ embedded, very quickly, usually under a second.

#### Summary

The grandfather of this sort of method is, of course, Ken Thompson's legendary talk "Reflections on Trusting Trust". The moral there was that you can't really trust code you did not fully create yourself²; in our case, it is similar: you cannot trust unreviewed, proprietary cryptographic code that behaves like a blackbox. It could be doing anything!

Backdoors like this one, be them in hardware or software, can be avoided by systematically auditing and reviewing code. If you're a vendor, you can increase your perceived trustworthiness by opening up critical routines such as key generation. As a client, you should demand that such code be thoroughly reviewed either by you, or appropriate third-parties. In today's uncertain times, when intrusions come from increasingly powerful adversaries, one can never be too careful.

[1] Lattice basis reduction, while polynomial-time, can be quite slow in practice, with complexity ranging from cubic to quintic.
[2] And even then, you may shoot yourself in the foot.

## Tuesday, March 20, 2012

### More fun with C++ metaprogramming

Since you're here, you have most certainly seen my last C++-related post, wherein I implemented compile-time AES encryption using C++ templates.

Today, I bring you CubeHash. Now, you might be wondering of what use is a compile-time implementation of CubeHash. Template metaprogramming is used here for a very specific task: to compute the initial state.

The CubeHash specification dictates that the initial 1024-bit (or 32-word) state is computed from the parameters h (output hash bit length), b (the number of bytes to mix with the state each round), and r (number of rounds) as follows:

• Place the 32-bit value h/8 in state[0];
• Place the 32-bit value b in state[1];
• Place the 32-bit value r in state[2];
• Fill state[3] through state[31] with 0;
• Apply the CubeHash round function 10r times.
To compute the initial CubeHash state, then, we have to be able to compute the full round function. And that's exactly what we did. The following C++ header file implements all possible variants of CubeHash easily, since all that changes between them are the h, b and r parameters. The implementation itself is based on the reference simple implementation, available from SUPERCOP. Switching to a faster SSE2 implementation is trivial.

Here it is: CubeHash. Have fun.