Challenge 27: Recover the key from CBC with IV=Key

Take your code from the CBC exercise and modify it so that it repurposes the key for CBC encryption as the IV.

Applications sometimes use the key as an IV on the auspices that both the sender and the receiver have to know the key already, and can save some space by using it as both a key and an IV.

Using the key as an IV is insecure; an attacker that can modify ciphertext in flight can get the receiver to decrypt a value that will reveal the key.

The CBC code from exercise 16 encrypts a URL string. Verify each byte of the plaintext for ASCII compliance (ie, look for high-ASCII values). Noncompliant messages should raise an exception or return an error that includes the decrypted plaintext (this happens all the time in real systems, for what it's worth).

Use your code to encrypt a message that is at least 3 blocks long:

AES-CBC(P_1, P_2, P_3) -> C_1, C_2, C_3

Modify the message (you are now the attacker):

C_1, C_2, C_3 -> C_1, 0, C_1

Decrypt the message (you are now the receiver) and raise the appropriate error if high-ASCII is found.

As the attacker, recovering the plaintext from the error, extract the key:

P'_1 XOR P'_3

The oracle will be using the key as the IV (on top of using it as the key), and we are asked to recover the key.

Let's try to find the attack by ourselves as if we didn't read the instructions.

We will begin assuming we have a full decryption oracle available that will always return the decrypted plaintext.

Since we have a decryption oracle, let's draw the CBC decryption chart where we give a three-blocks ciphertext C1 C2 C3 and the oracle answers with plaintext P1 P2 P3:

(TODO)

Since the key $ K $ is used as IV, we have $ P_1 = Dec(C_1) \oplus K $. So we are almost there: if we manage to get the value of $ Dec(C_1) $, we are done.

This makes us want to feed $ C_1 $ to one of the $ Dec() $ blocks other than the first one, because the result we would get would then be $ P_i = Dec(C_1) \oplus C_{i-1} $ for some $ i $ and we will know $ C_{i-1} $ because we are the ones giving all the $ C_{i-1} $ values to the oracle.

Even better, we can choose what this $ C_{i-1} $ value will be, and the best option for us is $ C_{i-1} = 0 $ in which case we simply get $ P_i = Dec(C_1) $ and we are able to get the key.

So here the attack would consists in:

  • pick random $ C_1 $
  • feed the decryption oracle with ciphertext $ C_1 ~||~ 0 ~||~ C_3 $ where $ C_3 = C_1 $
  • compute the key as $ K = P_1 \oplus C_3 $

But wait, we don't have a complete decryption oracle, right? the instructions say that the oracle should "Verify each byte of the plaintext for ASCII compliance", and raise an exception if there is a non-ASCII byte found.

Except that when such non-ASCII byte is found, then the oracle does output the plaintext.

Since we are pretty free regarding the value of $ C_1 $, we know we will find a value where the decrypted plaintext will happen to contain non-ASCII bytes. If it does not happen on first attempt, just pick a new $ C_1 $ and try again.

I am almost disapointed: at first I thought that their oracle would prevent us from having a direct access to the plaintext and that we would have to find a way around that.

In [1]:
import os

from libmatasano import encrypt_aes_128_cbc, decrypt_aes_128_cbc

class Oracle:
    def __init__(self):
        self.key = os.urandom(16)
        
    def encrypt(self, msg):
        return encrypt_aes_128_cbc(msg, iv=self.key, key=self.key)
    
    def decrypt(self, ctxt):
        ptxt = decrypt_aes_128_cbc(ctxt, iv=self.key, key=self.key)
        try:
            ptxt.decode(encoding='ascii')
        except UnicodeDecodeError:
            # in the real world the oracle would output an error message
            # with 'ptxt' encoded in hexadecimal or base64
            # and we would have to convert it
            return ptxt
    
oracle = Oracle()
In [2]:
c1 = os.urandom(16)
oracle.decrypt(c1 + b'\x00'*16 + c1)
---------------------------------------------------------------------------
PaddingError                              Traceback (most recent call last)
<ipython-input-2-d432f2186fc3> in <module>
      1 c1 = os.urandom(16)
----> 2 oracle.decrypt(c1 + b'\x00'*16 + c1)

<ipython-input-1-9a7ccfd9df8c> in decrypt(self, ctxt)
     11 
     12     def decrypt(self, ctxt):
---> 13         ptxt = decrypt_aes_128_cbc(ctxt, iv=self.key, key=self.key)
     14         try:
     15             ptxt.decode(encoding='ascii')

~/repos/cryptopals/challenges/libmatasano.py in decrypt_aes_128_cbc(ctxt, iv, key)
    180         padded_msg += bxor(mask, tmp)
    181         mask = enc_block
--> 182     return pkcs7_strip(padded_msg, block_size)
    183 
    184 for _ in range(4):

~/repos/cryptopals/challenges/libmatasano.py in pkcs7_strip(x, block_size)
     88 
     89     if padding_size > block_size:
---> 90         raise PaddingError('illegal last byte (greater than block size)')
     91     if padding_size == 0:
     92         raise PaddingError('illegal last byte (zero)')

PaddingError: illegal last byte (greater than block size)

Oho, padding error ! Didn't see that one coming.

Well again we can just try with a different value for $ C_1 $ and on average we should not have padding errors at least one time out of 256 (when $ P_3 $ ends with bytes \x01).

But there is a more efficient way. We can use the fact that we have an encryption oracle as well, in order to get a $ C_1 $ block which we know ends with a proper PKCS#7 padding.

This is probably why they tell us in the instructions to start by encrypting some plaintext first, and use the plaintext blocks we got to make our input to the decryption oracle.

In [3]:
p1 = os.urandom(16)
p2 = os.urandom(16)
p3 = os.urandom(16)

from libmatasano import split_bytes_in_blocks, PaddingError, bxor, html_test

c1, c2, c3, c4 = split_bytes_in_blocks(oracle.encrypt(p1+p2+p3),
                                   blocksize=16)

for i in range(0xFF):
    try:
        ptxt = oracle.decrypt(c1 + b'\x00'*15 + bytes([i]) + c1)
        break
    except PaddingError:
        continue
else:
    print('Error: always get a padding error')

pp1, pp2, pp3 = split_bytes_in_blocks(ptxt,
                                      blocksize=16)
In [4]:
html_test(oracle.key == bxor(bxor(pp1, pp3), b'\x00'*15 + bytes([i])))
ERROR

What? Did we forgot something again? Let's cheat a little bit and compare what we got with the answer:

In [5]:
bxor(oracle.key, bxor(bxor(pp1, pp3), b'\x00'*15 + bytes([i])))
Out[5]:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'

So we're almost there, but there is a small error at the end of the key.

This is because of PKCS#7 padding again. If there is not padding error it means (with high probablity) that the decrypted value (before padding removal) was ending with a \x01 byte. (See challenge 17 on this).

Then the padding removal function removed the last byte, which is equivalent to replacing it with \x00 (our bxor function pads the shortest input with \x00 to match the length of the longuest one)

Replacing a \x01 byte with a \x00 one is equivalent to XORing it with \x01, and we have the following:

$$ P'_3 = Dec(C'_1) \oplus C'_2 \oplus 0x000...01 $$

Which explains the small difference we observe.

We are thus able to get the key reliably the following way:

In [6]:
html_test(oracle.key == bxor(bxor(pp1, pp3), b'\x00'*15 + bytes([i^1])))
OK

This has a very small chance of not working, if the abscence of padding error was caused by another valid padding than \x01 (for instance \x02\x02).