Challenge 17: The CBC padding oracle

This is the best-known attack on modern block-cipher cryptography.

Combine your padding code and your CBC code to write two functions.

The first function should select at random one of the following 10 strings:

MDAwMDAwTm93IHRoYXQgdGhlIHBhcnR5IGlzIGp1bXBpbmc=
MDAwMDAxV2l0aCB0aGUgYmFzcyBraWNrZWQgaW4gYW5kIHRoZSBWZWdhJ3MgYXJlIHB1bXBpbic=
MDAwMDAyUXVpY2sgdG8gdGhlIHBvaW50LCB0byB0aGUgcG9pbnQsIG5vIGZha2luZw==
MDAwMDAzQ29va2luZyBNQydzIGxpa2UgYSBwb3VuZCBvZiBiYWNvbg==
MDAwMDA0QnVybmluZyAnZW0sIGlmIHlvdSBhaW4ndCBxdWljayBhbmQgbmltYmxl
MDAwMDA1SSBnbyBjcmF6eSB3aGVuIEkgaGVhciBhIGN5bWJhbA==
MDAwMDA2QW5kIGEgaGlnaCBoYXQgd2l0aCBhIHNvdXBlZCB1cCB0ZW1wbw==
MDAwMDA3SSdtIG9uIGEgcm9sbCwgaXQncyB0aW1lIHRvIGdvIHNvbG8=
MDAwMDA4b2xsaW4nIGluIG15IGZpdmUgcG9pbnQgb2g=
MDAwMDA5aXRoIG15IHJhZy10b3AgZG93biBzbyBteSBoYWlyIGNhbiBibG93

... generate a random AES key (which it should save for all future encryptions), pad the string out to the 16-byte AES block size and CBC-encrypt it under that key, providing the caller the ciphertext and IV.

The second function should consume the ciphertext produced by the first function, decrypt it, check its padding, and return true or false depending on whether the padding is valid.

What you're doing here: This pair of functions approximates AES-CBC encryption as its deployed serverside in web applications; the second function models the server's consumption of an encrypted session token, as if it was a cookie.

It turns out that it's possible to decrypt the ciphertexts provided by the first function.

The decryption here depends on a side-channel leak by the decryption function. The leak is the error message that the padding is valid or not.

You can find 100 web pages on how this attack works, so I won't re-explain it. What I'll say is this:

The fundamental insight behind this attack is that the byte 01h is valid padding, and occur in 1/256 trials of "randomized" plaintexts produced by decrypting a tampered ciphertext.

02h in isolation is not valid padding.

02h 02h is valid padding, but is much less likely to occur randomly than 01h.

03h 03h 03h is even less likely.

So you can assume that if you corrupt a decryption AND it had valid padding, you know what that padding byte is.

It is easy to get tripped up on the fact that CBC plaintexts are "padded". Padding oracles have nothing to do with the actual padding on a CBC plaintext. It's an attack that targets a specific bit of code that handles decryption. You can mount a padding oracle on any CBC block, whether it's padded or not.

In [1]:
import os
import random
from random import randint
from libmatasano import (
    encrypt_aes_128_cbc, decrypt_aes_128_cbc,
    PaddingError, split_bytes_in_blocks, bxor,
    cbc_xor, BLOCK_SIZE, pkcs7_strip, html_test
)
from base64 import b64decode
In [2]:
class Oracle:
    messages = list(map(b64decode, [
        b'MDAwMDAwTm93IHRoYXQgdGhlIHBhcnR5IGlzIGp1bXBpbmc=',
        b'MDAwMDAxV2l0aCB0aGUgYmFzcyBraWNrZWQgaW4gYW5kIHRoZSBWZWdhJ3MgYXJlIHB1bXBpbic=',
        b'MDAwMDAyUXVpY2sgdG8gdGhlIHBvaW50LCB0byB0aGUgcG9pbnQsIG5vIGZha2luZw==',
        b'MDAwMDAzQ29va2luZyBNQydzIGxpa2UgYSBwb3VuZCBvZiBiYWNvbg==',
        b'MDAwMDA0QnVybmluZyAnZW0sIGlmIHlvdSBhaW4ndCBxdWljayBhbmQgbmltYmxl',
        b'MDAwMDA1SSBnbyBjcmF6eSB3aGVuIEkgaGVhciBhIGN5bWJhbA==',
        b'MDAwMDA2QW5kIGEgaGlnaCBoYXQgd2l0aCBhIHNvdXBlZCB1cCB0ZW1wbw==',
        b'MDAwMDA3SSdtIG9uIGEgcm9sbCwgaXQncyB0aW1lIHRvIGdvIHNvbG8=',
        b'MDAwMDA4b2xsaW4nIGluIG15IGZpdmUgcG9pbnQgb2g=',
        b'MDAwMDA5aXRoIG15IHJhZy10b3AgZG93biBzbyBteSBoYWlyIGNhbiBibG93',
    ]))
    
    def __init__(self):
        self.key = os.urandom(16)
        
    def encrypt(self):
        # here "self.messages" is a class attribute
        # see https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy
        msg = random.choice(self.messages)
        iv = os.urandom(16)
        ctxt = encrypt_aes_128_cbc(msg, iv, self.key)
        return ctxt, iv
    
    def check_padding(self, cryptogram):
        ctxt, iv = cryptogram
        try:
            decrypt_aes_128_cbc(ctxt, iv, oracle.key)
            return True
        except PaddingError:
            return False
In [3]:
oracle = Oracle()
In [4]:
ctxt, iv = oracle.encrypt()
cryptogram = {'ctxt': ctxt, 'iv': iv}
In [5]:
oracle.check_padding((cryptogram['ctxt'], cryptogram['iv']))
Out[5]:
True
In [6]:
oracle.check_padding((os.urandom(16), cryptogram['iv']))
Out[6]:
False

As we already did, we will take an example with a block size of 4 for simplicity.

The message will be lipsum, that is 6 characters encoded as 6 bytes, and noted M1 M2 M3 M4 M5 M6. PKCS#7 padding for this message with a block size of 4 requires to add two bytes of padding, so the padded messa eg will be

M1 M2 M3 M4 | M5 M6 P  P

Where a "pipe" character | denotes the block limits and where the P byte is equal to 2

Here is how we can recover the last message byte M6 using a "padding oracle", that is some external way to know if the padding if the underlying message is correct or not.

Remember with CBC mode we are able to flip bits in the underlying plaintext by flipping bits in the ciphertext. This makes us capable of XORing the plaintext with an arbitrary value (XORing is equivalent to bit flipping) as long as we are only affecting bytes in a same block.

First of all this gives us a very simple way to tell how many bytes of PKCS#7 padding bytes there are at the end of the plaintext: You use CBC bitflipping to alter the very last byte. It is very probable that the resulting message will have an invalind padding.

    M1 M2 M3 M4 | M5 M6 P  P
⊕                          X
=   M1 M2 M3 M4 | M5 M6 P  P'

unless P' = 1, oracle will give a Padding Error

To remove the probability of not getting a padding error when you are altering a padding byte, you can just re-test with a different value X: you know the byte you are altering is part of the padding if you got a padding error with at least one value of X.

You then alter the second-to-last byte of plaintext, see if the padding oracle raises a padding error, and so on. When you don't get a padding error whatever value you use for X, it means that the byte you are altering was not part of the padding.

    M1 M2 M3 M4 | M5 M6 P  P
⊕                    X      
=   M1 M2 M3 M4 | M5 ?? P  P

no padding error, whatever the value of 'X' is
In [7]:
for i in range(BLOCK_SIZE):
    altered_cryptogram = cbc_xor(cryptogram, pad=b'\xff', index=len(ctxt)-1-i)
    
    if oracle.check_padding((altered_cryptogram['ctxt'],
                             altered_cryptogram['iv'])):
        # checking with another value
        altered_cryptogram = cbc_xor(cryptogram, pad=b'\x11', index=len(ctxt)-1-i)
        
        if oracle.check_padding((altered_cryptogram['ctxt'],
                                 altered_cryptogram['iv'])):
            # OK this time we are sure
            padding_length = i
            break
else:
    # having padding errors at all iteration
    # indicates that the last block is all padding
    # thatis, message length was a multiple of block size
    padding_length = BLOCK_SIZE

print('padding length:', padding_length)

# checking we got it right
expected = BLOCK_SIZE - len(decrypt_aes_128_cbc(ctxt, iv, oracle.key)) % BLOCK_SIZE
html_test(padding_length == expected)
padding length: 8
OK

The technique to recover the message bytes is quite similar.

We will use bitflipping in order to perform the following operation:

    M1 M2 M3 M4 | M5 M6 P  P
⊕                    X1 X2 X3
=   M1 M2 M3 M4 | M5 P' P' P'

Such that the resulting message is properly padded, that is in our example P' = 3.

X2 and X3 are easy to compute: because we know the padding lenght we know the values of the original padding bytes P, and we know the value of the padding bytes P' we want to get after bit flipping (P' = P+1). Thus X2 = X3 = P ⊕ P'.

What we don't know is which value of X1 will result in a proper padding. However we simply have to try every single value and give the result to the padding oracle: when we don't get a padding error, it means that we found the proper byte X1.

    M1 M2 M3 M4 | M5 M6 P  P
⊕                    i  X2 X3
=   M1 M2 M3 M4 | M5 ?? P' P'

Padding Error unless M6 ⊕  i = P',
that is, i = X1

It is now easy to infer the value of M6:

M6 = P' ⊕  X1

We can then repeat the same process to learn message byte M5, trying to find new values X1 X2 X3 X4 such that

    M1 M2 M3 M4 | M5 M6 P  P
⊕                 X1 X2 X3 X4
=   M1 M2 M3 M4 | P' P' P' P'

Where P' is 4 this time in our example. Again here we will only have to try every posibility for X1 because we already know the proper values for X2 X3 X4 (recall we know M6 now)

Then, there are a couple details to pay attention to to be able to recover the whole message in every situation: When we know a whole block of plaintext, we have to remove it because the paddig oracle will never look at bytes in the second-to-last block.

Say we already recovered M6 and M5, now we are looking for X1 such that the following operation results in a message with proper padding:

    M1 M2 M3 M4
⊕            X1
=   M1 M2 M3 P'

(that is we want to obtain P' = 1).

How do we do that ? Same as before: we try every possible value for X1 and give the result to the padding oracle. If we don't get a padding error, it can be one of the following situations:

  • we got a last byte of value 1: this is what we wanted
  • we got a last byte of value 2 and it happens that the message byte M2 has a value of 2 as well
  • we got a last byte of value 3 and it happens that the message bytes M2 and M3 both have a value of 3 as well
  • etc …

What do we do to make sure we are in the first case and not in the other ones? Well first the first case is much more likely than the others, so we could just accept to have an attack that will not work every now and then.

TODO mechanism to ensure 100% attack success rate

In [8]:
known_bytes = bytes([padding_length])*padding_length

def recover_one_more_byte(cryptogram, known_bytes):
    ctxt = cryptogram['ctxt']
    iv = cryptogram['iv']
    
    # we drop ciphertext blocks which plaintext is known entirely
    # the index of the target byte (the one we are going to mess with)
    # is computed from the end of the remaining ciphertext blocks
    # (negative index)
    # with K for known bytes, T for target, X for the rest:
    # XXXX XXTK KKKK => target index will be -2
    nb_blocks_to_drop = len(known_bytes) // BLOCK_SIZE
    target_byte_index = - (len(known_bytes) % BLOCK_SIZE) - 1

    # I ♥ slices!
    # note the "or None" so that if "nb_blocks_to_drop" is zero
    # we get a slice with no end (goes to the end of the iterable)
    not_dropped = slice(0, -nb_blocks_to_drop*BLOCK_SIZE or None)

    l = len(known_bytes[not_dropped])

    for i in range(256):
        pad = bytes([i]) + bxor(known_bytes[not_dropped], bytes([l+1])*(l))
        cryptogram_to_alter = {'ctxt': ctxt[not_dropped], 'iv': iv}
        cryptogram_altered = cbc_xor(cryptogram_to_alter, pad, target_byte_index)
        if oracle.check_padding((cryptogram_altered['ctxt'], cryptogram_altered['iv'])):
            target_byte_value = (l+1) ^ i
            return bytes([target_byte_value])
    else:
        err_msg = 'all values triggered a padding error'
        if l == 2:
            err_msg += '; probably a mistake when recovering last byte of the block'
        raise Exception(err_msg)
        
# quick sanity check
assert recover_one_more_byte(cryptogram, known_bytes) == bytes([decrypt_aes_128_cbc(ctxt, iv, oracle.key)[-1]])

while len(known_bytes) < len(ctxt):
    new_byte = recover_one_more_byte(cryptogram, known_bytes)
    known_bytes = new_byte + known_bytes

pkcs7_strip(known_bytes, BLOCK_SIZE)
Out[8]:
b"000001With the bass kicked in and the Vega's are pumpin'"