This is the best-known attack on modern block-cipher cryptography.
Combine your padding code and your CBC code to write two functions.
The first function should select at random one of the following 10 strings:
MDAwMDAwTm93IHRoYXQgdGhlIHBhcnR5IGlzIGp1bXBpbmc=
MDAwMDAxV2l0aCB0aGUgYmFzcyBraWNrZWQgaW4gYW5kIHRoZSBWZWdhJ3MgYXJlIHB1bXBpbic=
MDAwMDAyUXVpY2sgdG8gdGhlIHBvaW50LCB0byB0aGUgcG9pbnQsIG5vIGZha2luZw==
MDAwMDAzQ29va2luZyBNQydzIGxpa2UgYSBwb3VuZCBvZiBiYWNvbg==
MDAwMDA0QnVybmluZyAnZW0sIGlmIHlvdSBhaW4ndCBxdWljayBhbmQgbmltYmxl
MDAwMDA1SSBnbyBjcmF6eSB3aGVuIEkgaGVhciBhIGN5bWJhbA==
MDAwMDA2QW5kIGEgaGlnaCBoYXQgd2l0aCBhIHNvdXBlZCB1cCB0ZW1wbw==
MDAwMDA3SSdtIG9uIGEgcm9sbCwgaXQncyB0aW1lIHRvIGdvIHNvbG8=
MDAwMDA4b2xsaW4nIGluIG15IGZpdmUgcG9pbnQgb2g=
MDAwMDA5aXRoIG15IHJhZy10b3AgZG93biBzbyBteSBoYWlyIGNhbiBibG93
... generate a random AES key (which it should save for all future encryptions), pad the string out to the 16-byte AES block size and CBC-encrypt it under that key, providing the caller the ciphertext and IV.
The second function should consume the ciphertext produced by the first function, decrypt it, check its padding, and return true or false depending on whether the padding is valid.
What you're doing here: This pair of functions approximates AES-CBC encryption as it's deployed server-side in web applications; the second function models the server's consumption of an encrypted session token, as if it were a cookie.
It turns out that it's possible to decrypt the ciphertexts provided by the first function.
The decryption here depends on a side-channel leak by the decryption function. The leak is the error message that reveals whether the padding is valid.
You can find 100 web pages on how this attack works, so I won't re-explain it. What I'll say is this:
The fundamental insight behind this attack is that the byte 01h is valid padding, and occurs in 1/256 trials of "randomized" plaintexts produced by decrypting a tampered ciphertext.
02h in isolation is not valid padding.
02h 02h is valid padding, but is much less likely to occur randomly than 01h.
03h 03h 03h is even less likely.
So you can assume that if you corrupt a decryption AND it had valid padding, you know what that padding byte is.
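To make these cases concrete, here is a minimal sketch of a PKCS#7 validity check. The helper name `has_valid_pkcs7` is an assumption made for illustration and is not part of libmatasano. The last lines enumerate every possible value of the final two bytes of a block whose other bytes are not padding-like, and count how many of them pass the check.

```python
def has_valid_pkcs7(plaintext: bytes, block_size: int = 16) -> bool:
    """Return True if `plaintext` ends with well-formed PKCS#7 padding."""
    if not plaintext or len(plaintext) % block_size != 0:
        return False
    n = plaintext[-1]
    if n < 1 or n > block_size:
        return False
    # the last n bytes must all be equal to n
    return plaintext.endswith(bytes([n]) * n)


body = b"A" * 13
print(has_valid_pkcs7(body + b"\x00\x00\x01"))  # True: a lone 01h is valid
print(has_valid_pkcs7(body + b"\x00\x00\x02"))  # False: 02h in isolation
print(has_valid_pkcs7(body + b"\x00\x02\x02"))  # True: 02h 02h
print(has_valid_pkcs7(body + b"\x03\x03\x03"))  # True: 03h 03h 03h

# Enumerate all 65536 values of the last two bytes (a, b).
valid = [(a, b) for a in range(256) for b in range(256)
         if has_valid_pkcs7(b"A" * 14 + bytes([a, b]))]
ending_in_01 = [pair for pair in valid if pair[1] == 1]
print(len(valid), len(ending_in_01))  # 257 256
```

Out of 257 valid endings, 256 finish with 01h and only one is 02h 02h, which is why a tampered block that passes the check almost certainly ends in 01h.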
It is easy to get tripped up on the fact that CBC plaintexts are "padded". Padding oracles have nothing to do with the actual padding on a CBC plaintext. It's an attack that targets a specific bit of code that handles decryption. You can mount a padding oracle on any CBC block, whether it's padded or not.
import os
import random
from base64 import b64decode

from libmatasano import (
    encrypt_aes_128_cbc, decrypt_aes_128_cbc,
    PaddingError, split_bytes_in_blocks, bxor,
    cbc_xor, BLOCK_SIZE, pkcs7_strip, html_test
)
class Oracle:
    messages = list(map(b64decode, [
        b'MDAwMDAwTm93IHRoYXQgdGhlIHBhcnR5IGlzIGp1bXBpbmc=',
        b'MDAwMDAxV2l0aCB0aGUgYmFzcyBraWNrZWQgaW4gYW5kIHRoZSBWZWdhJ3MgYXJlIHB1bXBpbic=',
        b'MDAwMDAyUXVpY2sgdG8gdGhlIHBvaW50LCB0byB0aGUgcG9pbnQsIG5vIGZha2luZw==',
        b'MDAwMDAzQ29va2luZyBNQydzIGxpa2UgYSBwb3VuZCBvZiBiYWNvbg==',
        b'MDAwMDA0QnVybmluZyAnZW0sIGlmIHlvdSBhaW4ndCBxdWljayBhbmQgbmltYmxl',
        b'MDAwMDA1SSBnbyBjcmF6eSB3aGVuIEkgaGVhciBhIGN5bWJhbA==',
        b'MDAwMDA2QW5kIGEgaGlnaCBoYXQgd2l0aCBhIHNvdXBlZCB1cCB0ZW1wbw==',
        b'MDAwMDA3SSdtIG9uIGEgcm9sbCwgaXQncyB0aW1lIHRvIGdvIHNvbG8=',
        b'MDAwMDA4b2xsaW4nIGluIG15IGZpdmUgcG9pbnQgb2g=',
        b'MDAwMDA5aXRoIG15IHJhZy10b3AgZG93biBzbyBteSBoYWlyIGNhbiBibG93',
    ]))

    def __init__(self):
        self.key = os.urandom(16)

    def encrypt(self):
        # here "self.messages" is a class attribute
        # see https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy
        msg = random.choice(self.messages)
        iv = os.urandom(16)
        ctxt = encrypt_aes_128_cbc(msg, iv, self.key)
        return ctxt, iv

    def decrypt(self, cryptogram):
        ctxt, iv = cryptogram
        # use this instance's key, not the key of the global "oracle" object
        decrypt_aes_128_cbc(ctxt, iv, self.key)
        # the oracle does not return the result of the decryption
        # but it will raise an exception if padding stripping failed.
oracle = Oracle()
ctxt, iv = oracle.encrypt()
cryptogram = {'ctxt': ctxt, 'iv': iv}
oracle.decrypt((cryptogram['ctxt'], cryptogram['iv']))
try:
    oracle.decrypt((os.urandom(16), cryptogram['iv']))
    raise Exception('A padding error was expected')
except PaddingError:
    print('Got a padding error just as expected')
As we already did, we will take an example with a block size of 4, just for the sake of simplicity.
The message will be 6 characters encoded as 6 bytes
(it could be the word “attack” for instance),
denoted M1 M2 M3 M4 M5 M6.
PKCS#7 padding for this message with a block size of 4
requires adding two bytes of padding,
so the padded message will be
M1 M2 M3 M4 | M5 M6 P P
where a "pipe" character | denotes the block limits
and where the P byte is equal to 0x02.
Here is how we can recover the last message byte M6
using a "padding oracle", that is, some external way to know
whether the padding of the underlying message is correct or not.
Remember that with CBC mode we are able to flip bits in the underlying plaintext
by flipping bits in the ciphertext.
This makes us capable of XORing the plaintext with an arbitrary value
(XORing is equivalent to bit flipping)
as long as we are only affecting bytes in the same block.
We will do this using the cbc_xor function,
which we created in the previous challenges
and put in our libmatasano.py file.
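The bitflipping property can be checked in isolation. The sketch below is not the cbc_xor function from libmatasano.py: `flip_plaintext` is a hypothetical helper with its own signature, and the block cipher is a toy Feistel construction built on SHA-256 that stands in for AES only so that the example runs with the standard library alone. What it shows is that XORing a byte of the previous ciphertext block (or of the IV) XORs the plaintext byte at the same offset in the next block, at the cost of garbling the block that was tampered with.

```python
import hashlib
import os

BLOCK = 16  # block size of the toy cipher, matching AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, r: int, half: bytes) -> bytes:
    # round function of the toy Feistel cipher (assumption: NOT AES)
    return hashlib.sha256(key + bytes([r]) + half).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in range(4):
        left, right = right, _xor(left, _round(key, r, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in reversed(range(4)):
        left, right = _xor(right, _round(key, r, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, ctxt: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ctxt), BLOCK):
        block = ctxt[i:i + BLOCK]
        out += _xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


def flip_plaintext(iv: bytes, ctxt: bytes, pad: bytes, index: int):
    """XOR `pad` into the plaintext starting at byte `index` (non-negative).

    Plaintext byte j is XORed with byte j of iv + ctxt during decryption,
    so that is the byte we modify. All affected bytes must be in one block.
    """
    stream = bytearray(iv + ctxt)
    for k, p in enumerate(pad):
        stream[index + k] ^= p
    return bytes(stream[:BLOCK]), bytes(stream[BLOCK:])


key, iv = os.urandom(16), os.urandom(16)
plaintext = b"YELLOW SUBMARINE" + b"user=bob;admin=0"
ctxt = cbc_encrypt(key, iv, plaintext)

# turn the final "0" into "1" without knowing the key
pad = bytes([ord("0") ^ ord("1")])
new_iv, new_ctxt = flip_plaintext(iv, ctxt, pad, index=len(plaintext) - 1)
tampered = cbc_decrypt(key, new_iv, new_ctxt)
print(tampered[BLOCK:])  # b'user=bob;admin=1'
```

The first block of `tampered` no longer matches the original, which is why the flipped bytes must all sit in a single block.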
First of all, this gives us a very simple way to tell how many bytes of PKCS#7 padding there are at the end of the plaintext: use CBC bitflipping to alter the very last byte. It is very probable that the resulting message will have an invalid padding.
  M1 M2 M3 M4 | M5 M6 P  P
⊕                        X
= M1 M2 M3 M4 | M5 M6 P  P'
unless P' = 1, the oracle will give a padding error
To rule out the chance of not getting a padding error
when you are altering a padding byte,
you can just re-test with a different value X:
you know the byte you are altering is part of the padding
if you got a padding error with at least one value of X.
You then alter the second-to-last byte of plaintext,
see if the padding oracle raises a padding error, and so on.
When you don't get a padding error
whatever value you use for X,
it means that the byte you are altering was not part of the padding.
  M1 M2 M3 M4 | M5 M6 P  P
⊕                  X
= M1 M2 M3 M4 | M5 ?? P  P
no padding error, whatever the value of 'X' is
for i in range(BLOCK_SIZE):
    altered_cryptogram = cbc_xor(cryptogram, pad=b'\xff', index=len(ctxt)-1-i)
    try:
        oracle.decrypt((altered_cryptogram['ctxt'], altered_cryptogram['iv']))
        # trying with another value just to be sure
        altered_cryptogram = cbc_xor(cryptogram, pad=b'\x11', index=len(ctxt)-1-i)
        oracle.decrypt((altered_cryptogram['ctxt'], altered_cryptogram['iv']))
        # If we reach this line without any padding error
        # we are pretty sure that the byte we are messing with
        # is *not* part of the padding
        padding_length = i
        break
    except PaddingError:
        # It seems that the byte we are messing with
        # is part of the padding;
        # let's move on to the next byte
        continue
else:
    # having padding errors at every iteration
    # indicates that the last block is all padding,
    # that is, message length was a multiple of block size
    padding_length = BLOCK_SIZE

print('padding length:', padding_length)
# checking we got it right
expected = BLOCK_SIZE - len(decrypt_aes_128_cbc(ctxt, iv, oracle.key)) % BLOCK_SIZE
html_test(padding_length == expected)
The technique to recover the message bytes is quite similar.
We will use bitflipping in order to perform the following operation:
  M1 M2 M3 M4 | M5 M6 P  P
⊕                  X1 X2 X3
= M1 M2 M3 M4 | M5 P' P' P'
such that the resulting message is properly padded,
that is, in our example, P' = 3.
X2 and X3 are easy to compute:
because we know the padding length,
we know the values of the original padding bytes P,
and we know the value of the padding bytes P'
we want to get after bit flipping (P' = P+1).
Thus, it is easy for us to compute X2 = X3 = P ⊕ P'.
What we don't know is which value of X1
will result in a proper padding.
We just have to try every single value
and give the result to the padding oracle:
when we don't get a padding error,
it means that we found the proper byte X1.
  M1 M2 M3 M4 | M5 M6 P  P
⊕                  i  X2 X3
= M1 M2 M3 M4 | M5 ?? P' P'
Padding error unless M6 ⊕ i = P',
that is, i = X1
It is now easy to infer the value of M6:
M6 = P' ⊕ X1
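The recovery of M6 can be replayed on the 4-byte toy example without any real cipher. In this sketch the oracle is modeled directly at the plaintext level: it XORs a mask into the last plaintext block (which is what CBC bitflipping achieves) and only reports whether the padding is valid. The names `SECRET`, `oracle`, `x1`, `x2` and `x3` are assumptions chosen to mirror the notation above.

```python
BLOCK = 4
# M1 M2 M3 M4 | M5 M6 P P, known only to the oracle
SECRET = b"attack" + b"\x02\x02"


def valid_padding(block: bytes) -> bool:
    n = block[-1]
    return 1 <= n <= len(block) and block.endswith(bytes([n]) * n)


def oracle(xor_mask: bytes) -> bool:
    """Model of the padding oracle: XOR the mask into the last block
    and tell only whether the resulting padding is valid."""
    last_block = bytes(a ^ b for a, b in zip(SECRET[-BLOCK:], xor_mask))
    return valid_padding(last_block)


P = 2          # original padding byte, known from the padding length
P_new = P + 1  # the padding byte P' we want after bit flipping
x2 = x3 = P ^ P_new

# try every value i until the oracle stops complaining: that i is X1
x1 = next(i for i in range(256) if oracle(bytes([0, i, x2, x3])))

m6 = P_new ^ x1
print(bytes([m6]))  # b'k'
```

Only the boolean answers of `oracle` are used by the attacking part of the code, never `SECRET` itself.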
We can then repeat the same process
to learn message byte M5,
trying to find new values X1 X2 X3 X4
such that
  M1 M2 M3 M4 | M5 M6 P  P
⊕               X1 X2 X3 X4
= M1 M2 M3 M4 | P' P' P' P'
where P' is equal to 0x04
this time in our example.
Again, here we will only have to try every possibility for X1
because we already know the proper values for X2 X3 X4
(recall we know M6 now).
Then, there are a couple of details to pay attention to in order to recover the whole message in every situation.
When we know a whole block of plaintext, we have to remove the corresponding ciphertext block, because the padding oracle only checks the padding at the end of the last block and will never look at bytes in the second-to-last block.
Say we already recovered M6 and M5;
now we are looking for X1
such that
the following operation results in a message with proper padding:
  M1 M2 M3 M4
⊕          X1
= M1 M2 M3 P'
(that is, we want to obtain P' = 1).
How do we do that?
Same as before:
we try every possible value for X1
and give the result to the padding oracle.
If we don't get a padding error,
it can be one of the following situations:
- P' = 1: this is what we wanted
- P' = 2, and it happens that the message byte M3 has a value of 2 as well
- P' = 3, and it happens that message bytes M2 and M3 both have a value of 3 as well
What do we do to make sure we are in the first case and not in the other ones? Well, first, the first case is much more likely than the others, so we could just accept having an attack that will not work every now and then.
(TODO mechanism to ensure 100% attack success rate)
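One common way to rule out the unwanted cases, sketched here under assumptions and not taken from the code in this write-up: once a value of X1 yields valid padding, query the oracle again with the second-to-last byte disturbed as well. If the last byte really became 1, the padding stays valid whatever precedes it; if it became 2 or more, the padding now depends on the disturbed byte and breaks. The oracle is again modeled at the plaintext level with a block size of 4, and the names `SECRET_BLOCK`, `oracle` and `recover_last_byte` are hypothetical.

```python
BLOCK = 4
# M1 M2 M3 M4 with M3 = 2: the ambiguous situation described above
SECRET_BLOCK = b"ab\x02c"


def valid_padding(block: bytes) -> bool:
    n = block[-1]
    return 1 <= n <= len(block) and block.endswith(bytes([n]) * n)


def oracle(xor_mask: bytes) -> bool:
    """Plaintext-level model of the padding oracle on one block."""
    block = bytes(a ^ b for a, b in zip(SECRET_BLOCK, xor_mask))
    return valid_padding(block)


def recover_last_byte(oracle, block_size: int) -> int:
    for i in range(256):
        mask = bytearray(block_size)
        mask[-1] = i
        if not oracle(bytes(mask)):
            continue
        # Candidate found: disturb the second-to-last byte and re-test.
        # A padding of 01h survives this, longer paddings do not.
        mask[-2] = 0xFF
        if oracle(bytes(mask)):
            return 1 ^ i
    raise ValueError("no value produced a valid padding")


# Without the extra query, the first accepted value is the wrong one here:
naive = next(1 ^ i for i in range(256) if oracle(bytes([0, 0, 0, i])))
print(bytes([naive]))                             # b'`' (wrong)
print(bytes([recover_last_byte(oracle, BLOCK)]))  # b'c' (correct)
```

The extra query costs at most a handful of additional oracle calls per block, since it is only needed when recovering the last byte of a block whose padding is unknown.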
known_bytes = bytes([padding_length])*padding_length
def recover_one_more_byte(cryptogram, known_bytes):
    ctxt = cryptogram['ctxt']
    iv = cryptogram['iv']

    # we drop ciphertext blocks whose plaintext is known entirely
    # the index of the target byte (the one we are going to mess with)
    # is computed from the end of the remaining ciphertext blocks
    # (negative index)
    # with K for known bytes, T for target, X for the rest:
    # XXXX XXTK KKKK => target index will be -2
    nb_blocks_to_drop = len(known_bytes) // BLOCK_SIZE
    target_byte_index = - (len(known_bytes) % BLOCK_SIZE) - 1

    # I ♥ slices!
    # note the "or None" so that if "nb_blocks_to_drop" is zero
    # we get a slice with no end (goes to the end of the iterable)
    not_dropped = slice(0, -nb_blocks_to_drop*BLOCK_SIZE or None)
    l = len(known_bytes[not_dropped])

    for i in range(256):
        # bytes([i]) is the "X1" of the explanation;
        # the bxor(...) part is the "X2", "X3", etc. of the explanation
        pad = bytes([i]) + bxor(known_bytes[not_dropped], bytes([l+1])*(l))
        cryptogram_to_alter = {'ctxt': ctxt[not_dropped], 'iv': iv}
        cryptogram_altered = cbc_xor(cryptogram_to_alter, pad, target_byte_index)
        try:
            oracle.decrypt((cryptogram_altered['ctxt'], cryptogram_altered['iv']))
            # If we reach this line padding was okay,
            # so we know exactly what value we reached through bit flipping,
            # and we can deduce the original value of the byte we were messing with
            target_byte_value = (l+1) ^ i
            return bytes([target_byte_value])
        except PaddingError:
            # We still haven't found a value for "X1"
            # which results in a valid padding
            continue
    else:
        err_msg = 'all values triggered a padding error'
        if l == 2:
            err_msg += '; probably a mistake when recovering last byte of the block'
        raise Exception(err_msg)

# quick sanity check
assert recover_one_more_byte(cryptogram, known_bytes) == bytes([decrypt_aes_128_cbc(ctxt, iv, oracle.key)[-1]])
while len(known_bytes) < len(ctxt):
    new_byte = recover_one_more_byte(cryptogram, known_bytes)
    known_bytes = new_byte + known_bytes

pkcs7_strip(known_bytes, BLOCK_SIZE)
This attack was first described by Serge Vaudenay in 2002 in the following paper: