In [16]:
import random
from random import randint
import os
from libmatasano import html_test

Challenge 9 - Implement PKCS#7 padding

A block cipher transforms a fixed-sized block (usually 8 or 16 bytes) of plaintext into ciphertext. But we almost never want to transform a single block; we encrypt irregularly-sized messages.

One way we account for irregularly-sized messages is by padding, creating a plaintext that is an even multiple of the blocksize. The most popular padding scheme is called PKCS#7.

So: pad any block to a specific block length, by appending the number of bytes of padding to the end of the block. For instance,

"YELLOW SUBMARINE"

... padded to 20 bytes would be:

"YELLOW SUBMARINE\x04\x04\x04\x04"
In [17]:
def pkcs7_padding(message, block_size):
    padding_length = block_size - ( len(message) % block_size )
    # the message length is a multiple of the block size
    # we add *a whole new block of padding*
    # (otherwise it would be difficult when removing the padding
    # to guess the padding length)
    if padding_length == 0:
        padding_length = block_size
    padding = bytes([padding_length]) * padding_length
    return message + padding

def pkcs7_strip(data):
    padding_length = data[-1]
    return data[:- padding_length]
In [18]:
html_test(pkcs7_padding(b'YELLOW SUBMARINE', 20) == b'YELLOW SUBMARINE\x04\x04\x04\x04')
OK
In [19]:
for _ in range(5):
    length = randint(20, 70)
    block_size = randint(10, 30)
    msg = os.urandom(length)
    padded_msg = pkcs7_padding(msg, block_size)
    assert len(padded_msg) % block_size == 0
    assert pkcs7_strip(padded_msg)
# if we reached this point everything went fine
html_test(True)
OK

Challenge 10 - Implement CBC mode

CBC mode is a block cipher mode that allows us to encrypt irregularly-sized messages, despite the fact that a block cipher natively only transforms individual blocks.

In CBC mode, each ciphertext block is added to the next plaintext block before the next call to the cipher core.

The first plaintext block, which has no associated previous ciphertext block, is added to a "fake 0th ciphertext block" called the initialization vector, or IV.

Implement CBC mode by hand by taking the ECB function you wrote earlier, making it encrypt instead of decrypt (verify this by decrypting whatever you encrypt to test), and using your XOR function from the previous exercise to combine them.

The file here is intelligible (somewhat) when CBC decrypted against "YELLOW SUBMARINE" with an IV of all ASCII 0 (\x00\x00\x00 &c)

Here's the good ol' "CBC chart":

CBC mode

I disagree with the instructions when they say that you should use "ECB encryption/decryption" to implement CBC mode.

For me, even ECB should use padding, otherwise it's not really a "mode of encryption". Later in challenges we will need to use ECB mode on messages of arbitrary length (this will be when we have to break ECB mode). For this, I want to reserve the name encrypt_aes_128_ecb for a function that has padding.

Now the "AES" block you see in the figure above should not use padding. Remember, with PKCS#7 padding a message of lenght exactly the block size is added a full block of padding. This would mean that $ C_1 $ is twice the size of $ P_1 $ ! This is not at all how padding should be done in CBC mode: First we padd the entire message, then we split it in blocks and each block goes through the block cipher without extra padding.

For this reason I will create a function named encrypt_aes_128_block that represents this "AES" block in the chart above. Unfortunately there is no way to call the AES core directly with the cryptography library, so we have to call ... ECB mode. I admit that's a bit confusing.

In [20]:
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
backend = default_backend()

def encrypt_aes_128_block(msg, key):
    '''unpadded AES block encryption'''
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    encryptor = cipher.encryptor()
    return encryptor.update(msg) + encryptor.finalize()

def decrypt_aes_128_block(ctxt, key):
    '''unpadded AES block decryption'''
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    decryptor = cipher.decryptor()
    decrypted_data =  decryptor.update(ctxt) + decryptor.finalize()
    return decrypted_data

# so with this function we need a message of **exactly** 128 bits (16 Bytes)
msg = os.urandom(16)
key = os.urandom(16)
ctxt = encrypt_aes_128_block(msg, key)
msg_2 = decrypt_aes_128_block(ctxt, key)

html_test(msg == msg_2)
OK

Now we implement CBC mode. So recall, each block of plaintext is XORed with the previous block of ciphertext before being AES-encrypted.

Since the first block does not have a "previous ciphertext block", we XOR it against a "nonce" (for "number used once") called an "Initialization Vector" or IV.

We will need a function to split a long bytestring into evenly-sized blocks. To my knowledge, Python does not have a function for that in its standard library.

In [21]:
from math import ceil
def split_bytes_in_blocks(x, blocksize):
    nb_blocks = ceil(len(x)/blocksize)
    return [x[blocksize*i:blocksize*(i+1)] for i in range(nb_blocks)]

We will also need our function bxor from set 1. I will put the functions I have to re-use from one challenge to the other is in a library called libmatasano.

In [22]:
from libmatasano import bxor
In [23]:
def encrypt_aes_128_cbc(msg, iv, key):
    result = b''
    previous_ctxt_block = iv
    padded_ptxt = pkcs7_padding(msg, block_size=16)
    blocks = split_bytes_in_blocks(padded_ptxt, blocksize=16)
    
    for block in blocks:
        to_encrypt = bxor(block, previous_ctxt_block)
        new_ctxt_block = encrypt_aes_128_block(to_encrypt, key)
        result += new_ctxt_block
        # for the next iteration
        previous_ctxt_block = new_ctxt_block
    
    return result

def decrypt_aes_128_cbc(ctxt, iv, key):
    result = b''
    previous_ctxt_block = iv
    blocks = split_bytes_in_blocks(ctxt, blocksize=16)
    
    for block in blocks:
        to_xor = decrypt_aes_128_block(block, key)
        result += bxor(to_xor, previous_ctxt_block)
        assert len(result) != 0
        # for the next iteration
        previous_ctxt_block = block
        
    return pkcs7_strip(result)

for _ in range(5):
    length = randint(5,50)
    msg = os.urandom(length)
    key = os.urandom(16)
    iv = os.urandom(16)
    ctxt = encrypt_aes_128_cbc(msg, iv, key)
    assert decrypt_aes_128_cbc(ctxt, iv, key) == msg
    
html_test(True)
OK

Challenge 11 - An ECB/CBC detection oracle

Now that you have ECB and CBC working:

Write a function to generate a random AES key; that's just 16 random bytes.

Write a function that encrypts data under an unknown key --- that is, a function that generates a random key and encrypts under it.

The function should look like:

encryption_oracle(your-input)
=> [MEANINGLESS JIBBER JABBER]

Under the hood, have the function append 5-10 bytes (count chosen randomly) before the plaintext and 5-10 bytes after the plaintext.

Now, have the function choose to encrypt under ECB 1/2 the time, and under CBC the other half (just use random IVs each time for CBC). Use rand(2) to decide which to use.

Detect the block cipher mode the function is using each time. You should end up with a piece of code that, pointed at a block box that might be encrypting ECB or CBC, tells you which one is happening.

Actually we still don't have a proper ECB mode (with padding). So let's start with this.

In [24]:
def encrypt_aes_128_ecb(msg, key):
    padded_msg = pkcs7_padding(msg, block_size=16)
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    encryptor = cipher.encryptor()
    return encryptor.update(padded_msg) + encryptor.finalize()

def decrypt_aes_128_ecb(ctxt, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    decryptor = cipher.decryptor()
    decrypted_data =  decryptor.update(ctxt) + decryptor.finalize()
    message = pkcs7_strip(decrypted_data)
    return message

for _ in range(5):
    length = randint(5,50)
    msg = os.urandom(length)
    key = os.urandom(16)
    iv = os.urandom(16)
    ctxt = encrypt_aes_128_cbc(msg, iv, key)
    assert decrypt_aes_128_cbc(ctxt, iv, key) == msg
    
html_test(True)
OK

OK now about this oracle thing, well that should be pretty simple. The only way I see how to detect ECB mode is what we did before, that is, you just see if there are duplicated blocks of ciphertext. However to have that happen you need your plaintext to have repeating blocks in the first place. Said differently, on most messages this technique will just not work to detect ECB.

Maybe the idea is that you are an attacker that controls the plaintext being sent? In which case you can send really weird messages, like "AAAAAAAAAA..." so that you are sure to have identical blocks if the oracle uses ECB.

In [25]:
# extra optional parameter if we want to force the mode
def encryption_oracle(message, mode=None):
    key = os.urandom(16)
    random_header = os.urandom(randint(5, 10))
    random_footer = os.urandom(randint(5, 10))
    to_encrypt = random_header + message + random_footer
    
    if mode==None:
        mode = random.choice(['ECB', 'CBC'])
    if mode == 'ECB':
        return encrypt_aes_128_ecb(to_encrypt, key)
    elif mode == 'CBC':
        iv = os.urandom(16)
        return encrypt_aes_128_cbc(to_encrypt, iv, key)
    
from libmatasano import test_ecb_128
    
for _ in range(10):
    mode = random.choice(['ECB', 'CBC'])
    # because of the random header and footer we need more that just 2 blocks of plaintext
    message = b'A'*50
    ctxt = encryption_oracle(message, mode)
    detected_mode = 'ECB' if test_ecb_128(ctxt) else 'CBC'
    assert detected_mode == mode
    
html_test(True)
OK

Challenge 12 - Byte-at-a-time ECB decryption (Simple)

Copy your oracle function to a new function that encrypts buffers under ECB mode using a consistent but unknown key (for instance, assign a single random key, once, to a global variable).

Now take that same function and have it append to the plaintext, BEFORE ENCRYPTING, the following string:

Um9sbGluJyBpbiBteSA1LjAKV2l0aCBteSByYWctdG9wIGRvd24gc28gbXkg
aGFpciBjYW4gYmxvdwpUaGUgZ2lybGllcyBvbiBzdGFuZGJ5IHdhdmluZyBq
dXN0IHRvIHNheSBoaQpEaWQgeW91IHN0b3A/IE5vLCBJIGp1c3QgZHJvdmUg
YnkK

Spoiler alert: Do not decode this string now. Don't do it.

Base64 decode the string before appending it. Do not base64 decode the string by hand; make your code do it. The point is that you don't know its contents.

What you have now is a function that produces:

AES-128-ECB(your-string || unknown-string, random-key)

It turns out: you can decrypt "unknown-string" with repeated calls to the oracle function!

Here's roughly how:

  1. Feed identical bytes of your-string to the function 1 at a time --- start with 1 byte ("A"), then "AA", then "AAA" and so on. Discover the block size of the cipher. You know it, but do this step anyway.
  2. Detect that the function is using ECB. You already know, but do this step anyways.
  3. Knowing the block size, craft an input block that is exactly 1 byte short (for instance, if the block size is 8 bytes, make "AAAAAAA"). Think about what the oracle function is going to put in that last byte position.
  4. Make a dictionary of every possible last byte by feeding different strings to the oracle; for instance, "AAAAAAAA", "AAAAAAAB", "AAAAAAAC", remembering the first block of each invocation.
  5. Match the output of the one-byte-short input to one of the entries in your dictionary. You've now discovered the first byte of unknown-string.
  6. Repeat for the next byte.

Congratulations: This is the first challenge we've given you whose solution will break real crypto. Lots of people know that when you encrypt something in ECB mode, you can see penguins through it. Not so many of them can decrypt the contents of those ciphertexts, and now you can. If our experience is any guideline, this attack will get you code execution in security tests about once a year.

About this:

Lots of people know that when you encrypt something in ECB mode, you can see penguins through it.

See Challenge 8

In [26]:
import base64

DATA_TO_APPEND = base64.b64decode(
    "Um9sbGluJyBpbiBteSA1LjAKV2l0aCBteSByYWctdG9wIGRvd24gc28gbXkg"
    "aGFpciBjYW4gYmxvdwpUaGUgZ2lybGllcyBvbiBzdGFuZGJ5IHdhdmluZyBq"
    "dXN0IHRvIHNheSBoaQpEaWQgeW91IHN0b3A/IE5vLCBJIGp1c3QgZHJvdmUg"
    "YnkK"
)

class ECB_Oracle:
    def __init__(self):
        self.key = os.urandom(16)

    def encrypt(self, msg):
        return encrypt_aes_128_ecb(msg + DATA_TO_APPEND, self.key)

def find_block_size(oracle):
    # to find the block size
    # we encrypt 'A' then 'AA' then 'AAA' and so on
    # until we notice that the beginning of the ciphertext stops changing
    # (this will mean we filled the first block entirely with 'A's
    # and we are now filling the second block)
    current_ctxt = None
    for i in range(2, 20):
        previous_ctxt = current_ctxt or oracle.encrypt(b"A"*1)
        current_ctxt = oracle.encrypt(b"A"*i)
        if previous_ctxt[:4] == current_ctxt[:4]:
            return i-1

oracle = ECB_Oracle()
block_size = find_block_size(oracle)
assert block_size == 16

# Checking we are able to detect ECB mode
# (by encrypting a long list of 'A's)
assert test_ecb_128(oracle.encrypt(b'A'*50))

# Fiding length of payload to extract
# (recall the oracle uses PKCS#7 padding
# so message length is not ciphertext lenght)
# we increase the amount of text until we notice a change in size
# this correspond when our_message + target_message is a multiple of the block size
# (recall, PKCS#7 padding adds a whole block of padding in this case)
def find_payload_length(oracle):
    previous_length = len(oracle.encrypt(b''))
    for i in range(20):
        length = len(oracle.encrypt(b'X'*i))
        if length != previous_length:
            return previous_length - i

Now we are ready to write the core of our attack function. The trick is to insert as message a series of A such that we are in the following situation, where T denote target bytes (the bytes we want to decrypt) (again we use a block length of 4 just for the example):

AAAT TTTT TT...

We store the encryption of the "AAAT" block in variable target_block.

We then ask for the encryption of AAAi for every possible value of i (256 possible values since i is a byte).

Because the oracle is always using the same secret key, when we ask for the encryption of AAAi and i happens to be equal to the first T byte, we will notice that we just got the value we had in the target_block variable. We then know the value of the first T byte (because we know the value of i).

Then we start again the same process making sure that it's the second target byte that's last in a block with only bytes that we know. Noting known target bytes as K (for "known"):

AAKT TTTT TT...

Then we encrypt AAKi for every i, and so on...

Note that at the beginning we are reducing the number of As at every step but at some time we will have to add some of them:

AAAK KKKT TTTT TT...

Where the block we will store and try to re-create is the KKKT block.

In [11]:
# the "cleanest" way to code this I think
# is to have a function that takes the target bytes we already know
# ("known_plaintext" variable)
# and finds one more byte.
def recover_one_more_byte_ecb(oracle, known_plaintext, block_size):
    # padding length must be so that the first unknown character is at the end of a block,
    # that is: p+k+1 = 0 mod B
    # hence: p = (-k-1) mod B
    k = len(known_plaintext)
    padding_length = (-k-1) % block_size
    padding = b"A" * padding_length

    # target block plaintext is the one containing only known characters
    # except its last character.
    # we just count the number of blocks containning only known bytes,
    # and take the next block.
    target_block_number = len(known_plaintext) // block_size
    # again, 'slices' in Pythn are what you put in square brackets
    # to access iterable objects;
    # see https://docs.python.org/3/library/functions.html#slice
    target_slice = slice(target_block_number*block_size,
                         (target_block_number+1)*block_size)
    target_block = oracle.encrypt(padding)[target_slice]

    # trying every possibility for the last byte
    for i in range(256):
        message = padding + known_plaintext + bytes([i])
        block = oracle.encrypt(message)[target_slice]
        if block == target_block:
            # we got it !
            return bytes([i])

def recover_one_byte_at_time_ecb(oracle, block_size):
    known_plaintext = b""
    # again, the ciphertext is the encryption of message + pkcs#7_padding
    # so we don't want to have the pkcs padding in our recovered message
    payload_length = find_payload_length(oracle)
    for _ in range(payload_length):
        new_known_byte += recover_one_more_byte_ecb(oracle, known_plaintext, block_size)
        known_plaintext = known_plaintext + new_known_byte
    return known_plaintext

secret = recover_one_byte_at_time_ecb(oracle, block_size)

print(secret.decode())
Rollin' in my 5.0
With my rag-top down so my hair can blow
The girlies on standby waving just to say hi
Did you stop? No, I just drove by

This challenge was a major step, because this was the first time we truely break a ciphertext (yes there was challenge 6 but come on, breaking repeating-key XOR does not count)

Challenge 13 - ECB cut-and-paste

Write a k=v parsing routine, as if for a structured cookie. The routine should take:

foo=bar&baz=qux&zap=zazzle

... and produce:

{
  foo: 'bar',
  baz: 'qux',
  zap: 'zazzle'
}

(you know, the object; I don't care if you convert it to JSON).

Now write a function that encodes a user profile in that format, given an email address. You should have something like:

profile_for("foo@bar.com")

... and it should produce:

{
  email: 'foo@bar.com',
  uid: 10,
  role: 'user'
}

... encoded as:

email=foo@bar.com&uid=10&role=user

Your "profile_for" function should not allow encoding metacharacters (& and =). Eat them, quote them, whatever you want to do, but don't let people set their email address to "foo@bar.com&role=admin".

Now, two more easy functions. Generate a random AES key, then:

A. Encrypt the encoded user profile under the key; "provide" that to the "attacker". B. Decrypt the encoded user profile and parse it.

Using only the user input to profile_for() (as an oracle to generate "valid" ciphertexts) and the ciphertexts themselves, make a role=admin profile.

In [29]:
class Profile_Manager:
    def __init__(self):
        self.key = os.urandom(16)

    # a static method
    # is like a normal function, it does not depend on the instance;
    # see https://docs.python.org/3/library/functions.html#staticmethod .
    # I could use a normal function
    # but it makes more sense to "attatch it"
    @staticmethod
    def parse(byte_string):
        string = byte_string.decode()
        result = dict(pair.split('=')
                      for pair in string.split('&'))
        return result

    @staticmethod
    def profile_for(email):
        if b"&" in email or b"=" in email:
            raise ValueError("Invalid email address")
        return b"email=" + email + b'&uid=10&role=user'

    def get_encrypted_profile(self, email):
        profile = self.profile_for(email)
        return encrypt_aes_128_ecb(profile, self.key)

    def decrypt_and_parse_profile(self, ctxt):
        profile = decrypt_aes_128_ecb(ctxt, self.key)
        return self.parse(profile)

manager = Profile_Manager()

assert (
    manager.profile_for(b"email@example.com") ==
    b'email=email@example.com&uid=10&role=user'
)

assert (
    manager.parse(b'email=email@example.com&uid=10&role=user') ==
    {'email': 'email@example.com', 'role': 'user', 'uid': '10'}
)

encrypted_profile = manager.get_encrypted_profile(b"email@example.com")
assert (
    manager.decrypt_and_parse_profile(encrypted_profile) ==
    {'email': 'email@example.com', 'role': 'user', 'uid': '10'}
)

Before we do the attack, a quick reminder: encryption does not prevent modifications of the message. For authenticity of the message you need a Message Authentication Code or MAC (typically HMAC) or a signature scheme (RSA signing, DSA etc...).

Nowadays you have what we call authenticated encryption that combine an encryption scheme and a MAC because combining them yourself is hazardous as well.

The take-away is: here we are going to alter the message easily because it's ECB mode, which is terrible; but do not think that you would be safe using CBC or CTR mode: all these modes are for encryption only and are not made to protect the authenticity of the message.

The attack

In [30]:
block_size = 16
# the email address for the admin profile we want to create
# must satisfy a single requirement:
# its length must be so that the role (the "user" value) should be at the start of the last block.
target_email = b"eeeeeeeeeeeemail@attacker.com"
print("using email address:", target_email)
print("blocks:", split_bytes_in_blocks(manager.profile_for(target_email), block_size))
ciphertext_1 = manager.get_encrypted_profile(target_email)
using email address: b'eeeeeeeeeeeemail@attacker.com'
blocks: [b'email=eeeeeeeeee', b'eemail@attacker.', b'com&uid=10&role=', b'user']

With this (weird) email address, we manage to have the value for the "role" at the start of the last block.

Remember that in ECB mode blocks are encrypted indepedently from one another. WHat we want is to replace the last block of this ciphertext with a valid ciphertext block that contains "admin" instead of "user".

Taking PKCS#7 padding into account, what we want is an encryption of "admin" padded to 16 bytes.

In [32]:
# here we fabricate an input so that
# we get an encryption of the plaintext "admin" with correct padding
chosen_plaintext = pkcs7_padding(b"admin", block_size)
fabricated_email = b"nextBlockShouldSt@rt.Here:" + chosen_plaintext
print("using fabricated email:", fabricated_email)
print("blocks:", split_bytes_in_blocks(manager.profile_for(fabricated_email), block_size))
using fabricated email: b'nextBlockShouldSt@rt.Here:admin\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b'
blocks: [b'email=nextBlockS', b'houldSt@rt.Here:', b'admin\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b', b'&uid=10&role=use', b'r']

Again we had to be careful with lengths so that the "admin" value (plus proper padding) is at the beginning of a block

Perfect! now we take our blocks from the first ciphertext and we replace the block containning "user" with our brand new "admin" block from the second ciphertext!

In [20]:
ciphertext_2 = manager.get_encrypted_profile(fabricated_email)
cut_block = ciphertext_2[2*block_size : 3*block_size]
new_ciphertext = ciphertext_1[:-block_size] + cut_block
profile = manager.decrypt_and_parse_profile(new_ciphertext)

print("Profile obtained:")
print(profile)

if profile['role'] == 'admin':
    html_test(True)
using email address: b'eeeeeeeeeeeemail@attacker.com'
blocks: [b'email=eeeeeeeeee', b'eemail@attacker.', b'com&uid=10&role=', b'user']

using fabricated email: b'nextBlockShouldSt@rt.Here:admin\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b'
blocks: [b'email=nextBlockS', b'houldSt@rt.Here:', b'admin\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b', b'&uid=10&role=use', b'r']

Profile obtained:
{'email': 'eeeeeeeeeeeemail@attacker.com', 'uid': '10', 'role': 'admin'}
OK

Again, I will never repeat it enough: Encryption does not provide integrity/authenticity protection!!!