01-to-08

Set 1

Challenge 1 - Convert hex to base64

The string:

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

Should produce:

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

So go ahead and make that happen. You'll need to use this code for the rest of the exercises.

Cryptopals Rule

Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.

Base64 is a way of encoding any bytes into a printable string. Like hexadecimal encoding, but more efficient since one character in base64 encodes 6 bits ($ 2^6 = 64 $) when one character in hexadecimal encodes 4 bits ($ 2^4 = 16 $). For more information see Wikipedia:Base64.

What is nice with Python is that it has tools in the standard library to manage both hexadecimal and base64, so going from one to the other is easy.

In [1]:
hex_string = '49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d'
In [2]:
# I wrote a simple function that will display a nice "OK" or "ERROR"
# depending on the test given as input
from libmatasano import html_test
In [3]:
# https://docs.python.org/3/library/binascii.html#binascii.unhexlify
from binascii import hexlify, unhexlify
# https://docs.python.org/3/library/base64.html
from base64 import b64encode, b64decode

b64_string = b64encode(unhexlify(hex_string))

html_test(b64_string == b'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t')
OK

Easy ! Next challenge.

Challenge 2 - Fixed XOR

Write a function that takes two equal-length buffers and produces their XOR combination.

If your function works properly, then when you feed it the string:

1c0111001f010100061a024b53535009181c

... after hex decoding, and when XOR'd against:

686974207468652062756c6c277320657965

... should produce:

746865206b696420646f6e277420706c6179

"XOR" means the "Exclusive OR" operation: Wikipedia:Exclusive_or. Here we will "XOR" bitstrings together bit-by-bit ("bitwise XOR").

Python does have a builtin operator for bitwise XOR, it's the ^ operator, but it operates on integers.

In [4]:
# the builtin "bin" function give the binary representation of a number
# this makes the effect of bitwise XOR more seeable.
bin(123), bin(127), bin(123 ^ 127)
Out[4]:
('0b1111011', '0b1111111', '0b100')

We will mostly be working on bytes so we need the same function but that takes (and returns) bytes.

In [5]:
def bxor(a, b):
    "bitwise XOR of bytestrings"
    return bytes([ x^y for (x,y) in zip(a, b)])
In [6]:
A = unhexlify('1c0111001f010100061a024b53535009181c')
B = unhexlify('686974207468652062756c6c277320657965')
expected_result = unhexlify('746865206b696420646f6e277420706c6179')

html_test(bxor(A,B) == expected_result)
OK

Challenge 3 - Single-byte XOR cipher

The hex encoded string:

1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

... has been XOR'd against a single character. Find the key, decrypt the message.

You can do this by hand. But don't: write code to do it for you.

How? Devise some method for "scoring" a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

What they mean here is that they took one byte (one byte is 8 bits, like 01111011 for instance that represents the number 123), and XORed each byte of the message against this "key" byte.

An example:

In [7]:
# our message
msg = b'hi there!'
# let's pick a random number between 0 and 255 for our key; what about 77?
key = b'\x77'
# the byte string we will XOR the message against is usually called the "keystream"
keystream = key*len(msg)
# here is our "ciphertext"
bxor(msg, keystream)
Out[7]:
b'\x1f\x1eW\x03\x1f\x12\x05\x12V'

Note that this method of encryption is equivalent to the famous "caesar cipher" (see Wikipedia:Caesar_cipher) that is presented in every introduction to cryptography. It's ust that this time we are working with bytes instead of letters and that the operation is XOR instead of rotation in the alphabet (both operations are easily computable permutations in their respective domain, which is why it works)

Usually you hear about breaking the caesar cipher by analysing the frequency of letters in the ciphertext and matching it with the frequency of letters in the alphabet.

For the caesar cipher, and the similar one that we are attacking in this challenge, there is a simpler attack: try every single key.

This is because the number of possible keys is very small (256 possibilities in our case, 26 in the "classical" caesar cipher with letters, compared to $ 2^{128} ≈ 3.4 × 10^{38} $ for AES-128 that we will use later in these challenges).

Here is, then, our (very simple) strategy to recover the message: we decrypt the message using every possible key, and we see what message we get each time. The proper key should be the only one giving a meaningful message as output (the rest should give a meaningless mix of bytes).

We then need to know how to decrypt with this cipher given the key that was used for encryption (or simply the key that we want to try).

With the XOR operation, the decryption operation is simply the same as the encryption one. This is because of the following property of the XOR operation: XORing twice by the same value gives back the original value. That is, for all bytes (or bits, or bytestrings) $ A $ and $ B $, we have:

$$ ( A \oplus B ) \oplus B = A $$

Where $ \oplus $ denotes the XOR operation.

That is, to decrypt you just have to re-generate the keystream and XOR the ciphertext against it.

In [8]:
# this is the ciphertext we are supposed to work on
ciphertext = unhexlify('1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736')
In [9]:
# we said we will try every byte as the key, so let's start with 1
# (zero as a key would be uninteresting)
candidate_key = bytes([1])
keystream = candidate_key*len(ciphertext)
# that's our candidate keystream:
keystream
Out[9]:
b'\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01'
In [10]:
bxor(ciphertext, keystream)
Out[10]:
b'\x1a66207>y\x14\x1a~*y502<y8y)6,7=y6?y;8:67'

We can see that using 1 as the key does not give a meaningful message, meaning that the key is not 1. But we're not going to check manually every possible key (it goes up to 255), we need an automated test that tells us if the key is the good one or not.

The message is supposed to be english text, so it should have almost only letters.

The isalpha builtin in python would be interesting but it only detects if a bytestring contains only alphabetical characters (see https://docs.python.org/3/library/stdtypes.html#bytes.isalpha)

In [11]:
(b'lol').isalpha()
Out[11]:
True
In [12]:
# the spaces and the question tag are not "alphabetical characters"
(b'how are you?').isalpha()
Out[12]:
False

What we want is the ratio between letters and other symbols: It will be high for text and low otherwise. In ASCII encoding, letters a to z correspond to numbers 97 to 122. We can see that with the ord built-in function:

In [13]:
ord('a'), ord('b'), ord('z')
Out[13]:
(97, 98, 122)

The "space" character is also very frequent in text, might as well count it too. This will increase accuracy and will prove useful later (Challenge 6).

In [14]:
ord(' ')
Out[14]:
32
In [15]:
ascii_text_chars = list(range(97, 122)) + [32]
In [16]:
[ x in ascii_text_chars for x in b'how are you?']
Out[16]:
[True, True, True, True, True, True, True, True, True, True, True, False]
In [17]:
sum([ x in ascii_text_chars for x in b'how are you?'])
Out[17]:
11
In [18]:
def letter_ratio(input_bytes):
    nb_letters = sum([ x in ascii_text_chars for x in input_bytes])
    return nb_letters / len(input_bytes)

def is_probably_text(input_bytes):
    r = letter_ratio(input_bytes)
    return True if r>0.7 else False
In [19]:
is_probably_text(b'Hello, how are you?')
Out[19]:
True
In [20]:
is_probably_text(b'\x1e22643:}\x10\x1ez.}1468}<}-2(39}2;}?<>23')
Out[20]:
False
In [21]:
def attack_single_byte_xor(ciphertext):
    # a variable to keep track of the best candidate so far
    best = None
    for i in range(2**8): # for every possible key
        # converting the key from a number to a byte
        candidate_key = i.to_bytes(1, byteorder='big')
        keystream = candidate_key*len(ciphertext)
        candidate_message = bxor(ciphertext, keystream)
        nb_letters = sum([ x in ascii_text_chars for x in candidate_message])
        # if the obtained message has more letters than any other candidate before
        if best == None or nb_letters > best['nb_letters']:
            # store the current key and message as our best candidate so far
            best = {"message": candidate_message, 'nb_letters': nb_letters, 'key': candidate_key}
    return best

result = attack_single_byte_xor(ciphertext)

print('key:', result['key'])
print('message:', result['message'])
print('nb of letters:', result['nb_letters'])

html_test(is_probably_text(result['message']))
key: b'X'
message: b"Cooking MC's like a pound of bacon"
nb of letters: 30
OK

What an interesting message we have here...

Challenge 4 - Detect single-character XOR

One of the 60-character strings in this file has been encrypted by single-character XOR.

Find it.

(Your code from #3 should help.)

We modify our previous function so that it returns an error in case the ratio of letters and spaces is too low to be a properly decrypted message:

In [22]:
class InvalidMessageException(Exception):
    pass

def attack_single_byte_xor(ciphertext):
    best = {"nb_letters": 0}
    for i in range(2**8):
        candidate_key = i.to_bytes(1, byteorder='big')
        candidate_message = bxor(ciphertext, candidate_key*len(ciphertext))
        nb_letters = sum([ x in ascii_text_chars for x in candidate_message])
        if nb_letters>best['nb_letters']:
            best = {"message": candidate_message, 'nb_letters': nb_letters, 'key': candidate_key}
    
    if best['nb_letters'] > 0.7*len(ciphertext):
        return best
    else:
        raise InvalidMessageException('best candidate message is: %s' % best['message'])

Let's try if it works:

In [23]:
from os import urandom

try:
    # Instead of giving a real encrypted message we give random bytes
    attack_single_byte_xor(urandom(16))
except InvalidMessageException:
    print('Got an InvalidMessageException as expected')
else:
    print('No exception: something is wrong')
Got an InvalidMessageException as expected
In [24]:
with open('data/04.txt') as data_file:
    ciphertext_list = [
        # the 'strip' is to remove the "newline" character
        # which python keeps when reading a file line by line
        unhexlify(line.strip())
        for line in data_file
    ]
In [25]:
candidates = list()
# for the "enumerate" builtin function, see
# https://docs.python.org/3/library/functions.html#enumerate
for (line_nb, ciphertext) in enumerate(ciphertext_list):
    try:
        message = attack_single_byte_xor(ciphertext)['message']
    except InvalidMessageException:
        pass
    else:
        candidates.append({
            'line_nb': line_nb,
            'ciphertext': ciphertext,
            'message': message
        })
        
if len(candidates) > 1:
    print("Error: more than one candidate")
    html_test(false)
else:
    for (key, value) in candidates[0].items():
        print(f'{key}: {value}')
    html_test(is_probably_text(candidates[0]['message']))
line_nb: 170
ciphertext: b'{ZB\x15A]TA\x15A]P\x15ETGAL\x15\\F\x15_@XE\\[R?'
message: b'Now that the party is jumping\n'
OK

Challenge 5 - Implement repeating-key XOR

Here is the opening stanza of an important work of the English language:

Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal

Encrypt it, under the key "ICE", using repeating-key XOR.

In repeating-key XOR, you'll sequentially apply each byte of the key; the first byte of plaintext will be XOR'd against I, the next C, the next E, then I again for the 4th byte, and so on.

It should come out to:

0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272
a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f

Encrypt a bunch of stuff using your repeating-key XOR function. Encrypt your mail. Encrypt your password file. Your .sig file. Get a feel for it. I promise, we aren't wasting your time with this.

In [26]:
message = b"Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"
key = b'ICE'
# We have to repeat the key until the keystream is at least as long as the message
# our "bxor" function gives an output as long as the shortest input
# so the output will be as long as the message here
keystream = key*(len(message)//len(key) + 1)

ciphertext = bxor(message, keystream)
expected_result = unhexlify(
    b'0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d6'
    b'3343c2a26226324272765272a282b2f20430a652e2c652a3124'
    b'333a653e2b2027630c692b20283165286326302e27282f'
)

html_test(ciphertext == expected_result)
OK

Challenge 6 - Break repeating-key XOR

There's a file here. It's been base64'd after being encrypted with repeating-key XOR.

Decrypt it.

Here's how:

  • Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.

  • Write a function to compute the edit distance/Hamming distance between two strings.

  • For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.

  • The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.

  • Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.

  • Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.

  • Solve each block as if it was single-character XOR. You already have code to do this. For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.

Here is an illustration of what we are told to do:

repeating key illustration

Here is the idea:

  • First we have to find out the lenght of the key. This is the purpose of the steps using the "edit distance".

  • When we have the key length, we are able to group together ciphertext bytes that share the same key byte.

  • This result in "blocks" each being a "single-character XOR" which we know how to break since challenge 3.

The most complicated part is probably then the technique to find the key lenght. Why it works is not obvious (at least not to me?), but how it's done is given step by step so we can just follow these steps.

In [27]:
# the bitwise-xor of two strings will have ones
# only in places where the two strings have different bits
# hence, computing the hamming distance of the two strings
# is equivalent to counting the numbers of one (the "hamming weight") of their XOR sum
def hamming_distance(a, b):
    return sum(bin(byte).count('1') for byte in bxor(a,b))
In [28]:
with open("data/06.txt") as file:
        ciphertext = b64decode(file.read())
In [29]:
def score_vigenere_key_size(candidate_key_size, ciphertext):
    # as suggested in the instructions,
    # we take samples bigger than just one time the candidate key size
    slice_size = 2*candidate_key_size

    # the number of samples we can make
    # given the ciphertext length
    nb_measurements = len(ciphertext) // slice_size - 1

    # the "score" will represent how likely it is
    # that the current candidate key size is the good one
    # (the lower the score the *more* likely)
    score = 0
    for i in range(nb_measurements):

        s = slice_size
        k = candidate_key_size
        # in python, "slices" objects are what you put in square brackets
        # to access elements in lists and other iterable objects.
        # see https://docs.python.org/3/library/functions.html#slice
        # here we build the slices separately
        # just to have a cleaner, easier to read code
        slice_1 = slice(i*s, i*s + k)
        slice_2 = slice(i*s + k, i*s + 2*k)

        score += hamming_distance(ciphertext[slice_1], ciphertext[slice_2])

    # normalization: do not forget this
    # or there will be a strong biais towards long key sizes
    # and your code will not detect key size properly
    score /= candidate_key_size
    
    # some more normalization,
    # to make sure each candidate is evaluated in the same way
    score /= nb_measurements

    return score
In [30]:
def find_vigenere_key_length(ciphertext, min_length=2, max_length=30):
    # maybe this code is a bit over-sophisticated
    # it just outputs the key size for wich
    # the score at the "score_vigenere_key_size" function is the *lowest*
    key = lambda x: score_vigenere_key_size(x,ciphertext)
    return min(range(min_length, max_length), key=key)
In [31]:
def attack_repeating_key_xor(ciphertext):
    keysize = find_vigenere_key_length(ciphertext)

    # we break encryption for each character of the key
    key = bytes()
    message_parts = list()
    for i in range(keysize):
        # the "i::keysize" slice accesses elements in an array
        # starting at index 'i' and using a step of 'keysize'
        # this gives us a block of "single-character XOR" (see figure above)
        part = attack_single_byte_xor(bytes(ciphertext[i::keysize]))
        key += part["key"]
        message_parts.append(part["message"])

    # then we rebuild the original message
    # by putting bytes back in the proper order
    # TODO again code may be over-sophisticated and not very readable here
    message = bytes()
    for i in range(max(map(len, message_parts))):
        message += bytes([part[i] for part in message_parts if len(part)>=i+1])

    return {'message':message, 'key':key}
In [32]:
result = attack_repeating_key_xor(ciphertext)
print("key:",result["key"],'\n')
print('message:\n')
print(result["message"].decode())
key: b'Terminator X: Bring the noise' 

message:

I'm back and I'm ringin' the bell 
A rockin' on the mike while the fly girls yell 
In ecstasy in the back of me 
Well that's my DJ Deshay cuttin' all them Z's 
Hittin' hard and the girlies goin' crazy 
Vanilla's on the mike, man I'm not lazy. 

I'm lettin' my drug kick in 
It controls my mouth and I begin 
To just let it flow, let my concepts go 
My posse's to the side yellin', Go Vanilla Go! 

Smooth 'cause that's the way I will be 
And if you don't give a damn, then 
Why you starin' at me 
So get off 'cause I control the stage 
There's no dissin' allowed 
I'm in my own phase 
The girlies sa y they love me and that is ok 
And I can dance better than any kid n' play 

Stage 2 -- Yea the one ya' wanna listen to 
It's off my head so let the beat play through 
So I can funk it up and make it sound good 
1-2-3 Yo -- Knock on some wood 
For good luck, I like my rhymes atrocious 
Supercalafragilisticexpialidocious 
I'm an effect and that you can bet 
I can take a fly girl and make her wet. 

I'm like Samson -- Samson to Delilah 
There's no denyin', You can try to hang 
But you'll keep tryin' to get my style 
Over and over, practice makes perfect 
But not if you're a loafer. 

You'll get nowhere, no place, no time, no girls 
Soon -- Oh my God, homebody, you probably eat 
Spaghetti with a spoon! Come on and say it! 

VIP. Vanilla Ice yep, yep, I'm comin' hard like a rhino 
Intoxicating so you stagger like a wino 
So punks stop trying and girl stop cryin' 
Vanilla Ice is sellin' and you people are buyin' 
'Cause why the freaks are jockin' like Crazy Glue 
Movin' and groovin' trying to sing along 
All through the ghetto groovin' this here song 
Now you're amazed by the VIP posse. 

Steppin' so hard like a German Nazi 
Startled by the bases hittin' ground 
There's no trippin' on mine, I'm just gettin' down 
Sparkamatic, I'm hangin' tight like a fanatic 
You trapped me once and I thought that 
You might have it 
So step down and lend me your ear 
'89 in my time! You, '90 is my year. 

You're weakenin' fast, YO! and I can tell it 
Your body's gettin' hot, so, so I can smell it 
So don't be mad and don't be sad 
'Cause the lyrics belong to ICE, You can call me Dad 
You're pitchin' a fit, so step back and endure 
Let the witch doctor, Ice, do the dance to cure 
So come up close and don't be square 
You wanna battle me -- Anytime, anywhere 

You thought that I was weak, Boy, you're dead wrong 
So come on, everybody and sing this song 

Say -- Play that funky music Say, go white boy, go white boy go 
play that funky music Go white boy, go white boy, go 
Lay down and boogie and play that funky music till you die. 

Play that funky music Come on, Come on, let me hear 
Play that funky music white boy you say it, say it 
Play that funky music A little louder now 
Play that funky music, white boy Come on, Come on, Come on 
Play that funky music 

Challenge 7: AES in ECB mode

The Base64-encoded content in this file has been encrypted via AES-128 in ECB mode under the key

"YELLOW SUBMARINE".

(case-sensitive, without the quotes; exactly 16 characters; I like "YELLOW SUBMARINE" because it's exactly 16 bytes long, and now you do too).

Decrypt it. You know the key, after all.

Easiest way: use OpenSSL::Cipher and give it AES-128-ECB as the cipher.

We will use the excellent cryptography Python library (https://cryptography.io/).

The documentation for AES in ECB mode is there:

https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/

ECB mode is the most straightforward way to encrypt a long message with a block cipher like AES. It simply consists in splitting the message in parts that have exactly the block lenght, and encrypting each part with the key.

However, and as we are going to see it soon in the next challenges, ECB is completely insecure!.

In [33]:
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
backend = default_backend()

def decrypt_aes_128_ecb(ctxt, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    decryptor = cipher.decryptor()
    decrypted_data =  decryptor.update(ctxt) + decryptor.finalize()
    # would need some padding stripping actually (we'll see padding later)
    message = decrypted_data
    return message
In [34]:
with open("data/07.txt") as file:
    data = file.read()
    
print(decrypt_aes_128_ecb(
        ctxt = b64decode(data),
        key=b"YELLOW SUBMARINE"
    ).decode())
I'm back and I'm ringin' the bell 
A rockin' on the mike while the fly girls yell 
In ecstasy in the back of me 
Well that's my DJ Deshay cuttin' all them Z's 
Hittin' hard and the girlies goin' crazy 
Vanilla's on the mike, man I'm not lazy. 

I'm lettin' my drug kick in 
It controls my mouth and I begin 
To just let it flow, let my concepts go 
My posse's to the side yellin', Go Vanilla Go! 

Smooth 'cause that's the way I will be 
And if you don't give a damn, then 
Why you starin' at me 
So get off 'cause I control the stage 
There's no dissin' allowed 
I'm in my own phase 
The girlies sa y they love me and that is ok 
And I can dance better than any kid n' play 

Stage 2 -- Yea the one ya' wanna listen to 
It's off my head so let the beat play through 
So I can funk it up and make it sound good 
1-2-3 Yo -- Knock on some wood 
For good luck, I like my rhymes atrocious 
Supercalafragilisticexpialidocious 
I'm an effect and that you can bet 
I can take a fly girl and make her wet. 

I'm like Samson -- Samson to Delilah 
There's no denyin', You can try to hang 
But you'll keep tryin' to get my style 
Over and over, practice makes perfect 
But not if you're a loafer. 

You'll get nowhere, no place, no time, no girls 
Soon -- Oh my God, homebody, you probably eat 
Spaghetti with a spoon! Come on and say it! 

VIP. Vanilla Ice yep, yep, I'm comin' hard like a rhino 
Intoxicating so you stagger like a wino 
So punks stop trying and girl stop cryin' 
Vanilla Ice is sellin' and you people are buyin' 
'Cause why the freaks are jockin' like Crazy Glue 
Movin' and groovin' trying to sing along 
All through the ghetto groovin' this here song 
Now you're amazed by the VIP posse. 

Steppin' so hard like a German Nazi 
Startled by the bases hittin' ground 
There's no trippin' on mine, I'm just gettin' down 
Sparkamatic, I'm hangin' tight like a fanatic 
You trapped me once and I thought that 
You might have it 
So step down and lend me your ear 
'89 in my time! You, '90 is my year. 

You're weakenin' fast, YO! and I can tell it 
Your body's gettin' hot, so, so I can smell it 
So don't be mad and don't be sad 
'Cause the lyrics belong to ICE, You can call me Dad 
You're pitchin' a fit, so step back and endure 
Let the witch doctor, Ice, do the dance to cure 
So come up close and don't be square 
You wanna battle me -- Anytime, anywhere 

You thought that I was weak, Boy, you're dead wrong 
So come on, everybody and sing this song 

Say -- Play that funky music Say, go white boy, go white boy go 
play that funky music Go white boy, go white boy, go 
Lay down and boogie and play that funky music till you die. 

Play that funky music Come on, Come on, let me hear 
Play that funky music white boy you say it, say it 
Play that funky music A little louder now 
Play that funky music, white boy Come on, Come on, Come on 
Play that funky music 


Challenge 8 - Detect AES in ECB mode

In this file are a bunch of hex-encoded ciphertexts.

One of them has been encrypted with ECB.

Detect it.

Remember that the problem with ECB is that it is stateless and deterministic; the same 16 byte plaintext block will always produce the same 16 byte ciphertext.

Remember: ECB mode splits the message in blocks of 16 bytes (the block size of the cipher we are using) and encrypts each block using the same key. And AES is a deterministic process: running AES with the same key and same message always gives the same output.

Thus, if there are two identical blocks in the message, there will be two identical blocks in the ciphertext.

In some cases, this suffices for an adversary to recover the message directly from the ciphertext. One famous example is this picture of the Linux logo (a penguin) encrypted with ECB mode. Here it is from Wikipedia:

ECB-encrypted image

The long series of white pixels in the background causes a somewhat uniform pattern on the encrypted picture that reveals the silhouette of the penguin.

This is quite an extreme case, and in many situations it's not because the message is encrypted using ECB that you will be able to learn its content instantly just by looking at it. It should be enough, however, to convince you that ECB is insecure as hell: this kind of weakness is not what we expect from a strong encryption mechanism.

But the true purpose of this challenge is to prepare us to mess with ECB encryption to be able to recover any message encrypted under ECB, provided that we have a way to obtain encryptions of messages of our choice.

In [35]:
with open('data/08.txt') as f:
    ctxts = [unhexlify(line.strip()) for line in f]
In [36]:
def has_repeated_blocks(ctxt, blocksize=16):
    '''blocksize is in bytes'''
    if len(ctxt) % blocksize != 0:
        raise Exception('ciphertext length is not a multiple of block size')
    else:
        num_blocks = len(ctxt) // blocksize
        
    blocks = [ctxt[i*blocksize:(i+1)*blocksize] for i in range(num_blocks)]
    
    if len(set(blocks)) != num_blocks:
        return True
    else:
        return False
In [37]:
hits = [ctxt for ctxt in ctxts if has_repeated_blocks(ctxt)]
hits
Out[37]:
[b'\xd8\x80a\x97@\xa8\xa1\x9bx@\xa8\xa3\x1c\x81\n=\x08d\x9a\xf7\r\xc0oO\xd5\xd2\xd6\x9ctL\xd2\x83\xe2\xdd\x05/kd\x1d\xbf\x9d\x11\xb04\x85B\xbbW\x08d\x9a\xf7\r\xc0oO\xd5\xd2\xd6\x9ctL\xd2\x83\x94u\xc9\xdf\xdb\xc1\xd4e\x97\x94\x9d\x9c~\x82\xbfZ\x08d\x9a\xf7\r\xc0oO\xd5\xd2\xd6\x9ctL\xd2\x83\x97\xa9>\xab\x8dj\xec\xd5fH\x91Tx\x9ak\x03\x08d\x9a\xf7\r\xc0oO\xd5\xd2\xd6\x9ctL\xd2\x83\xd4\x03\x18\x0c\x98\xc8\xf6\xdb\x1f*?\x9c@@\xde\xb0\xabQ\xb2\x993\xf2\xc1#\xc5\x83\x86\xb0o\xba\x18j']
In [38]:
html_test(len(hits)==1)
OK