24

Challenge 24: Create the MT19937 stream cipher and break it¶

You can create a trivial stream cipher out of any PRNG; use it to generate a sequence of 8 bit outputs and call those outputs a keystream. XOR each byte of plaintext with each successive byte of keystream.

Write the function that does this for MT19937 using a 16-bit seed. Verify that you can encrypt and decrypt properly. This code should look similar to your CTR code.

Use your function to encrypt a known plaintext (say, 14 consecutive 'A' characters) prefixed by a random number of random characters.

From the ciphertext, recover the "key" (the 16 bit seed).

Use the same idea to generate a random "password reset token" using MT19937 seeded from the current time.

Write a function to check if any given password token is actually the product of an MT19937 PRNG seeded with the current time.

In [1]:

from math import log2

from libmatasano import MT19937_32

MT19937_32 outputs 32-bits numbers, and we need bytes (4 bits) to operate on. So we're going to split every 32-bit number into 4 bytes.

In [2]:

def mt19937_keystream_generator(key):
    assert log2(key) <= 16
    prng = MT19937_32(key)
    while True:
        rand_nb = next(prng)
        # 32 bits = 4 bytes
        yield from rand_nb.to_bytes(4, byteorder='big')

In [3]:

def transform_mt19937(msg, key):
    assert isinstance(key, int)
    keystream = mt19937_keystream_generator(key)
    
    return bytes([ x^y for (x,y) in zip(msg, keystream)])

In [4]:

key = 1234
ctxt = transform_mt19937(b'hello there!', key)
ctxt

Out[4]:

b'Yb\x07C\x10F\x96\xbb\xfa0\xe0\x07'

In [5]:

transform_mt19937(ctxt, key)

Out[5]:

b'hello there!'

Breaking the Cipher¶

When I first saw that we had to "crack the MT19937 stream cipher", I thought we were going to re-use the previous attack: encrypting known plaintext gives you access to the corresponding bytes of the keystream, and with enough keystream bytes you can clone the state with the technique from the previous challenge. Then, you are able to re-generate the keystream for all bytes after the known plaintext, so you can decrypt whatever was appended to the known plaintext.

Turns out, that's not what we are supposed to do here: there is no plaintext appended to our chosen plaintext (here it's prepended), and they ask us to recover the key. Also I was surprised to see that they suggest to generate just 14 bytes of known plaintext! That's way too little to do state cloning.

I was thus trying to see if even with a part of the state I could re-wind the recurrence relation and the initialization procedure up to the seed. And it wasn't easy...

But! It's actually far more simple than that: brute force. The number of possible keys is super small so you can just try them all. They even insist on it in the instructions: the cipher is using a 16-bits seed. It's kind of weird actually because from the specifications of MT19937 the seed seems to be 32 bits. Well even 32 bits should be small enough to crack, it would just take longer.

It's a bit disapointing to use brute force this far in the challenges...

In [6]:

import os
from random import randint

from libmatasano import html_test

MAX_SEED = (1<<16) - 1

key = randint(1, MAX_SEED)

prefix_size = randint(2,20)
prefix = os.urandom(prefix_size)

ctxt = transform_mt19937(prefix + b'A'*14, key)

for i in range(1, MAX_SEED):
    p = transform_mt19937(ctxt, i)
    if p.endswith(b'A'*14):
        result = i
        break
else:
    html_test(False)
    
html_test(result==key)

Breaking the "password reset token"¶

In [7]:

from time import time

In [8]:

def gen_token():
    seed = int(time()) & MAX_SEED
    pseudo_random_byte_gen = mt19937_keystream_generator(seed)
    token = bytes(next(pseudo_random_byte_gen) for _ in range(16))
    return token

In [9]:

token = gen_token()
token

Out[9]:

b'[\xd95(\xb3EO\x06^<3/\xfc\xe7\xa9\x1f'

In [10]:

def is_generated_with_mt19937(token):
    for i in range(1, MAX_SEED):
        pseudo_random_byte_gen = mt19937_keystream_generator(i)
        guess = bytes(next(pseudo_random_byte_gen) for _ in range(16))
        if guess == token:
            return True
    else:
        return False

In [11]:

assert is_generated_with_mt19937(token)
assert not is_generated_with_mt19937(os.urandom(16))
html_test(True)

A bit boring. Or is there something I'm missing ?