Audio Based True Random Number Generator POC

Few days ago I came up with an idea to create a true random number generator based on noise gathered from a cheap microphone attached to my computer. Tests showed that when sampling the microphone, the least significant bit behaves pretty randomly. This lead me to think it might be good source for gathering entropy for a true random number generator.

The base design was to gather the noise from the microphone than apply a process that will make in more uniform and refine its randomness. After some design iterations I came up with a process based on applying a hash function to the noise. Each iteration involves filling block of the hash function from the least significant bits of the microphone output and applying the hash. Each iteration outputs the current hash digest. Assuming the hash function is uniform, this will output a uniformly distributed blocks of bits. Furthermore, because there the previous state of the hash function influences the next digest computation, the process accumulates entropy that can smooth out potentially less random blocks. Because for all commonly used hash function the block size is much larger than the digest size the output can tell much about the current state or any future or past state. This also holds true even if someone can find all pre-images of the hash function as the amount of possible states will be too big.

I’ve built a Python proof of concept (using md5 as a hash function) suitable for Linux.

import hashlib
import struct


class GRandom:
    def __init__(self):
        self.audio = open("/dev/dsp","rb")
        self.hash = hashlib.md5()
    
    def get_raw_block(self):
        buffer = self.audio.read(self.hash.block_size*8)
        bytes = struct.unpack("%iB"%len(buffer), buffer)

        longs = []
        for i in range(self.hash.block_size/4):
            temp = 0
            for b in bytes[i*32:(i+1)*32]:
                temp = (temp << 1) ^ (b & 1)
            longs.append(temp)
        return struct.pack("%iI"%len(longs), *longs)

    def get_block(self):
        self.hash.update(self.get_raw_block())
        return self.hash.digest()

The amount of generated bits per second is given by (sample rate)*(digest size)/(block size). So for 8KHz (default) sampling rate and md5 we’ll get a theoretical speed of 2000b/s. SHA type hashes have higher digest to block size ration thus may result in higher speeds. Another source of speed up may be to change the sample rate of the microphone. But setting it too high may have negative effects on the entropy. The code may get a considerable performance gain by porting it to c/c++, as it uses both bit manipulations and calculates hashes. Anyways, even the Python implementation’s speed allows us it be used for many cases where true randomness is required, such as generating passwords.

Update 2010-08-07 I’ve corrected the speed calculation.

3 thoughts on “Audio Based True Random Number Generator POC”

  1. I have the same idea few years back and use it for
    my Phd thesis. I have written a paper on it since
    2007.Below are the first two without using any hash
    function.
    Nur Azman Abu and Zulkiflee Muslim, Random Number Generation for Cryptographic Key, Proceedings International Conference on Engineering and ICT, ICEI 2007, 27–28 November 2007, Melaka, Malaysia, Volume 1, pp 255–260.
    Nur Azman Abu and Zulkiflee Muslim, Random Room Noise for Cryptographic Key, Proceedings IEEE International Conference on Digital Ecosystem and Technologies DEST2008, 27–29 February 2008, Phitsanulok, Thailand, pp381–387.

    If you are interested in my research I will be glad to
    share with you.

    Thank you

    Nur Azman

  2. Hi Nur,

    I’ve read the abstract of both articles and they sound very interesting. Especially the part on statistical analysis of the random room noise. It would be great if you could share with me your findings.

    The reason I chose to use a hash function on the raw noise is that the raw noise wasn’t evenly distributed enough. By using hash I can guarantee (to some extent) a good statistical behavior.

  3. Nice to see you reply promptly.
    Audio LSB must be at least 16-bit per sample.
    Image LSB must be at least 14-bit per sample.
    It does not matter the sampling frequency. In fact
    the higher is the better. Using hash function would just
    discredit the randomness of the “air ambience”.
    3 years ago I have tested 2^20 bit and they passed all NIST random tests. At the moment I am working on multiple megabit per second ambience random number
    generator.
    So far it looks very promising. I have written several papers on this topic. You can browse them will see some more soon on the net.

    Thank you

    Nur Azman

Leave a Reply

Your email address will not be published. Required fields are marked *