Although it may look simple at first sight to give a definition
of what a random number is, it proves to be quite difficult
in practice.
A random number is a number generated by a process, whose outcome
is unpredictable, and which cannot be subsequentially reliably
reproduced. This definition works fine provided that one has
some kind of a black box – such a black box is usually
called a random number generator – that fulfills this
task.
However, if one were to be given a number, it is simply impossible
to verify whether it was produced by a random number generator
or not. In order to study the randomness of the output of such
a generator, it is hence absolutely essential to consider sequences
of numbers.
It is quite straightforward to define whether a sequence of
infinite length is random or not. This sequence is random if
the quantity of information it contains – in the sense
of Shannon's information theory – is also infinite. In
other words, it must not be possible for a computer program,
whose length is finite, to produce this sequence. Interestingly,
an infinite random sequence contains all possible finite sequences.
Such an infinite sequence does for example contain the Microsoft
Windows source code or the text of the Geneva conventions. Unfortunately,
this definition is not very useful, as it is not possible in
practice to produce and process infinite sequences.
In the case of a finite sequence of numbers, it is formally
impossible to verify whether it is random or not. It is only
possible to check that it shares the statistical properties
of a random sequence – like the equiprobability of all
numbers – but this a difficult and tricky task. To illustrate
this, let us for example consider a binary random number generator
producing sequences of ten bits. Although it is exactly as likely
as any other ten bits sequences, 1 1 1 1 1 1 1 1 1 1 does look
less random than 0 1 1 0 1 0 1 0 0 0.
In order to cope with this difficulty, definitions have been
proposed to characterize "practical" random number
sequences. According to Knuth [1], a sequence of random numbers
is a sequence of independent numbers with a specified distribution
and a specified probability of falling in any given range of
values. For Schneier [2], it is a sequence that has the same
statistical properties as random bits, is unpredictable and
cannot be reliably reproduced. A concept that is present in
both of these definition and that must be emphasized is the
fact that numbers in a random sequence must not be correlated.
Knowing one of the numbers of a sequence must not help predicting
the other ones. Whenever random numbers are mentioned in the
rest of this paper, it will be assumed that they fulfill these
"practical" definitions.
Testing randomness
Statistical randomness tests aim at determining whether a particular
sequence of numbers was produced by a random number generator.
The approach is to calculate certain statistical quantities
and compare them with average values that would be obtained
in the case of a random sequence. These average values are obtained
from calculations performed on the model of an ideal random
number generator. Testing randomness is an empirical task. There
exists numerous tests, each one of them revealing a particular
type of imperfection in a sequence.
One example is the frequency test. In the case of binary sequences,
it focuses on the relative frequency of 1's with respect to
0's. The autocorrelation test, which investigates correlations
between adjacent bits, is another example. A good reference
on randomness testing can be found in [1]. Maurer demonstrated
that all tests can be derived from trying to compress a sequence
[3]. If a given sequence can be compressed, then it is not random.
When put into the perspective of the definition given above
for an infinite random sequence, this observation is natural.
Because of the difficulty of defining what a random number is,
it is essential to choose an adequate generator to produce these
numbers. Moreover, it is safer to have a good understanding
of the underlying randomness generating process.
[1] Knuth, D., The Art of Computer Programming,
Vol. 2, 2nd ed., Addison-Wesley, Reading, (1981).
[2] Schneier, B., Applied Cryptography: Protocols, Algorithms,
and Source Code in C, John Wiley & Sons, New York, (1996).
[3] Maurer, U., "A universal statistical test for random
bits generators", Journal of Cryptology, 5, 89-106 (1992).