A pseudo-random
sequence of
numbers (as opposed to a random sequence) has some properties to it that are
not random. Randomness can be looked at in the following ways:
- Chi Squared: This test is the test used in looking at randomness of data. It measures the deviation of a sample from the expectation. In a purely random sample of data, it would be expected that the numbers be evenly random. If you roll a fair six sided dice 60 times, you would expect to get 10 of each number. If one gets extremely high values or extremely low values on the chi-squared test it means that it is skewed somewhere.
For the data run that I did, it was 204,800 bytes. One would expect to see approximately 800 of each valued byte. This was not the case. The lower 128 values had a mean of 999. The upper 128 had a mean of 600. This is clearly not random.
Chi square distribution for 204800 samples is 12954.37, and randomly would exceed this value 0.01 percent of the times.
The low 8 bits of rand() is very non-random:
Chi square distribution for 500000 samples is 0.01, and randomly would exceed this value 99.99 percent of the times.
A truly random source of numbers looks like (this example is from radioactive decay):
Chi square distribution for 32768 samples is 237.05, and randomly would exceed this value 75.00 percent of the times.
- Arithmetic mean: Take all the bytes in the file, sum them up, and divide it by the file length. Each byte can have a value between 0 and 255, thus the expected mean is '127.5'.
As you can probably guess, the mean of this entire value is on the low side:
Arithmetic mean value of data bytes is 111.5296 (127.5 = random).
- Monte Carlo value for Pi: This test takes every 24 bit as an X and Y location inside a square. The distance is then calculated from the center. If it is within the radius then it is counted as a "hit". The percentage of hits can then be used to calculate pi.
Monte Carlo value for Pi is 3.488002813 (error 11.03 percent).
For a truly random source generated by radioactive delay, a 32768 byte file approximates:
Monte Carlo value for Pi is 3.139648438 (error 0.06 percent).
- Serial Correlation Coefficient: This test measures the nature of each byte depending on the previous byte. For a random sequence, this value will be close to zero. A non-random sequence such as a text file will provide a number close to '0.5'. Bitmaps will approach 1.
Serial correlation coefficient is -0.052083 (totally uncorrelated = 0.0).
Running these tests for yourself can show that
rand() is not a true
random number generator, especially the count of each
byte and seeing that while the numbers
look random, they are not perfectly random, thus pseudo-random.
Source code used for the above tests follows:
#include <stdlib.h>
#include <stdio.h>
main()
{
int i,r;
srand(1);
for(i = 0; i < 512000; i++)
{
r = rand();
printf("%c%c%c%c",
(r & 0xff000000) >> 24,
(r & 0x00ff0000) >> 16,
(r & 0x0000ff00) >> 8,
r & 0x000000ff);
}
}
The program to generate these values and tests:
http://www.fourmilab.ch/random/