
I am slightly embarrassed to admit that I was caught out by the birthday paradox yesterday. The encounter went as follows:

While testing a new random number generator by analysing a list of 1000 generated standard normally distributed random numbers, I discovered that the list contained one of the numbers twice!!! This is suspicious, because this event has probability 0 in theory. After an (unsuccessful) hunt for bugs in my program, I finally found the following explanation.

The program prints the numbers using a C command like

```
printf("%f\n", normal(0, 1));
```

By default, the `%f` format string outputs numbers with six digits after the decimal point:

```
-0.641062
1.116142
1.417036
0.337435
-0.310383
...
```

Most of the numbers will lie between -2 and 2. Printed with six decimals, this interval of length 4 contains only about 4 million distinct values, so the output is concentrated in a set of roughly that size. A quick check reveals that 1000 independent uniform draws out of a set of this size contain a number twice with a probability of more than 10%, and for the normal distribution the probability will be even higher because the numbers are more concentrated around 0. Thus, seeing a number twice is something which will actually happen from time to time and is no indication that the program is malfunctioning!

This is an excerpt from Jochen's blog.