The usual way to think about the problem is to consider the probability $\overline{p}$ of the opposite event we're interested in: that there is

*no*birthday"collision" at all within a group of $k$ people. This, we're said (without really explaining why), is easier to calculate, and it follows from it that we can retrieve the probability $p$ of our goal event (i.e. that there is

*at least one*"collision") as $1 - \overline{p}$.

The above-mentioned essay discusses at length this idea and the underlying principles, but it also casually challenges the reader with an interesting side problem: find a way to calculate $p$

*directly*, that is, without relying on the probability of the opposite event. Being a programmer, I decided to tackle this problem not from the principles, but from the opposite direction, by trying to derive understanding from "procedural tinkering".

In the first version of this article, I had derived a way of screening the $k$-permutations out of the cartesian power set $N^k$. I thought this was the answer, but someone on the Cryptography Class discussion forum helped me understand that this was actually only a rearrangement of the indirect computation (i.e. $p = 1 - \overline{p}$). The correct way to compute $p$ directly should rather involve the sum of:

- the chance of a collision occurring
*only*on the second choice - the chance of a collision occurring
*only*on the third choice - ...
- the chance of a collision occurring
*only*on the $k$-th choice

from __future__ import division from itertools import product n = 10 k = 5 prev_colls = set() p_coll = 0 def findPrevColl(p, j): for i in range(2, j): if p[:i] in prev_colls: return True return False for j in range(2, k+1): n_colls = 0 count = 0 for p in product(range(n), repeat=j): if len(set(p)) < j and not findPrevColl(p, j): n_colls += 1 prev_colls.add(p) count += 1 print 'at k = %d, P_k = %f' % (j, n_colls / count) p_coll += n_colls / count print 'sum(P_k) = %f' % p_coll # verify result with the "indirect" formula import operator from numpy.testing import assert_approx_equal assert_approx_equal(p_coll, 1 - reduce(operator.mul, range(n-k+1, n+1)) / n ** k) # at k = 2, P_k = 0.100000 # at k = 3, P_k = 0.180000 # at k = 4, P_k = 0.216000 # at k = 5, P_k = 0.201600 # sum(P_k) = 0.697600

`itertools`

module's product
function to generate the cartesian powers of $k$ (i.e. all possible
trials of length $k$) and the
`set`

data structure for easy past and current collision
detection. Once fully convinced that this was working, the obvious
next step was to derive an analytic formulation for it. By studying
some actual values from my program, I figured out that the probability
for a collision occurring *only*on the $k$-th choice should be: \[ P_{only}(n, k) = \frac{\left(\frac{n! \cdot (k-1)}{(n-k+1)!}\right)}{n^k}\] meaning that the total, direct probability of at least one collision is the sum: \[ P_{any}(n, k) = \sum_k{\frac{\left(\frac{n! \cdot (k-1)}{(n-k+1)!}\right)}{n^k}}\] As this wasn't still fully satisfying, because it doesn't yield an intuitive understanding of what's happening, the same helpful person on the Cryptography forum offered an equivalent, but much better rewrite: \[ P_{only}(n, k) = \left(\frac{n}{n} \cdot \frac{n-1}{n} \cdot \cdot \cdot \frac{n-k+2}{n}\right) \cdot \left(\frac{k-1}{n}\right)\] which can be easily understood by imagining a bucket of $n$ elements: the chance of a collision happening exactly at choice $k$ is the probability that there was

*no*collision with the first $k-1$ choices ($\frac{n}{n} \cdot \frac{n-1}{n} \cdot \cdot \cdot$, increasing as the bucket fills up) multiplied by the probability of a collision at $k$, which will happen $k-1$ times out of $n$, if we assume that the bucket is already filled with $k-1$ different values (by virtue of the no-previous-collision assumption).

Although the conclusion of my previous attempt was that it's often possible to derive mathematical understanding from "procedural tinkering" (i.e. from programming to math), I'm not so sure anymore, as this second version is definitely a counterexample.