Last time we talked about rolling unbiased virtual dice; now let's turn to opening virtual booster packs in a collectible card game. If you're a nerd like me, you may be interested in reading about the specifics of how cards are distributed in Magic: The Gathering and Yu-Gi-Oh!, but our main concern here is to randomly select n items from an array of N options. I'm told that it's also useful in scientific computing or whatever.
Luckily, The Art of Computer Programming, Volume 2: Seminumerical Algorithms has got us covered with what Knuth unimaginatively calls Algorithm S. It works by iterating over the source array and, for each item, generating a random number to decide whether or not to include it in the sample:
#include <assert.h>   // assert
#include <stdint.h>   // uint8_t, uint32_t
#include <string.h>   // memcpy, size_t

// Advance a void pointer by S bytes
#define Inc(P, S) (P) = (void *)((uint8_t *)(P) + (S))

void rand_select(void * src, int N,
                 void * dst, int n,
                 size_t elem_size)
{
    assert(0 < n); assert(n <= N);
    while (n > 0) {
        // Select the current item with probability n / N, where n and N
        // count the items still needed / still available.
        uint32_t r = randr(N); // unbiased uniform integer in [0, N) (randr not shown here)
        if (r < n) {
            memcpy(dst, src, elem_size);
            Inc(dst, elem_size); n--;
        }
        Inc(src, elem_size); N--;
    }
}
In my opinion, this is fairly straightforward and intuitive; I've even been known to type it inline from memory. It has some problems, though, like the O(N) runtime and the fact that it needs a random number for every element of the source array.
We can do better, as shown by Jeffrey Scott Vitter in his paper Faster Methods for Random Sampling and the follow-up An Efficient Algorithm for Sequential Random Sampling. The main idea here is to generate a random number S, which is the gap between selected items, the crux being its probability distribution. He outlines a few different algorithms for doing this; the one he recommends for general use he calls Algorithm D. Thinking about this problem too hard seems to be really bad for the creative mind.
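To make the idea concrete, here's a minimal sketch (mine, not Vitter's optimized code) of what generating that gap means: S is the number of records to skip before the next selection, and we can draw it by walking its tail probabilities with a single uniform variate. This is essentially what his Algorithm A does, and you'll see the same loop inside the full implementation further down; skip_gap is just an illustrative name, and randf() is assumed to return a uniform double in [0, 1).
// Illustrative only: draw the gap S between selections by inverting its
// distribution with a linear search. Starting from P(S >= 1) = (N - n) / N,
// each step multiplies by the chance that the next record is also skipped.
static int skip_gap(int n, int N)
{
    double V = randf();                      // uniform in [0, 1)
    int S = 0;
    int top = N - n;                         // records we are still allowed to skip
    double quot = (double)top / (double)N;   // P(S >= 1)
    while (quot > V) {                       // while V falls inside the tail
        S++; top--; N--;
        quot *= (double)top / (double)N;     // P(S >= S + 1)
    }
    return S;                                // skip S records, then select the next one
}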
His benchmarks were made on an IBM 3081 and are over 30 years out of date now, so I figured I'd give this a spin myself. I tested his algorithms A, C, and D (both sub-variants), as well as the implementation of Algorithm S shown above. My benchmarks ran on a Lenovo ThinkPad running Windows 10, with an Intel i5-7200U and 8GB of RAM; the code was compiled with MSVC version 19.13.26131.1 on optimization level 2. Here's what I found:
- For small values of N (less than 2000 or so), Algorithm A is fastest, irrespective of the value of n.
- While he recommends the first variant of Algorithm D, the second one seems to be faster on modern machines, at least for the majority of inputs I tried.
- Algorithm C is blazing fast for very small n, but unreasonably slow for large n (so much so that I included it only in my first round of tests). I suspect that's because it's the only algorithm that runs in O(n*n), though my implementation could be buggy as well.
As a result, I recommend the second variant of Algorithm D, with an additional check to process small inputs with Algorithm A. Another optimization we can do is to check for n == N, in which case we can simply copy all remaining items. This is a relatively rare case, but it should accelerate sampling when n / N approaches 1, which all of the algorithms I tested handle poorly. Here's my implementation:
// Uses the Inc() macro and randr() from above; also needs <math.h> for log().
// Note that randf() assumes rand() hands back a full 32 random bits; the C
// library's rand(), with its much smaller RAND_MAX, won't do.
#define randf() ((double)rand() / (double)(1ULL << 32)) // uniform double in [0, 1)
#define AlphaInverse 12 // 1/alpha: Method D is used only while AlphaInverse * n < N
void rand_select(void * src, int N,
                 void * dst, int n,
                 size_t elem_size)
{
    assert(0 < n); assert(n <= N);
    if (N <= 2048) goto method_a; // small N: Method A wins (see benchmarks above)
    { // Method D:
        double V_prime = log(1.0 - randf());
        double q1 = N - n + 1;
        int threshold = AlphaInverse * n;
        while (n > 1 && threshold < N) {
            double q2 = (q1 - 1) / ((double)N - 1);
            double q3 = log(q2);
            int S; double LHS, y;
            do {
                // Generate U and X:
                for (;;) {
                    S = (int)(V_prime / q3);
                    if (S < q1) break;
                    V_prime = log(1.0 - randf());
                }
                LHS = log(1.0 - randf());
                // Accept S early?
                double RHS = S * (log((q1 - S) / (double)(N - S)) - q3);
                if (LHS <= RHS) {
                    V_prime = LHS - RHS; break;
                }
                // Accept S at all?
                y = 1.0;
                int bottom, limit;
                if (n - 1 > S) {
                    bottom = N - n; limit = N - S;
                } else {
                    bottom = N - S - 1; limit = (int)q1;
                }
                for (int top = N - 1; top >= limit; --top, --bottom) {
                    y *= (double)top / (double)bottom;
                }
                V_prime = log(1.0 - randf());
            } while (q3 > -(log(y) + LHS) / S);
            // Skip S records and select the next one
            Inc(src, elem_size * S);
            memcpy(dst, src, elem_size);
            Inc(src, elem_size);
            Inc(dst, elem_size);
            N -= S + 1; n -= 1; q1 -= S;
            threshold -= AlphaInverse;
        }
    }
    method_a: { // Method A:
        int top = N - n;
        while (1 < n && n < N) {
            double V = randf();
            int S = 0;
            double quot = (double)top / (double)N;
            while (quot > V) {
                S++; top--; N--;
                quot *= (double)top / (double)N;
            }
            // Skip S records and select the next one
            Inc(src, elem_size * S);
            memcpy(dst, src, elem_size);
            Inc(src, elem_size);
            Inc(dst, elem_size);
            N--; n--;
        }
    }
    // Fast special cases:
    if (n > 1) { // n == N ; do a fast copy of all remaining items
        memcpy(dst, src, elem_size * n);
    } else { // n == 1 ; S is uniform in [0, N)
        int S = randr(N);
        Inc(src, elem_size * S);
        memcpy(dst, src, elem_size);
    }
}
// Safe-ish macro that prevents wrong elem_size,
// with (void *) casts for C++ compatibility
#define RandSelect(Src, SrcCount, Dst, DstCount)  \
        rand_select((void *)(Src), (SrcCount),    \
                    (void *)(Dst), (DstCount),    \
                    sizeof(*(Src)))
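As a usage example, dealing a pack's worth of distinct cards from a set might look like the sketch below; the Card type, open_pack, and the counts are made up for illustration, and real boosters obviously care about rarities and such, which we're ignoring here.
// Hypothetical usage: pick 15 distinct cards from a 250-card set.
typedef struct { int id; } Card;   // stand-in card type for the example

void open_pack(void)
{
    Card all_cards[250];
    Card pack[15];
    for (int i = 0; i < 250; i++) all_cards[i].id = i; // stand-in for real card data
    RandSelect(all_cards, 250, pack, 15);
    // pack[] now holds 15 distinct cards, in the order they appear in all_cards
}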
Of course, this is much longer and much more complicated code, but it's also around 2 to 5 times faster for most inputs, and easily dozens of times faster when N is large and n / N approaches 0. In none of my test cases has it ever run more than a few percent slower than Algorithm S.
For more details on how this works, see Vitter's paper; I couldn't even hope to do a better job explaining it. Also, as always, use this code at your own risk; I didn't trust it myself when I first wrote it, and, while my initial tests didn't show an easily detectable bias, there may still be nasty bugs in it.
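If you want to sanity-check a sampler like this yourself, the sort of test I mean is nothing fancy: sample from an identity array a bunch of times and confirm that every index comes up roughly trials * n / N times. A rough sketch, with arbitrarily chosen sizes and the bias_check name made up:
#include <stdio.h>

// Rough bias check: with 10 picks out of 100, each index should be selected
// about TRIALS * 10 / 100 = 10000 times; large deviations hint at a bug in
// the sampler (or the underlying RNG).
#define TRIALS 100000

void bias_check(void)
{
    int src[100], dst[10];
    long counts[100] = {0};
    for (int i = 0; i < 100; i++) src[i] = i;
    for (int t = 0; t < TRIALS; t++) {
        RandSelect(src, 100, dst, 10);
        for (int i = 0; i < 10; i++) counts[dst[i]]++;
    }
    for (int i = 0; i < 100; i++)
        printf("%3d: %ld\n", i, counts[i]);
}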