Last time we talked about rolling unbiased virtual dice; now let's turn to opening virtual booster packs in a collectible card game. If you're a nerd like me, you may be interested in reading about the specifics of how cards are distributed in Magic: The Gathering and Yu-Gi-Oh!, but our main concern here is to randomly select n items from an array of N options. I'm told that it's also useful in scientific computing or whatever.
Luckily, The Art of Computer Programming, Volume 2: Seminumerical Algorithms has got us covered with what Knuth unimaginatively calls Algorithm S. It works by iterating over the source array and, for each item, generating a random number to decide whether or not to include it in the sample:
#include <assert.h>   // assert
#include <stdint.h>   // uint8_t, uint32_t
#include <string.h>   // memcpy, size_t

// Advance a void pointer by S bytes
#define Inc(P, S) (P) = (void *)((uint8_t *)(P) + (S))

void rand_select(void * src, int N,
                 void * dst, int n,
                 size_t elem_size)
{
    assert(0 < n); assert(n <= N);
    while (n > 0) {
        // Select the current item with probability n / N, where n and N
        // count the items still needed / still available.
        uint32_t r = randr(N); // unbiased uniform integer in [0, N) (randr not shown here)
        if (r < n) {
            memcpy(dst, src, elem_size);
            Inc(dst, elem_size); n--;
        }
        Inc(src, elem_size); N--;
    }
}
In my opinion, this is fairly straightforward and intuitive; I've even been known to type it inline from memory. It has some problems, though, like the O(N) runtime and the fact that it needs a random number for every element of the source array.
We can do better, as shown by Jeffrey Scott Vitter in his paper Faster Methods for Random Sampling and the follow-up An Efficient Algorithm for Sequential Random Sampling. The main idea here is to generate a random number S, which is the gap between selected items, the crux being its probability distribution. He outlines a few different algorithms for doing this; the one he recommends for general use he calls Algorithm D. Thinking about this problem too hard seems to be really bad for the creative mind.
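To make the idea concrete, here's a minimal sketch (mine, not Vitter's optimized code) of what generating that gap means: S is the number of records to skip before the next selection, and we can draw it by walking its tail probabilities with a single uniform variate. This is essentially what his Algorithm A does, and you'll see the same loop inside the full implementation further down; skip_gap is just an illustrative name, and randf() is assumed to return a uniform double in [0, 1).
// Illustrative only: draw the gap S between selections by inverting its
// distribution with a linear search. Starting from P(S >= 1) = (N - n) / N,
// each step multiplies by the chance that the next record is also skipped.
static int skip_gap(int n, int N)
{
    double V = randf();                      // uniform in [0, 1)
    int S = 0;
    int top = N - n;                         // records we are still allowed to skip
    double quot = (double)top / (double)N;   // P(S >= 1)
    while (quot > V) {                       // while V falls inside the tail
        S++; top--; N--;
        quot *= (double)top / (double)N;     // P(S >= S + 1)
    }
    return S;                                // skip S records, then select the next one
}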
His benchmarks were made on an IBM 3081 and are over 30 years out of date now, so I figured I'd give this a spin myself. I tested his algorithms A, C, and D (both sub-variants), as well as the implementation of Algorithm S shown above. My benchmarks ran on a Lenovo ThinkPad running Windows 10, with an Intel i5-7200U and 8GB of RAM; the code was compiled with MSVC version 19.13.26131.1 on optimization level 2. Here's what I found:
- For small values of N (less than 2000 or so), Algorithm A is fastest, irrespective of the value of n.
- While he recommends the first variant of Algorithm D, the second one seems to be faster on modern machines, at least for the majority of inputs I tried.
- Algorithm C is blazing fast for very small n, but unreasonably slow for large n (so much so that I included it only in my first round of tests). I suspect that's because it's the only algorithm that runs in O(n*n), though my implementation could be buggy as well.
As a result, I recommend the second variant of Algorithm D, with an additional check to process small inputs with Algorithm A. Another optimization we can do is to check for n == N, in which case we can simply copy all remaining items. This is a relatively rare case, but it should accelerate sampling when n / N approaches 1, which all of the algorithms I tested handle poorly. Here's my implementation:
// Uses the Inc() macro and randr() from above; also needs <math.h> for log().
// Note that randf() assumes rand() hands back a full 32 random bits; the C
// library's rand(), with its much smaller RAND_MAX, won't do.
#define randf() ((double)rand() / (double)(1ULL << 32)) // uniform double in [0, 1)
#define AlphaInverse 12 // 1/alpha: Method D is used only while AlphaInverse * n < N
void rand_select(void * src, int N,
                 void * dst, int n,
                 size_t elem_size)
{
    assert(0 < n); assert(n <= N);
    if (N <= 2048) goto method_a; // small N: Method A wins (see benchmarks above)
    { // Method D:
        double V_prime = log(1.0 - randf());
        double q1 = N - n + 1;
        int threshold = AlphaInverse * n;
        while (n > 1 && threshold < N) {
            double q2 = (q1 - 1) / ((double)N - 1);
            double q3 = log(q2);
            int S; double LHS, y;
            do {
                // Generate U and X:
                for (;;) {
                    S = (int)(V_prime / q3);
                    if (S < q1) break;
                    V_prime = log(1.0 - randf());
                }
                LHS = log(1.0 - randf());
                // Accept S early?
                double RHS = S * (log((q1 - S) / (double)(N - S)) - q3);
                if (LHS <= RHS) {
                    V_prime = LHS - RHS; break;
                }
                // Accept S at all?
                y = 1.0;
                int bottom, limit;
                if (n - 1 > S) {
                    bottom = N - n; limit = N - S;
                } else {
                    bottom = N - S - 1; limit = (int)q1;
                }
                for (int top = N - 1; top >= limit; --top, --bottom) {
                    y *= (double)top / (double)bottom;
                }
                V_prime = log(1.0 - randf());
            } while (q3 > -(log(y) + LHS) / S);
            // Skip S records and select the next one
            Inc(src, elem_size * S);
            memcpy(dst, src, elem_size);
            Inc(src, elem_size);
            Inc(dst, elem_size);
            N -= S + 1; n -= 1; q1 -= S;
            threshold -= AlphaInverse;
        }
    }
    method_a: { // Method A:
        int top = N - n;
        while (1 < n && n < N) {
            double V = randf();
            int S = 0;
            double quot = (double)top / (double)N;
            while (quot > V) {
                S++; top--; N--;
                quot *= (double)top / (double)N;
            }
            // Skip S records and select the next one
            Inc(src, elem_size * S);
            memcpy(dst, src, elem_size);
            Inc(src, elem_size);
            Inc(dst, elem_size);
            N--; n--;
        }
    }
    // Fast special cases:
    if (n > 1) { // n == N ; do a fast copy of all remaining items
        memcpy(dst, src, elem_size * n);
    } else { // n == 1 ; S is uniform in [0, N)
        int S = randr(N);
        Inc(src, elem_size * S);
        memcpy(dst, src, elem_size);
    }
}
// Safe-ish macro that prevents wrong elem_size,
// with (void *) casts for C++ compatibility
#define RandSelect(Src, SrcCount, Dst, DstCount)  \
        rand_select((void *)(Src), (SrcCount),    \
                    (void *)(Dst), (DstCount),    \
                    sizeof(*(Src)))
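As a usage example, dealing a pack's worth of distinct cards from a set might look like the sketch below; the Card type, open_pack, and the counts are made up for illustration, and real boosters obviously care about rarities and such, which we're ignoring here.
// Hypothetical usage: pick 15 distinct cards from a 250-card set.
typedef struct { int id; } Card;   // stand-in card type for the example

void open_pack(void)
{
    Card all_cards[250];
    Card pack[15];
    for (int i = 0; i < 250; i++) all_cards[i].id = i; // stand-in for real card data
    RandSelect(all_cards, 250, pack, 15);
    // pack[] now holds 15 distinct cards, in the order they appear in all_cards
}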
Of course, this is much longer and much more complicated code, but it's also around 2 to 5 times faster for most inputs, and easily dozens of times faster when N is large and n / N approaches 0. In none of my test cases has it ever run more than a few percent slower than Algorithm S.
For more details on how this works, see Vitter's paper; I couldn't even hope to do a better job explaining it. Also, as always, use this code at your own risk; I didn't trust it myself when I first wrote it, and, while my initial tests didn't show an easily detectable bias, there may still be nasty bugs in it.
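If you want to sanity-check a sampler like this yourself, the sort of test I mean is nothing fancy: sample from an identity array a bunch of times and confirm that every index comes up roughly trials * n / N times. A rough sketch, with arbitrarily chosen sizes and the bias_check name made up:
#include <stdio.h>

// Rough bias check: with 10 picks out of 100, each index should be selected
// about TRIALS * 10 / 100 = 10000 times; large deviations hint at a bug in
// the sampler (or the underlying RNG).
#define TRIALS 100000

void bias_check(void)
{
    int src[100], dst[10];
    long counts[100] = {0};
    for (int i = 0; i < 100; i++) src[i] = i;
    for (int t = 0; t < TRIALS; t++) {
        RandSelect(src, 100, dst, 10);
        for (int i = 0; i < 10; i++) counts[dst[i]]++;
    }
    for (int i = 0; i < 100; i++)
        printf("%3d: %ld\n", i, counts[i]);
}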