Skip to content

Instantly share code, notes, and snippets.

@tommyettinger
Last active December 18, 2024 19:23
Show Gist options
  • Save tommyettinger/46a874533244883189143505d203312c to your computer and use it in GitHub Desktop.
Save tommyettinger/46a874533244883189143505d203312c to your computer and use it in GitHub Desktop.
Mulberry32 PRNG
/* Written in 2017 by Tommy Ettinger ([email protected])
To the extent possible under law, the author has dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.
See <http://creativecommons.org/publicdomain/zero/1.0/>. */
#include <stdint.h>
/* This is a 32-bit variant on the 63-bit Thrust PRNG, adapted for possible
usage in embedded hardware with 32-bit registers because TonyB on the prng
mailing list seemed like he could use this. Thrust is still in development,
mainly at another Gist ( https://gist.github.com/tommyettinger/e6d3e8816da79b45bfe582384c2fe14a ).
Mulberry uses a few multiplications compounded on the last result, much like
how a mulberry is a compound fruit, to help improve quality beyond what the
tiny state normally allows. It should have a period of 2 to the 32, exactly,
since the state changes to be each 32-bit unsigned integer before cycling.
The speed of this generator hasn't even been tested, and is probably less
than desirable on a 64-bit desktop processor, but it's meant for 32-bit
hardware. It passes gjrand's 13 tests with no failures and a total P-value
of 0.984 (where 1 is perfect and 0.1 or less is a failure) on 4GB of
generated data. That's a quarter of the full period. On the same amount, the
comparable SplitMix32 generator has a total P-value of 0, and had multiple
failures of extreme significance; these were two rank 1 minor failures but
also two more at rank 20 and rank 21, each an irredeemable failure on its
own. Testing on the full period (2 to the 32 numbers with 4 bytes each, for
a total of 16GB of data), Mulberry32 still does very well (for a generator
with 32 bits of state) with total P = 0.753 and all 13 tests passed. Since
SplitMix32 had failed beyond any hope of improvement at a small size, to
compare I tried SplitMix64, with 64 bits of state and generating 64 bits at
a time. On 4GB of data, SplitMix64's total P-value is 0.494, and it has two
rank 1 failures. On 16GB of data, SplitMix64's total P-value is 0.396, and
it has 1 rank 1 failure. To compare what is possible at that state size, a
variant on Thrust with 64 bits of state and 64 bit output passes 13 of 13
tests and gets a P-value of 0.91 when testing on 4GB of data, and a P-value
of 0.857 while still passing all tests on 16GB of data. Both SplitMix64 and
the 64-bit Thrust variant use fewer operations than Mulberry32 to obtain each
result (and that result is twice the size); if you can use a PRNG that works
with 64-bit math, you generally will benefit.
*/
uint32_t x; /* The state can be seeded with any value. */
/* Call next() to get 32 pseudo-random bits, call it again to get more bits. */
// It may help to make this inline, but you should see if it benefits your code.
uint32_t next(void) {
uint32_t z = (x += 0x6D2B79F5UL);
z = (z ^ (z >> 15)) * (z | 1UL);
z ^= z + (z ^ (z >> 7)) * (z | 61UL);
return z ^ (z >> 14);
}
/* The one large constant, 0x6D2B79F5UL, is tightly linked to the shift amounts
used later in the code. It is probably a bad idea to change the constant without
verifying that the quality is still OK in a robust testing suite like gjrand or
maybe TestU01's BigCrush (this may be unable to pass BigCrush simply because its
state size is too small). This constant was found by repeatedly editing the
constant used by Thrust in a 64-bit version (which in turn was found by shifting
the constant used for Thrust in its 63-bit version, and changing the lowest and
highest few bits), adjusting the shift amounts to match specific patterns that
could occur (or be edited into) the constant. That on its own didn't prove
enough, so a bitwise or with 61 was tried (along with other, smaller numbers; 29
was used in an earlier version), and in conjunction with the add-to-product-then-
xor step, that seems to do the trick and yield high quality for the state size.
The constant was changed between the first and second versions, but only a few
bits actually changed and the shifts were all fine where they were. One of the
bitwise OR values did change, which proved key to bringing the 16GB test to a
point where it has no failures.
*/
@tommyettinger
Copy link
Author

Good point, @oisyn , I've hit that issue myself when using GWT (which transpiles Java to JS, but that means a Java int becomes a JS Number). The | 0 works there as well.

@mattv8
Copy link

mattv8 commented Jan 26, 2024

Thank you for this, just what I needed. I've taken and adapted it to output an 8-bit integer that I'm using to generate a static random color generator. It's not perfect, but I simply needed a consistent random RGB value to generate a "static" color every time the browser refreshed. Here's my implementation, called in a one-liner like seededPRNG(42)(), returns a pseudo-random integer value between 0 and 255:

function seededPRNG(seed) {

            // Initialize my internal state (constants arbitrarily chosen performs better than a "normalized" one as you did, like Math.floor(seed * 0xFFFFFFF) + 0x6D2B79F5)
            let t = Math.floor(seed * 0x12BFFFFFED4) + 0x6D2B79F5;

            // Return a function that generates the next 8-bit pseudo-random value
            return () => (
                t = Math.imul(t ^ t >>> 15, t | 1),// XOR the current state with a right shift by 15 bits to add additional entropy
                t ^= t + Math.imul(t ^ t >>> 7, t | 61),// Perform an integer multiplication to add additional entropy
                (t ^ t >>> 14) & 0xFF// XOR the current state with a right shift by 14 bits to add additional, then bitwise AND with 0xFF to ensure an 8-bit value
            );

        }

I've run some tests on performance both visually and using the Chi-Square test. For 1000 unique seeds, I get a Chi-Square test statistic of 254.2848 and a corresponding p-value: 0.5009 comparing between the "ideal distribution" of 0-255 and the actual distribution from the simulation. Of course the P-value varies depending on the seed(s) used.

Here's my test script (and code): https://www.visnovsky.us/pub/prng-tests.html

@tommyettinger
Copy link
Author

@mattv8 I'll test this some, but even with this only returning the bottom 8 bits, you can be certain that the function you return won't be able to produce a sequence of bytes without cycling much sooner than it should. That's because you're transforming t with operations that aren't reversible; this was similar to the mistake I made when I first wrote mulberry32. The trivial case to show a problem is when t is initialized to 0, which can only happen if seed is in a specific range of floating-point numbers. If that happens, your function will always return 0, and never change out to a different state. The particular type of generator you have is comparable to Middle-Square, the first PRNG algorithm devised for a computer; both your function and middle-square generally "drop in" after some sequence into a smaller cycle, unless they start on a smaller cycle (which isn't good either!). It may take a while to cycle even if it is smaller (than 2 to the 32, the optimum). A seed of 1 drops into a smaller cycle with length 37956; you can check this by giving the initial value for t to be 166469239 (which is on the smaller cycle). While on the smaller cycle, it won't ever "jump back up" to the sequence that contains 1, and 1 will never be used again. This property is, for lack of a better term, just plain bad. Researchers who used Middle-Square had to figure out which sequences would run for longer before collapsing into a short cycle, which isn't at all needed on modern PRNG designs. The "normalized" one as I did ensures that at least every state will be passed into the (flawed) output section of the function. Because yours uses the same (flawed) output section as the state transition, the state transition becomes flawed as well. You should look at the skeeto/hash-prospector -derived designs from later in this (long) comment section; they should be faster or about as fast, and should run on a large-increment-counter/Weyl-sequence of some kind to ensure they have the maximum period.

@bryc
Copy link

bryc commented Jan 27, 2024

@mattv8 I believe the increment must be part of the return function, rather than executed only once when setting the seed.

Slightly modified your function construct to use splitmix32 with the constants found from https://github.com/skeeto/hash-prospector, that @tommyettinger mentioned. It's slightly faster and as I understand it, has a full period (it will exhaust all 32-bit numbers before it repeats, which does not occur with mulberry32). I assume the randomness statistics are improved as well with this.

function splitmix32(s) {
    return (t) => (
        s = s + 0x9e3779b9 | 0, // |0 is an optimization; prevents VM from leaving 32-bit mode via addition
        t = Math.imul(s ^ s >>> 16, 0x21f0aaad),
        t = Math.imul(t ^ t >>> 15, 0x735a2d97),
        (t = t ^ t >>> 15) & 0xFF // >>> 0 is better, can't you just: splitmix32() & 0xFF?
    );
}

@mattv8
Copy link

mattv8 commented Feb 5, 2024

@tommyettinger & @bryc thank you both for your insightful comments. I'm still trying to wrap my head around these PRNGs, and both of you have great suggestions, most notably how only returning the bottom 8 bits (truncating/masking) is going to give you a bad time. In my own testing, the shortcomings of my algorithm quickly becomes apparent when seeding integer values closer to 2^32 (because of the whole 2-to-the-32-generations period I'm assuming). I've since moved the masking outside of the function (e.g. splitmix32() & 0xFF).

At any rate, playing around with @bryc's version of splitmix32, the performance across the range of integers between 0 and 2^32 is better, about 30% to 60% faster than mine (4.3 ms vs 2.6 ms for 100000 "seeds", but exact time varies depending on run, see for yourself here).

Sorry I don't have more profound thoughts than this, other than my gratefulness for both of your inputs. I'll continue playing around with each algorithm on my site and report any interesting findings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment