Blog Posting Early-Access Alpha: Braid Particle System Revision (part 1)

Jonathan Blow

#7516

July 7, 2016

This is a rough draft. I figured I would circulate it here to see whether everything makes sense, whether people feel like a particular section could use further explanation, etc.

------------------------------------------------------------------------------------
The game Braid originally shipped to the world in 2008, and after some ports in 2009, I have not done much with the code. But I want to maintain this game indefinitely into the future; in the back of my mind, there have always been some clean-ups that I have wanted to perform on the code. Often when shipping a game, the best answer to a problem isn't evident, and we are under time pressure, so we do something sufficient but sub-optimal. Other times, we are under technical constraints due to systems we wanted to deploy on at a particular time, and as the years go on, these systems become no longer relevant, so the code can be cleaned up. I figured it would be interesting to talk about some of these things in a blog (and the blog will help motivate me to think about these situations and clean some of them up!)

So, let's talk about particle systems and random number generators.

On the night of July 4th, instead of watching patriotic USA fireworks, I dug out the Braid code and replaced the random number generator. The old generator was the LCG used in BCPL, but when generating particles in some cases, I had needed to add some ugly workarounds to get good-looking results. These results are not necessarily the BCPL LCG's fault; it may just have to do with the way I was using it. But I was also motivated to experiment with PCG, a newly-invented random number generator that sounds really good. So I chose that as my first approach.

----------

Like many games, Braid has particle sources, where the properties of each particle (position, velocity, size, rate of change of size, initial color, rate of change of color, etc) are produced by a pseudorandom number generator. Most games just generate these properties once when a particle is created, then simulate the particle forward in time once per frame, which is computationally inexpensive. But Braid needs particles to be able to simulate forward and backward in time; and it's not enough to reverse-simulate any individual particle backward, because if a particular particle has been destroyed (due to being of old age and just having faded away), when you rewind you want to see that exact particle reappear, fade in, and life its life backwards with exactly the same parameters. You could do this using a classical particle system and storing the state of every particle, but that would become expensive very quickly. So the problem statement for Braid was, given the settings for a particular particle source, to be able to generate the state of the particles spawned by that source at any given time t, in a way that is relatively efficient with both computation and storage.

I solve this problem by re-seeding the random number generator for every particle every frame -- essentially undergoing the creation process for every particle every time it is rendered, and then simulating that particle to the current time. As long as the particle source knows exactly which particles will be active at a given time (which requires a constant emission rate, a fixed maximum lifetime, and other such constraints), then we don't need to store the state of any of the particles. As long as the logic to evolve a particle through time is a simple closed-form function or other small piece of relatively straightforward code, that's not too expensive computationally either (much cheaper than simulating forward N fixed timesteps from the starting state of the particle, where N can easily be in the hundreds or thousands).

In Braid, the code for rendering particles is something like this:

    for each particle source in the current level {
        for each particle owned by this one source {
            seed a random number generator by this particle's index
                (multiplied by some large number tom ake the seed seem more random).
            generate position, size, color, rotation, etc of this particle at spawn time.
            simulate particle values forward to current time.
            store renderable values into output buffer.
        }
    }

The reason you have to seed for each particle, rather than only once per source, is that the range of starting and ending particle indices will be changing every frame, so there isn't any sensible starting point in time for your seed to designate. But by seeding for each particle, you suddenly don't care about time any more.

To my surprise, when I first implemented this, particles looked bad. Correlations were visible between different particle attributes: for example, the X and Y components of each particle's velocity were correlated, which biased particles to travel in some directions but not others; particle size was correlated in weird ways with speed, and so on.

It was not too surprising to me that seeding for every particle made the numbers less random, but I had not expected the decreased randomness to be so visually objectionable. Of course I tried several of the obvious solutions, like only using the upper bits of the LCG's output, but they didn't seem to matter much.

I did not know a lot about the practical usage of random number generators (I still don't!), but I needed to fix this problem, so I'll now explain the solution I arrived at.

---------
Video: Braid particle systems with undesired correlations.
https://dl.dropboxusercontent.com/u/2970717/braid_unfixed.mp4

Video: Braid particle systems after fixes have been applied.
https://dl.dropboxusercontent.com/u/2970717/braid_fixed.mp4
---------

It seemed clear that seeding for each particle was causing the problem. This would be very unsurprising if I were just using each particle's index as the seed (since these indices are small integers in sequence), but I had been multiplying the index by some large number in an attempt to make it seem more random. Apparently this had not been enough. So, what if I generated some table of much more-random numbers at startup, using a higher-quality random-generation scheme that is too slow to use at runtime, and then I indexed that table per-particle to retrieve a high-quality random number that I would use as an ingredient in the seed for each particle? As mentioned, I didn't know too much about random number generation, but I knew the Mersenne Twister was supposed to be "high-quality but slow", so I used that. The code is shown below. (Note: Today the Mersenne Twister is seen as having a few drawbacks, so this is not a recommendation; it is just what I did then. Also, I am not claiming that what I did was a good solution; it is interesting, though, in that it solved the problem in a shipping game that's been played by a million or two people.)

Here's how I set up the table of higher-quality random numbers:

    const int NUM_NUMBERS = 127;
    globals.particle_randomization_table_size = NUM_NUMBERS;
    globals.particle_randomization_table = new unsigned long[NUM_NUMBERS];

    Random_Generator_Mersenne mersenne;

    // (This is me writing this comment in 2016):
    // The seed number below is a traditional way of initializing the Mersenne Twister, 
    // but M.E. O'Neill makes the case that you can't well-seed this RNG with only 32 bits:
    //           http://www.pcg-random.org/posts/cpp-seeding-surprises.html
    // I chose a fixed seed so that successive runs of the game are identical.
    // Thus I could have just replaced this init code with an array literal containing
    // pre-generated output numbers, but I think I figured the impact on startup time was small
    // and that I preferred the flexibility of keeping this code live so I could change it at will.
    
    mersenne.seed(19650218UL);

    for (int i = 0; i < NUM_NUMBERS; i++) {
        globals.particle_randomization_table[i] = mersenne.get();
    }

When computing the particles during each frame, the table was used in this way:

        int randomization_cursor = (first_particle_index + num_particles - 1) % globals.particle_randomization_table_size;
        if (randomization_cursor < 0) randomization_cursor += globals.particle_randomization_table_size;

        for (int i = num_particles-1; i >= 0; --i) {  // We iterate backwards because we draw new particles beneath older particles, as this just looks better; and most-obscured particles draw first, for Painter's Algorithm reasons.
            int particle_index = first_particle_index + i;

            // The next two lines are an optimization to avoid using a modulus operator per particle.
            // We really want to do: particle_index % globals.particle_randomization_table_size;
            // Note that particle_randomization_table_size is not a power of two, intentionally. (It's close though!)
            // I am presuming the 'if' gets compiled into a CMOV.
            
            int randomization_index = randomization_cursor;
            if (--randomization_cursor < 0) randomization_cursor = globals.particle_randomization_table_size-1;
    
            unsigned int number_a = globals.particle_randomization_table[randomization_index];

            // 'gen' is the random number generator.
            gen.seed(number_a * particle_index * big_number);

            // The rest of the particle generation logic goes here, for example:
            p.lifetime  = lerp(lifetime0, lifetime, gen.get_zero_to_one());
            p.size      = gen.get_within_range(size_scale_0, size_scale_1);
            p.dtheta_dt = gen.get_within_range(scaled_dtheta_dt_0, scaled_dtheta_dt_1);
            p.theta0    = lerp(theta0, theta1, gen.get_zero_to_one());
            ...
            // Including a lot more code that does the simulation part.
        }

You'll note here the presence of a variable 'big_number' inside the seed argument. big_number is computed in the following way, once per particle system (not once per particle):

        gen.seed(source->portable_id * 13482034981);  // @Hack: This number is meaningless, I just banged on the keyboard.  It could be a really bad number.
        gen.get();
        gen.get();
        gen.get();
        gen.get();
        gen.get();
        gen.get();
        unsigned int big_number = gen.get();

For context, gen.get() looks like this:

    inline u32 Random_Generator::get() {
        state = state * 2147001325 + 715136305; // BCPL generator
        // Shuffle non-random bits to the middle, and xor to decorrelate with seed.
        return 0x31415926 ^ ((state >> 16) + (state << 16));
    }

Note my continuing commitment to quality there where I call gen.seed for the particle source (yes, that comment is from the original source code). I suppose I should have investigated whether that number is pathologically bad for BCPL.

So, per particle system, I am seeding once by the ID of the particle system (so this is constant per-particle system), then generating 6 'random' numbers and throwing them away, then using the seventh. I throw 6 away because I was finding this necessary to keep the visual appearance of randomness, even using the Mersenne Twister starting table. After 6 numbers, things looked random enough.

I did not feel great about this. Sure, these extra generations were happening only once per particle system, rather than per particle, but in Braid a lot of particle systems have relatively few particles in them, and speed was at a premium. (On the Xbox 360 I'd had to write a SIMD version of the particle generation code in order for the game to hit 60fps everywhere).

But when you're shipping a game, you have so many things to worry about that some things just fall off the table. The disciplined approach would have been to stop trying to mix a bunch of numbers together in a bowl and hope it magically works, and instead learn more about random number generators, and find a simpler and cleaner solution. But if I had done that, the game would have come out later (or not at all; I was having a hard time psychologically by the time the game was finally done, 3.5 years into the project, and many independent game development efforts collapse long before the game is done).

So instead, I mixed a bunch of numbers together in a bowl, and after trying a few things it worked well enough.

-------

Of course, it wasn't quite that simple, because there were 4 variants of the particle-generation code that were mostly copies of each other: (1) the slow path to fill out to seed the initial state of a particle system, (2) to iterate the state of those particles forward quickly every frame, (3) to implement the special case of particles along a line (for the sun only), and (4) to implement the particles for the Ring in world 6 (the calling site is actually ifdeffed out, but this code is still hanging around for some reason, presumably because I thought it might come back at some point). (3) and (4) were factored into their own implementations so that they would not slow down the particle evaluation in (2) with extra conditionals.

What is extra fun is that (1) and (2) need to compute exactly the same result, otherwise you'll get discontinuities when particle systems wrap or when rewind is engaged. I had forgotten this when I started doing this particle system revision, and I spent about 20 minutes trying to figure out what was going on before I saw how things are structured.

-------

This set of clumsy randomization hacks has been sticking in the back of my mind as one of the things I am vaguely unhappy with in the Braid code. (It is not nearly the worst, because this one isn't visible to the user; there's other stuff, like the camera handling and occasional collision detection weirdness, that upset me more because you feel them when you play).

For some reason, this week I decided to fix it. As it turns out, simply by using PCG instead of BCPL, I was able to strip out all the randomization hacks discussed above, leaving only the "straightforward" system of re-seeding for each particle based on its index.

I was able to remove the Mersenne Twister completely from the codebase (two files containing 79 lines of code (not including blank lines and comments)).

After simplifying away all the extra randomization stuff from the particle rendering code, I saved XXX LOC total. Most of the code in the listings above has been removed.

------

So, at the end of the day, I removed a bunch of ugly hacks and ended up with shorter, more-elegant code. So this was a win, right?

Well, not clearly. One reason is that PCG is slower than BCPL, and there's not really a good reason for us to be paying that cost. Again, here's what BCPL number generation looks like:

    inline u32 Random_Generator::get() {
        // state is 32 bits.
        state = state * 2147001325 + 715136305;
        return 0x31415926 ^ ((state >> 16) + (state << 16));
    }

and here's what PCG looks like:

    inline u32 Random_Generator::get() {
        // state, inc are both 64 bits!
    
        u64 oldstate = state;
        state = oldstate * 6364136223846793005ULL + (inc|1);
    
        u32 xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
        u32 rot = oldstate >> 59u;
        return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
    }

PCG is doing substantially more math, *and* it's doing it on numbers that are twice as wide.

If this were some kind of cryptographic application, paying this cost might be warranted. But we're just trying to generate reasonable visuals for some particles; we don't need to pass rigorous statistical tests.

But the real reason this isn't a win is that I didn't increase my knowledge in any deep way. I learned some surface stuff about random number generators by skimming the PCG paper, but really so far I have been lazy. I don't even have the conceptual tools yet to reason about what the best solution for this problem would look like; so long as that is the case, I am doomed to just kind of keep trying random things.

Fortunately, in the light reading I have done so far, I've learned enough to get a little bit of traction, and I see several potential directions in which to go to improve this situation.

Most clearly, it seems like we ought to be able to go back to BCPL and do something simpler to increase the quality of our seeding. At the same time, though, from what I am learning about PCG, LCGs, and other types of random number generators, there are a few possibilities that should at least be discussed. We'll continue this in part 2, in which I actually start learning real things about random number generators, and stop treating them like magical black boxes.

Edited by Jonathan Blow on July 7, 2016, 5:06pm

Ralph Brorsen

#7519

July 7, 2016

Typo: "tom ake"
Maybe explain acronyms first time used? (LCG, BCPL)

- "The reason you have to seed for each particle, rather than only once per source, is that the range of starting and ending particle indices will be changing every frame"
I don't understand the reason you give here. Explain this in a little more detail?

At a higher level, I found the post to be not quite rigorous enough in it's discussion of how the particle system is simulated. For instance, these two sentences:
"and then simulating that particle to the current time"
"simulate particle values forward to current time"
The phrasing suggests to me that you're doing timestep based simulation to current time, where as I am pretty sure from the context that that is not what you meant.

Perhaps it would be fruitful if you used some strong mathematically derived terms to denote the difference, perhaps stateful vs stateless computation, and gave a short example of the difference w/ actual simulation. The post seems a little too fuzzy on this point, which is crucial for understanding the PRNG part.

Jonathan Blow

#7520

July 7, 2016

Thanks for the comments. I am already beefing the article up to remedy this situation.

ratchetfreak

#7523

July 7, 2016

Sounds like you never stopped using a straight linear relation between the index and the seed.

I think that's what is the core of the problem when the RNG you are using is still at heart a LCG (newstate = oldstate * a + c mod m). Which has a very high correlation between successive values.

My gut tells me that trying to get a non-linear relation between the index and seed you would get better results. For example adding the square of the index to the seed.

Slightly unrelated but if the table size is one less than a power of two then you can get the increment-and-modulo operation with

1 2	index = ++index; index = index & 0x7f + index >> 7;

Works best with increment though.

Jonathan Blow

#7524

July 7, 2016

Hmm, well, since then I have gotten good results using BCPL with a linear relationship between the index and the seed, or at least, as linear as it ever was. Stay tuned for part 2.

Edited by Jonathan Blow on July 7, 2016, 11:45pm

Jonathan Blow

#7525

July 8, 2016

Okay, the official blog post is up... http://number-none.com/blow/blog/...2016/07/07/braid_particles_1.html

ratchetfreak

#7592

July 14, 2016

About the fast forwarding section.

Because a LCG is cyclic over a pretty small period you can get the x0 from state xi by using -i mod period and because period is a power of 2 you can use the 2s' compliment of i and mask it.

ratchetfreak

#7603

July 15, 2016

With part 2 up I see that you also didn't like the fast forwarding because it is errorprone to keep track how many numbers you need per particle. I already explained how to rewind using the fast forward algorithm by utilizing the period being a power of 2 of the LCG you used.

I think that can be alleviated if you create a struct of ints where the all the random numbers you need for 1 particle are put in

struct partical_randoms{
int posX;
int posY;
int velX;
int velY;
//...
}

And you pass it to the RNG with "partical_randoms randoms; gen.get((int*)&randoms, sizeof(randoms)/sizeof(int));" Then when you need an extra random number generated you add it to the struct and the passed size adjusts automatically.

Edited by ratchetfreak on July 15, 2016, 12:31pm