…in which we make the particles go faster.

(Warning: This is a long post with a lot of detail and not much added in terms of functionality. Our code will be a fair bit more complicated at the end of this session than it was before so you should try to follow the outlines of this — it also contains quite a few pro tricks you can steal for your own projects — but don’t let yourself get bogged down in the details.)

Last time we built a sprite sheet class and gave it a very basic test in the particle system, but we ran into a problem: it’s so slow! Rather than struggling on adding more to the project while it runs like this, let’s take a moment to make it more efficient.

This is a standard way to work when developing a project. You write the code in the most intuitive and natural way you can, without worrying too much about performance. Then when there’s a problem, you work out where it is and optimize it.

Why not write the most optimal code you can as you go along? There are (at least) two reasons. One is that you don’t always know what will be optimal, at least at first. Sure, we know we should call loadImage() in setup() rather than every frame. Loading an image from a file is really slow, and we know that, so we put it in setup() almost automatically. But other performance issues might not be obvious at first.

The other reason is that highly-optimized code is usually more complex and harder to understand than the more “intuitive” kind. We’ll see this in today’s post. You want your code to be as simple as possible, so experienced developers tend to avoid “premature optimization” and only go through this process when performance is actually a problem.

Well, we’ve hit that point now, so what to do?

Computers are very fast, so slowness is usually due to one of two things:

  • Importing data from outside the program, such as loading an image from a file;
  • Performing an operation many times in a loop.

The first one isn’t an issue for us — the only external data we’re using is our sprite sheet, and we import that once in setup() and then we’re done. So the issue is probably the second one.

Our performance issue must be down to how long it takes to draw a frame — that is, to something that happens in draw():

Setting the background is hardly going to be the issue; nor is setting noStroke(). So the problem must be in one of the two ParticleSystem functions. Which would you guess?

Here is ParticleSystem.advance():

It does a bunch of calculations for every particle, so if we have 1000 particles it will do all of these 1000 times every frame. This could definitely be a problem. But the calculations are pretty simple and it’s not easy offhand to see how to make them any more efficient. This is definitely not “low-hanging fruit” from an optimization perspective. We can come back to this if we have to but let’s hope we don’t.

The other candidate is drawParticleSystem():

Again, we’re doing some stuff for every particle, so again the code inside the loop might run thousands of times every frame. But here an experienced optimizer would spot a couple of red flags:

  • Several calculations with floats
  • The functions lerp and lerpColor, which we don’t know the efficiency of

My first instinct would be to get these calculations (including the lerps) out of the draw() loop. That means pre-calculating all the values we need and storing them. We’ll trade off a slower start-up time (because a ton of calculations will have to be done then) for faster frames once we’re up and running. Usually a tradeoff like this is worth making.

Until I try it I won’t know whether this will solve the performance problem but it will surely not make things worse. So let’s do it.

To draw each particle, drawParticleSystem needs a colour and a radius. Each of these is currently being calculated based only on the particle’s age, which is an integer that ranges from 0 when it’s born to maxAge when it dies.

So my first idea was to “cache” the colour and radius that correspond to each possible age by calculating them in advance, when the ParticleSystem is created. Unfortunately, I tried it and it didn’t make much of a difference.
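For reference, the per-age lookup idea can be sketched in plain Java along these lines. This is only an illustration, not the project’s code: the class name AgeLookup, its fields and the start/end values are all made up, and lerp and lerpColour here are simple stand-ins for Processing’s built-in functions rather than exact copies of them.

```java
// Hypothetical sketch: precompute one radius and one colour per age,
// so the draw loop only has to read from an array.
class AgeLookup {
    final float[] radiusByAge;
    final int[] colourByAge; // packed ARGB, one int per age

    AgeLookup(int maxAge, float startRadius, float endRadius,
              int startColour, int endColour) {
        radiusByAge = new float[maxAge + 1];
        colourByAge = new int[maxAge + 1];
        for (int age = 0; age <= maxAge; age++) {
            // t runs from 0 at birth to 1 at maxAge
            float t = (maxAge == 0) ? 0f : (float) age / maxAge;
            radiusByAge[age] = lerp(startRadius, endRadius, t);
            colourByAge[age] = lerpColour(startColour, endColour, t);
        }
    }

    // Stand-in for Processing's lerp(): linear interpolation.
    static float lerp(float a, float b, float t) {
        return a + (b - a) * t;
    }

    // Simplified stand-in for lerpColor(): interpolates each ARGB channel.
    static int lerpColour(int c1, int c2, float t) {
        int result = 0;
        for (int shift = 24; shift >= 0; shift -= 8) {
            int ch1 = (c1 >>> shift) & 0xFF;
            int ch2 = (c2 >>> shift) & 0xFF;
            int ch = Math.round(lerp(ch1, ch2, t));
            result |= ch << shift;
        }
        return result;
    }
}
```

With something like this in place, drawing a particle would only need radiusByAge[age] and colourByAge[age], with no floating-point work per frame.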

What’s slow here appears to be two things:

  • Setting the tint for each particle;
  • Resizing each particle.

Caching the values to use for these actions helped a little but nowhere near enough. So we’re going to have to cache the exact image we need for every age a particle can be.

Looking ahead, we’d like to be able to give different images to different particles. But we can cross that bridge when we get to it; let’s not try to optimise code we haven’t written yet.

Here’s a sprite cache in ParticleSystem: it’s an array of PImages, one for each age the particle can be:

Now we’ll populate it with the right sprite for each age, suitably resized and tinted. Sounds easy, right? There are some catches. First, I have to be sure to get() a copy of the sprite’s image, not just store the image itself in the array. This stems from a problem about references vs values that I won’t get into here; suffice it to say it took me a minute or two to figure it out:

The get() function creates a copy of the pixel data in the image — and since we’re starting at (0, 0) and going up to (spriteWidth, spriteHeight) we’re actually getting the contents of the whole image. But this is now a completely separate copy, not just a reference to the one stored in SpriteSheet. This is important because we’ll now change it.

The resizing is easy — it’s in the screenshot above. But tinting is harder. PImage doesn’t have a tint() function — this command only works when you draw the image and can’t be stored to the image itself. So we shall have to do that by hand. We’ll go pixel-by-pixel, setting the RGB channels individually.

The alpha channel ought to be a combination of the alpha channel from the image (since transparency is important there) and the alpha of the lerped colour (since that allows for fading over time). The right way to combine them is to multiply them and divide by 255 to bring the answer back into the 0-255 range.
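As a rough sketch of that per-pixel step in plain Java, working on an array of packed ARGB ints like the pixels array of an image: the class and method names here are invented for illustration. It also assumes the RGB channels are simply replaced by the tint’s RGB, which is one reading of “setting the RGB channels individually” and may not be exactly what the real code does.

```java
// Hypothetical sketch of hand-rolled tinting on packed ARGB pixels.
class SpriteTint {
    // Returns a tinted copy; the source array is left untouched.
    static int[] tint(int[] src, int tintColour) {
        int tintA = (tintColour >>> 24) & 0xFF;
        int tintR = (tintColour >>> 16) & 0xFF;
        int tintG = (tintColour >>> 8) & 0xFF;
        int tintB = tintColour & 0xFF;

        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            int srcA = (src[i] >>> 24) & 0xFF;
            // Multiply the two alphas, then divide by 255 to get back to 0-255.
            int a = srcA * tintA / 255;
            // Assumption: RGB comes straight from the tint colour.
            out[i] = (a << 24) | (tintR << 16) | (tintG << 8) | tintB;
        }
        return out;
    }
}
```

A fully opaque pixel tinted with a half-transparent colour comes out half-transparent, and a fully transparent pixel stays fully transparent, which is the behaviour we want for a sprite with soft edges.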

Here is the final result:

This took me a few goes to get right, so please don’t feel disheartened if it looks, on paper, as if it just leaped out of my head fully-formed; I can assure you it didn’t.

So here’s how it looks, and it runs wonderfully fast on my laptop now; I would estimate it’s a good 100 times faster than before:

So to round off this post I wanted to fix something that’s been bugging me for a while — the way the particles drift off to the bottom right of the screen. This is caused by the fact that we’re still positioning them like circles, but images are actually positioned from their top-left corners, not their centres.

“This will be easy to fix,” I thought, but I was wrong. Here’s the fix I thought I wanted:

This is no good for two different reasons. The first is that it doesn’t do the right thing, and the second is that it destroys almost all the performance improvements we’ve fought so hard for already.

While digging through the code to see what can be done about the latter, I realised we’ve still got a leftover from the very first code we wrote: ParticleSystem always puts new particles in the centre of the screen. We will definitely want to change that later, so for now let’s pass the x and y coordinates of the centre into the constructor and store them in a PVector:

Then we’ll use that to initialise new particles:

Only the first line inside init() has changed. Note that we need to create a new PVector, again because we want each particle’s position to be its own copy of the centre position rather than a reference to it. This means each particle can still move independently of the others.
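The difference between a copy and a reference can be shown with a tiny stand-in class. Vec2 below is hypothetical and only exists so that the example runs without the Processing library; PVector behaves the same way in this respect.

```java
// Hypothetical stand-in for PVector, just enough to show copy vs reference.
class Vec2 {
    float x, y;

    Vec2(float x, float y) {
        this.x = x;
        this.y = y;
    }

    // Makes an independent object with the same values.
    Vec2 copy() {
        return new Vec2(x, y);
    }
}

class CopyDemo {
    public static void main(String[] args) {
        Vec2 centre = new Vec2(400, 400);

        Vec2 shared = centre;      // same object under a second name
        Vec2 own = centre.copy();  // separate object with the same values

        shared.x += 10;            // this also moves centre

        System.out.println(centre.x); // changed along with shared
        System.out.println(own.x);    // unaffected
    }
}
```

If every particle held a reference like shared, moving one particle would move the centre and every other particle with it.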

We can test this out by having the x-coordinate of the centre change while the sketch is running:

This causes our ParticleSystem to drift to the right in a stately fashion. Now we’ve tested it I’ll delete this for the moment — we can worry about moving particle systems around later when we have some physics going on.

So what about the drift? My first attempt wasn’t far off, but I want to follow the same logic as I have for size and colour; I’ll cache the amount I need to offset the image at each age. Here’s the array of offsets being declared and initialized in ParticleSystem:

and here they are being given their values (line 37) — I decided to do it while I was caching the images as that saved me a few lines of code:

Note that I’m dividing by 4 here — I’m sure my brain is just being slow today but I can’t figure out why it’s 4 and not 2. I tried 2, it didn’t work, I tried 4 and it did. Guess and check — not ideal. I need to think about this a bit more or I suspect it will come back and bite us later. But never mind for now.

Here’s sizeOffset being used to place the particles:

It works!

But performance is still not great, especially as the sprites get larger. In fact, if you watch the particle system you’ll probably find it speeds up and slows down depending on whether there are a lot of big sprites on-screen or not. This is obviously not acceptable, so more optimization is needed.

Why is it so slow? I was mystified. So I tried removing various bits of code to see what made a difference. What I discovered was that the fix for the “drifting” is still an issue. But why? How can it be that this is slow:

but this is fast:

?? The only difference is a couple of int additions! Or is it? Actually, no, it isn’t. The problem here is that Processing’s PVector stores its coordinates as floats, not ints. So Processing is having to do a bunch of back-and-forth conversions between float and int to achieve this. It’s all happening in the background so it’s hard to spot.

To fix this I made a quick class called IntPair designed to store a pair of ints. But this didn’t do what I wanted. It turns out you do have to track some things as floats because otherwise the movement gets “quantized” and you end up with particles all moving together along similar lines.
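The quantization effect is easy to reproduce in isolation. In this small sketch (all names invented for the example) a position stored as an int throws away the fractional part of the velocity on every frame, so quite different velocities end up producing the same movement.

```java
// Hypothetical demo: int positions lose the fractional part of each step.
class QuantizationDemo {
    // Position kept as an int: the fraction is dropped every frame.
    static int advanceAsInt(float velocity, int frames) {
        int x = 0;
        for (int i = 0; i < frames; i++) {
            x = (int) (x + velocity);
        }
        return x;
    }

    // Position kept as a float: fractions accumulate correctly.
    static float advanceAsFloat(float velocity, int frames) {
        float x = 0f;
        for (int i = 0; i < frames; i++) {
            x += velocity;
        }
        return x;
    }
}
```

Over 10 frames, velocities of 1.5 and 1.9 both carry an int position to 10, and a velocity of 0.5 never moves it at all, while the float version gets to 15 and 5 for velocities of 1.5 and 0.5.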

So I ended up with this class, which holds a pair of floats that can be converted to ints when you need that. I therefore called it IntableFloatPair, a name I now regret but am stuck with (choose names carefully!):

I included the add() function because I knew I’d need it; everything else is hopefully straightforward.
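A minimal version of such a class might look roughly like this. It is a guess at the shape based on the description here, not a copy of the real one: in particular, truncating with an (int) cast rather than rounding is an assumption, as is the exact constructor.

```java
// Sketch of a float pair that can hand out int values on request.
class IntableFloatPair {
    float x, y; // full-precision values, used for movement

    IntableFloatPair(float x, float y) {
        this.x = x;
        this.y = y;
    }

    // Int views for drawing. Assumption: plain truncation via a cast.
    int x() {
        return (int) x;
    }

    int y() {
        return (int) y;
    }

    // Adds the other pair to this one in place; no new object is created.
    void add(IntableFloatPair other) {
        x += other.x;
        y += other.y;
    }
}
```

Java allows a field and a method to share a name, which is why p.x (the float) and p.x() (the int) can both exist; the brackets are the only thing telling them apart.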

Then I had to change all my PVectors into IntableFloatPairs, which was quite boring. There are changes in Particle:

and rather too many to print here in ParticleSystem — just follow your nose changing all PVectors into IntableFloatPairs and you’ll be nearly done.

Just a couple of things to note. First, my add() method in IntableFloatPair isn’t quite the same as the static add() in PVector. Mine just adds whatever you passed in to the existing values, whereas the static PVector.add() returns a new PVector. I don’t need that behaviour and it costs something to return a new copy, so I didn’t do it. That means we now have this line of code (line 70; compare with the one above):

Now, in drawParticleSystem I want int values, and IntableFloatPair has methods to provide these — so you just have to be careful to put round brackets after x() and y() here to make sure you call the functions rather than just accessing the floats directly:

Is this really better than the built-in PVector? Yes, absolutely! It runs 5-10 times faster and doesn’t seem to speed up and slow down as it was doing before. Sometimes your own hand-made class or function that perfectly suits your needs is superior to the built-in one that’s designed to work fairly well for everybody.

I can now run it at 800×800 pixels with 300 particles growing up to 700 pixels wide. That was impossible before we went through this process; now it runs at a solid 20fps on my underpowered laptop. I can’t upload a GIF that big here, so here’s a small version:

Optimization is complete — next week we’ll be in a position to make it do more cool stuff.

All the code for this series is available on GitHub.

[EDIT: I got to the end of this and started work on next week’s session when I remembered something that caused me to literally slap my forehead in annoyance. Processing’s default renderer doesn’t use OpenGL. If you just add P2D as a third argument to the size() command at the start of setup() you will generally get a big performance boost from graphics-intensive sketches. No quotation marks, just P2D on its own, it will turn grey. This makes a huge difference to the current sketch. Oh well, now it runs even faster, so I’m happy enough. Code from next week onwards will have this turned on but try it for yourself now and you’ll certainly see the difference.]