
How are L2 cache misses such a big problem? Is something flushing the cache?

Let me preface this by saying that I understand that OOP practices are not cache-friendly since they allocate objects separately in many different places in memory.

I also understand that the L2 cache is usually in the 1MB-8MB range. My question is how is 1MB of cache not enough to hold the data you use the most?

I have some theories but don't know which one is what actually happens. I'll list them and maybe some of you can help me understand this better.

1- Maybe the caching algorithm is not very smart and every time you manipulate a large array such as an art asset or something it uses the whole space in the cache and ends up flushing everything you did before?
2- Maybe other processes are using a big portion of the cache as well? (This doesn't seem likely since on my computer at least most processes are at 0% CPU usage most of the time)
3- Maybe I'm just wrong in my assumption that game data is small except for art assets and some other cases like this? Maybe you do actually need much more than 2MB of cache to hold all of your data? I don't know about this one - it might be true in AAA games, but not in smaller indie games?


Am I missing some other reason?

Could it be the case that in the future if L2 caches increase in size to something like 50MB+ it will be less of a problem?

Well, the L2 cache size is not 1-8MB - it's only 256 KB per core on recent Intel CPUs and 512 KB on recent AMD CPUs. The L3 cache is generally several megabytes these days, but it's shared between all cores, and is very slow compared to L2 and especially L1. Also note that while L1 is split into a code cache and a data cache, L2 and L3 hold code and data together, so your application's machine code (including library functions you call) will also take up some space in L2.

The problem with making the cache arbitrarily large is that it would also make it arbitrarily slow, since there's a tradeoff here between size and latency. This is the reason CPUs have multiple levels of cache in the first place, as opposed to just one big cache, or even just a direct link to memory. In fact, L2 is already kinda slow - around a couple dozen cycles to access, as opposed to just 5 or 6 for L1 cache. So you'd really prefer to keep data in L1 if possible. L1 and L2 cache sizes haven't changed much in the past several years, so I don't foresee getting 50 MB of L2 anytime soon.

As for how much data is really needed, it depends entirely on the game. For heavy 3D games, it really can be a lot. And keep in mind that OOP doesn't only affect the access patterns of memory, it can also substantially increase the amount of memory being used. For example, consider the difference between storing a contiguous array of objects vs. storing an array of pointers to individually heap-allocated objects. You're eating 8 bytes for the pointer to the object, maybe another 8 bytes for the vtable pointer inside the object, some unspecified amount for heap management, probably some padding to meet heap alignment, and likely some unspecified amount of overhead from fetching whole cache lines that you aren't going to use, depending on how fragmented the heap is.

Another quick comment regarding how much data a game really has, here is a link to an old video where Jon mentions the size of the base Entity class in The Witness: https://youtu.be/ZHqFrNyLlpA?t=41m20s

Now imagine having an island full of those - oh, and some of them have skeletal animations, and there's grass everywhere, and particle systems, and audio clips playing all the time... this stuff adds up quick!
Another interesting point - the cache is not arbitrarily loaded with bytes from anywhere. It is split into cache lines, which are 64 bytes on recent CPUs. That means a 256 KB cache holds only 4096 distinct cache lines, shared between code & data.

For your #2 point - even if your other processes show 0% CPU usage, they could still wake up hundreds if not thousands of times per second to do some small piece of work (check a mutex/event and go back to sleep). That touches at least one or two cache lines, probably more. If I look in my task manager I can see that I have ~3000 threads running in total. Not all of them are active, of course, but that's almost the same as my cache line count (per CPU core).
Thanks for the responses, I guess it makes sense.

Also, when I was writing this post I realized that many games / applications take up a substantial amount of RAM when running, so I guess it makes sense that the cache isn't big enough to keep all the data needed "hot".