Okay, and now some micro-optimization stuff, because I just want to nerd out.
If you do want to get into micro-optimization, learning more about how CPUs handle instructions, memory, branching, and so on is helpful. Mind you, most of the time this isn’t necessary at the game logic level unless you’re doing very particular things, but if you have many objects that do the same or similar things, there are some things you can do:
1 - group by logic
Each instruction has to be sent to the CPU, so CPUs cache instructions and basically try to ‘work ahead’ as much as possible. If your logic is changing all the time (e.g. you run A.Update() then B.Update() then A.Update() again), it can’t keep all the instructions cached because it needs to run different ones all the time. The solution is to run all the A’s together, then all the B’s (again, this really only makes sense if you have many A’s and/or many B’s).
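Here's a rough C++ sketch of what "run all the A's, then all the B's" can look like. The names (Spinner, Mover, World, UpdateAll) are made up for illustration and don't come from any particular engine; the only point is that each kind of object lives in its own list and gets updated in one pass.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object types standing in for "A" and "B".
struct Spinner {
    float angle = 0.0f;
    void Update(float dt) { angle += 90.0f * dt; }
};

struct Mover {
    float x = 0.0f;
    void Update(float dt) { x += 2.0f * dt; }
};

struct World {
    std::vector<Spinner> spinners;  // all the "A"s live together
    std::vector<Mover> movers;      // all the "B"s live together

    void UpdateAll(float dt) {
        assert(dt >= 0.0f);  // a negative time step is almost certainly a bug
        // The same update code runs back to back for every Spinner...
        for (Spinner& s : spinners) s.Update(dt);
        // ...then we switch logic once and do every Mover.
        for (Mover& m : movers) m.Update(dt);
    }
};
```

Whether this beats a single mixed list depends on how many objects you have and how big their update code is, so measure before and after.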
2 - know your data
In a similar manner to instructions, CPUs cache memory as well. It turns out the physical distance from your RAM to the CPU matters: CPUs are so fast that the time it takes to get data from main memory is relatively slow. Thus, CPUs have a small memory cache on the chip itself, and when they need something you have in memory, they pull in the surrounding memory too (a whole cache line at a time) and put it in the cache, in the hope that it contains whatever they need next. You can take advantage of this by keeping all your similar data in one spot instead of allocated wherever. E.g. if you have 500 instances of class Dingbot, and you compute something with each one’s data every frame, it’s faster to keep them all in one packed array than to have them scattered about memory. This is because when you compute Dingbot[0], the CPU has likely pulled Dingbot[1, 2, …etc] into cache as well since they’re all next to each other in memory, so as you work through the rest it doesn’t need to go out to main memory as often.
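To make the packed vs. scattered difference concrete, here's a minimal C++ sketch. The Dingbot fields (health, regen) and both function names are assumptions for illustration. Both functions compute the same thing; the only difference is where the Dingbots live in memory.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical Dingbot with made-up fields, just for the sketch.
struct Dingbot {
    float health = 100.0f;
    float regen = 1.0f;
};

// Packed: one contiguous block, so neighbouring Dingbots share cache lines.
float TotalHealthPacked(const std::vector<Dingbot>& bots) {
    float total = 0.0f;
    for (const Dingbot& b : bots) total += b.health;
    return total;
}

// Scattered: each Dingbot is its own heap allocation, so every step follows
// a pointer to wherever the allocator happened to put that object.
float TotalHealthScattered(const std::vector<std::unique_ptr<Dingbot>>& bots) {
    float total = 0.0f;
    for (const auto& b : bots) {
        assert(b != nullptr);  // every slot is expected to hold a Dingbot
        total += b->health;
    }
    return total;
}
```

With only 500 objects you may not see much difference, and a fresh heap often hands out neighbouring addresses anyway, so treat this as the shape of the idea and profile on your real data.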
3 - reduce your memory footprint
This goes with #2 and for much the same reason. The CPU cache is small since it has to fit on the chip, so the smaller your data structures are, the more of them fit in the cache, and the less often the CPU has to fetch from main memory.
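As a small C++ sketch of shrinking a data structure: the two layouts below hold the same information, but the second uses narrower types and orders fields so less padding is needed. The field names, the choice of float and a 16-bit team id, and the 64-byte cache line are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts; field names are made up.
struct DingbotPadded {
    bool alive;      // 1 byte, usually followed by padding so the double is aligned
    double health;
    bool shiny;      // 1 byte, usually followed by padding so the int is aligned
    int team;
};

struct DingbotCompact {
    float health;        // assumes float precision is enough for health
    std::uint16_t team;  // assumes fewer than 65536 teams
    bool alive;
    bool shiny;
};

// The compact layout should come out strictly smaller than the padded one.
static_assert(sizeof(DingbotCompact) < sizeof(DingbotPadded),
              "compact layout should be smaller");

// 64 bytes is an assumed cache line size (common on desktop CPUs,
// but check your target hardware).
constexpr std::size_t kAssumedCacheLineBytes = 64;

// How many whole objects of the given size fit in one assumed cache line.
std::size_t ObjectsPerCacheLine(std::size_t object_size) {
    assert(object_size > 0);  // guard against dividing by zero
    return kAssumedCacheLineBytes / object_size;
}
```

On a typical 64-bit desktop compiler I'd expect these to come out at 24 and 8 bytes, but padding is up to the implementation, so print sizeof on your own target rather than trusting my numbers.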
4 - reduce conditional branches
This follows from #1: if the CPU is trying to fetch and run instructions ahead of time, then a branch (i.e. if(true/false)) gets in the way, because if it’s true it has to load one set of instructions, and if it’s false it has to load a different set. The CPU has to guess which way it will go, and a wrong guess throws away the work it did ahead of time. You’d be surprised how often I’ve been able to factor code out of branches, or branch less frequently, by organizing my code and my data better. For example, say I have class A, as well as class B: A and class C: A, which inherit from A, and I know the branch depends on the class. Then branching once and running the code for all the B’s, then branching again and running the code for all the C’s, rather than branching on each individual object, can dramatically reduce how often you branch.
A pseudocode example:
// many branches
foreach (obj in children_of_A)
    if (obj is shiny) do_shiny_stuff(obj)
    else do_dull_stuff(obj)
// versus less branched code
foreach (obj in b_class_objs)
    // we know all B's are shiny so no branch
    do_shiny_stuff(obj)
foreach (obj in c_class_objs)
    // we know that all C's are dull so no branch needed
    do_dull_stuff(obj)
In both cases we iterate over the same number of objects, but by knowing our data and separating it properly, we can reduce the number of branches.
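For anyone who wants something compilable, here's the same idea as a rough C++ sketch. Obj, Polish, Dust, and both Update functions are hypothetical names standing in for the shiny B's and dull C's above; a flag on one struct replaces the class hierarchy to keep the sketch short.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the B (shiny) and C (dull) objects.
struct Obj {
    bool shiny;
    float brightness;
};

inline void Polish(Obj& o) { o.brightness += 1.0f; }  // the "shiny" work
inline void Dust(Obj& o) { o.brightness -= 1.0f; }    // the "dull" work

// Many branches: one test per object.
void UpdateMixed(std::vector<Obj>& objs) {
    for (Obj& o : objs) {
        if (o.shiny) Polish(o);
        else Dust(o);
    }
}

// Fewer branches: the lists were separated up front, so the loops do no
// per-object test. The asserts are debug-only sanity checks and compile
// out when NDEBUG is defined.
void UpdateSeparated(std::vector<Obj>& shiny_objs, std::vector<Obj>& dull_objs) {
    for (Obj& o : shiny_objs) {
        assert(o.shiny);
        Polish(o);
    }
    for (Obj& o : dull_objs) {
        assert(!o.shiny);
        Dust(o);
    }
}
```

For work this trivial the compiler may well emit branch-free code for the mixed loop on its own, so the win only shows up when the two paths are genuinely different and the objects arrive in an unpredictable order. Profile it.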
- You probably don’t need to do most of this unless you’re doing something ambitious.
- Know your data
- Just like in the real world: “Cache is king, baby”