So i have recently become fascinated with allocators, and have tried my hand at writing a general purpose allocator, with tentative success i might add, but the main question I’m left asking myself if i can make a faster allocator than the general glibc allocator, then what is it doing that I’m possibly not doing, that is necessary for safety or correctness, that is making it slower. Any how to anyone out there who loves low level programming and has some thoughts on the subject i would welcome them.
Generally speaking its usually because they have to be more general purpose and safety.
Now I am a bit rusty since I haven’t worked with allocators in a serious manner in since college but there is a lot of bookkeeping that has to happen behind the scenes for them. Also, memory is a really security sensitive area so allocators may do a ton of checks to ensure memory integrity, applications are not writing where they shouldn’t be, ect.
The heap though is one of the largest sources of slowdown if I remember. To ensure the the computers memory is being used and not wasted allocators and OSs do a lot of work to try to prevent memory fragmentation which is not an easy job and can be slow.
Like with the C++ standard library, custom implementations are usually faster, but that is usually because they don’t have to worry about edges cases, checks, or can make reasonable assumptions that a general implementation can not do.
Allocators are a big, fun, complex topic! I’m no expert here, but what I’ve heard or reasoned through:
Code that makes a bunch of tiny allocs and frees is likely to be slow regardless of allocator engineering. Don’t make a pile of little objects with their own addresses; consolidate them into nice, linear, arrays and vecs! This makes for better memory access (less pointer-chasing, more predictable to prefecter, less stalling, less per-object overhead), and has the benefits of being simpler if not in fact already done for you.
Where there must be many small allocations, see if you can use an arena. The general idea is to find a lifetime that many things will have (a simulation tic, the duration of these three functions, etc), and put them all together so when they die, free() is trivial. Any (few!) survivors can be realloc()d elsewhere.
In cases where you really do need to make many individual allocations with hetrogenous lifetimes, I’m out of my depth. Consult the ancients, writers of OSs, VMs, game engines, and the like!