New memory manager


Hey guys.


One of my pet peeves is memory management. It's something that bugs the hell out of me. Not because I dislike it, but because from experience, managing memory can be super critical in making sure that your game runs smoothly and, depending on the platform, stays stable. The principal idea is to avoid calling the system memory allocation libs as much as possible. This gives the following benefits:

 

  1. Performance - system memory calls are typically very costly so they are avoided as much as possible.
  2. Helps to reduce overhead caused by OS memory paging/virtual mapping and re-mapping.
  3. Helps with stability, as freed memory blocks can be coalesced with surrounding free blocks. Memory fragmentation should not occur, or only in very rare circumstances.
  4. Helps track down issues w.r.t. memory bugs
  5. Helps for future platforms with limited memory (mobile, console etc) compared to desktop

 

I tried enabling the memory manager in RC 3.7 and got a whole load of compiler issues. I'm not all that bothered by that, because the existing manager is messy and relies on overriding new within classes to get it to work properly. I'm thinking of implementing a new memory manager with the following requirements:

 

  1. TLSF based - an open source, reliable memory allocation scheme. I know that Unity uses it for its underlying memory alloc stuff on console.
  2. Globally overrides new and delete. Now, I KNOW that this is a massive issue for some folks. But in my view it saves a lot of pain and effort.
  3. Memory allocation/free tracking. This comes at the cost of extra memory usage, but is extremely handy in helping track down unused memory, multiple free bugs etc.
  4. Allocate memory in large "pages" which are managed by tlsf.
  5. Provide specialized heaps such as a block heap to reduce the cost of frequent small allocations (64-bytes or less)
  6. Thread safety - there's an issue here with the way that the thread/mutex stuff is implemented in Torque right now - these would have to be re-jigged.
  7. Labeling of where memory is allocated from and why - slightly different than tracking by source file. Able to identify things such as resource/sim usage.
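
To make requirement 5 a bit more concrete, here's a rough sketch of what a block heap for small allocations could look like. This is only an illustration and not code from Torque: the BlockHeap name, its template parameters and its interface are all made up for the example. Free blocks are kept on an intrusive free list, so alloc and free are just a couple of pointer operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical fixed-size block heap (names are illustrative, not Torque's).
// Every allocation is exactly BlockSize bytes, taken from one static array.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockHeap
{
    static_assert(BlockSize >= sizeof(void*), "a free block must hold a pointer");
    static_assert(BlockSize % alignof(std::max_align_t) == 0, "blocks must stay aligned");

public:
    BlockHeap()
    {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < BlockCount; ++i)
            pushFree(mStorage + i * BlockSize);
    }

    // Returns nullptr when exhausted so the caller can fall back to the general heap.
    void* alloc()
    {
        if (!mFreeList)
            return nullptr;
        FreeNode* node = mFreeList;
        mFreeList = node->next;
        ++mUsed;
        return node;
    }

    void free(void* ptr)
    {
        assert(owns(ptr));
        pushFree(ptr);
        --mUsed;
    }

    // True if ptr lies inside this heap's storage.
    bool owns(const void* ptr) const
    {
        const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
        const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(mStorage);
        return p >= begin && p < begin + BlockSize * BlockCount;
    }

    std::size_t used() const { return mUsed; }

private:
    struct FreeNode { FreeNode* next; };

    // The free block's own memory stores the link to the next free block.
    void pushFree(void* ptr)
    {
        mFreeList = new (ptr) FreeNode{mFreeList};
    }

    alignas(std::max_align_t) unsigned char mStorage[BlockSize * BlockCount];
    FreeNode* mFreeList = nullptr;
    std::size_t mUsed = 0;
};
```

The sketch is not thread safe and never grows. A real version would presumably sit behind the global allocator, route requests of 64 bytes or less to it, and hand everything else to TLSF.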

 

I intend to implement most, if not all of these features in my own build and test them out. Now what I want to know from you guys is;

 

  1. Is this something you feel would be useful to you?
  2. What other features would you like to see?
  3. Is it something that you would like to see in T3D main branch (really one for the steering committee I guess)?


Just a quick word from me, since I'm not too knowledgeable about this - in TGE, where the memory manager was still in use, there was trouble with adding libraries like recast, or even the C++ standard library AFAIR, because of the global overriding of new.

 

Performance - system memory calls are typically very costly so they are avoided as much as possible.

I've also read the opposite - that this used to be true but isn't any more. Are there any sources for this?


What are the issues with threads/mutexes? Keen to hear more about that.


All that said, having a working memory manager sounds like a win.

Just a quick word from me, since I'm not too knowledgeable about this - in TGE, where the memory manager was still in use, there was trouble with adding libraries like recast, or even the C++ standard library AFAIR, because of the global overriding of new.

 

I never tried to add 3rd party libs with the old memory manager, but I don't think new is overridden globally. I haven't grep'd this too much though, but all I saw with the memory manager define was new being overridden per class.


Overriding new globally can cause some issues though. A good example is anything that uses the STL with a container declared as a static or global. The container tries to call new during startup, before main is called, and if the memory manager isn't initialised by then, everything explodes.
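
One common way around the startup-order problem is to construct the manager on first use instead of at a fixed point in main. Here's a minimal sketch of that idea. The MemoryManager class and the EarlyAllocator global are invented for the example, and std::malloc/std::free stand in for whatever the real allocator (TLSF or otherwise) would do.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical manager; std::malloc stands in for the real heap.
class MemoryManager
{
public:
    // Function-local static: constructed the first time anyone allocates,
    // even if that happens inside a global constructor before main().
    static MemoryManager& get()
    {
        static MemoryManager instance;
        return instance;
    }

    void* alloc(std::size_t size)
    {
        mAllocCount.fetch_add(1, std::memory_order_relaxed);
        return std::malloc(size ? size : 1);
    }

    void free(void* ptr)
    {
        if (!ptr)
            return;
        mFreeCount.fetch_add(1, std::memory_order_relaxed);
        std::free(ptr);
    }

    std::size_t allocCount() const { return mAllocCount.load(); }
    std::size_t freeCount() const { return mFreeCount.load(); }

private:
    MemoryManager() = default;
    std::atomic<std::size_t> mAllocCount{0};
    std::atomic<std::size_t> mFreeCount{0};
};

// Global overrides: every plain new/delete in the program lands here.
void* operator new(std::size_t size)
{
    if (void* ptr = MemoryManager::get().alloc(size))
        return ptr;
    throw std::bad_alloc();
}

void operator delete(void* ptr) noexcept { MemoryManager::get().free(ptr); }
void operator delete(void* ptr, std::size_t) noexcept { MemoryManager::get().free(ptr); }

// A global whose constructor allocates before main() runs.
struct EarlyAllocator
{
    EarlyAllocator() : mValue(new int(42)) {}
    ~EarlyAllocator() { delete mValue; }
    int* mValue;
};
static EarlyAllocator gEarly;
```

The important constraint is that the manager's own constructor must never call new, otherwise the first allocation recurses into itself. Keeping the manager trivially destructible also means deletes that happen during static destruction are still safe.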

 

Performance - system memory calls are typically very costly so they are avoided as much as possible.

I've also read the opposite - that this used to be true but isn't any more. Are there any sources for this?

 

It could be that OS memory calls have gotten better. I know that on Android (as of a couple of years ago) they were slow-ish, and the same goes for iOS. Under Windows/Linux/OSX this may have gotten much better, but there are still the following issues:

 

  1. Having lots of allocs from os allocators could cause virtual paging to be slow or even erratic.
  2. Lots of small allocations below a certain size tend to be slow in comparison to a specialized unit heap allocation system.

 

Then there's console, where memory interfaces can be massively different. The calls to malloc/new et al tend to be wrapped around system specific libs. Not to mention manipulation of the CPU VAT in lieu of a buddy-allocation scheme to get rid of fragmentation (typically, I see this on engines primarily aimed at PC and ported to console). And even if OS calls and paging aren't an issue, having an underlying allocator that specializes in small allocations and helps reduce fragmentation on systems with very primitive allocators (Playstation, Nintendo) is very helpful. And if a platform specific allocator needs to be written to make use of CPU VAT, then it's an "easy" task because all the memory allocation stuff goes through a single interface.

 

What are the issues with threads/mutexes? Keen to hear more about that.

 

Good question. In order to avoid namespace pollution in the rest of the engine, the classes are defined in platform with a forward declared struct that holds the platform specific stuff for the implementation. For example, for the Mutex class this is called PlatformMutexData, and the Mutex class internally holds a pointer to that struct called mData.


The actual implementation of the Mutex class is done on a per platform basis. In PlatformWin32 for example, there is a mutex.cpp file that does this and also defines PlatformMutexData. In the constructor of Mutex, mData is initialized by allocating the PlatformMutexData struct using new. It looks a little like this;

 

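// Win32-specific data, hidden from the rest of the engine behind a forward declaration.
struct PlatformMutexData
{
    CRITICAL_SECTION mCriticalSection;
};

Mutex::Mutex()
{
    // Heap allocation: this is the call that can run before the memory manager exists.
    mData = new PlatformMutexData;
    InitializeCriticalSection(&mData->mCriticalSection);
}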

 

On the surface this looks ok (if a little shoddy). But if we're to use a Mutex inside of the memory manager to make it thread safe, we end up allocating memory for the Mutex BEFORE the memory manager is initialized. There are some possible solutions to this, which are:

 

  • Re-write the thread/sync classes exposing platform specific details to the rest of the engine to avoid calling new (easy, but bad)
  • Re-write the thread/sync classes to use a custom allocator system (less easy, but prone to bugs and other issues)
  • Re-write the thread/sync classes to do interface hiding in some other easy way that I can't think of
  • Implement an interface for mutexes in the platform layer of the memory manager (easy, but a little more work per platform)
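
As a rough illustration of the last option, here's one way a lock owned by the memory manager could hide its platform details without touching the heap: the native mutex is constructed with placement new inside a buffer that lives in the object itself. Everything here is an assumption made for the sketch. MemLock and kStorageSize are invented names, and std::mutex stands in for CRITICAL_SECTION or whatever each platform layer would really use.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>

// --- What a shared header could expose: no platform types, no heap pointer. ---
class MemLock
{
public:
    MemLock();
    ~MemLock();
    MemLock(const MemLock&) = delete;
    MemLock& operator=(const MemLock&) = delete;

    void lock();
    void unlock();

private:
    // Assumed upper bound on the native mutex size; checked at compile time below.
    static constexpr std::size_t kStorageSize = 128;
    alignas(std::max_align_t) unsigned char mStorage[kStorageSize];
};

// --- What a per-platform .cpp could contain. ---
// std::mutex is a stand-in for CRITICAL_SECTION, pthread_mutex_t, etc.
using NativeMutex = std::mutex;

static NativeMutex* toNative(unsigned char* storage)
{
    return reinterpret_cast<NativeMutex*>(storage);
}

MemLock::MemLock()
{
    static_assert(sizeof(NativeMutex) <= kStorageSize, "storage too small");
    static_assert(alignof(NativeMutex) <= alignof(std::max_align_t), "storage under-aligned");
    // Placement new builds the native mutex inside the object: operator new is never called.
    new (mStorage) NativeMutex;
}

MemLock::~MemLock() { toNative(mStorage)->~NativeMutex(); }

void MemLock::lock() { toNative(mStorage)->lock(); }
void MemLock::unlock() { toNative(mStorage)->unlock(); }
```

Because the methods are named lock and unlock, std::lock_guard<MemLock> works with it directly. The cost is the per-platform size check, which is the "little more work" mentioned above.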

 

All that said, having a working memory manager sounds like a win.

 

I hope so. As long as it's implemented properly, and can be turned on and off it should make general development and porting a lot easier.


The next question to ask, of course, is whether we can introduce memory management in specific areas where it would make sense - for example making object pooling easier to do. Projectiles, for example, might benefit from this, depending on your gametype. That's a higher-level issue, but relevant to memory management I think.


Thanks for the great run-down on mutexes. Does sound like a tricky problem. I'd defer to your wisdom on that issue!

The next question to ask, of course, is whether we can introduce memory management in specific areas where it would make sense - for example making object pooling easier to do. Projectiles, for example, might benefit from this, depending on your gametype. That's a higher-level issue, but relevant to memory management I think.

 

I think object pooling, while a memory issue, is something that's done in the implementation of that particular system. It's not just allocating objects that tends to be problematic, but also the lists that these objects manage internally. And since they tend to use templated containers such as Vector, it's hard to ensure that objects are allocating from a single pool of memory.


In short, it's really down to the overall design of the system, such as particles, rather than relying on a catch-all technique.


This, however, is where a memory manager can come in handy (as well as good profiling tools). If you can track where allocations happen (Vectors, Console Object types etc), you can get an idea of which systems are doing lots of allocations and prioritise a re-design of how they use resources.
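
To give an idea of what that kind of tracking might look like, here's a small sketch that charges each allocation to a label. None of this exists in the engine: MemTag, TagStats and TaggedTracker are hypothetical names, and std::malloc stands in for the real heap. A small header in front of each block remembers the size and tag so that free can credit the right bucket.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical labels; a real engine would pick its own set.
enum class MemTag : std::size_t { General, Vector, ConsoleObject, Resource, Count };

struct TagStats
{
    std::size_t liveBytes = 0;   // bytes currently allocated under this tag
    std::size_t allocCount = 0;  // total allocations ever made under this tag
};

class TaggedTracker
{
public:
    void* alloc(std::size_t size, MemTag tag)
    {
        void* raw = std::malloc(sizeof(Header) + size);
        if (!raw)
            return nullptr;
        // The header sits directly in front of the block handed to the caller.
        Header* header = new (raw) Header{size, tag};
        TagStats& stats = mStats[static_cast<std::size_t>(tag)];
        stats.liveBytes += size;
        ++stats.allocCount;
        return header + 1;
    }

    void free(void* ptr)
    {
        if (!ptr)
            return;
        Header* header = static_cast<Header*>(ptr) - 1;
        mStats[static_cast<std::size_t>(header->tag)].liveBytes -= header->size;
        std::free(header);
    }

    const TagStats& stats(MemTag tag) const
    {
        return mStats[static_cast<std::size_t>(tag)];
    }

private:
    // Aligned so the user block after the header keeps maximum alignment.
    struct alignas(std::max_align_t) Header
    {
        std::size_t size;
        MemTag tag;
    };

    std::array<TagStats, static_cast<std::size_t>(MemTag::Count)> mStats{};
};
```

The header is the "extra memory usage" cost of tracking. With something like this in place, dumping the stats per tag at the end of a level would show which systems allocate the most.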

 

Thanks for the great run-down on mutexes. Does sound like a tricky problem. I'd defer to your wisdom on that issue!

 

I'm leaning towards a platform hook to create a mutex for the memory manager itself. Which brings me to the next subject.


I downloaded an old version of TGE 1.4 from my account at GG to use as a testbed for a new memory manager. I wrote a small memory manager that implemented some of my requirements using tlsf that hooked into dMalloc, dRealloc and dFree. It worked pretty well as it turned out, and gave me some interesting stats on the Stronghold FPS demo bundled with the engine.


The most interesting thing was that I used lazy initialisation in the memory manager because I didn't want to mess around with figuring out where to put it. This was a good thing, as there are lots of global static objects such as Mutex that call new or dMalloc in their constructors. This meant that the allocator was being called before main was called! I also got some other interesting stats from the Stronghold demo:


In-game mem usage: ~19MB

Peak mem usage: ~21MB

Number of allocations: ~48K

Number of frees: ~21K

Number of realloc calls: ~4K (alloc and free were also bumped up by this)


I meant to track the number of allocations below 16, 32 and 64 bytes as I have a feeling there's a lot of those going on but I haven't had the time. Once it's up and running, I'll transplant it to 3.6 or 3.7 and see what interesting stuff I can find.
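
For what it's worth, counting small allocations doesn't need much. Here's a tiny sketch of the sort of counter that could be called from the allocation hook. SizeHistogram is a made-up name, and treating the 16, 32 and 64 byte limits as inclusive is my assumption for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical counters: how many requests were <=16, <=32, <=64 bytes, or larger.
class SizeHistogram
{
public:
    enum Bucket : std::size_t { UpTo16, UpTo32, UpTo64, Larger, BucketCount };

    // Maps a request size to its bucket (limits treated as inclusive).
    static Bucket bucketFor(std::size_t size)
    {
        if (size <= 16) return UpTo16;
        if (size <= 32) return UpTo32;
        if (size <= 64) return UpTo64;
        return Larger;
    }

    // Call once per allocation request, e.g. from the malloc hook.
    void record(std::size_t size) { ++mCounts[bucketFor(size)]; }

    std::size_t count(Bucket bucket) const
    {
        assert(bucket < BucketCount);
        return mCounts[bucket];
    }

    // Everything that a 64-byte block heap could have served.
    std::size_t smallCount() const
    {
        return mCounts[UpTo16] + mCounts[UpTo32] + mCounts[UpTo64];
    }

private:
    std::array<std::size_t, BucketCount> mCounts{};
};
```

Comparing smallCount() against the total number of allocations would show whether a dedicated block heap for small sizes is worth the effort.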

  • 1 month later...

@lowlevelsoul I just realised - are these mutex issues you're describing the reason why we have a nice object-oriented API for semaphores, but a more procedural API for mutexes? I always wondered why I can't mutex->lock, and instead have to Mutex::lock(mutex).
