Scalability issues with the standard memory allocator

It is well known that multi-threaded applications do not scale well with the standard memory allocator, because the heap is a bottleneck. When multiple threads simultaneously allocate or deallocate memory, the allocator serializes them. As more threads are added, more of them wait on the heap lock and the wait time grows longer, resulting in increasingly slower execution. Due to this behavior, programs that use the allocator intensively can actually slow down as the number of processors increases. Hence the standard malloc works well in single-threaded applications, but poses serious scalability issues for multi-threaded applications running on multi-processor (SMP) servers.
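The effect is easy to demonstrate with a trivial microbenchmark (a sketch of my own, not from any Sun source): a handful of threads doing nothing but malloc() and free(). On an SMP machine, the wall-clock time of such a loop typically grows with the thread count, because every thread serializes on the same heap lock.

    #include <stdlib.h>
    #include <pthread.h>

    #define NTHREADS   8
    #define ITERATIONS 1000000

    /* every thread contends for the same heap lock inside malloc()/free() */
    static void *
    hammer(void *arg)
    {
            int i;
            for (i = 0; i < ITERATIONS; i++) {
                    void *p = malloc(64);
                    if (p == NULL)
                            abort();
                    free(p);
            }
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t tid[NTHREADS];
            int i;

            for (i = 0; i < NTHREADS; i++)
                    (void) pthread_create(&tid[i], NULL, hammer, NULL);
            for (i = 0; i < NTHREADS; i++)
                    (void) pthread_join(tid[i], NULL);
            return (0);
    }

Compile with "cc -mt" (or "gcc -pthread"), time the run, and then compare against the same binary with libumem preloaded, as described below.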
Solution: libumem, a userland slab allocator

Sun started shipping libumem, a userland slab (memory) allocator, with Solaris 9 Update 3.
libumem provides faster and more efficient memory allocation by using an object caching mechanism. Object caching is a strategy in which frequently allocated and freed memory is cached, so the overhead of creating and tearing down the same data structure(s) over and over is reduced considerably. In addition, per-CPU sets of caches (called magazines) improve the scalability of libumem by allowing it to use a far less contentious locking scheme when requesting memory from the system. Thanks to this object caching strategy, the application runs faster, with lower lock contention among multiple threads.
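Applications can also use the object caching interfaces directly; they are documented in umem_cache_create(3MALLOC). Here is a minimal sketch (the request_t structure and the cache name are my own illustration):

    #include <umem.h>

    typedef struct request {
            int     id;
            char    payload[120];
    } request_t;

    int
    main(void)
    {
            umem_cache_t *rcache;
            request_t *rq;

            /* a cache of fixed-size request_t buffers; no constructor/destructor */
            rcache = umem_cache_create("request_cache", sizeof (request_t),
                0, NULL, NULL, NULL, NULL, NULL, 0);
            if (rcache == NULL)
                    return (1);

            /* the allocation is satisfied from a per-CPU magazine when possible */
            rq = umem_cache_alloc(rcache, UMEM_DEFAULT);
            if (rq != NULL)
                    umem_cache_free(rcache, rq);    /* back to the magazine */

            umem_cache_destroy(rcache);
            return (0);
    }

Link such a program with -lumem.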
libumem is a page-based memory allocator: it obtains memory from the system in page-sized slabs (the default page size on Solaris/SPARC is 8K) and carves them into fixed-size buffers. If a request is made to allocate 20 bytes, libumem rounds it up to the nearest buffer size (24 bytes on the SPARC platform) and returns a pointer to the allocated block. As these requests add up, this can lead to internal fragmentation, since the extra memory that was not requested by the application but was allocated by libumem is wasted. Also, libumem uses 8 bytes of every buffer it creates to keep metadata about that buffer. For these reasons, there will be a slight increase in the per-process memory footprint.
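To make the rounding concrete, here is a tiny sketch using the umem_alloc(3MALLOC) interfaces (the sizes in the comments assume the SPARC defaults described above):

    #include <umem.h>

    int
    main(void)
    {
            /*
             * A 20-byte request is served from the 24-byte cache, so
             * 4 bytes of the buffer are internal fragmentation.
             */
            void *p = umem_alloc(20, UMEM_DEFAULT);

            if (p != NULL)
                    umem_free(p, 20);   /* umem_free() takes the original request size */
            return (0);
    }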
More interesting information about libumem can be found in the article Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources.
libumem can also be used to catch memory management bugs in an application, such as memory leaks and heap corruption. The article Identifying Memory Management Bugs Within Applications Using the libumem Library has the detailed steps for catching such bugs, with lucid explanations and examples.
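The workflow from that article boils down to something like the following (the environment variables are described in umem_debug(3MALLOC); "myapp" is a hypothetical application name):

    % LD_PRELOAD=libumem.so.1 UMEM_DEBUG=default UMEM_LOGGING=transaction ./myapp &
    % gcore <pid>
    % mdb core.<pid>
    > ::findleaks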
Quick tip:
Run "
truss -c -p <pid>
", and stop the data collection with Ctrl-c (^c) after some time say 60 sec. If you see more number of system calls to
lwp_park, lwp_unpark, lwp_mutex_timedlock
, it is an indication that the application is suffering from lock contention, and hence may not scale well. Consider linking your application with
libumem
library, or pre-load
libumem
during run-time, for better scalability.
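For example, preload libumem without relinking:

    % LD_PRELOAD=libumem.so.1 ./myapp

or add -lumem to the link line when building the application.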
Technorati tags: Solaris | OpenSolaris