Mandalika's scratchpad

Saturday, March 18, 2006
 
Solaris: Better scalability with libumem

Scalability issues with the standard memory allocator

Multi-threaded applications often do not scale well with the standard memory allocator, because the heap is a bottleneck. When multiple threads allocate or free memory at the same time, the allocator serializes them. As more threads are added, more of them end up waiting and the wait time grows longer, so execution gets progressively slower. Because of this behavior, programs that use the allocator intensively can actually slow down as the number of processors increases. Standard malloc therefore works well in single-threaded applications, but poses serious scalability issues for multi-threaded applications running on multi-processor (SMP) servers.
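One way to observe the effect described above is a small stress test in which every thread does nothing but allocate and free. The following is a minimal sketch, not a rigorous benchmark; the names worker_arg_t, worker, run_alloc_test and the MAX_THREADS limit are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define MAX_THREADS 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical per-thread work description. */
typedef struct {
    long iterations;   /* number of malloc/free pairs to perform */
    size_t size;       /* size of each allocation in bytes */
    long completed;    /* pairs actually completed */
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        volatile char *buf = malloc(arg->size);
        if (buf == NULL)
            break;
        buf[0] = (char)i;      /* touch the buffer so the pair is not optimized away */
        free((void *)buf);
        arg->completed++;
    }
    return NULL;
}

/*
 * Run nthreads workers concurrently and return the elapsed wall-clock
 * time in seconds, or -1.0 on error. *total_done receives the number of
 * malloc/free pairs completed by all threads.
 */
double run_alloc_test(int nthreads, long iterations, size_t size, long *total_done)
{
    pthread_t tids[MAX_THREADS];
    worker_arg_t args[MAX_THREADS];
    struct timespec start, end;
    int created = 0;

    assert(total_done != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS || size == 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].size = size;
        args[i].completed = 0;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        created++;
    }

    /* Always join whatever was started, even if a later create failed. */
    *total_done = 0;
    for (int i = 0; i < created; i++) {
        pthread_join(tids[i], NULL);
        *total_done += args[i].completed;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (created < nthreads)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling run_alloc_test(n, 1000000, 20, &done) for n = 1, 2, 4, 8 and comparing the returned times gives a rough picture: with a serializing allocator the elapsed time tends to grow with n even though each thread does the same amount of work. The actual numbers depend entirely on the machine and the allocator in use.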

Solution: libumem, a userland slab allocator

Sun started shipping libumem, a userland slab (memory) allocator, with Solaris 9 Update 3. libumem provides faster and more efficient memory allocation by using an object caching mechanism. Object caching is a strategy in which memory that is frequently allocated and freed is kept in a cache, so the overhead of creating the same data structure(s) over and over is reduced considerably. In addition, a per-CPU set of caches (called magazines) improves the scalability of libumem by allowing a far less contentious locking scheme when memory is requested from the system. Together, object caching and magazines let the application run faster, with lower lock contention among multiple threads.
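The object caching idea can be illustrated with a toy free-list cache. This is only a sketch of the general technique, not libumem's implementation or its API; the type obj_cache_t and the cache_* functions are hypothetical names invented for this example, and the cache is deliberately unlocked (single-threaded).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A freed object is reused as a link in the free list. */
typedef struct free_node {
    struct free_node *next;
} free_node_t;

/* Toy cache serving objects of one fixed size. */
typedef struct {
    size_t obj_size;         /* size of every object served by this cache */
    free_node_t *free_list;  /* freed objects kept around for reuse */
    long hits;               /* allocations satisfied from the free list */
    long misses;             /* allocations that fell through to malloc */
} obj_cache_t;

void cache_init(obj_cache_t *c, size_t obj_size)
{
    assert(c != NULL);
    /* An object must be large enough to hold the free-list link. */
    c->obj_size = obj_size < sizeof(free_node_t) ? sizeof(free_node_t) : obj_size;
    c->free_list = NULL;
    c->hits = 0;
    c->misses = 0;
}

void *cache_alloc(obj_cache_t *c)
{
    if (c->free_list != NULL) {
        /* Fast path: hand back a previously freed object. */
        free_node_t *n = c->free_list;
        c->free_list = n->next;
        c->hits++;
        return n;
    }
    /* Slow path: go to the underlying allocator. */
    c->misses++;
    return malloc(c->obj_size);
}

void cache_free(obj_cache_t *c, void *obj)
{
    /* Keep the object for reuse instead of returning it to the heap. */
    free_node_t *n = obj;
    assert(obj != NULL);
    n->next = c->free_list;
    c->free_list = n;
}

void cache_destroy(obj_cache_t *c)
{
    /* Return all cached objects to the underlying allocator. */
    while (c->free_list != NULL) {
        free_node_t *n = c->free_list;
        c->free_list = n->next;
        free(n);
    }
}
```

Giving each thread (or each CPU) its own obj_cache_t means the fast path touches no shared state and needs no lock, which is the intuition behind the per-CPU caches mentioned above. The real magazine layer is considerably more elaborate.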

libumem carves its buffers out of pages (the default page size is 8K on Solaris/SPARC) and serves each request from a cache of fixed-size buffers. That means a request is rounded up to the next available buffer size: a request for 20 bytes, for example, is satisfied with a 24-byte buffer on the SPARC platform, and libumem returns a pointer to that block. As these requests add up, the result is internal fragmentation: the extra memory that libumem allocated but the application never asked for is wasted. libumem also uses 8 bytes of every buffer it creates to keep metadata about that buffer. For these two reasons, expect a slight increase in the per-process memory footprint.
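The footprint cost can be estimated with simple arithmetic. The sketch below assumes that buffer sizes are multiples of 8 bytes, which is what the 20-byte to 24-byte example implies, and uses the 8-byte metadata figure quoted above; the constants and function names are illustrative only, and the buffer sizes libumem really uses may differ.

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_ALIGN 8   /* assumption: buffer sizes are multiples of 8 bytes */
#define ASSUMED_TAG   8   /* per-buffer metadata, as quoted in the text */

/* Round request up to the next multiple of align. */
size_t round_up(size_t request, size_t align)
{
    assert(align > 0);
    return (request + align - 1) / align * align;
}

/* Bytes handed out but never asked for (internal fragmentation). */
size_t padding_bytes(size_t request)
{
    return round_up(request, ASSUMED_ALIGN) - request;
}

/* Padding plus metadata: the extra footprint of one allocation. */
size_t total_overhead(size_t request)
{
    return padding_bytes(request) + ASSUMED_TAG;
}
```

Under these assumptions a 20-byte request costs 4 bytes of padding plus 8 bytes of metadata, 12 bytes in all. That is negligible for a single allocation, but it adds up for a process that keeps millions of small objects alive.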

More interesting information about libumem can be found in the article Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources.

libumem can also be used to catch memory management bugs in an application, such as memory leaks and heap corruption. The article Identifying Memory Management Bugs Within Applications Using the libumem Library walks through the detailed steps, with lucid explanations and examples.

Quick tip:
Run "truss -c -p <pid>", and stop the data collection with Ctrl-C (^C) after some time, say 60 seconds. If the summary shows a large number of calls to lwp_park, lwp_unpark and lwp_mutex_timedlock, it is an indication that the application is suffering from lock contention, and hence may not scale well. Consider linking your application with the libumem library (-lumem), or preloading libumem at run time (for example, by setting LD_PRELOAD=libumem.so.1), for better scalability.
