Mandalika's scratchpad


Thursday, January 26, 2006
 
64-bit dbx: internal error: signal SIGBUS (invalid address alignment)

The other day I was chasing a lock contention issue in a 64-bit application running on Solaris 10 Update 1, and ran into an unexpected dbx crash. My intention was to find the hot locks with the help of dbx's threads, syncs, and sync -info <address> commands. (Of course, most of this information can be obtained with the modular debugger, mdb, too, but that's a different story.) When I attached dbx to the running process, dbx crashed while loading some of the libraries, with the following error:

dbx: internal error: signal SIGBUS (invalid address alignment)
dbx's coredump will appear in /tmp


As the error message indicates, the problem is one of alignment: in this case, a 64-bit quantity in the object being loaded does not sit on a 64-bit boundary. According to the bug report "dbx dumped core when hitting a misaligned load in libdl.so.1", elfsign appears to be responsible for the misaligned section headers in Solaris 10's 64-bit libraries. Sun has since fixed elfsign and released properly aligned versions of most of the 64-bit libraries as patches through the SunSolve web site. So if a system running Solaris 10 (or later) is not patched with the fixed libraries, it is still possible to see dbx crashes. 64-bit dbx expects the libraries to be properly aligned on their boundaries, and it simply crashes when it encounters a misaligned one.
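To make the alignment requirement concrete, here is a minimal sketch of the arithmetic involved. It assumes the value being checked is the section header table offset (the e_shoff field of the ELF header). The helper name is_aligned8 and the sample offsets are made up for illustration, and the snippet needs a POSIX shell such as ksh or bash, since the legacy Bourne /bin/sh may not support $(( )).

```shell
# Hypothetical helper (not part of the original post): succeeds when the
# given offset is a multiple of 8, i.e. sits on a 64-bit boundary.
is_aligned8() {
    [ $(( $1 % 8 )) -eq 0 ]
}

# Sample offsets, invented for illustration only.
for off in 0x1a2b8 0x1a2bc 0x1a2b4; do
    if is_aligned8 "$off"; then
        echo "$off: aligned on a 64-bit boundary"
    else
        echo "$off: misaligned"
    fi
done
```

An offset whose last hex digit is 0 or 8 passes this check. One ending in 4 or c is aligned to 4 bytes but not to 8.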

Perl script to check misaligned section headers in 64-bit objects

(Credit: Chris Quenelle)
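#!/usr/bin/perl
# Report 64-bit shared libraries whose ELF section header table offset
# (e_shoff) is not aligned on an 8-byte boundary.

use strict;
use warnings;
use File::Find;

sub wanted {
    # Consider only executable regular files named like libfoo.so.N
    return unless -x && -f;
    return unless /\.so\.[0-9]$/;
    return unless `/bin/file $_ | grep 64-bit`;
    my $out = `elfdump -e $_ | grep e_shoff`;
    # Skip the file if no e_shoff value could be extracted
    return unless $out =~ m/e_shoff:\s+(0x[0-9a-f]+)\s/;
    my $shoff = $1;
    # A last hex digit of 4 or c means 4-byte but not 8-byte alignment
    if ($shoff =~ m/[4c]$/) {
        print "bad alignment in file: $File::Find::name\n";
    }
}

print "Looking for bad ELF section header table alignment in 64-bit files\n";
find(\&wanted, ( "/lib", "/usr/lib" ));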

Make the script executable with chmod +x <script>.pl and run it directly, i.e., ./<script>.pl (root privileges are not needed).

Fix to Sun Studio 11's 64-bit dbx

If the Perl script above prints any library names, it is very likely that the existing 64-bit dbx will not work on that Solaris 10 (or later) system. Chris Quenelle of Sun Microsystems posted on his blog a fixed 64-bit Studio 11 dbx binary that doesn't crash even when it encounters misaligned ELF section headers. The binary can be downloaded from: http://mediacast.sun.com/share/quenelle/dbx.ss11.align.fix.gz

Note that this is an unsupported binary. Do not copy it over the existing Studio 11 64-bit dbx. Place it somewhere else and make it executable, then attach the new 64-bit dbx binary to the running process.
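The steps above might be scripted roughly as follows. This is only a sketch: it assumes the archive has already been downloaded, that gunzip is on the PATH, and that a POSIX shell such as ksh or bash is used. The function name install_fixed_dbx and the destination directory are hypothetical.

```shell
# Hypothetical helper, not from the original post: unpack the downloaded
# archive into a separate directory and mark the result executable.
install_fixed_dbx() {
    archive=$1      # path to the downloaded dbx.ss11.align.fix.gz
    dest_dir=$2     # any directory outside the Studio 11 installation
    target="$dest_dir/$(basename "$archive" .gz)"

    mkdir -p "$dest_dir" || return 1
    # gunzip -c writes to stdout, so the downloaded archive is left intact
    gunzip -c "$archive" > "$target" || return 1
    chmod +x "$target" || return 1
    echo "$target"
}

# Example use (paths are assumptions; 14901 is the process id used in this post):
#   fixed_dbx=$(install_fixed_dbx ./dbx.ss11.align.fix.gz "$HOME/dbx-align-fix")
#   "$fixed_dbx" - 14901
```

Unpacking into a directory of its own keeps the supported Studio 11 dbx untouched, so going back to it is just a matter of running the original binary again.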

Quick summary

Sun Studio's 64-bit dbx (any version) may crash on Solaris 10 (and later) when it loads 64-bit libraries with misaligned ELF section headers. An unsupported fix can be downloaded from http://mediacast.sun.com/share/quenelle/dbx.ss11.align.fix.gz

More information in Chris Quenelle's web log:
  1. Two bad Solaris bugs that affect dbx users
  2. misaligned ELF section headers

Example showing the dbx crash

% /home/dev/SS11/SUNWspro/prod/bin/sparcv9/dbx - 14901
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.1' in your .dbxrc
Reading -
Reading ld.so.1
Reading libmdxmembernamecache.so
Reading libnqcachestorage.so
Reading libnqcryptography.so
Reading libnqperf.so
Reading libnqportable.so
Reading libnqscache.so
...
...
Reading libsocket.so.1
Reading librt.so.1
Reading libCstd.so.1
Reading libCrun.so.1
Reading libm.so.1
Reading libthread.so.1
Reading libc.so.1
Reading libdl.so.1
Reading libnsl.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libm.so.2
Reading libc_psr.so.1
Reading en_US.ISO8859-1.so.3

dbx: internal error: signal SIGBUS (invalid address alignment)
dbx's coredump will appear in /tmp
detaching from process 14901
Abort (core dumped)
