Thursday, August 04, 2005

Sun Studio C/C++: Annotated listing (compiler commentary) with er_src

Wouldn't it be nice to know exactly what the compiler did when we specify a set of optimization flags on the compile line? We specify a wide variety of compiler options in the hope that the resulting binary performs better. But unless we know for sure that a certain compilation flag helps, most of the time it appears that the compiler is doing nothing, and we may even think that certain options are there just to give the user a placebo effect.

Sun ships a tool called er_src with the Sun Studio compilers so that users can examine the optimizations done by the compiler(s). With er_src available, compiler optimizations no longer have to be treated as a "black box".

The compiler logs most of its actions in the stabs/DWARF debug information of the ELF object file when the source is compiled with the -g (debug) flag. These compiler-generated messages are called "compiler commentary", and er_src shows them interspersed with the source code at the places where the compiler did some optimization or transformation. When the source is compiled with -g, the compiler commentary and the location (path name) of the source file are stored in the object file (.o); the source code itself is not. The er_src tool reads the source file and interleaves the compiler commentary in its output, so the original source file has to be present at the path that was stored in the object file during compilation. [Thanks to Chris Quenelle for the correction]

With er_src <object-file>, er_src dumps all the source along with compiler commentary. It is also possible to get the commentary and the disassembly for all or selected functions. Note that er_src even accepts Java class (.class) files.

Read the er_src man page for an explanation of how to read the compiler commentary in object files, for instance to determine for which functions the compiler actually made a substitution.

For example:
 % cat string.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int __strcmp(const char *str1, const char *str2 ) {
int rc = 0;

for(;;) {
rc = *str1 - *str2;
if(rc != 0 || *str1 == 0) {
return (rc);
}
++str1;
++str2;
}
}

int __strlen(const char *str) {
int length = 0;

for(;;) {
if (*str == 0) {
return (length);
} else {
++length;
++str;
}
}
}

char *__strreverse(const char *str) {
int i, length = 0;
char *revstr = NULL;

length = __strlen(str);
revstr = (char *) malloc (sizeof (char) * length);

for (i = length; i > 0; --i) {
*(revstr + i - 1) = *(str + length - i);
}

return (revstr);
}

int main() {
printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));
printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));
printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));

return (0);
}
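
One caveat about the listing above: __strreverse allocates exactly length bytes and never writes a terminating NUL, so printing the result with %s can read past the end of the buffer. Below is a minimal sketch of a terminated variant. The names str_length and str_reverse are hypothetical (chosen to stay out of the double-underscore namespace that C reserves for the implementation), and the er_src listings in this post still refer to the original string.c, not to this variant.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical name; same counting loop as __strlen in string.c. */
int str_length(const char *str) {
    int length = 0;

    assert(str != NULL);
    while (*str != 0) {
        ++length;
        ++str;
    }
    return length;
}

/* Hypothetical name; returns a malloc'ed, NUL-terminated reversed copy,
   or NULL if the allocation fails. The caller frees the result. */
char *str_reverse(const char *str) {
    int i, length;
    char *revstr;

    length = str_length(str);
    revstr = malloc((size_t)length + 1);   /* +1 for the terminator */
    if (revstr == NULL) {
        return NULL;
    }

    for (i = length; i > 0; --i) {
        revstr[i - 1] = str[length - i];
    }
    revstr[length] = '\0';

    return revstr;
}
```

The only changes from the original routine are the extra byte, the explicit terminator, and the check on the malloc result.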

To see what the compiler does with this code at the -xO4 optimization level, compile the source with the -g -xO4 options:
% cc -c -g -xO4 string.c

% er_src string.o
Source file: ./string.c
Object file: ./string.o
Load Object: ./string.o

1. #include <stdio.h>
2. #include <string.h>
3. #include <stdlib.h>
4.
5. int __strcmp(const char *str1, const char *str2 ) {


Bounds test for loop below moved to top of loop
6. int rc = 0;
7.
8. for(;;) {
9. rc = *str1 - *str2;
10. if(rc != 0 || *str1 == 0) {
11. return (rc);
12. }
13. ++str1;
14. ++str2;
15. }
16. }
17.
18. int __strlen(const char *str) {


Bounds test for loop below moved to top of loop
19. int length = 0;
20.
21. for(;;) {
22. if (*str == 0) {
23. return (length);
24. } else {
25. ++length;
26. ++str;
27. }
28. }
29. }
30.
31. char *__strreverse(const char *str) {
32. int i, length = 0;
33. char *revstr = NULL;
34.

Function __strlen inlined from source file string.c into the code for the following line
Bounds test for loop below moved to top of loop

35. length = __strlen(str);
36. revstr = (char *) malloc (sizeof (char) * length);
37.

Loop below scheduled with steady-state cycle count = 3 <= indicates that
software pipelining (modulo scheduling) has been applied

Loop below unrolled 1 times
Loop below has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration

38. for (i = length; i > 0; --i) {
39. *(revstr + i - 1) = *(str + length - i);
40. }
41.
42. return (revstr);
43. }
44.
45.
46. int main() {

Function __strcmp inlined from source file string.c into the code for the following line
Bounds test for loop below moved to top of loop

47. printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));

Function __strlen inlined from source file string.c into the code for the following line
Bounds test for loop below moved to top of loop

48. printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));

Function __strreverse inlined from source file string.c into the code for the following line
Function __strlen inlined from source file string.c into inline copy of function __strreverse
Bounds test for loop below moved to top of loop
Loop below scheduled with steady-state cycle count = 3
Loop below unrolled 1 times
Loop below has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration

49. printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));
50.
51. return (0);
52. }
From this listing, it is clear that the compiler tried its best to optimize the code by inlining the routines, and by applying loop unrolling and other loop transformations. Of course, these are the things it is supposed to do with the documented -xO4 option. But since the compiler's predictions may not be correct all the time, it is the responsibility of the user (say, the developer) to find out how the code is being laid out and, if not satisfied with the outcome, to give the compiler more hints through compiler-supported pragmas, profile feedback, rearranging the code, etc.
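
As an illustration of the pragma route, here is a minimal sketch that asks the compiler not to inline one routine. It assumes the "#pragma no_inline (funcname)" form described in the Sun Studio C User's Guide; check the guide for your compiler release before relying on it. The function name count_chars is hypothetical, and the __SUNPRO_C guard is only there so that other compilers do not complain about an unknown pragma.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine; same counting loop as __strlen in string.c. */
int count_chars(const char *str);

#if defined(__SUNPRO_C)
/* Assumption: Sun C accepts this pragma once the prototype is visible. */
#pragma no_inline (count_chars)
#endif

int count_chars(const char *str) {
    int length = 0;

    assert(str != NULL);
    while (*str != 0) {
        ++length;
        ++str;
    }
    return length;
}
```

Whether the hint is honored, and how the commentary reports it, is something to confirm by running er_src on your own object file.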

Now let's see what the compiler thinks about the same code, if we provide some feedback about run-time behavior of the program.
 % cc -g -xO4 -xprofile=collect -o string string.c

% ./string

strcmp(pod, podcast) = -99 <- returns 0 if matches
strlen(Solaris10) = 9
reverse(Solaris10) = 01siraloS

% ls -ld string.profile
drwxrwxrwx 2 build engr 512 Aug 4 17:23 string.profile/

% cc -g -xO4 -xprofile=use:string -c string.c

% er_src string.o
Source file: ./string.c
Object file: ./string.o
Load Object: ./string.o

1. #include <stdio.h>
2. #include <string.h>
3. #include <stdlib.h>
4.
5. int __strcmp(const char *str1, const char *str2 ) {

6. int rc = 0;
7.
8. for(;;) {
9. rc = *str1 - *str2;
10. if(rc != 0 || *str1 == 0) {
11. return (rc);
12. }
13. ++str1;
14. ++str2;
15. }
16. }
17.
18. int __strlen(const char *str) {

19. int length = 0;
20.
21. for(;;) {
22. if (*str == 0) {
23. return (length);
24. } else {
25. ++length;
26. ++str;
27. }
28. }
29. }
30.
31. char *__strreverse(const char *str) {

32. int i, length = 0;
33. char *revstr = NULL;
34.

Function __strlen not inlined because the profile-feedback execution count is too low
35. length = __strlen(str);
36. revstr = (char *) malloc (sizeof (char) * length);
37.

Loop below scheduled with steady-state cycle count = 3
Loop below unrolled 1 times
Loop below has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration

38. for (i = length; i > 0; --i) {
39. *(revstr + i - 1) = *(str + length - i);
40. }
41.
42. return (revstr);
43. }
44.
45.
46. int main() {


Function __strcmp not inlined because the profile-feedback execution count is too low
47. printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));

Function __strlen not inlined because the profile-feedback execution count is too low
48. printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));

Function __strreverse not inlined because the profile-feedback execution count is too low
49. printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));
50.
51. return (0);
52. }
This time, the compiler decided that it is not very beneficial to inline the routines because their execution frequency is too low (each call site executed only once in this run). Of course, that is what profile feedback optimization is supposed to do, i.e., optimize the code based on run-time feedback. In this example, -xO4 and -xprofile (profile feedback optimization) work together to make the best decision.
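
Since the decision depends on execution counts, the training run matters. The sketch below is a hypothetical driver, not part of string.c: it calls a length routine many times, so that a -xprofile=collect run would record a high count for that call site. Whether the compiler then chooses to inline the call is still its own decision. The names count_chars and run_workload, and any iteration count you pass in, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine; same counting loop as __strlen in string.c. */
int count_chars(const char *str) {
    int length = 0;

    assert(str != NULL);
    while (*str != 0) {
        ++length;
        ++str;
    }
    return length;
}

/* Hypothetical training workload: calls count_chars `iterations` times
   and returns the sum of the results, so every result is actually used. */
long run_workload(const char *str, long iterations) {
    long total = 0;
    long i;

    for (i = 0; i < iterations; ++i) {
        total += count_chars(str);
    }
    return total;
}
```

A training run is only as useful as it is representative: a driver like this should mirror how the real program calls the routine, otherwise the feedback steers the compiler in the wrong direction.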

A few more examples:
To list all functions from the given object:
% er_src -func string.o

Functions sorted in lexicographic order

Load Object:

Address Size Name

0x00000000 64 __strcmp
0x00000040 72 __strlen
0x00000088 184 __strreverse
0x00000140 372 main
To print the compiler commentary only for changes involving inlining:
% er_src -cc inline string.o
...
...

29. }
30.
31. char *__strreverse(const char *str) {
32. int i, length = 0;
33. char *revstr = NULL;
34.

Function __strlen inlined from source file string.c into the code for the following line
35. length = __strlen(str);
36. revstr = (char *) malloc (sizeof (char) * length);
37.
...
...

43. }
44.
45.
46. int main() {

Function __strcmp inlined from source file string.c into the code for the following line
47. printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));

Function __strlen inlined from source file string.c into the code for the following line
48. printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));

Function __strreverse inlined from source file string.c into the code for the following line
Function __strlen inlined from source file string.c into inline copy of function __strreverse
49. printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));
50.
51. return (0);
52. }
To print disassembly:
% er_src -disasm all -1 string.o
---------------------------------------
Annotated disassembly
---------------------------------------
Source file: ./string.c
Object file: ./string.o
Load Object: ./string.o

1. #include <stdio.h>
2. #include <string.h>
3. #include <stdlib.h>
4.
5. int __strcmp(const char *str1, const char *str2 ) {

[ 5] 0: ldsb [%o0], %o4
[ 5] 4: mov %o0, %o3

Bounds test for loop below moved to top of loop
6. int rc = 0;
7.
8. for(;;) {
9. rc = *str1 - *str2;
[ 9] 8: ldsb [%o1], %o5
[ 9] c: subcc %o4, %o5, %o0
10. if(rc != 0 || *str1 == 0) {
[10] 10: bne,pn %icc,0x38
[10] 14: cmp %o4, 0
[10] 18: be,pn %icc,0x38
11. return (rc);
12. }
13. ++str1;
[13] 1c: inc %o3
[ 9] 20: ldsb [%o1 + 1], %o5
[ 9] 24: ldsb [%o3], %o4
14. ++str2;
[14] 28: inc %o1
[ 9] 2c: subcc %o4, %o5, %o0
[10] 30: be,pt %icc,0x18
[10] 34: cmp %o4, 0
[11] 38: retl
[11] 3c: nop
15. }
16. }
17.
...
...
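
For readers who do not read SPARC assembly, here is a rough C rendering of the control flow in the __strcmp disassembly above: the first comparison is done before the loop is entered, and the loop-closing branch sits at the bottom. This is my reading of the listing, not output from any tool, and the name strcmp_rotated is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; hand-written approximation of the compiled loop. */
int strcmp_rotated(const char *str1, const char *str2) {
    int c1, rc;

    assert(str1 != NULL && str2 != NULL);

    c1 = *str1;                 /* offsets 0 and 8: first loads */
    rc = c1 - *str2;            /* offset c: first subtraction */
    if (rc != 0) {              /* offset 10: leave if they differ */
        return rc;
    }
    while (c1 != 0) {           /* offset 18: leave at end of string */
        ++str1;
        ++str2;
        c1 = *str1;             /* offsets 20 and 24: next loads */
        rc = c1 - *str2;        /* offset 2c: next subtraction */
        if (rc != 0) {          /* offset 30: loop back only while equal */
            break;
        }
    }
    return rc;
}
```

It returns the same values as the original __strcmp; only the placement of the tests differs.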

2 comments:

  1. Just a minor nit. The source code doesn't get stored in your .o file. The .o file only has the pathname to the source code that went into it.
    er_src reads the source file and interleaves the compiler commentary in the output.

  2. Thanks, Chris. Unfortunately I'm always falling for these er_* tools. Earlier it was er_print, and now er_src. Even though I thought it is a risk to encode the source in object file (which is wrong of course), stripping the binary relieved me a bit. I think I need to work more on the run machine where I can't access the source, instead of build machine.
