Mandalika's scratchpad

Sunday, September 04, 2005
 
Sun Studio C/C++: Profile Feedback Optimization II

Most of the related information is already available at: Sun Studio C/C++: Profile Feedback Optimization. This post covers the pieces of PFO (aka Feedback Based Optimization, FBO) that were missing from the previous post.

Compiling with multiple profiles

Even though the C++ compiler options documentation does not say so explicitly, the Sun C/C++ compilers accept multiple profiles on the compile line, each specified with its own -xprofile=use:<dir> option. Listing several directories in a single option, as in -xprofile=use:<dir>:<dir>..<dir>, results in a compilation error.

e.g.,
CC -xO4 -xprofile=use:/tmp/prof1.profile -xprofile=use:/tmp/prof2.profile driver.cpp

When the compiler encounters multiple profiles on the compile line, it merges all the data before proceeding with the optimizations based on the feedback data.
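
To make "merging" a little more concrete, here is a toy sketch in C. How the compiler really combines profiles is not described in this post; the record layout (`struct routine_count`) and the idea of simply adding per-routine counters are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-routine execution counter.
 * This is NOT the compiler's real profile format. */
struct routine_count {
    const char *name;
    unsigned long calls;
};

/* Assumed merge rule: add the counters of `src` into `dst`.
 * Both arrays must describe the same n routines in the same order. */
void merge_profiles(struct routine_count *dst,
                    const struct routine_count *src, size_t n)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; ++i) {
        dst[i].calls += src[i].calls;
    }
}
```

Under this assumed rule, a routine that ran 50 times in one training run and 10 times in another would be treated as having run 60 times overall.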

Building patches (contd.)

In general, it is always recommended to collect the profile data again whenever something changes in the source code. However, this may not be feasible when very large applications are built with feedback optimization. So organizations tend to skip the feedback data collection when the changes are limited to a very few lines (quick fixes), and to collect the data once the quick fixes have accumulated enough to release a patch cluster (aka fix pack). Normally fix packs contain the binaries for the entire product, and all the old binaries are replaced with the new ones when the patch is applied.

It is important to know how a simple change in the source code affects the feedback optimization in the presence of old profile data. Assume that an application is linked with a library, libstrimpl.so, that implements string comparison (__strcmp) and string length calculation (__strlen).

e.g.,
% cat strimpl.h
int __strcmp(const char *, const char *);
int __strlen(const char *);

% cat strimpl.c
#include <stdlib.h>
#include "strimpl.h"

int __strcmp(const char *str1, const char *str2 ) {
    int rc = 0;

    for(;;) {
        rc = *str1 - *str2;
        if(rc != 0 || *str1 == 0) {
            return (rc);
        }
        ++str1;
        ++str2;
    }
}

int __strlen(const char *str) {
    int length = 0;

    for(;;) {
        if (*str == 0) {
            return (length);
        } else {
            ++length;
            ++str;
        }
    }
}

% cat driver.c
#include <stdio.h>
#include "strimpl.h"

int main() {
    int i;

    for (i = 0; i < 50; ++i) {
        printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));
        printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));
    }

    return (0);
}
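
As a quick sanity check of what the driver prints, the two routines can be restated in a form that is easy to test by hand. This is only a sketch: the names `my_strcmp` and `my_strlen` are hypothetical (chosen to avoid the reserved double-underscore prefix), and the return value quoted afterwards assumes an ASCII character set.

```c
#include <assert.h>
#include <stddef.h>

/* Same logic as __strcmp in strimpl.c, under a hypothetical name. */
int my_strcmp(const char *str1, const char *str2)
{
    assert(str1 != NULL && str2 != NULL);
    for (;;) {
        int rc = *str1 - *str2;
        /* stop at the first difference or at the end of str1 */
        if (rc != 0 || *str1 == 0) {
            return rc;
        }
        ++str1;
        ++str2;
    }
}

/* Same logic as __strlen in strimpl.c, under a hypothetical name. */
int my_strlen(const char *str)
{
    int length = 0;

    assert(str != NULL);
    while (*str != 0) {
        ++length;
        ++str;
    }
    return length;
}
```

With ASCII, my_strcmp("pod", "podcast") returns 0 - 'c', that is -99, and my_strlen("Solaris10") returns 9, so each of the 50 iterations in the driver prints the same two values.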

Now let's assume that the driver was built with the feedback data, with the following commands:
cc -xO2 -xprofile=collect -G -o libstrimpl.so strimpl.c
cc -xO2 -xprofile=collect -lstrimpl -o driver driver.c
./driver
cc -xO2 -xprofile=use:driver -G -o libstrimpl.so strimpl.c
cc -xO2 -xprofile=use:driver -lstrimpl -o driver driver.c

For the next release of the driver, let's say the string library is extended with a routine that reverses the given string (__strreverse). Let's see what happens if we skip the profile data collection for this library after integrating the code for the __strreverse routine. The new code can be added anywhere (top, middle or end) in the source file.

Case 1: Assuming the routine was added at the bottom of the existing routines

% cat strimpl.c
#include <stdlib.h>
#include "strimpl.h"

int __strcmp(const char *str1, const char *str2 ) { ... }

int __strlen(const char *str) { ... }

char *__strreverse(const char *str) {
    int i, length = 0;
    char *revstr = NULL;

    length = __strlen(str);
    /* one extra byte for the terminating NUL */
    revstr = (char *) malloc (sizeof (char) * (length + 1));

    for (i = length; i > 0; --i) {
        *(revstr + i - 1) = *(str + length - i);
    }
    *(revstr + length) = '\0';

    return (revstr);
}

% cc -xO2 -xprofile=use:driver -G -o libstrimpl.so strimpl.c
warning: Profile feedback data for function __strreverse is inconsistent. Ignored.

This (adding the new code at the bottom of the source file) is the recommended/wisest thing to do if we don't want to collect the feedback data for the new code. That way the profile data for the existing routines remains consistent, and those routines get optimized as before. Since there is no feedback data available for the new code, the compiler simply optimizes it as it usually does without -xprofile.

Case 2: Assuming the routine was added somewhere in the middle of the source file

% cat strimpl.c
#include <stdlib.h>
#include "strimpl.h"

int __strcmp(const char *str1, const char *str2 ) { ... }

char *__strreverse(const char *str) {
    int i, length = 0;
    char *revstr = NULL;

    length = __strlen(str);
    /* one extra byte for the terminating NUL */
    revstr = (char *) malloc (sizeof (char) * (length + 1));

    for (i = length; i > 0; --i) {
        *(revstr + i - 1) = *(str + length - i);
    }
    *(revstr + length) = '\0';

    return (revstr);
}

int __strlen(const char *str) { ... }

% cc -xO2 -xprofile=use:driver -G -o libstrimpl.so strimpl.c
warning: Profile feedback data for function __strreverse is inconsistent. Ignored.
warning: Profile feedback data for function __strlen is inconsistent. Ignored.

As the compiler keeps track of the routines by line numbers, introducing some code in a routine makes its profile data inconsistent. Also, since the position of all the routines underneath the newly introduced code may change, their feedback data becomes inconsistent as well, and the compiler ignores that profile data to avoid introducing functional errors.

The same argument holds true when the new code is added at the top of the existing routines; but that is even worse, since the profile data for all the routines of this object becomes unusable (inconsistent). Have a look at the warnings from the following example:

Case 3: Assuming the routine was added at the top of the source file

% cat strimpl.c
#include <stdlib.h>
#include "strimpl.h"

char *__strreverse(const char *str) {
    int i, length = 0;
    char *revstr = NULL;

    length = __strlen(str);
    /* one extra byte for the terminating NUL */
    revstr = (char *) malloc (sizeof (char) * (length + 1));

    for (i = length; i > 0; --i) {
        *(revstr + i - 1) = *(str + length - i);
    }
    *(revstr + length) = '\0';

    return (revstr);
}

int __strcmp(const char *str1, const char *str2 ) { ... }

int __strlen(const char *str) { ... }

% cc -xO2 -xprofile=use:driver -G -o libstrimpl.so strimpl.c
warning: Profile feedback data for function __strreverse is inconsistent. Ignored.
warning: Profile feedback data for function __strcmp is inconsistent. Ignored.
warning: Profile feedback data for function __strlen is inconsistent. Ignored.
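
The pattern of warnings in the three cases can be reproduced with a toy model in C. This is a sketch only: the compiler's real consistency check is not documented here and presumably looks at more than a routine's first line. The `struct routine_pos` record and both helper functions are hypothetical, and the line numbers used afterwards are simply counted from the listings in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a routine and the source line where it starts. */
struct routine_pos {
    const char *name;
    int start_line;
};

/* Toy rule: the old profile data for routine `r` is usable only if the
 * old profile has an entry with the same name AND the same start line. */
int profile_is_consistent(const struct routine_pos *old_profile,
                          size_t n_old, const struct routine_pos *r)
{
    size_t i;

    assert(old_profile != NULL && r != NULL);
    for (i = 0; i < n_old; ++i) {
        if (strcmp(old_profile[i].name, r->name) == 0) {
            return old_profile[i].start_line == r->start_line;
        }
    }
    return 0; /* new routine: no feedback data was ever collected */
}

/* Count the routines of the new source that can still use the old data. */
size_t count_usable(const struct routine_pos *old_profile, size_t n_old,
                    const struct routine_pos *new_src, size_t n_new)
{
    size_t i, usable = 0;

    assert(new_src != NULL);
    for (i = 0; i < n_new; ++i) {
        usable += (size_t) profile_is_consistent(old_profile, n_old,
                                                 &new_src[i]);
    }
    return usable;
}
```

In the original strimpl.c, __strcmp starts at line 4 and __strlen at line 17. Appending __strreverse leaves both where they were (two routines usable), inserting it in the middle pushes __strlen down (one usable), and inserting it at the top moves both (none usable), which matches the one, two and three warnings shown above.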

SPARC, x86/x64 compatibility

At this time, the profile data is generated and processed differently on SPARC and on x86/x64 platforms, and the two are not compatible. That is, it is not possible to use the feedback data generated by the C/C++ compilers on SPARC on x86/x64 platforms, or vice versa.

However, there seems to be a plan in place to make them compatible in the Sun Studio 12 release.

Asynchronous profile collection

Currently, profile data collection requires the process to terminate in order to dump the feedback data. Also, with multithreaded processes, some of the profile data may be incomplete due to lock contention between the threads. If the process dynamically loads and unloads other libraries with the help of the dlopen() and dlclose() calls, it leads to indirect call profiling, which has its own share of problems in collecting the data.

Asynchronous profile collection eases all the problems mentioned above by letting the profiler thread periodically write out the profile data it is collecting. With asynchronous data collection, the probability of getting proper feedback data is high.
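
The general idea of a background thread that periodically writes out what has been collected so far can be sketched with POSIX threads. This is not Sun's implementation: the single counter standing in for the "profile", the 1 ms flush interval and all the names below are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical in-memory "profile": one execution counter. */
static atomic_ulong live_count;     /* updated by the application */
static atomic_ulong flushed_count;  /* last value written by the flusher */
static atomic_int stop_requested;

/* Flusher thread: copies the live counter out at a fixed interval, so
 * data survives even if the process never terminates cleanly. */
static void *flusher(void *arg)
{
    struct timespec pause = { 0, 1000000L }; /* 1 ms, arbitrary */

    (void) arg;
    while (!atomic_load(&stop_requested)) {
        atomic_store(&flushed_count, atomic_load(&live_count));
        nanosleep(&pause, NULL);
    }
    /* final flush so nothing collected before the stop is lost */
    atomic_store(&flushed_count, atomic_load(&live_count));
    return NULL;
}

/* Run `iterations` units of "work" while the flusher runs alongside,
 * then return what the flusher has written out. */
unsigned long run_with_periodic_flush(unsigned long iterations)
{
    pthread_t tid;
    unsigned long i;
    int rc;

    atomic_store(&live_count, 0UL);
    atomic_store(&flushed_count, 0UL);
    atomic_store(&stop_requested, 0);

    rc = pthread_create(&tid, NULL, flusher, NULL);
    assert(rc == 0);

    for (i = 0; i < iterations; ++i) {
        atomic_fetch_add(&live_count, 1UL);
    }

    atomic_store(&stop_requested, 1);
    pthread_join(tid, NULL);
    return atomic_load(&flushed_count);
}
```

In this sketch the application never has to exit for data to be written; the flusher publishes snapshots while the work is still in progress, and the value returned after the final flush equals the number of iterations performed.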

This feature will be available by default in Sun Studio 11, and as a patch to the Sun Studio 9 & 10 compilers. Stay tuned for the exact patch numbers for Studio 9 and 10.

Notes:
  1. When -xprofile=collect is used to compile a program for profile collection and -xprofile=use is used to compile a program with profile feedback, the source files and the compiler options other than -xprofile=collect and -xprofile=use must be identical in both compilations.

  2. If both -xprofile=collect and -xprofile=use are specified on the same command line, the rightmost -xprofile option on the command line is applied.

  3. If the code was compiled with the -g or -g0 option, the er_src utility shows how the compiler is optimizing with the feedback data. Here's how: Sun Studio C/C++: Annotated listing (compiler commentary) with er_src

Acknowledgements:
Chris Aoki, Sun Microsystems