Profile Feedback
--------------------
In general, a compiler optimizes code based on the flags supplied at compile time, plus any other transformations it considers appropriate even though the user has not asked for them specifically, e.g., inlining certain routines that are not marked with the inline keyword, loop unrolling, etc. However, since it cannot predict the run-time behavior of the application, it cannot make well-informed decisions at compile time about optimizations such as block reordering and register allocation. So it is up to the developer to lay out the code carefully for better performance. As developers are not the end users in most cases, it is cumbersome for them to gather run-time data, identify the hot code (where the application spends most of its time), and rewrite some of the code to improve performance.
The developer can be relieved of such tasks by giving the optimizer a program profile, i.e., a description of the run-time behavior of the program: the expected probabilities of branches and the execution frequencies of given program blocks.
Sun Studio C/C++ compilers support the profile feedback technique to automate the tasks mentioned above. Profile feedback is a mechanism by which a user can gather information about the run-time behavior of the application and let the compiler use this information to optimize the application further.
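To make the idea of a program profile a little more concrete, here is a minimal hand-written sketch of the kind of data that gets recorded: one counter for how often a block runs and one for how often a branch is taken. It uses a simple exchange sort (the same algorithm as the bubblesort.c example in this post). The names struct profile, profiled_sort and taken_probability are made up for illustration; this is not how the compiler's instrumentation is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile record: one counter per point of interest. */
struct profile {
    unsigned long compares;   /* times the inner-loop body was executed */
    unsigned long swaps;      /* times the "if" branch was taken        */
};

/* Exchange sort with hand-inserted counters (illustration only). */
void profiled_sort(int *a, size_t n, struct profile *p)
{
    size_t i, j;

    assert(a != NULL && p != NULL);
    for (i = 0; i < n; ++i) {
        for (j = i + 1; j < n; ++j) {
            ++p->compares;              /* block execution frequency */
            if (a[i] > a[j]) {
                int tmp = a[i];

                ++p->swaps;             /* branch-taken count */
                a[i] = a[j];
                a[j] = tmp;
            }
        }
    }
}

/* Observed probability that the branch is taken; 0.0 if never reached. */
double taken_probability(const struct profile *p)
{
    assert(p != NULL);
    return p->compares ? (double)p->swaps / (double)p->compares : 0.0;
}
```

Calling profiled_sort on an array with a zero-initialized struct profile and then reading taken_probability gives the sort of branch statistic an optimizer could use to decide which path to favor when laying out the code.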
Typical steps involved in using the profile feedback mechanism:
- Compilation for profile data collection
  - Build the application with the -xprofile=collect option. In this step, the code is instrumented to gather data: counters are inserted to record how many times each piece of code is executed. This data will be used to build a control flow graph. -xprofile=collect may suppress optimizations that modify the structure of the control flow graph, in order to preserve the accuracy of the description used to generate control flow instrumentation.
  - You can also specify the name of the program being analyzed when compiling with -xprofile=collect; the flag then becomes -xprofile=collect:name. The name is optional and, if not specified, is assumed to be a.out. When compiling different object files with the -xprofile=collect:<name> option, <name> must be the same for all the object files used to produce the final program. If the names are not consistent, the result will depend on the final link order of the program.
  - Note that profile feedback works only at optimization levels -xO2 and above.
  - Example (trivial C code, for illustration only):
% cat bubblesort.c
#include <stdio.h>
#include <stdlib.h>

#define COUNT 1000

/* Exchange Array[i] and Array[j] */
void swap (int *Array, int i, int j) {
    int temp;

    temp = Array[i];
    Array[i] = Array[j];
    Array[j] = temp;
}

/* Sort Array[0..COUNT-1] in ascending order */
void bubblesort(int *Array) {
    int i, j;

    for (i = 0; i < COUNT; ++i) {
        for (j = (i + 1); j < COUNT; ++j) {
            if (Array[i] > Array[j]) {
                swap (Array, i, j);
            }
        }
    }
}

int main() {
    int i, *Array;

    Array = (int *) malloc (sizeof (int) * COUNT);
    if (Array == NULL)
        return (1);

    /* Fill the array in descending order, so every comparison triggers a swap */
    for (i = COUNT; i > 0; --i)
        Array[COUNT - i] = i;

    bubblesort(Array);

    for (i = 0; i < COUNT; ++i)
        printf("\nArray[%d] = %d", i, Array[i]);

    free(Array);
    return (0);
}
% cc -o bubblesort -xO4 -xprofile=collect bubblesort.c
- Data collection run
- Rebuild the application with feedback
- Measure the application performance, and compare it with the baseline run
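For the last step, the comparison can be as simple as timing the same workload in both builds. Below is a rough sketch of an in-process timer built on the standard clock() function. The helper names best_cpu_seconds and speedup, and the sample_workload stand-in, are assumptions made for this example; in practice an external tool such as time may be all you need.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the code whose performance is being measured. */
void sample_workload(void)
{
    volatile unsigned long sink = 0;
    unsigned long k;

    for (k = 0; k < 1000000UL; ++k)
        sink += k;
}

/* Smallest CPU time, in seconds, observed over `repeats` calls of work(). */
double best_cpu_seconds(void (*work)(void), int repeats)
{
    double best = -1.0;
    int r;

    assert(work != NULL && repeats > 0);
    for (r = 0; r < repeats; ++r) {
        clock_t start = clock();
        double elapsed;

        work();
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Ratio of baseline time to feedback-build time; above 1.0 means faster. */
double speedup(double baseline_seconds, double feedback_seconds)
{
    assert(baseline_seconds > 0.0 && feedback_seconds > 0.0);
    return baseline_seconds / feedback_seconds;
}
```

The idea is to run best_cpu_seconds once in the baseline binary and once in the binary rebuilt with feedback, then pass the two results to speedup. Taking the best of several runs is just one way to reduce noise; it is a design choice, not a requirement.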
Notes:
- In general, any profile feedback improves performance, although how much depends on how representative the training run is
- Designing good training runs is a complex issue and nearly impossible for some programs. So care must be taken to run the application using a typical data set or several typical data sets. It is important to use data that is representative of the data that will be used by your application in a real-world scenario
- Because the process requires compiling the entire application twice, it is intended to be used only after other debugging and tuning is finished, as one of the last steps before putting the application into production
- Profile data collection is synchronous: the instrumented code dumps the profile data during the shutdown of the application process. Because collection is not asynchronous, multi-threaded applications may lose some profile data due to race conditions between threads. Work is in progress on asynchronous profile data collection that works in the presence of multiple threads and without the need to shut down the process. Hopefully this feature will be available in Sun Studio 11
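The note about representative data sets is easy to demonstrate. The sketch below counts how often the swap branch of an exchange sort (the same algorithm as in bubblesort.c) is taken for a given input. The function name swaps_for_input is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sorts a private copy of input[0..n-1] and returns how many times the
   "a[i] > a[j]" branch was taken, or -1 if memory could not be allocated. */
long swaps_for_input(const int *input, size_t n)
{
    long taken = 0;
    size_t i, j;
    int *a;

    assert(input != NULL && n > 0);
    a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;
    memcpy(a, input, n * sizeof *a);

    for (i = 0; i < n; ++i) {
        for (j = i + 1; j < n; ++j) {
            if (a[i] > a[j]) {
                int tmp = a[i];

                a[i] = a[j];
                a[j] = tmp;
                ++taken;
            }
        }
    }

    free(a);
    return taken;
}
```

For an already sorted input the branch is never taken, while for a reverse-sorted input it is taken on every comparison. A profile collected with only one of these inputs would tell the compiler the opposite of what happens with the other, which is why the training data should resemble what the application sees in production.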
Patch releases:
If the application is very big and only a few modules were changed, profile only those binaries (executables or shared libraries) that are rebuilt for the patch. However, in order to collect a meaningful profile, there need to be -xprofile=collect versions of all object files comprising a rebuilt executable or shared library. For example, if the executable mtserver is built from the object files smiwork.o and smiutil.o, then rebuild those object files with -xprofile=collect, along with mtserver. Then simply replace the old binaries of the collect build with the new patched binaries, re-run the training run to collect the feedback data for the entire build, and finally recompile all object files in the binary (executable or library) with -xprofile=use.
In other words:
Let's say the shared library libmodel.so, built from the objects model.o and buscomp.o, has to be patched. Under profile feedback optimization (PFO), this will be done as follows:
1. Compile the model.o and buscomp.o objects with -xprofile=collect
2. Build libmodel.so with -xprofile=collect
3. Simply replace the libmodel.so of the previous complete collect build with the (latest) patched libmodel.so
4. Collect profile data for the entire application
5. Compile the model.o and buscomp.o objects again with -xprofile=use and with the feedback data from step 4
6. Re-build libmodel.so with -xprofile=use and with the feedback data from step 4
7. Release libmodel.so as a patch
Other optimizations that would work well with profile feedback:
- Profile feedback works best with crossfile optimization (controlled by the flag -xipo), since this allows the compiler to look for potential optimizations across all source files
- Mapfiles work at the routine level, and profile feedback works within routines, so it is a natural progression to apply both optimizations at the same time. This is possible with link-time optimization (controlled by the flag -xlinkopt), also called post-optimization
The whole discussion on profile feedback optimization can be summarized in the following 3 steps:
- Build the application with the flags -xprofile=collect -xipo
  % cc -xO2 -xprofile=collect:application.profile -xipo -o application application.c
- Run the application with one or more representative workloads
  % ./application args
- Rebuild the application with -xprofile=use -xipo -xlinkopt
  % cc -xO2 -xprofile=use:application.profile -xipo -xlinkopt -o application application.c
Suggested Reading:
- Sun Studio C/C++ compiler options
- Improving Code Layout Can Improve Application Performance by Darryl Gove
Acknowledgements:
Chris Aoki (Sun Microsystems)