Elektra 0.11.0
One of the primary resources in computing is execution time. To keep usage of this resource low, it makes sense to profile code and check which code paths in a program take the longest time to execute. Various tools exist for this kind of profiling. For this tutorial we will use Callgrind and XRay.
Since we want to improve the readability of the Callgrind output, we choose a build type that includes debug symbols. The two obvious choices for the build type are `RelWithDebInfo` (optimized build with debug symbols) and `Debug` (non-optimized build with debug symbols). We use `Debug` here, which should provide the most detailed profiling information.
For this tutorial we decided to profile the YAJL plugin. Since Elektra loads plugin code via `dlopen` and Callgrind does not support the function `dlclose` properly, we temporarily remove the `dlclose` calls in the file `dl.c`. At the time of writing, one option to do that is deleting the `dlclose` statement and the `if`-statement that checks the return value of a `dlclose` call. An unfortunate effect of this code update is that Elektra will now leak memory when it unloads a plugin. On the other hand, Callgrind will be able to add source code information about the YAJL plugin to the profiling output.
As described above, we use the `Debug` build type for the profiling run. To make sure we test the actual performance of the YAJL plugin, we disable debug code and the logger. The following commands show one option to build Elektra using this configuration, if we use Ninja as build tool:
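A minimal sketch of such a build, assuming the commands run in the root of an Elektra checkout and that `ENABLE_LOGGER` and `ENABLE_DEBUG` are the CMake options that control the logger and the debug code. The folder name `build` is an assumption too:

```sh
# Assumption: executed in the root of an Elektra checkout.
BUILD_DIR=build # assumed name of the build folder

# Debug build type; logger and debug code switched off (assumed option names)
CMAKE_ARGS="-GNinja -DCMAKE_BUILD_TYPE=Debug -DENABLE_LOGGER=OFF -DENABLE_DEBUG=OFF"

if [ -f CMakeLists.txt ] && command -v cmake > /dev/null 2>&1 && command -v ninja > /dev/null 2>&1; then
    mkdir -p "$BUILD_DIR"
    # The subshell keeps the current directory unchanged for the later steps.
    # $CMAKE_ARGS is unquoted on purpose, so that it splits into separate options.
    (cd "$BUILD_DIR" && cmake $CMAKE_ARGS .. && ninja)
else
    echo "Skipping build: CMakeLists.txt, cmake, or ninja not found" >&2
fi
```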
We use the tool `benchmark_plugingetset` to profile the execution time of YAJL. The file `keyframes_complex.json` serves as input file for the plugin. Since `benchmark_plugingetset` requires a data file with a specific name, we save a copy of `keyframes_complex.json` as `test.yajl.in` in the folder `benchmarks/data`:
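A minimal sketch of this step. The location of `keyframes_complex.json` inside the repository is an assumption; adjust `JSON_FILE` if the file lives elsewhere in your checkout:

```sh
# Assumed location of the sample file, relative to the root of the repository
JSON_FILE="${JSON_FILE:-src/plugins/yajl/yajl/keyframes_complex.json}"

# Create the data folder and store the copy under the name the benchmark expects
mkdir -p benchmarks/data
if [ -f "$JSON_FILE" ]; then
    cp "$JSON_FILE" benchmarks/data/test.yajl.in
else
    echo "Input file not found: $JSON_FILE" >&2
fi
```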
After that we call `benchmark_plugingetset` directly to make sure that everything works as expected:
If the command above fails with a segmentation fault, then please check that your OS is able to locate the plugin (for example, by adding the `lib` directory in the build folder to `LD_LIBRARY_PATH` on Linux). If `benchmark_plugingetset` executed successfully, then you can now use Callgrind to profile the command:
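A hedged sketch of the Callgrind invocation. The Valgrind options `--tool=callgrind` and `--callgrind-out-file` are standard; the location of the binary and the arguments passed to `benchmark_plugingetset` (data folder, parent key, plugin name, mode) are assumptions:

```sh
# Assumed location of the benchmark binary inside the build folder
BENCHMARK=build/bin/benchmark_plugingetset

if command -v valgrind > /dev/null 2>&1 && [ -x "$BENCHMARK" ]; then
    # Write the profile to callgrind.out instead of the default callgrind.out.<pid>.
    # The benchmark arguments below are assumptions.
    valgrind --tool=callgrind --callgrind-out-file=callgrind.out \
        "$BENCHMARK" benchmarks/data user yajl get
else
    echo "Skipping profiling: valgrind or $BENCHMARK not available" >&2
fi
```

Running the same `"$BENCHMARK" benchmarks/data user yajl get` line without the `valgrind` prefix is one way to perform the plain sanity check first.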
The command above will create a file called `callgrind.out` in the root of the repository. You can now remove the input data and the folder `benchmarks/data`:
If you use Docker to build Elektra, then you might want to fix the paths in the file `callgrind.out` before you continue:
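One way to do that is to replace the path prefix used inside the container with the local path of the repository. The sketch below assumes a hypothetical helper `fix_paths` and a hypothetical container prefix `/home/jenkins/workspace`; substitute the prefix your container actually uses:

```sh
# fix_paths OLD_PREFIX NEW_PREFIX FILE
# Replaces every occurrence of OLD_PREFIX with NEW_PREFIX in FILE and keeps
# a backup called FILE.bak. "~" serves as delimiter, since paths contain "/".
# Limitation: the prefixes must not contain "~" or characters special to sed.
fix_paths() {
    sed -i.bak "s~$1~$2~g" "$3"
}

# Hypothetical container prefix; the local prefix is the current directory
if [ -f callgrind.out ]; then
    fix_paths "${CONTAINER_ROOT:-/home/jenkins/workspace}" "$PWD" callgrind.out
fi
```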
Now we can analyze the file `callgrind.out` with a graphical tool such as QCacheGrind:
If everything worked as expected, QCacheGrind should open the file `callgrind.out` and display a window that looks similar to the one below:
You can now select different parts of the call graph on the left to check which parts of the code take a long time to execute.
XRay is an extension for LLVM that adds profiling code to binaries. Profiling can be dynamically enabled and disabled via the environment variable `XRAY_OPTIONS`.
Since XRay currently requires LLVM we need to set the compiler appropriately. We use Clang 8 in our example.
We enable the static build (`BUILD_STATIC=ON`) and disable the dynamic build (`BUILD_SHARED=OFF`), since XRay currently does not support dynamic libraries. To enable XRay we use the compiler switch `-fxray-instrument`. To instrument every function we set the instruction threshold to `1` with `-fxray-instruction-threshold=1`.
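A sketch of a matching CMake configuration. It assumes that Clang 8 is installed as `clang-8` and `clang++-8`, that the commands start in the root of the repository, and that passing the XRay switches through `CMAKE_C_FLAGS` and `CMAKE_CXX_FLAGS` is acceptable for your setup:

```sh
# Assumption: Clang 8 is installed under these names
export CC=clang-8 CXX=clang++-8

# Enable XRay and instrument every function, regardless of its size
XRAY_FLAGS="-fxray-instrument -fxray-instruction-threshold=1"

if [ -f CMakeLists.txt ] && command -v cmake > /dev/null 2>&1 && command -v "$CC" > /dev/null 2>&1; then
    # Static build only, since XRay does not support dynamic libraries
    mkdir -p build && cd build &&
        cmake -GNinja .. \
            -DBUILD_SHARED=OFF -DBUILD_STATIC=ON \
            -DCMAKE_C_FLAGS="$XRAY_FLAGS" -DCMAKE_CXX_FLAGS="$XRAY_FLAGS"
else
    echo "Skipping configuration: CMakeLists.txt, cmake, or $CC not found" >&2
fi
```

On success the sketch leaves the shell inside the `build` folder, ready for the Ninja call.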
We will analyze the YAJL plugin below. Please make sure that the CMake command above includes the plugin:
Now we can build the code with Ninja and change the current directory back to the root of the repository:
In the next step we use `benchmark_plugingetset` to execute YAJL for the input file `keyframes_complex.json`. To do that we create a folder `data` in the directory `benchmarks`, and save a copy of `keyframes_complex.json` there as `test.yajl.in`. The following commands show you how to do that:
Now we first check if running `benchmark_plugingetset` works without instrumentation:
If everything worked correctly, then the command above should finish successfully and not produce any output. To instrument the binary we set the environment variable `XRAY_OPTIONS` to the value `xray_mode=xray-basic verbosity=1`.
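A sketch of an instrumented run. The binary location and the benchmark arguments are assumptions. The sketch also adds `patch_premain=true`, an XRay runtime option that is not part of the value named above; without it XRay normally does not patch the instrumentation points at startup, so treat its inclusion as an assumption about your setup:

```sh
# Assumed location of the statically linked, instrumented binary
BENCHMARK=build/bin/benchmark_plugingetset

# Basic logging mode with verbose output; patch_premain=true is an addition
# (assumption) that enables the instrumentation before main() starts
XRAY_SETTINGS="xray_mode=xray-basic patch_premain=true verbosity=1"

if [ -x "$BENCHMARK" ]; then
    # Setting the variable on the command line limits it to this one run
    XRAY_OPTIONS="$XRAY_SETTINGS" "$BENCHMARK" benchmarks/data user yajl get
else
    echo "Skipping run: $BENCHMARK not found" >&2
fi
```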
The command above will print the location of the XRay log file to `stderr`:
Now we can use the log file to analyze the runtime of the execution paths of the binary. To do that we first save the name of the log file in the variable `LOGFILE`. This way we do not need to repeat the filename every time in the commands below.
To list the 10 functions with the longest runtime we use the command `llvm-xray account`:
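A sketch of both steps, saving the log file name and calling `llvm-xray account`. The helper `latest_xray_log` is hypothetical and assumes that the log file uses XRay's default name pattern `xray-log.<binary name>.<suffix>` and sits in the current directory. The binary location is an assumption as well, and on some systems the tool carries a version suffix such as `llvm-xray-8`:

```sh
# Assumed location of the instrumented binary
BINARY=build/bin/benchmark_plugingetset

# Print the most recently modified XRay log for BINARY in the current directory
latest_xray_log() {
    ls -t xray-log."$(basename "$BINARY")".* 2> /dev/null | head -n 1
}

LOGFILE=$(latest_xray_log)

if [ -n "$LOGFILE" ] && command -v llvm-xray > /dev/null 2>&1; then
    # Top 10 functions, sorted by the sum of their runtimes in descending order
    llvm-xray account "$LOGFILE" -top=10 -sort=sum -sortorder=dsc -instr_map="$BINARY"
else
    echo "Skipping analysis: no XRay log or llvm-xray found" >&2
fi
```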
We can also use the log file to create a Flame Graph. To do that we use the command `llvm-xray stack` to create an input file for the tool `flamegraph.pl`. We then create the Flame Graph with the following command:
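A sketch of such a pipeline. It assumes that `LOGFILE` holds the name of the XRay log file, that `flamegraph.pl` can be found in the `PATH` (otherwise set `FLAMEGRAPH` to its full path), and that the output name `flamegraph.svg` suits you:

```sh
# Assumed location of the instrumented binary
BINARY=build/bin/benchmark_plugingetset
LOGFILE=${LOGFILE:-}                    # name of the XRay log file saved earlier
FLAMEGRAPH=${FLAMEGRAPH:-flamegraph.pl} # assumption: the script is in the PATH

if [ -f "$LOGFILE" ] && command -v llvm-xray > /dev/null 2>&1 && command -v "$FLAMEGRAPH" > /dev/null 2>&1; then
    # Emit all call stacks in flame graph format, aggregated by time spent,
    # and let flamegraph.pl render them as an SVG image
    llvm-xray stack "$LOGFILE" -instr_map="$BINARY" \
        -stack-format=flame -aggregation-type=time -all-stacks |
        "$FLAMEGRAPH" > flamegraph.svg
else
    echo "Skipping Flame Graph: log file, llvm-xray, or $FLAMEGRAPH not found" >&2
fi
```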
The image below shows one example of how the picture could look:
Additional information on how to use the data produced by XRay is available in the LLVM XRay documentation.