This page presents the optimization results that dco version 1.0.1 achieved on the Livermore loops benchmark when optimizing code generated by gcc version 4.1.2.

Preparing the benchmarks

We used the C version of the Livermore loops benchmark. The code was modified to eliminate calibration, ensuring that every run executes the same number of iterations on the same input data. This makes it possible to compare the execution times of the program (rather than the estimated MFlops figure reported by the original implementation).

Read this to understand how the benchmarks were executed.


Results of optimization

The following table lists the improvements of the dco-optimized code over the code generated by gcc. The complete data collected during benchmarking is presented here. Some cases have links pointing to an in-depth explanation of the benchmark and the results of its optimization.

For convenience, cases where
  • there is no improvement (the execution time difference is in the range from -5% to 5%) are marked in this color
  • dco-generated code is slower (by more than 5%) than compiler-generated code are marked in this color
  • dco-generated code is moderately faster than compiler-generated code (from 5% to 10%) are marked in this color
  • dco-generated code is faster than compiler-generated code (from 10% to 20%) are marked in this color
  • dco-generated code is much faster than compiler-generated code (20% or more) are marked in this color
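The five categories above can be sketched as a small classification function. This is only an illustration: the function name `classify` and the category labels are hypothetical, and the handling of the exact boundary values (5%, 10%, 20%) is an assumption, since the ranges as stated overlap at their endpoints.

```python
def classify(improvement_pct):
    """Map an improvement percentage to one of the five categories.

    Boundary handling is an assumption: exactly 5% counts as
    "no improvement", exactly 10% as "faster", exactly 20% as "much faster".
    """
    if improvement_pct < -5:
        return "slower"
    if improvement_pct <= 5:
        return "no improvement"
    if improvement_pct < 10:
        return "moderately faster"
    if improvement_pct < 20:
        return "faster"
    return "much faster"
```

With these assumptions, a kernel listed at 33% falls into "much faster" and one listed at 3% into "no improvement".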

Livermore loops
Kernel 1 33%
Kernel 2 3%
Kernel 3 40%
Kernel 4 18%
Kernel 5 60%
Kernel 6 20%
Kernel 7 36%
Kernel 8 22%
Kernel 9 14%
Kernel 10 32%
Kernel 11 74%
Kernel 12 15%
Kernel 13 0%
Kernel 14 10%
Kernel 15 0%
Kernel 16 6%
Kernel 17 0%
Kernel 18 16%
Kernel 19 29%
Kernel 20 2%
Kernel 21 6%
Kernel 22 0%
Kernel 23 2%
Kernel 24 84%

Treating optimization results in the range from -5% to 5% as "the same" and results outside this range as better or worse, the data listed above can be summarized as follows:

dco improved the gcc-generated code in 17 out of 24 cases (71%)
dco did not affect the performance of the gcc-generated code in 7 out of 24 cases (29%)

with an average improvement of 27%, measured as the reduction in the geometric mean of the execution times.
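The counts in this summary can be reproduced from the rounded per-kernel percentages listed above. The sketch below is illustrative only; the variable names are hypothetical, and the -5% to 5% band is applied to the rounded values as they appear in the list.

```python
# Rounded improvements for kernels 1..24, copied from the list above.
rounded_improvements = [
    33, 3, 40, 18, 60, 20, 36, 22, 14, 32, 74, 15,
    0, 10, 0, 6, 0, 16, 29, 2, 6, 0, 2, 84,
]

total = len(rounded_improvements)
improved = sum(1 for p in rounded_improvements if p > 5)
unchanged = sum(1 for p in rounded_improvements if -5 <= p <= 5)
slower = sum(1 for p in rounded_improvements if p < -5)

# Shares of all kernels, rounded to whole percent.
improved_share = round(100 * improved / total)
unchanged_share = round(100 * unchanged / total)
```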

Timing of the Livermore loops kernels

The following table presents the execution data collected while benchmarking the Livermore loops; see this for a description of the code that was benchmarked.

The two columns under each of the gcc and gcc+dco headers give the execution time (in seconds) and the execution speed (in MFlops) achieved by the compiler-generated code and the dco-optimized code respectively. The column under the gcc+dco/gcc header lists the improvement achieved by dco over the compiler-generated code, expressed as the relative reduction in execution time. For example, the compiler-generated code executed kernel #1 in 4.96 seconds at a speed of 1707.57 MFlops; after optimization by dco the resulting code ran for 3.32 seconds, delivering 2551.83 MFlops, which is a 33.06% improvement.
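The improvement figure in the kernel #1 example is consistent with the relative reduction in execution time, (4.96 - 3.32) / 4.96. A minimal sketch of that arithmetic follows; the function names `improvement_pct` and `speedup` are hypothetical and not part of dco or the benchmark.

```python
def improvement_pct(t_gcc, t_dco):
    """Relative reduction in execution time, in percent."""
    return 100.0 * (t_gcc - t_dco) / t_gcc


def speedup(t_gcc, t_dco):
    """How many times faster the optimized code runs."""
    return t_gcc / t_dco


# Kernel #1 from the table: 4.96 s with gcc, 3.32 s with gcc+dco.
kernel1_improvement = improvement_pct(4.96, 3.32)
kernel1_speedup = speedup(4.96, 3.32)
```

Rounded to two decimals, `kernel1_improvement` matches the 33.06% in the table. The speedup of about 1.49 also agrees with the ratio of the two MFlops values, 2551.83 / 1707.57.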


Kernel#  gcc 4.1.2 time (s)  gcc 4.1.2 MFlops  gcc+dco time (s)  gcc+dco MFlops  gcc+dco/gcc improvement
1 4.96 1707.57 3.32 2551.83 33.06%
2 2.38 1462.97 2.32 1500.79 2.52%
3 5.93 1136.8 3.55 1898.05 40.13%
4 4.66 1239.76 3.8 1519.63 18.45%
5 5.2 183.98 2.07 462.66 60.19%
6 4.53 718.38 3.63 896.31 19.87%
7 4.87 1677.7 3.12 2619.9 35.93%
8 5 991.29 3.88 1277.1 22.40%
9 4.6 1540.48 3.95 1794.09 14.13%
10 4.94 353.19 3.38 516.43 31.58%
11 5.78 96.75 1.52 368.67 73.70%
12 5.18 681.16 4.39 803.66 15.25%
13 4.57 198.85 4.58 198.42 -0.22%
14 4.71 256.27 4.26 283.34 9.55%
15 3.72 586.53 3.72 586.53 0.00%
16 5.61 698.71 5.29 740.95 5.70%
17 5.01 593.81 4.99 596.19 0.40%
18 4.7 826.95 3.95 984.03 15.96%
19 5.81 372.45 4.1 527.69 29.43%
20 4.53 374.26 4.43 382.71 2.21%
21 4.88 362.2 4.61 383.41 5.53%
22 4.88 191.36 4.86 192.14 0.41%
23 4.17 681.74 4.09 695.08 1.92%
24 4.85 132.3 0.77 837.82 84.12%
Geometric mean of execution times: 4.75 (gcc 4.1.2), 3.45 (gcc+dco), 27.28% improvement
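The geometric-mean row can be checked against the execution times in the table. The sketch below assumes the 27.28% figure is the relative reduction between the two geometric means; the source does not state this, but the numbers agree with it to within rounding. The names are hypothetical.

```python
import math

# Execution times in seconds for kernels 1..24, copied from the table.
gcc_times = [
    4.96, 2.38, 5.93, 4.66, 5.2, 4.53, 4.87, 5, 4.6, 4.94, 5.78, 5.18,
    4.57, 4.71, 3.72, 5.61, 5.01, 4.7, 5.81, 4.53, 4.88, 4.88, 4.17, 4.85,
]
dco_times = [
    3.32, 2.32, 3.55, 3.8, 2.07, 3.63, 3.12, 3.88, 3.95, 3.38, 1.52, 4.39,
    4.58, 4.26, 3.72, 5.29, 4.99, 3.95, 4.1, 4.43, 4.61, 4.86, 4.09, 0.77,
]


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid a large product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm_gcc = geometric_mean(gcc_times)
gm_dco = geometric_mean(dco_times)

# Assumed definition: relative reduction of the geometric mean, in percent.
gm_improvement = 100.0 * (gm_gcc - gm_dco) / gm_gcc
```

This differs from the arithmetic mean of the per-kernel percentages, which is lower; the geometric mean of the times is less dominated by the few kernels with very large gains.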