This page contains optimization results of dco version 1.0.1 achieved on the Livermore loops benchmark while optimizing code generated by gcc version 4.1.2.
We used the C version of the Livermore loops benchmark. The code was modified to eliminate calibration, ensuring that every run executes the same number of iterations on the same input data. This makes it possible to compare the execution times of the program directly (rather than the estimated MFlops figure reported by the original implementation).
Read this to understand how the benchmarks were executed.
Results of optimization
The following table lists the improvements of the dco-optimized code over the code generated by gcc. The complete data collected during benchmarking is presented here. Some cases have links pointing to an in-depth explanation of the benchmark and the results of its optimization.
For the convenience, cases were
Considering optimization results in the range from -5% to 5% to be "the same" and results outside this range to be better or worse, the data listed above can be summarized as follows:
with an average improvement of 27%.
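The bucketing rule described above can be sketched as follows. This is only an illustration: the function names are hypothetical, the band is assumed to be inclusive at exactly -5% and 5%, and apart from 33.06 (the kernel #1 figure quoted on this page) the sample values are made up and are not benchmark data.

```python
from collections import Counter

# From the text: results between -5% and 5% count as "the same".
SAME_BAND_PCT = 5.0


def classify(improvement_pct, band=SAME_BAND_PCT):
    """Bucket one improvement percentage as 'better', 'same' or 'worse'.

    Assumption: the band boundaries themselves count as 'same'.
    """
    if improvement_pct > band:
        return "better"
    if improvement_pct < -band:
        return "worse"
    return "same"


def summarize(improvements_pct):
    """Count how many cases fall into each bucket."""
    return Counter(classify(p) for p in improvements_pct)


# Hypothetical sample values (only 33.06 appears on this page).
sample = [33.06, 2.0, -1.5, -7.25, 5.0]
counts = summarize(sample)
print(counts["better"], counts["same"], counts["worse"])
```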
See this for a description of the code that was benchmarked.
The two columns under the gcc and gcc+dco headers present the execution times (in seconds) and execution speeds (in MFlops) achieved by the compiler-generated code and the dco-optimized code, respectively. The column under the gcc+dco/gcc header lists the improvement achieved by applying dco to the compiler-generated code. For example, the compiler-generated code executed kernel #1 in 4.96 seconds and achieved a speed of 1707.57 MFlops; after optimization by dco the resulting code ran for 3.32 seconds, delivering 2551.83 MFlops, which is a 33.06% improvement.
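The page does not state the formula behind the improvement column. The kernel #1 figures are consistent with it being the relative reduction in execution time, which the sketch below assumes. The function names are hypothetical. The second helper checks that time multiplied by speed gives nearly the same amount of work for both runs, as expected when calibration is removed and every run does the same number of iterations.

```python
def improvement_pct(t_gcc, t_dco):
    """Relative reduction in execution time, in percent.

    Assumption: this is how the gcc+dco/gcc column is computed; the
    formula is inferred from the kernel #1 example, not stated on the page.
    """
    return (t_gcc - t_dco) / t_gcc * 100.0


def work_mflop(seconds, mflops):
    """Total floating-point work implied by a (time, speed) pair, in Mflop."""
    return seconds * mflops


# Kernel #1 figures quoted in the text.
t_gcc, mflops_gcc = 4.96, 1707.57
t_dco, mflops_dco = 3.32, 2551.83

print(f"{improvement_pct(t_gcc, t_dco):.2f}%")  # prints 33.06%

# Both runs should represent (almost) the same work; the small gap
# comes from rounding in the published figures.
w_gcc = work_mflop(t_gcc, mflops_gcc)
w_dco = work_mflop(t_dco, mflops_dco)
print(abs(w_gcc - w_dco) / w_gcc < 0.001)  # prints True
```

Because the work is fixed, comparing execution times and comparing MFlops give the same ranking; the time-based percentage is simply the form the table reports.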