This page contains dco's optimization results for the Livermore loops benchmark when optimizing code generated by gcc version 4.2.2 on x86-64 and IA-32 systems. Optimization results of the previous version of dco (version 1.0.1) on the same benchmark, for code generated by gcc version 4.1.2, are reported on a separate page.

Preparing the benchmarks

For every kernel of the benchmark, dco was invoked twice: first without any options (default mode) and then with the option; the better of the two execution times is reported. Note that the x86 assembly source that was optimized is the one generated when gcc was invoked. How the benchmarks were executed and how the code optimization results were calculated is described on a separate page.

Timing of the Livermore loops kernels and results of optimization

The two columns under the gcc and gcc+dco headers present the execution times (in seconds) achieved by the compiler-generated code and the dco-optimized code, respectively. The column under the gcc+dco/gcc header lists the improvement achieved by using dco over the compiler-generated code. For example, the compiler-generated code executed kernel #1 in 3.06 seconds; after optimization by dco the resulting code ran for 2.22 seconds, which is a 27.45% improvement.

The following are the results of optimizations achieved on a 64-bit Linux operating system running on a 2.66GHz Core2 computer. The gcc version 4.2.2 compiler, used to process the benchmarks, was invoked with the following command line options: -S -O3
-fomit-frame-pointer -funroll-all-loops -ffast-math -march=nocona -mfpmath=sse -msse3
dco version 1.1.0 was used to optimize the compiler-generated code.
On average, dco achieved an improvement of 26% over the 64-bit code generated and optimized by gcc version 4.2.2.

The following are the results of optimizations achieved on a 32-bit Linux operating system running on a 2.8GHz Pentium4 computer. The gcc version 4.2.2 compiler, used to process the benchmarks, was invoked with the following command line options: -S -O3
-fomit-frame-pointer -funroll-all-loops -ffast-math -march=pentium4 -mfpmath=sse -msse2 -mstackrealign
dco version 1.1.1 was used to optimize the compiler-generated code. Note that dco's command line option was used during optimization.
On average, dco achieved an improvement of 32% over the 32-bit code generated and optimized by gcc version 4.2.2.
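
The improvement figure quoted for kernel #1 can be checked with a short calculation. The sketch below is only an illustration, not part of dco or of the benchmark harness: the function names `improvement_percent` and `best_time` are hypothetical, and it assumes that the improvement is the relative reduction in execution time and that the "best" of the two dco runs is simply the shorter one.

```python
def improvement_percent(gcc_seconds: float, dco_seconds: float) -> float:
    """Relative reduction in run time of gcc+dco code versus gcc-only code, in percent."""
    if gcc_seconds <= 0:
        raise ValueError("gcc_seconds must be positive")
    return (gcc_seconds - dco_seconds) / gcc_seconds * 100.0


def best_time(default_mode_seconds: float, option_mode_seconds: float) -> float:
    """Pick the shorter of the two timings (assumed meaning of 'best execution time')."""
    return min(default_mode_seconds, option_mode_seconds)


if __name__ == "__main__":
    # Kernel #1 figures quoted in the text: 3.06 s (gcc) and 2.22 s (gcc+dco).
    print(f"{improvement_percent(3.06, 2.22):.2f}%")  # prints 27.45%
```

With the kernel #1 timings from the text, (3.06 - 2.22) / 3.06 gives the 27.45% reported above.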