Obviously, the performance gain from executing the programs in parallel without synchronization is not reflected in the tables. Applying the blocking technique further improved data locality. For the Parallel SFS sources with the array contraction technique applied, the misses/total ratio does not properly reflect the improvement in data locality, since reused data were placed in CPU registers and the total number of DCU accesses therefore decreased. In fact, data locality is better than before, because the fastest memory, the registers, is used.
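The effect of array contraction on the DCU access count can be illustrated with a minimal sketch. The kernel below is not taken from the benchmarked sources; the function names `scale_add_array` and `scale_add_contracted` and the computation itself are hypothetical and chosen only for illustration. In the first version every iteration stores to and reloads from a temporary array; in the contracted version the temporary is a scalar that a compiler can typically keep in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Before contraction (hypothetical kernel): tmp[] is a full temporary
   array, so each iteration writes tmp[i] to memory and reads it back. */
void scale_add_array(const double *a, const double *b, double *out,
                     double *tmp, size_t n)
{
    assert(a && b && out && tmp);
    for (size_t i = 0; i < n; i++) {
        tmp[i] = a[i] * 2.0;      /* store through the data cache */
        out[i] = tmp[i] + b[i];   /* reload through the data cache */
    }
}

/* After contraction: the temporary array is reduced to a scalar, which
   the compiler can usually hold in a register for the whole iteration. */
void scale_add_contracted(const double *a, const double *b, double *out,
                          size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++) {
        double t = a[i] * 2.0;    /* no memory traffic for the temporary */
        out[i] = t + b[i];
    }
}
```

Both functions compute the same result, but the contracted one issues no cache accesses for the temporary. Because those accesses were mostly hits, removing them shrinks the denominator of the misses/total ratio, so the ratio can stay flat or even rise although fewer data pass through the cache.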
The results of these experiments confirm those obtained earlier for real multiprocessor computers. They indicate a significant improvement in DCU utilization on the PowerPC405 processor, which is used in embedded systems, when well-known optimization techniques for improving data locality are applied within synchronization-free slices.
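As a reminder of what the blocking (tiling) technique does to a loop nest, the following sketch contrasts an untiled and a tiled matrix multiplication. Matrix multiplication is used here only as a familiar stand-in kernel and is an assumption of this example, not one of the programs measured above; the names `matmul_plain`, `matmul_tiled`, `min_size` and the `tile` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the smaller of two sizes. */
static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Untiled reference: c += a * b for n x n row-major matrices. */
void matmul_plain(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Tiled version: the iteration space is cut into tile x tile x tile
   blocks so that the parts of a, b and c touched by one block can stay
   in the data cache while they are reused. */
void matmul_tiled(const double *a, const double *b, double *c,
                  size_t n, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile)
                /* Loops over one block; min_size handles partial tiles. */
                for (size_t i = ii; i < min_size(ii + tile, n); i++)
                    for (size_t k = kk; k < min_size(kk + tile, n); k++)
                        for (size_t j = jj; j < min_size(jj + tile, n); j++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

In this kernel, iterations with different values of `i` write disjoint rows of `c`, so a tiled loop nest of this kind could run inside each independently executed part of the computation. A suitable `tile` value depends on the data cache size of the target processor and would have to be chosen experimentally.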
We intend to implement the results of our research in an academic source-to-source compiler. Figure 3 illustrates the structural overview of the software to be built.
[Figure 3: component diagram. Recoverable labels: Main Controller; interfaces IDataLocality, IExtractSlices, ICodeGen and ILoopTransform; external components Omega Calculator API and Mathematica (Wolfram Research API).]
Figure 3. A structural overview of the software to be built, based on the results of the research
The MainController is responsible for managing the execution of all compiler modules. The SlicesExtractionModule implements the method of extracting parallel synchronization-free slices [1]. It extracts slices from an input C source, using the Omega Calculator Library [11] to perform dependence analysis. The LoopTransformationModule analyses possible combinations of slice agglomeration and applies various space-time scheduling options to the output code, as well as techniques for improving data locality (i.e. tiling and array contraction). The DataLocalityEstimationModule implements the method of calculating data locality factors. The VOPC domain model of this module, worked out during the research, is presented in Figure 4. Both latter modules use the linear algebra engine from the Wolfram Research’s