6 Włodzimierz Bielecki, Krzysztof Kraska
Similarly to general-purpose software development, embedded system development requires programming languages, debuggers, compilers, linkers, and other programming tools. Approved as an IEEE standard, the SystemC language is an example of a tool that enables the implementation of both the software and hardware parts of embedded systems.
The optimal implementation of software components designed for multiprocessor embedded systems is critical for their performance and power consumption. However, poor data locality is a common feature of many existing numerical applications [6]. Such applications are often represented with affine loops in which the considerable quantities of data stored in arrays exceed the size of a fast but small cache memory. In inefficient code, referenced data must be fetched into the cache from external memory even though it could be reused many times. Because cache is expensive, cache memories often operate at the full speed of a processor, while cheaper but more capacious external memory modules operate at a several times lower frequency; hence the time to access data located in cache memory is significantly shorter. Improvement in data locality can be obtained by means of high-level program transformations. Increasing data locality within a program improves the utilization of the fast data cache and limits accesses to the slower memory modules at a lower level of the memory hierarchy. Finally, it yields a general performance improvement for the software parts of embedded systems.
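As a minimal illustration of such a high-level transformation (not taken from the paper; the array size and access pattern are assumptions for the example), loop interchange can turn a stride-N traversal of a row-major C array into a stride-1 traversal, so that each fetched cache line is fully consumed before eviction:

```c
#include <stddef.h>

#define N 512

/* Column-major traversal of a row-major C array: consecutive iterations
 * of the inner loop touch elements N*sizeof(double) bytes apart, so most
 * accesses miss the cache and data is refetched from external memory. */
double sum_poor_locality(double a[N][N]) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[j][i];          /* stride-N access pattern */
    return s;
}

/* After loop interchange the inner loop walks consecutive elements,
 * so every word of each fetched cache line is used before eviction. */
double sum_good_locality(double a[N][N]) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[j][i];          /* stride-1 access pattern */
    return s;
}
```

Both versions compute the same result; only the order of memory accesses, and hence the cache behavior, differs.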
A new method of extracting synchronization-free slices (SFS) in arbitrarily nested loops was introduced in [1]. The method enables us to extract more parallel threads than other methods. The well-known technique invented by Wolfe [3] estimates data reuse factors; it makes it possible to adopt an order of loop execution that increases data locality in a program. In relation to the method of extracting synchronization-free slices, the estimation of data locality is a necessary activity to obtain improved performance for a program executed on a target architecture. The SFS method extracts the maximal number of parallel threads; however, any target embedded architecture consists of a fixed number of CPU cores, usually smaller than the number of threads extracted. Hence, it is necessary to adjust the level of parallelism in a program to the target architecture [10]. Our previous research conducted on parallel computers indicates that the extraction of synchronization-free slices, as well as applying the tiling and array contraction techniques within an individual thread, can considerably increase the performance of a parallel program. For example, the results of the experiments performed for the Livermore Loops Kernel 1 (hydro fragment) [5] and the matrix multiplication algorithm [6] indicate considerable gains in execution time (Figure 1a and Figure 1b) [2].
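To make the tiling technique mentioned above concrete, a generic tiled (blocked) matrix multiplication can be sketched as follows. This is an illustrative sketch, not the paper's benchmark code; the matrix size N and tile size B are assumed tuning parameters, with B chosen so that three B×B tiles fit in the cache:

```c
#include <stddef.h>

#define N 128   /* matrix dimension (assumed for illustration)          */
#define B 32    /* tile size, tuned so three BxB tiles fit in the cache */

/* Tiled matrix multiplication: c += a * b.
 * Each BxB tile of b and c is reused B times while it remains
 * cache-resident, instead of being refetched from external memory
 * on every pass over the arrays. */
void matmul_tiled(const double a[N][N], const double b[N][N],
                  double c[N][N]) {
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t kk = 0; kk < N; kk += B)
            for (size_t jj = 0; jj < N; jj += B)
                /* intra-tile loops: all data touched here is cached */
                for (size_t i = ii; i < ii + B; i++)
                    for (size_t k = kk; k < kk + B; k++) {
                        double aik = a[i][k];
                        for (size_t j = jj; j < jj + B; j++)
                            c[i][j] += aik * b[k][j];
                    }
}
```

The transformation changes only the iteration order, not the result, which is what makes it a pure locality optimization.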
In contrast, the example of a simple code in Figure 2, executed on the same target architecture, proves that under certain circumstances the extraction of parallel slices can limit the performance of a program: the execution time of the parallel code (8 seconds) was about 30% greater than that of the corresponding sequential code (6 seconds). It can be noticed that the parallel code has a decreased spatial-reuse factor value for the reference to the array a[], caused to a large extent by the need to maintain coherence between the caches of the multiple processors.
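This kind of coherence-induced slowdown is commonly known as false sharing: threads on different cores write to distinct elements of a[] that happen to occupy the same cache line, so the line ping-pongs between caches. The following sketch (a hypothetical pthreads example, not the code of Figure 2; the thread count and 64-byte line size are assumptions) contrasts a falsely shared counter array with one padded to cache-line granularity:

```c
#include <pthread.h>

#define NTHREADS   4
#define CACHE_LINE 64

/* Adjacent long counters fall into one cache line, so concurrent writes
 * by different cores force coherence traffic (false sharing), even
 * though no element is logically shared between threads. */
static long shared_hits[NTHREADS];

/* Padded version: each counter occupies its own cache line. */
static struct { long v; char pad[CACHE_LINE - sizeof(long)]; }
    padded_hits[NTHREADS];

struct arg { int id; long iters; int padded; };

static void *worker(void *p) {
    struct arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        if (a->padded)
            padded_hits[a->id].v++;   /* private line: no ping-pong   */
        else
            shared_hits[a->id]++;     /* shared line: coherence traffic */
    }
    return 0;
}

/* Runs the counting loop on NTHREADS threads and returns the total. */
long run(long iters, int padded) {
    pthread_t th[NTHREADS];
    struct arg args[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        shared_hits[t] = 0;
        padded_hits[t].v = 0;
        args[t] = (struct arg){ t, iters, padded };
        pthread_create(&th[t], 0, worker, &args[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(th[t], 0);
    long total = 0;
    for (int t = 0; t < NTHREADS; t++)
        total += padded ? padded_hits[t].v : shared_hits[t];
    return total;
}
```

Both variants compute the same total; only the padded layout avoids the cache-line invalidations, which is why padding or data-layout transformations can restore the spatial-reuse factor lost in a naive parallelization.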