


Increasing data locality of parallel programs executed in embedded systems 9


Table 2. Temporal-reuse factors for the Livermore Loops Kernel 1 (hydro fragment)

Reference   Reuse factors
            Temporal     Spatial   Self-reuse     Cumulative self-reuse Data footprint
            k    l       k    l    R_k   R_l      R_k*   R_l*           F_k*   F_l*
x[k]        1    loop    32   1    32    loop     32     32*loop        n/32   n/128
y[k]        1    loop    32   1    32    loop     32     32*loop        n/32   n/128
z[k+10]     1    loop    32   1    32    loop     32     32*loop        n/32   n/128
z[k+11]     1    loop    32   1    32    loop     32     32*loop        n/32   n/128
Σ (total)                          128   4*loop   128    128*loop       n/8    n/32

k denotes the inner loop over the array elements; l denotes the outer loop, executed loop times.
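The rows of Table 2 can be reproduced from a few products. The C sketch below illustrates the table under stated assumptions and is not the authors' tool. It assumes three things: the per-loop self-reuse factor is the product of the temporal and spatial factors; cumulative reuse multiplies these factors from the inner loop k outwards to the outer loop l; and the footprint of the k loop is n divided by R_k*. The names `self_reuse`, `self_reuse_of` and `kernel1_totals` are hypothetical.

```c
#include <assert.h>

/* Self-reuse of one array reference in the Kernel 1 loop nest
 * (l = outer loop with `loop` iterations, k = inner loop with n
 * iterations). Struct and function names are hypothetical. */
typedef struct {
    double r_k, r_l;   /* self-reuse factors R_k and R_l           */
    double rc_k, rc_l; /* cumulative self-reuse R_k* and R_l*      */
    double f_k;        /* data footprint F_k* of the k loop        */
} self_reuse;

static self_reuse self_reuse_of(double temporal_k, double temporal_l,
                                double spatial_k, double spatial_l,
                                double n)
{
    self_reuse s;
    assert(temporal_k > 0 && spatial_k > 0);
    /* Assumed: per-loop self-reuse = temporal factor * spatial factor. */
    s.r_k = temporal_k * spatial_k;
    s.r_l = temporal_l * spatial_l;
    /* Assumed: cumulative reuse multiplies factors from the innermost
     * loop outwards, R_k* = R_k and R_l* = R_k * R_l. */
    s.rc_k = s.r_k;
    s.rc_l = s.r_k * s.r_l;
    /* Assumed: n iterations of k divided by the reuse in k. */
    s.f_k = n / s.rc_k;
    return s;
}

/* x[k], y[k], z[k+10] and z[k+11] have identical rows in Table 2,
 * so the totals row is four times a single row. */
static self_reuse kernel1_totals(double n, double loop)
{
    self_reuse one = self_reuse_of(1.0, loop, 32.0, 1.0, n);
    self_reuse sum = { 4 * one.r_k, 4 * one.r_l,
                       4 * one.rc_k, 4 * one.rc_l, 4 * one.f_k };
    return sum;
}
```

For n = 1024 and loop = 10, `self_reuse_of(1, 10, 32, 1, 1024)` yields R_k* = 32, R_l* = 320 and F_k* = 32. `kernel1_totals(1024, 10)` yields 128, 40, 128, 1280 and 128 = n/8, which matches the totals row. The sketch does not compute the F_l* column, because the excerpt does not state how its n/128 entries are derived.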


Table 3. Spatial-reuse factors for the Livermore Loops Kernel 1 (hydro fragment)

Reference   Reuse factors
            Self-spatial reuse  Group-temporal reuse  Cumulative group reuse
            k     l             k     l               k     l
z[k+11]     32    1             1     1               32    32
z[k+10]     32    1             1     loop            32    32*loop
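The group-reuse entries of Table 3 can be checked in the same way. Below is a minimal C sketch. It assumes that the cumulative group reuse multiplies the self-spatial and group-temporal factors loop by loop from k outwards; the table's values suggest this, although the excerpt does not spell it out. `group_reuse_distance` and `cumulative_group_reuse` are hypothetical names. The distance function shows why z[k+11] is ordered before z[k+10]. The element that z[k+11] reads in iteration k is read again by z[k+10] in iteration k+1, which gives a nonnegative distance of 1.

```c
#include <assert.h>
#include <stddef.h>

/* Group-reuse distance (in iterations of k) between two uniformly
 * generated references z[k + c_lead] and z[k + c_follow]: the element
 * touched by the leader at iteration k is touched again by the
 * follower at iteration k + (c_lead - c_follow). */
static long group_reuse_distance(long c_lead, long c_follow)
{
    return c_lead - c_follow;
}

/* Assumed rule: cumulative group reuse multiplies the self-spatial
 * (ss_*) and group-temporal (gt_*) factors from the inner loop k
 * outwards to the outer loop l. */
static void cumulative_group_reuse(double ss_k, double ss_l,
                                   double gt_k, double gt_l,
                                   double *cum_k, double *cum_l)
{
    assert(cum_k != NULL && cum_l != NULL);
    *cum_k = ss_k * gt_k;
    *cum_l = *cum_k * ss_l * gt_l;
}
```

With loop = 10, the z[k+10] row yields cumulative factors 32 and 320 (= 32*loop). The z[k+11] row yields 32 and 32.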


For a 4-byte array element, the value F_l* = 196608 corresponds to 768 KB. After the tiling technique was applied and the data was split into blocks with side B = 32 elements, the data footprint decreased to an amount that fits entirely in DataCache-L1:

$$F_l^* = \frac{6B^2}{128}, \qquad B = 32, \qquad F_l^* = \frac{6 \cdot 32^2}{128} = 48.$$

For a 4-byte array element, the value F_l* = 48 corresponds to 192 B. Note that DataCache-L1 was shared between two parallel threads.
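The excerpt names the tiling technique but does not list the transformed code. The following C sketch shows one common form of tiling applied to the standard formulation of Livermore Kernel 1, x[k] = q + y[k]*(r*z[k+10] + t*z[k+11]). It strip-mines the k loop into blocks of B elements and interchanges it with the repetition loop l. This is an illustrative assumption. The authors' tiled code, which yields the footprint 6B^2/128 above, may be structured differently. The function names are hypothetical, and `float` is used to match the 4-byte element size.

```c
#include <assert.h>
#include <stddef.h>

/* Livermore Kernel 1 (hydro fragment), untiled: the outer loop l
 * repeats the whole sweep over k `loop` times. z needs n + 11 elements. */
static void kernel1(float *x, const float *y, const float *z,
                    float q, float r, float t, size_t n, int loop)
{
    for (int l = 1; l <= loop; l++)
        for (size_t k = 0; k < n; k++)
            x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
}

/* Tiled variant: k is split into blocks of B elements and l is moved
 * inside, so each block of x, y and z is reused `loop` times while it
 * is still cache-resident. The interchange is legal because no
 * iteration reads x, so every order of (l, k) gives the same result. */
static void kernel1_tiled(float *x, const float *y, const float *z,
                          float q, float r, float t, size_t n, int loop,
                          size_t B)
{
    assert(B > 0);
    for (size_t kk = 0; kk < n; kk += B) {
        size_t kend = (kk + B < n) ? kk + B : n; /* last block may be short */
        for (int l = 1; l <= loop; l++)
            for (size_t k = kk; k < kend; k++)
                x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
    }
}
```

With B = 32, `kernel1_tiled` produces the same x as `kernel1`. Only the order in which the (l, k) iterations touch memory changes, so each block of y and z is reused loop times before the next block is loaded.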

In the case of the Livermore Loops Kernel 1 (hydro fragment), the self-reuse factors are identical for the source with fine-grained parallelism and for the source with synchronization-free slices extracted. There is also group reuse between the references z[k+11] and z[k+10], which are sorted so that the reuse distance between adjacent references is lexicographically nonnegative. Both references also exhibit self-temporal and self-spatial reuse. The group-spatial reuse factor equals one, since the spatial reuse is already accounted for by the self-spatial reuse factor. To take reuse between references into account, a generalized data reuse factor for the outermost loop L1 is computed by dividing the data footprint by the cumulative group reuse factor, which finally gives:

$$\frac{n/32}{32 \cdot \mathit{loop}} = \frac{n}{32^2 \cdot \mathit{loop}}.$$
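The arithmetic in this section can be reproduced in a few lines of C. This is a hedged sketch with hypothetical helper names. `generalized_reuse` divides a footprint by a cumulative group reuse factor. `tiled_footprint` evaluates 6B^2/128 as given in the text. `footprint_bytes` multiplies an element count by the element size.

```c
#include <assert.h>

/* Generalized data reuse factor of the outermost loop:
 * data footprint divided by the cumulative group reuse factor. */
static double generalized_reuse(double footprint, double cumulative_group)
{
    assert(cumulative_group > 0);
    return footprint / cumulative_group;
}

/* Footprint after tiling with block side B, using the formula
 * F_l* = 6 * B^2 / 128 quoted in the text. */
static double tiled_footprint(double B)
{
    return 6.0 * B * B / 128.0;
}

/* Convert a footprint expressed in array elements into bytes. */
static unsigned long footprint_bytes(unsigned long elements,
                                     unsigned long elem_size)
{
    return elements * elem_size;
}
```

For example, n = 65536 and loop = 4 give (n/32)/(32*loop) = 2048/128 = 16 = n/(32^2*loop). `tiled_footprint(32)` returns 48. With a 4-byte element, `footprint_bytes` converts 196608 and 48 elements into 786432 B (768 KB) and 192 B.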

3. Experiments

Experiments were performed using the IBM PowerPC Multi-Core Instruction Set Simulator v1.29 (MC-ISS) [7], intended for the PowerPC


