


10 Włodzimierz Bielecki, Krzysztof Kraska

405/440/460 embedded systems development and the related IBM RISCWatch v6.0i debugger [8]. Cache utilization was read from the DCU statistics of the Simulator (sim readdcu).

The following configuration of the Simulator was used to conduct experiments:

- 2 x PowerPC405 processors with
  - 16 KB two-way set-associative DataCache-L1 (8 words / 32 bytes per cache line)
  - no DataCache-L2.
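The cache geometry implied by this configuration can be worked out with a few lines of C. This is only an illustrative sketch: the function names are hypothetical, and the plain modulo mapping from line number to set is an assumption, not something stated for the Simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry taken from the configuration: 16 KB, two-way, 32-byte lines. */
enum {
    CACHE_BYTES = 16 * 1024,
    CACHE_WAYS  = 2,
    LINE_BYTES  = 32
};

/* Total number of cache lines: 16384 / 32. */
unsigned cache_lines(void)
{
    return CACHE_BYTES / LINE_BYTES;
}

/* Number of sets: total lines divided by the associativity. */
unsigned cache_sets(void)
{
    return cache_lines() / CACHE_WAYS;
}

/* Set selected by a byte address.
 * ASSUMPTION: simple modulo indexing on the line number. */
unsigned set_index(uint32_t addr)
{
    assert(cache_sets() > 0);
    return (addr / LINE_BYTES) % cache_sets();
}
```

Under these assumptions the cache holds 512 lines in 256 sets, so two addresses that are 8 KB apart compete for the same set.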

The sources used in the experiments were developed in a manner representative of embedded software development, using a cross-platform development environment composed of an Intel PC workstation and the target executable architecture [8]. The examined C sources were compiled on Fedora 4 Linux x86 to the PowerPC Embedded ABI file format with the gcc-3.3.1 compiler and executed in the target system environment using the MC-ISS software Simulator. Due to the limitations of the target architecture, two data-processing threads were extracted in the sources. Iterations of a parallel loop were assigned to threads according to the static scheduling policy, i.e., each thread was assigned one half of the consecutive loop iterations [12].
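The static assignment of consecutive iterations described above can be sketched as follows. The `chunk` type and the `static_chunk` helper are hypothetical names introduced for illustration, and the handling of a remainder (extra iterations go to the lowest-numbered threads) is an assumption that does not matter when the iteration count is even and there are two threads.

```c
#include <assert.h>

/* Half-open iteration range [begin, end) owned by one thread. */
struct chunk {
    int begin;
    int end;
};

/* Static block scheduling: thread `tid` out of `nthreads` receives one
 * contiguous block of the `n_iters` loop iterations. */
struct chunk static_chunk(int n_iters, int nthreads, int tid)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);

    int base = n_iters / nthreads;   /* iterations every thread gets   */
    int rem  = n_iters % nthreads;   /* leftover iterations to spread  */

    struct chunk c;
    c.begin = tid * base + (tid < rem ? tid : rem);
    c.end   = c.begin + base + (tid < rem ? 1 : 0);
    return c;
}
```

For a loop of 256 iterations and two threads, thread 0 would run iterations 0 to 127 and thread 1 would run iterations 128 to 255.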

Table 4 shows the results achieved for the matrix multiplication code simulated in the MC-ISS embedded software Simulator.

Table 4. The experimental results of DCU utilization for the matrix multiplication code (N=256, B=8)

| RISCWatch STATUS | Sequential, CPU0 | Sequential, CPU1 | Parallel SFS, CPU0 | Parallel SFS, CPU1 | Parallel SFS with Blocking, CPU0 | Parallel SFS with Blocking, CPU1 | Parallel SFS with Blocking & Array Contraction, CPU0 | Parallel SFS with Blocking & Array Contraction, CPU1 |
|---|---|---|---|---|---|---|---|---|
| DCU total accesses | 31852424 | N/A | 127014104 | 127044104 | 1450122% | 1450122% | 119846472 | 119846472 |
| DCU misses | 2160751 | N/A | 8634538 | 8634538 | 317789 | 317789 | 317789 | 317789 |
| Misses/total [%] | 6.8% | N/A | 6.8% | 6.8% | 0.22% | 0.22% | 0.27% | 0.27% |
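To make the "Blocking" variant of Table 4 more concrete, here is a minimal sketch of a blocked (tiled) matrix multiplication restricted to a range of rows, so that each of two threads could process its own half. It is not the code examined in the experiments: the function name, the `int` element type, the row-major layout and the loop order are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulates C += A * B for rows [row_begin, row_end) of n x n row-major
 * matrices, walking B and C in tiles of bs x bs elements.
 * ASSUMPTION: int elements and this loop order; the examined sources may differ. */
void matmul_blocked(size_t n, size_t bs, size_t row_begin, size_t row_end,
                    const int *a, const int *b, int *c)
{
    assert(bs > 0 && row_begin <= row_end && row_end <= n);

    for (size_t kk = 0; kk < n; kk += bs) {
        size_t k_end = kk + bs < n ? kk + bs : n;
        for (size_t jj = 0; jj < n; jj += bs) {
            size_t j_end = jj + bs < n ? jj + bs : n;
            /* The bs x bs tile of B starting at (kk, jj) is reused
             * for every row of the assigned range. */
            for (size_t i = row_begin; i < row_end; i++) {
                for (size_t k = kk; k < k_end; k++) {
                    int a_ik = a[i * n + k];
                    for (size_t j = jj; j < j_end; j++)
                        c[i * n + j] += a_ik * b[k * n + j];
                }
            }
        }
    }
}
```

With the parameters from the table caption, one thread would call `matmul_blocked(256, 8, 0, 128, a, b, c)` and the other `matmul_blocked(256, 8, 128, 256, a, b, c)`. The two calls write disjoint rows of `c`, so they need no synchronization.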

Table 5 shows the results obtained for the Livermore loop Kernel 1 (hydro fragment) code executed in the MC-ISS embedded software Simulator.

Table 5. The experimental results of DCU utilization for the Kernel 1 (loop=100; array_size=8192*sizeof(int))

| RISCWatch STATUS | Sequential, CPU0 | Sequential, CPU1 | Parallel, CPU0 | Parallel, CPU1 | Parallel SFS, CPU0 | Parallel SFS, CPU1 | Parallel SFS with Array Contraction, CPU0 | Parallel SFS with Array Contraction, CPU1 |
|---|---|---|---|---|---|---|---|---|
| DCU total accesses | 11527637 | N/A | 5800687 | 5800687 | 11576472 | 11576472 | 8399916 | 8399916 |
| DCU misses | 309546 | N/A | 155799 | 155799 | 5130 | 5130 | 5131 | 5131 |
| Misses/total [%] | 2.69% | N/A | 2.69% | 2.69% | 0.04% | 0.04% | 0.06% | 0.06% |
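For orientation, the loop body of Livermore Kernel 1 (hydro fragment) in its commonly published form is `x[k] = q + y[k] * (r * z[k+10] + t * z[k+11])`. The sketch below applies it to a range of iterations so that a static half split between two threads is easy to express. The function name and the `int` element type (suggested by `sizeof(int)` in the table caption) are assumptions, and this is not the transformed code measured in Table 5.

```c
#include <assert.h>
#include <stddef.h>

/* Livermore Kernel 1 (hydro fragment) over iterations [begin, end).
 * z must hold at least end + 11 elements.
 * ASSUMPTION: int arrays and scalars; the examined sources may differ. */
void kernel1_range(size_t begin, size_t end, int q, int r, int t,
                   int *x, const int *y, const int *z)
{
    assert(begin <= end);

    for (size_t k = begin; k < end; k++)
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
}
```

Each iteration writes only `x[k]` and reads only `y` and `z`, so splitting the range into two halves introduces no dependence between the threads.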

After extraction of synchronization-free slices, the DCU misses/total ratio stayed the same for the matrix multiplication code (6.8%) and improved markedly for Kernel 1 (from 2.69% to 0.04%).
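The misses/total rows of both tables follow directly from the two rows above them. A minimal helper, with a hypothetical name, that reproduces the arithmetic:

```c
#include <assert.h>

/* DCU miss ratio in percent: 100 * misses / total accesses. */
double miss_ratio_percent(unsigned long misses, unsigned long total)
{
    assert(total > 0);
    return 100.0 * (double)misses / (double)total;
}
```

For the sequential matrix multiplication run, `miss_ratio_percent(2160751, 31852424)` gives roughly 6.78, which the table reports as 6.8%.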


