
10 Włodzimierz Bielecki, Krzysztof Kraska

405/440/460 embedded systems development and the related IBM RISCWatch v6.0i debugger [8]. Cache utilization was obtained from the DCU statistics (sim readdcu) of the simulator.

The following simulator configuration was used in the experiments:

- 2 x PowerPC405 processors with

- 16KB two-way set-associative DataCache-L1 (8 words/32 bytes per cache line)

- no DataCache-L2.
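The cache parameters listed above fix the number of lines and sets in the L1 data cache. The following is a minimal sketch of that arithmetic. The type `cache_geom` and both helper functions are hypothetical names introduced here, and the set-index mapping (line number modulo number of sets) is the conventional one, assumed rather than taken from the PowerPC 405 documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Cache geometry; the values used in the experiments are
   16 KB, 2 ways, 32-byte lines (8 words of 4 bytes). */
typedef struct {
    unsigned size_bytes;
    unsigned ways;
    unsigned line_bytes;
} cache_geom;

/* Number of sets = total size / (line size * associativity). */
static unsigned cache_num_sets(cache_geom g)
{
    assert(g.ways > 0 && g.line_bytes > 0);
    return g.size_bytes / (g.line_bytes * g.ways);
}

/* Set selected by an address, ASSUMING the conventional mapping
   (line number modulo number of sets); the real DCU may differ. */
static unsigned cache_set_index(cache_geom g, uint32_t addr)
{
    return (addr / g.line_bytes) % cache_num_sets(g);
}
```

Under these assumptions the configuration yields 512 cache lines organized into 256 sets, so two addresses that are a multiple of 8 KB apart compete for the same set.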

The sources used in the experiments were developed in a manner representative of embedded software development, using a cross-platform development environment composed of an Intel PC workstation and the target executable architecture [8]. The examined C sources were compiled on Fedora 4 Linux x86 to the PowerPC Embedded ABI file format with the gcc-3.3.1 compiler and executed in the target system environment using the MC-ISS software simulator. Due to the limitations of the target architecture, two data-processing threads were extracted in the sources. Iterations of a parallel loop were assigned to threads according to the static scheduling policy, i.e., each thread is assigned one half of the consecutive loop iterations [12].
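The static assignment of consecutive iterations described above can be sketched as follows. This is an illustration only: `iter_range` and `static_chunk` are hypothetical names, and giving the remainder iterations to the lowest-numbered threads is an assumption that does not matter when the iteration count is even and there are two threads.

```c
#include <assert.h>

/* Half-open range [begin, end) of loop iterations owned by one thread. */
typedef struct {
    int begin;
    int end;
} iter_range;

/* Static block scheduling: thread `tid` out of `nthreads` receives one
   contiguous chunk of the n iterations. When n is not divisible by
   nthreads, the first (n % nthreads) threads get one extra iteration. */
static iter_range static_chunk(int n, int nthreads, int tid)
{
    assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = n / nthreads;
    int rem = n % nthreads;
    iter_range r;
    r.begin = tid * base + (tid < rem ? tid : rem);
    r.end = r.begin + base + (tid < rem ? 1 : 0);
    return r;
}
```

With two threads and 256 iterations, thread 0 would execute iterations 0 to 127 and thread 1 iterations 128 to 255.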

Table 4 shows the results achieved for the matrix multiplication code simulated in the MC-ISS embedded software simulator.

Table 4. The experimental results of DCU utilization for the matrix multiplication code (N=256, B=8)

RISCWatch STATUS	Sequential		Parallel SFS		Parallel SFS with Blocking		Parallel SFS with Blocking & Array Contraction
	CPU0	CPU1	CPU0	CPU1	CPU0	CPU1	CPU0	CPU1
DCU total accesses	31852424	N/A	127014104	127044104	1450122[...]	1450122[...]	119846472	119846472
DCU misses	2160751	N/A	8634538	8634538	317789	317789	317789	317789
Misses/total [%]	6.8%	N/A	6.8%	6.8%	0.22%	0.22%	0.27%	0.27%
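For readers unfamiliar with blocking, the sketch below shows one common way to tile a matrix multiplication so that each thread works on its own rows of the result. It is not the authors' code: the function name, the `int` element type, the row-major layout, and the loop order are all assumptions made for illustration. In the experiments above the parameters would be n = 256 and bs = 8.

```c
#include <assert.h>

/* Smaller of two ints; used to clip a tile at the matrix edge. */
static int min_int(int x, int y) { return x < y ? x : y; }

/*
 * Accumulates c += a * b for row-major n x n matrices, restricted to
 * rows [row_begin, row_end) of c. Loops are tiled with block size bs.
 * Hypothetical helper written for illustration only.
 */
static void matmul_blocked(const int *a, const int *b, int *c,
                           int n, int bs, int row_begin, int row_end)
{
    assert(n > 0 && bs > 0);
    assert(0 <= row_begin && row_begin <= row_end && row_end <= n);

    for (int ii = row_begin; ii < row_end; ii += bs)
        for (int kk = 0; kk < n; kk += bs)
            for (int jj = 0; jj < n; jj += bs)
                /* Process one bs x bs tile while its data is likely cached. */
                for (int i = ii; i < min_int(ii + bs, row_end); i++)
                    for (int k = kk; k < min_int(kk + bs, n); k++) {
                        int a_ik = a[i * n + k];
                        for (int j = jj; j < min_int(jj + bs, n); j++)
                            c[i * n + j] += a_ik * b[k * n + j];
                    }
}
```

Because two calls with disjoint row ranges write disjoint rows of `c`, they could run on the two processors without synchronization, which is the property the slices in the table rely on.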

Table 5 shows the results obtained for the Livermore loop Kernel 1 (hydro fragment) code executed in the MC-ISS embedded software simulator.

Table 5. The experimental results of DCU utilization for the Kernel 1 code (loop=100; array_size=8192*sizeof(int))

RISCWatch STATUS	Sequential		Parallel		Parallel SFS		Parallel SFS with Array Contraction
	CPU0	CPU1	CPU0	CPU1	CPU0	CPU1	CPU0	CPU1
DCU total accesses	11527637	N/A	5800687	5800687	11576472	11576472	8399916	8399916
DCU misses	309546	N/A	155799	155799	5130	5130	5131	5131
Misses/total [%]	2.69%	N/A	2.69%	2.69%	0.04%	0.04%	0.06%	0.06%
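For reference, the standard form of the Livermore Kernel 1 loop body is reproduced in the sketch below, restricted to an iteration sub-range so that each thread can process its own half. The function name and signature are hypothetical, and the `int` element type is inferred from the `sizeof(int)` in the table caption. The exact source used in the experiments is not shown in the paper excerpt.

```c
#include <assert.h>

/*
 * Livermore Kernel 1 (hydro fragment) over iterations [begin, end).
 * z must hold at least end + 11 elements, since z[k + 11] is read.
 * Hypothetical helper; integer arithmetic is an assumption.
 */
static void hydro_fragment(int *x, const int *y, const int *z,
                           int q, int r, int t, int begin, int end)
{
    assert(0 <= begin && begin <= end);
    for (int k = begin; k < end; k++)
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
}
```

Each iteration writes only `x[k]` and reads no element of `x`, so the two halves of the iteration space are independent of each other.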

After extraction of synchronization-free slices, the examined sources achieved the same DCU misses/total ratio in the first case (matrix multiplication, 6.8%) and a much better ratio in the second case (Kernel 1, 0.04% compared with 2.69%).
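The misses/total rows of both tables follow directly from the two counters above them. A minimal sketch of that calculation is given here; the function name is hypothetical.

```c
#include <assert.h>

/* DCU miss ratio in percent, computed from the raw simulator counters. */
static double miss_ratio_percent(unsigned long misses, unsigned long total)
{
    assert(total > 0);
    return 100.0 * (double)misses / (double)total;
}
```

Applied to the Kernel 1 counters, `miss_ratio_percent(309546, 11527637)` gives about 2.69 for the sequential run and `miss_ratio_percent(5130, 11576472)` gives about 0.04 for the parallel SFS run, matching the values reported in Table 5.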
