concurrencyruntime

background image

Don McCrady, Principal Development Lead
Parallel Computing Platform
June 7-10

background image

Some cool new C++
Parallel Iteration
Tricks for reducing shared state
Asynchronous Agents
Concurrent containers

background image

Demo: N-Bodies

background image
background image

Part of the C++ Runtime

No new libraries to link in
PPL: Parallel Pattern Library
Agents: Asynchronous Agents Library

Abstracts away the notion of threads

Tasks are computations that may be run in parallel

Use PPL & Agents to express your potential concurrency

Let the runtime map it to the available concurrency
Scale from 1 to 256 cores

background image

int

x = 5;

int

y = 7;

auto

functor =

[x,&y] (

int

z) {

y += x - z;

};

functor(3);
cout << y;

class

_FT {

public

:

_FT(

int

x,

int

& y)

: _x(x), _y(y)

{ }

void operator

()(

int

z) {

_y += _x - z;

}

private

:

int

_x;

int

& _y;

};

int

x = 5;

int

y = 7;

_FT functor(x, y);
functor(3);
cout << y;

background image

#include

<vector>

#include

<algorithm>

using namespace

std;

vector<

int

> v = …;

foreach(v.begin(), v.end(), [&v] (

int

item) {

cout << item << endl;

});

Lambdas make functional programming palatable in C++

background image

#include

<ppl.h>

using namespace

Concurrency;

parallel_for(0, 1000, [] (

int

i) {

work(i);

});

parallel_for iterates over a range in parallel

background image

Order of iteration is
indeterminate.

Cores may come and
go.

Ranges may be
stolen by newly idle
cores.

parallel_for(0, 1000, [] (

int

i) {

work(i);

});

Core 4

Core 3

Core 1

work(0…249)

work(500…749)

work(750…999)

Core 2

work(250…499)

background image

parallel_for considerations:

Designed for unbalanced loop bodies

An idle core can steal a portion of another core’s range of work

Supports cancellation

Early exit in search scenarios

For fixed-

sized loop bodies that don’t need cancellation,

consider parallel_for_fixed from the sample pack.

background image

parallel_for(0, yBound, [] (

int

y) {

for

(

int

x=0; x < xBound; ++x) {

complex c(minReal + deltaReal * x,

minImag + deltaImag * y);

Color pixel = ComputeMandelBrotColor(c);

}

});

Parallelize outer loops first

Usually plenty of outer loop iterations to spread out to all cores

Inner loops do sufficient work to overcome parallel overheads

background image

#include

<ppl.h>

using namespace

Concurrency;

vector<

int

> v = …;

parallel_for_each(v.begin(), v.end(), [] (

int

i) {

work(i);

});

parallel_for_each iterates over an STL container in
parallel

background image

Works best with containers that support random-
access iterators:

std::vector, std::array, std::deque,
Concurrency::concurrent_vector, …

Works okay, but with higher overhead on containers
that support forward (or bi-di) iterators:

std::list, std::map, …

background image

Shared state kills scalability of parallel
iteration

critical_section cs;

double

sum = 0;

parallel_for(0, 1000, [&sum, &cs] (

int

i) {

cs.lock();
if (SomeCondition(i))

sum += SomeComputation(i);

cs.unlock();
SomeFurtherComputation(i);

});

• High contention: entire

loop is serialized.

• Cache thrashing.

• Potential thread

explosion.

background image

Reduce contention if possible.

critical_section cs;

double

sum = 0;

parallel_for(0, 1000, [&sum, &cs] (

int

i) {

if (SomeCondition(i)) {

cs.lock();
sum += SomeComputation(i);
cs.unlock();

}
SomeFurtherComputation(i);

});

• Contention potentially

reduced by moving lock
inside the if-statement.

• Still thrashes the cache.

background image

Use combinable for per-thread computations.

Each thread has its own state; no shared state.
Operations must be commutative.

combinable<

double

> sums;

parallel_for(0, 1000, [&sums] (

int

i) {

if (SomeCondition(i))

sums.local() += SomeComputation(i);

SomeFurtherComputation(i);

});

double

sum = sums.combine(std::plus<

double

>());

• Practically zero

contention.

• No cache thrashing.

background image

Demo: Relatively Prime Numbers

background image

Not all patterns map to loops or tasks.

Pipelines, state machines, producer/consumer

Agent: an asynchronous object that communicates
through message passing.
Message Blocks: participants in message-passing which
transport from source to target.
Message: encapsulates state that is transferred between
message blocks.

background image

Message blocks for
storing data

unbounded_buffer<T

>

overwrite_buffer<T>
single_assignment<T>

Message blocks for
pipelining

transformer<T,U>
call<T>

Message blocks for
joining data

choice
join

Send and receive

send, asend
receive
try_receive

background image

send

unbounded_buffer

transformer

(reverse)

receive

“glorp”`

propagat

e

“glorp”`

propagat

e

“prolg”`

background image

class

ReverserAgent :

public

Concurrency::agent

{

private

:

transformer<string,string> reverser;

public

:

unbounded_buffer<string> inputBuffer;

ReverseAgent()

: reverser([] (string in) -> string {

string reversed(in);
reverse(reversed.begin(), reversed.end());

return

reversed;

})

{

inputBuffer.link_target(&reverser);

}

protected:

virtual void

run();

};

background image

void

ReverserAgent::run() {

for

(;;) {

string s = receive(&reverser);

if (s ==

"pots"

) {

done();

return

;

}

cout <<

"Received message : "

<< s << endl;

}

}

background image

void

main()

{

ReverserAgent reverseAgent;
reverseAgent.start();

for

(;;) {

string s;
cin >> s;
send(reverseAgent.inputBuffer, s);

if

(s ==

"stop"

)

break

;

}

agent::wait(&reverseAgent);

}

background image

Demo: String Reverse Agent

background image

Two thread-safe, lock-free containers provided:

concurrent_vector<T>:

Lock-free push_back, element access, and iteration
No deletion!

concurrent_queue<T>:

Lock-free push and pop

Sample pack adds:

concurrent_unordered_map<T,U>
concurrent_set<T>

background image

#include

<ppl.h>

#include

<concurrent_vector.h>

using namespace

Concurrency;

concurrent_vector<

int

> carmVec;

parallel_for(2, 5000000, [&carmVec](

int

i) {

if

(is_carmichael(i))

carmVec.push_back(i);

});

background image

#include

<ppl.h>

#include

<concurrent_queue.h>

using namespace

Concurrency;

concurrent_queue<

int

> itemQueue;

parallel_invoke([&itemQueue] {

// Produce 1000 items

for

(

int

i=0; i<1000; ++i)

itemQueue.push(i);

},

[&itemQueue] {

// Consume 1000 items

for

(

int

i=0; i<1000; ++i) {

int

result = -1;

while

(!itemQueue.try_pop(result))

Context::Yield();

ProcessItem(result);

}

});

background image

The “Many Core Shift” is happening
VS2010 with the Concurrency Runtime can help
Use PPL & Agents to express your potential concurrency

Let the runtime figure out the actual concurrency

Parallel iteration can help your application scale
Asynchronous Agents provide isolation from shared state
Concurrent collections are scalable and lock-free

background image

Parallel Computing Developer Center

http://msdn.com/Concurrency

ConcRT Sample Pack

http://code.msdn.com/concrtextras

Native Concurrency Blog

http://blogs.msdn.com/nativeconcurrency

Forums

http://social.msdn.microsoft.com/Forums/en-

US/category/parallelcomputing

background image
background image

bool

is_carmichael(

const int

n) {

if

(n < 2) {

return

false; }

int

k = n;

for

(

int

i = 2; i <= k / i; ++i) {

if

(k % i == 0) {

if

((k / i) % i == 0) {

return

false; }

if

((n - 1) % (i - 1) != 0) {

return

false; }

k /= i;
i = 1;

}

}

return

k != n && (n - 1) % (k - 1) == 0;

}


Wyszukiwarka

Podobne podstrony:
concurso dgire asesinos
Gramática Exercícios Português Para Concurso
CONCURRENT ENGINEERING
CONCURRENT ENGINEERING
1 5 Data Concurrency and Locking Labid 8997
concurrent programming in mac os x and ios
Ada95 concurrency id 51185 Nieznany (2)
COTA RACIAL CONCURSOS
Coupling of Technologies for Concurrent ECD and Barite Sag Management
Edital 10 16 Concurso Público Prefeitura de Colombo
ANTISOCIAL BEHAVIOR AND DEPRESSIVE SYMPTOMS LONGITUDINAL AND CONCURRENT RELATIONS
1001 QUESTÕES DE CONCURSO PORTUGUÊS FCC 2012
Coupling of Technologies for Concurrent ECD and Barite Sag Management
Brasel, Gips Media Multitasking Behavior Concurrent Television and Computer Usage
Redação Teoria e Prática Série Provas & Concursos 2ª Ed 2012
4 7) 1001 QUESTÕES DE CONCURSO DIREITO PROCESSUAL DO TRABALHO FCC 2012
3 7) 1001 QUESTÕES DE CONCURSO DIREITO DO TRABALHO FCC 2012

więcej podobnych podstron