Okay, now we're going to see part two of cache organization. Let's start with the general organization, okay?
So as I've said before, the cache has a certain number of sets; that's where data can go. And each set has a certain number of lines or blocks. By the way, cache block and cache line mean the same thing, okay, I might use them interchangeably. So a set has a number of lines, which is the same thing as the number of ways. If I say a cache is four-way set associative, that means we're going to have one, two, three, four lines in a set, okay? So in general a cache has S = 2^s sets and E = 2^e lines per set, okay?
Now, each line in the cache has a bit of metadata, right?
The first one is the tag that we saw in the previous video. Right, the tag encodes the rest of the address, the part that cannot be figured out from the set index alone, okay, so we know what data was stored in that block of the cache.
There's also a valid bit here. The valid bit is used to determine whether the data stored in the line is valid or not. For example, when the cache starts up, it starts up empty... So if there's some random data there, we don't want that to get confused with real, valid data, so the valid bits all start set to 0.
The valid bit is also used in the context of multicores, when we might have to invalidate data when there are remote updates. But that's the subject of a more advanced computer architecture class.
Finally, there are B = 2^b bytes in a cache line, okay. The reason we are using powers of two here is so that it's easy to figure out how many bits we're going to need from the address, okay. So a cache line, or cache block, in our example has as many bytes as the value of this B variable here.
So that means that the cache size is C = S × E × B: the total number of sets, multiplied by the number of ways or lines per set, multiplied by the size of the block, capital B.
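To make that concrete, here's a minimal sketch in C; the values of s, e, and b are assumptions for illustration, not the lecture's example:

```c
#include <stdio.h>

int main(void) {
    /* Assumed parameters, for illustration only. */
    unsigned s = 6, e = 3, b = 6;
    unsigned S = 1u << s;   /* 64 sets            */
    unsigned E = 1u << e;   /* 8 lines per set    */
    unsigned B = 1u << b;   /* 64 bytes per block */
    /* C = S * E * B = 64 * 8 * 64 = 32768 bytes = 32 KB */
    printf("cache size C = %u bytes\n", S * E * B);
    return 0;
}
```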
Let's see how a cache read works now. There are multiple steps in a cache read. When the processor sends an address to the cache to be looked up, the first thing the cache has to do is locate the set where the data goes, okay?
So remember that we saw in the previous example how to determine where the data goes. We saw that there are some bits of the address that are used to determine in which set the data goes, okay. And those bits are here in the middle, right? So we're going to have s bits, because if I have 2^s sets, I'm going to use s bits as the set index, okay. So I'm going to use that part of the address as an index into the cache. The set index runs zero, one, two, up to 2^s minus 1, and we use that value to determine which set here we're going to use, okay?
And once we do that, what we're going to do is see whether the tag of the address, these t bits, the upper part of the address, matches the tag stored in the block, okay? So that's what we're going to do. And then we're going to use the lower part of the address, the block offset, to determine which part of the block we are going to read. Because, as I said, a cache block or cache line can be 16, 32, or 64 bytes, but we often read one register's worth of data, which could be four or eight bytes, or one byte, so we have to get only part of the block.
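Here's a sketch of that three-way address split; the widths of s and b are the same assumed values as in the earlier snippet, not the lecture's figures:

```c
#include <stdint.h>

#define B_BITS 6  /* b: assumed 64-byte blocks */
#define S_BITS 6  /* s: assumed 64 sets        */

/* Lowest b bits: which byte inside the block. */
static inline uint64_t block_offset(uint64_t addr) {
    return addr & ((1ull << B_BITS) - 1);
}
/* Middle s bits: which set the block maps to. */
static inline uint64_t set_index(uint64_t addr) {
    return (addr >> B_BITS) & ((1ull << S_BITS) - 1);
}
/* Remaining upper t bits: the tag stored alongside the block. */
static inline uint64_t tag_bits(uint64_t addr) {
    return addr >> (B_BITS + S_BITS);
}
```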
So let's see now how this works. I'm going to show how this works for a direct-mapped cache, and with a direct-mapped cache there's a single line per set. Okay, so we start with our address. We're going to find the set; it happens to be 0001 here. We know that we're interested in this line here. What is the next step? Well, the next step is to select that line and then determine whether the tag matches.
Okay. If the tag, the t bits, the upper part of the address, matches the tag stored in the cache, and the valid bit is set, we have a hit. If either one of these is not true, if it's either not valid or not a match, it's going to be a miss, okay?
Okay, so now let's assume that the cache block size is 8 bytes, which means that here we have 8 bytes, okay? Now we're going to use the block offset to determine which bytes to read; it happens to be 100 here, which is in this part of the address, so these are the 4 bytes that are going to be read in this example, okay?
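Putting those read steps together, here's a minimal direct-mapped lookup sketch building on the helper functions above; the geometry is still the assumed one, not the lecture's 8-byte-block example:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_SETS    (1u << S_BITS)
#define BLOCK_BYTES (1u << B_BITS)

struct line {
    bool     valid;
    uint64_t tag;
    uint8_t  data[BLOCK_BYTES];
};

/* Direct mapped: exactly one line per set. */
struct line dm_cache[NUM_SETS];

/* Returns true on a hit, copying `size` bytes starting at the offset. */
bool dm_read(uint64_t addr, void *out, size_t size) {
    struct line *ln = &dm_cache[set_index(addr)];         /* 1. locate the set    */
    if (ln->valid && ln->tag == tag_bits(addr)) {         /* 2. valid + tag match */
        memcpy(out, ln->data + block_offset(addr), size); /* 3. block offset      */
        return true;                                      /* hit */
    }
    return false;  /* miss: the line would be refilled from memory */
}
```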
Great, so now what if there's no match? If it's a cache miss, what's going to happen? The old line, whatever was here, is going to be thrown away, and the data that comes back from memory is going to replace the data that was there. Okay.
Now we know how reads work on a direct-mapped cache. Let's see how they work on a set-associative cache. Okay. In general it's an E-way set-associative cache, okay. In our example here let's make E equal to 2. That means it's a 2-way set-associative cache, which means I have two lines per set, okay?
What is the first thing we're going to do? Again, we're going to use the middle part of the address to find which set it goes to. Let's say that it goes again to set one. But now what we're going to do is select this set here. Now we are focusing on two lines; it means that the data can go in two places in the cache, so the comparison of the t bits of the address with the tags now has to involve two values: we're doing two comparisons.
So then we do the same thing. If either one of them is valid and has a matching tag, then we have a hit. It could be neither of them; of course we're not going to have both of them match, otherwise there's probably a bug in your cache. It should be one or the other but not both, okay? So we should hit in at most one of them, but if both of them miss, if the tags don't match or the valid bit is set to 0, then it's a cache miss, okay?
So let's say that we determine that this is a hit. Now the same thing: we're going to get the data. We happen to do a two-byte read here, a short int, right, with only two bytes... That's where the offset starts. We're going to get these two bytes here, read them, and send them to the processor, okay? If there's no match, one of the lines is selected for eviction and replaced. As I said before, we normally use the policy called least recently used, okay?
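Here's the same lookup extended to two ways, with a one-bit LRU per set; it reuses the types and helpers from the direct-mapped sketch, and this LRU encoding is just one plausible choice:

```c
#define WAYS 2

struct set {
    struct line way[WAYS];
    unsigned    lru;   /* index of the least recently used way */
};

struct set sa_cache[NUM_SETS];

bool sa_read(uint64_t addr, void *out, size_t size) {
    struct set *st = &sa_cache[set_index(addr)];          /* 1. select the set    */
    for (unsigned w = 0; w < WAYS; w++) {                 /* 2. compare both tags */
        struct line *ln = &st->way[w];
        if (ln->valid && ln->tag == tag_bits(addr)) {
            memcpy(out, ln->data + block_offset(addr), size); /* 3. offset */
            st->lru = 1 - w;  /* the other way becomes least recently used */
            return true;      /* hit */
        }
    }
    /* miss: way st->lru would be evicted and refilled from memory */
    return false;
}
```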
Okay, now let's look at the types of cache misses; there are several types of cache misses, okay? There are three types. Three types, okay? That's what we call the three Cs of cache misses. There's actually a fourth one when we have multiple processors, but that might be for another time.
The first type of miss is called a cold miss, or compulsory miss. That's a miss there's nothing you can do about: it's cold, it's the first time you access the data. We haven't seen it before, so it's a miss. It's the first time you see the data. In fact, I lied; it's not something you can do nothing about. There are some techniques, called prefetching, that can reduce that. But typically a cold miss is hard to avoid, just because it's the first access.
Now, a conflict miss is a miss that happens just because of the cache organization, when you have multiple addresses in the system that map into the same set; one can kick the other one out. When that happens, you're dealing with conflicts, and conflict is not normally a good thing. In the case of caches, blocks just kick each other out, so that's going to reduce the effectiveness of your cache, okay?
So, for example, if it's a direct-mapped cache, you have more conflict misses. As you increase the set associativity, the number of ways in your cache, you're going to have fewer conflict misses, because there will be fewer cases where data kicks other useful data out, okay? So, conflict misses happen when the cache is large enough, but multiple data objects map to the same slot, okay?
Great. Now the final type of cache miss is a capacity miss. A capacity miss just means that the amount of data that you cycle through, that you access repeatedly, which is called the working set, is larger than the cache, so inevitably you'll be kicking things out, okay?
So one way to recognize conflict misses, by the way: if you increase the associativity and the miss goes away, it's because it was a conflict miss.
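A quick sketch of what a conflict looks like, using the assumed geometry and helpers from the earlier snippets; the addresses are hypothetical:

```c
#include <assert.h>

void conflict_demo(void) {
    /* Two hypothetical addresses exactly S*B = 64*64 = 4096 bytes apart. */
    uint64_t a = 0x10000;
    uint64_t b = a + NUM_SETS * BLOCK_BYTES;
    assert(set_index(a) == set_index(b));  /* same set...       */
    assert(tag_bits(a) != tag_bits(b));    /* ...different tags */
    /* In a direct-mapped cache, alternating accesses to a and b miss
     * every time (conflict misses); in a 2-way set both fit at once. */
}
```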
Okay, so far we have talked only about reads, but writes are very important too. The most important thing about writes is that the data might exist in multiple places...
So say, for example, I have my CPU, and I have my L1 cache, L2 cache, and so on. If I access a piece of data A here from the CPU, the data is going to be stored in L1 eventually and in L2 as well. Now, this is when I'm reading A. But what happens when I now write to A? If I write only here, it's going to disagree with what's here, and vice versa.
Okay, so the main problem is that the copies of the same data spread over the caches in the hierarchy might disagree with each other. That's not a good thing; disagreement is not a good thing, okay? So, what to do now?
What to do on a write hit, that's an important question. There are two basic policies. One is called write-through, which means that if I have my CPU here, and I have an L1 cache, and in this example say there's memory below it: when the processor writes A, the CPU sends the data straight to memory. It might also update whatever is in the L1 cache, but it sends the data to memory every time. And the bad thing about that is that every time the CPU writes, something always has to go to memory, even if it's in the cache, right? If it's in the cache you do both: you immediately write it through to memory while updating the cache, okay?
Now, write-back works as follows (that was write-through). With write-back, when the CPU writes to A here, it just writes to the cache. And there's a little bit here that gets set that says: okay, this data is now dirty. It means the copy in memory might not be the same. So whenever this line is displaced, kicked out of the cache, the cache goes and sends the data to memory. Okay, so we need a dirty bit to indicate whether the line is different from memory.
So, that's what to do on a write hit; but for what to do on a write miss you have two options. One is called write-allocate: you load the block into the cache first and then do the write. And that's good, because if you're going to write and then do more writes, that's a good thing, because we've allocated the data in the cache. Okay.
The other option is no-write-allocate. So if it's a miss, you just write to memory and don't do anything to the caches, okay? Another thing write-allocate is good for: if you do a write that's later followed by a read, that's a good thing, because the data is likely to be in the cache.
So typical caches are write-back with write-allocate; usually that's the common case, although write-through with no-write-allocate you occasionally see, especially in some machines that have multiple processors.
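Here's a minimal sketch contrasting the two write-hit policies, building on the direct-mapped structure above; assume struct line gained a bool dirty field, and memory_write() is a hypothetical helper standing in for the next level of the hierarchy:

```c
void memory_write(uint64_t addr, const void *src, size_t size); /* hypothetical */

void write_through_store(uint64_t addr, const void *src, size_t size) {
    struct line *ln = &dm_cache[set_index(addr)];
    if (ln->valid && ln->tag == tag_bits(addr))
        memcpy(ln->data + block_offset(addr), src, size); /* update the cache */
    memory_write(addr, src, size); /* and always write to memory too */
}

void write_back_store(uint64_t addr, const void *src, size_t size) {
    struct line *ln = &dm_cache[set_index(addr)];
    if (ln->valid && ln->tag == tag_bits(addr)) {
        memcpy(ln->data + block_offset(addr), src, size);
        ln->dirty = true; /* memory is updated only when the line is evicted */
    } else {
        /* Write miss: write-allocate would fetch the block into the cache
         * and then store; no-write-allocate would just call memory_write(). */
    }
}
```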
To end this video, let's revisit the Pentium... not the Pentium, the Core i7 cache hierarchy, okay? Remember that I showed you there were multiple L1 caches, L2 caches, and there's an L3 cache shared by all of the cores, but now we can understand some of the other data here.
When I said that the L1 is 8-way set associative, that means that both L1s, the data L1 and the instruction L1, the d-cache and the i-cache, are set associative; they're not direct mapped. And that's a pretty high associativity, because the designers decided it was worth paying the complexity to reduce conflict misses.
Now, the L2 unified cache: by the way, the L1 caches are 32 kilobytes each. The L2 cache is eight times bigger, and it's also eight-way set associative. And notice that the access time of the L1 is four cycles, while the access time of the L2 is 11 cycles.
Now the L3 is much, much bigger; we're talking about eight megabytes, and it has even higher associativity, to get even fewer conflict misses. And that's because the access time is also much slower than the L2. But the reason the L3 has much higher associativity is that you want to reduce the misses: when you reach the L3 you'd better hit, because if you don't, you have to go to memory, which is much, much more expensive, okay?
And the block size here is 64 bytes in all cases.
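As a closing sanity check, the L1 numbers from the lecture pin down the whole geometry; this sketch just does the arithmetic, it adds nothing beyond the stated figures:

```c
#include <stdio.h>

int main(void) {
    /* Core i7 L1 figures from the lecture: 32 KB, 8-way, 64-byte blocks. */
    unsigned C = 32 * 1024;    /* capacity in bytes   */
    unsigned E = 8;            /* lines per set       */
    unsigned B = 64;           /* block size in bytes */
    unsigned S = C / (E * B);  /* S = C / (E*B) = 64 sets */
    printf("L1: %u sets -> s = 6 set-index bits, b = 6 offset bits\n", S);
    return 0;
}
```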
Okay, see you soon.