Okay, now that we know the basics of how caches work and what makes them work, which is locality, let's look at how we put a bunch of caches together to form what we call a hierarchy.
But before we get there, let's stop and think a little bit about the cost of cache misses.
The difference in time cost between a hit and a miss is huge; it can be 100x, meaning a hit can be 100 times faster than a miss. Just to make you see this in numbers: would you believe me if I told you that a 99% hit rate is twice as good as a 97% hit rate in terms of access time?
Let me show you with an example. Say that a cache hit costs 1 cycle, and the penalty of taking a cache miss is 100 cycles, okay?
That means the average access time for a 97% hit rate is 1 cycle, because you always pay the hit time, plus 3% of the time you pay 100 cycles, which is your penalty. That leads to a 4-cycle average cost.
If you do the same calculation for 99%, it's 1 cycle for the hit time, which you pay whether the data is there or not, plus 1% of the time you pay 100 cycles.
That means the average access time is 2 cycles. So the average access time goes from 4 cycles to 2 cycles, a factor of two, when you go from a 97% hit rate to a 99% hit rate.
That's why we use miss rate more often than we use hit rate, okay?
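If you want to check that arithmetic yourself, here is a minimal C sketch (the code is not from the lecture; it just plugs in the 1-cycle hit time and 100-cycle penalty used above):

```c
#include <stdio.h>

int main(void) {
    double hit_time = 1.0;       /* cycles paid on every access      */
    double miss_penalty = 100.0; /* extra cycles paid only on a miss */

    /* average access time = hit time + miss rate * miss penalty */
    double amat_97 = hit_time + 0.03 * miss_penalty; /* 97% hit rate */
    double amat_99 = hit_time + 0.01 * miss_penalty; /* 99% hit rate */

    printf("97%% hits: %.1f cycles on average\n", amat_97);
    printf("99%% hits: %.1f cycles on average\n", amat_99);
    return 0;
}
```

Running it prints 4.0 cycles for 97% and 2.0 cycles for 99%, which is exactly the factor-of-two difference.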
So now let's look at the basic cache performance metrics, okay?
The first one is called the miss rate, okay?
The miss rate is the fraction of memory references that are not found in the cache.
That's one minus the hit rate: the hit rate is the fraction of accesses that hit in the cache, so one minus the hit rate, which we call the miss rate, is the fraction of accesses that do not hit in the cache.
And for the first level of cache, the L1 cache, which is the level of cache closest to the processor (we'll see what that means in more detail in a second), the typical miss rate is between 3 and 10%, okay?
Now, the hit time is the time it takes to deliver a line that's in the cache to the processor so that it can consume the data.
That also includes the time to determine whether the line is in the cache or not.
That's why in the previous example we included the hit time in the average access time even for the accesses that miss, okay?
The typical hit time for L1 is between 1 and 2 processor cycles, which means the L1 cache is very, very fast, okay?
Now, the miss penalty is the time you pay when you do have a cache miss and have to go to the lower levels of memory to bring the data in, okay? That varies between 50 and 200 cycles.
So that's why I said before there is roughly a 100x difference between a hit and a miss; these numbers give you an example of that.
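Putting the three metrics together, here is another small sketch (again not from the lecture; the hit time, miss rates, and penalty are just the typical ranges quoted above) that shows how sensitive the average access time is to the L1 miss rate:

```c
#include <stdio.h>

/* Average access time: you always pay the hit time, and on a miss
 * you additionally pay the miss penalty. */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    double hit_time = 2.0;       /* typical L1 hit time: 1-2 cycles     */
    double miss_penalty = 100.0; /* typical penalty: 50-200 cycles      */

    /* Sweep the typical L1 miss-rate range of 3% to 10%. */
    for (int pct = 3; pct <= 10; pct++) {
        printf("miss rate %2d%% -> %.1f cycles on average\n",
               pct, amat(hit_time, pct / 100.0, miss_penalty));
    }
    return 0;
}
```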
So there's something that we call a memory hierarchy, which is putting a bunch of types of memory, a bunch of levels of memory, together.
And the reason this is good is that smaller memories are almost always faster, but they are also more expensive.
So that suggests: why not put a little bit of fast, expensive memory close to the processor, backed up by a bigger memory that's cheaper and a little slower, and compose that all the way down to the disk, even?
And this is profitable, this is good, because there are gaps between memory technologies, and these gaps are widening, okay?
If you look at the performance of registers versus caches, that gap is widening with new processor generations, and so is the gap between cache performance and DRAM performance, and also between DRAM and disk, and so on.
And since well-written programs exhibit locality, that suggests you can build progressively larger and slower levels of memory and still give the illusion, from the processor's point of view, that memory is pretty fast most of the time.
So these properties complement each other beautifully to form a large pool of memory that behaves as if it were very, very fast, even though it's composed to a large extent of large, slow memory, okay?
So they really suggest organizing these memories in the form of a hierarchy, okay?
By that I mean that we have one memory backed by another memory, backed by another memory, and so on, okay?
So here's another way of looking at the fundamental idea of a hierarchy: each level k of the hierarchy serves as a cache for the level below it, and the level below is typically larger, slower, and cheaper as well.
Well, why does this work? Because of what I just said before, because of locality: programs tend to access the data at level k much more often than the data at level k+1, okay?
Therefore the storage at level k+1 can be slower, and therefore larger and cheaper per bit.
So again, repeating myself, the big idea of the hierarchy is to create a large pool of storage that costs about as much as the cheap storage at the bottom, okay, but that serves data to the program mostly from the upper levels, which are faster.
So you create the illusion of a large pool of memory that is fast and cheap.
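To see that illusion at work, here is a self-contained C sketch (not from the lecture; the array size and stride are arbitrary choices) that reads the same 64 MB of data twice, once sequentially and once with a large stride. On most machines the sequential pass is much faster, because nearly all of its accesses hit in the upper, faster levels, while the strided pass keeps missing and paying the penalty of the lower levels:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M ints, larger than typical on-chip caches */
#define STRIDE 4096   /* jump far enough to defeat spatial locality   */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    int *a = malloc((size_t)N * sizeof(int));
    if (!a) return 1;
    for (long i = 0; i < N; i++) a[i] = 1;

    /* Sequential pass: consecutive accesses, good spatial locality. */
    double t0 = seconds();
    long sum_seq = 0;
    for (long i = 0; i < N; i++) sum_seq += a[i];
    double t_seq = seconds() - t0;

    /* Strided pass: same number of accesses, poor locality. */
    t0 = seconds();
    long sum_str = 0;
    for (long s = 0; s < STRIDE; s++)
        for (long i = s; i < N; i += STRIDE) sum_str += a[i];
    double t_str = seconds() - t0;

    printf("sequential: %.3fs, strided: %.3fs (sums %ld %ld)\n",
           t_seq, t_str, sum_seq, sum_str);
    free(a);
    return 0;
}
```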
Okay, so let me give you an example of a memory hierarchy.
At the top of the hierarchy are the registers, which we saw before.
They're different from caches because software has to use them explicitly, whereas the memory caches, all the caches you care to know about here, work automatically.
So the registers are at the top; the CPU is right here, okay? You have registers inside the CPU, okay?
And then, very close to the processor, we have what we call the L1 cache.
That's backed by an L2 cache, in some cases on-chip, in some cases off-chip; let me write on- or off-chip, okay?
That's backed up by main memory made of DRAM.
That's backed up by disk in case you have virtual memory, which we're going to see later in this course.
And you can even imagine this being backed up by caches out in the network that go beyond your machine, okay?
In fact, chances are that you also have file caches, caches at web servers, and so on, okay?
Great, so now one more example.
Let's look at the cache hierarchy of the Intel Core i7, which I'm sure many of you have.
Each core inside this multicore processor of course has registers, but it also has an L1 cache for data and an L1 cache for instructions, and it has a unified L2 cache per core, okay?
And then there is a unified L3 cache shared by all the cores in the system.
So just inside the chip itself, which has multiple cores, there are already three levels of cache: L1, L2, and L3.
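If you're curious what these levels look like on your own machine, here is a small sketch; on Linux with glibc, sysconf can report the cache sizes (these _SC_LEVEL* names are a glibc extension, so this is an assumption and may not work on other systems):

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* These _SC_LEVEL* names are glibc extensions; 0 or -1 means
     * the size could not be determined. */
    printf("L1 data cache:        %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
    printf("L1 instruction cache: %ld bytes\n", sysconf(_SC_LEVEL1_ICACHE_SIZE));
    printf("L2 unified cache:     %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    printf("L3 shared cache:      %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
    return 0;
}
```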
See you soon.