03 Memory Hierarchies

Okay, now that we know how the basics of caches work and what makes them work, which is locality, let's look at how we put a bunch of caches together to form what we call a hierarchy. But before we get there, let's stop to think a little bit about the cost of cache misses. The difference in time cost between a hit and a miss is huge; it could be 100x. That means a hit can be 100 times faster than a miss. Just to make you see this in numbers: would you believe me if I told you that a 99% hit rate is twice as good as a 97% hit rate in terms of time?

In this example, say that a cache hit costs 1 cycle and the penalty for a cache miss is 100 cycles. Okay? The average access time for a 97% hit rate is 1 cycle, because you always pay the hit time, plus, 3% of the time, the 100 cycles of miss penalty. That leads to a 4-cycle average cost. If you look at this number for 99%, it's 1 cycle for the hit time, which you pay whether the data is there or not, plus, 1% of the time, 100 cycles. So the average is 2 cycles. There's a 2x difference in average access time when you go from a 97% hit rate to a 99% hit rate. That's why we use miss rate more often than hit rate, okay?

So now let's look at the basic cache performance metrics. The first one is called the miss rate. The miss rate is the fraction of memory references that are not found in the cache. That's one minus the hit rate. The hit rate is the fraction of accesses that hit in the cache, so one minus the hit rate is the miss rate, the fraction of accesses that do not hit in the cache.
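
As a quick check of the 97% versus 99% comparison, here is a minimal Python sketch of the average access time calculation. The function and variable names are made up for illustration; the 1-cycle hit time and 100-cycle miss penalty are the numbers from the example.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per access.

    Every access pays hit_time (that includes checking whether the line
    is in the cache); only the fraction miss_rate also pays miss_penalty.
    """
    return hit_time + miss_rate * miss_penalty


HIT_TIME = 1        # cycles, from the example
MISS_PENALTY = 100  # cycles, from the example

amat_97 = average_access_time(HIT_TIME, 0.03, MISS_PENALTY)  # 97% hit rate
amat_99 = average_access_time(HIT_TIME, 0.01, MISS_PENALTY)  # 99% hit rate

print(f"97% hit rate: {amat_97:.1f} cycles on average")
print(f"99% hit rate: {amat_99:.1f} cycles on average")
print(f"ratio: {amat_97 / amat_99:.1f}x")
```

A two-point change in hit rate halves the average access time because the miss term dominates: the miss rate drops from 3% to 1%, a factor of three, while the hit rate barely moves.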
The typical miss rate for the first level of cache, L1, which is the level of cache closest to the processor, is between 3 and 10%. We'll see what "closest" means in more detail in a second. Now, the hit time is the time it takes to deliver a line that's in the cache to the processor so that it can consume the data. That also includes the time to determine whether the line is in the cache or not. That's why in the previous example we counted the hit time on every access, even the ones that miss, okay? The typical hit time for L1 is between one and two processor cycles, which means that the L1 cache is very, very fast. Now, the miss penalty is the time you pay when you do have a cache miss and have to go to the lower levels of memory to bring in the data, and that varies between 50 and 200 cycles. That's why I said before that there's about a 100x difference between a hit and a miss; these numbers are an example.

Okay, so there's something we call a memory hierarchy, which is putting a bunch of types of memory, a bunch of levels of memory, together. The reason this is good is that smaller memories are almost always faster, but faster memories are also more expensive. That suggests: why not put a little bit of fast, expensive memory close to the processor, backed by a somewhat bigger memory that's cheaper and a little slower, and keep composing that all the way down to the disk, even? This is profitable because there are gaps between memory technologies, and these gaps are widening, okay? If you look at the performance of registers versus caches, that gap is widening with new processor generations, and so is the gap between cache performance and DRAM performance.
The same goes for DRAM and disk, and so on. And well-written programs exhibit locality, which suggests that you can build progressively larger and slower levels of memory and still give the illusion, from the processor's point of view, that memory is pretty fast most of the time. So these two properties really complement each other beautifully to form a large pool of memory that behaves as if it were very, very fast, even though it's composed to a large extent of slow, large memory. Okay? They really suggest organizing memory in the form of a hierarchy. By that I mean we have one memory backed by another memory, backed by another memory, and so on.

Here's another way of looking at the fundamental idea of a hierarchy: each level k of the hierarchy serves as a cache for the level below, k+1, and the level below is typically larger, slower, and cheaper. Why does this work? Because of what I just said, and because of locality: programs tend to access data at level k much more often than data at level k+1. Therefore the storage at level k+1 can be slower, and therefore larger and cheaper per bit. So again, repeating myself, the big idea of the hierarchy is to create a large pool of storage that costs about as much as the cheap storage at the bottom but serves data to the program mostly from the upper levels, which are faster. You create the illusion of a large pool of memory that is both fast and cheap.

Okay, so let me give you an example of a memory hierarchy. At the top of the hierarchy are the registers, which we saw before. They're different from caches because software has to use them explicitly, whereas caches, at least all the ones you need to know about, work automatically. So the registers are at the top. The CPU is right here, and the registers are inside the CPU, okay?
Then, very close to the processor, we have what we call the L1 cache. That's backed by an L2 cache, in some cases on-chip, in some cases off-chip. Let me write "on or off chip," okay? That's backed by main memory, made of DRAM. That's backed by disks, in case you have virtual memory, which we're going to see later in this course. And you can even imagine this being backed by caches that are out in the network, that go beyond your machine, okay? In fact, chances are you also have file caches, caches in web servers, and so on.

Great, so now one more example. Let's look at the cache hierarchy of the Intel Core i7, which I'm sure many of you have. The Core i7 is a multicore processor. Each core of course has registers, but it also has an L1 cache for data and an L1 cache for instructions, plus a unified L2 cache per core. Then there is a unified L3 cache shared by all cores. So just inside the chip itself, which has multiple cores, there are already three levels of cache: L1, L2, and L3. See you soon.
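
The idea that each level of the hierarchy serves as a cache for the level below it can be sketched by extending the average access time calculation to several levels. This is only an illustration: the `Level` class, the function name, and every latency and miss rate below are assumed values chosen for the arithmetic, not measurements of the Core i7 or any other real processor.

```python
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    hit_time: float   # cycles to access this level (assumed values)
    miss_rate: float  # fraction of accesses reaching this level that miss


def hierarchy_access_time(levels):
    """Average cycles per access for a stack of levels, top level first.

    Works from the bottom up: the cost of a miss at one level is the
    average access time of the level below it.
    """
    cost = 0.0
    for level in reversed(levels):
        cost = level.hit_time + level.miss_rate * cost
    return cost


# Hypothetical three-level cache hierarchy backed by DRAM.
hierarchy = [
    Level("L1", hit_time=1, miss_rate=0.05),
    Level("L2", hit_time=10, miss_rate=0.20),
    Level("L3", hit_time=40, miss_rate=0.10),
    Level("DRAM", hit_time=200, miss_rate=0.0),  # bottom level always hits
]

average = hierarchy_access_time(hierarchy)
print(f"average access time: {average:.2f} cycles")
```

With these assumed numbers the average stays close to the L1 hit time even though the bottom level is 200 times slower, which is the illusion of a large, fast memory that the hierarchy is meant to create.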
