Okay, now we're going to see part two of cache organization. Let's start with the general organization, okay?
So as I've said before, the cache has a certain number of sets; that's where data can go. And each set has a certain number of lines or blocks. By the way, cache block and cache line mean the same thing, okay, I might use them interchangeably. So a set has a number of lines, which is the same thing as the number of ways. If I say a cache is four-way set associative, that means we're going to have one, two, three, four lines in a set, okay? So in general a cache has S = 2^s sets and E = 2^e lines per set, okay?
Now, each line in the cache has a bit of metadata, right?
The first one is the tag that we saw in the previous video. Right, the tag encodes the rest of the address, the part that cannot be figured out from the set index alone, okay, so we know what data was stored in that block of the cache.
There's also a valid bit here. The valid bit is used to determine whether the data stored in the line is valid or not. For example, when the cache starts up, it starts up empty... So if there's some random data there, we don't want that to get confused with real, valid data, so the valid bits all start set to 0.
The valid bit is also used in the context of multicores, when we might have to invalidate data when there are remote updates. But that's the subject of a more advanced computer architecture class.
Finally, there are B = 2^b bytes in a cache line, okay. The reason we are using powers of two here is so that it's easy to figure out how many bits we're going to need from the address, okay. So a cache line, or cache block, in our example has as many bytes as the value of this B variable here.
So that means that the cache size is C = S × E × B: the total number of sets, multiplied by the number of ways or lines per set, multiplied by the size of the block, capital B.
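To make that concrete, here's a minimal sketch in C; the values of s, e, and b are assumptions for illustration, not the lecture's example:

```c
#include <stdio.h>

int main(void) {
    /* Assumed parameters, for illustration only. */
    unsigned s = 6, e = 3, b = 6;
    unsigned S = 1u << s;   /* 64 sets            */
    unsigned E = 1u << e;   /* 8 lines per set    */
    unsigned B = 1u << b;   /* 64 bytes per block */
    /* C = S * E * B = 64 * 8 * 64 = 32768 bytes = 32 KB */
    printf("cache size C = %u bytes\n", S * E * B);
    return 0;
}
```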
Let's see how a cache read works now. There are multiple steps in a cache read. When the processor sends an address to the cache to be looked up, the first thing the cache has to do is locate the set where the data goes, okay?
So remember that we saw in the previous example how to determine where the data goes. We saw that there are some bits of the address that are used to determine in which set the data goes, okay. And those bits are here in the middle, right? So we're going to have s bits, because if I have 2^s sets, I'm going to use s bits as the set index, okay. So I'm going to use that part of the address as an index into the cache. The set index runs zero, one, two, up to 2^s minus 1, and we use that value to determine which set here we're going to use, okay?
And once we do that, what we're going to do is see whether the tag of the address, these t bits, the upper part of the address, matches the tag stored in the block, okay? So that's what we're going to do. And then we're going to use the lower part of the address, the block offset, to determine which part of the block we are going to read. Because, as I said, a cache block or cache line can be 16, 32, or 64 bytes, but we often read one register's worth of data, which could be four or eight bytes, or one byte, so we have to get only part of the block.
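Here's a sketch of that three-way address split; the widths of s and b are the same assumed values as in the earlier snippet, not the lecture's figures:

```c
#include <stdint.h>

#define B_BITS 6  /* b: assumed 64-byte blocks */
#define S_BITS 6  /* s: assumed 64 sets        */

/* Lowest b bits: which byte inside the block. */
static inline uint64_t block_offset(uint64_t addr) {
    return addr & ((1ull << B_BITS) - 1);
}
/* Middle s bits: which set the block maps to. */
static inline uint64_t set_index(uint64_t addr) {
    return (addr >> B_BITS) & ((1ull << S_BITS) - 1);
}
/* Remaining upper t bits: the tag stored alongside the block. */
static inline uint64_t tag_bits(uint64_t addr) {
    return addr >> (B_BITS + S_BITS);
}
```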
So let's see now how this works. I'm going to show how this works for a direct-mapped cache, and with a direct-mapped cache there's a single line per set. Okay, so we start with our address. We're going to find the set; it happens to be 0001 here. We know that we're interested in this line here. What is the next step? Well, the next step is to select that line and then determine whether the tag matches.
Okay. If the tag, the t bits, the upper part of the address, matches the tag stored in the cache, and the valid bit is set, we have a hit. If either one of these is not true, if it's either not valid or not a match, it's going to be a miss, okay?
Okay, so now let's assume that the cache block size is 8 bytes, which means that here we have 8 bytes, okay? Now we're going to use the block offset to determine which bytes to read; it happens to be 100 here, which is in this part of the address, so these are the 4 bytes that are going to be read in this example, okay?
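Putting those read steps together, here's a minimal direct-mapped lookup sketch building on the helper functions above; the geometry is still the assumed one, not the lecture's 8-byte-block example:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_SETS    (1u << S_BITS)
#define BLOCK_BYTES (1u << B_BITS)

struct line {
    bool     valid;
    uint64_t tag;
    uint8_t  data[BLOCK_BYTES];
};

/* Direct mapped: exactly one line per set. */
struct line dm_cache[NUM_SETS];

/* Returns true on a hit, copying `size` bytes starting at the offset. */
bool dm_read(uint64_t addr, void *out, size_t size) {
    struct line *ln = &dm_cache[set_index(addr)];         /* 1. locate the set    */
    if (ln->valid && ln->tag == tag_bits(addr)) {         /* 2. valid + tag match */
        memcpy(out, ln->data + block_offset(addr), size); /* 3. block offset      */
        return true;                                      /* hit */
    }
    return false;  /* miss: the line would be refilled from memory */
}
```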
Great, so now what if there's no match? If it's a cache miss, what's going to happen? The old line, whatever was here, is going to be thrown away, and the data that comes back from memory is going to replace the data that was there. Okay.
Now we know how reads work on a direct-mapped cache. Let's see how they work on a set-associative cache. Okay. In general it's an E-way set-associative cache, okay. In our example here let's make E equal to 2. That means it's a 2-way set-associative cache, which means I have two lines per set, okay?
What is the first thing we're going to do? Again, we're going to use the middle part of the address to find which set it goes to. Let's say that it goes again to set one. But now what we're going to do is select this set here. Now we are focusing on two lines; it means that the data can go in two places in the cache, so the comparison of the t bits of the address with the tags now has to involve two values: we're doing two comparisons.
So then we do the same thing. If either one of them is valid and has a matching tag, then we have a hit. It could be neither of them; of course we're not going to have both of them match, otherwise there's probably a bug in your cache. It should be one or the other but not both, okay? So we should hit in at most one of them, but if both of them miss, if the tags don't match or the valid bit is set to 0, then it's a cache miss, okay?
So let's say that we determine that this is a hit. Now the same thing: we're going to get the data. We happen to do a two-byte read here, a short int, right, with only two bytes... That's where the offset starts. We're going to get these two bytes here, read them, and send them to the processor, okay? If there's no match, one of the lines is selected for eviction and replaced. As I said before, we normally use the policy called least recently used, okay?
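Here's the same lookup extended to two ways, with a one-bit LRU per set; it reuses the types and helpers from the direct-mapped sketch, and this LRU encoding is just one plausible choice:

```c
#define WAYS 2

struct set {
    struct line way[WAYS];
    unsigned    lru;   /* index of the least recently used way */
};

struct set sa_cache[NUM_SETS];

bool sa_read(uint64_t addr, void *out, size_t size) {
    struct set *st = &sa_cache[set_index(addr)];          /* 1. select the set    */
    for (unsigned w = 0; w < WAYS; w++) {                 /* 2. compare both tags */
        struct line *ln = &st->way[w];
        if (ln->valid && ln->tag == tag_bits(addr)) {
            memcpy(out, ln->data + block_offset(addr), size); /* 3. offset */
            st->lru = 1 - w;  /* the other way becomes least recently used */
            return true;      /* hit */
        }
    }
    /* miss: way st->lru would be evicted and refilled from memory */
    return false;
}
```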
Okay, now let's look at the types of cache misses; there are several types of cache misses, okay? There are three types. Three types, okay? That's what we call the three Cs of cache misses. There's actually a fourth one when we have multiple processors, but that might be for another time.
The first type of miss is called a cold miss, or compulsory miss. That's a miss there's nothing you can do about: it's cold, it's the first time you access the data. We haven't seen it before, so it's a miss. It's the first time you see the data. In fact, I lied; it's not something you can do nothing about. There are some techniques, called prefetching, that can reduce that. But typically a cold miss is hard to avoid, just because it's the first access.
Now, a conflict miss is a miss that happens just because of the cache organization, when you have multiple addresses in the system that map into the same set; one can kick the other one out. When that happens, you're dealing with conflicts, and conflict is not normally a good thing. In the case of caches, blocks just kick each other out, so that's going to reduce the effectiveness of your cache, okay?
So, for example, if it's a direct-mapped cache, you have more conflict misses. As you increase the set associativity, the number of ways in your cache, you're going to have fewer conflict misses, because there will be fewer cases where data kicks other useful data out, okay? So, conflict misses happen when the cache is large enough, but multiple data objects map to the same slot, okay?
Great. Now the final type of cache miss is a capacity miss. A capacity miss just means that the amount of data that you cycle through, that you access repeatedly, which is called the working set, is larger than the cache, so inevitably you'll be kicking things out, okay?
So one way to recognize conflict misses, by the way: if you increase the associativity and the miss goes away, it's because it was a conflict miss.
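A quick sketch of what a conflict looks like, using the assumed geometry and helpers from the earlier snippets; the addresses are hypothetical:

```c
#include <assert.h>

void conflict_demo(void) {
    /* Two hypothetical addresses exactly S*B = 64*64 = 4096 bytes apart. */
    uint64_t a = 0x10000;
    uint64_t b = a + NUM_SETS * BLOCK_BYTES;
    assert(set_index(a) == set_index(b));  /* same set...       */
    assert(tag_bits(a) != tag_bits(b));    /* ...different tags */
    /* In a direct-mapped cache, alternating accesses to a and b miss
     * every time (conflict misses); in a 2-way set both fit at once. */
}
```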
Okay, so far we have talked only about reads, but writes are very important too. The most important thing about writes is that the data might exist in multiple places...
So say, for example, I have my CPU, and I have my L1 cache, L2 cache, and so on. If I access a piece of data A here from the CPU, the data is going to be stored in L1 eventually and in L2 as well. Now, this is when I'm reading A. But what happens when I now write to A? If I write only here, it's going to disagree with what's here, and vice versa.
Okay, so the main problem is that the copies of the same data spread over the caches in the hierarchy might disagree with each other. That's not a good thing; disagreement is not a good thing, okay? So, what to do now?
What to do on a write hit, that's an important question. There are two basic policies. One is called write-through, which means that if I have my CPU here, and I have an L1 cache, and in this example say there's memory below it: when the processor writes A, the CPU sends the data straight to memory. It might also update whatever is in the L1 cache, but it sends the data to memory every time. And the bad thing about that is that every time the CPU writes, something always has to go to memory, even if it's in the cache, right? If it's in the cache you do both: you immediately write it through to memory while updating the cache, okay?
Now, write-back works as follows (that was write-through). With write-back, when the CPU writes to A here, it just writes to the cache. And there's a little bit here that gets set that says: okay, this data is now dirty. It means the copy in memory might not be the same. So whenever this line is displaced, kicked out of the cache, the cache goes and sends the data to memory. Okay, so we need a dirty bit to indicate whether the line is different from memory.
So, that's what to do on a write hit; but for what to do on a write miss you have two options. One is called write-allocate: you load the block into the cache first and then do the write. And that's good, because if you're going to write and then do more writes, that's a good thing, because we've allocated the data in the cache. Okay.
The other option is no-write-allocate. So if it's a miss, you just write to memory and don't do anything to the caches, okay? Another thing write-allocate is good for: if you do a write that's later followed by a read, that's a good thing, because the data is likely to be in the cache.
So typical caches are write-back with write-allocate; usually that's the common case, although write-through with no-write-allocate you occasionally see, especially in some machines that have multiple processors.
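Here's a minimal sketch contrasting the two write-hit policies, building on the direct-mapped structure above; assume struct line gained a bool dirty field, and memory_write() is a hypothetical helper standing in for the next level of the hierarchy:

```c
void memory_write(uint64_t addr, const void *src, size_t size); /* hypothetical */

void write_through_store(uint64_t addr, const void *src, size_t size) {
    struct line *ln = &dm_cache[set_index(addr)];
    if (ln->valid && ln->tag == tag_bits(addr))
        memcpy(ln->data + block_offset(addr), src, size); /* update the cache */
    memory_write(addr, src, size); /* and always write to memory too */
}

void write_back_store(uint64_t addr, const void *src, size_t size) {
    struct line *ln = &dm_cache[set_index(addr)];
    if (ln->valid && ln->tag == tag_bits(addr)) {
        memcpy(ln->data + block_offset(addr), src, size);
        ln->dirty = true; /* memory is updated only when the line is evicted */
    } else {
        /* Write miss: write-allocate would fetch the block into the cache
         * and then store; no-write-allocate would just call memory_write(). */
    }
}
```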
To end this video, let's revisit the Pentium... not the Pentium, the Core i7 cache hierarchy, okay? Remember that I showed you there were multiple L1 caches, L2 caches, and there's an L3 cache shared by all of the cores, but now we can understand some of the other data here.
When I said that the L1 is 8-way set associative, that means that both L1s, the data L1 and the instruction L1, the d-cache and the i-cache, are set associative; they're not direct mapped. And that's a pretty high associativity, because the designers decided it was worth paying the complexity to reduce conflict misses.
Now, the L2 unified cache: by the way, the L1 caches are 32 kilobytes each. The L2 cache is eight times bigger, and it's also eight-way set associative. And notice that the access time of the L1 is four cycles, while the access time of the L2 is 11 cycles.
Now the L3 is much, much bigger; we're talking about eight megabytes, and it has even higher associativity, to get even fewer conflict misses. And that's because the access time is also much slower than the L2. But the reason the L3 has much higher associativity is that you want to reduce the misses: when you reach the L3 you'd better hit, because if you don't, you have to go to memory, which is much, much more expensive, okay?
And the block size here is 64 bytes in all cases.
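As a closing sanity check, the L1 numbers from the lecture pin down the whole geometry; this sketch just does the arithmetic, it adds nothing beyond the stated figures:

```c
#include <stdio.h>

int main(void) {
    /* Core i7 L1 figures from the lecture: 32 KB, 8-way, 64-byte blocks. */
    unsigned C = 32 * 1024;    /* capacity in bytes   */
    unsigned E = 8;            /* lines per set       */
    unsigned B = 64;           /* block size in bytes */
    unsigned S = C / (E * B);  /* S = C / (E*B) = 64 sets */
    printf("L1: %u sets -> s = 6 set-index bits, b = 6 offset bits\n", S);
    return 0;
}
```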
Okay, see you soon.