06 x86 64 Procedures and Stacks

[MUSIC] Before we close this section on procedures and stacks, let's talk about how things change when we go to the 64-bit architecture that is popular today. The calling convention in x86-64 is a little different, and that's because of the doubling of the number of general purpose registers. There are so many more registers available on the 64-bit architecture that we can decrease our use of the stack and make better use of the registers. So we're going to store arguments in registers, and we're going to store temporary variables in registers. Of course, we could always run out of registers, and then we'll fall back to the way we did things in the 32-bit architecture we've just seen. But for the most part, we're going to try to make use of those registers as best we can and avoid the use of memory.
So let's take a look at the registers in the 64-bit architecture again. There are now 16 general purpose registers, and they are 8 bytes each rather than 4 bytes, because we have quad words instead of 4-byte words. We're also going to extend our caller-saved and callee-saved conventions. You can see that the registers annotated here in green are the callee-saved registers, and in yellow are the caller-saved registers. Also, we're going to use six registers, in these locations, for passing arguments, so up to six arguments of a procedure can be passed in registers. Now, if we have more than six arguments, we'll have to go back to using the stack. But most procedures have just a couple of parameters, so most of the time we won't have to use the stack. We're still going to use rax for the return value of a procedure, and we're still going to use rsp for the stack pointer.
Okay, let's revisit the swap function and look at it in both the 32-bit architecture, which we see here on the left and is the one we've seen so far, and a swap implementation using 64 bits.
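The slide with the two versions of swap is not part of this transcript. As a rough sketch, assuming swap exchanges two int values through pointers (the element type is an assumption; the lecture only says the arguments are pointers), the C function and the kind of x86-64 code a compiler might produce for it look like this. The assembly in the comments is illustrative, not the exact listing from the slide, and swap_check is a hypothetical helper added only to exercise the function.

```c
#include <assert.h>

/* Exchange the two ints that xp and yp point to.
 * On x86-64, xp arrives in rdi and yp in rsi, so nothing
 * has to be fetched from the stack. */
void swap(int *xp, int *yp)
{
    int t0 = *xp;   /* e.g. movl (%rdi), %edx */
    int t1 = *yp;   /* e.g. movl (%rsi), %eax */
    *xp = t1;       /* e.g. movl %eax, (%rdi) */
    *yp = t0;       /* e.g. movl %edx, (%rsi) */
}                   /* ret: the only stack access is popping the return address */

/* Hypothetical helper (not from the lecture): checks that swap works. */
void swap_check(void)
{
    int a = 1, b = 2;
    swap(&a, &b);
    assert(a == 2 && b == 1);
}
```

Both temporaries live in registers, which is why the 64-bit version needs no frame setup at all.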
The difference between these two cases is that arguments are now passed in registers. The first argument is in the register rdi, the second in the register rsi, and that's where we find the two 64-bit pointers that are the arguments to swap. So we don't have to get them off the stack. The only stack operation we really need is the return, which gets the return address from the stack and jumps to that location when we're done. By avoiding the stack and holding all the local information in registers, we can make execution much faster: one, because we have fewer instructions, and as you can see we have quite a few fewer instructions in the 64-bit version; but also because we're not going to memory. The stack is stored in memory, and that is slower to get to than the general purpose registers of the CPU. We'll learn more about that later when we talk about the memory system, but for now suffice it to say that it's a lot faster to go to registers.
So the highlights of this for the 64-bit case are that arguments, up to the first six, are stored in registers. It's faster to get to those values there than if they were in memory on the stack. Local variables can also be placed in registers if there's room and we don't have too many of them. Otherwise, we will have to go back to the stack. We have a callq instruction now instead of a call instruction, which puts a 64-bit return address on the stack. And of course it will have to decrement the stack pointer by 8 rather than 4, because we're putting 8 bytes on the stack. We have also eliminated the use of the frame pointer. Remember ebp, our base frame pointer? We're not going to use that anymore, and we're going to make all references relative to the stack pointer, so we won't have to keep track of two registers pointing to the stack, but only one. And ebp, or now its 64-bit version, rbp, is going to be available as a general purpose register.
And the reason that works is that we can access memory up to 128 bytes beyond where rsp is pointing, that is, at addresses just below it, without having to use multiple instructions. We can do that directly. This is called the red zone. So we can store temporary variables on the stack very easily and access them quickly. Registers are still designated as caller-saved or callee-saved, however, just slightly differently than they were before.
Okay, so ideally a procedure on the 64-bit architecture has no stack frame at all, except for the return address. We've shrunk the stack frame down to just one piece of information, namely the return address that is placed on the stack. This makes things a lot simpler for manipulating the stack and building the frames that we need. However, we always have to fall back to the 32-bit conventions if we can't fit things in registers. And that's why we bothered to show you all of that 32-bit stack convention, even though we're mostly running on 64-bit architectures these days. When we have too many local variables, we have to go to the stack. When local variables are more complex data structures like arrays or structs, we'll have to put them on the stack. When we take the address of a local variable, we'll have to put it on the stack, because we can't have an address of a register; we have to have an address of a memory location. Whenever we need more than six arguments to a function, we'll need the stack again. And of course saving registers away will also potentially have us use the stack. So we still need stack frames, and it's still important to understand the general case. But keep in mind that most of the time on 64-bit architectures that stack frame is tiny. It's just a return address.
All right, let's take a look at an example that illustrates this.
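As a quick aside before the lecture's own example, the cases listed above that push a function back onto the stack can be sketched in C. All function names here are hypothetical and chosen only for illustration. Whether a given compiler actually uses the stack in each case depends on optimization settings, so the comments say "generally" rather than "always".

```c
#include <assert.h>

/* Hypothetical helper: writes through a pointer, so its caller must
 * hand it the address of a real memory location. */
void bump(long *p)
{
    *p += 1;
}

/* The address of a local is taken: x generally needs a stack slot,
 * because a register has no address. */
long takes_address(void)
{
    long x = 41;
    bump(&x);
    return x;
}

/* A local array indexed at run time generally lives in memory
 * rather than in registers. */
long uses_array(unsigned i)
{
    long a[4] = {10, 20, 30, 40};
    return a[i % 4];
}

/* Seven arguments: the first six travel in registers,
 * the seventh (g) is passed on the stack. */
long seven_args(long a, long b, long c, long d, long e, long f, long g)
{
    return a + b + c + d + e + f + g;
}

/* Hypothetical self-check, not part of the lecture. */
void fallback_check(void)
{
    assert(takes_address() == 42);
    assert(uses_array(2) == 30);
    assert(seven_args(1, 2, 3, 4, 5, 6, 7) == 28);
}
```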
We're going to have this function called call_proc, which has four local variables of different sizes, then does a call to another procedure called proc, and finally returns a value that it computes according to this expression. The way call_proc starts, of course, is with its stack pointer pointing to the return address, where it has to return in whatever procedure called it. That's the top of the stack. The first thing it's going to do is allocate 32 bytes on the stack for the local variables that it will need. You'll notice that we adjust the stack pointer down by 32, to down here. 32 is 4 times 8 bytes, so four 8-byte words, and that's why I've drawn it as four horizontal sections of memory. Each of those is 8 bytes. We're going to allocate the four temporary variables, x1 through x4, to these areas here. You'll notice that x1 occupies 8 bytes; it's a long integer. x2 is just a regular integer and only needs 4 bytes. x3 is a short int, which only needs 2, and x4 is a single byte. Now why did we allocate two more words? Well, we're going to need those because this procedure call here has eight arguments, more than the six we can do with registers. So we're going to need two places to put those two extra arguments for our procedure call.
All right, so let's see what the first instructions of the function are. As I mentioned, we adjusted the stack pointer by 32 to create that space, and then we moved four values into different locations on the stack. You'll notice that we used offsets from the current stack pointer to find the right places to put them. We put the 8-byte quantity, the quad word that was value 1 for x1, at 16 plus the stack pointer. That's at this location. Then we moved a long word, value 2, to 24 plus the stack pointer. That's at this location. Then a word, or rather a 16-bit value, 3, at 28 plus the stack pointer.
Well, 24 was here, and four more over puts us here at x3. And then finally, a single byte of value 4 goes at 31 plus the stack pointer. That's 24 here, and then 7 over puts us where we've labeled x4, that single byte.
Let's move on now to setting up the arguments for calling the function proc. That's the next part of this procedure. What we see here is a set of instructions that set things up for all those arguments. Now, arguments are passed in a particular order in the registers. The first argument has to go into rdi, the second into rsi, the third into rdx, then rcx, r8, and r9, until we've got six arguments. The rest are going to go on the stack, and in this case that means two more will have to go on the stack, because we have eight arguments. So let's take a look at the first instruction. It moves a quad word with value 1 to rdi. That's the equivalent of putting x1 there as the first argument. Then we are going to need the address of x1. Well, the address of x1 on the stack is here, at 16 from rsp. So you'll notice that we calculate that effective address and put it into rsi, the second argument. Then we put the value 2 into edx for the third argument, and the address of that value, the address of x2, which is 24 plus rsp, into rcx. Then we move a 3 into r8d, which is how we refer to just the low-order 4 bytes of r8, and put the address of x3 into r9, the sixth argument. 28 plus rsp is the address of this byte right here. Lastly, we move 4 into where rsp is pointing right now. Remember, the parentheses are the dereference for that. That's argument number 7, put onto the stack at that location. And then the last argument, argument 8, is the address of x4. The address of x4 can be computed by doing 31 plus rsp.
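The frame layout and the argument setup described here can be sketched in C. This is a model under stated assumptions, not the slide's listing: struct frame is a hypothetical stand-in whose field offsets match the offsets from rsp given in the lecture (a real compiler just picks the offsets, it does not declare a struct), and the body of proc is invented, since the lecture only shows the call. The sketch also assumes long is 8 bytes, as it is on 64-bit Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of call_proc's 32-byte frame. */
struct frame {
    int64_t arg7;    /*  0(%rsp): 7th argument, the value of x4   */
    int64_t arg8;    /*  8(%rsp): 8th argument, the address of x4 */
    int64_t x1;      /* 16(%rsp): 8 bytes */
    int32_t x2;      /* 24(%rsp): 4 bytes */
    int16_t x3;      /* 28(%rsp): 2 bytes */
    int8_t  unused;  /* 30(%rsp): padding */
    int8_t  x4;      /* 31(%rsp): 1 byte  */
};

/* Hypothetical body: adds each value into the variable its pointer names. */
void proc(long a1, long *a1p, int a2, int *a2p,
          short a3, short *a3p, char a4, char *a4p)
{
    *a1p += a1;
    *a2p += a2;
    *a3p += a3;
    *a4p += a4;
}

long call_proc(void)
{
    long  x1 = 1;
    int   x2 = 2;
    short x3 = 3;
    char  x4 = 4;

    proc(x1, &x1,    /* arguments 1 and 2: rdi, rsi               */
         x2, &x2,    /* arguments 3 and 4: edx (low rdx), rcx     */
         x3, &x3,    /* arguments 5 and 6: r8d (low r8), r9       */
         x4, &x4);   /* arguments 7 and 8: 0(%rsp) and 8(%rsp)    */

    return (x1 + x2) * (x3 - x4);
}

/* Hypothetical self-check: the model's offsets match the lecture's. */
void frame_check(void)
{
    assert(offsetof(struct frame, x1) == 16);
    assert(offsetof(struct frame, x2) == 24);
    assert(offsetof(struct frame, x3) == 28);
    assert(offsetof(struct frame, x4) == 31);
    assert(sizeof(struct frame) == 32);
}
```

With this invented proc, the locals become 2, 4, 6, and 8, so call_proc returns (2 + 4) * (6 - 8), which is -12. A different proc would of course give a different result.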
We're going to put that in rax temporarily, just so that we can then move it to 8 plus rsp, the slot for the eighth argument. Okay, so now we've set up all eight arguments, six in registers and two on the stack, and we're ready to call proc. At this point, of course, a new return address gets pushed onto the stack that will help us come back to this point in call_proc after we're done with the call to proc.
Once that's completed, we will be back here and now have to do the computation to figure out the return value. So how do we do that computation? Well, what we're going to do is carefully get the values of our temporary variables in this procedure and put them into registers with the appropriate sign extension. So we're going to be using these interesting move instructions here, where the s stands for extending the sign bit of the word into the long. We're taking a 16-bit quantity, the word, and sign-extending it to 32 bits, the long. That's what the l refers to, and the s says sign-extend. Another option is z, which just puts 0s in the other 16 bits. But this one says do the sign extension, and the result goes into the 32-bit register eax. We'll do the same thing now for a byte extended to a long, to get the value of x4 and put that in edx, sign-extended, and then subtract that from eax. So this will have computed x3 minus x4. Now of course that's a 32-bit quantity, and we're going to have to multiply it with a 64-bit quantity. So we're going to use some more sign extension, using the cltq instruction, which sign-extends the 32-bit eax register to 64 bits.
The next part computes, as you can imagine, x1 plus x2. It does that by getting the value of x2, moving it into rdx, and sign-extending it to 64 bits, in this case from a long to a quad. Then it adds the already 64-bit value of x1 to rdx as well, so there we will have x1 plus x2 as the result.
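The return-value computation being walked through here, including the final multiply that comes next, can be sketched in C with every widening step written out as an explicit cast. The function names are hypothetical, and the fixed-width types stand in for the lecture's long, int, short, and char. The register names in the comments follow the lecture; this is a sketch of the idea, not the slide's listing.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension copies the top bit of the narrow value into the new high bits. */
int32_t sign_extend_word(int16_t w)
{
    return (int32_t)w;
}

/* Zero extension fills the new high bits with 0s instead. */
uint32_t zero_extend_word(uint16_t w)
{
    return (uint32_t)w;
}

/* (x1 + x2) * (x3 - x4), one widening step at a time. */
int64_t combine(int64_t x1, int32_t x2, int16_t x3, int8_t x4)
{
    int32_t diff = (int32_t)x3;   /* word to long, sign-extended (eax) */
    int32_t b    = (int32_t)x4;   /* byte to long, sign-extended (edx) */
    diff -= b;                    /* x3 - x4 as a 32-bit value          */

    int64_t wide = (int64_t)diff; /* 32 to 64 bits, the cltq step (rax) */

    int64_t sum = (int64_t)x2;    /* long to quad, sign-extended (rdx)  */
    sum += x1;                    /* x1 is already 64 bits              */

    return wide * sum;            /* the final multiply, result in rax  */
}

/* Hypothetical self-check, not part of the lecture. */
void extension_check(void)
{
    assert(sign_extend_word(-1) == -1);
    assert(zero_extend_word(0xFFFF) == 65535u);
    assert(combine(1, 2, 3, 4) == -3);
}
```

The difference between the two kinds of extension shows up only for values with the top bit set: the 16-bit pattern of all ones becomes -1 when sign-extended and 65535 when zero-extended.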
Finally, we take those two registers, rax and rdx, and do a multiply instruction to compute the final result. The result is placed into rax, ready to be returned. Remember, we put the return value in the rax register. The last thing we need to do before executing our return statement is clean up the stack and get rid of the space we allocated while we were in this procedure. To do that, we add 32 to the stack pointer, the opposite of the subtract of 32 that we did at the beginning. So now the stack looks exactly like it did before, and we are ready to execute the return instruction that will take us back to whatever called call_proc in the first place.
So, to summarize, the 64-bit architecture makes heavy use of registers, because they're faster than using the stack in memory. We use them for parameter passing and we use them for temporary variables. So there's minimal use of the stack. Oftentimes, actually, we don't use the stack at all except for the return address of the function. But when we do need the space there, either for arguments or for more temporary variables, we allocate and deallocate the entire frame at one time. It's just faster than doing multiple pushes and pops. We also don't bother with a frame pointer anymore and address everything relative to the stack pointer, as we saw in that previous example. This also creates a lot more room for compiler optimizations that can play with registers and how best to make use of them, so we don't have collisions that would cause us to have to save registers and so on.
All right, that ends this section, and I hope it provided a good overview of procedure call conventions, both 32-bit and 64-bit. Remember that although we make minimal use of stack frames in the 64-bit architecture, we often have to fall back to the general case, which we saw with the 32-bit conventions.
