06 Tutorial Buffer Overflows


[MUSIC] This video is meant to help you understand the buffer overflow problems you've been encountering and exploiting in one of the programming assignments. Let's start by talking about what a buffer overflow is. Primarily, it happens because C doesn't check array bounds. When we take input from, for example, the keyboard without specifying a limit on the amount of input, we fill a buffer one keystroke at a time, and we can go past the end of the array we're using as our input buffer. This was a very common type of security vulnerability. In this video we'll go over how the stack, and the address space in general, was laid out in a way that allowed this to happen; how input buffers are placed on the stack; and how overflowing a buffer can let us inject unwanted code into a program. We'll close with the defenses that have since made this kind of vulnerability much harder to exploit. Okay, so remember the Linux memory layout. The stack starts at the high end of memory and grows downward. At the low addresses we have our program's code and static data, followed by the dynamic data (the heap), which grows upward. Remember that dynamic data is allocated using malloc, new, or similar memory-allocation primitives. Stack space, on the other hand, is allocated by making procedure calls: each procedure places on the stack what it needs, such as local variables, arguments for the next procedure call, and so on. To reiterate that point, here's an example program; let's see where everything actually ends up in memory. This program has a couple of big chunks of memory (two big arrays declared at the top), some variables of various types, and some pointers.
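To see this layout on a real machine, here is a minimal sketch, assuming a typical Linux system. It prints one address from each region: a procedure (text), a static array (static data), a small malloc'd block (heap), and a local variable (stack). The names `big_array` and `useless` are borrowed loosely from the example program, and the helper `layout_looks_typical` is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* Static data: its size is known before the program runs
   (1 MB here, an arbitrary choice). */
static char big_array[1 << 20];

/* Text: the machine code for this procedure lives in the text segment. */
int useless(void) { return 0; }

/* Hypothetical helper: prints one address from each region and returns 1
   if they are ordered text < static data < heap < stack on this run. */
int layout_looks_typical(void) {
    int local = 0;                     /* stack */
    char *heap_block = malloc(16);     /* heap */
    if (heap_block == NULL)
        return 0;

    uintptr_t text  = (uintptr_t)&useless;
    uintptr_t data  = (uintptr_t)big_array;
    uintptr_t heap  = (uintptr_t)heap_block;
    uintptr_t stack = (uintptr_t)&local;

    printf("text   0x%" PRIxPTR "\n", text);
    printf("static 0x%" PRIxPTR "\n", data);
    printf("heap   0x%" PRIxPTR "\n", heap);
    printf("stack  0x%" PRIxPTR "\n", stack);

    free(heap_block);
    return text < data && data < heap && heap < stack;
}
```

On a typical x86-64 Linux system the printed addresses increase from text to stack; with address randomization the exact values change from run to run.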
Then there are a couple of procedures, some of which contain statements that do dynamic memory allocation. So let's look at where all of those things end up in memory. We'll start, of course, with our main procedure and our useless procedure. They end up in the text segment of memory; that's where the machine instructions for those procedures live. Then there's the static data, including those big arrays and the variables we declared. That space is allocated here before the program even runs, because we know in advance how much space we need. Then there are the parts of the program, such as those malloc calls, that allocate memory when it's needed; that memory comes from the heap, which keeps growing upward. And of course we can free that space when we no longer need it and use it for other things. The stack pointer points to the top of the stack. We also have libraries, for example the one containing the malloc routine, that are linked in at run time and usually sit in their own region of the address space. Okay, so that's where things end up in memory. In the late 80s (1988), a program called the Internet worm was able to attack a large number of Internet hosts by exploiting the way this memory was allocated and arranged. Because the stack grows downward, a local array sits below the saved return address, so writing past the end of the array runs into it; and because data and instructions are stored in the same memory, we can carry out an interesting attack that lets us take control of someone else's machine. Let's see how this happened. The Internet worm was based on a stack buffer overflow exploit. As I mentioned at the beginning, many Unix functions do not check argument sizes, so they'll happily let us fill, and overfill, an input buffer. Let's take a look at a common C library function, gets, which is used to read an input string from the keyboard.
Notice that the function returns a pointer to a character buffer, a buffer of bytes, and takes as its argument the start of that buffer. Let's walk through this code quickly. It starts off by reading a single character from the keyboard and setting a pointer to the destination address. Then it checks whether the character is something other than end-of-file or a newline. If it is just some other character, it stores the character where the pointer is pointing and then increments the pointer by one; because of pointer arithmetic on a char pointer, that moves it to the next byte of the buffer. Then it calls getchar again and repeats the loop, checking again whether we've reached a newline or end-of-file. When we're done, it adds one more thing to the end of the string, namely the null character, because in C we mark the end of a string with a null byte. Finally, it returns the same address it started with, the place where it stored the input characters. So what could go wrong in this code? Well, how big is this buffer? We don't know, because we're only given a starting address; there's no way to tell how much space was allocated in memory for it. We just keep putting characters into it as long as there's more input. In fact, there's no way to specify a limit on the number of characters to read. The same problem shows up in many other Unix functions. For example, strcpy is just given two addresses and told to copy a string from one to the other, but it doesn't check that the destination can hold a string of that length.
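Here is the loop just described, written out in C. It is a simplified sketch: we call it `unbounded_gets` (a hypothetical name) and pass the input stream explicitly instead of always reading stdin, and unlike the real gets it does not return NULL when end-of-file arrives before any characters are read. Note that nothing in it knows how big `dest` is.

```c
#include <stdio.h>

/* Simplified gets(): copies characters from `in` into `dest` until a
   newline or end-of-file, then appends '\0'. It never learns how large
   `dest` is, so a long enough line writes past the end of it. */
char *unbounded_gets(FILE *in, char *dest) {
    int c = getc(in);        /* int, so EOF is distinguishable from any byte */
    char *p = dest;
    while (c != EOF && c != '\n') {
        *p++ = (char)c;      /* store the character, then advance one byte */
        c = getc(in);
    }
    *p = '\0';               /* terminate the string */
    return dest;             /* the same address we were given */
}
```

Calling this with a 4-byte buffer and an 8-character line would write 9 bytes (8 characters plus the null), which is exactly the situation in the echo example; only call it with buffers known to be large enough for the input.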
The same goes for scanf: these functions are used to read input from the keyboard, and with the %s specifier we get the same kind of problem, because it reads strings of unknown size. All right, let's look at the smallest example we can that shows this off. Here we have a simple C function called echo, which is called from main. Notice that main prints a prompt asking us to type a string and then calls echo, which reads the string and echoes it back to the console after we hit Return. So we first read a string and then write it back out. How big is the buffer? We've decided on a buffer of just four bytes, a pretty small buffer. Let's see what happens when we run this code. I'll run it and type the string 12345678. Notice that I typed eight characters, not four. I've gone past the end of the buffer and written into other parts of memory (we'll see in a second which ones), and there's a segmentation fault: the CPU tried to use an address it shouldn't, and the system complains that something went terribly wrong. But if we type the string 1234567, which is only one character shorter but still longer than four, it echoes just fine; it prints the string right back out. Why didn't that cause a problem? It overflowed the buffer as well. And if we type the string 123456789ABC, we again get a segmentation fault as the string overflows the buffer. So let's look at each of these cases and see just what is happening in the system. To start, let's review the assembly code that might be generated for echo. There's the usual setup code at the beginning and cleanup code at the end. In between, we compute an address that we're going to use shortly.
We allocate some space on the stack, store that address on the stack, and then call gets. When gets returns, we basically call puts right away to echo the characters, then reclaim our stack space and return. What are we putting on the stack? We're allocating space for the buffer, as well as a few other things. Remember that the buffer is four bytes. The code further down, in the pink region, is the code we might see in main: it calls echo, then does something else, and eventually cleans up and returns. So before we execute the call to gets inside echo, this is likely what's on the stack. There is a stack frame for main. When main called echo, the call instruction placed the return address on the stack: the address of the instruction right after the call to echo. Then comes the stack frame for echo. First we push %ebp onto the stack, then we push %ebx, and then there's space allocated for our four-character buffer. The buffer's address is computed here: it is the current %ebp minus eight bytes. The function may also allocate some other space on the stack. The last thing placed on the stack is the argument to gets, namely the address of the buffer. So let's see what the stack looks like just before we call gets. We've pushed the old %ebp, which points to an earlier location on the stack; notice that its value is an address ending in c658, probably the base of main's stack frame. Our buffer is here, and at this point we have to think of it as not holding any particular values.
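The frame just described can be modeled as a C struct, lowest address first, to see which saved value each input byte lands on. This is a sketch of the 32-bit frame in this example only (buffer at %ebp - 8, saved %ebx just above it, then the saved %ebp and the return address); real compilers may lay out frames differently, and the names `echo_frame`, `slot_for_byte`, and `last_slot_written` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of echo's frame, lowest address first (offsets relative to %ebp). */
struct echo_frame {
    char     buf[4];     /* %ebp - 8: the 4-byte input buffer     */
    uint32_t saved_ebx;  /* %ebp - 4: pushed by echo               */
    uint32_t saved_ebp;  /* %ebp + 0: main's frame pointer         */
    uint32_t ret_addr;   /* %ebp + 4: return address into main     */
};
_Static_assert(sizeof(struct echo_frame) == 16, "expected no padding");

enum slot { SLOT_BUF, SLOT_SAVED_EBX, SLOT_SAVED_EBP, SLOT_RET_ADDR, SLOT_BEYOND };

/* Which slot does byte i (counted from the start of buf) fall in? */
enum slot slot_for_byte(size_t i) {
    if (i < offsetof(struct echo_frame, saved_ebx)) return SLOT_BUF;
    if (i < offsetof(struct echo_frame, saved_ebp)) return SLOT_SAVED_EBX;
    if (i < offsetof(struct echo_frame, ret_addr))  return SLOT_SAVED_EBP;
    if (i < sizeof(struct echo_frame))              return SLOT_RET_ADDR;
    return SLOT_BEYOND;                             /* into main's frame */
}

/* gets() writes input_len characters plus '\0', so the last byte it
   writes is at index input_len. */
enum slot last_slot_written(size_t input_len) {
    return slot_for_byte(input_len);
}
```

For the three inputs in the demo, `last_slot_written(7)` gives `SLOT_SAVED_EBX` (so 1234567 survives), `8` gives `SLOT_SAVED_EBP`, and `12` gives `SLOT_RET_ADDR`. Even a 4-character input already spills its null byte into the saved %ebx.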
We haven't written anything to those locations yet. We've also placed the address of the buffer on the stack as the argument to gets, just before the call. Finally, the return address for when echo finishes is 0x080485f7, the address in main of the instruction after the call to echo; that's where we will return. So this is what our stack looks like at this point. Let's see what happens next. Suppose we enter the characters 1234567 followed by Return, one of our examples. The buffer is filled with the ASCII codes for each character: there's the 1, the 2, the 3, the 4, all the way to the 7, followed by the null byte marking the end of the string. Notice that we've overrun our buffer. We haven't just filled four bytes; we've written eight, overwriting the saved value of %ebx on the stack. That might cause problems later if some code needed that value, but in this case it turns out not to matter, and we can return just fine after this call. The return address hasn't been affected, and neither has the saved value of %ebp. Everything still functions properly, and that's why the string is printed correctly. Now, if we add one more character and enter 12345678, the terminating null byte overwrites the lowest-order byte of the saved %ebp. So rather than being restored correctly to main's frame, %ebp will point to the wrong location: the saved value now ends in c600 instead of c658. That causes the segmentation fault we see later, because when the leave instruction at the end of echo pops %ebp, it gets the wrong value, and main's stack frame is addressed incorrectly.
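To check the byte arithmetic above, here is a small simulation. It copies an input string (plus its terminating null) into a 16-byte array standing in for echo's frame, then reads back the saved values as little-endian 32-bit words, as on x86. The initial saved %ebx is an arbitrary placeholder, 0xffffc658 is an assumed full value for the saved %ebp that ends in c658, and 0x080485f7 is the return address from the example. It is only a model: it writes into an ordinary array and never corrupts a real stack.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE     16           /* buf[4], saved %ebx, saved %ebp, return address */
#define SAVED_EBX_INIT 0x11111111u  /* placeholder: the real value doesn't matter */
#define SAVED_EBP_INIT 0xffffc658u  /* assumed full value of main's %ebp */
#define RET_ADDR_INIT  0x080485f7u  /* return address into main, from the example */

typedef struct {
    uint32_t saved_ebx, saved_ebp, ret_addr;
} frame_after_gets;

/* Store/load 32-bit values least-significant byte first, as x86 does. */
static void store_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Simulates gets(buf) inside echo: copies the input and its '\0' starting
   at buf, then reports the saved values that follow the buffer. Stops at
   the end of the modeled frame instead of writing any further. */
frame_after_gets simulate_echo(const char *input) {
    uint8_t frame[FRAME_SIZE] = {0};  /* real buf contents are unknown; zero here */
    store_le32(frame + 4,  SAVED_EBX_INIT);
    store_le32(frame + 8,  SAVED_EBP_INIT);
    store_le32(frame + 12, RET_ADDR_INIT);

    size_t n = strlen(input) + 1;     /* characters plus terminating '\0' */
    if (n > FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(frame, input, n);

    frame_after_gets r = {
        load_le32(frame + 4), load_le32(frame + 8), load_le32(frame + 12)
    };
    return r;
}
```

With 1234567, the saved %ebp and the return address come back unchanged; with 12345678, the saved %ebp becomes 0xffffc600; with 123456789ABC, the saved %ebp is entirely overwritten and the return address becomes 0x08048500. These match the three behaviors in the demo.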
When I enter the longer string, 123456789ABC, we go even further: not only do we overrun the buffer, the saved %ebx, and the saved %ebp, we also change a byte of the return address. So when we return from echo, we no longer return to main; we return to some other address. The null byte replaces the lowest-order byte of the return address, so instead of 0x080485f7 we go to 0x08048500. Who knows what we'll find there? That's the main problem with a buffer overflow: we start destroying data on the stack. The stack frame discipline is broken, and we can destroy return addresses and frame pointers. So how can we use this maliciously to get the machine to do whatever we want? We can input a string that isn't just ordinary characters but is actually made of bytes that represent executable code. As we overrun the buffer, we keep going and write over the return address, changing it to point into the area we just wrote. Let's look at the situation here. We have a function foo that calls a function bar, and bar is supposed to return to the address immediately after the call; let's call that address A. That's the return address placed on the stack. But bar does a buffer allocation that lets us overrun the buffer. We can insert some special code of our own into the buffer and keep going to overrun the location where return address A is stored, changing it to address B. So we get rid of A and put in address B, which is simply the start of our buffer, where we've placed our own code.
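The malicious input described here has three parts: the injected code, padding, and a replacement return address. Here is a minimal sketch of how such an input could be laid out in a byte array, assuming a 32-bit little-endian target and a known distance from the start of the buffer to the saved return address (12 bytes in echo's frame, since the buffer is at %ebp - 8 and the return address at %ebp + 4). The function name, the placeholder code bytes, and the address used below are hypothetical; this only arranges bytes and contains no real exploit code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds [code][padding][4-byte little-endian return address] in out.
   ret_offset is the distance from the start of the buffer to the saved
   return address. Returns the total length, or 0 if it doesn't fit. */
size_t build_overflow_input(uint8_t *out, size_t out_cap,
                            const uint8_t *code, size_t code_len,
                            size_t ret_offset, uint8_t pad_byte,
                            uint32_t new_ret) {
    size_t total = ret_offset + 4;
    if (code_len > ret_offset || total > out_cap)
        return 0;
    memcpy(out, code, code_len);                              /* injected code */
    memset(out + code_len, pad_byte, ret_offset - code_len);  /* padding */
    for (int i = 0; i < 4; i++)                               /* address B, low byte first */
        out[ret_offset + i] = (uint8_t)(new_ret >> (8 * i));
    return total;
}
```

Because gets stops at a newline, an input fed to it must not contain the byte 0x0a anywhere, including inside the address; and gets still appends its own null byte after the last byte.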
So now, rather than returning to the function foo, we return, erroneously, to the part of the stack where we've injected our code, and that code executes for our own purposes. This is how we end up hijacking a machine: we overrun its stack, write our own code there, and force a jump to that location. That's what you'll be doing in one of the programming assignments: writing code to do this. This worked on machines in the late 80s and early 90s, but it doesn't really work today. The Internet worm took advantage of the Unix finger service, which takes a user's name (or user@host address) and tells you whether that person is logged in to the machine and how long they've been idle. The finger daemon used gets to read that input. So instead of providing correct input, which would probably be perhaps 64 bytes long or so, the worm sent something much, much longer: some exploit code (the code we actually want to run), some padding to get far enough to overwrite the return address, and then the new return address itself, pointing to the start of the exploit code. That allowed it to take over the victim's machine. Buffer overflow exploits were really common in the late 80s and early 90s. What has happened since then? First, we've changed how input is read. Instead of gets, we now use fgets, which takes an additional argument specifying the size of the buffer, so it never reads more characters than that limit allows. That way we stop before we reach the end of the buffer and know we're not overwriting other areas. (gets itself was eventually removed from the C standard, in C11.) Similarly, string copy has a bounded counterpart: there's a version called strncpy.
Here the n argument limits the number of characters copied, so we don't copy a long string into the space for a shorter one and overrun it. Similarly, with scanf, rather than using the plain %s specification, which reads a string of arbitrary size, we use %ns, which means read a string of at most n characters, again limiting the size of the input. But there have also been system-level protections. One is to randomize stack offsets: at the start of a program, a random amount of space is allocated on the stack, so stack addresses change from run to run. That makes it difficult for an attacker to predict where the injected code will end up, and therefore what return address to write so that the program jumps to the exploit code at just the right place. People have also developed techniques for detecting stack corruption. For example, a function that might be vulnerable to a buffer overflow can place a special value, often called a canary, on the stack between its local buffers and the saved return address, and check before returning that the value hasn't changed. Finally, there are hardware modifications that allow regions of memory to be marked non-executable. For example, the segment of memory that the stack occupies can be marked non-executable, meaning the processor will refuse to execute instructions fetched from that area. So even if we manage to fill a buffer with exploit code and insert the right address so that we jump to it, we still can't execute it. The system will not interpret the contents of the stack as instructions; it insists on treating them as data, not as code.
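The bounded library calls mentioned above can be sketched as follows. `bounded_read_line`, `bounded_copy`, and `read_word` are hypothetical helper names; fgets, strncpy, and fscanf with a field width are standard C. Two details are worth knowing: fgets keeps the newline if it fits, and strncpy does not add a terminating null when the source is too long, so we add one ourselves.

```c
#include <stdio.h>
#include <string.h>

/* Reads at most size-1 characters of a line into buf (a bounded gets).
   fgets always null-terminates; we strip the newline if one was read. */
char *bounded_read_line(FILE *in, char *buf, size_t size) {
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}

/* Copies at most dst_size-1 characters. strncpy leaves dst unterminated
   when src is too long, so we terminate it explicitly. */
void bounded_copy(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* "%3s" reads at most 3 non-whitespace characters, leaving room for the
   '\0' in a 4-byte buffer. Returns 1 on success. */
int read_word(FILE *in, char buf[4]) {
    return fscanf(in, "%3s", buf);
}
```

With a 4-byte buffer, each of these turns the 12345678 input from the demo into "123" instead of overflowing.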
For your programming assignment, you'll be working with a virtual machine of late-80s, early-90s vintage, so you'll be able to write a buffer overflow exploit. But you should know that on modern systems, with the protections we just discussed in place, this simple form of the attack no longer works. Good luck with the programming assignment; you should really enjoy seeing how you can inject code into a program that wasn't meant to contain that code to begin with. Have fun.
