[MUSIC] This video is to help you
understand a little bit about the buffer
overflow problems you've been
encountering and exploiting in one of
the programming assignments.
Let's start off by talking about what a
buffer overflow is.
Primarily, this happens because C doesn't
check array bounds.
What happens is that when we take
input from, for example, the keyboard, we
don't specify a limit on the amount of
input.
So, as we fill up a buffer with the
input, taking each keystroke from the
keyboard, we can actually go past the end
of the array that we're using as our
input buffer to store each of those
characters.
And this was a very common
type of security vulnerability.
What we'll go over in this video is how
the stack is laid out, and the address
space in general, in a way that
allowed this to happen;
how input buffers are put on the stack;
and then how overflowing the buffer can
help us inject unwanted code into our
program.
And then we'll close with what are now
the defenses against buffer overflows
that have prevented this kind of
vulnerability.
Okay, so remember the Linux memory
layout.
We have a stack that grows from the high
end of memory downward.
And then we have our program, our
program's static data, and dynamic data
at the bottom addresses, with the dynamic
data growing upward, okay?
Remember, the dynamic data is
allocated using malloc or new or other
memory allocation primitives like that.
The stack is allocated by making
procedure calls,
and it holds what each procedure needs
to place on the stack:
things like local variables, arguments
for the next procedure call, and so on.
Okay.
So just to reiterate that point, here's
an example program, and let's see where
everything actually ends up going in
memory.
You can see that this program has a
couple of big chunks of memory, two big
arrays allocated at the top, some
variables of various types, some ints
and pointers,
and then a couple of
procedures, some of which have statements
that do some dynamic memory allocation.
So, let's take a look at where those
things all end up in memory.
We'll start with, of course, our main
procedure and our useless procedure.
They end up in the text portion of
memory.
That's where we'll have the
assembly instructions for those
procedures.
Then there's some static data, including
those big arrays and the variables we
declared.
That space will be allocated here.
Even prior to running the program, we'll
know in advance how much space we need
there.
Then there are the parts of the program,
those malloc calls, that allocate
memory when it's needed, and that will
come from the heap space, which will
continue to grow upward.
And of course we can free up that space
when we don't need it anymore
and use it for other things.
Our stack pointer of course points to the
top of the stack.
And we also have some libraries, for
example the malloc routine, that we've
linked in at run time, and those usually
sit in this area down here.
Okay, so that's where things are ending
up in memory.
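The program on the slide is not reproduced in this transcript, so here is a rough stand-in, a sketch only. The names `big_array`, `some_global`, and `print_layout` are made up for illustration; only `useless` and the use of malloc come from the description above. It prints one address from each region so you can compare them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the program on the slide. */
static char big_array[1 << 16];   /* static data: size known before running */
static int  some_global = 42;     /* initialized static data */

static int useless(void) { return 0; }   /* its instructions live in text */

/* Prints one address per region; returns how many lines were printed,
 * or -1 if the heap allocation failed. */
int print_layout(FILE *out)
{
    int   local = 0;                      /* stack: created by this call */
    char *heap_block = malloc(1 << 10);   /* heap: allocated on demand */
    int   printed = 0;

    assert(out != NULL);
    if (heap_block == NULL)
        return -1;

    /* Function pointers are converted to an integer for printing. */
    printed += fprintf(out, "text   (useless)    : 0x%jx\n",
                       (uintmax_t)(uintptr_t)&useless) > 0;
    printed += fprintf(out, "static (big_array)  : %p\n",
                       (void *)big_array) > 0;
    printed += fprintf(out, "static (some_global): %p\n",
                       (void *)&some_global) > 0;
    printed += fprintf(out, "heap   (malloc)     : %p\n",
                       (void *)heap_block) > 0;
    printed += fprintf(out, "stack  (local)      : %p\n",
                       (void *)&local) > 0;

    free(heap_block);
    return printed;
}
```

Calling `print_layout(stdout)` from a main function should print five lines. The exact addresses depend on the system and usually change from run to run, but on a typical Linux machine the stack address is expected to be the largest of the five.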
And what happened in the late 80s is a
program was created that could actually
attack a lot of Internet hosts
by exploiting the way that this memory
allocation was done, and the way that
memory was arranged.
Because the stack grows backwards in
memory, and data and instructions are
stored in the same memory, we can do an
interesting attack
that can help us take control of
someone's machine.
Let's see how this happened.
Okay, the Internet
worm was based on a stack buffer overflow
exploit.
Okay, and again, as I mentioned at the
beginning, it's because many Unix
functions do not check argument sizes, so
they'll just allow us to fill and
overfill, or overflow, a buffer for our
input.
Let's take a look at a common function in
Linux, the gets function,
which is used to get an input string
from the keyboard.
You'll notice that the function returns a
pointer to a character buffer, a
buffer of bytes.
And it takes as its argument the start
of that buffer.
Okay.
So, let's walk through this code
real quickly.
You'll see it starts off by getting a
single character from the keyboard
and setting a
pointer to the destination address.
And then it asks: is this character
something that is not
the end of file and not a newline or
a return?
And if it isn't either of those, in other
words if it's just some other character,
then it will put that character where
the pointer is pointing, by
dereferencing the pointer.
And then it increments the pointer by
one, which because of pointer arithmetic
will point to the next place in the
buffer.
In this case they're characters, so it
increments by one byte.
And then it just calls get character
again and repeats the loop,
asking again whether we've reached a
return or an end of file.
When we're done, it adds one more thing
to the end of the string,
namely the null character,
because remember, in C we indicate the
end of the string with a null. And then
it finally just returns that same address
it started with, as the place where it
placed the input characters.
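The code on the slide is not in this transcript, so the following is a sketch of the loop as just described, not the actual library source. The name `unbounded_gets` and the extra `in` parameter are assumptions added so the sketch can read from any stream; the real gets always reads standard input.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sketch of the loop described above. Deliberately unsafe: nothing in
 * here knows how large the buffer behind `dest` is.
 * `in` is an added (hypothetical) parameter for easier experimentation.
 */
char *unbounded_gets(FILE *in, char *dest)
{
    int   c;          /* int, not char, so EOF can be told apart from data */
    char *p = dest;   /* next place in the buffer to write */

    assert(in != NULL && dest != NULL);

    c = getc(in);
    while (c != EOF && c != '\n') {
        *p++ = (char)c;   /* store the character, then advance by one */
        c = getc(in);
    }
    *p = '\0';            /* mark the end of the string with a null */
    return dest;          /* the same address we were given */
}
```

Notice that the function signature has nowhere to say how many bytes `dest` can hold, which is exactly the weakness discussed next.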
Okay.
So, what could go wrong in this code?
Well, ask yourself:
how big is this buffer?
We don't know, because we're just given a
starting address.
We have no idea how much space was
allocated in memory for this buffer.
We're just going to keep putting things
in it as long as there's more
input.
And we'll keep going.
All right, so in fact there's no way to
specify a limit on the number of
characters to read.
And there's a
similar problem in many other Unix
functions,
like string copy, which is just given two
addresses and says, copy a string from
here to there,
but doesn't bother to check that the
destination can hold a string of that
length.
Similarly with scanf, another of the
functions used to get input from the
keyboard, we have the same kind of
problem, as it gets strings of unknown
size.
All right, so let's do the smallest
possible example we can that shows this
off.
Here we have a simple C function called
echo. It's called from main; notice here
that main prints "Type a string" and then
asks us to input a string.
And the function echo is just going to
echo it back to the console after we hit
return.
All right.
So, we will first read a string and then
write the string back out to the console
after we're done.
So, how big is the buffer?
Well, we've decided on a buffer of size
four.
Just four bytes.
Pretty small buffer.
But let's see what happens when we run
this code.
Okay.
I'm going to run this code and type the
string 12345678.
You'll notice I typed eight characters,
not four.
I've gone past the end of that buffer and
written into other parts of memory; we'll
see in a second where those are. And
there's a segmentation fault.
Somehow our CPU tried to
use an address it shouldn't, and the
system complains and says you have a
fault here.
Something went terribly wrong. But now if
we type the string 1234567,
you'll notice that that's only one
shorter, and it's still greater than four,
yet it'll echo it just fine.
It'll just print that right back out.
Well, why didn't that cause a problem?
That overflowed the buffer as well. And
if we type the string 123456789ABC, of
course we'll keep getting
segmentation faults as these strings
overflow the buffer.
So, let's take a look, for each of these
cases, at just what is going on, what is
happening in this system.
To start, let's review the assembly
code that might be generated for echo.
You'll notice here there's just the
usual setup stuff at the beginning
and cleanup stuff at the end.
And in between, we compute an address
that we're going to use shortly,
allocate some space on the stack, save
that address onto the stack, and then
call gets.
When gets returns, we're just going to
basically call puts right after it
to echo the values, then reclaim our
stack space and return.
What are we putting on the
stack?
Well, we're probably allocating some
space for the buffer as well as other
things, okay.
Remember that buffer was of size four
bytes.
The code down here in the pink region is
the code that we might see in main, which
calls echo, then does something else,
and eventually cleans up and returns.
Okay.
So before we execute that call to gets in
echo that occurs here, this is likely
what's going to be on the stack.
There will have been a stack frame for
the main procedure.
And then, of course, when main called
echo, it placed the return address on the
stack.
That would be the return to the next line
after the call to echo.
And then we see the stack frame for
the echo function, all right, and that
involves pushing EBP onto the stack; that
happens here.
Then we also push EBX onto the stack, so
we see that there, and then there's some
space allocated for our four-character
buffer on the stack.
And its address is computed at this
location here.
The address of the buffer, you'll notice,
is the current EBP minus eight bytes.
The function may also allocate some
other space on the stack. The last thing
that's put on the stack is the argument
to gets, namely the address of the
buffer.
So, let's see what that stack
looks like just before we go to
gets.
Again, we've placed
the old EBP onto the
stack, and that's pointing to some earlier
place on the stack.
You'll notice here the address is ffc658.
That would be maybe the start of
main's stack frame.
Our buffer is here, and remember we have
to think about it as not having any
particular values in it at this point.
We haven't written anything to those
locations yet. And then we have put the
address of the buffer as the argument to
gets.
Okay, and that's placed on the stack just
before we do that call.
Finally, the return address for when
we're done with echo is 080485f7, which
is the address in main of the
instruction after the call to echo,
where we will be returning.
All right, so this is what our stack
looks like at this point.
Let's see what happens next.
As we enter the characters 1, 2, 3, 4, 5,
6, 7 followed by a Return, one of our
example inputs, you'll
notice that we fill the buffer with
the ASCII codes for each of the
characters.
There's the 1, the 2, the 3, the 4, all
the way to the 7, closing
with that null byte to indicate the end
of the string.
Now, you'll notice that we've overrun our
buffer.
We haven't just filled the four bytes,
we've filled eight bytes, and overwritten
the saved value of ebx
that had been stored on the stack.
That might cause some problems later on
if we needed that value, but it turns out
in this case that it's not an issue and
we can return just fine.
After this call, the return address
hasn't been affected, and the saved value
of ebp hasn't been affected.
Everything can still function properly,
and that's why we actually print that
string correctly.
Now, if we add one more character and put
in an input of 12345678,
you'll notice that we overwrite part
of the saved ebp, because the terminating
null byte lands in its low-order byte.
So now, rather than the frame pointer
being reset correctly for main,
it'll be pointing at
the wrong location.
And that will cause the segmentation
fault that we see happen later, because
when we pop ebp in the leave instruction
for echo we'll get the wrong value, and
main's stack frame will be improperly
addressed.
When I put in the longer string,
123456789ABC, we now go even further and
not only overrun the buffer and the saved
ebx and the saved ebp, we actually also
change a byte of the return address.
So now, when we go to return from echo,
we're not even going to return to the
main procedure anymore.
We're going to return to another address:
rather than going to 080485f7, we're
going to go to 08048500.
And who knows what we will
find there?
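To make the three cases concrete without actually corrupting a real stack, here is a small simulation, offered as a sketch under stated assumptions. It models the 16 bytes starting at the buffer as a plain array and assumes a little-endian x86 layout with the buffer at EBP minus eight. The saved EBP value 0xffffc658 is a guess at the full address read out above, the saved EBX value is arbitrary, and every name in the code is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 4, FRAME_SIZE = 16 };
enum { HIT_EBX = 1, HIT_EBP = 2, HIT_RET = 4 };   /* which values got hit */

/* Simulated slice of the stack, lowest address first:
 *   bytes  0..3  buf        bytes  8..11 saved ebp
 *   bytes  4..7  saved ebx  bytes 12..15 return address */
typedef struct { unsigned char bytes[FRAME_SIZE]; } SimFrame;

static void store_le32(unsigned char *at, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        at[i] = (unsigned char)(v >> (8 * i));   /* low byte first */
}

static uint32_t load_le32(const unsigned char *at)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)at[i] << (8 * i);
    return v;
}

/* Copies `input` plus its null into the simulated frame the way an
 * unchecked read would, and reports which saved values changed. */
int simulate_echo(const char *input, SimFrame *frame)
{
    const uint32_t ebx = 0x11223344u;   /* arbitrary, for illustration */
    const uint32_t ebp = 0xffffc658u;   /* assumed full saved EBP */
    const uint32_t ret = 0x080485f7u;   /* return address into main */
    size_t n;
    int hits = 0;

    assert(input != NULL && frame != NULL);
    memset(frame->bytes, 0, BUF_SIZE);
    store_le32(frame->bytes + 4, ebx);
    store_le32(frame->bytes + 8, ebp);
    store_le32(frame->bytes + 12, ret);

    n = strlen(input) + 1;              /* characters plus the null */
    if (n > FRAME_SIZE)
        n = FRAME_SIZE;                 /* keep the simulation in bounds */
    memcpy(frame->bytes, input, n);

    if (load_le32(frame->bytes + 4) != ebx)  hits |= HIT_EBX;
    if (load_le32(frame->bytes + 8) != ebp)  hits |= HIT_EBP;
    if (load_le32(frame->bytes + 12) != ret) hits |= HIT_RET;
    return hits;
}

uint32_t sim_return_address(const SimFrame *frame)
{
    return load_le32(frame->bytes + 12);
}
```

With these assumptions, "1234567" reports only `HIT_EBX`, "12345678" adds `HIT_EBP`, and "123456789ABC" adds `HIT_RET`, leaving a simulated return address of 0x08048500. The null byte lands in the lowest-addressed byte of each saved value, which on a little-endian machine is the least significant one.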
So, that's the main problem with this
buffer overflow: we start destroying
data that is on the stack.
The stack frame
discipline is broken: we destroy the
return addresses, and we
potentially destroy some frame pointers.
Okay, so how can we use this maliciously
to get the machine to do whatever we
want?
Well, what we can do is input a string
that isn't just simple characters, but
actually is the bytes that
represent some executable code.
So as we
overrun that buffer and go on
to write over the return address, we can
actually change the return address
to point into the area we just wrote.
So, let's take a look at this
situation here,
where we have this function bar that
we're calling.
And it's supposed to be returning, of
course, to the address immediately after
the call to bar.
Let's call that A.
That's the return address
we place on the stack.
But then we call bar, and it does a
buffer allocation and allows us to
overrun the buffer.
We could actually go as far as inserting
code into the buffer, so that we put
some special code here, of our own,
as well as continuing down to overrun
where the return address A was stored
and change it to be address B.
So, we get rid of A and put in the
address B.
The address B is just going to be the
start of our buffer, where we've put our
own code.
And so now, rather than returning to this
address here in the function foo, we're
going to return, erroneously, to the part
of our stack where we've injected some
special code that we'll execute for our
own purposes.
And this is how we end up hijacking a
machine.
We basically overrun its stack, write
our own code in there, and force a jump
to that location.
All right.
So, that's what we're going to be doing
in one of our programming
assignments:
writing code to do this.
Now, this worked on machines in the late
80s, early 90s.
It doesn't really work today.
But the way that the Internet worm did it
is that it took advantage of a program in
Unix called finger, which takes
somebody's e-mail address and would tell
you whether they were logged in to the
machine or not,
and how long they might have been idle.
It used gets to get that input.
And so instead of providing the
correct input, which would probably be
of maybe 64 bytes in length or so,
we could actually put in something that
was much, much longer, that included some
exploit code,
the code we actually want to have run,
some padding to get us as far as we need
to go to make sure we overwrite the
return address, and then that new return
address, which would actually be the start
of the exploit code.
All right?
And then that would allow us to take over
the victim's machine.
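The three-part layout just described (code, then padding, then the new return address) can be sketched as a byte-packing helper. This is only an illustration of the layout under assumed values: the function name `build_layout`, its parameters, the 'A' padding byte, and the 32-bit little-endian address format are all assumptions, and the code bytes passed in are whatever placeholder the caller supplies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Lays out: [code bytes][padding][new return address, little-endian].
 * `ret_offset` is the assumed distance from the start of the buffer to
 * the saved return address; `buffer_addr` is address B, the assumed
 * start of the buffer. Returns the number of bytes written, or 0 if
 * the pieces do not fit.
 */
size_t build_layout(unsigned char *out, size_t out_cap,
                    const unsigned char *code, size_t code_len,
                    size_t ret_offset, uint32_t buffer_addr)
{
    assert(out != NULL && code != NULL);
    if (code_len > ret_offset ||
        ret_offset > SIZE_MAX - 4 ||
        ret_offset + 4 > out_cap)
        return 0;

    /* 1. The injected bytes go at the very start of the buffer. */
    memcpy(out, code, code_len);

    /* 2. Padding carries us up to where the return address is stored. */
    memset(out + code_len, 'A', ret_offset - code_len);

    /* 3. Address B replaces address A, least significant byte first. */
    for (int i = 0; i < 4; i++)
        out[ret_offset + i] = (unsigned char)(buffer_addr >> (8 * i));

    return ret_offset + 4;
}
```

The hard part in practice is knowing `ret_offset` and `buffer_addr` for the target, which is what the defenses discussed next make difficult.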
So, buffer overflow exploits were really
common in the late 80s and early 90s.
What has happened since then?
Well, we've gone in and changed those
Linux library functions. Instead of the
gets function,
we now use fgets, which is a function
that's been defined to have an additional
argument
that specifies the size of the buffer, so
we will not read any more characters than
that limit.
Therefore we can now stop before we get
to the end of the buffer and know that
we're not overwriting any other areas.
Similarly, string copy is another
function that's been modified.
There's now a version called string
n copy,
where the n argument specifies the
maximum number of characters to copy, so
we don't copy some long string into the
space for a smaller one and overrun that.
Similarly, with scanf, rather than using
the percent s
specification, which just reads a string
of arbitrary size, we now use fgets, or
percent ns,
which says read a string but of size n
maximum, again limiting the size of that
input.
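Here is a minimal sketch of those three bounded alternatives in use. The wrapper names `read_line_bounded`, `copy_bounded`, and `scan_word4` are made up for illustration; `fgets`, `strncpy`, and the width form of the `%s` conversion are the standard C library features being shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* fgets: reads at most cap-1 characters of one line and always adds a
 * null. Returns 0 on success, -1 on end of file or error.
 * (Assumes cap fits in an int, which fgets requires.) */
int read_line_bounded(FILE *in, char *buf, size_t cap)
{
    assert(in != NULL && buf != NULL && cap > 0);
    if (fgets(buf, (int)cap, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline if one was read */
    return 0;
}

/* strncpy: copies at most cap-1 characters. It does not add a null when
 * the source is too long, so we terminate explicitly. */
void copy_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/* %3s: stores at most 3 characters plus the null, so 4 bytes suffice. */
int scan_word4(const char *text, char out[4])
{
    assert(text != NULL && out != NULL);
    return sscanf(text, "%3s", out);
}
```

With a four-byte destination, each of these keeps only "123" from a longer input such as "12345678" instead of writing past the end. One design point worth noting: the width in `%ns` counts characters and excludes the terminating null, so the buffer needs n plus one bytes.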
But there have been some other changes as
well, some system-level protections.
One is to get the compiler to consider
randomizing the stack offsets, the size
of each stack frame,
so that it is difficult for an
attacker to know how much padding to add.
Because every time the procedure is
called, it makes a
frame of a slightly different size.
So that makes it difficult to
predict the size of that code and where
we need to put the return address
that we want, to be able to jump to the
exploit code.
Getting it to be at just the right place
becomes very hard to do.
People have also developed techniques for
detecting stack corruption: checking the
stack before and after procedure calls
that might be sensitive to these
buffer overflow exploits,
to check that the stack has not been
corrupted.
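One common way to detect this kind of corruption is to place a known value, often called a canary, between the buffer and the saved values and check it before returning. The lecture does not name a specific technique, so treat the following as an illustrative simulation only: the stack is modeled as a plain byte array, and all names and the canary value are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BUF_BYTES = 4, CANARY_BYTES = 4, REGION_BYTES = 16 };

/* Simulated stack region, lowest address first:
 *   [0..3] buffer  [4..7] canary  [8..15] saved registers, return address */
typedef struct {
    unsigned char bytes[REGION_BYTES];
    uint32_t expected;   /* reference copy kept outside the region */
} GuardedRegion;

/* Done on procedure entry: place the canary just above the buffer. */
void region_init(GuardedRegion *r, uint32_t canary)
{
    assert(r != NULL);
    memset(r->bytes, 0, sizeof r->bytes);
    memcpy(r->bytes + BUF_BYTES, &canary, CANARY_BYTES);
    r->expected = canary;
}

/* Models an unchecked copy into the buffer (clamped to the region so
 * the simulation itself stays in bounds). */
void region_unchecked_write(GuardedRegion *r, const char *input)
{
    size_t n;

    assert(r != NULL && input != NULL);
    n = strlen(input) + 1;            /* characters plus the null */
    if (n > REGION_BYTES)
        n = REGION_BYTES;
    memcpy(r->bytes, input, n);
}

/* Done before returning: 1 if the canary is intact, 0 if corrupted. */
int region_canary_intact(const GuardedRegion *r)
{
    assert(r != NULL);
    return memcmp(r->bytes + BUF_BYTES, &r->expected, CANARY_BYTES) == 0;
}
```

Because the overflow has to pass through the canary to reach the return address, a three-character input leaves the check passing while "1234567" makes it fail. GCC's `-fstack-protector` option inserts a check in this spirit, with the program aborting instead of returning when the value has changed.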
And then finally, there have been some
hardware modifications to create
non-executable areas of memory,
so that, for example, the portion of
memory, or the segment of memory, that
the stack occupies would be set to be
non-executable,
meaning that whenever we read data from
this area, we cannot interpret it as
instructions.
So now, even if we're able to fill a
buffer with exploit code
and insert the address in the right
place so we can jump to that code,
we still can't execute it.
The system will not interpret that stuff
as instructions,
but will insist that the stuff on
the stack has to be interpreted as data,
not as code.
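On POSIX systems the same read, write, and execute permissions are visible to programs through mmap, so here is a small sketch of what a data-only region looks like. It is an analogy for how a non-executable stack is marked, not how the stack itself is set up, and the helper names `map_data_only` and `unmap_data_only` are made up.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps an anonymous region that can be read and written but not
 * executed. Returns NULL on failure. */
unsigned char *map_data_only(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len,
             PROT_READ | PROT_WRITE,        /* note: no PROT_EXEC */
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Releases a region obtained from map_data_only; 0 on success. */
int unmap_data_only(unsigned char *p, size_t len)
{
    return munmap(p, len);
}
```

Bytes can be stored in such a region like any other data. On hardware that enforces the no-execute permission, trying to jump into the region is expected to fault rather than run those bytes as instructions.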
For your programming assignment, you'll
be working with a virtual machine that is
of the late 80s, early 90s vintage, so
you'll be able to write a buffer overflow
exploit.
But you should know that of course that
is not the case any longer,
and that type of attack is no longer
possible in this simple form.
But good luck with the programming
assignment. You should really enjoy it
and see how you can inject code into a
program
that wasn't meant to have that code in it
to begin with.
Have fun.