Alright, time to turn our attention to how we represent floating-point numbers, and specifically the IEEE standard notation for floating-point numbers, which all modern computing systems use.
Alright, so floating-point is analogous to scientific notation. You remember, maybe from some of your science classes, that you wouldn't represent a number like 12 million written out with all those zeros behind it, but rather as 1.2 times 10 to the 7th. Similarly, a really tiny number like 0.0000012 we can represent as 1.2 times 10 to the minus 6.
In fact, C supports this notation, by letting you write floating-point numbers as 1.2e7 and 1.2e-6 for the two examples above.
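Just as a minimal sketch in C (my own, not from the lecture; the variable names are only for illustration), both literal forms denote ordinary doubles:

    #include <stdio.h>

    int main(void) {
        double big  = 1.2e7;   /* 1.2 x 10^7  = 12000000.0 */
        double tiny = 1.2e-6;  /* 1.2 x 10^-6 = 0.0000012  */
        printf("%f\n", big);
        printf("%.7f\n", tiny);
        return 0;
    }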
Alright, this goes back to IEEE Standard 754, which was established in 1985 as a uniform standard for floating-point arithmetic. Before that, there were all kinds of different formats that were very difficult to combine, but today's CPUs all use this same standard.
And the standardization was really driven by numerical concerns: standards for handling rounding, overflow, and underflow, representing things like division by zero, and so on. It ended up creating a standard that is very hard to make fast in hardware, but is numerically very well behaved, and those concerns dominated the standardization effort.
Let's take a look at the details of the IEEE floating-point representation. If we have a value, we're going to represent it as a magnitude M, and then an exponent E for a power of 2, since we're moving to binary numbers. And then we'll also have a sign bit s for the entire number, so the value is (-1)^s times M times 2^E; this is back to sign-and-magnitude notation.
Okay, so the sign bit determines whether the number is negative or positive. Then the significand, or the mantissa, M is normally a fractional value, something in the range [1.0, 2.0). And you notice that it can be exactly 1.0, but only a smidgen less than 2; that's why we use the round parenthesis on that side. And then the exponent E, which is possibly negative of course, multiplies the mantissa by that power of 2.
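For example, the value 12.0 would have sign s = 0, significand M = 1.5, and exponent E = 3, since 1.5 times 2 to the 3rd is 12.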
Okay.
So the representation in memory is then going to be: one bit for the sign, since that's all we need for the sign bit; some number of bits for the exponent, in a field we'll call exp, and we're going to notice that it encodes the value of E, but it is not exactly E (we'll see what I mean by that in a bit); and then a fractional field that encodes the mantissa, but again is not exactly equal to the mantissa. We'll see what the difference is in just a sec.
So let's get to that. Now, how many bits do we assign to each of these fields? We said that we're going to have one bit for the sign; that's easy enough. For a floating-point number represented in 32 bits, the actual IEEE standard says we're going to use 8 bits for the exponent. That's going to limit how large and how small our numbers can get. And then we're going to use the remaining 23 bits for representing the mantissa, or the fractional part, and that will determine our precision, okay. So we have range and precision, and of course the trade-off between the two is how many bits we use for each.
In IEEE floating-point, there's also a 64-bit representation, for doubles, that uses 11 bits for the exponent and 52 for the fraction. So quite a bit more precision, and also a bit more range.
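As a quick sketch of that layout (again my own C, not from the lecture), here's how you could pull the three fields out of a 32-bit float:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        float f = 1.0f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);       /* reinterpret the float's 32 bits */

        uint32_t sign = bits >> 31;           /* 1 sign bit       */
        uint32_t exp  = (bits >> 23) & 0xFF;  /* 8 exponent bits  */
        uint32_t frac = bits & 0x7FFFFF;      /* 23 fraction bits */

        printf("sign=%u exp=%u frac=0x%06X\n", sign, exp, frac);
        return 0;
    }

For 1.0f this prints sign=0, exp=127, frac=0x000000; why the exponent field reads 127 rather than 0 is exactly that "not exactly E" business we'll get to shortly.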
Alright, so let's talk about the mantissa first, the significand. We're going to talk about normalized numbers, meaning that the mantissa is always going to be of the form 1.xxxxx, for some binary bits x. This is analogous to what we do with scientific notation in decimal numbers: we always have values that start with one point something, okay.
So if we wanted to represent the number 0.011 times 2 to the 5th, we would normalize that to be 1.1 times 2 to the 3rd. Okay? And those are exactly the same value, but the latter makes better use of the available bits, because we don't have to bother with those extra zeros.
And actually, since we know the mantissa is always going to start with that one point at the beginning, we're not even going to bother to store it in our representation. Why waste a bit on something we know is always going to be there? So that's why the fraction doesn't encode the mantissa exactly: the fraction only encodes the part of the mantissa to the right of the binary point, those binary digits. It does not encode the 1 to the left, okay?
But now we also have to ask ourselves a question: how do we represent the number 0.0? Ideally we'd like it to be the all-zeros bit pattern. You know, if we have zeros throughout our 32 bits, I would still like that to correspond to zero. So we have to figure out how to get that to work out exactly, and that's going to pose some challenges for us. And then what about values like 1 divided by 0, which yield something that is basically not a number; how are we going to encode that? So what we're going to do is reserve a couple of exponent field values to handle these cases.
The special value we're most interested in, as I've already mentioned, is the case of having the bit pattern of all zeros represent zero. So any exponent field of all zero bits should be used to help us represent that zero. We're also going to reserve an exponent of all ones for two other kinds of values that we need. If the exponent is all ones and the fractional part is all zeros, that's going to represent infinity, or a very large number; and of course we'll have positive infinity and negative infinity, because we can have the sign bit represent that for us. Similarly, if the fraction is not zero, still within an exponent of all ones, we're going to use that to represent "not a number." And not a number is an important value to have for operations with an undefined result, things like the square root of minus one, infinity minus infinity, or infinity times zero; those are clearly not ones we can come up with a numeric value for. So we're going to reserve these exponents of all zeros and all ones for this purpose.
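As a small hedged sketch in C (mine, not the lecture's; compile with -lm for the math library), you can watch these special values come out of ordinary arithmetic, and math.h provides isinf and isnan to test for them:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double zero = 0.0;
        double inf  = 1.0 / zero;   /* all-ones exponent, fraction all zeros */
        double nan1 = zero / zero;  /* all-ones exponent, fraction nonzero   */
        double nan2 = sqrt(-1.0);   /* undefined result, also a NaN          */

        printf("%f %f %f\n", inf, nan1, nan2);   /* prints inf, nan (or -nan) */
        printf("isinf=%d isnan=%d\n", isinf(inf), isnan(nan1));
        printf("inf - inf = %f\n", inf - inf);   /* also a NaN */
        return 0;
    }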
So now let's turn our attention to how we deal with that exponent field. Since we can't use all zeros or all ones, because we need those for the special values, we're going to encode the exponent using a Bias value. Basically, the real exponent that we want on the number, the value E, the exponent of the power of 2, is going to be represented as the exp field minus a Bias: E = exp - Bias. The exp field is an unsigned value ranging from 1 to 2^k - 2, where k is the number of bits in the exponent field, and we're going to use a Bias of 2^(k-1) - 1.
Alright, let's see what that really means. For single precision, that Bias value turns out to be 127. Since we can have exp fields from 1 to 254 using 8 bits (remember, we're not using 0 and we're not using 255, because those are the reserved special values), that corresponds to exponents E from minus 126 to 127. So what that Bias lets us do is represent both positive and negative exponents within that range of 1 to 254 for the bit patterns in the exponent field.
For double precision, of course, we have 11 bits, so exp goes from 1 to 2046, and the Bias is going to be a bit bigger: 1023. So the exponents we can represent run from minus 1022 to positive 1023. Okay, so these enable both large positive exponents for representing large numbers, and very small values by having a negative exponent.
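And note you can read that relationship in either direction: decoding is E = exp - Bias, and encoding is exp = E + Bias. For instance, to store an exponent of 5 in single precision, the exp field holds 5 + 127 = 132.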
Okay, so the significand, as I've mentioned, is then encoded without that leading 1 on the mantissa; we just represent the other bits. A fraction field of all zeros corresponds to M = 1.0, because the one point is assumed and then zeros follow. If we have a fraction that is all ones, that's equivalent to 1.11111..., which is very close to 2, but not quite 2. Okay. So we get that leading extra bit for free.
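And something in between, say a fraction field of 0100...0, would decode to 1.01 in binary, which is 1.25.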
So now we've seen how we encode both E and M in our exponent and fractional fields, okay? That's why those fields are not an exact representation of E and M, but rather an encoding.
Alright, so let's look at the floating-point number 12345.0. Remember, 12345 is that same old bit pattern, 11000000111001 in binary. And now we have to normalize it, put it in a form where the significand starts with one point. The way we do that is by moving the binary point 13 positions over, to be right after the leading one, and then we have a factor of 2 to the 13th. So our normalized form is 1.1000000111001 times 2 to the 13th, and now we can encode the significand, which is just this value brought down here. And of course we're not going to bother with the leading 1; we're just going to use the rest of the bits for the fractional part. That leads to the 23 fraction bits we'll be using: 10000001110010000000000. And you notice we've just padded with trailing zeros at the end, because we have to have some bit values there, and since we don't want to change the value, we use all zeros.
Alright, the exponent: remember we have to use that Bias, so our exponent field is going to be the value of E plus the Bias. The Bias, remember, was 127; our exponent E is 13, and when we add 127 we get an exp field of 140. And the bit pattern for 140, which is 10001100, is what we'll use in the 8-bit field for the exponent. So the result is this representation for our floating-point number 12,345.0: sign bit 0, exponent field 10001100, fraction field 10000001110010000000000. Okay, not immediately obvious at all by looking at those bits, but you can see the process that we go through: first the normalization, then taking the fractional part of the mantissa, and then adding the Bias to the exponent, okay?
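If you want to check this example on your own machine, here's a small C sketch (mine, not from the lecture; compile with -lm) that extracts the fields of 12345.0f and then undoes the encoding:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    int main(void) {
        float f = 12345.0f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);

        uint32_t sign = bits >> 31;           /* expect 0        */
        uint32_t exp  = (bits >> 23) & 0xFF;  /* expect 140      */
        uint32_t frac = bits & 0x7FFFFF;      /* expect 0x40E400 */

        int    E = (int)exp - 127;            /* undo the Bias: 13                */
        double M = 1.0 + frac / 8388608.0;    /* restore the implied 1. (2^23)    */

        printf("sign=%u exp=%u frac=0x%06X E=%d M=%f\n", sign, exp, frac, E, M);
        printf("value = %f\n", ldexp((sign ? -1.0 : 1.0) * M, E)); /* M * 2^E */
        return 0;
    }

This prints E = 13 and M = 1.506958..., and reconstructs 12345.000000, matching the encoding we just walked through.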