[MUSIC].
Now that we've seen how to represent floating point numbers, let's take a look at what it means to do operations on them.
Unlike the representation for integers, it is important to remember that the representation for floating point numbers is not exact, meaning that we're always approximating the real mathematical value, because we have a finite representation. The mantissa doesn't go on forever. It stops after 23 bits, or after 52 bits in the wider 64-bit format.
So, the way we go about doing operations in floating point: the basic idea is to do the exact operation first, and then round the result to fit inside a floating point number representation. So, for example, if we're doing x + y, first we do that addition as exactly as possible, and then we fit the result into a 32-bit representation by rounding if necessary. And the same thing with multiplication.
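To make that concrete, here is a small C sketch that is not from the lecture; the values are chosen just for illustration. Computing the sum in double and then casting back to float mimics the exact-result-then-round model, because the double holds this particular sum exactly and the cast performs the rounding:

```c
#include <stdio.h>

int main(void) {
    float x = 1.0e7f;   /* fits exactly: 10,000,000 < 2^24 */
    float y = 0.25f;    /* exactly representable */

    /* The exact sum 10000000.25 needs 26 significand bits, more than
       a float's 24, so rounding it back to float drops the 0.25. */
    double exact   = (double)x + (double)y;
    float  rounded = (float)exact;

    printf("exact:   %.10f\n", exact);    /* 10000000.2500000000 */
    printf("rounded: %.10f\n", rounded);  /* 10000000.0000000000 */
    return 0;
}
```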
Now these operations require some adjustments to happen. For example, when we're adding numbers, the exponents could be wildly different, so we have to make sure to first adjust the fractions so that the binary points line up in the same location; only then can we do the addition.
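Here is a minimal illustration in C, with values chosen so the arithmetic is exact; the alignment step itself is spelled out in the comment:

```c
#include <assert.h>

int main(void) {
    /* Adding 1.1 x 2^4 (= 24.0) and 1.01 x 2^1 (= 2.5) in binary:
       the exponents differ by 3, so the smaller operand's fraction
       is shifted right until the binary points line up:
            1.100000 x 2^4
          + 0.001010 x 2^4    (1.010 x 2^1 shifted right 3 places)
          = 1.101010 x 2^4    (= 11010.1 in binary = 26.5)
       Only then can the fractions be added bit by bit. */
    float a = 24.0f, b = 2.5f;
    assert(a + b == 26.5f);  /* exact: the sum fits easily in 24 bits */
    return 0;
}
```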
In multiplication we have a different problem. We don't have to worry about aligning the fractions, but when we add the exponents, we do have to be sure that the resulting exponent is still within range. And we could very easily go out of range if we multiply two numbers with large exponents.
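A quick C sketch of that failure mode, using 1e30 as an arbitrary large value:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Multiplying adds the exponents: 1e30 has a binary exponent
       near 100, so the product's exponent (~200) is well past the
       float maximum of 127, and the result overflows to infinity. */
    float big = 1.0e30f;
    float product = big * big;

    printf("%f\n", product);               /* inf */
    printf("%d\n", isinf(product) != 0);   /* 1: it overflowed */
    return 0;
}
```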
So the basic idea for floating point operations is to first compute the exact result, and then round to make the result fit into the desired precision. We might have overflow if the exponent is too large, and we might have to drop some least significant bits of the significand if our fraction gets too long, for example when we do an addition between two numbers with very different exponents.
Okay.
So, that's the basic idea.
Now, how do we get this rounding done? Well, there are many choices for how to do rounding. This table illustrates five possibilities, using the dollar amounts at the top here.
And you'll notice that some of them are fairly easy to explain, like round towards zero: always go towards zero. Another one is to round towards negative infinity: always go towards the negative end, rather than towards zero, which sits in the middle between the positive and the negative numbers. And another is to always round up towards positive infinity, always moving in that one direction.
Another possibility is to round to the nearest value, in this case the nearest dollar amount. But you can see we're going to have some problems when we're right in the middle: which value is the nearest? That's always difficult to define. Another possibility is to round towards even, towards the even number that's closest.
And why is that interesting? Well, it's interesting because it makes sure that the rounding goes in the up direction half the time and in the down direction the other half of the time.
Okay. If we repeatedly round the results of our operations, these errors are going to start to accumulate. And if we also always round in the same direction, we can introduce a statistical bias into our set of values. So to avoid this, the IEEE floating point standard uses round-to-nearest-even as its default rounding mode. That way about half of the ties round up and half round down, which avoids that bias when we repeatedly round numbers.
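As a small aside that isn't in the lecture: C99 exposes the IEEE rounding modes through fenv.h, so a sketch like the one below can show the default round-to-nearest-even behavior and how the mode can be changed at runtime. Whether mode changes are fully honored can depend on the compiler; some need #pragma STDC FENV_ACCESS ON or a flag such as GCC's -frounding-math.

```c
#include <fenv.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double tie = 2.5;   /* volatile discourages constant folding */

    /* The default mode is round-to-nearest-even: 2.5 is exactly
       halfway between 2 and 3, and 2 is the even neighbor. */
    printf("%.1f\n", rint(tie));   /* 2.0 */

    fesetround(FE_UPWARD);         /* round towards +infinity */
    printf("%.1f\n", rint(tie));   /* 3.0 */

    fesetround(FE_TOWARDZERO);     /* round towards zero */
    printf("%.1f\n", rint(tie));   /* 2.0 */

    fesetround(FE_TONEAREST);      /* restore the default */
    return 0;
}
```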
All right, some other mathematical properties of floating point operations. If an overflow of the exponent occurs, the hardware unit that performs the operation has to notice that and make the result positive or negative infinity. Floats that start off with the value positive or negative infinity, or not-a-number (NaN), can be used in operations, but the results usually end up staying positive or negative infinity, or NaN. So again, our hardware has to be designed to detect these situations: to recognize these special values and do something different than it would otherwise do.
This makes the design of the floating point units in CPUs one of the hardest jobs in the logic design of the machine itself.
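Here is a small C sketch, not from the lecture, that shows these special values propagating through ordinary arithmetic; HUGE_VALF is the standard way to get a float infinity:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    float inf = HUGE_VALF;          /* positive infinity */

    /* Infinities propagate: once a result is infinite, ordinary
       arithmetic usually keeps it infinite. */
    printf("%f\n", inf + 1.0f);     /* inf */
    printf("%f\n", -inf * 2.0f);    /* -inf */

    /* Some combinations have no sensible value and produce NaN,
       which then propagates through everything that touches it. */
    float bad = inf - inf;                          /* nan */
    printf("%d\n", isnan(bad * 0.0f + 3.0f) != 0);  /* 1 */
    return 0;
}
```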
Another important thing to remember, and this is important for us as programmers: floating point operations are not always associative or distributive, because of that rounding. So we can't always reorder operations the way we're used to doing in mathematics, or with integers for that matter. With floating point values we cannot do that.
Here's a little example. If we add a small number to a really large number, and then subtract that large number, we would expect to get that little number back. However, what we find is that this result is not equal to doing the operations in a slightly different order, because when we add the little number to the large number, it is so little that it cannot actually fit into the representation.
In other words, because we have to represent the large number, we end up using all of the significant bits, and adding on a 3.14, in this case, just doesn't register in the 23 bits we have available. So when we go and subtract that large number again, we just get zero. Whereas in the other order, we do the operation in the parentheses first, and that yields a zero; when we then add that to 3.14 we're left with 3.14. So the results on the two sides are not equal.
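A runnable version of this example, sketched in C with the lecture's values of 3.14 and 1e10; the intermediate variables force each step to be rounded to single precision:

```c
#include <stdio.h>

int main(void) {
    float big = 1.0e10f, small = 3.14f;

    /* 1e10 has an ULP of 1024 in float, so adding 3.14 rounds
       straight back to 1e10: the small number is lost entirely. */
    float t   = small + big;   /* == big */
    float lhs = t - big;       /* 0.0    */

    float u   = big - big;     /* exactly 0.0 */
    float rhs = small + u;     /* 3.14   */

    printf("lhs = %f, rhs = %f\n", lhs, rhs);  /* 0.000000 vs 3.140000 */
    return 0;
}
```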
Another example is with multiplication. Again, we take a large number and subtract a large number: that's a zero. So when we multiply that zero times a large number, we expect to get a zero result. However, suppose we do this operation by applying the distributive law instead: first we multiply 1 times 10 to the 20th by the first number in the parentheses, and then we multiply it by the second number in the parentheses, before taking the difference.
Well, those values are going to be so large that they might overflow, and when we look at the results of those multiplications, they might just be positive infinity. And infinity minus infinity is not zero; under the IEEE standard it yields NaN, not a number. So this would not work out either: an equality comparison between the two orderings would not hold. These would not yield the same result.
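The same thing in runnable C, again a sketch using the lecture's 1e20 values; the distributed form ends up as NaN rather than zero:

```c
#include <stdio.h>

int main(void) {
    float big = 1.0e20f;

    /* Factored form: the parenthesized difference is exactly zero,
       so the whole product is zero. */
    float factored = big * (big - big);   /* 0.0 */

    /* Distributed form: each product is about 1e40, far beyond
       float's maximum of roughly 3.4e38, so both overflow to +inf,
       and inf - inf is NaN under IEEE rules. */
    float p1 = big * big;                 /* +inf */
    float p2 = big * big;                 /* +inf */
    float distributed = p1 - p2;          /* nan  */

    printf("factored = %f, distributed = %f\n", factored, distributed);
    return 0;
}
```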