07 Optional: Floating Point Operations


Now that we've seen how to represent floating point numbers, let's take a look at what it means to do operations on them. Unlike the representation for integers, it is important to remember that the representation for floating point numbers is not exact: we are always approximating the real mathematical value, because we have a finite representation. The mantissa doesn't go on forever; it stops after 23 bits, or after 52 bits in double precision.

So the basic idea for floating point operations is to do the exact operation first, and then round the result to fit inside the resulting floating point representation. For example, if we're computing x + y, we first do that addition as exactly as possible, and then fit the result into a 32-bit representation by rounding if necessary. The same goes for multiplication.

These operations require some adjustments. When we're adding numbers, the exponents could be wildly different, so we first have to shift the fractions so that their binary points line up in the same location before we can add. Multiplication has a different problem: we don't have to worry about aligning the fractions, but when we add the exponents we have to be sure the resulting exponent is still within range, and we can very easily go out of range if we multiply two numbers with large exponents.

So the basic recipe for floating point operations is: first compute the exact result, then round to make the result fit into the desired precision. We might have overflow if the exponent is too large, and we might have to drop some least significant bits of the significand if the fraction gets too long, for example when we add two numbers with very different exponents.

Okay, so that's the basic idea. Now, how do we do the rounding? There are many choices. The table on the slide illustrates five possibilities using dollar amounts. Some are fairly easy to explain: round towards zero, always moving towards zero; round towards negative infinity, always moving in the negative direction; and round towards positive infinity, always moving up. Another possibility is to round to the nearest value, in this case the nearest dollar amount. But you can see we're going to have a problem when we're exactly in the middle: which value is nearest? That's always difficult to define. The last possibility is round-to-even: round to the nearest value, and break ties by going to the even neighbor. Why is that interesting? Because it makes sure the tie-breaking goes in the up direction about half the time and in the down direction the other half of the time.

If we repeatedly round the results of our operations, these errors start to accumulate, and if we always round in the same direction, we introduce a statistical bias into our set of values. To avoid this, the IEEE floating point standard uses round-to-nearest-even as its default rounding mode, so that ties round up half the time and down half the time.
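To make the rounding modes concrete, here is a minimal C sketch of my own (not from the lecture) using the standard <fenv.h> interface. It shows the ties-to-even behavior of rint() under the default mode, and then the same inexact quotient 1/3 under each of the four IEEE rounding modes. It assumes IEEE 754 arithmetic and that float expressions are evaluated in single precision; on GCC/Clang you may need -frounding-math so the compiler respects the dynamic rounding mode.

```c
#include <fenv.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Round-to-nearest-even: exact halves go to the even neighbor,
       so ties round up half the time and down half the time. */
    fesetround(FE_TONEAREST);
    printf("rint(0.5) = %.1f\n", rint(0.5));  /* 0, not 1 */
    printf("rint(1.5) = %.1f\n", rint(1.5));  /* 2 */
    printf("rint(2.5) = %.1f\n", rint(2.5));  /* 2, not 3 */

    /* The same inexact quotient under the four IEEE rounding modes.
       volatile keeps the division from being folded at compile time. */
    volatile float one = 1.0f, three = 3.0f;

    fesetround(FE_DOWNWARD);
    printf("1/3 toward -inf: %a\n", (double)(one / three));
    fesetround(FE_UPWARD);
    printf("1/3 toward +inf: %a\n", (double)(one / three));
    fesetround(FE_TOWARDZERO);
    printf("1/3 toward 0:    %a\n", (double)(one / three));
    fesetround(FE_TONEAREST);
    printf("1/3 to nearest:  %a\n", (double)(one / three));
    return 0;
}
```

The upward and downward results differ in the last bit of the significand, which is exactly the rounding error the lecture is talking about.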
All right, some other mathematical properties of floating point operations. If an overflow of the exponent occurs, the hardware unit doing the operation has to notice that and make the result positive or negative infinity. Values that start off as positive or negative infinity, or not-a-number (NaN), can be used in operations, but the special value usually propagates through to the result. So again, the hardware has to be designed to detect these situations, detect these special values, and do something different than it would otherwise do. This makes the design of floating point units one of the hardest jobs in the logic design of a CPU.

Another important thing to remember, and this matters to us as programmers: floating point operations are not always associative or distributive, because of that rounding. We can't always reorder operations the way we're used to doing in mathematics, or with integers for that matter.

Here's a small example. If we add a small number to a really large number and then subtract that large number, we would expect to get the small number back. However, (3.14 + 1e10) - 1e10 is not equal to 3.14 + (1e10 - 1e10). When we add the small number to the large one, representing the large number takes up all of the significant bits, so adding 3.14 just doesn't register in the 23 fraction bits we have available, and when we then subtract the large number we get zero. On the other side, the operation in parentheses is done first and yields zero, and adding that to 3.14 leaves 3.14. So the two sides are not equal.

Another example involves multiplication. If we take a large number and subtract that same large number, that's zero, and multiplying zero by a large number gives zero: 1e20 * (1e20 - 1e20) is 0. However, if we apply the distributive law and first multiply 1e20 by each number in the parentheses before taking the difference, those products are so large that they overflow, and the results of the multiplications are just positive infinity. And infinity minus infinity is NaN in IEEE arithmetic, not zero. So these two expressions would not yield the same result either, and an equality comparison between them would fail.
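Both failures of associativity and distributivity described above can be checked directly. Here is a small C sketch, again my own illustration rather than the lecturer's code, assuming IEEE 754 single precision and that float expressions are not evaluated in a wider format (on old x87 hardware the overflow below might not occur).

```c
#include <stdio.h>

int main(void) {
    /* volatile forces the operations to happen at run time, in the
       written order, instead of being folded by the compiler. */
    volatile float big = 1e10f, small = 3.14f;
    volatile float huge = 1e20f;

    /* Addition is not associative: 3.14 is smaller than half an ulp
       of 1e10 in single precision, so it is rounded away entirely. */
    float a = (small + big) - big;   /* 0.0  */
    float b = small + (big - big);   /* 3.14 */
    printf("(3.14 + 1e10) - 1e10 = %g\n", a);
    printf("3.14 + (1e10 - 1e10) = %g\n", b);

    /* Multiplication does not distribute: 1e20 * 1e20 overflows
       float to +inf, and inf - inf is NaN, not 0. */
    float c = huge * (huge - huge);        /* 0.0 */
    float d = huge * huge - huge * huge;   /* NaN */
    printf("1e20 * (1e20 - 1e20)  = %g\n", c);
    printf("1e20*1e20 - 1e20*1e20 = %g\n", d);
    return 0;
}
```

Note that the NaN result is also why compilers are not allowed to reorder floating point expressions under default settings; only flags like -ffast-math license those algebraic rewrites, precisely because they can change results.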
