08 Optional: Floating Point in C


The last part of our discussion of floating point numbers is how we deal with them in the C language. C offers two levels of precision for floating point numbers, matching the IEEE floating point representations we've already seen: a 32-bit representation for floats, as they're referred to, and a 64-bit representation for doubles. The default rounding mode is round-to-even, to avoid the bias of always rounding in one direction. And there's another header file that we can include, math.h, that has some important constants, for example for infinity and not-a-number, that we can use in our programs. One thing I keep reiterating, and you need to remember, is never to use equality comparison for floating point numbers. There are just too many slight differences that can occur in rounding, or in how an expression is evaluated associatively or distributively, and we can often get unexpected results from equality comparisons. The best thing to do with floating point numbers is to avoid equality comparisons, and instead subtract the two values and test that the difference is small. Okay. Another thing we should talk about is casting in C. Unlike casting between signed and unsigned integers, in this case we do change the bit representation. So, for example, when we cast an integer value into a floating point value, we actually have to normalize that integer value: get its exponent, figure out its mantissa, and then represent that in the floating point notation. That means the integer may in fact get rounded; however, overflow is not possible, because floating point numbers can represent much larger values than we can reach with our integer representation. When we go from int to double, we actually get an exact conversion as long as the int fits in 53 bits.
Because in the double notation the fractional part is 52 bits long, plus that one extra bit, the leading one that sits in front of the mantissa. So we get an effective 53-bit significand, and our integers, if they're 32 bits, fit completely in that, so the conversion is exact. If we have a 64-bit integer, we might have some rounding again. And of course if we go from float to double, we also get an exact casting, because a float is 32 bits and a double is 64, with a larger fraction field and a larger exponent field, so a double can definitely handle any number in the float representation. In doing conversions of doubles or floats to integers, we have a couple of issues to think about. One is that the fractional part of the floating point number may be truncated: as we adjust it to take into account the exponent, we may shift it in such a way as to lose a few bits. By convention, we always round these values toward zero as we do the conversion. Another issue is when the double or float is bigger or smaller than we can actually represent in our integer notation. In that case, the convention is to set the value to TMin, the two's complement minimum value. We'll probably also do that for things like not-a-number, and infinities we might set to TMax and TMin, for example. Okay, so to summarize our floating point representations, here I've shown five different possibilities. The zero in floating point is the string of all zeros, and we do that for convenience, because if we ever test for zero, all we have to do is the same test we did for integers: we just look for an all-zero bit pattern, and we know it's a zero. Then we talked about normalized values, where the exponent is anywhere from 1 to 2^k - 2, where k is the number of bits in the exponent, and the significand is 1.M, where M is the mantissa, what's represented in that blue portion of the number.
We also mentioned that we reserve the exponent of all ones to represent positive and negative infinity. And we're actually going to put a further condition on that: it's all ones in the exponent and all zeros in the fractional part, and of course the sign can be positive or negative. For not-a-number, we actually have many possibilities: the exponent is still all ones, but now the fractional part is non-zero. That gives us many, many possible values for not-a-number, and in fact these are used to signify the different conditions under which the not-a-number arose. And finally we have denormalized values, where the exponent is all zeros, but we treat the significand a little differently: you'll notice that in this case we put a zero in front of it, rather than our typical one for normalized values. This is used to more densely represent the values near zero, okay? We're not going to talk about denormalized values here, but they are treated in more detail in the recommended text by Bryant and O'Hallaron, if you want to learn more about that. Finally, we always have to remember that all these representations suffer from the problem that there's a fixed number of bits, and that means we can get overflow or underflow. In floating point we also have to consider the fact that even simple fractions like 0.2 do not have an exact representation. In binary it's a repeating representation that we have to truncate at some point and round, okay? So we can lose precision: nearly every operation gets a slightly wrong result, rounded from the exact result. These errors can pile up, and that's why we use round-to-even, to make sure the error doesn't drift in one direction all the time. Okay. The other thing we need to remember is that we might get different results as we apply associativity and distributivity.
Those laws do not hold for floating point numbers, because of these inexact results in every operation. And lastly, yet again, I want to remind you: never test floating point values for equality. That can get you in a lot of trouble because of these rounding effects. Alright, that concludes our discussion of number representations.
