23.1.2  Representation by hardware floats

A real number is represented by a floating point number d of the form

\[ d = 2^{\alpha}\,(1+m), \qquad 0 \le m < 1, \qquad -2^{10} < \alpha < 2^{10}. \]

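If $\alpha > 1-2^{10}$, then d is a normalized floating point number; otherwise ($\alpha = 1-2^{10}$) d is denormalized. The special exponent $2^{10}$ is used to represent plus or minus infinity and NaN (Not a Number). A hardware float is made of 64 bits: 1 bit for the sign of d (0 for plus, 1 for minus), then 11 bits for the exponent, stored as the integer $\alpha + 2^{10} - 1$, then 52 bits for the mantissa m.

Examples of representations of the exponent:

  α = 0 is coded by 011 1111 1111
  α = 1 is coded by 100 0000 0000
  α = −4 is coded by 011 1111 1011
  α = 2^10 is coded by 111 1111 1111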
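The layout can be checked outside Xcas. The following is a minimal sketch in Python, assuming that Python's float is a 64-bit hardware double (true on common platforms): 1 sign bit, 11 exponent bits storing α + 2^10 − 1, and 52 mantissa bits. The helper names `fields` and `exponent_bits` are made up for this illustration.

```python
import struct

def fields(x):
    """Split a 64-bit float into (sign bit, alpha, 52 mantissa bits as a string)."""
    # Reinterpret the 8 bytes of the double as one big-endian 64-bit integer.
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = n >> 63                     # first bit
    stored = (n >> 52) & 0x7FF         # next 11 bits hold alpha + 2**10 - 1
    mantissa = n & ((1 << 52) - 1)     # last 52 bits hold m
    return sign, stored - (2**10 - 1), format(mantissa, "052b")

def exponent_bits(alpha):
    """Return the 11-bit code of the exponent alpha."""
    return format(alpha + 2**10 - 1, "011b")

for alpha in (0, 1, -4):
    print(alpha, exponent_bits(alpha))

print(fields(3.0))
```

For 3.0 the sketch reports sign 0, α = 1 and a mantissa whose only nonzero bit is the first one (m = 1/2).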

Remark.

$2^{-52}$ = 0.2220446049250313e−15.
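This value is the weight of the last of the 52 mantissa bits, that is, the gap between 1 and the next representable float. A short check in Python (an assumption of this example; the text itself works in Xcas), using the standard `sys.float_info.epsilon`:

```python
import sys

eps = 2.0 ** -52  # weight of the last mantissa bit when alpha = 0

print(eps)
print(eps == sys.float_info.epsilon)   # the standard library reports the same value
print(1.0 + eps > 1.0)                 # adding eps to 1 changes the last mantissa bit
print(1.0 + eps / 2 == 1.0)            # half of it is lost to rounding
```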

Examples of representations of normalized floats

Representation of 3.1.

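We have

\[ 3.1 = 2\cdot\left(1+\frac{1}{2}+\frac{1}{2^{5}}+\frac{1}{2^{6}}+\frac{1}{2^{9}}+\frac{1}{2^{10}}+\cdots\right) = 2\cdot\left(1+\frac{1}{2}+\sum_{k=1}^{\infty}\left(\frac{1}{2^{4k+1}}+\frac{1}{2^{4k+2}}\right)\right), \]

hence $\alpha=1$ and $m=\frac{1}{2}+\sum_{k=1}^{\infty}\left(\frac{1}{2^{4k+1}}+\frac{1}{2^{4k+2}}\right)$. Hence the hexadecimal and binary representation of 3.1 is: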

40 (01000000), 8 (00001000), cc (11001100), cc (11001100),
cc (11001100), cc (11001100), cc (11001100), cd (11001101),

the last four bits are 1101 instead of 1100: the last bit is rounded up to 1 because the first discarded binary digit is 1 (rounding up).
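The eight bytes listed above can be reproduced outside Xcas. This is a minimal sketch in Python, assuming Python floats are 64-bit hardware doubles; the helper name `show` is hypothetical. `Fraction(3.1)` gives the exact value of the stored float, which makes the upward rounding visible.

```python
import struct
from fractions import Fraction

def show(x):
    """Return the 8 bytes of a double as 'hex (binary)' pairs, most significant byte first."""
    return ", ".join(f"{b:02x} ({b:08b})" for b in struct.pack(">d", x))

print(show(3.1))

# The stored value is slightly larger than 31/10 because the last bit was rounded up.
print(Fraction(3.1) > Fraction(31, 10))
```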

Representation of 3.0.

We have 3=2· (1+1/2). Hence the hexadecimal and binary representation of 3 is:

40 (01000000), 8 (00001000), 0 (00000000), 0 (00000000),
0 (00000000), 0 (00000000), 0 (00000000), 0 (00000000)

The difference between representations of 3.1−3.0 and 0.1.

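For the representation of 0.1:

\[ 0.1 = 2^{-4}\cdot\left(1+\frac{1}{2}+\frac{1}{2^{4}}+\frac{1}{2^{5}}+\frac{1}{2^{8}}+\frac{1}{2^{9}}+\cdots\right) = 2^{-4}\cdot\sum_{k=0}^{\infty}\left(\frac{1}{2^{4k}}+\frac{1}{2^{4k+1}}\right), \]

hence $\alpha=-4$ and

\[ m=\frac{1}{2}+\sum_{k=1}^{\infty}\left(\frac{1}{2^{4k}}+\frac{1}{2^{4k+1}}\right), \]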

therefore the representation of 0.1 is

3f (00111111), b9 (10111001), 99 (10011001), 99 (10011001),
99 (10011001), 99 (10011001), 99 (10011001), 9a (10011010),

the last four bits are 1010 instead of 1001: the last two bits 01 became 10 because the first discarded binary digit is 1 (rounding up).
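The same decomposition can be read off the stored bits. This is a sketch in Python, assuming Python floats are 64-bit hardware doubles; the variable names are illustrative only.

```python
import struct
from fractions import Fraction

# 64 bits of 0.1 as a string: 1 sign bit, 11 exponent bits, 52 mantissa bits.
bits = format(struct.unpack(">Q", struct.pack(">d", 0.1))[0], "064b")
sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]

print(struct.pack(">d", 0.1).hex())    # the eight bytes in hexadecimal
print(int(exponent, 2) - (2**10 - 1))  # alpha, recovered from the stored exponent
print(mantissa[-8:])                   # last byte of the mantissa, after rounding

# Exact error introduced by rounding the last bits up.
print(Fraction(0.1) - Fraction(1, 10))
```

The last line prints a small positive fraction: the stored float is slightly larger than 1/10.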

For the representation of a=3.1−3: a is computed by aligning the exponents (here there is nothing to do), then subtracting the mantissas and adjusting the exponent of the result to get a normalized float. The exponent is $\alpha=-4$ (which corresponds to $2\cdot 2^{-5}$) and the bits of the mantissa begin at $1/2$, that is, at $2\cdot 2^{-6}$: the bits of the mantissa of 3.1 are shifted 5 positions to the left, the 5 vacated positions are filled with zeros, and you get:

3f (00111111), b9 (10111001), 99 (10011001), 99 (10011001),
99 (10011001), 99 (10011001), 99 (10011001), a0 (10100000),

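Therefore a>0.1. The two mantissas differ by $1/2^{50}+1/2^{51}$ (since, in binary, 100000−011010=110), so $a-0.1=2^{-4}\left(2^{-50}+2^{-51}\right)=2^{-54}+2^{-55}$.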
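The comparison of the two bit patterns can be replayed with a short Python sketch (Python is an assumption of this example, with floats taken to be 64-bit hardware doubles). Since both numbers have the same sign and exponent, the difference of their 64-bit patterns read as integers counts units in the last mantissa place, each worth 2^-4 · 2^-52.

```python
import struct

a = 3.1 - 3.0

hex_a = struct.pack(">d", a).hex()
hex_tenth = struct.pack(">d", 0.1).hex()
print(hex_a)
print(hex_tenth)

# Same sign and exponent, so the integer difference counts units in the last place.
gap = int(hex_a, 16) - int(hex_tenth, 16)
print(gap, format(gap, "b"))

# 6 units of 2**-56 each, that is 2**-54 + 2**-55.
print(a - 0.1 == 2.0**-54 + 2.0**-55)
```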

This is the reason why:

floor(1/(3.1-3))

returns 9 and not 10 when Digits:=14.
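The effect is not specific to Xcas: any environment computing with hardware doubles behaves the same way. A minimal sketch in Python, assuming its floats are 64-bit hardware doubles:

```python
import math

a = 3.1 - 3.0            # slightly larger than 0.1, as explained above

print(a)
print(1 / a)             # slightly smaller than 10
print(math.floor(1 / a))
print(math.floor(1 / 0.1))
```

The third line prints 9, whereas the same computation with the float 0.1 in place of 3.1−3 gives 10.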

