10.1.2 Representation by hardware floats
A real number is represented by a floating-point number d, that is
d = ±2^α (1+m), 0 ≤ m < 1, −2^10 < α < 2^10.
If α > 1−2^10, then d is a normalized floating-point
number; otherwise (α = 1−2^10) d is denormalized: the leading 1
is dropped and d = ±2^(2−2^10) m. The special exponent 2^10 is used to
represent plus or minus infinity and NaN (Not a Number).
A hardware float is made of 64 bits:
- the first bit is for the sign of d (0 for ’+’ and 1 for ’-’),
- the following 11 bits represent the exponent: more precisely,
if α denotes the exponent, the integer given by the 11 bits
is α+2^10−1,
- the last 52 bits code the mantissa m: more precisely, if
M denotes the integer given by the 52 bits, then
m = M/2^52 (the leading 1 of 1+m is implicit for normalized floats
and absent for denormalized floats).
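The 64-bit layout can be inspected directly. The following Python sketch is only an illustration: the function names decode_double and rebuild are hypothetical, and it assumes that Python's float is an IEEE 754 double, which holds on common platforms. It splits a float into the sign bit, the 11-bit exponent code and the 52-bit integer M, then recomputes the value from these fields.

```python
import math
import struct

def decode_double(x):
    """Split a float into (sign bit, 11-bit exponent code, 52-bit integer M)."""
    # Reinterpret the 8 bytes of the double as one 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63               # first bit
    code = (bits >> 52) & 0x7FF     # next 11 bits: alpha + 2^10 - 1
    M = bits & ((1 << 52) - 1)      # last 52 bits
    return sign, code, M

def rebuild(sign, code, M):
    """Recompute a finite float from its three fields."""
    if code == 0x7FF:
        raise ValueError("exponent code 2047 encodes infinity or NaN")
    m = M / 2**52                   # mantissa, 0 <= m < 1
    if code == 0:
        # Denormalized: no implicit leading 1, exponent fixed at 2 - 2^10.
        magnitude = math.ldexp(m, -1022)
    else:
        # Normalized: alpha = code - (2^10 - 1), implicit leading 1.
        magnitude = math.ldexp(1 + m, code - 1023)
    return -magnitude if sign else magnitude

print(decode_double(1.0))    # (0, 1023, 0)
print(decode_double(-2.0))   # (1, 1024, 0)
```

For instance, 1.5 = 2^0 (1 + 1/2) gives M = 2^51, and rebuild(*decode_double(x)) returns x for any finite x.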
Examples of representations of the exponent:
- α=0 is coded by 011 1111 1111
- α=1 is coded by 100 0000 0000
- α=4 is coded by 100 0000 0011
- α=5 is coded by 100 0000 0100
- α=−1 is coded by 011 1111 1110
- α=−4 is coded by 011 1111 1011
- α=−5 is coded by 011 1111 1010
- α=2^10 is coded by 111 1111 1111
- α=1−2^10 is coded by 000 0000 0000
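The exponent codes listed above can be reproduced by writing the integer α+2^10−1 in binary on 11 bits. The short Python sketch below does this; the helper name exponent_code is hypothetical and the 3+4+4 grouping of the bits only mirrors the layout used in the list.

```python
def exponent_code(alpha):
    """Return the 11-bit code of the exponent alpha, grouped as 3+4+4 bits."""
    if not -1023 <= alpha <= 1024:
        raise ValueError("alpha must satisfy 1 - 2^10 <= alpha <= 2^10")
    # The stored integer is the exponent shifted by 2^10 - 1 = 1023.
    bits = format(alpha + 2**10 - 1, "011b")
    return f"{bits[:3]} {bits[3:7]} {bits[7:]}"

for alpha in (0, 1, 4, 5, -1, -4, -5):
    print(alpha, exponent_code(alpha))
```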
Remark: 2^(−52) = 0.2220446049250313e−15; this is the distance between 1 and the next larger hardware float.
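This value can be checked numerically. The Python sketch below assumes again that Python's float is an IEEE 754 double; the variable names eps and gap_above_one are chosen for the illustration only.

```python
import sys

eps = 2.0 ** -52
gap_above_one = (1.0 + eps) - 1.0   # 1 + eps is exactly representable

print(eps)                           # 2.220446049250313e-16
print(eps == sys.float_info.epsilon) # True
print(1.0 + eps / 2 == 1.0)          # True: half the gap rounds back to 1
```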