Source: http://www.exploringbinary.com/decimal-to-floating-point-needs-arbitrary-precision/

In the IEEE 754 specification, a double-precision number is represented in 64 bits: 1 sign bit + 11 exponent bits + 52 mantissa bits, encoding a 53-bit significand (the leading 1 bit is implicit). Converting a decimal number to such a floating-point number may lose precision. The conversion is done as follows:
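As a quick illustration of the 64-bit layout, the three fields can be unpacked with Python's standard `struct` module (a sketch; the function name `fields` is mine, not from the article):

```python
import struct

def fields(x: float):
    # Reinterpret the double as a 64-bit unsigned integer (big-endian).
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 biased exponent bits
    mantissa = bits & ((1 << 52) - 1)     # 52 stored significand bits
    return sign, exponent, mantissa

# 1.0 = 1.0 * 2**0: sign 0, biased exponent 0 + 1023, mantissa 0
print(fields(1.0))   # (0, 1023, 0)
```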

- Convert the decimal number to an integer times a power of 10, i.e. $d = m \times 10^{n}$ with integers $m$ and $n$
- Since the significand has 53 bits only, for best precision, we scale the number by multiplying with a power of two $2^{k}$, choosing $k$ such that $2^{52} \le m \times 10^{n} \times 2^{k} < 2^{53}$
- Then round off the scaled number to an integer, which becomes the 53-bit significand. The rounding
is done according to the IEEE 754 round-half-to-even rule, i.e.
- if the remainder is less than half of the divisor, then round down
- if the remainder is more than half of the divisor, then round up
- if the remainder is exactly half of the divisor, then round down for an even quotient or round up otherwise, so that the significand ends in an even (zero) bit

- Then express the number in normalised binary scientific notation and encode to binary
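The steps above can be sketched in Python, whose built-in integers are arbitrary precision; the function name `decimal_to_double` and its argument convention ($m \times 10^{n}$) are my own, not from the article:

```python
import math

def decimal_to_double(m: int, n: int) -> float:
    """Round m * 10**n (m > 0 assumed) to the nearest double
    using exact integer arithmetic and round-half-to-even."""
    if n >= 0:
        num, den = m * 10**n, 1
    else:
        num, den = m, 10**(-n)
    # Scale by a power of two until the quotient lies in [2**52, 2**53).
    s = 0
    while num // den >= 2**53:
        den <<= 1
        s += 1
    while num // den < 2**52:
        num <<= 1
        s -= 1
    q, r = divmod(num, den)
    # IEEE 754 round half to even on the remainder.
    if 2 * r > den or (2 * r == den and q % 2 == 1):
        q += 1
        if q == 2**53:  # rounding carried into the next binade
            q //= 2
            s += 1
    return math.ldexp(q, s)  # q * 2**s

print(decimal_to_double(314159, -5))  # 3.14159
```

For the first example below, this scales $314159$ by $2^{51}$, divides by $10^5$, rounds down, and reproduces the double nearest to 3.14159.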

Two examples are given in the above link, which demonstrate the algorithm clearly:

Consider $3.14159 = 314159 \times 10^{-5}$. Since $2^{1} \le 3.14159 < 2^{2}$, scale by $2^{51}$:

- $314159 \times 2^{51} = 707423177667543826432$; dividing by $10^{5}$ gives 7074231776675438 with remainder 26432; as $26432 < 10^{5}/2$, thus round down to 7074231776675438
- Thus $3.14159 \approx 7074231776675438 \times 2^{-51} = 1.1001001000011111100111110000000110111000011001101110_2 \times 2^{1}$

- Encoding into binary gives:
- Sign bit = 0
- Exponent bits = 1 + exponent bias (1023) = 1024 = 10000000000
- Mantissa bits = 1001001000011111100111110000000110111000011001101110
- Value in decimal = 3.14158999999999988261834005243144929409027099609375
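Both the rounded significand and the exact decimal value of the stored double can be checked in Python (`Decimal` of a float expands the stored value exactly):

```python
import math
from decimal import Decimal

# The rounded significand times 2**-51 is exactly the stored double.
assert math.ldexp(7074231776675438, -51) == 3.14159

# Decimal(float) expands the stored double exactly.
print(Decimal(3.14159))
# 3.14158999999999988261834005243144929409027099609375
```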

Consider $1.2345678901234567 \times 10^{22} = 12345678901234567 \times 10^{6}$. Since $2^{73} \le 1.2345678901234567 \times 10^{22} < 2^{74}$, scale by $2^{-21}$:

- $12345678901234567 \times 10^{6} \div 2^{21}$ gives 5886878443352969 with remainder 1355712; as $1355712 > 2^{21}/2$, thus round up to 5886878443352970
- Thus $1.2345678901234567 \times 10^{22} \approx 5886878443352970 \times 2^{21} = 1.0100111010100001010110110010011100111011001110001010_2 \times 2^{73}$

- Encoding into binary gives:
- Sign bit = 0
- Exponent bits = 73 + exponent bias (1023) = 1096 = 10001001000
- Mantissa bits = 0100111010100001010110110010011100111011001110001010
- Value in decimal = 12345678901234567741440
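This example can be checked the same way; since the stored value is an integer times $2^{21}$, converting the double back to an integer recovers the exact decimal value shown:

```python
import math

# The rounded significand times 2**21 is exactly the stored double.
assert math.ldexp(5886878443352970, 21) == 1.2345678901234567e22

# The double's exact value as an integer.
print(int(1.2345678901234567e22))
# 12345678901234567741440
```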