Converting Decimal to Binary Floating Point
This article explains how to represent decimal (denary) numbers in binary floating point. It assumes an understanding of the IEEE 754 floating-point format.
Let's convert the number 37.21875 into IEEE 754 32-bit single precision floating point.
- Step 1: First split the number into the part before the decimal point and the part after, so we have:
37 and 0.21875
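If you want to follow along in code, the split in Step 1 can be done with Python's standard `math.modf`, which returns the fractional and integer parts as floats. This minimal sketch assumes a non-negative input, since the sign is handled separately in Step 9.

```python
import math

# math.modf returns (fractional_part, integer_part), both as floats
frac_part, int_part = math.modf(37.21875)
print(int(int_part), frac_part)  # 37 0.21875
```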
- Step 2: Convert 37 to binary. This gives us:
100101
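The article does not say how to convert 37. One common method, assumed here, is repeated division by 2, reading the remainders in reverse order. The helper name `int_to_binary` is hypothetical; Python's built-in `bin(37)` gives the same digits with a `0b` prefix.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(digits))  # reverse so the most significant bit comes first

print(int_to_binary(37))  # 100101
```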
- Step 3: Convert 0.21875 to binary.
One method of doing this is progressive multiplication by 2:
Multiply 0.21875 by 2:
0.21875 * 2 = 0.4375
Record the digit before the decimal point of the result, which is 0.
Now multiply the calculation result by 2 again:
0.4375 * 2 = 0.875
Append the digit before the decimal point in the result to the digit we previously stored. So now we have: 00
Multiply the calculation result by 2 again:
0.875 * 2 = 1.75
Now append the 1 that is prior to the decimal point to the digits previously stored. We now have: 001
When the result has a 1 before the decimal point, we replace that 1 with 0 before continuing, so we use 0.75 in the next multiplication. Now multiply this by 2 again:
0.75 * 2 = 1.5
Append the 1 to the result so far. Now we have 0011
The result again has a 1 before the decimal point, so replace it with 0 and multiply by 2 again:
0.5 * 2 = 1.0
Store the digit before the decimal point again. We now have: 00111
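Again we replace the 1 before the decimal point with 0, which leaves 0.0.
When we reach 0.0 we stop. The binary form of 0.21875 is therefore 00111 (that is, .00111). Many fractions, such as 0.1, never reach 0.0; for those we stop once we have generated enough bits to fill the mantissa.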
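The progressive multiplication above can be sketched as follows. The function name `fraction_to_binary` and the `max_bits` cut-off are assumptions for illustration: the cut-off keeps the loop finite for fractions that never reach zero. `Fraction` from the standard library is used so the arithmetic is exact.

```python
from fractions import Fraction

def fraction_to_binary(frac, max_bits: int = 23) -> str:
    """Convert a fraction in [0, 1) to binary digits by repeated multiplication by 2."""
    x = Fraction(frac)  # exact arithmetic, no float rounding during the loop
    bits = []
    while x != 0 and len(bits) < max_bits:  # max_bits is a hypothetical safety limit
        x *= 2
        bit = int(x)       # the digit before the point: 0 or 1
        bits.append(str(bit))
        x -= bit           # replace a leading 1 with 0 before continuing
    return "".join(bits)

print(fraction_to_binary(0.21875))  # 00111
```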
So now we have two parts of the number in binary:
37 = 100101
0.21875 = 00111
- Step 4: Now combine the binary integer part and the binary fractional part, with the binary point in between:
100101.00111
- Step 5: Now shift the binary point to the left until it sits just after the left-most 1, counting how many places you move it. Here the point moves 5 places to the left, giving:
1.0010100111
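Steps 4 and 5 can be sketched together: the function below joins the two bit strings around the new position of the point and counts how many places the point moved left. The helper `normalize` is hypothetical and, as written, only handles numbers whose integer part is non-zero; numbers below 1 would need the point moved right instead.

```python
def normalize(int_bits: str, frac_bits: str) -> tuple[str, int]:
    """Place the binary point just after the leading 1 and count the left shifts.

    Assumes int_bits contains at least one 1 (integer part is non-zero).
    """
    significant = int_bits[int_bits.index("1"):]  # drop any leading zeros
    shifts = len(significant) - 1                 # places the point moves left
    normalized = significant[0] + "." + significant[1:] + frac_bits
    return normalized, shifts

print(normalize("100101", "00111"))  # ('1.0010100111', 5)
```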
- Step 6: This is the mantissa (significand) of the IEEE 754 number. Because a normalized number always starts with 1, the format does not store that leading 1; it is implied. So we store:
0010100111
- Step 7: The mantissa in 32-bit floating point is 23 bits long, so we pad it with zeros to the right until it is 23 digits in length:
00101001110000000000000
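Steps 6 and 7 amount to simple string slicing. This sketch assumes the normalized string has the form `1.xxxx` and that the fraction fits in 23 bits; a longer fraction would need rounding, which is not shown here.

```python
normalized = "1.0010100111"

# Drop the implicit leading "1." and pad on the right with zeros to 23 bits.
mantissa = normalized[2:].ljust(23, "0")
print(mantissa)       # 00101001110000000000000
print(len(mantissa))  # 23
```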
- Step 8: The count of 5 shifts we did above becomes the exponent. In IEEE 754 single precision the exponent is stored with a bias of 127 added to it, so the stored exponent is 5 + 127 = 132. In binary this is:
10000100
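Step 8 in code, assuming the shift count from Step 5 and the single-precision bias of 127:

```python
shifts = 5
BIAS = 127  # IEEE 754 single-precision exponent bias

exponent_bits = format(shifts + BIAS, "08b")  # 8-bit, zero-padded binary
print(exponent_bits)  # 10000100
```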
- Step 9: Now we just need the sign bit. Positive numbers have a sign bit of zero.
So finally we have, as sign, exponent and mantissa:
0 10000100 00101001110000000000000
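As a check, the three fields can be concatenated and compared with the bit pattern Python itself produces when packing 37.21875 as a big-endian 32-bit float with the standard `struct` module (format codes `>f` and `>I`).

```python
import struct

sign = "0"
exponent = "10000100"
mantissa = "00101001110000000000000"
bits = sign + exponent + mantissa

# Pack 37.21875 as a big-endian IEEE 754 single, then read those 4 bytes back as an unsigned int.
(as_int,) = struct.unpack(">I", struct.pack(">f", 37.21875))
expected = format(as_int, "032b")

print(bits == expected)   # True
print(hex(int(bits, 2)))  # 0x4214e000
```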
To work out which number is actually stored, recall that the left-most bit of the mantissa represents 1/2, the next 1/4, then 1/8 and so on, and remember the implicit leading 1 that is not stored. So the mantissa gives:
1 + (1/8) + (1/32) + (1/256) + (1/512) + (1/1024) = 1.1630859375
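The stored exponent is 132, which is 5 once the bias of 127 is subtracted. Hence the number that is actually stored is:
1.1630859375 * 2^5 = 1.1630859375 * 32 = 37.21875
This is exactly the number we started with, because 0.21875 has a finite binary expansion that fits in the mantissa.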
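The decoding above can also be sketched in Python. This assumes a positive, normalized number (sign bit 0, exponent field neither all zeros nor all ones), so it ignores the sign and the special cases such as zero, subnormals and infinities.

```python
mantissa = "00101001110000000000000"
exponent_bits = "10000100"

# Restore the implicit leading 1, then add 1/2, 1/4, 1/8, ... for each set bit.
significand = 1 + sum(int(b) / 2 ** (i + 1) for i, b in enumerate(mantissa))
exponent = int(exponent_bits, 2) - 127  # remove the bias

print(significand)                  # 1.1630859375
print(significand * 2 ** exponent)  # 37.21875
```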