Converting Decimal to Binary Floating Point

This article explains how to represent decimal (denary) numbers in binary floating point. It assumes an understanding of the IEEE 754 floating point format.

Let's convert the number 37.21875 into IEEE 754 32-bit single precision floating point.

Step 1: Split the number into the part before the decimal point (the integer part) and the part after it (the fractional part), so we have:

37 and 0.21875
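Here is a minimal Python sketch of this split. The helper name `split_number` is hypothetical. It uses the standard library's `math.modf`, which returns the fractional and integer parts of a float, in that order. The sketch assumes a non-negative input whose fractional part is exactly representable in binary, as 0.21875 is.

```python
import math

def split_number(x: float) -> tuple[int, float]:
    """Split a non-negative number into its integer and fractional parts."""
    frac, whole = math.modf(x)  # modf returns (fractional part, integer part)
    return int(whole), frac

print(split_number(37.21875))  # (37, 0.21875)
```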

Step 2: Convert 37 to binary. This gives us:

100101
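The article does not say how 37 becomes 100101. One common method is repeated division by 2, reading the remainders from last to first. It is sketched here in Python with a hypothetical `int_to_binary` helper. Python's built-in `bin(37)` returns the same digits with a `0b` prefix.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(digits))  # read the remainders from last to first

print(int_to_binary(37))  # 100101
```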

Step 3: Convert 0.21875 to binary.

One method of doing this is progressive multiplication by 2:

Multiply 0.21875 by 2:

0.21875 * 2 = 0.4375

Record the digit before the decimal point in the result, which is 0.

Now multiply the calculation result by 2 again:

0.4375 * 2 = 0.875

Append the digit before the decimal point in the result to the digit we previously stored. So now we have: 00

Multiply the calculation result by 2 again:

0.875 * 2 = 1.75

Now append the 1 that is prior to the decimal point to the digits previously stored. We now have: 001

Whenever the result has a 1 before the decimal point, we replace that 1 with 0 before continuing, so only the fractional part is carried forward. Here that means using 0.75 in the next multiplication:

0.75 * 2 = 1.5

Append the 1 to the result so far. Now we have 0011

The result again has a 1 before the decimal point, so we replace it with 0 and multiply by 2 again:

0.5 * 2 = 1.0

Store the digit before the decimal point again. We now have: 00111

Once more we replace the 1 before the decimal point with 0, which leaves 0.0.

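When we reach 0.0, we stop. Reading the digits in the order they were recorded, the binary conversion of 0.21875 is 00111, that is 0.00111 in binary (1/8 + 1/16 + 1/32 = 0.21875). Not every fraction terminates like this. For example, 0.1 produces a repeating pattern of binary digits, so in practice you stop once you have enough bits to fill the mantissa and then round.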
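The progressive multiplication above can be sketched in Python as follows. The helper name and the `max_bits` cap are assumptions for illustration. The sketch uses `fractions.Fraction` so that the doubling is exact rather than subject to float rounding.

```python
from fractions import Fraction

def fraction_to_binary(frac: Fraction, max_bits: int = 23) -> str:
    """Convert a fraction 0 <= frac < 1 to binary digits by repeated doubling.

    Stops when the fractional part reaches 0, or after max_bits digits
    (an illustrative cap for non-terminating fractions such as 1/10).
    """
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        if frac >= 1:
            bits.append("1")
            frac -= 1  # replace the 1 before the point with 0
        else:
            bits.append("0")
    return "".join(bits)

print(fraction_to_binary(Fraction("0.21875")))  # 00111
```

Calling it with `Fraction(1, 10)` returns 23 bits. It stops only because of the cap, since 0.1 never reaches 0.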

So now we have two parts of the number in binary:

37 = 100101
0.21875 = 00111

Step 4: Now combine the binary integer part and the binary fractional part, with the point in between:

100101.00111

Step 5: Now shift the point to the left until it sits just after the first 1, counting how many places you move it. Here we shift the point 5 places to the left and end up with:

1.0010100111
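Here is a sketch of the normalization step using a hypothetical `normalize` helper. It assumes the integer part is non-zero and its binary string starts with 1. Numbers below 1 need the point moved to the right instead, which this sketch does not handle. The shift count is simply the number of integer bits after the leading 1.

```python
def normalize(int_bits: str, frac_bits: str) -> tuple[str, int]:
    """Move the binary point to just after the leading 1.

    Assumes int_bits starts with '1' (i.e. the number is at least 1).
    Returns the normalized significand and the number of places shifted left.
    """
    shift = len(int_bits) - 1  # places the point moves to the left
    significand = int_bits[0] + "." + int_bits[1:] + frac_bits
    return significand, shift

print(normalize("100101", "00111"))  # ('1.0010100111', 5)
```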

Step 6: This is the mantissa (significand) of the IEEE 754 number. Because a normalized binary number always starts with 1, the format does not store that leading 1 (it is implicit), so what we store becomes:

0010100111

Step 7: The mantissa in 32-bit floating point is 23 bits long, so we pad it with zeros on the right to a length of 23 bits:

00101001110000000000000
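Dropping the leading 1 and padding to 23 bits is plain string handling. The sketch below uses a hypothetical `mantissa_field` helper. It raises an error rather than rounding when there are more than 23 fractional bits, since the article's example does not need rounding.

```python
def mantissa_field(significand: str, width: int = 23) -> str:
    """Drop the implicit leading '1.' and pad with zeros on the right to `width` bits."""
    fraction = significand.split(".", 1)[1]  # bits after the binary point
    if len(fraction) > width:
        raise ValueError("too many bits; rounding is not handled in this sketch")
    return fraction.ljust(width, "0")

print(mantissa_field("1.0010100111"))  # 00101001110000000000000
```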

Step 8: The 5 shifts counted above become the exponent. IEEE 754 single precision stores the exponent with a bias of 127 added to it, so the stored exponent is 5 + 127 = 132. In 8-bit binary this is:

10000100
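The bias can be applied and formatted as an 8-bit field as sketched below, using a hypothetical `exponent_field` helper. The check for 1 to 254 reflects that single precision reserves stored exponents 0 and 255: 0 is for zero and subnormals, and 255 is for infinity and NaN.

```python
BIAS = 127  # single-precision exponent bias

def exponent_field(exponent: int) -> str:
    """Return the 8-bit biased exponent field for a normal single-precision number."""
    biased = exponent + BIAS
    if not 1 <= biased <= 254:
        raise ValueError("exponent out of range for a normal single-precision number")
    return format(biased, "08b")

print(exponent_field(5))  # 10000100
```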

Step 9: Now we just need the sign bit. Positive numbers have a sign bit of 0 (negative numbers have 1), so here the sign bit is 0.

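So finally, writing the sign bit, exponent, and mantissa in that order, 37.21875 in IEEE 754 single precision binary format is:

0 10000100 00101001110000000000000

Written as a single 32-bit value in hexadecimal, this is 0x4214E000.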
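To check the result, you can concatenate the three fields and compare them with the bit pattern Python itself produces for a 32-bit float. The sketch below uses the standard `struct` module. It packs 37.21875 as a big-endian float (`>f`) and reads the same four bytes back as an unsigned integer (`>I`).

```python
import struct

sign = "0"                            # positive
exponent = "10000100"                 # 5 + 127 = 132
mantissa = "00101001110000000000000"  # 23 bits, leading 1 dropped

bits = sign + exponent + mantissa
assert len(bits) == 32

# Cross-check against the bits struct produces for a big-endian 32-bit float.
(expected,) = struct.unpack(">I", struct.pack(">f", 37.21875))
print(hex(int(bits, 2)), hex(expected))  # 0x4214e000 0x4214e000
```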

To work out what number is actually stored, recall that the left-most bit of the mantissa represents 1/2, the next 1/4, then 1/8, and so on. There is also an implicit leading 1 that is not stored. So the mantissa represents:

1 + (1/8) + (1/32) + (1/256) + (1/512) + (1/1024) = 1.1630859375

The stored exponent is 132; subtracting the bias of 127 gives 5. Hence the number that is actually stored is:

1.1630859375 x 2^5 = 1.1630859375 x 32 = 37.21875

so 37.21875 is stored exactly, with no rounding error.
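The decoding described above can also be sketched in Python. This hypothetical `decode` helper handles only normal numbers; it ignores zero, subnormals, infinity, and NaN. It uses `fractions.Fraction` so that the sum of powers of 1/2 is exact.

```python
from fractions import Fraction

def decode(bits: str) -> Fraction:
    """Decode a 32-bit pattern for a normal single-precision number into an exact value."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127  # remove the bias
    # Implicit leading 1; the i-th stored bit (counting from 1) is worth 1/2**i.
    significand = 1 + sum(
        Fraction(1, 2 ** i) for i, b in enumerate(bits[9:], start=1) if b == "1"
    )
    return sign * significand * Fraction(2) ** exponent

value = decode("0" + "10000100" + "00101001110000000000000")
print(float(value))  # 37.21875
```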