2.4 Floating Point Representation
The term floating point comes from the fact that the number of digits before and after
the decimal point is not fixed: the point can "float". Representations in which the
number of digits before and after the decimal point is fixed are called fixed-point
representations.
In general, floating-point representations are slower and less accurate than fixed-point
representations, but they can handle a much larger range of numbers. Some examples of
real numbers:
● 3.14159265…_ten (pi)
● 2.71828…_ten (e)
● 0.000000001_ten or 1.0_ten × 10^-9 (seconds in a nanosecond)
Scientific notation
A notation that renders numbers with a single digit to the left of the decimal point.
Scientists use it to handle very large or very small numbers easily. For example,
instead of writing 0.0000000056, write 5.6 × 10^-9.
Normalised notation
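Any given real number can be written in the form m × 10^n in many ways: for example, 350
can be written as 3.5 × 10^2 or 35 × 10^1 or 350 × 10^0. The normalised form is the one
with a single nonzero digit before the decimal point, here 3.5 × 10^2.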
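As a minimal sketch of normalisation, the following Python snippet rewrites a decimal
string in the form m × 10^n with exactly one nonzero digit before the decimal point. The
helper name normalized_scientific is hypothetical, and decimal.Decimal is used here only
to keep the illustration free of binary rounding.

```python
from decimal import Decimal

def normalized_scientific(text):
    """Return (m, n) with text == m * 10**n and 1 <= |m| < 10 (or (0, 0) for zero)."""
    d = Decimal(text)
    if d == 0:
        return Decimal(0), 0
    n = d.adjusted()      # power of ten of the most significant digit
    m = d.scaleb(-n)      # shift the decimal point so one digit precedes it
    return m, n

for s in ("350", "35E1", "0.0000000056"):
    m, n = normalized_scientific(s)
    print(f"{s} = {m} x 10^{n}")
```

All three ways of writing 350 reduce to the same normalised exponent, n = 2.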
FRACTION
The value, generally between 0 and 1, placed in the fraction field. The fraction is also
called the mantissa.
EXPONENT
In the numerical representation system of floating-point arithmetic, the value that is
placed in the exponent field.
Example:
1.2345 = 12345 × 10^-4
[ 12345 - Significand, 10 - Base, -4 - Exponent ]
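The same split into significand, base, and exponent can be checked in Python. This is
only an illustrative sketch: split_decimal is a hypothetical helper, and it relies on
decimal.Decimal storing a number as a sequence of digits plus a power of ten.

```python
from decimal import Decimal

def split_decimal(text):
    """Return (significand, base, exponent) such that text == significand * base**exponent."""
    sign, digits, exponent = Decimal(text).as_tuple()
    significand = int("".join(map(str, digits)))
    if sign:
        significand = -significand
    return significand, 10, exponent

significand, base, exponent = split_decimal("1.2345")
print(f"1.2345 = {significand} x {base}^{exponent}")
```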
Single Precision
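Floating-point numbers are usually a multiple of the size of a word. A floating-point
value represented in a single 32-bit word is called single precision. The word holds a
1-bit sign s, an 8-bit exponent field, and a 23-bit fraction field. In general,
floating-point numbers are of the form
(-1)^S × F × 2^E
where F involves the value in the fraction field and E involves the value in the
exponent field.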
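To make the fields concrete, here is a small sketch that extracts the sign, exponent,
and fraction fields from a single precision value. It assumes the IEEE 754 single
precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); the helper name
single_precision_fields is hypothetical.

```python
import struct

def single_precision_fields(x):
    """Split x, rounded to single precision, into (sign, exponent field, fraction field)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # the 32-bit pattern as an integer
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, as stored in the word
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

s, e, f = single_precision_fields(-0.75)
print(s, format(e, "08b"), format(f, "023b"))
```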
In single precision format, numbers as small as 2.0_ten × 10^-38 and as large as
2.0_ten × 10^38 can be represented. A situation in which a positive exponent becomes too
large to fit in the exponent field is called overflow in floating point. A situation in
which a negative exponent becomes too large in magnitude to fit in the exponent field is
called underflow in floating point.
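Overflow and underflow can be observed by forcing a value into single precision. The
sketch below is an illustration in Python, not hardware behaviour: to_single is a
hypothetical helper, and struct.pack happens to report overflow as an OverflowError and
to round values that are too small to zero.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to single precision and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(to_single(1.0e38))      # within the single precision range
print(to_single(1.0e-50))     # exponent too negative: underflows to zero

try:
    to_single(1.0e39)         # exponent too large for the 8-bit field
except OverflowError as err:
    print("overflow:", err)
```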
Double Precision
The overflow and underflow problems of single precision format can be reduced by using
another format which has a larger exponent field. A floating-point value represented in
two 32-bit words is called double precision. A double precision floating-point number
takes two MIPS words, where s is the sign of the floating-point number (0 for positive,
1 for negative), exponent is the value of the 11-bit exponent field, and fraction is the
52-bit number in the fraction field.
In double precision format, numbers as small as 2.0_ten × 10^-308 and as large as
2.0_ten × 10^308 can be represented.
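The double precision layout can be inspected the same way. This sketch assumes that
Python floats are IEEE 754 double precision values, which holds on common platforms;
double_precision_fields is a hypothetical helper name.

```python
import struct
import sys

def double_precision_fields(x):
    """Split a double precision value into (sign, exponent field, fraction field)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # the 64-bit pattern as an integer
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, as stored
    fraction = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, fraction

print(double_precision_fields(-0.75))
# Smallest normalized and largest finite double precision values on this platform.
print(sys.float_info.min, sys.float_info.max)
```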
Biasing
IEEE 754 uses a bias of 127 for single precision, so an exponent of -1 is represented
by the bit pattern of the value -1 + 127_ten, or 126_ten = 0111 1110_two, and +1 is
represented by 1 + 127, or 128_ten = 1000 0000_two.
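The two encodings above can be reproduced with a few lines of Python. This is a sketch
only; biased_exponent_bits and SINGLE_BIAS are hypothetical names chosen for the
illustration.

```python
SINGLE_BIAS = 127

def biased_exponent_bits(exponent, bias=SINGLE_BIAS, width=8):
    """Return the exponent-field bit pattern for a true (unbiased) exponent."""
    stored = exponent + bias
    if not 0 <= stored < (1 << width):
        raise ValueError("biased exponent does not fit in the field")
    return format(stored, f"0{width}b")

print(biased_exponent_bits(-1))   # 01111110
print(biased_exponent_bits(+1))   # 10000000
```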
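The exponent bias for double precision is 1023. Biased exponent means that the value
represented by a floating-point number is really
(-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
The range of single precision numbers is then from as small as
±1.00000000000000000000000_two × 2^-126
to as large as
±1.11111111111111111111111_two × 2^+127.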
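As a sketch of how the biased exponent is used when reading a value back, the function
below decodes a 32-bit pattern under the usual IEEE 754 reading
(-1)^S × (1 + Fraction) × 2^(Exponent - 127). It assumes a normalized number, so the
all-zeros and all-ones exponent fields are not handled; decode_single is a hypothetical
helper name.

```python
def decode_single(bits):
    """Value of a normalized single precision bit pattern given as a 32-bit integer."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = (bits & 0x7FFFFF) / 2**23     # fraction field as a value in [0, 1)
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

smallest = decode_single(0x00800000)   # exponent field 1, fraction all zeros
largest = decode_single(0x7F7FFFFF)    # exponent field 254, fraction all ones
print(smallest, largest)
```

The two patterns decoded here are the extremes of the range quoted above, with true
exponents -126 and +127.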
Floating-Point Representation
Example : 2
Converting Binary to Decimal Floating Point
What decimal number is represented by this single precision float?