
Floating Point Integer

The document discusses floating point representation, which is used to store real numbers with decimal fractions. It explains the structure of floating point numbers, including the mantissa and exponent, and outlines the IEEE standard for floating point arithmetic. Examples are provided to illustrate the conversion of decimal numbers to binary and their representation in a 32-bit format.

Uploaded by

sahiljr0926
Copyright
© All Rights Reserved

FLOATING POINT REPRESENTATION
What about the other numbers?

So far we know how to store integers (whole numbers).

But what if we want to store real numbers (numbers with decimal fractions)?

Even 27.5 needs another way to represent it.

This method is called floating point representation.


Fixed Notation
We are accustomed to using a fixed notation, where the decimal point stays in a fixed position: the digits to the left of the decimal point are the integer part and the digits to the right are the decimal portion.

E.g. 10.75

10 is the integer portion and 0.75 is the decimal portion.


Floating Point Representation
The structure of a floating point (real) number is as follows:

4.2 * 10^8

Mantissa = 4.2
Base = 10
Exponent = 8

Only the mantissa and the exponent are stored. The base is implied (already known). Because it is not stored, this saves memory.
IEEE standard
There is an IEEE standard that defines the structure of a floating point number:

IEEE Standard for Floating-Point Arithmetic (IEEE 754-2008)

It defines 4 main sizes of floating point numbers: 16, 32, 64 and 128 bits

Sometimes referred to as Half, Single, Double and Quadruple precision
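As a quick check of these sizes, the short Python sketch below asks the standard `struct` module how many bytes it uses for half, single and double precision values. The dictionary name `SIZES_IN_BITS` is only an illustrative choice, and `struct` has no format code for quadruple precision, so the 128 bit size is not shown.

```python
import struct

# Standard sizes of the binary formats that struct can pack:
# 'e' = half precision, 'f' = single precision, 'd' = double precision.
# calcsize() returns bytes, so multiply by 8 to get bits.
SIZES_IN_BITS = {
    "half": struct.calcsize("<e") * 8,
    "single": struct.calcsize("<f") * 8,
    "double": struct.calcsize("<d") * 8,
}

print(SIZES_IN_BITS)  # {'half': 16, 'single': 32, 'double': 64}
```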


A 32 bit floating point number
Sign     Exponent   Mantissa
1 bit    8 bits     23 bits

S is a sign bit
0 = positive
1 = negative

8 bits for the exponent
23 bits for the mantissa
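The 1 / 8 / 23 bit layout can be illustrated with a few shifts and masks. This is a minimal sketch: the function names `split_fields` and `join_fields` are hypothetical, and the bit pattern used in the demonstration is an arbitrary one chosen only to show where each field sits.

```python
def split_fields(word: int) -> tuple:
    """Split a 32 bit word into (sign, exponent, mantissa) bit fields."""
    sign = (word >> 31) & 0x1        # top 1 bit
    exponent = (word >> 23) & 0xFF   # next 8 bits
    mantissa = word & 0x7FFFFF       # low 23 bits
    return sign, exponent, mantissa


def join_fields(sign: int, exponent: int, mantissa: int) -> int:
    """Pack the three fields back into one 32 bit word."""
    return (sign << 31) | (exponent << 23) | mantissa


# Arbitrary demonstration pattern: sign 1, exponent 00000011, mantissa 101000...0
word = int("1" + "00000011" + "101" + "0" * 20, 2)
sign, exponent, mantissa = split_fields(word)
print(sign, f"{exponent:08b}", f"{mantissa:023b}")
```

Splitting and then joining returns the original word, which is a handy sanity check on the field widths.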
Example
We want the format of a number to be m x b^e (mantissa x base^exponent)
We want the mantissa to have a single digit before the decimal point

Example: 3450.00 = 3.45 x 10^3

The exponent is 3 as the decimal point has been moved 3 places to the left
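The same normalisation can be sketched in Python with the standard `decimal` module, which keeps the decimal digits exact. The function name `to_scientific` is a hypothetical helper introduced only for this illustration.

```python
from decimal import Decimal


def to_scientific(text: str):
    """Return (mantissa, exponent) with one non-zero digit before the point."""
    value = Decimal(text)
    if value == 0:
        return Decimal(0), 0
    exponent = value.adjusted()         # how many places the point must move
    mantissa = value.scaleb(-exponent)  # move the point by that many places
    return mantissa.normalize(), exponent


print(to_scientific("3450.00"))  # (Decimal('3.45'), 3)
```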
Decimal fractions
First we will look at how a decimal number is made up:
173.75

Hundreds   Tens   Units   Decimal point   Tenths   Hundredths
10^2       10^1   10^0    .               10^-1    10^-2
1          7      3       .               7        5
Binary fractions
Then look at how the same number could be stored in binary:

1010 1101 . 11

128   64   32   16   8   4   2   1   .   0.5   0.25
1     0    1    0    1   1   0   1   .   1     1

This number is constructed as shown above (in a fixed point notation). These values come from powers of 2:

2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0   .   2^-1   2^-2
1     0     1     0     1     1     0     1     .   1      1
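To check the place values, the sketch below adds up the weight of every 1 bit in a fixed point binary string. The function name `fixed_point_value` is a hypothetical helper; it assumes the string contains only 0, 1 and at most one point.

```python
def fixed_point_value(bits: str) -> float:
    """Evaluate a fixed point binary string such as '10101101.11'."""
    integer_bits, _, fraction_bits = bits.partition(".")
    total = 0.0
    # Bits left of the point have weights 2^0, 2^1, 2^2, ... (right to left).
    for position, bit in enumerate(reversed(integer_bits)):
        total += int(bit) * 2 ** position
    # Bits right of the point have weights 2^-1, 2^-2, ... (left to right).
    for position, bit in enumerate(fraction_bits, start=1):
        total += int(bit) * 2 ** -position
    return total


print(fixed_point_value("10101101.11"))  # 173.75
```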
But the problem is:

We don't actually have a decimal point in binary...
Example
In decimal: 250.03125

First convert the integer part of the mantissa into binary (as you have done previously): 250 = 1111 1010

Now convert the decimal portion of the mantissa: .03125
Example (cont)
Decimal fraction => .03125
Multiply the fraction by 2. The integer part of the result (0 or 1) is the next binary digit, and the remainder is carried forward to the next step. Continue until the remainder is 0.

0.03125 * 2 = 0 r 0.0625
0.0625 * 2 = 0 r 0.125
0.125 * 2 = 0 r 0.25
0.25 * 2 = 0 r 0.5
0.5 * 2 = 1 r 0

Read top to bottom
Binary fraction = 0.00001
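The repeated multiplication can be sketched as a short loop. The function name `fraction_to_binary` and the `max_bits` cut-off are assumptions added for this illustration: many decimal fractions (0.1, for example) never reach a remainder of 0, so the loop needs a limit. `Fraction` is used so that the arithmetic is exact.

```python
from fractions import Fraction


def fraction_to_binary(text: str, max_bits: int = 23) -> str:
    """Convert a decimal fraction such as '0.03125' to its binary digits."""
    remainder = Fraction(text)  # exact value, e.g. 1/32
    bits = []
    while remainder != 0 and len(bits) < max_bits:
        remainder *= 2
        bit = int(remainder)    # integer part of the result: 0 or 1
        bits.append(str(bit))
        remainder -= bit        # carry the remainder forward
    return "".join(bits)        # digits read top to bottom


print("0." + fraction_to_binary("0.03125"))  # 0.00001
```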
So far….
So far we have: 1111 1010.00001 (250.03125)
But we need it in the format: .1111 1010 0000 1 (the point to the left of the leading 1)

1 1 1 1 1 0 1 0 . 0 0 0 0 1
. 1 1 1 1 1 0 1 0 0 0 0 0 1     (point moved 8 places to the left)

So the exponent is 8 (1000)
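Moving the point can be sketched as simple string handling: the exponent is the number of bits that were to the left of the point. The function name `normalise` is hypothetical, and this sketch only covers the case shown here, where the number has a non-zero integer part.

```python
def normalise(bits: str):
    """Move the point to the left of the leading 1.

    Returns (mantissa_bits, exponent), where mantissa_bits are the digits
    that follow the point.
    """
    integer_bits, _, fraction_bits = bits.partition(".")
    integer_bits = integer_bits.lstrip("0")
    if not integer_bits:
        # Assumption: numbers below 1 are outside the scope of this sketch.
        raise ValueError("this sketch needs a non-zero integer part")
    exponent = len(integer_bits)  # places the point moves to the left
    return integer_bits + fraction_bits, exponent


print(normalise("11111010.00001"))  # ('1111101000001', 8)
```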
In 32 bit representation there are:
❏ 8 bits for the exponent
❏ 23 bits for the mantissa

Example
So back to our example:
Mantissa = .1111 1010 0000 1 (0.9766845703125 in decimal)
Exponent = 0000 1000 (8)
Sign bit = 0

We will pad the left of the exponent with 0's up to 8 bits
We will pad the right of the mantissa with 0's up to 23 bits
And the number is positive so the sign bit is 0

S       Exponent    Mantissa
0       0000 1000   1111 1010 0000 1000 0000 000
1 bit   8 bits      23 bits
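The whole procedure can be put together in one short sketch. The function name `encode_slide_format` is hypothetical, and the code follows the simplified layout used in this example (exponent stored as a plain count, mantissa holding every bit after the point). The actual IEEE 754 single precision encoding differs: it adds a bias of 127 to the exponent and leaves the leading 1 of the mantissa implicit. Truncating extra mantissa bits and rejecting numbers below 1 are also assumptions of this sketch.

```python
from fractions import Fraction


def encode_slide_format(text: str) -> str:
    """Encode a decimal string as 'sign exponent mantissa' (1 + 8 + 23 bits)."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"
    value = abs(value)

    integer_part = int(value)
    if integer_part == 0:
        # Assumption: numbers below 1 are outside the scope of this sketch.
        raise ValueError("this sketch needs a non-zero integer part")
    integer_bits = bin(integer_part)[2:]
    exponent = len(integer_bits)  # places the point moves to the left
    if exponent > 255:
        raise ValueError("exponent does not fit in 8 bits")

    # Repeated multiplication by 2 for the fractional part.
    remainder = value - integer_part
    fraction_bits = []
    while remainder != 0 and len(integer_bits) + len(fraction_bits) < 23:
        remainder *= 2
        bit = int(remainder)
        fraction_bits.append(str(bit))
        remainder -= bit

    # Truncate to 23 bits, then pad the right of the mantissa with 0's.
    mantissa = (integer_bits + "".join(fraction_bits))[:23].ljust(23, "0")
    # Pad the left of the exponent with 0's up to 8 bits.
    return f"{sign} {exponent:08b} {mantissa}"


print(encode_slide_format("250.03125"))
# 0 00001000 11111010000010000000000
```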
Further Example 1
102.9375

Sign = 0 (+ve)
Integer = 102 = 1100110
Decimal portion = .9375 = .1111
Number = 1100110.1111 -> needs to be .11001101111
Exponent = 7 = 00000111

Number (32 bit Single Precision) = 0 00000111 11001101111000000000000
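One way to check a result like this is to decode the bits back into a number. The sketch below reads the mantissa as a binary fraction (.xxxxx) and multiplies it by 2 to the power of the exponent. The function name `decode_slide_format` is hypothetical, and it assumes the simplified layout used in these examples rather than the biased exponent and implicit leading 1 of actual IEEE 754 single precision.

```python
def decode_slide_format(bits: str) -> float:
    """Decode 'sign exponent mantissa' (space separated) back to a number."""
    sign_bit, exponent_bits, mantissa_bits = bits.split()
    exponent = int(exponent_bits, 2)
    # Value of the mantissa read as a binary fraction, i.e. .xxxxx
    mantissa = int(mantissa_bits, 2) / 2 ** len(mantissa_bits)
    value = mantissa * 2 ** exponent
    return -value if sign_bit == "1" else value


print(decode_slide_format("0 00000111 11001101111000000000000"))  # 102.9375
```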
