Digital Engineering

Fall 2024

Lecture 02 - Floating Point Arithmetic

Instructor: Dr. Tarek Abdul Hamid
The World is Not Just Integers
• Programming languages support numbers with a fractional part
  • These are called floating-point numbers
  • Examples:
    3.14159265… (π)
    2.71828… (e)
    0.000000001 or $1.0 \times 10^{-9}$ (seconds in a nanosecond)
    86,400,000,000,000 or $8.64 \times 10^{13}$ (nanoseconds in a day)
    The last number is a large integer that cannot fit in a 32-bit integer
• We use scientific notation to represent
  • Very small numbers (e.g. $1.0 \times 10^{-9}$)
  • Very large numbers (e.g. $8.64 \times 10^{13}$)
• Scientific notation: $\pm\, d.f_1 f_2 f_3 f_4 \ldots \times 10^{\pm e_1 e_2 e_3}$
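A quick check of the last example, sketched in Python (the names INT32_MAX and ns_per_day are illustrative and not from the lecture): the number of nanoseconds in a day exceeds the largest signed 32-bit integer, while scientific notation expresses it compactly.

```python
# Largest value of a signed 32-bit integer.
INT32_MAX = 2**31 - 1

# Nanoseconds in a day: 24 h * 60 min * 60 s * 10^9 ns.
ns_per_day = 24 * 60 * 60 * 10**9

print(ns_per_day)               # 86400000000000
print(ns_per_day > INT32_MAX)   # True: does not fit in 32 bits
print(f"{ns_per_day:.2e}")      # 8.64e+13
print(f"{1e-9:.1e}")            # 1.0e-09
```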

Floating-Point Numbers
• Examples of floating-point numbers in base 10 (the point is the decimal point)
  • $5.341 \times 10^{3}$, $0.05341 \times 10^{5}$, $-2.013 \times 10^{-1}$, $-201.3 \times 10^{-3}$
• Examples of floating-point numbers in base 2 (the point is the binary point)
  • $1.00101 \times 2^{23}$, $0.0100101 \times 2^{25}$, $-1.101101 \times 2^{-3}$, $-1101.101 \times 2^{-6}$
  • Exponents are kept in decimal for clarity
  • The binary number $(1101.101)_2 = 2^{3} + 2^{2} + 2^{0} + 2^{-1} + 2^{-3} = 13.625$
• Floating-point numbers should be normalized
  • Exactly one non-zero digit should appear before the point
  • In a decimal number, this digit can be from 1 to 9
  • In a binary number, this digit should be 1
  • Normalized FP numbers: $5.341 \times 10^{3}$ and $-1.101101 \times 2^{-3}$
  • NOT normalized: $0.05341 \times 10^{5}$ and $-1101.101 \times 2^{-6}$
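The conversion and normalization rules above can be sketched in Python. The helper names binary_to_decimal and normalize_binary are hypothetical, chosen only for this illustration; the sign is handled separately and the input is assumed to be non-zero.

```python
def binary_to_decimal(bits: str) -> float:
    """Convert an unsigned binary numeral such as '1101.101' to decimal."""
    int_part, _, frac_part = bits.partition(".")
    value = float(int(int_part, 2)) if int_part else 0.0
    # Bit i after the binary point has weight 2^-i.
    for i, bit in enumerate(frac_part, start=1):
        value += int(bit) * 2.0 ** -i
    return value


def normalize_binary(bits: str, exponent: int):
    """Rewrite bits x 2^exponent so that exactly one 1 precedes the point."""
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    point = len(int_part)           # digits currently before the point
    first_one = digits.index("1")   # raises ValueError if the value is zero
    # Moving the point to just after the leading 1 changes the exponent.
    new_exponent = exponent + (point - first_one - 1)
    fraction = digits[first_one + 1:].rstrip("0")
    return "1." + (fraction or "0"), new_exponent


print(binary_to_decimal("1101.101"))       # 13.625
print(normalize_binary("1101.101", -6))    # ('1.101101', -3)
print(normalize_binary("0.0100101", 25))   # ('1.00101', 23)
```

The two normalization results match the normalized forms of the base-2 examples listed on the slide.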

Floating-Point Representation
• A floating-point number is represented by the triple (S, E, F)
  • S is the Sign bit (0 is positive and 1 is negative)
    • This representation is called sign and magnitude
  • E is the Exponent field (signed)
    • Very large numbers have large positive exponents
    • Very small, close-to-zero numbers have negative exponents
    • More bits in the exponent field increase the range of values
  • F is the Fraction field (fraction after the binary point)
    • More bits in the fraction field improve the precision of FP numbers

Field layout: | S | Exponent | Fraction |

Value of a floating-point number = $(-1)^{S} \times \mathrm{val}(F) \times 2^{\mathrm{val}(E)}$

IEEE 754 Floating-Point Standard
• Found in virtually every computer invented since 1980
  • Simplified porting of floating-point numbers
  • Unified the development of floating-point algorithms
  • Increased the accuracy of floating-point numbers
• Single Precision Floating Point Numbers (32 bits)
  • 1-bit sign + 8-bit exponent + 23-bit fraction
  Field layout: | S | Exponent (8 bits) | Fraction (23 bits) |
• Double Precision Floating Point Numbers (64 bits)
  • 1-bit sign + 11-bit exponent + 52-bit fraction
  Field layout: | S | Exponent (11 bits) | Fraction (52 bits, continued in the second 32-bit word) |

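To make the two layouts concrete, the following Python sketch splits a value into its raw sign, exponent, and fraction fields using the standard struct module. The function names float32_fields and float64_fields are hypothetical; the exponent is returned exactly as stored in the field.

```python
import struct


def float32_fields(x: float):
    """Return (sign, exponent field, fraction field) of x as single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                 # 1 bit
    exponent = (word >> 23) & 0xFF    # 8 bits
    fraction = word & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


def float64_fields(x: float):
    """Return (sign, exponent field, fraction field) of x as double precision."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = word >> 63                     # 1 bit
    exponent = (word >> 52) & 0x7FF       # 11 bits
    fraction = word & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, fraction


print(float32_fields(1.0))   # (0, 127, 0)
print(float64_fields(1.0))   # (0, 1023, 0)
```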
Normalized Floating Point Numbers
• For a normalized floating point number (S, E, F)
  Field layout: | S | E | F = $f_1 f_2 f_3 f_4 \ldots$ |
  • Significand is equal to $(1.F)_2 = (1.f_1 f_2 f_3 f_4 \ldots)_2$
  • IEEE 754 assumes a hidden "1." (not stored) for normalized numbers
  • Significand is 1 bit longer than the fraction
• Value of a normalized floating point number is

  $(-1)^{S} \times (1.F)_2 \times 2^{\mathrm{val}(E)}$

  $(-1)^{S} \times (1.f_1 f_2 f_3 f_4 \ldots)_2 \times 2^{\mathrm{val}(E)}$

  $(-1)^{S} \times (1 + f_1 \times 2^{-1} + f_2 \times 2^{-2} + f_3 \times 2^{-3} + f_4 \times 2^{-4} + \ldots) \times 2^{\mathrm{val}(E)}$

  $(-1)^{S}$ is 1 when S is 0 (positive), and -1 when S is 1 (negative)
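The expansion of the significand can be sketched as a short Python function. The name significand and the bit-string input are assumptions made for this illustration.

```python
def significand(fraction_bits: str) -> float:
    """Return (1.F)_2 = 1 + f1*2^-1 + f2*2^-2 + ... for fraction bits F."""
    value = 1.0                      # the hidden leading 1
    for i, bit in enumerate(fraction_bits, start=1):
        if bit == "1":
            value += 2.0 ** -i       # bit f_i has weight 2^-i
    return value


print(significand("01"))       # 1.25
print(significand("101"))      # 1.625
print(significand("0" * 23))   # 1.0
```

With all 23 single-precision fraction bits set, the result stays just below 2, which is why a normalized significand always lies in the range from 1 up to, but not including, 2.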

Biased Exponent Representation
• How to represent a signed exponent? Choices are …
  • Sign + magnitude representation for the exponent
  • Two’s complement representation
  • Biased representation
• IEEE 754 uses biased representation for the exponent
  • Value of exponent = val(E) = E - Bias (Bias is a constant)
• Recall that the exponent field is 8 bits for single precision
  • E can be in the range 0 to 255
  • E = 0 and E = 255 are reserved for special use (discussed later)
  • E = 1 to 254 are used for normalized floating point numbers
  • Bias = 127 (half of 254), val(E) = E - 127
  • val(E=1) = -126, val(E=127) = 0, val(E=254) = 127
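The biased representation can be sketched in Python as follows. The function names are hypothetical, and the closed form Bias = 2^(k-1) - 1 for a k-bit exponent field is stated here as the general rule that yields 127 for 8 bits and 1023 for 11 bits.

```python
def bias(exponent_bits: int) -> int:
    """Bias for a k-bit exponent field: 2^(k-1) - 1."""
    return 2 ** (exponent_bits - 1) - 1


def exponent_value(e_field: int, exponent_bits: int) -> int:
    """Return val(E) = E - Bias for a normalized number."""
    max_field = 2**exponent_bits - 1
    # E = 0 and E = all ones are reserved for special use.
    if not 1 <= e_field <= max_field - 1:
        raise ValueError("exponent field is reserved or out of range")
    return e_field - bias(exponent_bits)


print(bias(8), bias(11))                               # 127 1023
print([exponent_value(e, 8) for e in (1, 127, 254)])   # [-126, 0, 127]
```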

Biased Exponent (Cont’d)
• For double precision, the exponent field is 11 bits
  • E can be in the range 0 to 2047
  • E = 0 and E = 2047 are reserved for special use
  • E = 1 to 2046 are used for normalized floating point numbers
  • Bias = 1023 (half of 2046), val(E) = E - 1023
  • val(E=1) = -1022, val(E=1023) = 0, val(E=2046) = 1023
• Value of a normalized floating point number is

  $(-1)^{S} \times (1.F)_2 \times 2^{E - \mathrm{Bias}}$

  $(-1)^{S} \times (1.f_1 f_2 f_3 f_4 \ldots)_2 \times 2^{E - \mathrm{Bias}}$

  $(-1)^{S} \times (1 + f_1 \times 2^{-1} + f_2 \times 2^{-2} + f_3 \times 2^{-3} + f_4 \times 2^{-4} + \ldots) \times 2^{E - \mathrm{Bias}}$

Examples of Single Precision Float
• What is the decimal value of this single precision float?
  10111110001000000000000000000000
• Solution:
  • Sign = 1, so the number is negative
  • Exponent = $(01111100)_2 = 124$, E - Bias = 124 - 127 = -3
  • Significand = $(1.0100 \ldots 0)_2 = 1 + 2^{-2} = 1.25$ (the leading "1." is implicit)
  • Value in decimal = $-1.25 \times 2^{-3} = -0.15625$
• What is the decimal value of this single precision float?
  01000001001001100000000000000000
• Solution:
  • Sign = 0, so the number is positive
  • Exponent = $(10000010)_2 = 130$, E - Bias = 130 - 127 = 3
  • Value in decimal = $+(1.01001100 \ldots 0)_2 \times 2^{130 - 127} = (1.01001100 \ldots 0)_2 \times 2^{3} = (1010.01100 \ldots 0)_2 = 10.375$ (the leading "1." is implicit)

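Both worked answers can be cross-checked by letting Python reinterpret the bit patterns through the standard struct module. This is only a verification sketch; the helper name bits_to_float32 is hypothetical.

```python
import struct


def bits_to_float32(bits: str) -> float:
    """Reinterpret a 32-character bit string as a single precision float."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    raw = int(bits, 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


print(bits_to_float32("10111110001000000000000000000000"))   # -0.15625
print(bits_to_float32("01000001001001100000000000000000"))   # 10.375
```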
Examples of Double Precision Float
• What is the decimal value of this double precision float?
  01000000010100101010000000000000
  00000000000000000000000000000000
• Solution:
  • Value of exponent = $(10000000101)_2$ - Bias = 1029 - 1023 = 6
  • Value of double float = $(1.00101010 \ldots 0)_2 \times 2^{6}$ (the leading "1." is implicit) = $(1001010.10 \ldots 0)_2 = 74.5$
• What is the decimal value of this double precision float?
  10111111100010000000000000000000
  00000000000000000000000000000000
• Do it yourself! (The answer should be $-1.5 \times 2^{-7} = -0.01171875$)
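One way to check both double precision answers is to apply the field-by-field procedure in code. The sketch below is in Python, handles normalized numbers only, and uses the hypothetical name decode_double; a struct-based reinterpretation serves as an independent cross-check.

```python
import struct


def decode_double(bits: str) -> float:
    """Decode a normalized 64-bit pattern: 1 sign, 11 exponent, 52 fraction bits."""
    if len(bits) != 64:
        raise ValueError("expected exactly 64 bits")
    sign = int(bits[0])
    e_field = int(bits[1:12], 2)
    fraction = int(bits[12:], 2)
    if not 1 <= e_field <= 2046:
        raise ValueError("only normalized numbers are handled in this sketch")
    significand = 1 + fraction / 2**52      # hidden 1 plus the fraction
    return (-1) ** sign * significand * 2.0 ** (e_field - 1023)


first = "01000000010100101010000000000000" + "0" * 32
second = "10111111100010000000000000000000" + "0" * 32

print(decode_double(first))    # 74.5
print(decode_double(second))   # -0.01171875

# Cross-check against the machine's own interpretation of the same bits.
print(struct.unpack(">d", int(second, 2).to_bytes(8, "big"))[0])   # -0.01171875
```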

Converting FP Decimal to Binary
• Convert -0.8125 to binary in single and double precision
• Solution:
  • Fraction bits can be obtained using multiplication by 2
    0.8125 × 2 = 1.625
    0.625 × 2 = 1.25
    0.25 × 2 = 0.5
    0.5 × 2 = 1.0
  • Stop when the fractional part is 0
  • $0.8125 = (0.1101)_2 = 1/2 + 1/4 + 1/16 = 13/16$
  • Fraction = $(0.1101)_2 = (1.101)_2 \times 2^{-1}$ (normalized)
  • Exponent = -1 + Bias = 126 (single precision) and 1022 (double precision)
Single precision:
10111111010100000000000000000000
Double precision:
10111111111010100000000000000000
00000000000000000000000000000000

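The multiply-by-2 procedure can be sketched in Python as below. The names fraction_to_binary, encode_single, and encode_double are hypothetical; the two encoders rely on the standard struct module to produce the bit patterns, so they serve as a check on the hand conversion rather than a re-implementation of it.

```python
import struct


def fraction_to_binary(frac: float, max_bits: int = 52) -> str:
    """Return the bits after the binary point of a fraction 0 <= frac < 1."""
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)          # the integer part is the next bit
        bits.append(str(bit))
        frac -= bit              # keep only the fractional part
    return "".join(bits)


def encode_single(x: float) -> str:
    """32-bit single precision pattern of x as a bit string."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{word:032b}"


def encode_double(x: float) -> str:
    """64-bit double precision pattern of x as a bit string."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    return f"{word:064b}"


print(fraction_to_binary(0.8125))   # 1101
print(encode_single(-0.8125))       # 10111111010100000000000000000000
print(encode_double(-0.8125))
```

The max_bits limit is a safeguard for fractions such as 0.1 whose binary expansion never terminates.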
Largest Normalized Float
• What is the largest normalized float?
• Solution for single precision:
  01111111011111111111111111111111
  • Exponent - Bias = 254 - 127 = 127 (largest exponent for SP)
  • Significand = $(1.111 \ldots 1)_2$ = almost 2
  • Value in decimal $\approx 2 \times 2^{127} \approx 2^{128} \approx 3.4028\ldots \times 10^{38}$
• Solution for double precision:
  01111111111011111111111111111111
  11111111111111111111111111111111
  • Value in decimal $\approx 2 \times 2^{1023} \approx 2^{1024} \approx 1.79769\ldots \times 10^{308}$
• Overflow: the exponent is too large to fit in the exponent field
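These limits can be checked with a short Python sketch. The helper bits_to_float is hypothetical; sys.float_info.max is the platform's largest double and is assumed here to be an IEEE 754 double, as it is on common platforms.

```python
import math
import struct
import sys


def bits_to_float(bits: str) -> float:
    """Reinterpret a 32- or 64-character bit string as an IEEE 754 value."""
    fmt = {32: ">f", 64: ">d"}[len(bits)]
    raw = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return struct.unpack(fmt, raw)[0]


largest_single = bits_to_float("0" + "11111110" + "1" * 23)
largest_double = bits_to_float("0" + "11111111110" + "1" * 52)

print(f"{largest_single:.4e}")                 # 3.4028e+38
print(f"{largest_double:.5e}")                 # 1.79769e+308
print(largest_double == sys.float_info.max)    # True
# Going past the largest double overflows to infinity.
print(math.isinf(largest_double * 2))          # True
```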

Smallest Normalized Float
• What is the smallest (in absolute value) normalized float?
• Solution for single precision:
  00000000100000000000000000000000
  • Exponent - Bias = 1 - 127 = -126 (smallest exponent for SP)
  • Significand = $(1.000 \ldots 0)_2 = 1$
  • Value in decimal = $1 \times 2^{-126} = 1.17549\ldots \times 10^{-38}$
• Solution for double precision:
  00000000000100000000000000000000
  00000000000000000000000000000000
  • Value in decimal = $1 \times 2^{-1022} = 2.22507\ldots \times 10^{-308}$
• Underflow: the exponent is too small to fit in the exponent field
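The smallest normalized values can be confirmed the same way. In this Python sketch the helper bits_to_float is hypothetical, and sys.float_info.min (the smallest positive normalized double) is assumed to describe an IEEE 754 double.

```python
import struct
import sys


def bits_to_float(bits: str) -> float:
    """Reinterpret a 32- or 64-character bit string as an IEEE 754 value."""
    fmt = {32: ">f", 64: ">d"}[len(bits)]
    raw = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return struct.unpack(fmt, raw)[0]


smallest_single = bits_to_float("0" + "00000001" + "0" * 23)
smallest_double = bits_to_float("0" + "00000000001" + "0" * 52)

print(f"{smallest_single:.5e}")                 # 1.17549e-38
print(f"{smallest_double:.5e}")                 # 2.22507e-308
print(smallest_single == 2.0**-126)             # True
print(smallest_double == sys.float_info.min)    # True
```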
