Numerical Analysis for STEM Students
1 Introduction
1.1 Motivation
Before starting the course, let us motivate ourselves by asking the following five 'Wh' questions.
(a) The mathematical problem is known to have a solution, but no method is available to obtain an exact analytical expression for it.
(b) The mathematical problem is too difficult to handle exactly.
(c) There is a lack of sufficient data about the inputs of the mathematical problem, and therefore it is not possible to use the available exact methods.
• Mathematicians,
• Statisticians,
• Physicists,
• Chemists,
• Engineers, etc.
(a) Learning different numerical methods and their analysis makes a person more familiar with the techniques of developing new numerical methods. This is important because the available methods may not be sufficient, or not efficient enough, for a specific problem.
(b) In many situations, more than one method is available for a given problem. Hence, choosing an appropriate method is important for producing an accurate result in less time.
(c) With a good enough knowledge of numerical methods, one can use them properly and efficiently to obtain solutions of problems, and understand what is going wrong if the results obtained are not as expected.
Through the rest of the course, we will answer 'How to do numerical analysis?'
1.3 Prerequisite
(Already mentioned in class; this portion will be filled in.)
2 Floating Point Approximation of Real Numbers and Error Analysis
Therefore, the approximate solution obtained by a numerical method involves an error, which is precisely the difference between the exact solution and the approximate solution. Thus, we have
$$\text{Mathematical Error} = \text{Exact Solution} - \text{Approximate Solution}.$$
an approximate value of the given number is stored in the device. For further calculations,
this approximate value is used instead of the exact value of the number. Hence, during the
process of computation, the computer introduces a new error, called the arithmetic error.
Then, we note that
$$\text{Total Error} = \text{Exact Solution} - \text{Computed Solution} = (\text{Exact Solution} - \text{Approximate Solution}) + (\text{Approximate Solution} - \text{Computed Solution}).$$
Therefore, we have
$$\text{Total Error} = \text{Mathematical Error} + \text{Arithmetic Error}.$$
The error (arithmetic error) involved in the numerical solution when compared to the
exact solution can be worse than the mathematical error.
In this chapter, we introduce the floating-point representation of a real number and see two methods to obtain the floating-point approximation of a given real number. We introduce the different types of errors that we come across in numerical analysis and their effects on the computation. At the end of this chapter, we will be familiar with arithmetic errors, their effect on computed results, and some ways to minimize this error in the computation.
We note that the right hand side of (2.1.2) is a convergent series in $\mathbb{R}$ and hence is well defined.
Remark 2.1.1. We note the following.
• When β = 2, the floating-point representation (2.1.1) is called the binary
floating-point representation.
$$(.d_1 d_2 \ldots d_n)_\beta = \sum_{i=1}^{n} \frac{d_i}{\beta^i}. \tag{2.1.4}$$
The number of digits n in the mantissa is called the precision or length of the
floating-point number.
• When β = 10, the n-digit floating-point representation (2.1.1) is called the n-digit decimal floating-point representation. It is the one used in daily-life computations.
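The mantissa formula (2.1.4) can be sketched in Python (an added illustration; the function name is ours):

```python
def mantissa_value(digits, beta):
    """Value of the mantissa (.d1 d2 ... dn)_beta = sum_{i=1}^{n} d_i / beta**i."""
    return sum(d / beta**i for i, d in enumerate(digits, start=1))

# (.101)_2 = 1/2 + 0/4 + 1/8
print(mantissa_value([1, 0, 1], 2))   # 0.625
# (.314)_10 = 3/10 + 1/100 + 4/1000
print(mantissa_value([3, 1, 4], 10))
```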
Example 2.1.1. The following are examples of real numbers in the decimal floating point
representation.
Remark 2.1.3. Any computing device has its own memory limitations in storing a real
number. In terms of the floating-point representation, these limitations lead to the
restrictions in the range of the exponent (e) and the number of digits in the mantissa (n).
• β = base or radix,
• n = precision,
2.1.3 Properties of F
(i). The n-digit floating-point representation of a number is unique.
Therefore, there are in total $2(\beta - 1)\beta^{n-1}(U - L + 1)$ such numbers, plus 1 for the number zero.
$$\frac{1}{\beta} \le m(x_f) < 1.$$
$$\mathrm{UFL} = \beta^{L-1}.$$
• the sign, s = 0,
• the leading digit, d1 = 1,
• the remaining digits, d2 = d3 = · · · = dn = 0,
• the exponent, e = L.
Hence,
$$\mathrm{UFL} = (.10\ldots0)_\beta \times \beta^L = \frac{1}{\beta}\,\beta^L = \beta^{L-1}.$$
It is also called the Underflow Level (UFL).
$$\mathrm{OFL} = (1 - \beta^{-n})\,\beta^U.$$
• the sign, s = 0,
• all the digits, d1 = d2 = · · · = dn = (β − 1),
• the exponent, e = U.
Hence,
$$\mathrm{OFL} = (.\,(\beta-1)(\beta-1)\ldots(\beta-1))_\beta \times \beta^U = (1 - \beta^{-n})\,\beta^U.$$
$$\mathrm{UFL} \le |x_f| \le \mathrm{OFL}.$$
(viii). Floating point numbers are not uniformly distributed throughout their range, but
they are equally spaced only between successive powers of β .
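Property (viii) can be observed directly for IEEE doubles (β = 2) using Python's `math.ulp`, which returns the spacing from a number to the next representable one (our illustration, not part of the notes):

```python
import math

# The gap between consecutive doubles doubles at each power of 2:
print(math.ulp(1.0))   # 2**-52: spacing just above 1.0
print(math.ulp(2.0))   # 2**-51: twice as coarse
print(math.ulp(0.5))   # 2**-53: twice as fine
```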
In subsection 2.1.4, we introduce the concept of under and over flow of memory, which is
a result of the restriction in the exponent. The restriction on the length of the mantissa is
discussed in subsection 2.1.5.
• When the value of the exponent e goes below the minimum of its range, we encounter underflow.
During a computation,
• if some computed number x_f has an exponent e > U (equivalently, |x_f| > OFL), then we say that memory overflow occurs;
• if e < L (equivalently, 0 < |x_f| < UFL), we say that memory underflow occurs.
(ii). On the other hand, underflow is more serious than overflow in a computation, because when underflow occurs, a computer simply treats the number as zero without any warning: the error is disguised in the computation and the computer will not inform you about it, whereas in the case of overflow the computer warns you to modify the algorithm (code). If a computation involves such an underflow of memory, then there is a danger of a large difference between the actual value and the computed value. However, by writing a separate subroutine, one can monitor the computation and get a warning whenever an underflow occurs.
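The asymmetry just described can be seen with Python doubles, where overflow shows up loudly as `inf` while underflow silently produces zero (our illustration; how warnings are raised depends on the language and system):

```python
import math

big = 1e308
print(big * 10)     # inf: overflow is clearly visible in the result

tiny = 1e-320
print(tiny / 1e10)  # 0.0: silent underflow -- the value quietly becomes zero
```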
Remark 2.1.5. We note the following. If for a real number x, $d_{n+1} \ge \frac{\beta}{2}$ and $d_n = \beta - 1$, then in n-digit rounding $d_n + 1$ in fl(x) would be β. Therefore, we set $d_n = 0$ and add 1 to $d_{n-1}$. Again, if $d_{n-1} = \beta - 1$, we set $d_{n-1} = 0$ and add 1 to $d_{n-2}$. We continue in this manner, if required, until all the digits are exhausted. It may happen that $d_1 = d_2 = \cdots = d_n = \beta - 1$ and $d_{n+1} \ge \frac{\beta}{2}$; then for fl(x) we set $d_1 = 1$, all other $d_i = 0$ for $i = 2, 3, \ldots, n$, and increase the exponent by 1, if
$$x = (-1)^s \times \big(.\,\underbrace{(\beta-1)(\beta-1)\ldots(\beta-1)}_{n\text{ times}}\, d_{n+1}\ldots\big)_\beta \times \beta^e, \qquad d_{n+1} \ge \frac{\beta}{2}.$$
For example, consider $\pi = .314159265\cdots \times 10^1$. In 5-digit rounding, since $d_6 = 9 \ge 5$, we set $d_5 = 5 + 1 = 6$ in fl(π), so that $\mathrm{fl}(\pi) = .31416 \times 10^1$.
Remark 2.1.6. Most modern processors, including Intel's, use the IEEE 754 standard format. In the 64-bit binary representation, this format uses 52 bits for the mantissa, 11 bits for the exponent, and 1 bit for the sign. This representation is called double precision.
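The double-precision parameters stated above can be inspected from Python, whose floats are IEEE 754 doubles, via the standard `sys.float_info` record (an illustration we add here, not part of the notes):

```python
import sys

print(sys.float_info.mant_dig)  # 53: significant bits (52 stored + 1 implicit leading bit)
print(sys.float_info.max_exp)   # 1024: exponent upper bound
print(sys.float_info.min_exp)   # -1021: exponent lower bound
print(sys.float_info.epsilon)   # 2**-52
```

Note that the 52 stored mantissa bits give 53 significant bits, because the leading bit of a normalized binary mantissa is always 1 and need not be stored.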
When we perform a computation without any floating-point approximation, we say that
the computation is done using infinite precision (also called exact arithmetic).
Let ⊙ denote any one of the basic arithmetic operations +, −, ×, ÷, and let x, y ∈ ℝ be two real numbers. The process of computing x ⊙ y using n-digit rounding is as follows.
• Step 1: First, we consider the n-digit rounding approximations fl(x) and fl(y) of the numbers x and y, respectively.
• Step 2: Then, we perform the calculation fl(x) ⊙ fl(y) using exact arithmetic.
• Step 3: Finally, we take the n-digit rounding approximation fl(fl(x) ⊙ fl(y)) of the result.
The result from Step 3 is the value of x ⊙ y using n-digit rounding.
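The steps above can be sketched in Python for the decimal case β = 10 (a minimal sketch with function names of our choosing; it simulates n-digit rounding on top of ordinary doubles, so results carry a tiny extra double-precision error):

```python
import math

def fl(x, n):
    """n-digit decimal rounding approximation of x."""
    if x == 0:
        return 0.0
    e = math.floor(math.log10(abs(x))) + 1   # exponent so the mantissa lies in [0.1, 1)
    return round(x / 10**e, n) * 10**e

def rounded_op(x, y, operation, n):
    """x (.) y using n-digit rounding: round the inputs, operate exactly, round the result."""
    return fl(operation(fl(x, n), fl(y, n)), n)

# Subtraction of two nearby square roots with n = 6:
print(rounded_op(math.sqrt(100001), math.sqrt(100000), lambda a, b: a - b, 6))  # 0.001
```

The final call reproduces the value .1 × 10⁻² obtained in the worked example that follows.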
We see that
$$\sqrt{100001} = 316.229347\cdots = .316229347\cdots \times 10^3,$$
and
$$\sqrt{100000} = 316.227766\cdots = .316227766\cdots \times 10^3.$$
Then,
$$\mathrm{fl}(\sqrt{100001}) - \mathrm{fl}(\sqrt{100000}) = .000001 \times 10^3 = .1 \times 10^{-2}.$$
Hence,
$$\mathrm{fl}\big(\mathrm{fl}(\sqrt{100001}) - \mathrm{fl}(\sqrt{100000})\big) = .1 \times 10^{-2}.$$
Finally, we have
$$\mathrm{fl}(f(100000)) = .1 \times 10^3 = 100.$$
(iii). The relative error is a measure of the error in relation to the size of the true value, given by
$$\text{Relative Error} = \frac{\text{Error}}{\text{True Value}}, \quad \text{provided True Value} \ne 0.$$
(iv). Absolute value of the relative error is called the absolute relative error.
Remark 2.2.1. Let xA denote the approximation to the real number x ∈ R. We use the
following notations.
The absolute error has to be interpreted carefully, because a relatively small difference between two large numbers can appear to be large, and a relatively large difference between two small numbers can appear to be small. On the other hand, the relative error expresses the difference as a proportion of the true value, which is usually more meaningful, as illustrated below.
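The point is easy to see numerically (toy numbers of our own choosing, not the notes' example):

```python
x_true, x_approx = 100000.0, 100010.0   # large numbers, absolute error 10
y_true, y_approx = 0.0001, 0.00011      # small numbers, absolute error 0.00001

print(abs(x_true - x_approx))                 # 10.0 -- looks big...
print(abs(x_true - x_approx) / abs(x_true))   # 0.0001 -- but only 0.01% off
print(abs(y_true - y_approx))                 # ~1e-05 -- looks tiny...
print(abs(y_true - y_approx) / abs(y_true))   # ~0.1 -- but 10% off
```

The approximation with the larger absolute error is in fact a thousand times more accurate in the relative sense.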
Example 2.2.1.
• Machine Errors.
Hence,
$$E_a(\mathrm{fl}(x)) = |x - \mathrm{fl}(x)| = \left| (-1)^s \beta^e \left( \sum_{i=1}^{\infty} \frac{d_i}{\beta^i} - \sum_{i=1}^{n} \frac{d_i}{\beta^i} \right) \right| = \beta^e \sum_{i=n+1}^{\infty} \frac{d_i}{\beta^i} \le \beta^e \sum_{i=n+1}^{\infty} \frac{\beta - 1}{\beta^i} = \beta^e (\beta - 1) \sum_{i=n+1}^{\infty} \frac{1}{\beta^i} = \beta^e (\beta - 1)\, \frac{\frac{1}{\beta^{n+1}}}{1 - \frac{1}{\beta}} = \frac{\beta^e}{\beta^n} = \beta^{e-n}.$$
Since $|x| \ge \beta^{e-1}$, it follows that
$$E_{ar}(\mathrm{fl}(x)) = \frac{|x - \mathrm{fl}(x)|}{|x|} \le \frac{\beta^{e-n}}{\beta^{e-1}} = \beta^{-n+1}.$$
Case of rounding: Sub-case $0 \le d_{n+1} < \frac{\beta}{2}$. We note that
$$\mathrm{fl}(x) = (-1)^s \times (.d_1 d_2 \ldots d_n)_\beta \times \beta^e = (-1)^s \beta^e \sum_{i=1}^{n} \frac{d_i}{\beta^i}.$$
Hence,
$$E_a(\mathrm{fl}(x)) = |x - \mathrm{fl}(x)| = \left| (-1)^s \beta^e \left( \sum_{i=1}^{\infty} \frac{d_i}{\beta^i} - \sum_{i=1}^{n} \frac{d_i}{\beta^i} \right) \right| = \beta^e \sum_{i=n+1}^{\infty} \frac{d_i}{\beta^i} = \beta^e \left( \frac{d_{n+1}}{\beta^{n+1}} + \sum_{i=n+2}^{\infty} \frac{d_i}{\beta^i} \right)$$
$$\le \beta^e \left( \frac{\frac{\beta}{2} - 1}{\beta^{n+1}} + (\beta - 1) \sum_{i=n+2}^{\infty} \frac{1}{\beta^i} \right) \qquad \left( \because \; 0 \le d_{n+1} < \frac{\beta}{2} \implies 0 \le d_{n+1} \le \frac{\beta}{2} - 1, \text{ and } d_i \le \beta - 1 \right)$$
$$= \beta^e \left( \frac{\beta - 2}{2\beta^{n+1}} + (\beta - 1)\, \frac{\frac{1}{\beta^{n+2}}}{1 - \frac{1}{\beta}} \right) = \beta^e \left( \frac{\beta - 2}{2\beta^{n+1}} + \frac{1}{\beta^{n+1}} \right) = \beta^e\, \frac{\beta - 2 + 2}{2\beta^{n+1}} = \beta^e\, \frac{\beta}{2\beta^{n+1}} = \frac{1}{2}\beta^{e-n}.$$
Case of rounding: Sub-case $\frac{\beta}{2} \le d_{n+1} < \beta$. We note that
$$\mathrm{fl}(x) = (-1)^s \times (.d_1 d_2 \ldots (d_n + 1))_\beta \times \beta^e = (-1)^s \beta^e \left( \sum_{i=1}^{n} \frac{d_i}{\beta^i} + \frac{1}{\beta^n} \right).$$
Hence,
$$E_a(\mathrm{fl}(x)) = |x - \mathrm{fl}(x)| = \left| (-1)^s \beta^e \left( \sum_{i=1}^{\infty} \frac{d_i}{\beta^i} - \sum_{i=1}^{n} \frac{d_i}{\beta^i} - \frac{1}{\beta^n} \right) \right| = \beta^e \left| \sum_{i=n+1}^{\infty} \frac{d_i}{\beta^i} - \frac{1}{\beta^n} \right|$$
$$= \beta^e \left( \frac{1}{\beta^n} - \sum_{i=n+1}^{\infty} \frac{d_i}{\beta^i} \right) \qquad \left( \because \; \sum_{i=n+1}^{\infty} \frac{d_i}{\beta^i} \le (\beta - 1) \sum_{i=n+1}^{\infty} \frac{1}{\beta^i} = (\beta - 1)\, \frac{\frac{1}{\beta^{n+1}}}{1 - \frac{1}{\beta}} = \frac{1}{\beta^n} \right)$$
$$\le \beta^e \left( \frac{1}{\beta^n} - \frac{d_{n+1}}{\beta^{n+1}} \right) \le \beta^e \left( \frac{1}{\beta^n} - \frac{\beta}{2\beta^{n+1}} \right) \qquad \left( \because \; \frac{\beta}{2} \le d_{n+1} \right)$$
$$= \frac{1}{2}\beta^{e-n}.$$
We now introduce two measures that give a fairly precise idea of the possible accuracy in
a floating-point approximation.
Definition 2.4.1 (Unit round-off (u)). The unit round-off of a computer is denoted by u and is defined as
$$u = \sup \big\{ E_{ar}(\mathrm{fl}(x)) : x \in \mathbb{R}, \; \mathrm{UFL} \le |x| \le \mathrm{OFL} \big\},$$
i.e., it is the maximum possible value of $E_{ar}(\mathrm{fl}(x))$ for any x satisfying the condition as in Theorem 2.4.1. By Theorem 2.4.1, we note that
$$u = \begin{cases} \beta^{-n+1}, & \text{in case of chopping}, \\ \frac{1}{2}\beta^{-n+1}, & \text{in case of rounding}. \end{cases}$$
Theorem 2.4.2.
$$\varepsilon_{mach} = \begin{cases} \beta^{-n+1}, & \text{in case of chopping}, \\ \frac{1}{2}\beta^{-n+1}, & \text{in case of rounding}. \end{cases}$$
Proof. See Atkinson, An Introduction to Numerical Analysis, 2nd ed., p. 15.
Remark 2.4.1. From the definition of $\varepsilon_{mach}$, we note that for any δ ∈ F with $0 < \delta < \varepsilon_{mach}$, we have fl(1 + δ) = 1, i.e., in the computer the number 1 + δ is identical to 1.
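This behaviour can be probed experimentally by halving a candidate δ until 1 + δ rounds to 1 (a standard trick, sketched here in Python for doubles, where β = 2 and n = 53). Note that Python reports the epsilon 2⁻⁵², the gap between 1 and the next double; in the rounding convention of Theorem 2.4.2, εmach = ½ · 2¹⁻⁵³ = 2⁻⁵³, so the two conventions differ by a factor of 2:

```python
import sys

eps = 1.0
while 1.0 + eps / 2 > 1.0:   # stop once 1 + eps/2 is indistinguishable from 1
    eps /= 2

print(eps)                             # 2**-52, the gap between 1.0 and the next double
print(eps == sys.float_info.epsilon)   # True
print(1.0 + 2**-53 == 1.0)             # True: delta below 2**-52 vanishes in 1 + delta
```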
Remark 2.4.2. We note that $\varepsilon_{mach} \ne \mathrm{UFL}$. $\varepsilon_{mach}$ is determined by the precision, i.e., the number of digits in the mantissa field of the floating-point system, whereas UFL is determined by the lower bound of the exponent field.
Example 2.4.1.
Maximal Accuracy
Remark 2.4.3. From the definition of maximal accuracy, we note that fl(Macc + 1) ≠ Macc + 1.
Theorem 2.4.3.
$$\mathrm{Macc} = \beta^n.$$
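For IEEE doubles (β = 2, n = 53), this gives Macc = 2⁵³; past that point consecutive integers are no longer representable. A quick check we add for illustration:

```python
m = 2.0 ** 53            # Macc for doubles: 9007199254740992.0

print((m - 1.0) + 1.0 == m)   # True: integers up to 2**53 are represented exactly
print(m + 1.0 == m)           # True: 2**53 + 1 rounds back to 2**53
```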
Definition 2.4.4 (Significant digits). Let us consider a floating-point number system with base β ∈ ℕ, β ⩾ 2. For x ∈ ℝ, let $x_A$ be an approximation of x. Let
$$s = \sup \{ k \in \mathbb{Z} : \beta^k \le |x| \} \quad \text{and} \quad r = \sup \left\{ t \in \mathbb{N} : |x - x_A| \le \frac{1}{2}\beta^{s+1-t} \right\}.$$
Then r is called the number of significant digits in $x_A$.
$$|x - x_A| = .002 = .2 \times 10^{-2} = \frac{1}{2} \times .4 \times 10^{-2} < \frac{1}{2} \times 10^{-2} = \frac{1}{2} \times 10^{1+1-4},$$
which corresponds to s = 1 and r = 4,
and
$$|x - x_A| = .00006 = .6 \times 10^{-4} = \frac{1}{2} \times 1.2 \times 10^{-4} = \frac{1}{2} \times .12 \times 10^{-3} < \frac{1}{2} \times 10^{-3} = \frac{1}{2} \times 10^{-2+1-2},$$
which corresponds to s = −2 and r = 2.
$$E_{ar}(x_A) = \frac{|x - x_A|}{|x|} \le \frac{\frac{1}{2}\beta^{s-r+1}}{\beta^s} = \frac{1}{2}\beta^{1-r}.$$
(ii). The number of significant digits roughly measures the number of leading non-zero digits of $x_A$ that are correct relative to the corresponding digits in the true value x. However, this is not a precise way to determine the number of significant digits, as we have seen in the above examples.
(iii). The role of significant digits in numerical calculations is very important in the sense
that the loss of significant digits may result in drastic amplification of the relative
error as illustrated in the following example.
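A quick numerical illustration of this amplification (our own example, separate from the one below): the quantity (1 − cos x)/x² tends to 1/2 as x → 0, but the direct formula subtracts two nearly equal numbers and cancels catastrophically for small x, while the algebraically equivalent half-angle form 2 sin²(x/2)/x² does not:

```python
import math

x = 1.0e-8
naive = (1.0 - math.cos(x)) / x**2                    # cos(x) rounds to exactly 1.0: all digits lost
half_angle = 0.5 * (math.sin(x / 2) / (x / 2)) ** 2   # same quantity, no cancellation

print(naive)        # 0.0 -- the true value ~0.5 has vanished entirely
print(half_angle)   # ~0.5
```

A relative error of about 10⁻¹⁶ in cos(x) becomes a 100% relative error in the final result.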
Example 2.4.3 (Loss of Significance). (i). Let us consider two real numbers
Let