
Number System and Numerical Error Analysis

SIGNIFICANT FIGURES
The concept of a significant figure, or digit, has been developed to formally designate
the reliability of a numerical value.
How many significant figures does each of the following contain?
a) 345
b) 0.00789
c) 4503
d) 45300
e) 4.5300 × 10^4
f) 20.00
ACCURACY AND PRECISION
✓ Accuracy refers to how closely a computed or measured value agrees with the true value.
✓ Precision refers to how closely individual computed or measured values agree with each other.
✓ These concepts can be illustrated graphically using an analogy from target practice. The bullet holes on
each target in Fig. below can be thought of as the predictions of a numerical technique, whereas the bull’s-
eye represents the truth.

Fig. (a) Inaccurate and imprecise; (b) accurate and imprecise; (c) inaccurate and precise; (d) accurate and precise
• ERROR DEFINITIONS
Truncation errors result when approximations are used to represent exact mathematical procedures.
• Consider evaluating the integral
I = ∫0^1 e^(-x^2) dx
There is no antiderivative of e^(-x^2) in terms of elementary functions, so the integral cannot be evaluated explicitly. Instead, we approximate it with a quantity that can be computed. For example, use the Taylor approximation centered at x = 0:
e^(-x^2) ≈ 1 - x^2 + x^4/2! - x^6/3! + x^8/4!
Then
I ≈ ∫0^1 (1 - x^2 + x^4/2! - x^6/3! + x^8/4!) dx
which can be evaluated easily. We approximate a problem we cannot solve with a new problem that we can solve; this introduces truncation error.
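The truncation error of this approximation can be sketched in Python by integrating the truncated series term by term and comparing against a high-accuracy reference value (using the identity ∫0^1 e^(-x^2) dx = (√π/2)·erf(1); the variable names are illustrative):

```python
import math

# Integral of the truncated Taylor series, term by term:
# ∫0^1 x^(2k)/k! dx = 1/((2k+1)·k!), with alternating signs, k = 0..4
series_value = sum((-1) ** k / ((2 * k + 1) * math.factorial(k)) for k in range(5))

# High-accuracy reference: ∫0^1 e^(-x^2) dx = (√π/2)·erf(1)
true_value = math.sqrt(math.pi) / 2 * math.erf(1)

truncation_error = true_value - series_value
print(series_value, true_value, truncation_error)
```

With five series terms the truncation error is already below 10^-3, illustrating that the substitute problem is a good, computable stand-in for the original integral.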
Round-off error results when numbers having limited significant figures are used to represent exact numbers.
✓ If we represent 1/3 with six significant digits as 0.333333, the rounding error is given by
1/3 - 0.333333 = 0.000000333...
✓ There are other numbers that cannot be represented exactly. For example, π and √2 are numbers that must be approximated in computer calculations.
Example 1: Compute the absolute error and true percent relative error in the approximation of p by pa, where p = √2 and pa = 1.414.
Solution
absolute error = true value - approximation
absolute error = √2 - 1.414 = 2.136 × 10^-4
true percent relative error:
εt = (true error / true value) × 100%
εt = (2.136 × 10^-4 / √2) × 100% = 0.015%
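These two error measures can be checked with a few lines of Python (a sketch; the names are illustrative):

```python
import math

p = math.sqrt(2)   # true value
pa = 1.414         # approximation

abs_error = p - pa                       # true (absolute) error
true_pct_rel_error = abs_error / p * 100  # true percent relative error

print(abs_error)           # ≈ 2.136e-4
print(true_pct_rel_error)  # ≈ 0.015 (%)
```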
In real-world applications, we will obviously not know the true answer a priori. For these situations, an alternative is to normalize the error using the best available estimate of the true value, that is, to the approximation itself:
✓ εa = (approximate error / approximation) × 100%
✓ εa = ((current approximation - previous approximation) / current approximation) × 100%
where the subscript a signifies that the error is normalized to an approximate value.
Often, when performing computations, we may not be concerned with the sign of the error, but we are interested in whether the absolute value of the percent relative error is lower than a prespecified percent tolerance εs. Therefore, it is often useful to employ the absolute value of the above equations. For such cases, the computation is repeated until |εa| < εs, with
εs = (0.5 × 10^(2-n))%
If this criterion is met, the result is correct to at least n significant figures.
• Example: The exponential function can be computed using
e^x = 1 + x + x^2/2! + x^3/3! + … + x^n/n!
As more terms are added in sequence, the approximation becomes a better and better estimate of the true value of e^x. The above equation is called a Maclaurin series expansion.
Estimate e^0.5. After each new term is added, compute the true and approximate percent relative errors. Note that the true value is e^0.5 = 1.648721… Add terms until the absolute value of the approximate error estimate εa falls below a prespecified error criterion εs conforming to three significant figures.
• Solution:
The error criterion that ensures a result correct to at least three significant figures is
εs = (0.5 × 10^(2-n))%
εs = (0.5 × 10^(2-3))% = 0.05%
Thus, we will add terms to the series until εa falls below this level.
The first estimate is simply 1. The second estimate is generated by adding the second term:
e^0.5 ≈ 1 + x = 1 + 0.5 = 1.5
This represents a true percent relative error of
εt = |(1.648721 - 1.5)/1.648721| × 100% = 9.02%
and an approximate estimate of the error of
εa = |(1.5 - 1)/1.5| × 100% = 33.3%
Because εa is not less than the required value εs, we continue the computation by adding another term, x^2/2!, and repeating the error calculations. The process is continued until |εa| < εs. After six terms are included, the approximate error falls below εs = 0.05%, and the computation is terminated.
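The term-by-term procedure above can be sketched as a short Python loop (variable names are illustrative):

```python
import math

x = 0.5
es = 0.5 * 10 ** (2 - 3)  # stopping criterion for 3 significant figures, in %

approx = 1.0              # first term of the Maclaurin series
term_count = 1
while True:
    previous = approx
    approx += x ** term_count / math.factorial(term_count)  # add the next term
    term_count += 1
    ea = abs((approx - previous) / approx) * 100            # approximate % error
    if ea < es:
        break

print(term_count, approx, ea)  # terminates after 6 terms
```

The loop stops after the sixth term, matching the hand computation, with e^0.5 ≈ 1.648698 versus the true value 1.648721.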
• Computer Representation of Numbers
Numbers on the computer are represented with a binary, or base-2, system. Just as with the decimal system, quantities can be represented using positional notation. For example, the binary number 101.1 is equivalent to (1 × 2^2) + (0 × 2^1) + (1 × 2^0) + (1 × 2^-1) = 4 + 0 + 1 + 0.5 = 5.5 in the decimal system.
Integer Representation
There are three forms in which signed integer (whole) numbers can be represented in binary:
a. Sign-magnitude,
b. 1's complement, and
c. 2's complement.
Of these, the 2's complement is the most important and the sign-magnitude is the least used.
❑ The most straightforward approach, called the sign-magnitude method, employs the first bit of a word to indicate the sign, with a 0 for positive and a 1 for negative. The remaining bits are used to store the magnitude. For example, the magnitude 173 is represented in binary as 10101101.
❑ The binary representation of the decimal integer -173 on a 16-bit computer using the sign-magnitude method is shown in the figure below.
Ex. Express the decimal number -45 as an 8-bit number in the
a. sign-magnitude,
b. 1's complement, and
c. 2's complement forms.
Solution
First write the 8-bit binary number for +45.
Convert 45 to binary using the repeated-division-by-2 method:

Division   Quotient   Remainder
45 / 2     22         1  (R0)
22 / 2     11         0  (R1)
11 / 2     5          1  (R2)
5 / 2      2          1  (R3)
2 / 2      1          0  (R4)
1 / 2      0          1  (R5)
Hence 45 in binary = R5R4R3R2R1R0 = 101101 = 00101101 (8-bit representation).
In the sign-magnitude form, -45 is produced by changing the sign bit to a 1 and leaving the magnitude bits as they are: the number is 10101101.
In the 1's complement form, -45 is produced by taking the 1's complement of +45 (00101101). The 1's complement of a binary number is found by changing all 1s to 0s and all 0s to 1s, so the 1's complement of 00101101 is 11010010.
Hence -45 = 11010010 (1's complement form).
In the 2's complement form, -45 is produced by taking the 2's complement of +45 (00101101). The 2's complement of a binary number can be found as follows:
✓ Start at the right with the LSB and write the bits as they are up to and including the first 1.
✓ Take the 1's complement of the remaining bits.
Hence -45 = 11010011 (2's complement form).
Note: The 2's complement of a binary number can also be found by adding 1 at the LSB (rightmost bit) of the 1's complement:
  11010010
+        1
  11010011
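As a quick check, both complement forms of -45 can be reproduced in Python by masking to 8 bits (a sketch, not part of the original notes):

```python
# 8-bit two's-complement pattern of -45: mask the negative value to 8 bits
pattern = format(-45 & 0xFF, '08b')
print(pattern)          # 11010011

# 1's complement of +45: flip the low 8 bits
ones_complement = format(~45 & 0xFF, '08b')
print(ones_complement)  # 11010010
```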
The Decimal Value of Signed Numbers
Sign-magnitude: Decimal values of positive and negative numbers in the sign-magnitude form are determined by summing the weights in all the magnitude bit positions where there are 1s and ignoring those positions where there are zeros. The sign is determined by examination of the sign bit.
Example: Determine the decimal value of the following signed binary number expressed in sign-magnitude form.
10101101
The seven magnitude bits and their powers-of-two weights are as follows:

Weight:  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Bit:     0    1    0    1    1    0    1

Summing the weights where there are 1s: 2^5 + 2^3 + 2^2 + 2^0 = 45
The sign bit is 1; therefore, the decimal number is -45.
1's Complement: Decimal values of positive numbers in the 1's complement form are determined by summing the weights in all bit positions where there are 1s and ignoring those positions where there are zeros. Decimal values of negative numbers are determined by assigning a negative value to the weight of the sign bit, summing all the weights where there are 1s, and adding 1 to the result.
Example: Determine the decimal value of the following signed binary number expressed in 1's complement form.
11010010
Solution
The bits and their powers-of-two weights for the negative number are as follows. Notice that the negative sign bit has a weight of -2^7, or -128.

Weight:  -2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Bit:     1     1    0    1    0    0    1    0

Summing the weights where there are 1s,
-128 + 64 + 16 + 2 = -46
Adding 1 to the result, the final decimal number is
-46 + 1 = -45
2's Complement: Decimal values of positive and negative numbers in the 2's complement form are determined by summing the weights in all bit positions where there are 1s and ignoring those positions where there are zeros. The weight of the sign bit in a negative number is given a negative value.
Example: Determine the decimal value of the following signed binary number expressed in 2's complement form.
11010011
Solution
The bits and their powers-of-two weights for the negative number are as follows. Notice that the negative sign bit has a weight of -2^7, or -128.

Weight:  -2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Bit:     1     1    0    1    0    0    1    1

Summing the weights where there are 1s,
-128 + 64 + 16 + 2 + 1 = -45
Hence the decimal number is -45.
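The weighted-sum rule for two's complement amounts to the following small Python helper (a sketch; the function name is illustrative):

```python
def twos_complement_value(bits: str) -> int:
    """Decimal value of a bit string interpreted as two's complement."""
    n = int(bits, 2)
    if bits[0] == '1':          # sign bit set: its weight is -2^(width-1),
        n -= 1 << len(bits)     # equivalent to subtracting 2^width
    return n

print(twos_complement_value('11010011'))  # -45
print(twos_complement_value('00101101'))  # 45
```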
• Floating-Point Representation
Fractional quantities are typically represented in computers using floating-point format. In this approach, which is very much like scientific notation, the number is expressed as
±s × b^e
where s = the significand (or mantissa), b = the base of the number system being used, and e = the exponent.
The following figure shows the manner in which a floating-point number is stored in a word.
o Prior to being expressed in this form, the number is normalized by moving the binary point over so that only one significant digit is to the left of the binary point. This is done so computer memory is not wasted on storing useless nonsignificant zeros.
o For example, a value like 0.005678 could be represented in a wasteful manner as 0.005678 × 10^0 in a hypothetical floating-point base-10 system. However, normalization would yield 5.678 × 10^-3, which eliminates the useless zeros.
o The following figure shows the manner in which a floating-point number is stored using the IEEE standard.
1. Single Precision: Single precision is a format proposed by IEEE for the representation of floating-point numbers. It occupies 32 bits in computer memory.

2. Double Precision: Double precision is also a format given by IEEE for the representation of floating-point numbers. It occupies 64 bits in computer memory.
Example 1: Use the single-precision IEEE-754 floating-point standard to find the decimal equivalent of the floating-point machine number with sign bit 1, exponent bits 10000011, and mantissa bits 10000000000000000000000.
Solution:
Here the sign bit is 1, so the number is negative, and it can be written as
(-1)^S × 1.M × 2^(E-127)
Exponent bits = 10000011 = 2^7 + 2^1 + 2^0 = 128 + 2 + 1 = 131
M = 10000000000000000000000
Substituting these values into the formula gives
(-1)^1 × 1.10000000000000000000000 (binary) × 2^(131-127)
= -1.5 × 2^4 = -24
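The decoding can be verified in Python with the standard `struct` module, which reinterprets raw bytes as an IEEE-754 single (a sketch, not part of the original notes):

```python
import struct

# Assemble the 32-bit pattern: sign | exponent | mantissa
bits = '1' + '10000011' + '1' + '0' * 22
value = struct.unpack('>f', int(bits, 2).to_bytes(4, 'big'))[0]
print(value)  # -24.0
```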
Example 2: Convert -24 to the single-precision IEEE-754 floating-point standard.
Solution:
Convert 24 to binary using the repeated-division-by-2 method:

Division   Quotient   Remainder
24 / 2     12         0  (R0)
12 / 2     6          0  (R1)
6 / 2      3          0  (R2)
3 / 2      1          1  (R3)
1 / 2      0          1  (R4)

Hence 24 in binary = R4R3R2R1R0 = 11000.
Normalizing, 24 = 1.1 (binary) × 2^4, so the exponent is 4. Add the exponent bias 127: 4 + 127 = 131. Then convert 131 to eight-bit binary, because in the single-precision format the exponent is represented using 8 bits.
Division   Quotient   Remainder
131 / 2    65         1  (R0)
65 / 2     32         1  (R1)
32 / 2     16         0  (R2)
16 / 2     8          0  (R3)
8 / 2      4          0  (R4)
4 / 2      2          0  (R5)
2 / 2      1          0  (R6)
1 / 2      0          1  (R7)

131 = R7R6R5R4R3R2R1R0 = 10000011
The given number is negative, so the sign bit is 1, the exponent is 10000011, and the mantissa is 10000000000000000000000.

Sign (1 bit)   Exponent (8 bits)   Mantissa (23 bits)
1              10000011            10000000000000000000000
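The encoding can likewise be checked with `struct`, splitting the packed bytes back into the three fields (a sketch; the field slicing follows the 1/8/23-bit layout):

```python
import struct

# Pack -24.0 as a 32-bit IEEE-754 single and show its bit fields
raw = struct.pack('>f', -24.0)
bits = format(int.from_bytes(raw, 'big'), '032b')

sign, exponent, mantissa = bits[0], bits[1:9], bits[9:]
print(sign, exponent, mantissa)  # 1 10000011 10000000000000000000000
```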
For normalized floating-point numbers, rounding yields a lower absolute error than chopping. When chopping is employed,
|Δx/x| ≤ ε
and, for cases where rounding is employed,
|Δx/x| ≤ ε/2
where ε is referred to as the machine epsilon: the distance between 1 and the next larger representable number.
In MATLAB this value is assigned to the predefined variable eps. When the name eps is typed in the Command Window, the assigned value is displayed.
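The same quantity is available in Python as `sys.float_info.epsilon` (for IEEE-754 double precision, ε = 2^-52, matching MATLAB's eps). A short sketch:

```python
import sys

# Machine epsilon for IEEE-754 double precision: 2^-52
eps = sys.float_info.epsilon
print(eps)                  # 2.220446049250313e-16
print(eps == 2.0 ** -52)    # True

# 1 + eps is representable, but smaller increments are lost to rounding
print(1.0 + eps > 1.0)        # True
print(1.0 + eps / 3 == 1.0)   # True: eps/3 is rounded away
```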
Subtractive Cancellation
This term refers to the round-off induced when subtracting two nearly equal floating-point numbers. One common instance where this can occur involves finding the roots of a quadratic equation with the quadratic formula:
x1, x2 = (-b ± √(b^2 - 4ac)) / (2a)
For cases where b^2 >> 4ac, the difference in the numerator can be very small, and this causes round-off error. In many cases, a mathematical expression containing the subtraction of two nearly equal quantities can be rewritten in a form that is less likely to cause round-off errors. For instance, we can change the form of the quadratic formula by rationalizing the numerator as follows:
x1 = ((-b + √(b^2 - 4ac)) / (2a)) × ((-b - √(b^2 - 4ac)) / (-b - √(b^2 - 4ac))) = (b^2 - (b^2 - 4ac)) / (2a(-b - √(b^2 - 4ac)))
which simplifies to an alternate quadratic formula:
x1 = -2c / (b + √(b^2 - 4ac))
The rationalization technique can also be applied to give the following alternative quadratic formula for x2:
x2 = -2c / (b - √(b^2 - 4ac))
For example, consider the quadratic equation x^2 - 100.0001x + 0.01 = 0, for which the exact solutions are x1 = 100 and x2 = 0.0001. The solutions can be calculated with the quadratic formula:
x1, x2 = (-b ± √(b^2 - 4ac)) / (2a)
• Using MATLAB (Command Window) to calculate x1 and x2 shows that the value calculated for x2 is not exact due to round-off errors. The round-off occurs in the numerator of the expression for x2: since b is negative, the numerator involves subtraction of two numbers that are nearly equal. The alternative formula for x2, by contrast, gives the exact value.
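The cancellation can be reproduced in Python, since it uses the same double-precision arithmetic as MATLAB (a sketch; variable names are illustrative):

```python
import math

# x^2 - 100.0001x + 0.01 = 0; exact roots are 100 and 0.0001
a, b, c = 1.0, -100.0001, 0.01
d = math.sqrt(b * b - 4 * a * c)

x2_standard = (-b - d) / (2 * a)     # subtracts two nearly equal numbers
x2_alternative = -2 * c / (b - d)    # rationalized form avoids the subtraction

print(x2_standard, x2_alternative)
```

The rationalized form is at least as accurate as the standard formula here, because its denominator adds two quantities of the same sign instead of subtracting nearly equal ones.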
CHAPTER TWO
ROOTS OF EQUATIONS
GRAPHICAL METHODS
A simple method for obtaining an estimate of the root of the equation f(x) = 0 is to make a plot of the function and observe where it crosses the x axis. This point, which represents the x value for which f(x) = 0, provides a rough approximation of the root.

Graphical techniques are of limited practical value because they are not precise. However, the rough estimates they provide can be employed as starting guesses for the numerical methods discussed in this chapter.
Bracketing Methods
These techniques are called bracketing methods because two initial guesses for the root are required. As the name implies, these guesses must "bracket," or be on either side of, the root. These techniques are based on the Intermediate Value Theorem.
THE BISECTION METHOD
A simple algorithm for the bisection calculation is shown below.
Step 1: Choose lower xl and upper xu guesses for the root such that the function changes sign over the interval. This can be checked by ensuring that f(xl)f(xu) < 0.
Step 2: An estimate of the root xr is determined by
xr = (xl + xu)/2
Step 3: Make the following evaluations to determine in which subinterval the root lies:
(a) If f(xl)f(xr) < 0, the root lies in the lower subinterval. Therefore, set xu = xr and return to step 2.
(b) If f(xl)f(xr) > 0, the root lies in the upper subinterval. Therefore, set xl = xr and return to step 2.
(c) If f(xl)f(xr) = 0, the root equals xr; terminate the computation.
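The three steps can be sketched as a short Python function (an illustrative sketch; the stopping test uses the bracket width as a bound on the approximate error):

```python
def bisection(f, xl, xu, es=1e-8):
    """Bisection method; es is the stopping criterion in percent."""
    if f(xl) * f(xu) > 0:
        raise ValueError("root is not bracketed")
    while True:
        xr = (xl + xu) / 2                 # Step 2: midpoint estimate
        test = f(xl) * f(xr)
        if test < 0:                       # Step 3(a): root in lower subinterval
            xu = xr
        elif test > 0:                     # Step 3(b): root in upper subinterval
            xl = xr
        else:                              # Step 3(c): exact root found
            return xr
        ea = abs((xu - xl) / xr) * 100     # bracket width bounds the % error
        if ea < es:
            return xr

root = bisection(lambda x: x**3 + 4 * x**2 - 10, 1.0, 2.0, es=1e-6)
print(root)  # ≈ 1.365230
```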
Example: Determine the root of f(x) = x^3 + 4x^2 - 10 using the bisection method. Employ initial guesses of xl = 1 and xu = 2 and iterate until the estimated error εa falls below a level of εs = 0.000001%.
Solution:
Iteration 1
xl = 1, xu = 2
xr = (xl + xu)/2 = (1 + 2)/2 = 1.5
f(xl) = 1^3 + 4(1)^2 - 10 = -5
f(xu) = 2^3 + 4(2)^2 - 10 = 14
f(xr) = 1.5^3 + 4(1.5)^2 - 10 = 2.3750
f(xl) × f(xr) < 0; therefore, we set xu = xr.
Iteration 2
xl = 1, xu = 1.5
xr = (xl + xu)/2 = (1 + 1.5)/2 = 1.25
f(xl) = 1^3 + 4(1)^2 - 10 = -5
f(xu) = 1.5^3 + 4(1.5)^2 - 10 = 2.3750
f(xr) = 1.25^3 + 4(1.25)^2 - 10 = -1.7969
f(xu) × f(xr) < 0; therefore, we set xl = xr.
εa = |(current approximation - previous approximation)/current approximation| × 100%
εa = |(1.25 - 1.5)/1.25| × 100% = 20%
After 27 iterations εa falls below εs = 0.000001%, and the computation can be terminated. The root estimate is 1.365230.
THE FALSE-POSITION METHOD
A shortcoming of the bisection method is that, in dividing the interval from xl to xu into equal halves, no account is taken of the magnitudes of f(xl) and f(xu). For example, if f(xl) is much closer to zero than f(xu), it is likely that the root is closer to xl than to xu.
An alternative method that exploits this graphical insight is to join f(xl) and f(xu) by a straight line. The intersection of this line with the x axis represents an improved estimate of the root.
Using similar triangles in the above figure, the intersection of the straight line with the x axis can be estimated from
f(xl)/(xr - xl) = f(xu)/(xr - xu)
Cross-multiplying the above equation yields
f(xl)(xr - xu) = f(xu)(xr - xl)
Collect terms and rearrange:
xr[f(xl) - f(xu)] = xu f(xl) - xl f(xu)
Divide by f(xl) - f(xu):
xr = (xu f(xl) - xl f(xu)) / (f(xl) - f(xu))
This is one form of the method of false position. Note that it allows the computation of the root xr as a function of the lower and upper guesses xl and xu. It can be put in an alternative form by expanding it:
xr = xu f(xl)/(f(xl) - f(xu)) - xl f(xu)/(f(xl) - f(xu))
then adding and subtracting xu on the right-hand side:
xr = xu + xu f(xl)/(f(xl) - f(xu)) - xu - xl f(xu)/(f(xl) - f(xu))
Collecting terms yields
xr = xu + xu f(xu)/(f(xl) - f(xu)) - xl f(xu)/(f(xl) - f(xu))
xr = xu - f(xu)(xl - xu)/(f(xl) - f(xu))
which is algebraically equivalent to the earlier false-position equation. We use this form because it involves one less function evaluation and one less multiplication. In addition, it is directly comparable with the secant method, discussed later in this chapter.
Example: Determine the root of f(x) = x^3 + 4x^2 - 10 using the false-position method. Employ initial guesses of xl = 1 and xu = 2 and iterate until the estimated error εa falls below a level of εs = 0.000001%.
Solution:
Iteration 1
xl = 1, xu = 2
f(xl) = 1^3 + 4(1)^2 - 10 = -5
f(xu) = 2^3 + 4(2)^2 - 10 = 14
xr = xu - f(xu)(xl - xu)/(f(xl) - f(xu))
xr = 2 - 14(1 - 2)/(-5 - 14) = 1.2632
f(xr) = 1.2632^3 + 4(1.2632)^2 - 10 = -1.6023
f(xu) × f(xr) < 0; therefore, we set xl = xr.
Iteration 2
xl = 1.2632, xu = 2
f(xl) = 1.2632^3 + 4(1.2632)^2 - 10 = -1.6023
f(xu) = 2^3 + 4(2)^2 - 10 = 14
xr = xu - f(xu)(xl - xu)/(f(xl) - f(xu))
xr = 2 - 14(1.2632 - 2)/(-1.6023 - 14) = 1.3388
f(xr) = 1.3388^3 + 4(1.3388)^2 - 10 = -0.4304
f(xu) × f(xr) < 0; therefore, we set xl = xr.
εa = |(1.3388 - 1.2632)/1.3388| × 100% = 5.6520%
• After 14 iterations εa falls below εs = 0.000001%, and the computation can be terminated. The root estimate is 1.365230.
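The false-position iteration can be sketched in Python much like bisection, with the straight-line intercept replacing the midpoint (an illustrative sketch):

```python
def false_position(f, xl, xu, es=1e-6, max_iter=100):
    """False-position method; es is the stopping criterion in percent."""
    xr_old = xl
    for _ in range(max_iter):
        # intercept of the line joining (xl, f(xl)) and (xu, f(xu))
        xr = xu - f(xu) * (xl - xu) / (f(xl) - f(xu))
        if f(xl) * f(xr) < 0:
            xu = xr
        else:
            xl = xr
        ea = abs((xr - xr_old) / xr) * 100
        if ea < es:
            return xr
        xr_old = xr
    return xr

root = false_position(lambda x: x**3 + 4 * x**2 - 10, 1.0, 2.0)
print(root)  # ≈ 1.365230
```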
Open Methods
For the bracketing methods we have seen, the root is located within an interval prescribed by a lower and an upper bound. Repeated application of these methods always results in closer estimates of the true value of the root. Such methods are said to be convergent because they move closer to the truth as the computation progresses.
In contrast, the open methods described below are based on formulas that require only a single starting value of x, or two starting values that do not necessarily bracket the root. As such, they sometimes diverge, or move away from the true root, as the computation progresses. However, when the open methods converge, they usually do so much more quickly than the bracketing methods.
SIMPLE FIXED-POINT ITERATION
The formula for root estimation is obtained by rearranging the function f(x) = 0 so that x is on the left-hand side of the equation:
x = g(x)
This transformation can be accomplished either by algebraic manipulation or by simply adding x to both sides of the original equation. For example, x^2 - 2x + 3 = 0 can be simply manipulated to yield x = (x^2 + 3)/2, whereas sin x = 0 could be put into the form x = g(x) by adding x to both sides to yield x = sin x + x.
Example: Use simple fixed-point iteration to locate the root of f(x) = e^(-x) - x. Start with an initial guess of x0 = 0, iterate up to the third iteration, and compute the approximate error for each iteration.
Solution:
Express the given function in the form xi+1 = g(xi):
xi+1 = e^(-xi)
Iteration 1
x1 = e^0 = 1
Iteration 2
x2 = e^(-1) = 0.367879
εa = |(current approximation - previous approximation)/current approximation| × 100%
εa = |(0.367879 - 1)/0.367879| × 100% = 171.8%
Iteration 3
x3 = e^(-0.367879) = 0.692201
εa = |(0.692201 - 0.367879)/0.692201| × 100% = 46.9%
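The three iterations above can be sketched as a short Python loop (variable names are illustrative):

```python
import math

x = 0.0                       # initial guess x0
for i in range(1, 4):
    x_new = math.exp(-x)      # fixed-point update x = g(x) = e^(-x)
    ea = abs((x_new - x) / x_new) * 100
    print(i, x_new, ea)
    x = x_new
```

The printed errors (100%, 171.8%, 46.9%) match the hand computation; note that fixed-point iteration converges only slowly toward the true root 0.567143…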
THE NEWTON-RAPHSON METHOD
The following figure shows a graphical depiction of the Newton-Raphson method. A tangent to the function at xi [with slope f'(xi)] is extrapolated down to the x axis to provide an estimate of the root at xi+1.
The first derivative at xi is equivalent to this slope:
f'(xi) = (f(xi) - 0)/(xi - xi+1)
which can be rearranged to yield
xi+1 = xi - f(xi)/f'(xi)
Example: Use the Newton-Raphson method to estimate the root of f(x) = e^(-x) - x, employing an initial guess of x0 = 0. Iterate up to the third iteration and compute the approximate error for each iteration.
Solution:
The first derivative of the function can be evaluated as
f'(x) = -e^(-x) - 1
Iteration 1
xi = 0
xi+1 = xi - f(xi)/f'(xi) = xi - (e^(-xi) - xi)/(-e^(-xi) - 1) = 0.5000
Iteration 2
xi = 0.5000
xi+1 = 0.5000 - (e^(-0.5000) - 0.5000)/(-e^(-0.5000) - 1) = 0.5663
εa = |(0.5663 - 0.5000)/0.5663| × 100% = 11.7076%
Iteration 3
xi = 0.5663
xi+1 = 0.5663 - (e^(-0.5663) - 0.5663)/(-e^(-0.5663) - 1) = 0.5671
εa = |(0.5671 - 0.5663)/0.5671| × 100% = 0.1411%
Thus, the approach rapidly converges on the true root. Notice that the approximate percent relative error at each iteration decreases much faster than it does in simple fixed-point iteration.
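The Newton-Raphson iterations can be sketched in Python as follows (an illustrative sketch):

```python
import math

f = lambda x: math.exp(-x) - x
df = lambda x: -math.exp(-x) - 1   # derivative of f

x = 0.0                            # initial guess x0
for i in range(1, 4):
    x_new = x - f(x) / df(x)       # Newton-Raphson update
    ea = abs((x_new - x) / x_new) * 100
    print(i, round(x_new, 4), round(ea, 4))
    x = x_new
```

After three iterations the estimate agrees with the true root 0.567143… to about four decimal places, illustrating the method's rapid (quadratic) convergence.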
THE SECANT METHOD
A potential problem in implementing the Newton-Raphson method is the evaluation of the derivative. Although this is not inconvenient for polynomials and many other functions, there are certain functions whose derivatives may be extremely difficult or inconvenient to evaluate. For these cases, the derivative can be approximated by a backward finite divided difference:
f'(xi) ≈ (f(xi-1) - f(xi))/(xi-1 - xi)
This approximation can be substituted into the Newton-Raphson formula to yield the following iterative equation:
xi+1 = xi - f(xi)(xi-1 - xi)/(f(xi-1) - f(xi))
The above equation is the formula for the secant method. Notice that the approach requires two initial estimates of x. However, because f(x) is not required to change signs between the estimates, it is not classified as a bracketing method.
Example: Use the secant method to estimate the root of f(x) = e^(-x) - x. Start with initial estimates of x(-1) = 0 and x0 = 1.0, iterate up to the third iteration, and compute the approximate error for each iteration.
Solution: Recall that the true root is 0.56714329…
Iteration 1
x(-1) = 0, x0 = 1.0
f(x(-1)) = 1
f(x0) = -0.6321
x1 = 1 - (-0.6321)(0 - 1)/(1 - (-0.6321)) = 0.6127
Iteration 2
x(-1) = 1, x0 = 0.6127
f(x(-1)) = -0.6321
f(x0) = -0.0708
x1 = 0.6127 - (-0.0708)(1 - 0.6127)/(-0.6321 - (-0.0708)) = 0.5638
εa = |(0.5638 - 0.6127)/0.5638| × 100% = 8.6733%
Iteration 3
x(-1) = 0.6127, x0 = 0.5638
f(x(-1)) = -0.0708
f(x0) = 0.0052
x1 = 0.5638 - 0.0052(0.6127 - 0.5638)/(-0.0708 - 0.0052) = 0.5672
εa = |(0.5672 - 0.5638)/0.5672| × 100% = 0.5994%
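The secant iterations can be sketched in Python by carrying the two most recent estimates along (an illustrative sketch):

```python
import math

f = lambda x: math.exp(-x) - x

x_prev, x_curr = 0.0, 1.0        # two starting values (need not bracket the root)
for i in range(1, 4):
    # secant update: backward difference replaces the derivative
    x_next = x_curr - f(x_curr) * (x_prev - x_curr) / (f(x_prev) - f(x_curr))
    ea = abs((x_next - x_curr) / x_next) * 100
    print(i, round(x_next, 4), round(ea, 4))
    x_prev, x_curr = x_curr, x_next
```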
MODIFIED NEWTON-RAPHSON METHOD
The Newton-Raphson and secant methods are linearly, rather than quadratically, convergent for multiple roots (Ralston and Rabinowitz, 1978). Modifications have been proposed to alleviate this problem. Ralston and Rabinowitz (1978) have indicated that a slight change in the formulation returns it to quadratic convergence.
Define a new function u(x), the ratio of the function to its derivative:
u(x) = f(x)/f'(x)
Substituting u(x) for f(x) in the Newton-Raphson formula
xi+1 = xi - f(x)/f'(x)
gives
xi+1 = xi - u(x)/u'(x)
Then find u'(x).
Differentiate u(x) using the quotient rule (v/w)' = (v'w - vw')/w^2, where v(x) = f(x) and w(x) = f'(x):
u'(x) = (f'(x)f'(x) - f(x)f''(x)) / (f'(x))^2
Now substitute u(x) and u'(x) into the Newton-Raphson formula and simplify:
xi+1 = xi - [f(x)/f'(x)] / [(f'(x)f'(x) - f(x)f''(x)) / (f'(x))^2]
xi+1 = xi - f(x)(f'(x))^2 / (f'(x)(f'(x)f'(x) - f(x)f''(x)))
xi+1 = xi - f(xi)f'(xi) / ((f'(xi))^2 - f(xi)f''(xi))
Problem Statement. Use the modified Newton-Raphson method to evaluate a root of f(x) = x^3 - 6x^2 + 11x - 6 with an initial guess of x0 = 0.8. Round your answer to 4 decimal places.
Solution.
Calculate f'(x):
f'(x) = 3x^2 - 12x + 11
Calculate f''(x):
f''(x) = 6x - 12
Iteration 1
x0 = 0.8
Compute f(x0): f(0.8) = 0.8^3 - 6(0.8)^2 + 11(0.8) - 6 = -0.528
Compute f'(x0): f'(0.8) = 3(0.8)^2 - 12(0.8) + 11 = 3.32
Compute f''(x0): f''(0.8) = 6(0.8) - 12 = -7.2
Now calculate the next approximation x1:
x1 = x0 - f(x0)f'(x0) / ((f'(x0))^2 - f(x0)f''(x0))
x1 = 0.8 - (-0.528 × 3.32) / ((3.32)^2 - (-0.528)(-7.2))
x1 = 1.0428
Iteration 2
Set x0 = x1 = 1.0428
Compute f(x0): f(1.0428) = 1.0428^3 - 6(1.0428)^2 + 11(1.0428) - 6 = 0.0802
Compute f'(x0): f'(1.0428) = 3(1.0428)^2 - 12(1.0428) + 11 = 1.7487
Compute f''(x0): f''(1.0428) = 6(1.0428) - 12 = -5.7432
Now calculate the next approximation x1:
x1 = x0 - f(x0)f'(x0) / ((f'(x0))^2 - f(x0)f''(x0))
x1 = 1.0428 - (0.0802 × 1.7487) / ((1.7487)^2 - (0.0802)(-5.7432))
x1 = 1.0029
Therefore, after 2 iterations of the modified Newton-Raphson method with x0 = 0.8, the approximate root of the equation is 1.0029.
Note: Although the modified Newton-Raphson method is preferable for multiple roots, it is somewhat less efficient and requires more computational effort than the standard method for simple roots.
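The two iterations can be reproduced with a short Python sketch (variable names are illustrative):

```python
f   = lambda x: x**3 - 6 * x**2 + 11 * x - 6
df  = lambda x: 3 * x**2 - 12 * x + 11
d2f = lambda x: 6 * x - 12

x = 0.8
for i in range(1, 3):
    # modified Newton-Raphson update
    x = x - f(x) * df(x) / (df(x) ** 2 - f(x) * d2f(x))
    print(i, round(x, 4))
```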
MODIFIED SECANT METHOD
Rather than using two arbitrary values to estimate the derivative, an alternative approach involves a fractional perturbation of the independent variable to estimate f'(x):
f'(xi) ≈ (f(xi + δxi) - f(xi)) / (δxi)
where δ = a small perturbation fraction. This approximation can be substituted into the Newton-Raphson formula to yield the following iterative equation:
xi+1 = xi - δxi f(xi) / (f(xi + δxi) - f(xi))
Example: Use the modified secant method to estimate the root of f(x) = x^4 - x^3 - 3x^2 - x. Use a value of 0.01 for δ, start with x0 = 2, iterate up to the third iteration, and compute the approximate error for each iteration.
Solution:
Iteration 1
x0 = 2
f(x0) = -6
x0 + δx0 = 2.0200
f(x0 + δx0) = -5.8539
x1 = x0 - δx0 f(x0) / (f(x0 + δx0) - f(x0)) = 2.8216
Iteration 2
x0 = 2.8216
f(x0) = 14.2145
x0 + δx0 = 2.8498
f(x0 + δx0) = 15.5992
x1 = x0 - δx0 f(x0) / (f(x0 + δx0) - f(x0)) = 2.5319
εa = |(2.5319 - 2.8216)/2.5319| × 100% = 11.442%
Iteration 3
x0 = 2.5319
f(x0) = 3.1005
x0 + δx0 = 2.5572
f(x0 + δx0) = 3.8648
x1 = x0 - δx0 f(x0) / (f(x0 + δx0) - f(x0)) = 2.4292
εa = |(2.4292 - 2.5319)/2.4292| × 100% = 4.2277%
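The modified secant iteration can be sketched in Python; left to run, it converges to the positive root of x^4 - x^3 - 3x^2 - x, which factors as x(x + 1)(x^2 - 2x - 1) and so equals 1 + √2 ≈ 2.414214 (an illustrative sketch):

```python
import math

f = lambda x: x**4 - x**3 - 3 * x**2 - x
delta = 0.01  # perturbation fraction

x = 2.0
for i in range(50):
    dx = delta * x                         # fractional perturbation
    x = x - dx * f(x) / (f(x + dx) - f(x)) # modified secant update

print(x)  # ≈ 2.414214, i.e. 1 + sqrt(2)
```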
