NPTEL: Solution of Nonlinear Algebraic Equations
Contents
1 Introduction
3 Newton's Method
3.1 Univariate Newton Type Methods
3.2 Multivariate Secant or Wegstein Iterations
3.3 Multivariate Newton's Method
3.4 Damped Newton Method
3.5 Quasi-Newton Method with Broyden Update
7 Summary
1 Introduction
Consider a set of n nonlinear simultaneous equations of the type

F(x) = 0 (2)

where x ∈ Rⁿ and F(·) : Rⁿ → Rⁿ represents an n × 1 function vector. This problem may have
no solution, an infinite number of solutions, or any finite number of solutions. In the module
on Problem Discretization using Approximation Theory, we have already introduced a basic
version of Newton's method, in which a sequence of approximate linear transformations
is constructed to solve equation (2). In this module, we develop this method further and also
discuss the conditions under which it converges to the solution. In addition, we discuss the
following two approaches that are frequently used for solving nonlinear algebraic equations:
(a) the method of successive substitutions and (b) unconstrained optimization. Towards the end
of the module, we briefly touch upon two fundamental issues related to nonlinear algebraic
equations, namely (a) the (local) existence and uniqueness of the solutions and (b) the notion of
conditioning of nonlinear algebraic equations.
2 Method of Successive Substitutions

One way of solving equation (2) is to transform it to the form

Ax = G(x) (3)

where A is an n × n matrix and

G(x) = [ g_1(x)  g_2(x)  ....  g_n(x) ]^T (4)

such that the solution of equation (3) is also a solution of equation (2). The nonlinear equation
(3) can be used to formulate an iteration sequence of the form

Ax^(k+1) = G(x^(k)) (5)

Given a guess x^(k), the R.H.S. is a fixed vector, say b^(k) = G(x^(k)), and computation of the
next guess x^(k+1) essentially involves solving the linear algebraic equation

Ax^(k+1) = b^(k)
at each iteration. Thus, the set of nonlinear algebraic equations is solved by formulating a
sequence of linear sub-problems. A computationally efficient way of solving such a sequence
of linear problems is to compute the LU decomposition of matrix A once and reuse the factors at every iteration.
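As an illustration, the following Python sketch (assuming NumPy and SciPy are available; the matrix A, the map G and the initial guess are hypothetical placeholders) factorizes A once and reuses the LU factors at every iteration of scheme (5):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def successive_substitution_lu(A, G, x0, tol=1e-8, max_iter=100):
    # Solve A x = G(x), eq. (5), reusing a single LU factorization of A.
    lu_piv = lu_factor(A)                  # factorize A only once
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        b = G(x)                           # fixed right hand side b(k) = G(x(k))
        x_new = lu_solve(lu_piv, b)
        if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x_new):
            return x_new
        x = x_new
    return x

# Hypothetical example: A = diag(2, 3), G(x) = [cos(x2), sin(x1)]
A = np.diag([2.0, 3.0])
G = lambda x: np.array([np.cos(x[1]), np.sin(x[0])])
print(successive_substitution_lu(A, G, np.zeros(2)))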
A special case of interest is when matrix A = I in equation (5). In this case, if the set of
equations given by (3) can be rearranged as

x_i = g_i(x) for i = 1, 2, ..., n (6)

then the method of successive substitution can be implemented using either of the following
iteration schemes:

Jacobi iterations

x_i^(k+1) = g_i[x^(k)] for i = 1, 2, ..., n (7)
Relaxation iterations

x_i^(k+1) = x_i^(k) + ω_k [ g_i(x^(k)) − x_i^(k) ] for i = 1, 2, ..., n (9)

where ω_k is a relaxation parameter. The iterations can be terminated when

||x^(k+1) − x^(k)|| / ||x^(k+1)|| < ε (10)
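A minimal Python sketch of the successive substitution scheme for the special case A = I is given below (NumPy assumed; the map g and the starting point are placeholders). Setting the relaxation factor to 1 recovers the Jacobi-type iteration (7), other values give the relaxation iteration (9), and the iterations are stopped using criterion (10):

import numpy as np

def successive_substitution(g, x0, omega=1.0, tol=1e-8, max_iter=200):
    # omega = 1: plain substitution, eq. (7); omega != 1: relaxation, eq. (9)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = x + omega * (g(x) - x)
        # termination criterion (10)
        if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x_new):
            return x_new
        x = x_new
    return x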
A major advantage of the iteration schemes discussed in this section is that they do
not involve gradient calculations. However, these schemes are likely to converge only when a good
initial guess, close to the solution, can be generated. We postpone the discussion on the
convergence of these methods to the end of this module.
3 Newton's Method
For a general set of simultaneous equations F(x) = 0, it may not always be possible to transform
them to the form Ax = G(x) by simple rearrangement of the equations. Even when it is possible,
the iterations may not converge. When the function vector F(x) is once differentiable, Newton's
method provides a way to transform an arbitrary set of equations F(x) = 0 to the form
Ax = G(x) using a Taylor series expansion. In this section, we introduce the univariate and
multivariate Newton's methods and their popular variants.
3.1 Univariate Newton Type Methods
In the module on Problem Discretization using Approximation Theory, we have already
introduced Newton's method. For the univariate case, i.e. for solving a scalar equation
f(x) = 0 where x ∈ R, it reduces to the following iteration equation

x^(k+1) = x^(k) − f(x^(k)) / [ df(x^(k))/dx ] (11)
The secant method is a variation of the univariate Newton's method, in which the derivative
is approximated using a Taylor series approximation of f(x^(k)) in the neighborhood of the
previous iterate x^(k−1). Thus, the term df(x^(k))/dx appearing in equation (11) is approximated as follows

df(x^(k))/dx ≈ [ f(x^(k)) − f(x^(k−1)) ] / [ x^(k) − x^(k−1) ] (12)
To initialize the secant iterations, we need two initial guesses x^(0) and x^(1). Note that
the secant method does not require f(x) to be a differentiable function. The convergence
properties of the secant method can be improved if f(x) is continuous on [x^(0), x^(1)] and f(x^(0))
and f(x^(1)) have opposite signs. In such a case, there is at least one root of f(x) in the interval
[x^(0), x^(1)], and this bracketing of the root can be maintained through the iterations. Let us assume
that we have

f(x^(k)) > 0 and f(x^(k−1)) < 0

When we move to compute x^(k+2), the iterations proceed in either of the two directions

Case f(x^(k+1)) < 0 :  x^(k+2) = x^(k+1) − f(x^(k+1)) [ x^(k+1) − x^(k) ] / [ f(x^(k+1)) − f(x^(k)) ]

Case f(x^(k+1)) > 0 :  x^(k+2) = x^(k+1) − f(x^(k+1)) [ x^(k+1) − x^(k−1) ] / [ f(x^(k+1)) − f(x^(k−1)) ]
This variation of the secant method, known as the regula falsi method, ensures that at least one
root remains bracketed by the successive iterates.
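The following Python sketch illustrates the regula falsi variant described above (a minimal implementation; the test function and interval are only an example):

def regula_falsi(f, a, b, tol=1e-10, max_iter=100):
    # Secant steps that always keep the root bracketed between a and b.
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for _ in range(max_iter):
        c = b - fb * (b - a) / (fb - fa)   # secant step using the bracketing points
        fc = f(c)
        if abs(fc) < tol:
            break
        if fa * fc < 0:                    # root lies in [a, c]
            b, fb = c, fc
        else:                              # root lies in [c, b]
            a, fa = c, fc
    return c

# Example: root of x^3 - x - 2 = 0 in [1, 2]
print(regula_falsi(lambda x: x**3 - x - 2.0, 1.0, 2.0))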
3.2 Multivariate Secant or Wegstein Iterations

The relaxation iterations (9) can be accelerated by using an iteration dependent relaxation
parameter. By this approach, the relaxation parameter is estimated by applying a secant
method (i.e. a local derivative approximation) independently to each component of x [2]. Let
us define f_i(x) as

f_i(x) = x_i − g_i(x)

and slopes s_i^(k) as follows

s_i^(k) = [ g_i(x^(k)) − g_i(x^(k−1)) ] / [ x_i^(k) − x_i^(k−1) ] (13)

for i = 1, 2, ..., n. The Wegstein iterations are then carried out as

x_i^(k+1) = x_i^(k) + ω_i^(k) [ g_i(x^(k)) − x_i^(k) ] for i = 1, 2, ..., n (14)

where ω_i^(k) = 1 / (1 − s_i^(k)). To avoid large changes, the value of ω_i^(k) is typically restricted
to an interval 0 < ω_i^(k) ≤ ω_max, and a typical value of ω_max is 5 [3].
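A sketch of the resulting bounded Wegstein iteration is given below (Python with NumPy assumed). The clipping of ω follows the recommendation above; the small thresholds used to guard against division by zero are implementation details, not part of the original scheme.

import numpy as np

def wegstein(g, x0, omega_max=5.0, tol=1e-8, max_iter=200):
    x_old = np.asarray(x0, dtype=float)
    g_old = g(x_old)
    x = g_old.copy()                           # first step: plain substitution
    for _ in range(max_iter):
        g_new = g(x)
        dx = x - x_old
        # component-wise secant slopes, eq. (13)
        safe_dx = np.where(np.abs(dx) > 1e-14, dx, 1.0)
        s = np.where(np.abs(dx) > 1e-14, (g_new - g_old) / safe_dx, 0.0)
        s = np.minimum(s, 0.99)                # keep 1 - s away from zero
        omega = np.clip(1.0 / (1.0 - s), 1e-3, omega_max)
        x_new = x + omega * (g_new - x)        # eq. (14)
        if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x_new):
            return x_new
        x_old, g_old, x = x, g_new, x_new
    return x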
3.3 Multivariate Newton's Method

For the multivariate case, expanding F(x) = 0 in a Taylor series in the neighborhood of the
current guess x^(k) and neglecting the higher order terms leads to the following set of linear
equations

J^(k) = [ ∂F(x^(k))/∂x ] ;  F^(k) = F(x^(k))

Δx^(k) = −[ J^(k) ]^(-1) F^(k) (15)

x^(k+1) = x^(k) + Δx^(k) (16)
The iterations are terminated when the convergence criteria

||F^(k+1)|| < ε_1

and/or

||x^(k+1) − x^(k)|| / ||x^(k+1)|| < ε_2 (19)

are satisfied.
This approach is called the simplified Newton's method.
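A bare-bones Python sketch of the Newton iteration (15)-(16), using the termination criteria discussed above, is shown below (NumPy assumed; the example system and tolerances are illustrative only):

import numpy as np

def newton_system(F, jac, x0, eps1=1e-10, eps2=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        dx = np.linalg.solve(jac(x), -Fx)   # solve J dx = -F rather than inverting J
        x = x + dx
        if (np.linalg.norm(F(x)) < eps1 or
                np.linalg.norm(dx) <= eps2 * np.linalg.norm(x)):
            break
    return x

# Illustrative example: x1^2 + x2^2 = 1 and x1 = x2
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])
print(newton_system(F, J, np.array([1.0, 0.5])))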
3.4 Damped Newton Method

In the damped Newton method, the Newton step is scaled by a step length parameter λ^(k),
i.e. x^(k+1) = x^(k) + λ^(k) Δx^(k) with 0 < λ^(k) ≤ 1. In practical problems, the damped Newton
algorithm often converges faster than the one that uses λ^(k) = 1. To see the rationale behind
this approach, consider a scalar function defined as follows [2]

Φ(x^(k+1)) = (1/2) F(x^(k+1))^T F(x^(k+1))
           = (1/2) F(x^(k) + λ Δx^(k))^T F(x^(k) + λ Δx^(k))
where Δx^(k) = −[ J^(k) ]^(-1) F^(k). Using a Taylor series expansion in the neighborhood of x^(k),
we have

Φ(x^(k+1)) = Φ(x^(k)) + λ [ ∇Φ(x^(k)) ]^T Δx^(k) + (λ²/2) [ Δx^(k) ]^T [ ∇²Φ(x^(k)) ] Δx^(k) + ...

Now, it is easy to see that

∇Φ(x^(k)) = [ ∂F(x^(k))/∂x ]^T F(x^(k)) = [ J^(k) ]^T F^(k)

and

[ ∇Φ(x^(k)) ]^T Δx^(k) = −[ F^(k) ]^T F^(k) = −2Φ(x^(k)) < 0

Now, if we let λ → 0 in the Taylor series expansion, the higher order terms become negligible and

Φ(x^(k+1)) − Φ(x^(k)) ≃ −2λ Φ(x^(k)) < 0

Thus, for sufficiently small λ, the Newton step will reduce Φ(x). This property forms the
basis of the damped Newton method. A popular approach to determine the step length λ is
the Armijo line search [2]. The algorithm for the damped Newton method together with the Armijo
line search (using typical values of the tuning parameters) is listed in Table 1.
Remark 1 Note that the Newton iterations can be expressed in form (3) as follows

x^(k+1) = x^(k) − [ ∂F(x^(k))/∂x ]^(-1) F(x^(k)) = G(x^(k)) (20)

i.e.

G(x) = x − [ ∂F/∂x ]^(-1) F(x)

It is easy to see that at the point x = x*, where F(x*) = 0, the iteration equation (20) has a
fixed point x* = G(x*).
Table 1: Damped Newton Algorithm with Armijo Line Search

INITIALIZE: x^(0), ε_1, ε_2, λ, k, k_max, α, β, δ_2
δ_1 = ||F^(0)||
WHILE [(δ_1 > ε_1) AND (δ_2 > ε_2) AND (k < k_max)]
    Δx^(k) = −[ J^(k)T J^(k) ]^(-1) J^(k)T F^(k)
    Set λ = 1, α = 0.1, β = 0.1
    x^(k+1) = x^(k) + λ Δx^(k)
    δ_1 = (1/2) [ ||F(x^(k+1))||_2² − ||F^(k)||_2² + 2αλ ||F^(k)||_2² ]
    WHILE [δ_1 > 0]
        λ_q = λ² ||F(x^(k))||_2² / [ (2λ − 1) ||F(x^(k))||_2² + ||F(x^(k) + λΔx^(k))||_2² ]
        λ = max { βλ, λ_q }
        x^(k+1) = x^(k) + λ Δx^(k)
        δ_1 = (1/2) [ ||F(x^(k+1))||_2² − ||F^(k)||_2² + 2αλ ||F^(k)||_2² ]
    END WHILE
    δ_1 = ||F(x^(k+1))||
    δ_2 = ||x^(k+1) − x^(k)|| / ||x^(k+1)||
    k = k + 1
END WHILE
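The following Python sketch implements the damped Newton iteration with a simple Armijo-type backtracking line search. It follows the spirit of Table 1 but, as an assumption for brevity, the step length is reduced by repeated halving instead of the quadratic interpolation used in the table, and the ordinary Newton step replaces the normal-equations form.

import numpy as np

def damped_newton(F, jac, x0, alpha=0.1, lam_min=1e-4, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        norm2 = Fx @ Fx                       # equals 2*Phi(x)
        if np.sqrt(norm2) < tol:
            break
        dx = np.linalg.solve(jac(x), -Fx)     # full Newton step
        lam = 1.0
        while True:
            x_new = x + lam * dx
            Fn = F(x_new)
            # Armijo sufficient-decrease test: ||F_new||^2 <= (1 - 2*alpha*lam)*||F||^2
            if Fn @ Fn <= (1.0 - 2.0 * alpha * lam) * norm2 or lam <= lam_min:
                break
            lam *= 0.5                        # backtrack
        x = x_new
    return x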
3.5 Quasi-Newton Method with Broyden Update

A major computational cost in Newton's method is the evaluation of the Jacobian at every
iteration. A simple way to reduce this cost is to compute the Jacobian only at the beginning
of the iterations. By this approach, the Newton step is computed as follows [6]

J^(0) = [ ∂F(x^(0))/∂x ] (21)

Δx^(k) = −[ J^(0) ]^(-1) F^(k) (22)

x^(k+1) = x^(k) + Δx^(k) (23)
There is another variation of this approach in which the Jacobian is updated periodically, but
at a much lower frequency than the iterations.

Alternatively, the quasi-Newton methods try to overcome the difficulty associated with
Jacobian calculations by generating approximate successive Jacobians using function vectors
evaluated at the previous iterations. While moving from iteration k to (k + 1), if ||x^(k+1) − x^(k)||
is not too large, then it can be argued that J^(k+1) is close to J^(k). In such a situation,
we can use the following rank-one update of the Jacobian matrix

J^(k+1) = J^(k) + y^(k) [ z^(k) ]^T (24)

Here, y^(k) and z^(k) are two vectors that depend on x^(k), x^(k+1), F^(k) and F^(k+1). To arrive
at the update formula, consider the Jacobian J^(k) that produces the step Δx^(k), i.e.

J^(k) Δx^(k) = −F^(k)

The update J^(k+1) is chosen so that

1. J^(k+1) behaves like J^(k) in directions perpendicular to Δx^(k); in other words, both J^(k) and J^(k+1) predict the same change in any direction perpendicular
to Δx^(k).

2. J^(k+1) predicts, for the step Δx^(k), the same change ΔF^(k) = F^(k+1) − F^(k) in the linear expansion, i.e.

F^(k+1) = F^(k) + J^(k+1) [ x^(k+1) − x^(k) ]

or

J^(k+1) Δx^(k) = ΔF^(k) (30)
Now, for a vector r perpendicular to Δx^(k), the update (24) gives

J^(k+1) r = J^(k) r + y^(k) [ z^(k) ]^T r

As we require

J^(k+1) r = J^(k) r (32)

we have

y^(k) [ z^(k) ]^T r = 0 (33)

Since Δx^(k) is perpendicular to r, we can choose z^(k) = Δx^(k). Substituting this choice of
z^(k) in equation (24) and post-multiplying equation (24) by Δx^(k), we get

J^(k+1) Δx^(k) = J^(k) Δx^(k) + y^(k) [ Δx^(k) ]^T Δx^(k) = ΔF^(k)

which yields

y^(k) = [ ΔF^(k) − J^(k) Δx^(k) ] / [ [ Δx^(k) ]^T Δx^(k) ] (36)

Thus, the Broyden update formula for the Jacobian is

J^(k+1) = J^(k) + [ ΔF^(k) − J^(k) Δx^(k) ] [ Δx^(k) ]^T / [ [ Δx^(k) ]^T Δx^(k) ] (37)

This can be further simplified using J^(k) Δx^(k) = −F^(k), which gives ΔF^(k) − J^(k) Δx^(k) = F^(k+1), i.e.

J^(k+1) = J^(k) + F^(k+1) [ Δx^(k) ]^T / [ [ Δx^(k) ]^T Δx^(k) ]
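A quasi-Newton solver based on the Broyden update (37) can be sketched in Python as follows (NumPy assumed; the initial Jacobian is approximated here by forward finite differences, which is an implementation choice rather than part of the derivation):

import numpy as np

def broyden_solve(F, x0, tol=1e-10, max_iter=100, eps=1e-7):
    x = np.asarray(x0, dtype=float)
    n = len(x)
    Fx = F(x)
    # initial Jacobian from forward finite differences (one-time cost)
    J = np.column_stack([(F(x + eps * np.eye(n)[:, i]) - Fx) / eps for i in range(n)])
    for _ in range(max_iter):
        dx = np.linalg.solve(J, -Fx)
        x_new = x + dx
        F_new = F(x_new)
        if np.linalg.norm(F_new) < tol:
            return x_new
        dF = F_new - Fx
        J = J + np.outer(dF - J @ dx, dx) / (dx @ dx)   # Broyden update, eq. (37)
        x, Fx = x_new, F_new
    return x

# Illustrative example (same system as used above)
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
print(broyden_solve(F, np.array([1.0, 0.5])))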
Example 2 Consider a non-isothermal CSTR in which a first order reaction
is taking place. The steady state model for the non-isothermal CSTR is given as follows:

0 = (F/V)(C_A0 − C_A) − k_0 exp(−E/RT) C_A

0 = (F/V)(T_0 − T) + [ (−ΔH_r) k_0 / (ρ C_p) ] exp(−E/RT) C_A − Q/(V ρ C_p)

Q = [ a F_c^(b+1) / ( F_c + a F_c^b / (2 ρ_c C_pc) ) ] (T − T_cin)
The model parameters are listed in Table 1, Example 4, in the Module on Solving Nonlinear ODE-IVPs.
The set of model parameters considered here corresponds to the stable steady state. The
problem at hand is to find the steady state C_A = 0.265 and T = 393.954 starting from an initial
guess of the steady state. Figures 1 to 4 show the progress of Newton's method, Newton's method
with the initial Jacobian, Newton's method with the Broyden update and the damped Newton method
for different initial conditions. It may be observed that the damped Newton method exhibits the
most well behaved iterations among all the variants considered. The iterations remain within
physically meaningful ranges of values, and it also ensures smoother convergence to the solution
(see Figure 4). Newton's method with the initial Jacobian also converges in all the
cases, but with a significantly slower rate of convergence and a large number of iterations. It may be
noted that Newton's method with the Broyden update and the base Newton's method can move into
infeasible regions of the state space, i.e. negative concentrations or absurdly large intermediate
temperature guesses (see Figures 3 and 4). For the cases demonstrated in the figures, these
methods are somehow able to recover and finally converge to the solution. However, such
behavior can potentially lead to divergence of the iterations.
Figure 1: CSTR Example: Progress of iterations of variants of Newton's method - Initial
Condition 1
Figure 3: CSTR Example: Progress of iterations of variants of Newton's method - Initial
Condition 3
4 Solution Using Unconstrained Optimization

The problem of solving F(x) = 0 can also be posed as an unconstrained optimization problem
in which we minimize the scalar objective function

Φ(x) = (1/2) F(x)^T F(x)

The gradient of Φ(x) is

∇Φ(x) = [ ∂F(x)/∂x ]^T F(x)

Let us ignore the degenerate case in which the matrix [ ∂F(x)/∂x ] is singular and the vector F(x) belongs
to the null space of this matrix. Then, the necessary condition for optimality ∇Φ(x) = 0 is satisfied when

F(x) = 0 (43)

which also corresponds to the global minimum of Φ(x). The resulting nonlinear optimization
problem can be solved using the conjugate gradient method or Newton's optimization
method. Another popular approach to solving the optimization problem is the Levenberg-
Marquardt method, which is known to work well for solving nonlinear algebraic equations.
In this section, we present details of the conjugate gradient method and the Levenberg-
Marquardt method, which is based on Newton's method.

The gradient based methods have a definite advantage over Newton's method in
situations where J^(k) becomes singular, since the computation of the search direction does not
involve the inverse of the Jacobian matrix.
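As an illustration of this optimization viewpoint, the sketch below minimizes Φ(x) = (1/2) F(x)^T F(x) with SciPy's conjugate gradient optimizer (scipy.optimize.minimize with method="CG"); the example system and the starting point are placeholders.

import numpy as np
from scipy.optimize import minimize

F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])

phi = lambda x: 0.5 * F(x) @ F(x)          # objective Phi(x)
grad_phi = lambda x: J(x).T @ F(x)         # gradient of Phi(x)

res = minimize(phi, x0=np.array([2.0, 0.0]), jac=grad_phi, method="CG")
print(res.x, F(res.x))                     # F(res.x) should be close to zero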
4.2 Newton and Quasi-Newton Methods
The gradient based methods tend to become slow in the neighborhood of the optimum. This
difficulty can be alleviated if the local Hessian is used for computing the search direction.
The necessary condition for optimality of a scalar function Φ(x) is

∇Φ(x) = 0 (44)

In Newton's optimization method, the search direction is obtained by solving

H^(k) Δx^(k) = −∇Φ(x^(k))

where H^(k) = ∇²Φ(x^(k)) is the Hessian, and the iterates are generated as

x^(k+1) = x^(k) + λ_k Δx^(k)

where λ_k is the step length parameter. In order that Δx^(k) is a descent direction, it should
satisfy the condition

[ ∇Φ(x^(k)) ]^T Δx^(k) < 0 (46)

or

[ ∇Φ(x^(k)) ]^T [ H^(k) ]^(-1) [ ∇Φ(x^(k)) ] > 0 (47)

i.e. in order that Δx^(k) is a descent direction, the Hessian H^(k) should be a positive definite
matrix. This method has good convergence properties, but it demands a large amount of computation,
i.e. solving a system of linear equations and evaluating the Hessian at each iteration. The steps
involved in Newton's unconstrained optimization scheme are summarized in Table 2.
A major disadvantage of Newton's method is the need to compute the Hessian at each iteration.
The quasi-Newton methods overcome this difficulty by constructing an approximate
Hessian from the gradient information available at the successive iterations. One widely
used algorithm of this type is the variable metric (or Davidon-Fletcher-Powell) method.
Let us define the matrix

L = H^(-1)
Table 2: Newton's Method for Unconstrained Optimization

INITIALIZE: x^(0), ε, k_max, λ^(0)
k = 0
δ = 100
WHILE [(δ > ε) AND (k < k_max)]
    Solve H^(k) s^(k) = −∇Φ^(k)
    λ_k = argmin_λ Φ( x^(k) + λ s^(k) )
    x^(k+1) = x^(k) + λ_k s^(k)
    δ = || ∇Φ(x^(k+1)) ||_2
    k = k + 1
END WHILE
The matrix L is built up iteratively from the gradient information, starting from some initial guess, which is usually a positive definite matrix. A typical choice is L^(0) = I.
Table 3: Levenberg-Marquardt Algorithm

INITIALIZE: x^(0), ε, k_max, λ^(0)
k = 0
δ = 100
WHILE [(δ > ε) AND (k < k_max)]
    STEP 1: Compute H^(k) and ∇Φ[x^(k)]
    STEP 2: Solve [ H^(k) + λ^(k) I ] s^(k) = −∇Φ[x^(k)]
            x^(k+1) = x^(k) + s^(k)
    IF [ Φ[x^(k+1)] < Φ[x^(k)] ]
        λ^(k+1) = (1/2) λ^(k)
        δ = || ∇Φ[x^(k+1)] ||
        k = k + 1
    ELSE
        λ^(k) = 2 λ^(k)
        GO TO STEP 2
    END IF
END WHILE
Here λ is used to set the search direction. To begin the search, a large value of λ (≈ 10⁴) is
selected so that

[ H^(0) + λ_0 I ] ≈ [ λ_0 I ]

Thus, for sufficiently large λ, the search direction s^(k) is in the negative gradient
direction, i.e. s^(0) ≈ −∇Φ^(0)/λ_0. On the other hand, as λ_k → 0, the step s^(k) approaches the Newton
step; thus, as λ is reduced, the search direction moves from the steepest descent direction
towards the Newton direction. With intermediate values of λ, we get a step that is intermediate
between the Newton step and the gradient step. The steps involved in the implementation of
the basic version of the Levenberg-Marquardt algorithm are summarized in Table 3. The main
advantages of the Levenberg-Marquardt method are its simplicity and its excellent convergence in the
neighborhood of the solution. However, it is necessary to compute the Hessian matrix H^(k), and a
set of linear equations has to be solved, possibly several times, at each iteration before fixing s^(k). Also,
when λ_k is large initially, the step s^(k) ≈ −∇Φ^(k)/λ_k is small and this can result in slow
progress of the iterations.
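The sketch below follows the Levenberg-Marquardt loop of Table 3 in Python (NumPy assumed). As an assumption, the Hessian of Φ is replaced by the Gauss-Newton approximation J^T J, which avoids second derivatives of F; Table 3 itself is written in terms of the exact Hessian.

import numpy as np

def levenberg_marquardt(F, jac, x0, lam=1.0e4, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    phi = lambda z: 0.5 * F(z) @ F(z)
    for _ in range(max_iter):
        Jx = jac(x)
        g = Jx.T @ F(x)                          # gradient of Phi
        if np.linalg.norm(g) < tol:
            break
        H = Jx.T @ Jx                            # Gauss-Newton Hessian approximation
        while True:
            s = np.linalg.solve(H + lam * np.eye(len(x)), -g)
            if phi(x + s) < phi(x):
                x = x + s
                lam *= 0.5                       # successful step: relax damping
                break
            lam *= 2.0                           # unsuccessful step: increase damping
            if lam > 1e12:                       # safeguard against stalling
                return x
    return x

# Illustrative example
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])
print(levenberg_marquardt(F, J, np.array([2.0, -1.0])))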
5 Condition Number of Nonlinear Set of Equations [7]
The concept of the condition number can be easily extended to analyze the numerical conditioning of
a set of nonlinear algebraic equations. Consider nonlinear algebraic equations of the form

F(x, u) = 0 ;  x ∈ Rⁿ, u ∈ Rᵐ

where u is a vector of parameters. Let x denote a solution for the nominal parameters u, and let
x + Δx denote the solution obtained when the parameters are perturbed to u + Δu, i.e.

F(x + Δx, u + Δu) = 0

The condition number of the set of equations is defined as

C(x) = sup_Δu [ ( ||Δx|| / ||x|| ) / ( ||Δu|| / ||u|| ) ]

so that

||Δx|| / ||x|| ≤ C(x) ||Δu|| / ||u||

If the solution does not depend continuously on u, then C(x) becomes infinite and such
systems are called (numerically) unstable systems. Systems with large condition numbers
are more susceptible to computational errors.
For example, consider the scalar equation x − e^u = 0, whose solution is x = e^u. Then

Δx/x = ( e^(u+Δu) − e^u ) / e^u = e^(Δu) − 1

and

C(x) = sup_Δu [ | e^(Δu) − 1 | / ( |Δu| / |u| ) ]

which, in the limit of small perturbations Δu, gives

C(x) = |u|

Thus, the conditioning of this equation degrades as |u| becomes large.
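The condition number of this example can also be checked numerically by perturbing u and comparing the relative change in the solution x = e^u with the analytical value |u| (a small illustrative check in Python):

import numpy as np

u, du = 3.0, 1e-6
x = np.exp(u)
dx = np.exp(u + du) - x
ratio = (abs(dx) / abs(x)) / (abs(du) / abs(u))
print(ratio, abs(u))    # both are approximately 3, i.e. C(x) = |u|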
6 Existence of Solutions and Convergence of Iterative
Methods [12]
If we critically view the methods presented for solving equation (2), it is clear that this problem,
in general, cannot be solved in its original form. To generate a numerical approximation
to the solution of equation (2), this equation is further transformed to formulate an iteration
sequence as follows

x^(k+1) = G[ x^(k) ] ;  k = 0, 1, 2, ...... (52)

where { x^(k) : k = 0, 1, 2, ...... } is a sequence of vectors in the vector space under consideration.
The iteration equation is formulated in such a way that the solution x* of equation (52) also
solves equation (2), i.e.

x* = G[x*]  ⟺  F(x*) = 0

For example, in Newton's method, we have

G[x] ≡ x − [ ∂F/∂x ]^(-1) F(x)

which defines a mapping from a Banach space X into itself, i.e. G(.) : X → X.
Here U(x^(0), r) denotes a closed ball of radius r centered at the initial guess x^(0). The operator
G(.) is said to be a contraction mapping of U(x^(0), r) if

|| G(x^(1)) − G(x^(2)) || ≤ θ || x^(1) − x^(2) || ,  0 ≤ θ < 1

for all x^(1), x^(2) ∈ U(x^(0), r). The quantity θ is called the contraction constant of G on U(x^(0), r).
More generally, G(.) is a contraction on a set S if

|| G(x) − G(x̃) || ≤ θ || x − x̃ ||  with 0 ≤ θ < 1  for all x, x̃ ∈ S
When the map G(.) is differentiable, an exact characterization of the contraction property
can be developed.

Lemma 6 Let the operator G(.) on a Banach space X be differentiable in U(x^(0), r). Operator
G(.) is a contraction of U(x^(0), r) if and only if

|| ∂G/∂x || ≤ θ < 1  for every x ∈ U(x^(0), r)
The contraction mapping theorem is stated next. Here, x^(0) refers to the initial guess
vector in the iteration process given by equation (53).

Theorem 7 [12, 9] If G(.) maps U(x^(0), r) into itself and G(.) is a contraction mapping on
this set with contraction constant θ, and if

r ≥ r_0 ,  r_0 = || G(x^(0)) − x^(0) || / (1 − θ)

then:

1. G(.) has a fixed point x* in U(x^(0), r) such that x* = G(x*)

2. x* is unique in U(x^(0), r)

3. The sequence { x^(k) } generated by the iteration x^(k+1) = G( x^(k) ) converges to x* with

|| x^(k) − x* || ≤ θ^k || x^(0) − x* ||

The proof of this theorem can be found in Rall [12] and Linz [9].
Example 8 [9] Consider the simultaneous nonlinear equations

z + (1/4) y² = 1/16 (54)

(1/3) sin(z) + y = 1/2 (55)

We can form an iteration sequence

z^(k+1) = 1/16 − (1/4) [ y^(k) ]² (56)

y^(k+1) = 1/2 − (1/3) sin( z^(k) ) (57)
Using the ∞-norm in the unit ball U(x^(0) = 0, 1) in the neighborhood of the origin, we have

|| G( x^(i) ) − G( x^(j) ) || = max { (1/4) | [y^(i)]² − [y^(j)]² | , (1/3) | sin(z^(i)) − sin(z^(j)) | } (58)

≤ max { (1/2) | y^(i) − y^(j) | , (1/3) | z^(i) − z^(j) | } (59)

≤ (1/2) || x^(i) − x^(j) || (60)

Thus, G(.) is a contraction map with θ = 1/2 and the system of equations has a unique
solution in the unit ball U(x^(0) = 0, 1), i.e. −1 ≤ z ≤ 1 and −1 ≤ y ≤ 1. The iteration
sequence converges to this solution.
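The contraction behavior of Example 8 is easy to verify numerically: iterating (56)-(57) from the origin drives the residuals of (54)-(55) to zero (a short Python check, NumPy assumed):

import numpy as np

z, y = 0.0, 0.0
for _ in range(30):
    # Jacobi-style update: both components use values from the previous iteration
    z, y = 1.0/16.0 - 0.25 * y**2, 0.5 - np.sin(z) / 3.0

print(z, y)
print(z + 0.25 * y**2 - 1.0/16.0, np.sin(z) / 3.0 + y - 0.5)   # residuals ~ 0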
Now consider the simultaneous nonlinear equations

x − 2y² = −1 (61)

3x² − y = 2 (62)

which have a solution at (x, y) = (1, 1). A direct rearrangement of these equations,

x^(k+1) = 2 [ y^(k) ]² − 1
y^(k+1) = 3 [ x^(k) ]² − 2

is not a contraction mapping near (1, 1), and the iterations do not converge even if we start
from a value close to the solution. On the other hand, the rearrangement

x^(k+1) = √[ ( y^(k) + 2 ) / 3 ] (65)

y^(k+1) = √[ ( x^(k) + 1 ) / 2 ] (66)

is a contraction mapping and the solution converges if the starting guess is close to the solution.
6.1 Convergence of Successive Substitution Schemes [4]
Using either the successive substitution approach or Newton's method, we generate an iteration
sequence

x^(k+1) = G( x^(k) ) (67)

which has a fixed point

x* = G(x*) (68)

at the solution of F(x*) = 0. Defining the error

e^(k) = x^(k) − x* (69)

and expanding G( x^(k) ) in a Taylor series in the neighborhood of x* (retaining only the first
order term), we obtain

e^(k+1) ≈ [ ∂G/∂x ]_(x = x^(k)) e^(k) (72)

Using the definition of the induced matrix norm, we can write

|| e^(k+1) || / || e^(k) || ≤ || [ ∂G/∂x ]_(x = x^(k)) || (73)

It is easy to see that the successive errors will reduce in magnitude if the following condition
is satisfied at each iteration:

|| [ ∂G/∂x ]_(x = x^(k)) || < 1 for k = 1, 2, .... (74)
Applying the contraction mapping theorem (refer to Appendix A for details), a sufficient condition
for convergence of the iterations in the neighborhood of x* can be stated as

|| ∂G/∂x ||_1 < 1

or

|| ∂G/∂x ||_∞ < 1
Note that this is only a sufficient condition; if the condition is not satisfied, the
iteration scheme may or may not converge. Also note that the introduction of the step length
parameter λ^(k) in the Newton step, i.e. x^(k+1) = x^(k) + λ^(k) Δx^(k), modifies the iteration
map G(.) and thereby its convergence behavior. If J = [ ∂G/∂x ] denotes the Jacobian of the
iteration map G(.), then

|| G(x) − G(x̃) ||_p ≤ || J ||_p || x − x̃ ||_p

and, if || J ||_p < 1 holds in the region of interest, then G(.) is a contraction mapping with
contraction constant L = || J ||_p. Also note that, for the 2-norm, the following inequality can be used

|| G(x) − G(x̃) ||_2 ≤ || J ||_F || x − x̃ ||_2

where

|| J ||_F = [ Σ_(i,j) (J_(i,j))² ]^(1/2)

is the Frobenius norm.
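These norm-based contraction checks can be carried out numerically. For instance, for the map G(.) of Example 8, the Jacobian is [[0, −y/2], [−cos(z)/3, 0]], and a brute-force scan over the unit ball confirms that its 1-, ∞- and Frobenius norms all remain below one (Python with NumPy assumed):

import numpy as np

def jacobian_G(z, y):
    # Jacobian of G(z, y) = [1/16 - y^2/4, 1/2 - sin(z)/3] from Example 8
    return np.array([[0.0, -y / 2.0], [-np.cos(z) / 3.0, 0.0]])

worst = {"1": 0.0, "inf": 0.0, "fro": 0.0}
for z in np.linspace(-1.0, 1.0, 21):
    for y in np.linspace(-1.0, 1.0, 21):
        Jg = jacobian_G(z, y)
        worst["1"] = max(worst["1"], np.linalg.norm(Jg, 1))
        worst["inf"] = max(worst["inf"], np.linalg.norm(Jg, np.inf))
        worst["fro"] = max(worst["fro"], np.linalg.norm(Jg, "fro"))
print(worst)   # all values are below 1, so G is a contraction on the unit ball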
For another illustration, consider a region S in which the Jacobian of G(.) satisfies || J ||_1 = 1/2,
|| J ||_∞ = 1 and || J ||_F = √13/6. Then G(.) is a contraction mapping for the norms ||.||_1 and ||.||_2
in the region S and, from the contraction mapping theorem, it follows that the
system of equations x = G(x) has a unique solution in the region S.
An important result on the convergence of Newton's method is the Kantorovich theorem.
Under conditions that bound || [ ∂F(x^(0))/∂x ]^(-1) || and || ∂²F(x)/∂x² || in a closed ball U(x^(0), 2η_0),
and that require a measure of the quality of the initial guess, h_0, to satisfy h_0 < 1/2, the theorem
guarantees that the Newton iteration sequence exists for all k ≥ 0 and converges to the solution of
F(x) = 0, which exists and is unique in U(x^(0), 2η_0).

The conditions of the theorem admit a simple geometric interpretation. The change in the Jacobian
of F(.) can be bounded as

|| ∂F(x)/∂x − ∂F(x^(0))/∂x || ≤ sup_(0≤λ≤1) || ∂²F( λx + (1 − λ)x^(0) )/∂x² ||  || x − x^(0) ||
Figure 5: CSTR Example: Basins of convergence for different variants of Newton's method.
Green dots represent initial conditions that lead to convergence of iterations while the black
cross represents the solution.
Multiplying both sides by || [ ∂F(x^(0))/∂x ]^(-1) ||, we have

|| [ ∂F(x^(0))/∂x ]^(-1) [ ∂F(x)/∂x − ∂F(x^(0))/∂x ] ||
≤ || [ ∂F(x^(0))/∂x ]^(-1) || sup_(0≤λ≤1) || ∂²F( λx + (1 − λ)x^(0) )/∂x² ||  || x − x^(0) ||
When the conditions of the Kantorovich theorem are satisfied, it follows that

|| [ ∂F(x^(0))/∂x ]^(-1) [ ∂F(x)/∂x − ∂F(x^(0))/∂x ] || ≤ 2h_0 < 1

The term on the L.H.S. of the inequality represents the magnitude of the relative change in the
Jacobian of the operator F(.) in the ball U(x^(0), 2η_0). The Kantorovich theorem thus asserts that Newton's
method converges if the relative change of the Jacobian in the ball U(x^(0), 2η_0) is less than 100%.
Example 12 Basins of Attraction for CSTR Example: The CSTR system described
in Example 2 was studied to understand the convergence behavior of the variants of Newton's
method. Iterations were started from various initial conditions in a box around the steady
state solution, and the progress of the iterations towards the solution was recorded. Figure 5 compares
the sets of initial conditions starting from which the respective methods converge to the solution.
In each box, green dots represent initial conditions that lead to convergence of the iterations. As
evident from this figure, Newton's method with the initial Jacobian has a smaller basin of
convergence. The damped Newton method appears to have the largest basin of convergence.
7 Summary
In these lecture notes, we have developed methods for efficiently solving nonlinear algebraic
equations. These methods can be classified as derivative free and derivative based methods.
Issues related to the existence and uniqueness of solutions and the convergence of the iterative
schemes have also been discussed briefly.
References
[1] Bazaraa, M. S., Sherali, H. D., Shetty, C. M.; Nonlinear Programming. John Wiley, 1979.
[3] Gupta, S. K.; Numerical Methods for Engineers. Wiley Eastern, New Delhi, 1995.
[4] Gourdin, A. and M. Boumahrat; Applied Numerical Methods. Prentice Hall India, New
Delhi.
[5] Strang, G.; Linear Algebra and Its Applications. Harcourt Brace Jovanovich College
Publishers, New York, 1988.
[8] Lindfield, G. and J. Penny; Numerical Methods Using MATLAB. Prentice Hall, 1999.
[9] Linz, P.; Theoretical Numerical Analysis, Dover, New York, 1979.
[11] Rao, S. S., Optimization: Theory and Applications, Wiley Eastern, New Delhi, 1978.
[12] Rall, L. B.; Computational Solutions of Nonlinear Operator Equations. John Wiley,
New York, 1969.