Traditional Optimization Methods: An Overview
A. D. Bhatt
1 Introduction
There are various books available for a detailed study of optimization methods in
engineering. The present article, however, gives a brief description of a number of
popular optimization algorithms used in engineering design and decision-making
problems. The algorithms are taken from the book "Optimization for Engineering Design:
Algorithms and Examples" (Deb, 1995), and some parts of these notes are taken from the
class notes for a course in optimization techniques at I.I.T. Kanpur (Dr. Kalyanmoy
Deb).
First, a typical non-linear optimization problem is described. Since single-variable
optimization techniques are useful as building blocks in solving multi-variable
problems, the interval-halving method is described in brief. Direct search methods and
a feasible direction method are then described in detail. Finally, a mathematical
nonlinear programming problem is solved using both the penalty function method and the
feasible direction method.
2 Optimization Methods
Optimization methods are finding increasing popularity in engineering design activities,
both in industry and academia, partly because of the availability and affordability of
computers. A number of optimization software packages are now being used in industry and
academia alike. Since the general interest is not to find just any feasible design but a
competitive design (a design that is optimal in some sense), some CAD software packages
now also come equipped with optimization routines.
A typical optimization problem, generally known as a non-linear programming (NLP)
problem, is stated in terms of the design (or decision) variables x = (x1, x2, ..., xN)T:

Minimize f(x)
Subject to gj(x) >= 0,  j = 1, 2, ..., J
           hk(x) = 0,   k = 1, 2, ..., K        (1)
           xiL <= xi <= xiU,  i = 1, 2, ..., N
The objective function f(x) is usually the overall cost involved, the overall energy
required, the total profit achievable, or a combination of several such objectives. The
above NLP is a minimization problem; a maximization problem can also be represented in
this form using the duality principle, which states that the problem of maximizing a
constrained objective F(x) is the same as the problem of minimizing -F(x). Usually an
NLP contains a number of inequality and equality constraints, which arise from space
limitations, from the physics of the problem, or from critical engineering analysis of
the problem. In the above problem, J inequality (greater-than-or-equal-to type) and K
equality constraints are considered. In most situations, the lower and upper limits of
each variable are known. Thus, the above NLP represents the problem of finding a set of
N decision variables (within the specified variable bounds) so that the objective
function f(x) is minimized and all inequality and equality constraints are satisfied. A
solution that satisfies all constraints and variable bounds is called a feasible
solution; other solutions are called infeasible solutions. An inequality constraint
gj(x) is said to be active if gj(x) = 0.
There are a large number of optimization algorithms; most of them can be broadly
categorized into two groups:
• Direct search methods
• Gradient-based methods
Direct search methods use only objective function values and do not use any gradient
information; they are therefore more flexible and can be applied to a broad class of
problems. Gradient-based methods require derivative information of the objective
function and/or constraints. Most of these optimization methods begin with an initial
point, say x(0), which may lie in the feasible or the infeasible region depending on the
algorithm, and then improve the solution through a series of successive iterations.
Usually a search direction d is chosen from the starting point x(0), and a point that
corresponds to a minimum function value along that direction is found using a
one-dimensional search technique. If the new point found is x(1), a new search direction
is computed at this point and another one-dimensional search is performed to locate the
best point in that direction. This process is continued for a number of iterations until
a set of predefined convergence criteria is satisfied. The procedure is illustrated in
figure 1. The optimization literature contains a number of one-dimensional search
methods, which again can be classified into two groups depending on whether or not
gradient information is used. Most optimization algorithms vary in the way the search
direction d is chosen.
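The loop just described can be sketched in a few lines of Python. This is an illustrative fragment, not code from the source: steepest descent supplies the search direction d, a simple backtracking rule stands in for the one-dimensional search, and the test function is a made-up two-variable quadratic.

```python
import numpy as np

def minimize_by_line_search(f, grad, x0, tol=1e-6, max_iter=200):
    """Generic loop: pick a search direction d, do a 1-D search along it, repeat."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                      # steepest-descent choice of d
        if np.linalg.norm(d) < tol:       # convergence criterion
            break
        alpha = 1.0                       # backtracking one-dimensional search along d
        while f(x + alpha * d) > f(x) - 1e-4 * alpha * np.dot(d, d):
            alpha *= 0.5
        x = x + alpha * d                 # move to the best point found along d
    return x

# Hypothetical example: f(x) = (x1 - 3)^2 + 2*(x2 + 1)^2, minimum at (3, -1)
f = lambda x: (x[0] - 3)**2 + 2 * (x[1] + 1)**2
grad = lambda x: np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])
x_star = minimize_by_line_search(f, grad, [0.0, 0.0])
```

Each pass through the loop is one iteration of the process shown in figure 1: a direction, then a one-dimensional search along it.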
2.1 Direct Search Methods
Direct search methods vary in the way the search directions are used. Some methods use
search directions that depend on the solutions of previous iterations, and some methods
use random search directions. Powell's method (Reklaitis, Ravindran & Ragsdell, 1989)
uses conjugate directions, whereas the Hooke and Jeeves method uses a history of
previous points to create search directions adaptively. Random search methods are simple
in principle but, in general, require more function evaluations than non-random methods
to reach a desired accuracy.
The most popular of the direct search methods are transformation methods, where the
constrained NLP problem is transformed into an unconstrained optimization problem by
explicitly penalizing the objective function for any constraint violation. The NLP
stated in equation 1 can be transformed into the following unconstrained function:
P(x, R) = f(x) + R Σ(j=1..J) 1/gj(x) + (1/R) Σ(k=1..K) [hk(x)]²
An unconstrained optimization procedure (the Hooke and Jeeves method, for example) is
performed in successive stages for different values of R, each time using the solution
obtained in the previous stage as the initial point for the current stage. The value of
R is usually started at a large value and reduced towards zero. Because of the
complexities involved in other methods, penalty function methods are very popular in
engineering applications (Reklaitis, Ravindran & Ragsdell, 1989).

Figure 1: A typical optimization process. The initial point is x(1) and the initial
search direction is d(1). The search process continues by finding an optimal point
(x(2)) along d(1) and then searching along another direction d(2) from x(2).
(Courtesy: Dr. K. Deb).
Algorithm:
Step 1
Choose termination parameters ε2 and ε3, an initial point x(0), an initial penalty
parameter R(0), and an update factor C. Set t = 0.
Step 2
Form P(x(t), R(t)) = f(x(t)) + Ω(R(t), g(x(t)), h(x(t))).
Step 3
Find x(t+1) such that P(x(t+1), R(t)) is a minimum for the fixed value of R(t). Use ε2
to terminate the unconstrained search.
Step 4
Is |P(x(t+1), R(t)) − P(x(t), R(t−1))| ≤ ε3?
If yes, set xT = x(t+1) and terminate.
Else go to step 5.
Step 5
Choose R(t+1) = C·R(t) according to a prescribed update rule. Set t = t + 1 and go to
step 2.
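Steps 1-5 can be sketched in Python. This is an illustrative sketch, not code from the source: it uses the exterior bracket-operator penalty R·Σ[g(x)]² (described later in section 3.1) instead of the interior 1/gj term shown above, so R is increased rather than decreased, and scipy's BFGS routine stands in for the unconstrained minimizer of step 3. Only inequality constraints are handled, and the example problem is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def penalty_method(f, gs, x0, R0=0.001, C=10.0, eps=1e-3, max_stages=20):
    """Successive unconstrained minimizations of P(x, R) = f(x) + R * sum(<g_j>^2)."""
    def P(x, R):
        # bracket operator: only violated constraints (g < 0) are penalized
        violation = sum(min(0.0, g(x))**2 for g in gs)
        return f(x) + R * violation

    x, R = np.asarray(x0, dtype=float), R0
    P_prev = P(x, R)
    for _ in range(max_stages):
        # Step 3: minimize P for fixed R, warm-started from the previous stage
        x = minimize(lambda z: P(z, R), x, method="BFGS").x
        # Step 4: terminate when successive penalized values stop changing
        if abs(P(x, R) - P_prev) <= eps:
            return x
        P_prev = P(x, R)
        R *= C                            # Step 5: update the penalty parameter
    return x

# Hypothetical example: minimize x1^2 + x2^2 subject to x1 + x2 - 1 >= 0
f = lambda x: x[0]**2 + x[1]**2
g = lambda x: x[0] + x[1] - 1.0
x_star = penalty_method(f, [g], x0=[2.0, 2.0])   # constrained optimum is (0.5, 0.5)
```

Because each stage starts from the previous stage's solution, the distortion of the penalized function between stages stays small, which is what makes the sequence converge.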
2.2 Gradient-based Methods
Even though gradient-based methods require derivative information, the derivatives can
be evaluated numerically if they are not available in closed form. The Fletcher and
Reeves method uses conjugate gradients as search directions, and the Davidon-Fletcher-
Powell (DFP) method approximates Newton's method by adaptively building an estimate of
the underlying Hessian matrix.
Figure 2: Any direction in the hatched region is a descent direction. (Courtesy: Dr. K.
Deb).
The gradient vector can be computed numerically using the central difference technique.
For an N-variable objective function, the gradient vector at any point x(t) (represented
by ∇f(x(t))) can be calculated as follows:

∇f(x(t)) = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xN)T, evaluated at x(t)

The first-order partial derivatives can be calculated using the following equation:

f'(x(t)) = [f(x(t) + ∆x(t)) − f(x(t) − ∆x(t))] / (2∆x(t))

The parameter ∆x(t) is usually a small value, which can be assigned as follows:

∆x(t) = 0.01|x(t)|, if |x(t)| > 0.01
      = 0.01,       otherwise
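The central-difference rule above translates directly into code (an illustrative transcription, not from the source; the check at the end uses the example objective function from section 3):

```python
import numpy as np

def numerical_gradient(f, x):
    """Central-difference gradient with the per-component step size given above."""
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for i in range(len(x)):
        # delta_x = 0.01*|x_i| if |x_i| > 0.01, else 0.01
        dx = 0.01 * abs(x[i]) if abs(x[i]) > 0.01 else 0.01
        e = np.zeros_like(x)
        e[i] = dx
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * dx)
    return grad

# Check against the analytical gradient of f(x) = (x1 - 1.5)^2 + (x2 - 4)^2
f = lambda x: (x[0] - 1.5)**2 + (x[1] - 4.0)**2
g = numerical_gradient(f, np.array([5.0, 5.0]))   # exact gradient here is (7, 2)
```

For a quadratic function the central difference is exact up to rounding error, which makes it a convenient sanity check.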
There exist a number of gradient-based methods that force the search directions to be
such that the objective function value improves along them. If gradient information is
available, such descent search directions are easy to obtain. The function value
increases the most in the direction ∇f(x(t)). For the NLP stated in equation 1, a
descent direction d is one for which ∇f(x(t))T d ≤ 0. Figure 2 shows the possible
regions for a direction to be descent. Any direction vector that falls in the hatched
region makes a negative dot product with the gradient at the point x(t). Thus, any point
along any of these directions is going to have a smaller function value than at the
point x(t). On the other hand, for inequality constraints the feasible search directions
are those for which ∇gj(x(t))T d ≥ 0. Both these conditions can be derived by using only
the first two terms of the Taylor series expansions of the objective function and
constraints (Arora, 1989).
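The two dot-product conditions are easy to check numerically. An illustrative check (not from the source), using the objective function and constraint g2 of the example problem solved in section 3, and the direction found there:

```python
import numpy as np

grad_f = lambda x: np.array([2 * (x[0] - 1.5), 2 * (x[1] - 4.0)])
grad_g2 = lambda x: np.array([2.0, -1.0])   # gradient of g2(x) = 2*x1 - x2 - 1

x = np.array([1.0, 1.0])                    # feasible point with g2 active
d = np.array([1.0, 1.0 / 7.0])              # direction found in section 3.2

is_descent = grad_f(x) @ d <= 0             # grad_f . d <= 0 : descent
is_feasible = grad_g2(x) @ d >= 0           # grad_g2 . d >= 0 : feasible
```

Here grad_f(x) · d = −13/7 < 0 and grad_g2(x) · d = 13/7 > 0, so d is both descent and feasible.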
Zoutendijk’s feasible direction method requires the search directions to be not only
feasible but also descent. A pseudo-code for this algorithm is given in the following
(Reklaitis, Ravindran & Ragsdell, 1989):
Algorithm:
Step 1
Let t = 1. At a given feasible point x(t), let I(t) be the set of indices of the active
inequality constraints at x(t), within some tolerance ε:
I(t) = { j : 0 ≤ gj(x(t)) ≤ ε, j = 1, 2, ..., J }
If I(t) is empty, use the steepest descent direction d(t) = −∇f(x(t)) and go to step 3.
Step 2
Solve the following linear program:
Maximize θ
Subject to ∇f(x(t))T d ≤ −θ
           ∇gj(x(t))T d ≥ θ,  j ∈ I(t)
           −1 ≤ di ≤ 1,  i = 1, 2, ..., N
Label the solution d(t) and θ(t).
Figure 3: Three points used in interval halving method.
Step 3
If θ(t) ≤ ε, terminate.
Else set αmax = min[α : gj(x(t) + α d(t)) = 0, j = 1, 2, ..., J and α > 0].
If no α > 0 exists, set αmax = ∞.
Step 4
Find α(t) using a one-dimensional search such that f(x(t) + α d(t)) is minimum in the
interval (0, αmax). Set x(t+1) = x(t) + α(t) d(t), increment t, and go to step 1.
The algorithm starts with a feasible point. At this point, a search direction that is
maximally descent as well as feasible is found using a linear programming technique
(Dantzig, 1963). The limits along this direction are found by checking the feasibility
of all constraints. A one-dimensional search is then performed from the current point to
find the best solution in that direction. This algorithm is illustrated with an example
NLP problem a little later. At this point, a discussion of one-dimensional optimization
methods is appropriate.
2.3 One-dimensional Optimization
Among the direct search methods, region-elimination methods are the most commonly used.
By assuming that the objective function is unimodal, the objective function values at
two points are compared and some portion of the search space is eliminated. The
interval-halving method works with this philosophy by calculating function values at
three equi-spaced points and comparing them (Figure 3). The algorithm is stated in the
following (Reklaitis, Ravindran & Ragsdell, 1989):
Algorithm:
Step 1
Choose the search interval (a, b). Let xm = (a + b)/2 and L = b − a. Compute f(xm).
Step 2
Set x1 = a + L/4 and x2 = b − L/4. Compute f(x1) and f(x2).
Step 3
If f(x1) < f(xm), set b = xm and xm = x1 (eliminate the region (xm, b)).
Go to step 5.
Else go to step 4.
Step 4
If f(x2) < f(xm), set a = xm and xm = x2 (eliminate the region (a, xm)).
Else set a = x1, b = x2 (eliminate the regions (a, x1) and (x2, b)).
Step 5
Calculate L = b − a. If |L| is small, terminate.
Else go to step 2.
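The five steps translate into a short routine (an illustrative sketch, not from the source; the test function is a made-up unimodal quadratic):

```python
def interval_halving(f, a, b, tol=1e-5):
    """Region-elimination search for a unimodal f on (a, b), following steps 1-5."""
    xm = (a + b) / 2.0
    fm = f(xm)
    while abs(b - a) > tol:
        L = b - a
        x1, x2 = a + L / 4.0, b - L / 4.0
        f1, f2 = f(x1), f(x2)
        if f1 < fm:                  # minimum lies in (a, xm): drop (xm, b)
            b, xm, fm = xm, x1, f1
        elif f2 < fm:                # minimum lies in (xm, b): drop (a, xm)
            a, xm, fm = xm, x2, f2
        else:                        # minimum lies in (x1, x2): drop both ends
            a, b = x1, x2
    return xm

x_min = interval_halving(lambda x: (x - 2.0)**2, 0.0, 5.0)   # minimum at x = 2
```

Each pass halves the interval, so the number of function evaluations grows only logarithmically with the required accuracy.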
Among the gradient-based techniques, the Newton-Raphson and bisection methods are
popular. A cubic search method, which approximates the objective function by a cubic
function constructed using the function values and derivatives at two distinct points,
is also very popular. In the cubic search method, the minimum of the approximating cubic
function is used as an estimate of the minimum of the objective function. The objective
function is successively approximated by cubic functions at new and (hopefully) improved
points to iteratively find the minimum of the objective function (Rao, 1978).
3 An Example Problem
In this section, an NLP problem using penalty function method and feasible direction
method outlined in the previous section is solved. The example illustrates how
optimization algorithm can be used iteratively to solve an engineering decision-making
problem.
Consider the following mathematical NLP problem:

Minimize (x1 − 1.5)² + (x2 − 4)²
Subject to 4.5x1 + x2² − 18 ≤ 0
           2x1 − x2 − 1 ≥ 0
           x1, x2 ≥ 0

Since the first constraint is of the less-than-or-equal-to type, first convert the above
problem into the NLP form stated in equation 1:

Minimize f(x) = (x1 − 1.5)² + (x2 − 4)²
Subject to g1(x) = −4.5x1 − x2² + 18 ≥ 0
           g2(x) = 2x1 − x2 − 1 ≥ 0
           x1, x2 ≥ 0
The objective function f(x) and the feasible region are shown in figure 4. It is clear
from the figure that the optimum lies at x* = (2, 3)T with optimal function value
f* = 1.25. Now use the penalty function method and Zoutendijk's algorithm to solve this
problem.
Figure 4: The feasible region and the optimum point are shown. (Courtesy: Dr. K. Deb).
3.1 Penalty function method
The NLP problem has one objective and four constraints:
g1(x) = 18 − 4.5x1 − x2² ≥ 0
g2(x) = 2x1 − x2 − 1 ≥ 0
g3(x) = x1 ≥ 0
g4(x) = x2 ≥ 0
In this problem, all constraints are of the inequality type. Use the bracket-operator
penalty function to take care of the constraints. The bracket-operator penalty is an
exterior penalty function and is defined as follows:

Ω(g(x)) = R·[g(x)]²,  if g(x) < 0
        = 0,          otherwise
For each constraint, add such a penalty term for any constraint violation to the
objective function:

P(x, R) = f(x) + Σ(j=1..4) Ω(gj(x))        (9)
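The penalized function for this example is easy to transcribe (illustrative code, not from the source); evaluating it at the starting point (5, 5) with R = 0.001 reproduces the value used in step 1 below:

```python
def f(x):
    return (x[0] - 1.5)**2 + (x[1] - 4.0)**2

gs = [
    lambda x: 18.0 - 4.5 * x[0] - x[1]**2,   # g1
    lambda x: 2.0 * x[0] - x[1] - 1.0,       # g2
    lambda x: x[0],                          # g3
    lambda x: x[1],                          # g4
]

def P(x, R):
    """f plus the bracket-operator penalty R*g^2 for each violated constraint."""
    penalty = sum(R * g(x)**2 for g in gs if g(x) < 0)
    return f(x) + penalty

value = P([5.0, 5.0], 0.001)   # f = 13.25, g1 = -29.5 violated, so about 14.12
```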
Now use the penalty function algorithm described in the previous section to solve the
NLP problem. The unconstrained optimization algorithm used in this example is the
Davidon-Fletcher-Powell (DFP) method, with successive cubic interpolation as the
one-dimensional search method.
Step 1
Begin with a small penalty coefficient, R(0) = 0.001, and an update factor C = 10. Let
x(0) = (5, 5)T (function value f(x(0)) = 13.25). At this point, only one constraint is
violated: g1(x(0)) = −29.5 and g2(x(0)) = 4.0. Thus the penalized function value is
P(x(0), 0.001) = 14.12. The termination parameters used are all 10⁻². Also set the
iteration counter t = 0.
Step 2
The function used for unconstrained minimization is that given in equation 9 with
R = 0.001.
Step 3
The unconstrained minimization routine produced a solution x(1) = (1.48, 3.94)T
(function value f(x(1)) = 0.0367) after five iterations. The constraint values are as
follows: g1(x(1)) = −4.18, g2(x(1)) = −1.98, g3(x(1)) = 1.48, and g4(x(1)) = 3.94. Thus
two constraints are still violated. The penalized function value is
P(x(1), 0.001) = 0.058. The progress of the algorithm is shown in figure 5.
Step 4
The termination criterion is not satisfied (P(x(1), 0.001) = 0.058 is very small
compared to P(x(0), 0.001) = 14.12). Thus move to step 5.
Step 5
The new value of R is R(1) = C·R(0) = 0.01. Set t = 1.

Figure 5: An iteration of the penalty function method is shown on a contour plot with
R(0) = 0.001. Two independent runs converge to the same point at the end of the
iteration. (Courtesy: Dr. K. Deb).
Step 2
Form the new unconstrained function using equation 9 with R = 0.01.
Step 3
Use x(1) = (1.48, 3.94)T as the initial point for the unconstrained minimization using
the DFP method. At this point the penalized function value is P(x(1), 0.01) = 0.251.
The final solution is x(2) = (1.42, 3.76)T with an objective function value
f(x(2)) = 0.0641. Even though the function value has increased from the initial point,
the amount of constraint violation has decreased: g1(x(2)) = −2.53, g2(x(2)) = −1.92,
g3(x(2)) = 1.42, and g4(x(2)) = 3.76. The penalized function value is
P(x(2), 0.01) = 0.165, an improvement over the initial solution.
Step 4
Here, the difference between P(x(2), 0.01) = 0.165 and P(x(1), 0.001) = 0.058 is not
small. So continue with step 5.
Step 5
Set the new R(2) = 0.1, set t = 2, and go to step 2.

Figure 6: Iterations of the penalty function method are shown on a contour plot. Two
independent runs are shown. (Courtesy: Dr. K. Deb).
Continuing with one more iteration, a point x(3) = (1.89, 3.08)T is found, with function
value f(x(3)) = 0.991 and the following constraint values: g1(x(3)) = 0.009,
g2(x(3)) = −0.30, g3(x(3)) = 1.89, and g4(x(3)) = 3.08. This point is very close to the
actual solution of the NLP problem. The progress of the entire algorithm is shown in
figure 6, which also shows the progress from another initial point x(0) = (0, 0)T. It is
important to point out that as the penalty coefficient is increased, the shape of the
penalized objective function changes: the figure depicts how the original spherical
objective function is twisted into a non-spherical shape. Since at every iteration the
previous best point is used as the initial point for the new unconstrained function
minimization, convergence to a local minimum can be achieved as long as each stage
applies only a gradual, small distortion of the function. Now solve the same problem
using the feasible direction method outlined in the previous section.
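The stage-by-stage run above can be reproduced with an off-the-shelf unconstrained minimizer. This is an illustrative reconstruction, not the original DFP-with-cubic-search code: scipy's BFGS routine is used instead, with the same bracket-operator penalty and R multiplied by 10 at each stage.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.5)**2 + (x[1] - 4.0)**2
gs = [lambda x: 18.0 - 4.5 * x[0] - x[1]**2,   # g1
      lambda x: 2.0 * x[0] - x[1] - 1.0,       # g2
      lambda x: x[0],                          # g3
      lambda x: x[1]]                          # g4

def P(x, R):
    # bracket-operator penalty: only violated constraints contribute
    return f(x) + R * sum(min(0.0, g(x))**2 for g in gs)

x, R = np.array([5.0, 5.0]), 0.001
for _ in range(7):                     # stages R = 0.001, 0.01, ..., 1000
    x = minimize(lambda z: P(z, R), x, method="BFGS").x   # warm-started stage
    R *= 10.0
# x should now be close to the constrained optimum (2, 3)
```

As in the hand-worked run, the early stages land near the unconstrained minimum and the later, heavily penalized stages pull the iterate onto the constraint boundary.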
3.2 Feasible direction method
The feasible direction method is very different in principle from the penalty function
method: it directly uses the gradient information of the objective function and the
constraints.
Step 1
Let the initial point be x(1) = (1, 1)T. At this point the constraint g1(x) is not
active, but g2(x) is active; in other words, g2(x(1)) = 0. Thus I(1) = {2}.
Step 2
Form a linear program (LP) using the gradients of f(x) and g2(x). Taking the
derivatives, observe that ∇f(x) = (2(x1 − 1.5), 2(x2 − 4))T and ∇g2(x) = (2, −1)T. Thus
form the following LP:

Maximize θ
Subject to −d1 − 6d2 ≤ −θ
           2d1 − d2 ≥ θ
           −1 ≤ d1, d2 ≤ 1
In order to solve the above LP for the optimal values of d1, d2, and θ, a number of
fix-ups are required. First of all, observe that the variable bounds on d1 and d2 are
not suitable for use in an LP, because in an LP all variables must take non-negative
values only. Use two variables ti = di + 1 for i = 1, 2 in place of d1 and d2 in order
to take care of the variable bounds. Thereafter, introduce slack variables for each
inequality constraint and artificial variables for generating the basis variables
(Dantzig, 1963). Thus the LP problem becomes:

Maximize θ
Subject to t1 + 6t2 − θ − y1 + y5 = 7
           2t1 − t2 − θ − y2 + y6 = 1
           t1 + y3 = 2
           t2 + y4 = 2
           t1, t2, y1, y2, y3, y4, y5, y6 ≥ 0

Figure 7: The intermediate points found at various stages of the optimization algorithm
are shown. The figure shows that the solution is approaching the true optimum point.
(Courtesy: Dr. K. Deb).
The solution to this LP can be found using the two-phase method (Dantzig, 1963): first
minimize the artificial-variable objective (y5 + y6), and then solve the above LP
without the artificial variables. The solution obtained after five iterations of the LP
is d(1) = (1, 1/7)T and θ(1) = 13/7. The direction vector is shown in figure 7.
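The same LP can be cross-checked with an off-the-shelf solver instead of hand simplex iterations. An illustrative check with scipy (not from the source): the variables are ordered (d1, d2, θ), and since linprog minimizes, the objective is −θ; the solver handles the bound and sign fix-ups internally.

```python
from scipy.optimize import linprog

# maximize theta  s.t.  -d1 - 6 d2 <= -theta,  2 d1 - d2 >= theta,  -1 <= di <= 1
# rewritten as A_ub @ (d1, d2, theta) <= b_ub with objective min(-theta)
res = linprog(c=[0.0, 0.0, -1.0],
              A_ub=[[-1.0, -6.0, 1.0],     # -d1 - 6 d2 + theta <= 0
                    [-2.0, 1.0, 1.0]],     # -(2 d1 - d2) + theta <= 0
              b_ub=[0.0, 0.0],
              bounds=[(-1, 1), (-1, 1), (None, None)])
d1, d2, theta = res.x                       # expect d = (1, 1/7), theta = 13/7
```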
Step 3
Since θ(1) > 0, continue with the algorithm. The next step is to find the bound along
the direction d(1) at which the solution becomes infeasible. Any point along d(1) can be
represented as follows:

x(α) = x(1) + α d(1)        (10)

Substituting this expression into the constraints, obtain g1(α) = 0 for α1 = 2.583 and
g2(α) = 0 for α2 = 0. Even though these values are found here by directly solving the
underlying equations, in practice they can be obtained using a root-finding algorithm
(for example, the Newton-Raphson method). Thus, the maximum limit is α = 2.583.
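The root-finding step can be sketched in a few lines (illustrative; scipy's brentq bracketing solver is used here, though the Newton-Raphson method mentioned above would serve equally well):

```python
import numpy as np
from scipy.optimize import brentq

x1 = np.array([1.0, 1.0])
d1 = np.array([1.0, 1.0 / 7.0])

def g1_along(a):
    """g1 evaluated along the ray x(alpha) = x1 + alpha * d1."""
    p = x1 + a * d1
    return 18.0 - 4.5 * p[0] - p[1]**2

# g1 is positive at alpha = 0 and negative at alpha = 10, so a root lies between
alpha_max = brentq(g1_along, 0.0, 10.0)    # about 2.583
```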
Step 4
The next step is to find the point x(α) given by equation 10, for α values in the
interval (0, 2.583), at which the objective function is minimum. Use the
interval-halving method described in the previous subsection to find that point.
Interval-Halving Method:
Step 1
a = 0 and b = 2.583, so xm = (0 + 2.583)/2 = 1.291 and L = 2.583. For a given value of
α, x(α) is calculated using equation 10 and the objective function value is calculated
as f(x(α)). Thus, f(xm) = f(2.291, 1.184) = 8.553.
Step 2
x1 = 0 + 2.583/4 = 0.646 and x2 = 2.583 − 2.583/4 = 1.937. The objective function
values are f(x1) = f(1.646, 1.092) = 8.48 and f(x2) = f(2.937, 1.277) = 9.480.
Step 3
Since f(x1) < f(xm), eliminate the region (1.291, 2.583). Thus, b = 1.291 and
xm = 0.646.
Step 5
L = 1.291 is not small. Go to step 2.
Step 2
x1 = 0.323 and x2 = 0.969. The objective function values at these points are
f(x1) = f(1.323, 1.046) = 8.760 and f(x2) = f(1.969, 1.138) = 8.411.
Step 3
Since f(x1) > f(xm), go to step 4.
Step 4
Since f(x2) < f(xm), eliminate the region (0, 0.646). Thus, a = 0.646 and xm = 0.969.
Step 5
L = 1.291 − 0.646 = 0.646 is not small. Go to step 2.
This process continues until the interval length on α is very small. The final solution
is found to be α* = 0.910, for which x(2) = (1.91, 1.13)T. The iteration procedure is
depicted in figure 8. This completes one iteration of Zoutendijk's method.
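The α* found above can be cross-checked with a scalar minimizer (illustrative; scipy's bounded Brent method is used here instead of interval halving):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x1 = np.array([1.0, 1.0])
d1 = np.array([1.0, 1.0 / 7.0])

def f_along(a):
    """Objective f evaluated along the ray x(alpha) = x1 + alpha * d1."""
    p = x1 + a * d1
    return (p[0] - 1.5)**2 + (p[1] - 4.0)**2

res = minimize_scalar(f_along, bounds=(0.0, 2.583), method="bounded")
# res.x is about 0.91, matching the alpha* found by interval halving
```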
Step 1
At the new point x(2), none of the constraints is active (g1(x(2)) = 8.12 and
g2(x(2)) = 1.69). So I(2) = ∅, and the new search direction is the steepest descent
direction, d(2) = −∇f(x(2)) = (−0.82, 5.74)T.
Step 3
Any point along d(2) can be represented as follows:

x(α) = x(2) + α d(2)

In order to find the maximum bound along d(2), set g1(x(α)) = 0 and g2(x(α)) = 0 and
obtain α1 = 0.347 and α2 = 0.229. Thus, α = 0.229.
Step 4
Perform another one-dimensional search along d(2) in the α-interval (0, 0.229) to find
the point at which the objective function is minimum. It turns out that the optimal
point is α* = 0.229, giving x(3) = (1.722, 2.444)T.
In the next iteration, the constraint g2 is active and another LP problem is formed.
The optimal search direction d(3) = (0.595, 1)T is obtained by solving this LP problem.
Searching for the minimum along d(3), obtain x(4) = (2.036, 2.972)T. Continue this
process until no further improvement is found. The iterations show that the
optimization process is approaching the exact optimum x* = (2.0, 3.0)T (Figure 7).
Figure 8: The intermediate steps of the interval-halving method are shown. (Courtesy:
Dr. K. Deb).
In a real-world optimization problem, it is difficult to tell whether an obtained
solution is optimal or not, but there are at least two procedures that can be used to
check its optimality. First, the optimization procedure can be started from different
initial points in the search space; if the obtained solution is more or less the same
in each case, it can be considered a likely optimum. Second, the optimality of the
solution can be checked by locally perturbing the parameters of the problem and
observing the so-called Lagrange multipliers or shadow costs (Reklaitis, Ravindran &
Ragsdell, 1989).
4. Conclusion
There are numerous algorithms available for solving optimization problems. Most of them
are similar in principle but vary in the way the search directions are obtained. Many
of the methods begin with a point, and a one-dimensional search is carried out along a
search direction to find the optimum in that direction. At this new optimum, another
search direction is used and the best point is found along the new direction. This
process continues until no further improvement is possible. Since algorithms vary in
the way directions are chosen, some algorithms are faster than others. Some algorithms
use gradient information to create a search direction, and some use the previous
history of points to create it. The choice of one algorithm over another depends
primarily on experience and on the availability of software packages (Deb, 1995).
Though these classical optimization algorithms have been extensively used in many
engineering design and decision-making problems, they have a number of shortcomings.
Most of the popular methods require gradient information, which may not be easy to
calculate (or not available at all) in many real-world problems. Moreover, most of
these methods may converge to a locally optimal solution, which may be very different
from the actual globally optimal solution. The most serious difficulty of all is
probably that none of these methods can be applied to a wide variety of problems. In
order to alleviate some of these problems, non-traditional search and optimization
algorithms such as genetic algorithms (GAs) and simulated annealing are finding
widespread applicability in engineering design and decision-making problems. These
methods work according to the principles of natural phenomena and have been found to
solve many nonlinear programming problems to global optimality (Goldberg, 1989;
Kirkpatrick, Gelatt & Vecchi, 1983). Some problems are solved by heuristic methods; one
such problem is presented in the subsequent article, Development of Software for Flat
Pattern Nesting.
References
Dantzig, G. B. (1963). Linear programming and extensions. Princeton, NJ: Princeton
University Press.
Rao, S. S. (1978). Optimization theory and applications. New Delhi: Wiley Eastern
Limited.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated
annealing. Science, 220(4598), 671-680.
Arora, J. (1989). Introduction to optimal design. Singapore: McGraw Hill.
Reklaitis, G. V., Ravindran, A., & Ragsdell, K. M. (1989). Engineering optimization
methods and applications. New York: John Wiley and Sons.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine
learning. Reading: Addison-Wesley.
Deb, K. (1995). Optimization for engineering design: Algorithms and examples. New
Delhi: Prentice-Hall.