OPTIMIZATION TECHNIQUES
[AcSIR-01-ES-AD-009]
Presented by:
Dr. R. S. Bisht, Pr. Scientist,
rsbisht@cbri.res.in
CSIR-CBRI, Roorkee, India
Gradient Descent Algorithm
• Gradient Descent is a fundamental optimization algorithm used to minimize functions by iteratively moving toward a local minimum of the function. It is widely used in machine learning, especially for training models by minimizing loss functions.
• The main idea is to update the variables or parameters of the function in the direction of steepest descent (i.e., the negative gradient), which is the direction that reduces the function's value the fastest, as shown in the sketch below.
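To make the idea concrete, here is a minimal sketch of gradient descent in one variable; the toy objective f(x) = (x − 3)², the starting point, and the learning rate are assumed values chosen for illustration.

```python
# Gradient descent on the toy objective f(x) = (x - 3)^2 (assumed example).
# The derivative is f'(x) = 2 * (x - 3); the unique minimum is at x = 3.

def grad_f(x):
    return 2 * (x - 3)

x = 0.0         # starting point (assumed)
alpha = 0.1     # learning rate (assumed)

for _ in range(100):
    x = x - alpha * grad_f(x)   # move in the direction of the negative gradient

print(x)        # close to 3.0, the minimizer
```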
Gradient Descent Algorithm
• Gradient Calculation (Derivative): compute ∇f(x_k), the gradient (derivative) of the objective at the current point x_k; it points in the direction of steepest increase.
• Update Rule: x_{k+1} = x_k − α ∇f(x_k), i.e., step from the current point against the gradient, scaled by the learning rate α.
• Learning Rate: The learning rate α controls how fast we move towards the minimum.
• If α is too small, convergence will be slow.
• If α is too large, we may overshoot the minimum and fail to converge.
• Convergence: The algorithm stops when the updates to the parameters become very small, meaning the cost function is minimized, or after a fixed number of iterations. A sketch of the complete loop follows below.
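The pieces above fit together into a short loop. The sketch below assumes the same toy objective as before, a small tolerance on the step size as the convergence test, and a few trial learning rates to show how the choice of α affects convergence.

```python
# One complete gradient descent loop: gradient, update rule, learning rate,
# and a convergence test. Objective, tolerance, and learning rates are assumed.

def grad_f(x):
    return 2 * (x - 3)            # derivative of f(x) = (x - 3)^2

def gradient_descent(alpha, x0=0.0, tol=1e-8, max_iter=10_000):
    x = x0
    for k in range(max_iter):
        step = alpha * grad_f(x)  # update rule: x_{k+1} = x_k - alpha * grad f(x_k)
        x = x - step
        if abs(step) < tol:       # stop when the updates become very small
            return x, k + 1
    return x, max_iter            # otherwise stop after a fixed number of iterations

for alpha in (0.01, 0.1, 0.9):
    x_star, iters = gradient_descent(alpha)
    print(f"alpha={alpha}: x = {x_star:.6f} after {iters} iterations")
# alpha = 0.01 converges slowly; alpha = 0.9 oscillates around the minimum before
# settling; any alpha above 1.0 would overshoot further each step and diverge here.
```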
Gradient Descent Method
Mathematical Formulation: x_{k+1} = x_k − α ∇f(x_k), where x_k is the current iterate, α > 0 is the learning rate, and ∇f(x_k) is the gradient of the objective at x_k.
Disadvantages:
• Sensitive to the choice of the learning rate α. If the learning rate is too
large, it may overshoot; if too small, convergence can be slow.
• For functions with a complex geometry (e.g., ill-conditioned problems), the algorithm may suffer from slow convergence, especially in narrow valleys, as illustrated in the sketch below.
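The narrow-valley behaviour can be seen on an ill-conditioned quadratic. The sketch below assumes the objective f(x, y) = ½(x² + 100y²), a starting point, and a fixed step size chosen just inside the stability limit; all of these are illustrative values.

```python
# Fixed-step gradient descent on the ill-conditioned quadratic
# f(x, y) = 0.5 * (x**2 + 100 * y**2); the gradient is (x, 100 * y).
# The curvatures differ by a factor of 100, so a step size that is stable
# for the steep y-direction (alpha < 2/100) is very slow for x.

import numpy as np

def grad(p):
    x, y = p
    return np.array([x, 100.0 * y])

p = np.array([10.0, 1.0])     # starting point (assumed)
alpha = 0.015                 # must stay below 2/100 = 0.02 to avoid divergence

for k in range(1, 10_001):
    g = grad(p)
    if np.linalg.norm(g) < 1e-6:
        break
    p = p - alpha * g

print(f"stopped after {k} iterations, p = {p}")
# The y-coordinate zig-zags in sign and shrinks quickly, but the x-coordinate
# contracts only by a factor (1 - 0.015) per step, so many iterations are needed.
```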
Steepest Descent Method
• The steepest descent method is similar to gradient descent, but it
finds the optimal step size for each iteration.
• Instead of using a fixed learning rate, it computes the step size by
solving a line search problem that minimizes the function along the
direction of the negative gradient.
Mathematical Formulation: x_{k+1} = x_k − α_k ∇f(x_k), where the step size α_k = argmin_{α ≥ 0} f(x_k − α ∇f(x_k)) is chosen at each iteration by a line search along the negative gradient.
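For a quadratic objective f(x) = ½xᵀAx the line-search problem has a closed-form solution, α_k = (gᵀg)/(gᵀAg) with g = ∇f(x_k) = Ax_k. The sketch below assumes the same ill-conditioned matrix A = diag(1, 100) and starting point as in the previous example.

```python
# Steepest descent with an exact line search on the quadratic f(x) = 0.5 * x^T A x.
# For a quadratic, min_alpha f(x - alpha * g) is solved exactly by
# alpha_k = (g^T g) / (g^T A g).

import numpy as np

A = np.diag([1.0, 100.0])
x = np.array([10.0, 1.0])

for k in range(1, 1001):
    g = A @ x                              # gradient of 0.5 * x^T A x
    if np.linalg.norm(g) < 1e-6:
        break
    alpha_k = (g @ g) / (g @ A @ g)        # optimal step along -g
    x = x - alpha_k * g

print(f"stopped after {k} iterations, x = {x}")
# Compared with the fixed-step run above, far fewer iterations are needed here,
# at the price of computing alpha_k at every step.
```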
Advantages:
• Steepest descent is generally more reliable because it always finds the
best step size at each iteration.
• It works well in situations where the landscape of the function is
complicated (e.g., narrow valleys, ill-conditioned problems).
Disadvantages:
• The steepest descent method can be computationally expensive because it requires solving a line search problem at every iteration, which can significantly slow down the algorithm for large-scale problems.
• In practice, for many machine learning tasks, it can be overkill due to the overhead of computing the optimal step size.
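For a general (non-quadratic) function there is no closed-form step, so each iteration must run a numerical one-dimensional minimization; that inner solve is the overhead referred to above. The sketch below assumes the Rosenbrock function, a fixed outer budget, and scipy.optimize.minimize_scalar as one possible line-search routine; all of these are illustrative choices.

```python
# Steepest descent with a numerical line search on a general function,
# illustrating the per-iteration overhead. The objective (Rosenbrock),
# starting point, and iteration budget are assumed for the example.

import numpy as np
from scipy.optimize import minimize_scalar

def f(p):
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2          # Rosenbrock function

def grad_f(p):
    x, y = p
    return np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                     200 * (y - x ** 2)])

p = np.array([-1.2, 1.0])                  # standard Rosenbrock starting point
for k in range(200):                       # fixed outer budget for the sketch
    g = grad_f(p)
    # Line search: minimize f along the negative gradient direction.
    # Each call is itself an iterative 1-D optimization, which is the extra cost per step.
    phi = lambda alpha: f(p - alpha * g)
    alpha_k = minimize_scalar(phi, bounds=(0.0, 1.0), method="bounded").x
    p = p - alpha_k * g

print(f"after 200 line-searched steps: p = {p}, f(p) = {f(p):.6f}")
# Each step made robust progress but paid for an inner solve; on this banana-shaped
# valley many more iterations would still be needed to reach the minimum at (1, 1).
```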
Key Differences:
Aspect | Gradient Descent | Steepest Descent
Step Size (Learning Rate) | Fixed step size α chosen before the process. | Dynamically computed α_k using line search.
Direction | Moves in the direction of the negative gradient. | Also moves in the direction of the negative gradient, but with an optimal step size for each step.
Convergence | Sensitive to the choice of the learning rate. | More robust convergence due to optimal step size choice.
Speed | Potentially faster for well-conditioned problems. | Can be slower due to the overhead of calculating α_k in each iteration.
Implementation | Simple to implement; commonly used in large-scale optimization. | Requires line search, which adds complexity and computational cost.
Suitability | Suitable for machine learning problems with large datasets. | Suitable for problems where high precision and robust convergence are important (e.g., scientific computations).