
Week 5 Local Search

The document discusses various local search techniques and optimization methods, including hill-climbing, gradient descent, simulated annealing, local beam search, and genetic algorithms. It highlights the advantages of local search, such as low memory usage and effectiveness in large state spaces, while also addressing challenges like local maxima and plateaus. Additionally, it provides examples and details on how these techniques can be applied to problems like the 8-queens puzzle.

Uploaded by leenamuqhit2001
Copyright © All Rights Reserved

Lect. 5 Local Search and Optimization


Outline

• Local search techniques and optimization
– Hill-climbing
– Gradient methods
– Simulated annealing
– Local beam search
– Genetic algorithms

Local Search 2
Local search and optimization

• Previously: systematic exploration of the search space.
– The path to the goal is the solution to the problem.

• YET, for some problems the path is irrelevant.
– E.g., 8-queens: only the final configuration of queens matters.
– Systematic search might never explore the portion of the search space where a solution actually resides.

• Different algorithms can be used
– Local search

Local search and optimization

• Local search
– Keep track of a single current state
– Move only to neighboring states
– Ignore paths

• Advantages:
– Uses very little memory
– Can often find reasonable solutions in large or infinite (continuous) state spaces.

• “Pure optimization” problems
– All states have an objective function value
– Goal is to find the state with the max (or min) objective value
– Does not quite fit into the path-cost/goal-state formulation
– Local search can do quite well on these problems.

“Landscape” of search

1. Hill-climbing search

• “A loop that continuously moves in the direction of increasing value”
– terminates when a peak is reached
– a.k.a. greedy local search
• It terminates when it reaches a “peak” where no neighbor has a higher value.
• Value can be either
– Objective function value (maximized)
– Heuristic function value (minimized)
• Hill climbing does not look ahead beyond the immediate neighbors of the current state.
• If several successors share the best value, it can choose among them at random.
• Characterized as “trying to find the top of Mount Everest while in a thick fog”
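A minimal Python sketch of steepest-ascent hill climbing as described above, assuming the caller supplies a `neighbors` function and a `value` function to maximize (both names are illustrative, not from the slides). Ties among the best successors are broken at random.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")


def hill_climbing(start: S,
                  neighbors: Callable[[S], Iterable[S]],
                  value: Callable[[S], float],
                  rng: Optional[random.Random] = None) -> S:
    """Steepest-ascent hill climbing: repeatedly move to the best neighbor
    and stop at a peak where no neighbor has a higher value."""
    rng = rng or random.Random()
    current = start
    while True:
        candidates = list(neighbors(current))
        if not candidates:
            return current
        best_value = max(value(s) for s in candidates)
        if best_value <= value(current):
            return current  # peak reached (possibly only a local maximum)
        # Several successors may share the best value: pick one at random.
        best = [s for s in candidates if value(s) == best_value]
        current = rng.choice(best)


# Toy 1-D landscape: index 1 is a local maximum, index 4 the global one.
landscape = [1, 3, 2, 4, 9, 4]


def line_neighbors(i: int) -> List[int]:
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


print(hill_climbing(0, line_neighbors, landscape.__getitem__))  # 1 (stuck)
print(hill_climbing(2, line_neighbors, landscape.__getitem__))  # 4
```

Starting at index 0, the climber stops at the local maximum at index 1 because neither neighbor is higher, even though a much better peak exists further right.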

1. Hill climbing and local maxima

• When local maxima exist, hill climbing can stop at a suboptimal state
• Simple (often effective) solution
– Multiple random restarts
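A minimal sketch of random-restart hill climbing, assuming the caller supplies a `random_state` generator; the function names and the optional `is_goal` early exit are illustrative, not from the slides. On the toy landscape below, index 1 is a local maximum, so a single run can stop there, but a handful of restarts is very likely to reach index 4.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

S = TypeVar("S")


def hill_climb(start: S, neighbors: Callable[[S], Iterable[S]],
               value: Callable[[S], float]) -> S:
    """Plain steepest-ascent hill climbing (ties go to the first best neighbor)."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current
        current = best


def random_restart_hill_climbing(random_state: Callable[[], S],
                                 neighbors: Callable[[S], Iterable[S]],
                                 value: Callable[[S], float],
                                 restarts: int,
                                 is_goal: Optional[Callable[[S], bool]] = None) -> Optional[S]:
    """Run hill climbing from several random starts and keep the best result."""
    best: Optional[S] = None
    for _ in range(restarts):
        result = hill_climb(random_state(), neighbors, value)
        if best is None or value(result) > value(best):
            best = result
        if is_goal is not None and is_goal(best):
            break  # stop early once a known-optimal state is found
    return best


landscape = [1, 3, 2, 4, 9, 4]
rng = random.Random(0)
best = random_restart_hill_climbing(
    random_state=lambda: rng.randrange(len(landscape)),
    neighbors=lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)],
    value=landscape.__getitem__,
    restarts=20,
    is_goal=lambda i: landscape[i] == 9)
print(best)
```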

1. Hill-climbing example

• 8-queens problem, complete-state formulation
– All 8 queens on the board in some configuration

• Successor function:
– Move a single queen to another square in the same column.

• Example of a heuristic function h(n):
– The number of pairs of queens that are attacking each other (directly or indirectly)
– (so we want to minimize this)
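A sketch of this complete-state formulation in Python, assuming a tuple representation where `state[c]` is the row (0 to 7) of the queen in column c. `attacking_pairs` implements h(n) and `successors` implements the move-one-queen-within-its-column successor function; both names are illustrative.

```python
from itertools import combinations
from typing import Iterator, Tuple

# state[c] = row of the queen in column c (one queen per column)
State = Tuple[int, ...]


def attacking_pairs(state: State) -> int:
    """h(n): number of pairs of queens attacking each other (same row or same
    diagonal). Pairs count even if another queen sits between them."""
    return sum(
        1
        for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
        if r1 == r2 or abs(r1 - r2) == abs(c1 - c2)
    )


def successors(state: State) -> Iterator[State]:
    """Move a single queen to another square in the same column."""
    n = len(state)
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]


all_in_row_0 = (0,) * 8
solution = (0, 4, 7, 5, 2, 6, 1, 3)
print(attacking_pairs(all_in_row_0))      # 28
print(attacking_pairs(solution))          # 0
print(len(list(successors(solution))))    # 56 = 8 columns x 7 other rows
```

Because every column holds exactly one queen, only row and diagonal conflicts need checking.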

A local minimum for 8-queens

A local minimum in the 8-queens state space (h=1)

Hill Climbing Drawbacks

• Local maxima

• Plateaus

• Diagonal ridges

Hill Climbing Drawbacks

a. The result depends on the initial state.

b. Hill climbing can get stuck for any of the following reasons:
1. Local maxima: a local maximum is a peak that is higher than each of its neighboring states but lower than the global maximum.
2. Plateau: an area of the state space where the evaluation function is flat.
3. Ridge: a sequence of local maxima that is difficult for greedy algorithms to navigate.

Other drawbacks

Performance of hill-climbing on 8-queens

• Randomly generated 8-queens starting states…

• 14% of the time it solves the problem

• 86% of the time it gets stuck at a local minimum

• However…
– Takes only 4 steps on average when it succeeds
– And 3 on average when it gets stuck
– (for a state space with 8^8 ≈ 17 million states)

Possible solution…sideways moves

• If there are no downhill (uphill) moves, allow sideways moves in the hope that the algorithm can escape
– Need to place a limit on the number of consecutive sideways moves to avoid infinite loops

• For 8-queens
– Now allow sideways moves with a limit of 100
– Raises the percentage of problem instances solved from 14% to 94%

– However…
• 21 steps on average for each successful solution
• 64 steps on average for each failure
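A minimal sketch of hill climbing with sideways moves on 8-queens, assuming a tuple representation in which `state[c]` is the row of the queen in column c, and a limit on *consecutive* sideways moves that resets after every strict improvement. Setting `max_sideways=0` gives plain hill climbing. The `success_rate` helper is a hypothetical experiment harness; its output depends on the seed and sample size and is not expected to reproduce the 14% and 94% figures exactly.

```python
import random
from itertools import combinations
from typing import Iterator, Optional, Tuple

State = Tuple[int, ...]  # state[c] = row of the queen in column c


def attacking_pairs(state: State) -> int:
    return sum(1 for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
               if r1 == r2 or abs(r1 - r2) == abs(c1 - c2))


def successors(state: State) -> Iterator[State]:
    for col in range(len(state)):
        for row in range(len(state)):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]


def hill_climb_sideways(state: State, max_sideways: int = 100,
                        rng: Optional[random.Random] = None) -> Tuple[State, int, int]:
    """Minimize h by steepest descent, allowing up to `max_sideways`
    consecutive sideways moves. Returns (final state, h, steps taken)."""
    rng = rng or random.Random()
    h, sideways, steps = attacking_pairs(state), 0, 0
    while h > 0:
        scored = [(attacking_pairs(s), s) for s in successors(state)]
        best_h = min(score for score, _ in scored)
        if best_h > h:
            break                      # every move is uphill: strict local minimum
        if best_h == h:
            if sideways >= max_sideways:
                break                  # stuck on a plateau: sideways budget used up
            sideways += 1
        else:
            sideways = 0               # a real improvement resets the counter
        state = rng.choice([s for score, s in scored if score == best_h])
        h = best_h
        steps += 1
    return state, h, steps


def success_rate(trials: int, max_sideways: int, seed: int = 0) -> float:
    """Fraction of random starting states that end with h = 0."""
    rng = random.Random(seed)
    solved = 0
    for _ in range(trials):
        start = tuple(rng.randrange(8) for _ in range(8))
        _, h, _ = hill_climb_sideways(start, max_sideways, rng)
        solved += int(h == 0)
    return solved / trials


print(success_rate(100, max_sideways=0))    # plain hill climbing
print(success_rate(100, max_sideways=100))  # with up to 100 sideways moves
```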

Optimization of Continuous Functions

• Discretization
– Discretize the continuous space, then use hill-climbing

• Gradient descent
– Used with continuous domains
– To use gradient descent, we need to know the gradient of our cost function: the vector that points in the direction of steepest increase. We repeatedly take steps in the opposite direction of the gradient to eventually arrive at a (local) minimum.

2. Gradient Descent

Assume we have some cost function $C(x_1, \dots, x_n)$ that we want to minimize over the continuous variables $x_1, x_2, \dots, x_n$.

[Figure: gradient ascent on a contour plot]

1. Compute the gradient (slope) at the current state, i.e., the partial derivatives:

$$\frac{\partial C(x_1, \dots, x_n)}{\partial x_i} \quad \forall i$$

2. Take a small step downhill, in the direction opposite to the gradient, where $\lambda > 0$ is a small step size:

$$x_i \rightarrow x'_i = x_i - \lambda \frac{\partial C(x_1, \dots, x_n)}{\partial x_i} \quad \forall i$$

3. Check if $C(x'_1, \dots, x'_n) < C(x_1, \dots, x_n)$.

4. If true, accept the move; if not, reject it.

5. Repeat.
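A minimal sketch of steps 1 to 5 in Python, assuming the caller provides the cost function and its gradient as plain functions (`cost`, `grad` and `lam` for λ are illustrative names). The slides do not say what happens after a rejected step; with a fixed λ the same step would be retried forever, so this sketch halves λ on each rejection, which is an assumption.

```python
from typing import Callable, List, Sequence


def gradient_descent(cost: Callable[[Sequence[float]], float],
                     grad: Callable[[Sequence[float]], Sequence[float]],
                     x0: Sequence[float],
                     lam: float = 0.1,
                     max_iters: int = 10_000,
                     min_lam: float = 1e-12) -> List[float]:
    x = list(x0)
    c = cost(x)
    for _ in range(max_iters):
        g = grad(x)                                      # 1. dC/dx_i for all i
        x_new = [xi - lam * gi for xi, gi in zip(x, g)]  # 2. step against the gradient
        c_new = cost(x_new)
        if c_new < c:                                    # 3./4. accept only if C decreased
            x, c = x_new, c_new
        else:
            lam /= 2.0                                   # ASSUMPTION: shrink the step on rejection
            if lam < min_lam:
                break                                    # no further progress at this step size
    return x


# Example cost C(x, y) = (x - 1)^2 + 2(y + 3)^2, minimized at (1, -3).
def C(v: Sequence[float]) -> float:
    return (v[0] - 1) ** 2 + 2 * (v[1] + 3) ** 2


def grad_C(v: Sequence[float]) -> List[float]:
    return [2 * (v[0] - 1), 4 * (v[1] + 3)]


x = gradient_descent(C, grad_C, [0.0, 0.0])
print([round(v, 6) for v in x])  # [1.0, -3.0]
```

For this convex example the accept/reject check never fires until floating-point precision is reached; on non-convex costs it is what stops an overly large λ from increasing C.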

Learning as optimization

• Many machine learning problems can be cast as optimization

• Example:
– Training data $D = \{(x_1, c_1), \dots, (x_n, c_n)\}$, where $x_i$ is a feature (attribute) vector and $c_i$ is a class label (say, binary-valued)

– We have a model (a function or classifier) that maps from x to c, e.g., $\text{sign}(w \cdot x) \in \{-1, +1\}$

– We can measure the error E(w) for any setting of the weights w, given a training data set D

– Optimization problem: find the weight vector w that minimizes E(w)
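A minimal sketch of this framing, assuming E(w) is the number of misclassified training examples (the slide does not define E precisely) and that a constant-1 first feature plays the role of a bias. The minimizer is a simple random-perturbation hill climber chosen to tie back to local search; it is not a standard training rule such as the perceptron update.

```python
import random
from typing import List, Sequence, Tuple

# (feature vector x_i, class label c_i in {-1, +1})
Example = Tuple[Sequence[float], int]


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """sign(w . x), with sign(0) taken as +1 (an arbitrary convention)."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s >= 0 else -1


def error(w: Sequence[float], data: Sequence[Example]) -> int:
    """E(w): number of training examples that sign(w . x) misclassifies."""
    return sum(1 for x, c in data if predict(w, x) != c)


def hill_climb_weights(data: Sequence[Example], w0: Sequence[float],
                       steps: int = 2000, scale: float = 0.5,
                       seed: int = 0) -> Tuple[List[float], int]:
    """Local search over w: perturb one weight at a time and keep the change if
    E(w) does not get worse (sideways moves allowed, since E is piecewise constant)."""
    rng = random.Random(seed)
    w, e = list(w0), error(w0, data)
    for _ in range(steps):
        if e == 0:
            break
        candidate = list(w)
        candidate[rng.randrange(len(w))] += rng.gauss(0.0, scale)
        e_candidate = error(candidate, data)
        if e_candidate <= e:
            w, e = candidate, e_candidate
    return w, e


# Toy data; the first feature is a constant 1 acting as a bias term.
D = [((1.0, 2.0), +1), ((1.0, 0.5), -1), ((1.0, 3.0), +1), ((1.0, -1.0), -1)]
print(error((-1.0, 1.0), D))  # 0
print(error((1.0, -1.0), D))  # 4
w, e = hill_climb_weights(D, (1.0, -1.0))
print(w, e)
```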

3. Simulated Annealing

• A hill-climbing algorithm that never makes “downhill” moves toward states with lower value (or higher cost) is always vulnerable to getting stuck in a local maximum.

• A purely random walk that moves to a successor state without concern for the value will eventually stumble upon the global maximum, but will be extremely inefficient.

• Simulated annealing combines hill climbing with a random walk in a way that yields both efficiency and completeness.

• Switch our point of view from hill climbing to gradient descent (i.e., minimizing cost)

Physical Interpretation of Simulated Annealing

• A physical analogy:
• Imagine letting a ball roll downhill on the function surface
– This is like hill-climbing (for minimization)
• Now imagine shaking the surface while the ball rolls, gradually reducing the amount of shaking
– This is like simulated annealing

• Annealing = the physical process of slowly cooling a liquid or metal until its particles settle into a frozen crystal state
• Simulated annealing:
– Mimics this by slowly reducing a temperature T, which controls how much the state moves around randomly

3. Simulated Annealing

• Basic ideas:
– Like hill-climbing, identify the quality of the local improvements
– Instead of picking the best move, pick one randomly
– Say the change in the objective function is Δ
– If Δ is positive, then move to that state
– Otherwise:
• Move to this state with probability exp(Δ/T)
• Thus worse moves (large negative Δ) are executed less often
– However, there is always a chance of escaping from local maxima
– Over time, make it less likely to accept locally bad moves
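A minimal Python sketch of this loop, written for maximization to match the "if Δ is positive, move" rule. The callables `random_neighbor`, `value` and `schedule` are illustrative parameters, and the geometric cooling schedule T = 10 × 0.99^t with a cutoff after 2000 steps is a hypothetical choice; the slides do not fix a schedule.

```python
import math
import random
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def simulated_annealing(start: S,
                        random_neighbor: Callable[[S, random.Random], S],
                        value: Callable[[S], float],
                        schedule: Callable[[int], float],
                        rng: Optional[random.Random] = None) -> S:
    """Maximize `value`: always accept improving moves (delta > 0) and accept
    worse moves with probability exp(delta / T); stop when T reaches 0."""
    rng = rng or random.Random()
    current = start
    t = 0
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        candidate = random_neighbor(current, rng)
        delta = value(candidate) - value(current)
        if delta > 0 or rng.random() < math.exp(delta / T):
            current = candidate
        t += 1


landscape = [1, 3, 2, 4, 9, 4]   # index 1: local max, index 4: global max


def step(i: int, rng: random.Random) -> int:
    """Random neighbor: move one position left or right, clamped to the ends."""
    return min(max(i + rng.choice((-1, 1)), 0), len(landscape) - 1)


def geometric_schedule(t: int) -> float:
    """Hypothetical schedule: T = 10 * 0.99^t, cut off to 0 after 2000 steps."""
    return 10 * 0.99 ** t if t < 2000 else 0.0


# Start on the local maximum at index 1; early high temperatures let it escape.
print(simulated_annealing(1, step, landscape.__getitem__, geometric_schedule,
                          random.Random(0)))
```

Unlike plain hill climbing started at index 1, the annealer can accept the downhill move to index 2 while T is still high, after which the climb to index 4 is all uphill.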

More Details on Simulated Annealing

– Let's say there are 3 moves available, with changes in the objective function of d1 = -0.1, d2 = 0.5, d3 = -5 (let T = 1).
– Pick a move randomly:
• If d2 is picked, move there.
• If d1 or d3 is picked, the probability of moving = exp(d/T)
• Move 1: prob1 = exp(-0.1) ≈ 0.90
– i.e., about 90% of the time we will accept this move
• Move 3: prob3 = exp(-5) ≈ 0.0067
– i.e., less than 1% of the time we will accept this move

– T = “temperature” parameter
• High T => probability of a “locally bad” move is higher
• Low T => probability of a “locally bad” move is lower
• Typically, T is decreased as the algorithm runs longer
– i.e., there is a “temperature schedule”
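To check the arithmetic above, a small helper (the name `acceptance_probability` is hypothetical) that returns 1 for improving moves and exp(d/T) otherwise, evaluated on the three example moves and then on the same bad move at two other temperatures:

```python
import math


def acceptance_probability(delta: float, T: float) -> float:
    """Probability that simulated annealing accepts a move whose change in the
    objective is `delta` at temperature T (improving moves are always taken)."""
    if delta > 0:
        return 1.0
    return math.exp(delta / T)


for d in (-0.1, 0.5, -5):
    print(d, round(acceptance_probability(d, T=1.0), 4))
# -0.1 0.9048
# 0.5 1.0
# -5 0.0067

# The same bad move (d = -5) at a higher and a lower temperature:
print(round(acceptance_probability(-5, T=10.0), 4))  # 0.6065
print(round(acceptance_probability(-5, T=0.5), 4))   # 0.0
```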

Simulated Annealing in Practice

– Method proposed in 1983 by IBM researchers for solving VLSI layout problems (Kirkpatrick et al., Science, 220:671-680, 1983).

– If T decreases slowly enough, then simulated annealing will find a global optimum with probability approaching 1

– It often works very well in practice, but is usually VERY VERY slow
– The slowness comes from the fact that T must be decreased very gradually to retain optimality
• In practice, how do we decide the rate at which to decrease T? (This is a practical problem with this method.)

4. Local beam search

• Keep track of k states instead of one
– Initially: k randomly selected states
– Next: determine all successors of the k states
– If any of the successors is a goal → finished
– Else select the k best from the successors and repeat.

• Major difference from random-restart search
– Information is shared among the k search threads.

• Problem: it can suffer from a lack of diversity among the k states; they can become clustered in a small region of the state space, making the search little more than a k-times-slower version of hill climbing.
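A minimal sketch of local beam search following the steps above. The callables `random_state`, `successors`, `value` and `is_goal` are illustrative parameters supplied by the caller, and `max_iters` is an added safeguard, since the slide's loop has no stopping rule when no goal is found. The k best are chosen from the pooled successors of all k states, which is where the information sharing comes from.

```python
import heapq
import random
from typing import Callable, Iterable, Optional, TypeVar

S = TypeVar("S")


def local_beam_search(k: int,
                      random_state: Callable[[random.Random], S],
                      successors: Callable[[S], Iterable[S]],
                      value: Callable[[S], float],
                      is_goal: Callable[[S], bool],
                      max_iters: int = 1000,
                      rng: Optional[random.Random] = None) -> S:
    """Keep k states; each round, pool all their successors and keep the k best."""
    rng = rng or random.Random()
    states = [random_state(rng) for _ in range(k)]        # k random initial states
    for s in states:
        if is_goal(s):
            return s
    for _ in range(max_iters):
        pool = [s2 for s in states for s2 in successors(s)]  # successors of all k states
        if not pool:
            break
        for s in pool:
            if is_goal(s):
                return s                                  # a successor is a goal: finished
        states = heapq.nlargest(k, pool, key=value)       # k best from the shared pool
    return max(states, key=value)                         # no goal found: best current state


target = 5
result = local_beam_search(
    k=3,
    random_state=lambda rng: rng.randrange(100),
    successors=lambda x: [x - 1, x + 1],
    value=lambda x: -(x - target) ** 2,
    is_goal=lambda x: x == target,
    rng=random.Random(0))
print(result)  # 5
```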

5. Genetic algorithms
• A different approach from the other search algorithms
– A successor state is generated by combining two parent states
– https://www.youtube.com/watch?v=-kpcAa-qKwY

• A state is represented as a string over a finite alphabet (e.g., binary)
– 8-queens: state = positions of the 8 queens, one in each column
=> 8 × log2(8) = 24 bits (for a binary representation)

1. Start with k randomly generated states (the population)

2. Evaluation function (fitness function)
- Higher values for better states.

3. Select individuals for the next generation based on fitness
- P(indiv. in next gen) = indiv. fitness / total population fitness

4. Produce the next generation of states by “simulated evolution”
– Random selection
– Crossover
– Random mutation

5. Mutate the offspring randomly with some low probability
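A compact sketch of these steps for 8-queens, assuming a tuple state where `state[c]` is the row of the queen in column c, single-point crossover, and a mutation that moves one queen. The population size, number of generations and mutation probability (0.1) are hypothetical choices. Selection uses `random.Random.choices` with fitness weights, i.e., fitness-proportional selection; keeping the best state seen across generations is an extra convenience the slides do not require.

```python
import random
from itertools import combinations
from typing import List, Optional, Tuple

N = 8
MAX_PAIRS = N * (N - 1) // 2   # 28 non-attacking pairs in a solution
State = Tuple[int, ...]        # state[c] = row of the queen in column c


def fitness(state: State) -> int:
    """Number of non-attacking pairs of queens (higher is better)."""
    attacking = sum(1 for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
                    if r1 == r2 or abs(r1 - r2) == abs(c1 - c2))
    return MAX_PAIRS - attacking


def reproduce(x: State, y: State, rng: random.Random) -> State:
    """Single-point crossover at a random position."""
    c = rng.randrange(1, N)
    return x[:c] + y[c:]


def mutate(child: State, rng: random.Random) -> State:
    """Move one randomly chosen queen to a random row in its column."""
    col = rng.randrange(N)
    return child[:col] + (rng.randrange(N),) + child[col + 1:]


def genetic_algorithm(pop_size: int = 20, generations: int = 1000,
                      p_mutation: float = 0.1, seed: Optional[int] = None) -> State:
    rng = random.Random(seed)
    population: List[State] = [tuple(rng.randrange(N) for _ in range(N))
                               for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == MAX_PAIRS:
            break
        weights = [fitness(s) for s in population]
        # Fitness-proportional selection; fall back to uniform if all weights are 0.
        parents = rng.choices(population,
                              weights=weights if sum(weights) > 0 else None,
                              k=2 * pop_size)
        population = []
        for i in range(pop_size):
            child = reproduce(parents[2 * i], parents[2 * i + 1], rng)
            if rng.random() < p_mutation:      # low-probability random mutation
                child = mutate(child, rng)
            population.append(child)
        best = max(population + [best], key=fitness)   # best state seen so far
    return best


result = genetic_algorithm(seed=0)
print(result, fitness(result))
```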

5. Genetic algorithms

• The cth digit represents the row number of the queen in column c
• Fitness function: number of non-attacking pairs of queens (min = 0, max = 8 × 7/2 = 28)
• 24/(24+23+20+11) = 31%
• 23/(24+23+20+11) = 29%
• 20/(24+23+20+11) = 26%
• 11/(24+23+20+11) = 14%
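The percentages above are fitness-proportional ("roulette wheel") selection probabilities. A small sketch that reproduces them from the fitness values 24, 23, 20 and 11, plus a hypothetical `roulette_select` helper that samples one individual accordingly:

```python
import random
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def selection_probabilities(fitnesses: Sequence[float]) -> List[float]:
    """P(individual in next generation) = its fitness / total population fitness."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def roulette_select(population: Sequence[Item], fitnesses: Sequence[float],
                    rng: random.Random) -> Item:
    """Spin a 'roulette wheel' whose slices are proportional to fitness."""
    spin = rng.uniform(0, sum(fitnesses))
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if spin < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


fitnesses = [24, 23, 20, 11]   # fitness values from the slide
print([f"{p:.0%}" for p in selection_probabilities(fitnesses)])
# ['31%', '29%', '26%', '14%']
```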
5. Genetic algorithms

[Figure: genetic algorithm on the 8-queens problem. Panels: 4 states for the 8-queens problem; 2 pairs of 2 states randomly selected based on fitness, with random crossover points selected; new states after crossover; random mutation applied]
Resources

• https://medium.com/codex/gradient-descent-cb0f02dc6eab#:~:text=To%20use%20gradient%20descent%2C%20we,eventually%20arrive%20at%20the%20minimum).
• https://www.youtube.com/watch?v=21EDdFVMz8I
