BSCM-02 Evolutionary Algorithms
Krzysztof Michalak
krzysztof.michalak@ue.wroc.pl
https://krzysztof-michalak.pl/
The Four Peaks problem (4PP)
The 4PP is a deceptive problem.

The Four Peaks problem (4PP)
(Figure: the 4PP instance for n = 16 and T = 4)
The Four Peaks problem (4PP) The Four Peaks problem (4PP)
All solutions of the 4PP instance for n = 16 and T = 4 Solutions found by an evolutionary algorithm solving
the 4PP instance for n = 100 and T = 15 mapped to
ℝ2 using the LDEE technique (red: REWARDed
For this visualization, all 65536 solutions were mapped from { 0, 1 }16
to ℝ2 using the LDEE technique[1]. solution)
LDEE tries to find positions of points in the target space (ℝ2) in such
a way that the distances are "similar" to those in the source space
({ 0, 1 }16).
[1] K. Michalak "Low-Dimensional Euclidean Embedding for Visualization of Search Spaces
in Combinatorial Optimization", IEEE Transactions on Evolutionary Computation, 23(2),
pp. 232-246, IEEE, 2019.
The Four Peaks problem (4PP)
The gradual improvement suggests increasing the number of either zeros or ones.
Getting (gradually) from a locally optimal solution to a REWARDed one requires worsening of the solution.
x0 = 000…00000000000000000 (84 zeros)    f(x0) = 100
(Figure annotations: here, the selective pressure drives us away from a REWARDed solution! Here, the selective pressure vanishes. If only a few bits have to be fixed, a REWARDed solution could be found by the genetic drift.)
The Four Peaks problem (4PP)
There is a gradual improvement path from less optimized solutions to the REWARDed ones:
x0 = 0100010010110000…0010010100100011    f(x0) = 2
x1 = 0100010010110000…0010010100100111    f(x1) = 3
x2 = 0000010010110000…0010010100100111    f(x2) = 5
x3 = 0000000010110000…0010010100100111    f(x3) = 8
x4 = 0000000000110000…0010010100100111    f(x4) = 10
…
(only the first 16 and the last 16 positions are shown)

The Four Peaks problem (4PP)
A problem: when the left or right part of the solution is improved, there is no incentive to work on the other end of the vector:
x2 = 0000010010110000…0010010100100111    f(x2) = 5
x3 = 0000010010110000…0010010100101111    f(x3) = 5
The more improved one end of the vector, the harder it is to improve a solution by gradually modifying the other end:
x4 = 0000000000110000…0010010100100111    f(x4) = 10
x5 = 0000000000110000…0010010111111111    f(x5) = 10
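The evaluations above can be reproduced with a short routine. A minimal C# sketch, assuming the 4PP variant these examples imply (the score is the larger of the number of leading zeros and trailing ones, plus a REWARD of n when both counts exceed the threshold T); the slides do not spell out the formula, so treat this as an illustration:

using System;

static int FourPeaks(int[] x, int T)
{
    int n = x.Length;

    // Count leading zeros and trailing ones.
    int leadingZeros = 0;
    while (leadingZeros < n && x[leadingZeros] == 0) leadingZeros++;
    int trailingOnes = 0;
    while (trailingOnes < n && x[n - 1 - trailingOnes] == 1) trailingOnes++;

    // REWARD of n is granted only when both counts exceed T.
    int reward = (leadingZeros > T && trailingOnes > T) ? n : 0;
    return Math.Max(leadingZeros, trailingOnes) + reward;
}

With n = 100 and T = 15 this assigns f = 100 to the all-zeros string (a local optimum with no reward), matching x0 above.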
The Four Peaks problem (4PP)
Solution: process a population of solutions and allow transferring information between them.
Some solutions improve the left part of the vector, others the right part.
If the right part is not optimized, gradually improving the left part gives a solution an advantage:
xn = 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 … 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1    f(xn) = 10
If the left part is not optimized, gradually improving the right part gives a solution an advantage:
x0 = 0 1 0 0 0 1 0 0 1 0 1 1 0 0 0 0 … 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1    f(x0) = 2
xn = 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 … 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1    f(xn) = 10
The Four Peaks problem (4PP)
Conclusions
If a population of solutions is processed, each population member can improve a different part of the solution
Combining these partial solutions can be beneficial
A less optimized but more diverse population can find a REWARDed solution more easily

Population-based metaheuristics
Design principles
Start with many initial solutions
Evaluate the goal function for these solutions
Use the information obtained from the entire population to guide the search
Details depend on a particular method
For example, in EAs information from parents is combined to generate offspring
Population-based metaheuristics
Advantages
Much less likely to get stuck in local optima
The price: many more evaluated solutions
Evolution vs. a random search
Evolution is not a random process (and neither are evolutionary algorithms)
Mutation can be assumed to be purely random
Selection usually involves an element of chance, but the probability distribution is skewed towards better solutions
The whole process is guided by the selective pressure
Good analogy: a waterfall
Droplets fall chaotically
The entire flow is directed
(Figure annotations: optimization (minimization) of potential energy! Optimization landscape: the shape of the surface under water.)

A broader view
Evolution as a (very) general principle
Selection causes "better" individuals to dominate in the population
Population contains more and more "better" individuals
Statistically, the entire population "moves" towards a "better" region in the search space
Next generation is selected from an already "improved" population
Everything else is just "implementational details"
Biological
Chemical (abiogenesis) – self-replicating molecules
Computer-based
Evolution of memes (ideas, behaviors, or styles that spread by means of imitation from person to person)
Terminology
Individual (a.k.a. specimen)
Can be used to denote both a genotype and a phenotype
Often, genotypes are composed from elements
Examples: vectors, matrices, etc.
Locus (plural: loci), a.k.a. position, gene: where the element can be placed
Allele (a.k.a. value): what is actually stored

Terminology
Representation
Two different meanings:
A particular representation (genotype) of a phenotype
A method of representing phenotypes in the genotype space
Encoding and decoding
Encoding: the mapping from phenotypes to genotypes
Decoding: the mapping from genotypes to phenotypes
Terminology
Population
The unit of evolution (both biological and artificial)
In EAs it is a multiset*) of genotypes
A parameter: Npop, the population size (the number of genotypes)
More advanced population structures:
Neighbourhood structure (e.g. in the MOEA/D algorithm)
In multipopulation algorithms: subpopulations, a migration scheme
Archive
Contains genotypes which we want to prevent from being lost
Example: the best approximation of the Pareto front found so far in multiobjective evolutionary algorithms
In some algorithms (e.g. SPEA2) solutions from the archive can mix with the evolving population
*) A multiset is a set where multiple copies of an element are possible

Terminology
Mating pool
A subset of the population
Contains solutions which are recombined to produce a new generation
Solutions that are included in the mating pool are referred to as parents
Selected in such a way that fitter solutions are preferred (e.g. more copies, more probable to be selected)
Depending on the recombination operators the mating pool can be split into pairs or used in some other way
Differential evolution uses three solutions to create a single new one.
The inver-over operator (dedicated to the TSP) can mutate one parent or recombine it with another one. This is repeated many times, effectively involving numerous solutions in generating a single new one.
*) Unless noted otherwise, the material from the following book is used:
A.E. Eiben, J.E. Smith, "Introduction to Evolutionary Computing", 2nd ed., Springer, 2015.
https://link.springer.com/book/10.1007%2F978-3-662-44874-8
It is free to download from the UEW network (http proxy, virtual machines or labs).

Solution evaluation
A problem-specific numerical value representing the quality we want to optimize.
A Simple Genetic Algorithm

SGA for the Knapsack Problem (KP)
Given n items, each with a weight wi and a value vi, choose the items to pack to a knapsack with the weight limit W so that the total value is maximized.
SGA for the Knapsack Problem (KP)
Remedy: use a constraint-aware solution decoding
Given a binary vector x = [xi], i = 1, …, n, e.g.:
[ 0, 1, 0, 1, 1, 1, 0, 1, 0 ]

SGA for the Knapsack Problem (KP)
Population initialization
Generate Npop = 500 binary vectors
At each position in each vector set 0 or 1 with P = 0.5
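To make this concrete, here is a minimal C# sketch of the initialization and of one common constraint-aware decoder (pack the selected items in order and skip any item that would exceed the limit W); the slides' exact decoding rule is not shown in this text, so the skipping strategy is an assumption:

using System;

// Population initialization: Npop random binary vectors, each gene 1 with P = 0.5.
static int[][] InitPopulation(int npop, int n, Random rng)
{
    var pop = new int[npop][];
    for (int p = 0; p < npop; p++)
    {
        pop[p] = new int[n];
        for (int i = 0; i < n; i++) pop[p][i] = rng.Next(2);
    }
    return pop;
}

// Constraint-aware decoding: items with gene 1 are packed in order;
// an item that would exceed the weight limit W is simply skipped.
static double Decode(int[] x, double[] w, double[] v, double W)
{
    double weight = 0.0, value = 0.0;
    for (int i = 0; i < x.Length; i++)
        if (x[i] == 1 && weight + w[i] <= W)
        {
            weight += w[i];
            value += v[i];
        }
    return value; // evaluation f(x) of the decoded, feasible solution
}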
SGA for the Knapsack Problem (KP)
Crossover
Single-point and two-point
n-point
A generalization of the two-point crossover
The vector is cut at n randomly selected positions
Uniform
Values from each position are independently assigned to each of the two offspring with P = 0.5
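A minimal C# sketch of two of these operators (single-point and uniform); the helper names are illustrative:

using System;

// Single-point: cut both parents at one random position and swap the tails.
static (int[], int[]) SinglePoint(int[] p1, int[] p2, Random rng)
{
    int n = p1.Length;
    int cut = rng.Next(1, n);           // cut point in 1..n-1
    var o1 = new int[n]; var o2 = new int[n];
    for (int i = 0; i < n; i++)
    {
        o1[i] = i < cut ? p1[i] : p2[i];
        o2[i] = i < cut ? p2[i] : p1[i];
    }
    return (o1, o2);
}

// Uniform: each position is assigned to each offspring independently, P = 0.5.
static (int[], int[]) Uniform(int[] p1, int[] p2, Random rng)
{
    int n = p1.Length;
    var o1 = new int[n]; var o2 = new int[n];
    for (int i = 0; i < n; i++)
    {
        bool swap = rng.Next(2) == 0;
        o1[i] = swap ? p2[i] : p1[i];
        o2[i] = swap ? p1[i] : p2[i];
    }
    return (o1, o2);
}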
SGA for the Knapsack Problem (KP)
Parent selection
Binary tournament
Select two individuals randomly from the population with uniform probability
The one with the higher value is selected
Resolve draws at random
For the crossover operators we need two parents, so we perform two separate tournaments
Note that the tournament is binary because two individuals take part. It has nothing to do with the binary representation used for the Knapsack Problem (it can be used, for example, also for the TSP based on permutations). When more than two individuals take part we get an n-ary tournament.
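A minimal C# sketch of one binary tournament (run it twice to obtain two parents); the array names are assumptions:

using System;

static int[] BinaryTournament(int[][] pop, double[] fitness, Random rng)
{
    int i = rng.Next(pop.Length);       // two uniformly random individuals
    int j = rng.Next(pop.Length);
    if (fitness[i] == fitness[j])       // resolve draws at random
        return rng.Next(2) == 0 ? pop[i] : pop[j];
    return fitness[i] > fitness[j] ? pop[i] : pop[j];
}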
SGA for the Knapsack Problem (KP)
Reproduction
With the probability Pcross: apply the crossover operator and evaluate the new solutions
With the probability 1 - Pcross: just copy the parents (the evaluation does not have to be recalculated)
A setting of Pcross = 0.7 balances exploitation (keeping good solutions in the population) with exploration (generating new solutions)
(Diagram: mating pool → next population)

SGA for the Knapsack Problem (KP)
Mutation
Bit-flip mutation
Applied independently to each position in each vector
With the probability Pmut the value is flipped from 0 to 1 or vice versa
A typical setting is Pmut = 1/n
On average one value is changed in each individual
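A minimal C# sketch of bit-flip mutation:

using System;

// Each position flips independently with probability Pmut (typically 1/n).
static void BitFlip(int[] x, double pMut, Random rng)
{
    for (int i = 0; i < x.Length; i++)
        if (rng.NextDouble() < pMut)
            x[i] = 1 - x[i]; // flip 0 -> 1 or 1 -> 0
}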
SGA for the Knapsack Problem (KP)
Survival selection
Generational: the offspring population replaces the parent population
Additional mechanisms
Elitism: the best solution is always copied to the next generation without change
External archiving: a copy of the best solution is made after each generation and put aside (but not included in the population)

SGA for the Knapsack Problem (KP)
Summary of design choices proposed in the Eiben & Smith book
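Put together, one generational step of the SGA might look as follows. A C# sketch only, wiring up the helpers sketched above (BinaryTournament, SinglePoint, BitFlip are the illustrative names introduced earlier); it implements generational replacement with elitism:

using System;
using System.Collections.Generic;

static int[][] NextGeneration(int[][] pop, double[] fit,
                              double pCross, double pMut, Random rng)
{
    int npop = pop.Length;
    var next = new List<int[]>();

    // Elitism: the best solution is copied unchanged.
    int best = 0;
    for (int i = 1; i < npop; i++) if (fit[i] > fit[best]) best = i;
    next.Add((int[])pop[best].Clone());

    while (next.Count < npop)
    {
        int[] p1 = BinaryTournament(pop, fit, rng);
        int[] p2 = BinaryTournament(pop, fit, rng);
        int[] o1 = (int[])p1.Clone(), o2 = (int[])p2.Clone();
        if (rng.NextDouble() < pCross)
            (o1, o2) = SinglePoint(p1, p2, rng);   // otherwise: plain copies
        BitFlip(o1, pMut, rng);
        BitFlip(o2, pMut, rng);
        next.Add(o1);
        if (next.Count < npop) next.Add(o2);
    }
    return next.ToArray();
}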
Real-coded representation

Real-coded representation
Useful for problems described by numerical parameters, for example, engineering problems[1]
[1] Correia J., Ferreira F., "Designing Cable-Stayed Bridges with Genetic Algorithms", in: Castillo P. A. et al. (eds.): EvoApplications 2020, LNCS 12104, pp. 228–243, 2020.

If the combination coefficient is drawn from [-0.25, 1.25] we call it the extended line crossover[1]
[1] Herrera F., Lozano M., Verdegay J. L., "Tackling Real-Coded Genetic Algorithms: Operators and Tools for Behavioural Analysis", Artificial Intelligence Review, vol. 12, pp. 265–319, 1998.
Real-coded representation
BLX-α crossover
One offspring H = (h1, …, hd) is generated, where hi is a randomly (uniformly) chosen value from the interval:
[cmin – I·α, cmax + I·α]
where:
cmin = min(ci1, ci2)
cmax = max(ci1, ci2)
I = cmax – cmin
BLX-0.00 = flat crossover
BLX-0.25 = extended intermediate crossover

Real-coded representation
BLX-α crossover – the role of the α parameter
BLX-0.00: hi ∈ [cmin, cmax]
If parents are in the basin of attraction of a suboptimum x', the new solution converges to x'
BLX-0.25: hi ∈ [cmin – I·0.25, cmax + I·0.25]
The new solution can be located outside the basin of attraction of a suboptimum x' and may converge to the true optimum x*
(Figures: f(x) over the domain Ω with the hi range marked; x' is a suboptimum, x* the true optimum)
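A minimal C# sketch of BLX-α, following the interval definition above:

using System;

static double[] BlxAlpha(double[] c1, double[] c2, double alpha, Random rng)
{
    var h = new double[c1.Length];
    for (int i = 0; i < c1.Length; i++)
    {
        double cMin = Math.Min(c1[i], c2[i]);
        double cMax = Math.Max(c1[i], c2[i]);
        double I = cMax - cMin;
        double lo = cMin - I * alpha, hi = cMax + I * alpha;
        h[i] = lo + rng.NextDouble() * (hi - lo); // uniform in the extended interval
    }
    return h;
}

With alpha = 0.0 this is the flat crossover; with alpha = 0.25 the extended intermediate crossover.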
Permutation-based representation

Permutation-based representation
Useful for problems in which the ordering of items matters
The TSP: the order in which to visit cities.
Scheduling problems: the order in which to perform operations.
Permutation-based representation
Population initialization – random permutations of n elements
The Fisher-Yates shuffle[1,2]

// All numbers from 1 to n
for i from 0 to n−1 do
    a[i] = i + 1
// Shuffling
for i from 0 to n−2 do
    j := a random integer such that i ≤ j < n
    exchange a[i] and a[j]

[1] Knuth D. E., "The Art of Computer Programming. Volume 2, Seminumerical Algorithms", Reading, MA: Addison–Wesley, pp. 139–140, 1969.
[2] https://en.wikipedia.org/wiki/Fisher-Yates_shuffle

Permutation-based representation
The need for dedicated crossover operators for permutations
We can store permutations as vectors of integers
However, generic crossovers for vectors (arrays) produce invalid permutations
Example: a single point crossover
1 2 3 | 4 5 6        1 2 3 | 3 2 1
6 5 4 | 3 2 1   →    6 5 4 | 4 5 6
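The pseudocode above translates directly into C#; a small sketch:

using System;

// Generates a random permutation of 1..n using the Fisher-Yates shuffle.
static int[] RandomPermutation(int n, Random rng)
{
    var a = new int[n];
    for (int i = 0; i < n; i++) a[i] = i + 1;   // all numbers from 1 to n
    for (int i = 0; i < n - 1; i++)
    {
        int j = rng.Next(i, n);                 // random j with i <= j < n
        (a[i], a[j]) = (a[j], a[i]);            // exchange a[i] and a[j]
    }
    return a;
}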
Permutation-based representation
Cycle Crossover (CX)
1. Divide parents into cycles
2. From the first, third, etc. cycles copy the elements P1 → O1 and P2 → O2
3. From the second, fourth, etc. cycles copy the elements P1 → O2 and P2 → O1

Finding a cycle:
1. Pick a position not assigned to any cycle yet
2. Start the cycle with the value v1 found at this position in P1
3. Look at the value v2 in the same position in P2
4. Add v2 to the cycle
5. Go to the position with the value v2 in P1
6. Repeat steps 3 through 5 until you arrive again at the value v1 in P1. When you do, the cycle is completed.

Example: the first cycle is (1, 9, 4, 8). Elements from this cycle are assigned: P1 → O1 and P2 → O2.
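A compact C# sketch of CX for permutations of 1..n, following the procedure above:

using System;

static (int[], int[]) CycleCrossover(int[] p1, int[] p2)
{
    int n = p1.Length;
    var posInP1 = new int[n + 1];             // value -> its position in P1
    for (int i = 0; i < n; i++) posInP1[p1[i]] = i;

    var cycleOf = new int[n];                 // 0 = position not assigned yet
    int cycle = 0;
    for (int start = 0; start < n; start++)
    {
        if (cycleOf[start] != 0) continue;
        cycle++;
        int pos = start;
        do
        {
            cycleOf[pos] = cycle;
            pos = posInP1[p2[pos]];           // follow the value from P2 back into P1
        } while (pos != start);
    }

    var o1 = new int[n]; var o2 = new int[n];
    for (int i = 0; i < n; i++)
    {
        bool odd = cycleOf[i] % 2 == 1;       // first, third, etc. cycles
        o1[i] = odd ? p1[i] : p2[i];
        o2[i] = odd ? p2[i] : p1[i];
    }
    return (o1, o2);
}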
Mumford C. L., "New order-based crossovers for the graph coloring problem", in: Runarsson T. P., Beyer H. G., Burke E., Merelo-Guervos J. J., Whitley D. L., Yao X. (eds.), PPSN IX, LNCS, vol. 4193, pp. 880–889, Springer, Heidelberg, 2006.

Non-Wrapping Order Crossover (NWOX)
Cicirello V. A., "Non-wrapping order crossover: an order preserving crossover operator that respects absolute position", in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 1125–1132, ACM, New York, NY, USA, 2006.

Precedence Preservative Crossover (PPX)
Bierwirth C., Mattfeld D. C., Kopfer H., "On permutation representations for scheduling problems", in: Proceedings of the 4th International Conference on Parallel Problem Solving from Nature, pp. 310–318, Springer, 1996.
Blanton J. L. Jr., Wainwright R. L., "Multiple vehicle routing with time and capacity constraints using genetic algorithms", in: Proceedings of the 5th International Conference on Genetic Algorithms, pp. 452–459, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
Permutation-based representation
Swap Mutation
Two positions in the chromosome are selected at random and their values are swapped
Insert Mutation
Two values are selected at random and the second is moved next to the first, shifting the others to make room
Scramble Mutation
The entire chromosome, or some randomly chosen subset of it, is randomly shuffled
Inversion Mutation
Two positions in the chromosome are selected and the elements between them are placed in the reversed order

Permutation-based representation
(Figure: illustration of the mutation operators for the permutation-based representation)
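Two of these operators in a few lines of C# (sketches):

using System;

// Swap Mutation: swap the values at two random positions.
static void SwapMutation(int[] s, Random rng)
{
    int i = rng.Next(s.Length), j = rng.Next(s.Length);
    (s[i], s[j]) = (s[j], s[i]);
}

// Inversion Mutation: reverse the segment between two random positions.
static void InversionMutation(int[] s, Random rng)
{
    int i = rng.Next(s.Length), j = rng.Next(s.Length);
    if (i > j) (i, j) = (j, i);
    Array.Reverse(s, i, j - i + 1);
}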
Dedicated operators for the TSP
The Inver-Over Operator[1]
Random inversion rate parameter p
With probability 1−p: transfer of information from another solution using the "guided inversion" operation
Typical setting: p = 0.02

Dedicated operators for the TSP
The Inver-Over Operator – example
Assume that the current individual is:
S = [ 2, 3, 9, 4, 1, 5, 8, 6, 7 ]
A random city is selected from S, let's say c = 3
A random number x is drawn from [0, 1]
If x ≤ p then inversion mutation is performed:
Another random city is selected from S, let's say c' = 8
The solution S is updated – the segment (c, c'] is reversed
S ← [ 2, 3, 8, 5, 1, 4, 9, 6, 7 ]

[1] Tao G., Michalewicz Z., "Inver-over operator for the TSP", in: Eiben A. E., Back T., Schoenauer M., Schwefel H. P. (eds.), Parallel Problem Solving from Nature PPSN V, Lecture Notes in Computer Science, vol. 1498, pp. 803–812, Springer, Berlin Heidelberg, 1998.
Dedicated operators for the TSP
The Inver-Over Operator – example
If x > p then another solution is randomly selected:
S' = [ 1, 6, 4, 3, 5, 7, 9, 2, 8 ]
The city c' is selected as the next one after c in S'
The solution S is updated – the segment (c, c'] is reversed:
S  = [ 2, 3, 9, 4, 1, 5, 8, 6, 7 ]
S ← [ 2, 3, 5, 1, 4, 9, 8, 6, 7 ]
Information is transferred from S' to S: the next city after c becomes c' (taken from S')

Dedicated operators for the TSP
The 2-opt operator
Randomly selects two non-consecutive edges <c1, c2> and <c3, c4>
Replaces them with the two other edges, which do not cut the tour in two: <c1, c3> and <c2, c4>
This operator is intended to untangle loops in the tour
Used as a local search operator
(source: http://www.devx.com/dotnet/Article/33574/0/page/3)
Dedicated operators for the TSP
The 2-opt operator – C# implementation
Inputs:
perm – the solution we start from
costs – the cost matrix
Output:
perm – the solution improved using 2-opt
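The implementation on the slides is shown as a code listing (not reproduced in this text). A minimal sketch with the same inputs (perm, costs), assuming a symmetric cost matrix; a move is accepted whenever the delta evaluation is negative:

using System;

static void TwoOpt(int[] perm, double[,] costs)
{
    int n = perm.Length;
    bool improved = true;
    while (improved)
    {
        improved = false;
        for (int i = 0; i < n - 1; i++)
            for (int j = i + 1; j < n; j++)
            {
                // Edges <a, b> and <c, d> of the (cyclic) tour.
                int a = perm[i], b = perm[(i + 1) % n];
                int c = perm[j], d = perm[(j + 1) % n];
                if (a == c || b == c || a == d) continue; // consecutive edges

                // Delta evaluation: cost change when <a, b> and <c, d>
                // are replaced with <a, c> and <b, d>.
                double delta = costs[a, c] + costs[b, d]
                             - costs[a, b] - costs[c, d];
                if (delta < -1e-12)
                {
                    Array.Reverse(perm, i + 1, j - i); // reverse perm[i+1..j]
                    improved = true;
                }
            }
    }
}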
Dedicated operators for the TSP
The 2-opt operator – key features
Very fast, because only delta evaluation is needed
An informed operator
Uses information about the solved problem
The cost matrix is provided
Fitness
A single number that represents how good a solution is
Should be higher for better individuals
For maximization problems a possible choice is fitness(x) = f(x)
For minimization problems the fitness should be higher for lower f(x)
Fitness can summarize multiple objectives as well as other aspects, such as crowding or constraint violation

Fitness-proportionate selection
Fitness assignment
A scheme which determines how the fitness depends on the evaluation f(x)
If values of f(x) are large, and differences are small, there can be little selective pressure
Use normalized evaluations as fitness:
Maximization: fitness(x) = (f(x) − fmin) / (fmax − fmin)
Minimization: fitness(x) = (fmax − f(x)) / (fmax − fmin)
fmin, fmax calculated in the whole population

Fitness assignment
Rank-based fitness
Order solutions in the population w.r.t. f(x)
Use the (normalized) position in the ranking as fitness
For constrained optimization problems fitness can be reduced by a penalty
For feasible solutions: no penalty
Penalty increases when constraints are violated
Infeasible solutions can contain valuable information
They should neither be allowed to dominate the population nor to go extinct entirely
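The normalization formulas in C# (a sketch):

using System.Linq;

static double[] NormalizedFitness(double[] f, bool maximize)
{
    double fMin = f.Min(), fMax = f.Max();
    double range = fMax - fMin;
    var fit = new double[f.Length];
    for (int i = 0; i < f.Length; i++)
        fit[i] = range == 0 ? 1.0                       // all equal: uniform fitness
               : (maximize ? (f[i] - fMin) / range
                           : (fMax - f[i]) / range);
    return fit;
}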
Stochastic remainder selection
Given parameters:
The population size n
The fitness values fi, for i = 1, …, n
The expected number ei of copies that statistically should be carried over to the next generation is calculated for i = 1, …, n:
ei = n · fi / (f1 + f2 + … + fn)
The truncated integer value ⌊ei⌋ determines how many copies of the i-th solution are propagated to the next generation
The fractional parts of each ei are used to allocate proportionally-sized slots on a roulette wheel
The roulette wheel is randomly spun and the chosen individuals are copied to the next generation until the next generation contains n individuals
This way at least ⌊ei⌋ copies are guaranteed to survive
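A C# sketch of the scheme just described: the deterministic part comes from ⌊ei⌋, the remainder from a roulette wheel over the fractional parts:

using System;
using System.Collections.Generic;
using System.Linq;

static int[][] StochasticRemainder(int[][] pop, double[] f, Random rng)
{
    int n = pop.Length;
    double sum = f.Sum();
    var next = new List<int[]>();
    var frac = new double[n];

    for (int i = 0; i < n; i++)
    {
        double ei = n * f[i] / sum;                  // expected number of copies
        int copies = (int)Math.Floor(ei);            // guaranteed copies
        frac[i] = ei - copies;                       // roulette-wheel slot size
        for (int c = 0; c < copies; c++) next.Add((int[])pop[i].Clone());
    }

    double fracSum = frac.Sum();
    while (next.Count < n)                           // fill the rest by roulette
    {
        double r = rng.NextDouble() * fracSum;
        int i = 0;
        while (i < n - 1 && r >= frac[i]) { r -= frac[i]; i++; }
        next.Add((int[])pop[i].Clone());
    }
    return next.ToArray();
}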
The role of EA elements
Selection
Key driver of evolution (both biological and artificial)
Causes better solutions to have a higher chance of survival
Higher selective pressure:
Better solutions found faster
Faster convergence (diversity decreases more rapidly)
Higher risk of getting stuck in local suboptima

The role of EA elements
Selection
Increasing selective pressure:
More solutions participating in a tournament
Nonlinear fitness assignment (e.g. f(x)^α with α > 1)
Truncation selection (the best Npop solutions selected from a union of parents and offspring)
Decreasing selective pressure:
Allow the worse solution to win the tournament with some small probability
Nonlinear fitness assignment (e.g. f(x)^α with α < 1)
EA description
Initial population
200 specimens
Each specimen is a randomly generated permutation
Crossover
CX, LOX, MOX, NWOX, OBX, OX, PBX, PMX, PPX, UPMX
In each generation one operator selected randomly
Mutation
Displacement, Insertion, Inversion, Scramble, Transpose
In each generation one operator selected randomly
Note that these operators are not TSP-specific and can be used for many permutation-based problems

EA variants
"Random"
In each generation a new population is randomly generated
No operators, no selection
Random search, not an evolutionary algorithm
"Random selection"
Crossover and mutation
Specimens for the new generation are randomly selected
Full EA, but with random selection
No selective pressure
EA variants
"Elitism only"
The same as the "random selection" variant
The best specimen is preserved and copied to the next generation
Full EA, but with all specimens randomly selected, except the best one

EA variants
"No crossover" and "no mutation"
Only one of the operators is used
Specimens for the new generation are selected using the binary tournament
This creates a selective pressure
EA with one of the operators missing

EA variants
"All" – full EA, no missing parts
Both operators are used
Crossover: allows information transfer
Mutation: increases diversity
Specimens for the new generation are selected using the binary tournament
This creates a selective pressure

Plots
Generation 0 – the best solution in the initial population: f(x) = 6 216 147
Plots
Best f(x) found by each EA variant (lower is better):

Generation    Random      Random sel.   Elitism only   No crossover   No mutation   All
10 000        5 956 257   5 968 342     4 277 255      863 838        3 868 928     1 056 090
100 000       5 956 257   5 954 514     2 157 273      305 276        3 868 928     320 296
500 000       5 935 172   5 931 241     1 079 215      282 174        3 868 928     280 468
1 000 000     5 935 172   5 931 241     801 917        281 355        3 868 928     279 238
Conclusions
For this problem mutation is more important than crossover
The best solution found is 7.8% worse than optimal
Note that no TSP-specific elements were used

TSP-specific algorithm improvements
A local search based on the 2-opt operator
Untangles "loops"
Is known to greatly improve the results
3-opt (and possibly also k-opt with k > 3)
Lin-Kernighan (LK) algorithm
A variable k-opt algorithm
Decides which k is the most suitable at each iteration step
TSP-specific algorithm improvements
A comparison of the EA variants presented before with the EA using the 2-opt operator as the local search