An architecture for massively parallelization of
the compact genetic algorithm
arXiv:cs/0402049v1 [cs.NE] 20 Feb 2004
Fernando G. Lobo, Cláudio F. Lima, and Hugo Mártires
ADEEC-FCT
Universidade do Algarve
Campus de Gambelas
8000-062 Faro, Portugal
{flobo,clima}@ualg.pt, hmartires@myrealbox.com
Abstract. This paper presents an architecture which is suitable for a
massive parallelization of the compact genetic algorithm. The resulting
scheme has three major advantages. First, it has low synchronization
costs. Second, it is fault tolerant. Third, it is scalable.
The paper argues that the benefits that can be obtained with the proposed approach are potentially higher than those obtained with traditional
parallel genetic algorithms. In addition, the ideas suggested in the paper
may also be relevant towards parallelizing more complex probabilistic
model building genetic algorithms.
1 Introduction
There have been several efforts in the field of evolutionary computation towards
improving the genetic algorithm’s efficiency. One of the efficiency enhancement
techniques that has been investigated, both in theory and in practice, is the topic
of parallelization [1].
In this paper, parallelization is investigated further, this time in the context
of Probabilistic Model Building Genetic Algorithms (PMBGAs), a class of genetic algorithms whose operational mechanics differ somewhat from those of the
traditional GAs.
Efficiency is a very important factor in problem solving. When talking about
computer algorithms, efficiency is usually addressed along two major axes: Time
and Memory requirements needed to solve a problem. In the context of genetic
and other evolutionary algorithms, there is another axis, Solution Quality, that
also needs to be addressed. This third aspect comes into play because many of
the problems that genetic and evolutionary algorithms attempt to solve cannot
be solved optimally with 100% confidence unless a complete enumeration of the
search space is performed. Therefore, genetic algorithms, as well as many other
methods, use a search bias to try to give good approximate solutions without
doing a complete exploration of the search space.
Summarizing, efficiency in genetic algorithms translates into a 3-objective
problem: (1) maximize solution quality, (2) minimize execution time, and (3)
minimize memory resources. The latter is usually not a great concern because in
traditional GAs, memory requirements are constant throughout a run, leaving
us with a tradeoff between solution quality and execution time.
The rest of the paper is organized as follows. The next section presents background material on parallel GAs and on PMBGAs. Then, section 3 raises some
issues that can be explored when parallelizing PMBGAs that were not possible with regular GAs. Section 4 presents an architecture that allows a massive
parallelization of the compact GA, and in section 5 computer experiments are
conducted and their results are discussed. Finally, a number of extensions are outlined, and the paper finishes with a summary and some conclusions.
2 Background
This section presents background material which is necessary for understanding
the rest of the paper. It starts with a review of the major issues involved in
parallelizing GAs, and then reviews the basic ideas of PMBGAs giving particular
emphasis to the compact GA.
2.1 Parallel GAs
An important efficiency question that people are faced with in problem solving is
the following: Given a fixed computational time, what is the best way to allocate
computer resources in order to have as good a solution as possible?
Under such a challenge, the idea of parallelization stands out naturally as a
way of improving the efficiency of the problem solving task. By using multiple
computers in parallel, there is an opportunity for delivering better solutions in
a shorter period of time.
Many computer algorithms are difficult to parallelize, but that is not the
case with GAs because GAs work with a population of solutions which can be
evaluated independently of one another. Moreover, in many problems most of
the time is spent on evaluating solutions rather than on the internal mechanisms
of the GA operators themselves. Indeed, the time spent on the GA operators is
usually negligible compared to the time spent on evaluating individual solutions.
Several researchers have investigated the topic of parallel GAs. The major design issues lie in choices such as using one or more populations and, in the case of multiple populations, deciding when, with whom, and how often individuals communicate with individuals of other populations.
Although implementing parallel genetic algorithms is relatively simple, the
answers to the questions raised above are not so straightforward and traditionally
have only been answered by means of empirical experimentation. One exception
to that has been the work of Cantú-Paz [1] who has built theoretical models
that lead to rational decisions for setting the different parameters involved in
parallelizing GAs. There are two major ways of implementing parallel GAs:
1. Using a single population.
2. Using multiple populations.
In single population parallel GAs, also called Master-Slave parallel GAs, one
computer (the master) executes the GA operations and distributes individuals to
be evaluated by other computers (the slaves). After evaluating the individuals,
the slaves return the results back to the master. There can be significant benefits
with such a scheme because the slaves can work in parallel, independently of one
another. On the other hand, there is an extra overhead in communication costs
that must be paid in order to communicate individuals and fitness values back
and forth.
In multiple population parallel GAs, what would be a whole population in
a regular non-parallel GA, becomes several smaller populations (usually called
demes), each of which is located in a different computer. Each computer executes
a regular GA and occasionally, individuals may be exchanged with individuals
from other populations. Multiple population parallel GAs are much harder to
design because there are more degrees of freedom to explore. Specifically, four
main things need to be chosen: (1) the size of each population, (2) the topology
of the connection between the populations, (3) the number of individuals that
are exchanged, and (4) how often the individuals are exchanged.
Cantú-Paz investigated both approaches and concluded that for the case of
the Master-Slave architecture, the benefits of parallelization occur mainly on
problems with long function evaluation times because it requires constant communication. Multiple population parallel GAs have lower communication costs but
do not completely avoid the communication scalability problem. In other words,
in either approach, communication costs impose a limit on how fast parallel GAs
can be. To overcome this limitation, Cantú-Paz proposed a combination of the
two approaches in what was called Hierarchical Parallel GAs, and verified that
when using such an approach it is possible to reduce the execution time more
than by using either approach alone. The interested reader is referred to the
original source for the mathematical formulation and for additional information
on the design of parallel GAs.
2.2 Probabilistic Model Building Genetic Algorithms
Probabilistic Model Building Genetic Algorithms (PMBGAs), also referred to by
some authors as Estimation of Distribution Algorithms (EDAs), or Iterated Density Evolutionary Algorithms (IDEAs), are a class of Evolutionary Algorithms
that replace the traditional variation operators, crossover and mutation, by the
construction of a probabilistic model of the population and subsequent sampling
from that model to obtain a new population of individuals. The operation of
PMBGAs can be summarized by the following 5 steps:
1. Create a random population of individuals.
2. Apply selection to obtain a population of “good” individuals.
3. Build a probabilistic model of those good individuals.
4. Generate a new population according to the probabilistic model.
5. Return to step 2.
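As a concrete illustration, the five steps above map onto a few lines of code. The sketch below runs an order-1 PMBGA on the simple OneMax problem (maximize the number of 1-bits); all names and parameter values are ours, chosen only for illustration:

```python
import random

def pmbga_onemax(length=20, pop_size=60, n_selected=30, generations=40, seed=1):
    """Toy order-1 PMBGA on OneMax, following the five steps in the text."""
    rng = random.Random(seed)
    # Step 1: create a random population of individuals.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: truncation selection of the "good" individuals.
        pop.sort(key=sum, reverse=True)
        selected = pop[:n_selected]
        # Step 3: order-1 model -- frequency of 1s at each gene position.
        model = [sum(ind[i] for ind in selected) / n_selected
                 for i in range(length)]
        # Step 4: generate a new population by sampling the model
        # (step 5: the loop returns to selection).
        pop = [[1 if rng.random() < q else 0 for q in model]
               for _ in range(pop_size)]
    return max(sum(ind) for ind in pop)

print(pmbga_onemax())
```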
Work on this area began with simple probabilistic models that treated each
gene independently, sometimes also called order-1 models. Later, more complex
algorithms were developed to allow dependencies among genes. A detailed review
of these algorithms can be found elsewhere [2] [3].
The next subsection reviews in detail the compact GA [4], which is an example of an order-1 PMBGA and whose parallelization is discussed later in the
paper.
2.3 The Compact Genetic Algorithm
Consider a 5-bit problem with a population of 10 individuals as shown below:
10000
11001
01111
11000
01101
01110
11000
10000
01101
10011
Under the compact GA, the population can be represented by the following
probability vector:
0.6 0.7 0.4 0.3 0.5
The probabilities are the relative frequency counts of the number of 1’s for
the different gene positions, and can be interpreted as a compact representation
of the population. In other words, the individuals of the population could have
been sampled from the probability vector.
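This frequency-counting step is straightforward to express in code. The sketch below reproduces the example above (function and variable names are ours):

```python
def probability_vector(population):
    """Relative frequency of 1s at each gene position -- the compact
    representation of the population used by the compact GA."""
    n = len(population)
    length = len(population[0])
    return [sum(ind[i] for ind in population) / n for i in range(length)]

# The 10-individual, 5-bit population from the text.
pop = [
    "10000", "11001", "01111", "11000", "01101",
    "01110", "11000", "10000", "01101", "10011",
]
bits = [[int(c) for c in s] for s in pop]
print(probability_vector(bits))  # [0.6, 0.7, 0.4, 0.3, 0.5]
```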
Harik et al. [4] noticed that it was possible to mimic the behavior of a simple
GA, without storing the population explicitly. This observation came from the
fact that during the course of a regular GA run, alleles compete with each other
at every gene position. At the beginning, scanning the population column-wise,
we should expect to observe that roughly 50% of the alleles have value 0 and
50% of the alleles have value 1. As the search progresses, for each column, either
the zeros take over the ones, or vice-versa. Harik et al. built an algorithm that
explicitly simulates the random walk that takes place on the allele frequency
makeup for every gene position. The resulting algorithm, the compact GA, was
shown to be operationally equivalent to a simple GA that does not assume any
linkage between genes.
The compact GA does not follow exactly the 5 steps mentioned previously (in
section 2.2) for a typical PMBGA, because the algorithm does not manipulate
the population explicitly. Instead, it does so in an indirect way through the
update step of 1/N , where N denotes the population size of a regular GA.
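A minimal sketch of this update loop, assuming the pairwise-tournament form of the compact GA (the original algorithm also supports stronger selection rates), could look like the following; function and parameter names are ours:

```python
import random

def compact_ga(fitness, length, n=1000, max_evals=200_000, seed=3):
    """Illustrative compact GA: sample two individuals from the probability
    vector, hold a tournament, and shift each disagreeing position by
    +-1/n toward the winner (the 1/N update step from the text)."""
    rng = random.Random(seed)
    p = [0.5] * length

    def sample():
        return [1 if rng.random() < q else 0 for q in p]

    evals = 0
    while evals < max_evals and any(0.0 < q < 1.0 for q in p):
        a, b = sample(), sample()
        evals += 2
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(length):
            if winner[i] != loser[i]:
                step = 1.0 / n if winner[i] == 1 else -1.0 / n
                p[i] = min(1.0, max(0.0, p[i] + step))
    return p
```

On OneMax, for example, `compact_ga(sum, 10, n=200)` drives almost every entry of the probability vector to 1.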
3 Motivation for parallelizing PMBGAs
The main motivation for parallelizing PMBGAs is the same as the one for parallelizing regular GAs, or any other algorithm: efficiency. By using multiple computers it is possible to make the algorithm run faster.
In many ways, parallelizing PMBGAs is similar to parallelizing
regular GAs. On the other hand, the mechanics of PMBGAs are different from
those of regular GAs, and it is possible to take advantage of that. Specifically,
it is possible to increase efficiency by exploring the following two things:
1. Parallelize model building.
2. Communicate model rather than individuals.
In regular GAs, the time spent on the GA operations (selection, crossover,
and mutation) is usually negligible compared to the time spent in fitness function
evaluations. When using PMBGAs, and especially when using multivariate models, the model-building phase is much more compute intensive than the usual
crossover and mutation operators of a regular GA. For many problems, such
overhead can contribute to a significant fraction of the overall execution time. In
such cases, it makes a lot of sense to parallelize the model-building phase. There
have been a couple of research efforts addressing this topic [5] [6].
Another aspect that makes PMBGAs very attractive for parallelization comes
from the observation that the model is a compact representation of the population, and it is possible to communicate the model rather than individuals
themselves. Communication costs can be reduced this way because the model
needs significantly less storage than the whole population. Since communication
costs can be drastically reduced, it might make sense to clone the model to
several computers, and each computer could work independently on solving the
problem by running a separate PMBGA. Then, the different models would have
to be consolidated (or mixed) once in a while. The next section presents an
architecture that implements this idea with the compact GA.
4 An architecture for building a massively parallel compact GA
This section presents an architecture which is suitable for a scalable parallelization of the compact GA. Similar schemes are possible with other order-1 PMBGAs. However, the connection that exists between the population size and the
update step makes the compact GA more suitable when working with very large
populations, a topic that is revisited later.
Since the model-building phase of the compact GA is trivial, our study focuses only on the second item mentioned in section 3: communicating the model
rather than individuals. In the case of the compact GA, the model is represented
by a probability vector of size ℓ (ℓ is the chromosome length). Each variable of
the probability vector contains a value which has to be a member of a finite set
of N + 1 values (N denotes the size of the population that the compact GA
is simulating). The N + 1 numbers correspond to all possible allele frequency
counts for a particular gene (0, 1, 2, . . . , N ), and can be stored with log2 (N + 1)
bits. Therefore, the probability vector can be represented with ℓ × log2 (N + 1)
bits. This value is of a different order of magnitude than the ℓ × N bits needed
to represent a population in a regular GA, making it feasible to communicate
the model back and forth between different computers.
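The arithmetic behind these two storage figures can be checked directly (the helper names are ours):

```python
from math import ceil, log2

def model_bits(length, n):
    """Bits to encode the compact GA probability vector: each of the
    `length` entries is one of n + 1 frequency counts (0..n)."""
    return length * ceil(log2(n + 1))

def population_bits(length, n):
    """Bits to store an explicit population of n binary individuals."""
    return length * n

# The example discussed next in the text: a 1000-bit problem,
# population of one million individuals.
print(model_bits(1000, 10**6))       # 20000 bits
print(population_bits(1000, 10**6))  # 1000000000 bits (~1 gigabit)
```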
The storage savings are especially important when using large populations.
For instance, let us suppose that we are interested in solving a 1000-bit problem
using a population of size 1 million. With a regular parallel GA, in order to communicate the whole population it would be necessary to transmit approximately
1 gigabit over a network. Instead, with the compact GA, it would only be necessary to transmit 20 thousand bits. The difference is large and suggests that
running multiple compact GAs in parallel with model exchanges once in a while
is something that deserves to be explored. We have devised an architecture, which we call manager-worker, that implements this idea. Figure 1 shows a schematic
of the approach.
[Figure: the manager sits at the center, exchanging models and model differences with workers #1 through #n; the workers do not communicate with each other.]
Fig. 1. Manager-worker architecture.
Although Figure 1 resembles a master-slave configuration, we decided to give
it a different name (manager-worker) to contrast with the usual master-slave
architecture of regular parallel GAs. There, the master executes and coordinates
the GA operations and the slaves just compute fitness function evaluations. In
the case of the parallel compact GA that we are suggesting, the manager also
coordinates the work of the workers, but each worker runs a compact GA on
its own. There can be an arbitrary number of workers and there is no direct
communication among them; the only communication that takes place occurs
between the manager and a worker.
4.1 Operational details
One could think of different ways of parallelizing the compact GA. Indeed, some
researchers have proposed different schemes [7] [8]. The way that we are about to
propose is particularly attractive because once the manager starts, there can be
an arbitrary number of workers, each of which can start and finish at any given
point in time making the whole system fault tolerant. The operational details
consist of the following seven steps:
1. The manager initializes a probability vector of size ℓ with each variable set
to 0.5. Then it goes to sleep, and waits to be woken up by some worker
computer.
2. When a worker computer enters in action for the first time, it sends a signal
to the manager saying that it is ready to start working.
3. The manager wakes up, sends a copy of its probability vector to the worker,
and goes back to sleep.
4. Once the worker receives the probability vector, it explores m new individuals
with a compact GA. During this period, m fitness function evaluations are
performed and the worker’s local probability vector (which initially is just a
copy of the manager’s probability vector) is updated along the way.
5. After m fitness function evaluations have elapsed, the worker wakes up the
manager in order to report the results of those m function evaluations. The
results can be summarized by sending only the differences that occurred
between the vector that was sent from the manager and the worker’s vector
state after the execution of the m fitness function evaluations.
6. When the manager receives the probability vector differences sent by the
worker, it updates its own probability vector by adding the differences to its
current vector.
7. Then it sends the newly updated probability vector back to the worker. The
manager goes back to sleep and the worker starts working for m more fitness
function evaluations (back to step 4).
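Steps 5 and 6 amount to a subtract-and-add on the two vectors. A minimal sketch, with clipping to keep the manager’s entries valid probabilities (function names are ours):

```python
def vector_difference(before, after):
    """Step 5: per-position change in a worker's local probability
    vector accumulated over its m fitness evaluations."""
    return [a - b for a, b in zip(after, before)]

def manager_update(manager_vec, diff):
    """Step 6: fold a worker's reported differences into the manager's
    vector, clipping to the valid probability range."""
    return [min(1.0, max(0.0, v + d)) for v, d in zip(manager_vec, diff)]

# Example: a worker that drifted two positions by +-0.02 since its
# last synchronization with the manager.
reference = [0.5, 0.5, 0.5]
local = [0.52, 0.48, 0.5]
diff = vector_difference(reference, local)
print(manager_update([0.5, 0.6, 0.5], diff))
```

Note that the manager may have been updated by other workers in the meantime, so the result need not equal the worker’s local vector; that is exactly the asynchrony discussed below.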
There are a number of subtle points that are worth mentioning. First of
all, step number 7 is not a broadcast operation. The manager just sends its
newly updated probability vector to one particular worker. Notice however, that
the manager’s probability vector not only incorporates the results of the m
function evaluations performed by that particular worker, but it also incorporates
the results of the evaluations conducted by the other workers. That is, while a
particular worker is working, other workers might be updating the manager’s
probability vector. Thus, at a given point in time, workers are working with a
slightly outdated probability vector. Although this might seem a disadvantage
at first sight, the error that is committed by working with a slightly outdated
probability vector is likely to be negligible for the overall search because an
iteration of the compact GA represents only a small step in the action of the GA
(this is especially true for large population sizes). The proposed parallelization
scheme has several advantages, namely:
– Low synchronization costs.
– Fault tolerance.
– Scalability.
All the communication that takes place consists of short transactions. Workers
do their job independently and only interrupt the manager once in a while.
During the interruption period, the manager communicates with a single worker,
and the other workers can continue working non-stop.
The architecture is fault tolerant because workers can go up or down at any
given point in time. This makes it suitable for massive parallelization using
the Internet. It is scalable because potentially there is no limit on the number
of workers.
5 Computer simulations
This section presents computer simulations that were done to validate the proposed approach. For the purpose of this paper, we are only interested in checking
if the idea is valid. Therefore, and in order to simplify both the implementation
and the interpretation of the results, we decided to do a serial implementation
of the parallel compact GA architecture. Although it might seem strange (after
all, we are describing a scheme for doing massive parallelization), doing a serial
simulation of the behavior of the algorithm has a number of advantages:
– we can analyze the algorithm’s behavior under carefully controlled conditions.
– we can do scalability tests by simulating a parallel compact GA with a large
number of computers without having the hardware.
– we can ignore network delays and different execution speeds of different
machines.
The serial implementation that we have developed simulates P worker processors and 1 manager processor. The P worker processors start running at the same time and they all execute at the same speed. In
addition, it is assumed that the communication cost associated with a manager-worker transaction takes a constant time which is proportional to the probability
vector’s size. Such a scheme can be implemented by having a collection of P regular compact GAs, each one with its own probability vector, and iterating through
all of them, doing a small step of the compact GA main loop, one at a time.
After a particular compact GA worker completes m fitness function evaluations,
the worker-manager communication is simulated as described in section 4.
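A compressed version of such a serial simulation might look as follows. This is our sketch under the stated assumptions (round-robin scheduling, equal speeds, an exchange every m evaluations), not the implementation used for the paper’s experiments:

```python
import random

def simulate_parallel_cga(fitness, length, n, num_workers, m,
                          max_total_evals=2_000_000, seed=7):
    """Serial simulation of the manager-worker compact GA: workers iterate
    round-robin, each doing one cGA step per turn; after m evaluations a
    worker exchanges its vector differences with the manager (steps 4-7)."""
    rng = random.Random(seed)
    manager = [0.5] * length
    workers = [{"p": list(manager), "ref": list(manager), "evals": 0}
               for _ in range(num_workers)]

    def cga_step(p):
        sample = lambda: [1 if rng.random() < q else 0 for q in p]
        a, b = sample(), sample()
        win, lose = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(length):
            if win[i] != lose[i]:
                p[i] = min(1.0, max(0.0, p[i] + (1.0 if win[i] else -1.0) / n))

    total = 0
    while any(0.0 < q < 1.0 for q in manager) and total < max_total_evals:
        for w in workers:
            cga_step(w["p"])
            w["evals"] += 2
            total += 2
            if w["evals"] >= m:  # simulated worker-manager transaction
                for i in range(length):
                    d = w["p"][i] - w["ref"][i]
                    manager[i] = min(1.0, max(0.0, manager[i] + d))
                w["p"] = list(manager)   # manager sends back updated vector
                w["ref"] = list(manager)
                w["evals"] = 0
    return [round(q) for q in manager]
```

For instance, `simulate_parallel_cga(sum, 8, 100, 4, 40)` converges the shared model to (nearly) all ones on OneMax.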
We present experiments on a single problem, a bounded deceptive function consisting of the concatenation of 10 copies of a 3-bit trap function with
deceptive-to-optimal ratio of 0.7 [9]. This same function has been used in the
original compact GA work. We simulate a selection rate of s = 8 and did tests
with a population size of N = 100000 individuals (each worker processor runs
a compact GA that simulates a 100000 population size). We chose this population size because we wanted to use a size large enough to solve all the building
blocks correctly. We use s = 8 following the recommendation given by Harik et
al. in the original compact GA paper for this type of problem. Finally, we chose
this problem as a test function because, even though the compact GA is a poor algorithm for solving this problem, we wanted to use a function that requires a
large population size because those are the situations where the benefits from
parallelization are more pronounced.
Having fixed both the population size and the selection rate, we decided
to systematically vary the number of worker processors P , as well as the m
parameter which has an effect on the rate of communication that occurs between
the manager and a worker. We did experiments for P in {1, 2, 4, 8, 16, 32, 64,
128, 256, 512, 1024}, and for a particular P , we varied the parameter m in {8,
80, 800, 8000, 80000}. This totalled 55 different configurations, each of which
was run 30 independent times.
[Figure: two log-log plots, each with the number of processors (1 to 1024) on the x-axis and one curve per m ∈ {8, 80, 800, 8000, 80000}; the left plot shows function evaluations per processor, the right one communication steps per processor.]
Fig. 2. Both graphs depict a log-log plot. On the left, we see the average number of function evaluations per processor. On the right, we see the average number of communication steps per processor.
The m parameter is important because it is the one that affects communication costs. Smaller m values imply an increase in communication costs. On the
other hand, for very large m values, performance degrades because the compact
GA workers start sampling individuals from outdated probability vectors.
Figure 2 shows the results. In terms of fitness function evaluations per processor, we observe a linear speedup for low m values. For instance, for m = 8
we observe a straight line on the log-log plot. Using the data directly, we calculated the slope of the line and obtained an approximate value of -0.3. In order
to take into account the different logarithm bases, we need to multiply it by
log2 10 (y-axis is log10 , x-axis is log2 ) yielding a slope of approximately -1. This
means that the number of function evaluations per processor decreases linearly
with a growing number of processors. That is, whenever we double the number
of processors, the average number of fitness function evaluations per processor
gets cut in half.
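The base conversion in this slope argument is simple to verify numerically:

```python
from math import log2

# The plot uses log10 on the y-axis but log2 on the x-axis, so a measured
# slope of about -0.3 corresponds to -0.3 * log2(10) in matched bases,
# i.e. roughly -1: doubling the processors halves the evaluations each
# processor performs.
measured_slope = -0.3
matched_slope = measured_slope * log2(10)
print(round(matched_slope, 2))  # -1.0
```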
Likewise, in terms of communication costs, as we raise the parameter m, the
average number of communication steps between manager and worker decreases
in the same proportion as expected. For instance, for m = 80, communication
costs are reduced 10 times when compared with m = 8. Notice that there is
a degradation in terms of speedup for the larger m values. For instance, for
m = 8000 and m = 80000 (which is of the same order as the population size),
the speedup obtained departs from the idealized case. This can be explained by
the fact that in this case (and especially with a large number of processors), the
average number of communication steps per processor approaches zero. That
means that a large fraction of processors were actually doing some work but
never communicated their results back to the manager because the problem was
solved before they had a chance to do so.
6 Extensions
This work has a number of extensions worthwhile exploring. Below, we outline
some of them:
– Build theory for analyzing the effect of m, N , and P .
– Compare with traditional parallel GA schemes.
– Extend the approach to multivariate PMBGAs.
– Take advantage of the Internet and build something like SETI@home.
It would be interesting to study the mathematical analysis of the proposed
parallel compact GA. A number of questions come to mind. For instance, what
is the effect of the m parameter? What about the number of workers P ? Should
m be adjusted automatically as a function of P and N ? Our experiments suggest
that there is an “optimal” m that depends on the number of compact GA workers
P , and most likely depends on the population size N as well.
Another extension that could be done is to compare the proposed parallel architecture with those used more often in traditional parallel GAs, either
master-slave or multiple-deme GAs. Again, our experiments suggest that the
parallel compact GA is likely to outperform regular parallel GAs due to lower
communication costs.
The model structure of the compact GA never changes: every gene is always
treated independently. There are other PMBGAs that are able to learn a more
complex structure dynamically as the search progresses. One could think of using
some of the ideas presented here for parallelizing these more complex PMBGAs.
Finally, it would be interesting to have a parallel compact GA implementation
based on the Internet infrastructure, where computers around the world could
contribute with some processing power when they are idle. Similar schemes have
been done with other projects; one of the best known is the SETI@home
project [10]. Our parallel GA architecture is suitable for a similar kind of project
because computers can go up or down at any given point in time.
7 Summary and conclusions
This paper reviewed the compact GA and presented an architecture that allows
its massive parallelization. The motivation for doing so has been discussed and
a serial implementation of the parallel architecture was simulated. Computer
experiments were done under idealized conditions and we have verified an almost
linear speedup with a growing number of processors.
The paper presented a novel way of parallelizing GAs. This was possible due
to the different operational mechanisms of the compact GA when compared with
a more traditional GA. By taking advantage of the compact representation of
the population, it becomes possible to distribute its representation to different
computers without the associated cost of sending it individual by individual.
Additional empirical and theoretical research needs to be done to confirm
our preliminary results. Nonetheless, the speedups observed in our experiments
suggest that a massive parallelization of the compact GA may constitute an
efficient and practical alternative for solving a variety of problems.
References
1. Cantú-Paz, E.: Efficient and accurate parallel genetic algorithms. Kluwer Academic
Publishers, Boston, MA (2000)
2. Pelikan, M., Goldberg, D.E., Lobo, F.: A survey of optimization by building and using probabilistic models. Computational Optimization and Applications 21 (2002)
5–20 Also IlliGAL Report No. 99018.
3. Larrañaga, P., Lozano, J.A., eds.: Estimation of distribution algorithms: A new tool
for evolutionary computation. Kluwer Academic Publishers, Boston, MA (2001)
4. Harik, G.R., Lobo, F.G., Goldberg, D.E.: The compact genetic algorithm. IEEE
Transactions on Evolutionary Computation 3 (1999) 287–297
5. Ocenasek, J., Schwarz, J., Pelikan, M.: Design of multithreaded estimation of
distribution algorithms. Proceedings of the Genetic and Evolutionary Computation
Conference (GECCO-2003) (2003) 1247–1258
6. Lam, W., Segre, A.: A parallel learning algorithm for bayesian inference networks.
IEEE Transactions on Knowledge Discovery and Data Engineering 14 (2002) 93–
105
7. Ahn, C.W., Goldberg, D.E., Ramakrishna, R.S.: Multiple-deme parallel estimation
of distribution algorithms: Basic framework and application. IlliGAL Report No.
2003016, University of Illinois at Urbana-Champaign, Illinois Genetic Algorithms
Laboratory, Urbana (2003)
8. Hidalgo, J.I., Prieto, M., Lanchares, J., Baraglia, R., Tirado, F., Garnica, O.:
Hybrid parallelization of a compact genetic algorithm. In: Eleventh Euromicro
Conference on Parallel, Distributed and Network-Based Processing. (2003) 449–
455
9. Deb, K., Goldberg, D.E.: Analyzing deception in trap functions. In Whitley, L.D.,
ed.: Foundations of Genetic Algorithms 2, San Mateo, CA, Morgan Kaufmann
(1993) 93–108
10. Korpela, E., Werthimer, D., Anderson, D., Cobb, J., Lebofsky, M.: SETI@home - massively distributed computing for SETI. Computing in Science and Engineering
3 (2001) 79