AIM−GP and Parallelism
Peter Nordin
Physical Resource Theory
Chalmers University of
Technology
Sweden
Frank Hoffmann
LS11
University of
Dortmund
Germany
Frank D. Francone
RML
Technologies, Inc.
USA
Wolfgang
Banzhaf
LS11
University of
Dortmund
Germany
Markus Brameier
LS11
University of
Dortmund
Germany
Abstract
Many machine learning tasks are just too hard to be
solved with a single processor machine, no matter how
efficient algorithms we use and how fast our hardware is.
Luckily genetic programming is well suited for
parallelization compared to standard serial algorithms.
This paper describes the first parallel implementation of
an AIM-GP system creating the potential for an
extremely fast system. The system is tested on three
problems and several variants of demes and migration
are evaluated. Most of the results are applicable to both
linear and tree based systems.
Keywords: Genetic Programming, Machine Code GP,
Parallelism
1. Introduction
The hardware speed of desktop computers is growing
exponentially. This means that every year new
application areas can be addressed with machine learning
and adaptive systems. The constant development of new
algorithms and implementation methods also enables us
to explore previously impossible terrain. However there
are absolute physical limitations to even the fastest serial
computer with the most efficient algorithm. The ultimate
goal of artificial intelligence might be to build an
algorithm with capabilities of the human mind. The
problem is that if we were to simulate the maximum
computing capability of the human brainwith all
synapses activea serial computer must be smaller than
a single atom to accomplish this. This fact is due to
limitations caused by the speed of light and there is
nothing we can do about it except for extending our
algorithms to parallel computers. In addition, parallel
desktop computers are becoming mainstream with
seamless support from operating systems making multiprocessor systems uncomplicated to use.
the most studied linear GP method. But as seen above,
even a very efficient algorithm will eventually benefit
from parallelization. This paper documents the
implementation of a parallel AIM-GP algorithm and
experiments design to evaluate it. A large part of the
results count for tree based systems, too.
2. Method
The method can be subdivided into AIM-GP
implementation, hardware, and parallelization:
2.1 AIM−GP
AIM-GP uses a linear representation directly
corresponding to binary machine code. Crossover is
performed as a string crossover between instructions
while mutation switches instruction or selected bits
within the instruction. In the experiments we use the
PowerPC chip from Motorola and the original version of
AIM-GP without instruction blocks. For further details
on AIM-GP see (Nordin 1997, Banzhaf et al . 1998,
Nordin et al. 1999).
2.2 Hardware
To connect several processors into a parallel computer is
not a trivial task. Several different architectures exist. In
our experiments we have used the Parsytec Power
Explorer, one of the most widespread commercial
systems. The power explorer is a MIMD machine
(multiple instruction multiple data system) with up to 64
power PC processors. The configuration used for the
experiments was a 16 processor variant, but the approach
will port directly to 64 processors.
Each node in the computer consists of a PowerPC601, a
T414 support transputer processor, 32 MB of RAM and
Boot ROM as seen in Figure 1. The support processor
manages efficient serial communication between nodes,
see Figure 2.
Genetic programming has proven to be a very robust,
efficient and broadly applicable method. The good news
in this context is that it is also very suitable for
parallelization which has been successfully demonstrated
(Andre and Koza 1996). Some parallel approaches even
display nearly "super linear speed-up" over the simple
serial approach. This means that the parallel version of
the algorithm is more efficient even if it is simulated on a
purely serial machine (Andre and Koza 1996). This is a
very beneficial property of a machine learning algorithm
and it has led to unique and impressive projects such as
the 1000 processor Alpha machine under construction in
John Koza's team.
AIM-GP (Automatic Induction of Machine Code with
Genetic Programming) is formerly know as CGPS. This
method induces binary machine code directly without
any interpreting steps enabling speed-ups of two orders
of magnitude. AIM-GP is the fastest GP approach and
Figure 1: A node in the Power XPlorer
Figure 4: Connection of Nodes
Figure 2: The T414 Transputer
The PowerPC is a conventional cached RISC processor
also used in for instance Macintosh computers.
We use the standard set-up with the Ultrix operation
system.
2.3 GP and parallelization
There are several ways to parallelize the GP algorithm. It
could be divided both with regard to fitness cases and to
the population. However, the most common way to
achieve parallelization is through "demes". The notion of
demes was introduced by Wright in 1943 as a description
of a phenomenon in natural evolution where an animal
population is divided into subpopulations (Wright 1943).
Between the otherwise separated subpopulations
infrequent migration takes place. This model is often
referred to as the island model, see Figure 5. Demes are
argued to reduce the possibility for evolution to end up in
a local minimum and this increase the efficiency of the
paradigm. This "super linear speed-up" has been
observed in simulations of evolutionary algorithms.
Figure 3: The PowerPC processor
The nodes of the the Parsytec Power Explorer are
coupled together in a grid as seen in Figure 4.
Figure 5: Island Model
The demes can be arranged in different patterns. Two of
the most common are the "stepping stone model as ring"
and "the stepping stone model as lattice" (Kimura 1953),
see Figure 6.
Figure 6: Stepping Stone model as ring and lattice
A related approach using demes is the neighborhood
model. Here the individual can migrate to more than the
closest island, they can move within a predefined
"neighborhood distance", see Figure 7 (Gorges-Schleuter
1989).
Figure 8: A node with AIM−GP
The emigration of individuals takes place through an
input/output queue where the export thread is responsible
for sending it to another node. In a similar way there is
an immigration queue and an immigration thread on the
receiving island. The immigration individual in the
simplest case replaces a random existing individual.
Figure 7: Neighborhood model
There exists several implementations of parallel EAs in
research. Tanese (Tanese 1989) used an island model of
a GA for a parallel machine. He concluded that
migration rates have a decisive influence on the
performance of the algorithm and that it is possible to
achieve at least linear speed-up by adding new nodes.
Andre and Koza implemented a parallel GP system on a
transputer network (Andre and Koza 1996).
2.4 Design of the AIM-GP system
To maximize performance we knew that we needed to
keep communication between nodes to a moderate level.
We also wanted to keep the system flexible and portable
using standard Ultrix features. The main design of an
AIM-GP node is illustrated in Figure 8.
AIMGP uses 8 of the 32 registers of the PowerPC (r03 r10), while the machine instructions in the function set
were add, addi, subf, mullw, mulli, and, andi, or, ori,
xor, xori, eqv, nand, nor, slw, srw, sraw.
The output of the system can be presented both as
assembler and decompiled into C, for example:
00000000:
00000004:
00000008:
0000000C:
00000010:
00000014:
00000018:
0000001C:
00000020:
00000024:
00000028:
0000002C:
00000030:
00000034:
00000038:
0000003C:
00000040:
00000044:
00000048:
7D032378
7C051838
7CE85038
7C883C30
7C683C30
7CA54A14
7CE54BB8
7C8643B8
7C644378
7CA449D6
7CE623B8
7C880278
7D0A1838
7C842038
7C895430
7CE019D6
7C834B78
7C834B78
4E800020
* or r03,r08,r04 ;
and r05,r00,r03 ;
and r08,r07,r10 ;
srw r08,r04,r07 ;
* srw r08,r03,r07 ;
add r05,r05,r09 ;
nand r05,r07,r09 ;
nand r06,r04,r08 ;
* or r04,r03,r08 ;
mullw r05,r04,r09 ;
nand r06,r07,r04 ;
* xor r08,r04,r00 ;
* and r10,r08,r03 ;
* and r04,r04,r04 ;
* srw r09,r04,r10 ;
mullw r07,r00,r03 ;
or r03,r04,r09 ;
* or r03,r04,r09 ;
blr ;
int individual_function(int r03, int r04, int
r05, int r06, int r07, int r08)
{
r03 = r08 | r04;
r08 = r03 >> r07;
r04 = r03 | r08;
r08 = r04 ^ r00;
r10 = r08 & r03;
r04 = r04 & r04;
r09 = r04 >> r07;
r03 = r04 | r06;
return r03;
}
3. Experiments
3.1 Evaluation problems
We selected three problem classes to evaluate the
properties of the parallel AIM−GP:
1.
2.
3.
generalization. The output of the individual is interpreted
as one of two cases depending on if its greater than or
less than a certain threshold value. The individual is free
to use all of the machine code instruction in the PPC
AIM-GP implementation.
A boolean problem
A function regression problem
An image classification problem
Twenty runs with different random seeds were performed
for each of the problem set−ups. All problems were
either run for a maximal number of tournaments or until
a perfect solution (fitness 0) was found. The fitness
function always used the sum of the absolute values of
errors. Both mutation and crossover probabilities were
set to 90%.
Function regression
For function regression we used a polynomial which
allowed for scaling of difficulty:
f ( x , y ) = 5( x 4 + y 3 ) − 3 x 2
Each time the fitness cases consisted of 200 randomly
selected values. The instructions in the function set
consisted of the operations add, sub, mul.
Figure 8: Exampe image segmentatio
Parity function
The parity function takes a number of N bits and outputs
"1" if the number of "ones" is even or "0" otherwise:
N
e=
⊕b
n=0
n
In this case the function set was {and, or, nand, nor}
with the constants 1 and 0. We deliberately sustained
from using the xor instruction to make the problem
harder. Using the xor AIM-GP solves the problem almost
immediately. The parity function belongs to the hardest
class of Boolean function with a very rough search space.
Partitioning of Images
Images classification is a growing application domain
due to increasing use of digital images and due to
decreasing cost of hardware.
The objective of the problem is to divide the pixels of an
image into two classes depending on properties of the
original pixels in the image. Figure 8 illustrates the
output of an individual segmenting the image in two
areas with two gray-tones. The individual is feed by x
and y coordinates and expected to give the class of that
pixel as output. 500 randomly selected pixels are used for
individual fitness and another 500 are used for
Figure 9: Input image data
Initial populations show individuals usually overfitting
stripes to certain points as in Figure 10. Later more
complex, often fractal patterns appear, see Figure 11.
Finally a fitting geometry evolves as in Figure 8.
Figure 10: Stipes: uses only one input
system uses tournament selection. The migration rate is
measured in relation to number of tournaments. With a
very high migration rate (every 250th tournament) we get
a development as in Figure 13. The high migration rate
results in a worse final fitness. With a migration every
1000nd tournaments we get a graph as depicted in Figure
14. We conclude that migration is important for the
parallelization of a GP system and we can see a 15 fold
increase in performance in the best migration setting
compared with runs without migration. On the other hand
too much migration is not optimal. In all test cases we
see worse performance with too high migration.
4.3 Migration Strategies
The method by which an individual is selected for
emigration controls the quality of the emigrating
individual. In our experiments we evaluated several
different approaches. We used random emigration,
emigration of the best and tournament emigration.
Tournament emigration chooses the winner of the last
tournament as emigrant giving a random distribution of
better individuals but seldomly the best individual in the
population.
Figure 11: Pattern which uses both inputs
4. Results
In this work the parallel AIMGP system has been
investigated in relation to the influence of migration on
the learning process. The system has been configured
using a sub−population size of 1000 individuals for all
16 processing nodes. Each experimental result is
documented as an average of the 20 runs performed.
4.1 Migration Effects
The migration between demes has a qualitative and a
quantitative effect. The migration frequency specifies
how many individuals migrate each time interval. The
qualitative part defines which individuals are allowed to
migrate.
4.2 Migration Frequency
In all our experiments migration rate is varied by
changing the frequency of migration instead of the
number of individuals that are emigrated from each deme
during each migration phase. This is different from
(Andre and Koza 1996) who took the latter approach.
Our migration technique is strongly motivated by nature
where migration is a rather continuous process. During
each migration step only one individual is selected from
each deme and moved to all adjacent nodes in the
transputer network as shown in Figure 4. In addition to
the motivation by nature there are some technical
advantages: The workload of the links is reduced and
synchronization problems occur less frequently.
A test run without migration is shown in Figure 12 where
the fitness development of all isolated demes can be
seen. Effectively, this means running the problem with
different random seeds 16 times simultaneously. The GP
Tables 1-9 summarize our results for the three problems
and the three migrations strategies respectively. Tables 1
and 2, for instance, show how random migration and
tournament migration tend to perform similarly in
relation to the best fitness found, the number of
evaluations and the number of executed instructions per
run. Average values and standard deviations (σ) are
given for all three measures.
In comparison, emigration of the best performs worse
(see Table 3) probably due to a higher tendency to get
stuck in local optima. Only runs with very fast search
might do better with emigration of the best since a more
elitistic approach is better as a global optimum is being
approached.
There seems to be a trade-off between fast progress in
fitness and loss of diversity with emigration of the best as
illustrated in Figure 19. Loss of diversity lets the
evolutioary process run into local sub-optima because no
new genetic material can be created. This is especially
the case if best individuals are reproduced in different
demes with high migration rates. In contrast, Figure 18
shows a rather continuous improve in fitness for the
random strategy when the migration rate is increased.
The selection of individuals which are to be replaced by
the immigrants can also affect performance. We have
studied three basic methods: random selection, selection
of the worst individual and tournament selection where
the loosers of the tournament are selected for
replacement. We found that there is little difference in
performance. Even random selection appears to be a
robust method.
σ Fitness
% No solution
0
0
0
0
0
0
0
0
0
0
185
30
Evaluations*106
162
240
198
219
346
2569
σ Evaluations
Instructions*106
13
49
25
20
99
588
7790
12823
9852
10504
σ Instructions
856
3078
1658
1836
19523 161755
6349
37683
Table 1: Random emigration (function regression)
Migration distance
Figure 12: Fitness over time (in million evaluations) of
best individual in all 16 demes without migration
∞
250
500
1000
2000
3000
Fitness
0
0
72
0
0
355
σ Fitnes
% No solution
0
0
72
0
0
185
0
0
5
0
0
20
Evaluations*106
272
164
480
381
272
2569
σ Evaluations
Instructions*106
82
14
304
125
39
588
9811
7650
σ instructions
1068
921
28086 14309
19492
2655
14954 161755
2480
37683
Table 2: Tournament emigration (function regression)
250
500
1000
2000
3000
∞
Fitness
79
0
0
72
218
355
σ Fitness
% No solution
55
0
0
72
150
185
10
0
0
5
10
20
Evaluations*106
1371
444
252
637
982
2569
σ Evaluations
Instructions*106
472
137
46
321
415
588
80121
25400
13374 38064
σ instructions
30563
8821
2910 20575
Migration distance
60188 161755
26577
37683
Table 3: Emigration of the best (function regression)
Figure 13: Fitness over time of best individual in all
demes with migration every 250 tournaments
Migration
distance
250
500
1000
2000
3000
Fitness
0,5
0,6
0,6
0,9
1,7
6,2
σ Fitnes
% No solution
0,2
0,3
0,3
0,4
0,8
0,8
20
20
15
20
25
95
Evaluations*106
988
930
865
834
1085
2014
172
169
171
160
164
78
σ Evaluations
Instructions*106
σ Instructions
∞
23344 21198 21382 19862 258084 477506
7
2
3
0
45273 42213 43789 41124
43065
29289
Table 4: Random emigration (parity function)
Migration
distance
Fitness
σ Fitnes
500
1000
2000
3000
1
1
0,9
0,8
1,1
6,2
0,5
0,5
0,5
0,4
0,4
0,8
∞
20
20
20
20
35
95
6
710
765
835
973
1144
2014
σ Evaluations
Instructions*106
179
164
162
172
171
78
% No solution
Evaluations*10
Figure 14: Fitness over time of best individual in all
demes with migration every 1000 tournaments.
250
σ Instructions
16696 17718 19812 23400 277529 477506
2
5
2
7
45829 43833 41783 44068
43968
29289
Table 5: Emigration of the best (parity function)
Migration distance
Fitness
250
500
1000
2000
3000
∞
0
0
0
0
0
355
Migration
distance
250
500
1000
2000
3000
∞
Fitness
0,2
12
12
8
10
6,2
σ Fitnes
% No solution
0,1
1
1,2
1,15
1,2
0,8
10
20
25
25
20
95
801
790
1063
999
998
2014
157
184
177
187
156
78
Evaluations*10
6
σ Evaluations
Instructions
σ Instructions
19013 18767 25620 24087 240227 477506
1
8
7
0
40300 47350 45369 47971
40384
29289
Table 6: Tournament emigration (parity function)
Migration distance
Fitness
σ Fitness
100
400
1000
2000
34
34
37
57
∞
97
8
9
8
14
9
Table 7: Random emigration (image partitioning)
Migration distance
Fitness
σ Fitness
100
400
1000
2000
24
49
54
68
∞
97
8
14
23
14
9
Figure 20: Progress of fitness for the different migration
strategies and migration frequency 250 (function regression)
Table 8: Emigration of the best (image partitioning)
Migration distance
Fitness
σ Fitness
100
400
1000
2000
41
33
44
58
∞
97
5
3
3
5
9
Table 9: Tournament migration (image partitioning)
5. Summary
Our AIM−GP system has been parallized efficiently on a
transputer network. Results have been presented for
different problem domains using different migration
strategies and migrations rates. In general the deme
approach has proven a usefull concept for parallization .
Acknowledgement
Peter Nordin gratefully acknowledges support from the
Swedish Research Council for Engineering Sciences.
Bibliography
Figure 18: Best fitness over time (in million evaluations) for
random migration and different migration frequencies
(function regression)
Figure 19: Best fitness over time for elitist migration and
different migration frequencies (function regression)
[Andre and Koza 1996] Andre, D. and Koza, J. (1996)
Parallel Genetic Programming: A Scalable
Implementation Using The Transputer Network
Architecture. In Angeline, P.J. and Kinnear, K.E.
(eds.), Advances in Genetic Programming 2, MIT
Press, Cambrige.
[Banzhaf et al., 1998] Banzhaf, W., Nordin, P. Keller,
R. E., and Francone, F. D. (1998). Genetic
Programming An Introduction. On the
automatic evolution of computer programs and its
applications. Morgan Kaufmann, San Francisco
and dpunkt Verlag, Heidelberg.
[Gorges-Schleuter 1990] Gorges-Schleuter, M. (1990)
Genetic algorithms and population structure A
massively parallel algorithm. Ph.D. thesis,
University of Dortmund, Germany.
[Kimura 1953] Kimura, M. (1953) Stepping Stone
Model of Population. Annual Report of Nat. Gent.
Japan, p. 62-63.
[Nordin 1997] Nordin, J.P. (1997) Evolutionary
Program Induction of Binary Machine Code and
its Application. Krehl Verlag, Münster, Germany.
[Nordin et al. 1999] Nordin, P., Banzhaf, W., and
Francone, F. (1999) Effective Evolution of
Machine Code for CISC Architecture using Blocks
and Homologous Crossover. In Spector, L.,
Langdon, W.B., O`Reilly, U.-M., and Angeline,
P.J. (eds.), Advances in Genetic Programming 3,
MIT Press, in press.
[Tanese 1989] Tanese, R. (1989) Distributed Genetic
Algorithms. In Schaffer, J.D. (ed.), Proceedings of
the 3rd International Conference on Genetic
Algorithms, Morgan Kaufmann.
[Wright 1953] Wright, S. (1943) Genetics 28, p.114.