Genetic Reinforcement Learning Algorithms For On-Line Fuzzy Inference System Tuning "Application To Mobile Robotic"
1. Introduction
In the last decade, fuzzy logic has supplanted conventional technologies in several scientific applications and engineering systems, especially in control systems, and particularly in the control of mobile robots moving in completely unknown environments. Fuzzy logic has the ability to express the ambiguity of human thinking and to translate expert knowledge into computable numerical data. Its relatively low computational complexity also makes it a good candidate for real-time applications. A fuzzy system consists of a set of fuzzy if-then rules. Conventionally, the selection of fuzzy if-then rules relies on a substantial amount of heuristic observation to express proper control strategies. Recently, many authors have shown that it is possible to reproduce the operation of any standard continuous controller with a fuzzy controller (L. Jouffe; C. Watkins & P. Dayan; Dongbing Gu, Huosheng Hu & Libor Spacek). However, it is difficult for human experts to examine complex systems, and it is therefore not easy to design an optimized fuzzy controller by hand.
In general, the performance of a fuzzy inference system (FIS) depends on the formulation of the rules, but also on the numerical specification of all the linguistic terms used; a large number of choices must be made a priori, and it is not always easy or even possible to extract these data from a human expert. When these choices are made empirically, the design of the FIS can prove long and delicate given the large number of parameters to determine, and can lead to a solution with poor performance. To cope with this difficulty, many researchers have worked on learning algorithms for fuzzy system design. These automatic methods make it possible to extract the required information when the expert's a priori knowledge is not available.
The most popular approach to designing a fuzzy logic controller (FLC) is probably supervised learning, which requires training data. In real applications, however, training data are not always easy to obtain, and supervised learning becomes impossible when acquiring such data is too expensive. For these problems, reinforcement learning is more suitable than supervised learning. In reinforcement learning, an agent receives from its environment a critic signal, called reinforcement, which can be thought of as a reward or a punishment. The objective is then to build, from experiments (state, action, reward), a policy that maximizes the average sum of rewards over time.
In this chapter, we use the Fuzzy Q-Learning (FQL) reinforcement learning algorithm (L. Jouffe; A. Souissi), which allows on-line adaptation of FIS-based learners with continuous states and actions; fuzzy Q-learning is applied to select the consequent action values of a fuzzy inference system. In this method, the consequent value is selected from a predefined value set which is kept unchanged during learning, so if an improper value set is assigned, the algorithm may fail. The first approach proposed here, called the Fuzzy Q-Learning Genetic Algorithm (FQLGA), is a hybrid reinforcement/genetic method combining FQL and genetic algorithms for the on-line optimization of the parametric characteristics of a FIS. In FQLGA, the free parameters (precondition and consequent parts) are tuned by genetic algorithms (GAs), which are able to explore the space of solutions effectively. Often, however, prior knowledge about the FIS structure is not available either. As a solution, a second approach, called the Dynamic Fuzzy Q-Learning Genetic Algorithm (DFQLGA), is suggested. This hybrid learning method combines the Dynamic Fuzzy Q-Learning algorithm (DFQL) (Meng Joo Er & Chang Deng) with genetic algorithms to optimize both the structural and the parametric characteristics of the FIS without any a priori knowledge; the interest of the GA is to explore the solution space effectively and to optimize the conclusions starting from a random initialization of the parameters.
This chapter is organized as follows. Section II gives an overview of reinforcement learning. The implementation and the limits of the Fuzzy Q-Learning algorithm are introduced in Section III. Section IV describes the combination of reinforcement learning (RL) and genetic algorithms (GA) and the architecture of the proposed Fuzzy Q-Learning Genetic Algorithm (FQLGA). In Section V we present the DFQL algorithm, followed by the DFQLGA algorithm in Section VI. Section VII presents and discusses simulation and experimental results of the proposed algorithms for the on-line learning of two elementary behaviours of mobile robot reactive navigation, "Go to Goal" and "Obstacle Avoidance". Finally, conclusions and prospects are drawn in Section VIII.
2. Reinforcement learning
As previously mentioned, there are two ways to learn: either you are told what to do in different situations, or you get credit or blame for doing good or bad things respectively. The former is called supervised learning and the latter learning with a critic, of which reinforcement learning (RL) is the most prominent representative. The basic idea of RL is that agents learn a behaviour through trial and error, receiving rewards for behaving in such a way that a goal is fulfilled.
The reinforcement signal measures the utility of the suggested outputs with respect to the task to be achieved; the received reinforcement is the sanction (positive, negative or neutral) of the behaviour: it states what should be done without saying how to do it. The goal of reinforcement learning is to find the most effective behaviour, i.e. to know, in each possible situation, which action to take in order to maximize the cumulated future rewards. Unfortunately the sum of rewards could be infinite for some policies; to solve this problem a discount factor is introduced:
R = \sum_{k \geq 0} \gamma^{k} r_{k} \qquad (1)

where 0 \leq \gamma < 1 is the discount factor.
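As a concrete illustration, the discounted return of equation (1) can be computed from a recorded reward sequence; the following minimal Python sketch uses an arbitrary reward list and an arbitrary discount factor:

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of rewards weighted by gamma**k, as in equation (1)."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# example with an arbitrary sparse reward signal
print(discounted_return([0.0, 0.0, -1.0, 0.0, 1.0], gamma=0.95))
```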
The idea of RL can be generalized into a model with two components: an agent that makes decisions and an environment in which the agent acts. At every time step the agent perceives information from the environment about the current state s; this information could be, for example, the position of a physical agent, say its x and y coordinates. In every state the agent takes an action u_t, which moves it to a new state, and, as mentioned before, it receives a reward for taking that action.
Formally, at every time step t the agent is in a state s_t ∈ S, where S is the set of all possible states, and in that state it can take an action u_t ∈ U(s_t), where U(s_t) is the set of all possible actions in state s_t. As the agent transits to a new state s_{t+1} at time t+1, it receives a numerical reward r_{t+1}. It then updates its estimate of the action evaluation function using the immediate reinforcement r_{t+1} and the estimated value of the following state, V_t(s_{t+1}), which is defined by:
V_t(s_{t+1}) = \max_{u \in U} Q_t(s_{t+1}, u) \qquad (2)

\tilde{\varepsilon}_{t+1} = r_{t+1} + \gamma V_t(s_{t+1}) - Q_t(s_t, u_t) \qquad (3)

Q_{t+1}(s_t, u_t) = Q_t(s_t, u_t) + \beta \, \tilde{\varepsilon}_{t+1} \qquad (4)

where r_{t+1} is the immediate reward, \gamma the discount factor and \beta the learning rate.
Exploiting the current estimate of Q greedily can lead to local minima. To obtain a useful estimate of Q, it is necessary to sweep and evaluate the whole set of possible actions for all states: this is what is called the exploration phase (L. Jouffe; A. Souissi). The preceding algorithm, called TD(0), uses only the state that follows in the robot evolution, and only the current state is updated. Sutton extended the evaluation to all states, according to their eligibility traces, which memorize the previously visited state-action pairs. Eligibility traces can be defined in several ways (L. Jouffe; C. Watkins & P. Dayan; A. Souissi). Accumulating eligibility is defined by:
e_t(s) = \begin{cases} \gamma \lambda \, e_{t-1}(s) + 1 & \text{if } s = s_t \\ \gamma \lambda \, e_{t-1}(s) & \text{otherwise} \end{cases} \qquad (5)
The update of the evaluation function then uses the eligibility of each state:

Q(s_t, u_t) = Q(s_t, u_t) + \beta \left[ r_{t+1} + \gamma V_t(s_{t+1}) - Q(s_t, u_t) \right] e_t(s) \qquad (6)
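Before moving to the fuzzy case, the tabular update of equations (2)-(6) can be sketched as follows. This is only an illustration with accumulating traces, not the authors' implementation; the state and action space sizes, the learning rate and the trace-decay factor are arbitrary choices.

```python
import numpy as np

def td_update(Q, e, s, u, r, s_next, gamma=0.95, beta=0.1, lam=0.7):
    """One TD(lambda) step following equations (2)-(6); Q and e are modified in place."""
    e *= gamma * lam                          # decay all traces, eq. (5)
    e[s, u] += 1.0                            # accumulate the trace of the visited pair, eq. (5)
    v_next = Q[s_next].max()                  # value of the next state, eq. (2)
    td_error = r + gamma * v_next - Q[s, u]   # TD error, eq. (3)
    Q += beta * td_error * e                  # update all traced pairs, eqs. (4) and (6)
    return td_error

# usage with arbitrary sizes: 10 states, 4 actions
Q = np.zeros((10, 4))
e = np.zeros_like(Q)
td_update(Q, e, s=3, u=1, r=-1.0, s_next=4)
```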
In Fuzzy Q-Learning, a local quality q^i is associated with each discrete action of each fuzzy rule R_i, and the global quantities are obtained by weighting the local ones with the firing strengths \alpha_{R_i} of the activated rules. At each time step the algorithm proceeds as follows:

2. Estimate the value of the new state from the best local actions of the activated rules:

Q_t(S_{t+1}) = \sum_{R_i \in A_{t+1}} \max_{U \in \mathcal{U}} \left( q_t^i(U) \right) \alpha_{R_i}(S_{t+1})

3. Compute the TD error and update the local qualities of all rules through their eligibilities:

\tilde{\varepsilon}_{t+1} = r_{t+1} + \gamma Q_t(S_{t+1}) - Q_t(S_t, U_t)

q_{t+1}^i = q_t^i + \beta \, \tilde{\varepsilon}_{t+1} \, e_t^i, \quad \forall R_i

4. Elect the global action to apply from the updated local qualities:

U_{t+1}(S_{t+1}) = \sum_{R_i \in A_{t+1}} \text{Election}(q_{t+1}^i) \, \alpha_{R_i}(S_{t+1})

5. Update the eligibility traces of the local actions:

e_{t+1}^i(U^i) = \begin{cases} \gamma \lambda \, e_t^i(U^i) + \alpha_{R_i} & \text{if } U^i = U_{t+1}^i \\ \gamma \lambda \, e_t^i(U^i) & \text{otherwise} \end{cases}

and compute the quality of the elected global action:

Q_{t+1}(S_{t+1}, U_{t+1}) = \sum_{R_i \in A_{t+1}} q_{t+1}^i(U_{t+1}^i) \, \alpha_{R_i}(S_{t+1})
This value will be used to calculate the TD error at the next time step. However, the performance of the controller depends closely on the correct choice of the discrete action set, which is determined using a priori knowledge about the system. For complex systems such as robots, this a priori knowledge is not available, and it becomes difficult to determine a set of candidate actions that contains the optimal action for each fuzzy rule. To solve this problem and to improve the performance of reinforcement learning, genetic algorithms are used to explore a much broader space of solutions and find the optimal one without any a priori knowledge (Dongbing Gu, Huosheng Hu & Libor Spacek; Chia-Feng Juang; Min-Soeng Kim & Ju-Jang Lee; Chia-Feng Juang & Chun-Feng Lu).
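To make the rule-level bookkeeping of the FQL steps above concrete, the following sketch applies one FQL iteration to a zero-order Takagi-Sugeno FIS. It is a minimal illustration under simplifying assumptions, not the authors' code: the numbers of rules and candidate actions are arbitrary, `phi` denotes the normalized firing strengths, and the epsilon-greedy election stands in for the exploration/exploitation policy.

```python
import numpy as np

rng = np.random.default_rng(0)
N_RULES, K_ACTIONS = 9, 5                       # arbitrary sizes
GAMMA, BETA, LAM, EPS = 0.95, 0.05, 0.7, 0.1    # arbitrary learning constants

actions = rng.uniform(-1.0, 1.0, (N_RULES, K_ACTIONS))  # candidate conclusions a_ij
q = np.zeros((N_RULES, K_ACTIONS))                       # local qualities q^i
e = np.zeros((N_RULES, K_ACTIONS))                       # local eligibility traces e^i

def elect(qi):
    """Epsilon-greedy election of one discrete action index for a rule."""
    return rng.integers(K_ACTIONS) if rng.random() < EPS else int(np.argmax(qi))

def fql_step(phi, phi_next, chosen, r):
    """One FQL update; phi/phi_next are firing strengths, chosen[i] is the index
    of the action elected in rule i at the previous step, r the received reward."""
    rules = np.arange(N_RULES)
    Q_next = np.sum(q.max(axis=1) * phi_next)            # value of the new state (step 2)
    Q_prev = np.sum(q[rules, chosen] * phi)              # quality of the applied action
    td_error = r + GAMMA * Q_next - Q_prev               # TD error (step 3)
    q[:] += BETA * td_error * e                          # local quality update (step 3)
    new_choice = np.array([elect(q[i]) for i in rules])  # election (step 4)
    e[:] *= GAMMA * LAM                                  # trace decay (step 5)
    e[rules, new_choice] += phi_next                     # reinforce the elected actions
    u_global = np.sum(actions[rules, new_choice] * phi_next)  # inferred global action
    return u_global, new_choice, td_error
```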
The genetic algorithm evolves a population of candidate solutions using three operators:
Reproduction: individuals are copied according to their fitness values; individuals with higher fitness have more offspring than those with lower fitness.
Crossover: crossover is applied to two parents with high fitness values, with crossover probability pc; one-point crossover is used to exchange the genes.
Mutation: real-valued mutation is performed by adding a certain amount of noise (Gaussian in this work) to new individuals, with mutation probability pm. For the ith variable of the jth individual it can be expressed as:
a_{ij}(t+1) = a_{ij}(t) + \lambda(i) \, N(0, \sigma) \qquad (7)
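A minimal Python sketch of the three operators; the roulette-wheel selection and the pairing scheme shown here are generic textbook forms, and the population shape, probabilities and noise width are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
PC, PM, SIGMA = 0.8, 0.1, 0.2      # crossover/mutation probabilities, mutation noise width

def reproduce(pop, fitness):
    """Fitness-proportional copying of individuals (roulette-wheel selection)."""
    f = fitness - fitness.min() + 1e-9
    idx = rng.choice(len(pop), size=len(pop), p=f / f.sum())
    return pop[idx].copy()

def crossover(pop):
    """One-point crossover of consecutive pairs with probability PC."""
    for i in range(0, len(pop) - 1, 2):
        if rng.random() < PC:
            cut = rng.integers(1, pop.shape[1])
            pop[i, cut:], pop[i + 1, cut:] = pop[i + 1, cut:].copy(), pop[i, cut:].copy()
    return pop

def mutate(pop):
    """Real-valued mutation: add Gaussian noise with probability PM, as in eq. (7)."""
    mask = rng.random(pop.shape) < PM
    return pop + mask * rng.normal(0.0, SIGMA, pop.shape)
```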
A. FQLGA Algorithm
Because of its simplicity, a Takagi-Sugeno fuzzy inference system with triangular membership functions is considered. The structure of the FIS (the partition of its input space and the number of IF-THEN rules) is predetermined. a_{ij} is a vector representing the discrete set of K conclusions generated randomly for rule R_i, with which a vector q_{ij} representing the quality of each action is associated (i = 1..N and j = 1..K).
The principle of the approach is to use a population of K FIS and to treat the output of each one of them as a possible action to apply to the system. The FQL algorithm exploits the local quality function q associated with each action of a fuzzy rule (eq. 6), whereas the FQLGA algorithm uses as fitness function the sum of the local qualities q, given by:

f(\text{Ind}_j) = Q(S_{t+1}, \text{FIS}_j) = \sum_{i=1}^{N} q_{t+1}^{i} \qquad (8)
To reduce the training time, the quality matrix Q is not re-initialized after every iteration, but undergoes the same genetic operations as those applied to the set of individuals (selection, crossover).
B. Optimization of the consequent part of a FIS
A population of K individuals with a predetermined structure is adopted. The size of an individual is equal to the number N of rules of the FIS. The architecture of the FQLGA algorithm proposed for the optimization of the conclusions is represented in figure (1).
Fig. 1. Representation of the individuals and qualities of the actions in FQLGA algorithm
C. Optimization of the antecedent part of a FIS
To find the best set of premises, a population of M FIS is created. Each individual (FIS) of the population encodes the parameters of the antecedents, i.e. the modal points of the FIS, and its performance is evaluated by a fitness function based on the global quality Q. The conclusion part of each individual FIS remains fixed and corresponds to the values determined previously. The coding of the membership functions of the antecedent part of a FIS (individual) is done according to figure (2). To keep the legibility of the FIS, constraints are imposed during its evolution to ensure its interpretability:
m^{1} < m^{2} < \cdots < m^{N_m - 1} < m^{N_m}
The fitness function used by the genetic algorithm for the optimization of the antecedent part is the global quality of the FIS, which uses the degrees of activation of the fuzzy rules; this fitness function is given by the following equation:
f(\text{ind}_i) = Q(S(t), \text{FIS}_i) = \frac{\sum_{R_i \in A} \alpha_{R_i}(S(t)) \, q^{i}(t)}{\sum_{R_i \in A} \alpha_{R_i}(S(t))} \qquad (9)
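The two fitness functions (8) and (9) reduce to simple aggregations of the local qualities. A short sketch, where `q_local` is assumed to hold the quality of the currently elected action of each rule and `phi` the rule firing strengths:

```python
import numpy as np

def fitness_consequent(q_local):
    """Eq. (8): fitness of an individual FIS as the sum of the local qualities of its rules."""
    return float(np.sum(q_local))

def fitness_antecedent(phi, q_local):
    """Eq. (9): global quality as the firing-strength weighted average of the local qualities."""
    return float(np.sum(phi * q_local) / np.sum(phi))
```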
The learning process then proceeds in three stages:
1. At the beginning, the quality matrix is initialized to zero and the traditional FQL algorithm evaluates each action using an exploration policy. This step finishes when a given number of negative reinforcements has been received.
2. After the evaluation of the individuals, the genetic algorithm for the optimization of the consequent part of the fuzzy rules creates a new, better adapted generation. This stage is repeated until the conclusions converge or a given number of generations has been reached. The algorithm then passes to the third stage.
3. Once the conclusions of the FIS are optimized, the second genetic algorithm, for the optimization of the antecedent part, is run to adjust the positions of the input membership functions of the controller, which are initially equidistant over their universes of discourse.
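The control flow of these three stages can be outlined as below. This is only an organizational sketch: the three callables are placeholders for the FQL evaluation phase and the two genetic algorithms described above.

```python
def fqlga_outline(evaluate_until_failures, breed_conclusions, tune_antecedents,
                  max_generations=50, converged=lambda gen: False):
    """Outline of the three FQLGA stages; the callables are placeholder hooks."""
    # Stages 1 and 2: alternate FQL evaluation and GA generations on the conclusions.
    for gen in range(max_generations):
        evaluate_until_failures()   # FQL with exploration, stopped after N negative rewards
        breed_conclusions()         # selection, crossover and mutation (also applied to Q)
        if converged(gen):
            break
    # Stage 3: optimize the antecedent membership functions with the conclusions fixed.
    tune_antecedents()

# trivially runnable demo with no-op stages
fqlga_outline(lambda: None, lambda: None, lambda: None, max_generations=3)
```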
The two outputs of the controller are the rotation speed Vrot_CB and the translation speed Vtran_CB; each output is represented by nine actions initialized randomly. The reinforcement signal associated with the rotation speed is defined by:
r_{V_{rot\_CB}}(t) = \begin{cases} 0 & \text{if } \ldots \\ -1 & \text{if } \ldots \\ 1 & \text{otherwise} \end{cases} \qquad (10)
The parameters of the FQL algorithm and of the genetic algorithms are as follows: Lp and Lc respectively denote the sizes of the chromosomes for the antecedent and conclusion parts, Np and Nc the sizes of the populations for the antecedent parameters and for the conclusions, and Pm the probability of mutation.
The simulation results of the first behaviour, "Go to Goal", are presented in figure (6); 28 generations were sufficient to find good actions.
Figure (7) shows the convergence of the fitness values of the genetic algorithms for the two output variables Vrot_CB and Vtran_CB, obtained during the experimental tests.
(Figure: three panels of membership-function plots; y-axis "Degree of membership", x-axis range 200-2000.)
For the "Obstacle Avoidance" behaviour, the translation speed is imposed by:

V_{tran\_EO} = \frac{V_{max}}{D_{max}} \left( Dis\_F - D_s \right) \qquad (11)
and the reinforcement associated with the rotation speed is defined by:

r_{V_{rot\_EO}}(t) = \begin{cases} -1 \cdot \text{Signe}(V_{rot\_EO}) & \text{if } \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (12)
Fig. 9. Trajectories of the robot obtained by the FQL algorithm using a random initialization of the parameters
The trajectories of figure (10) show the effectiveness of associating the FQL reinforcement learning algorithm with the genetic algorithm as a stochastic exploration tool. The FQLGA algorithm makes it possible to find an optimal FIS for the desired behaviour (obstacle avoidance). The duration of learning depends on the genetic algorithm parameters and on the obstruction density of the environment. We observe that after each generation the quality of the FIS (sum of the local qualities) increases, which gives the best individuals more chance to appear in the next generations.
Fig. 10. Learning/validation trajectories of the robot with the FQLGA algorithm for various environments
Figure (11) shows the performance of the FQLGA algorithm compared with the FQL algorithm, which can get stuck in a local minimum when the optimal solution is not present in the randomly generated set of actions. The FQLGA algorithm, on the other hand, converges towards the optimal solution independently of the initialized values.
Fig. 11. Evolution of the quality of the Fuzzy controller with FQL and FQLGA algorithms
C. Experimental results with the real robot Pioneer II
Figure (12) presents the results of the on-line learning of the Pioneer II robot for the "Go to Goal" behaviour. During the learning phase, the robot does not follow a rectilinear trajectory (represented in green) between the starting point and the goal, because several actions are tested (exploration). Finally the algorithm finds good actions and the robot converges towards the goal, marked in red; the time necessary to find these good actions is estimated at 2 min. Each generation is created after twenty (20) failures have been observed. The learning process requires respectively 32 and 38 generations for the GA to determine the rotation and translation speeds.
Fig. 12. On-line learning of the real robot Pioneer II, "Go to Goal" behaviour
Figure (13) presents the results of the on-line learning of the "Obstacle Avoidance" behaviour. For safety reasons, we consider that the minimal distances allowed for the frontal sonars and for the left/right sonars are respectively 408 mm and 345 mm; a value lower than these distances is considered a failure and triggers a retreat (represented in green) of the mobile robot. A generation is created after 50 failures. The genetic algorithms require 14 generations to optimize the conclusions and 24 generations to optimize the parameters of the antecedents. The duration of the on-line learning is estimated at 20 min, which is acceptable given the heaviness of traditional genetic algorithms.
Fig. 13. On-line learning of the real robot Pioneer II, "Obstacle Avoidance" behaviour
Figure (14) shows the evolution of the fitness function obtained during this experiment.
with K > 0. By developing the preceding equation, the TD error coefficients are expressed as in (13). After each iteration, all preceding TD errors are weighted by the term \left( \frac{K-1}{K} \right)^{t-i} < 1.
Using the squared TD error as the criterion, the rule firing strength \alpha_i determines how much the fuzzy rule R_i affects the TD error. It should be noted that (13) acts as a digital low-pass filter: TD errors from the past are gradually forgotten as time passes, but are never completely lost, and the more recently a TD error is received, the more it affects the value of the criterion \varepsilon_i. The initial value of \varepsilon_i is set to zero. The parameter K, which controls the overall behaviour of \varepsilon_i, is usually selected between 10 and 100: a small value of K makes \varepsilon_i adapt very rapidly, while a large value makes it more stable in a noisy environment. Thus, if \varepsilon_i exceeds a certain threshold k_e, further segmentation should be considered for this fuzzy subspace. Figure (15) represents the global flowchart of the DFQL algorithm (Meng Joo Er & Chang Deng).
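Equation (13) itself is not legible in this copy of the chapter, so the sketch below implements one plausible first-order low-pass filter of the firing-strength-weighted squared TD error that matches the description above; the exact weighting and the (K-1)/K coefficient are assumptions, and the threshold value is arbitrary.

```python
import numpy as np

K = 50      # filter constant, typically chosen between 10 and 100
K_E = 0.5   # segmentation threshold (arbitrary illustrative value)

def update_td_criterion(eps, alpha, td_error):
    """Low-pass filter of the squared TD error, weighted per rule by the firing strength.

    eps: current criterion value of each rule (initialized to zero);
    alpha: rule firing strengths; td_error: scalar TD error of the last step.
    Returns the updated criterion and a mask of rules whose fuzzy subspace
    should be considered for further segmentation.
    """
    eps = ((K - 1) / K) * eps + (1.0 / K) * alpha * td_error ** 2
    return eps, eps > K_E

# usage with four rules and an arbitrary TD error
eps = np.zeros(4)
eps, needs_split = update_td_criterion(eps, alpha=np.array([0.6, 0.3, 0.1, 0.0]), td_error=1.2)
```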
As in FQLGA, the fitness of an individual FIS is the sum of the local qualities of its rules:

f(\text{Ind}_j) = Q(S_{t+1}, \text{FIS}_j) = \sum_{i=1}^{N} q_{t+1}^{i} \qquad (14)
To accelerate learning, the quality matrix Q is not re-initialized at each iteration, but undergoes the same genetic operations as those applied to the population of individuals (selection, crossover). A new generation is created not at every iteration, but only after a specified number of failures (negative rewards) has been received (Chia-Feng Juang; Min-Soeng Kim & Ju-Jang Lee). Between two generations, the evolution of the quality matrix is carried out by the FQL algorithm according to equation (6). An exploration/exploitation policy (EEP) is implemented at the beginning of the FQL algorithm; thereafter exploration is ensured by the mutation operator of the genetic algorithm.
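The failure-triggered scheduling of the generations can be captured in a small helper; a sketch, where the failure threshold is the illustrative value used in the experiments of this chapter:

```python
class GenerationScheduler:
    """Triggers a GA generation only after a given number of failures (negative rewards)."""

    def __init__(self, failures_per_generation=20):
        self.failures_per_generation = failures_per_generation
        self.failures = 0
        self.generation = 0

    def report(self, reward):
        """Call after every FQL step; returns True when a new generation should be bred."""
        if reward < 0:
            self.failures += 1
        if self.failures >= self.failures_per_generation:
            self.failures = 0
            self.generation += 1
            return True
        return False

# between two generations the local qualities evolve by the FQL update (eq. 6);
# when report() returns True, selection and crossover are applied both to the
# individuals and to their quality matrix, and mutation provides further exploration.
scheduler = GenerationScheduler(failures_per_generation=20)
```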
The FIS is initialized with only one fuzzy subset for each input (Fig. 18). The two output variables are the rotation velocity Vrot_GG and the translation velocity Vtran_GG.
Fig. 18. Initial membership functions for the two inputs of the FIS
The reinforcement signals for the "Go to Goal" behaviour are defined by:

r_{V_{rot\_GG}}(t) = \begin{cases} 1 & \text{if } \ldots \\ 0 & \text{if } \ldots \\ -1 & \text{otherwise} \end{cases}

r_{V_{trans\_GG}}(t) = \begin{cases} 1 & \text{if } V_{trans} > 220 \text{ mm/s}, \ d_{RG} \geq 800 \text{ mm and } \theta < 50 \\ -1 & \text{otherwise} \end{cases} \qquad (15)
The genetic algorithm is run only once a number of fuzzy rules Ns has been reached by the DFQL algorithm. Np and Pm respectively represent the size of the population and the mutation probability.
Figure (19) shows the results at the beginning of learning, after the generation of four fuzzy rules, when the genetic algorithm has not yet been run. The membership functions determined by the DFQL algorithm are represented in figure (20).
Fig. 20. Membership functions determined after the generation of four fuzzy rules: a) Angle (Robot-Goal), b) Distance (Robot-Goal)
Fig. 21. Final membership functions after the generation of 8 fuzzy rules: a) Angle (Robot-Goal), b) Distance (Robot-Goal)
For the "Obstacle Avoidance" behaviour, the translation velocity of the robot is imposed by:

V_{tran\_EO} = \frac{V_{max}}{D_{max}} \left( Dis\_F - D_s \right) \qquad (16)
Vmax is the maximum velocity of the robot, equal to 350 mm/s, Dmax is the maximum reading allowed for the frontal sensors, equal to 2000 mm, and Ds is a safety distance fixed at 250 mm. The parameters of the algorithm are given as follows:
Fig. 23. Trajectories of the robot, learning and validation with the DFQLGA algorithm for various environments.
Fig. 24. Final membership functions of the FIS inputs determined by the DFQLGA algorithm: a) Left Distance, b) Right Distance, c) Frontal Distance
Fig. 25. (a) Number of fuzzy rules generated by the DFQLGA algorithm; (b) fitness function evolution
FIS with four inputs:
The inputs of the fuzzy controller are the minimal distances provided by the four sets of
sonars {D1=min (d1, d2), D2=min (d3, d4), G1=min (g1, g2), G2=min (g3, g4)}. The
translation velocity of the robot is given as previously (eq. 16), with a slight modification: it is linearly proportional to the frontal distance:
V_{trans} = \frac{V_{max}}{D_{max}} \left( \min(g4, d4) - D_s \right) \qquad (17)
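Equations (11), (16) and (17) are simple linear maps from the sonar readings to the translation speed. A short sketch using the constants given in this section (Vmax = 350 mm/s, Dmax = 2000 mm, Ds = 250 mm):

```python
V_MAX = 350.0    # mm/s, maximum translation speed
D_MAX = 2000.0   # mm, maximum frontal sonar reading
D_S = 250.0      # mm, safety distance

def vtrans_from_frontal(dist_front_mm):
    """Eq. (11)/(16): translation speed proportional to the frontal clearance."""
    return V_MAX / D_MAX * (dist_front_mm - D_S)

def vtrans_four_inputs(g4_mm, d4_mm):
    """Eq. (17): the same law applied to the smaller of the two frontal readings."""
    return V_MAX / D_MAX * (min(g4_mm, d4_mm) - D_S)

# example: an obstacle detected 800 mm ahead gives roughly 96 mm/s
print(vtrans_from_frontal(800.0))
```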
At the end of the learning process, the final FIS is composed of forty-five (45) fuzzy rules, and the generated membership functions are represented in figure (26).
Fig. 26. Final membership functions of the FIS inputs determined by the DFQLGA algorithm: a) Right Distance, b) Right-Frontal Distance, c) Frontal-Left Distance, d) Left Distance.
Figure (28) shows the evolution of the number of generations; the execution of the genetic algorithm starts after the generation of eight (8) fuzzy rules. A new generation of the conclusions is created after the reception of a certain number of failures (20 failures in our case).
Figure (27) shows the evolution of the fitness function (sum of the local qualities) used by the genetic algorithm.
Fig. 27. Evolution of the number of fuzzy rules generated by the DFQLGA algorithm
1. At the beginning, the robot is almost blind and moves in a straight line until it collides with the obstacles. At this stage the FIS is composed of a single fuzzy rule initialized randomly; this stage contains the first four failures. Once the completeness and TD-error criteria are satisfied, new rules are generated.
2. In a second stage, the structure of the FIS found so far allows the mobile robot to learn the obstacle avoidance behaviour if the correct action is among the set of suggested actions; otherwise the robot incurs further failures (failures 5 and 6). In this case the genetic algorithm creates new, better adapted generations of actions based on the fitness function.
3. In a third stage, the robot is able to carry out the desired behaviour successfully (fig. 31). The necessary number of fuzzy rules and membership functions is generated by the DFQL algorithm, while the values of the conclusions are optimized by the genetic algorithm.
Fig. 30. Trajectories of the robot during the learning of the FIS by the DFQLGA algorithm (the first two stages)
Fig. 31. Trajectories of the robot after the learning of the FIS by the DFQLGA algorithm (the third stage)
C. Experimental results on the real robot Pioneer II
Figure (32) presents the validation results on the mobile robot Pioneer II for the "Go to Goal" behaviour with two goals, B1 (2500 mm, -1000 mm) and B2 (3250 mm, 1250 mm); the figure shows that the robot succeeds in reaching the two goals B1 and then B2.
Fig. 32. Validation of the FIS obtained in simulation by the DFQLGA algorithm on the real robot Pioneer II for the "Go to Goal" behaviour
Figure (33) presents the results of real-time learning (a) and validation (b) on the Pioneer II robot for the "Go to Goal" behaviour.
Figure (34) presents the results of the on-line learning of the "Obstacle Avoidance" behaviour. For safety reasons, we consider that the minimal distance allowed for the frontal sonar is 408 mm and for the right/left sonars 345 mm; if the detected value is lower than these thresholds, the behaviour is regarded as a failure and triggers a retreat (represented in green) of the mobile robot.
Figure (34-a) shows the behaviour of the robot during the real-time learning of the obstacle avoidance behaviour. Figure (34-b) shows the evolution of the fitness function; two phases can be distinguished: the first corresponds to the exploration stage, where the robot carries out several trials and failures, then in a second phase the robot begins to choose the suitable actions in the appropriate states, which marks the end of learning. The DFQLGA algorithm generates fourteen (14) fuzzy rules (fig. 34-c) after 21 generations; the real-time learning time is estimated at 19 minutes.
Fig. 33. Learning (a) and validation (b) of the FIS designed in real time on the Pioneer II robot
Fig. 34. Real-time learning of the Pioneer II robot by the DFQLGA algorithm, "Obstacle Avoidance" behaviour (FIS with 3 inputs): a) robot behaviour, b) fitness function
Figure (35) presents the validation results of a FIS designed in real time for the "Obstacle Avoidance" behaviour.
8. Conclusion
The combination of the Q-Learning reinforcement algorithm and genetic algorithms gives a new type of hybrid algorithm (FQLGA) which is more powerful than traditional learning algorithms. FQLGA proved its effectiveness when no a priori knowledge about the system is available. Indeed, starting from a random initialization of the conclusion values and an equidistant distribution of the membership functions of the antecedent part, the genetic algorithm is able to find the best individual for the specified task using only the received reinforcement signal. The controller optimized by the FQLGA algorithm was validated on a real robot and satisfactory results were obtained. The next stage of this work is the on-line optimization of the structure of the fuzzy controller.
9. Acknowledgment
Abdelkrim Nemra was born on August 7th, 1980 in Ténès (Algeria). He received the Engineering and Master degrees in 2004 and 2006 respectively from the Military Polytechnic School (Algeria). He is currently a PhD student at Cranfield University, Department of Aerospace, Power and Sensors (UK). His current research areas include robotics, system control, unmanned aerial vehicles (UAV) and visual SLAM.
Hacène Rezine was born in 1955 in El-Kala, Algeria. He received an engineering degree from the École Nationale d'Ingénieurs et de Techniciens, Algiers, in 1979, the Engineer Degree from the École Nationale Supérieure d'Électrotechnique, d'Électronique, d'Informatique et d'Hydraulique, Toulouse, France, in 1980, and the Doctor-Engineer Degree from the Institut National Polytechnique de Toulouse in 1983. He is currently the Head of the teaching and control research unit of the Algerian Polytechnic Military School. His current research areas include electrical drives, mobile robotics, fuzzy control and artificial intelligent control.
10. References
L. Jouffe: Actor-Critic Learning Based on Fuzzy Inference System, Proc. of the IEEE International Conference on Systems, Man and Cybernetics, Beijing, China, pp. 339-344, 1996.
C. Watkins, P. Dayan: Q-Learning, Machine Learning, pp. 279-292, 1992.
P. Y. Glorennec and L. Jouffe: Fuzzy Q-learning, Proc. of IEEE Int. Conf. on Fuzzy Systems, pp. 659-662, 1997.
Dongbing Gu, Huosheng Hu, Libor Spacek: Learning Fuzzy Logic Controller for Reactive Robot Behaviours, Proc. of IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Japan, pp. 20-24, July 2003.
A. Souissi: Apprentissage par renforcement des systèmes d'inférence floue par des méthodes par renforcement, application à la navigation réactive d'un robot mobile, Thèse de Magistère, École Militaire Polytechnique, Janvier 2005.
Meng Joo Er and Chang Deng: Online Tuning of Fuzzy Inference Systems Using Dynamic Fuzzy Q-Learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 34, No. 3, pp. 1478-1489, June 2004.
Chia-Feng Juang: Combination of Online Clustering and Q-Value Based GA for Reinforcement Fuzzy System Design, IEEE Transactions on Fuzzy Systems, Vol. 13, No. 3, pp. 289-302, June 2005.
Min-Soeng Kim and Ju-Jang Lee: Constructing a Fuzzy Logic Controller Using Evolutionary Q-Learning, IEEE, pp. 1785-1790, 2000.
Chia-Feng Juang and Chun-Feng Lu: Combination of On-line Clustering and Q-value Based Genetic Reinforcement Learning for Fuzzy Network Design, IEEE, 2003.
David E. Goldberg: Algorithmes génétiques: exploration, optimisation et apprentissage automatique, Addison-Wesley, France, 1994.
C. C. Lee: Fuzzy logic in control systems: Fuzzy logic controller, IEEE Trans. Syst., Man, Cybern., Vol. 20, pp. 404-435, Mar./Apr. 1990.
A. Nemra, H. Rezine: Genetic reinforcement learning of fuzzy inference system, application to mobile robotic, ICINCO, May 2007, pp. 206-213.
Chia-Feng Juang, Jiann-Yow Lin, and Chin-Teng Lin: Genetic Reinforcement Learning through Symbiotic Evolution for Fuzzy Controller Design, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 30, No. 2, April 2000.