Real-time fault tolerance design method based on Evolvable Hardware
Technical field
The invention belongs to Evolvable Hardware fields, are related to a kind of real-time fault tolerance design method based on Evolvable Hardware.
Background technique
With the arriving of information age, each electronic product emerges one after another, and the Working and life styles of people are occurring
Huge change.The electronic product to become increasingly abundant is while bringing convenient to people's life, also to the design level of electronic system
More stringent requirements are proposed.The function of electronic system constantly enhances, and scale constantly expands, and the complexity of electronic system design is in finger
Number increases, and the experience of relying solely on will be unable to design the electronic system for meeting demand.For a few thing in particular surroundings
Electronic equipment, such as space ship, bathyscaphe etc. is extremely stringent to the reliability requirement of system, once system breaks down,
Fatal damage will be caused to these equipment, bring huge economic loss.Due to traditional design method be it is static, be
System structure is once designed to be difficult to be modified again, relies solely on the experience of designer, what unpredictable system can meet with
Kind failure, can not also be prevented, this just brings great hidden danger to the safety and reliability of system in advance.In order to
The limitation of engineer's circuit is broken through, the adaptability of enhancing system counter external environment variation improves system reliability, compels
It is essential and wants a kind of design method that can be improved Fault Tolerance.
Summary of the invention
The technical problem to be solved in the present invention is to provide a kind of real-time fault tolerance design method based on Evolvable Hardware, knot
The self-organizing of Evolvable Hardware is closed, adaptive and selfreparing characteristic proposes a kind of dynamic self-adapting Descartes genetic coding
(Cartesian Genetic Programming, CGP) method and real-time fault tolerance mechanism, and one is realized in FPGA platform
Real-time fault tolerance system prototype is covered, general frame is as shown in Figure 1.Wherein, evolution algorithmic is write using C code, operates in FPGA
In the Microblaze processor soft core of platform built-in;Other modules are then write using Verilog code, and FPGA platform is operated in
On.
A kind of real-time fault tolerance design method based on Evolvable Hardware, steps are as follows:
(1) dynamic self-adapting CGP coding method
Traditional CGP coding one stationary nodes of random selection are as final output node, dynamic self-adapting CGP coding staff
Method output node is not fixed, and selects different nodes as output from CGP encoder matrix using a kind of dynamic random sampling model
Node, calculates separately the fitness value of CGP encoder matrix, and selects the maximum matrix of fitness value as optimal solution;Node
Sampled probability is adjusted with population entirety fitness situation of change dynamic, and the fitness distribution situation of current population uses following public affairs
Formula calculates:
Wherein, fd(t) individual adaptation degree distribution or diversity in population are indicated;It indicates individual in population average to fit
Answer angle value;fmax(t) maximum adaptation angle value individual in population is indicated;fmin(t) minimum fitness value individual in population is indicated;
With the increase of the number of iterations, the fitness difference between individual is smaller and smaller, and average fitness value increases, fd(t) value
It becomes larger;
Sampled probability is divided into two parts, and at the initial stage of evolution, Population adaptation angle value is lower, using fixed higher sampled probability
Add block convergence rate;When population average fitness value or the number of iterations reach scheduled threshold value, according to population average fitness
Value dynamic adjustment sampled probability;The calculation formula of sampled probability in evolutionary process are as follows:
Wherein, Ps0It is the initial samples probability of setting, the set comprehensive of initial samples probability considers population scale and dyeing
Body length;If initial samples probability is set as 1, all nodes are traversed at this time, and calculate fitness value;If initial samples
Probability setting is too small, then randomly chooses, and initial samples probability is not less than 0.5;t0It is the critical the number of iterations of sampled probability adjustment,
t0Value be 0.3tmax;It is the customized parameter regulation factor, is set as 2, tmaxIt is the maximum number of iterations of evolution algorithmic;
(2) the real-time fault tolerance mechanism based on Evolvable Hardware
Tolerant system is substantially a kind of real-time system, and real-time is an important restrictions condition of tolerant system;It is based on
The real-time fault tolerance mechanism of Evolvable Hardware accelerates repair process using static configuration library using the System Fault Tolerance time as constraint condition, simultaneous
Care for the real-time and fault-tolerance of tolerant system;Entire real-time fault tolerance mechanism mainly includes the following aspects:
1. calculating evolution algorithmic maximum number of iterations
System tolerable fault time is defined as to repair time limit Tmax;TmaxIt is obtained by fault analysis tree technology;For
Guarantee system is repairing repair system failure in the time limit, and the time limit will be repaired in evolutionary process as the constraint item of evolution algorithmic
Part;First choice calculates the time T that evolution algorithmic iteration once needs by formula (3)g;
Tg=λ (Tcfg+Tfit)+Tea (3)
Wherein, TcfgIt is the setup time of evolution platform, TfitIt is Fitness analysis time, TeaWhen being the execution of evolution algorithmic
Between, λ is the quantity of the offspring individual generated in each iterative process;Maximum the number of iterations is then obtained according to formula (4);
2. run time fault detects
It needs to detect mistake within the shortest time when system breaks down in the system operation phase;Failure is final
Lead to system output error, the reality output and desired output of comparison system judge the whether faulty generation of system;Fault detection
Result with formula (5) indicate
Wherein R indicates the exclusive or of reality output and desired output, if it is desired to output is consistent with reality output, then R is 0,
Otherwise R is not zero, and by detecting the value of R, obtains whether detection system malfunctions;
3. repairing failure using static configuration library
Static configuration library is mainly used to the repair time of acceleration disturbance;In system design stage, pass through FTA technology mining system
It unites potential failure, and the compensation circuit of these failures is stored in static configuration library;In system operation, in case of event
Barrier, then with vector (In,Oreal) it is to index to search corresponding compensation circuit in static configuration library;
If searching corresponding compensation circuit, illustrates that known fault has occurred, directly searched from repository corresponding
Compensation circuit completes the reparation to the system failure;
If not finding corresponding reparation circuit, illustrate that unknown failure has occurred, starts evolution algorithmic and mended to develop
Repay circuit;
In the compensation circuit of evolution unknown failure, accelerate to develop based on the evolution algorithmic of similitude using a kind of
Journey;Firstly, calculating the similitude for having the compensation output of circuit in expected compensation output and repository, table using Hamming distances
It is shown as Dis (R, R '), wherein R is the output of expected compensation circuit, and R ' is the compensation output for having circuit in repository;IfThe initial population of evolution algorithmic then is generated with the corresponding compensation circuit of R ', and the R out that develops;
Conversely, then being developed out according to the corresponding compensation circuit of R 'Then phase inverter is added in compensation output;
The size in static configuration library depends on system scale and hardware resource, and the bigger occupancy in the space in static configuration library is deposited
Storage resource is more, and the time of checking storehouse is incremented by;Reparation circuit and malfunction index in repository are one-to-many relationships, i.e., one is repaired
Compound circuit corresponds to multiple malfunction indexes;
4. the reconfiguration system in the way of developing with compensation
One failure of every reparation, corresponding compensation circuit is stored in repository, but the memory space of repository has
Limit, cannot store all compensation circuits;When repository memory overflows, repository will be unable to store new compensation circuit, and
And as system is run for a long time, there are more unknown failures, the effect of repository gradually decreases, final influence system
Fault-tolerant ability;In order to improve the long-term running reliability of system, when repository memory overflows, skill will be repaired using based on compensation
Art reconfiguration system;
When reconstructing goal systems, using the evolution mode based on compensation recovery technique;If detected in evolutionary process
Retention effects then stop developing, it is assumed that current iteration number is GsIf Gs,≥Gmax, then showing cannot be in defined iteration time
Develop goal systems out in number;If Gs<Gmax, then start reparation process and the correct system in the part to have developed out repaired
Just, repairing process must be in Gmax-GsIt is completed in secondary the number of iterations.
Beneficial effects of the present invention: (1) it by using dynamic self-adapting CGP coding mode can reduce evolution time-consuming, mention
High evolution success rate;(2) using the System Fault Tolerance time as constraint condition, in conjunction with repository and compensation fault-tolerant mode, it can be improved and be
The fault-tolerance and real-time of system;(3) virtual restructural frame is constructed in FPGA platform, and is realized on the frame a set of real-time
Tolerant system provides a kind of feasible reference model for the design of tolerant system.
Detailed description of the invention
Fig. 1 is the overall structure block diagram of system.
Fig. 2 is CGP encoder matrix exemplary diagram.
Fig. 3 is static configuration library working principle diagram.
Specific embodiment
Below in conjunction with attached drawing and technical solution, a specific embodiment of the invention is further illustrated.
A kind of real-time fault tolerance design method based on Evolvable Hardware, steps are as follows:
The cardinal principle of dynamic self-adapting CGP coding method and real-time fault tolerance mechanism is as follows:
(1) dynamic self-adapting CGP coding method
CGP is that Evolvable Hardware field is the most commonly used coding mode, and its principle is as shown in Figure 2.Traditional CGP coding with
Machine selects a stationary nodes as final output node, and dynamic self-adapting CGP coding method output node is not fixed, using one
Kind dynamic random sampling model selects different nodes as output node from CGP encoder matrix, calculates separately CGP encoder matrix
Fitness value, and select the maximum matrix of fitness value as optimal solution.The sampled probability of node is integrally adapted to population
It spends situation of change dynamic to adjust, the fitness distribution situation of current population can be calculated using the following equation:
Wherein, fd(t) individual adaptation degree distribution or diversity in population are indicated;It indicates individual in population average to fit
Answer angle value;fmax(t) maximum adaptation angle value individual in population is indicated;fminIt (t) is minimum fitness value.With the number of iterations
Increase, the fitness difference between individual is smaller and smaller, and average fitness increases, fd(t) value also becomes larger.Sampled probability
Two parts are broadly divided into, at the initial stage of evolution, population's fitness is lower, adds block convergence rate using fixed higher sampled probability;
It is general according to the value dynamic adjustment sampling of population average fitness when population average fitness or the number of iterations reach scheduled threshold value
Rate.The calculation formula of sampled probability in evolutionary process are as follows:
Wherein, Ps0It is the initial samples probability of setting, the setting of initial samples probability value needs to comprehensively consider population scale
And chromosome length.If initial samples probability is set as 1, all nodes are needed to be traversed at this time and calculate fitness value;If
The setting of initial samples probability is too small, then becomes random selection, may will affect the number of iterations.Therefore, initial samples probability one
As be not less than 0.5.t0It is the critical the number of iterations of sampled probability adjustment, t0Value cannot be too big, value is excessive to will affect algorithm
Acceleration effect, generally take 0.3tmax。It is the customized parameter regulation factor, is usually set to 2, tmaxBe evolution algorithmic most
Big the number of iterations.In early period of developing, population entirety fitness is lower, and sampled probability is larger, is conducive to accelerate evolution algorithmic convergence
Speed;In the later period of developing, population is whole more excellent, and sampled probability becomes smaller, and reduction is conducive to excellent genes to the adjustment of circuit structure
Accumulation, improve the success rate of evolution algorithmic.
(2) the real-time fault tolerance mechanism based on Evolvable Hardware
Traditional fault tolerant system design method mainly considers system survivability, has ignored the real-time of tolerant system,
And tolerant system is substantially a kind of real-time system, real-time is an important restrictions condition of tolerant system.It is hard based on developing
The real-time fault tolerance mechanism of part accelerates repair process using static configuration library, can take into account using the System Fault Tolerance time as constraint condition
The real-time and fault-tolerance of tolerant system.Entire real-time fault tolerance mechanism mainly includes the following aspects:
1. calculating evolution algorithmic maximum number of iterations
System tolerable fault time is defined as to repair the time limit, is defined as Tmax。TmaxFault analysis tree can be passed through
(Fault Tree Analysis, FTA) technology is analyzed to obtain.In order to guarantee that system can repair repair system event in the time limit
Barrier will repair the time limit as the constraint condition of evolution algorithmic in evolutionary process.First choice can calculate evolution by formula (3)
The time T that algorithm iteration once needsg。
Tg=λ (Tcfg+Tfit)+Tea (3)
Wherein, TcfgIt is the setup time of evolution platform, TfitIt is Fitness analysis time, TeaWhen being the execution of evolution algorithmic
Between, λ is the quantity of the offspring individual generated in each iterative process.Then according to the available maximum the number of iterations of formula (4).
2. run time fault detects
In the system operation phase, when system breaks down, it is desirable to be able to detect mistake within the shortest time.Failure
Eventually result in system output error, it is only necessary to which the reality output and desired output of comparison system can detect whether faulty
Occur.The result of fault detection can be indicated with formula (5).
Wherein R indicates the exclusive or of reality output and desired output, if it is desired to output is consistent with reality output, then R is 0,
Otherwise R is not zero, by detect R value can detection system whether malfunction.
3. repairing failure using static configuration library
Static configuration library is mainly used to the repair time of acceleration disturbance.In system design stage, can be dug by FTA technology
The potential failure of pick system, and the compensation circuit of these failures is stored in static configuration library.In system operation, if hair
Raw failure, then with vector (In,Oreal) it is to index to search corresponding compensation circuit in static configuration library.
If searching corresponding compensation circuit, illustrates that known fault has occurred, directly searched from repository corresponding
Compensation circuit completes the reparation to the system failure.
If not finding corresponding reparation circuit, illustrate that unknown failure has occurred, needs to start evolution algorithmic to drill
Change compensation circuit.In the compensation circuit of evolution unknown failure, accelerate to develop based on the evolution algorithmic of similitude using a kind of
Process.Firstly, the similitude for having the compensation output of circuit in expected compensation output and repository is calculated using Hamming distances,
It is expressed as Dis (R, R '), wherein R is the output of expected compensation circuit, and R ' is the compensation output for having circuit in repository.IfThe initial population of evolution algorithmic then is generated with the corresponding compensation circuit of R ', and the R out that develops;
Conversely, then being developed out according to the corresponding compensation circuit of R 'Then phase inverter is added in compensation output.Based on similitude
Evolution repair mechanism can make full use of the design feature of existing compensation circuit, accelerate the convergence rate of evolution algorithmic, and reduction is drilled
Change time-consuming, is suitable for real-time tolerant system.
The size in static configuration library depends on system scale and hardware resource, and the bigger occupancy in the space in static configuration library is deposited
It is also more to store up resource, while the time of checking storehouse can also be incremented by.Fig. 3, which gives, repairs three bit parity check device electricity with static configuration library
The example of failure in road.Computing unit matrix (Function Element Array, FEA) in figure includes that four calculating are single
Member, each computing unit are made of two multiple selector and a function module, and each power function module includes eight letters
Number function.The input data of computing unit is by cfg1 and cfg2 signal deciding, and the function of computing unit is by cfg3 signal deciding.It is false
It is located at route f0There is the failure of short circuit, as input InWhen=(010), the reality output of system is Oreal=0, desired output Oexp
=1, according to formula (5) available R=1, then it may determine that system breaks down.At this moment vector (I is selectedn,Oreal)=
The index of (010,0) as checking storehouse illustrates that the failure is known event if can find matched reparation circuit in library
Barrier directly reconstructs VRC with the configuration information in library.In this example, VRC is reconfigured into an XOR circuit, faulty circuit
It is studied for a second time courses one has flunked just, finally obtains correct output.Reparation circuit and malfunction index in repository are one-to-many relationships, i.e., one is repaired
Compound circuit may correspond to multiple malfunction indexes.Such as fault vectors (011,1), (100,0) and (101,1) all correspond to same
Repair circuit.
4. the reconfiguration system in the way of developing with compensation
Corresponding compensation circuit, can all be stored in repository, but the storage of repository is empty by one failure of every reparation
Between it is limited, it is impossible to store all compensation circuits.When repository memory overflows, repository will be unable to store new compensation electricity
Road, and as system is run for a long time, it is possible that more unknown failures, the effect of repository is gradually decreased, most
It will affect system survivability eventually.In order to improve the long-term running reliability of system, when repository memory overflows, will use
Based on compensation recovery technique reconfiguration system.
In reconfiguration system, influence of the retention effects to evolution algorithmic is considered.Retention effects refer to that the fitness of population exists
Evolution initial stage increases quickly, but when the fitness of population is close to target value, fitness value is stagnated, and a large amount of the number of iterations is all
Spent in the lag phase, but there is no increase for the fitness value of population, it is clear that the correct system in part that develops than
The complete goal systems that develops is simply more.When reconstructing goal systems, using the evolution mode based on compensation recovery technique.
Stop developing if detecting retention effects in evolutionary process, it is assumed that current iteration number is GsIf Gs,≥Gmax, then table
The bright goal systems out that can not develop in defined the number of iterations, it is this since retention effects are generally present in evolution initial stage
Situation will not generally occur.If Gs<Gmax, then start reparation process and the correct system in the part to have developed out be modified,
Reparation process must be in Gmax-GsIt is completed in secondary the number of iterations.Since the scale for repairing circuit is generally smaller, it is easier
It arrives.