[go: up one dir, main page]

CN106055426B - Real-time fault tolerance design method based on Evolvable Hardware - Google Patents

Real-time fault tolerance design method based on Evolvable Hardware Download PDF

Info

Publication number
CN106055426B
CN106055426B CN201610341467.1A CN201610341467A CN106055426B CN 106055426 B CN106055426 B CN 106055426B CN 201610341467 A CN201610341467 A CN 201610341467A CN 106055426 B CN106055426 B CN 106055426B
Authority
CN
China
Prior art keywords
fault
time
real
evolution
configuration library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610341467.1A
Other languages
Chinese (zh)
Other versions
CN106055426A (en
Inventor
王洁
柳继委
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201610341467.1A priority Critical patent/CN106055426B/en
Publication of CN106055426A publication Critical patent/CN106055426A/en
Application granted granted Critical
Publication of CN106055426B publication Critical patent/CN106055426B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1479Generic software techniques for error detection or fault masking
    • G06F11/1489Generic software techniques for error detection or fault masking through recovery blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

本发明公开了一种基于演化硬件的实时容错系统设计方法,属于演化硬件领域。该方法从电路编码和系统容错机制进行分析,并在FPGA平台上实现了一套实时容错系统。采用动态自适应CGP编码方法,挖掘编码矩阵的潜能,减少演化耗时,提高演化成功率;采用多种机制保证容错系统的实时性和容错性。以系统故障容错时间为约束条件,保证系统的实时性;利用静态故障配置库加速对故障的修复过程;当静态配置库溢出时,采用补偿修复模式重构系统,提高系统长期运行能力;采用虚拟可重构技术在FPGA平台上构建了一套实时容错系统原型,验证该设计方法的可行性和有效性。该方法兼顾系统的容错性和实时性,充分发挥演化硬件在容错系统设计领域的优势。

The invention discloses a real-time fault-tolerant system design method based on evolutionary hardware, and belongs to the field of evolutionary hardware. The method analyzes the circuit coding and system fault-tolerant mechanism, and implements a real-time fault-tolerant system on the FPGA platform. The dynamic adaptive CGP coding method is used to tap the potential of the coding matrix, reduce the time-consuming evolution, and improve the success rate of evolution. Various mechanisms are used to ensure the real-time and fault tolerance of the fault-tolerant system. Taking the fault tolerance time of the system as a constraint, the real-time performance of the system is guaranteed; the static fault configuration library is used to speed up the repair process; The reconfigurable technology builds a set of real-time fault-tolerant system prototypes on the FPGA platform to verify the feasibility and effectiveness of the design method. The method takes into account the fault tolerance and real-time performance of the system, and gives full play to the advantages of evolutionary hardware in the field of fault-tolerant system design.

Description

Real-time fault tolerance design method based on Evolvable Hardware
Technical field
The invention belongs to Evolvable Hardware fields, are related to a kind of real-time fault tolerance design method based on Evolvable Hardware.
Background technique
With the arriving of information age, each electronic product emerges one after another, and the Working and life styles of people are occurring Huge change.The electronic product to become increasingly abundant is while bringing convenient to people's life, also to the design level of electronic system More stringent requirements are proposed.The function of electronic system constantly enhances, and scale constantly expands, and the complexity of electronic system design is in finger Number increases, and the experience of relying solely on will be unable to design the electronic system for meeting demand.For a few thing in particular surroundings Electronic equipment, such as space ship, bathyscaphe etc. is extremely stringent to the reliability requirement of system, once system breaks down, Fatal damage will be caused to these equipment, bring huge economic loss.Due to traditional design method be it is static, be System structure is once designed to be difficult to be modified again, relies solely on the experience of designer, what unpredictable system can meet with Kind failure, can not also be prevented, this just brings great hidden danger to the safety and reliability of system in advance.In order to The limitation of engineer's circuit is broken through, the adaptability of enhancing system counter external environment variation improves system reliability, compels It is essential and wants a kind of design method that can be improved Fault Tolerance.
Summary of the invention
The technical problem to be solved in the present invention is to provide a kind of real-time fault tolerance design method based on Evolvable Hardware, knot The self-organizing of Evolvable Hardware is closed, adaptive and selfreparing characteristic proposes a kind of dynamic self-adapting Descartes genetic coding (Cartesian Genetic Programming, CGP) method and real-time fault tolerance mechanism, and one is realized in FPGA platform Real-time fault tolerance system prototype is covered, general frame is as shown in Figure 1.Wherein, evolution algorithmic is write using C code, operates in FPGA In the Microblaze processor soft core of platform built-in;Other modules are then write using Verilog code, and FPGA platform is operated in On.
A kind of real-time fault tolerance design method based on Evolvable Hardware, steps are as follows:
(1) dynamic self-adapting CGP coding method
Traditional CGP coding one stationary nodes of random selection are as final output node, dynamic self-adapting CGP coding staff Method output node is not fixed, and selects different nodes as output from CGP encoder matrix using a kind of dynamic random sampling model Node, calculates separately the fitness value of CGP encoder matrix, and selects the maximum matrix of fitness value as optimal solution;Node Sampled probability is adjusted with population entirety fitness situation of change dynamic, and the fitness distribution situation of current population uses following public affairs Formula calculates:
Wherein, fd(t) individual adaptation degree distribution or diversity in population are indicated;It indicates individual in population average to fit Answer angle value;fmax(t) maximum adaptation angle value individual in population is indicated;fmin(t) minimum fitness value individual in population is indicated; With the increase of the number of iterations, the fitness difference between individual is smaller and smaller, and average fitness value increases, fd(t) value It becomes larger;
Sampled probability is divided into two parts, and at the initial stage of evolution, Population adaptation angle value is lower, using fixed higher sampled probability Add block convergence rate;When population average fitness value or the number of iterations reach scheduled threshold value, according to population average fitness Value dynamic adjustment sampled probability;The calculation formula of sampled probability in evolutionary process are as follows:
Wherein, Ps0It is the initial samples probability of setting, the set comprehensive of initial samples probability considers population scale and dyeing Body length;If initial samples probability is set as 1, all nodes are traversed at this time, and calculate fitness value;If initial samples Probability setting is too small, then randomly chooses, and initial samples probability is not less than 0.5;t0It is the critical the number of iterations of sampled probability adjustment, t0Value be 0.3tmaxIt is the customized parameter regulation factor, is set as 2, tmaxIt is the maximum number of iterations of evolution algorithmic;
(2) the real-time fault tolerance mechanism based on Evolvable Hardware
Tolerant system is substantially a kind of real-time system, and real-time is an important restrictions condition of tolerant system;It is based on The real-time fault tolerance mechanism of Evolvable Hardware accelerates repair process using static configuration library using the System Fault Tolerance time as constraint condition, simultaneous Care for the real-time and fault-tolerance of tolerant system;Entire real-time fault tolerance mechanism mainly includes the following aspects:
1. calculating evolution algorithmic maximum number of iterations
System tolerable fault time is defined as to repair time limit Tmax;TmaxIt is obtained by fault analysis tree technology;For Guarantee system is repairing repair system failure in the time limit, and the time limit will be repaired in evolutionary process as the constraint item of evolution algorithmic Part;First choice calculates the time T that evolution algorithmic iteration once needs by formula (3)g
Tg=λ (Tcfg+Tfit)+Tea (3)
Wherein, TcfgIt is the setup time of evolution platform, TfitIt is Fitness analysis time, TeaWhen being the execution of evolution algorithmic Between, λ is the quantity of the offspring individual generated in each iterative process;Maximum the number of iterations is then obtained according to formula (4);
2. run time fault detects
It needs to detect mistake within the shortest time when system breaks down in the system operation phase;Failure is final Lead to system output error, the reality output and desired output of comparison system judge the whether faulty generation of system;Fault detection Result with formula (5) indicate
Wherein R indicates the exclusive or of reality output and desired output, if it is desired to output is consistent with reality output, then R is 0, Otherwise R is not zero, and by detecting the value of R, obtains whether detection system malfunctions;
3. repairing failure using static configuration library
Static configuration library is mainly used to the repair time of acceleration disturbance;In system design stage, pass through FTA technology mining system It unites potential failure, and the compensation circuit of these failures is stored in static configuration library;In system operation, in case of event Barrier, then with vector (In,Oreal) it is to index to search corresponding compensation circuit in static configuration library;
If searching corresponding compensation circuit, illustrates that known fault has occurred, directly searched from repository corresponding Compensation circuit completes the reparation to the system failure;
If not finding corresponding reparation circuit, illustrate that unknown failure has occurred, starts evolution algorithmic and mended to develop Repay circuit;
In the compensation circuit of evolution unknown failure, accelerate to develop based on the evolution algorithmic of similitude using a kind of Journey;Firstly, calculating the similitude for having the compensation output of circuit in expected compensation output and repository, table using Hamming distances It is shown as Dis (R, R '), wherein R is the output of expected compensation circuit, and R ' is the compensation output for having circuit in repository;IfThe initial population of evolution algorithmic then is generated with the corresponding compensation circuit of R ', and the R out that develops; Conversely, then being developed out according to the corresponding compensation circuit of R 'Then phase inverter is added in compensation output;
The size in static configuration library depends on system scale and hardware resource, and the bigger occupancy in the space in static configuration library is deposited Storage resource is more, and the time of checking storehouse is incremented by;Reparation circuit and malfunction index in repository are one-to-many relationships, i.e., one is repaired Compound circuit corresponds to multiple malfunction indexes;
4. the reconfiguration system in the way of developing with compensation
One failure of every reparation, corresponding compensation circuit is stored in repository, but the memory space of repository has Limit, cannot store all compensation circuits;When repository memory overflows, repository will be unable to store new compensation circuit, and And as system is run for a long time, there are more unknown failures, the effect of repository gradually decreases, final influence system Fault-tolerant ability;In order to improve the long-term running reliability of system, when repository memory overflows, skill will be repaired using based on compensation Art reconfiguration system;
When reconstructing goal systems, using the evolution mode based on compensation recovery technique;If detected in evolutionary process Retention effects then stop developing, it is assumed that current iteration number is GsIf Gs,≥Gmax, then showing cannot be in defined iteration time Develop goal systems out in number;If Gs<Gmax, then start reparation process and the correct system in the part to have developed out repaired Just, repairing process must be in Gmax-GsIt is completed in secondary the number of iterations.
Beneficial effects of the present invention: (1) it by using dynamic self-adapting CGP coding mode can reduce evolution time-consuming, mention High evolution success rate;(2) using the System Fault Tolerance time as constraint condition, in conjunction with repository and compensation fault-tolerant mode, it can be improved and be The fault-tolerance and real-time of system;(3) virtual restructural frame is constructed in FPGA platform, and is realized on the frame a set of real-time Tolerant system provides a kind of feasible reference model for the design of tolerant system.
Detailed description of the invention
Fig. 1 is the overall structure block diagram of system.
Fig. 2 is CGP encoder matrix exemplary diagram.
Fig. 3 is static configuration library working principle diagram.
Specific embodiment
Below in conjunction with attached drawing and technical solution, a specific embodiment of the invention is further illustrated.
A kind of real-time fault tolerance design method based on Evolvable Hardware, steps are as follows:
The cardinal principle of dynamic self-adapting CGP coding method and real-time fault tolerance mechanism is as follows:
(1) dynamic self-adapting CGP coding method
CGP is that Evolvable Hardware field is the most commonly used coding mode, and its principle is as shown in Figure 2.Traditional CGP coding with Machine selects a stationary nodes as final output node, and dynamic self-adapting CGP coding method output node is not fixed, using one Kind dynamic random sampling model selects different nodes as output node from CGP encoder matrix, calculates separately CGP encoder matrix Fitness value, and select the maximum matrix of fitness value as optimal solution.The sampled probability of node is integrally adapted to population It spends situation of change dynamic to adjust, the fitness distribution situation of current population can be calculated using the following equation:
Wherein, fd(t) individual adaptation degree distribution or diversity in population are indicated;It indicates individual in population average to fit Answer angle value;fmax(t) maximum adaptation angle value individual in population is indicated;fminIt (t) is minimum fitness value.With the number of iterations Increase, the fitness difference between individual is smaller and smaller, and average fitness increases, fd(t) value also becomes larger.Sampled probability Two parts are broadly divided into, at the initial stage of evolution, population's fitness is lower, adds block convergence rate using fixed higher sampled probability; It is general according to the value dynamic adjustment sampling of population average fitness when population average fitness or the number of iterations reach scheduled threshold value Rate.The calculation formula of sampled probability in evolutionary process are as follows:
Wherein, Ps0It is the initial samples probability of setting, the setting of initial samples probability value needs to comprehensively consider population scale And chromosome length.If initial samples probability is set as 1, all nodes are needed to be traversed at this time and calculate fitness value;If The setting of initial samples probability is too small, then becomes random selection, may will affect the number of iterations.Therefore, initial samples probability one As be not less than 0.5.t0It is the critical the number of iterations of sampled probability adjustment, t0Value cannot be too big, value is excessive to will affect algorithm Acceleration effect, generally take 0.3tmaxIt is the customized parameter regulation factor, is usually set to 2, tmaxBe evolution algorithmic most Big the number of iterations.In early period of developing, population entirety fitness is lower, and sampled probability is larger, is conducive to accelerate evolution algorithmic convergence Speed;In the later period of developing, population is whole more excellent, and sampled probability becomes smaller, and reduction is conducive to excellent genes to the adjustment of circuit structure Accumulation, improve the success rate of evolution algorithmic.
(2) the real-time fault tolerance mechanism based on Evolvable Hardware
Traditional fault tolerant system design method mainly considers system survivability, has ignored the real-time of tolerant system, And tolerant system is substantially a kind of real-time system, real-time is an important restrictions condition of tolerant system.It is hard based on developing The real-time fault tolerance mechanism of part accelerates repair process using static configuration library, can take into account using the System Fault Tolerance time as constraint condition The real-time and fault-tolerance of tolerant system.Entire real-time fault tolerance mechanism mainly includes the following aspects:
1. calculating evolution algorithmic maximum number of iterations
System tolerable fault time is defined as to repair the time limit, is defined as Tmax。TmaxFault analysis tree can be passed through (Fault Tree Analysis, FTA) technology is analyzed to obtain.In order to guarantee that system can repair repair system event in the time limit Barrier will repair the time limit as the constraint condition of evolution algorithmic in evolutionary process.First choice can calculate evolution by formula (3) The time T that algorithm iteration once needsg
Tg=λ (Tcfg+Tfit)+Tea (3)
Wherein, TcfgIt is the setup time of evolution platform, TfitIt is Fitness analysis time, TeaWhen being the execution of evolution algorithmic Between, λ is the quantity of the offspring individual generated in each iterative process.Then according to the available maximum the number of iterations of formula (4).
2. run time fault detects
In the system operation phase, when system breaks down, it is desirable to be able to detect mistake within the shortest time.Failure Eventually result in system output error, it is only necessary to which the reality output and desired output of comparison system can detect whether faulty Occur.The result of fault detection can be indicated with formula (5).
Wherein R indicates the exclusive or of reality output and desired output, if it is desired to output is consistent with reality output, then R is 0, Otherwise R is not zero, by detect R value can detection system whether malfunction.
3. repairing failure using static configuration library
Static configuration library is mainly used to the repair time of acceleration disturbance.In system design stage, can be dug by FTA technology The potential failure of pick system, and the compensation circuit of these failures is stored in static configuration library.In system operation, if hair Raw failure, then with vector (In,Oreal) it is to index to search corresponding compensation circuit in static configuration library.
If searching corresponding compensation circuit, illustrates that known fault has occurred, directly searched from repository corresponding Compensation circuit completes the reparation to the system failure.
If not finding corresponding reparation circuit, illustrate that unknown failure has occurred, needs to start evolution algorithmic to drill Change compensation circuit.In the compensation circuit of evolution unknown failure, accelerate to develop based on the evolution algorithmic of similitude using a kind of Process.Firstly, the similitude for having the compensation output of circuit in expected compensation output and repository is calculated using Hamming distances, It is expressed as Dis (R, R '), wherein R is the output of expected compensation circuit, and R ' is the compensation output for having circuit in repository.IfThe initial population of evolution algorithmic then is generated with the corresponding compensation circuit of R ', and the R out that develops; Conversely, then being developed out according to the corresponding compensation circuit of R 'Then phase inverter is added in compensation output.Based on similitude Evolution repair mechanism can make full use of the design feature of existing compensation circuit, accelerate the convergence rate of evolution algorithmic, and reduction is drilled Change time-consuming, is suitable for real-time tolerant system.
The size in static configuration library depends on system scale and hardware resource, and the bigger occupancy in the space in static configuration library is deposited It is also more to store up resource, while the time of checking storehouse can also be incremented by.Fig. 3, which gives, repairs three bit parity check device electricity with static configuration library The example of failure in road.Computing unit matrix (Function Element Array, FEA) in figure includes that four calculating are single Member, each computing unit are made of two multiple selector and a function module, and each power function module includes eight letters Number function.The input data of computing unit is by cfg1 and cfg2 signal deciding, and the function of computing unit is by cfg3 signal deciding.It is false It is located at route f0There is the failure of short circuit, as input InWhen=(010), the reality output of system is Oreal=0, desired output Oexp =1, according to formula (5) available R=1, then it may determine that system breaks down.At this moment vector (I is selectedn,Oreal)= The index of (010,0) as checking storehouse illustrates that the failure is known event if can find matched reparation circuit in library Barrier directly reconstructs VRC with the configuration information in library.In this example, VRC is reconfigured into an XOR circuit, faulty circuit It is studied for a second time courses one has flunked just, finally obtains correct output.Reparation circuit and malfunction index in repository are one-to-many relationships, i.e., one is repaired Compound circuit may correspond to multiple malfunction indexes.Such as fault vectors (011,1), (100,0) and (101,1) all correspond to same Repair circuit.
4. the reconfiguration system in the way of developing with compensation
Corresponding compensation circuit, can all be stored in repository, but the storage of repository is empty by one failure of every reparation Between it is limited, it is impossible to store all compensation circuits.When repository memory overflows, repository will be unable to store new compensation electricity Road, and as system is run for a long time, it is possible that more unknown failures, the effect of repository is gradually decreased, most It will affect system survivability eventually.In order to improve the long-term running reliability of system, when repository memory overflows, will use Based on compensation recovery technique reconfiguration system.
In reconfiguration system, influence of the retention effects to evolution algorithmic is considered.Retention effects refer to that the fitness of population exists Evolution initial stage increases quickly, but when the fitness of population is close to target value, fitness value is stagnated, and a large amount of the number of iterations is all Spent in the lag phase, but there is no increase for the fitness value of population, it is clear that the correct system in part that develops than The complete goal systems that develops is simply more.When reconstructing goal systems, using the evolution mode based on compensation recovery technique. Stop developing if detecting retention effects in evolutionary process, it is assumed that current iteration number is GsIf Gs,≥Gmax, then table The bright goal systems out that can not develop in defined the number of iterations, it is this since retention effects are generally present in evolution initial stage Situation will not generally occur.If Gs<Gmax, then start reparation process and the correct system in the part to have developed out be modified, Reparation process must be in Gmax-GsIt is completed in secondary the number of iterations.Since the scale for repairing circuit is generally smaller, it is easier It arrives.

Claims (1)

1.一种基于演化硬件的实时容错系统设计方法,其特征在于,步骤如下:1. a real-time fault-tolerant system design method based on evolution hardware, is characterized in that, step is as follows: (1)动态自适应CGP编码方法(1) Dynamic adaptive CGP coding method 传统的CGP编码随机选择一个固定节点作为最终输出节点,动态自适应CGP编码方法输出节点不固定,采用一种动态随机采样模型从CGP编码矩阵中选择不同节点作为输出节点,分别计算CGP编码矩阵的适应度值,并选择适应度值最大的矩阵作为最优解;节点的采样概率随着种群整体适应度变化情况动态调整,当前种群的适应度分布情况采用以下公式计算:The traditional CGP coding randomly selects a fixed node as the final output node, and the dynamic adaptive CGP coding method has no fixed output node. A dynamic random sampling model is used to select different nodes from the CGP coding matrix as output nodes, and calculate the CGP coding matrix separately. The fitness value is selected, and the matrix with the largest fitness value is selected as the optimal solution; the sampling probability of the node is dynamically adjusted with the change of the overall fitness of the population, and the fitness distribution of the current population is calculated by the following formula: 其中,fd(t)表示种群中个体适应度分布或多样性;表示种群中个体的平均适应度值;fmax(t)表示种群中个体的最大适应度值;fmin(t)表示种群中个体的最小适应度值;随着迭代次数的增加,个体之间的适应度差值越来越小,平均适应度值增加,fd(t)的值也逐渐变大;Among them, f d (t) represents the fitness distribution or diversity of individuals in the population; Represents the average fitness value of individuals in the population; f max (t) represents the maximum fitness value of individuals in the population; f min (t) represents the minimum fitness value of individuals in the population; with the increase of the number of iterations, between individuals The fitness difference of , becomes smaller and smaller, the average fitness value increases, and the value of f d (t) also gradually becomes larger; 采样概率分为两部分,在演化初期,种群适应度值较低,采用固定的较高采样概率加块收敛速度;当种群平均适应度值或迭代次数达到预定的阈值时,根据种群平均适应度值动态调整采样概率;演化过程中采样概率的计算公式为:The sampling probability is divided into two parts. In the early stage of evolution, the population fitness value is relatively low, and a fixed higher sampling probability is used to increase the block convergence speed; when the population average fitness value or the number of iterations reaches a predetermined threshold, the population average fitness value The sampling probability is dynamically adjusted by the value; the calculation formula of the sampling probability in the evolution process is: 其中,Ps0是设定的初始采样概率,初始采样概率的设定综合考虑种群规模和染色体长度;如果初始采样概率设为1,则此时遍历所有的节点,并计算适应度值;如果初始采样概率设置过小,则随机选择,初始采样概率不低于0.5;t0是采样概率调整的临界迭代次数,t0的取值为0.3tmax是自定义的参数调节因子,设定为2,tmax是演化算法的最大迭代次数;Among them, P s0 is the set initial sampling probability, the setting of the initial sampling probability comprehensively considers the population size and chromosome length; if the initial sampling probability is set to 1, then all nodes are traversed at this time, and the fitness value is calculated; if the initial sampling probability is set to 1 If the sampling probability is set too small, it is randomly selected, and the initial sampling probability is not less than 0.5; t 0 is the critical iteration number of sampling probability adjustment, and the value of t 0 is 0.3t max ; is a user-defined parameter adjustment factor, set to 2, and t max is the maximum number of iterations of the evolutionary algorithm; (2)基于演化硬件的实时容错机制(2) Real-time fault tolerance mechanism based on evolutionary hardware 容错系统本质上是一种实时系统,实时性是容错系统的一个重要约束条件;基于演化硬件的实时容错机制,以系统容错时间为约束条件,利用静态配置库加速修复过程,兼顾容错系统的实时性和容错性;整个实时容错机制主要包括以下几个方面:A fault-tolerant system is essentially a real-time system, and real-time performance is an important constraint of the fault-tolerant system; the real-time fault-tolerant mechanism based on evolutionary hardware takes the fault-tolerant time of the system as the constraint, uses the static configuration library to speed up the repair process, and takes into account the real-time performance of the fault-tolerant system. The whole real-time fault tolerance mechanism mainly includes the following aspects: ①计算演化算法最大迭代次数①Calculate the maximum number of iterations of the evolutionary algorithm 将系统可容忍的故障时间定义为修复期限Tmax;Tmax通过故障分析树技术得到;为了保证系统在修复期限内修复系统故障,在演化过程中将修复期限作为演化算法的约束条件;首选通过公式(3)计算出演化算法迭代一次需要的时间TgThe tolerable failure time of the system is defined as the repair period T max ; T max is obtained through the fault analysis tree technology; in order to ensure that the system can repair the system failure within the repair period, the repair period is used as the constraint condition of the evolutionary algorithm in the evolution process; it is preferred to pass Formula (3) calculates the time T g required for one iteration of the evolutionary algorithm; Tg=λ(Tcfg+Tfit)+Tea (3)T g =λ(T cfg +T fit )+T ea (3) 其中,Tcfg是演化平台的配置时间,Tfit是适应度评估时间,Tea是演化算法的执行时间,λ是每次迭代过程中产生的子代个体的数量;则根据公式(4)得到最大的迭代次数;Among them, T cfg is the configuration time of the evolution platform, T fit is the fitness evaluation time, Tea is the execution time of the evolution algorithm, and λ is the number of offspring individuals generated in each iteration process; then according to formula (4), we get maximum number of iterations; ②运行时故障检测②Fault detection during operation 在系统运行阶段,当系统出现故障时,需要在最短的时间内检测到错误;故障最终导致系统输出错误,比较系统的实际输出和期望输出,判断系统是否有故障发生;故障检测的结果用公式(5)表示In the system operation stage, when the system fails, it needs to detect the error in the shortest time; the failure eventually leads to the system output error, compare the actual output and expected output of the system, and judge whether the system has a failure; the result of the failure detection is calculated by the formula (5) indicates R=Oreal⊕Oexp (5)R=O real ⊕ O exp (5) 其中R表示实际输出和期望输出的异或,Oreal表示系统的实际输出值,Oexp表示系统的期望输出值,⊕为异或运算符号,如果期望输出和实际输出一致,则R为0,否则R不为零,通过检测R的值,得到检测系统是否出错;where R represents the XOR of the actual output and the expected output, O real represents the actual output value of the system, O exp represents the expected output value of the system, and ⊕ is the XOR operator symbol. If the expected output is consistent with the actual output, then R is 0, Otherwise, R is not zero, and by detecting the value of R, it is obtained whether the detection system is in error; ③利用静态配置库修复故障③Using the static configuration library to repair the fault 静态配置库主要用来加速故障的修复时间;在系统设计阶段,通过FTA技术挖掘系统潜在的故障,并将这些故障的补偿电路存储在静态配置库中;在系统运行时,如果发生故障,则以向量(In,Oreal)为索引到静态配置库中查找对应的补偿电路,其中In表示系统的输入值向量,Oreal表示In所对应的系统实际输出;The static configuration library is mainly used to speed up the repair time of faults; in the system design stage, the potential faults of the system are excavated through FTA technology, and the compensation circuits for these faults are stored in the static configuration library; when the system is running, if a fault occurs, then Take the vector (I n , O real ) as the index to find the corresponding compensation circuit in the static configuration library, where In represents the input value vector of the system, and O real represents the actual output of the system corresponding to In; 如果查找对应的补偿电路,则说明发生了已知故障,直接从配置库中查找对应的补偿电路,完成对系统故障的修复;If the corresponding compensation circuit is found, it means that a known fault has occurred, and the corresponding compensation circuit is directly searched from the configuration library to complete the repair of the system fault; 如果没有查到对应的修复电路,则说明发生了未知故障,启动演化算法来演化补偿电路;If the corresponding repair circuit is not found, it means that an unknown fault has occurred, and the evolution algorithm is started to evolve the compensation circuit; 在演化未知故障的补偿电路时,采用一种基于相似性的演化算法来加速演化过程;首先,采用海明距离来计算期望补偿输出和配置库中已有电路的补偿输出的相似性,表示为Dis(R,R’),其中R是期望补偿电路的输出,R’是配置库中已有电路的补偿输出;如果则用R’对应的补偿电路来产生演化算法的初始种群,并演化出R;反之,则根据R’对应的补偿电路来演化出然后在补偿输出上加上反相器;When the compensation circuit of unknown fault is evolved, a similarity-based evolution algorithm is used to accelerate the evolution process; first, the Hamming distance is used to calculate the similarity between the expected compensation output and the compensation output of the existing circuit in the configuration library, which is expressed as Dis(R,R'), where R is the output of the desired compensation circuit and R' is the compensation output of the existing circuit in the configuration library; if Then use the compensation circuit corresponding to R' to generate the initial population of the evolutionary algorithm, and evolve R; otherwise, according to the compensation circuit corresponding to R' to evolve Then add an inverter to the compensation output; 静态配置库的大小取决于系统规模和硬件资源,静态配置库的空间越大占用的存储资源越多,查库的时间递增;配置库中的修复电路和故障索引是一对多的关系,即一个修复电路对应多个故障索引;The size of the static configuration library depends on the system scale and hardware resources. The larger the space of the static configuration library, the more storage resources it occupies, and the time for checking the library increases. The repair circuit and the fault index in the configuration library have a one-to-many relationship, that is, One repair circuit corresponds to multiple fault indexes; ④利用演化与补偿的方式重构系统④ Reconstruct the system by means of evolution and compensation 每修复一个故障,将对应的补偿电路存储在配置库中,但是配置库的存储空间有限,不能存储所有的补偿电路;当配置库内存溢出时,配置库将无法存储新的补偿电路,而且随着系统长时间的运行,出现更多的未知故障,配置库的作用逐渐降低,最终影响系统的容错能力;为了提高系统长期运行的可靠性,当配置库内存溢出时,将采用基于补偿修复技术重构系统;Each time a fault is repaired, the corresponding compensation circuit is stored in the configuration library, but the storage space of the configuration library is limited, and all compensation circuits cannot be stored; when the memory of the configuration library overflows, the configuration library will not be able to store new compensation circuits, and the As the system runs for a long time, more unknown faults appear, and the role of the configuration library is gradually reduced, which ultimately affects the fault tolerance of the system; in order to improve the reliability of the long-term operation of the system, when the memory of the configuration library overflows, the compensation-based repair technology will be used. rebuild the system; 在重构目标系统时,采用基于补偿修复技术的演化方式;演化过程中如果检测到停滞效应则停止演化,假设当前迭代次数为Gs,如果Gs,≥Gmax,则表明不能在规定的迭代次数内演化出目标系统;如果Gs&lt;Gmax,则启动修复进程对已经演化出的部分正确系统进行修正,修复进程必须在Gmax-Gs次迭代次数内完成。When reconstructing the target system, the evolution method based on compensation repair technology is adopted; if the stagnation effect is detected during the evolution process, the evolution will be stopped, assuming that the current number of iterations is G s , if G s , ≥ G max , it means that the specified The target system is evolved within the number of iterations; if G s &lt; G max , the repair process is started to correct some of the correct systems that have been evolved, and the repair process must be completed within the number of iterations G max -G s .
CN201610341467.1A 2016-05-21 2016-05-21 Real-time fault tolerance design method based on Evolvable Hardware Active CN106055426B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610341467.1A CN106055426B (en) 2016-05-21 2016-05-21 Real-time fault tolerance design method based on Evolvable Hardware

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610341467.1A CN106055426B (en) 2016-05-21 2016-05-21 Real-time fault tolerance design method based on Evolvable Hardware

Publications (2)

Publication Number Publication Date
CN106055426A CN106055426A (en) 2016-10-26
CN106055426B true CN106055426B (en) 2019-02-01

Family

ID=57177374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610341467.1A Active CN106055426B (en) 2016-05-21 2016-05-21 Real-time fault tolerance design method based on Evolvable Hardware

Country Status (1)

Country Link
CN (1) CN106055426B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969206B (en) * 2019-11-29 2023-06-06 大连理工大学 Real-time Diagnosis and Self-repair Method of Circuit Fault Based on Hierarchy Division
CN113239655B (en) * 2020-05-21 2024-06-28 台湾积体电路制造股份有限公司 System and method for determining constraints of semiconductor circuits
CN114647541B (en) * 2022-03-18 2024-04-26 西安微电子技术研究所 Circuit self-repairing method based on improved evolution hardware

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975684A (en) * 2006-12-13 2007-06-06 天津理工大学 Distributing real-time data bank fault recovering method capable of supporting serving and recovering simultaneously
EP2068246A1 (en) * 2007-11-30 2009-06-10 Giga-Byte Technology Co., Ltd. Auto repair method of system configurations using single key control
CN101937435A (en) * 2009-07-01 2011-01-05 北京科东电力控制系统有限责任公司 Online Synchronization Method of Distributed Real-time Database for Power System
CN102495823A (en) * 2011-12-07 2012-06-13 重庆邮电大学 System for classifying deoxyribonucleic acid (DNA) micro-array data based on evolvable hardware and method for constructing system
CN102521082A (en) * 2011-12-08 2012-06-27 上海交通大学 Recovery and fault tolerance method and recovery and fault tolerance system for check points in satellite-borne real-time operating system
CN102855314A (en) * 2012-08-27 2013-01-02 济南大学 Real-time database backup system and method based on FPGA (field programmable gate array)
CN103455393A (en) * 2013-09-25 2013-12-18 浪潮电子信息产业股份有限公司 Fault tolerant system design method based on process redundancy

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975684A (en) * 2006-12-13 2007-06-06 天津理工大学 Distributing real-time data bank fault recovering method capable of supporting serving and recovering simultaneously
EP2068246A1 (en) * 2007-11-30 2009-06-10 Giga-Byte Technology Co., Ltd. Auto repair method of system configurations using single key control
CN101937435A (en) * 2009-07-01 2011-01-05 北京科东电力控制系统有限责任公司 Online Synchronization Method of Distributed Real-time Database for Power System
CN102495823A (en) * 2011-12-07 2012-06-13 重庆邮电大学 System for classifying deoxyribonucleic acid (DNA) micro-array data based on evolvable hardware and method for constructing system
CN102521082A (en) * 2011-12-08 2012-06-27 上海交通大学 Recovery and fault tolerance method and recovery and fault tolerance system for check points in satellite-borne real-time operating system
CN102855314A (en) * 2012-08-27 2013-01-02 济南大学 Real-time database backup system and method based on FPGA (field programmable gate array)
CN103455393A (en) * 2013-09-25 2013-12-18 浪潮电子信息产业股份有限公司 Fault tolerant system design method based on process redundancy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《The Dynamic Evaluation Strategy for Evolvable Hardware》;王洁;《2015 Ninth International Conference on Frontier of Computer Science and Technology》;20151231;第91-95页
《可重构系统的演化修复机制》;朱继祥等;《计算机学报》;20140911(第7期);第1599-1606页

Also Published As

Publication number Publication date
CN106055426A (en) 2016-10-26

Similar Documents

Publication Publication Date Title
US20190122096A1 (en) Automated evaluation of neural networks using trained classifier
US11294754B2 (en) System and method for contextual event sequence analysis
Robidoux et al. Automated modeling of dynamic reliability block diagrams using colored Petri nets
Mendonça et al. GODA: A goal-oriented requirements engineering framework for runtime dependability analysis
CN105653411B (en) The multi-core processor chip reconfigurable system for supporting local permanent fault to restore
CN106055426B (en) Real-time fault tolerance design method based on Evolvable Hardware
CN104598341B (en) For determining the method and system of the location of fault between interconnection/controller
CN106844877A (en) The analysis method of multimode phased mission systems dependability parameter
US11586976B2 (en) Method and apparatus for creating tests for execution in a storage environment
CN104216825A (en) Problem locating method and system
CN112434733A (en) Small sample hard disk fault data generation method, storage medium and computing device
Morozov et al. Openerrorpro: A new tool for stochastic model-based reliability and resilience analysis
Morozov et al. ErrorPro: Software tool for stochastic error propagation analysis
Ali et al. A fault-tolerant gyrokinetic plasma application using the sparse grid combination technique
CN106339553B (en) A reconfigured flight control method and system for a space vehicle
Xie et al. A model of software fault detection and correction processes considering heterogeneous faults
Boi-Ukeme et al. Real-time fault detection and diagnosis of CPS faults in DEVS
CN108415819A (en) A hard disk fault tracking method and device
Wang et al. Fault-tolerant strategy for real-time system based on evolvable hardware
US20170146983A1 (en) Method and system for generating minimal cut-sets for highly integrated large systems
Fazel A new method to predict the software fault using improved genetic algorithm
Beer et al. Analysis of an Airport Surveillance Radar using the QuantUM approach
CN106354930B (en) Adaptive reconstruction method and system for a space vehicle
US11334053B2 (en) Failure prediction model generating apparatus and method thereof
Pucel et al. Intermittent fault diagnosis as discrete signal estimation: Trackability analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant