CN105205535A

CN105205535A - Improved gene expression programming method

Info

Publication number: CN105205535A
Application number: CN201510623547.1A
Authority: CN
Inventors: 陈云亮; 李方圆; 陈小岛; 邓泽; 杜波; 陈佳
Original assignee: China University of Geosciences Wuhan
Current assignee: China University of Geosciences Wuhan
Priority date: 2015-09-25
Filing date: 2015-09-25
Publication date: 2015-12-30

Abstract

The invention discloses an improved gene expression programming method, which solves the technical problems of premature convergence, slow convergence process and easy falling into local optimal solution faced by the existing method. It includes the following steps: In the chromosome initialization stage, a mixed level uniform table is generated with the chromosome length as the level and the function set and the terminator set as the factors, and the adaptive multi-parent hybrid operator is applied to the initial population generated based on the mixed level uniform table. For generation p, the gene recombination operator and the Dc domain recombination operator are applied to the progeny for evolution processing to obtain the evolved offspring corresponding to the offspring P; the current optimal solution is selected according to the elite selection strategy, if the current optimal solution is When the global optimal solution or the preset maximum genetic algebra is reached, the genetic evolution process ends, the chromosome population is evenly initialized and the adaptive hybridization operator is used to achieve global convergence when solving complex problems, and the convergence speed is also greatly improved.

Description

An Improved Gene Expression Programming Method

技术领域technical field

本发明涉及计算机人工智能领域技术领域，尤其涉及一种基因表达式编程方法。The invention relates to the technical field of computer artificial intelligence, in particular to a gene expression programming method.

背景技术Background technique

GEP(GeneExpressionProgramming，基因表达式编程)是从GAs(GeneticAlgorithms，遗传算法)和遗传程序设计中发展而来的，它在吸收了二者优点的同时，又克服了二者的不足之处，其显著特点就是可以利用简单编码解决复杂问题。基因表达式编程相比GAs、遗传程序设计的主要差别在于染色体的编码方式不同。GEP中，一个基因型就代表了一个染色体(染色体)，一个基因型有多个基因组成，染色体是线性实体，可进行杂交、变异等过程，而基因型与表现型的分离赋予基因表达式编程更灵活的全局空间搜索能力。GEP (Gene Expression Programming, Gene Expression Programming) is developed from GAs (Genetic Algorithms, Genetic Algorithms) and genetic programming. It absorbs the advantages of both and overcomes their shortcomings. The characteristic is that it can use simple coding to solve complex problems. The main difference between gene expression programming and GAs and genetic programming lies in the way chromosomes are encoded. In GEP, a genotype represents a chromosome (chromosome), and a genotype is composed of multiple genes. Chromosomes are linear entities that can undergo processes such as hybridization and mutation, and the separation of genotype and phenotype endows gene expression programming More flexible global space search capability.

传统的基因表达式用随机的方法产生初始种群，并采用适应函数来评价每一代的染色体，也是根据适应值来进行选择、变异的过程。产生的新染色体会循环进行此过程，直到达到预设的遗传代数或者找到问题的解。传统GEP存在的桎梏是早熟收敛和后期收敛减慢。现有技术中，采取了一些措施来缓解这一弊端，如：采取根据特定问题初始化染色体的方法；如：从副本取样进行杂交，自适应的控制基因组大小，如：采用杂交算子来提高算法的性能，如：结合模拟退火算法和遗传机制的并行算法获得了较高的稳定性以及搜索能力。The traditional gene expression uses a random method to generate the initial population, and uses the fitness function to evaluate the chromosomes of each generation, which is also a process of selection and mutation based on the fitness value. The resulting new chromosomes are cycled through this process until a preset number of hereditary generations is reached or a solution to the problem is found. The shackles of traditional GEP are premature convergence and late convergence slowdown. In the prior art, some measures have been taken to alleviate this disadvantage, such as: adopting the method of initializing chromosomes according to specific problems; such as: sampling from replicas for hybridization, adaptively controlling the genome size, such as: using hybridization operators to improve the algorithm For example, the parallel algorithm combined with simulated annealing algorithm and genetic mechanism obtains high stability and search ability.

但是，现有解决措施在种群初始化时存在种群规模设置过小或者种群规模设置过大的问题，种群规模设置过小可能丢失重要的元素；种群规模设置过大则可能产生大量重复的元素，而GEP算法中种群初始化时基因的多样性将会直接影响后续染色体的多样性，因此，现有解决措施均存在种群初始化时基因的多样性不足，进而导致现有GEP仍然存在早熟收敛、后期收敛较慢、易陷入局部最优解的技术问题。However, the existing solutions have the problem that the population size is set too small or the population size is too large when the population is initialized. If the population size is set too small, important elements may be lost; if the population size is set too large, a large number of repeated elements may be generated. In the GEP algorithm, the diversity of genes at the time of population initialization will directly affect the diversity of subsequent chromosomes. Therefore, the existing solutions all have insufficient gene diversity at the time of population initialization, which leads to the premature convergence of the existing GEP and the slow convergence at the later stage. Slow and easy to fall into the technical problem of local optimal solution.

发明内容Contents of the invention

本发明实施例通过提供一种改进的基因表达式编程方法，解决了现有GEP仍然存在早熟收敛、后期收敛较慢、易陷入局部最优解的技术问题。The embodiment of the present invention provides an improved gene expression programming method, which solves the technical problems that the existing GEP still has premature convergence, slow convergence in the later stage, and easy to fall into a local optimal solution.

本发明实施例提供的一种改进的基因表达式编程方法，包括如下步骤：以染色体长度为水平、由函数集和终止符集为因素生成一混合水平均匀表，其中，所述混合水平均匀表由g个矩阵构建，g为一个染色体上的基因组成个数，所述混合水平均匀表的每行值代表一个染色体；子代产生步骤：对基于所述混合水平均匀表产生的初始种群应用自适应多亲杂交算子产生子代p，p依次取1,2,…,P，其中，P为预设最大遗传代数；对所述子代p应用基因重组算子和Dc域重组算子进行进化处理，以获得与所述子代p对应的进化后子代；依据精英选择策略从所述进化后子代中选择适应度最高的染色体为当前最优解；在所述当前最优解为全局最优解或达到所述预设最大遗传代数时，结束遗传过程，否则令p＝p+1后返回所述子代产生步骤。An improved gene expression programming method provided by an embodiment of the present invention includes the following steps: generating a mixed level uniform table with the chromosome length as the level and the function set and the terminator set as factors, wherein the mixed level uniform table Constructed by g matrices, g is the number of genes on a chromosome, and each row value of the mixed horizontal uniform table represents a chromosome; offspring generation step: apply the automatic method to the initial population generated based on the mixed horizontal uniform table Adapt to the multi-parental hybridization operator to generate offspring p, p takes 1, 2, ..., P in turn, where P is the preset maximum genetic algebra; apply gene recombination operator and Dc domain recombination operator to the offspring p Evolutionary processing, to obtain the evolved offspring corresponding to the offspring p; select the chromosome with the highest fitness from the evolved offspring according to the elite selection strategy as the current optimal solution; the current optimal solution is When the global optimal solution or the preset maximum genetic algebra is reached, the genetic process ends; otherwise, set p=p+1 and return to the offspring generation step.

优选的，所述以染色体长度为水平、由函数集和终止符集为因素生成一混合水平均匀表，具体包括如下步骤：创建g个调整后等水平均匀表U_n×k((n×k)^h·(v+1)^t)，其中，h取值为染色体的头部长度，t取值为染色体的尾部长度，k∈(v+1，f+v+1)，v取值为所述终止符集的元素数目，f取值为所述函数集的元素数目，n为常量整数；将每个所述调整后等水平均匀表U_n×k((n×k)^h·(v+1)^t)中的元素循环复制到对应的空白表中，每个所述空白表被填满后形成子混合水平均匀表；根据形成的g个所述子混合水平均匀表创建出如下公式表示的所述混合水平均匀表：U_n*(f+v+1)(g*(f+v+1)^h·(v+1)^t)，其中，h取值为染色体的头部长度，t取值为染色体的尾部长度，v取值为所述终止符集的元素数目，f取值为所述函数集的元素数目，n为常量整数，g取值为染色体上的基因组成个数。Preferably, the generation of a mixed level uniform table with the chromosome length as the level and the function set and the terminator set as factors specifically includes the following steps: creating g adjusted equal level uniform tables U _n×k ((n×k ) ^h ·(v+1) ^t ), where h takes the length of the head of the chromosome, t takes the length of the tail of the chromosome, k∈(v+1, f+v+1), and v takes the value The number of elements of the terminator set, the value of f is the number of elements of the function set, and n is a constant integer; the adjusted equal level uniform table U _n×k ((n×k) ^h ( The elements in v+1) ^t ) are circularly copied to the corresponding blank table, and each blank table is filled to form a sub-mixed level uniform table; according to the formed g sub-mixed level uniform table, create as follows The mixed level uniform table expressed by the formula: U _n*(f+v+1) (g*(f+v+1) ^h (v+1) ^t ), wherein, the value of h is the head of the chromosome Length, t is the tail length of the chromosome, v is the number of elements in the terminator set, f is the number of elements in the function set, n is a constant integer, and g is the genetic composition on the chromosome number.

优选的，所述创建g个调整后等水平均匀表U_n×k((n×k)^h·(v+1)^t)，包括如下步骤：创建原始等水平均匀表U_n*k((n*k)^h+t)，其中，h+t表示矩阵列数，h取值为染色体的头部长度，t取值为染色体的尾部长度，n*k表示矩阵行数，k∈(v+1，f+v+1)，v取值为所述终止符集的元素数目，f取值为所述函数集的元素数目，n为常量整数；通过拟水平法调整所述原始等水平均匀表；对调整后的所述原始等水平均匀表中染色体的头部长度进行限制，以及虚拟出调整后的所述原始等水平均匀表的部分水平，以所述调整后等水平均匀表。Preferably, the creation of g adjusted equal-level uniform tables U _n×k ((n×k) ^h ·(v+1) ^t ) includes the following steps: creating an original equal-level uniform table U _n*k (( n*k) ^h+t ), wherein, h+t represents the number of matrix columns, h takes the value of the head length of the chromosome, t takes the value of the tail length of the chromosome, n*k represents the number of matrix rows, k∈(v +1, f+v+1), the value of v is the number of elements of the terminator set, the value of f is the number of elements of the function set, and n is a constant integer; the original level is adjusted by quasi-level method Uniform table: limit the head length of chromosomes in the adjusted original iso-level uniform table, and virtualize partial levels of the adjusted original iso-level uniform table, and use the adjusted iso-level uniform table.

优选的，所述对基于所述混合水平均匀表产生的初始种群应用自适应多亲杂交算子产生子代p，具体为：将基于所述初始种群所产生父本的m*n个子段进行均匀杂交组合，以产生所述子代p，其中，m表示所述父本包括染色体的数目，n表示所述父本中一个染色体所划分子段的数目。Preferably, applying an adaptive multi-parent hybridization operator to the initial population generated based on the mixed level uniform table to generate offspring p is specifically: performing m*n sub-segments of the male parent generated based on the initial population Uniform hybridization combination to generate the offspring p, wherein, m represents the number of chromosomes included in the male parent, and n represents the number of sub-segments divided by one chromosome in the male parent.

优选的，在所述对所述子代p应用基因重组算子和Dc域重组算子进行进化处理，以获得与所述子代p对应的进化后子代之后，所述方法还包括：根据适应函数计算出所述进化后子代中每个染色体的适应值；根据每个染色体的适应值,从所述进化后子代中选择出预设百分比的精英染色体遗传到下一代中。Preferably, after applying the genetic recombination operator and the Dc domain recombination operator to the progeny p to perform evolutionary processing to obtain an evolved progeny corresponding to the progeny p, the method further includes: The fitness function calculates the fitness value of each chromosome in the evolved offspring; according to the fitness value of each chromosome, a preset percentage of elite chromosomes is selected from the evolved offspring to be passed on to the next generation.

优选的，所述将基于所述初始种群所产生父本的m*n个子段进行均匀杂交组合，产生所述子代p，包括：从所述父本中选出m个基父本放入配对池；从所述父本中随机选择m-1个从父本加入所述配对池；基于所述m个基父本与所述m-1个从父本构造出杂交算子均匀表U_m(mⁿ)，其中，U_m(mⁿ)的每一行代表所述子代p中的一个染色体。Preferably, the uniform cross-combination of the m*n sub-segments of the male parents generated based on the initial population to generate the offspring p includes: selecting m basic male parents from the male parents and putting them into Pairing pool; randomly select m-1 slave parents from the male parent to join the pairing pool; construct a hybrid operator uniform table U based on the m basic male parents and the m-1 secondary male parents _m (m ⁿ ), where each row of U _m (m ⁿ ) represents a chromosome in the offspring p.

本发明实施例提供的一个或多个技术方案至少具有如下技术效果或优点：One or more technical solutions provided by the embodiments of the present invention have at least the following technical effects or advantages:

通过在种群初始化时基于均匀设计思想，采用了混合水平均匀表使得初始样本均匀分布，初始种群元素从元素集中均匀取样，既不会丢失重要的元素；也不会产生大量重复的元素，从而保证了初始种群染色体的多样性。然后在遗传过程中采用自适应的多亲均匀杂交算子产生子代，兼顾交叉和变异的特点。通过这两个技术的结合改进了基因表达式编程的初始化和遗传变异过程，解决了现有技术面临的早熟收敛、收敛过程慢、易陷入局部最优解的技术问题，均匀初始化染色体种群和采用自适应杂交算子使得解决复杂问题时具有全局收敛性使得具有全局收敛性，收敛速度也获得了极大地提升。Based on the idea of uniform design during population initialization, a mixed level uniform table is used to make the initial samples evenly distributed, and the initial population elements are uniformly sampled from the element set, neither losing important elements nor generating a large number of repeated elements, thus ensuring Chromosomal diversity of the initial population. Then, in the genetic process, an adaptive polyparental uniform crossover operator is used to generate offspring, taking into account the characteristics of crossover and mutation. The combination of these two technologies improves the initialization and genetic mutation process of gene expression programming, solves the technical problems of premature convergence, slow convergence process, and easy to fall into local optimal solutions faced by existing technologies, uniformly initializes chromosome populations and uses The adaptive hybridization operator enables global convergence when solving complex problems, and the convergence speed is also greatly improved.

进一步，本发明实施例应用了多种遗传算子和淘汰机制的使用使得算法不易陷入局部极值，提高了获得全局最优解的概率。Further, the embodiment of the present invention applies a variety of genetic operators and the use of the elimination mechanism, so that the algorithm is not easy to fall into the local extremum, and the probability of obtaining the global optimal solution is improved.

2、本发明实施例中基于均匀设计思想生成混合水平均匀表的改进的基因表达式编程方法相比正交设计也减少了实验次数。2. In the embodiment of the present invention, the improved gene expression programming method for generating a mixed-level uniform table based on the idea of uniform design also reduces the number of experiments compared with the orthogonal design.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据提供的附图获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only It is an embodiment of the present invention, and those skilled in the art can also obtain other drawings according to the provided drawings on the premise of not paying creative efforts.

图1为本发明实施例中改进的基因表达式编程方法的流程图；Fig. 1 is the flowchart of the gene expression programming method improved in the embodiment of the present invention;

图2为本发明实施例中生成一混合水平均匀表的流程图；Fig. 2 is the flowchart of generating a mixing level uniform table in the embodiment of the present invention;

图3为本发明实施例中自适应多亲杂交均匀算子的流程图。Fig. 3 is a flow chart of an adaptive multi-parent hybrid uniform operator in an embodiment of the present invention.

具体实施方式Detailed ways

为了解决现有GEP仍然存在早熟收敛、后期收敛较慢、易陷入局部最优解的技术问题，本发明实施例提供了一种改进的基因表达式编程方法，总体思路如下：In order to solve the technical problems that the existing GEP still has premature convergence, slow convergence in the later stage, and easy to fall into a local optimal solution, the embodiment of the present invention provides an improved gene expression programming method, the general idea is as follows:

以染色体长度为水平、由函数集和终止符集为因素生成混合水平均匀表，然后在遗传过程中采用自适应的多亲均匀杂交算子进行多代遗传产生子代，利用子代淘汰父代中比较差的个体得到当前最优解，循环遗传得到全局最优解，或者达到预设的最大遗传代数时就结束遗传过程。A mixed-level uniform table is generated with the chromosome length as the level and the function set and terminator set as factors, and then in the genetic process, an adaptive polyparental uniform crossover operator is used to carry out multi-generation inheritance to generate offspring, and the offspring is used to eliminate the parent generation The poorer individuals get the current optimal solution, and the global optimal solution is obtained by cyclic inheritance, or the genetic process ends when the preset maximum genetic algebra is reached.

通过混合水平均匀表使得初始样本均匀分布，从而根据均匀分布的初始样本进行遗传过程使得每次取样更具代表性，克服了早熟收敛和后期收敛较慢的缺陷。初始种群元素从元素集中均匀取样，既不会丢失重要的元素；也不会产生大量重复的元素，从而保证了初始种群染色体的多样性。自适应多亲均匀杂交算子兼顾了交叉和变异的特点。通过这两个技术的结合改进了基因表达式编程的初始化和遗传变异过程，从而解决了现有技术面临的早熟收敛、收敛过程慢、易陷入局部最优解的技术问题，均匀初始化染色体种群和采用自适应杂交算子使得在解决复杂问题时具有全局收敛性，收敛速度也获得了极大地提升。The uniform distribution of the initial samples is made by mixing the level uniform table, so that the genetic process is carried out according to the uniform distribution of the initial samples, so that each sampling is more representative, and the defects of premature convergence and late convergence are overcome. The elements of the initial population are uniformly sampled from the element set, neither losing important elements nor generating a large number of repeated elements, thus ensuring the diversity of the initial population chromosomes. The self-adaptive polyparental uniform crossover operator takes into account the characteristics of crossover and mutation. Through the combination of these two technologies, the initialization of gene expression programming and the genetic mutation process are improved, thereby solving the technical problems of premature convergence, slow convergence process, and easy to fall into local optimal solution faced by the existing technology, uniform initialization of chromosome population and The self-adaptive hybridization operator has global convergence when solving complex problems, and the convergence speed has also been greatly improved.

为使本发明实施例的目的、技术方案和优点更加清楚，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments It is a part of embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

本发明实施例提供的一种改进的基因表达式编程方法，参考图1所示，包括如下步骤：An improved gene expression programming method provided by an embodiment of the present invention, as shown in FIG. 1 , includes the following steps:

S101、以染色体长度为水平，函数集和终止符集为因素生成一混合水平均匀表，混合水平均匀表由g个矩阵构建，g为一个染色体上的基因组成个数，混合水平均匀表的每行值代表一个染色体。S101. Taking the chromosome length as the level, the function set and the terminator set as factors to generate a mixed level uniform table, the mixed level uniform table is constructed by g matrices, g is the number of genes on a chromosome, and each of the mixed horizontal uniform table A row value represents a chromosome.

比如，人的1号染色体上有3186个基因，则1号染色体由3186个矩阵构建；人的X染色体有1529个基因，则X染色体由1529个矩阵构成。For example, there are 3186 genes on human chromosome 1, and then chromosome 1 is constructed by 3186 matrices; human X chromosome has 1529 genes, and then X chromosome is composed of 1529 matrices.

具体的，染色体长度包括染色体的头部长度h和染色体的尾部长度t，每个染色体都有一个头部和一个尾部，染色体的头部长度h由用户指定，染色体的尾部长度t由如下公式计算得到：t＝h*(n-1)+1，n为常量整数。Specifically, the chromosome length includes the head length h of the chromosome and the tail length t of the chromosome. Each chromosome has a head and a tail. The head length h of the chromosome is specified by the user, and the tail length t of the chromosome is calculated by the following formula Obtain: t=h*(n-1)+1, n is a constant integer.

具体的，函数集为f(x)，终止符集为var(x)，比如，函数集f(x)为{+，-，*，/，Q，E，S，T，C}；终止符集var(x)为{a，b，c，d，e}。Specifically, the function set is f(x), and the terminator set is var(x). For example, the function set f(x) is {+, -, *, /, Q, E, S, T, C}; The character set var(x) is {a, b, c, d, e}.

S101具体包括步骤1～步骤3：S101 specifically includes steps 1 to 3:

步骤1、创建g个调整后等水平均匀表U_n×k((n×k)^h·(v+1)^t)，用于形成对应的g个子混合水平均匀表，其中，h取值为染色体的头部长度，t取值为染色体的尾部长度，k∈(v+1，f+v+1)，v取值为终止符集var(x)的元素数目，f取值为函数集f(x)的元素数目，n为常量整数。Step 1. Create g adjusted equal-level uniform tables U _n×k ((n×k) ^h (v+1) ^t ), which are used to form corresponding g sub-mixed horizontal uniform tables, where h takes the value The head length of the chromosome, the value of t is the length of the tail of the chromosome, k∈(v+1, f+v+1), the value of v is the number of elements in the terminator set var(x), and the value of f is the function set The number of elements of f(x), where n is a constant integer.

步骤2、将每个调整后等水平均匀表中的元素循环复制到对应的空白表中，每个空白表被填满后形成子混合水平均匀表。Step 2. Circularly copy the elements in each adjusted equal-level uniform table to the corresponding blank table, and each blank table is filled to form a sub-mixed horizontal uniform table.

步骤3、根据形成的g个子混合水平均匀表创建出如下公式表示的混合水平均匀表：Step 3. Create a mixed level uniform table represented by the following formula according to the formed g sub-mixed level uniform tables:

U_n*(f+v+1)(g*(f+v+1)^h·(v+1)^t)。U _n*(f+v+1) (g*(f+v+1) ^h ·(v+1) ^t ).

具体的，U_n*(f+v+1)(g*(f+v+1)^h·(v+1)^t)中，h取值为染色体的头部长度，t取值为染色体的尾部长度，v取值为终止符集var(x)的元素数目，f取值为函数集f(x)的元素数目，n为常量整数，g取值为染色体上的基因组成个数。Specifically, in U _n*(f+v+1) (g*(f+v+1) ^h ·(v+1) ^t ), the value of h is the head length of the chromosome, and the value of t is the length of the chromosome Tail length, v is the number of elements in the terminator set var(x), f is the number of elements in the function set f(x), n is a constant integer, and g is the number of genes on the chromosome.

具体的，参考图2所示，生成一混合水平均匀表通过如下循环步骤完成：Specifically, as shown in FIG. 2, generating a mixed level uniform table is completed through the following cyclic steps:

S1011、设置染色体和基因的相关参数。包括：染色体的头部长度：h，函数集：f(x)，f(x)的元素数目：f，终止符集：var(x)，var(x)的元素数目：v，染色体的尾部长度：t＝h*(n-1)+1，基因组成个数：g，染色体数目：n*(f+v+1)，常量整数：n。对染色体编码时，染色体是固定长度的字符串，由函数集f(x)和终止符集var(x)中元素组成。S1011, setting related parameters of chromosomes and genes. Including: head length of chromosome: h, function set: f(x), number of elements of f(x): f, terminator set: var(x), number of elements of var(x): v, tail of chromosome Length: t=h*(n-1)+1, number of genes: g, number of chromosomes: n*(f+v+1), constant integer: n. When encoding a chromosome, the chromosome is a fixed-length string consisting of elements in the function set f(x) and the terminator set var(x).

S1012、给函数集f(x)和终止符集var(x)中的元素分配标签号。以作为矩阵的初始成员。S1012. Assign label numbers to elements in the function set f(x) and the terminator set var(x). as the initial member of the matrix.

S1013、创建一原始等水平均匀表U_n*k((n*k)^h+t)，其中，h+t表示矩阵列数，h取值为染色体的头部长度，t取值为染色体的尾部长度，n*k表示矩阵行数，k∈(v+1，f+v+1)，n为常量整数。S1013. Create an original equal-level uniform table U _n*k ((n*k) ^h+t ), wherein, h+t represents the number of matrix columns, h is the head length of the chromosome, and t is the length of the chromosome Tail length, n*k represents the number of matrix rows, k∈(v+1, f+v+1), n is a constant integer.

具体的，原始等水平均匀表U_n*k((n*k)^h+t)的创建，首先需要构建矩阵U_n*k((n*k)^h+t)，并用generation-vectors(矢量生成)方法对矩阵U_n*k((n*k)^h+t)进行元素填充得到原始等水平均匀表。比如，对创建的原始等水平均匀表举例，给定原始等水平均匀表为U₆(6³)如下表1所示：Specifically, to create the original equal-level uniform table U _n*k ((n*k) ^h+t ), it is first necessary to construct a matrix U _n*k ((n*k) ^h+t ), and use generation-vectors (vectors Generation) method fills the elements of the matrix U _n*k ((n*k) ^h+t ) to obtain the original equal-level uniform table. For example, for the example of the created original equal-level uniform table, the original equal-level uniform table is given as U ₆ (6 ³ ) as shown in Table 1 below:

表1.原始等水平均匀表U₆(6³)Table 1. The original equal level uniform table U ₆ (6 ³ )

A列Column A B列Column B C列Column C 1行1 line 11 22 33 2行2 lines 22 44 66 3行3 lines 33 66 22 4行4 lines 44 11 55 5行5 lines 55 33 11 6行6 lines 66 55 44

从表1中A列、B列、C列中同一数字都只出现一次，即保证：每个因素在每个水平仅做一次试验；任意两个因素的试验点在平面的格子上，每行、每列有且仅有一个试验点。From Table 1, the same number appears only once in column A, column B, and column C, that is, it is guaranteed that each factor is only tested once at each level; the test points of any two factors are on the grid of the plane, and each row , each column has one and only one test point.

S1014、通过拟水平法调整原始等水平均匀表，对调整后的原始等水平均匀表中染色体的头部长度进行限制，以及虚拟出调整后的原始等水平均匀表的部分水平，以形成调整后等水平均匀表：U_n×k((n×k)^h·(v+1)^t)。S1014. Adjust the original equal-level uniform table by the quasi-level method, limit the head length of the chromosome in the adjusted original equal-level uniform table, and virtualize partial levels of the adjusted original equal-level uniform table, so as to form the adjusted original equal-level uniform table. Equal level uniform table: U _n×k ((n×k) ^h ·(v+1) ^t ).

具体的，限制原始等水平均匀表的头部取值，具体通过如下公式进行：Specifically, limit the value of the head of the original equal-level uniform table, specifically through the following formula:

$u u {* *}_{i i j j} = = \{\begin{matrix} {u u}_{i i j j} & mod mod & ((f f + + v v + + 11)),, {{i i &Element; &Element; ((11,, n no \times \times k k)),, j j &Element; &Element; ((11,, h h))}},, {u u}_{i i j j} &LeftArrow; &LeftArrow; u u {* *}_{i i j j} \\ {u u}_{i i j j} & mod mod & ((v v + + 11)),, {{i i &Element; &Element; ((11,, n no \times \times k k)),, j j &Element; &Element; ((h h + + 11,, h h + + t t))}},, {u u}_{i i j j} &LeftArrow; &LeftArrow; u u {* *}_{i i j j} \end{matrix}$

其中，k∈(v+1，f+v+1)，n为常量整数，v取值为终止符集var(x)的元素数目，f取值为函数集f(x)的元素数目。Among them, k∈(v+1, f+v+1), n is a constant integer, v takes the value of the number of elements in the terminator set var(x), and f takes the value of the number of elements in the function set f(x).

S1015、将调整后等水平均匀表中的元素循环复制到一个空白表中，空白表被填满后形成一子混合水平均匀表。S1015. Circularly copy the elements in the adjusted equal-level uniform table to a blank table, and the blank table is filled to form a sub-mixed horizontal uniform table.

S1016、令g＝g-1后判断是否g＝0，若是，则执行S1017，否则返回步骤S1013。直到创建所有基因的对应的子混合水平均匀表。S1016. Determine whether g=0 after setting g=g-1, if yes, execute S1017, otherwise return to step S1013. until the corresponding submixture levels of all genes are created uniformly.

S1017、通过direct-product(直积)方法对循环步骤S1011～S1016生成的g个子混合水平均匀表进行处理，创建出矩阵U_n*(f+v+1)(g*(f+v+1)^h·(v+1)^t)，以形成混合水平均匀表U_n*(f+v+1)(g*(f+v+1)^h·(v+1)^t)。S1017, process the g sub-mixed horizontal uniform tables generated by the loop steps S1011～S1016 by direct-product (direct product) method, and create a matrix U _n*(f+v+1) (g*(f+v+1 ) ^h ·(v+1) ^t ), to form a mixed level uniform table U _n*(f+v+1) (g*(f+v+1) ^h ·(v+1) ^t ).

S102、子代产生步骤：对基于混合水平均匀表产生的初始种群应用自适应多亲杂交算子产生子代p，p依次取1,2,…,P，其中，P为预设最大遗传代数。S102. Progeny generation step: apply an adaptive multi-parent hybridization operator to the initial population generated based on the mixed level uniform table to generate offspring p, where p is 1, 2, ..., P in turn, where P is the preset maximum genetic algebra .

在具体实施过程中，由于子代产生步骤为多代遗传过程，针对为第一次子代产生时，即p＝1，直接基于S101生成的混合水平均匀表中每一行值产生该初始种群的染色体。对初始种群应用自适应多亲杂交算子产生子代1。初始种群的染色体的质量极大地影响着整个收敛过程，染色体的多样性也决定着搜索空间的效率，本发明采用了混合水平均匀表使得初始样本均匀分布，初始种群元素从元素集中均匀取样保证了初始种群染色体的多样性。对第2代父本应用自适应多亲杂交算子产生子代2；第2代父本为来自于初始种群产生的子代1，对第3代父本应用自适应多亲杂交算子产生子代3，第3代父本为来自于子代2，依次类推。In the specific implementation process, since the generation of offspring is a multi-generational genetic process, when the offspring is generated for the first time, that is, p=1, the value of each row in the mixed level uniform table generated by S101 is directly used to generate the initial population. chromosome. Apply the adaptive multi-parent cross operator to the initial population to produce offspring 1. The quality of the chromosomes of the initial population greatly affects the entire convergence process, and the diversity of the chromosomes also determines the efficiency of the search space. The invention uses a mixed level uniform table to make the initial samples evenly distributed, and the initial population elements are evenly sampled from the element set to ensure Chromosomal diversity in the initial population. Apply the adaptive multi-parent hybridization operator to the second-generation male parent to generate offspring 2; the second-generation male parent is the offspring 1 generated from the initial population, and apply the adaptive multi-parental hybridization operator to the third-generation male parent to generate Offspring 3, the father of the third generation comes from offspring 2, and so on.

具体的，将基于初始种群所产生父本的m*n个子段进行均匀杂交组合产生子代p，其中，m表示父本所包括父本染色体的数目，n表示一个染色体所划分成子段的数目。在不同代遗传中，基于初始种群所产生父本不同，具体来讲，在第一代遗传中，父本为初始种群中的染色体，在第二代遗传中，父本来自于子代1中的染色体，依次类推。比如，在某代种群中共有200个染色体，每个染色体被分为1000个子段，设计一个均匀的表格来对这200*1000个子段进行均匀杂交组合，以产生下一代子代。Specifically, the m*n sub-segments of the male parent generated based on the initial population are uniformly crossed and combined to generate the offspring p, where m represents the number of paternal chromosomes included in the male parent, and n represents the number of sub-segments divided into one chromosome . In different generations of heredity, the paternal parents are different based on the initial population. Specifically, in the first-generation heredity, the paternal parent is the chromosome in the initial population. In the second-generation heredity, the male parent comes from the offspring 1 chromosomes, and so on. For example, there are 200 chromosomes in a population of a certain generation, each chromosome is divided into 1000 sub-segments, and a uniform table is designed to uniformly cross-combine these 200*1000 sub-segments to generate the next generation of offspring.

参考图3对自适应多亲杂交均匀算子的设计进行详细描述：Referring to Figure 3, the design of the self-adaptive multi-parent hybridization uniform operator is described in detail:

S1021、把每个染色体L_p划分为n个互不相交的子集，L_p∈(t∈(1,P))，其中，p∈1,2,…,P，p代表目前进行的遗传代数。S1021. Divide each chromosome L _p into n mutually disjoint subsets, L _p ∈ (t ∈ (1,P)), where p ∈ 1,2,...,P, p represents the current inheritance algebra.

S1022、从父本选出待杂交的m个基父本放入配对池。S1022. Select m basic male parents to be crossed from the male parents and put them into the matching pool.

具体而言，基父本为发生杂交的染色体，从旁协助参与杂交的染色体称为从父本。自适应多亲杂交算子兼有交叉和变异的特点，即可在基父本周围搜寻新解，又能完成基父本与从父本之间的信息交流。由于自适应多亲杂交算子的杂交规模由基父本当前状态控制，则若基父本适应值f_p与当前最优适应值f_max相差较大，就构造大规模混合水平均匀表指导杂交，以加强父本间的信息交流；若相差不大，就缩小混合水平均匀表的规模，以防止大变异破坏优秀的基因片段。在具体实施过程中，可按如下经验公式确定组成混合水平均匀表的基父本数m：Specifically, the basic parent is the chromosome that hybridizes, and the chromosome that assists in hybridization is called the secondary parent. The self-adaptive multi-parent hybrid operator has the characteristics of crossover and mutation, which can search for new solutions around the basic parent and complete the information exchange between the basic parent and the secondary parent. Since the hybridization scale of the self-adaptive multi-parent hybridization operator is controlled by the current state of the basic parent, if the basic parent's fitness value f _p is greatly different from the current optimal fitness value f _max , a large-scale mixed level uniform table is constructed to guide the hybridization , to strengthen the information exchange between the male parents; if the difference is not large, reduce the size of the mixed level uniform table to prevent large mutations from destroying excellent gene segments. In the specific implementation process, the basic number m of the mixed level uniform table can be determined according to the following empirical formula:

其中，δ∈(0,1)，p是目前进行的遗传代数，p∈1,2,…P。Among them, δ∈(0,1), p is the current genetic algebra, p∈1,2,...P.

S1023、从父本中随机选择m-1个从父本加入配对池。S1023. Randomly select m-1 male parents from the male parents to join the matching pool.

S1024、基于m个基父本与m-1个从父本构造出杂交算子均匀表U_m(mⁿ)，其中，U_m(mⁿ)的每一行都代表子代p的一个染色体。S1024. Construct a crossover operator uniform table U _m (m ⁿ ) based on the m basic fathers and m-1 slave fathers, where each row of U _m (m ⁿ ) represents a chromosome of the offspring p.

在S102之后，执行S103、对子代p应用基因重组算子和Dc域重组算子进行进化处理，以获得与子代对应的进化后子代。After S102, execute S103, apply the gene recombination operator and the Dc domain recombination operator to the offspring p to perform evolution processing, so as to obtain an evolved offspring corresponding to the offspring.

其中，Dc域是基因编码中的常数域，通常包括染色体头部、染色体尾部和附加的常数域。Among them, Dc domain is a constant domain in gene coding, usually including chromosome head, chromosome tail and additional constant domain.

S104、依据精英选择策略从进化后子代中选择适应度最高的染色体为当前最优解；S104. Select the chromosome with the highest fitness from the evolved offspring according to the elite selection strategy as the current optimal solution;

在具体实施过程，根据适应函数计算出进化后子代中每个染色体的适应值，将染色体根据适应值高低进行排序，从进化后子代中选择适应值最大的染色体保留，即为当代的最优解，看其是否为全局最优解。In the specific implementation process, the fitness value of each chromosome in the evolved offspring is calculated according to the fitness function, the chromosomes are sorted according to the fitness value, and the chromosome with the largest fitness value is selected from the evolved offspring to keep, that is, the most contemporary chromosome. optimal solution to see if it is the global optimal solution.

S105、在当前最优解为全局最优解，或达到预设最大遗传代数时结束遗传过程，否则令p＝p+1后返回步骤子代产生步骤。S105. End the genetic process when the current optimal solution is the global optimal solution, or the preset maximum genetic algebra is reached; otherwise, set p=p+1 and return to the step of generating offspring.

在具体实施过程中，对遗传过程应用精英选择策略，还将选择出的预设百分比的精英染色体遗传到下一代中。所谓精英选择策略为：在群体进化过程中的精英染色体(所谓精英染色体是迄今出现的最好个体)直接遗传到下一代，而不进行配对交叉，这种选择操作又称为复制。比如：遗传进化到第p代时，群体中a(p)为精英染色体。则把a(p)加入到第p+1代中，作为第p+1代的第N+1个染色体，N为第p代群体的大小。In the specific implementation process, the elite selection strategy is applied to the genetic process, and the selected preset percentage of elite chromosomes is passed on to the next generation. The so-called elite selection strategy is: in the process of group evolution, the elite chromosomes (the so-called elite chromosomes are the best individuals that have appeared so far) are directly inherited to the next generation without paired crossover. This selection operation is also called replication. For example: when the genetic evolution reaches the pth generation, a(p) in the population is the elite chromosome. Then a(p) is added to the p+1th generation as the N+1th chromosome of the p+1th generation, and N is the size of the p-th generation population.

具体的，预设百分比可以设定为5％，6％等，本领域技术人员在具体应用过程中进行自定义预设百分比的取值，比如预设百分比为设定为5％，则根据当代染色体的适应值从大到小选择5％的染色体作为精英染色体直接复制到下一代。可见，选择出的精英染色体的适应值均大于未被选择的染色体的适应值。Specifically, the preset percentage can be set to 5%, 6%, etc., and those skilled in the art can customize the value of the preset percentage in the specific application process. For example, if the preset percentage is set to 5%, then according to the contemporary The fitness value of chromosomes from large to small selects 5% of chromosomes as elite chromosomes to be directly copied to the next generation. It can be seen that the fitness values of selected elite chromosomes are greater than those of unselected chromosomes.

在S103之后，自适应多亲杂交均匀算子还包括如下设计：根据适应函数计算出进化后子代中的每个染色体的适应值G^P(L₁,L₂,…,L_n),p是目前进行的遗传代数，p∈1,2,…P，即计算杂交算子均匀表U_m(mⁿ)中每一行值的最大适应值，选择进化后子代中适应值最大的染色体作为基父本用于杂交后遗传给下一代。After S103, the self-adaptive polyparental hybridization uniform operator also includes the following design: Calculate the fitness value G ^P (L ₁ ,L ₂ ,…,L _n ),p of each chromosome in the offspring after evolution according to the fitness function is the current genetic algebra, p∈1,2,...P, that is to calculate the maximum fitness value of each row in the uniform table U _m (m ⁿ ) of the hybridization operator, and select the chromosome with the maximum fitness value in the offspring after evolution as The basic male parent is passed on to the next generation after crossing.

本发明的发明人通过试验验证了本发明实施例中技术方案应用在相关规则挖掘方面的有效性。The inventors of the present invention have verified the effectiveness of the application of the technical solutions in the embodiments of the present invention in mining related rules through experiments.

用Iris数据集来验证算法的有效性，Iris是UCI(UniversityofCaliforniaIrvine，加州大学)机器学习知识库中的一个数据集。在实验过程中，我们选择Iris数据集中2/3的数据用于训练，Iris数据集中剩余1/3的数据用于测试。试验中用现有技术SA-MGEP作为试验对比。本发明和SA-MGEP均运行100次，并通过如下公式来量化分析规则集的多样性：Use the Iris data set to verify the effectiveness of the algorithm. Iris is a data set in the UCI (University of California Irvine, University of California) machine learning knowledge base. During the experiment, we selected 2/3 of the data in the Iris dataset for training, and the remaining 1/3 of the data in the Iris dataset for testing. In the test, the prior art SA-MGEP was used as a test comparison. Both the present invention and SA-MGEP run 100 times, and quantify and analyze the diversity of rule sets by the following formula:

${DIV DIV}_{p p o o p p} = = \frac{{Σ Σ}_{m m = = 11}^{N N} {Σ Σ}_{n no = = 11}^{N N} {d d}_{m m,, n no}}{{N N}^{22}}$

其中，N代表种群大小，m代表染色体个数，n代表一个染色体的子段数，d_m,n代表个体m和n之间的汉明距离。Among them, N represents the population size, m represents the number of chromosomes, n represents the number of sub-segments of a chromosome, and d _m,n represents the Hamming distance between individuals m and n.

本发明与现有技术SA-MGEP试验结果对比如下表2所示，表2中准确率代表每轮实验准确度的平均值，测试集的预测率代表每一轮测试实验预测准确率的平均值。The present invention compares with prior art SA-MGEP test result as shown in table 2 below, and in table 2, accuracy rate represents the average value of each round of experimental accuracy, and the prediction rate of test set represents the average value of each round of test experiment prediction accuracy rate .

表2.本发明与现有技术SA-MGEP试验结果对比Table 2. The present invention compares with prior art SA-MGEP test result

时间(s)time(s) 规则数number of rules 准确率(％)Accuracy(%) 测试集的预测率(％)Prediction rate on test set (%) 本发明this invention 16.816.8 3636 62.462.4 51.251.2 现有技术current technology 22.422.4 2828 48.748.7 38.338.3

通过表2所示的本发明与现有技术的对比可以看出，应用本发明实施例中的技术方案获得了更高的准确度和较低的时间消耗，且改进的基因表达式算法能从Iris数据集获得更多的关联规则。As can be seen from the comparison between the present invention and the prior art shown in table 2, the application of the technical solutions in the embodiments of the present invention has obtained higher accuracy and lower time consumption, and the improved gene expression algorithm can be obtained from The Iris dataset gets more association rules.

通过上述本发明实施例提供的一个或多个实施例，至少具有如下技术效果或优点：One or more embodiments provided by the above embodiments of the present invention have at least the following technical effects or advantages:

通过本发明实施例通过在种群初始化时基于均匀设计思想，采用了混合水平均匀表使得初始样本均匀分布，初始种群元素从元素集中均匀取样，既不会丢失重要的元素；也不会产生大量重复的元素，从而保证了初始种群染色体的多样性。然后在遗传过程中采用自适应的多亲均匀杂交算子产生子代，兼顾交叉和变异的特点。通过这两个技术的结合改进了基因表达式编程的初始化和遗传变异过程，从而解决了现有技术面临的早熟收敛、收敛过程慢、易陷入局部最优解的技术问题，均匀初始化染色体种群和采用自适应杂交算子使得在解决复杂问题时具有全局收敛性，收敛速度也获得了极大地提升。Through the embodiment of the present invention, based on the idea of uniform design when the population is initialized, a mixed level uniform table is used to make the initial samples uniformly distributed, and the initial population elements are uniformly sampled from the element set, neither losing important elements nor generating a large number of repetitions elements, thus ensuring the diversity of the initial population chromosomes. Then, in the genetic process, an adaptive polyparental uniform crossover operator is used to generate offspring, taking into account the characteristics of crossover and mutation. Through the combination of these two technologies, the initialization of gene expression programming and the genetic mutation process are improved, thereby solving the technical problems of premature convergence, slow convergence process, and easy to fall into local optimal solution faced by the existing technology, uniform initialization of chromosome population and The self-adaptive hybridization operator has global convergence when solving complex problems, and the convergence speed has also been greatly improved.

尽管已描述了本发明的优选实施例，但本领域内的技术人员一旦得知了基本创造性概念，则可对这些实施例作出另外的变更和修改。所以，所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。While preferred embodiments of the invention have been described, additional changes and modifications to these embodiments can be made by those skilled in the art once the basic inventive concept is appreciated. Therefore, it is intended that the appended claims be construed to cover the preferred embodiment as well as all changes and modifications which fall within the scope of the invention.

显然，本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样，倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内，则本发明也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and equivalent technologies thereof, the present invention also intends to include these modifications and variations.

Claims

1. an improved gene expression programming method, is characterized in that, comprises the steps:

A mixed level uniform table is generated with the chromosome length as the level and the function set and the terminator set as factors, wherein the mixed level uniform table is constructed by g matrices, g is the number of genes on a chromosome, and the mixed Each row value of the horizontal uniform table represents a chromosome;

Progeny generation step: apply an adaptive multi-parental hybridization operator to the initial population generated based on the mixed level uniform table to generate offspring p, where p is 1, 2, ..., P in turn, where P is the preset maximum genetic algebra ;

Applying a gene recombination operator and a Dc domain recombination operator to the offspring p to perform evolution processing, so as to obtain an evolved offspring corresponding to the offspring p;

Select the chromosome with the highest fitness from the evolved offspring according to the elite selection strategy as the current optimal solution;

When the current optimal solution is the global optimal solution or reaches the preset maximum genetic algebra, end the genetic process; otherwise set p=p+1 and return to the offspring generation step.

2. the improved gene expression programming method as claimed in claim 1, is characterized in that, described take chromosome length as level, by function set and terminator set as factor generation a mixed level uniform table, specifically comprises the steps:

Create g adjusted equal-level uniform tables U _n×k ((n×k) ^h (v+1) ^t ), where h is the length of the head of the chromosome, and t is the length of the tail of the chromosome. k∈(v+1, f+v+1), v is the number of elements of the terminator set, f is the number of elements of the function set, and n is a constant integer;

Copy the elements in each of the adjusted equal-level uniform tables U _n×k ((n×k) ^h ·(v+1) ^t ) to the corresponding blank table, and each of the blank tables is filled After full, a submixed horizontal uniform table is formed;

Create the mixed level uniform table represented by the following formula according to the formed g sub-mixed level uniform tables:

U _n*(f+v+1) (g*(f+v+1) ^h (v+1) ^t )

Wherein, the value of h is the head length of the chromosome, the value of t is the tail length of the chromosome, the value of v is the number of elements of the terminator set, the value of f is the number of elements of the function set, and n is a constant integer , and the value of g is the number of genes on the chromosome.

3. the improved gene expression programming method as claimed in claim 2, is characterized in that, the equal level uniform table U _n×k ((n×k) ^h (v+1) ^t after the described creation g adjustments ), including the following steps:

Create the original equal-level uniform table U _n*k ((n*k) ^h+t ), wherein, h+t represents the number of matrix columns, h takes the value of the head length of the chromosome, and t takes the value of the tail length of the chromosome, n*k represents the number of matrix rows, k∈(v+1, f+v+1), v is the number of elements of the terminator set, f is the number of elements of the function set, and n is a constant integer;

Adjust the original equal level uniform table by quasi-level method;

The head length of the chromosome in the adjusted original equal-level uniform table is limited, and partial levels of the adjusted original equal-level uniform table are virtualized to form the adjusted equal-level uniform table.

4. the improved gene expression programming method as claimed in claim 1, is characterized in that, described initial population application self-adaptive multi-parent hybridization operator is produced child p to the initial population that produces based on described mixing level uniform table, specifically is :

The m*n sub-segments of the male parent generated based on the initial population are uniformly hybridized and combined to generate the offspring p, where m represents the number of chromosomes included in the male parent, and n represents one of the male parents The number of subsegments the chromosome is divided into.

5. The improved gene expression programming method as claimed in claim 4, characterized in that, applying gene recombination operator and Dc domain recombination operator to said progeny p to carry out evolutionary processing, to obtain the After the evolved offspring corresponding to the offspring p, the method also includes:

Calculate the fitness value of each chromosome in the offspring after evolution according to the fitness function;

According to the fitness value of each chromosome, a preset percentage of elite chromosomes is selected from the evolved offspring and passed on to the next generation.

6. The improved gene expression programming method according to claim 5, wherein, the m*n sub-segments of the male parent produced based on the initial population are uniformly hybridized to generate the offspring p ,include:

Select m basic fathers from the fathers and put them into the matching pool;

Randomly select m-1 from the male parent to join the paired pool from the male parent;

A hybrid operator uniform table U _m (m ⁿ ) is constructed based on the m basic male parents and the m-1 secondary male parents, where each row of U _m (m ⁿ ) represents the of a chromosome.