
CN118014010B - Multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model - Google Patents


Info

Publication number
CN118014010B
Authority
CN
China
Prior art keywords
neural network
population
architecture
network architecture
search
Prior art date
Legal status
Active
Application number
CN202410418128.3A
Other languages
Chinese (zh)
Other versions
CN118014010A (en)
Inventor
朱陈陈
薛羽
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202410418128.3A priority Critical patent/CN118014010B/en
Publication of CN118014010A publication Critical patent/CN118014010A/en
Application granted granted Critical
Publication of CN118014010B publication Critical patent/CN118014010B/en
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N 3/08 - Learning methods
    • G06N 3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physiology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model, in the technical field of automated machine learning. It aims to solve the problems of high labor cost, low efficiency, and difficulty in accommodating multi-objective scenarios when convolutional neural networks for various visual tasks are designed in the prior art. The method initializes an evolutionary search process with an initial population and performs a multi-objective evolutionary search in the search space, combining a pre-trained proxy model with the multi-population mechanism, to obtain candidate neural network architectures; neural network architectures that balance the two optimization objectives are then screened out from the candidates. The proxy model accelerates the search process, and the multi-population mechanism expands the diversity of solutions, enabling an efficient neural architecture search that yields a set of network architectures balancing multiple optimization objectives on the target data set.

Description

Multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model
Technical Field
The invention relates to a multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model, and belongs to the technical field of automated machine learning.
Background
Convolutional neural networks (CNNs) have enjoyed tremendous success in a variety of computer vision tasks. However, conventional CNNs are typically designed manually by experts with a great deal of domain knowledge and experience. Not every interested user has such expertise, and even for an expert, designing a CNN is a time-consuming, trial-and-error process. Neural architecture search (NAS) can simplify and automate the design of deep convolutional neural networks and can produce architectures that are more competitive than manually designed ones. In recent years, researchers have developed many NAS approaches, which have attracted increasing attention in industry and academia across learning tasks such as object detection, semantic segmentation, and natural language processing.
An evolutionary algorithm is a search method based on the principles of evolution: the optimal solution is sought through the evolution and selection of a population. NAS methods based on evolutionary algorithms are widely studied for their global search capability and flexibility. However, a significant bottleneck in this area is the need to evaluate a large number of network architectures during the search, which consumes substantial computing resources. Researchers have proposed many approaches to improve the efficiency of NAS algorithms, but existing acceleration methods still have to train a large number of network architectures, or focus only on absolute classification-accuracy values.
Beyond accuracy, practical applications also require NAS to find computationally efficient network architectures, e.g., low power consumption for mobile applications and low latency for autonomous driving. Maximizing accuracy and minimizing network complexity are inherently competing objectives, so NAS must be treated as a multi-objective optimization problem. In multi-objective evolutionary NAS, population diversity may be gradually lost as the number of iterations increases, causing the algorithm to converge to, and become trapped in, a local optimum. Maintaining population diversity without sacrificing convergence is a challenging problem.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model. The proxy model accelerates the search process, and the multi-population mechanism expands the diversity of solutions for an efficient neural architecture search; a set of network architectures balancing multiple optimization objectives can be obtained on the target data set, improving search efficiency and reducing the consumption of computing resources.
To achieve the above purpose, the invention adopts the following technical scheme.
In one aspect, the invention provides a multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model, comprising:
constructing a search space, and encoding the neural network architectures in the search space;
determining two optimization objectives, namely network complexity and classification accuracy;
initializing a population of neural network architectures in the search space according to network complexity, to obtain an initial population for the evolutionary search;
initializing the evolutionary search process with the initial population, and performing a multi-objective evolutionary search in the search space in combination with a pre-trained proxy model and the multi-population mechanism, to obtain candidate neural network architectures;
and screening out the neural network architectures that balance network complexity and classification accuracy.
Further, the search space is constructed as follows:
setting search parameters;
searching the backbone of the convolutional neural network according to the search parameters to obtain qualifying convolutional neural networks, all of which together constitute the search space.
The backbone of the convolutional neural network comprises five sequentially connected MBConvBlock modules; each MBConvBlock module consists of a plurality of layers, and each layer adopts an inverted bottleneck structure.
The search parameters comprise the number of layers of each MBConvBlock module, the convolution kernel size and expansion rate of each layer, and the input resolution.
Further, encoding the neural network architectures in the search space includes:
encoding each architecture in the search space as a fixed-length integer string, the encoded content comprising the resolution, the number of layers, the convolution kernel sizes, and the expansion rates;
when the encoding of an architecture is shorter than the fixed length, padding it with zeros to reach the fixed length.
Further, initializing a population of neural network architectures in the search space according to network complexity to obtain the initial population includes:
randomly sampling N neural network architectures from the search space;
calculating the complexity of each neural network architecture;
dividing all sampled architectures into K populations according to their complexity, each population containing N/K architectures;
randomly sampling n architectures from each population and merging them to obtain the initial population of P architectures, where P < N.
Further, after the population initialization of the neural network architectures in the search space according to network complexity, the method further includes:
constructing a proxy model;
training the proxy model on the initial population, taking the neural network architectures in the initial population as input and the pairwise comparison of their quality as output, to obtain the pre-trained proxy model.
Further, the training process of the proxy model includes:
training the neural network architectures in the initial population with stochastic gradient descent to obtain their classification accuracy; each architecture together with its classification accuracy forms one sample, and the samples form the original training sample set;
pairing the original training samples in order, i.e., pairing the i-th sample with each of the remaining samples after it; if the classification accuracy of the i-th sample is better than that of its paired sample, the pair is labeled 1, otherwise 0; a sample with more 1-labels is a superior sample, and a sample with more 0-labels is an inferior sample.
Further, initializing the evolutionary search process with the initial population and performing the multi-objective evolutionary search in the search space in combination with the pre-trained proxy model and the multi-population mechanism, to obtain the candidate neural network architectures, comprises:
S1, performing non-dominated sorting and crowding-degree calculation on the neural network architectures in the initial population to obtain a main population and a sub-population;
S2, calculating a threshold for the current round of evolution from a randomly generated number between 0 and 1, the current evolution count t (1 ≤ t ≤ T), and the total number of evolutions T;
S3, comparing the threshold with a preset hyperparameter: if the threshold is greater than the hyperparameter, the main population and the sub-population together serve as the parent individuals; if the threshold is less than the hyperparameter, the main population alone serves as the parent individuals; applying crossover and mutation to the parent individuals to generate a series of offspring individuals;
S4, obtaining the classification-accuracy ranking of the offspring individuals through the pre-trained proxy model and calculating their complexity; performing non-dominated sorting and crowding-degree calculation on the offspring according to the ranking and complexity, and retaining M offspring individuals;
S5, merging the M retained offspring with the initial population, and repeating S1-S5 until the total number of evolutions is reached, to obtain the candidate neural network architectures.
Further, performing non-dominated sorting and crowding-degree calculation on the neural network architectures in the initial population to obtain the main population and the sub-population includes:
performing non-dominated sorting on the architectures in the initial population and taking the first level of the sorting as the main population;
excluding the main population, calculating the crowding degree of the remaining architectures, and selecting the top-ranked architectures by crowding degree as the sub-population.
Further, performing non-dominated sorting and crowding-degree calculation on the offspring individuals according to their classification-accuracy ranking and complexity, and retaining M offspring, includes:
performing non-dominated sorting with the fast non-dominated sorting algorithm according to the classification-accuracy ranking and complexity of the offspring, to obtain the non-dominated sorting result;
selecting the offspring in the first level of the sorting result to calculate the crowding degree, and retaining the top M offspring of the crowding-degree ranking.
Further, screening out, from the candidate neural network architectures, the architectures that balance the two optimization objectives of network complexity and classification accuracy includes:
performing non-dominated sorting on the candidate neural network architectures to obtain a sorting result;
selecting several architectures at the front of the sorting result and training them with stochastic gradient descent, to obtain the neural network architectures that balance the two optimization objectives.
In another aspect, the invention further provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the multi-objective evolutionary neural architecture search methods based on a multi-population mechanism and a proxy model described above.
Compared with the prior art, the invention has the following beneficial effects:
the proxy model is built on pairwise comparison relations and predicts the accuracy ranking of architectures instead of their absolute accuracy values, making the search more efficient; a multi-population mechanism is provided in which the main population and the sub-population cooperate during the search, the main population driving the evolution and the sub-population expanding diversity, which effectively prevents the algorithm from falling into local optima and accelerates its convergence;
using the proxy model during the search greatly reduces time consumption and improves search efficiency, while the multi-population mechanism expands the diversity of solutions and keeps the algorithm out of local optima.
Drawings
FIG. 1 is a flow chart of the multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model in an embodiment of the invention;
FIG. 2 is a schematic diagram of the search-space construction flow of the method in an embodiment of the invention;
FIG. 3 is a schematic diagram of the architecture-encoding flow of the method in an embodiment of the invention;
FIG. 4 is a schematic diagram of the proxy-model processing flow of the method in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
As shown in FIG. 1, the multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model provided by this embodiment of the invention comprises the following steps.
In the first step, a suitable search space is constructed, and the neural network architectures in the search space are encoded.
The convolutional neural network to be searched comprises three parts: a stem, a backbone, and an output head. The stem extracts features and the output head produces class predictions; neither needs to be searched. The backbone, which is searched, comprises five sequentially connected MBConvBlock modules (inverted-linear-bottleneck modules with depthwise separable convolution). Each MBConvBlock module consists of a series of layers, and each layer adopts an inverted bottleneck structure comprising three parts: a 1×1 convolution, a depthwise separable convolution, and a 1×1 convolution.
Search parameters are set, including the number of layers (depth) of each block, the convolution kernel size of each layer, the expansion rate, and the resolution (input image size). In this embodiment, the candidate number of layers is {2, 3, 4}, the expansion rate is selected from {3, 4, 6}, and the kernel size is selected from {3, 5, 7}. The candidate input image sizes range from 192 to 256 with a step size of 4. As shown in FIG. 2, the backbone of the convolutional neural network is searched according to these parameters to obtain the qualifying convolutional neural networks, all of which constitute the search space.
As shown in FIG. 3, the neural network architectures in the search space are encoded as fixed-length integer strings; in this embodiment the fixed length is 46. The encoded content comprises the resolution, the number of layers, the convolution kernel sizes, and the expansion rates.
If an architecture has fewer layers and its encoding is shorter than 46 digits, zeros are padded to reach the fixed length.
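The encoding above can be sketched as follows. The exact field layout of the 46-digit string is not given in the text, so the ordering below (resolution first, then per-block depth followed by per-layer kernel size and expansion rate, zero-padded to the maximum depth) is an assumption for illustration; with five blocks and a maximum of four layers per block it happens to yield exactly 46 digits.

```python
def encode_architecture(resolution, layers_per_block, kernels, expands,
                        total_len=46, max_layers=4):
    """Encode one architecture as a fixed-length integer list.

    layers_per_block: depth of each of the five MBConv blocks, e.g. [2,3,4,2,3]
    kernels/expands:  per-block lists of kernel sizes and expansion rates.
    Layers beyond a block's depth are zero-padded, as described in the text.
    """
    code = [resolution]                      # input image size, e.g. 192..256
    for b, n_layers in enumerate(layers_per_block):
        code.append(n_layers)                # depth of this block
        for l in range(max_layers):
            if l < n_layers:
                code.append(kernels[b][l])   # kernel size: 3, 5 or 7
                code.append(expands[b][l])   # expansion rate: 3, 4 or 6
            else:
                code.extend([0, 0])          # pad missing layers with zeros
    return code + [0] * (total_len - len(code))
```

For example, a five-block architecture with depths [2, 3, 4, 2, 3] encodes to a 46-digit string regardless of how many layers are actually used.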
In the second step, the optimization objectives are determined.
In this embodiment, two optimization objectives are determined: network complexity (model size, computational cost, etc.) and classification accuracy.
In the third step, a population of neural network architectures in the search space is initialized according to network complexity to obtain the initial population for the evolutionary search.
N neural network architectures are randomly sampled from the search space, and the complexity of each is calculated. Network complexity can generally be measured by the parameter count, computational cost, or latency of the network; here the computational cost, i.e., FLOPs (the number of floating-point operations), is used as the second objective. The calculation covers convolutional layers and fully connected layers. For a convolutional layer:
FLOPs_conv = C_in × k² × C_out × H × W
where C_in is the number of input channels, C_out is the number of output channels (equal to the number of convolution kernels of that layer), k is the convolution kernel size, and H and W are the height and width of the feature map, respectively.
For a fully connected layer:
FLOPs_fc = N_in × N_out
where N_in is the number of input neurons and N_out is the number of output neurons.
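The two FLOPs formulas translate directly into code. This is a minimal sketch counting multiply-accumulate operations, matching the variable definitions above; whether to double the count to include additions is a convention choice not specified in the text.

```python
def conv_flops(c_in, c_out, k, h, w):
    """FLOPs of a standard convolution layer:
    C_in * k^2 multiply-accumulates per output element,
    with C_out * H * W output elements."""
    return c_in * k * k * c_out * h * w

def fc_flops(n_in, n_out):
    """FLOPs of a fully connected layer: one multiply-accumulate per weight."""
    return n_in * n_out
```

For instance, a 3→16-channel 3×3 convolution on a 32×32 feature map costs 3·9·16·32·32 FLOPs, and a 512→10 classifier head costs 5120.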
All sampled architectures are divided into K populations according to their complexity, each population containing N/K architectures.
From each population, n architectures are randomly sampled and merged to obtain the initial population of P architectures, where P < N.
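The complexity-stratified initialization can be sketched as follows. The symbols N, K, and n stand in for the values elided in the text (the equation images were lost), so the function parameters are assumptions; the point of the scheme is that the initial population spans the whole complexity range rather than clustering at one end.

```python
import random

def stratified_init(architectures, complexity_fn, k_groups, n_per_group):
    """Sample an initial population stratified by complexity.

    Sorts the N sampled architectures by complexity, splits them into
    k_groups equal-sized groups, and draws n_per_group from each group,
    giving a population of P = k_groups * n_per_group individuals (P < N).
    """
    ranked = sorted(architectures, key=complexity_fn)
    group_size = len(ranked) // k_groups
    population = []
    for g in range(k_groups):
        group = ranked[g * group_size:(g + 1) * group_size]
        population.extend(random.sample(group, n_per_group))
    return population
```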
Fourth, constructing a proxy model and training the proxy model:
A proxy model is constructed; in this embodiment, a support vector machine (SVM) is selected as the proxy model.
The proxy model is trained on the initial population, taking the neural network architectures as input and the pairwise comparison of their quality as output, to obtain the pre-trained proxy model. With reference to FIG. 4, the training process is as follows:
The neural network architectures in the initial population are trained with stochastic gradient descent to obtain their classification accuracy. Each architecture together with its classification accuracy forms one sample, and the samples form the original training sample set.
The original training samples are paired in order: the i-th sample is paired with each of the remaining samples after it; if the classification accuracy of the i-th sample is better than that of its paired sample, the pair is labeled 1, otherwise 0. A sample with more 1-labels is a superior sample, and a sample with more 0-labels is an inferior sample.
Taking the first and second architectures as an example: the first architecture is paired with each of the remaining n−1 architectures to obtain n−1 pairs, each labeled 1 if the classification accuracy of the first architecture is better than that of the other architecture, and 0 otherwise. The first architecture is then set aside, and the second architecture is paired with the remaining n−2 architectures and labeled in the same way.
In total, n(n−1)/2 paired samples are obtained, and these paired samples form the training data set.
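The pairwise data-set construction can be sketched as follows. The feature representation of a pair (here, simple concatenation of the two encodings) is an assumption; the text only specifies the pairing order and the 0/1 labeling rule.

```python
def build_pairwise_dataset(encodings, accuracies):
    """Build the n*(n-1)/2 ranking pairs for the proxy model.

    Each pair (i, j) with i < j becomes one training sample labeled 1
    when architecture i has higher classification accuracy than j, else 0.
    """
    pairs, labels = [], []
    n = len(encodings)
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append(encodings[i] + encodings[j])  # concatenated codes
            labels.append(1 if accuracies[i] > accuracies[j] else 0)
    return pairs, labels
```

An off-the-shelf classifier such as an SVM can then be fitted on these pairs to predict, for any two unseen architectures, which one ranks higher.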
In the fifth step, the evolutionary search process is initialized with the initial population, and a multi-objective evolutionary search is performed in the search space in combination with the pre-trained proxy model and the multi-population mechanism, to obtain the candidate neural network architectures.
S1. Non-dominated sorting is performed on the neural network architectures in the initial population, and the first level of the sorting is taken as the main population. Excluding the main population, the crowding degree of the remaining architectures is calculated, and the top-ranked architectures by crowding degree are selected as the sub-population.
S2. The threshold for the current round of evolution is calculated from a randomly generated number between 0 and 1, the current evolution count t (1 ≤ t ≤ T), and the total number of evolutions T.
S3. The threshold is compared with a preset hyperparameter: if the threshold is greater than the hyperparameter, the main population and the sub-population together serve as the parent individuals; if it is less than the hyperparameter, the main population alone serves as the parent individuals. A series of offspring individuals is generated from the parents through crossover and mutation; in this embodiment, the crossover operator uses two-point crossover and the mutation operator uses polynomial mutation.
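Steps S2-S3 can be sketched as follows. The exact threshold expression is garbled in the source; the decaying form rand() × (1 − t/T), and the hyperparameter value 0.3, are assumptions chosen so that the sub-population joins the mating pool often early in the search (favoring diversity) and rarely late (favoring exploitation), consistent with the roles the text assigns to the two populations.

```python
import random

def select_parents(main_pop, sub_pop, t, total_gens, hyper=0.3):
    """Threshold-based parent selection for generation t of total_gens.

    The threshold shrinks over time, so the diversity-preserving
    sub-population participates less and less as evolution proceeds.
    """
    theta = random.random() * (1 - t / total_gens)
    if theta > hyper:
        return main_pop + sub_pop   # early rounds: explore with both
    return list(main_pop)           # late rounds: exploit the main population
```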
S4. The classification-accuracy ranking of the offspring individuals is obtained through the pre-trained proxy model, and their complexity is calculated.
Non-dominated sorting is performed with the fast non-dominated sorting algorithm according to the classification-accuracy ranking and complexity of the offspring, yielding the non-dominated sorting result.
The offspring in the first level of the sorting result are selected for crowding-degree calculation, and the top M offspring of the crowding-degree ranking are retained.
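The crowding-degree calculation used in S4 can be sketched as the standard crowding distance over the two objectives. The objective tuple (accuracy rank, complexity) is taken from the text; the implementation itself is the usual scheme in which boundary individuals receive infinite distance so that extreme trade-offs are always retained.

```python
def crowding_distance(front):
    """Crowding distance of one non-dominated front.

    front: list of (accuracy_rank, complexity) objective tuples.
    For each objective, the front is sorted and each individual accumulates
    the normalized gap between its two neighbours; boundary points get inf.
    """
    n = len(front)
    dist = [0.0] * n
    for m in range(2):                      # two objectives
        order = sorted(range(n), key=lambda i: front[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = front[order[-1]][m] - front[order[0]][m] or 1.0
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][m]
                               - front[order[k - 1]][m]) / span
    return dist
```

Sorting the first front by this distance in descending order and keeping the top M individuals implements the retention rule described above.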
S5. The M retained offspring are merged with the initial population, and S1-S5 are repeated until the total number of evolutions is reached, yielding the candidate neural network architectures.
The M offspring retained at the end of each round may be decoded and trained, and then added to the training data set used to train the proxy model.
In the sixth step, the neural network architectures that balance the two optimization objectives of network complexity and classification accuracy are screened out from the candidates.
Non-dominated sorting is performed on the candidate neural network architectures to obtain the sorting result. Several architectures at the front of the sorting result are selected and trained with stochastic gradient descent, yielding the neural network architectures that balance the two optimization objectives.
Example 2:
On the basis of Embodiment 1, this embodiment provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the multi-objective evolutionary neural architecture search method based on a multi-population mechanism and a proxy model of Embodiment 1.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (6)

1.基于多种群机制及代理模型的多目标演化神经架构搜索方法,其特征在于,包括:1. A multi-objective evolutionary neural architecture search method based on a multi-population mechanism and an agent model, characterized by comprising: 构建搜索空间,包括:设置搜索参数,根据搜索参数对卷积神经网络的主干部分进行搜索,得到符合条件的卷积神经网络,所有符合条件的卷积神经网络构成搜索空间;Constructing a search space, including: setting search parameters, searching the backbone of the convolutional neural network according to the search parameters, obtaining a convolutional neural network that meets the conditions, and all convolutional neural networks that meet the conditions constitute a search space; 所述卷积神经网络的主干部分包括五个依次连接的MBConvBlocks模块,所述MBConvBlocks模块由多个层组成,每一层均采用倒置瓶颈结构;所述搜索参数包括每个MBConvBlocks模块的层数、每一层的卷积核大小、扩展率和分辨率,所述分辨率即为输入图像大小;The backbone of the convolutional neural network includes five MBConvBlocks modules connected in sequence, each of which is composed of multiple layers, and each layer adopts an inverted bottleneck structure; the search parameters include the number of layers of each MBConvBlocks module, the convolution kernel size of each layer, the expansion rate and the resolution, and the resolution is the input image size; 对搜索空间中的神经网络架构进行编码;Encode the neural network architecture in the search space; 确定两个优化目标,所述优化目标分别为网络复杂度和分类精度;Determine two optimization goals, the optimization goals are network complexity and classification accuracy; 根据网络复杂度对搜索空间中的神经网络架构进行种群初始化,得到用于演化搜索的初始种群;Initialize the population of neural network architectures in the search space according to the network complexity to obtain the initial population for evolutionary search; 通过初始种群来初始化演化搜索过程,并结合预训练的代理模型和多种群机制在搜索空间内进行多目标演化搜索,得到候选神经网络架构,包括:The evolutionary search process is initialized through the initial population, and a multi-objective evolutionary search is performed in the search space in combination with the pre-trained proxy model and the multi-population mechanism to obtain candidate neural network architectures, including: S1、对初始种群中的神经网络架构进行非支配排序和拥挤度计算,得到主种群和副种群,包括:S1. 
Perform non-dominated sorting and crowding calculation on the neural network architectures in the initial population to obtain the main population and the secondary population, including: 对初始种群中的神经网络架构进行非支配排序,将非支配排序中的第一层级作为主种群;Perform non-dominated sorting on the neural network architectures in the initial population, and use the first level in the non-dominated sorting as the main population; 除去主种群,对剩余神经网络架构进行拥挤度计算,选择拥挤度排名最高的个神经网络架构作为副种群;Excluding the main population, the crowding of the remaining neural network architectures is calculated, and the one with the highest crowding ranking is selected. A neural network architecture as a sub-population; S2、计算本轮演化的阈值,其表达式如下:S2. Calculate the threshold of this round of evolution. The expression is as follows: ; 其中,表示阈值,/>为随机产生的/>之间的数,/>表示演化次数,/>为总演化次数;in, Indicates the threshold value, /> Randomly generated /> The number between, /> Indicates the number of evolutions, /> is the total number of evolutions; S3、将阈值与预设的超参数进行比较,若阈值大于超参数,则将主种群、副种群共同作为父代个体,若阈值小于超参数,则将主种群作为父代个体;对父代个体进行交叉、变异操作生成一系列子代个体;S3. Compare the threshold with the preset hyperparameter. If the threshold is greater than the hyperparameter, the main population and the secondary population are taken as parent individuals. If the threshold is less than the hyperparameter, the main population is taken as the parent individual. Perform crossover and mutation operations on the parent individuals to generate a series of offspring individuals. S4、通过预训练的代理模型处理得到子代个体的分类精度排名,并计算子代个体的复杂度;根据子代个体的分类精度排名和复杂度对其进行非支配排序和拥挤度计算,保留个子代个体,包括:S4. 
performing non-dominated sorting with the fast non-dominated sorting algorithm according to the classification-accuracy ranking and complexity of the offspring individuals to obtain their non-dominated sorting result;

calculating the crowding distance of the offspring individuals in the first level of the non-dominated sorting result, and retaining the offspring individuals ranked highest by crowding distance;

S5. merging the retained offspring individuals with the initial population, and repeating S1 to S5 until the total number of evolutions is reached, obtaining the candidate neural network architectures;

selecting, from the candidate neural network architectures, a neural network architecture that balances the two optimization objectives of network complexity and classification accuracy.
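The non-dominated sorting and crowding-distance computations in S1 and S4 follow the standard fast non-dominated sorting procedure (as in NSGA-II). A minimal sketch for the two objectives used here — network complexity to be minimized and classification accuracy to be maximized — might look as follows; all function and variable names are illustrative, not taken from the patent:

```python
def dominates(a, b):
    # a, b are (complexity, accuracy) pairs: minimize complexity, maximize accuracy.
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def non_dominated_sort(pop):
    # Fast non-dominated sorting: returns a list of fronts (lists of indices).
    fronts = [[]]
    dominated = [[] for _ in pop]   # indices each individual dominates
    n_dominators = [0] * len(pop)   # how many individuals dominate each one
    for i, p in enumerate(pop):
        for j, q in enumerate(pop):
            if dominates(p, q):
                dominated[i].append(j)
            elif dominates(q, p):
                n_dominators[i] += 1
        if n_dominators[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated[i]:
                n_dominators[j] -= 1
                if n_dominators[j] == 0:
                    nxt.append(j)
        k += 1
        fronts.append(nxt)
    return fronts[:-1]

def crowding_distance(pop, front):
    # Boundary individuals get infinite distance; interior ones accumulate
    # normalized neighbour gaps over both objectives.
    dist = {i: 0.0 for i in front}
    for m in range(2):
        order = sorted(front, key=lambda i: pop[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = pop[order[-1]][m] - pop[order[0]][m] or 1.0
        for a, b, c in zip(order, order[1:], order[2:]):
            dist[b] += (pop[c][m] - pop[a][m]) / span
    return dist
```

The first front returned by `non_dominated_sort` corresponds to the claim's "main population"; `crowding_distance` supplies the ranking used to pick the secondary population and to truncate offspring.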
2. The multi-objective evolutionary neural architecture search method based on a multi-population mechanism and an agent model according to claim 1, characterized in that encoding the neural network architectures in the search space comprises:

encoding the neural network architectures in the search space with a fixed-length integer string, the encoded content including the resolution, the number of module layers, the convolution kernel size, and the expansion rate;

when the encoding length of a neural network architecture is less than the fixed length, padding it with zeros to reach the fixed length.

3. The multi-objective evolutionary neural architecture search method based on a multi-population mechanism and an agent model according to claim 1, characterized in that initializing a population of neural network architectures in the search space according to network complexity to obtain the initial population comprises:

randomly sampling a number of neural network architectures from the search space;

calculating the complexity of each neural network architecture;

dividing all the sampled neural network architectures into several populations of equal size according to their complexity;

randomly sampling architectures from each of these populations and merging them to obtain the initial population.
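Claims 2 and 3 describe a fixed-length integer encoding with zero padding. The sketch below illustrates one way such an encoding could work; the specific kernel sizes, expansion rates, resolutions, and per-block layer bound are assumed values for illustration, not the patent's actual search-space settings:

```python
import random

# Illustrative search-space bounds (assumed, not from the patent):
KERNEL_SIZES = [3, 5, 7]
EXPAND_RATES = [3, 4, 6]
RESOLUTIONS = [192, 224, 256]
MAX_LAYERS_PER_BLOCK = 4   # each of the 5 MBConv blocks has 1..4 layers
NUM_BLOCKS = 5

def sample_architecture(rng):
    # Randomly sample one architecture from the illustrative search space.
    arch = {"resolution": rng.choice(RESOLUTIONS), "blocks": []}
    for _ in range(NUM_BLOCKS):
        layers = [(rng.choice(KERNEL_SIZES), rng.choice(EXPAND_RATES))
                  for _ in range(rng.randint(1, MAX_LAYERS_PER_BLOCK))]
        arch["blocks"].append(layers)
    return arch

def encode(arch):
    # Fixed-length integer string: resolution, then per block its layer count
    # followed by (kernel, expand) pairs, zero-padded up to the maximum length.
    code = [arch["resolution"]]
    for layers in arch["blocks"]:
        code.append(len(layers))
        for kernel, expand in layers:
            code.extend((kernel, expand))
        code.extend([0, 0] * (MAX_LAYERS_PER_BLOCK - len(layers)))  # zero padding
    return code

# Every encoding has the same length regardless of how many layers exist:
FIXED_LEN = 1 + NUM_BLOCKS * (1 + 2 * MAX_LAYERS_PER_BLOCK)
```

Because every block segment is padded to the same width, crossover and mutation can operate position-wise on the integer string without re-aligning variable-length architectures.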
4. The multi-objective evolutionary neural architecture search method based on a multi-population mechanism and an agent model according to claim 1, characterized in that, after initializing a population of neural network architectures in the search space according to network complexity to obtain the initial population for the evolutionary search, the method further comprises:

constructing an agent model;

training the agent model with the initial population, taking the neural network architectures in the initial population as input and the pairwise comparison results of the architectures as output, to obtain the pre-trained agent model;

wherein the training process of the agent model comprises:

training the neural network architectures in the initial population with the stochastic gradient descent algorithm to obtain their classification accuracy, each neural network architecture and its corresponding classification accuracy forming one sample, and all samples forming the original training sample set;

pairing the samples of the original training sample set in order, that is, pairing each sample with each of the remaining samples after it; if the classification accuracy of a sample is better than that of its paired sample, the pair is labeled 1, otherwise 0; a sample that collects more 1 labels is a good sample, and a sample that collects more 0 labels is a poor sample.
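Claim 4 trains the agent (surrogate) model on pairwise comparisons rather than raw accuracies. A sketch of how the pairwise training set and the good/poor labelling could be built, with illustrative names and toy data:

```python
def pairwise_dataset(samples):
    """samples: list of (encoding, accuracy) tuples.

    Returns ((enc_i, enc_j), label) rows for each ordered pair i < j, where
    label 1 means the first architecture's accuracy beats the second's.
    """
    rows = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (enc_i, acc_i), (enc_j, acc_j) = samples[i], samples[j]
            rows.append(((enc_i, enc_j), 1 if acc_i > acc_j else 0))
    return rows

def rank_by_wins(samples):
    # An architecture that collects more '1' labels (pairwise wins) is the
    # better ("good") sample; sort indices by win count, best first.
    wins = [0] * len(samples)
    for i in range(len(samples)):
        for j in range(len(samples)):
            if i != j and samples[i][1] > samples[j][1]:
                wins[i] += 1
    return sorted(range(len(samples)), key=lambda i: -wins[i])
```

At search time the trained comparator replaces the accuracies here, so offspring can be ranked without the cost of fully training each candidate network.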
5. The multi-objective evolutionary neural architecture search method based on a multi-population mechanism and an agent model according to claim 1, characterized in that selecting, from the candidate neural network architectures, a neural network architecture that balances the two optimization objectives of network complexity and classification accuracy comprises:

performing non-dominated sorting on the candidate neural network architectures to obtain a non-dominated sorting result;

selecting multiple neural network architectures at the front of the non-dominated sorting result and training them with the stochastic gradient descent algorithm, to obtain a neural network architecture that balances the two optimization objectives.

6. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, the steps of the multi-objective evolutionary neural architecture search method based on a multi-population mechanism and an agent model according to any one of claims 1 to 5 are implemented.
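Steps S2/S3 gate which populations contribute parents via a threshold, and claim 5 extracts the final Pareto-optimal architectures. The sketch below is hypothetical: the patent's exact threshold expression is not reproduced in this extracted text, so a simple annealed random value stands in for it, and all names are illustrative:

```python
import random

def pareto_front(pop):
    # pop: list of (complexity, accuracy) pairs; minimize the first objective,
    # maximize the second. Keep members no other member dominates.
    def dominates(a, b):
        return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])
    return [cand for cand in pop
            if not any(dominates(other, cand)
                       for other in pop if other is not cand)]

def select_parents(main_pop, sub_pop, t, total_gens, hyper, rng):
    # Hypothetical threshold: the patent defines it via a random number, the
    # current evolution count t, and the total count; an annealed random value
    # stands in here. Early on the threshold tends to be larger, so the
    # secondary population joins the parents (exploration); later only the
    # main population is used (exploitation).
    threshold = rng.random() * (1 - t / total_gens)
    return main_pop + sub_pop if threshold > hyper else list(main_pop)
```

After the final generation, `pareto_front` over the candidate architectures gives the set that claim 5 retrains with stochastic gradient descent before picking the delivered network.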
CN202410418128.3A 2024-04-09 2024-04-09 Multi-objective evolutionary nerve architecture searching method based on multiple group mechanisms and agent models Active CN118014010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410418128.3A CN118014010B (en) 2024-04-09 2024-04-09 Multi-objective evolutionary nerve architecture searching method based on multiple group mechanisms and agent models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410418128.3A CN118014010B (en) 2024-04-09 2024-04-09 Multi-objective evolutionary nerve architecture searching method based on multiple group mechanisms and agent models

Publications (2)

Publication Number Publication Date
CN118014010A CN118014010A (en) 2024-05-10
CN118014010B true CN118014010B (en) 2024-06-18

Family

ID=90958028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410418128.3A Active CN118014010B (en) 2024-04-09 2024-04-09 Multi-objective evolutionary nerve architecture searching method based on multiple group mechanisms and agent models

Country Status (1)

Country Link
CN (1) CN118014010B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118821905B (en) * 2024-09-18 2024-11-29 南京信息工程大学 Agent model-assisted evolutionary generative adversarial network architecture search method and system
CN118917389B (en) * 2024-10-11 2024-12-20 南京信息工程大学 A diffusion model architecture search method and system based on attention mechanism

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
US11544536B2 (en) * 2018-09-27 2023-01-03 Google Llc Hybrid neural architecture search
CN112561027B (en) * 2019-09-25 2025-02-07 华为技术有限公司 Neural network architecture search method, image processing method, device and storage medium
US11620487B2 (en) * 2019-12-31 2023-04-04 X Development Llc Neural architecture search based on synaptic connectivity graphs
CN111275172B (en) * 2020-01-21 2023-09-01 复旦大学 Feedforward neural network structure searching method based on search space optimization
GB2599137A (en) * 2020-09-25 2022-03-30 Samsung Electronics Co Ltd Method and apparatus for neural architecture search
CN112784949B (en) * 2021-01-28 2023-08-11 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Neural network architecture searching method and system based on evolutionary computation
US20220035878A1 (en) * 2021-10-19 2022-02-03 Intel Corporation Framework for optimization of machine learning architectures
CN116151319A (en) * 2021-11-22 2023-05-23 华为技术有限公司 Method and device for searching neural network integration model and electronic equipment
CN114662593B (en) * 2022-03-24 2025-01-28 江南大学 An image classification method based on genetic algorithm and partitioned data set
CN115222046A (en) * 2022-07-22 2022-10-21 南京信息工程大学 Neural network structure search method, device, electronic device and storage medium
CN116108384A (en) * 2022-12-26 2023-05-12 南京信息工程大学 A neural network architecture search method, device, electronic equipment and storage medium
CN116258165A (en) * 2023-02-14 2023-06-13 河北工业大学 A multi-objective neural architecture search method integrating convolution and self-attention
CN116611504A (en) * 2023-04-25 2023-08-18 东北大学 Neural architecture searching method based on evolution
CN117611974B (en) * 2024-01-24 2024-04-16 湘潭大学 Image recognition method and system based on multi-population alternating evolution neural structure search

Non-Patent Citations (2)

Title
A Novel Approach to Detecting Muscle Fatigue Based on sEMG by Using Neural Architecture Search Framework; Wang, SR et al.; IEEE Transactions on Neural Networks and Learning Systems; 2021-12-25; Vol. 34, No. 8; 4932-494 *
Design of Smart Home Implementation Within IoT Natural Language Interface; Kin, TY et al.; IEEE ACCESS; 2020-07-30; No. 8; 84929-84949 *

Also Published As

Publication number Publication date
CN118014010A (en) 2024-05-10

Similar Documents

Publication Publication Date Title
Baymurzina et al. A review of neural architecture search
CN118014010B (en) Multi-objective evolutionary nerve architecture searching method based on multiple group mechanisms and agent models
CN108334949B (en) An image classifier construction method based on fast evolution of optimized deep convolutional neural network structure
CN112465120A (en) Fast attention neural network architecture searching method based on evolution method
CN110427965A (en) Convolutional neural networks structural reduction and image classification method based on evolution strategy
CN111476285B (en) A training method for an image classification model, an image classification method, and a storage medium
CN114373101A (en) Image classification method for neural network architecture search based on evolution strategy
CN114118369B (en) Image classification convolutional neural network design method based on group intelligent optimization
CN112084877B (en) Remote Sensing Image Recognition Method Based on NSGA-NET
CN113128432B (en) Machine vision multitask neural network architecture searching method based on evolution calculation
CN105956093A (en) Individual recommending method based on multi-view anchor graph Hash technology
WO2022126448A1 (en) Neural architecture search method and system based on evolutionary learning
CN117150026B (en) Text content multi-label classification method and device
CN113282747A (en) Text classification method based on automatic machine learning algorithm selection
CN114241267A (en) Structural entropy sampling-based multi-target architecture search osteoporosis image identification method
CN117253037A (en) Semantic segmentation model structure searching method, automatic semantic segmentation method and system
CN118821905B (en) Agent model-assisted evolutionary generative adversarial network architecture search method and system
CN113920514A (en) Target detection-oriented high-efficiency evolutionary neural network architecture searching method
CN116611504A (en) Neural architecture searching method based on evolution
CN116796797A (en) Network architecture search method, image classification method, device and electronic device
CN115795035A (en) Science and technology service resource classification method and system based on evolutionary neural network and computer readable storage medium thereof
CN113111308B (en) Symbolic regression method and system based on data-driven genetic programming algorithm
CN109299725A (en) A prediction system and device for parallel realization of high-order principal eigenvalue decomposition based on tensor chain
CN109918659B (en) Method for optimizing word vector based on unreserved optimal individual genetic algorithm
CN113704570A (en) Large-scale complex network community detection method based on self-supervision learning type evolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant