CN115114401B: Twin network structure search method and semantic matching method for data intelligence
- Publication number: CN115114401B
- Application number: CN202210738417.2A
- Authority: CN (China)
- Prior art keywords: layer, network model, twin network, twin, network
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3344: Query execution using natural language analysis (information retrieval; querying of unstructured textual data)
- G06F16/35: Clustering; classification (information retrieval of unstructured textual data)
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (handling natural language data)
- G06F40/30: Semantic analysis (handling natural language data)
- G06N3/08: Learning methods (neural networks; computing arrangements based on biological models)
- G06N5/04: Inference or reasoning models (computing arrangements using knowledge-based models)
Abstract
The invention provides a twin network structure search method and a semantic matching method for data intelligence. The method comprises: constructing a twin network model according to a target task, the model comprising five layers, each formed by connecting a plurality of operation nodes; defining a search space for the network structure, the search space consisting of a plurality of operation nodes; setting operation constraint conditions for the twin network model; having a controller sample from the given search space to obtain a network model; training the network model and the controller, updating the network weight parameters until convergence while ensuring that the network model meets the operation constraint conditions during training; and determining the target twin network model according to the updated network weight parameters. The method adaptively constructs an optimal twin network model, effectively reduces the cost and difficulty of manual search, and can be used for text semantic matching.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a twin network structure search method, a semantic matching method, a device, and a storage medium.
Background
Twin (Siamese) network models are commonly used in natural language processing for text semantic matching; question-answering systems and search engines are typical application scenarios. Over the past decade, manually designed twin network models have achieved good results, but the optimal twin network model often differs across text semantic matching domains, and it can take a person several days to construct an optimal network model for each.
To accelerate this manual process, neural architecture search can automate network model design and reduce the labor cost of finding an optimal model. However, existing neural architecture search methods still suffer from low search efficiency and high resource consumption, and research on twin network structure search remains scarce.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a data-intelligence-oriented twin network structure search method.
The data-intelligence-oriented twin network structure search method provided by the invention comprises the following steps:
constructing a twin network model according to a target task, wherein the network structure of the twin network model comprises an input layer, a coding layer, an aggregation layer, an interaction layer, and an output layer; the input layer maps discrete text sequences into continuous numeric vectors, from which word embedding vectors are obtained; the coding layer combines the semantic information of the text context with the semantic association between the text pair to generate sentence embedding vectors; the aggregation layer aggregates the word embedding vectors and sentence embedding vectors produced by the input layer and the coding layer to obtain global semantic reasoning information; the interaction layer fuses the sentence semantic reasoning information output by the aggregation layer to generate feature vectors that can distinguish the semantic relevance between the text pair; and the output layer performs final feature extraction using an activation function and predicts the label of the input sentence pair;
defining a search space of the twin network model, the search space comprising a plurality of operational nodes;
Setting constraint conditions of the operation of the twin network model;
the controller samples from the given search space to obtain a twin network model; the twin network model and the controller are trained, and the network weight parameters are updated until convergence, while ensuring that the twin network model meets the operation constraint conditions during training;
and determining a target twin network model according to the updated network weight parameters, the target twin network model being used for text semantic matching.
Further, in the search space of the network structure, the coding layer and the aggregation layer in the network structure are network structures to be searched, and respectively correspond to different search spaces;
the operation node set in the coding layer search space is divided into two major categories, namely a self-coding operation node set and a cross-sentence coding operation node set;
the self-coding operation node set comprises a convolutional neural network layer, a recurrent neural network layer, a maximum pooling layer, a multi-head attention mechanism layer, a skip connection layer, and a zeroing layer;
the cross-sentence coding operation node set comprises a dot-product attention mechanism layer, a concatenation layer, a point-wise multiplication layer, a point-wise addition layer, and a point-wise subtraction layer;
the operation node set in the aggregation layer search space comprises a maximum pooling layer, an average pooling layer and a self-attention mechanism pooling layer.
Further, the operation constraint conditions require that the prediction speed of the twin network model and the resources occupied during operation meet set thresholds.
Further, ensuring that the network model meets the operation constraint conditions during training includes:
if the network model cannot meet the operation constraint conditions, setting the loss value to zero;
if the network model meets the operation constraint conditions, keeping the loss value unchanged.
Further, the controller sampling from the given search space to obtain a network model includes:
in the sampling process, the controller selects a plurality of operation nodes in the coding layer search space and connects them, and selects one operation node in the aggregation layer search space.
Further, when updating the network weight parameters, operation nodes of the same type in the search space share the same network weight parameters: the convolutional neural network layer in the coding layer search space comprises convolution kernels of different sizes that share the same network weight parameters; each network layer in the recurrent neural network layer shares the same network weight parameters; and the multi-head attention mechanism layer comprises four-head and eight-head attention mechanism layers that share the same network weight parameters.
Further, when training the twin network model and the controller, the twin network model learns not only the one-hot encoded ground-truth labels but also the predicted output of a large pre-trained model on the validation set; the loss value of the twin network model on the validation set therefore comprises two parts, namely a ground-truth label score and a predicted output score of the large pre-trained model.
According to a second aspect of the present invention, there is provided a text semantic matching method comprising:
acquiring training data for text semantic matching;
training by adopting the twin network structure searching method to obtain a target twin network model for text semantic matching;
and carrying out matching prediction on the data by adopting the target twin network model to obtain a text semantic matching result.
The invention also provides a storage medium storing a computer program which, when executed by a processor, implements any of the above data-intelligence-oriented twin network structure search methods or the text semantic matching method.
The invention also provides a device comprising a processor and a memory storing a program executable by the processor; when the processor executes the program stored in the memory, the above data-intelligence-oriented twin network structure search method or text semantic matching method is implemented.
Compared with the prior art, the invention has at least the following advantages and technical effects:
First, the invention designs a new search space for the twin network model's coding layer that contains two kinds of operation sets, allowing the model to select self-coding and cross-sentence coding operations simultaneously. Second, the invention introduces a search space containing multiple aggregation operations at the twin network's aggregation layer, so that the model can autonomously learn the optimal aggregation operation; this aligns semantic information between the inputs and provides deeper semantic connections between input pairs. To speed up the neural architecture search, operations of the same kind in the search space share the same network weights. The invention also adds operation constraint conditions to the search process: if the performance of a twin network model on the validation set cannot meet the constraints, the loss value fed back to the controller is set to 0, which speeds up the controller's search for better twin network models and accelerates matching in the semantic matching task.
Drawings
Fig. 1 is a flow chart of steps of a twin network structure search method for data intelligence in an embodiment.
Fig. 2 is a schematic diagram of a network structure of a twin network model according to the present invention.
FIG. 3 is a schematic diagram of cross-sentence encoding operations in a twin network model encoding layer.
Detailed Description
Embodiments of the present invention will be further described with reference to examples, but the practice of the present invention is not limited thereto.
Referring to fig. 1, the method for searching a twin network structure for data intelligence provided by the invention is used for text semantic matching, and comprises the following steps:
s1, constructing a twin network model according to the target task.
Referring to fig. 2, the twin network model has five layers, namely an input layer, a coding layer, an aggregation layer, an interaction layer, and an output layer. The input layer maps discrete text sequences into continuous numeric vectors, from which word embedding vectors are obtained. The coding layer builds on the word embedding vectors to generate higher-level sentence embedding representations by combining the semantic information of the text context with the semantic association between the text pair; it mainly resolves the meaning of each word in its actual context. The aggregation layer aggregates the word embedding vectors and sentence embedding vectors produced by the input layer and the coding layer to obtain global semantic reasoning information. The interaction layer uses a fully connected neural network to fuse the sentence semantic reasoning information output by the aggregation layer, generating feature vectors that can distinguish the semantic relevance between the text pair. The output layer feeds the final feature vector into a classifier to obtain the predicted class label.
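For concreteness, the following is a minimal PyTorch-style sketch of this five-layer twin structure. All class names, dimensions, and the placeholder LSTM encoder and max-pooling aggregation are illustrative assumptions rather than the patent's reference implementation; the coding and aggregation layers are precisely the parts that the structure search replaces.

```python
# Minimal sketch of the five-layer twin structure; names and sizes are assumptions.
import torch
import torch.nn as nn

class TwinMatcher(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=256, num_classes=2):
        super().__init__()
        # Input layer: maps discrete token ids to continuous word vectors.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Coding layer: placeholder encoder; in the invention this sub-network
        # is the object of the architecture search.
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Interaction layer: fully connected fusion of both sentences' features.
        self.interact = nn.Linear(hidden * 4, hidden)
        # Output layer: classifier over the fused feature vector
        # (Sigmoid for binary tasks, Softmax for multi-class tasks).
        self.out = nn.Linear(hidden, num_classes)

    def aggregate(self, h):
        # Aggregation layer: placeholder max pooling over time; also searched.
        return h.max(dim=1).values

    def forward(self, ids_a, ids_b):
        # Both branches share the same weights (the "twin" property).
        ha, _ = self.encoder(self.embed(ids_a))
        hb, _ = self.encoder(self.embed(ids_b))
        va, vb = self.aggregate(ha), self.aggregate(hb)
        # Typical twin-network fusion: concatenation, difference, product.
        fused = torch.cat([va, vb, torch.abs(va - vb), va * vb], dim=-1)
        feat = torch.relu(self.interact(fused))
        return self.out(feat)  # activation applied in the loss function
```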
In some embodiments of the invention, since the text languages commonly used in specific target tasks include Chinese and English, a language-specific Word2Vec pre-trained model is loaded according to the language type and used to vectorize the text, yielding the vectorized input representation.
The classifier of the output layer is determined by the number of labels in the specific target task. In some embodiments of the invention, if the target task is a binary classification task, the output layer classifier is a Sigmoid function; if the target task is a multi-class classification task, the output layer classifier is a Softmax function.
S2, defining a search space of the network structure, wherein the search space comprises a plurality of operation nodes.
In some embodiments of the present invention, the two portions of the coding layer and the aggregation layer are network structures to be searched of the twin network model, corresponding to different search spaces, respectively.
The coding layer search space is defined as a fully connected directed acyclic graph containing K operation nodes; K is set to 10 in some embodiments of the invention. A directed edge <i, j> between operation nodes indicates that the output of operation node i is an input of operation node j; if an operation node has multiple incoming directed edges, its inputs are combined by point-wise addition. For each operation node in the search space, the controller first decides the node's kind, dividing operation nodes into self-coding operation nodes and cross-sentence coding operation nodes. A self-coding operation node takes as input only the output of the upper-layer operation node. A cross-sentence coding operation node takes as input both the output of the upper-layer operation node and the output of the corresponding same-layer operation node in the other branch's coding layer, and performs an attention computation.
If the operation node is a self-coding operation node, the selectable operation set comprises a convolutional neural network layer, a recurrent neural network layer, a maximum pooling layer, a multi-head attention mechanism layer, a skip connection layer, and a zeroing layer. The convolutional neural network layer comprises convolution operations with kernel sizes of 1, 3, and 5, used to widen the receptive field over the sentence's semantic features and improve the capture of long-distance dependencies. The recurrent neural network layer comprises a long short-term memory network and a gated recurrent unit network, used to capture the contextual semantic information of the sentence. The maximum pooling layer comprises pooling operations with window sizes of 3 and 5, used to extract the most semantically informative features and reduce their dimensionality. The multi-head attention mechanism layer comprises four-head and eight-head attention mechanisms, used to capture long-distance semantic dependencies within the sentence. The skip connection layer supports residual operations, and the zeroing layer sets the output of an operation node to 0. The zeroing operation also speeds up controller training: the validation loss of a twin network model that fails the constraint conditions is set to zero, so the controller does not learn from it.
If the operation node is a cross-sentence coding operation node, the selectable operation set comprises a dot-product attention mechanism layer, a concatenation layer, a point-wise multiplication layer, a point-wise addition layer, and a point-wise subtraction layer, through which semantic information is aggregated with the other sentence. As shown in fig. 3, if operation node 3 of text one in the coding layer is a cross-sentence coding operation, it performs the cross-sentence coding operation with operation node 3 of text two in the coding layer.
The aggregation layer search space is defined as a set of aggregation operation nodes comprising a maximum pooling layer, an average pooling layer, and a self-attention mechanism pooling layer. At the aggregation layer, the controller selects one of the aggregation operations and also selects the outputs of several coding-layer operation nodes as the aggregation layer's inputs. In some embodiments of the invention, the outputs of the coding layer are fed into the controller, and the controller outputs the indices of the selected coding-layer nodes. (The operation sets of both search spaces are sketched in code below.)
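The two coding-layer operation sets and the aggregation set can be written down as plain data; the identifier names below are assumptions for illustration, not the patent's terminology.

```python
# Sketch of the search spaces as plain Python data; names are assumptions.
SELF_ENCODING_OPS = [
    "conv1", "conv3", "conv5",      # CNN layers, kernel sizes 1/3/5
    "lstm", "gru",                  # recurrent layers (LSTM, GRU)
    "maxpool3", "maxpool5",         # max pooling, window sizes 3/5
    "attn4", "attn8",               # multi-head attention, 4/8 heads
    "skip",                         # skip (residual) connection
    "zero",                         # zeroing layer: output set to 0
]
CROSS_SENTENCE_OPS = [
    "dot_attention",                # dot-product attention with the twin branch
    "concat", "mul", "add", "sub",  # point-wise combination operations
]
AGGREGATION_OPS = ["maxpool", "avgpool", "self_attn_pool"]
K_NODES = 10  # fully connected DAG with K operation nodes
```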
S3, setting operation constraint conditions of the twin network model.
The controller presets certain constraint conditions when searching for the twin network model. In some embodiments of the invention, the constraints are that the prediction speed of the twin network model on the validation set exceeds a set threshold and that the resources the twin network model needs to run on the validation set stay below a set threshold. The thresholds are set mainly according to the requirements of the actual application scenario; for example, a scenario with strict real-time requirements dictates the prediction speed. They can also be determined from average sample characteristics, as is done for the STS dataset embodiment below.
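A minimal sketch of such a constraint check, assuming latency and memory are measured externally; the default thresholds mirror the STS embodiment described later (50 ms per sample, 512 MB of GPU memory) and are otherwise assumptions.

```python
# Sketch of the operation constraint check; measurement helpers are assumed
# to exist elsewhere, and the default thresholds follow the STS embodiment.
def satisfies_constraints(latency_ms_per_sample, gpu_mem_mb,
                          max_latency_ms=50.0, max_mem_mb=512.0):
    """Return True iff the sampled model meets both run-time constraints."""
    return latency_ms_per_sample <= max_latency_ms and gpu_mem_mb <= max_mem_mb
```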
S4, the controller samples from the given search space to obtain a network model; the network model and the controller are trained, and the network weight parameters are updated until convergence, while ensuring that the network model meets the constraint conditions during training.
(1) A recurrent neural network is used as the controller; in some embodiments of the invention, a long short-term memory network is selected. The controller samples in the search space to generate the network model to be trained.
(2) In each round, the controller samples in the coding layer and the aggregation layer to generate a network model. For each operation node in the coding layer search space, the controller makes a three-step decision: first, whether the node is a self-coding or a cross-sentence coding operation node; second, which specific operation it performs; and third, which operation nodes it connects to with directed edges. In the aggregation layer search space, the controller makes a two-step decision: first, which coding-layer operation node outputs serve as the aggregation layer's inputs, and second, which specific aggregation operation to use. After sampling, the generated network model comprises a number of operation nodes in the coding layer and the aggregation layer (see the sampling sketch after this list).
(3) The network model generated by the controller's sampling is trained on the training set, and the network weight parameters of its operation nodes are updated. To accelerate the convergence of the network model, operation nodes of the same type in the coding layer share the same network weight parameters: the convolutional neural network layers with kernel sizes of 1, 3, and 5 all share the same network weight parameters, and the four-head and eight-head multi-head attention mechanism layers share the same network weight parameters.
(4) The trained network model is scored on the validation set, and the resulting loss value is used as the reward to update the controller's network weight parameters.
(5) Repeating the steps (3) to (4) until the network model and the controller converge.
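The sampling procedure in step (2) can be sketched as follows, using the operation-set names assumed earlier; `controller_choose` is a hypothetical stand-in for the LSTM controller's learned per-step decision (a real implementation samples from the controller's softmax outputs rather than uniformly).

```python
# Sketch of one controller sampling round; controller_choose is a placeholder
# for the LSTM controller's learned choice distribution.
import random

def controller_choose(options):
    # Stand-in: a trained controller samples from its softmax over options.
    return random.choice(options)

def sample_architecture():
    nodes = []
    for i in range(K_NODES):
        # Step 1: node kind; step 2: concrete operation; step 3: incoming edges.
        kind = controller_choose(["self", "cross"])
        op = controller_choose(SELF_ENCODING_OPS if kind == "self"
                               else CROSS_SENTENCE_OPS)
        preds = [j for j in range(i) if controller_choose([True, False])]
        # Cross-sentence nodes additionally read the twin branch's same-index node.
        nodes.append({"kind": kind, "op": op, "inputs": preds})
    # Aggregation layer: which node outputs to aggregate, and how.
    agg_inputs = [i for i in range(K_NODES) if controller_choose([True, False])]
    agg_op = controller_choose(AGGREGATION_OPS)
    return {"nodes": nodes, "agg_inputs": agg_inputs, "agg_op": agg_op}
```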
The network model must satisfy the operation constraint conditions during training: if the prediction speed of the network model on the validation set and the resources it occupies during operation meet the preset constraints, the loss value is kept unchanged; otherwise the loss value is set to zero.
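A minimal sketch of this constraint-gated feedback, reusing the `satisfies_constraints` helper assumed above.

```python
# Sketch of the constraint-gated feedback: a model that violates the run-time
# constraints contributes a zeroed loss, so the controller does not learn from it.
def controller_feedback(val_loss, latency_ms, gpu_mem_mb):
    if not satisfies_constraints(latency_ms, gpu_mem_mb):
        return 0.0       # loss set to zero for constraint-violating models
    return val_loss      # otherwise the validation loss is kept unchanged
```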
In some embodiments of the invention, so that the controller learns faster how to generate better network models, the network model learns not only the one-hot encoded ground-truth labels but also the predicted output of a large pre-trained model on the validation set. The large pre-trained model is fine-tuned on the target task dataset in advance. The loss value of the twin network model sampled by the controller on the validation set therefore comprises two parts, namely a ground-truth label score and a predicted output score of the large pre-trained model, so the controller obtains more feedback from each sample.
The large pre-trained model is an existing model; the text semantic matching embodiment described below uses the RoBERTa model.
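A sketch of this two-part validation loss under stated assumptions: the weighting factor `alpha` and the use of a KL-divergence term for the teacher's soft targets are illustrative choices, not prescribed by the patent.

```python
# Sketch of the two-part validation loss: ground-truth labels plus the soft
# predictions of a fine-tuned large model (RoBERTa in the embodiment below).
import torch
import torch.nn.functional as F

def validation_loss(logits, labels, teacher_probs, alpha=0.5):
    hard = F.cross_entropy(logits, labels)            # ground-truth label score
    soft = F.kl_div(F.log_softmax(logits, dim=-1),    # teacher output score
                    teacher_probs, reduction="batchmean")
    return alpha * hard + (1.0 - alpha) * soft
```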
S5, determining a target twin network model according to the updated network weight parameters.
The controller with updated network weight parameters resamples a network model, which is then retrained on the target task dataset to obtain the twin network model optimal for the target task.
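A sketch of this derivation step under stated assumptions; `evaluate` and `retrain` are hypothetical helpers standing in for validation-set scoring and full retraining on the target task.

```python
# Sketch of the final derivation: the trained controller resamples candidate
# architectures and the best-scoring one is retrained from scratch.
def derive_target_model(num_candidates=10):
    best_arch, best_score = None, float("-inf")
    for _ in range(num_candidates):
        arch = sample_architecture()
        score = evaluate(arch)        # assumed: score on the validation set
        if score > best_score:
            best_arch, best_score = arch, score
    return retrain(best_arch)         # assumed: full retraining on the task
```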
The invention also provides a text semantic matching method. In existing text semantic matching methods, the text input by a user is usually parsed, encoded, and matched by a preset twin network model. Although this can work well, such methods target only a single domain; when applied to other domains, the preset model does not necessarily perform well. The semantic matching method provided by the invention fully utilizes the data characteristics of each domain to automatically and rapidly search for the optimal twin network model.
The text semantic matching method provided by the invention comprises the following steps:
Step 1, acquiring training data for text semantic matching;
Step 2, training with the twin network structure search method provided in the previous embodiment to obtain a target twin network model for text semantic matching;
Step 3, performing matching prediction on the data with the target twin network model for text semantic matching to obtain the text semantic matching result.
Specifically, in some embodiments of the invention, text training data from a public dataset (such as the STS dataset) is taken as the model input to implement text semantic matching. The text semantic matching method then specifically comprises the following steps:
M1, acquiring training data for text semantic matching; the training data is the text training data of the commonly used public STS dataset, published in 2017 by the international semantic evaluation campaign (SemEval) and comprising the English datasets used by the STS tasks from 2012 to 2017;
M2, training with the twin network structure search method based on the text training data obtained in M1 to obtain a target twin network model for text semantic matching;
M3, performing matching prediction on text test data (here, the STS text test data) with the target twin network model obtained in M2 to obtain the test text semantic matching result.
M2 may be implemented using steps S1-S5 above; in some embodiments of the invention, it specifically comprises:
S1, constructing a twin network model according to the target task. In this embodiment, the text in STS is English, so an English Word2Vec pre-trained model is loaded to vectorize the text, yielding the vectorized input representation.
STS rates sentence similarity in six classes from 0 to 5, so the output layer classifier is determined to be a Softmax function.
S2, defining a search space of the network structure. The coding layer and the aggregation layer of the twin network model are network structures to be searched, and respectively correspond to different search spaces.
The coding layer search space is defined as a fully connected directed acyclic graph containing K operation nodes; K is set to 10 in this embodiment. A directed edge <i, j> between operation nodes indicates that the output of operation node i is an input of operation node j; if an operation node has multiple incoming directed edges, its inputs are combined by point-wise addition. For each operation node in the search space, the controller first decides the node's kind, dividing operation nodes into self-coding operation nodes and cross-sentence coding operation nodes.
If the operation node is a self-coding operation node, the selectable operation set comprises a convolutional neural network layer, a recurrent neural network layer, a maximum pooling layer, a multi-head attention mechanism layer, a skip connection layer, and a zeroing layer. The convolutional neural network layer comprises convolution operations with kernel sizes of 1, 3, and 5; the recurrent neural network layer comprises a long short-term memory network and a gated recurrent unit network; the maximum pooling layer comprises pooling operations with window sizes of 3 and 5; the multi-head attention mechanism layer comprises four-head and eight-head attention mechanisms; the skip connection layer supports residual operations; and the zeroing layer sets the output of an operation node to 0.
If the operation node is a cross-sentence coding operation node, the selectable operation set comprises a dot-product attention mechanism layer, a concatenation layer, a point-wise multiplication layer, a point-wise addition layer, and a point-wise subtraction layer. As shown in fig. 3, if operation node 3 of text one in the coding layer is a cross-sentence coding operation, it performs the cross-sentence coding operation with operation node 3 of text two in the coding layer.
The aggregation layer search space is defined as a set of aggregation operation nodes comprising a maximum pooling layer, an average pooling layer, and a self-attention mechanism pooling layer. At the aggregation layer, the controller selects one aggregation operation and also selects the outputs of several coding-layer operation nodes as the aggregation layer's inputs.
S3, setting the operation constraint conditions of the twin network model. In this embodiment, the average STS sentence length is 10 and the maximum length is 56; based on these data characteristics, the prediction speed threshold of the twin network model on the validation set is set to 50 milliseconds per sample, and the resource threshold for running the twin network model on the validation set is set to 512 MB of GPU memory.
S4, the controller samples from the given search space to obtain a network model; the network model and the controller are trained, and the network weight parameters are updated until convergence, while ensuring that the network model meets the constraint conditions during training.
(1) A recurrent neural network is used as the controller; in this embodiment, a long short-term memory network is selected to generate the network model to be trained.
(2) In each round, the controller samples in the coding layer and the aggregation layer to generate a network model. For each operation node in the coding layer search space, the controller makes a three-step decision: first, whether the node is a self-coding or a cross-sentence coding operation node; second, which specific operation it performs; and third, which operation nodes it connects to with directed edges. In the aggregation layer search space, the controller makes a two-step decision: first, which coding-layer operation node outputs serve as the aggregation layer's inputs, and second, which specific aggregation operation to use. After sampling, the generated network model comprises a number of operation nodes in the coding layer and the aggregation layer.
(3) The network model generated by the controller's sampling is trained on the training set, and the network weight parameters of its operation nodes are updated. To accelerate the convergence of the network model, operation nodes of the same type in the coding layer share the same network weight parameters: the convolutional neural network layers with kernel sizes of 1, 3, and 5 all share the same network weight parameters, and the four-head and eight-head multi-head attention mechanism layers share the same network weight parameters.
(4) The trained network model is scored on the validation set, and the resulting loss value is used as the reward to update the controller's network weight parameters.
(5) Repeating the steps (3) to (4) until the network model and the controller converge.
The network model must satisfy the operation constraint conditions during training. If the network model's prediction on the validation set takes more than 50 milliseconds per sample, or the resources it requires to run exceed 512 MB of GPU memory, the loss value is set to 0.
Besides learning the one-hot encoded ground-truth labels, the network model learns the predicted output of the large pre-trained model on the validation set. In some embodiments of the invention, RoBERTa is selected as the large pre-trained model and is fine-tuned on the target task dataset in advance. The loss value of the sub-model on the validation set then consists of two parts, namely the ground-truth label score and the predicted output score of RoBERTa.
S5, determining the target twin network model according to the updated network weight parameters. The controller with updated network weight parameters resamples a network model, which is then retrained on the STS text training data to obtain the twin network model optimal for STS.
Compared with conventional semantic matching methods, which use a pre-trained language model for feature extraction and manually design a complex network structure fine-tuned for a specific scenario, this scheme requires no manual network design: the twin network structure search method automatically searches for the optimal network structure for the specific scenario, greatly reducing labor cost and improving the semantic matching effect.
The embodiment of the invention also provides a storage medium, which may be a ROM, a RAM, a magnetic disk, an optical disk, or the like. The storage medium stores one or more programs which, when executed by a processor, implement the data-intelligence-oriented twin network structure search method or the text semantic matching method provided by the above embodiments.
The embodiment of the invention also provides a device, which may be a desktop computer, a notebook computer, a smartphone, a PDA handheld terminal, a tablet computer, or another terminal device with a display function. The device comprises a processor and a memory storing one or more programs; when the processor executes the programs stored in the memory, the data-intelligence-oriented twin network structure search method or the text semantic matching method provided by the above embodiments is implemented.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210738417.2A CN115114401B (en) | 2022-06-27 | 2022-06-27 | Twin network structure search method and semantic matching method for data intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210738417.2A CN115114401B (en) | 2022-06-27 | 2022-06-27 | Twin network structure search method and semantic matching method for data intelligence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115114401A CN115114401A (en) | 2022-09-27 |
CN115114401B true CN115114401B (en) | 2025-02-14 |
Family ID: 83329683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210738417.2A Active CN115114401B (en) | 2022-06-27 | 2022-06-27 | Twin network structure search method and semantic matching method for data intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115114401B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781680A (en) * | 2019-10-17 | 2020-02-11 | 江南大学 | Semantic Similarity Matching Method Based on Siamese Network and Multi-Head Attention Mechanism |
CN112115347A (en) * | 2020-07-17 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Search result acquisition method and device and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114020906A (en) * | 2021-10-20 | 2022-02-08 | 杭州电子科技大学 | Chinese medical text information matching method and system based on twin neural network |
- 2022-06-27 (CN): application CN202210738417.2A filed; granted as patent CN115114401B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781680A (en) * | 2019-10-17 | 2020-02-11 | 江南大学 | Semantic Similarity Matching Method Based on Siamese Network and Multi-Head Attention Mechanism |
CN112115347A (en) * | 2020-07-17 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Search result acquisition method and device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115114401A (en) | 2022-09-27 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |