CN115114401B: Twin network structure search method and semantic matching method for data intelligence
- Publication number: CN115114401B
- Application number: CN202210738417.2A
- Authority: CN (China)
- Prior art keywords: layer, network model, twin network, twin, network
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3344: Query execution using natural language analysis (information retrieval; querying of unstructured textual data)
- G06F16/35: Clustering; classification (information retrieval of unstructured textual data)
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (handling natural language data)
- G06F40/30: Semantic analysis (handling natural language data)
- G06N3/08: Learning methods (neural networks; computing arrangements based on biological models)
- G06N5/04: Inference or reasoning models (computing arrangements using knowledge-based models)
Abstract
The invention provides a twin network structure search method and a semantic matching method for data intelligence. The method comprises: constructing a twin network model according to a target task, the model comprising five layers, each formed by connecting a plurality of operation nodes; defining a search space for the network structure, the search space consisting of a plurality of operation nodes; setting operation constraint conditions for the twin network model; having a controller sample from the given search space to obtain a network model; training the network model and the controller, updating the network weight parameters until convergence while ensuring that the network model meets the operation constraint conditions during training; and determining the target twin network model according to the updated network weight parameters. The method adaptively constructs an optimal twin network model, effectively reduces the cost and difficulty of manual search, and can be used for text semantic matching.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a twin network structure search method, a semantic matching method, a device, and a storage medium.
Background
Twin (Siamese) network models are commonly used in natural language processing for text semantic matching; question-answering systems and search engines are typical application scenarios. Over the past decade, manually designed twin network models have achieved good results, but the optimal twin network model often differs across text semantic matching domains, and it can take a person several days to construct an optimal network model for each.
To accelerate this manual process, neural architecture search can automate network model design and reduce the labor cost of finding an optimal model. However, existing neural architecture search methods still suffer from low search efficiency and high resource consumption, and research on twin network structure search remains scarce.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a data-intelligence-oriented twin network structure search method.
The data-intelligence-oriented twin network structure search method provided by the invention comprises the following steps:
constructing a twin network model according to a target task, wherein the network structure of the twin network model comprises an input layer, a coding layer, an aggregation layer, an interaction layer, and an output layer; the input layer maps discrete text sequences into continuous numeric vectors, from which word embedding vectors are obtained; the coding layer combines the semantic information of the text context with the semantic association between the text pair to generate sentence embedding vectors; the aggregation layer aggregates the word embedding vectors and sentence embedding vectors produced by the input layer and the coding layer to obtain global semantic reasoning information; the interaction layer fuses the sentence semantic reasoning information output by the aggregation layer to generate feature vectors that can distinguish the semantic relevance between the text pair; and the output layer performs final feature extraction using an activation function and predicts the label of the input sentence pair;
defining a search space of the twin network model, the search space comprising a plurality of operational nodes;
Setting constraint conditions of the operation of the twin network model;
the controller samples from the given search space to obtain a twin network model; the twin network model and the controller are trained, and the network weight parameters are updated until convergence, while ensuring that the twin network model meets the operation constraint conditions during training;
and determining a target twin network model according to the updated network weight parameters, the target twin network model being used for text semantic matching.
Further, in the search space of the network structure, the coding layer and the aggregation layer in the network structure are network structures to be searched, and respectively correspond to different search spaces;
the operation node set in the coding layer search space is divided into two major categories, namely a self-coding operation node set and a cross-sentence coding operation node set;
the self-coding operation node set comprises a convolutional neural network layer, a recurrent neural network layer, a maximum pooling layer, a multi-head attention mechanism layer, a skip connection layer, and a zeroing layer;
the cross-sentence coding operation node set comprises a dot-product attention mechanism layer, a concatenation layer, a point-wise multiplication layer, a point-wise addition layer, and a point-wise subtraction layer;
the operation node set in the aggregation layer search space comprises a maximum pooling layer, an average pooling layer and a self-attention mechanism pooling layer.
Further, the operation constraint conditions require that the prediction speed of the twin network model and the resources occupied during operation meet set thresholds.
Further, ensuring that the network model meets the operation constraint conditions during training includes:
if the network model cannot meet the operation constraint conditions, setting the loss value to zero;
if the network model meets the operation constraint conditions, keeping the loss value unchanged.
Further, the controller sampling from the given search space to obtain a network model includes:
in the sampling process, the controller selects a plurality of operation nodes in the coding layer search space and connects them, and selects one operation node in the aggregation layer search space.
Further, when updating the network weight parameters, operation nodes of the same type in the search space share the same network weight parameters: the convolutional neural network layer in the coding layer search space comprises convolution kernels of different sizes that share the same network weight parameters; each network layer in the recurrent neural network layer shares the same network weight parameters; and the multi-head attention mechanism layer comprises four-head and eight-head attention mechanism layers that share the same network weight parameters.
Further, when training the twin network model and the controller, the twin network model learns not only the one-hot encoded ground-truth labels but also the predicted output of a large pre-trained model on the validation set; the loss value of the twin network model on the validation set therefore comprises two parts, namely a ground-truth label score and a predicted output score of the large pre-trained model.
According to a second aspect of the present invention, there is provided a text semantic matching method comprising:
acquiring training data for text semantic matching;
training by adopting the twin network structure searching method to obtain a target twin network model for text semantic matching;
and carrying out matching prediction on the data by adopting the target twin network model to obtain a text semantic matching result.
The invention also provides a storage medium storing a computer program which, when executed by a processor, implements any of the above data-intelligence-oriented twin network structure search methods or the text semantic matching method.
The invention also provides a device comprising a processor and a memory storing a program executable by the processor; when the processor executes the program stored in the memory, the above data-intelligence-oriented twin network structure search method or text semantic matching method is implemented.
Compared with the prior art, the invention has at least the following advantages and technical effects:
First, the invention designs a new search space for the twin network model's coding layer that contains two kinds of operation sets, allowing the model to select self-coding and cross-sentence coding operations simultaneously. Second, the invention introduces a search space containing multiple aggregation operations at the twin network's aggregation layer, so that the model can autonomously learn the optimal aggregation operation; this aligns semantic information between the inputs and provides deeper semantic connections between input pairs. To speed up the neural architecture search, operations of the same kind in the search space share the same network weights. The invention also adds operation constraint conditions to the search process: if the performance of a twin network model on the validation set cannot meet the constraints, the loss value fed back to the controller is set to 0, which speeds up the controller's search for better twin network models and accelerates matching in the semantic matching task.
Drawings
Fig. 1 is a flow chart of steps of a twin network structure search method for data intelligence in an embodiment.
Fig. 2 is a schematic diagram of a network structure of a twin network model according to the present invention.
FIG. 3 is a schematic diagram of cross-sentence encoding operations in a twin network model encoding layer.
Detailed Description
Embodiments of the present invention will be further described with reference to examples, but the practice of the present invention is not limited thereto.
Referring to fig. 1, the method for searching a twin network structure for data intelligence provided by the invention is used for text semantic matching, and comprises the following steps:
s1, constructing a twin network model according to the target task.
Referring to fig. 2, the twin network model has five layers, namely an input layer, a coding layer, an aggregation layer, an interaction layer, and an output layer. The input layer maps discrete text sequences into continuous numeric vectors, from which word embedding vectors are obtained. The coding layer builds on the word embedding vectors to generate higher-level sentence embedding representations by combining the semantic information of the text context with the semantic association between the text pair; it mainly resolves the meaning of each word in its actual context. The aggregation layer aggregates the word embedding vectors and sentence embedding vectors produced by the input layer and the coding layer to obtain global semantic reasoning information. The interaction layer uses a fully connected neural network to fuse the sentence semantic reasoning information output by the aggregation layer, generating feature vectors that can distinguish the semantic relevance between the text pair. The output layer feeds the final feature vector into a classifier to obtain the predicted class label.
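For concreteness, the following is a minimal PyTorch-style sketch of this five-layer twin structure. All class names, dimensions, and the placeholder LSTM encoder and max-pooling aggregation are illustrative assumptions rather than the patent's reference implementation; the coding and aggregation layers are precisely the parts that the structure search replaces.

```python
# Minimal sketch of the five-layer twin structure; names and sizes are assumptions.
import torch
import torch.nn as nn

class TwinMatcher(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=256, num_classes=2):
        super().__init__()
        # Input layer: maps discrete token ids to continuous word vectors.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Coding layer: placeholder encoder; in the invention this sub-network
        # is the object of the architecture search.
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Interaction layer: fully connected fusion of both sentences' features.
        self.interact = nn.Linear(hidden * 4, hidden)
        # Output layer: classifier over the fused feature vector
        # (Sigmoid for binary tasks, Softmax for multi-class tasks).
        self.out = nn.Linear(hidden, num_classes)

    def aggregate(self, h):
        # Aggregation layer: placeholder max pooling over time; also searched.
        return h.max(dim=1).values

    def forward(self, ids_a, ids_b):
        # Both branches share the same weights (the "twin" property).
        ha, _ = self.encoder(self.embed(ids_a))
        hb, _ = self.encoder(self.embed(ids_b))
        va, vb = self.aggregate(ha), self.aggregate(hb)
        # Typical twin-network fusion: concatenation, difference, product.
        fused = torch.cat([va, vb, torch.abs(va - vb), va * vb], dim=-1)
        feat = torch.relu(self.interact(fused))
        return self.out(feat)  # activation applied in the loss function
```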
In some embodiments of the invention, since the text languages commonly used in specific target tasks include Chinese and English, a language-specific Word2Vec pre-trained model is loaded according to the language type and used to vectorize the text, yielding the vectorized input representation.
The classifier of the output layer is determined by the number of labels in the specific target task. In some embodiments of the invention, if the target task is a binary classification task, the output layer classifier is a Sigmoid function; if the target task is a multi-class classification task, the output layer classifier is a Softmax function.
S2, defining a search space of the network structure, wherein the search space comprises a plurality of operation nodes.
In some embodiments of the present invention, the two portions of the coding layer and the aggregation layer are network structures to be searched of the twin network model, corresponding to different search spaces, respectively.
The coding layer search space is defined as a fully connected directed acyclic graph containing K operation nodes; K is set to 10 in some embodiments of the invention. A directed edge <i, j> between operation nodes indicates that the output of operation node i is an input of operation node j; if an operation node has multiple incoming directed edges, its inputs are combined by point-wise addition. For each operation node in the search space, the controller first decides the node's kind, dividing operation nodes into self-coding operation nodes and cross-sentence coding operation nodes. A self-coding operation node takes as input only the output of the upper-layer operation node. A cross-sentence coding operation node takes as input both the output of the upper-layer operation node and the output of the corresponding same-layer operation node in the other branch's coding layer, and performs an attention computation.
If the operation node is a self-coding operation node, the selectable operation set comprises a convolutional neural network layer, a recurrent neural network layer, a maximum pooling layer, a multi-head attention mechanism layer, a skip connection layer, and a zeroing layer. The convolutional neural network layer comprises convolution operations with kernel sizes of 1, 3, and 5, used to widen the receptive field over the sentence's semantic features and improve the capture of long-distance dependencies. The recurrent neural network layer comprises a long short-term memory network and a gated recurrent unit network, used to capture the contextual semantic information of the sentence. The maximum pooling layer comprises pooling operations with window sizes of 3 and 5, used to extract the most semantically informative features and reduce their dimensionality. The multi-head attention mechanism layer comprises four-head and eight-head attention mechanisms, used to capture long-distance semantic dependencies within the sentence. The skip connection layer supports residual operations, and the zeroing layer sets the output of an operation node to 0. The zeroing operation also speeds up controller training: the validation loss of a twin network model that fails the constraint conditions is set to zero, so the controller does not learn from it.
If the operation node is a cross-sentence coding operation node, the selectable operation set comprises a dot-product attention mechanism layer, a concatenation layer, a point-wise multiplication layer, a point-wise addition layer, and a point-wise subtraction layer, through which semantic information is aggregated with the other sentence. As shown in fig. 3, if operation node 3 of text one in the coding layer is a cross-sentence coding operation, it performs the cross-sentence coding operation with operation node 3 of text two in the coding layer.
The aggregation layer search space is defined as a set of aggregation operation nodes comprising a maximum pooling layer, an average pooling layer, and a self-attention mechanism pooling layer. At the aggregation layer, the controller selects one of the aggregation operations and also selects the outputs of several coding-layer operation nodes as the aggregation layer's inputs. In some embodiments of the invention, the outputs of the coding layer are fed into the controller, and the controller outputs the indices of the selected coding-layer nodes. (The operation sets of both search spaces are sketched in code below.)
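The two coding-layer operation sets and the aggregation set can be written down as plain data; the identifier names below are assumptions for illustration, not the patent's terminology.

```python
# Sketch of the search spaces as plain Python data; names are assumptions.
SELF_ENCODING_OPS = [
    "conv1", "conv3", "conv5",      # CNN layers, kernel sizes 1/3/5
    "lstm", "gru",                  # recurrent layers (LSTM, GRU)
    "maxpool3", "maxpool5",         # max pooling, window sizes 3/5
    "attn4", "attn8",               # multi-head attention, 4/8 heads
    "skip",                         # skip (residual) connection
    "zero",                         # zeroing layer: output set to 0
]
CROSS_SENTENCE_OPS = [
    "dot_attention",                # dot-product attention with the twin branch
    "concat", "mul", "add", "sub",  # point-wise combination operations
]
AGGREGATION_OPS = ["maxpool", "avgpool", "self_attn_pool"]
K_NODES = 10  # fully connected DAG with K operation nodes
```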
S3, setting operation constraint conditions of the twin network model.
The controller presets certain constraint conditions when searching for the twin network model. In some embodiments of the invention, the constraints are that the prediction speed of the twin network model on the validation set exceeds a set threshold and that the resources the twin network model needs to run on the validation set stay below a set threshold. The thresholds are set mainly according to the requirements of the actual application scenario; for example, a scenario with strict real-time requirements dictates the prediction speed. They can also be determined from average sample characteristics, as is done for the STS dataset embodiment below.
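A minimal sketch of such a constraint check, assuming latency and memory are measured externally; the default thresholds mirror the STS embodiment described later (50 ms per sample, 512 MB of GPU memory) and are otherwise assumptions.

```python
# Sketch of the operation constraint check; measurement helpers are assumed
# to exist elsewhere, and the default thresholds follow the STS embodiment.
def satisfies_constraints(latency_ms_per_sample, gpu_mem_mb,
                          max_latency_ms=50.0, max_mem_mb=512.0):
    """Return True iff the sampled model meets both run-time constraints."""
    return latency_ms_per_sample <= max_latency_ms and gpu_mem_mb <= max_mem_mb
```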
S4, the controller samples from the given search space to obtain a network model; the network model and the controller are trained, and the network weight parameters are updated until convergence, while ensuring that the network model meets the constraint conditions during training.
(1) A recurrent neural network is used as the controller; in some embodiments of the invention, a long short-term memory network is selected. The controller samples in the search space to generate the network model to be trained.
(2) In each round, the controller samples in the coding layer and the aggregation layer to generate a network model. For each operation node in the coding layer search space, the controller makes a three-step decision: first, whether the node is a self-coding or a cross-sentence coding operation node; second, which specific operation it performs; and third, which operation nodes it connects to with directed edges. In the aggregation layer search space, the controller makes a two-step decision: first, which coding-layer operation node outputs serve as the aggregation layer's inputs, and second, which specific aggregation operation to use. After sampling, the generated network model comprises a number of operation nodes in the coding layer and the aggregation layer (see the sampling sketch after this list).
(3) The network model generated by the controller's sampling is trained on the training set, and the network weight parameters of its operation nodes are updated. To accelerate the convergence of the network model, operation nodes of the same type in the coding layer share the same network weight parameters: the convolutional neural network layers with kernel sizes of 1, 3, and 5 all share the same network weight parameters, and the four-head and eight-head multi-head attention mechanism layers share the same network weight parameters.
(4) The trained network model is scored on the validation set, and the resulting loss value is used as the reward to update the controller's network weight parameters.
(5) Repeating the steps (3) to (4) until the network model and the controller converge.
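The sampling procedure in step (2) can be sketched as follows, using the operation-set names assumed earlier; `controller_choose` is a hypothetical stand-in for the LSTM controller's learned per-step decision (a real implementation samples from the controller's softmax outputs rather than uniformly).

```python
# Sketch of one controller sampling round; controller_choose is a placeholder
# for the LSTM controller's learned choice distribution.
import random

def controller_choose(options):
    # Stand-in: a trained controller samples from its softmax over options.
    return random.choice(options)

def sample_architecture():
    nodes = []
    for i in range(K_NODES):
        # Step 1: node kind; step 2: concrete operation; step 3: incoming edges.
        kind = controller_choose(["self", "cross"])
        op = controller_choose(SELF_ENCODING_OPS if kind == "self"
                               else CROSS_SENTENCE_OPS)
        preds = [j for j in range(i) if controller_choose([True, False])]
        # Cross-sentence nodes additionally read the twin branch's same-index node.
        nodes.append({"kind": kind, "op": op, "inputs": preds})
    # Aggregation layer: which node outputs to aggregate, and how.
    agg_inputs = [i for i in range(K_NODES) if controller_choose([True, False])]
    agg_op = controller_choose(AGGREGATION_OPS)
    return {"nodes": nodes, "agg_inputs": agg_inputs, "agg_op": agg_op}
```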
The network model must satisfy the operation constraint conditions during training: if the prediction speed of the network model on the validation set and the resources it occupies during operation meet the preset constraints, the loss value is kept unchanged; otherwise the loss value is set to zero.
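A minimal sketch of this constraint-gated feedback, reusing the `satisfies_constraints` helper assumed above.

```python
# Sketch of the constraint-gated feedback: a model that violates the run-time
# constraints contributes a zeroed loss, so the controller does not learn from it.
def controller_feedback(val_loss, latency_ms, gpu_mem_mb):
    if not satisfies_constraints(latency_ms, gpu_mem_mb):
        return 0.0       # loss set to zero for constraint-violating models
    return val_loss      # otherwise the validation loss is kept unchanged
```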
In some embodiments of the invention, so that the controller learns faster how to generate better network models, the network model learns not only the one-hot encoded ground-truth labels but also the predicted output of a large pre-trained model on the validation set. The large pre-trained model is fine-tuned on the target task dataset in advance. The loss value of the twin network model sampled by the controller on the validation set therefore comprises two parts, namely a ground-truth label score and a predicted output score of the large pre-trained model, so the controller obtains more feedback from each sample.
The large pre-trained model is an existing model; the text semantic matching embodiment described below uses the RoBERTa model.
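A sketch of this two-part validation loss under stated assumptions: the weighting factor `alpha` and the use of a KL-divergence term for the teacher's soft targets are illustrative choices, not prescribed by the patent.

```python
# Sketch of the two-part validation loss: ground-truth labels plus the soft
# predictions of a fine-tuned large model (RoBERTa in the embodiment below).
import torch
import torch.nn.functional as F

def validation_loss(logits, labels, teacher_probs, alpha=0.5):
    hard = F.cross_entropy(logits, labels)            # ground-truth label score
    soft = F.kl_div(F.log_softmax(logits, dim=-1),    # teacher output score
                    teacher_probs, reduction="batchmean")
    return alpha * hard + (1.0 - alpha) * soft
```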
S5, determining a target twin network model according to the updated network weight parameters.
The controller with updated network weight parameters resamples a network model, which is then retrained on the target task dataset to obtain the twin network model optimal for the target task.
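A sketch of this derivation step under stated assumptions; `evaluate` and `retrain` are hypothetical helpers standing in for validation-set scoring and full retraining on the target task.

```python
# Sketch of the final derivation: the trained controller resamples candidate
# architectures and the best-scoring one is retrained from scratch.
def derive_target_model(num_candidates=10):
    best_arch, best_score = None, float("-inf")
    for _ in range(num_candidates):
        arch = sample_architecture()
        score = evaluate(arch)        # assumed: score on the validation set
        if score > best_score:
            best_arch, best_score = arch, score
    return retrain(best_arch)         # assumed: full retraining on the task
```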
The invention also provides a text semantic matching method. In existing text semantic matching methods, the text input by a user is usually parsed, encoded, and matched by a preset twin network model. Although this can work well, such methods target only a single domain; when applied to other domains, the preset model does not necessarily perform well. The semantic matching method provided by the invention fully utilizes the data characteristics of each domain to automatically and rapidly search for the optimal twin network model.
The text semantic matching method provided by the invention comprises the following steps:
Step 1, acquiring training data for text semantic matching;
Step 2, training with the twin network structure search method provided in the previous embodiment to obtain a target twin network model for text semantic matching;
Step 3, performing matching prediction on the data with the target twin network model for text semantic matching to obtain the text semantic matching result.
Specifically, in some embodiments of the invention, text training data from a public dataset (such as the STS dataset) is taken as the model input to implement text semantic matching. The text semantic matching method then specifically comprises the following steps:
M1, acquiring training data for text semantic matching; the training data is the text training data of the commonly used public STS dataset, published in 2017 by the international semantic evaluation campaign (SemEval) and comprising the English datasets used by the STS tasks from 2012 to 2017;
M2, training with the twin network structure search method based on the text training data obtained in M1 to obtain a target twin network model for text semantic matching;
M3, performing matching prediction on text test data (here, the STS text test data) with the target twin network model obtained in M2 to obtain the test text semantic matching result.
M2 may be implemented using steps S1-S5 above; in some embodiments of the invention, it specifically comprises:
S1, constructing a twin network model according to the target task. In this embodiment, the text in STS is English, so an English Word2Vec pre-trained model is loaded to vectorize the text, yielding the vectorized input representation.
STS rates sentence similarity in six classes from 0 to 5, so the output layer classifier is determined to be a Softmax function.
S2, defining a search space of the network structure. The coding layer and the aggregation layer of the twin network model are network structures to be searched, and respectively correspond to different search spaces.
The coding layer search space is defined as a fully connected directed acyclic graph containing K operation nodes; K is set to 10 in this embodiment. A directed edge <i, j> between operation nodes indicates that the output of operation node i is an input of operation node j; if an operation node has multiple incoming directed edges, its inputs are combined by point-wise addition. For each operation node in the search space, the controller first decides the node's kind, dividing operation nodes into self-coding operation nodes and cross-sentence coding operation nodes.
If the operation node is a self-coding operation node, the selectable operation set comprises a convolutional neural network layer, a recurrent neural network layer, a maximum pooling layer, a multi-head attention mechanism layer, a skip connection layer, and a zeroing layer. The convolutional neural network layer comprises convolution operations with kernel sizes of 1, 3, and 5; the recurrent neural network layer comprises a long short-term memory network and a gated recurrent unit network; the maximum pooling layer comprises pooling operations with window sizes of 3 and 5; the multi-head attention mechanism layer comprises four-head and eight-head attention mechanisms; the skip connection layer supports residual operations; and the zeroing layer sets the output of an operation node to 0.
If the operation node is a cross-sentence coding operation node, the selectable operation set comprises a dot-product attention mechanism layer, a concatenation layer, a point-wise multiplication layer, a point-wise addition layer, and a point-wise subtraction layer. As shown in fig. 3, if operation node 3 of text one in the coding layer is a cross-sentence coding operation, it performs the cross-sentence coding operation with operation node 3 of text two in the coding layer.
The aggregation layer search space is defined as a set of aggregation operation nodes comprising a maximum pooling layer, an average pooling layer, and a self-attention mechanism pooling layer. At the aggregation layer, the controller selects one aggregation operation and also selects the outputs of several coding-layer operation nodes as the aggregation layer's inputs.
S3, setting the operation constraint conditions of the twin network model. In this embodiment, the average STS sentence length is 10 and the maximum length is 56; based on these data characteristics, the prediction speed threshold of the twin network model on the validation set is set to 50 milliseconds per sample, and the resource threshold for running the twin network model on the validation set is set to 512 MB of GPU memory.
S4, the controller samples from the given search space to obtain a network model; the network model and the controller are trained, and the network weight parameters are updated until convergence, while ensuring that the network model meets the constraint conditions during training.
(1) A recurrent neural network is used as the controller; in this embodiment, a long short-term memory network is selected to generate the network model to be trained.
(2) In each round, the controller samples in the coding layer and the aggregation layer to generate a network model. For each operation node in the coding layer search space, the controller makes a three-step decision: first, whether the node is a self-coding or a cross-sentence coding operation node; second, which specific operation it performs; and third, which operation nodes it connects to with directed edges. In the aggregation layer search space, the controller makes a two-step decision: first, which coding-layer operation node outputs serve as the aggregation layer's inputs, and second, which specific aggregation operation to use. After sampling, the generated network model comprises a number of operation nodes in the coding layer and the aggregation layer.
(3) The network model generated by the controller's sampling is trained on the training set, and the network weight parameters of its operation nodes are updated. To accelerate the convergence of the network model, operation nodes of the same type in the coding layer share the same network weight parameters: the convolutional neural network layers with kernel sizes of 1, 3, and 5 all share the same network weight parameters, and the four-head and eight-head multi-head attention mechanism layers share the same network weight parameters.
(4) The trained network model is scored on the validation set, and the resulting loss value is used as the reward to update the controller's network weight parameters.
(5) Repeating the steps (3) to (4) until the network model and the controller converge.
The network model must satisfy the operation constraint conditions during training. If the network model's prediction on the validation set takes more than 50 milliseconds per sample, or the resources it requires to run exceed 512 MB of GPU memory, the loss value is set to 0.
Besides learning the one-hot encoded ground-truth labels, the network model learns the predicted output of the large pre-trained model on the validation set. In some embodiments of the invention, RoBERTa is selected as the large pre-trained model and is fine-tuned on the target task dataset in advance. The loss value of the sub-model on the validation set then consists of two parts, namely the ground-truth label score and the predicted output score of RoBERTa.
S5, determining the target twin network model according to the updated network weight parameters. The controller with updated network weight parameters resamples a network model, which is then retrained on the STS text training data to obtain the twin network model optimal for STS.
Compared with conventional semantic matching methods, which use a pre-trained language model for feature extraction and manually design a complex network structure fine-tuned for a specific scenario, this scheme requires no manual network design: the twin network structure search method automatically searches for the optimal network structure for the specific scenario, greatly reducing labor cost and improving the semantic matching effect.
The embodiment of the invention also provides a storage medium, which may be a ROM, a RAM, a magnetic disk, an optical disk, or the like. The storage medium stores one or more programs which, when executed by a processor, implement the data-intelligence-oriented twin network structure search method or the text semantic matching method provided by the above embodiments.
The embodiment of the invention also provides a device, which may be a desktop computer, a notebook computer, a smartphone, a PDA handheld terminal, a tablet computer, or another terminal device with a display function. The device comprises a processor and a memory storing one or more programs; when the processor executes the programs stored in the memory, the data-intelligence-oriented twin network structure search method or the text semantic matching method provided by the above embodiments is implemented.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210738417.2A CN115114401B (en) | 2022-06-27 | 2022-06-27 | Twin network structure search method and semantic matching method for data intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210738417.2A CN115114401B (en) | 2022-06-27 | 2022-06-27 | Twin network structure search method and semantic matching method for data intelligence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115114401A CN115114401A (en) | 2022-09-27 |
CN115114401B true CN115114401B (en) | 2025-02-14 |
Family ID: 83329683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210738417.2A Active CN115114401B (en) | 2022-06-27 | 2022-06-27 | Twin network structure search method and semantic matching method for data intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115114401B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781680A (en) * | 2019-10-17 | 2020-02-11 | 江南大学 | Semantic Similarity Matching Method Based on Siamese Network and Multi-Head Attention Mechanism |
CN112115347A (en) * | 2020-07-17 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Search result acquisition method and device and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114020906A (en) * | 2021-10-20 | 2022-02-08 | 杭州电子科技大学 | Chinese medical text information matching method and system based on twin neural network |
- 2022-06-27 (CN): application CN202210738417.2A filed; granted as patent CN115114401B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781680A (en) * | 2019-10-17 | 2020-02-11 | 江南大学 | Semantic Similarity Matching Method Based on Siamese Network and Multi-Head Attention Mechanism |
CN112115347A (en) * | 2020-07-17 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Search result acquisition method and device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115114401A (en) | 2022-09-27 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |