
CN110874553A - Recognition model training method and device - Google Patents

Recognition model training method and device

Info

Publication number
CN110874553A
CN110874553A (application CN201811019880.1A)
Authority
CN
China
Prior art keywords
probability
sequence
preset
target sequence
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811019880.1A
Other languages
Chinese (zh)
Inventor
陈凯
谢迪
浦世亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811019880.1A priority Critical patent/CN110874553A/en
Publication of CN110874553A publication Critical patent/CN110874553A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the application provides a recognition model training method and device, wherein the recognition model training method comprises the following steps: obtaining a sequence sample; inputting the sequence sample into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence; arranging the preset forward target sequence and the preset backward target sequence such that, at each position, the target in the preset backward target sequence precedes the target in the preset forward target sequence, to obtain a forward and backward target sequence, and calculating a third probability of the forward and backward target sequence; calculating an objective function according to the first probability, the second probability and the third probability; and training the recognition model by using a preset training algorithm according to the objective function. With this scheme, real-time recognition by the recognition model can be realized.

Description

Recognition model training method and device
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a recognition model training method and apparatus.
Background
With the development of artificial intelligence, machine learning, as a core technology of artificial intelligence, has been widely applied in fields such as target detection and tracking, behavior detection and recognition, and speech recognition. A DNN (Deep Neural Network) is an active area of machine learning research: it analyzes data by simulating mechanisms of the human brain, and is an intelligent model that analyzes and learns by establishing and simulating such mechanisms.
In a conventional DNN, such as a CNN (Convolutional Neural Network), the network model establishes a mapping between input data and an output result: input data are fed into the model to produce an output, and the outputs obtained for inputs at different times are independent of each other. In some applications, however, such as speech recognition and video target tracking, the data at each time are strongly correlated with the data at other times. An RNN (Recurrent Neural Network) is a DNN that performs cyclic operations on sequences: its operation on the input at each time depends on the results of its operations on the inputs at other times.
When a recognition model built on an RNN is trained, forward calculation is mostly adopted, in which the operation results of past times are introduced into the operation at the current time. The trained model nevertheless tends to exploit as much future information as possible, so the operation result at each time is often delayed, and the recognition model cannot meet the requirement of real-time recognition.
Disclosure of Invention
The embodiment of the application aims to provide a recognition model training method and a recognition model training device so as to realize real-time recognition of a recognition model. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a recognition model training method, where the method includes:
obtaining a sequence sample;
inputting the sequence sample into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence;
according to the preset forward target sequence and the preset backward target sequence, arranging, at each position, the target in the preset backward target sequence before the target in the preset forward target sequence, to obtain a forward and backward target sequence, and calculating a third probability of the forward and backward target sequence;
calculating an objective function according to the first probability, the second probability and the third probability;
and training the recognition model by using a preset training algorithm according to the objective function.
Optionally, the recognition model includes a recurrent neural network and a connectionist temporal classification algorithm;
the step of inputting the sequence samples into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence includes:
inputting the sequence sample into the recurrent neural network, obtaining a first probability sequence consisting of the output probabilities of the features in the sequence sample through forward calculation of the recurrent neural network, and calculating a first probability of a preset forward target sequence by using the connectionist temporal classification algorithm according to the first probability sequence;
and obtaining a second probability sequence consisting of the output probabilities of the features in the sequence sample through backward calculation of the recurrent neural network, and calculating a second probability of a preset backward target sequence by using the connectionist temporal classification algorithm according to the second probability sequence.
Optionally, the calculating a third probability of the forward and backward target sequences includes:
calculating the average value of the output probabilities in the first probability sequence and the output probabilities at the same time in the second probability sequence according to the first probability sequence and the second probability sequence to obtain a third probability sequence;
and calculating a third probability of the forward and backward target sequence by using the connectionist temporal classification algorithm according to the third probability sequence.
Optionally, the calculating an objective function according to the first probability, the second probability and the third probability includes:
calculating an objective function by using an objective function calculation formula according to the first probability, the second probability and the third probability, wherein the objective function calculation formula is as follows:
g = -log(P_f) - log(P_b) - log(P_fb)
where g is the objective function, P_f is the first probability, P_b is the second probability, and P_fb is the third probability.
Optionally, the preset training algorithm includes: a back propagation algorithm;
the training the recognition model by using a preset training algorithm according to the objective function comprises:
determining, according to the objective function, an error between a prediction sequence obtained after the sequence sample is input into the recognition model and a preset target sequence, wherein the preset target sequence is the preset forward target sequence or the preset backward target sequence;
and training the recognition model by using the back propagation algorithm to adjust each parameter of the recognition model according to the error.
In a second aspect, an embodiment of the present application provides a recognition model training apparatus, where the apparatus includes:
the acquisition module is used for acquiring a sequence sample;
the identification module is used for inputting the sequence samples into an identification model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence;
the arrangement module is used for arranging, according to the preset forward target sequence and the preset backward target sequence, the target in the preset backward target sequence before the target in the preset forward target sequence at each position, to obtain a forward and backward target sequence;
the calculation module is used for calculating a third probability of the forward and backward target sequence; calculating an objective function according to the first probability, the second probability and the third probability;
and the training module is used for training the recognition model by using a preset training algorithm according to the objective function.
Optionally, the recognition model may include a recurrent neural network and a connectionist temporal classification algorithm;
the identification module is specifically configured to:
inputting the sequence sample into the recurrent neural network, obtaining a first probability sequence consisting of the output probabilities of the features in the sequence sample through forward calculation of the recurrent neural network, and calculating a first probability of a preset forward target sequence by using the connectionist temporal classification algorithm according to the first probability sequence;
and obtaining a second probability sequence consisting of the output probabilities of the features in the sequence sample through backward calculation of the recurrent neural network, and calculating a second probability of a preset backward target sequence by using the connectionist temporal classification algorithm according to the second probability sequence.
Optionally, the calculation module is specifically configured to:
calculating the average value of the output probabilities in the first probability sequence and the output probabilities at the same time in the second probability sequence according to the first probability sequence and the second probability sequence to obtain a third probability sequence;
and calculating a third probability of the forward and backward target sequence by using the connectionist temporal classification algorithm according to the third probability sequence.
Optionally, the calculation module is specifically configured to:
calculating an objective function by using an objective function calculation formula according to the first probability, the second probability and the third probability, wherein the objective function calculation formula is as follows:
g = -log(P_f) - log(P_b) - log(P_fb)
where g is the objective function, P_f is the first probability, P_b is the second probability, and P_fb is the third probability.
Optionally, the preset training algorithm includes: a back propagation algorithm;
the training module is specifically configured to:
determining, according to the objective function, an error between a prediction sequence obtained after the sequence sample is input into the recognition model and a preset target sequence, wherein the preset target sequence is the preset forward target sequence or the preset backward target sequence;
and training the recognition model by adjusting each parameter of the recognition model according to the error by utilizing the back propagation algorithm.
According to the recognition model training method and device provided by the embodiments of the application, a sequence sample is obtained and input into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence; the preset forward target sequence and the preset backward target sequence are arranged so that, at each position, the target in the preset backward target sequence precedes the target in the preset forward target sequence, yielding a forward and backward target sequence, and a third probability of the forward and backward target sequence is calculated; an objective function is calculated according to the first probability, the second probability and the third probability; and the recognition model is trained with a preset training algorithm according to the objective function. Because the rearranged forward and backward target sequence constrains, at every position, the target in the preset backward target sequence to come before the target in the preset forward target sequence, a constraint on the decoding positions of the forward and backward calculations is built into the calculated objective function: at each position, the backward calculation must decode the target no later than the forward calculation. Since the forward calculation can only be delayed and the backward calculation can only be advanced, this constraint on decoding positions ensures that, in the trained recognition model, the result of the forward calculation is not delayed and the result of the backward calculation is not advanced, thereby realizing real-time recognition of the recognition model.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a block diagram of prior art RNN-based speech recognition;
FIG. 2 is a schematic flow chart illustrating a recognition model training method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a recognition model training apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in FIG. 1, in the block diagram of RNN-based speech recognition, the input is a speech feature sequence x = [x_1, x_2, x_3, …, x_T]. The RNN is a sequence learning network model, and may be an LSTM (Long Short-Term Memory) network, a GRU (Gated Recurrent Unit) network, or the like.
In currently used RNNs, the input feature sequence is generally processed either by bidirectional calculation or by forward calculation. If bidirectional calculation is used, the output result at any time is related to the entire input feature sequence, so the model can only be used for offline recognition and not for real-time recognition. If forward calculation is used, the RNN model is likewise trained by forward calculation; the trained model then exhibits varying time delays during recognition, and the recognition model cannot meet the requirement of real-time recognition.
In order to realize real-time recognition of a recognition model, the embodiment of the application provides a recognition model training method and device, electronic equipment and a machine-readable storage medium.
Next, a method for training a recognition model provided in the embodiment of the present application is first described.
The execution subject of the recognition model training method provided in the embodiments of the present application may be an electronic device for executing an intelligent algorithm. The electronic device may be an intelligent device having functions of target detection and tracking, behavior detection and recognition, or voice recognition, for example, a remote computer, a remote server, an intelligent camera, or an intelligent voice device, and should at least include a processor with a core processing chip. The recognition model training method provided by the embodiments of the present application may be implemented by at least one of software, a hardware circuit, and a logic circuit arranged in the execution subject.
As shown in fig. 2, a recognition model training method provided in the embodiment of the present application may include the following steps:
s201, obtaining a sequence sample.
The embodiment can be applied to scenes such as voice recognition, target tracking and the like, and therefore, the sequence samples can be voice sequences, video frame sequences, character sequences and the like.
S202, inputting the sequence sample into the recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence.
The recognition model is a DNN model implementing functions such as speech recognition and target recognition. To realize a cyclic operation on a sequence, the recognition model includes an RNN and an algorithm unit for probability calculation; the probability-calculation algorithm may be CTC (Connectionist Temporal Classification), an HMM (Hidden Markov Model), an Attention mechanism, or the like.
The RNN operation process comprises forward calculation and backward calculation, wherein the forward calculation is that when the characteristics of a certain moment are operated, the operation input needs to consider the operation state of each moment before the moment besides the characteristics of the moment; in the backward calculation, when the feature at a certain time is calculated, the input of the calculation needs to take into account the calculation state at each time after the time in addition to the feature at the time. The preset forward target sequence is a target sequence expected to be obtained during forward calculation, and the preset backward target sequence is a target sequence expected to be obtained during backward calculation, and in general, the preset forward target sequence is the same as the preset backward target sequence. The first probability is the probability of obtaining the preset forward target sequence through forward calculation, and the second probability is the probability of obtaining the preset backward target sequence through backward calculation.
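The forward and backward calculations described above can be sketched with a minimal single-unit RNN. This is a hedged illustration, not code from the application; the scalar weights `w_in`, `w_rec` and bias `b` are hypothetical placeholders.

```python
import math

def rnn_forward(x, w_in, w_rec, b):
    """Forward calculation: the state at time t depends on the input at t
    and on the states at all earlier times (carried through h)."""
    h = 0.0
    states = []
    for x_t in x:
        h = math.tanh(w_in * x_t + w_rec * h + b)
        states.append(h)
    return states

def rnn_backward(x, w_in, w_rec, b):
    """Backward calculation: the state at time t depends on the states at
    all later times, so the sequence is scanned in reverse."""
    return list(reversed(rnn_forward(list(reversed(x)), w_in, w_rec, b)))
```

With an impulse input `[1.0, 0.0, 0.0]`, the forward pass propagates the response to later times, while the backward pass propagates it to earlier times, mirroring the dependence directions described above.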
Alternatively, the recognition model may include RNN and CTC algorithms.
An RNN is a powerful sequence learning model, but it requires pre-segmented input data, which greatly limits its application. Combining the RNN with CTC removes the need to pre-segment data; the basic idea is to interpret the network output of the RNN as a probability distribution over all possible label sequences. Given this distribution, the objective is to maximize the probability of the preset target sequence.
For the input of the RNN at each time t, the network output layer has L+1 nodes (L being the number of classes): the outputs of the first L nodes are the probabilities of observing each class at time t, and the output of the (L+1)-th node is the probability of observing a blank. The added blank output allows CTC to handle adjacent targets of the same class in the preset target sequence. The probability of any target sequence can be calculated by integrating the output-layer values over all times; CTC sums over all alignment scenarios, so no pre-segmentation of the data is required.
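The CTC computation described above — treating the per-time outputs over the L+1 nodes as a distribution and summing over all alignments that collapse to the target — can be sketched as the standard CTC forward (alpha) recursion. This is an illustrative reimplementation, not code from the application; index 0 is taken as the blank node.

```python
def ctc_probability(probs, target, blank=0):
    """probs: list of T distributions over the L+1 output nodes.
    target: list of class labels. Returns the total probability of all
    alignments that collapse (removing repeats, then blanks) to target."""
    ext = [blank]                     # target extended with blanks
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]              # stay on the same extended symbol
            if s >= 1:
                a += alpha[s - 1]     # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]     # skip a blank between distinct labels
            new[s] = a * probs[t][ext[s]]
        alpha = new
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
```

For T = 2 uniform distributions over one class plus blank, three of the four alignment paths collapse to the target [1], so the probability is 3 × 0.25 = 0.75.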
Correspondingly, S202 may specifically be:
inputting the sequence sample into the RNN, obtaining a first probability sequence consisting of the output probabilities of the features in the sequence sample through forward calculation of the RNN, and calculating the first probability of the preset forward target sequence by using the CTC algorithm according to the first probability sequence;
and obtaining a second probability sequence consisting of the output probabilities of the features in the sequence sample through backward calculation of the RNN, and calculating the second probability of the preset backward target sequence by using the CTC algorithm according to the second probability sequence.
Through forward calculation of the RNN, a first probability sequence [p_f1, p_f2, p_f3, …, p_fT] can be obtained, where t ∈ [1, T] indexes the times of the input feature sequence. With the preset forward target sequence set to [π_f1, π_f2, π_f3, …, π_fU], the first probability of the preset forward target sequence is calculated through the CTC algorithm as P_f. Through backward calculation of the RNN, a second probability sequence [p_b1, p_b2, p_b3, …, p_bT] can be obtained; with the preset backward target sequence set to [π_b1, π_b2, π_b3, …, π_bU], the second probability of the preset backward target sequence is calculated through the CTC algorithm as P_b.
S203, according to the preset forward target sequence and the preset backward target sequence, arranging, at each position, the target in the preset backward target sequence before the target in the preset forward target sequence, to obtain a forward and backward target sequence, and calculating a third probability of the forward and backward target sequence.
In RNN operation, the forward calculation may be delayed and the backward calculation may be advanced. To keep the forward calculation from being delayed and the backward calculation from being advanced, this embodiment makes the decoding results of the forward and backward RNNs constrain each other: according to the preset forward target sequence and the preset backward target sequence, a forward and backward target sequence is obtained by arranging, at each position, the target in the preset backward target sequence before the target in the preset forward target sequence. As above, with the preset forward target sequence [π_f1, π_f2, π_f3, …, π_fU] and the preset backward target sequence [π_b1, π_b2, π_b3, …, π_bU], the resulting forward and backward target sequence is [π_b1, π_f1, π_b2, π_f2, π_b3, π_f3, …, π_bU, π_fU]. A third probability of the forward and backward target sequence is then calculated with an algorithm such as CTC, an HMM, or an Attention mechanism.
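The rearrangement into the forward and backward target sequence is a simple positionwise interleaving. A minimal sketch (the function name is assumed for illustration):

```python
def interleave_targets(forward_targets, backward_targets):
    """At each position, place the target from the preset backward target
    sequence before the target from the preset forward target sequence."""
    merged = []
    for b, f in zip(backward_targets, forward_targets):
        merged += [b, f]
    return merged
```

For example, interleaving forward targets [π_f1, π_f2] with backward targets [π_b1, π_b2] yields [π_b1, π_f1, π_b2, π_f2].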
Optionally, the step of calculating the third probability of the forward and backward target sequences in S203 may specifically be:
calculating the average value of the output probabilities in the first probability sequence and the output probabilities at the same time in the second probability sequence according to the first probability sequence and the second probability sequence to obtain a third probability sequence;
and calculating a third probability of the forward and backward target sequence by using a CTC algorithm according to the third probability sequence.
Before the third probability is calculated by the CTC algorithm, a third probability sequence with mutual forward-backward constraint needs to be calculated: each output probability in the first probability sequence is added to the output probability at the same time in the second probability sequence and the sum is divided by two (i.e., the average of the two is taken). For example, if the first probability sequence is [p_f1, p_f2, p_f3, …, p_fT] and the second probability sequence is [p_b1, p_b2, p_b3, …, p_bT], the third probability sequence is [(p_f1+p_b1)/2, (p_f2+p_b2)/2, (p_f3+p_b3)/2, …, (p_fT+p_bT)/2], and the third probability of the forward and backward target sequence is calculated by the CTC algorithm as P_fb.
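The elementwise averaging that produces the third probability sequence can be sketched as follows, assuming each element of a probability sequence is itself a distribution over the output classes (an illustrative helper, not code from the application):

```python
def average_probability_sequences(first_seq, second_seq):
    """Average, at each time step, the forward output distribution and the
    backward output distribution, giving the third probability sequence."""
    return [[(p_f + p_b) / 2 for p_f, p_b in zip(dist_f, dist_b)]
            for dist_f, dist_b in zip(first_seq, second_seq)]
```

The result stays a valid probability sequence: averaging two distributions that each sum to 1 again sums to 1 at every time step.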
And S204, calculating an objective function according to the first probability, the second probability and the third probability.
The objective function is the function on which training of the recognition model is based; it represents the direction and degree of adjustment of the model parameters, like the gradient function in gradient-based training. In this embodiment, the first probability of the preset forward target sequence, the second probability of the preset backward target sequence, and the third probability of the forward and backward target sequence are each calculated, and the objective function is computed by combining these three probabilities, so that it represents the direction and degree of parameter adjustment more completely. The objective function preserves the recognition rates of the forward and backward calculations while constraining their decoding order.
Optionally, S204 may specifically be:
calculating an objective function by using an objective function calculation formula according to the first probability, the second probability and the third probability, wherein the objective function calculation formula is as follows:
g = -log(P_f) - log(P_b) - log(P_fb) (1)
where g is the objective function, P_f is the first probability, P_b is the second probability, and P_fb is the third probability.
For a training algorithm such as back propagation, the objective function has a logarithmic relationship with the probabilities: the negative logarithm of each of the first probability, the second probability and the third probability is taken, and the three results are summed to obtain the objective function.
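Formula (1) can be sketched directly (a hedged helper for illustration; the function name is not from the application):

```python
import math

def objective(p_f, p_b, p_fb):
    """Formula (1): sum of the negative logs of the first, second and
    third probabilities. Lower g means the model assigns higher
    probability to the forward, backward and interleaved target sequences."""
    return -math.log(p_f) - math.log(p_b) - math.log(p_fb)
```

When all three probabilities are 1 the objective is 0; any probability below 1 increases g, which training then drives back down.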
And S205, training the recognition model by using a preset training algorithm according to the objective function.
The preset training algorithm may be a back propagation algorithm, a gradient algorithm, or other common training algorithms, and is not limited in this respect.
Optionally, the preset training algorithm may include: a back propagation algorithm.
Correspondingly, S205 may specifically be:
determining, according to the objective function, an error between a prediction sequence obtained after the sequence sample is input into the recognition model and a preset target sequence, wherein the preset target sequence is the preset forward target sequence or the preset backward target sequence;
and training the recognition model by using the back propagation algorithm to adjust each parameter of the recognition model according to the error between the prediction sequence and the preset target sequence.
The training process is to input the sequence sample into the recognition model to obtain a prediction sequence, calculate the error between the prediction sequence and a preset target sequence, continuously adjust the model parameters of the recognition model by using a back propagation algorithm based on the error, and train the recognition model through multiple times of loop iteration. After the final recognition model is obtained through training, when the input of a voice sequence, a video sequence and the like is recognized, the recognition result can be obtained in real time by directly using the forward calculation of the RNN.
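The loop just described — compute the objective, back-propagate the error, adjust parameters, iterate — can be illustrated on a one-parameter toy problem. This is purely illustrative: sigmoid(theta) stands in for a sequence probability such as P_f, and the hand-derived gradient replaces full back propagation; it is not the application's CTC training.

```python
import math

# Toy training loop: one parameter theta, "probability" p = sigmoid(theta),
# objective g = -log p. Gradient descent on g drives p toward 1, i.e. the
# model assigns ever higher probability to the (toy) target.
theta, lr = 0.0, 0.5
for _ in range(400):
    p = 1.0 / (1.0 + math.exp(-theta))
    grad = -(1.0 - p)            # d(-log sigmoid(theta)) / d theta
    theta -= lr * grad           # parameter adjustment step
final_p = 1.0 / (1.0 + math.exp(-theta))
```

After the iterations, final_p is close to 1 and the objective -log(final_p) is close to 0, mirroring how repeated loop iterations shrink the error between prediction and preset target.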
By applying this embodiment, a sequence sample is obtained and input into the recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence; the two sequences are arranged so that, at each position, the target in the preset backward target sequence precedes the target in the preset forward target sequence, yielding a forward and backward target sequence, and a third probability of the forward and backward target sequence is calculated; an objective function is calculated according to the first probability, the second probability and the third probability; and the recognition model is trained with a preset training algorithm according to the objective function. Because the rearranged forward and backward target sequence constrains, at every position, the target in the preset backward target sequence to come before the target in the preset forward target sequence, a constraint on the decoding positions of the forward and backward calculations is built into the calculated objective function: at each position, the backward calculation must decode the target no later than the forward calculation. Since the forward calculation can only be delayed and the backward calculation can only be advanced, this constraint on decoding positions ensures that, in the trained recognition model, the result of the forward calculation is not delayed and the result of the backward calculation is not advanced, thereby realizing real-time recognition of the recognition model.
Corresponding to the above method embodiment, an embodiment of the present application provides a recognition model training apparatus, as shown in fig. 3, the recognition model training apparatus may include:
an obtaining module 310, configured to obtain a sequence sample;
the recognition module 320 is configured to input the sequence samples into a recognition model, so as to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence;
an arranging module 330, configured to obtain a forward and backward target sequence according to the preset forward target sequence and the preset backward target sequence, by arranging, at each position, the target from the preset backward target sequence before the target from the preset forward target sequence;
a calculating module 340, configured to calculate a third probability of the forward and backward target sequence; calculating an objective function according to the first probability, the second probability and the third probability;
and a training module 350, configured to train the recognition model according to the objective function by using a preset training algorithm.
Optionally, the recognition model may include a recurrent neural network and a connectionist temporal classification (CTC) algorithm;
the identification module 320 may be specifically configured to:
inputting the sequence sample into the recurrent neural network, obtaining, through forward calculation of the recurrent neural network, a first probability sequence consisting of the output probabilities of the features in the sequence sample, and calculating the first probability of the preset forward target sequence from the first probability sequence by using the connectionist temporal classification algorithm;
and obtaining, through backward calculation of the recurrent neural network, a second probability sequence consisting of the output probabilities of the features in the sequence sample, and calculating the second probability of the preset backward target sequence from the second probability sequence by using the connectionist temporal classification algorithm.
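The patent does not spell out the CTC computation itself. As background, a minimal sketch of the standard CTC forward (dynamic-programming) recursion that such a probability calculation could follow is shown below; the blank-symbol convention (index 0) and the `probs[t][k]` data layout are assumptions, not taken from the patent:

```python
def ctc_probability(probs, target, blank=0):
    """Probability of `target` under CTC, given `probs`, where
    probs[t][k] is the output probability of symbol k at time t.
    Standard forward (alpha) recursion over the target extended
    with blanks: [blank, l1, blank, l2, ..., blank]."""
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)
    # alpha[s]: total probability of alignments ending at ext[s] at time t
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]
            if s > 0:
                a += alpha[s - 1]
            # skip transition allowed between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # valid alignments end on the last label or the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
```

For short sequences the result can be cross-checked by enumerating every alignment, collapsing repeats, removing blanks, and summing the probabilities of alignments that collapse to the target.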
Optionally, the calculating module 340 may be specifically configured to:
calculating, according to the first probability sequence and the second probability sequence, the average value of the output probabilities in the first probability sequence and the output probabilities at the same time in the second probability sequence to obtain a third probability sequence;
and calculating the third probability of the forward and backward target sequence from the third probability sequence by using the connectionist temporal classification algorithm.
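The per-time-step averaging that produces the third probability sequence can be sketched as follows; the list-of-lists layout (`probs[t][k]`) is an assumption:

```python
def average_sequences(first, second):
    """Element-wise average of two probability sequences, where
    first[t][k] and second[t][k] are the output probabilities of
    symbol k at time t from the forward and backward calculations."""
    assert len(first) == len(second)
    return [
        [(f + b) / 2.0 for f, b in zip(ft, bt)]
        for ft, bt in zip(first, second)
    ]

first = [[0.6, 0.4], [0.2, 0.8]]
second = [[0.4, 0.6], [0.6, 0.4]]
averaged = average_sequences(first, second)
print([[round(p, 3) for p in row] for row in averaged])
# -> [[0.5, 0.5], [0.4, 0.6]]
```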
Optionally, the calculating module 340 may be specifically configured to:
calculating an objective function by using an objective function calculation formula according to the first probability, the second probability and the third probability, wherein the objective function calculation formula is as follows:
g = -log(Pf) - log(Pb) - log(Pfb)
wherein g is the objective function, Pf is the first probability, Pb is the second probability, and Pfb is the third probability.
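The objective function calculation itself is a one-liner; the probability values below are illustrative only:

```python
import math

def objective(p_f, p_b, p_fb):
    """g = -log(Pf) - log(Pb) - log(Pfb): the sum of the negative
    log-probabilities of the forward, backward, and interleaved
    forward-and-backward target sequences."""
    return -(math.log(p_f) + math.log(p_b) + math.log(p_fb))

# Illustrative probabilities only; higher probabilities give lower g.
print(round(objective(0.5, 0.5, 0.25), 6))  # -> 2.772589
```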
Optionally, the preset training algorithm may include: a back propagation algorithm;
the training module 350 may specifically be configured to:
determining, according to the objective function, an error between the prediction sequence obtained after the sequence sample is input into the recognition model and a preset target sequence, wherein the preset target sequence is the preset forward target sequence or the preset backward target sequence;
and training the recognition model by adjusting each parameter of the recognition model according to the error by utilizing the back propagation algorithm.
By applying this embodiment, a sequence sample is obtained and input into the recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence; according to the preset forward target sequence and the preset backward target sequence, a forward and backward target sequence is obtained by arranging, at each position, the target from the preset backward target sequence before the target from the preset forward target sequence; a third probability of the forward and backward target sequence is calculated; an objective function is calculated according to the first probability, the second probability and the third probability; and the recognition model is trained according to the objective function by using a preset training algorithm. Because the rearrangement places, at every position, the backward-sequence target before the forward-sequence target, a constraint on the decoding positions of the forward calculation and the backward calculation is added to the objective function: for the target at each position, the backward calculation must decode it earlier than the forward calculation. Since the forward calculation tends to delay its output and the backward calculation tends to advance it, this constraint on the decoding positions ensures that, in the trained recognition model, the result of the forward calculation is not delayed and the result of the backward calculation is not advanced, thereby realizing real-time recognition by the recognition model.
In accordance with the above method embodiments, the present application provides an electronic device, as shown in fig. 4, which includes a processor 401 and a memory 402, wherein,
the memory 402 for storing a computer program;
the processor 401 is configured to implement any step of the above-mentioned recognition model training method when executing the computer program stored in the memory 402.
The memory may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a GPU (Graphics Processing Unit), a CPU (Central Processing Unit), an NP (Network Processor), and the like; or a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
In this embodiment, the processor of the electronic device can read and run the computer program stored in the memory to implement: obtaining a sequence sample; inputting the sequence sample into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence; arranging, at each position, the target from the preset backward target sequence before the target from the preset forward target sequence to obtain a forward and backward target sequence; calculating a third probability of the forward and backward target sequence; calculating an objective function according to the first probability, the second probability and the third probability; and training the recognition model according to the objective function by using a preset training algorithm.
Because the rearrangement of the preset forward target sequence and the preset backward target sequence places, at every position, the backward-sequence target before the forward-sequence target, a constraint on the decoding positions of the forward calculation and the backward calculation is added to the objective function: for the target at each position, the backward calculation must decode it earlier than the forward calculation. Since the forward calculation tends to delay its output and the backward calculation tends to advance it, this constraint ensures that, in the trained recognition model, the result of the forward calculation is not delayed and the result of the backward calculation is not advanced, thereby realizing real-time recognition by the recognition model.
In addition, corresponding to the recognition model training method provided in the foregoing embodiments, the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any step of the recognition model training method.
In this embodiment, when run by a processor, the computer program of the recognition model training method stored in the computer-readable storage medium implements: obtaining a sequence sample; inputting the sequence sample into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence; arranging, at each position, the target from the preset backward target sequence before the target from the preset forward target sequence to obtain a forward and backward target sequence; calculating a third probability of the forward and backward target sequence; calculating an objective function according to the first probability, the second probability and the third probability; and training the recognition model according to the objective function by using a preset training algorithm.
Because the rearrangement of the preset forward target sequence and the preset backward target sequence places, at every position, the backward-sequence target before the forward-sequence target, a constraint on the decoding positions of the forward calculation and the backward calculation is added to the objective function: for the target at each position, the backward calculation must decode it earlier than the forward calculation. Since the forward calculation tends to delay its output and the backward calculation tends to advance it, this constraint ensures that, in the trained recognition model, the result of the forward calculation is not delayed and the result of the backward calculation is not advanced, thereby realizing real-time recognition by the recognition model.
For the embodiments of the electronic device and the computer-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, the electronic device, and the computer-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the description of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A recognition model training method, the method comprising:
obtaining a sequence sample;
inputting the sequence sample into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence;
according to the preset forward target sequence and the preset backward target sequence, arranging, at each position, the target from the preset backward target sequence before the target from the preset forward target sequence to obtain a forward and backward target sequence, and calculating a third probability of the forward and backward target sequence;
calculating an objective function according to the first probability, the second probability and the third probability;
and training the recognition model by utilizing a preset training algorithm according to the objective function.
2. The method of claim 1, wherein the recognition model comprises a recurrent neural network and a connectionist temporal classification algorithm;
the step of inputting the sequence samples into a recognition model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence includes:
inputting the sequence sample into the recurrent neural network, obtaining, through forward calculation of the recurrent neural network, a first probability sequence consisting of the output probabilities of the features in the sequence sample, and calculating a first probability of a preset forward target sequence from the first probability sequence by using the connectionist temporal classification algorithm;
and obtaining, through backward calculation of the recurrent neural network, a second probability sequence consisting of the output probabilities of the features in the sequence sample, and calculating a second probability of a preset backward target sequence from the second probability sequence by using the connectionist temporal classification algorithm.
3. The method of claim 2, wherein the calculating a third probability of the forward and backward target sequence comprises:
calculating, according to the first probability sequence and the second probability sequence, the average value of the output probabilities in the first probability sequence and the output probabilities at the same time in the second probability sequence to obtain a third probability sequence;
and calculating a third probability of the forward and backward target sequence from the third probability sequence by using the connectionist temporal classification algorithm.
4. The method of claim 1, wherein computing an objective function based on the first probability, the second probability, and the third probability comprises:
calculating an objective function by using an objective function calculation formula according to the first probability, the second probability and the third probability, wherein the objective function calculation formula is as follows:
g = -log(Pf) - log(Pb) - log(Pfb)
wherein g is the objective function, Pf is the first probability, Pb is the second probability, and Pfb is the third probability.
5. The method of claim 1, wherein the pre-set training algorithm comprises: a back propagation algorithm;
the training the recognition model according to the objective function by using a preset training algorithm comprises:
determining, according to the objective function, an error between a prediction sequence obtained after the sequence sample is input into the recognition model and a preset target sequence, wherein the preset target sequence is the preset forward target sequence or the preset backward target sequence;
and training the recognition model by adjusting each parameter of the recognition model according to the error by utilizing the back propagation algorithm.
6. An apparatus for training a recognition model, the apparatus comprising:
the acquisition module is used for acquiring a sequence sample;
the identification module is used for inputting the sequence samples into an identification model to obtain a first probability of a preset forward target sequence and a second probability of a preset backward target sequence;
the arrangement module is used for obtaining a forward and backward target sequence according to the preset forward target sequence and the preset backward target sequence, by arranging, at each position, the target from the preset backward target sequence before the target from the preset forward target sequence;
the calculation module is used for calculating a third probability of the forward and backward target sequence; calculating an objective function according to the first probability, the second probability and the third probability;
and the training module is used for training the recognition model by utilizing a preset training algorithm according to the objective function.
7. The apparatus of claim 6, wherein the recognition model comprises a recurrent neural network and a connectionist temporal classification algorithm;
the identification module is specifically configured to:
inputting the sequence sample into the recurrent neural network, obtaining, through forward calculation of the recurrent neural network, a first probability sequence consisting of the output probabilities of the features in the sequence sample, and calculating a first probability of a preset forward target sequence from the first probability sequence by using the connectionist temporal classification algorithm;
and obtaining, through backward calculation of the recurrent neural network, a second probability sequence consisting of the output probabilities of the features in the sequence sample, and calculating a second probability of a preset backward target sequence from the second probability sequence by using the connectionist temporal classification algorithm.
8. The apparatus of claim 7, wherein the computing module is specifically configured to:
calculating, according to the first probability sequence and the second probability sequence, the average value of the output probabilities in the first probability sequence and the output probabilities at the same time in the second probability sequence to obtain a third probability sequence;
and calculating a third probability of the forward and backward target sequence from the third probability sequence by using the connectionist temporal classification algorithm.
9. The apparatus of claim 6, wherein the computing module is specifically configured to:
calculating an objective function by using an objective function calculation formula according to the first probability, the second probability and the third probability, wherein the objective function calculation formula is as follows:
g = -log(Pf) - log(Pb) - log(Pfb)
wherein g is the objective function, Pf is the first probability, Pb is the second probability, and Pfb is the third probability.
10. The apparatus of claim 6, wherein the predetermined training algorithm comprises: a back propagation algorithm;
the training module is specifically configured to:
determining, according to the objective function, an error between a prediction sequence obtained after the sequence sample is input into the recognition model and a preset target sequence, wherein the preset target sequence is the preset forward target sequence or the preset backward target sequence;
and training the recognition model by adjusting each parameter of the recognition model according to the error by utilizing the back propagation algorithm.
CN201811019880.1A 2018-09-03 2018-09-03 Recognition model training method and device Pending CN110874553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811019880.1A CN110874553A (en) 2018-09-03 2018-09-03 Recognition model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811019880.1A CN110874553A (en) 2018-09-03 2018-09-03 Recognition model training method and device

Publications (1)

Publication Number Publication Date
CN110874553A true CN110874553A (en) 2020-03-10

Family

ID=69716749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811019880.1A Pending CN110874553A (en) 2018-09-03 2018-09-03 Recognition model training method and device

Country Status (1)

Country Link
CN (1) CN110874553A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737920A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Data processing method, equipment and medium based on recurrent neural network
CN111737920B (en) * 2020-06-24 2024-04-26 深圳前海微众银行股份有限公司 Data processing method, equipment and medium based on cyclic neural network
CN114463376A (en) * 2021-12-24 2022-05-10 北京达佳互联信息技术有限公司 Video character tracking method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11107222B2 (en) Video object tracking
CN109086873B (en) Training method, identification method, device and processing device of recurrent neural network
US20230022387A1 (en) Method and apparatus for image segmentation model training and for image segmentation
CN111741330B (en) Video content evaluation method and device, storage medium and computer equipment
WO2021238262A1 (en) Vehicle recognition method and apparatus, device, and storage medium
Arnelid et al. Recurrent conditional generative adversarial networks for autonomous driving sensor modelling
TWI734375B (en) Image processing method, proposal evaluation method, and related devices
CN114155270A (en) Pedestrian trajectory prediction method, device, device and storage medium
CN110047096B (en) A kind of multi-object tracking method and system based on depth conditions random field models
CN111652181B (en) Target tracking method and device and electronic equipment
Jie et al. Anytime recognition with routing convolutional networks
CN112131944B (en) Video behavior recognition method and system
CN110147699A (en) A kind of image-recognizing method, device and relevant device
CN111950419A (en) Image information prediction method, image information prediction device, computer equipment and storage medium
CN116343080A (en) Dynamic sparse key frame video target detection method, device and storage medium
CN111079507A (en) Behavior recognition method and device, computer device and readable storage medium
CN112614168B (en) Target face tracking method and device, electronic equipment and storage medium
CN112884147A (en) Neural network training method, image processing method, device and electronic equipment
CN113033212B (en) Text data processing method and device
WO2019138897A1 (en) Learning device and method, and program
Ramasso et al. Human action recognition in videos based on the transferable belief model: application to athletics jumps
CN110874553A (en) Recognition model training method and device
CN114004992B (en) Training methods for multi-label classification models, multi-label classification methods for images
US20230386164A1 (en) Method for training an object recognition model in a computing device
CN112396069B (en) Semantic edge detection method, device, system and medium based on joint learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination