CN112466282B - Speech recognition system and method oriented to aerospace professional field - Google Patents
- Publication number
- CN112466282B (application CN202011139217.2A)
- Authority
- CN
- China
- Prior art keywords
- network
- sequence
- long
- short
- memory network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/08—Speech classification or search
- G10L15/083—Recognition networks
- G10L15/26—Speech to text systems
Abstract
An embodiment of the invention discloses a speech recognition system and method for the aerospace professional field. The system comprises: an encoder, formed by a first long short-term memory (LSTM) network, which takes an acoustic feature sequence as input and, after encoding, outputs the hidden representations corresponding to that sequence; a prediction network, formed by a second LSTM network, which first receives the text-sequence start symbol sos and outputs the hidden representation corresponding to the first word of the text sequence, then at each step receives the embedding vector of one word and outputs the hidden representation corresponding to the predicted word; a bias encoding network, formed by a third LSTM network, which takes a sequence of aerospace-domain professional vocabulary as input and outputs the hidden representations corresponding to that vocabulary sequence; and a fusion network, formed by a multi-layer perceptron, which fuses the outputs of the three networks and predicts the next word of the text sequence.
Description
Technical Field
The invention relates to the technical field of electronic information, and more particularly to a speech recognition system and method for the aerospace field.
Background
Voice interaction is one of the most natural modes of human-computer interaction. At its heart is speech recognition, i.e., converting speech into text for subsequent processing by a computer. In recent years, speech recognition has made great breakthroughs and entered practical use. Meanwhile, with the development of aerospace technology, people now have the opportunity to enter space, and enabling astronauts to interact with and control their equipment more naturally and conveniently has become a necessary technology. A speech recognition system for the aerospace field must occupy few system resources and incur low computational cost, while recognizing the professional vocabulary of aerospace equipment accurately.
Currently, there are many speech recognition techniques and systems, such as large-vocabulary speech recognition systems based on hidden Markov models, which are used in many commercial products. These large-vocabulary continuous speech recognition systems typically build decoding networks based on weighted finite-state transducers. The decoding network is very bulky, making the search performed during decoding computationally expensive. The storage and memory footprint of the whole system is high, as is its power consumption during decoding, which limits its application in the aerospace field. If, however, the decoding network is compressed too aggressively, the performance of the recognition system is greatly impaired and the error rate rises sharply.
Therefore, a new speech recognition method and system for the aerospace field is needed, one that reduces computational cost and storage footprint while recognizing both aerospace professional vocabulary and everyday language efficiently and accurately.
Disclosure of Invention
The invention provides a speech recognition system and method for the aerospace professional field, solving the problems of high computational cost and low accuracy on professional vocabulary that afflict existing speech recognition systems.
In order to achieve the above object, the present invention provides the following technical solutions:
A first aspect of the present invention provides a speech recognition system oriented to the aerospace professional field, comprising:
an encoder, formed by a first long short-term memory (LSTM) network, which takes as input the acoustic feature sequence extracted by a signal-processing-based feature extractor and, after encoding, outputs the hidden representations corresponding to the acoustic feature sequence;
a prediction network, formed by a second LSTM network, which first receives the text-sequence start symbol sos and outputs the hidden representation corresponding to the first word of the text sequence; thereafter it receives the embedding vector of one word at each step and outputs, after the prediction network, the hidden representation corresponding to the predicted word;
a bias encoding network, formed by a third LSTM network, which takes as input a sequence of aerospace-domain professional vocabulary and, after encoding, outputs the hidden representations corresponding to that vocabulary sequence;
and a fusion network, formed by a multi-layer perceptron, which takes as input the outputs of the encoder formed by the first LSTM network, the prediction network formed by the second LSTM network, and the bias encoding network formed by the third LSTM network, and predicts the next word of the text sequence.
In a specific embodiment, the encoder formed by the first LSTM network encodes the extracted acoustic feature sequence according to the following formula:
h_t = LSTM(h_{t-1}, x_t)
where LSTM is the unit function of the long short-term memory network, h_t is the hidden representation corresponding to the acoustic feature sequence at time t, h_{t-1} is the hidden representation corresponding to the acoustic feature sequence at time t-1, and x_t is the acoustic feature at time t.
In a specific embodiment, the prediction network formed by the second LSTM network obtains the hidden representation of each word in the corresponding text sequence according to the following formula:
c_j = LSTM(c_{j-1}, y_j)
where LSTM is the unit function of the long short-term memory network, c_{j-1} is the hidden representation corresponding to the word at position j-1, and y_j is the embedding vector of the word at position j.
In a specific embodiment, the bias encoding network formed by the third LSTM network obtains the hidden representations corresponding to the aerospace-domain professional vocabulary sequence according to the following formula:
b_k = LSTM(b_{k-1}, z_k)
where LSTM is the unit function of the long short-term memory network, b_{k-1} is the hidden representation corresponding to the word at position k-1 of the aerospace-domain professional vocabulary sequence, and z_k is the embedding vector of the word at position k of that sequence.
In a specific embodiment, the fusion network formed by the multi-layer perceptron fuses the outputs of the three networks, namely the encoder formed by the first LSTM network, the prediction network formed by the second LSTM network, and the bias encoding network formed by the third LSTM network, and predicts the next word of the text sequence according to the following formula:
P(y_{j+1}) = MLP([c_j, b_k, h_t])
where MLP is the function of the multi-layer perceptron, h_t is the hidden representation corresponding to the acoustic feature sequence at time t extracted by the encoder formed by the first LSTM network, b_k is the hidden representation corresponding to the word at position k of the aerospace-domain professional vocabulary sequence from the bias encoding network formed by the third LSTM network, and c_j is the hidden representation corresponding to the word at position j from the prediction network formed by the second LSTM network.
In a specific embodiment, in the recognition stage, the optimal text sequence is searched out according to the Viterbi algorithm using the following formula:
y* = argmax(trans(x, z, y))
where trans denotes the whole speech recognition system model, argmax selects the candidate with the maximum probability value, x denotes the acoustic feature sequence, z denotes the aerospace-domain professional vocabulary sequence, y ranges over all text sequences, and y* denotes the optimal text sequence.
In a specific embodiment, in response to the user providing an aerospace-domain professional vocabulary sequence z, the system can recognize the corresponding aerospace-domain professional vocabulary.
A second aspect of the invention provides a method of training using the system of the first aspect of the invention, by training with the following loss function:
L(θ) = -log Σ_a P(a | x, z; θ)
where θ denotes the parameters of the whole neural network, a denotes a text sequence containing inserted filler symbols (an alignment of the labelled text sequence), y denotes the labelled text sequence, x denotes the acoustic feature sequence, and z denotes the aerospace-domain professional vocabulary sequence.
A third aspect of the invention provides a speech recognition method using a system trained by the method of the second aspect of the invention, comprising:
inputting the acoustic feature sequence extracted by the signal-processing-based feature extractor into the trained system, and outputting the hidden representations corresponding to the acoustic feature sequence through the encoder formed by the first LSTM network;
in the trained system, inputting the text-sequence start symbol sos into the prediction network formed by the second LSTM network and outputting the hidden representation corresponding to the first word of the text sequence; thereafter inputting the embedding vector of one word at each step and outputting, after the prediction network, the hidden representation corresponding to the predicted word;
in the trained system, inputting the aerospace-domain professional vocabulary sequence into the bias encoding network formed by the third LSTM network and, after encoding, outputting the hidden representations corresponding to that vocabulary sequence;
in the trained system, fusing, in the fusion network formed by the multi-layer perceptron, the outputs of the encoder formed by the first LSTM network, the prediction network formed by the second LSTM network, and the bias encoding network formed by the third LSTM network, to predict the next word of the text sequence.
A fourth aspect of the invention provides a computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to the second or third aspect of the invention.
A fifth aspect of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to the second or third aspect of the invention.
The beneficial effects of the invention are as follows:
according to the aerospace-field-oriented voice recognition method and system, voice recognition is performed by inputting the acoustic feature sequence and the aerospace-field professional vocabulary provided by the user. The method can realize higher recognition accuracy for the professional vocabulary in the aerospace field.
Drawings
In order to illustrate the technical solutions of the embodiments more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 shows a schematic diagram of a speech recognition system for the aerospace field according to one embodiment of the invention.
Detailed Description
In order to illustrate the invention more clearly, it is further described below with reference to preferred embodiments and the accompanying drawings. Like parts are denoted by the same reference numerals throughout the drawings. Persons skilled in the art will understand that the following detailed description is illustrative rather than restrictive, and that the invention is not limited to the details given herein.
A first embodiment of the present invention provides a speech recognition system for the aerospace professional field, as shown in FIG. 1, comprising:
an encoder, formed by a first long short-term memory (LSTM) network, which takes as input the acoustic feature sequence extracted by a signal-processing-based feature extractor and, after encoding, outputs the hidden representations corresponding to the acoustic feature sequence;
a prediction network, formed by a second LSTM network, which first receives the text-sequence start symbol sos and outputs the hidden representation corresponding to the first word of the text sequence; thereafter it receives the embedding vector of one word at each step and outputs, after the prediction network, the hidden representation corresponding to the predicted word;
a bias encoding network, formed by a third LSTM network, which takes as input a sequence of aerospace-domain professional vocabulary and, after encoding, outputs the hidden representations corresponding to that vocabulary sequence;
and a fusion network, formed by a multi-layer perceptron, which takes as input the outputs of the encoder formed by the first LSTM network, the prediction network formed by the second LSTM network, and the bias encoding network formed by the third LSTM network, and predicts the next word of the text sequence.
In a specific embodiment, the encoder formed by the first LSTM network encodes the extracted acoustic feature sequence according to the following formula:
h_t = LSTM(h_{t-1}, x_t)
where LSTM is the unit function of the long short-term memory network, h_t is the hidden representation corresponding to the acoustic feature sequence at time t, h_{t-1} is the hidden representation corresponding to the acoustic feature sequence at time t-1, and x_t is the acoustic feature at time t.
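By way of illustration only, the recurrence h_t = LSTM(h_{t-1}, x_t) can be written out with an explicit LSTM cell in plain NumPy. This is a sketch, not the patented implementation: all dimensions, parameter values, and the toy input sequence are assumptions, and the internal cell state c carried alongside h is LSTM state that the shorthand unit-function notation above folds away.

```python
import numpy as np

def lstm_cell(h_prev, c_prev, x, W, U, b):
    """One step of h_t = LSTM(h_{t-1}, x_t).

    W, U, b hold the stacked input/forget/candidate/output gate parameters;
    c_prev is the internal cell state of the LSTM.
    """
    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))
    z = W @ x + U @ h_prev + b          # stacked gate pre-activations
    d = h_prev.shape[0]
    i = sigmoid(z[0 * d:1 * d])         # input gate
    f = sigmoid(z[1 * d:2 * d])         # forget gate
    g = np.tanh(z[2 * d:3 * d])         # candidate cell update
    o = sigmoid(z[3 * d:4 * d])         # output gate
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden representation
    return h, c

# Encode a toy acoustic feature sequence x_1..x_T into hidden states h_1..h_T.
rng = np.random.default_rng(0)
d_in, d_hid, T = 3, 4, 5                # toy sizes (assumptions)
W = rng.standard_normal((4 * d_hid, d_in)) * 0.1
U = rng.standard_normal((4 * d_hid, d_hid)) * 0.1
b = np.zeros(4 * d_hid)
h = np.zeros(d_hid)
c = np.zeros(d_hid)
hidden = []
for t in range(T):
    x_t = rng.standard_normal(d_in)     # stand-in for an extracted feature
    h, c = lstm_cell(h, c, x_t, W, U, b)
    hidden.append(h)
print(len(hidden), hidden[-1].shape)
```

The same recurrence, with separate parameters, serves the prediction network (over word embeddings y_j) and the bias encoding network (over vocabulary embeddings z_k).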
In a specific embodiment, the prediction network formed by the second LSTM network obtains the hidden representation of each word in the corresponding text sequence according to the following formula:
c_j = LSTM(c_{j-1}, y_j)
where LSTM is the unit function of the long short-term memory network, c_{j-1} is the hidden representation corresponding to the word at position j-1, and y_j is the embedding vector of the word at position j.
In a specific embodiment, the bias encoding network formed by the third LSTM network obtains the hidden representations corresponding to the aerospace-domain professional vocabulary sequence according to the following formula:
b_k = LSTM(b_{k-1}, z_k)
where LSTM is the unit function of the long short-term memory network, b_{k-1} is the hidden representation corresponding to the word at position k-1 of the aerospace-domain professional vocabulary sequence, and z_k is the embedding vector of the word at position k of that sequence.
In a specific embodiment, the fusion network formed by the multi-layer perceptron fuses the outputs of the three networks, namely the encoder formed by the first LSTM network, the prediction network formed by the second LSTM network, and the bias encoding network formed by the third LSTM network, and predicts the next word of the text sequence according to the following formula:
P(y_{j+1}) = MLP([c_j, b_k, h_t])
where MLP is the function of the multi-layer perceptron, and c_j, b_k, and h_t are the hidden representations produced by the prediction network, the bias encoding network, and the encoder, respectively.
In one specific embodiment, the MLP is composed of weight matrices and a nonlinear function:
MLP(q) = W_2 max(W_1 q, 0)
where W_1 and W_2 are parameter matrices and q is the input vector.
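The fusion step can be sketched in NumPy: the three hidden representations are concatenated and passed through MLP(q) = W_2 max(W_1 q, 0), and a softmax turns the result into a distribution over the next word. The softmax normalization, the layer sizes, and the random weights are illustrative assumptions, not details given in the embodiment.

```python
import numpy as np

def mlp(q, W1, W2):
    # MLP(q) = W_2 * max(W_1 q, 0): one ReLU hidden layer, no biases,
    # matching the formula of this embodiment.
    return W2 @ np.maximum(W1 @ q, 0.0)

def fuse(c_j, b_k, h_t, W1, W2):
    # Concatenate the prediction-network, bias-network, and encoder
    # hidden representations, then map to a probability distribution
    # over the output vocabulary (softmax assumed here).
    q = np.concatenate([c_j, b_k, h_t])
    logits = mlp(q, W1, W2)
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d, vocab = 4, 10                        # toy sizes (assumptions)
W1 = rng.standard_normal((8, 3 * d)) * 0.1
W2 = rng.standard_normal((vocab, 8)) * 0.1
p = fuse(rng.standard_normal(d),        # toy c_j
         rng.standard_normal(d),        # toy b_k
         rng.standard_normal(d),        # toy h_t
         W1, W2)
print(p.shape, float(p.sum()))
```

Each entry of p plays the role of P(y_{j+1}) for one candidate next word.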
In a specific embodiment, in the recognition stage, the optimal text sequence is searched out according to the Viterbi algorithm using the following formula:
y* = argmax(trans(x, z, y))
where trans denotes the whole speech recognition system model, argmax selects the candidate with the maximum probability value, x denotes the acoustic feature sequence, z denotes the aerospace-domain professional vocabulary sequence, y ranges over all text sequences, and y* denotes the optimal text sequence.
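The search y* = argmax(trans(x, z, y)) can be illustrated with a toy stand-in scorer. In the real system, trans is the trained transducer model and the search runs via the Viterbi algorithm over the full hypothesis space; the sketch below merely enumerates a few hypothetical candidate transcripts and shows how a user-supplied bias vocabulary z can tip the argmax toward the domain term. The scorer, the candidates, and the vocabulary words are all assumptions for illustration.

```python
# Toy stand-in for trans(x, z, y): scores a candidate text sequence y
# given acoustics x and bias vocabulary z. Here the score simply rewards
# candidates containing a word from z and penalizes length; the real
# model would return the transducer's sequence probability.
def trans(x, z, y):
    base = -len(y)                          # prefer shorter hypotheses
    bonus = sum(2.0 for w in y if w in z)   # reward bias-vocabulary hits
    return base + bonus

x = None                                    # acoustics unused by the toy scorer
z = {"docking", "airlock"}                  # hypothetical aerospace vocabulary
candidates = [
    ["open", "the", "air", "lock"],
    ["open", "the", "airlock"],
    ["begin", "the", "docking", "sequence", "now"],
]
# y* = argmax_y trans(x, z, y), here by exhaustive search over candidates.
y_star = max(candidates, key=lambda y: trans(x, z, y))
print(y_star)
```

With the bias term present, the hypothesis containing "airlock" wins over the phonetically similar "air lock" split.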
In a specific embodiment, in response to the user providing an aerospace-domain professional vocabulary sequence z, the system can recognize the corresponding aerospace-domain professional vocabulary.
A second embodiment of the present invention provides a method of training using the system according to the first embodiment of the present invention, by training with the following loss function:
L(θ) = -log Σ_a P(a | x, z; θ)
where θ denotes the parameters of the whole neural network, a denotes a text sequence containing inserted filler symbols (an alignment of the labelled text sequence), y denotes the labelled text sequence, x denotes the acoustic feature sequence, and z denotes the aerospace-domain professional vocabulary sequence.
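The source does not reproduce the loss formula itself, but the variables it defines (network parameters θ, filler-augmented alignment sequences a of the labelled text y, acoustic features x, and bias vocabulary z) match a transducer-style negative log-likelihood, -log Σ_a P(a | x, z; θ). Under that assumption, a toy numerical reading for one utterance with three hypothetical alignments is:

```python
import numpy as np

def nll(alignment_probs):
    # Negative log-likelihood of the labelled text: probability mass is
    # summed over all alignments a that insert filler symbols into y,
    # i.e. L(theta) = -log sum_a P(a | x, z; theta) (assumed form).
    return -np.log(np.sum(alignment_probs))

# Toy probabilities P(a | x, z) for three hypothetical alignments.
probs = np.array([0.05, 0.02, 0.03])
loss = nll(probs)
print(loss)
```

Training would minimize this quantity over θ by gradient descent; the toy probabilities here are fixed numbers, not model outputs.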
A third embodiment of the present invention provides a speech recognition method using a system trained by the method of the second embodiment of the present invention, comprising:
inputting the acoustic feature sequence extracted by the signal-processing-based feature extractor into the trained system, and outputting the hidden representations corresponding to the acoustic feature sequence through the encoder formed by the first LSTM network;
in the trained system, inputting the text-sequence start symbol sos into the prediction network formed by the second LSTM network and outputting the hidden representation corresponding to the first word of the text sequence; thereafter inputting the embedding vector of one word at each step and outputting, after the prediction network, the hidden representation corresponding to the predicted word;
in the trained system, inputting the aerospace-domain professional vocabulary sequence into the bias encoding network formed by the third LSTM network and, after encoding, outputting the hidden representations corresponding to that vocabulary sequence;
in the trained system, fusing, in the fusion network formed by the multi-layer perceptron, the outputs of the encoder formed by the first LSTM network, the prediction network formed by the second LSTM network, and the bias encoding network formed by the third LSTM network, to predict the next word of the text sequence.
A fourth embodiment of the invention provides a computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to the second or third embodiment of the invention.
A fifth embodiment of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to the second or third embodiment of the invention.
It should be understood that the foregoing examples of the present invention are provided merely for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention, and that various other changes and modifications may be made therein by one skilled in the art without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (11)
1. A speech recognition system for the aerospace profession field, comprising:
an encoder, formed by a first long short-term memory (LSTM) network, which takes as input the acoustic feature sequence extracted by a signal-processing-based feature extractor and, after encoding, outputs the hidden representations corresponding to the acoustic feature sequence;
a prediction network, formed by a second LSTM network, which first receives the text-sequence start symbol sos and outputs the hidden representation corresponding to the first word of the text sequence; thereafter it receives the embedding vector of one word at each step and outputs, after the prediction network, the hidden representation corresponding to the predicted word;
a bias encoding network, formed by a third LSTM network, which takes as input a sequence of aerospace-domain professional vocabulary and, after encoding, outputs the hidden representations corresponding to that vocabulary sequence;
and a fusion network, formed by a multi-layer perceptron, which takes as input the outputs of the encoder formed by the first LSTM network, the prediction network formed by the second LSTM network, and the bias encoding network formed by the third LSTM network, and predicts the next word of the text sequence.
2. The system of claim 1, wherein the encoder formed by the first long short-term memory network encodes the extracted acoustic feature sequence according to the following formula:
h_t = LSTM(h_{t-1}, x_t)
where LSTM is the unit function of the long short-term memory network, h_t is the hidden representation corresponding to the acoustic feature sequence at time t, h_{t-1} is the hidden representation corresponding to the acoustic feature sequence at time t-1, and x_t is the acoustic feature at time t.
3. The system of claim 1, wherein the prediction network formed by the second long short-term memory network derives the hidden representation of each word in the corresponding text sequence according to the following formula:
c_j = LSTM(c_{j-1}, y_j)
where LSTM is the unit function of the long short-term memory network, c_{j-1} is the hidden representation corresponding to the word at position j-1, and y_j is the embedding vector of the word at position j.
4. The system of claim 1, wherein the bias encoding network obtains the hidden representations corresponding to the aerospace-domain professional vocabulary sequence according to the following formula:
b_k = LSTM(b_{k-1}, z_k)
where LSTM is the unit function of the long short-term memory network, b_{k-1} is the hidden representation corresponding to the word at position k-1 of the aerospace-domain professional vocabulary sequence, and z_k is the embedding vector of the word at position k of that sequence.
5. The system of claim 1, wherein the fusion network formed by the multi-layer perceptron fuses the outputs of three networks, namely the encoder formed by the first long short-term memory network, the prediction network formed by the second long short-term memory network, and the bias encoding network formed by the third long short-term memory network, to predict the next word of the text sequence according to the following formula:
P(y_{j+1}) = MLP([c_j, b_k, h_t])
where MLP is the function of the multi-layer perceptron, h_t is the hidden representation corresponding to the acoustic feature sequence at time t extracted by the encoder formed by the first long short-term memory network, b_k is the hidden representation corresponding to the word at position k of the aerospace-domain professional vocabulary sequence from the bias encoding network formed by the third long short-term memory network, and c_j is the hidden representation corresponding to the word at position j from the prediction network formed by the second long short-term memory network.
6. The system of claim 1, wherein, during the recognition stage, the optimal text sequence is searched out according to the Viterbi algorithm using the following formula:
y* = argmax(trans(x, z, y))
where trans denotes the whole speech recognition system model, argmax selects the candidate with the maximum probability value, x denotes the acoustic feature sequence, z denotes the aerospace-domain professional vocabulary sequence, y ranges over all text sequences, and y* denotes the optimal text sequence.
7. The system of claim 6, wherein the system, in response to a user providing an aerospace-domain professional vocabulary sequence z, recognizes the corresponding aerospace-domain professional vocabulary.
8. A method of training the system of any of claims 1-7, characterized by training with the following loss function:
L(θ) = -log Σ_a P(a | x, z; θ)
where θ denotes the parameters of the whole neural network, a denotes a text sequence containing inserted filler symbols (an alignment of the labelled text sequence), y denotes the labelled text sequence, x denotes the acoustic feature sequence, and z denotes the aerospace-domain professional vocabulary sequence.
9. A speech recognition method using a system trained by the method of claim 8, comprising:
inputting the acoustic feature sequence extracted by the signal-processing-based feature extractor into the trained system, and outputting the hidden representations corresponding to the acoustic feature sequence through the encoder formed by the first long short-term memory network;
in the trained system, inputting the text-sequence start symbol sos into the prediction network formed by the second long short-term memory network and outputting the hidden representation corresponding to the first word of the text sequence; thereafter inputting the embedding vector of one word at each step and outputting, after the prediction network, the hidden representation corresponding to the predicted word;
in the trained system, inputting the aerospace-domain professional vocabulary sequence into the bias encoding network formed by the third long short-term memory network and, after encoding, outputting the hidden representations corresponding to that vocabulary sequence;
in the trained system, fusing, in the fusion network formed by the multi-layer perceptron, the outputs of the encoder formed by the first long short-term memory network, the prediction network formed by the second long short-term memory network, and the bias encoding network formed by the third long short-term memory network, to predict the next word of the text sequence.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method of claim 8 or 9.
11. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method of claim 8 or 9.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011139217.2A CN112466282B (en) | 2020-10-22 | 2020-10-22 | Speech recognition system and method oriented to aerospace professional field |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112466282A CN112466282A (en) | 2021-03-09 |
| CN112466282B (en) | 2023-11-28 |
Family
ID=74834120
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011139217.2A Active CN112466282B (en) | 2020-10-22 | 2020-10-22 | Speech recognition system and method oriented to aerospace professional field |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112466282B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113903358B (en) * | 2021-10-15 | 2022-11-04 | 贝壳找房(北京)科技有限公司 | Voice quality inspection method, readable storage medium and computer program product |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106328122A (en) * | 2016-08-19 | 2017-01-11 | 深圳市唯特视科技有限公司 | Voice identification method using long-short term memory model recurrent neural network |
| CN107408111A (en) * | 2015-11-25 | 2017-11-28 | 百度(美国)有限责任公司 | End-to-end speech recognition |
| CN110738984A (en) * | 2019-05-13 | 2020-01-31 | 苏州闪驰数控系统集成有限公司 | Artificial intelligence CNN, LSTM neural network speech recognition system |
| CN110970031A (en) * | 2019-12-16 | 2020-04-07 | 苏州思必驰信息科技有限公司 | Speech recognition system and method |
| CN111785257A (en) * | 2020-07-10 | 2020-10-16 | 四川大学 | A method and device for air-traffic speech recognition for a small number of labeled samples |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10170114B2 (en) * | 2013-05-30 | 2019-01-01 | Promptu Systems Corporation | Systems and methods for adaptive proper name entity recognition and understanding |
| EP3510594B1 (en) * | 2016-10-10 | 2020-07-01 | Google LLC | Very deep convolutional neural networks for end-to-end speech recognition |
| CN111429889B (en) * | 2019-01-08 | 2023-04-28 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and computer readable storage medium for real-time speech recognition based on truncated attention |
- 2020-10-22: application CN202011139217.2A filed in CN; patent CN112466282B (status: active)
Non-Patent Citations (2)
| Title |
|---|
| "Music Chord Recognition Based on Midi-Trained Deep Feature and BLSTM-CRF Hybrid Decoding"; Yiming Wu; 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); full text * |
| "Research on Deep Network Models for Air-Ground Speech Recognition"; Qiu Yi; China Master's Theses Full-text Database; full text * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112466282A (en) | 2021-03-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113987169B (en) | Methods, apparatus, devices, and storage media for semantic block-based text summarization generation | |
| CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
| JP7346788B2 (en) | Speech recognition model training methods, devices, equipment, and storage media | |
| CN115309877B (en) | Dialogue generation method, dialogue model training method and device | |
| CN110196967B (en) | Sequence labeling method and device based on deep conversion architecture | |
| CN110427625B (en) | Sentence completion method, apparatus, medium, and dialogue processing system | |
| KR20180001889A (en) | Language processing method and apparatus | |
| JP2016110082A (en) | Language model training method and apparatus, and speech recognition method and apparatus | |
| CN113655893B (en) | A word and sentence generation method, model training method and related equipment | |
| CN112037773B (en) | An N-optimal spoken language semantic recognition method, device and electronic device | |
| CN117708692A (en) | Entity emotion analysis method and system based on double-channel graph convolution neural network | |
| CN115831102A (en) | Speech recognition method and device based on pre-training feature representation and electronic equipment | |
| CN113160820B (en) | Speech recognition method, training method, device and equipment of speech recognition model | |
| CN109979461B (en) | Voice translation method and device | |
| JP7765622B2 (en) | Fusion of acoustic and textual representations in an automatic speech recognition system implemented as an RNN-T | |
| CN115238048A (en) | A Fast Interactive Approach to Joint Intent Recognition and Slot Filling | |
| CN114912441A (en) | Text error correction model generation method, error correction method, system, device and medium | |
| CN109933773A (en) | A kind of multiple semantic sentence analysis system and method | |
| CN115101050A (en) | Speech recognition model training method and device, speech recognition method and medium | |
| KR20210058765A (en) | Speech recognition method, device, electronic device and storage media | |
| CN114373443A (en) | Speech synthesis method and apparatus, computing device, storage medium, and program product | |
| CN120239884A (en) | Semi-supervised training scheme for speech recognition | |
| CN116306612A (en) | A method for generating words and sentences and related equipment | |
| CN112257432A (en) | Adaptive intent recognition method, device and electronic device | |
| CN112466282B (en) | Speech recognition system and method oriented to aerospace professional field |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||