
CN112466282B - Speech recognition system and method oriented to aerospace professional field - Google Patents


Info

Publication number
CN112466282B
CN112466282B (application CN202011139217.2A)
Authority
CN
China
Prior art keywords
network, sequence, long, short, memory network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011139217.2A
Other languages
Chinese (zh)
Other versions
CN112466282A (en)
Inventor
温正棋
李博
刘进涛
任斌
李振龙
周仔恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simulation Center
Original Assignee
Beijing Simulation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simulation Center filed Critical Beijing Simulation Center
Priority to CN202011139217.2A priority Critical patent/CN112466282B/en
Publication of CN112466282A publication Critical patent/CN112466282A/en
Application granted granted Critical
Publication of CN112466282B publication Critical patent/CN112466282B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search
    • G10L15/083: Recognition networks
    • G10L15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a speech recognition system and method for the aerospace professional field. The system comprises: an encoder, composed of a first long-short-time memory network, which takes an acoustic feature sequence as input and, after encoding, outputs the hidden representations corresponding to the acoustic feature sequence; a prediction network, composed of a second long-short-time memory network, which first receives the text-sequence start symbol sos and outputs the hidden representation corresponding to the first word of the text sequence, then receives the embedding vector of one word at a time and outputs the hidden representation corresponding to the predicted word; a bias encoding network, composed of a third long-short-time memory network, which takes an aerospace professional vocabulary sequence as input and outputs the corresponding hidden representations; and a fusion network, composed of a multi-layer perceptron, which fuses the output results of the three networks and predicts the next word of the text sequence.

Description

Speech recognition system and method oriented to aerospace professional field
Technical Field
The invention relates to the technical field of electronic information, and more particularly to a speech recognition system and method oriented to the aerospace professional field.
Background
Voice interaction is one of the most natural modes of human-machine interaction. At its heart is speech recognition, i.e., converting speech into text for subsequent processing by a computer. In recent years, speech recognition has made great breakthroughs and has been put into practical use. Meanwhile, with the development of aerospace technology, people now have the opportunity to enter space, and enabling astronauts to interact with and control equipment more naturally and conveniently has become a necessary technology. A speech recognition system for the aerospace field must occupy fewer system resources and have a lower computation cost, while recognizing the professional vocabulary of aerospace equipment more accurately.
Currently, there are many speech recognition techniques and systems, such as large-vocabulary speech recognition systems based on hidden Markov models, which are used in many commercial products. These large-vocabulary continuous speech recognition systems often build decoding networks based on weighted finite-state transducers. The decoding network is very large, making the search during decoding computationally expensive. The storage and memory footprint of the whole system is high, and the power consumption during decoding is also high, which limits its application in the aerospace field. However, if the decoding network is compressed too aggressively, the performance of the recognition system is greatly impaired and the error rate rises sharply.
Therefore, a new speech recognition method and system oriented to the aerospace field is needed, one that reduces computation cost and storage footprint while recognizing both aerospace professional vocabulary and everyday language efficiently and accurately.
Disclosure of Invention
The invention provides a speech recognition system and method for the aerospace professional field, which address the high computation cost and the low accuracy on professional vocabulary of existing speech recognition systems.
In order to achieve the above object, the present invention provides the following technical solutions:
the first aspect of the present invention provides a speech recognition system oriented to the aerospace professional field, comprising:
an encoder, composed of a first long-short-time memory network, which takes as input the acoustic feature sequence extracted by a signal-processing-based feature extractor and, after encoding, outputs the hidden representations corresponding to the acoustic feature sequence;
a prediction network, composed of a second long-short-time memory network, which first receives the text-sequence start symbol sos and outputs the hidden representation corresponding to the first word of the text sequence, then receives the embedding vector of one word at a time and outputs the hidden representation corresponding to the predicted word;
a bias encoding network, composed of a third long-short-time memory network, which takes as input an aerospace professional vocabulary sequence and, after encoding, outputs the corresponding hidden representations;
and a fusion network, composed of a multi-layer perceptron, which takes as input the output results of the encoder, the prediction network, and the bias encoding network, and predicts the next word of the text sequence.
In a specific embodiment, the encoder composed of the first long-short-time memory network encodes the extracted acoustic feature sequence according to the following formula:
h_t = LSTM(h_{t-1}, x_t)
where LSTM is the unit function of a long-short-time memory network, h_t is the hidden representation corresponding to the acoustic feature sequence at time t, h_{t-1} is the hidden representation at time t-1, and x_t is the acoustic feature at time t.
In a specific embodiment, the prediction network composed of the second long-short-time memory network obtains the hidden representation of each word of the text sequence according to the following formula:
c_j = LSTM(c_{j-1}, y_j)
where LSTM is the unit function of a long-short-time memory network, c_{j-1} is the hidden representation corresponding to the word at position j-1, and y_j is the embedding vector of the word at position j.
In a specific embodiment, the bias encoding network composed of the third long-short-time memory network obtains the hidden representations corresponding to the aerospace professional vocabulary sequence according to the following formula:
b_k = LSTM(b_{k-1}, z_k)
where LSTM is the unit function of a long-short-time memory network, b_{k-1} is the hidden representation corresponding to the word at position k-1 of the aerospace professional vocabulary sequence, and z_k is the embedding vector of the word at position k of the aerospace professional vocabulary sequence.
In a specific embodiment, the fusion network composed of the multi-layer perceptron fuses the output results of the three networks, namely the encoder composed of the first long-short-time memory network, the prediction network composed of the second long-short-time memory network, and the bias encoding network composed of the third long-short-time memory network, and predicts the next word of the text sequence according to the following formula:
P(y_{j+1}) = MLP([c_j, b_k, h_t])
where MLP is the function of the multi-layer perceptron, h_t is the hidden representation corresponding to the acoustic feature sequence at time t extracted by the encoder, b_k is the hidden representation corresponding to the word at position k of the aerospace professional vocabulary sequence from the bias encoding network, and c_j is the hidden representation corresponding to the word at position j from the prediction network.
In a specific embodiment, in the recognition stage, the optimal text sequence is searched out with the Viterbi algorithm according to the following formula:
y* = argmax_y trans(x, z, y)
where trans denotes the whole speech recognition system model, argmax selects the word sequence with the maximum probability, x denotes the acoustic feature sequence, z denotes the aerospace professional vocabulary sequence, y ranges over all text sequences, and y* denotes the optimal text sequence.
In a specific embodiment, in response to the user providing an aerospace professional vocabulary sequence z, the system can recognize the corresponding aerospace professional vocabulary.
A second aspect of the invention provides a method of training the system of the first aspect, using the following loss function:
L(θ) = -log Σ_a P(a | x, z; θ)
where the sum runs over the text sequences a obtained by inserting filling symbols into the labelled text sequence y, θ denotes the parameters of the whole neural network, x denotes the acoustic feature sequence, and z denotes the aerospace professional vocabulary sequence.
A third aspect of the invention provides a method of speech recognition using a system trained by the method of the second aspect, comprising:
inputting the acoustic feature sequence extracted by the signal-processing-based feature extractor into the trained system, and outputting the hidden representations corresponding to the acoustic feature sequence through the encoder composed of the first long-short-time memory network;
in the trained system, inputting the text-sequence start symbol sos into the prediction network composed of the second long-short-time memory network, outputting the hidden representation corresponding to the first word of the text sequence, then inputting the embedding vector of one word at a time and outputting the hidden representation corresponding to the predicted word;
in the trained system, inputting the aerospace professional vocabulary sequence into the bias encoding network composed of the third long-short-time memory network and, after encoding, outputting the corresponding hidden representations;
in the trained system, fusing, by the fusion network composed of the multi-layer perceptron, the output results of the encoder, the prediction network, and the bias encoding network to predict the next word of the text sequence.
A fourth aspect of the invention provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to the second or third aspect of the invention.
A fifth aspect of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to the second or third aspect of the invention when executing the program.
The beneficial effects of the invention are as follows:
according to the aerospace-field-oriented voice recognition method and system, voice recognition is performed by inputting the acoustic feature sequence and the aerospace-field professional vocabulary provided by the user. The method can realize higher recognition accuracy for the professional vocabulary in the aerospace field.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic diagram of a speech recognition system for the aerospace field according to one embodiment of the invention.
Detailed Description
In order to more clearly illustrate the present invention, the present invention will be further described with reference to preferred embodiments and the accompanying drawings. Like parts in the drawings are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and that this invention is not limited to the details given herein.
A first embodiment of the present invention provides a speech recognition system for the aerospace professional field, as shown in FIG. 1, comprising:
an encoder, composed of a first long-short-time memory network, which takes as input the acoustic feature sequence extracted by a signal-processing-based feature extractor and, after encoding, outputs the hidden representations corresponding to the acoustic feature sequence;
a prediction network, composed of a second long-short-time memory network, which first receives the text-sequence start symbol sos and outputs the hidden representation corresponding to the first word of the text sequence, then receives the embedding vector of one word at a time and outputs the hidden representation corresponding to the predicted word;
a bias encoding network, composed of a third long-short-time memory network, which takes as input an aerospace professional vocabulary sequence and, after encoding, outputs the corresponding hidden representations;
and a fusion network, composed of a multi-layer perceptron, which takes as input the output results of the encoder, the prediction network, and the bias encoding network, and predicts the next word of the text sequence.
In a specific embodiment, the encoder composed of the first long-short-time memory network encodes the extracted acoustic feature sequence according to the following formula:
h_t = LSTM(h_{t-1}, x_t)
where LSTM is the unit function of a long-short-time memory network, h_t is the hidden representation corresponding to the acoustic feature sequence at time t, h_{t-1} is the hidden representation at time t-1, and x_t is the acoustic feature at time t.
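As an illustration only (not part of the patent), the encoder recurrence h_t = LSTM(h_{t-1}, x_t) can be sketched in pure Python. All weight names and dimensions below are hypothetical, and biases are omitted for brevity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(h_prev, c_prev, x, W):
    # W maps the concatenated [h_{t-1}; x_t] to the four gates
    # (input, forget, output, candidate); biases omitted for brevity.
    concat = h_prev + x  # list concatenation
    def affine(rows):
        return [sum(w * v for w, v in zip(row, concat)) for row in rows]
    i = [sigmoid(v) for v in affine(W["i"])]
    f = [sigmoid(v) for v in affine(W["f"])]
    o = [sigmoid(v) for v in affine(W["o"])]
    g = [math.tanh(v) for v in affine(W["g"])]
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

def encode(frames, W, hidden):
    # h_t = LSTM(h_{t-1}, x_t): run the cell over the acoustic frames
    # and collect the hidden representation at every time step.
    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs = []
    for x in frames:
        h, c = lstm_step(h, c, x, W)
        outputs.append(h)
    return outputs
```

A production encoder would use an optimized LSTM implementation from a deep-learning framework rather than this didactic loop.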
In a specific embodiment, the prediction network composed of the second long-short-time memory network obtains the hidden representation of each word of the text sequence according to the following formula:
c_j = LSTM(c_{j-1}, y_j)
where LSTM is the unit function of a long-short-time memory network, c_{j-1} is the hidden representation corresponding to the word at position j-1, and y_j is the embedding vector of the word at position j.
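A minimal sketch (not from the patent) of the prediction network's input convention: the start symbol sos is fed first, then the embedding of one word per step. A plain tanh recurrence stands in for the LSTM cell here, and the embedding table and weights are invented for illustration:

```python
import math

def predict_step(c_prev, y_emb, W_h, W_y):
    # c_j = Cell(c_{j-1}, y_j): one recurrence step. A plain tanh cell is
    # used as a stand-in for the LSTM cell to keep the sketch short.
    return [math.tanh(sum(wh * cv for wh, cv in zip(row_h, c_prev)) +
                      sum(wy * yv for wy, yv in zip(row_y, y_emb)))
            for row_h, row_y in zip(W_h, W_y)]

def run_prediction_network(tokens, embeddings, W_h, W_y, hidden):
    # The first input is the start symbol <sos>; afterwards the embedding
    # of each already-predicted word is fed in turn.
    c = [0.0] * hidden
    states = []
    for tok in ["<sos>"] + tokens:
        c = predict_step(c, embeddings[tok], W_h, W_y)
        states.append(c)
    return states
```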
In a specific embodiment, the bias encoding network composed of the third long-short-time memory network obtains the hidden representations corresponding to the aerospace professional vocabulary sequence according to the following formula:
b_k = LSTM(b_{k-1}, z_k)
where LSTM is the unit function of a long-short-time memory network, b_{k-1} is the hidden representation corresponding to the word at position k-1 of the aerospace professional vocabulary sequence, and z_k is the embedding vector of the word at position k of the aerospace professional vocabulary sequence.
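The bias recurrence b_k = LSTM(b_{k-1}, z_k) can be sketched the same way; since the vocabulary list is fixed for a session, its hidden representations can be computed once and reused across utterances. Again a plain tanh cell stands in for the LSTM, and all names are hypothetical:

```python
import math

def encode_bias_vocab(vocab, embeddings, W_b, W_z, hidden):
    # b_k = Cell(b_{k-1}, z_k): encode the user-provided aerospace terms.
    # Every step's state b_k is kept so the fusion network can use the
    # vocabulary representation when predicting the next word.
    b = [0.0] * hidden
    states = []
    for term in vocab:
        z = embeddings[term]
        b = [math.tanh(sum(w * v for w, v in zip(row_b, b)) +
                       sum(w * v for w, v in zip(row_z, z)))
             for row_b, row_z in zip(W_b, W_z)]
        states.append(b)
    return states
```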
In a specific embodiment, the fusion network composed of the multi-layer perceptron fuses the output results of the three networks, namely the encoder composed of the first long-short-time memory network, the prediction network composed of the second long-short-time memory network, and the bias encoding network composed of the third long-short-time memory network, and predicts the next word of the text sequence according to the following formula:
P(y_{j+1}) = MLP([c_j, b_k, h_t])
where MLP is the function of the multi-layer perceptron, and c_j, b_k, and h_t are the hidden representations output by the prediction network, the bias encoding network, and the encoder, respectively.
In one specific embodiment, the MLP is composed of parameter matrices and a nonlinear function:
MLP(q) = W_2 max(W_1 q, 0)
where W_1 and W_2 are parameter matrices and q is the input vector.
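The fusion step MLP(q) = W_2 max(W_1 q, 0), followed by a softmax to turn the logits into the distribution P(y_{j+1}), can be sketched as follows; the dimensions and weights are hypothetical:

```python
import math

def mlp(q, W1, W2):
    # MLP(q) = W2 . max(W1 . q, 0): one hidden layer with ReLU.
    hidden = [max(sum(w * v for w, v in zip(row, q)), 0.0) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse(c_j, b_k, h_t, W1, W2):
    # P(y_{j+1}) = softmax(MLP([c_j, b_k, h_t])): concatenate the three
    # hidden representations and map them to a distribution over words.
    return softmax(mlp(c_j + b_k + h_t, W1, W2))
```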
In a specific embodiment, in the recognition stage, the optimal text sequence is searched out with the Viterbi algorithm according to the following formula:
y* = argmax_y trans(x, z, y)
where trans denotes the whole speech recognition system model, argmax selects the word sequence with the maximum probability, x denotes the acoustic feature sequence, z denotes the aerospace professional vocabulary sequence, y ranges over all text sequences, and y* denotes the optimal text sequence.
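As a hedged illustration of the search objective y* = argmax_y trans(x, z, y): the sketch below uses greedy decoding (pick the best word at each step) in place of the Viterbi/beam search over the transducer lattice that the patent describes, and `step_probs_fn` is a stand-in for the trained model:

```python
def greedy_decode(step_probs_fn, vocab, max_len=10, eos="<eos>"):
    # Greedy stand-in for y* = argmax_y trans(x, z, y): at each step pick
    # the word with the highest fused probability. A real system would run
    # beam/Viterbi search over the transducer lattice instead.
    # step_probs_fn(prefix) is assumed to return P(next word | prefix).
    prefix = []
    for _ in range(max_len):
        probs = step_probs_fn(prefix)
        best = max(vocab, key=lambda w: probs[w])
        if best == eos:
            break
        prefix.append(best)
    return prefix
```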
In a specific embodiment, in response to the user providing an aerospace professional vocabulary sequence z, the system can recognize the corresponding aerospace professional vocabulary.
A second embodiment of the present invention provides a method of training the system of the first embodiment, using the following loss function:
L(θ) = -log Σ_a P(a | x, z; θ)
where the sum runs over the text sequences a obtained by inserting filling symbols into the labelled text sequence y, θ denotes the parameters of the whole neural network, x denotes the acoustic feature sequence, and z denotes the aerospace professional vocabulary sequence.
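To make the training objective concrete: the sketch below (not from the patent) computes the negative log-likelihood -log P(y | x, z) for a single alignment from per-step probabilities. The full transducer loss would additionally sum P(a | x, z; θ) over all alignment sequences a containing filling symbols, which is omitted here:

```python
import math

def sequence_nll(step_probs, target):
    # -log P(y | x, z): sum of per-step negative log-probabilities of the
    # labelled words, for one fixed alignment. The full loss marginalises
    # over all filler-symbol alignments of y.
    return -sum(math.log(p[w]) for p, w in zip(step_probs, target))
```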
A third embodiment of the present invention provides a method of speech recognition using a system trained by the method of the second embodiment, comprising:
inputting the acoustic feature sequence extracted by the signal-processing-based feature extractor into the trained system, and outputting the hidden representations corresponding to the acoustic feature sequence through the encoder composed of the first long-short-time memory network;
in the trained system, inputting the text-sequence start symbol sos into the prediction network composed of the second long-short-time memory network, outputting the hidden representation corresponding to the first word of the text sequence, then inputting the embedding vector of one word at a time and outputting the hidden representation corresponding to the predicted word;
in the trained system, inputting the aerospace professional vocabulary sequence into the bias encoding network composed of the third long-short-time memory network and, after encoding, outputting the corresponding hidden representations;
in the trained system, fusing, by the fusion network composed of the multi-layer perceptron, the output results of the encoder, the prediction network, and the bias encoding network to predict the next word of the text sequence.
A fourth embodiment of the invention provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to the second or third embodiment of the invention.
A fifth embodiment of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to the second or third embodiment of the invention when executing the program.
It should be understood that the foregoing examples of the present invention are provided merely for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention, and that various other changes and modifications may be made therein by one skilled in the art without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (11)

1. A speech recognition system for the aerospace profession field, comprising:
the encoder is composed of a first long-short-time memory network and is used for inputting the acoustic feature sequence extracted by the feature extractor based on signal processing, and outputting hidden representation corresponding to the acoustic feature sequence after encoding;
the prediction network formed by the second long-short-time memory network inputs a text sequence initial symbol sos, outputs hidden representation corresponding to a first word of the text sequence through the prediction network, inputs an embedded vector of a word every time, and outputs hidden representation corresponding to the predicted word after passing through the prediction network;
the bias coding network is composed of a third long-short-time memory network and is used for inputting a professional vocabulary sequence in the aerospace field and outputting hidden representations corresponding to the professional vocabulary sequence in the aerospace field after coding;
and inputting output results of the encoder formed by the first long-short-time memory network, the prediction network formed by the second long-short-time memory network and the offset coding network formed by the third long-time memory network by the fusion network formed by the multi-layer perceptron, and predicting the next word of the text sequence.
2. The system of claim 1, wherein the encoder composed of the first long-short-time memory network encodes the extracted acoustic feature sequence according to the following formula:
h_t = LSTM(h_{t-1}, x_t)
where LSTM is the unit function of a long-short-time memory network, h_t is the hidden representation corresponding to the acoustic feature sequence at time t, h_{t-1} is the hidden representation at time t-1, and x_t is the acoustic feature at time t.
3. The system of claim 1, wherein the prediction network composed of the second long-short-time memory network obtains the hidden representation of each word of the text sequence according to the following formula:
c_j = LSTM(c_{j-1}, y_j)
where LSTM is the unit function of a long-short-time memory network, c_{j-1} is the hidden representation corresponding to the word at position j-1, and y_j is the embedding vector of the word at position j.
4. The system of claim 1, wherein the bias encoding network obtains the hidden representations corresponding to the aerospace professional vocabulary sequence according to the following formula:
b_k = LSTM(b_{k-1}, z_k)
where LSTM is the unit function of a long-short-time memory network, b_{k-1} is the hidden representation corresponding to the word at position k-1 of the aerospace professional vocabulary sequence, and z_k is the embedding vector of the word at position k of the aerospace professional vocabulary sequence.
5. The system of claim 1, wherein the fusion network composed of the multi-layer perceptron fuses the output results of the three networks, namely the encoder composed of the first long-short-time memory network, the prediction network composed of the second long-short-time memory network, and the bias encoding network composed of the third long-short-time memory network, to predict the next word of the text sequence according to the following formula:
P(y_{j+1}) = MLP([c_j, b_k, h_t])
where MLP is the function of the multi-layer perceptron, h_t is the hidden representation corresponding to the acoustic feature sequence at time t extracted by the encoder, b_k is the hidden representation corresponding to the word at position k of the aerospace professional vocabulary sequence from the bias encoding network, and c_j is the hidden representation corresponding to the word at position j from the prediction network.
6. The system of claim 1, wherein, in the recognition stage, the optimal text sequence is searched out with the Viterbi algorithm according to the following formula:
y* = argmax_y trans(x, z, y)
where trans denotes the whole speech recognition system model, argmax selects the word sequence with the maximum probability, x denotes the acoustic feature sequence, z denotes the aerospace professional vocabulary sequence, y ranges over all text sequences, and y* denotes the optimal text sequence.
7. The system of claim 6, wherein, in response to a user providing an aerospace professional vocabulary sequence z, the system recognizes the corresponding aerospace professional vocabulary.
8. A method of training the system of any of claims 1-7, characterized by training with the following loss function:
L(θ) = -log Σ_a P(a | x, z; θ)
where the sum runs over the text sequences a obtained by inserting filling symbols into the labelled text sequence y, θ denotes the parameters of the whole neural network, x denotes the acoustic feature sequence, and z denotes the aerospace professional vocabulary sequence.
9. A method of speech recognition using a system trained by the method of claim 8, comprising:
inputting the acoustic feature sequence extracted by the signal-processing-based feature extractor into the trained system, and outputting the hidden representations corresponding to the acoustic feature sequence through the encoder composed of the first long-short-time memory network;
in the trained system, inputting the text-sequence start symbol sos into the prediction network composed of the second long-short-time memory network, outputting the hidden representation corresponding to the first word of the text sequence, then inputting the embedding vector of one word at a time and outputting the hidden representation corresponding to the predicted word;
in the trained system, inputting the aerospace professional vocabulary sequence into the bias encoding network composed of the third long-short-time memory network and, after encoding, outputting the corresponding hidden representations;
in the trained system, fusing, by the fusion network composed of the multi-layer perceptron, the output results of the encoder, the prediction network, and the bias encoding network to predict the next word of the text sequence.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method of claim 8 or 9.
11. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of claim 8 or 9 when executing the program.
CN202011139217.2A 2020-10-22 2020-10-22 Speech recognition system and method oriented to aerospace professional field Active CN112466282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011139217.2A CN112466282B (en) 2020-10-22 2020-10-22 Speech recognition system and method oriented to aerospace professional field


Publications (2)

Publication Number Publication Date
CN112466282A CN112466282A (en) 2021-03-09
CN112466282B true CN112466282B (en) 2023-11-28

Family

ID=74834120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011139217.2A Active CN112466282B (en) 2020-10-22 2020-10-22 Speech recognition system and method oriented to aerospace professional field

Country Status (1)

Country Link
CN (1) CN112466282B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903358B (en) * 2021-10-15 2022-11-04 贝壳找房(北京)科技有限公司 Voice quality inspection method, readable storage medium and computer program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106328122A (en) * 2016-08-19 2017-01-11 深圳市唯特视科技有限公司 Voice identification method using long-short term memory model recurrent neural network
CN107408111A (en) * 2015-11-25 2017-11-28 百度(美国)有限责任公司 End-to-end speech recognition
CN110738984A (en) * 2019-05-13 2020-01-31 苏州闪驰数控系统集成有限公司 Artificial intelligence CNN, LSTM neural network speech recognition system
CN110970031A (en) * 2019-12-16 2020-04-07 苏州思必驰信息科技有限公司 Speech recognition system and method
CN111785257A (en) * 2020-07-10 2020-10-16 四川大学 A method and device for air-traffic speech recognition with a small number of labeled samples

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10170114B2 (en) * 2013-05-30 2019-01-01 Promptu Systems Corporation Systems and methods for adaptive proper name entity recognition and understanding
EP3510594B1 (en) * 2016-10-10 2020-07-01 Google LLC Very deep convolutional neural networks for end-to-end speech recognition
CN111429889B (en) * 2019-01-08 2023-04-28 百度在线网络技术(北京)有限公司 Method, apparatus, device and computer readable storage medium for real-time speech recognition based on truncated attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Music Chord Recognition Based on Midi-Trained Deep Feature and BLSTM-CRF Hybrid Decoding"; Yiming Wu; 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); full text *
"Research on Deep Network Models for Air-Ground Speech Recognition"; Qiu Yi; China Master's Theses Full-text Database; full text *

Also Published As

Publication number Publication date
CN112466282A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN113987169B (en) Methods, apparatus, devices, and storage media for semantic block-based text summarization generation
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
JP7346788B2 (en) Speech recognition model training methods, devices, equipment, and storage media
CN115309877B (en) Dialogue generation method, dialogue model training method and device
CN110196967B (en) Sequence labeling method and device based on deep conversion architecture
CN110427625B (en) Sentence completion method, apparatus, medium, and dialogue processing system
KR20180001889A (en) Language processing method and apparatus
JP2016110082A (en) Language model training method and apparatus, and speech recognition method and apparatus
CN113655893B (en) A word and sentence generation method, model training method and related equipment
CN112037773B (en) An N-optimal spoken language semantic recognition method, device and electronic device
CN117708692A (en) Entity emotion analysis method and system based on double-channel graph convolution neural network
CN115831102A (en) Speech recognition method and device based on pre-training feature representation and electronic equipment
CN113160820B (en) Speech recognition method, training method, device and equipment of speech recognition model
CN109979461B (en) Voice translation method and device
JP7765622B2 (en) Fusion of acoustic and textual representations in an automatic speech recognition system implemented as an RNN-T
CN115238048A (en) A Fast Interactive Approach to Joint Intent Recognition and Slot Filling
CN114912441A (en) Text error correction model generation method, error correction method, system, device and medium
CN109933773A (en) A multiple-semantics sentence analysis system and method
CN115101050A (en) Speech recognition model training method and device, speech recognition method and medium
KR20210058765A (en) Speech recognition method, device, electronic device and storage media
CN114373443A (en) Speech synthesis method and apparatus, computing device, storage medium, and program product
CN120239884A (en) Semi-supervised training scheme for speech recognition
CN116306612A (en) A method for generating words and sentences and related equipment
CN112257432A (en) Adaptive intent recognition method, device and electronic device
CN112466282B (en) Speech recognition system and method oriented to aerospace professional field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant