CN104700828B - Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle - Google Patents
Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle
- Publication number
- CN104700828B (application CN201510122982.6A)
- Authority
- CN
- China
- Prior art keywords
- time
- gate
- input
- output
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
A method for constructing a deep long short-term memory (LSTM) recurrent neural network acoustic model based on the selective attention principle. An attention gate unit is added to the deep LSTM recurrent neural network acoustic model to represent the instantaneous functional changes of auditory-cortex neurons. The attention gate differs from the other gate units in that the other gates correspond one-to-one with the time series, whereas the attention gate reflects the short-term plasticity effect and therefore occurs only at intervals in the time series. Trained on a large amount of speech data containing cross-talk noise, the resulting neural network acoustic model achieves robust feature extraction and robust acoustic modeling under cross-talk noise; suppressing the influence of non-target streams on feature extraction improves the robustness of the acoustic model. The method is broadly applicable to machine-learning fields that involve speech recognition, such as speaker recognition, keyword spotting, and human-computer interaction.
Description
Technical Field
The invention belongs to the field of audio technology, and in particular relates to a method for constructing a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle.
Background Art
With the rapid development of information technology, speech recognition technology is ready for large-scale commercialization. Current speech recognition mainly uses continuous speech recognition based on statistical models, whose goal is to find the word sequence with the highest probability for a given speech sequence; such a system typically comprises an acoustic model, a language model, and the corresponding search and decoding method. With the rapid progress of acoustic and language models, the performance of speech recognition systems has improved greatly in ideal acoustic environments. The existing Deep Neural Network-Hidden Markov Model (DNN-HMM) is fairly mature: machine learning methods automatically extract effective features and model the context information of multiple speech frames. However, each layer of such a model has millions of parameters, and each layer's input is the previous layer's output, so GPU hardware is needed to train the DNN acoustic model and training times are long; the high nonlinearity and parameter sharing also make DNNs difficult to adapt.
A Recurrent Neural Network (RNN) is a neural network with directed cycles between units that express the network's internal dynamic temporal behavior; RNNs are widely used in handwriting recognition and language modeling. Speech signals are complex time-varying signals with complex correlations across different time scales, so compared with deep neural networks, the recurrent connections of RNNs are better suited to such complex time-series data.
As a type of recurrent neural network, the Long Short-Term Memory (LSTM) model is better suited than a plain RNN to processing and predicting long sequences with lagged events of uncertain timing. The deep LSTM-RNN acoustic model proposed at the University of Toronto, which adds memory blocks, combines the multi-level representation ability of deep neural networks with the RNN's flexible use of long-span context, reducing the phoneme recognition error rate on the TIMIT corpus to 17.1%.
However, the gradient descent method used in recurrent neural networks suffers from the vanishing gradient problem: while the network's weights are being adjusted, the gradient dissipates layer by layer as the number of layers increases, so its effect on the weight updates becomes ever smaller. The two-layer deep LSTM-RNN acoustic model proposed by Google adds a linear Recurrent Projection Layer to the earlier deep LSTM-RNN model to address vanishing gradients. Comparative experiments show that the frame accuracy and convergence speed of a plain RNN are clearly inferior to those of the LSTM-RNN and the DNN. In terms of word error rate and convergence speed, the best DNN reached a word error rate of 11.3% after several weeks of training, whereas the two-layer deep LSTM-RNN model reached 10.9% after 48 hours of training and 10.7%/10.5% after 100/200 hours.
The Deep Bidirectional Long Short-Term Memory Recurrent Neural Network (DBLSTM-RNN) acoustic model proposed at the University of Munich defines mutually independent forward and backward layers in each recurrent layer of the network, uses multiple hidden layers to build higher-level representations of the input acoustic features, and performs supervised learning on noise and reverberation to achieve feature projection and enhancement. On the 2013 PASCAL CHiME data set, this method reduced the word error rate from a baseline of 55% to 22% over the SNR range [-6 dB, 9 dB].
However, the complexity of real acoustic environments still seriously degrades and interferes with the performance of continuous speech recognition systems. Even with today's mainstream DNN acoustic model methods, only about a 70% recognition rate is achieved on continuous speech recognition data sets recorded under complex conditions including noise, music, spontaneous speech, and repetitions; the noise resistance and robustness of the acoustic model in continuous speech recognition systems still need improvement.
With the rapid development of acoustic and language models, speech recognition performance has improved greatly in ideal acoustic environments; the existing DNN-HMM model is fairly mature, automatically extracting effective features and modeling multi-frame context through machine learning. Most recognition systems, however, remain very sensitive to changes in the acoustic environment, and in particular cannot meet practical performance requirements under cross-talk noise (two or more people speaking simultaneously). Compared with a deep neural network acoustic model, the directed cycles between units in a recurrent neural network acoustic model effectively describe the network's internal dynamic temporal behavior and better suit speech data with complex timing. The long short-term memory network in turn handles long sequences with lagged, uncertain timing better than a plain RNN, so acoustic models built on it for speech recognition can achieve better results.
The human brain exhibits selective attention when processing speech in complex scenes. The core principle is that the brain has the ability of auditory selective attention: a top-down control mechanism in the auditory cortex suppresses non-target streams and enhances the target stream. Studies have shown that during selective attention, the Short-Term Plasticity effect of the auditory cortex increases the ability to discriminate sounds; under intense attention, the primary auditory cortex can begin enhancing a sound target within 50 milliseconds.
Summary of the Invention
To overcome the above shortcomings of the prior art, the object of the present invention is to provide a method for constructing a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle. An attention gate unit is added to the deep LSTM recurrent neural network acoustic model to represent the instantaneous functional changes of auditory-cortex neurons. The attention gate differs from the other gate units in that the other gates correspond one-to-one with the time series, whereas the attention gate reflects the short-term plasticity effect and therefore occurs only at intervals in the time series. Trained on a large amount of speech data containing cross-talk noise, the resulting neural network acoustic model achieves robust feature extraction and robust acoustic modeling under cross-talk noise; suppressing the influence of non-target streams on feature extraction improves the robustness of the acoustic model.
To achieve the above object, the technical solution adopted by the present invention is:
A continuous speech recognition method based on the selective attention principle, comprising the following steps.
Step 1: construct a deep long short-term memory recurrent neural network based on the selective attention principle.
The path from the input to the hidden layer is defined as one LSTM recurrent neural network; "deep" means that the output of each LSTM network is the input of the next, and so on, with the last network's output serving as the output of the whole system. In each LSTM network, the speech signal x_t is the input at time t, x_{t-1} the input at time t-1, and so on; over the whole duration the input is x = [x_1, ..., x_T], where t ∈ [1, T] and T is the total time length of the speech signal. The LSTM network at time t consists of an attention gate, input gate, output gate, forget gate, memory cell, tanh functions, a hidden layer, and multipliers; the LSTM network at time t-1 consists of an input gate, output gate, forget gate, memory cell, tanh functions, a hidden layer, and multipliers. The hidden-layer output over the whole duration is y = [y_1, ..., y_T].
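The layer chaining described above (each LSTM network's output sequence feeding the next) can be sketched as follows; `deep_forward` and the sequence-to-sequence callables are illustrative stand-ins, not names from the patent:

```python
def deep_forward(x_seq, layers):
    """Chain LSTM layers as described above: each layer's full output
    sequence is the next layer's input sequence, and the last layer's
    output sequence is the output of the whole system.

    `layers` is a list of callables mapping a sequence to a sequence,
    each standing in for one LSTM recurrent network.
    """
    seq = x_seq
    for layer in layers:
        seq = layer(seq)  # output of layer k becomes input of layer k+1
    return seq
```

With real LSTM layers each callable would carry its own weights; here any sequence transform demonstrates the chaining.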
Step 2: construct a deep LSTM recurrent neural network acoustic model based on the selective attention principle.
Building on Step 1, the deep LSTM network at every s-th time step has an attention gate, while the deep LSTM networks at all other time steps do not; that is, the acoustic model based on the selective attention principle consists of deep LSTM recurrent neural networks whose attention gates occur at intervals.
Recognition under complex environmental interference, especially cross-talk noise, has long been one of the difficulties of speech recognition and has hindered its large-scale application. Compared with the prior art, the present invention draws on the brain's selective attention when processing speech in complex scenes to suppress non-target streams and enhance the target stream: an attention gate unit is added to the deep LSTM recurrent neural network acoustic model to represent the instantaneous functional changes of auditory-cortex neurons. The attention gate differs from the other gate units in that the other gates correspond one-to-one with the time series, whereas the attention gate reflects the short-term plasticity effect and therefore occurs at intervals in the time series. On continuous speech recognition data sets containing cross-talk noise, this method achieves better performance than deep neural network methods.
Brief Description of the Drawings
Fig. 1 is a flowchart of the deep long short-term memory recurrent neural network based on the selective attention principle of the present invention.
Fig. 2 is a flowchart of the deep long short-term memory neural network acoustic model based on the selective attention principle of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the drawings and examples.
The present invention uses a deep LSTM recurrent neural network acoustic model based on the selective attention principle to realize continuous speech recognition. The model and method provided by the invention are not limited to continuous speech recognition, however, and may be applied to any method or apparatus related to speech recognition.
The present invention mainly comprises the following steps.
Step 1: construct a deep long short-term memory recurrent neural network based on the selective attention principle.
As shown in Fig. 1, input 101 and input 102 are the speech-signal inputs x_t and x_{t-1} at times t and t-1 (t ∈ [1, T], where T is the total time length of the speech signal). The LSTM network at time t consists of attention gate 103, input gate 104, forget gate 105, memory cell 106, output gate 107, tanh function 108, tanh function 109, hidden layer 110, multiplier 122, and multiplier 123; the LSTM network at time t-1 consists of input gate 112, forget gate 113, memory cell 114, output gate 115, tanh function 116, tanh function 117, hidden layer 118, multiplier 120, and multiplier 121. The hidden-layer outputs at times t and t-1 are output 111 and output 119, respectively.
Input 102 serves simultaneously as the input to input gate 112, forget gate 113, output gate 115, and tanh function 116. The outputs of input gate 112 and tanh function 116 are fed to multiplier 120, whose result is the input to memory cell 114. The output of memory cell 114 is the input to tanh function 117; the outputs of tanh function 117 and output gate 115 are fed to multiplier 121, whose result is the input to hidden layer 118, and the output of hidden layer 118 is output 119.
Input 101, the output of memory cell 114, and the output of multiplier 121 together form the input to attention gate 103. The output of attention gate 103 and the output of multiplier 121 together form the input to tanh function 108. The output of attention gate 103, the output of memory cell 114, and the output of multiplier 121 together form the inputs to input gate 104, forget gate 105, and output gate 107, respectively. The outputs of forget gate 105 and memory cell 114 are fed to multiplier 124; the outputs of input gate 104 and tanh function 108 are fed to multiplier 122; the outputs of multiplier 124 and multiplier 122 are the input to memory cell 106. The output of memory cell 106 is the input to tanh function 109; the outputs of tanh function 109 and output gate 107 are fed to multiplier 123, whose output is the input to hidden layer 110, and the output of hidden layer 110 is output 111.
That is, the parameters at time t ∈ [1, T] are computed as follows:
G_atten_t = sigmoid(W_ax x_t + W_am m_{t-1} + W_ac Cell_{t-1} + b_a)
G_input_t = sigmoid(W_ia G_atten_t + W_im m_{t-1} + W_ic Cell_{t-1} + b_i)
G_forget_t = sigmoid(W_fa G_atten_t + W_fm m_{t-1} + W_fc Cell_{t-1} + b_f)
Cell_t = G_forget_t ⊙ Cell_{t-1} + G_input_t ⊙ tanh(W_ca G_atten_t + W_cm m_{t-1} + b_c)
G_output_t = sigmoid(W_oa G_atten_t + W_om m_{t-1} + W_oc Cell_{t-1} + b_o)
m_t = G_output_t ⊙ tanh(Cell_t)
y_t = softmax_k(W_ym m_t + b_y)
Here G_atten_t is the output of attention gate 103 at time t, G_input_t the output of input gate 104 at time t, G_forget_t the output of forget gate 105 at time t, Cell_t the output of memory cell 106 at time t, G_output_t the output of output gate 107 at time t, m_t the input of hidden layer 110 at time t, and y_t the output 111 at time t; x_t is input 101 at time t, m_{t-1} the input of hidden layer 118 at time t-1, and Cell_{t-1} the output of memory cell 114 at time t-1. W_ax is the weight between attention gate a at time t and input x at time t; W_am between attention gate a at time t and hidden-layer input m at time t-1; W_ac between attention gate a at time t and memory cell c at time t-1; W_ia between input gate i at time t and attention gate a at time t; W_im between input gate i at time t and hidden-layer input m at time t-1; W_ic between input gate i at time t and memory cell c at time t-1; W_fa between forget gate f at time t and attention gate a at time t; W_fm between forget gate f at time t and hidden-layer input m at time t-1; W_fc between forget gate f at time t and memory cell c at time t-1; W_ca between memory cell c at time t and attention gate a at time t; W_cm between memory cell c at time t and hidden-layer input m at time t-1; W_oa between output gate o at time t and attention gate a at time t; W_om between output gate o at time t and hidden-layer input m at time t-1; and W_oc between output gate o at time t and memory cell c at time t-1. b_a is the bias of attention gate a, b_i of input gate i, b_f of forget gate f, b_c of memory cell c, b_o of output gate o, and b_y of output y; different b denote different biases. Furthermore, softmax_k(x) = exp(x_k) / Σ_l exp(x_l), where x_k is the input of the k-th softmax function, k ∈ [1, K], l ∈ [1, K], and the sum runs over all l; ⊙ denotes element-wise multiplication.
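A minimal NumPy sketch of one timestep of the update equations above. Parameter names mirror the patent's symbols (W_ax, W_am, ..., b_y), but the shapes, the dict-based interface, and the function name are illustrative assumptions, not part of the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def attention_lstm_step(x_t, m_prev, cell_prev, p):
    """One attention-gated LSTM timestep; p maps symbol names to arrays."""
    # Attention gate: driven by the current input, the previous hidden
    # output m_{t-1}, and the previous cell state Cell_{t-1}.
    g_atten = sigmoid(p["W_ax"] @ x_t + p["W_am"] @ m_prev
                      + p["W_ac"] @ cell_prev + p["b_a"])
    # Input, forget, and output gates take the attention gate's output
    # in place of the raw input x_t.
    g_input = sigmoid(p["W_ia"] @ g_atten + p["W_im"] @ m_prev
                      + p["W_ic"] @ cell_prev + p["b_i"])
    g_forget = sigmoid(p["W_fa"] @ g_atten + p["W_fm"] @ m_prev
                       + p["W_fc"] @ cell_prev + p["b_f"])
    cell = (g_forget * cell_prev
            + g_input * np.tanh(p["W_ca"] @ g_atten
                                + p["W_cm"] @ m_prev + p["b_c"]))
    g_output = sigmoid(p["W_oa"] @ g_atten + p["W_om"] @ m_prev
                       + p["W_oc"] @ cell_prev + p["b_o"])
    m_t = g_output * np.tanh(cell)
    y_t = softmax(p["W_ym"] @ m_t + p["b_y"])
    return y_t, m_t, cell
```

The ⊙ of the equations becomes NumPy's element-wise `*`, and each weight applies via matrix-vector multiplication `@`.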
Step 2: construct a deep LSTM recurrent neural network acoustic model based on the selective attention principle.
Building on Step 1, the deep LSTM network at every s-th time step (s = 5) has an attention gate, while the deep LSTM networks at other time steps do not; that is, the acoustic model based on the selective attention principle consists of deep LSTM recurrent neural networks whose attention gates occur at intervals. Fig. 2 shows the resulting acoustic model: the deep LSTM network at time t has attention gate 201, the network at time t-s has attention gate 202, and so on.
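The interval placement can be sketched as a per-timestep dispatch between two cell variants. The function and step-function names are assumptions for illustration; the patent only specifies that an attention gate is present every s = 5 steps:

```python
def run_layer(xs, step_with_attention, step_standard, init_m, init_cell, s=5):
    """Run one layer over a sequence, dispatching per timestep.

    The cell at t = 1, 1+s, 1+2s, ... carries an attention gate; all
    other timesteps use a standard LSTM cell without one. Both step
    functions are assumed to have the signature
    (x_t, m_prev, cell_prev) -> (y_t, m_t, cell_t).
    """
    m, cell = init_m, init_cell
    ys = []
    for t, x_t in enumerate(xs, start=1):  # 1-indexed time, as in the patent
        step = step_with_attention if (t - 1) % s == 0 else step_standard
        y_t, m, cell = step(x_t, m, cell)
        ys.append(y_t)
    return ys
```

Stub step functions that merely label which variant ran are enough to check the placement pattern.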
Claims (2)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510122982.6A CN104700828B (en) | 2015-03-19 | 2015-03-19 | The construction method of depth shot and long term memory Recognition with Recurrent Neural Network acoustic model based on selective attention principle |
PCT/CN2015/092381 WO2016145850A1 (en) | 2015-03-19 | 2015-10-21 | Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510122982.6A CN104700828B (en) | 2015-03-19 | 2015-03-19 | The construction method of depth shot and long term memory Recognition with Recurrent Neural Network acoustic model based on selective attention principle |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104700828A CN104700828A (en) | 2015-06-10 |
CN104700828B true CN104700828B (en) | 2018-01-12 |
Family
ID=53347887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510122982.6A Expired - Fee Related CN104700828B (en) | 2015-03-19 | 2015-03-19 | The construction method of depth shot and long term memory Recognition with Recurrent Neural Network acoustic model based on selective attention principle |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104700828B (en) |
WO (1) | WO2016145850A1 (en) |
Families Citing this family (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104700828B (en) * | 2015-03-19 | 2018-01-12 | 清华大学 | The construction method of depth shot and long term memory Recognition with Recurrent Neural Network acoustic model based on selective attention principle |
CN105185374B (en) * | 2015-09-11 | 2017-03-29 | 百度在线网络技术(北京)有限公司 | Prosody hierarchy mask method and device |
KR102313028B1 (en) * | 2015-10-29 | 2021-10-13 | 삼성에스디에스 주식회사 | System and method for voice recognition |
CN105354277B (en) * | 2015-10-30 | 2020-11-06 | 中国船舶重工集团公司第七0九研究所 | Recommendation method and system based on recurrent neural network |
KR102494139B1 (en) * | 2015-11-06 | 2023-01-31 | 삼성전자주식회사 | Apparatus and method for training neural network, apparatus and method for speech recognition |
US10043512B2 (en) * | 2015-11-12 | 2018-08-07 | Google Llc | Generating target sequences from input sequences using partial conditioning |
CN105513591B (en) * | 2015-12-21 | 2019-09-03 | 百度在线网络技术(北京)有限公司 | The method and apparatus for carrying out speech recognition with LSTM Recognition with Recurrent Neural Network model |
EP3374932B1 (en) * | 2016-02-03 | 2022-03-16 | Google LLC | Compressed recurrent neural network models |
EP3398118B1 (en) * | 2016-02-04 | 2023-07-12 | Deepmind Technologies Limited | Associative long short-term memory neural network layers |
US9799327B1 (en) * | 2016-02-26 | 2017-10-24 | Google Inc. | Speech recognition with attention-based recurrent neural networks |
US10373612B2 (en) * | 2016-03-21 | 2019-08-06 | Amazon Technologies, Inc. | Anchored speech detection and speech recognition |
JP6480644B1 (en) * | 2016-03-23 | 2019-03-13 | グーグル エルエルシー | Adaptive audio enhancement for multi-channel speech recognition |
CN107293291B (en) * | 2016-03-30 | 2021-03-16 | 中国科学院声学研究所 | An end-to-end speech recognition method based on adaptive learning rate |
CN105956469B (en) * | 2016-04-27 | 2019-04-26 | 百度在线网络技术(北京)有限公司 | File security recognition methods and device |
CN106096729B (en) * | 2016-06-06 | 2018-11-20 | 天津科技大学 | A kind of depth-size strategy learning method towards complex task in extensive environment |
US11222253B2 (en) | 2016-11-03 | 2022-01-11 | Salesforce.Com, Inc. | Deep neural network model for processing data through multiple linguistic task hierarchies |
CN108062505B (en) | 2016-11-09 | 2022-03-18 | 微软技术许可有限责任公司 | Method and apparatus for neural network based motion detection |
CN106650789B (en) * | 2016-11-16 | 2023-04-07 | 同济大学 | Image description generation method based on depth LSTM network |
KR102692670B1 (en) * | 2017-01-04 | 2024-08-06 | 삼성전자주식회사 | Voice recognizing method and voice recognizing appratus |
US10241684B2 (en) * | 2017-01-12 | 2019-03-26 | Samsung Electronics Co., Ltd | System and method for higher order long short-term memory (LSTM) network |
US10769522B2 (en) | 2017-02-17 | 2020-09-08 | Wipro Limited | Method and system for determining classification of text |
CN107293288B (en) * | 2017-06-09 | 2020-04-21 | 清华大学 | An Acoustic Model Modeling Method of Residual Long Short-Term Memory Recurrent Neural Network |
CN107492121B (en) * | 2017-07-03 | 2020-12-29 | 广州新节奏智能科技股份有限公司 | Two-dimensional human body bone point positioning method of monocular depth video |
CN107484017B (en) * | 2017-07-25 | 2020-05-26 | 天津大学 | A Supervised Video Summary Generation Method Based on Attention Model |
CN109460812B (en) * | 2017-09-06 | 2021-09-14 | 富士通株式会社 | Intermediate information analysis device, optimization device, and feature visualization device for neural network |
CN107563122B (en) * | 2017-09-20 | 2020-05-19 | 长沙学院 | Crime prediction method based on an interleaved time-series locally connected recurrent neural network |
CN107993636B (en) * | 2017-11-01 | 2021-12-31 | 天津大学 | Recurrent neural network-based music score modeling and generation method |
CN109243494B (en) * | 2018-10-30 | 2022-10-11 | 南京工程学院 | Child emotion recognition method based on a multi-attention long short-term memory network |
CN109243493B (en) * | 2018-10-30 | 2022-09-16 | 南京工程学院 | Infant cry emotion recognition method based on an improved long short-term memory network |
CN109614485B (en) * | 2018-11-19 | 2023-03-14 | 中山大学 | Sentence matching method and device of hierarchical Attention based on grammar structure |
CN109543165B (en) * | 2018-11-21 | 2022-09-23 | 中国人民解放军战略支援部队信息工程大学 | Text generation method and device based on circular convolution attention model |
CN109523995B (en) * | 2018-12-26 | 2019-07-09 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition apparatus, readable storage medium and electronic device |
CN109866713A (en) * | 2019-03-21 | 2019-06-11 | 斑马网络技术有限公司 | Safety detection method and device, vehicle |
CN110135634B (en) * | 2019-04-29 | 2022-01-25 | 广东电网有限责任公司电网规划研究中心 | Medium- and long-term power load prediction device |
CN110085249B (en) * | 2019-05-09 | 2021-03-16 | 南京工程学院 | Single-channel speech enhancement method of recurrent neural network based on attention gating |
CN110473554B (en) * | 2019-08-08 | 2022-01-25 | Oppo广东移动通信有限公司 | Audio verification method and device, storage medium and electronic equipment |
CN110473529B (en) * | 2019-09-09 | 2021-11-05 | 北京中科智极科技有限公司 | Streaming speech transcription system based on a self-attention mechanism |
CN111079906B (en) * | 2019-12-30 | 2023-05-05 | 燕山大学 | Method and system for predicting specific surface area of cement products based on long-short-term memory network |
CN111314345B (en) * | 2020-02-19 | 2022-09-16 | 安徽大学 | Method and device for protecting sequence data privacy, computer equipment and storage medium |
CN111311009B (en) * | 2020-02-24 | 2023-05-26 | 广东工业大学 | Pedestrian trajectory prediction method based on long short-term memory |
CN111429938B (en) * | 2020-03-06 | 2022-09-13 | 江苏大学 | Single-channel voice separation method and device and electronic equipment |
CN111695607B (en) * | 2020-05-25 | 2025-01-21 | 北京信息科技大学 | Electronic equipment fault prediction method based on LSTM enhanced model |
CN111709754B (en) * | 2020-06-12 | 2023-08-25 | 中国建设银行股份有限公司 | User behavior feature extraction method, device, equipment and system |
CN111814849B (en) * | 2020-06-22 | 2024-02-06 | 浙江大学 | A fault early warning method for key components of wind turbines based on DA-RNN |
CN111985610B (en) * | 2020-07-15 | 2024-05-07 | 中国石油大学(北京) | Oil pumping well pump efficiency prediction system and method based on time sequence data |
CN111930602B (en) * | 2020-08-13 | 2023-09-22 | 中国工商银行股份有限公司 | Performance index prediction method and device |
CN112001482B (en) * | 2020-08-14 | 2024-05-24 | 佳都科技集团股份有限公司 | Vibration prediction and model training method, device, computer equipment and storage medium |
CN112214852B (en) * | 2020-10-09 | 2022-10-14 | 电子科技大学 | Turbomachinery performance degradation prediction method accounting for degradation rate |
CN112382265B (en) * | 2020-10-21 | 2024-05-28 | 西安交通大学 | Active noise reduction method, storage medium and system based on deep cyclic neural network |
CN112434784A (en) * | 2020-10-22 | 2021-03-02 | 暨南大学 | Deep student performance prediction method based on multilayer LSTM |
CN112906291B (en) * | 2021-01-25 | 2023-05-19 | 武汉纺织大学 | A neural network-based modeling method and device |
CN112784472B (en) * | 2021-01-27 | 2023-03-24 | 电子科技大学 | Simulation method using a recurrent neural network to simulate the quantum conditional master equation in quantum transport processes |
CN113792772B (en) * | 2021-09-01 | 2023-11-03 | 中国船舶重工集团公司第七一六研究所 | Hot and cold data identification method for tiered hybrid data storage |
CN114511067A (en) * | 2022-02-02 | 2022-05-17 | 上海图灵智算量子科技有限公司 | Quantum-based method and system for implementing long short-term memory |
CN115034129B (en) * | 2022-05-17 | 2024-08-20 | 齐鲁工业大学 | NOx emission concentration soft measurement method for thermal power plant denitration device |
US11995658B2 (en) * | 2022-05-25 | 2024-05-28 | Dell Products L.P. | Machine learning-based detection of potentially malicious behavior on an e-commerce platform |
CN115563475A (en) * | 2022-10-25 | 2023-01-03 | 南京工业大学 | Pressure soft sensor of excavator hydraulic system |
CN117849628B (en) * | 2024-03-08 | 2024-05-10 | 河南科技学院 | Lithium ion battery health state estimation method based on time sequence transformation memory network |
CN118824493B (en) * | 2024-09-18 | 2024-12-06 | 苏州阿基米德网络科技有限公司 | Medical equipment scheduling prediction method based on dynamic long-short time series attention |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102983819A (en) * | 2012-11-08 | 2013-03-20 | 南京航空航天大学 | Power amplifier simulation method and simulation device |
CN103049792A (en) * | 2011-11-26 | 2013-04-17 | 微软公司 | Discriminative pretraining of Deep Neural Network |
CN103680496A (en) * | 2013-12-19 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | Deep-neural-network-based acoustic model training method, hosts and system |
CN104217226A (en) * | 2014-09-09 | 2014-12-17 | 天津大学 | Dialogue act identification method based on deep neural networks and conditional random fields |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7647284B2 (en) * | 2007-01-12 | 2010-01-12 | Toyota Motor Engineering & Manufacturing North America, Inc. | Fixed-weight recurrent neural network controller with fixed long-term and adaptive short-term memory |
CN104700828B (en) * | 2015-03-19 | 2018-01-12 | 清华大学 | Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle |
- 2015-03-19 CN CN201510122982.6A patent/CN104700828B/en not_active Expired - Fee Related
- 2015-10-21 WO PCT/CN2015/092381 patent/WO2016145850A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
"Towards end-to-end speech recognition with recurrent neural networks"; Alex Graves et al.; Proceedings of the 31st International Conference on Machine Learning; 2014-12-31; full text *
Also Published As
Publication number | Publication date |
---|---|
CN104700828A (en) | 2015-06-10 |
WO2016145850A1 (en) | 2016-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104700828B (en) | Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle | |
CN104538028B (en) | A continuous speech recognition method based on deep long short-term memory recurrent neural networks | |
JP7337953B2 (en) | Speech recognition method and device, neural network training method and device, and computer program | |
US11715486B2 (en) | Convolutional, long short-term memory, fully connected deep neural networks | |
Sun et al. | Speech emotion recognition based on DNN-decision tree SVM model | |
Gelly et al. | Optimization of RNN-based speech activity detection | |
Nakkiran et al. | Compressing deep neural networks using a rank-constrained topology. | |
Oord et al. | Parallel wavenet: Fast high-fidelity speech synthesis | |
US10262260B2 (en) | Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition | |
CN107293288B (en) | An Acoustic Model Modeling Method of Residual Long Short-Term Memory Recurrent Neural Network | |
EP3459077B1 (en) | Permutation invariant training for talker-independent multi-talker speech separation | |
CN111243579B (en) | Time domain single-channel multi-speaker voice recognition method and system | |
CN105139864B (en) | Speech recognition method and device | |
Sainath et al. | Convolutional, long short-term memory, fully connected deep neural networks | |
Guiming et al. | Speech recognition based on convolutional neural networks | |
Li et al. | A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition | |
JP7143091B2 (en) | Method and apparatus for training acoustic models | |
KR20160069329A (en) | Method and apparatus for training language model, method and apparatus for recognizing speech | |
US11205419B2 (en) | Low energy deep-learning networks for generating auditory features for audio processing pipelines | |
Han et al. | Self-supervised learning with cluster-aware-dino for high-performance robust speaker verification | |
CN110853656A (en) | Audio Tampering Recognition Algorithm Based on Improved Neural Network | |
Kang et al. | Gated recurrent units based hybrid acoustic models for robust speech recognition | |
Li et al. | Improving long short-term memory networks using maxout units for large vocabulary speech recognition | |
Bijwadia et al. | Unified end-to-end speech recognition and endpointing for fast and efficient speech systems | |
Cornell et al. | Implicit acoustic echo cancellation for keyword spotting and device-directed speech detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20180112 |