
CN103310272B - Pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base - Google Patents


Info

Publication number
CN103310272B
CN103310272B (application CN201310274341.3A)
Authority
CN
China
Prior art keywords
diva
model
knowledge base
sound channel
pronunciation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310274341.3A
Other languages
Chinese (zh)
Other versions
CN103310272A (en)
Inventor
张少白
徐歆冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN201310274341.3A priority Critical patent/CN103310272B/en
Publication of CN103310272A publication Critical patent/CN103310272A/en
Application granted granted Critical
Publication of CN103310272B publication Critical patent/CN103310272B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Electrophonic Musical Instruments (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to a pronunciation method, in particular to a pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base. The method uses an improved DIVA neural network model to which a vocal tract action knowledge base has been added: for speech sounds not present in the speech map, revised auditory feedback information is obtained in combination with a disturbance factor, and this revised feedback is then used to train the neural network. This reduces the number of training iterations the model needs to produce a pronunciation and improves pronunciation accuracy.

Description

Pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base

Technical Field

The invention relates to a pronunciation method, in particular to a pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base.

Background

A neuro-computational speech model uses computer simulation to realize the complex processes of speech production, perception, and acquisition. Such a model is complex, comprising at least a cognitive component, a motor-processing component, and a sensory-processing component: the cognitive component generates neural activations (phoneme representations) during speech production and perception; the motor-processing component begins with activating a motor plan from the generated phoneme representation and ends with the articulator movements corresponding to the specific phoneme item; and the sensory-processing component generates the corresponding auditory representation from an external sound signal and activates the corresponding phoneme representation.

Research on neuro-computational speech models has so far produced many results; among them, the DIVA (Directions Into Velocities of Articulators) model is a relatively advanced neuro-computational model of speech production, perception, and acquisition.

The DIVA model was developed by Professor Frank Guenther and his team at the Boston University speech laboratory. Among neuro-computational speech models with genuine biophysical meaning, DIVA is the most thoroughly defined and tested, and it is the only adaptive neural network model that applies pseudo-inverse control. The DIVA model describes the processing involved in speech acquisition, perception, and production, and can generate phonemes, syllables, or words by controlling a simulated vocal tract. Figure 1 shows a block diagram of the DIVA model.

Features of the DIVA model include:

the model comprises two subsystems, feedforward control and feedback control;

the model's target regions are composed of the fundamental frequency F0, the first three formant frequencies, and the corresponding somatosensory targets;

the model's input is a word, syllable, or phoneme. Although the model has so far focused on short, simple speech sequences, linguistic influences (prosody and prosodic structure, morphology, word boundaries, etc.) necessarily involve longer and more complex structures, and these have been considered in the model;

the model's account of coarticulation and related phenomena resembles Keating's window model, but it has the advantage over the window model of explaining how targets are learned;

the DIVA model has achieved unprecedented success by fully exploiting learning in the perceptual system. Its approach is to classify already existing auditory sounds, without explaining how they are learned.
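To make the target-region feature above concrete, here is a minimal sketch, not from the patent, treating each sound's auditory target as a box of acceptable ranges over F0 and the first three formant frequencies; all frequency values are invented for illustration.

```python
# Hypothetical sketch of a DIVA-style auditory target region: each sound
# is a box of acceptable ranges for F0 and the first three formants (Hz).
# The numeric ranges below are illustrative only.
TARGET_REGIONS = {
    # sound: ((F0_lo, F0_hi), (F1_lo, F1_hi), (F2_lo, F2_hi), (F3_lo, F3_hi))
    "i": ((90, 250), (250, 400), (2000, 2500), (2800, 3400)),
    "a": ((90, 250), (700, 900), (1100, 1400), (2400, 3000)),
}

def in_target_region(sound, frames):
    """True if every feedback frame (F0, F1, F2, F3) lies inside the
    target box for `sound` -- i.e. no error signal would be generated."""
    box = TARGET_REGIONS[sound]
    return all(lo <= f <= hi
               for frame in frames
               for (lo, hi), f in zip(box, frame))

print(in_target_region("a", [(120, 800, 1250, 2600)]))  # True: inside the box
print(in_target_region("a", [(120, 500, 1250, 2600)]))  # False: F1 too low
```

Because the target is a region rather than a point, small articulatory variation produces no corrective signal, which is one way the model accounts for coarticulation.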

The DIVA model has several shortcomings: it assumes that all state information given at any point is instantaneously available; it assumes no neural delay and uses instantaneous feedback control; the reference frame used for control can be either the articulator (somatosensory) reference frame or the auditory reference frame, but the two cannot coexist; and its description of the division between cortical and subcortical processing, and of the relations among brain-region components, is relatively coarse.

Summary of the Invention

The technical problem addressed by the present invention is to remedy the deficiencies of the background art described above by providing a pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base.

To achieve this object, the present invention adopts the following technical scheme:

The pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base comprises the following steps:

Step 1: construct the improved DIVA neuro-computational speech model by adding, to the DIVA neuro-computational speech model, a vocal tract action knowledge base that acts on the simulated articulators.

In the vocal tract action knowledge base, activation begins with the activation of the phoneme representation of a speech item. For high-frequency syllables, whose motor plans have already been acquired, the speech map activates the motor plan, the vocal tract action corresponding to each syllable produces a motor-neuron activation pattern, and neuromuscular processing drives articulator movement, allowing a speech signal to be generated through the articulatory-auditory model. For low-frequency syllables, the motor plan is activated via the speech map by activating the phonetic plan of a similar syllable.

Step 2: collect the formant frequencies of pronunciation units as the input of the DIVA neuro-computational speech model.

Step 3: map the input of the DIVA neural network model into the speech map, and initialize all phoneme units in the speech map to the inactive state.

Step 4: input the formant frequencies of an arbitrary pronunciation unit and train the improved DIVA neuro-computational speech model:

if the speech map contains a phoneme unit whose formant frequencies match those of the input pronunciation unit, the simulated articulators produce the input pronunciation unit directly through feedforward control;

otherwise, the simulated articulators learn to produce the input pronunciation unit through feedback control.
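As a rough illustration of step 4's branch between feedforward and feedback control — the function names, the tolerance, and the matching rule are assumptions, not the patented implementation:

```python
# Illustrative sketch of step 4: produce a unit via feedforward control
# when its formant frequencies already match a unit in the speech map,
# otherwise fall back to feedback-control learning. The tolerance value
# and matching rule are assumptions.

def formants_match(a, b, tol_hz=30.0):
    """Two units match if each formant differs by at most tol_hz."""
    return len(a) == len(b) and all(abs(x - y) <= tol_hz for x, y in zip(a, b))

def produce(unit_formants, speech_map, feedback_learn):
    """Return ('feedforward', unit) on a hit; otherwise learn via feedback
    control and add the newly learned unit to the speech map."""
    for known in speech_map:
        if formants_match(unit_formants, known):
            return ("feedforward", known)
    learned = feedback_learn(unit_formants)   # feedback-control learning
    speech_map.append(learned)                # new unit is now available
    return ("feedback", learned)

speech_map = [(730.0, 1090.0, 2440.0)]        # e.g. an /a/-like unit (F1-F3)
mode, _ = produce((735.0, 1100.0, 2430.0), speech_map, lambda f: f)
print(mode)   # feedforward: within tolerance of the stored unit
mode, _ = produce((270.0, 2290.0, 3010.0), speech_map, lambda f: f)
print(mode)   # feedback: no match, so the unit is learned and stored
print(len(speech_map))  # 2
```

The key property this sketch captures is that feedback learning enlarges the speech map, so a unit learned once is produced feedforward thereafter — which is how the method reduces training iterations.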

In the pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base, the feedback-control production of the input pronunciation unit in step 4 is implemented as follows:

Step A: apply a perturbed pronunciation unit to the simulated articulators, and collect the auditory feedback information and somatosensory feedback information of the DIVA model; the somatosensory error map obtains a somatosensory feedback command from the somatosensory target region and the somatosensory feedback information.

Step B: map the auditory feedback information of the DIVA model and the perturbed pronunciation unit into the auditory state map.

Step C: the auditory error map obtains an auditory feedback command from the input of the DIVA neural network model and the auditory feedback information of the simulated articulators.

Step D: the articulator velocity and position map obtains the training signal for the simulated articulators from the somatosensory feedback command and the auditory feedback command, and the simulated articulators pronounce under the action of the vocal tract action knowledge base.
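Steps A-D can be illustrated with a minimal sketch in which each error map produces a gain-scaled feedback command and the velocity-and-position map integrates their sum; the gains, the integration rate, and the linear form are all assumptions.

```python
# Minimal sketch of steps A-D: each error map turns (target - feedback)
# into a gain-scaled command, and the articulator velocity-and-position
# map sums the commands and integrates them into a position update.
# AUD_GAIN, SOM_GAIN, and RATE are assumed values.

AUD_GAIN, SOM_GAIN, RATE = 0.8, 0.5, 0.1

def feedback_command(target, feedback, gain):
    """Error map: gain-scaled difference between target and feedback."""
    return [gain * (t - f) for t, f in zip(target, feedback)]

def articulator_update(positions, aud_target, aud_fb, som_target, som_fb):
    aud_cmd = feedback_command(aud_target, aud_fb, AUD_GAIN)   # step C
    som_cmd = feedback_command(som_target, som_fb, SOM_GAIN)   # step A
    velocity = [a + s for a, s in zip(aud_cmd, som_cmd)]       # step D
    return [p + RATE * v for p, v in zip(positions, velocity)]

pos = articulator_update(
    positions=[0.0, 0.0],
    aud_target=[700.0, 1200.0], aud_fb=[650.0, 1250.0],  # auditory error
    som_target=[0.2, -0.1],     som_fb=[0.0, 0.0],       # somatosensory error
)
print(pos)  # the articulators move toward both targets
```

Summing the two commands reflects the document's point that auditory and somatosensory channels jointly drive the articulators rather than one reference frame being chosen exclusively.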

By adopting the above technical scheme, the present invention achieves the following beneficial effects: it reduces the number of training iterations the model needs to produce a pronunciation and improves pronunciation accuracy.

Brief Description of the Drawings

Figure 1 is a block diagram of the DIVA model.

Figure 2 is a block diagram of the vocal tract action knowledge base.

Figure 3 is a block diagram of the improved DIVA model.

Detailed Description

The technical scheme of the invention is described in detail below with reference to the accompanying drawings.

Figure 2 shows the block diagram of the vocal tract action knowledge base model. The knowledge base contains sensory-motor knowledge, speaking skills, and a mental syllabary for comparison.

The workflow of the vocal tract action knowledge base model is divided into two stages: speech production and categorical perception.

The workflow of the speech production stage is as follows. Activation of the vocal tract action knowledge base model begins with the activation of the phoneme representation of a speech item; in this speech mode, syllables are processed one at a time. For high-frequency syllables, the model has already acquired the motor plan: the plan is first activated through the speech map, and the vocal tract action corresponding to each syllable then produces a motor-neuron activation pattern. Subsequent neuromuscular processing drives the movement of the articulators and allows a speech signal to be generated through the articulatory-auditory model. The previously acquired sensory states of the same syllable are activated simultaneously through the speech map. In Figure 3, state TS corresponds to state ES, producing the current syllable. When there is a significant discrepancy, auditory and somatosensory error signals are passed through the speech map and used to change the motor plan of a new or updated syllable. For low-frequency syllables, the motor-plan module is activated, and a motor plan produced, by activating the phonetic plan of a similar syllable through the speech map.
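The high-frequency/low-frequency branch of the production workflow can be sketched as a motor-plan lookup; the shared-phoneme similarity measure used here is a stand-in assumption, not taken from the patent.

```python
# Illustrative sketch of the production workflow: a high-frequency
# syllable retrieves its stored motor plan directly; a low-frequency
# syllable borrows the plan of the most similar stored syllable.
# The similarity measure (position-wise shared phonemes) is an assumption.

STORED_PLANS = {           # syllable -> motor plan (opaque token here)
    "ba": "plan_ba",
    "da": "plan_da",
    "gu": "plan_gu",
}

def similarity(a, b):
    """Count of positions where the two syllables share a phoneme."""
    return sum(x == y for x, y in zip(a, b))

def motor_plan(syllable):
    if syllable in STORED_PLANS:               # high-frequency: direct hit
        return STORED_PLANS[syllable]
    best = max(STORED_PLANS, key=lambda s: similarity(s, syllable))
    return STORED_PLANS[best]                  # low-frequency: similar plan

print(motor_plan("ba"))   # plan_ba: stored high-frequency syllable
print(motor_plan("du"))   # plan_da: shares the onset /d/
```

In the full model the borrowed plan would then be corrected by the error signals described above; here the sketch only shows the retrieval step.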

The workflow of the categorical perception stage is as follows. Speech perception begins with an external sound signal. If phoneme recognition is intended, it can only be achieved for signals of high-frequency syllables. To this end, the signal is preprocessed in peripheral and subcortical areas, loading short-term memory into the external auditory state. The resulting neural activation pattern is then passed to the trained state maps, leading first to co-activation of neuron regions at the level of the speech map and then to co-activation of specific neurons at the level of the phoneme map; the former represents the syllable's pronunciation, the latter its phonology. This neural pathway through the speech map, also called the dorsal stream of speech perception, also co-activates a motor plan for high-frequency syllables. A second stream in speech perception, the ventral stream, links auditory activation patterns directly with speech-processing modules. The dorsal stream is assumed to be crucial during speech acquisition, whereas the ventral stream later dominates adult speech perception.

The improved DIVA model of the present invention, shown in Figure 3, adds a vocal tract action knowledge base module acting on the simulated articulators, together with a perturbation module.

The model was trained on 200 instances, using different initializations of the mappings from the speech map to the phoneme, sensory, and motor-plan maps. The "knowledge" each instance acquires during the babbling and imitation phases is stored in bidirectional neural mappings between the speech map and the other maps. A neuron in the speech map represents:

(a) the realization of a vowel or vowel-consonant phoneme state;
(b) a motor-plan state;
(c) an auditory state;
(d) a somatosensory state.

The training experiments include a babbling phase and an imitation phase (as reflected in the DIVA model). During the babbling phase, the model associates motor-plan states with auditory states. On this basis, the model can generate motor plans during the imitation phase.
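The babbling-then-imitation idea can be sketched as follows, with a toy linear forward model standing in for the articulatory-auditory model (an assumption; the real mapping is nonlinear and high-dimensional):

```python
# Sketch of the babbling/imitation idea: during babbling, random motor
# states are pushed through a stand-in forward model and the resulting
# (auditory, motor) pairs are stored; during imitation, a heard sound is
# matched to the nearest stored auditory state to recover a motor plan.
import random

def forward_model(motor):              # stand-in articulatory-auditory model
    return 300.0 + 600.0 * motor       # motor state in [0,1] -> "formant" in Hz

random.seed(0)                         # babbling phase: random exploration
associations = []                      # stored (auditory, motor) pairs
for _ in range(200):
    m = random.random()
    associations.append((forward_model(m), m))

def imitate(heard):                    # imitation phase: nearest auditory match
    _, motor = min(associations, key=lambda pair: abs(pair[0] - heard))
    return motor

recovered = imitate(700.0)
print(abs(forward_model(recovered) - 700.0) < 30.0)  # True: a close match exists
```

This captures the essential asymmetry: babbling learns the forward (motor-to-auditory) association, and imitation inverts it by lookup rather than by analytic inversion.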

During the imitation phase, phoneme regions emerged at the level of the speech map. After these initial experiments, we proceeded to more complex model languages, including vowel (V), consonant-vowel (CV), and consonant-consonant-vowel (CCV) syllables based on a larger consonant set. Training again revealed a strict ordering of the speech map with respect to phonetic properties, phonotactic properties, and the consonant type of clusters.

To examine the workflow and pronunciation performance of the improved DIVA model, we conducted the following learning experiments with it:

1. a five-vowel system /i, e, a, o, u/;
2. a small consonant system (simple syllables composed of the voiced stops /b, d, g/ and the five previously acquired vowels);
3. a small language model comprising the five-vowel system, the voiced and voiceless stops /b, d, g, p, t, k/, the nasals /m, n/, the lateral /l/, and three syllable types (V, CV, CCV);
4. a test on the 200 most common English syllables, using the test standard for a six-year-old child.
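For the third experiment, the syllable inventory implied by the listed segments and syllable types can be enumerated; which consonant clusters are phonotactically legal is a simplifying assumption here (any stop followed by /l/), since the source does not list them.

```python
# Enumerating the syllable inventory of the third experiment: five vowels,
# the stops /b,d,g,p,t,k/, nasals /m,n/, lateral /l/, and syllable types
# V, CV, CCV. Allowing only stop+/l/ clusters is a simplifying assumption.

VOWELS = ["i", "e", "a", "o", "u"]
CONSONANTS = ["b", "d", "g", "p", "t", "k", "m", "n", "l"]
STOPS = ["b", "d", "g", "p", "t", "k"]

v_syllables   = list(VOWELS)                                   # V
cv_syllables  = [c + v for c in CONSONANTS for v in VOWELS]    # CV
ccv_syllables = [s + "l" + v for s in STOPS for v in VOWELS]   # CCV (assumed)

print(len(v_syllables), len(cv_syllables), len(ccv_syllables))  # 5 45 30
```

Even this small language yields 80 candidate syllables under the assumed phonotactics, which gives a sense of the training set size the experiments involve.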

Step 1: construct the improved DIVA neuro-computational speech model by adding, to the DIVA neuro-computational speech model, a vocal tract action knowledge base that acts on the simulated articulators.

Step 2: collect the formant frequencies of pronunciation units as the input of the DIVA neuro-computational speech model.

Step 3: map the input of the DIVA neural network model into the speech map, and initialize all phoneme units in the speech map to the inactive state.

Step 4: input the formant frequencies of an arbitrary pronunciation unit and train the improved DIVA neuro-computational speech model:

if the speech map contains a phoneme unit whose formant frequencies match those of the input pronunciation unit, the simulated articulators produce the input pronunciation unit directly through feedforward control;

otherwise, the simulated articulators learn to produce the input pronunciation unit through feedback control.

In step 4, the feedback-control production of the input pronunciation unit by the simulated articulators is implemented as follows:

Step A: apply a perturbed pronunciation unit to the simulated articulators, and collect the auditory feedback information and somatosensory feedback information of the DIVA model; the somatosensory error map obtains a somatosensory feedback command from the somatosensory target region and the somatosensory feedback information.

Step B: map the auditory feedback information of the DIVA model and the perturbed pronunciation unit into the auditory state map.

Step C: the auditory error map obtains an auditory feedback command from the input of the DIVA neural network model and the auditory feedback information of the simulated articulators.

Step D: the articulator velocity and position map obtains the training signal for the simulated articulators from the somatosensory feedback command and the auditory feedback command, and the simulated articulators pronounce under the action of the vocal tract action knowledge base.

Mapping the perturbed pronunciation unit into the auditory state map serves to further refine that map; the addition of the vocal tract action knowledge base enriches the actions of the simulated articulators, thereby improving pronunciation accuracy and the learning efficiency of the whole DIVA model.
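The perturbation-and-revision idea can be sketched as follows; the "pull halfway back to target" correction is an invented placeholder for the revised auditory feedback described above, and all numbers are illustrative.

```python
# Sketch of the perturbation idea: a disturbance factor shifts the
# articulation, the resulting auditory feedback is corrected back toward
# the target, and the corrected (revised) feedback is added to the
# auditory state map so later training can use it. All values illustrative.

auditory_state_map = {}    # unit name -> list of recorded feedback frames

def revise_feedback(target, observed):
    """Revised feedback: the observed frame pulled halfway to the target
    (a placeholder correction rule, not the patent's)."""
    return tuple((t + o) / 2.0 for t, o in zip(target, observed))

def perturb_and_record(unit, target, disturbance):
    observed = tuple(t + d for t, d in zip(target, disturbance))
    revised = revise_feedback(target, observed)
    auditory_state_map.setdefault(unit, []).append(revised)
    return revised

revised = perturb_and_record("a", (700.0, 1200.0), (60.0, -40.0))
print(revised)                        # (730.0, 1180.0)
print(len(auditory_state_map["a"]))   # 1
```

Each perturbation thus contributes one extra, corrected data point to the auditory state map, which is the mechanism by which the method enriches training data without extra productions.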

The modified model integrates sensorimotor and cognitive aspects. A serious problem for phonological or sensorimotor models of speech processing is that the development of the phoneme map during speech acquisition is not modeled. We improve on this by introducing a feasible solution: directly coupling the behavioral knowledge base and the mental lexicon at the beginning of speech acquisition, without explicitly introducing a speech map. As a result, our modified DIVA model has lower pronunciation latency and higher accuracy than the original model.

Compared with the prior art, the present invention has the following significant advantages. Based on the DIVA neural network model, it describes and simulates the relevant functions of pronunciation at the neuroanatomical and neurophysiological levels; adding a perturbation module enables the model to produce pronunciations more efficiently and accurately; and adding the vocal tract action knowledge base module enriches the original vocal tract configuration of the DIVA model, reduces the number of training iterations needed to produce a pronunciation, and improves pronunciation accuracy. Combined with a brain-computer interface (BCI), the DIVA neural network model can ultimately be used to construct a neuro-computational model of Chinese speech production and acquisition that conforms to the phonation rules of Chinese and has genuine physiological significance, laying the theoretical and practical foundation for a "thought reader" reflecting Chinese speakers' thinking.

Claims (2)

1. A pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base, characterized in that it comprises the following steps:
Step 1: construct the improved DIVA neuro-computational speech model by adding to the DIVA neuro-computational speech model a vocal tract action knowledge base acting on the simulated articulators,
wherein activation in the vocal tract action knowledge base begins with activation of the phoneme representation of a speech item; when a high-frequency syllable is processed, its motor plan has already been acquired: the speech map activates the motor plan, the vocal tract action corresponding to each syllable produces a motor-neuron activation pattern, and neuromuscular processing drives the movement of the articulators and allows a speech signal to be generated through the articulatory-auditory model; when a low-frequency syllable is processed, the motor plan is activated by activating, through the speech map, the phonetic plan of a similar syllable;
Step 2: collect the formant frequencies of pronunciation units as the input of the DIVA neuro-computational speech model;
Step 3: map the input of the DIVA neural network model into the speech map, initializing all phoneme units in the speech map to the inactive state;
Step 4: input the formant frequencies of an arbitrary pronunciation unit and train the improved DIVA neuro-computational speech model:
when the speech map contains a phoneme unit whose formant frequencies are identical to those of the input pronunciation unit, the simulated articulators produce the input pronunciation unit directly through feedforward control;
otherwise, the simulated articulators learn through feedback control to produce the input pronunciation unit.
2. The pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base according to claim 1, characterized in that the production of the input pronunciation unit by the simulated articulators through feedback control in step 4 is implemented as follows:
Step A: apply a perturbed pronunciation unit to the simulated articulators and collect the auditory feedback information and somatosensory feedback information of the DIVA model; the somatosensory error map obtains a somatosensory feedback command from the somatosensory target region and the somatosensory feedback information;
Step B: map the auditory feedback information of the DIVA model and the perturbed pronunciation unit into the auditory state map;
Step C: the auditory error map obtains an auditory feedback command from the input of the DIVA neural network model and the auditory feedback information of the simulated articulators;
Step D: the articulator velocity and position map obtains the training signal for the simulated articulators from the somatosensory feedback command and the auditory feedback command, and the simulated articulators pronounce under the action of the vocal tract action knowledge base.
CN201310274341.3A 2013-07-02 2013-07-02 Pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base Expired - Fee Related CN103310272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310274341.3A CN103310272B (en) 2013-07-02 2013-07-02 Pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base


Publications (2)

Publication Number Publication Date
CN103310272A CN103310272A (en) 2013-09-18
CN103310272B 2016-06-08

Family

ID=49135459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310274341.3A Expired - Fee Related Pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base CN103310272B (en)

Country Status (1)

Country Link
CN (1) CN103310272B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104605845B (en) * 2015-01-30 2017-01-25 南京邮电大学 A Method of EEG Signal Processing Based on DIVA Model
CN104679249B (en) * 2015-03-06 2017-07-07 南京邮电大学 A kind of Chinese brain-computer interface implementation method based on DIVA models
CN107368895A (en) * 2016-05-13 2017-11-21 扬州大学 A kind of combination machine learning and the action knowledge extraction method planned automatically

Citations (3)

Publication number Priority date Publication date Assignee Title
US5586033A (en) * 1992-09-10 1996-12-17 Deere & Company Control system with neural network trained as general and local models
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN102880906A (en) * 2012-07-10 2013-01-16 南京邮电大学 Chinese vowel pronunciation method based on DIVA nerve network model


Also Published As

Publication number Publication date
CN103310272A (en) 2013-09-18

Similar Documents

Publication Publication Date Title
Kröger et al. Towards a neurocomputational model of speech production and perception
Kröger et al. Associative learning and self-organization as basic principles for simulating speech acquisition, speech production, and speech perception
Murakami et al. Seeing [u] aids vocal learning: Babbling and imitation of vowels using a 3D vocal tract model, reinforcement learning, and reservoir computing
Prom-on et al. Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach
Prom-on et al. Training an articulatory synthesizer with continuous acoustic data.
Serkhane et al. Infants’ vocalizations analyzed with an articulatory model: A preliminary report
CN103310272B (en) Pronunciation method based on a DIVA neural network model improved with a vocal tract action knowledge base
Guenther et al. A model of cortical and cerebellar function in speech
Kröger et al. Modeling speech production using the Neural Engineering Framework
Kröger et al. Neural modeling of speech processing and speech learning
Kröger et al. Articulatory synthesis of speech and singing: State of the art and suggestions for future research
Howard et al. A computational model of infant speech development
CN103310273A (en) Method for articulating Chinese vowels with tones and based on DIVA model
Saltzman et al. The distinctions between state, parameter and graph dynamics in sensorimotor control and coordination
Bekolay Biologically inspired methods in speech recognition and synthesis: closing the loop
Zhao et al. Audiovisual synthesis of exaggerated speech for corrective feedback in computer-assisted pronunciation training
Kröger Modeling dysfunctions in the coordination of voice and supraglottal articulation in neurogenic speech disorders
Guenther et al. A neural model of speech production
Delić et al. Toward more expressive speech communication in human-robot interaction
Lapthawan et al. Estimating underlying articulatory targets of Thai vowels by using deep learning based on generating synthetic samples from a 3D vocal tract model and data augmentation
Xu et al. Artificial vocal learning guided by speech recognition: What it may tell us about how children learn to speak
Xu et al. Degrees of freedom in prosody modeling
Shaobai et al. Research on the mechanism for phonating stressed English syllables based on DIVA model
Kim et al. Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints.
Kröger et al. The LS Model (Lexicon-Syllabary Model)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130918

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000207

Denomination of invention: Articulation method of Directions Into of Articulators (DIVA) neural network model improved on basis of track action knowledge base

Granted publication date: 20160608

License type: Common License

Record date: 20161109

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000207

Date of cancellation: 20180116

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160608

Termination date: 20190702