CN107808659A - Intelligent sound signal type recognition system device - Google Patents
- Publication number
- CN107808659A CN107808659A CN201711253194.6A CN201711253194A CN107808659A CN 107808659 A CN107808659 A CN 107808659A CN 201711253194 A CN201711253194 A CN 201711253194A CN 107808659 A CN107808659 A CN 107808659A
- Authority
- CN
- China
- Prior art keywords
- voice
- signal
- input
- feature
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An intelligent voice signal pattern recognition system device comprises a frame body 10 provided with a cavity. Arranged in the frame body 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless walkie-talkie 12 and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice signal feature extraction unit 22 and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1, the collected signals are processed by the voice recognition module 2, data signals are saved in the memory 33, and the human-computer interaction workflow and the visualized output of results are shown on the display screen 8, making it more convenient for people to recognize voice signals.
Description
Technical Field
The invention discloses an intelligent voice signal pattern recognition system device, belonging to the technical field of intelligent electronic products; specifically, it is an intelligent voice signal pattern recognition system device integrating a voice acquisition module, a voice recognition module, a control system and a loudspeaker.
Background Art
In daily life there are all kinds of sound signals: speech from people's conversations, the noise of operating machinery, music playback, car horns, and so on. Sound signals fill almost the entire living environment, and people sometimes want to know accurately which objects produced the sounds in a given group of voice signals. For common sounds, people can often tell what object produced them, but when several sources sound at the same time, especially several sources of the same kind, or when the recording environment is noisy, it becomes difficult to tell which sound came from which source. For example, in a recording of a multi-person debate with many speakers, it is hard to tell by listening which remarks were made by which debater. People therefore often need a device capable of recognizing voices.
Before the present invention, some voice recognition products already existed on the market, such as voice input software, but most of them recognize the words or letters in speech, or perform simple one-to-one matching of a single voice. Some let a user speak to a smartphone or similar product, which recognizes the semantics and completes simple tasks such as dialing or searching, but they cannot discriminate between voice characteristics and cannot accurately identify which person or object uttered similar voices or the same words. They are therefore not convenient for flexible use.
Summary of the Invention
To overcome the above technical shortcomings, the object of the present invention is to provide an intelligent voice signal pattern recognition system device that can conveniently record voice signals, extract their characteristic parameters, and use the stored signals to intelligently recognize, classify and extract unknown voice signals.
To achieve the above object, the technical scheme adopted by the present invention is as follows. The device comprises a frame body 10 provided with a cavity. Arranged in the frame body 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless walkie-talkie 12 and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice signal feature extraction unit 22 and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1 and the collected signals are processed by the voice recognition module 2; data signals are saved in the memory 33; the human-computer interaction workflow and the visualized output of results are shown on the display screen 8; the loudspeaker 35 gives voice prompts for the operation steps and announces recognition results; the network module 31 connects the device to an Internet cloud platform; the central processing unit 3 performs program control and data computation for the whole system; the wireless signal transceiver 4 receives and transmits the radio signals produced by the wireless walkie-talkie 12, smartphones and the network module 31 and connects the device wirelessly to the Internet; and the memory card 32 reads previously recorded external voice data into the device's database.
According to the invention, the voice input unit 20 provides two modes, a "voice enrollment mode" and a "voice test mode". Voice can be input through any of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 or a smartphone provided via the voice acquisition module 1. In "voice enrollment mode" the voice input unit 20 records only one person or one object at a time, and the enrolled voice is an audio clip of 5 to 30 seconds. The invention adopts a multi-state voice enrollment strategy: the enrolled voice may contain a combination of normal speech, singing, and high-, mid- or low-pitched voice. The display screen 8 shows the voice waveform and a completion progress bar in real time. After enrollment the data must be labeled; labeling is done manually, e.g. after collecting Zhang San's voice, the user enters "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it. The enrolled voice is stored in the memory 33. In "voice test mode" the device collects test voice through one or more of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 and a smartphone together; test voice collection is real-time, with no restriction on the number of people, objects or duration.
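The enrollment constraints above (one subject per clip, a 5 to 30 second recording, a manual text label) can be sketched as a simple record type. This is an illustrative assumption, not code from the patent; the name EnrollmentRecord and the 16 kHz sampling rate are invented for the example:

```python
from dataclasses import dataclass, field

SAMPLE_RATE = 16000  # assumed sampling rate; the patent does not specify one


@dataclass
class EnrollmentRecord:
    """One enrolled voice clip with its manual label, e.g. "Zhang San's voice"."""
    label: str
    samples: list = field(default_factory=list)

    def duration_seconds(self):
        return len(self.samples) / SAMPLE_RATE

    def is_valid(self):
        # The patent requires an enrolled clip of 5 to 30 seconds.
        return 5.0 <= self.duration_seconds() <= 30.0
```

A clip shorter than 5 seconds would be rejected and re-recorded, matching the delete-and-re-enroll flow described later in the usage procedure.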
According to the invention, the voice input unit 20 is connected with the voice acquisition module 1; the microphone 11 is connected to the voice acquisition module 1 by an audio cable, and the wireless walkie-talkie 12 is connected to the voice acquisition module 1 by radio.
According to the invention, the voice acquisition module 1 can also take voice signal input from a smartphone. The phone is paired with the voice acquisition module 1 via Bluetooth, infrared, WiFi or by scanning a QR code, so that the phone effectively serves as a wireless microphone, which is more convenient for multi-person voice interaction.
According to the invention, the voice preprocessing unit 21 converts the voice signal collected by the voice acquisition module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
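Part of the conventional preprocessing chain described above can be sketched as follows; pre-emphasis, framing and Hamming windowing are shown, while noise removal and endpoint detection are omitted. The parameter values (0.97 pre-emphasis coefficient, 400-sample frames, 160-sample hop) are common choices assumed for illustration, not values given in the patent:

```python
import math


def preemphasize(samples, alpha=0.97):
    """Pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def frame_and_window(samples, frame_len=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

Each windowed frame would then be passed to the feature extraction stage described next.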
According to the invention, the voice signal feature extraction unit 22 extracts from the original voice signal the main characteristic parameters reflecting the essence of the speech, forming a feature vector x_i = (x_i1, x_i2, …, x_ij, …, x_in)^T, where x_ij denotes the j-th voice feature value of the i-th object or person. The feature parameters are preferably extracted with the Mel-frequency cepstral coefficient (MFCC) method; acoustic features can also be obtained with the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform, etc. The feature vectors obtained after extraction are automatically saved into the pattern-class database; all voice features of one object or person correspond to one pattern class. After the voices of N persons or objects have been enrolled, N pattern classes are obtained; if each pattern class has n feature parameters, an n-dimensional feature space is formed, and the labeled feature signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ χ = R^n is the voice feature signal of the i-th enrolled object or person and y_i ∈ Y = {1, 2, …, N} denotes the i-th person or object, N being the numeric index of the N-th person or object. The labeled voice feature data form the pattern-class database, stored in the memory 33 of the invention.
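Building the labeled feature set D = {(x_i, y_i)} can be sketched as below. The real front end would compute MFCCs; extract_features here computes three toy statistics (mean, energy, zero-crossing rate) instead, purely so the example stays self-contained, and is not the patent's feature extractor:

```python
def extract_features(samples):
    """Toy stand-in for the MFCC front end: three scalar features."""
    n = len(samples)
    mean = sum(samples) / n
    energy = sum(s * s for s in samples) / n
    # zero-crossing rate over consecutive sample pairs
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / (n - 1)
    return [mean, energy, zcr]


def build_pattern_database(enrollments):
    """enrollments: list of (label, samples) pairs.
    Returns the labeled feature set D as a list of (x_i, y_i)."""
    return [(extract_features(samples), label) for label, samples in enrollments]
```

Each entry of the returned list is one pattern class sample, mirroring the set D and the memory-33 database described above.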
According to the invention, the feature matching and classification unit 23 employs an intelligent multi-class classifier whose learning algorithm is an improved neural network classification algorithm. The enrolled and labeled voice feature signal set serves as training data; the network model learns from the training data to obtain classification rules, completing the training of the classifier. The trained classifier is then used to intelligently classify and recognize unknown test voice signals. After features are extracted from a test signal, the invention automatically performs feature matching: the extracted feature parameters of the test voice signal are matched in real time against the enrolled, labeled sample voice feature parameters in the memory 33, the similarity between the test voice signal and every enrolled sample voice signal is computed, and the test voice signal is assigned to the sample signal pattern class with the highest similarity. Finally the invention outputs the recognition result, a report such as "This is XXX's voice". For example, if the voice feature signal of Zhang San has been stored and Zhang San speaks or sings to the device, the device automatically determines that Zhang San's test voice feature parameters are most similar to his enrolled, labeled voice signal and, after recognition, automatically outputs "This is Zhang San's voice".
According to the invention, the multi-class classifier uses a multi-layer artificial neural network. One end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layer. The input layer receives external input signals and forwards them to all neurons of the hidden layer; after the hidden layer computes, the results are passed to the output layer, which receives the signals from the hidden layer and, after computation, outputs the classification result, i.e. the recognition result. The preferred number of hidden layers in the invention is 1 to 200.
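A forward pass through such a network, with a single hidden layer for simplicity, might look like the sketch below; names such as w_ih and w_ho are illustrative, and the formulas match steps 2 and 3 of the training procedure that follows (sigmoid hidden units, linear output units with subtracted thresholds):

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_ih, a, w_ho, b):
    """One forward pass: input x -> hidden H -> output O.
    w_ih[j][i]: input-to-hidden weights, a[j]: hidden thresholds,
    w_ho[k][j]: hidden-to-output weights, b[k]: output thresholds."""
    H = [sigmoid(sum(w_ih[j][i] * x[i] for i in range(len(x))) - a[j])
         for j in range(len(a))]
    O = [sum(w_ho[k][j] * H[j] for j in range(len(H))) - b[k]
         for k in range(len(b))]
    return H, O
```

The index of the largest output O_k then names the recognized pattern class.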
According to the invention, the training process of the improved artificial neural network classification algorithm comprises steps 1 to 7.
Step 1: network initialization. The algorithm database is continually updated according to the number of enrolled voice signals; when the voice signals of N objects have been enrolled, N pattern classes are formed and the sample space (X, Y) is obtained, the i-th sample group being (X_i, Y_i), where X_i is the set of feature vectors extracted for the i-th object and Y_i is the label of the i-th object. From the system input-output sequence (X, Y), determine the number of input-layer nodes n, hidden-layer nodes l and output-layer nodes m, where n is the number of feature values obtained in input signal feature extraction, m is the number of stored voice pattern classes, and a reference value for l is l = √(n + m) + a, where a ranges from 0 to 10 and is determined automatically by the model. Initialize the connection weights ω_ij between input-layer and hidden-layer neurons and ω_jk between hidden-layer and output-layer neurons, initialize the hidden-layer thresholds a and the output-layer thresholds b, and set the learning rate η and the neuron activation function.
Step 2: compute the hidden-layer output. From the input vector X, the input-to-hidden connection weights ω_ij and the hidden-layer thresholds a, compute the hidden-layer output H; the output of the j-th hidden node is H_j = f(Σ_{i=1..n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden nodes and f is the hidden-layer activation function. Many activation functions exist; the invention preferably uses f(x) = (1 + e^(−x))^(−1).
Step 3: compute the output-layer output. From the hidden-layer output H, the hidden-to-output connection weights ω_jk and the output-layer thresholds b, compute the output-layer output O; the output of the k-th output node is O_k = Σ_{j=1..l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output nodes, b_k is the threshold of the k-th output node and H_j is the output of the j-th hidden node.
Step 4: compute the prediction error. From the network's predicted output O and the expected output Y (ground truth), compute the total prediction error e from the per-node errors e_k = Y_k − O_k, where e_k is the error produced at the k-th output node, k = 1, 2, …, m.
Step 5: update the weights. Using the total prediction error e, update the connection weights ω_jk and ω_ij: ω_jk⁺ = ω_jk + η·H_j·E_k, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and E_k is the sensitivity of the total network error to output node k; and ω_ij⁺ = ω_ij + η·H_j·(1 − H_j)·x_i·Σ_{k=1..m} ω_jk·E_k, i = 1, 2, …, n, j = 1, 2, …, l.
Step 6: update the thresholds. Using the total prediction error e, update the hidden-layer thresholds a and the output-layer thresholds b: a_j⁺ = a_j + η·H_j·(1 − H_j)·Σ_{k=1..m} ω_jk·E_k, j = 1, 2, …, l; b_k⁺ = b_k + η·E_k, k = 1, 2, …, m.
Step 7: determine whether the algorithm iteration has converged; if not, return to step 2. In the invention the iteration preferably ends when the minimum error of 0.001 is reached.
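Steps 2 through 6 can be sketched as a runnable training loop. This is a minimal single-hidden-layer illustration under stated assumptions: the names (train_bp, w_ih, w_ho), the random initialization and the squared-error stopping rule are invented for the example, and the threshold-update signs follow the convention used here that thresholds are subtracted in the forward pass:

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_bp(samples, n, l, m, eta=0.1, max_iters=1000, min_error=0.001):
    """samples: list of (x, y) with len(x) == n and one-hot target y of length m.
    Trains by stochastic backpropagation; returns the learned parameters."""
    rnd = random.Random(0)
    w_ih = [[rnd.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(l)]
    w_ho = [[rnd.uniform(-0.5, 0.5) for _ in range(l)] for _ in range(m)]
    a = [rnd.uniform(-0.5, 0.5) for _ in range(l)]
    b = [rnd.uniform(-0.5, 0.5) for _ in range(m)]
    for _ in range(max_iters):
        total = 0.0
        for x, y in samples:
            # Step 2: hidden output H_j = f(sum_i w_ij x_i - a_j)
            H = [sigmoid(sum(w_ih[j][i] * x[i] for i in range(n)) - a[j])
                 for j in range(l)]
            # Step 3: output O_k = sum_j H_j w_jk - b_k
            O = [sum(w_ho[k][j] * H[j] for j in range(l)) - b[k]
                 for k in range(m)]
            # Step 4: per-node error E_k = Y_k - O_k
            E = [y[k] - O[k] for k in range(m)]
            total += sum(ek * ek for ek in E)
            # Step 5/6: update input-to-hidden weights and hidden thresholds
            for j in range(l):
                g = H[j] * (1.0 - H[j]) * sum(w_ho[k][j] * E[k] for k in range(m))
                for i in range(n):
                    w_ih[j][i] += eta * g * x[i]
                a[j] -= eta * g  # threshold is subtracted in step 2
            # Step 5/6: update hidden-to-output weights and output thresholds
            for k in range(m):
                for j in range(l):
                    w_ho[k][j] += eta * H[j] * E[k]
                b[k] -= eta * E[k]  # threshold is subtracted in step 3
        # Step 7: stop when the accumulated squared error is small enough
        if total < min_error:
            break
    return w_ih, a, w_ho, b
```

After training, a test feature vector is pushed through the same forward computation and the output node with the largest value names the recognized pattern class.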
According to the invention, the voice acquisition module 1 has a built-in voice acquisition card for collecting and processing the acquired voice signals.
According to the invention, the fixed recorder 13 uses a windproof microphone.
According to the invention, the display screen 8 is a backlit touch screen or an LED display.
In the present invention, several fixed recorders 13 may be provided, arranged on the housing of the device, to strengthen voice recording.
The invention stores enrolled and labeled voice signals long-term; any voice signal stored in the voice pattern-class database can be retrieved at any time and compared with an unknown test voice for recognition.
The invention is used as follows: first turn on the power switch 5; the system then runs automatically, the display screen 8 lights up and shows the operation interface, and the user can choose between the two functions "voice enrollment mode" and "voice test mode".
(1) When voice enrollment is selected, the central processing unit 3 directs the voice input unit 20 into "voice enrollment mode", and the display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice enrollment mode, please speak". Voice can be input through any of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 or a smartphone provided via the voice acquisition module 1. To ensure that the invention can accurately recognize and quantify the voice features of the enrolled subject, only one person or object can be enrolled at a time in this mode. Because the sound signals of the same person differ somewhat in character between speaking and singing, the invention adopts a multi-state voice enrollment strategy to improve recognition accuracy: the enrolled voice may combine normal speech, singing, high-, mid- or low-pitched voice and other states. The recording lasts 5 to 30 seconds, and the display 8 shows the real-time waveform and a completion progress bar. If the recording is unsatisfactory it can be deleted and re-recorded. After enrollment the data must be labeled manually, e.g. after collecting Zhang San's voice, the user enters "Zhang San's voice" in the dialog box on the display screen 8 and saves it; the enrolled voice is stored in the memory 33 of the invention.
(2) After the voice signal has been enrolled, the control system of the invention automatically sends the labeled voice signal to the voice preprocessing unit 21, which converts the voice signal collected by the voice acquisition module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
(3) The control system of the invention automatically sends the preprocessed voice signal to the signal feature extraction unit 22, which extracts from it the characteristic parameters reflecting the essence of the speech, yielding the feature vector x_i. The feature parameters are preferably extracted with the MFCC method; the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform, etc. can also be used to obtain acoustic features. The resulting feature vectors are automatically saved into the pattern-class library; all voice features of one person correspond to one pattern class, so after N individual voices are enrolled, N pattern classes are obtained, each with n feature parameters, giving a database in which each person corresponds to one voice signal pattern class. All data are stored in the memory 33 of the invention. This completes the voice enrollment mode.
(4) After enrollment, a voice test can be run. To do so, simply select "voice test mode" in the operation interface of the display screen 8; the central processing unit 3 directs the voice input unit 20 into "voice test mode", and the display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice test in progress…". No further user action is needed; the invention collects test voice through one or more of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 and a smartphone in the voice acquisition module 1. Test voice collection is real-time, with no limit on duration or the number of people.
(5) For voice data collected in "voice test mode", the system automatically preprocesses the test voice signal and extracts its features: the collected test signal is converted into an electrical signal and, after conventional filtering, noise removal, windowing and endpoint detection, its signal features are extracted.
(6)测试信号提取特征后，本发明会自动进行特征匹配，将提取的测试语音信号的特征参数实时地与本发明存储器33中已录入的已标记好的样本语音特征参数进行特征匹配，并计算测试语音信号与所有已录入的原始语音信号的相似度，并把测试语音信号分到与其相似度最高的那一模式类别中，最后本发明向外界输出，“这是XXX的声音”类似的报告，例如，如果本发明已存储了张三的语音特征信号，当张三对着本发明说话或唱歌时，本发明经过识别，会自动输出“这是张三的声音”。(6) After features are extracted from the test signal, the invention automatically performs feature matching: the extracted feature parameters of the test speech signal are matched in real time against the labeled sample feature parameters already stored in the memory 33, the similarity between the test signal and every recorded original speech signal is computed, and the test signal is assigned to the pattern class with the highest similarity. Finally the invention outputs a report such as "This is XXX's voice". For example, if Zhang San's voice feature signal has been stored, then when Zhang San speaks or sings to the device, it recognizes the voice and automatically outputs "This is Zhang San's voice".
当本发明在公共场合测试时，由于测试环境中，同一时间段可能存在多个对象同时说话，即采集到的语音信号是宽带混叠的信号，为防止本发明对此时采集的语音信号特征提取时出错，本发明采用的策略在于，运用智能算法，先匹配和识别出单个人说话时的语音特征参数并进行标识和存储，然后系统再对共同说话时的语音信号进行自动筛选和分离，最后输出识别结果并报告“现在是张三、李四、王五……共同的声音”类似的提示，并提示存在XX个语音未能识别，关闭系统时按下电源关闭键6。When the invention is tested in a public place, several subjects may speak at the same time, so the collected speech signal is a wideband aliased mixture. To avoid errors when extracting features from such a signal, the strategy adopted by the invention is to use an intelligent algorithm to first match and recognize the speech feature parameters of each person speaking alone, labeling and storing them, and then automatically screen and separate the mixed speech signals. Finally it outputs the recognition result with a prompt such as "These are the combined voices of Zhang San, Li Si, Wang Wu…", and indicates that XX voices could not be recognized. To shut down the system, press the power-off key 6.
本发明还设计了，系统装置还可以向人们输出对多人交流环境下的识别结果清单，包含测试环境下有多少人或对象在现场说话的数量，以及筛选并播放从多人同时说话的录音中识别分离出每个人所讲的内容，而过滤掉其他人的声音和环境音。The invention is further designed so that the system can output a list of recognition results for a multi-speaker environment, including the number of people or objects speaking on site, and can screen the recording of several people speaking at once to separate and play back what each person said while filtering out the other voices and ambient sound.
当测试语音信号中出现了本发明未存储的样本语音信号特征时，本发明会自动记录未知的该语音信号特征，以提醒人们是否标记并存储该对象的语音信号。When a test speech signal contains voice features of a sample not stored in the invention, the unknown voice features are automatically recorded, and the user is prompted to decide whether to label and store that subject's voice signal.
附图说明Description of drawings
图1为本发明的结构示意图。Fig. 1 is a structural schematic diagram of the present invention.
图2为本发明的系统框架图。Fig. 2 is a system frame diagram of the present invention.
图3为本发明的多层人工神经网络示意图。Fig. 3 is a schematic diagram of a multi-layer artificial neural network of the present invention.
图4为本发明的语音信号改进的神经网络分类算法流程图。Fig. 4 is a flow chart of the improved neural network classification algorithm of the speech signal of the present invention.
具体实施方式Detailed Description of the Embodiments
附图1为本发明的一个实施例，结合附图1~附图4具体说明本实施例，包含有框体10，框体10设置有腔体，在框体10中设置有语音采集模块1、语音识别模块2、中央处理器3、无线信号收发装置4、显示屏8、存储器33、网络模块31、内存卡32、扬声器35和电源9，语音采集模块1包含有话筒11、无线对讲机12和固定录音器13，语音识别模块2包含有语音输入单元20、语音预处理单元21、语音信号特征提取单元22、特征匹配判别分类单元23，语音信号由语音采集模块1采集，采集到的信号由语音识别模块2处理，数据信号由存储器33保存，人机交互的操作流程以及结果的输出的可视化由显示屏8显示，扬声器35设置为对操作步骤进行语音提示及播报识别结果，网络模块31设置为将本发明与互联网云平台进行连接，中央处理器3设置为对整个系统装置的程序控制及数据运算，无线信号收发装置4设置为对无线对讲机12、智能手机、网络模块31所产生的无线电信号进行接收、发射及将本发明与互联网无线连接，内存卡32设置为将已录制的外部语音数据读入本发明数据库中。Figure 1 shows one embodiment of the present invention, described here with reference to Figures 1 to 4. It comprises a frame body 10 provided with a cavity. Arranged in the frame body 10 are a voice collection module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35, and a power supply 9. The voice collection module 1 comprises a microphone 11, a wireless walkie-talkie 12, and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice signal feature extraction unit 22, and a feature matching and classification unit 23. Speech signals are collected by the voice collection module 1 and processed by the voice recognition module 2; data are stored in the memory 33; the human-computer interaction flow and the visualized results are shown on the display screen 8; the loudspeaker 35 gives voice prompts for the operating steps and announces recognition results; the network module 31 connects the invention to an Internet cloud platform; the central processing unit 3 performs program control and data computation for the whole system; the wireless signal transceiver 4 receives and transmits the radio signals of the wireless walkie-talkie 12, a smartphone, and the network module 31 and connects the invention wirelessly to the Internet; and the memory card 32 reads previously recorded external voice data into the database of the invention.
在本实施例中，语音输入单元20设置为包含有“语音录入模式”和“语音测试模式”两种类型，可通过语音采集模块1所提供的话筒11、无线对讲机12、固定录音器13及智能手机任意一种方式输入语音，在“语音录入模式”中，语音输入单元20设置为一次只能对一个人或一个对象进行语音录入，其特征在于，录入的语音为一段5~30秒的音频信号，本发明采用多状态语音录入策略，其特征在于，录入的语音中可包含有正常讲话、唱歌或者高/中/低音的多状态组合语音，显示器8实时显示语音波形及完成进度条，录入语音完毕后需要进行数据标记，标记方法采用人工手动标记，如采集完张三的声音，即在本发明显示屏8显示的对话框中备注：“张三的声音”，保存即可，录入的语音保存在存储器33中，在“语音测试模式”下，本发明通过语音采集模块1中的话筒11、无线对讲机12、固定录音器13及智能手机其中的一种或多种输入工具一同采集测试语音，测试语音采集过程为实时采集，没有任何人数、对象和时间的限制。In this embodiment, the voice input unit 20 provides two modes, a "voice entry mode" and a "voice test mode". Voice can be input through any of the microphone 11, wireless walkie-talkie 12, fixed recorder 13, or a smartphone provided via the voice collection module 1. In "voice entry mode", the voice input unit 20 records only one person or object at a time, and the recorded voice is an audio segment of 5 to 30 seconds. The invention adopts a multi-state voice entry strategy: the recorded voice may contain a combination of normal speech, singing, or high/middle/low pitch. The display 8 shows the voice waveform and a completion progress bar in real time. After entry, the data must be labeled; labeling is done manually — for example, after collecting Zhang San's voice, the user enters "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it. The entered voice is stored in the memory 33. In "voice test mode", the invention collects test voice through one or more of the microphone 11, wireless walkie-talkie 12, fixed recorder 13, and smartphone; test collection runs in real time with no restriction on the number of people, objects, or duration.
在本实施例中,语音输入单元20设置为与语音采集模块1相连接,话筒11通过音频线连接到语音采集模块1,无线对讲机12通过无线电信号与语音采集模块1连接。In this embodiment, the voice input unit 20 is configured to be connected to the voice collection module 1, the microphone 11 is connected to the voice collection module 1 through an audio cable, and the wireless walkie-talkie 12 is connected to the voice collection module 1 through a radio signal.
在本实施例中，语音采集模块1还可采用智能手机进行语音信号输入，通过用手机与本发明语音采集模块1匹配连接，匹配方式包括蓝牙、红外线、WIFI以及扫描二维码进行连接，实现语音录入，相当于把手机当成无线话筒使用，更方便于多人群语音互动。In this embodiment, the voice collection module 1 can also take speech input from a smartphone: the phone is paired with the voice collection module 1 via Bluetooth, infrared, WiFi, or by scanning a QR code, so that voice entry is achieved with the phone used as a wireless microphone, which is more convenient for multi-person voice interaction.
在本实施例中，语音预处理单元21把语音采集模块1采集到的语音信号转变为电信号，即将模拟信号转变为数字信号，然后进行常规的信号处理，包括环境背景噪音消除、信号分帧、滤波、预加重、加窗函数及端点检测等。In this embodiment, the voice preprocessing unit 21 converts the speech signal collected by the voice collection module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including background noise removal, framing, filtering, pre-emphasis, windowing, and endpoint detection.
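The conventional preprocessing steps named above can be sketched as follows in NumPy; this is a minimal illustration, not the patent's implementation, and the sampling rate, frame lengths, and the 10%-of-peak energy threshold for endpoint detection are assumptions chosen for the example.

```python
import numpy as np

def preprocess(signal, fs=8000, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasis, framing, Hamming windowing, and a crude energy-based
    endpoint check on a 1-D speech signal (all parameters illustrative)."""
    # pre-emphasis: boost high frequencies, s'[t] = s[t] - alpha * s[t-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    flen = int(fs * frame_ms / 1000)                 # samples per frame
    hop = int(fs * hop_ms / 1000)                    # hop between frame starts
    n_frames = 1 + max(0, (len(emphasized) - flen) // hop)
    frames = np.stack([emphasized[i * hop:i * hop + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)               # window each frame
    # endpoint detection: keep frames whose energy exceeds 10% of the peak
    energy = (frames ** 2).sum(axis=1)
    voiced = frames[energy > 0.1 * energy.max()]
    return voiced

tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1-second test tone
frames = preprocess(tone)
```

In a real pipeline these voiced frames would then be handed to the feature extraction unit (e.g. for MFCC computation).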
在本实施例中，语音信号特征提取单元22设置为从原始语音信号中提取出反映语音本质的主要特征参数，形成特征向量xi，xi=(xi1,xi2,…xij,…,xin)T，xij表示第i个对象或个人的第j个语音特征值，特征参数提取方法优选的采用频率倒谱系数法(MFCC)，还可采用谱包络法、LPC内插法、LPC求根法、希尔伯特变换法等得到声学特征，提取特征后得到的特征向量系统将自动保存到模式类数据库中，一个对象或人的所有声音特征对应一个模式类，若录入N个人或对象的语音后，即得到N个模式类，若每个模式类有n个特征参数，即可构成n维特征空间，即标记后的特征信号集可记为D={(x1,y1),(x2,y2),…(xi,yi),…,(xN,yN)}，其中xi∈χ=Rn，xi表示所录入的第i个对象或人的语音特征信号，yi∈Y={1,2,…,N}，yi表示第i个人或对象，N表示第N个人或对象的数字编号，标记后的语音特征数据构成模式类数据库，并存储在本发明的存储器33中。In this embodiment, the speech signal feature extraction unit 22 extracts from the original speech signal the main feature parameters reflecting the essential characteristics of the speech, forming a feature vector xi, xi=(xi1, xi2, …, xij, …, xin)T, where xij is the j-th speech feature value of the i-th object or person. The feature parameters are preferably extracted with the cepstral coefficient (MFCC) method; acoustic features may also be obtained with the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform, and the like. The extracted feature vectors are saved automatically to the pattern-class database; all voice features of one object or person correspond to one pattern class, so after the voices of N people or objects are entered, N pattern classes are obtained. If each pattern class has n feature parameters, they form an n-dimensional feature space, and the labeled feature-signal set can be written as D={(x1,y1),(x2,y2),…,(xi,yi),…,(xN,yN)}, where xi∈χ=Rn is the speech feature signal of the i-th entered object or person, yi∈Y={1,2,…,N} identifies the i-th person or object, and N is the index of the N-th person or object. The labeled speech feature data form the pattern-class database stored in the memory 33 of the invention.
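The labeled feature-signal set D described above can be illustrated concretely; here the feature vectors are random placeholders standing in for real MFCC output, and the sizes n and N are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 24, 3                    # n feature parameters per class, N entered speakers

# hypothetical feature vectors x_i = (x_i1, ..., x_in)^T in R^n, one per speaker
X = rng.normal(size=(N, n))
y = np.arange(1, N + 1)         # labels y_i in Y = {1, ..., N}

# labeled feature-signal set D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
D = list(zip(X, y))
```

Each entry of D pairs one speaker's feature vector with that speaker's pattern-class label, exactly the structure the classifier below is trained on.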
在本实施例中，特征匹配判别分类单元23设置为采用智能的多类分类器，分类器的学习算法设置为采用改进的神经网络分类算法，通过对已录入并标记的语音特征信号集作为训练数据，让网络模型对训练数据进行学习，得到分类规则，完成分类器的训练；然后利用已经训练好的分类器对未知的测试语音信号进行智能分类和识别；当测试信号提取特征后，本发明会自动进行特征匹配，将提取的测试语音信号的特征参数实时地与本发明存储器33中已录入并标记好的样本语音特征参数进行特征匹配，并计算测试语音信号与所有已录入的样本语音信号的相似度，然后把测试语音信号划分到与其相似度最高的那一样本信号模式类别中，最后本发明向外界输出识别结果，“这是XXX的声音”类似的报告，例如，如果本发明已存储了张三的语音特征信号，当张三对着本发明说话或唱歌时，本发明会自动计算出张三的测试语音特征参数与已录入并标记过的张三的录入语音信号最相似，经过识别，自动输出“这是张三的声音”。In this embodiment, the feature matching and classification unit 23 uses an intelligent multi-class classifier whose learning algorithm is an improved neural-network classification algorithm. The set of entered and labeled speech feature signals is used as training data; the network model learns the training data to obtain the classification rules, completing the training of the classifier. The trained classifier is then used to classify and recognize unknown test speech signals intelligently. After features are extracted from a test signal, the invention automatically performs feature matching: the extracted feature parameters of the test speech signal are matched in real time against the entered and labeled sample feature parameters in the memory 33, the similarity between the test signal and every entered sample signal is computed, and the test signal is assigned to the sample-signal pattern class with the highest similarity. Finally the invention outputs the recognition result, a report such as "This is XXX's voice". For example, if Zhang San's voice feature signal has been stored, then when Zhang San speaks or sings to the device, the invention automatically determines that Zhang San's test feature parameters are most similar to his entered and labeled speech signal and, after recognition, outputs "This is Zhang San's voice".
在本实施例中，多类分类器采用的多层人工神经网络结构，其特征是，网络的一端定义为输入层，另一端定义为输出层，输入层与输出层中间的部分定义为隐含层，输入层用于接收外界的输入信号，重新将输入信号发送给隐含层的所有神经元，经隐含层计算后将结果传递给输出层，输出层从隐含层接收信号，计算后输出分类结果，即识别的结果，本发明优选的隐含层的层数设置为1~200层。In this embodiment, the multi-class classifier uses a multilayer artificial neural network, characterized in that one end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layer. The input layer receives external input signals and passes them on to all neurons of the hidden layer; after the hidden-layer computation the result is passed to the output layer, which receives the hidden-layer signals, computes, and outputs the classification result, i.e. the recognition result. The preferred number of hidden layers in the invention is 1 to 200.
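The input → hidden → output signal flow just described can be sketched as a single forward pass; the layer sizes and sigmoid hidden activation are illustrative assumptions (the patent allows many activation functions, preferring the sigmoid given later).

```python
import numpy as np

def forward(x, W_ih, a, W_ho, b):
    """One forward pass: input layer -> hidden layer -> output layer."""
    f = lambda z: 1.0 / (1.0 + np.exp(-z))  # sigmoid hidden activation
    H = f(x @ W_ih - a)                     # hidden-layer outputs H_j
    O = H @ W_ho - b                        # output-layer outputs O_k (linear)
    return H, O

rng = np.random.default_rng(1)
n, l, m = 4, 6, 3                           # input, hidden, output node counts
x = rng.normal(size=n)
H, O = forward(x, rng.normal(size=(n, l)), np.zeros(l),
               rng.normal(size=(l, m)), np.zeros(m))
```

The index of the largest output-layer value would be taken as the recognized pattern class.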
在本实施例中，改进的人工神经网络分类算法训练的过程如下：In this embodiment, the improved artificial neural-network classification algorithm is trained as follows:
步骤1：网络初始化。根据语音信号录入的个数，不断更新算法数据库，当录入了N个对象的语音信号时，即构成N个模式类，得到样本空间(X,Y)，第i组样本即(Xi,Yi)，Xi表示对第i个对象所提取的特征向量集合，Yi表示所标记的第i个对象；根据系统输入输出序列(X,Y)确定网络输入层结点数n、隐含层结点数l、输出层结点数m，其中n值由输入信号特征提取中对应特征值的个数确定，m值由存储的语音模式类的个数确定，l的参照值为 l=√(n+m)+a，其中a的取值范围为0~10，由模型自动计算确定，初始化输入层与隐含层的神经元之间的连接权值ωij和隐含层与输出层神经元之间的连接权值ωjk，初始化隐含层阈值a和输出层阈值b，给定学习率η和神经元激励函数。Step 1: network initialization. As speech signals are entered, the algorithm database is continuously updated; when the speech signals of N objects have been entered, N pattern classes are formed and the sample space (X, Y) is obtained, the i-th sample being (Xi, Yi), where Xi is the set of feature vectors extracted for the i-th object and Yi is its label. From the system input-output sequence (X, Y), determine the number of input-layer nodes n, hidden-layer nodes l, and output-layer nodes m, where n is given by the number of feature values extracted from the input signal and m by the number of stored speech pattern classes; a reference value for l is l = √(n+m) + a, where a ranges from 0 to 10 and is determined automatically by the model. Initialize the connection weights ωij between input-layer and hidden-layer neurons and ωjk between hidden-layer and output-layer neurons, initialize the hidden-layer thresholds a and the output-layer thresholds b, and fix the learning rate η and the neuron activation function.
步骤2：计算隐含层的输出。根据输入变量X、输入层与隐含层的神经元的连接权值ωij以及隐含层阈值a，计算隐含层输出H；记第j个隐含层结点的输出为 Hj = f( Σ_{i=1}^{n} ωij·xi − aj )，j=1,2,…,l，其中l为隐含层结点数，f为隐含层激励函数，所述激励函数有多种，本发明优选的采用 f(x) = (1+e^(−x))^(−1)。Step 2: compute the hidden-layer output. From the input variable X, the input-to-hidden connection weights ωij, and the hidden-layer thresholds a, compute the hidden-layer output H; the output of the j-th hidden node is Hj = f( Σ_{i=1}^{n} ωij·xi − aj ), j = 1,2,…,l, where l is the number of hidden nodes and f is the hidden-layer activation function. Among the many possible activation functions, the invention preferably uses f(x) = (1+e^(−x))^(−1).
步骤3：计算输出层的输出。根据隐含层输出H、隐含层与输出层神经元之间的连接权值ωjk以及输出层阈值b，计算输出层输出O；记第k个输出层结点的输出为 Ok = Σ_{j=1}^{l} Hj·ωjk − bk，k=1,2,…,m，其中m为输出层结点数，bk为输出层第k个结点的阈值，Hj为隐含层第j个结点的输出值。Step 3: compute the output-layer output. From the hidden-layer output H, the hidden-to-output connection weights ωjk, and the output-layer thresholds b, compute the output-layer output O; the output of the k-th output node is Ok = Σ_{j=1}^{l} Hj·ωjk − bk, k = 1,2,…,m, where m is the number of output nodes, bk is the threshold of the k-th output node, and Hj is the output value of the j-th hidden node.
步骤4：计算预测误差。根据网络预测得到的输出O和期望输出Y(真值)，计算网络预测总误差 e = ½·Σ_{k=1}^{m} ek²，其中 ek = Yk − Ok 为第k个输出层结点产生的误差。步骤5：更新权值。根据网络预测总误差e更新网络连接权值ωjk和ωij：ωjk+ = ωjk + η·Hj·Ek，其中 j=1,2,…,l，k=1,2,…,m，η为学习率，Ek表示输出层结点的网络总误差对输出层网络结点k的灵敏度，对线性输出层即 Ek = ek；ωij+ = ωij + η·Hj(1−Hj)·xi·Σ_{k=1}^{m} ωjk·Ek，其中 i=1,2,…,n，j=1,2,…,l。Step 4: compute the prediction error. From the network output O and the expected output Y (ground truth), compute the total prediction error e = ½·Σ_{k=1}^{m} ek², where ek = Yk − Ok is the error produced at the k-th output node. Step 5: update the weights. Update the connection weights ωjk and ωij according to the total prediction error e: ωjk+ = ωjk + η·Hj·Ek, where j = 1,2,…,l, k = 1,2,…,m, η is the learning rate, and Ek is the sensitivity of the total network error to output node k, i.e. Ek = ek for the linear output layer; and ωij+ = ωij + η·Hj(1−Hj)·xi·Σ_{k=1}^{m} ωjk·Ek, where i = 1,2,…,n, j = 1,2,…,l.
步骤6：阈值更新。根据网络预测总误差e更新隐含层阈值a和输出层阈值b：aj+ = aj + η·Hj(1−Hj)·Σ_{k=1}^{m} ωjk·Ek，j=1,2,…,l；bk+ = bk + η·Ek，k=1,2,…,m。Step 6: update the thresholds. Update the hidden-layer thresholds a and output-layer thresholds b according to the total prediction error e: aj+ = aj + η·Hj(1−Hj)·Σ_{k=1}^{m} ωjk·Ek, j = 1,2,…,l; bk+ = bk + η·Ek, k = 1,2,…,m.
步骤7：判断算法迭代是否收敛，若没收敛返回步骤2，本发明优选的最小误差为0.001时结束迭代。Step 7: check whether the iteration has converged; if not, return to step 2. The invention preferably ends the iteration when the minimum error reaches 0.001.
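Steps 1 to 7 above can be sketched as a minimal NumPy training loop for a one-hidden-layer network with sigmoid hidden units and a linear output layer. The toy data, layer size, and learning rate are illustrative assumptions; the threshold updates here follow plain gradient descent, so their signs may differ from the compactly stated updates in the patent text.

```python
import numpy as np

def train_bp(X, Y, l=8, eta=0.1, tol=1e-3, max_iter=20000, seed=0):
    """Per-sample BP training; stops when the total squared error drops below tol."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    W_ih = rng.normal(scale=0.5, size=(n, l)); a = np.zeros(l)   # step 1: init
    W_ho = rng.normal(scale=0.5, size=(l, m)); b = np.zeros(m)
    for _ in range(max_iter):
        e = 0.0
        for x, y in zip(X, Y):
            H = 1.0 / (1.0 + np.exp(-(x @ W_ih - a)))            # step 2: hidden out
            O = H @ W_ho - b                                     # step 3: output
            ek = y - O                                           # step 4: error e_k
            e += 0.5 * (ek ** 2).sum()
            grad_h = H * (1 - H) * (W_ho @ ek)                   # hidden sensitivity
            W_ho += eta * np.outer(H, ek)                        # step 5: weights
            W_ih += eta * np.outer(x, grad_h)
            a -= eta * grad_h                                    # step 6: thresholds
            b -= eta * ek
        if e < tol:                                              # step 7: converged?
            break
    return W_ih, a, W_ho, b

# toy two-class problem: the class is decided by which coordinate is larger
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
W_ih, a, W_ho, b = train_bp(X, Y)
```

After training, a test feature vector is passed forward and the index of the largest output is reported as the recognized class.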
在本实施例中,语音采集模块1内置有语音采集卡,用于收集和处理采集到的语音信号。In this embodiment, the voice collection module 1 has a built-in voice collection card for collecting and processing the collected voice signals.
在本实施例中,固定录音器13采用防风式麦克风。In this embodiment, the fixed recorder 13 adopts a windproof microphone.
在本实施例中,显示屏8采用带背景灯的触摸屏或LED显示屏。In this embodiment, the display screen 8 adopts a touch screen or an LED display screen with a backlight.
在本实施例中,固定录音器13可以设置多个,布置在本发明外壳处,用于增加语音录制强度。In this embodiment, there may be multiple fixed recorders 13 arranged at the casing of the present invention to increase the intensity of voice recording.
本发明对已录入并标记好的语音信号具有长期存储功能，凡是存储在本发明语音模式类数据库中的语音信号，本发明都可随时调取与未知测试语音进行对比识别。The invention stores entered and labeled speech signals long-term; any speech signal stored in its speech pattern-class database can be retrieved at any time and compared against unknown test voices for recognition.
本发明的使用流程是，先打开电源开关5，然后系统自动运行，显示屏8点亮并显示操作界面，人们可以选择“语音录入模式”和“语音测试模式”两种功能。To use the invention, first turn on the power switch 5; the system then runs automatically, the display screen 8 lights up and shows the operation interface, and the user can choose between the two functions "voice entry mode" and "voice test mode".
(1)当选择语音录入时，中央处理器3会控制语音输入单元20进入“语音录入模式”，显示屏8和扬声器35会同时提示“现在是语音录入模式，请说话”类似的提示，人们可通过语音采集模块1所提供的话筒11、无线对讲机12、固定录音器13及智能手机任意一种方式输入语音；为保证本发明能够准确识别和量化被识别对象的语音特征，因此在“语音录入模式”阶段每次只能对一个人或一个对象进行语音录入，由于同一人在说话和唱歌时发出的声音信号数据会存在一定的特征偏差，因此，为提高语音信号识别的准确度，本发明采用多状态语音录入策略，即录入的语音中可包含正常讲话、唱歌或者高/中/低音及其他状态下的多状态组合声音，录音时长为5~30秒，显示器8会显示语音实时波形及完成进度条，如果录制的语音不理想可以删除再次录入，录入语音完毕后需要进行数据标记，标记方法采用人工手动标记，如采集完张三的声音，即在本发明显示屏8显示的对话框中备注：“张三的声音”，保存即可，录入的语音存储在本发明的存储器33中。(1) When voice entry is selected, the central processing unit 3 switches the voice input unit 20 into "voice entry mode", and the display screen 8 and loudspeaker 35 both give a prompt such as "Voice entry mode, please speak". Voice can be input through any of the microphone 11, wireless walkie-talkie 12, fixed recorder 13, or a smartphone provided via the voice collection module 1. To ensure that the invention can accurately recognize and quantify the voice features of the identified subject, only one person or object is recorded at a time in the "voice entry mode" stage. Because the sound signals of the same person differ somewhat between speaking and singing, the invention adopts a multi-state voice entry strategy to improve recognition accuracy: the recorded voice may contain normal speech, singing, or high/middle/low pitch and other states in combination. Recording lasts 5 to 30 seconds, and the display 8 shows the real-time waveform and a completion progress bar; an unsatisfactory recording can be deleted and re-entered. After entry, the data must be labeled manually — for example, after collecting Zhang San's voice, the user enters "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it. The entered voice is stored in the memory 33 of the invention.
(2)语音信号录入完毕后，本发明的控制系统自动将已标记的语音信号送入语音预处理单元21，语音预处理单元21把语音采集模块1采集到的语音信号转变为电信号，即将模拟信号转变为数字信号，然后进行常规的信号处理，包括环境背景噪音消除、信号分帧、滤波、预加重、加窗函数及端点检测等。(2) After the speech signal has been entered, the control system of the invention automatically sends the labeled speech signal to the voice preprocessing unit 21, which converts the speech signal collected by the voice collection module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including background noise removal, framing, filtering, pre-emphasis, windowing, and endpoint detection.
(3)本发明的控制系统自动把已预处理后语音信号送入信号特征提取单元22，语音信号特征提取单元22从预处理后的语音信号中提取出反映语音本质的特征参数，得到特征向量xi，特征参数提取方法优选的采用频率倒谱系数法(MFCC)，还可采用谱包络法、LPC内插法、LPC求根法、希尔伯特变换法等得到声学特征，提取特征后得到的特征向量系统自动保存到模式类别库中，一个人的所有声音特征对应一个模式类，若录入N个人语音后，即得到N个模式类，若每个模式类有n个特征参数，从而得到一人对应语音信号模式类的数据库，所有的数据都存储在本发明的存储器33中，至此，语音信号录入模式内容完毕。(3) The control system of the present invention automatically sends the preprocessed speech signal to the signal feature extraction unit 22, which extracts feature parameters reflecting the essential characteristics of the speech and obtains a feature vector xi. The feature parameters are preferably extracted with the cepstral coefficient (MFCC) method; acoustic features may also be obtained with the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform, and the like. The extracted feature vectors are saved automatically to the pattern-class library: all voice features of one person form one pattern class, so after the voices of N people are entered, N pattern classes are obtained, each with n feature parameters, giving a database in which each person corresponds to one speech-signal pattern class. All data are stored in the memory 33 of the invention, and the voice entry mode is then complete.
(4)语音录入完毕后，可进行语音测试，当进行语音测试时，只需要在显示屏8的操作界面中选择“语音测试模式”即可，中央处理器3会控制语音输入单元20进入“语音测试模式”，显示屏8和扬声器35会同时提示“语音测试中…”类似的提示，这时人们不需要做任何操作，本发明会通过语音采集模块1中的话筒11、无线对讲机12、固定录音器13及智能手机其中的一种或多种输入工具一同采集测试语音，测试语音采集过程为实时采集，没有任何时间限制和人数的限制。(4) After voice entry is complete, a voice test can be performed. To run a voice test, simply select "voice test mode" on the operation interface of the display screen 8; the central processing unit 3 then switches the voice input unit 20 into "voice test mode", and the display screen 8 and loudspeaker 35 both give a prompt such as "Voice test in progress…". At this point no user action is required: the invention collects the test voice through one or more of the input tools in the voice collection module 1, namely the microphone 11, wireless walkie-talkie 12, fixed recorder 13, and a smartphone. Test voice collection runs in real time, with no limit on duration or on the number of speakers.
(5)在“语音测试模式”下采集到的语音数据，本发明系统装置会自动地对测试语音信号进行预处理和特征提取，将采集到的语音测试信号转化为电信号，并进行常规的滤波、去除噪音、加窗函数及端点检测后进行信号特征提取。(5) For voice data collected in "voice test mode", the system automatically preprocesses the test speech signal and extracts its features: the collected test signal is converted into an electrical signal and, after the usual filtering, noise removal, windowing, and endpoint detection, signal features are extracted.
(6)测试信号提取特征后，本发明会自动进行特征匹配，将提取的测试语音信号的特征参数实时地与本发明存储器33中已录入的已标记好的样本语音特征参数进行特征匹配，并计算测试语音信号与所有已录入的原始语音信号的相似度，并把测试语音信号分到与其相似度最高的那一模式类别中，最后本发明向外界输出，“这是XXX的声音”类似的报告，例如，如果本发明已存储了张三的语音特征信号，当张三对着本发明说话或唱歌时，本发明经过识别，会自动输出“这是张三的声音”。(6) After features are extracted from the test signal, the invention automatically performs feature matching: the extracted feature parameters of the test speech signal are matched in real time against the labeled sample feature parameters already stored in the memory 33, the similarity between the test signal and every recorded original speech signal is computed, and the test signal is assigned to the pattern class with the highest similarity. Finally the invention outputs a report such as "This is XXX's voice". For example, if Zhang San's voice feature signal has been stored, then when Zhang San speaks or sings to the device, it recognizes the voice and automatically outputs "This is Zhang San's voice".
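The matching-and-report step can be sketched with a direct cosine-similarity rule against the stored labeled feature vectors; note this is a simplified stand-in for illustration — the patent's preferred matcher is the trained neural-network classifier — and the names, vectors, and rejection threshold below are hypothetical.

```python
import numpy as np

# hypothetical stored pattern-class library: label -> sample feature vector
library = {
    "Zhang San": np.array([0.9, 0.1, 0.3]),
    "Li Si":     np.array([0.2, 0.8, 0.5]),
}

def identify(test_vec, library, threshold=0.85):
    """Assign the test feature vector to the most similar stored pattern class."""
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    best, score = max(((name, cos(test_vec, v)) for name, v in library.items()),
                      key=lambda t: t[1])
    if score < threshold:   # unknown speaker: would be recorded for later labeling
        return "unrecognized voice"
    return f"This is {best}'s voice"

result = identify(np.array([0.88, 0.12, 0.28]), library)
```

The below-threshold branch corresponds to the patent's handling of voices whose features are not yet stored.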
当本发明在公共场合测试时，由于测试环境中，同一时间段可能存在多个对象同时说话，即采集到的语音信号是宽带混叠的信号，为防止本发明对此时采集的语音信号特征提取时出错，本发明采用的策略在于，运用智能算法，先匹配和识别出单个人说话时的语音特征参数并进行标识和存储，然后系统再对共同说话时的语音信号进行自动筛选和分离，最后输出识别结果并报告“现在是张三、李四、王五……共同的声音”类似的提示，并提示存在XX个语音未能识别，关闭系统时按下电源关闭键6。When the invention is tested in a public place, several subjects may speak at the same time, so the collected speech signal is a wideband aliased mixture. To avoid errors when extracting features from such a signal, the strategy adopted by the invention is to use an intelligent algorithm to first match and recognize the speech feature parameters of each person speaking alone, labeling and storing them, and then automatically screen and separate the mixed speech signals. Finally it outputs the recognition result with a prompt such as "These are the combined voices of Zhang San, Li Si, Wang Wu…", and indicates that XX voices could not be recognized. To shut down the system, press the power-off key 6.
在本实施例中，系统装置还可以向人们输出对多人交流环境下的识别结果清单，包含测试环境下有多少人或对象在现场说话的数量，以及筛选并播放从多人同时说话的录音中识别分离出每个人所讲的内容，而过滤掉其他人的声音和环境音。In this embodiment, the system can also output a list of recognition results for a multi-speaker environment, including the number of people or objects speaking on site, and can screen the recording of several people speaking at once to separate and play back what each person said while filtering out the other voices and ambient sound.
当测试语音信号中出现了本发明未存储的样本语音信号特征时，本发明会自动记录未知的该语音信号特征，以提醒人们是否标记并存储该对象的语音信号。When a test speech signal contains voice features of a sample not stored in the invention, the unknown voice features are automatically recorded, and the user is prompted to decide whether to label and store that subject's voice signal.
在智能语音信号模式识别系统装置技术领域内；凡是包含有框体10，框体10设置有腔体，在框体10中设置有语音采集模块1、语音识别模块2、中央处理器3、无线信号收发装置4、显示屏8、存储器33、网络模块31、内存卡32、扬声器35和电源9，语音采集模块1包含有话筒11、无线对讲机12和固定录音器13，语音识别模块2包含有语音输入单元20、语音预处理单元21、语音信号特征提取单元22、特征匹配判别分类单元23，语音信号由语音采集模块1采集，采集到的信号由语音识别模块2处理，数据信号由存储器33保存，人机交互的操作流程以及结果的输出的可视化由显示屏8显示，扬声器35设置为对操作步骤进行语音提示及播报识别结果，网络模块31设置为将本发明与互联网云平台进行连接，中央处理器3设置为对整个系统装置的程序控制及数据运算，无线信号收发装置4设置为对无线对讲机12、智能手机、网络模块31所产生的无线电信号进行接收、发射及将本发明与互联网无线连接，内存卡32设置为将已录制的外部语音数据读入本发明数据库中的技术内容都在本发明的保护范围内，应当指出，本发明保护范围不应受限于外形特征，本发明的框体10的造型可以设置为方形、圆柱形、多棱柱体形或类似于白菜、西瓜、石头等其他造型，凡是造型不同而实质的技术内容与本发明相同的一切技术内容也在本发明的保护范围之内；同时，本技术领域技术人员在本发明内容的基础上作常规的显而易见的小改进或小组合，只要技术内容包含在本发明所记载的内容范围之内的技术内容也在本发明的保护范围内。Within the technical field of intelligent speech-signal pattern recognition system devices, any technical content falls within the scope of protection of the present invention that comprises a frame body 10 provided with a cavity, in which are arranged a voice collection module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35, and a power supply 9, where the voice collection module 1 comprises a microphone 11, a wireless walkie-talkie 12, and a fixed recorder 13, the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice signal feature extraction unit 22, and a feature matching and classification unit 23, speech signals are collected by the voice collection module 1 and processed by the voice recognition module 2, data are stored in the memory 33, the human-computer interaction flow and visualized results are shown on the display screen 8, the loudspeaker 35 gives voice prompts and announces recognition results, the network module 31 connects the invention to an Internet cloud platform, the central processing unit 3 performs program control and data computation for the whole system, the wireless signal transceiver 4 receives and transmits the radio signals of the wireless walkie-talkie 12, a smartphone, and the network module 31 and connects the invention wirelessly to the Internet, and the memory card 32 reads recorded external voice data into the database of the invention. It should be noted that the scope of protection is not limited by outward appearance: the frame body 10 may be square, cylindrical, prismatic, or shaped like a cabbage, watermelon, stone, or the like, and all technical content that differs only in shape while being substantively the same as the present invention also falls within its scope of protection. Likewise, conventional and obvious minor improvements or combinations made by those skilled in the art on the basis of this disclosure also remain within the scope of protection of the present invention, as long as the technical content falls within what is described herein.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711253194.6A CN107808659A (en) | 2017-12-02 | 2017-12-02 | Intelligent sound signal type recognition system device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711253194.6A CN107808659A (en) | 2017-12-02 | 2017-12-02 | Intelligent sound signal type recognition system device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107808659A true CN107808659A (en) | 2018-03-16 |
Family
ID=61589300
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711253194.6A Pending CN107808659A (en) | 2017-12-02 | 2017-12-02 | Intelligent sound signal type recognition system device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808659A (en) |
2017-12-02: Application CN201711253194.6A filed in China; published as CN107808659A; status: Pending.
Patent Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6026358A (en) * | 1994-12-22 | 2000-02-15 | Justsystem Corporation | Neural network, a method of learning of a neural network and phoneme recognition apparatus utilizing a neural network |
JPH11265197A (en) * | 1997-12-13 | 1999-09-28 | Hyundai Electronics Ind Co Ltd | Voice recognizing method utilizing variable input neural network |
CN1662956A (en) * | 2002-06-19 | 2005-08-31 | 皇家飞利浦电子股份有限公司 | Mega speaker identification (ID) system and corresponding methods therefor |
CN1941080A (en) * | 2005-09-26 | 2007-04-04 | 吴田平 | Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building |
US20100057453A1 (en) * | 2006-11-16 | 2010-03-04 | International Business Machines Corporation | Voice activity detection system and method |
CN101419799A * | 2008-11-25 | 2009-04-29 | 浙江大学 | Speaker identification method based on a mixed-t model |
CN103456301A (en) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Ambient sound based scene recognition method and device and mobile terminal |
JP2014048534A (en) * | 2012-08-31 | 2014-03-17 | Sogo Keibi Hosho Co Ltd | Speaker recognition device, speaker recognition method, and speaker recognition program |
US20140195236A1 (en) * | 2013-01-10 | 2014-07-10 | Sensory, Incorporated | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination |
CN103236260A (en) * | 2013-03-29 | 2013-08-07 | 京东方科技集团股份有限公司 | Voice recognition system |
US20160260428A1 (en) * | 2013-11-27 | 2016-09-08 | National Institute Of Information And Communications Technology | Statistical acoustic model adaptation method, acoustic model learning method suitable for statistical acoustic model adaptation, storage medium storing parameters for building deep neural network, and computer program for adapting statistical acoustic model |
CN103619021A (en) * | 2013-12-10 | 2014-03-05 | 天津工业大学 | Neural network-based intrusion detection algorithm for wireless sensor network |
CN104008751A (en) * | 2014-06-18 | 2014-08-27 | 周婷婷 | Speaker recognition method based on BP neural network |
US20170178666A1 (en) * | 2015-12-21 | 2017-06-22 | Microsoft Technology Licensing, Llc | Multi-speaker speech separation |
US20170270919A1 (en) * | 2016-03-21 | 2017-09-21 | Amazon Technologies, Inc. | Anchored speech detection and speech recognition |
CN106128465A (en) * | 2016-06-23 | 2016-11-16 | 成都启英泰伦科技有限公司 | A kind of Voiceprint Recognition System and method |
CN106227038A * | 2016-07-29 | 2016-12-14 | 中国人民解放军信息工程大学 | Intelligent control method for a grain drying tower based on neural networks and fuzzy control |
CN106779053A * | 2016-12-15 | 2017-05-31 | 福州瑞芯微电子股份有限公司 | Knowledge-point assessment method based on influencing factors and a neural network |
CN106782603A (en) * | 2016-12-22 | 2017-05-31 | 上海语知义信息技术有限公司 | Intelligent sound evaluating method and system |
CN106875943A (en) * | 2017-01-22 | 2017-06-20 | 上海云信留客信息科技有限公司 | A kind of speech recognition system for big data analysis |
CN112541533A (en) * | 2020-12-07 | 2021-03-23 | 阜阳师范大学 | Modified vehicle identification method based on neural network and feature fusion |
Non-Patent Citations (4)
Title |
---|
Liu Yongjun et al.: "Research on an Intelligent Grain Control System Based on Neural Network Algorithms", Computer & Digital Engineering, vol. 44, no. 07, pages 1271-1276 *
Zeng Xiangyang et al.: "Fundamentals of Acoustic Signal Processing", vol. 1, 30 September 2015, Northwestern Polytechnical University Press, pages 160-163 *
Wang Xiaochuan et al.: "MATLAB Neural Networks: Analysis of 43 Cases", vol. 1, 31 August 2013, Beihang University Press, pages 8-10 *
Zhao Li: "Speech Signal Processing", vol. 1, 31 March 2003, China Machine Press, pages 141-145 *
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564954A (en) * | 2018-03-19 | 2018-09-21 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, auth method and storage medium |
CN108564954B (en) * | 2018-03-19 | 2020-01-10 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, identity verification method, and storage medium |
CN111989742A (en) * | 2018-04-13 | 2020-11-24 | 三菱电机株式会社 | Speech recognition system and method for using speech recognition system |
CN108520752A (en) * | 2018-04-25 | 2018-09-11 | 西北工业大学 | Method and device for voiceprint recognition |
CN108520752B (en) * | 2018-04-25 | 2021-03-12 | 西北工业大学 | A voiceprint recognition method and device |
CN108597521A (en) * | 2018-05-04 | 2018-09-28 | 徐涌 | Interactive system, method, terminal and medium for audio role segmentation and word recognition |
CN110297179A (en) * | 2018-05-11 | 2019-10-01 | 宫文峰 | Diesel-driven generator failure predication and monitoring system device based on integrated deep learning |
CN108877823A (en) * | 2018-07-27 | 2018-11-23 | 三星电子(中国)研发中心 | Sound enhancement method and device |
CN109611703A (en) * | 2018-10-19 | 2019-04-12 | 宁波市鄞州利帆灯饰有限公司 | A kind of LED light being easily installed |
CN110060717A (en) * | 2019-01-02 | 2019-07-26 | 孙剑 | A kind of law enforcement equipment laws for criterion speech French play system |
CN111475206A (en) * | 2019-01-04 | 2020-07-31 | 优奈柯恩(北京)科技有限公司 | Method and apparatus for waking up wearable device |
CN111475206B (en) * | 2019-01-04 | 2023-04-11 | 优奈柯恩(北京)科技有限公司 | Method and apparatus for waking up wearable device |
CN109448726A (en) * | 2019-01-14 | 2019-03-08 | 李庆湧 | A kind of method of adjustment and system of voice control accuracy rate |
CN109936814A (en) * | 2019-01-16 | 2019-06-25 | 深圳市北斗智能科技有限公司 | A kind of intercommunication terminal, speech talkback coordinated dispatching method and its system |
CN109785855B (en) * | 2019-01-31 | 2022-01-28 | 秒针信息技术有限公司 | Voice processing method and device, storage medium and processor |
CN109785855A (en) * | 2019-01-31 | 2019-05-21 | 秒针信息技术有限公司 | Method of speech processing and device, storage medium, processor |
CN111674360A (en) * | 2019-01-31 | 2020-09-18 | 青岛科技大学 | A method for establishing a discriminative sample model in a vehicle tracking system based on blockchain |
CN109859763A (en) * | 2019-02-13 | 2019-06-07 | 安徽大尺度网络传媒有限公司 | A kind of intelligent sound signal type recognition system |
CN109801619A (en) * | 2019-02-13 | 2019-05-24 | 安徽大尺度网络传媒有限公司 | A kind of across language voice identification method for transformation of intelligence |
CN109714491A (en) * | 2019-02-26 | 2019-05-03 | 上海凯岸信息科技有限公司 | Intelligent sound outgoing call detection system based on voice mail |
CN110033785A (en) * | 2019-03-27 | 2019-07-19 | 深圳市中电数通智慧安全科技股份有限公司 | A kind of calling for help recognition methods, device, readable storage medium storing program for executing and terminal device |
CN110289016A (en) * | 2019-06-20 | 2019-09-27 | 深圳追一科技有限公司 | A kind of voice quality detecting method, device and electronic equipment based on actual conversation |
CN111314451A (en) * | 2020-02-07 | 2020-06-19 | 普强时代(珠海横琴)信息技术有限公司 | Language processing system based on cloud computing application |
CN111603191A (en) * | 2020-05-29 | 2020-09-01 | 上海联影医疗科技有限公司 | Voice noise reduction method and device in medical scanning and computer equipment |
CN111603191B (en) * | 2020-05-29 | 2023-10-20 | 上海联影医疗科技股份有限公司 | Speech noise reduction method and device in medical scanning and computer equipment |
CN113572492A * | 2021-06-23 | 2021-10-29 | 力声通信股份有限公司 | Drop-resistant digital intercom communication device |
CN113572492B * | 2021-06-23 | 2022-08-16 | 力声通信股份有限公司 | Drop-resistant digital intercom communication device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107808659A (en) | Intelligent sound signal type recognition system device | |
CN110853618B (en) | Language identification method, model training method, device and equipment | |
CN110136727B (en) | Speaker identification method, device and storage medium based on speaking content | |
CN108305616B (en) | Audio scene recognition method and device based on long-time and short-time feature extraction | |
CN112151030B (en) | Multi-mode-based complex scene voice recognition method and device | |
CN108701453B (en) | Modular deep learning model | |
CN105940407B (en) | System and method for assessing the intensity of audio password | |
CN105374356B (en) | Audio recognition method, speech assessment method, speech recognition system and speech assessment system | |
KR101614756B1 (en) | Apparatus of voice recognition, vehicle and having the same, method of controlling the vehicle | |
WO2020244402A1 (en) | Speech interaction wakeup electronic device and method based on microphone signal, and medium | |
CN109074806A (en) | Distributed audio output is controlled to realize voice output | |
CN107112006A | Neural-network-based speech processing | |
CN108364662B (en) | Voice emotion recognition method and system based on paired identification tasks | |
CN110853617A (en) | Model training method, language identification method, device and equipment | |
CN111429919B (en) | Crosstalk prevention method based on conference real recording system, electronic device and storage medium | |
CN109377981B (en) | Phoneme alignment method and device | |
CN110097875A (en) | Interactive voice based on microphone signal wakes up electronic equipment, method and medium | |
WO2023222089A1 (en) | Item classification method and apparatus based on deep learning | |
WO2020244411A1 (en) | Microphone signal-based voice interaction wakeup electronic device and method, and medium | |
CN117762372A (en) | Multi-mode man-machine interaction system | |
Sun et al. | A novel convolutional neural network voiceprint recognition method based on improved pooling method and dropout idea | |
CN110728993A (en) | Voice change identification method and electronic equipment | |
Espi et al. | Spectrogram patch based acoustic event detection and classification in speech overlapping conditions | |
CN102141812A (en) | Robot | |
KR102113879B1 (en) | The method and apparatus for recognizing speaker's voice by using reference database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-03-16