
CN107808659A - Intelligent sound signal type recognition system device - Google Patents


Info

Publication number
CN107808659A
CN107808659A (application CN201711253194.6A)
Authority
CN
China
Prior art keywords
voice
signal
input
feature
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711253194.6A
Other languages
Chinese (zh)
Inventor
宫文峰
张泽辉
刘志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201711253194.6A priority Critical patent/CN107808659A/en
Publication of CN107808659A publication Critical patent/CN107808659A/en
Pending legal-status Critical Current


Classifications

    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/063 Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G10L 21/0208 Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L 25/24 Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L 25/30 Speech or voice analysis techniques characterised by the use of neural networks
    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/78 Detection of presence or absence of voice signals
    • G06F 18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/285 Pattern recognition: selection of recognition techniques, e.g. of classifiers in a multi-classifier system

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An intelligent voice signal pattern recognition system device comprises a frame body 10 provided with a cavity. Arranged inside the frame body 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless walkie-talkie 12 and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice signal feature extraction unit 22 and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1, the collected signals are processed by the voice recognition module 2, data are stored in the memory 33, and the human-computer interaction workflow and the output of results are visualized on the display screen 8, making voice signal recognition more convenient.

Description

Intelligent voice signal pattern recognition system device

Technical field

The invention discloses an intelligent voice signal pattern recognition system device, belonging to the technical field of intelligent electronic products. Specifically, it is an intelligent voice signal pattern recognition system device that integrates a voice acquisition module, a voice recognition module, a control system and a loudspeaker.

Background

In daily life there are many kinds of voice and sound signals: speech from human conversation, sounds of operating machinery, music playback, car horns, and so on; sound signals pervade almost the entire living environment, and people sometimes wish to determine accurately which objects produced the components of a group of voice signals. For common sounds, listeners can usually tell what kind of object produced them, but when several objects sound simultaneously, especially several objects of the same kind, or when the recording environment is noisy, it becomes difficult to attribute each sound to its source. For example, in a recording of a multi-person debate with many speakers, it is hard to determine by listening which words were spoken by which debater. A device capable of recognizing voices is therefore needed.

Before the present invention, some voice recognition products existed on the market, such as voice input software, but most of them recognize the words or letters in speech, or perform simple one-to-one matching of a single voice. Some allow a user to speak to a mobile phone, which recognizes the semantics of the speech and completes simple tasks such as placing a call or searching; however, such products cannot discriminate between voice characteristics and cannot accurately determine which person or object uttered similar voices or the same words. They are therefore not flexible enough for practical use.

Summary of the invention

To overcome the above technical shortcomings, the object of the present invention is to provide an intelligent voice signal pattern recognition system device that can conveniently record voice signals, extract their characteristic parameters, and use the stored signals to perform intelligent pattern recognition, classification and extraction on unknown voice signals.

To achieve the above object, the technical scheme adopted by the present invention is as follows. The device comprises a frame body 10 provided with a cavity; arranged inside the frame body 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless walkie-talkie 12 and a fixed recorder 13. The voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice signal feature extraction unit 22 and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1 and processed by the voice recognition module 2; data are stored in the memory 33; the human-computer interaction workflow and the output of results are visualized on the display screen 8. The loudspeaker 35 gives voice prompts for the operating steps and announces recognition results; the network module 31 connects the device to an Internet cloud platform; the central processing unit 3 performs program control and data computation for the whole system; the wireless signal transceiver 4 receives and transmits the radio signals generated by the wireless walkie-talkie 12, smartphones and the network module 31 and connects the device wirelessly to the Internet; the memory card 32 is used to read previously recorded external voice data into the device's database.

According to the invention, the voice input unit 20 provides two operating modes, a "voice enrollment mode" and a "voice test mode". Voice may be input through any of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 or a smartphone provided via the voice acquisition module 1. In "voice enrollment mode" the voice input unit 20 records only one person or one object at a time; the enrolled voice is an audio segment of 5 to 30 seconds. The invention adopts a multi-state enrollment strategy: the enrolled voice may contain a combination of normal speech, singing, or high-, mid- and low-pitched utterances. The display screen 8 shows the voice waveform and a progress bar in real time. After enrollment the data must be labeled; labeling is done manually, e.g. after recording Zhang San's voice the user enters "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it. Enrolled voices are stored in the memory 33. In "voice test mode" the device collects test voice through one or more of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 and a smartphone; test voice collection is performed in real time, without any restriction on the number of people, objects or duration.

According to the invention, the voice input unit 20 is connected to the voice acquisition module 1; the microphone 11 is connected to the voice acquisition module 1 through an audio cable, and the wireless walkie-talkie 12 is connected to the voice acquisition module 1 by radio.

According to the invention, the voice acquisition module 1 can also accept voice input from a smartphone. The phone is paired with the voice acquisition module 1 via Bluetooth, infrared, WiFi or by scanning a QR code, so that the phone serves as a wireless microphone, which is convenient for multi-person voice interaction.

According to the invention, the voice preprocessing unit 21 converts the voice signal collected by the voice acquisition module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
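The conventional front-end chain named above (pre-emphasis, framing, windowing, endpoint detection) can be sketched as follows; this is an illustrative sketch, not the patented implementation, and the frame lengths, pre-emphasis coefficient and energy threshold are assumed values:

```python
import numpy as np

def preprocess(signal, fs, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasis, framing, Hamming windowing and crude endpoint detection."""
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - alpha * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    frames *= np.hamming(frame_len)          # taper each frame
    # Energy-based endpoint detection: keep frames above a fraction of mean energy
    energy = (frames ** 2).sum(axis=1)
    voiced = frames[energy > 0.1 * energy.mean()]
    return voiced
```

The noise-removal and filtering stages mentioned in the text are omitted here; in practice they would sit before the framing step.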

According to the invention, the voice signal feature extraction unit 22 extracts from the original voice signal the main characteristic parameters reflecting the essence of the speech, forming a feature vector x_i = (x_i1, x_i2, …, x_ij, …, x_in)^T, where x_ij denotes the j-th voice feature value of the i-th object or person. The feature parameters are preferably extracted with the mel-frequency cepstral coefficient (MFCC) method; the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform and similar methods may also be used to obtain acoustic features. The feature vectors obtained after extraction are automatically saved in the pattern-class database; all voice features of one object or person correspond to one pattern class, so after enrolling the voices of N persons or objects, N pattern classes are obtained. If each pattern class has n feature parameters, they form an n-dimensional feature space, and the labeled feature signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ χ = R^n is the voice feature signal of the i-th enrolled object or person, and y_i ∈ Y = {1, 2, …, N} identifies the i-th person or object. The labeled voice feature data constitute the pattern-class database and are stored in the memory 33 of the invention.
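A compact numpy sketch of the preferred MFCC route is shown below; the filter count, coefficient count and the averaging of per-frame coefficients into a single vector x_i are assumptions for illustration, not details specified by the patent:

```python
import numpy as np

def mfcc_features(frames, fs, n_filt=26, n_ceps=13):
    """Per-frame MFCCs, averaged into one feature vector x_i for the speaker."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft   # power spectrum
    # Mel filterbank between 0 Hz and fs/2
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(fs / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filt, power.shape[1]))
    for j in range(n_filt):                  # triangular filters
        lo, c, hi = bins[j], bins[j + 1], bins[j + 2]
        fbank[j, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[j, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    logE = np.log(power @ fbank.T + 1e-10)   # log filterbank energies
    # DCT-II decorrelates the log energies -> cepstral coefficients
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filt)))
    ceps = logE @ dct.T
    return ceps.mean(axis=0)                 # one n_ceps-dimensional vector x_i
```

With n_ceps = 13, the returned vector plays the role of one x_i ∈ R^n (n = 13) in the labeled set D.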

According to the invention, the feature matching and classification unit 23 adopts an intelligent multi-class classifier whose learning algorithm is an improved neural network classification algorithm. The enrolled and labeled voice feature signal set is used as training data; the network model learns from the training data, obtains classification rules, and thereby completes the training of the classifier. The trained classifier is then used to intelligently classify and recognize unknown test voice signals. After features are extracted from a test signal, the invention automatically performs feature matching: the extracted feature parameters of the test voice signal are matched in real time against the enrolled and labeled sample voice feature parameters in the memory 33, the similarity between the test voice signal and every enrolled sample voice signal is computed, and the test voice signal is assigned to the sample pattern class with the highest similarity. Finally the invention outputs the recognition result, a report such as "This is the voice of XXX". For example, if the invention has stored Zhang San's voice feature signal, then when Zhang San speaks or sings to the device, the invention automatically determines that Zhang San's test voice feature parameters are most similar to Zhang San's enrolled and labeled voice signal, and after recognition automatically outputs "This is Zhang San's voice".

According to the invention, the multi-class classifier uses a multilayer artificial neural network structure, characterized in that one end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layers. The input layer receives external input signals and forwards them to all neurons of the hidden layer; the hidden layer computes on them and passes its results to the output layer; the output layer receives the hidden-layer signals and, after computation, outputs the classification result, i.e. the recognition result. The preferred number of hidden layers in the invention is 1 to 200.

According to the invention, the training process of the improved artificial neural network classification algorithm comprises steps 1 to 7.

Step 1: network initialization. As voice signals are enrolled, the algorithm database is continuously updated; when the voice signals of N objects have been enrolled, N pattern classes are formed and the sample space (X, Y) is obtained, the i-th sample group being (X_i, Y_i), where X_i is the set of feature vectors extracted for the i-th object and Y_i is its label. From the system input-output sequence (X, Y), determine the number of input-layer nodes n, hidden-layer nodes l and output-layer nodes m, where n is the number of feature values produced by feature extraction and m is the number of stored voice pattern classes; a reference value for l is l = √(n + m) + a, where a ranges from 0 to 10 and is determined automatically by the model. Initialize the connection weights ω_ij between the input-layer and hidden-layer neurons and ω_jk between the hidden-layer and output-layer neurons, initialize the hidden-layer thresholds a and output-layer thresholds b, and set the learning rate η and the neuron activation function.

Step 2: compute the hidden-layer output. From the input variable X, the input-to-hidden connection weights ω_ij and the hidden-layer thresholds a, compute the hidden-layer output H. The output of the j-th hidden node is H_j = f(Σ_{i=1}^{n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden nodes and f is the hidden-layer activation function; several activation functions are possible, and the invention preferably uses f(x) = (1 + e^(−x))^(−1).

Step 3: compute the output-layer output. From the hidden-layer output H, the hidden-to-output connection weights ω_jk and the output-layer thresholds b, compute the output-layer output O. The output of the k-th output node is O_k = Σ_{j=1}^{l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output nodes, b_k is the threshold of the k-th output node and H_j is the output value of the j-th hidden node.

Step 4: compute the prediction error. From the network output O and the expected output Y (ground truth), compute the total prediction error e, where e_k = Y_k − O_k is the error at the k-th output node and e = ½ Σ_{k=1}^{m} e_k².

Step 5: update the weights. Using the total prediction error e, update the connection weights ω_jk and ω_ij: ω_jk⁺ = ω_jk + η·H_j·E_k, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and E_k = e_k is the sensitivity of the total network error to output node k; and ω_ij⁺ = ω_ij + η·H_j·(1 − H_j)·x_i·Σ_{k=1}^{m} ω_jk·E_k, i = 1, 2, …, n, j = 1, 2, …, l.

Step 6: update the thresholds. Using the total prediction error e, update the hidden-layer thresholds a and output-layer thresholds b: a_j⁺ = a_j + η·H_j·(1 − H_j)·Σ_{k=1}^{m} ω_jk·E_k, j = 1, 2, …, l; and b_k⁺ = b_k + η·E_k, k = 1, 2, …, m.

Step 7: check whether the algorithm iteration has converged; if not, return to step 2. The iteration preferably ends when the minimum error of 0.001 is reached.
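Steps 1 to 7 can be sketched as a runnable numpy implementation of a single-hidden-layer network. The network size, learning rate and initialization range are illustrative assumptions; note also that the sketch applies the threshold updates in the descent direction implied by the O_k = Σ H_j·ω_jk − b_k convention of step 3, so that the iteration actually reduces the error:

```python
import numpy as np

def train_bp(X, Y, l=None, eta=0.1, eps=1e-3, max_iter=10000, seed=0):
    """Steps 1-7: train a single-hidden-layer BP classifier on (X, Y).

    X: (N_samples, n) feature vectors; Y: (N_samples, m) one-hot labels.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    if l is None:                            # step 1: l = sqrt(n+m) + a, a in 0..10
        l = int(np.sqrt(n + m)) + 4          # a = 4 chosen arbitrarily here
    w_ij = rng.uniform(-0.5, 0.5, (n, l))    # input -> hidden weights
    w_jk = rng.uniform(-0.5, 0.5, (l, m))    # hidden -> output weights
    a = np.zeros(l)                          # hidden-layer thresholds
    b = np.zeros(m)                          # output-layer thresholds
    f = lambda z: 1.0 / (1.0 + np.exp(-z))   # f(x) = (1 + e^-x)^-1
    for _ in range(max_iter):
        total = 0.0
        for x, y in zip(X, Y):
            H = f(x @ w_ij - a)              # step 2: hidden output
            O = H @ w_jk - b                 # step 3: output layer
            E = y - O                        # step 4: per-node error e_k
            total += 0.5 * np.sum(E ** 2)    # e = 1/2 * sum(e_k^2)
            delta_h = H * (1 - H) * (w_jk @ E)   # back-propagated hidden error
            w_jk += eta * np.outer(H, E)     # step 5: weight updates
            w_ij += eta * np.outer(x, delta_h)
            a -= eta * delta_h               # step 6: threshold updates
            b -= eta * E
        if total < eps:                      # step 7: convergence test
            break
    return w_ij, w_jk, a, b

def predict(x, w_ij, w_jk, a, b):
    f = lambda z: 1.0 / (1.0 + np.exp(-z))
    return int(np.argmax(f(x @ w_ij - a) @ w_jk - b))
```

After training on the labeled set D, predict returns the index of the pattern class (person or object) to which a test feature vector is assigned.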

According to the invention, the voice acquisition module 1 has a built-in voice acquisition card for collecting and processing the captured voice signals.

According to the invention, the fixed recorder 13 uses a windproof microphone.

According to the invention, the display screen 8 is a backlit touch screen or an LED display.

In the invention, several fixed recorders 13 may be provided and arranged on the housing of the device to strengthen voice recording.

The invention provides long-term storage of enrolled and labeled voice signals; any voice signal stored in the voice pattern-class database can be retrieved at any time for comparison and recognition against unknown test voices.

The invention is used as follows: first turn on the power switch 5; the system then starts automatically, the display screen 8 lights up and shows the operation interface, and the user can choose between the "voice enrollment mode" and "voice test mode" functions.

(1) When voice enrollment is selected, the central processing unit 3 directs the voice input unit 20 into "voice enrollment mode", and the display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice enrollment mode, please speak". Voice may be input through any of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 or a smartphone provided via the voice acquisition module 1. To ensure that the invention can accurately recognize and quantify the voice features of the enrolled object, only one person or object is enrolled at a time. Because the sound signals produced by the same person when speaking and when singing deviate somewhat in their features, the invention adopts a multi-state enrollment strategy to improve recognition accuracy: the enrolled voice may contain normal speech, singing, high-, mid- and low-pitched utterances and other states in combination. The recording lasts 5 to 30 seconds, and the display screen 8 shows the real-time waveform and a progress bar. If the recording is unsatisfactory it can be deleted and re-recorded. After enrollment the data must be labeled manually, e.g. after recording Zhang San's voice the user enters "Zhang San's voice" in the dialog box on the display screen 8 and saves it; the enrolled voice is stored in the memory 33.

(2) After the voice signal has been enrolled, the control system automatically sends the labeled voice signal to the voice preprocessing unit 21, which converts the voice signal collected by the voice acquisition module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.

(3) The control system automatically passes the preprocessed voice signal to the feature extraction unit 22, which extracts from it the characteristic parameters reflecting the essence of the speech and obtains the feature vector x_i. The feature parameters are preferably extracted with the MFCC method; the spectral envelope method, LPC interpolation, LPC root-finding or the Hilbert transform may also be used to obtain acoustic features. The extracted feature vectors are automatically saved in the pattern-class library: all voice features of one person correspond to one pattern class, so after enrolling the voices of N persons, N pattern classes with n feature parameters each are obtained, yielding a database of pattern classes, one per person's voice signal. All data are stored in the memory 33. This completes the voice enrollment mode.

(4) Once enrollment is complete, voice testing can be performed. To test, simply select "voice test mode" in the operation interface on the display screen 8; the central processing unit 3 directs the voice input unit 20 into "voice test mode", and the display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice test in progress…". No further user action is required: the invention collects the test voice through one or more of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13 and a smartphone. Test voice collection is real-time, without any restriction on duration or the number of people.

(5) For speech collected in "voice test mode", the system automatically preprocesses the test signal and extracts its features: the collected speech is converted into an electrical signal and, after conventional filtering, noise removal, windowing, and endpoint detection, the signal features are extracted.

(6) Once features have been extracted from the test signal, the system performs feature matching automatically: the extracted feature parameters of the test speech are compared in real time with the labeled sample feature parameters already stored in the memory 33, the similarity between the test signal and every enrolled speech signal is computed, and the test signal is assigned to the pattern class with the highest similarity. Finally the system outputs a report such as "This is XXX's voice". For example, if Zhang San's voice feature signal is stored and Zhang San speaks or sings to the device, the system recognizes him and automatically outputs "This is Zhang San's voice".
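The matching step above can be sketched in a few lines. The text only says "similarity" without naming a measure, so cosine similarity is an assumed choice here, and the enrolled templates and feature values are illustrative placeholders, not the patent's data:

```python
import numpy as np

def match_speaker(test_vec, enrolled):
    """Score a test feature vector against every enrolled speaker
    template by cosine similarity and return the best label/score."""
    best_label, best_score = None, -1.0
    for label, template in enrolled.items():
        score = float(np.dot(test_vec, template) /
                      (np.linalg.norm(test_vec) * np.linalg.norm(template)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical enrolled templates (one averaged feature vector per class)
enrolled = {
    "Zhang San": np.array([0.9, 0.1, 0.3]),
    "Li Si":     np.array([0.2, 0.8, 0.5]),
}
label, score = match_speaker(np.array([0.85, 0.15, 0.25]), enrolled)
print(f"This is {label}'s voice")
```

The test vector is assigned to whichever enrolled template it is closest to, mirroring the "highest similarity" rule in step (6).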

When the device is tested in a public place, several speakers may talk at the same time, so the collected speech is a broadband, aliased mixture. To avoid errors when extracting features from such a signal, the adopted strategy is to use intelligent algorithms to first match and identify the voice feature parameters of each individual speaker, label and store them, and then automatically screen and separate the mixed speech. The system finally outputs the recognition result with a prompt such as "These are the combined voices of Zhang San, Li Si, Wang Wu…" and indicates that XX voices could not be identified. To shut the system down, press the power-off key 6.

The invention further provides that the system device can output a list of recognition results for a multi-speaker environment, including the number of people or objects speaking at the scene, and can screen a recording of several people speaking simultaneously, isolating and playing back what each person said while filtering out the other voices and ambient sound.

If the test speech contains voice-signal features for which no sample is stored, the device automatically records the unknown features and prompts the user to decide whether to label and store that speaker's voice signal.

Description of drawings

Fig. 1 is a structural schematic diagram of the present invention.

Fig. 2 is a system frame diagram of the present invention.

Fig. 3 is a schematic diagram of the multi-layer artificial neural network of the present invention.

Fig. 4 is a flow chart of the improved neural-network classification algorithm for speech signals of the present invention.

Detailed description of the embodiments

Fig. 1 shows an embodiment of the present invention, described here with reference to Figs. 1-4. It comprises a frame body 10 provided with a cavity. Inside the frame body 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35, and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless walkie-talkie 12, and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a speech-signal feature extraction unit 22, and a feature-matching classification unit 23. Speech signals are collected by the voice acquisition module 1 and processed by the voice recognition module 2; data signals are stored in the memory 33; the human-machine interaction flow and the visualized results are shown on the display screen 8; the loudspeaker 35 gives voice prompts for the operation steps and announces recognition results; the network module 31 connects the device to an Internet cloud platform; the central processing unit 3 performs program control and data computation for the whole system; the wireless signal transceiver 4 receives and transmits the radio signals produced by the wireless walkie-talkie 12, smartphones, and the network module 31 and connects the device wirelessly to the Internet; and the memory card 32 reads previously recorded external voice data into the device's database.

In this embodiment, the voice input unit 20 provides two modes, "voice enrollment mode" and "voice test mode", and speech can be input through any of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13, or a smartphone provided by the voice acquisition module 1. In "voice enrollment mode" only one person or object can be enrolled at a time, the enrolled speech being an audio segment of 5-30 seconds. The invention adopts a multi-state enrollment strategy: the enrolled speech may combine normal speaking, singing, and high-, middle-, and low-pitched voice. The display screen 8 shows the speech waveform and a completion progress bar in real time. After enrollment the data must be labeled; labeling is done manually, e.g. after collecting Zhang San's voice the user enters "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it. Enrolled speech is stored in the memory 33. In "voice test mode" the device collects test speech through one or more of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13, or a smartphone; test speech is collected in real time, with no limit on the number of speakers, objects, or duration.

In this embodiment, the voice input unit 20 is connected to the voice acquisition module 1: the microphone 11 is connected through an audio cable, and the wireless walkie-talkie 12 through a radio link.

In this embodiment, the voice acquisition module 1 can also take speech input from a smartphone paired with it via Bluetooth, infrared, WiFi, or a scanned QR code. This turns the phone into a wireless microphone and makes multi-user voice interaction more convenient.

In this embodiment, the voice preprocessing unit 21 converts the speech collected by the voice acquisition module 1 into an electrical signal, i.e. from analog to digital, and then performs conventional signal processing, including ambient background-noise removal, framing, filtering, pre-emphasis, windowing, and endpoint detection.
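A minimal sketch of that conventional preprocessing chain (pre-emphasis, framing, Hamming windowing, and a simple energy-based endpoint detector) might look as follows; the frame length, hop size, pre-emphasis coefficient, and energy threshold are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def preprocess(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasize, frame, window, and endpoint-detect a 1-D signal."""
    # Pre-emphasis: y[t] = x[t] - alpha * x[t-1], boosting high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split into overlapping frames of frame_len samples
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    # Windowing: taper each frame with a Hamming window
    frames = frames * np.hamming(frame_len)
    # Endpoint detection: keep only frames whose energy clears a threshold
    energy = np.sum(frames ** 2, axis=1)
    return frames[energy > 0.1 * energy.max()]

# Silence - tone - silence: only the voiced middle section should survive
sig = np.concatenate([np.zeros(1024),
                      np.sin(np.linspace(0, 100, 1024)),
                      np.zeros(1024)])
voiced = preprocess(sig)
```

The silent leading and trailing frames are discarded by the energy gate, which is the role endpoint detection plays before feature extraction.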

In this embodiment, the speech-signal feature extraction unit 22 extracts from the raw speech the principal feature parameters reflecting the essential characteristics of the voice, forming a feature vector xi = (xi1, xi2, …, xij, …, xin)T, where xij is the j-th voice feature value of the i-th object or person. MFCC is the preferred extraction method; the spectral-envelope method, LPC interpolation, LPC root-finding, or the Hilbert transform may also be used to obtain acoustic features. The resulting feature vectors are saved automatically in the pattern-class database: all voice features of one object or person correspond to one pattern class, so after enrolling the speech of N people or objects the system holds N pattern classes. With n feature parameters per class, the features span an n-dimensional feature space, and the labeled feature-signal set can be written D = {(x1, y1), (x2, y2), …, (xi, yi), …, (xN, yN)}, where xi ∈ χ = Rⁿ is the enrolled voice feature signal of the i-th object or person and yi ∈ Y = {1, 2, …, N} denotes the i-th person or object, N being the numeric label of the N-th. The labeled voice feature data form the pattern-class database stored in the memory 33.
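The construction of the labeled set D can be sketched with a deliberately simplified stand-in for the MFCC step (log power spectrum followed by a DCT; a real MFCC pipeline would insert a mel filterbank before the DCT). The frame data, speaker count, and coefficient count below are placeholders:

```python
import numpy as np

def simple_cepstral_features(frames, n_coeff=12):
    """Cepstrum-style features: log power spectrum, then a DCT-II.
    This is a simplified stand-in for MFCC, omitting the mel filterbank."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_power = np.log(power + 1e-10)
    n = log_power.shape[1]
    k = np.arange(n_coeff)[:, None]
    m = np.arange(n)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # DCT-II basis
    return log_power @ dct_basis.T

# Build D = {(x_i, y_i)}: one averaged feature vector per enrolled
# speaker i, labeled y_i = i, as the text describes.
rng = np.random.default_rng(0)
D = []
for speaker_id in range(1, 4):               # N = 3 enrolled speakers
    frames = rng.standard_normal((50, 256))  # placeholder audio frames
    x_i = simple_cepstral_features(frames).mean(axis=0)
    D.append((x_i, speaker_id))
```

Each entry pairs an n-dimensional feature vector xi with its class label yi, matching the pattern-class database the unit 22 stores in the memory 33.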

In this embodiment, the feature-matching classification unit 23 uses an intelligent multi-class classifier whose learning algorithm is an improved neural-network classification algorithm. The enrolled and labeled voice feature-signal set serves as training data: the network model learns from it, derives the classification rules, and so completes classifier training. The trained classifier is then used to classify and recognize unknown test speech intelligently. After features are extracted from a test signal, the system automatically performs feature matching, comparing the extracted feature parameters of the test speech in real time with the enrolled and labeled sample feature parameters in the memory 33, computing the similarity between the test signal and every enrolled sample, and assigning the test signal to the sample-signal pattern class with the highest similarity. Finally the recognition result is output, e.g. "This is XXX's voice". For example, if Zhang San's voice feature signal is stored and Zhang San speaks or sings to the device, the system computes that his test feature parameters are most similar to his enrolled and labeled signal and, after recognition, automatically outputs "This is Zhang San's voice".

In this embodiment, the multi-class classifier uses a multi-layer artificial neural network: one end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layers. The input layer receives external input signals and forwards them to all neurons of the hidden layer; the hidden layer computes on them and passes the results to the output layer, which receives the hidden-layer signals, performs its own computation, and outputs the classification result, i.e. the recognition result. The preferred number of hidden layers is 1-200.

In this embodiment, the improved artificial-neural-network classification algorithm is trained as follows:

Step 1: network initialization. As speech signals are enrolled, the algorithm database is updated continuously; once the speech of N objects has been enrolled, N pattern classes exist and the sample space (X, Y) is obtained, the i-th sample being (Xi, Yi), where Xi is the set of feature vectors extracted for the i-th object and Yi its label. From the system's input-output sequence (X, Y), determine the number of input-layer nodes n, hidden-layer nodes l, and output-layer nodes m: n is the number of feature values produced by feature extraction, m is the number of stored speech pattern classes, and a reference value for l is l = √(n+m) + a, where a ranges over 0-10 and is determined automatically by the model. Initialize the connection weights ωij between input-layer and hidden-layer neurons and ωjk between hidden-layer and output-layer neurons, initialize the hidden-layer thresholds a and output-layer thresholds b, and fix the learning rate η and the neuron activation function.

Step 2: compute the hidden-layer output. From the input X, the input-to-hidden connection weights ωij, and the hidden-layer thresholds a, compute the hidden-layer output H. The output of the j-th hidden node is Hj = f(Σi=1..n ωij·xi − aj), j = 1, 2, …, l, where l is the number of hidden nodes and f is the hidden-layer activation function; among the many possible activation functions, the invention prefers f(x) = (1 + e⁻ˣ)⁻¹.

Step 3: compute the output-layer output. From the hidden-layer output H, the hidden-to-output connection weights ωjk, and the output-layer thresholds b, compute the output-layer output O. The output of the k-th output node is Ok = Σj=1..l Hj·ωjk − bk, k = 1, 2, …, m, where m is the number of output nodes, bk the threshold of the k-th output node, and Hj the output of the j-th hidden node.

Step 4: compute the prediction error. From the network output O and the expected output Y (ground truth), compute the total prediction error e = ½·Σk=1..m ek², where ek = Yk − Ok is the error at the k-th output node. Step 5: update the weights. From the total error e, update the connection weights ωjk and ωij: ωjk⁺ = ωjk + η·Hj·Ek, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and Ek is the sensitivity of the total network error to output node k, Ek = ek = Yk − Ok; and ωij⁺ = ωij + η·Hj·(1 − Hj)·xi·Σk=1..m ωjk·Ek, i = 1, 2, …, n, j = 1, 2, …, l.

Step 6: update the thresholds. From the total prediction error e, update the hidden-layer thresholds a and output-layer thresholds b: aj⁺ = aj + η·Hj·(1 − Hj)·Σk=1..m ωjk·Ek, j = 1, 2, …, l; bk⁺ = bk + η·Ek, k = 1, 2, …, m.

Step 7: check whether the algorithm has converged; if not, return to Step 2. The iteration preferably ends when the minimum error of 0.001 is reached.
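Steps 1-7 can be sketched as a single-hidden-layer backpropagation loop. This is a hedged sketch, not the patent's implementation: the update signs follow plain gradient descent on the squared error (so the threshold-update signs may differ from the conventions above), and the hidden-layer size, learning rate, and demo task are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # f(x) = (1 + e^-x)^-1, as in Step 2

def train_bp(X, Y, l=8, eta=0.1, tol=1e-3, max_iter=5000, seed=0):
    """Train a 1-hidden-layer network with per-sample BP updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    w_ij = rng.uniform(-0.5, 0.5, (n, l))   # Step 1: init input->hidden
    w_jk = rng.uniform(-0.5, 0.5, (l, m))   #         and hidden->output
    a = np.zeros(l)                          # hidden thresholds
    b = np.zeros(m)                          # output thresholds
    err = np.inf
    for _ in range(max_iter):
        err = 0.0
        for x, y in zip(X, Y):
            H = sigmoid(x @ w_ij - a)        # Step 2: hidden output H_j
            O = H @ w_jk - b                 # Step 3: output O_k
            E = y - O                        # Step 4: error e_k = Y_k - O_k
            err += 0.5 * np.sum(E ** 2)
            w_jk += eta * np.outer(H, E)     # Step 5: weight updates
            grad_h = H * (1 - H) * (w_jk @ E)
            w_ij += eta * np.outer(x, grad_h)
            a -= eta * grad_h                # Step 6: threshold updates
            b -= eta * E
        if err < tol:                        # Step 7: convergence check
            break
    return w_ij, w_jk, a, b, err

# Tiny demo: learn XOR as a two-class problem with one-hot targets.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])
w_ij, w_jk, a, b, err = train_bp(X, Y, max_iter=20000)
```

In the speaker-recognition setting, X would hold the enrolled feature vectors and Y the one-hot pattern-class labels, with n input nodes and m = N output nodes as Step 1 prescribes.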

In this embodiment, the voice acquisition module 1 has a built-in voice acquisition card for collecting and processing the captured speech signals.

In this embodiment, the fixed recorder 13 uses a windproof microphone.

In this embodiment, the display screen 8 is a backlit touch screen or an LED display.

In this embodiment, several fixed recorders 13 may be arranged on the housing of the device to strengthen voice recording.

The device stores enrolled and labeled speech signals long-term: any speech signal stored in its voice pattern-class database can be retrieved at any time for comparison and recognition against unknown test speech.

To use the device, first turn on the power switch 5; the system then runs automatically, the display screen 8 lights up and shows the operation interface, and the user can choose between the "voice enrollment mode" and "voice test mode" functions.

(1) When voice enrollment is selected, the central processing unit 3 switches the voice input unit 20 into "voice enrollment mode", and the display screen 8 and loudspeaker 35 both give a prompt such as "Voice enrollment mode, please speak". Speech can be input through any of the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13, or a smartphone provided by the voice acquisition module 1. To ensure the device can accurately identify and quantify a subject's voice features, only one person or object is enrolled at a time. Because the same person's speech and singing differ somewhat in their signal characteristics, the invention improves recognition accuracy with a multi-state enrollment strategy: the enrolled speech may combine normal speaking, singing, high-, middle-, and low-pitched voice, and other vocal states, with a recording length of 5-30 seconds. The display screen 8 shows the live waveform and a completion progress bar; an unsatisfactory recording can be deleted and re-recorded. After enrollment the data must be labeled manually, e.g. after collecting Zhang San's voice the user enters "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it; the enrolled speech is stored in the memory 33.

(2) After the speech signal has been enrolled, the control system automatically sends the labeled signal to the voice preprocessing unit 21, which converts the speech collected by the voice acquisition module 1 into an electrical signal, i.e. from analog to digital, and then performs conventional signal processing, including ambient background-noise removal, framing, filtering, pre-emphasis, windowing, and endpoint detection.

(3) The control system then automatically passes the preprocessed speech signal to the speech-signal feature extraction unit 22, which extracts the feature parameters reflecting the essential characteristics of the speech and forms a feature vector xi. Mel-frequency cepstral coefficients (MFCC) are the preferred extraction method; the spectral-envelope method, LPC interpolation, LPC root-finding, or the Hilbert transform may also be used to obtain acoustic features. The resulting feature vectors are saved automatically to the pattern-class library: all voice features of one person correspond to one pattern class, so after enrolling N speakers the system holds N pattern classes, each with n feature parameters, yielding a database that maps each person to a speech-signal pattern class. All data are stored in the memory 33, which completes the voice-enrollment stage.

(4) After enrollment, voice testing can be performed. To start a test, simply select "voice test mode" on the operation interface of the display screen 8; the central processing unit 3 switches the voice input unit 20 into "voice test mode", and the display screen 8 and the loudspeaker 35 both show a prompt such as "Voice test in progress…". No further user action is needed: the device collects test speech through one or more of the input tools in the voice acquisition module 1, i.e. the microphone 11, the wireless walkie-talkie 12, the fixed recorder 13, or a smartphone. Test speech is collected in real time, with no limit on duration or on the number of speakers.

(5) For speech collected in "voice test mode", the system automatically preprocesses the test signal and extracts its features: the collected speech is converted into an electrical signal and, after conventional filtering, noise removal, windowing, and endpoint detection, the signal features are extracted.

(6) Once features have been extracted from the test signal, the system performs feature matching automatically: the extracted feature parameters of the test speech are compared in real time with the labeled sample feature parameters already stored in the memory 33, the similarity between the test signal and every enrolled speech signal is computed, and the test signal is assigned to the pattern class with the highest similarity. Finally the system outputs a report such as "This is XXX's voice". For example, if Zhang San's voice feature signal is stored and Zhang San speaks or sings to the device, the system recognizes him and automatically outputs "This is Zhang San's voice".

When the device is tested in a public place, several speakers may talk at the same time, so the collected speech is a broadband, aliased mixture. To avoid errors when extracting features from such a signal, the adopted strategy is to use intelligent algorithms to first match and identify the voice feature parameters of each individual speaker, label and store them, and then automatically screen and separate the mixed speech. The system finally outputs the recognition result with a prompt such as "These are the combined voices of Zhang San, Li Si, Wang Wu…" and indicates that XX voices could not be identified. To shut the system down, press the power-off key 6.

In this embodiment, the system device can also output a list of recognition results for a multi-speaker environment, including the number of people or objects speaking at the scene, and can screen a recording of several people speaking simultaneously, isolating and playing back what each person said while filtering out the other voices and ambient sound.

If the test speech contains voice-signal features for which no sample is stored, the device automatically records the unknown features and prompts the user to decide whether to label and store that speaker's voice signal.

Within the technical field of intelligent speech-signal pattern-recognition system devices, any device that comprises a frame body 10 provided with a cavity, in which are arranged a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a memory card 32, a loudspeaker 35, and a power supply 9, where the voice acquisition module 1 comprises a microphone 11, a wireless walkie-talkie 12, and a fixed recorder 13, and the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a speech-signal feature extraction unit 22, and a feature-matching classification unit 23; where speech signals are collected by the voice acquisition module 1, processed by the voice recognition module 2, and stored in the memory 33; where the display screen 8 visualizes the human-machine interaction flow and the output of results, the loudspeaker 35 gives voice prompts for the operation steps and announces recognition results, the network module 31 connects the device to an Internet cloud platform, the central processing unit 3 performs program control and data computation for the whole system, the wireless signal transceiver 4 receives and transmits the radio signals produced by the wireless walkie-talkie 12, smartphones, and the network module 31 and connects the device wirelessly to the Internet, and the memory card 32 reads recorded external voice data into the device's database — all such technical content falls within the scope of protection of the present invention. It should be noted that the scope of protection is not limited by external appearance: the frame body 10 may be shaped as a square, a cylinder, a polygonal prism, or like a cabbage, a watermelon, a stone, or otherwise, and all technical content that differs only in shape while being substantively identical to the present invention also falls within its scope. Likewise, ordinary and obvious minor improvements or combinations made by those skilled in the art on the basis of this disclosure remain within the scope of protection, as long as their technical content falls within what is described herein.

Claims (8)

1. An intelligent speech signal pattern recognition system device, characterized in that it comprises a frame body (10) provided with a cavity, in which are arranged a speech acquisition module (1), a speech recognition module (2), a central processing unit (3), a wireless signal transceiver (4), a display screen (8), a memory (33), a network module (31), a memory card (32), a loudspeaker (35), and a power supply (9); the speech acquisition module (1) comprises a microphone (11), a wireless walkie-talkie (12), and a fixed recorder (13); the speech recognition module (2) comprises a speech input unit (20), a speech preprocessing unit (21), a speech signal feature extraction unit (22), and a feature matching and classification unit (23); speech signals are collected by the speech acquisition module (1) and the collected signals are processed by the speech recognition module (2); data signals are stored in the memory (33); the human-computer interaction workflow and the visualized output of results are shown on the display screen (8); the loudspeaker (35) is configured to give voice prompts for the operation steps and to announce recognition results; the network module (31) is configured to connect the device to an Internet cloud platform; the central processing unit (3) is configured to run the program control of the whole system device and to perform data computation; the wireless signal transceiver (4) is configured to receive and transmit the radio signals generated by the wireless walkie-talkie (12), a smartphone, and the network module (31), and to connect the device wirelessly to the Internet; and the memory card (32) is configured to read recorded external speech data into the device's database.

2. The intelligent speech signal pattern recognition system device according to claim 1, characterized in that the speech input unit (20) provides two modes, a "speech enrollment mode" and a "speech test mode", and speech may be input through any of the microphone (11), the wireless walkie-talkie (12), the fixed recorder (13), or a smartphone provided by the speech acquisition module (1); in the "speech enrollment mode", the speech input unit (20) enrolls only one person or object at a time, the enrolled speech being an audio segment of 5 to 30 seconds; the device adopts a multi-state enrollment strategy, whereby the enrolled speech may combine normal speaking, singing, and high-, mid-, and low-pitched voice; the display screen (8) shows the speech waveform and a completion progress bar in real time; after enrollment the data must be labeled, the labeling being done manually by entering a note such as "the voice of XXX" in a dialog box shown on the display screen (8) and saving it, the enrolled speech being stored in the memory (33); in the "speech test mode", test speech is collected through one or more of the microphone (11), the wireless walkie-talkie (12), the fixed recorder (13), and a smartphone together, the collection being performed in real time without any restriction on the number of people, objects, or duration; the smartphone is paired wirelessly with the speech acquisition module (1) by Bluetooth, infrared, WiFi, or QR-code scanning, so that the phone serves as a wireless microphone and enables multi-user voice interaction.

3. The intelligent speech signal pattern recognition system device according to claim 1, characterized in that the speech preprocessing unit (21) converts the speech signal collected by the speech acquisition module (1) into an electrical signal, i.e., converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise removal, signal framing, filtering, pre-emphasis, windowing, and endpoint detection.

4. The intelligent speech signal pattern recognition system device according to claim 1, characterized in that the speech signal feature extraction unit (22) extracts from the original speech signal the principal feature parameters reflecting the essence of the speech, forming a feature vector xi = (xi1, xi2, ..., xij, ..., xin)^T, where xij denotes the j-th speech feature value of the i-th object or person; the feature parameters are extracted by the Mel-frequency cepstral coefficient (MFCC) method, and acoustic features may also be obtained by the spectral envelope method, LPC interpolation, LPC root-finding, or the Hilbert transform; the feature vectors obtained after extraction are automatically saved into a pattern-class database, all voice features of one object or person corresponding to one pattern class, so that enrolling the speech of N people or objects yields N pattern classes, and n feature parameters per pattern class form an n-dimensional feature space; the labeled feature signal set may be written D = {(x1, y1), (x2, y2), ..., (xi, yi), ..., (xN, yN)}, where xi ∈ χ = R^n denotes the enrolled speech feature signal of the i-th object or person, yi ∈ Y = {1, 2, ..., N} denotes the i-th person or object, and N is the number of the N-th person or object; the labeled speech feature data form the pattern-class database stored in the memory (33).

5. The intelligent speech signal pattern recognition system device according to claim 1, characterized in that the feature matching and classification unit (23) adopts an intelligent multi-class classifier whose learning algorithm is an improved neural network classification algorithm; the enrolled and labeled speech feature signal set serves as training data from which the network model learns the classification rules, completing the training of the classifier; the trained classifier then intelligently classifies and identifies unknown test speech signals; after features are extracted from a test signal, the device automatically performs feature matching, comparing the extracted feature parameters of the test speech signal in real time with the enrolled and labeled sample speech feature parameters in the memory (33), computes the similarity between the test speech signal and every enrolled sample speech signal, assigns the test speech signal to the pattern class of the sample with the highest similarity, and finally outputs a report such as "this is the voice of XXX".

6. The intelligent speech signal pattern recognition system device according to claim 1, characterized in that the multi-class classifier adopts a multi-layer artificial neural network structure in which one end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layers; the input layer receives external input signals and forwards them to all neurons of the hidden layer; after the hidden layer's computation, the result is passed to the output layer, which receives the signals from the hidden layer and, after computation, outputs the classification result, i.e., the recognition result; the preferred number of hidden layers is 1 to 200.

7. The intelligent speech signal pattern recognition system device according to claim 1, characterized in that the improved artificial neural network is trained by the following steps:
Step 1: network initialization. The algorithm database is continuously updated according to the number of enrolled speech signals; when the speech signals of N objects have been enrolled, N pattern classes are formed and the sample space (X, Y) is obtained, the i-th sample group being (Xi, Yi), where Xi denotes the set of feature vectors extracted for the i-th object and Yi the label of the i-th object. The number of input-layer nodes n, hidden-layer nodes l, and output-layer nodes m is determined from the input-output sequence (X, Y): n is given by the number of feature values extracted from the input signal, m by the number of stored speech pattern classes, and the reference value of l is computed automatically by the model with a parameter a ranging from 0 to 10. The connection weights ωij between input-layer and hidden-layer neurons and ωjk between hidden-layer and output-layer neurons are initialized, as are the hidden-layer threshold a and the output-layer threshold b, and the learning rate η and the neuron activation function are given.
Step 2: compute the hidden-layer output. From the input X, the weights ωij, and the hidden-layer threshold a, compute the hidden-layer output H; the output of the j-th hidden node is Hj, j = 1, 2, ..., l, where l is the number of hidden nodes and f is the hidden-layer activation function; among the many possible activation functions the device preferably uses f(x) = (1 + e^-x)^-1.
Step 3: compute the output-layer output. From the hidden-layer output H, the weights ωjk, and the output-layer threshold b, compute the output-layer output O; the output of the k-th output node is Ok, k = 1, 2, ..., m, where m is the number of output nodes, bk is the threshold of the k-th output node, and Hj is the output value of the j-th hidden node.
Step 4: compute the prediction error. From the network output O and the expected output Y (the ground truth), compute the total prediction error e, where ek, the error produced by the k-th output node, is ek = (1/2)(Yk - Ok)^2.
Step 5: update the weights. Update the connection weights ωjk and ωij according to the total prediction error e, with ωjk+ = ωjk + η·Hj·Ek, where j = 1, 2, ..., l, k = 1, 2, ..., m, η is the learning rate, and Ek denotes the sensitivity of the total network error to output node k; the weights ωij are updated correspondingly for i = 1, 2, ..., n and j = 1, 2, ..., l.
Step 6: update the thresholds. Update the hidden-layer threshold a (j = 1, 2, ..., l) and the output-layer threshold b according to the total prediction error e, with bk+ = bk + η·Ek, k = 1, 2, ..., m.
Step 7: judge whether the algorithm iteration has converged; if not, return to Step 2; the iteration preferably ends when the minimum error reaches 0.001.

8. The intelligent speech signal pattern recognition system device according to any one of claims 1 to 7, characterized in that the basic operating workflow is:
1) Turn on the power switch (5); the system then runs automatically, the display screen (8) lights up and shows the operation interface, and the user may choose between the "speech enrollment mode" and "speech test mode" functions. When speech enrollment is selected, the central processing unit (3) directs the speech input unit (20) into the "speech enrollment mode", and the display screen (8) and loudspeaker (35) simultaneously give a prompt such as "Speech enrollment mode: please speak". Speech may be input through any of the microphone (11), the wireless walkie-talkie (12), the fixed recorder (13), or a smartphone provided by the speech acquisition module (1). To ensure that the device can accurately identify and quantify the voice features of the subject, only one person or object is enrolled at a time in this mode; and because the sound signals of the same person speaking and singing show certain feature deviations, the device adopts a multi-state enrollment strategy to improve recognition accuracy, i.e., the enrolled speech may combine normal speaking, singing, high-, mid-, and low-pitched voice, and other states, with a recording length of 5 to 30 seconds. The display screen (8) shows the real-time waveform and a completion progress bar; an unsatisfactory recording may be deleted and re-recorded. After enrollment the data must be labeled manually: for example, after collecting Zhang San's voice, the note "Zhang San's voice" is entered in the dialog box shown on the display screen (8) and saved, the enrolled speech being stored in the memory (33).
2) After the speech signal has been enrolled, the control system automatically sends the labeled speech signal to the speech preprocessing unit (21), which converts the speech signal collected by the speech acquisition module (1) into an electrical signal, i.e., converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise removal, signal framing, filtering, pre-emphasis, windowing, and endpoint detection.
3) The control system automatically sends the preprocessed speech signal to the speech signal feature extraction unit (22), which extracts from it the feature parameters reflecting the essence of the speech to obtain the feature vector xi. The feature parameters are extracted by the MFCC method, and acoustic features may also be obtained by the spectral envelope method, LPC interpolation, LPC root-finding, or the Hilbert transform. The extracted feature vectors are automatically saved into the pattern-class library; all voice features of one person correspond to one pattern class, so enrolling the voices of N people yields N pattern classes, each with n feature parameters, giving a database in which each person corresponds to one speech signal pattern class; all data are stored in the memory (33). This completes the speech enrollment mode.
4) After enrollment, speech testing may be performed. The user merely selects "speech test mode" on the operation interface of the display screen (8); the central processing unit (3) directs the speech input unit (20) into the "speech test mode", and the display screen (8) and loudspeaker (35) simultaneously give a prompt such as "Speech test in progress...". No further user action is needed; the device collects test speech through one or more of the microphone (11), the wireless walkie-talkie (12), the fixed recorder (13), and a smartphone together, in real time and without any restriction on duration or the number of people.
5) For speech data collected in the "speech test mode", the system device automatically preprocesses the test speech signal and extracts its features, converting the collected test signal into an electrical signal and performing conventional filtering, noise removal, windowing, and endpoint detection before feature extraction.
6) After features are extracted from the test signal, the device automatically performs feature matching, comparing the extracted feature parameters of the test speech signal in real time with the labeled sample speech feature parameters enrolled in the memory (33), computing the similarity between the test speech signal and every enrolled original speech signal, assigning the test speech signal to the pattern class with the highest similarity, and finally outputting a report such as "this is the voice of XXX"; for example, if Zhang San's speech feature signal has been stored and Zhang San speaks or sings to the device, the device automatically outputs "this is Zhang San's voice" after recognition.
7) The system device can also output a list of recognition results for a multi-speaker environment, including the number of people or objects speaking on site in the test environment, and can filter and play back, from a recording of several people speaking at once, the separated content spoken by each person while filtering out the other voices and ambient sound; when speech signal features not stored in the device appear in the test signal, the device automatically records the unknown features and asks the user whether to label and store that object's speech signal.
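The training procedure of claim 7 (Steps 1 to 7) amounts to standard back-propagation for a single-hidden-layer network. The pure-Python sketch below illustrates those steps on toy data; it is an illustration, not the patented implementation. The hidden-layer sizing heuristic, the additive-bias form of the thresholds, the linear output layer, and the ωij/hidden-threshold update rules (which the claim leaves unstated) are all assumptions.

```python
import math
import random

def sigmoid(x):
    # Hidden-layer activation named in claim 7, Step 2: f(x) = (1 + e^-x)^-1
    return 1.0 / (1.0 + math.exp(-x))

class BPNet:
    """Single-hidden-layer back-propagation network following claim 7's steps.

    Assumptions (unspecified in the patent): hidden size l = round(sqrt(n+m)) + a,
    thresholds treated as additive biases, linear output layer.
    """

    def __init__(self, n, m, a=3, eta=0.1, seed=42):
        rng = random.Random(seed)
        self.n, self.m = n, m
        self.l = int(round(math.sqrt(n + m))) + a  # assumed sizing heuristic
        self.eta = eta
        # Step 1: initialize weights w_ij (input->hidden), w_jk (hidden->output)
        # and the hidden/output thresholds a_j, b_k.
        self.w_ij = [[rng.uniform(-0.5, 0.5) for _ in range(self.l)] for _ in range(n)]
        self.w_jk = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(self.l)]
        self.a = [rng.uniform(-0.5, 0.5) for _ in range(self.l)]
        self.b = [rng.uniform(-0.5, 0.5) for _ in range(m)]

    def forward(self, x):
        # Step 2: hidden output H_j = f(sum_i w_ij * x_i + a_j)   (bias form assumed)
        H = [sigmoid(sum(x[i] * self.w_ij[i][j] for i in range(self.n)) + self.a[j])
             for j in range(self.l)]
        # Step 3: output O_k = sum_j H_j * w_jk + b_k              (linear output assumed)
        O = [sum(H[j] * self.w_jk[j][k] for j in range(self.l)) + self.b[k]
             for k in range(self.m)]
        return H, O

    def total_error(self, X, Y):
        # Step 4: e = sum over samples and nodes of e_k = 1/2 (Y_k - O_k)^2
        return sum(0.5 * (yk - ok) ** 2
                   for x, y in zip(X, Y)
                   for yk, ok in zip(y, self.forward(x)[1]))

    def train_step(self, x, y):
        H, O = self.forward(x)
        E = [y[k] - O[k] for k in range(self.m)]  # output-node sensitivities E_k
        # Hidden-node sensitivities, computed before the weights change.
        G = [H[j] * (1.0 - H[j]) * sum(E[k] * self.w_jk[j][k] for k in range(self.m))
             for j in range(self.l)]
        # Step 5: w_jk += eta * H_j * E_k (as stated); w_ij update assumed analogous.
        for j in range(self.l):
            for k in range(self.m):
                self.w_jk[j][k] += self.eta * H[j] * E[k]
        for i in range(self.n):
            for j in range(self.l):
                self.w_ij[i][j] += self.eta * x[i] * G[j]
        # Step 6: b_k += eta * E_k (as stated); hidden-threshold update assumed.
        for k in range(self.m):
            self.b[k] += self.eta * E[k]
        for j in range(self.l):
            self.a[j] += self.eta * G[j]

def train(net, X, Y, tol=0.001, max_epochs=500):
    # Step 7: iterate until the error falls below the tolerance
    # (0.001 preferred in the patent) or an epoch limit is reached.
    for _ in range(max_epochs):
        for x, y in zip(X, Y):
            net.train_step(x, y)
        if net.total_error(X, Y) < tol:
            break
    return net.total_error(X, Y)
```

With two toy "speakers" encoded as 2-D feature vectors and one-hot targets, `train` drives the total error down and the larger output node indicates the predicted pattern class.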
CN201711253194.6A 2017-12-02 2017-12-02 Intelligent sound signal type recognition system device Pending CN107808659A (en)
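Claim 5 and workflow step 6) describe assigning a test signal to the enrolled pattern class with the highest similarity. The sketch below is a minimal illustration of that matching stage over fixed-length feature vectors; cosine similarity is an assumed choice, as the patent does not specify the similarity measure, and the labels and vectors are made up.

```python
import math

def cosine_similarity(u, v):
    # One possible similarity measure; the patent leaves the measure open.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def identify(test_vec, enrolled):
    """Assign a test feature vector to the most similar enrolled pattern class.

    `enrolled` maps a manual label (e.g. "Zhang San") to that speaker's stored
    feature vector, mirroring the labeled set D = {(x_i, y_i)} of claim 4.
    Returns a report in the style of claim 5's output.
    """
    best = max(enrolled, key=lambda name: cosine_similarity(test_vec, enrolled[name]))
    return "This is the voice of " + best
```

For example, a test vector lying closest (in cosine terms) to Zhang San's enrolled vector is reported as Zhang San's voice.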

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711253194.6A CN107808659A (en) 2017-12-02 2017-12-02 Intelligent sound signal type recognition system device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711253194.6A CN107808659A (en) 2017-12-02 2017-12-02 Intelligent sound signal type recognition system device

Publications (1)

Publication Number Publication Date
CN107808659A true CN107808659A (en) 2018-03-16

Family

ID=61589300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711253194.6A Pending CN107808659A (en) 2017-12-02 2017-12-02 Intelligent sound signal type recognition system device

Country Status (1)

Country Link
CN (1) CN107808659A (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520752A (en) * 2018-04-25 2018-09-11 西北工业大学 Method and device for voiceprint recognition
CN108564954A (en) * 2018-03-19 2018-09-21 平安科技(深圳)有限公司 Deep neural network model, electronic device, auth method and storage medium
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN108877823A (en) * 2018-07-27 2018-11-23 三星电子(中国)研发中心 Sound enhancement method and device
CN109448726A (en) * 2019-01-14 2019-03-08 李庆湧 A kind of method of adjustment and system of voice control accuracy rate
CN109611703A (en) * 2018-10-19 2019-04-12 宁波市鄞州利帆灯饰有限公司 A kind of LED light being easily installed
CN109714491A (en) * 2019-02-26 2019-05-03 上海凯岸信息科技有限公司 Intelligent sound outgoing call detection system based on voice mail
CN109785855A (en) * 2019-01-31 2019-05-21 秒针信息技术有限公司 Method of speech processing and device, storage medium, processor
CN109801619A (en) * 2019-02-13 2019-05-24 安徽大尺度网络传媒有限公司 A kind of across language voice identification method for transformation of intelligence
CN109859763A (en) * 2019-02-13 2019-06-07 安徽大尺度网络传媒有限公司 A kind of intelligent sound signal type recognition system
CN109936814A (en) * 2019-01-16 2019-06-25 深圳市北斗智能科技有限公司 A kind of intercommunication terminal, speech talkback coordinated dispatching method and its system
CN110033785A (en) * 2019-03-27 2019-07-19 深圳市中电数通智慧安全科技股份有限公司 A kind of calling for help recognition methods, device, readable storage medium storing program for executing and terminal device
CN110060717A (en) * 2019-01-02 2019-07-26 孙剑 A kind of law enforcement equipment laws for criterion speech French play system
CN110289016A (en) * 2019-06-20 2019-09-27 深圳追一科技有限公司 A kind of voice quality detecting method, device and electronic equipment based on actual conversation
CN110297179A (en) * 2018-05-11 2019-10-01 宫文峰 Diesel-driven generator failure predication and monitoring system device based on integrated deep learning
CN111314451A (en) * 2020-02-07 2020-06-19 普强时代(珠海横琴)信息技术有限公司 Language processing system based on cloud computing application
CN111475206A (en) * 2019-01-04 2020-07-31 优奈柯恩(北京)科技有限公司 Method and apparatus for waking up wearable device
CN111603191A (en) * 2020-05-29 2020-09-01 上海联影医疗科技有限公司 Voice noise reduction method and device in medical scanning and computer equipment
CN111674360A (en) * 2019-01-31 2020-09-18 青岛科技大学 A method for establishing a discriminative sample model in a vehicle tracking system based on blockchain
CN111989742A (en) * 2018-04-13 2020-11-24 三菱电机株式会社 Speech recognition system and method for using speech recognition system
CN113572492A (en) * 2021-06-23 2021-10-29 力声通信股份有限公司 Novel communication equipment prevents falling digital intercom

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11265197A (en) * 1997-12-13 1999-09-28 Hyundai Electronics Ind Co Ltd Voice recognizing method utilizing variable input neural network
US6026358A (en) * 1994-12-22 2000-02-15 Justsystem Corporation Neural network, a method of learning of a neural network and phoneme recognition apparatus utilizing a neural network
CN1662956A (en) * 2002-06-19 2005-08-31 皇家飞利浦电子股份有限公司 Mega speaker identification (ID) system and corresponding methods therefor
CN1941080A (en) * 2005-09-26 2007-04-04 吴田平 Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building
CN101419799A (en) * 2008-11-25 2009-04-29 浙江大学 Speaker identification method based mixed t model
US20100057453A1 (en) * 2006-11-16 2010-03-04 International Business Machines Corporation Voice activity detection system and method
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN103456301A (en) * 2012-05-28 2013-12-18 中兴通讯股份有限公司 Ambient sound based scene recognition method and device and mobile terminal
CN103619021A (en) * 2013-12-10 2014-03-05 天津工业大学 Neural network-based intrusion detection algorithm for wireless sensor network
JP2014048534A (en) * 2012-08-31 2014-03-17 Sogo Keibi Hosho Co Ltd Speaker recognition device, speaker recognition method, and speaker recognition program
US20140195236A1 (en) * 2013-01-10 2014-07-10 Sensory, Incorporated Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN104008751A (en) * 2014-06-18 2014-08-27 周婷婷 Speaker recognition method based on BP neural network
US20160260428A1 (en) * 2013-11-27 2016-09-08 National Institute Of Information And Communications Technology Statistical acoustic model adaptation method, acoustic model learning method suitable for statistical acoustic model adaptation, storage medium storing parameters for building deep neural network, and computer program for adapting statistical acoustic model
CN106128465A (en) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 A kind of Voiceprint Recognition System and method
CN106227038A (en) * 2016-07-29 2016-12-14 中国人民解放军信息工程大学 Grain drying tower intelligent control method based on neutral net and fuzzy control
CN106782603A (en) * 2016-12-22 2017-05-31 上海语知义信息技术有限公司 Intelligent sound evaluating method and system
CN106779053A (en) * 2016-12-15 2017-05-31 福州瑞芯微电子股份有限公司 The knowledge point of a kind of allowed for influencing factors and neutral net is known the real situation method
CN106875943A (en) * 2017-01-22 2017-06-20 上海云信留客信息科技有限公司 A kind of speech recognition system for big data analysis
US20170178666A1 (en) * 2015-12-21 2017-06-22 Microsoft Technology Licensing, Llc Multi-speaker speech separation
US20170270919A1 (en) * 2016-03-21 2017-09-21 Amazon Technologies, Inc. Anchored speech detection and speech recognition
CN112541533A (en) * 2020-12-07 2021-03-23 阜阳师范大学 Modified vehicle identification method based on neural network and feature fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Liu Yongjun et al.: "Research on an Intelligent Grain Control System Based on Neural Network Algorithms", Computer & Digital Engineering, vol. 44, no. 07, pages 1271-1276 *
Zeng Xiangyang et al.: "Fundamentals of Acoustic Signal Processing", vol. 1, 30 September 2015, Northwestern Polytechnical University Press, pages 160-163 *
Wang Xiaochuan et al.: "Analysis of 43 Neural Network Cases in MATLAB", vol. 1, 31 August 2013, Beihang University Press, pages 8-10 *
Zhao Li: "Speech Signal Processing", vol. 1, 31 March 2003, China Machine Press, pages 141-145 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564954A (en) * 2018-03-19 2018-09-21 平安科技(深圳)有限公司 Deep neural network model, electronic device, auth method and storage medium
CN108564954B (en) * 2018-03-19 2020-01-10 平安科技(深圳)有限公司 Deep neural network model, electronic device, identity verification method, and storage medium
CN111989742A (en) * 2018-04-13 2020-11-24 三菱电机株式会社 Speech recognition system and method for using speech recognition system
CN108520752A (en) * 2018-04-25 2018-09-11 西北工业大学 Method and device for voiceprint recognition
CN108520752B (en) * 2018-04-25 2021-03-12 西北工业大学 Voiceprint recognition method and device
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Interactive system, method, terminal and medium for audio role division and text recognition
CN110297179A (en) * 2018-05-11 2019-10-01 宫文峰 Diesel generator fault prediction and monitoring system device based on integrated deep learning
CN108877823A (en) * 2018-07-27 2018-11-23 三星电子(中国)研发中心 Speech enhancement method and device
CN109611703A (en) * 2018-10-19 2019-04-12 宁波市鄞州利帆灯饰有限公司 Easily installed LED lamp
CN110060717A (en) * 2019-01-02 2019-07-26 孙剑 Legal standard speech playback system for law enforcement equipment
CN111475206A (en) * 2019-01-04 2020-07-31 优奈柯恩(北京)科技有限公司 Method and apparatus for waking up wearable device
CN111475206B (en) * 2019-01-04 2023-04-11 优奈柯恩(北京)科技有限公司 Method and apparatus for waking up wearable device
CN109448726A (en) * 2019-01-14 2019-03-08 李庆湧 Method and system for adjusting voice control accuracy
CN109936814A (en) * 2019-01-16 2019-06-25 深圳市北斗智能科技有限公司 Intercom terminal, and voice intercom coordinated dispatching method and system
CN109785855B (en) * 2019-01-31 2022-01-28 秒针信息技术有限公司 Voice processing method and device, storage medium and processor
CN109785855A (en) * 2019-01-31 2019-05-21 秒针信息技术有限公司 Voice processing method and device, storage medium and processor
CN111674360A (en) * 2019-01-31 2020-09-18 青岛科技大学 A method for establishing a discriminative sample model in a vehicle tracking system based on blockchain
CN109859763A (en) * 2019-02-13 2019-06-07 安徽大尺度网络传媒有限公司 Intelligent sound signal type recognition system
CN109801619A (en) * 2019-02-13 2019-05-24 安徽大尺度网络传媒有限公司 Intelligent cross-language speech recognition and conversion method
CN109714491A (en) * 2019-02-26 2019-05-03 上海凯岸信息科技有限公司 Intelligent voice outbound-call detection system based on voicemail
CN110033785A (en) * 2019-03-27 2019-07-19 深圳市中电数通智慧安全科技股份有限公司 Call-for-help recognition method and device, readable storage medium and terminal device
CN110289016A (en) * 2019-06-20 2019-09-27 深圳追一科技有限公司 Voice quality detection method, device and electronic equipment based on real conversations
CN111314451A (en) * 2020-02-07 2020-06-19 普强时代(珠海横琴)信息技术有限公司 Language processing system based on cloud computing application
CN111603191A (en) * 2020-05-29 2020-09-01 上海联影医疗科技有限公司 Voice noise reduction method and device in medical scanning and computer equipment
CN111603191B (en) * 2020-05-29 2023-10-20 上海联影医疗科技股份有限公司 Speech noise reduction method and device in medical scanning and computer equipment
CN113572492A (en) * 2021-06-23 2021-10-29 力声通信股份有限公司 Novel drop-resistant digital intercom communication device
CN113572492B (en) * 2021-06-23 2022-08-16 力声通信股份有限公司 Drop-resistant digital intercom communication device

Similar Documents

Publication Publication Date Title
CN107808659A (en) Intelligent sound signal type recognition system device
CN110853618B (en) Language identification method, model training method, device and equipment
CN110136727B (en) Speaker identification method, device and storage medium based on speaking content
CN108305616B (en) Audio scene recognition method and device based on long-term and short-term feature extraction
CN112151030B (en) Multi-mode-based complex scene voice recognition method and device
CN108701453B (en) Modular deep learning model
CN105940407B (en) System and method for assessing the strength of an audio password
CN105374356B (en) Speech recognition method, speech assessment method, speech recognition system and speech assessment system
KR101614756B1 (en) Voice recognition apparatus, vehicle having the same, and method of controlling the vehicle
WO2020244402A1 (en) Speech interaction wakeup electronic device and method based on microphone signal, and medium
CN109074806 (en) Controlling distributed audio output to enable voice output
CN107112006 (en) Neural network-based speech processing
CN108364662B (en) Voice emotion recognition method and system based on paired identification tasks
CN110853617A (en) Model training method, language identification method, device and equipment
CN111429919B (en) Crosstalk prevention method based on conference real recording system, electronic device and storage medium
CN109377981B (en) Phoneme alignment method and device
CN110097875A (en) Voice interaction wakeup electronic device, method and medium based on microphone signal
WO2023222089A1 (en) Item classification method and apparatus based on deep learning
WO2020244411A1 (en) Microphone signal-based voice interaction wakeup electronic device and method, and medium
CN117762372A (en) Multimodal human-machine interaction system
Sun et al. A novel convolutional neural network voiceprint recognition method based on improved pooling method and dropout idea
CN110728993A (en) Voice change identification method and electronic equipment
Espi et al. Spectrogram patch based acoustic event detection and classification in speech overlapping conditions
CN102141812A (en) Robot
KR102113879B1 (en) Method and apparatus for recognizing a speaker's voice using a reference database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180316