CN212342269U - Emotion monitoring system based on sound frequency analysis - Google Patents
- Publication number
- CN212342269U (application CN202021353381.9U)
- Authority
- CN
- China
- Prior art keywords
- speech
- emotion
- voice
- mfcc
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000008451 emotion Effects 0.000 title claims abstract description 31
- 238000012544 monitoring process Methods 0.000 title claims abstract description 10
- 238000000605 extraction Methods 0.000 claims abstract description 7
- 238000007781 pre-processing Methods 0.000 claims abstract description 7
- 230000002996 emotional effect Effects 0.000 claims abstract description 5
- 230000008909 emotion recognition Effects 0.000 claims abstract description 4
- 238000004364 calculation method Methods 0.000 claims abstract description 3
- 241000282414 Homo sapiens Species 0.000 claims description 5
- 238000000034 method Methods 0.000 claims description 5
- 238000013528 artificial neural network Methods 0.000 claims description 2
- 238000010586 diagram Methods 0.000 claims description 2
- 210000005069 ears Anatomy 0.000 claims description 2
- 230000006399 behavior Effects 0.000 claims 1
- 230000019771 cognition Effects 0.000 claims 1
- 230000006870 function Effects 0.000 claims 1
- 230000000306 recurrent effect Effects 0.000 claims 1
- 210000004556 brain Anatomy 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000007177 brain activity Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 210000004072 lung Anatomy 0.000 description 1
- 230000015654 memory Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000003205 muscle Anatomy 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 210000001260 vocal cord Anatomy 0.000 description 1
Images
Landscapes
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The invention provides an emotion monitoring system based on sound frequency analysis. An array microphone collects the speech of a test subject and is connected to a PC, which runs: a speech signal preprocessing algorithm, which processes the speech features using MFCC speech emotion feature parameters; a speech emotion feature extraction algorithm, which uses an LSTM to obtain the complete speech sequence, performs similarity calculation on the sequence output by the LSTM, and determines the emotion weight of each speech frame with respect to the test subject; and a speech emotion recognition algorithm, which recognizes the emotional state of the test subject from small changes in the high-frequency (RHFR) and low-frequency (RLFR) components of the voice.
Description
Technical Field
The invention relates to the fields of biometric recognition and artificial intelligence, and can be applied in specialized settings such as public security interrogations, supervisory commission interviews, and other scenarios that require detecting emotional changes in a subject.
Background
Human speech production is a very complex process: it requires the coordinated involvement of many muscles and organs, synchronized with precise timing. First, the brain understands a given situation and evaluates the impact of speaking. Then, if it decides to speak, air is pushed from the lungs up to the vocal cords, causing them to vibrate at a particular frequency and produce sound. The vibrating air continues to flow past the brain-controlled tongue, teeth, and lips to create a sound stream that becomes words and phrases we can understand. The brain closely monitors this process to ensure that the emitted sound expresses the intended meaning, is intelligible, and can be heard by listeners. Because of this uninterrupted monitoring, every "event" of brain activity is reflected in the speech stream. The core of the system is a long short-term memory (LSTM) network algorithm that accurately monitors small changes in the high-frequency (RHFR) and low-frequency (RLFR) components of the subject's voice, so as to recognize and monitor changes in the subject's emotion.
Disclosure of Invention
The main object of the invention is to provide an emotion monitoring system based on sound frequency analysis. The speech of a subject is collected through an array microphone; the speech signal is preprocessed by extracting MFCC speech emotion feature parameters; the MFCC features are then input into an LSTM model, which yields the complete speech sequence; similarity calculation is performed on the sequence output by the LSTM to learn the emotion weight of each speech frame with respect to the subject; and finally the resulting information is classified into emotions through a fully connected layer, so that the subject's speech emotion is recognized and monitored.
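The chain of LSTM, frame-level similarity weighting, and fully connected classifier described above can be pictured with the following minimal sketch. It is an illustration only, not the patent's implementation: the layer sizes, the learned reference vector used for the similarity step, and the number of emotion classes are all assumptions.

```python
# Minimal sketch (assumed architecture, not the patent's implementation):
# LSTM over MFCC frames -> per-frame similarity weights -> fully connected emotion classifier.
import torch
import torch.nn as nn

class SpeechEmotionNet(nn.Module):
    def __init__(self, n_mfcc: int = 13, hidden: int = 128, n_emotions: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_mfcc, hidden_size=hidden, batch_first=True)
        self.reference = nn.Parameter(torch.randn(hidden))   # learned vector the frames are compared against
        self.classifier = nn.Linear(hidden, n_emotions)      # fully connected emotion classification layer

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:
        # mfcc: (batch, frames, n_mfcc) sequence of MFCC feature vectors
        sequence, _ = self.lstm(mfcc)                         # (batch, frames, hidden): one vector per speech frame
        scores = sequence @ self.reference                    # similarity of each frame to the reference vector
        weights = torch.softmax(scores, dim=1)                # emotion weight of each frame
        summary = (weights.unsqueeze(-1) * sequence).sum(1)   # weighted summary of the utterance
        return self.classifier(summary)                       # emotion logits

logits = SpeechEmotionNet()(torch.randn(1, 200, 13))          # one utterance of 200 MFCC frames (dummy data)
```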
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1: an implementation logic diagram of the present invention.
Detailed Description
In the embodiment shown in FIG. 1, the emotion monitoring system based on sound frequency analysis includes: an array microphone, a speech signal preprocessing algorithm, a speech emotion feature extraction algorithm, and a speech emotion recognition algorithm.
The array microphone collects the voice data of the subject and transmits it to the speech signal preprocessing algorithm.
The speech signal preprocessing algorithm uses MFCC speech emotion feature parameters to process the speech features (acoustic and prosodic characteristics). MFCC is a cepstral coefficient extracted in the Mel-scale frequency domain; it is a feature widely used in automatic speech and speaker recognition, models the characteristics of the human ear, and constructs feature parameters from human auditory properties.
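As an illustration of this preprocessing step, the hedged sketch below extracts MFCC parameters with the librosa library; the file name, sampling rate, frame sizes, and the choice of 13 coefficients are assumptions, not values specified by the patent.

```python
# Hedged sketch of MFCC extraction (assumed parameters, not the patent's exact settings).
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)         # hypothetical recording from the array microphone
y = librosa.effects.preemphasis(y)                      # boost high frequencies before analysis
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,      # 13 Mel-frequency cepstral coefficients per frame
                            n_fft=400, hop_length=160)  # 25 ms windows, 10 ms hop at 16 kHz
frames = mfcc.T                                         # (num_frames, 13): one feature vector per speech frame
```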
The speech emotion feature extraction algorithm uses a long short-term memory (LSTM) model. The LSTM is a recurrent neural network (RNN) that is effective for large-scale acoustic modeling: each layer of the network models the long-term dependencies of the speech sequence, giving high overall recognition performance.
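To make "obtaining a complete speech sequence through the LSTM model" concrete, the short sketch below (assumed sizes, dummy data) shows that an LSTM returns one hidden-state vector per input frame in addition to a final summary state.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=13, hidden_size=64, batch_first=True)
mfcc_frames = torch.randn(1, 250, 13)        # one utterance: 250 frames of 13 MFCCs (dummy data)
sequence, (h_n, c_n) = lstm(mfcc_frames)
print(sequence.shape)                        # (1, 250, 64): the complete per-frame speech sequence
print(h_n.shape)                             # (1, 1, 64): final state summarising the whole utterance
```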
The speech emotion recognition algorithm uses the sound frequency to calculate, for each time segment of the speech signal, a correlation weight between that segment and the emotional characteristics; it then compares the correlation weights of the different segments and selects the segments with the larger weights for recognition, thereby recognizing the speech emotion.
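A worked sketch of this weighting step is shown below, assuming cosine similarity against a reference emotion vector and softmax normalisation; both choices are illustrative assumptions, since the patent does not specify the similarity measure.

```python
import numpy as np

def frame_emotion_weights(frame_vectors: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Softmax-normalised cosine-similarity weight of every speech frame against a reference vector."""
    norms = np.linalg.norm(frame_vectors, axis=1) * np.linalg.norm(reference) + 1e-8
    similarity = frame_vectors @ reference / norms        # correlation score of each time segment
    exp = np.exp(similarity - similarity.max())
    return exp / exp.sum()                                # weights sum to 1 over the utterance

rng = np.random.default_rng(0)
frames = rng.normal(size=(250, 64))          # e.g. the per-frame LSTM outputs (dummy data)
reference = rng.normal(size=64)              # stand-in for a learned emotion reference vector
weights = frame_emotion_weights(frames, reference)
strongest = np.argsort(weights)[-10:]        # the time segments with the largest emotion weight
```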
Although the present invention has been described with reference to specific examples, the description of the examples does not limit the scope of the invention. Those skilled in the art can readily make various modifications or combinations of the embodiments, guided by this description, without departing from the spirit and scope of the invention; such modifications and combinations should also be construed as falling within the scope of the invention.
Claims (4)
1. An emotion monitoring system based on sound frequency analysis, comprising:
an array microphone, which can capture the speech of a test subject within 3 meters and transmit the audio file to a PC, the PC running:
a speech signal preprocessing algorithm, which processes the speech features using MFCC speech emotion feature parameters, wherein the MFCC is a cepstral coefficient extracted in the Mel-scale frequency domain and a feature widely used in automatic speech and speaker recognition;
a speech emotion feature extraction algorithm, which inputs the sound features extracted by the MFCC into an LSTM model and obtains a complete speech sequence through the LSTM model;
a speech emotion recognition algorithm, which performs similarity calculation on the speech sequence obtained by the LSTM model, learns the emotion weight of each speech frame with respect to the test subject, and classifies the resulting information into emotions according to high and low frequency, thereby recognizing and monitoring the speech emotion of the test subject;
and a speech emotion display interface, which displays the speech waveform of the test subject on the user interface and simultaneously displays the change in speech emotion in several chart styles, including a line chart, a bar chart, a scatter plot, and a dashboard.
2. The emotion monitoring system based on sound frequency analysis of claim 1, wherein, to address the non-stationary, random, and time-varying nature of speech signals, the speech signal preprocessing algorithm selects MFCC as the speech emotion feature in order to increase the utility of the feature parameters and reduce the complexity of feature extraction; the MFCC is a cepstral coefficient extracted in the Mel-scale frequency domain, a feature widely used in automatic speech and speaker recognition, and its feature parameters are constructed by modeling the characteristics of the human ear from human auditory properties.
3. The emotion monitoring system based on sound frequency analysis of claim 1, wherein the speech emotion feature extraction algorithm selects the LSTM, an improved form of the recurrent neural network (RNN); the LSTM cyclically passes states within its own network, so it accepts a wider range of time-series inputs and can describe dynamic temporal behavior.
4. The system of claim 1, wherein the speech emotion feature extraction algorithm can accurately monitor subtle changes in the voice in the high-frequency (RHFR) components, which can reflect highly excited or intense emotional states, and in the low-frequency (RLFR) components, which can reflect stress states, levels of thinking, and other cognitive processes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202021353381.9U CN212342269U (en) | 2020-07-11 | 2020-07-11 | Emotion monitoring system based on sound frequency analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202021353381.9U CN212342269U (en) | 2020-07-11 | 2020-07-11 | Emotion monitoring system based on sound frequency analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN212342269U true CN212342269U (en) | 2021-01-12 |
Family
ID=74081541
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202021353381.9U Active CN212342269U (en) | 2020-07-11 | 2020-07-11 | Emotion monitoring system based on sound frequency analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN212342269U (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113205009A (en) * | 2021-04-16 | 2021-08-03 | 广州朗国电子科技有限公司 | Animal emotion recognition method and device and storage medium |
CN113892952A (en) * | 2021-06-09 | 2022-01-07 | 上海良相智能化工程有限公司 | An intelligent research and judgment system |
- 2020-07-11: CN application CN202021353381.9U filed; patent CN212342269U (en), status Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gevaert et al. | Neural networks used for speech recognition | |
Yu et al. | Speech enhancement based on denoising autoencoder with multi-branched encoders | |
KR20240135018A (en) | Multi-modal system and method for voice-based mental health assessment using emotional stimuli | |
CN109887489B (en) | Speech dereverberation method based on depth features for generating countermeasure network | |
CN107329996A (en) | A kind of chat robots system and chat method based on fuzzy neural network | |
KR19990028694A (en) | Method and device for evaluating the property of speech transmission signal | |
CN212342269U (en) | Emotion monitoring system based on sound frequency analysis | |
CN113571095B (en) | Speech emotion recognition method and system based on nested deep neural network | |
Parmar et al. | Effectiveness of cross-domain architectures for whisper-to-normal speech conversion | |
Cardona et al. | Online phoneme recognition using multi-layer perceptron networks combined with recurrent non-linear autoregressive neural networks with exogenous inputs | |
Chen et al. | Ema2s: An end-to-end multimodal articulatory-to-speech system | |
Fan et al. | The impact of student learning aids on deep learning and mobile platform on learning behavior | |
Ling | An acoustic model for English speech recognition based on deep learning | |
CN102880906A (en) | Chinese vowel pronunciation method based on DIVA nerve network model | |
Peng et al. | Urban noise monitoring using edge computing with CNN-LSTM on Jetson Nano | |
Pertilä et al. | Online own voice detection for a multi-channel multi-sensor in-ear device | |
Schoentgen | Vocal cues of disordered voices: An overview | |
Tsenov et al. | Speech recognition using neural networks | |
Rodriguez et al. | A fuzzy information space approach to speech signal non‐linear analysis | |
Luo | The improving effect of intelligent speech recognition System on english learning | |
Azam et al. | Urdu spoken digits recognition using classified MFCC and backpropgation neural network | |
Wang et al. | Speech Emotion Feature Extraction Method Based on Improved MFCC and IMFCC Fusion Features | |
Ganhinhin et al. | Voice conversion of tagalog synthesized speech using cycle-generative adversarial networks (cycle-gan) | |
Lin et al. | Bipolar population threshold encoding for audio recognition with deep spiking neural networks | |
Stolar et al. | Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GR01 | Patent grant | ||