CN212342269U - Emotion monitoring system based on sound frequency analysis - Google Patents
- Publication number
- CN212342269U (application CN202021353381.9U)
- Authority
- CN
- China
- Prior art keywords
- speech
- emotion
- voice
- mfcc
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000008451 emotion Effects 0.000 title claims abstract description 31
- 238000012544 monitoring process Methods 0.000 title claims abstract description 10
- 238000000605 extraction Methods 0.000 claims abstract description 7
- 238000007781 pre-processing Methods 0.000 claims abstract description 7
- 230000002996 emotional effect Effects 0.000 claims abstract description 5
- 230000008909 emotion recognition Effects 0.000 claims abstract description 4
- 238000004364 calculation method Methods 0.000 claims abstract description 3
- 241000282414 Homo sapiens Species 0.000 claims description 5
- 238000000034 method Methods 0.000 claims description 5
- 238000013528 artificial neural network Methods 0.000 claims description 2
- 238000010586 diagram Methods 0.000 claims description 2
- 210000005069 ears Anatomy 0.000 claims description 2
- 230000006399 behavior Effects 0.000 claims 1
- 230000019771 cognition Effects 0.000 claims 1
- 230000006870 function Effects 0.000 claims 1
- 230000000306 recurrent effect Effects 0.000 claims 1
- 210000004556 brain Anatomy 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000007177 brain activity Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 210000004072 lung Anatomy 0.000 description 1
- 230000015654 memory Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000003205 muscle Anatomy 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 210000001260 vocal cord Anatomy 0.000 description 1
Images
Landscapes
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The invention provides an emotion monitoring system based on sound frequency analysis. An array microphone collects the speech of a test subject and is connected to a PC, which runs: a speech signal preprocessing algorithm, which processes the speech features using MFCC speech emotion feature parameters; a speech emotion feature extraction algorithm, which uses an LSTM to obtain the complete speech sequence, performs similarity calculation on the sequence output by the LSTM, and determines the emotion weight of each speech frame with respect to the test subject; and a speech emotion recognition algorithm, which recognizes the emotional state of the test subject from small changes in the high-frequency (RHFR) and low-frequency (RLFR) components of the voice.
Description
Technical Field
The invention relates to the fields of biometric recognition and artificial intelligence, and can be applied in specialized settings such as public security interrogations, supervisory commission interviews, and other scenarios that require detecting emotional changes in a subject.
Background
Human speech production is a very complex process: it requires the coordinated involvement of many muscles and organs, synchronized with precise timing. First, the brain understands a given situation and evaluates the impact of speaking. Then, if it decides to speak, air is pushed from the lungs up to the vocal cords, causing them to vibrate at a particular frequency and produce sound. The vibrating air continues to flow past the brain-controlled tongue, teeth, and lips to create a sound stream that becomes words and phrases we can understand. The brain closely monitors this process to ensure that the emitted sound expresses the intended meaning, is intelligible, and can be heard by listeners. Because of this uninterrupted monitoring, every "event" of brain activity is reflected in the speech stream. The core of the system is a long short-term memory (LSTM) network algorithm that accurately monitors small changes in the high-frequency (RHFR) and low-frequency (RLFR) components of the subject's voice, so as to recognize and monitor changes in the subject's emotion.
Disclosure of Invention
The main object of the invention is to provide an emotion monitoring system based on sound frequency analysis. The speech of a subject is collected through an array microphone; the speech signal is preprocessed by extracting MFCC speech emotion feature parameters; the MFCC features are then input into an LSTM model, which yields the complete speech sequence; similarity calculation is performed on the sequence output by the LSTM to learn the emotion weight of each speech frame with respect to the subject; and finally the resulting information is classified into emotions through a fully connected layer, so that the subject's speech emotion is recognized and monitored.
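The chain of LSTM, frame-level similarity weighting, and fully connected classifier described above can be pictured with the following minimal sketch. It is an illustration only, not the patent's implementation: the layer sizes, the learned reference vector used for the similarity step, and the number of emotion classes are all assumptions.

```python
# Minimal sketch (assumed architecture, not the patent's implementation):
# LSTM over MFCC frames -> per-frame similarity weights -> fully connected emotion classifier.
import torch
import torch.nn as nn

class SpeechEmotionNet(nn.Module):
    def __init__(self, n_mfcc: int = 13, hidden: int = 128, n_emotions: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_mfcc, hidden_size=hidden, batch_first=True)
        self.reference = nn.Parameter(torch.randn(hidden))   # learned vector the frames are compared against
        self.classifier = nn.Linear(hidden, n_emotions)      # fully connected emotion classification layer

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:
        # mfcc: (batch, frames, n_mfcc) sequence of MFCC feature vectors
        sequence, _ = self.lstm(mfcc)                         # (batch, frames, hidden): one vector per speech frame
        scores = sequence @ self.reference                    # similarity of each frame to the reference vector
        weights = torch.softmax(scores, dim=1)                # emotion weight of each frame
        summary = (weights.unsqueeze(-1) * sequence).sum(1)   # weighted summary of the utterance
        return self.classifier(summary)                       # emotion logits

logits = SpeechEmotionNet()(torch.randn(1, 200, 13))          # one utterance of 200 MFCC frames (dummy data)
```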
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1: an implementation logic diagram of the present invention.
Detailed Description
In the embodiment shown in FIG. 1, the emotion monitoring system based on sound frequency analysis includes: an array microphone, a speech signal preprocessing algorithm, a speech emotion feature extraction algorithm, and a speech emotion recognition algorithm.
The array microphone collects the voice data of the subject and transmits it to the speech signal preprocessing algorithm.
The speech signal preprocessing algorithm uses MFCC speech emotion feature parameters to process the speech features (acoustic and prosodic characteristics). MFCC is a cepstral coefficient extracted in the Mel-scale frequency domain; it is a feature widely used in automatic speech and speaker recognition, models the characteristics of the human ear, and constructs feature parameters from human auditory properties.
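As an illustration of this preprocessing step, the hedged sketch below extracts MFCC parameters with the librosa library; the file name, sampling rate, frame sizes, and the choice of 13 coefficients are assumptions, not values specified by the patent.

```python
# Hedged sketch of MFCC extraction (assumed parameters, not the patent's exact settings).
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)         # hypothetical recording from the array microphone
y = librosa.effects.preemphasis(y)                      # boost high frequencies before analysis
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,      # 13 Mel-frequency cepstral coefficients per frame
                            n_fft=400, hop_length=160)  # 25 ms windows, 10 ms hop at 16 kHz
frames = mfcc.T                                         # (num_frames, 13): one feature vector per speech frame
```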
The speech emotion feature extraction algorithm uses a long short-term memory (LSTM) model. The LSTM is a recurrent neural network (RNN) that is effective for large-scale acoustic modeling: each layer of the network models the long-term dependencies of the speech sequence, giving high overall recognition performance.
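To make "obtaining a complete speech sequence through the LSTM model" concrete, the short sketch below (assumed sizes, dummy data) shows that an LSTM returns one hidden-state vector per input frame in addition to a final summary state.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=13, hidden_size=64, batch_first=True)
mfcc_frames = torch.randn(1, 250, 13)        # one utterance: 250 frames of 13 MFCCs (dummy data)
sequence, (h_n, c_n) = lstm(mfcc_frames)
print(sequence.shape)                        # (1, 250, 64): the complete per-frame speech sequence
print(h_n.shape)                             # (1, 1, 64): final state summarising the whole utterance
```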
The speech emotion recognition algorithm uses the sound frequency to calculate, for each time segment of the speech signal, a correlation weight between that segment and the emotional characteristics; it then compares the correlation weights of the different segments and selects the segments with the larger weights for recognition, thereby recognizing the speech emotion.
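A worked sketch of this weighting step is shown below, assuming cosine similarity against a reference emotion vector and softmax normalisation; both choices are illustrative assumptions, since the patent does not specify the similarity measure.

```python
import numpy as np

def frame_emotion_weights(frame_vectors: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Softmax-normalised cosine-similarity weight of every speech frame against a reference vector."""
    norms = np.linalg.norm(frame_vectors, axis=1) * np.linalg.norm(reference) + 1e-8
    similarity = frame_vectors @ reference / norms        # correlation score of each time segment
    exp = np.exp(similarity - similarity.max())
    return exp / exp.sum()                                # weights sum to 1 over the utterance

rng = np.random.default_rng(0)
frames = rng.normal(size=(250, 64))          # e.g. the per-frame LSTM outputs (dummy data)
reference = rng.normal(size=64)              # stand-in for a learned emotion reference vector
weights = frame_emotion_weights(frames, reference)
strongest = np.argsort(weights)[-10:]        # the time segments with the largest emotion weight
```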
Although the present invention has been described with reference to specific examples, the description of the examples does not limit the scope of the invention. Those skilled in the art can readily make various modifications or combinations of the embodiments, guided by this description, without departing from the spirit and scope of the invention; such modifications and combinations should also be construed as falling within the scope of the invention.
Claims (4)
1. An emotion monitoring system based on sound frequency analysis, comprising:
an array microphone, which can capture the speech of a test subject within 3 meters and transmit the audio file to a PC, the PC running:
a speech signal preprocessing algorithm, which processes the speech features using MFCC speech emotion feature parameters, wherein the MFCC is a cepstral coefficient extracted in the Mel-scale frequency domain and a feature widely used in automatic speech and speaker recognition;
a speech emotion feature extraction algorithm, which inputs the sound features extracted by the MFCC into an LSTM model and obtains a complete speech sequence through the LSTM model;
a speech emotion recognition algorithm, which performs similarity calculation on the speech sequence obtained by the LSTM model, learns the emotion weight of each speech frame with respect to the test subject, and classifies the resulting information into emotions according to high and low frequency, thereby recognizing and monitoring the speech emotion of the test subject;
and a speech emotion display interface, which displays the speech waveform of the test subject on the user interface and simultaneously displays the change in speech emotion in several chart styles, including a line chart, a bar chart, a scatter plot, and a dashboard.
2. The emotion monitoring system based on sound frequency analysis of claim 1, wherein, to address the non-stationary, random, and time-varying nature of speech signals, the speech signal preprocessing algorithm selects MFCC as the speech emotion feature in order to increase the utility of the feature parameters and reduce the complexity of feature extraction; the MFCC is a cepstral coefficient extracted in the Mel-scale frequency domain, a feature widely used in automatic speech and speaker recognition, and its feature parameters are constructed by modeling the characteristics of the human ear from human auditory properties.
3. The emotion monitoring system based on sound frequency analysis of claim 1, wherein the speech emotion feature extraction algorithm selects the LSTM, an improved form of the recurrent neural network (RNN); the LSTM cyclically passes states within its own network, so it accepts a wider range of time-series inputs and can describe dynamic temporal behavior.
4. The system of claim 1, wherein the speech emotion feature extraction algorithm can accurately monitor subtle changes in the voice in the high-frequency (RHFR) components, which can reflect highly excited or intense emotional states, and in the low-frequency (RLFR) components, which can reflect stress states, levels of thinking, and other cognitive processes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202021353381.9U CN212342269U (en) | 2020-07-11 | 2020-07-11 | Emotion monitoring system based on sound frequency analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202021353381.9U CN212342269U (en) | 2020-07-11 | 2020-07-11 | Emotion monitoring system based on sound frequency analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN212342269U true CN212342269U (en) | 2021-01-12 |
Family
ID=74081541
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202021353381.9U Active CN212342269U (en) | 2020-07-11 | 2020-07-11 | Emotion monitoring system based on sound frequency analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN212342269U (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113205009A (en) * | 2021-04-16 | 2021-08-03 | 广州朗国电子科技有限公司 | Animal emotion recognition method and device and storage medium |
CN113892952A (en) * | 2021-06-09 | 2022-01-07 | 上海良相智能化工程有限公司 | An intelligent research and judgment system |
- 2020-07-11: CN application CN202021353381.9U filed; patent CN212342269U (en), status Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gevaert et al. | Neural networks used for speech recognition | |
Yu et al. | Speech enhancement based on denoising autoencoder with multi-branched encoders | |
KR20240135018A (en) | Multi-modal system and method for voice-based mental health assessment using emotional stimuli | |
CN109887489B (en) | Speech dereverberation method based on depth features for generating countermeasure network | |
CN107329996A (en) | A kind of chat robots system and chat method based on fuzzy neural network | |
KR19990028694A (en) | Method and device for evaluating the property of speech transmission signal | |
CN212342269U (en) | Emotion monitoring system based on sound frequency analysis | |
CN113571095B (en) | Speech emotion recognition method and system based on nested deep neural network | |
Parmar et al. | Effectiveness of cross-domain architectures for whisper-to-normal speech conversion | |
Cardona et al. | Online phoneme recognition using multi-layer perceptron networks combined with recurrent non-linear autoregressive neural networks with exogenous inputs | |
Chen et al. | Ema2s: An end-to-end multimodal articulatory-to-speech system | |
Fan et al. | The impact of student learning aids on deep learning and mobile platform on learning behavior | |
Ling | An acoustic model for English speech recognition based on deep learning | |
CN102880906A (en) | Chinese vowel pronunciation method based on DIVA nerve network model | |
Peng et al. | Urban noise monitoring using edge computing with CNN-LSTM on Jetson Nano | |
Pertilä et al. | Online own voice detection for a multi-channel multi-sensor in-ear device | |
Schoentgen | Vocal cues of disordered voices: An overview | |
Tsenov et al. | Speech recognition using neural networks | |
Rodriguez et al. | A fuzzy information space approach to speech signal non‐linear analysis | |
Luo | The improving effect of intelligent speech recognition System on english learning | |
Azam et al. | Urdu spoken digits recognition using classified MFCC and backpropgation neural network | |
Wang et al. | Speech Emotion Feature Extraction Method Based on Improved MFCC and IMFCC Fusion Features | |
Ganhinhin et al. | Voice conversion of tagalog synthesized speech using cycle-generative adversarial networks (cycle-gan) | |
Lin et al. | Bipolar population threshold encoding for audio recognition with deep spiking neural networks | |
Stolar et al. | Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GR01 | Patent grant | ||