Disclosure of Invention
The embodiments of the invention provide an AI (Artificial Intelligence)-based emotion recognition method and device, a computer device, and a storage medium, so as to solve the problem of performing emotion recognition based on AI.
In a first aspect, an AI-based emotion recognition method is provided, including:
extracting timbre features from received voice data, and acquiring personality features of a user through a personality test data set submitted by the user;
creating a virtual character according to the timbre features and the personality features of the user;
and performing, by the virtual character, real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result.
Optionally, creating a virtual character according to the timbre features and the personality features of the user includes:
cloning the timbre according to the extracted timbre features, to create the voice of the virtual character;
and creating the persona of the virtual character according to the personality features of the user.
Optionally, performing, by the virtual character, real-time emotion recognition on the user according to the variation features of tone in the user's dialogue to obtain a real-time emotion recognition result includes:
in a dialogue between the user and the virtual character, acquiring, by the virtual character, the user's speech and converting the speech into audio text, so as to extract emotion words from the audio text;
and forwarding the extracted emotion words to an inference model, the inference model judging the emotion of the user according to the personality features of the user and the emotion words, to obtain a user emotion recognition result.
Optionally, after the real-time emotion recognition result is obtained, the method further includes:
generating a reply that conforms to the dialogue context according to the emotion recognition result, and generating a voice reply using a speech model, to complete the voice dialogue.
Optionally, the inference model determines the emotional intensity of the user according to the positions of the emotion words in the audio text.
In a second aspect, an AI-based emotion recognition device is provided, including:
an acquisition module, configured to extract timbre features from received voice data, and acquire personality features of a user through a personality test data set submitted by the user;
a creation module, configured to create a virtual character according to the timbre features and the personality features of the user;
and a recognition module, configured to perform, by the virtual character, real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result.
Optionally, the recognition module is further configured to:
in a dialogue between the user and the virtual character, acquire, by the virtual character, the user's speech and convert the speech into audio text, so as to extract emotion words from the audio text;
and forward the extracted emotion words to an inference model, the inference model judging the emotion of the user according to the personality features of the user and the emotion words, to obtain a user emotion recognition result.
In a third aspect, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above AI-based emotion recognition method when executing the computer program.
In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing a computer program which, when executed by a processor, implements the above AI-based emotion recognition method.
According to the AI-based emotion recognition method and device, the computer device, and the storage medium, timbre features are extracted from received voice data, personality features of a user are acquired through a personality test data set submitted by the user, a virtual character is created according to the timbre features and the personality features of the user, and the virtual character performs real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result. The application creates a virtual character and, in the process of communicating with the virtual character, realizes emotionally aware chat with the user by judging the user's emotion.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The AI-based emotion recognition method provided by the embodiments of the application can be applied to the application environment shown in fig. 1. Specifically, the method is applied to an AI-based emotion recognition system, shown in fig. 1, that includes a client and a server communicating over a network. The server can be implemented as an independent server or as a server cluster formed by a plurality of servers. The server is configured to extract timbre features from received voice data, acquire personality features of a user through a personality test data set submitted by the user, create a virtual character according to the acquired timbre features and personality features, and perform real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result. The application creates a virtual character and, in the process of communicating with the virtual character, realizes emotionally aware chat with the user by judging the user's emotion. The client, also called the user end, refers to the program, corresponding to the server, that provides local services to the user. The client may be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices.
In one embodiment, as shown in fig. 2, an AI-based emotion recognition method is provided; the method is described, by way of illustration, as applied to the system in fig. 1, and includes the following steps:
S10, extracting timbre features from received voice data, and acquiring personality features of a user through a personality test data set submitted by the user.
S20, creating a virtual character according to the timbre features and the personality features of the user.
S30, performing, by the virtual character, real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result.
Specifically, owing to its strong intent recognition and context memory capabilities, AI is increasingly applied in chat software. Existing AI can complete a chat with a person well, but because it lacks any judgment of human emotion, the chat feels cold. To solve this problem, the application stores a large amount of audio data provided by the user after the user completes a personality test; the audio data comprise the user's speech in different emotional states, so that timbre features of the user under different emotions can be obtained from them, and the created virtual character can acquire the user's timbre features under different emotions. At the same time, the personality test data set submitted by the user is combined: the personality test data set refers to the answers the user gives to the personality test when submitting the audio data, which are saved after the test is completed to generate the data set. Based on the personality test data set, the personality features of the user can be better understood, so that the created virtual character is aware of the user's personality.
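The per-emotion timbre profiling described above can be sketched as follows. This is a minimal illustration only: the feature set (short-time energy and zero-crossing rate over raw samples) and the aggregation by emotion label are stand-in assumptions, not the patented feature extraction, which would typically involve richer descriptors such as spectral or pitch features.

```python
from statistics import mean


def timbre_features(samples):
    """Very simplified timbre descriptors for one utterance.

    Illustrative only: real systems would extract MFCCs, pitch
    contours, etc.; here short-time energy and zero-crossing rate
    serve as stand-ins.
    """
    energy = mean(s * s for s in samples)
    zcr = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    ) / max(len(samples) - 1, 1)
    return {"energy": energy, "zcr": zcr}


def profile_by_emotion(labelled_clips):
    """Aggregate per-emotion timbre profiles from (emotion, samples) pairs."""
    grouped = {}
    for emotion, samples in labelled_clips:
        grouped.setdefault(emotion, []).append(timbre_features(samples))
    # Average each feature over all clips labelled with that emotion.
    return {
        emotion: {k: mean(f[k] for f in feats) for k in feats[0]}
        for emotion, feats in grouped.items()
    }
```

A virtual character platform could store one such profile per emotion label and later compare incoming speech against them.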
The virtual character is created through a virtual character cloud platform. After acquiring the user's timbre features under different emotions and the user's personality features, the cloud platform creates a virtual character accordingly. The created virtual character has the same timbre as the user and can judge the user's emotion based on the user's personality features. The created virtual character can be deployed on various terminal devices, including, but not limited to, personal computers, notebook computers, tablet computers, and mobile phones.
In this embodiment, timbre features are extracted from received voice data, personality features of the user are acquired through the personality test data set submitted by the user, a virtual character is created according to the timbre features and the personality features, and the virtual character performs real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result. The AI can thus generate a virtual character based on the user's personality features and produce chat content matching the user's style, realizing chat with emotional warmth and personality, and achieving a genuine emotional companionship effect.
It is noted that, in addition to acquiring the user's personality features through the personality test, the method can also conduct guided dialogue with the user in the process of creating the virtual character, so as to judge and acquire the user's personality features. The manner of acquiring the user's personality features is not limited; the above is merely an example.
When collecting voice data, besides receiving voice data uploaded by the user to extract the user's timbre features, the application can also receive videos of various emotions uploaded by the user and collect, from these videos, the user's facial expressions and speech timbre under different emotions. The user's emotion can thereby be analyzed more accurately, yielding a more accurate emotion analysis result.
In one embodiment, as shown in fig. 3, an AI-based emotion recognition method is provided, and S20, namely creating a virtual character according to the timbre features and the personality features of the user, specifically includes the following steps:
S21, cloning the timbre according to the extracted timbre features, to create the voice of the virtual character.
S22, creating the persona of the virtual character according to the personality features of the user.
Specifically, when the virtual character is created, the user uploads voice data according to the instructions and at the same time completes the personality test as required. After receiving the voice data submitted by the user, the virtual character cloud platform submits the voice data to a speech model to complete timbre reproduction, and submits the data in the personality test data set to a personality model to complete the extraction and modeling of personality features, finally completing the creation of the virtual character.
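The creation step above can be sketched as a small data structure that combines a cloned voice profile with a persona built from the test answers. The scoring scheme (counting chosen trait labels) and all field names are illustrative assumptions; the actual speech and personality models are not specified here.

```python
from dataclasses import dataclass


@dataclass
class VirtualCharacter:
    voice_profile: dict  # parameters of the cloned timbre (illustrative)
    persona: dict        # trait scores derived from the personality test


def build_persona(test_answers):
    """Toy persona model: tally the trait label chosen for each question.

    test_answers maps question ids to a chosen trait label, e.g.
    {"q1": "extravert", "q2": "extravert", "q3": "introvert"}.
    The labels and scoring are illustrative, not the patented model.
    """
    scores = {}
    for trait in test_answers.values():
        scores[trait] = scores.get(trait, 0) + 1
    return scores


def create_virtual_character(voice_profile, test_answers):
    """Combine timbre reproduction output with the modelled persona."""
    return VirtualCharacter(voice_profile=voice_profile,
                            persona=build_persona(test_answers))
```

In the described system, `voice_profile` would come from the speech model's timbre reproduction and `test_answers` from the stored personality test data set.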
In this embodiment, the application creates the voice of the virtual character by cloning the timbre according to the extracted timbre features, and creates the persona of the virtual character according to the personality features of the user.
In one embodiment, as shown in fig. 4, an AI-based emotion recognition method is provided, and S30, namely performing, by the virtual character, real-time emotion recognition on the user according to variation features of tone in the user's dialogue to obtain a real-time emotion recognition result, specifically includes the following steps:
S31, in a dialogue between the user and the virtual character, acquiring, by the virtual character, the user's speech and converting the speech into audio text, so as to extract emotion words from the audio text;
S32, forwarding the extracted emotion words to an inference model, the inference model judging the emotion of the user according to the personality features of the user and the emotion words, to obtain a user emotion recognition result.
Specifically, once the virtual character has been created and the user converses with it, the user's speech is acquired through automatic speech recognition technology and uploaded to the virtual character cloud platform. After receiving the user's speech, the platform converts it into audio text, which facilitates the extraction of emotion words from the user's utterances. The extracted emotion words are forwarded to an inference model, which computes and infers over the emotion words and the speech text, judges the user's emotional state at that moment, and feeds the inference result back to the virtual character cloud platform, which controls the virtual character to converse with the user according to the emotion recognition result.
Illustratively, the user says "this is really a happy thing". The automatic speech recognition technology collects this utterance and uploads it to the virtual character cloud platform. After receiving it, the platform converts the speech into text and extracts emotion words from "this is really a happy thing", for example "happy". The inference model then computes and infers over the extracted emotion words and the speech text according to the user's personality features, judges that the user's emotion at this moment is "happy", and feeds this result back to the virtual character cloud platform, which controls the virtual character to converse with the user according to the emotion recognition result.
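The emotion-word extraction step in the example above might be sketched with a lexicon lookup over the transcribed text. The lexicon contents and the mapping from word to emotion category are purely illustrative assumptions; the patent does not specify how emotion words are identified.

```python
import re

# Tiny illustrative emotion lexicon; a production system would use a
# much larger curated lexicon or a trained classifier.
EMOTION_LEXICON = {
    "happy": "joy",
    "glad": "joy",
    "sad": "sadness",
    "angry": "anger",
}


def extract_emotion_words(audio_text):
    """Return (word, emotion_category) pairs found in the audio text."""
    words = re.findall(r"[a-z']+", audio_text.lower())
    return [(w, EMOTION_LEXICON[w]) for w in words if w in EMOTION_LEXICON]
```

For the utterance "this is really a happy thing", this sketch extracts the single emotion word "happy" in the "joy" category, which would then be forwarded to the inference model.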
Optionally, the inference model determines the emotional intensity of the user according to the positions of the emotion words in the audio text.
Specifically, the inference model identifies the words used to express mood or emotion in the acquired audio text of the user's speech and extracts these emotion words. After extraction, it judges the user's emotion, such as happy, angry, or sad, and further judges the current intensity of that emotion according to the positions of the emotion words. For example, when the user's emotion is judged to be sad, the model judges from the positions of the emotion words whether the user is slightly sad or very sad, since the user's emotions and their intensities differ from dialogue to dialogue.
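One plausible reading of position-based intensity is sketched below. The specific rule, that an emotion word appearing later in the utterance colours it more strongly, and the threshold labels are assumptions made only for illustration; the patent does not disclose the actual position-to-intensity mapping.

```python
def emotion_intensity(audio_text, emotion_word):
    """Toy position-based intensity score in (0, 1].

    Assumption (illustration only): the later an emotion word appears
    in the utterance, the more strongly it colours the utterance, so
    intensity grows with the word's relative position.
    """
    words = audio_text.lower().split()
    if emotion_word not in words:
        return 0.0
    pos = words.index(emotion_word)  # first occurrence
    return (pos + 1) / len(words)


def intensity_label(score):
    """Map a score to a coarse label such as 'slightly sad' vs 'very sad'."""
    if score >= 0.8:
        return "very strong"
    if score >= 0.4:
        return "moderate"
    return "mild"
```

For "this is really a happy thing", the word "happy" sits near the end of the utterance, so this sketch assigns it a high intensity.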
In this embodiment, in a dialogue between the user and the virtual character, the virtual character acquires the user's speech and converts it into audio text so as to extract the emotion words in it, and forwards the extracted emotion words to an inference model, which judges the user's emotion according to the user's personality features and the emotion words, obtaining a user emotion recognition result. By converting the user's speech into audio text, extracting the emotion words in it, judging the user's emotional state when speaking, feeding the inference result back to the virtual character cloud platform, and controlling the virtual character to converse with the user according to the emotion recognition result, basic emotion recognition capability is realized, chat with emotional warmth and personality is achieved, and emotional companionship is provided to the user.
It is noted that, after the virtual character is created, it is trained on the voice data provided by the user and on the user's personality features analyzed from the personality test data set. The training data include, but are not limited to, the user's speech under various emotions, the submitted videos of various emotional facial expressions, and the user's personality features; the user's personality and speaking logic are analyzed and trained so that the virtual character converses with the user better.
In one embodiment, an AI-based emotion recognition method is provided, and after the real-time emotion recognition result is obtained in S30, the method specifically includes the following steps:
generating a reply that conforms to the dialogue context according to the emotion recognition result, and generating a voice reply using a speech model, to complete the voice dialogue.
Specifically, the inference model judges the user's emotion according to the user's personality features and the emotion words. After the user's emotion recognition result is obtained, the inference model generates a reply to the user's speech according to that result, and the speech model then renders this reply, which conforms to the current dialogue context, as voice to complete the voice chat.
Illustratively, when the user says "this is really a happy thing" to the virtual character, the inference model judges from the user's personality features and the emotion word "happy" that the user's mood should be good at this moment. The inference model can then generate, according to the user's emotion, a reply that conforms to the current dialogue context and produce a voice reply, for example: "That really is a lovely moment; I'm glad you shared this happiness with me." This content is only used for illustration and is not limiting.
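The reply-generation step above can be sketched as selecting an emotion-appropriate template; in the described system an inference model would compose the reply and a speech model would synthesize it, so the template table and fallback text here are illustrative assumptions only.

```python
# Illustrative reply templates keyed by the recognized emotion; in the
# described system a speech model would then synthesize the reply audio.
REPLY_TEMPLATES = {
    "joy": "That really is a lovely moment; I'm glad you shared it with me!",
    "sadness": "I'm sorry to hear that. I'm here with you.",
    "anger": "That sounds frustrating. Do you want to talk it through?",
}


def generate_reply(emotion, default="Tell me more about that."):
    """Pick a context-appropriate reply for the recognized emotion."""
    return REPLY_TEMPLATES.get(emotion, default)
```

For a user judged to be in a "joy" state, this sketch selects the upbeat template; an unrecognized emotion falls back to a neutral prompt so the dialogue can continue.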
According to the AI-based emotion recognition method and device, the computer device, and the storage medium, timbre features are extracted from received voice data, personality features of a user are acquired through a personality test data set submitted by the user, a virtual character is created according to the timbre features and the personality features of the user, and the virtual character performs real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result. The application creates a virtual character and, in the process of communicating with the virtual character, realizes emotionally aware chat with the user by judging the user's emotion.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In an embodiment, an AI-based emotion recognition device is provided, which corresponds one-to-one to the AI-based emotion recognition method in the above embodiment. As shown in fig. 5, the AI-based emotion recognition device includes an acquisition module, a creation module, and a recognition module. The functional modules are described in detail as follows:
an acquisition module, configured to extract timbre features from received voice data, and acquire personality features of a user through a personality test data set submitted by the user;
a creation module, configured to create a virtual character according to the timbre features and the personality features of the user;
and a recognition module, configured to perform, by the virtual character, real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result.
In particular, owing to its strong intent recognition and context memory capabilities, AI is increasingly applied in chat software. Existing AI can complete a chat with a person well, but because it lacks judgment of human emotion, the chat feels cold. To solve this, the application obtains the user's timbre features by receiving a large amount of audio data provided by the user; from this audio data, timbre features under different emotions are obtained, so that the created virtual character acquires the user's speaking characteristics in different states. This is combined with the personality test data set submitted by the user, that is, the answers the user gives to the personality test when submitting the audio data, which are saved after the test is completed to generate the data set; from it, the user's personality features can be better understood, so that the created virtual character is aware of the user's personality. The virtual character is created through a virtual character cloud platform: after acquiring the user's timbre features and personality features, the cloud platform creates a virtual character accordingly. Because the creation is based on the user's timbre and personality features, the created virtual character has the same timbre as the user and can judge the user's emotion based on the user's personality features.
In this embodiment, timbre features are extracted from received voice data, personality features of the user are acquired through the personality test data set submitted by the user, a virtual character is created according to the timbre features and the personality features, and the virtual character performs real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result. The AI can thus generate a virtual character based on the user's personality features and produce chat content matching the user's style, realizing chat with emotional warmth and personality, and achieving a genuine emotional companionship effect.
Optionally, the recognition module is further configured to:
in a dialogue between the user and the virtual character, acquire, by the virtual character, the user's speech and convert the speech into audio text, so as to extract emotion words from the audio text;
and forward the extracted emotion words to an inference model, the inference model judging the emotion of the user according to the personality features of the user and the emotion words, to obtain a user emotion recognition result.
Specifically, once the virtual character has been created and the user converses with it, the user's speech is acquired through automatic speech recognition technology and uploaded to the virtual character cloud platform. After receiving the user's speech, the platform converts it into audio text, which facilitates the extraction of emotion words from the user's utterances. The extracted emotion words are forwarded to an inference model, which computes and infers over the emotion words and the speech text, judges the user's emotional state at that moment, and feeds the inference result back to the virtual character cloud platform, which controls the virtual character to converse with the user according to the emotion recognition result.
In this embodiment, in a dialogue between the user and the virtual character, the virtual character acquires the user's speech and converts it into audio text so as to extract the emotion words in it, and forwards the extracted emotion words to an inference model, which judges the user's emotion according to the user's personality features and the emotion words, obtaining a user emotion recognition result. By converting the user's speech into audio text, extracting the emotion words in it, judging the user's emotional state when speaking, feeding the inference result back to the virtual character cloud platform, and controlling the virtual character to converse with the user according to the emotion recognition result, basic emotion recognition capability is realized, chat with emotional warmth and personality is achieved, and emotional companionship is provided to the user.
According to the AI-based emotion recognition device, timbre features are extracted from received voice data, personality features of a user are acquired through a personality test data set submitted by the user, a virtual character is created according to the timbre features and the personality features of the user, and the virtual character performs real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result. The application creates a virtual character and, in the process of communicating with the virtual character, realizes emotionally aware chat with the user by judging the user's emotion.
For specific limitations on the AI-based emotion recognition device, reference may be made to the limitations on the AI-based emotion recognition method above, which are not repeated here. The respective modules in the above AI-based emotion recognition device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The computer program, when executed by the processor, implements an AI-based emotion recognition method. The computer device is configured to extract timbre features from received voice data, acquire personality features of a user through a personality test data set submitted by the user, create a virtual character according to the acquired timbre features and personality features, and perform real-time emotion recognition on the user according to variation features of tone in the user's dialogue, to obtain a real-time emotion recognition result. The application creates a virtual character and, in the process of communicating with the virtual character, realizes emotionally aware chat with the user by judging the user's emotion. The network interface of the computer device is used to communicate with an external terminal through a network connection.
In an embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement the AI-based emotion recognition method in the above embodiment, for example, an application environment schematic diagram of the AI-based emotion recognition method shown in fig. 1, or shown in fig. 2 to 4, and is not repeated herein. Or the processor when executing the computer program implements the functions of the modules/units in this embodiment of the AI-based emotion recognition device, such as the AI-based emotion recognition function shown in fig. 5, and will not be described again here for avoiding repetition.
In an embodiment, a computer readable storage medium is provided, and a computer program is stored on the computer readable storage medium, where the computer program is executed by a processor to implement the AI-based emotion recognition method in the above embodiment, for example, an application environment schematic diagram of the AI-based emotion recognition method shown in fig. 1, or flowcharts of the AI-based emotion recognition method shown in fig. 2 to 4, which are not repeated herein. Or the computer program, when executed by the processor, implements the functions of the modules/units in the embodiment of the AI-based emotion recognition device, such as the AI-based emotion recognition function shown in fig. 5, which is not repeated here. The computer readable storage medium may be nonvolatile or may be volatile.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The foregoing embodiments are merely illustrative of the technical solutions of the present invention, and not restrictive, and although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments or equivalent substitutions of some technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.