
CN111724799B - Sound expression application method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN111724799B
Authority
CN
China
Prior art keywords
expression
sound
emotion
voice
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910217204.3A
Other languages
Chinese (zh)
Other versions
CN111724799A (en)
Inventor
贾锦杰
廖多依
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910217204.3A priority Critical patent/CN111724799B/en
Publication of CN111724799A publication Critical patent/CN111724799A/en
Application granted granted Critical
Publication of CN111724799B publication Critical patent/CN111724799B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10 Multimedia information

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a sound expression application method, apparatus, device, and readable storage medium. The method comprises the following steps: providing an expression selection interface and displaying sound expressions selectable by the user; receiving an expression selection instruction from the user and determining the corresponding target sound expression; and playing the target sound expression in the associated application window associated with that target sound expression.

Description

Sound expression application method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for applying a sound expression.
Background
With the rapid development of internet technology and the popularization of intelligent terminals, more and more users are accustomed to socializing through social applications (e.g., WeChat, QQ, microblogs, blogs) installed on terminal devices such as smartphones, palmtop computers, and tablet computers, which provide instant chat services or message and comment services.
When communicating through a social application, a user can input text, voice, pictures, and other content in the chat window or message window provided by the application to interact with other users. Meanwhile, the user can use the picture emoticons provided or supported by the application to express his or her mood or emotion when communicating with others, making the communication more concise and vivid.
However, in social scenarios where users do not communicate by entering text or pictures, current social applications only provide the ability to send voice. In other words, while communicating by voice through a social application, a user cannot directly express his or her mood or emotion in a concise and vivid way, so the mood or emotion expression needs of users in voice communication are difficult to meet in practice.
Disclosure of Invention
It is an object of the present invention to provide a new solution for applying sound expressions.
According to a first aspect of the present invention, there is provided a method for applying a sound expression, including:
Providing an expression selection interface, and displaying the sound expression which can be selected by a user;
receiving an expression selection instruction of a user, and determining a corresponding target sound expression;
and playing the target sound expression in an associated application window associated with the target sound expression.
Optionally, each of the sound expressions has a corresponding emotion feature and sound content; the emotion characteristics at least comprise emotion types and emotion degrees;
the method further comprises the steps of:
Acquiring voice emotion characteristics of a user according to associated voice data input by the user; the voice emotion characteristics at least comprise voice emotion types and voice emotion degrees;
and selecting the sound expression corresponding to the emotion characteristics from a sound expression database comprising a plurality of sound expressions according to the voice emotion characteristics, and taking the sound expression corresponding to the emotion characteristics as the sound expression displayed through the expression selection interface.
Optionally, the step of acquiring the voice emotion feature of the user according to the associated voice data input by the user includes:
Performing voice analysis on the associated voice data to obtain tone characteristics, volume characteristics and rhythm characteristics of the associated voice data;
Processing the tone characteristics, the volume characteristics and the rhythm characteristics of the associated voice data according to the emotion characteristic extraction model to obtain the corresponding voice emotion characteristics;
the emotion feature extraction model is a machine learning model obtained through training of collected voice samples and is used for outputting corresponding voice emotion features according to the tone features, volume features and rhythm features of input voice.
Optionally, the step of acquiring the voice emotion feature of the user according to the associated voice data input by the user includes:
Converting the associated voice data into corresponding associated text data;
Extracting emotion keywords from the associated text data according to a pre-constructed emotion word library;
carrying out structural analysis on the emotion keywords through an emotion structural model to obtain the corresponding voice emotion characteristics;
the emotion structural model is a vocabulary model obtained by classifying and structuring collected emotion vocabularies related to emotion; each emotion vocabulary included in the emotion structural model has a corresponding emotion type and emotion degree.
Optionally, the step of providing an expression selection interface to display the acoustic expression for user selection includes:
Generating a corresponding sound waveform according to the emotion characteristics of the sound expression so as to display the sound expression;
And/or,
The expression selection indication is a voice selection indication input by a user.
Optionally, each of the sound expressions has a corresponding emotion feature and sound content; the emotion characteristics at least comprise emotion types and emotion degrees; the sound expression comprises a voice expression and an audio expression; the sound content of the voice expression is voice corresponding to emotion characteristics of the voice expression; the sound content of the sound effect expression is a sound effect corresponding to the emotion characteristics of the sound effect expression;
The step of playing the target sound expression in an associated application window associated with the target sound expression includes:
When the target sound expression is the voice expression, in the process of playing the user voice associated with the target sound expression in the associated application window, playing the target sound expression according to the insertion position of the target sound expression in the user voice;
And when the target sound expression is the sound effect expression, mixing and synthesizing the user voice associated with the target sound expression and the target sound expression, and then playing the user voice in the associated application window.
Optionally, the method further comprises:
responding to a sound expression generating request of a user, and generating a corresponding sound expression according to sound content input by the user so as to be selected by the user.
According to a second aspect of the present invention, there is provided an application apparatus of a sound expression, including:
the expression providing unit is used for providing an expression selection interface and displaying the sound expression which can be selected by the user;
The expression determining unit is used for receiving the expression selection instruction of the user and determining a corresponding target sound expression;
And the expression playing unit is used for playing the target sound expression in an associated application window associated with the target sound expression.
According to a third aspect of the present invention, there is provided an application apparatus of a sound expression, comprising:
A display device;
a memory for storing executable instructions;
And the processor is used for controlling, according to the executable instructions, the sound expression application device to execute the sound expression application method according to the first aspect of the invention.
According to a fourth aspect of the present invention, there is provided a readable storage medium, wherein the readable storage medium stores a computer program which, when read and executed by a computer, performs the sound expression application method according to the first aspect of the present invention.
According to one embodiment of the disclosure, an expression selection interface is provided to display selectable sound expressions having corresponding emotion features and sound content. After an expression selection instruction of a user is received, the target sound expression selected by the user is determined, and the target sound expression is played in the associated application window associated with it. In this way, a user can vividly and concisely express his or her mood or emotion directly through a sound expression in the process of communicating by voice, meeting the mood or emotion expression needs of users in voice communication and improving their voice communication experience. The method is particularly suitable for application scenarios such as voice chat, voice messages, or voice comments.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a block diagram showing an example of a hardware configuration of an electronic device that can be used to implement an embodiment of the present invention.
Fig. 2 shows a flowchart of an application method of the acoustic expression of the embodiment of the present invention.
Fig. 3 is a schematic diagram of an example of an expression selection interface showing a sound expression.
Fig. 4 is a schematic diagram of an example of generating a sound expression from a sound expression generation request of a user.
Fig. 5 is a schematic diagram of an example of applying a sound expression in a voice chat scenario.
Fig. 6 is a schematic diagram of an example of applying a sound expression in a speech comment scene.
Fig. 7 shows a block diagram of an application device 3000 of a sound expression of an embodiment of the present invention.
Fig. 8 shows a block diagram of an application device 4000 of a sound expression of an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
< Hardware configuration >
Fig. 1 is a block diagram showing a hardware configuration of an electronic device 1000 in which an embodiment of the present invention can be implemented.
The electronic device 1000 may be a laptop, desktop, cell phone, tablet, etc. As shown in fig. 1, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 may be a central processing unit (CPU), a microcontroller (MCU), or the like. The memory 1200 includes, for example, ROM (read-only memory), RAM (random access memory), and nonvolatile memory such as a hard disk. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 is capable of wired or wireless communication, and specifically may include Wi-Fi communication, Bluetooth communication, 2G/3G/4G/5G communication, and the like. The display device 1500 is, for example, a liquid crystal display or a touch display. The input device 1600 may include, for example, a touch screen, keyboard, or somatosensory input. A user may input and output voice information through the speaker 1700 and microphone 1800.
The electronic device shown in fig. 1 is merely illustrative and is in no way meant to limit the invention, its application, or uses. In an embodiment of the present invention, the memory 1200 of the electronic device 1000 is configured to store instructions for controlling the processor 1100 to perform any of the sound expression application methods provided in the embodiments of the present invention. It will be appreciated by those skilled in the art that although a plurality of devices are shown for the electronic device 1000 in fig. 1, the present invention may involve only some of them; for example, the electronic device 1000 may involve only the processor 1100 and the memory 1200. The skilled person can design instructions according to the disclosed solution. How the instructions control the processor to operate is well known in the art and will not be described in detail here.
< Example >
The general idea of this embodiment is to provide a sound expression application scheme: an expression selection interface is provided to display selectable sound expressions having corresponding emotion features and sound content; after an expression selection instruction of a user is received, the target sound expression selected by the user is determined; and the target sound expression is played in the associated application window associated with it. In this way, a user can vividly and concisely express his or her mood or emotion directly through a sound expression in the process of communicating by voice, meeting the mood or emotion expression needs of users in voice communication and improving their voice communication experience. The scheme is particularly suitable for application scenarios such as voice chat, voice messages, or voice comments.
< Method >
In this embodiment, an application method of a sound expression is provided, as shown in fig. 2, including: steps S2100-S2300.
In step S2100, an expression selection interface is provided to display the acoustic expression for the user to select.
The expression selection interface is a human-computer interaction interface through which the user interacts by gesture operations such as clicking, sliding, and ticking, or by voice and text input. The sound expressions displayed in the selection interface may be, for example, as shown in fig. 3.
In this embodiment, each of the sound expressions has a corresponding sound content.
The sound content is the specific content heard when the sound expression is played. In this embodiment, the sound content of a sound expression conforms to a preset sound duration, i.e., the sound content is kept within the corresponding duration, so that the sound expression does not interfere with the user's normal voice communication. The sound duration can be set according to the specific application scenario or application requirement, for example, to 1 second.
In this embodiment, the sound content of a sound expression may fall into different content categories according to what is to be expressed. For example, the sound content may be classified into mood expression, speech response, action expression, and state expression categories. Sound content of the mood expression category may be used to express a specific mood, such as happiness, embarrassment, or comfort. Sound content of the speech response category may be response sounds designed for specific preceding remarks, for example a reply designed for "sorry" or a "bye-bye" designed for "goodbye". Sound content of the action expression category may simulate or directly output the sound of a specific action, for example a simulated whistle, or a recording saying "bounce ten thousand times". Sound content of the state expression category is used to express the user's current state, such as "I am eating" or "I am going to take a bath".
Each of the sound expressions may have a corresponding emotional characteristic.
The emotion feature is the mood or emotion reflected by the corresponding sound expression. The emotion feature may include an emotion type and an emotion degree. The emotion type may be preset according to a classification of human moods and emotions, for example anger, happiness, sadness, joy, and the like. The emotion degree indicates how strong the corresponding emotion type is; for example, the emotion type of anger may range over different degrees such as annoyed, angry, and furious. Depending on the specific application scenario or application requirements, the emotion feature may also include other content, for example an expression theme, i.e., a scene setting in which the expression is used, such as a New Year red-envelope theme or a back-to-school theme.
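For concreteness, the emotion feature and sound expression described above can be pictured as small data structures. The following Python sketch is illustrative only; the field names, the numeric degree scale, and the example entry are assumptions rather than part of the patent.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class EmotionFeature:
        """Emotion feature of a sound expression (illustrative)."""
        emotion_type: str              # e.g. "angry", "happy", "sad"
        emotion_degree: int            # e.g. 1 = mild ... 3 = intense
        theme: Optional[str] = None    # optional scene theme, e.g. "new-year"

    @dataclass
    class SoundExpression:
        """A sound expression: its emotion feature plus its sound content."""
        name: str
        feature: EmotionFeature
        sound_file: str                # short audio clip, e.g. about 1 second

    # Hypothetical entry in a sound expression database
    finger_heart = SoundExpression(
        name="finger_heart",
        feature=EmotionFeature(emotion_type="happy", emotion_degree=2),
        sound_file="expressions/finger_heart.wav",
    )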
In this embodiment, the sound expressions selectable by the user may be generated in advance and built into the operating system of the electronic device that supports playing sound expressions, or into the application system that supports playing sound expressions, according to specific application requirements; they may also be recorded from specified content by dedicated sound-effect staff or voice actors (such as certain celebrities).
In one example, the method for applying the acoustic expression provided in the present embodiment further includes:
responding to a sound expression generating request of a user, and generating a corresponding sound expression according to sound content input by the user so as to be selected by the user for use.
The sound expression generation request is an instruction triggered by the user to request generation of a sound expression. It can be triggered through human-computer interaction operations such as clicking or ticking a corresponding function button or selecting a function item by gesture, or through voice command input. For example, as shown in fig. 4, the sound expression generation request may be sent by providing a function button for the user to click.
In addition, as shown in fig. 4, in this example, in response to the sound expression generation request, a prompt box may be provided to guide the user to input the sound content and, further, to configure the emotion feature corresponding to the sound expression. The corresponding sound expression is then generated from the sound content input by the user and the configured emotion feature, and an audition of the sound expression is provided so that the user can confirm whether to save it. After the user confirms, the generated sound expression is stored for the user to select and use later.
Through this example, corresponding sound expressions can be generated according to the user's own expression needs, realizing customization of sound expressions and flexibly meeting users' personalized requirements.
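A minimal sketch of this customization flow, reusing the illustrative EmotionFeature and SoundExpression structures above, might look as follows; the audition and confirmation callbacks stand in for the interface interactions described here, and every name is an assumption.

    def create_custom_expression(name, recorded_wav, emotion_type, emotion_degree,
                                 database, audition=None, confirm=None):
        """Build a sound expression from the user's recording and configured
        emotion feature, let the user audition it, and store it only after
        the user confirms (illustrative flow, not the patent's code)."""
        candidate = SoundExpression(
            name=name,
            feature=EmotionFeature(emotion_type, emotion_degree),
            sound_file=recorded_wav,
        )
        if audition is not None:
            audition(candidate.sound_file)      # play the clip back for preview
        if confirm is None or confirm(candidate):
            database.append(candidate)          # persist for later selection
            return candidate
        return None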
In this embodiment, the selectable sound expressions displayed through the expression selection interface may be all of the selectable sound expressions; alternatively, to avoid the user spending too much time choosing among a large number of sound expressions, the sound expressions commonly used by the user may be displayed preferentially, improving the user's selection efficiency.
In one example, each of the sound expressions may have a corresponding emotion feature and sound content, and step S2100 in this embodiment may further include: step S2110 to step S2120.
Step S2110, according to the associated voice data input by the user, the voice emotion characteristics of the user are obtained.
In this example, the associated voice data is the voice data for which the user desires to express a corresponding mood or emotion using a sound expression. For example, in a voice chat scenario, the associated voice data is the sentence of chat voice for which the user desires to use a sound expression; in a voice message scenario, it is the voice message input by the user for which a sound expression is desired; in a voice comment scenario, it is the voice comment input by the user for which a sound expression is desired.
In this example, the voice emotion feature is the mood or emotion embodied in the associated voice data. The voice emotion feature at least comprises a voice emotion type and a voice emotion degree, which are classified similarly to the emotion type and emotion degree of the sound expressions described above and are not described again here.
In a more specific example, step S2110 may include: steps S2111 to S2112.
Step S2111, performing voice analysis on the associated voice data to obtain a pitch feature, a volume feature and a rhythm feature of the associated voice data.
The voice analysis is carried out on the associated voice data, and the tone height, the volume, the rhythm speed and the like of the associated voice data can be determined by using a common voice signal analysis means, so that the tone characteristic, the volume characteristic and the rhythm characteristic of the associated voice data are correspondingly obtained.
The pitch level, volume level, rhythm speed, and the like of the associated voice data can reflect, to a certain extent, the mood or emotion contained in the voice; for example, voice reflecting a sad mood usually has low volume, low pitch, and a slow rhythm. Therefore, the voice emotion feature reflected by the associated voice data can be obtained from its tone, volume, and rhythm characteristics in combination with the subsequent step.
Step S2112, according to the emotion feature extraction model, the tone feature, the volume feature and the rhythm feature of the associated voice data are processed to obtain the corresponding voice emotion feature.
The emotion feature extraction model is a machine learning model obtained by training on collected voice samples and is used for outputting the corresponding voice emotion feature according to the tone feature, volume feature and rhythm feature of input voice. Specifically, a large number of voice samples can be collected in advance, each having corresponding volume, tone, and rhythm and reflecting a preset voice emotion feature. Training on these samples can be carried out with a training network commonly used for machine learning models, such as a convolutional neural network. The resulting emotion feature extraction model can identify and output the voice emotion feature reflected by input voice according to its tone, volume, and rhythm features.
The emotion feature extraction model obtained by training a large number of voice samples is utilized to identify the voice emotion features reflected by the associated voice data, so that the current emotion or feeling of the user reflected by the associated voice data can be accurately and effectively extracted.
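As a rough illustration of steps S2111 and S2112 (not the patent's actual implementation), the sketch below extracts pitch, volume, and rhythm features with the librosa library, assumed to be available, and trains a simple classifier on labelled samples; the random forest is only a stand-in for the learned model the patent describes (for example, a convolutional network).

    import numpy as np
    import librosa                                    # assumed available
    from sklearn.ensemble import RandomForestClassifier

    def extract_prosodic_features(wav_path: str) -> np.ndarray:
        """Pitch, volume and rhythm features of one utterance (cf. step S2111)."""
        y, sr = librosa.load(wav_path, sr=16000)
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)     # pitch contour
        rms = librosa.feature.rms(y=y)[0]                 # loudness contour
        onsets = librosa.onset.onset_detect(y=y, sr=sr)   # rough rhythm proxy
        rate = len(onsets) / (len(y) / sr)                # onsets per second
        return np.array([f0.mean(), f0.std(),             # pitch level / spread
                         rms.mean(), rms.std(),            # volume level / spread
                         rate])                            # speaking rhythm

    def train_emotion_model(sample_paths, labels):
        """Cf. step S2112: fit a model on collected, labelled voice samples."""
        X = np.stack([extract_prosodic_features(p) for p in sample_paths])
        return RandomForestClassifier(n_estimators=100).fit(X, labels)

    def predict_speech_emotion(model, wav_path: str) -> str:
        """Map associated voice data to a speech emotion label."""
        return model.predict(extract_prosodic_features(wav_path)[None, :])[0]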
In another more specific example, step S2110 may include: steps S21101 to S21103.
Step S21101 converts the associated voice data into corresponding associated text data.
In this example, the associated voice data may be passed through a voice recognition engine or a voice-to-text tool, plug-in, etc., to obtain corresponding associated text data.
Step S21102, extracting emotion keywords from the associated text data according to the pre-constructed emotion word library.
The emotion word library comprises a plurality of emotion words, each reflecting a different human mood or emotion. In this example, emotion words can be mined manually or by machine to construct the emotion word library.
According to the emotion word library, similarity analysis can be carried out, for example by cosine similarity, between the words obtained by segmenting the associated text data and the emotion words contained in the emotion word library, and the words whose similarity is higher than a preset similarity threshold are extracted as emotion keywords.
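A minimal sketch of this keyword-extraction step is given below; word segmentation is assumed to have been done upstream, the optional embed argument stands in for whatever word-embedding model supplies the vectors used in the cosine-similarity comparison, and the threshold value is an assumption.

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def extract_emotion_keywords(tokens, emotion_lexicon, embed=None, threshold=0.8):
        """A token counts as an emotion keyword if it matches a lexicon word
        exactly, or (when an embedding function is supplied) its cosine
        similarity to one exceeds the preset threshold (cf. step S21102)."""
        keywords = []
        for token in tokens:
            for emo_word in emotion_lexicon:
                similar = (
                    token == emo_word
                    or (embed is not None
                        and cosine(embed(token), embed(emo_word)) >= threshold)
                )
                if similar:
                    keywords.append(token)
                    break
        return keywords

    # Exact-match usage with a toy lexicon:
    print(extract_emotion_keywords(["this", "movie", "delighted", "me"],
                                   {"delighted", "furious", "sad"}))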
Step S21103, carrying out structural analysis on the emotion keywords through an emotion structural model to obtain corresponding voice emotion characteristics.
The emotion structured model is a vocabulary model obtained by classifying and structuring collected emotion vocabularies related to emotion. Each emotion vocabulary included in the emotion structural model has a corresponding emotion type and emotion degree.
In this example, emotion words obtained in advance by manual or machine mining can be classified into different levels according to human moods or emotions. For example, emotion words belonging to the same emotion type are grouped into a major class for each emotion type; within each major class, the words are further subdivided into different minor classes according to emotion degree; and under each minor class the emotion words can be ordered by emotion degree. This forms a structure of different classification levels, from which the emotion structured model corresponding to the emotion words is organized.
Through the emotion structural model, the emotion keywords are subjected to structural analysis, emotion words corresponding to the emotion keywords can be found in the emotion structural model, and voice emotion characteristics are obtained according to emotion types and emotion degrees of the emotion words.
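The structured analysis itself can be pictured as a lookup from emotion words to (emotion type, emotion degree) pairs, as sketched below; the vocabulary, the degree levels, and the aggregation rule (majority type, then maximum degree) are illustrative assumptions rather than requirements of the patent.

    # Toy structured emotion model: word -> (emotion type, emotion degree)
    EMOTION_MODEL = {
        "annoyed":   ("angry", 1),
        "furious":   ("angry", 3),
        "pleased":   ("happy", 1),
        "delighted": ("happy", 2),
        "overjoyed": ("happy", 3),
    }

    def analyze_keywords(keywords):
        """Look the keywords up in the structured model and return the
        dominant emotion type with its strongest observed degree."""
        hits = [EMOTION_MODEL[k] for k in keywords if k in EMOTION_MODEL]
        if not hits:
            return None
        types = [t for t, _ in hits]
        dominant = max(set(types), key=types.count)     # majority emotion type
        degree = max(d for t, d in hits if t == dominant)
        return {"emotion_type": dominant, "emotion_degree": degree}

    print(analyze_keywords(["delighted", "overjoyed"]))  # -> happy, degree 3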
In this example, emotion keywords are extracted from the associated text data corresponding to the associated voice data using a pre-built emotion word library, and the voice emotion feature is then obtained by structurally analyzing the emotion keywords with an emotion structured model organized as a hierarchy of emotion words. This approach does not require collecting a large number of voice samples and can quickly and effectively obtain the voice emotion feature reflected by the associated voice data through simple structural analysis.
In practical applications, based on the two examples of obtaining the voice emotion feature reflected by the associated voice data disclosed in this embodiment, a person skilled in the art may choose either one according to the specific application scenario or application requirements. Alternatively, to obtain a more accurate voice emotion feature, both examples may be implemented at the same time to obtain the voice emotion feature reflected by the associated voice data separately, and the overlapping portion of the two results is taken as the final voice emotion feature.
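One way this combination could look is sketched below; keeping the agreed emotion type and the more conservative degree is an assumption for illustration, since the text above only speaks of taking the overlapping portion of the two results.

    def combine_emotion_estimates(prosody_estimate, text_estimate):
        """Keep the part on which the acoustic analysis and the text analysis
        agree; fall back to whichever single estimate is available."""
        if (prosody_estimate and text_estimate
                and prosody_estimate["emotion_type"] == text_estimate["emotion_type"]):
            return {
                "emotion_type": prosody_estimate["emotion_type"],
                "emotion_degree": min(prosody_estimate["emotion_degree"],
                                      text_estimate["emotion_degree"]),
            }
        return prosody_estimate or text_estimate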
After the voice emotion characteristics of the user are obtained in step S2110, the process proceeds to:
Step S2120, selecting the sound expression with emotion characteristics corresponding to the voice emotion characteristics from a sound expression database comprising a plurality of sound expressions according to the voice emotion characteristics as the sound expression displayed through the expression selection interface.
In this example, the sound expression database including a plurality of sound expressions may be built from sound expressions generated and built in by the operating system or application system that supports playing sound expressions, recorded by specific persons, or customized in response to users' requests.
The emotion feature of a sound expression embodies the mood or emotion expressed when the sound expression is used, while the voice emotion feature represents the mood or emotion hidden in the associated voice data, input by the user, for which a sound expression is desired. Selecting, from the sound expression database, the sound expressions whose emotion features correspond to the voice emotion feature as the sound expressions displayed through the expression selection interface narrows the display range, so that the user can more quickly and efficiently select a sound expression that meets his or her mood or emotion expression needs, improving the user's experience of using sound expressions.

It should be understood that in practical applications, to better satisfy the user's experience of using sound expressions, the sound expressions whose emotion features correspond to the voice emotion feature can be selected from the sound expression database and displayed preferentially; if the user does not select any of them, the other sound expressions in the database are then displayed, so as to finally meet the user's needs.
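A selection sketch along these lines, reusing the SoundExpression structure from the earlier snippet, is shown below; the degree tolerance of plus or minus one level and the fallback behaviour are assumptions.

    def select_expressions(database, speech_emotion, fallback_to_all=True):
        """Step S2120 sketch: pick the sound expressions whose emotion feature
        matches the emotion feature extracted from the associated speech."""
        matches = [
            e for e in database
            if e.feature.emotion_type == speech_emotion["emotion_type"]
            and abs(e.feature.emotion_degree - speech_emotion["emotion_degree"]) <= 1
        ]
        # Matching expressions are shown first; if the user picks none of
        # them, the rest of the database can still be offered.
        return matches if (matches or not fallback_to_all) else list(database)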
In this embodiment, a sound expression may be presented in the expression selection interface in various ways; for example, it may be presented by a distinguishable icon or number, or by a keyword of its sound content. When sound expressions are displayed, an audition function can also be provided so that the user can play a sound expression to preview it before selecting.
In a specific example, step S2100 may include:
And generating corresponding sound waveforms according to the emotion characteristics of the sound expression so as to display the sound expression.
The emotion feature of a sound expression includes an emotion type and an emotion degree. In this example, the color of the sound waveform may be set according to the emotion type, the shape of the sound waveform (waveform amplitude, waveform period, etc.) may be set according to the emotion degree, and the corresponding sound waveform is generated from the color and shape so set. For example, as shown in fig. 3, sound expression 1 has the emotion type "angry" with the degree "slightly angry" and is shown as a dark, small-amplitude waveform, while sound expression 2 has the emotion type "happy" with the degree "overjoyed" and is shown as a light-colored, large-amplitude waveform, and so on.

Displaying a sound expression through a waveform generated from its emotion feature lets the user grasp, intuitively and quickly, the emotion feature the sound expression reflects, making it more convenient to select a sound expression that meets the user's own expression needs.
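The mapping from emotion feature to waveform appearance could be sketched as follows; the colour table and the exact amplitude and period formulas are assumptions, since the text above only specifies that colour follows the emotion type and shape follows the emotion degree.

    import numpy as np

    # Assumed colour per emotion type (the patent does not fix these values).
    TYPE_COLOURS = {"angry": "#000000", "happy": "#ffcc00", "sad": "#3366cc"}

    def expression_waveform(feature, n_points=200):
        """Return (colour, samples) used to draw an expression's waveform:
        a higher emotion degree gives a larger amplitude and more cycles."""
        colour = TYPE_COLOURS.get(feature.emotion_type, "#888888")
        amplitude = 0.2 + 0.25 * feature.emotion_degree   # degree -> amplitude
        cycles = 1 + feature.emotion_degree                # degree -> more cycles
        t = np.linspace(0, 2 * np.pi * cycles, n_points)
        return colour, amplitude * np.sin(t)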
After the above step S2100 displays the sound expression through the expression selection interface, the process proceeds to:
Step S2200, receiving an expression selection instruction of a user, and determining a corresponding target sound expression.
The user's expression selection indication is used to indicate which sound expression is selected as the target sound expression in the expression selection interface by the user.
In this embodiment, the expression selection instruction may be triggered by operations such as clicking, ticking, or sliding gestures performed by the user on the expression selection interface.
In one example, the expression selection indication is a voice selection indication entered by the user. The voice selection indication may be, for example, spoken content containing the number of the selected target sound expression or a keyword of its sound content.
The voice expression is selected through the voice selection indication, so that a user can directly select the voice expression through the voice instruction without manual operation, the voice expression is smoother to use in the pure voice communication process, and the efficiency is higher.
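Resolving such a voice selection instruction might look like the sketch below; the transcript is assumed to come from an ordinary speech-recognition step, and the ordinal and keyword matching rules are illustrative assumptions.

    def resolve_voice_selection(transcript, displayed_expressions):
        """Pick the target sound expression named by a spoken instruction such
        as "the second one" or a keyword of the expression's sound content."""
        ordinals = {"first": 0, "second": 1, "third": 2}
        for word, idx in ordinals.items():
            if word in transcript and idx < len(displayed_expressions):
                return displayed_expressions[idx]
        for expr in displayed_expressions:
            if expr.name.replace("_", " ") in transcript:
                return expr
        return None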
After determining the target sound expression selected by the user in step S2200, the process proceeds to:
Step S2300, playing the target sound expression in an associated application window associated with the target sound expression.
The associated application window is an application window provided in an associated application of the target sound expression to be played. For example, the target sound expression is used in voice chat, and the associated application window is a chat window provided by a social application providing a voice chat service; the target sound expression is used in a voice message, and the associated application window is a message window provided by an application providing voice message service; the target sound expression is used in voice comments, and the associated application window is a comment window provided by an application providing a voice comment service.
In the associated application window associated with the target sound expression, the target sound expression is played, so that the user can directly express the emotion or feeling of the user through the sound expression in the process of communicating through the voice, the emotion or emotion expression requirement of the user in the voice communication is met, and the voice communication experience of the user is improved. The method is particularly suitable for application scenes such as voice chat, voice message or voice comment.
In one example, the sound expressions include a speech expression and an audio expression; the sound content of the voice expression is voice corresponding to the emotion characteristics of the voice expression; the sound content of the sound effect expression is a sound effect corresponding to the emotion characteristics of the sound effect expression. In this example, step S2300 of the method of applying the acoustic expression in the present embodiment may include: steps S2310 through S2320.
In step S2310, when the target sound expression is a speech expression, in the process of playing the user speech associated with the target sound expression in the associated application window, the target sound expression is played according to the insertion position of the target sound expression in the user speech.
The user's voice associated with the target sound expression is voice data input by the user that desires to express emotion or emotion of the user's voice by using the target sound expression. For example, the target sound expression is used in a voice chat, and the user voice associated with the target sound expression is chat voice input by the user for which use of the target expression is desired; the target sound expression is used in a voice message, and the user voice associated with the target sound expression is a voice message input by a user and expected to use the target expression; the target sound expression is used in a speech comment, and the user speech associated with the target sound expression is a speech comment input by the user for which use of the target expression is desired.
When the target sound expression is a voice expression, the sound content of the voice expression is voice corresponding to the emotion characteristics of the voice expression, and the voice expression is the sound expression with language content. Users often desire to express their own emotion or feeling through the language content when the speech expression is played. The speech expression, when selected for use by the user, typically has a corresponding insertion location in the user's voice, e.g., in a voice chat, may be the beginning, middle, or end of a chat voice currently entered by the user, and is similar in a voice message or comment.
The voice expression serving as the target sound expression is played according to its insertion position in the user voice associated with it, so that a voice expression with language content can be inserted into and played within the user's voice, expressing the mood or feeling the user wishes to convey in voice communication. For example, suppose the sound content of the target voice expression is "finger heart" and, in a voice chat scenario, the user inserts the voice expression at the end of a sentence of chat voice. Correspondingly, when the user receiving that chat voice plays it in the associated voice chat window, he or she hears the voice expression "finger heart" at the end of the sentence and can intuitively perceive the mood or feeling expressed by the sender.
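A bare-bones sketch of this insertion (step S2310), using numpy and the soundfile package (both assumed available), is given below; mono clips sharing one sample rate are assumed.

    import numpy as np
    import soundfile as sf   # assumed available for reading/writing WAV files

    def insert_voice_expression(user_wav, expression_wav, insert_sec, out_wav):
        """Splice a voice expression into the user's speech at its insertion
        position in seconds, e.g. at the end of the sentence (cf. S2310)."""
        speech, sr = sf.read(user_wav)
        expr, sr2 = sf.read(expression_wav)
        assert sr == sr2, "both clips are assumed to share one sample rate"
        pos = min(int(insert_sec * sr), len(speech))
        combined = np.concatenate([speech[:pos], expr, speech[pos:]])
        sf.write(out_wav, combined, sr)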
Step S2320, when the target sound expression is an audio effect expression, the user voice associated with the target sound expression and the target sound expression are mixed and synthesized, and then played in the associated application window.
The user's voice associated with the target sound expression is similar to that in step S2310 above and will not be described again.
When the target sound expression is a sound effect expression, its sound content is a sound effect corresponding to its emotion feature; the sound effect expression is a sound expression without language content. When the user selects a sound effect expression, the intention is to express his or her mood or feeling through the effect produced when it is played. After the user voice associated with the target sound expression is mixed and synthesized with the target sound expression, the sound effect expression becomes the audio background of the user voice, so that when the user voice is played in the associated application window it carries the effect formed by the sound effect expression, conveying the mood or feeling the user wishes to express in voice communication. For example, suppose the sound effect of the target sound expression is a laughing effect and the user applies it to a sentence of voice chat. Correspondingly, when the user receiving that chat voice plays it in the associated voice chat window, he or she hears the sentence with the laughing effect in the background and intuitively perceives the mood or feeling expressed by the sender.
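The mixing of step S2320 can be sketched in the same style; the effect gain, the zero-padding to a common length, and the final clipping are assumptions about details the text leaves open.

    import numpy as np
    import soundfile as sf   # assumed available

    def mix_sound_effect(user_wav, effect_wav, out_wav, effect_gain=0.5):
        """Overlay a sound-effect expression on the user's speech so it plays
        as a background effect (cf. step S2320); mono clips are assumed."""
        speech, sr = sf.read(user_wav)
        effect, sr2 = sf.read(effect_wav)
        assert sr == sr2, "both clips are assumed to share one sample rate"
        length = max(len(speech), len(effect))
        speech = np.pad(speech, (0, length - len(speech)))
        effect = np.pad(effect, (0, length - len(effect)))
        mixed = np.clip(speech + effect_gain * effect, -1.0, 1.0)
        sf.write(out_wav, mixed, sr)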
In this example, different types of sound expressions can be distinguished to adopt different playing modes of the sound expressions, so that different sound expression requirements of a user in a voice communication process can be met more flexibly through the sound expressions.
In practical application scenarios, the user may also use a sound expression directly in voice communication without any associated user voice; for example, the user may directly select a target sound expression as his or her own comment when commenting, or directly send a target sound expression to other users as a chat message. In such cases, the above steps S2310-S2320 need not be executed, and the target sound expression is simply played directly in the associated application window.
< Application example 1>
The method for applying the acoustic expression provided in the present embodiment in the voice chat scenario will be further described below with reference to fig. 5.
In this example, assume that user A is about to send a chat voice to user B.
As shown in fig. 5, the method for applying the acoustic expression includes: steps S201 to S206.
In step S201, user a inputs chat voice to be transmitted to B in the voice chat window.
Suppose the chat voice input by user A is "I really like you".
In step S202, the chat voice of user A is received, and a dialog box is popped up for user A to confirm whether to use a sound expression.
Step S203, receiving confirmation of the user a using the sound expression, and displaying the sound expression for the user to select through the expression selection interface.
In this example, the sound expressions presented through the expression selection interface may be selected from a pre-built sound expression database as in steps S2110-S2120 described above. The sound expression database may include sound expressions built into the operating system or application that supports playing them, sound expressions recorded by specific persons, and various sound expressions customized according to users' requests.
Step S204, receiving a voice instruction of selecting a target sound expression by the user A, and selecting the target sound expression.
Assume that the target sound expression selected in this example is a voice expression whose sound content is "finger heart", and that the chat voice has already been entered when user A selects the target sound expression, so the insertion position corresponding to the sound expression is the end of the chat voice.
In step S205, the target sound expression is inserted into the chat voice input by the user a, and is sent to the user B.
In this example, the voice expression "finger heart" is inserted at the end of the chat voice "I really like you" and sent to user B.
In step S206, the user B receives the chat voice sent by the user a, and plays the chat voice inserted with the target sound expression in the voice chat window.
In this example, the chat voice heard by user B is "I really like you" followed by "finger heart".
In the voice chat scenario, sound expressions are provided for the user to select and use, so that the user can vividly and concisely express his or her mood or feeling directly through a sound expression during voice communication, meeting the mood or emotion expression needs of users in voice communication.
The application method of implementing the acoustic expression in this embodiment under the scenes of voice message, recording, etc. is similar to that in the case of voice chat, and will not be described here again.
< Application example 2>
The method of using the acoustic expression provided in the present embodiment in the voice comment scene will be further described below with reference to fig. 6.
As shown in fig. 6, the method for applying the acoustic expression includes: steps S211 to S216.
In step S211, the user a inputs comment voices in the comment window of the article W.
Assume that in this example, the comment voice input by user A is "This article is awesome".
In step S212, the comment voice of user A is received, and a dialog box is popped up for the user to confirm whether to use a sound expression.
Step S213, receiving confirmation of the user A using the sound expression, and displaying the sound expression for the user to select through the expression selection interface.
In this example, the sound expressions presented through the expression selection interface may be selected from a pre-built sound expression database as in steps S2110-S2120 described above. The sound expression database may include sound expressions built into the operating system or application that supports playing them, sound expressions recorded by specific persons, and various sound expressions customized according to users' requests.
Step S214, receiving a voice instruction of selecting a target sound expression by the user a, and selecting the target sound expression.
Assume that the sound expression selected in this example is a sound effect expression whose sound content is a laughing effect.
Step S215, the target sound expression and comment voice input by the user A are mixed and synthesized and then published.
In this example, the sound effect expression with the laughing effect and the comment voice "This article is awesome" are mixed and synthesized and then published.
In step S216, the user B browses the comments of the article W, clicks the comment voice of the user a, and plays the comment voice in the comment window.
In this example, the comment voice heard by user B is "This article is awesome" with the laughing sound effect in the background.
< Device for applying Sound expression >
In this embodiment, there is also provided an application apparatus 3000 of sound expression, as shown in fig. 7, including: the expression providing unit 3100, the expression determining unit 3200, and the expression playing unit 3300 are used for the application method of the sound expression provided in the present embodiment, and are not described herein again.
The expression providing unit 3100 is configured to provide an expression selection interface, and display a sound expression that can be selected by a user.
Optionally, each of the sound expressions has a corresponding emotion feature and sound content; the emotion features at least comprise emotion types and emotion degrees; and the expression providing unit 3100 further includes:
Means for obtaining a voice emotion feature of the user based on the associated voice data input by the user; the voice emotion characteristics at least comprise voice emotion types and voice emotion degrees;
And the device is used for selecting the sound expression corresponding to the emotion characteristics from a sound expression database comprising a plurality of sound expressions according to the voice emotion characteristics as the sound expression displayed through the expression selection interface.
Optionally, the device for acquiring the voice emotion feature of the user according to the associated voice data input by the user is further configured to:
Performing voice analysis on the associated voice data to obtain tone characteristics, volume characteristics and rhythm characteristics of the associated voice data;
Processing the tone characteristics, the volume characteristics and the rhythm characteristics of the associated voice data according to the emotion characteristic extraction model to obtain the corresponding voice emotion characteristics;
the emotion feature extraction model is a machine learning model obtained through training of collected voice samples and is used for outputting corresponding voice emotion features according to the tone features, volume features and rhythm features of input voice.
Optionally, the device for acquiring the voice emotion feature of the user according to the associated voice data input by the user is further configured to:
Converting the associated voice data into corresponding associated text data;
Extracting emotion keywords from the associated text data according to a pre-constructed emotion word library;
carrying out structural analysis on the emotion keywords through an emotion structural model to obtain the corresponding voice emotion characteristics;
the emotion structural model is a vocabulary model obtained by classifying and structuring collected emotion vocabularies related to emotion; each emotion vocabulary included in the emotion structural model has a corresponding emotion type and emotion degree.
Optionally, the expression providing unit 3100 further includes:
and the device is used for generating corresponding sound waveforms according to the emotion characteristics of the sound expressions so as to display the sound expressions.
Optionally, the expression providing unit 3100 further includes:
and the device is used for responding to the sound expression generation request of the user, generating the corresponding sound expression according to the sound content input by the user, and allowing the user to select the corresponding sound expression for use.
The expression determining unit 3200 is configured to receive an expression selection instruction of a user, and determine a corresponding target sound expression.
Optionally, the expression selection indication is a voice selection indication entered by a user.
And the expression playing unit 3300 is configured to play the target sound expression in an associated application window associated with the target sound expression.
Optionally, each of the sound expressions has a corresponding emotion feature and sound content; the emotion characteristics at least comprise emotion types and emotion degrees; the sound expression comprises a voice expression and an audio expression; the sound content of the voice expression is voice corresponding to emotion characteristics of the voice expression; the sound content of the sound effect expression is a sound effect corresponding to the emotion characteristics of the sound effect expression;
The expression playing unit 3300 is further configured to:
When the target sound expression is the voice expression, in the process of playing the user voice associated with the target sound expression in the associated application window, playing the target sound expression according to the insertion position of the target sound expression in the user voice;
And when the target sound expression is the sound effect expression, mixing and synthesizing the user voice associated with the target sound expression and the target sound expression, and then playing the user voice in the associated application window.
It should be appreciated by those skilled in the art that the application device 3000 of the sound expression may be implemented in various ways. For example, a processor may be configured by instructions to implement the application device 3000 of the sound expression; for instance, the instructions may be stored in a ROM and, when the device is started, read from the ROM into a programmable device to implement the application device 3000 of the sound expression. As another example, the application device 3000 of the sound expression may be hardwired into a dedicated device (e.g., an ASIC). The application device 3000 of the sound expression may be divided into mutually independent units, or the units may be combined and implemented together. The application device 3000 of the sound expression may be implemented by any one of the above implementations, or by a combination of two or more of them.
In the present embodiment, the application device 3000 of the sound expression may be any software product or application program that provides the function of voice communication using sound expressions. For example, the application device 3000 of the sound expression may be a social application that provides sound expressions in voice chat, a communication application that supports the use of sound expressions in voice messages, or a content distribution application that provides a voice comment function and uses sound expressions in voice comments.
< Application device of Sound expression >
In this embodiment, there is also provided an application device 4000 of a sound expression, as shown in fig. 8, including:
a display device 4100;
a memory 4200 for storing executable instructions;
a processor 4300 for controlling, according to the executable instructions, the application device 4000 of the sound expression to perform the application method of the sound expression described in this embodiment.
In this embodiment, the application device 4000 of the sound expression may be an electronic device such as a mobile phone, a palmtop computer, a tablet computer, a notebook computer, or a desktop computer. In a specific example, the application device 4000 of the sound expression may be a mobile phone installed with any software product or application program that provides the function of voice communication using sound expressions, for example, a mobile phone installed with a social application that provides sound expressions in voice chat.
The application device 4000 of the sound expression may further include other devices, for example, input devices and the like, as in the electronic device 1000 shown in fig. 1.
< Readable storage Medium >
In the present embodiment, there is also provided a readable storage medium storing a computer program that can be read and executed by a computer, the computer program being used to perform the application method of the sound expression described in the present embodiment when read and executed by the computer.
The readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A readable storage medium as used herein is not to be construed as a transitory signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a pulse of light through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The embodiments of the present invention have been described above with reference to the accompanying drawings and examples. According to the present embodiment, there are provided an application method, apparatus, device, and readable storage medium for a sound expression, in which an expression selection interface is provided to display alternative sound expressions having corresponding emotion characteristics and sound content; after an expression selection instruction of a user is received, the target sound expression selected by the user is determined and played in an associated application window associated with the target sound expression. In this way, in the process of communicating through voice, the user can directly, vividly, and succinctly express his or her own emotion or feeling through the sound expression, thereby satisfying the user's emotion expression requirement in voice communication and improving the user's voice communication experience. The method is particularly suitable for application scenarios such as voice chat, voice messages, or voice comments.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, and the electronic circuitry can then execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (9)

1. A method for applying a sound expression, comprising:
Providing an expression selection interface, and displaying the sound expression which can be selected by a user;
receiving an expression selection instruction of a user, and determining a corresponding target sound expression;
Playing the target sound expression in an associated application window associated with the target sound expression;
Before the providing the expression selection interface and displaying the sound expression for the user to select, the method further comprises the following steps: responding to a sound expression generation request of a user, providing a prompt box to guide the user to input sound content, guiding the user to configure emotion characteristics corresponding to the sound expression, generating the sound expression with a sound waveform pattern according to the sound content and the emotion characteristics, providing trial listening of the sound expression so that the user can confirm whether it is to be stored as a sound expression, and storing the generated sound expression after the user confirms it; the sound waveform pattern comprises a color and a shape of the sound waveform, the color of the sound waveform being generated according to an emotion type of the sound expression, and the shape of the sound waveform being generated according to an emotion degree of the sound expression.
2. The method of claim 1, wherein each of the sound expressions has corresponding emotion characteristics and sound content; the emotion characteristics at least comprise an emotion type and an emotion degree;
the method further comprises the steps of:
Acquiring voice emotion characteristics of a user according to associated voice data input by the user; the voice emotion characteristics at least comprise voice emotion types and voice emotion degrees;
and selecting the sound expression corresponding to the emotion characteristics from a sound expression database comprising a plurality of sound expressions according to the voice emotion characteristics, and taking the sound expression corresponding to the emotion characteristics as the sound expression displayed through the expression selection interface.
3. The method of claim 2, wherein,
The step of acquiring the voice emotion characteristics of the user according to the associated voice data input by the user comprises the following steps:
Performing voice analysis on the associated voice data to obtain tone characteristics, volume characteristics and rhythm characteristics of the associated voice data;
Processing the tone characteristics, the volume characteristics and the rhythm characteristics of the associated voice data according to the emotion characteristic extraction model to obtain the corresponding voice emotion characteristics;
the emotion feature extraction model is a machine learning model obtained by training on collected voice samples and is used for outputting the corresponding voice emotion features according to the tone features, volume features and rhythm features of the input voice.
4. The method of claim 2, wherein,
The step of acquiring the voice emotion characteristics of the user according to the associated voice data input by the user comprises the following steps:
Converting the associated voice data into corresponding associated text data;
Extracting emotion keywords from the associated text data according to a pre-constructed emotion word library;
carrying out structural analysis on the emotion keywords through an emotion structural model to obtain the corresponding voice emotion characteristics;
the emotion structural model is a vocabulary model obtained by classifying and structuring collected emotion vocabularies related to emotion; each emotion vocabulary included in the emotion structural model has a corresponding emotion type and emotion degree.
5. The method of claim 1, wherein,
The step of providing an expression selection interface for displaying the sound expression for the user to select comprises the following steps:
Generating a corresponding sound waveform according to the emotion characteristics of the sound expression so as to display the sound expression;
And/or,
The expression selection instruction is a voice selection instruction input by the user.
6. The method of claim 1, wherein,
Each sound expression has corresponding emotion characteristics and sound content; the emotion characteristics at least comprise emotion types and emotion degrees;
the sound expression comprises a voice expression and a sound effect expression; the sound content of the voice expression is a voice corresponding to the emotion characteristics of the voice expression; the sound content of the sound effect expression is a sound effect corresponding to the emotion characteristics of the sound effect expression;
The step of playing the target sound expression in an associated application window associated with the target sound expression includes:
When the target sound expression is the voice expression, in the process of playing the user voice associated with the target sound expression in the associated application window, playing the target sound expression according to the insertion position of the target sound expression in the user voice;
And when the target sound expression is the sound effect expression, mixing and synthesizing the user voice associated with the target sound expression and the target sound expression, and then playing the user voice in the associated application window.
7. An application device of sound expression, comprising:
the expression providing unit is used for providing an expression selection interface and displaying the sound expression which can be selected by the user;
The expression determining unit is used for receiving the expression selection instruction of the user and determining a corresponding target sound expression;
The expression playing unit is used for playing the target sound expression in an associated application window associated with the target sound expression;
Before the providing the expression selection interface and displaying the sound expression for the user to select, the expression providing unit is further configured to: respond to a sound expression generation request of a user, provide a prompt box to guide the user to input sound content, guide the user to configure emotion characteristics corresponding to the sound expression, generate the sound expression with a sound waveform pattern according to the sound content and the emotion characteristics, provide trial listening of the sound expression so that the user can confirm whether it is to be stored as a sound expression, and store the generated sound expression after the user confirms it; the sound waveform pattern comprises a color and a shape of the sound waveform, the color of the sound waveform being generated according to an emotion type of the sound expression, and the shape of the sound waveform being generated according to an emotion degree of the sound expression.
8. An application device of a sound expression, comprising:
A display device;
a memory for storing executable instructions;
a processor for controlling, according to the executable instructions, the application device of the sound expression to perform the application method of the sound expression according to any one of claims 1-6.
9. A readable storage medium, wherein the readable storage medium stores a computer program readable by a computer for performing the method of applying a sound expression according to any one of claims 1-6 when the computer program is read and executed by the computer.
CN201910217204.3A 2019-03-21 2019-03-21 Sound expression application method, device, equipment and readable storage medium Active CN111724799B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910217204.3A CN111724799B (en) 2019-03-21 2019-03-21 Sound expression application method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910217204.3A CN111724799B (en) 2019-03-21 2019-03-21 Sound expression application method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111724799A CN111724799A (en) 2020-09-29
CN111724799B true CN111724799B (en) 2024-09-20

Family

ID=72562143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910217204.3A Active CN111724799B (en) 2019-03-21 2019-03-21 Sound expression application method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111724799B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117834576A (en) * 2024-01-05 2024-04-05 北京字跳网络技术有限公司 Expression interaction method, device, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107450746A (en) * 2017-08-18 2017-12-08 联想(北京)有限公司 A kind of insertion method of emoticon, device and electronic equipment
CN109302339A (en) * 2018-09-10 2019-02-01 郭素英 A kind of band personalized speech implementation method and its platform

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040040549A (en) * 2002-11-07 2004-05-13 에스케이 텔레콤주식회사 The Method for Sound Emoticon Service
JP2006330958A (en) * 2005-05-25 2006-12-07 Oki Electric Ind Co Ltd Image composition apparatus, communication terminal and image communication system using the apparatus, and chat server in the system
CN105228013B (en) * 2015-09-28 2018-09-07 百度在线网络技术(北京)有限公司 Barrage information processing method, device and barrage video player
CN105657482B (en) * 2016-03-28 2018-11-06 广州华多网络科技有限公司 A kind of implementation method and device of voice barrage
CN106570106A (en) * 2016-11-01 2017-04-19 北京百度网讯科技有限公司 Method and device for converting voice information into expression in input process
CA2997760A1 (en) * 2017-03-07 2018-09-07 Salesboost, Llc Voice analysis training system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107450746A (en) * 2017-08-18 2017-12-08 联想(北京)有限公司 A kind of insertion method of emoticon, device and electronic equipment
CN109302339A (en) * 2018-09-10 2019-02-01 郭素英 A kind of band personalized speech implementation method and its platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
vEmotion brings a dynamic feel to chat; 千江雪; Modern Computer (Popular Edition) (No. 11); p. 119 *
千江雪. vEmotion brings a dynamic feel to chat. Modern Computer (Popular Edition). 2006, (No. 11), p. 119. *

Also Published As

Publication number Publication date
CN111724799A (en) 2020-09-29

Similar Documents

Publication Publication Date Title
US20220230374A1 (en) User interface for generating expressive content
US11514886B2 (en) Emotion classification information-based text-to-speech (TTS) method and apparatus
CN108962219B (en) method and device for processing text
WO2022170848A1 (en) Human-computer interaction method, apparatus and system, electronic device and computer medium
US20200395008A1 (en) Personality-Based Conversational Agents and Pragmatic Model, and Related Interfaces and Commercial Models
CN106570106A (en) Method and device for converting voice information into expression in input process
KR101628050B1 (en) Animation system for reproducing text base data by animation
CN112750187B (en) Animation generation method, device, equipment and computer-readable storage medium
US10586528B2 (en) Domain-specific speech recognizers in a digital medium environment
Pauletto et al. Exploring expressivity and emotion with artificial voice and speech technologies
CN114023301A (en) Audio editing method, electronic device and storage medium
CN106873800A (en) Information output method and device
CN111142667A (en) System and method for generating voice based on text mark
CN115222857A (en) Method, apparatus, electronic device and computer readable medium for generating avatar
CN112883181A (en) Session message processing method and device, electronic equipment and storage medium
US20250117590A1 (en) Interaction method and apparatus, computer device, and storage medium
An et al. Emowear: Exploring emotional teasers for voice message interaction on smartwatches
CN111726696B (en) Application method, device and equipment of sound barrage and readable storage medium
CN116564272A (en) Method for providing voice content and electronic equipment
CN111914115B (en) Sound information processing method and device and electronic equipment
CN111724799B (en) Sound expression application method, device, equipment and readable storage medium
CN108255917A (en) Image management method, equipment and electronic equipment
KR102226427B1 (en) Apparatus for determining title of user, system including the same, terminal and method for the same
KR102583986B1 (en) Speech balloon expression method and system for voice messages reflecting emotion classification based on voice
CN114093341A (en) Data processing method, apparatus and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant