
WO2020098107A1 - Detection model-based emotions analysis method, apparatus and terminal device - Google Patents

Detection model-based emotions analysis method, apparatus and terminal device Download PDF

Info

Publication number
WO2020098107A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
voice
emotion
speech
parameters
Prior art date
Application number
PCT/CN2018/124629
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
彭俊清
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020098107A1 publication Critical patent/WO2020098107A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/04Training, enrolment or model building

Definitions

  • The combining unit is configured to combine the sound intensity value, loudness value, pitch value, signal energy value and signal period of the weighted sub-voice signal into the voice parameters.
  • An establishing unit is configured to establish a mapping relationship between the attribute feature corresponding to the feature parameter set and one of the initial models, and to train that initial model with the plurality of parameter vectors and the corresponding plurality of emotion levels in the feature parameter set to obtain the emotion model.
  • The input unit 53 includes:
  • a determining unit configured to determine the emotion model corresponding to the attribute feature according to the mapping relationship, and to input the voice parameters of the voice signal to be tested into that emotion model.
  • A frequency determining unit is configured to acquire a preset frequency threshold and determine the signal frequency of the pre-stored voice signal based on the signal period;
  • a first setting unit is configured to set the attribute feature of the pre-stored voice signal corresponding to the signal frequency to female if the signal frequency is higher than the frequency threshold;
  • a second setting unit is configured to set the attribute feature of the pre-stored voice signal corresponding to the signal frequency to male if the signal frequency is not higher than the frequency threshold.
  • The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that FIG. 6 is only an example of the terminal device 6 and does not constitute a limitation on it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components.
  • The terminal device may further include input and output devices, a network access device, a bus, and the like.
  • The processor 60 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, any conventional processor, or the like.
  • The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or memory of the terminal device 6.
  • The memory 61 may also be an external storage device of the terminal device 6, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 6.
  • The memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6.
  • The memory 61 is used to store the computer-readable instructions and the other programs and data required by the terminal device.
  • The memory 61 may also be used to temporarily store data that has been output or will be output.
  • When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The present application is applicable to the technical field of data processing, and provides a detection model-based emotion analysis method, apparatus and terminal device. The method comprises: obtaining a plurality of pre-stored voice signals and analyzing them to obtain voice parameters comprising sound intensity values, loudness values, pitch values and signal periods, wherein each pre-stored voice signal corresponds to one preset emotion level; constructing parameter vectors on the basis of the voice parameters, and training an initial model by means of the plurality of parameter vectors and the plurality of corresponding emotion levels to obtain an emotion model; and inputting the voice parameters of a voice signal to be tested into the emotion model, and determining the output result of the emotion model as the emotion level corresponding to the voice signal to be tested. In the present application, emotions are analyzed at an objective level on the basis of quantized values of pre-stored voice signals and a model trained on emotion levels, thereby improving the objectivity and accuracy of emotion analysis.

Description

Detection model-based emotion analysis method, apparatus and terminal device
This application claims priority to Chinese patent application No. 201811340781.3, entitled "Detection model-based emotion analysis method, apparatus and terminal device", filed with the Chinese Patent Office on November 12, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present application belongs to the technical field of data processing, and in particular relates to a detection model-based emotion analysis method, apparatus and terminal device.
Background
Emotion analysis is a current research hotspot, applicable to scenarios such as in-person interviews, consultations and sales. One emotion analysis technique analyzes an interviewee's speech to obtain the interviewee's current emotional state, so that the interviewer can adjust wording and conversational style accordingly.
In the prior art, the interviewer usually makes a human judgment based on the interviewee's speech at the current moment, that is, infers the interviewee's emotional state from vocal characteristics. Because human judgment is highly subjective and easily influenced by the interviewer, the emotional state obtained is not an objective result, and the accuracy of the emotion analysis is low.
Technical problem
In view of this, the embodiments of the present application provide a detection model-based emotion analysis method, apparatus and terminal device, to solve the prior-art problem that emotion analysis relies on subjective judgment and has low accuracy.
Technical solution
A first aspect of the embodiments of the present application provides a detection model-based emotion analysis method, including:
obtaining a plurality of pre-stored voice signals, and analyzing the pre-stored voice signals to obtain voice parameters, the voice parameters including a sound intensity value, a loudness value, a pitch value and a signal period, where each pre-stored voice signal corresponds to a preset emotion level;
constructing parameter vectors based on the voice parameters, and training an initial model with the plurality of parameter vectors and the corresponding plurality of emotion levels to obtain an emotion model; and
inputting the voice parameters of a voice signal to be tested into the emotion model, and determining the output result of the emotion model as the emotion level corresponding to the voice signal to be tested.
A second aspect of the embodiments of the present application provides a detection model-based emotion analysis apparatus, which may include units for implementing the steps of the above detection model-based emotion analysis method.
A third aspect of the embodiments of the present application provides a terminal device including a memory and a processor, the memory storing computer-readable instructions executable on the processor, where the processor, when executing the computer-readable instructions, implements the steps of the above detection model-based emotion analysis method.
A fourth aspect of the embodiments of the present application provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the above detection model-based emotion analysis method.
Beneficial effects
In the embodiments of the present application, a plurality of pre-stored voice signals, each corresponding to an emotion level, are obtained, and the voice parameters of the pre-stored voice signals are obtained by analysis; an initial model is then trained on the voice parameters and the corresponding emotion levels to obtain an emotion model; finally, the voice parameters of the voice signal to be analyzed are input into the emotion model, and the output result of the emotion model is determined as the emotion level corresponding to that signal. By using the preset emotion levels and the extracted voice parameters as input parameters to train the emotion model, the embodiments of the present application improve the objectivity and accuracy of emotion analysis.
Brief description of the drawings
FIG. 1 is an implementation flowchart of the detection model-based emotion analysis method provided in Embodiment 1 of the present application;
FIG. 2 is an implementation flowchart of the detection model-based emotion analysis method provided in Embodiment 2 of the present application;
FIG. 3 is an implementation flowchart of the detection model-based emotion analysis method provided in Embodiment 3 of the present application;
FIG. 4 is an implementation flowchart of the detection model-based emotion analysis method provided in Embodiment 4 of the present application;
FIG. 5 is a structural block diagram of the detection model-based emotion analysis apparatus provided in Embodiment 5 of the present application;
FIG. 6 is a schematic diagram of the terminal device provided in Embodiment 6 of the present application;
FIG. 7 is an architecture diagram of one model unit in the initial model provided in Embodiment 7 of the present application.
Embodiments of the invention
For a clearer understanding of the technical features, objectives and effects of the present application, specific embodiments of the present application are now described in detail with reference to the accompanying drawings.
Please refer to FIG. 1, which is an implementation flowchart of a detection model-based emotion analysis method provided by an embodiment of the present application. As shown in FIG. 1, the emotion analysis method includes the following steps:
S101: Obtain a plurality of pre-stored voice signals, and analyze the pre-stored voice signals to obtain voice parameters, the voice parameters including a sound intensity value, a loudness value, a pitch value and a signal period, where each pre-stored voice signal corresponds to a preset emotion level.
In the embodiments of the present application, in order to analyze the emotion carried by a voice signal at an objective level, a plurality of pre-stored voice signals are first obtained and used as the data basis for emotion analysis. The pre-stored voice signals are preferably continuous voice signals; they can be obtained in advance from an open-source speech library and stored locally, and each pre-stored voice signal corresponds to a preset emotion level. The emotion level may be determined manually in advance, for example by a dedicated emotion analyst who analyzes each pre-stored voice signal and assigns its emotion level. It is worth mentioning that the specific rules for determining the emotion level can be formulated according to the actual application scenario and are not limited by the embodiments of the present application; one approach, for example, is to assign each pre-stored voice signal an integer emotion level between 1 and 10, where a larger value indicates a more intense emotion. In addition, to facilitate subsequent training, the pre-stored voice signals are truncated after acquisition. The truncation duration can be set uniformly in advance, for example to 2 minutes: a pre-stored voice signal longer than 2 minutes is truncated, starting from its beginning, to a duration of 2 minutes, while a signal no longer than 2 minutes is left untouched. Because truncating to a preset duration may still leave the pre-stored voice signals with inconsistent durations, another truncation method can be applied: first obtain the durations of all pre-stored voice signals and use the shortest one as the truncation duration, after which all pre-stored voice signals have the same duration.
Each truncated pre-stored voice signal is analyzed to obtain voice parameters, including a sound intensity value, a loudness value, a pitch value and a signal period. Specifically, sound intensity is the energy passing per unit time through a unit area perpendicular to the direction of sound-wave propagation, in watts per square meter. The sound intensity value is obtained by comparing the sound intensity of the voice signal with a reference sound intensity, taking the common logarithm, and multiplying by 10: L = 10·lg(I/I₀), where L is the sound intensity value in decibels (dB), I is the sound intensity of the voice signal, I₀ is the reference sound intensity of 10⁻¹² W/m², and lg() is the base-10 common logarithm. Loudness indicates how strong the voice signal sounds; it depends on both the sound intensity and the frequency of the signal, and at a fixed frequency the loudness grows with the sound intensity. The loudness value, in phon (PHON), is defined such that a voice signal at 1000 Hz with a sound intensity value of n decibels has a loudness value of n phon, where n is an integer greater than zero. The pitch value indicates how high the sound frequency of the voice signal is, in mel; a voice signal with a loudness value of 40 phon and a frequency of 1000 Hz has a pitch value of 1000 mel. The signal period is the time taken by the vocal folds of the speaker producing the voice signal to open and close once.
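The decibel computation above can be illustrated with a minimal sketch; the function name and the example intensity are illustrative, not from the original:

```python
import math

I0 = 1e-12  # reference sound intensity in W/m^2, as defined above

def sound_intensity_value(intensity: float) -> float:
    """Sound intensity value L = 10 * lg(I / I0), in decibels."""
    return 10 * math.log10(intensity / I0)

print(sound_intensity_value(1e-6))  # a 10^-6 W/m^2 signal -> 60.0 dB
```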
The sound intensity value, loudness value and pitch value above can be obtained by analyzing the pre-stored voice signal with an open-source speech analysis component. Because a pre-stored voice signal is not limited to a single moment, the sound intensity, loudness and pitch may change at different times within its duration; therefore the averages of all sound intensity values, all loudness values and all pitch values collected over the duration of the pre-stored voice signal are taken as the final sound intensity value, loudness value and pitch value. Of course, this does not limit the embodiments of the present application: depending on the actual application scenario, these values may be obtained in other ways. In addition, the signal period is calculated with the following formula:
$$\mathrm{Measure}(m)=\sum_{n=0}^{N-m}x(n)\,x(n+m),\qquad m>0$$
In the above formula, Measure(m) is the measurement function, x(n) is the pre-stored voice signal at time n, N is the duration of the pre-stored voice signal, and m > 0, where N may be taken as the largest time index of the pre-stored voice signal; for example, if the duration of the pre-stored voice signal is 3 minutes and n is measured in seconds, then N = 3 × 60 = 180. When calculating the signal period with this formula, the Measure(m) values obtained for different m are compared numerically, the Measure(m) with the largest value is identified, and the value of m in that largest Measure(m) is taken as the signal period.
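A minimal sketch of this period search follows, assuming the autocorrelation-style measure reconstructed above. The min_lag lower bound is an implementation detail added here (raw autocorrelation also peaks near m = 0), and the sample signal is illustrative:

```python
import numpy as np

def estimate_signal_period(x: np.ndarray, min_lag: int, max_lag: int) -> int:
    """Return the lag m that maximizes Measure(m) = sum_n x[n] * x[n+m]."""
    N = len(x)
    best_m, best_measure = min_lag, -np.inf
    for m in range(min_lag, min(max_lag, N - 1) + 1):
        measure = float(np.dot(x[: N - m], x[m:]))  # sum over n of x(n) * x(n+m)
        if measure > best_measure:
            best_m, best_measure = m, measure
    return best_m

# Example: a signal with period 20 samples
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)
print(estimate_signal_period(x, min_lag=10, max_lag=60))  # 20
```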
S102: Construct parameter vectors based on the voice parameters, and train an initial model with the plurality of parameter vectors and the corresponding plurality of emotion levels to obtain an emotion model.
In order to quantify the relationship between the voice parameters and the emotion levels, after the voice parameters are obtained, a parameter vector is constructed from them. The parameter vector is multi-dimensional, with the sound intensity value, loudness value, pitch value and signal period each occupying one dimension. Since each parameter vector is derived from a particular pre-stored voice signal, it corresponds to that signal's emotion level; the parameter vectors and their corresponding emotion levels are therefore fed in sequence into the initial model for training, and the trained initial model is determined as the emotion model.
For ease of explanation, assume that the t-th parameter vector x_t is (Value_sound-t, Value_volume-t, Value_loudness-t, Period_signal-t) and that the emotion level corresponding to this parameter vector is grade_t, where t is an integer greater than zero. The initial model includes multiple model units; FIG. 7 is an architecture diagram of the t-th model unit, which contains four levels: a vector level, a first level, a second level and a third level. A circle in FIG. 7 represents an operation: a plus sign denotes vector addition, and a multiplication sign denotes vector multiplication. Each model unit maintains a unit state in vector form; since FIG. 7 shows the t-th model unit, its unit state is denoted State_t. Further assuming that the output parameter of this model unit is output_t, the computation within the model unit is described level by level as follows:
(1) The vector level of the model unit takes as input output_{t-1} (the output parameter of the previous model unit) and x_t, and creates the maintenance vector for this model unit: State_t-support = tanh(W_support·[output_{t-1}, x_t] + b_support), where tanh is the hyperbolic tangent function.
(2) The first level of the model unit sets a gate that determines which parameters in the unit state State_t need to be updated: First_t = σ(W_First·[output_{t-1}, x_t] + b_First). The output First_t is a value between 0 and 1: a value of 1 means a parameter is fully retained, and 0 means it is fully discarded, where σ is the sigmoid function of the neural network. After the vector level and the first level have been computed, First_t and State_t-support are multiplied together for the subsequent update of this model unit's state.
(3) The second level of the model unit determines what information is discarded from the previous model unit's state State_{t-1}: Second_t = σ(W_Second·[output_{t-1}, x_t] + b_Second). Likewise, Second_t is a value between 0 and 1, with 1 meaning fully retain and 0 meaning fully discard. Since the second level decides what to discard from State_{t-1}, once Second_t is obtained, State_{t-1} and Second_t are multiplied together, which corresponds to the self-loop portion of FIG. 7. After the vector level, first level and second level have been computed, the unit state of this model unit is updated: State_t = Second_t·State_{t-1} + First_t·State_t-support. In addition, as shown in FIG. 7, the computed unit state is also used to maintain the first, second and third levels of the model unit, facilitating the computation of subsequent model units.
(4) The third level of the model unit computes the output parameter of this model unit: Third_t = σ(W_Third·[output_{t-1}, x_t] + b_Third) and output_t = Third_t·tanh(State_t).
After the input parameter x_t is fed into the t-th model unit of the initial model, the output parameter output_t is obtained by computation. Since the emotion level grade_t corresponding to x_t is known, the difference between output_t and grade_t is taken as the error value, and based on this error value the backpropagation algorithm is used to adjust the parameters in each level of the model unit, including W_First, b_First, W_Second, b_Second, W_Third and b_Third, so that the output parameter of the model unit approaches the emotion level as closely as possible. It should be noted that W_First, W_Second and W_Third are the level weights of the first, second and third levels respectively, and b_First, b_Second and b_Third are balance variables; at initialization the level weights and balance variables may be set randomly, and after the error value is obtained their values are adjusted by backpropagation. After all parameter vectors and emotion levels have been fed into the initial model and the parameter adjustment is complete, the trained emotion model is obtained.
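The four levels in (1)-(4) have the gating structure of an LSTM cell. Below is a minimal NumPy sketch of one forward pass through a single model unit; the unit-state size, the random initialization and the example input values are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
dim_x, dim_h = 4, 8          # parameter-vector size; unit-state size (assumed)
concat = dim_h + dim_x       # [output_{t-1}, x_t] is concatenated

# Level weights W_* and balance variables b_*, randomly initialized as described
W_support, b_support = rng.normal(size=(dim_h, concat)), np.zeros(dim_h)
W_first,   b_first   = rng.normal(size=(dim_h, concat)), np.zeros(dim_h)
W_second,  b_second  = rng.normal(size=(dim_h, concat)), np.zeros(dim_h)
W_third,   b_third   = rng.normal(size=(dim_h, concat)), np.zeros(dim_h)

def model_unit(x_t, output_prev, state_prev):
    z = np.concatenate([output_prev, x_t])
    state_support = np.tanh(W_support @ z + b_support)  # vector level
    first  = sigmoid(W_first  @ z + b_first)            # first level (update gate)
    second = sigmoid(W_second @ z + b_second)           # second level (forget gate)
    state_t = second * state_prev + first * state_support
    third  = sigmoid(W_third  @ z + b_third)            # third level (output gate)
    output_t = third * np.tanh(state_t)
    return output_t, state_t

# One pass with an illustrative parameter vector (intensity, loudness, pitch, period)
x_t = np.array([60.0, 60.0, 1000.0, 0.005])
output_t, state_t = model_unit(x_t, np.zeros(dim_h), np.zeros(dim_h))
print(output_t.shape)  # (8,)
```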
S103: Input the voice parameters of the voice signal to be tested into the emotion model, and determine the output result of the emotion model as the emotion level corresponding to the voice signal to be tested.
After the training of the emotion model is complete, emotion analysis can begin. Specifically, the voice signal to be tested is obtained and analyzed to obtain its voice parameters, where the signal to be tested is analyzed in the same way as the pre-stored voice signals. The voice parameters obtained from the signal to be tested are likewise input into the emotion model in vector form, and the output result (output parameter) of the emotion model is determined as the emotion level corresponding to the voice signal to be tested. If the emotion level subsequently needs to be output to the outside, it can be presented as text, graphics or speech.
As can be seen from the embodiment shown in FIG. 1, the embodiment of the present application improves the objectivity and accuracy of emotion analysis by decomposing the features in the pre-stored voice signals and training the emotion model on those features and the emotion levels.
Please refer to FIG. 2, which is an implementation flowchart of a detection model-based emotion analysis method provided by an embodiment of the present application. Compared with the embodiment corresponding to FIG. 1, this embodiment refines S101 into S201-S202 on the basis that the voice parameters further include a signal energy value, detailed as follows:
S201: Split the pre-stored voice signal into multiple sub-voice signals in the time dimension, and multiply each sub-voice signal by a weighting coefficient, where the weighting coefficient is generated by a preset weighting formula.
When the pre-stored voice signal is a continuous voice signal, in order to improve the accuracy of training the initial model, the embodiment of the present application splits the pre-stored voice signal into multiple sub-voice signals in the time dimension. Specifically, since a voice signal is stationary over short durations, a split duration can be determined in advance, and starting from the beginning of the pre-stored voice signal, one segment is cut out every split duration, each cut-out segment serving as a sub-voice signal. For example, with a preset split duration of 30 milliseconds and a pre-stored voice signal of 120 milliseconds, 4 sub-voice signals can be cut out.
Optionally, a preset overlap duration is obtained; after one sub-voice signal has been cut out, the cut position moves back by one split duration and then forward by one overlap duration, and the next sub-voice signal is cut out with a width of one split duration. In the embodiment of the present application, since the pre-stored voice signal is a continuous signal, the overlap duration is set in advance to avoid losing dynamic information in the pre-stored voice signal, and the pre-stored voice signal is cut according to the split duration and the overlap duration, where the overlap duration is smaller than the split duration. During cutting, each sub-voice signal overlaps the previous one, and the width of the overlapping region equals the overlap duration. For example, with a split duration of 30 milliseconds, an overlap duration of 10 milliseconds and a pre-stored voice signal of 120 milliseconds, the first sub-voice signal spans the 0th to 30th millisecond of the pre-stored voice signal, the second spans the 20th to 50th millisecond, the third spans the 40th to 70th millisecond, and so on, where width refers to extent in the time dimension.
After the pre-stored voice signal has been cut, in order to enhance the periodicity of each sub-voice signal, each sub-voice signal is multiplied by a weighting coefficient so as to attenuate its left and right ends. The weighting coefficient is generated by a preset weighting formula, for example a Hamming-type window:
$$\omega(n)=0.54-0.46\cos\!\left(\frac{2\pi n}{N-1}\right),\qquad 0\le n\le N-1$$
The product operation is x_new(n) = x(n)·ω(n), where x(n) in this step is the sub-voice signal at time n and x_new(n) is the weighted sub-voice signal at time n.
Optionally, as noted above, the preset overlap duration is used when cutting: after one sub-voice signal has been cut out, the cut position moves back by one split duration and then forward by one overlap duration before the next sub-voice signal is cut with a width of one split duration. Because multiplying a sub-voice signal by the weighting coefficient attenuates both of its ends, the overlap between adjacent sub-voice signals (whose width, the overlap duration, is smaller than the split duration) ensures that, after the sub-voice signals are generated in this way, the attenuation of the two ends has a reduced impact on the content of the sub-voice signals themselves, as sketched below.
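A minimal sketch of the splitting and weighting of S201, assuming one sample per millisecond and the Hamming-type window above (both assumptions for illustration):

```python
import numpy as np

def split_into_frames(signal: np.ndarray, frame_len: int, overlap: int) -> list:
    """Split a signal into frames of frame_len samples; consecutive frames
    overlap by `overlap` samples (hop = frame_len - overlap)."""
    hop = frame_len - overlap
    frames, start = [], 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames

def weight_frame(frame: np.ndarray) -> np.ndarray:
    """x_new(n) = x(n) * w(n); the Hamming-style window attenuating both
    ends of the frame is an assumption here."""
    n = np.arange(len(frame))
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (len(frame) - 1))
    return frame * w

# 120 ms signal at 1 sample/ms: 30 ms frames with 10 ms overlap start at
# 0 ms, 20 ms, 40 ms, ...
signal = np.random.default_rng(1).normal(size=120)
frames = [weight_frame(f) for f in split_into_frames(signal, frame_len=30, overlap=10)]
print(len(frames))  # 5
```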
S202: Combine the sound intensity value, loudness value, pitch value, signal energy value and signal period of the weighted sub-voice signal into the voice parameters.
For each weighted sub-voice signal, its sound intensity value, loudness value, pitch value, signal energy value and signal period are obtained. The sound intensity, loudness and pitch values can likewise be obtained automatically by running a speech analysis component; specifically, all sound intensity, loudness and pitch values obtained within the split duration of the sub-voice signal are averaged to produce the final sound intensity value, loudness value and pitch value. As for the signal energy value, its calculation formula is:
$$E_n=\sum_{m=0}^{N-1}x_{\mathrm{new}}^{2}(n+m)$$
In the above formula, E_n is the signal energy value at time n. During the calculation, multiple sampling points can be set within the split duration of the sub-voice signal, and the average of the signal energy values obtained at these sampling points is determined as the final signal energy value corresponding to the sub-voice signal.
As for the signal period, since the split duration of a sub-voice signal is shorter and its periodicity stronger, the calculation method is updated as follows:
$$\mathrm{Measure}(\theta)=\sum_{m=0}^{N-\theta}x(n+m)\,x(n+m+\theta),\qquad 0<\theta<60$$
In the above formula, x(n+m) is the sub-voice signal at time n+m, and θ ranges over 0 < θ < 60. When calculating the signal period of a sub-voice signal, the Measure(θ) values obtained for different θ are compared numerically, the Measure(θ) with the largest value is identified, and the value of θ in that largest Measure(θ) is taken as the signal period. After these calculations, the sound intensity value, loudness value, pitch value, signal energy value and signal period of the weighted sub-voice signal are combined into one set of voice parameters. Since one pre-stored voice signal corresponds to multiple sub-voice signals, after the above calculations one pre-stored voice signal corresponds to multiple sets of voice parameters; in the subsequent training of the initial model this improves the training accuracy, while also increasing the amount of training computation.
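A minimal sketch of computing the signal energy value for one weighted sub-voice signal and assembling the five quantities into a voice-parameter tuple; the energy form follows the reconstruction above, and the example frame, intensity, loudness, pitch and period values are illustrative:

```python
import numpy as np

def frame_energy(weighted_frame: np.ndarray) -> float:
    """Short-time signal energy of one weighted sub-voice signal:
    the sum of squared samples over the frame."""
    return float(np.sum(weighted_frame ** 2))

def frame_voice_parameters(weighted_frame, intensity, loudness, pitch, period):
    """Combine the five quantities for one sub-voice signal into a single
    voice-parameter tuple, as described above."""
    return (intensity, loudness, pitch, frame_energy(weighted_frame), period)

frame = np.sin(2 * np.pi * np.arange(30) / 15)  # stands in for a weighted frame
print(frame_voice_parameters(frame, 60.0, 60.0, 1000.0, 15))
# (60.0, 60.0, 1000.0, ~15.0, 15)
```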
As can be seen from the embodiment shown in FIG. 2, the embodiment of the present application improves the accuracy of training the initial model, and thereby the accuracy of the trained emotion model, by splitting the pre-stored voice signal in the time dimension and generating voice parameters corresponding to each sub-voice signal.
Please refer to FIG. 3, which is an implementation flowchart of a detection model-based emotion analysis method provided by an embodiment of the present application. Compared with the embodiment corresponding to FIG. 1, this embodiment refines S102 into S301-S302 on the basis that there are multiple initial models and that each pre-stored voice signal further corresponds to an attribute feature, detailed as follows:
S301: Divide the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, and construct the parameter vectors based on the voice parameters in the feature parameter set.
A pre-stored voice signal may correspond to an attribute feature, which is related to some attribute of the pre-stored voice signal. For example, the attribute feature may be related to the age of the speaker of the pre-stored voice signal, with ages 0 to 10 constituting one attribute feature and ages 11 to 20 constituting another; of course, depending on the actual application scenario, the attribute features may cover other content. In the embodiment of the present application, pre-stored voice signals corresponding to different attribute features are processed separately: the voice parameters of the pre-stored voice signals corresponding to the same attribute feature are divided into a separate feature parameter set, and the parameter vectors are constructed based on the voice parameters in that feature parameter set.
In S302, a mapping relationship is established between the attribute feature corresponding to the feature parameter set and one of the initial models, and that initial model is trained with the plurality of parameter vectors and the corresponding plurality of emotion levels in the feature parameter set to obtain the emotion model.
Since each feature parameter set is related to one attribute feature of the pre-stored voice signals, a mapping relationship is established between that attribute feature and one of the multiple initial models (which may be chosen at random), and the parameter vectors in the feature parameter set, together with their corresponding emotion levels, are used as input parameters to train the mapped initial model into an emotion model. With this method, if there are y attribute features, y emotion models are eventually obtained, each having a mapping relationship with one attribute feature, where y is an integer greater than zero.
Optionally, the attribute feature corresponding to the voice signal to be tested is obtained; the emotion model corresponding to that attribute feature is determined according to the mapping relationship, and the voice parameters of the voice signal to be tested are input into that emotion model for emotion analysis. For example, suppose the pre-stored voice signals cover three attribute features: ages 0 to 10, named feature one; ages 11 to 20, named feature two; and ages 21 to 30, named feature three. Then three emotion models are ultimately trained, mapped respectively to features one, two and three. For a voice signal to be tested, its corresponding attribute feature is obtained; for instance, if the speaker of the signal is 22 years old, the corresponding attribute feature is determined to be feature three, and the voice parameters of the signal are input into the emotion model mapped to feature three.
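The grouping and per-attribute training of S301-S302 can be sketched as follows; the sample data, attribute names and the train_initial_model placeholder are hypothetical, standing in for the training procedure of Embodiment 1:

```python
from collections import defaultdict

# Hypothetical samples: (attribute_feature, parameter_vector, emotion_grade)
samples = [
    ("age_0_10",  (55.0, 52.0,  900.0, 0.004), 3),
    ("age_11_20", (62.0, 61.0, 1100.0, 0.005), 7),
    ("age_11_20", (58.0, 57.0, 1050.0, 0.005), 5),
]

# S301: divide voice parameters by attribute feature into feature parameter sets
feature_sets = defaultdict(list)
for attribute, vector, grade in samples:
    feature_sets[attribute].append((vector, grade))

# S302: map each attribute feature to its own initial model and train it
def train_initial_model(pairs):
    return {"trained_on": len(pairs)}   # placeholder for a trained emotion model

emotion_models = {attr: train_initial_model(pairs)
                  for attr, pairs in feature_sets.items()}

# Inference: pick the model mapped to the test signal's attribute feature
test_attribute = "age_11_20"
print(emotion_models[test_attribute])  # {'trained_on': 2}
```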
As can be seen from the embodiment shown in FIG. 3, in the embodiment of the present application the voice parameters of the pre-stored voice signals corresponding to the same attribute feature are divided into a feature parameter set, parameter vectors are constructed from the voice parameters in that set, a mapping relationship is established between the attribute feature of the set and one of the initial models, and that initial model is trained with the parameter vectors and corresponding emotion levels in the set to obtain an emotion model. By training multiple emotion models for different attribute features, the embodiment of the present application improves the specificity of emotion analysis.
Please refer to FIG. 4, which is an implementation flowchart of a detection model-based emotion analysis method provided by an embodiment of the present application. Compared with the embodiment corresponding to FIG. 3, this embodiment extends the process before S301 into S401-S403 on the basis that the attribute features include male and female, detailed as follows:
S401: Obtain a preset frequency threshold, and determine the signal frequency of the pre-stored voice signal based on the signal period.
In the embodiment of the present application, the attribute features include male and female. Because female vocal folds vibrate at a higher frequency, the vocal frequencies of males and females differ. Therefore, when the attribute feature corresponding to a pre-stored voice signal is unknown, the reciprocal of the signal period is determined as the signal frequency of the pre-stored voice signal, and the attribute feature corresponding to the pre-stored voice signal is judged to be male or female according to the signal frequency.
S402: If the signal frequency is higher than the frequency threshold, set the attribute feature of the pre-stored voice signal corresponding to the signal frequency to female.
In the embodiment of the present application, a frequency threshold is set in advance; if the signal frequency is higher than the frequency threshold, the attribute feature of the pre-stored voice signal corresponding to that signal frequency is set to female. The frequency threshold can be customized, for example set to 500 Hz; alternatively, multiple frequency thresholds can be set on the basis of a large number of sample voice signals with known attribute features, the threshold with the highest accuracy identified, and that threshold used as the judgment condition in this step.
S403:若所述信号频率不高于所述频率阈值,则将所述信号频率对应的所述预存语音信号的所述属性特征设置为男性。S403: If the signal frequency is not higher than the frequency threshold, set the attribute characteristic of the pre-stored voice signal corresponding to the signal frequency to male.
If the signal frequency is not higher than the frequency threshold, the attribute feature of the pre-stored voice signal corresponding to the signal frequency is set to male. After the attribute features of all pre-stored voice signals have been set, the voice parameters of the pre-stored voice signals corresponding to male are grouped into one feature parameter set and those corresponding to female into another, so that pre-stored voice signals with different attribute features are processed separately. A sketch of this classification and grouping step follows.
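Combining S401 to S403 with the grouping step, a minimal sketch (Python; the dictionary keys and the function name are hypothetical) might look as follows:

```python
# Hypothetical sketch of S401-S403 plus grouping: signal period -> frequency,
# frequency vs. threshold -> attribute feature, then split parameters by feature.
def classify_and_group(prestored_signals, frequency_threshold):
    """prestored_signals: list of dicts with keys "signal_period" (seconds)
    and "voice_params" (the parameter vector of that signal)."""
    feature_sets = {"male": [], "female": []}
    for signal in prestored_signals:
        frequency = 1.0 / signal["signal_period"]          # S401: reciprocal of the period
        attribute = "female" if frequency > frequency_threshold else "male"  # S402/S403
        feature_sets[attribute].append(signal["voice_params"])
    return feature_sets
```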
As can be seen from the embodiment shown in FIG. 4, in this embodiment of the present application, a preset frequency threshold is acquired and the signal frequency of the pre-stored voice signal is determined based on the signal period. If the signal frequency is higher than the frequency threshold, the attribute feature of the corresponding pre-stored voice signal is set to female; if the signal frequency is not higher than the frequency threshold, it is set to male. By determining the attribute feature of a pre-stored voice signal from its signal frequency, this embodiment facilitates the subsequent classification of pre-stored voice signals with different attribute features and improves the pertinence of the emotion analysis.
Corresponding to the detection model-based emotion analysis method described in the foregoing embodiments, FIG. 5 shows a structural block diagram of a detection model-based emotion analysis apparatus provided by an embodiment of the present application. Referring to FIG. 5, the apparatus includes:
an analysis unit 51, configured to acquire multiple pre-stored voice signals and analyze the pre-stored voice signals to obtain voice parameters, the voice parameters including a sound intensity value, a loudness value, a pitch value, and a signal period, where each pre-stored voice signal corresponds to a preset emotion level;
a training unit 52, configured to construct parameter vectors based on the voice parameters, and to train an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model;
an input unit 53, configured to input the voice parameters of a voice signal to be tested into the emotion model, and to determine the output of the emotion model as the emotion level corresponding to the voice signal to be tested. An end-to-end sketch of this three-stage pipeline is shown after this list.
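Since the disclosure does not fix the type of the initial model, the following sketch (Python) uses a multinomial logistic regression from scikit-learn purely as a stand-in; the data layout and all function names are assumptions:

```python
# Hypothetical pipeline sketch for units 51-53. The patent does not specify
# the model type; LogisticRegression stands in for the "initial model".
from sklearn.linear_model import LogisticRegression

def analyze(prestored_signal):
    """Unit 51 (assumed interface): returns [intensity, loudness, pitch, period]."""
    return prestored_signal["voice_params"]

def train_emotion_model(prestored_signals, emotion_levels):
    """Unit 52: parameter vectors plus emotion levels -> trained emotion model."""
    vectors = [analyze(s) for s in prestored_signals]
    model = LogisticRegression(max_iter=1000)
    model.fit(vectors, emotion_levels)
    return model

def detect_emotion(model, test_signal):
    """Unit 53: the model's output is taken as the emotion level."""
    return model.predict([analyze(test_signal)])[0]
```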
Optionally, the voice parameters further include a signal energy value, and the analysis unit 51 includes:
a splitting unit, configured to split the pre-stored voice signal into multiple sub-voice signals in the time dimension, and to multiply each sub-voice signal by a weighting coefficient, where the weighting coefficient is generated by a preset weighting formula;
a combination unit, configured to combine the sound intensity value, the loudness value, the pitch value, the signal energy value, and the signal period of the weighted sub-voice signals into the voice parameters. A sketch of this splitting-and-weighting step is shown after this list.
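The preset weighting formula is not given in the disclosure; a Hamming window is a common choice, so the sketch below (Python with NumPy; the frame length and hop size are arbitrary assumptions) uses it to weight each sub-voice signal before computing per-frame signal energy:

```python
import numpy as np

def split_and_weight(signal, frame_len=400, hop=200):
    """Split a 1-D voice signal into sub-voice signals along the time
    dimension and multiply each one by weighting coefficients.
    A Hamming window stands in for the unspecified weighting formula."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)  # assumed "preset weighting formula"
    return [
        signal[start:start + frame_len] * window
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def frame_energy(frame):
    """Signal energy of one weighted sub-voice signal (sum of squares, a
    common definition; the patent's exact formula is not reproduced here)."""
    return float(np.sum(frame ** 2))
```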
Optionally, multiple initial models are included, and each pre-stored voice signal further corresponds to an attribute feature. The training unit 52 includes:
a dividing unit, configured to group the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, and to construct the parameter vectors based on the voice parameters in the feature parameter set;
an establishing unit, configured to establish a mapping relationship between the attribute feature corresponding to the feature parameter set and one of the initial models, and to train that initial model with the multiple parameter vectors in the feature parameter set and the corresponding emotion levels to obtain the emotion model. A sketch of this per-attribute training is shown after this list.
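A minimal sketch of the per-attribute mapping and training (Python; the data layout, the logistic-regression stand-in for the initial model, and all names are assumptions):

```python
# Hypothetical sketch: one emotion model per attribute feature, with a
# mapping from attribute feature to the trained model.
from sklearn.linear_model import LogisticRegression

def train_per_attribute(feature_sets):
    """feature_sets: dict mapping an attribute feature (e.g. "male") to a
    list of (parameter_vector, emotion_level) pairs."""
    attribute_to_model = {}
    for attribute, labeled_vectors in feature_sets.items():
        vectors = [vector for vector, _ in labeled_vectors]
        levels = [level for _, level in labeled_vectors]
        model = LogisticRegression(max_iter=1000)  # stand-in initial model
        model.fit(vectors, levels)
        attribute_to_model[attribute] = model      # the mapping relationship
    return attribute_to_model
```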
Optionally, the input unit 53 includes:
an acquisition unit, configured to acquire the attribute feature corresponding to the voice signal to be tested;
a determining unit, configured to determine, according to the mapping relationship, the emotion model corresponding to the attribute feature, and to input the voice parameters of the voice signal to be tested into that emotion model, as sketched after this list.
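At inference time, the model selection could look like the sketch below (Python; hypothetical names, building on the attribute_to_model mapping from the previous sketch and assuming the same gender-by-frequency rule for the attribute feature):

```python
# Hypothetical sketch of the input unit: pick the emotion model that the
# test signal's attribute feature maps to, then classify.
def analyze_emotion(attribute_to_model, test_signal, frequency_threshold):
    frequency = 1.0 / test_signal["signal_period"]
    attribute = "female" if frequency > frequency_threshold else "male"
    model = attribute_to_model[attribute]  # follow the mapping relationship
    return model.predict([test_signal["voice_params"]])[0]
```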
Optionally, the attribute features include male and female, and the dividing unit further includes:
a frequency determining unit, configured to acquire a preset frequency threshold and determine the signal frequency of the pre-stored voice signal based on the signal period;
a first setting unit, configured to set the attribute feature of the pre-stored voice signal corresponding to the signal frequency to female if the signal frequency is higher than the frequency threshold;
a second setting unit, configured to set the attribute feature of the pre-stored voice signal corresponding to the signal frequency to male if the signal frequency is not higher than the frequency threshold.
FIG. 6 is a schematic diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 6, the terminal device 6 of this embodiment includes a processor 60 and a memory 61, where the memory 61 stores computer-readable instructions 62 executable on the processor 60, for example a detection model-based emotion analysis program. When executing the computer-readable instructions 62, the processor 60 implements the steps in the foregoing embodiments of the detection model-based emotion analysis method, for example steps S101 to S103 shown in FIG. 1. Alternatively, when executing the computer-readable instructions 62, the processor 60 implements the functions of the units in the foregoing embodiment of the detection model-based emotion analysis apparatus, for example the functions of units 51 to 53 shown in FIG. 5.
Exemplarily, the computer-readable instructions 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, the segments being used to describe the execution process of the computer-readable instructions 62 in the terminal device 6. For example, the computer-readable instructions 62 may be divided into an analysis unit, a training unit, and an input unit, whose specific functions are as described above.
The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that FIG. 6 is merely an example of the terminal device 6 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, a combination of certain components, or different components. For example, the terminal device may further include input/output devices, network access devices, buses, and the like.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used to store the computer-readable instructions and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

1. A detection model-based emotion analysis method, characterized in that it comprises:
acquiring multiple pre-stored voice signals, and analyzing the pre-stored voice signals to obtain voice parameters, the voice parameters comprising a sound intensity value, a loudness value, a pitch value, and a signal period, wherein each pre-stored voice signal corresponds to a preset emotion level;
constructing parameter vectors based on the voice parameters, and training an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model; and
inputting the voice parameters of a voice signal to be tested into the emotion model, and determining the output of the emotion model as the emotion level corresponding to the voice signal to be tested.

2. The emotion analysis method according to claim 1, characterized in that the voice parameters further comprise a signal energy value, and analyzing the pre-stored voice signals to obtain the voice parameters comprises:
splitting the pre-stored voice signal into multiple sub-voice signals in the time dimension, and multiplying each sub-voice signal by a weighting coefficient, wherein the weighting coefficient is generated by a preset weighting formula; and
combining the sound intensity value, the loudness value, the pitch value, the signal energy value, and the signal period of the weighted sub-voice signals into the voice parameters.

3. The emotion analysis method according to claim 1, characterized in that multiple initial models are comprised, each pre-stored voice signal further corresponds to an attribute feature, and constructing parameter vectors based on the voice parameters and training an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model comprises:
grouping the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, and constructing the parameter vectors based on the voice parameters in the feature parameter set; and
establishing a mapping relationship between the attribute feature corresponding to the feature parameter set and one of the initial models, and training that initial model with the multiple parameter vectors in the feature parameter set and the corresponding emotion levels to obtain the emotion model.

4. The emotion analysis method according to claim 3, characterized in that inputting the voice parameters of the voice signal to be tested into the emotion model comprises:
acquiring the attribute feature corresponding to the voice signal to be tested; and
determining, according to the mapping relationship, the emotion model corresponding to the attribute feature, and inputting the voice parameters of the voice signal to be tested into that emotion model.

5. The emotion analysis method according to claim 3, characterized in that the attribute features comprise male and female, and before grouping the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, the method further comprises:
acquiring a preset frequency threshold, and determining the signal frequency of the pre-stored voice signal based on the signal period;
if the signal frequency is higher than the frequency threshold, setting the attribute feature of the pre-stored voice signal corresponding to the signal frequency to female; and
if the signal frequency is not higher than the frequency threshold, setting the attribute feature of the pre-stored voice signal corresponding to the signal frequency to male.
6. A detection model-based emotion analysis apparatus, characterized in that it comprises:
an analysis unit, configured to acquire multiple pre-stored voice signals and analyze the pre-stored voice signals to obtain voice parameters, the voice parameters comprising a sound intensity value, a loudness value, a pitch value, and a signal period, wherein each pre-stored voice signal corresponds to a preset emotion level;
a training unit, configured to construct parameter vectors based on the voice parameters, and to train an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model; and
an input unit, configured to input the voice parameters of a voice signal to be tested into the emotion model, and to determine the output of the emotion model as the emotion level corresponding to the voice signal to be tested.

7. The emotion analysis apparatus according to claim 6, characterized in that the voice parameters further comprise a signal energy value, and the analysis unit comprises:
a splitting unit, configured to split the pre-stored voice signal into multiple sub-voice signals in the time dimension, and to multiply each sub-voice signal by a weighting coefficient, wherein the weighting coefficient is generated by a preset weighting formula; and
a combination unit, configured to combine the sound intensity value, the loudness value, the pitch value, the signal energy value, and the signal period of the weighted sub-voice signals into the voice parameters.

8. The emotion analysis apparatus according to claim 6, characterized in that multiple initial models are comprised, each pre-stored voice signal further corresponds to an attribute feature, and the training unit comprises:
a dividing unit, configured to group the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, and to construct the parameter vectors based on the voice parameters in the feature parameter set; and
an establishing unit, configured to establish a mapping relationship between the attribute feature corresponding to the feature parameter set and one of the initial models, and to train that initial model with the multiple parameter vectors in the feature parameter set and the corresponding emotion levels to obtain the emotion model.

9. The emotion analysis apparatus according to claim 8, characterized in that the input unit comprises:
an acquisition unit, configured to acquire the attribute feature corresponding to the voice signal to be tested; and
a determining unit, configured to determine, according to the mapping relationship, the emotion model corresponding to the attribute feature, and to input the voice parameters of the voice signal to be tested into that emotion model.

10. The emotion analysis apparatus according to claim 8, characterized in that the attribute features comprise male and female, and the dividing unit further comprises:
a frequency determining unit, configured to acquire a preset frequency threshold and determine the signal frequency of the pre-stored voice signal based on the signal period;
a first setting unit, configured to set the attribute feature of the pre-stored voice signal corresponding to the signal frequency to female if the signal frequency is higher than the frequency threshold; and
a second setting unit, configured to set the attribute feature of the pre-stored voice signal corresponding to the signal frequency to male if the signal frequency is not higher than the frequency threshold.
11. A terminal device, characterized in that it comprises a memory and a processor, the memory storing computer-readable instructions executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
acquiring multiple pre-stored voice signals, and analyzing the pre-stored voice signals to obtain voice parameters, the voice parameters comprising a sound intensity value, a loudness value, a pitch value, and a signal period, wherein each pre-stored voice signal corresponds to a preset emotion level;
constructing parameter vectors based on the voice parameters, and training an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model; and
inputting the voice parameters of a voice signal to be tested into the emotion model, and determining the output of the emotion model as the emotion level corresponding to the voice signal to be tested.

12. The terminal device according to claim 11, characterized in that the voice parameters further comprise a signal energy value, and analyzing the pre-stored voice signals to obtain the voice parameters comprises:
splitting the pre-stored voice signal into multiple sub-voice signals in the time dimension, and multiplying each sub-voice signal by a weighting coefficient, wherein the weighting coefficient is generated by a preset weighting formula; and
combining the sound intensity value, the loudness value, the pitch value, the signal energy value, and the signal period of the weighted sub-voice signals into the voice parameters.

13. The terminal device according to claim 11, characterized in that multiple initial models are comprised, each pre-stored voice signal further corresponds to an attribute feature, and constructing parameter vectors based on the voice parameters and training an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model comprises:
grouping the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, and constructing the parameter vectors based on the voice parameters in the feature parameter set; and
establishing a mapping relationship between the attribute feature corresponding to the feature parameter set and one of the initial models, and training that initial model with the multiple parameter vectors in the feature parameter set and the corresponding emotion levels to obtain the emotion model.

14. The terminal device according to claim 13, characterized in that inputting the voice parameters of the voice signal to be tested into the emotion model comprises:
acquiring the attribute feature corresponding to the voice signal to be tested; and
determining, according to the mapping relationship, the emotion model corresponding to the attribute feature, and inputting the voice parameters of the voice signal to be tested into that emotion model.

15. The terminal device according to claim 13, characterized in that the attribute features comprise male and female, and before grouping the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, the steps further comprise:
acquiring a preset frequency threshold, and determining the signal frequency of the pre-stored voice signal based on the signal period;
if the signal frequency is higher than the frequency threshold, setting the attribute feature of the pre-stored voice signal corresponding to the signal frequency to female; and
if the signal frequency is not higher than the frequency threshold, setting the attribute feature of the pre-stored voice signal corresponding to the signal frequency to male.
16. A computer non-volatile readable storage medium storing computer-readable instructions, characterized in that, when the computer-readable instructions are executed by at least one processor, the following steps are implemented:
acquiring multiple pre-stored voice signals, and analyzing the pre-stored voice signals to obtain voice parameters, the voice parameters comprising a sound intensity value, a loudness value, a pitch value, and a signal period, wherein each pre-stored voice signal corresponds to a preset emotion level;
constructing parameter vectors based on the voice parameters, and training an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model; and
inputting the voice parameters of a voice signal to be tested into the emotion model, and determining the output of the emotion model as the emotion level corresponding to the voice signal to be tested.

17. The computer non-volatile readable storage medium according to claim 16, characterized in that the voice parameters further comprise a signal energy value, and analyzing the pre-stored voice signals to obtain the voice parameters comprises:
splitting the pre-stored voice signal into multiple sub-voice signals in the time dimension, and multiplying each sub-voice signal by a weighting coefficient, wherein the weighting coefficient is generated by a preset weighting formula; and
combining the sound intensity value, the loudness value, the pitch value, the signal energy value, and the signal period of the weighted sub-voice signals into the voice parameters.

18. The computer non-volatile readable storage medium according to claim 16, characterized in that multiple initial models are comprised, each pre-stored voice signal further corresponds to an attribute feature, and constructing parameter vectors based on the voice parameters and training an initial model with multiple parameter vectors and the corresponding emotion levels to obtain an emotion model comprises:
grouping the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, and constructing the parameter vectors based on the voice parameters in the feature parameter set; and
establishing a mapping relationship between the attribute feature corresponding to the feature parameter set and one of the initial models, and training that initial model with the multiple parameter vectors in the feature parameter set and the corresponding emotion levels to obtain the emotion model.

19. The computer non-volatile readable storage medium according to claim 18, characterized in that inputting the voice parameters of the voice signal to be tested into the emotion model comprises:
acquiring the attribute feature corresponding to the voice signal to be tested; and
determining, according to the mapping relationship, the emotion model corresponding to the attribute feature, and inputting the voice parameters of the voice signal to be tested into that emotion model.

20. The computer non-volatile readable storage medium according to claim 18, characterized in that the attribute features comprise male and female, and before grouping the voice parameters of the pre-stored voice signals corresponding to the same attribute feature into a feature parameter set, the steps further comprise:
acquiring a preset frequency threshold, and determining the signal frequency of the pre-stored voice signal based on the signal period;
if the signal frequency is higher than the frequency threshold, setting the attribute feature of the pre-stored voice signal corresponding to the signal frequency to female; and
if the signal frequency is not higher than the frequency threshold, setting the attribute feature of the pre-stored voice signal corresponding to the signal frequency to male.