
CN111489740A - Voice processing method and device and elevator control method and device - Google Patents

Voice processing method and device and elevator control method and device

Info

Publication number
CN111489740A
Authority
CN
China
Prior art keywords
voice
feature
speech
elevator
control information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010325555.9A
Other languages
Chinese (zh)
Inventor
许孝先
冯大航
陈孝良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Shengzhi Wulian Technology Co ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN202010325555.9A priority Critical patent/CN111489740A/en
Publication of CN111489740A publication Critical patent/CN111489740A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 Speech to text systems
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a voice processing method and device, and an elevator control method and device. The voice processing method includes: extracting a first voice feature of a voice to be processed; performing voice amplitude feature separation processing on the first voice feature to obtain a second voice feature; and obtaining a processing result of the voice to be processed based on the second voice feature. The embodiments of the present invention can improve the performance of the network model during voice processing.

Figure 202010325555


Description

Voice processing method and device, and elevator control method and device

Technical Field

The present invention relates to the technical field of natural language processing, and in particular to a voice processing method and device and an elevator control method and device.

Background Art

Natural language refers to the languages that humans use to communicate and that have evolved naturally. Natural language processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics that studies the interaction between computers and human (natural) language. Natural language processing technology can process speech based on network models to meet the needs of various usage scenarios; for example, in a speech recognition scenario, speech can be converted into text based on a speech recognition network model.

During voice processing, the same voice content spoken at different volumes has different amplitudes, so the extracted voice features also differ considerably, even though the expected processing result is the same. This degrades the performance of the network model.

Summary of the Invention

Embodiments of the present invention provide a voice processing method and device, and an elevator control method and device, to solve the problem in the prior art that different volumes lead to different voice amplitudes and therefore to large differences in voice features, which degrades the performance of the network model.

To solve the above technical problem, the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a voice processing method, the method comprising:

extracting a first voice feature of a voice to be processed;

performing voice amplitude feature separation processing on the first voice feature to obtain a second voice feature; and

obtaining a processing result of the voice to be processed based on the second voice feature.

In a second aspect, an embodiment of the present invention provides an elevator control method, the method comprising:

receiving a target voice input by a user in an elevator use scenario;

performing offline intent recognition on the target voice by using the voice processing method according to the embodiment of the present invention to obtain first control information; and

controlling an elevator to perform a first operation corresponding to the first control information.

In a third aspect, an embodiment of the present invention provides a voice processing device, the voice processing device comprising:

an extraction module, configured to extract a first voice feature of a voice to be processed;

a separation module, configured to perform voice amplitude feature separation processing on the first voice feature to obtain a second voice feature; and

an obtaining module, configured to obtain a processing result of the voice to be processed based on the second voice feature.

In a fourth aspect, an embodiment of the present invention provides an elevator control device, the elevator control device comprising:

a first receiving module, configured to receive a target voice input by a user in an elevator use scenario;

a recognition module, configured to perform offline intent recognition on the target voice by using the voice processing method according to the embodiment of the present invention to obtain first control information; and

a first control module, configured to control an elevator to perform a first operation corresponding to the first control information.

In a fifth aspect, an embodiment of the present invention provides an electronic device, comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein when the program is executed by the processor, the steps of the voice processing method according to the first aspect are implemented, or the steps of the elevator control method according to the second aspect are implemented.

In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the voice processing method according to the first aspect are implemented, or the steps of the elevator control method according to the second aspect are implemented.

In the embodiments of the present invention, a first voice feature of a voice to be processed is extracted; voice amplitude feature separation processing is performed on the first voice feature to obtain a second voice feature; and a processing result of the voice to be processed is obtained based on the second voice feature. Voice signals with the same content but different volumes can be regarded as differing only by an amplification factor of the voice amplitude. By performing voice amplitude feature separation processing on the first voice feature, the differences in voice features caused by different volumes can be reduced, and the performance of the network model can therefore be improved.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a voice processing method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of network model learning provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a voice processing device provided by an embodiment of the present invention;

FIG. 4 is a first schematic structural diagram of an elevator control device provided by an embodiment of the present invention;

FIG. 5 is a second schematic structural diagram of an elevator control device provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Referring to FIG. 1, FIG. 1 is a flowchart of a voice processing method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step 101: extract a first voice feature of a voice to be processed.

The first voice feature may include a plurality of first feature values. The plurality of first feature values may be obtained based on a logarithm operation; the logarithm may be taken with the constant e as the base, or with another base, which is not limited in this embodiment of the present invention. The first voice feature may be a filter banks (filter bank) feature.

Step 102: perform voice amplitude feature separation processing on the first voice feature to obtain a second voice feature.

The second voice feature may include a plurality of second feature values. Performing voice amplitude feature separation processing on the first voice feature to obtain the second voice feature may include: performing feature averaging processing on the plurality of first feature values to obtain a voice amplitude feature value, where the voice amplitude feature value is used to characterize the voice amplitude feature of the voice to be processed; and, based on the voice amplitude feature value, performing voice amplitude feature separation processing on each of the first feature values to obtain the second feature value of the second voice feature corresponding to each of the first feature values.

Step 103: obtain a processing result of the voice to be processed based on the second voice feature.

Obtaining the processing result of the voice to be processed based on the second voice feature may be: performing speech recognition based on the second voice feature to obtain a speech recognition result; or performing speech translation based on the second voice feature to obtain a speech translation result; or using the second voice feature in another usage scenario to obtain the processing result of the voice to be processed, which is not limited in this embodiment of the present invention. The second voice feature may be input into a network model for training during the training of the network model, or input into the network model for prediction when the trained network model is used for prediction.

In practical applications, taking the processing of a first voice and a second voice as an example, the second voice may have the same voice content as the first voice and may be obtained by amplifying the volume of the first voice by a factor of n, and the first voice features of both voices may be filter banks features. The plurality of first feature values of the first voice may be (a_1, a_2, a_3, …, a_i). Since the second voice is obtained by amplifying the volume of the first voice by a factor of n, and the filter banks feature is obtained based on a logarithm operation, the plurality of first feature values of the second voice are (a_1 + ln(n), a_2 + ln(n), a_3 + ln(n), …, a_i + ln(n)). The voice amplitude feature value may be the average of the plurality of first feature values: for the first voice, a_avg = (a_1 + a_2 + a_3 + … + a_i)/i, and for the second voice the average is a_avg + ln(n). The difference between each first feature value and the corresponding average may be calculated to obtain the second feature value corresponding to that first feature value; the second feature values calculated for the first voice may be (a_1 - a_avg, a_2 - a_avg, a_3 - a_avg, …, a_i - a_avg), and the second feature values calculated for the second voice are likewise (a_1 - a_avg, a_2 - a_avg, a_3 - a_avg, …, a_i - a_avg).

In practical applications, a one-dimensional feature value used to characterize the voice amplitude feature of the voice to be processed may be added to the improved filter banks feature. The voice amplitude feature can represent the volume, so that the feature values of the other dimensions are independent of the volume. For example, (a_1 - a_avg, a_2 - a_avg, a_3 - a_avg, …, a_i - a_avg, a_avg) may be used as the improved filter banks feature of the first voice, and (a_1 - a_avg, a_2 - a_avg, a_3 - a_avg, …, a_i - a_avg, a_avg + ln(n)) as the improved filter banks feature of the second voice; alternatively, (a_1 - a_avg, a_2 - a_avg, a_3 - a_avg, …, a_i - a_avg) may be used as the improved filter banks feature of both the first voice and the second voice. For the same voice content, when the voice amplitude changes, the first i dimensions of the improved filter banks feature are fixed values and do not change; only the last dimension, the feature value representing the average, changes. The first i dimensions are pure features that have nothing to do with the amplitude, which benefits network model learning. With the same network model, the improved filter banks feature can improve performance in voice processing by 3% to 10% compared with the ordinary filter banks feature.

It should be noted that when the volume of the voice is within a certain range, for example greater than a preset value, the same voice content yields the same speech recognition result. The volume has the same influence on every dimension of the feature vector of the voice feature, which reduces the learning efficiency of the network model during voice processing. Taking the filter banks feature as an example, it can be obtained by successively applying a Fourier transform, mel filtering, and a logarithm operation to the voice to be processed. If the volume of the voice is amplified by a factor of n (n being a positive integer), the amplitude of the voice is amplified by n; after the Fourier transform, the spectrum is correspondingly amplified by n; after mel filtering, the factor of n is maintained; and after the logarithm operation, every dimension of the filter banks feature of the amplified voice is increased by ln(n) compared with the unamplified voice. As shown in FIG. 2, for the network model, when the filter banks features of multiple voices with the same content A but different volumes are input, the same content A is translated along a straight line as the amplitude changes, yet the result of speech processing such as speech recognition is A in every case. Because the volumes differ, every dimension of the filter banks feature undergoes the same translation; the voice amplitude has the same influence on every dimension, which increases the burden on the network model.
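
A small numeric check of the ln(n) shift described above: amplifying the signal by n shifts every log mel value by ln(n), and subtracting the per-frame mean removes that shift entirely. The random mel energies are placeholders standing in for real filter outputs, not data from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4.0                                     # volume amplification factor
mel_energies = rng.uniform(1.0, 10.0, 40)   # placeholder mel filter outputs for one frame

fbank_quiet = np.log(mel_energies)          # filter banks feature of the original voice
fbank_loud = np.log(n * mel_energies)       # amplified voice: every dimension shifts by ln(n)

print(np.allclose(fbank_loud - fbank_quiet, np.log(n)))   # True: uniform ln(n) shift

# After mean subtraction the two features coincide, so the model no longer sees the volume.
second_quiet = fbank_quiet - fbank_quiet.mean()
second_loud = fbank_loud - fbank_loud.mean()
print(np.allclose(second_quiet, second_loud))             # True
```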

In the embodiments of the present invention, a first voice feature of a voice to be processed is extracted; voice amplitude feature separation processing is performed on the first voice feature to obtain a second voice feature; and a processing result of the voice to be processed is obtained based on the second voice feature. In this way, voice signals with the same content but different volumes can be regarded as differing only by an amplification factor of the voice amplitude. By performing voice amplitude feature separation processing on the first voice feature, the differences in voice features caused by different volumes can be reduced, and the performance of the network model can therefore be improved.

Optionally, the first voice feature includes a plurality of first feature values, and the second voice feature includes a plurality of second feature values;

performing voice amplitude feature separation processing on the first voice feature to obtain the second voice feature includes:

performing feature averaging processing on the plurality of first feature values to obtain a voice amplitude feature value, where the voice amplitude feature value is used to characterize the voice amplitude feature of the voice to be processed; and

based on the voice amplitude feature value, performing voice amplitude feature separation processing on each of the first feature values to obtain the second feature value of the second voice feature corresponding to each of the first feature values.

The plurality of first feature values may be obtained based on a logarithm operation; the logarithm may be taken with the constant e as the base, or with another base, which is not limited in this embodiment of the present invention.

In addition, performing feature averaging processing on the plurality of first feature values may be calculating an average of the plurality of first feature values, and the voice amplitude feature value may be this average. Performing voice amplitude feature separation processing on each of the first feature values based on the voice amplitude feature value to obtain the corresponding second feature value may be obtaining, based on each first feature value and the average of the plurality of first feature values, the second feature value of the second voice feature corresponding to that first feature value.

Obtaining the second feature value corresponding to each first feature value based on that first feature value and the average of the plurality of first feature values may include: calculating the difference between each first feature value and the average to obtain the corresponding second feature value; or calculating the difference between each first feature value and the average and multiplying the difference by a first preset value to obtain the corresponding second feature value; or calculating the difference between each first feature value and the average and subtracting a second preset value from the difference to obtain the corresponding second feature value; and so on, which is not limited in this embodiment of the present invention.

Preferably, obtaining the second feature value corresponding to each first feature value may include: calculating the difference between each first feature value and the average to obtain the corresponding second feature value. In this way, for at least two voice signals with the same voice content but different volumes, the second feature values of each voice signal are identical, which further reduces the differences in voice features caused by different volumes and thereby improves the performance of the network model.

Taking the first voice and the second voice as an example, the second voice may have the same voice content as the first voice and may be obtained by amplifying the volume of the first voice by a factor of n. The first feature values of the first voice feature of the second voice may be obtained based on a logarithm operation, as may those of the first voice; therefore, each first feature value of the first voice feature of the second voice is larger than the corresponding first feature value of the first voice feature of the first voice by ln(n) or log(n).

Taking the case where both first voice features are filter banks features as an example, each first feature value of the second voice is larger than the corresponding first feature value of the first voice by ln(n). The first feature values of the first voice may be (a_1, a_2, a_3, …, a_i), and those of the second voice may be (a_1 + ln(n), a_2 + ln(n), a_3 + ln(n), …, a_i + ln(n)), where i is a positive integer. The average of the first feature values of the first voice may be a_avg = (a_1 + a_2 + a_3 + … + a_i)/i, and the average for the second voice is a_avg + ln(n). The difference between each first feature value and the corresponding average may be calculated to obtain the corresponding second feature value; the second feature values calculated for the first voice may be (a_1 - a_avg, a_2 - a_avg, a_3 - a_avg, …, a_i - a_avg), and the second feature values calculated for the second voice are likewise (a_1 - a_avg, a_2 - a_avg, a_3 - a_avg, …, a_i - a_avg).

In this embodiment, by performing feature averaging processing on the plurality of first feature values, the voice amplitude feature separation processing can be performed on the first voice feature quickly and accurately.

Optionally, the second voice feature further includes the voice amplitude feature value.

The voice amplitude feature value may be the average of the plurality of first feature values. Taking a plurality of first feature values (x_1, x_2, x_3, …, x_k) with average x_avg as an example, the second feature values may be (x_1 - x_avg, x_2 - x_avg, x_3 - x_avg, …, x_k - x_avg, x_avg). This vector (x_1 - x_avg, x_2 - x_avg, x_3 - x_avg, …, x_k - x_avg, x_avg) can be input into the network model for further voice processing, as sketched below.
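
A hedged sketch of this variant: the per-frame mean is appended as an extra last dimension after mean subtraction, so the model still has access to the volume information. Applying it frame by frame to a 2-D feature matrix, and the name `improved_fbank`, are assumptions made for illustration.

```python
import numpy as np

def improved_fbank(fbank_frames):
    """Append the per-frame mean as an extra dimension after mean subtraction.

    fbank_frames: 2-D array of shape (num_frames, k) holding log filter-bank values.
    Returns an array of shape (num_frames, k + 1) whose rows are
    (x_1 - x_avg, ..., x_k - x_avg, x_avg).
    """
    fbank_frames = np.asarray(fbank_frames, dtype=np.float64)
    frame_means = fbank_frames.mean(axis=1, keepdims=True)   # x_avg per frame
    centered = fbank_frames - frame_means                    # volume-independent part
    return np.concatenate([centered, frame_means], axis=1)   # amplitude kept as last dim
```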

In this embodiment, the second voice feature further includes the voice amplitude feature value, so the volume-related feature of the voice signal can be extracted separately and kept as part of the voice feature. This reduces the differences in voice features caused by different volumes and thereby improves the performance of the network model. In addition, the voice amplitude feature value can be used to distinguish noise from speech; in scenarios where noise needs to be taken into account during voice processing, a second voice feature that includes the voice amplitude feature value gives better processing results.

Optionally, the dimension of the second voice feature is greater than or equal to the dimension of the first voice feature.

The first voice feature may include a plurality of first feature values, and the second voice feature may include a plurality of second feature values. The first feature values may correspond to the second feature values one to one, in which case the dimension of the second voice feature equals the dimension of the first voice feature. The second voice feature may further include the voice amplitude feature value used to characterize the voice amplitude feature of the voice to be processed, in which case the dimension of the second voice feature is greater than that of the first voice feature. Further, the second voice feature may also include feature values characterizing other features of the voice to be processed, which is not limited in this embodiment of the present invention.

In this embodiment, the dimension of the second voice feature is greater than or equal to the dimension of the first voice feature, so more features of the voice to be processed can be obtained, which improves the effect of voice processing.

Optionally, the first voice feature includes a filter banks (filter bank) feature.

The filter banks feature, that is, the Fbank feature, is a commonly used voice feature. The response of the human ear to the sound spectrum is nonlinear, and the Fbank feature processes voice in a way that imitates the human ear; using the Fbank feature in speech recognition can improve recognition performance. The Fbank feature can be obtained by applying a Fourier transform and mel filtering to the voice frame by frame and then taking the logarithm. In practical applications, a Fourier transform may be performed on the voice to be processed to obtain its frequency-domain feature, mel filtering may be performed on the frequency-domain feature to obtain a filtering result, and the logarithm of the filtering result may be taken to obtain the Fbank feature of the voice to be processed.
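
A minimal sketch of that extraction pipeline (framing, Fourier transform, mel filtering, logarithm), assuming NumPy and librosa's mel filter bank are available; the frame length, hop size, and number of mel bands are illustrative choices, not values from the patent. A magnitude spectrum is used here so that amplifying the waveform by n shifts each log value by exactly ln(n), matching the reasoning above; many toolkits use the power spectrum instead, which shifts each value by 2·ln(n) but is still a uniform shift.

```python
import numpy as np
import librosa  # used only for the mel filter bank

def extract_fbank(waveform, sr=16000, frame_len=400, hop=160, n_mels=40):
    """Compute log mel filter-bank (Fbank) features frame by frame."""
    n_fft = frame_len
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (n_mels, n_fft//2 + 1)
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        frame = waveform[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame, n=n_fft))   # magnitude spectrum (Fourier transform)
        mel_energies = mel_fb @ spectrum                 # mel filtering
        frames.append(np.log(mel_energies + 1e-10))      # logarithm -> Fbank values
    return np.stack(frames)                              # shape: (num_frames, n_mels)
```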

Taking the first voice and the second voice as an example, where the second voice has the same voice content as the first voice and is obtained by amplifying the volume of the first voice by a factor of n, and both first voice features are filter banks features: after the Fourier transform and mel filtering of the first voice, (b_1, b_2, b_3, …, b_i) may be obtained, and by taking the logarithm, the first voice feature of the first voice (ln b_1, ln b_2, ln b_3, …, ln b_i) is obtained. After the Fourier transform and mel filtering of the second voice, (n·b_1, n·b_2, n·b_3, …, n·b_i) may be obtained, and by taking the logarithm, the first voice feature of the second voice (ln b_1 + ln(n), ln b_2 + ln(n), ln b_3 + ln(n), …, ln b_i + ln(n)) is obtained.

In this embodiment, the first voice feature includes the filter banks feature. The filter banks feature is sensitive to the volume of the sound, and the volume has the same influence on every dimension of its feature vector. By separating out the voice amplitude feature of the voice to be processed, the volume-related feature of the voice signal can be extracted and the second voice feature can be calculated, which amounts to an improvement of the filter banks feature and reduces the differences in voice features caused by different volumes. Using the improved filter banks feature for voice processing can therefore improve the performance of the network model.

An embodiment of the present invention further provides an elevator control method, the method comprising:

receiving a target voice input by a user in an elevator use scenario;

performing offline intent recognition on the target voice by using the voice processing method according to the embodiment of the present invention to obtain first control information; and

controlling an elevator to perform a first operation corresponding to the first control information.

The elevator control method may be applied to an elevator control device in an elevator and used to control the elevator to go to a floor, to cancel going to a floor, or to open or close the doors; the elevator control device may also control the elevator to perform other operations, which is not limited in this embodiment of the present invention. The elevator control device may receive the target voice input by the user as follows: in response to receiving a wake-up word input by the user, the elevator control device receives the subsequently input voice as the target voice. The wake-up word may be set as required; for example, it may be "Hello, elevator". Alternatively, the elevator control device may directly receive the input voice as the target voice.

Performing offline intent recognition on the target voice by using the voice processing method according to the embodiment of the present invention to obtain the first control information may be: extracting a first voice feature of the target voice; performing voice amplitude feature separation processing on the first voice feature to obtain a second voice feature; and obtaining the first control information based on the second voice feature.

It should be noted that obtaining the first control information based on the second voice feature may be performing offline intent recognition based on the second voice feature; specifically, the second voice feature may be input into a network model for offline intent recognition to obtain the first control information. There are two ways to perform offline intent recognition based on the second voice feature: the first is to obtain the first control information directly from the second voice feature; the second is to convert the target voice into text based on the second voice feature and obtain the first control information from that text.

For example, in an elevator control scenario, the first control information may include a control instruction for the elevator and the floor corresponding to the control instruction. The control instruction may include a confirmation instruction used to control the elevator to go to a floor, and may also include a cancellation instruction used to cancel the operation of the elevator going to a floor. It should be noted that this first control information is merely an example; the first control information may change with the application scenario, which is not limited in this embodiment of the present invention.

In the first way, the elevator control device may pre-store a voice command lexicon holding a plurality of voice command words; the voice feature of each voice command word may be stored, and each voice command word corresponds to one piece of intent information. Performing offline intent recognition based on the second voice feature may then be: selecting, from the voice command lexicon, the voice command word whose voice feature has the highest similarity to the second voice feature, and using the intent information corresponding to that voice command word as the first control information.

In the second way, the elevator control device may store a text command lexicon holding a plurality of text command words, each corresponding to one piece of intent information. Performing offline intent recognition based on the second voice feature may then be: obtaining a first text corresponding to the second voice feature, selecting from the text command lexicon the text command word with the highest similarity to the first text, and using the intent information corresponding to that text command word as the first control information.
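
A hedged sketch of the second way (text matching), assuming the recognized text is already available; the lexicon entries, the similarity measure (difflib's ratio), and the helper names are illustrative assumptions rather than part of the patent.

```python
from difflib import SequenceMatcher

# Hypothetical text command lexicon: text command word -> intent information.
TEXT_COMMANDS = {
    "go to the third floor": "confirmation instruction - 3rd floor",
    "go to the fifth floor": "confirmation instruction - 5th floor",
    "cancel the third floor": "cancellation instruction - 3rd floor",
    "open the door": "open-door instruction",
}

def offline_intent_from_text(first_text):
    """Pick the lexicon entry most similar to the recognized text and return its intent."""
    best_word = max(
        TEXT_COMMANDS,
        key=lambda word: SequenceMatcher(None, first_text, word).ratio(),
    )
    return TEXT_COMMANDS[best_word]  # used as the first control information

# Example: a slightly noisy transcription still maps to the intended command word.
print(offline_intent_from_text("go to third floor"))  # "confirmation instruction - 3rd floor"
```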

In the prior art, online speech recognition requires network transmission and therefore responds slowly; it is also easily affected by network quality, and when the network quality is poor, the response delay is large, which makes elevator control inefficient.

In the embodiment of the present invention, a target voice input by a user in an elevator use scenario is received; offline intent recognition is performed on the target voice by using the voice processing method according to the embodiment of the present invention to obtain first control information; and the elevator is controlled to perform a first operation corresponding to the first control information. Performing offline intent recognition with this voice processing method improves recognition efficiency. Since offline recognition generally responds faster than online recognition, obtaining the first control information through offline intent recognition and performing the corresponding first operation guarantees the response speed of elevator control and thus improves its efficiency. Applying this method to elevator control can greatly improve the efficiency of elevator start-up and operation and increase user engagement.

Optionally, the method further includes:

sending the target voice to a server, so that the server performs online intent recognition on the target voice;

receiving second control information sent by the server; and

if the second control information is inconsistent with the first control information, controlling the elevator to cancel the first operation and to perform a second operation corresponding to the second control information.

The second control information is analogous to the first control information and is not described again here. The server performs online intent recognition on the target voice to obtain the second control information; the way of obtaining the second control information is the same as the way the elevator control device performs offline intent recognition to obtain the first control information, and is not repeated here. It should be noted that because the voice command lexicon and the text command lexicon used for online intent recognition are stored in the cloud, their sample data is richer, and the success rate and accuracy of speech recognition are high. For example, the voice command lexicon used to control the elevator may include the voice command word "go to the restaurant", whose corresponding intent information may be "confirmation instruction - 3rd floor", which makes elevator control more intelligent.

If the elevator control device determines that the second control information is consistent with the first control information, it may ignore the second control information and continue to control the elevator to perform the first operation. If it determines that the second control information is inconsistent with the first control information, it may directly control the elevator to cancel the first operation and to perform the second operation corresponding to the second control information; this method is simple and efficient.

Taking the first control information "confirmation instruction - 3rd floor" as an example, the corresponding first operation is going to the 3rd floor; if the second control information is "confirmation instruction - 5th floor", the corresponding second operation is going to the 5th floor, and the elevator can directly cancel the operation of going to the 3rd floor and go to the 5th floor. A sketch of this reconciliation logic follows.
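
A minimal sketch of that reconciliation logic; the elevator interface (go_to_floor, cancel_floor) and the (command, floor) representation of control information are assumptions used only to make the flow concrete.

```python
def reconcile_control(elevator, first_info, second_info):
    """Cancel the offline result and follow the online result when the two disagree.

    Control information is represented here as a (command, floor) tuple,
    e.g. ("confirm", 3); this format is an assumption for illustration.
    """
    if second_info == first_info:
        return  # the online result agrees; keep executing the first operation
    first_cmd, first_floor = first_info
    second_cmd, second_floor = second_info
    if first_cmd == "confirm":
        elevator.cancel_floor(first_floor)    # cancel the first operation
    if second_cmd == "confirm":
        elevator.go_to_floor(second_floor)    # perform the second operation

# Example from the text: offline recognition said 3rd floor, online said 5th floor.
# reconcile_control(elevator, ("confirm", 3), ("confirm", 5))
```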

In this embodiment, online intent recognition is performed by the server. Since the accuracy of online recognition is generally higher than that of offline recognition, if the second control information is inconsistent with the first control information, the elevator is controlled to cancel the first operation and to perform the second operation corresponding to the second control information, which guarantees the accuracy of elevator control.

Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a voice processing device provided by an embodiment of the present invention. As shown in FIG. 3, the voice processing device 200 includes:

an extraction module 201, configured to extract a first voice feature of a voice to be processed;

a separation module 202, configured to perform voice amplitude feature separation processing on the first voice feature to obtain a second voice feature; and

an obtaining module 203, configured to obtain a processing result of the voice to be processed based on the second voice feature.

Optionally, the first voice feature includes a plurality of first feature values, and the second voice feature includes a plurality of second feature values;

the separation module 202 is specifically configured to:

perform feature averaging processing on the plurality of first feature values to obtain a voice amplitude feature value, where the voice amplitude feature value is used to characterize the voice amplitude feature of the voice to be processed; and

based on the voice amplitude feature value, perform voice amplitude feature separation processing on each of the first feature values to obtain the second feature value of the second voice feature corresponding to each of the first feature values.

Optionally, the second voice feature further includes the voice amplitude feature value.

Optionally, the dimension of the second voice feature is greater than or equal to the dimension of the first voice feature.

Optionally, the first voice feature includes a filter banks (filter bank) feature.

The voice processing device can implement each process implemented in the method embodiment of FIG. 1; to avoid repetition, details are not described here again.

Referring to FIG. 4, FIG. 4 is a first schematic structural diagram of an elevator control device provided by an embodiment of the present invention. As shown in FIG. 4, the elevator control device 300 includes:

a first receiving module 301, configured to receive a target voice input by a user in an elevator use scenario;

a recognition module 302, configured to perform offline intent recognition on the target voice by using the voice processing method according to the embodiment of the present invention to obtain first control information; and

a first control module 303, configured to control an elevator to perform a first operation corresponding to the first control information.

Optionally, as shown in FIG. 5, the elevator control device 300 further includes:

a sending module 304, configured to send the target voice to a server, so that the server performs online intent recognition on the target voice;

a second receiving module 305, configured to receive second control information sent by the server; and

a second control module 306, configured to: if the second control information is inconsistent with the first control information, control the elevator to cancel the first operation and to perform a second operation corresponding to the second control information.

The elevator control device can implement each process implemented in the elevator control method of the embodiment of the present invention; to avoid repetition, details are not described here again.

In the embodiments of the present invention, electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers, palmtop computers, vehicle-mounted mobile terminals, wearable devices, elevators, and the like.

Referring to FIG. 6, FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 6, the electronic device 400 includes a memory 402, a processor 401, and a program stored in the memory 402 and executable on the processor 401, wherein:

in one implementation, the processor 401 reads the program in the memory 402 to perform:

extracting a first voice feature of a voice to be processed;

performing voice amplitude feature separation processing on the first voice feature to obtain a second voice feature; and

obtaining a processing result of the voice to be processed based on the second voice feature.

Optionally, the first voice feature includes a plurality of first feature values, and the second voice feature includes a plurality of second feature values;

the performing, by the processor 401, of voice amplitude feature separation processing on the first voice feature to obtain the second voice feature includes:

performing feature averaging processing on the plurality of first feature values to obtain a voice amplitude feature value, where the voice amplitude feature value is used to characterize the voice amplitude feature of the voice to be processed; and

based on the voice amplitude feature value, performing voice amplitude feature separation processing on each of the first feature values to obtain the second feature value of the second voice feature corresponding to each of the first feature values.

Optionally, the second voice feature further includes the voice amplitude feature value.

Optionally, the dimension of the second voice feature is greater than or equal to the dimension of the first voice feature.

Optionally, the first voice feature includes a filter banks (filter bank) feature.

作为另一种实施方式,所述处理器401读取存储器402中的程序,用于执行:As another implementation manner, the processor 401 reads the program in the memory 402 to execute:

接收用户在使用电梯场景下输入的目标语音;Receive the target voice input by the user in the elevator scenario;

采用本发明实施例所述的语音处理方法对所述目标语音进行离线意图识别,得到第一控制信息;Use the voice processing method described in the embodiment of the present invention to perform off-line intention recognition on the target voice to obtain first control information;

控制电梯执行所述第一控制信息对应的第一操作。The elevator is controlled to perform the first operation corresponding to the first control information.

可选的,所述处理器401还用于执行:Optionally, the processor 401 is further configured to execute:

向服务器发送所述目标语音,以使所述服务器对所述目标语音进行在线意图识别;sending the target speech to a server, so that the server performs online intent recognition on the target speech;

接收所述服务器发送的第二控制信息;receiving the second control information sent by the server;

若所述第二控制信息与所述第一控制信息不一致,则控制所述电梯取消执行所述第一操作,并执行所述第二控制信息对应的第二操作。If the second control information is inconsistent with the first control information, control the elevator to cancel the execution of the first operation, and execute the second operation corresponding to the second control information.
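The online verification described above could be sketched as follows; the HTTP endpoint, the payload format, and the use of the requests library are assumptions made for illustration, and the elevator and first_control_info objects are those from the previous sketch.

import requests  # assumed transport; any HTTP or RPC client would serve

SERVER_URL = "https://example.com/online-intent"  # hypothetical online intent-recognition endpoint

def verify_with_server(waveform_bytes: bytes, first_control_info: dict, elevator) -> dict:
    # Send the target speech to the server for online intention recognition.
    response = requests.post(SERVER_URL, data=waveform_bytes, timeout=5.0)
    second_control_info = response.json()  # second control information returned by the server
    # If the two results disagree, cancel the first operation and perform the second one.
    if second_control_info != first_control_info:
        elevator.cancel(first_control_info)
        elevator.execute(second_control_info)
    return second_control_info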

在图6中，总线架构可以包括任意数量的互联的总线和桥，具体由处理器401代表的一个或多个处理器和存储器402代表的存储器的各种电路链接在一起。总线架构还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路链接在一起，这些都是本领域所公知的，因此，本文不再对其进行进一步描述。总线接口提供接口。In FIG. 6, the bus architecture may include any number of interconnected buses and bridges, specifically linking together one or more processors represented by the processor 401 and various circuits of the memory represented by the memory 402. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore are not further described herein. The bus interface provides an interface.

处理器401负责管理总线架构和通常的处理,存储器402可以存储处理器401在执行操作时所使用的数据。The processor 401 is responsible for managing the bus architecture and general processing, and the memory 402 may store data used by the processor 401 in performing operations.

需要说明的是，本发明实施例方法实施例中的任意实施方式都可以被本实施例中的上述电子设备所实现，以及达到相同的有益效果，此处不再赘述。It should be noted that any implementation manner in the method embodiments of the present invention can be implemented by the above electronic device in this embodiment and achieves the same beneficial effects, which are not repeated here.

本发明实施例还提供一种计算机可读存储介质，计算机可读存储介质上存储有计算机程序，该计算机程序被处理器执行时实现上述语音处理方法实施例的各个过程，或者，该计算机程序被处理器执行时实现上述电梯控制方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。其中，所述的计算机可读存储介质，如只读存储器(Read-Only Memory,简称ROM)、随机存取存储器(Random Access Memory,简称RAM)、磁碟或者光盘等。Embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above speech processing method embodiments, or each process of the above elevator control method embodiments, and can achieve the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。It should be noted that, herein, the terms "include", "comprise", or any other variation thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，空调器，或者网络设备等)执行本发明各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present invention.

上面结合附图对本发明的实施例进行了描述，但是本发明并不局限于上述的具体实施方式，上述的具体实施方式仅仅是示意性的，而不是限制性的，本领域的普通技术人员在本发明的启示下，在不脱离本发明宗旨和权利要求所保护的范围情况下，还可做出很多形式，均属于本发明的保护之内。The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. A method of speech processing, the method comprising:
extracting a first voice feature of the voice to be processed;
carrying out voice amplitude characteristic separation processing on the first voice characteristic to obtain a second voice characteristic;
and acquiring a processing result of the voice to be processed based on the second voice characteristic.
2. The method of claim 1, wherein the first speech feature comprises a plurality of first feature values and the second speech feature comprises a plurality of second feature values;
the performing speech amplitude feature separation processing on the first speech feature to obtain a second speech feature includes:
carrying out feature average processing on the plurality of first feature values to obtain a voice amplitude feature value, wherein the voice amplitude feature value is used for representing the voice amplitude feature of the voice to be processed;
and respectively carrying out voice amplitude characteristic separation processing on each first characteristic value based on the voice amplitude characteristic value to obtain second characteristic values corresponding to each first characteristic value in the second voice characteristic.
3. The method of claim 2, wherein the second speech feature further comprises the speech magnitude feature value.
4. The method of claim 1, wherein the dimension of the second speech feature is greater than or equal to the dimension of the first speech feature.
5. The method of claim 1, wherein the first speech feature comprises filter bank features.
6. An elevator control method, characterized in that the method comprises:
receiving target voice input by a user in a scene of using an elevator;
performing offline intention recognition on the target voice by adopting the voice processing method of any one of claims 1 to 5 to obtain first control information;
and controlling the elevator to execute a first operation corresponding to the first control information.
7. The method of claim 6, further comprising:
sending the target voice to a server so that the server performs online intention recognition on the target voice;
receiving second control information sent by the server;
and if the second control information is inconsistent with the first control information, controlling the elevator to cancel the execution of the first operation and executing a second operation corresponding to the second control information.
8. A speech processing apparatus, characterized in that the speech processing apparatus comprises:
the extraction module is used for extracting a first voice feature of the voice to be processed;
the separation module is used for carrying out voice amplitude characteristic separation processing on the first voice characteristic to obtain a second voice characteristic;
and the acquisition module is used for acquiring the processing result of the voice to be processed based on the second voice characteristic.
9. An elevator control device, characterized by comprising:
the first receiving module is used for receiving target voice input by a user in an elevator using scene;
the recognition module is used for performing offline intention recognition on the target voice by adopting the voice processing method of any one of claims 1 to 5 to obtain first control information;
and the first control module is used for controlling the elevator to execute the first operation corresponding to the first control information.
10. An electronic device, comprising: a memory, a processor and a program stored on the memory and executable on the processor, the program, when executed by the processor, implementing the steps in the speech processing method according to any of claims 1 to 5; alternatively, the program realizes the steps in the elevator control method according to any one of claims 6 to 7 when executed by the processor.
CN202010325555.9A 2020-04-23 2020-04-23 Voice processing method and device and elevator control method and device Pending CN111489740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010325555.9A CN111489740A (en) 2020-04-23 2020-04-23 Voice processing method and device and elevator control method and device

Publications (1)

Publication Number Publication Date
CN111489740A true CN111489740A (en) 2020-08-04

Family

ID=71813137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010325555.9A Pending CN111489740A (en) 2020-04-23 2020-04-23 Voice processing method and device and elevator control method and device

Country Status (1)

Country Link
CN (1) CN111489740A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4829578A (en) * 1986-10-02 1989-05-09 Dragon Systems, Inc. Speech detection and recognition apparatus for use with background noise of varying levels
JP2005079781A (en) * 2003-08-29 2005-03-24 Nippon Telegr & Teleph Corp <Ntt> Blind signal separation method, blind signal separation program and recording medium
WO2017081977A1 (en) * 2015-11-12 2017-05-18 三菱電機株式会社 Motor control device and elevator in which same is used
US20170294195A1 (en) * 2016-04-07 2017-10-12 Canon Kabushiki Kaisha Sound discriminating device, sound discriminating method, and computer program
CN106935248A (en) * 2017-02-14 2017-07-07 广州孩教圈信息科技股份有限公司 A kind of voice similarity detection method and device
CN107464567A (en) * 2017-07-24 2017-12-12 深圳云知声信息技术有限公司 Audio recognition method and device
CN110890087A (en) * 2018-09-10 2020-03-17 北京嘉楠捷思信息技术有限公司 Voice recognition method and device based on cosine similarity
CN110097884A (en) * 2019-06-11 2019-08-06 大众问问(北京)信息科技有限公司 A kind of voice interactive method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林麒麟 等: "基于语音识别的电梯辅助控制系统设计" (Lin Qilin et al., "Design of an Elevator Auxiliary Control System Based on Speech Recognition") *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113571069A (en) * 2021-08-03 2021-10-29 北京房江湖科技有限公司 Information processing method, device and storage medium

Similar Documents

Publication Publication Date Title
JP7258182B2 (en) Speech processing method, device, electronic device and computer program
US10178228B2 (en) Method and apparatus for classifying telephone dialing test audio based on artificial intelligence
US10810993B2 (en) Sample-efficient adaptive text-to-speech
CN108922553B (en) Direction-of-arrival estimation method and system for sound box equipment
CN108417224B (en) Method and system for training and recognition of bidirectional neural network model
US7987090B2 (en) Sound-source separation system
CN111583906A (en) Character recognition method, device and terminal for voice conversation
JP7664330B2 (en) Turn off text echo
US10978089B2 (en) Method, apparatus for blind signal separating and electronic device
CN114242100A (en) Audio signal processing method, training method and device, equipment and storage medium thereof
CN118985025A (en) General automatic speech recognition for joint acoustic echo cancellation, speech enhancement and speech separation
CN113299306A (en) Echo cancellation method, echo cancellation device, electronic equipment and computer-readable storage medium
JP7548482B2 (en) Voice call control method, device, computer program, and electronic device
CN111489740A (en) Voice processing method and device and elevator control method and device
WO2022063215A1 (en) Feature domain speech enhancement method combined with ai model, and related product
CN114220430A (en) Multi-sound zone voice interaction method, device, device and storage medium
WO2025061091A1 (en) Training method and apparatus for wake-on-voice model, executing method for wake-on-voice model, and device and storage medium
WO2020015546A1 (en) Far-field speech recognition method, speech recognition model training method, and server
CN111798862A (en) Audio noise reduction method, system, device and storage medium
CN114171043B (en) Echo determination method, device, equipment and storage medium
CN116994568A (en) Optimization method of vehicle-mounted voice recognition model, vehicle-mounted voice recognition method and device
CN113763978B (en) Voice signal processing method, device, electronic equipment and storage medium
CN112185346B (en) Multilingual voice keyword detection and model generation method and electronic equipment
CN108766430B (en) A method and system for speech feature mapping based on Babbitt distance
CN110648681A (en) Voice enhancement method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20201116
Address after: 266100 Room 2002, 20th Floor, Building 2, Darong Century Complex (Darong Center), 180 Haier Road, Laoshan District, Qingdao City, Shandong Province
Applicant after: Shandong Shengzhi Wulian Technology Co.,Ltd.
Address before: Room 306, floor 3, NO.67, Beisihuan West Road, Haidian District, Beijing 100098
Applicant before: SOUNDAI TECHNOLOGY Co.,Ltd.
RJ01 Rejection of invention patent application after publication
Application publication date: 20200804