CN112509596B - Wakeup control method, wakeup control device, storage medium and terminal - Google Patents
- Publication number
- CN112509596B (application CN202011303745.7A / CN202011303745A)
- Authority
- CN
- China
- Prior art keywords
- confidence
- target
- audio data
- confidence coefficient
- wake
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Mathematical Physics (AREA)
- Circuit For Audible Band Transducer (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
The disclosure relates to a wake-up control method, a wake-up control device, a storage medium and a terminal, wherein the method comprises the following steps: collecting multiple paths of audio data; performing signal processing on each path of first audio data collected in a target time period to obtain a plurality of first target audio data; respectively acquiring first confidences of the plurality of first target audio data, wherein a confidence is used for characterizing the probability that audio data can wake up the terminal; acquiring a second confidence of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence comprises confidences of a plurality of second target audio data; and determining whether to wake up the terminal according to the first confidence and the second confidence. That is, whether to wake up the terminal can be determined according to the first confidence of the target time period and the second confidence of the historical time period, so that the probability of the terminal being falsely woken up or failing to be woken up can be reduced, and the accuracy of the voice recognition system can be improved.
Description
Technical Field
The disclosure relates to the technical field of terminals, and in particular relates to a wake-up control method, a wake-up control device, a storage medium and a terminal.
Background
With the development of technology, more and more intelligent devices are entering users' daily lives, and applications such as voice control, voice input and voice activation are becoming increasingly popular on these devices. By being equipped with a voice recognition system, an intelligent device can collect a user's voice data in real time and, according to the voice data, execute the control instructions issued by the user so as to interact with the user.
However, in a real environment, noise interference makes erroneous recognition highly likely when responding to a user's control instruction on the basis of real-time voice data, so the accuracy of existing voice recognition systems is low.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a wake-up control method, device, storage medium and terminal.
According to a first aspect of embodiments of the present disclosure, there is provided a wake-up control method, the method including: collecting multiple paths of audio data; performing signal processing on each path of first audio data collected in a target time period to obtain a plurality of first target audio data; respectively acquiring first confidences of the plurality of first target audio data, wherein a confidence is used for characterizing the probability that audio data can wake up a terminal; acquiring a second confidence of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence comprises confidences of a plurality of second target audio data; and determining whether to wake up the terminal according to the first confidence and the second confidence.
Optionally, the performing signal processing on each path of first audio data collected in the target time period to obtain a plurality of first target audio data includes: selecting one microphone of a microphone array of the terminal as a reference channel; acquiring reference audio data collected by the reference channel in the target time period; and performing signal processing on each path of first audio data through a plurality of signal processing modes according to the reference audio data to obtain the plurality of first target audio data, wherein the signal processing modes of different paths of first audio data are different.
Optionally, the respectively acquiring first confidences of the plurality of first target audio data includes: for each first target audio data in the plurality of first target audio data, determining a signal processing mode corresponding to the first target audio data; determining a target decoder corresponding to the first target audio data according to the signal processing mode, wherein different signal processing modes correspond to different decoders; and inputting the first target audio data into the target decoder for decoding to obtain a first confidence of the first target audio data.
Optionally, the determining, according to the signal processing manner, the target decoder corresponding to the first target audio data includes: determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relation, wherein the decoder association relation comprises the corresponding relation between different signal processing modes and the decoders; and taking the decoder corresponding to the signal processing mode as the target decoder.
Optionally, the determining whether to wake up the terminal according to the first confidence and the second confidence includes: in a case where a first confidence output by one target decoder is obtained, executing the following wake-up processing mode according to the first confidence output by that target decoder, until the terminal is woken up or the wake-up processing mode has been executed according to the first confidences output by a plurality of target decoders; the wake-up processing mode includes: determining whether to wake up the terminal according to the second confidence and the first confidence output by the target decoder.
Optionally, the determining whether to wake up the terminal according to the first confidence and the second confidence includes: determining a target confidence from the second confidence, wherein the target confidence and the first confidence are confidences obtained by decoding with the same decoder; acquiring a weight value corresponding to the first confidence according to the target confidence and a third confidence, wherein the third confidence comprises the confidences in the second confidence other than the target confidence; determining a final confidence according to the weight value and the first confidence; and determining whether to wake up the terminal according to the final confidence.
Optionally, the acquiring, according to the target confidence and the third confidence, a weight value corresponding to the first confidence includes: acquiring a confidence difference between the target confidence and the third confidence; and acquiring the weight value corresponding to the first confidence according to the confidence difference and a preset correspondence.
Optionally, the determining whether to wake up the terminal according to the final confidence includes: determining to wake up the terminal in a case where the final confidence is greater than or equal to a preset confidence threshold.
Optionally, the signal processing modes include blind source separation or noise suppression.
According to a second aspect of embodiments of the present disclosure, there is provided a wake-up control device, the device including: a data acquisition module configured to collect multiple paths of audio data; a signal processing module configured to perform signal processing on each path of first audio data collected in a target time period to obtain a plurality of first target audio data; a first confidence acquisition module configured to respectively acquire first confidences of the plurality of first target audio data, wherein a confidence is used for characterizing the probability that audio data can wake up a terminal; a second confidence acquisition module configured to acquire a second confidence of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence comprises confidences of a plurality of second target audio data; and a wake-up module configured to determine whether to wake up the terminal according to the first confidence and the second confidence.
Optionally, the signal processing module includes: a channel selection sub-module configured to select one microphone of a microphone array of the terminal as a reference channel; a reference data acquisition sub-module configured to acquire reference audio data acquired by the reference channel in the target time period; and the signal processing sub-module is configured to respectively perform signal processing on each path of the first audio data according to the reference audio data in a plurality of signal processing modes to obtain a plurality of first target audio data, wherein the signal processing modes of the first audio data of different paths are different.
Optionally, the first confidence acquiring module includes: a processing mode determining sub-module configured to determine, for each of the plurality of first target audio data, a signal processing mode corresponding to the first target audio data; a decoder determining sub-module configured to determine a target decoder corresponding to the first target audio data according to the signal processing manner, wherein different signal processing manners correspond to different decoders; the confidence determining submodule is configured to input the first target audio data into the target decoder for decoding processing to obtain a first confidence of the first target audio data.
Optionally, the decoder determination submodule is configured to: determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relation, wherein the decoder association relation comprises the corresponding relation between different signal processing modes and the decoders; and taking the decoder corresponding to the signal processing mode as the target decoder.
Optionally, the wake-up module includes: a wake-up processing sub-module configured to, in a case where a first confidence output by one target decoder is obtained, execute the following wake-up processing mode according to the first confidence output by that target decoder, until the terminal is woken up or the wake-up processing mode has been executed according to the first confidences output by a plurality of target decoders; the wake-up processing mode includes: determining whether to wake up the terminal according to the second confidence and the first confidence output by the target decoder.
Optionally, the wake-up module includes: a target confidence determining submodule configured to determine a target confidence from the second confidence, the target confidence and the first confidence being confidence decoded by the same decoder; the weight value determining submodule is configured to obtain a weight value corresponding to the first confidence coefficient according to the target confidence coefficient and a third confidence coefficient, wherein the third confidence coefficient comprises other confidence coefficients except the target confidence coefficient in the second confidence coefficient; a final confidence coefficient obtaining sub-module configured to obtain a final confidence coefficient according to the weight value and the first confidence coefficient; and the wake-up sub-module is configured to determine whether to wake up the terminal according to the final confidence.
Optionally, the weight value determination submodule is further configured to: acquiring a confidence coefficient difference value between the target confidence coefficient and the third confidence coefficient; and acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
Optionally, the wake-up sub-module is further configured to: determine to wake up the terminal in a case where the final confidence is greater than or equal to a preset confidence threshold.
Optionally, the signal processing modes include blind source separation or noise suppression.
According to a third aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the wake-up control method provided by the first aspect of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a terminal comprising: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the wake-up control method provided in the first aspect of the disclosure.
The technical solution provided by the embodiments of the present disclosure can include the following beneficial effects: collecting multiple paths of audio data; performing signal processing on each path of first audio data collected in a target time period to obtain a plurality of first target audio data; respectively acquiring first confidences of the plurality of first target audio data, wherein a confidence is used for characterizing the probability that audio data can wake up a terminal; acquiring a second confidence of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence comprises confidences of a plurality of second target audio data; and determining whether to wake up the terminal according to the first confidence and the second confidence. That is, the present disclosure may determine whether to wake up the terminal according to the first confidence of the target time period and the second confidence of the historical time period, so that the probability of the terminal being falsely woken up or failing to be woken up may be reduced, and the accuracy of the voice recognition system may thereby be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a wake-up control method according to an exemplary embodiment;
fig. 2 is a schematic diagram illustrating a structure of a terminal according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating another wake-up control method according to an exemplary embodiment;
FIG. 4 is a schematic diagram of a wake-up control device, according to an exemplary embodiment;
Fig. 5 is a block diagram of a terminal according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
First, an application scenario of the present disclosure will be described. The present disclosure can be applied to a terminal with a voice recognition function. In a real environment, because environmental noise affects the voice recognition system, the probability of the terminal being falsely woken up or failing to be woken up is high. Considering that a single microphone cannot effectively handle noise, particularly noise whose frequency response varies with time, such as music, in the related art a microphone array may be employed in the terminal in order to cope with different noise scenarios, such as background noise, washing machines and televisions, and prediction is performed by a plurality of decoders to determine whether to wake up the terminal.
However, when prediction is performed by a plurality of decoders, the terminal is woken up as soon as the prediction result output by any one decoder indicates a wake-up. In this case, if the prediction accuracy of that decoder is low, the terminal is falsely woken up, so the accuracy of the terminal's voice recognition system is low and the user experience is affected.
In order to solve the above problems, the present disclosure provides a wake-up control method, a wake-up control device, a storage medium and a terminal. A plurality of first target audio data are obtained by performing signal processing on each path of first audio data collected in a target time period, and first confidences of the plurality of first target audio data are respectively acquired; after that, a second confidence of second target audio data in a historical time period may be acquired, and whether to wake up the terminal is determined according to the first confidence and the second confidence. That is, the present disclosure may determine whether to wake up the terminal according to the first confidence of the target time period and the second confidence of the historical time period, so that the probability of the terminal being falsely woken up or failing to be woken up may be reduced, and the accuracy of the voice recognition system may be improved.
The present disclosure is described below in connection with specific embodiments.
FIG. 1 is a flowchart illustrating a wake-up control method according to an exemplary embodiment. As shown in FIG. 1, the method includes:
S101, collecting multiple paths of audio data.
It should be noted that the wake-up control method is applied to a terminal device with a voice interaction function, for example, a terminal device installed with an application having the voice interaction function, such as a voice assistant application used to recognize the user's voice information. The embodiments of the present disclosure may be applied to various terminal devices, including but not limited to stationary devices and mobile devices. The stationary devices include, but are not limited to: personal computers (PC), televisions, air conditioners, wall-hung boilers and the like; the mobile devices include, but are not limited to: mobile phones, tablet computers, wearable devices, speakers, alarm clocks and the like, which is not limited in the present disclosure. Fig. 2 is a schematic structural diagram of a terminal according to an exemplary embodiment. As shown in Fig. 2, the terminal may include a microphone array, a signal processing module, decoders and a wake-up module, wherein the microphone array may include a plurality of microphones and there may likewise be a plurality of decoders. The terminal can collect multiple paths of audio data in real time through the microphone array and send the multiple paths of audio data to the signal processing module; the signal processing module processes the multiple paths of audio data to obtain processed target audio data; the plurality of target audio data are then decoded by the plurality of decoders to obtain a plurality of confidences; and finally whether to wake up the terminal is determined according to the confidences.
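For ease of understanding, the data flow of the terminal shown in Fig. 2 can be illustrated by the following schematic code. This is only a minimal sketch: the names run_wakeup_pipeline, mic_array.read, processors, decoders and decide_wakeup are hypothetical placeholders chosen for illustration and are not defined by this disclosure.

```python
# A minimal sketch of the data flow in Fig. 2 (hypothetical helper names).

def run_wakeup_pipeline(mic_array, processors, decoders, decide_wakeup):
    """One pass over a target time period.

    mic_array     -- object whose read() returns one audio buffer per microphone path
    processors    -- signal-processing callables, one per path (e.g. blind source
                     separation, noise suppression)
    decoders      -- decoder callables, one per processing manner, each returning a confidence
    decide_wakeup -- callable that combines current and historical confidences (S104/S105)
    """
    first_audio_paths = mic_array.read()                                            # S101: multi-path collection
    first_targets = [p(path) for p, path in zip(processors, first_audio_paths)]     # S102: per-path processing
    first_confidences = [d(t) for d, t in zip(decoders, first_targets)]             # S103: decoding
    return decide_wakeup(first_confidences)                                         # S104/S105: wake-up decision
```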
In this step, after the terminal is started, the acquisition module of the terminal may acquire multiple paths of audio data through multiple microphones in the microphone array, where each microphone corresponds to one path of audio data.
S102, respectively performing signal processing on each path of first audio data acquired in the target time period to obtain a plurality of first target audio data.
In this step, the corresponding signal processing method may be preset according to the environment in which the terminal is used, for example, more signal processing methods may be set for terminals that are often used in noisy environments, such as mobile phones, and fewer signal processing methods may be set for terminals that are used in relatively quiet environments, such as air conditioners.
After the multiple paths of first audio data acquired in the target time period are acquired, the first audio data can be subjected to signal processing in a signal processing mode preset by the terminal, so that multiple first target audio data are obtained.
S103, respectively acquiring first confidence degrees of a plurality of first target audio data.
The confidence is used to characterize the probability that the audio data can wake up the terminal, and the confidence can range from 0 to 100.
In this step, after the plurality of first target audio data are acquired, decoding processing may be performed on the first target audio data by a decoder for each first target audio data, to obtain a first confidence coefficient of the first target audio data, and finally obtain a plurality of first confidence coefficients.
S104, acquiring a second confidence coefficient of the second target audio data in the historical time period.
The historical time period is a preset time period before the target time period, and the historical time period may be a time period belonging to the same scene as the target time period, for example, the historical time period and the target time period both belong to a time period for collecting audio data in a voice wake scene. In addition, the duration of the preset time period may be set according to the type of the terminal, or may be set according to a test experience value, which is not limited in the present disclosure.
The second confidence may include confidences of a plurality of second target audio data, the second target audio data being the audio data obtained after signal processing of second audio data. The second confidence may be acquired in the same way as the first confidence, which is not described again here. In addition, the second confidence may be stored in the terminal; for example, if the preset time period is 1 minute, the second confidences of the second target audio data within 1 minute before the target time period may be stored. For example, the present disclosure may store the second confidences in queues, and different second confidences may correspond to different queues; for example, if there are 10 second confidences, the 10 second confidences may be stored using 10 queues. For each queue, the second confidence at the head of the queue is that of the earliest-collected second target audio data, and the second confidence at the tail of the queue is that of the most recently collected second target audio data; when a new second confidence needs to be stored, the second confidence at the head of the queue may be deleted and the new second confidence stored at the tail of the queue.
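As an illustration of the queue behaviour described above, a fixed-length first-in-first-out queue can be kept for each decoder, so that appending a new second confidence automatically discards the oldest one. The following is a minimal sketch; the window length of 20 and the queue keys are illustrative assumptions, not values prescribed by this disclosure.

```python
from collections import deque

WINDOW = 20  # illustrative: how many recent second confidences are kept per decoder

# One bounded queue per decoder/processing manner; appending to a full deque drops the
# oldest entry, matching "delete the head of the queue, store the new value at the tail".
history = {
    "blind_source_separation": deque(maxlen=WINDOW),
    "noise_suppression": deque(maxlen=WINDOW),
}

def store_second_confidence(decoder_name, confidence):
    history[decoder_name].append(confidence)

def recent_confidences(decoder_name):
    return list(history[decoder_name])  # oldest first, newest last
```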
In this step, after the first confidence degrees of the plurality of first target audio data are acquired, a plurality of stored second confidence degrees may be acquired.
S105, determining whether to wake up the terminal according to the first confidence coefficient and the second confidence coefficient.
In this step, after the first confidences of the first target audio data are obtained, whether to wake up the terminal may be determined according to any one of the first confidences and the second confidence. Considering that environmental noise changes little within a short period of time, and in order to avoid the terminal being falsely woken up or missing a wake-up because a first confidence is not accurate enough, the first confidence can be adjusted according to the second confidence. For example, if the obtained first confidence is relatively high while the second confidence of the second target audio data obtained in the historical time period before the first target audio data corresponding to this first confidence is relatively low, the first confidence may carry a large error; in this case, the first confidence may be reduced with reference to the second confidence, so that a more accurate first confidence is obtained.
Further, after the adjusted first confidence coefficient is obtained, whether to wake up the terminal may be determined according to the adjusted first confidence coefficient, for example, in the case that the adjusted first confidence coefficient is higher, it may be determined to wake up the terminal, and in the case that the adjusted first confidence coefficient is lower, it may be determined not to wake up the terminal.
With the above method, a plurality of first target audio data are obtained by performing signal processing on each path of first audio data collected in the target time period, and first confidences of the plurality of first target audio data are respectively acquired; then a second confidence of second target audio data in a historical time period may be acquired, and whether to wake up the terminal is determined according to the first confidence and the second confidence. That is, the present disclosure may determine whether to wake up the terminal according to the first confidence of the target time period and the second confidence of the historical time period, so that the probability of the terminal being falsely woken up or failing to be woken up may be reduced, and the accuracy of the voice recognition system may be improved.
FIG. 3 is a flow chart illustrating another wake-up control method, as shown in FIG. 3, according to an exemplary embodiment, including:
S301, collecting multiple paths of audio data.
S302, selecting one microphone of the microphone array of the terminal as a reference channel.
It should be noted that, while the terminal collects the multiple paths of audio data through the collection module, the terminal may also output audio data, for example, the terminal is playing music, playing video, playing ring tones, etc., where the multiple paths of audio data collected by the terminal may also include the audio data output by the terminal. When the terminal performs voice recognition, voice input by a user needs to be extracted from the collected audio data, so as shown in fig. 2, one microphone in the microphone array of the terminal can be used as a reference channel, and the audio data output by the terminal can be obtained through the reference channel.
S303, acquiring the reference audio data acquired by the reference channel in the target time period.
In this step, while collecting the multiple paths of audio data, the terminal may acquire in real time, through the reference channel, the reference audio data output by the terminal itself; in other words, when the terminal collects the first audio data of the target time period, it synchronously acquires the reference audio data of the target time period.
S304, according to the reference audio data, signal processing is carried out on each path of the first audio data through a plurality of signal processing modes, so as to obtain a plurality of first target audio data.
The signal processing modes of the first audio data of different paths are different, and the signal processing modes can comprise blind source separation or noise suppression.
In this step, after the multiple paths of first audio data collected in the target time period are obtained, signal processing may be performed on the multiple paths of first audio data according to the reference audio data through the plurality of signal processing manners preset by the terminal, and the noise in the first audio data is filtered out, so as to obtain the speech in the first audio data, that is, the first target audio data. For example, if the first audio data includes two paths and the signal processing manners include blind source separation and noise suppression, signal processing may be performed on the two paths of first audio data according to the reference audio data through the blind source separation manner and the noise suppression manner respectively, so as to obtain two first target audio data. Because the reference audio data output by the terminal in the target time period is acquired synchronously, part of the noise in the first audio data can be filtered out according to the reference audio data during signal processing, so that the complexity of the signal processing can be reduced, the wake-up delay of the terminal can be shortened, and the user experience can be improved.
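As a concrete illustration of this step, each path of first audio data can be routed to its own signal processing routine together with the synchronously acquired reference audio data. In the sketch below, blind_source_separation and noise_suppression are hypothetical placeholders standing for whatever concrete implementations the terminal uses; they are not specified by this disclosure.

```python
def blind_source_separation(path_audio, reference_audio):
    # Placeholder: separate the user's speech from the mixture, using the reference
    # signal (audio the terminal itself is playing back) to remove that component.
    raise NotImplementedError

def noise_suppression(path_audio, reference_audio):
    # Placeholder: suppress noise in this path, again with help of the reference signal.
    raise NotImplementedError

# Different paths use different signal-processing manners (S304).
PROCESSORS = [blind_source_separation, noise_suppression]

def process_target_period(first_audio_paths, reference_audio):
    """first_audio_paths holds one buffer per microphone path for the target time period."""
    return [proc(path, reference_audio)
            for proc, path in zip(PROCESSORS, first_audio_paths)]
```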
S305, determining a signal processing mode corresponding to each first target audio data in the plurality of first target audio data.
Wherein, the first audio data of different ways corresponds to different signal processing modes.
In this step, after each path of first audio data is collected, signal processing is performed on that path of first audio data using its corresponding signal processing manner to obtain the first target audio data corresponding to that path; therefore, each first target audio data corresponds to one signal processing manner. After the plurality of first target audio data are obtained, the signal processing manner corresponding to each first target audio data can be determined.
S306, determining a target decoder corresponding to the first target audio data according to the signal processing mode.
Different signal processing manners correspond to different decoders, and the parameters of the different decoders may differ; for example, the parameters may be determined according to the type of signal processing manner, and different parameters may be set for different types of signal processing manners, which is not limited in the present disclosure.
In this step, after determining the signal processing manner corresponding to the first target audio data, a decoder corresponding to the signal processing manner may be determined from a plurality of decoders through a preset decoder association relationship, where the decoder association relationship may include correspondence between different signal processing manners and decoders, and the decoder corresponding to the signal processing manner is taken as the target decoder.
S307, inputting the first target audio data into the target decoder for decoding processing, and outputting the first confidence coefficient of the first target audio data.
In this step, after the target decoder corresponding to each first target audio data is obtained, the first target audio data may be input into the target decoder for the first target audio data, and the first target audio data is decoded by the target decoder to obtain a first confidence coefficient of the first target audio data.
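Steps S305 to S307 amount to looking up, for each first target audio data, the decoder associated with the signal processing manner that produced it, and then decoding. A minimal sketch follows; the Decoder class, its decode() method and the entries of DECODER_MAP are illustrative assumptions rather than structures required by this disclosure.

```python
class Decoder:
    """Illustrative stand-in for a wake-word decoder tuned to one signal-processing manner."""

    def __init__(self, name):
        self.name = name

    def decode(self, target_audio):
        # Placeholder: return the confidence that target_audio can wake up the terminal.
        raise NotImplementedError

# Preset decoder association relation: signal-processing manner -> decoder (S306).
DECODER_MAP = {
    "blind_source_separation": Decoder("bss_decoder"),
    "noise_suppression": Decoder("ns_decoder"),
}

def first_confidence_of(processing_manner, target_audio):
    target_decoder = DECODER_MAP[processing_manner]   # S305/S306: look up the target decoder
    return target_decoder.decode(target_audio)        # S307: decode to obtain the first confidence
```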
S308, acquiring a second confidence coefficient of the second target audio data in the historical time period.
The historical time period is a preset time period before the target time period, the historical time period may be a time period belonging to the same scene as the target time period, the duration of the preset time period may be set according to the type of the terminal, and the duration of the preset time period may also be set according to a test experience value, which is not limited in this disclosure.
The second confidence may include confidences of a plurality of second target audio data, the second target audio data being the audio data obtained after signal processing of second audio data. The method for acquiring the second confidence may refer to the method for acquiring the first confidence, which is not described again here. In addition, the second confidence may be stored in the terminal; for example, if the preset time period is 1 minute, the second confidences of the second target audio data within 1 minute before the target time period may be stored. For example, the present disclosure may store the second confidences in queues, and different second confidences may correspond to different queues; for example, if there are 10 second confidences, the 10 second confidences may be stored using 10 queues. For each queue, the second confidence at the head of the queue is that of the earliest-collected second target audio data, and the second confidence at the tail of the queue is that of the most recently collected second target audio data; when a new second confidence needs to be stored, the second confidence at the head of the queue may be deleted and the new second confidence stored at the tail of the queue.
S309, determining the target confidence degree from the second confidence degree.
Wherein the target confidence and the first confidence are confidence levels obtained by decoding by the same decoder.
In this step, the target decoder corresponding to the first confidence is determined first; then, according to that target decoder, the confidence obtained by decoding with the same decoder is selected from the second confidences as the target confidence.
S310, acquiring a weight value corresponding to the first confidence coefficient according to the target confidence coefficient and the third confidence coefficient.
Wherein the third confidence level includes other confidence levels in the second confidence level than the target confidence level.
In this step, after the target confidence is obtained, the third confidence may be determined according to the target confidence, the confidence difference between the target confidence and the third confidence may then be obtained, and the weight value corresponding to the first confidence may be acquired according to the confidence difference and a preset correspondence. If the third confidence includes only one confidence, the confidence difference between the target confidence and the third confidence can be calculated directly; if the third confidence includes a plurality of confidences, an average of the plurality of third confidences may be obtained first, and then the confidence difference between the target confidence and this average may be calculated.
In one possible implementation, the preset correspondence may be a preset weight-value relationship, where the weight-value relationship includes the correspondence between confidence differences and weight values; after the confidence difference between the target confidence and the third confidence is obtained, the weight value corresponding to that confidence difference may be determined through the weight-value relationship. The weight-value relationship may be set empirically; for example, the weight value may be 1.1 when the confidence difference is 0.15, and 0.8 when the confidence difference is -0.2.
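Under this implementation, the preset correspondence can be kept as a small table of (confidence difference, weight value) pairs and looked up once the difference between the target confidence and the (averaged) third confidence is known. In the sketch below, only the two example points mentioned above come from the text; the middle entry and the nearest-neighbour lookup rule are assumptions made for illustration.

```python
# Illustrative empirical table: confidence difference -> weight value.
WEIGHT_TABLE = [
    (-0.2, 0.8),
    (0.0, 1.0),
    (0.15, 1.1),
]

def weight_from_difference(target_conf, third_confs):
    # Average the third confidences when there are several of them (see above).
    third = sum(third_confs) / len(third_confs)
    diff = target_conf - third
    # Return the weight value of the closest tabulated difference (assumed lookup rule).
    return min(WEIGHT_TABLE, key=lambda entry: abs(entry[0] - diff))[1]
```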
In another possible implementation, the preset correspondence for the weight value of any one decoder may be the following calculation formula:

ratio = 1 + (A_smooth - B_smooth) / a   (1)

where ratio is the weight value, A_smooth is the target confidence, B_smooth is the third confidence, and a is a preset constant.
After the confidence difference between the target confidence and the third confidence is obtained, the weight value corresponding to the first confidence can be calculated from the confidence difference and the preset constant through formula (1).
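Expressed in code, formula (1) and the subsequent weighting of the first confidence reduce to a few lines. In the sketch below, the value chosen for the preset constant a is purely illustrative; the disclosure only requires it to be a preset constant.

```python
A_CONST = 1.0  # the preset constant "a" in formula (1); 1.0 here is purely illustrative

def weight_from_formula(target_conf, third_conf):
    # ratio = 1 + (A_smooth - B_smooth) / a
    return 1.0 + (target_conf - third_conf) / A_CONST

def final_confidence(first_conf, ratio):
    # One embodiment of S311: final confidence = first confidence * weight value.
    return first_conf * ratio
```

With these helpers, step S312 then reduces to comparing final_confidence(first_conf, weight_from_formula(target_conf, third_conf)) with the preset confidence threshold.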
S311, obtaining a final confidence coefficient according to the weight value and the first confidence coefficient.
In some embodiments of this step, after the weight value is obtained, the final confidence level may be obtained by multiplying the first confidence level by the weight value. In other embodiments, the final confidence level may also be a sum, difference, or division of the first confidence level and the weight value. The present disclosure is not limited to how to obtain the final confidence according to the weight value and the first confidence, and may be set according to different needs.
When the target confidence is greater than the third confidence, the final confidence is greater than the first confidence output by the target decoder; when the target confidence is less than the third confidence, the final confidence is less than the first confidence output by the target decoder. In this way, the first confidence can be corrected through the second confidence of the historical time period, so that a more accurate confidence is obtained and the accuracy of the terminal's voice recognition system is improved.
S312, determining whether to wake up the terminal according to the final confidence.
In this step, after the final confidence is obtained, a preset confidence threshold may first be acquired, the final confidence is compared with the confidence threshold, and the terminal is determined to be woken up when the final confidence is greater than or equal to the preset confidence threshold. The preset confidence threshold may be determined according to the type of the terminal; for example, a lower preset confidence threshold, such as 0.7, may be set for a terminal with a higher requirement on the wake-up rate but a lower requirement on the false-alarm rate, and a higher preset confidence threshold, such as 0.9, may be set for a terminal with a lower requirement on the wake-up rate but a higher requirement on the false-alarm rate. The preset confidence threshold may also be determined through testing according to the terminal's requirements on the wake-up rate and the false-alarm rate; the manner of setting the preset confidence threshold is not limited in the present disclosure.
It should be noted that the first confidence in the above steps S308 to S312 may be any one of the plurality of first confidences. However, it should be considered that after the first audio data of the target time period is collected, the first audio data may be signal-processed through a plurality of signal processing manners to obtain a plurality of first target audio data, and the plurality of first target audio data are then input into a plurality of target decoders to obtain the first confidences of the plurality of first target audio data. Since the time taken by the different signal processing manners to process the first audio data differs, the time at which each first target audio data is obtained differs, the time at which each first target audio data is input into its target decoder differs, and the time taken by each target decoder to decode its first target audio data also differs; consequently, the time at which each target decoder outputs the first confidence of its first target audio data differs as well.
For the above reasons, if an arbitrarily selected first confidence happens to be output slowly, the wake-up delay becomes relatively long and the user experience is affected. Therefore, in order to avoid an overly long wake-up delay and the poor user experience it causes, whenever the first confidence output by one target decoder is obtained, the following wake-up processing mode may be executed according to the first confidence output by that target decoder, until the terminal is woken up or the wake-up processing mode has been executed according to the first confidences output by the plurality of target decoders.
The wake-up processing mode includes: determining whether to wake up the terminal according to the second confidence and the first confidence output by the target decoder.
For example, if the terminal includes the target decoder a, the target decoder B, and the target decoder C, the wake-up processing manner may be executed according to the first confidence level output by the target decoder B when the target decoder B outputs the first confidence level. If the terminal is determined to be awakened after the awakening processing mode is executed, stopping executing the awakening processing mode, and directly awakening the terminal; if the terminal is determined not to be awakened after the awakening processing mode is executed, the first confidence coefficient output by the next target decoder can be obtained, if the first confidence coefficient output by the next target decoder is the target decoder A, the awakening processing mode can be continuously executed according to the first confidence coefficient output by the target decoder A, if the terminal is determined to be awakened according to the awakening processing mode, the execution of the awakening processing mode is stopped, and the terminal can be directly awakened; if the terminal is determined not to be awakened after the awakening processing mode is executed, the first confidence coefficient output by the target decoder C can be obtained, the awakening processing mode is continuously executed according to the first confidence coefficient output by the target decoder C, and whether the terminal is awakened is determined. Therefore, the terminal can be determined whether to wake up according to the first confidence coefficient which is output first without waiting for the first confidence coefficient which is output by the specific target decoder, so that the wake-up delay time can be shortened, and the user experience can be improved.
With the above method, the target confidence can be determined from the second confidence according to the first confidence output by the target decoder, the weight value corresponding to the first confidence is acquired according to the target confidence and the third confidence, the final confidence is determined according to the weight value and the first confidence, and whether to wake up the terminal is determined according to the final confidence. In this way, the first confidence can be adjusted according to the target confidence and the third confidence, so that a more accurate final confidence can be obtained and the accuracy of the terminal's voice recognition system is higher. In addition, the wake-up processing mode is executed as soon as the first confidence output by one target decoder is obtained, so that the wake-up efficiency of the terminal can be improved and the user experience improved.
Fig. 4 is a schematic structural view of a wake-up control device according to an exemplary embodiment, and as shown in fig. 4, the device includes:
a data acquisition module 401 configured to acquire multiple paths of audio data;
the signal processing module 402 is configured to perform signal processing on each path of first audio data acquired in the target time period to obtain a plurality of first target audio data;
A first confidence acquiring module 403 configured to acquire first confidence degrees of a plurality of first target audio data, respectively, where the confidence degrees are used to characterize a probability that the audio data can wake up a terminal;
a second confidence acquiring module 404 configured to acquire a second confidence of second target audio data in a historical time period, the historical time period being a preset time period before the target time period, the second confidence including confidences of a plurality of second target audio data;
a wake-up module 405 configured to determine whether to wake up the terminal according to the first confidence level and the second confidence level.
Optionally, the signal processing module 402 includes:
a channel selection sub-module configured to select one microphone of the microphone array of the terminal as a reference channel;
A reference data acquisition sub-module configured to acquire reference audio data acquired by the reference channel in the target time period;
The signal processing sub-module is configured to respectively perform signal processing on each path of first audio data in a plurality of signal processing modes according to the reference audio data to obtain a plurality of first target audio data, wherein the signal processing modes of the first audio data in different paths are different.
Optionally, the first confidence acquiring module 403 includes:
a processing mode determining sub-module configured to determine, for each of a plurality of first target audio data, a signal processing mode corresponding to the first target audio data;
a decoder determining sub-module configured to determine a target decoder corresponding to the first target audio data according to the signal processing manner, wherein different signal processing manners correspond to different decoders;
The confidence determining submodule is configured to input the first target audio data into the target decoder for decoding processing and output a first confidence of the first target audio data.
Optionally, the decoder determination submodule is configured to:
Determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relation, wherein the decoder association relation comprises the corresponding relation between different signal processing modes and the decoder;
And taking the decoder corresponding to the signal processing mode as the target decoder.
Optionally, the wake-up module 405 includes:
A wake-up processing sub-module configured to, in a case where a first confidence output by one target decoder is obtained, execute the following wake-up processing mode according to the first confidence output by that target decoder until the terminal is woken up, or execute the wake-up processing mode according to the first confidences output by a plurality of target decoders;
The wake-up processing mode comprises the following steps:
And determining whether to wake up the terminal according to the second confidence coefficient and the first confidence coefficient output by the target decoder.
Optionally, the wake-up module includes:
A target confidence determining submodule configured to determine a target confidence from the second confidence, the target confidence and the first confidence being confidence decoded by the same decoder;
the weight value determining submodule is configured to acquire a weight value corresponding to the first confidence coefficient according to the target confidence coefficient and a third confidence coefficient, wherein the third confidence coefficient comprises other confidence coefficients except the target confidence coefficient in the second confidence coefficient;
The final confidence coefficient obtaining sub-module is configured to obtain a final confidence coefficient according to the weight value and the first confidence coefficient;
And the wake-up sub-module is configured to determine whether to wake up the terminal according to the final confidence.
Optionally, the weight value determination submodule is further configured to:
Acquiring a confidence coefficient difference value between the target confidence coefficient and the third confidence coefficient;
And acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
Optionally, the wake-up sub-module is further configured to:
and determining to wake up the terminal under the condition that the final confidence coefficient is greater than or equal to a preset confidence coefficient threshold value.
Optionally, the signal processing means comprises blind source separation or noise suppression.
With this device, the plurality of first target audio data are obtained by performing signal processing on each path of first audio data acquired in the target time period, the first confidences of the plurality of first target audio data are obtained respectively, the second confidence of the second target audio data in the history time period is then obtained, and whether to wake up the terminal is determined according to the first confidence and the second confidence. In other words, the present disclosure determines whether to wake up the terminal according to both the first confidence of the target period and the second confidence of the history period, which reduces the probability of false wake-ups and missed wake-ups and thus improves the accuracy of the speech recognition system.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and will not be repeated here.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the wake-up control method provided by the present disclosure.
Fig. 5 is a block diagram of a terminal 500, according to an example embodiment. For example, the terminal 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 5, a terminal 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the terminal 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or part of the steps of the wake-up control method described above. Further, the processing component 502 can include one or more modules that facilitate interactions between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the terminal 500. Examples of such data include instructions for any application or method operating on the terminal 500, contact data, phonebook data, messages, pictures, videos, and the like. The memory 504 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 506 provides power to the various components of the terminal 500. The power component 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal 500.
The multimedia component 508 includes a screen that provides an output interface between the terminal 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 500 is in an operation mode, such as a photographing mode or a video mode. Each of the front camera and the rear camera may use a fixed optical lens system or have focusing and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the terminal 500 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 504 or transmitted via the communication component 516. In some embodiments, the audio component 510 further comprises a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 514 includes one or more sensors for providing status assessments of various aspects of the terminal 500. For example, the sensor assembly 514 may detect the on/off state of the terminal 500 and the relative positioning of components, such as the display and keypad of the terminal 500. The sensor assembly 514 may also detect a change in position of the terminal 500 or of a component of the terminal 500, the presence or absence of user contact with the terminal 500, the orientation or acceleration/deceleration of the terminal 500, and a change in the temperature of the terminal 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the terminal 500 and other devices. The terminal 500 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 516 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the wake-up control method described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 504, including instructions executable by the processor 520 of the terminal 500 to perform the wake-up control method described above. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned wake-up control method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (18)
1. A wake-up control method, the method comprising:
Collecting multiple paths of audio data;
Respectively carrying out signal processing on each path of first audio data acquired in the target time period to obtain a plurality of first target audio data;
Respectively obtaining first confidence degrees of a plurality of first target audio data, wherein the confidence degrees are used for representing the probability that the audio data can wake up a terminal;
Acquiring a second confidence coefficient of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence coefficient comprises a plurality of confidence coefficients of the second target audio data;
Determining whether to wake up the terminal according to the first confidence coefficient and the second confidence coefficient;
Wherein the determining whether to wake up the terminal according to the first confidence level and the second confidence level comprises:
Determining a target confidence coefficient from the second confidence coefficient, wherein the target confidence coefficient and the first confidence coefficient are confidence coefficients obtained by decoding by the same decoder;
Acquiring a weight value corresponding to the first confidence coefficient according to the target confidence coefficient and a third confidence coefficient, wherein the third confidence coefficient comprises other confidence coefficients except the target confidence coefficient in the second confidence coefficient;
determining a final confidence level according to the weight value and the first confidence level;
and determining whether to wake up the terminal according to the final confidence level.
2. The method of claim 1, wherein the performing signal processing on each path of the first audio data acquired in the target time period to obtain a plurality of first target audio data includes:
selecting one microphone of a microphone array of the terminal as a reference channel;
Acquiring reference audio data acquired by the reference channel in the target time period;
And respectively carrying out signal processing on each path of first audio data through a plurality of signal processing modes according to the reference audio data to obtain a plurality of first target audio data, wherein the signal processing modes of the first audio data of different paths are different.
3. The method according to claim 1 or 2, wherein the obtaining first confidence levels of the plurality of first target audio data, respectively, comprises:
Determining a signal processing mode corresponding to each of the first target audio data in the plurality of first target audio data;
Determining a target decoder corresponding to the first target audio data according to the signal processing mode, wherein different signal processing modes correspond to different decoders;
And inputting the first target audio data into the target decoder for decoding processing to obtain a first confidence coefficient of the first target audio data.
4. A method according to claim 3, wherein said determining a target decoder corresponding to the first target audio data according to the signal processing mode comprises:
determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relation, wherein the decoder association relation comprises the corresponding relation between different signal processing modes and the decoders;
And taking the decoder corresponding to the signal processing mode as the target decoder.
5. The method of claim 3, wherein the determining whether to wake the terminal based on the first confidence level and the second confidence level comprises:
executing the following wake-up processing mode according to the first confidence coefficient output by the target decoder under the condition that the first confidence coefficient output by one target decoder is obtained, until the terminal is woken up, or executing the wake-up processing mode according to the first confidence coefficients output by a plurality of target decoders;
The wake-up processing mode comprises the following steps:
and determining whether to wake up the terminal according to the second confidence coefficient and the first confidence coefficient output by the target decoder.
6. The method of claim 1, wherein the obtaining a weight value corresponding to the first confidence level according to the target confidence level and the third confidence level comprises:
Acquiring a confidence coefficient difference value between the target confidence coefficient and the third confidence coefficient;
And acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
7. The method of claim 1, wherein the determining whether to wake the terminal based on the final confidence comprises:
and determining to wake up the terminal under the condition that the final confidence coefficient is greater than or equal to a preset confidence coefficient threshold value.
8. The method of claim 1, wherein the signal processing means comprises blind source separation or noise suppression.
9. A wake-up control device, the device comprising:
the data acquisition module is configured to acquire multiple paths of audio data;
The signal processing module is configured to respectively perform signal processing on each path of first audio data acquired in the target time period to obtain a plurality of first target audio data;
the first confidence coefficient acquisition module is configured to acquire first confidence coefficients of a plurality of first target audio data respectively, wherein the confidence coefficients are used for representing the probability that the audio data can wake up a terminal;
The second confidence coefficient acquisition module is configured to acquire a second confidence coefficient of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence coefficient comprises a plurality of confidence coefficients of the second target audio data;
A wake-up module configured to determine whether to wake up the terminal according to the first confidence level and the second confidence level;
wherein, the wake-up module comprises:
A target confidence determining submodule configured to determine a target confidence from the second confidence, the target confidence and the first confidence being confidences obtained by decoding with the same decoder;
The weight value determining submodule is configured to obtain a weight value corresponding to the first confidence coefficient according to the target confidence coefficient and a third confidence coefficient, wherein the third confidence coefficient comprises other confidence coefficients except the target confidence coefficient in the second confidence coefficient;
a final confidence coefficient obtaining sub-module configured to obtain a final confidence coefficient according to the weight value and the first confidence coefficient;
and the wake-up sub-module is configured to determine whether to wake up the terminal according to the final confidence.
10. The apparatus of claim 9, wherein the signal processing module comprises:
A channel selection sub-module configured to select one microphone of a microphone array of the terminal as a reference channel;
A reference data acquisition sub-module configured to acquire reference audio data acquired by the reference channel in the target time period;
And the signal processing sub-module is configured to respectively perform signal processing on each path of the first audio data in a plurality of signal processing modes according to the reference audio data to obtain a plurality of first target audio data, wherein the signal processing modes of the first audio data in different paths are different.
11. The apparatus of claim 9 or 10, wherein the first confidence acquisition module comprises:
a processing mode determining sub-module configured to determine, for each of the plurality of first target audio data, a signal processing mode corresponding to the first target audio data;
a decoder determining sub-module configured to determine a target decoder corresponding to the first target audio data according to the signal processing manner, wherein different signal processing manners correspond to different decoders;
the confidence determining submodule is configured to input the first target audio data into the target decoder for decoding processing to obtain a first confidence of the first target audio data.
12. The apparatus of claim 11, wherein the decoder determination submodule is configured to:
determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relation, wherein the decoder association relation comprises the corresponding relation between different signal processing modes and the decoders;
and taking the decoder corresponding to the signal processing mode as the target decoder.
13. The apparatus of claim 11, wherein the wake-up module comprises:
A wake-up processing sub-module configured to execute, in a case where the first confidence level output by one of the target decoders is obtained, the following wake-up processing manner according to the first confidence level output by the target decoder until the terminal is woken up, or execute the wake-up processing manner according to the first confidence levels output by a plurality of the target decoders;
The wake-up processing mode comprises the following steps:
and determining whether to wake up the terminal according to the second confidence coefficient and the first confidence coefficient output by the target decoder.
14. The apparatus of claim 9, wherein the weight value determination submodule is further configured to:
Acquiring a confidence coefficient difference value between the target confidence coefficient and the third confidence coefficient;
And acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
15. The apparatus of claim 9, wherein the wake-up sub-module is further configured to:
and determining to wake up the terminal under the condition that the final confidence coefficient is greater than or equal to a preset confidence coefficient threshold value.
16. The apparatus of claim 9, wherein the signal processing means comprises blind source separation or noise suppression.
17. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1-8.
18. A terminal, comprising:
A memory having a computer program stored thereon;
A processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-8.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011303745.7A | 2020-11-19 | 2020-11-19 | Wakeup control method, wakeup control device, storage medium and terminal
Publications (2)

Publication Number | Publication Date
---|---
CN112509596A (en) | 2021-03-16
CN112509596B (en) | 2024-07-09
Family ID: 74959093
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202011303745.7A (Active) | Wakeup control method, wakeup control device, storage medium and terminal | 2020-11-19 | 2020-11-19
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112509596B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114229637B (en) * | 2021-12-03 | 2024-02-27 | 北京声智科技有限公司 | Elevator floor determination method, device, equipment and computer readable storage medium |
CN115050013A (en) * | 2022-06-14 | 2022-09-13 | 南京人工智能高等研究院有限公司 | Behavior detection method and device, vehicle, storage medium and electronic equipment |
CN115472161B (en) * | 2022-07-27 | 2025-05-02 | 北京声智科技有限公司 | Voice wake-up method, device, equipment and storage medium |
CN115079587B (en) * | 2022-08-04 | 2025-07-08 | 四川长虹空调有限公司 | Intelligent household appliance control method, intelligent household appliance control device, computer equipment and storage medium |
CN115687685A (en) * | 2022-11-15 | 2023-02-03 | 北京云迹科技股份有限公司 | A data processing method and device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111508493A (en) * | 2020-04-20 | 2020-08-07 | Oppo广东移动通信有限公司 | Voice wake-up method, device, electronic device and storage medium |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI639153B (en) * | 2015-11-03 | 2018-10-21 | 絡達科技股份有限公司 | Electronic apparatus and voice trigger method therefor |
CN106653010B (en) * | 2015-11-03 | 2020-07-24 | 络达科技股份有限公司 | Electronic device and method for waking up through voice recognition |
CN105654949B (en) * | 2016-01-07 | 2019-05-07 | 北京云知声信息技术有限公司 | A kind of voice awakening method and device |
TWI682385B (en) * | 2018-03-16 | 2020-01-11 | 緯創資通股份有限公司 | Speech service control apparatus and method thereof |
CN110047485B (en) * | 2019-05-16 | 2021-09-28 | 北京地平线机器人技术研发有限公司 | Method and apparatus for recognizing wake-up word, medium, and device |
CN110428810B (en) * | 2019-08-30 | 2020-10-30 | 北京声智科技有限公司 | Voice wake-up recognition method and device and electronic equipment |
CN110838306B (en) * | 2019-11-12 | 2022-05-13 | 广州视源电子科技股份有限公司 | Voice signal detection method, computer storage medium and related equipment |
CN111522592A (en) * | 2020-04-24 | 2020-08-11 | 腾讯科技(深圳)有限公司 | Intelligent terminal awakening method and device based on artificial intelligence |
CN111696562B (en) * | 2020-04-29 | 2022-08-19 | 华为技术有限公司 | Voice wake-up method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112509596A (en) | 2021-03-16 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |