
CN109065046A - Voice wake-up method, apparatus, electronic device, and computer-readable storage medium - Google Patents


Info

Publication number
CN109065046A
CN109065046A
Authority
CN
China
Prior art keywords
confidence
detection model
information
keyword detection
frequency spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811006300.5A
Other languages
Chinese (zh)
Inventor
李深
胡亚光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chumen Wenwen Information Technology Co Ltd
Original Assignee
Chumen Wenwen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chumen Wenwen Information Technology Co Ltd filed Critical Chumen Wenwen Information Technology Co Ltd
Priority to CN201811006300.5A priority Critical patent/CN109065046A/en
Publication of CN109065046A publication Critical patent/CN109065046A/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Signal Processing (AREA)
  • Electric Clocks (AREA)

Abstract

The embodiment of the invention provides a voice wake-up method, apparatus, electronic device, and computer-readable storage medium, applied to the technical field of speech recognition. The method comprises: extracting spectral feature information from collected user speech; inputting the spectral feature information into a first keyword detection model to obtain a first confidence corresponding to the spectral feature information; if the first confidence corresponding to the spectral feature information is not less than a first confidence threshold, inputting the spectral feature information and its corresponding first confidence into a second keyword detection model to obtain a detection result, where the first confidence threshold is the confidence threshold corresponding to the first keyword detection model; and determining, based on the detection result, whether to execute a voice wake-up operation. The embodiment of the present invention thereby reduces the computational overhead of performing keyword detection on user speech.

Description

Voice wake-up method and device, electronic equipment and computer readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of voice recognition, in particular to a voice awakening method and device, electronic equipment and a computer readable storage medium.
Background
With the development of information technology, speech recognition technology has also advanced, and products that use speech recognition, such as conversation assistants, smart robots, and smart watches, are increasingly common. These products enhance the user experience and improve the level of natural human-computer interaction through speech recognition. In speech recognition, keyword detection is a very important technique; it is also commonly referred to as voice wake-up.
In the prior art, one voice wake-up approach performs keyword detection on the collected user speech through a preset keyword detection model, and voice wake-up is executed when a target keyword is present in the collected user speech.
However, in the course of making the invention, the inventors found that when voice wake-up is implemented through the existing preset keyword detection model, all user speech must be passed through that model to determine whether to execute the voice wake-up operation. Because the existing preset keyword detection model is relatively complex, the amount of computation required for keyword detection on user speech is large, and the computational overhead is therefore high.
Disclosure of Invention
The embodiment of the invention provides a voice awakening method and device, electronic equipment and a computer readable storage medium, which are used for solving the problem of high calculation cost of keyword detection on user voice.
In order to solve the above problems, embodiments of the present invention mainly provide the following technical solutions:
in a first aspect, a method for voice wake-up is provided, where the method includes:
extracting frequency spectrum characteristic information from the collected user voice;
inputting the frequency spectrum characteristic information into a first keyword detection model to obtain a first confidence coefficient corresponding to the frequency spectrum characteristic information;
if the first confidence degree corresponding to the frequency spectrum characteristic information is not smaller than a first confidence degree threshold value, inputting the frequency spectrum characteristic information and the first confidence degree corresponding to the frequency spectrum characteristic information into a second keyword detection model to obtain a detection result, wherein the first confidence degree threshold value is a confidence degree threshold value corresponding to the first keyword detection model;
and determining whether to execute voice wakeup operation based on the detection result.
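The four claimed steps amount to a two-stage detection cascade. The following is a minimal Python sketch of that control flow; all function and parameter names are hypothetical, since the patent does not prescribe an implementation:

```python
from typing import Callable

def voice_wake_up(spectral_features,
                  first_model: Callable, second_model: Callable,
                  first_threshold: float, second_threshold: float) -> bool:
    """Two-stage keyword detection cascade (illustrative sketch)."""
    # Stage 1: a cheap model screens all incoming speech.
    first_confidence = first_model(spectral_features)
    if first_confidence < first_threshold:
        return False  # rejected early; the second model is never run
    # Stage 2: a more complex model verifies only candidates that pass stage 1,
    # receiving both the features and the first confidence as inputs.
    second_confidence = second_model(spectral_features, first_confidence)
    return second_confidence >= second_threshold
```

The computational saving described in the abstract comes from the early return: most non-wake speech never reaches the second, more expensive model.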
In a second aspect, an apparatus for voice wake-up is provided, the apparatus comprising:
the extraction module is used for extracting frequency spectrum characteristic information from the collected user voice;
the first input module is used for inputting the frequency spectrum characteristic information extracted by the extraction module into a first keyword detection model to obtain a first confidence coefficient corresponding to the frequency spectrum characteristic information;
the second input module is used for inputting the frequency spectrum feature information extracted by the extraction module and the first confidence coefficient corresponding to the frequency spectrum feature information into a second keyword detection model to obtain a detection result when the first confidence coefficient corresponding to the frequency spectrum feature information is not smaller than a first confidence coefficient threshold value, and the first confidence coefficient threshold value is a confidence coefficient threshold value corresponding to the first keyword detection model;
and the determining module is used for determining whether to execute the voice awakening operation or not based on the detection result.
In a third aspect, an electronic device is provided, which includes:
at least one processor;
and at least one memory, bus connected with the processor; wherein,
the processor and the memory complete mutual communication through the bus;
the processor is configured to call the program instructions in the memory to perform the voice wake-up method shown in the first aspect.
In a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of voice wake-up of the first aspect.
The technical scheme provided by the embodiment of the invention at least has the following advantages:
compared with the prior art, in which voice wake-up is realized through a single preset keyword detection model, the method extracts spectral feature information from collected user speech and inputs it into the first keyword detection model to obtain a first confidence corresponding to the spectral feature information. If this first confidence is not less than a first confidence threshold (the confidence threshold corresponding to the first keyword detection model), the spectral feature information and its first confidence are input into the second keyword detection model to obtain a detection result, and whether to execute the voice wake-up operation is then determined based on that result. In the embodiment of the invention, for part of the user speech it can be determined, after the first keyword detection model alone, that the voice wake-up operation will not be executed, so keyword detection by the second keyword detection model is not required for that speech.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below.
Fig. 1 is a schematic flow chart of a voice wake-up method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a voice wake-up apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another voice wake-up apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device awakened by voice according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the prior art, keyword detection during voice wake-up is performed through a single existing keyword detection model. To keep the false wake-up rate low, the structure of that keyword detection model must be particularly complex, and its computation is correspondingly expensive.
The voice wake-up method, device, electronic device and computer-readable storage medium provided by the embodiments of the present invention are directed to solving the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example one
An embodiment of the present invention provides a voice wake-up method, as shown in fig. 1, the method includes:
and S101, extracting frequency spectrum characteristic information from the collected user voice.
For the embodiment of the invention, user speech is collected over a period of time, and the spectral feature information is then extracted from that collected speech.
For the embodiment of the invention, the voice wake-up method may run on an electronic device capable of receiving speech uttered by the user. When the electronic device is in the working state, it can monitor the surrounding sound in real time and thereby receive the user's voice information.
For the embodiment of the invention, the electronic device running the voice wake-up method may also receive voice information, through a wired or wireless connection, from the terminal the user employs for voice interaction. It should be noted that the wireless connection manners may include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, ZigBee, and UWB (Ultra-Wideband) connections.
For the embodiment of the present invention, the manner of extracting the spectral feature information from the user speech is common knowledge in the art, and is not described herein again.
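The patent treats spectral feature extraction as common knowledge. One conventional choice is a framed log power spectrum; the sketch below is purely illustrative (the frame length, hop size, and Hamming window are assumptions, not taken from the patent):

```python
import numpy as np

def extract_spectral_features(audio: np.ndarray, frame_len: int = 400,
                              hop: int = 160) -> np.ndarray:
    """Frame the waveform and take the log power spectrum of each frame."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([audio[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)  # log compression stabilises the dynamic range
```

At a 16 kHz sampling rate, the defaults above correspond to 25 ms frames with a 10 ms hop, a common configuration in keyword spotting.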
Step S102, inputting the frequency spectrum characteristic information into a first keyword detection model to obtain a first confidence coefficient corresponding to the frequency spectrum characteristic information.
For the embodiment of the present invention, the first keyword detection model may be a neural network with a relatively simple structure. In the embodiment of the present invention, the first keyword detection model is used to determine a relationship between a first confidence degree corresponding to the input spectral feature information and a first confidence degree threshold.
For the embodiments of the present invention, the confidence level is used to characterize the probability that the user voice is a wake-up voice for waking up the electronic device. In the embodiment of the present invention, the first confidence level is used to characterize a probability that the first keyword detection model detects that the user voice is a wake-up voice for waking up the electronic device.
Step S103, if the first confidence degree corresponding to the frequency spectrum characteristic information is not smaller than the first confidence degree threshold value, inputting the frequency spectrum characteristic information and the first confidence degree corresponding to the frequency spectrum characteristic information into a second keyword detection model to obtain a detection result.
And the first confidence coefficient threshold is a confidence coefficient threshold corresponding to the first keyword detection model.
For the embodiment of the invention, the detection dimensionalities of the second keyword detection model and the first keyword detection model on the frequency spectrum information are different. In the embodiment of the present invention, the second keyword detection model may be a neural network, and the structure of the second keyword detection model is more complex than that of the first keyword detection model.
And step S104, determining whether to execute voice awakening operation or not based on the detection result.
The embodiment of the invention provides a voice wake-up method. Compared with the prior art, in which voice wake-up is realized through a single preset keyword detection model, the embodiment extracts spectral feature information from collected user speech and inputs it into a first keyword detection model to obtain a first confidence corresponding to the spectral feature information. If this first confidence is not less than a first confidence threshold (the confidence threshold corresponding to the first keyword detection model), the spectral feature information and its first confidence are input into a second keyword detection model to obtain a detection result, and whether to execute the voice wake-up operation is then determined based on that result. For part of the user speech, it can thus be determined after the first keyword detection model alone that the voice wake-up operation will not be executed, without keyword detection by the second keyword detection model.
Example two
The embodiment of the invention provides another possible implementation manner, and on the basis of the first embodiment, the method shown in the second embodiment is further included, wherein,
step S104 includes step S1041 (not shown) and step S1042 (not shown), wherein,
step S1041, if the second confidence corresponding to the spectrum feature information indicated in the detection result is not less than the second confidence threshold, determining to execute a voice wakeup operation.
For the embodiment of the present invention, the second confidence is a probability representing that the voice of the user detected by the second keyword detection model is the wake-up voice of the electronic device.
For the embodiment of the present invention, if the second confidence corresponding to the spectral feature information indicated in the detection result is not less than the second confidence threshold, meaning the probability that the user's speech is the wake-up speech of the electronic device is high, it is determined to perform the voice wake-up operation.
Step S1042, if the second confidence corresponding to the spectrum feature information indicated in the detection result is smaller than the second confidence threshold, determining not to execute the voice wakeup operation.
And the second confidence coefficient threshold is a confidence coefficient threshold corresponding to the second keyword detection model.
For the embodiment of the invention, if the second confidence corresponding to the spectral feature information indicated in the detection result is less than the second confidence threshold, the probability that the user's speech is the wake-up speech is low, and it is determined not to execute the voice wake-up operation.
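Steps S1041 and S1042 together reduce to a single comparison against the second confidence threshold. A minimal sketch (function name is hypothetical):

```python
def second_stage_decision(second_confidence: float,
                          second_threshold: float) -> bool:
    """S1041: wake if confidence >= threshold; S1042: otherwise do not wake."""
    return second_confidence >= second_threshold
```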
EXAMPLE III
Another possible implementation manner of the embodiment of the present invention further includes, on the basis of the operation shown in the second embodiment, the operation shown in the third embodiment, wherein,
step S102 further includes a step SA (not shown), wherein,
and step SA, training a first keyword detection model and a second keyword detection model.
For the embodiment of the invention, the method for realizing voice awakening needs to utilize a first keyword detection model and a second keyword detection model. In the embodiment of the present invention, before performing keyword detection using the first keyword detection model and the second keyword detection model, a large number of training samples are required to train the first keyword detection model and the second keyword detection model.
For the embodiment of the present invention, step SA may specifically include: training the first keyword detection model and the second keyword detection model in an offline training mode; and/or training the first keyword detection model and the second keyword detection model in an online learning mode.
For the embodiment of the present invention, in step SA the first keyword detection model and the second keyword detection model may be trained simultaneously; alternatively, the first keyword detection model may be trained first and then the second, or the second first and then the first. The embodiment of the present invention is not limited in this respect.
Example four
Another possible implementation manner of the embodiment of the present invention further includes, on the basis of the operation shown in the third embodiment, the operation shown in the fourth embodiment, wherein,
the method for training the first keyword detection model in the step SA includes: step SA1 (not shown) and step SA2 (not shown), wherein,
step SA1, first sample information is acquired.
Wherein the first sample information includes: at least one piece of first sample spectrum information and, for each piece of first sample spectrum information, annotation information indicating whether its corresponding first confidence is not less than the first confidence threshold.
For the embodiment of the present invention, the annotation information may include a first identifier and a second identifier, where the first identifier may represent that a first confidence corresponding to the first sample spectrum information is not less than a first confidence threshold, and the second identifier is used to represent that a first confidence corresponding to the first sample spectrum information is less than the first confidence threshold.
For example, the first identifier may be "0" and the second identifier may be "1".
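The annotation scheme above can be expressed as a one-line labelling rule. The sketch below follows the example identifiers in the text ("0" for confidence not less than the threshold, "1" for confidence below it); the helper name is hypothetical:

```python
def label_first_samples(sample_confidences, first_threshold):
    """Attach the first/second identifiers from the text to each sample:
    "0" marks confidence >= threshold, "1" marks confidence < threshold."""
    return ["0" if c >= first_threshold else "1" for c in sample_confidences]
```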
Step SA2, training a first keyword detection model based on the first sample information.
Specifically, the method for training the second keyword detection model in step SA includes: step SA3 (not shown) and step SA4 (not shown), wherein,
and step SA3, obtaining second sample information.
Wherein the second sample information includes: at least one second sample spectrum information group and, for each group, annotation information indicating whether the second confidence corresponding to that group is not less than the second confidence threshold. Any second sample spectrum information group includes second sample spectrum information together with its corresponding first confidence, and any second sample spectrum information is sample spectrum information whose first confidence is not less than the first confidence threshold.
For the embodiment of the present invention, the annotation information may include a third identifier and a fourth identifier, where the third identifier may represent that the second confidence corresponding to the second sample spectrum information group is not less than the second confidence threshold, and the fourth identifier is used to represent that the second confidence corresponding to the second sample spectrum information group is less than the second confidence threshold.
For the embodiment of the present invention, the third identifier is different from the fourth identifier, the third identifier may be the same as the first identifier or the second identifier, and the fourth identifier may be the same as the first identifier or the second identifier.
For the embodiment of the present invention, the first sample spectrum information and the second sample spectrum information may be the same or different. The present invention is not limited to the embodiments.
For the embodiment of the present invention, if the first sample spectrum information is the same as the second sample spectrum information, the first confidence corresponding to the second sample spectrum information is the confidence obtained after the second spectrum information passes through the first keyword detection model.
Step SA4, training a second keyword detection model based on the second sample information.
For the embodiment of the invention, the first keyword detection model and the second keyword detection model are respectively trained through a large amount of sample information (including the first sample information and the second sample information), so that the trained first keyword detection model and the trained second keyword detection model can be obtained.
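The training procedure of steps SA1 to SA4 can be sketched end to end. The patent does not specify the model architecture, so the sketch below substitutes plain logistic regression as a stand-in for the two neural networks; all data, labels, and hyperparameters are synthetic illustrations:

```python
import numpy as np

def train_logistic(X, y, lr=0.5, epochs=200):
    """Gradient-descent logistic regression, a stand-in for the patent's
    (unspecified) neural network models."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(w, b, X):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Stage 1 trains on spectral features alone (first sample information).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 8))                 # toy "spectral" features
y1 = (X1[:, 0] > 0).astype(float)              # toy wake / non-wake labels
w1, b1 = train_logistic(X1, y1)

# Stage 2 trains on [features, first confidence] pairs (second sample
# information), restricted to samples whose first confidence cleared the
# first threshold, as the text requires.
conf1 = predict(w1, b1, X1)
keep = conf1 >= 0.5
X2 = np.column_stack([X1[keep], conf1[keep]])
w2, b2 = train_logistic(X2, y1[keep])
```

This mirrors the text's requirement that when the same spectrum information is used for both stages, the first confidence fed to stage 2 is the one produced by the trained first model.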
EXAMPLE five
Another possible implementation manner of the embodiment of the present invention further includes, on the basis of the third embodiment or the fourth embodiment, the operation shown in the fifth embodiment, wherein,
step SA is preceded by step SB (not shown), wherein,
and step SB, configuring a first confidence coefficient threshold value and a second confidence coefficient threshold value.
Wherein the first confidence threshold is less than the second confidence threshold.
For the embodiment of the invention, the first confidence threshold is the confidence threshold corresponding to the first keyword detection model, and the second confidence threshold is the confidence threshold corresponding to the second keyword detection model. Because the second keyword detection model performs a secondary verification on the spectrum information that has already passed the first keyword detection model, its confidence threshold must be higher than that of the first keyword detection model for the secondary verification to be effective.
For the embodiment of the present invention, the first confidence threshold and the second confidence threshold may be set by a user, or may be set by an operator of the voice wakeup application. The present invention is not limited to the embodiments.
EXAMPLE six
Another possible implementation manner of the embodiment of the present invention further includes, on the basis of the first embodiment, the operations shown in the sixth embodiment, wherein,
step S102 is followed by step SC (not shown), wherein,
and step SC, if the first confidence corresponding to the frequency spectrum characteristic information is smaller than a first confidence threshold, determining not to execute voice awakening operation.
For the embodiment of the invention, if the first confidence coefficient corresponding to the frequency spectrum characteristic information obtained by the first keyword detection model is smaller than the first confidence coefficient threshold value, the voice awakening operation is determined not to be executed, and the frequency spectrum characteristic information and the first confidence coefficient corresponding to the frequency spectrum characteristic information do not need to be input into the second keyword detection model for secondary detection, so that the calculation pressure can be reduced.
EXAMPLE seven
Another possible implementation manner of the embodiment of the present invention further includes, on the basis of the first embodiment, the operations shown in the seventh embodiment, wherein,
the method further comprises a step SD (not shown), wherein,
and step SD, if the first keyword detection model is configured in the local area, the second keyword detection model is configured in the cloud end, and it is detected that the terminal equipment is not connected to the cloud end currently, whether voice awakening operation is executed or not is determined according to a first confidence coefficient corresponding to the frequency spectrum characteristic information and a first confidence coefficient threshold value.
For the embodiment of the present invention, the first keyword detection model and the second keyword detection model may be both configured in the local, the first keyword detection model and the second keyword detection model may be both configured in the cloud, the first keyword detection model may be configured in the local, and the second keyword detection model may be configured in the cloud. The present invention is not limited to the embodiments.
For the embodiment of the present invention, whether the first keyword detection model and the second keyword detection model are configured locally or in the cloud is determined by the computing power of the terminal device and the storage space of the terminal device.
For the embodiment of the present invention, when the first keyword detection model is configured locally and the second keyword detection model is configured in the cloud, and the terminal device is not currently connected to the cloud device, the terminal device cannot send the spectral feature information and its corresponding first confidence to the cloud device. In that case, whether to execute the wake-up operation may be determined according to the detection result of the first keyword detection model alone.
For the embodiment of the invention, when the detection result of the first keyword detection model indicates that the first confidence coefficient of the frequency spectrum characteristic information is smaller than the first confidence coefficient threshold value, the voice awakening operation is determined not to be executed; and when the detection result of the first keyword detection model indicates that the first confidence coefficient of the frequency spectrum characteristic information is not less than the first confidence coefficient threshold value, determining to execute voice awakening operation.
For the embodiment of the present invention, when the first keyword detection model is configured locally, the second keyword detection model is configured in the cloud, and it is detected that the terminal device is not currently connected to the cloud, whether to perform the voice wake-up operation can be determined from the detection result of the first keyword detection model alone. This avoids the situation in which the voice wake-up operation cannot be performed because the terminal device currently cannot connect to the cloud, and thus improves the user experience.
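The offline fallback described above can be sketched as follows. This is an illustrative outline only, not the patented implementation; the function and parameter names (`decide_wakeup`, `second_stage`) are hypothetical.

```python
def decide_wakeup(first_confidence: float,
                  first_threshold: float,
                  cloud_connected: bool,
                  second_stage=None) -> bool:
    """Return True if the voice wake-up operation should be performed."""
    if first_confidence < first_threshold:
        # Rejected by the local first-stage model in all cases.
        return False
    if not cloud_connected or second_stage is None:
        # Cloud unreachable: decide using the first-stage result only.
        return True
    # Otherwise defer to the second-stage (cloud) detection result.
    return second_stage(first_confidence)

# Offline: the first-stage confidence alone decides.
print(decide_wakeup(0.7, 0.5, cloud_connected=False))  # True
print(decide_wakeup(0.3, 0.5, cloud_connected=False))  # False
```

When the cloud is reachable, `second_stage` would wrap the remote call to the second keyword detection model; here any callable returning a boolean serves as a stand-in.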
For the embodiment of the present invention, when the first keyword detection model and the second keyword detection model are both configured in the cloud and the terminal device is not currently connected to the cloud, prompt information is output to inform the user that the terminal device currently cannot connect to the cloud.
Example eight
As shown in fig. 2, the voice wake-up apparatus 20 according to the embodiment of the present invention may include: an extraction module 201, a first input module 202, a second input module 203, and a determination module 204, wherein,
an extracting module 201, configured to extract spectral feature information from the collected user speech.
The first input module 202 is configured to input the spectrum feature information extracted by the extraction module 201 to the first keyword detection model, so as to obtain a first confidence corresponding to the spectrum feature information.
The second input module 203 is configured to, when the first confidence corresponding to the spectral feature information is not smaller than the first confidence threshold, input the spectral feature information extracted by the extraction module 201 and the corresponding first confidence to the second keyword detection model to obtain a detection result.
The first confidence threshold is the confidence threshold corresponding to the first keyword detection model.
A determining module 204, configured to determine whether to perform a voice wakeup operation based on the detection result.
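The cooperation of the four modules can be illustrated with a minimal cascade sketch. The models here are toy stand-ins, not the patented networks; any callable that maps spectral features to a confidence in [0, 1] could take their place.

```python
def wake_up_pipeline(features,
                     first_model, second_model,
                     first_threshold: float,
                     second_threshold: float) -> bool:
    # First input module: cheap first-stage keyword detection.
    first_conf = first_model(features)
    if first_conf < first_threshold:
        return False  # early rejection; the second model never runs
    # Second input module: features plus the first confidence go to stage two.
    second_conf = second_model(features, first_conf)
    # Determination module: compare against the second threshold.
    return second_conf >= second_threshold

# Toy models for demonstration only.
cheap = lambda feats: sum(feats) / len(feats)
strict = lambda feats, c1: 0.5 * (c1 + max(feats))
print(wake_up_pipeline([0.8, 0.9], cheap, strict, 0.5, 0.8))  # True
```

The early-return branch is what yields the computational saving described below: most non-keyword audio is discarded by the cheap first stage and never reaches the second model.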
The embodiment of the present invention provides a voice wake-up apparatus. Compared with the prior art, in which voice wake-up is realized by a single preset keyword detection model, the embodiment of the present invention extracts spectral feature information from the collected user speech and inputs it to the first keyword detection model to obtain a first confidence corresponding to the spectral feature information. If this first confidence is not smaller than the first confidence threshold, which is the confidence threshold corresponding to the first keyword detection model, the spectral feature information and its first confidence are input to the second keyword detection model to obtain a detection result, and whether to perform the voice wake-up operation is determined based on that result. Because part of the user speech can already be rejected after passing through the first keyword detection model, no keyword detection by the second keyword detection model is needed for it, which reduces computation.
The voice wake-up apparatus of this embodiment can execute the voice wake-up method provided in the first embodiment of the present invention, and the implementation principles thereof are similar, and are not described herein again.
Example nine
As shown in fig. 3, another voice wake-up apparatus 30 according to an embodiment of the present invention includes: an extraction module 301, a first input module 302, a second input module 303, and a determination module 304, wherein,
an extracting module 301, configured to extract spectral feature information from the collected user speech.
Wherein, the extracting module 301 in fig. 3 has the same or similar function as the extracting module 201 in fig. 2.
The first input module 302 is configured to input the spectrum feature information extracted by the extraction module 301 to the first keyword detection model, so as to obtain a first confidence corresponding to the spectrum feature information.
The first input module 302 in fig. 3 has the same or similar function as the first input module 202 in fig. 2.
The second input module 303 is configured to, when the first confidence corresponding to the spectral feature information is not smaller than the first confidence threshold, input the spectral feature information extracted by the extraction module 301 and the corresponding first confidence to the second keyword detection model to obtain a detection result.
The first confidence threshold is the confidence threshold corresponding to the first keyword detection model.
Wherein the second input module 303 in fig. 3 has the same or similar function as the second input module 203 in fig. 2.
A determining module 304, configured to determine whether to perform a voice wakeup operation based on the detection result.
Wherein the determining module 304 in fig. 3 has the same or similar function as the determining module 204 in fig. 2.
Specifically, the determining module 304 is configured to determine to perform the voice wake-up operation when the second confidence corresponding to the spectral feature information indicated in the detection result is not smaller than a second confidence threshold.
The determining module 304 is further configured to determine not to perform the voice wake-up operation when the second confidence corresponding to the spectral feature information indicated in the detection result is smaller than the second confidence threshold.
The second confidence threshold is the confidence threshold corresponding to the second keyword detection model.
Further, as shown in fig. 3, the apparatus 30 may further include: a first training module 305 and a second training module 306, wherein the first training module 305 and the second training module 306 may be the same training module or two separate training modules. The embodiment of the present invention does not limit this. In fig. 3, the first training module 305 and the second training module 306 are shown as two training modules, wherein,
a first training module 305 for training a first keyword detection model.
And a second training module 306, configured to train a second keyword detection model.
Specifically, the first training module 305 includes: a first obtaining unit 3051 and a first training unit 3052, wherein,
a first obtaining unit 3051, configured to obtain the first sample information.
The first sample information includes: at least one piece of first sample spectral information, and labeling information indicating whether the first confidence corresponding to each piece of first sample spectral information is not smaller than the first confidence threshold.
The first training unit 3052 is configured to train a first keyword detection model based on the first sample information acquired by the first acquiring unit 3051.
Specifically, the second training module 306 includes: a second obtaining unit 3061 and a second training unit 3062, wherein,
a second obtaining unit 3061, configured to obtain second sample information.
The second sample information includes: at least one second sample spectral information group, and labeling information indicating whether the second confidence corresponding to each second sample spectral information group is not smaller than the second confidence threshold. Any second sample spectral information group includes second sample spectral information together with the first confidence corresponding to that spectral information; any second sample spectral information is sample spectral information whose first confidence is not smaller than the first confidence threshold.
For the embodiment of the present invention, when the first training module 305 and the second training module 306 are the same training module, the first obtaining unit 3051 and the second obtaining unit 3061 may be the same obtaining unit or two separate obtaining units. The embodiment of the present invention does not limit this. Only the case where the first obtaining unit 3051 and the second obtaining unit 3061 are two separate units is shown in fig. 3.
A second training unit 3062, configured to train a second keyword detection model based on the second sample information acquired by the second acquisition unit 3061.
For the embodiment of the present invention, when the first training module 305 and the second training module 306 are the same training module, the first training unit 3052 and the second training unit 3062 may be the same training unit or two separate training units. The embodiment of the present invention does not limit this. Only the case where the first training unit 3052 and the second training unit 3062 are two separate units is shown in fig. 3.
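One hypothetical way to represent the two kinds of sample information described above is with simple data classes. The field names are illustrative and not taken from the patent; the constraint at the end reflects that second-stage samples are drawn only from spectra the first model already accepted.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FirstSample:
    spectrum: List[float]    # first sample spectral information
    label: bool              # is first confidence >= first threshold?

@dataclass
class SecondSample:
    spectrum: List[float]    # second sample spectral information
    first_confidence: float  # confidence assigned by the first model
    label: bool              # is second confidence >= second threshold?

# By construction, second-stage samples only come from spectra that the
# first model already accepted (first confidence >= first threshold).
FIRST_THRESHOLD = 0.5
s = SecondSample([0.2, 0.9], first_confidence=0.7, label=True)
assert s.first_confidence >= FIRST_THRESHOLD
```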
Further, as shown in fig. 3, the apparatus 30 further includes: a first configuration module 307 and a second configuration module 308, wherein,
a first configuration module 307 for configuring a first confidence threshold.
A second configuration module 308 for configuring a second confidence threshold.
Wherein the first confidence threshold is less than the second confidence threshold.
For the embodiment of the present invention, the first configuration module 307 and the second configuration module 308 may be the same configuration module or two separate configuration modules. The embodiment of the present invention does not limit this.
Only the case where the first configuration module 307 and the second configuration module 308 are two separate configuration modules is shown in fig. 3.
In one possible implementation, the determining module 304 is further configured to determine not to perform the voice wake-up operation when the first confidence corresponding to the spectral feature information is smaller than the first confidence threshold.
For the embodiment of the present invention, if the first confidence obtained from the first keyword detection model for the spectral feature information is smaller than the first confidence threshold, it is determined not to perform the voice wake-up operation, and the spectral feature information and its first confidence need not be input to the second keyword detection model for second-stage detection, which reduces the computational load.
In another possible implementation, the determining module 304 is further configured to determine whether to perform the voice wake-up operation according to the first confidence corresponding to the spectral feature information and the first confidence threshold, when the first keyword detection model is configured locally, the second keyword detection model is configured in the cloud, and it is detected that the terminal device is not currently connected to the cloud.
For the embodiment of the present invention, when the first keyword detection model is configured locally, the second keyword detection model is configured in the cloud, and it is detected that the terminal device is not currently connected to the cloud, whether to perform the voice wake-up operation can be determined from the detection result of the first keyword detection model alone. This avoids the situation in which the voice wake-up operation cannot be performed because the terminal device currently cannot connect to the cloud, and thus improves the user experience.
The embodiment of the present invention provides a voice wake-up apparatus. Compared with the prior art, in which voice wake-up is realized by a single preset keyword detection model, the embodiment of the present invention extracts spectral feature information from the collected user speech and inputs it to the first keyword detection model to obtain a first confidence corresponding to the spectral feature information. If this first confidence is not smaller than the first confidence threshold, which is the confidence threshold corresponding to the first keyword detection model, the spectral feature information and its first confidence are input to the second keyword detection model to obtain a detection result, and whether to perform the voice wake-up operation is determined based on that result. Because part of the user speech can already be rejected after passing through the first keyword detection model, no keyword detection by the second keyword detection model is needed for it, which reduces computation.
The voice wake-up apparatus of this embodiment can execute the voice wake-up method according to any one of the first to seventh embodiments of the present invention, and the implementation principles thereof are similar, and are not described herein again.
Example ten
An embodiment of the present invention provides an electronic device, and as shown in fig. 4, an electronic device 4000 shown in fig. 4 includes: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002.
The processor 4001 is applied in the embodiment of the present invention, and is configured to implement the functions of the extracting module, the first input module, the second input module, and the determining module shown in fig. 2 or fig. 3, and/or the functions of the first training module, the second training module, the first configuring module, and the second configuring module shown in fig. 3.
Processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. Bus 4002 may be a PCI bus, EISA bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Memory 4003 may be, but is not limited to, a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used to store the application code for executing the scheme of the present application, and its execution is controlled by the processor 4001. The processor 4001 is configured to execute the application code stored in the memory 4003 to implement the actions of the voice wake-up apparatus provided by the embodiments shown in fig. 2 or fig. 3.
The electronic device in the embodiment of the present invention may be a local terminal device or a cloud device; this is not limited here.
The embodiment of the present invention provides an electronic device for voice wake-up. Compared with the prior art, in which voice wake-up is realized by a single preset keyword detection model, the embodiment of the present invention extracts spectral feature information from the collected user speech and inputs it to the first keyword detection model to obtain a first confidence corresponding to the spectral feature information. If this first confidence is not smaller than the first confidence threshold, which is the confidence threshold corresponding to the first keyword detection model, the spectral feature information and its first confidence are input to the second keyword detection model to obtain a detection result, and whether to perform the voice wake-up operation is determined based on that result. Because part of the user speech can already be rejected after passing through the first keyword detection model, no keyword detection by the second keyword detection model is needed for it, which reduces computation.
The embodiment of the invention provides electronic equipment suitable for any one of the first embodiment to the seventh embodiment of the method. And will not be described in detail herein.
Example eleven
The embodiment of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions enable a computer to execute the voice wake-up method shown in any one of the first to seventh embodiments of the above-mentioned methods.
The embodiment of the present invention provides a non-transitory computer-readable storage medium. Compared with the prior art, in which voice wake-up is realized by a single preset keyword detection model, the embodiment of the present invention extracts spectral feature information from the collected user speech and inputs it to the first keyword detection model to obtain a first confidence corresponding to the spectral feature information. If this first confidence is not smaller than the first confidence threshold, the spectral feature information and its first confidence are input to the second keyword detection model to obtain a detection result, and whether to perform the voice wake-up operation is determined based on that result. Because part of the user speech can already be rejected after passing through the first keyword detection model, no keyword detection by the second keyword detection model is needed for it, which reduces computation.
The embodiment of the invention provides a non-transitory computer readable storage medium which is suitable for any embodiment of the method. And will not be described in detail herein.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (10)

1. A method of voice wakeup, comprising:
extracting frequency spectrum characteristic information from the collected user voice;
inputting the frequency spectrum characteristic information into a first keyword detection model to obtain a first confidence coefficient corresponding to the frequency spectrum characteristic information;
if the first confidence degree corresponding to the frequency spectrum characteristic information is not smaller than a first confidence degree threshold value, inputting the frequency spectrum characteristic information and the first confidence degree corresponding to the frequency spectrum characteristic information into a second keyword detection model to obtain a detection result, wherein the first confidence degree threshold value is a confidence degree threshold value corresponding to the first keyword detection model;
and determining whether to execute voice wakeup operation based on the detection result.
2. The method of claim 1, wherein determining whether to perform a voice wakeup operation based on the detection result comprises:
if the detection result indicates that a second confidence corresponding to the frequency spectrum characteristic information is not smaller than a second confidence threshold, determining to execute voice awakening operation;
if the detection result indicates that a second confidence corresponding to the frequency spectrum characteristic information is smaller than the second confidence threshold, determining not to execute voice awakening operation;
and the second confidence coefficient threshold is a confidence coefficient threshold corresponding to the second keyword detection model.
3. The method according to claim 2, wherein the inputting the spectral feature information into a first keyword detection model to obtain a first confidence degree corresponding to the spectral feature information further comprises:
and training the first keyword detection model and the second keyword detection model.
4. The method of claim 3, wherein training the first keyword detection model comprises:
obtaining first sample information, wherein the first sample information comprises: at least one piece of first sample frequency spectrum information, and labeling information indicating whether the first confidence degree corresponding to each piece of first sample frequency spectrum information is not less than the first confidence degree threshold;
training the first keyword detection model based on the first sample information;
the method for training the second keyword detection model comprises the following steps:
obtaining second sample information, the second sample information comprising: at least one second sample frequency spectrum information group, and labeling information indicating whether the second confidence degree corresponding to each second sample frequency spectrum information group is not less than the second confidence degree threshold; any second sample frequency spectrum information group comprises: second sample frequency spectrum information and the first confidence degree corresponding to the second sample frequency spectrum information; any second sample frequency spectrum information is sample frequency spectrum information of which the first confidence degree is not less than the first confidence degree threshold;
and training the second keyword detection model based on the second sample information.
5. The method of any of claims 3-4, wherein training the first keyword detection model and the second keyword detection model further comprises:
configuring the first confidence threshold and the second confidence threshold, the first confidence threshold being less than the second confidence threshold.
6. The method of claim 1, wherein the spectral feature information is input into a first keyword detection model to obtain a first confidence degree corresponding to the spectral feature information, and then further comprising:
and if the first confidence corresponding to the frequency spectrum characteristic information is smaller than a first confidence threshold, determining not to execute voice awakening operation.
7. The method of claim 1, further comprising:
and if the first keyword detection model is configured locally, the second keyword detection model is configured in the cloud, and it is detected that the terminal equipment is not connected to the cloud currently, determining whether to execute voice awakening operation or not according to a first confidence coefficient corresponding to the frequency spectrum characteristic information and the first confidence coefficient threshold value.
8. An apparatus for voice wake-up, comprising:
the extraction module is used for extracting frequency spectrum characteristic information from the collected user voice;
the first input module is used for inputting the frequency spectrum characteristic information extracted by the extraction module into a first keyword detection model to obtain a first confidence coefficient corresponding to the frequency spectrum characteristic information;
the second input module is used for inputting the frequency spectrum feature information extracted by the extraction module and the first confidence coefficient corresponding to the frequency spectrum feature information into a second keyword detection model to obtain a detection result when the first confidence coefficient corresponding to the frequency spectrum feature information is not smaller than a first confidence coefficient threshold value, and the first confidence coefficient threshold value is a confidence coefficient threshold value corresponding to the first keyword detection model;
and the determining module is used for determining whether to execute the voice awakening operation or not based on the detection result.
9. An electronic device, comprising:
at least one processor;
and at least one memory, bus connected with the processor; wherein,
the processor and the memory complete mutual communication through the bus;
the processor is configured to call program instructions in the memory to perform the voice wake-up method of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of voice wake-up of any one of claims 1 to 7.
CN201811006300.5A 2018-08-30 2018-08-30 Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up Pending CN109065046A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811006300.5A CN109065046A (en) 2018-08-30 2018-08-30 Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up

Publications (1)

Publication Number Publication Date
CN109065046A true CN109065046A (en) 2018-12-21

Family

ID=64758100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811006300.5A Pending CN109065046A (en) 2018-08-30 2018-08-30 Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up

Country Status (1)

Country Link
CN (1) CN109065046A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065631A (en) * 2013-01-24 2013-04-24 华为终端有限公司 Voice identification method and device
US20140237277A1 (en) * 2013-02-20 2014-08-21 Dominic S. Mallinson Hybrid performance scaling or speech recognition
CN106448663A (en) * 2016-10-17 2017-02-22 海信集团有限公司 Voice wakeup method and voice interaction device
CN107767863A (en) * 2016-08-22 2018-03-06 科大讯飞股份有限公司 voice awakening method, system and intelligent terminal
CN108305617A (en) * 2018-01-31 2018-07-20 腾讯科技(深圳)有限公司 The recognition methods of voice keyword and device
CN108335696A (en) * 2018-02-09 2018-07-27 百度在线网络技术(北京)有限公司 Voice awakening method and device

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697981A (en) * 2019-01-02 2019-04-30 百度在线网络技术(北京)有限公司 A kind of voice interactive method, device, equipment and storage medium
CN109697981B (en) * 2019-01-02 2021-03-09 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and storage medium
CN109979440A (en) * 2019-03-13 2019-07-05 广州市网星信息技术有限公司 Keyword sample determines method, audio recognition method, device, equipment and medium
CN109979440B (en) * 2019-03-13 2021-05-11 广州市网星信息技术有限公司 Keyword sample determination method, voice recognition method, device, equipment and medium
CN109979438A (en) * 2019-04-04 2019-07-05 Oppo广东移动通信有限公司 Voice wake-up method and electronic equipment
CN110232933A (en) * 2019-06-03 2019-09-13 Oppo广东移动通信有限公司 Audio detection method and device, storage medium and electronic equipment
CN110232933B (en) * 2019-06-03 2022-02-22 Oppo广东移动通信有限公司 Audio detection method and device, storage medium and electronic equipment
CN110602624A (en) * 2019-08-30 2019-12-20 Oppo广东移动通信有限公司 Audio testing method and device, storage medium and electronic equipment
CN110602624B (en) * 2019-08-30 2021-05-25 Oppo广东移动通信有限公司 Audio test method, device, storage medium and electronic equipment
CN110634468A (en) * 2019-09-11 2019-12-31 中国联合网络通信集团有限公司 Voice wake-up method, device, equipment and computer-readable storage medium
CN110634468B (en) * 2019-09-11 2022-04-15 中国联合网络通信集团有限公司 Voice wake-up method, apparatus, device, and computer-readable storage medium
CN110570840A (en) * 2019-09-12 2019-12-13 腾讯科技(深圳)有限公司 Intelligent device awakening method and device based on artificial intelligence
CN110570840B (en) * 2019-09-12 2022-07-05 腾讯科技(深圳)有限公司 Intelligent device awakening method and device based on artificial intelligence
CN110570861A (en) * 2019-09-24 2019-12-13 Oppo广东移动通信有限公司 Method, apparatus, terminal device and readable storage medium for voice wake-up
CN110570861B (en) * 2019-09-24 2022-02-25 Oppo广东移动通信有限公司 Method and device for voice wake-up, terminal equipment and readable storage medium
CN110706691A (en) * 2019-10-12 2020-01-17 出门问问信息科技有限公司 Voice verification method and device, electronic equipment and computer readable storage medium
CN113936678A (en) * 2020-06-29 2022-01-14 阿里巴巴集团控股有限公司 Target voice detection method and device, equipment and storage medium
CN114724564A (en) * 2020-12-18 2022-07-08 阿里巴巴集团控股有限公司 Voice processing method, device and system

Similar Documents

Publication Publication Date Title
CN109065046A (en) Voice wake-up method, apparatus, electronic device and computer-readable storage medium
US10943582B2 (en) Method and apparatus of training acoustic feature extracting model, device and computer storage medium
CN110838289B (en) Wake-up word detection method, device, equipment and medium based on artificial intelligence
CN105261366B (en) Audio recognition method, speech engine and terminal
CN107103903B (en) Acoustic model training method and device based on artificial intelligence and storage medium
US9928831B2 (en) Speech data recognition method, apparatus, and server for distinguishing regional accent
CN105632486A (en) Voice wake-up method and device for intelligent hardware
CN110060693A (en) Model training method and device, electronic equipment and storage medium
CN106776543B (en) New word discovery method, apparatus, terminal and server
CN111179944B (en) Voice awakening and age detection method and device and computer readable storage medium
CN103106061A (en) Voice input method and device
EP3979098A1 (en) Data processing method and apparatus, storage medium, and electronic apparatus
CN110942763A (en) Voice recognition method and device
CN105045391B (en) Smartwatch gesture input method and smartwatch
CN107731226A (en) Control method, device and electronic equipment based on speech recognition
CN109215647A (en) Voice wake-up method, electronic device and non-transitory computer-readable storage medium
CN113903334B (en) Method and device for training sound source positioning model and sound source positioning
CN104679733B (en) Voice dialogue interpretation method, apparatus and system
CN117174177B (en) Training method and device for protein sequence generation model and electronic equipment
CN107622769B (en) Number modification method and device, storage medium and electronic equipment
CN104751856B (en) Speech sentence recognition method and device
CN112289311B (en) Voice wakeup method and device, electronic equipment and storage medium
CN114267342A (en) Recognition model training method, recognition method, electronic device and storage medium
CN113555037B (en) Tampered area detection method, device and storage medium for tampered audio
CN119339729A (en) Voice wake-up method, device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221