CN115116441B - Method, device and equipment for waking up voice recognition function - Google Patents

Method, device and equipment for waking up voice recognition function

Info

Publication number
CN115116441B
Authority
CN
China
Prior art keywords
interrupt
preset
detection result
activity detection
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210735039.2A
Other languages
Chinese (zh)
Other versions
CN115116441A (en
Inventor
袁瑾
肖踞雄
朱凌
王娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Dayu Semiconductor Co ltd
Original Assignee
Nanjing Dayu Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Dayu Semiconductor Co ltd
Priority to CN202210735039.2A
Publication of CN115116441A
Application granted
Publication of CN115116441B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1041 Mechanical or electronic switches, or control elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1091 Details not provided for in groups H04R1/1008 - H04R1/1083
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2201/00 Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/10 Details of earpieces, attachments therefor, earphones or monophonic headphones covered by H04R1/10 but not provided for in any of its subgroups

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Electric Clocks (AREA)

Abstract

The application provides a method, device and equipment for waking up a voice recognition function, and relates to the field of voice detection. The method comprises: obtaining an activity detection result of a voice signal, the activity detection result comprising a plurality of interrupt signals generated by detecting the voice signal multiple times; counting the plurality of interrupt signals; judging, according to the statistical result, whether the activity detection result meets a preset effective interrupt condition; and, if the activity detection result meets the preset effective interrupt condition, determining that the voice signal is in an active state and waking up the voice recognition function to recognize the collected voice signal. Because the active state of the voice signal is determined from the detected interrupt signals before the voice recognition function is woken up, part of the falsely detected information is filtered out, which reduces the false recognition rate of the VAD module and the power consumption of voice recognition.

Description

Method, device and equipment for waking up voice recognition function
Technical Field
The present invention relates to the field of voice detection, and in particular, to a method, an apparatus, and a device for waking up a voice recognition function.
Background
With the rapid development of Bluetooth headsets, their excellent user experience is favored by more and more people, and as a result more functions, such as voice recognition, are being integrated into the headsets.
However, since the compact body of a Bluetooth headset can only accommodate a small battery, stringent requirements are placed on power consumption during development. The voice recognition function requires a large amount of computation, so power consumption becomes a difficult problem for running voice recognition on the headset. To address this problem, a VAD (Voice Activity Detection) module is generally added at the front end of voice recognition, so that the voice recognition module normally stays in a standby state and is turned on to start working only after the VAD module detects active voice information, thereby achieving low power consumption.
Since the VAD module is always running, it must itself consume little power, which means that it cannot perform complex computation. At the same time, in order not to reduce the overall recognition rate, the VAD module must be highly sensitive, so invalid voice is also detected, which leads to a high false recognition rate for the VAD module.
Disclosure of Invention
The invention aims to provide a wake-up method, device and equipment for voice recognition function, aiming at the defects in the prior art, so as to solve the problems of high false recognition rate of a VAD module and the like in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the embodiment of the application is as follows:
In a first aspect, an embodiment of the present application provides a method for waking up a voice recognition function, the method including:
acquiring an activity detection result of a voice signal, wherein the activity detection result includes a plurality of interrupt signals generated by detecting the voice signal multiple times;
counting the plurality of interrupt signals;
judging, according to the statistical result, whether the activity detection result meets a preset effective interrupt condition;
if the activity detection result meets the preset effective interrupt condition, determining that the voice signal is in an active state, and waking up a voice recognition function to recognize the collected voice signal.
Optionally, the counting the plurality of interrupt signals includes:
counting the number of interrupt signals within a preset duration;
the judging, according to the statistical result, whether the activity detection result meets a preset effective interrupt condition includes:
judging whether the activity detection result meets the preset effective interrupt condition according to the number of interrupt signals within the preset duration.
Optionally, the judging, according to the number of interrupt signals within the preset duration, whether the activity detection result meets the preset effective interrupt condition includes:
judging whether the number of interrupt signals within the preset duration reaches a preset interrupt number threshold;
if the number of interrupt signals within the preset duration reaches the preset interrupt number threshold, determining that the activity detection result meets the preset effective interrupt condition;
if the number of interrupt signals within the preset duration does not reach the preset interrupt number threshold, determining that the activity detection result does not meet the preset effective interrupt condition.
Optionally, the judging whether the number of interrupt signals within the preset duration reaches a preset interrupt number threshold includes:
determining the preset interrupt number threshold according to a preset application scenario.
Optionally, the counting the plurality of interrupt signals includes:
determining, among the plurality of interrupt signals, at least two consecutive interrupt signals for which the time difference between adjacent interrupt signals is within a preset duration range;
determining a continuous interrupt duration according to the duration of the at least two consecutive interrupt signals;
the judging, according to the statistical result, whether the activity detection result meets a preset effective interrupt condition includes:
judging whether the activity detection result meets the preset effective interrupt condition according to the continuous interrupt duration.
Optionally, the judging, according to the continuous interrupt duration, whether the activity detection result meets the preset effective interrupt condition includes:
judging whether the continuous interrupt duration reaches a preset interrupt duration threshold;
if the continuous interrupt duration reaches the preset interrupt duration threshold, determining that the activity detection result meets the preset effective interrupt condition;
if the continuous interrupt duration does not reach the preset interrupt duration threshold, determining that the activity detection result does not meet the preset effective interrupt condition.
Optionally, the judging whether the continuous interrupt duration reaches a preset interrupt duration threshold includes:
determining the preset interrupt duration threshold according to a preset application scenario.
Optionally, the method further includes:
if the activity detection result does not meet the preset effective interrupt condition, determining that the interrupt signals in the activity detection result are falsely detected interrupt signals, and clearing the interrupt signals in the activity detection result.
In a second aspect, an embodiment of the present application provides a device for waking up a voice recognition function, the device including:
an acquisition module, configured to acquire an activity detection result of a voice signal, wherein the activity detection result includes a plurality of interrupt signals generated by detecting the voice signal multiple times;
a statistics module, configured to count the plurality of interrupt signals;
a judging module, configured to judge, according to the statistical result, whether the activity detection result meets a preset effective interrupt condition;
a determining module, configured to, if the activity detection result meets the preset effective interrupt condition, determine that the voice signal is in an active state and wake up a voice recognition function to recognize the collected voice signal.
In a third aspect, an embodiment of the present application provides a voice processing device, including a processor and a storage medium, wherein the processor is communicatively connected to the storage medium through a bus, the storage medium stores program instructions executable by the processor, and the processor calls the program instructions stored in the storage medium to execute the steps of the method for waking up a voice recognition function according to any one of the first aspect.
Compared with the prior art, the application has the following beneficial effects:
The application provides a method, device and equipment for waking up a voice recognition function. The method comprises: obtaining an activity detection result of a voice signal, the activity detection result comprising a plurality of interrupt signals generated by detecting the voice signal multiple times; counting the plurality of interrupt signals; judging, according to the statistical result, whether the activity detection result meets a preset effective interrupt condition; and, if the activity detection result meets the preset effective interrupt condition, determining that the voice signal is in an active state and waking up the voice recognition function to recognize the collected voice signal. Because the active state of the voice signal is determined from the detected interrupt signals before the voice recognition function is woken up, part of the falsely detected information is filtered out, which reduces the false recognition rate of the VAD module and the power consumption of voice recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for waking up a voice recognition function according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for determining statistics of interrupt signals according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for determining whether a preset interrupt condition is satisfied according to an embodiment of the present application;
Fig. 4 is a flowchart of another method for determining statistics of interrupt signals according to an embodiment of the present application;
Fig. 5 is a flowchart of another method for determining whether a preset interrupt condition is satisfied according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a device for waking up a voice recognition function according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a voice processing device according to an embodiment of the present application.
Reference numerals: 601-acquisition module, 602-statistics module, 603-judgment module, 604-determination module, 701-processor, 702-storage medium.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Furthermore, the terms "first," "second," and the like, if any, are used merely for distinguishing between descriptions and not for indicating or implying a relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
In a Bluetooth headset, a VAD module is added at the front end of the voice recognition module, so that the voice recognition module normally stays in a standby state and is turned on to start working only after the VAD module detects active voice information, thereby achieving low power consumption. To reduce the amount of invalid voice that the VAD module detects and passes on to the voice recognition module, the application provides a method, device and equipment for waking up a voice recognition function, so as to reduce the false recognition rate of the VAD module. The method for waking up a voice recognition function provided by the present application is described below through specific embodiments.
Fig. 1 is a flowchart of a method for waking up a voice recognition function according to an embodiment of the present application. As shown in Fig. 1, the method may be executed by a chip with computing and processing capability, for example a System on Chip (SoC), and the method includes:
S101, acquiring an activity detection result of a voice signal.
The activity detection result includes a plurality of interrupt signals generated by detecting the voice signal multiple times.
When the VAD module detects that an active voice signal exists, an interrupt pulse signal is generated and reported to the system chip. When the system chip receives the interrupt signal reported by the VAD module, the interrupt signal can be further processed.
S102, counting a plurality of interrupt signals.
After acquiring the activity detection result of the voice signal, the system chip can count the plurality of interrupt signals in the activity detection result. Starting from the moment the system chip receives the first interrupt signal, the interrupt signals received within a preset counting time are counted. Because an interrupt signal is generated only when a voice signal is present, counting the interrupt signals amounts to counting the detected voice activity. A statistical result of the interrupt signals is thereby obtained.
S103, judging whether the activity detection result meets a preset effective interrupt condition according to the statistical result.
The statistical result of the interrupt signals may be compared with the preset effective interrupt condition to determine whether the activity detection result meets that condition. Since the statistical result of the interrupt signals characterizes the detected voice data, whether the detected voice is in an active state can be determined from this comparison. For example, a continuously emitted sound is in an active state, whereas a sudden transient noise is not.
S104, if the activity detection result meets the preset effective interrupt condition, determining that the voice signal is in an active state, and waking up a voice recognition function to recognize the collected voice signal.
If the activity detection result meets the preset effective interrupt condition, the detected voice can be determined to be valid voice, that is, the voice signal is in a continuously active state. A voice signal in a continuously active state is valid voice and can be recognized. Therefore, once the voice signal is determined to be in an active state, the voice recognition function may be woken up to recognize the collected voice signal. This avoids the situation in which the detected voice signal is not actually active, so that no voice signal can be detected right after the voice recognition function is woken up, the voice recognition function is woken up frequently, and power consumption increases. Determining the active state of the voice signal first and only then waking up the voice recognition function reduces the power consumption of voice recognition.
In summary, the method for waking up a voice recognition function provided by the embodiment of the present application comprises: obtaining an activity detection result of a voice signal, the activity detection result comprising a plurality of interrupt signals generated by detecting the voice signal multiple times; counting the plurality of interrupt signals; judging, according to the statistical result, whether the activity detection result meets a preset effective interrupt condition; and, if the activity detection result meets the preset effective interrupt condition, determining that the voice signal is in an active state and waking up the voice recognition function to recognize the collected voice signal. Because the active state of the voice signal is determined from the detected interrupt signals before the voice recognition function is woken up, part of the falsely detected information is filtered out, which reduces the false recognition rate of the VAD module and the power consumption of voice recognition.
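By way of non-limiting illustration, steps S101 to S104 might be sketched in Python as follows. The function name `process_activity_result`, the list of millisecond timestamps, and the two callbacks are hypothetical names introduced only for this sketch; the application does not prescribe any particular programming interface.

```python
from typing import Callable, Sequence


def process_activity_result(
    interrupt_times_ms: Sequence[int],
    is_valid: Callable[[Sequence[int]], bool],
    wake_recognizer: Callable[[], None],
) -> bool:
    """Apply S102-S104 to one activity detection result (the S101 input).

    interrupt_times_ms: hypothetical arrival times of the VAD interrupts.
    is_valid: hypothetical check that computes the statistic and compares
        it with the preset effective interrupt condition (S102 and S103).
    wake_recognizer: hypothetical callback that wakes voice recognition.
    """
    if is_valid(interrupt_times_ms):
        # S104: the voice signal is treated as active, so wake recognition.
        wake_recognizer()
        return True
    # Condition not met: recognition stays in standby.
    return False
```

Passing the check as a callback keeps the sketch independent of which statistic (number of interrupt signals or continuous interrupt duration) a given embodiment uses.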
Fig. 2 is a flowchart of a method for determining statistics of interrupt signals according to an embodiment of the present application. As shown in fig. 2, counting the plurality of interrupt signals in S102 includes:
S201, counting the number of interrupt signals within a preset duration.
The number of interrupt signals within the preset duration is counted. Since an interrupt signal is issued whenever a voice signal is detected, the number of interrupt signals within the preset duration may represent the amount of voice activity within the preset duration.
It should be noted that, because the VAD module cannot buffer data, if the counting duration is too long, too much voice data will be lost, which hampers subsequent voice recognition; if the counting duration is too short, not enough falsely detected interrupt signals can be filtered out, so the voice recognition function is woken up frequently and power is wasted. The counting duration therefore needs to be determined according to the capability of the voice recognition algorithm used.
In S103, according to the statistical result, determining whether the activity detection result meets a preset effective interrupt condition includes:
S202, judging whether the activity detection result meets a preset effective interrupt condition according to the number of interrupt signals in a preset duration.
The number of interrupt signals within the preset duration may be compared with a preset effective interrupt condition to determine whether the activity detection result meets the preset effective interrupt condition. The number of interrupt signals in the preset duration characterizes the activity of the voice signal in the preset duration, so that whether the detected voice is in an active state can be determined according to the comparison result of the number of interrupt signals in the preset duration and the preset effective interrupt condition.
To sum up, in this embodiment, the number of interrupt signals in the preset duration is counted; and judging whether the activity detection result meets the preset effective interrupt condition according to the number of interrupt signals in the preset duration. Therefore, by counting the number of interrupt signals in the preset duration, whether the activity detection result meets the preset effective interrupt condition can be accurately judged.
Fig. 3 is a flowchart of a method for determining whether a preset valid interrupt condition is satisfied according to an embodiment of the present application. As shown in fig. 3, in S202, determining whether the activity detection result meets the preset effective interrupt condition according to the number of interrupt signals in the preset duration includes:
S301, judging whether the number of interrupt signals in a preset duration reaches a preset interrupt number threshold.
If the number of the interrupt signals in the preset duration is greater than or equal to the preset interrupt number threshold, the number of the interrupt signals in the preset duration reaches the preset interrupt number threshold; if the number of the interrupt signals in the preset duration is smaller than the preset interrupt number threshold, the number of the interrupt signals in the preset duration does not reach the preset interrupt number threshold.
S302, if the number of interrupt signals in the preset duration reaches a preset interrupt number threshold, determining that the activity detection result meets a preset effective interrupt condition.
If the number of the interrupt signals in the preset duration reaches the preset interrupt number threshold, the number of the interrupt signals in the preset duration is enough, and the activity detection result is determined to meet the preset effective interrupt condition.
S303, if the number of interrupt signals in the preset duration does not reach the preset interrupt number threshold, determining that the activity detection result does not meet the preset effective interrupt condition.
If the number of the interrupt signals in the preset duration does not reach the preset interrupt number threshold, the number of interrupt signals in the preset duration is too small; the detected voice may not be in an active state within the preset duration, so the preset effective interrupt condition is not met.
To sum up, in this embodiment, it is determined whether the number of interrupt signals within the preset duration reaches a preset interrupt number threshold; if the number of the interrupt signals in the preset duration reaches a preset interrupt number threshold, determining that the activity detection result meets a preset effective interrupt condition; if the number of the interrupt signals in the preset duration does not reach the preset interrupt number threshold, determining that the activity detection result does not meet the preset effective interrupt condition. Therefore, whether the preset effective interrupt condition is met or not is judged more accurately by comparing the interrupt signal quantity in the preset duration with the preset interrupt quantity threshold value.
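As a non-limiting sketch of S201 and S301 to S303, the following Python code counts the interrupt signals that fall within a preset duration measured from the first interrupt signal and compares the count with a threshold. The function names, the use of ascending millisecond timestamps, and the inclusive window boundary are assumptions made for illustration only.

```python
from typing import Sequence


def count_in_window(interrupt_times_ms: Sequence[int], window_ms: int) -> int:
    """Count interrupts from the first one up to window_ms later (S201).

    Assumes interrupt_times_ms is sorted in ascending order.
    """
    if not interrupt_times_ms:
        return 0
    start = interrupt_times_ms[0]
    return sum(1 for t in interrupt_times_ms if t - start <= window_ms)


def meets_count_condition(
    interrupt_times_ms: Sequence[int], window_ms: int, count_threshold: int
) -> bool:
    """S301-S303: the condition is met when the count reaches the threshold."""
    return count_in_window(interrupt_times_ms, window_ms) >= count_threshold
```

For example, with a hypothetical 200 ms window and a threshold of 4, the timestamps `[0, 30, 60, 90, 400]` yield a count of 4 and meet the condition, whereas `[0, 400]` yields a count of 1 and does not.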
With continued reference to Fig. 3, judging in S301 whether the number of interrupt signals within the preset duration reaches the preset interrupt number threshold includes:
determining the preset interrupt number threshold according to a preset application scenario.
The specific preset interrupt number threshold may be set according to different preset application scenarios, which is not limited herein.
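One possible way to select the threshold per application scenario is a simple lookup table, sketched below in Python. The scenario names and all numeric values are placeholders invented for this illustration; the application does not specify any scenarios or values.

```python
# Hypothetical scenario names and threshold values, for illustration only.
COUNT_THRESHOLDS = {"quiet_room": 3, "street": 6}
DEFAULT_COUNT_THRESHOLD = 4  # hypothetical fallback value


def count_threshold_for(scenario: str) -> int:
    """Return the preset interrupt number threshold for a given scenario."""
    return COUNT_THRESHOLDS.get(scenario, DEFAULT_COUNT_THRESHOLD)
```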
Fig. 4 is a flowchart of another method for determining statistics of interrupt signals according to an embodiment of the present application. As shown in Fig. 4, counting the plurality of interrupt signals in S102 includes:
S401, determining, among the plurality of interrupt signals, at least two consecutive interrupt signals for which the time difference between adjacent interrupt signals is within a preset duration range.
Voice in an active state is usually a series of continuous voice signals, so the corresponding interrupt signals are also a plurality of consecutive interrupt signals. Therefore, to judge whether the voice signal is valid, the consecutive interrupt signals are identified, that is, at least two consecutive interrupt signals are determined among the plurality of interrupt signals such that the time difference between adjacent interrupt signals is within the preset duration range.
S402, determining a continuous interrupt duration according to the duration of the at least two consecutive interrupt signals.
Each pair of adjacent interrupt signals among the at least two consecutive interrupt signals is issued in succession, so the time intervals between adjacent interrupt signals are equal. The duration of the at least two consecutive interrupt signals can thus be determined, which gives the continuous interrupt duration.
Further, several groups of at least two consecutive interrupt signals may occur within the preset duration range; the duration of each group may be determined, and the maximum duration is taken as the continuous interrupt duration.
In S103, according to the statistical result, determining whether the activity detection result meets a preset effective interrupt condition includes:
S403, judging whether the activity detection result meets the preset effective interrupt condition according to the continuous interrupt duration.
The continuous interrupt duration of the interrupt signal within the preset duration may be compared with the preset effective interrupt condition to determine whether the activity detection result satisfies the preset effective interrupt condition. The continuous interruption time in the preset time characterizes the activity of the voice signal in the preset time, so that whether the detected voice is in an active state can be determined according to the comparison result of the continuous interruption time in the preset time and the preset effective interruption condition.
In summary, in this embodiment, determining at least two consecutive interrupt signals, where a time difference between adjacent interrupt signals in the plurality of interrupt signals is within a preset duration range; determining continuous interrupt duration according to the duration of at least two continuous interrupt signals; and judging whether the activity detection result meets the preset effective interrupt condition according to the continuous interrupt duration. Therefore, by counting the continuous interruption time length, whether the activity detection result meets the preset effective interruption condition can be accurately judged.
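As a non-limiting sketch of S401 and S402, the following Python function groups interrupt signals into runs in which adjacent signals are no more than a maximum gap apart and returns the longest run duration. The function name, the millisecond timestamps, and the choice to measure a run from its first to its last interrupt signal are assumptions made for this illustration.

```python
from typing import Optional, Sequence


def continuous_interrupt_duration(
    interrupt_times_ms: Sequence[int], max_gap_ms: int
) -> int:
    """Return the longest duration (ms) of a run of at least two interrupts
    whose adjacent time differences are all <= max_gap_ms.

    Assumes interrupt_times_ms is sorted in ascending order.
    Returns 0 when no such run exists.
    """
    longest = 0
    run_start = 0
    prev: Optional[int] = None
    for t in interrupt_times_ms:
        if prev is not None and t - prev <= max_gap_ms:
            # t extends the current run; measure from the run's first interrupt.
            longest = max(longest, t - run_start)
        else:
            # Gap too large (or first interrupt): start a new run at t.
            run_start = t
        prev = t
    return longest
```

For example, with a hypothetical maximum gap of 50 ms, the timestamps `[0, 20, 40, 300, 320]` contain two runs lasting 40 ms and 20 ms, so the continuous interrupt duration is 40 ms.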
Fig. 5 is a flowchart of another method for determining whether a preset valid interrupt condition is satisfied according to an embodiment of the present application. As shown in fig. 5, in S403, determining whether the activity detection result satisfies the preset valid interrupt condition according to the continuous interrupt duration includes:
S501, judging whether the continuous interruption time length reaches a preset interruption time length threshold value.
If the continuous interruption time length is greater than or equal to the preset interruption time length threshold value, the continuous interruption time length reaches the preset interruption time length threshold value; if the continuous interruption time is smaller than the preset interruption time threshold, the continuous interruption time does not reach the preset interruption time threshold.
S502, if the continuous interruption time length reaches a preset interruption time length threshold value, determining that the activity detection result meets a preset effective interruption condition.
If the continuous interruption time length reaches the preset interruption time length threshold value, the interruption time length in the preset time length is enough, and the activity detection result is determined to meet the preset effective interruption condition.
S503, if the continuous interruption time length does not reach the preset interruption time length threshold value, determining that the activity detection result does not meet the preset effective interruption condition.
If the continuous interruption time length does not reach the preset interruption time length threshold value, the interruption time length within the preset time length is short, the detected voice may not be in an active state within the preset time length, and the preset effective interruption condition is not satisfied.
To sum up, in this embodiment, it is determined whether the continuous interruption time length reaches a preset interruption time length threshold value; if the continuous interruption time length reaches the preset interruption time length threshold value, it is determined that the activity detection result meets the preset effective interruption condition; if the continuous interruption time length does not reach the preset interruption time length threshold value, it is determined that the activity detection result does not meet the preset effective interruption condition. Comparing the continuous interruption time length with the preset interruption time length threshold value therefore allows a more accurate judgment of whether the preset effective interruption condition is met.
With continued reference to fig. 5, the determining in S501 whether the continuous interruption time period reaches the preset interruption time period threshold includes:
Determining the preset interruption time length threshold value according to a preset application scene.
The specific preset interruption time length threshold value may be set according to different preset application scenes, which is not limited herein. For example, the preset statistical duration may be 50 ms and the preset interruption time length threshold value may be 20 ms.
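The threshold comparison of S501 to S503 can be sketched as below, using the example values given in the text (a 50 ms statistical duration and a 20 ms threshold). The constant and function names are hypothetical and chosen only for illustration; a real system would select the threshold for its own application scene.

```python
# Example values taken from the text; the names are hypothetical.
PRESET_STATISTICAL_DURATION_MS = 50.0
PRESET_INTERRUPT_DURATION_THRESHOLD_MS = 20.0


def meets_valid_interrupt_condition(
        continuous_duration_ms: float,
        threshold_ms: float = PRESET_INTERRUPT_DURATION_THRESHOLD_MS) -> bool:
    """Return True when the continuous interrupt duration reaches the threshold.

    "Reaches" means greater than or equal to the threshold (S502);
    anything smaller does not meet the condition (S503).
    """
    return continuous_duration_ms >= threshold_ms


# Within a 50 ms statistical window, 20 ms of continuous interrupts is enough,
# while 19.9 ms is not.
print(meets_valid_interrupt_condition(20.0))
print(meets_valid_interrupt_condition(19.9))
```

Passing `threshold_ms` explicitly is one way to model a threshold that varies with the application scene.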
On the basis of any one of the embodiments, the wake-up method for a voice recognition function provided by the present application further includes:
if the activity detection result does not meet the preset effective interrupt condition, determining that the interrupt signal in the activity detection result is a false detection interrupt signal, and clearing the interrupt signal in the activity detection result.
If the interrupt signal in the activity detection result is determined to be the false detection interrupt signal, the subsequent voice recognition is not needed, and the interrupt signal in the activity detection result is cleared to reduce the memory pressure.
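A minimal sketch of this handling step follows. It assumes the pending interrupt signals are held in a mutable list and that waking the recognizer is a callback; both, along with the function name, are assumptions for illustration rather than details from the source.

```python
from typing import Callable, List


def handle_activity_result(pending_interrupts: List[float],
                           condition_met: bool,
                           wake_recognizer: Callable[[], None]) -> bool:
    """Wake the recognizer on a valid result, otherwise discard the interrupts.

    Returns True when the speech recognition function was woken up.
    """
    if condition_met:
        wake_recognizer()
        return True
    # The interrupts are treated as false detections: no recognition is
    # started and the stored interrupt records are cleared to free memory.
    pending_interrupts.clear()
    return False


pending = [3.0, 41.0]
woken = handle_activity_result(pending, condition_met=False,
                               wake_recognizer=lambda: print("recognizer awake"))
print(woken, pending)  # → False []
```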
The following describes a wake-up device, a storage medium, etc. for performing a voice recognition function provided by the present application, and specific implementation processes and technical effects thereof are referred to above, which are not described in detail below.
Fig. 6 is a schematic diagram of a wake-up device with a voice recognition function according to an embodiment of the present application, where the device includes:
The acquiring module 601 is configured to acquire an activity detection result of a voice signal, where the activity detection result includes: a plurality of interrupt signals generated by detecting the voice signal a plurality of times.
The statistics module 602 is configured to perform statistics on a plurality of interrupt signals.
The judging module 603 is configured to judge whether the activity detection result meets a preset valid interrupt condition according to the statistical result.
The determining module 604 is configured to determine that the voice signal is in an active state if the activity detection result meets the preset valid interrupt condition, and wake up the voice recognition function to recognize the collected voice signal.
Further, the statistics module 602 is specifically configured to count the number of interrupt signals within a preset duration.
Further, the judging module 603 is specifically configured to judge whether the activity detection result meets the preset valid interrupt condition according to the number of interrupt signals within the preset duration.
Further, the judging module 603 is specifically configured to judge whether the number of interrupt signals within the preset duration reaches a preset interrupt number threshold; if the number of interrupt signals within the preset duration reaches the preset interrupt number threshold, determine that the activity detection result meets the preset valid interrupt condition; and if the number of interrupt signals within the preset duration does not reach the preset interrupt number threshold, determine that the activity detection result does not meet the preset valid interrupt condition.
Further, the judging module 603 is specifically configured to determine the preset interrupt number threshold according to a preset application scene.
Further, the statistics module 602 is specifically configured to determine at least two consecutive interrupt signals, among the plurality of interrupt signals, for which the time difference between adjacent interrupt signals is within a preset duration range; and determine the continuous interrupt duration according to the duration of the at least two consecutive interrupt signals.
Further, the judging module 603 is specifically configured to judge whether the activity detection result meets the preset valid interrupt condition according to the continuous interrupt duration.
Further, the judging module 603 is specifically configured to judge whether the continuous interrupt duration reaches a preset interrupt duration threshold; if the continuous interrupt duration reaches the preset interrupt duration threshold, determine that the activity detection result meets the preset valid interrupt condition; and if the continuous interrupt duration does not reach the preset interrupt duration threshold, determine that the activity detection result does not meet the preset valid interrupt condition.
Further, the judging module 603 is specifically configured to determine the preset interrupt duration threshold according to a preset application scene.
Further, the determining module 604 is further configured to determine that the interrupt signal in the activity detection result is a false detection interrupt signal if the activity detection result does not meet the preset valid interrupt condition, and clear the interrupt signal in the activity detection result.
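The division into modules 601 to 604 can be illustrated with the following sketch, which uses the count-based variant of the statistics. The class name, method names, the choice to end the counting window at the most recent interrupt, and the threshold value of 3 used in the example are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class WakeupDevice:
    window_ms: float                       # preset duration used for counting
    count_threshold: int                   # preset interrupt number threshold
    wake_recognizer: Callable[[], None]    # callback that starts recognition
    interrupts_ms: List[float] = field(default_factory=list)

    def acquire(self, timestamps_ms: Iterable[float]) -> None:
        """Acquiring module 601: store the interrupt timestamps."""
        self.interrupts_ms = sorted(timestamps_ms)

    def count_in_window(self) -> int:
        """Statistics module 602: count interrupts in the window that ends
        at the most recent interrupt (window placement is an assumption)."""
        if not self.interrupts_ms:
            return 0
        latest = self.interrupts_ms[-1]
        return sum(1 for t in self.interrupts_ms if latest - t <= self.window_ms)

    def judge(self) -> bool:
        """Judging module 603: compare the count with the threshold."""
        return self.count_in_window() >= self.count_threshold

    def determine(self) -> bool:
        """Determining module 604: wake the recognizer or clear false detections."""
        if self.judge():
            self.wake_recognizer()
            return True
        self.interrupts_ms.clear()
        return False


device = WakeupDevice(window_ms=50, count_threshold=3,
                      wake_recognizer=lambda: print("recognizer awake"))
device.acquire([0, 100, 120, 140])   # three interrupts fall in the last 50 ms
print(device.determine())
```

Swapping `count_in_window` for a continuous-duration statistic would give the duration-based variant without changing the other three methods.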
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA), etc. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 7 is a schematic diagram of a speech processing device according to an embodiment of the present application, where the speech processing device may be a device with a computing function.
The voice processing apparatus includes: a processor 701, and a storage medium 702. The processor 701 and the storage medium 702 are connected by a bus.
The storage medium 702 is used to store a program, and the processor 701 calls the program stored in the storage medium 702 to execute the above-described method embodiment. The specific implementation manner and the technical effect are similar, and are not repeated here.
Optionally, the present invention also provides a program product, such as a computer readable storage medium, comprising a program for performing the above-described method embodiments when being executed by a processor.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform some of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, etc.

Claims (9)

1. A method for waking up a speech recognition function, the method comprising:
acquiring an activity detection result of a voice signal, wherein the activity detection result comprises: detecting a plurality of interrupt signals generated by the voice signal for a plurality of times;
Counting the plurality of interrupt signals;
judging whether the activity detection result meets a preset effective interruption condition according to the statistical result;
If the activity detection result meets the preset effective interruption condition, determining that the voice signal is in an active state, and waking up a voice recognition function to recognize the collected voice signal;
The counting the plurality of interrupt signals includes:
Determining at least two continuous interrupt signals of which the time difference between adjacent interrupt signals in the interrupt signals is within a preset duration range;
determining continuous interrupt duration according to the duration of the continuous at least two interrupt signals;
judging whether the activity detection result meets a preset effective interruption condition according to the statistical result, including:
And judging whether the activity detection result meets the preset effective interrupt condition according to the continuous interrupt duration.
2. The method of claim 1, wherein said counting said plurality of interrupt signals comprises:
Counting the number of interrupt signals in a preset time period;
judging whether the activity detection result meets a preset effective interruption condition according to the statistical result, including:
And judging whether the activity detection result meets the preset effective interrupt condition according to the interrupt signal quantity in the preset duration.
3. The method according to claim 2, wherein the determining whether the activity detection result meets the preset valid interrupt condition according to the number of interrupt signals in the preset duration includes:
Judging whether the number of the interrupt signals in the preset duration reaches a preset interrupt number threshold value or not;
if the number of the interrupt signals in the preset duration reaches the preset interrupt number threshold, determining that the activity detection result meets the preset effective interrupt condition;
if the number of the interrupt signals in the preset duration does not reach the preset interrupt number threshold, determining that the activity detection result does not meet the preset effective interrupt condition.
4. The method of claim 3, wherein said determining whether the number of interrupt signals within the predetermined time period reaches a predetermined interrupt number threshold comprises:
and determining the threshold value of the preset interruption quantity according to a preset application scene.
5. The method according to claim 1, wherein the determining whether the activity detection result meets the preset valid interrupt condition according to the continuous interrupt duration includes:
Judging whether the continuous interruption time length reaches a preset interruption time length threshold value or not;
If the continuous interruption time length reaches the preset interruption time length threshold value, determining that the activity detection result meets the preset effective interruption condition;
And if the continuous interruption time does not reach the preset interruption time threshold, determining that the activity detection result does not meet the preset effective interruption condition.
6. The method of claim 5, wherein the determining whether the continuous interruption time length reaches a preset interruption time length threshold value comprises:
and determining the preset interrupt duration threshold according to a preset application scene.
7. The method according to any one of claims 1-6, further comprising:
If the activity detection result does not meet the preset effective interrupt condition, determining that the interrupt signal in the activity detection result is a false detection interrupt signal, and clearing the interrupt signal in the activity detection result.
8. A wake-up device for a speech recognition function, the device comprising:
An acquisition module for acquiring the activity detection result of the voice signal, the activity detection result includes: detecting a plurality of interrupt signals generated by the voice signal for a plurality of times;
The statistics module is used for counting the plurality of interrupt signals;
the judging module is used for judging whether the activity detection result meets a preset effective interruption condition according to the statistical result;
the determining module is used for determining that the voice signal is in an active state and waking up a voice recognition function to recognize the collected voice signal if the activity detection result meets the preset effective interrupt condition;
the statistics module is specifically configured to determine at least two continuous interrupt signals, where a time difference between adjacent interrupt signals in the plurality of interrupt signals is within a preset duration range; determining continuous interrupt duration according to the duration of the continuous at least two interrupt signals;
the judging module is specifically configured to judge whether the activity detection result meets the preset effective interrupt condition according to the continuous interrupt duration.
9. A speech processing apparatus, comprising: a processor and a storage medium, wherein the processor is in communication connection with the storage medium through a bus, the storage medium stores program instructions executable by the processor, and the processor calls the program instructions stored in the storage medium to execute the steps of the wake-up method of the voice recognition function according to any one of claims 1 to 7.
CN202210735039.2A 2022-06-27 2022-06-27 Method, device and equipment for waking up voice recognition function Active CN115116441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210735039.2A CN115116441B (en) 2022-06-27 2022-06-27 Method, device and equipment for waking up voice recognition function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210735039.2A CN115116441B (en) 2022-06-27 2022-06-27 Method, device and equipment for waking up voice recognition function

Publications (2)

Publication Number Publication Date
CN115116441A CN115116441A (en) 2022-09-27
CN115116441B true CN115116441B (en) 2024-10-22

Family

ID=83330079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210735039.2A Active CN115116441B (en) 2022-06-27 2022-06-27 Method, device and equipment for waking up voice recognition function

Country Status (1)

Country Link
CN (1) CN115116441B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641882B (en) * 2022-10-27 2023-05-12 深圳市移文科技有限公司 Intelligent starting method and device for recording of wearable equipment and wearable equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110858488A (en) * 2018-08-24 2020-03-03 阿里巴巴集团控股有限公司 Voice activity detection method, device, equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000046789A1 (en) * 1999-02-05 2000-08-10 Fujitsu Limited Sound presence detector and sound presence/absence detecting method
JP4441125B2 (en) * 1999-03-25 2010-03-31 サゲム エス エー Voice detection device for detecting a voice signal in an input signal
US7263074B2 (en) * 1999-12-09 2007-08-28 Broadcom Corporation Voice activity detection based on far-end and near-end statistics
JP4601970B2 (en) * 2004-01-28 2010-12-22 株式会社エヌ・ティ・ティ・ドコモ Sound / silence determination device and sound / silence determination method
JP4490090B2 (en) * 2003-12-25 2010-06-23 株式会社エヌ・ティ・ティ・ドコモ Sound / silence determination device and sound / silence determination method
CN105261375B (en) * 2014-07-18 2018-08-31 中兴通讯股份有限公司 Activate the method and device of sound detection
CN110265036A (en) * 2019-06-06 2019-09-20 湖南国声声学科技股份有限公司 Voice awakening method, system, electronic equipment and computer readable storage medium
CN114242064B (en) * 2021-12-31 2025-10-10 科大讯飞股份有限公司 Speech recognition method and device, speech recognition model training method and device
CN114495907B (en) * 2022-01-27 2024-08-13 多益网络有限公司 Adaptive voice activity detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115116441A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN109036428A (en) Voice wake-up device and method and computer readable storage medium
CN112652306B (en) Voice wakeup method, voice wakeup device, computer equipment and storage medium
CN109672775B (en) Method, device and terminal for adjusting awakening sensitivity
CN110349579B (en) Voice wake-up processing method and device, electronic equipment and storage medium
CN110968353A (en) Central processing unit awakening method and device, voice processor and user equipment
CN110853644B (en) Voice wake-up method, device, equipment and storage medium
CN103971681A (en) Voice recognition method and system
EP3823310A1 (en) Microphone hole clogging detection method and related products
CN115116441B (en) Method, device and equipment for waking up voice recognition function
EP3755107A1 (en) Method and device for transmitting synchronization signal block, and storage medium
CN112233676A (en) Intelligent device awakening method and device, electronic device and storage medium
CN112289311B (en) Voice wakeup method and device, electronic equipment and storage medium
CN112073862A (en) Audible keyword detection and method
CN106095566B (en) Response control method and mobile terminal
CN115171690B (en) Control method, device, equipment and storage medium of speech recognition device
CN112289336A (en) Audio signal processing method and device
CN110956968A (en) Voice wake-up and method, device and terminal device for triggering voice wake-up function
CN113918757B (en) Application recommendation method and device, electronic equipment and storage medium
CN106990830A (en) A kind of method for information display and device
CN114333017A (en) Dynamic pickup method and device, electronic equipment and storage medium
CN110189763B (en) Sound wave configuration method and device and terminal equipment
CN116386676B (en) Voice awakening method, voice awakening device and storage medium
CN112469111B (en) Wireless communication method and device based on LoRa and gateway equipment
CN104768132B (en) A kind of call detection method and detection device of conversing
CN110600060A (en) Hardware audio active detection HVAD system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant