Detailed Description
In order to better understand the technical solutions in the embodiments of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the present application, shall fall within the scope of protection of the embodiments of the present application.
In current voice wake-up devices, a single low-performance processor core is generally used to execute the wake-up function so that the system can wake up quickly when in use. However, when the hardware or platform has no such low-performance processor core, the wake-up function can only run on a higher-performance processor, which greatly wastes system power. In view of this, the embodiments of the present application provide a wake-up scheme for a voice recognition device that saves more power.
The following further describes specific implementations of embodiments of the present application in conjunction with the accompanying drawings of embodiments of the present application. As shown in fig. 1, fig. 1 is a flow chart of a wake-up method of a voice recognition device according to an embodiment of the present application, which specifically includes:
S101, when receiving an activation signal sent by the vibration sensor after it senses a vibration signal, the processor changes from the dormant state into the working state.
As previously mentioned, in the conventional approach at least some of the processor cores need to be kept in an operational state. In the scenario according to the embodiments of the present application, the voice recognition device is already in a dormant state, at which point all processors and the corresponding other modules in the voice recognition device are also in a dormant state.
For example, after the user gets out of the vehicle, if the user may come back within a short time, the user may set the control center of the carrying device to a sleep state. At this time, the processor included in the control center of the carrying device and other modules, such as the voice wake-up module in the intelligent sound box, can all enter the sleep state at the same time.
At this time, however, the vibration sensor (e.g., a gyro sensor) included in the carrying device is still operating normally. The vibration sensor is a simple and easy-to-use control device based on free-space movement positioning, which is capable of accurately determining the orientation of the carrying device. In addition, it can sensitively sense vibration signals generated by the surrounding environment, including a vibration signal generated by the vibration of air and a vibration signal caused by the shaking of the vehicle in which the gyroscope is located. For example, the vibration sensor may sample at a preset sampling frequency (e.g., 150 Hz), where each sampling obtains one sampling point and each sampling point includes information about the acceleration characteristics of the vibration sensor and information about the vibration signal. Multiple sampling points within a continuous preset duration are regarded as one sampling space (e.g., each 1 s interval is regarded as one sampling space, so at a sampling frequency of 150 Hz there are 150 sampling points per sampling space), and the vibration signal corresponding to a sampling space is obtained by simulation based on the information of the multiple sampling points in that sampling space.
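As a non-limiting illustration, the grouping of sampling points into sampling spaces described above might be sketched as follows. The names `SamplePoint` and `split_into_sampling_spaces`, the field layout of a sampling point, and the choice to drop an incomplete trailing space are assumptions made for this sketch and are not specified by the embodiment.

```python
from dataclasses import dataclass
from typing import List

SAMPLING_FREQUENCY_HZ = 150  # example value from the description
SPACE_DURATION_S = 1.0       # example: each 1 s interval is one sampling space


@dataclass
class SamplePoint:
    """One sampling point (hypothetical layout): three acceleration
    components in m/s^2 plus one vibration reading."""
    ax: float
    ay: float
    az: float
    vibration: float


def split_into_sampling_spaces(points: List[SamplePoint],
                               frequency_hz: int = SAMPLING_FREQUENCY_HZ,
                               duration_s: float = SPACE_DURATION_S
                               ) -> List[List[SamplePoint]]:
    """Group consecutive sampling points into complete sampling spaces.

    An incomplete trailing space is dropped; this is an assumption of the
    sketch, not a requirement of the embodiment.
    """
    per_space = int(round(frequency_hz * duration_s))
    full_spaces = len(points) // per_space
    return [points[i * per_space:(i + 1) * per_space]
            for i in range(full_spaces)]
```

With the example values, 320 consecutive sampling points yield two sampling spaces of 150 points each, and the remaining 20 points wait for the next complete space.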
The vibration signal generated by the surrounding environment may be a voice signal generated when the user speaks, or a sound signal generated when the user opens a door and performs other corresponding operations when entering the vehicle, or a signal representing vibration of the vehicle generated based on shaking of the vehicle caused by the user getting on the vehicle, or a mixed signal generated based on beating or knocking of certain parts on the vehicle (i.e. including the sound signal and the signal representing vibration of the vehicle).
When the vibration sensor detects the vibration signal, it sends a corresponding activation signal to the processor, and the processor then changes from the sleep state into the normal working state so as to process the received vibration signal. For example, the vibration sensor may bring the processor out of sleep through a wake-up pin on the processor to which it is connected (e.g., by a low-to-high level change on the pin).
S103, determining the signal characteristics of the vibration signal.
The vibration signal can be regarded as a time-dependent signal sequence, and its signal characteristics can be obtained by analysis, for example by a fast Fourier transform (FFT).
Specifically, the signal characteristics of the vibration signal include acceleration characteristics of the vibration sensor, or frequency domain characteristics and time domain characteristics of the vibration signal.
The time domain features include waveform indexes, pulse indexes, kurtosis indexes, margin indexes, peak values, vibration durations and the like. The frequency domain features include center-of-gravity frequencies, mean square frequencies, root mean square frequencies, frequency variances, frequency standard deviations, frequency regions and the like. Both the frequency domain features and the time domain features can be obtained based on the FFT.
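As a non-limiting illustration, some of the features listed above might be computed as follows. The embodiment does not give formulas, so the sketch assumes definitions commonly used in vibration analysis (for example, waveform index as RMS divided by mean absolute value). A plain discrete Fourier transform is used in place of an FFT for brevity; it yields the same spectrum. All function names are hypothetical, and the input is assumed to be a non-constant signal.

```python
import cmath
import math
from typing import Dict, List, Tuple


def time_domain_features(x: List[float]) -> Dict[str, float]:
    """Time domain indexes under commonly used (assumed) definitions."""
    n = len(x)
    abs_x = [abs(v) for v in x]
    mean_abs = sum(abs_x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs_x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    root_amplitude = (sum(math.sqrt(a) for a in abs_x) / n) ** 2
    return {
        "waveform_index": rms / mean_abs,
        "pulse_index": peak / mean_abs,
        "kurtosis_index": sum((v - mean) ** 4 for v in x) / n / variance ** 2,
        "margin_index": peak / root_amplitude,
        "peak_to_peak": max(x) - min(x),
    }


def power_spectrum(x: List[float], fs: float) -> Tuple[List[float], List[float]]:
    """One-sided power spectrum via a naive DFT (stand-in for an FFT)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(s) ** 2)
    return freqs, power


def frequency_domain_features(x: List[float], fs: float) -> Dict[str, float]:
    """Power-weighted spectral moments under commonly used (assumed) definitions."""
    freqs, power = power_spectrum(x, fs)
    total = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    mean_square = sum(f * f * p for f, p in zip(freqs, power)) / total
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    return {
        "centroid_frequency": centroid,
        "mean_square_frequency": mean_square,
        "rms_frequency": math.sqrt(mean_square),
        "frequency_variance": variance,
        "frequency_std": math.sqrt(variance),
    }
```

For a pure 10 Hz sine sampled at 150 Hz for one second, the center-of-gravity frequency computed this way is 10 Hz and the frequency variance is close to zero, as expected for a single-tone signal.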
The acceleration characteristics of the vibration sensor include acceleration of the vibration signal corresponding to each sampling space, and may also include three acceleration components corresponding to three axis directions in a space coordinate system pre-established by the vibration sensor, that is, three acceleration components in x, y and z axes, respectively. In addition, the acceleration characteristic of the vibration sensor may further include three acceleration components corresponding to each sampling point.
S105, if the vibration signal is determined to be an effective signal according to the signal characteristics of the vibration signal, waking up the voice wake-up module in the dormant state to perform voice recognition, otherwise, enabling the processor to resume the dormant state.
Specifically, whether the vibration signal is a valid signal is determined by whether the signal characteristics of the vibration signal meet certain preset conditions, and the preset conditions can be set differently for different scenarios.
For example, in a scenario where the automatic control module is to be activated by the door-closing sound made when the user gets into the vehicle, the sound signal generated by the user closing the door should be regarded as a valid signal. Since the sound vibration generated by closing the door is generally short and low-frequency, a short pulse time, a short vibration duration and a large peak-to-peak value can be selected as the time domain conditions, and a low frequency variance and a low frequency region can be selected as the frequency domain conditions that the vibration signal must satisfy.
For another example, if the corresponding intelligent voice function is to be awakened by a voice command after the user gets into the vehicle, for example if the intelligent sound box is to be awakened by a voice command, then the user's voice command should be regarded as a valid signal. Because the sound vibration generated by a voice command lies in the normal speech frequency range and has a medium duration, a medium vibration duration and a larger peak-to-peak value can be selected as the time domain conditions, and a lower frequency variance and a medium frequency region can be selected as the frequency domain conditions that the vibration signal must satisfy.
For another example, in a scenario where the system needs to work normally once the user has got into the vehicle, the vibration generated by the user getting in should be regarded as a valid signal. In practice, the corresponding vibration signal has a longer duration and a lower frequency, so a longer vibration duration can be selected as the time domain condition and a lower frequency region as the frequency domain condition that the vibration signal must satisfy.
For another example, consider a scenario where the system needs to work normally based on a mixed signal generated by the user beating or knocking on certain parts of the vehicle after getting in. The sound signal and the signal representing shaking are separated by FFT-based signal processing so that each is obtained individually. A shorter pulse time, a larger peak value and a larger frequency variance are used as the preset conditions that the mixed signal must satisfy, and conditions can also be set separately for the sound signal and for the signal representing shaking.
In other words, the criteria for selecting a valid signal differ between scenarios, so the time domain and frequency domain features that are selected, and their ranges, depend on the scenario, and not all variations are listed here.
Furthermore, the acceleration characteristics of the vibration sensor may also differ between scenarios. For example, in a voice wake-up scenario, the acceleration caused by a vibration signal arising from the air vibration of the user's speech is typically not large, so a second preset condition comprising a lower acceleration threshold, for example 0.30 m/s², may be set. That is, when the acceleration of the vibration sensor exceeds 0.30 m/s², the vibration signal is determined to be a valid signal.
In the case of waking up the device with the door-closing sound generated when the user gets into the vehicle, the generated vibration signal is generally larger and shorter, so a second preset condition comprising a larger acceleration threshold, for example 2 m/s², may be set. That is, when the acceleration of the vibration sensor exceeds 2 m/s², the vibration signal is determined to be a valid signal (i.e., the user has possibly opened the door to get in).
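As a non-limiting illustration, the scenario-dependent acceleration thresholds described above might be expressed as a small lookup. The scenario keys `"voice_wake"` and `"door_close"` and the function name are hypothetical; the threshold values are the example values given above.

```python
# Example thresholds in m/s^2, taken from the two scenarios described above.
# The dictionary keys are hypothetical labels chosen for this sketch.
SCENE_ACCELERATION_THRESHOLDS = {
    "voice_wake": 0.30,
    "door_close": 2.0,
}


def is_valid_by_acceleration(acceleration: float, scene: str) -> bool:
    """Return True when the acceleration exceeds the threshold of the scene."""
    return acceleration > SCENE_ACCELERATION_THRESHOLDS[scene]
```

An acceleration of 0.5 m/s² would thus count as a valid signal in the voice wake-up scenario but not in the door-closing scenario.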
If a vibration signal is determined to be a valid signal, it indicates that the system is ready to enter a normal operation state at this time, so that the processor will be activated first, i.e. the processor will maintain the normal operation state, and send a relevant activation signal to the corresponding voice wake-up module according to the requirements of the scenario, so that the corresponding voice wake-up module also enters the normal operation state from the sleep state, in preparation for subsequent voice recognition.
In addition, when the accuracy of the vibration sensor is high, the foregoing vibration signal may be used to perform voice recognition when the voice wake-up module performs subsequent voice recognition, so as to implement faster voice recognition.
If a vibration signal is not a valid signal, then the processor will re-enter the sleep state and will not send an activation signal to the voice wake-up module.
According to the scheme provided by the embodiments of the present application, when the voice recognition device enters the dormant state, the processor and the other modules enter the dormant state together. When the activation signal sent by the vibration sensor upon sensing a vibration signal is received, the processor in the dormant state is awakened and determines the signal characteristics of the vibration signal. If the vibration signal is determined to be an effective signal according to its signal characteristics, the voice wake-up module in the dormant state is awakened to perform voice recognition; otherwise, the processor resumes the dormant state. In the embodiments of the present application, the vibration signal in the environment is detected in advance by the vibration sensor, the processor enters and maintains the normal working state only when the vibration signal is an effective signal, and the processor resumes the dormant state when the vibration signal is an ineffective signal, so that the power consumption of the system is effectively saved.
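As a non-limiting illustration, the flow of steps S101, S103 and S105 might be sketched as a small state machine. The class and method names (`Processor`, `VoiceWakeModule`, `on_activation_signal`) and the use of injected callables for feature extraction and validity checking are assumptions of this sketch, not part of the embodiment.

```python
from enum import Enum
from typing import Any, Callable, Sequence


class State(Enum):
    DORMANT = "dormant"
    WORKING = "working"


class VoiceWakeModule:
    """Hypothetical stand-in for the voice wake-up module."""

    def __init__(self) -> None:
        self.state = State.DORMANT

    def wake(self) -> None:
        self.state = State.WORKING


class Processor:
    """Hypothetical processor model following steps S101, S103 and S105."""

    def __init__(self,
                 extract_features: Callable[[Sequence[float]], Any],
                 is_valid: Callable[[Any], bool],
                 voice_module: VoiceWakeModule) -> None:
        self.state = State.DORMANT
        self.extract_features = extract_features
        self.is_valid = is_valid
        self.voice_module = voice_module

    def on_activation_signal(self, vibration_signal: Sequence[float]) -> bool:
        # S101: the activation signal brings the processor out of the dormant state.
        self.state = State.WORKING
        # S103: determine the signal characteristics of the vibration signal.
        features = self.extract_features(vibration_signal)
        # S105: wake the voice wake-up module for a valid signal,
        # otherwise the processor resumes the dormant state.
        if self.is_valid(features):
            self.voice_module.wake()
            return True
        self.state = State.DORMANT
        return False
```

In this sketch the voice wake-up module is never touched for an invalid signal, which mirrors the power-saving behaviour described above.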
In one embodiment, the processor may only need to use the acceleration characteristic of the vibration sensor as the signal characteristic of the vibration signal when determining the signal characteristic of the vibration signal. The acceleration characteristic of the vibration sensor can be directly obtained through signal acquisition by the vibration sensor (such as a gyroscope sensor) with corresponding functions, and the signal characteristic of the vibration signal can be obtained more quickly and effectively by adopting the method.
In one embodiment, based on the applied scenario, a further determination of the effective signal may be made based on the acceleration characteristics of the vibration sensor.
Specifically, the acceleration characteristic of the vibration sensor may be decomposed, so as to obtain three acceleration components (i.e., three acceleration components in the x, y, and z three-axis directions) corresponding to the acceleration characteristic of the vibration sensor in three directions of the preset space. As shown in fig. 2, fig. 2 is a schematic diagram of an acceleration component according to an embodiment of the present application.
If only one of the three acceleration components in the x, y and z directions has a value greater than the acceleration threshold (e.g., 0.25 m/s²), the vibration signal is considered to be an invalid signal. Only when the values of two or three acceleration components are greater than the acceleration threshold is the vibration signal considered to be a valid signal. The specific value of the acceleration threshold may be determined empirically based on the needs of the scenario. Because the vibration signals generated in different scenarios differ, this allows the valid signals in each scenario to be determined more accurately, which avoids invalid wake-ups and further saves power.
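As a non-limiting illustration, the rule that at least two of the three acceleration components must exceed the threshold might be sketched as follows. The function names are hypothetical, and comparing the absolute value of each component (so that direction along an axis is ignored) is an assumption of the sketch.

```python
ACCELERATION_THRESHOLD = 0.25  # m/s^2, example value from the description


def count_components_over_threshold(ax: float, ay: float, az: float,
                                    threshold: float = ACCELERATION_THRESHOLD) -> int:
    """Count the components whose magnitude exceeds the threshold.

    Using the absolute value is an assumption of this sketch.
    """
    return sum(1 for c in (ax, ay, az) if abs(c) > threshold)


def is_valid_by_components(ax: float, ay: float, az: float,
                           threshold: float = ACCELERATION_THRESHOLD) -> bool:
    """A signal is valid only if two or three components exceed the threshold."""
    return count_components_over_threshold(ax, ay, az, threshold) >= 2
```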
In one embodiment, if at least two acceleration components exceed the acceleration threshold, whether the vibration signal is a valid signal may be further determined based on the information of the sampling points. For example, at a sampling frequency of 150 Hz there are 150 sampling points in each 1-second sampling space, so the multiple sampling points (i.e., 150 sampling points) in the sampling space and the three acceleration components corresponding to each sampling point may be acquired. Whether a sampling point is a valid sampling point is then judged based on the acceleration components of that sampling point. As shown in fig. 3, fig. 3 is a schematic diagram of a sampling space provided by an embodiment of the present application, where one sampling space includes a plurality of sampling points arranged in order of sampling time, and the sampling durations of the sampling spaces are equal, for example, all 2 s.
For each sampling point, if at least two acceleration components of the sampling point exceed the acceleration threshold, the sampling point is determined to be a valid sampling point. Further, whether the vibration signal is a valid signal may be determined based on the number of valid sampling points.
For example, when the values of two or more of the acceleration components in the x, y and z axes are greater than 0.25 m/s², the number of valid sampling points among the 150 sampling points acquired within 1000 milliseconds is determined. If the number of valid sampling points exceeds a preset value or a preset proportion, for example if the number exceeds 120 or the ratio exceeds 80%, the vibration signal is considered to be a valid signal; otherwise, the vibration signal is considered to be an invalid signal.
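As a non-limiting illustration, the judgment based on the proportion of valid sampling points might be sketched as follows. Each sampling point is represented here as a tuple of three acceleration components; this representation, the function names, and the use of absolute values are assumptions of the sketch. The threshold and ratio are the example values given above.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (ax, ay, az) in m/s^2, hypothetical layout


def is_valid_sampling_point(point: Point, threshold: float = 0.25) -> bool:
    """A sampling point is valid if at least two components exceed the threshold."""
    return sum(1 for c in point if abs(c) > threshold) >= 2


def is_valid_sampling_space(points: Sequence[Point],
                            threshold: float = 0.25,
                            min_ratio: float = 0.8) -> bool:
    """The signal is valid if the share of valid sampling points exceeds min_ratio."""
    valid = sum(1 for p in points if is_valid_sampling_point(p, threshold))
    return valid / len(points) > min_ratio
```

With 150 sampling points per sampling space, a ratio of 80% corresponds to the count of 120 mentioned above, so the two example criteria coincide.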
In one embodiment, on the basis of using the acceleration characteristic of the vibration sensor, the frequency domain characteristic and the time domain characteristic of the vibration signal can be used to comprehensively determine whether the vibration signal is a valid signal. For example, the center of gravity frequency, the frequency standard deviation, the frequency region in the frequency domain feature, the waveform index, the pulse index, the peak-to-peak value, the vibration duration and the like in the time domain feature can be used for comprehensively judging whether one vibration signal is an effective signal, and the judgment on the effective signal can be more accurate by using the frequency domain feature, the time domain feature and the acceleration feature of the vibration sensor of the vibration signal.
In one embodiment, a specific voice signal sent by the user (i.e., a voice signal containing a wake-up word) is generally used as the wake-up signal for voice recognition by the voice recognition device. Since a user's speech has very little low-frequency and high-frequency content, the judgment condition for a valid signal can be set as follows: the frequency of the vibration signal is within the common frequency range of normal human speech, and the length of the vibration signal does not exceed the pronunciation duration corresponding to the wake-up word. A vibration signal meeting this condition is regarded as a valid voice signal.
Then, the processor can maintain its normal working state and send an activation signal to the voice wake-up module in the intelligent sound box, so that the voice wake-up module in the intelligent sound box can enter the normal working state and process the voice signal (including voice recognition, man-machine conversation, etc.), and further realize voice interaction with the user based on the processor, the voice wake-up module and even the server.
In the case of voice wake-up, a person usually speaks at about 100 to 200 words per minute, that is, about 0.3 to 0.6 seconds per word, or approximately 0.45 seconds per word on average. Common voice wake-up words contain two or more words (about 0.9 seconds at this rate), so the duration of a valid vibration signal should not be lower than 1 second. At the same time, with the bass and treble parts removed, the speaking frequency of a normal person is approximately between 80 Hz and 400 Hz, and the vibration produced by the sound signal is not large.
Therefore, the preset condition may be set such that the frequency of the vibration signal is within [80 Hz, 400 Hz], the duration of the vibration signal is not less than 1 second, and at least two of the acceleration components of the vibration sensor in the three directions exceed 0.25 m/s². When the vibration signal satisfies this preset condition, the vibration signal may be determined to be a valid signal. Further, the processor may send an activation signal to the voice wake-up module to bring the voice wake-up module out of sleep so that it performs subsequent voice recognition.
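As a non-limiting illustration, the combined preset condition for the voice wake-up scenario might be sketched as follows. The `VibrationSummary` structure, its field names, and the idea of summarising the signal by a single frequency value are assumptions of this sketch; the embodiment does not specify how the frequency of the vibration signal is reduced to one number.

```python
from dataclasses import dataclass


@dataclass
class VibrationSummary:
    """Hypothetical summary of one vibration signal."""
    frequency_hz: float  # representative frequency of the signal (assumed)
    duration_s: float    # duration of the vibration signal
    ax: float            # acceleration components in m/s^2
    ay: float
    az: float


def meets_voice_wake_condition(s: VibrationSummary,
                               min_hz: float = 80.0,
                               max_hz: float = 400.0,
                               min_duration_s: float = 1.0,
                               threshold: float = 0.25) -> bool:
    """Check the three example conditions: frequency band, duration, and
    at least two acceleration components above the threshold."""
    in_band = min_hz <= s.frequency_hz <= max_hz
    long_enough = s.duration_s >= min_duration_s
    components_over = sum(1 for c in (s.ax, s.ay, s.az) if abs(c) > threshold)
    return in_band and long_enough and components_over >= 2
```

All three conditions must hold together; a signal in the speech band that lasts only half a second, for example, is rejected.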
In this embodiment, when the driver or passengers do not need voice interaction, the intelligent sound box and the processor can be put into a sleep state. Only when the vibration sensor detects a valid voice signal does the processor maintain the normal working state and wake up the corresponding function of the intelligent sound box. If the detected voice signal is an invalid signal (in practice, the driver and passengers may often have occasional conversations), the voice wake-up module in the intelligent sound box is not woken up and the processor re-enters the sleep state, so that power consumption can be greatly reduced.
In one embodiment, before the processor determines whether the vibration signal is a valid signal, noise filtering may further be performed on the vibration signal. That is, spectrum analysis is performed on the vibration signal, and a vibration signal whose amplitude does not exceed a preset amplitude value and/or whose vibration time does not exceed a preset duration is treated as noise. For example, in the voice wake-up scenario, a vibration signal with an amplitude not exceeding 1 mm and a vibration time not exceeding 0.2 s is regarded as noise and filtered out, so that jitter in the vibration signal is removed and the recognition accuracy of valid signals in the subsequent steps is improved. For example, the vibration signal in fig. 3 may be subjected to corresponding amplitude filtering to remove the flatter, smaller-amplitude background noise in the drawing, so as to improve the subsequent recognition accuracy.
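As a non-limiting illustration, the noise filtering rule might be sketched as follows. Each vibration segment is represented as a pair of amplitude in millimetres and duration in seconds; this representation, the function names, and the `require_both` switch used to model the "and/or" in the description are assumptions of the sketch. The limits are the example values given above.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (amplitude_mm, duration_s), hypothetical layout


def is_noise(amplitude_mm: float, duration_s: float,
             max_amplitude_mm: float = 1.0,
             max_duration_s: float = 0.2,
             require_both: bool = True) -> bool:
    """Classify a segment as noise.

    require_both=True treats a segment as noise only if it is both small and
    short; require_both=False treats either condition alone as sufficient.
    """
    small = amplitude_mm <= max_amplitude_mm
    short = duration_s <= max_duration_s
    return (small and short) if require_both else (small or short)


def filter_noise(segments: List[Segment], **kwargs) -> List[Segment]:
    """Keep only the segments that are not classified as noise."""
    return [s for s in segments if not is_noise(s[0], s[1], **kwargs)]
```

Only the segments that survive this filter would then be passed on to the valid-signal judgment.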
In a second aspect of the embodiment of the present application, as shown in fig. 4, fig. 4 is a schematic structural diagram of an electronic device provided in the embodiment of the present application, and the specific embodiment of the present application is not limited to specific implementation of the electronic device.
As shown in fig. 4, the electronic device may include a gyro sensor 1000, a processor 1002 and a voice wake-up module 1003, wherein:
The gyro sensor 1000 senses vibration signals generated in the environment and generates an activation signal.
The processor 1002 receives the activation signal generated by the gyro sensor, changes from the sleep state into the working state, determines the characteristics corresponding to the vibration signal, and determines whether the vibration signal is an effective signal according to those characteristics. The processor maintains the normal working state if the vibration signal is an effective signal, and re-enters the sleep state if the vibration signal is not an effective signal.
The voice wake-up module 1003 performs voice recognition after being woken up from the sleep state.
The electronic equipment provided by the embodiment of the application can be used as accessories in various other entities, such as mobile phones, intelligent home appliances, intelligent vehicles and the like.
Other relevant components for implementing the foregoing functions may also be included in the electronic device, such as a communication interface 1004, memory 1006, and communication bus 1008.
Wherein:
The gyro sensor 1000, the processor 1002, the voice wake-up module 1003, the communication interface 1004, and the memory 1006 communicate with each other through the communication bus 1008.
Communication interface 1004 is used to communicate with other electronic devices or servers.
The processor 1002 is configured to execute the program 1010, and may specifically perform relevant steps in the foregoing wake-up method embodiment of the processor.
In particular, program 1010 may include program code including computer operating instructions.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the smart device may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.
The memory 1006 is used for storing the program 1010. The memory 1006 may include high-speed RAM and may further include non-volatile memory, such as at least one magnetic disk memory.
The program 1010 may be specifically configured to cause the processor 1002 to perform operations corresponding to a wake-up method of a processor as in the first aspect.
The specific implementation of each step in the program 1010 may refer to corresponding steps and corresponding descriptions in the units in the wake-up method embodiment of the processor, which are not described herein. It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and modules described above may refer to corresponding procedure descriptions in the foregoing method embodiments, which are not repeated herein.
The embodiment of the application discloses TS8, a voice recognition device, which comprises:
a vibration sensor, which senses a vibration signal generated in an environment and generates an activation signal;
a processor, which receives the activation signal generated by the vibration sensor, changes from the dormant state into the working state, determines the signal characteristics of the vibration signal, wakes up a voice wake-up module in the dormant state if the vibration signal is determined to be an effective signal according to the signal characteristics of the vibration signal, and otherwise resumes the dormant state;
and the voice wake-up module, which performs voice recognition after being woken up from the dormant state.
TS9, the voice recognition device according to TS8, wherein the processor acquires three acceleration components corresponding to acceleration characteristics of the vibration sensor in three directions of a preset space, and determines the vibration signal to be an effective signal when at least two acceleration components exceed an acceleration threshold.
TS10, the voice recognition device according to TS9, wherein the processor determines the vibration signal to be an effective signal when the frequency of the vibration signal is within [80 Hz, 400 Hz], the duration of the vibration signal is not less than 1 second, and at least two acceleration components corresponding to the vibration sensor in three directions exceed 0.25 m/s².
In a third aspect of the embodiments of the present application, there is also provided a carrying device on which the speech recognition device as described above is arranged.
The embodiment of the application also discloses TS11, a carrying device, on which the voice recognition device according to any one of TS8 to TS10 is arranged.
For example, an intelligent sound box including a voice wake-up module may be configured on an automobile, and a control center of the automobile may provide functions of a processor of the voice recognition device (i.e., the control center of the automobile is a processor of the voice recognition device), while a gyroscope included on the automobile is a vibration sensor of the voice recognition device.
When the user gets out of the automobile or temporarily sets the intelligent sound box to a dormant state, the control center enters the dormant state, while the gyroscope on the automobile can still sense surrounding sound signals, such as the sound signal generated by opening and closing a car door or the voice signal generated by the user speaking, and generate an activation signal, so that the control center judges whether the voice wake-up module in the intelligent sound box needs to be awakened.
In a fourth aspect of embodiments of the present application, there is also provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a wake-up method of a speech recognition device as described in the first aspect.
According to a fifth aspect of embodiments of the present application, there is provided a computer program product comprising computer instructions that instruct a computing device to perform operations corresponding to the wake-up method of a speech recognition device as described in the first aspect.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media, such as modulated data signals and carrier waves.
From the foregoing description of embodiments, it will be apparent to those skilled in the art that the present embodiments may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be embodied in essence or what contributes to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present specification.
The system, method, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference is made to the description of the method embodiments for relevant points. The above-described device embodiments are merely illustrative: the modules illustrated as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more pieces of software and/or hardware when implementing the embodiments of the present disclosure. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The foregoing is merely a specific implementation of the embodiments of this disclosure. It should be noted that a person skilled in the art may make several improvements and modifications without departing from the principles of the embodiments of this disclosure, and these improvements and modifications should also be considered to fall within the scope of protection of the embodiments of this disclosure.