WO2020140840A1 - Method and apparatus for waking up a wearable device - Google Patents
Method and apparatus for waking up a wearable device
- Publication number
- WO2020140840A1 (PCT/CN2019/129114)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- wearer
- sound signal
- wearable device
- facial
- detected
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/4401—Bootstrapping
- G06F9/4418—Suspend and resume; Hibernate and awake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/015—Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/169—Holistic features and representations, i.e. based on the facial image taken as a whole
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/70—Multimodal biometrics, e.g. combining information from different biometric modalities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2460/00—Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
- H04R2460/13—Hearing devices using bone conduction transducers
Definitions
- The present disclosure relates to the field of electronic devices, and in particular to a method and apparatus for waking up a wearable device.
- With the development of technology, wearable devices are playing an increasingly important role in people's lives. Because of power consumption and battery life concerns, a wearable device is not always in its normal working state. When the user needs it, the wearable device can be awakened into the normal working state by certain means.
- In the prior art, the ways of waking up a wearable device (for example, the wake-up word method) are not natural enough.
- When a wake-up word is used to wake up the wearable device, the user speaks a specific wake-up word, and the wearable device, upon hearing the wake-up word, performs corresponding speech processing to wake itself up.
- This prior-art way of waking up a wearable device is too mechanical and cannot deliver a natural, smooth experience. There is therefore an urgent need for a wake-up method that can wake up a wearable device naturally.
- The present disclosure provides a method and apparatus for waking up a wearable device, which can wake up the wearable device during normal interaction between the wearer and the wearable device, thereby achieving a natural wake-up process and improving the user experience of the wearable device.
- A method for waking up a wearable device includes: determining, based on a detected sound signal, whether the sound signal comes from a wearer of the wearable device; determining, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and waking up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
- Determining whether the sound signal comes from the wearer of the wearable device based on the detected sound signal may include: acquiring the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected; and determining whether the sound signal comes from the wearer based on the biometric information.
- Acquiring the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected may include: acquiring that biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected and the sound signal reaches a first threshold.
- The biometric information may include a muscle electromyographic (EMG) signal of the wearer, and determining whether the sound signal comes from the wearer based on the biometric information may include: determining that the sound signal comes from the wearer when the acquired muscle EMG signal is not lower than a predetermined EMG threshold.
- The biometric information may include facial muscle movement information of the wearer, and determining whether the sound signal comes from the wearer based on the biometric information includes: determining that the sound signal comes from the wearer when the facial muscle movement information indicates that the wearer's vocalization-related facial muscles are moving.
- Acquiring the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected may include: acquiring a facial image of the wearer within that time period; and identifying, based on the facial image, facial muscle movement information of the wearer's vocalization-related facial muscles.
- Acquiring the wearer's biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected may include: acquiring facial structure information of the wearer within that time period; building a facial 3D model of the wearer based on the facial structure information; and detecting, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
- The biometric information may include vocalization-related muscle vibration information of the wearer, and determining whether the sound signal comes from the wearer based on the biometric information may include: determining that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles are vibrating.
- Determining whether the sound signal comes from the wearer of the wearable device based on the detected sound signal may include: recognizing voiceprint information of the detected sound signal; and determining whether the sound signal comes from the wearer based on the wearer's voiceprint features and the recognized voiceprint information.
- The sound signal may be a bone conduction sound signal detected by a bone conduction sound detection device attached to the wearer's head or neck. In that case, determining whether the sound signal comes from the wearer of the wearable device based on the detected sound signal may include: determining that the sound signal is a sound signal from the wearer of the wearable device when the bone conduction sound signal is not lower than a predetermined sound threshold.
- A method for waking up a wearable device includes: acquiring vocalization-related biometric information of a wearer of the wearable device; when the biometric information indicates that the wearer is vocalizing, detecting a sound signal during the wearer's utterance; determining, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and waking up the wearable device when the sound signal detected during the wearer's utterance belongs to a dialogue between the wearer and the wearable device.
- The biometric information may include at least one of the following: a muscle EMG signal of the wearer; facial muscle movement information of the wearer; and a bone conduction sound signal of the wearer.
- An apparatus for waking up a wearable device includes: a sound source determination unit configured to determine, based on a detected sound signal, whether the sound signal comes from a wearer of the wearable device; a sound signal classification unit configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and a device wake-up unit configured to wake up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
- The sound source determination unit may include: a biometric information acquisition module configured to acquire the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected; and a sound signal source determination module configured to determine whether the sound signal comes from the wearer based on the biometric information.
- The biometric information acquisition module may be configured to acquire the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected and the sound signal reaches a first threshold.
- The biometric information acquisition module may include a muscle EMG detection submodule configured to acquire the wearer's muscle EMG signal within a predetermined time period backward or forward from the time point when the sound signal is detected. The sound signal source determination module is configured to determine that the sound signal comes from the wearer when the acquired muscle EMG signal is not lower than a predetermined EMG threshold.
- The biometric information may include facial muscle movement information of the wearer, and the sound signal source determination module may be configured to determine that the sound signal comes from the wearer when the wearer's facial muscle movement information indicates that the wearer's vocalization-related facial muscles are moving.
- The biometric information acquisition module may include: a facial image acquisition submodule configured to acquire a facial image of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected; and a muscle motion information recognition submodule configured to recognize, based on the facial image, facial muscle movement information of the wearer's vocalization-related facial muscles.
- The biometric information acquisition module may include: a facial structure information acquisition submodule configured to acquire facial structure information of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected; a facial 3D model building submodule configured to build a facial 3D model of the wearer based on the facial structure information; and a muscle motion information recognition submodule configured to detect, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
- The biometric information may include vocalization-related muscle vibration information of the wearer, and the sound signal source determination module may be configured to determine that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles are vibrating.
- The sound source determination unit may include: a voiceprint information recognition module configured to recognize voiceprint information of the detected sound signal; and a sound signal source determination module configured to determine whether the sound signal comes from the wearer based on the wearer's voiceprint features and the recognized voiceprint information.
- The apparatus may further include a sound detection unit configured to detect a sound signal.
- The sound signal may be a bone conduction sound signal, and the sound detection unit may include a bone conduction sound detection unit configured so that, when the wearer wears the wearable device, it can be attached to the wearer's head or neck to detect the bone conduction sound signal. The sound signal source determination module is configured to determine that the sound signal is a sound signal from the wearer of the wearable device when the bone conduction sound signal is not lower than a predetermined sound threshold.
- An apparatus for waking up a wearable device includes: a biometric information acquisition unit configured to acquire vocalization-related biometric information of a wearer of the wearable device; a sound detection unit configured to detect a sound signal during the wearer's utterance when the biometric information indicates that the wearer is vocalizing; a sound signal classification unit configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and a device wake-up unit configured to wake up the wearable device when the sound signal detected during the wearer's utterance belongs to a dialogue between the wearer and the wearable device.
- The biometric information acquisition unit may include at least one of the following: a muscle EMG detection module configured to detect a muscle EMG signal of the wearer; a muscle movement detection module configured to detect facial muscle movement information of the wearer; and a bone conduction sound detection module configured to detect a bone conduction sound signal of the wearer.
- A computing device includes: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method described above.
- A non-transitory machine-readable storage medium stores executable instructions that, when executed, cause the machine to perform the method described above.
- With the method and apparatus of the present disclosure, by waking up the wearable device when the detected sound signal comes from the wearer and belongs to human-machine interactive speech, the user does not need to perform a dedicated wake-up operation; instead, the wearable device is awakened during the user's normal interaction with it, so that wake-up happens naturally and the user gets a natural, smooth experience.
- With the apparatus and system of the present disclosure, whether the sound signal comes from the wearer is determined from the wearer's vocalization-related biometric information within a predetermined time period from the time point when the sound signal is detected. Because the biometric information accurately reflects whether the wearer performed a vocalization action, it can be accurately identified whether the detected sound signal was made by the wearer.
- With the apparatus and system of the present disclosure, whether the sound signal comes from the wearer can be determined from biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected and reaches a first threshold, which avoids interference from environmental noise and thus misjudgment of the sound source.
- With the apparatus and system of the present disclosure, whether the sound signal produced during vocalization comes from the wearer can be determined from biometric information such as the wearer's vocalization-related muscle movement information and facial muscle vibration information, providing a variety of implementations for naturally waking up the wearable device.
- With the apparatus and system of the present disclosure, whether the sound signal comes from the wearer is determined by recognizing voiceprint information from the detected sound signal and comparing it with the wearer's voiceprint features. Because the wearer's voiceprint features are unique, the source of the sound signal can be determined accurately.
- Using the bone conduction sound signal acquired by a bone conduction detection device to judge whether the sound signal comes from the wearer can accurately determine the source of the sound signal; it not only provides an easily implemented wake-up scheme but also requires no additional detection hardware, saving hardware cost.
- FIG. 1 is a flowchart of a method for waking up a wearable device according to an embodiment of the present disclosure
- FIG. 2 is a flowchart of an example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure
- FIG. 3 is a flowchart of another example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure
- FIG. 4 is a flowchart of another example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure
- FIGS. 5 and 6 are flowcharts of examples of the facial muscle movement information acquisition process in a method for waking up a wearable device according to embodiments of the present disclosure;
- FIG. 7 is a flowchart of another example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure
- FIG. 8 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure.
- FIG. 9 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure.
- FIG. 10 is a structural block diagram of an apparatus for waking up a wearable device according to an embodiment of the present disclosure
- FIG. 11 is a structural block diagram of an example of a sound source determination unit in the apparatus for waking up a wearable device shown in FIG. 10;
- FIGS. 12 and 13 are structural block diagrams of examples of the biometric information acquisition module in the apparatus for waking up the wearable device of FIG. 10;
- FIG. 14 is a structural block diagram of another example of the sound source determination unit in the apparatus for waking up the wearable device shown in FIG. 10;
- FIG. 15 is a structural block diagram of an apparatus for waking up a wearable device according to another embodiment of the present disclosure.
- FIG. 16 is a structural block diagram of a computing device for implementing a method for waking up a wearable device according to an embodiment of the present disclosure; and
- FIG. 17 is a structural block diagram of a computing device for implementing a method for waking up a wearable device according to an embodiment of the present disclosure.
- the term “including” and its variations represent open terms, meaning “including but not limited to.”
- the term “based on” means “based at least in part on.”
- the terms “one embodiment” and “an embodiment” mean “at least one embodiment”.
- the term “another embodiment” means “at least one other embodiment”.
- The terms “first”, “second”, etc. may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
- FIG. 1 is a flowchart of a method for waking up a wearable device according to an embodiment of the present disclosure.
- At block 102, a sound signal is detected, and at block 104, it is determined whether a sound signal has been detected.
- For the method of the present disclosure, the process of blocks 102 to 104 is not an essential element; it may also be a process performed independently of the method of the present disclosure.
- When a sound signal is detected, at block 106, it is determined based on the detected sound signal whether the sound signal comes from the wearer of the wearable device.
- The detected sound signal may have been made by the wearer of the wearable device, or it may be ambient noise in the environment of the wearable device or sound made by other people. In this embodiment, the wearable device is allowed to be awakened only when the detected sound signal comes from the wearer, so as to prevent the wearable device from being awakened by mistake.
- When a sound signal is detected, the voiceprint information in the sound signal can be recognized, and whether the detected sound signal comes from the wearer of the wearable device can then be determined from the voiceprint information and the wearer's voiceprint features. For example, sound signal samples can be collected to train a voiceprint recognition model. Before the wearer uses the wearable device, the wearer can record his or her own voice several times; the recorded voice is then used to further train the already-trained voiceprint recognition model, so as to extract features of the wearer's voice.
- When the wearer uses the wearable device, the voiceprint recognition model can recognize, based on the extracted voiceprint features of the wearer, whether the detected sound signal comes from the wearer. For example, the voiceprint recognition model can measure the similarity between the detected sound signal and the wearer's voiceprint features and determine that the sound signal comes from the wearer when the similarity is not lower than a certain threshold.
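- As a minimal sketch of this comparison step (not the patent's implementation; the embedding extraction is assumed to happen upstream, and the 0.75 threshold is illustrative), the decision reduces to a similarity test between a voiceprint embedding of the detected signal and the wearer's enrolled embedding:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprint embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_from_wearer(signal_embedding: np.ndarray,
                   wearer_embedding: np.ndarray,
                   threshold: float = 0.75) -> bool:
    """Attribute the sound to the wearer when similarity reaches the threshold.

    wearer_embedding would come from the enrollment recordings described
    above; the threshold value is an assumption, tuned on validation data.
    """
    return cosine_similarity(signal_embedding, wearer_embedding) >= threshold
```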
- When it is determined that the detected sound signal comes from the wearer, at block 108, a speech classification model is used, based on the sound signal, to determine whether the sound signal belongs to a dialogue between the wearer and the wearable device. This can be done by determining whether the detected sound signal belongs to a human-machine interactive dialogue.
- When a person interacts with a machine, speech features such as rhythm, loudness, pitch, wording, sentence structure, and overtone strength usually differ from those of human-to-human conversation. Therefore, speech data samples of human-to-human dialogue and speech data samples of human-machine interactive dialogue can be collected and used to train a speech classification model.
- When a sound signal is detected, it can be input to the trained speech classification model for classification and prediction.
- In one example, the speech classification model may output the probability that the detected sound signal belongs to a human-machine interactive dialogue. In this case, the sound signal may be determined to be a dialogue between the wearer and the wearable device when the probability output by the speech classification model is not lower than a predetermined probability threshold.
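- A hedged sketch of this probability-threshold step follows; the classifier's scikit-learn style predict_proba() interface and the 0.8 threshold are assumptions for illustration, not taken from the disclosure:

```python
import numpy as np

def is_device_directed(features: np.ndarray, dialogue_model,
                       prob_threshold: float = 0.8) -> bool:
    """Return True when the speech classification model rates the utterance
    as human-machine dialogue with probability >= prob_threshold.

    features: acoustic/prosodic features (rhythm, loudness, pitch, wording
    statistics, ...) extracted from the detected sound signal.
    dialogue_model: binary classifier where class 1 = human-machine dialogue.
    """
    p_machine = dialogue_model.predict_proba(features.reshape(1, -1))[0][1]
    return p_machine >= prob_threshold
```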
- When the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device, at block 110, the wearable device is awakened. In this way, the wearable device can be awakened naturally whenever the wearer issues any voice command to it. The wearer does not need to perform a dedicated wake-up operation when using the wearable device, which brings the wearer a natural and smooth user experience.
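- Putting blocks 106-110 together, the wake decision is simply the conjunction of the two checks above. A minimal sketch reusing the hypothetical helpers from the previous snippets (the extractors and device API are assumptions):

```python
def maybe_wake(sound_signal, wearer_embedding, dialogue_model, device) -> bool:
    """Wake the device only if the sound is the wearer's AND device-directed."""
    emb = extract_voiceprint_embedding(sound_signal)   # assumed embedding extractor
    feats = extract_dialogue_features(sound_signal)    # assumed feature extractor
    if (is_from_wearer(emb, wearer_embedding)
            and is_device_directed(feats, dialogue_model)):
        device.wake()                                  # assumed device API
        return True
    return False
```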
- FIG. 2 is a flowchart of an example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure.
- The wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected is acquired.
- The biometric information within a predetermined time period forward or backward from that time point may be acquired.
- For example, biometric information during sound signal detection can be acquired and stored in a database, so that the biometric information for a predetermined period of time before the time point when the sound signal reaches the first threshold can be obtained.
- It is also possible to detect the biometric information for a predetermined time period backward from that time point, so that the wearer's biometric information during the detection of the sound signal is acquired.
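- One way to make this backward-in-time lookup concrete is a ring buffer of timestamped biometric samples, read back once the sound crosses the first threshold. A minimal sketch under those assumptions (the 5 s retention and 0.5 s window are illustrative):

```python
import time
from collections import deque

class BiometricBuffer:
    """Ring buffer of (timestamp, sample) pairs for backward lookup."""

    def __init__(self, max_age_s: float = 5.0):
        self.max_age_s = max_age_s
        self.samples = deque()

    def push(self, sample, ts=None):
        ts = time.monotonic() if ts is None else ts
        self.samples.append((ts, sample))
        # Drop samples older than the retention window.
        while self.samples and ts - self.samples[0][0] > self.max_age_s:
            self.samples.popleft()

    def window_before(self, t0: float, period_s: float = 0.5):
        """Samples in [t0 - period_s, t0]: the 'predetermined time period'
        before the time point t0 at which the sound reached the threshold."""
        return [s for (ts, s) in self.samples if t0 - period_s <= ts <= t0]
```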
- It is then determined whether the sound signal is from the wearer based on the biometric information.
- When the wearer speaks, biometric features of the wearer's face, throat, and so on change, so whether the wearer vocalized during the detection of the sound signal can be identified from changes in the vocalization-related biometric information.
- When the biometric information indicates that the wearer made a sound during the detection of the sound signal, it can be determined that the sound signal comes from the wearer.
- The biometric information may be, for example, muscle EMG signals, facial muscle movement information, muscle vibration information, and the like.
- The following describes, with reference to FIGS. 3 to 6, examples of determining whether the wearer vocalized based on the acquired biometric information and then determining the source of the sound.
- FIG. 3 is a flowchart of another example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure.
- The wearer's vocalization-related muscle EMG signal within a predetermined time period from the time point when the sound signal is detected is acquired.
- For example, an electromyography (EMG) detection device can be used to detect the EMG signal of the wearer's face, scalp, or neck.
- When the acquired muscle EMG signal is not lower than the EMG threshold, at block 306, it is determined that the sound signal is from the wearer.
- When the wearer speaks, the vocalization-related muscle EMG signals show peak fluctuations. Therefore, when the acquired muscle EMG signal is not lower than the EMG threshold, it can be determined that the wearer was speaking when the sound signal was detected, and thus that the sound signal is from the wearer.
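- A minimal sketch of this EMG decision (the RMS reduction over the window and the 50 µV default threshold are illustrative assumptions; a real device would calibrate the threshold per wearer):

```python
import numpy as np

def emg_indicates_speech(emg_window_uv: np.ndarray,
                         emg_threshold_uv: float = 50.0) -> bool:
    """True when the RMS of the EMG window reaches the myoelectric threshold.

    emg_window_uv: EMG samples (microvolts) covering the predetermined time
    period around the sound-detection time point.
    """
    rms = float(np.sqrt(np.mean(np.square(emg_window_uv))))
    return rms >= emg_threshold_uv
```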
- FIG. 4 is a flowchart of another example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure.
- Facial muscle movement information of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected is obtained. The facial muscle movement information can be obtained using the examples shown in FIGS. 5 and 6.
- A facial image of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected is acquired.
- For example, a monocular RGB camera installed on the wearer's forehead can be used to acquire the facial image.
- Facial muscle movement information of the wearer's vocalization-related facial muscles is then identified based on the facial image. For example, it is possible to detect whether the face and mouth move in the facial image.
- In one example, facial images of a person speaking and facial images of a person not speaking can be used as training samples to train an image classification model; the trained image classification model is then used to classify the acquired facial images, thereby recognizing from the facial muscle information whether the wearer performed a speaking action.
- Facial structure information of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected is acquired.
- The facial structure information may be facial ToF (time-of-flight) information, facial ultrasound scanning information, facial structured-light information, and the like. A ToF 3D module, an ultrasonic sensor, a structured-light scanning module, a binocular RGB camera, or another such device installed on the wearer's forehead can be used to obtain the facial structure information.
- A 3D model of the wearer's face is then created based on the facial structure information.
- Facial muscle movement information of the wearer's vocalization-related facial muscles is detected based on the facial 3D model.
- It is then determined, from the facial muscle movement information, whether the wearer's facial muscles moved. For example, whether the wearer's mouth muscles are moving can be detected based on the facial 3D model.
- When the facial muscles of the wearer are moving, the wearer performed a speaking action when the sound signal was detected, so at block 406 it can be determined that the sound signal is from the wearer. For example, when the wearer's mouth or face moves, it can be determined that the wearer performed a speaking action and thus spoke while the sound signal was being detected, so the sound signal can be determined to come from the wearer.
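- As one illustrative way to turn the facial 3D model into this movement decision, the lip gap can be measured per frame and its spread thresholded; the measurement and the 1.5 mm threshold are assumptions, not the disclosure's method:

```python
import numpy as np

def mouth_is_moving(lip_gap_mm: np.ndarray,
                    motion_threshold_mm: float = 1.5) -> bool:
    """Decide whether the mouth moved during the observation window.

    lip_gap_mm: per-frame distance between upper- and lower-lip points taken
    from the facial 3D model; speech shows up as spread in this series.
    """
    return float(lip_gap_mm.max() - lip_gap_mm.min()) >= motion_threshold_mm
```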
- FIG. 7 is a flowchart of another example of a sound source determination process in a method for waking up a wearable device according to an embodiment of the present disclosure.
- The wearer's vocalization-related muscle vibration information within a predetermined time period backward or forward from the time point when the sound signal is detected is acquired.
- Muscle vibration information can be detected using a motion sensor (e.g., an inertial measurement unit (IMU)) attached to a vocalization-related muscle.
- When the facial muscles vibrate, at block 706, it is determined that the sound signal is from the wearer of the wearable device.
- When the wearer speaks, the facial muscles vibrate; therefore, when facial muscle vibration is detected, it can be determined that the wearer's face performed a speaking action, and on this basis that the sound signal comes from the wearer.
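- A hedged sketch of the vibration test: band-pass the IMU stream around typical speech-vibration frequencies and threshold its energy. The 80-300 Hz band, filter order, and energy threshold are illustrative assumptions (scipy is assumed available):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def vibration_indicates_speech(accel: np.ndarray, fs: float,
                               band=(80.0, 300.0),
                               energy_threshold: float = 1e-3) -> bool:
    """True when band-passed IMU energy suggests vocalization-related vibration.

    accel: 1-D accelerometer samples from an IMU on a vocalization-related
    muscle; fs: sampling rate in Hz (must exceed twice the upper band edge).
    """
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfilt(sos, accel)
    return float(np.mean(np.square(filtered))) >= energy_threshold
```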
- FIG. 8 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure.
- A bone conduction sound signal is detected using a bone conduction sound detection device attached to the head or neck of the wearer.
- The bone conduction sound detection device may be, for example, a bone conduction microphone.
- The bone conduction microphone can be attached to any position on the head or neck to detect bone conduction sound.
- When the bone conduction sound signal is not lower than the predetermined sound threshold, at block 806, it is determined that the sound signal is from the wearer.
- When the wearer speaks, the sound is conducted through the bones of the head or neck.
- Because of this, the bone conduction sound signal originating from the wearer is stronger than sound signals originating from the environment outside the wearer. Therefore, when the bone conduction sound signal is not lower than the predetermined sound threshold, it can be determined that the detected sound signal comes from the wearer.
- The predetermined sound threshold can be determined through experiments and can also be adjusted by the wearer.
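- A minimal sketch of this level test (the dBFS framing and -30 dBFS default are assumptions for illustration, consistent with the note that the threshold is tuned experimentally or by the wearer):

```python
import numpy as np

def bone_conduction_from_wearer(frame: np.ndarray,
                                sound_threshold_dbfs: float = -30.0) -> bool:
    """True when the bone-conducted frame level reaches the sound threshold.

    Bone conduction strongly favors the wearer's own voice over ambient
    sound, so a simple level test attributes the frame to the wearer.
    frame: bone-conduction microphone samples normalized to [-1, 1].
    """
    rms = float(np.sqrt(np.mean(np.square(frame)))) + 1e-12  # avoid log(0)
    return 20.0 * np.log10(rms) >= sound_threshold_dbfs
```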
- When it is determined that the sound signal comes from the wearer, it may be determined at block 810 whether the sound signal belongs to a human-machine interactive dialogue. When the sound signal originates from the wearer and belongs to a human-machine interactive dialogue, at block 812, the wearable device is awakened.
- FIG. 9 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure.
- Vocalization-related biometric information of the wearer of the wearable device is acquired.
- The biometric information may be any one or more of the vocalization-related muscle EMG signals, facial muscle movement information, bone conduction sound signals, and the like described above.
- It is then determined whether the wearer vocalizes based on the biometric information.
- The process of determining whether the wearer vocalizes can be performed, for example, by the utterance determination processes described above with reference to FIGS. 3-8.
- For example, a bone conduction microphone attached to the head or neck of the wearer may be used to detect a bone conduction sound signal, and when the bone conduction sound signal reaches a predetermined sound threshold, it is determined that the wearer is vocalizing.
- When the wearer vocalizes, a sound signal during the wearer's utterance is detected.
- For example, an EMG detection device can be used to detect the muscle EMG signal of the wearer's facial muscles; when the EMG signal is not lower than a predetermined EMG threshold, it is determined that the wearer is vocalizing, and detection of the sound signal can begin once the EMG signal reaches the predetermined EMG threshold.
- A motion sensor (for example, an inertial measurement unit (IMU)) attached to the vocalization-related muscles can be used to detect whether those muscles vibrate; when vibration occurs, it can be determined that the wearer is vocalizing, and the sound signal can then be detected.
- The wearer's facial muscle movement information can also be obtained by the methods described above with reference to FIGS. 4-6; when the facial muscle movement information indicates that the wearer is vocalizing, detection of the sound signal is started.
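- The embodiment of FIG. 9 inverts the order of FIG. 1: the biometric channel gates sound capture. A hedged sketch of that control flow, reusing the hypothetical helpers from the earlier snippets (all device and sensor APIs here are assumptions):

```python
def biometric_gated_wake(biometric_source, mic, dialogue_model, device):
    """Start capturing sound only once biometrics say the wearer is vocalizing."""
    for window in biometric_source:               # stream of EMG/IMU/bone-conduction windows
        if not wearer_is_vocalizing(window):      # assumed: any of the checks sketched above
            continue
        sound = mic.record_during_utterance()     # assumed microphone API
        feats = extract_dialogue_features(sound)  # assumed feature extractor
        if is_device_directed(feats, dialogue_model):
            device.wake()                         # assumed device API
            return
```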
- Next, a speech classification model is used, based on the sound signal, to determine whether the sound signal belongs to a dialogue between the wearer and the wearable device.
- For example, a speech classification model may be trained using speech data samples of human-to-human conversations and speech data samples of human-machine interactive conversations, and the trained model is then used to classify the detected sound signal.
- When the sound signal detected during the wearer's utterance belongs to a dialogue between the wearer and the wearable device, the wearable device is awakened.
- FIG. 10 is a structural block diagram of an apparatus for waking up a wearable device (hereinafter referred to as a wearable device wake-up device) 1000 according to an embodiment of the present disclosure.
- The wearable device wake-up device 1000 includes a sound detection unit 1010, a sound source determination unit 1020, a sound signal classification unit 1030, and a device wake-up unit 1040.
- The sound detection unit 1010 is configured to detect a sound signal.
- The sound source determination unit 1020 is configured to determine whether the sound signal comes from the wearer of the wearable device based on the detected sound signal.
- The sound signal classification unit 1030 is configured to determine, based on the detected sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device.
- The device wake-up unit 1040 is configured to wake up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
- In other examples, the wearable device wake-up device of the present disclosure may not include a sound detection unit; the sound detection unit may also be an element independent of the wearable device wake-up device.
- The sound signal may be a bone conduction sound signal, and the sound detection unit 1010 may be a bone conduction sound detection unit.
- The bone conduction sound detection unit is configured to be attached to the head or neck of the wearer when the wearer wears the wearable device, so as to detect a bone conduction sound signal.
- For example, the bone conduction sound detection device may be a bone conduction microphone that can be worn on the wearer's ear, and the sound signal it detects may be sent by wire or wirelessly to the sound source determination unit and the sound signal classification unit.
- The bone conduction sound detection device can also be designed to hang from the wearable device, and the wearer can attach it to any position near the head or neck bones when using the wearable device.
- The sound signal source determination module may determine that the sound signal is a sound signal from the wearer of the wearable device when the bone conduction sound signal is not lower than a predetermined sound threshold.
- FIG. 11 is a structural block diagram of an example of a sound source determination unit 1020 in the wearable device wake-up device 1000 according to an embodiment of the present disclosure.
- The sound source determination unit 1020 includes a biometric information acquisition module 1021 and a sound signal source determination module 1022.
- The biometric information acquisition module 1021 is configured to acquire the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected.
- The biometric information acquisition module 1021 may also be configured to acquire the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected and the sound signal reaches a first threshold.
- The biometric information may be vocalization-related muscle EMG signals, facial structure information, facial muscle movement information, facial muscle vibration information, and the like.
- The sound signal source determination module 1022 may determine whether the sound signal comes from the wearer based on the biometric information.
- The biometric information may include the wearer's vocalization-related muscle vibration information.
- The biometric information acquisition module 1021 may include a muscle vibration information acquisition submodule for detecting vocalization-related muscle vibration information.
- The sound signal source determination module 1022 may be configured to determine that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles are vibrating.
- The biometric information acquisition module 1021 may include a muscle EMG detection submodule.
- The muscle EMG detection submodule is configured to acquire the wearer's muscle EMG signal within a predetermined time period backward or forward from the time point when the sound signal is detected.
- The sound signal source determination module is configured to determine that the sound signal is from the wearer when the acquired muscle EMG signal is not lower than a predetermined EMG threshold.
- The biometric information may include facial muscle movement information of the wearer.
- The sound signal source determination module may be configured to determine that the sound signal is from the wearer when the wearer's facial muscle movement information indicates that the wearer's vocalization-related facial muscles are moving. The muscle movement information can be detected based on facial images or a facial 3D model.
- FIGS. 12 and 13 are structural block diagrams of examples of the biometric information acquisition module in the wearable device wake-up device 1000 of the embodiment shown in FIG. 11.
- The biometric information acquisition module 1021 may include a facial image acquisition submodule 10211 and a muscle motion information recognition submodule 10212.
- The facial image acquisition submodule 10211 is configured to acquire a facial image of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected.
- The muscle motion information recognition submodule 10212 is configured to recognize, based on the facial image, facial muscle movement information of the wearer's vocalization-related facial muscles.
- As shown in FIG. 13, the biometric information acquisition module may include a facial structure information acquisition submodule 10213, a facial 3D model establishment submodule 10214, and a muscle motion information recognition submodule 10215.
- The facial structure information acquisition submodule 10213 is configured to obtain facial structure information of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected. The facial 3D model establishment submodule 10214 then creates a facial 3D model of the wearer based on the facial structure information.
- The muscle motion information recognition submodule 10215 may detect, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
- FIG. 14 is a structural block diagram of another example of the sound source determination unit 1020 in the wearable device wake-up device 1000 shown in FIG. 10.
- The sound source determination unit may include a voiceprint information recognition module 1023 and a sound signal source determination module 1024.
- The voiceprint information recognition module 1023 is configured to recognize voiceprint information of the detected sound signal.
- The sound signal source determination module 1024 is configured to determine whether the sound signal comes from the wearer based on the wearer's voiceprint features and the recognized voiceprint information.
- As shown in FIG. 15, the wearable device wake-up device 1500 includes a biometric information acquisition unit 1510, a sound detection unit 1520, a sound signal classification unit 1530, and a device wake-up unit 1540.
- The biometric information acquisition unit 1510 is configured to acquire vocalization-related biometric information of the wearer of the wearable device.
- When the biometric information indicates that the wearer is vocalizing, the sound detection unit 1520 detects a sound signal during the wearer's utterance.
- The sound signal classification unit 1530 is configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device.
- When the sound signal detected during the wearer's utterance belongs to a dialogue between the wearer and the wearable device, the device wake-up unit 1540 wakes up the wearable device.
- The biometric information acquisition unit 1510 may include at least one of a muscle EMG detection module, a muscle movement detection module, and a bone conduction sound detection module.
- The muscle EMG detection module is configured to detect the muscle EMG signal of the wearer.
- The muscle movement detection module is configured to detect facial muscle movement information of the wearer.
- The bone conduction sound detection module is configured to detect the bone conduction sound signal of the wearer.
- The apparatus for waking up a wearable device of the present disclosure may be implemented in hardware, in software, or in a combination of hardware and software.
- The apparatus for waking up a wearable device may be implemented using a computing device, for example.
- As shown in FIG. 16, the computing device 1600 may include at least one processor 1610 that executes at least one computer-readable instruction (i.e., the above elements implemented in software) stored or encoded in a computer-readable storage medium (i.e., memory 1620).
- Computer-executable instructions are stored in the memory 1620 and, when executed, cause the at least one processor 1610 to: determine, based on a detected sound signal, whether the sound signal comes from the wearer of the wearable device; determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and wake up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
- As shown in FIG. 17, the computing device 1700 may include at least one processor 1710 that executes at least one computer-readable instruction (i.e., the above elements implemented in software) stored or encoded in a computer-readable storage medium (i.e., memory 1720).
- Computer-executable instructions are stored in the memory 1720 and, when executed, cause the at least one processor 1710 to: acquire vocalization-related biometric information of the wearer of the wearable device; when the biometric information indicates that the wearer is vocalizing, detect a sound signal during the wearer's utterance; determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and wake up the wearable device when the sound signal detected during the wearer's utterance belongs to a dialogue between the wearer and the wearable device.
- According to one embodiment, a program product such as a non-transitory machine-readable medium is provided.
- The non-transitory machine-readable medium may have instructions (i.e., the above elements implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in the embodiments of the present disclosure in conjunction with FIGS. 1-8 and 10-14.
- Alternatively, the non-transitory machine-readable medium may have instructions (i.e., the above elements implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in the embodiments of the present disclosure in conjunction with FIGS. 9 and 15.
- Specifically, a system or device equipped with a readable storage medium may be provided, on which software program code implementing the functions of any of the above embodiments is stored, and a computer or processor of the system or device reads and executes the instructions stored in the readable storage medium.
- In this case, the program code itself read from the readable medium can realize the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention.
- Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, and DVD-RW), magnetic tapes, non-volatile memory cards, and ROM.
- Alternatively, the program code may be downloaded from a server computer or the cloud via a communication network.
- It should be noted that the device structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented by multiple physical entities respectively, or some units may be implemented jointly by components in multiple independent devices.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Neurology (AREA)
- Neurosurgery (AREA)
- Biomedical Technology (AREA)
- Dermatology (AREA)
- Computer Security & Cryptography (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- User Interface Of Digital Computer (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
A method and apparatus for waking up a wearable device. The method includes: determining, based on a detected sound signal, whether the sound signal comes from a wearer of the wearable device (106); determining, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device (108); and waking up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device (110). With this method and apparatus, the wearable device can be awakened during normal interaction between the wearer and the wearable device, achieving a natural wake-up process and improving the user experience of the wearable device.
Description
The present disclosure relates to the field of electronic devices, and in particular to a method and apparatus for waking up a wearable device.
With the development of technology, wearable devices are playing an increasingly important role in people's lives. Because of power consumption and battery life concerns, a wearable device is usually not always in its normal working state. When the user needs to use it, the wearable device can be awakened into the normal working state by certain means.
In the prior art, the ways of waking up a wearable device (for example, the wake-up word method) are not natural enough. When a wake-up word is used to wake up the wearable device, the user speaks a specific wake-up word, and the wearable device, upon hearing the wake-up word, performs corresponding speech processing to wake itself up. This prior-art way of waking up a wearable device is too mechanical and cannot deliver a natural, smooth experience. There is therefore an urgent need for a wake-up method that can wake up a wearable device naturally.
Summary of the Invention
In view of the above, the present disclosure provides a method and apparatus for waking up a wearable device, with which the wearable device can be awakened during normal interaction between the wearer and the wearable device, thereby achieving a natural wake-up process and improving the user experience of the wearable device.
According to one aspect of the present disclosure, a method for waking up a wearable device is provided, including: determining, based on a detected sound signal, whether the sound signal comes from a wearer of the wearable device; determining, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and waking up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
Optionally, in one example, determining whether the sound signal comes from the wearer of the wearable device based on the detected sound signal may include: acquiring the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected; and determining whether the sound signal comes from the wearer based on the biometric information.
Optionally, in one example, acquiring the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected may include: acquiring the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected and the sound signal reaches a first threshold.
Optionally, in one example, the biometric information may include the wearer's muscle EMG signal, and determining whether the sound signal comes from the wearer based on the biometric information may include: determining that the sound signal comes from the wearer when the acquired muscle EMG signal is not lower than a predetermined EMG threshold.
Optionally, in one example, the biometric information may include the wearer's facial muscle movement information, and determining whether the sound signal comes from the wearer based on the biometric information includes: determining that the sound signal comes from the wearer when the wearer's facial muscle movement information indicates that the wearer's vocalization-related facial muscles are moving.
Optionally, in one example, acquiring the wearer's vocalization-related biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected may include: acquiring a facial image of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected; and identifying, based on the facial image, facial muscle movement information of the wearer's vocalization-related facial muscles.
Optionally, in one example, acquiring the wearer's biometric information within a predetermined time period backward or forward from the time point when the sound signal is detected may include: acquiring facial structure information of the wearer within a predetermined time period backward or forward from the time point when the sound signal is detected; building a facial 3D model of the wearer based on the facial structure information; and detecting, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
Optionally, in one example, the biometric information includes the wearer's vocalization-related muscle vibration information, and determining whether the sound signal comes from the wearer based on the biometric information may include: determining that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles are vibrating.
Optionally, in one example, determining whether the sound signal comes from the wearer of the wearable device based on the detected sound signal may include: recognizing voiceprint information of the detected sound signal; and determining whether the sound signal comes from the wearer based on the wearer's voiceprint features and the recognized voiceprint information.
Optionally, in one example, the sound signal may be a bone conduction sound signal detected by a bone conduction sound detection device attached to the wearer's head or neck, and determining whether the sound signal comes from the wearer of the wearable device based on the detected sound signal may include: determining that the sound signal is a sound signal from the wearer of the wearable device when the bone conduction sound signal is not lower than a predetermined sound threshold.
According to another aspect of the present disclosure, a method for waking up a wearable device is also provided, including: acquiring vocalization-related biometric information of a wearer of the wearable device; when the biometric information indicates that the wearer is vocalizing, detecting a sound signal during the wearer's utterance; determining, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and waking up the wearable device when the sound signal detected during the wearer's utterance belongs to a dialogue between the wearer and the wearable device.
Optionally, in one example, the biometric information may include at least one of the following: a muscle EMG signal of the wearer; facial muscle movement information of the wearer; and a bone conduction sound signal of the wearer.
According to another aspect of the present disclosure, an apparatus for waking up a wearable device is also provided, comprising: a sound source determination unit configured to determine, based on a detected sound signal, whether the sound signal comes from a wearer of the wearable device; a sound signal classification unit configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and a device wake-up unit configured to wake up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
Optionally, in one example, the sound source determination unit may comprise: a biometric information acquisition module configured to acquire vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected; and a sound signal source determination module configured to determine, based on the biometric information, whether the sound signal comes from the wearer.
Optionally, in one example, the biometric information acquisition module may be configured to: acquire the vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected and reaches a first threshold.
Optionally, in one example, the biometric information acquisition module may comprise: a muscle EMG detection sub-module configured to acquire muscle EMG signals of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected, and the sound signal source determination module is configured to: determine that the sound signal comes from the wearer when the acquired muscle EMG signals are not below a predetermined EMG threshold.
Optionally, in one example, the biometric information comprises facial muscle movement information of the wearer, and the sound signal source determination module may be configured to: determine that the sound signal comes from the wearer when the facial muscle movement information indicates that the wearer's vocalization-related facial muscles have moved.
Optionally, in one example, the biometric information acquisition module may comprise: a facial image acquisition sub-module configured to acquire facial images of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected; and a muscle movement information identification sub-module configured to identify, based on the facial images, facial muscle movement information of the wearer's vocalization-related facial muscles.
Optionally, in one example, the biometric information acquisition module may comprise: a facial structure information acquisition sub-module configured to acquire facial structure information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected; a facial 3D model building sub-module configured to build a 3D model of the wearer's face based on the facial structure information; and a muscle movement information identification sub-module configured to detect, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
Optionally, in one example, the biometric information comprises vocalization-related muscle vibration information of the wearer, and the sound signal source determination module may be configured to: determine that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles vibrate.
Optionally, in one example, the sound source determination unit may comprise: a voiceprint information identification module configured to identify voiceprint information of the detected sound signal; and a sound signal source determination module configured to determine, based on voiceprint features of the wearer and the identified voiceprint information, whether the sound signal comes from the wearer.
Optionally, in one example, the apparatus further comprises: a sound detection unit configured to detect sound signals.
Optionally, in one example, the sound signal may be a bone conduction sound signal, and the sound detection unit may comprise: a bone conduction sound detection unit configured to be attachable to the head or neck of the wearer when the wearer wears the wearable device, so as to detect the bone conduction sound signal; the sound signal source determination module is configured to: determine that the sound signal is a sound signal from the wearer of the wearable device when the bone conduction sound signal is not below a predetermined sound threshold.
According to another aspect of the present disclosure, an apparatus for waking up a wearable device is also provided, comprising: a biometric information acquisition unit configured to acquire vocalization-related biometric information of a wearer of the wearable device; a sound detection unit configured to detect, when the biometric information indicates that the wearer is vocalizing, a sound signal produced while the wearer vocalizes; a sound signal classification unit configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and a device wake-up unit configured to wake up the wearable device when the sound signal detected while the wearer vocalizes belongs to a dialogue between the wearer and the wearable device.
Optionally, in one example, the biometric information acquisition unit may comprise at least one of: a muscle EMG detection module configured to detect muscle EMG signals of the wearer; a muscle movement detection module configured to detect facial muscle movement information of the wearer; and a bone conduction sound detection module configured to detect bone conduction sound signals of the wearer.
According to another aspect of the present disclosure, a computing device is also provided, comprising: at least one processor; and a memory storing instructions which, when executed by the at least one processor, cause the at least one processor to perform the method described above.
According to another aspect of the present disclosure, a non-transitory machine-readable storage medium is also provided, which stores executable instructions that, when executed, cause a machine to perform the method described above.
With the method and apparatus of the present disclosure, the wearable device is woken up when the detected sound signal comes from the wearer and belongs to human-machine interaction speech, so the user does not need to perform a dedicated wake-up operation; instead, the device is woken up during the user's normal interaction with it. Wake-up is thus achieved naturally, giving the user a natural and smooth experience.
With the apparatus and system of the present disclosure, whether the sound signal comes from the wearer is determined based on the wearer's vocalization-related biometric information within a predetermined time period forward or backward from the time point at which the sound signal is detected. Since the biometric information accurately reflects whether the wearer performed a vocalization action, whether the detected sound signal was uttered by the wearer can be accurately identified.
With the apparatus and system of the present disclosure, whether the sound signal comes from the wearer is determined based on the biometric information within a predetermined time period forward or backward from the time point at which the sound signal is detected and reaches the first threshold, so that interference from ambient noise can be avoided, thereby avoiding misjudgment of the source of the sound.
With the apparatus and system of the present disclosure, whether the sound signal comes from the wearer can be determined based on biometric information such as the wearer's vocalization-related muscle movement information and facial muscle vibration information, thereby providing multiple implementations usable for waking up the wearable device naturally.
With the apparatus and system of the present disclosure, voiceprint information is identified from the detected sound signal and the source of the sound signal is determined based on the identified voiceprint information and the wearer's voiceprint features; since the wearer's voiceprint features are unique, the source of the sound signal can be determined accurately.
With the apparatus and system of the present disclosure, a bone conduction sound signal acquired by a bone conduction detection device is used to judge whether the sound signal comes from the wearer. On the premise of accurately determining the source of the sound signal, this not only provides an easy-to-implement wake-up solution but also requires no additional detection hardware, saving hardware cost.
A further understanding of the nature and advantages of the present disclosure can be obtained with reference to the following drawings. In the drawings, similar components or features may have the same reference numerals. The drawings are provided for a further understanding of the embodiments of the present invention, constitute a part of the specification, and serve, together with the following detailed description, to explain the embodiments of the present disclosure without limiting them. In the drawings:
FIG. 1 is a flowchart of a method for waking up a wearable device according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of an example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure;
FIG. 5 and FIG. 6 are flowcharts of examples of the facial muscle movement information acquisition process in the method for waking up a wearable device according to embodiments of the present disclosure;
FIG. 7 is a flowchart of another example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure;
FIG. 9 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure;
FIG. 10 is a structural block diagram of an apparatus for waking up a wearable device according to an embodiment of the present disclosure;
FIG. 11 is a structural block diagram of an example of the sound source determination unit in the apparatus for waking up a wearable device shown in FIG. 10;
FIG. 12 and FIG. 13 are structural block diagrams of examples of the biometric information acquisition module in the apparatus for waking up a wearable device of FIG. 10;
FIG. 14 is a structural block diagram of another example of the sound source determination unit in the apparatus for waking up a wearable device shown in FIG. 10;
FIG. 15 is a structural block diagram of an apparatus for waking up a wearable device according to another embodiment of the present disclosure;
FIG. 16 is a structural block diagram of a computing device for implementing the method for waking up a wearable device according to an embodiment of the present disclosure; and
FIG. 17 is a structural block diagram of a computing device for implementing the method for waking up a wearable device according to an embodiment of the present disclosure.
The subject matter described herein will now be discussed with reference to example implementations. It should be understood that these implementations are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and are not limitations on the protection scope, applicability or examples set forth in the claims. The functions and arrangement of the elements discussed may be changed without departing from the protection scope of the present disclosure. Various processes or components may be omitted, substituted or added in the examples as needed. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "comprise" and its variants are open-ended terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second" and the like may refer to different or identical objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
The method and apparatus of the present disclosure for waking up a wearable device will now be described with reference to the accompanying drawings.
FIG. 1 is a flowchart of a method for waking up a wearable device according to an embodiment of the present disclosure.
As shown in FIG. 1, at block 102, a sound signal is detected, and at block 104 it is determined whether a sound signal has been detected. For the method of the present disclosure, the process of blocks 102 to 104 is not an essential element; it may also be a process performed independently of the method of the present disclosure.
When a sound signal is detected, at block 106, it is determined, based on the detected sound signal, whether the sound signal comes from the wearer of the wearable device. The detected sound signal may have been uttered by the wearer of the wearable device, or it may be ambient noise in the environment of the wearable device or sound made by other people. In this embodiment, waking up the wearable device is allowed only when the detected sound signal comes from the wearer of the wearable device, so as to prevent the wearable device from being woken up by mistake.
In one example, when a sound signal is detected, voiceprint information in the sound signal can be identified, and whether the detected sound signal comes from the wearer of the wearable device can then be determined based on the voiceprint information and the wearer's voiceprint features. For example, sound signal samples may be collected to train a voiceprint recognition model. Before the wearer uses the wearable device, the wearer may record his or her own voice several times. After the wearer's voice is recorded, the recorded voice is used to further train the already-trained voiceprint recognition model, thereby extracting the features of the wearer's voice. When the wearer uses the wearable device, the voiceprint recognition model can identify, based on the extracted voiceprint features of the wearer, whether the detected sound signal comes from the wearer. For example, the voiceprint recognition model may evaluate the similarity between the detected sound signal and the wearer's voiceprint features, and determine that the sound signal comes from the wearer when the similarity is not below a certain threshold.
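By way of illustration only, such a voiceprint check might be sketched as follows in Python; here `embed` is a placeholder for whatever speaker-embedding (voiceprint) model an implementation trains, and the 0.75 similarity threshold is an assumed value rather than one specified by this disclosure:

```python
# Hypothetical sketch of the voiceprint check: compare the detected signal
# against the wearer's enrolled voiceprint feature via cosine similarity.
import numpy as np

def embed(audio: np.ndarray) -> np.ndarray:
    """Placeholder for a trained speaker-embedding (voiceprint) model."""
    raise NotImplementedError

def enroll(wearer_samples: list) -> np.ndarray:
    # Average the embeddings of several enrollment recordings to form
    # the wearer's voiceprint feature vector.
    return np.mean([embed(s) for s in wearer_samples], axis=0)

def is_from_wearer(signal: np.ndarray, voiceprint: np.ndarray,
                   threshold: float = 0.75) -> bool:
    # Attribute the signal to the wearer when cosine similarity between
    # its embedding and the enrolled voiceprint meets the threshold.
    e = embed(signal)
    sim = float(np.dot(e, voiceprint) /
                (np.linalg.norm(e) * np.linalg.norm(voiceprint) + 1e-9))
    return sim >= threshold
```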
When it is determined that the detected sound signal comes from the wearer of the wearable device, at block 108, a speech classification model is used to determine, based on the detected sound signal, whether the sound signal belongs to a dialogue between the wearer and the wearable device. Whether the sound signal belongs to a dialogue between the wearer and the wearable device can be determined by determining whether the detected sound signal belongs to human-machine interaction dialogue.
When a person interacts with a machine, features such as speech rhythm, loudness, pitch, wording, sentence patterns and overtone intensity usually differ from those of human-to-human conversation. Therefore, speech data samples of human-to-human conversation and speech data samples of human-machine dialogue can be collected and used to train a speech classification model. When a sound signal is detected, it can be fed into the trained speech classification model for classification prediction. In one example, the speech classification model may output the probability that the detected sound signal belongs to human-machine dialogue. In this case, the sound signal may be determined to be a dialogue between the wearer and the wearable device when the probability output by the speech classification model is not below a predetermined probability threshold.
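As an illustrative sketch only, such a classifier could be built along the following lines; the feature extractor is a placeholder, scikit-learn's `LogisticRegression` merely stands in for whatever speech classification model an implementation actually uses, and the 0.8 probability threshold is an assumption:

```python
# Binary classifier over human-human vs. human-machine speech samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def speech_features(audio: np.ndarray) -> np.ndarray:
    """Placeholder: rhythm, loudness, pitch, wording/overtone features, etc."""
    raise NotImplementedError

def train_dialogue_classifier(X: np.ndarray, y: np.ndarray) -> LogisticRegression:
    # X: one feature row per labeled sample;
    # y: 1 = human-machine dialogue, 0 = human-to-human conversation.
    return LogisticRegression(max_iter=1000).fit(X, y)

def is_human_machine(model: LogisticRegression, audio: np.ndarray,
                     prob_threshold: float = 0.8) -> bool:
    # Classify as human-machine dialogue when the predicted probability
    # is not below the predetermined probability threshold.
    p = model.predict_proba(speech_features(audio).reshape(1, -1))[0, 1]
    return p >= prob_threshold
```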
When the sound signal comes from the wearer and belongs to a dialogue between the wearer and the wearable device, at block 110, the wearable device is woken up. In this way, the wearable device can be woken up naturally whenever the wearer issues any voice instruction to it. The wearer does not need to perform a dedicated wake-up operation when using the wearable device, which gives the wearer a natural and smooth experience.
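Taken together, and reusing the hypothetical helpers from the two sketches above (`is_from_wearer`, `is_human_machine`) plus an assumed `device.wake()` interface, the wake decision of blocks 106 to 110 could reduce to:

```python
def maybe_wake(signal, voiceprint, dialogue_model, device) -> None:
    # Wake only when both conditions hold: the sound comes from the wearer
    # (block 106) and it is human-machine dialogue (block 108); see block 110.
    if is_from_wearer(signal, voiceprint) and is_human_machine(dialogue_model, signal):
        device.wake()
```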
FIG. 2 is a flowchart of an example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure.
As shown in FIG. 2, at block 202, vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected is acquired. In one example, when the detected sound signal reaches a first threshold, the biometric information within a predetermined period before or after that time point may be acquired. For example, biometric information may be acquired during sound signal detection and stored in a database; when the detected sound signal reaches the first threshold, the biometric information in a predetermined period before the time point at which the sound signal reached the first threshold is retrieved. Alternatively, when the sound signal reaches the first threshold, the biometric information may be detected for a predetermined period after that time point. In this way, the wearer's biometric information during sound signal detection can be obtained.
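One possible realization of such time-windowed acquisition is a sliding buffer of timestamped biometric readings; the following is an illustrative sketch only, and the two-second window is an assumed value:

```python
# Sliding window of timestamped biometric readings, so the window before
# (or after) the sound-detection instant can be retrieved on demand.
import time
from collections import deque

class BiometricBuffer:
    def __init__(self, window_s: float = 2.0):
        self.window_s = window_s
        self.buf = deque()  # entries: (monotonic timestamp, reading)

    def push(self, value: float) -> None:
        now = time.monotonic()
        self.buf.append((now, value))
        # Drop readings older than the retention window.
        while self.buf and now - self.buf[0][0] > self.window_s:
            self.buf.popleft()

    def window_before(self, t: float, span_s: float) -> list:
        # Readings within [t - span_s, t], e.g. the predetermined period
        # before the instant the sound signal reached the first threshold.
        return [v for (ts, v) in self.buf if t - span_s <= ts <= t]
```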
Then, at block 204, whether the sound signal comes from the wearer is determined based on the biometric information. When the wearer speaks, biometric characteristics of the wearer's face, throat and other parts change. Therefore, whether the wearer vocalized during sound detection can be identified based on changes in the vocalization-related biometric information; when the biometric information indicates that the wearer vocalized during sound detection, it can be determined that the sound signal comes from the wearer.
The biometric information may be, for example, muscle EMG signals, facial muscle movement information, or muscle vibration information. Examples of determining whether the wearer vocalized, and hence the source of the sound, based on the acquired biometric information are described below with reference to FIGS. 3 to 6.
FIG. 3 is a flowchart of another example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure.
As shown in FIG. 3, at block 302, vocalization-related muscle EMG signals of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected are acquired. For example, an EMG detection device may be used to detect muscle EMG signals at the wearer's face, scalp, or neck. The muscle EMG signals may be acquired using an electromyography (EMG) device.
When the muscle EMG signals are acquired, at block 304, it is determined whether the acquired muscle EMG signals are not below a predetermined EMG threshold.
When the acquired muscle EMG signals are not below the EMG threshold, at block 306, it is determined that the sound signal comes from the wearer. When the wearer vocalizes, these related muscle EMG signals show peak fluctuations. Therefore, when the acquired muscle EMG signals are not below the EMG threshold, it can be concluded that the wearer was speaking when the sound signal was detected, and thus that the sound signal comes from the wearer.
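A minimal sketch of this EMG threshold test, with the threshold and window contents assumed rather than specified by the disclosure:

```python
def emg_indicates_speech(emg_window: list, emg_threshold: float) -> bool:
    # Speech-related muscles show peak EMG activity while the wearer talks,
    # so attribute the sound to the wearer when the window's peak value
    # is not below the predetermined EMG threshold.
    return bool(emg_window) and max(emg_window) >= emg_threshold
```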
FIG. 4 is a flowchart of another example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure.
As shown in FIG. 4, at block 402, facial muscle movement information of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected is acquired. The facial muscle movement information may be obtained using the examples shown in FIG. 5 and FIG. 6.
In the example shown in FIG. 5, at block 502, facial images of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected are acquired. For example, a monocular RGB camera arranged at the wearer's forehead may be used to capture the facial images.
At block 504, facial muscle movement information of the wearer's vocalization-related facial muscles is identified based on the facial images, for example by detecting whether the face or mouth in the facial images moves. In one example, facial images of people speaking and of people not speaking can be used as training samples to train an image classification model, and the trained image classification model can then classify the acquired facial images, thereby identifying from the facial muscle information whether the wearer performed a speaking action.
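For illustration, this image route could be sketched as follows, where `model` stands in for any trained binary "speaking / not speaking" classifier exposing a scikit-learn-style `predict_proba`, and the vote count used to suppress flicker is an assumption:

```python
import numpy as np

def frame_is_speaking(model, frame: np.ndarray) -> bool:
    # Classify one facial frame; frames are flattened into feature rows.
    p = model.predict_proba(frame.reshape(1, -1))[0, 1]
    return p >= 0.5

def mouth_moved(model, frames: list, min_hits: int = 3) -> bool:
    # Require several "speaking" frames within the buffered window before
    # concluding that the wearer performed a speaking action.
    return sum(frame_is_speaking(model, f) for f in frames) >= min_hits
```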
In the example shown in FIG. 6, at block 602, facial structure information of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected is acquired. The facial structure information may be, for example, facial ToF (time-of-flight) information, facial ultrasonic scan information, or facial structured light information. Devices such as a ToF 3D module, an ultrasonic sensor, a structured light scanning module, or a binocular RGB camera arranged at the wearer's forehead may be used to acquire the facial structure information.
At block 604, a 3D model of the wearer's face is built based on the facial structure information.
After the facial 3D model is obtained, at block 606, facial muscle movement information of the wearer's vocalization-related facial muscles is detected based on the facial 3D model.
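As an illustrative sketch only: given per-frame vertex arrays of the reconstructed face model and a set of mouth-region vertex indices (both assumed inputs, not details from this disclosure), mouth movement could be detected from inter-frame displacement:

```python
import numpy as np

def mouth_motion_from_meshes(meshes: list, mouth_idx: list,
                             motion_threshold: float = 1.5) -> bool:
    """meshes: per-frame (N, 3) vertex arrays of the reconstructed 3D face;
    mouth_idx: indices of vertices around the mouth (assumed to be known).
    Reports movement when the mean displacement of mouth vertices between
    consecutive frames exceeds the (assumed, e.g. millimeter) threshold."""
    disp = [np.linalg.norm(b[mouth_idx] - a[mouth_idx], axis=1).mean()
            for a, b in zip(meshes, meshes[1:])]
    return bool(disp) and max(disp) >= motion_threshold
```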
After the muscle movement information is obtained in the manner described above, at block 404, whether the wearer's facial muscles have moved is determined based on the facial muscle movement information. For example, whether the wearer's mouth muscles have moved can be detected based on the facial 3D model.
Movement of the wearer's facial muscles indicates that the wearer performed a speaking action when the sound signal was detected; therefore, at block 406, it can be determined that the sound signal comes from the wearer. For example, when the wearer's mouth or face moves, it can be concluded that the wearer performed a speaking action while the sound signal was being detected, and thus that the sound signal comes from the wearer.
FIG. 7 is a flowchart of another example of the sound source determination process in the method for waking up a wearable device according to an embodiment of the present disclosure.
As shown in FIG. 7, at block 702, vocalization-related muscle vibration information of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected is acquired. The muscle vibration information may be detected using a motion sensor (for example, an inertial measurement unit (IMU)) attached to the vocalization-related muscles.
Then at block 704, whether the wearer's vocalization-related facial muscles vibrate is determined based on the muscle vibration information.
When it is determined that the facial muscles vibrated, at block 706, it is determined that the sound signal comes from the wearer of the wearable device. The facial muscles vibrate when the wearer speaks; therefore, when the facial muscles vibrate, it can be concluded that the wearer's face performed a speaking action, and on this basis that the sound signal comes from the wearer of the wearable device.
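A hedged sketch of such an IMU-based vibration test follows; the voice-band frequency range and the energy threshold are assumptions made for illustration, not values taken from the disclosure:

```python
import numpy as np

def muscles_vibrating(accel: np.ndarray, fs: float,
                      band=(80.0, 300.0), energy_threshold: float = 0.01) -> bool:
    """accel: IMU acceleration samples from a sensor attached to
    speech-related muscles; fs: sampling rate in Hz.
    Speech makes the muscles and overlying skin vibrate at roughly
    voice-band frequencies, so check band-limited spectral energy."""
    spec = np.fft.rfft(accel - accel.mean())          # remove DC, transform
    freqs = np.fft.rfftfreq(len(accel), 1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])    # keep the voice band
    energy = float(np.sum(np.abs(spec[mask]) ** 2) / len(accel))
    return energy >= energy_threshold
```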
FIG. 8 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure.
As shown in FIG. 8, at block 802, a bone conduction sound signal is detected using a bone conduction sound detection device attached to the wearer's head or neck. The bone conduction sound detection device may be, for example, a bone conduction microphone, which may be attached at any position on the head or neck to detect bone-conducted sound.
Then at block 804, it is determined whether the bone conduction sound signal is not below a predetermined sound threshold.
When the bone conduction sound signal is not below the predetermined sound threshold, at block 806, it is determined that the sound signal comes from the wearer. When a person vocalizes, the sound is conducted through the bones of the head and neck; when a bone conduction sound detection device detects the sound signal, the bone-conducted sound signal from the wearer is stronger than sound signals from the environment other than the wearer. Therefore, when the bone conduction sound signal is not below the predetermined sound threshold, it can be determined that the detected sound signal comes from the wearer. The predetermined sound threshold may be obtained empirically through experiments and may also be adjusted by the wearer.
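For illustration, the bone conduction level test might look like the following sketch; the samples are assumed to be normalized floats and the dB threshold is an assumed value that, as noted above, would in practice be tuned empirically or by the wearer:

```python
import numpy as np

def bone_conduction_from_wearer(samples: np.ndarray,
                                sound_threshold_db: float = -30.0) -> bool:
    # Bone-conducted speech from the wearer is much stronger at a
    # bone-conduction microphone than ambient sound, so a simple
    # RMS level check against the predetermined threshold suffices.
    rms = np.sqrt(np.mean(samples ** 2)) + 1e-12
    level_db = 20.0 * np.log10(rms)
    return level_db >= sound_threshold_db
```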
When it is determined that the sound signal comes from the wearer, whether the sound signal belongs to human-machine dialogue can be determined at block 810. When the sound signal comes from the wearer and belongs to human-machine dialogue, at block 812, the wearable device is woken up.
FIG. 9 is a flowchart of a method for waking up a wearable device according to another embodiment of the present disclosure.
As shown in FIG. 9, at block 902, vocalization-related biometric information of the wearer of the wearable device is acquired. The biometric information may be any one or more of the vocalization-related muscle EMG signals, facial muscle movement information, bone conduction sound signals and the like described above.
At block 904, whether the wearer is vocalizing is determined based on the biometric information. This determination may be performed, for example, with reference to the vocalization determination processes described above with reference to FIGS. 3 to 8. For example, a bone conduction microphone attached to the wearer's head or neck may be used to detect a bone conduction sound signal, and the wearer is determined to be vocalizing when the bone conduction sound signal reaches a predetermined sound threshold.
When the biometric information indicates that the wearer is vocalizing, at block 906, the sound signal produced while the wearer vocalizes is detected. For example, a muscle EMG detection device may detect EMG signals of the wearer's facial muscles; when the EMG signals are not below a predetermined EMG threshold, the wearer is determined to be vocalizing, and sound signal detection may start when the EMG signals reach that threshold. As another example, a motion sensor (for example, an inertial measurement unit (IMU)) attached to the vocalization-related muscles may detect whether those muscles vibrate; when vibration occurs, it can be determined that the wearer is vocalizing, and sound signal detection may start. In one example, the wearer's facial muscle movement information may also be obtained by the methods described above with reference to FIGS. 4 to 6, and sound signal detection starts when the facial muscle movement information indicates that the wearer is vocalizing.
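The overall flow of this second embodiment could be sketched as follows; `biometrics`, `mic` and `device` are hypothetical interfaces introduced only for illustration, and `is_human_machine` refers to the earlier classifier sketch:

```python
def wake_on_biometric_trigger(biometrics, mic, dialogue_model, device) -> None:
    # Second embodiment (FIG. 9): start capturing audio only once a
    # biometric channel (EMG, IMU vibration, or bone conduction level)
    # indicates the wearer is vocalizing; then classify the captured
    # speech and wake the device if it is human-machine dialogue.
    if biometrics.indicates_vocalization():
        audio = mic.record_while(biometrics.indicates_vocalization)
        if is_human_machine(dialogue_model, audio):
            device.wake()
```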
Then, at block 908, a speech classification model is used to determine, based on the sound signal, whether the sound signal belongs to a dialogue between the wearer and the wearable device. The speech classification model may be trained with speech data samples of human-to-human conversation and of human-machine dialogue, and then used to classify the detected sound signal.
When the sound signal detected while the wearer vocalizes belongs to a dialogue between the wearer and the wearable device, at block 910, the wearable device is woken up.
FIG. 10 is a structural block diagram of an apparatus for waking up a wearable device (hereinafter referred to as the wearable device wake-up apparatus) 1000 according to an embodiment of the present disclosure. As shown in FIG. 10, the wearable device wake-up apparatus 1000 includes a sound detection unit 1010, a sound source determination unit 1020, a sound signal classification unit 1030 and a device wake-up unit 1040.
The sound detection unit 1010 is configured to detect sound signals. The sound source determination unit 1020 is configured to determine, based on the detected sound signal, whether the sound signal comes from the wearer of the wearable device. The sound signal classification unit 1030 is configured to determine, based on the detected sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device. When the sound signal comes from the wearer and belongs to a dialogue between the wearer and the wearable device, the device wake-up unit 1040 is configured to wake up the wearable device.
It should be noted that although FIG. 10 shows a sound detection unit, it should be understood that the wearable device wake-up apparatus of the present disclosure may omit the sound detection unit; the sound detection unit may also be an element independent of the wearable device wake-up apparatus.
In one example, the sound signal may be a bone conduction sound signal, and the sound detection unit 1010 may be a bone conduction sound detection unit configured so that, when the wearer wears the wearable device, it can be attached to the wearer's head or neck to detect bone conduction sound signals. For example, when the wearable device wake-up apparatus is applied to a wearable device, the bone conduction sound detection device may be a bone conduction microphone wearable on the wearer's ear, and the sound signals it detects may be sent to the sound source determination unit and the sound signal classification unit in a wired or wireless manner. As another example, the bone conduction sound detection device may be designed to hang on the wearable device so that, when using the wearable device, the wearer can attach it at any position close to the bones of the head or neck. In this example, the sound signal source determination module may determine that the sound signal comes from the wearer of the wearable device when the bone conduction sound signal is not below the predetermined sound threshold.
FIG. 11 is a structural block diagram of an example of the sound source determination unit 1020 in the wearable device wake-up apparatus 1000 according to an embodiment of the present disclosure. As shown in FIG. 11, the sound source determination unit 1020 includes a biometric information acquisition module 1021 and a sound signal source determination module 1022.
The biometric information acquisition module 1021 is configured to acquire vocalization-related biometric information of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected. The biometric information acquisition module 1021 may also be configured to acquire such information within a predetermined period forward or backward from the time point at which the sound signal is detected and reaches the first threshold. The biometric information may be vocalization-related muscle EMG signals, facial structure information, facial muscle movement information, facial muscle vibration information, and the like.
After the wearer's biometric information is acquired, the sound signal source determination module 1022 may determine, based on the biometric information, whether the sound signal comes from the wearer. In one example, the biometric information may include the wearer's vocalization-related muscle vibration information. In this example, the biometric information acquisition module 1021 may include a muscle vibration information acquisition sub-module for detecting the vocalization-related muscle vibration information, and the sound signal source determination module 1022 may be configured to determine that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles vibrate.
In one example, the biometric information acquisition module 1021 may include a muscle EMG detection sub-module configured to acquire the wearer's muscle EMG signals within a predetermined period forward or backward from the time point at which the sound signal is detected. In this example, the sound signal source determination module is configured to determine that the sound signal comes from the wearer when the acquired muscle EMG signals are not below the predetermined EMG threshold.
In one example, the biometric information includes the wearer's facial muscle movement information. In this example, the sound signal source determination module may be configured to determine that the sound signal comes from the wearer when the facial muscle movement information indicates that the wearer's vocalization-related facial muscles have moved. The muscle movement information may be detected based on facial images or a facial 3D model.
FIG. 12 and FIG. 13 are structural block diagrams of examples of the biometric information acquisition module in the wearable device wake-up apparatus 1000 of the embodiment shown in FIG. 11.
As shown in FIG. 12, the biometric information acquisition module 1021 may include a facial image acquisition sub-module 10211 and a muscle movement information identification sub-module 10212. The facial image acquisition sub-module 10211 is configured to acquire facial images of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected. The muscle movement information identification sub-module 10212 is configured to identify, based on the facial images, facial muscle movement information of the wearer's vocalization-related facial muscles.
As shown in FIG. 13, the biometric information acquisition module may include a facial structure information acquisition sub-module 10213, a facial 3D model building sub-module 10214 and a muscle movement information identification sub-module 10215. The facial structure information acquisition sub-module 10213 is configured to acquire facial structure information of the wearer within a predetermined period forward or backward from the time point at which the sound signal is detected. The facial 3D model building sub-module 10214 then builds a 3D model of the wearer's face based on the facial structure information. After the facial 3D model is built, the muscle movement information identification sub-module 10215 can detect, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
FIG. 14 is a structural block diagram of another example of the sound source determination unit 1020 in the wearable device wake-up apparatus 1000 shown in FIG. 10. As shown in FIG. 14, the sound source determination unit may include a voiceprint information identification module 1023 and a sound source determination module 1024. The voiceprint information identification module 1023 is configured to identify voiceprint information of the detected sound signal. The sound source determination module 1024 is configured to determine, based on the wearer's voiceprint features and the identified voiceprint information, whether the sound signal comes from the wearer.
FIG. 15 is a structural block diagram of a wearable device wake-up apparatus 1500 according to another embodiment of the present disclosure. As shown in FIG. 15, the wearable device wake-up apparatus 1500 includes a biometric information acquisition unit 1510, a sound detection unit 1520, a sound signal classification unit 1530 and a device wake-up unit 1540.
The biometric information acquisition unit 1510 is configured to acquire vocalization-related biometric information of the wearer of the wearable device. When the biometric information indicates that the wearer is performing a vocalization action, the sound detection unit 1520 detects the sound signal produced during that action. The sound signal classification unit 1530 is configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device. When the sound signal detected while the wearer vocalizes belongs to a dialogue between the wearer and the wearable device, the device wake-up unit 1540 wakes up the wearable device.
In one example, the biometric information acquisition unit 1510 may include at least one of a muscle EMG detection module, a muscle movement detection module and a bone conduction sound detection module. The muscle EMG detection module is configured to detect the wearer's muscle EMG signals. The muscle movement detection module is configured to detect the wearer's facial muscle movement information. The bone conduction sound detection module is configured to detect the wearer's bone conduction sound signals.
The method and apparatus for waking up a wearable device of the present disclosure have been described above with reference to FIGS. 1 to 15. It should be noted that the detailed description of the method embodiments above also applies to the apparatus embodiments. The apparatus for waking up a wearable device of the present disclosure may be implemented in hardware, or in software or a combination of hardware and software. In the present disclosure, the apparatus for waking up a wearable device may be implemented, for example, using a computing device.
FIG. 16 is a structural block diagram of a computing device 1600 for implementing the method for waking up a wearable device according to an embodiment of the present disclosure. According to one embodiment, the computing device 1600 may include at least one processor 1610 that executes at least one computer-readable instruction (i.e., the above-mentioned elements implemented in software) stored or encoded in a computer-readable storage medium (i.e., memory 1620).
In one embodiment, computer-executable instructions are stored in the memory 1620 which, when executed, cause the at least one processor 1610 to: determine, based on the detected sound signal, whether the sound signal comes from the wearer of the wearable device; determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and wake up the wearable device when the sound signal comes from the wearer and belongs to a dialogue between the wearer and the wearable device.
It should be understood that the computer-executable instructions stored in the memory 1620, when executed, cause the at least one processor 1610 to perform the various operations and functions described above in connection with FIGS. 1-8 and 10-14 in the embodiments of the present disclosure.
FIG. 17 is a structural block diagram of a computing device 1700 for implementing the method for waking up a wearable device according to an embodiment of the present disclosure. According to one embodiment, the computing device 1700 may include at least one processor 1710 that executes at least one computer-readable instruction (i.e., the above-mentioned elements implemented in software) stored or encoded in a computer-readable storage medium (i.e., memory 1720).
In one embodiment, computer-executable instructions are stored in the memory 1720 which, when executed, cause the at least one processor 1710 to: acquire vocalization-related biometric information of the wearer of the wearable device; detect, when the biometric information indicates that the wearer is vocalizing, a sound signal produced while the wearer vocalizes; determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and wake up the wearable device when the sound signal detected while the wearer vocalizes belongs to a dialogue between the wearer and the wearable device.
It should be understood that the computer-executable instructions stored in the memory 1720, when executed, cause the at least one processor 1710 to perform the various operations and functions described above in connection with FIGS. 9 and 15 in the embodiments of the present disclosure.
According to one embodiment, a program product such as a non-transitory machine-readable medium is provided. The non-transitory machine-readable medium may have instructions (i.e., the above-mentioned elements implemented in software) which, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with FIGS. 1-8 and 10-14 in the embodiments of the present disclosure. In one example, the non-transitory machine-readable medium may have instructions which, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with FIGS. 9 and 15 in the embodiments of the present disclosure.
Specifically, a system or apparatus equipped with a readable storage medium may be provided, on which software program code implementing the functions of any of the above embodiments is stored, and a computer or processor of the system or apparatus is caused to read and execute the instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it constitute a part of the present invention.
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tapes, non-volatile memory cards and ROM. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communication network.
It should be noted that not all of the steps and units in the above flows and system structure diagrams are necessary; certain steps or units may be omitted according to actual needs. The order of execution of the steps is not fixed and may be determined as needed. The apparatus structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented by multiple physical entities separately, or some units may be implemented jointly by certain components in multiple independent devices.
The detailed description set forth above in connection with the accompanying drawings describes exemplary embodiments but does not represent all embodiments that may be implemented or that fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology; however, the technology may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Optional implementations of the embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the embodiments of the present disclosure are not limited to the specific details of the above implementations. Within the scope of the technical concept of the embodiments of the present disclosure, various simple variations of the technical solutions of the embodiments may be made, and these simple variations all fall within the protection scope of the embodiments of the present disclosure.
The above description of the present disclosure is provided to enable any person of ordinary skill in the art to implement or use the present disclosure. Various modifications to the present disclosure are obvious to those of ordinary skill in the art, and the general principles defined herein may also be applied to other variations without departing from the protection scope of the present disclosure. Therefore, the present disclosure is not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (29)
- A method for waking up a wearable device, comprising: determining, based on a detected sound signal, whether the sound signal comes from a wearer of the wearable device; determining, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and waking up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
- The method of claim 1, wherein determining, based on the detected sound signal, whether the sound signal comes from the wearer of the wearable device comprises: acquiring vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected; and determining, based on the biometric information, whether the sound signal comes from the wearer.
- The method of claim 2, wherein acquiring the vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected comprises: acquiring the vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected and reaches a first threshold.
- The method of claim 2 or 3, wherein the biometric information comprises muscle EMG signals of the wearer, and determining, based on the biometric information, whether the sound signal comes from the wearer comprises: determining that the sound signal comes from the wearer when the acquired muscle EMG signals are not below a predetermined EMG threshold.
- The method of claim 2 or 3, wherein the biometric information comprises facial muscle movement information of the wearer, and determining, based on the biometric information, whether the sound signal comes from the wearer comprises: determining that the sound signal comes from the wearer when the facial muscle movement information indicates that the wearer's vocalization-related facial muscles have moved.
- The method of claim 5, wherein acquiring the vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected comprises: acquiring facial images of the wearer within the predetermined time period forward or backward from the time point at which the sound signal is detected; and identifying, based on the facial images, facial muscle movement information of the wearer's vocalization-related facial muscles.
- The method of claim 5, wherein acquiring the vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected comprises: acquiring facial structure information of the wearer within the predetermined time period forward or backward from the time point at which the sound signal is detected; building a 3D model of the wearer's face based on the facial structure information; and detecting, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
- The method of claim 2 or 3, wherein the biometric information comprises vocalization-related muscle vibration information of the wearer, and determining, based on the biometric information, whether the sound signal comes from the wearer comprises: determining that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles vibrate.
- The method of claim 1, wherein determining, based on the detected sound signal, whether the sound signal comes from the wearer of the wearable device comprises: identifying voiceprint information of the detected sound signal; and determining, based on voiceprint features of the wearer and the identified voiceprint information, whether the sound signal comes from the wearer.
- The method of claim 1, wherein the sound signal is a bone conduction sound signal detected using a bone conduction sound detection device attached to the head or neck of the wearer, and determining, based on the detected sound signal, whether the sound signal comes from the wearer of the wearable device comprises: determining that the sound signal is a sound signal from the wearer of the wearable device when the bone conduction sound signal is not below a predetermined sound threshold.
- A method for waking up a wearable device, comprising: acquiring vocalization-related biometric information of a wearer of the wearable device; detecting, when the biometric information indicates that the wearer is vocalizing, a sound signal produced while the wearer vocalizes; determining, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and waking up the wearable device when the sound signal detected while the wearer vocalizes belongs to a dialogue between the wearer and the wearable device.
- The method of claim 11, wherein the biometric information comprises at least one of: muscle EMG signals of the wearer; facial muscle movement information of the wearer; and bone conduction sound signals of the wearer.
- An apparatus for waking up a wearable device, comprising: a sound source determination unit configured to determine, based on a detected sound signal, whether the sound signal comes from a wearer of the wearable device; a sound signal classification unit configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and a device wake-up unit configured to wake up the wearable device when the sound signal comes from the wearer and the sound signal belongs to a dialogue between the wearer and the wearable device.
- The apparatus of claim 13, wherein the sound source determination unit comprises: a biometric information acquisition module configured to acquire vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected; and a sound signal source determination module configured to determine, based on the biometric information, whether the sound signal comes from the wearer.
- The apparatus of claim 14, wherein the biometric information acquisition module is configured to: acquire the vocalization-related biometric information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected and reaches a first threshold.
- The apparatus of claim 14 or 15, wherein the biometric information acquisition module comprises: a muscle EMG detection sub-module configured to acquire muscle EMG signals of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected, wherein the sound signal source determination module is configured to: determine that the sound signal comes from the wearer when the acquired muscle EMG signals are not below a predetermined EMG threshold.
- The apparatus of claim 14 or 15, wherein the biometric information comprises facial muscle movement information of the wearer, and the sound signal source determination module is configured to: determine that the sound signal comes from the wearer when the facial muscle movement information indicates that the wearer's vocalization-related facial muscles have moved.
- The apparatus of claim 17, wherein the biometric information acquisition module comprises: a facial image acquisition sub-module configured to acquire facial images of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected; and a muscle movement information identification sub-module configured to identify, based on the facial images, facial muscle movement information of the wearer's vocalization-related facial muscles.
- The apparatus of claim 17, wherein the biometric information acquisition module comprises: a facial structure information acquisition sub-module configured to acquire facial structure information of the wearer within a predetermined time period forward or backward from the time point at which the sound signal is detected; a facial 3D model building sub-module configured to build a 3D model of the wearer's face based on the facial structure information; and a muscle movement information identification sub-module configured to detect, based on the facial 3D model, facial muscle movement information of the wearer's vocalization-related facial muscles.
- The apparatus of claim 14 or 15, wherein the biometric information comprises vocalization-related muscle vibration information of the wearer, and the sound signal source determination module is configured to: determine that the sound signal comes from the wearer when the vocalization-related muscle vibration information indicates that the wearer's vocalization-related muscles vibrate.
- The apparatus of claim 13, wherein the sound source determination unit comprises: a voiceprint information identification module configured to identify voiceprint information of the detected sound signal; and a sound signal source determination module configured to determine, based on voiceprint features of the wearer and the identified voiceprint information, whether the sound signal comes from the wearer.
- The apparatus of claim 13, further comprising: a sound detection unit configured to detect sound signals.
- The apparatus of claim 22, wherein the sound signal is a bone conduction sound signal, and the sound detection unit comprises: a bone conduction sound detection unit configured to be attachable to the head or neck of the wearer when the wearer wears the wearable device, so as to detect the bone conduction sound signal, wherein the sound signal source determination module is configured to: determine that the sound signal is a sound signal from the wearer of the wearable device when the bone conduction sound signal is not below a predetermined sound threshold.
- An apparatus for waking up a wearable device, comprising: a biometric information acquisition unit configured to acquire vocalization-related biometric information of a wearer of the wearable device; a sound detection unit configured to detect, when the biometric information indicates that the wearer is vocalizing, a sound signal produced while the wearer vocalizes; a sound signal classification unit configured to determine, based on the sound signal and using a speech classification model, whether the sound signal belongs to a dialogue between the wearer and the wearable device; and a device wake-up unit configured to wake up the wearable device when the sound signal detected while the wearer vocalizes belongs to a dialogue between the wearer and the wearable device.
- The apparatus of claim 24, wherein the biometric information acquisition unit comprises at least one of: a muscle EMG detection module configured to detect muscle EMG signals of the wearer; a muscle movement detection module configured to detect facial muscle movement information of the wearer; and a bone conduction sound detection module configured to detect bone conduction sound signals of the wearer.
- A computing device, comprising: at least one processor; and a memory storing instructions which, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 10.
- A non-transitory machine-readable storage medium storing executable instructions which, when executed, cause a machine to perform the method of any one of claims 1 to 10.
- A computing device, comprising: at least one processor; and a memory storing instructions which, when executed by the at least one processor, cause the at least one processor to perform the method of claim 11 or 12.
- A non-transitory machine-readable storage medium storing executable instructions which, when executed, cause a machine to perform the method of claim 11 or 12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/420,465 US20220084529A1 (en) | 2019-01-04 | 2019-12-27 | Method and apparatus for awakening wearable device |
EP19907267.9A EP3890342B1 (en) | 2019-01-04 | 2019-12-27 | Waking up a wearable device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910007365.XA CN111475206B (zh) | 2019-01-04 | 2019-01-04 | 用于唤醒可穿戴设备的方法及装置 |
CN201910007365.X | 2019-01-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020140840A1 true WO2020140840A1 (zh) | 2020-07-09 |
Family
ID=71407287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/129114 WO2020140840A1 (zh) | 2019-01-04 | 2019-12-27 | 用于唤醒可穿戴设备的方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20220084529A1 (zh) |
EP (1) | EP3890342B1 (zh) |
CN (1) | CN111475206B (zh) |
WO (1) | WO2020140840A1 (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20210009596A (ko) * | 2019-07-17 | 2021-01-27 | 엘지전자 주식회사 | 지능적 음성 인식 방법, 음성 인식 장치 및 지능형 컴퓨팅 디바이스 |
US11908478B2 (en) | 2021-08-04 | 2024-02-20 | Q (Cue) Ltd. | Determining speech from facial skin movements using a housing supported by ear or associated with an earphone |
US12216749B2 (en) | 2021-08-04 | 2025-02-04 | Q (Cue) Ltd. | Using facial skin micromovements to identify a user |
CN113782038A (zh) * | 2021-09-13 | 2021-12-10 | 北京声智科技有限公司 | 一种语音识别方法、装置、电子设备及存储介质 |
CN115035886B (zh) * | 2021-09-17 | 2023-04-14 | 荣耀终端有限公司 | 声纹识别方法及电子设备 |
CN113724699B (zh) * | 2021-09-18 | 2024-06-25 | 优奈柯恩(北京)科技有限公司 | 设备唤醒识别模型训练方法、设备唤醒控制方法及装置 |
AU2023311501A1 (en) | 2022-07-20 | 2025-02-06 | Q (Cue) Ltd. | Detecting and utilizing facial micromovements |
JP7632424B2 (ja) | 2022-09-14 | 2025-02-19 | カシオ計算機株式会社 | 電子機器、電子機器の制御方法及びプログラム |
CN117135266B (zh) * | 2023-10-25 | 2024-03-22 | Tcl通讯科技(成都)有限公司 | 一种信息处理方法、装置及计算机可读存储介质 |
Family Cites Families (78)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU7349998A (en) * | 1997-05-19 | 1998-12-11 | Creator Ltd. | Programmable assembly toy |
CN1767873B (zh) * | 2003-04-01 | 2012-03-28 | 麦道科技有限公司 | 监控肌肉活动的方法和用于监控肌肉活动的设备 |
CN100535806C (zh) * | 2007-11-16 | 2009-09-02 | 哈尔滨工业大学 | 基于双数字信号处理器的嵌入式多自由度肌电假手控制系统 |
CN101246687A (zh) * | 2008-03-20 | 2008-08-20 | 北京航空航天大学 | 一种智能语音交互系统及交互方法 |
US8359020B2 (en) * | 2010-08-06 | 2013-01-22 | Google Inc. | Automatically monitoring for voice input based on context |
CN102999154B (zh) * | 2011-09-09 | 2015-07-08 | 中国科学院声学研究所 | 一种基于肌电信号的辅助发声方法及装置 |
US9214157B2 (en) * | 2011-12-06 | 2015-12-15 | At&T Intellectual Property I, L.P. | System and method for machine-mediated human-human conversation |
US9257115B2 (en) * | 2012-03-08 | 2016-02-09 | Facebook, Inc. | Device for extracting information from a dialog |
US9704486B2 (en) * | 2012-12-11 | 2017-07-11 | Amazon Technologies, Inc. | Speech recognition power management |
CN103279734A (zh) * | 2013-03-26 | 2013-09-04 | 上海交通大学 | 新型的智能手语翻译与人机交互系统及其使用方法 |
DE102013007502A1 (de) * | 2013-04-25 | 2014-10-30 | Elektrobit Automotive Gmbh | Computer-implementiertes Verfahren zum automatischen Trainieren eins Dialogsystems und Dialogsystem zur Erzeugung von semantischen Annotationen |
CN103458056B (zh) * | 2013-09-24 | 2017-04-26 | 世纪恒通科技股份有限公司 | 自动外呼系统基于自动分类技术的语音意图判定系统 |
CN103853071B (zh) * | 2014-01-20 | 2016-09-28 | 南京升泰元机器人科技有限公司 | 基于生物信号的人机面部表情交互系统 |
CN105575395A (zh) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | 语音唤醒方法及装置、终端及其处理方法 |
JP2017538146A (ja) * | 2014-10-20 | 2017-12-21 | アウディマックス・エルエルシー | インテリジェントな音声認識および処理のためのシステム、方法、およびデバイス |
US9786299B2 (en) * | 2014-12-04 | 2017-10-10 | Microsoft Technology Licensing, Llc | Emotion type classification for interactive dialog system |
AU2016205850B2 (en) * | 2015-01-06 | 2018-10-04 | David Burton | Mobile wearable monitoring systems |
KR102324735B1 (ko) * | 2015-01-19 | 2021-11-10 | 삼성전자주식회사 | 생체 정보를 이용하여 적응적 제어가 가능한 웨어러블 장치, 이를 포함하는 시스템, 및 이의 동작 방법 |
CN204537060U (zh) * | 2015-04-23 | 2015-08-05 | 宁波市美灵思医疗科技有限公司 | 一种基于肌电流和多传感器协同作用的人机交互设备 |
WO2016195156A1 (ko) * | 2015-06-02 | 2016-12-08 | 엘지전자 주식회사 | 이동 단말기 및 그 제어방법 |
DE102015210430A1 (de) * | 2015-06-08 | 2016-12-08 | Robert Bosch Gmbh | Verfahren zum Erkennen eines Sprachkontexts für eine Sprachsteuerung, Verfahren zum Ermitteln eines Sprachsteuersignals für eine Sprachsteuerung und Vorrichtung zum Ausführen der Verfahren |
KR20170029390A (ko) * | 2015-09-06 | 2017-03-15 | 정경환 | 음성 명령 모드 진입 방법 |
US9824287B2 (en) * | 2015-09-29 | 2017-11-21 | Huami Inc. | Method, apparatus and system for biometric identification |
CN105487661A (zh) * | 2015-11-27 | 2016-04-13 | 东莞酷派软件技术有限公司 | 一种终端控制方法及装置 |
US9913050B2 (en) * | 2015-12-18 | 2018-03-06 | Cochlear Limited | Power management features |
EP3185244B1 (en) * | 2015-12-22 | 2019-02-20 | Nxp B.V. | Voice activation system |
CN105501121B (zh) * | 2016-01-08 | 2018-08-03 | 北京乐驾科技有限公司 | 一种智能唤醒方法及系统 |
CN105810200A (zh) * | 2016-02-04 | 2016-07-27 | 深圳前海勇艺达机器人有限公司 | 基于声纹识别的人机对话装置及其方法 |
US9972322B2 (en) * | 2016-03-29 | 2018-05-15 | Intel Corporation | Speaker recognition using adaptive thresholding |
CN105912092B (zh) * | 2016-04-06 | 2019-08-13 | 北京地平线机器人技术研发有限公司 | 人机交互中的语音唤醒方法及语音识别装置 |
CN105869637B (zh) * | 2016-05-26 | 2019-10-15 | 百度在线网络技术(北京)有限公司 | 语音唤醒方法和装置 |
CN107767861B (zh) * | 2016-08-22 | 2021-07-02 | 科大讯飞股份有限公司 | 语音唤醒方法、系统及智能终端 |
US10566007B2 (en) * | 2016-09-08 | 2020-02-18 | The Regents Of The University Of Michigan | System and method for authenticating voice commands for a voice assistant |
JP6515897B2 (ja) * | 2016-09-28 | 2019-05-22 | トヨタ自動車株式会社 | 音声対話システムおよび発話意図理解方法 |
CN106569607A (zh) * | 2016-11-08 | 2017-04-19 | 上海交通大学 | 一种基于肌电及运动传感器的头部动作识别系统 |
JP6913164B2 (ja) * | 2016-11-11 | 2021-08-04 | マジック リープ, インコーポレイテッドMagic Leap,Inc. | 完全な顔画像の眼球周囲およびオーディオ合成 |
KR20180055661A (ko) * | 2016-11-16 | 2018-05-25 | 삼성전자주식회사 | 전자 장치 및 그 제어 방법 |
CN106558308B (zh) * | 2016-12-02 | 2020-05-15 | 深圳撒哈拉数据科技有限公司 | 一种互联网音频数据质量自动打分系统及方法 |
US10692485B1 (en) * | 2016-12-23 | 2020-06-23 | Amazon Technologies, Inc. | Non-speech input to speech processing system |
CN106653021B (zh) * | 2016-12-27 | 2020-06-02 | 上海智臻智能网络科技股份有限公司 | 语音唤醒的控制方法、装置及终端 |
KR20180084392A (ko) * | 2017-01-17 | 2018-07-25 | 삼성전자주식회사 | 전자 장치 및 그의 동작 방법 |
CN106952646A (zh) * | 2017-02-27 | 2017-07-14 | 深圳市朗空亿科科技有限公司 | 一种基于自然语言的机器人交互方法和系统 |
US10468032B2 (en) * | 2017-04-10 | 2019-11-05 | Intel Corporation | Method and system of speaker recognition using context aware confidence modeling |
US11250844B2 (en) * | 2017-04-12 | 2022-02-15 | Soundhound, Inc. | Managing agent engagement in a man-machine dialog |
US10313782B2 (en) * | 2017-05-04 | 2019-06-04 | Apple Inc. | Automatic speech recognition triggering system |
CN108229283B (zh) * | 2017-05-25 | 2020-09-22 | 深圳市前海未来无限投资管理有限公司 | 肌电信号采集方法及装置 |
GB201801526D0 (en) * | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for authentication |
CN107972028B (zh) * | 2017-07-28 | 2020-10-23 | 北京物灵智能科技有限公司 | 人机交互方法、装置及电子设备 |
CN107644641B (zh) * | 2017-07-28 | 2021-04-13 | 深圳前海微众银行股份有限公司 | 对话场景识别方法、终端以及计算机可读存储介质 |
CN107704275B (zh) * | 2017-09-04 | 2021-07-23 | 百度在线网络技术(北京)有限公司 | 智能设备唤醒方法、装置、服务器及智能设备 |
US10395655B1 (en) * | 2017-09-13 | 2019-08-27 | Amazon Technologies, Inc. | Proactive command framework |
CN107730211A (zh) * | 2017-10-29 | 2018-02-23 | 佛山市凯荣泰科技有限公司 | 采用可穿戴设备的睡眠提醒方法以及系统 |
CN107679042B (zh) * | 2017-11-15 | 2021-02-05 | 北京灵伴即时智能科技有限公司 | 一种面向智能语音对话系统的多层级对话分析方法 |
US10488831B2 (en) * | 2017-11-21 | 2019-11-26 | Bose Corporation | Biopotential wakeup word |
US11140450B2 (en) * | 2017-11-28 | 2021-10-05 | Rovi Guides, Inc. | Methods and systems for recommending content in context of a conversation |
KR102469753B1 (ko) * | 2017-11-30 | 2022-11-22 | 삼성전자주식회사 | 음원의 위치에 기초하여 서비스를 제공하는 방법 및 이를 위한 음성 인식 디바이스 |
CN107808659A (zh) * | 2017-12-02 | 2018-03-16 | 宫文峰 | 智能语音信号模式识别系统装置 |
CN108074310B (zh) * | 2017-12-21 | 2021-06-11 | 广东汇泰龙科技股份有限公司 | 基于语音识别模块的语音交互方法及智能锁管理系统 |
KR20200091389A (ko) * | 2017-12-21 | 2020-07-30 | 삼성전자주식회사 | 생체 인식 사용자 인증을 위한 시스템 및 방법 |
CN108134876A (zh) * | 2017-12-21 | 2018-06-08 | 广东欧珀移动通信有限公司 | 对话分析方法、装置、存储介质及移动终端 |
CN108337362A (zh) * | 2017-12-26 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | 语音交互方法、装置、设备和存储介质 |
CN108200509A (zh) * | 2017-12-27 | 2018-06-22 | 中国人民解放军总参谋部第六十研究所 | 一种用于噪杂环境下的录音装置 |
US10424186B2 (en) * | 2017-12-28 | 2019-09-24 | Sony Corporation | System and method for customized message playback |
CN108039171A (zh) * | 2018-01-08 | 2018-05-15 | 珠海格力电器股份有限公司 | 语音控制方法及装置 |
CN108306797A (zh) * | 2018-01-30 | 2018-07-20 | 百度在线网络技术(北京)有限公司 | 声控智能家居设备方法、系统、终端以及存储介质 |
KR102515023B1 (ko) * | 2018-02-23 | 2023-03-29 | 삼성전자주식회사 | 전자 장치 및 그 제어 방법 |
DE112018007242T5 (de) * | 2018-03-08 | 2020-12-10 | Sony Corporation | Datenverarbeitungsvorrichtung,Datenverarbeitungsverfahren, Programm undDatenverarbeitungssystem |
US10878825B2 (en) * | 2018-03-21 | 2020-12-29 | Cirrus Logic, Inc. | Biometric processes |
CN108694942A (zh) * | 2018-04-02 | 2018-10-23 | 浙江大学 | 一种基于家居智能服务机器人的智能家居交互问答系统 |
CN108962240B (zh) * | 2018-06-14 | 2021-09-21 | 百度在线网络技术(北京)有限公司 | 一种基于耳机的语音控制方法及系统 |
CN108735218A (zh) * | 2018-06-25 | 2018-11-02 | 北京小米移动软件有限公司 | 语音唤醒方法、装置、终端及存储介质 |
CN108920639B (zh) * | 2018-07-02 | 2022-01-18 | 北京百度网讯科技有限公司 | 基于语音交互的上下文获取方法及设备 |
KR102498811B1 (ko) * | 2018-08-21 | 2023-02-10 | 구글 엘엘씨 | 자동화된 어시스턴트를 호출하기 위한 다이내믹 및/또는 컨텍스트 특정 핫워드 |
US11016968B1 (en) * | 2018-09-18 | 2021-05-25 | Amazon Technologies, Inc. | Mutation architecture for contextual data aggregator |
US10861444B2 (en) * | 2018-09-24 | 2020-12-08 | Rovi Guides, Inc. | Systems and methods for determining whether to trigger a voice capable device based on speaking cadence |
CN109712646A (zh) * | 2019-02-20 | 2019-05-03 | 百度在线网络技术(北京)有限公司 | 语音播报方法、装置和终端 |
CN118675519A (zh) * | 2019-02-20 | 2024-09-20 | 谷歌有限责任公司 | 利用事件前和事件后输入流来接洽自动化助理 |
EP4174850A4 (en) * | 2020-09-09 | 2023-12-06 | Samsung Electronics Co., Ltd. | ELECTRONIC VOICE RECOGNITION DEVICE AND CONTROL METHOD THEREFOR |
-
2019
- 2019-01-04 CN CN201910007365.XA patent/CN111475206B/zh active Active
- 2019-12-27 EP EP19907267.9A patent/EP3890342B1/en active Active
- 2019-12-27 US US17/420,465 patent/US20220084529A1/en active Pending
- 2019-12-27 WO PCT/CN2019/129114 patent/WO2020140840A1/zh unknown
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103956164A (zh) * | 2014-05-20 | 2014-07-30 | 苏州思必驰信息科技有限公司 | 一种声音唤醒方法及系统 |
US9485733B1 (en) * | 2015-05-17 | 2016-11-01 | Intel Corporation | Apparatus, system and method of communicating a wakeup packet |
CN107665708A (zh) * | 2016-07-29 | 2018-02-06 | 科大讯飞股份有限公司 | 智能语音交互方法及系统 |
CN106714023A (zh) * | 2016-12-27 | 2017-05-24 | 广东小天才科技有限公司 | 一种基于骨传导耳机的语音唤醒方法、系统及骨传导耳机 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3890342A4 * |
Also Published As
Publication number | Publication date |
---|---|
CN111475206A (zh) | 2020-07-31 |
CN111475206B (zh) | 2023-04-11 |
US20220084529A1 (en) | 2022-03-17 |
EP3890342A4 (en) | 2022-01-19 |
EP3890342B1 (en) | 2024-09-11 |
EP3890342A1 (en) | 2021-10-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020140840A1 (zh) | 用于唤醒可穿戴设备的方法及装置 | |
US12080295B2 (en) | System and method for dynamic facial features for speaker recognition | |
CN108320733B (zh) | 语音数据处理方法及装置、存储介质、电子设备 | |
US11854550B2 (en) | Determining input for speech processing engine | |
US10242666B2 (en) | Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method | |
WO2016150001A1 (zh) | 语音识别的方法、装置及计算机存储介质 | |
JP5323770B2 (ja) | ユーザ指示取得装置、ユーザ指示取得プログラムおよびテレビ受像機 | |
CN108711429B (zh) | 电子设备及设备控制方法 | |
CN112102850A (zh) | 情绪识别的处理方法、装置、介质及电子设备 | |
CN109558788B (zh) | 静默语音输入辨识方法、计算装置和计算机可读介质 | |
CN110874137A (zh) | 一种交互方法以及装置 | |
US20230386461A1 (en) | Voice user interface using non-linguistic input | |
CN109272991A (zh) | 语音交互的方法、装置、设备和计算机可读存储介质 | |
CN111326152A (zh) | 语音控制方法及装置 | |
CN110946554A (zh) | 咳嗽类型识别方法、装置及系统 | |
CN109074809B (zh) | 信息处理设备、信息处理方法和计算机可读存储介质 | |
CN113724699B (zh) | 设备唤醒识别模型训练方法、设备唤醒控制方法及装置 | |
CN112185422B (zh) | 提示信息生成方法及其语音机器人 | |
CN108648758B (zh) | 医疗场景中分离无效语音的方法及系统 | |
Lin et al. | Nonverbal acoustic communication in human-computer interaction | |
CN118212917A (zh) | 语音助手唤醒方法、装置、设备及存储介质 | |
CN119025825A (zh) | 一种多模态面部动点数据与声带运动数据的数据处理方法及系统 | |
CN115609596A (zh) | 用户注册方法及相关设备 | |
CN111583939A (zh) | 语音识别用于特定目标唤醒的方法及装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19907267 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
ENP | Entry into the national phase |
Ref document number: 2019907267 Country of ref document: EP Effective date: 20210702 |