US20200034108A1 - Dynamic Volume Adjustment For Virtual Assistants - Google Patents
- Publication number
- US20200034108A1 (application number US16/045,560)
- Authority
- US
- United States
- Prior art keywords
- user
- criteria
- content
- volume level
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G3/00—Gain control in amplifiers or frequency changers
- H03G3/02—Manually-operated control
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G3/00—Gain control in amplifiers or frequency changers
- H03G3/20—Automatic control
- H03G3/30—Automatic control in amplifiers having semiconductor devices
- H03G3/3005—Automatic control in amplifiers having semiconductor devices in amplifiers suitable for low-frequencies, e.g. audio amplifiers
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G3/00—Gain control in amplifiers or frequency changers
- H03G3/20—Automatic control
- H03G3/30—Automatic control in amplifiers having semiconductor devices
- H03G3/3089—Control of digital or coded signals
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G3/00—Gain control in amplifiers or frequency changers
- H03G3/20—Automatic control
- H03G3/30—Automatic control in amplifiers having semiconductor devices
- H03G3/32—Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
Description
- In recent years, voice-based virtual assistants (referred to herein as simply “virtual assistants”) have become a popular feature on electronic devices such as smartphones, smart speakers, media streaming devices, televisions, and so on. Generally speaking, a virtual assistant is a software program that understands natural language voice commands and can process those commands in order to complete tasks for and/or provide information to users. For instance, according to one implementation, a user can say a predefined trigger word or phrase, known as a wake word, immediately followed by a voice query or command. The virtual assistant will usually be listening for the wake word in an always-on modality. Upon detecting an utterance of the wake word, the virtual assistant can recognize the follow-on voice query or command (e.g., “what is the weather today?” or “play music by Michael Jackson”) using a combination of speech recognition and artificial intelligence (AI) techniques. The virtual assistant can then act upon the voice query/command and return a verbal response, if appropriate, to the user (e.g., “today will be mostly sunny with a high of 82 degrees” or “ok, now playing Thriller by Michael Jackson”).
- Typically, the volume level of a virtual assistant's sound output is set manually by a user based on the environment in which the virtual assistant is used. For example, if the virtual assistant is used in a relatively quiet environment such as a home, the user may set the virtual assistant's sound output to a low volume level. While this manual approach for volume adjustment is functional, it can be cumbersome if the acoustic conditions surrounding the virtual assistant and/or its user change often. In these scenarios, the user will need to frequently adjust the virtual assistant's volume level so that it is suitable for current conditions (i.e., loud enough to be heard, but not too loud).
- Techniques for implementing dynamic volume adjustment by a virtual assistant are provided. In one embodiment, the virtual assistant can receive a voice query or command from a user, recognize the content of the voice query or command, process the voice query or command based on the recognized content, and determine an auditory response to be output to the user. The virtual assistant can then identify a plurality of criteria for automatically determining an output volume level for the response, where the plurality of criteria include content-based criteria and environment-based criteria, calculate values for the plurality of criteria, and combine the values to determine the output volume level. The virtual assistant can subsequently cause the auditory response to be output to the user at the determined output volume level.
- A further understanding of the nature and advantages of the embodiments disclosed herein can be realized by reference to the remaining portions of the specification and the attached drawings.
- FIG. 1 depicts a system environment.
- FIG. 2 depicts a system environment that implements the techniques of the present disclosure according to an embodiment.
- FIG. 3 depicts a high-level workflow for implementing dynamic volume control for a virtual assistant according to an embodiment.
- FIG. 4 depicts a modified version of the workflow of FIG. 3 that incorporates a dynamic feedback/learning loop according to an embodiment.
- FIG. 5 depicts a computing device according to an embodiment.
- In the following description, for purposes of explanation, numerous examples and details are set forth in order to provide an understanding of specific embodiments. It will be evident, however, to one skilled in the art that certain embodiments can be practiced without some of these details, or can be practiced with modifications or equivalents thereof.
- The present disclosure is directed to techniques that can be implemented by a virtual assistant for dynamically adjusting the volume level of its sound output based on a combination of criteria. In various embodiments, these criteria can include a calculated importance of a message being spoken by the virtual assistant, the distance between the device implementing the virtual assistant and its user, the ambient noise level, and more.
- With these techniques, there is no need for the user to manually adjust the virtual assistant's volume in response to, e.g., environmental changes. Instead, the virtual assistant itself can automate this process in a precise and optimal manner, resulting in an improved user experience.
- The foregoing and other aspects of the present disclosure are described in further detail in the sections that follow.
- FIG. 1 depicts a system environment 100 in which embodiments of the present disclosure may be implemented. As shown, system environment 100 includes an electronic device 102 that is communicatively coupled with a microphone 104 and a speaker 106. In one set of embodiments, electronic device 102 can be a handheld or wearable device, such as a smartphone, a tablet, a smartwatch, or the like. In other embodiments, electronic device 102 can be a larger or stationary device or system, such as a smart speaker, a laptop or desktop computer, a television, a media streaming device, a video game console, a kiosk, an in-vehicle computer system, a home automation or security system, or the like.
- Microphone 104 is operable for capturing audio signals from its surrounding environment, such as speech uttered by a device user 108. Speaker 106 is operable for outputting audio from electronic device 102, such as audio signals generated locally on device 102 or audio signals received from one or more remote systems/servers (e.g., cloud server 110). In one embodiment, microphone 104 and speaker 106 can be integrated directly into the physical housing of electronic device 102. In other embodiments, microphone 104 and/or speaker 106 may be resident in another device or housing that is separate from electronic device 102. For example, in a scenario where electronic device 102 is a home automation or security system, microphone 104 and speaker 106 may be placed at different locations or in different fixtures in a home. In this and other similar scenarios, audio data captured via microphone 104 and audio data output via speaker 106 can be relayed to/from electronic device 102 via an appropriate communications link (e.g., a wired or wireless link).
- In addition to the foregoing components, system environment 100 further includes a virtual assistant 112. In the example of FIG. 1, virtual assistant 112 is shown as running on cloud server 110, but in other embodiments virtual assistant 112 may run wholly or partially on electronic device 102. Examples of existing virtual assistants include Siri (developed by Apple Inc.) and Alexa (developed by Amazon). According to one conventional approach, a wake word detection module 114 residing on electronic device 102 can continuously monitor for the utterance of a wake word by device user 108 via microphone 104. If module 114 detects the wake word as being spoken, electronic device 102 can capture one or more follow-on voice queries/commands uttered by device user 108 and forward the voice queries/commands to virtual assistant 112. Virtual assistant 112 can then process the voice queries/commands, determine an auditory response if appropriate (such as a verbal message, a tone, a song, etc.), and transmit the response to electronic device 102. Finally, electronic device 102 can output the response to device user 108 via speaker 106.
- To address the foregoing and other similar issues,
FIG. 2 depicts an enhanced version of system environment 100 (i.e., system environment 200) that includes, as part ofvirtual assistant 112, a novel dynamicvolume adjustment module 202. Dynamicvolume adjustment module 202 may be implemented in software, hardware, or a combination thereof. As described in further detail below, dynamicvolume adjustment module 202 can automatically adjust the sound output volume level ofvirtual assistant 112 by taking into account a combination of criteria, such as criteria pertaining to the content of the message being output, criteria pertaining to the most current acoustic conditions/environment surroundingelectronic device 102 and/or device user 108, and so on. In this way,module 202 can advantageously ensure that device user 108 is able to hear the responses generated byvirtual assistant 112 at an appropriate volume level at all times, without requiring any manual adjustments by the user. - It should be appreciated that
system environment 200 ofFIG. 2 is illustrative and not intended to limit embodiments of the present disclosure. For example, although dynamicvolume adjustment module 202 is shown as running oncloud server 110, in someembodiments module 202 may be implemented onelectronic device 102. Further, the various entities shown inFIG. 2 may include subcomponents or functions that are not explicitly described. One of ordinary skill in the art will recognize other variations, modifications, and alternatives. -
- FIG. 3 depicts a high-level workflow 300 that can be executed by virtual assistant 112 and dynamic volume adjustment module 202 of FIG. 2 for recognizing/processing a user voice query or command and outputting a response at a dynamically determined volume level according to an embodiment.
- Starting with block 302, wake word detection module 114 can listen for the wake word configured for triggering virtual assistant 112. Upon detecting an utterance of the wake word by, e.g., device user 108 (block 304), wake word detection module 114 can listen for a follow-on voice query/command from device user 108, record the voice query/command (block 306), and forward the recorded voice query/command to virtual assistant 112 (block 308).
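- By way of illustration, the following is a minimal Python sketch of the block 302-308 loop. The wake phrase and the record/transcribe/forward helpers are hypothetical stand-ins for the device's audio capture and speech recognition components, not details taken from the disclosure:

```python
# Toy sketch of blocks 302-308; record, transcribe, and forward are
# injected stand-ins for the device's audio and speech components.
WAKE_WORD = "hello assistant"  # hypothetical wake phrase

def wake_word_loop(record, transcribe, forward):
    """Listen continuously; on the wake word, capture and forward a query."""
    while True:
        utterance = transcribe(record())    # block 302: always-on listening
        if WAKE_WORD in utterance.lower():  # block 304: wake word detected
            query = record()                # block 306: record the follow-on query/command
            forward(query)                  # block 308: forward to the virtual assistant
```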
- At block 310, virtual assistant 112 can recognize the content of the voice query/command using speech recognition and AI techniques. Virtual assistant 112 can then process the recognized voice query/command (block 312). For example, if the voice query/command is a request to perform a task, such as setting an alarm, virtual assistant 112 can execute the requested task. This may involve, e.g., interacting with one or more other software services/components. Alternatively, if the voice query/command is a request to retrieve a piece of information, such as current weather conditions, virtual assistant 112 can retrieve the requested information by, e.g., accessing an online data repository or search engine.
- Upon processing the voice query/command, virtual assistant 112 can determine an auditory response to be output to device user 108 (block 314). For example, the auditory response may be a verbal message replying to/acknowledging the voice query/command, a tone, a piece of music, or some other sound. Virtual assistant 112 can then invoke dynamic volume adjustment module 202, which can identify and calculate values for various criteria that module 202 deems relevant for determining an appropriate volume level for the response (block 316).
- The specific types of criteria that dynamic volume adjustment module 202 considers at block 316 can vary according to the implementation. For instance, according to one set of embodiments, module 202 can identify and calculate values for one or more content-based criteria that pertain to the content of the response message. One example of a content-based criterion is the importance of the response message content, where messages of higher importance are assigned a higher value (and thus higher volume level) than messages of lower importance. For this criterion, as part of block 316, dynamic volume adjustment module 202 can execute a sub-process that involves (1) recognizing the content of the response message, (2) classifying the message content according to one of several predefined content types, (3) identifying an importance level associated with the content type, and (4) assigning a value to the response based on the importance level.
- The specific mappings between importance levels and content types may be defined by the developer of virtual assistant 112 and/or by end-users. In a particular embodiment, messages requesting the confirmation of actions that may affect user privacy or cost money (e.g., external messaging or financial/purchase transactions) and messages that alert the user to potentially dangerous conditions (e.g., detection of a home intrusion, presence of smoke, etc.) can be assigned a relatively high importance level, while messages that simply repeat or acknowledge the user's original voice query/command can be assigned a relatively low importance level.
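- To make the importance criterion concrete, the following is a minimal Python sketch. The content types, the 0.0-1.0 value scale, and the specific numbers are hypothetical; the disclosure leaves the actual mapping to the developer or end-user:

```python
# Hypothetical mapping from classified content type to an importance value
# on a 0.0-1.0 scale (higher value -> higher output volume).
CONTENT_TYPE_IMPORTANCE = {
    "safety_alert": 1.0,     # e.g., home intrusion or smoke detected
    "confirmation": 0.9,     # e.g., purchase or external-message confirmation
    "information": 0.5,      # e.g., a weather report
    "acknowledgement": 0.2,  # e.g., repeating the user's command back
}

def importance_value(content_type: str) -> float:
    """Step (4): assign a criterion value from the classified content type."""
    return CONTENT_TYPE_IMPORTANCE.get(content_type, 0.5)  # mid level if unknown
```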
- Another example of a content-based criterion is the identity of the user being spoken to (i.e., device user 108). This criterion covers scenarios where some users may prefer a higher general volume level (e.g., older users that are hard-of-hearing), while other users may prefer a lower general volume level. For this criterion, as part of block 316, dynamic volume adjustment module 202 can execute a sub-process that involves (1) recognizing the identity of device user 108 based on, e.g., his/her voice (and/or other biometric factors, such as face, fingerprint, etc.), (2) retrieving a preferred volume level mapped to the identified user, and (3) assigning a value to the response based on the preferred user volume level.
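- A minimal sketch of steps (2) and (3), assuming speaker identification is handled elsewhere and that preferred levels live in a simple per-user profile table (both assumptions, not part of the disclosure):

```python
# Hypothetical per-user preferred volume levels on a 0.0-1.0 scale.
USER_PREFERRED_VOLUME = {
    "user_a": 0.8,  # e.g., a hard-of-hearing user who prefers louder output
    "user_b": 0.4,  # e.g., a user who prefers quieter output
}

def user_identity_value(user_id: str, default: float = 0.5) -> float:
    """Return the identified user's preferred volume as a criterion value."""
    return USER_PREFERRED_VOLUME.get(user_id, default)
```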
- In addition to (and/or in lieu of) content-based criteria, dynamic volume adjustment module 202 can also take into account environment-based criteria that pertain to the environmental conditions around electronic device 102 and/or device user 108 at the time the voice query/command was submitted. One example of an environment-based criterion is how far device user 108 is located from speaker 106 of electronic device 102. For this criterion, as part of block 316, dynamic volume adjustment module 202 can execute a sub-process that involves (1) calculating the distance between device user 108 and speaker 106 using, e.g., computer vision, echo-location, radar, or other known techniques, and (2) assigning a value to the response based on the calculated distance.
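- A minimal sketch of step (2), assuming the distance estimate itself is supplied by one of the sensing techniques above and that the value scales linearly up to a hypothetical maximum range:

```python
def distance_value(distance_m: float, max_distance_m: float = 10.0) -> float:
    """Scale the estimated speaker-to-user distance to a 0.0-1.0 value."""
    # Farther users get a higher value (and thus a louder response).
    return min(max(distance_m, 0.0), max_distance_m) / max_distance_m
```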
- Another example of an environment-based criterion is the ambient noise level around electronic device 102. For this criterion, as part of block 316, dynamic volume adjustment module 202 can execute a sub-process that involves (1) determining the ambient noise level via microphone 104 and (2) assigning a value to the response based on the determined ambient noise level.
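- One plausible way to implement step (1) is to compute the RMS level of a short microphone capture; the sketch below assumes 16-bit PCM samples and a simple linear normalization, neither of which is specified by the disclosure:

```python
import math

def ambient_noise_value(samples: list[int], full_scale: int = 32768) -> float:
    """Estimate ambient noise as the normalized RMS of raw 16-bit PCM samples."""
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(rms / full_scale, 1.0)  # louder surroundings -> higher value
```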
- It should be appreciated that the criteria noted above are exemplary and that other types of criteria that may be relevant for automated volume level determination are within the scope of the present disclosure.
- Once dynamic volume adjustment module 202 has calculated values for all relevant criteria at block 316, module 202 can combine the criteria values to arrive at a final output volume level for the response (block 318). In certain embodiments, this step can include weighting each criteria value using a developer-defined or user-defined weight. For example, if device user 108 feels that his/her preferred volume level setting should take priority over other types of criteria, user 108 can define a high weight for the “user identity” criterion described above, which will strongly bias the final volume determination based on the user's preferred volume level.
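- A minimal sketch of block 318 as a weighted average; the 0.0-1.0 scale, the default weight of 1.0, and the example numbers are all assumptions rather than details taken from the disclosure:

```python
def combine_criteria(values: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion values into one output volume level (0.0-1.0)."""
    total = sum(weights.get(name, 1.0) for name in values)
    if total == 0:
        return 0.5  # fall back to a mid-level volume
    weighted = sum(v * weights.get(name, 1.0) for name, v in values.items())
    return max(0.0, min(weighted / total, 1.0))

# Example: a user who prioritizes his/her own preferred level.
volume = combine_criteria(
    {"importance": 0.9, "user_identity": 0.4, "distance": 0.6, "noise": 0.3},
    {"user_identity": 3.0},  # high weight biases toward the user's preference
)
```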
- Finally, at block 320, virtual assistant 112 can provide the response and volume level to electronic device 102, which can output the response via speaker 106 at the specified volume. Workflow 300 can then return to block 302 in order to listen for and process additional user voice queries/commands.
- Workflow 300 of FIG. 3 is a static process in that, for a given set of inputs (e.g., importance level of response message content type, user identity, distance between user and device, ambient noise level, criteria weights, etc.), dynamic volume adjustment module 202 will always calculate the same volume level for outputting a response. However, in some scenarios, device user 108 may wish to train the behavior of module 202 in a dynamic and ongoing manner. For instance, some of the initial criteria rules or weights may not be ideal, and thus device user 108 may want to inform virtual assistant 112 that the output volume for a given response was too loud (or too soft) and for virtual assistant 112 to learn from that feedback.
- To address this, FIG. 4 depicts a modified version of workflow 300 (i.e., workflow 400) that implements a feedback/learning loop for dynamic volume adjustment module 202. The majority of the steps of workflow 400 are similar to those of workflow 300; however, once electronic device 102 has output the response at the calculated volume level to device user 108 (per block 420), user 108 can provide a verbal indication that the response was too loud or not loud enough (block 422). This indication can be provided to virtual assistant 112, which can reiterate the response at a higher or lower volume as needed, but also incorporate this feedback into its algorithm for calculating the volume level (block 424).
- For example, if device user 108 indicates at block 422 that the volume level of the response was too loud, dynamic volume adjustment module 202 can remember this along with the inputs/criteria values used to calculate the volume level. Then, the next time the same or similar inputs/criteria values are encountered, module 202 can slightly decrease the output volume from the calculated level. In this manner, module 202 can better match its automatic volume adjustment algorithm with the user's preferences.
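- A minimal sketch of this feedback loop (blocks 422-424). Bucketing the criteria values so that “similar” inputs share a key, and the fixed 0.05 adjustment step, are both assumptions; the disclosure does not specify how similarity or the adjustment size is determined:

```python
FEEDBACK_STEP = 0.05  # hypothetical per-complaint volume adjustment

class VolumeFeedback:
    """Remember too-loud/too-soft feedback keyed by the criteria values."""

    def __init__(self) -> None:
        self._offsets: dict[tuple, float] = {}

    @staticmethod
    def _key(values: dict[str, float]) -> tuple:
        # Coarsely round values so similar inputs map to the same bucket.
        return tuple(sorted((k, round(v, 1)) for k, v in values.items()))

    def record(self, values: dict[str, float], too_loud: bool) -> None:
        delta = -FEEDBACK_STEP if too_loud else FEEDBACK_STEP
        key = self._key(values)
        self._offsets[key] = self._offsets.get(key, 0.0) + delta

    def adjust(self, values: dict[str, float], volume: float) -> float:
        # Apply any learned offset the next time similar inputs recur.
        return max(0.0, min(volume + self._offsets.get(self._key(values), 0.0), 1.0))
```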
- FIG. 5 is a simplified block diagram of the architecture of an example computing device 500 according to an embodiment. This architecture may be used to implement electronic device 102 and/or cloud server 110 of FIGS. 1 and 2. As shown, computing device 500 includes one or more processors 502 that communicate with a number of peripheral devices via a bus subsystem 504. These peripheral devices include a storage subsystem 506 (comprising a memory subsystem 508 and a file storage subsystem 510), input devices 512, output devices 514, and a network interface subsystem 516.
- Bus subsystem 504 can provide a mechanism for letting the various components and subsystems of computing device 500 communicate with each other as intended. Although bus subsystem 504 is shown schematically as a single bus, alternative embodiments of the bus subsystem can utilize multiple buses.
- Network interface subsystem 516 can serve as an interface for communicating data between computing device 500 and other computing devices or networks. Embodiments of network interface subsystem 516 can include wired (e.g., coaxial, twisted pair, or fiber optic Ethernet) and/or wireless (e.g., Wi-Fi, cellular, Bluetooth, etc.) interfaces.
- Input devices 512 can include a camera, a touch-screen incorporated into a display, a keyboard, a pointing device (e.g., mouse, touchpad, etc.), an audio input device (such as microphone 104 of FIG. 1), and/or other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and mechanisms for inputting information into computing device 500.
- Output devices 514 can include a display subsystem (e.g., a flat-panel display), an audio output device (e.g., speaker 106 of FIG. 1), and/or the like. In general, use of the term “output device” is intended to include all possible types of devices and mechanisms for outputting information from computing device 500.
- Storage subsystem 506 includes a memory subsystem 508 and a file/disk storage subsystem 510. Subsystems 508 and 510 represent non-transitory computer-readable storage media that can store program code and/or data that provide the functionality of various embodiments described herein.
- Memory subsystem 508 can include a number of memories including a main random access memory (RAM) 518 for storage of instructions and data during program execution and a read-only memory (ROM) 520 in which fixed instructions are stored. File storage subsystem 510 can provide persistent (i.e., non-volatile) storage for program and data files and can include a magnetic or solid-state hard disk drive, an optical drive along with associated removable media (e.g., CD-ROM, DVD, Blu-Ray, etc.), a removable flash memory-based drive or card, and/or other types of storage media known in the art.
- It should be appreciated that computing device 500 is illustrative and not intended to limit embodiments of the present invention. Many other configurations having more or fewer components than computing device 500 are possible.
- The above description illustrates various embodiments of the present disclosure along with examples of how aspects of the present disclosure may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present disclosure as defined by the following claims. For example, although certain embodiments have been described with respect to particular process flows and steps, it should be apparent to those skilled in the art that the scope of the present invention is not strictly limited to the described flows and steps. Steps described as sequential may be executed in parallel, order of steps may be varied, and steps may be modified, combined, added, or omitted.
- Further, although certain embodiments have been described using a particular combination of hardware and software, it should be recognized that other combinations of hardware and software are possible, and that specific operations described as being implemented in software can also be implemented in hardware and vice versa.
- The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense. Other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the disclosure as set forth in the following claims.
Claims (19)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/045,560 US10705789B2 (en) | 2018-07-25 | 2018-07-25 | Dynamic volume adjustment for virtual assistants |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/045,560 US10705789B2 (en) | 2018-07-25 | 2018-07-25 | Dynamic volume adjustment for virtual assistants |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200034108A1 true US20200034108A1 (en) | 2020-01-30 |
US10705789B2 US10705789B2 (en) | 2020-07-07 |
Family
ID=69178096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/045,560 Active US10705789B2 (en) | 2018-07-25 | 2018-07-25 | Dynamic volume adjustment for virtual assistants |
Country Status (1)
Country | Link |
---|---|
US (1) | US10705789B2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10609475B2 (en) | 2014-12-05 | 2020-03-31 | Stages Llc | Active noise control and customized audio system |
US10945080B2 (en) | 2016-11-18 | 2021-03-09 | Stages Llc | Audio analysis and processing system |
- 2018-07-25: US application US16/045,560 filed; granted as US10705789B2 (status: Active)
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6577998B1 (en) * | 1998-09-01 | 2003-06-10 | Image Link Co., Ltd | Systems and methods for communicating through computer animated images |
US20030167167A1 (en) * | 2002-02-26 | 2003-09-04 | Li Gong | Intelligent personal assistants |
US20060122840A1 (en) * | 2004-12-07 | 2006-06-08 | David Anderson | Tailoring communication from interactive speech enabled and multimodal services |
US20060229873A1 (en) * | 2005-03-29 | 2006-10-12 | International Business Machines Corporation | Methods and apparatus for adapting output speech in accordance with context of communication |
US20090204410A1 (en) * | 2008-02-13 | 2009-08-13 | Sensory, Incorporated | Voice interface and search for electronic devices including bluetooth headsets and remote systems |
US20110096137A1 (en) * | 2009-10-27 | 2011-04-28 | Mary Baker | Audiovisual Feedback To Users Of Video Conferencing Applications |
US9318101B2 (en) * | 2009-12-15 | 2016-04-19 | At&T Mobility Ii Llc | Automatic sound level control |
US20140172953A1 (en) * | 2012-12-14 | 2014-06-19 | Rawles Llc | Response Endpoint Selection |
US20170329766A1 (en) * | 2014-12-09 | 2017-11-16 | Sony Corporation | Information processing apparatus, control method, and program |
US20160173049A1 (en) * | 2014-12-10 | 2016-06-16 | Ebay Inc. | Intelligent audio output devices |
US20170161319A1 (en) * | 2015-12-08 | 2017-06-08 | Rovi Guides, Inc. | Systems and methods for generating smart responses for natural language queries |
US9965247B2 (en) * | 2016-02-22 | 2018-05-08 | Sonos, Inc. | Voice controlled media playback system based on user profile |
US20180310100A1 (en) * | 2017-04-21 | 2018-10-25 | DISH Technologies L.L.C. | Dynamically adjust audio attributes based on individual speaking characteristics |
US20180341643A1 (en) * | 2017-05-26 | 2018-11-29 | Bose Corporation | Dynamic text-to-speech response from a smart speaker |
US20180349093A1 (en) * | 2017-06-02 | 2018-12-06 | Rovi Guides, Inc. | Systems and methods for generating a volume-based response for multiple voice-operated user devices |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11151990B2 (en) * | 2018-12-14 | 2021-10-19 | International Business Machines Corporation | Operating a voice response system |
US11580969B2 (en) * | 2019-03-27 | 2023-02-14 | Lg Electronics Inc. | Artificial intelligence device and method of operating artificial intelligence device |
US11233490B2 (en) * | 2019-11-21 | 2022-01-25 | Motorola Mobility Llc | Context based volume adaptation by voice assistant devices |
US12001754B2 (en) * | 2019-11-21 | 2024-06-04 | Motorola Mobility Llc | Context based media selection based on preferences setting for active consumer(s) |
US11682391B2 (en) * | 2020-03-30 | 2023-06-20 | Motorola Solutions, Inc. | Electronic communications device having a user interface including a single input interface for electronic digital assistant and voice control access |
US20210304745A1 (en) * | 2020-03-30 | 2021-09-30 | Motorola Solutions, Inc. | Electronic communications device having a user interface including a single input interface for electronic digital assistant and voice control access |
CN111899733A (en) * | 2020-07-02 | 2020-11-06 | 北京如影智能科技有限公司 | A method and device for determining volume |
US12164828B2 (en) | 2020-10-30 | 2024-12-10 | Samsung Electronics Co., Ltd. | Method and system for assigning unique voice for electronic device |
EP4024705A1 (en) * | 2021-01-04 | 2022-07-06 | Toshiba TEC Kabushiki Kaisha | Speech sound response device and speech sound response method |
EP4307692A4 (en) * | 2021-03-31 | 2024-07-24 | Huawei Technologies Co., Ltd. | Method and system for adjusting volume, and electronic device |
US20220383871A1 (en) * | 2021-05-28 | 2022-12-01 | Zebra Technologies Corporation | Virtual assistant for a communication session |
US11688398B2 (en) * | 2021-05-28 | 2023-06-27 | Zebra Technologies Corporation | Virtual assistant for a communication session |
CN113656125A (en) * | 2021-07-30 | 2021-11-16 | 阿波罗智联(北京)科技有限公司 | Virtual assistant generation method and device and electronic equipment |
CN115424623A (en) * | 2022-03-23 | 2022-12-02 | 北京罗克维尔斯科技有限公司 | Voice interaction method, device, equipment and computer-readable storage medium |
US20230333810A1 (en) * | 2022-04-15 | 2023-10-19 | Actu8 Llc | Electronic device having a virtual assistant for adjusting an output sound level of the electronic device based on a determined sound level of a reference sound input |
CN119360853A (en) * | 2024-12-26 | 2025-01-24 | 福建船政交通职业学院 | Voice interaction method and system based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
US10705789B2 (en) | 2020-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10705789B2 (en) | Dynamic volume adjustment for virtual assistants | |
US12080280B2 (en) | Systems and methods for determining whether to trigger a voice capable device based on speaking cadence | |
US12254038B2 (en) | Methods and systems for providing a secure automated assistant | |
US10438595B2 (en) | Speaker identification and unsupervised speaker adaptation techniques | |
US9959863B2 (en) | Keyword detection using speaker-independent keyword models for user-designated keywords | |
RU2699587C2 (en) | Updating models of classifiers of understanding language based on crowdsourcing | |
JP6306190B2 (en) | Method and apparatus for controlling access to an application | |
CN109313897B (en) | Communication using multiple virtual assistant services | |
US9916832B2 (en) | Using combined audio and vision-based cues for voice command-and-control | |
US12159622B2 (en) | Text independent speaker recognition | |
JP7230806B2 (en) | Information processing device and information processing method | |
JP7347217B2 (en) | Information processing device, information processing system, information processing method, and program | |
US20240386892A1 (en) | Interruption detection and handling by digital assistants | |
US20240212687A1 (en) | Supplemental content output | |
JP2024529888A (en) | Degree-based hotword detection | |
US20240312455A1 (en) | Transferring actions from a shared device to a personal device associated with an account of a user | |
US20240203413A1 (en) | Selecting an automated assistant as the primary automated assistant for a device based on determined affinity scores for candidate automated assistants | |
CN118339609A (en) | Warm word arbitration between automated assistant devices |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SENSORY, INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MOZER, TODD F.;REEL/FRAME:046462/0445 Effective date: 20180725 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |