EP3776169A1 - Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method - Google Patents
Info
- Publication number
- EP3776169A1 (application number EP18895921.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- voice
- user
- soundbar
- enhanced
- dsp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2420/00—Details of connection covered by H04R, not provided for in its groups
- H04R2420/01—Input selection or mixing for amplifiers or loudspeakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R27/00—Public address systems
Definitions
- the present invention relates to Voice Controlled media playback systems or Smart Speakers adapted to generate Voice Controlled Assistant Output Signals and receive and respond to a user’s spoken commands.
- Some VC speaker systems (e.g., the Amazon Echo™ 104, Google Home™ or Cortana™ VC speaker systems) will run third party voice-based software (“chatbots”) or assistant applications (e.g., Skills™ or Actions™) and can respond to a user’s spoken commands 122 with a voice-based application’s synthesized audible response or Voice Controlled Assistant Output Signal 120 generated as part of Voice Assistance (“VA”) operations, where the VC speaker (e.g., 104) senses or detects user-spoken trigger phrases (i.e., “wake” words or phrases) or commands and generates an audible VA reply or acknowledgement (e.g., 120) in response.
- VA Voice Assistance
- Amazon’s VA or voice software system is known as “Alexa”
- Google’s VA or voice software system is known as “Assistant”
- Apple’s VA or voice software system is known as “Siri”
- each of these VA systems is programmable to respond to a user’s “wake word” or response-triggering phrase, whereupon the VA takes over control of the VC speaker (e.g., 104) and responds to the user with an audible response or reply.
- VC loudspeaker systems necessarily reproduce several types of audio program material, including music, news, podcasts, etc., and issue Voice Assistance (VA) audible feedback or Voice Controlled Assistant Output Signals 120 to the user in response to detecting the user’s voice control commands and queries (e.g., 122, 210). Reproduced music may be significantly enhanced if the audio program signal is modified with Digital Signal Processing (“DSP”) parameters selected for optimal audio performance, but any sensed wake word or other user-spoken voice
- DSP Digital Signal Processing
- every VA response or Voice Controlled Assistant Output Signal 120 to a user’s spoken voice command is clearly heard and easily understood by the user (e.g., 106), but in current VC speaker systems, if the DSP parameters are optimized for music playback (for example), the VA response will be less intelligible and less understandable because the VA reply from Voice Controlled Assistant Output Signal 120 is played with those music-enhancing DSP settings.
- Surround-sound or home theater loudspeaker systems can also be integrated with Voice Controlled Assistant or VA features and are configured for use with standardized home theater audio systems having a plurality of playback channels, each typically served by an amplifier and a loudspeaker.
- In Dolby™ home theater audio playback systems there are typically five channels of substantially full range material plus a subwoofer channel configured to reproduce band-limited low frequency material.
- the five substantially full range channels in a Dolby Digital 5.1™ system are typically center, left front, right front, left surround and right surround.
- the center channel is typically positioned in a home theater system directly over or under the video display and that channel is used by content creators for most of the dialog, which has the desirable effect of making reproduced dialog sound as if it were emanating from the display.
- When typical surround sound loudspeaker systems are installed in listeners’ homes, setup problems are encountered, and many users struggle with speaker placement, component connections and related complications.
- many listeners have turned to “soundbar” style home theater loudspeaker systems (e.g., 350 as shown in Fig. 1D) which incorporate at least left, center and right channels into a single enclosure 352 configured for use near the user’s video display (e.g., as seen in Fig. 1G).
- Soundbar style single enclosure loudspeaker systems (e.g., 350, as shown in Figs 1D-1F) are simpler to install and connect, but usually require significant care in product design to provide satisfactory performance for listeners who listen from listening positions arrayed in a listening space, so that the listener can actually hear dialog and localize the center channel and the dialog as appearing to emanate from the display while appreciating a high fidelity, natural dynamic quality to that portion of the program material rendered in the center channel.
- In a VA controlled soundbar, for example, the VA response (e.g., 120) was less intelligible and less understandable because the VA reply from Voice Controlled Assistant Output Signal (e.g., 120) was not well suited for processing through those movie soundtrack center-channel enhancing (e.g., “Voice Adjust”) DSP settings.
- Figs 1A-1C illustrate typical Voice-Control (“VC”) speaker architectures as used in an exemplary Alexa™ brand Voice Controlled Assistant system, in
- Figs 1D-1G illustrate one of Applicant’s Soundbar/Subwoofer home theater loudspeaker systems configured with Polk Audio’s Voice Adjust™ Digital Signal Processing (“DSP”) system for enhancing intelligibility of dialog and center channel fidelity in a multi-element single enclosure soundbar loudspeaker system, in accordance with the Prior Art.
- Fig 2 is a diagram illustrating a Voice-Control Loudspeaker System with Dedicated DSP Settings for the Voice Controlled Assistant’s (e.g., Alexa’s) spoken responses (which differ from the DSP settings used when playing audio program material such as movie soundtracks, music or sportscasts) and a Mode Switching Method, in accordance with the present invention.
- Fig 3 is a diagram illustrating the mode switching method for the Voice-Control Loudspeaker System of Fig. 2, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
- Fig 4 is a process flow diagram illustrating the mode switching method for the Voice-Control Loudspeaker System 400 of Figs. 2 and 3, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
- Fig 5 is a table illustrating the DSP program mode switching method for the Voice-Control Loudspeaker System 400 of Figs. 2-4, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
- the DSP program mode switching method of the present invention as illustrated in Fig 5 is also embodied in the Soundbar/Subwoofer system 500 of Figs 6A-7.
- Fig 6A-6C illustrate an improved Voice Actuated or Voice Controlled
- Fig 7 illustrates an improved DSP signal flow and mode switching method for the preferred embodiment of the Voice-Controlled Soundbar/Subwoofer loudspeaker system 500 of Figs 6A and 6B with Dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal in accordance with the present invention.
- VC Voice-Controlled
- VC speakers must also generate audible VA responses to users’ commands and queries, and VA response audible feedback (e.g., Alexa’s answers) was discovered to benefit from a very different set of DSP settings.
- the VC speaker system architecture of the present invention 400 incorporates a DSP mode switching system and method (as illustrated in Figs 2-5) to provide a more consistently intelligible and more subjectively pleasant sounding Voice Assistance response 420 as perceived by user 106 (e.g., when used with Amazon™ Alexa™, Apple™ Siri™ or Google™ Voice Assist), regardless of the audio settings that the user may have selected on the host VC speaker product 404 (which may otherwise inhibit VA response intelligibility).
- VA sound quality and Voice Controlled Assistant Output Signal 420 are controlled with dedicated, preprogrammed DSP settings that override the particular DSP settings selected by the user on the basis of program material, listening conditions and personal taste.
- Figs 3-5 include diagrams and a table illustrating the DSP mode switching method for the Voice-Control Loudspeaker System 400 of Fig. 2, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to sensing a user’s wake word or command 422, in accordance with the present invention.
- the general principles of the method of the present invention illustrated in Figs 3-5 apply to (a) the single enclosure standalone embodiment 404 of Fig. 2 and (b) the Soundbar/Subwoofer system embodiment 500 of Figs 6A-7 (as described below).
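The mode switching described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class and method names, the preset fields and all gain values are hypothetical, and restoring the user's preset once the VA response ends is an assumption about the behavior rather than something these excerpts state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DspPreset:
    """A named bundle of DSP parameters (field names are hypothetical)."""
    name: str
    bass_gain_db: float
    treble_gain_db: float
    dialog_boost_db: float


# Hypothetical presets; the source does not publish actual values.
MUSIC = DspPreset("music", bass_gain_db=4.0, treble_gain_db=2.0, dialog_boost_db=0.0)
VA_RESPONSE = DspPreset("va_response", bass_gain_db=0.0, treble_gain_db=0.0, dialog_boost_db=3.0)


class ModeSwitcher:
    """Selects the dedicated VA preset while the assistant is responding."""

    def __init__(self, user_preset, va_preset=VA_RESPONSE):
        self.user_preset = user_preset
        self.va_preset = va_preset
        self.va_active = False

    @property
    def active_preset(self):
        # The dedicated VA preset overrides whatever the user selected.
        return self.va_preset if self.va_active else self.user_preset

    def on_wake_word(self):
        """Called when a wake word or command is sensed."""
        self.va_active = True

    def on_va_response_done(self):
        """Assumed behavior: return to the user's preset after the reply."""
        self.va_active = False

    def select_user_preset(self, preset):
        """User changes program-material settings; VA override is unaffected."""
        self.user_preset = preset
```

In this sketch a user who switches from the music preset to a movie preset while the assistant is speaking still hears the reply through `VA_RESPONSE`; the new user preset only takes effect once `on_va_response_done()` is called.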
- VC speaker system architecture e.g., sold as Amazon Echo™ or Alexa™ brand “voice controlled assistants”
- Fig. 1A illustrates a first exemplary (typical) prior art system architecture 100 set in an exemplary VC speaker use environment 102, which includes a typical VC speaker system 104 and at least a first user 106.
- the user 106 is typically near VC speaker system 104.
- VC speaker system 104 is physically positioned on a table 108 within the environment 102.
- the VC speaker system 104 is shown sitting upright and supported on its base end.
- the VC speaker system 104 is shown communicatively coupled to remote entities 110 over a network 112 and remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106.
- the remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, such as server farms, stacks, and the like that are commonly used in data centers.
- the cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with cloud services include “on-demand computing”, “Software as a Service (SaaS)”, “platform computing”, “network accessible platform”, and so forth.
- Cloud services 116 may host any number of applications that can process the user input received from the VC speaker system 104, and produce a suitable response.
- Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth.
- user 106 is shown communicating with the remote entities 110 via VC speaker system 104.
- the VC speaker system 104 Voice Assist outputs an audible question, "What do you want to do?" as represented by dialog bubble 120. This output may represent a question from a far end talker 114, or from a cloud service 116 (e.g., an entertainment service).
- the user 106 is shown replying to the question by stating, "I'd like to buy tickets to a movie" as represented by the dialog bubble 122.
- the VC speaker system 104 (or voice controlled assistant 104) is equipped with an array 124 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
- the microphones 126(1)-(M) are generally arranged at a first or top end of the VC speaker system 104 opposite the base end seated on the table 108. Although multiple microphones are illustrated, in some implementations, the VC speaker system 104 may be embodied with only one microphone.
- the VC speaker system 104 may further include a speaker array 128 of speakers 130(1), . . . , 130(P) to output sounds in humanly perceptible frequency ranges.
- the speakers 130(1)-(P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the VC speaker system 104 may output high frequency signals, mid frequency signals, and low frequency signals.
- the speakers 130(1)-(P) are generally arranged at a second or base end of the VC speaker system 104 and oriented to emit the sound in a downward direction toward the base end and opposite to the microphone array 124 in the top end.
- the voice controlled assistant or VC speaker system 104 may further include computing components 132 that process the voice input received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128.
- the computing components 132 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used.
- the VC speaker system 104 may be configured to produce stereo or non-stereo output.
- the speakers 130(1)-(P) may receive a mono signal for output in a non-stereo configuration.
- the computing components 132 may generate and output to the speakers 130(1)-(P) two different channel signals for stereo output.
- a first channel signal (e.g., left channel signal) is provided to one of the speakers, such as the larger speaker 130(1).
- a second channel signal (e.g., right channel signal) is provided to the other of the speakers, such as the smaller speaker 130(P). Due to the vertically stacked arrangement of the speakers, however, the two-channel stereo output may not be appreciated by the user 106.
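The stereo and non-stereo output options above can be illustrated with a short routing sketch. The function name and dictionary keys are hypothetical, and averaging left and right to form the mono signal is an assumed downmix; the source only says that the speakers may receive a mono signal.

```python
def route_channels(left, right, stereo=True):
    """Map two channel signals onto speakers 130(1) and 130(P).

    In stereo mode the left channel goes to the first (larger) speaker and
    the right channel to the other (smaller) speaker. In non-stereo mode both
    speakers receive the same mono signal, formed here by averaging
    (an assumption, not taken from the source).
    """
    if stereo:
        return {"speaker_1": list(left), "speaker_P": list(right)}
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    return {"speaker_1": mono, "speaker_P": list(mono)}
```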
- Fig 1B shows another implementation of voice interactive computing architecture 200 similar to the architecture 100 of Fig. 1A, but in this illustration a voice controlled assistant or VC speaker system 204 has a different physical packaging layout that allows a spaced arrangement of the speakers to better provide stereo output, rather than the vertically stacked arrangement found in the assistant 104 of Fig 1A. More particularly, the speakers 130(1)-(P) are shown at a horizontally spaced distance from one another. Optionally, VC speaker system 204 is able to play full spectrum stereo using only two speakers of different sizes. In Fig 1B, VC speaker system 204 is communicatively coupled over the network 112 to an entertainment service 206 that is part of the cloud services 116.
- the entertainment service 206 is hosted on one or more servers, such as servers 208(1), . . . , 208(K), which may be arranged in any number of configurations, such as server farms, stacks, and the like that are commonly used in data centers.
- the entertainment service 206 may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the voice controlled assistant.
- the voice controlled assistant 204 can play the audio in stereo with full spectrum sound quality, even though the device has a small form factor and only two speakers.
- the user 106 is shown directing the VC speaker system 204 to pause the music being played through the audible statement, "Pause the music" in dialog bubble 210.
- the VC speaker system 204 is not only designed to play music in full spectrum stereo, but is also configured with an acoustic echo cancellation (AEC) module to cancel audio components being received at the microphone array 124 so that the VC speaker system 204 can clearly hear the statements and commands spoken by the user 106.
- AEC acoustic echo cancellation
- Fig. 1C shows selected functional components of the voice controlled assistants or VC speaker systems 104 and 204 in more detail.
- each of the VC speaker systems 104 and 204 may be implemented as a standalone device that is relatively simple in terms of functional capabilities with limited input/output components, memory, and processing capabilities.
- the VC speaker systems 104 and 204 may not have a keyboard, keypad, or other form of
- assistants 104 and 204 may be implemented with the ability to receive and output audio, a network interface
- each VC speaker system 104/204 includes the microphone array 124, a speaker array 128, a processor 302, and memory 304.
- the microphone array 124 may be used to capture speech input from the user 106, or other sounds in the environment 102.
- the speaker array 128 may be used to output speech from a far end talker, audible responses provided by the cloud services, forms of entertainment (e.g., music, audible books, etc.), or any other form of sound.
- the speaker array 128 may output a wide range of audio frequencies including both human perceptible frequencies and non-human perceptible
- the speaker array 128 is formed of two speakers capable of outputting full spectrum stereo sound, as will be described below in more detail. Two speaker array arrangements are shown, including the vertically stacked arrangement 128A and the horizontally spaced arrangement 128B.
- the memory 304 may include computer-readable storage media (“CRSM”), which may be any available physical media accessible by the processor 302 to execute instructions stored on the memory.
- CRSM may include random access memory (“RAM”) and Flash memory.
- RAM random access memory
- CRSM may include, but is not limited to, read-only memory (“ROM”), electrically erasable programmable read-only memory (“EEPROM”), or any other medium which can be used to store the desired information and which can be accessed by the processor 302.
- An operating system module 306 is configured to manage hardware and services (e.g., wireless unit, USB, Codec) within and coupled to the assistant 104/204 for the benefit of other modules.
- a speech recognition module 308 provides some level of speech recognition functionality.
- this functionality may be limited to specific commands that perform fundamental tasks like waking up the device, configuring the device, and the like.
- the amount of speech recognition capabilities implemented on the VC speaker system 104/204 is an implementation detail, but the architecture described herein can support having some speech recognition at the local VC speaker system
- An acoustic echo cancellation module 310 and a double talk reduction module 312 are provided to process the audio signals to substantially cancel acoustic echoes and substantially reduce double talk that may occur. These modules may work together to identify times where echoes are present, where double talk is likely, where background noise is present, and attempt to reduce these external factors to isolate and focus on the “near talker” (i.e., user 106). By isolating the near talker, better signal quality is provided to the speech recognition module 308 to enable more accurate interpretation of the speech utterances.
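The source does not say which algorithm the acoustic echo cancellation module 310 uses. As an illustration only, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach that is swapped in here as an assumption. It subtracts an adaptive estimate of the loudspeaker (far-end) signal from the microphone signal. All names, the tap count and the step size are hypothetical, and double talk handling is not modeled.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the microphone signal minus an adaptive echo estimate."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_est = sum(w * h for w, h in zip(weights, history))
        err = d - echo_est   # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the input, scaled by the input energy.
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual


def demo_residual_ratio(n=2000, seed=0):
    """Ratio of residual to microphone energy over the last 200 samples,
    for a synthetic echo that is the far-end signal delayed by two samples
    and scaled by 0.6 (no near talker, no noise)."""
    rng = random.Random(seed)
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [0.6 * far[i - 2] if i >= 2 else 0.0 for i in range(n)]
    res = nlms_echo_cancel(far, mic)
    tail_in = sum(v * v for v in mic[-200:])
    tail_out = sum(v * v for v in res[-200:])
    return tail_out / tail_in
```

With a near talker present, the residual would carry the user's speech with the loudspeaker echo suppressed, which is the signal a speech recognition stage would want to receive.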
- a query formation module 314 may also be provided to receive the parsed speech content output by the speech recognition module 308 and to form a search query or some form of request.
- This query formation module 314 may utilize natural language processing (NLP) tools as well as various language modules to enable accurate construction of queries based on the user's speech input.
- NLP natural language processing
- the modules shown stored in the memory 304 are merely representative.
- Other modules 316 for processing the user voice input, interpreting that input, and/or performing functions based on that input may be provided.
- the voice controlled assistant 104/204 might further include a codec 318 coupled to the microphones of the microphone array 124 and the speakers of the speaker array 128 to encode and/or decode the audio signals.
- the codec 318 may convert audio data between analog and digital formats.
- a user may interact with the assistant 104/204 by speaking to it, and the microphone array 124 receives the user speech.
- the codec 318 encodes the user speech and transfers that audio data to other components.
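The excerpts do not specify the digital format the codec 318 produces. Assuming plain 16-bit linear PCM purely for illustration, the digital side of that conversion can be sketched as follows; both function names are hypothetical.

```python
def encode_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to signed 16-bit integers.

    Out-of-range samples are clipped before quantization.
    """
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))
        codes.append(int(round(s * 32767)))
    return codes


def decode_pcm16(codes):
    """Convert signed 16-bit integer codes back to floats in [-1.0, 1.0]."""
    return [c / 32767 for c in codes]
```

A round trip through `encode_pcm16` and `decode_pcm16` changes each in-range sample by at most half a quantization step.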
- the assistant 104/204 can
- the VC speaker system or voice controlled assistant 104/204 includes a wireless unit 320 coupled to an antenna 322 to facilitate a wireless connection to a network.
- the wireless unit 320 may implement one or more of various wireless technologies, such as Wi-Fi, Bluetooth, RF, and so on.
- a USB port 324 may further be provided as part of the assistant 104/204 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks. In addition to the USB port 324, or as an alternative thereto, other forms of wired connections may be employed, such as a broadband connection.
- a power unit 326 is further provided to distribute power to the various components on the assistant 104/204.
- a stereo component 328 is optionally provided to output stereo signals to the various speakers in the speaker array 128.
- the VC speaker system or voice controlled assistant 104/204 is designed to support audio interactions with the user, in the form of receiving voice commands (e.g., words, phrases, sentences, etc.) from the user and outputting audible feedback to the user. In one implementation, the voice controlled assistant 104/204 may include non-input control mechanisms, such as basic volume control button(s) for increasing/decreasing volume, as well as power and reset buttons. There may also be a simple light element (e.g., LED) to indicate a state such as, for example, when power is on. But, otherwise, the assistant 104/204 does not use or need to use any input devices or displays.
- the VC speaker system 104/204 may be implemented as an aesthetically appealing device with a power cord and optionally a wired interface (e.g., broadband, USB, etc.).
- the cylindrical-shaped (e.g., Echo™) assistant 104 has an elongated cylindrical housing with apertures or slots formed in a base end to allow emission of sound waves.
- a cube-shaped assistant 204 may also be implemented as an aesthetically appealing device with smooth surfaces, and covered apertures for passage of sound waves. The cube or box shape enables the two speakers to be spaced apart to provide a stereo sound experience for the user. Once plugged in, each device 104/204 may self-configure automatically, or with slight aid from the user, and be ready to use. As a result, the VC speaker system or assistant 104/204 may be generally produced at a low cost.
- the audio performance of VC speaker system 404 is enhanced by condition responsive use or implementation of dedicated DSP settings optimized for reproduction of various kinds of program material (e.g., movies, music, sports and news) and may be further affected by user settings intended to improve or alter audio playback such as dialogue/voice enhancement (or a “late night” listening mode).
- a first exemplary enhanced system architecture 400 is set in a VC speaker use environment 102 where the enhanced VC speaker system 404 is shown with a first user 106.
- the user 106 is typically near or proximal to enhanced VC speaker system 404.
- enhanced VC speaker system 404 is physically positioned on a table 108 within the environment 102 and is shown sitting upright and supported on its base end.
- the enhanced VC speaker system 404 is shown communicatively coupled to remote entities 110 over a network 112 and remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106.
- the remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, as described above. Cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services.
- the cloud services 116 may host any number of applications that can process the user input received from the enhanced VC speaker system 404, and produce a suitable response. Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth, and user 106 may communicate with the remote entities 110 via enhanced VC speaker system 404.
- enhanced VC speaker system 404 incorporates Voice Assist which outputs a VA output signal 420 comprising audible questions (e.g., "What do you want to do?" as represented by dialog bubble 420), and this VA output may represent a question from a far end talker 114 or from cloud service 116 (e.g., an entertainment service).
- the user 106 is shown replying to the question by stating, "I'd like to Hear another song" as represented by the dialog bubble 422.
- the enhanced VC speaker system 404 is equipped with array 124 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
- the microphones 126(1)-(M) may be arranged at a first or top end of enhanced VC speaker system 404 opposite the base end seated on the table 108. Although multiple microphones are illustrated, in some implementations, the enhanced VC speaker system 404 may be embodied with only one microphone.
- the enhanced VC speaker system 404 may further include a speaker array 128 of speakers 130(1), . . . , 130(P) to output sounds in humanly perceptible frequency ranges.
- the speakers 130(1)-(P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the enhanced VC speaker system 404 may output high frequency signals, mid
- the speakers 130(1)-(P) are generally arranged within enhanced VC speaker system 404 and oriented to emit the sound in a selected direction.
- the enhanced VC speaker system 404 may further include computing components 432 (best seen in Figs 2 and 3 and further including a processor 302, memory 304, wireless unit 320 and related components as illustrated in Fig. 1C) and these computing components process the user’s voice input (e.g., 422) as received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128.
- the computing components 432 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used.
- the enhanced VC speaker system 404 may be configured to produce stereo or non-stereo (e.g., mono or multi-channel home theater) output.
- enhanced VC speaker system 404 has a physical packaging layout that allows a spaced arrangement of the speakers to better provide stereo or other multi-channel output with speakers 130(1)-(P) spaced from one another and is able to play full spectrum stereo using speakers of different sizes.
- enhanced VC speaker system 404 is optionally communicatively coupled over the network 112 to an entertainment service 206 that is part of the cloud services 116.
- the entertainment service 206 is hosted on one or more servers (e.g., such as servers 208(1), . . . , 208(K) as shown in Fig. 1B), which may be arranged in any number of configurations, as described above.
- the entertainment service 206 may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the enhanced VC speaker system 404.
- the enhanced VC speaker system 404 can play the audio in stereo or in a multi-channel home theater (e.g., soundbar) mode with full spectrum sound quality.
- the user 106 is shown directing the enhanced VC speaker system 404 to pause the music being played and select another recording for playback through the audible
- enhanced VC speaker system 404 is not only designed to play music in full spectrum stereo, but is also configured to clearly hear the statements and commands spoken by the user 106.
- enhanced VC speaker system 404 has pre-programmed DSP modes (see, e.g., Fig. 5) including audio content listening modes with selected effects which are user selected on the basis of the nature of the program material and personal taste.
- a VC speaker configured as a soundbar will be programmed with appropriate DSP settings for optimal audio reproduction of TV/Movie program material.
- “Movie mode” may entail augmented surround channel effects, boosted bass and other audio enhancements.
- a further example pertains to low level listening to movie or music program material when the associated DSP modes apply loudness compensation for improved perceived spectral balance.
- voice control feedback e.g., formerly VA output signal 120
- the loudness compensation settings which prove to be acceptable for music and movie program material may inhibit VA output signal speech intelligibility and render VA speech unnaturally bass-heavy, with a chesty, muffled quality.
- the system 400 and method of the present invention establishes a plurality of distinct DSP modes with dedicated DSP settings for voice-control VA feedback (typically programmed into enhanced VC speaker system 404) so as to optimize VA intelligibility and VA voice (or audio) quality regardless of the DSP mode/effects the user may have imposed on the basis of program material or personal taste, thus providing a dedicated DSP response for use in generating enhanced VA dialog output signal 420.
- the range of control for some of the settings, master volume in particular, associated with a voice-controlled audio system may exceed that which is optimal for voice feedback.
- for master volume in particular, there are settings below and above which voice feedback should not be played for some voice-controlled audio products.
- voice feedback signal (formerly 120) will be difficult to hear for user 106 when the master volume is set too low even if certain program material may be enjoyed at that level under some listening conditions.
- voice feedback 420 should be played at a comfortable, intelligible sound level - decidedly not that at which the program material was playing before the user’s voice command was issued.
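The idea of playing voice feedback inside a constrained subset of the master volume range can be sketched as a simple clamp. This is only an illustration: the 0-100 volume scale and the bounds of 30 and 70 are assumptions chosen for the example, not values given in the disclosure.

```python
# Hypothetical bounds on an assumed 0-100 master volume scale.
VA_MIN_VOLUME = 30
VA_MAX_VOLUME = 70


def va_playback_volume(master_volume: int) -> int:
    """Return the level at which VA feedback is played.

    The user's master volume is kept when it already lies inside the
    constrained range; otherwise it is pulled to the nearest bound.
    """
    return max(VA_MIN_VOLUME, min(VA_MAX_VOLUME, master_volume))


print(va_playback_volume(5))    # very quiet program level
print(va_playback_volume(50))   # already inside the constrained range
print(va_playback_volume(95))   # loud movie level
```

A level below the lower bound is raised so the response can be heard, and a level above the upper bound is lowered so the response stays comfortable.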
- the DSP or program mode switching method of the present invention comprises the enhanced VC speaker system 404 including at least one microphone transducer (e.g., like 128 but preferably an array like 124) configured and aimed to sense and receive a first user signal spoken by user 106 (e.g., a wake word, trigger sound, user query or command (e.g., 422)).
- Enhanced VC speaker 404 also includes a controller or processor 432 configured to implement a first (“audio program enhancing”) playback DSP mode and a second (“VA response enhancing”) playback DSP mode which differs from the first DSP mode to process the first user signal 422 and generate an enhanced-intelligibility first Voice Assist (“VA”) response output signal 420 in response to the first user signal 422.
- the enhanced VC speaker system 404 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 422), and then, if no further user commands are sensed, switching enhanced VC speaker system 404 back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings.
- Enhanced VC speaker system 404 is also programmed to attenuate or mute the listener’s selected music or movie program material in response to sensing that user command before changing from said first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility), and is also programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
- the enhanced VC speaker system 404 is also programmed to disable or mute any home theater (e.g., Dolby 5.1) “surround left” or “surround right” signals when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
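The mode switching sequence described above (sense a command, attenuate the program, switch to the VA settings, then restore the first settings when no further command is sensed) can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the class name, method names and the contents of the two settings dictionaries are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    PROGRAM = "audio program enhancing"
    VA = "VA response enhancing"


@dataclass
class ModeSwitcher:
    # Hypothetical user-selected program settings (keys are assumptions).
    program_settings: dict = field(
        default_factory=lambda: {"preset": "movie", "bass_boost_db": 6, "surround": True}
    )
    # Hypothetical dedicated VA settings.
    va_settings: dict = field(
        default_factory=lambda: {"preset": "voice", "bass_boost_db": 0, "surround": False}
    )
    mode: Mode = Mode.PROGRAM
    program_muted: bool = False
    log: list = field(default_factory=list)

    @property
    def active_settings(self) -> dict:
        return self.va_settings if self.mode is Mode.VA else self.program_settings

    def on_user_command(self) -> None:
        """Wake word or command sensed: attenuate the program, then switch."""
        if not self.program_muted:
            self.program_muted = True
            self.log.append("attenuate program")
        if self.mode is not Mode.VA:
            self.mode = Mode.VA
            self.log.append("switch to VA settings")

    def on_no_further_command(self) -> None:
        """No follow-up command sensed: restore the first DSP settings."""
        if self.mode is Mode.VA:
            self.mode = Mode.PROGRAM
            self.program_muted = False
            self.log.append("restore program settings")


switcher = ModeSwitcher()
switcher.on_user_command()
print(switcher.mode, switcher.active_settings)
switcher.on_no_further_command()
print(switcher.mode, switcher.log)
```

Keeping the user's program settings untouched while the VA settings are active is what makes the restore step a simple switch back rather than a reconstruction.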
- Another, non-limiting example pertains to the configuration and physical layout of a voice controlled multi-loudspeaker system such as a multi-driver, single enclosure soundbar 552 as used in a soundbar/subwoofer home theater system (e.g., 500, as illustrated in Figs. 6A and 6B). While this configuration may be optimal for reproduction of movie soundtracks - owing to their discrete, multichannel nature and wide bandwidth - employing all of the available loudspeaker drivers (e.g., all five mid-bass drivers 354, 356, 358, 360 and 362) for reproduction of a voice assistant response 520 will only deleteriously affect sound quality and intelligibility relative to use of only the soundbar in such a setup.
- attempted derivation of a VA response 520 for the surround channels may result in only objectionable noise, since the VA response signal is monophonic and surround derivation depends on the difference between the Front Left and Front Right channels (e.g., as described in commonly owned US patent 7231053, the entire disclosure of which is incorporated herein by reference).
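Why a monophonic VA response leaves nothing useful for a difference-based surround channel can be shown with a few samples. The sketch below assumes the simplest possible derivation, a plain sample-by-sample Left minus Right difference; the actual derivation in the referenced patent is not reproduced here, and the sample values are made up for illustration.

```python
def derive_surround(front_left, front_right):
    """Assumed difference-based derivation: one output sample per L/R pair."""
    return [l - r for l, r in zip(front_left, front_right)]


# A stereo program carries different content in each front channel.
stereo_left = [0.5, -0.2, 0.8, 0.1]
stereo_right = [0.1, 0.3, -0.4, 0.1]

# A monophonic VA response is fed identically to both front channels.
va_mono = [0.6, -0.6, 0.3, 0.0]

print(derive_surround(stereo_left, stereo_right))  # non-zero surround content
print(derive_surround(va_mono, va_mono))           # all zeros: no VA content remains
```

With identical front channels the difference cancels completely, so in a real system only residual noise would be left to amplify in the surround channels.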
- the system may feature an “all-channel stereo” mode in which the surround channels emulate the front channel signals, resulting in poor intelligibility due to multiple sources playing identical program material if the surround loudspeakers were to reproduce the VA signal.
- VA signal (or input signal from a VA signal engine, as best seen in Fig. 3) may be reproduced by the soundbar 552 alone and the DSP parameters associated with VA playback are preferably established as a separate dedicated DSP mode for reproduction solely via amplifiers and drivers in the soundbar 552.
- enhanced VC soundbar speaker system 552 is configured with at least three pre-programmed user-selectable sound reproduction (e.g., DSP) modes including a Movie Mode for which enhanced VC soundbar subwoofer speaker system 500 is acoustically optimized for both movie and TV content and provides a bass boost, increased spatialization, and enhanced center channel Voice Adjust levels for improved dialogue clarity.
- Movie Mode is typically the default sound mode for HDMI and Optical input sources.
- a second user-selectable sound reproduction mode is the Music Mode which gives the user and listener a spectrally balanced sound and smoother bass while minimizing spatialization effects to ensure more natural sound reproduction of musical performance program material.
- a third user-selectable sound reproduction mode is a Sport Mode which gives the user enhanced vocal intelligibility for dialogue-rich content, like sporting events, news casts and talk shows.
- this mode e.g., Polk’s own “Voice Adjust”™ sound reproduction system and method
- the DSP adjustments boost dialogue clarity and optimize the subwoofer volume levels.
- enhanced VC soundbar subwoofer speaker system 500 is also configured to provide a user-selectable sound reproduction mode known as “Night Mode” which reduces bass and volume dynamics while improving voice intelligibility for low-volume listening.
- any of these user-selectable sound reproduction (or DSP) modes are changed when user 106 utters a wake word or speaks a voice command 522, in accordance with the present invention.
- referring to FIG. 6A, another exemplary enhanced system architecture 500 is set in a VC soundbar speaker system use environment 102 wherein the enhanced VC soundbar speaker system 552 is shown with a first user 106.
- the user 106 is typically near or proximal to enhanced VC soundbar speaker system 552.
- enhanced VC soundbar speaker system 552 is physically positioned on a table 108 within the environment 102 and is shown sitting upright and
- the enhanced VC soundbar speaker system 552 is shown communicatively coupled to remote entities 110 over a network 112 and remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106.
- the remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, as described above. Cloud services 116 generally refer to a network-accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet.
- Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services.
- the cloud services 116 may host any number of applications that can process the user input received from the enhanced VC soundbar speaker system 552, and produce a suitable response.
- Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth, and user 106 may communicate with the remote entities 110 via enhanced VC soundbar speaker system 552.
- Enhanced VC soundbar speaker system 552 incorporates Voice Assist which outputs a VA output signal 520 comprising audible responses or questions (e.g., "What do you want to do?" as represented by dialog bubble 520), and this VA output may represent a question from a far end talker 114, or from cloud service 116 (e.g., an entertainment service).
- the user 106 is shown replying to the question by stating, "I'd like to Change Sound Modes" as represented by the dialog bubble 522.
- the enhanced VC soundbar speaker system 552 is equipped with array 524 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
- the microphone array 524 is preferably arranged in a circular array in the center of a first or top end of enhanced VC soundbar speaker system 552 opposite the base, as best seen in Fig. 6B.
- the enhanced VC soundbar speaker system 552 further includes a soundbar enclosure speaker array 528 of loudspeaker drivers (similar to the array of drivers illustrated in Fig. 1F) to output sounds in humanly perceptible frequency ranges.
- the soundbar enclosure speaker array 528 includes tweeter and mid-bass drivers configured to emit sounds at various frequency ranges, so that each speaker has a different range.
- enhanced VC soundbar speaker system 552 may output high frequency signals, mid frequency signals, and low frequency signals which are supplemented or augmented by low frequency signals from subwoofer 554.
- the speakers in soundbar enclosure speaker array 528 are generally supported within and aimed by an enclosure front baffle to emit the sound in a selected direction toward a listening position (e.g., in a manner similar to that shown in Figs. 1F and 1G).
- the enhanced VC soundbar speaker system 552 may further include computing components 532 (similar to 432, for the embodiment of Figs. 2-5 and including a processor 302, memory 304, wireless unit 320 and related components as illustrated in Fig. 1C) that process the user’s voice input (e.g., 522) as received by the microphone array 524, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 528.
- the computing components 532 are generally responsive to signals from the microphone array 524 and include DSP circuits used to drive amplifiers connected to the speaker array 528.
- the enhanced VC soundbar speaker system 552 may be configured to produce stereo or non-stereo (e.g., mono or multi-channel home theater) output.
- enhanced VC soundbar speaker system 552 is optionally communicatively coupled over the network 112 to an entertainment service that is part of the cloud services 116.
- the entertainment service is hosted on one or more servers which may be arranged in any number of configurations, as described above.
- the entertainment service may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the enhanced VC soundbar speaker system 552.
- the enhanced VC soundbar speaker system 552 can play the audio in stereo or in a multi-channel home theater (e.g., soundbar) mode with full spectrum sound quality.
- the user 106 is shown directing the enhanced VC soundbar speaker system 552 to select another audio playback mode (e.g., thereby controlling whether the DSP engine is optimized with “Voice Adjust” processing) through the audible statement, "Change the Sound Mode" in dialog bubble 522.
- the enhanced VC soundbar speaker system 552 is not only designed to play music in full spectrum stereo, but is also configured to clearly hear the statements and commands spoken by the user 106.
- the audio content listening modes and effects are user selected on the basis of the nature of the program material and personal taste.
- a VC speaker configured as an enhanced VC soundbar speaker system 552 will be programmed with appropriate DSP settings for optimal audio reproduction of TV/Movie program material.
- “Movie mode” may entail augmented surround channel effects, boosted bass and other audio enhancements.
- These DSP settings generally do not lend themselves to voice control feedback (e.g., 520) whose optimal DSP settings will both enhance speech intelligibility and clarity, while providing the VA assistant’s desired sound quality at the possible expense of bandwidth and soundstage (breadth of acoustic image).
- a further example pertains to low level listening to movie or music program material when the associated DSP modes apply loudness compensation for improved perceived spectral balance.
- the loudness compensation settings which prove to be acceptable for music and movie program material may inhibit VA output signal speech intelligibility and render VA speech unnaturally bass-heavy, with a chesty, muffled quality.
- the system 500 and method of the present invention establishes dedicated DSP settings for voice-control VA feedback (typically programmed into enhanced VC soundbar speaker system 552) so as to optimize VA intelligibility and VA voice (or audio) quality regardless of the DSP mode/effects the user may have imposed on the basis of program material or personal taste, thus providing a dedicated DSP response for use in generating enhanced VA dialog output signal 520.
- the range of control for some of the settings, master volume in particular, associated with a voice-controlled audio system may exceed that which is optimal for voice feedback.
- for master volume in particular, there are settings below and above which voice feedback should not be played for some voice-controlled audio products.
- the VA voice feedback signal (formerly 120) will be difficult to hear for user 106 because the master volume is set too low for VA response intelligibility even if certain program material may be enjoyed at that level under some listening conditions.
- the voice feedback 520 should be played at a comfortable, intelligible sound level - decidedly not that at which the program material was playing before the user’s voice command was issued.
- when user 106 speaks a voice command or wake word (e.g., 522), that command or wake word is detected and that detection triggers (within enhanced VC soundbar speaker system 552) a response which first attenuates or mutes the audio program material or system program audio signal then being processed and amplified for playback through the speakers (e.g., within soundbar speaker array 528).
- the system switches the DSP settings (e.g., amplitude, frequency response, subwoofer level, magnitude shaping, etc.) from the program material enhancing DSP settings or mode to a dedicated VA response enhancing mode, as described and shown in Figs. 6A-6C and Fig. 7.
- enhanced VC soundbar speaker system 552 plays the VA response audio signal for user 106, and then enhanced VC soundbar speaker system 552 is programmed to monitor the microphones to detect and sense any subsequent response (e.g., command 522) from user 106.
- user 106 may use commands 522 to ask the VA (e.g.,“Alexa”) to perform many useful tasks.
- user 106 simply says, for example, “Alexa” and then asks the VA (Alexa) for sports updates, weather reports or answers to cooking questions, or asks the VA (Alexa) to control the soundbar (e.g., to switch to a different selected input audio signal source, change sound modes, adjust the bass or choose a user-customized sound mode (or DSP setting) such as Polk Audio’s Voice Adjust® center channel level).
- the table of Fig. 5 also illustrates preferred method steps, responses, and dedicated DSP selection modes for VA signals when reproduced or played aloud using enhanced VC soundbar speaker system 552.
- the crossover between the soundbar enclosure drivers (e.g., generally within enclosure 552) and the subwoofer (e.g., generally within enclosure 554) is usually about 120Hz, but in the present invention, for the dedicated mode pre-programmed for more intelligible Voice Assistant responses (e.g., 520) that crossover between soundbar and subwoofer is shifted downwardly to about 80Hz, specifically so that substantially all of the audio for Voice Assistant responses (e.g., 520) emanates from the soundbar 552, meaning that soundbar 552 is used over a wider passband (nearly full range) than it is employed for non-VA signals.
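The effect of shifting the crossover from about 120Hz to about 80Hz can be illustrated by asking which enclosure reproduces a given frequency in each mode. This is a simplified sketch: it assumes an idealized brick-wall split at the crossover frequency (real crossover filters roll off gradually), and the function names are hypothetical.

```python
PROGRAM_CROSSOVER_HZ = 120.0  # "about 120Hz" for program material
VA_CROSSOVER_HZ = 80.0        # "about 80Hz" in the dedicated VA mode


def crossover_hz(va_mode: bool) -> float:
    """Select the soundbar/subwoofer crossover frequency for the active mode."""
    return VA_CROSSOVER_HZ if va_mode else PROGRAM_CROSSOVER_HZ


def reproducing_unit(frequency_hz: float, va_mode: bool) -> str:
    """Idealized split: at or above the crossover goes to the soundbar."""
    return "soundbar" if frequency_hz >= crossover_hz(va_mode) else "subwoofer"


for f in (60.0, 100.0, 250.0):
    print(f, reproducing_unit(f, va_mode=False), reproducing_unit(f, va_mode=True))
```

A 100Hz component, near the low end of a deep speaking voice, moves from the subwoofer to the soundbar when the VA mode is active, which is the widening of the soundbar passband the text describes.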
- the microphone array may be housed in an enclosure separate and distinct from the enhanced VC soundbar speaker system 552 and linked with controller/signal processing block 532 wirelessly.
- enhanced VC soundbar speaker system 552 may be configured to function in the same manner as a Google Mini™ or Amazon Echo™ type device which is configured and programmed to be paired with a loudspeaker to make a voice controlled speaker system with better Voice Assist sound quality and
- VA audio may include “special effects” that are best reproduced via spatialization techniques.
- the surround channels are enabled (for example) for optimal VA reproduction.
- the system and method of the present invention may be readily adapted for use in a VA controllable Soundbar or enhanced VC soundbar speaker system 552 or Surround-sound or home theater loudspeaker system with Voice Controlled Assistant or VA features configured for use with a plurality of playback channels, each typically served by an amplifier and a loudspeaker.
- a Dolby™ compatible home theater audio playback system including DSP as illustrated in Fig. 6C
- the five substantially full range channels in a Dolby Digital 5.1™ system are typically center (“C”), left front (“FL”), right front (“FR”), left surround (“SL”) and right surround (“SR”).
- the center channel (“C”) is positioned in a home theater system directly over or under the video display (as shown in Fig. 1G) and in the system and method of the present invention the VA output signal playback 520 is preferably via the soundbar enclosure speakers.
- enhanced VC soundbar speaker system 552 includes a VA speaker subsystem including, preferably, a VC speaker microphone array 524 and related components near the center of enclosure in an upward facing surface (as best seen in Fig. 6B).
- the enhanced VC soundbar speaker system 552 is otherwise very similar to soundbar 352 of loudspeaker system 350 (as shown in Fig. 1D), which also incorporates at least left, center and right channels into a single enclosure 352 configured for use near the user’s video display (e.g., as seen in Fig. 1G), but includes the enhanced DSP for generating a more intelligible Voice Controlled Assistant Output Signal 520, as best seen in Fig. 7.
- soundbar style single enclosure loudspeaker systems can provide unsatisfactory performance for listeners in some of the listening positions arrayed in a listening space. There is often only one position directly in front of the center of the soundbar which provides acceptable center-channel performance, meaning that the listener can actually hear dialog, localize the center channel and the dialog as appearing to emanate from the display and appreciate a high fidelity, natural dynamic quality to that portion of the program material rendered in the center channel.
- the audibility of Voice Controlled Assistant Output Signal e.g., 120
- a typical soundbar speaker system e.g. 350
- the typical soundbar loudspeaker (by definition, multi-element, single-enclosure, e.g., 352) can thus do a poor job of reproducing the Voice Controlled Assistant Output Signal 120, whether discrete within a multichannel mix (such as Dolby Digital 5.1) or derived from a 2-channel mixdown via any appropriate means (such as SRS or Dolby ProLogic algorithms). Listeners therefore experience poor intelligibility of dialog and a lack of overall clarity, with a chesty or muted unnatural timbre, and this poor performance is experienced at most of the listener seating and viewing locations in typical (domestic) home theater environments.
- the Voice Assist output signal 120 is separated from the other audio signal inputs and processed separately from the Soundbar System’s program material (e.g., music or movie soundtrack) signals, as illustrated in the table of Fig. 5 and the signal flow diagram of Fig. 7.
- enhanced VC soundbar speaker system 552 is modified with a VA speaker subsystem including, preferably, a VC speaker microphone array (e.g., 524) and related components preferably incorporated near the center of enclosure in an upward facing surface (as shown in Fig. 6B).
- Soundbar subwoofer systems are designed such that each major loudspeaker subsystem - the soundbar system 552 and subwoofer 554 - reproduces the passband for which it has been designed.
- the Voice Assistant’s primary signal (e.g., 520) content - its voice itself - is relatively band-limited to the range above approximately 80Hz, where the human speaking voice resides.
- VA content (i.e., Voice Controlled Assistant Output Signal 520) may be characterized by its relatively low dynamic range.
- the nature of VA output signal 520 (that it is relatively both band-limited and restricted in dynamic range) facilitates designation of the soundbar for expanded usage in terms of VA reproduction by the methods described in the present invention, thereby addressing the shortfalls of VA reproduction by conventional soundbar-subwoofer systems.
- by moving the soundbar-subwoofer crossover frequency down to 80Hz or even lower, the soundbar 552 would reproduce substantially all of the VA content (notwithstanding any low frequency effects mixed to that channel) and thereby avoid the potential problems described above (see Fig. 7).
- in the soundbar-subwoofer DSP system of Fig. 6C, all of the channels are subjected to a bass level control.
- the present invention processes the VA’s signal and effectively bypasses the bass summation and adjustment control, thereby ensuring that the VA signal (now Voice Controlled Assistant Output Signal 520) is clear, natural sounding and highly intelligible regardless of the bass adjustment setting.
- a Compression/Expansion (often referred to as a “compander”) feature may be placed in the VA response or voice signal path.
- the compander boosts signals above a set threshold (e.g. -45dB) by as much as a prescribed maximum gain level (e.g. +12dB). Additionally, the compander will typically permit setting an expansion ratio (e.g. 3:1).
- Fig. 7 includes a compander within the VA response’s signal path. It should be noted that the compander is placed upstream of the subwoofer and soundbar within the VA’s signal path so that the compander operates on both the subwoofer and soundbar signal components.
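The compander parameters mentioned above (threshold, maximum gain, expansion ratio) can be related through a static gain curve. The text does not specify the curve, so the sketch below shows only one assumed interpretation: no gain at or below the threshold, and above it an upward expansion in which each 1dB of input above the threshold yields `ratio` dB of output, with the added gain capped at the maximum. The function name and the curve itself are assumptions.

```python
def compander_gain_db(level_db, threshold_db=-45.0, max_gain_db=12.0, ratio=3.0):
    """Assumed static gain curve for the VA signal path.

    At or below the threshold the signal is passed unchanged. Above it,
    the added gain grows by (ratio - 1) dB per dB over the threshold and
    is limited to max_gain_db.
    """
    if level_db <= threshold_db:
        return 0.0
    return min(max_gain_db, (ratio - 1.0) * (level_db - threshold_db))


for level in (-50.0, -42.0, -20.0):
    print(level, compander_gain_db(level))
```

Under these example values a signal 3dB above the threshold receives 6dB of boost, and anything 6dB or more above the threshold receives the full +12dB.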
- Bandpass frequency (such as 120Hz).
- Bass Level Bandpass frequency
- if the subwoofer reproduces any substantial portion of the VA’s bandwidth, unintended consequences such as poor spectral balance and inappropriate spatial cues may result, depending upon system settings (Bass Level) and subwoofer placement relative to the soundbar.
- the dedicated system 500 for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal 520 in a Voice-Controlled Soundbar/Subwoofer Loudspeaker Product provides a useful combination of elements including an enhanced VC Voice-Controlled Soundbar/Subwoofer Loudspeaker including a subwoofer 554 and a soundbar 552 including at least one microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including a wake word, trigger sound or user query or command (e.g., such as 122, 210); and a DSP system programmed into a controller or processor 532 configured to implement a first (“soundbar subwoofer audio program”) playback DSP mode and a second (“soundbar optimized VA output signal”) playback DSP mode which differs from said first playback DSP mode to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist
- enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 522), and then, if no further user commands are sensed, switching the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings.
- the first audio-playback DSP settings which may include user-selectable settings for movie effects or increased low frequencies, etc.
- the second VA dedicated DSP settings for enhanced VA audio quality and intelligibility
- the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to attenuate or mute program material in response to sensing said user command before changing from said first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
- the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility), and the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is preferably also programmed to generate substantially no subwoofer signal upon changing to the second VA dedicated DSP settings and to filter the VA response signal in a manner which permits all of the VA response signal 520 to be played through the soundbar 552 when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
- the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is also programmed to (a) disable or mute any “surround left” or “surround right” signals and (b) mute or bypass any specialization processing such as SRS, 3D or D2 Widesound processing when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
- the method of the present invention may be considered to have as many as four dedicated “modes” of audio program or DSP control pre-programmed into the system of the present invention (e.g., 500), namely, a first (“audio program enhancing”) playback DSP mode, a second (“VA response enhancing”) playback DSP mode, a third (movie, music, sports, Voice Adjust audio program enhancing) playback DSP mode and a fourth (“Soundbar optimized home theater VA response enhancing”) playback DSP mode.
- the method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 comprises:
- microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including a wake word, trigger sound or user query or command (e.g., 122, 210); and a DSP system, controller or processor (e.g., as illustrated in Figs. 5 and 7) configured to implement a third (“soundbar subwoofer audio program”) playback DSP mode and a fourth (“soundbar optimized VA output signal”) playback mode which differs from said first or third DSP modes to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist (“VA”) output signal 520 reproduced primarily through the soundbar 552 in response to the first user signal 522;
- a first user signal (e.g., 522)
- a user query or command (e.g., 122, 210)
- a DSP system, controller or processor (e.g., as illustrated in Figs. 5 and 7) configured to implement a third (“soundbar subwoofer audio
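The mode-switching behavior described in the items above (sense a wake word or command, attenuate or mute the program material, change to the VA-dedicated settings, then restore the saved playback settings once no further commands are sensed) can be sketched as a small state machine. This is a minimal illustration in Python, not the implementation disclosed in the application: the class and field names, the 0 to 100 volume scale, the 30 to 60 volume window and the 8-second inactivity timeout are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DspSettings:
    """Hypothetical snapshot of the settings the text says are changed."""
    master_volume: int         # assumed 0..100 scale
    subwoofer_enabled: bool    # False means substantially no subwoofer signal
    surround_enabled: bool     # "surround left" / "surround right" outputs
    spatializer_enabled: bool  # any spatial/widening processing
    program_muted: bool        # program material attenuated or muted


# Assumed "constrained subset" of the master volume range for VA responses.
VA_VOLUME_MIN = 30
VA_VOLUME_MAX = 60


def va_settings_from(program: DspSettings) -> DspSettings:
    """Derive the second (VA response enhancing) settings from the first."""
    clamped = min(max(program.master_volume, VA_VOLUME_MIN), VA_VOLUME_MAX)
    return DspSettings(
        master_volume=clamped,
        subwoofer_enabled=False,    # VA response plays through the soundbar
        surround_enabled=False,     # surrounds disabled or muted
        spatializer_enabled=False,  # spatial processing muted or bypassed
        program_muted=True,
    )


class ModeSwitcher:
    """Switches between the program mode and the VA mode, then restores."""

    def __init__(self, program_settings: DspSettings, timeout_s: float = 8.0):
        self.active = program_settings
        self.mode = "program"
        self.steps = []  # ordered record of what was done, for inspection
        self._saved = program_settings
        self._timeout_s = timeout_s  # assumed inactivity window
        self._last_command_at = 0.0

    def on_user_command(self, now: float) -> None:
        """Call when a wake word or user command is sensed."""
        if self.mode == "program":
            self._saved = self.active
            # Mute the program material before changing DSP settings.
            self.active = replace(self.active, program_muted=True)
            self.steps.append("mute_program")
            self.active = va_settings_from(self._saved)
            self.steps.append("apply_va_settings")
            self.mode = "va"
        self._last_command_at = now

    def tick(self, now: float) -> None:
        """Call periodically; restores playback settings after inactivity."""
        if self.mode == "va" and now - self._last_command_at >= self._timeout_s:
            self.active = self._saved
            self.steps.append("restore_program_settings")
            self.mode = "program"
```

In use, a caller would construct `ModeSwitcher` with the current playback settings, call `on_user_command` from the wake-word detector, and call `tick` from a timer. Saving the full settings snapshot before switching is what lets the restore step return user-selected options (movie effects, bass level) unchanged.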
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Acoustics & Sound (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Otolaryngology (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762611832P | 2017-12-29 | 2017-12-29 | |
PCT/US2018/068074 WO2019133942A1 (en) | 2017-12-29 | 2018-12-29 | Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3776169A1 (en) | 2021-02-17 |
EP3776169A4 (en) | 2022-01-26 |
Family
ID=67064150
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18895921.7A Pending EP3776169A4 (en) | 2017-12-29 | 2018-12-29 | Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3776169A4 (en) |
WO (1) | WO2019133942A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114302248B (en) * | 2021-04-30 | 2024-04-12 | 海信视像科技股份有限公司 | Display equipment and multi-window voice broadcasting method |
CN113362845B (en) * | 2021-05-28 | 2022-12-23 | 阿波罗智联(北京)科技有限公司 | Method, apparatus, device, storage medium and program product for noise reduction of sound data |
DE202022102864U1 (en) | 2022-05-24 | 2023-09-05 | Vierton-Audio AG | Loudspeaker unit with hearing aid function |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7558393B2 (en) * | 2003-03-18 | 2009-07-07 | Miller Iii Robert E | System and method for compatible 2D/3D (full sphere with height) surround sound reproduction |
US6937737B2 (en) | 2003-10-27 | 2005-08-30 | Britannia Investment Corporation | Multi-channel audio surround sound from front located loudspeakers |
US7817812B2 (en) | 2005-05-31 | 2010-10-19 | Polk Audio, Inc. | Compact audio reproduction system with large perceived acoustic size and image |
JP5039214B2 (en) * | 2011-02-17 | 2012-10-03 | 株式会社東芝 | Voice recognition operation device and voice recognition operation method |
US9060224B1 (en) | 2012-06-01 | 2015-06-16 | Rawles Llc | Voice controlled assistant with coaxial speaker and microphone arrangement |
US8971543B1 (en) * | 2012-06-25 | 2015-03-03 | Rawles Llc | Voice controlled assistant with stereo sound from two speakers |
US9516440B2 (en) * | 2012-10-01 | 2016-12-06 | Sonos | Providing a multi-channel and a multi-zone audio environment |
KR102516577B1 (en) * | 2013-02-07 | 2023-04-03 | 애플 인크. | Voice trigger for a digital assistant |
WO2014168618A1 (en) * | 2013-04-11 | 2014-10-16 | Nuance Communications, Inc. | System for automatic speech recognition and audio entertainment |
US9277044B2 (en) | 2013-05-09 | 2016-03-01 | Steven P. Kahn | Transportable wireless loudspeaker and system and method for managing multi-user wireless media playback over a media playback system |
US9374640B2 (en) | 2013-12-06 | 2016-06-21 | Bradley M. Starobin | Method and system for optimizing center channel performance in a single enclosure multi-element loudspeaker line array |
US9692742B1 (en) * | 2014-12-23 | 2017-06-27 | Amazon Technologies, Inc. | Third party audio announcements |
US9706320B2 (en) | 2015-05-29 | 2017-07-11 | Sound United, LLC | System and method for providing user location-based multi-zone media |
US9807484B2 (en) | 2015-10-02 | 2017-10-31 | Sound United, LLC | Loudspeaker system |
2018
- 2018-12-29 EP EP18895921.7A patent/EP3776169A4/en active Pending
- 2018-12-29 WO PCT/US2018/068074 patent/WO2019133942A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2019133942A1 (en) | 2019-07-04 |
EP3776169A4 (en) | 2022-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2019202553B2 (en) | Handsfree beam pattern configuration | |
US10440492B2 (en) | Calibration of virtual height speakers using programmable portable devices | |
US9374640B2 (en) | Method and system for optimizing center channel performance in a single enclosure multi-element loudspeaker line array | |
KR20200015662A (en) | Spatially ducking audio produced through a beamforming loudspeaker array | |
JP6009547B2 (en) | Audio system and method for audio system | |
JP2011512768A (en) | Audio apparatus and operation method thereof | |
WO2009101778A1 (en) | Audio reproduction device and audio and video reproduction system | |
JP2021513263A (en) | How to do dynamic sound equalization | |
EP3776169A1 (en) | Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method | |
US9813039B2 (en) | Multiband ducker | |
JP4036140B2 (en) | Sound output system | |
US20240323608A1 (en) | Dynamics processing across devices with differing playback capabilities | |
JP4418479B2 (en) | Sound playback device | |
WO2019136460A1 (en) | Synchronized voice-control module, loudspeaker system and method for incorporating vc functionality into a separate loudspeaker system | |
JP5194614B2 (en) | Sound field generator | |
RU2804680C2 (en) | Playback at lower level | |
EP3776174A1 (en) | System and method for generating an improved voice assist algorithm signal input | |
CN113728661B (en) | Audio system and method for reproducing multi-channel audio and storage medium | |
KR100703923B1 (en) | Stereo sound optimization device and method for multimedia equipment | |
JP2007295634A (en) | Sound output system | |
Barbar | Electronic architecture-recent developments in the design, implementation and performance of time variant acoustic enhancement systems |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAJ | Public notification under rule 129 epc | Free format text: ORIGINAL CODE: 0009425 |
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
111Z | Information provided on other rights and legal means of execution | Free format text: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR Effective date: 20201208 |
17P | Request for examination filed | Effective date: 20201207 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the european patent | Extension state: BA ME |
111Z | Information provided on other rights and legal means of execution | Free format text: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR Effective date: 20201208 |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: COX, BRIAN, E.; Inventor name: STAROBIN, BRADLEY, M. |
DAV | Request for validation of the european patent (deleted) | |
DAX | Request for extension of the european patent (deleted) | |
RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: POLK AUDIO, LLC |
R11X | Information provided on other rights and legal means of execution (corrected) | Free format text: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR Effective date: 20201208 |
D11X | Information provided on other rights and legal means of execution (deleted) | |
A4 | Supplementary search report drawn up and despatched | Effective date: 20220107 |
RIC1 | Information provided on ipc code assigned before grant | Ipc: G10L 21/0208 20130101ALI20211223BHEP; Ipc: G10L 15/22 20060101ALI20211223BHEP; Ipc: H04R 27/00 20060101ALI20211223BHEP; Ipc: H04R 5/04 20060101ALI20211223BHEP; Ipc: H04R 3/00 20060101ALI20211223BHEP; Ipc: G10L 25/78 20130101ALI20211223BHEP; Ipc: G10L 21/16 20130101ALI20211223BHEP; Ipc: G10L 15/20 20060101ALI20211223BHEP; Ipc: G06F 3/16 20060101AFI20211223BHEP |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
17Q | First examination report despatched | Effective date: 20240111 |