CN113079401A - Display device and echo cancellation method - Google Patents
- Publication number
- CN113079401A (application number CN202110335146.1A)
- Authority
- CN
- China
- Prior art keywords
- signal
- reference signal
- display device
- audio
- channel reference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/64—Constructional details of receivers, e.g. cabinets or dust covers
- H04N5/642—Disposition of sound reproducers
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Embodiments of the present application provide a display device and an echo cancellation method. The display device comprises a display configured to display a user interface, and a controller configured to: acquire a microphone signal collected by an audio input device and left-channel and right-channel reference signals of an audio output device, and acquire the audio play stream of the media asset being played by the display device; perform output simulation on the audio play stream to obtain a simulated output signal of the audio play stream; perform frequency-band separation on the simulated output signal; and, if a bass channel reference signal is separated from the simulated output signal, perform echo cancellation on the microphone signal according to the bass channel, left-channel and right-channel reference signals, perform semantic recognition on the echo-cancelled signal, and control the display to generate a voice interaction user interface according to the recognition result. The method solves the problem of poor echo cancellation during voice interaction and improves the voice interaction experience.
Description
Technical Field
The present application relates to the field of display device technologies, and in particular, to a display device and an echo cancellation method.
Background
With the rapid development of speech recognition technology, voice interaction scenarios have become increasingly common: smart speakers, smart televisions, smart vehicles, smart homes and intelligent robots are all application scenarios for voice interaction. At the same time, as the user-experience requirements for voice interaction grow, the human-machine distance during voice interaction is no longer limited to the near field, and far-field voice technology has developed rapidly. Far-field voice technology aims to maintain a good speech recognition effect as the human-machine interaction distance increases, without the user deliberately raising their voice. The key indicators of far-field voice technology include the wake-up rate, the interruption wake-up rate (waking the device while it is playing), the wake-up response speed, the recognition rate, the recognition response speed and the service accuracy rate. Among these, the interruption wake-up rate and the wake-up response speed are most directly related to the user experience of the far-field voice function. Taking a smart television as an example, if, in the playing state, the television responds slowly or insensitively to the user's wake-up instruction, the user experience is seriously affected.
To improve the speech recognition effect, a smart television can perform echo cancellation on the received audio, so that the sound emitted by the television itself is cancelled from that audio and does not interfere with the wake-up instruction. In the related art, the smart television includes a left channel, a right channel and a bass channel; however, echo cancellation usually targets only the left and right channels. By presetting left-channel and right-channel extraction circuits, the sounds of the left and right channels can be picked up and then cancelled from the audio received by the television, thereby implementing echo cancellation. In a near-field speech scenario, cancelling only the left-channel and right-channel echoes can basically meet the speech recognition requirement; in a far-field speech scenario, however, the sound received by the smart television is noisy, and this echo cancellation method gradually fails to meet the requirement.
Disclosure of Invention
To solve the above technical problem, the present application provides a display device and an echo cancellation method.
In a first aspect, the present application provides a display device comprising:
a display configured to display a user interface;
a controller connected with the display, the controller configured to:
acquire a microphone signal collected by an audio input device and a left-channel reference signal and a right-channel reference signal of an audio output device, and acquire an audio play stream of a media asset being played by the display device;
perform output simulation on the audio play stream to obtain a simulated output signal of the audio play stream;
perform frequency-band separation on the simulated output signal; and
if a bass channel reference signal is separated from the simulated output signal, perform echo cancellation on the microphone signal according to the bass channel reference signal, the left-channel reference signal and the right-channel reference signal, perform semantic recognition on the echo-cancelled signal, and control the display to generate a voice interaction user interface according to the recognition result.
In some embodiments, performing output simulation on the audio play stream to obtain a simulated output signal of the audio play stream includes:
performing playback sound-effect simulation, volume control and dynamic range control on the audio play stream to obtain the simulated output signal, wherein the output simulation comprises the playback sound-effect simulation, the volume control and the dynamic range control.
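The output simulation above can be illustrated numerically. The following is a minimal numpy sketch of the volume-control and dynamic-range-control stages only (the playback sound-effect simulation is device-specific and omitted); the function name, threshold and compression ratio are illustrative assumptions, not values taken from the application.

```python
import numpy as np

def simulate_output(play_stream, volume=0.8, threshold=0.5, ratio=4.0):
    """Approximate the signal the speakers will emit from the decoded
    play stream: apply volume control, then a simple hard-knee dynamic
    range compressor that attenuates sample magnitudes above
    `threshold` by `ratio` (all parameters hypothetical)."""
    x = np.asarray(play_stream, dtype=float) * volume   # volume control
    mag = np.abs(x)
    # dynamic range control: compress only the part of the magnitude
    # that exceeds the threshold
    compressed = threshold + (mag - threshold) / ratio
    return np.where(mag > threshold, np.sign(x) * compressed, x)
```

The resulting signal approximates what the loudspeakers will actually reproduce and is what the subsequent frequency-band separation operates on.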
In some embodiments, performing frequency-band separation on the simulated output signal includes:
filtering the simulated output signal through a low-pass filter and determining the filtered signal as the bass channel reference signal.
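As a concrete illustration of this low-pass filtering step, the sketch below extracts a bass reference from a simulated output signal using a windowed-sinc FIR low-pass filter in plain numpy. The 200 Hz cutoff, 101-tap length and function names are assumptions for illustration; the application does not specify the filter design.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff_hz, fs):
    """Windowed-sinc low-pass FIR prototype (Hamming window),
    normalized for unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs * n) * np.hamming(num_taps)
    return h / h.sum()

def extract_bass_reference(sim_output, fs=48000.0, cutoff_hz=200.0, num_taps=101):
    """Keep only the low band of the simulated output signal; the
    low band serves as the bass channel reference signal."""
    h = lowpass_fir(num_taps, cutoff_hz, fs)
    return np.convolve(np.asarray(sim_output, dtype=float), h, mode="same")
```

In a real device the cutoff would be chosen to match the crossover frequency feeding the subwoofer.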
In some embodiments, the controller is further configured to:
if no bass channel reference signal is separated from the simulated output signal, perform echo cancellation on the microphone signal according to the left-channel reference signal and the right-channel reference signal, perform semantic recognition on the echo-cancelled signal, and control the display to generate a voice interaction user interface according to the recognition result.
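The application does not name a specific echo cancellation algorithm; the sketch below uses a standard multi-channel normalized LMS (NLMS) adaptive filter, one filter per reference channel (left, right, and bass when present), as one plausible realization. All names and parameters are illustrative assumptions.

```python
import numpy as np

def nlms_echo_cancel(mic, refs, taps=8, mu=0.5, eps=1e-8):
    """Multi-channel NLMS echo canceller: one adaptive FIR filter per
    reference channel. Each filter models the echo path from one
    speaker to the microphone; the summed echo estimate is subtracted
    from the microphone signal and the residual is the output."""
    mic = np.asarray(mic, dtype=float)
    w = np.zeros((len(refs), taps))                    # per-channel weights
    hist = [np.concatenate([np.zeros(taps - 1), np.asarray(r, float)])
            for r in refs]                             # zero-padded histories
    out = np.zeros_like(mic)
    for i in range(len(mic)):
        xs = [h[i:i + taps][::-1] for h in hist]       # newest sample first
        echo = sum(wc @ x for wc, x in zip(w, xs))     # total echo estimate
        e = mic[i] - echo                              # residual (echo-free)
        out[i] = e
        for c, x in enumerate(xs):                     # NLMS weight update
            w[c] += mu * e * x / (x @ x + eps)
    return out
```

Passing two reference channels (left, right) or three (left, right, bass) corresponds to the two branches described above: the bass reference is simply included as an additional channel when it has been separated.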
In a second aspect, an embodiment of the present application provides an echo cancellation method, used in the display device according to the first aspect, the method comprising:
acquiring a microphone signal collected by an audio input device and a left-channel reference signal and a right-channel reference signal of an audio output device, and acquiring an audio play stream of a media asset being played by the display device;
performing output simulation on the audio play stream to obtain a simulated output signal of the audio play stream;
performing frequency-band separation on the simulated output signal; and
if a bass channel reference signal is separated from the simulated output signal, performing echo cancellation on the microphone signal according to the bass channel reference signal, the left-channel reference signal and the right-channel reference signal.
In some embodiments, the method further comprises:
if no bass channel reference signal is separated from the simulated output signal, performing echo cancellation on the microphone signal according to the left-channel reference signal and the right-channel reference signal, performing semantic recognition on the echo-cancelled signal, and controlling the display to generate a voice interaction user interface according to the recognition result.
The display device and the echo cancellation method provided by the present application have the following beneficial effects:
According to the embodiments of the present application, the audio play stream of the media asset being played by the display device is acquired and output-simulated to obtain a simulated output signal, and a bass channel reference signal is separated from the simulated output signal, so that echo cancellation can be performed on the microphone signal according to the bass channel, left-channel and right-channel reference signals. This removes the interference of the bass signal with the voice signal input by the user, helps improve speech recognition accuracy, and thereby improves the user experience of voice interaction.
Drawings
To explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of an operational scenario between a display device and a control apparatus according to some embodiments;
Fig. 2 is a block diagram of the hardware configuration of the control apparatus 100 according to some embodiments;
Fig. 3 is a block diagram of the hardware configuration of the display device 200 according to some embodiments;
Fig. 4 is a schematic diagram of the software configuration in the display device 200 according to some embodiments;
Fig. 5 is a schematic diagram of a speech recognition network according to some embodiments;
Fig. 6 is a schematic distribution diagram of a microphone array and a loudspeaker array according to some embodiments;
Fig. 7 is an audio transmission schematic according to some embodiments;
Fig. 8 is a schematic diagram of echo cancellation according to some embodiments;
Fig. 9 is another schematic diagram of echo cancellation according to some embodiments;
Fig. 10 is a schematic diagram of a method of obtaining a bass channel reference signal according to some embodiments;
Fig. 11 is a filtering diagram of a filter according to some embodiments.
Detailed Description
To make the purpose and embodiments of the present application clearer, the exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some, not all, of the embodiments of the present application.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display apparatus 200 through the smart device 300 or the control device 100.
In some embodiments, the control apparatus 100 may be a remote controller, which communicates with the display device through infrared protocol communication, Bluetooth protocol communication or other short-distance communication methods and controls the display device 200 in a wireless or wired manner. The user may input user instructions through keys on the remote controller, voice input, control panel input, etc., to control the display device 200.
In some embodiments, the smart device 300 (e.g., mobile terminal, tablet, computer, laptop, etc.) may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device.
In some embodiments, the display device 200 may also be controlled in ways other than through the control apparatus 100 and the smart device 300. For example, a module configured inside the display device 200 may directly receive the user's voice command, or the voice command may be received by a voice control device provided outside the display device 200.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN) or other networks. The server 400 may provide various contents and interactions to the display device 200. The server 400 may be one cluster or a plurality of clusters, and may include one or more types of servers.
Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200.
Fig. 3 shows a hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.
In some embodiments the controller comprises a processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, a first interface to an nth interface for input/output.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives image signals output from the controller and displays video content, image content, a menu manipulation interface and a user manipulation UI.
In some embodiments, the display 260 may be a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver. The display apparatus 200 may establish transmission and reception of control signals and data signals with the external control apparatus 100 or the server 400 through the communicator 220.
In some embodiments, the user interface may be configured to receive control signals from the control apparatus 100 (e.g., an infrared remote control, etc.).
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the tuner-demodulator 210 receives broadcast television signals via wired or wireless reception and demodulates audio/video signals and EPG data signals from a plurality of wireless or wired broadcast television signals.
In some embodiments, the controller 250 and the tuner-demodulator 210 may be located in separate devices; that is, the tuner-demodulator 210 may be located in a device external to the main device containing the controller 250, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink, an icon, or other actionable control. The operations related to the selected object are: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon.
In some embodiments the controller comprises at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), a RAM Random Access Memory (RAM), a ROM (Read-Only Memory), a first to nth interface for input/output, a communication Bus (Bus), and the like.
The CPU processor is used for executing the operating system and application program instructions stored in the memory, and for executing various application programs, data and contents according to the various interactive instructions received from external input, so as to finally display and play various audio and video contents. The CPU processor may include a plurality of processors, e.g., a main processor and one or more sub-processors.
In some embodiments, a graphics processor for generating various graphics objects, such as: icons, operation menus, user input instruction display graphics, and the like. The graphic processor comprises an arithmetic unit, which performs operation by receiving various interactive instructions input by a user and displays various objects according to display attributes; the system also comprises a renderer for rendering various objects obtained based on the arithmetic unit, wherein the rendered objects are used for being displayed on a display.
In some embodiments, the video processor is configured to receive an external video signal and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion and image synthesis according to the standard codec protocol of the input signal, so as to obtain a signal that can be displayed or played directly on the display device 200.
In some embodiments, the video processor includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio/video data stream. The video decoding module processes the demultiplexed video signal, including decoding, scaling and the like. The image synthesis module, through a graphics generator, superimposes and mixes the GUI signal input or generated by the user with the scaled video image, to generate an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the frame-rate-converted video output signal into a signal conforming to the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processing to obtain an audio signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
In some embodiments, a system of a display device may include a Kernel (Kernel), a command parser (shell), a file system, and an application program. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, a scheduler, signals and interprocess communication (IPC) are operated and maintained. And after the kernel is started, loading the Shell and the user application program. The application program is compiled into machine code after being started, and a process is formed.
As shown in fig. 4, the system of the display device is divided into three layers, i.e., an application layer, a middleware layer and a hardware layer from top to bottom.
The application layer mainly includes common applications on the television and an application framework (Application Framework). The common applications are mainly applications developed based on a browser, such as HTML5 apps, and native apps (Native APPs).
An application framework (Application Framework) is a complete program model that has all the basic functions required by standard application software, such as file access and data exchange, and interfaces for using these functions (toolbars, status lists, menus, dialog boxes).
Native apps (Native APPs) may support online or offline operation, message push, or local resource access.
The middleware layer comprises various television protocols, multimedia protocols, system components and other middleware. The middleware can use basic service (function) provided by system software to connect each part of an application system or different applications on a network, and can achieve the purposes of resource sharing and function sharing.
The hardware layer mainly comprises the HAL interface, hardware and drivers. The HAL interface is a unified interface connecting all the television chips, and the specific logic is implemented by each chip. The drivers mainly comprise: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, pressure sensor, etc.) and power driver.
For clarity of explanation of the embodiments of the present application, a speech recognition network architecture provided by the embodiments of the present application is described below with reference to fig. 5.
Referring to fig. 5, fig. 5 is a schematic diagram of a speech recognition network according to an embodiment of the present application. In fig. 5, the smart device is configured to receive input information and output a processing result of that information. The speech recognition service device is an electronic device with the speech recognition service deployed, the semantic service device is an electronic device with the semantic service deployed, and the business service device is an electronic device with the business service deployed. An electronic device here may include a server, a computer, and the like. The speech recognition service, the semantic service (also referred to as a semantic engine) and the business service are web services that can be deployed on such electronic devices: the speech recognition service recognizes audio as text, the semantic service performs semantic parsing on text, and the business service provides specific services such as the weather query service of Moji Weather or the music query service of QQ Music. In one embodiment, the architecture shown in fig. 5 may contain multiple entity service devices deployed with different business services, and one or more function services may also be aggregated in one or more entity service devices.
In some embodiments, the following describes the process of handling information input to the smart device based on the architecture shown in fig. 5. Taking a query statement input by voice as an example, the process may include the following three stages:
[ Speech recognition ]
After receiving the query statement input by voice, the smart device may upload the audio of the query statement to the speech recognition service device, so that the speech recognition service device recognizes the audio as text through the speech recognition service and returns the text to the smart device. In one embodiment, before uploading the audio of the query statement to the speech recognition service device, the smart device may perform denoising processing on the audio, where the denoising processing may include removing echo and environmental noise.
[ semantic understanding ]
The smart device uploads the text of the query statement recognized by the speech recognition service to the semantic service device, and the semantic service device performs semantic analysis on the text through the semantic service to obtain the business field, the intention, and the like of the text.
[ semantic response ]
The semantic service device issues a query instruction to the corresponding business service device according to the semantic analysis result of the text of the query statement, so as to obtain the query result given by the business service. The smart device can obtain the query result from the semantic service device and output it. As an embodiment, the semantic service device may further send the semantic parsing result of the query statement to the smart device, so that the smart device outputs a feedback statement contained in the semantic parsing result.
It should be noted that the architecture shown in fig. 5 is only an example and is not intended to limit the scope of the present application. In the embodiments of the present application, other architectures may also be adopted to implement similar functions; for example, all or part of the three stages may be completed by the smart terminal itself, which is not described herein.
In some embodiments, the intelligent device shown in fig. 5 may be a display device, such as a smart television, the functions of the speech recognition service device may be implemented by cooperation of a sound collector and a controller provided on the display device, and the functions of the semantic service device and the business service device may be implemented by the controller of the display device or by a server of the display device.
In some embodiments, a query statement or other interactive statement that a user enters a display device through speech may be referred to as a voice instruction.
In some embodiments, the display device obtains, from the semantic service device, the query result given by the business service; the display device may parse the query result to generate response data for the voice instruction, and then control the display device to execute a corresponding action according to the response data.
In some embodiments, the display device obtains the semantic parsing result of the voice instruction from the semantic service device; the display device may parse this result to generate response data, and then control the display device to execute a corresponding action according to the response data.
The hardware or software architecture in some embodiments may be based on the description in the above embodiments, or on other similar hardware or software architectures, as long as the technical solution of the present application can be implemented.
In some embodiments, the display device may be provided with a voice assistant application to implement the above intelligent voice services, such as searching media assets and adjusting the volume. The user can wake up the voice assistant application by sending a voice signal, such as a preset wake-up word, to the display device; once awakened, the voice assistant application allows the user to interact with it to control the display device by voice.
In some embodiments, when a user inputs a voice signal, such as a wake-up word, to the display device while the display device is not playing audio or video, the audio signal received by a microphone on the display device includes the voice signal of the user and the noise signal of the environment where the display device is located.
In some embodiments, the display device may implement audio acquisition through an audio input device, and implement audio playing through an audio output device, where the audio input device may be a microphone array disposed on the display device, and the audio output device may be a speaker array disposed on the display device. Referring to fig. 6, the microphone array may include a first microphone 501 and a second microphone 502, and the speaker array includes a left channel speaker 601, a right channel speaker 602, and a bass channel speaker 603, wherein the first microphone 501 and the left channel speaker 601 may be disposed at the left side of the display apparatus, the second microphone 502 and the right channel speaker 602 may be disposed at the right side of the display apparatus, and the bass channel speaker 603 may be disposed at the middle of the display apparatus.
In some embodiments, the audio input device may also be an external device connected to the display device, such as an external microphone, and the audio output device may be an external device connected to the display device, such as an external speaker.
In some embodiments, the audio input device and the audio output device may also be an external integrated device connected to the display device, such as an external smart speaker.
In some embodiments, the audio input device and the audio output device may also be an internal integrated device connected to the display device, such as a built-in smart speaker.
The following description will take an example in which the audio input device is a microphone array provided on the display device, and the audio output device is a speaker array provided on the display device.
In some embodiments, in the display device shown in fig. 6, when a user performs voice interaction with the display device, the audio transmission process may refer to fig. 7, a schematic audio transmission diagram according to some embodiments. In fig. 7, the voice signal input by the user to the display device may be s(n), and the microphone signal d(n) actually received by the microphone array of the display device is calculated as:
d(n)=s(n)+(x(n)*h(n)) (1)
where x(n) is the audio signal played by the speaker array of the display device, and h(n) may be the impulse response function of the space where the display device is located. (x(n)*h(n)) denotes the convolution of x(n) with h(n), and the convolved audio signal is an echo. The echo may be loud and may contain human voice, possibly even voice identical or similar to the user's voice signal, which interferes with the voice signal s(n) and results in a low interrupt wake-up rate and slow wake-up response of the display device. Eliminating the echo can improve the interrupt wake-up rate and the wake-up response speed, thereby improving the user experience.
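As a hedged illustration of formula (1), the sketch below synthesizes a microphone signal from a toy speech signal, a toy loudspeaker signal, and a made-up room impulse response; all three signals are placeholders, since a real h(n) depends on the actual room and is not known in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16000                          # one second of samples at 16 kHz

s = rng.standard_normal(n)         # stand-in for the user's voice signal s(n)
x = rng.standard_normal(n)         # stand-in for the loudspeaker signal x(n)

# Toy room impulse response h(n): a direct path plus two decaying reflections.
h = np.zeros(512)
h[0], h[100], h[300] = 1.0, 0.5, 0.25

echo = np.convolve(x, h)[:n]       # x(n) * h(n): the echo picked up by the mic
d = s + echo                       # microphone signal per formula (1)
```

Subtracting `echo` from `d` recovers `s` exactly here; in practice h(n) is unknown and must be estimated, which is what the AEC module described below does.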
In some embodiments, the echoes to which echo cancellation is applied are the left channel reference signal output by the left channel speaker 601 and the right channel reference signal output by the right channel speaker 602; after these two echoes are cancelled, the requirements on the interrupt wake-up rate and wake-up response speed for near-field speech can be met.
In some embodiments, the echo cancellation architecture for near-field speech can be seen in fig. 8. As shown in fig. 8, the CPU, i.e., the controller of the display device, can be provided with an audio DSP (Digital Signal Processor) and an AEC (Acoustic Echo Cancellation) module. The audio play stream is the data stream obtained by decoding the media asset, in raw format. The audio play stream is processed by the sound-effect DSP and then fed, respectively, to a stereo PA (power amplifier) and a bass PA for processing.
In some embodiments, the stereo PA extracts, from the sound-effect-processed signal, the signal to be played on the left channel speaker 601 and the signal to be played on the right channel speaker 602. It then sends the signal to be played on the left channel speaker 601 to that speaker so that it sounds; the sound signal emitted by the left channel speaker 601 may be called the left channel speaker signal. Likewise, the stereo PA sends the signal to be played on the right channel speaker 602 to that speaker, and the sound signal emitted by the right channel speaker 602 may be called the right channel speaker signal.
In some embodiments, the bass PA extracts the signal to be played on the bass channel speaker 603 from the sound-effect-processed signal and sends it to the bass channel speaker 603 so that it sounds; the sound signal emitted by the bass channel speaker 603 may be referred to as the bass channel speaker signal.
In some embodiments, the AEC module is configured to perform echo cancellation. The AEC module may acquire the left channel reference signal, that is, the signal input to the left channel speaker 601, through a tap circuit pre-built between the left channel speaker 601 and the AEC module, and may acquire the right channel reference signal, that is, the signal input to the right channel speaker 602, through a tap circuit pre-built between the right channel speaker 602 and the AEC module. After collecting the left and right channel reference signals, the AEC module may process the microphone signals collected by the first microphone 501 and the second microphone 502: it cancels the left channel speaker signal in the microphone signal according to the left channel reference signal, and cancels the right channel speaker signal according to the right channel reference signal, thereby realizing echo cancellation for the left and right channels.
It can be seen that the above echo cancellation for the left and right channels relies on pre-built tap circuits, and there is no pre-built tap circuit between the AEC module and the bass channel speaker 603.
With the continuous development of voice interaction technology toward far-field voice, the human-machine distance to be supported by voice interaction grows ever larger and the requirement on the user's speaking volume ever lower, so the residual echo of the bass channel, which near-field cancellation could tolerate, becomes a real obstacle.
To solve this technical problem, the present application provides an echo cancellation method for a display device without a pre-built bass channel tap circuit. The method provides a bass channel reference signal to the AEC module through output simulation and frequency band separation, solving the problem that the AEC module cannot cancel the bass echo. The specific scheme is described below.
Referring to fig. 9, a schematic diagram of the echo cancellation principle provided in this embodiment of the present application: after the microphone array collects a microphone signal, the controller of the display device acquires the audio play stream being played by the display device together with the left channel reference signal and the right channel reference signal, performs output simulation and frequency band separation on the audio play stream output by the audio stream player to obtain a bass channel reference signal, and sends the bass channel reference signal, the left channel reference signal, and the right channel reference signal to the AEC module, thereby implementing echo cancellation for the left, right, and bass channels. The acquisition and sending of the bass channel reference signal are implemented by software in the CPU of the display device, not by a hardware circuit.
In some embodiments, when playing a media asset, the display device plays the corresponding audio play stream according to a default sound-effect mode or one selected by the user. Some sound-effect modes, such as a cinema mode or a subwoofer mode, are configured to play the lower-frequency part of the audio play stream on the bass channel speaker to create a bass effect; in this case, the audio play stream can be subjected to output simulation and frequency band separation so as to perform echo cancellation of the bass channel. Other sound-effect modes, such as a normal mode, are configured to play the audio only through the left and right channel speakers without invoking the bass channel speaker; in this case, echo cancellation of the bass channel is not required. Therefore, after the microphone array collects the microphone signals, the controller of the display device can check the sound-effect mode used for playing the media asset: if it is a mode in which the bass channel speaker sounds, echo cancellation is performed on the left, right, and bass channels; if it is a mode in which the bass channel speaker does not sound, echo cancellation is performed only on the left and right channels.
In some embodiments, the process by which the controller of the display device performs output simulation and frequency band separation on the audio play stream may be as shown in fig. 10, a schematic signal processing flow of the audio play stream according to some embodiments. As shown in fig. 10, the processing may include PEQ (parametric equalizer), volume control, 3BDRC (three-band Dynamic Range Control), and frequency band separation, where the PEQ, volume control, and 3BDRC constitute the output simulation of the audio play stream.
It should be noted that the specific data processing of the output simulation may follow the data processing of the display device's sound-effect DSP. Since that processing may differ between display devices, the signal processing flow in fig. 10 may be adapted accordingly. For example, in some embodiments the output simulation may include only volume control; in some embodiments it may include only PEQ and volume control; and in some embodiments it may include steps in addition to PEQ, volume control, and 3BDRC, such as denoising.
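The adaptability described above can be sketched as a configurable list of stages; the stage implementations here are crude placeholders, not the actual DSP algorithms, and the names are illustrative.

```python
import numpy as np

def output_simulation(stream, stages):
    """Run the audio play stream through the configured output-simulation stages."""
    for stage in stages:
        stream = stage(stream)
    return stream

# Placeholder stages; a given display device would register the subset that
# mirrors its own sound-effect DSP (e.g. volume control only).
peq = lambda sig: sig                          # sound-effect EQ (no-op here)
volume_control = lambda sig: 0.5 * sig         # gain for a mid volume setting
drc = lambda sig: np.clip(sig, -1.0, 1.0)      # crude dynamic range control

stream = np.array([0.2, 3.0, -2.0])
sim_out = output_simulation(stream, [peq, volume_control, drc])
```

With these placeholder stages, `sim_out` is `[0.1, 1.0, -1.0]`: each sample is halved by the volume stage, then clipped by the DRC stage.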
In some embodiments, the PEQ simulates the playback sound effect applied to the audio play stream. A corresponding sound-effect algorithm is selected according to the sound-effect mode of the audio play stream, and the stream is processed by this algorithm to obtain a data stream simulating the playback sound effect. The sound-effect algorithm can be determined in advance from the sound-effect characteristics of the mode; exemplary characteristics include flattening the audio waveform in a particular frequency band, or boosting the low frequency band, which may be 0-300 Hz. After the playback sound effect is simulated, volume control can be applied to the simulated signal.
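One common way to realize the low-band boost mentioned above is a peaking-EQ biquad (per the well-known audio-EQ cookbook formulas); the patent does not specify the filter design, so the center frequency, Q, and gain below are assumptions for illustration only.

```python
import numpy as np

fs = 48000.0
f0, q, gain_db = 150.0, 0.7, 6.0     # boost centered inside the 0-300 Hz band

A = 10.0 ** (gain_db / 40.0)
w0 = 2.0 * np.pi * f0 / fs
alpha = np.sin(w0) / (2.0 * q)

# Peaking-EQ biquad coefficients, normalized so that a[0] == 1.
b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
b, a = b / a[0], a / a[0]

def magnitude(freq_hz):
    """Magnitude response of the biquad at the given frequency."""
    z = np.exp(-1j * 2.0 * np.pi * freq_hz / fs)
    num = b[0] + b[1] * z + b[2] * z ** 2
    den = a[0] + a[1] * z + a[2] * z ** 2
    return abs(num / den)
```

The response peaks at f0 with gain 10**(gain_db/20) and returns to roughly unity far above the boosted band, which matches the "enhancement of the low frequency band" characteristic described above.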
In some embodiments, volume control applies a gain to the simulated sound-effect signal according to the volume of the display device: the larger the volume, the larger the gain; the smaller the volume, the smaller the gain. After volume control, dynamic range control can be applied to the gained signal.
In some embodiments, dynamic range control can prevent clipping caused by an excessive volume setting. A volume threshold is preset; when the maximum volume value of the volume-controlled signal exceeds this threshold, windowing is applied to the signal within 50 ms before and after the sample corresponding to the maximum value, reducing the volume of the whole band while keeping the transition smooth, so that playback sounds more natural. Here the amplitude of the volume-controlled signal corresponds to its volume value. After dynamic range control, frequency band separation can be applied to the resulting signal.
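A minimal sketch of the windowed attenuation just described, assuming a 16 kHz sample rate and a Hann-shaped gain dip; the patent fixes only the ±50 ms window, so the threshold value and the window shape are assumptions.

```python
import numpy as np

def soft_limit(signal, fs=16000, threshold=0.8, win_ms=50):
    """Attenuate a +/- win_ms window around the loudest sample when it exceeds
    the threshold, using a smooth Hann-shaped gain dip instead of hard clipping."""
    out = signal.astype(float).copy()
    peak_idx = int(np.argmax(np.abs(out)))
    peak = abs(out[peak_idx])
    if peak <= threshold:
        return out
    half = int(fs * win_ms / 1000)
    lo, hi = max(0, peak_idx - half), min(len(out), peak_idx + half + 1)
    # Gain is 1.0 at the window edges and threshold/peak at the peak itself.
    dip = 1.0 - (1.0 - threshold / peak) * np.hanning(hi - lo)
    out[lo:hi] *= dip
    return out
```

The loudest sample is brought down to exactly the threshold, while the gain returns smoothly to 1.0 at the window edges, so the transition stays continuous as the text requires.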
In some embodiments, the frequency band separation may be implemented by a filter. Since the AEC module can already obtain the left and right channel reference signals through the pre-built tap circuits, it only needs to obtain the bass channel reference signal, which can be done with a low-pass filter. Filtering the dynamic-range-controlled signal through a low-pass filter yields the bass channel reference signal required by the AEC module; the low-pass filter may be chosen with a cut-off frequency of 300 Hz.
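A windowed-sinc FIR is one simple way to realize the 300 Hz low-pass; the patent does not fix the filter type or order, so the tap count and window choice below are assumptions.

```python
import numpy as np

def lowpass_fir(cutoff_hz=300.0, fs=48000.0, num_taps=511):
    """Windowed-sinc low-pass FIR taps, normalized to unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    fc = cutoff_hz / fs                       # normalized cutoff frequency
    taps = 2.0 * fc * np.sinc(2.0 * fc * n)   # ideal low-pass impulse response
    taps *= np.hamming(num_taps)              # window to tame the truncation
    return taps / taps.sum()

taps = lowpass_fir()
# The bass channel reference would be the DRC output filtered through the taps:
# bass_ref = np.convolve(drc_output, taps, mode="same")
```

Content well below 300 Hz passes essentially unchanged, while content well above it (e.g. 3 kHz) is attenuated by tens of dB, which is what lets the filtered signal serve as the bass channel reference.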
In some embodiments, after obtaining the reference signals of all channels, the AEC module may perform echo cancellation. Let the echo-cancelled signal be D(n); it is calculated as:
D(n)=d(n)-x(n)*h(n) (2)
where x(n) comprises the left channel reference signal, the right channel reference signal, and the bass channel reference signal described in the above embodiments, and h(n) may be solved by a wiener filter.
Referring to fig. 11, a schematic diagram of a wiener filter according to some embodiments: as shown in fig. 11, an FIR filter may be preset as h(n); x(n) is input to the FIR filter to obtain an output signal, and this output signal is subtracted from the desired signal, that is, the microphone signal d(n) in fig. 7, to obtain the error signal e(n). The error signal corresponds to the speech signal s(n) input to the display device by the user in fig. 7. That is, the error signal e(n) is calculated as:
e(n)=d(n)-x(n)*h(n) (3)
based on the principle of the wiener filter, the optimal h(n) can be obtained by minimizing the mean square error of the error signal, where the mean square error is calculated as:
E[e²(n)]=E[(d(n)-x(n)*h(n))²] (4)
finally, h(n)=Rxx⁻¹·rxd is obtained, where Rxx⁻¹ is the inverse of the autocorrelation matrix of the input signal, Rxx=E[x(n) x(n)^T], and rxd=E[x(n) d(n)] is the cross-correlation vector between the input signal and the desired signal.
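The closed-form solution above can be sketched numerically as an ordinary least-squares problem; the signals and the 32-tap echo path below are synthetic, and near-end speech is omitted so the cancellation comes out exact.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 32                                   # assumed echo-path length in taps
x = rng.standard_normal(4000)            # reference signal x(n)
h_true = np.exp(-0.2 * np.arange(L))     # synthetic decaying echo path h(n)
d = np.convolve(x, h_true)[: len(x)]     # pure echo (no near-end speech here)

# Tap-delay matrix: row n holds [x(n), x(n-1), ..., x(n-L+1)].
X = np.zeros((len(x), L))
for k in range(L):
    X[k:, k] = x[: len(x) - k]

Rxx = X.T @ X            # proportional to the autocorrelation matrix E[x x^T]
rxd = X.T @ d            # proportional to the cross-correlation E[x(n) d(n)]
h_hat = np.linalg.solve(Rxx, rxd)        # Wiener solution h = Rxx^-1 rxd

D = d - X @ h_hat                        # formula (2): echo-cancelled signal
```

With no near-end speech present, `D` is numerically zero; with speech present, D(n) would be the recovered s(n) plus a small estimation error.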
After h(n) is obtained, the echo-cancelled signal D(n) can be obtained according to formula (2). Semantic recognition is performed on D(n) to obtain a recognition result; response data can further be generated from the recognition result, and a voice interaction user interface can be generated from the response data. Because the echoes of the left, right, and bass channels are all eliminated, the speed of semantic recognition and the accuracy of the recognition result are improved, which in turn improves the interrupt wake-up rate and the response speed of voice interaction.
Since the above embodiments are described by reference to and in combination with other embodiments, different embodiments share common portions, and identical or similar portions among the various embodiments in this specification may be referred to each other; they are not described in detail again.
It is noted that, in this specification, relational terms such as "first" and "second," and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a circuit structure, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such circuit structure, article, or apparatus. Without further limitation, the presence of an element identified by the phrase "comprising an … …" does not exclude the presence of other like elements in a circuit structure, article, or device comprising the element.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims. The above embodiments of the present application do not limit the scope of the present application.
Claims (10)
1. A display device, comprising:
a display configured to display a user interface;
a controller connected with the display, the controller configured to:
acquiring a microphone signal collected by an audio input device, a left channel reference signal and a right channel reference signal of an audio output device, and an audio play stream of a media asset being played by the display device;
performing output simulation on the audio play stream to obtain a simulation output signal of the audio play stream;
carrying out frequency band separation on the simulation output signal;
if a bass channel reference signal is separated from the simulation output signal, performing echo cancellation on the microphone signal according to the bass channel reference signal, the left channel reference signal and the right channel reference signal, performing semantic recognition on the echo-cancelled signal, and controlling the display to generate a voice interaction user interface according to a recognition result.
2. The display device of claim 1, wherein the simulating the output of the audio play stream to obtain a simulated output signal of the audio play stream comprises:
carrying out playback sound-effect simulation, volume control and dynamic range control on the audio play stream to obtain the simulation output signal of the audio play stream, wherein the output simulation comprises the playback sound-effect simulation, the volume control and the dynamic range control.
3. The display device of claim 1, wherein the frequency band separating the emulated output signal comprises:
filtering the simulation output signal through a low-pass filter, and determining the filtered signal as the bass channel reference signal.
4. A display device as claimed in claim 3, characterized in that the passband of the low-pass filter is 0-300 Hz.
5. The display device of claim 1, wherein the controller is further configured to:
if no bass channel reference signal is separated from the simulation output signal, performing echo cancellation on the microphone signal according to the left channel reference signal and the right channel reference signal, performing semantic recognition on the echo-cancelled signal, and controlling the display to generate a voice interaction user interface according to a recognition result.
6. The display device of claim 1, wherein the audio input device is a microphone array of a display device, the microphone array comprising a plurality of microphones disposed on the display device.
7. The display device of claim 1, wherein the audio output device is a speaker array of the display device, the speaker array comprising a plurality of speakers disposed on the display device.
8. The display device of claim 7, wherein the speaker array comprises a left channel speaker, a right channel speaker, and a bass channel speaker.
9. An echo cancellation method, comprising:
acquiring a microphone signal collected by an audio input device, a left channel reference signal and a right channel reference signal of an audio output device, and an audio play stream of a media asset being played by a display device;
performing output simulation on the audio play stream to obtain a simulation output signal of the audio play stream;
carrying out frequency band separation on the simulation output signal;
and if a bass channel reference signal is separated from the simulation output signal, performing echo cancellation on the microphone signal according to the bass channel reference signal, the left channel reference signal and the right channel reference signal.
10. The echo cancellation method of claim 9, further comprising:
and if no bass channel reference signal is separated from the simulation output signal, performing echo cancellation on the microphone signal according to the left channel reference signal and the right channel reference signal, performing semantic recognition on the echo-cancelled signal, and controlling the display to generate a voice interaction user interface according to a recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110335146.1A CN113079401B (en) | 2021-03-29 | 2021-03-29 | Display device and echo cancellation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113079401A true CN113079401A (en) | 2021-07-06 |
CN113079401B CN113079401B (en) | 2022-09-30 |
Family
ID=76611188
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110335146.1A Active CN113079401B (en) | 2021-03-29 | 2021-03-29 | Display device and echo cancellation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113079401B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115623121A (en) * | 2021-07-13 | 2023-01-17 | 北京荣耀终端有限公司 | Call method and electronic equipment |
CN118737197A (en) * | 2024-07-17 | 2024-10-01 | 北京拓灵新声科技有限公司 | Speech recognition method, device, electronic device and storage medium |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4609787A (en) * | 1984-05-21 | 1986-09-02 | Communications Satellite Corporation | Echo canceller with extended frequency range |
CN1636423A (en) * | 2002-01-17 | 2005-07-06 | 皇家飞利浦电子股份有限公司 | Multichannel echo canceller system using active audio matrix coefficients |
CN1826797A (en) * | 2003-05-27 | 2006-08-30 | 皇家飞利浦电子股份有限公司 | Loudspeaker-microphone system with echo cancellation system and method for echo cancellation |
CN106297815A (en) * | 2016-07-27 | 2017-01-04 | 武汉诚迈科技有限公司 | A kind of method of echo cancellation in speech recognition scene |
CN106548783A (en) * | 2016-12-09 | 2017-03-29 | 西安Tcl软件开发有限公司 | Sound enhancement method, device and intelligent sound box, intelligent television |
CN207022188U (en) * | 2017-06-15 | 2018-02-16 | 歌尔股份有限公司 | A kind of multi-channel echo eliminates circuit and smart machine |
US9967661B1 (en) * | 2016-02-09 | 2018-05-08 | Amazon Technologies, Inc. | Multichannel acoustic echo cancellation |
CN109788399A (en) * | 2019-01-30 | 2019-05-21 | 珠海迈科智能科技股份有限公司 | A kind of echo cancel method and system of speaker |
CN110366067A (en) * | 2019-05-27 | 2019-10-22 | 深圳康佳电子科技有限公司 | A kind of far field voice module echo cancel circuit and device |
CN110972000A (en) * | 2019-12-31 | 2020-04-07 | 青岛海之声科技有限公司 | Microphone array signal noise reduction system and microphone array optimization method |
CN111292759A (en) * | 2020-05-11 | 2020-06-16 | 上海亮牛半导体科技有限公司 | Stereo echo cancellation method and system based on neural network |
CN111418011A (en) * | 2017-09-28 | 2020-07-14 | 搜诺思公司 | Multi-channel acoustic echo cancellation |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115623121A (en) * | 2021-07-13 | 2023-01-17 | 北京荣耀终端有限公司 | Call method and electronic equipment |
CN115623121B (en) * | 2021-07-13 | 2024-04-05 | 北京荣耀终端有限公司 | Communication method, electronic equipment, chip system and storage medium |
CN118737197A (en) * | 2024-07-17 | 2024-10-01 | 北京拓灵新声科技有限公司 | Speech recognition method, device, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113079401B (en) | 2022-09-30 |
Similar Documents
Publication | Title |
---|---|
CN111200746B (en) | Method for awakening display equipment in standby state and display equipment |
CN111757171A (en) | Display device and audio playing method |
CN112992171B (en) | Display device and control method for eliminating echo received by microphone |
CN112163086A (en) | Multi-intention recognition method and display device |
CN112153440B (en) | Display equipment and display system |
CN113038048B (en) | Far-field voice awakening method and display device |
CN113079401B (en) | Display device and echo cancellation method |
CN113727179B (en) | Display equipment and method for enabling display equipment to be compatible with external equipment |
CN112599126B (en) | Awakening method of intelligent device, intelligent device and computing device |
CN111836083A (en) | Display device and screen sounding method |
CN113096681B (en) | Display device, multi-channel echo cancellation circuit and multi-channel echo cancellation method |
CN112562666A (en) | Method for screening equipment and service equipment |
CN113709535B (en) | Display equipment and far-field voice recognition method based on sound channel use |
CN113066491A (en) | Display device and voice interaction method |
CN211860471U (en) | Intelligent sound box |
CN112214190A (en) | Display equipment resource playing method and display equipment |
CN112104950A (en) | Volume control method and display device |
CN118283202A (en) | Display equipment and audio processing method |
CN112053688A(en) | Voice interaction method, interaction equipment and server |
CN113053380B (en) | Server and voice recognition method |
CN113542860A (en) | Bluetooth device sound output method and display device |
CN113079400A (en) | Display device, server and voice interaction method |
CN111914565A (en) | Electronic equipment and user statement processing method |
CN114302197A (en) | Voice separation control method and display device |
CN113709557A (en) | Audio output control method and display device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||