
CN114627864A - Display device and voice interaction method - Google Patents

Display device and voice interaction method Download PDF

Info

Publication number
CN114627864A
CN114627864A
Authority
CN
China
Prior art keywords
voice data
user
intention
candidate user
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011433067.6A
Other languages
Chinese (zh)
Inventor
岳文浩
杨善松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202011433067.6A priority Critical patent/CN114627864A/en
Publication of CN114627864A publication Critical patent/CN114627864A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present application provide a display device and a voice interaction method. When first voice data input by a user is received, candidate user intents corresponding to the first voice data are determined. When the first voice data corresponds to multiple candidate user intents, a query sentence is generated according to the candidate user intents and fed back to the user, prompting the user to select one user intent from the candidates. Second voice data input by the user is then received, and the target user intent corresponding to the first voice data is determined from among the candidate user intents according to the second voice data. Finally, associated information associated with the target user intent is output. The method and device can effectively improve the accuracy of user-intent understanding during voice interaction.

Description

Display device and voice interaction method

Technical Field

The embodiments of the present application relate to the technical field of voice interaction, and in particular to a display device and a voice interaction method.

Background

With the development of speech technology, intelligent voice interaction devices have become increasingly common, and voice interaction has become a very important form of human-computer interaction. In particular, with the popularity of voice assistants in recent years, services can be obtained through voice interaction on devices ranging from mobile terminals to smart home appliances.

Existing voice interaction systems usually first infer the user's intent from the sentence input by the user, and then provide relevant services according to that intent.

However, when faced with cross-service decisions or ambiguous user intent, it is difficult for existing voice interaction systems to understand intent and make decisions accurately. For example, when the search results corresponding to a user's input sentence (query) include both songs and videos, an existing voice interaction system scores the two kinds of results: if the searched "song" scores higher than the "video", the searched song is fed back to the user first; if the searched "video" scores higher than the "song", the searched video is fed back first. The final decision may therefore fail to match the user's real intent.

Summary of the Invention

Embodiments of the present application provide a display device and a voice interaction method, which can improve the accuracy of user-intent understanding during voice interaction.

在一些实施例中,本申请实施例提供一种显示设备,该显示设备包括:In some embodiments, embodiments of the present application provide a display device, the display device comprising:

语音采集装置,用于采集语音数据;A voice collection device for collecting voice data;

音频处理器,用于处理采集到的语音数据;An audio processor for processing the collected voice data;

显示屏,用于显示图像;a display screen for displaying images;

控制器,所述控制器被配置为:A controller that is configured to:

接收用户输入的第一语音数据,并确定所述第一语音数据对应的候选用户意图;Receive the first voice data input by the user, and determine the candidate user intent corresponding to the first voice data;

当所述第一语音数据对应多个候选用户意图时,根据所述多个候选用户意图生成询问语句,并将所述询问语句发送至所述显示屏进行显示,所述询问语句用于提示用户从所述多个候选用户意图中选择一个用户意图;When the first voice data corresponds to multiple candidate user intentions, generate an inquiry sentence according to the multiple candidate user intentions, and send the inquiry sentence to the display screen for display, where the inquiry sentence is used to prompt the user selecting a user intent from the plurality of candidate user intents;

接收所述用户输入的第二语音数据,并根据所述第二语音数据,在所述多个候选用户意图中确定所述第一语音数据对应的目标用户意图;receiving second voice data input by the user, and determining, according to the second voice data, a target user intent corresponding to the first voice data among the plurality of candidate user intents;

输出与所述目标用户意图关联的关联信息。Output association information associated with the target user intent.
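The controller behavior above can be sketched as a small state flow: one candidate intent is answered directly, while multiple candidates trigger a query sentence that the follow-up utterance then resolves. This is an illustrative sketch only; the function names and the substring matching used to resolve the follow-up are assumptions, not the patent's implementation.

```python
def handle_first_voice_data(candidate_intents):
    """Single candidate: output its associated info; multiple: build a query sentence."""
    if len(candidate_intents) == 1:
        return ("output", candidate_intents[0])
    # Multiple candidates: prompt the user to choose one of them.
    options = " or ".join(candidate_intents)
    return ("query", f"Did you want {options}?")


def resolve_target_intent(candidate_intents, second_voice_text):
    """Pick the candidate intent that the user's follow-up utterance refers to."""
    for intent in candidate_intents:
        if intent in second_voice_text:
            return intent
    return None  # follow-up matched no candidate; the device could re-ask
```

In practice the second utterance would go through the same speech recognizer as the first; plain substring matching here simply stands in for that matching step.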

在一种可能的设计方式中,当所述第一语音数据对应单个候选用户意图时,输出与所述候选用户意图关联的关联信息。In a possible design manner, when the first voice data corresponds to a single candidate user intent, output associated information associated with the candidate user intent.

在一种可能的设计方式中,所述控制器被配置为:In one possible design, the controller is configured to:

获取语音交互过程中确定的历史用户意图;Obtain the historical user intent determined during the voice interaction process;

根据所述第一语音数据与所述历史用户意图,利用所述意图识别模型确定所述第一语音数据对应的候选用户意图。According to the first voice data and the historical user intent, the intent recognition model is used to determine a candidate user intent corresponding to the first voice data.

在一种可能的设计方式中,所述控制器被配置为:In one possible design, the controller is configured to:

基于对话状态追踪模型确定所述第一语音数据与所述历史用户意图是否属于同一个对话序列;Determine whether the first voice data and the historical user intent belong to the same dialogue sequence based on the dialogue state tracking model;

当所述第一语音数据与所述历史用户意图属于同一个对话序列时,利用所述意图识别模型确定所述第一语音数据对应的初始用户意图,并根据所述历史用户意图更新所述第一语音数据对应的初始用户意图,得到所述第一语音数据对应的候选用户意图;When the first voice data and the historical user intent belong to the same dialogue sequence, use the intent recognition model to determine the initial user intent corresponding to the first voice data, and update the first voice data according to the historical user intent an initial user intent corresponding to the voice data, obtaining a candidate user intent corresponding to the first voice data;

当所述第一语音数据与所述历史用户意图不属于同一个对话序列时,利用所述意图识别模型确定所述第一语音数据对应的候选用户意图。When the first voice data and the historical user intent do not belong to the same dialogue sequence, the intent recognition model is used to determine a candidate user intent corresponding to the first voice data.
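The branching in this design (reuse dialogue history only when the new utterance belongs to the same dialogue sequence) might look like the sketch below. Treating the tracker's verdict as a boolean input and "updating with history" as reordering the candidates are assumptions made for illustration; the patent does not specify either mechanism.

```python
def determine_candidate_intents(initial_intents, history_intents, same_sequence):
    """Return candidate intents for the new utterance.

    initial_intents: intents from the intent recognition model for the new utterance.
    history_intents: intents determined earlier in the interaction.
    same_sequence:   dialogue-state-tracking verdict for the new utterance.
    """
    if not same_sequence:
        # New dialogue sequence: use the recognizer's output as-is.
        return list(initial_intents)
    # Same sequence: update the initial intents with the history, here by
    # moving intents already seen in the history to the front of the list.
    confirmed = [i for i in initial_intents if i in history_intents]
    rest = [i for i in initial_intents if i not in history_intents]
    return confirmed + rest
```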

在一种可能的设计方式中,所述控制器被配置为:In one possible design, the controller is configured to:

根据所述第一语音数据对应的各个候选用户意图,确定对话策略学习模型中各个输出模块的得分;所述对话策略学习模型中包括以下输出模块中的至少一种:改写模块,指代消解模块、垂直领域意图解析模块、任务多轮应答模块、问答模块、新闻搜索模块、聊天模块、推荐模块以及候选意图解析模块;According to each candidate user intent corresponding to the first voice data, the score of each output module in the dialogue strategy learning model is determined; the dialogue strategy learning model includes at least one of the following output modules: a rewriting module, which refers to a resolution module , vertical domain intent analysis module, task multi-round response module, question and answer module, news search module, chat module, recommendation module and candidate intent analysis module;

当所述对话策略学习模型中得分最高的输出模块为所述候选意图解析模块时,根据各个所述候选用户意图的得分,生成所述询问语句。When the output module with the highest score in the dialogue strategy learning model is the candidate intent parsing module, the query sentence is generated according to the score of each candidate user intent.
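Routing by the highest-scoring output module can be shown with a small dictionary of scores. The module names mirror the list above, but the score values and the dictionary representation are made up for illustration; in the patent the scores come from the dialogue policy learning model itself.

```python
def select_output_module(module_scores):
    """Return the name of the highest-scoring output module of the policy model."""
    return max(module_scores, key=module_scores.get)


# Hypothetical scores produced by the policy model for one utterance.
scores = {
    "rewriting": 0.05,
    "question_answering": 0.10,
    "candidate_intent_parsing": 0.70,
    "chat": 0.15,
}
# Only when the winner is the candidate-intent parsing module is a query generated.
ask_user = select_output_module(scores) == "candidate_intent_parsing"
```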

在一种可能的设计方式中,所述控制器被配置为:In one possible design, the controller is configured to:

当所述第一语音数据对应的第一候选用户意图的得分小于第一预设阈值、所述第一语音数据对应的第二候选用户意图的得分大于第二预设阈值,且所述第一候选用户意图的得分与所述第二候选用户意图的得分之差小于预设间隔阈值时,基于所述第一候选用户意图与所述第二候选用户意图生成所述询问语句;其中,所述第一候选用户意图为所述第一语音数据对应的得分最高的候选用户意图,所述第二候选用户意图为所述第一语音数据对应的得分第二高的候选用户意图,所述第一预设阈值大于所述第二预设阈值。When the score of the first candidate user intent corresponding to the first voice data is less than the first preset threshold, the score of the second candidate user intent corresponding to the first voice data is greater than the second preset threshold, and the first When the difference between the score of the candidate user intent and the score of the second candidate user intent is less than a preset interval threshold, the query sentence is generated based on the first candidate user intent and the second candidate user intent; wherein, the The first candidate user intent is the candidate user intent with the highest score corresponding to the first voice data, the second candidate user intent is the candidate user intent with the second highest score corresponding to the first voice data, and the first candidate user intent The preset threshold is greater than the second preset threshold.
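The three-part condition in this design (top score not decisive, runner-up plausible, gap between them small) can be written down directly. The threshold values below are illustrative placeholders, not values taken from the patent, and the query phrasing is a hypothetical example.

```python
def should_ask(score1, score2, t1=0.9, t2=0.5, gap=0.2):
    """score1: highest candidate-intent score; score2: second-highest; t1 > t2.

    Ask only when the top intent is not decisive (score1 < t1), the runner-up
    is plausible (score2 > t2), and the two are close (score1 - score2 < gap).
    The threshold values are illustrative, not taken from the patent.
    """
    return score1 < t1 and score2 > t2 and (score1 - score2) < gap


def build_query(intent1, intent2):
    # Hypothetical phrasing of the generated query sentence.
    return f"Did you mean {intent1}, or {intent2}?"
```

For example, with the defaults, scores of 0.8 and 0.7 trigger a query, while a decisive 0.95 top score or a weak 0.45 runner-up does not.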

在一些实施例中,本申请实施例还提供一种语音交互方法,该方法包括:In some embodiments, the embodiments of the present application further provide a voice interaction method, the method comprising:

接收用户输入的第一语音数据,并确定所述第一语音数据对应的候选用户意图;Receive the first voice data input by the user, and determine the candidate user intent corresponding to the first voice data;

当所述第一语音数据对应多个候选用户意图时,根据所述多个候选用户意图生成询问语句,并向所述用户输出所述询问语句,所述询问语句用于提示用户从所述多个候选用户意图中选择一个用户意图;When the first voice data corresponds to a plurality of candidate user intentions, a query sentence is generated according to the plurality of candidate user intentions, and the query sentence is output to the user, where the query sentence is used to prompt the user to select from the plurality of user intentions. select a user intent from among the candidate user intents;

接收所述用户输入的第二语音数据,并根据所述第二语音数据,在所述多个候选用户意图中确定所述第一语音数据对应的目标用户意图;receiving second voice data input by the user, and determining, according to the second voice data, a target user intent corresponding to the first voice data among the plurality of candidate user intents;

输出与所述目标用户意图关联的关联信息。Output association information associated with the target user intent.

在一种可能的设计方式中,当所述第一语音数据对应单个候选用户意图时,输出与所述候选用户意图关联的关联信息。In a possible design manner, when the first voice data corresponds to a single candidate user intent, output associated information associated with the candidate user intent.

在一种可能的设计方式中,所述基于意图识别模型确定所述第一语音数据对应的候选用户意图,包括:In a possible design manner, the determining the candidate user intent corresponding to the first voice data based on the intent recognition model includes:

获取语音交互过程中确定的历史用户意图;Obtain the historical user intent determined during the voice interaction process;

根据所述第一语音数据与所述历史用户意图,利用所述意图识别模型确定所述第一语音数据对应的候选用户意图。According to the first voice data and the historical user intent, the intent recognition model is used to determine a candidate user intent corresponding to the first voice data.

在一种可能的设计方式中,所述根据所述第一语音数据与所述历史用户意图,利用所述意图识别模型确定所述第一语音数据对应的候选用户意图,包括:In a possible design manner, determining the candidate user intent corresponding to the first voice data by using the intent recognition model according to the first voice data and the historical user intent includes:

基于对话状态追踪模型确定所述第一语音数据与所述历史用户意图是否属于同一个对话序列;Determine whether the first voice data and the historical user intent belong to the same dialogue sequence based on the dialogue state tracking model;

当所述第一语音数据与所述历史用户意图属于同一个对话序列时,利用所述意图识别模型确定所述第一语音数据对应的初始用户意图,并根据所述历史用户意图更新所述第一语音数据对应的初始用户意图,得到所述第一语音数据对应的候选用户意图;When the first voice data and the historical user intent belong to the same dialogue sequence, use the intent recognition model to determine the initial user intent corresponding to the first voice data, and update the first voice data according to the historical user intent an initial user intent corresponding to the voice data, obtaining a candidate user intent corresponding to the first voice data;

当所述第一语音数据与所述历史用户意图不属于同一个对话序列时,利用所述意图识别模型确定所述第一语音数据对应的候选用户意图。When the first voice data and the historical user intent do not belong to the same dialogue sequence, the intent recognition model is used to determine a candidate user intent corresponding to the first voice data.

在一种可能的设计方式中,所述根据所述多个候选用户意图生成询问语句,包括:In a possible design manner, the generating a query sentence according to the multiple candidate user intentions includes:

根据所述第一语音数据对应的各个候选用户意图,确定对话策略学习模型中各个输出模块的得分;所述对话策略学习模型中包括以下输出模块中的至少一种:改写模块,指代消解模块、垂直领域意图解析模块、任务多轮应答模块、问答模块、新闻搜索模块、聊天模块、推荐模块以及候选意图解析模块;According to each candidate user intent corresponding to the first voice data, the score of each output module in the dialogue strategy learning model is determined; the dialogue strategy learning model includes at least one of the following output modules: a rewriting module, which refers to a resolution module , vertical domain intent analysis module, task multi-round response module, question and answer module, news search module, chat module, recommendation module and candidate intent analysis module;

当所述对话策略学习模型中得分最高的输出模块为所述候选意图解析模块时,根据各个所述候选用户意图的得分,生成所述询问语句。When the output module with the highest score in the dialogue strategy learning model is the candidate intent parsing module, the query sentence is generated according to the score of each candidate user intent.

在一种可能的设计方式中,所述根据各个所述候选用户意图的得分,生成所述询问语句,包括:In a possible design manner, the generating the query sentence according to the scores of each of the candidate user intentions includes:

当所述第一语音数据对应的第一候选用户意图的得分大于第一预设阈值、所述第一语音数据对应的第二候选用户意图的得分大于第二预设阈值,且所述第一候选用户意图的得分与所述第二候选用户意图的得分之差小于预设间隔阈值时,基于所述第一候选用户意图与所述第二候选用户意图生成所述询问语句;其中,所述第一候选用户意图为所述第一语音数据对应的得分最高的候选用户意图,所述第二候选用户意图为所述第一语音数据对应的得分第二高的候选用户意图,所述第一预设阈值大于所述第二预设阈值。When the score of the first candidate user intent corresponding to the first voice data is greater than a first preset threshold, the score of the second candidate user intent corresponding to the first voice data is greater than a second preset threshold, and the first When the difference between the score of the candidate user intent and the score of the second candidate user intent is less than a preset interval threshold, the query sentence is generated based on the first candidate user intent and the second candidate user intent; wherein, the The first candidate user intent is the candidate user intent with the highest score corresponding to the first voice data, the second candidate user intent is the candidate user intent with the second highest score corresponding to the first voice data, and the first candidate user intent The preset threshold is greater than the second preset threshold.

In the display device and voice interaction method provided by the embodiments of the present application, when first voice data input by a user is received, the candidate user intents corresponding to the first voice data are determined. When the first voice data corresponds to multiple candidate user intents, a query sentence is generated according to the multiple candidates and output to the user to prompt the user to select one user intent from them; second voice data input by the user is then received, and the target user intent corresponding to the first voice data is determined from among the candidates according to the second voice data; finally, response information associated with the target user intent is output. In the present application, when the user's input sentence corresponds to two or more candidate user intents, the voice interaction system actively feeds a query sentence back to the user in an anthropomorphic, interactive manner, and then determines the user's real intent from the user's reply, which can effectively improve the accuracy of user-intent understanding during voice interaction.

Brief Description of the Drawings

FIG. 1 exemplarily shows a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment;

FIG. 2 exemplarily shows a block diagram of the hardware configuration of a display device 200 according to an exemplary embodiment;

FIG. 3 exemplarily shows a block diagram of the configuration of a control apparatus 1001 according to an exemplary embodiment;

FIG. 4 is a schematic diagram of the software system of the display device provided by the present application;

FIG. 5 is a schematic diagram of application programs that can be provided by the display device of the present application;

FIG. 6 is a schematic diagram of an application of the display device in a voice interaction scenario;

FIG. 7 is a schematic flowchart of the display device applied in a voice interaction scenario;

FIG. 8 is a schematic diagram of an application scenario exemplarily shown in an embodiment of the present application;

FIG. 9 is another schematic flowchart of the display device applied in a voice interaction scenario;

FIG. 10 is a schematic diagram of a supplier of a recognition model delivering the recognition model;

FIG. 11 is a schematic flowchart of the server 400 obtaining a recognition model;

FIG. 12 is a schematic flowchart of the server updating the recognition model;

FIG. 13 is a first schematic flowchart of a voice interaction method provided in an embodiment of the present application;

FIG. 14 is a second schematic flowchart of a voice interaction method provided in an embodiment of the present application;

FIGS. 15a to 15d are schematic diagrams of voice interaction of a display device in an embodiment of the present invention;

FIGS. 16a to 16d are schematic diagrams of another voice interaction of a display device in an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application. Furthermore, although the disclosure in this application is presented in terms of one or more exemplary examples, it should be understood that individual aspects of the disclosure may also separately constitute a complete embodiment.

It should be noted that the brief descriptions of terms in this application are only intended to facilitate understanding of the embodiments described below, and are not intended to limit the embodiments of the present application. Unless otherwise specified, these terms are to be understood according to their ordinary and usual meanings.

The terms "first", "second", and the like in the description, claims, and drawings of this application are used to distinguish similar or same-kind objects or entities, and do not necessarily imply a specific order or sequence unless otherwise noted. It should be understood that terms so used are interchangeable where appropriate; for example, the embodiments can be implemented in an order other than that given in the illustration or description.

Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a product or device that comprises a series of components is not necessarily limited to those components explicitly listed, but may include other components not explicitly listed or inherent to such products or devices.

The term "module" as used in this application refers to any known or later-developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code capable of performing the function associated with that element.

The term "remote control" as used in this application refers to a component of an electronic device (such as the display device disclosed in this application) that can wirelessly control the electronic device, usually over a short distance. The remote control generally connects to the electronic device using infrared and/or radio-frequency (RF) signals and/or Bluetooth, and may also include functional modules such as WiFi, wireless USB, Bluetooth, and motion sensors. For example, a handheld touch remote control replaces most of the physical built-in hard keys of a typical remote control with a user interface on a touch screen.

FIG. 1 exemplarily shows a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in FIG. 1, a user can operate the display device 200 through the mobile terminal 1002 and the control apparatus 1001.

In some embodiments, the control apparatus 1001 may be a remote control. Communication between the remote control and the display device includes infrared protocol communication, Bluetooth protocol communication, and other short-range communication methods, and the display device 200 is controlled wirelessly or by other wired methods. The user can control the display device 200 by inputting user instructions through keys on the remote control, voice input, control-panel input, and so on. For example, the user can control the display device 200 by inputting corresponding control commands through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, and power key on the remote control.

In some embodiments, mobile terminals, tablet computers, computers, notebook computers, and other smart devices may also be used to control the display device 200, for example, using an application running on the smart device. The application can be configured to provide the user with various controls in an intuitive user interface (UI) on the screen associated with the smart device.

In some embodiments, the mobile terminal 1002 and the display device 200 may each install a software application, so that connection and communication between them are realized through a network communication protocol, achieving one-to-one control operation and data communication. For example, a control-instruction protocol can be established between the mobile terminal 1002 and the display device 200, the remote-control keyboard can be synchronized to the mobile terminal 1002, and the function of controlling the display device 200 can be realized by controlling the user interface on the mobile terminal 1002. The audio and video content displayed on the mobile terminal 1002 can also be transmitted to the display device 200 to implement a synchronized display function.

As also shown in FIG. 1, the display device 200 also performs data communication with a server 400 through multiple communication methods. The display device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks. The server 400 may provide various content and interactions to the display device 200. For example, the display device 200 may send and receive information, interact with an electronic program guide (EPG), receive software program updates, or access a remotely stored digital media library. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers. The server 400 provides other network service content such as video-on-demand and advertising services.

The display device 200 may be a liquid crystal display, an OLED display, or a projection display device. The specific display-device type, size, and resolution are not limited; those skilled in the art will understand that the performance and configuration of the display device 200 may be changed somewhat as required.

In addition to the broadcast-receiving television function, the display device 200 may additionally provide a smart network television function with computer support, including but not limited to network television, smart television, and Internet Protocol television (IPTV).

图2中示例性示出了根据示例性实施例中显示设备200的硬件配置框图。FIG. 2 exemplarily shows a block diagram of the hardware configuration of the display device 200 according to the exemplary embodiment.

In some embodiments, the display device 200 includes at least one of a controller 250, a tuner-demodulator 210, a communicator 220, a detector 230, an input/output interface 255, a display screen 275, an audio output interface 285, a memory 260, a power supply 290, a user interface 265, and an external device interface 240.

In some embodiments, the display screen 275 is a component configured to receive image signals output from the first processor and to display video content, images, and a menu control interface.

In some embodiments, the display screen 275 includes a display panel component for presenting a picture and a driving component for driving image display.

In some embodiments, the displayed video content may come from broadcast television, that is, from various broadcast signals received through wired or wireless communication protocols. Alternatively, various image content sent from a network server may be received through a network communication protocol and displayed.

In some embodiments, the display screen 275 is used to present a user-operated UI interface that is generated in the display device 200 and used to control the display device 200.

In some embodiments, depending on the type of the display screen 275, a driving component for driving the display is also included.

In some embodiments, the display screen 275 is a projection display screen, and may further include a projection device and a projection surface.

In some embodiments, the communicator 220 is a component for communicating with external devices or external servers according to various communication protocol types. For example, the communicator may include at least one of a WiFi chip, a Bluetooth communication protocol chip, a wired Ethernet communication protocol chip or other network communication protocol chips, a near-field communication protocol chip, and an infrared receiver.

In some embodiments, the display device 200 may establish the transmission and reception of control signals and data signals with the external control device 100 or a content providing device through the communicator 220.

In some embodiments, the user interface 265 may be used to receive infrared control signals from a control device (for example, an infrared remote control).

In some embodiments, the detector 230 is a component used by the display device 200 to collect signals from the external environment or to interact with the outside.

In some embodiments, the detector 230 includes a light receiver, a sensor for collecting ambient light intensity, so that display parameters can be adaptively changed according to the collected ambient light.

In some embodiments, the detector 230 may also include an image collector, such as a camera, which can be used to collect external environment scenes as well as user attributes or user gestures, so as to adaptively change display parameters and to recognize user gestures, thereby enabling interaction with the user.

In some embodiments, the detector 230 may also include a temperature sensor or the like, for example for sensing the ambient temperature.

In some embodiments, the display device 200 can adaptively adjust the display color temperature of the image. For example, in a relatively hot environment, the display device 200 may be adjusted to display images with a cooler color temperature, and in a relatively cold environment, the display device 200 may be adjusted to display images with a warmer color temperature.
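The temperature-to-color-temperature adaptation described above can be sketched as a simple mapping. The thresholds and correlated color temperature (CCT) values below are illustrative assumptions, not values taken from the embodiment:

```python
def target_color_temperature(ambient_celsius: float) -> int:
    """Map ambient temperature to a display color temperature in kelvin.

    Hotter environments get a cooler (higher-kelvin) white point, colder
    environments a warmer (lower-kelvin) one. The 10/30 degree thresholds
    and the 5000K/7500K endpoints are assumed values for illustration.
    """
    if ambient_celsius >= 30:      # hot environment -> cool tone
        return 7500
    if ambient_celsius <= 10:      # cold environment -> warm tone
        return 5000
    # Interpolate linearly between the warm and cool endpoints.
    fraction = (ambient_celsius - 10) / (30 - 10)
    return int(5000 + fraction * (7500 - 5000))
```

In a real device this target would then be applied by the display driver as a white-point adjustment rather than returned as a bare number.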

In some embodiments, the detector 230 may also include a sound collector, such as a microphone, which may be used to receive the user's voice, for example a voice signal containing the user's control instructions for the display device 200, or to collect ambient sound in order to identify the type of environmental scene, so that the display device 200 can adapt to the ambient noise.

In some embodiments, as shown in FIG. 2, the input/output interface 255 is configured to enable data transfer between the controller 250 and other external devices or other controllers 250, for example receiving video signal data, audio signal data, or command data from external devices.

In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of the following: a high-definition multimedia interface (HDMI), an analog or digital high-definition component input interface, a composite video input interface, a USB input interface, an RGB port, and the like. A composite input/output interface may also be formed from several of the above interfaces.

In some embodiments, as shown in FIG. 2, the tuner-demodulator 210 is configured to receive broadcast television signals through wired or wireless reception, perform modulation and demodulation processing such as amplification, mixing, and resonance, and demodulate audio and video signals from among multiple wireless or cable broadcast television signals. The audio and video signals may include the television audio and video signals carried on the frequency of the television channel selected by the user, as well as EPG data signals.

In some embodiments, the frequency demodulated by the tuner-demodulator 210 is controlled by the controller 250. The controller 250 may send a control signal according to the user's selection, so that the tuner-demodulator responds to the television signal frequency selected by the user and modulates and demodulates the television signal carried on that frequency.

In some embodiments, broadcast television signals may be classified into terrestrial broadcast signals, cable broadcast signals, satellite broadcast signals, Internet broadcast signals, and the like, according to the broadcast format of the television signal. Alternatively, they may be classified into digitally modulated signals, analog modulated signals, and the like according to the modulation type, or into digital signals, analog signals, and the like according to the signal type.

In some embodiments, the controller 250 and the tuner-demodulator 210 may be located in separate devices; that is, the tuner-demodulator 210 may also be located in a device external to the main device in which the controller 250 resides, such as an external set-top box. In this case, the set-top box demodulates the received broadcast television signal and outputs the resulting television audio and video signal to the main device, and the main device receives the audio and video signal through the first input/output interface.

In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in the memory. The controller 250 may control the overall operation of the display device 200. For example, in response to receiving a user command for selecting a UI object displayed on the display screen 275, the controller 250 may perform an operation related to the object selected by the user command.

As shown in FIG. 2, the controller 250 includes at least one of a random access memory 251 (RAM), a read-only memory 252 (ROM), a video processor 270, an audio processor 280, other processors 253 (for example, a graphics processing unit (GPU)), a central processing unit 254 (CPU), a communication interface, and a communication bus 256, where the communication bus connects the various components.

In some embodiments, the RAM 251 is used to store temporary data of the operating system or other running programs.

In some embodiments, the ROM 252 is used to store various system startup instructions.

In some embodiments, the ROM 252 is used to store a Basic Input Output System (BIOS), which is used to perform the power-on self-test of the system, initialize each functional module in the system, provide drivers for the system's basic input/output, and boot the operating system.

In some embodiments, upon receiving a power-on signal, the power supply of the display device 200 starts, and the CPU executes the system startup instructions in the ROM 252 and copies the temporary data of the operating system stored in the memory into the RAM 251, in order to start or run the operating system. After the operating system has started, the CPU copies the temporary data of the various application programs from the memory into the RAM 251, in order to start or run those application programs.

In some embodiments, the CPU 254 is used to execute the operating system and application program instructions stored in the memory, and to execute various application programs, data, and content according to the various interaction instructions received from external input, so as to finally display and play various audio and video content.

In some exemplary embodiments, the CPU 254 may include multiple processors. The multiple processors may include one main processor and one or more sub-processors. The main processor is used to perform some operations of the display device 200 in the pre-power-on mode and/or to display pictures in the normal mode. The one or more sub-processors are used for operations in states such as standby mode.

In some embodiments, the graphics processor 253 is used to generate various graphic objects, such as icons, operation menus, and graphics displayed in response to user input instructions. It includes an arithmetic unit, which performs operations by receiving the various interaction instructions input by the user and displays various objects according to their display attributes, and a renderer, which renders the various objects produced by the arithmetic unit; the rendered objects are then displayed on the display screen.

In some embodiments, the video processor 270 is configured to receive an external video signal and, according to the standard codec protocol of the input signal, perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis, so as to obtain a signal that can be directly displayed or played on the display device 200.

In some embodiments, the video processor 270 includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like.

The demultiplexing module is used to demultiplex the input audio/video data stream. For example, for an MPEG-2 input, the demultiplexing module demultiplexes it into a video signal, an audio signal, and so on.

The video decoding module is used to process the demultiplexed video signal, including decoding and scaling.

The image synthesis module, such as an image synthesizer, is used to superimpose and mix the GUI signal, generated by the graphics generator according to user input or by itself, with the scaled video image, so as to generate an image signal for display.

The frame rate conversion module is used to convert the frame rate of the input video, for example converting a 60 Hz frame rate to a 120 Hz or 240 Hz frame rate; this is usually implemented by frame insertion.
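Frame insertion for 60 Hz to 120 Hz conversion can be sketched as inserting one interpolated frame between every pair of source frames. The per-pixel averaging used here is an illustrative stand-in; real FRC hardware typically uses motion-compensated interpolation:

```python
def double_frame_rate(frames):
    """Convert a 60 Hz frame sequence to 120 Hz by inserting a frame
    between each adjacent pair. Each frame is modeled as a flat list of
    pixel values; the inserted frame is the per-pixel average of its two
    neighbours (a simplification of motion-compensated interpolation).
    """
    out = []
    for cur, nxt in zip(frames, frames[1:]):
        out.append(cur)
        out.append([(a + b) / 2 for a, b in zip(cur, nxt)])
    out.append(frames[-1])  # last frame has no successor; keep it as-is
    return out
```

For n input frames this yields 2n-1 output frames; a hardware pipeline would instead repeat or extrapolate the final frame to keep the output rate constant.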

The display formatting module is used to convert the video output signal after frame rate conversion, changing the signal to conform to the display format, for example outputting an RGB data signal.

In some embodiments, the graphics processor 253 and the video processor may be integrated or arranged separately. When integrated, they can process the graphics signals output to the display screen together; when separate, they can perform different functions respectively, for example in a GPU + FRC (Frame Rate Conversion) architecture.

In some embodiments, the audio processor 280 is configured to receive an external audio signal and, according to the standard codec protocol of the input signal, perform decompression and decoding, as well as noise reduction, digital-to-analog conversion, and amplification, so as to obtain a sound signal that can be played through the loudspeaker.

In some embodiments, the video processor 270 may comprise one or more chips. The audio processor may likewise comprise one or more chips.

In some embodiments, the video processor 270 and the audio processor 280 may be separate chips, or may be integrated together with the controller in one or more chips.

The power supply 290, under the control of the controller 250, supplies the display device 200 with power received from an external power source. The power supply 290 may include a built-in power supply circuit installed inside the display device 200, or may be a power supply installed outside the display device 200, in which case a power interface for the external power supply is provided in the display device 200.

The user interface 265 is used to receive user input signals and then send the received user input signals to the controller 250. The user input signal may be a remote control signal received through the infrared receiver, and various user control signals may be received through the network communication module.

In some embodiments, the user inputs a user command through the control device or the mobile terminal, the user input interface receives the user's input, and the display device 200 responds to the input through the controller 250.

In some embodiments, the user may input a user command through a graphical user interface (GUI) displayed on the display 275, and the user input interface receives the command through the GUI. Alternatively, the user may input a command by producing a specific sound or gesture, and the user input interface receives the command by recognizing the sound or gesture through a sensor.

In some embodiments, a "user interface" is a medium for interaction and information exchange between an application program or operating system and the user; it converts between the internal form of information and a form acceptable to the user. A common form of user interface is the graphical user interface (GUI), which refers to a user interface related to computer operations that is displayed graphically. It may consist of interface elements such as icons, windows, and controls displayed on the screen of an electronic device, where the controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.

The memory 260 stores various software modules for driving the display device 200. For example, the various software modules stored in the first memory include at least one of a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules.

The basic module is a low-level software module used for signal communication between the various hardware components in the display device 200 and for sending processing and control signals to upper-layer modules. The detection module is a management module used to collect various information from sensors or user input interfaces and to perform digital-to-analog conversion, analysis, and management.

For example, the voice recognition module includes a voice parsing module and a voice instruction database module. The display control module is a module used to control the display to show image content, and can be used to play multimedia image content, UI interfaces, and other information. The communication module is a module for control and data communication with external devices. The browser module is a module for performing data communication between browsing servers. The service modules are modules for providing various services and various applications. The memory 260 also stores received external data and user data, images of the various items in the various user interfaces, visual effect diagrams of focus objects, and the like.

FIG. 3 exemplarily shows a block diagram of the configuration of the control device 1001 according to an exemplary embodiment. As shown in FIG. 3, the control device 1001 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.

The control device 1001 is configured to control the display device 200, receive the user's input operation instructions, and convert those instructions into instructions that the display device 200 can recognize and respond to, acting as an intermediary in the interaction between the user and the display device 200. For example, when the user operates the channel up/down keys on the control device 1001, the display device 200 responds to the channel up/down operation.

In some embodiments, the control device 1001 may be a smart device. For example, the control device 1001 can install various applications for controlling the display device 200 according to user requirements.

In some embodiments, as shown in FIG. 1, the mobile terminal 1002 or another smart electronic device can perform functions similar to those of the control device 1001 after installing an application for operating the display device 200. For example, by installing the application, the user can use the various function keys or virtual buttons of the graphical user interface provided on the mobile terminal 1002 or other smart electronic device to realize the functions of the physical keys of the control device 1001.

The controller 110 includes a processor 112, a RAM 113, a ROM 114, a communication interface 130, and a communication bus. The controller is used to control the running and operation of the control device 1001, the communication and cooperation between its internal components, and external and internal data processing functions.

The communication interface 130, under the control of the controller 110, communicates control signals and data signals with the display device 200, for example by sending received user input signals to the display device 200. The communication interface 130 may include at least one of a WiFi chip 131, a Bluetooth module 132, an NFC module 133, and other near-field communication modules.

In the user input/output interface 140, the input interface includes at least one of a microphone 141, a touch panel 142, a sensor 143, a key 144, and other input interfaces. For example, the user can input user instructions through voice, touch, gesture, pressing, and other actions; the input interface converts the received analog signal into a digital signal, converts the digital signal into a corresponding instruction signal, and sends it to the display device 200.

The output interface includes an interface that sends the received user instructions to the display device 200. In some embodiments, it may be an infrared interface or a radio frequency interface. For example, with an infrared signal interface, the user input instruction must be converted into an infrared control signal according to the infrared control protocol and sent to the display device 200 through the infrared transmitting module. With a radio frequency signal interface, the user input instruction must be converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then sent to the display device 200 through the radio frequency transmitting terminal.
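The conversion of a user instruction into an infrared control signal can be sketched using the widely used NEC IR framing as a concrete example; the embodiment does not name a specific protocol, so the timings and frame layout below are an assumption for illustration:

```python
def encode_nec_frame(address: int, command: int):
    """Encode an 8-bit address and command into a list of (mark, space)
    durations in microseconds, following NEC IR framing: a 9 ms leader,
    then address, inverted address, command, inverted command, sent
    least-significant bit first. Nominal NEC timings are used; the stop
    bit and repeat frames of a real remote are omitted for brevity.
    """
    BIT_MARK = 562                 # carrier burst preceding every bit
    ZERO_SPACE, ONE_SPACE = 562, 1687
    frame = [(9000, 4500)]         # leader: 9 ms mark, 4.5 ms space
    payload = [address, address ^ 0xFF, command, command ^ 0xFF]
    for byte in payload:
        for i in range(8):         # least-significant bit first
            bit = (byte >> i) & 1
            frame.append((BIT_MARK, ONE_SPACE if bit else ZERO_SPACE))
    return frame
```

The infrared transmitting module would then drive the IR LED with a 38 kHz carrier during each mark interval and keep it off during each space.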

In some embodiments, the control device 1001 includes at least one of the communication interface 130 and the input/output interface 140.

The memory 190 is used to store, under the control of the controller, the various operating programs, data, and applications that drive and control the control device 1001. The memory 190 can store various control signal instructions input by the user.

The power supply 180 is used to provide operating power to the components of the control device 1001 under the control of the controller, and may be a battery and its related control circuit.

In some embodiments, the system may include a kernel, a command parser (shell), a file system, and application programs. Together, the kernel, shell, and file system make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel starts, activates the kernel space, abstracts the hardware, initializes hardware parameters, and runs and maintains virtual memory, the scheduler, signals, and inter-process communication (IPC). After the kernel starts, the shell and user applications are loaded. When launched, an application is compiled into machine code, forming a process.

FIG. 4 is a schematic diagram of the software system of the display device provided by the present application. Referring to FIG. 4, in some embodiments the system is divided into four layers, which from top to bottom are the Applications layer ("application layer"), the Application Framework layer ("framework layer"), the Android runtime and system library layer ("system runtime layer"), and the kernel layer.

In some embodiments, at least one application program runs in the application layer. These applications may be programs that come with the operating system, such as a window program, a system settings program, a clock program, or a camera application; they may also be applications developed by third-party developers, such as the Hijian program, a karaoke program, or the magic mirror program. In specific implementations, the application packages in the application layer are not limited to the above examples and may include other application packages; the embodiments of the present application do not limit this.

The framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The framework layer includes some predefined functions and is equivalent to a processing center that decides what actions the applications in the application layer take. Through the API, an application can access system resources and obtain system services during execution.

As shown in FIG. 4, in the embodiments of the present application the framework layer includes managers (Managers), content providers (Content Provider), and the like, where the managers include at least one of the following modules: an activity manager (Activity Manager) for interacting with all activities running in the system; a location manager (Location Manager) for providing system services or applications with access to the system location services; a package manager (Package Manager) for retrieving various information related to the application packages currently installed on the device; a notification manager (Notification Manager) for controlling the display and clearing of notification messages; and a window manager (Window Manager) for managing the icons, windows, toolbars, wallpapers, and desktop widgets on the user interface.

In some embodiments, the activity manager is used to manage the life cycle of each application and the usual navigation and back functions, such as controlling the exit of an application (including switching the user interface currently shown in the display window to the system desktop), opening an application, and going back (including switching the user interface currently shown in the display window to the user interface one level above it).

In some embodiments, the window manager is used to manage all window programs, for example obtaining the display size, determining whether there is a status bar, locking the screen, capturing the screen, and controlling changes to the display window (for example shrinking the window, shaking it, or distorting it).

In some embodiments, the system runtime layer supports the layer above it, that is, the framework layer. When the framework layer is used, the Android operating system runs the C/C++ libraries contained in the system runtime layer to implement the functions required by the framework layer.

In some embodiments, the kernel layer is the layer between hardware and software. As shown in FIG. 4, the kernel layer contains at least one of the following drivers: an audio driver, a display driver, a Bluetooth driver, a camera driver, a WiFi driver, a USB driver, an HDMI driver, and sensor drivers (for example for a fingerprint sensor, temperature sensor, touch sensor, or pressure sensor).

In some embodiments, the kernel layer further includes a power driver module for power management.

In some embodiments, the software programs and/or modules corresponding to the software architecture in FIG. 4 are stored in the first memory or the second memory shown in FIG. 2 or FIG. 3.

In some embodiments, taking a magic mirror application (a photographing application) as an example, when the remote control receiving device receives an input operation from the remote control, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the input operation into a raw input event (including information such as the value of the input operation and its timestamp). Raw input events are stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies, according to the current position of the focus, the control corresponding to the input event. Assuming the input operation is a confirmation operation and the control corresponding to the confirmation operation is the icon control of the magic mirror application, the magic mirror application calls the interface of the application framework layer to start the magic mirror application, and then starts the camera driver by calling the kernel layer, so as to capture still images or video through the camera.

In some embodiments, for a display device with a touch function, taking a split-screen operation as an example, the display device receives an input operation (such as a split-screen operation) performed by the user on the display screen, and the kernel layer generates a corresponding input event according to the input operation and reports the event to the application framework layer. The activity manager of the application framework layer sets the window mode (such as a multi-window mode) as well as the window position and size corresponding to the input operation. The window management of the application framework layer draws the windows according to the settings of the activity manager, and then sends the drawn window data to the display driver of the kernel layer, and the display driver displays the corresponding application interfaces in different display areas of the display screen.

In some embodiments, FIG. 5 is a schematic diagram of application programs that can be provided by the display device of the present application. As shown in FIG. 5, the application layer contains at least one application program whose corresponding icon control can be displayed on the display, such as a live TV application icon control, a video-on-demand application icon control, a media center application icon control, an application center icon control, a game application icon control, and the like.

In some embodiments, the live TV application may provide live television from different signal sources. For example, the live TV application may provide a television signal using input from cable television, over-the-air broadcast, satellite services, or other types of live television services. In addition, the live TV application may display the video of the live television signal on the display device 200.

In some embodiments, the video-on-demand application may provide videos from different storage sources. Unlike the live TV application, video on demand provides the display of videos from certain storage sources. For example, video on demand may come from the server side of cloud storage, or from local hard disk storage containing stored video programs.

In some embodiments, the media center application may play various kinds of multimedia content. For example, the media center, unlike live TV or video on demand, may provide services through which the user can access various images or audio via the media center application.

In some embodiments, the application center may store various application programs. An application program may be a game, an application, or some other application that is related to a computer system or another device but can run on a smart TV. The application center may obtain these applications from different sources, store them in local storage, and then run them on the display device 200.

More specifically, in some embodiments, any of the aforementioned display devices 200 of the present application may have a voice interaction function, so as to improve the intelligence of the display device 200 and improve the user experience of the display device 200.

In some embodiments, FIG. 6 is a schematic diagram of an application of the display device in a voice interaction scenario, in which user 1 can speak an instruction that he or she wishes the display device 200 to execute. The display device 200 can collect voice data in real time, recognize the instruction of user 1 included in the voice data, and directly execute the instruction once it has been recognized. During the whole process, user 1 does not actually operate the display device 200 or any other device, but simply speaks the instruction.

In some embodiments, when the display device 200 shown in FIG. 2 is applied in the scenario shown in FIG. 6, the display device 200 can collect voice data in real time through its sound collector 231. The sound collector 231 then sends the collected voice data to the controller 250, and the controller 250 finally recognizes the instructions included in the voice data.

In some embodiments, FIG. 7 is a schematic flowchart of the display device applied in a voice interaction scenario, which can be executed by the devices in the scenario shown in FIG. 6. Specifically, in S11, the sound collector 231 in the display device 200 collects, in real time, voice data in the surrounding environment where the display device 200 is located, and sends the collected voice data to the controller 250 for recognition.

In some embodiments, in S12 shown in FIG. 7, after receiving the voice data, the controller 250 recognizes the instruction included in the voice data. For example, if the voice data includes an "increase brightness" instruction spoken by user 1, the controller 250, after recognizing the instruction included in the voice data, can execute the recognized instruction and control the display 275 to increase its brightness. It can be understood that in this case the controller 250 recognizes every piece of received voice data, and some voice data may turn out to contain no instruction at all.

In other embodiments, since the instruction recognition model is relatively large and computationally expensive, it may also be stipulated that user 1 adds a keyword, such as "ABCD", before speaking an instruction. The user then needs to say "ABCD, increase brightness", so that in S12 shown in FIG. 7, after receiving the voice data, the controller 250 first checks whether each piece of voice data contains the keyword "ABCD", and only after the keyword is detected does it use the instruction recognition model to recognize the specific instruction corresponding to "increase brightness" in the voice data.
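The two-stage check described above, a lightweight keyword spotter gating a heavier instruction recognizer, can be sketched roughly as follows. The function names, the "ABCD" wake keyword, and the tiny command set are illustrative assumptions for this sketch, not part of any real device API:

```python
from typing import Optional

# Hypothetical sketch of the two-stage recognition pipeline: a cheap keyword
# check runs first, and the expensive instruction model runs only if it passes.

def contains_keyword(utterance: str, keyword: str = "ABCD") -> bool:
    """First stage: lightweight check for the agreed wake keyword."""
    return keyword in utterance

def recognize_instruction(utterance: str) -> Optional[str]:
    """Second stage: stand-in for the larger instruction recognition model."""
    known_instructions = ["increase brightness", "decrease brightness"]
    for instruction in known_instructions:
        if instruction in utterance:
            return instruction
    return None

def handle_voice_data(utterance: str) -> Optional[str]:
    if not contains_keyword(utterance):
        return None  # skip the expensive model when the keyword is absent
    return recognize_instruction(utterance)
```

Under this sketch, "ABCD, increase brightness" yields an instruction, while a plain "increase brightness" without the keyword is ignored by the first stage.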

In some embodiments, after receiving the voice data, the controller 250 may further denoise the voice data, including removing echo and ambient noise, to obtain clean voice data, and then recognize the processed voice data.

In some embodiments, FIG. 8 is a schematic diagram of another application of the display device in a voice interaction scenario, in which the display device 200 can be connected to the server 400 through the Internet. After the display device 200 collects voice data, it can send the voice data to the server 400 through the Internet; the server 400 recognizes the instruction included in the voice data and sends the recognized instruction back to the display device 200, so that the display device 200 can directly execute the received instruction. Compared with the scenario shown in FIG. 6, this scenario reduces the requirement on the computing power of the display device 200, and a larger recognition model can be deployed on the server 400 to further improve the accuracy of instruction recognition in the voice data.

In some embodiments, when the display device 200 shown in FIG. 2 is applied in the scenario shown in FIG. 8, the display device 200 can collect voice data in real time through its sound collector 231. The sound collector 231 then sends the collected voice data to the controller 250, and the controller 250 sends the voice data to the server 400 through the communicator 220. After the server 400 recognizes the instruction included in the voice data, the display device 200 receives, through the communicator 220, the instruction sent by the server 400, and finally the controller 250 executes the received instruction.

In some embodiments, FIG. 9 is another schematic flowchart of the display device applied in a voice interaction scenario, which can be executed by the devices in the scenario shown in FIG. 8. In S21, the sound collector 231 in the display device 200 collects, in real time, voice data in the surrounding environment where the display device 200 is located, and sends the collected voice data to the controller 250. In S22, the controller 250 further sends the voice data to the server 400 through the communicator 220. In S23, the server recognizes the instruction included in the voice data. Then, in S24, the server 400 sends the recognized instruction back to the display device 200. Correspondingly, the display device 200 receives the instruction through the communicator 220 and sends it to the controller 250, and finally the controller 250 can directly execute the received instruction.

In some embodiments, in S23 shown in FIG. 9, after receiving the voice data, the server 400 recognizes the instruction included in the voice data. For example, the voice data includes the "increase brightness" instruction spoken by user 1. However, since the instruction recognition model is large and the server 400 recognizes every piece of received voice data, some voice data may turn out to contain no instruction. Therefore, in order to reduce invalid recognition by the server 400 and to reduce the amount of communication data exchanged between the display device 200 and the server 400, it may also be stipulated in a specific implementation that user 1 adds a keyword, such as "ABCD", before speaking an instruction, so that the user needs to say "ABCD, increase brightness". Then, in S22, the controller 250 of the display device 200 first uses a keyword recognition model, which is smaller and computationally cheaper, to detect whether the keyword "ABCD" exists in the voice data. If no keyword is recognized in the voice data currently being processed by the controller 250, the controller 250 does not send that voice data to the server 400; if a keyword is recognized in the voice data currently being processed, the controller 250 sends the whole voice data, or the part of the voice data after the keyword, to the server 400, and the server 400 recognizes the received voice data. Since the voice data received by the controller 250 in this case includes the keyword, the voice data sent to the server 400 for recognition is also more likely to include the user's instruction, which reduces both the invalid recognition computation on the server 400 and the invalid communication between the display device 200 and the server 400.
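The client-side choice described above, forwarding either the whole utterance or only the portion after the keyword, can be sketched as follows. The function name and the stripping of separating punctuation are illustrative assumptions:

```python
from typing import Optional

def extract_server_payload(utterance: str, keyword: str = "ABCD") -> Optional[str]:
    """Return the part of the utterance after the keyword, or None when the
    keyword is absent (in which case nothing is sent to the server)."""
    index = utterance.find(keyword)
    if index == -1:
        return None  # no keyword: skip the round trip to the server entirely
    # Drop the keyword plus any separating spaces/commas before the command.
    return utterance[index + len(keyword):].lstrip(" ,")
```

In this sketch, "ABCD, increase brightness" is trimmed to "increase brightness" before being sent, while an utterance without the keyword produces no server traffic at all.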

In some embodiments, in order for the display device 200 to have the function of recognizing instructions in voice data in a specific scenario as shown in FIG. 6, or the function of recognizing keywords in voice data in a specific scenario as shown in FIG. 6 or FIG. 8, the supplier of the voice interaction function of the display device 200 also needs to build machine learning models, such as deep learning models like TextCNN or Transformer, that can be used to recognize instructions or keywords, store these models in the display device 200, and have the display device 200 use them when performing recognition.

In some embodiments, FIG. 10 is a schematic diagram of a supplier of recognition models delivering a recognition model, in which, after the server 400 set up by the supplier obtains the recognition model (which may be an instruction recognition model or a keyword recognition model), it can send the recognition model to each display device 200. The process shown in FIG. 10 may be performed when the display device 200 is manufactured, with the server 400 sending the recognition model to each display device 200; alternatively, after the display device 200 is put into use, the server 400 may send the recognition model to the display device 200 through the Internet.

In some embodiments, the server 400 may obtain the recognition model by collecting voice data and learning based on a machine learning model. For example, FIG. 11 is a schematic flowchart of how the server 400 obtains the recognition model. In S31, each display device (taking display device 1 to display device N, N in total, as an example) collects voice data 1-N, and in S32 sends the collected voice data 1-N to the server 400. Subsequently, in S33, the supplier's staff can manually annotate each piece of voice data with the instructions or keywords it contains, and then feed the voice data itself together with the corresponding annotation information into the machine learning model for the server to learn from. When the learned recognition model is subsequently used, upon input of a piece of voice data to be recognized, the recognition model compares that voice data with the voice data it has learned and outputs a probability for each annotation label; the annotation label with the highest probability can then be taken as the recognition result for the voice data to be recognized. In S34, the server 400 can send the computed recognition model to each display device.
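The final decision step described above, where the model outputs a probability per annotation label and the highest one becomes the recognition result, reduces to an argmax over the label probabilities. The probability table below is fabricated purely for illustration:

```python
def pick_label(label_probs: dict) -> str:
    """Return the annotation label with the highest output probability."""
    return max(label_probs, key=label_probs.get)

# Fabricated model output for one piece of voice data to be recognized.
example_output = {
    "increase brightness": 0.82,
    "decrease brightness": 0.11,
    "no instruction": 0.07,
}
```

With this fabricated output, `pick_label(example_output)` selects "increase brightness" as the recognition result.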

In some embodiments, instead of computing the recognition model from the voice data actually collected by display devices 1-N as in the embodiment shown in FIG. 11, the server 400 may directly have staff input different pieces of voice data together with the annotation information of each piece of voice data, and send the recognition model to each display device after it has been computed.

In some embodiments, the process shown in FIG. 11 in which display devices 1-N collect voice data and send it to the server, and the process in which the server sends the computed recognition model to display devices 1-N, may be two independent processes. That is, in S32 the server receives voice data collected by N display devices, while in S34 the server may send the trained recognition model to another N display devices. The N display devices in these two processes may be the same, different, or partially the same.

In some embodiments, since the number of samples used in obtaining the recognition model is limited, the recognition model deployed on the display device 200 cannot achieve completely accurate recognition. Therefore, the supplier can also use the server 400 to collect, at any time, the voice data gathered by each display device 200 during actual use, and update the already obtained recognition model according to the collected voice data, so as to further improve the recognition accuracy of the recognition model.

For example, FIG. 12 is a schematic flowchart of the server updating the recognition model. It can be understood that before the embodiment shown in FIG. 12 is executed, a recognition model has been set in each display device in the manner shown in FIG. 10. In S31 shown in FIG. 12, each display device (taking display device 1 to display device N, N in total, as an example) collects voice data 1-N, and in S32 sends the collected voice data 1-N to the server 400. Subsequently, in S33, the supplier's staff can manually annotate each piece of voice data with the instructions or keywords it contains, and then feed the voice data itself together with the corresponding annotation information into the machine learning model, and the server updates the already computed recognition model according to the newly received voice data. In S34, the server 400 can resend the updated recognition model to each display device 200, so that each display device 200 can use the updated recognition model. For any one of the N display devices, since the new learning model uses the voice data collected by that display device 200, the accuracy of subsequent recognition of the voice data collected by that display device 200 can be effectively improved.

In some embodiments, each display device shown in FIG. 12 may send the voice data to the server as soon as it is received; or send the voice data collected within a fixed period of time to the server after that period ends; or send the voice data to the server in a batch after a certain amount has been collected; or send the received voice data to the server according to an instruction from the user of the display device or from the server's staff.

In some embodiments, the N display devices shown in FIG. 12 may send voice data to the server simultaneously at an agreed moment, and the server updates the recognition model according to the N pieces of voice data received; alternatively, the N display devices may each send voice data to the server separately, and once the number of pieces of voice data received is greater than N, the server may start updating the recognition model according to the received voice data.

When building a voice interaction system, in addition to the speech recognition and text-to-speech modules, four layers of processing capability are required. The first is the basic feature processing layer, which mainly includes word segmentation, semantic label annotation, emotion recognition, and so on. The second is the intent understanding layer, which converts the user's question into a machine-understandable, structured representation of the user's intent; at this point there may be multiple possible user intents. The third is the dialogue management layer, which further clarifies the user's intent based on context information and updates the dialogue state, so as to make decisions according to the dialogue strategy; this process calls the data services (such as obtaining media asset information) and response schemes provided by the fourth layer, the business service layer. For example, for a search on the same title, if the music search results are clearly better than the film and television results, the voice interaction system, after obtaining the feedback information from the business service layer, will return the music search results to the user.

In some implementations, after receiving the voice input by the user, the display device generally first performs speech recognition on the voice, determines the user's intent through a semantic understanding engine, and then provides the user with relevant services according to the user's intent. The current semantic understanding engine performs multi-round interaction and ellipsis completion in specific scenarios through limited-domain multi-round modeling and coreference resolution modeling, while using a central-control decision module based on a machine learning model to comprehensively rank the results of the different semantic processing modules and complete the positioning of the user's intent. However, in more general domains, factors such as the dialogue history and the scenario the user is in are not considered, and in the face of cross-service decisions or ambiguous user utterances, accurate intent understanding and positioning decisions cannot be made.

In order to solve the above technical problem, an embodiment of the present application provides a display device. When a sentence input by the user corresponds to two or more candidate user intents, the display device actively feeds back a query sentence to the user in an anthropomorphic interactive manner, and then determines the user's true intent according to the response sentence input by the user, which can effectively improve the accuracy of user intent understanding during voice interaction. Detailed embodiments are described below.

Based on the display device 200 described in the above embodiments, in a feasible implementation, the voice collection apparatus of the display device 200 may be a microphone array corresponding to the display device 200, or the voice collection apparatus may be a microphone on the control apparatus corresponding to the display device 200.

In this embodiment, after collecting the user's voice data, the above voice collection apparatus sends the collected voice data to the audio processor in the display device, and the audio processor preprocesses the voice data and then sends it to the controller in the display device.

After receiving the first voice data input by the user, the above controller determines, based on an intent recognition model, the candidate user intents corresponding to the first voice data.

In some embodiments, the above intent recognition model can classify a sentence or query into the corresponding intent category. Different user intents correspond to different domain dictionaries, such as book titles, song titles, product names, and so on. The judgment can be made according to the degree of matching or overlap between the query and the dictionaries: the query is assigned to the domain whose dictionary it overlaps with most.

For example, when the query input by the user is "Song of the Sunset", the intent of the query belongs to the music intent; when the query input by the user is "news broadcast", the intent of the query belongs to the news search intent.
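The dictionary-overlap judgment described above can be sketched as follows. The tiny domain dictionaries and the count-of-matching-terms score are illustrative stand-ins for real lexicons and matching logic:

```python
from typing import Optional

# Illustrative stand-ins for real domain dictionaries (song titles, news
# program names, and so on).
DOMAIN_DICTS = {
    "music": {"Song of the Sunset", "song"},
    "news": {"news broadcast", "news"},
}

def classify_query(query: str) -> Optional[str]:
    """Assign the query to the domain whose dictionary overlaps it the most."""
    scores = {
        domain: sum(1 for term in terms if term in query)
        for domain, terms in DOMAIN_DICTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With these toy dictionaries, "play Song of the Sunset" lands in the music domain and "watch the news broadcast" in the news domain; a query matching no dictionary yields no domain.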

In some embodiments, the query input by the user may correspond to multiple user intents. For example, when the user queries "Resident Evil", since "Resident Evil" has not only related games but also related movies, in order to locate the user's intent more accurately, in a feasible implementation of the present application, when the query input by the user corresponds to multiple user intents, a query sentence is generated according to the multiple candidate user intents that have been determined, such as "Do you want to watch the movie, or play the game?". The query sentence is then sent to the display screen for display and/or to the speaker for playback, and the user's voice data continues to be collected.

Exemplarily, when the user's voice data subsequently collected is "watch the movie", movies related to "Resident Evil" are searched for and pushed to the display screen for display or playback; when the user's voice data subsequently collected is "play the game", games related to "Resident Evil" are searched for and pushed to the display screen for display, or the game is started directly.
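Generating the query sentence from two or more candidate intents, as in the "Resident Evil" example, can be sketched as follows. The wording template is an illustrative assumption; a real system would use richer language generation:

```python
from typing import List, Optional

def build_query_sentence(candidate_intents: List[str]) -> Optional[str]:
    """Turn multiple candidate user intents into a clarifying question."""
    if len(candidate_intents) < 2:
        return None  # a single candidate needs no clarification
    head = ", ".join(candidate_intents[:-1])
    return f"Do you want the {head} or the {candidate_intents[-1]}?"
```

For the candidates ["movie", "game"], this produces "Do you want the movie or the game?", while a single unambiguous candidate produces no question.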

That is, when facing cross-service decisions or ambiguous user intent, the display device provided in the embodiments of the present application actively conducts voice interaction with the user in an anthropomorphic interactive manner, and then determines the user's true intent according to the conversation with the user, which can save the user's search clicks, shorten the search time, and greatly improve the accuracy of user intent understanding during voice interaction.

Based on the descriptions in the above embodiments, in some embodiments of the present application, after receiving the first voice data, the controller of the above display device may also obtain the historical user intents determined during voice interaction, and then use the intent recognition model to determine the candidate user intents corresponding to the first voice data according to the first voice data and the above historical user intents. For example, for the same query sentence, if the user has been searching for news media assets for a period of time, the query should be understood as a news search service rather than an encyclopedia service.

The above intent recognition model can handle the recognition of the user's generic replies to central-control actions, such as recognizing generic replies like "the first one", "the movie", and "I don't want to watch this". The processing steps of the whole model are as follows:

1. Query normalization: traverse the candidate user intention set and replace any intention names that appear in the query. Intention names here support a degree of generalization; for example, a music search may be phrased as "listen to a song" or "songs". On top of the intention generalizations used in language generation, this data adds some more concise, colloquial variants.

2. Intention parsing: traverse the set of general intention classification rules predefined in the database (each rule contains the rule content, the matching method, the intention decision result, and a priority; rules are matched in priority order). When a rule matches, its intention decision result is added to the final parsing result, the matched part of the query is replaced with a placeholder, and the remaining rules are traversed until all rules have been processed.

3. Result verification: based on the completeness of the parse, determine whether the intention recognition result can be output. If it cannot, the result object is set to null.
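Steps 1 to 3 above can be sketched as follows. The generalization table and rule set below are illustrative assumptions only; the embodiment leaves the actual rule contents, matching methods, and priorities to the database.

```python
import re

# Hypothetical generalization table and rule set (pattern, match mode,
# intention decision result, priority); the real contents live in the database.
GENERALIZATIONS = {"listen to a song": "music_search", "songs": "music_search"}
RULES = [
    (r"the first one", "regex", "select_index_1", 10),
    (r"\bmovie\b", "regex", "prefer_movie", 5),
    (r"music_search", "regex", "prefer_music", 5),
]

def parse_intent(query):
    # Step 1: query normalization - canonicalize generalized intention phrasings.
    for phrase, intent_name in GENERALIZATIONS.items():
        query = query.replace(phrase, intent_name)
    # Step 2: traverse rules in priority order; on a match, record the decision
    # and blank out the matched span with a placeholder before later rules run.
    results = []
    for pattern, _mode, decision, _prio in sorted(RULES, key=lambda r: -r[3]):
        m = re.search(pattern, query)
        if m:
            results.append(decision)
            query = query[:m.start()] + "<SLOT>" + query[m.end():]
    # Step 3: result verification - return None (null) if nothing parsed.
    return results if results else None
```

The placeholder replacement in step 2 prevents one span of the query from matching two different rules, which is why lower-priority rules see a partially blanked-out query.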

For a better understanding of the present application, this embodiment assumes that the query input by the user is "春暖花开" ("spring blossoms"), for which both a related TV series and a related song exist. When the historical user intention determined during voice interaction is "music", the user can be assumed to be currently listening to music, so the candidate user intention corresponding to the query is determined to be "music"; when the historical user intention is "TV series", the user can be assumed to be currently watching a TV series, so the candidate user intention corresponding to the query is determined to be "TV series".

In some feasible implementations, the above historical user intention may be the user intention determined in the previous round of voice interaction.

That is, by combining the historical user intention determined during voice interaction to locate the user intention of the currently received voice data, the display device provided in the embodiments of the present application helps locate the user's true intention quickly.
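A minimal sketch of this history-based disambiguation follows; the ambiguity table and the intention labels are hypothetical stand-ins for the intention recognition model's output.

```python
# Hypothetical table of queries that map to more than one candidate intention.
AMBIGUOUS_QUERIES = {"春暖花开": ["music", "tv_series"]}

def candidate_intents(query, history_intent=None):
    candidates = AMBIGUOUS_QUERIES.get(query, [])
    # If the previous round's intention is among the candidates, prefer it:
    # the user is assumed to still be in the same mode (listening vs. watching).
    if history_intent in candidates:
        return [history_intent]
    return candidates
```

With no usable history, the full candidate list is returned and the query-statement flow described earlier takes over.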

In some embodiments of the present application, after receiving the first voice data and acquiring the historical user intention from the voice interaction process, the controller of the display device determines, based on a dialogue state tracking model, whether the first voice data and the historical user intention belong to the same dialogue sequence.

The dialogue state tracking model includes a session segmentation module and a dialogue state update module. The session segmentation module determines whether the current user reply should open a new session; segmentation can be based on time interval or on text semantics. Time-interval-based segmentation rests on the assumption that the user's intention does not change within a short period and, through analysis of user logs, formulates different segmentation criteria depending on the service of the previous round. Text-semantics-based segmentation focuses on handling speech recognition errors, insufficient semantic parsing capability, and strong correlations between services, and segments by computing text similarity and an inter-service correlation matrix.

In some implementations, when the interval between the reception time of the current first voice data and the determination time of the historical user intention is greater than a preset duration threshold, the currently received first voice data and the historical user intention can be considered not to belong to the same dialogue sequence; otherwise, they belong to the same dialogue sequence.

For example, suppose a user chooses to play a song through voice interaction and, after some time, wakes the voice assistant again and provides voice input. Understandably, when the interval is short, the probability that the user still wants to listen to music is relatively high; the user most likely wants to play another song, so the currently input voice and the user intention determined in the last voice interaction can be considered to belong to the same dialogue sequence. Conversely, when the interval is long, the probability that the user still wants to listen to music is relatively low; the user may want to switch to another form of entertainment, such as watching TV, so the currently input voice and the previously determined user intention can be considered not to belong to the same dialogue sequence.

As an example, suppose the user wakes the voice assistant and says "play 夕阳之歌 (Song of the Sunset)"; the display device recognizes the user's intention as music and plays the song. After some time, the user wakes the voice assistant again and says "春暖花开". When the interval is less than the preset duration threshold, "play 夕阳之歌" and "春暖花开" can be considered to belong to the same dialogue sequence, and the user's intention is music; when the interval is greater than or equal to the threshold, they can be considered not to belong to the same dialogue sequence, and the user's intention may be either music or a TV series.

In some implementations, when the similarity between the text semantics parsed from the current first voice data and the text semantics of the historical user intention is less than a preset similarity threshold, the currently received first voice data and the historical user intention can be considered not to belong to the same dialogue sequence; otherwise, they belong to the same dialogue sequence.

For example, suppose a user plays a song through voice interaction and later wakes the voice assistant again with new voice input. If the similarity between the parsed text semantics of the new input and the text semantics of the user intention determined in the previous round is high, the user's intention can be considered unchanged, and the current input and the previously determined user intention belong to the same dialogue sequence. Conversely, if the similarity is low, the user's intention can be considered to have changed, and the current input and the previously determined user intention do not belong to the same dialogue sequence.

As an example, suppose the user wakes the voice assistant and says "play 夕阳之歌 (Song of the Sunset)"; the display device recognizes the intention as music and plays the song. After some time, the user wakes the assistant again and says "play a game". Since the similarity between the current input "play a game" and the user intention "music" determined in the previous round is low, "play 夕阳之歌" and "play a game" can be considered not to belong to the same dialogue sequence.
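The two segmentation criteria can be sketched as follows. The threshold values, the word-overlap stand-in for text-semantics similarity, and the choice to require both criteria together are assumptions for illustration; the embodiment describes each criterion as usable on its own.

```python
DURATION_THRESHOLD = 120.0   # seconds; hypothetical value
SIMILARITY_THRESHOLD = 0.5   # hypothetical value

def text_similarity(a, b):
    # Crude word-overlap stand-in for the text-semantics similarity above.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def same_dialog_sequence(now, history_time, query_text, history_text):
    # Time-interval criterion: a long gap opens a new dialogue sequence.
    if now - history_time > DURATION_THRESHOLD:
        return False
    # Text-semantics criterion: low similarity opens a new dialogue sequence.
    return text_similarity(query_text, history_text) >= SIMILARITY_THRESHOLD
```

When `same_dialog_sequence` returns False, the dialogue state update module would create a new dialogue state rather than update the existing one.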

The dialogue state update module is configured to create or update the current user's dialogue state according to the segmentation result of the session segmentation module.

In this embodiment, when the first voice data and the historical user intention are determined to belong to the same dialogue sequence, the candidate user intention of the first voice data can be determined by combining the first voice data with the historical user intention; when they are determined not to belong to the same dialogue sequence, a new dialogue sequence is established, and the candidate user intention is determined based on the first voice data alone.

Based on the descriptions in the above embodiments, in some embodiments of the present application, each output module in a dialogue strategy learning model may be scored based on the candidate user intentions corresponding to the first voice data, and a discretized dialogue action is output according to the modules' scores to carry out dialogue interaction with the user.

The dialogue strategy learning model includes a rewriting module, a coreference resolution module, a vertical-domain intention analysis module, a task multi-turn response module, a question-answering module, a news search module, a chat module, a recommendation module, and a candidate intention analysis module.

In a feasible implementation, the following strategy can be adopted to select the output module:

Step 1: rank the output modules by score. If the score of the top-ranked output module is greater than or equal to a given threshold 1, go to step 4; otherwise, go to step 2.

Step 2: determine whether the score of the top-ranked output module is less than a given threshold 2, where threshold 2 is less than threshold 1. If so, output the default action; otherwise, go to step 3.

Step 3: if the score of the second-ranked output module is greater than or equal to threshold 2, and the difference between the scores of the top two output modules is less than a given threshold 3, output a select action that offers a choice between the top-ranked and second-ranked output modules; otherwise, output a confirm action that includes the top-ranked output module.

Step 4: determine whether the top-ranked module is a module other than the candidate intention analysis module. If so, directly output an inform action that includes the top-ranked output module; otherwise, go to step 5.

Step 5: rank the multiple candidate user intentions by score. If the highest-scoring candidate user intention has a score greater than or equal to a given threshold 4, or the candidate user intention is unique and its score is greater than or equal to a given threshold 5, directly output an inform action that includes the highest-scoring candidate user intention, indicating that it is the target user intention; otherwise, go to step 5.1.

Step 5.1: determine whether the score of the highest-scoring candidate user intention is less than a given threshold 6. If so, directly output the default action; otherwise, go to step 5.2.

Step 5.2: determine whether the score of the second-highest-scoring candidate user intention is greater than or equal to threshold 6. If it is, and the difference between the highest and second-highest scores is less than a given interval threshold, output a select action; otherwise, output a confirm action. The select action offers a choice between the highest- and second-highest-scoring candidate user intentions; the confirm action includes the highest-scoring candidate user intention.
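Steps 5 through 5.2 can be sketched as follows; the concrete values standing in for thresholds 4, 5, and 6 and for the interval threshold are hypothetical, since the embodiment leaves them open.

```python
# Hypothetical values for threshold 4, threshold 5, threshold 6, and the
# interval threshold used in steps 5-5.2.
T4, T5, T6, GAP = 0.8, 0.6, 0.3, 0.1

def choose_action(candidates):
    """candidates: list of (intention, score) pairs, following steps 5-5.2."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top_intent, top_score = ranked[0]

    # Step 5: a confident (or unique and good enough) answer -> inform.
    if top_score >= T4 or (len(ranked) == 1 and top_score >= T5):
        return ("inform", top_intent)

    # Step 5.1: the best candidate is too weak to act on -> default.
    if top_score < T6:
        return ("default", None)

    # Step 5.2: two close, plausible candidates -> select; otherwise confirm.
    if len(ranked) > 1:
        second_intent, second_score = ranked[1]
        if second_score >= T6 and top_score - second_score < GAP:
            return ("select", (top_intent, second_intent))
    return ("confirm", top_intent)
```

The select action would then be rendered as a query statement offering both intentions, while confirm asks the user to verify the single top intention.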

By adding a general intention understanding model, a dialogue state tracking model, and a dialogue strategy learning model, and by generating corresponding system replies from system dialogue actions, the display device provided in the embodiments of the present application supports the voice interaction system in interacting with the user proactively through a rich set of dialogue action types and in locating the user's true intention, which can effectively improve the user experience.

Based on the descriptions in the above embodiments, an embodiment of the present application further provides a voice interaction method. Referring to FIG. 13, which is a first schematic flowchart of a voice interaction method provided in an embodiment of the present application, in a feasible implementation the voice interaction method includes:

S1301: receive first voice data input by a user, and determine the candidate user intentions corresponding to the first voice data.

S1302: when the first voice data corresponds to multiple candidate user intentions, generate a query statement according to the multiple candidate user intentions, and feed the query statement back to the user.

The query statement is used to prompt the user to select one user intention from the multiple candidate user intentions.

S1303: receive second voice data input by the user, and determine, according to the second voice data, the target user intention corresponding to the first voice data from among the multiple candidate user intentions.

S1304: output associated information associated with the target user intention.

That is, in the voice interaction method provided by the present application, when the sentence input by the user corresponds to two or more candidate user intentions, the voice interaction system can proactively feed a query statement back to the user in an anthropomorphic interactive manner and then determine the user's true intention from the user's reply, which can effectively improve the accuracy of intention understanding during voice interaction.
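Steps S1301 to S1304 can be sketched end to end as follows. The candidate-intention lookup table and the `ask_user` callback are hypothetical stand-ins for the speech recognition and intention models and for the display's query prompt.

```python
# Hypothetical candidate-intention lookup; a real system would run ASR + NLU.
CANDIDATES = {"toy train": ["music", "movie"]}

def voice_interaction(first_voice, ask_user):
    # S1301: determine the candidate user intentions for the first voice data.
    candidates = CANDIDATES.get(first_voice, [])
    if len(candidates) == 1:
        return candidates[0]
    # S1302: multiple candidates - feed a query statement back to the user.
    second_voice = ask_user("Would you like " + " or ".join(candidates) + "?")
    # S1303: determine the target intention according to the second voice data.
    for intent in candidates:
        if intent in second_voice:
            return intent  # S1304 would then output its associated information
    return None
```

For example, `voice_interaction("toy train", ask_user)` asks "Would you like music or movie?" and resolves to whichever intention the reply names.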

For a better understanding of the embodiments of the present application, refer to FIG. 14, a second schematic flowchart of a voice interaction method provided in an embodiment of the present application.

In FIG. 14, assume that the voice data input by the user includes "toy train", and that both a related song and a related movie exist for "toy train". After receiving the voice data, the display device generates the query statement "Would you like to watch a movie or listen to music?". If the voice data subsequently input by the user includes "listen to music", songs related to "toy train" are queried, and "Searching for songs for you" is displayed on the display interface of the display device.

For a better understanding of the embodiments of the present application, refer to FIGS. 15a to 15d, which are schematic diagrams of voice interaction of a display device in an embodiment of the present invention.

When the user needs to interact with the display device 200 by voice, a voice wake-up instruction may be sent to the display device 200 in a preset wake-up mode. For example, in a feasible implementation, the user may speak a preset wake-up keyword, such as "Hi, Xiaoju", to the voice collection device of the display device 200. The voice collection device of the display device 200 sends the collected voice information to the controller, and the controller recognizes the received voice information; if the recognition result includes the wake-up keyword, the controller controls the display device to enter the voice interaction state.

In some embodiments, after the display device enters the voice interaction state, a reminder message may be displayed on the display screen to remind the user that the display device 200 has entered the voice interaction state. As shown in FIG. 15a, "How can I help you?" may be displayed on the screen.

After the display device enters the voice interaction state, the voice collection device starts collecting the voice information input by the user and sends it to the controller. For example, in a feasible implementation, the user may say "toy train" into the microphone of the display device 200; the microphone then sends the collected voice information to the controller. After receiving the voice information, the controller performs voice recognition on it and determines the corresponding candidate user intentions.

When the candidate user intentions corresponding to the voice information include both movie and music, a query statement is generated by the language generation module and displayed on the screen. As shown in FIG. 15b, "Would you like to watch a movie or listen to music?" may be displayed.

While the display device displays the query statement, the voice collection device continues collecting the voice information input by the user. When the voice information input by the user is "listen to music", songs related to "toy train" are looked up through the server; when it is "watch a movie", movies related to "toy train" are looked up through the server. As shown in FIG. 15c, after the user's input is determined to be "listen to music", the display interface may show: Searching for songs related to "toy train" for you…

After the display device completes the search task, the search results can be displayed on the screen. As shown in FIG. 15d, the screen displays: The following song was found for you: "toy train.MP3".

Referring to FIGS. 16a to 16d, which are schematic diagrams of another voice interaction of the display device in an embodiment of the present invention.

After the display device enters the voice interaction state, the voice collection device starts collecting the voice information input by the user and sends it to the controller. For example, in a feasible implementation, the user may say "Transformers" into the microphone of the display device 200; the microphone then sends the collected voice information to the controller. After receiving the voice information, the controller performs voice recognition on it and determines the corresponding candidate user intentions. Meanwhile, the voice recognition result is displayed on the display interface; as shown in FIG. 16a, "Transformers" may be displayed on the screen.

When the candidate user intentions corresponding to the voice information include both movie and commodity, a query statement is generated by the language generation module and displayed on the screen. As shown in FIG. 16b, "Would you like to watch a movie or go shopping?" may be displayed.

While the display device displays the query statement, the voice collection device continues collecting the voice information input by the user. When the voice information input by the user is "shopping", commodities related to "Transformers" are looked up through the server; when it is "watch a movie", movies related to "Transformers" are looked up through the server. As shown in FIG. 16c, after the user's input is determined to be "shopping", the display interface may show: Searching for commodities related to "Transformers" for you…

After the display device completes the search task, the search results can be displayed on the screen. As shown in FIG. 16d, the commodity links of the found commodities are displayed on the screen.

It can be understood that the voice interaction method described in the above embodiments may be executed by a server. For example, when the display device detects the user's input operation, it acquires the voice data input by the user and sends it to the server; after performing voice recognition on the voice data, the server determines the target user intention corresponding to the voice data and feeds the response information associated with that intention back to the display device.

In some implementations, the server may exchange data with the display device over a network; alternatively, the server may be integrated into the display device and exchange data with it through a communication bus in the display device.

In addition, the voice interaction method described in the above embodiments may also be executed by the display device. For example, when the display device detects the user's input operation, it acquires the voice data input by the user, performs voice recognition on it, determines the target user intention corresponding to the voice data, and sends the response information associated with that intention to the display screen for display.

It can be understood that the voice interaction method described in the above embodiments can be applied not only to the above display device but also to other electronic devices with voice interaction functions, such as smart speakers, smart home appliances, wearable devices, children's toys, and learning machines, which is not limited in the embodiments of the present application.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A display device, characterized in that the display device comprises:
the voice acquisition device is used for acquiring voice data;
the audio processor is used for processing the collected voice data;
a display screen for displaying an image;
a controller configured to:
receiving first voice data input by a user, and determining candidate user intentions corresponding to the first voice data;
when the first voice data corresponds to a plurality of candidate user intentions, generating an inquiry statement according to the candidate user intentions, and sending the inquiry statement to the display screen for displaying, wherein the inquiry statement is used for prompting a user to select one user intention from the candidate user intentions;
receiving second voice data input by the user, and determining a target user intention corresponding to the first voice data in the candidate user intentions according to the second voice data;
outputting association information associated with the target user intent.
2. The display device according to claim 1, wherein the controller is configured to:
when the first voice data corresponds to a single candidate user intention, outputting associated information associated with the candidate user intention.
3. The display device according to claim 1, wherein the controller is configured to:
acquiring historical user intentions determined in a voice interaction process;
and determining candidate user intentions corresponding to the first voice data by utilizing the intention recognition model according to the first voice data and the historical user intentions.
4. The display device according to claim 3, wherein the controller is configured to:
determining whether the first speech data and the historical user intent belong to the same dialog sequence based on a dialog state tracking model;
when the first voice data and the historical user intention belong to the same dialogue sequence, determining an initial user intention corresponding to the first voice data by using the intention recognition model, and updating the initial user intention corresponding to the first voice data according to the historical user intention to obtain a candidate user intention corresponding to the first voice data;
when the first voice data and the historical user intention do not belong to the same dialog sequence, determining candidate user intentions corresponding to the first voice data by using the intention recognition model.
5. The display device according to claim 1, wherein the controller is configured to:
determining scores of all output modules in a dialogue strategy learning model according to all candidate user intentions corresponding to the first voice data; wherein the dialogue strategy learning model comprises at least one of the following output modules: a rewriting module, a coreference resolution module, a vertical field intention analysis module, a task multi-turn response module, a question-answering module, a news search module, a chat module, a recommendation module and a candidate intention analysis module;
and when the output module with the highest score in the dialogue strategy learning model is the candidate intention analysis module, generating the inquiry statement according to the score of each candidate user intention.
6. The display device according to claim 5, wherein the controller is configured to:
when the score of a first candidate user intention corresponding to the first voice data is less than a first preset threshold, the score of a second candidate user intention corresponding to the first voice data is greater than a second preset threshold, and the difference between the score of the first candidate user intention and the score of the second candidate user intention is less than a preset interval threshold, generating the query statement based on the first candidate user intention and the second candidate user intention; wherein the first candidate user intention is the candidate user intention with the highest score corresponding to the first voice data, the second candidate user intention is the candidate user intention with the second highest score corresponding to the first voice data, and the first preset threshold is greater than the second preset threshold.
7. A method of voice interaction, the method comprising:
receiving first voice data input by a user, and determining candidate user intentions corresponding to the first voice data;
when the first voice data corresponds to a plurality of candidate user intentions, generating a query statement according to the candidate user intentions, and feeding back the query statement to the user, wherein the query statement is used for prompting the user to select one user intention from the candidate user intentions;
receiving second voice data input by the user, and determining, from among the candidate user intentions, a target user intention corresponding to the first voice data according to the second voice data;
outputting information associated with the target user intention.
8. The method of claim 7, further comprising:
when the first voice data corresponds to a single candidate user intention, outputting information associated with the candidate user intention.
9. The method according to claim 7, wherein determining the candidate user intentions corresponding to the first voice data based on an intention recognition model comprises:
acquiring historical user intentions determined in a voice interaction process;
and determining, by using the intention recognition model, the candidate user intentions corresponding to the first voice data according to the first voice data and the historical user intentions.
10. The voice interaction method according to claim 9, wherein determining, by using the intention recognition model, the candidate user intentions corresponding to the first voice data according to the first voice data and the historical user intentions comprises:
determining, based on a dialogue state tracking model, whether the first voice data and the historical user intention belong to the same dialogue sequence;
when the first voice data and the historical user intention belong to the same dialogue sequence, determining an initial user intention corresponding to the first voice data by using the intention recognition model, and updating the initial user intention according to the historical user intention to obtain the candidate user intention corresponding to the first voice data;
when the first voice data and the historical user intention do not belong to the same dialogue sequence, determining the candidate user intentions corresponding to the first voice data by using the intention recognition model.
11. The method according to claim 7, wherein generating a query statement according to the plurality of candidate user intentions comprises:
determining a score of each output module in a dialogue strategy learning model according to each candidate user intention corresponding to the first voice data, wherein the dialogue strategy learning model comprises at least one of the following output modules: a rewriting module, a coreference resolution module, a vertical-domain intention analysis module, a task multi-turn response module, a question-answering module, a news search module, a chat module, a recommendation module, and a candidate intention analysis module;
and when the output module with the highest score in the dialogue strategy learning model is the candidate intention analysis module, generating the query statement according to the score of each candidate user intention.
12. The method of claim 11, wherein generating the query statement based on the score of each of the candidate user intentions comprises:
when the score of a first candidate user intention corresponding to the first voice data is less than a first preset threshold, the score of a second candidate user intention corresponding to the first voice data is greater than a second preset threshold, and the difference between the score of the first candidate user intention and the score of the second candidate user intention is less than a preset interval threshold, generating the query statement based on the first candidate user intention and the second candidate user intention; wherein the first candidate user intention is the candidate user intention with the highest score corresponding to the first voice data, the second candidate user intention is the candidate user intention with the second highest score corresponding to the first voice data, and the first preset threshold is greater than the second preset threshold.
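Read together, claims 6 and 12 describe a simple disambiguation rule: a clarifying query is generated only when the top-scoring candidate intention is not confident enough on its own, the runner-up is still plausible, and the two scores are close. A minimal illustrative sketch of that threshold test follows; the function name and threshold values are hypothetical and not taken from the patent:

```python
def should_clarify(scores, high_thr=0.9, low_thr=0.5, gap_thr=0.2):
    """Return True if a clarifying query should be generated.

    scores: candidate-intention scores for the first voice data.
    Mirrors the three conditions of claim 12: the best score is below
    the first preset threshold, the second-best score is above the
    (smaller) second preset threshold, and the two scores differ by
    less than the preset interval threshold.
    """
    if len(scores) < 2:
        return False  # a single candidate intention is answered directly
    top, second = sorted(scores, reverse=True)[:2]
    return top < high_thr and second > low_thr and (top - second) < gap_thr


# Ambiguous utterance: two close, mid-confidence intentions -> ask the user
print(should_clarify([0.62, 0.58, 0.10]))  # True
# Confident top intention -> answer directly
print(should_clarify([0.95, 0.40]))        # False
```

When the test returns True, the query statement would be built from the first and second candidate intentions (e.g. "Did you want to watch the movie or play the song?"); otherwise the device outputs information associated with the single best intention.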
CN202011433067.6A 2020-12-10 2020-12-10 Display device and voice interaction method Pending CN114627864A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011433067.6A CN114627864A (en) 2020-12-10 2020-12-10 Display device and voice interaction method

Publications (1)

Publication Number Publication Date
CN114627864A (en) 2022-06-14

Family

ID=81894849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011433067.6A Pending CN114627864A (en) 2020-12-10 2020-12-10 Display device and voice interaction method

Country Status (1)

Country Link
CN (1) CN114627864A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115472149A (en) * 2022-08-30 2022-12-13 海尔优家智能科技(北京)有限公司 Voice message response method and device, storage medium and electronic device
WO2024078419A1 (en) * 2022-10-14 2024-04-18 华为技术有限公司 Voice interaction method, voice interaction apparatus and electronic device
WO2025035983A1 (en) * 2024-06-28 2025-02-20 北京字跳网络技术有限公司 Request processing method and apparatus, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995880A (en) * 2014-05-27 2014-08-20 百度在线网络技术(北京)有限公司 Interactive searching method and device
CN111651578A (en) * 2020-06-02 2020-09-11 北京百度网讯科技有限公司 Man-machine conversation method, device and equipment
CN112002321A (en) * 2020-08-11 2020-11-27 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
US20200380963A1 (en) * 2019-05-31 2020-12-03 Apple Inc. Global re-ranker
CN112035647A (en) * 2020-09-02 2020-12-04 中国平安人寿保险股份有限公司 Question-answering method, device, equipment and medium based on man-machine interaction


Similar Documents

Publication Publication Date Title
CN112163086B (en) Multi-intention recognition method and display device
US12177522B2 (en) Smart television and server
CN110737840A (en) Voice control method and display device
CN112000820A (en) Media asset recommendation method and display device
CN112511882B (en) Display device and voice call-out method
CN112004157B (en) Multi-round voice interaction method and display device
CN111984763B (en) Question answering processing method and intelligent device
WO2022032916A1 (en) Display system
CN114627864A (en) Display device and voice interaction method
CN112002321B (en) Display device, server and voice interaction method
CN112182196A (en) Service equipment applied to multi-turn conversation and multi-turn conversation method
CN114118064A (en) Display device, text error correction method and server
CN111949782A (en) Information recommendation method and service equipment
CN114187905A (en) Training method of user intention recognition model, server and display equipment
CN118445485A (en) Display device and voice searching method
CN111914114A (en) Badcase mining method and electronic equipment
CN112256232B (en) Display device and natural language generation post-processing method
CN111950288B (en) Entity labeling method in named entity recognition and intelligent device
CN113035194B (en) Voice control method, display device and server
CN115270808A (en) Display device and semantic understanding method
CN114566144A (en) Voice recognition method and device, server and electronic equipment
CN113038217A (en) Display device, server and response language generation method
CN114155846A (en) A semantic slot extraction method and display device
CN111914565A (en) Electronic equipment and user statement processing method
CN113076427B (en) Media resource searching method, display equipment and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination