CN115240677A - Voice interaction method, device and equipment for vehicle cabin

Info

Publication number: CN115240677A
Application number: CN202210963975.9A
Authority: CN
Inventors: 魏萌; 徐培来; 陈鹏
Original assignee: Beijing Binli Information Technology Co Ltd
Current assignee: Beijing Binli Information Technology Co Ltd
Priority date: 2022-08-11
Filing date: 2022-08-11
Publication date: 2022-10-25

Abstract

Provided is a voice interaction method for a vehicle cabin. The vehicle cabin comprises a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing a current multi-turn session, and each of the plurality of sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window. The method comprises: obtaining current semantic content from a first sound zone of the plurality of sound zones; updating the current multi-turn session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone, wherein the updated current multi-turn session is associated with the current semantic content; and processing the current semantic content associated with the updated current multi-turn session.

Description

Voice interaction method, apparatus and device for a vehicle cabin

Technical Field

The present disclosure relates to the field of vehicles, and in particular to a voice interaction method for a vehicle cabin, a voice interaction apparatus for a vehicle cabin, a computer device for a vehicle cabin, a vehicle including the above voice interaction apparatus or computer device, a storage medium, and a computer program product.

Background

In recent years, with the rapid development of artificial intelligence, voice interaction technology has reached a practical level. As the number of private cars keeps growing, an in-vehicle voice interaction system has become a standard feature of the in-vehicle head unit. At the same time, people increasingly want to communicate with the head unit naturally and conveniently, which improves both the interaction experience and driving safety. Further improving the in-vehicle voice interaction system is therefore an important part of making voice interaction in the vehicle cabin intelligent.

The approaches described in this section are not necessarily approaches that have been previously conceived or employed. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the issues raised in this section should not be considered to have been recognized in any prior art.

Summary of the Invention

Embodiments of the present disclosure provide a voice interaction method for a vehicle cabin, a voice interaction apparatus for a vehicle cabin, a computer device, a vehicle including the above voice interaction apparatus or voice interaction device, a storage medium, and a computer program product.

According to one aspect of the present disclosure, there is provided a voice interaction method for a vehicle cabin. The vehicle cabin includes a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing a current multi-turn session, and each of the plurality of sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window. The method includes: obtaining current semantic content from a first sound zone of the plurality of sound zones; updating the current multi-turn session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone, wherein the updated current multi-turn session is associated with the current semantic content; and processing the current semantic content associated with the updated current multi-turn session.

According to another aspect of the present disclosure, there is provided a voice interaction apparatus for a vehicle cabin. The vehicle cabin includes a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing a current multi-turn session, and each of the plurality of sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window. The apparatus includes: an obtaining module configured to obtain current semantic content from a first sound zone of the plurality of sound zones; an updating module configured to update the current multi-turn session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone, wherein the updated current multi-turn session is associated with the current semantic content; and a processing module configured to process the current semantic content associated with the updated current multi-turn session.

According to yet another aspect of the present disclosure, there is provided a computer device for a vehicle cabin. The vehicle cabin includes a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing a current multi-turn session, and each of the plurality of sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window. The computer device includes: at least one processor; and at least one memory storing a computer program that, when executed by the at least one processor, causes the at least one processor to implement the above method.

According to still another aspect of the present disclosure, there is provided a vehicle including the above voice interaction apparatus or computer device.

According to a further aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a computer program, the computer program including instructions that, when executed by a processor, cause the processor to perform the above method.

According to a still further aspect of the present disclosure, there is provided a computer program product including instructions that, when executed by a processor, cause the processor to perform the above method.

According to embodiments of the present disclosure, multiple users in multiple sound zones of the vehicle cabin can interact with the head unit simultaneously, and sessions of different users that are contextually related can be linked, which avoids mapping the speech of different users to separate intents. This improves the accuracy of voice interaction in the vehicle cabin and helps improve the user's experience of the vehicle.

These and other aspects of the present disclosure will be apparent from, and elucidated with reference to, the embodiments described hereinafter.

Description of the Drawings

Further details, features and advantages of the present disclosure are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings. The accompanying drawings illustrate the embodiments by way of example and constitute a part of the specification; together with the written description, they serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements. In the drawings:

FIG. 1 is a schematic diagram illustrating an example system in which various methods described herein may be implemented, according to an exemplary embodiment;

FIG. 2 is a flowchart illustrating a voice interaction method for a vehicle cabin according to an exemplary embodiment;

FIG. 3 is a flowchart illustrating a method of updating a current multi-turn session according to an exemplary embodiment;

FIG. 4 is a logical block diagram illustrating a voice interaction process for a vehicle cabin according to an exemplary embodiment;

FIG. 5 is a block diagram illustrating a voice interaction apparatus for a vehicle cabin according to an exemplary embodiment; and

FIG. 6 is a block diagram illustrating an exemplary computer device that can be applied to the exemplary embodiments.

Detailed Description

In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional, temporal or importance relationship of these elements; such terms are only used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in other cases, based on the context, they may refer to different instances.

The terminology used in the description of the various examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not expressly limited, there may be one or more of that element. As used herein, the term "plurality" means two or more, and the term "based on" should be construed as "based at least in part on". Furthermore, the terms "and/or" and "at least one of" encompass any one of the listed items and all possible combinations thereof.

Before introducing exemplary embodiments of the present disclosure, several terms used herein are first explained.

As used herein, the term "sound zone" refers to a spatial partition of the vehicle cabin that is equipped with a microphone, or an array of microphones, for capturing user speech and/or ambient sound, and that is associated with a seat position in the vehicle cabin. The cabin space may be partitioned into sound zones as needed. For example, it may be divided into a front and a rear sound zone according to the front and rear seat rows (this is not limited to cabins with two rows of seats; when the cabin has more than two rows of seats, it may correspondingly be divided into more than two sound zones), or it may be divided into separate sound zones for the driver's seat, the front passenger seat and the passenger seats, and so on.

As used herein, the term "sound zone cache" refers to a memory device that corresponds to a sound zone and stores the user speech and/or ambient sound captured by the microphone or microphone array of that sound zone, or processed signals thereof. Unless explicitly stated otherwise, a reference to a sound zone cache may also denote the objects stored in it, including but not limited to semantic content from that sound zone that has been processed by natural language understanding (NLU).

As used herein, the term "current session cache" refers to a memory device that is associated with the session currently being conducted by the user or users in the whole vehicle cabin space and stores the user speech and/or ambient sound captured by the microphones or microphone arrays of the sound zone or zones involved in that session, or processed signals thereof. As an example, in actual signal processing, the objects stored in the current session cache may include a chronological concatenation of the user speech and/or ambient sound, or processed signals thereof, that were captured by the microphones or microphone arrays of the individual sound zones and that make up the current session. Unless explicitly stated otherwise, a reference to the current session cache may also denote the objects stored in it, including but not limited to the current multi-turn session processed by natural language understanding (NLU).

As used herein, the term "semantic content" refers to the processed signal obtained after a raw speech signal is processed by natural language understanding (NLU). It carries rich detail such as context information (e.g., whether the speech continues other speech), session state (e.g., whether it is the beginning or end of a dialogue) and intent (e.g., whether it includes a query for specific information).

As used herein, the term "multi-turn session" refers to a session made up of more than one speech signal. It usually involves multiple users, but a single user may also conduct a multi-turn session, for example when a user utters a query and then utters another sentence that supplements that query.
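The terms defined above can be pictured as three small data structures. The following is a minimal sketch only: the class names, the field names and the choice to evict expired entries when the history is read are assumptions made for illustration, and the disclosure does not prescribe any particular layout.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass(frozen=True)
class SemanticContent:
    """NLU output for one utterance (all field names are illustrative)."""
    zone_id: str                   # sound zone the utterance came from
    timestamp: float               # capture time in seconds
    text: str                      # recognized text
    intent: Optional[str] = None   # e.g. "navigate"; None if unresolved


class ZoneCache:
    """Per-zone cache holding semantic content from a preset time window."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._items: Deque[SemanticContent] = deque()

    def add(self, item: SemanticContent) -> None:
        self._items.append(item)

    def history(self, now: float) -> List[SemanticContent]:
        # Evict entries older than the window, then return what is left.
        while self._items and now - self._items[0].timestamp > self.window_s:
            self._items.popleft()
        return list(self._items)


@dataclass
class CurrentSessionCache:
    """Shared by all sound zones; holds the current multi-turn session."""
    turns: List[SemanticContent] = field(default_factory=list)
```

With a 10-second window, for instance, an utterance captured at t = 0 s is still returned by `history(now=9.0)` but is evicted by `history(now=15.0)`.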

From traditional single question-and-answer voice interaction systems to the currently popular multi-turn question-and-answer systems, human-machine interaction has been moving closer to human-to-human interaction. Voice interaction typically involves speech recognition, natural language understanding, speech synthesis and the like. In a smart cabin there is often more than one speaker. In the related art, there are in-vehicle voice interaction systems that allow multiple passengers to interact at the same time. These systems work by creating multiple voice interaction links for multiple sound zones in the cabin (that is, the cabin is divided into multiple sound zones (seats), and a microphone array captures the speech signals of the different zones). When one or more voice interaction links detect a speech signal in a sound zone, the head unit terminal switches those links to a speech processing state, in which a link processes the speech signal input by the passenger in its corresponding sound zone. Although such methods support switching between speakers, only one voice interaction link is in the speech processing state at any given time, so the interaction remains serial and single-session. In addition, the related art includes approaches that send the signals of multiple sound zones separately to a preset cloud server for semantic recognition in order to generate corresponding voice commands, but such approaches cannot determine whether the speech signals of the multiple sound zones are contextually related.

To solve the above technical problems, one or more embodiments of the present disclosure propose a new voice interaction method for a vehicle cabin. The method updates the current multi-turn session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone, so that the current semantic content associated with the updated current multi-turn session is processed with the contextual relevance of the multi-turn session taken into account. In this way, multiple users in multiple sound zones of the vehicle cabin can interact with the head unit simultaneously, and sessions of different users that are contextually related can be linked, which avoids mapping the speech of different users to separate intents. This improves the accuracy of voice interaction in the vehicle cabin and helps improve the user's experience of the vehicle. Exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram illustrating an example system 100 in which various methods described herein may be implemented, according to an exemplary embodiment.

Referring to FIG. 1, the system 100 includes an in-vehicle system 110, a server 120, and a network 130 communicatively coupling the in-vehicle system 110 and the server 120.

The in-vehicle system 110 includes a display 114 and an application (APP) 112 that can be displayed via the display 114. The application 112 may be an application installed by default on the in-vehicle system 110 or downloaded and installed by a user 102, or it may be a mini program, i.e., a lightweight application. Where the application 112 is a mini program, the user 102 can run it directly on the in-vehicle system 110 without installing it, for example by searching for the application 112 in a host application (e.g., by its name) or by scanning a graphic code of the application 112 (e.g., a barcode or QR code). In some embodiments, the in-vehicle system 110 may include one or more processors and one or more memories (not shown), and the in-vehicle system 110 is implemented as an in-vehicle computer. In some embodiments, the in-vehicle system 110 may include more or fewer displays 114 (e.g., no display 114), and/or one or more speakers or other human-machine interaction devices. In some embodiments, the in-vehicle system 110 may not communicate with the server 120.

The server 120 may represent a single server, a cluster of multiple servers, a distributed system, or a cloud server that provides basic cloud services (such as cloud database, cloud computing, cloud storage and cloud communication). It will be appreciated that although FIG. 1 shows the server 120 communicating with only one in-vehicle system 110, the server 120 may provide back-end services for multiple in-vehicle systems at the same time.

The network 130 allows wireless communication and information exchange between a vehicle and X ("X" meaning a vehicle, a road, a pedestrian, the Internet, etc.) in accordance with agreed communication protocols and data exchange standards. Examples of the network 130 include a local area network (LAN), a wide area network (WAN), a personal area network (PAN), and/or a combination of communication networks such as the Internet. The network 130 may be a wired or wireless network. In one example, the network 130 may be an in-vehicle network, an inter-vehicle network, and/or an in-vehicle mobile Internet.

For the purposes of the embodiments of the present disclosure, in the example of FIG. 1, the application 112 may be an electronic map application that provides various functions based on an electronic map, such as navigation, route query and location search. Correspondingly, the server 120 may be a server used with the electronic map application. The server 120 may provide online map services, such as online navigation, online route query and online location search, to the application 112 running in the in-vehicle system 110 based on road network data. Alternatively, the server 120 may provide the road network data to the in-vehicle system 110, and the application 112 running in the in-vehicle system 110 provides a local map service based on that road network data.

FIG. 2 is a flowchart illustrating a voice interaction method 200 for a vehicle cabin according to an exemplary embodiment. The method 200 may be performed at an in-vehicle system (e.g., the in-vehicle system 110 shown in FIG. 1); that is, each step of the method 200 may be executed by the in-vehicle system 110 shown in FIG. 1. In some embodiments, the method 200 may be performed at a server (e.g., the server 120 shown in FIG. 1). In some embodiments, the method 200 may be performed by an in-vehicle system (e.g., the in-vehicle system 110) and a server (e.g., the server 120) in combination. In the following, the steps of the method 200 are described taking the in-vehicle system 110 as the executing entity. Here, the vehicle cabin in which the in-vehicle system 110 is located includes a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing a current multi-turn session, and each of the plurality of sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window.

As shown in FIG. 2, the method 200 includes:

Step S210: obtaining current semantic content from a first sound zone of the plurality of sound zones;

Step S220: updating the current multi-turn session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone, wherein the updated current multi-turn session is associated with the current semantic content; and

Step S230: processing the current semantic content associated with the updated current multi-turn session.

The steps of the method 200 are described in detail below.

In step S210, current semantic content from a first sound zone of the plurality of sound zones is obtained. The first sound zone may be any one of the plurality of sound zones that the vehicle cabin includes and that respectively correspond to the plurality of vehicle seats. For example, the first sound zone may be the sound zone corresponding to the driver's seat, or the sound zone corresponding to the front passenger seat and/or a rear passenger seat; the present disclosure is not limited in this respect. The current semantic content may be the processed signal obtained after the current raw speech signal from the first sound zone is processed by natural language understanding (NLU). In some examples, the current raw speech signal from the first sound zone may be captured by the microphone or microphone array of the first sound zone.

In step S220, the current multi-turn session is updated based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone. The updated current multi-turn session is associated with the current semantic content. The content stored in the current session cache is associated with the session currently being conducted by the user or users in the whole vehicle cabin space; the content stored in the sound zone cache of the first sound zone is associated with the speech currently uttered by the user in the first sound zone or with that user's historical speech; and the content stored in the sound zone caches of the remaining sound zones is associated with the speech currently uttered by the users in the respective sound zones or with their historical speech. This avoids the NLU cross-talk problem (that is, intents contained in multiple sessions across multiple sound zones being merged by the NLU and understood as a single intent), while still allowing sessions of different users that are contextually related to be linked, so that the speech of different users is not simply mapped to separate intents.

In step S230, the current semantic content associated with the updated current multi-turn session is processed. Because the current multi-turn session has been updated in step S220, the current semantic content associated with it reflects the context of the multi-turn session to which it belongs. The head unit can therefore give a more targeted response to the current semantic content, which improves the accuracy of human-machine voice interaction in the vehicle cabin.
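The three steps can be read as a small pipeline. The sketch below only illustrates that control flow under stated assumptions: `handle_utterance`, the dictionary shape used for semantic content and the three injected callables (`understand`, `update_session`, `respond`) are hypothetical names introduced here; the disclosure does not define such an interface.

```python
from typing import Callable, Dict, List

Content = Dict[str, str]                 # illustrative shape: {"zone": ..., "text": ...}
ZoneCaches = Dict[str, List[Content]]    # one history list per sound zone


def handle_utterance(
    raw_text: str,
    zone: str,
    session: List[Content],
    zone_caches: ZoneCaches,
    understand: Callable[[str, str], Content],
    update_session: Callable[[Content, List[Content], ZoneCaches], List[Content]],
    respond: Callable[[Content, List[Content]], str],
) -> str:
    # S210: obtain the current semantic content from the first sound zone.
    current = understand(raw_text, zone)
    # S220: update the current multi-turn session from the shared session
    # cache and the caches of all sound zones (delegated to update_session).
    session[:] = update_session(current, session, zone_caches)
    # S230: process the current content in the context of the updated session.
    return respond(current, session)
```

Keeping the three stages as separate callables mirrors the step boundaries of FIG. 2 and leaves open whether each stage runs on the in-vehicle system, on the server, or on both.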

According to embodiments of the present disclosure, the method 200 overcomes two shortcomings of in-vehicle voice interaction systems in the related art: they handle multiple speech inputs by serial single-session interaction and so cannot truly respond in parallel to the intents of multiple sessions in multiple sound zones, and they cannot determine whether the speech signals of multiple sound zones are contextually related. The method 200 updates the current multi-turn session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone. With the contextual relevance of the multi-turn session taken into account, the current semantic content associated with the updated current multi-turn session can be processed better, so that the user's intent is better understood and responded to and the user's experience of the vehicle is improved.

FIG. 3 is a flowchart illustrating a method 300 of updating a current multi-turn session according to an exemplary embodiment. The method 300 may be performed at an in-vehicle system (e.g., the in-vehicle system 110 shown in FIG. 1); that is, each step of the method 300 may be executed by the in-vehicle system 110 shown in FIG. 1. In some embodiments, the method 300 may be performed at a server (e.g., the server 120 shown in FIG. 1). In some embodiments, the method 300 may be performed by an in-vehicle system (e.g., the in-vehicle system 110) and a server (e.g., the server 120) in combination. In the following, the steps of the method 300 are described taking the in-vehicle system 110 as the executing entity. Here, the vehicle cabin in which the in-vehicle system 110 is located includes a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing a current multi-turn session, and each of the plurality of sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window.

As shown in FIG. 3, method 300 includes:

Step S310: determining whether historical semantic content is stored in the sound zone cache of the first sound zone;

Step S320: in response to determining that historical semantic content is stored in the sound zone cache of the first sound zone, determining whether the current semantic content and the historical semantic content belong to the same multi-round session; and

Step S330: in response to determining that the current semantic content and the historical semantic content belong to the same multi-round session, storing the current semantic content together with the historical semantic content in the current session cache to update the current multi-round session.
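
Steps S310 to S330 can be sketched as a single function. This is a minimal illustration, not the disclosed implementation: the function name `steps_s310_to_s330`, the use of plain strings for semantic content, and the injected `same_session` predicate are all assumptions. The disclosure leaves the same-session judgment to known ASR/NLU techniques, so the predicate is passed in rather than implemented.

```python
from typing import Callable, List, Optional

# Hypothetical predicate: decides whether a new utterance belongs to the same
# multi-round session as a list of earlier utterances. The source leaves this
# judgment to known ASR/NLU techniques, so it is injected here.
SameSession = Callable[[str, List[str]], bool]


def steps_s310_to_s330(
    current: str,
    zone_history: List[str],
    same_session: SameSession,
) -> Optional[List[str]]:
    """Return new session-cache contents, or None if S310-S330 do not apply."""
    # S310: does the first sound zone's cache hold any historical content?
    if not zone_history:
        return None
    # S320: do the current and historical content form one multi-round session?
    if not same_session(current, zone_history):
        return None
    # S330: store history and current content together in the session cache.
    return [*zone_history, current]
```

A return value of `None` here only means that these three steps do not produce an update; the optional additional steps described for process 400 cover those cases.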

The steps of method 300 are described in detail below with reference to FIG. 4.

FIG. 4 is a logic block diagram illustrating a voice interaction process 400 for a vehicle cabin according to an exemplary embodiment. The workflow of process 400 may include collecting a new sound source signal from one of the multiple sound zones via a microphone or microphone array, processing that signal, and parsing its semantic content via automatic speech recognition (ASR) and/or NLU. Within a group of multi-round sessions, when multiple sound source signals (possibly accompanied by intent understanding requests) have already been received, the ASR and/or NLU combines the cache state of that sound zone, the cache states of the remaining sound zones, and the current session cache of the vehicle cabin to determine whether the newly received sound source signal belongs to the same multi-round session as one or more of the already received signals. Based on this determination, it produces the intent understanding for that moment, processes the current request corresponding to the newly received signal (e.g., new semantic content), and updates the current session cache and the sound zone cache. If the newly received signal does not belong to the same multi-round session as any of the already received signals, it is established in the current session cache as a new current multi-round session (e.g., replacing the content stored in the current session cache), and the sound zone cache is updated.

The blocks of process 400 are described in detail below.

At block 401, the process waits to receive new semantic content. Semantic content may be the processed signal obtained after the current raw speech signal from a sound zone has been processed by natural language understanding (NLU). In some examples, the current raw speech signal from a sound zone may be collected by a microphone or microphone array with which that sound zone is equipped.

At block 402, new semantic content from sound zone i is received. Sound zone i may be any one of the plurality of sound zones in the vehicle cabin that respectively correspond to the plurality of vehicle seats. For example, sound zone i may be the sound zone corresponding to the driver's seat, or a sound zone corresponding to the front passenger seat and/or a rear passenger seat; the present disclosure is not limited in this respect.

At block 403, it is determined whether historical semantic content is stored in the cache of sound zone i. Block 403 may correspond to step S310 of FIG. 3 (determining whether historical semantic content is stored in the sound zone cache of the first sound zone), with sound zone i corresponding to the first sound zone. Historical semantic content may refer to the processed signal obtained after a historical speech signal has been processed by NLU, which carries rich detail such as contextual information (e.g., whether it continues other speech), session state (e.g., whether it is the beginning or end of a dialogue), and intent (e.g., whether it includes a query for specific information).

According to embodiments of the present disclosure, the length of the query time window of the sound zone cache may be custom-adjusted.

When the determination at block 403 is yes, process 400 proceeds to block 404.

At block 404, it is determined whether the new semantic content belongs to the same multi-round session as the historical semantic content in the cache of sound zone i. Block 404 may correspond to step S320 of FIG. 3: in response to determining that historical semantic content is stored in the sound zone cache of the first sound zone, determining whether the current semantic content and the historical semantic content belong to the same multi-round session. Note that the methods for judging whether two pieces of semantic content belong to the same multi-round session may include ASR, NLU, and other speech processing techniques well known in the art; to avoid obscuring the inventive concept of the present disclosure, they are not detailed here.

When the determination at block 404 is yes (that is, the semantic content newly received in sound zone i is contextually related to the historical semantic content previously received in sound zone i, so that the new and historical semantic content in sound zone i form the same multi-round session), process 400 proceeds to block 405.

At block 405, the current session cache is updated. Specifically, the newly received semantic content is stored in the current session cache together with its associated historical semantic content, overwriting the content previously stored there. Block 405 may correspond to step S330 of FIG. 3: in response to determining that the current semantic content and the historical semantic content belong to the same multi-round session, storing the current semantic content together with the historical semantic content in the current session cache to update the current multi-round session. As with the query time window of the sound zone cache, a custom-adjustable query time window length may also be set for the current session cache. Note that when the query time window of the current session cache is not long enough to cover both the current semantic content and the historical semantic content to be written, the temporally earlier part of the historical semantic content may be left out of the current session cache. That is, the content stored in the current session cache is the content closest to the current point in time. At this point, method 300 is complete.
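
The window-limited overwrite at block 405 can be illustrated as follows. This is a sketch under stated assumptions: the `Utterance` type, the function name `overwrite_session_cache`, timestamps in seconds, and anchoring the window at the timestamp of the current semantic content are all hypothetical choices that the disclosure does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Utterance:
    text: str
    timestamp: float  # seconds; the unit is an assumption


def overwrite_session_cache(
    history: List[Utterance],
    current: Utterance,
    window_s: float,
) -> List[Utterance]:
    """Build the new session-cache contents, keeping only the most recent part."""
    merged = sorted([*history, current], key=lambda u: u.timestamp)
    # Assumption: the window ends at the current utterance's timestamp.
    cutoff = current.timestamp - window_s
    # Entries older than the window are dropped, so the earlier part of the
    # history may not reach the session cache.
    return [u for u in merged if u.timestamp >= cutoff]
```

With a 30-second window, history at 0 s and 50 s, and current content at 60 s, only the entries at 50 s and 60 s would be kept, which matches the statement that the cache holds the content closest to the current point in time.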

Other exemplary embodiments of the present disclosure are described with continued reference to FIG. 4.

Continuing the above exemplary embodiment, when process 400 proceeds via block 403 to block 404 and the determination at block 404 is no (that is, the semantic content newly received in sound zone i has no contextual relevance to the historical semantic content previously received in sound zone i, so that the new and historical semantic content in sound zone i cannot form the same multi-round session), process 400 may proceed from block 404 to block 407.

At block 407, it is determined whether the new semantic content belongs to the same multi-round session as the current multi-round session in the current session cache. The content stored in the current session cache is associated with the session currently being conducted by the user(s) anywhere in the vehicle cabin. For example, if the new semantic content is "needs to have parking spaces" and the current multi-round session stored in the current session cache includes "buy luggage" and "nearest shopping mall", the new semantic content can be judged to belong to the same multi-round session as the stored current multi-round session. The intent of this multi-round session can be understood as going to the nearest shopping mall that has parking spaces to buy luggage, and the content to be written over the current session cache is "needs to have parking spaces", "buy luggage", and "nearest shopping mall". Thus, when the determination at block 407 is yes, process 400 may proceed from block 407 to block 405.

According to embodiments of the present disclosure, the above method may optionally include the following additional steps: in response to determining that the current semantic content and the historical semantic content do not belong to the same multi-round session, determining whether the current semantic content and the current multi-round session belong to the same multi-round session; and in response to determining that they do, storing the current semantic content together with the current multi-round session in the current session cache to update the current multi-round session.

Continuing the above exemplary embodiment, when process 400 proceeds via blocks 403 and 404 to block 407 and the determination at block 407 is no, process 400 may proceed from block 407 to block 408. At block 408, it is determined whether the new semantic content belongs to the same multi-round session as the historical semantic content in the cache of any remaining sound zone j other than sound zone i. In some situations, the new semantic content currently received in sound zone i may be associated neither with the historical semantic content of sound zone i nor with the current multi-round session stored in the current session cache. For example, suppose the new semantic content of sound zone i is "restricted plate tail number", the historical semantic content of sound zone i is "what is the temperature today", and the current multi-round session stored in the current session cache includes "buy luggage" and "nearest shopping mall". In this case, no association can be established between the new semantic content and the historical semantic content of sound zone i, nor between the new semantic content and the current multi-round session stored in the current session cache. It is therefore necessary to determine whether the new semantic content in sound zone i belongs to the same multi-round session as the historical semantic content in the cache of a remaining sound zone j other than sound zone i.

When the new semantic content in sound zone i belongs to the same multi-round session as the historical semantic content in the cache of a remaining sound zone j other than sound zone i (for example, the historical semantic content in the cache of sound zone j is "what day of the week is tomorrow", so that the intent of the multi-round session formed by the two can be understood as asking whether traffic restrictions apply tomorrow and which plate tail numbers are restricted), process 400 may proceed from block 408 to block 405. At block 405, the new semantic content in sound zone i is stored in the current session cache together with the historical semantic content in the cache of sound zone j, thereby updating the content stored in the current session cache.

According to embodiments of the present disclosure, the above method may optionally include the following additional steps: in response to determining that the current semantic content and the current multi-round session do not belong to the same multi-round session, determining whether the current semantic content belongs to the same multi-round session as the historical semantic content stored in the sound zone caches of the remaining sound zones other than the first sound zone; and in response to determining that the current semantic content belongs to the same multi-round session as the historical semantic content stored in the sound zone cache of a second sound zone among the plurality of sound zones, storing the current semantic content together with the historical semantic content stored in the sound zone cache of the second sound zone in the current session cache to update the current multi-round session. Here, the second sound zone is a sound zone different from the first sound zone.

Continuing the above exemplary embodiment, when process 400 proceeds via blocks 403, 404, and 407 to block 408 and the determination at block 408 is no, process 400 may proceed from block 408 to block 409. That is, when the new semantic content in sound zone i does not belong to the same multi-round session as the historical semantic content in the cache of any remaining sound zone j other than sound zone i (and, as described above, is also not associated with the historical semantic content stored in the cache of sound zone i or with the current multi-round session stored in the current session cache of the vehicle cabin), the new semantic content in sound zone i replaces the current multi-round session in the current session cache at block 409.

According to embodiments of the present disclosure, the above method may optionally include an additional step: in response to determining that the current semantic content does not belong to the same multi-round session as the historical semantic content stored in the sound zone caches of the remaining sound zones other than the first sound zone, storing the current semantic content in the current session cache to update the current multi-round session.
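
The decision cascade through blocks 403, 404, 407, 408, and 409 can be sketched as one function. The function name, the branch labels, the use of plain strings, and the `related` predicate are hypothetical; the disclosure leaves the relatedness judgment to known NLU techniques. Treating an empty current session cache as "not related" and taking the first matching remaining zone are also assumptions of this sketch.

```python
from typing import Callable, Dict, List, Tuple

Related = Callable[[str, List[str]], bool]  # hypothetical NLU judgment


def update_current_session(
    zone_id: str,
    current: str,
    zone_caches: Dict[str, List[str]],
    session_cache: List[str],
    related: Related,
) -> Tuple[str, List[str]]:
    """Return (branch taken, new session-cache contents)."""
    own_history = zone_caches.get(zone_id, [])
    # Blocks 403/404: history in zone i that the new content continues.
    if own_history and related(current, own_history):
        return "own_zone", [*own_history, current]
    # Block 407: the session currently held for the whole cabin.
    if session_cache and related(current, session_cache):
        return "current_session", [*session_cache, current]
    # Block 408: histories of the remaining zones j != i.
    for other_id, other_history in zone_caches.items():
        if other_id != zone_id and other_history and related(current, other_history):
            return "other_zone", [*other_history, current]
    # Block 409: nothing matches, so the new content starts a new session.
    return "new_session", [current]
```

Under a toy predicate that relates "restricted plate tail number" only to "what day of the week is tomorrow", the example from the description takes the `other_zone` branch, while unrelated content falls through to `new_session`.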

Returning to block 403, when the determination at block 403 is no, process 400 may proceed from block 403 to block 407. Further, when the determination at block 407 is yes, process 400 may then proceed from block 407 to block 405. This indicates that the new semantic content received in sound zone i may be the initial semantic content of that sound zone (i.e., the sound zone cache of sound zone i was empty beforehand) and is associated with the current multi-round session stored in the current session cache (which may, for example, consist of semantic content from sound zones other than sound zone i).

According to embodiments of the present disclosure, the above method may optionally include the following additional steps: in response to determining that no historical semantic content is stored in the sound zone cache of the first sound zone, determining whether the current semantic content and the current multi-round session belong to the same multi-round session; and in response to determining that they do, storing the current semantic content together with the current multi-round session in the current session cache to update the current multi-round session.

Continuing the above exemplary embodiment, when process 400 proceeds via blocks 403 and 407 to block 408 and the determination at block 408 is yes, process 400 may proceed from block 408 to block 405. For example, this situation may be as follows: the new semantic content of sound zone i is "restricted plate tail number", the cache of sound zone i holds no historical semantic content (i.e., it is empty), the current multi-round session stored in the current session cache includes "buy luggage" and "nearest shopping mall", and the historical semantic content stored in the cache of a remaining sound zone j is "what day of the week is tomorrow". The new semantic content of sound zone i can therefore form the same multi-round session with the historical semantic content stored in the cache of sound zone j.

Continuing the above exemplary embodiment, when process 400 proceeds via blocks 403 and 407 to block 408 and the determination at block 408 is no, process 400 may proceed from block 408 to block 409. For example, this situation may be as follows: the new semantic content of sound zone i is "restricted plate tail number", the cache of sound zone i holds no historical semantic content (i.e., it is empty), the current multi-round session stored in the current session cache includes "buy luggage" and "nearest shopping mall", and the historical semantic content stored in the caches of the remaining sound zones j is either empty or has no contextual relevance to "restricted plate tail number".

Process 400 also includes block 410, at which the cache of sound zone i is updated.

According to embodiments of the present disclosure, the above method may optionally include an additional step: storing the current semantic content in the sound zone cache of the first sound zone to update the historical semantic content stored in that sound zone cache.
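
A per-zone cache limited to an adjustable time window, as used at blocks 403 and 410, might look like the sketch below. The class name `ZoneCache`, the method names, timestamps in seconds, and the assumption that entries arrive in non-decreasing time order are illustrative choices that the disclosure does not prescribe.

```python
from collections import deque
from typing import Deque, List, Tuple


class ZoneCache:
    """Per-zone history limited to a preset, adjustable time window."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s  # adjustable window length, in seconds
        self._items: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, text: str) -> None:
        """Block 410: store the current content, then drop expired entries."""
        # Assumption: timestamps are non-decreasing across calls.
        self._items.append((timestamp, text))
        self._prune(now=timestamp)

    def history(self, now: float) -> List[str]:
        """Block 403: return the content still inside the time window."""
        self._prune(now)
        return [text for _, text in self._items]

    def _prune(self, now: float) -> None:
        # Remove entries from the old end until the oldest is inside the window.
        while self._items and self._items[0][0] < now - self.window_s:
            self._items.popleft()
```

An empty list from `history` corresponds to the "no" outcome at block 403, in which case process 400 goes straight to block 407.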

Note that process 400 may be performed in a loop. For example, after process 400 reaches block 410, it may return to block 401 to await new semantic content.

Although the operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, nor as requiring that all illustrated operations be performed to obtain the desired results.

FIG. 5 is a schematic block diagram illustrating a voice interaction apparatus 500 for a vehicle cabin according to an exemplary embodiment. The vehicle cabin includes a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing the current multi-round session, and each of the sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window.

Apparatus 500 includes: an acquisition module 510 configured to obtain current semantic content from a first sound zone of the plurality of sound zones; an update module 520 configured to update the current multi-round session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone, wherein the updated current multi-round session is associated with the current semantic content; and a processing module 530 configured to process the current semantic content associated with the updated current multi-round session.
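
The three-module structure of apparatus 500 could be arranged as in the following sketch. All class names, method names, and the `UpdatePolicy` signature are hypothetical, and the update policy itself is injected rather than implemented, since it stands in for the cache-based update logic of method 200. What "processing" returns is likewise a placeholder, as the disclosure does not fix its output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical update policy: (zone_id, content, zone_caches, session_cache)
# -> new session-cache contents. It is injected to keep the sketch
# self-contained; it stands in for the update logic of method 200.
UpdatePolicy = Callable[[str, str, Dict[str, List[str]], List[str]], List[str]]


@dataclass
class CabinCaches:
    session_cache: List[str] = field(default_factory=list)
    zone_caches: Dict[str, List[str]] = field(default_factory=dict)


class AcquisitionModule:
    """Module 510: obtains current semantic content from a sound zone."""

    def acquire(self, zone_id: str, content: str) -> Tuple[str, str]:
        return zone_id, content


class UpdateModule:
    """Module 520: updates the current multi-round session from the caches."""

    def __init__(self, policy: UpdatePolicy) -> None:
        self.policy = policy

    def update(self, caches: CabinCaches, zone_id: str, content: str) -> None:
        caches.session_cache = self.policy(
            zone_id, content, caches.zone_caches, caches.session_cache
        )


class ProcessingModule:
    """Module 530: processes the content in the context of the session."""

    def process(self, caches: CabinCaches, content: str) -> Dict[str, object]:
        # Placeholder output: the request plus the session it belongs to.
        return {"request": content, "session": list(caches.session_cache)}


class VoiceInteractionApparatus:
    """Apparatus 500: wires the three modules together."""

    def __init__(self, policy: UpdatePolicy) -> None:
        self.caches = CabinCaches()
        self.acquisition = AcquisitionModule()
        self.updating = UpdateModule(policy)
        self.processing = ProcessingModule()

    def handle(self, zone_id: str, content: str) -> Dict[str, object]:
        zone_id, content = self.acquisition.acquire(zone_id, content)
        self.updating.update(self.caches, zone_id, content)
        return self.processing.process(self.caches, content)
```

Injecting the policy keeps the module boundaries visible while leaving the same-session judgment, which the disclosure attributes to known NLU techniques, outside the sketch.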

It should be understood that the modules of apparatus 500 shown in FIG. 5 may correspond to the steps of method 200 described with reference to FIG. 2. Accordingly, the operations, features, and advantages described above for method 200 apply equally to apparatus 500 and the modules it includes.

According to embodiments of the present disclosure, apparatus 500 overcomes two shortcomings of in-vehicle voice interaction systems in the related art: they handle multiple voice inputs through serial, single-session interaction and therefore cannot truly respond in parallel to multi-session intents from multiple sound zones, and they cannot determine whether the speech signals from multiple sound zones are contextually related. Apparatus 500 updates the current multi-round session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones other than the first sound zone. By taking the contextual relevance of the multi-round session into account, it can better process the current semantic content associated with the updated current multi-round session, thereby better understanding and responding to the user's intent and improving the user's in-vehicle experience.

Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of each module discussed herein may be divided among multiple modules, and/or at least some functions of multiple modules may be combined into a single module. A particular module performing an action, as discussed herein, includes the particular module itself performing the action, or alternatively the particular module invoking or otherwise accessing another component or module that performs the action (or performs the action in conjunction with the particular module). Thus, a particular module performing an action may include the particular module itself performing the action and/or another module, invoked or otherwise accessed by the particular module, performing the action. As used herein, the phrase "performing action Z based on A, B, and C" may mean performing action Z based on A only, on B only, on C only, on A and B, on A and C, on B and C, or on A, B, and C.

It should also be understood that various techniques may be described herein in the general context of software, hardware elements, or program modules. The modules described above with respect to FIG. 5 may be implemented in hardware or in hardware combined with software and/or firmware. For example, these modules may be implemented as computer program code/instructions configured to be executed in one or more processors and stored in a computer-readable storage medium. Alternatively, these modules may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of acquisition module 510, update module 520, and processing module 530 may be implemented together in a system on chip (SoC). The SoC may include an integrated circuit chip (which includes one or more components of a processor (e.g., a central processing unit (CPU), microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry), and may optionally execute received program code and/or include embedded firmware to perform functions.

According to one aspect of the present disclosure, a computer device for a vehicle cabin is provided. The vehicle cabin includes a plurality of sound zones respectively corresponding to a plurality of vehicle seats; the plurality of sound zones are jointly configured with a current session cache for storing the current multi-round session, and each of the sound zones is configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window. The computer device includes at least one memory, at least one processor, and a computer program stored on the at least one memory. The at least one processor is configured to execute the computer program to implement the steps of any of the method embodiments described above.

According to one aspect of the present disclosure, a vehicle is provided that includes the voice interaction apparatus or the computer device described above.

According to one aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the method embodiments described above.

According to one aspect of the present disclosure, a computer program product is provided that includes a computer program; when executed by a processor, the computer program implements the steps of any of the method embodiments described above.

In the following, illustrative examples of such a computer device, non-transitory computer-readable storage medium, and computer program product are described with reference to FIG. 6.

FIG. 6 shows an example configuration of a computer device 600 that may be used to implement the methods described herein. For example, server 120 and/or in-vehicle system 110 shown in FIG. 1 may include an architecture similar to that of computer device 600. The above apparatus 500 or computer device may also be implemented wholly or at least partly by computer device 600 or a similar device or system.

Computer device 600 may include at least one processor 602, memory 604, communication interface(s) 606, a display device 608, other input/output (I/O) devices 610, and one or more mass storage devices 612, which can communicate with one another, for example via a system bus 614 or another suitable connection.

Processor 602 may be a single processing unit or multiple processing units, each of which may include a single computing unit, multiple computing units, or multiple cores. Processor 602 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any device that manipulates signals based on operational instructions. Among other capabilities, processor 602 may be configured to fetch and execute computer-readable instructions stored in memory 604, mass storage device 612, or other computer-readable media, such as program code of operating system 616, program code of application programs 618, program code of other programs 620, and so on.

Memory 604 and mass storage device 612 are examples of computer-readable storage media for storing instructions that are executed by processor 602 to implement the various functions described above. For example, memory 604 may generally include both volatile and non-volatile memory (e.g., RAM, ROM, etc.). In addition, mass storage device 612 may generally include hard disk drives, solid-state drives, removable media (including external and removable drives), memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD), storage arrays, network-attached storage, storage area networks, and so on. Memory 604 and mass storage device 612 may both be referred to collectively herein as memory or computer-readable storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that can be executed by processor 602 as a particular machine configured to implement the operations and functions described in the examples herein.

A number of programs may be stored on the mass storage device 612. These programs include the operating system 616, one or more application programs 618, other programs 620, and program data 622, and they may be loaded into the memory 604 for execution. Examples of such application programs or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: method 200, method 300 and their optional additional steps, apparatus 500, and/or further embodiments described herein.

Although illustrated in FIG. 6 as being stored in the memory 604 of the computer device 600, the modules 616, 618, 620, and 622, or portions thereof, may be implemented using any form of computer-readable media accessible by the computer device 600. As used herein, "computer-readable media" includes at least two types of computer-readable media, namely computer-readable storage media and communication media.

Computer-readable storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, without limitation, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROMs, digital versatile disks (DVDs) or other optical storage devices, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computer device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism. Computer-readable storage media, as defined herein, do not include communication media.

One or more communication interfaces 606 are used to exchange data with other devices, such as over a network, a direct connection, and the like. Such a communication interface may be one or more of the following: any type of network interface (e.g., a network interface card (NIC)), a wired or wireless interface (such as an IEEE 802.11 wireless LAN (WLAN) interface), a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth™ interface, a Near Field Communication (NFC) interface, and the like. The communication interface 606 may facilitate communication within a variety of network and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and the like. The communication interface 606 may also provide communication with external storage (not shown), such as in storage arrays, network attached storage, storage area networks, and the like.

In some examples, a display device 608, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 610 may be devices that receive various inputs from the user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and the like.

The techniques described herein may be supported by these various configurations of the computer device 600 and are not limited to the specific examples of the techniques described herein. For example, the functionality may also be implemented in whole or in part on a "cloud" using a distributed system. The cloud includes and/or is representative of a platform for resources. The platform abstracts the underlying functionality of the hardware (e.g., servers) and software resources of the cloud. The resources may include applications and/or data that can be used while computing processing is performed on servers remote from the computer device 600. The resources may also include services provided over the Internet and/or over a subscriber network such as a cellular or Wi-Fi network. The platform may abstract resources and functions to connect the computer device 600 with other computer devices. Accordingly, implementation of the functionality described herein may be distributed throughout the cloud. For example, the functionality may be implemented partly on the computer device 600 and partly through the platform that abstracts the functionality of the cloud.

While the present disclosure has been illustrated and described in detail in the drawings and the foregoing description, such illustration and description are to be considered illustrative and exemplary rather than restrictive; the present disclosure is not limited to the disclosed embodiments. Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps not listed, the indefinite article "a" or "an" does not exclude a plurality, the term "a plurality" means two or more, and the term "based on" should be interpreted as "based at least in part on". The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Some exemplary aspects of the present disclosure are described below.

Aspect 1. A voice interaction method for a vehicle cabin, the vehicle cabin comprising a plurality of sound zones respectively corresponding to a plurality of vehicle seats, the plurality of sound zones being jointly configured with a current session cache for storing a current multi-round session, and each sound zone of the plurality of sound zones being configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window, the method comprising:

obtaining current semantic content from a first sound zone of the plurality of sound zones;

updating the current multi-round session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone, wherein the updated current multi-round session is associated with the current semantic content; and

processing the current semantic content associated with the updated current multi-round session.

Aspect 2. The method of aspect 1, wherein updating the current multi-round session comprises:

determining whether historical semantic content is stored in the sound zone cache of the first sound zone;

in response to determining that historical semantic content is stored in the sound zone cache of the first sound zone, determining whether the current semantic content and the historical semantic content belong to the same multi-round session; and

in response to determining that the current semantic content and the historical semantic content belong to the same multi-round session, storing the current semantic content together with the historical semantic content in the current session cache to update the current multi-round session.

Aspect 3. The method of aspect 2, further comprising:

in response to determining that the current semantic content and the historical semantic content do not belong to the same multi-round session, determining whether the current semantic content and the current multi-round session belong to the same multi-round session; and

in response to determining that the current semantic content and the current multi-round session belong to the same multi-round session, storing the current semantic content together with the current multi-round session in the current session cache to update the current multi-round session.

Aspect 4. The method of aspect 3, further comprising:

in response to determining that the current semantic content and the current multi-round session do not belong to the same multi-round session, determining whether the current semantic content and the historical semantic content stored in the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone belong to the same multi-round session; and

in response to determining that the current semantic content and the historical semantic content stored in the sound zone cache of a second sound zone of the plurality of sound zones belong to the same multi-round session, storing the current semantic content together with the historical semantic content stored in the sound zone cache of the second sound zone in the current session cache to update the current multi-round session, wherein the second sound zone is a sound zone different from the first sound zone.

Aspect 5. The method of aspect 4, further comprising:

in response to determining that the current semantic content and the historical semantic content stored in the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone do not belong to the same multi-round session, storing the current semantic content in the current session cache to update the current multi-round session.
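As a non-limiting illustration, the order of checks recited in aspects 2 to 5 may be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the function `update_current_session`, the predicate type `SameSession`, and the toy predicate `shares_word` are hypothetical names, caches are modeled as plain lists of strings, and an empty cache is assumed never to match. The sketch also assumes that an empty sound zone cache for the first sound zone simply falls through to the later checks.

```python
from typing import Callable, Dict, List

# Hypothetical predicate: True when `current` continues the given history.
SameSession = Callable[[str, List[str]], bool]


def update_current_session(
    current: str,
    first_zone: str,
    session_cache: List[str],
    zone_caches: Dict[str, List[str]],
    same_session: SameSession,
) -> List[str]:
    """Return the updated current multi-round session for `current`."""
    own_history = zone_caches.get(first_zone, [])
    # Aspect 2: compare with the history of the speaker's own sound zone.
    if own_history and same_session(current, own_history):
        return own_history + [current]
    # Aspect 3: otherwise compare with the current multi-round session.
    if session_cache and same_session(current, session_cache):
        return session_cache + [current]
    # Aspect 4: otherwise compare with the history of each remaining zone.
    for zone, history in zone_caches.items():
        if zone != first_zone and history and same_session(current, history):
            return history + [current]
    # Aspect 5: no match anywhere, so the content starts a new session.
    return [current]


def shares_word(current: str, history: List[str]) -> bool:
    """Toy stand-in for a semantic model: any shared word counts as a match."""
    words = set(current.lower().split())
    return any(words & set(item.lower().split()) for item in history)


caches = {"driver": ["navigate to airport"], "rear_left": ["play jazz music"]}
session = ["navigate to airport"]
print(update_current_session("louder music", "rear_left", session, caches, shares_word))
```

In this sketch the returned list is what would be written to the current session cache. A real system would replace `shares_word` with whatever semantic relatedness judgment the embodiment uses.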

Aspect 6. The method of aspect 2, further comprising:

in response to determining that no historical semantic content is stored in the sound zone cache of the first sound zone, determining whether the current semantic content and the current multi-round session belong to the same multi-round session; and

Aspect 7. The method of aspect 6, further comprising:

Aspect 8. The method of aspect 7, further comprising:

Aspect 9. The method of aspect 1, further comprising:

storing the current semantic content in the sound zone cache of the first sound zone to update the historical semantic content stored in the sound zone cache of the first sound zone.
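As a non-limiting illustration, a sound zone cache that keeps only the content from within a preset time window may be sketched as follows. This is a minimal sketch under stated assumptions: the class name `ZoneCache`, its methods, and the explicit `now` timestamp argument (in seconds) are hypothetical and are not taken from the disclosure, which does not specify how the time window is enforced.

```python
from collections import deque
from typing import Deque, List, Tuple


class ZoneCache:
    """Per-zone history limited to a preset time window (hypothetical design)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # Entries are (timestamp, semantic content), oldest first.
        self._entries: Deque[Tuple[float, str]] = deque()

    def add(self, content: str, now: float) -> None:
        """Store new semantic content and drop entries that left the window."""
        self._entries.append((now, content))
        self._evict(now)

    def history(self, now: float) -> List[str]:
        """Return the contents still inside the window, oldest first."""
        self._evict(now)
        return [content for _, content in self._entries]

    def _evict(self, now: float) -> None:
        # Remove entries older than the window, starting from the oldest.
        while self._entries and now - self._entries[0][0] > self.window_seconds:
            self._entries.popleft()


cache = ZoneCache(window_seconds=30.0)
cache.add("navigate to airport", now=0.0)
cache.add("avoid tolls", now=20.0)
print(cache.history(now=40.0))
```

Passing the timestamp explicitly keeps the sketch deterministic; an in-vehicle implementation would presumably read a monotonic clock instead.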

Aspect 10. A voice interaction apparatus for a vehicle cabin, the vehicle cabin comprising a plurality of sound zones respectively corresponding to a plurality of vehicle seats, the plurality of sound zones being jointly configured with a current session cache for storing a current multi-round session, and each sound zone of the plurality of sound zones being configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window, the apparatus comprising:

an obtaining module configured to obtain current semantic content from a first sound zone of the plurality of sound zones;

an updating module configured to update the current multi-round session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone, wherein the updated current multi-round session is associated with the current semantic content; and

a processing module configured to process the current semantic content associated with the updated current multi-round session.

Aspect 11. A computer device for a vehicle cabin, the vehicle cabin comprising a plurality of sound zones respectively corresponding to a plurality of vehicle seats, the plurality of sound zones being jointly configured with a current session cache for storing a current multi-round session, and each sound zone of the plurality of sound zones being configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window, the computer device comprising:

at least one processor; and

at least one memory having a computer program stored thereon,

wherein the computer program, when executed by the at least one processor, causes the at least one processor to perform the method of any one of aspects 1-9.

Aspect 12. A vehicle comprising the voice interaction apparatus of aspect 10 or the computer device of aspect 11.

Aspect 13. A computer-readable storage medium having stored thereon a computer program that, when executed by a processor, causes the processor to perform the method of any one of aspects 1-9.

Aspect 14. A computer program product comprising a computer program that, when executed by a processor, causes the processor to perform the method of any one of aspects 1-9.

Claims

1. A voice interaction method for a vehicle cabin, the vehicle cabin comprising a plurality of sound zones respectively corresponding to a plurality of vehicle seats, the plurality of sound zones being jointly configured with a current session cache for storing a current multi-round session, and each sound zone of the plurality of sound zones being configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window, the method comprising:

obtaining current semantic content from a first sound zone of the plurality of sound zones;

updating the current multi-round session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone, wherein the updated current multi-round session is associated with the current semantic content; and

processing the current semantic content associated with the updated current multi-round session.

2. The method of claim 1, wherein updating the current multi-round session comprises:

determining whether historical semantic content is stored in the sound zone cache of the first sound zone;

in response to determining that historical semantic content is stored in the sound zone cache of the first sound zone, determining whether the current semantic content and the historical semantic content belong to the same multi-round session; and

in response to determining that the current semantic content and the historical semantic content belong to the same multi-round session, storing the current semantic content together with the historical semantic content in the current session cache to update the current multi-round session.

3. The method of claim 2, further comprising:

in response to determining that the current semantic content and the historical semantic content do not belong to the same multi-round session, determining whether the current semantic content and the current multi-round session belong to the same multi-round session; and

in response to determining that the current semantic content and the current multi-round session belong to the same multi-round session, storing the current semantic content together with the current multi-round session in the current session cache to update the current multi-round session.

4. The method of claim 3, further comprising:

in response to determining that the current semantic content and the current multi-round session do not belong to the same multi-round session, determining whether the current semantic content and the historical semantic content stored in the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone belong to the same multi-round session; and

in response to determining that the current semantic content and the historical semantic content stored in the sound zone cache of a second sound zone of the plurality of sound zones belong to the same multi-round session, storing the current semantic content together with the historical semantic content stored in the sound zone cache of the second sound zone in the current session cache to update the current multi-round session, wherein the second sound zone is a sound zone different from the first sound zone.

5. The method of claim 4, further comprising:

in response to determining that the current semantic content and the historical semantic content stored in the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone do not belong to the same multi-round session, storing the current semantic content in the current session cache to update the current multi-round session.

6. The method of claim 2, further comprising:

in response to determining that no historical semantic content is stored in the sound zone cache of the first sound zone, determining whether the current semantic content and the current multi-round session belong to the same multi-round session; and

7. The method of claim 6, further comprising:

8. The method of claim 7, further comprising:

9. The method of claim 1, further comprising:

storing the current semantic content in the sound zone cache of the first sound zone to update the historical semantic content stored in the sound zone cache of the first sound zone.

10. A voice interaction apparatus for a vehicle cabin, the vehicle cabin comprising a plurality of sound zones respectively corresponding to a plurality of vehicle seats, the plurality of sound zones being jointly configured with a current session cache for storing a current multi-round session, and each sound zone of the plurality of sound zones being configured with a respective sound zone cache for storing historical semantic content from that sound zone within a preset time window, the apparatus comprising:

an obtaining module configured to obtain current semantic content from a first sound zone of the plurality of sound zones;

an updating module configured to update the current multi-round session based on the current session cache, the sound zone cache of the first sound zone, and the sound zone caches of the remaining sound zones of the plurality of sound zones other than the first sound zone, wherein the updated current multi-round session is associated with the current semantic content; and

a processing module configured to process the current semantic content associated with the updated current multi-round session.