CN118155630A - Voice interaction method and device based on large language model and intelligent voice equipment

Info

Publication number: CN118155630A
Application number: CN202410384003.3A
Authority: CN
Inventors: 李伟; 劳春峰; 贾奇伟; 李志宏
Original assignee: Qingdao Haier Air Conditioner Gen Corp Ltd; Qingdao Haier Smart Technology R&D Co Ltd; Qingdao Haier Air Conditioning Electric Co Ltd; Haier Smart Home Co Ltd
Current assignee: Qingdao Haier Air Conditioner Gen Corp Ltd; Qingdao Haier Smart Technology R&D Co Ltd; Qingdao Haier Air Conditioning Electric Co Ltd; Haier Smart Home Co Ltd
Priority date: 2024-04-01
Filing date: 2024-04-01
Publication date: 2024-06-07

Abstract

The application relates to the technical field of voice processing and discloses a voice interaction method based on a large language model, comprising: when voice interaction data input by a user is received, converting the voice interaction data into question text; when the question text indicates that the user intends to call the voice manual, obtaining voice reply information of the intelligent voice device according to the question text, the constructed knowledge vector library and the large language model; and controlling the intelligent voice device to broadcast the voice reply information. With this scheme, the user's question can be analyzed using the question text, the constructed knowledge vector library and the large language model, and corresponding voice reply information is generated based on an understanding of the question, so that, by controlling the intelligent voice device to broadcast the voice reply information, the user can promptly obtain professional knowledge about the intelligent voice device. The application also discloses a voice interaction apparatus based on a large language model and an intelligent voice device.

Description

Voice interaction method, device and intelligent voice equipment based on large language model

Technical Field

The present application relates to the technical field of voice processing, for example to a voice interaction method and apparatus based on a large language model, and to an intelligent voice device.

Background Art

Intelligent voice devices have become an important branch of artificial intelligence in recent years. They use technologies such as speech recognition and natural language processing to convert human speech into text, and respond accordingly by analyzing and understanding that text. At present, intelligent voice devices are supplied with detailed paper or electronic manuals, which are a key resource for users to learn about product features and operating instructions. So that users can consult the manual at any time, manufacturers usually state where it is stored, or provide a link to it, on the product packaging or official website. Users can rely on an index or a search function to quickly find the instructions for a specific function and solve problems encountered in actual use. In either case, however, obtaining professional knowledge about the device is not convenient for the user.

At present, in the related art, a user can input a voice command to tell the intelligent voice device that they want to obtain professional knowledge about the device; the intelligent voice device uses its built-in natural language processing technology to understand the user's specific intention and provide corresponding feedback.

In the process of implementing the embodiments of the present disclosure, it was found that the related art has at least the following problem:

When language is processed in this way, the ability to understand and answer user questions of different types and with different phrasing is limited, and users cannot obtain professional knowledge about the intelligent voice device in a timely manner.

It should be noted that the information disclosed in the Background Art section above is only intended to enhance understanding of the background of the present application, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.

Summary of the Invention

In order to provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. The summary is not an extensive overview, nor is it intended to identify key or critical elements or to delineate the scope of protection of these embodiments; it serves as a prelude to the detailed description that follows.

The embodiments of the present disclosure provide a voice interaction method and apparatus based on a large language model, and an intelligent voice device, which enable users to obtain professional knowledge about the intelligent voice device in a timely manner.

In some embodiments, the voice interaction method based on a large language model includes: when voice interaction data input by a user is received, converting the voice interaction data into question text; when the question text indicates that the user intends to call the voice manual, obtaining voice reply information of the intelligent voice device according to the question text, the constructed knowledge vector library and the large language model; and controlling the intelligent voice device to broadcast the voice reply information, so that the user promptly obtains professional knowledge about the intelligent voice device.

In some embodiments, the voice interaction method based on a large language model includes: calling a manual that matches the identification information of the intelligent voice device to serve as a knowledge document; adding the knowledge document to the knowledge base documents and performing document processing; and using word embedding technology to convert the processed documents into a vector matrix to obtain the knowledge vector library.

In some embodiments, the voice interaction method based on a large language model includes: determining target English content in the documents and converting the target English content into Chinese content, where the target English content is English content in the documents other than proper nouns of the intelligent voice device and brand names; and/or determining documents to be converted among the documents and converting them into documents in a target format, where the documents to be converted are those documents that are not in the target format.

In some embodiments, the voice interaction method based on a large language model includes: adjusting the document name so that the number of characters in the document name falls within a first character-count range; and/or adjusting the first title and the second title in the document so that the number of characters in the first title and the second title falls within a second character-count range; wherein the minimum value of the first character-count range is greater than the maximum value of the second character-count range.

In some embodiments, the voice interaction method based on a large language model includes: when interface parameters are received, splitting the documents into chunks according to the interface parameters; wherein the interface parameters include a chunk size and/or a chunking method, and are configured by a developer according to the characteristics of the knowledge base documents.

In some embodiments, the voice interaction method based on a large language model includes: vectorizing the question text to obtain a user question vector; comparing the user question vector with the constructed knowledge vector library to retrieve the text segment with the highest vector similarity; and using the question text and the text segment as input to the large language model, which outputs the corresponding voice reply information.

In some embodiments, the voice interaction method based on a large language model includes: performing a security review on the voice reply information; and, when the voice reply information passes the security review, controlling the intelligent voice device to broadcast the voice reply information, so that the user promptly obtains professional knowledge about the intelligent voice device.

In some embodiments, the voice interaction method based on a large language model includes: when the voice reply information fails the security review, controlling the intelligent voice device to broadcast sensitive-question reply information.

In some embodiments, the voice interaction apparatus based on a large language model includes a processor and a memory storing program instructions, the processor being configured to execute the aforementioned voice interaction method based on a large language model when running the program instructions.

In some embodiments, the intelligent voice device includes: an intelligent voice device body; and the aforementioned voice interaction apparatus based on a large language model, installed in the intelligent voice device body.

The voice interaction method and apparatus based on a large language model and the intelligent voice device provided by the embodiments of the present disclosure can achieve the following technical effects:

When the question text indicates that the user intends to call the voice manual, the question text, the constructed knowledge vector library and the large language model are used to analyze and understand the user's question in depth, and corresponding voice reply information is generated based on that understanding. By controlling the intelligent voice device to broadcast the voice reply information, the user promptly obtains professional knowledge about the intelligent voice device, which improves the user's experience of the device.

The above general description and the following description are exemplary and explanatory only and are not intended to limit the present application.

Brief Description of the Drawings

One or more embodiments are illustrated by way of example in the corresponding drawings. These illustrations and drawings do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, the drawings are not drawn to scale, and wherein:

FIG. 1-1 is a schematic diagram of an application provided by an embodiment of the present disclosure;

FIG. 1-2 is a schematic diagram of a voice interaction method based on a large language model provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a method for building a knowledge vector library provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a method for obtaining voice reply information provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of another voice interaction method based on a large language model provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of another voice interaction method based on a large language model provided by an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a voice interaction apparatus based on a large language model provided by an embodiment of the present disclosure.

Detailed Description

So that the features and technical content of the embodiments of the present disclosure can be understood in more detail, their implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the embodiments of the present disclosure. In the following technical description, numerous details are given for ease of explanation, to provide a full understanding of the disclosed embodiments. However, one or more embodiments may still be practiced without these details. In other cases, well-known structures and devices may be shown in simplified form to simplify the drawings.

The terms "first", "second" and the like in the description and claims of the embodiments of the present disclosure and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be implemented. In addition, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion.

Unless otherwise stated, the term "plurality" means two or more.

In the embodiments of the present disclosure, the character "/" indicates an "or" relationship between the objects before and after it. For example, A/B means: A or B.

The term "and/or" describes an association between objects and indicates that three relationships are possible. For example, A and/or B means any of three relationships: A, or B, or A and B.

The term "corresponding" may refer to an association or binding relationship; A corresponding to B means that there is an association or binding relationship between A and B.

In the embodiments of the present disclosure, a smart home appliance is a home appliance product formed by introducing a microprocessor, sensor technology and network communication technology into a home appliance. It is characterized by intelligent control, intelligent sensing and intelligent application. The operation of a smart home appliance often relies on modern technologies such as the Internet of Things, the Internet and electronic chips. For example, a smart home appliance can connect to an electronic device so that the user can remotely control and manage it.

In the embodiments of the present disclosure, a terminal device is an electronic device with a wireless connection function. The terminal device can communicate with the above smart home appliance over the Internet, or directly via Bluetooth, Wi-Fi or the like. In some embodiments, the terminal device is, for example, a mobile device, a computer, a vehicle-mounted device built into a hover car, or any combination thereof. The mobile device may include, for example, a mobile phone, a smart home device, a wearable device, a smart mobile device, a virtual reality device, or any combination thereof, where the wearable device includes, for example, a smart watch, a smart bracelet or a pedometer.

FIG. 1-1 is a schematic diagram of an application provided by an embodiment of the present disclosure. As shown in FIG. 1-1, in a practical application the intelligent voice device may be an air conditioner equipped with a voice module. Specifically, after the entry gateway of the air conditioner receives the voice interaction data input by the user, speech recognition technology can be used to convert the voice interaction data into question text. After the air conditioner obtains the question text, intent-domain judgment technology can be used to identify the user's intention accurately. When the user intends to call the voice manual, the air conditioner performs retrieval on the question text. The retrieval process is as follows: the question text is first vectorized and then compared for similarity with the text in the constructed knowledge vector library, so as to retrieve the text segment with the highest similarity; that text segment and the question text are then used as the input of the large language model, which generates the voice reply information. The air conditioner can then perform a security review on the voice reply information. If the voice reply information passes the security review, text-to-speech technology converts the text into speech and the air conditioner is controlled to broadcast the voice reply information in answer to the user's question. If the voice reply information fails the security review, the air conditioner is controlled to generate sensitive-question reply information, text-to-speech technology converts that text into speech, and the air conditioner is controlled to broadcast the sensitive-question reply information. The knowledge vector library can be built by obtaining the air conditioner's manual as a knowledge base document and then setting the importance of different documents, processing the documents, splitting the documents and applying word embedding.
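The flow described for FIG. 1-1 can be sketched as a single function in which every stage is passed in as a callable. This is only an illustrative outline: the function name, the parameter names and the wording of `SENSITIVE_REPLY` are assumptions made here, and the source does not specify how any individual stage is implemented.

```python
from typing import Callable, Optional

# Hypothetical wording; the source only says a "sensitive-question reply" is broadcast.
SENSITIVE_REPLY = "Sorry, I cannot answer that question."


def handle_utterance(
    audio: bytes,
    asr: Callable[[bytes], str],              # speech recognition: audio -> question text
    is_manual_intent: Callable[[str], bool],  # intent-domain judgment
    retrieve: Callable[[str], str],           # question text -> most similar text segment
    llm: Callable[[str, str], str],           # (question, segment) -> reply text
    passes_review: Callable[[str], bool],     # security review of the reply text
    tts: Callable[[str], bytes],              # text-to-speech
) -> Optional[bytes]:
    """Return the audio to broadcast, or None if the voice manual is not requested."""
    question = asr(audio)
    if not is_manual_intent(question):
        return None  # some other skill of the device would handle this request
    segment = retrieve(question)
    reply = llm(question, segment)
    if not passes_review(reply):
        reply = SENSITIVE_REPLY
    return tts(reply)
```

Passing the stages in as callables keeps the sketch independent of any particular speech, retrieval or model library.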

FIG. 1-2 is a schematic diagram of a voice interaction method based on a large language model provided by an embodiment of the present disclosure. As shown in FIG. 1-2, an embodiment of the present disclosure provides a voice interaction method based on a large language model, including:

S11, when voice interaction data input by the user is received, the intelligent voice device converts the voice interaction data into question text.

S12, when the question text indicates that the user intends to call the voice manual, the intelligent voice device obtains its voice reply information according to the question text, the constructed knowledge vector library and the large language model.

S13, the intelligent voice device broadcasts the voice reply information, so that the user promptly obtains professional knowledge about the intelligent voice device.

In this solution, an intelligent voice device is a smart home appliance with a voice module. As an example, the intelligent voice device may be a smart air conditioner. Specifically, after the entry gateway of the intelligent voice device receives the voice interaction data input by the user, the intelligent voice device can use ASR (Automatic Speech Recognition) technology to convert the voice interaction data into question text. In this process, ASR recognizes the syllables, words and phrases in the speech and converts them into the corresponding text. With this solution, automatic speech recognition is used to convert the voice interaction data, so that more accurate question text is obtained quickly.

Optionally, after the intelligent voice device obtains the question text, it can determine the user intention that the question text represents. In one example, intent-domain judgment technology can be used to identify the user's intention accurately; once this technology determines that the user intends to call the voice manual, the intelligent voice device starts the voice manual function.

In this embodiment, a 13B-scale large language model can be used as the core base model; selecting a high-performance large language model provides powerful and efficient natural language processing capability. Further, the base model, together with matching training, storage and application platforms, can be deployed on the intelligent voice device. The application platform supports account authentication management for the various applications that are generated, to protect the security and privacy of user data. The application platform also supports dynamic entry of knowledge base content, so that new knowledge and information can be updated on the intelligent voice device in real time. This design lets users flexibly customize and adjust application functions according to actual needs, enabling personalized services.

In one approach, the voice manual function can be built using the above solution, and it is used in the same way as the other functions of the intelligent voice device. In this way, users can obtain detailed usage instructions for the intelligent voice device by voice, which improves convenience and user experience.

In an optimized solution, to ensure that the voice manual function takes precedence, the priority of intent understanding can be configured separately on the voice skill platform. Then, when the user issues a voice command related to the voice manual, the intelligent voice device recognizes and triggers this function first, so that the user quickly obtains the required information.
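One way to picture a separately configured intent priority is a router that collects every skill whose trigger matches the question and then picks the match with the highest priority. The sketch below uses plain keyword matching as a stand-in for the intent-domain judgment technology, which the source does not detail; the `Skill` type, the skill names, the keywords and the priority values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    keywords: Tuple[str, ...]  # hypothetical trigger phrases (lower case)
    priority: int              # larger value wins when several skills match


def route(question: str, skills: Iterable[Skill]) -> Optional[str]:
    """Return the name of the highest-priority matching skill, or None."""
    text = question.lower()
    matches = [s for s in skills if any(k in text for k in s.keywords)]
    if not matches:
        return None
    return max(matches, key=lambda s: s.priority).name


skills = [
    Skill("device_control", ("turn on", "filter"), priority=1),
    Skill("voice_manual", ("how do i", "filter"), priority=10),
]
print(route("How do I clean the filter?", skills))
```

Here both skills match the word "filter", and the voice manual is chosen because it was given the higher priority.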

Specifically, the intelligent voice device obtains its voice reply information according to the question text, the constructed knowledge vector library and the large language model as follows. The intelligent voice device vectorizes the question text to obtain a user question vector. It compares the user question vector with the constructed knowledge vector library to retrieve the text segment with the highest vector similarity. It then uses the question text and the text segment as the input of the large language model, which outputs the corresponding voice reply information. With this solution, once it is determined that the voice manual function is to be started, the voice reply information can be obtained accurately.
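The retrieval step can be illustrated with a small sketch. It assumes the knowledge vector library is held as a list of (text segment, vector) pairs and that similarity is measured by cosine similarity; the source says only "vector similarity", so the metric, the function names and the prompt wording below are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_segment(query_vec: Sequence[float],
                library: List[Tuple[str, Sequence[float]]]) -> str:
    """Return the text segment whose vector is most similar to the query vector."""
    return max(library, key=lambda item: cosine(query_vec, item[1]))[0]


def build_prompt(question: str, segment: str) -> str:
    """Combine the retrieved segment and the question into one model input."""
    return (f"Reference:\n{segment}\n\n"
            f"Question: {question}\n"
            "Answer using only the reference.")
```

A linear scan like this is enough for a single manual; a larger library would normally use an approximate nearest-neighbour index instead.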

Further, after obtaining the voice reply information, the intelligent voice device can broadcast it, so that the user promptly obtains professional knowledge about the intelligent voice device.

With the voice interaction method based on a large language model provided by the embodiments of the present disclosure, when the question text indicates that the user intends to call the voice manual, the question text, the constructed knowledge vector library and the large language model are used to analyze and understand the user's question in depth, and corresponding voice reply information is generated based on that understanding. By controlling the intelligent voice device to broadcast the voice reply information, the user promptly obtains professional knowledge about the intelligent voice device, which improves the user's experience of the device.

FIG. 2 is a schematic diagram of a method for building a knowledge vector library provided by an embodiment of the present disclosure. As shown in FIG. 2, optionally, the knowledge vector library is built as follows:

S21, the intelligent voice device calls the manual that matches the identification information of the intelligent voice device to serve as a knowledge document.

S22, the intelligent voice device adds the knowledge document to the knowledge base documents and performs document processing.

S23, the intelligent voice device uses word embedding technology to convert the processed documents into a vector matrix to obtain the knowledge vector library.

In this solution, the identification information of the intelligent voice device includes its model information. Specifically, the intelligent voice device can call, from the server, the manual that matches its model information. Here the server is a cloud server, which can store the matching relationships between the model information of different intelligent voice devices and their manuals. With this solution, the manual matching the identification information of the intelligent voice device can be called more accurately. After the intelligent voice device calls this manual, the manual can be used as a knowledge document.
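The matching relationship stored on the server can be pictured as a lookup table from model information to a manual. The sketch below is a minimal illustration only: the catalog contents, the model string and the file name are hypothetical, and a real server would expose this lookup through some network interface that the source does not describe.

```python
from typing import Mapping, Optional


def manual_for(model: str, catalog: Mapping[str, str]) -> Optional[str]:
    """Look up the manual for a device model; catalog keys are assumed upper case."""
    return catalog.get(model.strip().upper())


# Hypothetical catalog: model information -> manual file.
catalog = {"MODEL-A": "manual_a.docx"}
print(manual_for(" model-a ", catalog))
```

Normalizing the model string before the lookup avoids misses caused by stray spaces or letter case.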

Further, the intelligent voice device adds the knowledge document to the knowledge base documents. Here, the knowledge base documents also include various technical principle documents, brand promotion documents and similar content. It should be noted that if the knowledge base documents contain content that natural language processing technology cannot process directly, the staff responsible for curation are prompted to organize it manually and convert it into text form. Content that cannot be processed directly by natural language processing technology includes, but is not limited to, image content and table content.

Further, after the intelligent voice device adds the knowledge document to the knowledge base documents, the documents need to be processed. Here, document processing includes, but is not limited to, one or more of document language processing, document format processing and document naming processing. In addition, after the above processing, the documents also need to be split into chunks. This facilitates the subsequent word embedding operation.
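Splitting a document into chunks under a configurable chunk size and chunking method might look like the following sketch. The two method names ("fixed" and "paragraph") and the default size are assumptions chosen for illustration; the source states only that the chunk size and/or chunking method are interface parameters configured by a developer.

```python
from typing import List


def split_document(text: str, chunk_size: int = 200, method: str = "fixed") -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    method="fixed":     cut every chunk_size characters.
    method="paragraph": cut at blank lines first, then cut any paragraph
                        that is still longer than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    def fixed(s: str) -> List[str]:
        return [s[i:i + chunk_size] for i in range(0, len(s), chunk_size)]

    if method == "fixed":
        return fixed(text)
    if method == "paragraph":
        chunks: List[str] = []
        for para in text.split("\n\n"):
            para = para.strip()
            if para:
                chunks.extend(fixed(para))
        return chunks
    raise ValueError(f"unknown method: {method}")
```

Paragraph-based splitting tends to keep one instruction per chunk, while fixed-size splitting is simpler when the manual has no reliable paragraph breaks.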

Further, after the intelligent voice device processes the documents, word embedding technology can be used to convert the processed documents into a vector matrix to obtain the knowledge vector library. Specifically, in the word embedding stage, documents such as manuals are converted into vector matrix form, that is, into a format that a computer can process. As an example, based on testing, the M3E (Moka Massive Mixed Embedding) model, which is designed specifically for Chinese, can be selected to ensure accurate and efficient vectorization. With this solution, the knowledge base documents are converted into a knowledge vector library, which provides data support for the subsequent voice manual application and other related functions.
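To show what the resulting library looks like, the sketch below turns a list of chunks into (chunk, vector) pairs. The `embed` function is a deliberately simple stand-in (a hashed character count, normalized to unit length) so that the example runs without any model download; it is not M3E and carries no semantic meaning. In practice `embed` would be replaced by a call to the chosen embedding model.

```python
import hashlib
import math
from typing import List, Tuple


def embed(text: str, dim: int = 8) -> List[float]:
    """Toy stand-in embedding: hash each non-space character into one of dim buckets."""
    vec = [0.0] * dim
    for ch in text:
        if ch.isspace():
            continue
        bucket = int(hashlib.md5(ch.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_vector_library(chunks: List[str], dim: int = 8) -> List[Tuple[str, List[float]]]:
    """Pair every chunk with its vector; the rows together form the vector matrix."""
    return [(chunk, embed(chunk, dim)) for chunk in chunks]
```

Keeping the original chunk next to its vector means the retrieved row can be handed to the large language model as text without a second lookup.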

Optionally, before the intelligent voice device processes the documents, the method further includes: the intelligent voice device sets different importance levels for different documents. This avoids the situation in which several different replies appear for the same question, and ensures that the intelligent voice device selects the most appropriate reply text according to the importance of the documents.
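One possible reading of how importance levels resolve competing answers is sketched below: among the candidates whose similarity is close to the best score, the one from the most important document wins. The `margin` threshold, the tuple layout and the example values are assumptions; the source does not say how importance and similarity are combined.

```python
from typing import List, Optional, Tuple

# (text segment, similarity to the question, importance level of its document)
Candidate = Tuple[str, float, int]


def pick_reply(candidates: List[Candidate], margin: float = 0.05) -> Optional[str]:
    """Among near-best matches, prefer the segment from the most important document."""
    if not candidates:
        return None
    best = max(sim for _, sim, _ in candidates)
    close = [c for c in candidates if c[1] >= best - margin]
    # Higher importance first; similarity breaks ties between equal importance.
    return max(close, key=lambda c: (c[2], c[1]))[0]


candidates = [
    ("brand promotion text", 0.91, 1),
    ("manual text", 0.90, 3),
    ("unrelated text", 0.50, 5),
]
print(pick_reply(candidates))
```

In this example the manual segment is returned: it is almost as similar as the promotional text but comes from a more important document, while the unrelated segment is excluded despite its high importance.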

可选地，文档处理包括：Optionally, document processing includes:

智能语音设备确定文档中的目标英文内容，将目标英文内容转换为中文内容；目标英文内容包括文档中非智能语音设备专有名词和品牌名称的英文内容。The intelligent voice device determines target English content in the document and converts the target English content into Chinese content; the target English content includes English content of non-intelligent voice device proper nouns and brand names in the document.

和/或，and / or,

智能语音设备确定文档中待转换格式文档，将待转换格式文档转换为目标格式文档；待转换文档为文档中非目标格式的文档。The intelligent voice device determines a document in a format to be converted in the document, and converts the document in the format to be converted into a document in a target format; the document to be converted is a document in a non-target format in the document.

In this solution, it can be understood that different languages map words or other text units into a continuous vector space in different ways. To avoid garbled text or degraded semantic analysis caused by mixing languages, document language processing can be performed. Specifically, the intelligent voice device can determine the target English content in the document, that is, English content that is neither a proper noun of the intelligent voice device nor a brand name. This allows the target English content to be determined accurately. Further, the intelligent voice device can convert the target English content into Chinese content so that the language of the document is uniform.
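The determination of target English content described above can be illustrated with a minimal sketch. It assumes that device proper nouns and brand names are available as a keep list; the list contents, the token pattern, and the function name `find_target_english` are hypothetical and not taken from the disclosure. The sketch covers only detection; the conversion into Chinese is left out.

```python
import re

# Hypothetical keep list: device proper nouns and brand names that stay in English.
KEEP_AS_IS = {"Haier", "WiFi", "M3E"}

# A run of ASCII letters, optionally followed by ASCII letters, digits, or hyphens.
ENGLISH_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9\-]*")


def find_target_english(text: str, keep: set[str] = KEEP_AS_IS) -> list[str]:
    """Return English tokens that are not on the keep list, in order of appearance."""
    kept_lower = {k.lower() for k in keep}
    return [
        match.group(0)
        for match in ENGLISH_TOKEN.finditer(text)
        if match.group(0).lower() not in kept_lower
    ]


sample = "Haier空调支持WiFi连接，请按下power键进入sleep模式"
print(find_target_english(sample))  # ['power', 'sleep']
```

In this sketch the tokens returned are the candidates for translation, while entries on the keep list are left untouched.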

It can be understood that, in testing, documents in docx format outperformed documents in pdf, md, and other formats in processing effect and compatibility. Therefore, docx can be used as the target format. After the intelligent voice device determines the documents to be converted, it converts them into target-format documents. This helps subsequent natural language processing tasks handle the documents more efficiently.

Optionally, document processing includes:

The intelligent voice device adjusts the document name so that the number of characters in the document name falls within a first character-count range.

and/or,

The intelligent voice device adjusts the first title and the second title in the document so that the number of characters in each title falls within a second character-count range.

The minimum value of the first character-count range is greater than the maximum value of the second character-count range.

In this solution, to improve the readability and management efficiency of the documents, the character counts of the document name and of the titles in the document can be adjusted. Specifically, the intelligent voice device can adjust the document name so that its character count falls within the first character-count range. As an example, the first character-count range is 9 to 11. In this way, documents are named with concise, clear words or phrases.

Optionally, meaningless numbers, symbols, or abbreviations are avoided when naming documents, to keep document naming regular.

Optionally, the intelligent voice device tags and weights the paragraphs in the document to distinguish the importance and relevance of different paragraphs.

In this solution, to improve the readability and management efficiency of the documents, the first title and the second title in the document can also be adjusted. Specifically, the intelligent voice device can adjust the first title and the second title so that the character count of each falls within the second character-count range. As an example, the second character-count range is 4 to 6. This keeps the titles concise.
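The two character-count constraints can be checked with a short sketch. The ranges 9 to 11 and 4 to 6 are the example values given above; the function names and sample strings are hypothetical, and characters are counted with Python's `len`, which is an assumption about how the counts are measured.

```python
FIRST_RANGE = (9, 11)   # document name, example values from the text
SECOND_RANGE = (4, 6)   # first and second titles, example values from the text


def ranges_are_consistent(first: tuple[int, int], second: tuple[int, int]) -> bool:
    """The minimum of the first range must exceed the maximum of the second."""
    return first[0] > second[1]


def in_range(name: str, bounds: tuple[int, int]) -> bool:
    """Check whether the character count of `name` lies within `bounds` (inclusive)."""
    low, high = bounds
    return low <= len(name) <= high


print(ranges_are_consistent(FIRST_RANGE, SECOND_RANGE))  # True
print(in_range("空调遥控器使用说明书", FIRST_RANGE))          # True (10 characters)
print(in_range("滤网清洗", SECOND_RANGE))                   # True (4 characters)
```

A name or title that fails the check would be the one the device adjusts; how it is shortened or extended is not specified here.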

Optionally, upon receiving interface parameters, the intelligent voice device cuts the document into chunks according to the interface parameters;

where the interface parameters include a chunk size and/or a chunking mode, and are configured by the developer according to the characteristics of the knowledge base documents.

In this solution, it can be understood that chunking a document's context data is crucial to the efficiency of information retrieval and processing. Developers can therefore configure the interface parameters flexibly according to the characteristics and requirements of the knowledge base documents; that is, the chunk size and chunking mode are exposed as interface parameters. When the intelligent voice device receives the interface parameters, it cuts the document into chunks accordingly. With reasonable cutting, a long document is divided into multiple paragraphs or sections, which allows subsequent natural language processing tasks to analyze and process it at a finer granularity.
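As an illustration of chunking driven by interface parameters, the following sketch assumes two chunking modes, fixed-size and paragraph-based. The class `ChunkParams`, its field names, and the two mode names are hypothetical; the disclosure only states that chunk size and chunking mode are configurable.

```python
from dataclasses import dataclass


@dataclass
class ChunkParams:
    size: int = 200        # maximum characters per chunk
    mode: str = "fixed"    # "fixed" or "paragraph" (assumed mode names)


def split_document(text: str, params: ChunkParams) -> list[str]:
    """Cut `text` into chunks according to the configured size and mode."""
    if params.size <= 0:
        raise ValueError("size must be positive")
    if params.mode == "fixed":
        # Slice the text into consecutive windows of at most `size` characters.
        return [text[i:i + params.size] for i in range(0, len(text), params.size)]
    if params.mode == "paragraph":
        # Pack whole paragraphs into a chunk until the next one would exceed `size`.
        # A single paragraph longer than `size` is kept whole as its own chunk.
        chunks, current = [], ""
        for para in (p.strip() for p in text.split("\n") if p.strip()):
            if current and len(current) + len(para) + 1 > params.size:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n{para}" if current else para
        if current:
            chunks.append(current)
        return chunks
    raise ValueError(f"unknown mode: {params.mode}")


print(split_document("abcdefghij", ChunkParams(size=4)))  # ['abcd', 'efgh', 'ij']
```

Paragraph mode keeps sentences that belong together in one chunk, at the cost of uneven chunk lengths; fixed mode gives uniform lengths but may cut through a sentence.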

FIG. 3 is a schematic diagram of a method for obtaining voice reply information provided by an embodiment of the present disclosure. With reference to FIG. 3, optionally, S12, in which the intelligent voice device obtains the voice reply information of the intelligent voice device according to the question text, the built knowledge vector library, and the large language model, includes:

S31, the intelligent voice device vectorizes the question text to obtain a user question vector.

S32, the intelligent voice device compares the user question vector with the built knowledge vector library to retrieve the text segment with the highest vector similarity.

S33, the intelligent voice device uses the question text and the text segment as the input of the large language model and outputs the corresponding voice reply information.

In this solution, the intelligent voice device can vectorize the question text to obtain the user question vector. Specifically, the user's question text can be converted into vector form through word embedding technology, a technique that maps words or phrases from a vocabulary into a vector space. Through this technique, the intelligent voice device converts text into a numerical form that a computer can process. The vectorized representation retains the semantic information of the text and also facilitates subsequent vector operations and similarity comparison.

Further, the intelligent voice device can compare the user question vector with the built knowledge vector library to retrieve the text segment with the highest vector similarity. Specifically, the user question vector is compared with the vector of each text segment in the knowledge vector library, and the text segment with the highest similarity is retrieved. Here, one or more text segments may be retrieved as having the highest vector similarity.
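Steps S31 and S32 can be sketched as follows. The `embed` function here is a toy stand-in (character-bigram counts) used only so the example runs without a model; it is not the M3E model mentioned earlier, and cosine similarity is an assumed choice of similarity measure. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of character bigrams."""
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, segments: list[str], top_k: int = 1) -> list[str]:
    """Return the `top_k` segments most similar to the question (S31 + S32)."""
    q = embed(question)
    ranked = sorted(segments, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]


segments = ["how to clean the filter", "how to set the timer", "warranty information"]
print(retrieve("clean filter", segments))  # ['how to clean the filter']
```

Setting `top_k` above 1 corresponds to the case where more than one text segment is retrieved. In a real deployment the segment vectors would be computed once when the library is built rather than on every query.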

Optionally, the comparison can be performed by a variety of means, including but not limited to metadata filtering, graph relationship retrieval, and keyword retrieval.

Optionally, metadata filtering includes adding metadata to the data chunks, such as launch date, R&D department, model, and brand. This allows the text segments related to the user's question to be located more quickly, improving retrieval efficiency.
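Metadata filtering can be sketched as a pre-filter that narrows the candidate chunks before any similarity comparison. The `Chunk` class, the metadata keys, and the sample values below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(chunks: list[Chunk], **required) -> list[Chunk]:
    """Keep only chunks whose metadata matches every required key/value pair."""
    return [
        chunk for chunk in chunks
        if all(chunk.metadata.get(key) == value for key, value in required.items())
    ]


chunks = [
    Chunk("Filter cleaning steps", {"model": "AC-100", "brand": "BrandA"}),
    Chunk("Timer setup", {"model": "AC-200", "brand": "BrandA"}),
]
print([c.text for c in filter_by_metadata(chunks, model="AC-100")])
# ['Filter cleaning steps']
```

Only the chunks that survive the filter need to be compared with the user question vector, which is where the efficiency gain comes from.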

Optionally, graph relationship retrieval includes converting the entities in the knowledge base documents into nodes and establishing the relationships between pairs of entities to build a graph data structure. This allows the relationships between pieces of knowledge to be understood more accurately; for multi-hop questions in particular, a graph index can significantly improve retrieval relevance.
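A minimal sketch of graph relationship retrieval follows, assuming that entity pairs have already been extracted from the documents. The undirected adjacency representation, the breadth-first traversal, and the entity names are illustrative assumptions; the disclosure does not specify a graph library or traversal method.

```python
from collections import deque


def build_graph(relations: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def related_entities(graph: dict[str, set[str]], start: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first search: map each entity within `max_hops` of `start` to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


graph = build_graph([
    ("air conditioner", "filter"),
    ("filter", "cleaning steps"),
    ("cleaning steps", "detergent"),
])
print(related_entities(graph, "air conditioner", max_hops=2))
# {'filter': 1, 'cleaning steps': 2}
```

A multi-hop question about the air conditioner can thus reach "cleaning steps" through "filter", even though the two are never linked directly.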

Optionally, keyword retrieval, as a traditional retrieval method, helps quickly find the text chunks related to the user's question and improves retrieval efficiency.

As a preferred solution, retrieval can be further improved by means such as re-ranking, reformulation, hypothetical dialogue embedding (HyDE), and sub-queries.

Specifically, re-ranking reorders the retrieval results according to factors such as combined relevance and matching degree, so that the results better fit the voice manual business scenario.

Reformulation lets the large language model rephrase the question and resubmit it when the retrieval-augmented generation system cannot find relevant context, improving the retrieval success rate.

Hypothetical dialogue embedding generates a hypothetical reply and performs the embedding vector lookup with it together with the user's question, improving the performance of the retrieval-augmented generation system.

Sub-querying decomposes a complex question to improve the reply quality of the large language model.

Further, the intelligent voice device can use the question text and the text segment as the input of the large language model and output the corresponding voice reply information. Specifically, the intelligent voice device inputs the user's question text and the retrieved text segment to the large language model together. The large language model uses its natural language processing capability to combine the user's question with the information in the relevant text segment and generate a reply to the question. This not only yields more accurate and comprehensive answers but also allows replies to be personalized to the user's actual needs.
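Step S33 can be sketched as assembling a single prompt from the question text and the retrieved segments and passing it to the model. The prompt wording and the function names are hypothetical, and the large language model is represented by an arbitrary callable, since the disclosure does not name a specific model or API.

```python
from typing import Callable


def build_prompt(question: str, segments: list[str]) -> str:
    """Combine the question text and the retrieved text segments into one model input."""
    context = "\n".join(f"[{i}] {segment}" for i, segment in enumerate(segments, start=1))
    return (
        "Answer the question using only the reference passages.\n"
        f"Reference passages:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question: str, segments: list[str], llm: Callable[[str], str]) -> str:
    """Send the assembled prompt to the model; `llm` is any text-in, text-out callable."""
    return llm(build_prompt(question, segments))


print(build_prompt("How often should I clean the filter?", ["Rinse the filter monthly."]))
```

Passing the model in as a callable keeps the sketch independent of any particular model service and makes the prompt assembly easy to test on its own.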

FIG. 4 is a schematic diagram of another voice interaction method based on a large language model provided by an embodiment of the present disclosure. With reference to FIG. 4, another voice interaction method based on a large language model provided by an embodiment of the present disclosure includes:

S41, upon receiving voice interaction data input by the user, the intelligent voice device converts the voice interaction data into question text.

S42, when the question text indicates that the user intends to invoke the voice manual, the intelligent voice device obtains the voice reply information of the intelligent voice device according to the question text, the built knowledge vector library, and the large language model.

S43, the intelligent voice device performs a security review on the voice reply information.

S44, when the voice reply information passes the security review, the intelligent voice device broadcasts the voice reply information so that the user learns the professional knowledge about the intelligent voice device in time.

In this solution, after obtaining the voice reply information, the intelligent voice device needs to perform a security review on it to ensure that the generated reply is safe, accurate, lawful, and consistent with the requirements of the business scenario.

Specifically, the security review performed by the intelligent voice device checks whether the voice reply information involves misleading output, privacy leakage, copyright issues, fairness and bias issues, or policy and regulatory issues.

It can be understood that, when generating an answer, a large language model may produce misleading output because of a biased understanding of the context or because of data contamination. Such output may look reasonable while containing erroneous or misleading information, leading the user to wrong decisions or misunderstandings. The answers generated by the large language model therefore need to be strictly reviewed and verified to ensure their accuracy and reliability.

It can be understood that, if a dataset containing sensitive information is used during training, the large language model may learn that information and inadvertently leak it in later interactions. Therefore, when training the large language model, the privacy and security of the dataset need to be ensured, and data containing sensitive information should be avoided.

It can be understood that, when generating an answer, a large language model may copy content from its training data, raising copyright issues. To avoid this, the training data must be lawfully licensed, and the relevant copyright regulations must be observed when the large language model is used.

It can be understood that, because training data is limited and biased, a large language model may output content that reflects social bias and unfairness, which may lead users to misunderstand or discriminate against certain groups or viewpoints. To reduce the effect of bias, the training algorithm and dataset of the large language model need to be optimized to improve the fairness and objectivity of the model.

It can be understood that the output of a large language model may touch on content that is sensitive under laws and regulations, such as politically sensitive or unlawful speech. To avoid violating the relevant laws and regulations, the answers generated by the large language model need to be strictly filtered and reviewed so that their content complies with legal requirements.

Further, when the voice reply information passes the security review, the intelligent voice device can broadcast it so that the user learns the professional knowledge about the intelligent voice device in time. Specifically, when the voice reply information passes the security review, text-to-speech (TTS) technology is used to convert the text into speech, and the reply is delivered to the user in voice form. This makes it easy for the user to obtain professional knowledge in time and improves the user's experience with the intelligent voice device.

FIG. 5 is a schematic diagram of another voice interaction method based on a large language model provided by an embodiment of the present disclosure. With reference to FIG. 5, another voice interaction method based on a large language model provided by an embodiment of the present disclosure includes:

S51, upon receiving voice interaction data input by the user, the intelligent voice device converts the voice interaction data into question text.

S52, when the question text indicates that the user intends to invoke the voice manual, the intelligent voice device obtains the voice reply information of the intelligent voice device according to the question text, the built knowledge vector library, and the large language model.

S53, the intelligent voice device performs a security review on the voice reply information.

S54, when the voice reply information fails the security review, the intelligent voice device broadcasts sensitive-question reply information.

In this solution, after the intelligent voice device performs the security review on the voice reply information, if the voice reply information fails the review, the intelligent voice device broadcasts the sensitive-question reply information. Here, a sensitive-question reply is a preset standard answer for sensitive, controversial, or high-risk questions that may arise. This protects user rights and information security.
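The two outcomes of the security review (S44 and S54) can be sketched as a single gate. The review itself is modeled as a list of check functions; the single check shown, the preset reply text, and all names are hypothetical placeholders, since the disclosure lists the categories to review but not how each is detected.

```python
import re
from typing import Callable, Iterable

# Hypothetical preset standard answer for replies that fail the review.
SENSITIVE_REPLY = "Sorry, I cannot help with that question."


def no_phone_number(reply: str) -> bool:
    """Example privacy check: fail if the reply contains an 11-digit number."""
    return re.search(r"\d{11}", reply) is None


def select_broadcast_text(
    reply: str,
    checks: Iterable[Callable[[str], bool]],
    fallback: str = SENSITIVE_REPLY,
) -> str:
    """Return the reply if every check passes, otherwise the preset fallback."""
    return reply if all(check(reply) for check in checks) else fallback


print(select_broadcast_text("Rinse the filter monthly.", [no_phone_number]))
# Rinse the filter monthly.
print(select_broadcast_text("Call 13800000000 for help.", [no_phone_number]))
# Sorry, I cannot help with that question.
```

The text returned by the gate is what would then be handed to text-to-speech for broadcast.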

Optionally, an embodiment of the present disclosure provides a voice interaction method based on a large language model, including:

Upon receiving voice interaction data input by the user, the intelligent voice device converts the voice interaction data into question text.

The intelligent voice device invokes the exclusive vector library according to the user's identification information.

When the question text indicates that the user intends to invoke the voice manual, the intelligent voice device obtains the voice reply information of the intelligent voice device according to the question text, the exclusive vector library, and the large language model.

The intelligent voice device broadcasts the voice reply information, providing the user with a personalized voice service.

With the voice interaction method based on a large language model provided by the embodiments of the present disclosure, the voice interaction data can be accurately converted into question text, and the exclusive vector library for the user is invoked according to the user's identification information. When the user's intention to invoke the voice manual is recognized, the large language model combines the question text with the exclusive vector library to generate an intelligent voice reply. The reply matches the user's question and also takes the user's personalized needs into account, so its content is closer to the user's expectations. In addition, because the intelligent voice device broadcasts the generated voice reply, the user obtains the required voice manual information without additional operations, which improves the convenience and efficiency of the interaction and the user experience.

Optionally, the exclusive vector library is built as follows:

The intelligent voice device obtains the user privacy documents, the user history documents, and the instruction manual documents of the intelligent voice device that are needed to build the exclusive vector library.

The intelligent voice device adds the user privacy documents, the user history documents, and the instruction manual documents of the intelligent voice device to the knowledge base documents and performs document processing.

The intelligent voice device uses word embedding technology to convert the processed documents into vector matrices to obtain the exclusive vector library.

In this way, the knowledge base documents are converted into an exclusive vector library. The private information in this vector library belongs only to the corresponding user, so personalized service is provided without the risk of information leakage. The library also provides data support for the subsequent voice manual application and other related functions.

Optionally, the intelligent voice device obtains the user history documents as follows:

When the user completes an interaction through the voice manual function of the intelligent voice device, the intelligent voice device determines whether it has received positive feedback from the user.

When the intelligent voice device has received positive feedback from the user, it compares the dialogue of that interaction with the pre-stored documents for similarity.

If the similarity between the dialogue of that interaction and the pre-stored documents is lower than a preset threshold, the intelligent voice device uses the dialogue of that interaction as a user history document.

In this solution, the user can complete an interaction through the voice manual function. Specifically, after the user completes an interaction through the voice manual function of the intelligent voice device, the semantic assistant can ask the user about the experience by voice or through the application. The user's answer is then used to determine whether positive feedback has been received. For example, if the user replies with a positive evaluation, the feedback is determined to be positive. In this way, the user's feedback can be judged accurately from the user's evaluation of the interaction.

Further, when the intelligent voice device has received positive feedback from the user, it can compare the dialogue of that interaction with the pre-stored documents for similarity. Specifically, this comparison includes:

When the intelligent voice device has received positive feedback from the user, it extracts the question-answer pair of that interaction; it vectorizes the extracted question-answer pair to obtain the text vector of the question-answer pair; and it compares the text vector of the question-answer pair with the document vectors pre-stored in the current exclusive vector library for similarity, where the question-answer pair is one round or multiple rounds of dialogue. In this way, whether the dialogue of that interaction is similar to the documents pre-stored in the exclusive vector library can be determined accurately.

Further, the intelligent voice device can pre-store a preset threshold chosen according to the comparison requirements. If the similarity between the dialogue of that interaction and the pre-stored documents is lower than the preset threshold, the dialogue of that interaction is used as a user history document. In this way, user history documents are obtained accurately, which improves the degree of personalization of the voice reply information output by the large language model.
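The decision of whether a dialogue becomes a user history document can be sketched as below. The sketch assumes that the question-answer pair and the stored documents are already vectorized, that similarity is measured by cosine similarity, and that "similarity to the pre-stored documents" means the highest similarity to any stored document vector; the threshold value 0.8 and the function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def should_store_as_history(
    qa_vector: list[float],
    stored_vectors: list[list[float]],
    positive_feedback: bool,
    threshold: float = 0.8,  # assumed value; the text only says "preset threshold"
) -> bool:
    """Store the dialogue only after positive feedback and only if it is not already covered."""
    if not positive_feedback:
        return False
    highest = max((cosine(qa_vector, v) for v in stored_vectors), default=0.0)
    return highest < threshold


print(should_store_as_history([1.0, 0.0], [[0.0, 1.0]], positive_feedback=True))  # True
print(should_store_as_history([1.0, 0.0], [[1.0, 0.0]], positive_feedback=True))  # False
```

Requiring low similarity means that only dialogues contributing something not already in the exclusive vector library are added, which keeps the library from accumulating near-duplicates.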

Optionally, the intelligent voice device obtains the user privacy documents as follows:

The intelligent voice device uses the user portrait document as the user privacy document; or, the intelligent voice device uses the user portrait document together with the privacy documents uploaded by the user as the user privacy documents. The user portrait document is obtained by matching the user's identification information and includes one or more of the user's income, assets, marital status, living situation, family composition, age, and region. The privacy documents uploaded by the user include one or more of the user's sleep-related information and the user's housing-related information.

In this solution, the intelligent voice device can retrieve the user portrait document and the privacy documents uploaded by the user. Specifically, the user portrait document can be matched in the intelligent voice device by the user's identification information. Here, the user portrait document is the content obtained through user portrait analysis. It includes one or more of the user's income, assets, marital status, living situation, family composition, age, and region, and may also contain information related to these items. When the intelligent voice device has not received any privacy document uploaded by the user, the user portrait document is used as the user privacy document; when it has received privacy documents uploaded by the user, the user portrait document and the uploaded privacy documents are used together as the user privacy documents. Here, the privacy documents uploaded by the user include one or more of the user's sleep-related information and the user's housing-related information. The sleep-related information includes the user's sleep time and sleep state. The housing-related information includes the floor area and orientation of the user's home, among others. In this way, user privacy documents are obtained accurately, which improves the degree of personalization of the voice reply information output by the large language model.

Optionally, the intelligent voice device obtains its instruction manual document as follows:

According to a preset association, the intelligent voice device uses the instruction manual document associated with the identification information of the intelligent voice device as its instruction manual document.

In this solution, the intelligent voice device can also obtain its own identification information from its ingress gateway. Here, the identification information of the intelligent voice device is its model. The intelligent voice device can then, according to the preset association, use the instruction manual document associated with its model as its instruction manual document. In this way, the instruction manual document of the intelligent voice device is obtained accurately.

Optionally, the intelligent voice device obtains the voice reply information of the intelligent voice device according to the question text, the exclusive vector library, and the large language model as follows:

The intelligent voice device vectorizes the question text to obtain a user question vector.

The intelligent voice device compares the user question vector with the exclusive vector library to retrieve the text segment with the highest vector similarity.

The intelligent voice device uses the question text and the text segment as the input of the large language model and outputs the corresponding voice reply information.

This not only yields more accurate and comprehensive answers but also allows replies to be personalized to the user's actual needs.

图6是本公开实施例提供的一个基于大语言模型的语音交互装置示意图；结合图6所示，本公开实施例提供一种基于大语言模型的语音交互装置200，包括处理器(processor)201和存储器(memory)202。可选地，该装置200还可以包括通信接口(Communication Interface)203和总线204。其中，处理器201、通信接口203、存储器202可以通过总线204完成相互间的通信。通信接口203可以用于信息传输。处理器201可以调用存储器202中的逻辑指令，以执行上述实施例的基于大语言模型的语音交互方法。FIG6 is a schematic diagram of a speech interaction device based on a large language model provided in an embodiment of the present disclosure; in combination with FIG6, an embodiment of the present disclosure provides a speech interaction device 200 based on a large language model, including a processor (processor) 201 and a memory (memory) 202. Optionally, the device 200 may also include a communication interface (Communication Interface) 203 and a bus 204. Among them, the processor 201, the communication interface 203, and the memory 202 can communicate with each other through the bus 204. The communication interface 203 can be used for information transmission. The processor 201 can call the logic instructions in the memory 202 to execute the speech interaction method based on the large language model of the above embodiment.

In addition, when the logic instructions in the memory 202 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium.

The memory 202, as a computer-readable storage medium, can be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. By running the program instructions/modules stored in the memory 202, the processor 201 executes functional applications and data processing, that is, implements the voice interaction method based on a large language model of the above embodiments.

The memory 202 may include a program storage area and a data storage area. The program storage area may store an operating system and an application required for at least one function; the data storage area may store data created through use of the terminal device, and the like. In addition, the memory 202 may include a high-speed random access memory, and may also include a non-volatile memory.

An embodiment of the present disclosure provides an intelligent voice device, including an intelligent voice device body and the above-mentioned voice interaction device 200 based on a large language model. The voice interaction device 200 is installed on the intelligent voice device body. The installation relationship described here is not limited to placement inside the intelligent voice device body; it also includes installation connections with other components of the intelligent voice device, including but not limited to physical connections, electrical connections, or signal transmission connections. Those skilled in the art will understand that the voice interaction device 200 based on a large language model can be adapted to any feasible intelligent voice device body, thereby realizing other feasible embodiments.

An embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to execute the above-mentioned voice interaction method based on a large language model.

The technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium may be a non-transitory storage medium, for example a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or another medium that can store program code.

The above description and the accompanying drawings sufficiently illustrate the embodiments of the present disclosure to enable those skilled in the art to practice them. Other embodiments may include structural, logical, electrical, process, and other changes. The embodiments represent only possible variations. Unless explicitly required, individual components and functions are optional, and the order of operations may vary. Parts and features of some embodiments may be included in, or substituted for, parts and features of other embodiments. Moreover, the words used in this application are only used to describe the embodiments and are not used to limit the claims. As used in the description of the embodiments and the claims, unless the context clearly indicates otherwise, the singular forms "a", "an" and "the" are intended to include the plural forms as well. Similarly, the term "and/or" as used in this application refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, when used in this application, the term "comprise" and its variants "comprises" and/or "comprising" specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, or device that comprises the element. Herein, each embodiment may focus on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. For the methods, products, and the like disclosed in the embodiments, if they correspond to the method part disclosed in the embodiments, reference may be made to the description of the method part for the relevant details.

Those skilled in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software may depend on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to exceed the scope of the embodiments of the present disclosure. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the aforementioned method embodiments, and are not repeated here.

In the embodiments disclosed herein, the disclosed methods and products (including but not limited to devices, equipment, etc.) can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units may be only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to implement the present embodiment. In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

The flowcharts and block diagrams in the accompanying drawings show the possible architecture, functions, and operations of systems, methods, and computer program products according to the embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functions involved. Likewise, in the descriptions corresponding to the flowcharts and block diagrams, the operations or steps corresponding to different blocks may occur in an order different from that disclosed in the description, and sometimes there is no specific order between different operations or steps. Each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or actions, or with a combination of dedicated hardware and computer instructions.

Claims

1. A voice interaction method based on a large language model, comprising:

upon receiving voice interaction data input by a user, converting the voice interaction data into a question text;

in a case where the question text indicates that the user intends to invoke a voice instruction manual, obtaining voice reply information of an intelligent voice device according to the question text, a built knowledge vector library, and the large language model; and

controlling the intelligent voice device to broadcast the voice reply information, so that the user can obtain professional knowledge about the intelligent voice device in a timely manner.

2. The method according to claim 1, wherein the knowledge vector library is built by:

retrieving an instruction manual that matches identification information of the intelligent voice device, to serve as a knowledge document;

adding the knowledge document to a knowledge base document, and performing document processing; and

converting the processed document into a vector matrix by using a word embedding technique, to obtain the knowledge vector library.

3. The method according to claim 2, wherein the document processing comprises:

determining target English content in the document, and converting the target English content into Chinese content, wherein the target English content comprises English content of proper nouns and brand names of non-intelligent voice equipment in the document;

and/or

determining a to-be-converted format document in the document, and converting the to-be-converted format document into a target format document, wherein the to-be-converted format document is a document in a non-target format in the document.

4. The method according to claim 2, wherein the document processing comprises:

adjusting the document name so that the word count of the document name falls within a first word count range;

and/or

adjusting the first title and the second title in the document so that the word counts of the first title and the second title fall within a second word count range;

wherein the minimum value of the first word count range is greater than the maximum value of the second word count range.

5. The method according to claim 3 or 4, further comprising:

upon receiving interface parameters, splitting the document into blocks according to the interface parameters;

wherein the interface parameters comprise a block size and/or a blocking mode, and the interface parameters are configured by a developer according to characteristics of the knowledge base document.

6. The method according to any one of claims 1 to 5, wherein obtaining the voice reply information of the intelligent voice device according to the question text, the built knowledge vector library, and the large language model comprises:

vectorizing the question text to obtain a user question vector;

comparing the user question vector with the built knowledge vector library to retrieve the text segment with the highest vector similarity; and

taking the question text and the text segment as input information of the large language model, and outputting the corresponding voice reply information.

7. The method according to any one of claims 1 to 5, wherein controlling the intelligent voice device to broadcast the voice reply information, so that the user can obtain professional knowledge about the intelligent voice device in a timely manner, comprises:

performing a security examination on the voice reply information; and

in a case where the voice reply information passes the security examination, controlling the intelligent voice device to broadcast the voice reply information, so that the user can obtain professional knowledge about the intelligent voice device in a timely manner.

8. The method according to claim 7, further comprising:

in a case where the voice reply information does not pass the security examination, controlling the intelligent voice device to broadcast sensitive-question reply information.

9. A voice interaction device based on a large language model, comprising a processor and a memory storing program instructions, wherein the processor is configured to perform, when running the program instructions, the voice interaction method based on a large language model according to any one of claims 1 to 8.

10. An intelligent voice device, comprising:

an intelligent voice device body; and

the voice interaction device based on a large language model according to claim 9, installed on the intelligent voice device body.
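For illustration only, and forming no part of the claims, the block splitting of claim 5 and the security gating of claims 7 and 8 might be sketched as below. The publication does not specify how blocks are cut or how the security examination is performed, so everything here is an assumption: the two blocking modes ("fixed" and "paragraph"), the keyword-based check, the fallback sentence, and all function and parameter names are hypothetical.

```python
def chunk_document(text: str, block_size: int, mode: str = "fixed") -> list[str]:
    """Split a document into blocks according to caller-supplied parameters."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if mode == "fixed":
        # Fixed-width blocks of at most block_size characters.
        return [text[i:i + block_size] for i in range(0, len(text), block_size)]
    if mode == "paragraph":
        # Greedily merge paragraphs until adding one more would exceed block_size.
        # A single paragraph longer than block_size is kept whole.
        chunks: list[str] = []
        current = ""
        for para in (p.strip() for p in text.split("\n\n")):
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > block_size:
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
        return chunks
    raise ValueError(f"unknown blocking mode: {mode}")


# Assumed fallback text; the publication does not give the actual wording.
SENSITIVE_REPLY = "Sorry, I cannot answer that question."


def reply_to_broadcast(reply: str, blocked_terms: set[str]) -> str:
    """Return the reply if it passes a simple keyword check, else the fallback."""
    lowered = reply.lower()
    if any(term.lower() in lowered for term in blocked_terms):
        return SENSITIVE_REPLY
    return reply
```

A keyword list is the simplest possible stand-in for a security examination; a production system would more plausibly use a dedicated moderation model or service.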