CN117975968B - Remote patrol system control method and system based on sound and language model - Google Patents

Info

Publication number
CN117975968B
Authority
CN
China
Prior art keywords
data
intention
remote
model
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410389170.7A
Other languages
Chinese (zh)
Other versions
CN117975968A (en
Inventor
景志斌
陈果累
何佳
叶俊
李孟福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Kingscheme Information Technology Co ltd
Original Assignee
Sichuan Kingscheme Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Kingscheme Information Technology Co ltd
Priority to CN202410389170.7A
Publication of CN117975968A
Application granted
Publication of CN117975968B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuit breaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • H02J13/00002 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuit breaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network, characterised by monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Power Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a remote patrol system control method and system based on sound and language models. It relates to the field of intelligent patrol of transformer substations and addresses the problems that existing remote patrol systems are cumbersome to operate, error-prone in queries, and slow to notify results. The method comprises: acquiring voice data and converting it into text data with a fine-tuned speech recognition model; parsing intent data from the text data with a remote patrol expert model, the intent data comprising the API interface of the service to be called and its interface parameters; parsing the interface parameters in the intent data and enhancing them to obtain enhanced intent data; generating a service call request from the enhanced intent data and calling the remote patrol system through that request to execute the service and obtain a service result; inputting the service result into the remote patrol expert model to generate reply text data; and converting the reply text data into reply voice data for output. The remote patrol system can thus be controlled by voice.

Description

A remote patrol system control method and system based on sound and language models

Technical Field

The present invention relates to the field of intelligent patrol of transformer substations, and more specifically to a remote patrol system control method and system based on sound and language models.

Background Art

The substation remote patrol system, planned by the State Grid Corporation of China, uses robots, drones, voiceprint devices, cameras and similar equipment as perception-layer devices, and generates patrol results after an algorithm host intelligently analyzes the collected data. The system performs patrol tasks on substation equipment mainly when staff start a patrol manually, when a periodic patrol task is scheduled, or when an alarm signal from the main or auxiliary system triggers one. Patrol tasks are organized around patrol points. A patrol point is a business-level checkpoint, and a single piece of equipment generally has several. A minimum business monitoring point is usually built from the preset positions of one or more cameras, drones or robots, and is used to observe whether the components at each patrol point are defective. The system includes a retrieval module based on point attributes; after querying patrol points, staff can monitor and view equipment, start patrol tasks, configure patrol devices, and perform other operations.

However, patrol points carry many business attributes, such as substation area, substation bay, equipment name, component name, phase name, point name, point code, patrol type, importance level, recognition algorithm and perception-layer device. Manual retrieval therefore has low query accuracy and is cumbersome, and in an emergency the state of the equipment cannot be checked immediately, especially for substations without main and auxiliary systems or for patrol points those systems do not cover. In addition, because many patrol points are involved, a patrol takes a long time, and on-site staff may not see the patrol results as soon as they are generated.

Summary of the Invention

The purpose of this application is to provide a remote patrol system control method and system based on sound and language models that solves the problems of cumbersome operation, error-prone queries and untimely result notification in existing remote patrol systems. It extends the existing substation remote patrol system so that the system is controlled through speech and large language models: staff interact with the remote patrol system by voice input, direct it to complete patrol-related work quickly, and receive voice feedback on the results of patrol tasks. This simplifies the operating procedure, lowers the difficulty of operation, and lets staff learn task results promptly.

This application first provides a remote patrol system control method based on sound and language models, comprising: acquiring voice data and converting the voice data into text data with a fine-tuned speech recognition model; parsing intent data from the text data with a remote patrol expert model, the intent data comprising the API interface of the service to be called and its interface parameters, and the remote patrol expert model being obtained by training a SOTA large language model; parsing the interface parameters in the intent data and enhancing them to obtain enhanced intent data; generating a service call request from the enhanced intent data and calling the remote patrol system through the service call request to execute the service and obtain a service result; inputting the service result into the remote patrol expert model to generate reply text data; and converting the reply text data into reply voice data for output.

With this technical solution, acquiring the staff's voice data is enough to call the service interface of the remote patrol system and have it execute a service, and the service result is likewise returned as speech, which simplifies the staff's operating procedure. In addition, intent enhancement of the interface parameters makes calls precise and lowers the demands placed on staff.

In a possible implementation, the fine-tuned speech recognition model is obtained as follows: a SOTA speech recognition model is acquired and then fine-tuned on speech and text data specific to the electric power domain.

In a possible implementation, the remote patrol expert model is obtained as follows: acquiring the API interfaces of the remote patrol system and their annotations to form an intent implementation seed library; supplementing each API interface in the seed library with context about the remote patrol system and with placeholders to generate a placeholder intent expression; inputting the placeholder intent expressions into a general large language model for expansion to form a placeholder intent expression dataset; manually reviewing the placeholder intent expression dataset and removing erroneous placeholder intent expressions; replacing the placeholders in the dataset with the power equipment information, point information and perception-layer device information of the remote patrol system to obtain an intent expression dataset; and training a large language model on the intent expression dataset to obtain the remote patrol expert model.
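The placeholder replacement step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `fill_placeholders` and the entity values are hypothetical, and the patent does not say whether every combination of entities is generated or only a sample.

```python
import itertools
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_placeholders(template, entities):
    """Yield one concrete sentence per combination of entity values."""
    names = sorted(set(PLACEHOLDER.findall(template)))
    pools = [entities[name] for name in names]
    for combo in itertools.product(*pools):
        values = dict(zip(names, combo))
        yield PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Hypothetical entity values. In the described method they would come from the
# equipment, point and perception-layer device records of the remote patrol system.
ENTITIES = {
    "deviceName": ["No. 1 main transformer", "110 kV bus"],
    "partName": ["bushing", "oil level gauge"],
}

sentences = list(
    fill_placeholders("Point the camera at the {partName} of {deviceName}", ENTITIES)
)
for sentence in sentences:
    print(sentence)
```

A template without placeholders is passed through unchanged, so reviewed expressions that take no parameters can go through the same function.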

In a possible implementation, the enhanced intent data is obtained as follows: synchronizing the relational data and entity data of the remote patrol system to generate a document database, and vectorizing the entity data with a word embedding model to generate a vector database; retrieving the interface parameters in the document database and in the vector database separately, and intersecting the two sets of retrieval results to obtain the enhanced interface parameters; and combining the enhanced interface parameters with the API interface of the service to be called to form the enhanced intent data.
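The two-way retrieval and intersection can be sketched in Python as follows. This is a toy illustration, not the patented implementation: token overlap stands in for the document database, a character-bigram count vector stands in for the word embedding model, and the entity names and the `top_k` cutoff are assumptions.

```python
import math
from collections import Counter

# Hypothetical canonical entity names synchronized from the remote patrol system.
ENTITY_NAMES = [
    "No. 1 main transformer",
    "No. 2 main transformer",
    "110 kV bus",
    "No. 1 capacitor bank",
]


def keyword_search(query, names):
    """Stand-in for the document database: names sharing a token with the query."""
    tokens = set(query.lower().split())
    return {name for name in names if tokens & set(name.lower().split())}


def embed(text):
    """Toy bigram count vector; a real system would use a word embedding model."""
    lowered = text.lower()
    return Counter(lowered[i:i + 2] for i in range(len(lowered) - 1))


def cosine(a, b):
    dot = sum(a[key] * b[key] for key in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_search(query, names, top_k=2):
    """Stand-in for the vector database: the top_k most similar names, best first."""
    query_vec = embed(query)
    ranked = sorted(names, key=lambda n: cosine(query_vec, embed(n)), reverse=True)
    return ranked[:top_k]


def enhance_parameter(query, names=ENTITY_NAMES):
    """Keep only names returned by both retrievers, ordered by vector similarity."""
    keyword_hits = keyword_search(query, names)
    return [name for name in vector_search(query, names) if name in keyword_hits]


candidates = enhance_parameter("number 1 transformer")
print(candidates)
```

An empty result means the two retrievers disagree, which a caller could treat as a cue to ask the operator to repeat or clarify the request.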

In a possible implementation, the method further comprises: acquiring alarm text from the remote patrol system and converting the alarm text into voice output.

This application also provides a remote patrol system control system based on sound and language models, comprising: a speech recognition module for acquiring voice data and converting the voice data into text data with a fine-tuned speech recognition model; a large language model module for parsing intent data from the text data with a remote patrol expert model, the intent data comprising the API interface of the service to be called and its interface parameters, and the remote patrol expert model being obtained by training a SOTA large language model; a retrieval enhancement module for parsing the interface parameters in the intent data and enhancing them to obtain enhanced intent data; an intent calling module for generating a service call request from the enhanced intent data and calling the remote patrol system through the service call request to execute the service and obtain a service result; the large language model module being further used to input the service result into the remote patrol expert model to generate reply text data; and a speech generation module for converting the reply text data into reply voice data for output.

In a possible implementation, the speech recognition module is further used to acquire a SOTA speech recognition model and fine-tune it on speech and text data specific to the electric power domain to obtain the fine-tuned speech recognition model.

In a possible implementation, the large language model module is further used to: acquire the API interfaces of the remote patrol system and their annotations to form an intent implementation seed library; supplement each API interface in the seed library with context about the remote patrol system and with placeholders to generate a placeholder intent expression; input the placeholder intent expressions into a general large language model for expansion to form a placeholder intent expression dataset; manually review the placeholder intent expression dataset and remove erroneous placeholder intent expressions; replace the placeholders in the dataset with the power equipment information, point information and perception-layer device information of the remote patrol system to obtain an intent expression dataset; and train a large language model on the intent expression dataset to obtain the remote patrol expert model.

In a possible implementation, the retrieval enhancement module is further used to: synchronize the relational data and entity data of the remote patrol system to generate a document database, and vectorize the entity data with a word embedding model to generate a vector database; retrieve the interface parameters in the document database and in the vector database separately, and intersect the two sets of retrieval results to obtain the enhanced interface parameters; and combine the enhanced interface parameters with the API interface of the service to be called to form the enhanced intent data.

In a possible implementation, the speech generation module is further used to acquire alarm text from the remote patrol system and convert the alarm text into voice output.

Compared with the prior art, this application has the following beneficial effects. It collects the staff's voice data, converts the voice data into text data with the speech recognition model, parses intent data with the remote patrol expert model, calls the corresponding service of the remote patrol system through the intent data, converts the structured service result into fluent text with the remote patrol expert model, and finally outputs it as speech. Staff can control the remote patrol system by voice alone and receive the task result as speech, which simplifies the operating procedure of the remote patrol system and makes task results available promptly.

This application enhances the interface parameters in the intent data by combining a document database and a vector database to retrieve the expression closest to each interface parameter. This ensures precise control, avoids control errors caused by non-standard wording from staff, improves fault tolerance, and lowers the operating requirements placed on staff.

This application forms the intent implementation seed library from API interfaces and their annotations, generates placeholder intent expressions by supplementing context about the remote patrol system and placeholders, expands the training corpus with a general large language model, obtains the intent expression dataset by replacing placeholders with entities, and trains the remote patrol expert model on that dataset. Given a piece of text data, the model parses the API interface of the service to be called and its interface parameters, which are then used to control the remote patrol system to perform tasks.

Brief Description of the Drawings

The drawings described here provide a further understanding of the embodiments of the present invention and form part of this application; they do not limit the embodiments of the present invention. In the drawings:

FIG. 1 is a flow chart of the remote patrol system control method based on sound and language models;

FIG. 2 is a flow chart of training the remote patrol expert model;

FIG. 3 is a flow chart of retrieval enhancement of the interface parameters;

FIG. 4 is a structural diagram of the remote patrol system control system based on sound and language models.

Detailed Description

To make the purpose, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the embodiments and drawings. The illustrative embodiments and their descriptions serve only to explain this application and do not limit it.

Referring to FIG. 1, which is a flow chart of the remote patrol system control method based on sound and language models, the method comprises: S1, acquiring voice data and converting the voice data into text data with a fine-tuned speech recognition model; S2, parsing intent data from the text data with a remote patrol expert model, the intent data comprising the API interface of the service to be called and its interface parameters, and the remote patrol expert model being obtained by training a SOTA large language model; S3, parsing the interface parameters in the intent data and enhancing them to obtain enhanced intent data; S4, generating a service call request from the enhanced intent data and calling the remote patrol system through the service call request to execute the service and obtain a service result; S5, inputting the service result into the remote patrol expert model to generate reply text data; S6, converting the reply text data into reply voice data for output.

Specifically, the existing remote patrol system is improved as follows: the staff's voice data is captured by an audio acquisition device and converted into text data by the speech recognition model; the remote patrol expert model parses the intent data; the interface parameters are refined by retrieval so that the intent data becomes precise and enhanced; the remote patrol system is then called through the intent data to execute the corresponding service task; and the returned task result is converted into fluent text by the remote patrol expert model and finally output as speech.
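The order of steps S1 to S6 can be sketched in Python as a single function that chains six pluggable stages. Every name here (`Intent`, `handle_utterance`, the stage parameters) is hypothetical, and the stand-in lambdas only mimic what the real models and the remote patrol system would return.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Intent:
    """Intent data: the API interface to call and its interface parameters."""
    api: str
    params: Dict[str, str] = field(default_factory=dict)


def handle_utterance(audio, transcribe, parse_intent, enhance,
                     call_service, compose_reply, synthesize):
    text = transcribe(audio)          # S1: speech to text
    intent = parse_intent(text)       # S2: text to intent data
    enhanced = enhance(intent)        # S3: retrieval-enhanced parameters
    result = call_service(enhanced)   # S4: call the remote patrol system
    reply = compose_reply(result)     # S5: structured result to reply text
    return synthesize(reply)          # S6: reply text to speech


# Stand-in stages so the flow can be run without any model or backend.
reply_audio = handle_utterance(
    b"show me number 1 transformer",
    transcribe=lambda audio: audio.decode("utf-8"),
    parse_intent=lambda text: Intent(
        "cameraFoucsDevice", {"deviceName": "number 1 transformer"}),
    enhance=lambda intent: Intent(
        intent.api, {"deviceName": "No. 1 main transformer"}),
    call_service=lambda intent: {"ok": True, **intent.params},
    compose_reply=lambda result: (
        f"The camera now points at {result['deviceName']}."),
    synthesize=lambda text: text.encode("utf-8"),
)
print(reply_audio)
```

Passing the stages as arguments keeps each model replaceable, which matches the way the description treats the speech recognition model and the expert model as separately trained components.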

The improvement of this solution is that the staff's voice data alone is enough to call the service interface of the remote patrol system, have it execute a service, and present the result as speech. The remote patrol system is thus controlled by voice and reports results promptly by voice, which simplifies the staff's operating procedure. In addition, retrieval over the interface parameters makes calls precise and lowers the demands placed on staff.
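As a rough Python illustration of turning enhanced intent data into a service call request, the sketch below builds (but does not send) a form-encoded POST request. The base URL is a placeholder, and the assumption that the API interface name maps directly to the request path is taken from the `/cameraFoucsDevice` sample interface quoted later in this description; the patent does not specify the transport details.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://patrol.example"  # hypothetical address of the remote patrol system


def build_service_request(api, params, base_url=BASE_URL):
    """Build a form-encoded POST request for one API interface call."""
    body = urlencode(params).encode("utf-8")
    return Request(
        url=f"{base_url}/{api}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_service_request(
    "cameraFoucsDevice",
    {"deviceName": "No. 1 main transformer", "partName": "bushing"},
)
print(request.get_method(), request.full_url)
print(request.data)
```

The request could then be sent with `urllib.request.urlopen(request)` and the response body passed on as the service result.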

Step S1 acquires voice data and converts it into text data with the fine-tuned speech recognition model. To improve the model's recognition of vocabulary specific to the electric power domain, the speech recognition model is fine-tuned on pre-recorded domain-specific speech and text data. For example, Whisper, open-sourced by OpenAI, is used as the speech recognition model and is fine-tuned on pre-recorded power-domain speech and the corresponding text, which raises its recognition accuracy for power-industry terminology. Further, to reduce the computing cost of fine-tuning, LoRA or QLoRA can be used.
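The cost reduction from LoRA can be made concrete with a small calculation. Instead of updating a full d_out x d_in weight matrix, LoRA trains a low-rank update B @ A with B of shape d_out x r and A of shape r x d_in. The sizes in this Python sketch are illustrative only; they are not taken from the patent or from any particular Whisper checkpoint.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when a d_out x d_in matrix is updated directly."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a rank-r update B @ A (B: d_out x r, A: r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (an assumption, not a value from the source).
D_OUT, D_IN, RANK = 1024, 1024, 8
full = full_finetune_params(D_OUT, D_IN)
lora = lora_params(D_OUT, D_IN, RANK)
ratio = lora / full
print(f"full: {full}, LoRA: {lora}, ratio: {ratio:.4%}")
```

QLoRA applies the same low-rank update on top of a quantized, frozen base model, which further reduces memory use during fine-tuning.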

Step S2 parses intent data from the text data with the remote patrol expert model; the intent data comprises the API interface of the service to be called and its interface parameters. The purpose of the remote patrol expert model is to parse the intent of the text produced by the speech recognition model and to simulate human natural-language dialogue. An open-source large language model, for example the ChatGLM3-6B model, can be trained for intent recognition to obtain the remote patrol expert model. Traditionally, training a large language model for intent recognition requires a large amount of manually written corpus data. To reduce this manual effort, this solution uses a general large language model to write the training corpus, and the model trained on that corpus becomes the remote patrol expert model.

Referring to FIG. 2, which is a flow chart of training the remote patrol expert model, the remote patrol expert model is obtained as follows:

S21. Acquire the API interfaces of the remote patrol system and their annotations to form the intent implementation seed library. Specifically, the source code of the API interfaces in the existing remote patrol system that need voice interaction is collected; annotations that are missing or unclear in the source are supplemented and adjusted manually; and the implementation part of each API interface is removed from the source. Each API interface together with its annotation forms an API interface pair, and multiple API interface pairs form the intent implementation seed library.
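Removing the implementation part of an interface can be approximated with a few lines of Python. This is a simplified sketch: it assumes each source string holds exactly one method and that annotation strings contain balanced parentheses, and the `startTask` method is a made-up example rather than an interface of the real system.

```python
def strip_body(java_method: str) -> str:
    """Cut the method at its first top-level '{' and leave an empty body."""
    depth = 0
    for index, char in enumerate(java_method):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "{" and depth == 0:
            return java_method[:index].rstrip() + " {}"
    return java_method


def build_seed_library(sources):
    """Each entry keeps the annotations and signature but no implementation."""
    return [strip_body(source) for source in sources]


# Hypothetical interface source, for illustration only.
HYPOTHETICAL_SOURCE = (
    '@ApiOperation(value = "Start a patrol task")\n'
    'public WrappedResult<Boolean> startTask('
    '@RequestParam("taskId") String taskId) {\n'
    '    return service.start(taskId);\n'
    '}'
)

seed_library = build_seed_library([HYPOTHETICAL_SOURCE])
print(seed_library[0])
```

A production version would more likely rely on a Java parser, since string literals containing braces or unbalanced parentheses would defeat this character scan.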

S22. Supplement each API interface in the intent implementation seed library with context about the remote patrol system and with placeholders to generate a placeholder intent expression. S23. Input the placeholder intent expressions into a general large language model for expansion to form the placeholder intent expression dataset. Specifically, a placeholder intent expression is a piece of text stating that the API interface needs to be called to perform an operation on the placeholder. The placeholder intent expression is given to a general large language model (LLM), which performs the intent expression generation task and returns multiple placeholder intent expressions that make up the placeholder intent expression dataset. The general large language model can be a commercial or open-source model, such as ChatGPT, ChatGLM, Wenxin Yiyan or Llama 2.

For example, the ChatGPT 3.5-Turbo model is used to perform the intent expression generation task. One possible input and output is listed below: the input is a single placeholder intent expression, and the output is a set of expanded placeholder intent expressions used to build the placeholder intent expression dataset.

Input to the general large language model (LLM):

“The substation remote patrol system developed with Springboot has the following interface (the implementation is hidden):

######
@ApiOperation(value = "Turn the camera associated with the device to the corresponding preset position. These video preset positions generally face a component of the device")
@PostMapping("/cameraFoucsDevice")
public WrappedResult<Boolean> cameraFoucsDevice(
    @ApiParam(value = "Name of the substation equipment", name = "deviceName", required = true) @RequestParam("deviceName") String deviceName,
    @ApiParam(value = "Name of the equipment component", name = "partName", required = false) @RequestParam("partName") String partName
){}
######

You are a professional Chinese linguist. List the sentences you know whose intent can be fulfilled by calling the interface above.

The sentences must meet the following requirements:

1. Following the function definition, each sentence must incorporate the function's parameters. In the sentences you generate, strictly use the parameter name as the placeholder for the entity, for example {deviceName}.

2. Pay attention to optional parameters. If a parameter is optional, a sentence may omit it; when it is omitted, the sentence must not contain any other entity that refers to that parameter, and the sentence must remain grammatically correct.

3. Each sentence should be phrased from the point of view of a user of the system and should match how people actually speak, for example: I need to see the surveillance video of the transformer.

4. List as many different phrasings of the same intent as possible, cover all parameters of the interface, and make sure no sentence is repeated.

Return format for each sentence:

Intent: sentence Action: name of the interface to call Params: parameter list”

Accordingly, the general large language model (LLM) returns:

“Intent: I want to turn the camera for the {deviceName} device to the preset point Action: cameraFoucsDevice Params: deviceName

Intent: Can you help me point the camera at the {deviceName} device? Action: cameraFoucsDevice Params: deviceName

Intent: I need to focus the camera on the {deviceName} device Action: cameraFoucsDevice Params: deviceName

Intent: Point the camera at the {deviceName} device Action: cameraFoucsDevice Params: deviceName

Intent: Point the camera at the component of the {deviceName} device Action: cameraFoucsDevice Params: deviceName, partName

Intent: Can you turn the camera for me and point it at the {partName} component of the {deviceName} device? Action: cameraFoucsDevice Params: deviceName, partName

Intent: Can you adjust the camera to the {partName} component, which belongs to the {deviceName} device? Action: cameraFoucsDevice Params: deviceName, partName

Intent: I need to adjust the camera focus onto the {partName} component of the {deviceName} device Action: cameraFoucsDevice Params: deviceName, partName”.
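The returned lines follow the "Intent / Action / Params" format requested in the prompt, so they can be split into structured records before review. The following is a minimal Python sketch under stated assumptions: the function name `parse_line` and the `consistent` flag are hypothetical and do not come from the patent. The flag only marks lines whose placeholders and parameter list disagree, as candidates for the manual review; it does not replace that review.

```python
import re

# One returned line: "Intent: <sentence> Action: <name> Params: <a, b>".
# Whitespace before "Params:" is optional because the sample output is
# not consistent about it.
LINE_RE = re.compile(
    r"Intent:\s*(?P<intent>.+?)\s*Action:\s*(?P<action>\w+?)\s*Params:\s*(?P<params>.*)$"
)
PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def parse_line(line):
    """Split one returned line into intent, action and parameter names.

    Returns None when the line does not follow the expected format.
    """
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    params = [p.strip() for p in match.group("params").split(",") if p.strip()]
    placeholders = PLACEHOLDER_RE.findall(match.group("intent"))
    return {
        "intent": match.group("intent"),
        "action": match.group("action"),
        "params": params,
        # True when the sentence uses exactly the placeholders it declares.
        "consistent": set(placeholders) == set(params),
    }


record = parse_line(
    "Intent: Point the camera at the {deviceName} device "
    "Action: cameraFoucsDeviceParams: deviceName"
)
print(record["action"], record["params"], record["consistent"])
```

A line that declares `partName` in its parameter list without using `{partName}` in the sentence would come back with `consistent` set to `False`.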

S24. Manually review the placeholder intent expression dataset and filter out incorrect placeholder intent expressions. Specifically, the dataset generated by the general large language model may contain inappropriate expressions that staff would never use, so incorrect placeholder intent expressions must be removed by hand.

S25. Replace the placeholders in the placeholder intent expression dataset with the power equipment information, point information, and perception-layer device information of the remote patrol system to obtain the intent expression dataset. For example, substituting the component "oil conservator" of the equipment "main transformer" into the placeholder intent expression "Zoom in on the {partName} part of {deviceName}, I need to see it clearly" produces "Zoom in on the oil conservator part of the main transformer, I need to see it clearly", which is added to the intent expression dataset.

Specifically, the API Schema of the API interfaces from step S21 is obtained from the Swagger documentation, and a Python script combines the placeholder intent expression dataset generated in step S23 with the API Schema to produce an intent expression dataset that can be used to train and validate a large language model (for example, the ChatGLM3-6B model) on intents. One possible representation of the intent expression dataset is as follows:

“{
  "tools": [
    "cameraFoucsDevice: Turn the device-related camera to the corresponding preset point. These video preset points usually face a particular component of the device\nParameters: {\"deviceName\": \"Required. string. The name of the substation equipment\", \"partName\": \"Optional. string. The name of the device component.\"}\nOutput: boolean[True,False]\n"
  ],
  "conversations": [
    {
      "role": "user",
      "content": "I want to turn the camera for the main transformer device to the preset point"
    },
    {
      "role": "assistant",
      "content": "I need to use cameraFoucsDevice to turn the camera to the main transformer area."
    },
    {
      "role": "tool",
      "name": "cameraFoucsDevice",
      "parameters": {
        "deviceName": "main transformer"
      },
      "observation": "True"
    },
    {
      "role": "assistant",
      "content": "The operation returned True; the camera has been successfully turned to the main transformer"
    }
  ]
}”.
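A record of this shape can be assembled from an API description and one filled-in sentence. The sketch below is a hedged illustration of such a Python script, not the script used in the patent: the simplified `schema` dictionary is a hypothetical stand-in for the information extracted from Swagger (it is not the actual Swagger/OpenAPI layout), and `tool_description` and `build_record` are names invented here.

```python
import json


def tool_description(schema):
    """Render one API description as the text entry stored under "tools"."""
    params = {
        name: f"{'Required' if p['required'] else 'Optional'}. {p['type']}. {p['description']}"
        for name, p in schema["parameters"].items()
    }
    return (
        f"{schema['name']}: {schema['description']}\n"
        f"Parameters: {json.dumps(params, ensure_ascii=False)}\n"
        f"Output: {schema['output']}\n"
    )


def build_record(schema, user_sentence, arguments, thought, observation, answer):
    """Combine a schema and one conversation into a training record."""
    return {
        "tools": [tool_description(schema)],
        "conversations": [
            {"role": "user", "content": user_sentence},
            {"role": "assistant", "content": thought},
            {
                "role": "tool",
                "name": schema["name"],
                "parameters": arguments,
                "observation": observation,
            },
            {"role": "assistant", "content": answer},
        ],
    }


# Hypothetical, simplified schema for the cameraFoucsDevice interface.
schema = {
    "name": "cameraFoucsDevice",
    "description": "Turn the device-related camera to the corresponding preset point.",
    "parameters": {
        "deviceName": {"required": True, "type": "string", "description": "The name of the substation equipment"},
        "partName": {"required": False, "type": "string", "description": "The name of the device component."},
    },
    "output": "boolean[True,False]",
}
record = build_record(
    schema,
    user_sentence="I want to turn the camera for the main transformer device to the preset point",
    arguments={"deviceName": "main transformer"},
    thought="I need to use cameraFoucsDevice to turn the camera to the main transformer area.",
    observation="True",
    answer="The operation returned True; the camera has been turned to the main transformer",
)
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps Chinese entity names readable in the output file instead of escaping them.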

S26. Train the large language model on the intent expression dataset to obtain the remote patrol expert model. Specifically, the remote patrol expert model identifies, from the text data, the remote patrol system API interface that should be called and its interface parameters, and passes them on in the form of executable Python function source code, which is used to generate the service call request. For example:

Input to the remote patrol expert model: “I want to turn the camera for the main transformer device to the preset point”.

Output of the remote patrol expert model: “dispath_request("cameraFoucsDevice","main transformer")”.

Step S3 parses the interface parameters in the intent data and enhances them to obtain enhanced intent data. The purpose of this enhancement is to ensure that the parameters passed to the API interface are accurate, which makes interface calls more robust. For example, suppose the data permissions of a remote patrol system operator cover a particular substation. After parsing by the remote patrol expert model, the interface parameter in the intent data appears as "1号主变" or "主变" (abbreviated forms of "No. 1 main transformer" and "main transformer"), but the substation contains only one transformer, named "1号主变压器" ("No. 1 main transformer"). After enhancement, the interface parameter is returned as "1号主变压器".

The enhanced intent data is obtained as follows: the relational data and entity data in the remote patrol system are synchronized to generate a document database, and the entity data in the remote patrol system is vectorized with a word embedding model to generate a vector database; the interface parameter is searched for in the document database and in the vector database separately, and an intersection operation on the search results yields the enhanced interface parameter; the enhanced interface parameters and the API interface of the service to be called are then combined into the enhanced intent data.

Specifically, the relational data and entity data in the remote patrol system (patrol points, power equipment, personnel information, and so on) are synchronized to the document database, and during synchronization the entity data is vectorized with a word embedding model and stored in the vector database. Milvus can be used as the vector database, Elasticsearch as the document database, and bge-large-zh as the word embedding model. For a given interface parameter, a keyword search is run against the document database (Elasticsearch): the _search interface provided by Elasticsearch performs a fuzzy query and returns a list of entities whose names are close to the interface parameter. In parallel, the interface parameter is vectorized with the embedding model and the vector database (Milvus) is queried for a list of entities that are semantically close to it, using Euclidean distance to find the nearest entities. The two entity lists then undergo the "intersection operation". To handle Chinese homophones, the two lists are first merged and the Chinese text in the merged list is converted to pinyin (the pypinyin library can be used for this). The input interface parameter is then also converted to pinyin, its edit-distance similarity to each pinyin entry in the list is computed, and the entry with the highest similarity is returned as the final output. Figure 3 is a schematic flowchart of this retrieval-based enhancement of interface parameters.
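The merge-and-rank step described here can be illustrated with a short, self-contained Python sketch. Several things are assumptions made for illustration: `best_match`, `edit_distance`, and `toy_pinyin` are hypothetical names; the tiny `TOY_PINYIN` table stands in for a real converter such as pypinyin; and "highest similarity" is taken to mean the smallest edit distance. The two input lists are assumed to be the results already returned by the document database and the vector database.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Toy stand-in for a pinyin converter; covers only the characters used below.
# A real system could plug in a function built on the pypinyin library instead.
TOY_PINYIN = {"主": "zhu", "柱": "zhu", "变": "bian", "压": "ya", "器": "qi", "号": "hao"}


def toy_pinyin(text):
    """Convert known characters to pinyin and leave everything else unchanged."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in text)


def best_match(query, keyword_hits, vector_hits, to_pinyin):
    """Merge both hit lists and return the entry closest to the query in pinyin."""
    candidates = list(dict.fromkeys(keyword_hits + vector_hits))  # merge, drop duplicates
    if not candidates:
        return None
    query_pinyin = to_pinyin(query)
    return min(candidates, key=lambda c: edit_distance(query_pinyin, to_pinyin(c)))


# The abbreviated parameter resolves to the full entity name.
print(best_match("1号主变", ["1号主变压器"], ["1号主变压器", "2号主变压器"], toy_pinyin))
```

Comparing in pinyin rather than in characters means that a homophone slip from speech recognition, such as "柱" for "主", costs nothing in the distance calculation.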

Step S4 generates a service call request from the enhanced intent data and uses that request to call the remote patrol system, which executes the service and produces a service result. Specifically, the enhanced intent data, passed by the remote patrol expert model in the form of Python function source code, is executed to issue an HTTP request to the remote patrol system API interface; the remote patrol system calls the service corresponding to the API interface with the given interface parameters and returns the service result.
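As a hedged illustration of this step, the sketch below turns a model output such as `dispath_request("cameraFoucsDevice","main transformer")` into a description of the HTTP request. It deliberately swaps in a different technique from the one described: instead of executing the source text, it parses it with Python's `ast` module and accepts only a single call with string literals, which is one way to avoid running arbitrary model output. The `ENDPOINT_PARAMS` registry, the function names, and the base URL are hypothetical, and no request is actually sent.

```python
import ast
from urllib.parse import urlencode

# Hypothetical registry: endpoint name -> ordered parameter names, following
# the cameraFoucsDevice signature shown earlier in the description.
ENDPOINT_PARAMS = {"cameraFoucsDevice": ["deviceName", "partName"]}


def parse_dispatch_call(source):
    """Parse 'dispath_request("<endpoint>", "<arg>", ...)' without executing it."""
    call = ast.parse(source.strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("expected a single function call")
    if call.func.id != "dispath_request" or call.keywords:
        raise ValueError("unexpected function or keyword arguments")
    values = [ast.literal_eval(arg) for arg in call.args]  # literals only
    if not values or not all(isinstance(v, str) for v in values):
        raise ValueError("arguments must be string literals")
    endpoint, *arguments = values
    names = ENDPOINT_PARAMS.get(endpoint)
    if names is None or len(arguments) > len(names):
        raise ValueError("unknown endpoint or too many arguments")
    return endpoint, dict(zip(names, arguments))


def build_request(base_url, endpoint, params):
    """Describe the POST request; a real client would send it over HTTP."""
    return {"method": "POST", "url": f"{base_url}/{endpoint}?{urlencode(params)}"}


endpoint, params = parse_dispatch_call('dispath_request("cameraFoucsDevice","main transformer")')
print(build_request("http://localhost:8080", endpoint, params))
```

Positional arguments are mapped to parameter names through the registry, so an optional trailing parameter such as `partName` can simply be left out of the call.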

Step S5 inputs the service result into the remote patrol expert model to generate reply text data. Specifically, the remote patrol expert model turns the returned service result into reply text data expressed in natural language.

Step S6 converts the reply text data into reply voice data for output. Specifically, the reply text data is passed to a speech generation model (PaddleSpeech, OpenTTS, eSpeak, etc.), converted into reply speech, and played through the audio output equipment of the remote patrol system.

In one possible implementation, step S6 further includes: obtaining alarm text from the remote patrol system, converting the alarm text into reply speech, and playing it through the audio output equipment of the remote patrol system.

It can be understood that the remote patrol system control method based on sound and language models provided in this solution has two effects. On the one hand, the API interface services of the remote patrol system are invoked through voice dialogue, for example to dispatch cameras, drones, or robots on patrol tasks, and the service results are returned, which simplifies the operating procedure. On the other hand, patrol alarms and patrol results are converted into voice announcements during and after the patrol, so that on-site staff are notified promptly.

Referring to Figure 4, which is a structural diagram of the remote patrol system control system based on sound and language models, the system is used to implement the remote patrol system control method based on sound and language models described above. The system includes: a speech recognition module, used to obtain voice data and convert the voice data into text data according to a fine-tuned speech recognition model; a large language model module, used to parse intent data from the text data according to a remote patrol expert model, where the intent data includes the API interface of the service to be called and the interface parameters, and the remote patrol expert model is obtained by training a SOTA large language model; a retrieval enhancement module, used to parse the interface parameters in the intent data and enhance them to obtain enhanced intent data; an intent calling module, used to generate a service call request from the enhanced intent data and to call the remote patrol system through that request so that it executes the service and returns a service result; the large language model module, further used to input the service result into the remote patrol expert model to generate reply text data; and a speech generation module, used to convert the reply text data into reply voice data for output.

In one possible implementation, the speech recognition module is further used to obtain a SOTA speech recognition model and fine-tune it on speech and text data specific to the electric power domain to obtain the fine-tuned speech recognition model.

In one possible implementation, the large language model module is further used to: obtain the API interfaces and annotations of the remote patrol system to form an intent realization seed library; supplement each API interface in the intent realization seed library with context and placeholders for remote patrol system information to generate placeholder intent expressions; input the placeholder intent expressions into a general large language model for expansion to form a placeholder intent expression dataset; manually review the placeholder intent expression dataset and filter out incorrect placeholder intent expressions; replace the placeholders in the placeholder intent expression dataset with the power equipment information, point information, and perception-layer device information of the remote patrol system to obtain an intent expression dataset; and train the large language model on the intent expression dataset to obtain the remote patrol expert model.

In one possible implementation, the retrieval enhancement module is further used to: synchronize the relational data and entity data in the remote patrol system to generate a document database, and vectorize the entity data in the remote patrol system with a word embedding model to generate a vector database; search for the interface parameters in the document database and in the vector database separately, and perform an intersection operation on the search results to obtain enhanced interface parameters; and combine the enhanced interface parameters and the API interface of the service to be called into enhanced intent data.

In one possible implementation, the speech generation module is further used to obtain alarm text from the remote patrol system and convert the alarm text into voice output.

Compared with the prior art, the present application has the following advantages. First, it collects the voice data of staff, converts the voice data into text data with a speech recognition model, parses intent data with the remote patrol expert model, calls the corresponding service of the remote patrol system according to the intent data, converts the structured service result into fluent text with the remote patrol expert model, and finally outputs it as speech. Staff can therefore control the remote patrol system by voice and receive task results in spoken form, which simplifies the operation of the remote patrol system and makes task results available promptly. Second, it enhances the interface parameters in the intent data by combining a document database and a vector database to retrieve the expression closest to each interface parameter. This ensures precise control, avoids control errors caused by non-standard wording from staff, improves fault tolerance, and lowers the operating demands placed on staff. Third, it forms an intent realization seed library from API interfaces and annotations, generates placeholder intent expressions by adding context and placeholders for remote patrol system information, expands the training corpus with a general large language model, obtains the intent expression dataset by replacing placeholders with entities, and trains the remote patrol expert model on that dataset. Given a piece of text data, the model can then parse the API interface of the service to be called and its interface parameters, and thereby control the remote patrol system to carry out tasks.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A method for controlling a remote patrol system based on a sound and language model, comprising:
acquiring voice data, and converting the voice data into text data according to a fine-tuned speech recognition model;
parsing intention data from the text data according to a remote patrol expert model, the intention data comprising an API interface of a service to be called and interface parameters, wherein the remote patrol expert model is obtained by training a SOTA large language model;
parsing the interface parameters in the intention data, and enhancing the interface parameters to obtain enhanced intention data;
generating a service call request according to the enhanced intention data, and calling the remote patrol system to execute a service through the service call request to obtain a service result;
inputting the service result into the remote patrol expert model to generate reply text data;
converting the reply text data into reply voice data for output;
wherein the remote patrol expert model is obtained by the following steps: acquiring API interfaces and annotations of the remote patrol system to form an intention realization seed library; supplementing each API interface in the intention realization seed library with context and placeholders for remote patrol system information to generate a placeholder intention expression; inputting the placeholder intention expression into a general large language model for expansion to form a placeholder intention expression data set; manually reviewing the placeholder intention expression data set and filtering out incorrect placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point information, and perception-layer device information in the remote patrol system to obtain an intention expression data set; and training the SOTA large language model based on the intention expression data set to obtain the remote patrol expert model;
wherein the enhanced intention data is obtained by the following steps: synchronizing relational data and entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface of the service to be called into the enhanced intention data;
wherein searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters, comprises: on the one hand, performing a keyword search of the document database for the interface parameters to obtain a list of entities similar to the interface parameters; on the other hand, vectorizing the interface parameters through an embedding model and then querying the vector database for a list of entities semantically similar to the interface parameters; and, to handle Chinese homophones, merging the two lists, converting the Chinese text in the merged list into pinyin, converting the input interface parameters into pinyin, calculating the edit-distance similarity between the input interface parameters and the pinyin entries in the list, and taking the search result with the greatest similarity from the list as the final output.
2. The method for controlling a remote patrol system based on a sound and language model according to claim 1, wherein the fine-tuned speech recognition model is obtained by: acquiring a SOTA speech recognition model, and fine-tuning the SOTA speech recognition model according to speech and text data specific to the electric power domain to obtain the fine-tuned speech recognition model.
3. The method for controlling a remote patrol system based on a sound and language model according to claim 1, further comprising: acquiring an alarm text of the remote patrol system, and converting the alarm text into voice output.
4. A remote patrol system control system based on a sound and language model, for performing the method for controlling a remote patrol system based on a sound and language model according to any one of claims 1-3, comprising:
a speech recognition module, used for acquiring voice data and converting the voice data into text data according to a fine-tuned speech recognition model;
a large language model module, used for parsing intention data from the text data according to a remote patrol expert model, the intention data comprising an API interface of a service to be called and interface parameters, wherein the remote patrol expert model is obtained by training a SOTA large language model;
a retrieval enhancement module, used for parsing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data;
an intention calling module, used for generating a service call request according to the enhanced intention data, and calling the remote patrol system to execute a service through the service call request to obtain a service result;
the large language model module being further used for inputting the service result into the remote patrol expert model to generate reply text data;
a speech generation module, used for converting the reply text data into reply voice data for output;
wherein the large language model module is further used for: acquiring API interfaces and annotations of the remote patrol system to form an intention realization seed library; supplementing each API interface in the intention realization seed library with context and placeholders for remote patrol system information to generate a placeholder intention expression; inputting the placeholder intention expression into a general large language model for expansion to form a placeholder intention expression data set; manually reviewing the placeholder intention expression data set and filtering out incorrect placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point information, and perception-layer device information in the remote patrol system to obtain an intention expression data set; and training a large language model based on the intention expression data set to obtain the remote patrol expert model;
wherein the retrieval enhancement module is further used for: synchronizing relational data and entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface of the service to be called into the enhanced intention data;
wherein searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters, comprises: on the one hand, performing a keyword search of the document database for the interface parameters to obtain a list of entities similar to the interface parameters; on the other hand, vectorizing the interface parameters through an embedding model and then querying the vector database for a list of entities semantically similar to the interface parameters; and, to handle Chinese homophones, merging the two lists, converting the Chinese text in the merged list into pinyin, converting the input interface parameters into pinyin, calculating the edit-distance similarity between the input interface parameters and the pinyin entries in the list, and taking the search result with the greatest similarity from the list as the final output.
5. The remote patrol system control system based on a sound and language model according to claim 4, wherein the speech recognition module is further used for acquiring a SOTA speech recognition model, and fine-tuning the SOTA speech recognition model according to speech and text data specific to the electric power domain to obtain the fine-tuned speech recognition model.
6. The remote patrol system control system based on a sound and language model according to claim 4, wherein the speech generation module is further used for acquiring an alarm text of the remote patrol system, and converting the alarm text into voice output.
CN202410389170.7A 2024-04-02 2024-04-02 Remote patrol system control method and system based on sound and language model Active CN117975968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410389170.7A CN117975968B (en) 2024-04-02 2024-04-02 Remote patrol system control method and system based on sound and language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410389170.7A CN117975968B (en) 2024-04-02 2024-04-02 Remote patrol system control method and system based on sound and language model

Publications (2)

Publication Number Publication Date
CN117975968A CN117975968A (en) 2024-05-03
CN117975968B true CN117975968B (en) 2024-09-10

Family

ID=90864985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410389170.7A Active CN117975968B (en) 2024-04-02 2024-04-02 Remote patrol system control method and system based on sound and language model

Country Status (1)

Country Link
CN (1) CN117975968B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113067411A (en) * 2021-03-29 2021-07-02 北京智盟信通科技有限公司 Remote intelligent inspection system of transformer substation
CN115827750A (en) * 2022-11-02 2023-03-21 国网上海市电力公司 Inspection robot remote control system and method based on natural semantic recognition
CN117370493A (en) * 2023-09-22 2024-01-09 中国司法大数据研究院有限公司 Intelligent interaction method and device for system based on large language model

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657680A (en) * 2017-08-30 2018-02-02 国网上海市电力公司 A kind of transformer substation remote monitoring system based on indoor substation crusing robot
CN109147768A (en) * 2018-09-13 2019-01-04 云南电网有限责任公司 A kind of audio recognition method and system based on deep learning
CA3164413A1 (en) * 2020-01-22 2021-07-29 Amit Choudhary Providing an intent suggestion to a user in a text-based conversation
CN111951805B (en) * 2020-07-10 2024-09-20 华为技术有限公司 A text data processing method and device
US11664010B2 (en) * 2020-11-03 2023-05-30 Florida Power & Light Company Natural language domain corpus data set creation based on enhanced root utterances
CN113270103A (en) * 2021-05-27 2021-08-17 平安普惠企业管理有限公司 Intelligent voice dialogue method, device, equipment and medium based on semantic enhancement
CN115858723A (en) * 2022-08-26 2023-03-28 国网江苏省电力有限公司无锡供电分公司 Query graph generation method and system for complex knowledge base question answering
CN116028608A (en) * 2023-01-10 2023-04-28 虎博网络技术(上海)有限公司 Question-answer interaction method, question-answer interaction device, computer equipment and readable storage medium
CN117290411B (en) * 2023-11-22 2024-02-13 深圳九有数据库有限公司 Multimode database query method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117975968A (en) 2024-05-03

Similar Documents

Publication Publication Date Title
CN111429915A (en) A dispatch system and dispatch method based on speech recognition
CN113609264B (en) Data query method and device for power system nodes
KR102729987B1 (en) Apparatus, method and computer program for processing inquiry
WO2024164728A1 (en) Knowledge graph event extraction method and apparatus, device, and storage medium
Song et al. Speech-to-SQL: toward speech-driven SQL query generation from natural language question
CN117667992A (en) Method, device, equipment and medium for converting natural language problem into SQL sentence
JPH06266779A (en) Controller
CN117391095A (en) Natural language analysis method and device, electronic equipment and storage medium
CN119003746A (en) Multi-role digital person construction method based on multi-modal diagram retrieval enhancement generation
Wang et al. A framework for intelligent building information spoken dialogue system (iBISDS)
CN119783644A (en) An innovative system and method for automatically generating meeting minutes and intelligently refining them
CN114925707B (en) A multi-language translation method, device, equipment, and storage medium
CN117975968B (en) Remote patrol system control method and system based on sound and language model
CN118467680B (en) A dynamic multi-intent semantic understanding method, device, computer equipment and readable storage medium
Yu et al. Incorporating multimodal sentiments into conversational bots for service requirement elicitation
CN119066208A (en) A factual verification method for RAG system based on knowledge graph
EP4488881A1 (en) Incremental solves using llms for api calls
CN118761458A (en) Question answering method and device based on multimodal industrial large model
Milhorat et al. What if everyone could do it? a framework for easier spoken dialog system design
Seabra et al. Dynamic multi-agent orchestration and retrieval for multi-source question-answer systems using large language models
CN119513228B (en) Method and device for constructing interactive task knowledge base based on large language model
CN118331152B (en) Industrial control system logic optimization method and system based on natural language big model
KR102785215B1 (en) System and method for providing conversational artificial intelligence service using complex analysis of image and query
CN120216626A (en) Question-answer information processing method and device, electronic equipment and storage medium
CN120278281A (en) A method of event context analysis based on causality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant