CN110489517B - Automatic learning method and system of virtual assistant - Google Patents

Info

Publication number
CN110489517B
Authority
CN
China
Prior art keywords
vocabulary
corpus
data
intentions
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810436639.2A
Other languages
Chinese (zh)
Other versions
CN110489517A (en
Inventor
周忠信
吴兆麟
许旭正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingjie Shuzhi Co ltd
Original Assignee
Digiwin Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digiwin Software Co Ltd
Priority to CN201810436639.2A
Publication of CN110489517A
Application granted
Publication of CN110489517B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

An automatic learning method and system for a virtual assistant. The method includes: receiving audio input and recognizing the audio to form corpus data; analyzing the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data; performing functional context analysis on the language feature information according to functional context information to determine the operation corresponding to one of the intents; if the functional context analysis cannot determine the operation corresponding to one of the intents, performing word segmentation on the corpus data; determining, from the segmentation result, whether a new word or new corpus data exists; if a new word exists, updating the natural language processing model according to the meaning of the new word; and if new corpus data exists, updating the functional context analysis according to the intent of the new corpus data. Users can thereby work with an ERP system more quickly and conveniently.

Description

Automatic learning method and system for a virtual assistant

Technical Field

The present disclosure relates to an automatic learning method and system, and in particular to an automatic learning method and system for a virtual assistant.

Background

An enterprise resource planning (ERP) system is a management platform built on information technology that supports decision-making by an enterprise's management. It manages the enterprise's flows of people, goods, information, and capital in a unified way so as to make the fullest use of the enterprise's resources. Because an ERP system covers three major functional areas (production control, logistics management, and financial management), it is very large in scale.

Applying a virtual assistant to an ERP system helps users interact quickly with the large ERP system and saves the time they spend using it. However, because each user's habits in using the ERP system differ, the virtual assistant may fail to understand a user's question, which instead makes the ERP system harder to use.

Summary of the Invention

The main purpose of the present invention is to provide an automatic learning method and system for a virtual assistant that give the virtual assistant an automatic learning capability, so that while communicating with a user it can automatically learn the user's speaking habits or the special terminology of an industry, allowing the user to use the ERP system more quickly and conveniently.

To achieve the above purpose, a first aspect of the present disclosure provides an automatic learning method for a virtual assistant. The method includes the following steps: receiving audio input and recognizing the audio to form corpus data; analyzing the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data, where the language feature information includes multiple intents, the probabilities corresponding to the intents, and multiple words; performing functional context analysis on the language feature information according to functional context information to determine the operation corresponding to one of the intents; if the functional context analysis cannot determine the operation corresponding to one of the intents, performing word segmentation on the corpus data; determining, from the segmentation result, whether a new word or new corpus data exists; if a new word exists, updating the natural language processing model according to the meaning of the new word; and if new corpus data exists, updating the functional context analysis according to the intent of the new corpus data. The operation is one of a data query operation and an instruction execution operation.

According to an embodiment of the present disclosure, the method further includes: generating a system domain vocabulary set from an application knowledge database and a domain knowledge database; forming a key entity set from the system domain vocabulary set and multiple service application parameters, the key entity set containing multiple system domain words; classifying multiple training corpora as one of the data query operation and the instruction execution operation; dividing the intents of the training corpora that correspond to the data query operation according to the categories in the enterprise database to form multiple data query operation intents, and dividing the intents of the training corpora that correspond to the instruction execution operation according to the service behaviors provided by the enterprise resource system to form multiple instruction execution operation intents; creating templates for the data query operation intents and templates for the instruction execution operation intents; building the overall database from the key entity set, the templates of the data query operation intents, and the templates of the instruction execution operation intents; identifying multiple first probabilities with which the system domain words in the key entity set appear in the training corpora, analyzing, by means of the identified system domain words, multiple sentence structures of the training corpora and multiple associations among the system domain words, and building a common vocabulary model from the first probabilities and the associations; and analyzing multiple second probabilities with which the system domain words appear in the data query operation intents and the instruction execution operation intents, and building a common semantic model from the sentence structures and the second probabilities.
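The statistics behind the common vocabulary model can be illustrated with a minimal sketch. It assumes the training corpora are already tokenized, takes the "first probability" of a system domain word to be the fraction of training sentences that contain it, and uses simple co-occurrence counts as a stand-in for the associations among domain words; the patent does not specify either measure. The function name and the dictionary layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_common_vocabulary_model(training_corpora, domain_words):
    """Estimate per-word occurrence probabilities and pairwise co-occurrence.

    training_corpora: list of token lists (assumed pre-segmented).
    domain_words: set of system domain words from the key entity set.
    Both measures are illustrative assumptions, not the patented method.
    """
    occurrence = Counter()
    cooccurrence = Counter()
    for tokens in training_corpora:
        # Domain words present in this sentence, sorted so pairs are canonical.
        present = sorted(set(tokens) & set(domain_words))
        occurrence.update(present)
        cooccurrence.update(combinations(present, 2))
    total = len(training_corpora)
    first_probabilities = (
        {word: occurrence[word] / total for word in domain_words} if total else {}
    )
    return {
        "first_probabilities": first_probabilities,
        "associations": dict(cooccurrence),
    }


corpora = [
    ["query", "inventory", "today"],
    ["create", "purchase_order"],
    ["query", "purchase_order", "status"],
]
model = build_common_vocabulary_model(
    corpora, {"query", "inventory", "purchase_order"}
)
```

A real system would also need the sentence-structure analysis that the text mentions, which this sketch leaves out.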

According to an embodiment of the present disclosure, the method further includes: using a classifier to classify the data in a history database by strength of relationship to generate a functional context model; and segmenting and analyzing the training corpora, and generating a functional vocabulary model from the data in the history database.

According to an embodiment of the present disclosure, the functional context analysis further includes: comparing the corpus data and the functional context information against the functional context model to produce a functional context recognition result; and determining, from the functional context recognition result, that one of the intents corresponds to one of the data query operation and the instruction execution operation.
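One way to picture the functional context analysis described above is the following sketch. It assumes, purely for illustration, that the functional context information is a user role, that the functional context model maps a (role, intent) pair to a "strong" or "weak" relationship label, and that each intent is tagged with one of the two operation kinds. None of these representations is given in the patent, and all names are hypothetical.

```python
def resolve_operation(intent_probabilities, role, context_model, intent_operations):
    """Pick the operation for the most probable intent strongly tied to the role.

    intent_probabilities: dict intent -> probability (from the NLP analysis).
    role: hypothetical stand-in for the functional context information.
    context_model: dict (role, intent) -> "strong" or "weak" (assumed layout).
    intent_operations: dict intent -> "query_data" or "execute_instruction".
    Returns None when no intent can be resolved to an operation.
    """
    ranked = sorted(
        intent_probabilities.items(), key=lambda item: item[1], reverse=True
    )
    for intent, _probability in ranked:
        if context_model.get((role, intent)) == "strong":
            return intent_operations.get(intent)
    return None


context_model = {("warehouse_clerk", "query_inventory"): "strong"}
intent_operations = {
    "query_inventory": "query_data",
    "approve_leave": "execute_instruction",
}
intents = {"query_inventory": 0.7, "approve_leave": 0.3}
operation = resolve_operation(
    intents, "warehouse_clerk", context_model, intent_operations
)
```

Returning `None` corresponds to the case in which the analysis cannot determine an operation, which is what triggers word segmentation in the method.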

According to an embodiment of the present disclosure, the word segmentation further includes: segmenting the corpus data according to the functional vocabulary model to produce multiple segments; and calculating the frequencies of the segments.

According to an embodiment of the present disclosure, the method further includes: determining whether the frequency of each segment calculated in the word segmentation is below a threshold; if the frequency of one of the segments is below the threshold, treating that segment as the new word and receiving a definition of the new word to update the common vocabulary model and the common semantic model; and if the frequencies of all the segments are above the threshold, treating the corpus data as the new corpus data and receiving the intent of the new corpus data to update the functional context model.
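The threshold test above can be sketched as follows. The sketch assumes the segment frequencies come from a lookup table of historical counts and treats a frequency exactly equal to the threshold as not rare; the patent leaves both points open. The function name and return labels are hypothetical.

```python
def classify_unknown_utterance(segments, segment_frequencies, threshold):
    """Decide whether an unresolved utterance holds a new word or is new corpus data.

    segments: list of segments produced by word segmentation.
    segment_frequencies: dict segment -> observed frequency (assumed source).
    threshold: frequencies strictly below this value mark a segment as new.
    """
    rare = [s for s in segments if segment_frequencies.get(s, 0) < threshold]
    if rare:
        # At least one segment is rare: treat it as a new word to be defined.
        return ("new_word", rare)
    # Every segment is familiar, so the utterance as a whole is new corpus data.
    return ("new_corpus", list(segments))


frequencies = {"query": 120, "stock": 85, "frobnicator": 1}
with_new_word = classify_unknown_utterance(
    ["query", "frobnicator", "stock"], frequencies, threshold=5
)
all_known = classify_unknown_utterance(["query", "stock"], frequencies, threshold=5)
```

In the first call the rare segment would be sent for a definition; in the second, the whole utterance would be sent for intent labelling.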

According to an embodiment of the present disclosure, the method further includes: determining whether the new corpus data is common corpus data and, if so, updating the system domain vocabulary set according to the new corpus data; and updating the system domain vocabulary set according to the new word.

According to an embodiment of the present disclosure, analyzing the corpus data with the natural language processing model further includes: using the common vocabulary model to identify whether the corpus data contains words matching the system domain words in the key entity set, setting the identified words as the multiple words, and analyzing the probabilities with which those words appear; analyzing the sentence structure of the corpus data according to the words; and using the common semantic model to identify, from the word probabilities and the sentence structure of the corpus data, the intents of the corpus data and the probabilities corresponding to the intents.
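A minimal sketch of this analysis is shown below. It swaps in a naive-Bayes-style score (the product of per-intent word probabilities, normalized across intents) for the common semantic model and ignores sentence structure entirely, so it illustrates only the shape of the output: recognized words plus a probability per intent. The function name, the smoothing constant, and the data layout are assumptions.

```python
def analyze_corpus(tokens, domain_words, second_probabilities, smoothing=1e-6):
    """Recognize domain words and score intents for one utterance.

    tokens: segmented utterance (assumed already tokenized).
    domain_words: system domain words from the key entity set.
    second_probabilities: dict intent -> {word: P(word appears in intent)}.
    smoothing: probability used for words unseen under an intent (assumption).
    """
    words = [t for t in tokens if t in domain_words]
    scores = {}
    for intent, word_probs in second_probabilities.items():
        score = 1.0
        for word in words:
            score *= word_probs.get(word, smoothing)
        scores[intent] = score
    total = sum(scores.values())
    # Normalize so the intent scores can be read as probabilities.
    intents = {i: s / total for i, s in scores.items()} if total > 0 else {}
    return {"words": words, "intents": intents}


features = analyze_corpus(
    ["please", "query", "inventory"],
    {"query", "inventory", "create"},
    {
        "query_inventory": {"query": 0.8, "inventory": 0.9},
        "create_order": {"create": 0.9, "query": 0.1},
    },
)
best_intent = max(features["intents"], key=features["intents"].get)
```

The returned dictionary mirrors the language feature information described in the text: multiple words, multiple intents, and a probability for each intent.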

A second aspect of the present disclosure provides an automatic learning system for a virtual assistant that is connected to an enterprise database and to an enterprise resource system and includes a processor, a storage device, and an input/output device. The storage device is electrically connected to the processor and stores an overall database, an application knowledge database, a domain knowledge database, and a history database. The input/output device is electrically connected to the processor and provides an interface for audio input. The processor includes a speech recognition module, a corpus analysis module, a context recognition module, an unknown corpus judgment module, and an update information module. The speech recognition module recognizes the audio to form corpus data. The corpus analysis module is electrically connected to the speech recognition module and analyzes the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data, where the language feature information includes multiple intents, the probabilities corresponding to the intents, and multiple words. The context recognition module is electrically connected to the corpus analysis module and performs functional context analysis on the language feature information according to functional context information to determine the operation corresponding to one of the intents. The unknown corpus judgment module is electrically connected to the context recognition module; when the context recognition module cannot identify the operation corresponding to one of the intents, it performs word segmentation on the corpus data and determines, from the segmentation result, whether a new word or new corpus data exists. The update information module is electrically connected to the unknown corpus judgment module; when a new word arises, it updates the natural language processing model according to the meaning of the new word, and when new corpus data arises, it updates the functional context analysis according to the intent of the new corpus data. The operation is one of a data query operation and an instruction execution operation.

According to an embodiment of the present disclosure, the processor further includes: a training module, electrically connected to the corpus analysis module, which generates a system domain vocabulary set from the application knowledge database and the domain knowledge database, forms a key entity set containing multiple system domain words from the system domain vocabulary set and multiple service application parameters, classifies multiple training corpora as one of the data query operation and the instruction execution operation, divides the intents of the training corpora that correspond to the data query operation according to the categories in the enterprise database to form multiple data query operation intents, and divides the intents of the training corpora that correspond to the instruction execution operation according to the service behaviors provided by the enterprise resource system to form multiple instruction execution operation intents; a template building module, electrically connected to the training module, which creates templates for the data query operation intents and templates for the instruction execution operation intents, and builds the overall database from the key entity set and those templates; a vocabulary model building module, electrically connected to the template building module, which identifies multiple first probabilities with which the system domain words in the key entity set appear in the training corpora, analyzes, by means of the identified system domain words, multiple sentence structures of the training corpora and multiple associations among the system domain words, and builds a common vocabulary model from the first probabilities and the associations; and a semantic model building module, electrically connected to the template building module, which analyzes multiple second probabilities with which the system domain words appear in the data query operation intents and the instruction execution operation intents, and builds a common semantic model from the sentence structures and the second probabilities.

According to an embodiment of the present disclosure, the processor further includes: a context training module, electrically connected to the context recognition module, which uses a classifier to classify the data in the history database by strength of relationship to generate a functional context model; and a vocabulary training module, electrically connected to the unknown corpus judgment module, which segments and analyzes the training corpora and generates a functional vocabulary model from the data in the history database.

According to an embodiment of the present disclosure, the context recognition module further compares the corpus data and the functional context information against the functional context model to produce a functional context recognition result, and determines, from the functional context recognition result, that one of the intents corresponds to one of the data query operation and the instruction execution operation.

According to an embodiment of the present disclosure, the unknown corpus judgment module further segments the corpus data according to the functional vocabulary model to produce multiple segments and calculates the frequencies of the segments.

According to an embodiment of the present disclosure, the update information module further determines whether the frequency of each segment calculated in the word segmentation is below a threshold; if the frequency of one of the segments is below the threshold, that segment is the new word, and the module receives a definition of the new word to update the common vocabulary model and the common semantic model; if the frequencies of all the segments are above the threshold, the corpus data is the new corpus data, and the module receives the intent of the new corpus data to update the functional context model.

According to an embodiment of the present disclosure, the update information module further determines whether the new corpus data is common corpus data and, if so, updates the system domain vocabulary set according to the new corpus data; it also updates the system domain vocabulary set according to the new word.

According to an embodiment of the present disclosure, the corpus analysis module further uses the common vocabulary model to identify whether the corpus data contains words matching the system domain words in the key entity set, sets the identified words as the multiple words, analyzes the probabilities with which those words appear, analyzes the sentence structure of the corpus data according to the words, and uses the common semantic model to identify, from the word probabilities and the sentence structure of the corpus data, the intents of the corpus data and the probabilities corresponding to the intents.

The automatic learning method and the automatic learning system for a virtual assistant of the present invention give the virtual assistant an automatic learning capability, so that while communicating with a user it can automatically learn the user's speaking habits or the special terminology of an industry, allowing the user to use the ERP system more quickly and conveniently.

Brief Description of the Drawings

To make the above and other objects, features, advantages, and embodiments of the present invention easier to understand, the accompanying drawings are described as follows:

FIG. 1 is a schematic diagram of an automatic learning system for a virtual assistant according to some embodiments of the present disclosure;

FIG. 2 is a schematic diagram of a processor according to some embodiments of the present disclosure;

FIG. 3 is a flowchart of an automatic learning method for a virtual assistant according to some embodiments of the present disclosure;

FIG. 4 is a flowchart of training the data models according to some embodiments of the present disclosure;

FIG. 5 is a flowchart of step S320 according to some embodiments of the present disclosure;

FIG. 6 is a flowchart of step S330 according to some embodiments of the present disclosure;

FIG. 7 is a flowchart of step S340 according to some embodiments of the present disclosure; and

FIG. 8 is a flowchart of step S360 according to some embodiments of the present disclosure.

Detailed Description

The following disclosure provides many different embodiments or examples for implementing different features of the invention. Elements and configurations in specific examples are used in the discussion below to simplify the disclosure. Any example discussed is for illustration only and does not in any way limit the scope or meaning of the invention or its examples. In addition, the disclosure may repeat reference numerals and/or letters in different examples; this repetition is for simplicity and clarity and does not in itself dictate a relationship between the different embodiments and/or configurations discussed.

Unless otherwise noted, the terms used throughout the specification and claims generally have the ordinary meaning of each term as used in the art, in the content of this disclosure, and in the specific context. Certain terms used to describe the disclosure are discussed below or elsewhere in this specification to give those skilled in the art additional guidance on the description of the disclosure.

As used herein, "coupled" or "connected" may mean that two or more elements are in direct physical or electrical contact with each other or in indirect physical or electrical contact with each other; "coupled" or "connected" may also mean that two or more elements operate or act on each other.

It will be understood that the terms first, second, third, and so on are used herein to describe various elements, components, regions, layers, and/or blocks. These elements, components, regions, layers, and/or blocks should not be limited by these terms, which serve only to distinguish one element, component, region, layer, and/or block from another. A first element, component, region, layer, and/or block discussed below could therefore also be termed a second element, component, region, layer, and/or block without departing from the spirit of the present invention. As used herein, "and/or" includes any one of, all of, or any combination of one or more of the associated listed items.

Refer to FIG. 1, a schematic diagram of an automatic learning system 100 for a virtual assistant according to some embodiments of the present disclosure. As shown in FIG. 1, the automatic learning system 100 includes a processor 110, a storage device 130, and an input/output device 150. The storage device 130 stores an overall database 131, an application knowledge database 132, a domain knowledge database 133, and a history database 134, and is electrically connected to the processor 110. The input/output device 150 is electrically connected to the processor 110 and provides an interface for audio input. In one embodiment, the input/output device 150 may be a keyboard, a touch screen, a microphone, a speaker, or another suitable input/output device. The user can input audio through the interface provided by the input/output device 150.

In the embodiments of the present invention, the processor 110 may be implemented as an integrated circuit such as a microcontroller, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a logic circuit, another similar element, or a combination of these elements. The storage device 130 may be implemented as a memory, a hard disk, a USB flash drive, a memory card, or the like.

Refer to FIG. 2, a schematic diagram of the processor 110 according to some embodiments of the present disclosure. The processor 110 includes a speech recognition module 111, a corpus analysis module 112, a context recognition module 113, an unknown corpus judgment module 114, an update information module 115, a training module 121, a template building module 122, a semantic model building module 123, a vocabulary model building module 124, a context training module 125, and a vocabulary training module 126. The corpus analysis module 112 is electrically connected to the speech recognition module 111, the context recognition module 113 is electrically connected to the corpus analysis module 112, the unknown corpus judgment module 114 is electrically connected to the context recognition module 113, and the update information module 115 is electrically connected to the unknown corpus judgment module 114. The training module 121 is electrically connected to the corpus analysis module 112, the template building module 122 is electrically connected to the training module 121, the semantic model building module 123 and the vocabulary model building module 124 are electrically connected to the template building module 122, the context training module 125 is electrically connected to the context recognition module 113, and the unknown corpus judgment module 114 is electrically connected to the vocabulary training module 126.

Please refer to FIGS. 1 to 3 together. FIG. 3 is a flowchart of an automatic learning method 300 for a virtual assistant according to some embodiments of the present disclosure. As shown in FIG. 3, the automatic learning method 300 includes the following steps:

Step S310: receive an audio input and recognize the audio to form corpus data;

Step S320: analyze the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data;

Step S330: perform functional context analysis on the language feature information according to functional context information to determine the operation corresponding to one of the intents;

Step S340: if the functional context analysis cannot determine the operation corresponding to one of the intents, perform word segmentation on the corpus data;

Step S350: determine, according to the word segmentation result, whether a new word or new corpus data exists; and

Step S360: if a new word exists, update the natural language processing model according to the meaning of the new word; if new corpus data exists, update the functional context analysis according to the intent of the new corpus data.

In step S310, an audio input is received and recognized to form corpus data. In one embodiment, audio received through the input/output device 150 is recognized by the speech recognition module 111 of the processor 110, which converts the user's natural language into corpus data. In another embodiment, the audio is transmitted over the Internet to a cloud speech recognition system, and the recognition result returned by that system is used as the corpus data. For example, the cloud speech recognition system may be implemented as Google's speech recognition system.

Before step S320 is performed, a common vocabulary model and a common semantic model must be established. Please refer to FIG. 4, which is a flowchart of training the data models according to some embodiments of the present disclosure. As shown in FIG. 4, the data model training stage includes the following steps:

Step S410: generate a system domain vocabulary set according to the application knowledge database and the domain knowledge database;

Step S420: form a key entity set from the system domain vocabulary set and multiple service application parameters;

Step S430: classify each of multiple training corpora as either a query data operation or an execute instruction operation;

Step S440: distinguish the intents of the training corpora corresponding to the query data operation according to the categories in the enterprise database to form multiple query data operation intents, and distinguish the intents of the training corpora corresponding to the execute instruction operation according to the service behaviors provided by the enterprise resource system to form multiple execute instruction operation intents;

Step S450: establish templates for the query data operation intents and templates for the execute instruction operation intents;

Step S460: establish an overall database according to the key entity set, the templates for the query data operation intents, and the templates for the execute instruction operation intents;

Step S470: identify multiple first probabilities with which the system domain vocabulary in the key entity set appears in the training corpora, use the identified system domain vocabulary to analyze multiple sentence structures of the training corpora and multiple correlations among the system domain vocabulary, and establish the common vocabulary model according to the first probabilities and the correlations; and

Step S480: analyze multiple second probabilities with which the system domain vocabulary appears in the query data operation intents and the execute instruction operation intents, and establish the common semantic model according to the sentence structures and the second probabilities.

In steps S410 and S420, a system domain vocabulary set is generated according to the application knowledge database 132 and the domain knowledge database 133, and a key entity set containing multiple system domain words is then formed from the system domain vocabulary set and multiple service application parameters. For example, the key entity set contains information such as enterprise domain vocabulary and the service application parameters of the enterprise system. Enterprise domain vocabulary is the vocabulary that enterprises in a given field are likely to need. The vocabulary used in the medical industry, for instance, differs from that used in the transportation industry, so the enterprise domain vocabulary varies with each enterprise that uses the ERP system. The service application parameters are the parameters required by the services that the enterprise system provides. For example, the leave request function of an enterprise system may require the leave time and the leave type, so the system domain vocabulary in the key entity set must include terms such as personal leave, annual leave, sick leave, and business-trip leave.

In detail, the key entity set also contains the data field names used when accessing data, the names of the services the enterprise system provides to users, the parameter values of the constraints a user sets when querying, the parameter values of service applications, and the operation functions of the enterprise system, such as leave requests, overtime applications, business-trip applications, and expense claims. Each of these items may also have aliases, which must be entered together with it when the database is trained. For example, vendors in a particular field may call a shipping order a shipping detail list or a sales order.
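The alias handling described above can be sketched as a small lookup structure. This is only an illustration under stated assumptions: the class name `KeyEntitySet`, its methods, and the entity kinds are hypothetical and do not come from the patent, which does not specify how the key entity set is stored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeyEntitySet:
    """Hypothetical container for system domain words and their aliases."""

    # canonical term -> kind of entity (data field, service name, parameter value, ...)
    terms: dict[str, str] = field(default_factory=dict)
    # alias -> canonical term
    aliases: dict[str, str] = field(default_factory=dict)

    def add(self, term: str, kind: str, aliases: tuple[str, ...] = ()) -> None:
        """Register a canonical term together with any aliases it has."""
        self.terms[term] = kind
        for alias in aliases:
            self.aliases[alias] = term

    def resolve(self, word: str) -> str | None:
        """Return the canonical term for a word or alias, or None if unknown."""
        if word in self.terms:
            return word
        return self.aliases.get(word)


entities = KeyEntitySet()
entities.add("shipping order", "data field",
             aliases=("shipping detail list", "sales order"))
entities.add("sick leave", "service parameter value")
```

With this structure, an utterance that mentions a "sales order" resolves to the same canonical entry as one that mentions a "shipping order", which is the purpose of entering aliases at training time.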

In step S430, each of multiple training corpora is classified as either a query data operation or an execute instruction operation. A training corpus is natural-language data such as an instruction a user might give or a question a user might ask. After the key entity set has been established, the training corpora are classified by intent. In one embodiment, user intents are divided into query data operations and execute instruction operations, although intents may be classified more finely and the present invention is not limited in this respect. For example, if the user says to the virtual assistant, "Please help me find the shipping order of XX Company," the utterance is classified as a query data operation, and the virtual assistant looks up the shipping order of XX Company in the enterprise database. If the user says, "Help me request business-trip leave for January 30," the utterance is classified as an execute instruction operation, and the virtual assistant enters the enterprise resource system to request the leave on the user's behalf.

In step S440, the intents of the training corpora corresponding to the query data operation are distinguished according to the categories in the enterprise database to form multiple query data operation intents, and the intents of the training corpora corresponding to the execute instruction operation are distinguished according to the service behaviors provided by the enterprise resource system to form multiple execute instruction operation intents. In one embodiment, query data operations are first divided into intents according to the enterprise database of each field. For example, the data fields stored in a medical enterprise database differ from those in a transportation enterprise database, so the needs of their users may also differ. Users in the medical industry may query medical records or ward vacancies, and users in the transportation industry may query shipping records or package delivery status; each of these is a different intent of the query data operation. Execute instruction operations are likewise divided into intents according to the service behaviors provided by the enterprise resource system of each field. The services provided by a medical enterprise resource system naturally differ from those in the transportation industry, and the query data operations or service behaviors offered by enterprises in different fields are not necessarily interchangeable, so intents must be distinguished for the services of each field. For example, a medical system may offer appointment registration and healthy meal ordering for inpatients, and a transportation system may offer automatic sorting of goods and scheduling of the shipping sequence of goods; each of these is a different intent of the execute instruction operation.

In steps S450 and S460, templates are established for the query data operation intents and for the execute instruction operation intents, and the overall database 131 is established according to the key entity set and these templates. For example, once the query data operation intents and execute instruction operation intents that users of a virtual assistant in a given field may have are distinguished, a corresponding template can be generated for each intent. Following the examples above, the medical industry has four templates, for querying medical records, querying ward vacancies, appointment registration, and healthy meal ordering for inpatients, and the transportation industry has four templates, for querying shipping records, querying package delivery status, automatic sorting of goods, and scheduling the shipping sequence of goods. The overall database 131 is then established from these templates and the key entity set.

In step S470, multiple first probabilities with which the system domain vocabulary in the key entity set appears in the training corpora are identified, the identified system domain vocabulary is used to analyze multiple sentence structures of the training corpora and multiple correlations among the system domain vocabulary, and the common vocabulary model is established according to the first probabilities and the correlations. In one embodiment, two algorithms, n-gram and context-free grammar (CFG), are used to calculate the probability with which each system domain word appears in the training corpora, and the system domain vocabulary is used to analyze the sentence structures of the training corpora and the correlations among the system domain words in order to establish the common vocabulary model. For example, suppose the training corpora include "I want to query the quotation of XX Company" and "I want to query the shipping order of XX Company", where "XX Company", "quotation", and "shipping order" are all system domain words. Because "XX Company" may appear about equally often in every query data operation intent, its probability is almost the same across those intents. "Quotation" and "shipping order", by contrast, appear frequently only in the training corpora of the intents that query those specific data and not in the training corpora of other query intents, so their probabilities are especially high in the corresponding intents and low in the others.
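The quotation and shipping order example can be reproduced with a simple relative-frequency count. The sketch below is a deliberate simplification: it counts, per intent, the share of utterances that contain each system domain word, and it does not implement the n-gram or context-free grammar analysis named in the embodiment. The intent labels, the pre-segmented tokens, and the function name `term_probabilities` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training corpora: (intent label, tokens already segmented).
training = [
    ("query_quotation", ["I", "want to query", "XX Company", "quotation"]),
    ("query_shipping", ["I", "want to query", "XX Company", "shipping order"]),
]
domain_vocab = {"XX Company", "quotation", "shipping order"}


def term_probabilities(corpus, vocab):
    """Return {intent: {term: share of that intent's utterances containing term}}."""
    utterances = Counter()        # number of utterances seen per intent
    hits = defaultdict(Counter)   # per intent, utterances containing each term
    for intent, tokens in corpus:
        utterances[intent] += 1
        for term in vocab.intersection(tokens):
            hits[intent][term] += 1
    return {
        intent: {term: hits[intent][term] / n for term in vocab}
        for intent, n in utterances.items()
    }


probs = term_probabilities(training, domain_vocab)
```

In this toy corpus, "XX Company" receives the same value under both intents, while "quotation" and "shipping order" are each high under one intent and zero under the other, which mirrors the behavior the paragraph describes.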

In step S480, multiple second probabilities with which the system domain vocabulary appears in the query data operation intents and the execute instruction operation intents are analyzed, and the common semantic model is established according to the sentence structures and the second probabilities. In one embodiment, a hidden Markov model (HMM) algorithm is used to calculate the probability with which each system domain word appears in the query data operation intents and the execute instruction operation intents in order to establish the common semantic model. Many training corpora are input during the data model training stage, and the hidden Markov model algorithm must calculate the probability with which the system domain words appear in the different intents. Continuing the example above, if the training corpora include "I want to query the shipping order of XX Company", n-gram and context-free grammar analysis identifies "XX Company" and "shipping order" as system domain words. Based on the probabilities of all identified system domain words in the query data operation intents and execute instruction operation intents, and on the relationships among those words, the hidden Markov model algorithm can further determine that "shipping order" is associated with the intent of querying shipping data. Combined with the system domain word "XX Company", this allows the virtual assistant to query the shipping data of XX Company in the enterprise database automatically.

After the common vocabulary model and the common semantic model have been established, step S320 is performed: the corpus data is analyzed with the natural language processing model to generate language feature information corresponding to the corpus data. The language feature information contains multiple intents, the probability corresponding to each intent, and multiple words. For the detailed flow of step S320, please refer to FIG. 5, which is a flowchart of step S320 according to some embodiments of the present disclosure. As shown in FIG. 5, step S320 includes the following steps:

Step S321: use the common vocabulary model to identify whether the corpus data contains words that match the system domain vocabulary in the key entity set, set the identification result as the words in the language feature information, and analyze the probability with which the words in the language feature information appear;

Step S322: analyze the sentence structure of the corpus data according to the words in the language feature information; and

Step S323: use the common semantic model to identify the intents of the corpus data and the probability corresponding to each intent according to the probability with which the words in the language feature information appear and the sentence structure of the corpus data.

In steps S321 and S322, the common vocabulary model is used to identify whether the corpus data contains words that match the system domain vocabulary in the key entity set, the identification result is set as the words in the language feature information, the probability with which those words appear is analyzed, and the sentence structure of the corpus data is then analyzed according to those words. In other words, the common vocabulary model picks out the system domain words in the corpus data input by the user, and the sentence structure of the corpus data is determined from them. For example, if the user says to the virtual assistant, "I want to check XX Company's shipping orders from last month," the common vocabulary model identifies "XX Company", "last month", and "shipping order" as words that match the system domain vocabulary.

In step S323, the common semantic model is used to identify the intents of the corpus data and the probability corresponding to each intent according to the probability with which the words in the language feature information appear and the sentence structure of the corpus data. Following the example above, after "XX Company", "last month", and "shipping order" have been identified, the probability of these words under every intent is determined. Here, "every intent" covers all query data operation intents and all execute instruction operation intents.
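One way to picture how recognized words become per-intent probabilities is the sketch below. It multiplies per-intent word probabilities and normalizes the result, which is a naive stand-in chosen for brevity and not the hidden Markov model of the embodiment; it also ignores sentence structure. The probability table, the `floor` value for unseen words, and the function name `score_intents` are hypothetical.

```python
import math

# Hypothetical per-intent word probabilities, e.g. produced during training.
intent_term_probs = {
    "query_shipping": {"XX Company": 0.5, "last month": 0.3, "shipping order": 0.9},
    "query_quotation": {"XX Company": 0.5, "last month": 0.3, "quotation": 0.9},
    "apply_leave": {"sick leave": 0.8},
}


def score_intents(terms, table, floor=0.01):
    """Return {intent: normalized score} for the recognized terms.

    Words an intent has never seen get the small `floor` probability so that
    a single unseen word does not force the whole product to zero.
    """
    raw = {
        intent: math.prod(probs.get(term, floor) for term in terms)
        for intent, probs in table.items()
    }
    total = sum(raw.values())
    return {intent: value / total for intent, value in raw.items()}


scores = score_intents(["XX Company", "last month", "shipping order"],
                       intent_term_probs)
best = max(scores, key=scores.get)
```

The output has the same shape as the language feature information described above: every intent, whether a query data operation or an execute instruction operation, carries a probability.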

In step S330, functional context analysis is performed on the language feature information according to the functional context information to determine the operation corresponding to one of the intents. Before functional context analysis can be performed, a functional context model and a functional vocabulary model must be established. To build the functional context model, the data in the historical database 134 is first converted into feature vectors. A machine learning algorithm then classifies the data in the historical database 134 by context and calculates the strength of the relationship between the feature vectors and each context, which yields the functional context model. Machine learning algorithms suitable for building the functional context model include the support vector machine (SVM) commonly used in traditional machine learning, as well as deep learning algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM).

The functional vocabulary model, in turn, is built by analyzing a large amount of input training corpora with the hidden Markov model algorithm, segmenting them into words, and counting how often each segment occurs to produce a segment frequency table. For the detailed flow of step S330, please refer to FIG. 6, which is a flowchart of step S330 according to some embodiments of the present disclosure. As shown in FIG. 6, step S330 includes the following steps:

Step S331: compare the corpus data and the functional context information against the functional context model to generate a functional context recognition result; and

Step S332: determine, according to the functional context recognition result, that one of the intents corresponds to either the query data operation or the execute instruction operation.

In step S331, the corpus data and the functional context information are compared against the functional context model to generate a functional context recognition result. The functional context information includes the user's identity, position, and department, as well as the time and the location. Part of the functional context information can be sensed by the input/output device 150, which may, for example, detect the user's current status (such as whether the user has just returned from a business trip). Using the words and the per-intent probabilities obtained from the user's corpus data, together with the functional context information, the similarity between the user's corpus data and the data in the trained data models can be estimated and used as the probability of the corresponding intent.

In step S332, it is determined, according to the functional context recognition result, that one of the intents corresponds to either the query data operation or the execute instruction operation. The trained data models contain multiple query data operation intents and multiple execute instruction operation intents, and the common semantic model produces a probability for each of them. Intents with low probability values can be filtered out with a threshold to obtain the most likely intent and confirm the corresponding operation. In the earlier example, after "XX Company", "last month", and "shipping order" are identified, these words are combined with the functional context information to find the best-matching query data operation intent or execute instruction operation intent. When the user says, "I want to check XX Company's shipping orders from last month," the most likely intent is to look up the shipping orders of XX Company, so the operation the user wants is a query data operation. Functional context is needed because users have different needs depending on their position, department, time of operation, and place of operation. For example, both procurement staff and finance staff look at a vendor monthly statistics report, but the two reports may measure different things: one summarizes purchases from the vendor, and the other summarizes the company's payments to the vendor. A user talking to the virtual assistant will not necessarily say which vendor monthly statistics report is meant and may simply say, "I need last month's vendor monthly statistics report." The user's functional context information is therefore needed to make a more precise determination.
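The threshold filter and the procurement versus finance example can be sketched as follows. The patent builds the functional context model with a trained classifier such as an SVM or a neural network; the hand-written weight table here merely stands in for that model. The intent names, weights, threshold, and the function name `pick_intent` are all assumptions.

```python
# Hypothetical intent probabilities produced by the common semantic model for
# "I need last month's vendor monthly statistics report".
intent_probs = {
    "query_vendor_purchases": 0.45,
    "query_vendor_payments": 0.45,
    "apply_leave": 0.10,
}

# Hypothetical context weights: how strongly a department is tied to an intent.
context_weights = {
    "procurement": {"query_vendor_purchases": 1.0, "query_vendor_payments": 0.2},
    "finance": {"query_vendor_purchases": 0.2, "query_vendor_payments": 1.0},
}


def pick_intent(probs, department, threshold=0.2, default_weight=0.5):
    """Drop intents below the threshold, then re-rank the rest by context."""
    weights = context_weights.get(department, {})
    candidates = {
        intent: p * weights.get(intent, default_weight)
        for intent, p in probs.items()
        if p >= threshold
    }
    if not candidates:
        # No operation can be determined; the method would continue at step S340.
        return None
    return max(candidates, key=candidates.get)
```

The same utterance resolves to the purchase report for a procurement user and to the payment report for a finance user, and returning `None` corresponds to the case where the functional context analysis cannot determine an operation.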

In step S340, if the functional context analysis cannot determine the operation corresponding to one of the intents, word segmentation is performed on the corpus data. For the detailed flow of step S340, please refer to FIG. 7, which is a flowchart of step S340 according to some embodiments of the present disclosure. As shown in FIG. 7, step S340 includes the following steps:

Step S341: segment the corpus data according to the functional vocabulary model to generate multiple segments; and

Step S342: calculate the frequency of each segment.

In steps S341 and S342, the corpus data is segmented according to the functional vocabulary model to generate multiple segments, and the frequency of each segment is then calculated. Word segmentation is needed whenever the functional context analysis in step S330 cannot determine the operation corresponding to the input corpus data. The corpus data is first segmented according to the words stored in the previously established functional vocabulary model, and the frequency of each resulting segment is then calculated.
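Steps S341 and S342 can be illustrated with a dictionary-based segmenter. The patent does not name the segmentation algorithm, so greedy forward maximum matching is used here as an assumed, commonly used choice. The frequency table values and the names `freq_table` and `segment` are hypothetical, and the utterance is kept in Chinese because word segmentation is only meaningful for unspaced text.

```python
# Hypothetical segment frequency table from the functional vocabulary model.
# 我 = "I", 想找 = "want to find", 的 = possessive particle, 联络人 = "contact person"
freq_table = {"我": 120, "想找": 35, "的": 300, "联络人": 18}


def segment(text, vocab, max_len=4):
    """Greedy forward maximum matching; unmatched runs become one segment."""
    segments, unknown, i = [], "", 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in vocab:
                if unknown:                 # flush any pending unknown run
                    segments.append(unknown)
                    unknown = ""
                segments.append(text[i:i + size])
                i += size
                break
        else:
            unknown += text[i]              # no vocabulary word starts here
            i += 1
    if unknown:
        segments.append(unknown)
    return segments


utterance = "我想找XX公司的联络人"  # "I want to find the contact person of XX Company"
segments = segment(utterance, freq_table)
# Frequency of each segment; segments absent from the table count as 0.
frequencies = {s: freq_table.get(s, 0) for s in segments}
```

Segments that are missing from the frequency table end up with a frequency of zero, which makes them natural candidates for the new-word check that follows.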

In steps S350 and S360, whether a new word or new corpus data exists is determined according to the word segmentation result. If a new word exists, the natural language processing model is updated according to the meaning of the new word; if new corpus data exists, the functional context analysis is updated according to the intent of the new corpus data. For the detailed flow of step S360, please refer to FIG. 8, which is a flowchart of step S360 according to some embodiments of the present disclosure. As shown in FIG. 8, step S360 includes the following steps:

Step S361: determine whether the frequencies of the segments calculated by the word segmentation are below a threshold;

Step S362: if one of the segments is below the threshold, treat that segment as a new word and receive a definition of the new word to update the common vocabulary model and the common semantic model; and

Step S363: if all of the segments are above the threshold, treat the corpus data as new corpus data and receive the intent of the new corpus data to update the functional context model.

In steps S361 and S362, it is determined whether the frequencies of the segments calculated by the word segmentation are below a threshold. If one of the segments is below the threshold, that segment is a new word, and a definition of the new word is received to update the common vocabulary model and the common semantic model. In one embodiment, after the segment frequencies have been calculated, each segment below the threshold is set as a new word. The virtual assistant asks the user to define the new word and stores the new word together with its definition in the common vocabulary model and the common semantic model. For example, suppose the user inputs "I want to find the contact person of XX Company" (我想找XX公司的联络人) and the virtual assistant cannot determine its meaning. Word segmentation yields "I" (我), "want to find" (想找), "XX Company" (XX公司), the possessive particle (的), and "contact person" (联络人). If "XX Company" is below the threshold, the virtual assistant asks the user what "XX Company" means and stores the user's answer together with "XX Company" in the common vocabulary model and the common semantic model. The new word must also be stored in the system domain vocabulary set so that it is shared with all users.

In step S363, if all of the word segments are above the threshold, the corpus data is treated as new corpus data, and the intention of the new corpus data is received to update the functional situation model. Continuing the example above, "I want to find the contact person of XX company" is segmented into "I", "want to find", "XX company", "of", and "contact person". If none of these words is below the threshold, what the virtual assistant fails to understand is the intention of the corpus. This can happen when, for instance, the training corpus used to train the virtual assistant contained only statements of the form "Help me look up the contact person of XX company", so that the virtual assistant cannot understand the intention of "I want to find the contact person of XX company". The virtual assistant then asks the user what "I want to find the contact person of XX company" means and stores the user's answer together with the new corpus in the functional situation model. Before the new corpus is stored in the functional situation model, it must also be determined whether the new corpus is a common corpus. If it is, other people will also use the new corpus when using the virtual assistant, so the new corpus must be stored in the system domain vocabulary set and shared with everyone. If it is not, the new corpus is merely a different wording arising from the user's own speaking habits, so only the functional situation model needs to be updated and the system domain vocabulary set does not.
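The branch described for step S363 can be sketched as below. This is an illustrative sketch under stated assumptions: `AssistantStores`, `classify_unknown`, and `store_new_corpus` are hypothetical names, the models are reduced to a dict and a set, and how the system decides that a corpus is "common" is not specified in the text, so it is passed in as a flag. A segment exactly at the threshold is treated here as not below it, which is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantStores:
    """Hypothetical stand-ins for the stores named in the text."""
    # Functional situation model, reduced to: utterance -> intention.
    situation_model: dict = field(default_factory=dict)
    # System domain vocabulary set, shared by all users.
    system_vocabulary: set = field(default_factory=set)


def classify_unknown(segment_frequencies, threshold):
    """Decide what the assistant failed to understand.

    Returns "new_word" if any segment is below the threshold (S361/S362),
    otherwise "new_corpus" (S363).
    """
    if any(freq < threshold for freq in segment_frequencies.values()):
        return "new_word"
    return "new_corpus"


def store_new_corpus(utterance, intention, is_common, stores):
    """Store a new corpus and its intention, as described for step S363.

    The functional situation model is always updated. The shared system
    domain vocabulary set is updated only when the corpus is judged common.
    """
    stores.situation_model[utterance] = intention
    if is_common:
        stores.system_vocabulary.add(utterance)


if __name__ == "__main__":
    stores = AssistantStores()
    frequencies = {"I": 120, "want to find": 40, "XX company": 12,
                   "of": 300, "contact person": 25}
    if classify_unknown(frequencies, threshold=5) == "new_corpus":
        store_new_corpus("I want to find the contact person of XX company",
                         "query contact person", is_common=False, stores=stores)
    print(stores)
```

Keeping the "common corpus" decision outside the storage function reflects the text's ordering: the check happens before storage, and it only controls whether the shared vocabulary set is touched.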

As can be seen from the above embodiments, the main purpose is to give the virtual assistant an automatic learning function. While communicating with the user, if the virtual assistant encounters a word it does not understand, it can ask the user and then update its database. The virtual assistant can thereby automatically learn the user's speaking habits or the specialized terminology of an industry, making it faster and more convenient for users to use the ERP system.

In addition, the above examples include exemplary steps in sequence, but these steps need not be performed in the order shown. Performing these steps in a different order is within the scope contemplated by this disclosure. Within the spirit and scope of the embodiments of this disclosure, steps may be added, replaced, reordered, and/or omitted as appropriate.

Although the present application has been disclosed above by way of embodiments, these are not intended to limit it. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present application; the scope of protection is therefore defined by the appended claims.

Claims (14)

1. An automatic learning method of a virtual assistant, characterized by comprising:
receiving an audio input and recognizing the audio to form corpus data;
analyzing the corpus data using a natural language processing model to generate language feature information corresponding to the corpus data, wherein the language feature information includes a plurality of intentions, probabilities corresponding to the plurality of intentions, and a plurality of words;
performing a functional situation analysis on the language feature information according to functional situation information to determine an operation corresponding to one of the plurality of intentions, wherein the functional situation information includes at least one of an identity, a position, a department, a time, and a place of a user, and wherein the functional situation analysis further comprises: comparing the corpus data and the functional situation information with a functional situation model to generate a functional situation identification result; and determining, according to the functional situation identification result, that one of the plurality of intentions corresponds to one of a query data operation and an execution instruction operation, wherein the functional situation model relates to the strength of the relationships between feature vectors converted from data in a historical database and each of a plurality of situations;
if the functional situation analysis cannot determine the operation corresponding to one of the plurality of intentions, performing word segmentation processing on the corpus data;
determining, according to the result of the word segmentation processing, whether a new word or new corpus data exists; and
if the new word exists, updating the natural language processing model according to the meaning of the new word, and if the new corpus data exists, updating the functional situation analysis according to the intention of the new corpus data;
wherein the operation comprises one of a query data operation and an execution instruction operation.

2. The automatic learning method of a virtual assistant according to claim 1, characterized by further comprising:
generating a system domain vocabulary set according to an application knowledge database and a domain knowledge database;
forming a key entity set from the system domain vocabulary set and a plurality of service application parameters, the key entity set including a plurality of system domain words;
classifying a plurality of training corpora as one of the query data operation and the execution instruction operation;
distinguishing, according to categories in an enterprise database, the intentions of the plurality of training corpora corresponding to the query data operation to form a plurality of query data operation intentions, and distinguishing, according to service behaviors provided by an enterprise resource system, the intentions of the plurality of training corpora corresponding to the execution instruction operation to form a plurality of execution instruction operation intentions;
establishing templates of the plurality of query data operation intentions and templates of the plurality of execution instruction operation intentions;
establishing an overall database according to the key entity set, the templates of the plurality of query data operation intentions, and the templates of the plurality of execution instruction operation intentions;
identifying a plurality of first probabilities with which the plurality of system domain words in the key entity set appear in the plurality of training corpora, analyzing, by means of the identified system domain words, a plurality of sentence structures of the plurality of training corpora and a plurality of correlations among the plurality of system domain words, and establishing a common vocabulary model according to the plurality of first probabilities and the plurality of correlations; and
analyzing a plurality of second probabilities with which the plurality of system domain words appear in the plurality of query data operation intentions and the plurality of execution instruction operation intentions, and establishing a common semantic model according to the plurality of sentence structures and the plurality of second probabilities.

3. The automatic learning method of a virtual assistant according to claim 2, characterized by further comprising:
using a classifier to classify the data in the historical database by relationship strength to generate the functional situation model; and
performing word breaking and analysis on the plurality of training corpora, and generating a functional vocabulary model according to the data in the historical database.

4. The automatic learning method of a virtual assistant according to claim 3, characterized in that the word segmentation processing further comprises:
breaking the corpus data into words according to the functional vocabulary model to generate a plurality of word segments; and
calculating the frequencies of the plurality of word segments.

5. The automatic learning method of a virtual assistant according to claim 4, characterized by further comprising:
determining whether the frequencies of the plurality of word segments calculated by the word segmentation processing are below a threshold;
if one of the plurality of word segments is below the threshold, treating that word segment as the new word, and receiving the definition of the new word to update the common vocabulary model and the common semantic model; and
if all of the plurality of word segments are above the threshold, treating the corpus data as the new corpus data, and receiving the intention of the new corpus data to update the functional situation model.

6. The automatic learning method of a virtual assistant according to claim 5, characterized by further comprising:
determining whether the new corpus data is a common corpus, and if so, updating the system domain vocabulary set according to the new corpus data; and
updating the system domain vocabulary set according to the new word.

7. The automatic learning method of a virtual assistant according to claim 2, characterized in that analyzing the corpus data using the natural language processing model further comprises:
using the common vocabulary model to identify whether the corpus data contains any of the plurality of system domain words in the key entity set, setting the identification result as the plurality of words, and analyzing the probabilities with which the plurality of words appear;
analyzing the sentence structure of the corpus data according to the plurality of words; and
using the common semantic model to identify the plurality of intentions of the corpus data and the probabilities corresponding to the plurality of intentions according to the probabilities with which the plurality of words appear and the sentence structure of the corpus data.

8. An automatic learning system of a virtual assistant, connected to an enterprise database and an enterprise resource system respectively, characterized by comprising:
a processor;
a storage device, electrically connected to the processor, for storing an overall database, an application knowledge database, a domain knowledge database, and a historical database;
an input/output device, electrically connected to the processor, for providing an interface for inputting an audio;
wherein the processor comprises:
a speech recognition module for recognizing the audio to form corpus data;
a corpus analysis module, electrically connected to the speech recognition module, for analyzing the corpus data using a natural language processing model to generate language feature information corresponding to the corpus data, wherein the language feature information includes a plurality of intentions, probabilities corresponding to the plurality of intentions, and a plurality of words;
a situation identification module, electrically connected to the corpus analysis module, for performing a functional situation analysis on the language feature information according to functional situation information to determine an operation corresponding to one of the plurality of intentions, wherein the functional situation information includes at least one of an identity, a position, a department, a time, and a place of a user, wherein a situation analysis module is further configured to compare the corpus data and the functional situation information with a functional situation model to generate a functional situation identification result, and to determine, according to the functional situation identification result, that one of the plurality of intentions corresponds to one of a query data operation and an execution instruction operation, and wherein the functional situation model relates to the strength of the relationships between feature vectors converted from data in a historical database and each of a plurality of situations;
an unknown corpus judgment module, electrically connected to the situation identification module, for performing word segmentation processing on the corpus data when the situation identification module cannot identify the operation corresponding to one of the plurality of intentions, and for determining, according to the result of the word segmentation processing, whether a new word or new corpus data exists; and
an update information module, electrically connected to the unknown corpus judgment module, for updating the natural language processing model according to the meaning of the new word when the new word arises, and for updating the functional situation analysis according to the intention of the new corpus data when the new corpus data arises;
wherein the operation comprises one of a query data operation and an execution instruction operation.

9. The automatic learning system of a virtual assistant according to claim 8, characterized in that the processor further comprises:
a training module, electrically connected to the corpus analysis module, for generating a system domain vocabulary set according to the application knowledge database and the domain knowledge database, the system domain vocabulary set and a plurality of service application parameters forming a key entity set that includes a plurality of system domain words; for classifying a plurality of training corpora as one of the query data operation and the execution instruction operation; for distinguishing, according to categories in the enterprise database, the intentions of the plurality of training corpora corresponding to the query data operation to form a plurality of query data operation intentions; and for distinguishing, according to service behaviors provided by the enterprise resource system, the intentions of the plurality of training corpora corresponding to the execution instruction operation to form a plurality of execution instruction operation intentions;
a template building module, electrically connected to the training module, for establishing templates of the plurality of query data operation intentions and templates of the plurality of execution instruction operation intentions, and for establishing the overall database according to the key entity set, the templates of the plurality of query data operation intentions, and the templates of the plurality of execution instruction operation intentions;
a vocabulary model building module, electrically connected to the template building module, for identifying a plurality of first probabilities with which the plurality of system domain words in the key entity set appear in the plurality of training corpora, for analyzing, by means of the identified system domain words, a plurality of sentence structures of the plurality of training corpora and a plurality of correlations among the plurality of system domain words, and for establishing a common vocabulary model according to the plurality of first probabilities and the plurality of correlations; and
a semantic model building module, electrically connected to the template building module, for analyzing a plurality of second probabilities with which the plurality of system domain words appear in the plurality of query data operation intentions and the plurality of execution instruction operation intentions, and for establishing a common semantic model according to the plurality of sentence structures and the plurality of second probabilities.

10. The automatic learning system of a virtual assistant according to claim 9, characterized in that the processor further comprises:
a situation training module, electrically connected to the situation identification module, for using a classifier to classify the data in the historical database by relationship strength to generate the functional situation model; and
a vocabulary training module, electrically connected to the unknown corpus judgment module, for performing word breaking and analysis on the plurality of training corpora and generating a functional vocabulary model according to the data in the historical database.

11. The automatic learning system of a virtual assistant according to claim 10, characterized in that the unknown corpus judgment module is further configured to break the corpus data into words according to the functional vocabulary model to generate a plurality of word segments, and to calculate the frequencies of the plurality of word segments.

12. The automatic learning system of a virtual assistant according to claim 11, characterized in that the update information module is further configured to determine whether the frequencies of the plurality of word segments calculated by the word segmentation processing are below a threshold; if one of the plurality of word segments is below the threshold, that word segment is the new word, and the definition of the new word is received to update the common vocabulary model and the common semantic model; if all of the plurality of word segments are above the threshold, the corpus data is the new corpus data, and the intention of the new corpus data is received to update the functional situation model.

13. The automatic learning system of a virtual assistant according to claim 12, characterized in that the update information module is further configured to determine whether the new corpus data is a common corpus and, if so, to update the system domain vocabulary set according to the new corpus data; and to update the system domain vocabulary set according to the new word.

14. The automatic learning system of a virtual assistant according to claim 9, characterized in that the corpus analysis module is further configured to use the common vocabulary model to identify whether the corpus data contains any of the plurality of system domain words in the key entity set, to set the identification result as the plurality of words, to analyze the probabilities with which the plurality of words appear, to analyze the sentence structure of the corpus data according to the plurality of words, and to use the common semantic model to identify the plurality of intentions of the corpus data and the probabilities corresponding to the plurality of intentions according to the probabilities with which the plurality of words appear and the sentence structure of the corpus data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810436639.2A CN110489517B (en) 2018-05-09 2018-05-09 Automatic learning method and system of virtual assistant

Publications (2)

Publication Number Publication Date
CN110489517A CN110489517A (en) 2019-11-22
CN110489517B true CN110489517B (en) 2023-10-31

Family

ID=68543225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810436639.2A Active CN110489517B (en) 2018-05-09 2018-05-09 Automatic learning method and system of virtual assistant

Country Status (1)

Country Link
CN (1) CN110489517B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949797B (en) 2019-03-11 2021-11-12 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating training corpus
CN112559699A (en) * 2020-11-09 2021-03-26 联想(北京)有限公司 Information interaction method, device and equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007094291A (en) * 2005-09-30 2007-04-12 Tetsuo Suga Learning system of linguistic knowledge of natural language learning system and recording medium which records natural language learning program
TW201140559A (en) * 2010-05-10 2011-11-16 Univ Nat Cheng Kung Method and system for identifying emotional voices
CN103226949A (en) * 2011-09-30 2013-07-31 苹果公司 Using context information to facilitate processing of commands in a virtual assistant
CN104346406A (en) * 2013-08-08 2015-02-11 北大方正集团有限公司 Training corpus expanding device and training corpus expanding method
CN104360994A (en) * 2014-12-04 2015-02-18 科大讯飞股份有限公司 Natural language understanding method and natural language understanding system
TW201516756A (en) * 2013-10-28 2015-05-01 Univ Kun Shan Intelligent voice control system and method therefor
CN104778945A (en) * 2005-08-05 2015-07-15 沃伊斯博克斯科技公司 Systems and methods for responding to natural language speech utterance
CN106057200A (en) * 2016-06-23 2016-10-26 广州亿程交通信息有限公司 Semantic-based interaction system and interaction method
CN107015969A (en) * 2017-05-19 2017-08-04 四川长虹电器股份有限公司 Can self-renewing semantic understanding System and method for
WO2017212783A1 (en) * 2016-06-09 2017-12-14 真由美 稲場 Program for realizing function to assist in understanding personality and preferences of other party and communicating
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186523B (en) * 2011-12-30 2017-05-10 富泰华工业(深圳)有限公司 Electronic device and natural language analyzing method thereof
US10217059B2 (en) * 2014-02-04 2019-02-26 Maluuba Inc. Method and system for generating natural language training data
US10402453B2 (en) * 2014-06-27 2019-09-03 Nuance Communications, Inc. Utilizing large-scale knowledge graphs to support inference at scale and explanation generation
US9754207B2 (en) * 2014-07-28 2017-09-05 International Business Machines Corporation Corpus quality analysis
KR102447513B1 (en) * 2016-01-22 2022-09-27 한국전자통신연구원 Self-learning based dialogue apparatus for incremental dialogue knowledge, and method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
智能语音助手抢占AI入口市场;孟晋;《新经济导刊》;20170405(第04期);全文 *
电子词典与词汇知识表达;陈克健;中文信息学报(第04期);全文 *

Also Published As

Publication number Publication date
CN110489517A (en) 2019-11-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20191122

Assignee: Shanghai Dingjie Shuzhi Software Co.,Ltd.

Assignor: DIGIWIN SOFTWARE Co.,Ltd.

Contract record no.: X2024310000112

Denomination of invention: Automated Learning Methods and Systems for Virtual Assistants

Granted publication date: 20231031

License type: Common License

Record date: 20240903

EE01 Entry into force of recordation of patent licensing contract
CP03 Change of name, title or address

Address after: 20th Floor, No. 7, Lane 1377, Jiangchang Road, Jing'an District, Shanghai, 200400

Patentee after: Dingjie Shuzhi Co.,Ltd.

Country or region after: China

Address before: 200443 22F, Building 1, Central Greenland, Lane 1377, Jiangchang Road, Zhabei District, Shanghai

Patentee before: DIGIWIN SOFTWARE Co.,Ltd.

Country or region before: China