CN111782793A - Intelligent customer service processing method, system and device - Google Patents
- Publication number
- CN111782793A CN202010798753.7A
- Authority
- CN
- China
- Prior art keywords
- content
- customer service
- classification model
- text
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an intelligent customer service processing method, system, and device. The method includes: acquiring multi-turn dialogue content from an intelligent customer service system, training a model on that content, and constructing a multi-class label classification model that incorporates the preceding context, in which each label class corresponds to one reply item; and feeding the request content entered by a user into the multi-class label classification model for classification and outputting the reply item that corresponds to the predicted label. The solution is more flexible and can handle more forms of user request; compared with a retrieval-based customer service system, replacing retrieval with a classification model helps improve the robustness of the overall system; it avoids the semantic gap problem faced by retrieval-based systems; it reduces the influence of noise on the system's choice of reply item; and it is more accurate.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to an intelligent customer service processing method, system, and device.
Background
A spoken dialogue system can be regarded as an interactive system that imitates human-to-human communication in natural language. As automatic speech recognition (ASR) and natural language processing (NLP) technologies have matured, such systems have been widely deployed in customer service. Deep learning methods have been widely studied and applied since 2006, and many machine learning tasks have seen breakthroughs. With word2vec, language models, and sequence-to-sequence models appearing in succession from 2013, massive amounts of data have been collected and used to train and design these systems.
In the transportation card customer service scenario, the main services are business inquiries such as card application and activation, so the system is a vertical-domain intelligent customer service system. To find a reply item for a user's input request, the traditional approach applies a set of manually predefined rules for semantic parsing and matches a reply item. The current conventional approach is a retrieval-based model. The main idea of a retrieval-based dialogue system is to match, among multiple reply candidates, the single most suitable reply item for the user's input request and output it. There are two matching approaches. The first is representation-based matching: text features are first extracted separately from the input and from each reply candidate, a similarity function is then applied to the resulting text representations, and the candidate with the highest matching degree is output. The second is interaction-based matching: unlike the first approach, which computes similarity between text representations only at the final stage, it lets the text representations interact early in the model through a similarity matrix, so that the model can capture matching relationships at different levels and granularities, and it finally returns the reply item with the highest matching degree.
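To make the representation-based matching described above concrete, the following is a minimal sketch, not the method of any particular prior-art system. It assumes bag-of-words counts as the text representation and cosine similarity as the similarity function; the function names and the candidate format are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_reply(request_tokens, candidates):
    """Return the reply text whose representation best matches the request.

    candidates is a list of (reply_text, reply_tokens) pairs; this format
    is an assumption made for illustration.
    """
    request_vec = Counter(request_tokens)
    best_text, _ = max(
        candidates,
        key=lambda c: cosine_similarity(request_vec, Counter(c[1])),
    )
    return best_text


candidates = [
    ("Activate the card in the app.", ["activate", "card", "app"]),
    ("Refunds take five days.", ["refund", "days"]),
]
print(retrieve_reply(["how", "activate", "card"], candidates))
```

In this sketch a request that shares no tokens with any candidate scores 0 against all of them, which is one simple way to see why purely literal matching can fail.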
The problems with the prior art are as follows. Approaches based on manually defined rules are not flexible enough to handle diverse and changing user requests well. Retrieval-based customer service systems select the most suitable reply item from a set of reply candidates as output; the core of this method is relevance matching between the user's input request and the reply items. However, a message and its reply sometimes have no direct literal relationship, so a semantic gap has to be overcome. In addition, because the matching is between message and reply, the quality of both texts matters a great deal: if irrelevant content appears in the text, such noise easily affects the retrieval-based dialogue system when it matches reply items, so robustness is low.
Summary of the Invention
The purpose of the present invention is to provide an intelligent customer service processing method, system, and device that remedy the above defects of the prior art.
To achieve this purpose, the present invention adopts the following technical solutions.
In a first aspect, an intelligent customer service processing method is provided, including: acquiring multi-turn dialogue content from an intelligent customer service system, training a model on the multi-turn dialogue content, and constructing a multi-class label classification model that incorporates the preceding context, in which each label class corresponds to one reply item; and feeding the request content entered by a user into the multi-class label classification model for classification and outputting the reply item that corresponds to the predicted label.
In some possible implementations, training the model on the multi-turn dialogue content and constructing the multi-class label classification model includes: acquiring multi-turn dialogue content from the intelligent customer service system; associating the acquired multi-turn dialogue content with the preceding content and annotating the associated text data, where the annotation categories include chat, business, and complaint, the business category is further annotated with subcategories, and each business subcategory corresponds to one reply item; preprocessing the annotated text data, where the preprocessing includes data cleaning and word segmentation, and dividing the preprocessed text data into a training set and a test set; extracting text features from the text data of the training set; and training a classification model with the extracted text features to obtain the multi-class label classification model that incorporates the preceding context.
In some possible implementations, feeding the request content entered by the user into the multi-class label classification model and outputting the corresponding reply item includes: receiving the request content currently entered by the user; associating the current request content with the preceding content to generate input data; preprocessing the generated input data, where the preprocessing includes data cleaning and word segmentation; extracting text features from the preprocessed input data; and feeding the extracted text features into the multi-class label classification model for classification and outputting the reply item that corresponds to the predicted label.
In a second aspect, an intelligent customer service system is provided, including: a classification module, configured to acquire multi-turn dialogue content from the intelligent customer service system, train a model on the multi-turn dialogue content, and construct a multi-class label classification model that incorporates the preceding context, in which each label class corresponds to one reply item; and a reply module, configured to feed the request content entered by a user into the multi-class label classification model for classification and output the reply item that corresponds to the predicted label.
In some possible implementations, the classification module includes: an acquisition unit, configured to acquire multi-turn dialogue content from the intelligent customer service system and store it in a database; an annotation unit, configured to associate the multi-turn dialogue content with the preceding content and annotate the associated text data, where the annotation categories include chat, business, and complaint, the business category is further annotated with subcategories, each business subcategory corresponds to one reply item, and the annotated text data is stored in the database; a preprocessing unit, configured to preprocess the annotated text data, where the preprocessing includes data cleaning and word segmentation, and to divide the preprocessed text data into a training set and a test set; a text feature extraction unit, configured to extract text features from the text data of the training set; and a model construction unit, configured to train a classification model with the extracted text features to obtain the multi-class label classification model that incorporates the preceding context.
In some possible implementations, the reply module includes: a receiving unit, configured to receive the request content currently entered by the user; an association unit, configured to associate the user's current request with the preceding content to generate input data; a preprocessing unit, configured to preprocess the generated input data, where the preprocessing includes data cleaning and word segmentation; a text feature extraction unit, configured to extract text features from the preprocessed input data; and an output unit, configured to feed the extracted text features into the multi-class label classification model for classification and output the reply item that corresponds to the predicted label.
In a third aspect, a computer device is provided, including a processor and a memory. The memory stores a program that includes computer-executable instructions. When the computer device runs, the processor executes the computer-executable instructions stored in the memory, so that the computer device performs the intelligent customer service processing method of the first aspect.
In a fourth aspect, a computer-readable storage medium storing one or more programs is provided. The one or more programs include computer-executable instructions that, when executed by a computer device, cause the computer device to perform the intelligent customer service processing method of the first aspect.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
First, compared with traditional methods based on manual rules, the present invention is more flexible and can handle more forms of user request;
Second, compared with retrieval-based customer service systems, replacing retrieval with a classification module helps improve the robustness of the overall system;
Third, because the text content of the reply items is not needed for matching, the present invention avoids the semantic gap problem faced by retrieval-based systems;
Fourth, the present invention also reduces the influence of noise on the system's choice of reply item;
Fifth, compared with existing intelligent customer service systems, the present invention makes full use of the preceding dialogue content and replies to the current user request by associating it with that preceding content, which yields higher accuracy.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments and the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an intelligent customer service processing method provided by an embodiment of the present invention;
FIG. 2 is a structural diagram of an intelligent customer service system provided by an embodiment of the present invention;
FIG. 3 is a flowchart of constructing the multi-label classification model in an embodiment of the present invention;
FIG. 4 is a flowchart of text annotation in an embodiment of the present invention;
FIG. 5 is a flowchart of the user reply part in an embodiment of the present invention;
FIG. 6 is a structural diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", and the like in the description, the claims, and the above drawings are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units; it may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, product, or device.
Detailed descriptions are given below through specific embodiments.
Embodiments of the present invention provide an intelligent customer service system and a corresponding processing method. The system is an end-to-end intelligent customer service system based on classification and consists of two main parts: the construction of the multi-label classification model, and the user reply part. The model construction part acquires multi-turn dialogue content from the intelligent customer service system, applies text preprocessing, and then models the processed multi-turn dialogue content as a multi-class label classification model that incorporates the preceding context, in which each class corresponds to one specific reply item. The user reply part processes the request content entered by the user, classifies it, and outputs the corresponding reply item.
Referring to FIG. 1, an embodiment of the present invention provides an intelligent customer service processing method that includes the following steps:
S1. Acquire multi-turn dialogue content from the intelligent customer service system, train a model on the multi-turn dialogue content, and construct a multi-class label classification model that incorporates the preceding context, in which each label class corresponds to one reply item;
S2. Feed the request content entered by the user into the multi-class label classification model for classification, and output the reply item that corresponds to the predicted label.
Further, training the model and constructing the multi-class label classification model in step S1 may include: acquiring multi-turn dialogue content from the intelligent customer service system; associating the acquired multi-turn dialogue content with the preceding content and annotating the associated text data, where the annotation categories include chat, business, and complaint, the business category is further annotated with subcategories, and each business subcategory corresponds to one reply item; preprocessing the annotated text data, where the preprocessing includes data cleaning and word segmentation, and dividing the preprocessed text data into a training set and a test set; extracting text features from the text data of the training set; and training a classification model with the extracted text features to obtain the multi-class label classification model that incorporates the preceding context.
Further, step S2 may include: receiving the request content currently entered by the user; associating the current request content with the preceding content to generate input data; preprocessing the generated input data, where the preprocessing includes data cleaning and word segmentation; extracting text features from the preprocessed input data; and feeding the extracted text features into the multi-class label classification model for classification and outputting the reply item that corresponds to the predicted label.
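The association and lookup parts of step S2 can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented implementation: the source does not say how many preceding turns are associated or how they are joined, so `max_turns`, the separator, the fallback reply, the label name, and the toy keyword classifier standing in for the trained model are all hypothetical.

```python
def associate_with_context(history, current_request, max_turns=2, sep=" "):
    """Join the last `max_turns` preceding turns with the current request."""
    recent = history[-max_turns:] if max_turns > 0 else []
    return sep.join(recent + [current_request])


def answer(history, current_request, classify, reply_items,
           fallback="Transferring you to a human agent."):
    """Classify the context-associated input and look up its reply item."""
    model_input = associate_with_context(history, current_request)
    label = classify(model_input)
    # Each label class maps to exactly one reply item; the fallback for
    # unknown labels is an assumption added for this sketch.
    return reply_items.get(label, fallback)


def toy_classify(text):
    """Hypothetical stand-in for the trained multi-class classifier."""
    if "ETC" in text and "activate" in text:
        return "etc_activation"
    return "chat"


reply_items = {
    "etc_activation": "Insert the card into the on-board unit to activate it.",
}
history = ["I just received my ETC card"]
print(answer(history, "how do I activate it", toy_classify, reply_items))
```

The current request alone ("how do I activate it") never mentions ETC; only after it is joined with the preceding turn does the toy classifier pick the business label, which illustrates why the input is associated with the preceding content before classification.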
Referring to FIG. 2, another embodiment of the present invention provides an intelligent customer service system that includes:
a classification module 10, configured to acquire multi-turn dialogue content from the intelligent customer service system, train a model on the multi-turn dialogue content, and construct a multi-class label classification model that incorporates the preceding context, in which each label class corresponds to one reply item; and
a reply module 20, configured to feed the request content entered by the user into the multi-class label classification model for classification and output the reply item that corresponds to the predicted label.
Further, the classification module 10 may include:
an acquisition unit, configured to acquire multi-turn dialogue content from the intelligent customer service system and store it in a database;
an annotation unit, configured to associate the multi-turn dialogue content with the preceding content and annotate the associated text data, where the annotation categories include chat, business, and complaint, the business category is further annotated with subcategories, each business subcategory corresponds to one reply item, and the annotated text data is stored in the database;
a preprocessing unit, configured to preprocess the annotated text data, where the preprocessing includes data cleaning and word segmentation, and to divide the preprocessed text data into a training set and a test set;
a text feature extraction unit, configured to extract text features from the text data of the training set; and
a model construction unit, configured to train a classification model with the extracted text features to obtain the multi-class label classification model that incorporates the preceding context.
Further, the reply module 20 may include:
a receiving unit, configured to receive the request content currently entered by the user;
an association unit, configured to associate the user's current request with the preceding content to generate input data;
a preprocessing unit, configured to preprocess the generated input data, where the preprocessing includes data cleaning and word segmentation;
a text feature extraction unit, configured to extract text features from the preprocessed input data; and
an output unit, configured to feed the extracted text features into the multi-class label classification model for classification and output the reply item that corresponds to the predicted label.
The construction of the multi-label classification model and the user reply part in the technical solutions of the embodiments of the present invention are each described further below.
(1) Construction of the multi-label classification model
Referring to FIG. 3, the construction of the multi-label classification model includes the following steps:
1.1 Acquiring multi-turn dialogue text data
A) Acquire real online dialogue data from an intelligent customer service system, for example a large-scale transportation intelligent customer service system for the ETC (Electronic Toll Collection) card scenario;
B) Store the acquired data in a database, for example MongoDB. MongoDB is a database based on distributed file storage.
1.2 Text annotation
Referring to FIG. 4, the text annotation process is as follows.
A) First, annotate the text data of the multi-turn dialogue content:
A.1) Import the multi-turn dialogue content to be annotated into the data annotation system;
A.2) The annotator associates the text of the current dialogue turn with the preceding content;
A.3) Annotate the text according to the dialogue content; the main categories may include chat, complaint, and business;
A.4) The business category can be further annotated with subcategories, and each subcategory corresponds to one reply item;
A.5) Submit the annotated data.
B) Review the annotated data:
B.1) A reviewer reviews the annotated data;
B.2) The reviewer decides whether to reject the annotated data; if it is rejected, the annotator re-annotates the data and it is reviewed again;
B.3) Data that passes review is stored in the database and used for incremental training of the model.
C) Once all annotated data is stored in the database, the construction of the transportation card domain dataset is complete.
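The label hierarchy and the review loop of section 1.2 can be modelled with a small record type. This is a sketch under assumptions: the source does not define a data schema, so the field names, the status values, and the sub-label strings used here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    LABELED = "labeled"      # submitted by the annotator, awaiting review
    REJECTED = "rejected"    # sent back for re-annotation
    APPROVED = "approved"    # passed review, usable for training


TOP_LEVEL = {"chat", "complaint", "business"}


@dataclass
class Annotation:
    context: str                     # preceding dialogue content
    utterance: str                   # current turn being annotated
    category: str                    # one of TOP_LEVEL
    sub_label: Optional[str] = None  # business subcategory, one per reply item
    status: Status = Status.LABELED


def validate(a):
    """Check that the annotation follows the two-level label scheme."""
    if a.category not in TOP_LEVEL:
        raise ValueError(f"unknown category: {a.category}")
    if a.category == "business" and not a.sub_label:
        raise ValueError("business annotations need a subcategory")


def review(a, accepted):
    """Reviewer decision: approve the annotation or reject it."""
    validate(a)
    a.status = Status.APPROVED if accepted else Status.REJECTED
    return a


def relabel(a, category, sub_label=None):
    """Annotator re-annotates a rejected item; it returns to review."""
    a.category, a.sub_label, a.status = category, sub_label, Status.LABELED
    return a


def training_pool(annotations):
    """Only approved annotations enter the dataset."""
    return [a for a in annotations if a.status is Status.APPROVED]
```

For example, an annotation that is rejected once, re-annotated, and then approved ends up in `training_pool`, while one still in the `REJECTED` state does not.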
1.3 Text preprocessing
A) Text data cleaning: use an existing tool, such as the re module in Python, to remove special characters from the text data with regular expressions.
B) Word segmentation:
B.1) When preprocessing Chinese text, because of the structure of the Chinese language, features at word granularity are far better than features at character granularity, so the Chinese text needs to be segmented into words;
B.2) Word segmentation methods mainly include string-matching-based methods, statistics-based methods, and understanding-based methods; an existing tool such as jieba can be used to segment Chinese text;
C) Divide the preprocessed text data proportionally into a training set Tr_set and a test set Te_set.
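The three preprocessing steps above can be sketched with the standard library alone. Several choices here are assumptions rather than details from the source: the set of characters treated as noise, the tiny vocabulary, the 80/20 split ratio, and the fixed seed. The segmenter is a forward maximum matching routine, one simple instance of the string-matching family; it is a stand-in for a real tool such as jieba, not a reproduction of it.

```python
import random
import re

# Assumed cleaning policy: keep CJK characters, ASCII letters and digits,
# and drop everything else (punctuation, whitespace, symbols).
_NOISE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def clean(text):
    """Remove special characters with a regular expression."""
    return _NOISE.sub("", text)


def segment(text, vocab, max_len=4):
    """Forward maximum matching: take the longest vocabulary word at each
    position, falling back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def split(samples, test_ratio=0.2, seed=0):
    """Shuffle and divide samples into (Tr_set, Te_set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


vocab = {"交通卡", "怎么", "激活"}  # hypothetical mini-dictionary
cleaned = clean("交通卡,怎么激活??")
print(segment(cleaned, vocab))
```

With this vocabulary the cleaned string "交通卡怎么激活" is segmented into the three words 交通卡, 怎么, and 激活; characters not covered by the vocabulary would be emitted one at a time.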
1.4 Building the classification module
A) Text feature extraction:
A.1) Text feature engineering mainly extracts, from the database, feature representations that reflect the characteristics of the text;
A.2) The input is the training set Tr_set of word-segmented multi-round dialogue content; existing engineering techniques such as TF-IDF can be used for text feature extraction;
A.3) TF-IDF (term frequency–inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. TF is the term frequency, and IDF is the inverse document frequency.
The term frequency (TF) is the number of times a word occurs in a document divided by the total number of words in that document. If a document contains 100 words in total and the word "activation" appears 3 times, then the term frequency of "activation" in that document is 0.03 (3/100). One way to compute the document frequency (DF) is to count how many documents contain the word "activation" and divide by the total number of documents in the collection. So, if "activation" appears in 1,000 documents and the collection contains 10,000,000 documents, its document frequency is 0.0001 (1,000/10,000,000). Finally, the TF-IDF score is obtained as the term frequency multiplied by the logarithm of the reciprocal of the document frequency. In the example above, with a base-10 logarithm, the TF-IDF score of "activation" in this collection is 0.12 = 0.03 × log(1/0.0001).
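The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function names are illustrative, and the base-10 logarithm is inferred from the stated result of 0.12.

```python
import math


def term_frequency(term_count, total_terms):
    """Occurrences of the term divided by the total number of words in the document."""
    return term_count / total_terms


def inverse_document_frequency(docs_with_term, total_docs):
    """Base-10 logarithm of the reciprocal of the document frequency."""
    return math.log10(total_docs / docs_with_term)


tf = term_frequency(3, 100)                          # 0.03
idf = inverse_document_frequency(1_000, 10_000_000)  # log10(10,000) = 4
score = tf * idf
print(round(score, 2))  # 0.12
```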
B) Classification model
B.1) The classification model used herein is a supervised learning model whose input is the context-associated multi-round dialogue feature representation q and the corresponding business label l; here, the multi-round dialogue feature representation q includes the feature representations of the current round and of the preceding rounds of dialogue;
B.2) The classification algorithm uses an existing supervised learning technique, such as a support vector machine, and trains the model on the input text feature representations;
B.3) Compute the confidence of each business label;
B.4) The classification model outputs the business label;
C) Export and save the multi-class label classification model, denoted CM.
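Steps B.3 and B.4 can be illustrated with a short sketch. The source does not say how the confidence of each business label is computed, so the softmax normalization below is an assumption, as are the label names and the raw scores (which stand in for whatever per-label values a trained classifier such as a support vector machine would produce).

```python
import math


def label_confidences(scores):
    """B.3 (assumed form): turn raw per-label scores into confidences that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def predict_label(scores):
    """B.4: output the business label with the highest confidence."""
    confidences = label_confidences(scores)
    return max(confidences, key=confidences.get)


# Hypothetical raw scores for three illustrative business labels.
raw_scores = {"card_activation": 2.1, "bluetooth_connection": 0.3, "refund": -1.0}
confidences = label_confidences(raw_scores)
best_label = predict_label(raw_scores)
```

Keeping the full confidence dictionary, rather than only the top label, would also allow a deployment to apply a threshold before answering automatically; that design choice is not described in the source.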
(2) Replying to the user
Referring to FIG. 5, the user reply part includes the following steps:
2.1 The user inputs a request
The request content q_i input by the user is usually a question containing a series of keywords, such as "card activation abnormal" or "unable to connect to Bluetooth".
2.2 Associating the user's request with the preceding context
Because dialogue in intelligent customer service usually takes place in a multi-round question-and-answer context, the current round's user request q_i is associated with the preceding context {q_{i-n}, r_{i-n}, …, q_{i-2}, r_{i-2}, q_{i-1}, r_{i-1}}, where q denotes a request and r denotes a reply. The resulting context-associated input data q_i' effectively helps in replying to the user request.
For the user request q_i of the i-th round, classification takes as input not only q_i but also the dialogue data of the n rounds preceding the i-th round, namely {q_{i-n}, r_{i-n}, …, q_{i-2}, r_{i-2}, q_{i-1}, r_{i-1}}. In other words, the input data q_i' includes both q_i and {q_{i-n}, r_{i-n}, …, q_{i-2}, r_{i-2}, q_{i-1}, r_{i-1}}. Here n is an empirical value that can be chosen according to specific needs, for example 3 or 4, and is not limited.
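One way to form the context-associated input q_i' is sketched below. The source does not specify how the association is carried out, so plain concatenation of the last n request/reply pairs with the current request is an assumption, and the function name, the `history` layout, and the separator are hypothetical.

```python
def build_context_input(history, current_request, n=3, sep=" "):
    """Build q_i' from the last n (request, reply) rounds plus the current request.

    history: list of (q, r) pairs for earlier rounds, oldest first.
    """
    # history[-0:] would return the whole list, so n == 0 is handled explicitly.
    recent = history[-n:] if n > 0 else []
    parts = []
    for request, reply in recent:
        parts.extend([request, reply])
    parts.append(current_request)
    return sep.join(parts)


history = [("q1", "r1"), ("q2", "r2"), ("q3", "r3"), ("q4", "r4")]
q_i_prime = build_context_input(history, "q5", n=2)
print(q_i_prime)  # q3 r3 q4 r4 q5
```

When fewer than n earlier rounds exist, the slice simply returns all of them, so early rounds of a conversation need no special handling.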
2.3 Preprocess the context-associated request information q_i'; for details, refer to step 1.3 above;
2.4 Perform feature extraction to obtain the feature representation of q_i'; for details, refer to A) text feature extraction in step 1.4 above;
2.5 Feed the feature representation into the classification model CM, and map the business label l output by the model to the reply item r_i;
2.6 Output the reply item r_i to the user.
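Steps 2.5 and 2.6 amount to a lookup from the predicted business label to a reply item, which can be sketched as follows. Everything named here is hypothetical: the labels, the reply texts, the keyword-rule `toy_classifier` that stands in for the trained model CM, and the fallback reply, which the source does not describe.

```python
# Hypothetical label-to-reply table; the labels and reply texts are illustrative only.
REPLY_ITEMS = {
    "card_activation": "Please re-open the app and retry card activation.",
    "bluetooth_connection": "Please enable Bluetooth and keep the device nearby.",
}
# Assumed fallback for labels without a reply item (not described in the source).
FALLBACK_REPLY = "Your request has been forwarded to a human agent."


def toy_classifier(words):
    """Stand-in for the trained model CM: a keyword rule, for demonstration only."""
    if "蓝牙" in words:
        return "bluetooth_connection"
    if "激活" in words:
        return "card_activation"
    return "unknown"


def reply_to_user(words, classify=toy_classifier, reply_items=REPLY_ITEMS):
    """Steps 2.5 and 2.6: classify the request, then look up the reply item."""
    label = classify(words)
    return reply_items.get(label, FALLBACK_REPLY)


reply = reply_to_user(["卡片", "激活", "异常"])
```

Because the reply is selected by label rather than by matching against reply text, the reply wording can be edited without retraining the classifier.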
Referring to FIG. 6, an embodiment of the present invention further provides a computer device 60 comprising a processor 61 and a memory 62. The memory 62 stores a program that includes computer-executable instructions. When the computer device 60 runs, the processor 61 executes the computer-executable instructions stored in the memory, so that the computer device 60 performs the intelligent customer service processing method described above.
An embodiment of the present invention further provides a computer-readable storage medium storing one or more programs, the one or more programs including computer-executable instructions that, when executed by a computer device, cause the computer device to perform the intelligent customer service processing method described above.
In summary, the embodiments of the present invention disclose an intelligent customer service processing method, system, and related device. As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
First, compared with traditional methods based on manually written rules, the present invention is more flexible and can handle more forms of user requests;
Second, compared with retrieval-based customer service systems, replacing retrieval with a classification module helps improve the robustness of the overall system;
Third, because the text content of the reply items is not needed for matching, the present invention can resolve the semantic gap problem faced by retrieval-based systems;
Fourth, the present invention also reduces the influence of noise on the system's selection of reply items;
Fifth, compared with existing intelligent customer service systems, the present invention makes full use of the preceding dialogue content and replies to the current user request in a context-associated manner, which yields higher accuracy.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Those of ordinary skill in the art should understand that the technical solutions described in the above embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010798753.7A CN111782793A (en) | 2020-08-11 | 2020-08-11 | Intelligent customer service processing method, system and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111782793A (en) | 2020-10-16 |
Family
ID=72761949
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010798753.7A Pending CN111782793A (en) | 2020-08-11 | 2020-08-11 | Intelligent customer service processing method, system and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111782793A (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112328871A (en) * | 2020-10-27 | 2021-02-05 | 深圳集智数字科技有限公司 | Reply generation method, device, equipment and storage medium based on RPA module |
| CN112883183A (en) * | 2021-03-22 | 2021-06-01 | 北京大学深圳研究院 | Method for constructing multi-classification model, intelligent customer service method, and related device and system |
| CN113282755A (en) * | 2021-06-11 | 2021-08-20 | 上海寻梦信息技术有限公司 | Dialogue type text classification method, system, equipment and storage medium |
| CN114357157A (en) * | 2021-12-08 | 2022-04-15 | 有米科技股份有限公司 | A method and device for data processing based on marketing text |
| CN115525745A (en) * | 2022-09-23 | 2022-12-27 | 北京智谱华章科技有限公司 | STAR interview question-asking method and equipment based on multi-label classification model |
| CN117059082A (en) * | 2023-10-13 | 2023-11-14 | 北京水滴科技集团有限公司 | Outbound call conversation method, device, medium and computer equipment based on large model |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109543030A (en) * | 2018-10-12 | 2019-03-29 | 平安科技(深圳)有限公司 | Customer service machine conference file classification method and device, equipment, storage medium |
| CN110059182A (en) * | 2019-03-21 | 2019-07-26 | 阿里巴巴集团控股有限公司 | Art recommended method and device towards customer service |
| CN111177359A (en) * | 2020-04-10 | 2020-05-19 | 支付宝(杭州)信息技术有限公司 | Multi-turn dialogue method and device |
2020-08-11: CN application CN202010798753.7A (CN111782793A), status: Pending
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112328871A (en) * | 2020-10-27 | 2021-02-05 | 深圳集智数字科技有限公司 | Reply generation method, device, equipment and storage medium based on RPA module |
| CN112328871B (en) * | 2020-10-27 | 2024-04-26 | 深圳集智数字科技有限公司 | Reply generation method, device, equipment and storage medium based on RPA module |
| CN112883183A (en) * | 2021-03-22 | 2021-06-01 | 北京大学深圳研究院 | Method for constructing multi-classification model, intelligent customer service method, and related device and system |
| CN113282755A (en) * | 2021-06-11 | 2021-08-20 | 上海寻梦信息技术有限公司 | Dialogue type text classification method, system, equipment and storage medium |
| CN114357157A (en) * | 2021-12-08 | 2022-04-15 | 有米科技股份有限公司 | A method and device for data processing based on marketing text |
| CN115525745A (en) * | 2022-09-23 | 2022-12-27 | 北京智谱华章科技有限公司 | STAR interview question-asking method and equipment based on multi-label classification model |
| CN117059082A (en) * | 2023-10-13 | 2023-11-14 | 北京水滴科技集团有限公司 | Outbound call conversation method, device, medium and computer equipment based on large model |
| CN117059082B (en) * | 2023-10-13 | 2023-12-29 | 北京水滴科技集团有限公司 | Outbound call conversation method, device, medium and computer equipment based on large model |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11663411B2 (en) | Ontology expansion using entity-association rules and abstract relations | |
| CN110096570B (en) | An intent recognition method and device applied to an intelligent customer service robot | |
| CN111782793A (en) | Intelligent customer service processing method, system and device | |
| US20200372025A1 (en) | Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering | |
| CN112579666B (en) | Intelligent question-answering system and method and related equipment | |
| CN116955699B (en) | Video cross-mode search model training method, searching method and device | |
| WO2019153737A1 (en) | Comment assessing method, device, equipment and storage medium | |
| CN110110335A (en) | A kind of name entity recognition method based on Overlay model | |
| CN107886231B (en) | Customer Service Quality Evaluation Method and System | |
| CN111260437A (en) | A product recommendation method based on commodity aspect-level sentiment mining and fuzzy decision-making | |
| CN114265931B (en) | Consumer policy perception analysis method and system based on big data text mining | |
| CN117453895B (en) | Intelligent customer service response method, device, equipment and readable storage medium | |
| CN115062621B (en) | Label extraction method, label extraction device, electronic equipment and storage medium | |
| CN110287314B (en) | Method and system for long text credibility assessment based on unsupervised clustering | |
| CN109582788A (en) | Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing | |
| CN112989001B (en) | Question and answer processing method and device, medium and electronic equipment | |
| CN114330318A (en) | Method and device for recognizing Chinese fine-grained entities in financial field | |
| CN112562736B (en) | Voice data set quality assessment method and device | |
| CN113886553A (en) | A text generation method, apparatus, device and storage medium | |
| CN111291168A (en) | Book retrieval method, device and readable storage medium | |
| CN107833059A (en) | The QoS evaluating method and system of customer service | |
| WO2023124647A1 (en) | Summary determination method and related device thereof | |
| CN110610003A (en) | Method and system for assisting text annotation | |
| CN114528851A (en) | Reply statement determination method and device, electronic equipment and storage medium | |
| CN116415047A (en) | A resource screening method and system based on national image resource recommendation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201016 |
