CN115147130A

CN115147130A - Problem prediction method, device, storage medium and program product

Info

Publication number: CN115147130A
Application number: CN202210771557.XA
Authority: CN
Inventors: 陈志钊
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-07-01
Filing date: 2022-07-01
Publication date: 2022-10-04

Abstract

The present application provides a problem prediction method, device, storage medium and program product. The method comprises: acquiring an interaction behavior sequence of a user within a current preset time, wherein the interaction behavior sequence comprises a sequence of browsed pages; predicting, according to the interaction behavior sequence and based on a problem classification model, the type of problem encountered by the user, and predicting the problem encountered by the user based on the problem prediction model corresponding to that type; and determining, according to the predicted problem, a solution to push to the user. By classifying the page sequence that the user browses in an application within the preset time, determining the type corresponding to the page sequence, and then selecting a model matching that type to predict the problem, the application can accurately predict the problem the user may currently have and discover it in time. This improves the accuracy, timeliness and effectiveness of problem prediction. A solution can also be recommended to the user so that the problem is solved promptly, improving the user's service experience.

Description

Problem prediction method, device, storage medium and program product

Technical Field

The present application relates to the field of artificial intelligence technology, and in particular to a problem prediction method, device, storage medium and program product.

Background

With the continuous development of artificial intelligence technology, intelligent customer service robots have acquired knowledge-base question answering and dialogue capabilities, and can answer user questions fairly smoothly and accurately.

However, such customer service robots usually accept user inquiries passively: the user has to find the entrance to the customer service robot and state the service request, which is time-consuming and laborious and wastes resources. It is also difficult to discover user problems in a timely and effective manner and to solve them promptly, so the user experience is poor.

Summary of the Invention

The main purpose of the embodiments of the present application is to provide a problem prediction method, device, storage medium and program product that predict problems and intents in real time and improve the user's service experience.

In a first aspect, an embodiment of the present application provides a problem prediction method, including:

acquiring an interaction behavior sequence of a user within a current preset time, wherein the interaction behavior sequence includes a sequence of browsed pages;

predicting, according to the interaction behavior sequence and based on a problem classification model, the type of problem encountered by the user, and predicting the problem encountered by the user based on the problem prediction model corresponding to the type; and

determining, according to the predicted problem, a solution to push to the user.

Optionally, predicting the type of problem encountered by the user based on the problem classification model according to the interaction behavior sequence, and predicting the problem encountered by the user based on the problem prediction model corresponding to the type, includes:

inputting a feature vector corresponding to the interaction behavior sequence into the problem classification model to determine the type of problem encountered by the user;

if the type is a marketing type or an account type, processing the user's interaction behavior sequence based on a sequence model to obtain the corresponding problem; and

if the type is a transaction type, processing the user's interaction behavior sequence and the transaction information corresponding to the user based on a deep interest evolution network model to obtain the corresponding problem.
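
The routing described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the type labels, the function `predict_problem` and the three toy stand-in models are hypothetical names introduced here, and the real classifier, sequence model and deep interest evolution network would be trained models.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical labels for the three types named in the text.
MARKETING, ACCOUNT, TRANSACTION = "marketing", "account", "transaction"


def predict_problem(
    pages: List[str],
    classify: Callable[[List[str]], str],
    sequence_model: Callable[[List[str]], str],
    dien_model: Callable[[List[str], Dict[str, str]], str],
    transaction_info: Optional[Dict[str, str]] = None,
) -> str:
    """Route a page sequence to the predictor that matches its problem type."""
    problem_type = classify(pages)
    if problem_type in (MARKETING, ACCOUNT):
        # Sequence-only input: the sequence model sees just the page sequence.
        return sequence_model(pages)
    if problem_type == TRANSACTION:
        # Transaction problems also use the user's transaction information.
        return dien_model(pages, transaction_info or {})
    raise ValueError(f"unknown problem type: {problem_type}")


# Toy stand-ins for the trained models (assumptions, for illustration only).
def toy_classify(pages: List[str]) -> str:
    return TRANSACTION if "refund" in pages else ACCOUNT


def toy_sequence_model(pages: List[str]) -> str:
    return "how to change password"


def toy_dien_model(pages: List[str], info: Dict[str, str]) -> str:
    if info.get("order_status") == "refunding":
        return "refund progress"
    return "how to refund"
```

Passing the models in as callables keeps the routing logic independent of how each model is built or served.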

Optionally, the method further includes:

acquiring historical session data between historical users and the intelligent customer service, and extracting the corresponding problems from the historical session data; and

constructing a training data set for the sequence model according to the historical interaction behavior sequences of the historical users before the historical session data and the extracted problems, and training the sequence model; and/or

constructing a training data set for the deep interest evolution network model according to the historical interaction behavior sequences of the historical users before the historical session data, the transaction information and the extracted problems, and training the deep interest evolution network model.

Optionally, the sequence model is a natural language model for processing sequences, and training the sequence model includes:

performing word segmentation on the historical interaction behavior sequences and the extracted problems in the training data set to obtain word segmentation sequences;

inputting the word segmentation sequences into the natural language model, and pre-training the natural language model with next-sentence prediction as the training objective; and

fine-tuning the pre-trained natural language model on the historical interaction behavior sequences, with the extracted problems as the supervision signal.
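
As a rough illustration of the data preparation for the fine-tuning step, the sketch below turns (page sequence, extracted problem) records into token lists with integer labels. It covers only the data side, not model training. The function name `build_finetune_examples` is hypothetical, and lowercased whitespace splitting is an assumed stand-in for the word segmentation step, which the text does not specify.

```python
from typing import Dict, List, Tuple

Record = Tuple[List[str], str]  # (historical page names, extracted problem)


def build_finetune_examples(
    records: List[Record],
) -> Tuple[List[Tuple[List[str], int]], Dict[str, int]]:
    """Turn (page sequence, problem) records into (tokens, label id) pairs."""
    problem_to_id: Dict[str, int] = {}
    examples: List[Tuple[List[str], int]] = []
    for pages, problem in records:
        # Each distinct problem gets the next free integer id.
        label = problem_to_id.setdefault(problem, len(problem_to_id))
        # Whitespace splitting stands in for a real word segmenter.
        tokens = [token for page in pages for token in page.lower().split()]
        examples.append((tokens, label))
    return examples, problem_to_id


records = [
    (["Order Details", "Help"], "how to refund"),
    (["Coupon Center"], "coupon invalid"),
    (["Help"], "how to refund"),
]
examples, problem_to_id = build_finetune_examples(records)
```

The extracted problem serves as the supervision signal here by becoming the class label of each token sequence.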

Optionally, acquiring the interaction behavior sequence of the user within the current preset time includes:

generating, through a stream computing framework, streaming data containing the interaction behavior sequence according to the user's interaction log within the current preset time; and

filtering the streaming data based on a rule logic library, and putting the filtered streaming data into a message queue, wherein the streaming data in the message queue is to be processed by the problem classification model.
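
The two steps above can be sketched in a single process as follows. This is only an assumption-laden illustration: a real deployment would use a stream computing framework and a distributed message queue, whereas the sketch uses plain Python and the standard-library `queue.Queue`. The record layout `(user_id, timestamp, page)` and the function names are hypothetical.

```python
import queue
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

LogRecord = Tuple[str, float, str]  # (user_id, timestamp, page); assumed layout


def to_stream_items(logs: Iterable[LogRecord]) -> List[Dict]:
    """Group log records by user and order each user's pages by time."""
    by_user: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for user_id, timestamp, page in logs:
        by_user[user_id].append((timestamp, page))
    return [
        {"user_id": user_id, "pages": [page for _, page in sorted(visits)]}
        for user_id, visits in by_user.items()
    ]


def enqueue_filtered(
    items: Iterable[Dict], keep: Callable[[Dict], bool], out: queue.Queue
) -> int:
    """Put the items accepted by the rule into the queue; return how many."""
    kept = 0
    for item in items:
        if keep(item):
            out.put(item)
            kept += 1
    return kept


logs = [("u1", 2.0, "refund"), ("u2", 1.0, "home"), ("u1", 1.0, "orders")]
items = to_stream_items(logs)
pending = queue.Queue()
kept = enqueue_filtered(items, lambda item: "refund" in item["pages"], pending)
```

Filtering before enqueueing means the downstream classification model only receives sequences that a rule has judged worth processing.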

Optionally, filtering the streaming data based on the rule logic library and putting the filtered streaming data into the message queue includes:

filtering the streaming data according to a dictionary and the rules in the rule logic library;

wherein the dictionary includes a page dictionary and/or a page keyword dictionary;

the dictionary is determined from historical pages browsed by historical users and the corresponding problems; and

the rules indicate whether the dictionary is used as a blacklist or a whitelist, and/or how many pages in the page sequence browsed by the user must match the dictionary for the filtering of the page sequence to be completed.
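
One possible reading of such a rule is sketched below. The class `FilterRule`, its fields and the function `passes_filter` are hypothetical names, and the interpretation is an assumption: a whitelist keeps a sequence once enough pages match the dictionary, and a blacklist drops it in the same case.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class FilterRule:
    mode: str         # "whitelist" or "blacklist"
    min_matches: int  # number of pages that must match the dictionary


def passes_filter(pages: Iterable[str], dictionary: Set[str], rule: FilterRule) -> bool:
    """Return True if the page sequence should be kept under the rule."""
    if rule.mode not in ("whitelist", "blacklist"):
        raise ValueError(f"unknown mode: {rule.mode}")
    matches = sum(1 for page in pages if page in dictionary)
    matched = matches >= rule.min_matches
    # Whitelist keeps matching sequences; blacklist keeps the others.
    return matched if rule.mode == "whitelist" else not matched


refund_pages = {"order_detail", "refund_apply"}
sequence = ["home", "order_detail", "refund_apply"]
keep_as_whitelist = passes_filter(sequence, refund_pages, FilterRule("whitelist", 2))
keep_as_blacklist = passes_filter(sequence, refund_pages, FilterRule("blacklist", 2))
```

The same function works for a page keyword dictionary if the sequence is first mapped to its keywords.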

Optionally, the method further includes:

acquiring historical session data between a plurality of historical users and the intelligent customer service, and extracting the corresponding problems from the historical session data to obtain a problem set;

determining, according to the sequences of historical pages browsed by the plurality of historical users before the historical session data, a page set and a page keyword set browsed by the plurality of historical users;

for any problem in the problem set, calculating the mutual information between each page in the page set and the problem, and the mutual information between each page keyword in the page keyword set and the problem, and selecting, according to the calculated mutual information, the target pages and target page keywords corresponding to the problem from the page set and the page keyword set; and

merging the target pages corresponding to the problems in the problem set to obtain the page dictionary, and merging the target page keywords corresponding to the problems to obtain the page keyword dictionary.
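
The dictionary construction above can be sketched as follows, under stated assumptions. The text does not say how mutual information is estimated or how target pages are selected from it. The sketch treats "page was viewed" and "problem was raised" as binary events over historical samples, computes I(X;Y) = Σ p(x,y) log(p(x,y) / (p(x) p(y))), and keeps the `top_k` highest-scoring pages per problem. The function names and the top-k selection are hypothetical.

```python
import math
from typing import List, Set, Tuple

Sample = Tuple[Set[str], str]  # (pages viewed before the session, extracted problem)


def mutual_information(samples: List[Sample], page: str, problem: str) -> float:
    """Mutual information (in nats) between page view and problem occurrence."""
    n = len(samples)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for pages, raised in samples:
        counts[(int(page in pages), int(raised == problem))] += 1
    mi = 0.0
    for (x, y), count in counts.items():
        if count == 0:
            continue  # a zero-probability cell contributes nothing
        p_xy = count / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n
        p_y = (counts[(0, y)] + counts[(1, y)]) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def build_page_dictionary(samples: List[Sample], top_k: int = 1) -> Set[str]:
    """Merge the top_k pages by mutual information for every problem."""
    all_pages = sorted(set().union(*(pages for pages, _ in samples)))
    problems = sorted({problem for _, problem in samples})
    dictionary: Set[str] = set()
    for problem in problems:
        ranked = sorted(
            all_pages,
            key=lambda page: mutual_information(samples, page, problem),
            reverse=True,
        )
        dictionary.update(ranked[:top_k])
    return dictionary


samples = [
    ({"home", "refund"}, "how to refund"),
    ({"home", "refund"}, "how to refund"),
    ({"home", "coupon"}, "coupon invalid"),
    ({"home", "coupon"}, "coupon invalid"),
    ({"home", "login"}, "reset password"),
    ({"home", "login"}, "reset password"),
]
page_dictionary = build_page_dictionary(samples, top_k=1)
```

A page that every user visits, such as `home` in this toy data, carries no information about any problem and is never selected. Note that mutual information does not distinguish positive from negative association, so a practical selection step may need an additional check that the page actually co-occurs with the problem. The page keyword dictionary can be built the same way with keywords in place of pages.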

In a second aspect, an embodiment of the present application further provides a problem prediction method, including:

acquiring an interaction behavior sequence of at least one buyer corresponding to a store within a current preset time, wherein the interaction behavior sequence includes a sequence of browsed pages;

predicting, according to the interaction behavior sequence and based on a problem classification model, the type of problem encountered by each buyer, and predicting the problem encountered by each buyer based on the problem prediction model corresponding to the type; and

determining, according to the predicted problems, a solution to push to the seller corresponding to the store.

In a third aspect, an embodiment of the present application provides an electronic device, including:

at least one processor; and

a memory communicatively connected to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the electronic device to perform the method according to any one of the above aspects.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method according to any one of the above aspects.

In a fifth aspect, an embodiment of the present application provides a computer program product including a computer program which, when executed by a processor, implements the method according to any one of the above aspects.

With the problem prediction method, device, storage medium and program product provided by the embodiments of the present application, an interaction behavior sequence of a user within a current preset time is acquired, the interaction behavior sequence including a sequence of browsed pages; the type of problem encountered by the user is predicted based on a problem classification model according to the interaction behavior sequence, and the problem encountered by the user is predicted based on the problem prediction model corresponding to the type; and a solution to push to the user is determined according to the predicted problem. The present application classifies the page sequence that the user browses in an application within the preset time, determines the type corresponding to the page sequence, and then selects a model matching the type to predict the problem. It can thus accurately predict the problem the user may currently have and discover it in time, which improves the accuracy, timeliness and effectiveness of problem prediction. A solution can also be recommended to the user so that the problem is solved promptly, improving the user's service experience.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.

FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of the present application;

FIG. 2 is a schematic diagram of an interface displayed by a terminal device according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a problem prediction method provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a stream computing framework provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a process for determining the problem classification according to an embodiment of the present application;

FIG. 6 is a schematic diagram of training a deep interest evolution network model provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of pre-training a natural language model provided by an embodiment of the present application;

FIG. 8 is a flowchart of determining a dictionary based on mutual information provided by an embodiment of the present application;

FIG. 9 is an online computation flowchart of a problem prediction method provided by an embodiment of the present application;

FIG. 10 is a schematic flowchart of another problem prediction method provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a problem prediction apparatus provided by an embodiment of the present application;

FIG. 12 is a schematic structural diagram of another problem prediction apparatus provided by an embodiment of the present application;

FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concepts of the present application in any way, but to explain those concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application.

First, the terms used in the present application are explained:

Stream computing: a computing architecture that acquires massive data from different data sources in real time and extracts valuable information through real-time analysis and processing. It analyzes large-scale, continuously changing data flows in real time to capture potentially useful information and sends the results to the next computing node.

Stream computing engine: a real-time computing engine, built on an open-source distributed computing framework, that implements stream computing.

Recommendation system: a system in which an e-commerce website provides customers with product information and suggestions, helps users decide what to buy, and simulates a salesperson guiding the customer through the purchase process. For example, it provides a user with a ranked, personalized list of recommended items based on the user's historical preferences and constraints; that is, recommendation results can be generated from user preferences, item features, user-item transactions and other environmental factors (such as time, season and location). Recommended items may include movies, books, restaurants, news items and so on. In the present application, the personalized items may be problems in the customer service system, which may also be called intents or scenarios.

C-end users: in e-commerce scenarios, this usually refers to consumers or buyers.

Through development in recent years, intelligent customer service robots have acquired knowledge-base question answering and dialogue capabilities and can answer user questions fairly smoothly and accurately. Intelligent customer service robots that offer a better experience can also make predictions based on the user's question, offering related guesses about what the user may ask.

However, the operations a user performs while using an application leave behavioral traces. Some technologies therefore extract content that may interest the user from a user profile, or from the user logs generated while the user uses the application, and actively recommend different messages to different users. For example, a push system based on crowd tags can actively push different messages, promotional copy and the like to different groups of people.

For example, the push system may select users who meet set conditions by running Structured Query Language (SQL) queries against back-end data tables, or may manually filter target users through combinations of crowd tags from a tag system based on demographics and interest models, and then push messages.

However, this way of actively pushing messages depends on tagging users with crowd tags and attributes. Developing such a crowd tag system takes a long time, and crowd tags supported by third parties are often incompatible and defined inconsistently. Moreover, a tag system based on demographics and interest models consists of relatively long-term attribute tags that do not change much in the short term, so pushing messages to users by tagging has poor accuracy and timeliness.

It should be noted that such push systems mostly perform one-off pushes for the purpose of campaign publicity and promotion. They are not instant push scenarios tailored to intelligent customer service robots, and are hard to adapt to scenarios that require answering or solving a problem immediately. Push timeliness is poor, and the message is not pushed at the moment the problem occurs. Pushes made in this way therefore tend to lag, which greatly reduces the user's willingness to open the pushed message.

However, since the robot is able to predict problems, an intelligent service can acquire user logs in real time, actively predict the user's problem, and promptly send the user a push message; clicking the push message then helps the user solve the problem.

Therefore, an intelligent customer service robot can use stream computing to predict in real time the problems a user may encounter and to push relevant solutions, thereby improving the quality and warmth of the service. For example, while a C-end user is using an electronic product, the system records the Uniform Resource Locator (URL) of each page the user visits. The URL of each page corresponds to one user operation, and sorting the URLs by time yields the user's behavior trajectory, which reflects the user's immediate intents and needs. Based on this behavior trajectory, together with the monitoring and prediction of abnormal behavior, the problems the user may encounter can be predicted in real time and relevant solutions recommended.

In view of this, an embodiment of the present application provides a problem prediction method that processes the log data of the pages a user browses in an application into a message trigger source for stream computing, uses an algorithm model to analyze the type of problem the user may encounter in the application, and then selects the model corresponding to that type to predict problems and intents in real time.

For example, FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of the present application. The problem prediction method provided by the embodiments of the present application can be applied to the application scenario shown in FIG. 1, which includes a terminal device 101, an analysis platform 102 and a processing platform 103. Specifically, the analysis platform 102 acquires the sequence of pages of an application browsed by the terminal device 101 within 30 minutes, uses the problem classification model to determine the type of the page sequence, selects the problem prediction model corresponding to that type, such as problem prediction model 1, predicts the problem that may be encountered (problem 1), and sends problem 1 to the processing platform 103. The processing platform 103 then determines, according to problem 1, the solution 1 to be pushed to the user, and sends solution 1 to the terminal device 101 for display so that the user can view it.

It can be understood that, at preset intervals, the processing platform 103 may adjust the solution corresponding to problem 1 based on user feedback, so that the solution better meets the user's needs and improves the user experience.

The analysis platform 102 holds massive data. The data can be classified, and a problem prediction model can be determined and saved for each type for use in problem prediction, so that the method is applicable to a variety of scenarios. Taking the case in which the analysis platform 102 is an e-commerce platform as an example, the division can follow the types of the e-commerce platform, each type having a corresponding problem prediction model. If there are three types, they correspond to problem prediction model 1, problem prediction model 2 and problem prediction model 3, and each prediction model can predict one or more problems of the corresponding type. The embodiments of the present application do not specifically limit the classification categories or the kinds and number of the corresponding problem prediction models; the above is only an example.

It should be noted that the steps described above, namely using the problem classification model to determine the type of the page sequence, calling the problem prediction model corresponding to that type to predict the problem encountered by the user, and determining the solution to push to the user according to the problem, are only an example. The processing may also be performed by a single platform such as the cloud, or split across more platforms and more modules; the embodiments of the present application do not specifically limit this.

Optionally, the log corresponding to the pages of an application browsed by the terminal device 101 within 30 minutes can be obtained as follows. FIG. 2 is a schematic diagram of an interface displayed by a terminal device according to an embodiment of the present application. As shown in FIG. 2, the terminal device can obtain browsing log data from the application interface opened by the user and the touch operations on that interface. For example, in an opened e-commerce interface, the user taps the "My" button to jump to the personal user interface, then taps the "Refund/After-sales" button under My Orders to browse refund and after-sales data, and the terminal device obtains the browsing log data accordingly.

Further, after obtaining the browsing log data, the terminal device can send the data to a log storage module, which may be any device or platform capable of storing log data, such as a log service data center or a data warehouse. Based on the solution provided in the present application, stream computing or similar means can be used to convert the log data into streaming data in real time and start the downstream proactive service process. The delay from persistent storage in the log service data center or data warehouse to the triggering of stream computing in this solution can be kept at the level of seconds.

Optionally, the proactive service process can determine the problem the user may encounter, such as "the user needs to understand the refund or after-sales process", and then push the corresponding solution, such as "refund process/after-sales process", to the user, helping the user solve the problem in time.

Therefore, the problem prediction method provided by the embodiments of the present application can actively predict the problem a user may currently have without the user asking the customer service robot, which saves time and allows user problems to be discovered promptly. When predicting the problem, an algorithm model determines the type of problem the user may encounter and the corresponding model is then selected based on that type, which improves the accuracy and timeliness of problem prediction. The corresponding solution can also be pushed to the user proactively, providing a companion-style service and improving the user's service experience.

The technical solutions of the present application are described in detail below through specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below with reference to the accompanying drawings.

For example, FIG. 3 is a schematic flowchart of a problem prediction method provided by an embodiment of the present application. This embodiment can be applied to any apparatus capable of data processing, such as a server. As shown in FIG. 3, the method may include:

S301、获取用户在当前预设时间内的交互行为序列；其中，所述交互行为序列包括浏览的页面序列。S301. Acquire a sequence of interactive behaviors of a user within a current preset time; wherein, the sequence of interactive behaviors includes a sequence of browsed pages.

本申请实施例中，所述交互行为序列用于体现用户与终端设备中应用程序的交互行为，包括浏览的页面序列，所述页面序列包括用户在当前预设时间内浏览的多个页面的信息构成的序列，所述页面的信息包括页面名称和/或页面URL；页面名称例如可以包括订单详情页、帮助、商品详情页等。In this embodiment of the present application, the interaction behavior sequence is used to reflect the interaction behavior between the user and the application program in the terminal device, including a browsed page sequence, and the page sequence includes information of multiple pages browsed by the user within the current preset time. The information of the page includes a page name and/or a page URL; the page name may include, for example, an order details page, a help page, a product details page, and the like.

Optionally, in this step, the user's interaction behavior sequence within the current preset time may be acquired in real time. The preset time is the time set for a preset system to process collected data in real time. The preset system may be a data processing system deployed in the cloud, so that the user's browsing log data can be processed immediately, which improves prediction accuracy. For example, the preset time may be 30 minutes.

The interaction behavior sequence may be a page sequence of the user that falls within the current preset time and meets a preset requirement. The preset requirement may be a preset number of pages, a specified page, or another requirement, which is not specifically limited in the embodiments of the present application.

Exemplarily, in the application scenario of FIG. 1, the page sequence corresponding to 30 real-time shopping pages of an application browsed on the terminal device 101 within the current 30 minutes, that is, 30 URL records, may be acquired.
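
Step S301 can be illustrated with a minimal Python sketch. It assumes page views arrive as in-memory records and uses the 30-minute window and 30-page limit from the example; the names `PageView` and `recent_page_sequence` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PageView:
    user_id: str
    url: str
    timestamp: float  # seconds since an arbitrary epoch

def recent_page_sequence(views, user_id, now, window_seconds=30 * 60, max_pages=30):
    """Return the user's page URLs inside the time window, oldest first.

    Keeps at most `max_pages` of the most recent pages (an assumed form of
    the "preset number" requirement).
    """
    in_window = [
        v for v in views
        if v.user_id == user_id and now - window_seconds <= v.timestamp <= now
    ]
    in_window.sort(key=lambda v: v.timestamp)
    return [v.url for v in in_window[-max_pages:]]

views = [
    PageView("u1", "/order/detail", 1000.0),
    PageView("u1", "/help", 2500.0),
    PageView("u2", "/item/42", 2600.0),
    PageView("u1", "/refund", 2700.0),
]
sequence = recent_page_sequence(views, "u1", now=2800.0)
```

A production system would read these records from a streaming source rather than a list; the filtering logic would stay the same.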

S302. According to the interaction behavior sequence, predict the type of problem encountered by the user based on a problem classification model, and predict the problem encountered by the user based on the problem prediction model corresponding to the type.

In the embodiments of the present application, problem types may be classified according to the nature of the input data. For example, when the input data is only an interaction behavior sequence, a model that processes only sequences is sufficient; when the input data is an interaction behavior sequence together with other information such as order status, a model that can process both sequences and other information is needed.

Specifically, the types include at least the following two: predicting the problem based only on the interaction behavior sequence, and predicting the problem based on the interaction behavior sequence and other information. The corresponding model is called according to the type to predict the problem encountered by the user. The problems may include rights protection problems, product recommendations, logistics problems, store evaluation information, and the like, which are not specifically limited in the embodiments of the present application.

Exemplarily, the present application may call the corresponding model to predict the problem encountered by the user based only on the interaction behavior sequence, for example based only on the page sequence of an application that the user browsed within a preset time period. Alternatively, the corresponding model may be called based on the interaction behavior sequence and other information. The other information may include order status, logistics status, selectable problem scenarios, and the like, which are not specifically limited in the embodiments of the present application. The more kinds and the more content of other information are included, the more accurate the problem prediction. For example, the page browsing data of an application browsed by the user within a preset time period, the order status, the logistics status, and the selectable problem scenarios are used as model input; the corresponding model is then called to obtain the probability that the user selects each problem, that is, to predict the problem encountered by the user.

Optionally, problem types may also be classified according to the nature of the output data. For output data with little impact on the user, such as data concerning user preferences, a model with moderate accuracy but high speed may be used; for output data with greater impact on the user, such as data involving transactions, a model with higher accuracy is used. Alternatively, classification may be based on the amount of computation needed to obtain the output data: if the computation is heavy, a model with higher computational precision may be used; otherwise, a model with moderate computational precision is used, which reduces the load on the preset system.

It should be noted that the embodiments of the present application do not limit the specific models used as the problem classification model and the problem prediction model. The two models only need to satisfy the following condition: after the problem classification model classifies the interaction behavior sequence, the problem prediction model corresponding to the resulting type can be called to predict the problems the user may encounter.
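
The classify-then-dispatch structure of step S302 can be sketched as follows. This is an illustration only: the classifier is replaced by a trivial rule on whether other information is present, and both predictors are placeholders returning fixed scores. All function names, type labels, and problem identifiers are assumptions.

```python
from typing import Callable, Dict, List, Optional

Predictor = Callable[[List[str], Optional[dict]], Dict[str, float]]

def sequence_only_predictor(pages, extra=None):
    # Placeholder for a model that consumes only the page sequence.
    return {"how_to_use_coupon": 0.7, "change_address": 0.3}

def sequence_plus_info_predictor(pages, extra=None):
    # Placeholder for a model that also uses order or logistics status.
    if extra and extra.get("order_status") == "shipped":
        return {"track_logistics": 0.8, "apply_refund": 0.2}
    return {"apply_refund": 0.6, "track_logistics": 0.4}

# One prediction model registered per problem type.
PREDICTORS: Dict[str, Predictor] = {
    "sequence_only": sequence_only_predictor,
    "sequence_and_info": sequence_plus_info_predictor,
}

def classify(pages, extra):
    # Stand-in for the problem classification model.
    return "sequence_and_info" if extra else "sequence_only"

def predict_problem(pages, extra=None):
    """Classify first, then call the predictor registered for that type."""
    problem_type = classify(pages, extra)
    scores = PREDICTORS[problem_type](pages, extra)
    best = max(scores, key=scores.get)
    return problem_type, best
```

Keeping the predictors in a registry keyed by type means a new problem type only needs a new entry, which matches the requirement that any classifier and predictor pair may be used as long as the type selects the predictor.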

S303. Determine, according to the predicted problem, a solution to be pushed to the user.

Specifically, according to the predicted problem, the solution for the user may be obtained by matching a corresponding message template, and the solution is pushed to the user. In the embodiments of the present application, the solution may also be obtained in other ways, which is not specifically limited.

In this step, a solution is one that is proactively pushed to the user for a predicted problem the user may encounter. Solutions may be stored in a lookup table in advance and retrieved directly when needed. Each problem has a corresponding solution, which can be found in the lookup table by the ID of the problem.

Optionally, the solution may be text copy, a link, a card, or the like, and may be pushed to the user through an in-app message, an SMS message, a phone call, or the like. For example, a refund link may be recommended to the user in the form of a message.
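
The lookup of a solution by problem ID in step S303 might look like the sketch below. The table contents, the `Solution` fields, and the example URL are invented for illustration; the application only states that solutions are stored in a lookup table keyed by problem ID.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Solution:
    kind: str     # "copy", "link", or "card"
    content: str
    channel: str  # "in_app_message", "sms", or "phone"

# Hypothetical lookup table: problem ID -> pre-stored solution.
SOLUTION_TABLE = {
    "refund_request": Solution("link", "https://example.com/refund", "in_app_message"),
    "track_logistics": Solution("card", "Logistics status card", "in_app_message"),
}

def solution_for(problem_id: str) -> Optional[Solution]:
    """Return the stored solution for a problem ID, or None if absent."""
    return SOLUTION_TABLE.get(problem_id)
```

Returning `None` for an unknown ID leaves it to the caller to decide whether to skip the push or fall back to another way of obtaining a solution.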

Exemplarily, take a travel platform. Based on the page information of the platform's application browsed by the user, if the corresponding problem prediction model predicts that the user's problem is a rights protection problem, the rights protection process is proactively recommended on the display interface of the user's terminal device. If the predicted problem is a travel route selection problem, the current season's travel routes, travel agencies, travel guides, and the like are proactively recommended. If the predicted problem is a ticketing problem, the ticket purchase, ticket change, or ticket refund process is proactively recommended.

Take a food delivery platform as another example. Based on the page information of the platform's application browsed by the user, if the corresponding problem prediction model predicts that the user's problem is a meal ordering problem, information about takeaway options the user may choose is proactively pushed to the display interface of the user's terminal device. Taking an e-commerce platform as an example, based on statistics of the pages of the platform's application browsed by the user, real-time product reviews, service data, and the like, if the corresponding problem prediction model predicts that the user's problem is a shopping problem, information about stores and merchants the user may choose is proactively pushed to the display interface of the user's terminal device.

Therefore, the embodiments of the present application can acquire the sequence of pages browsed by the user within the current preset time, predict the type of problem encountered by the user based on the problem classification model, and proactively predict the problem based on the problem prediction model corresponding to that type. The solution to be pushed to the user is then determined according to the predicted problem. By classifying the page sequence of the application browsed by the user within the preset time, determining the type corresponding to the page sequence, and selecting a model that matches the type, the present application can accurately predict the problems the user may currently have and discover them promptly. This improves the accuracy, real-time performance, and effectiveness of problem prediction. Recommending a solution to the user allows the problem to be solved at the earliest opportunity, which improves the user's service experience.

In view of the above, the problem prediction method provided by the present application may be implemented on a stream computing framework. FIG. 4 is a schematic structural diagram of a stream computing framework provided by an embodiment of the present application. As shown in FIG. 4, the stream computing framework is a layered user intent prediction framework that can predict the problems a user may encounter from the user's browsing log data, improving the accuracy of problem prediction. Specifically, the framework includes two components. The first is a stream computing message trigger source, composed of a stream computing user-defined function module, an algorithm model HTTP online service module (algorithm service for short), and a rule logic library. The second is a message consumption prediction component, composed of a recall module (Matching), a ranking module (Ranking), a fatigue control module, a push control module, and other modules. The stream computing message trigger source integrates middleware such as a stream computing engine and message middleware (MetaQ). Message consumption prediction calls various algorithm models layer by layer, for example calling algorithm models from a recall algorithm model library and a ranking algorithm model library to predict the problems the user may encounter.

Exemplarily, the stream computing engine processes the logs of the pages browsed by the user into streaming data in the form of key-value pairs. The stream computing user-defined function module then calls an algorithm in the algorithm model HTTP online service module. Using the key pages and keywords obtained through mutual information computation, together with the rule logic library, the large volume of streaming data can be filtered. The filtered streaming data is placed into the message middleware as messages for message consumption prediction.

Further, during message consumption prediction, the message passes through the recall module and the ranking module in turn to predict the problems the user may encounter, and the push control module then feeds the problems back to an offline data computing (scheduling) service. The feedback also includes a log table and a push table: the log table stores user logs, and the push table is used to update the mutual information. The mutual information measures the amount of information that "the appearance of a keyword or key page" contributes to "the appearance of a knowledge point", where a "knowledge point" is a problem encountered by the user. In this way, a closed loop is formed.
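
The application does not give the formula used for mutual information. As one plausible reading, the sketch below computes the standard mutual information between two binary events, "keyword or key page appears" and "knowledge point appears", from co-occurrence counts. The function name and the count layout are assumptions.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between two binary events.

    n11: keyword present and problem present
    n10: keyword present, problem absent
    n01: keyword absent, problem present
    n00: both absent
    """
    total = n11 + n10 + n01 + n00
    # Each cell: (joint count, keyword marginal count, problem marginal count).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, keyword_marginal, problem_marginal in cells:
        if joint == 0:
            continue  # a zero cell contributes nothing (0 * log 0 := 0)
        mi += (joint / total) * math.log2(
            joint * total / (keyword_marginal * problem_marginal)
        )
    return mi
```

Under this definition, a keyword that appears independently of the problem scores 0, and a keyword that perfectly co-occurs with a problem seen in half of the sessions scores 1 bit, so higher-scoring keywords and pages are the more useful filters.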

The purpose of rate limiting is to reduce message backlog: when the message volume exceeds a threshold, some messages may be discarded. The fatigue control module sets the number of pushes to a user, so as to reduce the user annoyance caused by too many pushes.
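
A minimal sketch of both controls follows. It assumes that fatigue control caps pushes per user within a sliding time window and that rate limiting drops whatever exceeds the threshold; the application specifies neither the window nor which messages are dropped, so both choices, and the class and function names, are assumptions.

```python
from collections import defaultdict, deque

class FatigueController:
    """Allow at most `max_pushes` pushes per user within `window_seconds`."""

    def __init__(self, max_pushes, window_seconds):
        self.max_pushes = max_pushes
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user_id -> timestamps of recent pushes

    def allow(self, user_id, now):
        history = self._history[user_id]
        # Forget pushes that have left the sliding window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_pushes:
            return False
        history.append(now)
        return True

def apply_rate_limit(messages, threshold):
    """Keep the first `threshold` messages and drop the overflow."""
    return messages[:threshold]

controller = FatigueController(max_pushes=2, window_seconds=3600)
decisions = [controller.allow("u1", t) for t in (0, 10, 20, 3600)]
```

The third push is refused because two pushes already fall inside the hour; the fourth is allowed because the first push has aged out of the window.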

In the embodiments of the present application, a feature vector may refer to the two-dimensional vector obtained after vectorization, and may also be called an embedding representation vector. Optionally, a word2vec model may be used to vectorize the interaction behavior sequence. Exemplarily, the user's browsing log may be obtained in real time by connecting to an application, and the user's serialized page browsing path (that is, the interaction behavior sequence), such as page 1 to page N, is determined. In the embedding representation stage, the serialized page browsing path is encoded into a two-dimensional vector, and problem classification is then performed on that vector.

In this step, the problem classification model classifies the problems encountered by the user, predicting whether the current user has one or more of three types of problems: account, transaction, and marketing. The problem is labeled, that is, the user's problem is tagged, so that the corresponding problem prediction model can be called to predict the user's problem based on the interaction behavior sequence. The problem prediction models include problem prediction models based on a language model, such as a sequence model, and problem prediction models based on a deep learning click-through rate (Click Through Rate, CTR) model, such as the Deep Interest Evolution Network (DIEN) model. The sequence model handles problem prediction under the marketing and account labels. The DIEN model handles problem prediction under the transaction label, that is, it re-ranks the problems encountered by the user.

The CTR model is introduced in the embodiments of the present application to convert problem prediction into a matching problem between a user (User) and a problem (item), that is, judging whether the current user will click (1) or will not click (0) on a problem, which is then solved through 0/1 scoring. Specifically, the feature vector input to the CTR model includes the user's page browsing behavior sequence, discrete features, and continuous features. Discrete features include, for example, order status, logistics status, service status, and problem ID; continuous features include, for example, order price and item click rate. Inputting the feature vector into the CTR model yields the model's output, the "predicted click rate", which is a score between 0 and 1 indicating whether the user will click on the item, where a click corresponds to 1 and no click corresponds to 0. The problems with high predicted scores can then be selected as the problems to push to the user first.
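
To make the 0-to-1 "predicted click rate" concrete, the sketch below scores user and problem pairs with a plain logistic function over named features. This is a deliberately simple stand-in, not the DIEN model named in the application; the feature names and weights are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predicted_click_rate(features, weights, bias=0.0):
    """Logistic score in (0, 1) from a dict of feature name -> value.

    Discrete features are assumed to be one-hot encoded (value 1.0),
    continuous features to be pre-normalized.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)

def rank_problems(candidates, weights):
    """Score each candidate problem and sort by predicted click rate, highest first."""
    scored = [(pid, predicted_click_rate(f, weights)) for pid, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical learned weights.
weights = {
    "order_status=shipped": 2.0,
    "problem=track_logistics": 1.0,
    "problem=apply_refund": -1.0,
}
candidates = {
    "apply_refund": {"order_status=shipped": 1.0, "problem=apply_refund": 1.0},
    "track_logistics": {"order_status=shipped": 1.0, "problem=track_logistics": 1.0},
}
ranked = rank_problems(candidates, weights)
```

With these weights, a shipped order ranks the logistics question above the refund question, which is the kind of ordering the push step would then consume.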

Because the data takes different forms under different problem labels, different models are used for prediction. If the type is marketing or account, the problem encountered by the user is predicted based on a sequence model. The sequence model may be a natural language model or any other model capable of processing sequences, which is not specifically limited in the embodiments of the present application; for example, the sequence model may be a Bert-MM-1 model. If the type is transaction, the problem encountered by the user is predicted based on the Deep Interest Evolution Network model.

Here, the problem types predicted based only on the interaction behavior sequence are the marketing and account types, and the problem type predicted based on the interaction behavior sequence and other information is the transaction type. The marketing type covers problems users encounter when purchasing goods, including how promotional activities work and how to claim red envelopes and coupons. The account type covers problems users encounter when modifying personal information, including changing the delivery address and changing the mobile phone number bound to the account. The transaction type covers problems users encounter when conducting transactions, including how to apply for a refund, how to check logistics status, and how to check order status. The embodiments of the present application do not specifically limit the number of problems included in each type.

Specifically, the problem classification model vectorizes the interaction behavior sequence, that is, converts the user's interaction behavior sequence into a two-dimensional vector encoding the page browsing path, and inputs the two-dimensional vector into a corresponding convolutional neural network (Convolutional Neural Network, CNN). The CNN extracts the implicit patterns in the sequence and outputs the classification of the problem as the prediction result. Then, based on the type of the problem, the corresponding problem prediction model is selected, and the interaction behavior sequence is used as the model input to obtain the probability that the user selects each problem. For example, FIG. 5 is a schematic diagram of a process for determining the problem classification provided by an embodiment of the present application. As shown in FIG. 5, the CNN extracts the implicit patterns of the serialized embedding representation vectors through convolution, max pooling, a fully connected neural network, and a classifier, yielding the classification of the problem as the prediction result.
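
The convolution, max pooling, fully connected, and classifier stages of FIG. 5 can be traced in a dependency-free sketch. The weights below are arbitrary illustrative numbers rather than trained parameters, the ReLU activation and softmax classifier are assumptions, and a real implementation would use a deep learning library.

```python
import math

def conv1d(embeddings, kernel):
    """Slide a k x d kernel over an n x d sequence; return n - k + 1 ReLU activations."""
    k = len(kernel)
    activations = []
    for start in range(len(embeddings) - k + 1):
        total = 0.0
        for offset in range(k):
            for a, b in zip(embeddings[start + offset], kernel[offset]):
                total += a * b
        activations.append(max(0.0, total))
    return activations

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def classify(embeddings, kernels, fc_weights, fc_bias):
    """Convolution -> max-over-time pooling -> fully connected layer -> softmax."""
    pooled = [max(conv1d(embeddings, kernel)) for kernel in kernels]
    logits = [
        sum(w * p for w, p in zip(row, pooled)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(logits)

# A browsing path of 4 pages, each embedded in 2 dimensions.
path = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
fc_weights = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.1]]  # one row per class
fc_bias = [0.0, 0.0, 0.0]
probabilities = classify(path, kernels, fc_weights, fc_bias)  # account, transaction, marketing
```

Max-over-time pooling reduces each filter to a single value regardless of path length, which is why browsing paths of different lengths can share one classifier.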

Optionally, the process of vectorizing the interaction behavior sequence may be unsupervised. Specifically, during training of the encoding model, "semantic information" may be mined without supervision from the page browsing paths of a very large number of users and saved as the encoding model. The encoding model can be understood as a word2vec model: the page browsing paths and the corresponding two-dimensional vectors are input into the word2vec model for training to obtain the trained word2vec model.

Therefore, because the data corresponding to interaction behavior sequences takes different forms, the embodiments of the present application can call the problem prediction model that corresponds to the type of problem encountered by the user and use the interaction behavior sequence to predict the problem. Each data type has its own processing approach, so that the user's problem can be predicted from different kinds of data, which improves the flexibility of problem prediction scenarios.

Optionally, historical session data between historical users and an intelligent customer service may also be acquired, and the corresponding problems are extracted from the historical session data.

In the embodiments of the present application, a historical user is a user who has browsing records in the application within a certain period and has actively consulted about a problem, such as user 1, who browsed the refund page of an application and asked about refunds. Historical session data is the session data of a historical user's exchange with the intelligent customer service, such as the series of messages in which user 1 asked customer service about the refund process for an item. The corresponding problem can be extracted from the historical session data using a natural language processing model.

Specifically, the embodiments of the present application construct the training data set from the historical interaction behavior sequence of a historical user preceding the historical session data together with the extracted problem. The extracted problem is the ground truth, that is, the label, for the training data, and is used to improve the accuracy of training.

Exemplarily, the sequence model may be any model capable of processing sequence data, such as a natural language model. The training set for the sequence model may be obtained as follows: the interaction behavior sequences of at least one user in an application's interface over a period of time are acquired as feature data, and the corresponding problems are extracted from each user's session data with the intelligent customer service during that period as the labels of the feature data. The feature data and labels form the training set of the sequence model. The sequence model is then trained on this training set so that its output approaches the extracted problems, yielding the trained sequence model.
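
The pairing of a pre-consultation browsing sequence with the extracted problem can be sketched as below. The tuple layouts, the 30-minute lookback, and the function name `build_training_set` are assumptions; the extraction of the problem label from the conversation is taken as already done.

```python
def build_training_set(page_views, consultations, lookback_seconds=30 * 60):
    """Pair each consultation with the pages the user browsed just before it.

    page_views: list of (user_id, timestamp, url)
    consultations: list of (user_id, timestamp, problem_label), where the label
        has already been extracted from the conversation
    Returns a list of (page_sequence, problem_label) examples.
    """
    examples = []
    for user_id, asked_at, problem in consultations:
        history = sorted(
            (ts, url)
            for uid, ts, url in page_views
            if uid == user_id and asked_at - lookback_seconds <= ts < asked_at
        )
        if history:  # skip consultations with no preceding browsing
            examples.append(([url for _, url in history], problem))
    return examples

page_views = [
    ("u1", 100, "/order"),
    ("u1", 200, "/refund"),
    ("u1", 900, "/home"),
    ("u2", 150, "/coupon"),
]
consultations = [("u1", 300, "how_to_refund"), ("u2", 100, "coupon_rules")]
training_set = build_training_set(page_views, consultations)
```

Only pages viewed strictly before the consultation are used as features, so the label never leaks into its own input.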

For most transaction scenarios, certain "feature factors", such as order status and logistics status, must be considered in model training, and the sequence model becomes unusable. The present application therefore adopts a deep learning CTR model, such as the DIEN model, for the transaction type. The DIEN model is trained with supervision on labeled data and, once trained, is used in the broad "transaction" scenario to predict user problems in combination with information such as discrete order information. The DIEN model is a prediction model that solves the prediction problem end to end.

Optionally, the transaction information may include order information, logistics information, and the like. This information may correspond to discrete features or to continuous features; for example, status information is a discrete feature and price information is a continuous feature. FIG. 6 is a schematic diagram of training a Deep Interest Evolution Network model provided by an embodiment of the present application. As shown in FIG. 6, in DIEN model training, the historical interaction behavior sequence serves as one input; price information serves as another input, for example discrete features such as order status and logistics status or continuous features such as order price; and the corresponding problem scenario serves as a further input. Inputting these data into the DIEN model for training yields the trained DIEN model. The output of the DIEN model may include the click probability for each problem scenario, which is used to determine a likelihood measure for the problem encountered by the user. Restricting the model output to the N values with the highest click probability can improve the accuracy of problem prediction. The likelihood measure takes a value between 0 and 1, and N is a positive integer greater than 1.
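
Selecting the N problem scenarios with the highest click probability is a small step that can be shown directly. The probabilities below are made-up values standing in for model output, and `top_n_problems` is a hypothetical helper name.

```python
import heapq

def top_n_problems(click_probabilities, n):
    """Return the n (problem, probability) pairs with the highest click probability."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for problem, probability in click_probabilities.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability for {problem!r} is outside [0, 1]")
    return heapq.nlargest(n, click_probabilities.items(), key=lambda item: item[1])

scores = {
    "apply_refund": 0.62,
    "track_logistics": 0.81,
    "order_status": 0.35,
    "change_address": 0.05,
}
best_two = top_n_problems(scores, 2)
```

`heapq.nlargest` returns the pairs already sorted from highest to lowest, so the result can be consumed in push order.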

Therefore, the embodiments of the present application can construct the training data set from historical session data with the intelligent customer service, so that the model is trained on problems users actually asked about, which improves the accuracy of model prediction.

In the embodiments of the present application, problem prediction for the marketing and account types can reach high accuracy and precision with pure behavior sequences as the only input, so the sequence model, such as a natural language model, can be used for these types. However, because of the peculiarities of URLs, the historical interaction behavior sequence contains many meaningless words; for example, "www" and "myPage" are each treated as a word. The present application therefore performs page tokenization on the historical interaction behavior sequence to obtain multiple vectors, which are assembled into a training file. The natural language model is pre-trained on this file. The natural language model may be any model that computes the probability of the next token in sequence data, such as a Bert model.

Specifically, the user's interaction behavior sequence within a certain time period is tokenized, that is, the page names and page URLs are split into tokens. For example, when the item of a page is tokenized, the single page item becomes a sequence of multiple "keywords"; keyword-based modeling thus turns a single item into richer "semantic" information. After tokenization, the interaction behavior sequence becomes multiple vectors, and the natural language model is pre-trained on these vectors to obtain an unsupervised natural language model that accepts behavior sequences as input.
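
A simple way to turn a page URL into a keyword sequence is to split on non-alphanumeric characters. The sketch below does this and drops a small set of noise tokens; the stop-token list and the example URL are assumptions, since the application does not say how tokens are filtered.

```python
import re

# Hypothetical set of tokens treated as carrying no page-specific meaning.
STOP_TOKENS = frozenset({"http", "https", "www", "com"})

def tokenize_url(url, stop_tokens=STOP_TOKENS):
    """Split a URL on non-alphanumeric characters and drop noise tokens."""
    raw_tokens = re.split(r"[^A-Za-z0-9]+", url)
    return [t.lower() for t in raw_tokens if t and t.lower() not in stop_tokens]

def tokenize_sequence(urls):
    """Flatten a page sequence into one keyword sequence, preserving order."""
    return [token for url in urls for token in tokenize_url(url)]

tokens = tokenize_url("https://www.example.com/user/baseinfo?from=myPage")
```

One page item becomes several keywords here, which is the effect the paragraph describes: the model sees "user" and "baseinfo" rather than one opaque URL string.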

Exemplarily, FIG. 7 is a schematic diagram of pre-training a natural language model provided by an embodiment of the present application. As shown in FIG. 7, the training data set is obtained, and the historical interaction behavior sequences and the extracted problems in the training data set are tokenized. Tokenizing a historical interaction behavior sequence may mean extracting tokens from the URLs, and tokenizing an extracted problem may mean extracting tokens from the problem corresponding to those URLs. Taking the first example, the English text "user baseinfosetrate myrate market app mtp app onion note weeex market app ikc redirect indexnewmycenter" consists of the tokens extracted from the URLs, and the Chinese text "怎么账号是否遗漏" (roughly "how", "account", "whether", "missing") consists of the tokens extracted from the corresponding problem. Together they form a token sequence. The token sequence is then input into the natural language model, which is pre-trained with next-sentence prediction as the training objective; that is, processing the input token sequences yields the pre-trained natural language model.

In this step, fine-tuning refers to finetune training, which starts from the pre-trained natural language model and uses the historical interaction behavior sequences and the extracted problems to train a new model. The natural language model does not need to be retrained from scratch, so a model with good performance can be obtained in fewer iterations, which improves training efficiency.

Therefore, the embodiments of the present application can use pre-training to make the natural language model better at processing behavior sequences. During pre-training, the interaction sequence and the corresponding problem can be input into the model together, so that the pre-trained model extracts the meaning of the interaction sequence more precisely, which improves the pre-training result.

In the embodiments of the present application, streaming data consists of key-value pairs that the computing engine of the stream computing framework produces from user logs. For example, when a user clicks a button or visits a page in an application, a corresponding log record is written to the log system. The computing engine then processes these records into a key-value pair whose key is the user's identification number (Identity Document, ID) and whose value is the sequence of URLs.
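As a rough illustration of this key-value shaping, the sketch below groups log records by user and orders each user's URLs by time. The record fields (`user_id`, `timestamp`, `url`) are assumptions made for the example; a real stream computing engine would do this with windowed, incremental operators rather than an in-memory sort.

```python
from collections import defaultdict


def logs_to_key_value(log_records):
    """Group log records into {user ID: time-ordered list of URLs}."""
    sequences = defaultdict(list)
    # Sort by timestamp so that each value preserves the browsing order.
    for record in sorted(log_records, key=lambda r: r["timestamp"]):
        sequences[record["user_id"]].append(record["url"])
    return dict(sequences)
```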

Specifically, when processing the log data, the stream computing framework packages the processing logic into a jar package and generates streaming data containing the interaction behavior sequence. The streaming data is then submitted to the stream computing engine for execution in the form of a stream computing user-defined function. Next, an HTTP online service interface of the algorithm model is called. The interface is used to filter the streaming data, that is, to judge on the basis of a dictionary whether the streaming data indicates that the user has a consultation need; the dictionary includes multiple keywords and/or key pages. If there is a consultation need, the streaming data is written to the message queue (MetaQ) as a message. If there is none, the streaming data is discarded, and the judgment is repeated in the same way once the user's interaction behavior sequence for the next preset time has been acquired.

Whether the streaming data indicates a consultation need is determined by checking whether the number of dictionary keywords and/or key pages hit by the streaming data exceeds a preset threshold. The embodiments of the present application do not specifically limit the preset threshold; for example, a consultation need may be identified when the number of keywords in the streaming data is greater than 5 and/or the number of key pages in the streaming data is greater than 1.

Optionally, the interaction behavior sequence may further include a sequence of commodities the user interacted with, button events triggered by the user on a page (for example, clicking the "favorite" button on a page), and the like. The embodiments of the present application do not limit the specific content of the interaction behavior sequence, which can be extended as appropriate.

Therefore, the embodiments of the present application can generate streaming data based on the stream computing framework, filter it with the rule logic library to select the streaming data that meets the requirements, and input the selected streaming data into the problem classification model for processing. This reduces interference from useless data and improves the accuracy of problem classification.

In the embodiments of the present application, the dictionary is determined based on mutual information. Mutual information is a measure of information that expresses how much information about a keyword is contained in the problems encountered by users, that is, how much information a given keyword contributes to the problem encountered by the user. It can be determined by the following formula:

where x_i denotes the i-th keyword among all possible keywords extracted from the URLs; y_i denotes the i-th problem among all possible problems encountered by users; i is the corresponding index; I(Y, X) measures the contribution of keywords to the problems encountered by users; P(X = x_i, Y = y_i) is the probability that X = x_i and Y = y_i occur together in the historical data, the historical data being the URLs obtained over a period of time and the questions extracted for those URLs; P(X = x_i) is the probability that X = x_i occurs in the URLs of the historical data; and P(Y = y_i) is the probability that Y = y_i occurs in the historical data. All of these probabilities can be obtained by counting the historical data. For example, P(X = x_i) can be obtained as the ratio of the number of times the keyword is extracted from URLs in the historical data to the number of tokens that all of the historical data can form.
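The formula itself does not survive in this text. Assuming it follows the standard definition of mutual information between two discrete variables, it can be written with the symbols defined here as shown below. The second index j for the problem is introduced only to make the double sum explicit; the text uses i for both the keyword and the problem.

```latex
I(Y, X) = \sum_{i} \sum_{j} P(X = x_i, Y = y_j)\,
          \log \frac{P(X = x_i, Y = y_j)}{P(X = x_i)\, P(Y = y_j)}
```

A keyword that occurs independently of the problems gives a ratio of 1 inside the logarithm and therefore contributes nothing, which is why frequently visited but uninformative pages score low.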

The dictionary includes a page dictionary and/or a page keyword dictionary. The page dictionary includes at least one page name or URL, and the page keyword dictionary includes keywords taken from page names or URLs.

In the embodiments of the present application, the keywords are obtained by splitting page names or URLs. For example, if the page name is "refund details page", it can be split into the three keywords "refund", "details", and "page". The mutual information of each of the three keywords is then calculated, and the keywords whose mutual information is greater than a set threshold become page keywords in the dictionary.
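The keyword selection can be sketched as follows. The sketch assumes one particular way of estimating mutual information from history: each historical record is a pair (set of keywords seen before the consultation, extracted question), and the score is the mutual information between the indicator "keyword present" and the indicator "question is q". The record layout, function names, and threshold value are illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(records, keyword, question):
    """MI between 'keyword appears' and 'question equals the given one' over the records."""
    n = len(records)
    joint = Counter((keyword in kws, q == question) for kws, q in records)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Marginals are obtained by summing the joint counts.
        p_x = sum(c for (xx, _), c in joint.items() if xx == x) / n
        p_y = sum(c for (_, yy), c in joint.items() if yy == y) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def select_keywords(records, candidates, question, threshold):
    """Keep the candidate keywords whose mutual information exceeds the threshold."""
    return [k for k in candidates if mutual_information(records, k, question) > threshold]


# Hypothetical history: keywords browsed before each consultation, and the question asked.
history = [
    ({"refund", "details"}, "refund_status"),
    ({"refund", "page"}, "refund_status"),
    ({"home", "page"}, "login"),
    ({"home"}, "login"),
]
kept = select_keywords(history, ["refund", "details", "page"], "refund_status", 0.5)
```

In this toy history "refund" co-occurs perfectly with the refund question and is kept, while "page" appears equally often with both questions and is dropped.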

Page names or URLs have a corresponding page dictionary, and the keywords in page names or URLs have a corresponding page keyword dictionary. If the dictionary includes only the page dictionary, filtering the streaming data means determining how many page names or URLs in the streaming data hit page names or URLs in the dictionary. If the dictionary includes only the page keyword dictionary, filtering means determining how many keywords in the page names or URLs of the streaming data hit page keywords in the dictionary. If the dictionary includes both, the streaming data can be filtered based on preset conditions, which include at least one of the following: a page name or URL in the streaming data hits a page name or URL in the page dictionary; a keyword in a page name or URL in the streaming data hits a page keyword in the page keyword dictionary. In other words, the streaming data can be filtered by checking whether any one, or all, of the preset conditions are satisfied.
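A minimal sketch of this dictionary-based filter is given below. It supports a page dictionary only, a page keyword dictionary only, or both, and lets the caller require any or all of the conditions. The default thresholds reuse the example values mentioned earlier (more than 1 key page, more than 5 keywords); the function names and the URL splitting rule are assumptions.

```python
import re


def url_tokens(url):
    """Split a URL or page name into lowercase tokens (assumed splitting rule)."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", url) if t]


def has_consultation_need(urls, page_dict=None, keyword_dict=None,
                          page_threshold=1, keyword_threshold=5,
                          require_all=False):
    """Return True if the browsed URLs hit the dictionaries often enough."""
    checks = []
    if page_dict is not None:
        page_hits = sum(1 for url in urls if url in page_dict)
        checks.append(page_hits > page_threshold)
    if keyword_dict is not None:
        keyword_hits = sum(
            1 for url in urls for tok in url_tokens(url) if tok in keyword_dict
        )
        checks.append(keyword_hits > keyword_threshold)
    if not checks:
        # No dictionary configured: nothing can indicate a consultation need.
        return False
    return all(checks) if require_all else any(checks)
```

Streaming data for which the function returns True would be written to the message queue; the rest would be discarded.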

In this step, because the HTTP online service interface of the algorithm model has to handle an extremely high number of queries per second, the present application filters the streaming data with a rule logic library. The rule logic library consists of rules, and the rules are evaluated with two types of dictionaries: the page dictionary and the page keyword dictionary. Specifically, a rule indicates whether a dictionary is used as a blacklist or as a whitelist. A blacklist means that the pages and/or keywords in the dictionary have low mutual information, and a whitelist means that they have high mutual information.

Optionally, the keywords themselves can also be filtered, because too many keywords increase the workload of filtering the streaming data and reduce the efficiency of the model. Experiments show that reducing the number of keywords to about 1000 works well.

Optionally, if the interaction behavior sequence includes button events triggered by the user on a page, there is a corresponding button event dictionary; for example, the streaming data is filtered by checking whether the touch operations on the pages browsed by the user match button events in the dictionary. Similarly, if the interaction behavior sequence includes other content, such as a sequence of commodities the user interacted with, there is a corresponding dictionary that can be used to filter the streaming data. The embodiments of the present application do not specifically limit the number of rules in the rule logic library; every kind of content in the interaction behavior sequence has a corresponding dictionary and rules for filtering.

Therefore, after the streaming data has been filtered with the dictionaries and the rules in the rule logic library, only the important data is written to the message queue. This lowers the load on the message queue and reduces both back pressure on the stream computing engine and the loss of important data.

Optionally, an embodiment of the present application further provides a specific scheme for determining the dictionary based on mutual information. FIG. 8 is a flowchart of determining a dictionary based on mutual information according to an embodiment of the present application. The process includes:

S801. Acquire historical session data between multiple historical users and the intelligent customer service, and extract the corresponding questions from the historical session data to obtain a question set.

In this step, the question set consists of the problems that the back end extracts from the sessions between historical users and the intelligent customer service. At least one question can be extracted from each session of each historical user, and the question set represents the various problems that users may encounter.

S802. Determine the page set and the page keyword set browsed by the multiple historical users according to the sequences of pages that the historical users browsed before the historical session data.

In this step, the sequences of pages browsed before the historical session data of the multiple historical users are decomposed to determine the page set and the page keyword set browsed by those users. The page set includes at least one page, and the page keyword set at least one page keyword, of the historical users.

S803. For any question in the question set, calculate the mutual information between each page in the page set and the question, and the mutual information between each page keyword in the page keyword set and the question; then, according to the calculated mutual information, select the target pages and target page keywords corresponding to the question from the page set and the page keyword set.

In the embodiments of the present application, the determined target pages contain keywords that can be used to predict user problems. A target page is a page that carries useful information. Pages that correspond to routine browsing behavior, such as the frequently visited "Home" and "My" pages, are not treated as target pages.

Specifically, formula (1) is used to calculate the mutual information between each page in the page set and the question, and the results are sorted by value. Formula (1) is likewise used to calculate the mutual information between each page keyword in the page keyword set and the question, and those results are also sorted. The pages and page keywords ranked in the top N of their respective lists are then taken as the target pages and target page keywords corresponding to the question.

S804. Merge the target pages corresponding to each question in the question set to obtain the page dictionary, and merge the target page keywords corresponding to each question to obtain the page keyword dictionary.

Exemplarily, suppose that 100 questions are extracted from the historical session data, and that 1,000 pages and 10,000 page keywords are determined from the historical browsed page sequences. The contribution of each page to each extracted question, and of each page keyword to each extracted question, is calculated based on mutual information. After the calculation, the top 1000 keywords and the top 100 pages are found for each question; that is, the target pages of the question are its top 100 pages and its target page keywords are its top 1000 keywords. The target pages of all questions are then merged to obtain the page dictionary, and the target page keywords of all questions are merged to obtain the page keyword dictionary.
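The selection and merging in S803 and S804 can be sketched as a top-N pick per question followed by a set union. The input layout (a mapping from each question to the mutual information scores of its candidate pages or keywords) and the function name are assumptions made for the example.

```python
import heapq


def build_dictionary(mi_scores, top_n):
    """Union of the top-N items (pages or keywords) of every question.

    mi_scores maps question -> {item: mutual information with that question}.
    """
    dictionary = set()
    for scores in mi_scores.values():
        # nlargest returns the top_n items ranked by their score.
        dictionary.update(heapq.nlargest(top_n, scores, key=scores.get))
    return dictionary
```

Calling the function once with page scores and once with keyword scores would give the page dictionary and the page keyword dictionary respectively.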

It should be noted that the embodiments of the present application do not specifically limit the number of target pages or target page keywords corresponding to each question, nor the number of extracted questions. The numbers above are only examples.

Therefore, the embodiments of the present application can extract the problems encountered by users from their actual historical session data and, combined with the page sequences browsed before the consultation, determine the contribution of each page and keyword to those problems. The page dictionary and the page keyword dictionary are thus determined precisely based on mutual information, which improves the accuracy of the dictionary and reduces interference from useless data.

In combination with the above embodiments, FIG. 9 is an online computation flowchart of a problem prediction method according to an embodiment of the present application. As shown in FIG. 9, after the stream computing framework stores tasks (streaming data) in the message middleware, the tasks pass in turn through the filter, vectorization, labeling, the sorting module, and the process pool, and the problem encountered by the user is predicted. The filter processes real-time pages, such as activity pages, based on the filtering logic (the rule logic library) and selects keyword items or key pages together with rule parameters. Vectorization is based on a representation learning model: in the embedding stage, the user's serialized page browsing path is encoded as a two-dimensional vector, and the model is fine-tuned with the historical interaction behavior sequences and the extracted questions. The historical interaction behavior sequences and extracted questions are labeled, that is, classified, by a basic three-convolution neural network model (the problem classification model). The sorting module ranks the questions/knowledge (the problems encountered by the user). The process pool handles buffering and triggering, that is, it triggers the message pushed to the user.

The tasks acquired by the stream computing framework come from the log service (Log Service, SLS). The stream computing framework includes a service module, a refined operation module, and an application configuration center. In the message consumption process of the stream computing framework, the output data may be voice, pictures, text, video, and so on. A recommendation form matching the output data type is then chosen to deliver the data to the user; the forms may include intelligent outbound calls, messages, messages/SMS, customer service, and so on. For example, the solution can be delivered to the user by voice through an intelligent outbound phone call.

Optionally, the solution corresponding to a question output by the sorting module can be determined through a preset table; for example, the question can be converted into card content, as shown in Table 1:

Table 1

The question output by the sorting module is not identical to the card content finally pushed to the user terminal. A list of push card content therefore has to be maintained, including the mapping from the problems encountered by users to card IDs. When a problem is predicted, the card ID corresponding to the question ID is looked up in the mapping table, and the back end then retrieves the card content for that card ID and pushes it to the user terminal.
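The two-step lookup can be sketched with plain dictionaries. All IDs and card texts below are hypothetical placeholders, not values from Table 1; in practice the mappings would live in a configuration store maintained by operators.

```python
# Hypothetical mapping from predicted question ID to card ID.
QUESTION_TO_CARD = {
    "q_refund_progress": "card_001",
    "q_account_missing": "card_002",
}

# Hypothetical card content keyed by card ID.
CARD_CONTENT = {
    "card_001": "Check the progress of your refund",
    "card_002": "Find an account that seems to be missing",
}


def card_for_question(question_id):
    """Return the card content to push for a predicted question, or None if unmapped."""
    card_id = QUESTION_TO_CARD.get(question_id)
    if card_id is None:
        return None
    return CARD_CONTENT.get(card_id)
```

Keeping the question-to-card mapping separate from the card content lets the wording of a card change without touching the prediction output.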

Optionally, the method provided by the present application need not be coupled to any particular push system. Once the output question has been determined, the push system sends the message that triggers the push, and the pushed content can take any other form, such as cards or copy. Other designs can also be added to the above table according to actual needs, which is not specifically limited in the embodiments of the present application.

The embodiments of the present application can also present relevant indicators to operators so that they can review the effect of problem prediction and then define refined operation actions. The indicators may include the number of users sent to, the number of users successfully sent to, the number of users reached, the number of effective users, the number of proactive service sessions, the proactive service resolution rate, the proactive service share, the reach rate, the effective rate, the click rate, and so on.

Optionally, the number of users sent to is the number of push targets received by the push system. The number of users successfully sent to is the number of successful solution pushes, that is, users who actually received the pushed content. The number of users reached is the number of users who opened the pushed message and saw the push card, so that the message became read. The number of effective users is the number of users who, having seen the card, performed an effective action such as clicking or replying. The number of proactive service sessions is the number of service sessions created on the intelligent customer service page after users opened the pushed message and saw the push card. The proactive service resolution rate can be the ratio of the number of successfully resolved sessions to the number of resolvable sessions. The number of successfully resolved sessions can be the number of proactive service sessions minus the number of unresolved sessions, where unresolved sessions may include sessions transferred to online agents, sessions transferred to phone agents, unsatisfactory sessions, sessions whose last recommendation was not clicked, sessions whose last turn had no answer, sessions connected directly to a human agent, and so on. The number of resolvable sessions can be the number of proactive service sessions minus the number of sessions connected directly to a human agent. The proactive service share is the ratio of the number of proactive service sessions to the total number of service sessions in the application (both proactive and passive). The reach rate is the ratio of the number of users reached to the number of users successfully sent to. The effective rate is the ratio of the number of effective users to the number of users reached. The click rate is the ratio of the number of users who clicked to the number of users reached.
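The rate definitions above translate directly into a few ratios. The sketch below is a minimal illustration; the parameter and key names are assumptions, and `unresolved` is taken to already include the sessions connected directly to a human agent, as in the definition above.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def push_metrics(sent_ok, reached, effective, clicked,
                 proactive_sessions, unresolved, direct_to_human, total_sessions):
    """Compute the rate indicators from raw counts."""
    resolved = proactive_sessions - unresolved
    resolvable = proactive_sessions - direct_to_human
    return {
        "reach_rate": safe_ratio(reached, sent_ok),
        "effective_rate": safe_ratio(effective, reached),
        "click_rate": safe_ratio(clicked, reached),
        "resolution_rate": safe_ratio(resolved, resolvable),
        "proactive_share": safe_ratio(proactive_sessions, total_sessions),
    }
```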

Optionally, the operation actions may include: setting the type and number of recommended questions; adding, deleting, querying, and modifying recommended questions; controlling push fatigue for the target audience; editing push templates and question copy; screening the push audience; and so on.

Exemplarily, when screening the push audience, groups that are insensitive to pushed messages, groups that respond negatively, blacklisted users, and the like can be filtered out. In other words, messages can be withheld from, or sent less often to, insensitive and negative groups, which improves the user experience.

Optionally, the message push is triggered by the user's browsing behavior on application pages. Because the application is open at that moment, reaching the user by message push ensures a high reach rate and effective rate.

In another possible implementation, the behavior log data can be monitored directly by means of a distributed task system and hand-written program logic, after which the problem prediction method shown in FIG. 3 is performed. The embodiments of the present application do not specifically limit how the log data is acquired, and other frameworks may be used in place of the stream computing framework; any implementation that achieves the same proactive problem prediction falls within the protection scope of the present application.

Therefore, the problem prediction method provided by the embodiments of the present application pushes in a timely manner: it can be triggered by the user's browsing behavior and produces the problem prediction within seconds, which yields a higher reach rate and effective rate. The present application also does not rely on audience labels; instead it relies on multi-layer, mutually combined deep prediction models, which improves prediction accuracy. The input data consists of the user's dynamic behavior data and dynamic order status data, and the output is a problem in the intelligent service scenario, so the method captures the user's immediate needs more precisely and improves the user experience of the intelligent service.

FIG. 10 is a schematic flowchart of another problem prediction method according to an embodiment of the present application. As shown in FIG. 10, a solution can be pushed to the seller according to the buyers' problems. The method includes:

S1001. Acquire the interaction behavior sequence of at least one buyer corresponding to a store within the current preset time, where the interaction behavior sequence includes a sequence of browsed pages.

In the embodiments of the present application, the at least one buyer corresponding to a store may be a buyer who has a purchase record, a review record, or a consultation record in the store.

Optionally, in this step, the interaction behavior sequence of the at least one buyer corresponding to the store within the current preset time can be acquired in real time.

S1002. According to the interaction behavior sequence, predict the type of problem encountered by each buyer based on the problem classification model, and predict the problem encountered by each buyer based on the problem prediction model corresponding to that type.

In the embodiments of the present application, the problem classification model outputs the type of problem encountered by each buyer. The types can be defined as described in step 302, which is not repeated here. They can also be defined by the number of buyers, that is, the volume of customer traffic: when there are many buyers, a model with fast computation can be used, and when there are few, a model with ordinary computation speed can be used, which reduces the computational load on the system. The embodiments of the present application do not specifically limit how the types are defined.

S1003. Determine, according to the predicted problems, the solution to be pushed to the seller corresponding to the store.

In this step, the solution to be pushed to the seller can be selected according to the type and size of the store. For a larger store, the solutions for higher-priority buyers can be pushed first; higher-priority buyers may be, for example, member customers or buyers with good shopping credit. For a smaller store, the solutions for all buyers can be pushed to the store.

It should be noted that the push form and lookup method for the solution are similar to step 303 in the above embodiment; see the description of step 303, which is not repeated here. Optionally, for the same problem, the solution for the buyer and the solution for the seller may differ. For example, if several buyers have refund problems, the following copy can be pushed to the seller: "Several buyers may have refund problems. Click here to view the solution."

Therefore, the embodiments of the present application can take the acquired sequence of pages browsed by at least one buyer within the current preset time, predict the type of problem encountered by each buyer based on the problem classification model, and proactively predict each buyer's problem based on the problem prediction model corresponding to that type. The solution to be pushed to the seller corresponding to the store is then determined from the predicted problems, which improves the efficiency and accuracy with which the seller resolves user problems.

Optionally, predicting the type of problem encountered by each buyer based on the problem classification model according to the interaction behavior sequence, and predicting the problem encountered by each buyer based on the problem prediction model corresponding to that type, includes:

for any buyer, inputting the feature vector corresponding to the interaction behavior sequence into the problem classification model to determine the type of problem encountered by the buyer;

若所述类型为营销类或账户类，则基于序列模型对所述买家的交互行为序列进行处理，得到对应的问题；If the type is a marketing type or an account type, processing the interactive behavior sequence of the buyer based on the sequence model to obtain the corresponding question;

若所述类型为交易类，则基于深度兴趣进化网络模型对所述买家的交互行为序列以及所述买家对应的交易信息进行处理，得到对应的问题。If the type is a transaction type, the interaction behavior sequence of the buyer and the transaction information corresponding to the buyer are processed based on the deep interest evolution network model to obtain corresponding questions.
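The routing described above can be sketched as follows. This is only a minimal illustration, not the implementation of the application: the type labels, the `BuyerContext` container and the three callables (`classify`, `sequence_model`, `dien_model`) are hypothetical stand-ins, and the classifier here receives the raw page sequence rather than a precomputed feature vector.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical labels for the three problem types named in the text.
MARKETING, ACCOUNT, TRANSACTION = "marketing", "account", "transaction"


@dataclass
class BuyerContext:
    """Inputs available for one buyer (names are illustrative assumptions)."""
    buyer_id: str
    page_sequence: List[str]  # browsed pages, oldest first
    transaction_info: Dict[str, str] = field(default_factory=dict)


def predict_problem(
    ctx: BuyerContext,
    classify: Callable[[List[str]], str],
    sequence_model: Callable[[List[str]], str],
    dien_model: Callable[[List[str], Dict[str, str]], str],
) -> str:
    """Classify the problem type, then dispatch to the model for that type."""
    problem_type = classify(ctx.page_sequence)
    if problem_type in (MARKETING, ACCOUNT):
        # Marketing and account problems use the behavior sequence only.
        return sequence_model(ctx.page_sequence)
    if problem_type == TRANSACTION:
        # Transaction problems additionally use the buyer's transaction info.
        return dien_model(ctx.page_sequence, ctx.transaction_info)
    raise ValueError(f"unknown problem type: {problem_type!r}")
```

Passing the models in as callables keeps the dispatch logic independent of how each model is trained or served.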

Optionally, historical conversation data between historical sellers and the intelligent customer service may also be acquired, and the corresponding problems extracted from the historical conversation data;

a training data set for the sequence model is constructed according to the extracted problems and the historical interactive behavior sequences of the historical buyers corresponding to the historical sellers before the historical conversation data, and the sequence model is trained; and/or

a training data set for the deep interest evolution network model is constructed according to the extracted problems and the historical interactive behavior sequences and transaction information of the historical buyers corresponding to the historical sellers before the historical conversation data, and the deep interest evolution network model is trained.
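As a rough illustration of how one training example might be assembled from the data described above, the sketch below pairs the pages browsed shortly before a conversation with the problem extracted from that conversation. The `Event` type, the fixed look-back window `window_seconds` and the function name are assumptions made for this example; the application does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """One logged page view (hypothetical structure)."""
    timestamp: float  # seconds since some epoch
    page: str


def build_training_example(
    events: List[Event],
    session_start: float,
    window_seconds: float,
    problem: str,
) -> Tuple[List[str], str]:
    """Return (pages browsed in the window before the conversation, label).

    Only events inside [session_start - window_seconds, session_start)
    are kept, in chronological order; the extracted problem is the label.
    """
    lower_bound = session_start - window_seconds
    pages = [
        e.page
        for e in sorted(events, key=lambda e: e.timestamp)
        if lower_bound <= e.timestamp < session_start
    ]
    return pages, problem
```

For the deep interest evolution network model, the same example would additionally carry the transaction information that was available before the conversation.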

Optionally, acquiring the interactive behavior sequence of at least one buyer corresponding to the store within the current preset time includes:

generating, through a stream computing framework, streaming data containing the interactive behavior sequence according to the interaction logs of the at least one buyer corresponding to the store within the current preset time;

the dictionary is determined from the historical pages browsed by historical buyers and the corresponding problems;

the rule indicates whether the dictionary is used as a blacklist or a whitelist, and/or how many pages in the page sequence browsed by the buyer must match the dictionary for the screening of the page sequence to be completed.
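The dictionary-and-rule screening described above could look roughly like the following. This is a sketch under stated assumptions: `FilterRule`, its field names and the exact blacklist semantics (drop a sequence once enough pages match) are illustrative choices, not details given in the application.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class FilterRule:
    """A dictionary plus the rule for applying it (hypothetical structure)."""
    dictionary: FrozenSet[str]  # page identifiers or page keywords
    mode: str = "whitelist"     # "whitelist" or "blacklist"
    min_matches: int = 1        # matches needed for the rule to trigger


def passes_filter(page_sequence: List[str], rule: FilterRule) -> bool:
    """Return True if the page sequence should be kept for classification."""
    matches = sum(1 for page in page_sequence if page in rule.dictionary)
    triggered = matches >= rule.min_matches
    if rule.mode == "whitelist":
        return triggered       # keep only sequences that match enough pages
    if rule.mode == "blacklist":
        return not triggered   # drop sequences that match enough pages
    raise ValueError(f"unknown mode: {rule.mode!r}")
```

Sequences that pass the filter would then be placed on the message queue for the problem classification model, so that the model only processes sequences likely to indicate a problem.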

Optionally, the method further includes:

acquiring historical conversation data between multiple historical sellers and the intelligent customer service, and extracting the corresponding problems from the historical conversation data to obtain a problem set;

determining, according to the historical browsed page sequences of the historical buyers of the multiple historical sellers before the historical conversation data, the page set browsed by the historical buyers and the page keyword set;
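Claim 7 of this application describes building the page dictionary by computing the mutual information between each page and each problem and selecting target pages accordingly. A minimal sketch of that idea follows, assuming each historical record is reduced to a pair (set of pages browsed, extracted problem) and treating "page was browsed" and "record has this problem" as two binary variables. The `top_k` cutoff and the restriction to pages that co-occur with the problem are assumptions added for this example.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

Record = Tuple[Set[str], str]  # (pages browsed before the conversation, problem)


def mutual_information(records: List[Record], page: str, problem: str) -> float:
    """Mutual information (in bits) between page presence and problem label."""
    n = len(records)
    if n == 0:
        return 0.0
    joint = Counter((page in pages, label == problem) for pages, label in records)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (a, _), c in joint.items() if a == x) / n
        p_y = sum(c for (_, b), c in joint.items() if b == y) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def build_page_dictionary(records: List[Record], top_k: int = 2) -> Set[str]:
    """Fuse the top-k pages per problem into one page dictionary."""
    dictionary: Set[str] = set()
    for problem in {label for _, label in records}:
        # Mutual information is also high for pages that never occur with a
        # problem, so only pages seen together with the problem are candidates.
        candidates = set().union(
            *(pages for pages, label in records if label == problem)
        )
        ranked = sorted(
            candidates,
            key=lambda pg: mutual_information(records, pg, problem),
            reverse=True,
        )
        dictionary.update(ranked[:top_k])
    return dictionary
```

The page keyword dictionary could be built the same way, with page keywords in place of page identifiers.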

The implementation principles and technical effects of the above embodiments are similar to those of the embodiments of the dependent solutions corresponding to FIG. 3 above, and are not repeated here.

Corresponding to the above problem prediction method, an embodiment of the present application provides a problem prediction apparatus. FIG. 11 is a schematic structural diagram of a problem prediction apparatus provided by an embodiment of the present application. The apparatus includes:

a first acquisition module 1101, configured to acquire the interactive behavior sequence of a user within the current preset time, wherein the interactive behavior sequence includes a browsed page sequence;

a first prediction module 1102, configured to predict, according to the interactive behavior sequence, the type of problem encountered by the user based on the problem classification model, and to predict the problem encountered by the user based on the problem prediction model corresponding to the type;

a first determination module 1103, configured to determine, according to the predicted problem, a solution to be pushed to the user.

The problem prediction apparatus provided by this embodiment of the present application can be used to execute the technical solutions of the embodiments shown in FIG. 1 to FIG. 9 above. Its implementation principles and technical effects are similar and are not repeated here.

Exemplarily, an embodiment of the present application further provides a problem prediction apparatus. FIG. 12 is a schematic structural diagram of another problem prediction apparatus provided by an embodiment of the present application. The apparatus includes:

a second acquisition module 1201, configured to acquire the interactive behavior sequence of at least one buyer corresponding to a store within the current preset time, wherein the interactive behavior sequence includes a browsed page sequence;

a second prediction module 1202, configured to predict, according to the interactive behavior sequence, the type of problem encountered by each buyer based on the problem classification model, and to predict the problem encountered by each buyer based on the problem prediction model corresponding to the type;

a second determination module 1203, configured to determine, according to the predicted problem, a solution to be pushed to the seller corresponding to the store.

The problem prediction apparatus provided by this embodiment of the present application can be used to execute the technical solutions of the embodiments shown in FIG. 1 to FIG. 10 above. Its implementation principles and technical effects are similar and are not repeated here.

FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 13, the electronic device of this embodiment may include:

at least one processor 1301; and

a memory 1302 communicatively connected to the at least one processor;

wherein the memory 1302 stores instructions executable by the at least one processor 1301, and the instructions are executed by the at least one processor 1301 to cause the electronic device to perform the method described in any of the above embodiments.

Optionally, the memory 1302 may be independent of, or integrated with, the processor 1301.

For the implementation principles and technical effects of the electronic device provided by this embodiment, reference may be made to the foregoing embodiments; details are not repeated here.

An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When a processor executes the computer-executable instructions, the method described in any of the foregoing embodiments is implemented.

An embodiment of the present application further provides a computer program product including a computer program. When the computer program is executed by a processor, the method described in any of the foregoing embodiments is implemented.

In the technical solutions of the present application, the collection, storage, use, processing, transmission, provision, disclosure and other handling of user data and other information involved all comply with relevant laws and regulations and do not violate public order and good morals.

In the several embodiments provided by the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in actual implementation: multiple modules may be combined or integrated into another system, and some features may be omitted or not executed.

An integrated module implemented in the form of software functional modules may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present application.

It should be understood that the above processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The memory may include high-speed RAM and may also include non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.

The above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). The processor and the storage medium may also exist as discrete components in an electronic device or a master control device.

It should be noted that, herein, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article or apparatus that includes the element.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A problem prediction method, characterized in that the method comprises:

acquiring an interactive behavior sequence of a user within a current preset time, wherein the interactive behavior sequence includes a browsed page sequence;

predicting, according to the interactive behavior sequence, the type of problem encountered by the user based on a problem classification model, and predicting the problem encountered by the user based on a problem prediction model corresponding to the type;

determining, according to the predicted problem, a solution to be pushed to the user.

2. The method according to claim 1, wherein predicting, according to the interactive behavior sequence, the type of problem encountered by the user based on the problem classification model, and predicting the problem encountered by the user based on the problem prediction model corresponding to the type, comprises:

inputting the feature vector corresponding to the interactive behavior sequence into the problem classification model to determine the type of problem encountered by the user;

if the type is a marketing type or an account type, processing the user's interactive behavior sequence based on a sequence model to obtain the corresponding problem;

if the type is a transaction type, processing the user's interactive behavior sequence and the transaction information corresponding to the user based on a deep interest evolution network model to obtain the corresponding problem.

3. The method according to claim 2, further comprising:

acquiring historical conversation data between historical users and an intelligent customer service, and extracting the corresponding problems from the historical conversation data;

constructing a training data set for the sequence model according to the historical interactive behavior sequences of the historical users before the historical conversation data and the extracted problems, and training the sequence model; and/or

constructing a training data set for the deep interest evolution network model according to the historical interactive behavior sequences and transaction information of the historical users before the historical conversation data and the extracted problems, and training the deep interest evolution network model.

4. The method according to claim 3, wherein the sequence model is a natural language model for processing sequences, and training the sequence model comprises:

performing a word segmentation operation on the historical interactive behavior sequences and the extracted problems in the training data set to obtain a word segmentation sequence;

inputting the word segmentation sequence into the natural language model, and pre-training the natural language model with next-sentence prediction as the training objective;

fine-tuning the pre-trained natural language model according to the historical interactive behavior sequences, with the extracted problems as supervision signals.

5. The method according to any one of claims 1-4, characterized in that acquiring the interactive behavior sequence of the user within the current preset time comprises:

generating, through a stream computing framework, streaming data containing the interactive behavior sequence according to the user's interaction logs within the current preset time;

screening the streaming data based on a rule logic library, and putting the screened streaming data into a message queue, wherein the streaming data in the message queue is to be processed by the problem classification model.

6. The method according to claim 5, wherein screening the streaming data based on the rule logic library and putting the screened streaming data into the message queue comprises:

screening the streaming data according to the dictionary and the rules in the rule logic library;

wherein the dictionary includes a page dictionary and/or a page keyword dictionary;

the dictionary is determined from the historical pages browsed by the user and the corresponding problems.

7. The method according to claim 6, further comprising:

acquiring historical conversation data between multiple historical users and the intelligent customer service, and extracting the corresponding problems from the historical conversation data to obtain a problem set;

determining, according to the sequences of historical pages browsed by the multiple historical users before the historical conversation data, a page set and a page keyword set browsed by the multiple historical users;

for any problem in the problem set, calculating the mutual information between each page in the page set and the problem, and the mutual information between each page keyword in the page keyword set and the problem, and selecting, according to the calculated mutual information, the target pages and target page keywords corresponding to the problem from the page set and the page keyword set;

fusing the target pages corresponding to each problem in the problem set to obtain the page dictionary, and fusing the target page keywords corresponding to each problem to obtain the page keyword dictionary.

8. A problem prediction method, comprising:

acquiring an interactive behavior sequence of at least one buyer corresponding to a store within a current preset time, wherein the interactive behavior sequence includes a browsed page sequence;

predicting, according to the interactive behavior sequence, the type of problem encountered by each buyer based on a problem classification model, and predicting the problem encountered by each buyer based on a problem prediction model corresponding to the type;

determining, according to the predicted problem, a solution to be pushed to the seller corresponding to the store.

9. An electronic device, characterized by comprising:

at least one processor; and

a memory communicatively connected to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the electronic device to perform the method according to any one of claims 1-8.

10. A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the method according to any one of claims 1-8 is implemented.

11. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-8 is implemented.