CN110135647A

CN110135647A - A control method and control device for realizing trend prediction based on feature modeling

Info

Publication number: CN110135647A
Application number: CN201910423817.2A
Authority: CN
Inventors: 胡罡; 陆小彦; 蒋晓炜; 卢小雨; 马占山
Original assignee: China Pacific Insurance Group Co Ltd CPIC
Current assignee: China Pacific Insurance Group Co Ltd CPIC
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2019-08-16

Abstract

The present invention provides a control method for realizing trend prediction based on feature modeling. The method dynamically acquires data, summarizes rules to build models, and applies external data to the modeling to realize trend prediction. It comprises the following steps: a. determining, based on one or more dates, multiple items of context information for the one or more dates; b. matching each item of context information, one by one, with the context model corresponding to that context information, and obtaining one or more prediction results under each context model, the prediction results including at least one core prediction result and multiple auxiliary prediction results; c. determining a final prediction result from the one core prediction result and the multiple auxiliary prediction results. The invention is easy to use, efficient, and accurate in its predictions, and has high commercial value.

Description

A control method and control device for realizing trend prediction based on feature modeling

Technical Field

The invention belongs to the field of software technology architecture and relates to a trend prediction model developed with machine learning technology, in particular to a control method and a control device for realizing trend prediction based on feature modeling.

Background

Most operation and maintenance (O&M) work is done manually by O&M personnel. With Internet businesses expanding rapidly and labor costs running high, manual O&M can hardly sustain daily operation. Automated O&M therefore creates scripts with predefined rules that can be triggered automatically to perform common, repetitive O&M tasks; it is regarded as an expert system built on industry domain knowledge and O&M scenario knowledge. Automated O&M builds an O&M automation platform around job orchestration. The platform can provide customers with functions such as configuration changes, task inspections, script execution control, and custom workflows; it covers O&M scenarios such as inspections, file distribution, backup and recovery, and SQL operations, and it is extensible. The server side of the automated O&M system configures batch jobs and schedules scripts according to policies to automate business flows. The console is the operating platform for administrators and provides tiered, delegated management of roles, objects, and operations. A unified device model is built on the CMDB to deliver resources and services automatically.

Since its inception, IT O&M has moved from manual operation to today's automated operation, and it has gradually adapted to complex businesses, diverse user needs, and ever-expanding IT applications so that IT services remain flexible, convenient, secure, and stable. Automated O&M, whose starting point is replacing manual operations, is being widely studied and applied.

O&M automation is a set of strategies that turn a static equipment structure into one that responds dynamically and elastically to IT service demand; its purpose is to ensure the quality of IT O&M and reduce costs. Automated O&M mainly creates scripts with predefined rules that can be triggered automatically to perform common, repetitive O&M tasks. But once a problem occurs that was not predefined, people must detect, investigate, and locate it, and this kind of O&M service requires a great deal of manpower. When manpower is short, or while the problem is being investigated and located, services can hang for a long time and the system cannot run stably; this shortcoming is increasingly prominent. For example, during business campaigns or at special points in time, underestimated business volume and slow staff response cause resource allocation problems that lead to abnormal business operation.

At present, however, there is no specific approach on the market that effectively solves the above problems, and in particular no control method or control device for realizing trend prediction based on feature modeling.

Summary of the Invention

In view of the technical defects of the prior art, the object of the present invention is to provide a control method and a control device for realizing trend prediction based on feature modeling. The method dynamically acquires data, summarizes rules to build models, and applies external data to the modeling to realize trend prediction, and comprises the following steps:

a. determining, based on one or more dates, multiple items of context information for the one or more dates;

b. matching each item of context information, one by one, with the context model corresponding to that context information, and obtaining one or more prediction results under each context model, the prediction results including at least one core prediction result and multiple auxiliary prediction results;

c. determining a final prediction result from the one core prediction result and the multiple auxiliary prediction results.

Preferably, after step b, the method further comprises step d: dynamically displaying the final prediction result.

Preferably, in step a, the context information includes at least:

XGBoost base model information;

holiday model information;

day-of-week model information; and

marketing campaign information.

Preferably, step b comprises the following steps:

b1. determining, based on the XGBoost base model information, the core prediction result that matches the XGBoost base model information;

b2. determining, based on the holiday model information, a first auxiliary prediction result that matches the holiday model information;

b3. determining, based on the day-of-week model information, a second auxiliary prediction result that matches the day-of-week model information;

b4. determining, based on the marketing campaign information, a third auxiliary prediction result that matches the marketing campaign model information.

Preferably, step c comprises c1: adjusting the core prediction result based on the first auxiliary prediction result, the second auxiliary prediction result, and the third auxiliary prediction result to determine the final prediction result.

Preferably, the context models are established as follows:

i: acquiring one or more items of raw external information from one or more data collection platforms using one or more data collection methods;

ii: preprocessing the one or more items of raw external information to determine one or more items of preprocessed external information;

iii: performing feature extraction on the one or more items of preprocessed external information to determine one or more items of context feature information;

iv: modeling based on the one or more items of context information to determine one or more context models that match the context feature information.

Preferably, the data collection methods include at least any one or more of the following:

SQL export;

third-party API export; or

web crawler collection.

Preferably, the data collection platforms include at least any one or more of the following:

O&M work-order dispatch system;

configuration management database (CMDB);

application performance monitoring;

dashboard (kanban) system; or

unified logging platform.

Preferably, the raw external information includes at least any one or more of the following:

historical information;

Gregorian and lunar calendar information;

holiday information; or

marketing campaign data.

Preferably, the preprocessing includes attachment collection, data screening, and data encoding.

According to another aspect of the present invention, a control device for realizing trend prediction based on feature modeling is provided, comprising:

first determining means 1: determining, based on one or more dates, multiple items of context information for the one or more dates;

first acquiring means 2: matching each item of context information, one by one, with the context model corresponding to that context information, and obtaining one or more prediction results under each context model;

second determining means 3: determining a final prediction result from one core prediction result and multiple auxiliary prediction results;

first processing means 4: dynamically displaying the final prediction result.

Preferably, the first acquiring means 2 comprises:

third determining means 21: determining, based on the XGBoost base model information, the core prediction result that matches the XGBoost base model information;

fourth determining means 22: determining, based on the holiday model information, a first auxiliary prediction result that matches the holiday model information;

fifth determining means 23: determining, based on the day-of-week model information, a second auxiliary prediction result that matches the day-of-week model information;

sixth determining means 24: determining, based on the marketing campaign information, a third auxiliary prediction result that matches the marketing campaign model information.

Preferably, the device further comprises:

second acquiring means 5: acquiring one or more items of raw external information from one or more data collection platforms using one or more data collection methods;

seventh determining means 6: preprocessing the one or more items of raw external information to determine one or more items of preprocessed external information;

eighth determining means 7: performing feature extraction on the one or more items of preprocessed external information to determine one or more items of context feature information;

ninth determining means 8: modeling based on the one or more items of context information to determine one or more context models that match the context feature information.

The present invention provides an approach that does not rely on manually specified rules: machine learning algorithms automatically and continuously learn from massive business data, refining and summarizing rules to form trend predictions. In other words, a machine-learning-based "brain" is added on top of automated O&M. The brain directs the monitoring system to collect the data its decisions require, predicts business data, and directs the container platform to carry out elastic scaling decisions, thereby achieving the overall goal of the O&M system. Combined with big data visualization, the invention shows business and technical stakeholders how the business is running in real time and provides predicted future operating data, so that the O&M team can prepare capacity management and performance management in advance. Acting as the central brain of the container cloud platform, it directs the relevant container management platform to carry out elastic scaling decisions. The invention determines, based on one or more dates, multiple items of context information for those dates; matches each item of context information, one by one, with its corresponding context model; obtains one or more prediction results under each context model, the prediction results including at least one core prediction result and multiple auxiliary prediction results; and determines a final prediction result from the one core prediction result and the multiple auxiliary prediction results. The invention is easy to use, efficient, and accurate in its predictions, and has very high commercial value.

Brief Description of the Drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

Fig. 1 shows a schematic diagram of a control method for realizing trend prediction based on feature modeling according to a specific embodiment of the present invention;

Fig. 2 shows, according to a first embodiment of the present invention, a schematic flowchart of matching each item of context information, one by one, with its corresponding context model and obtaining one or more prediction results under each context model;

Fig. 3 shows a schematic flowchart of establishing the context models according to a second embodiment of the present invention;

Fig. 4 shows a schematic diagram of the module connections of a control device for realizing trend prediction based on feature modeling according to another specific embodiment of the present invention; and

Fig. 5 shows a schematic diagram of the module connections of a control device for realizing trend prediction based on feature modeling according to a third embodiment of the present invention.

Detailed Description

To present the technical solution of the present invention more clearly, the invention is further described below with reference to the accompanying drawings.

Fig. 1 shows a schematic diagram of a control method for realizing trend prediction based on feature modeling according to a specific embodiment of the present invention. Specifically, the method dynamically acquires data, summarizes rules to build models, and applies external data to the modeling to realize trend prediction, and comprises the following steps:

First, in step S101, multiple items of context information for one or more dates are determined based on those dates. In this embodiment, the context information includes at least XGBoost base model information, holiday model information, day-of-week model information, and marketing campaign information. Those skilled in the art will understand that the invention selects feature data by examining historical business data. For example, business volume shows clear periodicity, with weekly, monthly, and yearly cycles; the Spring Festival, National Day, and short public holidays clearly affect business volume, and to clearly different degrees; overall fluctuation is fairly smooth, and the peaks and troughs caused by business campaigns are not pronounced. On this basis, the following features are selected: 1. Gregorian month and day; 2. day of the week; 3. whether the date falls within the Spring Festival holiday; 4. whether the date falls within the National Day holiday; 5. whether the date falls within a short public holiday. The input used to determine the context information may be a single date, a time period, or several time periods and points in time together, and the context information corresponding to those dates is determined from them.
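The feature selection in step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the holiday sets are assumptions introduced here, and the dates are illustrative entries that a real system would load from an external calendar source.

```python
from datetime import date, timedelta

# Hypothetical holiday calendars (illustrative 2019 dates, not from the source).
SPRING_FESTIVAL = {date(2019, 2, 4) + timedelta(days=i) for i in range(7)}
NATIONAL_DAY = {date(2019, 10, 1) + timedelta(days=i) for i in range(7)}
SHORT_HOLIDAYS = {date(2019, 4, 5), date(2019, 4, 6), date(2019, 4, 7)}


def date_features(d: date) -> dict:
    """Build the calendar features listed in step S101 for a single date."""
    return {
        "month": d.month,
        "day": d.day,
        "weekday": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "is_spring_festival": int(d in SPRING_FESTIVAL),
        "is_national_day": int(d in NATIONAL_DAY),
        "is_short_holiday": int(d in SHORT_HOLIDAYS),
    }


def features_for_range(start: date, end: date) -> list:
    """Expand an inclusive date range into one feature row per day."""
    days = (end - start).days + 1
    return [date_features(start + timedelta(days=i)) for i in range(days)]
```

A single date goes through `date_features`, while a time period goes through `features_for_range`; several periods can be handled by concatenating the resulting lists.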

Next, in step S102, each item of context information is matched, one by one, with its corresponding context model, and one or more prediction results are obtained under each context model; the prediction results include at least one core prediction result and multiple auxiliary prediction results. In this embodiment, matching each item of context information for the date against its context model yields one or more prediction results under each context model. The prediction results include one or more of the prediction result for the XGBoost base model information, the prediction result for the holiday model information, the prediction result for the day-of-week model information, and the prediction result for the marketing campaign information. In this embodiment, the prediction result for the XGBoost base model information is the core prediction result, while the prediction results for the holiday model information, the day-of-week model information, and the marketing campaign information are the auxiliary prediction results.

Then, in step S103, the final prediction result is determined from the one core prediction result and the multiple auxiliary prediction results. In this embodiment, the auxiliary prediction results may be converted into coefficients that calibrate the core prediction result, or they may be combined with the core prediction result according to weight ratios, to arrive at the final prediction result.
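The two combination options named in step S103 can be sketched as below. The source does not give formulas, so both functions are assumptions: the first treats each auxiliary result as a multiplicative coefficient, the second takes a normalized weighted average in which the first weight belongs to the core prediction.

```python
def calibrate_by_coefficients(core: float, coefficients: list) -> float:
    """Scale the core prediction by each auxiliary coefficient in turn."""
    result = core
    for c in coefficients:
        result *= c
    return result


def combine_by_weights(core: float, auxiliaries: list, weights: list) -> float:
    """Weighted average of core and auxiliary predictions.

    weights[0] applies to the core prediction; the remaining weights
    apply to the auxiliary predictions in order.
    """
    values = [core] + list(auxiliaries)
    if len(values) != len(weights):
        raise ValueError("one weight is needed per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total
```

For instance, a core prediction of 1000 with a holiday coefficient of 0.5 and a day-of-week coefficient of 1.2 would be calibrated to about 600 under the first option.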

Finally, in step S104, the final prediction result is displayed dynamically. Those skilled in the art will understand that the final prediction result is presented as tables, charts, and similar forms. In this embodiment, a specified date or time period can be entered in a configured intelligent operation terminal, and the final prediction result is displayed after computation. The displayed result may keep changing as new data is entered or historical data is added, so the final prediction result is shown as a dynamic display. Intelligent O&M does not rely on manually specified rules; instead, machine learning algorithms automatically and continuously learn from massive O&M data (including the events themselves and the manual handling logs of O&M personnel), refining and summarizing rules. In other words, a machine-learning-based brain is added on top of automated O&M; it directs the monitoring system to collect the data its decisions require, performs analysis and makes decisions, and directs automation scripts to execute those decisions, thereby achieving the overall goal of the O&M system.

Fig. 2 shows, according to the first embodiment of the present invention, a schematic flowchart of matching each item of context information, one by one, with its corresponding context model and obtaining one or more prediction results under each context model. Those skilled in the art will understand that Fig. 2 expands step S102; more specifically, it comprises the following steps:

First, in step S1021, the core prediction result that matches the XGBoost base model information is determined based on that information. In this embodiment, the establishment of the XGBoost base model is described further in the embodiments below; the XGBoost base model information is fed into the XGBoost base model, a matching computation is performed, and the core prediction result is determined.

Next, in step S1022, a first auxiliary prediction result that matches the holiday model information is determined based on that information. The first auxiliary prediction result is the result produced by the holiday model, whose establishment is described further in the embodiments below.

Then, in step S1023, a second auxiliary prediction result that matches the day-of-week model information is determined based on that information. The second auxiliary prediction result is the result produced by the day-of-week model, whose establishment is described further in the embodiments below.

Finally, in step S1024, a third auxiliary prediction result that matches the marketing campaign model information is determined based on the marketing campaign information. The third auxiliary prediction result is the result produced by the marketing campaign model, whose establishment is described further in the embodiments below.
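Steps S1021 to S1024 amount to dispatching each item of context information to its own model and separating the core result from the auxiliary ones. The sketch below shows one way to arrange that; the registry layout, the key names such as `xgboost_base`, and the `predict_all` function are hypothetical and only illustrate the matching step.

```python
from typing import Callable, Dict

# A context model is assumed here to map its context information to one number.
ContextModel = Callable[[dict], float]


def predict_all(context: Dict[str, dict], models: Dict[str, ContextModel],
                core_name: str = "xgboost_base") -> dict:
    """Match every context item with its model and split core from auxiliary results."""
    # Context items without a registered model are skipped.
    results = {name: models[name](info)
               for name, info in context.items() if name in models}
    if core_name not in results:
        raise KeyError("no core prediction for " + core_name)
    core = results.pop(core_name)
    return {"core": core, "auxiliary": results}
```

Each registered callable stands in for a trained model; in practice the entry under `xgboost_base` would wrap the trained base model and the other entries would wrap the holiday, day-of-week, and marketing campaign models.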

Preferably, the core prediction result is adjusted based on the first auxiliary prediction result, the second auxiliary prediction result, and the third auxiliary prediction result to determine the final prediction result. Those skilled in the art will understand that industry policies and insurers' business activities strongly affect business volume, and that the corresponding data is difficult to collect. Property insurance, however, is affected far less than life insurance. The auto insurance business of Pacific Insurance has been fairly stable, with few marketing campaigns, and claim reporting and claim closing are driven mainly by customer activity and have little to do with enterprise-side factors. Policy, marketing, and similar factors are therefore not considered for now when modeling the volume of closed property insurance claims. For parameter tuning, that is, for how to adjust the core prediction result based on the first, second, and third auxiliary prediction results to determine the final prediction result, the parameter tuning of the XGBoost model used in this study mainly concerns the booster parameters, which include:

First: eta [default 0.3]. Similar to the learning rate parameter in GBM. Shrinking the weights at each step makes the model more robust. Typical values: 0.01-0.2.

Second: min_child_weight [default 1]. Defines the minimum sum of sample weights required in a leaf node. It is similar to GBM's min_child_leaf parameter but not identical: the XGBoost parameter is a minimum sum of sample weights, whereas the GBM parameter is a minimum number of samples. This parameter is used to avoid overfitting. Larger values keep the model from learning patterns specific to a few local samples, but values that are too high lead to underfitting. It should be tuned with cross-validation (CV).

Third: max_depth [default 6]. Same as the parameter in GBM; it is the maximum depth of a tree. It is also used to control overfitting: the larger max_depth is, the more specific and local the patterns the model learns. It should be tuned with CV. Typical values: 3-10.

Fourth: max_leaf_nodes. The maximum number of nodes or leaves in a tree. It can be used in place of max_depth: since binary trees are generated, a tree of depth n produces at most 2^n leaves. If this parameter is defined, GBM ignores max_depth.

Fifth: gamma [default 0]. A node is split only if the split reduces the value of the loss function. gamma specifies the minimum loss reduction required to split a node. The larger the value, the more conservative the algorithm. The right value depends closely on the loss function, so it needs to be tuned.

Sixth: max_delta_step [default 0]. Limits the maximum step by which each tree's weights may change. A value of 0 means no constraint; a positive value makes the algorithm more conservative. This parameter usually does not need to be set, but it can help logistic regression when the classes are highly imbalanced. It is rarely used, though it may have further uses worth exploring.

Seventh: subsample [default 1]. Identical to the subsample parameter in GBM. It controls the fraction of samples randomly drawn for each tree. Lower values make the algorithm more conservative and help avoid overfitting, but values that are too small may cause underfitting. Typical values: 0.5-1.

Eighth: colsample_bytree [default 1]. Similar to the max_features parameter in GBM. It controls the fraction of columns (each column is a feature) randomly sampled for each tree. Typical values: 0.5-1.

Ninth: colsample_bylevel [default 1]. Controls the fraction of columns sampled for each split at each level of the tree. The subsample and colsample_bytree parameters can serve the same purpose.

Tenth: lambda [default 1]. The L2 regularization term on the weights (analogous to Ridge regression). It controls the regularization part of XGBoost. Although many data scientists seldom use it, it can be useful in reducing overfitting.

Eleventh: alpha [default 0]. The L1 regularization term on the weights (analogous to Lasso regression). It can be applied in very high-dimensional settings, where it makes the algorithm run faster.

Twelfth: scale_pos_weight [default 1]. When the classes are highly imbalanced, setting this parameter to a positive value helps the algorithm converge faster.
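The booster parameters above map onto an XGBoost parameter dictionary, and the CV-based tuning the text recommends can be sketched with `xgboost.cv`. Everything specific below is an assumption made for illustration: the regression objective, the starting values, the grid, the fold count, and the helper names. max_leaf_nodes is omitted because the text ties it to GBM.

```python
from itertools import product

# Illustrative starting values, not tuned results from the source.
BOOSTER_PARAMS = {
    "objective": "reg:squarederror",  # assumption: daily volume is a regression target
    "eta": 0.1,
    "min_child_weight": 1,
    "max_depth": 6,
    "gamma": 0,
    "max_delta_step": 0,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "colsample_bylevel": 1,
    "lambda": 1,
    "alpha": 0,
    "scale_pos_weight": 1,
}


def candidate_params(base: dict, grid: dict) -> list:
    """Expand a grid of candidate values into full parameter dictionaries."""
    keys = sorted(grid)
    combos = product(*(grid[k] for k in keys))
    return [{**base, **dict(zip(keys, combo))} for combo in combos]


def best_by_cv(features, labels, grid: dict, rounds: int = 200) -> dict:
    """Return the candidate with the lowest cross-validated RMSE.

    Requires the xgboost and pandas packages; xgboost is imported lazily
    so that the helpers above remain usable without it.
    """
    import xgboost as xgb
    dtrain = xgb.DMatrix(features, label=labels)
    scored = []
    for params in candidate_params(BOOSTER_PARAMS, grid):
        cv = xgb.cv(params, dtrain, num_boost_round=rounds, nfold=5,
                    metrics="rmse", early_stopping_rounds=20, seed=0)
        scored.append((cv["test-rmse-mean"].min(), params))
    return min(scored, key=lambda s: s[0])[1]
```

A typical first pass would tune `max_depth` and `min_child_weight` together, for example `best_by_cv(X, y, {"max_depth": [3, 6, 9], "min_child_weight": [1, 3, 5]})`, and then move on to `gamma`, the sampling fractions, and the regularization terms.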

Further, regarding model ensembling: after the base model is generated with the XGBoost algorithm, its predictions for patterns such as the Spring Festival, National Day, short holidays, and the day of the week follow a basically correct trend. Then, based on historical data, a week model, a Spring Festival model, a National Day model, and a short-holiday model are built to further adjust the output of the base model.
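The adjustment of the base model output can be sketched as follows. The description does not specify how the week and holiday models modify the base prediction, so this sketch assumes simple multiplicative factors and assumes that a holiday factor, when present, takes precedence over the weekday factor. All factor values and the function name `adjust_prediction` are hypothetical.

```python
from datetime import date

# Hypothetical factors; in practice they would be estimated from historical data.
WEEKDAY_FACTOR = {0: 1.10, 1: 1.05, 2: 1.00, 3: 1.00, 4: 0.95, 5: 0.60, 6: 0.55}
HOLIDAY_FACTOR = {
    "spring_festival": 0.30,
    "national_day": 0.40,
    "short_holiday": 0.50,
}


def adjust_prediction(base_value, day, holiday=None):
    """Adjust the base model output for one day.

    Assumption: a holiday factor overrides the weekday factor.
    """
    if holiday is not None:
        return base_value * HOLIDAY_FACTOR[holiday]
    return base_value * WEEKDAY_FACTOR[day.weekday()]


# An ordinary Monday versus the first day of the National Day holiday.
print(adjust_prediction(100.0, date(2019, 5, 20)))
print(adjust_prediction(100.0, date(2019, 10, 1), holiday="national_day"))
```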

Fig. 3 shows a detailed flowchart of establishing the context model according to the second embodiment of the present invention. The context model is established as follows:

First, in step S201, one or more pieces of original external information are obtained from one or more data collection platforms using one or more data collection methods. The data collection methods include at least SQL export, third-party API export, or web crawler collection. The data collection platforms include at least an operation and maintenance ticket dispatch system, a configuration management database, application performance monitoring, a Kanban system, or a unified logging platform. The original external information includes at least historical information, Gregorian calendar and lunar calendar information, holiday information, or marketing activity data.

Then, in step S202, the one or more pieces of original external information are preprocessed to determine one or more pieces of preprocessed external information. Preferably, the preprocessing includes attachment collection, data screening, and data encoding. Some of the data is stored in attachments; although small in volume, this data is relatively important. The system downloads the attachment information from the source system by means of a crawler and adds it to the relevant program.
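The data screening and data encoding parts of step S202 can be illustrated with a short sketch. The description does not name a specific screening rule or encoding scheme, so this example assumes that records with missing required fields are dropped and that categorical fields are one-hot encoded. The field names and sample records are hypothetical.

```python
def screen(records, required_fields):
    """Data screening: keep only records whose required fields are all present."""
    return [
        r for r in records
        if all(r.get(k) not in (None, "") for k in required_fields)
    ]


def one_hot(records, field):
    """Data encoding: replace a categorical field with 0/1 indicator columns."""
    categories = sorted({r[field] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != field}
        for c in categories:
            row[f"{field}={c}"] = 1 if r[field] == c else 0
        encoded.append(row)
    return encoded


# Hypothetical raw records; the third one lacks a channel and is screened out.
raw = [
    {"date": "2019-05-20", "channel": "web", "tickets": 120},
    {"date": "2019-05-21", "channel": "app", "tickets": 95},
    {"date": "2019-05-22", "channel": "", "tickets": 80},
]
clean = one_hot(screen(raw, ["date", "channel", "tickets"]), "channel")
print(clean)
```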

Next, in step S203, feature extraction is performed on the one or more pieces of preprocessed external information to determine one or more pieces of context feature information. The present invention selects features by combining business knowledge with data mining. First, sample cases and the features that may be related to them are collected by interviewing business personnel; then the correlation between each feature and the corresponding response variable is calculated, and suitable candidate features are screened out. The present invention mainly uses the Pearson correlation coefficient and the mutual information coefficient to measure univariate correlation. Once the candidate features are obtained, a model is trained on them, and the significance of the features is evaluated according to the model's accuracy and recall.
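The two univariate measures named in step S203 can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the present invention: the mutual information function assumes discrete (or already binned) variables and reports the result in nats, and the function names are chosen here for illustration.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable carries no linear information
    return cov / (sx * sy)


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Pearson correlation only captures linear dependence, whereas mutual information also responds to non-linear dependence, which is one reason to use the two measures together when screening candidate features.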

Finally, in step S204, modeling is performed based on the one or more pieces of context information, and one or more context models matching the context feature information are determined. Those skilled in the art will understand that the context models include the XGBoost base model, holiday model information, week model information, and marketing activity model information. That is, on the basis of steps S201 to S203 above, one or more context models matching the context feature information are determined.

Fig. 4 shows a module connection diagram of a control device for realizing trend prediction based on feature modeling according to another specific embodiment of the present invention. The device includes a first determining means 1, which determines, based on one or more dates, a plurality of pieces of context information for the one or more dates. For the working principle of the first determining means 1, refer to step S101 above; it is not repeated here.

Further, the device includes a first obtaining means 2, which matches each piece of context information, one by one, with the context model corresponding to that context information and obtains one or more prediction results under each context model. For the working principle of the first obtaining means 2, refer to step S102 above; it is not repeated here.

Further, the device includes a second determining means 3, which determines the final prediction result according to one core prediction result and multiple auxiliary prediction results. For the working principle of the second determining means 3, refer to step S103 above; it is not repeated here.

Further, the device includes a first processing means 4, which dynamically displays the final prediction result. For the working principle of the first processing means 4, refer to step S104 above; it is not repeated here.

Further, the first obtaining means 2 includes a third determining means 21, which determines, based on the XGBoost base model information, the core prediction result matching the XGBoost base model information. For the working principle of the third determining means 21, refer to step S1021 above; it is not repeated here.

Further, the first obtaining means 2 also includes a fourth determining means 22, which determines, based on the holiday model information, a first auxiliary prediction result matching the holiday model information. For the working principle of the fourth determining means 22, refer to step S1022 above; it is not repeated here.

Further, the first obtaining means 2 also includes a fifth determining means 23, which determines, based on the week model information, a second auxiliary prediction result matching the week model information. For the working principle of the fifth determining means 23, refer to step S1023 above; it is not repeated here.

Further, the first obtaining means 2 also includes a sixth determining means 24, which determines, based on the marketing activity model information, a third auxiliary prediction result matching the marketing activity model information. For the working principle of the sixth determining means 24, refer to step S1024 above; it is not repeated here.
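The relationship between the first obtaining means 2, its four sub-means 21 to 24, and the second determining means 3 can be sketched as a small composition of callables. This is a structural illustration only: the class and function names are hypothetical, the predictors are stand-in lambdas rather than trained models, and the sketch assumes that the auxiliary results act as multiplicative factors on the core result, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Predictor = Callable[[dict], float]


@dataclass
class FirstObtainingMeans:
    core: Predictor       # third determining means 21 (XGBoost base model)
    holiday: Predictor    # fourth determining means 22
    week: Predictor       # fifth determining means 23
    marketing: Predictor  # sixth determining means 24

    def obtain(self, context: dict) -> Dict[str, float]:
        """Run every context model on the context information."""
        return {
            "core": self.core(context),
            "holiday": self.holiday(context),
            "week": self.week(context),
            "marketing": self.marketing(context),
        }


def second_determining_means(results: Dict[str, float]) -> float:
    """Combine results; assumes auxiliary results are multiplicative factors."""
    final = results["core"]
    for name in ("holiday", "week", "marketing"):
        final *= results[name]
    return final


# Stand-in predictors with hypothetical outputs.
obtaining = FirstObtainingMeans(
    core=lambda c: 200.0,
    holiday=lambda c: 0.5 if c.get("holiday") else 1.0,
    week=lambda c: 1.0,
    marketing=lambda c: 1.5 if c.get("campaign") else 1.0,
)
results = obtaining.obtain({"holiday": True, "campaign": True})
final = second_determining_means(results)
print(final)
```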

Fig. 5 shows a module connection diagram of a control device for realizing trend prediction based on feature modeling according to the third embodiment of the present invention. The device further includes a second obtaining means 5, which obtains one or more pieces of original external information from one or more data collection platforms using one or more data collection methods. For the working principle of the second obtaining means 5, refer to step S201 above; it is not repeated here.

Further, the device includes a seventh determining means 6, which preprocesses the one or more pieces of original external information to determine one or more pieces of preprocessed external information. For the working principle of the seventh determining means 6, refer to step S202 above; it is not repeated here.

Further, the device includes an eighth determining means 7, which performs feature extraction on the one or more pieces of preprocessed external information to determine one or more pieces of context feature information. For the working principle of the eighth determining means 7, refer to step S203 above; it is not repeated here.

Further, the device includes a ninth determining means 8, which performs modeling based on the one or more pieces of context information and determines one or more context models matching the context feature information. For the working principle of the ninth determining means 8, refer to step S204 above; it is not repeated here.

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the essence of the present invention.

Claims

1. A control method for realizing trend prediction based on feature modeling, which dynamically acquires data, summarizes rules to build models, and applies external data to the models to realize trend prediction, characterized in that it comprises the following steps:

a. determining, based on one or more dates, a plurality of pieces of context information for the one or more dates;

b. matching each piece of context information, one by one, with the context model corresponding to that context information, and obtaining one or more prediction results under each context model, the prediction results including at least one core prediction result and multiple auxiliary prediction results;

c. determining a final prediction result according to the one core prediction result and the multiple auxiliary prediction results.

2. The control method according to claim 1, characterized in that, after the step b, further comprising a step d: dynamically displaying the final prediction result.

3. The control method according to claim 1 or 2, characterized in that, in the step a, the context information at least includes:

XGBoost basic model information;

holiday model information;

week model information; and

marketing activity model information.

4. The control method according to claim 3, wherein said step b comprises the steps of:

b1. determining, based on the XGBoost basic model information, the core prediction result that matches the XGBoost basic model information;

b2. determining, based on the holiday model information, a first auxiliary prediction result that matches the holiday model information;

b3. determining, based on the week model information, a second auxiliary prediction result that matches the week model information;

b4. determining, based on the marketing activity model information, a third auxiliary prediction result that matches the marketing activity model information.

5. The control method according to claim 4, wherein the step c comprises a step c1: adjusting the core prediction result based on the first auxiliary prediction result, the second auxiliary prediction result and the third auxiliary prediction result to determine the final prediction result.

6. The control method according to claim 5, characterized in that the context model is established as follows:

i: obtaining one or more pieces of original external information from one or more data collection platforms using one or more data collection methods;

ii: preprocessing the one or more pieces of original external information, and determining one or more pieces of preprocessed external information;

iii: performing feature extraction on the one or more pieces of preprocessed external information, and determining one or more pieces of context feature information;

iv: modeling based on the one or more pieces of context information, and determining one or more context models that match the context feature information.

7. The control method according to claim 6, characterized in that, the data collection methods at least include any one or more of the following methods:

SQL export;

third-party API export; or

web crawler collection.

8. The control method according to claim 6, wherein the data collection platform includes at least any one or more of the following platforms:

Operation and maintenance dispatch system;

configuration management database;

application performance monitoring;

Kanban system; or

Unified logging platform.

9. The control method according to claim 6, wherein the original external information at least includes any one or more of the following information:

historical information;

Gregorian calendar and lunar calendar information;

holiday information; or

marketing activity data.

10. The control method according to claim 6, characterized in that the preprocessing includes attachment collection, data screening and data encoding.

11. A control device for realizing trend prediction based on feature modeling, characterized in that it comprises:

a first determining means (1), which determines, based on one or more dates, a plurality of pieces of context information for the one or more dates;

a first obtaining means (2), which matches each piece of context information, one by one, with the context model corresponding to that context information, and obtains one or more prediction results under each context model;

a second determining means (3), which determines a final prediction result according to one core prediction result and multiple auxiliary prediction results; and

a first processing means (4), which dynamically displays the final prediction result.

12. The control device according to claim 11, characterized in that the first obtaining means (2) comprises:

a third determining means (21), which determines, based on the XGBoost basic model information, the core prediction result matching the XGBoost basic model information;

a fourth determining means (22), which determines, based on the holiday model information, a first auxiliary prediction result matching the holiday model information;

a fifth determining means (23), which determines, based on the week model information, a second auxiliary prediction result matching the week model information; and

a sixth determining means (24), which determines, based on the marketing activity model information, a third auxiliary prediction result matching the marketing activity model information.

13. The control device according to claim 11, further comprising:

a second obtaining means (5), which obtains one or more pieces of original external information from one or more data collection platforms using one or more data collection methods;

a seventh determining means (6), which preprocesses the one or more pieces of original external information and determines one or more pieces of preprocessed external information;

an eighth determining means (7), which performs feature extraction on the one or more pieces of preprocessed external information and determines one or more pieces of context feature information; and

a ninth determining means (8), which performs modeling based on the one or more pieces of context information and determines one or more context models that match the context feature information.