
CN111429085A - Contract data generation method, device, electronic device and storage medium - Google Patents

Info

Publication number
CN111429085A
Authority
CN
China
Prior art keywords
contract
data
information set
standard
contract information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010135529.XA
Other languages
Chinese (zh)
Inventor
李广翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202010135529.XA
Publication of CN111429085A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a contract data generation method, device, electronic device and storage medium. When an original contract information set is received, the method performs anomaly-removal preprocessing on it to obtain a standard contract information set, constructs a data analysis function according to a function mapping relationship, and uses that function to identify, classify and store the standard contract information set, yielding a corresponding contract table list set. Data warehouse modeling is then performed on the corresponding contract table list set to obtain a contract configuration wide table set, from which contract data is generated. Because several modeling stages are involved, the accuracy of contract data generation is improved. An integrity check is further performed on the contract data to obtain the target contract data, which improves the integrity of the contract data.

Description

Contract data generation method, device, electronic device and storage medium

Technical field

The present invention relates to the technical field of data processing, and in particular to a contract data generation method, device, electronic device and storage medium.

Background

Traditional contract data generation usually follows fixed rules combined with manual identification and analysis. This is time-consuming, accumulates a large amount of hard-coded logic, and offers no quick way to determine whether a scheme is accurate: the faulty stage can only be inferred backwards from contract data that turns out to be abnormal. The result is a high repair cost and reduced system stability.

Summary of the invention

In view of the above, it is necessary to provide a contract data generation method, device, electronic device and storage medium that can generate contract data accurately.

A contract data generation method, the method comprising:

when an original contract information set is received, performing anomaly-removal preprocessing on the original contract information set to obtain a standard contract information set;

constructing a data analysis function according to a function mapping relationship;

performing identification, classification and storage processing on the standard contract information set according to the data analysis function to obtain a corresponding contract table list set;

performing data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set;

generating contract data using the contract configuration wide table set; and

performing an integrity check on the contract data to obtain target contract data.

According to a preferred embodiment of the present invention, performing anomaly-removal preprocessing on the original contract information set to obtain the standard contract information set comprises:

performing an absolute value operation on the data in the original contract information set to obtain a data set; and

converting the data in the data set into percentages to obtain the standard contract information set.

According to a preferred embodiment of the present invention, constructing the data analysis function according to the function mapping relationship comprises:

constructing the data analysis function using the following formula:

[Formula image BDA0002397166760000021: definition of the data analysis function R; not reproduced in this text]

where R denotes the data analysis function, D denotes the file content, L denotes the category, the symbol shown in formula image BDA0002397166760000022 (not reproduced in this text) denotes the set of all features in the standard contract information set that carry the configuration category attribute L_j, W_T denotes the set of feature words contained in the file name of the original contract information set, R_T(D) denotes the applied data analysis method, and R_B(D) denotes the BOW representation of the file content.

According to a preferred embodiment of the present invention, performing identification, classification and storage processing on the standard contract information set according to the data analysis function to obtain the corresponding contract table list set comprises:

performing feature extraction on the standard contract information set based on a feature selection algorithm;

classifying the data in the standard contract information set using the extracted features to obtain candidate categories; and

using the classifier corresponding to the data analysis function to determine, from the candidate categories, the category of the data in the standard contract information set, to obtain the corresponding contract table list set.

According to a preferred embodiment of the present invention, performing data warehouse modeling according to the corresponding contract table list set to obtain the contract configuration wide table set comprises:

determining the requirement data of the current project; and

joining the data in the corresponding contract table list set according to the framework and the requirement data to obtain the contract configuration wide table set.

According to a preferred embodiment of the present invention, performing the integrity check on the contract data to obtain the target contract data comprises:

obtaining new-product contract data from the contract data and verifying the new-product contract data; and/or

obtaining contract data of the same type from the contract data and verifying the contract data of the same type.

According to a preferred embodiment of the present invention, performing the integrity check on the contract data to obtain the target contract data comprises:

performing the integrity check on the contract data using a memory integrity verification method based on a hash tree hot window, to obtain the target contract data.
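A hash-tree check of this general kind can be sketched as follows. This is only a minimal illustration, assuming SHA-256 and a plain binary hash (Merkle) tree over serialized contract records; the hot-window optimization named in the text is not modeled, and `merkle_root` and `verify_records` are hypothetical names that do not come from the patent.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function; the source does not name one)."""
    return hashlib.sha256(data).digest()


def merkle_root(records: List[bytes]) -> bytes:
    """Return the root hash of a binary hash tree built over the records."""
    if not records:
        return _h(b"")
    # Leaves and inner nodes get different prefixes so they cannot be confused.
    level = [_h(b"\x00" + r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def verify_records(records: List[bytes], trusted_root: bytes) -> bool:
    """Check the records against a root hash stored at generation time."""
    return merkle_root(records) == trusted_root
```

In this sketch the root would be computed once when the contract data is generated and recomputed at verification time; any changed record changes the root.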

A contract data generation apparatus, the apparatus comprising:

a preprocessing unit, configured to perform anomaly-removal preprocessing on an original contract information set when the original contract information set is received, to obtain a standard contract information set;

a construction unit, configured to construct a data analysis function according to a function mapping relationship;

a processing unit, configured to perform identification, classification and storage processing on the standard contract information set according to the data analysis function to obtain a corresponding contract table list set;

a modeling unit, configured to perform data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set;

a generating unit, configured to generate contract data using the contract configuration wide table set; and

a verification unit, configured to perform an integrity check on the contract data to obtain target contract data.

According to a preferred embodiment of the present invention, the preprocessing unit is specifically configured to:

perform an absolute value operation on the data in the original contract information set to obtain a data set; and

convert the data in the data set into percentages to obtain the standard contract information set.

According to a preferred embodiment of the present invention, the construction unit is specifically configured to:

construct the data analysis function using the following formula:

[Formula image BDA0002397166760000031: definition of the data analysis function R; not reproduced in this text]

where R denotes the data analysis function, D denotes the file content, L denotes the category, the symbol shown in formula image BDA0002397166760000032 (not reproduced in this text) denotes the set of all features in the standard contract information set that carry the configuration category attribute L_j, W_T denotes the set of feature words contained in the file name of the original contract information set, R_T(D) denotes the applied data analysis method, and R_B(D) denotes the BOW representation of the file content.

According to a preferred embodiment of the present invention, the processing unit is specifically configured to:

perform feature extraction on the standard contract information set based on a feature selection algorithm;

classify the data in the standard contract information set using the extracted features to obtain candidate categories; and

use the classifier corresponding to the data analysis function to determine, from the candidate categories, the category of the data in the standard contract information set, to obtain the corresponding contract table list set.

According to a preferred embodiment of the present invention, the modeling unit is specifically configured to:

determine the requirement data of the current project; and

join the data in the corresponding contract table list set according to the framework and the requirement data to obtain the contract configuration wide table set.

According to a preferred embodiment of the present invention, the verification unit performing the integrity check on the contract data to obtain the target contract data comprises:

obtaining new-product contract data from the contract data and verifying the new-product contract data; and/or

obtaining contract data of the same type from the contract data and verifying the contract data of the same type.

According to a preferred embodiment of the present invention, the verification unit performing the integrity check on the contract data to obtain the target contract data comprises:

performing the integrity check on the contract data using a memory integrity verification method based on a hash tree hot window, to obtain the target contract data.

An electronic device, the electronic device comprising:

a memory storing at least one instruction; and

a processor that executes the instructions stored in the memory to implement the contract data generation method.

A computer-readable storage medium storing at least one instruction, the at least one instruction being executed by a processor in an electronic device to implement the contract data generation method.

As can be seen from the above technical solutions, when an original contract information set is received, the present invention performs anomaly-removal preprocessing on it to obtain a standard contract information set, constructs a data analysis function according to a function mapping relationship, and uses that function to identify, classify and store the standard contract information set, yielding a corresponding contract table list set. Data warehouse modeling is then performed on the corresponding contract table list set to obtain a contract configuration wide table set, from which contract data is generated. Because several modeling stages are involved, the accuracy of contract data generation is improved. An integrity check is further performed on the contract data to obtain the target contract data, which improves the integrity of the contract data.

Description of drawings

FIG. 1 is a flow chart of a preferred embodiment of the contract data generation method of the present invention.

FIG. 2 is a functional block diagram of a preferred embodiment of the contract data generation apparatus of the present invention.

FIG. 3 is a schematic structural diagram of an electronic device that implements a preferred embodiment of the contract data generation method of the present invention.

Detailed description

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a flow chart of a preferred embodiment of the contract data generation method of the present invention. Depending on requirements, the order of the steps in the flow chart may be changed and some steps may be omitted.

The contract data generation method is applied in one or more electronic devices. An electronic device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The electronic device may be any electronic product capable of human-computer interaction with a user, for example a personal computer, a tablet computer, a smartphone, a Personal Digital Assistant (PDA), a game console, an Internet Protocol Television (IPTV), a smart wearable device, and the like.

The electronic device may also include a network device and/or user equipment. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.

The network in which the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.

S10: when an original contract information set is received, perform anomaly-removal preprocessing on the original contract information set to obtain a standard contract information set.

In at least one embodiment of the present invention, the original contract information set may include, but is not limited to, one or a combination of the following: information relating to contracts, riders, insurance types and liabilities, as well as segment information, rate-related records, and the like.

In at least one embodiment of the present invention, the electronic device performing anomaly-removal preprocessing on the original contract information set to obtain the standard contract information set comprises:

the electronic device performs an absolute value operation on the data in the original contract information set to obtain a data set, and converts the data in the data set into percentages to obtain the standard contract information set.

Specifically, the electronic device performs the anomaly-removal preprocessing on the original contract information set using the following formula:

Z_a = |Z|%

where Z_a is the correct data after anomaly removal and Z is the original contract information set. After the anomaly-removal preprocessing, the electronic device obtains the standard contract information set.

It will be understood that, for example, when an insurance contract is signed, a rate is typically entered as an integer or a decimal (and is sometimes even entered as a negative number). When processing data, the electronic device can only apply the specified processing to the values it is given; unlike a person, it cannot interpret a number as a percentage on its own. The preprocessing above allows the data in the original contract information set to be processed normally by the electronic device.
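The preprocessing formula Z_a = |Z|% can be illustrated with a short sketch. It assumes that "converting to a percentage" means reading an entered value v as v percent, that is, dividing its absolute value by 100; the source does not spell this out, and `remove_anomalies` is a hypothetical name.

```python
from typing import Iterable, List


def remove_anomalies(raw_rates: Iterable[float]) -> List[float]:
    """Apply Z_a = |Z|% to each entry.

    Assumption: an entered rate of 5 (or -5) is meant as 5%, so the sign is
    dropped and the value is divided by 100.
    """
    return [abs(z) / 100.0 for z in raw_rates]
```

For example, under this assumption the entries 5 and -2.5 would both be normalized to fractional rates (0.05 and 0.025) that downstream code can use directly.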

S11: construct a data analysis function according to a function mapping relationship.

In at least one embodiment of the present invention, when the category cannot be determined from the file name, the traditional BOW (bag-of-words) method is used to classify by content. The classification model can be expressed as a four-tuple M = <D, C, R, T>, where D denotes the file content, C the category, R the data analysis function, and T the classifier function. As a function mapping, this can be written (T·R): D → C.

Specifically, the data analysis function R is defined as follows:

[Formula image BDA0002397166760000071: definition of the data analysis function R; not reproduced in this text]

where the symbol shown in formula image BDA0002397166760000072 (not reproduced in this text) denotes the set of all features in the standard contract information set that carry the configuration category attribute L_j, W_T denotes the set of feature words contained in the file name of the original contract information set, R_T(D) denotes the applied data analysis method, and R_B(D) denotes the BOW representation of the file content.

Representing the file name of the standard contract information set by R_T(D) effectively reduces the influence of noisy features in the file on the classification result. The file content of the standard contract information set is then represented by R_B(D), the T function performs the classification operation over all configuration categories, and the results are stored in the corresponding data tables to obtain the corresponding contract table list set.
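The choice between the file-name representation and the content representation can be sketched as below. Because the defining formula for R is not available in this text, the switching rule used here (use the file-name feature words when any of them belongs to a known category feature set, otherwise fall back to a bag-of-words count of the content) is an assumption based on the surrounding prose; `tokenize`, `represent` and the sample data are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple, Union

Representation = Tuple[str, Union[Set[str], Counter]]


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def represent(file_name: str, content: str,
              category_features: Dict[str, Set[str]]) -> Representation:
    """Return a file-name representation if possible, else a bag of words."""
    w_t = set(tokenize(file_name))                      # feature words in the name
    known = set().union(*category_features.values())    # all category features
    hits = w_t & known
    if hits:
        return ("file_name", hits)                      # plays the role of R_T(D)
    return ("bow", Counter(tokenize(content)))          # plays the role of R_B(D)
```

A classifier T would then consume whichever representation is returned, so that T composed with R maps file content to a category.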

S12: perform identification, classification and storage processing on the standard contract information set according to the data analysis function to obtain a corresponding contract table list set.

In at least one embodiment of the present invention, the electronic device performing identification, classification and storage processing on the standard contract information set according to the data analysis function to obtain the corresponding contract table list set comprises:

the electronic device performs feature extraction on the standard contract information set based on a feature selection algorithm and classifies the data in the standard contract information set using the extracted features to obtain candidate categories. The electronic device then uses the classifier corresponding to the data analysis function to determine, from the candidate categories, the category of the data in the standard contract information set, obtaining the corresponding contract table list set.

Specifically, the electronic device adopts the TCSR classification algorithm: it identifies the data features in the standard contract information set, uses the semantic class information to which those features belong to predict the subject category of the standard contract information set, and then confirms the predicted category with a trained classifier. That is, the TCRS classification algorithm first performs feature extraction on the standard contract information set and then classifies with the classifier.

First, the electronic device performs feature extraction on the standard contract information set.

Specifically, the electronic device performs the feature extraction using the feature scoring function of a category-information-based feature selection algorithm (Constructive Approach Feature Selection, CAFS). The feature scoring function is an effective way to obtain data features and can be formalized as a mapping F_s: T → S_H, where T is the feature space, S_H is the set of feature values, and F_s is the feature scoring function. F_s generally evaluates a feature in terms of dispersion and concentration: dispersion describes how a data feature is distributed within a class, and concentration describes how the feature differs between classes.

The feature extraction algorithm introduces a feature category contribution function (formula image BDA0002397166760000081, not reproduced in this text) together with a variance mechanism to measure the importance of features, and selects features according to that importance. For a feature W_i ∈ T, i = 1, …, n, and a category j, j = 1, …, m, the feature category contribution function (formula image BDA0002397166760000082) is defined as follows:

[Formula image BDA0002397166760000083: definition of the feature category contribution function; not reproduced in this text]

The quantity shown in formula image BDA0002397166760000084 (not reproduced in this text) is proportional to the frequency with which W_i appears in the standard contract information set C_j and to the uniformity with which W_i is distributed in C_j, and is used to measure the importance of the feature to the category. Here fW_ij = T_ij / P_j, where T_ij is the number of occurrences of W_i in C_j and P_j is the total number of occurrences of a given data item in C_j; dW_ij = d_ij / D_j, where d_ij is the number of files in C_j in which W_i appears and D_j is the number of files in C_j.
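The two ratios defined above, fW_ij = T_ij / P_j and dW_ij = d_ij / D_j, can be computed as in the following sketch. It assumes that each file of category C_j is available as a list of tokens and reads P_j as the total number of token occurrences in C_j, which is one possible interpretation of the text; how the two ratios are combined into the contribution function is not shown because that formula is not available here. `frequency_and_spread` is a hypothetical name.

```python
from typing import Dict, List


def frequency_and_spread(feature: str, files: List[List[str]]) -> Dict[str, float]:
    """Compute fW_ij and dW_ij for one feature W_i within one category C_j.

    `files` holds the tokenized files that belong to C_j.
    """
    t_ij = sum(tokens.count(feature) for tokens in files)   # occurrences of W_i
    p_j = sum(len(tokens) for tokens in files)               # assumed reading of P_j
    d_ij = sum(1 for tokens in files if feature in tokens)   # files containing W_i
    d_j = len(files)                                         # files in C_j
    return {
        "fW": t_ij / p_j if p_j else 0.0,
        "dW": d_ij / d_j if d_j else 0.0,
    }
```

A feature that is both frequent (high fW) and evenly spread over the files of a category (high dW) would, per the text, contribute strongly to that category.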

Further, on the basis of the above formula, the feature scoring function Imp(w_i) is defined as follows:

[Formula image BDA0002397166760000091: definition of the feature scoring function Imp(w_i); not reproduced in this text]

Imp(w_i) evaluates the importance of W_i by computing the variance of the quantity shown in formula image BDA0002397166760000092 (not reproduced in this text). The larger Imp(w_i) is, the more the contribution of W_i differs between classes, and the easier it is to obtain the data features of the standard contract information set. The formula uses an auxiliary definition given in formula image BDA0002397166760000093 (not reproduced in this text).

Through the above implementation, the candidate categories can be determined.
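Variance-based feature scoring of this kind can be sketched as follows. The sketch takes the per-category contribution values of each feature as given input, uses the population variance as the score (whether the source intends population or sample variance is not stated), and keeps the top-k features; `importance`, `select_features` and the sample values are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def importance(contributions: List[float]) -> float:
    """Score a feature by the variance of its contributions across categories."""
    return pvariance(contributions)


def select_features(per_feature: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k features whose contributions differ most between categories."""
    ranked = sorted(per_feature,
                    key=lambda w: importance(per_feature[w]),
                    reverse=True)
    return ranked[:k]
```

A word that contributes equally to every category gets a score of zero and is dropped first, which matches the stated idea that larger differences between classes make a feature more useful.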

Then, the electronic device uses the classifier corresponding to the data analysis function to determine, from the candidate categories, the category of the data in the standard contract information set, obtaining the corresponding contract table list set. Because the classifier function T only needs to classify within the range of the candidate categories, efficiency is high.
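Restricting the classifier to the candidate categories can be illustrated with the sketch below. The scoring function is a deliberately simple keyword overlap standing in for the trained classifier T, which the source does not specify; `KEYWORDS`, `keyword_overlap` and `classify_within_candidates` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical per-category keywords standing in for a trained classifier.
KEYWORDS: Dict[str, Set[str]] = {
    "rate": {"rate", "premium"},
    "liability": {"liability", "cap"},
    "segment": {"segment"},
}


def keyword_overlap(item: str, category: str) -> float:
    """Score an item by how many of the category's keywords it contains."""
    return float(len(set(item.split()) & KEYWORDS[category]))


def classify_within_candidates(item: str, candidates: Iterable[str],
                               score: Callable[[str, str], float]) -> str:
    """Return the best-scoring category, evaluating only the candidates."""
    return max(candidates, key=lambda c: score(item, c))
```

Only the candidate categories are scored, so the cost of the final decision grows with the number of candidates rather than with the total number of configuration categories.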

S13: Perform data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set.

For example, the electronic device combines the data of multiple tables in the corresponding contract table list set, such as the contract table, sub-contract table, sub-contract insurance type table, sub-contract insurance type liability table, insurance type liability segmentation table, segmentation condition table, and rate table, into one reinsurance plan configuration wide table, that is, the contract configuration wide table set.

Here, data warehouse modeling means that the electronic device models in phases, step by step, in order to minimize the impact on the operation of the existing business systems.

Specifically, the phased modeling approach comprises four core steps: requirements analysis, conceptual design, logical design, and physical design. That is, the electronic device first obtains the project requirements and retrieves the corresponding contract table list set from the database; next, it plans the overall design framework of the contract configuration wide table set; then it designs the computation logic and logical associations for table splicing; and finally it implements the framework of the conceptual model and the associations of the logical model.

Specifically, the electronic device performing data warehouse modeling according to the corresponding contract table list set to obtain the contract configuration wide table set includes:

The electronic device determines the requirement data of the current project and, according to the framework and the requirement data, splices the data in the corresponding contract table list set to obtain the contract configuration wide table set.

The electronic device may use the following formula to splice the data in the corresponding contract table list set according to the framework and the requirement data:

Figure BDA0002397166760000101

Here, P_i is the i-th table in the corresponding contract table list set, n is the number of tables in that set, and P is the contract configuration wide table set. The tables of the corresponding contract table list set are thereby spliced together to obtain the contract configuration wide table set.
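
One way to picture the splicing of P_1 … P_n into P is a left-to-right fold of joins. The splicing formula is available only as an image, so the join semantics (a left join on a shared key) and the key name `contract_id` are assumptions, and the tables are hypothetical.

```python
from functools import reduce


def splice(left, right, key="contract_id"):
    """Left-join two tables (lists of dict rows) on `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for row in left:
        matches = index.get(row[key])
        if not matches:
            out.append(dict(row))                     # keep unmatched left rows as they are
        else:
            out.extend({**row, **m} for m in matches)  # one wide row per match
    return out


def build_wide_table(tables, key="contract_id"):
    """Fold P_1 ... P_n into a single wide table P; `tables` must be non-empty."""
    return reduce(lambda acc, table: splice(acc, table, key), tables)


# Hypothetical contract, sub-contract, and rate tables.
tables = [
    [{"contract_id": 1, "contract_name": "Treaty A"}],
    [{"contract_id": 1, "sub_contract": "S1"}, {"contract_id": 1, "sub_contract": "S2"}],
    [{"contract_id": 1, "rate": 0.05}],
]
wide = build_wide_table(tables)
```

Each row of `wide` then carries the contract, sub-contract, and rate columns side by side, which is the shape a wide table is meant to have.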

S14: Generate contract data using the contract configuration wide table set.

In at least one embodiment of the present invention, the contract configuration wide table set contains all the information that a contract should have, with all of that information already laid out.

The contract configuration wide table set is stored in a designated database. When the electronic device needs to use the contract configuration wide table set, it can connect to the database through JDBC (Java DataBase Connectivity) and then use SQL (Structured Query Language) query statements to extract the contract configuration wide table set from the database one item at a time, generating the contract data.
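
The connect-then-query flow can be sketched as follows. The source names JDBC; this sketch swaps in Python's standard `sqlite3` module with an in-memory database purely for illustration. The table name `reinsurance_wide` and its columns are hypothetical.

```python
import sqlite3


def load_contract_data(conn, table):
    """Read the wide table row by row and return one dict per row.

    `table` is interpolated into the SQL text, so it must come from trusted
    configuration; identifiers cannot be bound as query parameters.
    """
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY contract_id')
    return [dict(row) for row in cursor]


# Hypothetical wide table standing in for the designated database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reinsurance_wide (contract_id INTEGER, insurance_type TEXT, rate REAL)"
)
conn.executemany(
    "INSERT INTO reinsurance_wide VALUES (?, ?, ?)",
    [(2, "health", 0.035), (1, "life", 0.05)],
)
contract_data = load_contract_data(conn, "reinsurance_wide")
```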

The contract data can be displayed directly, in table form, on the display of the electronic device.

S15: Perform an integrity check on the contract data to obtain target contract data.

In at least one embodiment of the present invention, the electronic device performing an integrity check on the contract data to obtain the target contract data includes:

The electronic device obtains new-product contract data from the contract data and verifies the new-product contract data; and/or the electronic device obtains contract data of the same type from the contract data and verifies the contract data of the same type.

With the above implementation, the contract data can be verified quickly and accurately, which improves efficiency.

In at least one embodiment of the present invention, the electronic device performing an integrity check on the contract data to obtain the target contract data specifically includes:

The electronic device performs the integrity check on the contract data using a memory integrity verification method based on a hash tree hot-spot window, obtaining the target contract data.

Specifically, the electronic device adopts the integrity verification method of the hash tree hot-spot window, with a PC (personal computer) system as the implementation environment. Because an ordinary CPU lacks the functions required for verification, the electronic device introduces an additional "verifier" as the trusted computing base for verification. The verifier provides functions such as access to the contract data, hash computation, and caching.

Further, when verifying the contract data, the electronic device first uses a bus coupler to couple the contract data onto the system's memory bus and sorts the windows by access frequency. It then determines the position of the current hot-spot window and, when the accessed cluster region changes, generates a new window sequence and directs the hot-spot window to shift. Next, it computes the required hash values with the standard MD5 message-digest algorithm. Finally, it introduces the concept of "weak offline verification" through a transaction queue. Because a verification result is always produced later than the access to the corresponding contract data, when contract-generation access transactions are issued back to back the verification process would force the system to delay initializing the next access transaction until the integrity check of the current access has finished. For the contract data, if the interval between accesses is greater than the delay introduced by verification, there is no impact; otherwise, buffering several access transactions allows a degree of "offline" operation, so that the CPU need not wait for verification to complete each time it issues several consecutive access transactions, which reduces the impact of the verification delay. The target contract data is obtained once the repeated verifications are complete.
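
A software-only sketch of two of the ideas above, a hash tree built with MD5 and a transaction queue that defers verification, is given below. It is a simplified illustration rather than the hardware verifier described: the hot-spot window and bus coupler are omitted, and the class name, the `max_pending` parameter, and the odd-node duplication rule are assumptions. MD5 appears only because the text names it.

```python
import hashlib
from collections import deque


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def hash_tree_root(blocks):
    """Root of a binary hash tree over the data blocks, with MD5 at every node."""
    level = [md5(b) for b in blocks] or [md5(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed rule: duplicate the last node on odd levels
        level = [md5(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class DeferredVerifier:
    """Buffers access transactions and verifies them in batches ("weak offline")."""

    def __init__(self, trusted_root, max_pending=4):
        self.trusted_root = trusted_root
        self.max_pending = max_pending
        self.pending = deque()  # the transaction queue

    def access(self, blocks):
        """Record an access without blocking; verify only once the queue is full."""
        self.pending.append(list(blocks))
        if len(self.pending) >= self.max_pending:
            return self.flush()
        return True

    def flush(self):
        """Verify every buffered access; True only if all of them match the trusted root."""
        ok = True
        while self.pending:
            ok = (hash_tree_root(self.pending.popleft()) == self.trusted_root) and ok
        return ok


blocks = [b"contract-1", b"rate-0.05", b"segment-A"]
verifier = DeferredVerifier(hash_tree_root(blocks), max_pending=2)
first = verifier.access(blocks)                                         # buffered only
second = verifier.access([b"contract-1", b"rate-0.50", b"segment-A"])   # triggers the batch check
```

The first access returns immediately because nothing is verified yet; the tampered second access fills the queue, and the batch check then reports the mismatch.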

As can be seen from the above technical solution, when an original contract information set is received, the present invention performs anomaly-removal preprocessing on it to obtain a standard contract information set, constructs a data analysis function according to a function mapping relationship, and uses that function to identify, classify, and store the standard contract information set, obtaining a corresponding contract table list set. It then performs data warehouse modeling on the corresponding contract table list set to obtain a contract configuration wide table set and uses that set to generate contract data; because multiple modeling processes are involved, the accuracy of contract data generation is improved. Finally, an integrity check is performed on the contract data to obtain target contract data, which improves the integrity of the contract data.

FIG. 2 is a functional block diagram of a preferred embodiment of the contract data generation apparatus of the present invention. The contract data generation apparatus 11 includes a preprocessing unit 110, a construction unit 111, a processing unit 112, a modeling unit 113, a generation unit 114, and a verification unit 115. A module/unit as referred to in the present invention is a series of computer program segments that can be executed by the processor 13, performs a fixed function, and is stored in the memory 12. The functions of each module/unit are detailed in the following embodiments.

When an original contract information set is received, the preprocessing unit 110 performs anomaly-removal preprocessing on the original contract information set to obtain a standard contract information set.

In at least one embodiment of the present invention, the original contract information set may include, but is not limited to, one or a combination of the following: information on contracts, riders, insurance types, and liabilities, as well as segmentation information and rate-related records.

In at least one embodiment of the present invention, the preprocessing unit 110 performing anomaly-removal preprocessing on the original contract information set to obtain the standard contract information set includes:

The preprocessing unit 110 takes the absolute value of the data in the original contract information set to obtain a data set, and converts the data in the data set into percentages to obtain the standard contract information set.

Specifically, the preprocessing unit 110 uses the following formula to perform the anomaly-removal preprocessing on the original contract information set:

Z_a = |Z|%

Here, Z_a is the corrected data after anomaly removal and Z is the original contract information set. After the anomaly-removal preprocessing, the preprocessing unit 110 obtains the standard contract information set.

For example, when an insurance contract is signed, the rate is typically entered as a plain integer or decimal (and sometimes, by mistake, as a negative number). The preprocessing unit 110 can only apply the specified processing to the data it is given; unlike a person, it cannot infer that the number is meant as a percentage. The preprocessing above therefore makes the data in the original contract information set processable by the electronic device.
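
The formula Z_a = |Z|% amounts to two operations, shown in the minimal sketch below. Reading "%" as division by 100 is an assumption about the intended arithmetic, and the function name and sample values are hypothetical.

```python
def normalize_rate(z):
    """Z_a = |Z|%: drop an erroneous sign, then read the number as a percentage."""
    return abs(z) / 100


raw_rates = [5, -3.5, 0.8]  # rates as typed into the contract form
standard_rates = [normalize_rate(z) for z in raw_rates]
```

A rate entered as 5 or as -5 thus becomes the same fraction, which downstream calculations can use directly.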

The construction unit 111 constructs the data analysis function according to the function mapping relationship.

In at least one embodiment of the present invention, when the category cannot be determined from the file name, the traditional BOW (bag-of-words) method is used to classify by content. This classification model can be represented by a four-tuple M = <D, C, R, T>, where D denotes the file content, C the categories, R the data analysis function, and T the classifier function. As a function mapping, this can be expressed as (T·R): D → C.

Specifically, the data analysis function R is defined as follows:

Figure BDA0002397166760000131

where

Figure BDA0002397166760000132

denotes the set of all features in the standard contract information set that carry the configuration category attribute L_j, W_T denotes the set of feature words contained in the file names of the original contract information set, R_T(D) denotes applying the data analysis method, and R_B(D) denotes the BOW file-content representation method.

Representing the file name of the standard contract information set through R_T(D) effectively reduces the influence of noise features in the file on the classification result. The file content of the standard contract information set is then represented through R_B(D), and the T function performs the classification within the range of all configuration categories; the results are stored in the corresponding data tables to obtain the corresponding contract table list set.
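
The composition (T·R): D → C can be sketched as a file-name check with a content fallback. The definition of R is available only as an image, so the rule used here is an assumption: use the file-name representation when it points to exactly one category, otherwise fall back to a bag-of-words representation of the content. BOW is read as bag-of-words, and the tokenizer, scoring, and feature sets are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def r_t(file_name, category_features):
    """File-name representation: the categories whose features occur in the file name."""
    words = set(tokenize(file_name))
    return {c for c, feats in category_features.items() if feats & words}


def r_b(content):
    """Bag-of-words representation of the file content."""
    return Counter(tokenize(content))


def classify(file_name, content, category_features):
    """T applied to R(D): trust an unambiguous file name, else score the content."""
    by_name = r_t(file_name, category_features)
    if len(by_name) == 1:
        return next(iter(by_name))
    bow = r_b(content)
    return max(category_features, key=lambda c: sum(bow[w] for w in category_features[c]))


# Hypothetical feature sets per configuration category.
category_features = {
    "rate": {"rate", "premium"},
    "liability": {"liability", "clause"},
}
from_name = classify("rate_table_2020.csv", "", category_features)
from_content = classify("doc1.txt", "liability clause and liability limit", category_features)
```

The first file is classified from its name alone, so noisy content cannot affect the result; the second has an uninformative name and is classified from its word counts.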

The processing unit 112 identifies, classifies, and stores the standard contract information set according to the data analysis function to obtain the corresponding contract table list set.

In at least one embodiment of the present invention, the processing unit 112 identifying, classifying, and storing the standard contract information set according to the data analysis function to obtain the corresponding contract table list set includes:

The processing unit 112 performs feature extraction on the standard contract information set based on a feature selection algorithm and uses the extracted features to classify the data in the standard contract information set, obtaining candidate categories. The processing unit 112 then uses the classifier corresponding to the data analysis function to determine, from among the candidate categories, the category of the data in the standard contract information set, and obtains the corresponding contract table list set.

Specifically, the processing unit 112 adopts the TCSR classification algorithm: it identifies the data features in the standard contract information set, uses the semantic class information to which those features belong to predict the subject category of the standard contract information set, and then confirms the predicted category with a trained classifier. That is, the TCSR classification algorithm first performs feature extraction on the standard contract information set and then classifies with the classifier.

First, the processing unit 112 performs feature extraction on the standard contract information set.

Specifically, the processing unit 112 performs the feature extraction with the feature scoring function of a category-information-based feature selection algorithm (Constructive Approach Feature Selection, CAFS). The feature scoring function is an effective way to obtain data features and can be expressed formally as a mapping F_s: T → S_H, where T is the feature space, S_H is the set of feature values, and F_s is the feature scoring function. The F_s function generally evaluates features from two aspects, dispersion and concentration: dispersion describes how a data feature is distributed within a given class, and concentration describes how the feature differs between classes.

In the feature extraction algorithm, a feature category contribution function

Figure BDA0002397166760000141

and a variance mechanism are introduced to measure the importance of features, and features are selected according to their importance. The feature category contribution function

Figure BDA0002397166760000142

(with features W_i ∈ T, i = 1, …, n, and categories j = 1, …, m) is defined as follows:

Figure BDA0002397166760000143

where

Figure BDA0002397166760000144

is proportional to the frequency with which W_i appears in the standard contract information set C_j, and to the uniformity of the distribution of W_i within C_j; it measures the importance of the feature to the category. Here fW_ij = T_ij / P_j, where T_ij is the number of occurrences of W_i in C_j and P_j is the total number of occurrences of a given data item in C_j; and dW_ij = d_ij / D_j, where d_ij is the number of files in C_j that contain W_i and D_j is the number of files in C_j.

Further, building on the above formula, the feature scoring function Imp(W_i) is defined as follows:

Figure BDA0002397166760000151

Imp(W_i) evaluates the importance of W_i by computing the variance of

Figure BDA0002397166760000152

The larger Imp(W_i) is, the more the contribution of W_i differs between classes, and the easier it is to obtain the data features of the standard contract information set. In the formula:

Figure BDA0002397166760000153

Through the above implementation, the candidate categories can be determined.

Then, using the classifier corresponding to the data analysis function, the processing unit 112 determines, from among the candidate categories, the category of the data in the standard contract information set, and obtains the corresponding contract table list set. Because the classifier function T only needs to classify within the range of the candidate categories, efficiency is high.

The modeling unit 113 performs data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set.

For example, the modeling unit 113 combines the data of multiple tables in the corresponding contract table list set, such as the contract table, sub-contract table, sub-contract insurance type table, sub-contract insurance type liability table, insurance type liability segmentation table, segmentation condition table, and rate table, into one reinsurance plan configuration wide table, that is, the contract configuration wide table set.

Here, data warehouse modeling means that the modeling unit 113 models in phases, step by step, in order to minimize the impact on the operation of the existing business systems.

Specifically, the phased modeling approach comprises four core steps: requirements analysis, conceptual design, logical design, and physical design. That is, the modeling unit 113 first obtains the project requirements and retrieves the corresponding contract table list set from the database; next, it plans the overall design framework of the contract configuration wide table set; then it designs the computation logic and logical associations for table splicing; and finally it implements the framework of the conceptual model and the associations of the logical model.

Specifically, the modeling unit 113 performing data warehouse modeling according to the corresponding contract table list set to obtain the contract configuration wide table set includes:

The modeling unit 113 determines the requirement data of the current project and, according to the framework and the requirement data, splices the data in the corresponding contract table list set to obtain the contract configuration wide table set.

The modeling unit 113 may use the following formula to splice the data in the corresponding contract table list set according to the framework and the requirement data:

Figure BDA0002397166760000161

Here, P_i is the i-th table in the corresponding contract table list set, n is the number of tables in that set, and P is the contract configuration wide table set. The tables of the corresponding contract table list set are thereby spliced together to obtain the contract configuration wide table set.

The generation unit 114 generates contract data using the contract configuration wide table set.

In at least one embodiment of the present invention, the contract configuration wide table set contains all the information that a contract should have, with all of that information already laid out.

The contract configuration wide table set is stored in a designated database. When the generation unit 114 needs to use the contract configuration wide table set, it can connect to the database through JDBC (Java DataBase Connectivity) and then use SQL (Structured Query Language) query statements to extract the contract configuration wide table set from the database one item at a time, generating the contract data.

The contract data can be displayed directly, in table form, on the display of the electronic device.

The verification unit 115 performs an integrity check on the contract data to obtain target contract data.

In at least one embodiment of the present invention, the verification unit 115 performing an integrity check on the contract data to obtain the target contract data includes:

The verification unit 115 obtains new-product contract data from the contract data and verifies the new-product contract data; and/or the verification unit 115 obtains contract data of the same type from the contract data and verifies the contract data of the same type.

With the above implementation, the contract data can be verified quickly and accurately, which improves efficiency.

In at least one embodiment of the present invention, the verification unit 115 performing an integrity check on the contract data to obtain the target contract data specifically includes:

The verification unit 115 performs the integrity check on the contract data using a memory integrity verification method based on a hash tree hot-spot window, obtaining the target contract data.

Specifically, the verification unit 115 adopts the integrity verification method of the hash tree hot-spot window, with a PC (personal computer) system as the implementation environment. Because an ordinary CPU lacks the functions required for verification, the verification unit 115 introduces an additional "verifier" as the trusted computing base for verification. The verifier provides functions such as access to the contract data, hash computation, and caching.

Further, when verifying the contract data, the verification unit 115 first uses a bus coupler to couple the contract data onto the system's memory bus and sorts the windows by access frequency. It then determines the position of the current hot-spot window and, when the accessed cluster region changes, generates a new window sequence and directs the hot-spot window to shift. Next, it computes the required hash values with the standard MD5 message-digest algorithm. Finally, it introduces the concept of "weak offline verification" through a transaction queue. Because a verification result is always produced later than the access to the corresponding contract data, when contract-generation access transactions are issued back to back the verification process would force the system to delay initializing the next access transaction until the integrity check of the current access has finished. For the contract data, if the interval between accesses is greater than the delay introduced by verification, there is no impact; otherwise, buffering several access transactions allows a degree of "offline" operation, so that the CPU need not wait for verification to complete each time it issues several consecutive access transactions, which reduces the impact of the verification delay. The target contract data is obtained once the repeated verifications are complete.

由以上技术方案可以看出,本发明能够当接收到原始合约信息集时,对所述原始合约信息集进行去异常预处理,得到标准合约信息集,进一步根据函数映射关系构建数据分析函数,再根据所述数据分析函数对所述标准合约信息集进行识别分类存储处理,得到对应合同表清单集,并根据所述对应合同表清单集进行数据仓库建模,得到合同配置宽表集,并利用所述合同配置宽表集生成合同数据,由于涉及多个建模过程,提高了合同数据生成的准确率,进一步对所述合同数据进行完整性校验,得到目标合同数据,提高了合同数据的完整性。It can be seen from the above technical solutions that when the original contract information set is received, the present invention can perform de-abnormal preprocessing on the original contract information set to obtain a standard contract information set, and further construct a data analysis function according to the function mapping relationship, and then further. Identify, classify and store the standard contract information set according to the data analysis function to obtain a corresponding contract table list set, and perform data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set, and use The contract configuration wide table set generates contract data. Since multiple modeling processes are involved, the accuracy rate of contract data generation is improved, and the integrity of the contract data is further verified to obtain the target contract data, which improves the accuracy of the contract data. completeness.

如图3所示,是本发明实现合同数据生成方法的较佳实施例的电子设备的结构示意图。As shown in FIG. 3 , it is a schematic structural diagram of an electronic device according to a preferred embodiment of the method for generating contract data according to the present invention.

所述电子设备1可以包括存储器12、处理器13和总线,还可以包括存储在所述存储器12中并可在所述处理器13上运行的计算机程序,例如合同数据生成程序。The electronic device 1 may include a memory 12, a processor 13 and a bus, and may also include a computer program stored in the memory 12 and executable on the processor 13, such as a contract data generation program.

本领域技术人员可以理解,所述示意图仅仅是电子设备1的示例,并不构成对电子设备1的限定,所述电子设备1既可以是总线型结构,也可以是星形结构,所述电子设备1还可以包括比图示更多或更少的其他硬件或者软件,或者不同的部件布置,例如所述电子设备1还可以包括输入输出设备、网络接入设备等。Those skilled in the art can understand that the schematic diagram is only an example of the electronic device 1, and does not constitute a limitation on the electronic device 1. The electronic device 1 can be either a bus-type structure or a star-shaped structure. The device 1 may also include more or less other hardware or software than shown, or different component arrangements, for example, the electronic device 1 may also include input and output devices, network access devices, and the like.

需要说明的是,所述电子设备1仅为举例,其他现有的或今后可能出现的电子产品如可适应于本发明,也应包含在本发明的保护范围以内,并以引用方式包含于此。It should be noted that the electronic device 1 is only an example. If other existing or future electronic products can be adapted to the present invention, they should also be included within the protection scope of the present invention, and are incorporated herein by reference. .

其中,存储器12至少包括一种类型的可读存储介质,所述可读存储介质包括闪存、移动硬盘、多媒体卡、卡型存储器(例如:SD或DX存储器等)、磁性存储器、磁盘、光盘等。存储器12在一些实施例中可以是电子设备1的内部存储单元,例如该电子设备1的移动硬盘。存储器12在另一些实施例中也可以是电子设备1的外部存储设备,例如电子设备1上配备的插接式移动硬盘、智能存储卡(Smart Media Card,SMC)、安全数字(Secure Digital,SD)卡、闪存卡(Flash Card)等。进一步地,存储器12还可以既包括电子设备1的内部存储单元也包括外部存储设备。存储器12不仅可以用于存储安装于电子设备1的应用软件及各类数据,例如合同数据生成程序的代码等,还可以用于暂时地存储已经输出或者将要输出的数据。Wherein, the memory 12 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. . The memory 12 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 . In other embodiments, the memory 12 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) equipped on the electronic device 1 ) card, flash memory card (Flash Card) and so on. Further, the memory 12 may also include both an internal storage unit of the electronic device 1 and an external storage device. The memory 12 can be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the contract data generation program, etc., but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 13 may be composed of integrated circuits: either a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 13 is the control core (Control Unit) of the electronic device 1. It connects the components of the electronic device 1 through various interfaces and lines, and performs the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 12 (for example, the contract data generation program) and by calling the data stored in the memory 12.

The processor 13 executes the operating system of the electronic device 1 and the installed application programs. The processor 13 executes the application program to implement the steps of the contract data generation method embodiments described above, such as steps S10, S11, S12, S13, S14, and S15 shown in FIG. 1.

Alternatively, when the processor 13 executes the computer program, it implements the functions of the modules/units in the apparatus embodiments described above, for example:

when an original contract information set is received, performing anomaly-removal preprocessing on the original contract information set to obtain a standard contract information set;

constructing a data analysis function according to a function mapping relationship;

performing identification, classification, and storage processing on the standard contract information set according to the data analysis function to obtain a corresponding contract table list set;

performing data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set;

generating contract data using the contract configuration wide table set; and

performing an integrity check on the contract data to obtain target contract data.
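The six steps listed above can be read as a linear pipeline in which each stage consumes the output of the previous one. The following is a minimal sketch of that data flow only, not the implementation of the present invention: the class name `ContractPipeline` and its field names are hypothetical, and the concrete logic of every stage is left to the caller because the text does not fix it.

```python
from dataclasses import dataclass
from typing import Any, Callable

# A stage maps one intermediate result to the next (hypothetical alias).
Step = Callable[[Any], Any]


@dataclass
class ContractPipeline:
    """Hypothetical wiring of the six steps; stage bodies are supplied by the caller."""
    preprocess: Step                    # original set -> standard contract information set
    build_analyzer: Callable[[], Step]  # builds the data analysis function
    model_warehouse: Step               # contract table list set -> wide table set
    generate: Step                      # wide table set -> contract data
    verify: Step                        # contract data -> target contract data

    def run(self, original_set: Any) -> Any:
        standard_set = self.preprocess(original_set)
        analyzer = self.build_analyzer()
        # The analysis function classifies the standard set into table lists.
        table_lists = analyzer(standard_set)
        wide_tables = self.model_warehouse(table_lists)
        contract_data = self.generate(wide_tables)
        return self.verify(contract_data)
```

Passing the stages in as callables keeps the sketch neutral about how each unit is realized; any of them could be replaced without touching the order of the steps.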

By way of example, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the computer program in the electronic device 1. For example, the computer program may be divided into a preprocessing unit 110, a construction unit 111, a processing unit 112, a modeling unit 113, a generating unit 114, and a verification unit 115.

The integrated units described above, when implemented in the form of software functional modules, may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a computer device, a network device, or the like) or a processor to execute part of the methods described in the embodiments of the present invention.

If the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may also be carried out by a computer program that instructs the relevant hardware devices. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above.

The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, or read-only memory (ROM).

The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one arrow is shown in FIG. 3, but this does not mean that there is only one bus or one type of bus. The bus is arranged to enable connection and communication between the memory 12, the at least one processor 13, and other components.

Although not shown, the electronic device 1 may also include a power source (such as a battery) that supplies power to the components. Preferably, the power source is logically connected to the at least one processor 13 through a power management device, so that charge management, discharge management, and power consumption management are implemented through the power management device. The power source may also include any of one or more DC or AC power supplies, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and similar components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described further here.

Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is typically used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further include a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display information processed in the electronic device 1 and to display a visual user interface.

It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.

FIG. 3 shows only the electronic device 1 with components 12-13. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.

With reference to FIG. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement a contract data generation method, and the processor 13 can execute the plurality of instructions to implement:

when an original contract information set is received, performing anomaly-removal preprocessing on the original contract information set to obtain a standard contract information set;

constructing a data analysis function according to a function mapping relationship;

performing identification, classification, and storage processing on the standard contract information set according to the data analysis function to obtain a corresponding contract table list set;

performing data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set;

generating contract data using the contract configuration wide table set; and

performing an integrity check on the contract data to obtain target contract data.
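As an illustration of the first instruction above, claim 2 describes the preprocessing as taking the absolute value of the data and then converting the result into percentages. A minimal sketch follows. The function name `to_standard_set` is hypothetical, and the percentage is assumed here to mean each value's share of the total, which the text does not specify.

```python
from typing import List


def to_standard_set(values: List[float]) -> List[float]:
    """Absolute value, then percentage of the total (the 'share of total' reading is an assumption)."""
    magnitudes = [abs(v) for v in values]
    total = sum(magnitudes)
    if total == 0:
        # Nothing to normalize against: report every entry as 0 percent.
        return [0.0 for _ in magnitudes]
    return [100.0 * m / total for m in magnitudes]


# Usage: a negative entry is first made positive, then expressed as a share.
print(to_standard_set([-1, 3]))
```

Under a different reading of "percentage" (for example, multiplying a ratio by 100), only the final return line would change.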

Specifically, for how the processor 13 implements the above instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
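For the data warehouse modeling instruction, claim 5 describes splicing the data in the contract table list set according to the requirement data of the current project to obtain the wide table set. The sketch below shows one way such splicing could look, assuming the tables share a key column and the requirement data reduces to a list of wanted columns. The function `build_wide_table`, the key `contract_id`, and the sample columns are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def build_wide_table(tables: List[List[Row]], key: str, required: List[str]) -> List[Row]:
    """Merge rows that share `key` into one wide row, keeping only the required columns."""
    merged: Dict[object, Row] = {}
    for table in tables:
        for row in table:
            # One wide row per key value, created on first sight.
            wide = merged.setdefault(row[key], {key: row[key]})
            for col in required:
                if col in row:
                    wide[col] = row[col]
    return list(merged.values())


# Usage with two hypothetical narrow tables.
terms = [{"contract_id": 1, "term": "10y"}]
fees = [{"contract_id": 1, "fee": 0.02, "note": "internal"}]
print(build_wide_table([terms, fees], "contract_id", ["term", "fee"]))
```

Columns that are not in the required list, such as `note` here, are dropped, which is one simple way to let the requirement data shape the wide table.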

In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be embodied in other specific forms without departing from its spirit or essential characteristics.

The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. No reference sign in the claims should be construed as limiting the claim concerned.

Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in the system claims may also be implemented by a single unit or device through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention can be modified or equivalently replaced without departing from their spirit and scope.

Claims (10)

1. A contract data generation method, characterized in that the method comprises:
when an original contract information set is received, performing anomaly-removal preprocessing on the original contract information set to obtain a standard contract information set;
constructing a data analysis function according to a function mapping relationship;
performing identification, classification, and storage processing on the standard contract information set according to the data analysis function to obtain a corresponding contract table list set;
performing data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set;
generating contract data using the contract configuration wide table set; and
performing an integrity check on the contract data to obtain target contract data.

2. The contract data generation method according to claim 1, characterized in that performing anomaly-removal preprocessing on the original contract information set to obtain the standard contract information set comprises:
performing an absolute value operation on the data in the original contract information set to obtain a data set; and
converting the data in the data set into percentages to obtain the standard contract information set.

3. The contract data generation method according to claim 1, characterized in that constructing the data analysis function according to the function mapping relationship comprises:
constructing the data analysis function using the following formula:
[Formula image: Figure FDA0002397166750000011]
where R denotes the data analysis function, D denotes the file content, L denotes the category, [symbol image: Figure FDA0002397166750000012] denotes the set of all features in the standard contract information set that carry the configuration category attribute L_j, W_T denotes the set of feature words contained in the file name of the original contract information set, R_T(D) denotes the application data analysis method, and R_B(D) denotes the BOW file content representation method.

4. The contract data generation method according to claim 1, characterized in that performing identification, classification, and storage processing on the standard contract information set according to the data analysis function to obtain the corresponding contract table list set comprises:
performing feature extraction on the standard contract information set based on a feature selection algorithm;
classifying the data in the standard contract information set using the extracted features to obtain candidate categories; and
determining the category of the data in the standard contract information set from the candidate categories using a classifier corresponding to the data analysis function, to obtain the corresponding contract table list set.

5. The contract data generation method according to claim 1, characterized in that performing data warehouse modeling according to the corresponding contract table list set to obtain the contract configuration wide table set comprises:
determining the requirement data of the current project; and
splicing the data in the corresponding contract table list set according to the framework and the requirement data to obtain the contract configuration wide table set.

6. The contract data generation method according to claim 1, characterized in that performing the integrity check on the contract data to obtain the target contract data comprises:
obtaining new product contract data from the contract data and verifying the new product contract data; and/or
obtaining contract data of the same type from the contract data and verifying the contract data of the same type.

7. The contract data generation method according to claim 1, characterized in that performing the integrity check on the contract data to obtain the target contract data comprises:
performing the integrity check on the contract data using a memory integrity verification method based on a hash tree hotspot window, to obtain the target contract data.

8. A contract data generation apparatus, characterized in that the apparatus comprises:
a preprocessing unit, configured to perform anomaly-removal preprocessing on an original contract information set when the original contract information set is received, to obtain a standard contract information set;
a construction unit, configured to construct a data analysis function according to a function mapping relationship;
a processing unit, configured to perform identification, classification, and storage processing on the standard contract information set according to the data analysis function to obtain a corresponding contract table list set;
a modeling unit, configured to perform data warehouse modeling according to the corresponding contract table list set to obtain a contract configuration wide table set;
a generating unit, configured to generate contract data using the contract configuration wide table set; and
a verification unit, configured to perform an integrity check on the contract data to obtain target contract data.

9. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the contract data generation method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the contract data generation method according to any one of claims 1 to 7.
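Claim 7 names a hash-tree-based integrity check. The sketch below shows only the basic hash tree idea that such a check builds on: the contract data is split into blocks, the blocks are hashed pairwise up to a single root, and the data is accepted only if the recomputed root matches a trusted one. It does not implement the hotspot window optimization named in the claim. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from typing import List


def _digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks: List[bytes]) -> bytes:
    """Root hash of a binary hash tree built over the given data blocks."""
    if not blocks:
        return _digest(b"")
    level = [_digest(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last node with itself.
            level.append(level[-1])
        level = [_digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def is_intact(blocks: List[bytes], trusted_root: bytes) -> bool:
    """True when the recomputed root equals the trusted root."""
    return merkle_root(blocks) == trusted_root
```

A change to any single block alters the root, so comparing one short hash is enough to detect tampering anywhere in the contract data.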
CN202010135529.XA 2020-03-02 2020-03-02 Contract data generation method, device, electronic device and storage medium Pending CN111429085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010135529.XA CN111429085A (en) 2020-03-02 2020-03-02 Contract data generation method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010135529.XA CN111429085A (en) 2020-03-02 2020-03-02 Contract data generation method, device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN111429085A true CN111429085A (en) 2020-07-17

Family

ID=71547332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010135529.XA Pending CN111429085A (en) 2020-03-02 2020-03-02 Contract data generation method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111429085A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465117A (en) * 2020-11-25 2021-03-09 平安科技(深圳)有限公司 Contract generation model construction method, device, equipment and storage medium
CN113918553A (en) * 2021-10-22 2022-01-11 深圳市中博科创信息技术有限公司 Data aggregation method and device, electronic equipment and storage medium
CN114493551A (en) * 2022-03-28 2022-05-13 中国光大银行股份有限公司 Contract generation method and device, electronic equipment and storage medium
CN114610718A (en) * 2022-03-08 2022-06-10 华泰证券股份有限公司 Off-site derivative contract element three-level structure storage management method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102376029A (en) * 2010-08-27 2012-03-14 上海宝信软件股份有限公司 Architecting device and method of information object model for decision analysis
CN108885631A (en) * 2016-02-22 2018-11-23 塔塔咨询服务有限公司 Method and system for contract management in a data marketplace
CN109190098A (en) * 2018-08-15 2019-01-11 上海唯识律简信息科技有限公司 A kind of document automatic creation method and system based on natural language processing
CN109933783A (en) * 2019-01-31 2019-06-25 华融融通(北京)科技有限公司 A kind of essence of a contract method of non-performing asset operation field


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
侯方勇, 王志英, 刘真: "基于Hash树热点窗口的存储器完整性校验方法", 计算机学报, no. 11, 12 November 2004 (2004-11-12), pages 1471 - 1479 *
王强 等;: "基于标题类别语义识别的文本分类算法研究", 电子与信息学报, vol. 29, no. 12, 31 December 2007 (2007-12-31), pages 2885 - 2889 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465117A (en) * 2020-11-25 2021-03-09 平安科技(深圳)有限公司 Contract generation model construction method, device, equipment and storage medium
CN112465117B (en) * 2020-11-25 2024-05-07 平安科技(深圳)有限公司 Contract generation model construction method, device, equipment and storage medium
CN113918553A (en) * 2021-10-22 2022-01-11 深圳市中博科创信息技术有限公司 Data aggregation method and device, electronic equipment and storage medium
CN114610718A (en) * 2022-03-08 2022-06-10 华泰证券股份有限公司 Off-site derivative contract element three-level structure storage management method and device
CN114610718B (en) * 2022-03-08 2025-02-25 华泰证券股份有限公司 A method and device for storing and managing three-level structures of over-the-counter derivative contract elements
CN114493551A (en) * 2022-03-28 2022-05-13 中国光大银行股份有限公司 Contract generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2022160449A1 (en) Text classification method and apparatus, electronic device, and storage medium
CN113961584B (en) Field lineage analysis method, device, electronic device and storage medium
WO2021212683A1 (en) Law knowledge map-based query method and apparatus, and electronic device and medium
CN111429085A (en) Contract data generation method, device, electronic device and storage medium
CN112052370A (en) Message generation method and device, electronic equipment and computer readable storage medium
CN111428458A (en) Universal report generation method and device and computer readable storage medium
CN112732567B (en) Mock data testing method and device based on ip, electronic equipment and storage medium
CN112347042A (en) File uploading method and device, electronic equipment and storage medium
CN113961473A (en) Data testing method and device, electronic equipment and computer readable storage medium
CN112597135A (en) User classification method and device, electronic equipment and readable storage medium
CN113868528A (en) Information recommendation method, device, electronic device and readable storage medium
CN118568256B (en) Method and device for evaluating text classification performance of large language model
CN114491047A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN113434542B (en) Data relationship identification method and device, electronic equipment and storage medium
CN111651292A (en) Data verification method, apparatus, electronic device and computer-readable storage medium
CN114881616A (en) Business process execution method and device, electronic equipment and storage medium
CN114708461A (en) Multi-modal learning model-based classification method, device, equipment and storage medium
CN111859985B (en) AI customer service model test method and device, electronic equipment and storage medium
CN113051171A (en) Interface test method, device, equipment and storage medium
CN114968816A (en) Strategy testing method, device, equipment and storage medium based on data simulation
CN113592606B (en) Product recommendation method, device, equipment and storage medium based on multiple decisions
CN112667244B (en) Data verification method, device, electronic equipment and computer readable storage medium
CN114003720A (en) Business document classification method, device, equipment and storage medium
CN114978964A (en) Communication announcement configuration method, device, equipment and medium based on network self-checking
CN112948705A (en) Intelligent matching method, device and medium based on policy big data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination