CN112465393B

CN112465393B - Enterprise risk early warning method based on correlation analysis FP-Tree algorithm

Info

Publication number: CN112465393B
Application number: CN202011461438.1A
Authority: CN
Inventors: 吴志雄; 甘建武; 李晓琼; 黄鼎
Original assignee: Linewell Software Co Ltd
Current assignee: Linewell Software Co Ltd
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2022-07-08
Anticipated expiration: 2040-12-09
Also published as: CN112465393A; WO2022121083A1

Abstract

The invention relates to an enterprise risk early-warning method based on the FP-Tree association analysis algorithm. An enterprise index data set is constructed; a mutual entropy-interval set method is then used to bin the indices and a chi-square test is used to screen them by correlation; finally, the FP-Tree association analysis algorithm is used to issue enterprise risk early warnings. The method can analyze enterprise risk from a single index and can also combine two or more indices to mine enterprise risk, so that the risks present in an enterprise are uncovered more comprehensively.

Description

Enterprise risk early warning method based on correlation analysis FP-Tree algorithm

Technical Field

The invention belongs to the field of enterprise risk early warning, and particularly relates to an enterprise risk early warning method based on an FP-Tree algorithm of correlation analysis.

Background

Enterprise activities are comprehensive social activities that integrate economic, technical, managerial, organizational and other aspects, and every aspect involves uncertainty. Enterprise risk early warning establishes a risk assessment system in order to pre-control risks, defuse them before they materialize, and minimize the losses they cause. Analyzing and managing the risks of enterprise activities, preventing and defusing risks, and keeping the resulting losses to a minimum has become one of the important measures for safeguarding enterprise operations and maximizing benefit. The enterprise risk early-warning index system is the yardstick and an important basis for measuring the financial risk condition of an enterprise. Building a risk early-warning index system that fits the characteristics of enterprises follows these basic principles: (1) comprehensiveness; (2) scientific rigor; (3) goal orientation; (4) representativeness; (5) operability; (6) fairness.

In the prior art, enterprise risks are divided into internal risks and external risks, and the risk classification method comprises four risk comprehensive indexes: financial, technical, business, and strategic.

(1) Financial risk factor: including liquidity, financing, investment, compensation, profit, asset utilization, growth, etc.

(2) Technical risk factor: including trademarks, patents, software copyrights, works, key technologies, etc.

(3) Operating risk factors: including jurisdictions, business anomalies, administrative penalties, and the like.

(4) Strategic risk factors: including contests, business associations, development history, etc.

Currently, enterprise risk early warning mostly proceeds as follows. For external environment risks, the six-force analysis model is used to analyze the competitive environment of the enterprise. For internal environment risks, an index system consisting mainly of financial, technical, operational and strategic risk factors is built, drawing on domestic and foreign research literature and on the availability of data; common rating methods include the judgment analysis method, the comprehensive evaluation method and the fuzzy analysis method. Finally, early-warning intervals are set according to the rating result and corresponding countermeasures are taken.

Prior-art early warning analyzes either single index data or the index data as a whole. Enterprises often lack the underlying professional knowledge, enterprise data is high-dimensional and voluminous, and current risk early warning takes a long time to acquire, update, process and analyze information and cannot be processed dynamically. As a result, the timeliness of risk early warning is seriously impaired, and enterprise risk early warnings lag considerably.

Disclosure of Invention

The invention aims to provide an enterprise risk early warning method based on a correlation analysis FP-Tree algorithm, which not only can analyze enterprise risks from single index data, but also can integrate two or more index data to mine enterprise risks, and mine the risks existing in the enterprises more comprehensively.

In order to achieve the purpose, the technical scheme of the invention is as follows: an enterprise risk early warning method based on an association analysis FP-Tree algorithm comprises the following steps:

step S1, based on historical enterprise behavior data, analyzing the yardsticks and important bases for measuring the enterprise risk condition, and designing a risk index system $X = \{x_1, x_2, \dots, x_i\}$, where $x_i$ denotes the name of the $i$-th index of the risk index system;

step S2, according to the risk index system, big data analysis is applied to form risk rules, that is, when the values of one or more indices equal a preset value or fall within a preset interval, the enterprise is considered to have the corresponding risk, which gives the risk rule set B:

$B = \{X_1: risk_1, X_2: risk_2, \dots, X_b: risk_b\}$,

wherein $X_k$ is a subset of the index system $X$, and $risk_k$ is the risk text description inferred from analyzing $X_k$;

step S3, acquiring enterprise-related behavior data, and constructing the training index data set of the enterprise risk early-warning model and the index data set of the enterprises to be warned, wherein the training index data set is split into a training set and a test set at a ratio of 4:1;

step S4, based on the training index data set, calculating and obtaining the corresponding risk level of the enterprise through the credit dimension data of the enterprise, wherein the calculation formula is as follows:

wherein $creditScore_{new}$ denotes the normalized value of the latest credit risk score, and $100 \cdot creditScore_{new}$ serves as the base score of the risk score; $creditScore_i$ denotes the credit risk score of the $i$-th preceding year, and the term computed from these yearly scores represents the stability of the credit score; $riskListCount$ denotes the number of blacklistings or losses of credit in the last 5 years, and $4 \cdot riskListCount$ represents the risk arising from blacklisting or loss of credit;

step S5, binning the indices with the mutual entropy-interval set method and screening them by correlation with a chi-square test, tokenizing the indices according to the binning results, and storing the binning rules and the list of indices remaining after screening;

step S6, acquiring association rule set: mining association rules of enterprise behaviors of each risk level of the enterprise by using an association analysis FP-Tree algorithm, traversing the association rules and integrating the association rules into an association rule set consisting of an index set, a risk level and a confidence coefficient, wherein the association rule set consists of elements in the form of (index set): (risk level, confidence coefficient) with the confidence coefficient being more than 0.5;

$conf(A \Rightarrow B) = \dfrac{count(A \cap B)}{count(A)}$

wherein $A$ denotes an index set and $B$ denotes a risk level; $conf(A \Rightarrow B)$ denotes the confidence with which risk level $B$ is inferred from index set $A$; $count(A \cap B)$ is the number of samples that contain all elements of $A$ together with risk level $B$, and $count(A)$ is the number of samples that contain all elements of $A$;

step S7, according to the association rule set obtained in step S6 and the risk rule set obtained in step S2, early warning is carried out on the enterprises to be warned based on their index data set: the association rules hit by each enterprise are flagged, the enterprise's risk level and possible risk points are predicted, and the early-warning results are output.

In an embodiment of the present invention, in step S5, binning with the mutual entropy-interval set method and correlation screening with the chi-square test are implemented as follows:

for indices with discrete-variable attributes and for continuous variables with more than 5 distinct values, the supervised mutual entropy-interval set method is used to bin the indices, and the continuous variables are tokenized according to the binning results, which reduces the risk of model overfitting;

the binning steps of the mutual entropy-interval set method are as follows:

step 0, preset a threshold value threshold and a maximum number of bins n;

index I to be binned includes

Bin the index I, taking the initial set of bin boundary values as $Boundary = \{a, b\}$:

step 1, take a split point $a_0$, divide $[a, b]$ into the two intervals $[a, a_0]$ and $(a_0, b]$, and combine mutual information with information entropy to define a new class-uncertainty evaluation function MiEntropy:

wherein $t$ is an interval; $C = \{c_1, c_2, \dots, c_m\}$ is the set of classes and $m$ is the number of classes; $p(c_i)$, $p(t)$ and $p(t, c_i)$ are, respectively, the ratios to the total number of training set samples of the number of samples of class $c_i$, the number of samples whose index value lies in interval $t$, and the number of samples whose index value lies in interval $t$ and that belong to class $c_i$; $p(c_i \mid t)$ is the ratio of the number of samples whose index value lies in interval $t$ and that belong to $c_i$ to the number of samples whose index value lies in interval $t$; $\eta$ is a hyper-parameter satisfying $\eta \in [0, 1]$;

Evaluate $[a, a_0]$ and $(a_0, b]$ with MiEntropy, then go to step 2;

step 2, if $MiEntropy([a, a_0]) \ge threshold$ or $MiEntropy((a_0, b]) \ge threshold$, add $a_0$ to Boundary and go to step 3; otherwise go directly to step 3;

step 3, obtain the number of bins numb(I) of the index I from Boundary:

if $numb(I) \ge n$, binning stops;

if $MiEntropy([a, a_0]) \ge threshold$, take $a = a$, $b = a_0$ and jump to step 1;

if $MiEntropy((a_0, b]) \ge threshold$, take $a = a_0$, $b = b$ and jump to step 1;

if $MiEntropy([a, a_0]) \le MiEntropy((a_0, b]) < threshold$, take $a = a_0$, $b = b$ and jump to step 1;

if $MiEntropy((a_0, b]) \le MiEntropy([a, a_0]) < threshold$, take $a = a$, $b = a_0$ and jump to step 1;

step 4, after binning is finished, the bin boundary set is obtained and sorted in ascending order to give $Boundary = \{a, a_1, a_2, \dots, a_k, b\}$, and the index I is divided into $k + 1$ bins according to Boundary: $\{[a, a_1], (a_1, a_2], \dots, (a_k, b]\}$;
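The control flow of steps 0 to 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the text here does not show how $a_0$ is chosen or the MiEntropy formula, so the sketch assumes a midpoint split and takes the evaluation function as a caller-supplied `score` argument; the `max_iter` guard is likewise an addition of the sketch, since the steps as written stop only when the maximum number of bins is reached.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]


def interval_binning(
    a: float,
    b: float,
    score: Callable[[Interval], float],
    threshold: float,
    max_bins: int,
    max_iter: int = 50,
) -> List[float]:
    """Return sorted bin boundaries following the control flow of steps 0-4.

    `score` stands in for MiEntropy. The midpoint split and `max_iter`
    are assumptions of this sketch, not taken from the source.
    """
    boundary = {a, b}                      # initial Boundary = {a, b}
    for _ in range(max_iter):
        a0 = (a + b) / 2.0                 # assumed split point
        left, right = score((a, a0)), score((a0, b))
        # Step 2: keep the split point if either half scores high enough.
        if left >= threshold or right >= threshold:
            boundary.add(a0)
        # Step 3: stop once the number of bins reaches the maximum.
        if len(boundary) - 1 >= max_bins:
            break
        # Otherwise continue in one half, checked in the order listed.
        if left >= threshold:
            b = a0
        elif right >= threshold:
            a = a0
        elif left <= right:
            a = a0
        else:
            b = a0
    # Step 4: boundaries in ascending order define k + 1 bins.
    return sorted(boundary)
```

With a toy score equal to the interval width, `interval_binning(0.0, 8.0, lambda iv: iv[1] - iv[0], 2.0, 3)` returns `[0.0, 2.0, 4.0, 8.0]`, that is, three bins.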

The chi-square test correlation screening of indices is as follows: the correlation between each index variable and enterprise risk is tested with a chi-square test, and indices that do not help early warning are filtered out; because the result of a chi-square correlation analysis depends on how the sample space is partitioned, the sample space is partitioned according to the supervised binning.

In an embodiment of the present invention, the specific implementation manner of step S7 is as follows:

firstly, the index data of each enterprise in the to-be-warned enterprise index data set is tokenized: the conversion is determined by the binning rules of step S5, and the original index data is converted into the corresponding character identifiers, giving the converted index set of the enterprise

wherein $C_i$ is the result set formed by tokenizing each index value of the $i$-th sample enterprise, and each of its elements is the tokenized result value of one index of that enterprise;

secondly, the hit association rules are obtained: the association rules are traversed, and if the index set $R_j$ of an association rule satisfies $C_i \cap R_j = R_j$, the enterprise hits the association rule corresponding to $R_j$; this yields the index set of risk rules hit by the enterprise:

wherein each element records, for one risk rule hit by the early-warning enterprise, the index set of the hit risk rule, the risk level of the hit risk rule, and the confidence of the hit risk rule;
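The hit condition $C_i \cap R_j = R_j$ is equivalent to $R_j \subseteq C_i$, so it can be checked with a subset test. The following Python sketch illustrates this; the `Rule` tuple layout and the function name `hit_rules` are assumptions made for the example, not names from the source.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical layout: (index set R_j, risk level, confidence).
Rule = Tuple[FrozenSet[str], str, float]


def hit_rules(enterprise_tokens: Set[str], rules: List[Rule]) -> List[Rule]:
    """Return the rules whose index set is fully contained in C_i."""
    # C_i ∩ R_j = R_j holds exactly when R_j is a subset of C_i.
    return [rule for rule in rules if rule[0] <= enterprise_tokens]


rules = [
    (frozenset({"x1_bin0", "x3_bin1"}), "P0", 0.98),
    (frozenset({"x2_bin2"}), "P3", 0.70),
]
hits = hit_rules({"x1_bin0", "x3_bin1", "x7_bin3"}, rules)
```

Here only the first rule is hit, because `x2_bin2` is not among the tokens of the enterprise.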

then, the risk level is obtained: it is determined by the risk levels and confidences of the hit association rules; the risk level of each hit rule is converted into a corresponding score, a weighted average is taken with the confidences as weights to obtain the final risk score, and the risk level is read off from the score interval of each risk level;

wherein high risk is denoted P0; medium-high risk has two levels, P1 and P2, with P1 riskier than P2; low risk is denoted P3 and no risk P4; $riskScore_i$ is the risk score of the $i$-th early-warning enterprise; $SP_{ij}$ is the risk level score of the $j$-th risk rule hit by the $i$-th early-warning enterprise; $P_{ij}$ is the risk level of that rule; $conf_{ij}$ is the confidence of that rule; $r_i$ is the sum of the confidences of the risk rules hit by the $i$-th early-warning enterprise; and riskLevel is the function that maps a risk score to a risk level;
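The confidence-weighted average described above can be sketched in Python as below. The per-level scores in `LEVEL_SCORE` are hypothetical values chosen for illustration (the source does not list the $SP_{ij}$ values here), and returning `None` when no rule is hit is also an assumption of the sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical risk level scores SP; the source does not give these values.
LEVEL_SCORE = {"P0": 90.0, "P1": 70.0, "P2": 50.0, "P3": 30.0, "P4": 10.0}


def weighted_risk_score(hits: List[Tuple[str, float]]) -> Optional[float]:
    """Confidence-weighted average of the level scores of the hit rules.

    `hits` holds (risk level P_ij, confidence conf_ij) pairs.
    """
    r_i = sum(conf for _, conf in hits)   # sum of confidences
    if r_i == 0:
        return None                        # assumption: no hit, no score
    return sum(conf * LEVEL_SCORE[level] for level, conf in hits) / r_i


score = weighted_risk_score([("P0", 0.9), ("P2", 0.6)])
```

With these assumed level scores, the example evaluates to roughly 74; the risk level then follows from the score interval of each risk level.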

finally, the risk description is obtained: traverse the risk rule set from step S2 and the index set of risk rules hit by the enterprise; if $X_k \cap R_{ir} = X_k$, the enterprise probably has the risk point $risk_k$ corresponding to $X_k$; after the traversal is complete, the risk point set of the enterprise is obtained, and its elements are joined with semicolons to give the risk description.
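A minimal Python sketch of this last step follows. It assumes that the risk rule set is held as a mapping from index subsets $X_k$ to risk texts and that $X_k$ and the hit index sets $R_{ir}$ are expressed over the same identifiers; the function name, the sample rule texts and the deduplication of repeated risk points are choices of the sketch.

```python
from typing import Dict, FrozenSet, List


def risk_description(
    risk_rules: Dict[FrozenSet[str], str],
    hit_index_sets: List[FrozenSet[str]],
) -> str:
    """Join the risk points whose index subset X_k lies inside a hit set."""
    points: List[str] = []
    for index_subset, risk_text in risk_rules.items():
        # X_k ∩ R_ir = X_k holds exactly when X_k is a subset of R_ir.
        matched = any(index_subset <= hit for hit in hit_index_sets)
        if matched and risk_text not in points:
            points.append(risk_text)
    return ";".join(points)


# Hypothetical rule set B and hit index sets, for illustration only.
rule_set_b = {
    frozenset({"x1_bin0"}): "suspected unstable operation",
    frozenset({"x3_bin1", "x7_bin3"}): "suspected tax evasion",
    frozenset({"x9_bin2"}): "business license expired",
}
description = risk_description(
    rule_set_b, [frozenset({"x1_bin0", "x3_bin1", "x7_bin3"})]
)
```

For this input the description contains the first two risk points, separated by a semicolon.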

Compared with the prior art, the invention has the following beneficial effects:

(1) High innovativeness. The invention is a specific application of the FP-Tree association analysis algorithm to enterprise risk early-warning analysis and fills a gap in the use of association analysis algorithms in this field. Before the FP-Tree mines association rules, the indices are binned and screened using the chi-square test principle; removing weakly correlated indices improves early-warning accuracy and lets the risks in enterprise behavior be mined more comprehensively.

(2) Timeliness. Each time early warning is run, the code script generates the indices from real-time data in the original data tables, and the index screening, binning and association rules are updated dynamically. The method therefore adjusts automatically to external changes and to changes in the indices, which greatly reduces the time lag of enterprise risk early warning in data processing and analysis.

(3) Low barrier to use. The FP-Tree-based enterprise risk early-warning analysis method is a black box to the end user, who does not need to care about how the model is built and only needs to store and update the required basic enterprise information and behavior data in the enterprise information database; the resulting early-warning clues are shown in the domain model risk clue list of the risk early-warning system interface.

Drawings

FIG. 1 is a schematic diagram of the method of the present invention.

Detailed Description

The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.

The invention provides an enterprise risk early warning method based on an FP-Tree algorithm of correlation analysis, which comprises the following steps:

step S1, based on historical enterprise behavior data, analyzing the yardsticks and important bases for measuring the enterprise risk condition, and designing a risk index system $X = \{x_1, x_2, \dots, x_i\}$, where $x_i$ denotes the name of the $i$-th index of the risk index system;

wherein $X_k$ is a subset of the index system $X$, and $risk_k$ is the risk text description inferred from analyzing $X_k$;

step S3, collecting enterprise-related behavior data, and constructing the training index data set of the enterprise risk early-warning model and the index data set of the enterprises to be warned, wherein the training index data set is split into a training set and a test set at a ratio of 4:1;

step S4, based on the training index data set, calculating and obtaining the corresponding risk grade of the enterprise through the credit dimension data of the enterprise, wherein the calculation formula is as follows:

wherein $A$ denotes an index set and $B$ denotes a risk level; the confidence is that with which risk level $B$ is inferred from index set $A$; $count(A \cap B)$ is the number of samples that contain all elements of $A$ together with risk level $B$, and $count(A)$ is the number of samples that contain all elements of $A$;

The following is a specific implementation of the present invention.

The invention is realized by adopting the following scheme steps:

step 1, carry out a preliminary investigation of the various behavior data of enterprises, analyze the yardsticks and important bases for measuring the enterprise risk condition, and design a risk index system $X = \{x_1, x_2, \dots, x_i\}$, where $x_i$ denotes the name of the $i$-th index of the index system. For example, by studying enterprise behavior data in administrative inspections, administrative penalties, administrative coercive measures, performance history, product quality inspections, complaint and report information, credit rating evaluations and so on, together with the attributes of the enterprise, a risk index system consisting of 7 first-level indices, 30 second-level indices and 81 third-level indices is designed;

Table 1. Enterprise risk index system

Step 2, according to the index system, existing big data analysis is used to form risk rules, that is, when the values of one or more indices equal a specific value or fall within a specific interval, it is inferred that the enterprise may have a certain risk, which gives the risk rule set B:

$B = \{X_1: risk_1, X_2: risk_2, \dots, X_b: risk_b\}$,

wherein $X_k$ is a subset of the index system $X$, and $risk_k$ is the risk text description inferred from analyzing $X_k$;


Taking the risk index system in Table 1 as an example: from the third-level indices "income anomaly", "asset anomaly", "profit anomaly", "personnel anomaly", "tax payment anomaly" and "logical relationship anomaly" under the first-level index "enterprise annual report", the risk "suspected tax evasion and false deduction" can be inferred; likewise, if the items under "changes in enterprise basic information, equity, etc." (change of legal representative, change of enterprise name, change of registered address, other changes of registered items, and a large increase or decrease in registered capital) occur more than 10 times within three years, the risk "suspected unstable operation" can be inferred.

Step 3, establish the model training data standard: enterprise-related behavior data (basic enterprise information, administrative inspection information, administrative penalty information, administrative coercive measure information, performance history, complaint and report information, enterprise credit scores, enterprise product information tables and so on) is collected through the data management system, and Python scripts are written to generate, in real time, the training index data set of the early-warning model (training set : test set = 4:1) and the index data set of the enterprises to be warned;
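The 4:1 split of the training index data set can be sketched in Python as follows. The shuffling, the fixed seed and the function name are assumptions of this example; the source states only the ratio.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    samples: Iterable[T],
    train_parts: int = 4,
    test_parts: int = 1,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them train:test = 4:1 by default."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split_train_test(range(10))
```

For ten samples this yields eight training samples and two test samples.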

Step 4, obtain the target variable of the training samples: the corresponding risk level is calculated with a formula designed on the enterprise's credit dimension data. The risk level of each training sample enterprise is evaluated from data such as its credit scores over the last 5 years and the number of times it was blacklisted or lost credit in the last 5 years, using the following formula and the score interval of each risk level; the result serves as the target variable "Y" of the training data set, and "Y" together with the training index data set is fed into the association analysis algorithm for association rule mining;

wherein $creditScore_{new}$ denotes the normalized value of the latest credit risk score, and $100 \cdot creditScore_{new}$ serves as the base score of the risk score; $creditScore_i$ denotes the credit risk score of the $i$-th preceding year, and the term computed from these yearly scores represents the stability of the credit score; $riskListCount$ denotes the number of blacklistings or losses of credit in the last 5 years, and $4 \cdot riskListCount$ represents the risk arising from blacklisting or loss of credit. Table 2 gives the correspondence between risk score and risk level.

Table 2. Correspondence between risk score and risk level

riskScore	(-∞,20)	[20,40)	[40,60)	[60,80)	[80,+∞)
Risk level	No risk (P4)	Low risk (P3)	Medium risk (P2)	Medium-high risk (P1)	High risk (P0)
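The mapping in Table 2 from a risk score to a risk level can be written as a small lookup. The sketch below is one possible encoding of the table in Python; the function name `risk_level` mirrors the riskLevel function mentioned in the text, but the implementation itself is an illustration.

```python
from bisect import bisect_right

# Interval edges and labels taken from Table 2:
# (-inf,20) P4, [20,40) P3, [40,60) P2, [60,80) P1, [80,+inf) P0.
CUT_POINTS = [20, 40, 60, 80]
LEVELS = ["P4", "P3", "P2", "P1", "P0"]


def risk_level(risk_score: float) -> str:
    """Map a risk score to its level; intervals are closed on the left."""
    return LEVELS[bisect_right(CUT_POINTS, risk_score)]
```

Because `bisect_right` places a score equal to an edge into the higher interval, a score of exactly 20 maps to P3 and a score of exactly 80 maps to P0, matching the left-closed intervals of the table.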

Step 5, bin the indices with the mutual entropy-interval set method and screen them with a chi-square test (filtering out indices that do not help the early-warning model), tokenize the indices according to the binning results, and store the binning rules and the list of indices remaining after screening.

Further, the binning and tokenization of index variables in step 5 is as follows: for indices with discrete-variable attributes and indices with more than 5 distinct values, the supervised mutual entropy-interval set method is used to bin the index variables, and the continuous variables are converted into character tokens according to the binning results, which reduces the risk of model overfitting. For example, if the original data of the index "enterprise registered capital (x1)" is divided into 3 bins, its values after tokenization become x1_bin0, x1_bin1 or x1_bin2.
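Tokenization by bin can be sketched in Python as below, using bins of the form $[a, a_1], (a_1, a_2], \dots, (a_k, b]$. The boundary values in the example are hypothetical, and clamping values outside $[a, b]$ into the first or last bin is an assumption of the sketch.

```python
from bisect import bisect_left
from typing import Sequence


def tokenize(index_name: str, value: float, boundaries: Sequence[float]) -> str:
    """Convert a raw index value into a token such as 'x1_bin0'.

    `boundaries` is the sorted list [a, a_1, ..., a_k, b]; each bin is
    closed on the right, so a value equal to a_1 falls into bin 0.
    """
    inner = boundaries[1:-1]                 # a_1 ... a_k
    bin_number = bisect_left(inner, value)   # values outside [a, b] are clamped
    return f"{index_name}_bin{bin_number}"


# Hypothetical boundaries for registered capital (x1), giving 3 bins.
capital_bounds = [0, 100, 1000, 100000]
tokens = [tokenize("x1", v, capital_bounds) for v in (100, 500, 5000)]
```

The three sample values fall into `x1_bin0`, `x1_bin1` and `x1_bin2` respectively.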

step 0, presetting a threshold value threshold and a maximum box number n;

index I to be binned is

Bin the index I, taking the initial set of bin boundary values as $Boundary = \{a, b\}$:

step 1, get

wherein $t$ is an interval; $C = \{c_1, c_2, \dots, c_m\}$ is the set of classes and $m$ is the number of classes; $p(c_i)$, $p(t)$ and $p(t, c_i)$ are, respectively, the ratios to the total number of training set samples of the number of samples of class $c_i$, the number of samples whose index value lies in interval $t$, and the number of samples whose index value lies in interval $t$ and that belong to class $c_i$; $p(c_i \mid t)$ is the ratio of the number of samples whose index value lies in interval $t$ and that belong to $c_i$ to the number of samples whose index value lies in interval $t$; $\eta$ is a hyper-parameter satisfying $\eta \in [0, 1]$, with a default value of 0.5.

step 3, obtain the number of bins numb(I) of the index I from Boundary:

if $numb(I) \ge n$, binning stops;

if $MiEntropy((a_0, b]) \ge threshold$, take $a = a_0$, $b = b$ and jump to step 1;

if $MiEntropy([a, a_0]) \le MiEntropy((a_0, b]) < threshold$, take $a = a_0$, $b = b$ and jump to step 1;

if $MiEntropy((a_0, b]) \le MiEntropy([a, a_0]) < threshold$, take $a = a$, $b = a_0$ and jump to step 1.

Step 4, after binning is finished, the bin boundary set is obtained and sorted in ascending order to give $Boundary = \{a, a_1, a_2, \dots, a_k, b\}$, and the index I is divided into $k + 1$ bins according to Boundary: $\{[a, a_1], (a_1, a_2], \dots, (a_k, b]\}$.

The chi-square test correlation screening of indices is as follows: the correlation between each index variable and enterprise risk is tested with a chi-square test, and indices that do not help early warning are filtered out. The result of a conventional chi-square correlation analysis depends on how the sample space is partitioned, and different partitions can lead to different conclusions; partitioning the sample space by supervised binning makes the test both efficient and stable.
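The screening idea can be sketched in Python with Pearson's chi-square statistic computed from the contingency table of bin tokens against risk levels. The function names and the use of a single caller-supplied critical value are assumptions of this sketch; in practice the critical value depends on the degrees of freedom, (number of bins − 1) × (number of risk levels − 1), of each index.

```python
from collections import Counter
from typing import Dict, List, Sequence


def chi_square_statistic(tokens: Sequence[str], labels: Sequence[str]) -> float:
    """Pearson chi-square statistic for bin tokens versus risk levels."""
    n = len(tokens)
    row_totals = Counter(tokens)
    col_totals = Counter(labels)
    observed = Counter(zip(tokens, labels))
    statistic = 0.0
    for token, row_total in row_totals.items():
        for label, col_total in col_totals.items():
            expected = row_total * col_total / n   # count under independence
            statistic += (observed[(token, label)] - expected) ** 2 / expected
    return statistic


def screen_indices(
    columns: Dict[str, Sequence[str]],
    labels: Sequence[str],
    critical_value: float,
) -> List[str]:
    """Keep the indices whose statistic exceeds the critical value."""
    return [
        name
        for name, tokens in columns.items()
        if chi_square_statistic(tokens, labels) > critical_value
    ]


labels = ["P0"] * 10 + ["P4"] * 10
columns = {
    "x1": ["b0"] * 10 + ["b1"] * 10,   # perfectly associated with the label
    "x2": ["b0", "b1"] * 10,           # independent of the label
}
kept = screen_indices(columns, labels, 3.841)  # 5% critical value, 1 degree of freedom
```

In this toy data only `x1` is kept, since `x2` is distributed identically across both risk levels.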

Step 6, obtain the association rule set. The preceding steps give the complete index set of the enterprise training samples and the target variable "Y". The classical association rule mining algorithm FP-Tree is used to mine, from the training data, the association rules between enterprise behaviors and each risk level; the rules are traversed and consolidated into an association rule set made up of index set, risk level and confidence, whose elements have the form "(index set): (risk level, confidence)" with confidence greater than 0.5. The association rule set mined with the FP-Tree algorithm has the form: {(x1_bin0, x3_bin1, x7_bin3, x15_bin4): (P0, 0.98), ……}.

Further, an association rule in step 6 reflects the interdependence and association between one thing and other things: if several things are associated, one of them can be predicted from the others. Extending this idea, the association analysis algorithm is applied to enterprise risk early warning, and the classical association rule mining algorithm FP-Tree is used to mine the association rules between each risk level and enterprise behaviors.
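To make the shape of the association rule set concrete, the following Python sketch builds rules of the form "(index set): (risk level, confidence)" with confidence above 0.5. It deliberately replaces FP-Tree with a plain exhaustive enumeration of small index sets, which is far less efficient and is used here only because it is short; the function name and the `max_len` limit are assumptions of the example. Confidence is computed as the number of samples containing the index set together with the risk level, divided by the number of samples containing the index set.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def mine_rules(
    samples: List[Tuple[Set[str], str]],
    min_conf: float = 0.5,
    max_len: int = 2,
) -> Dict[Tuple[str, ...], Tuple[str, float]]:
    """Enumerate index sets (not FP-Tree) and keep rules with conf > min_conf.

    `samples` holds (tokenized index set, risk level) pairs.
    """
    count_a: Counter = Counter()    # count(A)
    count_ab: Counter = Counter()   # count(A together with level B)
    for tokens, level in samples:
        for size in range(1, max_len + 1):
            for subset in combinations(sorted(tokens), size):
                count_a[subset] += 1
                count_ab[(subset, level)] += 1
    rules = {}
    for (subset, level), n_ab in count_ab.items():
        confidence = n_ab / count_a[subset]
        if confidence > min_conf:
            rules[subset] = (level, confidence)
    return rules


training = [
    ({"x1_bin0", "x3_bin1"}, "P0"),
    ({"x1_bin0", "x3_bin1"}, "P0"),
    ({"x1_bin0"}, "P4"),
]
rule_set = mine_rules(training)
```

Since the confidence must exceed 0.5, at most one risk level can qualify for any given index set, which is why a dictionary keyed by index set suffices in this sketch.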

Step 7, early warning is carried out on the enterprises to be warned according to the obtained association rules and the index system risk rules compiled in step 2: the association rules hit by an enterprise are flagged, and the enterprise's risk level and possible risk points are predicted. For any enterprise to be warned, the early-warning result is obtained as follows:

Firstly, the index data of the enterprise to be warned is tokenized. The conversion is determined by the binning rules of step 5: the original index data is converted into the corresponding character identifiers, giving the index set of the enterprise

Secondly, the hit association rules are obtained. The association rules are traversed, and if the index set $R_j$ of an association rule satisfies $C_i \cap R_j = R_j$, the enterprise hits the association rule corresponding to $R_j$; this yields the set of risk rules hit by the enterprise:

Then, the risk level is obtained. It is determined by the risk levels and confidences of the hit association rules: the risk level of each hit rule is converted into a corresponding score, a weighted average is taken with the confidences as weights to obtain the final risk score, and the risk level is read off from the score interval of each risk level.

Finally, the risk description is obtained. Traverse the risk rule set obtained in step 2 and the index set of risk rules hit by the enterprise; if $X_k \cap R_{ir} = X_k$, the enterprise probably has the risk point $risk_k$ corresponding to $X_k$. After the traversal is complete, the risk point set of the enterprise is obtained.

Example of an early-warning result display: the risk level of an enterprise is P0 (high risk), and the clue description is: annual report announcement missing; enterprise registration changes frequently, with a risk of unstable operation; operating finances may be falsified; business license expired or invalid; the proportion of associated enterprises with abnormal operation is too high, so the enterprise risks being listed as operating abnormally; the proportion of associated enterprises that have lost credit is too high, so the enterprise risks loss of credit.

The above are preferred embodiments of the present invention. All changes made according to the technical scheme of the present invention whose functional effects do not exceed the scope of that technical scheme fall within the protection scope of the present invention.

Claims

1. An enterprise risk early warning method based on an FP-Tree association analysis algorithm is characterized by comprising the following steps:

step S1, based on historical enterprise behavior data, analyzing the yardsticks and important bases for measuring the enterprise risk condition, and designing a risk index system $X = \{x_1, x_2, \dots, x_i\}$, where $x_i$ denotes the name of the $i$-th index of the risk index system;

step S2, according to the risk index system, big data analysis is applied to form risk rules, that is, when the values of one or more indices equal a preset value or fall within a preset interval, the enterprise is considered to have the corresponding risk, which gives the risk rule set B:

$B = \{X_1: risk_1, X_2: risk_2, \dots, X_b: risk_b\}$,

wherein $X_k$ is a subset of the index system $X$, and $risk_k$ is the risk text description inferred from analyzing $X_k$;

wherein $creditScore_{new}$ denotes the normalized value of the latest credit risk score, and $100 \cdot creditScore_{new}$ serves as the base score of the risk score; $creditScore_i$ denotes the credit risk score of the $i$-th preceding year,

step S6, acquiring the association rule set: mining the association rules between enterprise behaviors and each risk level with the FP-Tree association analysis algorithm, traversing the association rules and consolidating them into an association rule set made up of index set, risk level and confidence, wherein the elements of the association rule set have the form "(index set): (risk level, confidence)" with confidence greater than 0.5;

wherein $A$ denotes one index set and $B$ denotes one risk level; the confidence is that with which risk level $B$ is inferred from index set $A$; $count(A \cap B)$ is the number of samples that contain all elements of $A$ together with risk level $B$, and $count(A)$ is the number of samples that contain all elements of $A$;

step S7, according to the association rule set obtained in step S6 and the risk rule set obtained in step S2, early warning is carried out on the enterprises to be warned based on their index data set: the association rules hit by each enterprise are flagged, the enterprise's risk level and possible risk points are predicted, and the early-warning results are output;

in step S5, binning by the mutual entropy-interval set method and correlation screening of indexes by the chi-square test are implemented as follows:

step 0, presetting a threshold value threshold and a maximum number of bins n;

index I to be binned is

step 1, get

[a, b] is divided into two intervals, [a, a₀] and (a₀, b], and mutual information and information entropy are combined to give a new class-uncertainty evaluation function MiEntropy:

wherein t is an interval; C is the set of classes, C = {c₁, c₂, …, c_m}, and m is the number of classes; p(c_i), p(t) and p(t, c_i) are, respectively, the ratios to the total number of training-set samples of the number of samples of class c_i, the number of samples whose index value lies in the interval t, and the number of samples whose index value lies in the interval t and which belong to class c_i; p(c_i | t) is the ratio of the number of samples whose index value lies in the interval t and which belong to class c_i to the number of samples whose index value lies in the interval t; η is a hyper-parameter satisfying η ∈ [0, 1];

step 2, if MiEntropy([a, a₀]) ≥ threshold or MiEntropy((a₀, b]) ≥ threshold, then a₀ is added to Boundary and the method goes to step 3; otherwise, the method goes directly to step 3;

step 3, obtaining the number of bins numb(I) of the index I according to Boundary:

if numb(I) ≥ n, binning stops;

if MiEntropy([a, a₀]) ≥ threshold, take a = a, b = a₀ and jump to step 1;

if MiEntropy((a₀, b]) ≥ threshold, take a = a₀, b = b and jump to step 1;

if MiEntropy([a, a₀]) ≤ MiEntropy((a₀, b]) < threshold, take a = a₀, b = b and jump to step 1;

if MiEntropy((a₀, b]) ≤ MiEntropy([a, a₀]) < threshold, take a = a, b = a₀ and jump to step 1;

step 4, after binning is finished, the bin boundary set is obtained and sorted in ascending order to give Boundary = {a, a₁, a₂, …, a_k, b}, and the index I is divided into k + 1 bins according to Boundary: {[a, a₁], (a₁, a₂], …, (a_k, b]};
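The control flow of steps 0 to 4 can be sketched as follows. Several details are assumptions because the corresponding formulas are not reproduced in this text: the callable `score` stands in for MiEntropy, the split point a₀ is taken as the interval midpoint, numb(I) is taken as the number of boundaries minus one, and the `max_iter` guard is added only so the sketch terminates. The step 3 conditions are checked in the order listed.

```python
from typing import Callable, List


def bin_boundaries(
    a: float,
    b: float,
    score: Callable[[float, float], float],  # stand-in for MiEntropy (assumed)
    threshold: float,
    max_bins: int,
    max_iter: int = 64,  # termination guard, not part of the claimed steps
) -> List[float]:
    """Return sorted bin boundaries for an index whose values lie in [a, b]."""
    boundary = {a, b}
    lo, hi = a, b
    for _ in range(max_iter):
        # Step 1: split the current interval (midpoint is an assumption).
        a0 = (lo + hi) / 2
        left, right = score(lo, a0), score(a0, hi)
        # Step 2: keep the split point if either half reaches the threshold.
        if left >= threshold or right >= threshold:
            boundary.add(a0)
        # Step 3: stop once the number of bins reaches the maximum.
        if len(boundary) - 1 >= max_bins:
            break
        if left >= threshold:
            hi = a0  # continue in [lo, a0]
        elif right >= threshold:
            lo = a0  # continue in (a0, hi]
        elif left <= right:
            lo = a0  # both below threshold: follow the higher-scoring half
        else:
            hi = a0
    # Step 4: sorted boundaries; k + 2 boundaries define k + 1 bins.
    return sorted(boundary)


# Illustrative run with a toy score (interval width), not the real MiEntropy.
edges = bin_boundaries(0, 8, lambda lo, hi: hi - lo, threshold=2, max_bins=3)
```

With the toy score the run yields the boundaries 0, 2, 4, 8, that is, the three bins [0, 2], (2, 4] and (4, 8].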

2. The enterprise risk early warning method based on the correlation analysis FP-Tree algorithm according to claim 1, wherein the concrete implementation manner of step S7 is as follows:

wherein C_i is the result set obtained by symbolizing each index value of the i-th sample enterprise;

if C_i ∩ R_j = R_j is satisfied, the enterprise hits the association rule corresponding to R_j, thereby obtaining the index set Q_i of the risk rules hit by the enterprise:

Wherein,

represents the risk level of the q_i-th risk rule hit by the i-th early warning enterprise;

represents the confidence of the q_i-th risk rule hit by the i-th early warning enterprise;

wherein high risk is represented by P0; medium-high risk has two grades, P1 and P2, with the risk of P1 greater than that of P2; low risk is represented by P3; and no risk is represented by P4; riskScore_i represents the risk score of the i-th early warning enterprise; SP_ij represents the risk level score of the j-th risk rule hit by the i-th early warning enterprise; P_ij represents the risk level of the j-th risk rule hit by the i-th early warning enterprise; Conf_ij represents the confidence of the j-th risk rule hit by the i-th early warning enterprise; r_i represents the sum of the confidences of the risk rules hit by the i-th early warning enterprise; riskLevel is a function that maps the risk score to the risk level;

finally, obtaining a risk description: the risk rule set B = {X₁:risk₁, X₂:risk₂, …, X_b:risk_b} obtained in step S2 and the enterprise hit risk rule index set Q_i = {R_ij:(P_ij, Conf_ij)}, j = 1, …, q_i are traversed; if X_k ∩ R_ij = X_k, then the enterprise probably has the risk point risk_k corresponding to X_k; after the traversal, the risk point set RISK_i = {risk_j}, j = 1, …, k_i of the enterprise is obtained, and the elements of the risk point set are joined with semicolons to obtain the risk description of the enterprise.
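The two set conditions used in this claim reduce to subset tests: C_i ∩ R_j = R_j holds exactly when R_j is a subset of C_i, and X_k ∩ R_ij = X_k holds exactly when X_k is a subset of R_ij. A minimal sketch of the rule matching and the semicolon-joined risk description follows; the function names, data layout, and example values are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical layouts: (R_j, risk level, confidence) and (X_k, risk text).
AssocRule = Tuple[FrozenSet[str], str, float]
RiskRule = Tuple[FrozenSet[str], str]


def hit_rules(c_i: Set[str], rules: List[AssocRule]) -> List[AssocRule]:
    """Rules whose index set R_j is contained in the enterprise's symbol set C_i."""
    return [rule for rule in rules if rule[0] <= c_i]


def risk_description(risk_rules: List[RiskRule], hits: List[AssocRule]) -> str:
    """Join the risk texts of every X_k contained in some hit rule R_ij."""
    points: List[str] = []
    for x_k, text in risk_rules:
        if any(x_k <= r_ij for r_ij, _, _ in hits) and text not in points:
            points.append(text)
    return ";".join(points)


# Illustrative data only.
c_i = {"tax=overdue", "lawsuit=many", "staff=small"}
assoc_rules = [
    (frozenset({"tax=overdue", "lawsuit=many"}), "P0", 0.8),
    (frozenset({"staff=large"}), "P3", 0.6),
]
risk_rules = [
    (frozenset({"tax=overdue"}), "tax arrears risk"),
    (frozenset({"lawsuit=many"}), "litigation risk"),
    (frozenset({"staff=large"}), "overstaffing risk"),
]
hits = hit_rules(c_i, assoc_rules)
description = risk_description(risk_rules, hits)
```

Only the first association rule is hit in this illustration, so the description contains the two risk points whose index subsets fall inside that rule.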