CN112801231A - Decision model training method and device for business object classification
- Publication number
- CN112801231A (Application No. CN202110373889.8A)
- Authority
- CN
- China
- Prior art keywords
- splitting
- decision
- sample
- constraint
- condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
Abstract
The embodiments of this specification provide a decision model training method and device for business object classification. The training method first obtains a sample total set and a training constraint condition, and then constructs a decision tree by node splitting according to the sample total set, where the process of splitting any current node includes: for each of multiple candidate splitting conditions of the current node, determining the constraint fitness of the splitting condition according to the degree to which the two child nodes, obtained by splitting the current node under that condition, conform to the constraint condition; determining a comprehensive score of the splitting condition from its splitting purity and constraint fitness; and splitting the current node according to the candidate splitting condition with the best comprehensive score. A decision model for classifying business objects is then determined based on the decision tree.
Description
Technical Field
One or more embodiments of the present specification relate to the field of artificial intelligence and machine learning, and more particularly, to a method and apparatus for training a decision model for business object classification.
Background
In various scenarios, business objects need to be classified by decision, for example: determining whether a transaction is a high-risk transaction involving fraud or card theft, determining a user's credit rating, deciding whether a user should be added to a blacklist, deciding whether a business application should be approved, and so on. Traditionally, such decision processes are completed by strategy developers with expert experience who hand-craft decision rules. Policy developers must rely on business experience to combine different business variables and variable thresholds into decision rules, while also measuring the performance and stability of the resulting rules. This manual way of generating decision rules has obvious drawbacks in cost, efficiency and effect. First, generating decision rules that satisfy complex business constraints requires high labor and time costs. Meanwhile, because manual trial and measurement is inefficient, only a relatively limited set of business features and condition combinations can be tried based on prior manual experience, and higher business targets cannot be reached.
Recently, some schemes have been proposed for training decision models and performing decision learning with machine learning methods, allowing decision rules to be learned without manual construction. However, when a real business scenario places specific requirements on the decision model, general machine learning training methods often struggle to meet the expected requirements and performance, and still fall short.
Therefore, it is desirable to have an improved scheme for more effectively training a decision model satisfying the constraints and requirements of a business scenario, so as to more efficiently perform classification decisions of business objects.
Disclosure of Invention
One or more embodiments of the present disclosure describe a constraint adaptive decision model training method and apparatus, which can effectively train a decision model satisfying a constraint condition of a business scenario, so as to efficiently perform a classification decision of a business object.
According to a first aspect, there is provided a decision model training method for business object classification, comprising:
acquiring a sample total set and a training constraint condition, wherein a single sample in the sample total set comprises attribute features of a single business object and a classification label indicating whether the business object belongs to a target business classification;
according to the sample collection, a first decision tree is constructed in a node splitting mode, wherein the process of splitting any current node comprises the following steps: for any one splitting condition in a plurality of alternative splitting conditions of the current node, determining the constraint fitness of the splitting condition according to the conformity degree of two child nodes obtained by splitting the current node according to the splitting condition to the constraint condition; determining a comprehensive score of the splitting condition according to the splitting purity of the splitting condition and the constraint fitness; splitting the current node according to the splitting condition with the optimal comprehensive score in the multiple alternative splitting conditions;
based on the first decision tree, a decision model for classifying a business object is determined.
In one embodiment, the process of splitting for any current node further comprises: and determining the multiple alternative splitting conditions according to the attribute characteristic values of each service object in the current sample set of the current node.
Further, in one example, the attribute feature includes a plurality of attribute features of numerical type; determining the plurality of candidate splitting conditions specifically comprises: enumerating possible values of the multiple attribute features in the current sample set, and taking a combination of one attribute feature and one value of the attribute feature as an alternative splitting condition.
According to one possible embodiment, the constraint condition may include an evaluation index predicted for the sample and an index threshold that the evaluation index should meet; the determining the constraint fitness of the splitting condition specifically includes: for any child node in the two child nodes, determining an index value of the evaluation index according to sample prediction performed according to the decision rule corresponding to the child node, and determining the constraint conformity of the child node according to a comparison between the index value and the index threshold; and determining the larger of the constraint conformities of the two child nodes as the constraint fitness of the splitting condition.
Further, in various embodiments, the aforementioned evaluation index may include one of: confidence, recall rate, number of recalls, stability.
In an embodiment, the determining the constraint conformity of the child node according to the comparison between the index value and the index threshold specifically includes: if the index value meets the index threshold, determining the constraint conformity of the child node to be 0; and if the index value does not meet the index threshold, taking the negative of the absolute value of the difference between the index value and the index threshold as the constraint conformity of the child node.
According to one embodiment, the process of splitting for any current node further comprises: and determining the splitting purity of the splitting condition according to the sample purity of the current sample set corresponding to the current node and the sample purities of the two sample subsets corresponding to the two sub-nodes respectively.
Further, the aforementioned sample purity may be determined based on one of the following indicators: information entropy, Gini coefficient.
In one embodiment, the determining the cleavage purity of the cleavage condition specifically includes: taking the ratio of the number of the samples of the two sample subsets to the number of the samples of the current sample set as respective weights, and performing weighted summation on the sample purities of the two sample subsets to obtain a sum value; determining a splitting purity for the splitting condition based on a difference between the sample purity for the current sample set and the sum value.
According to a possible implementation manner, determining a comprehensive score of the splitting condition according to the splitting purity of the splitting condition and the constraint fitness specifically comprises: and respectively taking the first weight and the second weight as weight factors, and carrying out weighted summation on the splitting purity and the constraint fitness to obtain the comprehensive score.
Further, in an example, the first weight may be determined according to a first variance of a plurality of fragmentation purities corresponding to the plurality of candidate fragmentation conditions, respectively, and is negatively correlated with the first variance; the second weight is determined according to a second variance of a plurality of constraint fitness degrees respectively corresponding to the plurality of candidate splitting conditions, and is inversely related to the second variance.
According to one possible embodiment, the first decision tree comprises a first number N of leaf nodes; the determining a decision model for classifying the business object based on the first decision tree specifically includes: determining N decision rules corresponding to N paths from a root node to N leaf nodes in the first decision tree; screening out a second number M of decision rules which meet the constraint condition from the N decision rules; forming the decision model based on the M decision rules.
Further, in one embodiment, forming the decision model based on the M decision rules further includes: for each of the M decision rules, performing the following clipping iterations: if the parent rule of the current decision rule meets the constraint condition, replacing the current decision rule with the parent rule until the parent rule does not meet the constraint condition any more; the current decision rule corresponds to a first node sequence starting from a root node in the first decision tree, and the father rule is a decision rule corresponding to a node sequence obtained by cutting the last node in the first node sequence; and forming the decision model based on a non-repeated decision rule obtained after the cutting iteration is executed.
According to one possible embodiment, the method further comprises: predicting each sample of the sample collection by using the first decision tree to obtain a first sample set formed by samples predicted to belong to the target service classification; removing the first sample set from the total sample set to obtain a second sample set; constructing a second decision tree by utilizing the same node splitting mode as the first decision tree construction according to the second sample set; correspondingly, determining a decision model for classifying the business object based on the first decision tree specifically includes: determining the decision model based on the first decision tree and the second decision tree.
In various embodiments, the aforementioned business object may comprise one of: a user, an operation event, a transaction, a business application request; the target business classification indicates business objects at risk.
According to a second aspect, there is provided a decision model training apparatus for business object classification, comprising:
the acquisition unit is configured to acquire a sample total set and a training constraint condition, wherein a single sample in the sample total set comprises attribute characteristics of a single business object and a classification label of whether the business object belongs to a target business classification;
a decision tree construction unit configured to construct a first decision tree by means of node splitting according to the sample total set, wherein a process of splitting an arbitrary current node includes: for any one splitting condition in a plurality of alternative splitting conditions of the current node, determining the constraint fitness of the splitting condition according to the conformity degree of two child nodes obtained by splitting the current node according to the splitting condition to the constraint condition; determining a comprehensive score of the splitting condition according to the splitting purity of the splitting condition and the constraint fitness; splitting the current node according to the splitting condition with the optimal comprehensive score in the multiple alternative splitting conditions;
a model determination unit configured to determine a decision model for classifying a business object based on the first decision tree.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
According to the method and device provided by the embodiments of this specification, in order to train a decision model with good interpretability for the task of business object classification, a decision tree is adopted as the base model. In order to better adapt to the constraint conditions proposed for the training task, a constraint-adaptive decision tree generation method is proposed: when selecting the splitting condition of a node, not only the information gain, or splitting purity, before and after splitting is considered, but also the degree to which the splitting result satisfies the constraint condition, that is, the constraint fitness. The decision tree thus obtained adapts better to the preset constraints. Furthermore, effective decision rules can be extracted from the constraint-adaptive decision tree with further reference to the constraint condition, so as to determine the final decision model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 schematically illustrates a process for determining a constraint adaptive decision model;
FIG. 2 illustrates a flow diagram of a method of training a decision model for business object classification, according to one embodiment;
FIG. 3 illustrates a flow of steps for splitting for a current node in one embodiment;
FIG. 4 shows a schematic diagram of a decision tree;
fig. 5 shows a schematic diagram of a training apparatus according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As described above, when a real business scenario places specific requirements or specific constraint conditions on a decision model, it is difficult for conventional machine learning methods to train a decision model satisfying those constraints. Therefore, the embodiments of this specification propose a constraint-adaptive decision tree generation method, which can automatically and better adapt, during the node splitting process of decision tree generation, to the constraint conditions required by the business scenario. A final decision model can then be determined based on the constraint-adaptive decision tree, so as to better meet the business requirements.
Fig. 1 schematically shows the determination process of a constraint-adaptive decision model. As shown in fig. 1, under a given business scenario, certain constraint conditions are imposed on the decision model, for example: the confidence exceeds a certain threshold, the recall rate meets a certain condition, the number of recalls meets a certain requirement, the stability reaches a certain target, and so on. Therefore, in the model training process, a decision tree is generated according to the features and label information of the training sample set while simultaneously considering how well the constraint conditions are satisfied, that is, a constraint-adaptive decision tree is constructed. It is to be understood that a decision tree is generated by repeatedly splitting nodes according to splitting conditions. Therefore, in the process of constructing the constraint-adaptive decision tree, when selecting the splitting mode or splitting condition of a node, both the information gain before and after splitting and the degree to which the splitting result satisfies the constraint condition are considered. The decision tree thus obtained adapts better to the preset constraints. Furthermore, effective decision rules can be extracted from the constraint-adaptive decision tree with further reference to the constraint condition, so as to determine the final decision model. As shown in fig. 1, optionally, after the decision rules are extracted from the decision tree, they may be screened and/or clipped with reference to the constraint condition once more, so as to optimize the decision rules and give the decision model better performance while satisfying the constraint condition.
The following describes a specific implementation procedure of the above technical concept.
FIG. 2 illustrates a flow diagram of a method of training a decision model for business object classification, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 2, the method includes the following steps.
In step 21, a sample total set and training constraint conditions are obtained.
The sample total set contains a large number of samples, where a single sample comprises the attribute features of a single business object and a classification label indicating whether that business object belongs to the target business classification. Specifically, the i-th sample in the sample total set may be denoted as (x_i, y_i), where x_i is the attribute feature of the business object corresponding to the i-th sample, and y_i is a label indicating whether that business object belongs to the target business classification. It should be understood that, since a decision tree is to be trained with the sample total set, each item in the attribute features is a numerical feature. y_i usually takes the value 0 or 1 to indicate whether the business object belongs to the target business classification. In general, samples belonging to the target business classification (e.g., samples with label value y_i = 1) are referred to as positive samples, and the remaining samples as negative samples.
In different embodiments, the business object corresponding to the sample may be various business objects, such as a user, an operation event, a transaction, a business application request, and the like.
In a specific example, the business object is a user, and the user can be represented by a corresponding account. Accordingly, the target business classification may be a risky user/account, e.g., a spam account, a compromised account, a credit-risky user, etc. The attribute characteristics of the user may include basic attributes such as age, registration duration of the account number, etc., and may also include attributes associated with a particular target business category, such as the number of debits in the last period of time, the cumulative amount of debits, etc., when used to assess credit risk.
In another example, the business object is a transaction. Accordingly, the target traffic class may be a high risk transaction, such as a transaction involving fraud, cash-out, card theft, and the like. For a transaction sample, the attribute characteristics may include, for example, transaction amount, transaction time, frequency of transactions over a recent period of time, and the like.
In yet another example, the business object is a business application request, such as a loan request or an insurance claim request. Correspondingly, the target business classification may be a high-risk application request, such as a suspected fraudulent claim request or a loan request at risk of becoming overdue.
In other examples, the sample may also be other business objects, such as user actions, interaction events, etc., and different types of business objects have different attribute characteristics for different target business categories, which are not described in detail herein.
On the other hand, the constraint condition proposed for the decision model training also needs to be obtained. The constraint condition may include an evaluation index predicted for the sample and an index threshold that the evaluation index should meet.
For example, in one case, the constraint condition may include that the confidence of the sample prediction should reach a confidence threshold. The confidence of a decision rule or decision model may be defined as the proportion of true positive samples among the set of samples predicted as positive (i.e., belonging to the target business classification) according to the rule or model. That is, if a decision rule or decision model predicts N positive samples in a batch of samples, and M of those N samples are true positive samples with label 1, the confidence is M/N. In some application scenarios, the confidence is also referred to as accuracy, or prediction precision.
In one example, the constraint may include that the recall rate of the sample prediction should exceed a proportional threshold. Specifically, the recall rate of the decision model may be defined as a ratio m/n of the number m of positive samples correctly predicted by the decision model to the number n of input positive samples when a certain number n of positive samples are input into the decision model. Recall, which may also be referred to as coverage, is another measure of the predictive performance of the model.
In other examples, the constraints may further include that the number of recalls of the sample prediction should reach a certain threshold, that the utilization rate of sample features in the sample prediction should exceed a certain threshold, and so on, which are not enumerated one by one here. A sketch of the two most common indices follows.
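As a concrete illustration of the confidence and recall indices defined above, the following minimal sketch computes both from binary label and prediction lists. The function names and list-based representation are illustrative assumptions, not part of this specification.

```python
def confidence(y_true, y_pred):
    # Confidence (precision): among the N samples predicted positive,
    # the fraction M/N whose true label is 1.
    labels_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not labels_of_predicted_pos:
        return 0.0
    return sum(labels_of_predicted_pos) / len(labels_of_predicted_pos)

def recall(y_true, y_pred):
    # Recall (coverage): among the n true positive samples, the fraction m/n
    # that the model correctly predicts as positive.
    preds_on_true_pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_true_pos:
        return 0.0
    return sum(preds_on_true_pos) / len(preds_on_true_pos)
```

For example, confidence([1, 0, 1], [1, 1, 1]) returns 2/3, matching the M/N definition above.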
When the sample total set and the constraint conditions have been obtained, the decision tree can be trained based on the sample total set in combination with the constraint conditions. That is, at step 22, a first decision tree is constructed by means of node splitting based on the sample total set.
As known to those skilled in the art, a decision tree is a tree model with strong interpretability. A trained decision tree includes a root node, intermediate nodes and leaf nodes, and every node except the leaf nodes corresponds to a splitting condition. The sample set is input at the root node, divided by the splitting condition of each node into child nodes of the next level, and finally reaches the leaf nodes. The process of training or constructing a decision tree is the process of splitting nodes, starting from the root node, by determining the splitting condition corresponding to each node. In general, the splitting condition of a node corresponds to a combination of an attribute feature of the sample and a feature value. For example, for a user business object, assuming the splitting condition of a certain node i is the attribute feature age with feature value 25, user samples with age less than 25 are divided into the left child node of node i, and user samples with age greater than or equal to 25 into its right child node.
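To make the routing behavior described above concrete, here is a minimal sketch of a tree node and the left/right dispatch on a splitting condition (s, t). The class layout and field names are assumptions for illustration only.

```python
class TreeNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # index s of the attribute feature used for splitting
        self.threshold = threshold  # feature value t of the splitting condition
        self.left = left            # child for samples with x[feature] < threshold
        self.right = right          # child for samples with x[feature] >= threshold
        self.label = label          # predicted class at a leaf node (None for inner nodes)

def route(node, x):
    # Walk a sample x from this node down to a leaf, as in the age-25 example above.
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label
```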
In the embodiment of the present specification, in order to construct a constraint adaptive decision tree, in the node splitting process, splitting purity of each candidate splitting condition and fitness of the foregoing constraint condition are comprehensively considered to select a splitting condition, and node splitting is performed.
FIG. 3 illustrates a flow of steps, i.e., sub-steps of step 22, for splitting for a current node in one embodiment. It is to be understood that the current node may be any node in the decision tree for which node splitting is to occur. In other words, for each node in the decision tree, the splitting process may be implemented according to the step flow of fig. 3.
As shown in fig. 3, at step 31, a plurality of alternative splitting conditions for the current node are determined. In an embodiment, the multiple candidate splitting conditions may be determined according to attribute feature values of each service object in the current sample set S of the current node D.
As previously mentioned, the sample attribute features used to train the decision tree are typically multiple numerical attribute features. Thus, in one example, all possible values of the multiple attribute features in the current sample set S may be enumerated, and each combination of one attribute feature and one of its values taken as a candidate splitting condition, thereby enumerating the multiple candidate splitting conditions. In another example, the maximum and minimum values of an attribute feature may be removed and only the intermediate values retained, after which the attribute feature is combined with each intermediate value to obtain multiple candidate splitting conditions. In yet another example, if the current node is located at a higher level, the feature values of some attribute features may be sampled at a certain step size and then combined with those attribute features to obtain the multiple candidate splitting conditions. The basic enumeration is sketched below.
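A minimal sketch of the enumeration variant, pairing each numerical feature with each distinct value it takes in the current sample set; dropping extreme values or striding over the values, as mentioned above, would be simple variations. The function name is an assumption.

```python
def candidate_splits(X):
    # X: current sample set, each sample a sequence of numerical attribute features.
    # Returns (feature_index, value) pairs as candidate splitting conditions (s, t).
    candidates = []
    for s in range(len(X[0])):
        for t in sorted({x[s] for x in X}):
            candidates.append((s, t))
    return candidates
```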
For any one of the multiple candidate splitting conditions thus obtained, the splitting purity of the splitting condition is determined at step 32, the constraint fitness of the splitting condition is determined at step 33, and then the comprehensive score of the splitting condition is determined based on its splitting purity and constraint fitness at step 34. These steps are described separately below.
In step 32, the splitting purity of a given splitting condition (s, t) is determined, where s is the attribute feature selected by the splitting condition and t is the feature value of that attribute feature used for splitting. The splitting purity measures the information gain that the splitting condition brings to the sample classification, by comparing the sample purity of the node's sample set before and after splitting.

The sample purity of a sample set characterizes the degree of difference, or uniformity, of the sample classification labels in that set. The class distribution of a sample set may be represented as P = (p_1, p_2, ..., p_J), where p_j is the proportion of samples whose class label is the j-th class among all samples in the set. When some p_j in this distribution equals 1 and all other classes have proportion 0, the sample purity reaches its maximum. In the binary case of belonging or not belonging to the target business classification, the sample purity is maximal when all samples in the set are positive samples or all are negative samples. In various embodiments, the sample purity may be determined based on indicators such as the information entropy of the sample set or the Gini index.

Assume the current node D corresponds to the current sample set S. If node D is split according to the splitting condition (s, t), two child nodes D_L and D_R on the left and right are obtained, and correspondingly the current sample set S is divided into two sample subsets S_L and S_R falling into the left and right child nodes. The splitting purity of the splitting condition (s, t) can thus be determined from the sample purity of the current sample set corresponding to node D and the sample purities of the two subsets corresponding to the child nodes D_L and D_R.

There are various ways to determine the splitting purity.

In a specific example, the sample purities of the two sample subsets may be summed with weights equal to the ratios of their sample counts to the sample count of the current sample set; the splitting purity of the splitting condition is then determined from the difference between the sample purity of the current sample set and this weighted sum. More specifically, in one example, the splitting purity can be determined using the following equation (1):

Gain(s, t) = Purity(S) - ( |S_L|/|S| * Purity(S_L) + |S_R|/|S| * Purity(S_R) )    (1)

In another specific example, the difference between the original sample purity before splitting and the sum of the sample purities of the two subsets after splitting may be computed, and the ratio of this difference to the original sample purity determined as the splitting purity. In this example, the splitting purity measures the relative information gain before and after splitting.
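A minimal sketch of equation (1), instantiating the sample purity measure with the Gini index (information entropy could be substituted). Note that the Gini index is used here in its impurity form, under which equation (1) yields the usual impurity reduction, so higher values are better; the helper names are assumptions.

```python
def gini_impurity(labels):
    # Gini-based sample purity measure: 1 - sum_j p_j^2 over class proportions p_j.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_purity(parent_labels, left_labels, right_labels):
    # Equation (1): purity measure of the current sample set minus the
    # size-weighted sum over the two sample subsets.
    n = len(parent_labels)
    weighted = (len(left_labels) / n) * gini_impurity(left_labels) \
             + (len(right_labels) / n) * gini_impurity(right_labels)
    return gini_impurity(parent_labels) - weighted
```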
On the other hand, in step 33, the constraint fitness of the splitting condition (s, t) is determined. The constraint fitness measures the degree to which the constraint condition is satisfied when samples are predicted based on the decision rule formed with this splitting condition.

In one embodiment, to calculate the constraint fitness of the splitting condition (s, t), the constraint conformity of a node is first defined and calculated; the constraint fitness of the splitting condition is then determined according to the constraint conformity of the two child nodes generated by the splitting condition (s, t).

As previously mentioned, a constraint condition generally includes an evaluation index predicted for samples and an index threshold that the evaluation index should meet. Moreover, a node D in the decision tree corresponds to a decision rule R, namely the combination of the splitting conditions of the nodes on the path from the root node to node D. Therefore, for node D, the index value v(D) of the aforementioned evaluation index can be determined for sample prediction performed according to the corresponding decision rule R. For example, when the evaluation index is the confidence, the confidence Conf(D) of the decision rule R corresponding to node D in sample prediction is determined as the index value. When the evaluation index is the recall rate, the recall rate of the decision rule R corresponding to node D in sample prediction is determined as the index value. Then, based on the index value v(D) and the index threshold T, the constraint conformity A(D) of node D is determined.

In one embodiment, if the index value v(D) meets the index threshold T, the constraint conformity A(D) of node D is determined as 0, where "meets" means having the same magnitude relation as specified in the constraint condition; for example, if the constraint condition specifies that the index value of the evaluation index should be greater than the index threshold, "meets" means being greater than the index threshold. If the index value v(D) does not meet the index threshold T, the negative of the absolute value of the difference between v(D) and T is taken as the constraint conformity of node D.

Specifically, in one example, the constraint condition specifies that the index value of the evaluation index should be greater than the index threshold, for example, that the confidence should be greater than a preset confidence threshold. The constraint conformity of node D can then be expressed as the following equation (2):

A(D) = 0, if v(D) > T; A(D) = -|v(D) - T|, otherwise    (2)

In this way, the constraint conformity of a node is defined. For the splitting condition (s, t), which generates the two child nodes D_L and D_R, the constraint conformity A(D_L) and A(D_R) of the two child nodes can be determined respectively, and the larger of the two taken as the constraint fitness F(s, t) of the splitting condition, namely:

F(s, t) = max( A(D_L), A(D_R) )

Thus, at step 33, the constraint fitness of the splitting condition (s, t) may be determined in a number of ways.
It should be noted that the determination of the splitting purity in step 32 and the determination of the constraint fitness in step 33 may be performed in any relative order, or in parallel, which is not limited here.
Next, in step 34, a comprehensive score of the splitting condition (s, t) is determined based on the splitting purity determined in step 32 and the constraint fitness determined in step 33.

In one embodiment, the splitting purity and the constraint fitness of the splitting condition (s, t) may simply be added or multiplied, with the result taken as the comprehensive score of the splitting condition.

In another embodiment, for the splitting condition (s, t), a first weight w1 and a second weight w2 may respectively be taken as weighting factors, and the splitting purity and the constraint fitness summed with these weights to obtain the comprehensive score, namely:

Score(s, t) = w1 * Gain(s, t) + w2 * F(s, t)

In one example, the first weight w1 and the second weight w2 may be preset hyperparameters.

In another example, the first weight w1 is determined according to the variance d1 of the multiple splitting purities respectively corresponding to the multiple candidate splitting conditions, and is negatively correlated with d1; the second weight w2 is determined according to the variance d2 of the multiple constraint fitness values respectively corresponding to the multiple candidate splitting conditions, and is negatively correlated with d2. One possible form is sketched below.
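One possible form, sketched under the assumption that each weight is taken as the reciprocal of the corresponding variance (one simple choice that is negatively correlated with it, as required), combined per the weighted sum above:

```python
from statistics import pvariance

def comprehensive_scores(purities, fitnesses, eps=1e-9):
    # purities / fitnesses: one value per candidate splitting condition.
    # Reciprocal-of-variance weights are an illustrative assumption; eps
    # guards against division by zero when all candidates tie.
    w1 = 1.0 / (pvariance(purities) + eps)
    w2 = 1.0 / (pvariance(fitnesses) + eps)
    return [w1 * g + w2 * f for g, f in zip(purities, fitnesses)]
```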
Thus, a comprehensive score of the splitting condition (s, t) is obtained, in one of a variety of ways. It is to be understood that the splitting condition (s, t) is any one of the multiple candidate splitting conditions of the current node; thus, for each of the multiple candidate splitting conditions, a corresponding comprehensive score may be determined in the manner of steps 32 to 34.
Then, in step 35, the splitting condition with the best comprehensive score is determined from the multiple candidate splitting conditions, and the current node is split according to that splitting condition. In most cases, both the splitting purity and the constraint fitness are defined such that a higher score means the corresponding splitting condition better serves the training objective; in such cases, the splitting condition with the highest comprehensive score is selected as the optimal splitting condition. The opposite convention is not excluded, and is not limited here. The selection over the candidates can be sketched as follows.
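Putting steps 31 to 35 together, per-node selection might look like the following sketch, which reuses the illustrative helpers candidate_splits, split_purity, constraint_fitness and comprehensive_scores from above; evaluate_child_index is an assumed callback that computes the index value v(D), e.g. the confidence, of a child node's rule from the labels falling into it.

```python
def best_split(X, y, evaluate_child_index, threshold):
    purities, fitnesses, splits = [], [], []
    for s, t in candidate_splits(X):
        left = [yi for x, yi in zip(X, y) if x[s] < t]
        right = [yi for x, yi in zip(X, y) if x[s] >= t]
        if not left or not right:
            continue  # skip degenerate splits that leave a child empty
        purities.append(split_purity(y, left, right))
        fitnesses.append(constraint_fitness(
            evaluate_child_index(left), evaluate_child_index(right), threshold))
        splits.append((s, t))
    if not splits:
        return None  # no valid split: the current node becomes a leaf
    scores = comprehensive_scores(purities, fitnesses)
    return splits[max(range(len(scores)), key=scores.__getitem__)]
```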
Thus, through the step flow of fig. 3, node splitting is performed for any current node in the decision tree. By performing the steps of fig. 3 for each node until a predetermined decision tree termination condition is satisfied, a decision tree, referred to herein as a first decision tree, is obtained. The decision tree termination condition may include, for example, the depth reaching a certain threshold, the number of samples in the node being less than a certain threshold, and so on.
Returning to fig. 2. After obtaining a first decision tree by node splitting according to the sample collection, a decision model for classifying the business object is determined based on the first decision tree in step 23.
In one embodiment, the first decision tree is used directly as the final decision model. In this case, the decision model may be understood to include respective decision rules corresponding to respective paths formed from the root node to respective leaf nodes in the first decision tree.
In another embodiment, the decision rules included in the first decision tree may be filtered based on the constraint conditions, so that the obtained decision model has stronger constraint adaptability.
In particular, assuming that the first decision tree includes a first number N of leaf nodes, N paths are formed from the root node to the N leaf nodes. Each path corresponds to a decision rule, namely the combination of the splitting conditions of the nodes where the path passes. The first decision tree then contains N decision rules. A second number M of decision rules that satisfy the aforementioned constraint condition may be screened from the N decision rules, and a decision model is formed based on the screened M decision rules.
Fig. 4 shows a schematic diagram of a decision tree. In the example of fig. 4, the decision tree contains 5 leaf nodes D, E, H, I, G, corresponding to 5 decision rules. The constraint condition is assumed that the confidence should be greater than or equal to 0.5, i.e., the confidence threshold is 0.5. The confidence of the respective decision rule can be determined separately. The number in the node in fig. 4 represents the confidence of the decision rule corresponding to the node. It can be seen that, in the 5 decision rules, the confidence degrees of the decision rules corresponding to the nodes I and G do not satisfy the constraint condition, and can be eliminated. Therefore, the decision rules corresponding to the nodes D, E and H are screened out to form a decision model.
In one embodiment, on the basis of screening the decision rules according to the constraint conditions, each decision rule is further clipped, thereby avoiding the overfitting problem caused by overly long paths and overly complex rules.
Specifically, in one embodiment, for each of the screened decision rules, the following clipping iterations are performed: if the father rule of the current decision rule meets the constraint condition, replacing the current decision rule with the father rule, and continuously judging and replacing until the father rule does not meet the constraint condition any more; the current decision rule corresponds to a first node sequence from a root node in the first decision tree, and the father rule is a decision rule corresponding to a node sequence obtained by cutting the last node in the first node sequence.
Then, a decision model is formed based on a non-repetitive decision rule obtained after executing the clipping iteration.
Reference is again made to fig. 4. It is still assumed that the constraint is that the confidence level is greater than or equal to 0.5 and that each decision rule is represented by its corresponding node label. When the decision rule H is taken as the current decision rule, the parent rule is the decision rule F, the confidence coefficient of the decision rule is 0.5, and the constraint condition is met. The current decision rule may then be updated to decision rule F. At this time, the parent rule is the decision rule C, the confidence of the decision rule is 0.4, and the constraint condition is not satisfied, so that the replacement and the clipping are not performed any more.
In this way, the clipping iteration is executed in turn on each of the screened decision rules D, E and H, and repeated decision rules are removed, so that the clipped decision rules B and F are obtained. A sketch of this extraction, screening and clipping follows.
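The extraction, screening and clipping just illustrated can be sketched as follows, reusing the TreeNode sketch above. A rule is represented as the tuple of conditions on the path from the root to a leaf, and meets_constraint is an assumed predicate that checks a rule's index value (e.g. its confidence on the sample total set) against the threshold.

```python
def extract_rules(node, path=()):
    # One decision rule per leaf: the splitting conditions along its root path.
    if node.label is not None:
        return [path]
    return (extract_rules(node.left, path + ((node.feature, node.threshold, '<'),))
            + extract_rules(node.right, path + ((node.feature, node.threshold, '>='),)))

def screen_and_prune(rules, meets_constraint):
    kept = []
    for rule in rules:
        if not meets_constraint(rule):
            continue  # screening: drop rules that violate the constraint
        # Clipping iteration: replace the rule by its parent rule (one
        # condition shorter) while the parent rule still meets the constraint.
        while len(rule) > 1 and meets_constraint(rule[:-1]):
            rule = rule[:-1]
        if rule not in kept:  # remove repeated rules after clipping
            kept.append(rule)
    return kept
```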
A decision model formed from the decision rules so screened and clipped has simpler rules and avoids overfitting, and therefore has better generalization ability and operating efficiency.
In one embodiment, in order to further improve the performance of the decision model, a scheme for iteratively training decision trees is also provided. This is because a single decision tree, especially after rule screening and clipping, often provides only a limited number of usable decision rules, which can sometimes reduce the recall rate of the whole decision model. For this reason, the following manner of enhancement iteration is proposed.
According to this embodiment, after obtaining the first decision tree according to step 22 of fig. 2, each sample of the sample total set is predicted by using the first decision tree, so as to obtain a first sample set composed of the samples predicted to belong to the target business classification. In other words, the first sample set is the set of samples that the first decision tree predicts as positive. Then, the first sample set is removed from the sample total set to obtain a second sample set; here, the samples in the first sample set are all removed, regardless of whether they are true positive samples. The second sample set thus obtained consists of the samples not covered by the first decision tree. Then, a second decision tree is constructed from the second sample set using the same node splitting approach as was used for the first decision tree.
Accordingly, in step 23, the decision model may be determined based on the first decision tree and the second decision tree obtained by training in sequence. In this step, the first decision tree and/or the second decision tree may optionally be subjected to rule screening and/or rule clipping to obtain a final decision model.
It will be appreciated that if the recall of the first decision tree plus the second decision tree is still less than ideal, a third decision tree, and possibly a fourth, may continue to be trained in the manner described above until the recall reaches the desired level, as in the sketch below.
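The enhancement iteration could be sketched as the following loop. Here train_tree, predict_positive and target_recall are illustrative stand-ins for the constraint-adaptive tree construction, tree prediction, and the desired recall level; none of these names come from this specification.

```python
def iterative_training(X, y, train_tree, predict_positive, target_recall):
    trees = []
    remaining = list(zip(X, y))
    total_pos = sum(y)           # number of true positive samples overall
    covered_pos = 0              # true positives covered by the trees so far
    while remaining and total_pos and covered_pos / total_pos < target_recall:
        tree = train_tree(remaining)
        trees.append(tree)
        hit = [(x, yi) for x, yi in remaining if predict_positive(tree, x)]
        if not hit:
            break  # the new tree adds no coverage; stop iterating
        covered_pos += sum(yi for _, yi in hit)
        # Remove all predicted positives, true or not, before the next round.
        remaining = [(x, yi) for x, yi in remaining if not predict_positive(tree, x)]
    return trees
```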
Reviewing the above process: in the embodiments of this specification, in order to train a decision model with good interpretability for the task of business object classification, a decision tree is adopted as the base model. In order to better adapt to the constraint conditions proposed for the training task, a constraint-adaptive decision tree generation method is proposed: when selecting the splitting condition of a node, not only the information gain, or splitting purity, before and after splitting is considered, but also the degree to which the splitting result satisfies the constraint condition, that is, the constraint fitness. The decision tree thus obtained adapts better to the preset constraints. Furthermore, effective decision rules can be extracted from the constraint-adaptive decision tree with further reference to the constraint condition, so as to determine the final decision model.
According to another embodiment, a decision model training device for business object classification is also provided, and the device can be deployed on any equipment or platform with computing and processing capabilities. Fig. 5 shows a schematic diagram of a training apparatus according to an embodiment. As shown in fig. 5, the training apparatus 500 includes:
an obtaining unit 51, configured to obtain a sample total set and a training constraint condition, where a single sample in the sample total set includes an attribute feature of a single business object and a classification label of whether the business object belongs to a target business classification;
a decision tree constructing unit 52, configured to construct a first decision tree by means of node splitting according to the sample total set, where the process of splitting for any current node includes: for any one splitting condition in a plurality of alternative splitting conditions of the current node, determining the constraint fitness of the splitting condition according to the conformity degree of two child nodes obtained by splitting the current node according to the splitting condition to the constraint condition; determining a comprehensive score of the splitting condition according to the splitting purity of the splitting condition and the constraint fitness; splitting the current node according to the splitting condition with the optimal comprehensive score in the multiple alternative splitting conditions;
a model determining unit 53 configured to determine a decision model for classifying the business object based on the first decision tree.
In an embodiment, the decision tree building unit 52 is further configured to determine the multiple candidate splitting conditions according to attribute feature values of each service object in the current sample set falling into the current node.
Further, in one example, the attribute feature includes a plurality of attribute features of numerical type; determining the plurality of candidate splitting conditions specifically comprises: enumerating possible values of the multiple attribute features in the current sample set, and taking a combination of one attribute feature and one value of the attribute feature as an alternative splitting condition.
According to one possible embodiment, the constraint condition may include an evaluation index predicted for the sample and an index threshold that the evaluation index should meet; the decision tree construction unit 52 may be specifically configured to: for any child node in the two child nodes, determine an index value of the evaluation index according to sample prediction performed according to the decision rule corresponding to the child node, and determine the constraint conformity of the child node according to a comparison between the index value and the index threshold; and determine the larger of the constraint conformities of the two child nodes as the constraint fitness of the splitting condition.
Further, in various embodiments, the aforementioned evaluation index may include one of: confidence, recall rate, number of recalls, stability.
In an embodiment, the determining the constraint conformity of the child node according to the comparison between the index value and the index threshold specifically includes: if the index value meets the index threshold, determining the constraint conformity of the child node to be 0; and if the index value does not meet the index threshold, taking the negative of the absolute value of the difference between the index value and the index threshold as the constraint conformity of the child node.
According to one embodiment, the decision tree building unit 52 is further configured to: and determining the splitting purity of the splitting condition according to the sample purity of the current sample set corresponding to the current node and the sample purities of the two sample subsets corresponding to the two sub-nodes respectively.
Further, the aforementioned sample purity may be determined based on one of the following indicators: information entropy, Gini coefficient.
In one embodiment, the determining the cleavage purity of the cleavage condition specifically includes: taking the ratio of the number of the samples of the two sample subsets to the number of the samples of the current sample set as respective weights, and performing weighted summation on the sample purities of the two sample subsets to obtain a sum value; determining a splitting purity for the splitting condition based on a difference between the sample purity for the current sample set and the sum value.
According to one possible embodiment, the decision tree construction unit 52 is specifically configured to determine the composite score of the splitting condition by: and respectively taking the first weight and the second weight as weight factors, and carrying out weighted summation on the splitting purity and the constraint fitness to obtain the comprehensive score.
Further, in an example, the first weight may be determined according to a first variance of a plurality of fragmentation purities corresponding to the plurality of candidate fragmentation conditions, respectively, and is negatively correlated with the first variance; the second weight is determined according to a second variance of a plurality of constraint fitness degrees respectively corresponding to the plurality of candidate splitting conditions, and is inversely related to the second variance.
According to one possible embodiment, the first decision tree comprises a first number N of leaf nodes; the model determining unit 53 is specifically configured to: determining N decision rules corresponding to N paths from a root node to N leaf nodes in the first decision tree; screening out a second number M of decision rules which meet the constraint condition from the N decision rules; forming the decision model based on the M decision rules.
Further, in an embodiment, the model determining unit 53 is further configured to: for each of the M decision rules, performing the following clipping iterations: if the parent rule of the current decision rule meets the constraint condition, replacing the current decision rule with the parent rule until the parent rule does not meet the constraint condition any more; the current decision rule corresponds to a first node sequence starting from a root node in the first decision tree, and the father rule is a decision rule corresponding to a node sequence obtained by cutting the last node in the first node sequence; and forming the decision model based on a non-repeated decision rule obtained after the cutting iteration is executed.
According to a possible embodiment, the apparatus further comprises a second building unit (not shown) configured to: predicting each sample of the sample collection by using the first decision tree to obtain a first sample set formed by samples predicted to belong to the target service classification; removing the first sample set from the total sample set to obtain a second sample set; and constructing a second decision tree by utilizing the same node splitting mode as the first decision tree according to the second sample set. Accordingly, the model determining unit 53 is configured to: determining the decision model based on the first decision tree and the second decision tree.
In various embodiments, the aforementioned business object may comprise one of: a user, an operation event, a transaction, a business application request; the target business classification indicates business objects at risk.
Through the device, a constraint self-adaptive decision tree can be trained, and a more effective decision model can be determined.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 and 3.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2 and 3.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
Claims (18)
1. A decision model training method for business object classification comprises the following steps:
acquiring a sample total set and a training constraint condition, wherein a single sample in the sample total set comprises attribute features of a single business object and a classification label indicating whether the business object belongs to a target business classification;
according to the sample collection, a first decision tree is constructed in a node splitting mode, wherein the process of splitting any current node comprises the following steps: for any one splitting condition in a plurality of alternative splitting conditions of the current node, determining the constraint fitness of the splitting condition according to the conformity degree of two child nodes obtained by splitting the current node according to the splitting condition to the constraint condition; determining a comprehensive score of the splitting condition according to the splitting purity of the splitting condition and the constraint fitness; splitting the current node according to the splitting condition with the optimal comprehensive score in the multiple alternative splitting conditions;
based on the first decision tree, a decision model for classifying a business object is determined.
2. The method of claim 1, wherein splitting for any current node further comprises:
and determining the multiple alternative splitting conditions according to the attribute characteristic values of each service object in the current sample set of the current node.
3. The method of claim 2, wherein the attribute features comprise a plurality of attribute features of a numerical type; determining the plurality of candidate splitting conditions, comprising: enumerating possible values of the multiple attribute features in the current sample set, and taking a combination of one attribute feature and one value of the attribute feature as an alternative splitting condition.
4. The method of claim 1, wherein the constraints include an evaluation index predicted for the sample and an index threshold that the evaluation index should meet; the determining the constraint fitness of the splitting condition specifically includes:
for either of the two child nodes, determining an index value of the evaluation index based on sample prediction performed with the decision rule corresponding to that child node, and determining a constraint conformity of the child node by comparing the index value with the index threshold;
and determining the larger of the constraint conformities of the two child nodes as the constraint fitness of the splitting condition.
5. The method of claim 4, wherein the evaluation index comprises one of:
confidence, recall rate, number of recalls, stability.
6. The method of claim 4, wherein determining the constraint conformity of the child node by comparing the index value with the index threshold specifically comprises:
if the index value meets the index threshold, determining the constraint conformity of the child node to be 0;
and if the index value does not meet the index threshold, taking the negative of the absolute difference between the index value and the index threshold as the constraint conformity of the child node.
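Illustrative note (not part of the claims): a minimal sketch of claims 4 and 6, under the assumption that "meets the index threshold" means the index value is at least the threshold (as for a minimum recall rate). The computation of the index value from a child node's decision rule is abstracted away as an input.

```python
# Constraint conformity (claim 6) and constraint fitness (claim 4).

def node_conformity(index_value, threshold):
    # Conformity is 0 when the threshold is met, otherwise the negative of
    # the absolute gap, so that values closer to 0 are better.
    if index_value >= threshold:
        return 0.0
    return -abs(index_value - threshold)

def constraint_fitness(left_index_value, right_index_value, threshold):
    # Claim 4: the fitness of the split is the larger conformity of its
    # two child nodes.
    return max(node_conformity(left_index_value, threshold),
               node_conformity(right_index_value, threshold))
```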
7. The method of claim 1, wherein the process of splitting any current node further comprises:
determining the splitting purity of the splitting condition according to the sample purity of the current sample set corresponding to the current node and the sample purities of the two sample subsets respectively corresponding to the two child nodes.
8. The method of claim 7, wherein the sample purity is determined based on one of the following indicators: information entropy, Gini coefficient.
9. The method of claim 7, wherein determining the splitting purity of the splitting condition specifically comprises:
taking the ratios of the sample counts of the two sample subsets to the sample count of the current sample set as their respective weights, and computing a weighted sum of the sample purities of the two sample subsets to obtain a sum value;
and determining the splitting purity of the splitting condition based on the difference between the sample purity of the current sample set and the sum value.
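Illustrative note (not part of the claims): a minimal sketch of claims 7-9 using information entropy as the purity indicator (claim 8 equally permits the Gini coefficient), assuming binary 0/1 classification labels and that the two subsets partition the current sample set.

```python
# Splitting purity as an information-gain-style difference (claims 7-9).
import math

def entropy(labels):
    """Impurity of a set of 0/1 labels; lower means purer."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positive labels
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def split_purity(parent_labels, left_labels, right_labels):
    """Parent impurity minus the size-weighted sum of the two children's
    impurities (claim 9); larger means a better split."""
    n = len(parent_labels)
    weighted_sum = (len(left_labels) / n) * entropy(left_labels) \
                 + (len(right_labels) / n) * entropy(right_labels)
    return entropy(parent_labels) - weighted_sum
```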
10. The method of claim 1, wherein determining the composite score of the splitting condition according to the splitting purity of the splitting condition and the constraint fitness comprises:
taking a first weight and a second weight as weight factors, and computing a weighted sum of the splitting purity and the constraint fitness to obtain the composite score.
11. The method of claim 10, wherein
the first weight is determined according to a first variance of the plurality of splitting purities respectively corresponding to the plurality of candidate splitting conditions and is inversely related to the first variance; and
the second weight is determined according to a second variance of the plurality of constraint fitnesses respectively corresponding to the plurality of candidate splitting conditions and is inversely related to the second variance.
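Illustrative note (not part of the claims): claim 11 only requires the weights to be inversely related to the respective variances; the mapping 1/(variance + eps) below is one assumed realization, not the claimed formula.

```python
# Variance-based first and second weights (claim 11), one possible mapping.
import statistics

def variance_based_weights(purities, fitnesses, eps=1e-9):
    """purities / fitnesses: the scores of all candidate splitting conditions.

    Returns normalized (first_weight, second_weight); the lower the spread
    of a score across candidates, the more weight that score receives.
    """
    v1 = statistics.pvariance(purities)
    v2 = statistics.pvariance(fitnesses)
    w1, w2 = 1.0 / (v1 + eps), 1.0 / (v2 + eps)
    total = w1 + w2
    return w1 / total, w2 / total
```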
12. The method of claim 1, wherein the first decision tree comprises a first number N of leaf nodes, and determining the decision model for classifying a business object based on the first decision tree comprises:
determining N decision rules respectively corresponding to the N paths from the root node to the N leaf nodes in the first decision tree;
screening out, from the N decision rules, a second number M of decision rules that meet the constraint condition;
and forming the decision model based on the M decision rules.
13. The method of claim 12, wherein forming the decision model based on the M decision rules comprises:
for each of the M decision rules, performing the following clipping iteration: if the parent rule of the current decision rule meets the constraint condition, replacing the current decision rule with its parent rule, until the parent rule no longer meets the constraint condition, wherein the current decision rule corresponds to a first node sequence starting from the root node in the first decision tree, and the parent rule is the decision rule corresponding to the node sequence obtained by cutting off the last node of the first node sequence;
and forming the decision model based on the non-repeated decision rules obtained after the clipping iteration is performed.
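Illustrative note (not part of the claims): a minimal sketch of the clipping iteration of claims 12-13, modeling a decision rule as the tuple of split predicates on its root-to-leaf path, so that the parent rule is the same tuple with the last predicate removed. `meets_constraint` is an assumed callback checking the constraint condition for a rule.

```python
# Clipping iteration over the M screened decision rules (claim 13).

def clip_rules(rules, meets_constraint):
    """rules: list of tuples of predicates, each already meeting the
    constraint (the M rules screened out in claim 12).

    Returns the de-duplicated rule set after clipping."""
    clipped = set()
    for rule in rules:
        # Replace the rule with its parent rule while the parent still
        # meets the constraint condition.
        while len(rule) > 1 and meets_constraint(rule[:-1]):
            rule = rule[:-1]
        clipped.add(rule)
    return list(clipped)
```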
14. The method of claim 1, further comprising:
predicting each sample in the sample total set by using the first decision tree, to obtain a first sample set formed of samples predicted to belong to the target business classification;
removing the first sample set from the sample total set to obtain a second sample set;
constructing a second decision tree according to the second sample set, using the same node splitting manner as in constructing the first decision tree; and
wherein determining the decision model for classifying a business object based on the first decision tree comprises:
determining the decision model based on the first decision tree and the second decision tree.
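Illustrative note (not part of the claims): a minimal sketch of the two-stage construction of claim 14; `build_tree` and `predict` are assumed helpers implementing the node-splitting procedure of claim 1, and only the sample-set bookkeeping is shown.

```python
# Two-stage tree construction on the residual samples (claim 14).

def build_cascade(all_samples, build_tree, predict):
    first_tree = build_tree(all_samples)
    # Samples the first tree already assigns to the target classification.
    first_set = [s for s in all_samples if predict(first_tree, s)]
    # The remaining samples form the training set for the second tree.
    second_set = [s for s in all_samples if not predict(first_tree, s)]
    second_tree = build_tree(second_set)
    return first_tree, second_tree
```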
15. The method of any one of claims 1-14, wherein the business object comprises one of: a user, an operation event, a transaction, a service application request; and the target business classification indicates business objects at risk.
16. A decision model training apparatus for business object classification, comprising:
an acquisition unit configured to acquire a sample total set and a training constraint condition, wherein a single sample in the sample total set comprises attribute features of a single business object and a classification label indicating whether the business object belongs to a target business classification;
a decision tree construction unit configured to construct a first decision tree by node splitting according to the sample total set, wherein the process of splitting any current node comprises: for any one of a plurality of candidate splitting conditions of the current node, determining a constraint fitness of the splitting condition according to the degree to which two child nodes, obtained by splitting the current node under the splitting condition, conform to the constraint condition; determining a composite score of the splitting condition according to a splitting purity of the splitting condition and the constraint fitness; and splitting the current node according to the splitting condition with the best composite score among the plurality of candidate splitting conditions; and
a model determination unit configured to determine a decision model for classifying a business object based on the first decision tree.
17. A computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1-15.
18. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1-15.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110373889.8A CN112801231B (en) | 2021-04-07 | 2021-04-07 | Decision model training method and device for business object classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110373889.8A CN112801231B (en) | 2021-04-07 | 2021-04-07 | Decision model training method and device for business object classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112801231A (en) | 2021-05-14
CN112801231B (en) | 2021-07-06
Family
ID=75816474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110373889.8A Active CN112801231B (en) | 2021-04-07 | 2021-04-07 | Decision model training method and device for business object classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112801231B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020087499A1 (en) * | 2001-01-03 | 2002-07-04 | Stockfisch Thomas P. | Methods and systems of classifying multiple properties simultaneously using a decision tree |
US20170061331A1 (en) * | 2007-01-04 | 2017-03-02 | Health Care Productivity, Inc | Methods and systems for automatic selection of classification and regression trees having preferred consistency and accuracy |
EP2416283A1 (en) * | 2009-03-30 | 2012-02-08 | Fujitsu Limited | Decision tree generation program, decision tree generation method, and decision tree generation apparatus |
CN105095238A (en) * | 2014-05-04 | 2015-11-25 | 中国银联股份有限公司 | Decision tree generation method used for detecting fraudulent trade |
CN110084377A (en) * | 2019-04-30 | 2019-08-02 | 京东城市(南京)科技有限公司 | Method and apparatus for constructing decision tree |
CN110309587A (en) * | 2019-06-28 | 2019-10-08 | 京东城市(北京)数字科技有限公司 | Decision model construction method, decision-making technique and decision model |
CN111046930A (en) * | 2019-12-01 | 2020-04-21 | 国家电网有限公司客户服务中心 | Power supply service satisfaction influence factor identification method based on decision tree algorithm |
US10713622B1 (en) * | 2019-12-06 | 2020-07-14 | Coupang Corp. | Computer-implemented systems and methods for intelligent prediction of out of stock items and proactive reordering |
CN111598186A (en) * | 2020-06-05 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Decision model training method, prediction method and device based on longitudinal federal learning |
CN111695697A (en) * | 2020-06-12 | 2020-09-22 | 深圳前海微众银行股份有限公司 | Multi-party combined decision tree construction method and device and readable storage medium |
CN111738534A (en) * | 2020-08-21 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Training of multi-task prediction model, and prediction method and device of event type |
Non-Patent Citations (4)
Title |
---|
RIADH E. et al.: "Decision-making based on decision tree for ball bearing monitoring", 2020 2nd International Workshop on Human-Centric Smart Environments for Health and Well-Being (IHSH) * |
YAO Yuesong: "Decision Tree Induction Algorithm Based on Attribute Purity", China Masters' Theses Full-text Database, Basic Sciences * |
ZHANG Xiqian et al.: "Multi-Decision-Tree Algorithm Based on a Cost-Sensitive Hybrid Splitting Strategy", Application of Electronic Technique * |
JIA Tao et al.: "Survey of Decision Tree Classification Methods for Data Streams", Journal of Nanjing Normal University (Natural Science Edition) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113379301A (en) * | 2021-06-29 | 2021-09-10 | 未鲲(上海)科技服务有限公司 | Method, device and equipment for classifying users through decision tree model |
WO2023272852A1 (en) * | 2021-06-29 | 2023-01-05 | 未鲲(上海)科技服务有限公司 | Method and apparatus for classifying user by using decision tree model, device and storage medium |
CN114492214A (en) * | 2022-04-18 | 2022-05-13 | 支付宝(杭州)信息技术有限公司 | Method and device for determining selection operator and optimizing strategy combination by using machine learning |
CN114677202A (en) * | 2022-04-27 | 2022-06-28 | 中国工商银行股份有限公司 | Type identification method, training method and device, electronic device and storage medium |
CN114997317A (en) * | 2022-06-14 | 2022-09-02 | 蚂蚁区块链科技(上海)有限公司 | Method and device for training wind control model and predicting risk category |
CN115987390A (en) * | 2022-12-15 | 2023-04-18 | 中国电信股份有限公司 | Device abnormality detection method, device and electronic device |
CN115987390B (en) * | 2022-12-15 | 2025-01-03 | 中国电信股份有限公司 | Method and device for detecting equipment abnormality and electronic equipment |
CN116189215A (en) * | 2022-12-30 | 2023-05-30 | 中国人民财产保险股份有限公司 | Automatic auditing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112801231B (en) | 2021-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112801231B (en) | Decision model training method and device for business object classification | |
US6397200B1 (en) | Data reduction system for improving classifier performance | |
CN108364195B (en) | User retention probability prediction method and device, prediction server and storage medium | |
CN112508580A (en) | Model construction method and device based on rejection inference method and electronic equipment | |
CN111210072B (en) | Prediction model training and user resource limit determining method and device | |
CN113537630A (en) | Training method and device of business prediction model | |
CN111815209A (en) | Data dimension reduction method and device applied to wind control model | |
US6789070B1 (en) | Automatic feature selection system for data containing missing values | |
CN110880117A (en) | False service identification method, device, equipment and storage medium | |
Khoshgoftaar et al. | Detecting outliers using rule-based modeling for improving CBR-based software quality classification models | |
CN112906785A (en) | Zero-sample object type identification method, device and equipment based on fusion | |
CN117993579A (en) | Project declaration flow self-adaptive optimization method, device, equipment and storage medium | |
CN117876018A (en) | Method, device, electronic equipment and storage medium for identifying and predicting potential customers | |
CN117336088A (en) | Self-adaptive trust evaluation method and system for intelligent Internet of things | |
CN116843343A (en) | Intelligent identification method and system for automobile financial risk | |
CN113570114B (en) | Resource service intelligent matching method, system and computer equipment | |
JP7283548B2 (en) | LEARNING APPARATUS, PREDICTION SYSTEM, METHOD AND PROGRAM | |
CN112836749A (en) | A system resource adjustment method, device and device | |
CN118607929B (en) | Data processing method and system applied to supply chain management platform | |
CN114978616A (en) | Method and device for constructing risk assessment system and method and device for risk assessment | |
KR101092644B1 (en) | Application component selection method and device | |
CN116362333A (en) | Rule learning method for financial application decision, financial application decision method and device | |
CN117893309A (en) | User grade determining method and device, storage medium and electronic equipment | |
CN119293549A (en) | A bank customer classification method, device, electronic device and computer-readable storage medium | |
CN115795470A (en) | Method and device for identifying security level of optimization model hyper-parameter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||