
CN119558961A - A risk prediction method, device, electronic device, storage medium and program product - Google Patents


Info

Publication number
CN119558961A
Authority
CN
China
Prior art keywords
predicted
resource application
information
features
risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411708298.1A
Other languages
Chinese (zh)
Inventor
王熹佳
安义文
肖宏宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202411708298.1A
Publication of CN119558961A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a risk prediction method, apparatus, electronic device, storage medium and program product, and relates to the field of computer technology. A specific implementation includes: acquiring object basic features of an object to be predicted and historical resource application features of the object to be predicted; merging the object basic features and the historical resource application features to obtain a target vector; searching for feature information in the target vector according to a set target; and inputting the feature information into a prediction model to obtain a category probability, the category probability being the predicted probability characterizing whether the object to be predicted has a resource application risk. By acquiring the object basic features and historical resource application features of the object to be predicted and inputting them into the prediction model, the resource application risk probability of the object is obtained, so that its future resource application risk can be identified and corresponding measures can be taken to manage its resource applications.

Description

Risk prediction method, apparatus, electronic device, storage medium and program product
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a risk prediction method, apparatus, electronic device, storage medium, and program product.
Background
Online credit services are an important part of the business of financial institutions. With the development of internet technology, more and more financial institutions offer personal loan services to customers through online channels. Although online credit services are convenient and fast, they also carry higher risk, so effectively managing online credit risk has become one of the main concerns of financial institutions.
How to take appropriate risk management measures before providing personal loan services to customers is a problem that currently needs to be solved.
Disclosure of Invention
The invention provides a risk prediction method, apparatus, electronic device, storage medium and program product for predicting the probability that an object to be predicted has a resource application risk.
According to an aspect of the present invention, there is provided a risk prediction method including:
acquiring object basic features of an object to be predicted and historical resource application features of the object to be predicted;
merging the object basic features and the historical resource application features to obtain a target vector;
searching for feature information in the target vector according to a set target; and
inputting the feature information into a prediction model to obtain a category probability, wherein the category probability is the predicted probability characterizing whether the object to be predicted has a resource application risk.
According to another aspect of the present invention, there is provided a risk prediction apparatus including:
an acquisition module, configured to acquire object basic features of an object to be predicted and historical resource application features of the object to be predicted;
a merging module, configured to merge the object basic features and the historical resource application features to obtain a target vector;
a searching module, configured to search for feature information in the target vector according to a set target; and
an input module, configured to input the feature information into a prediction model to obtain a category probability, wherein the category probability is the predicted probability characterizing whether the object to be predicted has a resource application risk.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the risk prediction method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a risk prediction method according to any one of the embodiments of the present invention.
According to another aspect of the invention, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the risk prediction method according to any of the embodiments of the invention.
According to the technical solution of the embodiments of the present invention, the object basic features and the historical resource application features of the object to be predicted are first acquired; the two are then merged to obtain a target vector; feature information in the target vector is searched for according to a set target; and finally the feature information is input into the prediction model to obtain a category probability, which is the predicted probability characterizing whether the object to be predicted has a resource application risk. Acquiring the object basic features and historical resource application features of the object to be predicted and inputting them into the prediction model yields the resource application risk probability of the object, which makes it possible to identify its future resource application risk and to take corresponding measures to manage its resource applications.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a risk prediction method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a prediction model iteration method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a risk prediction apparatus according to a third embodiment of the present invention;
Fig. 4 is a block diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to state explicitly that the requested operation will require obtaining and using the user's personal information. Based on the prompt information, the user can then decide autonomously whether to provide personal information to the software or hardware, such as an electronic device, application program, server, or storage medium, that performs the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, the prompt information may be sent to the user, in response to receiving an active request from the user, by means of a pop-up window, for example, in which the prompt information may be presented as text. The pop-up window may also carry a selection control that lets the user choose 'agree' or 'disagree' to providing personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
It will be appreciated that the data (including but not limited to the data itself, the acquisition or use of the data) involved in the present technical solution should comply with the corresponding legal regulations and the requirements of the relevant regulations.
Example 1
Fig. 1 is a flowchart of a risk prediction method provided in the first embodiment of the present invention. The method is applicable to performing risk prediction on an object to be predicted and may be performed by a risk prediction apparatus. The apparatus may be implemented in hardware and/or software and may be configured in an electronic device, which is a computer device capable of training and using a prediction model. As shown in Fig. 1, the method includes:
S110, obtaining object basic characteristics of an object to be predicted and historical resource application characteristics of the object to be predicted.
In this embodiment, the object to be predicted may be understood as an object for which resource application risk prediction is required, for example a user who needs to be assessed and for whom the features required for prediction are available. The object basic features may be understood as features describing the object itself, such as its age, sex, and occupation. The historical resource application features may be understood as features describing the object's resource applications during a historical period.
Specifically, an object to be predicted for which prediction is possible is selected from a database. First, the various features describing the object itself are acquired, cross statistical information among these features may be extracted, and the result is used as the object basic features of the object to be predicted. Likewise, the features related to the object's resource applications are acquired and used as its historical resource application features.
For example, features such as the age, sex, occupation category, and credit information of the object to be predicted may be acquired, cross statistical information among them may be extracted by factorization, and the result may be used as the object basic features. Likewise, features related to the object's resource applications, such as the time and number of applications, may be acquired as its historical resource application features.
Optionally, the object basic features include category features of the object to be predicted, credit metric information of the object to be predicted, and information associated with resource return capability; and
the historical resource application information comprises application information and return information of the resources historically applied for by the object to be predicted.
In this embodiment, the category features may be understood as basic information about the object to be predicted, including its age, sex, occupation category, and the like. The credit metric information may be understood as features related to the credit information of the object to be predicted, reflecting its credit level. The information associated with resource return capability may be understood as dense features produced by feature engineering, and may include information about the resources the object has applied for. The historical resource application information may be understood as the records of resources applied for and resources returned by the object during a historical period, and can be obtained through feature engineering. The application information may be understood as information about the resources the object applied for during the historical period, including statistics such as the amount and purpose of the applications. Similarly, the return information may be understood as information about the resources the object returned during the historical period, including statistics such as the amount and time of the returns.
S120, merging the object basic features and the historical resource application features to obtain a target vector.
In this embodiment, the target vector may be understood as the vector obtained by stitching together the object basic features and the historical resource application features, and it is used to store the feature information.
Specifically, the object basic features and the historical resource application features of the object to be predicted are stitched together to obtain a complete target vector. According to this embodiment, feature stitching can reduce the dimension of the target vector: numerical values are computed for the object basic features and the historical resource application features, and these values are combined into a new target vector.
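As a minimal illustration of the stitching step, the sketch below concatenates two already-numeric feature groups into one vector. The function name `merge_features` and all feature values are hypothetical; the source does not specify how categorical values are encoded or how the numerical values are computed.

```python
from typing import Sequence


def merge_features(basic: Sequence[float], history: Sequence[float]) -> list[float]:
    """Concatenate the two feature groups into a single target vector."""
    return [float(x) for x in basic] + [float(x) for x in history]


# Hypothetical object basic features: age, encoded occupation category, credit score.
basic_features = [35, 2, 0.82]
# Hypothetical historical features: number of past applications, number of late returns.
history_features = [4, 1]

target_vector = merge_features(basic_features, history_features)
print(target_vector)
```

In this sketch the target vector simply holds the values of both groups in a fixed order, so later steps can refer to each feature by its position.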
S130, searching the characteristic information in the target vector with a set target.
In this embodiment, the set target may be understood as a feature related to the resource application of the object to be predicted.
Specifically, among the targets related to the resource application of the object to be predicted, a target that satisfies a set condition is determined as the set target. The target vector obtained by stitching the object basic features and the historical resource application features is then searched according to the set target to find the feature information related to the resource application of the object to be predicted.
For example, the search according to the set target may be performed with a particle swarm optimization algorithm with a linearly decreasing inertia weight, so as to find the feature information in the target vector that is related to the resource application of the object to be predicted.
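The source does not give the update rule it uses. For orientation, a commonly used form of particle swarm optimization with a linearly decreasing inertia weight is shown below; all symbols are assumptions introduced here: $x_i^t$ and $v_i^t$ are the position and velocity of particle $i$ at iteration $t$, $p_i$ is its personal best position, $g$ is the global best position, $c_1, c_2$ are acceleration coefficients, $r_1, r_2$ are uniform random numbers in $[0, 1]$, and $T$ is the total number of iterations.

```latex
w_t = w_{\max} - \left(w_{\max} - w_{\min}\right)\frac{t}{T}

v_i^{t+1} = w_t\, v_i^{t} + c_1 r_1 \left(p_i - x_i^{t}\right) + c_2 r_2 \left(g - x_i^{t}\right)

x_i^{t+1} = x_i^{t} + v_i^{t+1}
```

A large inertia weight early on favors exploration of the search space, and the smaller weight in later iterations favors refinement around the best positions found so far.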
S140, inputting the characteristic information into a prediction model to obtain category probability, wherein the category probability is the probability indicating whether the predicted object has a resource application risk or not.
In this embodiment, the prediction model may be understood as a model for predicting the resource application risk of the object to be predicted; it outputs the probability that the object has a resource application risk. The category probability may be understood as the probability produced by the prediction model from the object basic features and historical resource application features of the object to be predicted, and it represents the magnitude of the object's resource application risk. Resource application risk may be understood as the impact on, or uncertainty in, the resource application caused by the behavior of the object to be predicted.
Specifically, the feature information found in the target vector is input into the prediction model to obtain the category probability representing the resource application risk of the object to be predicted. After the category probability is obtained, it may be refined by an optimization algorithm so that it represents the resource application risk of the object more accurately.
The prediction model may be an iteratively trained light gradient boosting machine (LightGBM) model, a tree-based ensemble learning method that uses gradient boosting to combine multiple decision trees into one larger model. Inputting the feature information into the prediction model yields the category probability representing the resource application risk of the object to be predicted. After the category probability is obtained, it may be corrected and refined by a particle-swarm-based index optimization algorithm so that it represents the resource application risk of the object more accurately.
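To make the idea of gradient boosting concrete, the following is a minimal pure-Python sketch that boosts one-split decision trees (stumps) on the log loss and returns a class probability. It is a stand-in for illustration only and is not LightGBM; the function names, the two features, and the toy data are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            error = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or error < best[0]:
                best = (error, j, threshold, lm, rm)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def train_boosting(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the negative gradient of the log loss (y - p)."""
    stumps, scores = [], [0.0] * len(X)
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return stumps, learning_rate


def predict_proba(model, row):
    """Probability of the positive (risk) class for one feature vector."""
    stumps, learning_rate = model
    return sigmoid(sum(learning_rate * stump_predict(s, row) for s in stumps))


# Hypothetical features: [number of late returns, credit score]; label 1 = risk.
X = [[1, 0.9], [2, 0.8], [6, 0.3], [7, 0.2]]
y = [0, 0, 1, 1]
model = train_boosting(X, y)
print(predict_proba(model, [1, 0.9]), predict_proba(model, [7, 0.2]))
```

The ensemble's raw score is the sum of the scaled outputs of all trees, and the sigmoid maps that score to the category probability. A production system would use a library implementation rather than this sketch.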
According to the technical solution of this embodiment, the object basic features and the historical resource application features of the object to be predicted are first acquired; the two are then merged to obtain a target vector; feature information in the target vector is searched for according to a set target; and finally the feature information is input into the prediction model to obtain a category probability, which is the predicted probability characterizing whether the object to be predicted has a resource application risk. Acquiring the object basic features and historical resource application features of the object to be predicted and inputting them into the prediction model yields the resource application risk probability of the object, which makes it possible to identify its future resource application risk and to take corresponding measures to manage its resource applications.
On the basis of the above embodiments, modified embodiments are proposed. For brevity, only the differences from the above embodiments are described in the modified embodiments.
In one embodiment, the searching for feature information in the target vector according to the set target includes:
taking a first evaluation index as the set target and feature selection methods as particles, and searching for the feature information in the target vector by updating the positions and velocities of the particles.
In this embodiment, the first evaluation index may be set according to the object basic features and historical resource application features of the object to be predicted, and is used to select the feature information related to the resource application from the target vector of the object to be predicted. The set condition may be understood as the condition for selecting a set target related to the resource application of the object to be predicted. A feature selection method may be understood as a method of selecting the most relevant features from the raw data, and each feature selection method can serve as one feature selection scheme. In the particle swarm optimization algorithm, each feature selection method is treated as a particle, that is, as a potential solution.
Specifically, the feature whose first evaluation index in the target vector satisfies the set condition is first taken as the set target; the first evaluation index may be an index of the gain or accuracy of the feature information. The particle swarm optimization algorithm is then used with each feature selection method as a particle, and the search for feature information in the target vector is completed by updating the positions and velocities of the particles.
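One possible reading of this step is sketched below: each particle encodes a candidate feature selection scheme as a vector of values in [0, 1], and a feature is selected when its value exceeds 0.5. The fitness used here (total gain of the selected features minus a per-feature penalty) is a hypothetical stand-in for the first evaluation index, and the gain values, coefficients, and function name are assumptions, not taken from the source.

```python
import random


def select_features(gains, n_particles=10, n_iters=30, penalty=0.2, seed=0):
    """Search for a feature mask with a simple particle swarm (illustrative only)."""
    rng = random.Random(seed)
    dim = len(gains)

    def to_mask(position):
        return [p > 0.5 for p in position]

    def fitness(position):
        # Hypothetical first evaluation index: gain of selected features minus a size penalty.
        return sum(g - penalty for g, keep in zip(gains, to_mask(position)) if keep)

    positions = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    global_best = max(personal_best, key=fitness)[:]

    for t in range(n_iters):
        w = 0.9 - (0.9 - 0.4) * t / n_iters  # linearly decreasing inertia weight
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + 1.5 * r1 * (personal_best[i][d] - positions[i][d])
                                    + 1.5 * r2 * (global_best[d] - positions[i][d]))
                # Keep every coordinate inside [0, 1].
                positions[i][d] = min(1.0, max(0.0, positions[i][d] + velocities[i][d]))
            if fitness(positions[i]) > fitness(personal_best[i]):
                personal_best[i] = positions[i][:]
                if fitness(personal_best[i]) > fitness(global_best):
                    global_best = personal_best[i][:]
    return to_mask(global_best), fitness(global_best)


# Hypothetical per-feature gain scores for a four-dimensional target vector.
gains = [0.9, 0.05, 0.6, 0.1]
mask, score = select_features(gains)
print(mask, score)
```

The returned mask marks which positions of the target vector are kept as feature information; the swarm result depends on the random seed and is not guaranteed to be the global optimum.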
In one embodiment, the operation of iteratively updating the weights of the samples during the training phase of the prediction model includes:
searching for the weights using a particle swarm optimization algorithm with a linearly decreasing inertia weight, with the goal of maximizing a second evaluation index.
In this embodiment, the second evaluation index may be understood as an index characterizing the performance of the prediction model, and may be determined from the target value of the prediction model and its corresponding weight.
Specifically, when the weights of the samples are updated, the second evaluation index is first determined from the target value and the weight. With maximization of the second evaluation index as the goal, the optimal weights are searched for using the particle swarm optimization algorithm with a linearly decreasing inertia weight, and the sample weights are updated accordingly.
For example, in the particle swarm optimization algorithm with a linearly decreasing inertia weight, the weight of each sample is first taken as a particle, each particle is assigned an initial position and an initial velocity, and the fitness and the best position of each particle are calculated. The particle with the best fitness among all particles is taken as the global extremum. The position of each particle is then updated according to its velocity, its current position, the global extremum, and so on, and the global extremum is updated in turn, which yields the optimal weights and completes the update of the sample weights.
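The procedure described above can be sketched as follows. This is a simplified one-dimensional illustration in which each particle is a candidate weight value; the fitness function `toy_index` is a hypothetical stand-in for the second evaluation index, and the bounds, coefficients, and inertia range (0.9 down to 0.4) are assumed values rather than parameters given in the source.

```python
import random


def inertia(t: int, total: int, w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Inertia weight that decreases linearly from w_max (t = 0) to w_min (t = total)."""
    return w_max - (w_max - w_min) * t / total


def pso_maximize(fitness, lo=0.0, hi=1.0, n_particles=20, n_iters=50,
                 c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    positions = [rng.uniform(lo, hi) for _ in range(n_particles)]
    velocities = [0.0] * n_particles
    best_pos = positions[:]                      # personal best positions
    best_fit = [fitness(p) for p in positions]   # personal best fitness values
    g = max(range(n_particles), key=lambda i: best_fit[i])
    global_pos, global_fit = best_pos[g], best_fit[g]  # global extremum

    for t in range(n_iters):
        w = inertia(t, n_iters)
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            velocities[i] = (w * velocities[i]
                             + c1 * r1 * (best_pos[i] - positions[i])
                             + c2 * r2 * (global_pos - positions[i]))
            # Move the particle and clamp it to the search interval.
            positions[i] = min(hi, max(lo, positions[i] + velocities[i]))
            f = fitness(positions[i])
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = positions[i], f
                if f > global_fit:
                    global_pos, global_fit = positions[i], f
    return global_pos, global_fit


def toy_index(weight: float) -> float:
    """Hypothetical second evaluation index with its maximum at weight = 0.3."""
    return 1.0 - (weight - 0.3) ** 2


best_weight, best_score = pso_maximize(toy_index)
print(best_weight, best_score)
```

In a real training loop the fitness call would retrain or re-evaluate the prediction model with the candidate weights, which is far more expensive than this toy function.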
In one embodiment, the second evaluation index is a value determined based on a target value and a weight, the target value including a value measuring the combined performance of the precision and recall.
In this embodiment, the target value is an index that averages precision and recall and is used to balance the two; a larger target value indicates better model performance. Precision is the proportion of samples predicted as positive by the prediction model that are actually positive, and measures how accurate the model's positive predictions are. Recall is the proportion of all actually positive samples that the prediction model predicts as positive, and measures how many of the positive samples are correctly identified.
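The sketch below computes precision, recall, and their harmonic mean (the F1 score). Treating the target value as the F1 score is an assumption; the source only says that it averages precision and recall. The labels and predictions are made-up examples.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = resource application risk, 0 = no risk.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here two of the three predicted positives are correct and two of the three actual positives are found, so precision, recall, and F1 all equal 2/3.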
Example 2
Fig. 2 is a flowchart of a prediction model iteration method according to a second embodiment of the present invention. This embodiment elaborates the flow of one iteration in the training phase of the prediction model. As shown in Fig. 2, the method includes:
S210, screening a sample set for the current iteration round from the training samples according to the gradients of the training samples.
In this embodiment, the training samples may be understood as consisting of the feature information found in the target vectors of objects to be predicted. The sample set may be understood as the set of samples selected from the training samples according to their gradients and used for training the prediction model.
Specifically, a target vector is obtained from the object basic features and historical resource application features of the object to be predicted, feature information is searched for in the target vector, and the feature information is used as a training sample. The gradient of each training sample is then determined. Because samples with large gradients have a greater influence on the information gain calculation, in the current iteration round the small portion of training samples with large gradients is retained, the training samples with small gradients are randomly sampled, and the selected training samples are put into the sample set for the current round.
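This screening resembles gradient-based one-side sampling. A minimal sketch under that assumption is shown below: the samples with the largest absolute gradients are always kept, and a random subset of the rest is added. The amplification factor `(1 - top_rate) / other_rate` applied to the sampled small-gradient samples follows the usual description of that technique and is not stated in the source; the rates and gradient values are hypothetical.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {sample index: weight} for the sample set of one iteration round."""
    rng = random.Random(seed)
    n = len(gradients)
    # Sort sample indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(n * top_rate)
    n_other = round(n * other_rate)
    top, rest = order[:n_top], order[n_top:]
    sampled = rng.sample(rest, n_other)  # random draw from the small-gradient samples
    # Assumed re-weighting so the sampled small-gradient part stands in for all of it.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


# Hypothetical per-sample gradients from the current round.
gradients = [0.9, -0.05, 0.02, -0.8, 0.1, 0.03, -0.04, 0.07, 0.01, -0.06]
sample_set = goss_sample(gradients)
print(sample_set)
```

With ten samples, the two with the largest absolute gradients (indices 0 and 3) are always retained, and one of the remaining eight is drawn at random.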
S220, determining whether to perform feature dimension reduction processing or not based on the distribution information of the features in the sample set.
In this embodiment, the distribution information of the features may be understood as the distribution condition of the features of the training sample in the sample set, including sparsity, mutual exclusion, and computational complexity of feature distribution. Feature dimension reduction is understood as an operation of preserving important information of features while reducing the number of features in a sample set, and noise and irrelevant features can be removed.
Specifically, according to the distribution information of the features in the sample set, if the features in the sample set are mostly sparse and a large number of mutually exclusive features exist in the sample set, dimension reduction can be considered on the features in the sample set, so that the operation complexity is reduced. If the features in the sample set do not meet the sparsity and mutual exclusion, feature dimension reduction processing is not needed for the features.
Illustratively, if the distribution information shows that most features in the sample set are sparse and that the sample set contains a large number of mutually exclusive features, i.e., many features are zero in most samples and many sparse features never take non-zero values at the same time, dimension reduction of the features in the sample set may be considered.
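One possible way to express this check is sketched below. The thresholds `sparsity_threshold` and `min_exclusive_pairs`, and the rule that more than half of the features must be sparse, are assumptions chosen for illustration; the embodiment does not fix concrete values.

```python
def should_reduce_dimensions(rows, sparsity_threshold=0.8, min_exclusive_pairs=1):
    """Decide whether feature bundling is worthwhile for a sample set.

    rows is a list of equally long feature lists (one list per sample).
    """
    n_rows, n_feats = len(rows), len(rows[0])
    # For every feature, record the samples in which it is non-zero.
    nonzero = [{r for r in range(n_rows) if rows[r][j] != 0} for j in range(n_feats)]
    # A feature counts as sparse when it is zero in most samples.
    sparse = [j for j in range(n_feats)
              if (n_rows - len(nonzero[j])) / n_rows >= sparsity_threshold]
    # Two sparse features are mutually exclusive when they are never
    # non-zero in the same sample.
    exclusive_pairs = sum(
        1
        for pos, a in enumerate(sparse)
        for b in sparse[pos + 1:]
        if not (nonzero[a] & nonzero[b])
    )
    mostly_sparse = len(sparse) > n_feats / 2
    return mostly_sparse and exclusive_pairs >= min_exclusive_pairs
```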
And S230, if so, binding the mutually exclusive features in the sample set to obtain bound features.
Specifically, when the features in the sample set require dimension reduction, the reduction can be realized through a mutually exclusive feature bundling algorithm. First, a greedy bundling algorithm selects the mutually exclusive features that fuse well; then a merge-exclusive-features algorithm combines these features into new features, completing the feature dimension reduction.
Illustratively, among all features of a sample set there are mutually exclusive features, i.e., features that never take non-zero values at the same time, and these can be bundled into one feature by the mutually exclusive feature bundling algorithm. For example, the algorithm first finds all the mutually exclusive features in the sample set and then bundles them together to form a new feature, where the value of the new feature may be the sum of the original feature values.
S240, constructing a histogram-based decision tree from the bundled features.
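The bundling described above might look like the following sketch. It assumes strict mutual exclusivity (no tolerated conflicts) and merges a bundle by summing the original values, as the example in the text suggests; the function name and the greedy ordering by non-zero count are illustrative assumptions.

```python
def bundle_exclusive_features(rows):
    """Greedily bundle mutually exclusive features and merge each bundle by sum.

    Returns (groups, bundled_rows), where groups lists the original feature
    indices in each bundle and bundled_rows holds the reduced samples.
    """
    n_rows, n_feats = len(rows), len(rows[0])
    nonzero = [{r for r in range(n_rows) if rows[r][j] != 0} for j in range(n_feats)]
    # Visit features with many non-zero entries first (greedy heuristic).
    order = sorted(range(n_feats), key=lambda j: len(nonzero[j]), reverse=True)

    bundles = []  # each entry: (feature indices, samples already occupied)
    for j in order:
        for feats, occupied in bundles:
            if not (occupied & nonzero[j]):  # no sample where both are non-zero
                feats.append(j)
                occupied |= nonzero[j]
                break
        else:
            bundles.append(([j], set(nonzero[j])))

    groups = [sorted(feats) for feats, _ in bundles]
    # Exclusive features never collide, so at most one term of each sum is non-zero.
    bundled_rows = [[sum(row[j] for j in group) for group in groups] for row in rows]
    return groups, bundled_rows
```

Note that a plain sum no longer records which original feature a value came from when the value ranges overlap; published descriptions of exclusive feature bundling shift each feature by an offset before merging to keep them distinguishable.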
In this embodiment, the decision tree is a tree structure of nodes and edges that can be used to infer target variables from features.
Specifically, the bundled features are used as input features. To construct a histogram, continuous input features are first discretized into a limited number of bins. While the input features are traversed, statistics are accumulated in the histogram using the discretized values as indices. After one pass over the data, the histogram holds the required statistics; the optimal split point is then found by traversing the discrete values of the histogram, and the root node, child nodes, and decision rules of the decision tree are determined from the optimal split points, completing the construction of the histogram-based decision tree.
For example, for a feature with values in the range [0,100], if it is discretized into 10 intervals [0,10), [10,20), ..., [90,100], only the statistics of these 10 intervals need to be calculated, which simplifies the calculation.
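The histogram construction and split search can be sketched as follows for a single feature. The equal-width binning matches the [0,100] example; the gain formula (a variance-style gain computed from gradient sums) is an assumption commonly used in gradient boosting and is not specified in this embodiment.

```python
def best_histogram_split(values, gradients, n_bins=10, lo=0.0, hi=100.0):
    """Find the best split threshold for one feature using a histogram.

    Returns (threshold, gain); threshold is None if no split improves on
    leaving the node unsplit.
    """
    width = (hi - lo) / n_bins
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    # One pass over the data: accumulate statistics per bin.
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)  # value hi falls in the last bin
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), sum(count)
    best_gain, best_threshold = 0.0, None
    left_g, left_n = 0.0, 0
    # Only n_bins - 1 candidate split points need to be examined.
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = (left_g ** 2 / left_n + right_g ** 2 / right_n
                - total_g ** 2 / total_n)
        if gain > best_gain:
            best_gain, best_threshold = gain, lo + (b + 1) * width
    return best_threshold, best_gain
```

Because only the bin boundaries are candidate thresholds, the search cost depends on the number of bins rather than on the number of distinct feature values.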
S250, training the prediction model based on the decision tree.
Specifically, the prediction model is trained with histogram-based decision trees. A sample set for the current iteration is first selected from the training samples by gradient-based sampling. The features that need dimension reduction are then reduced by the mutually exclusive feature bundling algorithm to obtain the bundled features. Finally, a histogram-based decision tree is constructed from the bundled features and used to train the prediction model. The above describes a single training iteration; by selecting a different sample set from the training samples in each iteration, the model can be trained over multiple iterations, finally yielding the prediction model for the object to be predicted.
According to the technical scheme of the embodiment of the invention, firstly, a sample set of the round of iteration is selected from training samples according to the gradient of the training samples, whether feature dimension reduction processing is carried out is determined based on the distribution information of features in the sample set, and if so, mutually exclusive features in the sample set are bound to obtain bound features. And finally, constructing a decision tree based on a histogram based on the bundled features, and training the prediction model based on the decision tree. Each iteration flow of the training stage of the prediction model is described in detail, and the calculation difficulty of model training is reduced, the model training speed is improved, and the performance of the prediction model is optimized through feature dimension reduction and feature binding.
Example III
Fig. 3 is a schematic structural diagram of a risk prediction apparatus according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes:
an obtaining module 310, configured to obtain an object basic feature of an object to be predicted and a historical resource application feature of the object to be predicted;
a merging module 320, configured to merge the object basic feature and the historical resource application feature to obtain a target vector;
A search module 330 for searching for feature information in the target vector with a set target;
the input module 340 is configured to input the feature information into a prediction model, and obtain a category probability, where the category probability is a probability obtained by prediction and representing whether the object to be predicted has a risk of applying for resources.
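The data flow through the four modules can be illustrated with the sketch below. Everything in it is hypothetical: the feature names, the fixed selection mask, and in particular the logistic scoring function, which merely stands in for the trained prediction model and is not the tree-based model described in the second embodiment.

```python
import math

def merge_features(basic, history):
    """Merging module: concatenate both feature groups into one target vector."""
    names = sorted(basic) + sorted(history)
    values = [basic[k] for k in sorted(basic)] + [history[k] for k in sorted(history)]
    return names, values

def search_features(values, mask):
    """Search module stand-in: keep the entries chosen by a selection mask."""
    return [v for v, keep in zip(values, mask) if keep]

def predict_risk_probability(selected, coefficients, bias=0.0):
    """Input module stand-in: map the selected features to a probability."""
    z = bias + sum(c * x for c, x in zip(coefficients, selected))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical object to be predicted.
basic = {"credit_score": 0.7, "income_level": 0.5}
history = {"late_returns": 2.0}
names, target_vector = merge_features(basic, history)
selected = search_features(target_vector, [True, False, True])
probability = predict_risk_probability(selected, coefficients=[-1.0, 0.5])
```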
According to the risk prediction device provided by the embodiment of the invention, the obtaining module first obtains the object basic features of the object to be predicted and the historical resource application features of the object to be predicted; the merging module then merges the object basic features and the historical resource application features to obtain the target vector; the search module searches for feature information in the target vector with a set target; and finally the input module inputs the feature information into the prediction model to obtain the category probability, which is the predicted probability representing whether the object to be predicted has a resource application risk. Through cooperation among the modules, the object basic features and the historical resource application features of the object to be predicted are obtained and input into the prediction model, so that the resource application risk probability of the object to be predicted is obtained, future resource application risks of the object to be predicted are identified, and corresponding measures can be taken to manage the resource applications of the object to be predicted.
In one embodiment, the search module 330 is specifically configured to:
taking the first evaluation index satisfying a set condition as the set target and taking a feature selection method as particles, searching for feature information in the target vector by updating the positions and velocities of the particles.
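A minimal particle swarm search over feature subsets is sketched below. It assumes that each particle encodes one candidate feature subset as a position in [0,1] per feature (selected when the coordinate exceeds 0.5) and that the first evaluation index is supplied as a `fitness` callable to be maximized; the coefficients `w`, `c1`, and `c2` are conventional illustrative values, not values from this embodiment.

```python
import random

def pso_select_features(n_features, fitness, n_particles=10, n_iters=20,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)

    def to_mask(position):
        return [x > 0.5 for x in position]

    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(to_mask(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clamped to [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(to_mask(pos[i]))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return to_mask(gbest), gbest_fit
```

In practice the `fitness` callable would train and evaluate a model on the selected features; a search could also stop early once the index satisfies the set condition.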
In one embodiment, for each iteration of the training phase of the prediction model, the input module 340 is specifically configured to perform:
according to the gradient of the training sample, a sample set of the round of iteration is screened from the training sample;
determining whether to perform feature dimension reduction processing or not based on the distribution information of the features in the sample set;
if yes, binding the mutually exclusive features in the sample set to obtain bound features;
constructing a decision tree based on a histogram based on the bundled features;
training the prediction model based on the decision tree.
In one embodiment, to iteratively update the weights of the samples during the training phase of the prediction model, the input module 340 is specifically configured to perform:
searching for the weights by using a particle swarm optimization algorithm based on linearly decreasing weight values, with the aim of maximizing the second evaluation index.
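The embodiment does not give the formula for the linearly decreasing weight value. A commonly used schedule for the inertia weight in particle swarm optimization is sketched below; the start and end values 0.9 and 0.4 are conventional assumptions, not values from the source.

```python
def linearly_decreasing_weight(iteration, max_iters, w_start=0.9, w_end=0.4):
    """Inertia weight that falls linearly from w_start to w_end.

    iteration is zero-based; the last iteration (max_iters - 1) yields w_end.
    """
    if max_iters <= 1:
        return w_end
    fraction = iteration / (max_iters - 1)
    return w_start - (w_start - w_end) * fraction

# Example schedule over 11 iterations.
schedule = [linearly_decreasing_weight(t, 11) for t in range(11)]
```

A large weight early on favors exploration of the search space, while the smaller weight in later iterations favors refinement around the best positions found so far.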
In one embodiment, the second evaluation index is a value determined based on a target value and a weight, the target value including a value measuring the combined performance of the precision and recall.
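A standard value that measures the combined performance of precision and recall is the F-score, sketched below. Treating the target value as an F-score is an assumption, and the way the weight enters the second evaluation index is not specified in the source, so the sketch computes only the target value.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive and false negative counts.

    beta = 1 gives the harmonic mean of precision and recall (F1).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```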
In one embodiment, the object basic characteristics comprise category characteristics of the object to be predicted, credit measurement information of the object to be predicted and information associated with resource return capability, and the historical resource application information comprises application information and return information of the historical application resource of the object to be predicted.
The risk prediction apparatus provided by the embodiment of the invention can execute the risk prediction method provided by any embodiment of the invention, completes risk prediction through the coordinated operation of its modules, and has the functional modules and beneficial effects corresponding to the executed method.
Example IV
According to an embodiment of the invention, the invention also provides an electronic device, a computer-readable storage medium and a computer program product.
Fig. 4 is a block diagram of an electronic device according to a fourth embodiment of the present invention, where the electronic device may implement the risk prediction method according to the embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 4, the electronic device 410 includes at least one processor 411, and a memory, such as a Read Only Memory (ROM) 412, a Random Access Memory (RAM) 413, etc., communicatively connected to the at least one processor 411, wherein the memory stores a computer program executable by the at least one processor, and the processor 411 may perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 412 or the computer program loaded from the storage unit 418 into the Random Access Memory (RAM) 413. In the RAM 413, various programs and data required for the operation of the electronic device 410 may also be stored. The processor 411, the ROM 412, and the RAM 413 are connected to each other through a bus 414. An input/output (I/O) interface 415 is also connected to bus 414.
Various components in the electronic device are connected to the I/O interface 415, including an input unit 416, such as a keyboard, mouse, etc., an output unit 417, such as various types of displays, speakers, etc., a storage unit 418, such as a magnetic disk, optical disk, etc., and a communication unit 419, such as a network card, modem, wireless communication transceiver, etc. The communication unit 419 allows the electronic device to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The processor 411 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 411 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 411 performs the various methods and processes described above, such as the risk prediction method.
In some embodiments, the risk prediction method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 418. In some embodiments, some or all of the computer program may be loaded and/or installed onto the electronic device 410 via the ROM 412 and/or the communication unit 419. When the computer program is loaded into RAM 413 and executed by processor 411, one or more steps of the risk prediction method described above may be performed. Alternatively, in other embodiments, processor 411 may be configured to perform the risk prediction method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), a blockchain network, and the Internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
In some embodiments, the computer program product comprises a computer program which, when executed by a processor, implements the risk prediction method provided by embodiments of the present invention.
According to the technical scheme, through a risk prediction method, device, electronic equipment, storage medium and program product, object basic characteristics of an object to be predicted and historical resource application characteristics of the object to be predicted are firstly obtained, then the object basic characteristics and the historical resource application characteristics are combined to obtain a target vector, characteristic information in the target vector is searched by setting targets, and finally the characteristic information is input into a prediction model to obtain category probability which is the probability indicating whether the predicted object has resource application risk or not. According to the method, the device and the system, the object basic characteristics and the historical resource application characteristics of the object to be predicted are obtained and input into the prediction model, so that the resource application risk probability of the object to be predicted is obtained, future resource application risk identification of the object to be predicted is realized, and corresponding measures are taken to manage the resource application of the object to be predicted.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A risk prediction method, comprising:
obtaining object basic features of an object to be predicted and historical resource application features of the object to be predicted;
merging the object basic features and the historical resource application features to obtain a target vector;
searching for feature information in the target vector with a set target; and
inputting the feature information into a prediction model to obtain a category probability, wherein the category probability is a predicted probability representing whether the object to be predicted has a resource application risk.

2. The method according to claim 1, wherein searching for feature information in the target vector with a set target comprises:
taking a first evaluation index satisfying a set condition as the set target, using a feature selection method as particles, and searching for the feature information in the target vector by updating the positions and velocities of the particles.

3. The method according to claim 1, wherein each iteration of the training phase of the prediction model comprises:
screening a sample set for the current iteration from training samples according to the gradient magnitudes of the training samples;
determining, based on distribution information of features in the sample set, whether to perform feature dimension reduction;
if so, bundling mutually exclusive features in the sample set to obtain bundled features;
constructing a histogram-based decision tree from the bundled features; and
training the prediction model based on the decision tree.

4. The method according to claim 1, wherein the operation of iteratively updating the weights of samples in the training phase of the prediction model comprises:
searching for the weights by a particle swarm optimization algorithm based on linearly decreasing weight values, with the goal of maximizing a second evaluation index.

5. The method according to claim 4, wherein the second evaluation index is a value determined based on a target value and a weight, and the target value comprises a value measuring the combined performance of precision and recall.

6. The method according to claim 1, wherein the object basic features comprise category features of the object to be predicted, credit metric information of the object to be predicted, and information associated with resource return capability; and
the historical resource application information comprises application information and return information of resources historically applied for by the object to be predicted.

7. A risk prediction apparatus, comprising:
an obtaining module, configured to obtain object basic features of an object to be predicted and historical resource application features of the object to be predicted;
a merging module, configured to merge the object basic features and the historical resource application features to obtain a target vector;
a search module, configured to search for feature information in the target vector with a set target; and
an input module, configured to input the feature information into a prediction model to obtain a category probability, wherein the category probability is a predicted probability representing whether the object to be predicted has a resource application risk.

8. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can perform the risk prediction method according to any one of claims 1 to 6.

9. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the risk prediction method according to any one of claims 1 to 6.

10. A computer program product, comprising a computer program which, when executed by a processor, implements the risk prediction method according to any one of claims 1 to 6.
CN202411708298.1A 2024-11-26 2024-11-26 A risk prediction method, device, electronic device, storage medium and program product Pending CN119558961A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411708298.1A CN119558961A (en) 2024-11-26 2024-11-26 A risk prediction method, device, electronic device, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411708298.1A CN119558961A (en) 2024-11-26 2024-11-26 A risk prediction method, device, electronic device, storage medium and program product

Publications (1)

Publication Number Publication Date
CN119558961A true CN119558961A (en) 2025-03-04

Family

ID=94744718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411708298.1A Pending CN119558961A (en) 2024-11-26 2024-11-26 A risk prediction method, device, electronic device, storage medium and program product

Country Status (1)

Country Link
CN (1) CN119558961A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination