CN109214912A - Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data - Google Patents
- Publication number
- CN109214912A (application CN201810931189.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- application program
- target application
- sample
- target
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Engineering & Computer Science (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiments of the present invention provide a behavior data processing method, a behavior prediction method, and corresponding apparatus, devices and media. The processing method includes: screening out at least one target application based on the difference between the application usage behavior data of first-type sample users and that of second-type sample users in a sample user set, where the first-type sample users include users who have made overdue repayments and the second-type sample users include users who have not; calculating, according to the length of time each user in the sample user set has used the at least one target application, the weight of each target application corresponding to each user; and determining a user overdue repayment prediction model according to the weight of each target application corresponding to each user. The embodiments reduce manual intervention, avoid consuming large amounts of human resources, and reduce prediction deviation caused by subjective human judgment.
Description
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method, an apparatus, a device, and a medium for processing behavior data, and a method, an apparatus, and a medium for predicting behavior.
Background
Personal credit evaluation collects multi-dimensional historical data about a person, generates feature variables from it, and then applies a modeling method to predict the person's future risk of overdue repayment. It is widely used in credit card applications, consumer installment plans, deposit-free leasing and similar fields.
Credit bureau data is currently a widely used basis for decisions by large institutions such as banks. It is strongly associated with overdue risk and is standardized, but it covers a limited population and cannot meet the credit assessment needs of the large population that has no credit record. As a result, more and more institutions are using fragmented, unstructured data to predict an individual's overdue risk.
There are two prior-art approaches to predicting whether a borrower will repay late. The first is risk-control rules based on expert experience: risk-control staff must keep up with market developments, capture the borrower's behavior, and judge from that behavior whether the user carries a high fraud or overdue risk. The second is feature engineering based on expert experience (feature engineering means extracting features from raw data for use by models): staff typically extract features from raw data, train a model on them, and use the model to predict whether a user carries a high fraud or overdue risk.
Both risk-control approaches lack stability: staff must continually check whether the risk-control rules and engineered features remain effective, that is, whether they still separate overdue users from non-overdue users. Manual intervention is prone to bias, and a mistake by a staff member can cause a relatively large deviation. Manual intervention also costs considerable effort and time and lowers efficiency.
Disclosure of Invention
The embodiments of the invention provide a behavior data processing method, a behavior prediction method, and corresponding apparatus, devices and media. A user overdue repayment prediction model is obtained from the way sample users use applications, and this model can predict whether a target user will repay late. This reduces manual intervention, avoids consuming large amounts of human resources, reduces result deviation caused by subjective human judgment, shortens the cycle for predicting whether a user will repay late, and improves prediction efficiency.
In a first aspect, an embodiment of the present invention provides a method for processing behavior data of an application used by a user, including:
acquiring application program use behavior data of all users in a sample user set; wherein the application usage behavior data comprises application installation data and application uninstallation data;
screening out at least one target application program based on the difference between the application program use behavior data of the first type of sample users and the application program use behavior data of the second type of sample users in the sample user set; the first type of sample users comprise users who have made an overdue repayment, and the second type of sample users comprise users who have not made an overdue repayment;
calculating the weight of each target application program in the at least one target application program corresponding to each user according to the length of time each user in the sample user set has used the at least one target application program;
determining a user overdue repayment prediction model according to the weight of each target application program corresponding to each user; the user overdue repayment prediction model is used for predicting whether a target user will make an overdue repayment.
In a second aspect, an embodiment of the present invention provides a user behavior prediction method, including:
calculating the weight of each target application program corresponding to a target user according to the time length of each target application program in at least one target application program used by the target user;
inputting the weight of each target application program corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; the user overdue repayment prediction model is the model determined by the method of the first aspect, and the at least one target application is the at least one target application screened out in the first aspect.
In a third aspect, an embodiment of the present invention provides a device for processing behavior data when a user uses an application program, where the device includes:
the acquisition module is used for acquiring application program use behavior data of all users in the sample user set; wherein the application usage behavior data comprises application installation data and application uninstallation data;
the screening module is used for screening out at least one target application program based on the difference between the application program use behavior data of the first type of sample users and the application program use behavior data of the second type of sample users in the sample user set; the first type of sample users comprise users who have made an overdue repayment, and the second type of sample users comprise users who have not made an overdue repayment;
a calculating module, configured to calculate, according to the length of time each user in the sample user set has used the at least one target application, a weight of each target application in the at least one target application corresponding to each user;
the determining module is used for determining a user overdue repayment prediction model according to the weight of each target application program corresponding to each user; the user overdue repayment prediction model is used for predicting whether a target user will make an overdue repayment.
In a fourth aspect, an embodiment of the present invention provides a device for predicting user behavior, including:
the computing module is used for computing the weight of each target application program corresponding to a target user according to the time length of the target user using each target application program in at least one target application program;
the model prediction module is used for inputting the weight of each target application program corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; the user overdue repayment prediction model is the model determined by the method of the first aspect, and the at least one target application is the at least one target application screened out in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computing device, including: a processor, a memory, and computer program instructions stored in the memory;
the computer program instructions, when executed by the processor, implement the method of the first aspect;
or,
the computer program instructions, when executed by the processor, implement the method of the second aspect.
In a sixth aspect, embodiments of the present invention provide a computer-readable storage medium, having stored thereon computer program instructions,
the computer program instructions, when executed by a processor, implement the method of the first aspect;
or,
the computer program instructions, when executed by a processor, implement the method of the second aspect.
The behavior data processing method, behavior prediction method, apparatus, device and medium of the embodiments can screen out target applications whose usage differs between users who repay late and users who do not, determine the user overdue repayment prediction model from how long sample users use these target applications, and use the resulting model to predict whether a target borrower will repay late. This reduces manual intervention, avoids consuming large amounts of human resources, and reduces prediction deviation caused by subjective human judgment. Because the machine predicts automatically whether a user will repay late, the prediction cycle is shortened and prediction efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for processing behavior data of a user using an application according to an embodiment of the present invention;
FIG. 2 shows ROC curves for a predictive model of overdue repayment by a user;
FIG. 3 shows a histogram of the relationship between the probability range of overdue repayment and the number of samples;
FIG. 4 shows a line graph of the number of overdue-repayment samples in each group as a proportion of the total number of samples in that group;
FIG. 5 is a flowchart illustrating a method for processing behavior data of a user using an application according to an embodiment of the present invention;
FIG. 6 is a block diagram of a device for processing behavior data of a user using an application according to an embodiment of the present invention;
FIG. 7 is a block diagram of a user behavior prediction apparatus according to an embodiment of the present invention;
FIG. 8 is a block diagram illustrating an exemplary hardware architecture of a computing device.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described are merely illustrative of the invention and are not intended to limit the invention. Terms such as first, second, etc. in this document are only used for distinguishing one entity (or operation) from another entity (or operation), and do not indicate any relationship or order between these entities (or operations); in addition, terms such as upper, lower, left, right, front, rear, and the like in the text denote directions or orientations, and only relative directions or orientations, not absolute directions or orientations. Without additional limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other elements in a process, method, article, or apparatus that comprises the element.
Fig. 1 is a flowchart illustrating a method for processing behavior data of a user using an application according to an embodiment of the present invention. The method comprises steps S101 to S104.
S101, acquiring application program use behavior data of all users in a sample user set; wherein the application usage behavior data comprises application installation data and application uninstallation data.
As one example, Application (APP) behavior data of a user is obtained after authorization by the user.
For example, the APP behavior data is an APP behavior list, which is in text format and is typical unstructured data. Unstructured data is data whose structure is irregular or incomplete and that cannot conveniently be represented by a two-dimensional logical database table, including text, images, audio, video, and the like.
The application usage behavior data includes an application installation list, plus application installation data and application uninstallation data covering a sufficiently long behavior observation period (no less than 3 months).
As one example, the application installation data includes, but is not limited to, application installation time and number of applications installed, and the application uninstallation data includes, but is not limited to, application uninstallation time and number of applications uninstalled.
It should be noted that the application may be an application in the financial field, and for example, the application may include: bank applications and financial applications. Of course, the application program may also be an application program in other fields, such as the application program may also include: tool class applications and network communication class applications.
S102, screening out at least one target application program based on the difference between the application program use behavior data of the first type of sample users and the application program use behavior data of the second type of sample users in the sample user set; the first type of sample users comprise users who have overdue repayment, and the second type of sample users comprise users who have not.
As an example, the sample user set contains 10,000 sample users in total, of whom 1500 are first-type sample users and 8500 are second-type sample users.
It should be noted that a first-type sample user is a user who has made an overdue repayment, and a second-type sample user is a user who has not. The screened target applications are used quite differently by the two types of sample users, so they can distinguish users who repay late from users who do not. For example, users in fraud rings or users who frequently repay late may use a particular application more often than users who repay on time.
The first type of sample user may be a negative sample user and the second type of sample user may be a positive sample user. Of course, the first type sample user may be a positive sample user, and the second type sample user may be a negative sample user, which is not limited herein.
S103, calculating the weight of each target application program in the at least one target application program corresponding to each user according to the time length of each user using the at least one target application program in the sample user set.
As one example, the length of time that the user uses the target application may be determined based on the time that the user installs the application and the time that the application is uninstalled.
For example, a user first installs a target application on the 20th of this month, uninstalls it on the 25th, installs it again on the 26th, and has used it ever since. The length of time the user has used the application can be derived from these data.
S104, determining a user overdue repayment prediction model according to the weight of each target application program corresponding to each user; the user overdue repayment prediction model is used for predicting whether a target user will make an overdue repayment.
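The duration calculation in the install/uninstall example above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the event format (a list of `(date, action)` pairs with action `"install"` or `"uninstall"`), the function name `usage_days`, the concrete month (August 2018) and the rule that an app still installed is counted up to a supplied `today` date are all hypothetical.

```python
from datetime import date


def usage_days(events, today):
    """Total number of days an app was installed.

    Hypothetical input: `events` is a list of (date, action) pairs where
    action is "install" or "uninstall". An install that is never followed
    by an uninstall is counted up to `today`.
    """
    total = 0
    installed_since = None  # start date of the current install, if any
    for day, action in sorted(events):
        if action == "install" and installed_since is None:
            installed_since = day
        elif action == "uninstall" and installed_since is not None:
            total += (day - installed_since).days
            installed_since = None
    if installed_since is not None:  # app is still installed
        total += (today - installed_since).days
    return total


# The example above with a hypothetical month, "today" being the 31st.
events = [
    (date(2018, 8, 20), "install"),
    (date(2018, 8, 25), "uninstall"),
    (date(2018, 8, 26), "install"),
]
print(usage_days(events, today=date(2018, 8, 31)))  # 10 (5 days + 5 days)
```

Summing installed intervals, rather than taking the span from first install to today, keeps the gap between uninstall and reinstall out of the duration.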
As an example, the weight of each target application for each user is recorded in a table.
For example, Table 1 shows the weight of each target application for each user; APP1 to APP4 are target applications, and the sample user set includes user 1, user 2, user 3, …. As Table 1 shows, for user 1 the weight of APP1 is 5, the weight of APP2 is 6, the weight of APP3 is 4, and the weight of APP4 is 4. It should be noted that if a user has not installed a target application, the weight of that target application for the user may be 0.
TABLE 1
| | APP1 | APP2 | APP3 | APP4 |
|---|---|---|---|---|
| User 1 | 5 | 6 | 4 | 4 |
| User 2 | 1 | 1 | 3 | 2 |
| User 3 | 2 | 3 | 1 | 1 |
| … | … | … | … | … |
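Turning per-user weights into the row-per-user layout of Table 1, with a weight of 0 for any target application the user never installed, might look like the sketch below. The nested-dictionary input and the name `weight_matrix` are illustrative assumptions; "User 4" is a hypothetical user added only to show the zero fill.

```python
def weight_matrix(user_weights, target_apps):
    """Return one row of weights per user, in the column order of target_apps.

    A target app missing from a user's dict (never installed) gets weight 0.
    """
    return {
        user: [weights.get(app, 0) for app in target_apps]
        for user, weights in user_weights.items()
    }


target_apps = ["APP1", "APP2", "APP3", "APP4"]
user_weights = {
    "User 1": {"APP1": 5, "APP2": 6, "APP3": 4, "APP4": 4},
    "User 4": {"APP2": 2},  # hypothetical user with only APP2 installed
}
print(weight_matrix(user_weights, target_apps))
# {'User 1': [5, 6, 4, 4], 'User 4': [0, 2, 0, 0]}
```

Fixing the column order through `target_apps` keeps every row aligned with the same features, which the prediction model needs.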
The processing method of the behavior data of the application program used by the user can be applied to the fields of credit card application, consumption staging, deposit-free leasing and the like, and has the following effects:
1. The user overdue repayment prediction model performs well: the embodiment of the invention discriminates risk at the level of individual applications, so the data granularity is finer. The effectiveness of the model can be measured with the receiver operating characteristic curve (ROC curve), and the area under the ROC curve (AUC) normally lies between 0.5 and 1.0. When AUC > 0.5, the closer the AUC is to 1, the better the model. An AUC of 0.5 to 0.7 indicates low accuracy, 0.7 to 0.9 indicates moderate accuracy, and above 0.9 indicates high accuracy. When the AUC equals 0.5, the model has no discriminating power and its predictions have no reference value. An AUC below 0.5 does not match real conditions and rarely occurs in practice. The risk-discrimination capability of the model can also be evaluated with the Kolmogorov-Smirnov (KS) value: the larger the KS value, the better the model separates overdue from non-overdue users, and the model can be considered to have good prediction accuracy when KS > 0.2.
FIG. 2 shows the ROC curve of the user overdue repayment prediction model. The AUC of the ROC curve is 0.76, which indicates that the model has moderate accuracy. The KS value of the model is 0.38, above the 0.2 level, which indicates good prediction accuracy.
FIG. 3 shows a histogram of the number of samples in each range of predicted overdue-repayment probability. The horizontal axis is the predicted probability range and the vertical axis is the number of samples in that range; for example, 1500 samples have a predicted probability of overdue repayment between 0 and 0.05. In total, 24617 samples were scored. The user overdue repayment prediction model predicted that 21.9% of the samples would repay late, and 21.9% of the samples actually did, which indicates that the model's overall predicted overdue rate is accurate.
FIG. 4 shows a line graph of the number of overdue-repayment samples in each group as a proportion of the total number of samples in that group. The horizontal axis is the sample group number, and the vertical axis is the proportion of overdue-repayment samples among all samples of that group. Line A shows the proportion of samples predicted to repay late in each group, and line B shows the proportion of samples that actually repaid late. For example, given 10000 samples, sort them in ascending order of predicted overdue probability and divide them in that order into 10 groups of 1000 samples each. If, among the 1000 samples of group 1, 90 are predicted to repay late and 50 actually repaid late, then 9% of group 1 is predicted overdue and 5% is actually overdue; line A therefore takes the value 9% at group 1, and line B takes the value 5%. Because lines A and B in FIG. 4 differ only slightly, the predicted probability of overdue repayment is relatively accurate.
2. The user overdue repayment prediction model iterates quickly: manually identifying and classifying applications takes about 2 weeks, whereas the embodiment of the invention eliminates the manual research, identification and classification of applications, avoids consuming large amounts of human resources, and reduces prediction deviation caused by subjective human judgment.
3. Risk screening combines static data with dynamic behavior: the embodiment considers which applications a user uses and determines the user overdue repayment prediction model from how long the user uses them. The list of applications a user uses is static data and the duration of use is dynamic data; combining the two improves the model.
4. The user overdue repayment prediction model can learn online: because the embodiment does not rely on subjective judgment or manual input, iterating the model can become a fully automatic loop of monitoring online model performance, learning online, and deploying the model automatically.
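The evaluation measures described under effect 1 above (AUC, KS, and the grouped comparison of FIG. 4) can be sketched in plain Python as below. This is an illustrative sketch, not the patent's evaluation code: the function names are hypothetical, the quadratic loops favor clarity over speed (a library routine such as scikit-learn's `roc_auc_score` would normally be used), and line A is approximated by each group's mean predicted probability because the text does not state the cutoff used to count a sample as "predicted overdue".

```python
def auc(scores, labels):
    """Probability that a random overdue sample (label 1) is scored higher
    than a random non-overdue sample (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(scores, labels):
    """Kolmogorov-Smirnov statistic: the largest gap between the true-positive
    and false-positive rates over all score thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


def group_rates(probs, labels, n_groups=10):
    """Sort by predicted probability (ascending), split into n_groups groups
    of equal size (the last group absorbs any remainder), and return
    (mean predicted probability, actual overdue rate) for each group."""
    if len(probs) < n_groups:
        raise ValueError("need at least one sample per group")
    pairs = sorted(zip(probs, labels))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[g * size:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)  # stand-in for line A
        actual = sum(y for _, y in chunk) / len(chunk)     # line B
        rows.append((mean_pred, actual))
    return rows


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels), ks(scores, labels))  # 0.75 0.5
```

A model is read as useful when AUC is well above 0.5 and KS exceeds 0.2, and as well calibrated when the two values in each `group_rates` row stay close, as lines A and B do in FIG. 4.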
It should be noted that the embodiment of the present invention determines the user overdue repayment prediction model from the duration for which the user uses the target applications. A common existing approach to overdue repayment prediction from applications is instead to classify the applications: for example, 25 applications are grouped into five categories containing 5, 6, 7, 5 and 2 applications respectively, and the likelihood that a user will repay late is judged by checking whether the number of applications the user has in each category is higher than that of most users.
In one embodiment of the present invention, S102 includes:
calculating a difference value between a first installation rate and a second installation rate, wherein the first installation rate is the installation rate of the application program to be screened in a first type of sample users, and the second installation rate is the installation rate of the application program to be screened in a second type of sample users; and if the difference value between the first installation rate and the second installation rate is larger than a first threshold value, taking the application program to be screened as a target application program.
As one example, the first threshold is greater than or equal to 1.5%.
For example, the first threshold is 2%, or the first threshold is 5%.
Taking the first threshold of 2% as an example, the installation rate of a financial application program in the first type of sample users is 38.5%, the installation rate of the financial application program in the second type of sample users is 35.2%, the difference between the two installation rates is calculated to be 3.3% and is more than 2%, and more users who have overdue repayment are indicated to install the financial application program.
In the processing method provided by the embodiment of the invention, the difference between an application's installation rate among first-type sample users and its installation rate among second-type sample users is calculated. If the difference is greater than the first threshold, users who made overdue repayments and users who did not install the application differently, so the application can, to some extent, distinguish between the two groups.
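The installation-rate screening just described can be sketched as follows. The data layout (each user represented as a set of installed app names), the function names and the app names are hypothetical, and the 2% threshold is taken from the example above. The difference is compared in absolute value, an assumption based on Table 2, where applications with negative differences are also selected.

```python
def install_rate(users_apps, app):
    """Share of users whose set of installed apps contains `app`."""
    return sum(app in apps for apps in users_apps) / len(users_apps)


def screen_by_install_rate(overdue_users, ontime_users, candidates, threshold=0.02):
    """Keep candidate apps whose installation rate differs between the two
    groups by more than `threshold` (compared in absolute value)."""
    selected = []
    for app in candidates:
        diff = install_rate(overdue_users, app) - install_rate(ontime_users, app)
        if abs(diff) > threshold:
            selected.append(app)
    return selected


# Hypothetical groups of 10 users each; every user is a set of installed apps.
overdue = [{"FinApp", "ToolApp"}] * 4 + [{"ToolApp"}] + [set()] * 5
ontime = [{"FinApp", "ToolApp", "GameApp"}] * 3 + [{"ToolApp"}] * 2 + [set()] * 5
print(screen_by_install_rate(overdue, ontime, ["FinApp", "ToolApp", "GameApp"]))
# ['FinApp', 'GameApp']
```

With the figures of the worked example (38.5% versus 35.2%), the difference of 3.3% exceeds the 2% threshold, so that financial application would be kept.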
In one embodiment of the present invention, S102 includes:
carrying out significance test on the difference between the use behavior data of the application program to be screened of the first type sample user and the use behavior data of the application program to be screened of the second type sample user; and if the application program passes the significance test, taking the application program to be screened as a target application program.
As an example, take as the null hypothesis that there is no significant difference between the first-type sample users' usage behavior data for the application to be screened and the second-type sample users' usage behavior data for that application. From the two groups' usage behavior data for the application, calculate the mean and variance of each group, then compute the two-sample t statistic and look up the t distribution table to obtain the P value, which is the probability of observing a difference at least this large if the null hypothesis were true. If P ≤ 0.05, the null hypothesis is rejected, that is, the difference is significant and the application passes the significance test; if P > 0.05, the null hypothesis is not rejected and the application does not pass the significance test.
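A sketch of the two-sample t-test step, under assumptions that go beyond the text: the text does not say whether variances are pooled, so the Welch (unequal-variance) form is used here, and instead of a table lookup the P value comes from the standard normal distribution, a large-sample approximation that is reasonable for groups as large as 1500 and 8500 users. Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the exact t-distribution P value. The function name and the per-user usage data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_t_test(a, b, alpha=0.05):
    """Welch two-sample t-test on one app's usage data for the two groups.

    Returns (t, p, passed). The p-value uses the standard normal distribution
    as a large-sample approximation to the t distribution. Each group needs at
    least two values and a non-zero combined variance.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided p-value
    return t, p, p <= alpha  # passed: "no difference" hypothesis rejected


# Hypothetical per-user usage days for one app in each group.
overdue_days = [1, 2, 3, 4, 5] * 40
ontime_days = [3, 4, 5, 6, 7] * 40
t, p, passed = two_sample_t_test(overdue_days, ontime_days)
print(t < 0, passed)  # True True
```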
It should be noted that the target application program may be filtered only according to the difference between the first installation rate and the second installation rate; alternatively, the target application may be screened based only on the results of the significance test; alternatively, the target application may be filtered based on a difference between the first installation rate and the second installation rate, and the result of the significance test.
For example, table 2 shows the results of screening the target application.
TABLE 2
| | Installation rate difference | Significance test result | Significant difference? |
|---|---|---|---|
| APP1 | +15.1% | Significant | Yes |
| APP2 | -8.5% | Moderately significant | Yes |
| APP3 | +0.2% | Not significant | No |
| APP4 | +1.4% | Not significant | No |
| APP5 | -4.6% | Significant | Yes |
In Table 2, the installation rate difference is the first installation rate minus the second installation rate. When the absolute value of an application's installation rate difference is greater than 2% and its significance test result is significant or moderately significant, the application is judged to differ significantly between the first-type and second-type sample users and is taken as a target application. In Table 2, APP1, APP2 and APP5 differ significantly and are the screened target applications.
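The combined rule applied to Table 2 can be expressed directly on the table's values. This is a sketch: the labels "significant" and "moderately significant" as passing results mirror the table, while the constant names and the function name `select_targets` are illustrative.

```python
# Table 2 as data: installation rate difference (as a fraction) and test result.
TABLE_2 = {
    "APP1": (0.151, "significant"),
    "APP2": (-0.085, "moderately significant"),
    "APP3": (0.002, "not significant"),
    "APP4": (0.014, "not significant"),
    "APP5": (-0.046, "significant"),
}
PASSING = {"significant", "moderately significant"}


def select_targets(results, threshold=0.02):
    """Keep apps whose absolute installation rate difference exceeds the
    threshold and whose significance test result counts as passing."""
    return [
        app
        for app, (diff, result) in results.items()
        if abs(diff) > threshold and result in PASSING
    ]


print(select_targets(TABLE_2))  # ['APP1', 'APP2', 'APP5']
```

Requiring both conditions keeps an application only when its installation gap is large enough to matter and unlikely to be due to chance.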
In one embodiment of the present invention, S103 includes:
smoothing, by binning, the length of time a single user has used a single target application, and taking the smoothed value as the weight of that target application for that user.
Binning places adjacent values into the same bin and discretizes continuous data through local smoothing, which coarsens the data granularity and removes noise.
As an example, a correspondence between ranges of the duration for which a user has used a target application and weights is established in advance, and the weight of the target application for the user is determined from this correspondence.
For example, the following correspondence is established in advance: if the user has had the target application installed for no more than 3 months, the weight of that target application for the user is 1; for more than 3 months and at most 8 months, the weight is 2; for more than 8 months and at most 2 years, the weight is 3; and for more than 2 years, the weight is 4. Based on this correspondence, the weight of each target application for a single user can be obtained.
In this processing method, smoothing the duration for which a user uses a target application discretizes the duration, coarsens the data granularity, and removes noise from the data.
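The correspondence in this example can be written as a small Python lookup. Measuring the duration in months and giving weight 0 to a target application the user does not have installed are assumptions; the text only defines the four ranges.

```python
def duration_to_weight(months):
    """Map installation duration (months) to a weight using the example ranges."""
    if months < 0:
        raise ValueError("duration cannot be negative")
    if months <= 3:
        return 1
    if months <= 8:
        return 2
    if months <= 24:  # 2 years
        return 3
    return 4


def user_weights(install_months, target_apps):
    """Weight vector for one user over the target apps.
    install_months: {app: months installed}; absent apps get 0 (assumption)."""
    return [
        duration_to_weight(install_months[app]) if app in install_months else 0
        for app in target_apps
    ]


print(user_weights({"APP1": 10, "APP5": 1}, ["APP1", "APP2", "APP5"]))  # [3, 0, 1]
```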
In one embodiment of the present invention, S104 includes:
inputting the weights of the at least one target application corresponding to each user into a pre-constructed prediction model, which outputs a prediction result for each user; calculating the magnitude of the difference between each user's prediction result and actual result, where the prediction result is the predicted probability that the user will make an overdue repayment and the actual result indicates whether the user actually made an overdue repayment; if the difference is greater than a second threshold, adjusting the coefficients of the prediction model and returning to the step of inputting the weights of the at least one target application corresponding to each user into the pre-constructed prediction model; and if the difference is equal to or smaller than the second threshold, taking the current prediction model as the user overdue repayment prediction model.
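The iterate-until-below-threshold procedure can be sketched in Python as follows. Several details are assumptions not stated in the text: the model is taken to be logistic (it has coefficients θ0 to θm and outputs a probability), the difference size Z is taken as the mean absolute gap between predicted probability and the 0/1 outcome, the coefficients are adjusted by gradient descent on the log-loss, and `max_iter` is added because the loop may never reach the threshold on real data.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta, weights):
    """Predicted overdue probability; theta[0] is the intercept theta_0."""
    return sigmoid(theta[0] + sum(t * w for t, w in zip(theta[1:], weights)))


def difference_size(theta, X, y):
    """Z (assumed form): mean |predicted probability - actual 0/1 result|."""
    return sum(abs(predict(theta, x) - yi) for x, yi in zip(X, y)) / len(X)


def train(X, y, second_threshold=0.1, lr=0.5, max_iter=10_000):
    """X[i][j]: weight of target app j for user i; y[i]: 1 if user i repaid late."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(max_iter):
        if difference_size(theta, X, y) <= second_threshold:
            break  # difference small enough: keep this model
        # Difference too large: adjust coefficients by one log-loss gradient step
        grad = [0.0] * len(theta)
        for x, yi in zip(X, y):
            err = predict(theta, x) - yi
            grad[0] += err
            for j, w in enumerate(x, start=1):
                grad[j] += err * w
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta, difference_size(theta, X, y)


# Toy data: two target apps, two overdue and two non-overdue users
theta, z = train([[4, 1], [3, 1], [1, 4], [1, 3]], [1, 1, 0, 0])
```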
Table 3 can be built from Table 1 by adding each user's prediction result and actual result. A user's prediction result is whether the prediction model predicts that the user will make an overdue repayment, and the actual result is whether the user actually made an overdue repayment.
Table 3
Based on the contents of Table 3, the magnitude of the difference between each user's prediction result and actual result is calculated in order to train the prediction model.
In this processing method, the prediction model is trained on the weights of the at least one target application corresponding to each user and on each user's actual result. The trained prediction model is the user overdue repayment prediction model, which can predict, from a user's usage of the target applications, whether the user will make an overdue repayment.
In one embodiment of the present invention, calculating the magnitude of the difference between each user's prediction result and actual result comprises:
calculating the difference size Z by the following formula:
where $i = 1, \ldots, n$ and $j = 1, \ldots, m$; $x_{ij}$ is the weight of the $j$-th target application corresponding to the $i$-th user in the sample user set; $m$ is the total number of target applications; $f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im})$ is the prediction model, which outputs the prediction result of the $i$-th user; $n$ is the total number of sample users in the sample user set; $y_i$ is the actual result of the $i$-th user; and $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i)$ is the loss function.
As an example, $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i) = f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}) - y_i$.
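A small Python sketch of Z with the example loss. The formula for Z itself is not reproduced in this text, so summing the per-user loss over all n sample users is an assumption. The usage line shows a property of the signed example loss worth knowing: errors in opposite directions cancel, which is why absolute or squared losses are commonly used instead.

```python
def example_loss(pred, actual):
    """Example loss from the text: L(f(x_i1, ..., x_im), y_i) = f(...) - y_i."""
    return pred - actual


def difference_size(preds, actuals):
    """Z summed over all n sample users (summation form is an assumption)."""
    if len(preds) != len(actuals):
        raise ValueError("preds and actuals must have the same length")
    return sum(example_loss(p, a) for p, a in zip(preds, actuals))


# Both predictions are badly wrong, yet the signed losses cancel and Z is near 0
print(difference_size([0.9, 0.1], [0, 1]))
```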
The prediction model is the following function:
where $\theta_0, \theta_1, \theta_2, \ldots, \theta_m$ are the coefficients of the prediction model.
In an embodiment of the present invention, after S103, the method further includes:
adjusting the weight of the target application corresponding to the user according to the proportion of first-type sample users, or of second-type sample users, in the total number of sample users in the sample user set.
Note that the proportion of first-type sample users and the proportion of second-type sample users in the total number of sample users sum to 1.
As an example, adjusting the weight of the target application corresponding to the user includes:
if the user made an overdue repayment and the proportion of first-type sample users in the total number of samples in the sample set is greater than a first value, the weight of the target application corresponding to the user is adjusted upward; or, if the user did not make an overdue repayment and the proportion of second-type sample users in the total number of samples in the sample set is greater than the first value, the weight of the target application corresponding to the user is adjusted upward.
As an example, the first value is greater than or equal to 60%, and adjusting the weight upward comprises increasing the weight of the target application corresponding to the user by A%, where 1 ≤ A ≤ 3.
As an example, adjusting the weight of the target application corresponding to the user includes:
if the user made an overdue repayment and the proportion of first-type sample users in the total number of samples in the sample set is smaller than a second value, the weight of the target application corresponding to the user is adjusted downward; or, if the user did not make an overdue repayment and the proportion of second-type sample users in the total number of samples in the sample set is smaller than the second value, the weight of the target application corresponding to the user is adjusted downward.
As an example, the second value is less than or equal to 40%, and adjusting the weight downward comprises reducing the weight of the target application corresponding to the user by B%, where 1 ≤ B ≤ 3.
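The upward and downward percentage adjustments can be sketched in Python as below. Here `group_ratio` is the proportion of the user's own class (first-type users if the user repaid late, second-type users otherwise), as described above. The defaults of 60% and 40% sit at the allowed limits, and A = B = 2 is an arbitrary illustrative choice within the stated range of 1 to 3.

```python
def adjust_weight_pct(weight, group_ratio, first_value=0.6, second_value=0.4,
                      a_pct=2.0, b_pct=2.0):
    """Scale a target-app weight by the share of the user's own class.
    group_ratio: share of first-type users if the user repaid late,
    otherwise share of second-type users, in the whole sample set."""
    if not (1 <= a_pct <= 3 and 1 <= b_pct <= 3):
        raise ValueError("A and B must lie in [1, 3]")
    if group_ratio > first_value:
        return weight * (1 + a_pct / 100)  # adjust upward by A%
    if group_ratio < second_value:
        return weight * (1 - b_pct / 100)  # adjust downward by B%
    return weight  # neither condition holds: keep the weight
```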
As an example, adjusting the weight of the target application corresponding to the user includes:
if the user made an overdue repayment and the proportion of first-type sample users in the total number of samples in the sample set is greater than 20% and at most 40%, 0.1 is subtracted from the weight of the target application corresponding to the user; if the user made an overdue repayment and that proportion is less than 20%, 0.2 is subtracted; if the user did not make an overdue repayment and the proportion of second-type sample users in the total number of samples in the sample set is greater than 60% and at most 80%, 0.1 is added to the weight; and if the user did not make an overdue repayment and that proportion is greater than 80% and at most 100%, 0.2 is added.
In this processing method, the weight of the target application corresponding to a user is fine-tuned according to the proportion of first-type or second-type sample users in the total number of sample users, so that the weight better reflects whether the user makes overdue repayments and the prediction model trained on these weights predicts more accurately.
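The additive rules of this example translate directly into Python. Passing only the first-type proportion and deriving the second-type proportion as its complement relies on the two proportions summing to 1. Cases the example does not cover (such as a proportion of exactly 20%) leave the weight unchanged; that fallback is an assumption.

```python
def adjust_weight_additive(weight, overdue, first_type_ratio):
    """Apply the example's additive rules.
    overdue: whether this user made an overdue repayment.
    first_type_ratio: share of first-type (overdue) users in the sample set."""
    if not 0.0 <= first_type_ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    if overdue:
        if 0.2 < first_type_ratio <= 0.4:
            return weight - 0.1
        if first_type_ratio < 0.2:
            return weight - 0.2
    else:
        second_type_ratio = 1.0 - first_type_ratio  # the two shares sum to 1
        if 0.6 < second_type_ratio <= 0.8:
            return weight + 0.1
        if 0.8 < second_type_ratio <= 1.0:
            return weight + 0.2
    return weight  # no rule in the example applies (assumed fallback)
```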
Fig. 5 is a flowchart of a user behavior prediction method according to an embodiment of the present invention. The method comprises steps S201 and S202.
S201: calculate the weight of each target application corresponding to the target user according to the duration for which the target user uses each of the at least one target application.
Note that the weight of each target application for the target user is calculated in the same way as the weights in S103 of Fig. 1, which is not repeated here. The target user may be a borrower.
S202: input the weight of each target application corresponding to the target user into the preset user overdue repayment prediction model to obtain a prediction of whether the target user will make an overdue repayment. The user overdue repayment prediction model is the one shown in Fig. 1, and the at least one target application is the at least one target application shown in Fig. 1.
In this user behavior prediction method, the weights of the target applications corresponding to the target user are used as the input of the user overdue repayment prediction model, which outputs a prediction of whether the target user will make an overdue repayment. Grounded in the data, this links the target user's application usage to whether the user will repay late, which reduces manual intervention, avoids consuming large amounts of human resources, and reduces bias in the prediction results caused by subjective human judgment. In addition, the prediction of whether the target user will make an overdue repayment can be used to judge whether the target user belongs to a fraud ring and to assess the user's loan risk.
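Steps S201 and S202 can be combined into one Python sketch. It reuses the example duration ranges from S103 and assumes a logistic model whose coefficients `theta` have already been trained; the coefficient values in the usage line, the weight 0 for target applications the user lacks, and the 0.5 decision cutoff are all hypothetical.

```python
import math


def duration_to_weight(months):
    """Example ranges from S103: <=3 -> 1, <=8 -> 2, <=24 -> 3, otherwise 4."""
    if months <= 3:
        return 1
    if months <= 8:
        return 2
    if months <= 24:
        return 3
    return 4


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_overdue(install_months, target_apps, theta, cutoff=0.5):
    """Return (overdue probability, predicted-overdue flag) for one target user."""
    # S201: weight per target app; apps the user does not have get 0 (assumption)
    weights = [duration_to_weight(install_months[a]) if a in install_months else 0
               for a in target_apps]
    # S202: feed the weights into the trained model (logistic form assumed)
    z = theta[0] + sum(t * w for t, w in zip(theta[1:], weights))
    prob = sigmoid(z)
    return prob, prob >= cutoff


# Hypothetical coefficients theta_0..theta_3 for target apps APP1, APP2, APP5
prob, late = predict_overdue({"APP1": 30, "APP5": 2}, ["APP1", "APP2", "APP5"],
                             [-1.0, 0.5, -0.4, 0.3])
```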
Fig. 6 is a block diagram of an apparatus for processing behavior data of users using applications according to an embodiment of the present invention. The apparatus 300 comprises an acquisition module 301, a screening module 302, a calculation module 303, and a determination module 304.
The acquisition module 301 is configured to acquire the application usage behavior data of all users in a sample user set, where the application usage behavior data include application installation data and application uninstallation data.
The screening module 302 is configured to screen out at least one target application based on the difference between the application usage behavior data of first-type sample users and of second-type sample users in the sample user set; the first-type sample users comprise users who have made overdue repayments, and the second-type sample users comprise users who have not.
The calculation module 303 is configured to calculate, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user.
The determination module 304 is configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user; the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
In one embodiment of the present invention, the screening module 302 includes a first calculation unit and a first execution unit.
The first calculation unit is configured to calculate the difference between a first installation rate and a second installation rate, where the first installation rate is the installation rate of a candidate application among first-type sample users and the second installation rate is its installation rate among second-type sample users.
The first execution unit is configured to take the candidate application as a target application when the difference between the first installation rate and the second installation rate is greater than a first threshold.
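A Python sketch of the first calculation unit and the first execution unit. It assumes the installation and uninstallation data have already been reduced to the set of applications each user currently has installed (`installed_apps`, a hypothetical name) and that `overdue` maps each user to whether they repaid late. The comparison follows the wording here, using the signed difference; the Table 2 example also keeps applications with large negative differences, so an `abs()` may be intended in practice. The 0.02 default mirrors the 2% used in that example.

```python
def installation_rate(installed_apps, users, app):
    """Share of `users` whose installed-app set contains `app`."""
    if not users:
        raise ValueError("user group is empty")
    return sum(app in installed_apps.get(u, set()) for u in users) / len(users)


def installation_rate_difference(installed_apps, overdue, app):
    """First installation rate (overdue users) minus second (non-overdue users)."""
    first_type = [u for u, late in overdue.items() if late]
    second_type = [u for u, late in overdue.items() if not late]
    return (installation_rate(installed_apps, first_type, app)
            - installation_rate(installed_apps, second_type, app))


def is_target_app(installed_apps, overdue, app, first_threshold=0.02):
    """First execution unit: keep the candidate if the difference exceeds the threshold."""
    return installation_rate_difference(installed_apps, overdue, app) > first_threshold
```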
In one embodiment of the present invention, the screening module 302 includes a significance test unit and a second execution unit.
The significance test unit is configured to test the significance of the difference between the usage behavior data of a candidate application for first-type sample users and for second-type sample users.
The second execution unit is configured to take the candidate application as a target application when it passes the significance test.
In one embodiment of the present invention, the calculation module 303 comprises:
a first processing unit, configured to smooth, by a binning method, the duration for which a single user uses a single target application, and to take the smoothed value as the weight of that target application for the user.
In one embodiment of the present invention, the determination module 304 comprises an input unit, a second calculation unit, a second processing unit, and a third execution unit.
The input unit is configured to input the weights of the at least one target application corresponding to each user into a pre-constructed prediction model, which outputs a prediction result for each user.
The second calculation unit is configured to calculate the magnitude of the difference between each user's prediction result and actual result, where the prediction result is the predicted probability that the user will make an overdue repayment and the actual result indicates whether the user actually made an overdue repayment.
The second processing unit is configured to adjust the coefficients of the prediction model when the difference is greater than a second threshold, and to return to the step of inputting the weights of the at least one target application corresponding to each user into the pre-constructed prediction model.
The third execution unit is configured to take the prediction model as the user overdue repayment prediction model when the difference is equal to or smaller than the second threshold.
In one embodiment of the present invention, the second calculation unit is configured to calculate the difference size Z by the following formula:
where $i = 1, \ldots, n$ and $j = 1, \ldots, m$; $x_{ij}$ is the weight of the $j$-th target application corresponding to the $i$-th user in the sample user set; $m$ is the total number of target applications; $f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im})$ is the prediction model, which outputs the prediction result of the $i$-th user; $n$ is the total number of sample users in the sample user set; $y_i$ is the actual result of the $i$-th user; and $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i)$ is the loss function.
In one embodiment of the present invention, the apparatus 300 for processing behavior data of users using applications further includes:
a module configured to adjust the value of the weights according to the proportion of first-type sample users or of second-type sample users in the total number of sample users in the sample user set.
Fig. 7 is a block diagram of a user behavior prediction apparatus according to an embodiment of the present invention. The apparatus 400 comprises a calculation module 401 and a model prediction module 402.
The calculation module 401 is configured to calculate the weight of each target application corresponding to the target user according to the duration for which the target user uses each of the at least one target application.
The model prediction module 402 is configured to input the weight of each target application corresponding to the target user into the preset user overdue repayment prediction model to obtain a prediction of whether the target user will make an overdue repayment. The user overdue repayment prediction model is the one shown in Fig. 1, and the at least one target application is the at least one target application shown in Fig. 1.
FIG. 8 is a block diagram illustrating an exemplary hardware architecture of a computing device. As shown in fig. 8, computing device 500 includes an input device 501, an input interface 502, a processor 503, a memory 504, an output interface 505, and an output device 506.
The input interface 502, the processor 503, the memory 504, and the output interface 505 are connected to each other via a bus 510, and the input device 501 and the output device 506 are connected to the bus 510 via the input interface 502 and the output interface 505, respectively, and further connected to other components of the computing device 500.
Specifically, the input device 501 receives input information from the outside and transmits the input information to the processor 503 through the input interface 502; the processor 503 processes the input information based on computer-executable instructions stored in the memory 504 to generate output information, stores the output information temporarily or permanently in the memory 504, and then transmits the output information to the output device 506 through the output interface 505; output device 506 outputs the output information outside of computing device 500 for use by a user.
The computing device 500 may perform the steps of the method for processing behavior data of a user using an application described above. Alternatively, the computing device 500 may perform the steps of the user behavior prediction method described above in this application.
The processor 503 may be one or more Central Processing Units (CPUs). In the case where the processor 503 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The memory 504 may be, but is not limited to, one or more of a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a Compact Disc Read-Only Memory (CD-ROM), a hard disk, and the like. The memory 504 is used to store program code.
It is understood that in the embodiment of the present application, the functions of any one or all of the modules provided in fig. 6 or fig. 7 may be implemented by the central processing unit 503 shown in fig. 8.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product comprising one or more computer program instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The embodiments in this specification are described progressively; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are substantially similar to the method embodiments, so they are described briefly; for relevant details, refer to the description of the method embodiments.
Claims (12)
1. A method for processing behavior data of users using applications, characterized by comprising:
acquiring application usage behavior data of all users in a sample user set, wherein the application usage behavior data comprise application installation data and application uninstallation data;
screening out at least one target application based on the difference between the application usage behavior data of first-type sample users and of second-type sample users in the sample user set, wherein the first-type sample users comprise users who have made overdue repayments and the second-type sample users comprise users who have not made overdue repayments;
calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user;
determining a user overdue repayment prediction model according to the weight of each target application corresponding to each user, wherein the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
2. The method of claim 1, wherein screening out at least one target application based on the difference between the application usage behavior data of the first-type sample users and of the second-type sample users in the sample user set comprises:
calculating the difference between a first installation rate and a second installation rate, wherein the first installation rate is the installation rate of a candidate application among the first-type sample users and the second installation rate is the installation rate of the candidate application among the second-type sample users;
and if the difference between the first installation rate and the second installation rate is greater than a first threshold, taking the candidate application as the target application.
3. The method of claim 1, wherein screening out at least one target application based on the difference between the application usage behavior data of the first-type sample users and of the second-type sample users in the sample user set comprises:
performing a significance test on the difference between the usage behavior data of a candidate application for the first-type sample users and for the second-type sample users;
and if the candidate application passes the significance test, taking the candidate application as the target application.
4. The method according to claim 1, wherein calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user comprises:
smoothing, by a binning method, the duration for which a single user uses a single target application, and taking the smoothed value as the weight of that target application for the user.
5. The method of claim 1, wherein determining the user overdue repayment prediction model according to the weight of each target application corresponding to each user comprises:
inputting the weights of the at least one target application corresponding to each user into a pre-constructed prediction model, wherein the prediction model outputs a prediction result for each user;
calculating the magnitude of the difference between each user's prediction result and actual result, wherein the prediction result represents the predicted probability that the user will make an overdue repayment and the actual result represents whether the user actually made an overdue repayment;
if the difference is greater than a second threshold, adjusting the coefficients of the prediction model and returning to the step of inputting the weights of the at least one target application corresponding to each user into the pre-constructed prediction model;
and if the difference is equal to or smaller than the second threshold, taking the prediction model at that point as the user overdue repayment prediction model.
6. The method of claim 5, wherein the calculating the magnitude of the difference between the predicted outcome of each user and the actual outcome of each user comprises:
the difference size Z is calculated by the following formula,
wherein $i = 1, \ldots, n$ and $j = 1, \ldots, m$; $x_{ij}$ represents the weight of the $j$-th target application corresponding to the $i$-th user in the sample user set, and $m$ represents the total number of the at least one target application; $f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im})$ is the prediction model, which outputs the prediction result of the $i$-th user, and $n$ represents the total number of sample users in the sample user set; $y_i$ represents the actual result of the $i$-th user, and $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i)$ is the loss function.
7. The method according to any one of claims 1 to 6, further comprising, after calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user:
adjusting the value of the weight according to the proportion of first-type sample users or of second-type sample users in the total number of sample users in the sample user set.
8. A user behavior prediction method, comprising:
calculating the weight of each target application corresponding to a target user according to the duration for which the target user uses each of at least one target application;
inputting the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction of whether the target user will make an overdue repayment, wherein the user overdue repayment prediction model is the user overdue repayment prediction model according to any one of claims 1 to 7, and the at least one target application is the at least one target application according to any one of claims 1 to 7.
9. An apparatus for processing behavior data of users using applications, comprising:
an acquisition module, configured to acquire application usage behavior data of all users in a sample user set, wherein the application usage behavior data comprise application installation data and application uninstallation data;
a screening module, configured to screen out at least one target application based on the difference between the application usage behavior data of first-type sample users and of second-type sample users in the sample user set, wherein the first-type sample users comprise users who have made overdue repayments and the second-type sample users comprise users who have not made overdue repayments;
a calculation module, configured to calculate, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user;
a determination module, configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, wherein the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
10. A user behavior prediction apparatus, comprising:
a calculation module, configured to calculate the weight of each target application corresponding to a target user according to the duration for which the target user uses each of at least one target application;
a model prediction module, configured to input the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction of whether the target user will make an overdue repayment, wherein the user overdue repayment prediction model is the user overdue repayment prediction model according to any one of claims 1 to 7, and the at least one target application is the at least one target application according to any one of claims 1 to 7.
11. A computing device, comprising: a processor, a memory, and computer program instructions stored in the memory;
the computer program instructions, when executed by the processor, implement the method of any of claims 1-7;
or,
the computer program instructions, when executed by the processor, implement the method of claim 8.
12. A computer-readable storage medium having computer program instructions stored thereon, wherein,
the computer program instructions, when executed by a processor, implement the method of any one of claims 1-7;
or,
the computer program instructions, when executed by a processor, implement the method of claim 8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810931189.4A CN109214912A (en) | 2018-08-15 | 2018-08-15 | Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810931189.4A CN109214912A (en) | 2018-08-15 | 2018-08-15 | Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109214912A true CN109214912A (en) | 2019-01-15 |
Family
ID=64988234
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810931189.4A Pending CN109214912A (en) | 2018-08-15 | 2018-08-15 | Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109214912A (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110246026A (en) * | 2019-05-21 | 2019-09-17 | 平安银行股份有限公司 | A kind of output combination setting method, device and the terminal device of data transfer |
| CN111062518A (en) * | 2019-11-22 | 2020-04-24 | 成都铂锡金融信息技术有限公司 | Method, device and storage medium for processing hastening service based on artificial intelligence |
| CN111915378A (en) * | 2020-08-17 | 2020-11-10 | 深圳墨世科技有限公司 | User attribute prediction method, device, computer equipment and storage medium |
| CN113222258A (en) * | 2021-05-17 | 2021-08-06 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
| CN114896603A (en) * | 2022-05-26 | 2022-08-12 | 支付宝(杭州)信息技术有限公司 | Service processing method, device and equipment |
| CN118378177A (en) * | 2024-06-20 | 2024-07-23 | 杭银消费金融股份有限公司 | Multi-classification model prediction distribution adjustment method |
- 2018-08-15: application CN201810931189.4A (CN109214912A) filed in CN; status: Pending
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110246026A (en) * | 2019-05-21 | 2019-09-17 | 平安银行股份有限公司 | A kind of output combination setting method, device and the terminal device of data transfer |
| CN110246026B (en) * | 2019-05-21 | 2023-06-27 | 平安银行股份有限公司 | Data transfer output combination setting method and device and terminal equipment |
| CN111062518A (en) * | 2019-11-22 | 2020-04-24 | 成都铂锡金融信息技术有限公司 | Method, device and storage medium for processing hastening service based on artificial intelligence |
| CN111062518B (en) * | 2019-11-22 | 2023-06-09 | 成都铂锡金融信息技术有限公司 | Method, device and storage medium for processing collect-promoting business based on artificial intelligence |
| CN111915378A (en) * | 2020-08-17 | 2020-11-10 | 深圳墨世科技有限公司 | User attribute prediction method, device, computer equipment and storage medium |
| CN113222258A (en) * | 2021-05-17 | 2021-08-06 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
| CN114896603A (en) * | 2022-05-26 | 2022-08-12 | 支付宝(杭州)信息技术有限公司 | Service processing method, device and equipment |
| CN118378177A (en) * | 2024-06-20 | 2024-07-23 | 杭银消费金融股份有限公司 | Multi-classification model prediction distribution adjustment method |
| CN118378177B (en) * | 2024-06-20 | 2024-09-03 | 杭银消费金融股份有限公司 | Multi-classification model prediction distribution adjustment method |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109214912A (en) | | Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data |
| CN109191282A (en) | | Methods of marking and system are monitored in a kind of loan of Behavior-based control model |
| CN107633030A (en) | | Credit estimation method and device based on data model |
| CN112328869A (en) | | User loan willingness prediction method and device and computer system |
| CN113034046A (en) | | Data risk metering method and device, electronic equipment and storage medium |
| CN110648223A (en) | | A method, device and electronic device for approving a large amount of business |
| CN112950359A (en) | | User identification method and device |
| CN109102396A (en) | | A kind of user credit ranking method, computer equipment and readable medium |
| CN114240215B (en) | | Method, device and storage medium for obtaining user disconnection level |
| CN111738824B (en) | | A method, device and system for screening accounting data processing methods |
| US20110125623A1 (en) | | Account level cost of funds determination |
| CN111597343B (en) | | APP-based intelligent user occupation judgment method and device and electronic equipment |
| CN112686312A (en) | | Data classification method, device and system |
| CN112598422A (en) | | Transaction risk assessment method, system, device and storage medium |
| CN110689425A (en) | | Method and device for pricing quota based on income and electronic equipment |
| CN111160695A (en) | | Method, system, device and storage medium for identifying risk account of computer operation |
| WO2011149608A1 (en) | | Identifying and using critical fields in quality management |
| CN113807943A (en) | | A multi-factor valuation method and system, medium and equipment for non-performing assets |
| CN118747335A (en) | | Financial account risk early warning response method and system based on big data |
| JP6423031B2 (en) | | Information processing apparatus and program |
| CN108446907B (en) | | Safety verification method and device |
| CN117172910A (en) | | Credit assessment methods, devices, electronic equipment, and storage media based on EBM models |
| CN117131460A (en) | | A telecommunications fraud account identification model training method, device, equipment and medium |
| CN117237083A (en) | | Credit risk identification method, credit risk identification device, equipment and storage medium |
| CN108197740A (en) | | Business failure Forecasting Methodology, electronic equipment and computer storage media |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190115 |