CN114707762A

CN114707762A - Prediction method, device, equipment and medium of credit risk

Info

Publication number: CN114707762A
Application number: CN202210477902.9A
Authority: CN
Inventors: 乔媛; 朱道彬; 闫冬梅; 汪婕
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-04-29
Filing date: 2022-04-29
Publication date: 2022-07-05
Anticipated expiration: 2042-04-29
Also published as: CN114707762B

Abstract

The present disclosure provides a method for predicting credit risk, which can be used in the field of big data technology. The method includes: acquiring personal information of n customers, wherein the personal information of each customer includes N information items; quantifying the personal information of the n customers to obtain a personal information matrix; for the personal information matrix, calculating the corresponding graph Laplacian matrix using a spectral clustering method; reducing the dimension of the graph Laplacian matrix using the locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix; classifying the n customers using a clustering method based on the feature matrix; and predicting the credit risk of the n customers according to the classified n customers. With the prediction method of the embodiments of the present disclosure, the feature space of the graph Laplacian matrix can be obtained quickly by the locally optimal block conjugate gradient method, with fast computation and low memory usage.

Description

Prediction method, device, equipment and medium of credit risk

Technical Field

The present disclosure relates to the technical field of big data, and in particular to a method, apparatus, device and medium for predicting credit risk.

Background

At present, big data analysis of credit risk usually treats each customer's information as one sample and applies machine learning methods to classify the customers, yielding a credit risk level for each customer. For a bank holding data on tens of millions of customers or more, however, every time a new customer or a new item of information is added, the machine must recompute and reclassify all of the data, which entails an enormous amount of computation and is both time-consuming and resource-intensive.

Summary of the Invention

In view of the above problems, the present disclosure provides a method, apparatus, device, medium and program product for predicting credit risk.

According to a first aspect of the present disclosure, a method for predicting credit risk is provided, comprising the following steps:

obtaining the customers' authorization to acquire personal information;

upon obtaining the customers' authorization to acquire personal information, acquiring personal information of n customers, wherein the personal information of each customer includes N information items, all of the N information items are related to credit risk, n is an integer greater than or equal to 1, and N is an integer greater than or equal to 2;

quantifying the personal information of the n customers to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents the quantified personal information of one customer;

for the personal information matrix, calculating a graph Laplacian matrix corresponding to the personal information matrix using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;

reducing the dimension of the graph Laplacian matrix using the locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix, wherein the feature matrix is a matrix with n rows and b columns, and b is a positive integer with 1 ≤ b < n;

classifying the n customers using a clustering method based on the feature matrix; and

predicting the credit risk of the n customers according to the classified n customers.

According to the prediction method of the embodiments of the present disclosure, the locally optimal block conjugate gradient method quickly searches for the best gradient direction and reduces the dimension of the N information items of each customer, so that the feature space of the graph Laplacian matrix is obtained rapidly. Computation is fast and memory usage is small, which greatly reduces the computational load and training time of the neural network.

According to some exemplary embodiments, reducing the dimension of the graph Laplacian matrix using the locally optimal block conjugate gradient method to obtain the feature matrix corresponding to the graph Laplacian matrix specifically includes:

determining a search direction by an iterative method, so that the search direction gradually aligns with the direction of the vector in each column of the feature matrix to be determined; and

determining the feature matrix corresponding to the graph Laplacian matrix according to the determined search direction.

According to some exemplary embodiments, determining the search direction by an iterative method, so that the search direction gradually aligns with the direction of the vector in each column of the feature matrix to be determined, specifically includes:

obtaining an intermediate matrix based on the graph Laplacian matrix, wherein the intermediate matrix is a matrix with n rows and b columns;

computing the eigenvalues and eigenvectors of the intermediate matrix; and

generating a first sub-matrix according to the eigenvectors of the intermediate matrix, wherein the vector in each column of the first sub-matrix represents a search direction, the search directions represented by the columns of the first sub-matrix correspond respectively to the directions of the column vectors of the feature matrix to be determined, and the first sub-matrix is a matrix with n rows and b columns.

According to some exemplary embodiments, determining the search direction by an iterative method, so that the search direction gradually aligns with the direction of the vector in each column of the feature matrix to be determined, further specifically includes:

generating a second sub-matrix according to the eigenvalues and eigenvectors of the intermediate matrix and the first sub-matrix, the second sub-matrix being a matrix with n rows and b columns;

wherein the vector in each column of the second sub-matrix represents a residual vector between the graph Laplacian matrix and the first sub-matrix.

generating a third sub-matrix according to the eigenvectors of the intermediate matrix, the first sub-matrix and the second sub-matrix, the third sub-matrix being a matrix with n rows and b columns;

wherein the first sub-matrix, the second sub-matrix and the third sub-matrix together constitute a search matrix representing a search subspace.

updating the first sub-matrix by an iterative method according to the eigenvectors of the intermediate matrix and the search matrix,

wherein, during iteration, the first sub-matrix is updated according to the eigenvectors of the intermediate matrix from the previous iteration and the search matrix from the previous iteration, to generate the first sub-matrix of the current iteration.

During iteration, the second sub-matrix is updated according to the eigenvalues and eigenvectors of the intermediate matrix in the current iteration and the first sub-matrix of the current iteration, to generate the second sub-matrix of the current iteration.

According to some exemplary embodiments, the vector in each column of the third sub-matrix represents the difference between the bases of the search subspace in two adjacent iterations.

During iteration, the third sub-matrix is updated according to the eigenvectors of the intermediate matrix from the previous iteration and the first and second sub-matrices from the previous iteration, to generate the third sub-matrix of the current iteration.

According to some exemplary embodiments, determining the feature matrix corresponding to the graph Laplacian matrix according to the determined search direction specifically includes:

during iteration, when the vector in the i-th column of the updated second sub-matrix satisfies a first prescribed condition, determining the vector in the i-th column of the updated first sub-matrix as one column of the feature matrix, where i is a positive integer and 1 ≤ i < b.

According to some exemplary embodiments, the vector in the i-th column of the updated second sub-matrix satisfying the first prescribed condition includes:

the norm of the vector in the i-th column of the updated second sub-matrix being smaller than a prescribed threshold.

According to some exemplary embodiments, obtaining the intermediate matrix based on the graph Laplacian matrix specifically includes:

generating the intermediate matrix based on the graph Laplacian matrix, the search matrix and the transpose of the search matrix.

According to some exemplary embodiments, the first sub-matrix used the first time in the iterative process is a random matrix with n rows and b columns.

According to some exemplary embodiments, the method further includes: updating the feature matrix when at least one information item of at least one of the n customers changes; and/or,

updating the feature matrix after the personal information of an (n+1)-th customer is acquired.

According to some exemplary embodiments, when the feature matrix is being updated, the first sub-matrix used the first time in the iterative process is the feature matrix before the update.
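
The iteration described above can be sketched in NumPy as follows. This is a minimal, unpreconditioned illustration under stated assumptions, not the implementation of the disclosure: the function name `lobpcg_sketch`, its parameters, the QR orthonormalization of the search matrix and the exact formula for the third sub-matrix are all choices made here for illustration. `X`, `R` and `P` play the roles of the first, second and third sub-matrices, `S` is the search matrix, and `T` is the matrix built from the graph Laplacian, the search matrix and its transpose. For simplicity the sketch stops only when every residual column is below the threshold, rather than fixing columns one at a time.

```python
import numpy as np


def lobpcg_sketch(L, b, X0=None, tol=1e-6, max_iter=500, seed=0):
    """Approximate the b smallest eigenpairs of a symmetric matrix L.

    Hypothetical helper for illustration; requires 3 * b <= L.shape[0].
    """
    n = L.shape[0]
    if X0 is None:
        # First use: a random n-by-b matrix serves as the first sub-matrix.
        X0 = np.random.default_rng(seed).standard_normal((n, b))
    X, _ = np.linalg.qr(X0)  # orthonormalize the starting columns
    P = None  # third sub-matrix; not available before the first update
    for _ in range(max_iter):
        theta = np.sum(X * (L @ X), axis=0)  # Rayleigh quotient of each column
        R = L @ X - X * theta                # second sub-matrix: residuals
        if np.all(np.linalg.norm(R, axis=0) < tol):
            break                            # every residual norm is below tol
        S = np.hstack([X, R] if P is None else [X, R, P])  # search matrix
        Q, _ = np.linalg.qr(S)               # orthonormal basis (assumption)
        T = Q.T @ L @ Q                      # small projected matrix
        _, C = np.linalg.eigh(T)             # eigenvalues come out ascending
        X_new = Q @ C[:, :b]                 # updated first sub-matrix
        P = X_new - X @ (X.T @ X_new)        # part of the update outside old X
        X = X_new
    theta = np.sum(X * (L @ X), axis=0)
    return theta, X
```

When customer data changes, a call such as `lobpcg_sketch(L_updated, b, X0=X_previous)` (with hypothetical variable names) would reuse the previous feature matrix as the starting point instead of a random matrix, which is the warm start described in the preceding paragraph.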

A second aspect of the present disclosure provides an apparatus for predicting credit risk, including:

a customer authorization acquisition module, configured to obtain the customers' authorization to acquire personal information;

a personal information acquisition module, configured to: upon obtaining the customers' authorization to acquire personal information, acquire personal information of n customers, wherein the personal information of each customer includes N information items, all of the N information items are related to credit risk, n is an integer greater than or equal to 1, and N is an integer greater than or equal to 2;

a personal information matrix acquisition module, configured to: quantify the personal information of the n customers to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents the quantified personal information of one customer;

a graph Laplacian matrix calculation module, configured to: for the personal information matrix, calculate a graph Laplacian matrix corresponding to the personal information matrix using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;

a feature matrix acquisition module, configured to: reduce the dimension of the graph Laplacian matrix using the locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix, wherein the feature matrix is a matrix with n rows and b columns, and b is a positive integer with 1 ≤ b < n;

a classification module, configured to: classify the n customers using a clustering method based on the feature matrix; and

a credit risk prediction module, configured to: predict the credit risk of the n customers according to the classified n customers.

A third aspect of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.

A fourth aspect of the present disclosure further provides a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the method described above.

A fifth aspect of the present disclosure further provides a computer program product including a computer program which, when executed by a processor, implements the method described above.

Brief Description of the Drawings

The foregoing and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically shows an application scenario of a method for predicting credit risk according to an embodiment of the present disclosure;

FIG. 2 schematically shows a flowchart of a method for predicting credit risk according to an embodiment of the present disclosure;

FIG. 3 schematically shows a flowchart of reducing the dimension of the graph Laplacian matrix using the locally optimal block conjugate gradient method according to an embodiment of the present disclosure;

FIG. 4 schematically shows a structural block diagram of an apparatus for predicting credit risk according to an embodiment of the present disclosure; and

FIG. 5 schematically shows a block diagram of an electronic device suitable for implementing a method for predicting credit risk according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are exemplary only and are not intended to limit the scope of the present disclosure. In the following detailed description, for ease of explanation, numerous specific details are set forth to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The terms "comprising", "including" and the like as used herein indicate the presence of the stated features, steps, operations and/or components, but do not preclude the presence or addition of one or more other features, steps, operations or components.

All terms (including technical and scientific terms) used herein have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.

Where an expression such as "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C).

At present, major banks usually predict customers' credit risk either by manual evaluation or by machine prediction. Manual classification is strongly influenced by subjective factors, which makes it difficult to form objective evaluation criteria. When machine learning is applied for classification and prediction, each customer's information is usually taken as one sample, and an algorithm such as an SVM (Support Vector Machine), a neural network or a decision tree is then used to obtain each customer's credit risk level.

For a bank holding data on tens of millions of customers or more, however, every time a new customer or a new information item is added, the machine must recompute and reclassify all of the data. This entails an enormous amount of computation, is time-consuming and resource-intensive, and may prevent a timely response to changing data.

Because manual judgment is rather subjective, the prediction method provided in this application improves on the machine learning approach. Among a customer's many information items there is much irrelevant or unimportant information, which interferes with the credit risk assessment and adds unnecessary computational burden. This application effectively sidesteps such information so as to quickly find the useful information items and then classify the credit of financial customers or enterprises. Unlike dimensionality reduction in the prior art, this application uses the locally optimal block conjugate gradient method to reduce the dimension of the graph Laplacian matrix, which can rapidly approximate the feature space of the graph Laplacian matrix. The prediction method of this application therefore has the advantages of low memory usage and fast computation.

To facilitate understanding of the technical solutions of this application, the technical terms involved are introduced below.

Spectral clustering algorithm: an unsupervised machine learning method built on spectral graph theory. In essence it converts the clustering problem into an optimal graph partitioning problem. Compared with traditional clustering algorithms, it has the advantages of being able to cluster on sample spaces of arbitrary shape and of converging to the global optimum.

Dimensionality reduction: a key step in the spectral clustering algorithm. In this application it reduces the input data and thereby the amount of computation. For example, the amount of data drops from n×n to n×b, where b is necessarily smaller than n.
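
To give a sense of scale for the reduction from n×n to n×b, the short calculation below compares the storage of the two matrices. The values n = 10,000,000 and b = 10, dense storage and 8-byte floating-point entries are assumptions chosen only for illustration, and `float64_megabytes` is a hypothetical helper.

```python
def float64_megabytes(rows, cols):
    """Storage of a dense rows-by-cols float64 matrix in megabytes (10**6 bytes)."""
    return rows * cols * 8 / 1e6


n = 10_000_000  # assumed number of customers
b = 10          # assumed number of retained columns

laplacian_mb = float64_megabytes(n, n)  # dense n-by-n graph Laplacian
feature_mb = float64_megabytes(n, b)    # n-by-b feature matrix
print(f"n x n: {laplacian_mb:.0f} MB, n x b: {feature_mb:.0f} MB")
```

Under these assumptions the dense n×n matrix would need about 800 million MB (800 TB), while the n×b feature matrix needs about 800 MB.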

Eigenvectors and eigenvalues: the eigenvector of a matrix is one of the important concepts of matrix theory. An eigenvector of a linear transformation is a nonzero (non-degenerate) vector whose direction is unchanged by the transformation; the factor by which the vector is scaled under the transformation is called its eigenvalue. Mathematically, if a vector v and a transformation A satisfy Av = λv, then v is an eigenvector of A and λ is the corresponding eigenvalue.
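
The defining relation Av = λv can be checked numerically. The following small sketch uses an arbitrary 2×2 symmetric matrix chosen for illustration and NumPy's `numpy.linalg.eigh`, which handles symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

# An arbitrary symmetric example matrix (not taken from the disclosure).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)  # columns are eigenvectors

# Each column v of `eigenvectors` satisfies A v = lambda v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```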

Feature space: the space in which the feature vectors lie; each feature corresponds to a unique coordinate in the feature space.

Embodiments of this application will be described below with reference to the accompanying drawings. It should be understood, however, that these descriptions are exemplary only and are not intended to limit the scope of the disclosure of this application. In the following detailed description, for ease of explanation, numerous specific details are set forth to provide a thorough explanation of the embodiments of this application. However, one or more embodiments may be practiced without these specific details. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessary confusion.

It should be noted that, in the technical solution of this application, the acquisition, storage and use of customers' personal information all comply with relevant laws and regulations, necessary confidentiality measures are taken, and public order and good morals are not violated.

FIG. 1 schematically shows an application scenario of a method for predicting credit risk according to an embodiment of the present disclosure.

As shown in FIG. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients and social platform software (examples only).

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 105 may be a server that provides various services, such as a back-end management server (example only) that supports websites browsed by users with the terminal devices 101, 102, 103. The back-end management server may analyze and otherwise process received data such as user requests, and feed the processing results (for example, web pages, information or data obtained or generated according to the user requests) back to the terminal devices.

It should be noted that the method for predicting credit risk provided by the embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the apparatus for predicting credit risk provided by the embodiments of the present disclosure may generally be provided in the server 105. The method may also be executed by a server or server cluster that is different from the server 105 and is able to communicate with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus may also be provided in a server or server cluster that is different from the server 105 and is able to communicate with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, as required by the implementation.

FIG. 2 schematically shows a flowchart of a method for predicting credit risk according to an embodiment of the present disclosure.

As shown in FIG. 2, the prediction method of this embodiment includes operations S110 to S170.

In operation S110, the customers' authorization to acquire personal information is obtained.

In the embodiments of the present disclosure, the customer's consent or authorization must be obtained before any of the customer's information is acquired. For example, before operation S120, the customer may be notified and sent a request to acquire the customer's personal information and other associated information. Operation S120 is performed only if the customer consents to or authorizes the acquisition of personal information.

In operation S120, upon obtaining the customers' authorization to acquire personal information, the personal information of n customers is acquired, wherein the personal information of each customer includes N information items, all of the N information items are related to credit risk, n is an integer greater than or equal to 1, and N is an integer greater than or equal to 2.

Before credit risk prediction is performed for a customer, personal information related to that customer, commonly known as the customer profile, must be obtained. From the customer profile it is usually possible to learn about the customer's operating capability, profitability, solvency, development capability, quality and credit status, and the customer's credit is judged on the basis of this information.

The channels for acquiring a customer's personal information may be diverse. In a specific example, the information may include the following text fields: "A certain company, founded in 2015, turnover, earnings per share, net profit growth rate, contract performance, ...". From this information, the attributes and operating status of the institution under credit assessment can be learned. In some examples the information may be further extended. For example, information about the institution may include: supplementary enterprise registration information, enterprise registration change information, shareholder registration information, information on the entity's social insurance participation, the enterprise's legal representative, enterprise financial data, enterprise tax payment information, enterprise tax registration information, enterprise housing provident fund contribution information, and so on. Other information includes, for example, the institution name, enterprise operating revenue and basic enterprise information.

After the customer's personal information is obtained, it is organized into N information items, so that each customer corresponds to N information items. Some of the N information items may be correlated with one another. For example, a company or individual with higher income also pays correspondingly more tax, so the two information items of income and tax amount are necessarily positively related. Correlated information items can therefore be merged to achieve dimensionality reduction. Because this application involves dimensionality reduction, the number N of information items acquired in this operation is an integer greater than or equal to 2.

It will be understood that information items that are inversely related to one another can likewise be merged for dimensionality reduction.

In operation S130, the personal information of the n customers is quantified to obtain a personal information matrix, where the personal information matrix has n rows and N columns and each row represents the quantified personal information of one customer.

Before dimensionality reduction, the personal information of the n customers is processed mathematically.

The specific processing method can be adjusted to the content of each information item. For example, profitability (net profit, gross margin, etc.) can be represented directly by numbers; as another example, contract performance can be represented by the number of defaults, or by the ratio of the number of defaults to the total number of credit transactions.

The quantified customer data can be represented by elements X_ij, where i denotes the i-th customer and j denotes the j-th information item of the i-th customer. Both i and j can therefore be represented as nodes in an undirected graph.

In operation S140, for the personal information matrix, a spectral clustering method is used to calculate the graph Laplacian matrix corresponding to the personal information matrix, where the graph Laplacian matrix has n rows and n columns.

To calculate the graph Laplacian, an adjacency matrix W is first generated from the quantified personal information. The adjacency matrix is the matrix representation of a graph; it makes it convenient to store the structure of the graph and to study graph problems with linear algebra. Since the present application solves for n customers, the adjacency matrix W is an n*n matrix whose element W_ij, computed from the quantified data obtained in operation S130, represents the weight of the edge (i, j). If there is no edge between two nodes, the corresponding element of the adjacency matrix is 0. Specifically, the k-nearest-neighbor (KNN) method is used: all sample points are traversed and only the k points nearest to each sample are kept as its neighbors, that is, W_ij > 0 only between a sample and its k nearest points, and all other elements are set to 0.

It is calculated as follows:

In the above formulas, X_i is the data of the i-th row, that is, the personal information of the i-th customer, and X_j is the data of the j-th row, that is, the personal information of the j-th customer.

Formula (1) means that W_ij is retained as long as X_i is in the k-nearest-neighbor set of X_j; formula (2) means that W_ij is retained only if the two points X_i and X_j are each in the other's k-nearest-neighbor set.

To calculate W_ij, the Euclidean distance can be used to measure the distance between any two points (X_i and X_j). Commonly used kernel functions for this purpose include the polynomial kernel, the Gaussian kernel and the sigmoid kernel.

As a specific embodiment of the present application, the most commonly used Gaussian kernel function is used for the calculation.
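Formulas (1) and (2) themselves are not reproduced in this text. The following is a minimal sketch of what the description above suggests, assuming a Gaussian kernel exp(-||X_i - X_j||^2 / (2*sigma^2)) on the Euclidean distance. The function name `knn_gaussian_adjacency`, the bandwidth `sigma` and the `mutual` switch are illustrative assumptions, not names from the application.

```python
import numpy as np

def knn_gaussian_adjacency(X, k, sigma=1.0, mutual=False):
    """Build a symmetric k-nearest-neighbor adjacency matrix with Gaussian weights."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between the rows of X.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Find the k nearest neighbors of each row, excluding the row itself.
    masked = sq_dist.copy()
    np.fill_diagonal(masked, np.inf)
    nearest = np.argsort(masked, axis=1)[:, :k]
    is_neighbor = np.zeros((n, n), dtype=bool)
    is_neighbor[np.repeat(np.arange(n), k), nearest.ravel()] = True
    # Variant (1): keep the edge if either point is among the other's neighbors.
    # Variant (2): keep it only if each point is among the other's neighbors.
    keep = (is_neighbor & is_neighbor.T) if mutual else (is_neighbor | is_neighbor.T)
    return np.where(keep, np.exp(-sq_dist / (2.0 * sigma ** 2)), 0.0)

# Hypothetical one-dimensional data for four customers.
X = np.array([[0.0], [1.0], [2.5], [10.0]])
W_union = knn_gaussian_adjacency(X, k=1)
W_mutual = knn_gaussian_adjacency(X, k=1, mutual=True)
```

With k = 1, the first variant connects the chain 0-1, 1-2 and 2-3, while the second keeps only the pair 0-1, because only those two points are each other's nearest neighbor.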

After the adjacency matrix W is obtained by the above method, the diagonal matrix D, that is, the degree matrix, is calculated.

The graph in the present application is an undirected graph. For an undirected graph, the weighted degree of a node is the sum of the weights of all edges incident to that node. The weighted degree of node i is therefore the sum of the elements in the i-th row of the adjacency matrix W.

The corresponding diagonal matrix D is calculated from the adjacency matrix W by the following formula:

The resulting diagonal matrix D is:
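The two formulas referred to above are not reproduced in this text. A reconstruction that follows from the definition of the weighted degree given above would be the following, where the symbol d_i for the weighted degree of node i is introduced here as an assumption:

```latex
d_i = \sum_{j=1}^{n} W_{ij}, \qquad
D = \operatorname{diag}(d_1, d_2, \ldots, d_n) =
\begin{pmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{pmatrix}
```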

For the undirected graph with n nodes (n customers), adjacency matrix W and diagonal matrix D, the graph Laplacian matrix can be defined on the basis of W and D: the graph Laplacian matrix A is defined as the difference between the diagonal matrix D and the adjacency matrix W:

A = D - W
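As a small illustration of the definition above, the degree matrix and the Laplacian can be computed directly from W. The function name `graph_laplacian` and the 3*3 weights are hypothetical and serve only as an example.

```python
import numpy as np

def graph_laplacian(W):
    """Return the degree matrix D and the graph Laplacian A = D - W."""
    degrees = W.sum(axis=1)   # weighted degree of node i = sum of row i of W
    D = np.diag(degrees)
    return D, D - W

# Hypothetical symmetric adjacency matrix for three customers.
W = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.2],
              [0.0, 0.2, 0.0]])
D, A = graph_laplacian(W)
```

Each row of A sums to zero, which is a quick sanity check on any Laplacian built this way.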

In operation S150, the locally optimal block conjugate gradient method is used to reduce the dimension of the graph Laplacian matrix and obtain the feature matrix corresponding to the graph Laplacian matrix, where the feature matrix has n rows and b columns, and b is a positive integer with 1≤b<n.

The graph Laplacian matrix A obtained above is an n*n matrix. In this operation, the locally optimal block conjugate gradient method reduces its dimension, that is, the n*n graph Laplacian matrix is reduced to an n*b feature matrix.

The locally optimal block conjugate gradient method searches for the best gradient direction, so that the eigenspace of the graph Laplacian A can be approximated quickly during machine training and use.

In operation S160, the n customers are classified by a clustering method based on the feature matrix.

For the n*b feature matrix Q, each row is used as a sample and K-means is used to classify the customers. To be clear, the feature matrix Q has n rows and b columns; in this operation, the b dimension-reduced values corresponding to each of the n customers are used as that customer's sample for classification.

In embodiments of the present disclosure, after the n*b feature matrix is obtained by the dimensionality reduction method described above, each row is used as a sample to classify the customers. Embodiments of the present disclosure are not limited to the K-means clustering method; a trained neural network or a decision tree (including the classical decision tree method and derived methods such as random forests) can also be used to classify all customers.

It should be noted that, in practice, experience shows that the number of columns b should equal the number of categories into which the customers are classified; otherwise the error becomes larger. That is, to divide the customers into 7 categories, the feature matrix Q should contain n*7 values: 7 eigenvectors of the graph Laplacian A are computed, and the dimension-reduced feature matrix Q has n rows and 7 columns.
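The row-wise clustering of Q can be sketched as below. This is a plain Lloyd-style K-means written for illustration only; the function name `kmeans_rows`, the deterministic farthest-point initialization and the toy matrix are assumptions and are not taken from the application.

```python
import numpy as np

def kmeans_rows(Q, num_clusters, num_iters=100):
    """Cluster the rows of Q with Lloyd iterations; returns one label per row."""
    # Deterministic farthest-point initialization: start from row 0, then
    # repeatedly add the row farthest from the centers chosen so far.
    center_idx = [0]
    for _ in range(1, num_clusters):
        d = np.linalg.norm(Q[:, None, :] - Q[center_idx][None, :, :], axis=2).min(axis=1)
        center_idx.append(int(d.argmax()))
    centers = Q[center_idx].copy()
    labels = np.zeros(Q.shape[0], dtype=int)
    for _ in range(num_iters):
        # Assign every row to its nearest center.
        dists = np.linalg.norm(Q[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its rows; keep it if the cluster is empty.
        new_centers = centers.copy()
        for c in range(num_clusters):
            members = Q[labels == c]
            if len(members) > 0:
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical feature matrix with n = 4 customers and b = 2 columns.
Q = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
labels = kmeans_rows(Q, num_clusters=2)
```

Here the number of clusters is set equal to the number of columns of Q, in line with the guidance above that b should match the number of categories.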

In operation S170, the credit risk of the n customers is predicted according to the classified n customers.

Each category is assigned a different credit grade; that is, the n customers can be divided into b categories corresponding to b credit risk grades.

In a specific embodiment, after the customers have been classified in operation S160, the contract performance of all customers is retrieved. For each group produced by the classification, the contract performance values of the customers in the group are summed, and the groups are then sorted by the resulting values. The group with the highest score, that is, the group with the best contract performance, can be set to grade one, the group with the second-highest score to grade two, and so on.
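The grading step described above can be sketched as follows. The function name `grade_groups`, the label list and the performance scores are hypothetical; the sketch assumes that a higher score means better contract performance.

```python
from collections import defaultdict

def grade_groups(labels, performance):
    """Map each group label to a grade; grade 1 is the group with the highest total."""
    totals = defaultdict(float)
    for label, score in zip(labels, performance):
        totals[label] += score          # sum the performance within each group
    ranked = sorted(totals, key=lambda g: totals[g], reverse=True)
    return {group: grade for grade, group in enumerate(ranked, start=1)}

labels = [0, 0, 1, 1, 2]                      # hypothetical cluster labels
performance = [0.9, 0.8, 0.2, 0.1, 0.6]       # hypothetical scores, higher is better
grades = grade_groups(labels, performance)
```

Because the description sums the values per group, groups of very different sizes are not directly comparable; whether to normalize by group size is a design choice the text leaves open.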

FIG. 3 schematically shows a flowchart of reducing the dimension of the graph Laplacian matrix by the locally optimal block conjugate gradient method according to an embodiment of the present disclosure.

As shown in FIG. 3, the dimensionality reduction process of this embodiment includes operations S210 to S220.

In operation S210, an iterative method is used to determine the search directions, so that the search directions gradually become consistent with the directions of the column vectors of the feature matrix to be determined.

For operation S210, an intermediate matrix is first obtained based on the graph Laplacian matrix, where the intermediate matrix has n rows and b columns.

This can be understood as going from the n*n dimensions of the graph Laplacian matrix A to the n*b dimensions of the intermediate matrix B, which achieves dimensionality reduction.

The machine learning process in this step involves solving partial differential equations. In practice, the finite element method can simplify the solution process and yield approximate solutions of the equations.

Then, the eigenvalues and eigenvectors of the intermediate matrix are calculated.

Finally, a first sub-matrix is generated from the eigenvectors of the intermediate matrix, where each column vector of the first sub-matrix represents a search direction, the search directions represented by the column vectors of the first sub-matrix correspond respectively to the directions of the column vectors of the feature matrix to be determined, and the first sub-matrix has n rows and b columns.

The algorithm used in the present application for the above solution process is the Rayleigh-Ritz method, which starts directly from a functional and finds the function that minimizes it.

By way of example, the Rayleigh-Ritz algorithm may be applied as follows:

Input: A∈R^{n*n}, matrix;

S∈R^{n*b}, matrix, (0≤b<n);

Output: θ: diagonal matrix of order b

Y: matrix of order b

RR:

Compute the matrix B = S^T A S

Find all eigenvectors y_1, y_2, ..., y_b and all eigenvalues θ_1, θ_2, ..., θ_b of B

Let Y = [y_1 y_2 ... y_b], θ = diag(θ_1, θ_2, ..., θ_b)

Here, the matrix B is the intermediate matrix, the input matrix A is the graph Laplacian matrix, S is the search matrix, the output matrix θ is the eigenvalue matrix, the output Y is the eigenvector matrix, and RR is the abbreviation of Rayleigh-Ritz and denotes the Rayleigh-Ritz algorithm.

The above algorithm can be interpreted as follows: an orthonormal basis is computed in S∈R^{n*b} to approximate the eigenspace corresponding to b eigenvectors (b is the desired number of categories, as explained for operations S160 and S170), the intermediate matrix B is calculated, and the eigenvectors Y and eigenvalues θ of B are solved for.

The intermediate matrix B is generated from the graph Laplacian matrix, the search matrix and the transpose of the search matrix, that is, B = S^T A S.
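The RR procedure above can be sketched in a few lines of NumPy. The sketch assumes that A is symmetric and that S has orthonormal columns (otherwise a generalized eigenproblem would be needed); the function name `rayleigh_ritz` and the test matrices are illustrative only.

```python
import numpy as np

def rayleigh_ritz(A, S):
    """Return (Y, theta) for B = S^T A S, with eigenvalues in ascending order."""
    B = S.T @ A @ S
    B = (B + B.T) / 2.0               # remove round-off asymmetry
    eigvals, Y = np.linalg.eigh(B)    # eigh applies because B is symmetric
    return Y, np.diag(eigvals)

# Hypothetical example: a diagonal A and a search matrix spanning e1 and e2.
A = np.diag([1.0, 2.0, 3.0, 4.0])
S = np.eye(4)[:, :2]
Y, theta = rayleigh_ritz(A, S)
ritz_vectors = S @ Y                  # approximate eigenvectors of A in R^n
```

The products S @ Y map the small eigenvectors back to n-dimensional space, which is how the first sub-matrix is formed from the eigenvectors of the intermediate matrix.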

In the following, the eigenvalues θ and eigenvectors Y obtained by the Rayleigh-Ritz algorithm are used to reduce the dimension of the graph Laplacian matrix.

A second sub-matrix is generated from the eigenvalues and eigenvectors of the intermediate matrix and the first sub-matrix; the second sub-matrix has n rows and b columns, and each of its column vectors represents a residual vector between the graph Laplacian matrix and the first sub-matrix.

The second sub-matrix R_0 is solved from the intermediate matrix B, the eigenvectors Y and the eigenvalues θ; the residual vectors characterize the accuracy to be achieved.

In this process, the Rayleigh-Ritz algorithm can be used again, with

(Y_0, θ_0) = RR(A, X_0)

where X_0 is a random matrix with n rows and b columns, that is, X_0∈R^{n*b}; then

R_0 = A X_0 - X_0 θ_0

gives the second sub-matrix R_0, which can be understood as the search direction given by the residual of the search subspace.

Note that the first sub-matrix used for the first time in the iterative process is a random matrix with n rows and b columns, that is, the initial first sub-matrix is X_0.

A third sub-matrix is generated from the eigenvectors of the intermediate matrix, the first sub-matrix and the second sub-matrix; the third sub-matrix has n rows and b columns. The first, second and third sub-matrices together form the search matrix representing the search subspace.

The third sub-matrix P_0 is solved from the eigenvectors Y of the intermediate matrix, the first sub-matrix X_0 and the second sub-matrix R_0. Finally, X_0, R_0 and P_0 form the search matrix S_0 representing the search subspace, which can be expressed as

S_0 = [X_0, R_0, P_0]

The first sub-matrix is updated iteratively from the eigenvectors of the intermediate matrix and the search matrix: in each iteration, the first sub-matrix is updated from the eigenvectors of the intermediate matrix and the search matrix of the previous iteration to generate the first sub-matrix of the current iteration.

In each iteration, the second sub-matrix is updated from the eigenvalues and eigenvectors of the intermediate matrix of the current iteration and the first sub-matrix of the current iteration to generate the second sub-matrix of the current iteration.

In each iteration, the third sub-matrix is updated from the eigenvectors of the intermediate matrix of the previous iteration and the first and second sub-matrices of the previous iteration to generate the third sub-matrix of the current iteration.

In computing, k is commonly used to denote the iteration number and is usually written as a subscript. After each iteration, the subscripts of the eigenvectors of the intermediate matrix, the first sub-matrix, the second sub-matrix and the third sub-matrix are incremented by 1. Based on the relationships among these quantities, the iterative process can be expressed as:

X_{k+1} = S_k Y_k

P_{k+1} = [0, R_k, P_k] Y_k

R_{k+1} = A X_{k+1} - X_{k+1} θ_{k+1}

k = k+1

Further, each column vector of the third sub-matrix represents the difference between the bases of the search subspace in two adjacent iterations.

In operation S220, the feature matrix corresponding to the graph Laplacian matrix is determined according to the determined search directions.

For operation S220, during the iterative process, when the vector in the i-th column of the updated second sub-matrix satisfies a first prescribed condition, the vector in the i-th column of the updated first sub-matrix is determined as a column of the feature matrix, where i is a positive integer and 1≤i<b.

Further, the first prescribed condition is satisfied when the norm of the vector in the i-th column of the updated second sub-matrix is smaller than a prescribed threshold.

For example, in an embodiment of the present disclosure, the feature matrix Q may be determined by the following iterative process.

(1) Generate a random matrix X_0∈R^{n*b}, (0≤b<n);

(2) Calculate the eigenvectors and eigenvalues using the Rayleigh-Ritz algorithm: (Y_0, θ_0) := RR(A, X_0);

(3) Set the initial values of the iteration: R_0 := A X_0 - X_0 θ_0, k := 0, Q := [], P_0 := [];

(4) While the number of columns of the feature matrix Q is less than b, perform the following iterative process:

Orthonormalize R_k against Q;

Let S_k := [X_k, R_k, P_k] and compute (Y_k, θ_{k+1}) := RR(A, S_k);

X_{k+1} := S_k Y_k;

P_{k+1} := [0, R_k, P_k] Y_k;

R_{k+1} := A X_{k+1} - X_{k+1} θ_{k+1};

k := k+1;

If the norms of some columns of the matrix R_{k+1} are smaller than the prescribed threshold ε, put the corresponding columns of X_{k+1} into the feature matrix Q, replace those columns of X_{k+1} with random vectors, then set X_0 := X_{k+1}, k := 0, and repeat the above iterative process until the number of columns of the feature matrix Q equals b.
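The iteration above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions rather than the procedure of the application: A is assumed symmetric, no preconditioner is used, converged columns are not moved into Q one by one (the loop stops when all b residual columns are below the threshold), and the search matrix is orthonormalized with a QR factorization for numerical stability. The name `lobpcg_sketch` and its parameters are hypothetical.

```python
import numpy as np

def lobpcg_sketch(A, b, X0=None, tol=1e-8, max_iters=500, seed=0):
    """Approximate the b smallest eigenpairs of a symmetric matrix A.

    Returns (Q, eigenvalues, iterations used).
    """
    n = A.shape[0]
    if X0 is None:
        X0 = np.random.default_rng(seed).standard_normal((n, b))
    X, _ = np.linalg.qr(X0)                    # orthonormal starting block X_0
    theta, Y = np.linalg.eigh(X.T @ A @ X)     # initial Rayleigh-Ritz step
    X = X @ Y
    P = np.zeros((n, 0))                       # P_0 := []
    for k in range(max_iters):
        R = A @ X - X * theta                  # R_k = A X_k - X_k theta_k
        if np.all(np.linalg.norm(R, axis=0) < tol):
            return X, theta, k
        # Search matrix S_k = [X_k, R_k, P_k], orthonormalized by QR.
        S, _ = np.linalg.qr(np.hstack([X, R, P]))
        B = S.T @ A @ S
        vals, vecs = np.linalg.eigh((B + B.T) / 2.0)
        theta, Y = vals[:b], vecs[:, :b]       # keep the b smallest Ritz pairs
        X_new = S @ Y
        P = X_new - X @ (X.T @ X_new)          # part of the update outside span(X_k)
        X = X_new
    return X, theta, max_iters

# Hypothetical test matrix with known eigenvalues 1, 2, ..., 12.
A = np.diag(np.arange(1.0, 13.0))
Q, eigenvalues, iterations = lobpcg_sketch(A, b=2)
```

The optional `X0` argument lets the caller supply any n*b starting block in place of the random matrix of step (1).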

According to an embodiment of the present application, the prediction method further includes: updating the feature matrix when at least one information item of at least one of the n customers changes; and/or updating the feature matrix after the personal information of an (n+1)-th customer is acquired.

Whenever a new information item is added, the field value of an information item of one of the customers changes, or a new customer is added, the graph Laplacian matrix is necessarily updated, and the feature matrix is updated in turn.

In one embodiment, when the feature matrix is updated, the first sub-matrix used for the first time in the iterative process is the feature matrix before the update.

In the prior art, whenever a new information item is added, the field value of an information item of one of the customers changes, or a new customer is added, all of the data needs to be recalculated.

Even when a new information item or a new customer is added, the eigenvalues change little and the newly calculated eigenspace is necessarily close to the original one, so the new eigenspace can be obtained quickly. Based on this idea, the prediction method of the present application performs the calculation starting from the original feature matrix: when customer information changes, the eigenspace of the new graph Laplacian matrix is built on the eigenspace of the existing graph Laplacian matrix, that is, the initial value of X_0 in operation S150 is directly replaced by X_0 = Q.

In one experiment, the second calculation could be hundreds to thousands of times faster than the first. The prediction method of the present application can therefore reduce the amount of machine computation, saving resources and computation time.

Based on the above method for predicting credit risk, the present disclosure also provides a device for predicting credit risk, which is described in detail below with reference to FIG. 4. FIG. 4 schematically shows a structural block diagram of a prediction device according to an embodiment of the present disclosure.

As shown in FIG. 4, the prediction device 800 of this embodiment includes a customer authorization acquisition module 810, a personal information acquisition module 820, a personal information matrix acquisition module 830, a graph Laplacian matrix calculation module 840, a feature matrix acquisition module 850, a classification module 860 and a credit risk prediction module 870.

The customer authorization acquisition module 810 is used to obtain the customer's authorization to acquire personal information. In one embodiment, the customer authorization acquisition module 810 may be used to perform operation S110 described above, which is not repeated here.

The personal information acquisition module 820 is used to acquire the personal information of n customers once the customers have authorized the acquisition of personal information, where the personal information of each customer includes N information items, the N information items are all related to credit risk, n is an integer greater than or equal to 1, and N is an integer greater than or equal to 2. In one embodiment, the personal information acquisition module 820 may be used to perform operation S120 described above, which is not repeated here.

The personal information matrix acquisition module 830 is used to quantify the personal information of the n customers to obtain a personal information matrix, where the personal information matrix has n rows and N columns and each row represents the quantified personal information of one customer. In one embodiment, the personal information matrix acquisition module 830 may be used to perform operation S130 described above, which is not repeated here.

The graph Laplacian matrix calculation module 840 is used to calculate, for the personal information matrix and by a spectral clustering method, the graph Laplacian matrix corresponding to the personal information matrix, where the graph Laplacian matrix has n rows and n columns. In one embodiment, the graph Laplacian matrix calculation module 840 may be used to perform operation S140 described above, which is not repeated here.

The feature matrix acquisition module 850 is used to reduce the dimension of the graph Laplacian matrix by the locally optimal block conjugate gradient method and obtain the feature matrix corresponding to the graph Laplacian matrix, where the feature matrix has n rows and b columns, and b is a positive integer with 1≤b<n. In one embodiment, the feature matrix acquisition module 850 may be used to perform operation S150 described above, which is not repeated here.

The classification module 860 is used to classify the n customers by a clustering method based on the feature matrix. In one embodiment, the classification module 860 may be used to perform operation S160 described above, which is not repeated here.

The credit risk prediction module 870 is used to predict the credit risk of the n customers according to the classified n customers. In one embodiment, the credit risk prediction module 870 may be used to perform operation S170 described above, which is not repeated here.

The prediction device in the embodiments of the present disclosure can perform the prediction method described above. By using the locally optimal block conjugate gradient method, it can quickly search for the best gradient direction and reduce the dimension of the N information items of each customer, so that the eigenspace of the graph Laplacian matrix is obtained quickly. The calculation is fast and uses little memory, which greatly reduces the computation and training time of the neural network.

According to embodiments of the present disclosure, any number of the customer authorization acquisition module 810, the personal information acquisition module 820, the personal information matrix acquisition module 830, the graph Laplacian matrix calculation module 840, the feature matrix acquisition module 850, the classification module 860 and the credit risk prediction module 870 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the customer authorization acquisition module 810, the personal information acquisition module 820, the personal information matrix acquisition module 830, the graph Laplacian matrix calculation module 840, the feature matrix acquisition module 850, the classification module 860 and the credit risk prediction module 870 may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package or an application-specific integrated circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable way of integrating or packaging a circuit, or by any one of software, hardware and firmware or an appropriate combination of them. Alternatively, at least one of the customer authorization acquisition module 810, the personal information acquisition module 820, the personal information matrix acquisition module 830, the graph Laplacian matrix calculation module 840, the feature matrix acquisition module 850, the classification module 860 and the credit risk prediction module 870 may be implemented at least in part as a computer program module which, when run, performs the corresponding function.

如图5所示，根据本公开实施例的电子设备900包括处理器901，其可以根据存储在只读存储器(ROM)902中的程序或者从存储部分908加载到随机访问存储器(RAM)903中的程序而执行各种适当的动作和处理。处理器901例如可以包括通用微处理器(例如CPU)、指令集处理器和/或相关芯片组和/或专用微处理器(例如，专用集成电路(ASIC))等等。处理器901还可以包括用于缓存用途的板载存储器。处理器901可以包括用于执行根据本公开实施例的方法流程的不同动作的单一处理单元或者是多个处理单元。As shown in FIG. 5 , an electronic device 900 according to an embodiment of the present disclosure includes a processor 901 that can be loaded into a random access memory (RAM) 903 according to a program stored in a read only memory (ROM) 902 or from a storage portion 908 program to perform various appropriate actions and processes. The processor 901 may include, for example, a general-purpose microprocessor (eg, a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (eg, an application-specific integrated circuit (ASIC)), and the like. The processor 901 may also include on-board memory for caching purposes. The processor 901 may include a single processing unit or multiple processing units for performing different actions of the method flow according to the embodiments of the present disclosure.

在RAM 903中，存储有电子设备900操作所需的各种程序和数据。处理器901、ROM902以及RAM 903通过总线904彼此相连。处理器901通过执行ROM 902和/或RAM 903中的程序来执行根据本公开实施例的方法流程的各种操作。需要注意，所述程序也可以存储在除ROM 902和RAM 903以外的一个或多个存储器中。处理器901也可以通过执行存储在所述一个或多个存储器中的程序来执行根据本公开实施例的方法流程的各种操作。In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are stored. The processor 901 , the ROM 902 and the RAM 903 are connected to each other through a bus 904 . The processor 901 performs various operations of the method flow according to the embodiment of the present disclosure by executing the programs in the ROM 902 and/or the RAM 903 . Note that the program may also be stored in one or more memories other than the ROM 902 and the RAM 903 . The processor 901 may also perform various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage section 908 as needed.

The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be included in the device/apparatus/system described in the above embodiments, or it may exist separately without being assembled into that device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to embodiments of the present disclosure.

According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but without limitation: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above, and/or one or more memories other than the ROM 902 and the RAM 903.

Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the method illustrated in the flowcharts. When the computer program product runs in a computer system, the program code causes the computer system to implement the credit risk prediction method provided by the embodiments of the present disclosure.

When the computer program is executed by the processor 901, the above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed. According to embodiments of the present disclosure, the systems, apparatuses, modules, units, and the like described above may be implemented by computer program modules.

In one embodiment, the computer program may reside on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed through the communication section 909, and/or installed from the removable medium 911. The program code contained in the computer program may be transmitted using any suitable network medium, including but not limited to wireless, wired, and the like, or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from the network via the communication section 909, and/or installed from the removable medium 911. When the computer program is executed by the processor 901, the above-described functions defined in the system of the embodiments of the present disclosure are performed. According to embodiments of the present disclosure, the systems, devices, apparatuses, modules, units, and the like described above may be implemented by computer program modules.

According to embodiments of the present disclosure, the program code for executing the computer program provided by the embodiments may be written in any combination of one or more programming languages. Specifically, these computer programs may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the client computing device, partly on the client device, partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the client computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways, even if such combinations or integrations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.

Embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the various embodiments cannot be used in combination to advantage. The scope of the present disclosure is defined by the appended claims and their equivalents. Those skilled in the art can make various substitutions and modifications without departing from the scope of the present disclosure, and all such substitutions and modifications fall within the scope of the present disclosure.

Claims

1. A method for predicting credit risk, comprising the steps of:

obtaining a customer's authorization to acquire personal information;

acquiring, on condition that the customer's authorization to acquire personal information has been obtained, the personal information of n customers, wherein the personal information of each customer comprises N information items, the N information items are all related to credit risk, N is an integer greater than or equal to 1, and n is an integer greater than or equal to 2;

quantizing the personal information of the n customers to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents the quantized personal information of one customer;

for the personal information matrix, calculating a graph Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;

reducing the dimension of the graph Laplacian matrix by using a locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix, wherein the feature matrix is a matrix with n rows and b columns, and b is a positive integer greater than or equal to 1 and less than n;

classifying the n customers by using a clustering method based on the feature matrix; and

predicting the credit risk of the n customers according to the classified n customers.
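The quantization and graph Laplacian steps of claim 1 can be illustrated with a minimal sketch. The claim does not fix how information items are quantized or how similarity is measured, so the field names, the min-max scaling, the Gaussian kernel, and the `sigma` parameter below are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical raw records; the field names are illustrative only.
records = [
    {"income": 3000.0, "overdue": 0, "mortgage": False},
    {"income": 3200.0, "overdue": 0, "mortgage": False},
    {"income": 900.0, "overdue": 4, "mortgage": True},
    {"income": 1000.0, "overdue": 5, "mortgage": True},
]
FIELDS = ("income", "overdue", "mortgage")


def quantize(rows, fields=FIELDS):
    """Build the n x N matrix: one row per customer, one column per item."""
    raw = np.array([[float(r[f]) for f in fields] for r in rows])
    span = raw.max(axis=0) - raw.min(axis=0)
    span[span == 0.0] = 1.0  # avoid dividing by zero for constant items
    return (raw - raw.min(axis=0)) / span  # min-max scaling (assumed)


def graph_laplacian(info, sigma=1.0):
    """Unnormalized graph Laplacian D - W of an n x N matrix.

    The Gaussian similarity kernel is an assumption of this sketch.
    """
    # Pairwise squared Euclidean distances between customers (rows).
    sq_dist = np.sum((info[:, None, :] - info[None, :, :]) ** 2, axis=2)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # no self-loops
    return np.diag(w.sum(axis=1)) - w  # degree matrix minus affinity matrix


info = quantize(records)     # n rows, N columns
lap = graph_laplacian(info)  # n rows, n columns
```

The resulting matrix is symmetric with zero row sums, which is what makes its smallest eigenvectors usable as the reduced feature matrix in the next step.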

2. The method according to claim 1, wherein reducing the dimension of the graph Laplacian matrix by using a locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix specifically includes:

determining a search direction by using an iterative method, such that the search direction gradually coincides with the direction of the vector of each column in the feature matrix to be determined; and

determining the feature matrix corresponding to the graph Laplacian matrix according to the determined search direction.

3. The method according to claim 2, wherein the determining the search direction using an iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined comprises:

obtaining an intermediate matrix based on the graph Laplace matrix, wherein the intermediate matrix is a matrix with n rows and b columns;

calculating the eigenvalues and eigenvectors of the intermediate matrix; and

generating a first sub-matrix according to the eigenvectors of the intermediate matrix, wherein the vector of each column in the first sub-matrix represents a search direction, the search direction represented by the vector of each column in the first sub-matrix corresponds to the direction of the vector of each column in the feature matrix to be determined, and the first sub-matrix is a matrix with n rows and b columns.

4. The method according to claim 3, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:

generating a second sub-matrix according to the eigenvalue and the eigenvector of the intermediate matrix and the first sub-matrix, wherein the second sub-matrix is a matrix with n rows and b columns;

wherein the vector of each column in the second sub-matrix represents a residual vector between the graph Laplacian matrix and the first sub-matrix.

5. The method according to claim 4, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:

generating a third sub-matrix according to the eigenvectors of the intermediate matrix, the first sub-matrix, and the second sub-matrix, wherein the third sub-matrix is a matrix with n rows and b columns;

wherein the first, second, and third sub-matrices constitute a search matrix representing a search subspace.

6. The method according to any one of claims 2 to 5, wherein the determining the search direction using an iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:

updating the first sub-matrix by using an iterative method based on the eigenvectors of the intermediate matrix and the search matrix,

wherein, in the iteration process, the first sub-matrix is updated according to the eigenvectors of the intermediate matrix in the previous iteration and the search matrix in the previous iteration, to generate the first sub-matrix of the current iteration.

7. The method according to claim 6, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:

in the iteration process, updating the second sub-matrix according to the eigenvalue and the eigenvector of the intermediate matrix in the current iteration process and the first sub-matrix in the current iteration process to generate the second sub-matrix in the current iteration process.

8. The method of claim 7, wherein the vector of each column in the third sub-matrix represents a difference between bases of the search subspace in two adjacent iterations.

9. The method according to claim 8, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:

in the iteration process, updating the third sub-matrix according to the eigenvectors of the intermediate matrix in the previous iteration and the first sub-matrix and the second sub-matrix in the previous iteration, to generate the third sub-matrix of the current iteration.

10. The method according to claim 9, wherein determining the feature matrix corresponding to the graph Laplacian matrix according to the determined search direction specifically includes:

in the iteration process, when the vector of the i-th column in the updated second sub-matrix satisfies a first specified condition, determining the vector of the i-th column in the updated first sub-matrix as one column of the feature matrix, wherein i is a positive integer greater than or equal to 1 and less than b.

11. The method of claim 10, wherein the vector of the i-th column in the updated second sub-matrix satisfying the first specified condition comprises:

the norm of the vector of the i-th column in the updated second sub-matrix being smaller than a specified threshold.

12. The method according to claim 3, wherein obtaining the intermediate matrix based on the graph Laplacian matrix specifically includes:

generating the intermediate matrix based on the graph Laplacian matrix, the search matrix, and a transpose of the search matrix.

13. The method according to claim 6, wherein the first sub-matrix used for the first time in the iterative process is a random matrix with n rows and b columns.

14. The method of claim 13, further comprising: updating the feature matrix when at least one information item of at least one of the n customers changes; and/or

updating the feature matrix after the personal information of the (n+1)-th customer is acquired.

15. The method of claim 14, wherein in the updating of the feature matrix, the first sub-matrix used for the first time in the iterative process is the feature matrix before updating.
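The iteration described in claims 2 to 15 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the search matrix is orthonormalized with a QR factorization, no preconditioner is applied, the loop stops only when every residual column is below the threshold (rather than accepting columns one at a time as in claim 10), and the projected matrix S^T L S is b-to-3b columns wide. The names `laplacian`, `lobpcg_smallest`, `x0`, `tol`, and the sample data are hypothetical.

```python
import numpy as np


def laplacian(info, sigma=1.0):
    """Unnormalized graph Laplacian D - W with a Gaussian kernel (assumed)."""
    sq = np.sum((info[:, None, :] - info[None, :, :]) ** 2, axis=2)
    w = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w


def lobpcg_smallest(lap, b, x0=None, tol=1e-6, max_iter=200, seed=0):
    """Return (eigenvalues, n x b feature matrix, iterations used)."""
    n = lap.shape[0]
    if x0 is None:
        # Claim 13: the first sub-matrix starts as a random n x b matrix.
        x0 = np.random.default_rng(seed).standard_normal((n, b))
    x, _ = np.linalg.qr(x0)
    theta, c = np.linalg.eigh(x.T @ lap @ x)  # initial Rayleigh-Ritz step
    x = x @ c                                 # first sub-matrix
    p = np.zeros((n, 0))                      # third sub-matrix, empty at start
    for iteration in range(max_iter):
        r = lap @ x - x * theta               # second sub-matrix: residuals
        if np.all(np.linalg.norm(r, axis=0) < tol):
            return theta, x, iteration        # every column met the threshold
        # Search matrix [first, second, third], orthonormalized (assumed).
        s, _ = np.linalg.qr(np.hstack([x, r, p]))
        # Claim 12: projected matrix from L, the search matrix, its transpose.
        vals, vecs = np.linalg.eigh(s.T @ lap @ s)
        x_new = s @ vecs[:, :b]
        p = x_new - x @ (x.T @ x_new)         # change of basis between iterations
        theta, x = vals[:b], x_new
    return theta, x, max_iter


# Hypothetical quantized data: n = 8 customers, N = 2 information items.
info = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
                 [5.0, 5.0], [5.2, 5.1], [5.1, 5.3], [5.3, 5.2]])
lap = laplacian(info)
vals, feat, cold_iters = lobpcg_smallest(lap, b=2)

# Claims 14-15: one information item changes, so the previous feature
# matrix seeds the next run instead of a random matrix.
info_new = info.copy()
info_new[0, 1] = 0.4
lap_new = laplacian(info_new)
vals_warm, feat_warm, warm_iters = lobpcg_smallest(lap_new, b=2, x0=feat)
```

Seeding the update with the previous feature matrix means the iteration starts close to the answer, so it typically needs fewer iterations than a random start. The case of an added (n+1)-th customer would additionally require extending the previous matrix by one row, which this sketch does not attempt.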

16. An apparatus for predicting a credit risk, comprising:

a customer authorization acquisition module configured to acquire a customer's authorization to acquire personal information;

a personal information acquisition module configured to: acquire, on condition that the customer's authorization to acquire personal information has been obtained, the personal information of n customers, wherein the personal information of each customer comprises N information items, the N information items are all related to credit risk, N is an integer greater than or equal to 1, and n is an integer greater than or equal to 2;

a personal information matrix acquisition module configured to: quantize the personal information of the n customers to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents the quantized personal information of one customer;

a graph Laplacian matrix calculation module configured to: for the personal information matrix, calculate a graph Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;

a feature matrix acquisition module configured to: reduce the dimension of the graph Laplacian matrix by using a locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix, wherein the feature matrix is a matrix with n rows and b columns, and b is a positive integer greater than or equal to 1 and less than n;

a classification module configured to: classify the n customers by using a clustering method based on the feature matrix; and

a credit risk prediction module configured to: predict the credit risk of the n customers according to the classified n customers.
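The classification module's step can be illustrated with a minimal sketch that groups the rows of a feature matrix. The claims say only "a clustering method"; k-means (Lloyd's algorithm) with deterministic farthest-point seeding is assumed here, and the function name `kmeans` and the sample feature matrix are hypothetical.

```python
import numpy as np


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # Assign every row (customer) to its nearest center.
        dist = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dist, axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical n x b feature matrix (n = 6 customers, b = 2 columns).
features = np.array([[0.58, 0.0], [0.57, 0.01], [0.58, -0.01],
                     [0.0, 0.58], [0.01, 0.57], [-0.01, 0.58]])
labels = kmeans(features, k=2)
```

Each label identifies the group a customer falls into. How a group is then mapped to a credit risk level is left to the credit risk prediction module and is not modeled in this sketch.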

17. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-15.

18. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 15.

19. A computer program product comprising a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 15.