CN108629675A

CN108629675A - A multi-decision tree financial early warning method

Info

Publication number: CN108629675A
Application number: CN201810388744.3A
Authority: CN
Inventors: 郭华平; 刁小宇; 刘宏兵; 邬长安
Original assignee: Xinyang Normal University
Current assignee: Xinyang Normal University
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2018-10-09

Abstract

The present invention provides a multi-decision tree financial early warning method. The method includes: Step 1, obtaining a financial data sample set D to be detected, where D includes a data attribute subset and the data subset corresponding to each data attribute in the data attribute subset; Step 2, performing M rounds of random sampling on D according to a preset number of random samplings M, drawing M data items in each round, and taking the sample set obtained in the k-th round as the training subset D_k, where k = 1, 2, 3, ..., M; Step 3, learning a decision tree T_k on the training subset D_k using a preset denoising autoencoder model; Step 4, predicting, according to the decision tree T_k, the financial status category of each data subset in D, the categories being good financial status and abnormal financial status. The invention can improve the accuracy and generalization ability of financial status prediction.

Description

A Multi-decision Tree Financial Early Warning Method

Technical field

The invention relates to the technical field of data analysis, and in particular to a multi-decision tree financial early warning method.

Background

In recent years, with the rapid development of China's market economy and capital market, competition among domestic enterprises has become increasingly fierce, while more and more multinational companies compete with domestic companies for customers, talent, goods, and capital suppliers. Global economic integration has brought enterprises unprecedented opportunities, but also enormous challenges, and the uncertainty they face keeps growing. Since the bankruptcies of large groups such as Lehman Brothers Holdings and WorldCom came to light, cases of enterprises falling into financial distress or even bankruptcy have become common. However, an enterprise falls into financial crisis gradually: the crisis does not form in a short period but has a long incubation period. Changes in an enterprise's financial status are reflected in certain financial indicators, so these indicators can be used to predict the company's future financial status. Predicting an enterprise's financial status by technical means is of great significance to its stakeholders, including investors: first, it helps managers discover and resolve financial problems in time, before they escalate; second, it supports investors' decisions and protects their interests.

To predict financial status promptly and accurately, universities, enterprises, and investors in China have carried out research on financial early warning methods, and corresponding methods now exist. However, most existing domestic financial early warning methods rely on a single prediction model, such as a neural network, a decision tree, or a logistic regression model, whose accuracy is low and whose performance needs improvement; the few multi-model prediction methods have low generalization ability.

Summary of the invention

The invention provides a multi-decision tree financial early warning method for improving the accuracy and generalization ability of financial status prediction.

The invention provides a multi-decision tree financial early warning method, the method comprising:

Step 1. Obtain a financial data sample set D to be detected, where D includes a data attribute subset and the data subset corresponding to each data attribute in the data attribute subset;

Step 2. Perform M rounds of random sampling on the financial data sample set D to be detected according to a preset number of random samplings M, drawing M data items in each round, and take the sample set obtained in the k-th round as the training subset D_k, where k = 1, 2, 3, ..., M;

Step 3. Using a preset denoising autoencoder model, learn a decision tree T_k on the training subset D_k;

Step 4. According to the decision tree T_k, predict the financial status category of each data subset in the financial data sample set D to be detected, the financial status categories being good financial status and abnormal financial status.

Further, step 3 specifically includes:

Step 31. Use the preset denoising autoencoder model

[formula not reproduced in this text]

to encode the training subset D_k at the l-th layer of the neural network and obtain a new training subset, where W^(k,l) is the weight of the l-th layer of the neural network and b^(k,l) is the bias term of the l-th layer.

Step 32. Learn the decision tree model T_k on the new training subset.

Further, the weight W^(k,l) and the bias term b^(k,l) in step 31 are determined according to the following formulas, respectively:

[formulas not reproduced in this text]

where λ is the weight decay parameter, x is the data subset under a specific data attribute, the noisy input is that data subset after noise has been added, the reconstruction is the output of the output layer, and the two gradient terms are the partial derivatives of the cost function J(W^(k,l-1), b^(k,l-1)) with respect to W^(k,l-1) and b^(k,l-1), respectively.
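The update formulas themselves are not reproduced in this text. As a hedged sketch only, a standard gradient-descent update with weight decay that is consistent with the symbols defined here would read as follows. The learning rate α is an assumed symbol that does not appear in the source, and J is assumed to denote the reconstruction cost without the decay term:

```latex
W^{(k,l)} \leftarrow W^{(k,l)} - \alpha \left( \frac{\partial J}{\partial W^{(k,l)}} + \lambda W^{(k,l)} \right),
\qquad
b^{(k,l)} \leftarrow b^{(k,l)} - \alpha \, \frac{\partial J}{\partial b^{(k,l)}}
```

Under this reading, λ shrinks the weights at every step while the bias terms are left undecayed, which is the usual convention but is not confirmed by the source.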

Further, step 32 specifically includes:

Step 321. Initialize by creating a node, denoted node, as the root node;

Step 322. If the financial data samples in the new training subset all belong to one specific financial status category, or the number of financial data samples is less than a preset threshold, mark node as a leaf node of that financial status category;

Step 323. If the financial data samples in the new training subset do not all belong to one specific financial status category and the number of financial data samples is not less than the preset threshold, use the objective function

[formula not reproduced in this text]

to search for a test condition, where D_t is the data set corresponding to the current node t, D_tj is the data set of child node j of the current node t, and Ent(D_tj) is the information entropy of D_tj;

Step 324. According to the test condition, divide the new training subset into two subsets and construct two child nodes;

Step 325. For each subset, repeat step 322 and step 323.
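The objective function used in step 323 is not reproduced in this text. As a hedged sketch, the standard information-gain criterion matches the symbols D_t, D_tj and Ent defined there; p_c, the fraction of samples in D that belong to category c, is an assumed symbol:

```latex
\mathrm{Gain}(D_t) = \mathrm{Ent}(D_t) - \sum_{j} \frac{|D_{tj}|}{|D_t|} \, \mathrm{Ent}(D_{tj}),
\qquad
\mathrm{Ent}(D) = -\sum_{c} p_c \log_2 p_c
```

With two categories and a binary split, j ranges over the two child nodes and c over good and abnormal financial status.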

Further, step 4 specifically includes:

Use the ensemble classifier T^*, which combines the base classifiers,

[formula not reproduced in this text]

to classify the data subset x under a specific data attribute, where δ(true) = 1, δ(false) = 0, and T_k(x) is the prediction of the decision tree T_k for the data subset x under the specific data attribute.
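The formula for T^* is not reproduced in this text. A majority vote written with the indicator δ is the usual form of such a combination and is offered here as an assumption, with y ranging over the two financial status categories:

```latex
T^{*}(x) = \arg\max_{y \in \{\text{good},\, \text{abnormal}\}} \sum_{k=1}^{M} \delta\big(T_k(x) = y\big)
```

Each tree contributes one vote, and the category with the most votes is returned.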

Further, the data attribute subset includes at least one of the following data attributes: asset-liability ratio, return on net assets, return on assets, net profit margin, total asset turnover, accounts receivable turnover, current asset turnover, main business growth rate, total asset growth rate, and net profit growth rate.

Technical effect of the invention:

The multi-decision tree financial early warning method provided by the invention randomly samples the acquired financial data sample set to be detected to obtain multiple different training subsets, encodes these training subsets and learns multiple different decision trees from them, and finally predicts the data subsets to be detected by combining the decision trees. The invention can therefore improve the accuracy and generalization ability of financial status prediction.

Description of the drawings

Fig. 1 is a schematic structural diagram of a decision tree provided by an embodiment of the invention;

Fig. 2 is a schematic flowchart of the multi-decision tree financial early warning method provided by an embodiment of the invention;

Fig. 3 is a schematic flowchart of the multi-decision tree financial early warning method provided by another embodiment of the invention.

Detailed description

To make the purpose, technical solutions, and advantages of the invention clearer, the technical solutions in the embodiments of the invention are described clearly below with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

A decision tree is a simple, easy-to-use nonparametric classifier. It requires no prior assumptions about the data, is fast to compute, gives results that are easy to interpret, and is robust.

A decision tree consists of four elements: (1) decision nodes; (2) alternative branches; (3) state nodes; (4) probability branches. As shown in Fig. 1, a decision tree is generally composed of square nodes, circular nodes, alternative branches, and probability branches. A square node is a decision node; several branches lead out of it, each representing one alternative, and these are called alternative branches. A circular node is a state node; several branches lead out of it, each representing a different state of nature, and these are called probability branches. Each branch is labelled with the content of the objective state and its probability of occurrence. The end of each probability branch is marked with the outcome (gain or loss) that the alternative achieves under that state of nature. The diagram thus unfolds from left to right, from simple to complex, forming a tree-shaped network.

Fig. 2 is a schematic flowchart of a multi-decision tree financial early warning method provided by an embodiment of the invention. As shown in Fig. 2, the method includes the following steps:

S201. Obtain a financial data sample set D to be detected, where D includes a data attribute subset and the data subset corresponding to each data attribute in the data attribute subset;

Specifically, the financial data sample set D to be detected in this embodiment has already been preprocessed and serves as the original training set that can be used directly to train the model. Preprocessing means supplementing missing data, correcting abnormal data, and normalizing. The values in each data subset are numeric.
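The source does not say how missing values are supplemented or abnormal values corrected. The following is a minimal sketch under stated assumptions: mean imputation, clipping at three standard deviations, and min-max scaling are illustrative choices, and the names `preprocess` and `z_max` are hypothetical.

```python
from statistics import mean, pstdev

def preprocess(column, z_max=3.0):
    """Fill missing values, clip outliers, and min-max normalize one numeric column.

    None marks a missing value. The mean fill, the clipping rule and z_max
    are illustrative assumptions, not taken from the patent.
    """
    present = [v for v in column if v is not None]
    mu, sigma = mean(present), pstdev(present)
    # Supplement: replace missing entries with the column mean.
    filled = [mu if v is None else v for v in column]
    # Correct: clip values farther than z_max standard deviations from the mean.
    lo, hi = mu - z_max * sigma, mu + z_max * sigma
    clipped = [min(max(v, lo), hi) for v in filled]
    # Normalize: rescale to [0, 1]; a constant column maps to all zeros.
    cmin, cmax = min(clipped), max(clipped)
    span = cmax - cmin
    return [(v - cmin) / span if span else 0.0 for v in clipped]

# Example: an asset-liability ratio column with one missing entry.
debt_ratio = [0.42, None, 0.55, 0.61, 0.48]
print(preprocess(debt_ratio))
```

Applying the same function to every attribute column yields a numeric sample set in which all attributes share the range [0, 1].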

S202. Perform M rounds of random sampling on the financial data sample set D to be detected according to a preset number of random samplings M, drawing M data items in each round, and take the sample set obtained in the k-th round as the training subset D_k, where k = 1, 2, 3, ..., M;

Specifically, the random sampling in this embodiment is iterative random sampling with replacement, intended to derive M different training subsets D_k from the sample set D in preparation for training multiple decision trees.
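Sampling with replacement as described here can be sketched with the standard library. The function name `bootstrap_subsets` and the fixed `seed` are hypothetical; the sketch follows the text in drawing M subsets of M items each.

```python
import random

def bootstrap_subsets(samples, m, seed=0):
    """Draw m training subsets, each of m samples, uniformly with replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [[rng.choice(samples) for _ in range(m)] for _ in range(m)]

# Toy sample set: (company id, label) pairs standing in for rows of D.
D = [("firm_%d" % i, i % 2) for i in range(10)]
subsets = bootstrap_subsets(D, m=5)
for k, D_k in enumerate(subsets, start=1):
    print(k, D_k)
```

Because items are drawn with replacement, a subset may contain the same sample more than once, which is what makes the M subsets differ from one another.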

S203. Using a preset denoising autoencoder model, learn a decision tree T_k on the training subset D_k;

Specifically, the preset denoising autoencoder model is a neural network with one hidden layer. The leftmost layer of the network is the input layer, the rightmost layer is the output layer, and the layer formed by all nodes in between is the hidden layer, which is treated as the coding space layer. Denoising means that, during learning, noise is randomly added to the financial training data in the training subset D_k at the input layer, so that the learned encoder is more robust, which enhances the generalization ability of the model. In practice, the number of hidden units can be set from the number of data attributes in the data attribute subset, that is, equal to the number of data attributes.

Each training subset D_k is fed in through the input layer and encoded with the preset denoising autoencoder model to obtain the coding space of that subset; a decision tree T_k is then learned in each coding space, yielding M decision trees in total.
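The two operations involved, corrupting the input and mapping a sample into the coding space, can be sketched as follows. The source does not state the noise type or the activation function, so masking noise and a sigmoid are assumptions, and `corrupt`, `encode` and `drop_prob` are hypothetical names. The weights are random placeholders, not trained values.

```python
import math
import random

def corrupt(x, drop_prob, rng):
    """Masking noise: set each attribute to 0 independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def encode(x, W, b):
    """Hidden-layer code sigmoid(W x + b) for one sample x (a list of floats)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + bias)))
            for row, bias in zip(W, b)]

rng = random.Random(0)
x = [0.42, 0.10, 0.75]                            # three normalized financial ratios
W = [[rng.uniform(-1, 1) for _ in x] for _ in x]  # hidden units = number of attributes
b = [0.0] * len(x)

x_noisy = corrupt(x, drop_prob=0.3, rng=rng)      # used only while training the encoder
code = encode(x, W, b)                            # coordinates of x in the coding space
print(x_noisy, code)
```

Noise is applied only during training; once the encoder is learned, clean samples are encoded and the decision tree is fitted on the resulting codes.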

S204. According to the decision tree T_k, predict the financial status category of each data subset in the financial data sample set D to be detected, the financial status categories being good financial status and abnormal financial status.

Specifically, in each coding space the corresponding decision tree model T_k votes on the financial status; the votes are accumulated and the prediction is obtained by majority voting.
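The voting step can be sketched as below. Each model is represented as a hypothetical pair of callables (`encode`, `tree`): the sample is first mapped into that model's coding space and then classified by its tree. The toy threshold rules stand in for learned trees.

```python
from collections import Counter

def ensemble_predict(models, x):
    """Return the label that receives the most votes across all (encode, tree) pairs.

    Ties go to the label that was voted for first, which is how
    Counter.most_common orders equal counts.
    """
    votes = Counter(tree(encode(x)) for encode, tree in models)
    return votes.most_common(1)[0][0]

identity = lambda x: x  # placeholder encoder; a trained encoder would go here
models = [
    (identity, lambda z: "abnormal" if z[0] > 0.7 else "good"),
    (identity, lambda z: "abnormal" if z[0] > 0.5 else "good"),
    (identity, lambda z: "abnormal" if z[1] < 0.0 else "good"),
]
print(ensemble_predict(models, [0.6, 0.1]))
```

Using an odd number of trees avoids ties between the two categories.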

The multi-decision tree financial early warning method provided by this embodiment randomly samples the acquired financial data sample set to be detected to obtain multiple different training subsets, encodes these training subsets and learns multiple different decision trees from them, and finally predicts the data subsets to be detected by combining the decision trees. The invention can therefore improve the accuracy and generalization ability of financial status prediction.

On the basis of the above embodiment, step S203 of the method specifically includes:

Step 2031. Use the preset denoising autoencoder model to encode the training subset D_k and obtain a new training subset;

Step 2032. Learn the decision tree model T_k on the new training subset.

Specifically, this process can be regarded as encoding the "input data" D_k so that the encoded data vector retains the typical features of the input financial data, which makes it relatively easy to recover the original financial data.

On the basis of the above embodiments, the weight W^(k,l) and the bias term b^(k,l) in step 2031 of the method are determined according to the following formulas, respectively:

[formulas not reproduced in this text]

Specifically, the initial values of the weights W = {W^(k,l) | k = 1, 2, ..., M; l = 1, 2, 3} can be determined by making the output value equal to the input value. In practice, the two terms are computed as follows:

[formulas not reproduced in this text]

where δ^(l) is the residual of layer l, specifically:

[formulas not reproduced in this text]

z^(l+1) = W^(k,l) a^(l) + b^(k,l), l = 1
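The residual and gradient formulas are not reproduced in this text. The sketch below shows one conventional way to train a one-hidden-layer denoising autoencoder by backpropagation; it is an illustration under assumptions, not the patented procedure. Sigmoid activations, additive Gaussian noise, squared reconstruction error, per-sample updates, and the names `train_dae`, `lr`, `lam` and `noise` are all assumed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, a):
    """z = W a + b for one sample."""
    return [sum(w * v for w, v in zip(row, a)) + bias for row, bias in zip(W, b)]

def train_dae(X, n_hidden, epochs=1000, lr=0.5, lam=1e-4, noise=0.1, seed=0):
    """Train a one-hidden-layer denoising autoencoder by per-sample gradient descent.

    Returns (W1, b1, W2, b2, losses); losses[i] is the mean squared
    reconstruction error of epoch i, measured against the clean inputs.
    """
    rng = random.Random(seed)
    n = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
    b1, b2 = [0.0] * n_hidden, [0.0] * n
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in X:
            # Corrupt the input (assumed noise model: additive Gaussian).
            x_noisy = [v + rng.gauss(0.0, noise) for v in x]
            a2 = [sigmoid(z) for z in affine(W1, b1, x_noisy)]  # hidden code
            a3 = [sigmoid(z) for z in affine(W2, b2, a2)]       # reconstruction
            total += sum((o - t) ** 2 for o, t in zip(a3, x)) / n
            # Residuals: output layer first, then propagated back to the hidden layer.
            d3 = [(o - t) * o * (1 - o) for o, t in zip(a3, x)]
            d2 = [sum(W2[i][j] * d3[i] for i in range(n)) * a2[j] * (1 - a2[j])
                  for j in range(n_hidden)]
            # Gradient step; weight decay lam applies to weights, not biases.
            for i in range(n):
                for j in range(n_hidden):
                    W2[i][j] -= lr * (d3[i] * a2[j] + lam * W2[i][j])
                b2[i] -= lr * d3[i]
            for j in range(n_hidden):
                for i in range(n):
                    W1[j][i] -= lr * (d2[j] * x_noisy[i] + lam * W1[j][i])
                b1[j] -= lr * d2[j]
        losses.append(total / len(X))
    return W1, b1, W2, b2, losses

X = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]  # toy normalized ratios
W1, b1, W2, b2, losses = train_dae(X, n_hidden=2)
print(losses[0], losses[-1])
```

After training, only `W1` and `b1` are needed to encode a sample; the output layer exists to drive learning through the reconstruction error.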

On the basis of the above embodiments, step 2032 of the method specifically includes:

Step 20321. Initialize by creating a node, denoted node, as the root node;

Step 20322. If the financial data samples in the new training subset all belong to one specific financial status category, or the number of financial data samples is less than a preset threshold, mark node as a leaf node of that financial status category;

Step 20323. If the financial data samples in the new training subset do not all belong to one specific financial status category and the number of financial data samples is not less than the preset threshold, use the objective function

[formula not reproduced in this text]

where D_t is the data set corresponding to the current node t and D_tj is the data set of child node j of the current node t. Ent(D) is the information entropy of D and reflects the purity of the data D: the smaller its value, the purer D is. Gain is the information gain, that is, the reduction in information entropy achieved by the split; the larger its value, the greater the "purity improvement" obtained by the split. The test condition with the largest Gain is selected as the split condition.

Step 20324. According to the test condition, divide the new training subset into two subsets and construct two child nodes;

Step 20325. For each subset, repeat step 20322 and step 20323.
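The recursive construction in these steps can be sketched as follows. The form of the test condition (an encoded feature compared with a threshold), the majority label assigned to small impure leaves, the stop when no split improves purity, and the names `build_tree`, `best_split` and `min_samples` are assumptions added for the sketch.

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(D) = -sum_c p_c log2 p_c over the class frequencies in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Search every (feature, threshold) test and return the one with the largest gain."""
    best_gain, best_test = 0.0, None
    for f in range(len(rows[0])):
        # Skip the largest value: it would leave the right subset empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            g = entropy(labels) - (len(left) * entropy(left)
                                   + len(right) * entropy(right)) / len(labels)
            if g > best_gain:
                best_gain, best_test = g, (f, t)
    return best_test

def build_tree(rows, labels, min_samples=2):
    """Return a leaf label, or a (feature, threshold, left, right) tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: all samples share one category, or too few samples remain.
    if len(set(labels)) == 1 or len(rows) < min_samples:
        return majority
    test = best_split(rows, labels)
    if test is None:  # no test improves purity (added safeguard)
        return majority
    f, t = test
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], min_samples),
            build_tree([r for r, _ in right], [y for _, y in right], min_samples))

def predict(tree, row):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

rows = [[0.1, 0.5], [0.2, 0.4], [0.8, 0.6], [0.9, 0.5]]  # encoded samples
labels = ["good", "good", "abnormal", "abnormal"]
tree = build_tree(rows, labels)
print(tree, predict(tree, [0.15, 0.5]))
```

In the method described here, `rows` would be the encoded training subset produced by the autoencoder, so each of the M trees is grown in its own coding space.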

On the basis of the above embodiments, step S204 of the method specifically includes:

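Use the ensemble classifier T^*, which combines the base classifiers,

[formula not reproduced in this text]

to classify the data subset x under a specific data attribute.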

On the basis of the above embodiments, the data attribute subset in the method includes at least one of the following data attributes: asset-liability ratio, return on net assets, return on assets, net profit margin, total asset turnover, accounts receivable turnover, current asset turnover, main business growth rate, total asset growth rate, and net profit growth rate.

Fig. 3 is a schematic flowchart of the multi-decision tree financial early warning method provided by another embodiment of the invention. First, the financial data sample set D to be detected is randomly sampled to obtain the training subsets D_1, D_2, D_3, ..., D_k. Second, the weights W^(k,l) and bias terms b^(k,l) are obtained by training. The training subsets D_1, D_2, D_3, ..., D_k are then encoded with the preset denoising autoencoder model to obtain new training subsets. Next, multiple decision trees T_1, T_2, T_3, ..., T_k are learned iteratively on the new training subsets. Finally, the decision tree models predict the financial status category of each data subset, and these predictions are combined by majority voting to obtain the final financial early warning result.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims

1. A multi-decision tree financial early warning method, characterized by comprising:

Step 1. Obtain a financial data sample set D to be detected, where D includes a data attribute subset and the data subset corresponding to each data attribute in the data attribute subset;

Step 2. Perform M rounds of random sampling on the financial data sample set D to be detected according to a preset number of random samplings M, drawing M data items in each round, and take the sample set obtained in the k-th round as the training subset D_k, where k = 1, 2, 3, ..., M;

Step 3. Using a preset denoising autoencoder model, learn a decision tree T_k on the training subset D_k;

Step 4. According to the decision tree T_k, predict the financial status category of each data subset in the financial data sample set D to be detected, the financial status categories being good financial status and abnormal financial status.

2. The method according to claim 1, wherein said step 3 specifically comprises:

Step 31. Use the preset denoising autoencoder model

[formula not reproduced in this text]

to encode the training subset D_k at the l-th layer of the neural network and obtain a new training subset, where W^(k,l) is the weight of the l-th layer of the neural network and b^(k,l) is the bias term of the l-th layer.

Step 32. Learn the decision tree model T_k on the new training subset.

3. The method according to claim 2, characterized in that the weight W^(k,l) and the bias term b^(k,l) in step 31 are determined according to the following formulas, respectively:

[formulas not reproduced in this text]

where λ is the weight decay parameter, x is the data subset under a specific data attribute, the noisy input is that data subset after noise has been added, the reconstruction is the output of the output layer, and the two gradient terms are the partial derivatives of the cost function J(W^(k,l-1), b^(k,l-1)) with respect to W^(k,l-1) and b^(k,l-1), respectively.

4. The method according to claim 2, wherein said step 32 specifically comprises:

Step 321. Initialize by creating a node, denoted node, as the root node;

Step 322. If the financial data samples in the new training subset all belong to one specific financial status category, or the number of financial data samples is less than a preset threshold, mark node as a leaf node of that financial status category;

Step 323. If the financial data samples in the new training subset do not all belong to one specific financial status category and the number of financial data samples is not less than the preset threshold, use the objective function

[formula not reproduced in this text]

to search for a test condition, where D_t is the data set corresponding to the current node t, D_tj is the data set of child node j of the current node t, and Ent(D_tj) is the information entropy of D_tj;

Step 324. According to the test condition, divide the new training subset into two subsets and construct two child nodes;

Step 325, for each subset, repeat step 322 and step 323.

5. The method according to claim 1, wherein said step 4 specifically comprises:

Use the ensemble classifier T^*, which combines the base classifiers,

[formula not reproduced in this text]

to classify the data subset x under a specific data attribute, where δ(true) = 1, δ(false) = 0, and T_k(x) is the prediction of the decision tree T_k for the data subset x under the specific data attribute.

6. The method according to any one of claims 1-5, wherein the data attribute subset includes at least one of the following data attributes: asset-liability ratio, return on net assets, return on assets, net profit margin, total asset turnover, accounts receivable turnover, current asset turnover, main business growth rate, total asset growth rate, and net profit growth rate.