CN108629675A - A Multi-Decision-Tree Financial Early Warning Method - Google Patents

Info

Publication number
CN108629675A
Authority
CN
China
Legal status
Pending
Application number
CN201810388744.3A
Other languages
Chinese (zh)
Inventor
郭华平
刁小宇
刘宏兵
邬长安
Current Assignee
Xinyang Normal University
Original Assignee
Xinyang Normal University
Priority date
2018-04-27
Filing date
2018-04-27
Publication date
2018-10-09
Application filed by Xinyang Normal University
Priority to CN201810388744.3A
Publication of CN108629675A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a multi-decision-tree financial early warning method. The method includes: Step 1, obtaining a financial data sample set $D$ to be tested, where $D$ includes a data attribute subset and, for each data attribute in that subset, the corresponding data subset; Step 2, performing $M$ rounds of random sampling on $D$ according to a preset number of sampling rounds $M$, drawing $M$ samples each time, and taking the sample set obtained in the $k$-th round as the training subset $D_k$, where $k = 1, 2, 3, \ldots, M$; Step 3, using a preset denoising autoencoder model to learn a decision tree $T_k$ on the training subset $D_k$; Step 4, using the decision trees $T_k$ to predict the financial status category of each data subset in $D$, the categories being good financial status and abnormal financial status. The invention can improve the accuracy and generalization ability of financial status prediction.

Description

A Multi-Decision-Tree Financial Early Warning Method

Technical Field

The invention relates to the technical field of data analysis, and in particular to a multi-decision-tree financial early warning method.

Background

In recent years, with the rapid development of China's market economy and capital market, competition among domestic enterprises has become increasingly fierce, and more and more multinational and domestic companies are competing for customers, human resources, commodities, and capital suppliers. While global economic integration has brought enterprises unprecedented opportunities, it has also brought enormous challenges, and the uncertainty that enterprises face keeps growing. As the bankruptcies of large groups such as Lehman Brothers Holdings and WorldCom have shown, it is not uncommon for companies to fall into financial difficulty or even go bankrupt. However, falling into a financial crisis is a gradual process rather than something that happens in a short period; it has a long incubation period. Changes in an enterprise's financial status are reflected in certain financial indicators, so these indicators have predictive power for the company's future financial status. Predicting an enterprise's financial status by technical means is therefore of great significance to its stakeholders, including investors: first, it helps managers discover financial problems in time, solve them, and prevent problems before they happen; second, it supports investors' investment decisions and protects their interests.

To predict financial status in a timely and accurate manner, Chinese universities, enterprises, and investors have successively carried out research on financial early warning, and a number of financial early warning methods already exist. However, most existing domestic methods rely on a single predictive model, such as a neural network, a decision tree, or logistic regression; their accuracy is low and their performance leaves room for improvement, while the few methods that combine multiple models have weak generalization ability.

Summary of the Invention

The invention provides a multi-decision-tree financial early warning method for improving the accuracy and generalization ability of financial status prediction.

The invention provides a multi-decision-tree financial early warning method, the method comprising:

Step 1: obtain a financial data sample set $D$ to be tested, where $D$ includes a data attribute subset and, for each data attribute in the data attribute subset, the corresponding data subset;

Step 2: according to a preset number of random sampling rounds $M$, perform $M$ rounds of random sampling on $D$, drawing $M$ samples in each round, and take the sample set obtained in the $k$-th round as the training subset $D_k$, where $k = 1, 2, 3, \ldots, M$;

Step 3: using a preset denoising autoencoder model, learn a decision tree $T_k$ on the training subset $D_k$;

Step 4: using the decision trees $T_k$, predict the financial status category of each data subset in the sample set $D$, where the financial status categories are good financial status and abnormal financial status.

Further, step 3 specifically comprises:

Step 31: use the preset denoising autoencoder model

$$\tilde{D}_k = f\left(W^{(k,l)} D_k + b^{(k,l)}\right)$$

to encode the training subset $D_k$ at layer $l$ of the neural network, obtaining a new training subset $\tilde{D}_k$, where $W^{(k,l)}$ is the weight of layer $l$, $b^{(k,l)}$ is the bias term of layer $l$, and $f(\cdot)$ is the activation function of that layer.

Step 32: learn a decision tree model $T_k$ on the new training subset $\tilde{D}_k$.

Further, the weight $W^{(k,l)}$ and the bias term $b^{(k,l)}$ in step 31 are determined respectively by

$$W^{(k,l)} = W^{(k,l-1)} - \alpha\,\frac{\partial J\left(W^{(k,l-1)}, b^{(k,l-1)}\right)}{\partial W^{(k,l-1)}}, \qquad b^{(k,l)} = b^{(k,l-1)} - \alpha\,\frac{\partial J\left(W^{(k,l-1)}, b^{(k,l-1)}\right)}{\partial b^{(k,l-1)}}$$

$$J\left(W^{(k,l-1)}, b^{(k,l-1)}\right) = \frac{1}{2}\left\lVert \hat{x} - x \right\rVert^{2} + \frac{\lambda}{2}\left\lVert W^{(k,l-1)} \right\rVert^{2}$$

where $\alpha$ is the learning rate, $\lambda$ is the weight decay parameter, $x$ is the data subset under a specific data attribute, $\tilde{x}$ is that data subset after noise has been added, $\hat{x}$ is the output of the output layer computed from $\tilde{x}$, and the two partial derivatives are those of the cost function $J(W^{(k,l-1)}, b^{(k,l-1)})$ with respect to $W^{(k,l-1)}$ and $b^{(k,l-1)}$, respectively.

Further, step 32 specifically comprises:

Step 321: create a node, node, and initialize it as the root node;

Step 322: if the financial data samples in $\tilde{D}_k$ all belong to the same financial status category, or the number of financial data samples is smaller than a preset threshold, mark node as a leaf node of that financial status category;

Step 323: if the financial data samples in $\tilde{D}_k$ do not all belong to the same financial status category and the number of financial data samples is not smaller than the preset threshold, search for a test condition using the objective function

$$\mathrm{Gain}(D_t) = \mathrm{Ent}(D_t) - \sum_{j=1}^{2} \frac{\lvert D_{tj} \rvert}{\lvert D_t \rvert}\,\mathrm{Ent}(D_{tj})$$

where $D_t$ is the data set of the current node $t$, $D_{tj}$ is the data set of child node $j$ of node $t$, and $\mathrm{Ent}(D_{tj})$ is the information entropy of $D_{tj}$;

Step 324: split $\tilde{D}_k$ into two subsets according to the test condition and build two child nodes.

Step 325: for each subset, repeat steps 322 and 323.

Further, step 4 specifically comprises:

using the combined base classifier

$$T^{*}(x) = \arg\max_{y} \sum_{k=1}^{M} \delta\left(T_k(x) = y\right)$$

to classify the data subset $x$ under a specific data attribute, where $y$ ranges over the financial status categories, $\delta(\mathrm{true}) = 1$, $\delta(\mathrm{false}) = 0$, and $T_k(x)$ is the prediction of decision tree $T_k$ for $x$.

Further, the data attribute subset includes at least one of the following data attributes: asset-liability ratio, return on net assets, return on assets, net profit margin, total asset turnover, accounts receivable turnover, current asset turnover, main business growth rate, total asset growth rate, and net profit growth rate.
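As an illustration of how a few of these attributes might be computed, the sketch below derives them from hypothetical statement fields (`total_assets`, `total_liabilities`, `net_profit`, `revenue`). It uses simplified period-end definitions, for example net assets taken as total assets minus total liabilities and turnover computed on period-end rather than average balances; these are assumptions, not definitions given in the text.

```python
from typing import Dict


def ratio_features(cur: Dict[str, float], prev: Dict[str, float]) -> Dict[str, float]:
    """Compute a subset of the listed attributes for one company and period.

    `cur` and `prev` hold hypothetical statement fields for the current and prior period.
    """
    # assumption: net assets = total assets - total liabilities (period-end values)
    net_assets = cur["total_assets"] - cur["total_liabilities"]
    return {
        "asset_liability_ratio": cur["total_liabilities"] / cur["total_assets"],
        "return_on_net_assets": cur["net_profit"] / net_assets,
        "net_profit_margin": cur["net_profit"] / cur["revenue"],
        # assumption: turnover on period-end total assets instead of the period average
        "total_asset_turnover": cur["revenue"] / cur["total_assets"],
        "total_asset_growth_rate": (cur["total_assets"] - prev["total_assets"]) / prev["total_assets"],
        # assumption: divide by |prior profit| so a loss-to-profit change reads as growth
        "net_profit_growth_rate": (cur["net_profit"] - prev["net_profit"]) / abs(prev["net_profit"]),
    }
```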

Technical Effects of the Invention

In the multi-decision-tree financial early warning method provided by the invention, multiple different training subsets are obtained by randomly sampling the acquired financial data sample set to be tested; each training subset is then encoded and a separate decision tree is learned on it; finally, the data subsets to be tested are predicted by combining the multiple decision trees. The invention can therefore improve the accuracy and generalization ability of financial status prediction.

Brief Description of the Drawings

Fig. 1 is a schematic structural diagram of a decision tree according to an embodiment of the invention;

Fig. 2 is a schematic flowchart of the multi-decision-tree financial early warning method according to an embodiment of the invention;

Fig. 3 is a schematic flowchart of a multi-decision-tree financial early warning method according to another embodiment of the invention.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the invention clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.

A decision tree is a simple, easy-to-use nonparametric classifier. It requires no prior assumptions about the data, computes quickly, produces results that are easy to interpret, and is robust.

A decision tree has four elements: (1) decision nodes; (2) plan branches; (3) state nodes; (4) probability branches. As shown in Fig. 1, a decision tree generally consists of square nodes, circular nodes, plan branches, and probability branches. Square nodes are decision nodes; several branches lead out of each decision node, and each branch represents a plan (a plan branch). Circular nodes are state nodes; several branches lead out of each state node, each representing a different natural state (a probability branch). Each probability branch is labeled with the content of its objective state and its probability of occurrence, and the end of each probability branch is labeled with the result (benefit or loss value) that the plan achieves in that natural state. The tree diagram thus unfolds from left to right, from simple to complex, forming a tree-shaped network.
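Under this decision-analysis reading of Fig. 1, a tree of this kind is usually evaluated by rolling it back from right to left: each state node takes the probability-weighted (expected) value of its branches, and each decision node keeps its best plan. The sketch below assumes exactly that expected-value rule, which the text does not state, and uses hypothetical class names. Note that this is a different use of the term from the classification trees learned in step S203.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Outcome:
    value: float  # benefit (positive) or loss (negative) at the tip of a probability branch


@dataclass
class StateNode:
    # probability branches: (probability of the natural state, subtree)
    branches: List[Tuple[float, "Node"]] = field(default_factory=list)


@dataclass
class DecisionNode:
    # plan branches: (plan name, subtree)
    plans: List[Tuple[str, "Node"]] = field(default_factory=list)


Node = Union[Outcome, StateNode, DecisionNode]


def rollback(node: Node) -> float:
    """Expected-value rollback (an assumed evaluation rule, not stated in the text)."""
    if isinstance(node, Outcome):
        return node.value
    if isinstance(node, StateNode):
        return sum(p * rollback(child) for p, child in node.branches)
    return max(rollback(child) for _, child in node.plans)


def best_plan(node: DecisionNode) -> str:
    """Name of the plan branch with the highest rolled-back value."""
    return max(node.plans, key=lambda plan: rollback(plan[1]))[0]
```

For instance, a plan whose two equally likely natural states yield a benefit of 100 and a loss of 40 rolls back to an expected value of 30, so it is preferred over a certain benefit of 20.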

Fig. 2 is a schematic flowchart of a multi-decision-tree financial early warning method according to an embodiment of the invention. As shown in Fig. 2, the method comprises the following steps:

S201: obtain a financial data sample set $D$ to be tested, where $D$ includes a data attribute subset and, for each data attribute in the data attribute subset, the corresponding data subset;

Specifically, the financial data sample set $D$ to be tested in this embodiment has already been preprocessed and can be used directly as the original training set for model training. Preprocessing here means filling in missing data, correcting abnormal data, and normalizing the data. All values in each data subset are numeric.
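A minimal preprocessing sketch, assuming three concrete choices that the text leaves open: missing values are filled with the column mean, abnormal values are clipped to the mean plus or minus `z_clip` standard deviations, and each attribute is then min-max normalized to [0, 1]. The function name and the clipping rule are illustrative assumptions.

```python
import math
from typing import List, Optional


def preprocess(rows: List[List[Optional[float]]], z_clip: float = 3.0) -> List[List[float]]:
    """Column-wise: fill missing (None) values, clip outliers, min-max scale to [0, 1]."""
    n_cols = len(rows[0])
    out = [list(r) for r in rows]
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
        # assumption: values beyond mean +/- z_clip * std count as abnormal and are clipped
        lo, hi = mean - z_clip * std, mean + z_clip * std
        col = []
        for r in out:
            v = mean if r[j] is None else r[j]  # assumption: mean imputation
            col.append(min(max(v, lo), hi))
        c_min, c_max = min(col), max(col)
        span = c_max - c_min
        for r, v in zip(out, col):
            r[j] = (v - c_min) / span if span > 0 else 0.0
    return out
```

The sketch assumes every column has at least one observed value; a column that is entirely missing would need a separate rule.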

S202: according to a preset number of random sampling rounds $M$, perform $M$ rounds of random sampling on $D$, drawing $M$ samples in each round, and take the sample set obtained in the $k$-th round as the training subset $D_k$, where $k = 1, 2, 3, \ldots, M$;

Specifically, the random sampling in this embodiment is iterative random sampling with replacement, which yields $M$ different training subsets $D_k$ from the sample set $D$ in preparation for training multiple decision trees.
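Sampling with replacement can be sketched with the standard library's `random.Random.choices`. The text states that $M$ samples are drawn in each round; classic bagging draws $|D|$ samples instead, so the sketch takes the per-round size as a parameter (the hypothetical `sample_size`) that defaults to $|D|$.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_subsets(D: Sequence[T], M: int, sample_size: Optional[int] = None,
                      seed: Optional[int] = None) -> List[List[T]]:
    """Return M training subsets D_1..D_M, each drawn uniformly with replacement from D."""
    rng = random.Random(seed)
    # assumption: default to |D| draws per round (classic bagging) unless a size is given
    size = len(D) if sample_size is None else sample_size
    # choices() samples with replacement, so one record may appear several times in D_k
    return [rng.choices(D, k=size) for _ in range(M)]
```

Fixing `seed` makes the $M$ subsets reproducible, which helps when comparing ensembles trained with different settings.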

S203: using a preset denoising autoencoder model, learn a decision tree $T_k$ on the training subset $D_k$;

Specifically, the preset denoising autoencoder model is a neural network with one hidden layer. The leftmost layer of the network is the input layer, the rightmost layer is the output layer, and the layer formed by all nodes in between is the hidden layer, which is treated as the encoding space. "Denoising" means that during learning, noise is randomly added to the financial training data of the training subset $D_k$ at the input layer. This makes the learned encoder more robust and thereby improves the generalization ability of the model. In practice, the number of hidden units can be set according to the number of data attributes in the data attribute subset; that is, the number of hidden units equals the number of data attributes.
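The two ingredients described here, input corruption and a hidden layer as wide as the attribute set, can be sketched as follows. The text does not say which noise is used; the sketch assumes masking noise (each attribute is zeroed with probability `drop_prob`), a sigmoid activation, and small uniform initial weights. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def corrupt(x: Sequence[float], drop_prob: float, rng: random.Random) -> List[float]:
    """Assumed masking noise: zero each attribute independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def init_encoder(n_attributes: int, rng: random.Random,
                 scale: float = 0.1) -> Tuple[Matrix, List[float]]:
    """One hidden layer whose width equals the number of data attributes."""
    n_hidden = n_attributes
    W = [[rng.uniform(-scale, scale) for _ in range(n_attributes)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    return W, b


def encode(x: Sequence[float], W: Matrix, b: Sequence[float]) -> List[float]:
    """Hidden-layer code f(W x + b), with f assumed to be the sigmoid."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]
```

Noise is applied only while learning the weights; once trained, the encoder is applied to clean inputs to produce each encoding space.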

Each training subset $D_k$ is fed in through the input layer and encoded by the preset denoising autoencoder model, giving an encoding space for each $D_k$. A decision tree $T_k$ is learned in each encoding space, yielding $M$ decision trees in total.

S204: using the decision trees $T_k$, predict the financial status category of each data subset in the sample set $D$, where the financial status categories are good financial status and abnormal financial status.

Specifically, in each encoding space the corresponding decision tree model $T_k$ casts a vote on the financial status; the votes are accumulated and the prediction is obtained by majority voting.

The multi-decision-tree financial early warning method provided by this embodiment obtains multiple different training subsets by randomly sampling the acquired financial data sample set to be tested, encodes each training subset and learns a separate decision tree on it, and finally predicts the data subsets to be tested by combining the decision trees. The invention can therefore improve the accuracy and generalization ability of financial status prediction.

On the basis of the above embodiment, step S203 of the method specifically comprises:

Step 2031: use the preset denoising autoencoder model

$$\tilde{D}_k = f\left(W^{(k,l)} D_k + b^{(k,l)}\right)$$

to encode the training subset $D_k$ at layer $l$ of the neural network, obtaining a new training subset $\tilde{D}_k$, where $W^{(k,l)}$ is the weight of layer $l$, $b^{(k,l)}$ is the bias term of layer $l$, and $f(\cdot)$ is the activation function of that layer.

Step 2032: learn a decision tree model $T_k$ on the new training subset $\tilde{D}_k$.

Specifically, this process can be regarded as encoding the "input data" $D_k$ so that the encoded data vector $\tilde{D}_k$ retains the typical characteristics of the input financial data, from which the original financial data can be recovered relatively easily.

On the basis of the above embodiments, the weight $W^{(k,l)}$ and the bias term $b^{(k,l)}$ in step 2031 are determined respectively by

$$W^{(k,l)} = W^{(k,l-1)} - \alpha\,\frac{\partial J\left(W^{(k,l-1)}, b^{(k,l-1)}\right)}{\partial W^{(k,l-1)}}, \qquad b^{(k,l)} = b^{(k,l-1)} - \alpha\,\frac{\partial J\left(W^{(k,l-1)}, b^{(k,l-1)}\right)}{\partial b^{(k,l-1)}}$$

$$J\left(W^{(k,l-1)}, b^{(k,l-1)}\right) = \frac{1}{2}\left\lVert \hat{x} - x \right\rVert^{2} + \frac{\lambda}{2}\left\lVert W^{(k,l-1)} \right\rVert^{2}$$

where $\alpha$ is the learning rate, $\lambda$ is the weight decay parameter, $x$ is the data subset under a specific data attribute, $\tilde{x}$ is that data subset after noise has been added, $\hat{x}$ is the output of the output layer computed from $\tilde{x}$, and the two partial derivatives are those of the cost function $J(W^{(k,l-1)}, b^{(k,l-1)})$ with respect to $W^{(k,l-1)}$ and $b^{(k,l-1)}$, respectively.

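Specifically, the initial values of the weights $W = \{W^{(k,l)} \mid k = 1, 2, \ldots, M;\ l = 1, 2, 3\}$ can be determined by training the network so that its output reproduces its input. In practice, $\partial J / \partial W^{(k,l)}$ and $\partial J / \partial b^{(k,l)}$ are computed as follows:

$$\frac{\partial J}{\partial W^{(k,l)}} = \delta^{(l+1)} \left(a^{(l)}\right)^{\top} + \lambda W^{(k,l)}, \qquad \frac{\partial J}{\partial b^{(k,l)}} = \delta^{(l+1)}$$

where $\delta^{(l)}$ is the residual of layer $l$, specifically:

$$\delta^{(3)} = -\left(x - a^{(3)}\right) \odot f'\!\left(z^{(3)}\right), \qquad \delta^{(2)} = \left(\left(W^{(k,2)}\right)^{\top} \delta^{(3)}\right) \odot f'\!\left(z^{(2)}\right)$$

$$z^{(l+1)} = W^{(k,l)} a^{(l)} + b^{(k,l)}, \qquad a^{(l+1)} = f\left(z^{(l+1)}\right), \qquad a^{(1)} = \tilde{x}, \qquad l = 1, 2$$

Here $a^{(3)}$ is the output of the output layer, $f$ is the activation function, and $\odot$ denotes element-wise multiplication.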

On the basis of the above embodiments, step 2032 of the method specifically comprises:

Step 20321: create a node, node, and initialize it as the root node;

Step 20322: if the financial data samples in $\tilde{D}_k$ all belong to the same financial status category, or the number of financial data samples is smaller than a preset threshold, mark node as a leaf node of that financial status category;

Step 20323: if the financial data samples in $\tilde{D}_k$ do not all belong to the same financial status category and the number of financial data samples is not smaller than the preset threshold, search for a test condition using the objective function

$$\mathrm{Gain}(D_t) = \mathrm{Ent}(D_t) - \sum_{j=1}^{2} \frac{\lvert D_{tj} \rvert}{\lvert D_t \rvert}\,\mathrm{Ent}(D_{tj}), \qquad \mathrm{Ent}(D) = -\sum_{c} p_c \log_2 p_c$$

where $D_t$ is the data set of the current node $t$, $D_{tj}$ is the data set of child node $j$ of node $t$, and $p_c$ is the proportion of samples in $D$ that belong to financial status category $c$. $\mathrm{Ent}(D)$ is the information entropy of $D$ and reflects its purity: the smaller the value, the purer $D$. $\mathrm{Gain}$ is the decrease in information entropy from before to after the split (the information gain): the larger it is, the greater the "purity improvement" obtained by the split. The test condition with the largest $\mathrm{Gain}$ is selected as the splitting condition.

Step 20324: split $\tilde{D}_k$ into two subsets according to the test condition and build two child nodes.

Step 20325: for each subset, repeat steps 20322 and 20323.
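Steps 20321 to 20325 describe a recursive binary tree builder with an information-gain criterion. The sketch below is one way to implement them in plain Python. It assumes numeric attributes, candidate thresholds at midpoints between consecutive distinct values (a C4.5-style choice the text does not specify), a minimum node size `min_samples` standing in for the preset threshold, and majority labeling for small mixed leaves. The class and function names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Optional, Sequence, Tuple, Union

Label = Hashable


@dataclass
class Leaf:
    label: Label


@dataclass
class Split:
    attr: int          # index of the encoded attribute tested at this node
    threshold: float   # test condition: x[attr] <= threshold goes to the left child
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def entropy(labels: Sequence[Label]) -> float:
    """Ent(D) = -sum_c p_c log2 p_c."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X: Sequence[Sequence[float]], y: Sequence[Label]) -> Optional[Tuple[int, float, float]]:
    """Step 20323: return (attribute, threshold, Gain) of the binary test with the largest Gain."""
    parent, n, best = entropy(y), len(y), None
    for a in range(len(X[0])):
        values = sorted({row[a] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # assumption: midpoint thresholds between distinct values
            left = [lab for row, lab in zip(X, y) if row[a] <= thr]
            right = [lab for row, lab in zip(X, y) if row[a] > thr]
            gain = parent - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if best is None or gain > best[2]:
                best = (a, thr, gain)
    return best


def grow_tree(X, y, min_samples: int = 2) -> Node:
    majority = Counter(y).most_common(1)[0][0]  # assumption: label for small mixed leaves
    # Step 20322: a pure node, or one with fewer than min_samples samples, becomes a leaf
    if len(set(y)) == 1 or len(y) < min_samples:
        return Leaf(majority)
    split = best_split(X, y)
    if split is None:  # identical rows with mixed labels: no test can separate them
        return Leaf(majority)
    a, thr, _ = split
    # Steps 20324-20325: build two child nodes and grow each one on its own subset
    left = [i for i, row in enumerate(X) if row[a] <= thr]
    right = [i for i, row in enumerate(X) if row[a] > thr]
    return Split(a, thr,
                 grow_tree([X[i] for i in left], [y[i] for i in left], min_samples),
                 grow_tree([X[i] for i in right], [y[i] for i in right], min_samples))


def predict(node: Node, x: Sequence[float]) -> Label:
    while isinstance(node, Split):
        node = node.left if x[node.attr] <= node.threshold else node.right
    return node.label
```

Because every midpoint threshold leaves at least one sample on each side, each recursive call works on a strictly smaller subset, so the recursion always terminates.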

On the basis of the above embodiments, step 4 of the method specifically comprises:

using the combined base classifier

$$T^{*}(x) = \arg\max_{y} \sum_{k=1}^{M} \delta\left(T_k(x) = y\right)$$

to classify the data subset $x$ under a specific data attribute, where $y$ ranges over the financial status categories, $\delta(\mathrm{true}) = 1$, $\delta(\mathrm{false}) = 0$, and $T_k(x)$ is the prediction of decision tree $T_k$ for $x$.

On the basis of the above embodiments, the data attribute subset in the method includes at least one of the following data attributes: asset-liability ratio, return on net assets, return on assets, net profit margin, total asset turnover, accounts receivable turnover, current asset turnover, main business growth rate, total asset growth rate, and net profit growth rate.

Fig. 3 is a schematic flowchart of a multi-decision-tree financial early warning method according to another embodiment of the invention. First, the financial data sample set $D$ to be tested is randomly sampled to obtain the training subsets $D_1, D_2, D_3, \ldots, D_M$. Second, the weights $W^{(k,l)}$ and bias terms $b^{(k,l)}$ are obtained by training. The preset denoising autoencoder model then encodes the training subsets $D_1, D_2, D_3, \ldots, D_M$ into new training subsets $\tilde{D}_1, \tilde{D}_2, \tilde{D}_3, \ldots, \tilde{D}_M$. Next, multiple decision trees $T_1, T_2, T_3, \ldots, T_M$ are learned iteratively on the new training subsets. Finally, the decision tree models predict the financial status category of each data subset, and these predictions are combined by majority voting to obtain the final financial early warning result.
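The overall flow of Fig. 3 can be sketched as a small bagging driver. To keep it self-contained, the autoencoder trainer and the tree learner are passed in as functions (`train_encoder` and `train_tree` are hypothetical parameters). Each tree is stored together with its own encoder so that, at prediction time, every $T_k$ votes in its own encoding space. The per-round sample size of $|D|$ is the standard bagging choice and is an assumption here.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Vector], Vector]
Classifier = Callable[[Vector], Hashable]
Member = Tuple[Encoder, Classifier]


def fit_ensemble(X: Sequence[Vector], y: Sequence[Hashable], M: int,
                 train_encoder: Callable[[List[Vector]], Encoder],
                 train_tree: Callable[[List[Vector], List[Hashable]], Classifier],
                 seed: int = 0) -> List[Member]:
    """Bootstrap D_k -> encoder -> encoded subset -> tree T_k, for k = 1..M."""
    rng = random.Random(seed)
    members: List[Member] = []
    for _ in range(M):
        idx = rng.choices(range(len(X)), k=len(X))  # sampling with replacement
        X_k = [X[i] for i in idx]
        y_k = [y[i] for i in idx]
        encode = train_encoder(X_k)  # placeholder for the denoising autoencoder
        tree = train_tree([encode(x) for x in X_k], y_k)
        members.append((encode, tree))
    return members


def predict_ensemble(members: Sequence[Member], x: Vector) -> Hashable:
    """Each T_k votes in its own encoding space; the most common label wins."""
    votes = Counter(tree(encode(x)) for encode, tree in members)
    return votes.most_common(1)[0][0]
```

For example, plugging in an identity encoder and a one-threshold rule gives a working, if trivial, ensemble. In practice the encoder would come from the trained denoising autoencoder and the tree from an information-gain builder.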

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications or replacements do not take the essence of the corresponding technical solutions outside the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (6)

1. A multi-decision-tree financial early warning method, characterized by comprising: Step 1, obtaining a financial data sample set $D$ to be tested, where $D$ includes a data attribute subset and, for each data attribute in the data attribute subset, the corresponding data subset; Step 2, according to a preset number of random sampling rounds $M$, performing $M$ rounds of random sampling on $D$, drawing $M$ samples in each round, and taking the sample set obtained in the $k$-th round as the training subset $D_k$, where $k = 1, 2, 3, \ldots, M$; Step 3, using a preset denoising autoencoder model to learn a decision tree $T_k$ on the training subset $D_k$; Step 4, using the decision trees $T_k$ to predict the financial status category of each data subset in $D$, the financial status categories being good financial status and abnormal financial status.

2. The method according to claim 1, characterized in that step 3 specifically comprises: Step 31, using the preset denoising autoencoder model

$$\tilde{D}_k = f\left(W^{(k,l)} D_k + b^{(k,l)}\right)$$

to encode the training subset $D_k$ at layer $l$ of the neural network and obtain a new training subset $\tilde{D}_k$, where $W^{(k,l)}$ is the weight of layer $l$, $b^{(k,l)}$ is the bias term of layer $l$, and $f(\cdot)$ is the activation function of that layer; Step 32, learning a decision tree model $T_k$ on the new training subset $\tilde{D}_k$.

3. The method according to claim 2, characterized in that the weight $W^{(k,l)}$ and the bias term $b^{(k,l)}$ in step 31 are determined respectively by

$$W^{(k,l)} = W^{(k,l-1)} - \alpha\,\frac{\partial J\left(W^{(k,l-1)}, b^{(k,l-1)}\right)}{\partial W^{(k,l-1)}}, \qquad b^{(k,l)} = b^{(k,l-1)} - \alpha\,\frac{\partial J\left(W^{(k,l-1)}, b^{(k,l-1)}\right)}{\partial b^{(k,l-1)}}$$

$$J\left(W^{(k,l-1)}, b^{(k,l-1)}\right) = \frac{1}{2}\left\lVert \hat{x} - x \right\rVert^{2} + \frac{\lambda}{2}\left\lVert W^{(k,l-1)} \right\rVert^{2}$$

where $\alpha$ is the learning rate, $\lambda$ is the weight decay parameter, $x$ is the data subset under a specific data attribute, $\tilde{x}$ is that data subset after noise has been added, $\hat{x}$ is the output of the output layer computed from $\tilde{x}$, and the two partial derivatives are those of the cost function $J(W^{(k,l-1)}, b^{(k,l-1)})$ with respect to $W^{(k,l-1)}$ and $b^{(k,l-1)}$, respectively.

4. The method according to claim 2, characterized in that step 32 specifically comprises: Step 321, creating a node, node, and initializing it as the root node; Step 322, if the financial data samples in $\tilde{D}_k$ all belong to the same financial status category, or the number of financial data samples is smaller than a preset threshold, marking node as a leaf node of that financial status category; Step 323, if the financial data samples in $\tilde{D}_k$ do not all belong to the same financial status category and the number of financial data samples is not smaller than the preset threshold, searching for a test condition using the objective function

$$\mathrm{Gain}(D_t) = \mathrm{Ent}(D_t) - \sum_{j=1}^{2} \frac{\lvert D_{tj} \rvert}{\lvert D_t \rvert}\,\mathrm{Ent}(D_{tj})$$

where $D_t$ is the data set of the current node $t$, $D_{tj}$ is the data set of child node $j$ of node $t$, and $\mathrm{Ent}(D_{tj})$ is the information entropy of $D_{tj}$; Step 324, splitting $\tilde{D}_k$ into two subsets according to the test condition and building two child nodes; Step 325, for each subset, repeating steps 322 and 323.

5. The method according to claim 1, characterized in that step 4 specifically comprises: using the combined base classifier

$$T^{*}(x) = \arg\max_{y} \sum_{k=1}^{M} \delta\left(T_k(x) = y\right)$$

to classify the data subset $x$ under a specific data attribute, where $y$ ranges over the financial status categories, $\delta(\mathrm{true}) = 1$, $\delta(\mathrm{false}) = 0$, and $T_k(x)$ is the prediction of decision tree $T_k$ for $x$.

6. The method according to any one of claims 1 to 5, characterized in that the data attribute subset includes at least one of the following data attributes: asset-liability ratio, return on net assets, return on assets, net profit margin, total asset turnover, accounts receivable turnover, current asset turnover, main business growth rate, total asset growth rate, and net profit growth rate.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810388744.3A 2018-04-27 2018-04-27 A multi-decision-tree financial early warning method

Publications (1)

Publication Number Publication Date
CN108629675A true CN108629675A (en) 2018-10-09

Family

ID=63694696

Country Status (1)

Country Link
CN (1) CN108629675A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014238643A (en) * 2013-06-06 2014-12-18 株式会社三井住友銀行 ATM skimming prevention system and method
CN105373606A (en) * 2015-11-11 2016-03-02 重庆邮电大学 Unbalanced data sampling method in improved C4.5 decision tree algorithm
CN107229916A (en) * 2017-05-27 2017-10-03 南京航空航天大学 A kind of airport noise Monitoring Data restorative procedure based on depth noise reduction own coding
CN107273909A (en) * 2016-04-08 2017-10-20 上海市玻森数据科技有限公司 The sorting algorithm of high dimensional data

Non-Patent Citations (3)

Title
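孟杰 (Meng Jie): "Application of the random forest model in financial failure early warning", Statistics and Decision (《统计与决策》) *
张万军 (Zhang Wanjun): "Research on a personal credit risk assessment model based on big data", China Doctoral Dissertations Full-text Database (《中国博士学位论文全文数据库》) *
邱爽 (Qiu Shuang) et al.: "Chinese short text classification based on stacked denoising autoencoders", Journal of Inner Mongolia University for Nationalities (Natural Science Edition) (《内蒙古民族大学学报(自然科学版)》) *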

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN109492712A (en) * 2018-12-17 2019-03-19 上海应用技术大学 The method for establishing internet finance air control model
CN110210959A (en) * 2019-06-10 2019-09-06 广发证券股份有限公司 Analysis method, device and the storage medium of financial data
CN110472660A (en) * 2019-07-09 2019-11-19 深圳壹账通智能科技有限公司 Abnormal deviation data examination method, device, computer equipment and storage medium
WO2021004132A1 (en) * 2019-07-09 2021-01-14 深圳壹账通智能科技有限公司 Abnormal data detection method, apparatus, computer device, and storage medium
CN111110224A (en) * 2020-01-17 2020-05-08 武汉中旗生物医疗电子有限公司 Electrocardiogram classification method and device based on multi-angle feature extraction

Similar Documents

Publication Publication Date Title
CN107392644A (en) A kind of commodity purchasing predicts modeling method
CN106960358A (en) A kind of financial fraud behavior based on rural area electronic commerce big data deep learning quantifies detecting system
CN108629675A (en) A kind of Multiple trees financial alert method
CN112417176A (en) Graph feature-based method, device and medium for mining implicit association relation between enterprises
CN111047193A (en) Enterprise credit scoring model generation algorithm based on credit big data label
Wei [Retracted] A Method of Enterprise Financial Risk Analysis and Early Warning Based on Decision Tree Model
Xu et al. Novel key indicators selection method of financial fraud prediction model based on machine learning hybrid mode
CN111104975B (en) Credit evaluation method based on breadth learning
CN115049472B (en) An unsupervised credit card anomaly detection method based on multi-dimensional feature tensors
Stanišić et al. Corporate bankruptcy prediction in the Republic of Serbia
CN119850315A (en) Intelligent bidding data analysis method, system and storage medium based on AI technology
CN117522124A (en) Credit risk assessment methods, systems and equipment for listed companies based on knowledge graph
Vochozka et al. Model to predict survival of transportation and shipping companies
Zeng et al. Research on audit opinion prediction of listed companies based on sparse principal component analysis and kernel fuzzy clustering algorithm
CN119647957A (en) A method, device, storage medium and processor for predicting credit risk of small and micro enterprises
Sun et al. Short-Term Stock Price Forecasting Based on an SVD-LSTM Model.
Wang A study on early warning of financial indicators of listed companies based on random forest
Petersone et al. A data-driven framework for identifying investment opportunities in private equity
Qiang et al. [Retracted] Relationship Model between Human Resource Management Activities and Performance Based on LMBP Algorithm
CN117151867A (en) Enterprise exception identification method and system based on big data
CN116542369A (en) A Financial Default Risk Prediction Method Based on Hypergraph Contrastive Learning
Khajehpour et al. Does Fundraising Have Meaningful Sequential Patterns? The Case of Fintech Startups
Li et al. Prediction and Sensitivity Analysis of Companies’ Return on Equity Based on Deep Neural Network
Jenčová et al. IS LOGISTIC REGRESSION RELIABLE IN BANKRUPTCY PREDICTION?
Stevenson Novel applications of advanced predictive analytics and artificial intelligence to improve SME competitiveness and access to funding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181009)