Background
In the past, US 8,050,988 B2 and US 2006/0106686 A1, for example, proposed structured audit systems that address financial risks and opportunities and offer audit suggestions from the risk perspective of a financial audit. Other patents, such as US 7,885,841 B2, US 5,765,138, US 7,346,527 B2, US 2008/019546 A1, and US 8,504,412 B1, also cover automation such as audit planning and audit item generation.
Although there are recommendation systems that use natural language processing, such as US 2016/0148327 A1, US 2018/0165696 A1, and CN 107807962 B, they do not take into account the supplier's risk or quantitative background information such as its scale, operating performance, and operating time.
Disclosure of Invention
The invention aims to provide a system and a method for integrating qualitative data and quantitative data to recommend audit criteria, which objectively establish the correlation between audit findings and operation indices by taking the background information of the supplier's operation into account.
Based on the above, the present invention mainly adopts the following technical means to achieve the above object.
A system for integrating qualitative data and quantitative data to recommend audit criteria includes: a storage module for receiving ongoing analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past, wherein the ongoing analysis data and the historical analysis data each comprise qualitative data, namely audit findings, and quantitative data, namely supplier operation data; a topic model conversion module, connected to the storage module, for analyzing the audit findings of the historical analysis data to establish or update a topic model and obtain a topic model probability distribution, the topic model conversion module also converting the audit findings of the ongoing analysis data according to the topic model; a feature vector module, connected to the topic model conversion module and the storage module, for generating a corresponding feature vector set from the topic model probability distribution and the supplier operation data of the historical analysis data, and for generating a feature vector value corresponding to the ongoing analysis data; a classification module, connected to the feature vector module, for performing cluster analysis on the feature vector set and determining the cluster to which the feature vector value belongs; and a recommendation module, connected to the classification module and the topic model conversion module, for receiving an audit criteria list used in supplier audits and generating corresponding recommended audit criteria items for related topics according to the cluster to which the feature vector value belongs.
Further, the classification module calculates a distance value between the feature vector value and the center of gravity of each of the clusters, and the cluster having the smallest distance value is used as the cluster to which the feature vector value belongs.
Further, the supplier operation data, as the quantitative data, comprises at least one of, or a combination of, supplier number data, turnover data, and operating time data.
A method for integrating qualitative data and quantitative data to recommend audit criteria includes: receiving, by a storage module, ongoing analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past, wherein the ongoing analysis data and the historical analysis data each comprise qualitative data, namely audit findings, and quantitative data, namely supplier operation data; analyzing, by a topic model conversion module, the audit findings of the historical analysis data to establish or update a topic model, obtaining a topic model probability distribution, and converting the audit findings of the ongoing analysis data according to the topic model, so that a feature vector module generates a corresponding feature vector set from the topic model probability distribution and the supplier operation data of the historical analysis data, together with a feature vector value of the ongoing analysis data; and performing, by a classification module, cluster analysis on the feature vector set and determining the cluster to which the feature vector value belongs, so that a recommendation module, which receives an audit criteria list used in supplier audits, generates corresponding recommended audit criteria items for related topics according to the cluster to which the feature vector value belongs.
Further, the feature vector set is subjected to clustering analysis by using a K-means clustering algorithm.
Further, the classification module calculates a distance value between the feature vector value and the center of gravity of each of the clusters, and the cluster having the smallest distance value is used as the cluster to which the feature vector value belongs.
Further, the cluster analysis may use a Weighted K-means feature selection algorithm to reduce the dimensionality of the feature vectors on which the cluster analysis is built.
Further, the topic model probability distribution is established by using at least one of Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF).
Further, the supplier operation data, as the quantitative data, comprises at least one of, or a combination of, supplier number data, turnover data, and operating time data.
According to the technical characteristics, the following effects can be achieved:
1. The audit criteria recommendation takes the background information of the supplier's operation (quantitative information such as scale, operating performance, and operating time) into account, and thus provides more suitable audit criteria than a recommendation based on natural language processing alone.
2. The qualitative information of past audit findings and the quantitative information collected about suppliers are periodically processed with natural language processing and unsupervised learning to cluster the suppliers and perform feature selection, so that the correlation between audit findings and operation indices can be established objectively.
Drawings
FIG. 1 is a block diagram of a system according to an embodiment of the present invention.
FIG. 2 is a detailed flowchart of another embodiment of the present invention including a modeling step and an audit criteria recommendation step.
Description of reference numerals
100 system
1 storage module
11 ongoing analysis data
111 ongoing audit finding
112 ongoing supplier operation data
12 historical analysis data
121 completed audit finding
122 historical supplier operation data
2 topic model conversion module
3 feature vector module
30 feature vector set
31 feature vector value
4 classification module
40 cluster analysis
5 recommendation module
50 audit criteria list
51 recommended audit criteria item
S10 modeling step
S100 step one of the modeling step
S101 step two of the modeling step
S102 step three of the modeling step
S103 step four of the modeling step
S104 step five of the modeling step
S105 step six of the modeling step
S106 step seven of the modeling step
S107 step eight of the modeling step
S108 step nine of the modeling step
S109 step ten of the modeling step
S110 step eleven of the modeling step
S111 step twelve of the modeling step
S112 step thirteen of the modeling step
S20 audit criteria recommending step
S200 step one of the audit criteria recommending step
S201 step two of the audit criteria recommending step
S202 step three of the audit criteria recommending step
S203 step four of the audit criteria recommending step
Detailed Description
The following description of the embodiments is provided with reference to the accompanying drawings so that those skilled in the art can better understand the present invention.
FIG. 1 shows an embodiment of a system 100 for integrating qualitative data and quantitative data to recommend audit criteria. The system 100 may be implemented as a cloud system or a stand-alone device, and mainly includes a storage module 1, a topic model conversion module 2, a feature vector module 3, a classification module 4, and a recommendation module 5. The system 100 is used to implement the method for integrating qualitative data and quantitative data to recommend audit criteria according to another embodiment of the present invention. The system is described in further detail below:
The storage module 1 receives ongoing analysis data 11 of a supplier audit and stores historical analysis data 12 of supplier audits completed in the past. The ongoing analysis data 11 includes qualitative data, namely an ongoing audit finding 111, and quantitative data, namely ongoing supplier operation data 112. The ongoing audit finding 111 is an objective statement of what an auditor observes while auditing the audited supplier; it is in text form, and once the audit is completed, the ongoing audit finding 111 is updated to a completed audit finding 121. The ongoing supplier operation data 112 is a numerical data set, which may include, but is not limited to, supplier number data, turnover data, operating time data, and the like. The ongoing supplier operation data 112 can be collected in advance, and its status is updated to historical supplier operation data 122 after the audit is completed. The historical analysis data 12 is the collective term for the completed audit finding 121 and the historical supplier operation data 122.
The topic model conversion module 2 is connected to the storage module 1 and periodically updates a topic model from the completed audit findings 121 to obtain a topic model probability distribution. The topic model probability distribution can be established by using at least one of Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF). Using the latest topic model, the topic model conversion module 2 maps the completed audit findings 121 stored in the storage module 1 and the ongoing audit findings 111 received by the storage module 1 into linear combinations of the topics of the topic model, thereby generating the topic model probability distribution.
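As an illustration of mapping audit findings to a distribution over topics, the following is a minimal sketch of Non-Negative Matrix Factorization with multiplicative updates, written in plain Python. The function names, the toy term-frequency matrix, and the step of normalizing each row so that it sums to 1 are assumptions made for this example; the specification does not state how the factorization is computed or normalized.

```python
import random

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def matmul(a, b):
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def nmf_topic_distribution(tf, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor a documents x terms matrix tf into w (documents x topics) and
    h (topics x terms) with multiplicative updates, then normalize the rows
    of w so each document gets weights over topics that sum to 1."""
    rng = random.Random(seed)
    n_docs, n_terms = len(tf), len(tf[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # h <- h * (w^T tf) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, tf), matmul(matmul(wt, w), h)
        h = [[h[t][j] * num[t][j] / (den[t][j] + eps) for j in range(n_terms)]
             for t in range(n_topics)]
        # w <- w * (tf h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(tf, ht), matmul(w, matmul(h, ht))
        w = [[w[i][t] * num[i][t] / (den[i][t] + eps) for t in range(n_topics)]
             for i in range(n_docs)]
    # Assumption: row normalization turns the topic weights into a distribution.
    phi = [[v / (sum(row) or 1.0) for v in row] for row in w]
    return phi, h

# Hypothetical term-frequency rows for four audit findings over four terms.
tf = [[2, 2, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
phi, topics = nmf_topic_distribution(tf, n_topics=2)
dominant = [row.index(max(row)) for row in phi]
```

In this toy input the first two findings share one vocabulary and the last two share another, so each pair is expected to load mainly on its own topic.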
The feature vector module 3 is connected to the topic model conversion module 2 and the storage module 1. It reads the topic model probability distribution of the completed audit findings 121 and combines it with the historical supplier operation data 122 stored in the storage module 1 to generate a feature vector set 30; it likewise reads the topic model probability distribution of the ongoing audit finding 111 and combines it with the ongoing supplier operation data 112 in the storage module 1 to generate a feature vector value 31.
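The combining operation can be sketched as follows, assuming it is a concatenation of the operation data with the topic probabilities. The min-max scaling of the operation data is an additional assumption of this example (the specification does not mention scaling), and all function names and sample values are hypothetical.

```python
def fit_min_max(vectors):
    """Column-wise minimum and maximum of the historical operation data."""
    columns = list(zip(*vectors))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(vector, mins, maxs):
    """Scale each value to [0, 1] using the historical range (0.0 if the range is empty)."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(vector, mins, maxs)]

def build_feature_vector(operation_data, topic_distribution, mins, maxs):
    """Concatenate scaled operation data with the topic probabilities."""
    return apply_min_max(operation_data, mins, maxs) + list(topic_distribution)

# Hypothetical rows: [supplier number data, turnover data, operating time data]
historical_operation = [[120, 5.0e6, 12], [40, 8.0e5, 3], [300, 2.0e7, 25]]
historical_phi = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]

mins, maxs = fit_min_max(historical_operation)
feature_vector_set = [build_feature_vector(v, phi, mins, maxs)
                      for v, phi in zip(historical_operation, historical_phi)]

# The ongoing audit is scaled with the same historical range.
ongoing = build_feature_vector([200, 1.0e7, 8], [0.3, 0.6, 0.1], mins, maxs)
```

Scaling the ongoing data with the historical minimum and maximum keeps the feature vector value comparable with the feature vector set it will be clustered against.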
The classification module 4 is connected to the feature vector module 3. It determines an optimal number of clusters for the feature vector set 30 by using, for example, the within-cluster sum of squares (WCSS), and performs a cluster analysis 40 on the feature vector set 30 with that number of clusters by using, for example, a K-means clustering algorithm. Because the feature vector set 30 combines the historical supplier operation data 122 with the topic model probability distribution of the completed audit findings 121, each dimension of a feature vector contributes to and influences the clustering result differently; the classification module 4 can therefore use Weighted K-means for feature selection, reducing the dimensionality of the feature vectors on which the cluster analysis 40 is built. The classification module 4 also determines the cluster to which the feature vector value 31 belongs: specifically, it calculates a distance value between the feature vector value 31 and the center of gravity of each cluster, and takes the cluster with the smallest distance value as the cluster to which the feature vector value 31 belongs.
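The clustering and the nearest-center assignment described above can be sketched in plain Python as follows. The farthest-point initialization, the function names, and the sample points are assumptions chosen to keep the example deterministic; the specification does not prescribe them, and reading the WCSS values to pick the number of clusters is left as a judgment call.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(vector, centroids):
    """Index of the cluster whose center of gravity is closest to vector."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(vector, centroids[k]))

def init_centroids(vectors, k):
    """Assumed initialization: first point, then repeatedly the farthest point."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors,
                  key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def k_means(vectors, k, n_iter=50):
    centroids = init_centroids(vectors, k)
    for _ in range(n_iter):
        labels = [nearest_centroid(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest_centroid(v, centroids) for v in vectors]
    wcss = sum(squared_distance(v, centroids[lab])
               for v, lab in zip(vectors, labels))
    return centroids, labels, wcss

# Hypothetical two-dimensional feature vectors forming two groups.
feature_vector_set = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
                      [0.80, 0.90], [0.90, 0.80], [0.85, 0.95]]

# WCSS for each candidate number of clusters.
wcss_by_k = {k: k_means(feature_vector_set, k)[2] for k in (1, 2, 3)}

centroids2, labels2, _ = k_means(feature_vector_set, 2)
cluster_of_new = nearest_centroid([0.82, 0.88], centroids2)
```

Here the WCSS drops sharply from one cluster to two and only slightly from two to three, which is the kind of pattern one would look for when choosing the number of clusters.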
The recommendation module 5 is connected to the classification module 4 and the topic model conversion module 2. It receives an audit criteria list 50 used in supplier audits and, according to the cluster to which the feature vector value 31 belongs as determined by the classification module 4, obtains at least one highly correlated topic from the coordinates of that cluster's center of gravity. It then uses the topic model with term frequency-inverse document frequency (tf-idf) to query the audit criteria list 50, and returns the recommended audit criteria items 51 corresponding to each topic, sorted by relevance.
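A tf-idf query of the audit criteria list by the words of a topic might look like the sketch below. The smoothed idf formula, the cosine-similarity ranking, the function names, and the criteria texts are all assumptions for illustration; the specification only states that tf-idf is used and that results are sorted by relevance.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tf_idf_vectors(docs_tokens):
    """Return one sparse tf-idf vector per document, plus the idf table."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    # Assumed smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
               for tokens in docs_tokens]
    return vectors, idf

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_criteria(topic_terms, criteria):
    """Rank criteria items by similarity to the topic's terms; drop zero scores."""
    vectors, idf = tf_idf_vectors([tokenize(c) for c in criteria])
    query = {t: n * idf[t] for t, n in Counter(topic_terms).items() if t in idf}
    scored = [(cosine(query, vec), text) for vec, text in zip(vectors, criteria)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored if score > 0]

# Hypothetical audit criteria list and topic terms.
criteria = [
    "Chemical containers shall be labeled and stored in designated areas",
    "Working hours and overtime records shall be kept for all employees",
    "Fire exits shall be unobstructed and clearly marked",
    "Employees shall receive fire safety training",
]
ranked = rank_criteria(["overtime", "hours", "records", "employees"], criteria)
```

With these inputs the working-hours criterion matches all four topic terms and ranks first, the training criterion matches only one term and ranks second, and the remaining items are omitted.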
The following embodiment, with reference to FIG. 2, further describes the method for integrating qualitative data and quantitative data to recommend audit criteria, which mainly includes a modeling step S10 and an audit criteria recommending step S20. The modeling step S10 mainly performs cluster analysis based on the completed audit findings, an audit criteria list, and the historical supplier operation data (such as supplier number data, turnover data, and operating time data) in a storage module; it may be performed only once, or updated periodically or aperiodically. The audit criteria recommending step S20 classifies a newly provided ongoing audit finding and ongoing supplier operation data to provide corresponding recommended audit criteria items.
The modeling step S10 includes:
Step one S100 of the modeling step: an audit event is established, an audit criteria list is input to the recommendation module, and the numbers of all existing suppliers and the completed audit findings (csv file) corresponding to those numbers are output from the storage module.
Step two S101 of the modeling step: the topic model conversion module uses the pandas tool to read the completed audit findings output in step one S100 of the modeling step.
Step three S102 of the modeling step: the topic model conversion module uses the gensim tool to segment the completed audit findings read in step two S101 of the modeling step into words.
Step four S103 of the modeling step: the topic model conversion module uses the spaCy tool and the NLTK (Natural Language Toolkit) tool to pre-process the segmented audit findings from step three S102 of the modeling step, for example by removing stop words and extracting word roots. It should be noted that pandas, gensim, spaCy, and NLTK are all natural language processing or data analysis software tools for the Python programming language.
Step five S104 of the modeling step: the topic model conversion module converts the completed audit findings processed in step four S103 of the modeling step into term frequency space vectors.
Step six S105 of the modeling step: the topic model conversion module uses the Latent Dirichlet Allocation (LDA) algorithm to establish and optimize a topic model for the completed audit findings processed in step five S104 of the modeling step.
Step seven S106 of the modeling step: the topic model conversion module maps the completed audit findings to a topic model probability distribution of the topic model, namely $D = \sum \phi T$, where $D$ is the completed audit finding, $T$ is the topic model, and $\phi$ is the probability of $T$ in $D$.
Step eight S107 of the modeling step: the feature vector module takes out $\phi$, reads in the quantitative data, namely the historical supplier operation data $V$, from the storage module, and performs a combining operation to generate a feature vector $F = V + \phi$; all the feature vectors $F$ form a feature vector set $F_n$.
Step nine S108 of the modeling step: the classification module analyzes the feature vector set $F_n$ with the K-means algorithm, and obtains an optimal number of clusters $k$ by using the minimum within-cluster sum of squares (WCSS).
Step ten S109 of the modeling step: weights $w_j$ are arbitrarily given for the $m$ dimensions of the feature vector $F$, such that [equation not reproduced], where $w_j$ is a set of weights corresponding to the $m$ feature vector values.
Step eleven S110 of the modeling step: given $\beta$ ($\beta > 1$) and $k$, cluster centers of gravity $Z_k$ are arbitrarily given; two of $C$, $Z$, and $w$ are then fixed in turn to solve for the minimum of [equation not reproduced] over $(C, Z, w)$, where $C_{il}$ is an orthogonal matrix that is 1 only when $i = l$, that is, for each of the $n$ points $x$, only the distance value to the cluster center of gravity $Z_k$ to which it belongs is calculated.
Step twelve S111 of the modeling step: $m = p + q$. In more detail, step twelve S111 of the modeling step selects, from the $m$ feature vector values, the $p$ feature vector values with the larger weights $w$, and the remaining $q$ feature vector values are not selected, where $r$ of the $p$ feature vector values come from the topic model probability distribution.
Step thirteen S112 of the modeling step: tf-idf is used to query the audit criteria list with the $r$ topics, and the audit criteria items corresponding to each topic are returned sorted by relevance.
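Steps ten through twelve can be illustrated with the sketch below. Because the constraint and objective equations are not reproduced in the text, the sketch assumes a commonly published weighted K-means formulation: the weights sum to 1, distances are weighted by $w_j^\beta$, and each weight is updated from the within-cluster dispersion of its dimension. That formulation, the function names, and the sample data are assumptions and may differ from what the specification intends.

```python
def weighted_sq_distance(x, z, weights, beta):
    return sum((w ** beta) * (a - b) ** 2 for a, b, w in zip(x, z, weights))

def update_weights(vectors, labels, centroids, beta):
    """Assumed update: w_j = 1 / sum_t (D_j / D_t)^(1 / (beta - 1)), where D_j is
    the within-cluster dispersion of dimension j; w_j = 0 when D_j is 0."""
    m = len(vectors[0])
    dispersion = [sum((v[j] - centroids[lab][j]) ** 2
                      for v, lab in zip(vectors, labels)) for j in range(m)]
    exponent = 1.0 / (beta - 1.0)
    weights = []
    for d_j in dispersion:
        if d_j == 0:
            weights.append(0.0)
        else:
            weights.append(1.0 / sum((d_j / d_t) ** exponent
                                     for d_t in dispersion if d_t > 0))
    return weights

def weighted_k_means(vectors, initial_centroids, beta=2.0, n_iter=20):
    m, k = len(vectors[0]), len(initial_centroids)
    centroids = [list(z) for z in initial_centroids]
    weights = [1.0 / m] * m  # arbitrary starting weights that sum to 1
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Fix Z and w, update the assignment C.
        labels = [min(range(k), key=lambda c: weighted_sq_distance(
            v, centroids[c], weights, beta)) for v in vectors]
        # Fix C and w, update the centers of gravity Z.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Fix C and Z, update the weights w.
        weights = update_weights(vectors, labels, centroids, beta)
    return labels, centroids, weights

def select_features(weights, p):
    """Indices of the p dimensions with the largest weights."""
    top = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:p]
    return sorted(top)

# Hypothetical vectors: dimensions 0 and 1 separate two groups, dimension 2 is noise.
vectors = [[0.10, 0.20, 0.1], [0.20, 0.10, 0.9], [0.15, 0.15, 0.5],
           [0.80, 0.90, 0.2], [0.90, 0.80, 0.8], [0.85, 0.95, 0.4]]
labels, centroids, weights = weighted_k_means(vectors, [vectors[0], vectors[3]])
selected = select_features(weights, p=2)
```

In this toy data the noisy third dimension has the largest within-cluster dispersion, so it receives the smallest weight and is the one left out when the top two dimensions are selected.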
The audit criteria recommending step S20 includes:
Step one S200 of the audit criteria recommending step: the storage module receives ongoing analysis data from a user terminal (e.g., a smartphone, laptop, or tablet); the ongoing analysis data includes the ongoing supplier operation data and the ongoing audit finding.
Step two S201 of the audit criteria recommending step: the classification module maps the ongoing audit finding of the ongoing analysis data onto the established topic model to obtain $D_A = \sum \phi_A T_A$, where $A$ denotes the audit event.
Step three S202 of the audit criteria recommending step: the classification module uses $p_A$ to calculate the distance between each feature and the center of gravity $Z_k$ of each cluster, and determines the cluster with the minimum distance as the current cluster $C_A$.
Step four S203 of the audit criteria recommending step: starting from $C_A$, the recommendation module sequentially recommends, in order of relevance to the topics, the audit criteria items of the corresponding topics generated in step thirteen S112 of the modeling step, and sends them to the user terminal as the recommended audit criteria items.
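The recommending step can be summarized by the following minimal sketch, which assumes that the cluster centers of gravity and the per-topic audit criteria items from the modeling step are already available. The vector layout, the topic names, the `recommend` function, and the sample values are hypothetical.

```python
def recommend(feature_vector, centroids, topic_dims, criteria_by_topic, top_n=2):
    """Pick the nearest cluster, rank topics by that cluster's center-of-gravity
    coordinates, and return the criteria items of the top-ranked topics."""
    cluster = min(
        range(len(centroids)),
        key=lambda k: sum((x - z) ** 2 for x, z in zip(feature_vector, centroids[k])),
    )
    center = centroids[cluster]
    ranked_topics = sorted(topic_dims, key=lambda name: center[topic_dims[name]],
                           reverse=True)[:top_n]
    return cluster, [(name, criteria_by_topic[name]) for name in ranked_topics]

# Assumed layout: [scaled operation value 1, scaled operation value 2,
#                  topic "chemical", topic "labor", topic "fire"]
topic_dims = {"chemical": 2, "labor": 3, "fire": 4}
centroids = [[0.2, 0.1, 0.7, 0.2, 0.1],
             [0.8, 0.9, 0.1, 0.6, 0.3]]
criteria_by_topic = {
    "chemical": ["Chemical containers shall be labeled"],
    "labor": ["Overtime records shall be kept"],
    "fire": ["Fire exits shall be unobstructed"],
}

cluster, recommendations = recommend([0.75, 0.85, 0.2, 0.5, 0.3],
                                     centroids, topic_dims, criteria_by_topic)
recommended_topics = [name for name, _ in recommendations]
```

The ongoing feature vector falls in the second cluster, whose center of gravity is highest on the "labor" topic and next highest on "fire", so the criteria for those two topics are returned in that order.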
Therefore, a user can upload supplier operation data and audit findings in real time at the audit site. The audit findings are converted by the topic model into a topic distribution, which is integrated with the supplier operation data and classified into one of the clusters previously obtained by unsupervised learning (such as the K-means algorithm). After the topics with higher probability in that cluster are sorted, the recommended audit criteria corresponding to those topics can be returned in order as a reference for the audit.
While the operation, use, and efficacy of the present invention can be fully understood from the foregoing description of the preferred embodiments, the present invention is not limited to the disclosed embodiments; modifications, variations, substitutions, and equivalents that fall within the spirit and scope of the invention are also covered.