CN112256740A

CN112256740A - System and method for integrating qualitative data and quantitative data to recommend auditing criteria

Info

Publication number: CN112256740A
Application number: CN201910661396.7A
Authority: CN
Inventors: 王其宏
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-07-22
Filing date: 2019-07-22
Publication date: 2021-01-22
Also published as: TW202107348A; TWI759785B

Abstract

The present invention discloses a system for integrating qualitative data and quantitative data to recommend audit criteria, which executes the following method: a storage module receives ongoing analysis data of a supplier audit and stores historical analysis data from supplier audits completed in the past, wherein both the ongoing analysis data and the historical analysis data contain qualitative data (audit findings) and quantitative data (supplier operating data); a topic model conversion module analyzes the audit findings of the historical analysis data to obtain a topic model probability distribution, so that a feature vector module generates a feature vector set from the topic model probability distribution and the supplier operating data of the historical analysis data, as well as a feature vector value for the ongoing analysis data; a classification module determines the cluster to which the feature vector value belongs, so that a recommendation module provides, for that cluster, recommended audit criterion items pre-calculated from an audit criteria list.

Description

System and method for integrating qualitative data and quantitative data to recommend auditing criteria

Technical Field

The invention relates to the field of natural language processing, and in particular to a recommendation system and method; more specifically, it relates to a system and method for integrating qualitative data and quantitative data to recommend audit criteria.

Background

In the past, structured audit systems were proposed, for example, in US 8,050,988 B2 and US 2006/0106686 A1, which address financial risks and opportunities and offer suggestions for financial audits from a risk perspective; other patents, such as US 7,885,841 B2, US 5,765,138, US 7,346,527 B2, US 2008/019546 A1, and US 8,504,412 B1, also cover automation such as audit planning and audit item generation.

Although recommendation systems using natural language processing exist, such as US 2016/0148327 A1, US 2018/0165696 A1, and CN 107807962 B, none of them takes into account the supplier's risk or its quantitative background information, such as scale, operating performance, and operating time.

Disclosure of Invention

The invention aims to provide a system and method for integrating qualitative data and quantitative data to recommend audit criteria, which takes background information on supplier operations into account so that the correlation between audit findings and operating indicators can be established objectively.

Based on the above, the present invention mainly adopts the following technical means to achieve the above object.

A system for integrating qualitative data and quantitative data to recommend audit criteria includes: a storage module for receiving ongoing analysis data of a supplier audit and storing historical analysis data from supplier audits completed in the past, wherein both the ongoing analysis data and the historical analysis data comprise qualitative data (audit findings) and quantitative data (supplier operation data); a topic model conversion module, connected to the storage module, for analyzing the audit findings of the historical analysis data to establish or update a topic model and obtain a topic model probability distribution, the topic model conversion module also converting the audit findings of the ongoing analysis data according to the topic model; a feature vector module, connected to the topic model conversion module and the storage module, for generating a corresponding feature vector set from the topic model probability distribution and the supplier operation data of the historical analysis data, and for generating a feature vector value corresponding to the ongoing analysis data; a classification module, connected to the feature vector module, for performing cluster analysis on the feature vector set and determining the cluster to which the feature vector value belongs; and a recommendation module, connected to the classification module and the topic model conversion module, for receiving an audit criteria list used in supplier audits and generating corresponding recommended audit criterion items for a related topic according to the cluster to which the feature vector value belongs.

Further, the classification module calculates a distance value between the feature vector value and the center of gravity of each of the clusters, and the cluster having the smallest distance value is used as the cluster to which the feature vector value belongs.

Further, the quantitative data of the supplier operation data comprises at least one of, or a combination of, supplier headcount data, turnover data, and operating time data.

A method for integrating qualitative data and quantitative data to recommend audit criteria includes: receiving, by a storage module, ongoing analysis data of a supplier audit and storing historical analysis data from supplier audits completed in the past, wherein both the ongoing analysis data and the historical analysis data comprise qualitative data (audit findings) and quantitative data (supplier operation data); analyzing, by a topic model conversion module, the audit findings of the historical analysis data to establish or update a topic model, obtaining a topic model probability distribution, and converting the audit findings of the ongoing analysis data according to the topic model, so that a feature vector module generates a corresponding feature vector set from the topic model probability distribution and the supplier operation data of the historical analysis data, as well as a feature vector value for the ongoing analysis data; and performing, by a classification module, cluster analysis on the feature vector set and determining the cluster to which the feature vector value belongs, so that a recommendation module receives an audit criteria list used in supplier audits and generates corresponding recommended audit criterion items for a related topic according to the cluster to which the feature vector value belongs.

Further, the feature vector set is subjected to clustering analysis by using a K-means clustering algorithm.

Further, the dimensionality of the feature vectors used for the cluster analysis may be reduced by a Weighted K-means feature selection algorithm.

Further, the topic model probability distribution is established using at least one of Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF).

The technical features described above achieve the following effects:

1. The audit criteria recommendation takes background information on supplier operations (quantitative information such as scale, operating performance, and operating time) into account, and therefore provides more suitable audit criteria than recommendations based on natural language processing alone.

2. Past qualitative audit findings and quantitative supplier information are periodically used to cluster suppliers through natural language processing and unsupervised learning, and feature selection is performed, so that the correlation between audit findings and operating indicators can be established objectively.

Drawings

FIG. 1 is a block diagram of a system according to an embodiment of the present invention.

FIG. 2 is a detailed flowchart of another embodiment of the present invention including a modeling step and an audit criteria recommendation step.

[Reference numerals]

100 system

1 storage module

11 ongoing analysis data

111 ongoing audit finding

112 ongoing supplier operation data

12 historical analysis data

121 completed audit finding

122 historical supplier operation data

2 topic model conversion module

3 feature vector module

30 feature vector set

31 feature vector value

4 classification module

40 cluster analysis

5 recommendation module

50 audit criteria list

51 recommended audit criterion item

S10 modeling step

S100 step one of the modeling step

S101 step two of the modeling step

S102 step three of the modeling step

S103 step four of the modeling step

S104 step five of the modeling step

S105 step six of the modeling step

S106 step seven of the modeling step

S107 step eight of the modeling step

S108 step nine of the modeling step

S109 step ten of the modeling step

S110 step eleven of the modeling step

S111 step twelve of the modeling step

S112 step thirteen of the modeling step

S20 audit criteria recommendation step

S200 step one of the audit criteria recommendation step

S201 step two of the audit criteria recommendation step

S202 step three of the audit criteria recommendation step

S203 step four of the audit criteria recommendation step

Detailed Description

The embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention.

FIG. 1 shows an embodiment of a system 100 for integrating qualitative data and quantitative data to recommend audit criteria. The system 100 may be implemented as a cloud system or a stand-alone device and mainly includes a storage module 1, a topic model conversion module 2, a feature vector module 3, a classification module 4, and a recommendation module 5. The system 100 implements a method for integrating qualitative data and quantitative data to recommend audit criteria according to another embodiment of the present invention. The system is described in more detail below:

The storage module 1 receives ongoing analysis data 11 of a supplier audit and stores historical analysis data 12 of supplier audits completed in the past. The ongoing analysis data 11 include qualitative data, namely an ongoing audit finding 111, and quantitative data, namely ongoing supplier operation data 112. The ongoing audit finding 111 is an objective statement, in text form, of what an auditor observed while auditing the supplier; once the audit is completed, the ongoing audit finding 111 is updated to a completed audit finding 121. The ongoing supplier operation data 112 is a numerical data set that may include, but is not limited to, supplier headcount data, turnover data, and operating time data. The ongoing supplier operation data 112 can be collected in advance, and its status is updated to historical supplier operation data 122 after the audit is completed. The historical analysis data 12 is the collective term for the completed audit findings 121 and the historical supplier operation data 122.

The topic model conversion module 2 is connected to the storage module 1 and periodically updates a topic model from the completed audit findings 121 to obtain a topic model probability distribution. The topic model probability distribution can be established using at least one of Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF). Using the latest topic model, the topic model conversion module 2 maps the completed audit findings 121 stored in the storage module 1 and the ongoing audit findings 111 received by the storage module 1 to linear combinations of topics, thereby generating their topic model probability distributions.
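As a minimal sketch of the NMF option, assuming the audit findings have already been turned into a document-term count matrix, the function below factorizes that matrix with the Lee-Seung multiplicative updates and normalizes each document's row of W into a topic probability distribution. The function name `nmf_topic_distribution`, the iteration count, the random initialization, and the row normalization are illustrative assumptions, not the procedure specified in the patent.

```python
import numpy as np

def nmf_topic_distribution(X, n_topics, n_iter=500, seed=0, eps=1e-10):
    """Factorize a non-negative document-term matrix X (docs x terms) as X ~ W @ H
    with Lee-Seung multiplicative updates (Frobenius loss), then normalize each
    row of W so it reads as that document's topic probability distribution."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, n_topics)) + eps   # document-topic weights
    H = rng.random((n_topics, n_terms)) + eps  # topic-term weights
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Row-normalize the document-topic weights into probabilities.
    phi = W / (W.sum(axis=1, keepdims=True) + eps)
    return phi, H

# Hypothetical term counts: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
X = [[2, 1, 0, 0],
     [4, 2, 0, 0],
     [0, 0, 1, 3],
     [0, 0, 2, 6]]
phi, H = nmf_topic_distribution(X, n_topics=2)
print(phi.round(2))
```

Each row of `phi` plays the role of a finding's "linear combination of topics"; an LDA model would supply the same kind of row-stochastic matrix.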

The feature vector module 3 is connected to the topic model conversion module 2 and the storage module 1. It reads the topic model probability distribution of the completed audit findings 121 and combines it with the historical supplier operation data 122 stored in the storage module 1 to generate a feature vector set 30; it likewise reads the topic model probability distribution of the ongoing audit finding 111 and combines it with the ongoing supplier operation data 112 in the storage module 1 to generate a feature vector value 31.

The classification module 4 is connected to the feature vector module 3. It determines an optimal number of clusters for the feature vector set 30, for example by the within-cluster sum of squares (WCSS) method, and performs a cluster analysis 40 on the feature vector set 30 with that number of clusters, for example using K-means clustering. Because the feature vector set 30 combines the historical supplier operation data 122 with the topic model probability distribution of the completed audit findings 121, each dimension of a feature vector contributes differently to the clustering result; the classification module 4 can therefore use Weighted K-means feature selection to reduce the dimensionality of the feature vectors used for the cluster analysis 40. The classification module 4 then determines the cluster to which the feature vector value 31 belongs: specifically, it calculates the distance between the feature vector value 31 and the center of gravity (centroid) of each cluster, and takes the cluster with the smallest distance as the cluster to which the feature vector value 31 belongs.
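The K-means clustering named above can be sketched in a few lines of NumPy using Lloyd's algorithm, assuming the feature vectors are numeric rows of a matrix. The random choice of k distinct samples as initial centroids, the iteration cap, and the name `kmeans` are assumptions of this sketch; the patent does not specify them. The function also returns the WCSS, the quantity used to choose the number of clusters.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means. Returns (labels, centroids, wcss)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct samples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster empties).
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Within-cluster sum of squares: each sample's squared distance to its own centroid.
    wcss = float(d2[np.arange(len(X)), labels].sum())
    return labels, centroids, wcss

# Hypothetical two-dimensional feature vectors forming two obvious groups.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids, wcss = kmeans(X, k=2)
print(labels, round(wcss, 4))
```

Which group receives label 0 depends on the random initialization, so only the grouping, not the label values, is meaningful.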

The recommendation module 5 is connected to the classification module 4 and the topic model conversion module 2 and receives an audit criteria list 50 for supplier audits. Based on the cluster to which the feature vector value 31 belongs, as determined by the classification module 4, it obtains at least one highly correlated topic from the coordinates of that cluster's center of gravity. It then uses term frequency-inverse document frequency (tf-idf) together with the topic model to query the audit criteria list 50, and returns the recommended audit criterion items 51 corresponding to each topic, sorted by relevance.
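The patent does not define how "highly correlated" topics are read off the cluster centroid. A minimal sketch, assuming each feature vector stores the operating-data coordinates first and the topic probabilities after them, and assuming "highly correlated" means the largest topic coordinates of the centroid. The function `top_topics_for_cluster`, its parameters, and the sample centroid are hypothetical.

```python
import numpy as np

def top_topics_for_cluster(centroid, n_operating, top_n=3):
    """Return topic indices ordered by the centroid's topic coordinates, highest first.

    Assumes (hypothetical layout) that the first n_operating coordinates hold
    supplier operation data and the remaining coordinates hold topic probabilities."""
    topic_part = np.asarray(centroid, dtype=float)[n_operating:]
    order = np.argsort(topic_part)[::-1]  # descending by coordinate value
    return [int(t) for t in order[:top_n]]

# Hypothetical centroid: 2 operating-data dimensions followed by 4 topic dimensions.
centroid = [0.4, 0.7, 0.05, 0.60, 0.10, 0.25]
print(top_topics_for_cluster(centroid, n_operating=2, top_n=2))  # → [1, 3]
```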

The following embodiment, described with reference to FIG. 2, details the method for integrating qualitative data and quantitative data to recommend audit criteria. The method mainly includes a modeling step S10 and an audit criteria recommendation step S20. The modeling step S10 performs cluster analysis on the completed audit findings, the audit criteria list, and the historical supplier operation data (such as supplier headcount data, turnover data, and operating time data) held in the storage module; it may be performed only once, or updated periodically or aperiodically. The audit criteria recommendation step S20 classifies a newly provided ongoing audit finding and the ongoing supplier operation data in order to provide corresponding recommended audit criterion items.

The modeling step S10 includes:

Step one S100 of the modeling step: an audit event is created, an audit criteria list is input to the recommendation module, and the storage module outputs the identification numbers of all existing suppliers together with the completed audit findings corresponding to each number (as a CSV file).

Step two S101 of the modeling step: the topic model conversion module uses the pandas tool to read the completed audit findings output in step one S100.

Step three S102 of the modeling step: the topic model conversion module uses the gensim tool to tokenize the completed audit findings read in step two S101.

Step four S103 of the modeling step: the topic model conversion module uses the spaCy and NLTK (Natural Language Toolkit) tools to pre-process the audit findings tokenized in step three S102, for example by removing stop words and stemming. Note that pandas, gensim, spaCy, and NLTK are all natural language processing or data analysis tools for the Python programming language.
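The patent names gensim, spaCy, and NLTK for tokenization and pre-processing but gives no calls. To keep this example dependency-free, the sketch below substitutes a regular-expression tokenizer, a tiny hand-written stop-word list, and crude suffix stripping in place of a real stemmer such as NLTK's `PorterStemmer`. The names `STOP_WORDS`, `simple_stem`, and `preprocess` are hypothetical, and the output only indicates the kind of tokens the real tools would produce.

```python
import re

# Tiny illustrative stop-word list (a real pipeline would use spaCy's or NLTK's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "is", "are", "was", "were",
              "not", "no", "for", "to", "in", "on", "by", "with", "be"}

# Crude suffix stripping, standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def simple_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lower-case, tokenize on runs of letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [simple_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Calibration records were not maintained for the gauges."))
# → ['calibration', 'record', 'maintain', 'gaug']
```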

Step five S104 of the modeling step: the topic model conversion module converts the completed audit findings processed in step four S103 into term frequency vectors in a vector space.
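Converting pre-processed findings into term frequency vectors amounts to building a shared vocabulary and counting each term per document. The sketch below is a minimal dense version under that reading. The function name and the sample tokens are hypothetical. gensim's `corpora.Dictionary.doc2bow` produces the same counts in sparse `(token_id, count)` form.

```python
from collections import Counter

def term_frequency_matrix(token_lists):
    """Build a sorted shared vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for tok, count in Counter(tokens).items():
            row[index[tok]] = count
        matrix.append(row)
    return vocab, matrix

docs = [["record", "missing", "record"], ["gauge", "calibration", "missing"]]
vocab, tf = term_frequency_matrix(docs)
# vocab → ['calibration', 'gauge', 'missing', 'record']
# tf    → [[0, 0, 1, 2], [1, 1, 1, 0]]
```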

Step six S105 of the modeling step: the topic model conversion module uses the Latent Dirichlet Allocation (LDA) algorithm to establish and optimize a topic model from the completed audit findings processed in step five S104.
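The patent does not say how the LDA model is fitted (gensim's `LdaModel`, for instance, uses online variational Bayes). As a dependency-light stand-in, the sketch below fits LDA by collapsed Gibbs sampling on token-id lists and returns each document's topic probabilities, keeping the document's name φ (`phi`) for them. The hyperparameters `alpha` and `beta`, the iteration count, the function name `lda_gibbs`, and the toy corpus are all assumptions.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (phi, topic_word): phi[d] is document d's topic distribution and
    topic_word[t] is topic t's word distribution."""
    rng = np.random.default_rng(seed)
    n_dt = np.zeros((len(docs), n_topics))   # topic counts per document
    n_tw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_t = np.zeros(n_topics)                 # total tokens per topic
    z = []                                   # current topic of every token
    for d, doc in enumerate(docs):
        z_d = rng.integers(n_topics, size=len(doc))
        z.append(z_d)
        for w, t in zip(doc, z_d):
            n_dt[d, t] += 1
            n_tw[t, w] += 1
            n_t[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from all counts.
                n_dt[d, t] -= 1
                n_tw[t, w] -= 1
                n_t[t] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) / (n_t + vocab_size * beta)
                t = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = t
                n_dt[d, t] += 1
                n_tw[t, w] += 1
                n_t[t] += 1
    phi = (n_dt + alpha) / (n_dt + alpha).sum(axis=1, keepdims=True)
    topic_word = (n_tw + beta) / (n_tw + beta).sum(axis=1, keepdims=True)
    return phi, topic_word

# Toy corpus of word ids: documents 0-1 use words {0, 1}, documents 2-3 use {2, 3}.
docs = [[0, 1, 0, 1, 0, 1], [1, 0, 0, 1, 1, 0], [2, 3, 2, 3, 2, 3], [3, 2, 3, 3, 2, 2]]
phi, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=4)
print(phi.round(2))
```

On a corpus this clean, the two word groups typically end up in different topics. The sampler is stochastic, though, and real audit findings would need many more iterations and a tuned number of topics.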

Step seven S106 of the modeling step: the topic model conversion module maps each completed audit finding to its topic model probability distribution over the topic model, namely $D = \sum \phi T$, where D is the completed audit finding, T denotes the topics of the topic model, and $\phi$ is the probability of T in D.

Step eight S107 of the modeling step: the feature vector module takes $\phi$, reads the quantitative data from the storage module (namely the historical supplier operation data V), and combines them to generate a feature vector $F = V \oplus \phi$, where $\oplus$ denotes the combining (concatenation) operation. All feature vectors F together form the feature vector set $F^n$.
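A minimal sketch of the combining operation of step eight, read here as plain concatenation of the operating data V with the topic probabilities φ. The min-max scaling of V is an assumption of this sketch, not something the patent mentions: it keeps turnover figures in the millions from swamping topic probabilities in [0, 1]. The function `build_feature_vectors` and the supplier figures are hypothetical.

```python
import numpy as np

def build_feature_vectors(V, phi):
    """Concatenate min-max scaled operation data V (n x q) with topic
    probabilities phi (n x r) into feature vectors F (n x (q + r))."""
    V = np.asarray(V, dtype=float)
    phi = np.asarray(phi, dtype=float)
    v_min, v_max = V.min(axis=0), V.max(axis=0)
    span = np.where(v_max > v_min, v_max - v_min, 1.0)  # avoid division by zero
    V_scaled = (V - v_min) / span
    return np.hstack([V_scaled, phi])

# Hypothetical suppliers: [headcount, turnover, years in operation]
V = [[50, 2.0e6, 5], [500, 3.0e7, 20], [120, 8.0e6, 10]]
phi = [[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]]
F_n = build_feature_vectors(V, phi)
print(F_n.shape)  # → (3, 5)
```

If scaling is used, the ongoing supplier's vector must be scaled with the historical minima and maxima so that it lands in the same space as the clusters.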

Step nine S108 of the modeling step: the classification module analyzes the feature vector set $F^n$ with the K-means algorithm and obtains an optimal number of clusters K by the within-cluster sum of squares (WCSS) method.
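The patent obtains the optimal K from the WCSS but does not say how the "elbow" of the WCSS curve is located. One common heuristic, sketched here as an assumption, picks the K whose point lies farthest from the straight line joining the first and last points of the curve. The WCSS values would come from running K-means once per candidate K. The function `elbow_k` and the sample values are hypothetical.

```python
def elbow_k(wcss_by_k):
    """Pick K at the elbow of a WCSS curve.

    wcss_by_k: dict {k: wcss}. Returns the k whose (k, wcss) point is farthest
    from the straight line joining the first and last points of the curve."""
    ks = sorted(wcss_by_k)
    x1, y1 = ks[0], wcss_by_k[ks[0]]
    x2, y2 = ks[-1], wcss_by_k[ks[-1]]

    def distance(k):
        # Perpendicular distance to the line, without the constant denominator.
        return abs((y2 - y1) * k - (x2 - x1) * wcss_by_k[k] + x2 * y1 - y2 * x1)

    return max(ks, key=distance)

print(elbow_k({1: 100.0, 2: 40.0, 3: 15.0, 4: 12.0, 5: 10.0, 6: 9.0}))  # → 3
```

Because K and WCSS are on very different scales, some implementations normalize both axes to [0, 1] first, and that choice can change the selected K.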

Step ten S109 of the modeling step: weights $w_j$ are assigned arbitrarily to the m dimensions of the feature vector F, subject to $\sum_{j=1}^{m} w_j = 1$,

where $w_j$ (j = 1, ..., m) is the set of weights corresponding to the m feature vector values.

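Step eleven S110 of the modeling step: given β (β > 1) and k, and arbitrarily chosen cluster centroids $Z_k$, alternately fix two of (C, Z, w) and solve for the third so as to minimize

$P(C, Z, w) = \sum_{l=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} c_{il} \, w_j^{\beta} \, (x_{ij} - z_{lj})^2$,

where $C = [c_{il}]$ is the assignment matrix, with $c_{il} = 1$ only when sample $x_i$ belongs to cluster l (and 0 otherwise), so that only the distance between each of the n samples x and the centroid $Z_k$ of its own cluster is counted.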
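Steps ten and eleven can be sketched as the standard W-k-means alternating scheme: fix Z and w to update the assignments C, fix C and w to update the centroids Z, then fix C and Z to update the weights with $w_j = 1 / \sum_t (D_j / D_t)^{1/(\beta-1)}$, where $D_j$ is feature j's within-cluster dispersion (and $w_j = 0$ when $D_j = 0$). This is our reading of the steps, not code from the patent. The farthest-point initialization, `beta=2.0`, the function name, and the toy data are assumptions.

```python
import numpy as np

def weighted_kmeans(X, k, beta=2.0, n_iter=50):
    """Alternately minimize
        P(C, Z, w) = sum_l sum_i sum_j c_il * w_j**beta * (x_ij - z_lj)**2
    subject to sum_j w_j = 1. Returns (labels, centroids, weights)."""
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    w = np.full(m, 1.0 / m)  # initial weights, summing to 1
    # Farthest-point initialization of the k centroids (an assumption of this sketch).
    chosen = [0]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d2.argmax()))
    Z = X[chosen].copy()
    for _ in range(n_iter):
        # Fix Z and w: assign each sample to the centroid with the smallest weighted distance.
        dist = (((X[:, None, :] - Z[None, :, :]) ** 2) * w ** beta).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Fix C and w: move each centroid to the mean of its members.
        Z = np.array([X[labels == l].mean(axis=0) if np.any(labels == l) else Z[l]
                      for l in range(k)])
        # Fix C and Z: recompute weights from per-feature dispersion D_j.
        D = ((X - Z[labels]) ** 2).sum(axis=0)
        new_w = np.zeros(m)
        nz = D > 0
        if nz.any():
            ratios = (D[nz][:, None] / D[nz][None, :]) ** (1.0 / (beta - 1.0))
            new_w[nz] = 1.0 / ratios.sum(axis=1)
        converged = np.allclose(new_w, w)
        w = new_w
        if converged:
            break
    return labels, Z, w

# Feature 0 separates two tight groups; feature 1 varies widely inside each group.
X = [[0.0, 2.0], [0.1, -2.0], [0.2, 0.0], [10.0, 2.0], [10.1, -2.0], [10.2, 0.0]]
labels, Z, w = weighted_kmeans(X, k=2)
print(labels, w.round(4))  # → [0 0 0 1 1 1] [0.9975 0.0025]
```

Features with small within-cluster dispersion receive large weights, which is what makes the weights usable for the feature selection in step twelve.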

Step twelve S111 of the modeling step: m = p + q. More specifically, step twelve S111 selects, from the m feature vector values, the p values with the largest weights w, and the remaining q values are not selected; r of the p selected values come from the topic model probability distribution.
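Step twelve keeps the p highest-weighted of the m features and counts how many of them (r) are topic dimensions. A minimal sketch, assuming the weights have already been computed by a weighted K-means run and that topic features can be recognized by a name prefix. The function `select_features`, the `topic_` naming convention, and the sample weights are hypothetical.

```python
def select_features(weights, feature_names, p, topic_prefix="topic_"):
    """Keep the p features with the largest weights and report which of them
    (r = len of the second list) come from the topic probability distribution."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    selected = [feature_names[j] for j in order[:p]]
    topic_features = [name for name in selected if name.startswith(topic_prefix)]
    return selected, topic_features

names = ["headcount", "turnover", "years", "topic_0", "topic_1", "topic_2"]
weights = [0.05, 0.22, 0.02, 0.35, 0.08, 0.28]  # m = 6 hypothetical weights
selected, topics = select_features(weights, names, p=3)
print(selected, topics)  # → ['topic_0', 'topic_2', 'turnover'] ['topic_0', 'topic_2']
```

Here m = 6, p = 3, q = 3, and r = 2.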

Step thirteen S112 of the modeling step: tf-idf is used to query the audit criteria list with the r topics, and the audit criterion items corresponding to each topic are returned sorted by relevance.
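Step thirteen does not say exactly how tf-idf and the topics are combined. One plausible reading, sketched under that assumption, represents each topic by its top words and scores each audit criterion item by the summed tf-idf weight of those words in the item's text. The function `tfidf_rank`, the smoothed idf, and the sample criterion texts are hypothetical.

```python
import math
from collections import Counter

def tfidf_rank(criteria_tokens, query_terms):
    """Rank audit criterion items by the summed tf-idf weight of the query terms.

    criteria_tokens: list of token lists, one per audit criterion item.
    query_terms: for example, the top words of one topic.
    Returns item indices, most relevant first; items scoring 0 are dropped."""
    n = len(criteria_tokens)
    df = Counter(t for tokens in criteria_tokens for t in set(tokens))
    scores = []
    for idx, tokens in enumerate(criteria_tokens):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log(n / df[term]) + 1.0  # smoothed so common terms still count
                score += (tf[term] / len(tokens)) * idf
        if score > 0:
            scores.append((score, idx))
    return [idx for score, idx in sorted(scores, key=lambda s: (-s[0], s[1]))]

criteria = [
    ["calibration", "measuring", "equipment", "calibration", "interval"],
    ["document", "control", "record", "retention"],
    ["measuring", "equipment", "identification"],
]
print(tfidf_rank(criteria, ["calibration", "equipment"]))  # → [0, 2]
```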

The audit criteria recommendation step S20 includes:

Step one S200 of the audit criteria recommendation step: the storage module receives ongoing analysis data from a user terminal (e.g., a smartphone, laptop, or tablet), the ongoing analysis data including the ongoing supplier operation data and the ongoing audit finding.

Step two S201 of the audit criteria recommendation step: the classification module maps the ongoing audit finding of the ongoing analysis data with the established topic model to obtain $D_A = \sum \phi_A T_A$, where A denotes the audit event.

Step three S202 of the audit criteria recommendation step: using the p selected features of audit event A, denoted $p_A$, the classification module calculates the distance to the centroid $Z_k$ of each cluster and takes the cluster at the minimum distance as the current cluster $C_A$.
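Step three compares the ongoing feature vector with each cluster centroid Z_k and keeps the nearest one. A minimal sketch, assuming Euclidean distance (the patent says only "distance") and assuming the vector has already been reduced to the same p selected features used to build the clusters. The function `assign_cluster` and the numbers are hypothetical.

```python
import numpy as np

def assign_cluster(feature_vector, centroids):
    """Return (index of the nearest centroid, Euclidean distance to every centroid)."""
    x = np.asarray(feature_vector, dtype=float)
    Z = np.asarray(centroids, dtype=float)
    distances = np.linalg.norm(Z - x, axis=1)
    return int(distances.argmin()), distances

# Hypothetical centroids over p = 3 selected features.
Z = [[0.1, 0.8, 0.2], [0.9, 0.1, 0.7]]
c_A, dist = assign_cluster([0.2, 0.7, 0.3], Z)
print(c_A)  # → 0
```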

Step four S203 of the audit criteria recommendation step: taking the topics of $C_A$ in order of their correlation, the recommendation module sends the audit criterion items generated for each topic in step thirteen S112 of the modeling step to the user terminal; these are the recommended audit criterion items.
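Step four walks the cluster's topics in order of correlation and returns the criterion items precomputed for each topic. A minimal sketch, assuming the per-topic lists are already ranked and assuming an item that appears under several topics should be recommended only once (the patent does not say how duplicates are handled). The function `recommend_items` and the item identifiers are hypothetical.

```python
def recommend_items(cluster_topics, items_by_topic):
    """Concatenate precomputed per-topic criterion lists in the cluster's topic
    order, skipping items already recommended under an earlier topic."""
    seen, recommended = set(), []
    for topic in cluster_topics:
        for item in items_by_topic.get(topic, []):
            if item not in seen:
                seen.add(item)
                recommended.append(item)
    return recommended

# Hypothetical criterion identifiers ranked per topic by step thirteen.
items_by_topic = {1: ["C12", "C07"], 3: ["C31", "C07"], 0: ["C02"]}
print(recommend_items([1, 3], items_by_topic))  # → ['C12', 'C07', 'C31']
```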

Therefore, a user can upload supplier operation data and audit findings in real time at the audit site. The audit findings are converted by the topic model into a topic distribution, which is combined with the supplier operation data and classified into one of the existing clusters obtained by unsupervised learning (such as the K-means algorithm). After the higher-probability topics of that cluster are sorted, the recommended audit criteria corresponding to those topics are returned in order as a reference for the audit.

While the operation, use, and effects of the present invention can be fully understood from the foregoing description of the preferred embodiments, the present invention is not limited to the disclosed embodiments; all modifications, variations, substitutions, and equivalents falling within the spirit and scope of the invention are intended to be covered.

Claims

1. A system for integrating qualitative data and quantitative data to recommend audit criteria, characterized by comprising:

a storage module for receiving ongoing analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past, the ongoing analysis data and the historical analysis data both including qualitative data of audit findings and quantitative data of supplier operation data;

a topic model conversion module, connected to the storage module, for analyzing the audit findings of the historical analysis data to establish or update a topic model and obtain a topic model probability distribution, the topic model conversion module also converting the audit findings of the ongoing analysis data according to the topic model;

a feature vector module, connected to the topic model conversion module and the storage module, for generating a corresponding feature vector set according to the topic model probability distribution and the supplier operation data of the historical analysis data, the feature vector module also generating a feature vector value corresponding to the ongoing analysis data;

a classification module, connected to the feature vector module, for performing cluster analysis on the feature vector set and determining a cluster to which the feature vector value belongs;

a recommendation module, connected to the classification module and the topic model conversion module, for receiving an audit criteria list used for supplier audits and generating, according to the cluster to which the feature vector value belongs, corresponding recommended audit criterion items for a related topic.

2. The system for integrating qualitative data and quantitative data to recommend audit criteria according to claim 1, wherein the classification module calculates a distance value between the feature vector value and the center of gravity of each of the clusters, and takes the cluster with the smallest distance value as the cluster to which the feature vector value belongs.

3. The system for integrating qualitative data and quantitative data to recommend audit criteria according to claim 1, wherein the quantitative data of the supplier operation data comprises at least one of, or a combination of, supplier headcount data, turnover data, and operating time data.

4. A method for integrating qualitative data and quantitative data for recommendation of audit criteria, characterized in that it comprises:

receiving, by a storage module, ongoing analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past, the ongoing analysis data and the historical analysis data both including qualitative data of audit findings and quantitative data of supplier operation data;

analyzing, by a topic model conversion module, the audit findings of the historical analysis data to establish or update a topic model, obtain a topic model probability distribution, and convert the audit findings of the ongoing analysis data according to the topic model, so that a feature vector module generates a corresponding feature vector set according to the topic model probability distribution and the supplier operation data of the historical analysis data, as well as a feature vector value of the ongoing analysis data;

performing, by a classification module, cluster analysis on the feature vector set and determining the cluster to which the feature vector value belongs, so that a recommendation module receives an audit criteria list used for supplier audits and generates, according to the cluster to which the feature vector value belongs, corresponding recommended audit criterion items for a related topic.

5. The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 4, wherein the feature vector set is clustered using the K-means algorithm.

6. The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 4, wherein the classification module calculates a distance value between the feature vector value and the center of gravity of each of the clusters, and takes the cluster with the smallest distance value as the cluster to which the feature vector value belongs.

7. The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 4, wherein the cluster analysis reduces the dimensionality of the feature vectors used to establish the cluster analysis through a Weighted K-means feature selection algorithm.

8. The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 4, wherein the topic model probability distribution is established by at least one of Latent Dirichlet Allocation and non-negative matrix factorization.

9. The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 4, wherein the quantitative data of the supplier operation data comprises at least one of, or a combination of, supplier headcount data, turnover data, and operating time data.