CN118520404B - Enterprise business data mining method, device, equipment and storage medium - Google Patents
- Publication number
- CN118520404B (Application No. CN202410979815.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- matrix
- target
- factor
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2133—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on naturality criteria, e.g. with non-negative factorisation or negative correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2137—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on criteria of topology preservation, e.g. multidimensional scaling or self-organising maps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/02—Computing arrangements based on specific mathematical models using fuzzy logic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Abstract
The application relates to the technical field of data processing and discloses an enterprise business data mining method, device, equipment and storage medium. The method comprises the following steps: performing multi-view cluster analysis processing on the acquired enterprise business data to obtain clustered business data; performing non-negative matrix factorization on the clustered business data to obtain a key feature data set; performing fuzzy logic hybrid modeling on the key feature data set to obtain a business hybrid model; performing differential optimization on the business hybrid model to obtain model parameters corresponding to the business hybrid model; performing factor anomaly detection on the model parameters to obtain an anomaly detection result; and performing self-organizing map visualization analysis on the anomaly detection result to obtain a business risk visualization report. The application improves the accuracy and efficiency of enterprise business data mining.
Description
Technical Field
The present application relates to the field of data processing, and in particular, to a method, an apparatus, a device, and a storage medium for mining business data of an enterprise.
Background
In the prior art, mining and analysis of enterprise business data typically rely on traditional statistical methods and simple machine learning models. These methods are mainly used for analyzing and predicting basic data such as business reports, balance sheets and cash flow statements. However, such conventional methods often have limitations when processing high-dimensional, diversified business data, and it is difficult for them to fully mine the potential patterns and complex relationships in the data.
The deficiencies of the prior art in business data mining are mainly manifested in the following aspects. When traditional methods process multi-dimensional, multi-view data, it is difficult to effectively fuse the information of each dimension, so the accuracy and comprehensiveness of the analysis results are insufficient. Existing methods have limited capability in anomaly detection and risk prediction, making it difficult to discover and respond to potential business risks in time. The visualization means in the prior art are also limited; complex business data and risk situations cannot be displayed intuitively, which makes decision making difficult for the management.
Disclosure of Invention
The application provides an enterprise business data mining method, device, equipment and storage medium, which are used for improving the accuracy and efficiency of enterprise business data mining.
In a first aspect, the present application provides an enterprise business data mining method, where the enterprise business data mining method includes: performing multi-view cluster analysis processing on the acquired enterprise business data to obtain clustered business data;
performing non-negative matrix factorization on the clustered business data to obtain a key feature data set;
performing fuzzy logic hybrid modeling processing on the key feature data set to obtain a business hybrid model;
performing differential optimization processing on the business hybrid model to obtain model parameters corresponding to the business hybrid model;
performing factor anomaly detection processing on the model parameters to obtain an anomaly detection result;
and performing self-organizing map visualization analysis processing on the anomaly detection result to obtain a business risk visualization report.
With reference to the first aspect, in a first implementation manner of the first aspect of the present application, the performing multi-view cluster analysis processing on the acquired enterprise business data to obtain clustered business data includes:
performing multidimensional feature extraction on the enterprise business data to obtain a multidimensional feature set corresponding to the enterprise business data;
Performing principal component analysis on the multi-dimensional feature set to obtain dimension reduction feature data corresponding to the multi-dimensional feature set;
Performing preliminary clustering on the dimension reduction characteristic data through a K-Means clustering algorithm to obtain corresponding preliminary clustering data;
Performing density clustering on the preliminary clustering data, and extracting noise point data and dense region data in the preliminary clustering data;
performing data screening on the preliminary clustering data according to the noise point data and the dense region data to obtain target clustering data;
Performing time dimension analysis on the target cluster data to obtain time dimension data, performing department dimension analysis on the target cluster data to obtain department dimension data, and performing service dimension analysis on the target cluster data to obtain service dimension data;
and merging the time dimension data, the department dimension data and the service dimension data into the clustered business data.
With reference to the first aspect, in a second implementation manner of the first aspect of the present application, the performing non-negative matrix factorization on the clustered business data to obtain a key feature data set includes:
constructing a data matrix from the clustered business data;
Extracting dimension data of the data matrix, and carrying out standardization processing on the data matrix according to the dimension data to obtain a standardization matrix;
Performing initial non-negative matrix factorization on the standardized matrix to obtain two non-negative matrices;
performing element initialization processing on the two non-negative matrixes to obtain two initialized non-negative matrixes;
Performing iterative optimization on the two initialized non-negative matrixes through a preset iterative rule, and obtaining a base matrix and a coefficient matrix when a preset stopping condition is met;
performing feature extraction and sparsification processing on the base matrix to obtain feature data corresponding to the base matrix;
performing regularization processing on the coefficient matrix to obtain feature data corresponding to the coefficient matrix;
And carrying out feature data fusion on the feature data corresponding to the base matrix and the feature data corresponding to the coefficient matrix to obtain the key feature data set.
With reference to the first aspect, in a third implementation manner of the first aspect of the present application, the performing fuzzy logic hybrid modeling processing on the key feature data set to obtain a business hybrid model includes:
performing fuzzy variable matching on the key feature data to obtain fuzzy variable data;
performing fuzzification processing on the key feature data according to the fuzzy variable data to obtain corresponding fuzzy data;
performing defuzzification processing on the fuzzy data to obtain a numerical value set corresponding to the fuzzy data;
and performing linear regression modeling processing on the numerical value set to obtain the business hybrid model.
With reference to the first aspect, in a fourth implementation manner of the first aspect of the present application, the performing differential optimization processing on the business hybrid model to obtain model parameters corresponding to the business hybrid model includes:
performing population scale analysis on the business hybrid model through a differential evolution algorithm to obtain a corresponding population scale;
performing parameter range analysis on the population scale to obtain a target parameter range;
randomly generating a plurality of initial individuals within the target parameter range according to the population scale;
performing iterative optimization on the plurality of initial individuals through a preset mutation strategy to obtain a target individual;
and performing parameter mapping on the target individual to obtain the model parameters corresponding to the business hybrid model.
With reference to the first aspect, in a fifth implementation manner of the first aspect of the present application, the performing factor anomaly detection processing on the model parameter to obtain an anomaly detection result includes:
performing covariance matrix analysis on the model parameters to obtain a covariance matrix;
performing matrix conversion on the covariance matrix to obtain a correlation matrix;
performing eigenvalue decomposition on the correlation matrix to obtain eigenvalues and eigenvectors;
performing factor screening on the feature vector according to the feature value to obtain a plurality of target factors;
Respectively carrying out factor load matrix calculation on each target factor to obtain a factor load matrix of each target factor;
performing orthogonal rotation on the factor load matrix of each target factor to obtain a rotation matrix of each target factor;
respectively constructing a factor score matrix of the rotation matrix of each target factor to obtain a factor score matrix corresponding to each target factor;
calculating the Mahalanobis distance between each target factor and a preset sample factor according to the factor score matrix corresponding to each target factor;
and detecting abnormal data in the model parameters according to the Mahalanobis distance between each target factor and the preset sample factor to obtain the anomaly detection result.
With reference to the first aspect, in a sixth implementation manner of the first aspect of the present application, the performing self-organizing map visualization analysis processing on the anomaly detection result to obtain a business risk visualization report includes:
extracting a topological structure corresponding to the self-organizing map, and matching to obtain weight data of each grid node in the topological structure;
performing abnormal node mapping on the anomaly detection result based on the weight data of each grid node to obtain a target abnormal node;
extracting a data tag of the target abnormal node, and performing grid color coding on the topological structure according to the data tag of the target abnormal node to obtain a coded topological structure;
and generating a risk heat map and a trend analysis chart according to the coded topological structure, and generating the business risk visualization report according to the risk heat map and the trend analysis chart.
In a second aspect, the present application provides an enterprise business data mining apparatus, comprising:
the analysis module is used for performing multi-view cluster analysis processing on the acquired enterprise business data to obtain clustered business data;
the decomposition module is used for performing non-negative matrix factorization processing on the clustered business data to obtain a key feature data set;
the modeling module is used for performing fuzzy logic hybrid modeling processing on the key feature data set to obtain a business hybrid model;
the optimization module is used for performing differential optimization processing on the business hybrid model to obtain model parameters corresponding to the business hybrid model;
the detection module is used for performing factor anomaly detection processing on the model parameters to obtain an anomaly detection result;
and the processing module is used for performing self-organizing map visualization analysis processing on the anomaly detection result to obtain a business risk visualization report.
A third aspect of the present application provides an enterprise business data mining apparatus, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the enterprise business data mining apparatus to perform the enterprise business data mining method described above.
A fourth aspect of the present application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the enterprise business data mining method described above.
According to the technical scheme provided by the application, the multi-view clustering analysis processing can be used for comprehensively acquiring and processing the multi-source business data of the enterprise, and the multi-view analysis method ensures that the data processing is not limited to single-dimension information, but integrates multiple dimensions such as time, departments, businesses and the like, so that the potential modes and complex relations in the data are more comprehensively disclosed. The multi-view cluster analysis results provide high-quality input data for subsequent data decomposition and feature extraction, so that the whole data mining process is more accurate and deep. By the non-negative matrix factorization process, the key feature data set can be effectively extracted. Non-negative matrix factorization maps the high-dimensional space of raw data to a low-dimensional feature space by factoring the data matrix into a base matrix and a coefficient matrix. The process not only reserves main information of the data, but also improves the interpretability of the features through the technologies of sparsification, regularization and the like, so that the subsequent modeling and analysis are more efficient and accurate. Fuzzy logic hybrid modeling is another important technical means of the application, and the ambiguity and uncertainty in the data can be processed through the combination of fuzzy logic and traditional regression analysis. In fuzzy logic hybrid modeling, key feature data is subjected to fuzzification processing to obtain fuzzy variable data, and then the fuzzy variable data is converted into a digital value set through defuzzification processing. And carrying out linear regression modeling on the numerical value set to obtain a business hybrid model. The method not only can process complex relations in the data, but also can improve the robustness and generalization capability of the model. The differential optimization process further improves the performance of the business mixing model. And carrying out population scale analysis, parameter range analysis and iterative optimization on the business hybrid model through a differential evolution algorithm, so that the optimal parameters of the model can be effectively found. The differential evolution algorithm has global searching capability, and can avoid the problem of local optimization, thereby ensuring that the optimized model parameters have higher fitness and stability. The factor anomaly detection processing is one of the key steps of the application, and the anomaly data in the model parameters can be effectively identified through the steps of covariance matrix analysis, eigenvalue decomposition, factor screening and the like. Factor analysis can extract the main factors in the data, reduce the dimensionality of the data, and retain the main information. And further extracting a factor score matrix through calculation of the factor load matrix and the rotation matrix, and detecting abnormal data by using a mahalanobis distance. The method can accurately identify potential business risks and provide powerful support for risk management of enterprises. And the self-organizing map visual analysis processing intuitively displays the complex abnormal detection result. And mapping the high-dimensional data to a low-dimensional space through self-organizing mapping, and maintaining the topological structure of the data to generate a risk heat map and a trend analysis map. 
The visualization method not only can display the abnormal detection result, but also can reveal potential modes and trends in the data, and helps the management layer to better understand and cope with business risks. The generation of the service risk visualization report provides an intuitive and easy-to-understand risk management tool for enterprises, and the scientificity and the effectiveness of decision making are greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained based on these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an embodiment of an enterprise business data mining method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of an embodiment of an enterprise business data mining apparatus according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides an enterprise business data mining method, device, equipment and storage medium. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present application is described below with reference to fig. 1, where an embodiment of an enterprise business data mining method according to an embodiment of the present application includes:
step S101, performing multi-view cluster analysis processing on the acquired enterprise business data to obtain clustered business data;
It will be appreciated that the execution body of the present application may be an enterprise business data mining apparatus, and may also be a terminal or a server, which is not limited herein. The embodiments of the application are described below by taking a server as the execution body as an example.
Specifically, multidimensional feature extraction is performed on the enterprise business data to obtain a multidimensional feature set corresponding to the enterprise business data, capturing multifaceted information of the business data. Principal component analysis is then performed on the multidimensional feature set to obtain dimension reduction feature data corresponding to the multidimensional feature set. Principal component analysis reduces the dimensionality of the data while retaining most of the important information in the original data, so that subsequent processing is more efficient. A K-Means clustering algorithm is applied to perform preliminary clustering on the dimension reduction feature data to obtain preliminary clustering data. The K-Means algorithm gathers similar data points together by iteratively optimizing the cluster centers, forming a preliminary clustering result. Density clustering is then performed on the preliminary clustering data, and noise point data and dense region data in the preliminary clustering data are extracted. The density clustering algorithm effectively identifies noise and dense regions by analyzing the density of the data points. Data screening is performed on the preliminary clustering data according to the noise point data and the dense region data to obtain target clustering data. By eliminating noise points and focusing on dense regions, the quality and reliability of the target clustering data are ensured. Finally, time dimension analysis is performed on the target clustering data to obtain time dimension data, department dimension analysis is performed on the target clustering data to obtain department dimension data, and service dimension analysis is performed on the target clustering data to obtain service dimension data. Time dimension analysis helps reveal how the business data changes over time, department dimension analysis reveals differences in business characteristics among departments, and service dimension analysis focuses on the characteristics of each business activity. A minimal code sketch of this pipeline is given below.
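As an illustration of the clustering pipeline in step S101, the following minimal Python sketch chains principal component analysis, K-Means and density clustering with scikit-learn. The function name, the number of components and clusters, and the DBSCAN parameters are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of the multi-view clustering step (illustrative parameters).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN

def multi_view_cluster(features: np.ndarray, n_components: int = 5, n_clusters: int = 4):
    """features: (n_samples, n_features) matrix of multidimensional business features."""
    # Principal component analysis for dimensionality reduction.
    reduced = PCA(n_components=n_components).fit_transform(features)

    # Preliminary clustering with K-Means.
    kmeans_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(reduced)

    # Density clustering to separate dense regions from noise points (-1 marks noise in DBSCAN).
    db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(reduced)
    dense_mask = db_labels != -1

    # Screen the preliminary clusters: keep only points that fall inside dense regions.
    return reduced[dense_mask], kmeans_labels[dense_mask]
```

The screened points can then be grouped by time, department and business type to form the clustered business data.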
Step S102, performing non-negative matrix factorization on the clustered business data to obtain a key feature data set;
Specifically, a data matrix is constructed from the clustered business data, organizing the business data into a form suitable for matrix operations. Dimension data of the data matrix are extracted, and the data matrix is standardized according to the dimension data to obtain a standardized matrix. The purpose of standardization is to eliminate dimensional differences in the data so that features of different scales can be compared and operated on under the same scale, which enhances the stability and accuracy of the algorithm. Initial non-negative matrix factorization is performed on the standardized matrix to obtain two non-negative matrices. By decomposing the original matrix into two submatrices, the non-negative matrix factorization technique simplifies the structure of the data and highlights its key features. Element initialization is performed on the two non-negative matrices to obtain two initialized non-negative matrices. The element initialization provides a reasonable starting point for the iterative optimization of the matrix factorization, accelerating convergence and improving the decomposition result. The two initialized non-negative matrices are then iteratively optimized through a preset iteration rule, and a base matrix and a coefficient matrix are obtained when a preset stopping condition is met. The iterative optimization continuously adjusts the element values of the matrices so that the factorization gradually approaches the optimal solution, ensuring the accuracy and reliability of the decomposition result. When the optimization process meets the preset stopping condition (for example, the number of iterations reaches an upper limit or the error falls below a set threshold), the base matrix and the coefficient matrix are determined. Feature extraction and sparsification are performed on the base matrix to obtain feature data corresponding to the base matrix. Feature extraction identifies the elements representing the main features of the data in the base matrix, and sparsification reduces redundant information so that the feature data is more concise and representative. Similarly, the coefficient matrix is regularized to obtain feature data corresponding to the coefficient matrix. Regularization prevents overfitting by introducing constraints, improving the generalization capability and robustness of the feature data. Finally, the feature data corresponding to the base matrix and the feature data corresponding to the coefficient matrix are fused to obtain the key feature data set.
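The factorization step can be sketched with scikit-learn as below. The scaling choice (min-max, so the input stays non-negative), the rank, the sparsification threshold and the fusion rule are assumptions for illustration only.

```python
# Sketch of the non-negative matrix factorization step (one possible realization).
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import NMF

def extract_key_features(data_matrix: np.ndarray, rank: int = 10):
    """data_matrix: (n_samples, n_indicators) clustered business data."""
    # Scale each indicator to [0, 1]; NMF requires a non-negative input matrix.
    X = MinMaxScaler().fit_transform(data_matrix)

    # X ~= W @ H, where W plays the role of the base matrix and H the coefficient matrix.
    model = NMF(n_components=rank, init="random", max_iter=500, random_state=0)
    W = model.fit_transform(X)      # (n_samples, rank)
    H = model.components_           # (rank, n_indicators)

    # Simple sparsification: zero out very small loadings.
    H_sparse = np.where(np.abs(H) < 1e-3, 0.0, H)

    # L2 row normalization of the factor scores as a regularization step.
    W_norm = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)

    # Fuse both views into one key feature set per sample.
    return np.hstack([W_norm, X @ H_sparse.T])
```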
Step S103, performing fuzzy logic hybrid modeling processing on the key feature data set to obtain a business hybrid model;
Specifically, fuzzy variable matching is performed on the key feature data to obtain fuzzy variable data. By associating the key feature data with predefined fuzzy variables, a link between the feature data and fuzzy concepts is established so that the data can be processed within a fuzzy logic framework. The key feature data are then fuzzified according to the fuzzy variable data to obtain corresponding fuzzy data. Fuzzification converts precise numerical data into fuzzy sets, so that the data better reflects the uncertainty and ambiguity of the real world and the flexibility and robustness of data processing are improved. Defuzzification is then performed on the fuzzy data to obtain a numerical value set corresponding to the fuzzy data, converting the fuzzy sets back into specific values to facilitate subsequent quantitative analysis and model construction. Finally, linear regression modeling is performed on the numerical value set to obtain the business hybrid model: fitting the numerical value set establishes a linear relationship between the data features and the business outcome.
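A compact sketch of fuzzification, centroid defuzzification and the linear-regression stage follows. The triangular membership functions, the breakpoints a, b, c and the representative set values (20/50/80, echoing the worked example in the detailed embodiment) are assumptions, as are the toy data values.

```python
# Sketch of fuzzy logic hybrid modeling: fuzzify, defuzzify, then fit a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression

def triangular_memberships(x, a, b, c):
    """Membership of x in the 'low', 'medium', 'high' fuzzy sets (assumes a < b < c)."""
    low = np.clip((b - x) / (b - a), 0.0, 1.0)
    medium = np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)
    high = np.clip((x - b) / (c - b), 0.0, 1.0)
    return np.stack([low, medium, high], axis=-1)

def defuzzify(memberships, set_values=(20.0, 50.0, 80.0)):
    """Centroid defuzzification: membership-weighted average of representative values."""
    v = np.asarray(set_values)
    return (memberships * v).sum(axis=-1) / memberships.sum(axis=-1)

# Fuzzify a key feature column, defuzzify it back to crisp values, then fit the linear part.
sales = np.array([30.0, 45.0, 50.0, 70.0, 85.0])
profit = np.array([3.0, 4.2, 5.0, 6.8, 8.1])          # illustrative target values
crisp_sales = defuzzify(triangular_memberships(sales, a=20.0, b=50.0, c=80.0))
hybrid_model = LinearRegression().fit(crisp_sales.reshape(-1, 1), profit)
```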
Step S104, performing differential optimization processing on the business hybrid model to obtain model parameters corresponding to the business hybrid model;
Specifically, population scale analysis is performed on the business hybrid model through a differential evolution algorithm to obtain a corresponding population scale. The differential evolution algorithm is a population-based optimization algorithm, and analyzing the population scale determines the number of individuals to be processed during optimization. Parameter range analysis is then performed on the population scale to obtain a target parameter range. Determining the value range of the model parameters ensures that searching and adjustment take place within a reasonable range, guaranteeing the accuracy and effectiveness of the optimization result. Within the target parameter range, a plurality of initial individuals are randomly generated according to the population scale. Randomly generating multiple initial solutions within the parameter range provides the differential evolution algorithm with a starting point for its search, enabling it to select and optimize from a variety of candidate solutions. Iterative optimization is performed on the plurality of initial individuals through a preset mutation strategy to obtain a target individual. The mutation strategy is the core of the differential evolution algorithm: it generates new candidate solutions by mutating the initial individuals, and the selection mechanism continuously updates the individuals in the population so that they gradually approach the optimal solution. During iterative optimization, the algorithm mutates and crosses over individuals according to the preset mutation strategy, continuously generates new individuals, selects the better-performing individuals according to the fitness function, and updates the population. When the iteration meets the preset stopping condition (for example, the maximum number of iterations is reached or the fitness function converges), the resulting individual is the target individual. Finally, parameter mapping is performed on the target individual to obtain the model parameters corresponding to the business hybrid model, converting the solution found by the search into the specific parameters of the business hybrid model so that the model can be practically applied to the analysis and prediction of business data.
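This stage can be approximated with SciPy's differential evolution routine, as in the sketch below. The fitness function (squared prediction error of a linear hybrid model), the parameter bounds and the control parameters are illustrative assumptions.

```python
# Sketch of tuning the hybrid model parameters with differential evolution (SciPy).
import numpy as np
from scipy.optimize import differential_evolution

def fitness(params, X_val, y_val):
    """Lower is better: squared error of a linear hybrid model with candidate parameters."""
    intercept, *coefs = params
    pred = intercept + X_val @ np.asarray(coefs)
    return float(np.mean((pred - y_val) ** 2))

def optimize_hybrid_model(X_val, y_val):
    bounds = [(-10.0, 10.0)] * (X_val.shape[1] + 1)   # target parameter range per coefficient
    result = differential_evolution(
        fitness,
        bounds,
        args=(X_val, y_val),
        popsize=20,            # population scale
        mutation=(0.5, 1.0),   # mutation factor (dithered)
        recombination=0.7,     # crossover probability
        maxiter=200,
        seed=0,
    )
    return result.x            # parameter vector mapped back onto the business hybrid model
```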
Step S105, performing factor anomaly detection processing on the model parameters to obtain anomaly detection results;
Specifically, covariance matrix analysis is performed on the model parameters to obtain a covariance matrix. Covariance matrix analysis reveals the linear relationships among parameters by calculating the covariance between the different parameters. The covariance matrix is then converted into a correlation matrix. The correlation matrix is a standardized form of the covariance matrix whose elements are the correlation coefficients between parameters, ranging from -1 to 1, which makes the correlations between parameters easier to analyze. Eigenvalue decomposition is performed on the correlation matrix to obtain eigenvalues and eigenvectors. Eigenvalue decomposition extracts the principal components of the data: the eigenvalues represent the importance of each component, and the eigenvectors describe the direction and nature of the components. Factor screening is performed on the eigenvectors according to the eigenvalues to obtain a plurality of target factors. By selecting the components with larger eigenvalues, the selected factors are guaranteed to account for most of the variability in the data, simplifying the data structure and highlighting the main features. A factor loading matrix is then calculated for each target factor. The factor loading matrix describes the projection of the original parameters onto the factors and reflects the relationship between each factor and the original parameters. Orthogonal rotation is applied to the factor loading matrix of each target factor to obtain a rotation matrix for each target factor. Orthogonal rotation adjusts the orientation of the factors so that the factor structure becomes clearer and each factor explains an independent portion of the parameter variability, which improves the interpretability of the factor analysis. A factor score matrix is then constructed from the rotation matrix of each target factor, yielding a factor score matrix corresponding to each target factor. The factor score matrix projects the raw data into the factor space and computes a score for each sample on each factor. The Mahalanobis distance between each target factor and a preset sample factor is calculated according to the factor score matrix corresponding to each target factor. The Mahalanobis distance is a metric that accounts for the correlation of the data when measuring multidimensional distances between samples, so it can accurately evaluate sample similarity. Finally, abnormal data in the model parameters are detected according to the Mahalanobis distance between each target factor and the preset sample factor, yielding the anomaly detection result: abnormal data are identified by comparing the calculated Mahalanobis distance with a preset threshold.
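A condensed Python sketch of factor-based anomaly scoring follows. The orthogonal rotation step is omitted for brevity, and the eigenvalue-greater-than-one screening rule and the distance threshold are assumptions not fixed by the text.

```python
# Condensed sketch of factor extraction plus Mahalanobis-distance anomaly flagging.
import numpy as np

def factor_anomaly_scores(params: np.ndarray):
    """params: (n_samples, n_params) matrix of model parameter observations."""
    Z = (params - params.mean(axis=0)) / (params.std(axis=0) + 1e-12)  # standardize
    R = np.corrcoef(Z, rowvar=False)                                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                               # eigenvalue decomposition
    order = np.argsort(eigvals)[::-1]
    keep = eigvals[order] > 1.0                                        # factor screening (Kaiser-style)
    if not keep.any():
        keep[0] = True                                                 # always keep the leading factor
    V = eigvecs[:, order][:, keep]

    scores = Z @ V                                                     # factor score matrix
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(scores, rowvar=False)))
    diff = scores - scores.mean(axis=0)
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)                 # squared Mahalanobis distance
    return np.sqrt(d2)

def detect_anomalies(params, threshold=3.0):
    # Flag observations whose Mahalanobis distance exceeds a preset threshold.
    return factor_anomaly_scores(params) > threshold
```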
Step S106, performing self-organizing map visualization analysis processing on the anomaly detection result to obtain a business risk visualization report.
Specifically, the topological structure corresponding to the self-organizing map is extracted, and the weight data of each grid node in the topological structure are obtained by matching. The self-organizing map is an unsupervised learning algorithm that maps high-dimensional data into a low-dimensional space to form a grid-shaped topological structure; each grid node represents a characteristic pattern, and the weight data of a node reflects its characteristics. Abnormal node mapping is performed on the anomaly detection result based on the weight data of each grid node to obtain target abnormal nodes. By comparing the anomaly detection result with the weight data of the self-organizing map, the grid nodes that correspond to the abnormal data are identified as the target abnormal nodes. These target abnormal nodes represent the regions of the data where anomalies exist and show where the abnormal data is concentrated. The data tags of the target abnormal nodes are extracted, and grid color coding is applied to the topological structure according to these data tags to obtain a coded topological structure. Extracting the data tags classifies and labels the target abnormal nodes by their specific characteristics, and the grid color coding assigns different colors to different types of abnormal nodes, so that the topological structure intuitively reflects the distribution and severity of the abnormal data. A risk heat map and a trend analysis chart are then generated from the coded topological structure. The risk heat map visually displays, through color coding, the regions where abnormal conditions are concentrated, helping users quickly identify high-risk areas. The trend analysis chart reveals, through time series analysis of the abnormal nodes, how the abnormal situations develop and change, providing information on their dynamics. Finally, the business risk visualization report is generated from the risk heat map and the trend analysis chart. The report integrates the contents of the risk heat map and the trend analysis chart and presents the abnormal situations and their potential risks through a combination of charts and text.
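The self-organizing map stage can be sketched in plain NumPy as below. The grid size, learning-rate schedule and iteration count are illustrative assumptions, and the output is only the node coordinates that a heat map renderer would then color-code.

```python
# Minimal self-organizing map sketch: train a small grid, then map anomalies onto grid nodes.
import numpy as np

def train_som(data, grid=(8, 8), iters=2000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))               # weight data of each grid node
    gy, gx = np.mgrid[0:h, 0:w]
    for t in range(iters):
        x = data[rng.integers(len(data))]
        d = np.linalg.norm(weights - x, axis=2)
        by, bx = np.unravel_index(np.argmin(d), d.shape)      # best-matching unit
        lr = lr0 * np.exp(-t / iters)
        sigma = sigma0 * np.exp(-t / iters)
        neigh = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
        weights += lr * neigh[..., None] * (x - weights)      # pull nearby nodes toward the sample
    return weights

def map_anomalies(weights, anomalous_samples):
    """Return the grid coordinates (target abnormal nodes) hit by each anomalous sample."""
    hits = []
    for x in anomalous_samples:
        d = np.linalg.norm(weights - x, axis=2)
        hits.append(np.unravel_index(np.argmin(d), d.shape))
    return hits   # these coordinates can be color-coded to render the risk heat map
```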
In the embodiment of the application, the multi-view clustering analysis processing can comprehensively acquire and process the multi-source business data of enterprises, and the multi-view analysis method ensures that the data processing is not only limited to single-dimension information, but also fuses a plurality of dimensions such as time, departments, businesses and the like, thereby more comprehensively revealing the potential modes and complex relations in the data. The multi-view cluster analysis results provide high-quality input data for subsequent data decomposition and feature extraction, so that the whole data mining process is more accurate and deep. By the non-negative matrix factorization process, the key feature data set can be effectively extracted. Non-negative matrix factorization maps the high-dimensional space of raw data to a low-dimensional feature space by factoring the data matrix into a base matrix and a coefficient matrix. The process not only reserves main information of the data, but also improves the interpretability of the features through the technologies of sparsification, regularization and the like, so that the subsequent modeling and analysis are more efficient and accurate. Fuzzy logic hybrid modeling is another important technical means of the application, and the ambiguity and uncertainty in the data can be processed through the combination of fuzzy logic and traditional regression analysis. In fuzzy logic hybrid modeling, key feature data is subjected to fuzzification processing to obtain fuzzy variable data, and then the fuzzy variable data is converted into a digital value set through defuzzification processing. And carrying out linear regression modeling on the numerical value set to obtain a business hybrid model. The method not only can process complex relations in the data, but also can improve the robustness and generalization capability of the model. The differential optimization process further improves the performance of the business mixing model. And carrying out population scale analysis, parameter range analysis and iterative optimization on the business hybrid model through a differential evolution algorithm, so that the optimal parameters of the model can be effectively found. The differential evolution algorithm has global searching capability, and can avoid the problem of local optimization, thereby ensuring that the optimized model parameters have higher fitness and stability. The factor anomaly detection processing is one of the key steps of the application, and the anomaly data in the model parameters can be effectively identified through the steps of covariance matrix analysis, eigenvalue decomposition, factor screening and the like. Factor analysis can extract the main factors in the data, reduce the dimensionality of the data, and retain the main information. And further extracting a factor score matrix through calculation of the factor load matrix and the rotation matrix, and detecting abnormal data by using a mahalanobis distance. The method can accurately identify potential business risks and provide powerful support for risk management of enterprises. And the self-organizing map visual analysis processing intuitively displays the complex abnormal detection result. And mapping the high-dimensional data to a low-dimensional space through self-organizing mapping, and maintaining the topological structure of the data to generate a risk heat map and a trend analysis map. 
The visualization method not only can display the abnormal detection result, but also can reveal potential modes and trends in the data, and helps the management layer to better understand and cope with business risks. The generation of the service risk visualization report provides an intuitive and easy-to-understand risk management tool for enterprises, and the scientificity and the effectiveness of decision making are greatly improved.
In a specific embodiment, the process of executing step S101 may specifically include the following steps:
(1) Extracting multidimensional features of the enterprise business data to obtain a multidimensional feature set corresponding to the enterprise business data;
(2) Performing principal component analysis on the multi-dimensional feature set to obtain dimension reduction feature data corresponding to the multi-dimensional feature set;
(3) Performing preliminary clustering on the dimension reduction characteristic data through a K-Means clustering algorithm to obtain corresponding preliminary clustering data;
(4) Performing density clustering on the preliminary clustering data, and extracting noise point data and dense area data in the preliminary clustering data;
(5) Data screening is carried out on the preliminary clustering data according to the noise point data and the dense area data, and target clustering data is obtained;
(6) Performing time dimension analysis on the target cluster data to obtain time dimension data, performing department dimension analysis on the target cluster data to obtain department dimension data, and performing service dimension analysis on the target cluster data to obtain service dimension data;
(7) And merging the time dimension data, the department dimension data and the service dimension data into the clustered business data.
In particular, comprehensive business data is obtained, which may include sales records, customer information, inventory data, financial statements, and the like. The raw data is converted into a structured data set by data preprocessing techniques such as data cleansing and normalization. Multidimensional feature extraction is then carried out on the data by applying feature engineering methods. For example, for sales data, features such as the sales amount, sales volume and sales time of each product are extracted; for customer data, features such as customer purchase frequency, average purchase amount and customer satisfaction are extracted. These multidimensional features together form the multidimensional feature set of the enterprise business data. In order to reduce the dimensionality of the data, improve computational efficiency and retain the main information, principal component analysis is performed. Principal component analysis converts the original multidimensional feature data into a new coordinate system by a linear transformation, so that the coordinates (principal components) in the new coordinate system are uncorrelated with each other and are ordered by the variance they explain. For example, let the original multidimensional feature data matrix be $X$ and its covariance matrix be $C$; the objective of principal component analysis is to solve for the eigenvalues and eigenvectors of $C$. The covariance matrix $C$ can be calculated by the following formula:
$C = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^{T}$
where $n$ is the number of samples, $x_i$ is the $i$-th sample, and $\bar{x}$ is the sample mean. By solving for the eigenvalues $\lambda$ and eigenvectors $v$ of $C$, the principal components of the data are obtained, and the original data are projected onto the principal components to obtain the dimension reduction feature data. Preliminary clustering is then performed on the dimension reduction feature data using a K-Means clustering algorithm. The data set is divided into k clusters so that data points in the same cluster are similar to each other and data points in different clusters differ greatly. The K-Means algorithm randomly initializes k cluster centers, assigns each data point to the nearest cluster center, recalculates the cluster centers according to the assignment, and repeats this process until the cluster centers no longer change or the preset number of iterations is reached. Density clustering is then performed on the preliminary clustering data. Density clustering algorithms (e.g., DBSCAN) divide areas of high density into clusters by analyzing the density of data points and mark points in low-density areas as noise. The DBSCAN algorithm is controlled by two parameters, the neighborhood radius and the minimum number of points in a neighborhood: for each data point, it checks whether the number of points within the neighborhood radius is larger than the minimum number; if so, the data point is marked as a core point and the points in its neighborhood are assigned to the same cluster. In this way, the noise point data and dense region data in the preliminary clustering data are extracted. Data screening is performed on the preliminary clustering data according to the noise point data and the dense region data to obtain the target clustering data: the noise point data can be regarded as abnormal data and is screened out of the final clustering result, while the dense region data is retained as the target clustering data. Time dimension analysis is performed on the target clustering data: the data points in each cluster are arranged in chronological order, and the pattern of change over time is analyzed. For example, the trend of the number of data points in each time period is calculated and the peak and trough periods of business activity are identified, yielding the time dimension data. Department dimension analysis is performed on the target clustering data: the data points are divided and counted according to the departments they belong to, and the performance and characteristics of different departments in business activities are analyzed. For example, the number, mean and variance of data points in the target cluster for each department are calculated, yielding the department dimension data. Service dimension analysis is performed on the target clustering data: the data points are divided and counted according to business type, and the performance and characteristics of the different business types within the target cluster are analyzed. For example, the number, mean and variance of data points in the target cluster for each business type are calculated, yielding the service dimension data. Finally, the time dimension data, the department dimension data and the service dimension data are merged into the clustered business data to form a comprehensive data set.
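The time, department and service dimension statistics described above can be computed with a few pandas group-by operations, as sketched below; the column names ("timestamp", "department", "business_type", "amount") are assumptions, and "timestamp" is assumed to hold datetime values.

```python
# Sketch of the time / department / business dimension analysis on the target cluster data.
import pandas as pd

def dimension_analysis(target_clusters: pd.DataFrame):
    # Monthly counts and means reveal activity peaks and troughs over time.
    time_dim = (target_clusters
                .groupby(pd.Grouper(key="timestamp", freq="M"))["amount"]
                .agg(["count", "mean"]))
    # Per-department and per-business-type count, mean and variance.
    dept_dim = target_clusters.groupby("department")["amount"].agg(["count", "mean", "var"])
    biz_dim = target_clusters.groupby("business_type")["amount"].agg(["count", "mean", "var"])
    # The three views together form the clustered business data set.
    return {"time": time_dim, "department": dept_dim, "business": biz_dim}
```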
In a specific embodiment, the process of executing step S102 may specifically include the following steps:
(1) Constructing a data matrix from the clustered business data;
(2) Extracting dimension data of the data matrix, and carrying out standardization processing on the data matrix according to the dimension data to obtain a standardization matrix;
(3) Performing initial non-negative matrix factorization on the standardized matrix to obtain two non-negative matrices;
(4) Element initialization processing is carried out on the two non-negative matrixes to obtain two initialized non-negative matrixes;
(5) Performing iterative optimization on the two initialized non-negative matrixes through a preset iterative rule, and obtaining a base matrix and a coefficient matrix when a preset stopping condition is met;
(6) Performing feature extraction and sparsification processing on the base matrix to obtain feature data corresponding to the base matrix;
(7) Regularization treatment is carried out on the coefficient matrix to obtain characteristic data corresponding to the coefficient matrix;
(8) And carrying out feature data fusion on the feature data corresponding to the base matrix and the feature data corresponding to the coefficient matrix to obtain a key feature data set.
Specifically, first, the following is performed. The traffic data is converted into a form suitable for matrix operations. Assume that a data set is provided that contains different business indicia, such as sales, sales volume, customer satisfaction, product return rate, etc. The data are arranged in specific dimensions (e.g., time, department, product category, etc.) to form a data matrix X, where each row represents a business sample and each column represents a business index. For example, assume that the size of the data matrix X is m×n, where m is the number of samples and n is the number of traffic indicators. And extracting dimension data of the data matrix, and carrying out standardization processing on the data matrix according to the dimension data to obtain a standardized matrix. The purpose of the normalization process is to eliminate dimensional differences between different business indexes, so that the indexes are compared and operated on the same scale. Normalization methods may employ Z-score normalization, subtracting the mean value from the data for each index and dividing by the standard deviation. Performing initial non-negative matrix factorization on the standardized matrix to decompose the standardized matrix into two non-negative matricesAnd. One non-negative matrix is decomposed into the product of two non-negative matrices, the formula is as follows:
;
wherein, Is thatIs used for the base matrix of the (c),Is thatIs used for the coefficient matrix of (a),Is a decomposed rank (in generalAnd). The initial decomposition of the initial non-negative matrix factorization may be by randomly initializing the matrixAndIs realized by the elements of (a). And carrying out element initialization processing on the two non-negative matrixes to obtain two initialized non-negative matrixes. A reasonable starting point is provided for iterative optimization, so that the convergence speed is increased and the decomposition effect is improved. Generating an initial matrix using a random number generatorAndEnsuring that all elements are non-negative. And carrying out iterative optimization on the two initialized non-negative matrixes through a preset iterative rule. When the preset stopping condition is met, obtaining a final base matrixCoefficient matrix. The iterative optimization process generally adopts a multiplication update rule, and the update formula is as follows:
;
;
The update process is iterated until the matrices W and H converge to a steady state or a preset number of iterations is reached. Regularization is then applied to the coefficient matrix to obtain the feature data corresponding to the coefficient matrix. Regularization prevents overfitting by introducing constraints and improves the generalization capability and robustness of the feature data. Common regularization methods include L1 regularization and L2 regularization; taking row-wise L2 normalization as an example, the formula is:
h'_ij = h_ij / ‖h_i‖₂;
wherein h'_ij is the element of the regularized coefficient matrix and ‖h_i‖₂ is the L2 norm of the i-th row of H. Feature data fusion is then performed on the feature data corresponding to the base matrix and that corresponding to the coefficient matrix to obtain the key feature data set, integrating the features extracted from both matrices into a comprehensive feature data set.
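As an illustrative sketch only, the following Python code shows one way to carry out the decomposition and row-wise regularization described above using the multiplicative update rules; the matrix size, rank k, iteration limit, tolerance and the small constants added for numerical stability are assumptions of this sketch, not values prescribed by the method.

```python
# Minimal sketch of the non-negative matrix factorization step described above.
# The input matrix is assumed to be non-negative (e.g., min-max scaled business data).
import numpy as np

def nmf_multiplicative(X, k, max_iter=500, tol=1e-6, seed=0):
    """Factor a non-negative matrix X (m x n) into W (m x k) and H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-4          # random non-negative initialization
    H = rng.random((k, n)) + 1e-4
    prev_err = np.inf
    for _ in range(max_iter):
        # Multiplicative update rules for the Frobenius-norm loss
        H *= (W.T @ X) / (W.T @ W @ H + 1e-10)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-10)
        err = np.linalg.norm(X - W @ H)
        if abs(prev_err - err) < tol:      # preset stopping condition
            break
        prev_err = err
    # L2-normalize each row of H, mirroring the regularization of the coefficient matrix
    H_reg = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-10)
    return W, H_reg

# Example: a 6-sample x 4-indicator demo matrix reduced to k = 2 latent features
rng_demo = np.random.default_rng(42)
X = rng_demo.random((6, 4))
W, H = nmf_multiplicative(X, k=2)
```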
In a specific embodiment, the process of executing step S103 may specifically include the following steps:
(1) Carrying out fuzzy variable matching on the key characteristic data to obtain fuzzy variable data;
(2) Carrying out fuzzification processing on the key characteristic data according to the fuzzy variable data to obtain corresponding fuzzy data;
(3) Performing defuzzification processing on the fuzzy data to obtain a numerical value set corresponding to the fuzzy data;
(4) And carrying out linear regression modeling processing on the numerical value set to obtain a business hybrid model.
Specifically, the specific features and attributes of the key feature data are identified. For example, in an enterprise business data set, the key feature data may include sales, customer satisfaction, inventory level, and the like. These data are converted into fuzzy variables by defining fuzzy sets and membership functions for each feature. The definition of the fuzzy sets may be based on business experience and expert knowledge. For example, sales may be described by three fuzzy sets "low", "medium" and "high", each set having a corresponding membership function describing the degree to which a sales value belongs to that set. Suppose the membership functions for a sales value x are as follows: when sales are lower than or equal to a certain value a, the membership in the "low" set is 1; when sales are between a and b, the membership in the "low" set is (b − x)/(b − a); when sales are greater than or equal to b, the membership in the "low" set is 0. Likewise, when sales are between a and b, the membership in the "medium" set is (x − a)/(b − a); between b and c, the membership in the "medium" set is (c − x)/(c − b). When sales are between b and c, the membership in the "high" set is (x − b)/(c − b); when sales are greater than or equal to c, the membership in the "high" set is 1. These membership functions convert sales into fuzzy variables, yielding the fuzzy variable data. For example, assuming a sales value of 50, the memberships in the "low", "medium" and "high" fuzzy sets calculated from the membership functions above are 0.2, 0.8 and 0, respectively. This indicates that a sales value of 50 has the highest membership in the "medium" fuzzy set and is most likely to belong to "medium". The key feature data are then fuzzified according to the fuzzy variable data to obtain the corresponding fuzzy data: the original numerical data are converted into memberships of the fuzzy sets, expressing the ambiguity and uncertainty of the data. For a sales value of 50, the memberships in the different fuzzy sets are calculated as described above to form a fuzzy variable data set. Defuzzification is then performed on the fuzzy data to obtain the numerical value set corresponding to the fuzzy data: the memberships of the fuzzy sets are converted back into specific values for subsequent analysis and modeling. Common defuzzification methods include the barycenter (centroid) method and the maximum membership method. Using the barycenter method, the formula is: defuzzified value = Σ(membership × set value) / Σ(membership). Following the earlier example, if the values corresponding to the fuzzy sets "low", "medium" and "high" are 20, 50 and 80 respectively, the defuzzified sales value is: (0.2 × 20 + 0.8 × 50 + 0 × 80)/(0.2 + 0.8) = 44. This means the fuzzy data are converted into the numerical value 44 by the defuzzification process. Linear regression modeling is then performed on the numerical value set to obtain the business hybrid model. Linear regression modeling describes the relationship between several feature variables and a target variable by fitting a linear model. Assuming n feature variables (e.g., sales, customer satisfaction, inventory level) and a target variable (e.g., profit), the linear regression model takes the form: profit = intercept + sales coefficient × sales + customer satisfaction coefficient × customer satisfaction + inventory level coefficient × inventory level + error term.
The regression coefficients are estimated by the least squares method, minimizing the sum of squared error terms to obtain the optimal linear regression model. Specifically, the objective function of the least squares method is: min Σ(profit − (intercept + sales coefficient × sales + customer satisfaction coefficient × customer satisfaction + inventory level coefficient × inventory level))². For example, suppose there is a data set containing three feature variables (sales, customer satisfaction, inventory level), with business profit as the target variable. The fuzzy variable data obtained through the above steps are fuzzified and defuzzified to form a numerical value set. Linear regression modeling is performed on this set, and assume the resulting regression model is: profit = 10 + 0.5 × sales + 2 × customer satisfaction − 0.3 × inventory level. The model shows that for each one-unit increase in sales, profit increases by 0.5 units; for each one-unit increase in customer satisfaction, profit increases by 2 units; and for each one-unit increase in inventory level, profit decreases by 0.3 units.
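A minimal Python sketch of the fuzzification, barycenter defuzzification and least-squares regression chain described above is given below; the breakpoints a, b, c and the sample feature data are made-up assumptions, while only the set values 20, 50 and 80 follow the worked example in the text.

```python
# Illustrative sketch of the fuzzification -> defuzzification -> regression chain.
import numpy as np

def tri_memberships(x, a=30.0, b=55.0, c=80.0):
    """Membership of x in the 'low', 'medium', 'high' fuzzy sets (piecewise linear)."""
    low  = 1.0 if x <= a else (b - x) / (b - a) if x < b else 0.0
    med  = (x - a) / (b - a) if a <= x < b else (c - x) / (c - b) if b <= x < c else 0.0
    high = 0.0 if x <= b else (x - b) / (c - b) if x < c else 1.0
    return {"low": low, "medium": med, "high": high}

def defuzzify(memberships, centers={"low": 20.0, "medium": 50.0, "high": 80.0}):
    """Barycenter (centroid) defuzzification."""
    num = sum(memberships[s] * centers[s] for s in memberships)
    den = sum(memberships.values())
    return num / den if den > 0 else 0.0

# Defuzzify a column of raw sales figures, then fit profit by ordinary least squares
sales_raw = np.array([35.0, 50.0, 72.0, 60.0, 45.0])
sales = np.array([defuzzify(tri_memberships(v)) for v in sales_raw])
satisfaction = np.array([3.1, 4.0, 4.5, 3.8, 3.5])
inventory = np.array([120.0, 90.0, 60.0, 80.0, 100.0])
profit = np.array([18.0, 26.0, 35.0, 29.0, 22.0])

A = np.column_stack([np.ones_like(sales), sales, satisfaction, inventory])
coef, *_ = np.linalg.lstsq(A, profit, rcond=None)   # intercept and three coefficients
print(dict(zip(["intercept", "sales", "satisfaction", "inventory"], coef)))
```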
In a specific embodiment, the process of executing step S104 may specifically include the following steps:
(1) Carrying out population scale analysis on the business mixed model through a differential evolution algorithm to obtain a corresponding population scale;
(2) Carrying out parameter range analysis on the population scale to obtain a target parameter range;
(3) Randomly generating a plurality of initial individuals according to population scale within a target parameter range;
(4) Performing iterative optimization on a plurality of initial individuals through a preset variation strategy to obtain target individuals;
(5) And carrying out parameter mapping on the target individual to obtain the model parameters corresponding to the service hybrid model.
Specifically, population size analysis is performed to determine the number of individuals to be handled in the optimization process. The choice of population size typically depends on the complexity of the problem and the size of the data set. For the business hybrid model, an initial population size is selected based on historical data and experience. Assuming an initial population size of 50, 50 different candidate solutions are processed simultaneously during optimization. Parameter range analysis is then performed for this population to obtain a target parameter range, so that searching and adjustment take place within a reasonable range during optimization, ensuring the accuracy and effectiveness of the optimization result. For the business hybrid model, the parameter ranges can be set according to the actual business situation and historical data. For example, assume that the parameters of the business model include the sales coefficient, customer satisfaction coefficient, and inventory level coefficient, whose value ranges are [-10, 10], [-5, 5], and [-2, 2], respectively. Within the target parameter range, a plurality of initial individuals are randomly generated according to the population size. The generation of an initial individual may be accomplished by randomly selecting values within each parameter range. Assuming n parameters (e.g., sales coefficient, customer satisfaction coefficient, inventory level coefficient), each individual is represented as an n-dimensional vector. By random generation, 50 initial individuals can be obtained, each randomly distributed over the parameter ranges. For example, an initial individual may be [3.5, -1.2, 0.8], indicating a sales coefficient of 3.5, a customer satisfaction coefficient of -1.2, and an inventory level coefficient of 0.8. The initial individuals are then iteratively optimized through a preset mutation strategy to obtain the target individual. The mutation strategy is the core of the differential evolution algorithm: it generates new candidate solutions by applying a mutation operation to the individuals and continuously updates the population through a selection mechanism so as to gradually approach the optimal solution. One common form of mutation strategy is "DE/rand/1/bin", which is formulated as follows:
v_i = x_{r1} + F · (x_{r2} − x_{r3});
wherein v_i is the mutant individual, x_{r1}, x_{r2} and x_{r3} are three different individuals randomly selected from the current population, and F is the mutation factor, usually taking a value in [0, 2]. New candidate individuals are generated by the mutation operation, and crossover and selection operations determine whether to introduce them into the population. The goal of the crossover operation is to create new individuals by exchanging part of the genes of the mutant individual with those of the current individual. The selection operation compares the fitness of the new individual and the current individual and lets the better one enter the next generation. When the iteration process meets the preset stopping condition (for example, the maximum number of iterations is reached or the fitness function converges), the finally obtained individual is the target individual. Assume that after multiple iterations an optimal individual [4.2, -0.5, 1.1] is obtained, indicating an optimal sales coefficient of 4.2, a customer satisfaction coefficient of -0.5, and an inventory level coefficient of 1.1. Parameter mapping is then performed on the target individual to obtain the model parameters corresponding to the service hybrid model; the optimized target individual values are applied to the service hybrid model so that the model can be used in practice for analysis and prediction of service data. For example, mapping the target individual [4.2, -0.5, 1.1] into the service hybrid model gives the final model: profit = 10 + 4.2 × sales + (-0.5) × customer satisfaction + 1.1 × inventory level.
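The following Python sketch illustrates a DE/rand/1/bin loop of the kind described above for tuning the three coefficients; the population size, bounds, mutation factor F, crossover rate CR and the toy fitness data are assumptions chosen to mirror the example values in the text, not parameters fixed by the method.

```python
# Hedged sketch of a DE/rand/1/bin optimizer for the regression coefficients.
import numpy as np

def differential_evolution(fitness, bounds, pop_size=50, F=0.8, CR=0.9,
                           max_gen=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)       # random initial individuals
    fit = np.array([fitness(ind) for ind in pop])
    for _ in range(max_gen):
        for i in range(pop_size):
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                    size=3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)  # DE/rand/1 mutation
            cross = rng.random(dim) < CR                     # binomial crossover mask
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial)
            if f_trial < fit[i]:                             # greedy selection
                pop[i], fit[i] = trial, f_trial
    return pop[np.argmin(fit)]

# Toy fitness: squared error of the business hybrid model on a tiny assumed data set
X = np.array([[10.0, 4.0, 5.0], [12.0, 3.5, 6.0], [9.0, 4.2, 4.0]])
y = np.array([48.0, 47.0, 46.0])
fitness = lambda c: float(np.sum((y - (10.0 + X @ c)) ** 2))
bounds = np.array([[-10.0, 10.0], [-5.0, 5.0], [-2.0, 2.0]])  # parameter ranges from the text
best = differential_evolution(fitness, bounds)
```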
In a specific embodiment, the process of executing step S105 may specifically include the following steps:
(1) Performing covariance matrix analysis on the model parameters to obtain a covariance matrix;
(2) Performing matrix conversion on the covariance matrix to obtain a correlation matrix;
(3) Performing eigenvalue decomposition on the correlation matrix to obtain eigenvalues and eigenvectors;
(4) Performing factor screening on the feature vectors according to the feature values to obtain a plurality of target factors;
(5) Respectively carrying out factor load matrix calculation on each target factor to obtain a factor load matrix of each target factor;
(6) Performing orthogonal rotation on the factor load matrix of each target factor to obtain a rotation matrix of each target factor;
(7) Respectively constructing a factor score matrix of the rotation matrix of each target factor to obtain a factor score matrix corresponding to each target factor;
(8) Calculating the mahalanobis distance between each target factor and a preset sample factor according to the factor score matrix corresponding to each target factor;
(9) And detecting abnormal data of the model parameters according to the mahalanobis distance between each target factor and a preset sample factor, and obtaining an abnormal detection result.
Specifically, covariance matrix analysis is performed on model parameters to obtain a covariance matrix. The data are organized in a matrix, with each column representing a variable and each row representing an observation. A covariance matrix is constructed by calculating the covariance between each pair of variables. The elements of the covariance matrix represent the covariance between the different variables. The covariance formula is:
Cov(X, Y) = (1/(n − 1)) × Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ);
wherein Cov(X, Y) is the covariance between variables X and Y, Xᵢ and Yᵢ are the i-th observations of X and Y respectively, X̄ and Ȳ are the means of X and Y respectively, and n is the number of observations. The covariance matrix Σ is constructed by calculating the covariance of all variable pairs. Matrix conversion is then performed on the covariance matrix to obtain a correlation matrix. The correlation matrix is a standardization of the covariance matrix such that its elements represent correlation coefficients between variables, ranging from -1 to 1. The calculation formula of the correlation matrix is:
Corr(X, Y) = Cov(X, Y) / (σ_X × σ_Y);
where Corr(X, Y) is the correlation coefficient between variables X and Y, Cov(X, Y) is their covariance, and σ_X and σ_Y are the standard deviations of X and Y, respectively. By standardizing the covariance matrix, the correlation matrix eliminates the influence of dimension, making the correlation among different variables more intuitive. Eigenvalue decomposition is then performed on the correlation matrix to obtain eigenvalues and eigenvectors. Eigenvalue decomposition is an important method in matrix analysis; the principal components of the data are extracted by decomposing the correlation matrix. Assuming the correlation matrix is R, the result of the eigenvalue decomposition can be expressed as:
R = QΛQᵀ;
where Q is the eigenvector matrix and Λ is a diagonal matrix whose diagonal elements are the eigenvalues. The eigenvectors represent the directions of the data, while the eigenvalues represent the variance of the data in the corresponding directions. Factor screening is performed on the eigenvectors according to the eigenvalues to obtain a plurality of target factors, selecting the eigenvectors that explain most of the variation in the data. In general, the first k eigenvectors with the largest eigenvalues are selected as target factors. Through this factor analysis, the structure of the data is simplified and the main features are extracted. A factor load matrix is then calculated for each target factor. The factor load matrix describes the projection of the original variables onto the factors, reflecting the relationship between each factor and the original variables. The calculation formula of the factor load matrix L is:
L = QΛ^(1/2);
where Q is the eigenvector matrix of the target factors and Λ^(1/2) is the square root of the eigenvalue diagonal matrix. The factor load matrix gives the weight of each original variable on each factor. Orthogonal rotation is then performed on the factor load matrix of each target factor to obtain a rotation matrix for each target factor. The purpose of orthogonal rotation is to make the factor structure clearer, with each factor more independent in explaining variable variability. A common orthogonal rotation method is Varimax rotation, whose goal is to maximize the variance of the factor load matrix so that each factor has a large loading on a few variables while its loadings on other variables are near zero. A factor score matrix is then constructed from the rotation matrix of each target factor, yielding the factor score matrix corresponding to each target factor. The factor score matrix computes the score of each sample on each factor by projecting the raw data into the factor space. The mahalanobis distance between each target factor and a preset sample factor is calculated according to the corresponding factor score matrix. The mahalanobis distance is a measure that takes the correlation of the data into account when measuring the multidimensional distance between samples, and its formula is:
D_M(x) = √((x − μ)ᵀ Σ⁻¹ (x − μ));
wherein D_M(x) is the mahalanobis distance, x is the sample vector, μ is the mean vector, and Σ is the covariance matrix. Abnormal samples are identified by calculating the mahalanobis distance. Abnormal data detection is then performed on the model parameters according to the mahalanobis distance between each target factor and the preset sample factor, yielding the anomaly detection result. Abnormal data in the model parameters are identified by comparing the mahalanobis distance with a preset threshold: if the mahalanobis distance exceeds the threshold, the sample is determined to be abnormal.
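For illustration, the covariance–correlation–eigendecomposition–factor score–mahalanobis distance chain described above can be sketched in Python as follows; the number of retained factors and the chi-square based threshold are assumptions of this sketch, and the Varimax rotation step is omitted for brevity.

```python
# Rough sketch of factor-based Mahalanobis anomaly detection on model-parameter samples.
import numpy as np
from scipy import stats

def factor_mahalanobis_anomalies(P, k=2, alpha=0.01):
    """P: (n_samples, n_params) matrix of model-parameter observations."""
    Z = (P - P.mean(axis=0)) / P.std(axis=0, ddof=1)          # standardize
    R = np.corrcoef(Z, rowvar=False)                          # correlation matrix
    eigval, eigvec = np.linalg.eigh(R)                        # eigenvalue decomposition
    order = np.argsort(eigval)[::-1][:k]                      # keep the k largest factors
    L = eigvec[:, order] * np.sqrt(eigval[order])             # factor load matrix (illustrative)
    scores = Z @ eigvec[:, order]                             # factor score matrix
    mu = scores.mean(axis=0)
    cov = np.cov(scores, rowvar=False)
    inv_cov = np.linalg.inv(cov + 1e-10 * np.eye(k))
    diff = scores - mu
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)        # squared Mahalanobis distance
    threshold = stats.chi2.ppf(1 - alpha, df=k)               # assumed preset threshold
    return np.sqrt(d2), d2 > threshold                        # distances, anomaly flags

# Example on synthetic parameter samples with one injected outlier
rng = np.random.default_rng(1)
P = rng.normal(size=(100, 3))
P[0] += 8.0
dist, is_anomaly = factor_mahalanobis_anomalies(P)
```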
In a specific embodiment, the process of executing step S106 may specifically include the following steps:
(1) Extracting a topological structure corresponding to the self-organizing map, and matching to obtain weight data of each grid node in the topological structure;
(2) Performing abnormal node mapping on the abnormal detection result based on the weight data of each grid node to obtain a target abnormal node;
(3) Extracting a data tag of a target abnormal node, and performing grid color coding on the topological structure according to the data tag of the target abnormal node to obtain a coded topological structure;
(4) And generating a risk heat point diagram and a trend analysis diagram according to the coding topological structure, and generating a service risk visualization report according to the risk heat point diagram and the trend analysis diagram.
Specifically, the topology and weight data are extracted from the trained self-organizing map model. Assuming a data set containing multiple business features (such as sales, customer satisfaction, inventory level, etc.), each grid node j is, after training of the self-organizing map, represented by a weight vector w_j. The topology may be represented as a two-dimensional grid, with each node having a corresponding weight vector. Abnormal node mapping is then performed on the anomaly detection result based on the weight data of each grid node: the anomaly detection result is mapped into the self-organizing map grid to identify which nodes correspond to the anomalous data. Assume an anomaly detection result data set in which each data point is represented as x_i. For each data point x_i, the closest grid node (i.e., the best matching unit, BMU) is found according to the formula:
BMU(x_i) = argmin_j ‖x_i − w_j‖;
wherein ‖x_i − w_j‖ represents the Euclidean distance between data point x_i and the weight vector w_j of grid node j. By traversing all abnormal data points, the corresponding BMUs are found and recorded as target abnormal nodes. The data labels of the target abnormal nodes are then extracted, and grid color coding is performed on the topology according to these labels to obtain a coded topology. A data label may be a category, severity level, or other meaningful mark of the anomalous data. The label of each target abnormal node is mapped onto the corresponding grid node and encoded with a different color. For example, assuming three types of anomalous data represented by red, yellow, and green respectively, each abnormal node is color coded according to its label, forming an intuitive coded topology. A risk heat point diagram and a trend analysis diagram are then generated from the coded topology. The risk heat point diagram visually displays the distribution of anomalous data in the self-organizing map grid through color coding, helping the user quickly identify high-risk areas. The trend analysis diagram reveals the development trend and change pattern of anomalies in the data through time-series analysis of the abnormal nodes, providing dynamic information about the anomalies. A service risk visualization report is generated from the risk heat point diagram and the trend analysis diagram; it integrates the contents of both diagrams and comprehensively presents the anomalies in the data and their potential risks through a combination of graphics and text.
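The mapping of anomalous samples onto a trained self-organizing map grid can be sketched as follows; the grid shape, the weight array som_weights and the label scheme are hypothetical placeholders, and no particular SOM library is implied.

```python
# Simplified sketch of mapping anomalies onto a trained SOM grid and tallying a heat map.
import numpy as np

def map_anomalies_to_som(som_weights, anomalies, labels):
    """som_weights: (rows, cols, dim) grid; anomalies: (n, dim); labels: length-n list."""
    rows, cols, dim = som_weights.shape
    flat = som_weights.reshape(-1, dim)
    grid_labels = {}                                   # (row, col) -> list of labels
    hit_counts = np.zeros((rows, cols), dtype=int)     # basis for a risk heat point diagram
    for x, lab in zip(anomalies, labels):
        bmu = np.argmin(np.linalg.norm(flat - x, axis=1))   # best matching unit
        r, c = divmod(int(bmu), cols)
        hit_counts[r, c] += 1
        grid_labels.setdefault((r, c), []).append(lab)
    return hit_counts, grid_labels

# Example: a 5x5 SOM over 3 business features, with three flagged anomalies
rng = np.random.default_rng(0)
som_weights = rng.random((5, 5, 3))
anomalies = rng.random((3, 3))
labels = ["severe", "moderate", "severe"]
heat, coded = map_anomalies_to_som(som_weights, anomalies, labels)
```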
The method for mining enterprise business data in the embodiment of the present application is described above, and the enterprise business data mining apparatus in the embodiment of the present application is described below, referring to fig. 2, where one embodiment of the enterprise business data mining apparatus in the embodiment of the present application includes:
the analysis module 201 is configured to perform multi-view cluster analysis processing on the acquired enterprise service data to obtain clustered service data;
The decomposition module 202 is configured to perform non-negative matrix factorization on the clustered service data to obtain a key feature data set;
the modeling module 203 is configured to perform fuzzy logic hybrid modeling processing on the key feature data set to obtain a service hybrid model;
The optimizing module 204 is configured to perform differential optimization processing on the service hybrid model to obtain model parameters corresponding to the service hybrid model;
the detection module 205 is configured to perform factor anomaly detection processing on the model parameters to obtain an anomaly detection result;
and the processing module 206 is used for performing self-organizing map visual analysis processing on the abnormal detection result to obtain a service risk visual report.
Through the cooperation of the above components, multi-view clustering analysis processing can be used to comprehensively acquire and process multi-source business data of enterprises; the multi-view analysis method ensures that data processing is not limited to single-dimension information but integrates multiple dimensions such as time, department and business, so that potential patterns and complex relations in the data are revealed more comprehensively. The multi-view cluster analysis results provide high-quality input data for subsequent data decomposition and feature extraction, making the whole data mining process more accurate and in-depth. Through the non-negative matrix factorization process, the key feature data set can be effectively extracted. Non-negative matrix factorization maps the high-dimensional space of the raw data to a low-dimensional feature space by factoring the data matrix into a base matrix and a coefficient matrix. This not only retains the main information of the data but also improves the interpretability of the features through techniques such as sparsification and regularization, making subsequent modeling and analysis more efficient and accurate. Fuzzy logic hybrid modeling is another important technical means of the invention; the combination of fuzzy logic and traditional regression analysis can handle the ambiguity and uncertainty in the data. In fuzzy logic hybrid modeling, the key feature data are fuzzified to obtain fuzzy variable data, which are then converted into a numerical value set through defuzzification. Linear regression modeling is performed on the numerical value set to obtain the business hybrid model. This approach not only handles complex relations in the data but also improves the robustness and generalization capability of the model. The differential optimization process further improves the performance of the business hybrid model. Population scale analysis, parameter range analysis and iterative optimization are performed on the business hybrid model through a differential evolution algorithm, so that the optimal parameters of the model can be found effectively. The differential evolution algorithm has global search capability and can avoid getting trapped in local optima, ensuring that the optimized model parameters have higher fitness and stability. The factor anomaly detection processing is one of the key steps of the invention; through covariance matrix analysis, eigenvalue decomposition, factor screening and related steps, anomalous data in the model parameters can be identified effectively. Factor analysis extracts the main factors in the data, reducing its dimensionality while retaining the main information. The factor score matrix is further obtained through calculation of the factor load matrix and the rotation matrix, and anomalous data are detected using the mahalanobis distance. This method can accurately identify potential business risks and provides strong support for enterprise risk management. The self-organizing map visual analysis processing intuitively displays the complex anomaly detection results. The high-dimensional data are mapped to a low-dimensional space through self-organizing mapping while maintaining the topological structure of the data, generating a risk heat point diagram and a trend analysis diagram.
The visualization method not only displays the anomaly detection results but also reveals potential patterns and trends in the data, helping management better understand and respond to business risks. The generation of the service risk visualization report provides enterprises with an intuitive and easy-to-understand risk management tool, greatly improving the scientific soundness and effectiveness of decision making.
The present application also provides an enterprise business data mining apparatus, which includes a memory and a processor, where the memory stores computer readable instructions that, when executed by the processor, cause the processor to execute the steps of the enterprise business data mining method in the above embodiments.
The present application also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium or a volatile computer readable storage medium, in which instructions are stored; when the instructions are run on a computer, they cause the computer to perform the steps of the enterprise business data mining method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially, or in the part contributing to the prior art, or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (4)
1. An enterprise business data mining method, characterized in that the enterprise business data mining method comprises:
Performing multi-view cluster analysis processing on the acquired enterprise business data to obtain clustered business data; the method specifically comprises the following steps: performing multidimensional feature extraction on the enterprise business data to obtain a multidimensional feature set corresponding to the enterprise business data; performing principal component analysis on the multi-dimensional feature set to obtain dimension reduction feature data corresponding to the multi-dimensional feature set; performing preliminary clustering on the dimension reduction characteristic data through a K-Means clustering algorithm to obtain corresponding preliminary clustering data; performing density clustering on the preliminary clustering data, and extracting noise point data and dense region data in the preliminary clustering data; performing data screening on the preliminary clustering data according to the noise point data and the dense region data to obtain target clustering data; performing time dimension analysis on the target cluster data to obtain time dimension data, performing department dimension analysis on the target cluster data to obtain department dimension data, and performing service dimension analysis on the target cluster data to obtain service dimension data; combining the time dimension data, the department dimension data and the service dimension data into the clustering service data;
Performing non-negative matrix factorization on the clustering service data to obtain a key characteristic data set; the method specifically comprises the following steps: constructing a data matrix of the clustering service data to obtain a data matrix; extracting dimension data of the data matrix, and carrying out standardization processing on the data matrix according to the dimension data to obtain a standardization matrix; performing initial non-negative matrix factorization on the standardized matrix to obtain two non-negative matrices; performing element initialization processing on the two non-negative matrixes to obtain two initialized non-negative matrixes; performing iterative optimization on the two initialized non-negative matrixes through a preset iterative rule, and obtaining a base matrix and a coefficient matrix when a preset stopping condition is met; performing feature extraction and sparsification processing on the base matrix to obtain feature data corresponding to the base matrix; regularization processing is carried out on the coefficient matrix, so that characteristic data corresponding to the coefficient matrix is obtained; feature data fusion is carried out on the feature data corresponding to the base matrix and the feature data corresponding to the coefficient matrix, so that the key feature data set is obtained;
Performing fuzzy logic hybrid modeling processing on the key characteristic data set to obtain a service hybrid model; the method specifically comprises the following steps: performing fuzzy variable matching on the key characteristic data to obtain fuzzy variable data; carrying out fuzzification processing on the key characteristic data according to the fuzzy variable data to obtain corresponding fuzzy data; performing defuzzification processing on the fuzzy data to obtain a numerical value set corresponding to the fuzzy data; performing linear regression modeling on the numerical value set to obtain the service hybrid model;
Performing differential optimization processing on the service hybrid model to obtain model parameters corresponding to the service hybrid model; the method specifically comprises the following steps: carrying out population scale analysis on the business mixed model through a differential evolution algorithm to obtain a corresponding population scale; carrying out parameter range analysis on the population scale to obtain a target parameter range; randomly generating a plurality of initial individuals according to the population scale within the target parameter range; performing iterative optimization on the plurality of initial individuals through a preset variation strategy to obtain target individuals; performing parameter mapping on the target individual to obtain model parameters corresponding to the service mixing model;
Performing factor anomaly detection processing on the model parameters to obtain anomaly detection results; the method specifically comprises the following steps: performing covariance matrix analysis on the model parameters to obtain a covariance matrix; performing matrix conversion on the covariance matrix to obtain a correlation matrix; performing eigenvalue decomposition on the correlation matrix to obtain eigenvalues and eigenvectors; performing factor screening on the feature vector according to the feature value to obtain a plurality of target factors; respectively carrying out factor load matrix calculation on each target factor to obtain a factor load matrix of each target factor; performing orthogonal rotation on the factor load matrix of each target factor to obtain a rotation matrix of each target factor; respectively constructing a factor score matrix of the rotation matrix of each target factor to obtain a factor score matrix corresponding to each target factor; calculating the mahalanobis distance between each target factor and a preset sample factor according to the factor score matrix corresponding to each target factor; detecting abnormal data of the model parameters according to the mahalanobis distance between each target factor and a preset sample factor to obtain an abnormal detection result;
Performing self-organizing mapping visual analysis processing on the abnormal detection result to obtain a service risk visual report; the method specifically comprises the following steps: extracting a topological structure corresponding to the self-organizing map, and matching to obtain weight data of each grid node in the topological structure; performing abnormal node mapping on the abnormal detection result based on the weight data of each grid node to obtain a target abnormal node; extracting a data tag of the target abnormal node, and performing grid color coding on the topological structure according to the data tag of the target abnormal node to obtain a coded topological structure; and generating a risk heat point diagram and a trend analysis diagram according to the coding topological structure, and generating the service risk visualization report according to the risk heat point diagram and the trend analysis diagram.
2. An enterprise business data mining apparatus, the enterprise business data mining apparatus comprising:
The analysis module is used for carrying out multi-view cluster analysis processing on the acquired enterprise business data to obtain clustered business data; the method specifically comprises the following steps: performing multidimensional feature extraction on the enterprise business data to obtain a multidimensional feature set corresponding to the enterprise business data; performing principal component analysis on the multi-dimensional feature set to obtain dimension reduction feature data corresponding to the multi-dimensional feature set; performing preliminary clustering on the dimension reduction characteristic data through a K-Means clustering algorithm to obtain corresponding preliminary clustering data; performing density clustering on the preliminary clustering data, and extracting noise point data and dense region data in the preliminary clustering data; performing data screening on the preliminary clustering data according to the noise point data and the dense region data to obtain target clustering data; performing time dimension analysis on the target cluster data to obtain time dimension data, performing department dimension analysis on the target cluster data to obtain department dimension data, and performing service dimension analysis on the target cluster data to obtain service dimension data; combining the time dimension data, the department dimension data and the service dimension data into the clustering service data;
The decomposition module is used for carrying out non-negative matrix decomposition processing on the clustering service data to obtain a key characteristic data set; the method specifically comprises the following steps: constructing a data matrix of the clustering service data to obtain a data matrix; extracting dimension data of the data matrix, and carrying out standardization processing on the data matrix according to the dimension data to obtain a standardization matrix; performing initial non-negative matrix factorization on the standardized matrix to obtain two non-negative matrices; performing element initialization processing on the two non-negative matrixes to obtain two initialized non-negative matrixes; performing iterative optimization on the two initialized non-negative matrixes through a preset iterative rule, and obtaining a base matrix and a coefficient matrix when a preset stopping condition is met; performing feature extraction and sparsification processing on the base matrix to obtain feature data corresponding to the base matrix; regularization processing is carried out on the coefficient matrix, so that characteristic data corresponding to the coefficient matrix is obtained; feature data fusion is carried out on the feature data corresponding to the base matrix and the feature data corresponding to the coefficient matrix, so that the key feature data set is obtained;
the modeling module is used for carrying out fuzzy logic hybrid modeling processing on the key characteristic data set to obtain a service hybrid model; the method specifically comprises the following steps: performing fuzzy variable matching on the key characteristic data to obtain fuzzy variable data; carrying out fuzzification processing on the key characteristic data according to the fuzzy variable data to obtain corresponding fuzzy data; performing defuzzification processing on the fuzzy data to obtain a numerical value set corresponding to the fuzzy data; performing linear regression modeling on the numerical value set to obtain the service hybrid model;
The optimization module is used for carrying out differential optimization processing on the service hybrid model to obtain model parameters corresponding to the service hybrid model; the method specifically comprises the following steps: carrying out population scale analysis on the business mixed model through a differential evolution algorithm to obtain a corresponding population scale; carrying out parameter range analysis on the population scale to obtain a target parameter range; randomly generating a plurality of initial individuals according to the population scale within the target parameter range; performing iterative optimization on the plurality of initial individuals through a preset variation strategy to obtain target individuals; performing parameter mapping on the target individual to obtain model parameters corresponding to the service mixing model;
The detection module is used for carrying out factor anomaly detection processing on the model parameters to obtain anomaly detection results; the method specifically comprises the following steps: performing covariance matrix analysis on the model parameters to obtain a covariance matrix; performing matrix conversion on the covariance matrix to obtain a correlation matrix; performing eigenvalue decomposition on the correlation matrix to obtain eigenvalues and eigenvectors; performing factor screening on the feature vector according to the feature value to obtain a plurality of target factors; respectively carrying out factor load matrix calculation on each target factor to obtain a factor load matrix of each target factor; performing orthogonal rotation on the factor load matrix of each target factor to obtain a rotation matrix of each target factor; respectively constructing a factor score matrix of the rotation matrix of each target factor to obtain a factor score matrix corresponding to each target factor; calculating the mahalanobis distance between each target factor and a preset sample factor according to the factor score matrix corresponding to each target factor; detecting abnormal data of the model parameters according to the mahalanobis distance between each target factor and a preset sample factor to obtain an abnormal detection result;
the processing module is used for carrying out self-organizing mapping visual analysis processing on the abnormal detection result to obtain a service risk visual report; the method specifically comprises the following steps: extracting a topological structure corresponding to the self-organizing map, and matching to obtain weight data of each grid node in the topological structure; performing abnormal node mapping on the abnormal detection result based on the weight data of each grid node to obtain a target abnormal node; extracting a data tag of the target abnormal node, and performing grid color coding on the topological structure according to the data tag of the target abnormal node to obtain a coded topological structure; and generating a risk heat point diagram and a trend analysis diagram according to the coding topological structure, and generating the service risk visualization report according to the risk heat point diagram and the trend analysis diagram.
3. An enterprise business data mining apparatus, the enterprise business data mining apparatus comprising: a memory and at least one processor, the memory having instructions stored therein;
The at least one processor invoking the instructions in the memory to cause the enterprise business data mining apparatus to perform the enterprise business data mining method of claim 1.
4. A computer readable storage medium having instructions stored thereon, which when executed by a processor implement the enterprise business data mining method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410979815.2A CN118520404B (en) | 2024-07-22 | 2024-07-22 | Enterprise business data mining method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410979815.2A CN118520404B (en) | 2024-07-22 | 2024-07-22 | Enterprise business data mining method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118520404A CN118520404A (en) | 2024-08-20 |
CN118520404B true CN118520404B (en) | 2024-10-01 |
Family
ID=92274387
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410979815.2A Active CN118520404B (en) | 2024-07-22 | 2024-07-22 | Enterprise business data mining method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118520404B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119049697B (en) * | 2024-10-30 | 2025-01-28 | 营动智能技术(山东)有限公司 | A diabetes classification method and system based on big data |
CN119690923B (en) * | 2024-11-29 | 2025-07-11 | 华宁 | High-efficiency management system, method and medium for diversified novel archive information resources driven by big data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191699A (en) * | 2019-12-22 | 2020-05-22 | 中国人民解放军陆军工程大学 | Multi-view clustering method based on non-negative matrix factorization and division adaptive fusion |
CN117421735A (en) * | 2023-10-14 | 2024-01-19 | 国网江西省电力有限公司信息通信分公司 | Mining evaluation method based on big data vulnerability mining |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8359216B2 (en) * | 2008-10-01 | 2013-01-22 | International Business Machines Corporation | System and method for finding business transformation opportunities by using a multi-dimensional shortfall analysis of an enterprise |
WO2011159255A2 (en) * | 2010-06-14 | 2011-12-22 | Blue Prism Technologies Pte Ltd | High-dimensional data analysis |
US9495777B2 (en) * | 2013-02-07 | 2016-11-15 | Oracle International Corporation | Visual data analysis for large data sets |
CN113705674B (en) * | 2021-08-27 | 2024-04-05 | 西安交通大学 | Non-negative matrix factorization clustering method and device and readable storage medium |
US11846979B1 (en) * | 2022-06-01 | 2023-12-19 | Sas Institute, Inc. | Anomaly detection and diagnostics based on multivariate analysis |
CN117391440A (en) * | 2023-10-12 | 2024-01-12 | 中数通信息有限公司 | Enterprise information reconnaissance platform and method |
CN118260695A (en) * | 2024-03-29 | 2024-06-28 | 云之富(上海)数据服务有限公司 | Big data anomaly analysis method and system for digital online service |
Also Published As
Publication number | Publication date |
---|---|
CN118520404A (en) | 2024-08-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||