CN114140635B

CN114140635B - Non-negative matrix factorization method for self-expression learning supervision

Info

Publication number: CN114140635B
Application number: CN202110911804.7A
Authority: CN
Inventors: 孙艳丰; 王杰; 郭继鹏; 胡永利; 尹宝才
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2021-08-10
Filing date: 2021-08-10
Publication date: 2024-05-28
Anticipated expiration: 2041-08-10
Also published as: CN114140635A

Abstract

The invention discloses a non-negative matrix factorization method for self-expression learning supervision. Then, the similarity matrix is further decomposed, and a matrix with cluster structure information can be obtained. And finally, guiding the matrix with the cluster structure information to learn the coefficient matrix, so that the matrix has a consistent structure, and the discrimination capability of the coefficient matrix is improved. The method mainly solves the problem that the coefficient matrix obtained by non-negative matrix factorization is weak in discrimination capability in the aspect of unsupervised clustering. The non-negative matrix factorization method for self-representation learning supervision fully considers the problem of weak discrimination capability of the traditional non-negative matrix factorization, and the self-representation method is utilized to further obtain the matrix with cluster structure information to guide the learning of low-dimensional representation, so that the discrimination capability of the low-dimensional representation can be effectively improved, and the clustering performance is improved.

Description

Non-negative matrix factorization method for self-expression learning supervision

Technical Field

The invention relates to a non-negative matrix factorization method for self-expression learning supervision, which is suitable for a dimension-reduction clustering technology in the field of machine learning.

Background

With the continuous development of information collection technology, the collected data sets are larger and larger in scale, which brings great challenges, such as dimension disasters, exponential degradation of algorithm performance and the like, so that useful information cannot be timely extracted. Thus, extracting the most important low-dimensional representation from these high-dimensional data not only helps to avoid the problem of "dimension disasters" but also reduces the complexity of the input data space. Furthermore, it is necessary to embed the high-dimensional space into a low-dimensional space while retaining a large portion of the desired useful information. Research shows that some dimension reduction methods have achieved great success, such as principal component analysis, linear discriminant analysis, partial projection method and the like. However, while these methods are very effective for dimension reduction of high-dimensional data, they are poorly interpreted in terms of physical meaning. Instead, non-Negative Matrix Factorization (NMF) is increasingly becoming the most popular dimension reduction tool due to the direct interpretability of the factorization results.

Unlike principal component analysis and SVD methods, NMF decomposes raw data into multiplications of two matrices, which are subject to non-negative constraints. One matrix is composed of basis vectors revealing underlying semantic structures, and the other matrix can be seen as coefficients where each sample point is linearly combined by other basis vectors. NMF can be considered as a partial-based data representation and therefore is applied only to combinations of additions, but not subtractions, which makes the decomposed representation matrix easy to interpret. NMF and extended versions have been widely used in image segmentation, text mining, image clustering, etc. because of its ability to extract the most identifiable features and computational feasibility.

While NMF and its related methods have achieved tremendous success in many applications, in unsupervised clustering, the discrimination ability of the learned low-dimensional representation is weak due to lack of supervisory information, resulting in poor performance of subsequent clustering tasks. In fact, there is some important clustering structure information in the data itself, which is important for improving the clustering performance. Therefore, how to learn the cluster structure information contained in the data itself as the supervision information to guide the learning of the low-dimensional representation is a problem that must be studied at present.

Disclosure of Invention

In order to solve the problem of weak distinguishing capability of the coefficient matrix, the invention provides a novel non-negative matrix factorization method for self-expression learning supervision. The method firstly obtains a similarity matrix reflecting the local or global structure of the data through self-expression learning. Then, the similarity matrix is further decomposed, and a matrix with cluster structure information can be obtained. And finally, guiding the matrix with the cluster structure information to learn the coefficient matrix, so that the matrix has a consistent structure, and the discrimination capability of the coefficient matrix is improved. The method mainly solves the problem that the coefficient matrix obtained by non-negative matrix factorization is weak in discrimination capability in the aspect of unsupervised clustering.

The non-negative matrix factorization method for self-representation learning supervision fully considers the problem of weak discrimination capability of the traditional non-negative matrix factorization, and further obtains the matrix with clustering structure information to guide the learning of low-dimensional representation by using the self-representation method, so that the discrimination capability of the low-dimensional representation can be effectively improved, and the clustering performance is improved.

The invention is realized by the following technical scheme:

the original data is input, and normalization processing is carried out on the original data to improve efficiency, and the normalized images have the same standard. The data is then represented in a low dimension using the proposed model. And finally, evaluating the dimension reduction method by using a clustering method and an evaluation index. The method comprises the following specific steps:

step one: construction of sample points

The invention firstly uses four classical databases to construct input sample points, and the specific information of the four data sets is shown in the table:

table 1 data set introduction

Data set	Size (N)	Dimension (M)	Category number (K)
				COIL20	1440	1024	20
ORL	400	1024	40
				Yale	165	1024	15
PIE	2856	1024	68

A partial image of the dataset is shown in fig. 1 below (COIL 20, ORL, yale and PIE in order from top to bottom). Optionally a databaseWherein f _i is a sample point, and normalizing the sample to obtain

Step two: dimension reduction treatment

Since the same subspace data tends to have very strong correlation, and the different subspace data has no correlation or weak correlation, self-expression learning is performed on the normalized non-negative data matrix X first. The similarity matrix is constructed with the following formula:

X＝XZ

s.t.Z≥0,Z1＝1. (1)

Wherein, The constraint ensures that all solutions for Z are meaningful, since Z is a similarity matrix established from the raw data, and each element in the matrix Z represents a similarity weight between x _i and other points of similarity x _j, the sum of the similarity weights for each point and other points being 1.

Research shows that based on self-expression, the similarity matrix can be further decomposed to embody clustering structure information:

Through the above process, a matrix G with cluster structure information is obtained.

Meanwhile, since data in reality is often high-dimensional, there is a problem of 'curse of dimension', etc., it is necessary to perform dimension reduction processing on the high-dimensional data. NMF aims to find two non-negative matrices and minimize the approximation error between the product of the two non-negative matrices and the original data matrix. For normalized data(D is the feature number, n is the number of samples), two non-negative matrices/>, are found using NMFAnd/>The concrete representation is as follows:

The matrix G with cluster structure information obtained by self-expression learning is used for supervising the guide coefficient matrix V, so that the guide coefficient matrix V has a consistent structure, and the discrimination capability of the coefficient matrix is enhanced. Unifying the self-expression learning and the non-negative matrix learning into one framework can be expressed by the following formula:

wherein alpha and beta are balance parameters, the value range {10^-4,10^-3,10^-2,10^-1,10⁰,10¹,10²,10³,10⁴}; V is a low-dimensional representation of the original data obtained by non-negative matrix factorization, GG ^T is a similarity matrix learned by the original data, For a low-dimensional representation matrix learned by a similarity matrix, there is some cluster structure information in the data.The update rule of the method is as follows.

Wherein h=x ^T X and w=gg ^T.

Step three, subsequent clustering

After the model is solved, a coefficient matrix V with strong discrimination capability can be obtained. For the reduced-dimension representation V, the sample dataset is divided into k clusters according to the distance between samples. At the same time, the points in the clusters are ensured to be tightly connected together as much as possible, and the distance between the clusters is kept to be the maximum. Expressed in mathematical formulas, assuming the cluster partition is (C ₁,C₂,...C_k), the goal is to minimize the squared error E:

Where μ _i is the mean vector of cluster C _i, also referred to as the centroid, which can be expressed as:

according to the clustering method, the coefficient matrix is subjected to subsequent clustering, and excellent clustering results can be obtained.

Compared with the prior art, the invention has the following advantages:

(1) The method utilizes self-expression learning to build a high-quality adaptive graph. In contrast to previous approaches, the graph built here not only considers global structure, but is not predefined fixed, while reducing the complexity of the model.

(2) The method further decomposes the similarity matrix to obtain a matrix with discrimination information, and the discrimination capability of the low-dimensional representation is improved by using the matrix.

Drawings

FIG. 1 is a diagram of a sample portion of a database.

Fig. 2 is a trend graph in which the objective functions of the optimization algorithm all exhibit monotonic decreases in the data set 1.

Fig. 3 is a trend graph in which the objective functions of the optimization algorithm all exhibit monotonic decreases in the data set 2.

Fig. 4 is a trend graph in which the objective functions of the optimization algorithm all exhibit monotonic decreases in the data set 3.

Fig. 5 is a trend graph in which the objective functions of the optimization algorithm all exhibit monotonic decreases in the data set 4.

Detailed Description

The algorithm of the present invention can be expressed as follows:

1) Input raw data matrix Dimension r (r.ltoreq.D), parameters α, β and γ;

2) Initialization by standard NMF Epsilon=10 ^-5;

3) Iteratively updating U, V and G by equations (5), (6) and (7) until the condition is satisfied

4) K-means clustering application is carried out by using a coefficient matrix V;

5) And carrying out qualitative analysis on the model and carrying out quantitative analysis on the clustering result by using various evaluation indexes.

The invention has been experimentally verified on four data sets, and excellent experimental results are obtained.

1. Qualitative assessment

The invention provides a method based on traditional nonnegative matrix factorization, which is used for constructing a similarity matrix based on self-expression learning by fully considering local or global structural information among data and further obtaining a matrix of cluster structural information. The matrix with the cluster structure information is learned by using an unsupervised learning method, and the learning of the coefficient matrix is supervised and guided. In addition, the model is built on the basis of traditional nonnegative matrix factorization, so that the performance is necessarily better than NMF, and the coefficient matrix has cluster structure supervision information, so that the discrimination capability is obviously improved, and the clustering performance is enhanced.

2. Quantitative evaluation

The experiment adopts 4 evaluation criteria to evaluate the application of the non-negative matrix factorization method based on self-expression learning supervision in clustering.

1. Analysis of experimental results

Based on the evaluation criteria, table 2 shows the clustering results of the different algorithms on the COIL20, ORL, yale and PIE datasets at four evaluation criteria, normalized information (NMI), accuracy (ACC), F-score and purity, respectively, and compared to the six classical dimension reduction methods k-means, PCA, NMF, CF, LCCF and ALLRNMF. And the best results are marked in bold.

It can be seen from table 2 that the non-negative matrix factorization algorithm of self-representative learning supervision proposed by the present invention is always superior to other algorithms. The model can learn the clustering structure information of the data on the basis of self-expression learning, and supervise and guide the coefficient matrix by using the information, so that the model has a consistent structure, and the distinguishing capability of the coefficient matrix is obviously improved. In addition, the ALLRNMF algorithm was found to be due to the NMF algorithm in most cases, which suggests that the ALLRNMF algorithm can not only construct a more accurate graph relationship matrix, but also maintain local structure during learning of the low-dimensional representation. Experimental results also show that the clustering performance of LCCF method is superior to CF in most cases, indicating the importance of maintaining the aggregate structure.

TABLE 2 clustering results on different data aggregations

2. Convergence analysis

The updating rule of the objective function is an iterative mode, and a convergence curve of the objective function is manufactured on different data sets. Where the y-axis is the value of the objective function and the x-axis is the number of iterations. As is clear from fig. 2,3, 4 and 5, the objective function of the optimization algorithm shows a monotonically decreasing trend in different data sets and reaches a stable value after several iterations, which indicates that the proposed non-negative matrix factorization method with self-expression learning supervision is very effective in practical applications.

Claims

1. The non-negative matrix factorization method for self-expression learning supervision is characterized by comprising the following steps of: the method inputs original data, firstly, the original data is normalized, and normalized images have the same standard; then the data is represented in low dimension by using the proposed model; finally, evaluating the dimension reduction method by using a clustering method and an evaluation index; the method comprises the following specific steps:

step one: construction of sample points

Firstly, constructing input sample points by using four classical databases, wherein partial images of the data sets are sequentially OIL20, ORL, yale and PIE; optionally a databaseWherein f _i is a sample point, and normalizing the sample to obtain/>

Step two: dimension reduction treatment

Since the same subspace data often has very strong correlation, and different subspace data has no correlation or weak correlation, self-expression learning is performed on the normalized non-negative data matrix X; the similarity matrix is constructed with the following formula:

X＝XZ

s.t.Z≥0,Z1＝1. (1)

Wherein, The constraint condition ensures that all solutions about Z are meaningful, since Z is a similarity matrix established by the original data, and each element in the matrix Z represents a similarity weight between x _i and other similarity points x _j, and the sum of the similarity weights of each point and other points is 1;

on the basis of self-expression, the similarity matrix is further decomposed to embody clustering structure information:

through the above process, a matrix G with cluster structure information is obtained;

For normalized data D is a feature number, n is a sample number, and NMF is utilized to find two non-negative matrices/>And/>The concrete representation is as follows:

the matrix G with cluster structure information obtained by self-expression learning is used for supervising the guide coefficient matrix V, so that the guide coefficient matrix V has a consistency structure, and the discrimination capability of the coefficient matrix is enhanced; unifying the self-expression learning and the non-negative matrix learning into one framework is expressed by the following formula:

Wherein alpha and beta are balance parameters, the value range {10^-4,10^-3,10^-2,10^-1,10⁰,10¹,10²,10³,10⁴};V is a low-dimensional representation of the original data obtained by non-negative matrix factorization, GG ^T is a similarity matrix learned by the original data, A low-dimensional representation matrix learned by the similarity matrix, having certain cluster structure information in the data; /(I)The updating rule of the method is as follows;

Wherein h=x ^T X and w=gg ^T;

Step three, subsequent clustering

After the model is solved, a coefficient matrix V with strong discrimination capability can be obtained; for the reduced representation V, dividing a sample data set into k clusters according to the distance between samples; meanwhile, the points in the clusters are ensured to be tightly connected together as much as possible, and the distance between the clusters is kept to be the maximum; expressed in mathematical formulas, assuming the cluster partition is (C ₁,C₂,...C_k), the goal is to minimize the squared error E:

where μ _i is the mean vector of cluster C _i, also referred to as centroid, expressed as: