
CN114140635B - Non-negative matrix factorization method for self-expression learning supervision - Google Patents

Non-negative matrix factorization method for self-expression learning supervision

Info

Publication number
CN114140635B
CN114140635B (application CN202110911804.7A)
Authority
CN
China
Prior art keywords
matrix
data
self
clustering
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110911804.7A
Other languages
Chinese (zh)
Other versions
CN114140635A (en)
Inventor
Sun Yanfeng (孙艳丰)
Wang Jie (王杰)
Guo Jipeng (郭继鹏)
Hu Yongli (胡永利)
Yin Baocai (尹宝才)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110911804.7A priority Critical patent/CN114140635B/en
Publication of CN114140635A publication Critical patent/CN114140635A/en
Application granted granted Critical
Publication of CN114140635B publication Critical patent/CN114140635B/en
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a non-negative matrix factorization method supervised by self-expression learning. The method first obtains, through self-expression learning, a similarity matrix that reflects the local or global structure of the data. The similarity matrix is then further decomposed to obtain a matrix carrying cluster-structure information. Finally, this matrix guides the learning of the coefficient matrix so that the two share a consistent structure, which improves the discriminative power of the coefficient matrix. The method mainly addresses the weak discriminative power, in unsupervised clustering, of the coefficient matrix obtained by non-negative matrix factorization. By fully accounting for this weakness of conventional non-negative matrix factorization and using the self-expression method to obtain a cluster-structure matrix that supervises the learning of the low-dimensional representation, the method effectively improves the discriminative power of the low-dimensional representation and thereby the clustering performance.

Description

Non-negative matrix factorization method for self-expression learning supervision
Technical Field
The invention relates to a non-negative matrix factorization method supervised by self-expression learning, suitable for dimension-reduction clustering in the field of machine learning.
Background
With the continuous development of information-collection technology, collected datasets keep growing in scale, which brings serious challenges (the curse of dimensionality, exponential degradation of algorithm performance, and so on), so that useful information cannot be extracted in time. Extracting the most important low-dimensional representation from such high-dimensional data therefore not only helps avoid the curse of dimensionality but also reduces the complexity of the input data space. It is thus necessary to embed the high-dimensional space into a low-dimensional one while retaining most of the desired useful information. Research shows that several dimension-reduction methods have achieved great success, such as principal component analysis, linear discriminant analysis, and locality preserving projections. However, while these methods are very effective for reducing the dimensionality of high-dimensional data, their results lack interpretability in terms of physical meaning. In contrast, Non-negative Matrix Factorization (NMF) has increasingly become one of the most popular dimension-reduction tools because its factorization results are directly interpretable.
Unlike principal component analysis and SVD methods, NMF decomposes the raw data into the product of two matrices, both subject to non-negativity constraints. One matrix consists of basis vectors that reveal the underlying semantic structure; the other can be viewed as the coefficients by which each sample point is linearly combined from the basis vectors. NMF can be regarded as a parts-based data representation: it allows only additive, never subtractive, combinations, which makes the decomposed representation matrix easy to interpret. Because of its ability to extract highly discriminative features and its computational feasibility, NMF and its extensions have been widely used in image segmentation, text mining, image clustering, and related tasks.
Although NMF and its related methods have achieved great success in many applications, in unsupervised clustering the learned low-dimensional representation is weakly discriminative for lack of supervisory information, which degrades the performance of subsequent clustering tasks. In fact, the data themselves contain important cluster-structure information that matters greatly for clustering performance. How to learn the cluster-structure information contained in the data and use it as supervisory information to guide the learning of the low-dimensional representation is therefore a problem that must be studied.
Disclosure of Invention
To address the weak discriminative power of the coefficient matrix, the invention proposes a novel non-negative matrix factorization method supervised by self-expression learning. The method first obtains, through self-expression learning, a similarity matrix that reflects the local or global structure of the data. The similarity matrix is then further decomposed to obtain a matrix carrying cluster-structure information. Finally, this matrix guides the learning of the coefficient matrix so that the two share a consistent structure, which improves the discriminative power of the coefficient matrix. The method mainly addresses the weak discriminative power, in unsupervised clustering, of the coefficient matrix obtained by non-negative matrix factorization.
The proposed method fully accounts for the weak discriminative power of conventional non-negative matrix factorization: the self-expression method is used to obtain a matrix with cluster-structure information that guides the learning of the low-dimensional representation, which effectively improves the discriminative power of the low-dimensional representation and thereby the clustering performance.
The invention is realized by the following technical scheme:
The original data are taken as input and normalized to improve efficiency, so that the normalized images share the same standard. The data are then given a low-dimensional representation by the proposed model. Finally, the dimension-reduction method is evaluated with a clustering method and evaluation indices. The specific steps are as follows:
step one: construction of sample points
The invention first constructs the input sample points from four classical databases, whose details are listed in the table:
table 1 data set introduction
Data set Size (N) Dimension (M) Category number (K)
COIL20 1440 1024 20
ORL 400 1024 40
Yale 165 1024 15
PIE 2856 1024 68
Partial images from the datasets are shown in Fig. 1 (from top to bottom: COIL20, ORL, Yale and PIE). A database F = [f_1, f_2, ..., f_n] is selected, where f_i is a sample point; the samples are normalized to obtain the non-negative data matrix X.
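The normalization step above can be sketched as follows. The patent only states that normalized images share the same standard, so the unit-L2-norm convention, the function name `normalize_columns`, and the random stand-in data are assumptions, not the patent's specified scheme.

```python
import numpy as np

def normalize_columns(F):
    """Scale each sample (column) of a non-negative data matrix to unit L2 norm.

    The unit-norm convention is an assumption: the patent only requires
    that normalized samples share the same standard.
    """
    F = np.asarray(F, dtype=float)
    norms = np.linalg.norm(F, axis=0)
    norms[norms == 0] = 1.0  # leave any all-zero sample unchanged
    return F / norms

rng = np.random.default_rng(0)
F = rng.random((1024, 5))  # five stand-in "image" samples of dimension 1024
X = normalize_columns(F)
```

After this step every column of X has unit norm, so all samples live on one common scale.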
Step two: dimension reduction treatment
Since data in the same subspace tend to be strongly correlated while data in different subspaces are uncorrelated or only weakly correlated, self-expression learning is first performed on the normalized non-negative data matrix X. The similarity matrix is constructed from the following problem:
X=XZ
s.t.Z≥0,Z1=1. (1)
where Z ∈ R^(n×n). The constraints ensure that all solutions for Z are meaningful: since Z is a similarity matrix established from the raw data, each element of Z represents a similarity weight between x_i and another point x_j, and the similarity weights of each point with respect to the other points sum to 1.
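Problem (1) can be sketched with a simple projected-gradient loop. The patent does not give its solver, so the step size, iteration count, and the clip-and-renormalize projection used to approximate the constraints Z ≥ 0, Z1 = 1 are all assumptions.

```python
import numpy as np

def self_expression(X, steps=300):
    """Sketch of  min ||X - XZ||_F^2  s.t.  Z >= 0, Z1 = 1  (problem (1)).

    Projected gradient descent; the clip-then-renormalize step only
    approximates the simplex constraint on each row of Z.
    """
    n = X.shape[1]
    H = X.T @ X
    lr = 1.0 / (np.linalg.norm(H, 2) + 1e-12)   # step size from a Lipschitz bound
    Z = np.full((n, n), 1.0 / n)                # feasible uniform start
    for _ in range(steps):
        Z -= lr * (H @ Z - H)                   # gradient of 0.5*||X - XZ||_F^2
        Z = np.clip(Z, 0.0, None) + 1e-12      # Z >= 0 (tiny offset avoids zero rows)
        Z /= Z.sum(axis=1, keepdims=True)      # renormalize so that Z1 = 1
    return Z

rng = np.random.default_rng(1)
X = rng.random((8, 6))                          # 6 stand-in samples of dimension 8
Z = self_expression(X)
```

The returned Z is non-negative with each row summing to 1, matching the constraints of (1) up to the approximation of the projection.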
Research shows that, on the basis of self-expression, the similarity matrix can be further decomposed to expose the cluster-structure information:
Z = GG^T, s.t. G ≥ 0. (2)
Through the above process, a matrix G with cluster-structure information is obtained.
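The decomposition of the similarity matrix into GG^T can be sketched as a symmetric non-negative factorization. The damped multiplicative rule below comes from the symmetric-NMF literature; the patent does not print its own update for G, so this rule, the iteration count, and the random initialization are assumptions.

```python
import numpy as np

def symmetric_nmf(Z, k, steps=500, seed=0):
    """Sketch of  min ||Z - G G^T||_F^2  s.t. G >= 0, with G of size n x k.

    Damped multiplicative update G <- G * (1/2 + (Z G) / (2 G G^T G)),
    a standard rule for symmetric NMF (an assumption here, not the
    patent's own update).
    """
    rng = np.random.default_rng(seed)
    G = rng.random((Z.shape[0], k)) + 0.1       # positive random start
    for _ in range(steps):
        G *= 0.5 + 0.5 * (Z @ G) / (G @ (G.T @ G) + 1e-12)
    return G

# A similarity matrix with an exact rank-2 non-negative factorization:
rng = np.random.default_rng(3)
G0 = rng.random((8, 2))
Z = G0 @ G0.T
G = symmetric_nmf(Z, k=2)
```

On this toy matrix the recovered GG^T approximately reproduces Z; each row of G then carries the cluster-structure weights of one sample.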
Meanwhile, since real-world data are often high-dimensional and suffer from the curse of dimensionality, the high-dimensional data must undergo dimension reduction. NMF aims to find two non-negative matrices whose product minimizes the approximation error to the original data matrix. For the normalized data X ∈ R^(D×n) (D is the number of features, n the number of samples), NMF finds two non-negative matrices U ∈ R^(D×r) and V ∈ R^(r×n), concretely:
min ||X − UV||_F^2, s.t. U ≥ 0, V ≥ 0. (3)
The matrix G with cluster-structure information obtained by self-expression learning supervises and guides the coefficient matrix V so that the two share a consistent structure, which strengthens the discriminative power of the coefficient matrix. Unifying self-expression learning and non-negative matrix learning into one framework can be expressed by the following formula:
where α and β are balance parameters with value range {10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3, 10^4}; V is the low-dimensional representation of the original data obtained by non-negative matrix factorization; GG^T is the similarity matrix learned from the original data; and G is the low-dimensional representation matrix learned from the similarity matrix, carrying certain cluster-structure information of the data. The update rules of the model are as follows.
where H = X^T X and W = GG^T.
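The base NMF term of the model can be sketched with the standard Lee-Seung multiplicative updates for min ||X − UV||_F^2 with U, V ≥ 0. This covers only the plain NMF part: the patent's full updates (5), (6) and (7), which also involve H = X^T X and W = GG^T, are not reproduced in the text available here, so the sketch below is a baseline, not the patented rule.

```python
import numpy as np

def nmf(X, r, steps=300, seed=0):
    """Lee-Seung multiplicative updates for  min ||X - U V||_F^2, U, V >= 0.

    This is the unsupervised baseline only; the patent's model adds
    self-expression supervision terms on top of it.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.random((d, r)) + 0.1
    V = rng.random((r, n)) + 0.1
    for _ in range(steps):
        U *= (X @ V.T) / (U @ (V @ V.T) + 1e-12)   # update basis matrix U
        V *= (U.T @ X) / ((U.T @ U) @ V + 1e-12)   # update coefficient matrix V
    return U, V

rng = np.random.default_rng(2)
X = rng.random((20, 30))          # stand-in normalized data, D = 20, n = 30
U, V = nmf(X, r=5)
```

The multiplicative form keeps U and V non-negative automatically, which is why it is the customary choice for NMF-style models such as this one.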
Step three: Subsequent clustering
After the model is solved, a coefficient matrix V with strong discriminative power is obtained. For the reduced-dimension representation V, the sample dataset is divided into k clusters according to the distances between samples, keeping the points within each cluster as tightly connected as possible while keeping the distance between clusters as large as possible. Mathematically, assuming the cluster partition is (C_1, C_2, ..., C_k), the goal is to minimize the squared error E:
E = Σ_{i=1}^{k} Σ_{x∈C_i} ||x − μ_i||_2^2
where μ_i is the mean vector of cluster C_i, also referred to as the centroid, which can be expressed as:
μ_i = (1/|C_i|) Σ_{x∈C_i} x
according to the clustering method, the coefficient matrix is subjected to subsequent clustering, and excellent clustering results can be obtained.
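The subsequent clustering step can be sketched as plain Lloyd's k-means on the columns of the coefficient matrix V, minimizing the squared error E described above. The farthest-point seeding and the iteration cap are implementation choices, not specified by the patent.

```python
import numpy as np

def kmeans(V, k, steps=100):
    """Lloyd's k-means on the columns of V, minimizing
    E = sum_i sum_{x in C_i} ||x - mu_i||^2."""
    pts = np.asarray(V, dtype=float).T          # one row per sample
    # deterministic farthest-point seeding (an implementation choice)
    centers = [pts[0]]
    for _ in range(1, k):
        d2 = np.min([((pts - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(pts[int(d2.argmax())])
    centers = np.array(centers)
    labels = np.zeros(len(pts), dtype=int)
    for _ in range(steps):
        d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)              # assign each point to its nearest centroid
        new = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):           # converged
            break
        centers = new
    E = ((pts - centers[labels]) ** 2).sum()    # the squared error E of the text
    return labels, E

rng = np.random.default_rng(4)
# two well-separated stand-in clusters in the low-dimensional space
V = np.hstack([rng.normal(0.0, 0.05, (2, 5)), rng.normal(5.0, 0.05, (2, 5))])
labels, E = kmeans(V, k=2)
```

On the toy V above the two blobs are recovered exactly and E reduces to the small within-cluster noise.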
Compared with the prior art, the invention has the following advantages:
(1) The method uses self-expression learning to build a high-quality adaptive graph. In contrast to previous approaches, the graph built here not only considers the global structure but is also not predefined and fixed, while the complexity of the model is reduced.
(2) The method further decomposes the similarity matrix to obtain a matrix with discrimination information, and the discrimination capability of the low-dimensional representation is improved by using the matrix.
Drawings
FIG. 1 is a diagram of a sample portion of a database.
FIG. 2 is the convergence curve showing the monotonic decrease of the objective function of the optimization algorithm on dataset 1.
FIG. 3 is the convergence curve showing the monotonic decrease of the objective function of the optimization algorithm on dataset 2.
FIG. 4 is the convergence curve showing the monotonic decrease of the objective function of the optimization algorithm on dataset 3.
FIG. 5 is the convergence curve showing the monotonic decrease of the objective function of the optimization algorithm on dataset 4.
Detailed Description
The algorithm of the present invention can be expressed as follows:
1) Input: the raw data matrix X ∈ R^(D×n), the reduced dimension r (r ≤ D), and the parameters α, β and γ;
2) Initialize U and V by standard NMF; set ε = 10^-5;
3) Iteratively update U, V and G by equations (5), (6) and (7) until the stopping condition is satisfied (the change of the objective function falls below ε);
4) Apply k-means clustering to the coefficient matrix V;
5) Analyze the model qualitatively and analyze the clustering results quantitatively with several evaluation indices.
The invention has been experimentally verified on four data sets, and excellent experimental results are obtained.
1. Qualitative assessment
Building on conventional non-negative matrix factorization, the proposed method fully considers the local and global structural information among the data, constructs a similarity matrix through self-expression learning, and further obtains a matrix of cluster-structure information. The matrix with cluster-structure information is learned by an unsupervised method and supervises and guides the learning of the coefficient matrix. Moreover, because the model is built on conventional non-negative matrix factorization, its performance is necessarily better than NMF, and since the coefficient matrix carries cluster-structure supervisory information, its discriminative power is markedly improved and the clustering performance is enhanced.
2. Quantitative evaluation
The experiment adopts 4 evaluation criteria to evaluate the application of the non-negative matrix factorization method based on self-expression learning supervision in clustering.
1. Analysis of experimental results
Based on these evaluation criteria, Table 2 shows the clustering results of the different algorithms on the COIL20, ORL, Yale and PIE datasets under four criteria, namely normalized mutual information (NMI), accuracy (ACC), F-score and purity, compared with six classical dimension-reduction methods: k-means, PCA, NMF, CF, LCCF and ALLRNMF. The best results are marked in bold.
As can be seen from Table 2, the self-expression-learning-supervised non-negative matrix factorization algorithm proposed by the invention is consistently superior to the other algorithms. The model learns the cluster-structure information of the data on the basis of self-expression learning and uses this information to supervise and guide the coefficient matrix, giving the two a consistent structure and markedly improving the discriminative power of the coefficient matrix. In addition, the ALLRNMF algorithm was found to be superior to the NMF algorithm in most cases, which suggests that ALLRNMF not only constructs a more accurate graph relation matrix but also maintains the local structure while learning the low-dimensional representation. The experimental results also show that the clustering performance of LCCF is superior to CF in most cases, indicating the importance of maintaining the cluster structure.
TABLE 2 Clustering results on the different datasets
2. Convergence analysis
The update rules of the objective function are iterative, and convergence curves of the objective function were plotted on the different datasets, with the value of the objective function on the y-axis and the number of iterations on the x-axis. As is clear from Figs. 2, 3, 4 and 5, the objective function of the optimization algorithm decreases monotonically on every dataset and reaches a stable value after a few iterations, which indicates that the proposed self-expression-learning-supervised non-negative matrix factorization method is very effective in practical applications.
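Two of the four evaluation criteria, accuracy (ACC) and purity, can be sketched as follows; ACC matches cluster labels to class labels with the Hungarian algorithm. The function names and the use of `scipy.optimize.linear_sum_assignment` are implementation choices, and NMI and F-score are omitted for brevity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between cluster and class labels,
    found by the Hungarian algorithm on a negated contingency table."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((len(clusters), len(classes)))
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            cost[i, j] = -np.sum((y_pred == c) & (y_true == t))
    rows, cols = linear_sum_assignment(cost)
    return -cost[rows, cols].sum() / len(y_true)

def purity(y_true, y_pred):
    """Purity: every cluster is credited with its majority class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0
    for c in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == c], return_counts=True)
        total += counts.max()
    return total / len(y_true)
```

For example, predicted labels [1, 1, 0, 0] against ground truth [0, 0, 1, 1] score ACC = purity = 1.0, since the permutation of cluster labels is irrelevant to both measures.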

Claims (1)

1. The non-negative matrix factorization method for self-expression learning supervision is characterized by comprising the following steps of: the method inputs original data, firstly, the original data is normalized, and normalized images have the same standard; then the data is represented in low dimension by using the proposed model; finally, evaluating the dimension reduction method by using a clustering method and an evaluation index; the method comprises the following specific steps:
step one: construction of sample points
Firstly, input sample points are constructed from four classical databases, partial images of which are, in order, COIL20, ORL, Yale and PIE; a database F = [f_1, f_2, ..., f_n] is selected, where f_i is a sample point, and the samples are normalized to obtain the non-negative data matrix X;
Step two: dimension reduction treatment
Since the same subspace data often has very strong correlation, and different subspace data has no correlation or weak correlation, self-expression learning is performed on the normalized non-negative data matrix X; the similarity matrix is constructed with the following formula:
X=XZ
s.t.Z≥0,Z1=1. (1)
wherein Z ∈ R^(n×n); the constraint conditions ensure that all solutions for Z are meaningful: since Z is a similarity matrix established from the original data, each element of Z represents a similarity weight between x_i and another similarity point x_j, and the sum of the similarity weights of each point with respect to the other points is 1;
on the basis of self-expression, the similarity matrix is further decomposed to expose the cluster-structure information:
Z = GG^T, s.t. G ≥ 0; (2)
through the above process, a matrix G with cluster-structure information is obtained;
for the normalized data X ∈ R^(D×n), where D is the number of features and n the number of samples, NMF is used to find two non-negative matrices U ∈ R^(D×r) and V ∈ R^(r×n), concretely:
min ||X − UV||_F^2, s.t. U ≥ 0, V ≥ 0; (3)
the matrix G with cluster structure information obtained by self-expression learning is used for supervising the guide coefficient matrix V, so that the guide coefficient matrix V has a consistency structure, and the discrimination capability of the coefficient matrix is enhanced; unifying the self-expression learning and the non-negative matrix learning into one framework is expressed by the following formula:
wherein α and β are balance parameters with value range {10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3, 10^4}; V is the low-dimensional representation of the original data obtained by non-negative matrix factorization; GG^T is the similarity matrix learned from the original data; and G is the low-dimensional representation matrix learned from the similarity matrix, having certain cluster-structure information of the data; the update rules of the model are as follows;
Wherein h=x T X and w=gg T;
Step three, subsequent clustering
after the model is solved, a coefficient matrix V with strong discriminative power is obtained; for the reduced-dimension representation V, the sample dataset is divided into k clusters according to the distances between samples, keeping the points within each cluster as tightly connected as possible while keeping the distance between clusters as large as possible; expressed mathematically, assuming the cluster partition is (C_1, C_2, ..., C_k), the goal is to minimize the squared error E:
E = Σ_{i=1}^{k} Σ_{x∈C_i} ||x − μ_i||_2^2
where μ_i is the mean vector of cluster C_i, also referred to as the centroid, expressed as:
μ_i = (1/|C_i|) Σ_{x∈C_i} x
according to the clustering method, the coefficient matrix is subjected to subsequent clustering, and excellent clustering results can be obtained.
CN202110911804.7A 2021-08-10 2021-08-10 Non-negative matrix factorization method for self-expression learning supervision Active CN114140635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110911804.7A CN114140635B (en) 2021-08-10 2021-08-10 Non-negative matrix factorization method for self-expression learning supervision


Publications (2)

Publication Number Publication Date
CN114140635A CN114140635A (en) 2022-03-04
CN114140635B true CN114140635B (en) 2024-05-28

Family

ID=80394178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110911804.7A Active CN114140635B (en) 2021-08-10 2021-08-10 Non-negative matrix factorization method for self-expression learning supervision

Country Status (1)

Country Link
CN (1) CN114140635B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853239A (en) * 2010-05-06 2010-10-06 复旦大学 A Dimensionality Reduction Method Based on Non-Negative Matrix Factorization for Clustering
CN102411610A (en) * 2011-10-12 2012-04-11 浙江大学 Semi-supervised dimensionality reduction method for high-dimensional data clustering
CN110276049A (en) * 2019-06-24 2019-09-24 河南科技大学 A Semi-Supervised Adaptive Graph Regularization Discriminative Nonnegative Matrix Factorization Method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009038822A2 (en) * 2007-05-25 2009-03-26 The Research Foundation Of State University Of New York Spectral clustering for multi-type relational data
US20210064928A1 (en) * 2018-02-16 2021-03-04 Nec Corporation Information processing apparatus, method, and non-transitory storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Semi-supervised non-negative matrix factorization based on graph regularization and sparseness constraints; Jiang Xiaoyan, Sun Fuming, Li Haojie; Computer Science; 2016-07-15 (07); full text *
Constrained non-negative matrix factorization based on graph regularization and its application in image representation; Shu Zhenqiu, Zhao Chunxia; Pattern Recognition and Artificial Intelligence; 2013-03-15 (03); full text *
Factorization based on manifold regularization discrimination; Du Shiqiang, Shi Yuqing, Wang Weilan, Ma Ming; Journal of Shandong University (Natural Science); 2013-04-17 (05); full text *
Xu Huimin; Chen Xiuhong; Graph-regularized sparse discriminant non-negative matrix factorization; CAAI Transactions on Intelligent Systems; (06); full text *

Also Published As

Publication number Publication date
CN114140635A (en) 2022-03-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant