Fast Fusion Clustering via Double Random Projection
Figure captions (recovered from the page's image viewer):

- Figure 1: Plots of BIC values against log(λ) and the estimated cluster number in simulations with different n, p and true cluster number K = 3.
- Figure 2: Level plots of the cluster matrix: the true matrix in the left panel, and the estimators computed by RRP-ADMM and RP-ADMM in the middle and right panels, respectively.
- Figure 3: Boxplots of SI values for the RP-ADMM and RRP-ADMM algorithms under four choices of c after 100 replicates.
- Figure 4: Boxplots of SI values for the classical ADMM and RRP-ADMM algorithms under four choices of sample size n after 100 replicates.
- Figure 5: Boxplots of the square root of the run time for the classical ADMM and RRP-ADMM algorithms under four choices of sample size n after 100 replicates.
- Figure 6: Boxplots of SI values for the DRP-ADMM and DRRP-ADMM algorithms under four choices of dimension p after 100 replicates.
- Figure 7: True (a) and estimated (b) cluster matrices in the DrivFace data.
- Figure 8: Fusiongrams of 4 selected variables in the DrivFace data, before (left panel) and after (right panel) deleting the influential points.
Abstract
1. Introduction
2. Methodology
2.1. Random Projection Based ADMM
- Bernoulli-distribution-based random projection ADMM.
- For the Lasso penalty [25], p_λ(t) = λ|t|.
- For the SCAD penalty [26] with a > 2, p_λ(t) = λ|t| I(|t| ≤ λ) + (2aλ|t| − t² − λ²)/(2(a − 1)) I(λ < |t| ≤ aλ) + λ²(a + 1)/2 I(|t| > aλ).
- For the MCP [27] with γ > 1, p_λ(t) = (λ|t| − t²/(2γ)) I(|t| ≤ γλ) + (γλ²/2) I(|t| > γλ).
- For the TLP [8] with τ > 0, p_λ(t) = λ min(|t|/τ, 1).
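As a concrete reference, the four penalties above can be evaluated as follows. This is a minimal sketch; the default parameter values a = 3.7, γ = 3 and τ = 1 are illustrative choices, not settings taken from the paper.

```python
import numpy as np

def lasso(t, lam):
    """Lasso penalty: lam * |t|."""
    return lam * np.abs(t)

def scad(t, lam, a=3.7):
    """SCAD penalty (Fan & Li); requires a > 2. Piecewise: linear near zero,
    quadratic transition, then constant beyond a*lam."""
    t = np.abs(t)
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty (Zhang); requires gamma > 1.
    Linear-minus-quadratic up to gamma*lam, constant afterwards."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def tlp(t, lam, tau=1.0):
    """Truncated L1 penalty: lam * min(|t|/tau, 1)."""
    return lam * np.minimum(np.abs(t) / tau, 1.0)
```

Unlike the Lasso, the three nonconvex penalties flatten out for large |t|, which is what limits the over-shrinkage of large between-group differences in fusion clustering.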
Algorithm 1 RP-ADMM for fusion clustering

Input: data; initial values; tuning parameter.
Output: an estimate of the parameters.
for each iteration do
    compute the first block update using (5)
    compute the second block update using (6)
    compute the dual update using (7)
    if the convergence criterion is met then
        stop and return the last iterate
    end if
end for
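The Bernoulli-type random projection that RP-ADMM relies on can be sketched with the database-friendly construction of Achlioptas [21]. The function name and arguments below are illustrative, not part of the paper's code.

```python
import numpy as np

def achlioptas_projection(p, k, sparse=True, rng=None):
    """Random projection matrix R of shape (p, k).

    sparse=True : entries are sqrt(3/k) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}
    sparse=False: entries are (1/sqrt(k)) * {+1, -1}, each w.p. 1/2 (Bernoulli signs)
    Either choice approximately preserves pairwise distances of X @ R
    in the Johnson-Lindenstrauss sense.
    """
    rng = np.random.default_rng(rng)
    if sparse:
        vals = rng.choice([1.0, 0.0, -1.0], size=(p, k), p=[1 / 6, 2 / 3, 1 / 6])
        return np.sqrt(3.0 / k) * vals
    signs = rng.choice([1.0, -1.0], size=(p, k))
    return signs / np.sqrt(k)
```

Projecting an n × p data matrix X as X @ R reduces the column dimension from p to k before the ADMM updates, which is the source of the speed-up reported in the simulations.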
2.2. Selection of Optimal Tuning Parameter
2.3. Recursive RP-ADMM and Cluster Matrix
Algorithm 2 RRP-ADMM for fusion clustering

Input: data; number of replications M; initial values; tuning parameter.
Output: an estimate of the cluster structure.
for m = 1, …, M do
    compute the m-th estimate using RP-ADMM
end for
while the stopping rule is not met do
    compute the two quantities in (13)
end while
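The recursive idea behind RRP-ADMM can be sketched as follows: run a base clustering procedure on M independently projected copies of the data, average the resulting cluster matrices, and take a majority vote. This is a hedged sketch of the aggregation step only, not the authors' exact Algorithm 2 (which iterates via (13)); `base_cluster`, `M` and `k` are placeholders.

```python
import numpy as np

def cluster_matrix(labels):
    """C[i, j] = 1 if observations i and j share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def aggregate_cluster_matrix(X, base_cluster, M=20, k=5, rng=None):
    """Average the cluster matrices from M runs of `base_cluster`
    (any function mapping projected data to a label vector), each on an
    independently random-projected copy of X; threshold at 1/2."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    avg = np.zeros((n, n))
    for _ in range(M):
        # Bernoulli-type projection, as in Section 2.1
        R = rng.choice([1.0, -1.0], size=(p, k)) / np.sqrt(k)
        avg += cluster_matrix(base_cluster(X @ R))
    avg /= M
    return (avg >= 0.5).astype(int)
```

Averaging over independent projections plays the same stabilizing role as bagging [31]: a pair of observations is declared co-clustered only if a majority of the projected runs agree.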
3. Simulation
3.1. Low-Dimensional Setting
3.2. High-Dimensional Setting
4. Real Data Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Lemma 1
Appendix B. Proof of Theorem 1
References
1. Haq, M.A. CDLSTM: A novel model for climate change forecasting. Comput. Mater. Contin. 2022, 71, 2.
2. Haq, M.A. SMOTEDNN: A novel model for air pollution forecasting and AQI classification. Comput. Mater. Contin. 2022, 71, 1.
3. Van Der Kloot, W.A.; Spaans, A.M.J.; Heiser, W.J. Instability of hierarchical cluster analysis due to input order of the data: The PermuCLUSTER solution. Psychol. Methods 2005, 10, 468.
4. Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678.
5. Yang, X.; Yan, X.; Huang, J. High-dimensional integrative analysis with homogeneity and sparsity recovery. J. Multivar. Anal. 2019, 174, 104529.
6. Chi, E.C.; Lange, K. Splitting methods for convex clustering. J. Comput. Graph. Stat. 2015, 24, 994–1013.
7. Lindsten, F.; Ohlsson, H.; Ljung, L. Clustering using sum-of-norms regularization: With application to particle filter output computation. In Proceedings of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 28–30 June 2011; pp. 201–204.
8. Pan, W.; Shen, X.; Liu, B. Cluster analysis: Unsupervised learning via supervised learning with a non-convex penalty. J. Mach. Learn. Res. 2013, 14, 1865.
9. Yang, X.; Yan, X. Mechanism and a new algorithm for nonconvex clustering. J. Stat. Comput. Simul. 2020, 90, 719–746.
10. Paul, D.; Chakraborty, S.; Das, S.; Xu, J. Implicit annealing in kernel spaces: A strongly consistent clustering approach. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5862–5871.
11. Shah, S.A.; Koltun, V. Robust continuous clustering. Proc. Natl. Acad. Sci. USA 2017, 114, 9814–9819.
12. Hocking, T.D.; Joulin, A.; Bach, F.; Vert, J.P. Clusterpath: An algorithm for clustering using convex fusion penalties. In Proceedings of the 28th International Conference on Machine Learning, Washington, DC, USA, 28 June–2 July 2011; p. 1.
13. Radchenko, P.; Mukherjee, G. Convex clustering via l1 fusion penalization. J. R. Stat. Soc. B 2017, 79, 1527–1546.
14. Wang, B.; Zhang, Y.; Sun, W.W.; Fang, Y. Sparse convex clustering. J. Comput. Graph. Stat. 2018, 27, 393–403.
15. Yan, X.; Yin, G.; Zhao, X. Subgroup analysis in censored linear regression. Stat. Sin. 2021, 31, 1027–1054.
16. Yan, X.; Wang, H.; Zhou, Y.; Yan, J.; Wang, Y.; Wang, W.; Xie, J.; Yang, S.; Zeng, Z.; Chen, X. Heterogeneous logistic regression for estimation of subgroup effects on hypertension. J. Biopharm. Stat. 2022, 32, 969–985.
17. Zhu, C.; Xu, H.; Leng, C.; Yan, S. Convex optimization procedure for clustering: Theoretical revisit. Adv. Neural Inf. Process. Syst. 2014, 1619–1627.
18. Ma, S.; Huang, J. A concave pairwise fusion approach to subgroup analysis. J. Am. Stat. Assoc. 2017, 112, 410–423.
19. Ma, S.; Huang, J. Estimating subgroup-specific treatment effects via concave fusion. arXiv 2016, arXiv:1607.03717.
20. Marchetti, Y.; Zhou, Q. Iterative subsampling in solution path clustering of noisy big data. arXiv 2014, arXiv:1412.1559.
21. Achlioptas, D. Database-friendly random projections. In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, USA, 21–23 May 2001; pp. 274–281.
22. Ailon, N.; Chazelle, B. The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 2009, 39, 302–322.
23. Bingham, E.; Mannila, H. Random projection in dimensionality reduction: Applications to image and text data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 245–250.
24. Kane, D.M.; Nelson, J. Sparser Johnson–Lindenstrauss transforms. J. ACM 2014, 61, 1–23.
25. Tibshirani, R.; Walther, G. Cluster validation by prediction strength. J. Comput. Graph. Stat. 2005, 14, 511–528.
26. Fan, J.; Lv, J. Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inf. Theory 2011, 57, 5467–5484.
27. Zhang, C.-H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
28. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
29. Ghadimi, E.; Teixeira, A.; Shames, I.; Johansson, M. Optimal parameter selection for the alternating direction method of multipliers (ADMM): Quadratic problems. IEEE Trans. Autom. Control 2014, 60, 644–658.
30. Liu, B.; Shen, X.; Pan, W. Integrative and regularized principal component analysis of multiple sources of data. Stat. Med. 2016, 35, 2235–2250.
31. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
32. Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850.
33. Zheng, P.; Aravkin, A. Relax-and-split method for nonconvex inverse problems. Inverse Probl. 2020, 36, 095013.
34. Chakraborty, S.; Xu, J. Biconvex clustering. J. Comput. Graph. Stat. 2023, 32, 1524–1536.
SI, RI and run time for classical ADMM, RRP-ADMM and RS-ADMM under four choices of sample size after 100 replicates (the sample-size values did not survive extraction):

| Sample Size | ADMM SI | ADMM RI | ADMM Time | RRP-ADMM SI | RRP-ADMM RI | RRP-ADMM Time | RS-ADMM SI | RS-ADMM RI | RS-ADMM Time |
|---|---|---|---|---|---|---|---|---|---|
| — | 0.081 | 0.921 | 7 | 0.059 | 0.933 | 2 | 0.080 | 0.925 | 10 |
| — | 0.058 | 0.945 | 88 | 0.046 | 0.957 | 7 | 0.056 | 0.947 | 121 |
| — | 0.049 | 0.962 | 352 | 0.045 | 0.974 | 17 | 0.047 | 0.966 | 551 |
| — | 0.042 | 0.973 | 1582 | 0.040 | 0.986 | 41 | 0.042 | 0.978 | 1864 |
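The RI column above is the Rand index [32], which can be computed directly from two label vectors. This is a standard implementation, not code from the paper.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index (Rand, 1971): the fraction of observation pairs on which
    two partitions agree, i.e. both place the pair in the same cluster or
    both place it in different clusters."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because it only compares co-membership of pairs, the index is invariant to relabeling of the clusters, which is why it is suitable for comparing an estimated partition with the truth.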
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, H.; Li, N.; Zhou, Y.; Yan, J.; Jiang, B.; Kong, L.; Yan, X. Fast Fusion Clustering via Double Random Projection. Entropy 2024, 26, 376. https://doi.org/10.3390/e26050376