Abstract
Cell clustering is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. Feature selection is the first step of unsupervised clustering, and its quality directly affects the quality of the analysis that follows. Because scRNA-seq gene expression data are high dimensional, choosing high-quality features is difficult. Feature selection is often applied to gene expression data to retain highly expressed features, that is, a subset of the original features. Typical approaches either retain a fixed percentage of features or set a fixed number of features based on experience. Because these choices are subjective, they cannot guarantee high-quality clustering results. In this study, we propose a feature selection method, scMFSA, that overcomes the one-dimensional limitation of the traditional PCA method by selecting multiple top-ranked feature sets. The similarity matrix constructed from each feature set is enhanced by affinity to optimize feature learning. Finally, experiments are carried out on real scRNA-seq datasets using the features selected by scMFSA. The results indicate that, when combined with clustering methods, the features chosen by scMFSA can improve clustering accuracy. scMFSA can therefore be an effective tool for researchers analyzing scRNA-seq data.
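The general idea of building one similarity matrix per feature set and combining them before spectral clustering can be illustrated with a minimal sketch. This is not the scMFSA implementation: the abstract does not specify the selection criterion or the affinity enhancement, so variance ranking, a Gaussian kernel with a median bandwidth, and plain averaging are assumed stand-ins, and all function names (`top_variance_genes`, `gaussian_affinity`, `fused_affinity`, `spectral_embedding`, `kmeans`) are hypothetical. The data are synthetic.

```python
import numpy as np

def top_variance_genes(X, k):
    """Indices of the k genes (columns) with the largest variance (assumed criterion)."""
    return np.argsort(X.var(axis=0))[::-1][:k]

def gaussian_affinity(X):
    """Cell-by-cell Gaussian similarity; bandwidth is the median squared distance."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / sigma2)

def fused_affinity(X, sizes):
    """Average the affinities built from several feature sets (assumed fusion rule)."""
    mats = [gaussian_affinity(X[:, top_variance_genes(X, k)]) for k in sizes]
    return np.mean(mats, axis=0)

def spectral_embedding(W, n_clusters):
    """Row-normalized leading eigenvectors of D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]            # keep the largest n_clusters
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, n_clusters, n_iter=50):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Synthetic example: 20 cells x 50 genes, two groups that differ in 5 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
X[:10, :5] += 10.0                       # group A is shifted in the first 5 genes

W = fused_affinity(X, sizes=[5, 10, 20]) # three feature sets of different size
labels = kmeans(spectral_embedding(W, n_clusters=2), n_clusters=2)
print(labels)
```

Averaging is only the simplest way to combine the per-set similarity matrices; a weighted or iterative scheme could be substituted in `fused_affinity` without changing the rest of the sketch.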
Acknowledgements
This work has been supported by the National Natural Science Foundation of China (61902216 and 61972226).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, Y., Li, F., Shang, J., Ge, D., Ren, Q., Li, S. (2023). Spectral Clustering of Single-Cell RNA-Sequencing Data by Multiple Feature Sets Affinity. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_23
DOI: https://doi.org/10.1007/978-981-99-4749-2_23
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4748-5
Online ISBN: 978-981-99-4749-2
eBook Packages: Computer Science, Computer Science (R0)