Abstract
The problem of feature selection has been an area of considerable research in machine learning. Feature selection is known to be particularly difficult in unsupervised learning because different subgroups of features can yield useful insights into the same dataset. In other words, many theoretically correct answers may exist for the same problem. Furthermore, designing algorithms for unsupervised feature selection is technically harder than designing them for supervised feature selection, because unsupervised algorithms cannot be guided by class labels. As a result, previous work has attempted to discover the intrinsic structure of data through heavy computation such as matrix decomposition, and requires significant time to find even a single solution. This paper proposes a novel algorithm, named Explainability-based Unsupervised Feature Value Selection (EUFVS), which enables a paradigm shift in feature selection and solves all of these problems. EUFVS requires only a few tens of milliseconds for datasets with thousands of features and instances, which makes it possible to generate a large number of candidate solutions and select the one that fits best. Another important advantage of EUFVS is that it selects feature values instead of features; feature values can explain phenomena in data better than whole features. This paper explains the theoretical advantages of EUFVS and demonstrates its applications in real experiments. In our experiments with labeled datasets, EUFVS found feature value sets that explain the labels, and also detected useful relationships between feature value sets that could not be detected from the given class labels.
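To make the distinction between selecting features and selecting feature values concrete, the following minimal Python sketch treats each (feature, value) pair as its own binary item and records the instances in which it holds. This is not the EUFVS algorithm; the function name `feature_value_indicators` and the toy data are hypothetical and serve only to illustrate the finer granularity that feature value selection works with.

```python
from collections import defaultdict


def feature_value_indicators(rows, feature_names):
    """Map each (feature, value) pair to the set of row indices where it holds.

    Hypothetical helper for illustration: each key is one "feature value",
    i.e. a binary item that is either present or absent in an instance.
    """
    index = defaultdict(set)
    for i, row in enumerate(rows):
        for name, value in zip(feature_names, row):
            index[(name, value)].add(i)
    return dict(index)


# Toy categorical data (hypothetical, for illustration only).
features = ["outlook", "windy"]
rows = [
    ("sunny", "no"),
    ("sunny", "yes"),
    ("rain", "no"),
    ("overcast", "no"),
]

fv = feature_value_indicators(rows, features)

# Selecting the feature "outlook" keeps all three of its values,
# whereas selecting the feature value ("outlook", "sunny") keeps a single item.
print(sorted(fv[("outlook", "sunny")]))  # [0, 1]
print(len(fv))  # 5 distinct feature values: 3 for outlook, 2 for windy
```

In this representation, a selected set of feature values can describe a subgroup of instances (for example, the rows where outlook is sunny) without committing to every value of the underlying feature.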
Acknowledgements
This work was partially supported by the Grant-in-Aid for Scientific Research (JSPS KAKENHI Grant Numbers 16K12491 and 17H00762) from the Japan Society for the Promotion of Science.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Shin, K. et al. (2021). Unsupervised Feature Value Selection Based on Explainability. In: Rocha, A.P., Steels, L., van den Herik, J. (eds) Agents and Artificial Intelligence. ICAART 2020. Lecture Notes in Computer Science, vol. 12613. Springer, Cham. https://doi.org/10.1007/978-3-030-71158-0_20
DOI: https://doi.org/10.1007/978-3-030-71158-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71157-3
Online ISBN: 978-3-030-71158-0
eBook Packages: Computer Science, Computer Science (R0)