Machine Learning Fusion Multi-Source Data Features for Classification Prediction of Lunar Surface Geological Units
Figure 1. Flow chart of geological unit classification prediction based on machine learning combined with multi-source data features.
Figure 2. Image and geological unit distribution of the two study areas. (a,b) Chang’e-4 landing area. (c,d) Chang’e-5 landing area.
Figure 3. An illustration of the confusion matrix.
Figure 4. Feature importance rankings for the two study areas. (a,c) Feature importance scores for the Chang’e-4 and Chang’e-5 study areas, respectively, based on the nearest neighbor model. (b,d) Feature importance scores for the Chang’e-4 and Chang’e-5 study areas, respectively, based on the XGBoost ensemble algorithm.
Figure 5. Summary of the accuracy, precision, recall and F1-score obtained by each classification model for the two study areas. (a–d) Chang’e-4 landing area; (e–h) Chang’e-5 landing area.
Figure 6. Classification evaluation report and Recall–Confusion Matrix of the classification model with the highest accuracy for each study area. (a,b) Classification results for the Chang’e-4 landing area obtained with XGBoost + DataSet_4. (c,d) Classification results for the Chang’e-5 landing area obtained with XGBoost + DataSet_9.
Figure 7. Mean accuracy statistics for the 6 machine learning algorithms and 16 datasets. (a,b) Chang’e-4 area. (c,d) Chang’e-5 area.
Figure 8. Feature distributions and statistics for geological unit classification. (a) Topographic elevation distribution of the CE-4 landing area. (b) Olivine abundance of the CE-4 landing area. (c) Plagioclase content of the CE-4 landing area. (d) TiO2 content of the CE-4 landing area. (e) Relief of the CE-5 landing area. (f) Olivine content of the CE-5 landing area. (g) Plagioclase content of the CE-5 landing area. (h) TiO2 distribution of the CE-5 landing area. (i) Elevation statistics of the classified geological units for the CE-4 landing area. (j) Elevation statistics of the classified geological units for the CE-5 landing area. (k) Relief statistics of the classified geological units for the CE-4 landing area. (l) Relief statistics of the classified geological units for the CE-5 landing area. (m) Olivine and plagioclase statistics of the classified geological units for the CE-4 landing area. (n) Olivine and plagioclase statistics of the classified geological units for the CE-5 landing area. (o) TiO2 statistics of the classified geological units for the CE-4 landing area. (p) TiO2 statistics of the classified geological units for the CE-5 landing area.
Figure 9. Visualization of the geological unit classification prediction results for the two study areas. (a) Chang’e-4 landing area. (b) Chang’e-5 landing area.
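Figures 3, 5 and 6 report accuracy, precision, recall and F1-score, all of which derive from a confusion matrix. As a minimal sketch (standard library only; the class labels and predictions below are hypothetical, not from the study), the metrics can be computed from a matrix whose rows are true classes and whose columns are predicted classes. The `row_normalize` helper assumes that the "Recall–Confusion Matrix" of Figure 6 is normalized by true class so that its diagonal holds per-class recall; the caption does not state this explicitly.

```python
from typing import Dict, List, Sequence

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> Matrix:
    """Count samples per (true class, predicted class); rows = true, columns = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix: Matrix) -> float:
    """Share of all samples that lie on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def per_class_metrics(matrix: Matrix, labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall and F1-score for each class."""
    n = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # other classes predicted as `label`
        fn = sum(matrix[i]) - tp                       # `label` samples predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def row_normalize(matrix: Matrix) -> List[List[float]]:
    """Divide each row by its total so the diagonal equals per-class recall (assumed figure style)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical example with three made-up unit labels.
labels = ["A", "B", "C"]
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm)  # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
print(accuracy(cm))
print(per_class_metrics(cm, labels)["A"])
```

Computing precision and recall one-vs-rest per class is what makes these metrics usable for the multi-class geological unit problem; averaging them across classes (macro or weighted) is a separate choice that the sketch leaves open.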
Abstract
1. Introduction
2. Methods
2.1. Study Regions
2.2. Feature Extraction
2.3. Target Classification
2.4. Prediction Assessment
3. Results
3.1. Constructed Feature Dataset
3.2. Chang’e-4 Landing Area
3.3. Chang’e-5 Landing Area
4. Discussions
4.1. Comparison of Classification Models
4.2. Feature Selection and Correlation Analysis
4.3. Applications of Classification Prediction
5. Conclusions
- (1) Classification models: The constructed classification models achieved high classification accuracies of 95.1% for the Chang’e-4 landing area and 97.9% for the Chang’e-5 landing area, two heterogeneous and complex study areas with multiple geological unit classes. This demonstrates the effectiveness of classification models that combine machine learning algorithms with data features (e.g., topography, geomorphology, mineral abundance and material composition) for the classification prediction of geological units. On the one hand, all six selected machine learning algorithms show a strong multi-class classification ability; among them, XGBoost, CatBoost, Bagging and GradientBoosting performed best, and XGBoost, which achieved the best classification performance, can serve as the preferred classifier for subsequent work. On the other hand, the composition of the feature dataset, i.e., which feature variables are combined, has an important influence on the accuracy of geological unit classification prediction. Compared with tuning the hyperparameters of the machine learning algorithm, building an effective feature dataset through feature combination is a more effective way to improve classification prediction accuracy.
- (2) Feature selection: Several important features, namely ‘Elevation’, ‘Relief’, ‘TiO2’, ‘Plagioclase’, ‘Olivine’ and ‘FeO’, were selected using two feature selection methods: statistics-based mutual information estimation and model-based feature importance evaluation with machine learning algorithms. Classification models built on combinations of these features achieved high-accuracy geological unit classification prediction. These features also effectively reflect the apparent variations in topography, geomorphology, material composition and mineral abundance across the study areas, which deepens our understanding of the formation and evolution of the Moon. It should be noted that although the final classification prediction results verify the effectiveness of the feature selection methods, the features selected in this study are not the only usable features, given the diversity of feature selection methods. The effectiveness of other features and their combinations is still worth exploring. Therefore, our future work will focus on mining more effective feature variables to obtain more accurate classification prediction results and on in-depth correlation analysis between data features and geological units.
- (3) Application of the method: The developed method is flexible, efficient and extensible. It can be applied to geological unit classification prediction with any lunar geological map data and in any region of the Moon covered by the input feature data. The method can not only be applied to global digital mapping of the lunar surface, but can also provide effective support for the automatic mapping of geological units in any region. In addition, effective feature variables can be mined through classification prediction, which supports in-depth, comprehensive analysis of geological units in lunar surface areas of any size. The classification method can also be applied to the classification of lunar surface chronological units. Next, we will attempt to mine the association rules of geochronological units on the global lunar surface based on the results of this work. A lunar surface chronology and quantitative analysis model based on machine learning of multiple feature variables will also be a central focus of our future work.
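Conclusion (2) refers to statistics-based mutual information estimation for screening features. The Figure 4 caption mentions a nearest neighbor model, and k-nearest-neighbor estimators are a common choice for continuous features; the sketch below instead swaps in a simpler plug-in (histogram) estimator on equal-width bins, in pure Python, only to illustrate how mutual information ranks features against geological unit labels. The feature names follow the feature table; the values, the bin count and the unit labels "mare" and "highland" are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Discretize a continuous feature into n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant feature falls entirely into bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired discrete observations."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum over cells of p(x, y) * log(p(x, y) / (p(x) * p(y))), written with counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def rank_features(
    features: Dict[str, Sequence[float]], labels: Sequence[Hashable], n_bins: int = 10
) -> List[Tuple[str, float]]:
    """Score every feature against the unit labels and sort from most to least informative."""
    scores = {
        name: mutual_information(equal_width_bins(vals, n_bins), labels)
        for name, vals in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples: elevation separates the two made-up units, grayscale does not.
labels = ["mare", "mare", "highland", "highland"]
features = {
    "Elevation": [-2000.0, -1900.0, 1500.0, 1600.0],
    "Gray": [1.0, 2.0, 1.0, 2.0],
}
print(rank_features(features, labels, n_bins=2))
```

Here 'Elevation' scores log 2 nats (it fully determines the two labels) and 'Gray' scores zero. Histogram estimates depend on the bin count, which is one reason kNN-based estimators are often preferred for continuous remote-sensing features.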
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Fortezzo, C.M.; Spudis, P.D.; Harrel, S.L. Release of the digital unified global geologic map of the Moon at 1:5,000,000 scale. Lunar Planet. Sci. Conf. 2020, 2326, 2760.
- Ouyang, Z.Y.; Liu, J.Z. The origin and evolution of the Moon and its geological mapping. Earth Sci. Front. 2014, 21, 1–6. (In Chinese)
- Ling, Z.C.; Liu, J.Z.; Zhang, J.; Li, B.; Wu, Z.C.; Ni, Y.H.; Sun, L.Z. The lunar rock types as determined by Chang’E-1 IIM data: A case study of Mare Imbrium-Mare Frigoris region (LQ-4). Adv. Earth Sci. 2014, 21, 107–120. (In Chinese)
- Ding, X.Z.; Wang, L.; Han, K.Y.; Pang, J.F.; Liu, J.Z.; Guo, D.J.; Ding, W.W.; Ju, Y.J. The lunar digital geological mapping based on ArcGIS: Taking the arctic region as an example. Adv. Earth Sci. 2014, 21, 19–30. (In Chinese)
- Cheng, W.M.; Liu, Q.Y.; Wang, J.; Gao, W.X.; Liu, J.Z. A preliminary study of classification method on lunar topography and landforms. Adv. Earth Sci. 2018, 33, 885–897. (In Chinese)
- Cracknell, M.J.; Reading, A.M. Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput. Geosci. 2014, 63, 22–33.
- Harris, J.R.; Grunsky, E.C. Predictive lithological mapping of Canada’s North using Random Forest classification applied to geophysical and geochemical data. Comput. Geosci. 2015, 80, 9–25.
- Zheng, Y. Research on Lithology Recognition Based on Deep Learning. Ph.D. Thesis, China University of Petroleum, Beijing, China, 2017. (In Chinese with English abstract).
- Othman, A.A.; Gloaguen, R. Integration of spectral, spatial and morphometric data into lithological mapping: A comparison of different Machine Learning Algorithms in the Kurdistan Region, NE Iraq. J. Asian Earth Sci. 2017, 146, 90–102.
- Kuhn, S.; Cracknell, M.J.; Reading, A.M. Lithologic mapping using Random Forests applied to geophysical and remote-sensing data: A demonstration study from the Eastern Goldfields of Australia. Geophysics 2018, 83, B183–B193.
- Zhang, Y.; Sun, J.; Yu, C.C.; Meng, P.Y.; Guo, Z. Classification of quaternary coverings in desert grassland shallow cover area based on multi-source remote sensing data: A case of 1:50000 pilot geological mapping in Qigandiani, Inner Mongolia. Bull. Geol. Sci. Technol. 2019, 38, 281–290. (In Chinese)
- Duan, Y.X.; Zhao, Y.S.; Ma, C.F.; Jiang, W.X. Lithology identification method based on multi-layer ensemble learning. J. Data Acquis. Process. 2020, 35, 572–581. (In Chinese)
- Zhu, M.Y.; Li, B.Q.; Fu, H.Z.; Chen, C.; Gao, M. SVM lithological classification based on multi-source data collaboration: A case study in Jianggalesayi area. Uranium Geol. 2020, 36, 288–292+317. (In Chinese)
- Wang, J. Mineral Assemblages Mapping of Porphyry Copper Deposits Based on Normalized Multispectral Remote Sensing Data in the Dulong Ore Concentrating Area. Ph.D. Thesis, Chengdu University of Technology, Chengdu, China, 2018. (In Chinese with English abstract).
- Wang, Q.; Lin, B.; Tang, J.X.; Song, Y.; Li, Y.B.; Hou, J.F.; Wei, L.J. Diagenesis, lithogenesis and geodynamic setting of intrusions in Senadong Area, Duolong district, Tibet. Earth Sci. 2018, 43, 1125–1141.
- Wu, G.; Chen, G.; Cheng, Q.; Zhang, Z.; Yang, J. Unsupervised machine learning for lithological mapping using geochemical data in covered areas of Jining, China. Nat. Resour. Res. 2021, 30, 1053–1068.
- Li, C.; Hu, H.; Yang, M.F.; Pei, Z.Y.; Zhou, Q.; Ren, X.; Liu, B.; Liu, D.; Zeng, X.; Zhang, G.; et al. Characteristics of the lunar samples returned by the Chang’E-5 mission. Natl. Sci. Rev. 2022, 9, nwab188.
- Qian, Y.Q.; Xiao, L.; Zhao, S.Y. Geology and scientific significance of the Rümker region in northern Oceanus Procellarum: China’s Chang’E-5 landing region. J. Geophys. Res. Planets 2018, 123, 1407–1430.
- Liu, J.; Ren, X.; Yan, W. Descent trajectory reconstruction and landing site positioning of Chang’E-4 on the lunar farside. Nat. Commun. 2019, 10, 4229.
- Di, K.; Liu, Z.; Liu, B.; Wan, W.; Peng, M.; Wang, Y.; Gou, S.; Yue, Z.; Xin, X.; Jia, M.; et al. Chang’e-4 lander localization based on multi-source data. J. Remote Sens. 2019, 23, 177–184.
- Li, C.; Liu, D.; Liu, B.; Ren, X.; Liu, J.; He, Z.; Zuo, W.; Zeng, X.; Xu, R.; Tan, X.; et al. Chang’E-4 initial spectroscopic identification of lunar far-side mantle-derived materials. Nature 2019, 569, 378–382.
- Ohtake, M.; Pieters, C.M.; Isaacson, P.; Besse, S.; Yokota, Y.; Matsunaga, T.; Boardman, J.; Yamamoto, S.; Haruyama, J.; Staid, M.; et al. One Moon, many measurements 3: Spectral reflectance. Icarus 2013, 226, 364–374.
- Li, C.L.; Liu, J.J.; Ren, X.; Mou, L.L.; Zou, Y.L.; Zhang, H.B.; Lü, C.; Liu, J.Z.; Zuo, W.; et al. The global image of the moon by the Chang’E-1: Data processing and lunar cartography. Sci. China Earth Sci. 2010, 53, 1091–1102.
- Zuo, W.; Li, C.; Zhang, Z.; Zeng, X.; Liu, Y.; Xiong, Y. China’s Lunar and Planetary Data System: Preserve and Present Reliable Chang’e Project and Tianwen-1 Scientific. Space Sci. Rev. 2021, 217, 88.
- Li, C.; Ren, X.; Liu, J.; Zou, X.; Mu, L.; Wang, J.; Shu, R.; Zou, Y.; Zhang, H.; Lü, C.; et al. Laser altimetry data of Chang’E-1 and the global lunar DEM model. Sci. China Earth Sci. 2010, 53, 1582–1593.
- Sato, H.; Robinson, M.S.; Lawrence, S.J.; Denevi, B.W.; Hapke, B.; Jolliff, B.L.; Hiesinger, H. Lunar Mare TiO2 Abundances Estimated from UV/Vis Reflectance. Icarus 2017, 296, 216–238.
- Lemelin, M.; Lucey, P.G.; Song, E.; Taylor, G.J. Lunar central peak mineralogy and iron content using the Kaguya Multiband Imager: Reassessment of the compositional structure of the lunar crust. J. Geophys. Res. Planets 2015, 120, 869–887.
- Gahegan, M. On the application of inductive machine learning tools to geographical analysis. Geogr. Anal. 2000, 32, 113–139.
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
- Kovacevic, M.; Bajat, B.; Trivic, B.; Pavlovic, R. Geological units classification of multispectral images by using Support Vector Machines. In Proceedings of the International Conference on Intelligent Networking and Collaborative Systems, Barcelona, Spain, 4–6 November 2009; pp. 267–272.
- Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185.
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
- Hsu, C.W.; Chang, C.C.; Lin, C.J. A Practical Guide to Support Vector Classification; Department of Computer Science, National Taiwan University: Taipei, Taiwan, 2003.
- Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674.
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
No | Feature Name | Feature Definition | Data Source Description |
---|---|---|---|
1 | Longitude | Longitude of the center of the sample point, ranging from −180 to 180 degrees | The global digital orthophoto map (DOM) data were acquired by the Chang’e-2 CCD camera [23,24]. The resolution of the data used is 20 m. |
2 | Latitude | Latitude coordinates of the center of the sample point, ranging from −90 to 90 degrees | |
3 | Gray | Grayscale value of the image pixel where the sample point is located | |
4 | Elevation | Elevation of the pixel where the sample point is located, taken from the DEM data | The global DEM data were acquired by the Chang’e-2 CCD camera [24,25]. The resolution of the data used in this paper is 20 m. |
5 | Relief | Difference between the maximum and minimum elevation values within the 3 × 3 window centered on the pixel where the sample point is located (that pixel and its eight neighbors) | |
6 | Slope | Rate of change of elevation from the pixel where the sample point is located to its neighbors, calculated as $\mathrm{Slope} = \arctan\sqrt{f_x^2 + f_y^2}$, where $f_x$ and $f_y$ denote the partial derivatives of elevation in the x and y directions, respectively | |
7 | TiO2 | TiO2 content of the pixel where the sample point is located | TiO2 content data are derived from the Wide Angle Camera (WAC) of the Lunar Reconnaissance Orbiter Camera (LROC) on NASA’s Lunar Reconnaissance Orbiter (LRO) [26], with data coverage from 0 to 360 degrees longitude and −70 to 70 degrees latitude. The resolution of the data used is 400 m/pixel. |
8 | FeO | FeO content of the pixel where the sample point is located | Multispectral images of the lunar surface were acquired by the Kaguya Multiband Imager (MI) at five wavelengths in the ultraviolet-visible band (UVVIS; 415, 750, 900, 950, 1001 nm) and four wavelengths in the near-infrared band (NIR; 1000, 1050, 1250, 1550 nm). FeO content, the contents of four common minerals (clinopyroxene, orthopyroxene, plagioclase and olivine), submicroscopic metallic iron (SMFe) abundance, and optical maturity (OMAT) data were derived from these images, with data coverage from 0 to 360 degrees longitude and −50 to 50 degrees latitude [27]. The resolution of the data used is 59 m/pixel. |
9 | SMFe | Submicroscopic metallic iron (SMFe) content of the pixel where the sample point is located | |
10 | Clinopyroxene | Clinopyroxene content of the pixel where the sample point is located | |
11 | Orthopyroxene | Orthopyroxene content of the pixel where the sample point is located | |
12 | Plagioclase | Plagioclase content of the pixel where the sample point is located | |
13 | Olivine | Olivine content of the pixel where the sample point is located | |
14 | OMAT | Optical maturity of the pixel where the sample point is located | |
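The Relief and Slope definitions in the feature table can be illustrated on a small elevation grid. This is a minimal sketch under stated assumptions: interior pixels only (no edge handling), Relief taken over the 3 × 3 window including the center pixel, and the partial derivatives $f_x$ and $f_y$ approximated by central differences with a 20 m cell size matching the listed DEM resolution. GIS packages often use other difference schemes (for example Horn's method), so their slope values can differ slightly.

```python
import math
from typing import List

Grid = List[List[float]]  # elevation in metres, indexed as dem[row][col]


def relief(dem: Grid, row: int, col: int) -> float:
    """Maximum minus minimum elevation in the 3 x 3 window centred on (row, col)."""
    window = [dem[r][c] for r in range(row - 1, row + 2) for c in range(col - 1, col + 2)]
    return max(window) - min(window)


def slope_degrees(dem: Grid, row: int, col: int, cell_size: float = 20.0) -> float:
    """Slope = arctan(sqrt(fx^2 + fy^2)); fx, fy from central differences (assumed scheme)."""
    fx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    fy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(fx, fy)))


# Hypothetical plane rising 20 m per 20 m cell toward +x.
dem = [
    [0.0, 20.0, 40.0],
    [0.0, 20.0, 40.0],
    [0.0, 20.0, 40.0],
]
print(relief(dem, 1, 1))
print(slope_degrees(dem, 1, 1))
```

For this plane the center pixel has a relief of 40 m and a slope of 45 degrees, since $f_x = 1$ and $f_y = 0$.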
Dataset Name | Abbreviations | Feature Combinations of Datasets |
---|---|---|
DataSet_1 | DS1 | ‘Longitude’, ‘Latitude’, ‘FeO’ |
DataSet_2 | DS2 | ‘Longitude’, ‘Latitude’, ‘TiO2’ |
DataSet_3 | DS3 | ‘Longitude’, ‘Latitude’, ‘Plagioclase’ |
DataSet_4 | DS4 | ‘Longitude’, ‘Latitude’, ‘Elevation’ |
DataSet_5 | DS5 | ‘Longitude’, ‘Latitude’, ‘Relief’ |
DataSet_6 | DS6 | ‘Longitude’, ‘Latitude’, ‘Olivine’ |
DataSet_7 | DS7 | ‘Longitude’, ‘Latitude’, ‘Relief’, ‘TiO2’ |
DataSet_8 | DS8 | ‘Longitude’, ‘Latitude’, ‘Elevation’, ‘TiO2’ |
DataSet_9 | DS9 | ‘Longitude’, ‘Latitude’, ‘Relief’, ‘TiO2’, ‘Olivine’ |
DataSet_10 | DS10 | ‘Longitude’, ‘Latitude’, ‘Elevation’, ‘TiO2’, ‘Olivine’ |
DataSet_11 | DS11 | ‘Longitude’, ‘Latitude’, ‘Relief’, ‘TiO2’, ‘FeO’ |
DataSet_12 | DS12 | ‘Longitude’, ‘Latitude’, ‘Elevation’, ‘TiO2’, ‘FeO’ |
DataSet_13 | DS13 | ‘Longitude’, ‘Latitude’, ‘Relief’, ‘TiO2’, ‘Plagioclase’ |
DataSet_14 | DS14 | ‘Longitude’, ‘Latitude’, ‘Elevation’, ‘TiO2’, ‘Plagioclase’ |
DataSet_15 | DS15 | ‘Longitude’, ‘Latitude’, ‘Relief’, ‘TiO2’, ‘Plagioclase’, ‘FeO’ |
DataSet_16 | DS16 | ‘Longitude’, ‘Latitude’, ‘Elevation’, ‘TiO2’, ‘FeO’, ‘Plagioclase’ |
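Each dataset above is a column subset of the feature table, always keeping the sample coordinates. A minimal sketch of how the 16 combinations might be encoded and applied to per-sample feature records; the `build_matrix` helper and the sample values are hypothetical, since the study's data pipeline is not described here.

```python
from typing import Dict, List, Sequence, Tuple

BASE = ("Longitude", "Latitude")  # every dataset keeps the sample coordinates

# Feature combinations transcribed from the dataset table.
DATASETS: Dict[str, Tuple[str, ...]] = {
    "DS1": BASE + ("FeO",),
    "DS2": BASE + ("TiO2",),
    "DS3": BASE + ("Plagioclase",),
    "DS4": BASE + ("Elevation",),
    "DS5": BASE + ("Relief",),
    "DS6": BASE + ("Olivine",),
    "DS7": BASE + ("Relief", "TiO2"),
    "DS8": BASE + ("Elevation", "TiO2"),
    "DS9": BASE + ("Relief", "TiO2", "Olivine"),
    "DS10": BASE + ("Elevation", "TiO2", "Olivine"),
    "DS11": BASE + ("Relief", "TiO2", "FeO"),
    "DS12": BASE + ("Elevation", "TiO2", "FeO"),
    "DS13": BASE + ("Relief", "TiO2", "Plagioclase"),
    "DS14": BASE + ("Elevation", "TiO2", "Plagioclase"),
    "DS15": BASE + ("Relief", "TiO2", "Plagioclase", "FeO"),
    "DS16": BASE + ("Elevation", "TiO2", "FeO", "Plagioclase"),
}


def build_matrix(samples: Sequence[Dict[str, float]], dataset: str) -> List[List[float]]:
    """Project each sample (a dict keyed by feature name) onto one dataset's columns, in order."""
    columns = DATASETS[dataset]
    return [[sample[name] for name in columns] for sample in samples]


# Hypothetical sample record; the values are made up for illustration.
sample = {
    "Longitude": 10.0, "Latitude": 20.0, "Elevation": -1500.0, "Relief": 12.0,
    "TiO2": 3.0, "FeO": 15.0, "Plagioclase": 40.0, "Olivine": 5.0,
}
print(build_matrix([sample], "DS4"))
print(build_matrix([sample], "DS9"))
```

Keeping the combinations in one mapping makes it straightforward to loop every classifier over every dataset, which is the kind of algorithm-by-dataset comparison summarized in Figure 7.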
Study Regions | Classification Model | Number of Training Samples | Number of Test Samples | Proportion of Known Information | Accuracy |
---|---|---|---|---|---|
CE-4 | XGBoost + DataSet_4 | 16326 | 6997 | 70% | 95.1% |
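CE-4 | XGBoost + DataSet_4 | 13993 | 9330 | 60% | 94.6% |
CE-4 | XGBoost + DataSet_4 | 11662 | 11662 | 50% | 93.9% |
CE-5 | XGBoost + DataSet_9 | 12944 | 5548 | 70% | 97.9% |
CE-5 | XGBoost + DataSet_9 | 11095 | 7397 | 60% | 97.6% |
CE-5 | XGBoost + DataSet_9 | 9246 | 9246 | 50% | 97.3% |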
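The training and test counts above correspond to holding out a fixed share of the labelled samples for testing. A minimal sketch, assuming the test-set size is rounded up (the rule scikit-learn's `train_test_split` applies to a fractional `test_size`); the totals of 23,323 (CE-4) and 18,492 (CE-5) samples are taken from the 70% rows, and the `random_split` helper and its seed are illustrative, not the study's code.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sizes(n_samples: int, test_fraction: float) -> Tuple[int, int]:
    """Return (n_train, n_test), rounding the test-set size up."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


def random_split(samples: Sequence[T], test_fraction: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and cut it into training and test parts."""
    n_train, _ = split_sizes(len(samples), test_fraction)
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Totals implied by the 70% rows: 16326 + 6997 (CE-4) and 12944 + 5548 (CE-5).
for area, total in (("CE-4", 23323), ("CE-5", 18492)):
    for known in (0.7, 0.6, 0.5):
        print(area, f"{known:.0%}", split_sizes(total, round(1 - known, 2)))
```

Under this rounding rule the sketch reproduces the 70% and 60% rows for both areas and the 50% row for CE-5. Only a few accuracy points separate the 70% and 50% rows, which suggests the models remain accurate even when half of the map is treated as unknown.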
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zuo, W.; Zeng, X.; Gao, X.; Zhang, Z.; Liu, D.; Li, C. Machine Learning Fusion Multi-Source Data Features for Classification Prediction of Lunar Surface Geological Units. Remote Sens. 2022, 14, 5075. https://doi.org/10.3390/rs14205075
Zuo W, Zeng X, Gao X, Zhang Z, Liu D, Li C. Machine Learning Fusion Multi-Source Data Features for Classification Prediction of Lunar Surface Geological Units. Remote Sensing. 2022; 14(20):5075. https://doi.org/10.3390/rs14205075
Zuo, Wei, Xingguo Zeng, Xingye Gao, Zhoubin Zhang, Dawei Liu, and Chunlai Li. 2022. "Machine Learning Fusion Multi-Source Data Features for Classification Prediction of Lunar Surface Geological Units" Remote Sensing 14, no. 20: 5075. https://doi.org/10.3390/rs14205075