Abstract
In this paper we present a method to compute dissimilarities on unlabeled data, based on extremely randomized trees. This method, Unsupervised Extremely Randomized Trees, is used jointly with a novel randomized labeling scheme we describe here, and that we call AddCl3. Unlike existing methods such as AddCl1 and AddCl2, no synthetic instances are generated, thus avoiding an increase in the size of the dataset. The empirical study of this method shows that Unsupervised Extremely Randomized Trees with AddCl3 provides competitive results regarding the quality of resulting clusterings, while clearly outperforming previous similar methods in terms of running time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
- 3.
\(m_{try}\) is the number of variables used at each node when a tree is grown in RF.
References
Abba, M.C., et al.: Breast cancer molecular signatures as determined by sage: correlation with lymph node status. Mol. Cancer Res. 5(9), 881–890 (2007)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Deza, M.M., Deza, E.: Encyclopedia of distances. Encyclopedia of Distances, pp. 1–583. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00234-2_1
Elghazel, H., Aussem, A.: Feature selection for unsupervised learning using random cluster ensembles. In: 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 168–175. IEEE (2010)
Fisher, R., Marshall, M.: Iris data set. RA Fisher, UC Irvine Machine Learning Repository (1936)
Forina, M., et al.: An extendible package for data exploration, classification and correlation. Institute of Pharmaceutical and Food Analysis and Technologies 16147 (1991)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. SSS, vol. 1. Springer, New York (2001)
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Kim, H.L., Seligson, D., Liu, X., Janzen, N., Bui, M., Yu, H., Shi, T., Belldegrun, A.S., Horvath, S., Figlin, R.: Using tumor markers to predict the survival of patients with metastatic renal cell carcinoma. J. Urol. 173(5), 1496–1501 (2005)
Kruskal, W., Wallis, W.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47(260), 583–621 (1952)
Mangasarian, O., Wolberg, W.: Cancer diagnosis via linear programming. University of Wisconsin-Madison, Computer Sciences Department (1990)
Pal, M.: Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1), 217–222 (2005)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Peerbhay, K., Mutanga, O., Ismail, R.: Random forests unsupervised classification: the detection and mapping of solanum mauritianum infestations in plantation forestry using hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 8(6), 3107–3122 (2015)
Percha, B., Garten, Y., Altman, R.B.: Discovery and explanation of drug-drug interactions via text mining. In: Pacific Symposium on Biocomputing. pp. 410–421 (2012). http://psb.stanford.edu/psb-online/proceedings/psb2012/percha.pdf
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
Rennard, S.I., et al.: Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis. Ann. Am. Thorac. Soc. 12(3), 303–312 (2015)
Shi, T., Horvath, S.: Unsupervised learning with random forest predictors. J. Comput. Graph. Stat. 15(1), 118–138 (2006)
Strehl, A., Ghosh, J.: Cluster ensembles-a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
Acknowledgements
Kevin Dalleau’s PhD is funded by the RHU FIGHT-HF (ANR-15-RHUS-0004) and the Region Grand Est (France).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Dalleau, K., Couceiro, M., Smail-Tabbone, M. (2018). Unsupervised Extremely Randomized Trees. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_38
Download citation
DOI: https://doi.org/10.1007/978-3-319-93040-4_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer ScienceComputer Science (R0)