Unsupervised Extremely Randomized Trees

Kevin Dalleau, Miguel Couceiro & Malika Smail-Tabbone

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

In this paper we present Unsupervised Extremely Randomized Trees, a method that computes dissimilarities between unlabeled instances using extremely randomized trees. The method is used jointly with a novel randomized labeling scheme, described here, which we call AddCl3. Unlike existing schemes such as AddCl1 and AddCl2, AddCl3 generates no synthetic instances and therefore does not increase the size of the dataset. Our empirical study shows that Unsupervised Extremely Randomized Trees with AddCl3 yields clusterings of competitive quality, while running clearly faster than previous similar methods.
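The abstract does not spell out how AddCl3 labels the data, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes AddCl3 amounts to drawing a random label for every real instance (hypothetical helper `random_labels`). It then fits scikit-learn's `ExtraTreesClassifier` on those labels and, in the spirit of random-forest proximities, takes as similarity the fraction of trees in which two instances share a leaf, read off with `forest.apply`. For contrast, the hypothetical `addcl1_like` mimics a synthetic-data scheme of the AddCl1 kind (Shi and Horvath) by permuting each column independently, which doubles the training set. The function names, the two-class labels and `min_samples_leaf=5` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


def addcl1_like(X, rng):
    # Contrast case in the spirit of Shi & Horvath's AddCl1: synthetic rows are
    # drawn from the product of the empirical marginals by permuting each
    # column independently. Real rows get label 0, synthetic rows label 1,
    # so the training set doubles from n to 2n rows.
    synthetic = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    X_aug = np.vstack([X, synthetic])
    y_aug = np.concatenate([np.zeros(len(X), dtype=int), np.ones(len(X), dtype=int)])
    return X_aug, y_aug


def random_labels(n, rng, n_classes=2):
    # ASSUMPTION: AddCl3-style labeling, read here as a label drawn uniformly
    # at random for every real instance. No synthetic rows are created.
    return rng.integers(0, n_classes, size=n)


def uet_dissimilarity(X, n_trees=200, min_samples_leaf=5, seed=0):
    # Fit extremely randomized trees on the random labels, then define the
    # similarity of i and j as the fraction of trees in which they share a
    # leaf; the dissimilarity is 1 minus that fraction.
    rng = np.random.default_rng(seed)
    y = random_labels(len(X), rng)
    forest = ExtraTreesClassifier(
        n_estimators=n_trees,
        min_samples_leaf=min_samples_leaf,  # illustrative value, not from the paper
        random_state=seed,
    )
    forest.fit(X, y)
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf id per tree
    # Pairwise leaf co-occurrence: O(n^2 * n_trees) memory, small data only.
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return 1.0 - same_leaf.mean(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Two well-separated Gaussian blobs of 20 points each in 3 dimensions.
    X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(6, 1, (20, 3))])
    D = uet_dissimilarity(X)
    print(D.shape)  # (40, 40): no rows were added
    X_aug, _ = addcl1_like(X, rng)
    print(X_aug.shape)  # (80, 3): the AddCl1-style scheme doubles the data
```

The resulting matrix is symmetric with a zero diagonal and can be handed to any clustering method that accepts precomputed dissimilarities. The size contrast in the usage example illustrates the abstract's point: the random-labeling variant trains on the original n instances, whereas the synthetic-data variant trains on 2n.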


Similar content being viewed by others

Unsupervised extra trees: a stochastic approach to compute similarities in heterogeneous data (article, 31 March 2020)

Labeled Trees Generating Complete, Compact, and Discrete Ultrametric Spaces (article, 18 April 2022)

Using Decision Trees for Interpretable Supervised Clustering (article, open access, 15 February 2024)

Notes

1. https://archive.ics.uci.edu/ml/index.php
2. https://labs.genetics.ucla.edu/horvath/RFclustering/RFclustering.htm
3. $m_{try}$ is the number of variables randomly sampled as split candidates at each node when a tree is grown in a random forest (RF).

References

Abba, M.C., et al.: Breast cancer molecular signatures as determined by SAGE: correlation with lymph node status. Mol. Cancer Res. 5(9), 881–890 (2007)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Deza, M.M., Deza, E.: Encyclopedia of Distances, pp. 1–583. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00234-2_1
Elghazel, H., Aussem, A.: Feature selection for unsupervised learning using random cluster ensembles. In: 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 168–175. IEEE (2010)
Fisher, R., Marshall, M.: Iris data set. RA Fisher, UC Irvine Machine Learning Repository (1936)
Forina, M., et al.: An extendible package for data exploration, classification and correlation. Institute of Pharmaceutical and Food Analysis and Technologies 16147 (1991)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. SSS, vol. 1. Springer, New York (2001)
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Kim, H.L., Seligson, D., Liu, X., Janzen, N., Bui, M., Yu, H., Shi, T., Belldegrun, A.S., Horvath, S., Figlin, R.: Using tumor markers to predict the survival of patients with metastatic renal cell carcinoma. J. Urol. 173(5), 1496–1501 (2005)
Kruskal, W., Wallis, W.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47(260), 583–621 (1952)
Mangasarian, O., Wolberg, W.: Cancer diagnosis via linear programming. University of Wisconsin-Madison, Computer Sciences Department (1990)
Pal, M.: Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1), 217–222 (2005)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Peerbhay, K., Mutanga, O., Ismail, R.: Random forests unsupervised classification: the detection and mapping of Solanum mauritianum infestations in plantation forestry using hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 8(6), 3107–3122 (2015)
Percha, B., Garten, Y., Altman, R.B.: Discovery and explanation of drug-drug interactions via text mining. In: Pacific Symposium on Biocomputing, pp. 410–421 (2012). http://psb.stanford.edu/psb-online/proceedings/psb2012/percha.pdf
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
Rennard, S.I., et al.: Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis. Ann. Am. Thorac. Soc. 12(3), 303–312 (2015)
Shi, T., Horvath, S.: Unsupervised learning with random forest predictors. J. Comput. Graph. Stat. 15(1), 118–138 (2006)
Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)


Acknowledgements

Kevin Dalleau’s PhD is funded by the RHU FIGHT-HF (ANR-15-RHUS-0004) and the Region Grand Est (France).

Author information

Authors and Affiliations

Universite de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
Kevin Dalleau, Miguel Couceiro & Malika Smail-Tabbone


Corresponding author

Correspondence to Kevin Dalleau.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi



About this paper

Cite this paper

Dalleau, K., Couceiro, M., Smail-Tabbone, M. (2018). Unsupervised Extremely Randomized Trees. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_38


DOI: https://doi.org/10.1007/978-3-319-93040-4_38
Published: 17 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)
