Unsupervised Extremely Randomized Trees

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Abstract

In this paper we present a method for computing dissimilarities on unlabeled data based on extremely randomized trees. This method, Unsupervised Extremely Randomized Trees, is used jointly with a novel randomized labeling scheme, described here, that we call AddCl3. Unlike existing schemes such as AddCl1 and AddCl2, no synthetic instances are generated, which avoids any increase in the size of the dataset. Our empirical study shows that Unsupervised Extremely Randomized Trees with AddCl3 produces clusterings of competitive quality while clearly outperforming similar previous methods in running time.
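
To make the idea concrete, the sketch below is a minimal illustration, not the authors' implementation: it stands in for AddCl3 by drawing random class labels for the original instances (so no synthetic instances are created), fits extremely randomized trees with scikit-learn's ExtraTreesClassifier, and turns shared leaf membership into a dissimilarity. The binary random labels, the ensemble size, and the shared-leaf similarity are illustrative assumptions; the paper's exact labeling and dissimilarity may differ.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import ExtraTreesClassifier

    # Unlabeled data: the Iris targets are deliberately ignored.
    X = load_iris().data

    # AddCl3-style step (assumption): random class labels are drawn for the
    # original instances themselves, so the dataset size does not grow.
    rng = np.random.default_rng(0)
    y_random = rng.integers(0, 2, size=X.shape[0])

    # Extremely randomized trees fitted on the randomly labeled data.
    forest = ExtraTreesClassifier(n_estimators=500, random_state=0)
    forest.fit(X, y_random)

    # leaves[i, t] is the index of the leaf reached by instance i in tree t.
    leaves = forest.apply(X)

    # Similarity = fraction of trees in which two instances end up in the
    # same leaf; dissimilarity = 1 - similarity.
    n_trees = leaves.shape[1]
    similarity = np.zeros((X.shape[0], X.shape[0]))
    for t in range(n_trees):
        similarity += leaves[:, t][:, None] == leaves[:, t][None, :]
    similarity /= n_trees
    dissimilarity = 1.0 - similarity

The resulting matrix can then be fed to any clustering algorithm that accepts a precomputed dissimilarity, for example hierarchical clustering.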

Notes

  1. https://archive.ics.uci.edu/ml/index.php.

  2. https://labs.genetics.ucla.edu/horvath/RFclustering/RFclustering.htm.

  3. \(m_{try}\) is the number of variables used at each node when a tree is grown in RF.

Acknowledgements

Kevin Dalleau’s PhD is funded by the RHU FIGHT-HF (ANR-15-RHUS-0004) and the Region Grand Est (France).

Author information

Corresponding author

Correspondence to Kevin Dalleau.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Dalleau, K., Couceiro, M., Smail-Tabbone, M. (2018). Unsupervised Extremely Randomized Trees. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science (LNAI), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_38

  • DOI: https://doi.org/10.1007/978-3-319-93040-4_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93039-8

  • Online ISBN: 978-3-319-93040-4

  • eBook Packages: Computer Science, Computer Science (R0)
