wywongbd/autocluster

autocluster

autocluster is an automated machine learning (AutoML) toolkit for performing clustering tasks.

Report and presentation slides can be found here and here.

Prerequisites

  • Python 3.5 or above
  • Linux, or Windows with WSL (Windows Subsystem for Linux)

How to get started?

  1. First, install SMAC:
    • sudo apt-get install build-essential swig
    • conda install gxx_linux-64 gcc_linux-64 swig
    • pip install smac==0.8.0
  2. Then install autocluster:
    • pip install autocluster

How does it work?

  • autocluster automatically optimizes the configuration of a clustering problem. By configuration, we mean:

    • choice of dimension reduction algorithm
    • choice of clustering model
    • setting of dimension reduction algorithm's hyperparameters
    • setting of clustering model's hyperparameters
  • autocluster provides three approaches to optimizing the configuration, in order of increasing complexity:

    • random optimization
    • Bayesian optimization
    • Bayesian optimization + meta-learning (warmstarting)
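
To make the idea of a configuration concrete, here is a minimal sketch of the simplest of the three approaches, random optimization. It is not autocluster's API: the search space, the function names (`sample_configuration`, `random_search`) and the placeholder objective `toy_cost` are all hypothetical and chosen only for illustration. The sketch uses the Python standard library only.

```python
import random

# Hypothetical search space: algorithm names and ranges are illustrative only
# and are not taken from autocluster's actual configuration space.
DIM_REDUCTION_SPACE = {
    "TruncatedSVD": {"n_components": (2, 10)},
    "PCA": {"n_components": (2, 10)},
}
CLUSTERING_SPACE = {
    "KMeans": {"n_clusters": (2, 20)},
    "Birch": {"n_clusters": (2, 20), "branching_factor": (10, 100)},
}


def sample_params(ranges, rng):
    """Draw one integer per hyperparameter from its inclusive (low, high) range."""
    return {name: rng.randint(low, high) for name, (low, high) in ranges.items()}


def sample_configuration(rng):
    """Pick one algorithm per stage, then sample only that algorithm's hyperparameters."""
    reducer = rng.choice(sorted(DIM_REDUCTION_SPACE))
    clusterer = rng.choice(sorted(CLUSTERING_SPACE))
    return {
        "dim_reduction": reducer,
        "dim_reduction_params": sample_params(DIM_REDUCTION_SPACE[reducer], rng),
        "clustering": clusterer,
        "clustering_params": sample_params(CLUSTERING_SPACE[clusterer], rng),
    }


def random_search(evaluate, n_iter=50, seed=0):
    """Return the sampled configuration with the lowest cost, together with that cost."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(n_iter):
        config = sample_configuration(rng)
        cost = evaluate(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


def toy_cost(config):
    """Placeholder objective: distance of n_clusters from an arbitrary target of 8."""
    return abs(config["clustering_params"]["n_clusters"] - 8)


if __name__ == "__main__":
    best, cost = random_search(toy_cost, n_iter=100, seed=0)
    print(best, cost)
```

In a real run, `evaluate` would fit the chosen dimension reduction and clustering models on the data and return a clustering quality measure to minimize. Bayesian optimization keeps the same loop but replaces the uniform sampling with proposals guided by the costs observed so far.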

Algorithms/Models supported

  • List of dimension reduction algorithms in sklearn supported by autocluster's optimizer.

  • List of clustering models in sklearn supported by autocluster's optimizer.

Examples

Examples are available in these notebooks.

Experimental results

  • This dataset consists of 16 Gaussian clusters in 128-dimensional space with N = 1024 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a Truncated SVD dimension reduction model + Birch clustering model.

  • This dataset consists of 15 Gaussian clusters in 2-dimensional space with N = 5000 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a TSNE dimension reduction model + Agglomerative clustering model.

Links

  • Link to PyPI.
  • Great writeup by Martin Krasser on Bayesian Optimization

Disclaimer

The project is experimental and still under development.