autocluster is an automated machine learning (AutoML) toolkit for performing clustering tasks.
Report and presentation slides can be found here and here.
- Python 3.5 or above
- Linux, or Windows via WSL
- First, install SMAC:
  - sudo apt-get install build-essential swig
  - conda install gxx_linux-64 gcc_linux-64 swig
  - pip install smac==0.8.0
- Then install autocluster:
  - pip install autocluster
- autocluster automatically optimizes the configuration of a clustering problem. By configuration, we mean:
  - choice of dimension reduction algorithm
  - choice of clustering model
  - setting of dimension reduction algorithm's hyperparameters
  - setting of clustering model's hyperparameters
- autocluster provides three approaches to optimizing the configuration, in order of increasing complexity:
  - random optimization
  - Bayesian optimization
  - Bayesian optimization + meta-learning (warmstarting)
- List of dimension reduction algorithms in sklearn supported by autocluster's optimizer.
- List of clustering models in sklearn supported by autocluster's optimizer.
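To make the notion of a configuration and the simplest of the three approaches (random optimization) concrete, here is a minimal sketch written directly against scikit-learn. It does not use autocluster's own API. The candidate algorithms, the hyperparameter ranges, the silhouette score as the objective, and the helper names `sample_configuration`, `evaluate`, and `random_search` are all assumptions made for illustration.

```python
import random

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.metrics import silhouette_score


def sample_configuration(rng):
    """Draw one random configuration from a tiny, hypothetical search space."""
    return {
        "dim_reduction": rng.choice(["PCA", "TruncatedSVD"]),
        "n_components": rng.randint(2, 5),
        "clustering": rng.choice(["KMeans", "AgglomerativeClustering"]),
        "n_clusters": rng.randint(2, 8),
    }


def evaluate(X, config):
    """Reduce the data, cluster it, and return the silhouette score."""
    if config["dim_reduction"] == "PCA":
        reducer = PCA(n_components=config["n_components"])
    else:
        reducer = TruncatedSVD(n_components=config["n_components"], random_state=0)

    if config["clustering"] == "KMeans":
        model = KMeans(n_clusters=config["n_clusters"], n_init=10, random_state=0)
    else:
        model = AgglomerativeClustering(n_clusters=config["n_clusters"])

    X_reduced = reducer.fit_transform(X)
    labels = model.fit_predict(X_reduced)
    # Assumption: score in the reduced space; the real objective may differ.
    return silhouette_score(X_reduced, labels)


def random_search(X, n_trials=20, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_configuration(rng)
        score = evaluate(X, config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Synthetic data, used only to exercise the sketch.
X, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)
best_config, best_score = random_search(X)
print(best_config, round(best_score, 3))
```

Bayesian optimization keeps the same evaluate-and-compare loop but replaces the blind sampling in `sample_configuration` with proposals informed by the scores of earlier trials.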
Examples are available in these notebooks.
- This dataset comprises 16 Gaussian clusters in 128-dimensional space with N = 1024 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a Truncated SVD dimension reduction model followed by a Birch clustering model.
- This dataset comprises 15 Gaussian clusters in 2-dimensional space with N = 5000 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a TSNE dimension reduction model followed by an Agglomerative clustering model.
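As a rough illustration of what the first configuration above looks like in plain scikit-learn, the sketch below chains Truncated SVD and Birch on a synthetic stand-in with the same shape (16 Gaussian clusters, 128 dimensions, N = 1024). The data comes from `make_blobs`, not from the dataset used in the notebooks, and the hyperparameter values are placeholders, not the ones autocluster selected.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in: 16 Gaussian clusters in 128 dimensions, N = 1024.
X, y_true = make_blobs(
    n_samples=1024, n_features=128, centers=16, cluster_std=1.0, random_state=0
)

# Placeholder hyperparameters (assumptions, not the tuned values).
reducer = TruncatedSVD(n_components=16, random_state=0)
model = Birch(n_clusters=16)

# Dimension reduction first, then clustering on the reduced data.
X_reduced = reducer.fit_transform(X)
labels = model.fit_predict(X_reduced)

# Compare against the known generating labels of the synthetic data.
ari = adjusted_rand_score(y_true, labels)
print("clusters found:", len(set(labels)))
print("adjusted Rand index:", round(ari, 3))
```

The second configuration follows the same two-step pattern with `sklearn.manifold.TSNE` and `sklearn.cluster.AgglomerativeClustering`, although t-SNE on 5000 points takes noticeably longer to run.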
The project is experimental and still under development.