DOC structure for related projects (#8257) · scikit-learn/scikit-learn@dfcf632 · GitHub

Commit dfcf632

jnothman authored and raghavrv committed
DOC structure for related projects (#8257)
Also adds xgboost and eli5
1 parent 15fddab commit dfcf632

1 file changed: +92 -55 lines changed

doc/related_projects.rst

Lines changed: 92 additions & 55 deletions
@@ -12,19 +12,12 @@ Interoperability and framework enhancements
 These tools adapt scikit-learn for use with other technologies or otherwise
 enhance the functionality of scikit-learn's estimators.
 
-- `ML Frontend <https://github.com/jeff1evesque/machine-learning>`_ provides
-  dataset management and SVM fitting/prediction through
-  `web-based <https://github.com/jeff1evesque/machine-learning#web-interface>`_
-  and `programmatic <https://github.com/jeff1evesque/machine-learning#programmatic-interface>`_
-  interfaces.
+**Data formats**
 
 - `sklearn_pandas <https://github.com/paulgb/sklearn-pandas/>`_ bridge for
   scikit-learn pipelines and pandas data frame with dedicated transformers.
 
-- `Scikit-Learn Laboratory
-  <https://skll.readthedocs.io/en/latest/index.html>`_ A command-line
-  wrapper around scikit-learn that makes it easy to run machine learning
-  experiments with multiple learners and large feature sets.
+**Auto-ML**
 
 - `auto-sklearn <https://github.com/automl/auto-sklearn/>`_
   An automated machine learning toolkit and a drop-in replacement for a
@@ -36,6 +29,37 @@ enhance the functionality of scikit-learn's estimators.
   preprocessors as well as the estimators. Works as a drop-in replacement for a
   scikit-learn estimator.
 
+**Experimentation frameworks**
+
+- `PyMC <http://pymc-devs.github.io/pymc/>`_ Bayesian statistical models and
+  fitting algorithms.
+
+- `REP <https://github.com/yandex/REP>`_ Environment for conducting data-driven
+  research in a consistent and reproducible way
+
+- `ML Frontend <https://github.com/jeff1evesque/machine-learning>`_ provides
+  dataset management and SVM fitting/prediction through
+  `web-based <https://github.com/jeff1evesque/machine-learning#web-interface>`_
+  and `programmatic <https://github.com/jeff1evesque/machine-learning#programmatic-interface>`_
+  interfaces.
+
+- `Scikit-Learn Laboratory
+  <https://skll.readthedocs.io/en/latest/index.html>`_ A command-line
+  wrapper around scikit-learn that makes it easy to run machine learning
+  experiments with multiple learners and large feature sets.
+
+**Model inspection and visualisation**
+
+- `eli5 <https://github.com/TeamHG-Memex/eli5/>`_ A library for
+  debugging/inspecting machine learning models and explaining their
+  predictions.
+
+- `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes model visualization
+  utilities.
+
+
+**Model export for production**
+
 - `sklearn-pmml <https://github.com/alex-pirozhenko/sklearn-pmml>`_
   Serialization of (some) scikit-learn estimators into PMML.
 
@@ -47,6 +71,12 @@ enhance the functionality of scikit-learn's estimators.
 - `sklearn-porter <https://github.com/nok/sklearn-porter>`_
   Transpile trained scikit-learn models to C, Java, Javascript and others.
 
+- `sklearn-compiledtrees <https://github.com/ajtulloch/sklearn-compiledtrees/>`_
+  Generate a C++ implementation of the predict function for decision trees (and
+  ensembles) trained by sklearn. Useful for latency-sensitive production
+  environments.
+
+
 Other estimators and tasks
 --------------------------
 
@@ -55,14 +85,7 @@ project. The following are projects providing interfaces similar to
 scikit-learn for additional learning algorithms, infrastructures
 and tasks.
 
-- `pylearn2 <http://deeplearning.net/software/pylearn2/>`_ A deep learning and
-  neural network library build on theano with scikit-learn like interface.
-
-- `sklearn_theano <http://sklearn-theano.github.io/>`_ scikit-learn compatible
-  estimators, transformers, and datasets which use Theano internally
-
-- `lightning <https://github.com/scikit-learn-contrib/lightning>`_ Fast state-of-the-art
-  linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc...).
+**Structured learning**
 
 - `Seqlearn <https://github.com/larsmans/seqlearn>`_ Sequence classification
   using HMMs or structured perceptron.
@@ -81,25 +104,41 @@ and tasks.
   (`CRFsuite <http://www.chokkan.org/software/crfsuite/>`_ wrapper with
   sklearn-like API).
 
-- `py-earth <https://github.com/scikit-learn-contrib/py-earth>`_ Multivariate adaptive
-  regression splines
+**Deep neural networks etc.**
 
-- `sklearn-compiledtrees <https://github.com/ajtulloch/sklearn-compiledtrees/>`_
-  Generate a C++ implementation of the predict function for decision trees (and
-  ensembles) trained by sklearn. Useful for latency-sensitive production
-  environments.
+- `pylearn2 <http://deeplearning.net/software/pylearn2/>`_ A deep learning and
+  neural network library build on theano with scikit-learn like interface.
 
-- `lda <https://github.com/ariddell/lda/>`_: Fast implementation of latent
-  Dirichlet allocation in Cython which uses `Gibbs sampling
-  <https://en.wikipedia.org/wiki/Gibbs_sampling>`_ to sample from the true
-  posterior distribution. (scikit-learn's
-  :class:`sklearn.decomposition.LatentDirichletAllocation` implementation uses
-  `variational inference
-  <https://en.wikipedia.org/wiki/Variational_Bayesian_methods>`_ to sample from
-  a tractable approximation of a topic model's posterior distribution.)
+- `sklearn_theano <http://sklearn-theano.github.io/>`_ scikit-learn compatible
+  estimators, transformers, and datasets which use Theano internally
 
-- `Sparse Filtering <https://github.com/jmetzen/sparse-filtering>`_
-  Unsupervised feature learning based on sparse-filtering
+- `nolearn <https://github.com/dnouri/nolearn>`_ A number of wrappers and
+  abstractions around existing neural network libraries
+
+- `keras <https://github.com/fchollet/keras>`_ Deep Learning library capable of
+  running on top of either TensorFlow or Theano.
+
+- `lasagne <https://github.com/Lasagne/Lasagne>`_ A lightweight library to
+  build and train neural networks in Theano.
+
+**Broad scope**
+
+- `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes a number of additional
+  estimators as well as model visualization utilities.
+
+- `sparkit-learn <https://github.com/lensacom/sparkit-learn>`_ Scikit-learn
+  API and functionality for PySpark's distributed modelling.
+
+**Other regression and classification**
+
+- `xgboost https://github.com/dmlc/xgboost` Optimised gradient boosted decision
+  tree library.
+
+- `lightning <https://github.com/scikit-learn-contrib/lightning>`_ Fast
+  state-of-the-art linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc...).
+
+- `py-earth <https://github.com/scikit-learn-contrib/py-earth>`_ Multivariate
+  adaptive regression splines
 
 - `Kernel Regression <https://github.com/jmetzen/kernel_regression>`_
   Implementation of Nadaraya-Watson kernel regression with automatic bandwidth
@@ -108,28 +147,32 @@ and tasks.
 - `gplearn <https://github.com/trevorstephens/gplearn>`_ Genetic Programming
   for symbolic regression tasks.
 
-- `nolearn <https://github.com/dnouri/nolearn>`_ A number of wrappers and
-  abstractions around existing neural network libraries
+- `multiisotonic <https://github.com/alexfields/multiisotonic>`_ Isotonic
+  regression on multidimensional features.
 
-- `sparkit-learn <https://github.com/lensacom/sparkit-learn>`_ Scikit-learn functionality and API on PySpark.
+**Decomposition and clustering**
 
-- `keras <https://github.com/fchollet/keras>`_ Deep Learning library capable of
-  running on top of either TensorFlow or Theano.
-
-- `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes a number of additional
-  estimators as well as model visualization utilities.
-
-- `kmodes <https://github.com/nicodv/kmodes>`_ k-modes clustering algorithm for categorical data, and
-  several of its variations.
+- `lda <https://github.com/ariddell/lda/>`_: Fast implementation of latent
+  Dirichlet allocation in Cython which uses `Gibbs sampling
+  <https://en.wikipedia.org/wiki/Gibbs_sampling>`_ to sample from the true
+  posterior distribution. (scikit-learn's
+  :class:`sklearn.decomposition.LatentDirichletAllocation` implementation uses
+  `variational inference
+  <https://en.wikipedia.org/wiki/Variational_Bayesian_methods>`_ to sample from
+  a tractable approximation of a topic model's posterior distribution.)
 
-- `hdbscan <https://github.com/lmcinnes/hdbscan>`_ HDBSCAN and Robust Single Linkage clustering algorithms
-  for robust variable density clustering.
+- `Sparse Filtering <https://github.com/jmetzen/sparse-filtering>`_
+  Unsupervised feature learning based on sparse-filtering
 
-- `lasagne <https://github.com/Lasagne/Lasagne>`_ A lightweight library to build and train neural networks in Theano.
+- `kmodes <https://github.com/nicodv/kmodes>`_ k-modes clustering algorithm for
+  categorical data, and several of its variations.
 
-- `multiisotonic <https://github.com/alexfields/multiisotonic>`_ Isotonic regression on multidimensional features.
+- `hdbscan <https://github.com/lmcinnes/hdbscan>`_ HDBSCAN and Robust Single
+  Linkage clustering algorithms for robust variable density clustering.
 
-- `spherecluster <https://github.com/clara-labs/spherecluster>`_ Spherical K-means and mixture of von Mises Fisher clustering routines for data on the unit hypersphere.
+- `spherecluster <https://github.com/clara-labs/spherecluster>`_ Spherical
+  K-means and mixture of von Mises Fisher clustering routines for data on the
+  unit hypersphere.
 
 Statistical learning with Python
 --------------------------------
@@ -145,12 +188,6 @@ Other packages useful for data analysis and machine learning.
   statistical models. More focused on statistical tests and less on prediction
   than scikit-learn.
 
-- `PyMC <http://pymc-devs.github.io/pymc/>`_ Bayesian statistical models and
-  fitting algorithms.
-
-- `REP <https://github.com/yandex/REP>`_ Environment for conducting data-driven
-  research in a consistent and reproducible way
-
 - `Sacred <https://github.com/IDSIA/Sacred>`_ Tool to help you configure,
   organize, log and reproduce experiments
 