@@ -12,19 +12,12 @@ Interoperability and framework enhancements
12
12
These tools adapt scikit-learn for use with other technologies or otherwise
13
13
enhance the functionality of scikit-learn's estimators.
14
14
15
- - `ML Frontend <https://github.com/jeff1evesque/machine-learning>`_ provides
16
- dataset management and SVM fitting/prediction through
17
- `web-based <https://github.com/jeff1evesque/machine-learning#web-interface>`_
18
- and `programmatic <https://github.com/jeff1evesque/machine-learning#programmatic-interface>`_
19
- interfaces.
15
+ **Data formats**
20
16
21
17
- `sklearn_pandas <https://github.com/paulgb/sklearn-pandas/>`_ bridge for
22
18
scikit-learn pipelines and pandas data frame with dedicated transformers.
23
19
24
- - `Scikit-Learn Laboratory
25
- <https://skll.readthedocs.io/en/latest/index.html>`_ A command-line
26
- wrapper around scikit-learn that makes it easy to run machine learning
27
- experiments with multiple learners and large feature sets.
20
+ **Auto-ML**
28
21
29
22
- `auto-sklearn <https://github.com/automl/auto-sklearn/>`_
30
23
An automated machine learning toolkit and a drop-in replacement for a
@@ -36,6 +29,37 @@ enhance the functionality of scikit-learn's estimators.
36
29
preprocessors as well as the estimators. Works as a drop-in replacement for a
37
30
scikit-learn estimator.
38
31
32
+ **Experimentation frameworks**
33
+
34
+ - `PyMC <http://pymc-devs.github.io/pymc/>`_ Bayesian statistical models and
35
+ fitting algorithms.
36
+
37
+ - `REP <https://github.com/yandex/REP>`_ Environment for conducting data-driven
38
+ research in a consistent and reproducible way
39
+
40
+ - `ML Frontend <https://github.com/jeff1evesque/machine-learning>`_ provides
41
+ dataset management and SVM fitting/prediction through
42
+ `web-based <https://github.com/jeff1evesque/machine-learning#web-interface>`_
43
+ and `programmatic <https://github.com/jeff1evesque/machine-learning#programmatic-interface>`_
44
+ interfaces.
45
+
46
+ - `Scikit-Learn Laboratory
47
+ <https://skll.readthedocs.io/en/latest/index.html>`_ A command-line
48
+ wrapper around scikit-learn that makes it easy to run machine learning
49
+ experiments with multiple learners and large feature sets.
50
+
51
+ **Model inspection and visualisation**
52
+
53
+ - `eli5 <https://github.com/TeamHG-Memex/eli5/>`_ A library for
54
+ debugging/inspecting machine learning models and explaining their
55
+ predictions.
56
+
57
+ - `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes model visualization
58
+ utilities.
59
+
60
+
61
+ **Model export for production**
62
+
39
63
- `sklearn-pmml <https://github.com/alex-pirozhenko/sklearn-pmml>`_
40
64
Serialization of (some) scikit-learn estimators into PMML.
41
65
@@ -47,6 +71,12 @@ enhance the functionality of scikit-learn's estimators.
47
71
- `sklearn-porter <https://github.com/nok/sklearn-porter>`_
48
72
Transpile trained scikit-learn models to C, Java, Javascript and others.
49
73
74
+ - `sklearn-compiledtrees <https://github.com/ajtulloch/sklearn-compiledtrees/>`_
75
+ Generate a C++ implementation of the predict function for decision trees (and
76
+ ensembles) trained by sklearn. Useful for latency-sensitive production
77
+ environments.
78
+
79
+
50
80
Other estimators and tasks
51
81
--------------------------
52
82
@@ -55,14 +85,7 @@ project. The following are projects providing interfaces similar to
55
85
scikit-learn for additional learning algorithms, infrastructures
56
86
and tasks.
57
87
58
- - `pylearn2 <http://deeplearning.net/software/pylearn2/>`_ A deep learning and
59
- neural network library build on theano with scikit-learn like interface.
60
-
61
- - `sklearn_theano <http://sklearn-theano.github.io/>`_ scikit-learn compatible
62
- estimators, transformers, and datasets which use Theano internally
63
-
64
- - `lightning <https://github.com/scikit-learn-contrib/lightning>`_ Fast state-of-the-art
65
- linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc...).
88
+ **Structured learning**
66
89
67
90
- `Seqlearn <https://github.com/larsmans/seqlearn>`_ Sequence classification
68
91
using HMMs or structured perceptron.
@@ -81,25 +104,41 @@ and tasks.
81
104
(`CRFsuite <http://www.chokkan.org/software/crfsuite/>`_ wrapper with
82
105
sklearn-like API).
83
106
84
- - `py-earth <https://github.com/scikit-learn-contrib/py-earth>`_ Multivariate adaptive
85
- regression splines
107
+ **Deep neural networks etc.**
86
108
87
- - `sklearn-compiledtrees <https://github.com/ajtulloch/sklearn-compiledtrees/>`_
88
- Generate a C++ implementation of the predict function for decision trees (and
89
- ensembles) trained by sklearn. Useful for latency-sensitive production
90
- environments.
109
+ - `pylearn2 <http://deeplearning.net/software/pylearn2/>`_ A deep learning and
110
+ neural network library built on Theano with a scikit-learn-like interface.
91
111
92
- - `lda <https://github.com/ariddell/lda/>`_: Fast implementation of latent
93
- Dirichlet allocation in Cython which uses `Gibbs sampling
94
- <https://en.wikipedia.org/wiki/Gibbs_sampling>`_ to sample from the true
95
- posterior distribution. (scikit-learn's
96
- :class:`sklearn.decomposition.LatentDirichletAllocation` implementation uses
97
- `variational inference
98
- <https://en.wikipedia.org/wiki/Variational_Bayesian_methods>`_ to sample from
99
- a tractable approximation of a topic model's posterior distribution.)
112
+ - `sklearn_theano <http://sklearn-theano.github.io/>`_ scikit-learn compatible
113
+ estimators, transformers, and datasets which use Theano internally
100
114
101
- - `Sparse Filtering <https://github.com/jmetzen/sparse-filtering>`_
102
- Unsupervised feature learning based on sparse-filtering
115
+ - `nolearn <https://github.com/dnouri/nolearn>`_ A number of wrappers and
116
+ abstractions around existing neural network libraries
117
+
118
+ - `keras <https://github.com/fchollet/keras>`_ Deep Learning library capable of
119
+ running on top of either TensorFlow or Theano.
120
+
121
+ - `lasagne <https://github.com/Lasagne/Lasagne>`_ A lightweight library to
122
+ build and train neural networks in Theano.
123
+
124
+ **Broad scope**
125
+
126
+ - `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes a number of additional
127
+ estimators as well as model visualization utilities.
128
+
129
+ - `sparkit-learn <https://github.com/lensacom/sparkit-learn>`_ Scikit-learn
130
+ API and functionality for PySpark's distributed modelling.
131
+
132
+ **Other regression and classification**
133
+
134
+ - `xgboost <https://github.com/dmlc/xgboost>`_ Optimised gradient boosted decision
135
+ tree library.
136
+
137
+ - `lightning <https://github.com/scikit-learn-contrib/lightning>`_ Fast
138
+ state-of-the-art linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc...).
139
+
140
+ - `py-earth <https://github.com/scikit-learn-contrib/py-earth>`_ Multivariate
141
+ adaptive regression splines
103
142
104
143
- `Kernel Regression <https://github.com/jmetzen/kernel_regression>`_
105
144
Implementation of Nadaraya-Watson kernel regression with automatic bandwidth
@@ -108,28 +147,32 @@ and tasks.
108
147
- `gplearn <https://github.com/trevorstephens/gplearn>`_ Genetic Programming
109
148
for symbolic regression tasks.
110
149
111
- - `nolearn <https://github.com/dnouri/nolearn>`_ A number of wrappers and
112
- abstractions around existing neural network libraries
150
+ - `multiisotonic <https://github.com/alexfields/multiisotonic>`_ Isotonic
151
+ regression on multidimensional features.
113
152
114
- - `sparkit-learn <https://github.com/lensacom/sparkit-learn>`_ Scikit-learn functionality and API on PySpark.
153
+ **Decomposition and clustering**
115
154
116
- - `keras <https://github.com/fchollet/keras>`_ Deep Learning library capable of
117
- running on top of either TensorFlow or Theano.
118
-
119
- - `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes a number of additional
120
- estimators as well as model visualization utilities.
121
-
122
- - `kmodes <https://github.com/nicodv/kmodes>`_ k-modes clustering algorithm for categorical data, and
123
- several of its variations.
155
+ - `lda <https://github.com/ariddell/lda/>`_: Fast implementation of latent
156
+ Dirichlet allocation in Cython which uses `Gibbs sampling
157
+ <https://en.wikipedia.org/wiki/Gibbs_sampling>`_ to sample from the true
158
+ posterior distribution. (scikit-learn's
159
+ :class:`sklearn.decomposition.LatentDirichletAllocation` implementation uses
160
+ `variational inference
161
+ <https://en.wikipedia.org/wiki/Variational_Bayesian_methods>`_ to sample from
162
+ a tractable approximation of a topic model's posterior distribution.)
124
163
125
- - `hdbscan <https://github.com/lmcinnes/hdbscan>`_ HDBSCAN and Robust Single Linkage clustering algorithms
126
- for robust variable density clustering.
164
+ - `Sparse Filtering <https://github.com/jmetzen/sparse-filtering>`_
165
+ Unsupervised feature learning based on sparse-filtering
127
166
128
- - `lasagne <https://github.com/Lasagne/Lasagne>`_ A lightweight library to build and train neural networks in Theano.
167
+ - `kmodes <https://github.com/nicodv/kmodes>`_ k-modes clustering algorithm for
168
+ categorical data, and several of its variations.
129
169
130
- - `multiisotonic <https://github.com/alexfields/multiisotonic>`_ Isotonic regression on multidimensional features.
170
+ - `hdbscan <https://github.com/lmcinnes/hdbscan>`_ HDBSCAN and Robust Single
171
+ Linkage clustering algorithms for robust variable density clustering.
131
172
132
- - `spherecluster <https://github.com/clara-labs/spherecluster>`_ Spherical K-means and mixture of von Mises Fisher clustering routines for data on the unit hypersphere.
173
+ - `spherecluster <https://github.com/clara-labs/spherecluster>`_ Spherical
174
+ K-means and mixture of von Mises Fisher clustering routines for data on the
175
+ unit hypersphere.
133
176
134
177
Statistical learning with Python
135
178
--------------------------------
@@ -145,12 +188,6 @@ Other packages useful for data analysis and machine learning.
145
188
statistical models. More focused on statistical tests and less on prediction
146
189
than scikit-learn.
147
190
148
- - `PyMC <http://pymc-devs.github.io/pymc/>`_ Bayesian statistical models and
149
- fitting algorithms.
150
-
151
- - `REP <https://github.com/yandex/REP>`_ Environment for conducting data-driven
152
- research in a consistent and reproducible way
153
-
154
191
- `Sacred <https://github.com/IDSIA/Sacred>`_ Tool to help you configure,
155
192
organize, log and reproduce experiments
156
193