[MRG] Multi-layer perceptron (MLP) #2120


Closed

Conversation

IssamLaradji
Contributor

Multi-layer perceptron (MLP)

PR closed in favor of #3204


This is an extension of @larsmans's code.

A multilayer perceptron (MLP) is a feedforward artificial neural network model that tries to learn a function f(X) = y, where X is the input and y is the output. An MLP consists of multiple layers: an input layer, one or more hidden layers (usually one), and an output layer, with each layer fully connected to the next. It is a classic algorithm that has been used extensively in neural networks.

Code checkout:

  1. git clone https://github.com/scikit-learn/scikit-learn
  2. cd scikit-learn/
  3. git fetch origin refs/pull/2120/head:mlp
  4. git checkout mlp

Tutorial link:

- http://easymachinelearning.blogspot.com/p/multi-layer-perceptron-tutorial.html

Sample Benchmark:

`MLP` on scikit-learn's `Digits` dataset gives:

- Score for `tanh-based sgd`: 0.981
- Score for `logistic-based sgd`: 0.987
- Score for `tanh-based l-bfgs`: 0.994
- Score for `logistic-based l-bfgs`: 1.000
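For reference, a rough way to run a similar benchmark against the API that was eventually merged (modern scikit-learn); the class and parameter names in this PR differ slightly, and exact scores will vary:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for activation in ("tanh", "logistic"):
    for solver in ("sgd", "lbfgs"):
        clf = MLPClassifier(hidden_layer_sizes=(50,), activation=activation,
                            solver=solver, max_iter=400, random_state=0)
        clf.fit(X_train, y_train)
        print(activation, solver, clf.score(X_test, y_test))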

TODO:

- Review

from ..utils.extmath import logsumexp, safe_sparse_dot


def validate_grad(J, theta, n_slice):
Member

you could move this into the test file.

Contributor Author

I was wondering where to put it. Thanks!

@amueller
Member
amueller commented Jul 1, 2013

Could you please say to what extent this extends @larsmans's PR? This one seems to be completely in Python, while I remember @larsmans's to be in Cython, right? I'm not completely sure how large the benefit of Cython was, though.
Does this one support sparse matrices? cc @temporaer

@IssamLaradji
Contributor Author
  1. @amueller, the missing part in @larsmans's PR was the backpropagation, which was partly developed in Cython. I implemented that part using vectorized matrix operations, and it's quite fast: for example, running the algorithm on the `digits` dataset for 400 iterations with 50 hidden neurons takes about 5 seconds. I also added the option of using a secondary optimization algorithm, `fmin_l_bfgs_b`, which is just as fast but achieves better classification performance with the same number of iterations (see the sketch below). I'm also thinking of adding a third option, `fmin_cg`. These optimizers are heavily used in neural networks.

I read that Cython code is easy to produce (just a matter of adding some type declarations and compiling the code). I will Cythonize the code and see if it adds any benefit.

  2. Yes, it supports sparse matrices (via `safe_sparse_dot`).
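A minimal sketch of how such an L-BFGS fit can be wired up with SciPy; `loss_and_grad` here is a placeholder, not this PR's actual cost function:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def loss_and_grad(packed_weights):
    # In the real code: unpack the weights, run a forward pass, backpropagate,
    # and return (cost, packed gradient). A quadratic stand-in is used here.
    loss = float(np.sum(packed_weights ** 2))
    grad = 2.0 * packed_weights
    return loss, grad

w0 = np.random.RandomState(0).uniform(-0.1, 0.1, size=100)
w_opt, final_loss, info = fmin_l_bfgs_b(loss_and_grad, w0, maxiter=200)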

Thanks for the review

activation: string, optional
Activation function for the hidden layer; either "sigmoid" for
1 / (1 + exp(-x)), or "tanh" for the hyperbolic tangent.
_lambda : float, optional
Member

We call this alpha almost everywhere else. Don't use a leading underscore on a parameter.

@larsmans
Member
larsmans commented Jul 1, 2013

I found that switching to Cython gave about an order of magnitude improvement over pure Python. We can merge this version as an intermediate; it looks very clean. How fast is it on 20newsgroups w/ 100 hidden units?

@amueller
Member
amueller commented Jul 1, 2013

Really? Why? In the other implementation, there was no gain at all. I guess you used one-sample "mini-batches"?

@larsmans
Member
larsmans commented Jul 1, 2013

No, my implementation would take large batches, divide these into randomized minibatches internally (of user-specified size), then train on those. That gave much faster convergence, without the need to actually materialize the minibatches (no NumPy indexing).

@amueller
Member
amueller commented Jul 1, 2013

OK, that makes sense and explains why the Cython version is much faster.

@IssamLaradji
Contributor Author

@larsmans, using all 20 categories of 20newsgroups (not the watered-down version), vectorized with scikit-learn's tf-idf vectorizer into an input matrix of 18828 rows and 74324 columns (features), and with 100 hidden neurons, the algorithm fit the whole sparse matrix at around 1 second per iteration. That seems like a reasonable speed for an MLP on data this large, but I might be wrong.

@larsmans
Member
larsmans commented Jul 1, 2013

What's the F1 score, and how many iterations are needed for it? (I got somewhat faster results, but I wonder if LBFGS converges faster than SGD.)

@IssamLaradji
Contributor Author

Right now, I have applied the code to 4 categories of the 20newsgroups corpus, with 100 iterations and 100 hidden neurons; l_bfgs achieved an average F1-score of 0.87. I might need to let the code run for a long time before it converges (it doesn't converge even after 300 iterations), so I suspect there is a bug in my code.

In your pull request you mentioned that you tested your code on a small subset of the 20newsgroups corpus and achieved similar results; did you use 4 categories too?

inds = np.arange(n_samples)
rng.shuffle(inds)
n_batches = int(np.ceil(len(inds) / float(self.batch_size)))
# Transpose improves performance (from 0.5 seconds to 0.05)
Member

Improves the performance of what? For dense or sparse data?

Contributor Author

The improvement was in calculating the cost and the gradient. It was observed on dense data; I haven't tried it on sparse data yet.

It might look peculiar, but it has to do with the matrix multiplications. I played with the timeit library to understand the speed-up: if multiplying matrices A and B takes, say, 0.25 ms, then multiplying B.T with A.T can take twice as long, about 0.5 ms. That small difference adds up when the cost and gradient are calculated many times. In other words, multiplying matrices with unfavourable memory layouts can incur time overheads.

I will commit the non-transposed function and benchmark again, just to be safe.

Member

It's not too surprising; it's probably due to Fortran vs. C arrays (column-major vs. row-major). Input to np.dot should ideally be a C array and a Fortran array, in that order, IIRC.
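For instance, a quick, machine-dependent way to check the layout effect (the absolute numbers depend on the BLAS build):

import numpy as np
from timeit import timeit

rng = np.random.RandomState(0)
A = np.ascontiguousarray(rng.rand(2000, 500))   # C-ordered (row-major)
B = np.asfortranarray(rng.rand(500, 2000))      # Fortran-ordered (column-major)

print("A @ B    :", timeit(lambda: np.dot(A, B), number=20))
print("B.T @ A.T:", timeit(lambda: np.dot(B.T, A.T), number=20))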

Member

But anyway, my point was that performance figures out of context don't belong in code :)

Contributor Author

Thanks! I'll have such comments removed soon :)

@larsmans
Member
larsmans commented Jul 3, 2013

I have some MLP documentation lying around, I'll see if I can dig that up, too.

@@ -0,0 +1,617 @@
"""Mulit-layer perceptron
Member

Typo

@IssamLaradji
Contributor Author

Sorry for being slow to respond; I had a bug in the code that took time to fix because the transposed X was confusing everything :). A misleading benchmark had made me think that X.T improved performance, but in reality it did not, so I removed the transpose, making the code cleaner and easier to debug with performance unchanged.

Moreover, I just committed a lot of changes, including:

  • An `optimization_method` parameter for selecting any SciPy optimizer
  • Support for SGD
  • Improved minibatch creation using scikit-learn's gen_even_slices
    • (much faster than X[inds[minibatch::n_batches]]; see the sketch after this list)
  • Support for the cross-entropy and squared loss functions (more will be added)
  • Typo and name fixes
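A small illustration of the slice-based minibatches, using `gen_even_slices` from `sklearn.utils` (the update step is a placeholder):

import numpy as np
from sklearn.utils import gen_even_slices

X = np.random.RandomState(0).rand(1000, 64)
n_batches = 8
for batch_slice in gen_even_slices(X.shape[0], n_batches):
    X_batch = X[batch_slice]   # contiguous slice: a view, no fancy indexing
    # ...one gradient update on X_batch...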

The performance benchmark on the digits dataset (100 hidden neurons and 170 iterations),

  • SGD with cross-entropy loss
    • Score : 0.95
  • Optimization using CG, i.e. Conjugate Gradient, with cross-entropy loss
    • Score : 0.95
  • Optimization using l-bfgs with square loss
    • Score : 0.98 (it has converged in 80 iterations)
  • Please note that the score is worse with the squared loss for SGD and CG.

Will post the test results on the 20News dataset soon.

Some of the remaining TODO's would be:

  • Use sqrt(n_features) to select the number of hidden neurons
  • Update the documentation
  • Add verbose
  • Add a test file
  • Add an example file

Thank you for your great reviews!

@larsmans
Member
larsmans commented Jul 5, 2013

You can get a unit test by

git remote add larsmans git@github.com:larsmans/scikit-learn.git
git fetch larsmans
git cherry-pick 3176ce315b38176484fd57357496f8a6a0589071

I'd send you a PR, but it looks like the GitHub UI changes broke PRs between forks.

@IssamLaradji
Contributor Author

@larsmans the unit test is very useful! I will rework the code as per the comments.

Thanks for the review!

@IssamLaradji
Contributor Author

Updates

- Replaced SciPy's `minimize` with `l-bfgs`, so as not to compel users to install SciPy 0.13.0+
- Renames done, as per the comments
- Split the class into `MLPClassifier` and `MLPRegressor`
- Set `square_loss` as the default for `MLPRegressor`
- Set `log` as the default for `MLPClassifier`
- Fixed long lines (some lines might still be long)
- Added learning rates: `optimal`, `constant`, and `invscaling`
- New benchmark on the `digits` dataset (100 iterations, 100 hidden neurons, `log` loss):
  - `tanh-based SGD`: `0.957`
  - `tanh-based l_bfgs`: `0.985`
  - `logistic-based SGD`: `0.992`
  - `logistic-based l_bfgs`: `1.000` (converged in `70` iterations)

These are interesting results, because tanh should make the algorithm converge faster than logistic. I suspect a bug in computing the deltas (lines 454 to 466) is behind these odd results. I'll use the unit test to make sure the backpropagation is working as it should.
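For reference, the standard derivative forms that the hidden-layer deltas rely on, written in terms of the already-computed activations (a sketch, not this PR's exact code):

import numpy as np

def logistic_derivative(a):
    # a = 1 / (1 + exp(-z))  =>  da/dz = a * (1 - a)
    return a * (1.0 - a)

def tanh_derivative(a):
    # a = tanh(z)            =>  da/dz = 1 - a**2
    return 1.0 - a ** 2

# hidden delta = (delta_next @ W_next.T) * derivative(a_hidden)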

The documentation will be updated once the code is deemed reliable.

Thank you for your great reviews and tips! :)

-------
x_new: array-like, shape (M, N)
"""
X *= -X
Member

You need to indicate in a very visible way in the docstring that the input data is modified.

Contributor Author

@GaelVaroquaux, so would it be better to write "Returns the value computed by the derivative of the hyperbolic tan function" instead of "Computes the derivative of the hyperbolic tan function" in line 100?

or "Modifies the input 'x' via the computation of the tanh derivative"?

Thanks

Member

If you make all of these private (prepend an _ to the name), then a single comment above them would be enough, I think.
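For instance, something along these lines, with the in-place behaviour stated once (`_dtanh` here is just an illustrative name):

def _dtanh(X):
    """Derivative of tanh, computed in place on the tanh activations X.

    WARNING: X is modified (and returned); callers must not rely on the
    original values afterwards.
    """
    X *= -X
    X += 1
    return X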

Contributor Author

Thanks, fixed.
On another note, is there a way to get the Travis build to pass? Does it fail because the test cases are not established yet? Thanks

Member

Travis runs a bunch of tests on the entire package, including your code because it detects a class inheriting from ClassifierMixin. You should inspect the results to see what's going wrong.

Contributor Author

Thanks, the errors are very clear now :)

@GaelVaroquaux
Member

The neural_network sub-package needs to be added to the setup.py of sklearn. Otherwise it does not get copied during the install.
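A sketch of the kind of change meant, in sklearn/setup.py (numpy.distutils based at the time); the surrounding lines are illustrative, not the exact file contents:

def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('sklearn', parent_package, top_path)
    # ...existing subpackages...
    config.add_subpackage('neural_network')
    config.add_subpackage('neural_network/tests')
    return config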

}
self.activation = activation_functions[activation]
self.derivative = derivative_functions[activation]
self.output_func = activation_functions[output_func]
luoq

Parameters of `__init__` should not be changed. Same for random_state below. See http://scikit-learn.org/stable/developers/index.html#apis-of-scikit-learn-objects
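A sketch of the convention being referenced: __init__ stores the constructor parameters verbatim, and any lookup or validation is deferred to fit(). The class and attribute names here are illustrative, not this PR's code.

import numpy as np


class MLPLikeEstimator(object):
    def __init__(self, activation='tanh', random_state=None):
        self.activation = activation      # stored as given, never overwritten
        self.random_state = random_state

    def fit(self, X, y):
        activation_functions = {
            'tanh': np.tanh,
            'logistic': lambda z: 1.0 / (1.0 + np.exp(-z)),
        }
        # Resolve the string to a function here, on a separate private attribute.
        self._activation_func = activation_functions[self.activation]
        # ...rest of the fitting code...
        return self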

Contributor Author

@luoq thanks for pointing that out, I fixed the initialization disagreements and pushed the new code.

~Issam

@IssamLaradji
Contributor Author

Hi everyone, sorry for being inactive on this; it's been a laborious two weeks :). I have updated the code, improving the documentation and eliminating the tanh-related problems. Since tanh can yield negative values, applying the log function to its output produces an error, so I added code that scales the output into the [0, 1] range to keep it positive, via 0.5 * (a_output + 1).

The code seems to be accepted by the Travis build. MLPRegressor is yet to be implemented, but that will be done soon.

PS: I'm also writing a blog that aims to help newcomers to the field (maybe even practitioners) engage with neural networks:
http://easydeeplearning.blogspot.com/p/multi-layer-perceptron-tutorial.html

Thanks in advance!

@arjoly
Member
arjoly commented Aug 2, 2013

Instead of using the abbreviation MLP, why not write it out as MultilayerPerceptron? That way you would get the more readable class names MultilayerPerceptronClassifier, MultilayerPerceptronRegressor and BaseMultilayerPerceptron.

@larsmans
Member
larsmans commented Aug 2, 2013

Also, can you rebase on master? We merged RBMs, so there's a neural_network module now.

from itertools import cycle, izip


def _logistic(x):
Member

In master, there's a fast logistic function in sklearn.utils.extmath.

Contributor Author

Thanks, I will plug it in!

@IssamLaradji
Contributor Author

@arjoly, the naming is a good idea, thanks!
@larsmans sure, I will rebase it.

@IssamLaradji
Contributor Author

Okay, done :). I have fixed binary classification; I'm getting a 100% score with logistic as well as tanh on a binary dataset generated from scikit-learn's Digits dataset. It turns out that I had to apply the logistic function on the output layer regardless of the activation function in the hidden layer, and the loss function is
-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
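A sketch of the output-layer setup described here for a binary problem: a logistic unit on the output regardless of the hidden activation, scored with the loss quoted above (all names are illustrative):

import numpy as np

def binary_log_loss(y_true, y_pred):
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

rng = np.random.RandomState(0)
hidden_activations = np.tanh(rng.randn(5, 10))     # e.g. a tanh hidden layer
W_out = rng.randn(10, 1) * 0.1
logits = hidden_activations.dot(W_out).ravel()
y_pred = 1.0 / (1.0 + np.exp(-logits))             # logistic output layer
y_true = np.array([0, 1, 1, 0, 1])
print(binary_log_loss(y_true, y_pred))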

Happily, it passed the Travis test. What is left now is to reuse some of scikit-learn's Cython-based loss functions (and logistic) for improved speed, and to implement MLP for regression.

In addition, the packing and unpacking methods are to be improved.

@IssamLaradji
Contributor Author

Hi guys, I am closing this pull request because of the very long discussion.
Here is the new pull request: #3204.

Thanks
