Add 32 bit support to neural_network module #17700

Closed · rth opened this issue Jun 24, 2020 · 5 comments · Fixed by #17759
Labels: good first issue, Performance, Sprint


@rth (Member) commented Jun 24, 2020

Related to #11000, it would be good to add support for 32-bit computations for the estimators in the neural_network module. This was done for BernoulliRBM in #16352. Because performance is bound by the dot product, this is going to have a large impact (cf. #17641 (comment)).

I would even argue that, unlike other models, it could make sense to add a dtype=np.float32 parameter and make calculations in 32 bit by default, regardless of X.dtype. We could also consider supporting dtype=np.float16.
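As a rough illustration of why the dtype matters here (not from the issue; numbers depend entirely on the BLAS build and CPU), float32 matrix products typically run noticeably faster than float64, since they halve memory traffic and double the effective SIMD width:

```python
import numpy as np
from time import perf_counter

rng = np.random.RandomState(0)
A64, B64 = rng.rand(2000, 2000), rng.rand(2000, 2000)
A32, B32 = A64.astype(np.float32), B64.astype(np.float32)

for name, a, b in [("float64", A64, B64), ("float32", A32, B32)]:
    start = perf_counter()
    for _ in range(10):
        a @ b  # GEMM dominates MLP training cost
    print(name, (perf_counter() - start) / 10, "s per matmul")
```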

@NicolasHug (Member) commented

I was under the impression that we decided to stop providing new features / improvements to the NN module?

@rth (Member, Author) commented Jun 24, 2020

Stop providing features, yes; performance improvements, no, I think. Since we do include it, people will use it in simple cases where installing pytorch/tensorflow is a hassle, and there is no reason to keep our implementation slower than it needs to be.

Here we are talking about easy changes to half a dozen lines (similar to #16352) with a clear performance improvement.

@postmalloc (Contributor) commented

@rth, I've tried passing dtype as a parameter to MLP, something like this:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# dtype here is the parameter added in my patched branch
clf = MLPClassifier(alpha=1e-5,
                    hidden_layer_sizes=(10, 5, 3),
                    random_state=1, max_iter=100,
                    dtype=np.float32)
```

Internally, I cast all the network parameters to this dtype. A basic speed comparison of _fit shows that float32 is faster than float64, as expected.

float64 => 4.48 s ± 41.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
float32 => 4.01 s ± 74.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The dataset is something like this:

```python
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10000, n_features=30,
                           n_informative=15, n_classes=5, random_state=0)
```
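For reference, a sketch of how such a comparison could be set up; the dtype parameter is the hypothetical one from the patched branch above, not released scikit-learn, and absolute timings will vary:

```python
import numpy as np
from timeit import timeit
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=10000, n_features=30,
                           n_informative=15, n_classes=5, random_state=0)

for dtype in (np.float64, np.float32):
    # dtype is the hypothetical parameter from the patched branch;
    # it does not exist in released scikit-learn
    clf = MLPClassifier(alpha=1e-5, hidden_layer_sizes=(10, 5, 3),
                        random_state=1, max_iter=100, dtype=dtype)
    print(dtype.__name__, f"{timeit(lambda: clf.fit(X, y), number=1):.2f} s")
```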

@rth (Member, Author) commented Jun 27, 2020

Great! I was hoping it would be a bit more significant, but I guess it also depends on the CPU's vectorization support.

Maybe let's not add the dtype parameter after all, but rather do as in #16352: run

```python
X = check_array(X, ..., dtype=(np.float64, np.float32))
```

first, and then use X.dtype when creating new arrays. That would probably be more consistent with the other estimators. Would you like to make a PR?
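A minimal sketch of that pattern (assumed from #16352, not the actual merged diff): validation with a dtype tuple preserves float32 input, and downstream allocations then follow X.dtype:

```python
import numpy as np
from sklearn.utils import check_array

def _validate(X):
    # the first dtype in the tuple (float64) is only used as the
    # conversion target when the input dtype matches neither entry;
    # float32 input passes through unchanged
    return check_array(X, accept_sparse=["csr", "csc"],
                       dtype=(np.float64, np.float32))

X = _validate(np.ones((5, 3), dtype=np.float32))
assert X.dtype == np.float32  # preserved, not upcast to float64

# new arrays (coefficients, activations, ...) follow the input dtype
coef = np.zeros((X.shape[1], 10), dtype=X.dtype)
```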

@postmalloc (Contributor) commented

Sure @rth, will open a PR in a bit.
