"scoring must return a number" error with custom scorer · Issue #6783 · scikit-learn/scikit-learn · GitHub

"scoring must return a number" error with custom scorer #6783


Closed
proinsias opened this issue May 15, 2016 · 4 comments · Fixed by #6789

Comments

@proinsias
Contributor
proinsias commented May 15, 2016

Description

I'm encountering the same error (ValueError: scoring must return a number, got [...] (<class 'numpy.core.memmap.memmap'>) instead.) as #6147, despite running v0.17.1. This is because I'm creating my own scorer, following the example in this article.

Steps/Code to Reproduce

import pandas as pd
import numpy as np
from sklearn.cross_validation import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from functools import partial

def cutoff_predict(clf, X, cutoff):
    return (clf.predict_proba(X)[:, 1] > cutoff).astype(int)

def perc_diff_score(y, ypred, X=None):
    values = X[:,0]
    actual_value = np.sum(np.multiply(y, values))
    predict_value = np.sum(np.multiply(ypred, values))
    difference = predict_value - actual_value
    percent_diff = abs(difference * 100 / actual_value )
    return -1*percent_diff

def perc_diff_cutoff(clf, X, y, cutoff=None):
    ypred = cutoff_predict(clf, X, cutoff)
    return perc_diff_score(y, ypred, X)

def perc_diff_score_cutoff(cutoff):
    return partial(perc_diff_cutoff, cutoff=cutoff)

clf = RandomForestClassifier()
X_train, y_train = make_classification(n_samples=int(1e6), n_features=5, random_state=0)
values = abs(100000 * np.random.randn(len(X_train))).reshape((X_train.shape[0], 1))
X_train = np.append(values, X_train, 1)

cutoff = 0.1
validated = cross_val_score(clf, X_train, y_train, scoring=perc_diff_score_cutoff(cutoff),
                            verbose=3,
                            n_jobs=-1,
                            )

Expected Results

No error.

Actual Results

Same error as in #6147 :

/home/gillesa/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _score(estimator=ExtraTreesClassifier(bootstrap=False, class_weig..., random_state=None, verbose=0, warm_start=False), X_test=memmap([[  0.,   9.,  56., ...,   1.,   0.,   0....      [  0.,   6.,  57., ...,   1.,   0.,   0.]]), y_test=memmap([0, 0, 0, ..., 0, 0, 0]), scorer=make_scorer(roc_auc_score, needs_threshold=True))
   1604         score = scorer(estimator, X_test)
   1605     else:
   1606         score = scorer(estimator, X_test, y_test)
   1607     if not isinstance(score, numbers.Number):
   1608         raise ValueError("scoring must return a number, got %s (%s) instead."
-> 1609                          % (str(score), type(score)))
   1610     return score
   1611
   1612
   1613 def _permutation_test_score(estimator, X, y, cv, scorer):

ValueError: scoring must return a number, got 0.671095795498 (<class 'numpy.core.memmap.memmap'>) instead.

Workaround

Updating perc_diff_score() as follows, adding a cast to float, avoids the error:

def perc_diff_score(y, ypred, X=None):
    values = X[:,0]
    actual_value = np.sum(np.multiply(y, values))
    predict_value = np.sum(np.multiply(ypred, values))
    difference = predict_value - actual_value
    percent_diff = float(abs(difference * 100 / actual_value))
    return -1*percent_diff
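
A more general form of this workaround, as a rough sketch: wrap any scorer callable so that its result is coerced to a built-in float before it is returned. The names FloatScorer and array_scorer are hypothetical and not part of scikit-learn, and a 0-d ndarray stands in here for the 0-d memmap seen in the error message. A class is used rather than a nested function on the assumption that the scorer may need to be pickled when n_jobs is not 1.

```python
import numbers

import numpy as np


class FloatScorer:
    """Hypothetical helper: wraps scorer(estimator, X, y) to return a float."""

    def __init__(self, scorer):
        self.scorer = scorer

    def __call__(self, estimator, X, y=None):
        # float() turns a 0-d array (or array subclass) into a Python float
        return float(self.scorer(estimator, X, y))


def array_scorer(estimator, X, y=None):
    # Stand-in scorer that returns a 0-d array instead of a plain number
    return np.asarray(np.sum(X))


raw = array_scorer(None, np.ones(4))
fixed = FloatScorer(array_scorer)(None, np.ones(4))

# The raw result fails the numbers.Number check; the wrapped one passes it
print(isinstance(raw, numbers.Number), isinstance(fixed, numbers.Number))
```

With this kind of wrapper the scorer itself stays unchanged, and only the value handed back to the caller is converted.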

Versions

Darwin-15.4.0-x86_64-i386-64bit
Python 3.5.1 |Anaconda 4.0.0 (x86_64)| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)]
NumPy 1.11.0
SciPy 0.17.0
Scikit-Learn 0.17.1

@jnothman
Member

I'm not sure what's going on here. But I thought I'd point out that calling perc_diff_score_cutoff(cutoff) isn't doing what you think it's doing. The cutoff in perc_diff_cutoff is the global variable named cutoff, not the one passed into perc_diff_score_cutoff. You'd be better off using functools.partial to pass in cutoff as an extra argument to perc_diff_cutoff.
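
To illustrate the difference, here is a minimal sketch using hypothetical stand-in functions (score_global and score_param are not from the code in this issue). A function that reads cutoff from module scope sees whatever the global holds at call time, while functools.partial freezes the value that was passed in.

```python
from functools import partial

cutoff = 0.1


def score_global(clf, X, y):
    # cutoff is looked up in module scope each time the function is called
    return cutoff


def score_param(clf, X, y, cutoff=None):
    # cutoff is an explicit argument, independent of the global name
    return cutoff


# Bind cutoff=0.5 once; later changes to the global do not affect it
bound = partial(score_param, cutoff=0.5)

cutoff = 0.9  # rebinding the global changes score_global only

print(score_global(None, None, None), bound(None, None, None))
```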

@proinsias
Contributor Author
proinsias commented May 16, 2016

@jnothman: Thanks! That was next on my list to understand!
I've updated the original post with this change.

@jnothman
Member

So I think we don't have enough information to tackle this issue. What is the dtype of X?

@proinsias
Contributor Author
proinsias commented May 17, 2016

I've updated the example to include dummy data that trigger the bug on my machine.

X=X_train is originally an ndarray of float64's. When we call _fit_and_score() in cross_validation.py, X is now a memmap of float64's.
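
As a small sketch of what that conversion looks like, assuming the memmap comes from the parallel backend handing large arrays to workers as memory-mapped files (the temporary file path and the tiny array below are made up for illustration): a memmap is an ndarray subclass with the same dtype, and np.asarray gives back a plain ndarray.

```python
import os
import tempfile

import numpy as np

# Small float64 array standing in for X_train
X = np.arange(6, dtype=np.float64).reshape(3, 2)

# Back the same data with a memory-mapped file (hypothetical temp path)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "X.dat")
X_mm = np.memmap(path, dtype=X.dtype, mode="w+", shape=X.shape)
X_mm[:] = X

# Still an ndarray subclass with dtype float64, but the type is memmap
print(type(X_mm).__name__, X_mm.dtype)

# np.asarray drops the subclass and returns a base ndarray
X_plain = np.asarray(X_mm)
print(type(X_plain).__name__)
```

So the dtype of X is unchanged; only the array class differs between the original input and what the scorer receives.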
