make LinearSVC(dual="auto") default · Issue #6830 · scikit-learn/scikit-learn · GitHub

Open
amueller opened this issue May 26, 2016 · 7 comments

Comments

@amueller
Member

I think the dual parameter in LinearSVC and LogisticRegression should be set based on the penalty and loss parameters (not sure if there is already an issue for that).

@amueller changed the title from 'make LinearSVC(dual=None) default' to 'make LinearSVC(dual="auto") default' on Jul 15, 2016
@amueller added the Easy, Need Contributor, and Sprint labels on Jul 15, 2016
@bwallin
bwallin commented Jul 16, 2016

I'll choose you for my first issue to address at the SciPy 2016 sprint!

@amueller
Member Author

Be sure to use the deprecation mechanism: http://scikit-learn.org/dev/developers/contributing.html#deprecation
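For readers unfamiliar with the pattern, here is a minimal sketch of what a deprecation cycle for a changed default could look like. Everything in it is illustrative: `ToyLinearModel`, the `"warn"` sentinel, the assumed old default, and the warning text are assumptions made for this example, not the code that scikit-learn ships.

```python
import warnings

# Assumed old default, for illustration only.
_OLD_DEFAULT = True


class ToyLinearModel:
    """Hypothetical estimator used to illustrate a default-value deprecation."""

    def __init__(self, dual="warn"):
        # "warn" is a sentinel meaning "the user did not set dual".
        self.dual = dual

    def fit(self, X, y):
        dual = self.dual
        if dual == "warn":
            # Only users relying on the default see the warning.
            warnings.warn(
                "The default value of dual will change to 'auto' in a "
                "future release. Set dual explicitly to silence this warning.",
                FutureWarning,
            )
            dual = _OLD_DEFAULT
        # Store the resolved value without mutating the constructor argument.
        self.dual_ = dual
        return self
```

Users who pass `dual` explicitly keep their behavior and see no warning; only code that relies on the default is notified before the default changes.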

@krishnakalyan3
Contributor
krishnakalyan3 commented Sep 20, 2016

@amueller could you please clarify whether only the two files below need to be changed?
sklearn/svm/classes.py
sklearn/linear_model/logistic.py

  • As per my understanding, dual takes only boolean values, the default being False. Now this default value changes to "auto". Is this the only change required in the above files, with an appropriate parameter deprecation warning?
  • Does the parameter definition change to something else if "auto" is implemented as a value? The current definition reads:
    Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

@amueller removed the Easy and Sprint labels on Sep 22, 2016
@amueller
Member Author

You also need to add the correct behavior for "auto", which would be something compatible with the loss and penalty parameters. If there's more than one possible way to set dual based on loss and penalty, we need a heuristic, probably based on n_samples, n_features and maybe whether the data is sparse. That actually requires some benchmarking, I think.

As a first step, we could always default to dual=False and not use a heuristic.
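To make the heuristic idea concrete, here is a rough sketch of what such a rule might look like. It is not benchmarked and not scikit-learn code: the function name `choose_dual` and its `supported` argument are hypothetical, and the threshold only restates the existing docstring advice to prefer dual=False when n_samples > n_features.

```python
def choose_dual(n_samples, n_features, supported=(False, True)):
    """Hypothetical, unbenchmarked heuristic for picking dual.

    `supported` lists the dual values that the chosen loss and penalty allow.
    """
    supported = set(supported)
    if not supported:
        raise ValueError("no supported value of dual for this configuration")
    if len(supported) == 1:
        # Loss and penalty leave no choice, so the data shape is irrelevant.
        return next(iter(supported))
    # Both formulations are available: follow the docstring guidance and
    # use the primal (dual=False) when there are more samples than features.
    return n_samples <= n_features
```

Whether this threshold is a good one, and whether sparsity should enter the rule, is exactly what the benchmarking mentioned above would need to establish.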

@krishnakalyan3
Contributor

@amueller so for now, what is the scope of this issue? If it's not too complicated, I could work on this.
Thanks

@amueller
Member Author

As I said above, I think for now it would be fine to have "auto" be "dual=False" if "dual=False" is supported, and otherwise "dual=True".
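A minimal sketch of that rule for LinearSVC, assuming the usual liblinear restrictions on penalty, loss, and dual. The helper `resolve_dual` and the lookup table `_SUPPORTED_DUAL` are hypothetical names introduced for illustration; they are not part of scikit-learn.

```python
# Hypothetical lookup table, not scikit-learn code:
# (penalty, loss) -> values of dual that liblinear accepts for LinearSVC.
_SUPPORTED_DUAL = {
    ("l2", "squared_hinge"): {False, True},
    ("l2", "hinge"): {True},
    ("l1", "squared_hinge"): {False},
}


def resolve_dual(dual, penalty, loss):
    """Resolve dual="auto" to a boolean; pass explicit values through."""
    if dual != "auto":
        return dual
    supported = _SUPPORTED_DUAL.get((penalty, loss))
    if supported is None:
        raise ValueError(
            "unsupported combination: penalty=%r, loss=%r" % (penalty, loss)
        )
    # Prefer dual=False whenever it is supported, otherwise use dual=True.
    return False if False in supported else True
```

With this rule, `resolve_dual("auto", "l2", "hinge")` falls back to `True` because the hinge loss with an l2 penalty has no primal solver in liblinear, while the other supported combinations resolve to `False`.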

@amueller
Member Author

If we do it like this, it's a backward-compatible change. If we do anything else, we would need a deprecation cycle...
