8000 Bayesian Logistic Regression / Logistic Regression with ARD · Issue #6259 · scikit-learn/scikit-learn · GitHub

Bayesian Logistic Regression / Logistic Regression with ARD #6259


Closed
AlexYurasov opened this issue Feb 1, 2016 · 27 comments
Labels
Move to scikit-learn-extra This PR should be moved to the scikit-learn-extras repository

Comments

@AlexYurasov

There are Bayesian linear regression and ARD regression in scikit-learn; are there any plans to include Bayesian / ARD logistic regression? The ARD version would be really helpful for identifying relevant features.
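For context, scikit-learn's existing ARDRegression already illustrates the feature-relevance behaviour being requested here for the logistic case. A minimal sketch (illustrative only, on synthetic data):

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
# Only the first two features carry signal.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

ard = ARDRegression().fit(X, y)
# ARD drives the precisions of irrelevant weights up, shrinking
# their coefficients toward zero; thresholding identifies the
# relevant features.
relevant = np.abs(ard.coef_) > 0.1
```

An ARD logistic regression would expose the same kind of sparse `coef_` for classification problems.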

@agramfort
Member
agramfort commented Feb 1, 2016 via email

@AmazaspShumik

@agramfort

Hi, I have implemented ARD logistic regression with the sklearn API. I have also made a small IPython notebook comparing the performance of ARD, L1, and L2 logistic regressions.

code: https://github.com/AmazaspShumik/Bayesian-Regression-Methods/blob/master/Fast%20Relevance%20Vector%20Machine/fast_scikit_rvm.py

ipython notebook: https://github.com/AmazaspShumik/Bayesian-Regression-Methods/blob/master/Fast%20Relevance%20Vector%20Machine/ard_classification_demo.ipynb

@agramfort
Member
agramfort commented Feb 9, 2016 via email

@AmazaspShumik

@agramfort

Hi, sorry for the delay; I was working on tests and IPython examples. Here is a link to the package: https://github.com/AmazaspShumik/sklearn_bayes

@amueller
Member
amueller commented Oct 8, 2016

@agramfort why not get this into scikit-learn?

@amueller
Member
amueller commented Oct 8, 2016

@AmazaspShumik do you want to get this into scikit-learn-contrib? https://github.com/scikit-learn-contrib/scikit-learn-contrib/blob/master/README.md

@amueller
Member
amueller commented Oct 8, 2016

Also see #1513

@AmazaspShumik

@amueller @agramfort @AlexYurasov

Hi,

I will be more than happy to contribute. I am on vacation now, so if it is OK I can start working on it on Monday.

@agramfort
Member
agramfort commented Oct 12, 2016 via email

@amueller
Member

There is a package, though it doesn't fulfill our testing and documentation standards. I haven't checked API compatibility.
@AmazaspShumik can you include check_estimator in your test suite? You don't have continuous integration set up, do you?

I kind of want all these models but maybe @agramfort is right about going to scikit-learn-contrib first.
Maybe @GaelVaroquaux and @jnothman have opinions?
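For reference, wiring check_estimator into a test suite is essentially a one-liner. A minimal sketch, using scikit-learn's own ARDRegression as a stand-in for the hypothetical ARD logistic estimator (note that the exact check_estimator calling convention has varied across scikit-learn versions):

```python
from sklearn.linear_model import ARDRegression  # stand-in estimator
from sklearn.utils.estimator_checks import check_estimator

def test_sklearn_api_compatibility():
    # Runs scikit-learn's full battery of API conformance checks;
    # raises an error if any check fails.
    check_estimator(ARDRegression())

test_sklearn_api_compatibility()
passed = True  # reached only if no check raised
```

Dropping such a test into the package's suite and running it under Travis CI would address both points above.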

@AmazaspShumik

Hi @amueller @agramfort @GaelVaroquaux

  1. I do not have continuous integration (but I plan to add it; I actually tried to set it up
    before with Travis CI).

  2. I will update the test suite. However, I ran check_estimator, and all regression and
    classification algorithms currently included in the sklearn-bayes package (with the
    exception of RVR) pass it.

In case you are interested, here is my plan for package updates over the next 2-3 weeks:

  • Update the test suite (include tests for )
  • Set up continuous integration
  • Finish fixing RVR's stability + finish refactoring code for mixture models
  • Add / improve ipython notebooks

New Algorithms I plan to include:

  • SVI for HDP and Probabilistic Matrix Factorization

@GaelVaroquaux
Member
GaelVaroquaux commented Oct 14, 2016 via email

@amueller
Member

@GaelVaroquaux in academia but not in industry, I think. I haven't seen an extensive comparison but people really like linear models.

@GaelVaroquaux
Member

I really like linear models :). I was more thinking of the Bayesian aspect. Is it actually used in industry? I thought it never had been.

I do know some cases where it has an edge (sparse recovery with noise of known covariance, where the covariance is injected into the model), but AFAIK they are rare.

@amueller
Member

I think ARD is helpful in particular with unknown covariance. I talk to people and they tell me they use a thing because... you know, it works? I think people rarely take the time to do full comparisons. My boss at Amazon was really into EP ;) And I think it was a hedge fund that recently asked us if we wanted their EP implementation.

@AmazaspShumik

@amueller

Hi, really sorry it took me so long! It is really difficult to make some parts of my package (skbayes) pass check_estimator(); for instance, the Bernoulli and Poisson mixture models accept only specific types of inputs and fail the tests. Other types of models, like Restricted Boltzmann Machines and Latent Dirichlet Allocation, are already included in sklearn.

So I just wanted to ask: would it be OK if I created a separate package that includes only the ARD / variational ARD / linear models? I have significantly improved all of these models over the last month; all of them pass check_estimator and are Python 3 compatible, so it should be relatively easy to create such a package satisfying the criteria for scikit-learn-contrib.

Thanks!

@jnothman
Member

I don't think we can make check_estimator a hard criterion, for the reasons you state: it's not flexible enough (yet). So don't worry too much if you can't make it entirely pass, but please do comment, perhaps at #6715, on the ways in which check_estimator needs to be more flexible to admit your estimators...


@amueller
Member

I totally agree with @jnothman. If you'd like to move your whole project, we can do that for now even if it doesn't entirely pass check_estimator. If you'd rather move out parts as a separate project, that's fine, too.

I'm also a bit curious how your ARD compares to our ARD, and similarly for your BayesianRidge. Our "legacy" implementations are probably not that great.
@agramfort suggested taking the one from MNE-Python in #2542, though I'm not sure that's still relevant. If it is, we should compare all three and make sure the best one is in sklearn.

@AmazaspShumik
AmazaspShumik commented Jan 15, 2017

@amueller Sorry for my late reply.

  1. Moving the project
  • @amueller @jnothman I would like to move the whole project now if that is an option. Currently I have little time to work on it, so maybe other people will contribute! I will have more time in February to improve the package and work on appropriate tests.
  2. Comparison with the "legacy" implementation
  • ARD: when n_samples >= n_features, both algorithms have similar MSE, but the "legacy" implementation is significantly slower. When n_features >> n_samples, both algorithms obtain poor results.
    (Note that my implementation of ARD is based on the RVM article by Tipping, where the input is a kernel matrix, i.e. it has size (n_samples, n_samples).)
  • Also, while the "legacy" algorithms do not provide a way to define a prior, this can be done in some algorithms included in skbayes (for instance VBRegressionARD and VBLinearRegression).
  3. Comparison with the MNE-Python implementation
  • As far as I understand, the MNE-Python implementation is based on the paper by Wipf; if that is the case, it should have much better accuracy when n_features >> n_samples, since it is less likely to be trapped in a local minimum for high-dimensional data. But I am not sure how fast it is.
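To make the kernel-matrix point concrete: in Tipping's RVM formulation the design matrix has one basis function per training sample, so it is square in n_samples. An illustrative regression-side sketch built from scikit-learn pieces (this is not the skbayes implementation, just the shape of the setup):

```python
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X).ravel() + 0.05 * rng.randn(50)

# RVM-style design matrix: one RBF basis function per training
# point, so the "feature" matrix is (n_samples, n_samples).
K = rbf_kernel(X, X, gamma=1.0)
model = ARDRegression().fit(K, y)
pred = model.predict(K)
```

Because the design grows quadratically with the number of samples, this setup behaves very differently in the n_features >> n_samples regime than an ARD model fit directly on the raw features.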

@jnothman
Member
jnothman commented Jan 15, 2017 via email

@AmazaspShumik

@jnothman
Thanks for quick answer!

I just had a look. Is it OK to make a request for inclusion while there are still some problems, and then work on and discuss them?

  • I still have some issues with Travis CI (the build fails every time I include C extensions for GibbsLDA and Hidden Markov Models; I am still trying to figure out why).
  • Three algorithms in the package are not Python 3 compatible (this should be easy to resolve; it is mainly because of print statements).
  • I would probably need some more tests for the unsupervised learning algorithms in the package, which do not pass check_estimator (all supervised learning algorithms pass check_estimator).

@jnothman
Member
jnothman commented Jan 15, 2017 via email

@AmazaspShumik

@jnothman

I fixed the issues with Python 2/3 and the extension modules; the build passes on Python 2.7 and Python 3.4 / 3.5 (using Travis CI).

However, as I mentioned before, I have few tests for the unsupervised learning algorithms. I am busy at the moment, so I will need a little time to write them. Is it OK if I come back to you at the end of this week?

Thanks!

@jnothman
Member
jnothman commented Jan 17, 2017 via email

@AmazaspShumik

@jnothman Thank you very much for the advice; I appreciate it.

I think I can add more robust testing for all supervised learning algorithms (they constitute about half of the package) within two weeks. The rest of the package may be more difficult, because there is a substantial amount of code refactoring to do, so I might need help with that.

The problem is that some algorithms in my package already exist in scikit-learn (like RBM and LDA) or in other great packages (like Bayesian HMM), so there is no point spending time on them.
So I am not sure it makes sense to work on the whole package; maybe it would be reasonable to split the supervised learning and mixture models into a separate package that fully satisfies the conditions for inclusion in scikit-learn-contrib? (I could probably do that in one month.) The new package could then gradually grow to include more Bayesian algorithms.

What do you think @jnothman?

@jnothman
Member

Yes, I think that's a reasonable conclusion.

@cmarmo cmarmo added the Move to scikit-learn-extra This PR should be moved to the scikit-learn-extras repository label Nov 30, 2021
@adrinjalali
Member

It seems this is not something we'd easily add to scikit-learn, since I don't see much recurring demand for it, and having it somewhere else might be ideal. So closing this one.

@adrinjalali closed this as not planned Apr 17, 2024