Bayesian Logistic Regression / Logistic Regression with ARD #6259
Comments
There is no plan AFAIK. But an implementation with a sklearn API that passes check_estimator (http://scikit-learn.org/stable/modules/generated/sklearn.utils.estimator_checks.check_estimator.html) that one can find on GitHub would be nice to have.
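A minimal sketch of what this entails, using scikit-learn's own LogisticRegression as a stand-in for a custom ARD classifier (note that the `check_estimator` call signature has varied across scikit-learn versions; recent versions expect an estimator instance rather than a class):

```python
# Validate an estimator against the scikit-learn API contract.
# LogisticRegression stands in here for a custom ARD classifier;
# any estimator with a conforming fit/predict interface can be checked.
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

clf = LogisticRegression()

# Raises an AssertionError describing the violated convention if any
# API check fails; completes silently otherwise.
check_estimator(clf)
```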
|
Hi, I have implemented ARD Logistic Regression with the sklearn API. I also made a small IPython notebook: https://github.com/AmazaspShumik/Bayesian-Regression-Methods/blob/master/Fast%20Relevance%20Vector%20Machine/ard_classification_demo.ipynb |
You should make a proper package with tests and examples checking the sklearn API. Then we could add it to the list of related projects.
|
Hi, sorry for the delay; I was working on tests & IPython examples. Here is a link to the package: https://github.com/AmazaspShumik/sklearn_bayes . |
@agramfort why not get this into scikit-learn? |
@AmazaspShumik do you want to get this into scikit-learn-contrib? https://github.com/scikit-learn-contrib/scikit-learn-contrib/blob/master/README.md |
Also see #1513 |
@amueller @agramfort @AlexYurasov Hi, I will be more than happy to contribute. I am on vacation now so if it is ok I can start working on it on Monday. |
No objection to moving this into sklearn, but it's easier to start with an outside package.
|
There is a package, though it doesn't fulfil our testing and doc standards. I haven't checked API compatibility. I kind of want all these models, but maybe @agramfort is right about going to scikit-learn-contrib first. |
Hi @amueller @agramfort @GaelVaroquaux
In case you are interested, here is my plan for package updates in the next 2-3 weeks:
New algorithms I plan to include:
|
Maybe @GaelVaroquaux and @jnothman have opinions?
It seemed to me that these types of models have mostly lost steam lately.
|
@GaelVaroquaux in academia but not in industry, I think. I haven't seen an extensive comparison but people really like linear models. |
I really like linear models :). I was more thinking of the Bayesian aspect. I do know some cases where it has an edge (sparse recovery with a noise ...) |
I think ARD is helpful in particular with unknown covariance. I talk to people and they tell me they use a thing because... you know, it works? I think people rarely take the time to do full comparisons. My boss at Amazon was really into EP ;) And I think it was a hedge fund that recently asked me if we wanted their implementation of EP. |
Hi, really sorry it took me so long! It is really difficult to make some parts of my package (skbayes) pass check_estimator(); for instance, the Bernoulli and Poisson mixture models accept only specific types of inputs and fail the tests. Other types of models, like Restricted Boltzmann Machines and Latent Dirichlet Allocation, are already included in sklearn. So I just wanted to ask: would it be ok if I create a separate package that includes only the ARD / variational ARD / linear models? I significantly improved all of these models over the last month; all of them pass check_estimator and are Python 3 compatible, so it should be relatively easy to create such a package to satisfy the criteria for scikit-learn-contrib. Thanks! |
I don't think we can make check_estimator a hard criterion, for the reasons given.
|
I totally agree with @jnothman. If you'd like to move your whole project, we can do that for now even if it doesn't entirely pass check_estimator. I'm also a bit curious about how your ARD compares to our ARD, and similarly for your BayesianRidge. Our "legacy" implementations are probably not that great. |
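A hedged sketch of the kind of comparison asked for here: scikit-learn's "legacy" ARDRegression fit on synthetic sparse-recovery data in the n_samples <= n_features regime. The skbayes class name and import path shown in the trailing comment are assumptions taken from that package's README, not verified here.

```python
# Fit scikit-learn's ARDRegression on synthetic data with a sparse
# ground-truth weight vector, timing the fit and reporting train MSE.
import time

import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
n_samples, n_features = 80, 120            # n_samples <= n_features regime
X = rng.randn(n_samples, n_features)
w = np.zeros(n_features)
w[:5] = rng.randn(5)                        # only 5 relevant features
y = X @ w + 0.01 * rng.randn(n_samples)

start = time.time()
legacy = ARDRegression().fit(X, y)
print("legacy ARD: %.2fs, train MSE %.4f"
      % (time.time() - start, mean_squared_error(y, legacy.predict(X))))

# An skbayes model would be swapped in on the same data, e.g. (assumed API):
# from skbayes.rvm_ard_models import RegressionARD
# RegressionARD().fit(X, y)
```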
@amueller Sorry for my late reply.

1. Moving the project: @amueller @jnothman I would like to move the whole project now, if that is an option. Currently I have little time to work on it, so maybe other people will contribute! I will have more time in February to improve the package and work on appropriate tests.

2. Comparison with the "legacy" implementation:
- ARD: in cases when n_samples <= n_features, both algorithms have similar MSE, but the "legacy" implementation is significantly slower. In cases when n_samples >> n_features, both algorithms obtain bad results. (Note that my implementation of ARD is based on the RVM article by Tipping, where the input is a kernel matrix, i.e. it has size (n_samples, n_samples).)
- Also, while the "legacy" algorithms do not provide an opportunity to define a prior, this can be done in some algorithms included in skbayes (for instance VBRegressionARD and VBLinearRegression).

3. Comparison with the MNE-Python implementation: as far as I understand, the MNE-Python implementation is based on the paper by Wipf. If that is the case, then it should have much better accuracy when n_features >> n_samples, since it is less likely to be trapped in a local minimum for high-dimensional data. But I am not sure how fast it is.
|
Apart from check_estimator, have you taken a look at the workflow at https://github.com/scikit-learn-contrib/scikit-learn-contrib/blob/master/workflow.md? A request for inclusion there would provide a dedicated space for discussion of any issues, but I expect there'll be few.
|
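The RVM formulation mentioned in the comparison (Tipping, 2001) works on an (n_samples, n_samples) kernel matrix rather than the raw design matrix. A minimal sketch of that transformation, assuming an RBF kernel and using scikit-learn's ARDRegression as the ARD solver:

```python
# Build a kernel design matrix and fit an ARD model on it; the ARD prior
# then prunes kernel basis functions ("relevance vectors") rather than
# raw input features.
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(60, 3)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(60)

K = rbf_kernel(X, X, gamma=1.0)    # (n_samples, n_samples) design matrix
model = ARDRegression().fit(K, y)
print("active basis functions:", np.sum(np.abs(model.coef_) > 1e-3))
```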
@jnothman Thanks for the quick answer! I just had a look; is it ok to make a request for inclusion if there are still some problems, and then work on and discuss them? I still have some issues with Travis CI (it fails every time I include C extensions for GibbsLDA and Hidden Markov Models; I am still trying to figure out why). Three algorithms in the package are not Python 3 compatible (this should be easy to resolve; it is mainly because of "print"). I would also probably need some more tests for the unsupervised learning algorithms in the package, which do not pass check_estimator (all supervised learning algorithms pass check_estimator). |
Best to sort out basic things like Py2/3 compatibility before submitting, but otherwise, issues can be discussed. |
I fixed the issues with Python 2/3 and the extension modules; the build passes on Python 2.7 and Python 3.4 / 3.5 (using Travis CI). However, as I mentioned before, I have few tests for the unsupervised learning algorithms. I am busy at the moment, so I will need a little time to write them. Is it ok if I come back to you at the end of this week? Thanks! |
There's no rush!
I should note that one of my main concerns about sklearn-bayes is the lack of unit tests. (This is my concern more as someone who would like to be able to promote sklearn-bayes to my colleagues doing applied Bayesian modelling.) I appreciate that much of the functionality is tested in ipynb, and that perhaps basic reasonable behaviour is tested by scikit-learn's check_estimator. (Perhaps the IPython notebooks could be included in a continuous integration test harness, but there are challenges in that.) It would be great if there were tests that the models are reasonable (at PPD expectation and in distribution) on toy / known datasets (with one iteration, if possible, and at convergence), that the methods predict and predict_dist agree with each other, that model attributes are present and have the expected shape, and that they emit appropriate warnings if convergence demonstrably fails, etc.
If you don't have the time to do this, but might have time to review it, I wouldn't mind if you put out a call for contributors on the scikit-learn mailing list.
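The checklist above can be sketched as a unit test. Scikit-learn's ARDRegression is used here as a stand-in for an skbayes model; the predict_dist method mentioned is skbayes-specific, so the standard `predict(return_std=True)` idiom is used instead to compare point predictions against the predictive distribution.

```python
# A sketch of the suggested tests: sanity on a toy dataset at convergence,
# attribute presence and shape, and agreement between point predictions
# and the predictive distribution's mean.
import numpy as np
from sklearn.linear_model import ARDRegression

def test_ard_on_known_dataset():
    rng = np.random.RandomState(0)
    X = rng.randn(200, 3)
    y = X @ np.array([1.0, 0.0, -2.0]) + 0.01 * rng.randn(200)
    model = ARDRegression().fit(X, y)

    # attributes are present and have the expected shape
    assert model.coef_.shape == (3,)

    # the model is reasonable on a toy / known dataset at convergence
    assert np.allclose(model.coef_, [1.0, 0.0, -2.0], atol=0.1)

    # point predictions agree with the predictive distribution's mean
    y_mean, y_std = model.predict(X, return_std=True)
    assert np.allclose(y_mean, model.predict(X))
    assert np.all(y_std > 0)

test_ard_on_known_dataset()
```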
|
@jnothman Thank you very much for the advice; I appreciate it. I think I can add more robust testing for all supervised learning algorithms (they constitute about half of the package) in the span of two weeks. The rest of the package may be more difficult, because there is a substantial amount of code refactoring to do, so I might need help with that. The problem is that some algorithms in my package already exist in scikit-learn (like RBM and LDA) or in other great packages (like Bayesian HMM), and there is no point in spending time on them. What do you think @jnothman? |
Yes, I think that's a reasonable conclusion. |
Seems like this is not something we'd easily add to scikit-learn, since I don't see many recurring requests for it, and having it somewhere else might be ideal. So closing this one. |
There are Bayesian Linear Regression and ARD regression in scikit-learn; are there any plans to include Bayesian / ARD Logistic Regression? An ARD version would be really helpful for identifying relevant features.
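The feature-relevance use case behind the original request can be illustrated with the existing ARDRegression (regression only, not the requested logistic variant): irrelevant features receive large precision hyperparameters and near-zero weights, so relevant features can be read off the fitted coefficients.

```python
# ARD-style feature relevance with scikit-learn's existing ARDRegression:
# only features 0 and 1 enter the generating process, and ARD should
# drive the remaining coefficients to (near) zero.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(42)
X = rng.randn(150, 10)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.randn(150)

ard = ARDRegression().fit(X, y)
relevant = np.flatnonzero(np.abs(ard.coef_) > 0.1)
print("features kept by ARD:", relevant)
```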