8000 Minimum redundancy maximum relevance (mRMR) feature selection by AndreaBravi · Pull Request #2547 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Minimum redundancy maximum relevance (mRMR) feature selection #2547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

AndreaBravi
Copy link

Hi!

I have created a new class computing the mRMR filtering feature selection.

pep8, pyflakes and nosetests run succesfully on the submitted code.

I am planning to create the documentation for this class as soon as I receive your approval.

Thanks!

Andrea

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 62cf55b on AndreaBravi:mRMR into d43a767 on scikit-learn:master.

-------
_compute_mRMR(X, y)
Computes the minimal relevance maximal redundancy of each feature
returning mask and score
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should never refer to private methods.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@amueller
Copy link
Member

It would be cool to have an example that compares this method with univariate selection and RFE.

@amueller
Copy link
Member

This looks good, thanks. I am not totally convinced by the tests, though. I am not familiar with the method but it would be good if the expected result from the model could be computed in an easy way. Currently it looks like the scores are some magic numbers and I don't know what they mean.

@AndreaBravi
Copy link
Author

Nice suggestion! I will insert that kind of example in the documentation, as soon as I get familiar with sphinx.

About the testing, I emulated what done in the tests for mutual_information (sklearn.metrics.cluster). Also in that case there are arbitrary numbers. I am not aware of a theoretical value of mRMR that can be used for this purpose.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling c2d1852 on AndreaBravi:mRMR into d43a767 on scikit-learn:master.

@amueller
Copy link
Member
amueller commented Nov 4, 2013

To create an example, you simply have to add a file under the examples folder that starts with plot_.

@AndreaBravi
Copy link
Author

Thanks for the clarification, I was thinking of adding it in the description of the method, inside feature_selection.rst

By the way, once I have added the example in the examples folder, which rst file do I need to modify to make sure that it gets published in http://scikit-learn.org/stable/auto_examples/index.html?

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 9d1ea9a on AndreaBravi:mRMR into d43a767 on scikit-learn:master.

@ddofer
Copy link
ddofer commented Nov 19, 2014

Are there any plans to end up implementing this? I'd love to see mrmr/Mutual Info feature selection actually decently implemented in python (without needing the C.exe) especially in scikit.

@amueller
Copy link
Member
amueller commented Jan 9, 2015

Sorry this lay around for a while. We seem to be all pretty busy at the moment. I still think this is a cool addition.


Attributes
----------
k : int, default=2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in a Parameters section together with "rule".


assert_array_equal([2, 0], m.mask)

assert_array_equal(0.6730116670092563, m.score[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to compare to results in the literature, which I presume are not reported to 16 decimal places ;)

@MechCoder
Copy link
Member

Closed in favour of the other PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0