Implement mRMR feature selection #8889
What's the reference paper and how much has it been cited?
|
A bit more than 4500 citations: http://ieeexplore.ieee.org/abstract/document/1453511/
|
The author version: PDF |
Hi. Any update on this issue? I've been working on feature selection lately and I think the current mutual information support is limited |
I think we should have an mRMR approach implemented, perhaps especially after we have dropped Randomized L1. PR welcome. |
In practice, how far is it from using a |
I'm not sure. SelectFromModel with lasso might be as good (a quick sketch follows below). Perhaps we need to look at the computational complexity with lots of features also.
|
And sequential feature selection might be a more expensive variant again that would account for redundancy.
|
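For concreteness, here is a minimal sketch of the SelectFromModel-with-lasso baseline mentioned above, on a synthetic regression problem (the dataset and hyperparameters are illustrative only, not a benchmark):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       random_state=0)

# Features whose fitted lasso coefficient clears the default threshold are kept.
selector = SelectFromModel(LassoCV(cv=5, random_state=0))
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)
```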
The issue with mRMR will be to compute the mutual information, which is pretty costly as well if I am not wrong. |
But anyway it would be great to have some real benchmark to actually argue for including such a feature or not. |
+1
|
Is any branch working on this issue? See Higher Order Mutual Information Approximation for Feature Selection. As we can see in the above paper, mRMR is just one special case of a 'Mutual Information Based Feature Selection Method'. If so, I'll code it in a few days and would like to discuss the implementation details.
|
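To make the discussion concrete, here is a rough sketch of greedy mRMR (the "difference" variant: relevance I(f; y) minus the mean redundancy I(f; s) over already-selected features s), built on the existing mutual information estimators. It assumes a classification target, and the function name and signature are hypothetical, not a proposed scikit-learn API:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k, random_state=0):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    n_features = X.shape[1]
    # Relevance: MI between each feature and the target.
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]
    candidates = set(range(n_features)) - set(selected)
    redundancy_sum = np.zeros(n_features)
    while len(selected) < k and candidates:
        # MI between every feature and the most recently selected one
        # (computed for all columns for simplicity; only candidates are scored).
        redundancy_sum += mutual_info_regression(
            X, X[:, selected[-1]], random_state=random_state)
        score = relevance - redundancy_sum / len(selected)
        best = max(candidates, key=lambda f: score[f])
        selected.append(best)
        candidates.remove(best)
    return np.array(selected)
```

This makes the cost concern raised above explicit: the greedy loop needs on the order of k * n_features pairwise MI estimates on top of the initial relevance pass.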
I think we're keen on having one implementation of a well-established mRMR approach if you can illustrate that it complements or outperforms, say, SelectFromModel(RandomForest*) and sequential selection (#8684) |
@jnothman But I can say that mRMR is quite different from (and complementary to) either of those approaches.
I hope that my explanation is what you expected. |
It would be good to have the benefits of mRMR illustrated with an example in our example gallery, as well as an implementation.
|
Another advantage of mRMR is that it's very fast and has decent guarantees. I'd like to use it as a "filter" before other methods, and it's much better than naively dropping features based on co-correlation between features (regardless of the target). In highly redundant datasets (feature-wise), this makes a huge difference |
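For contrast, here is a toy version of the target-agnostic filter mentioned above, which drops one feature of every highly correlated pair without ever looking at y (the 0.9 threshold is arbitrary):

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Keep a feature only if its |correlation| with every kept feature is low."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < threshold for i in kept):
            kept.append(j)
    return np.array(kept)
```

Because this never consults the target, it can discard the more informative feature of a correlated pair, whereas mRMR trades redundancy off against relevance to the target.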
Is there any progress on this issue? I can submit a PR with the mRMR implementation. There are very useful feature selection functions in Matlab for regression and for classification. |
@ncooder As mentioned in the discussion, we would need a benchmark to check whether in practice this method is better than just selecting features with an existing approach such as SelectFromModel. If in practice they lead to the same performance then there is no need to implement it, even though the paper is widely cited. |
Speaking with @jorisvandenbossche IRL, we came to discuss the mRMR feature selection among other methods.
#5372 intended at first to implement mRMR with mutual information as the metric. However, it was merged such that mutual information could be used in the SelectKBest class. It was also discussed that the mRMR mechanism could be implemented in a separate PR with the possibility to plug in any metric. Therefore, I was wondering whether scikit-learn would be interested in having this transformer to perform feature selection.
I would be interested to know the different opinions.
@agramfort @MechCoder @jnothman @GaelVaroquaux @amueller @jorisvandenbossche
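For reference, what #5372 ultimately enabled is mutual information as a score function inside SelectKBest. This ranks each feature against the target independently, so unlike mRMR it does not penalize redundancy among the selected features:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Univariate MI ranking: each feature is scored against y on its own,
# so two highly redundant features can both be kept.
X_new = SelectKBest(mutual_info_classif, k=2).fit_transform(X, y)
print(X_new.shape)  # (150, 2)
```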