Implement mRMR feature selection · Issue #8889 · scikit-learn/scikit-learn · GitHub

Implement mRMR feature selection #8889


Open
glemaitre opened this issue May 16, 2017 · 18 comments

@glemaitre
Member

Speaking with @jorisvandenbossche IRL, we came to discuss the mRMR feature selection among other methods.

#5372 initially intended to implement mRMR with mutual information as a metric. However, it was merged such that mutual information could be used in the SelectKBest class. It was also discussed that the mRMR mechanism could be implemented in a separate PR, with the possibility to plug in any metric.

Therefore, I was wondering whether scikit-learn would be interested in having this transformer to perform feature selection.

I would be interested to hear the different opinions.
@agramfort @MechCoder @jnothman @GaelVaroquaux @amueller @jorisvandenbossche
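
For context, the mRMR (minimum redundancy, maximum relevance) criterion greedily picks, at each step, the feature that maximizes relevance to the target minus its mean redundancy with the features already selected. A minimal sketch using scikit-learn's mutual information estimators (`mrmr_select` is a hypothetical helper, not part of scikit-learn):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k):
    """Greedy mRMR sketch: score(j) = relevance(j) - mean redundancy(j, selected)."""
    n_features = X.shape[1]
    # Relevance: MI between each feature and the target.
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_score, best_j = -np.inf, None
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean MI between candidate j and already-selected features.
            redundancy = np.mean(
                [mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                 for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
    return selected

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)
print(mrmr_select(X, y, 3))
```

As the thread notes below, the repeated MI estimation is the costly part; a real implementation would cache pairwise MI values.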

@GaelVaroquaux
Member
GaelVaroquaux commented May 17, 2017 via email

@glemaitre
Member Author
glemaitre commented May 17, 2017 via email

@glemaitre
Member Author

The author version: PDF

@HugoDLopes

Hi. Any update on this issue? I've been working on feature selection lately, and I think the current mutual information support is limited.

@jnothman
Member

I think we should have an mRMR approach implemented, perhaps especially after we have dropped Randomized L1. PR welcome.

@glemaitre
Member Author

In practice, how far is it from using SelectFromModel with a Lasso? I recall a discussion with @GaelVaroquaux some time ago about that.
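
For reference, the Lasso baseline mentioned here can be sketched like this (toy regression data; `alpha=1.0` is an illustrative choice, not a recommendation):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       random_state=0)
# SelectFromModel keeps the features whose Lasso coefficients are non-zero.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
X_reduced = selector.transform(X)
print(X_reduced.shape)
```

Unlike mRMR, this embeds selection in a linear model fit, so it captures linear relevance but no explicit redundancy term.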

@jnothman
Member
jnothman commented Feb 22, 2018 via email

@jnothman
Member
jnothman commented Feb 22, 2018 via email

@glemaitre
Member Author

The issue with mRMR will be computing the mutual information, which is pretty costly as well, if I am not wrong.

@glemaitre
Member Author

But anyway, it would be great to have a real benchmark to actually argue for or against the inclusion of such a feature.

@GaelVaroquaux
Member
GaelVaroquaux commented Feb 22, 2018 via email

@goldragoon

Is any branch working on this issue?
I'm currently researching information-theoretic feature selection, and I want to volunteer to implement mRMR and several variants.

Higher Order Mutual Information Approximation for Feature Selection

As the paper above shows, mRMR is just one special case of mutual-information-based feature selection methods.
Would scikit-learn be a suitable package for a generalized version of such methods?

If so, I'll code it in a few days and would like to discuss the implementation details.

@jnothman
Member

I think we're keen on having one implementation of a well-established mRMR approach if you can illustrate that it complements or outperforms, say, SelectFromModel(RandomForest*) and sequential selection (#8684).

@goldragoon

@jnothman
I can't guarantee that mRMR with an MI score (a filter-based feature selection method) outperforms SelectFromModel(RandomForest*) (an embedded, importance-based method) or sequential selection (forward, backward, ...; a wrapper method) in terms of accuracy, but I think it would be considerably faster than sequential selection on high-dimensional datasets.

But I can say that mRMR is quite different from (complementary to) either approach.

  1. SelectFromModel(RandomForest*) with an information-gain criterion is information-theoretic, like mRMR, but it uses ordinary entropy rather than mutual information, and only a single-variate information measure. I will implement a generalized, multi-variate version of mRMR, as in [Higher Order Mutual Information Approximation for Feature Selection](https://arxiv.org/pdf/1612.00554.pdf).

  2. Sequential search is a wrapper-type feature selection method, so its mechanism for selecting variables is completely different from a filter-type method such as mRMR. A lot of research shows that wrapper-type feature selection tends to overfit the data compared to filter types, and wrapper types also tend to be slower than filter types.

I hope this explanation is what you expected.
If you think this method (a generalized version of mRMR) is suitable for scikit-learn,
I will write some code right away.
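
The two baselines contrasted above can be sketched side by side (toy data; the estimators and `n_features_to_select` are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# Embedded: threshold on impurity-based feature importances from a fitted model.
sfm = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X, y)

# Wrapper: greedy forward search scored by cross-validated estimator accuracy.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=4, direction="forward"
).fit(X, y)

print(sfm.get_support().sum(), sfs.get_support().sum())
```

An mRMR filter would differ from both: it needs no downstream estimator and scores candidates purely from MI statistics of the data.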

@jnothman
Member
jnothman commented Apr 28, 2018 via email

@ddofer
ddofer commented Jul 27, 2021

Another advantage of mRMR is that it's very fast and has decent guarantees. I like using it as a "filter" before other methods, and it's much better than naively dropping features by co-correlation between features (regardless of the target). In highly redundant datasets (feature-wise), this makes a huge difference.

@ncooder
ncooder commented Aug 17, 2024

Is there any progress on this issue? I can submit a PR with an mRMR implementation. Matlab has very useful mRMR feature selection functions for regression and for classification.

@glemaitre
Member Author

@ncooder As mentioned in the discussion, we would need a benchmark to check whether, in practice, this method is better than just selecting features using a Lasso model (#8889 (comment)).

If in practice they lead to the same performance, then there is no need to implement it, even though the paper is cited.

@glemaitre glemaitre added Needs Decision Requires decision and removed help wanted labels Sep 2, 2024
@glemaitre glemaitre changed the title [RFC] mRMR feature selection Implement mRMR feature selection Sep 2, 2024
Development

No branches or pull requests

8 participants