Implement mRMR feature selection #8889
What's the reference paper and how much has it been cited?
|
A bit more than 4500 citations: http://ieeexplore.ieee.org/abstract/document/1453511/
|
The author version: PDF |
Hi. Any update on this issue? I've been working on feature selection lately and I think the current mutual information support is limited |
I think we should have an mRMR approach implemented, perhaps especially after we have dropped Randomized L1. PR welcome. |
In practice, how far is it from using a |
I'm not sure. SelectFromModel with lasso might be as good (a quick sketch follows below). Perhaps we need to look at the computational complexity with lots of features also.
|
And sequential feature selection might be a more expensive variant again that would account for redundancy.
|
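For concreteness, here is a minimal sketch of the SelectFromModel-with-lasso baseline mentioned above, on a synthetic regression problem (the dataset and hyperparameters are illustrative only, not a benchmark):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       random_state=0)

# Features whose fitted lasso coefficient clears the default threshold are kept.
selector = SelectFromModel(LassoCV(cv=5, random_state=0))
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)
```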
The issue with mRMR will be to compute the mutual information, which is pretty costly as well if I am not wrong. |
But anyway it would be great to have some real benchmark to actually argue for including such a feature or not. |
+1
|
Is any branch working on this issue? See Higher Order Mutual Information Approximation for Feature Selection. As we can see in the above paper, mRMR is just one special case of a 'Mutual Information Based Feature Selection Method'. If so, I'll code it in a few days and would like to discuss the implementation details.
|
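To make the discussion concrete, here is a rough sketch of greedy mRMR (the "difference" variant: relevance I(f; y) minus the mean redundancy I(f; s) over already-selected features s), built on the existing mutual information estimators. It assumes a classification target, and the function name and signature are hypothetical, not a proposed scikit-learn API:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k, random_state=0):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    n_features = X.shape[1]
    # Relevance: MI between each feature and the target.
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]
    candidates = set(range(n_features)) - set(selected)
    redundancy_sum = np.zeros(n_features)
    while len(selected) < k and candidates:
        # MI between every feature and the most recently selected one
        # (computed for all columns for simplicity; only candidates are scored).
        redundancy_sum += mutual_info_regression(
            X, X[:, selected[-1]], random_state=random_state)
        score = relevance - redundancy_sum / len(selected)
        best = max(candidates, key=lambda f: score[f])
        selected.append(best)
        candidates.remove(best)
    return np.array(selected)
```

This makes the cost concern raised above explicit: the greedy loop needs on the order of k * n_features pairwise MI estimates on top of the initial relevance pass.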
I think we're keen on having one implementation of a well-established mRMR approach if you can illustrate that it complements or outperforms, say, SelectFromModel(RandomForest*) and sequential selection (#8684) |
@jnothman But I can say that mRMR is quite different from (and complementary to) either of those approaches.
I hope that my explanation is what you expected. |
It would be good to have the benefits of mRMR illustrated with an example in our example gallery, as well as an implementation.
|
Another advantage of mRMR is that it's very fast and has decent guarantees. I'd like to use it as a "filter" before other methods, and it's much better than naively dropping features based on co-correlation between features (regardless of the target). In highly redundant datasets (feature-wise), this makes a huge difference |
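For contrast, here is a toy version of the target-agnostic filter mentioned above, which drops one feature of every highly correlated pair without ever looking at y (the 0.9 threshold is arbitrary):

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Keep a feature only if its |correlation| with every kept feature is low."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < threshold for i in kept):
            kept.append(j)
    return np.array(kept)
```

Because this never consults the target, it can discard the more informative feature of a correlated pair, whereas mRMR trades redundancy off against relevance to the target.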
Is there any progress on this issue? I can submit a PR with the mRMR implementation. There are very useful feature selection functions in Matlab for regression and for classification. |
@ncooder As mentioned in the discussion, we would need a benchmark to check whether in practice this method is better than just selecting features with an existing approach such as SelectFromModel. If in practice they lead to the same performance then there is no need to implement it, even though the paper is widely cited. |
Speaking with @jorisvandenbossche IRL, we came to discuss the mRMR feature selection among other methods.
#5372 intended at first to implement mRMR with mutual information as the metric. However, it was merged such that mutual information could be used in the SelectKBest class. It was also discussed that the mRMR mechanism could be implemented in a separate PR with the possibility to plug in any metric. Therefore, I was wondering whether scikit-learn would be interested in having this transformer to perform feature selection.
I would be interested to know the different opinions.
@agramfort @MechCoder @jnothman @GaelVaroquaux @amueller @jorisvandenbossche
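For reference, what #5372 ultimately enabled is mutual information as a score function inside SelectKBest. This ranks each feature against the target independently, so unlike mRMR it does not penalize redundancy among the selected features:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Univariate MI ranking: each feature is scored against y on its own,
# so two highly redundant features can both be kept.
X_new = SelectKBest(mutual_info_classif, k=2).fit_transform(X, y)
print(X_new.shape)  # (150, 2)
```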