Description
Describe the workflow you want to enable
Currently, there is no built-in way to use `SelectKBest` with ordinal data.
Describe your proposed solution
I would like to add a keyword parameter to the current `f_regression` (or `f_classif`) that specifies the type of input data. For example, if our `X` is ordinal and `y` is continuous, we could run `f_regression(X, y, input_type="ordinal")`. The function would then compute Spearman's rank correlation coefficient (instead of the Pearson correlation coefficient that `f_regression` currently uses) and output the scores and p-values.
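To illustrate the difference, Spearman's coefficient is Pearson's coefficient computed on the ranks of the data. Tied values, which are common in ordinal features, receive their average rank. The quick check below uses `scipy.stats` on a synthetic 5-level ordinal feature that was made up for this example.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

rng = np.random.default_rng(0)
# Synthetic ordinal feature with 5 levels (e.g. a rating from 0 to 4) and a continuous target
x = rng.integers(0, 5, size=200)
y = 0.8 * x + rng.normal(size=200)

# Spearman's rho, as proposed for input_type="ordinal"
rho, p_spearman = spearmanr(x, y)

# The same quantity: Pearson's r on the average-tie ranks of x and y
r_ranks, _ = pearsonr(rankdata(x), rankdata(y))

# Pearson's r on the raw values, which is what f_regression builds on today
r_raw, _ = pearsonr(x, y)

print(f"Spearman rho = {rho:.4f}, Pearson on ranks = {r_ranks:.4f}, Pearson on raw = {r_raw:.4f}")
```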
The steps to add support for ordinal data are:
- Write a wrapper for `scipy.stats.spearmanr`, OR write our own function that calculates Spearman's coefficient.
- Integrate that wrapper into `f_regression` and add the keyword parameter `input_type`.
I am not yet sure how to score one-hot encoded data, but adding the keyword parameter would let us gradually expand the types of input data that `sklearn`'s scoring functions support.
Describe alternatives you've considered, if relevant
Alternatively, we could write a new function, `f_regression_ordinal`, to handle ordinal `X` and continuous `y`.
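Here is a sketch of this alternative. It uses `f_regression_ordinal` as a hypothetical name. Instead of looping over `spearmanr`, it ranks every column of `X` and `y` with `scipy.stats.rankdata` and passes the ranks to the existing `f_regression`. Pearson's r on ranks equals Spearman's rho, so the output keeps `f_regression`'s F-statistic format while being based on Spearman's rho.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import SelectKBest, f_regression


def f_regression_ordinal(X, y):
    """Hypothetical scorer for ordinal X and continuous y.

    Ranks each feature column and the target (ties get average ranks), then
    reuses f_regression on the ranks. Pearson's r of ranks is Spearman's rho,
    so the returned F-statistics and p-values are based on Spearman's rho.
    """
    X_ranked = rankdata(np.asarray(X), axis=0)
    y_ranked = rankdata(np.asarray(y))
    return f_regression(X_ranked, y_ranked)


# Usage: five synthetic 1-5 ratings; y is monotonic but nonlinear in feature 1
rng = np.random.default_rng(42)
X = rng.integers(1, 6, size=(300, 5))
y = np.exp(0.5 * X[:, 1]) + rng.normal(size=300)

selector = SelectKBest(score_func=f_regression_ordinal, k=1).fit(X, y)
print(selector.get_support(indices=True))
```

With this F-based formulation, the p-values should match those from `scipy.stats.spearmanr`, because both reduce to the same t-test with `n - 2` degrees of freedom.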
Additional context
This feature request partially addresses #8480. There have also been discussions of the wrapper method, but no consensus has been reached: #6673, #8038.
This feature request was submitted following suggestions from @thomasjpfan and discussions with @yashika51 and @flosincapite.