[MRG] remove warnings in univariate feature selection #2369
+1 |
👍 for removal, as long as we use a stable, non-random sort. The reason is that I want to have 100% reproducibility. The default sort used by argsort is quicksort, which is not stable. Should we switch to a heapsort, which is stable, but has the drawback of requiring p/2 work space in memory? I think that the work space requirement is not too bad, as it is in O(p) and not O(n p). |
What is p in this formula? According to Wikipedia, heapsort should require O(1) auxiliary space (apart from the n indices allocated by |
Number of features in the learning problem.
Correct, I made a mistake and meant mergesort rather than heapsort, which |
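For anyone following along, here is a minimal sketch of what "stable" means for `argsort`. The array is made up for illustration; the only NumPy feature assumed is the `kind="mergesort"` option of `np.argsort`, which is documented as stable.

```python
import numpy as np

# Made-up scores with many ties, as you would get from boolean features.
scores = np.array([1, 0, 1, 0, 1])

# A stable sort keeps tied elements in their original index order,
# so the result does not depend on implementation details.
order = np.argsort(scores, kind="mergesort")
print(order.tolist())  # [1, 3, 0, 2, 4]
```

The two zeros come out as indices 1, 3 and the three ones as 0, 2, 4, i.e. ties are resolved by original position. With an unstable kind, the order within each group of ties is not guaranteed.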
Actually there's a heapsort in NumPy master and it seems to have been there since the days of |
Timings:
Again, with fresh random numbers:
Memory usage:
Without the |
So let's use mergesort. I don't find the memory-usage numbers |
While you're at it, I'd also like to have a stable sort in StratifiedKFold :)
On Mon, Aug 19, 2013 at 2:33 PM, Gael Varoquaux
|
PR welcomed :P |
I've heard this before ;) |
These warnings are issued practically always when using frequency-valued or boolean data. Switched to a stable sort to get reproducible results.
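As a rough sketch of why the stable sort gives reproducible selections when scores tie: the helper `top_k_indices` below is hypothetical and is not the code in this PR; it only assumes `np.argsort` with `kind="mergesort"`.

```python
import numpy as np


def top_k_indices(scores, k):
    """Hypothetical helper: indices of the k highest scores.

    Ties are broken by original feature index, because mergesort is stable.
    """
    scores = np.asarray(scores, dtype=float)
    # Negate so that an ascending stable sort yields descending scores
    # while keeping tied features in their original order.
    return np.argsort(-scores, kind="mergesort")[:k]


# Made-up scores with ties, as is common with boolean or frequency data.
selected = top_k_indices([0.5, 0.9, 0.5, 0.9, 0.1], k=3)
print(selected.tolist())  # [1, 3, 0]
```

Features 0 and 2 tie at 0.5, and the lower index wins the last slot on every run, which is the reproducibility property asked for above.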
Force-pushed a new version. Time to go back to the actual experiment I was performing. @agramfort, stratified k-fold is yours :p |
👍 for merge. Thanks! |
I pushed the green button as Travis was happy. |
These warnings are practically always triggered when doing text classification or any task with lots of boolean features. I suggest simply removing them, since in those cases the warning is so confusing that it does more harm than good.