-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Fix #4883 improved compatibility of FeatureHasher with dict-vectorizer #4913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ct-vectorizer
test are failing. you probably need to use six. also ping @ogrisel. I thought this was already supported, maybe I'm mistaken. |
All the tests under feature_extraction are working on my machine. I can't figure out which one is failing. |
You can click on the travis details to see the failures: https://travis-ci.org/scikit-learn/scikit-learn/jobs/69054099 |
Six is a library to make code python2 and python3 compatible |
How did you run the tests? There are several failures under different versions: https://travis-ci.org/scikit-learn/scikit-learn/jobs/69054101 |
i just ran nosetests under feature_extraction |
Linking to #4883. |
I don't remember where, but @jnothman brought up that string features should be one-hot encoded. I'm not sure what the best way to handle categorical integer types is, though. |
@amueller I am new to the scikit-learn community and want to contribute. Can I pick this up ? |
Tests are failing because you forgot to commit the latest generated C code (i.e. it is inconsistent with the pyx code). Once that's sorted, if tests pass this almost LGTM, so @ryadzenine you're welcome to wrap it up or pass it onto @rrohan. |
@@ -26,7 +26,6 @@ def transform(raw_X, Py_ssize_t n_features, dtype): | |||
|
|||
""" | |||
assert n_features > 0 | |||
|
|||
cdef np.int32_t h | |||
cdef double value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this no longer used?
@ryadzenine Could you confirm if you are still working on this or would you be okay if @rrohan picked this up? |
@rvraghav93 Can I pick it up ? |
Yes I guess... As we have waited for a week. But you might want to cherry-pick his commits and attribute him for the work as well if you don't mind :) |
Sorry for interrupting your work @rrohan if you were working on this but I felt this needed quick addressing. Also thanks for the brilliant work @ryadzenine! @jnothman can you please check once? |
Sorry, got held up. I will fix this over the weekend. [Update] Just looked at your commit. Looks good :) On Fri, Jan 15, 2016 at 8:25 AM, Devashish Deshpande <
Rohan Ramanath |
Hi guys, I am really sorry. I just disappeared. I had some personal issues to handle. Sorry! |
We know what it's like to have lives outside of github. No problem. Does On 2 February 2016 at 09:33, Ryad ZENINE notifications@github.com wrote:
|
@jnothman No, I can't right now. Really sorry |
I can work on it. On Wed, Feb 3, 2016 at 12:27 PM, Ryad ZENINE notifications@github.com
Rohan Ramanath |
@rrohan are you still interested? This is up for grabs. |
Sure, I can do it On Fri, Oct 7, 2016 at 3:52 PM, Andreas Mueller notifications@github.com
Rohan Ramanath |
The underlying issue was resolved in #6173. Closing. Thanks! |
Hopefully, this one will be ok. Sorry for all the trouble guys and thank you for your help.
When a string value is encountered, is it concatenated with the feature name and we put a one on the relevant column.