[MRG+1] micro-optimize HashingVectorizer and FeatureHasher (#7470) · scikit-learn/scikit-learn@c336a43 · GitHub


Commit c336a43

kmike authored and amueller committed
[MRG+1] micro-optimize HashingVectorizer and FeatureHasher (#7470)
* micro-optimize HashingVectorizer and FeatureHasher
* fix backwards compatibility for Cython < 0.20
1 parent 69ca580 commit c336a43

1 file changed: +3 -5 lines changed

sklearn/feature_extraction/_hashing.pyx

Lines changed: 3 additions & 5 deletions
@@ -8,8 +8,6 @@ from libc.stdlib cimport abs
 cimport numpy as np
 import numpy as np
 
-from ..externals.six import string_types
-
 from sklearn.utils.murmurhash cimport murmurhash3_bytes_s32
 
 np.import_array()
@@ -45,7 +43,7 @@ def transform(raw_X, Py_ssize_t n_features, dtype):
 
     for x in raw_X:
         for f, v in x:
-            if isinstance(v, string_types):
+            if isinstance(v, (str, unicode)):
                 f = "%s%s%s" % (f, '=', v)
                 value = 1
             else:
@@ -55,13 +53,13 @@ def transform(raw_X, Py_ssize_t n_features, dtype):
                 continue
 
             if isinstance(f, unicode):
-                f = f.encode("utf-8")
+                f = (<unicode>f).encode("utf-8")
             # Need explicit type check because Murmurhash does not propagate
             # all exceptions. Add "except *" there?
             elif not isinstance(f, bytes):
                 raise TypeError("feature names must be strings")
 
-            h = murmurhash3_bytes_s32(f, 0)
+            h = murmurhash3_bytes_s32(<bytes>f, 0)
 
             array.resize_smart(indices, len(indices) + 1)
             indices[len(indices) - 1] = abs(h) % n_features
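
For readers who want to follow the control flow of the patched loop without compiling Cython, here is a minimal pure-Python sketch of the steps visible in the hunks above. It is not the scikit-learn implementation. The names `hash_features`, `stand_in_hash`, and `signed32` are hypothetical, and `zlib.crc32` is only a stand-in for `murmurhash3_bytes_s32`, so the indices will differ from what the real code produces. The `if value == 0` guard and the `value = v` branch are assumed from the surrounding context, since those lines fall between the displayed hunks. Anything else `transform` does outside the displayed hunks is omitted.

```python
import zlib


def signed32(u):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    return u - (1 << 32) if u >= (1 << 31) else u


def stand_in_hash(data, seed=0):
    # Hypothetical stand-in: CRC32 is NOT MurmurHash3. It only supplies a
    # signed 32-bit value so that abs(h) % n_features can be illustrated.
    return signed32(zlib.crc32(data, seed) & 0xFFFFFFFF)


def hash_features(raw_X, n_features):
    """Return, per sample, a list of (column index, value) pairs."""
    rows = []
    for x in raw_X:
        row = []
        for f, v in x:
            # String values become one-hot style "name=value" features.
            # (Python 3 str here; the patch checks (str, unicode) in Cython.)
            if isinstance(v, str):
                f = "%s%s%s" % (f, "=", v)
                value = 1
            else:
                value = v

            # Assumed from context: zero-valued features are skipped.
            if value == 0:
                continue

            # The hash function takes bytes, so text names are encoded first.
            if isinstance(f, str):
                f = f.encode("utf-8")
            elif not isinstance(f, bytes):
                raise TypeError("feature names must be strings")

            h = stand_in_hash(f, 0)
            row.append((abs(h) % n_features, value))
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = [[("color", "red"), ("size", 3), ("unused", 0)]]
    print(hash_features(sample, 16))
```

In the Cython version, the `<unicode>f` and `<bytes>f` casts tell the compiler the object's type after the `isinstance` checks have already established it; a pure-Python sketch has no equivalent and simply relies on the same checks.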
