[MRG + 1] DOC remove deprecated option in HashingVectorizer examples … · raghavrv/scikit-learn@b189254 · GitHub

Commit b189254

rth authored and jnothman committed
[MRG + 1] DOC remove deprecated option in HashingVectorizer examples (scikit-learn#9163)
* DOC replace non_negative=True with alternate_sign=False in HashingVectorizer examples
* Updated feature_extraction doc
1 parent 1f6ac72 commit b189254

4 files changed: +8 -9 lines changed

doc/modules/feature_extraction.rst

Lines changed: 4 additions & 5 deletions
@@ -125,11 +125,10 @@ Since the hash function might cause collisions between (unrelated) features,
 a signed hash function is used and the sign of the hash value
 determines the sign of the value stored in the output matrix for a feature.
 This way, collisions are likely to cancel out rather than accumulate error,
-and the expected mean of any output feature's value is zero.
-
-If ``non_negative=True`` is passed to the constructor, the absolute
-value is taken. This undoes some of the collision handling, but allows
-the output to be passed to estimators like
+and the expected mean of any output feature's value is zero. This mechanism
+is enabled by default with ``alternate_sign=True`` and is particularly useful
+for small hash table sizes (``n_features < 10000``). For large hash table
+sizes, it can be disabled, to allow the output to be passed to estimators like
 :class:`sklearn.naive_bayes.MultinomialNB` or
 :class:`sklearn.feature_selection.chi2`
 feature selectors that expect non-negative inputs.
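
The sign mechanism described in this documentation hunk can be illustrated with a small standard-library sketch. It is a toy, not scikit-learn's implementation: `hash_features` is a hypothetical helper, MD5 stands in for the signed MurmurHash3 that scikit-learn uses, and the sample sentence and `n_features=8` are chosen only to force collisions.

```python
import hashlib


def hash_features(tokens, n_features=8, alternate_sign=True):
    """Toy feature hashing: map tokens to a fixed-length count vector.

    Hypothetical helper for illustration only; scikit-learn uses a
    signed 32-bit MurmurHash3 rather than MD5.
    """
    vector = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # The first four bytes of the digest pick the bucket.
        index = int.from_bytes(digest[:4], "little") % n_features
        # One further bit picks the sign when alternate_sign is enabled.
        sign = -1 if (alternate_sign and digest[4] & 1) else 1
        vector[index] += sign
    return vector


tokens = "the quick brown fox jumps over the lazy dog".split()
signed = hash_features(tokens, alternate_sign=True)
unsigned = hash_features(tokens, alternate_sign=False)
```

With `alternate_sign=False` every bucket is a plain count, so colliding tokens add up and the vector stays non-negative. With `alternate_sign=True` colliding tokens may partly cancel, which is why the magnitude of each signed bucket never exceeds the corresponding unsigned count.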

examples/applications/plot_out_of_core_classification.py

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ def progress(blocknum, bs, size):
 # maximum
 
 vectorizer = HashingVectorizer(decode_error='ignore', n_features=2 ** 18,
-                               non_negative=True)
+                               alternate_sign=False)
 
 
 # Iterator over parsed Reuters SGML files.

examples/text/document_classification_20newsgroups.py

Lines changed: 1 addition & 1 deletion
@@ -152,7 +152,7 @@ def size_mb(docs):
 print("Extracting features from the training data using a sparse vectorizer")
 t0 = time()
 if opts.use_hashing:
-    vectorizer = HashingVectorizer(stop_words='english', non_negative=True,
+    vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,
                                    n_features=opts.n_features)
     X_train = vectorizer.transform(data_train.data)
 else:

examples/text/document_clustering.py

Lines changed: 2 additions & 2 deletions
@@ -144,13 +144,13 @@ def is_interactive():
     if opts.use_idf:
         # Perform an IDF normalization on the output of HashingVectorizer
         hasher = HashingVectorizer(n_features=opts.n_features,
-                                   stop_words='english', non_negative=True,
+                                   stop_words='english', alternate_sign=False,
                                    norm=None, binary=False)
         vectorizer = make_pipeline(hasher, TfidfTransformer())
     else:
         vectorizer = HashingVectorizer(n_features=opts.n_features,
                                        stop_words='english',
-                                       non_negative=False, norm='l2',
+                                       alternate_sign=False, norm='l2',
                                        binary=False)
 else:
     vectorizer = TfidfVectorizer(max_df=0.5, max_features=opts.n_features,
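
For reference, a minimal sketch of the hashing-plus-IDF pattern touched by this diff. It assumes a scikit-learn release in which `HashingVectorizer` accepts `alternate_sign`; the sample documents and the `n_features` value are made up for illustration and do not come from the examples themselves.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline

# Made-up sample documents, for illustration only.
docs = ["the cat sat on the mat", "the dog chased the cat"]

# alternate_sign=False keeps the raw hashed counts non-negative.
hasher = HashingVectorizer(n_features=2 ** 10, alternate_sign=False, norm=None)
counts = hasher.transform(docs)  # the hasher is stateless, so no fit is needed

# IDF weighting on top of the hashed counts, mirroring the use_idf branch.
vectorizer = make_pipeline(hasher, TfidfTransformer())
X = vectorizer.fit_transform(docs)
```

Both `counts` and `X` are sparse matrices with one row per document and `n_features` columns, and neither contains negative entries.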
