DOC copyedit FeatureHasher narrative further · amueller/scikit-learn@50dbadb · GitHub

Commit 50dbadb

DOC copyedit FeatureHasher narrative further
@ogrisel's remark + avoid "since" 2× in a sentence.
1 parent f63cdbb commit 50dbadb

1 file changed, +5 -5 lines changed

doc/modules/feature_extraction.rst

Lines changed: 5 additions & 5 deletions
@@ -140,11 +140,11 @@ or ``chi2`` feature selectors that expect non-negative inputs.
 depending on the constructor parameter ``input_type``.
 Mapping are treated as lists of ``(feature, value)`` pairs,
 while single strings have an implicit value of 1,
-so ``[feat1, feat2, feat3]`` is interpreted as
-``[(feat1, 1), (feat2, 1), (feat3, 1)]``.
+so ``['feat1', 'feat2', 'feat3']`` is interpreted as
+``[('feat1', 1), ('feat2', 1), ('feat3', 1)]``.
 If a single feature occurs multiple times in a sample,
 the associated values will be summed
-(so ``(feat, 2)`` and ``(feat, 3.5)`` become ``(feat, 5.5)``).
+(so ``('feat', 2)`` and ``('feat', 3.5)`` become ``('feat', 5.5)``).
 The output from :class:`FeatureHasher` is always a ``scipy.sparse`` matrix
 in the CSR format.
 
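The input rules touched by the first hunk can be illustrated with a small pure-Python sketch. It is not scikit-learn's implementation: the helper names ``as_pairs`` and ``hash_sample`` are hypothetical, ``zlib.crc32`` stands in for MurmurHash3, the alternating-sign step is left out, and each output row is a plain ``{column: value}`` dict rather than a row of a ``scipy.sparse`` CSR matrix.

```python
import zlib
from collections import defaultdict


def as_pairs(sample, input_type):
    """Normalize one sample to (feature, value) pairs (hypothetical helper)."""
    if input_type == "dict":
        # a mapping is treated as its (feature, value) items
        return list(sample.items())
    if input_type == "pair":
        # already an iterable of (feature, value) pairs
        return list(sample)
    if input_type == "string":
        # a bare string feature carries an implicit value of 1
        return [(feature, 1) for feature in sample]
    raise ValueError(f"unknown input_type: {input_type!r}")


def hash_sample(sample, n_features, input_type="dict"):
    """Hash one sample into a sparse row {column: value} (hypothetical helper).

    Repeated features, and distinct features that collide on a column,
    have their values summed.
    """
    row = defaultdict(float)
    for feature, value in as_pairs(sample, input_type):
        # zlib.crc32 is a stand-in for MurmurHash3; no sign trick is applied
        column = zlib.crc32(feature.encode("utf-8")) % n_features
        row[column] += value
    return dict(row)


if __name__ == "__main__":
    print(hash_sample(["feat1", "feat2", "feat3"], 16, "string"))
    print(hash_sample([("feat", 2), ("feat", 3.5)], 16, "pair"))
```

Under these assumptions, ``hash_sample(['feat1', 'feat2', 'feat3'], 16, 'string')`` returns the same row as the explicit pair list ``[('feat1', 1), ('feat2', 1), ('feat3', 1)]``, and the two ``'feat'`` entries in ``[('feat', 2), ('feat', 3.5)]`` collapse into a single column holding 5.5.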
@@ -200,8 +200,8 @@ The present implementation works under the assumption
 that the sign bit of MurmurHash3 is independent of its other bits.
 
 Since a simple modulo is used to transform the hash function to a column index,
-it is advisable to use a power of two as the ``n_features`` parameter,
-since otherwise the features will not be mapped evenly to the columns.
+it is advisable to use a power of two as the ``n_features`` parameter;
+otherwise the features will not be mapped evenly to the columns.
 
 
 .. topic:: References:
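The power-of-two advice in the second hunk comes down to modulo bias, which a toy count makes visible. The sketch below assumes a hypothetical 8-bit hash (256 equally likely values) instead of MurmurHash3's 32 bits and counts how many hash values reach each column under ``h % n_features``; ``column_loads`` is an illustrative helper, not part of scikit-learn.

```python
from collections import Counter


def column_loads(hash_bits, n_features):
    """Count how many of the 2**hash_bits possible hash values land in each
    column when columns are chosen with h % n_features (hypothetical helper)."""
    counts = Counter(h % n_features for h in range(2 ** hash_bits))
    # Counter returns 0 for columns that no hash value reaches
    return [counts[column] for column in range(n_features)]


if __name__ == "__main__":
    print(column_loads(8, 16))  # power of two: every column gets the same share
    print(column_loads(8, 10))  # not a power of two: the first columns get one extra value
```

With 256 hash values, ``n_features=16`` gives every column exactly 16 values, while ``n_features=10`` gives columns 0 to 5 a load of 26 and columns 6 to 9 a load of 25. With a 32-bit hash the extra value is a far smaller fraction of each column's share, but under this model a power of two removes the imbalance entirely, because it divides the size of the hash range exactly.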
