Merge pull request #6426 from yenchenlin1994/remove-redundant-typo · scikit-learn/scikit-learn@323bcea · GitHub


Commit 323bcea

Merge pull request #6426 from yenchenlin1994/remove-redundant-typo
[MRG] DOC Remove redundant words in sklearn
2 parents 613e8be + 46fc1be commit 323bcea

32 files changed: +36 -36 lines changed

doc/faq.rst

Lines changed: 1 addition & 1 deletion
@@ -235,7 +235,7 @@ anymore. The version of joblib shipped with scikit-learn automatically uses
 that setting by default (under Python 3.4 and later).
 
 If you have custom code that uses ``multiprocessing`` directly instead of using
-it via joblib you can enable the the 'forkserver' mode globally for your
+it via joblib you can enable the 'forkserver' mode globally for your
 program: Insert the following instructions in your main script::
 
     import multiprocessing
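
The snippet this FAQ entry goes on to show is truncated by the hunk's context window; the standard Python 3.4+ pattern is along these lines (a minimal sketch, not the verbatim FAQ text)::

    import multiprocessing

    # other imports, custom code, data loading, model definition...

    if __name__ == '__main__':
        multiprocessing.set_start_method('forkserver')
        # calls into scikit-learn with n_jobs > 1 go here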

doc/modules/biclustering.rst

Lines changed: 2 additions & 2 deletions
@@ -140,7 +140,7 @@ are used to form the matrix :math:`Z`:
         C^{-1/2} V
     \end{bmatrix}
 
-where the the columns of :math:`U` are :math:`u_2, \dots, u_{\ell +
+where the columns of :math:`U` are :math:`u_2, \dots, u_{\ell +
 1}`, and similarly for :math:`V`.
 
 Then the rows of :math:`Z` are clustered using :ref:`k-means
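
For orientation: only the tail of the definition of :math:`Z` appears in the context lines above; the full stacked matrix is presumably the following (a reconstruction from the surrounding section, not part of this diff).

.. math::

    Z = \begin{bmatrix} R^{-1/2} U \\ C^{-1/2} V \end{bmatrix}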
@@ -174,7 +174,7 @@ The :class:`SpectralBiclustering` algorithm assumes that the input
 data matrix has a hidden checkerboard structure. The rows and columns
 of a matrix with this structure may be partitioned so that the entries
 of any bicluster in the Cartesian product of row clusters and column
-clusters is are approximately constant. For instance, if there are two
+clusters are approximately constant. For instance, if there are two
 row partitions and three column partitions, each row will belong to
 three biclusters, and each column will belong to two biclusters.

doc/modules/covariance.rst

Lines changed: 1 addition & 1 deletion
@@ -280,7 +280,7 @@ empirical covariance matrix is then rescaled to compensate the
 performed selection of observations ("consistency step"). Having
 computed the Minimum Covariance Determinant estimator, one can give
 weights to observations according to their Mahalanobis distance,
-leading the a reweighted estimate of the covariance matrix of the data
+leading to a reweighted estimate of the covariance matrix of the data
 set ("reweighting step").
 
 Rousseeuw and Van Driessen [4] developed the FastMCD algorithm in order
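
A minimal usage sketch of the estimator under discussion (the reweighting step happens inside ``fit``; the data below is made up for illustration)::

    import numpy as np
    from sklearn.covariance import MinCovDet

    rng = np.random.RandomState(0)
    X = rng.multivariate_normal([0, 0], [[1, .3], [.3, 1]], size=500)
    mcd = MinCovDet(random_state=0).fit(X)
    robust_cov = mcd.covariance_   # reweighted covariance estimate
    dist = mcd.mahalanobis(X)      # squared Mahalanobis distances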

doc/modules/cross_validation.rst

Lines changed: 1 addition & 1 deletion
@@ -465,7 +465,7 @@ Here is a usage example::
 
 :class:`ShuffleSplit` is thus a good alternative to :class:`KFold` cross
 validation that allows a finer control on the number of iterations and
-the proportion of samples in on each side of the train / test split.
+the proportion of samples on each side of the train / test split.
 
 
 Label-Shuffle-Split
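
The usage example the hunk header refers to is along these lines (a sketch against the 0.17-era ``sklearn.cross_validation`` API)::

    from sklearn.cross_validation import ShuffleSplit

    ss = ShuffleSplit(5, n_iter=3, test_size=0.25, random_state=0)
    for train_index, test_index in ss:
        print("TRAIN:", train_index, "TEST:", test_index)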

doc/modules/ensemble.rst

Lines changed: 1 addition & 1 deletion
@@ -679,7 +679,7 @@ the contribution of each weak learner by a factor :math:`\nu`:
 
     F_m(x) = F_{m-1}(x) + \nu \gamma_m h_m(x)
 
 The parameter :math:`\nu` is also called the **learning rate** because
-it scales the step length the the gradient descent procedure; it can
+it scales the step length the gradient descent procedure; it can
 be set via the ``learning_rate`` parameter.
 
 The parameter ``learning_rate`` strongly interacts with the parameter
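
As a concrete illustration of the shrinkage described above (a sketch, not part of the diff; smaller ``learning_rate`` values usually call for a larger ``n_estimators``)::

    from sklearn.datasets import make_hastie_10_2
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_hastie_10_2(random_state=0)
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                     max_depth=1, random_state=0).fit(X, y)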

doc/modules/pipeline.rst

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ create complex models.
 
 (A :class:`FeatureUnion` has no way of checking whether two transformers
 might produce identical features. It only produces a union when the
-feature sets are disjoint, and making sure they are is the caller's
+feature sets are disjoint, and making sure they are the caller's
 responsibility.)
 
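A short sketch of the :class:`FeatureUnion` being discussed (the transformers are chosen here for illustration)::

    from sklearn.pipeline import FeatureUnion
    from sklearn.decomposition import PCA, KernelPCA

    # the caller must ensure the two transformers emit disjoint features
    combined = FeatureUnion([("linear_pca", PCA(n_components=2)),
                             ("kernel_pca", KernelPCA(n_components=2))])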

doc/modules/svm.rst

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ for these classifiers.
 
 This might be made more clear by an example:
 
-Consider a three class problem with with class 0 having three support vectors
+Consider a three class problem with class 0 having three support vectors
 :math:`v^{0}_0, v^{1}_0, v^{2}_0` and class 1 and 2 having two support vectors
 :math:`v^{0}_1, v^{1}_1` and :math:`v^{0}_2, v^{1}_2` respectively. For each
 support vector :math:`v^{j}_i`, there are two dual coefficients. Let's call

doc/tutorial/text_analytics/skeletons/exercise_01_language_train_model.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
     dataset.data, dataset.target, test_size=0.5)
 
 
-# TASK: Build a an vectorizer that splits strings into sequence of 1 to 3
+# TASK: Build a vectorizer that splits strings into sequence of 1 to 3
 # characters instead of word tokens
 
 # TASK: Build a vectorizer / classifier pipeline using the previous analyzer

doc/tutorial/text_analytics/solutions/exercise_01_language_train_model.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
     dataset.data, dataset.target, test_size=0.5)
 
 
-# TASK: Build a an vectorizer that splits strings into sequence of 1 to 3
+# TASK: Build a vectorizer that splits strings into sequence of 1 to 3
 # characters instead of word tokens
 vectorizer = TfidfVectorizer(ngram_range=(1, 3), analyzer='char',
                              use_idf=False)

doc/whats_new.rst

Lines changed: 2 additions & 2 deletions
@@ -2812,7 +2812,7 @@ Other changes
 
 - :class:`svm.SVC` members ``coef_`` and ``intercept_`` changed sign for
   consistency with ``decision_function``; for ``kernel==linear``,
-  ``coef_`` was fixed in the the one-vs-one case, by `Andreas Müller`_.
+  ``coef_`` was fixed in the one-vs-one case, by `Andreas Müller`_.
 
 - Performance improvements to efficient leave-one-out cross-validated
   Ridge regression, esp. for the ``n_samples > n_features`` case, in
@@ -2993,7 +2993,7 @@ Changelog
 
 - Fixed a bug in the RFE module by `Gilles Louppe`_ (issue #378).
 
-- Fixed a memory leak in in :ref:`svm` module by `Brian Holt`_ (issue #367).
+- Fixed a memory leak in :ref:`svm` module by `Brian Holt`_ (issue #367).
 
 - Faster tests by `Fabian Pedregosa`_ and others.

examples/bicluster/plot_spectral_coclustering.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 ==============================================
 
 This example demonstrates how to generate a dataset and bicluster it
-using the the Spectral Co-Clustering algorithm.
+using the Spectral Co-Clustering algorithm.
 
 The dataset is generated using the ``make_biclusters`` function, which
 creates a matrix of small values and implants bicluster with large

examples/cluster/plot_cluster_comparison.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 clusters for the methods that needs this parameter
 specified. Note that affinity propagation has a tendency to
 create many clusters. Thus in this example its two parameters
-(damping and per-point preference) were set to to mitigate this
+(damping and per-point preference) were set to mitigate this
 behavior.
 """
 print(__doc__)

examples/plot_johnson_lindenstrauss_bound.py

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@
 Empirical validation
 ====================
 
-We validate the above bounds on the the digits dataset or on the 20 newsgroups
+We validate the above bounds on the digits dataset or on the 20 newsgroups
 text document (TF-IDF word frequencies) dataset:
 
 - for the digits dataset, some 8x8 gray level pixels data for 500
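
The bound being validated can also be queried directly; a minimal sketch (the printed value matches the function's own docstring example, assumed correct here)::

    from sklearn.random_projection import johnson_lindenstrauss_min_dim

    # minimum number of components needed for an eps-embedding
    print(johnson_lindenstrauss_min_dim(n_samples=1e6, eps=0.5))  # 663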

sklearn/ensemble/bagging.py

Lines changed: 1 addition & 1 deletion
@@ -610,7 +610,7 @@ def predict_proba(self, X):
         the mean predicted class probabilities of the base estimators in the
         ensemble. If base estimators do not implement a ``predict_proba``
         method, then it resorts to voting and the predicted class probabilities
-        of a an input sample represents the proportion of estimators predicting
+        of an input sample represents the proportion of estimators predicting
         each class.
 
         Parameters
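
A sketch of the fallback behavior described: with a base estimator that lacks ``predict_proba``, the ensemble's probabilities are voting proportions (the example is chosen here, not from the diff)::

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.svm import LinearSVC

    X, y = make_classification(random_state=0)
    clf = BaggingClassifier(LinearSVC(), random_state=0).fit(X, y)
    proba = clf.predict_proba(X)  # proportion of estimators voting per class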

sklearn/ensemble/gradient_boosting.py

Lines changed: 2 additions & 2 deletions
@@ -1240,7 +1240,7 @@ def apply(self, X):
         -------
         X_leaves : array_like, shape = [n_samples, n_estimators, n_classes]
             For each datapoint x in X and for each tree in the ensemble,
-            return the index of the leaf x ends up in in each estimator.
+            return the index of the leaf x ends up in each estimator.
             In the case of binary classification n_classes is 1.
         """
 
@@ -1840,7 +1840,7 @@ def apply(self, X):
         -------
         X_leaves : array_like, shape = [n_samples, n_estimators]
             For each datapoint x in X and for each tree in the ensemble,
-            return the index of the leaf x ends up in in each estimator.
+            return the index of the leaf x ends up in each estimator.
         """
 
         leaves = super(GradientBoostingRegressor, self).apply(X)
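
A usage sketch of the ``apply`` method whose docstring is touched here::

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(random_state=0)
    est = GradientBoostingRegressor(n_estimators=5, random_state=0).fit(X, y)
    leaves = est.apply(X)  # shape (n_samples, 5): one leaf index per tree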

sklearn/exceptions.py

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ class DataConversionWarning(UserWarning):
     This warning occurs when some input data needs to be converted or
     interpreted in a way that may not match the user's expectations.
 
-    For example, this warning may occur when the the user
+    For example, this warning may occur when the user
     - passes an integer array to a function which expects float input and
       will convert the input
     - requests a non-copying operation, but a copy is required to meet the

sklearn/externals/joblib/pool.py

Lines changed: 1 addition & 1 deletion
@@ -534,7 +534,7 @@ def __init__(self, processes=None, temp_folder=None, max_nbytes=1e6,
                     os.makedirs(pool_folder)
                     use_shared_mem = True
                 except IOError:
-                    # Missing rights in the the /dev/shm partition,
+                    # Missing rights in the /dev/shm partition,
                     # fallback to regular temp folder.
                     temp_folder = None
         if temp_folder is None:

sklearn/grid_search.py

Lines changed: 1 addition & 1 deletion
@@ -771,7 +771,7 @@ class GridSearchCV(BaseSearchCV):
     See Also
     ---------
     :class:`ParameterGrid`:
-        generates all the combinations of a an hyperparameter grid.
+        generates all the combinations of a hyperparameter grid.
 
     :func:`sklearn.cross_validation.train_test_split`:
         utility function to split the data into a development set usable
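
A quick sketch of the :class:`ParameterGrid` named in the See Also entry::

    from sklearn.grid_search import ParameterGrid

    grid = ParameterGrid({'kernel': ['linear', 'rbf'], 'C': [1, 10]})
    list(grid)  # all four parameter combinations, as dicts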

sklearn/kernel_ridge.py

Lines changed: 1 addition & 1 deletion
@@ -166,7 +166,7 @@ def fit(self, X, y=None, sample_weight=None):
         return self
 
     def predict(self, X):
-        """Predict using the the kernel ridge model
+        """Predict using the kernel ridge model
 
         Parameters
         ----------
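
A fit-and-predict sketch for the method touched here (random data for illustration)::

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.RandomState(0)
    X, y = rng.randn(10, 5), rng.randn(10)
    y_pred = KernelRidge(alpha=1.0).fit(X, y).predict(X)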

sklearn/linear_model/omp.py

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ def _gram_omp(Gram, Xy, n_nonzero_coefs, tol_0=None, tol=None,
               copy_Gram=True, copy_Xy=True, return_path=False):
     """Orthogonal Matching Pursuit step on a precomputed Gram matrix.
 
-    This function uses the the Cholesky decomposition method.
+    This function uses the Cholesky decomposition method.
 
     Parameters
     ----------

sklearn/linear_model/tests/test_coordinate_descent.py

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ def test_lasso_path_return_models_vs_new_return_gives_same_coefficients():
     alphas = [5., 1., .5]
 
     # Use lars_path and lasso_path(new output) with 1D linear interpolation
-    # to compute the the same path
+    # to compute the same path
     alphas_lars, _, coef_path_lars = lars_path(X, y, method='lasso')
     coef_path_cont_lars = interpolate.interp1d(alphas_lars[::-1],
                                                coef_path_lars[:, ::-1])

sklearn/linear_model/tests/test_sgd.py

Lines changed: 1 addition & 1 deletion
@@ -445,7 +445,7 @@ def test_sgd_multiclass_njobs(self):
         assert_array_equal(pred, true_result2)
 
     def test_set_coef_multiclass(self):
-        # Checks coef_init and intercept_init shape for for multi-class
+        # Checks coef_init and intercept_init shape for multi-class
         # problems
         # Provided coef_ does not match dataset
         clf = self.factory()

sklearn/manifold/_barnes_hut_tsne.pyx

Lines changed: 2 additions & 2 deletions
@@ -195,7 +195,7 @@ cdef inline void index2offset(int* offset, int index, int n_dimensions) nogil:
     # Quite likely there's a fancy bitshift way of doing this
     # since the offset is equivalent to the binary representation
     # of the integer index
-    # We read the the offset array left-to-right
+    # We read the offset array left-to-right
     # such that the least significat bit is on the right
     cdef int rem, k, shift
     for k in range(n_dimensions):
@@ -212,7 +212,7 @@ cdef inline void index2offset(int* offset, int index, int n_dimensions) nogil:
 
 cdef inline int offset2index(int* offset, int n_dimensions) nogil:
     # Calculate the 1:1 index for a given offset array
-    # We read the the offset array right-to-left
+    # We read the offset array right-to-left
     # such that the least significat bit is on the right
     cdef int dim
    cdef int index = 0
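
The two helpers touched here convert between a flat cell index and its per-dimension offsets; a hypothetical pure-Python rendering of the relationship the comments describe (the offset array is the binary representation of the index, least significant bit rightmost)::

    def index2offset(index, n_dimensions):
        # most significant bit first, least significant bit on the right
        return [(index >> (n_dimensions - 1 - k)) & 1
                for k in range(n_dimensions)]

    def offset2index(offset):
        index = 0
        for bit in offset:
            index = (index << 1) | bit
        return index

    assert offset2index(index2offset(5, 3)) == 5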

sklearn/manifold/spectral_embedding_.py

Lines changed: 1 addition & 1 deletion
@@ -479,7 +479,7 @@ def fit(self, X, y=None):
                              "'precomputed', 'rbf', 'nearest_neighbors' "
                              "or a callable.") % self.affinity)
         elif not callable(self.affinity):
-            raise ValueError(("'affinity' is expected to be an an affinity "
+            raise ValueError(("'affinity' is expected to be an affinity "
                              "name or a callable. Got: %s") % self.affinity)
 
         affinity_matrix = self._get_affinity_matrix(X)

sklearn/manifold/tests/test_t_sne.py

Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,7 @@ def test_preserve_trustworthiness_approximately():
     # Nearest neighbors should be preserved approximately.
     random_state = check_random_state(0)
     # The Barnes-Hut approximation uses a different method to estimate
-    # P_ij using only a a number of nearest neighbors instead of all
+    # P_ij using only a number of nearest neighbors instead of all
     # points (so that k = 3 * perplexity). As a result we set the
     # perplexity=5, so that the number of neighbors is 5%.
     n_components = 2

sklearn/metrics/tests/test_pairwise.py

Lines changed: 1 addition & 1 deletion
@@ -270,7 +270,7 @@ def test_paired_distances():
         S3 = func(csr_matrix(X), csr_matrix(Y))
         assert_array_almost_equal(S, S3)
     if metric in PAIRWISE_DISTANCE_FUNCTIONS:
-        # Check the the pairwise_distances implementation
+        # Check the pairwise_distances implementation
         # gives the same value
        distances = PAIRWISE_DISTANCE_FUNCTIONS[metric](X, Y)
        distances = np.diag(distances)

sklearn/mixture/dpgmm.py

Lines changed: 1 addition & 1 deletion
@@ -492,7 +492,7 @@ def _fit(self, X, y=None):
 
         A initialization step is performed before entering the em
         algorithm. If you want to avoid this step, set the keyword
-        argument init_params to the empty string '' when when creating
+        argument init_params to the empty string '' when creating
         the object. Likewise, if you would like just to do an
         initialization, set n_iter=0.
 
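Translating the docstring's two options into constructor calls (a sketch; other arguments left at their defaults)::

    from sklearn.mixture import DPGMM

    DPGMM(n_components=3, init_params='')  # skip the initialization step
    DPGMM(n_components=3, n_iter=0)        # initialization only, no EM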

sklearn/model_selection/_search.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -763,7 +763,7 @@ class GridSearchCV(BaseSearchCV):
763763
See Also
764764
---------
765765
:class:`ParameterGrid`:
766-
generates all the combinations of a an hyperparameter grid.
766+
generates all the combinations of a hyperparameter grid.
767767
768768
:func:`sklearn.model_selection.train_test_split`:
769769
utility function to split the data into a development set usable

sklearn/tree/_splitter.pxd

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ cdef class Splitter:
     # The 1-d `constant_features` array of size n_features holds in
     # `constant_features[:n_constant_features]` the feature ids with
     # constant values for all the samples that reached a specific node.
-    # The value `n_constant_features` is given by the the parent node to its
+    # The value `n_constant_features` is given by the parent node to its
     # child nodes. The content of the range `[n_constant_features:]` is left
     # undefined, but preallocated for performance reasons
     # This allows optimization with depth-based tree building.

sklearn/tree/_tree.pyx

Lines changed: 1 addition & 1 deletion
@@ -537,7 +537,7 @@ cdef class Tree:
     """
     # Wrap for outside world.
     # WARNING: these reference the current `nodes` and `value` buffers, which
-    # must not be be freed by a subsequent memory allocation.
+    # must not be freed by a subsequent memory allocation.
     # (i.e. through `_resize` or `__setstate__`)
     property n_classes:
         def __get__(self):

sklearn/utils/seq_dataset.pyx

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ cdef class SequentialDataset:
 
     cdef int random(self, double **x_data_ptr, int **x_ind_ptr,
                     int *nnz, double *y, double *sample_weight) nogil:
-        """Get the a random example ``x`` from the dataset.
+        """Get a random example ``x`` from the dataset.
 
         Parameters
         ----------

sklearn/utils/weight_vector.pyx

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ cdef class WeightVector(object):
                 val = x_data_ptr[j]
                 aw_data_ptr[idx] += (self.average_a * val * (-c / wscale))
 
-            # Once the the sample has been processed
+            # Once the sample has been processed
             # update the average_a and average_b
             if num_iter > 1:
                 self.average_b /= (1.0 - mu)

0 commit comments