various minor spelling tweaks (#9783) · jwjohnson314/scikit-learn@6f996fa · GitHub

Commit 6f996fa

brettkoonce authored and Jeremiah Johnson committed

various minor spelling tweaks (scikit-learn#9783)

1 parent 8128e7a · commit 6f996fa

File tree

15 files changed: +19 −19 lines changed

doc/datasets/kddcup99.rst

Lines changed: 2 additions & 2 deletions
@@ -12,11 +12,11 @@ generated using a closed network and hand-injected attacks to produce a
 large number of different types of attack with normal activity in the
 background. As the initial goal was to produce a large training set for
 supervised learning algorithms, there is a large proportion (80.1%) of
-abnormal data which is unrealistic in real world, and inapropriate for
+abnormal data which is unrealistic in real world, and inappropriate for
 unsupervised anomaly detection which aims at detecting 'abnormal' data, ie
 1) qualitatively different from normal data
 2) in large minority among the observations.
-We thus transform the KDD Data set into two differents data set: SA and SF.
+We thus transform the KDD Data set into two different data sets: SA and SF.

 -SA is obtained by simply selecting all the normal data, and a small
 proportion of abnormal data to gives an anomaly proportion of 1%.
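
The SA and SF variants described in this passage are exposed through the standard dataset loader; a minimal illustrative sketch (percent10 merely keeps the download small):

>>> from sklearn.datasets import fetch_kddcup99
>>> sa = fetch_kddcup99(subset='SA', percent10=True)  # ~1% abnormal data
>>> sf = fetch_kddcup99(subset='SF', percent10=True)
>>> X, y = sa.data, sa.target  # feature records and attack labels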

doc/datasets/labeled_faces.rst

Lines changed: 2 additions & 2 deletions
@@ -29,11 +29,11 @@ Usage

 ``scikit-learn`` provides two loaders that will automatically download,
 cache, parse the metadata files, decode the jpeg and convert the
-interesting slices into memmaped numpy arrays. This dataset size is more
+interesting slices into memmapped numpy arrays. This dataset size is more
 than 200 MB. The first load typically takes more than a couple of minutes
 to fully decode the relevant part of the JPEG files into numpy arrays. If
 the dataset has been loaded once, the following times the loading times
-less than 200ms by using a memmaped version memoized on the disk in the
+less than 200ms by using a memmapped version memoized on the disk in the
 ``~/scikit_learn_data/lfw_home/`` folder using ``joblib``.

 The first loader is used for the Face Identification task: a multi-class
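
The two loaders this passage refers to are fetch_lfw_people and fetch_lfw_pairs; a minimal sketch (parameter values are illustrative):

>>> from sklearn.datasets import fetch_lfw_people, fetch_lfw_pairs
>>> # Face Identification: one class per person with enough pictures
>>> people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
>>> # Face Verification: picture pairs labeled same/different person
>>> pairs = fetch_lfw_pairs(subset='train')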

doc/modules/calibration.rst

Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ with different biases per method:
 than 0 for this case, thus moving the average prediction of the bagged
 ensemble away from 0. We observe this effect most strongly with random
 forests because the base-level trees trained with random forests have
-relatively high variance due to feature subseting." As a result, the
+relatively high variance due to feature subsetting." As a result, the
 calibration curve also referred to as the reliability diagram (Wilks 1995 [5]_) shows a
 characteristic sigmoid shape, indicating that the classifier could trust its
 "intuition" more and return probabilties closer to 0 or 1 typically.

@@ -78,7 +78,7 @@ The class :class:`CalibratedClassifierCV` uses a cross-validation generator and
 estimates for each split the model parameter on the train samples and the
 calibration of the test samples. The probabilities predicted for the
 folds are then averaged. Already fitted classifiers can be calibrated by
-:class:`CalibratedClassifierCV` via the paramter cv="prefit". In this case,
+:class:`CalibratedClassifierCV` via the parameter cv="prefit". In this case,
 the user has to take care manually that data for model fitting and calibration
 are disjoint.
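
A minimal sketch of the cv="prefit" usage corrected in the second hunk, on an illustrative synthetic dataset; the fitting and calibration splits are kept disjoint, as the passage requires:

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.calibration import CalibratedClassifierCV
>>> X, y = make_classification(n_samples=1000, random_state=0)
>>> X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, random_state=0)
>>> clf = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)
>>> calibrated = CalibratedClassifierCV(clf, cv="prefit").fit(X_cal, y_cal)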

doc/modules/gaussian_process.rst

Lines changed: 1 addition & 1 deletion
@@ -280,7 +280,7 @@ of the dataset, this might be considerably faster. However, note that
 "one_vs_one" does not support predicting probability estimates but only plain
 predictions. Moreover, note that :class:`GaussianProcessClassifier` does not
 (yet) implement a true multi-class Laplace approximation internally, but
-as discussed aboved is based on solving several binary classification tasks
+as discussed above is based on solving several binary classification tasks
 internally, which are combined using one-versus-rest or one-versus-one.

 GPC examples
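
The one-versus-one strategy under discussion is selected through the multi_class parameter; a minimal sketch on the iris data:

>>> from sklearn.datasets import load_iris
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> X, y = load_iris(return_X_y=True)
>>> gpc = GaussianProcessClassifier(multi_class="one_vs_one").fit(X, y)
>>> labels = gpc.predict(X)  # plain predictions; no probability estimates here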

doc/modules/manifold.rst

Lines changed: 1 addition & 1 deletion
@@ -558,7 +558,7 @@ descent will get stuck in a bad local minimum. If it is too high the KL
 divergence will increase during optimization. More tips can be found in
 Laurens van der Maaten's FAQ (see references). The last parameter, angle,
 is a tradeoff between performance and accuracy. Larger angles imply that we
-can approximate larger regions by a single point,leading to better speed
+can approximate larger regions by a single point, leading to better speed
 but less accurate results.

 `"How to Use t-SNE Effectively" <http://distill.pub/2016/misread-tsne/>`_

doc/modules/multiclass.rst

Lines changed: 1 addition & 1 deletion
@@ -367,7 +367,7 @@ classifier per target. This allows multiple target variable
 classifications. The purpose of this class is to extend estimators
 to be able to estimate a series of target functions (f1,f2,f3...,fn)
 that are trained on a single X predictor matrix to predict a series
-of reponses (y1,y2,y3...,yn).
+of responses (y1,y2,y3...,yn).

 Below is an example of multioutput classification:
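
A minimal sketch of the single-X, multiple-y setup this passage describes, using an illustrative synthetic multilabel dataset:

>>> from sklearn.datasets import make_multilabel_classification
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.multioutput import MultiOutputClassifier
>>> X, Y = make_multilabel_classification(n_classes=3, random_state=0)
>>> clf = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X, Y)
>>> Y_pred = clf.predict(X)  # one column of responses per target (y1, y2, y3)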

doc/modules/neighbors.rst

Lines changed: 1 addition & 1 deletion
< 93CD td data-grid-cell-id="diff-5a360fc3bc0145debf419e857214ebec61adaba13bd176f135907535714c64a8-297-296-2" data-line-anchor="diff-5a360fc3bc0145debf419e857214ebec61adaba13bd176f135907535714c64a8L297" data-selected="false" role="gridcell" style="background-color:var(--diffBlob-deletionLine-bgColor, var(--diffBlob-deletion-bgColor-line));padding-right:24px" tabindex="-1" valign="top" class="focusable-grid-cell diff-text-cell left-side-diff-cell border-right left-side">-
axes, dividing it into nested orthotopic regions into which data points
Original file line numberDiff line numberDiff line change
@@ -294,7 +294,7 @@ the *KD tree* data structure (short for *K-dimensional tree*), which
294294
generalizes two-dimensional *Quad-trees* and 3-dimensional *Oct-trees*
295295
to an arbitrary number of dimensions. The KD tree is a binary tree
296296
structure which recursively partitions the parameter space along the data
297
297+
axes, dividing it into nested orthotropic regions into which data points
298298
are filed. The construction of a KD tree is very fast: because partitioning
299299
is performed only along the data axes, no :math:`D`-dimensional distances
300300
need to be computed. Once constructed, the nearest neighbor of a query
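
The KD tree described here can be built directly; a minimal sketch with illustrative random points:

>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> X = np.random.RandomState(0).rand(1000, 3)
>>> tree = KDTree(X, leaf_size=30)  # fast build: splits along data axes only
>>> dist, ind = tree.query(X[:1], k=3)  # 3 nearest neighbors of one query point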

doc/modules/neural_networks_unsupervised.rst

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ negative gradient, however, is intractable. Its goal is to lower the energy of
 joint states that the model prefers, therefore making it stay true to the data.
 It can be approximated by Markov chain Monte Carlo using block Gibbs sampling by
 iteratively sampling each of :math:`v` and :math:`h` given the other, until the
-chain mixes. Samples generated in this way are sometimes refered as fantasy
+chain mixes. Samples generated in this way are sometimes referred as fantasy
 particles. This is inefficient and it is difficult to determine whether the
 Markov chain mixes.
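
The block Gibbs step described in this hunk is exposed on BernoulliRBM; a minimal sketch with illustrative binary data:

>>> import numpy as np
>>> from sklearn.neural_network import BernoulliRBM
>>> X = np.random.RandomState(0).randint(2, size=(100, 16))
>>> rbm = BernoulliRBM(n_components=8, random_state=0).fit(X)
>>> v = rbm.gibbs(X[:5])  # one step: sample h given v, then v given h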

doc/modules/pipeline.rst

Lines changed: 1 addition & 1 deletion
@@ -164,7 +164,7 @@ object::
 >>> # Clear the cache directory when you don't need it anymore
 >>> rmtree(cachedir)

-.. warning:: **Side effect of caching transfomers**
+.. warning:: **Side effect of caching transformers**

 Using a :class:`Pipeline` without cache enabled, it is possible to
 inspect the original instance such as::
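
For context, a minimal sketch of the transformer caching being documented (step names are illustrative):

>>> from tempfile import mkdtemp
>>> from shutil import rmtree
>>> from sklearn.decomposition import PCA
>>> from sklearn.svm import SVC
>>> from sklearn.pipeline import Pipeline
>>> cachedir = mkdtemp()
>>> pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())], memory=cachedir)
>>> # the warned-about side effect: caching clones the transformers, so
>>> # inspect fitted steps via pipe.named_steps, not the original instances
>>> rmtree(cachedir)  # clear the cache directory when done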

doc/modules/preprocessing.rst

Lines changed: 2 additions & 2 deletions
@@ -482,7 +482,7 @@ Then we fit the estimator, and transform a data point.
 In the result, the first two numbers encode the gender, the next set of three
 numbers the continent and the last four the web browser.

-Note that, if there is a possibilty that the training data might have missing categorical
+Note that, if there is a possibility that the training data might have missing categorical
 features, one has to explicitly set ``n_values``. For example,

 >>> enc = preprocessing.OneHotEncoder(n_values=[2, 3, 4])

@@ -588,7 +588,7 @@ In some cases, only interaction terms among features are required, and it can be

 The features of X have been transformed from :math:`(X_1, X_2, X_3)` to :math:`(1, X_1, X_2, X_3, X_1X_2, X_1X_3, X_2X_3, X_1X_2X_3)`.

-Note that polynomial features are used implicitily in `kernel methods <https://en.wikipedia.org/wiki/Kernel_method>`_ (e.g., :class:`sklearn.svm.SVC`, :class:`sklearn.decomposition.KernelPCA`) when using polynomial :ref:`svm_kernels`.
+Note that polynomial features are used implicitly in `kernel methods <https://en.wikipedia.org/wiki/Kernel_method>`_ (e.g., :class:`sklearn.svm.SVC`, :class:`sklearn.decomposition.KernelPCA`) when using polynomial :ref:`svm_kernels`.

 See :ref:`sphx_glr_auto_examples_linear_model_plot_polynomial_interpolation.py` for Ridge regression using created polynomial features.
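
A minimal sketch of the interaction-only expansion described in the second hunk, reproducing the (1, X1, X2, X3, X1X2, X1X3, X2X3, X1X2X3) column layout:

>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(9).reshape(3, 3)
>>> poly = PolynomialFeatures(degree=3, interaction_only=True)
>>> X_poly = poly.fit_transform(X)  # 1, X1, X2, X3, X1X2, X1X3, X2X3, X1X2X3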

0 commit comments