From 76dcfc431e6f6315876729f9c925a8ca2bd5c5f4 Mon Sep 17 00:00:00 2001 From: adrinjalali Date: Fri, 24 May 2024 20:26:56 +0200 Subject: [PATCH 01/11] DOC remove tutorials --- doc/tutorial/basic/tutorial.rst | 351 ----------- doc/tutorial/common_includes/info.txt | 3 - doc/tutorial/index.rst | 26 - doc/tutorial/machine_learning_map/README.md | 17 - doc/tutorial/machine_learning_map/index.rst | 75 --- doc/tutorial/statistical_inference/index.rst | 34 - .../statistical_inference/model_selection.rst | 315 ---------- .../putting_together.rst | 62 -- .../statistical_inference/settings.rst | 92 --- .../supervised_learning.rst | 528 ---------------- .../unsupervised_learning.rst | 297 --------- doc/tutorial/text_analytics/.gitignore | 25 - .../data/languages/fetch_data.py | 103 --- .../data/movie_reviews/fetch_data.py | 33 - .../exercise_01_language_train_model.py | 62 -- .../skeletons/exercise_02_sentiment.py | 63 -- .../exercise_01_language_train_model.py | 70 --- .../solutions/exercise_02_sentiment.py | 79 --- .../solutions/generate_skeletons.py | 38 -- .../text_analytics/working_with_text_data.rst | 586 ------------------ 20 files changed, 2859 deletions(-) delete mode 100644 doc/tutorial/basic/tutorial.rst delete mode 100644 doc/tutorial/common_includes/info.txt delete mode 100644 doc/tutorial/index.rst delete mode 100644 doc/tutorial/machine_learning_map/README.md delete mode 100644 doc/tutorial/machine_learning_map/index.rst delete mode 100644 doc/tutorial/statistical_inference/index.rst delete mode 100644 doc/tutorial/statistical_inference/model_selection.rst delete mode 100644 doc/tutorial/statistical_inference/putting_together.rst delete mode 100644 doc/tutorial/statistical_inference/settings.rst delete mode 100644 doc/tutorial/statistical_inference/supervised_learning.rst delete mode 100644 doc/tutorial/statistical_inference/unsupervised_learning.rst delete mode 100644 doc/tutorial/text_analytics/.gitignore delete mode 100644 doc/tutorial/text_analytics/data/languages/fetch_data.py delete mode 100644 doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py delete mode 100644 doc/tutorial/text_analytics/skeletons/exercise_01_language_train_model.py delete mode 100644 doc/tutorial/text_analytics/skeletons/exercise_02_sentiment.py delete mode 100644 doc/tutorial/text_analytics/solutions/exercise_01_language_train_model.py delete mode 100644 doc/tutorial/text_analytics/solutions/exercise_02_sentiment.py delete mode 100644 doc/tutorial/text_analytics/solutions/generate_skeletons.py delete mode 100644 doc/tutorial/text_analytics/working_with_text_data.rst diff --git a/doc/tutorial/basic/tutorial.rst b/doc/tutorial/basic/tutorial.rst deleted file mode 100644 index 27dddb4e0e909..0000000000000 --- a/doc/tutorial/basic/tutorial.rst +++ /dev/null @@ -1,351 +0,0 @@ -.. _introduction: - -An introduction to machine learning with scikit-learn -===================================================== - -.. topic:: Section contents - - In this section, we introduce the `machine learning - `_ - vocabulary that we use throughout scikit-learn and give a - simple learning example. - - -Machine learning: the problem setting -------------------------------------- - -In general, a learning problem considers a set of n -`samples `_ of -data and then tries to predict properties of unknown data. If each sample is -more than a single number and, for instance, a multi-dimensional entry -(aka `multivariate `_ -data), it is said to have several attributes or **features**. 
- -Learning problems fall into a few categories: - -* `supervised learning `_, - in which the data comes with additional attributes that we want to predict - (:ref:`Click here ` - to go to the scikit-learn supervised learning page).This problem - can be either: - - * `classification - `_: - samples belong to two or more classes and we - want to learn from already labeled data how to predict the class - of unlabeled data. An example of a classification problem would - be handwritten digit recognition, in which the aim is - to assign each input vector to one of a finite number of discrete - categories. Another way to think of classification is as a discrete - (as opposed to continuous) form of supervised learning where one has a - limited number of categories and for each of the n samples provided, - one is to try to label them with the correct category or class. - - * `regression `_: - if the desired output consists of one or more - continuous variables, then the task is called *regression*. An - example of a regression problem would be the prediction of the - length of a salmon as a function of its age and weight. - -* `unsupervised learning `_, - in which the training data consists of a set of input vectors x - without any corresponding target values. The goal in such problems - may be to discover groups of similar examples within the data, where - it is called `clustering `_, - or to determine the distribution of data within the input space, known as - `density estimation `_, or - to project the data from a high-dimensional space down to two or three - dimensions for the purpose of *visualization* - (:ref:`Click here ` - to go to the Scikit-Learn unsupervised learning page). - -.. topic:: Training set and testing set - - Machine learning is about learning some properties of a data set - and then testing those properties against another data set. A common - practice in machine learning is to evaluate an algorithm by splitting a data - set into two. We call one of those sets the **training set**, on which we - learn some properties; we call the other set the **testing set**, on which - we test the learned properties. - - -.. _loading_example_dataset: - -Loading an example dataset --------------------------- - -`scikit-learn` comes with a few standard datasets, for instance the -`iris `_ and `digits -`_ -datasets for classification and the `diabetes dataset -`_ for regression. - -In the following, we start a Python interpreter from our shell and then -load the ``iris`` and ``digits`` datasets. Our notational convention is that -``$`` denotes the shell prompt while ``>>>`` denotes the Python -interpreter prompt:: - - $ python - >>> from sklearn import datasets - >>> iris = datasets.load_iris() - >>> digits = datasets.load_digits() - -A dataset is a dictionary-like object that holds all the data and some -metadata about the data. This data is stored in the ``.data`` member, -which is a ``n_samples, n_features`` array. In the case of supervised -problems, one or more response variables are stored in the ``.target`` member. More -details on the different datasets can be found in the :ref:`dedicated -section `. - -For instance, in the case of the digits dataset, ``digits.data`` gives -access to the features that can be used to classify the digits samples:: - - >>> print(digits.data) - [[ 0. 0. 5. ... 0. 0. 0.] - [ 0. 0. 0. ... 10. 0. 0.] - [ 0. 0. 0. ... 16. 9. 0.] - ... - [ 0. 0. 1. ... 6. 0. 0.] - [ 0. 0. 2. ... 12. 0. 0.] - [ 0. 0. 10. ... 12. 1. 
0.]] - -and ``digits.target`` gives the ground truth for the digit dataset, that -is the number corresponding to each digit image that we are trying to -learn:: - - >>> digits.target - array([0, 1, 2, ..., 8, 9, 8]) - -.. topic:: Shape of the data arrays - - The data is always a 2D array, shape ``(n_samples, n_features)``, although - the original data may have had a different shape. In the case of the - digits, each original sample is an image of shape ``(8, 8)`` and can be - accessed using:: - - >>> digits.images[0] - array([[ 0., 0., 5., 13., 9., 1., 0., 0.], - [ 0., 0., 13., 15., 10., 15., 5., 0.], - [ 0., 3., 15., 2., 0., 11., 8., 0.], - [ 0., 4., 12., 0., 0., 8., 8., 0.], - [ 0., 5., 8., 0., 0., 9., 8., 0.], - [ 0., 4., 11., 0., 1., 12., 7., 0.], - [ 0., 2., 14., 5., 10., 12., 0., 0.], - [ 0., 0., 6., 13., 10., 0., 0., 0.]]) - - The :ref:`simple example on this dataset - ` illustrates how starting - from the original problem one can shape the data for consumption in - scikit-learn. - -.. topic:: Loading from external datasets - - To load from an external dataset, please refer to :ref:`loading external datasets `. - -Learning and predicting ------------------------- - -In the case of the digits dataset, the task is to predict, given an image, -which digit it represents. We are given samples of each of the 10 -possible classes (the digits zero through nine) on which we *fit* an -`estimator `_ to be able to *predict* -the classes to which unseen samples belong. - -In scikit-learn, an estimator for classification is a Python object that -implements the methods ``fit(X, y)`` and ``predict(T)``. - -An example of an estimator is the class ``sklearn.svm.SVC``, which -implements `support vector classification -`_. The -estimator's constructor takes as arguments the model's parameters. - -For now, we will consider the estimator as a black box:: - - >>> from sklearn import svm - >>> clf = svm.SVC(gamma=0.001, C=100.) - -.. topic:: Choosing the parameters of the model - - In this example, we set the value of ``gamma`` manually. - To find good values for these parameters, we can use tools - such as :ref:`grid search ` and :ref:`cross validation - `. - -The ``clf`` (for classifier) estimator instance is first -fitted to the model; that is, it must *learn* from the model. This is -done by passing our training set to the ``fit`` method. For the training -set, we'll use all the images from our dataset, except for the last -image, which we'll reserve for our predicting. We select the training set with -the ``[:-1]`` Python syntax, which produces a new array that contains all but -the last item from ``digits.data``:: - - >>> clf.fit(digits.data[:-1], digits.target[:-1]) - SVC(C=100.0, gamma=0.001) - -Now you can *predict* new values. In this case, you'll predict using the last -image from ``digits.data``. By predicting, you'll determine the image from the -training set that best matches the last image. - - - >>> clf.predict(digits.data[-1:]) - array([8]) - -The corresponding image is: - -.. image:: /auto_examples/datasets/images/sphx_glr_plot_digits_last_image_001.png - :target: ../../auto_examples/datasets/plot_digits_last_image.html - :align: center - :scale: 50 - -As you can see, it is a challenging task: after all, the images are of poor -resolution. Do you agree with the classifier? - -A complete example of this classification problem is available as an -example that you can run and study: -:ref:`sphx_glr_auto_examples_classification_plot_digits_classification.py`. 
- -Conventions ------------ - -scikit-learn estimators follow certain rules to make their behavior more -predictive. These are described in more detail in the :ref:`glossary`. - -Type casting -~~~~~~~~~~~~ - -Where possible, input of type ``float32`` will maintain its data type. Otherwise -input will be cast to ``float64``:: - - >>> import numpy as np - >>> from sklearn import kernel_approximation - - >>> rng = np.random.RandomState(0) - >>> X = rng.rand(10, 2000) - >>> X = np.array(X, dtype='float32') - >>> X.dtype - dtype('float32') - - >>> transformer = kernel_approximation.RBFSampler() - >>> X_new = transformer.fit_transform(X) - >>> X_new.dtype - dtype('float32') - -In this example, ``X`` is ``float32``, and is unchanged by ``fit_transform(X)``. - -Using `float32`-typed training (or testing) data is often more -efficient than using the usual ``float64`` ``dtype``: it allows to -reduce the memory usage and sometimes also reduces processing time -by leveraging the vector instructions of the CPU. However it can -sometimes lead to numerical stability problems causing the algorithm -to be more sensitive to the scale of the values and :ref:`require -adequate preprocessing`. - -Keep in mind however that not all scikit-learn estimators attempt to -work in `float32` mode. For instance, some transformers will always -cast their input to `float64` and return `float64` transformed -values as a result. - -Regression targets are cast to ``float64`` and classification targets are -maintained:: - - >>> from sklearn import datasets - >>> from sklearn.svm import SVC - >>> iris = datasets.load_iris() - >>> clf = SVC() - >>> clf.fit(iris.data, iris.target) - SVC() - - >>> list(clf.predict(iris.data[:3])) - [0, 0, 0] - - >>> clf.fit(iris.data, iris.target_names[iris.target]) - SVC() - - >>> list(clf.predict(iris.data[:3])) - ['setosa', 'setosa', 'setosa'] - -Here, the first ``predict()`` returns an integer array, since ``iris.target`` -(an integer array) was used in ``fit``. The second ``predict()`` returns a string -array, since ``iris.target_names`` was for fitting. - -Refitting and updating parameters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Hyper-parameters of an estimator can be updated after it has been constructed -via the :term:`set_params()` method. Calling ``fit()`` more than -once will overwrite what was learned by any previous ``fit()``:: - - >>> import numpy as np - >>> from sklearn.datasets import load_iris - >>> from sklearn.svm import SVC - >>> X, y = load_iris(return_X_y=True) - - >>> clf = SVC() - >>> clf.set_params(kernel='linear').fit(X, y) - SVC(kernel='linear') - >>> clf.predict(X[:5]) - array([0, 0, 0, 0, 0]) - - >>> clf.set_params(kernel='rbf').fit(X, y) - SVC() - >>> clf.predict(X[:5]) - array([0, 0, 0, 0, 0]) - -Here, the default kernel ``rbf`` is first changed to ``linear`` via -:func:`SVC.set_params()` after the estimator has -been constructed, and changed back to ``rbf`` to refit the estimator and to -make a second prediction. - -Multiclass vs. 
multilabel fitting -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When using :class:`multiclass classifiers `, -the learning and prediction task that is performed is dependent on the format of -the target data fit upon:: - - >>> from sklearn.svm import SVC - >>> from sklearn.multiclass import OneVsRestClassifier - >>> from sklearn.preprocessing import LabelBinarizer - - >>> X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]] - >>> y = [0, 0, 1, 1, 2] - - >>> classif = OneVsRestClassifier(estimator=SVC(random_state=0)) - >>> classif.fit(X, y).predict(X) - array([0, 0, 1, 1, 2]) - -In the above case, the classifier is fit on a 1d array of multiclass labels and -the ``predict()`` method therefore provides corresponding multiclass predictions. -It is also possible to fit upon a 2d array of binary label indicators:: - - >>> y = LabelBinarizer().fit_transform(y) - >>> classif.fit(X, y).predict(X) - array([[1, 0, 0], - [1, 0, 0], - [0, 1, 0], - [0, 0, 0], - [0, 0, 0]]) - -Here, the classifier is ``fit()`` on a 2d binary label representation of ``y``, -using the :class:`LabelBinarizer `. -In this case ``predict()`` returns a 2d array representing the corresponding -multilabel predictions. - -Note that the fourth and fifth instances returned all zeroes, indicating that -they matched none of the three labels ``fit`` upon. With multilabel outputs, it -is similarly possible for an instance to be assigned multiple labels:: - - >>> from sklearn.preprocessing import MultiLabelBinarizer - >>> y = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]] - >>> y = MultiLabelBinarizer().fit_transform(y) - >>> classif.fit(X, y).predict(X) - array([[1, 1, 0, 0, 0], - [1, 0, 1, 0, 0], - [0, 1, 0, 1, 0], - [1, 0, 1, 0, 0], - [1, 0, 1, 0, 0]]) - -In this case, the classifier is fit upon instances each assigned multiple labels. -The :class:`MultiLabelBinarizer ` is -used to binarize the 2d array of multilabels to ``fit`` upon. As a result, -``predict()`` returns a 2d array with multiple predicted labels for each instance. diff --git a/doc/tutorial/common_includes/info.txt b/doc/tutorial/common_includes/info.txt deleted file mode 100644 index f8e44fec90f2f..0000000000000 --- a/doc/tutorial/common_includes/info.txt +++ /dev/null @@ -1,3 +0,0 @@ -Meant to share common RST file snippets that we want to reuse by inclusion -in the real tutorial in order to lower the maintenance burden -of redundant sections. diff --git a/doc/tutorial/index.rst b/doc/tutorial/index.rst deleted file mode 100644 index bd4b8997f5f39..0000000000000 --- a/doc/tutorial/index.rst +++ /dev/null @@ -1,26 +0,0 @@ -.. _tutorial_menu: - -====================== -scikit-learn Tutorials -====================== - -.. toctree:: - :maxdepth: 2 - - basic/tutorial.rst - statistical_inference/index.rst - text_analytics/working_with_text_data.rst - machine_learning_map/index - ../presentations - -.. note:: **Doctest Mode** - - The code-examples in the above tutorials are written in a - *python-console* format. If you wish to easily execute these examples - in **IPython**, use:: - - %doctest_mode - - in the IPython-console. You can then simply copy and paste the examples - directly into IPython without having to worry about removing the **>>>** - manually. 
diff --git a/doc/tutorial/machine_learning_map/README.md b/doc/tutorial/machine_learning_map/README.md deleted file mode 100644 index 006b1e5e1a38c..0000000000000 --- a/doc/tutorial/machine_learning_map/README.md +++ /dev/null @@ -1,17 +0,0 @@ -The scikit-learn machine learning cheat sheet was originally created by Andreas Mueller: -https://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html - -The current version of the chart is located at `doc/images/ml_map.svg` in SVG+XML -format, created using [draw.io](https://draw.io/). To edit the chart, open the file in -draw.io, make changes, and export as SVG with the same filename. Export configurations -are: - -- Zoom: 100% -- Border width: 15 -- Size: Diagram -- Transparent Background: False -- Appearance: Light - -Each node in the chart that contains an estimator should have a link, where the root -directory is at `../../`. Note that after exporting the SVG, the links may be prefixed -with e.g. `https://app.diagrams.net/`. Remember to check and remove them. diff --git a/doc/tutorial/machine_learning_map/index.rst b/doc/tutorial/machine_learning_map/index.rst deleted file mode 100644 index 5fd6879563489..0000000000000 --- a/doc/tutorial/machine_learning_map/index.rst +++ /dev/null @@ -1,75 +0,0 @@ -:html_theme.sidebar_secondary.remove: - -.. _ml_map: - -Choosing the right estimator -============================ - -Often the hardest part of solving a machine learning problem can be finding the right -estimator for the job. Different estimators are better suited for different types of -data and different problems. - -The flowchart below is designed to give users a bit of a rough guide on how to approach -problems with regard to which estimators to try on your data. Click on any estimator in -the chart below to see its documentation. Use scroll wheel to zoom in and out, and click -and drag to pan around. You can also download the chart: -:download:`ml_map.svg <../../images/ml_map.svg>`. - -.. raw:: html - - - - - - -
- -.. raw:: html - :file: ../../images/ml_map.svg - -.. raw:: html - -
diff --git a/doc/tutorial/statistical_inference/index.rst b/doc/tutorial/statistical_inference/index.rst deleted file mode 100644 index 358bf16512254..0000000000000 --- a/doc/tutorial/statistical_inference/index.rst +++ /dev/null @@ -1,34 +0,0 @@ -.. _stat_learn_tut_index: - -========================================================================== -A tutorial on statistical-learning for scientific data processing -========================================================================== - -.. topic:: Statistical learning - - `Machine learning `_ is - a technique with a growing importance, as the - size of the datasets experimental sciences are facing is rapidly - growing. Problems it tackles range from building a prediction function - linking different observations, to classifying observations, or - learning the structure in an unlabeled dataset. - - This tutorial will explore *statistical learning*, the use of - machine learning techniques with the goal of `statistical inference - `_: - drawing conclusions on the data at hand. - - Scikit-learn is a Python module integrating classic machine - learning algorithms in the tightly-knit world of scientific Python - packages (`NumPy `_, `SciPy - `_, `matplotlib - `_). - -.. toctree:: - :maxdepth: 2 - - settings - supervised_learning - model_selection - unsupervised_learning - putting_together diff --git a/doc/tutorial/statistical_inference/model_selection.rst b/doc/tutorial/statistical_inference/model_selection.rst deleted file mode 100644 index 7d7d5f69f18c4..0000000000000 --- a/doc/tutorial/statistical_inference/model_selection.rst +++ /dev/null @@ -1,315 +0,0 @@ -.. _model_selection_tut: - -============================================================ -Model selection: choosing estimators and their parameters -============================================================ - -Score, and cross-validated scores -================================== - -As we have seen, every estimator exposes a ``score`` method that can judge -the quality of the fit (or the prediction) on new data. **Bigger is -better**. - -:: - - >>> from sklearn import datasets, svm - >>> X_digits, y_digits = datasets.load_digits(return_X_y=True) - >>> svc = svm.SVC(C=1, kernel='linear') - >>> svc.fit(X_digits[:-100], y_digits[:-100]).score(X_digits[-100:], y_digits[-100:]) - 0.98 - -To get a better measure of prediction accuracy (which we can use as a -proxy for goodness of fit of the model), we can successively split the -data in *folds* that we use for training and testing:: - - >>> import numpy as np - >>> X_folds = np.array_split(X_digits, 3) - >>> y_folds = np.array_split(y_digits, 3) - >>> scores = list() - >>> for k in range(3): - ... # We use 'list' to copy, in order to 'pop' later on - ... X_train = list(X_folds) - ... X_test = X_train.pop(k) - ... X_train = np.concatenate(X_train) - ... y_train = list(y_folds) - ... y_test = y_train.pop(k) - ... y_train = np.concatenate(y_train) - ... scores.append(svc.fit(X_train, y_train).score(X_test, y_test)) - >>> print(scores) - [0.934..., 0.956..., 0.939...] - -.. currentmodule:: sklearn.model_selection - -This is called a :class:`KFold` cross-validation. - -.. _cv_generators_tut: - -Cross-validation generators -============================= - -Scikit-learn has a collection of classes which can be used to generate lists of -train/test indices for popular cross-validation strategies. 
- -They expose a ``split`` method which accepts the input -dataset to be split and yields the train/test set indices for each iteration -of the chosen cross-validation strategy. - -This example shows an example usage of the ``split`` method. - - >>> from sklearn.model_selection import KFold, cross_val_score - >>> X = ["a", "a", "a", "b", "b", "c", "c", "c", "c", "c"] - >>> k_fold = KFold(n_splits=5) - >>> for train_indices, test_indices in k_fold.split(X): - ... print('Train: %s | test: %s' % (train_indices, test_indices)) - Train: [2 3 4 5 6 7 8 9] | test: [0 1] - Train: [0 1 4 5 6 7 8 9] | test: [2 3] - Train: [0 1 2 3 6 7 8 9] | test: [4 5] - Train: [0 1 2 3 4 5 8 9] | test: [6 7] - Train: [0 1 2 3 4 5 6 7] | test: [8 9] - -The cross-validation can then be performed easily:: - - >>> [svc.fit(X_digits[train], y_digits[train]).score(X_digits[test], y_digits[test]) - ... for train, test in k_fold.split(X_digits)] - [0.963..., 0.922..., 0.963..., 0.963..., 0.930...] - -The cross-validation score can be directly calculated using the -:func:`cross_val_score` helper. Given an estimator, the cross-validation object -and the input dataset, the :func:`cross_val_score` splits the data repeatedly into -a training and a testing set, trains the estimator using the training set and -computes the scores based on the testing set for each iteration of cross-validation. - -By default the estimator's ``score`` method is used to compute the individual scores. - -Refer the :ref:`metrics module ` to learn more on the available scoring -methods. - - >>> cross_val_score(svc, X_digits, y_digits, cv=k_fold, n_jobs=-1) - array([0.96388889, 0.92222222, 0.9637883 , 0.9637883 , 0.93036212]) - -`n_jobs=-1` means that the computation will be dispatched on all the CPUs -of the computer. - -Alternatively, the ``scoring`` argument can be provided to specify an alternative -scoring method. - - >>> cross_val_score(svc, X_digits, y_digits, cv=k_fold, - ... scoring='precision_macro') - array([0.96578289, 0.92708922, 0.96681476, 0.96362897, 0.93192644]) - -**Cross-validation generators** - - -.. list-table:: - - * - - - :class:`KFold` **(n_splits, shuffle, random_state)** - - - :class:`StratifiedKFold` **(n_splits, shuffle, random_state)** - - - :class:`GroupKFold` **(n_splits)** - - - * - - - Splits it into K folds, trains on K-1 and then tests on the left-out. - - - Same as K-Fold but preserves the class distribution within each fold. - - - Ensures that the same group is not in both testing and training sets. - - -.. list-table:: - - * - - - :class:`ShuffleSplit` **(n_splits, test_size, train_size, random_state)** - - - :class:`StratifiedShuffleSplit` - - - :class:`GroupShuffleSplit` - - * - - - Generates train/test indices based on random permutation. - - - Same as shuffle split but preserves the class distribution within each iteration. - - - Ensures that the same group is not in both testing and training sets. - - -.. list-table:: - - * - - - :class:`LeaveOneGroupOut` **()** - - - :class:`LeavePGroupsOut` **(n_groups)** - - - :class:`LeaveOneOut` **()** - - - - * - - - Takes a group array to group observations. - - - Leave P groups out. - - - Leave one observation out. - - - -.. list-table:: - - * - - - :class:`LeavePOut` **(p)** - - - :class:`PredefinedSplit` - - * - - - Leave P observations out. - - - Generates train/test indices based on predefined splits. - - -.. currentmodule:: sklearn.svm - -.. 
topic:: **Exercise** - - On the digits dataset, plot the cross-validation score of a :class:`SVC` - estimator with a linear kernel as a function of parameter ``C`` (use a - logarithmic grid of points, from 1 to 10). - - :: - - >>> import numpy as np - >>> from sklearn import datasets, svm - >>> from sklearn.model_selection import cross_val_score - >>> X, y = datasets.load_digits(return_X_y=True) - >>> svc = svm.SVC(kernel="linear") - >>> C_s = np.logspace(-10, 0, 10) - >>> scores = list() - >>> scores_std = list() - - .. dropdown:: Solution - - .. plot:: - :context: close-figs - :align: center - - import numpy as np - from sklearn import datasets, svm - from sklearn.model_selection import cross_val_score - X, y = datasets.load_digits(return_X_y=True) - svc = svm.SVC(kernel="linear") - C_s = np.logspace(-10, 0, 10) - scores = list() - scores_std = list() - for C in C_s: - svc.C = C - this_scores = cross_val_score(svc, X, y, n_jobs=1) - scores.append(np.mean(this_scores)) - scores_std.append(np.std(this_scores)) - - import matplotlib.pyplot as plt - - plt.figure() - plt.semilogx(C_s, scores) - plt.semilogx(C_s, np.array(scores) + np.array(scores_std), "b--") - plt.semilogx(C_s, np.array(scores) - np.array(scores_std), "b--") - locs, labels = plt.yticks() - plt.yticks(locs, list(map(lambda x: "%g" % x, locs))) - plt.ylabel("CV score") - plt.xlabel("Parameter C") - plt.ylim(0, 1.1) - plt.show() - -Grid-search and cross-validated estimators -============================================ - -Grid-search -------------- - -.. currentmodule:: sklearn.model_selection - -scikit-learn provides an object that, given data, computes the score -during the fit of an estimator on a parameter grid and chooses the -parameters to maximize the cross-validation score. This object takes an -estimator during the construction and exposes an estimator API:: - - >>> from sklearn.model_selection import GridSearchCV, cross_val_score - >>> Cs = np.logspace(-6, -1, 10) - >>> clf = GridSearchCV(estimator=svc, param_grid=dict(C=Cs), - ... n_jobs=-1) - >>> clf.fit(X_digits[:1000], y_digits[:1000]) # doctest: +SKIP - GridSearchCV(cv=None,... - >>> clf.best_score_ # doctest: +SKIP - 0.925... - >>> clf.best_estimator_.C # doctest: +SKIP - 0.0077... - - >>> # Prediction performance on test set is not as good as on train set - >>> clf.score(X_digits[1000:], y_digits[1000:]) # doctest: +SKIP - 0.943... - - -By default, the :class:`GridSearchCV` uses a 5-fold cross-validation. However, -if it detects that a classifier is passed, rather than a regressor, it uses -a stratified 5-fold. - -.. topic:: Nested cross-validation - - :: - - >>> cross_val_score(clf, X_digits, y_digits) # doctest: +SKIP - array([0.938..., 0.963..., 0.944...]) - - Two cross-validation loops are performed in parallel: one by the - :class:`GridSearchCV` estimator to set ``gamma`` and the other one by - ``cross_val_score`` to measure the prediction performance of the - estimator. The resulting scores are unbiased estimates of the - prediction score on new data. - -.. warning:: - - You cannot nest objects with parallel computing (``n_jobs`` different - than 1). - -.. _cv_estimators_tut: - -Cross-validated estimators ----------------------------- - -Cross-validation to set a parameter can be done more efficiently on an -algorithm-by-algorithm basis. 
This is why, for certain estimators, -scikit-learn exposes :ref:`cross_validation` estimators that set their -parameter automatically by cross-validation:: - - >>> from sklearn import linear_model, datasets - >>> lasso = linear_model.LassoCV() - >>> X_diabetes, y_diabetes = datasets.load_diabetes(return_X_y=True) - >>> lasso.fit(X_diabetes, y_diabetes) - LassoCV() - >>> # The estimator chose automatically its lambda: - >>> lasso.alpha_ - 0.00375... - -These estimators are called similarly to their counterparts, with 'CV' -appended to their name. - -.. topic:: **Exercise** - - On the diabetes dataset, find the optimal regularization parameter - alpha. - - **Bonus**: How much can you trust the selection of alpha? - - .. literalinclude:: ../../auto_examples/exercises/plot_cv_diabetes.py - :lines: 17-24 - - **Solution:** :ref:`sphx_glr_auto_examples_exercises_plot_cv_diabetes.py` diff --git a/doc/tutorial/statistical_inference/putting_together.rst b/doc/tutorial/statistical_inference/putting_together.rst deleted file mode 100644 index b28ba77bfac33..0000000000000 --- a/doc/tutorial/statistical_inference/putting_together.rst +++ /dev/null @@ -1,62 +0,0 @@ -========================= -Putting it all together -========================= - -.. Imports - >>> import numpy as np - -Pipelining -============ - -We have seen that some estimators can transform data and that some estimators -can predict variables. We can also create combined estimators: - -.. literalinclude:: ../../auto_examples/compose/plot_digits_pipe.py - :lines: 23-63 - -.. image:: ../../auto_examples/compose/images/sphx_glr_plot_digits_pipe_001.png - :target: ../../auto_examples/compose/plot_digits_pipe.html - :scale: 65 - :align: center - -Face recognition with eigenfaces -================================= - -The dataset used in this example is a preprocessed excerpt of the -"Labeled Faces in the Wild", also known as LFW_: - -http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB) - -.. _LFW: http://vis-www.cs.umass.edu/lfw/ - -.. literalinclude:: ../../auto_examples/applications/plot_face_recognition.py - -.. figure:: ../../images/plot_face_recognition_1.png - :scale: 50 - - **Prediction** - -.. figure:: ../../images/plot_face_recognition_2.png - :scale: 50 - - **Eigenfaces** - -Expected results for the top 5 most represented people in the dataset:: - - precision recall f1-score support - - Gerhard_Schroeder 0.91 0.75 0.82 28 - Donald_Rumsfeld 0.84 0.82 0.83 33 - Tony_Blair 0.65 0.82 0.73 34 - Colin_Powell 0.78 0.88 0.83 58 - George_W_Bush 0.93 0.86 0.90 129 - - avg / total 0.86 0.84 0.85 282 - - -Open problem: Stock Market Structure -===================================== - -Can we predict the variation in stock prices for Google over a given time frame? - -:ref:`stock_market` diff --git a/doc/tutorial/statistical_inference/settings.rst b/doc/tutorial/statistical_inference/settings.rst deleted file mode 100644 index 422972fbd6cb4..0000000000000 --- a/doc/tutorial/statistical_inference/settings.rst +++ /dev/null @@ -1,92 +0,0 @@ - -========================================================================== -Statistical learning: the setting and the estimator object in scikit-learn -========================================================================== - -Datasets -========= - -Scikit-learn deals with learning information from one or more -datasets that are represented as 2D arrays. They can be understood as a -list of multi-dimensional observations. 
We say that the first axis of -these arrays is the **samples** axis, while the second is the -**features** axis. - -.. topic:: A simple example shipped with scikit-learn: iris dataset - - :: - - >>> from sklearn import datasets - >>> iris = datasets.load_iris() - >>> data = iris.data - >>> data.shape - (150, 4) - - It is made of 150 observations of irises, each described by 4 - features: their sepal and petal length and width, as detailed in - ``iris.DESCR``. - -When the data is not initially in the ``(n_samples, n_features)`` shape, it -needs to be preprocessed in order to be used by scikit-learn. - -.. topic:: An example of reshaping data would be the digits dataset - - The digits dataset is made of 1797 8x8 images of hand-written - digits :: - - >>> digits = datasets.load_digits() - >>> digits.images.shape - (1797, 8, 8) - >>> import matplotlib.pyplot as plt - >>> plt.imshow(digits.images[-1], - ... cmap=plt.cm.gray_r) - <...> - - .. image:: /auto_examples/datasets/images/sphx_glr_plot_digits_last_image_001.png - :target: ../../auto_examples/datasets/plot_digits_last_image.html - :align: center - - To use this dataset with scikit-learn, we transform each 8x8 image into a - feature vector of length 64 :: - - >>> data = digits.images.reshape( - ... (digits.images.shape[0], -1) - ... ) - -Estimators objects -=================== - -.. Some code to make the doctests run - - >>> from sklearn.base import BaseEstimator - >>> class Estimator(BaseEstimator): - ... def __init__(self, param1=0, param2=0): - ... self.param1 = param1 - ... self.param2 = param2 - ... def fit(self, data): - ... pass - >>> estimator = Estimator() - -**Fitting data**: the main API implemented by scikit-learn is that of the -`estimator`. An estimator is any object that learns from data; -it may be a classification, regression or clustering algorithm or -a *transformer* that extracts/filters useful features from raw data. - -All estimator objects expose a ``fit`` method that takes a dataset -(usually a 2-d array): - - >>> estimator.fit(data) - -**Estimator parameters**: All the parameters of an estimator can be set -when it is instantiated or by modifying the corresponding attribute:: - - >>> estimator = Estimator(param1=1, param2=2) - >>> estimator.param1 - 1 - -**Estimated parameters**: When data is fitted with an estimator, -parameters are estimated from the data at hand. All the estimated -parameters are attributes of the estimator object ending by an -underscore:: - - >>> estimator.estimated_param_ #doctest: +SKIP diff --git a/doc/tutorial/statistical_inference/supervised_learning.rst b/doc/tutorial/statistical_inference/supervised_learning.rst deleted file mode 100644 index 41adf60c44fc7..0000000000000 --- a/doc/tutorial/statistical_inference/supervised_learning.rst +++ /dev/null @@ -1,528 +0,0 @@ -.. _supervised_learning_tut: - -======================================================================================= -Supervised learning: predicting an output variable from high-dimensional observations -======================================================================================= - - -.. topic:: The problem solved in supervised learning - - :ref:`Supervised learning ` - consists in learning the link between two - datasets: the observed data ``X`` and an external variable ``y`` that we - are trying to predict, usually called "target" or "labels". Most often, - ``y`` is a 1D array of length ``n_samples``. 
- - All supervised `estimators `_ - in scikit-learn implement a ``fit(X, y)`` method to fit the model - and a ``predict(X)`` method that, given unlabeled observations ``X``, - returns the predicted labels ``y``. - -.. topic:: Vocabulary: classification and regression - - If the prediction task is to classify the observations in a set of - finite labels, in other words to "name" the objects observed, the task - is said to be a **classification** task. On the other hand, if the goal - is to predict a continuous target variable, it is said to be a - **regression** task. - - When doing classification in scikit-learn, ``y`` is a vector of integers - or strings. - - Note: See the :ref:`Introduction to machine learning with scikit-learn - Tutorial ` for a quick run-through on the basic machine - learning vocabulary used within scikit-learn. - -Nearest neighbor and the curse of dimensionality -================================================= - -.. topic:: Classifying irises: - - The iris dataset is a classification task consisting in identifying 3 - different types of irises (Setosa, Versicolour, and Virginica) from - their petal and sepal length and width:: - - >>> import numpy as np - >>> from sklearn import datasets - >>> iris_X, iris_y = datasets.load_iris(return_X_y=True) - >>> np.unique(iris_y) - array([0, 1, 2]) - - .. image:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png - :target: ../../auto_examples/datasets/plot_iris_dataset.html - :align: center - :scale: 50 - -k-Nearest neighbors classifier -------------------------------- - -The simplest possible classifier is the -`nearest neighbor `_: -given a new observation ``X_test``, find in the training set (i.e. the data -used to train the estimator) the observation with the closest feature vector. -(Please see the :ref:`Nearest Neighbors section` of the online -Scikit-learn documentation for more information about this type of classifier.) - -.. topic:: Training set and testing set - - While experimenting with any learning algorithm, it is important not to - test the prediction of an estimator on the data used to fit the - estimator as this would not be evaluating the performance of the - estimator on **new data**. This is why datasets are often split into - *train* and *test* data. - -**KNN (k nearest neighbors) classification example**: - -.. image:: /auto_examples/neighbors/images/sphx_glr_plot_classification_001.png - :target: ../../auto_examples/neighbors/plot_classification.html - :align: center - :scale: 70 - -:: - - >>> # Split iris data in train and test data - >>> # A random permutation, to split the data randomly - >>> np.random.seed(0) - >>> indices = np.random.permutation(len(iris_X)) - >>> iris_X_train = iris_X[indices[:-10]] - >>> iris_y_train = iris_y[indices[:-10]] - >>> iris_X_test = iris_X[indices[-10:]] - >>> iris_y_test = iris_y[indices[-10:]] - >>> # Create and fit a nearest-neighbor classifier - >>> from sklearn.neighbors import KNeighborsClassifier - >>> knn = KNeighborsClassifier() - >>> knn.fit(iris_X_train, iris_y_train) - KNeighborsClassifier() - >>> knn.predict(iris_X_test) - array([1, 2, 1, 0, 0, 0, 2, 1, 2, 0]) - >>> iris_y_test - array([1, 1, 1, 0, 0, 0, 2, 1, 2, 0]) - -.. _curse_of_dimensionality: - -The curse of dimensionality -------------------------------- - -For an estimator to be effective, you need the distance between neighboring -points to be less than some value :math:`d`, which depends on the problem. -In one dimension, this requires on average :math:`n \sim 1/d` points. 
-In the context of the above :math:`k`-NN example, if the data is described by -just one feature with values ranging from 0 to 1 and with :math:`n` training -observations, then new data will be no further away than :math:`1/n`. -Therefore, the nearest neighbor decision rule will be efficient as soon as -:math:`1/n` is small compared to the scale of between-class feature variations. - -If the number of features is :math:`p`, you now require :math:`n \sim 1/d^p` -points. Let's say that we require 10 points in one dimension: now :math:`10^p` -points are required in :math:`p` dimensions to pave the :math:`[0, 1]` space. -As :math:`p` becomes large, the number of training points required for a good -estimator grows exponentially. - -For example, if each point is just a single number (8 bytes), then an -effective :math:`k`-NN estimator in a paltry :math:`p \sim 20` dimensions would -require more training data than the current estimated size of the entire -internet (±1000 Exabytes or so). - -This is called the -`curse of dimensionality `_ -and is a core problem that machine learning addresses. - -Linear model: from regression to sparsity -========================================== - -.. topic:: Diabetes dataset - - The diabetes dataset consists of 10 physiological variables (age, - sex, weight, blood pressure) measured on 442 patients, and an - indication of disease progression after one year:: - - >>> diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True) - >>> diabetes_X_train = diabetes_X[:-20] - >>> diabetes_X_test = diabetes_X[-20:] - >>> diabetes_y_train = diabetes_y[:-20] - >>> diabetes_y_test = diabetes_y[-20:] - - The task at hand is to predict disease progression from physiological - variables. - -Linear regression ------------------- - -.. currentmodule:: sklearn.linear_model - -:class:`LinearRegression`, -in its simplest form, fits a linear model to the data set by adjusting -a set of parameters in order to make the sum of the squared residuals -of the model as small as possible. - -Linear models: :math:`y = X\beta + \epsilon` - -* :math:`X`: data -* :math:`y`: target variable -* :math:`\beta`: Coefficients -* :math:`\epsilon`: Observation noise - -.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_001.png - :target: ../../auto_examples/linear_model/plot_ols.html - :scale: 50 - :align: center - -:: - - >>> from sklearn import linear_model - >>> regr = linear_model.LinearRegression() - >>> regr.fit(diabetes_X_train, diabetes_y_train) - LinearRegression() - >>> print(regr.coef_) # doctest: +SKIP - [ 0.30349955 -237.63931533 510.53060544 327.73698041 -814.13170937 - 492.81458798 102.84845219 184.60648906 743.51961675 76.09517222] - - - >>> # The mean square error - >>> np.mean((regr.predict(diabetes_X_test) - diabetes_y_test)**2) - 2004.5... - - >>> # Explained variance score: 1 is perfect prediction - >>> # and 0 means that there is no linear relationship - >>> # between X and y. - >>> regr.score(diabetes_X_test, diabetes_y_test) - 0.585... - - -.. _shrinkage: - -Shrinkage ----------- - -If there are few data points per dimension, noise in the observations -induces high variance: - -:: - - >>> X = np.c_[ .5, 1].T - >>> y = [.5, 1] - >>> test = np.c_[ 0, 2].T - >>> regr = linear_model.LinearRegression() - - >>> import matplotlib.pyplot as plt - >>> plt.figure() - <...> - >>> np.random.seed(0) - >>> for _ in range(6): - ... this_X = .1 * np.random.normal(size=(2, 1)) + X - ... regr.fit(this_X, y) - ... plt.plot(test, regr.predict(test)) - ... 
plt.scatter(this_X, y, s=3) - LinearRegression... - -.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_ridge_variance_001.png - :target: ../../auto_examples/linear_model/plot_ols_ridge_variance.html - :align: center - -A solution in high-dimensional statistical learning is to *shrink* the -regression coefficients to zero: any two randomly chosen set of -observations are likely to be uncorrelated. This is called :class:`Ridge` -regression: - -:: - - >>> regr = linear_model.Ridge(alpha=.1) - - >>> plt.figure() - <...> - >>> np.random.seed(0) - >>> for _ in range(6): - ... this_X = .1 * np.random.normal(size=(2, 1)) + X - ... regr.fit(this_X, y) - ... plt.plot(test, regr.predict(test)) - ... plt.scatter(this_X, y, s=3) - Ridge... - -.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_ridge_variance_002.png - :target: ../../auto_examples/linear_model/plot_ols_ridge_variance.html - :align: center - -This is an example of **bias/variance tradeoff**: the larger the ridge -``alpha`` parameter, the higher the bias and the lower the variance. - -We can choose ``alpha`` to minimize left out error, this time using the -diabetes dataset rather than our synthetic data:: - - >>> alphas = np.logspace(-4, -1, 6) - >>> print([regr.set_params(alpha=alpha) - ... .fit(diabetes_X_train, diabetes_y_train) - ... .score(diabetes_X_test, diabetes_y_test) - ... for alpha in alphas]) - [0.585..., 0.585..., 0.5854..., 0.5855..., 0.583..., 0.570...] - - -.. note:: - - Capturing in the fitted parameters noise that prevents the model to - generalize to new data is called - `overfitting `_. The bias introduced - by the ridge regression is called a - `regularization `_. - -.. _sparsity: - -Sparsity ----------- - - -.. |diabetes_ols_1| image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_3d_001.png - :target: ../../auto_examples/linear_model/plot_ols_3d.html - :scale: 65 - -.. |diabetes_ols_3| image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_3d_003.png - :target: ../../auto_examples/linear_model/plot_ols_3d.html - :scale: 65 - -.. |diabetes_ols_2| image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_3d_002.png - :target: ../../auto_examples/linear_model/plot_ols_3d.html - :scale: 65 - - - - -.. rst-class:: centered - - **Fitting only features 1 and 2** - -.. centered:: |diabetes_ols_1| |diabetes_ols_3| |diabetes_ols_2| - -.. note:: - - A representation of the full diabetes dataset would involve 11 - dimensions (10 feature dimensions and one of the target variable). It - is hard to develop an intuition on such representation, but it may be - useful to keep in mind that it would be a fairly *empty* space. - - - -We can see that, although feature 2 has a strong coefficient on the full -model, it conveys little information on ``y`` when considered with feature 1. - -To improve the conditioning of the problem (i.e. mitigating the -:ref:`curse_of_dimensionality`), it would be interesting to select only the -informative features and set non-informative ones, like feature 2 to 0. Ridge -regression will decrease their contribution, but not set them to zero. Another -penalization approach, called :ref:`lasso` (least absolute shrinkage and -selection operator), can set some coefficients to zero. Such methods are -called **sparse methods** and sparsity can be seen as an -application of Occam's razor: *prefer simpler models*. - -:: - - >>> regr = linear_model.Lasso() - >>> scores = [regr.set_params(alpha=alpha) - ... .fit(diabetes_X_train, diabetes_y_train) - ... 
.score(diabetes_X_test, diabetes_y_test) - ... for alpha in alphas] - >>> best_alpha = alphas[scores.index(max(scores))] - >>> regr.alpha = best_alpha - >>> regr.fit(diabetes_X_train, diabetes_y_train) - Lasso(alpha=0.025118864315095794) - >>> print(regr.coef_) - [ 0. -212.4... 517.2... 313.7... -160.8... - -0. -187.1... 69.3... 508.6... 71.8... ] - -.. topic:: **Different algorithms for the same problem** - - Different algorithms can be used to solve the same mathematical - problem. For instance the ``Lasso`` object in scikit-learn - solves the lasso regression problem using a - `coordinate descent `_ method, - that is efficient on large datasets. However, scikit-learn also - provides the :class:`LassoLars` object using the *LARS* algorithm, - which is very efficient for problems in which the weight vector estimated - is very sparse (i.e. problems with very few observations). - -.. _clf_tut: - -Classification ---------------- - -For classification, as in the labeling -`iris `_ task, linear -regression is not the right approach as it will give too much weight to -data far from the decision frontier. A linear approach is to fit a sigmoid -function or **logistic** function: - -.. image:: /auto_examples/linear_model/images/sphx_glr_plot_logistic_001.png - :target: ../../auto_examples/linear_model/plot_logistic.html - :scale: 70 - :align: center - -.. math:: - - y = \textrm{sigmoid}(X\beta - \textrm{offset}) + \epsilon = - \frac{1}{1 + \textrm{exp}(- X\beta + \textrm{offset})} + \epsilon - -:: - - >>> log = linear_model.LogisticRegression(C=1e5) - >>> log.fit(iris_X_train, iris_y_train) - LogisticRegression(C=100000.0) - -This is known as :class:`LogisticRegression`. - -.. image:: /auto_examples/linear_model/images/sphx_glr_plot_iris_logistic_001.png - :target: ../../auto_examples/linear_model/plot_iris_logistic.html - :scale: 83 - :align: center - -.. topic:: Multiclass classification - - If you have several classes to predict, an option often used is to fit - one-versus-all classifiers and then use a voting heuristic for the final - decision. - -.. topic:: Shrinkage and sparsity with logistic regression - - The ``C`` parameter controls the amount of regularization in the - :class:`LogisticRegression` object: a large value for ``C`` results in - less regularization. - ``penalty="l2"`` gives :ref:`shrinkage` (i.e. non-sparse coefficients), while - ``penalty="l1"`` gives :ref:`sparsity`. - -.. topic:: **Exercise** - :class: green - - Try classifying the digits dataset with nearest neighbors and a linear - model. Leave out the last 10% and test prediction performance on these - observations. - - .. literalinclude:: ../../auto_examples/exercises/plot_digits_classification_exercise.py - :lines: 15-19 - - A solution can be downloaded :download:`here <../../auto_examples/exercises/plot_digits_classification_exercise.py>`. - - -Support vector machines (SVMs) -================================ - -Linear SVMs -------------- - - -:ref:`svm` belong to the discriminant model family: they try to find a combination of -samples to build a plane maximizing the margin between the two classes. -Regularization is set by the ``C`` parameter: a small value for ``C`` means the margin -is calculated using many or all of the observations around the separating line -(more regularization); -a large value for ``C`` means the margin is calculated on observations close to -the separating line (less regularization). - -.. currentmodule :: sklearn.svm - -.. 
figure:: /auto_examples/svm/images/sphx_glr_plot_svm_margin_001.png - :target: ../../auto_examples/svm/plot_svm_margin.html - - **Unregularized SVM** - -.. figure:: /auto_examples/svm/images/sphx_glr_plot_svm_margin_002.png - :target: ../../auto_examples/svm/plot_svm_margin.html - - **Regularized SVM (default)** - -.. rubric:: Examples - -- :ref:`sphx_glr_auto_examples_svm_plot_iris_svc.py` - - -SVMs can be used in regression --:class:`SVR` (Support Vector Regression)--, or in -classification --:class:`SVC` (Support Vector Classification). - -:: - - >>> from sklearn import svm - >>> svc = svm.SVC(kernel='linear') - >>> svc.fit(iris_X_train, iris_y_train) - SVC(kernel='linear') - - -.. warning:: **Normalizing data** - - For many estimators, including the SVMs, having datasets with unit - standard deviation for each feature is important to get good - prediction. - -.. _using_kernels_tut: - -Using kernels -------------- - -Classes are not always linearly separable in feature space. The solution is to -build a decision function that is not linear but may be polynomial instead. -This is done using the *kernel trick* that can be seen as -creating a decision energy by positioning *kernels* on observations: - -Linear kernel -^^^^^^^^^^^^^ - -:: - - >>> svc = svm.SVC(kernel='linear') - -.. image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_002.png - :target: ../../auto_examples/svm/plot_svm_kernels.html - -Polynomial kernel -^^^^^^^^^^^^^^^^^ - -:: - - >>> svc = svm.SVC(kernel='poly', - ... degree=3) - >>> # degree: polynomial degree - -.. image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_003.png - :target: ../../auto_examples/svm/plot_svm_kernels.html - -RBF kernel (Radial Basis Function) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -:: - - >>> svc = svm.SVC(kernel='rbf') - >>> # gamma: inverse of size of - >>> # radial kernel - -.. image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_004.png - :target: ../../auto_examples/svm/plot_svm_kernels.html - -Sigmoid kernel -^^^^^^^^^^^^^^ - -:: - - >>> svc = svm.SVC(kernel='sigmoid') - -.. image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_005.png - :target: ../../auto_examples/svm/plot_svm_kernels.html - - -.. topic:: **Exercise** - :class: green - - Try classifying classes 1 and 2 from the iris dataset with SVMs, with - the 2 first features. Leave out 10% of each class and test prediction - performance on these observations. - - **Warning**: the classes are ordered, do not leave out the last 10%, - you would be testing on only one class. - - **Hint**: You can use the ``decision_function`` method on a grid to get - intuitions. - - .. literalinclude:: ../../auto_examples/exercises/plot_iris_exercise.py - :lines: 18-23 - - .. 
image:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png - :target: ../../auto_examples/datasets/plot_iris_dataset.html - :align: center - :scale: 70 - - - A solution can be downloaded :download:`here <../../auto_examples/exercises/plot_iris_exercise.py>` diff --git a/doc/tutorial/statistical_inference/unsupervised_learning.rst b/doc/tutorial/statistical_inference/unsupervised_learning.rst deleted file mode 100644 index fd827cc75b212..0000000000000 --- a/doc/tutorial/statistical_inference/unsupervised_learning.rst +++ /dev/null @@ -1,297 +0,0 @@ -============================================================ -Unsupervised learning: seeking representations of the data -============================================================ - -Clustering: grouping observations together -============================================ - -.. topic:: The problem solved in clustering - - Given the iris dataset, if we knew that there were 3 types of iris, but - did not have access to a taxonomist to label them: we could try a - **clustering task**: split the observations into well-separated group - called *clusters*. - -:: - - >>> # Set the PRNG - >>> import numpy as np - >>> np.random.seed(1) - -K-means clustering -------------------- - -Note that there exist a lot of different clustering criteria and associated -algorithms. The simplest clustering algorithm is :ref:`k_means`. - -:: - - >>> from sklearn import cluster, datasets - >>> X_iris, y_iris = datasets.load_iris(return_X_y=True) - - >>> k_means = cluster.KMeans(n_clusters=3) - >>> k_means.fit(X_iris) - KMeans(n_clusters=3) - >>> print(k_means.labels_[::10]) - [1 1 1 1 1 2 0 0 0 0 2 2 2 2 2] - >>> print(y_iris[::10]) - [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2] - -.. figure:: /auto_examples/cluster/images/sphx_glr_plot_cluster_iris_001.png - :target: ../../auto_examples/cluster/plot_cluster_iris.html - :scale: 63 - -.. warning:: - - There is absolutely no guarantee of recovering a ground truth. First, - choosing the right number of clusters is hard. Second, the algorithm - is sensitive to initialization, and can fall into local minima, - although scikit-learn employs several tricks to mitigate this issue. - - For instance, on the image above, we can observe the difference between the - ground-truth (bottom right figure) and different clustering. We do not - recover the expected labels, either because the number of cluster was - chosen to be to large (top left figure) or suffer from a bad initialization - (bottom left figure). - - **It is therefore important to not over-interpret clustering results.** - -.. topic:: **Application example: vector quantization** - - Clustering in general and KMeans, in particular, can be seen as a way - of choosing a small number of exemplars to compress the information. - The problem is sometimes known as - `vector quantization `_. - For instance, this can be used to posterize an image:: - - >>> import scipy as sp - >>> try: - ... face = sp.face(gray=True) - ... except AttributeError: - ... from scipy import misc - ... face = misc.face(gray=True) - >>> X = face.reshape((-1, 1)) # We need an (n_sample, n_feature) array - >>> k_means = cluster.KMeans(n_clusters=5, n_init=1) - >>> k_means.fit(X) - KMeans(n_clusters=5, n_init=1) - >>> values = k_means.cluster_centers_.squeeze() - >>> labels = k_means.labels_ - >>> face_compressed = np.choose(labels, values) - >>> face_compressed.shape = face.shape - -**Raw image** - -.. 
figure:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_001.png - :target: ../../auto_examples/cluster/plot_face_compress.html - -**K-means quantization** - -.. figure:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_004.png - :target: ../../auto_examples/cluster/plot_face_compress.html - -**Equal bins** - -.. figure:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_002.png - :target: ../../auto_examples/cluster/plot_face_compress.html - -Hierarchical agglomerative clustering: Ward ---------------------------------------------- - -A :ref:`hierarchical_clustering` method is a type of cluster analysis -that aims to build a hierarchy of clusters. In general, the various approaches -of this technique are either: - -* **Agglomerative** - bottom-up approaches: each observation starts in its - own cluster, and clusters are iteratively merged in such a way to - minimize a *linkage* criterion. This approach is particularly interesting - when the clusters of interest are made of only a few observations. When - the number of clusters is large, it is much more computationally efficient - than k-means. - -* **Divisive** - top-down approaches: all observations start in one - cluster, which is iteratively split as one moves down the hierarchy. - For estimating large numbers of clusters, this approach is both slow (due - to all observations starting as one cluster, which it splits recursively) - and statistically ill-posed. - -Connectivity-constrained clustering -..................................... - -With agglomerative clustering, it is possible to specify which samples can be -clustered together by giving a connectivity graph. Graphs in scikit-learn -are represented by their adjacency matrix. Often, a sparse matrix is used. -This can be useful, for instance, to retrieve connected regions (sometimes -also referred to as connected components) when clustering an image. - -.. image:: /auto_examples/cluster/images/sphx_glr_plot_coin_ward_segmentation_001.png - :target: ../../auto_examples/cluster/plot_coin_ward_segmentation.html - :scale: 40 - :align: center - -:: - - >>> from skimage.data import coins - >>> from scipy.ndimage import gaussian_filter - >>> from skimage.transform import rescale - >>> rescaled_coins = rescale( - ... gaussian_filter(coins(), sigma=2), - ... 0.2, mode='reflect', anti_aliasing=False - ... ) - >>> X = np.reshape(rescaled_coins, (-1, 1)) - -We need a vectorized version of the image. `'rescaled_coins'` is a down-scaled -version of the coins image to speed up the process:: - - >>> from sklearn.feature_extraction import grid_to_graph - >>> connectivity = grid_to_graph(*rescaled_coins.shape) - -Define the graph structure of the data. Pixels connected to their neighbors:: - - >>> n_clusters = 27 # number of regions - - >>> from sklearn.cluster import AgglomerativeClustering - >>> ward = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward', - ... connectivity=connectivity) - >>> ward.fit(X) - AgglomerativeClustering(connectivity=..., n_clusters=27) - >>> label = np.reshape(ward.labels_, rescaled_coins.shape) - -Feature agglomeration -...................... - -We have seen that sparsity could be used to mitigate the curse of -dimensionality, *i.e* an insufficient amount of observations compared to the -number of features. Another approach is to merge together similar -features: **feature agglomeration**. This approach can be implemented by -clustering in the feature direction, in other words clustering the -transposed data. - -.. 
image:: /auto_examples/cluster/images/sphx_glr_plot_digits_agglomeration_001.png - :target: ../../auto_examples/cluster/plot_digits_agglomeration.html - :align: center - :scale: 57 - -:: - - >>> digits = datasets.load_digits() - >>> images = digits.images - >>> X = np.reshape(images, (len(images), -1)) - >>> connectivity = grid_to_graph(*images[0].shape) - - >>> agglo = cluster.FeatureAgglomeration(connectivity=connectivity, - ... n_clusters=32) - >>> agglo.fit(X) - FeatureAgglomeration(connectivity=..., n_clusters=32) - >>> X_reduced = agglo.transform(X) - - >>> X_approx = agglo.inverse_transform(X_reduced) - >>> images_approx = np.reshape(X_approx, images.shape) - -.. topic:: ``transform`` and ``inverse_transform`` methods - - Some estimators expose a ``transform`` method, for instance to reduce - the dimensionality of the dataset. - -Decompositions: from a signal to components and loadings -=========================================================== - -.. topic:: **Components and loadings** - - If X is our multivariate data, then the problem that we are trying to solve - is to rewrite it on a different observational basis: we want to learn - loadings L and a set of components C such that *X = L C*. - Different criteria exist to choose the components - -Principal component analysis: PCA ------------------------------------ - -:ref:`PCA` selects the successive components that explain the maximum variance in the -signal. Let's create a synthetic 3-dimensional dataset. - -.. np.random.seed(0) - -:: - - >>> # Create a signal with only 2 useful dimensions - >>> x1 = np.random.normal(size=(100, 1)) - >>> x2 = np.random.normal(size=(100, 1)) - >>> x3 = x1 + x2 - >>> X = np.concatenate([x1, x2, x3], axis=1) - -The point cloud spanned by the observations above is very flat in one -direction: one of the three univariate features (i.e. z-axis) can almost be exactly -computed using the other two. - -.. plot:: - :context: close-figs - :align: center - - >>> import matplotlib.pyplot as plt - >>> fig = plt.figure() - >>> ax = fig.add_subplot(111, projection='3d') - >>> ax.scatter(X[:, 0], X[:, 1], X[:, 2]) - <...> - >>> _ = ax.set(xlabel="x", ylabel="y", zlabel="z") - - -PCA finds the directions in which the data is not *flat*. - -:: - - >>> from sklearn import decomposition - >>> pca = decomposition.PCA() - >>> pca.fit(X) - PCA() - >>> print(pca.explained_variance_) # doctest: +SKIP - [ 2.18565811e+00 1.19346747e+00 8.43026679e-32] - -Looking at the explained variance, we see that only the first two components -are useful. PCA can be used to reduce dimensionality while preserving -most of the information. It will project the data on the principal subspace. - -:: - - >>> pca.set_params(n_components=2) - PCA(n_components=2) - >>> X_reduced = pca.fit_transform(X) - >>> X_reduced.shape - (100, 2) - -.. Eigenfaces here? - -Independent Component Analysis: ICA -------------------------------------- - -:ref:`ICA` selects components so that the distribution of their loadings carries -a maximum amount of independent information. It is able to recover -**non-Gaussian** independent signals: - -.. image:: /auto_examples/decomposition/images/sphx_glr_plot_ica_blind_source_separation_001.png - :target: ../../auto_examples/decomposition/plot_ica_blind_source_separation.html - :scale: 70 - :align: center - -.. 
np.random.seed(0) - -:: - - >>> # Generate sample data - >>> import numpy as np - >>> from scipy import signal - >>> time = np.linspace(0, 10, 2000) - >>> s1 = np.sin(2 * time) # Signal 1 : sinusoidal signal - >>> s2 = np.sign(np.sin(3 * time)) # Signal 2 : square signal - >>> s3 = signal.sawtooth(2 * np.pi * time) # Signal 3: saw tooth signal - >>> S = np.c_[s1, s2, s3] - >>> S += 0.2 * np.random.normal(size=S.shape) # Add noise - >>> S /= S.std(axis=0) # Standardize data - >>> # Mix data - >>> A = np.array([[1, 1, 1], [0.5, 2, 1], [1.5, 1, 2]]) # Mixing matrix - >>> X = np.dot(S, A.T) # Generate observations - - >>> # Compute ICA - >>> ica = decomposition.FastICA() - >>> S_ = ica.fit_transform(X) # Get the estimated sources - >>> A_ = ica.mixing_.T - >>> np.allclose(X, np.dot(S_, A_) + ica.mean_) - True diff --git a/doc/tutorial/text_analytics/.gitignore b/doc/tutorial/text_analytics/.gitignore deleted file mode 100644 index 54c78634d9dd1..0000000000000 --- a/doc/tutorial/text_analytics/.gitignore +++ /dev/null @@ -1,25 +0,0 @@ -# cruft -.*.swp -*.pyc -.DS_Store -*.pdf - -# folder to be used for working on the exercises -workspace - -# output of the sphinx build of the documentation -tutorial/_build - -# datasets to be fetched from the web and cached locally -data/twenty_newsgroups/20news-bydate.tar.gz -data/twenty_newsgroups/20news-bydate-train -data/twenty_newsgroups/20news-bydate-test - -data/movie_reviews/txt_sentoken -data/movie_reviews/poldata.README.2.0 - -data/languages/paragraphs -data/languages/short_paragraphs -data/languages/html - -data/labeled_faces_wild/lfw_preprocessed/ diff --git a/doc/tutorial/text_analytics/data/languages/fetch_data.py b/doc/tutorial/text_analytics/data/languages/fetch_data.py deleted file mode 100644 index 2dd0f208ade86..0000000000000 --- a/doc/tutorial/text_analytics/data/languages/fetch_data.py +++ /dev/null @@ -1,103 +0,0 @@ - -# simple python script to collect text paragraphs from various languages on the -# same topic namely the Wikipedia encyclopedia itself - -import os -from urllib.request import Request, build_opener - -import lxml.html -from lxml.etree import ElementTree -import numpy as np - -import codecs - -pages = { - 'ar': 'http://ar.wikipedia.org/wiki/%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9%8A%D8%A7', # noqa: E501 - 'de': 'http://de.wikipedia.org/wiki/Wikipedia', - 'en': 'https://en.wikipedia.org/wiki/Wikipedia', - 'es': 'http://es.wikipedia.org/wiki/Wikipedia', - 'fr': 'http://fr.wikipedia.org/wiki/Wikip%C3%A9dia', - 'it': 'http://it.wikipedia.org/wiki/Wikipedia', - 'ja': 'http://ja.wikipedia.org/wiki/Wikipedia', - 'nl': 'http://nl.wikipedia.org/wiki/Wikipedia', - 'pl': 'http://pl.wikipedia.org/wiki/Wikipedia', - 'pt': 'http://pt.wikipedia.org/wiki/Wikip%C3%A9dia', - 'ru': 'http://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F', # noqa: E501 -# u'zh': u'http://zh.wikipedia.org/wiki/Wikipedia', -} - -html_folder = 'html' -text_folder = 'paragraphs' -short_text_folder = 'short_paragraphs' -n_words_per_short_text = 5 - - -if not os.path.exists(html_folder): - os.makedirs(html_folder) - -for lang, page in pages.items(): - - text_lang_folder = os.path.join(text_folder, lang) - if not os.path.exists(text_lang_folder): - os.makedirs(text_lang_folder) - - short_text_lang_folder = os.path.join(short_text_folder, lang) - if not os.path.exists(short_text_lang_folder): - os.makedirs(short_text_lang_folder) - - opener = build_opener() - html_filename = os.path.join(html_folder, lang + '.html') - if not 
os.path.exists(html_filename): - print("Downloading %s" % page) - request = Request(page) - # change the User Agent to avoid being blocked by Wikipedia - # downloading a couple of articles should not be considered abusive - request.add_header('User-Agent', 'OpenAnything/1.0') - html_content = opener.open(request).read() - with open(html_filename, 'wb') as f: - f.write(html_content) - - # decode the payload explicitly as UTF-8 since lxml is confused for some - # reason - with codecs.open(html_filename,'r','utf-8') as html_file: - html_content = html_file.read() - tree = ElementTree(lxml.html.document_fromstring(html_content)) - i = 0 - j = 0 - for p in tree.findall('//p'): - content = p.text_content() - if len(content) < 100: - # skip paragraphs that are too short - probably too noisy and not - # representative of the actual language - continue - - text_filename = os.path.join(text_lang_folder, - '%s_%04d.txt' % (lang, i)) - print("Writing %s" % text_filename) - with open(text_filename, 'wb') as f: - f.write(content.encode('utf-8', 'ignore')) - i += 1 - - # split the paragraph into fake smaller paragraphs to make the - # problem harder e.g. more similar to tweets - if lang in ('zh', 'ja'): - # FIXME: whitespace tokenizing does not work on chinese and japanese - continue - words = content.split() - n_groups = len(words) / n_words_per_short_text - if n_groups < 1: - continue - groups = np.array_split(words, n_groups) - - for group in groups: - small_content = " ".join(group) - - short_text_filename = os.path.join(short_text_lang_folder, - '%s_%04d.txt' % (lang, j)) - print("Writing %s" % short_text_filename) - with open(short_text_filename, 'wb') as f: - f.write(small_content.encode('utf-8', 'ignore')) - j += 1 - if j >= 1000: - break - diff --git a/doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py b/doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py deleted file mode 100644 index 67def14889774..0000000000000 --- a/doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py +++ /dev/null @@ -1,33 +0,0 @@ -"""Script to download the movie review dataset""" - -from pathlib import Path -from hashlib import sha256 -import tarfile -from urllib.request import urlopen - - -URL = "http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz" - -ARCHIVE_SHA256 = "fc0dccc2671af5db3c5d8f81f77a1ebfec953ecdd422334062df61ede36b2179" -ARCHIVE_NAME = Path(URL.rsplit("/", 1)[1]) -DATA_FOLDER = Path("txt_sentoken") - - -if not DATA_FOLDER.exists(): - - if not ARCHIVE_NAME.exists(): - print("Downloading dataset from %s (3 MB)" % URL) - opener = urlopen(URL) - with open(ARCHIVE_NAME, "wb") as archive: - archive.write(opener.read()) - - try: - print("Checking the integrity of the archive") - assert sha256(ARCHIVE_NAME.read_bytes()).hexdigest() == ARCHIVE_SHA256 - - print("Decompressing %s" % ARCHIVE_NAME) - with tarfile.open(ARCHIVE_NAME, "r:gz") as archive: - archive.extractall(path=".") - - finally: - ARCHIVE_NAME.unlink() diff --git a/doc/tutorial/text_analytics/skeletons/exercise_01_language_train_model.py b/doc/tutorial/text_analytics/skeletons/exercise_01_language_train_model.py deleted file mode 100644 index 438481120d126..0000000000000 --- a/doc/tutorial/text_analytics/skeletons/exercise_01_language_train_model.py +++ /dev/null @@ -1,62 +0,0 @@ -"""Build a language detector model - -The goal of this exercise is to train a linear classifier on text features -that represent sequences of up to 3 consecutive characters so as to be -recognize natural languages by using 
the frequencies of short character -sequences as 'fingerprints'. - -""" -# Author: Olivier Grisel -# License: Simplified BSD - -import sys - -from sklearn.feature_extraction.text import TfidfVectorizer -from sklearn.linear_model import Perceptron -from sklearn.pipeline import Pipeline -from sklearn.datasets import load_files -from sklearn.model_selection import train_test_split -from sklearn import metrics - - -# The training data folder must be passed as first argument -languages_data_folder = sys.argv[1] -dataset = load_files(languages_data_folder) - -# Split the dataset in training and test set: -docs_train, docs_test, y_train, y_test = train_test_split( - dataset.data, dataset.target, test_size=0.5) - - -# TASK: Build a vectorizer that splits strings into sequence of 1 to 3 -# characters instead of word tokens - -# TASK: Build a vectorizer / classifier pipeline using the previous analyzer -# the pipeline instance should stored in a variable named clf - -# TASK: Fit the pipeline on the training set - -# TASK: Predict the outcome on the testing set in a variable named y_predicted - -# Print the classification report -print(metrics.classification_report(y_test, y_predicted, - target_names=dataset.target_names)) - -# Plot the confusion matrix -cm = metrics.confusion_matrix(y_test, y_predicted) -print(cm) - -#import matplotlib.pyplot as plt -#plt.matshow(cm, cmap=plt.cm.jet) -#plt.show() - -# Predict the result on some short new sentences: -sentences = [ - 'This is a language detection test.', - 'Ceci est un test de d\xe9tection de la langue.', - 'Dies ist ein Test, um die Sprache zu erkennen.', -] -predicted = clf.predict(sentences) - -for s, p in zip(sentences, predicted): - print('The language of "%s" is "%s"' % (s, dataset.target_names[p])) diff --git a/doc/tutorial/text_analytics/skeletons/exercise_02_sentiment.py b/doc/tutorial/text_analytics/skeletons/exercise_02_sentiment.py deleted file mode 100644 index 23299f5f01b3d..0000000000000 --- a/doc/tutorial/text_analytics/skeletons/exercise_02_sentiment.py +++ /dev/null @@ -1,63 +0,0 @@ -"""Build a sentiment analysis / polarity model - -Sentiment analysis can be casted as a binary text classification problem, -that is fitting a linear classifier on features extracted from the text -of the user messages so as to guess whether the opinion of the author is -positive or negative. - -In this examples we will use a movie review dataset. 
- -""" -# Author: Olivier Grisel -# License: Simplified BSD - -import sys -from sklearn.feature_extraction.text import TfidfVectorizer -from sklearn.svm import LinearSVC -from sklearn.pipeline import Pipeline -from sklearn.model_selection import GridSearchCV -from sklearn.datasets import load_files -from sklearn.model_selection import train_test_split -from sklearn import metrics - - -if __name__ == "__main__": - # NOTE: we put the following in a 'if __name__ == "__main__"' protected - # block to be able to use a multi-core grid search that also works under - # Windows, see: http://docs.python.org/library/multiprocessing.html#windows - # The multiprocessing module is used as the backend of joblib.Parallel - # that is used when n_jobs != 1 in GridSearchCV - - # the training data folder must be passed as first argument - movie_reviews_data_folder = sys.argv[1] - dataset = load_files(movie_reviews_data_folder, shuffle=False) - print("n_samples: %d" % len(dataset.data)) - - # split the dataset in training and test set: - docs_train, docs_test, y_train, y_test = train_test_split( - dataset.data, dataset.target, test_size=0.25, random_state=None) - - # TASK: Build a vectorizer / classifier pipeline that filters out tokens - # that are too rare or too frequent - - # TASK: Build a grid search to find out whether unigrams or bigrams are - # more useful. - # Fit the pipeline on the training set using grid search for the parameters - - # TASK: print the cross-validated scores for the each parameters set - # explored by the grid search - - # TASK: Predict the outcome on the testing set and store it in a variable - # named y_predicted - - # Print the classification report - print(metrics.classification_report(y_test, y_predicted, - target_names=dataset.target_names)) - - # Print and plot the confusion matrix - cm = metrics.confusion_matrix(y_test, y_predicted) - print(cm) - - # import matplotlib.pyplot as plt - # plt.matshow(cm) - # plt.show() diff --git a/doc/tutorial/text_analytics/solutions/exercise_01_language_train_model.py b/doc/tutorial/text_analytics/solutions/exercise_01_language_train_model.py deleted file mode 100644 index 21cee0c80e00e..0000000000000 --- a/doc/tutorial/text_analytics/solutions/exercise_01_language_train_model.py +++ /dev/null @@ -1,70 +0,0 @@ -"""Build a language detector model - -The goal of this exercise is to train a linear classifier on text features -that represent sequences of up to 3 consecutive characters so as to be -recognize natural languages by using the frequencies of short character -sequences as 'fingerprints'. 
- -""" -# Author: Olivier Grisel -# License: Simplified BSD - -import sys - -from sklearn.feature_extraction.text import TfidfVectorizer -from sklearn.linear_model import Perceptron -from sklearn.pipeline import Pipeline -from sklearn.datasets import load_files -from sklearn.model_selection import train_test_split -from sklearn import metrics - - -# The training data folder must be passed as first argument -languages_data_folder = sys.argv[1] -dataset = load_files(languages_data_folder) - -# Split the dataset in training and test set: -docs_train, docs_test, y_train, y_test = train_test_split( - dataset.data, dataset.target, test_size=0.5) - - -# TASK: Build a vectorizer that splits strings into sequence of 1 to 3 -# characters instead of word tokens -vectorizer = TfidfVectorizer(ngram_range=(1, 3), analyzer='char', - use_idf=False) - -# TASK: Build a vectorizer / classifier pipeline using the previous analyzer -# the pipeline instance should stored in a variable named clf -clf = Pipeline([ - ('vec', vectorizer), - ('clf', Perceptron()), -]) - -# TASK: Fit the pipeline on the training set -clf.fit(docs_train, y_train) - -# TASK: Predict the outcome on the testing set in a variable named y_predicted -y_predicted = clf.predict(docs_test) - -# Print the classification report -print(metrics.classification_report(y_test, y_predicted, - target_names=dataset.target_names)) - -# Plot the confusion matrix -cm = metrics.confusion_matrix(y_test, y_predicted) -print(cm) - -#import matlotlib.pyplot as plt -#plt.matshow(cm, cmap=plt.cm.jet) -#plt.show() - -# Predict the result on some short new sentences: -sentences = [ - 'This is a language detection test.', - 'Ceci est un test de d\xe9tection de la langue.', - 'Dies ist ein Test, um die Sprache zu erkennen.', -] -predicted = clf.predict(sentences) - -for s, p in zip(sentences, predicted): - print('The language of "%s" is "%s"' % (s, dataset.target_names[p])) diff --git a/doc/tutorial/text_analytics/solutions/exercise_02_sentiment.py b/doc/tutorial/text_analytics/solutions/exercise_02_sentiment.py deleted file mode 100644 index 434bece341975..0000000000000 --- a/doc/tutorial/text_analytics/solutions/exercise_02_sentiment.py +++ /dev/null @@ -1,79 +0,0 @@ -"""Build a sentiment analysis / polarity model - -Sentiment analysis can be casted as a binary text classification problem, -that is fitting a linear classifier on features extracted from the text -of the user messages so as to guess whether the opinion of the author is -positive or negative. - -In this examples we will use a movie review dataset. 
- -""" -# Author: Olivier Grisel -# License: Simplified BSD - -import sys -from sklearn.feature_extraction.text import TfidfVectorizer -from sklearn.svm import LinearSVC -from sklearn.pipeline import Pipeline -from sklearn.model_selection import GridSearchCV -from sklearn.datasets import load_files -from sklearn.model_selection import train_test_split -from sklearn import metrics - - -if __name__ == "__main__": - # NOTE: we put the following in a 'if __name__ == "__main__"' protected - # block to be able to use a multi-core grid search that also works under - # Windows, see: http://docs.python.org/library/multiprocessing.html#windows - # The multiprocessing module is used as the backend of joblib.Parallel - # that is used when n_jobs != 1 in GridSearchCV - - # the training data folder must be passed as first argument - movie_reviews_data_folder = sys.argv[1] - dataset = load_files(movie_reviews_data_folder, shuffle=False) - print("n_samples: %d" % len(dataset.data)) - - # split the dataset in training and test set: - docs_train, docs_test, y_train, y_test = train_test_split( - dataset.data, dataset.target, test_size=0.25, random_state=None) - - # TASK: Build a vectorizer / classifier pipeline that filters out tokens - # that are too rare or too frequent - pipeline = Pipeline([ - ('vect', TfidfVectorizer(min_df=3, max_df=0.95)), - ('clf', LinearSVC(C=1000)), - ]) - - # TASK: Build a grid search to find out whether unigrams or bigrams are - # more useful. - # Fit the pipeline on the training set using grid search for the parameters - parameters = { - 'vect__ngram_range': [(1, 1), (1, 2)], - } - grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1) - grid_search.fit(docs_train, y_train) - - # TASK: print the mean and std for each candidate along with the parameter - # settings for all the candidates explored by grid search. - n_candidates = len(grid_search.cv_results_['params']) - for i in range(n_candidates): - print(i, 'params - %s; mean - %0.2f; std - %0.2f' - % (grid_search.cv_results_['params'][i], - grid_search.cv_results_['mean_test_score'][i], - grid_search.cv_results_['std_test_score'][i])) - - # TASK: Predict the outcome on the testing set and store it in a variable - # named y_predicted - y_predicted = grid_search.predict(docs_test) - - # Print the classification report - print(metrics.classification_report(y_test, y_predicted, - target_names=dataset.target_names)) - - # Print and plot the confusion matrix - cm = metrics.confusion_matrix(y_test, y_predicted) - print(cm) - - # import matplotlib.pyplot as plt - # plt.matshow(cm) - # plt.show() diff --git a/doc/tutorial/text_analytics/solutions/generate_skeletons.py b/doc/tutorial/text_analytics/solutions/generate_skeletons.py deleted file mode 100644 index 4729b976530c7..0000000000000 --- a/doc/tutorial/text_analytics/solutions/generate_skeletons.py +++ /dev/null @@ -1,38 +0,0 @@ -"""Generate skeletons from the example code""" -import os - -exercise_dir = os.path.dirname(__file__) -if exercise_dir == '': - exercise_dir = '.' 
- -skeleton_dir = os.path.abspath(os.path.join(exercise_dir, '..', 'skeletons')) -if not os.path.exists(skeleton_dir): - os.makedirs(skeleton_dir) - -solutions = os.listdir(exercise_dir) - -for f in solutions: - if not f.endswith('.py'): - continue - - if f == os.path.basename(__file__): - continue - - print("Generating skeleton for %s" % f) - - input_file = open(os.path.join(exercise_dir, f)) - output_file = open(os.path.join(skeleton_dir, f), 'w') - - in_exercise_region = False - - for line in input_file: - linestrip = line.strip() - if len(linestrip) == 0: - in_exercise_region = False - elif linestrip.startswith('# TASK:'): - in_exercise_region = True - - if not in_exercise_region or linestrip.startswith('#'): - output_file.write(line) - - output_file.close() diff --git a/doc/tutorial/text_analytics/working_with_text_data.rst b/doc/tutorial/text_analytics/working_with_text_data.rst deleted file mode 100644 index 43fd305c3b8b6..0000000000000 --- a/doc/tutorial/text_analytics/working_with_text_data.rst +++ /dev/null @@ -1,586 +0,0 @@ -.. _text_data_tutorial: - -====================== -Working With Text Data -====================== - -The goal of this guide is to explore some of the main ``scikit-learn`` -tools on a single practical task: analyzing a collection of text -documents (newsgroups posts) on twenty different topics. - -In this section we will see how to: - -- load the file contents and the categories - -- extract feature vectors suitable for machine learning - -- train a linear model to perform categorization - -- use a grid search strategy to find a good configuration of both - the feature extraction components and the classifier - - -Tutorial setup --------------- - -To get started with this tutorial, you must first install -*scikit-learn* and all of its required dependencies. - -Please refer to the :ref:`installation instructions ` -page for more information and for system-specific instructions. - -The source of this tutorial can be found within your scikit-learn folder:: - - scikit-learn/doc/tutorial/text_analytics/ - -The source can also be found `on Github -`_. - -The tutorial folder should contain the following sub-folders: - -* ``*.rst files`` - the source of the tutorial document written with sphinx - -* ``data`` - folder to put the datasets used during the tutorial - -* ``skeletons`` - sample incomplete scripts for the exercises - -* ``solutions`` - solutions of the exercises - - -You can already copy the skeletons into a new folder somewhere -on your hard-drive named ``sklearn_tut_workspace``, where you -will edit your own files for the exercises while keeping -the original skeletons intact: - -.. prompt:: bash $ - - cp -r skeletons work_directory/sklearn_tut_workspace - - -Machine learning algorithms need data. Go to each ``$TUTORIAL_HOME/data`` -sub-folder and run the ``fetch_data.py`` script from there (after -having read them first). - -For instance: - -.. prompt:: bash $ - - cd $TUTORIAL_HOME/data/languages - less fetch_data.py - python fetch_data.py - - -Loading the 20 newsgroups dataset ---------------------------------- - -The dataset is called "Twenty Newsgroups". Here is the official -description, quoted from the `website -`_: - - The 20 Newsgroups data set is a collection of approximately 20,000 - newsgroup documents, partitioned (nearly) evenly across 20 different - newsgroups. 
To the best of our knowledge, it was originally collected - by Ken Lang, probably for his paper "Newsweeder: Learning to filter - netnews," though he does not explicitly mention this collection. - The 20 newsgroups collection has become a popular data set for - experiments in text applications of machine learning techniques, - such as text classification and text clustering. - -In the following we will use the built-in dataset loader for 20 newsgroups -from scikit-learn. Alternatively, it is possible to download the dataset -manually from the website and use the :func:`sklearn.datasets.load_files` -function by pointing it to the ``20news-bydate-train`` sub-folder of the -uncompressed archive folder. - -In order to get faster execution times for this first example, we will -work on a partial dataset with only 4 categories out of the 20 available -in the dataset:: - - >>> categories = ['alt.atheism', 'soc.religion.christian', - ... 'comp.graphics', 'sci.med'] - -We can now load the list of files matching those categories as follows:: - - >>> from sklearn.datasets import fetch_20newsgroups - >>> twenty_train = fetch_20newsgroups(subset='train', - ... categories=categories, shuffle=True, random_state=42) - -The returned dataset is a ``scikit-learn`` "bunch": a simple holder -object with fields that can be both accessed as python ``dict`` -keys or ``object`` attributes for convenience, for instance the -``target_names`` holds the list of the requested category names:: - - >>> twenty_train.target_names - ['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian'] - -The files themselves are loaded in memory in the ``data`` attribute. For -reference the filenames are also available:: - - >>> len(twenty_train.data) - 2257 - >>> len(twenty_train.filenames) - 2257 - -Let's print the first lines of the first loaded file:: - - >>> print("\n".join(twenty_train.data[0].split("\n")[:3])) - From: sd345@city.ac.uk (Michael Collier) - Subject: Converting images to HP LaserJet III? - Nntp-Posting-Host: hampton - - >>> print(twenty_train.target_names[twenty_train.target[0]]) - comp.graphics - -Supervised learning algorithms will require a category label for each -document in the training set. In this case the category is the name of the -newsgroup which also happens to be the name of the folder holding the -individual documents. - -For speed and space efficiency reasons, ``scikit-learn`` loads the -target attribute as an array of integers that corresponds to the -index of the category name in the ``target_names`` list. The category -integer id of each sample is stored in the ``target`` attribute:: - - >>> twenty_train.target[:10] - array([1, 1, 3, 3, 3, 3, 3, 2, 2, 2]) - -It is possible to get back the category names as follows:: - - >>> for t in twenty_train.target[:10]: - ... print(twenty_train.target_names[t]) - ... - comp.graphics - comp.graphics - soc.religion.christian - soc.religion.christian - soc.religion.christian - soc.religion.christian - soc.religion.christian - sci.med - sci.med - sci.med - -You might have noticed that the samples were shuffled randomly when we called -``fetch_20newsgroups(..., shuffle=True, random_state=42)``: this is useful if -you wish to select only a subset of samples to quickly train a model and get a -first idea of the results before re-training on the complete dataset later. 
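
For instance, a quick dry run might slice the shuffled bunch before fitting
anything; the subset size of 500 below is an arbitrary illustration, not a
recommended value::

    >>> sample_docs = twenty_train.data[:500]       # first 500 shuffled documents
    >>> sample_targets = twenty_train.target[:500]  # the matching category ids
    >>> len(sample_docs) == len(sample_targets)
    True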
- - -Extracting features from text files ------------------------------------ - -In order to perform machine learning on text documents, we first need to -turn the text content into numerical feature vectors. - -.. currentmodule:: sklearn.feature_extraction.text - - -Bags of words -~~~~~~~~~~~~~ - -The most intuitive way to do so is to use a bags of words representation: - -1. Assign a fixed integer id to each word occurring in any document - of the training set (for instance by building a dictionary - from words to integer indices). - -2. For each document ``#i``, count the number of occurrences of each - word ``w`` and store it in ``X[i, j]`` as the value of feature - ``#j`` where ``j`` is the index of word ``w`` in the dictionary. - -The bags of words representation implies that ``n_features`` is -the number of distinct words in the corpus: this number is typically -larger than 100,000. - -If ``n_samples == 10000``, storing ``X`` as a NumPy array of type -float32 would require 10000 x 100000 x 4 bytes = **4GB in RAM** which -is barely manageable on today's computers. - -Fortunately, **most values in X will be zeros** since for a given -document less than a few thousand distinct words will be -used. For this reason we say that bags of words are typically -**high-dimensional sparse datasets**. We can save a lot of memory by -only storing the non-zero parts of the feature vectors in memory. - -``scipy.sparse`` matrices are data structures that do exactly this, -and ``scikit-learn`` has built-in support for these structures. - - -Tokenizing text with ``scikit-learn`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Text preprocessing, tokenizing and filtering of stopwords are all included -in :class:`CountVectorizer`, which builds a dictionary of features and -transforms documents to feature vectors:: - - >>> from sklearn.feature_extraction.text import CountVectorizer - >>> count_vect = CountVectorizer() - >>> X_train_counts = count_vect.fit_transform(twenty_train.data) - >>> X_train_counts.shape - (2257, 35788) - -:class:`CountVectorizer` supports counts of N-grams of words or consecutive -characters. Once fitted, the vectorizer has built a dictionary of feature -indices:: - - >>> count_vect.vocabulary_.get(u'algorithm') - 4690 - -The index value of a word in the vocabulary is linked to its frequency -in the whole training corpus. - -.. note: - - The method ``count_vect.fit_transform`` performs two actions: - it learns the vocabulary and transforms the documents into count vectors. - It's possible to separate these steps by calling - ``count_vect.fit(twenty_train.data)`` followed by - ``X_train_counts = count_vect.transform(twenty_train.data)``, - but doing so would tokenize and vectorize each text file twice. - - -From occurrences to frequencies -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Occurrence count is a good start but there is an issue: longer -documents will have higher average count values than shorter documents, -even though they might talk about the same topics. - -To avoid these potential discrepancies it suffices to divide the -number of occurrences of each word in a document by the total number -of words in the document: these new features are called ``tf`` for Term -Frequencies. - -Another refinement on top of tf is to downscale weights for words -that occur in many documents in the corpus and are therefore less -informative than those that occur only in a smaller portion of the -corpus. - -This downscaling is called `tf–idf`_ for "Term Frequency times -Inverse Document Frequency". - -.. 
_`tf–idf`: https://en.wikipedia.org/wiki/Tf-idf - - -Both **tf** and **tf–idf** can be computed as follows using -:class:`TfidfTransformer`:: - - >>> from sklearn.feature_extraction.text import TfidfTransformer - >>> tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts) - >>> X_train_tf = tf_transformer.transform(X_train_counts) - >>> X_train_tf.shape - (2257, 35788) - -In the above example-code, we firstly use the ``fit(..)`` method to fit our -estimator to the data and secondly the ``transform(..)`` method to transform -our count-matrix to a tf-idf representation. -These two steps can be combined to achieve the same end result faster -by skipping redundant processing. This is done through using the -``fit_transform(..)`` method as shown below, and as mentioned in the note -in the previous section:: - - >>> tfidf_transformer = TfidfTransformer() - >>> X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts) - >>> X_train_tfidf.shape - (2257, 35788) - - -Training a classifier ---------------------- - -Now that we have our features, we can train a classifier to try to predict -the category of a post. Let's start with a :ref:`naïve Bayes ` -classifier, which -provides a nice baseline for this task. ``scikit-learn`` includes several -variants of this classifier, and the one most suitable for word counts is the -multinomial variant:: - - >>> from sklearn.naive_bayes import MultinomialNB - >>> clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target) - -To try to predict the outcome on a new document we need to extract -the features using almost the same feature extracting chain as before. -The difference is that we call ``transform`` instead of ``fit_transform`` -on the transformers, since they have already been fit to the training set:: - - >>> docs_new = ['God is love', 'OpenGL on the GPU is fast'] - >>> X_new_counts = count_vect.transform(docs_new) - >>> X_new_tfidf = tfidf_transformer.transform(X_new_counts) - - >>> predicted = clf.predict(X_new_tfidf) - - >>> for doc, category in zip(docs_new, predicted): - ... print('%r => %s' % (doc, twenty_train.target_names[category])) - ... - 'God is love' => soc.religion.christian - 'OpenGL on the GPU is fast' => comp.graphics - - -Building a pipeline -------------------- - -In order to make the vectorizer => transformer => classifier easier -to work with, ``scikit-learn`` provides a :class:`~sklearn.pipeline.Pipeline` class that behaves -like a compound classifier:: - - >>> from sklearn.pipeline import Pipeline - >>> text_clf = Pipeline([ - ... ('vect', CountVectorizer()), - ... ('tfidf', TfidfTransformer()), - ... ('clf', MultinomialNB()), - ... ]) - - -The names ``vect``, ``tfidf`` and ``clf`` (classifier) are arbitrary. -We will use them to perform grid search for suitable hyperparameters below. -We can now train the model with a single command:: - - >>> text_clf.fit(twenty_train.data, twenty_train.target) - Pipeline(...) - - -Evaluation of the performance on the test set ---------------------------------------------- - -Evaluating the predictive accuracy of the model is equally easy:: - - >>> import numpy as np - >>> twenty_test = fetch_20newsgroups(subset='test', - ... categories=categories, shuffle=True, random_state=42) - >>> docs_test = twenty_test.data - >>> predicted = text_clf.predict(docs_test) - >>> np.mean(predicted == twenty_test.target) - 0.8348... - -We achieved 83.5% accuracy. 
Let's see if we can do better with a -linear :ref:`support vector machine (SVM) `, -which is widely regarded as one of -the best text classification algorithms (although it's also a bit slower -than naïve Bayes). We can change the learner by simply plugging a different -classifier object into our pipeline:: - - >>> from sklearn.linear_model import SGDClassifier - >>> text_clf = Pipeline([ - ... ('vect', CountVectorizer()), - ... ('tfidf', TfidfTransformer()), - ... ('clf', SGDClassifier(loss='hinge', penalty='l2', - ... alpha=1e-3, random_state=42, - ... max_iter=5, tol=None)), - ... ]) - - >>> text_clf.fit(twenty_train.data, twenty_train.target) - Pipeline(...) - >>> predicted = text_clf.predict(docs_test) - >>> np.mean(predicted == twenty_test.target) - 0.9101... - -We achieved 91.3% accuracy using the SVM. ``scikit-learn`` provides further -utilities for more detailed performance analysis of the results:: - - >>> from sklearn import metrics - >>> print(metrics.classification_report(twenty_test.target, predicted, - ... target_names=twenty_test.target_names)) - precision recall f1-score support - - alt.atheism 0.95 0.80 0.87 319 - comp.graphics 0.87 0.98 0.92 389 - sci.med 0.94 0.89 0.91 396 - soc.religion.christian 0.90 0.95 0.93 398 - - accuracy 0.91 1502 - macro avg 0.91 0.91 0.91 1502 - weighted avg 0.91 0.91 0.91 1502 - - - >>> metrics.confusion_matrix(twenty_test.target, predicted) - array([[256, 11, 16, 36], - [ 4, 380, 3, 2], - [ 5, 35, 353, 3], - [ 5, 11, 4, 378]]) - -As expected the confusion matrix shows that posts from the newsgroups -on atheism and Christianity are more often confused for one another than -with computer graphics. - -.. note: - - SGD stands for Stochastic Gradient Descent. This is a simple - optimization algorithms that is known to be scalable when the dataset - has many samples. - - By setting ``loss="hinge"`` and ``penalty="l2"`` we are configuring - the classifier model to tune its parameters for the linear Support - Vector Machine cost function. - - Alternatively we could have used ``sklearn.svm.LinearSVC`` (Linear - Support Vector Machine Classifier) that provides an alternative - optimizer for the same cost function based on the liblinear_ C++ - library. - -.. _liblinear: https://www.csie.ntu.edu.tw/~cjlin/liblinear/ - - -Parameter tuning using grid search ----------------------------------- - -We've already encountered some parameters such as ``use_idf`` in the -``TfidfTransformer``. Classifiers tend to have many parameters as well; -e.g., ``MultinomialNB`` includes a smoothing parameter ``alpha`` and -``SGDClassifier`` has a penalty parameter ``alpha`` and configurable loss -and penalty terms in the objective function (see the module documentation, -or use the Python ``help`` function to get a description of these). - -Instead of tweaking the parameters of the various components of the -chain, it is possible to run an exhaustive search of the best -parameters on a grid of possible values. We try out all classifiers -on either words or bigrams, with or without idf, and with a penalty -parameter of either 0.01 or 0.001 for the linear SVM:: - - >>> from sklearn.model_selection import GridSearchCV - >>> parameters = { - ... 'vect__ngram_range': [(1, 1), (1, 2)], - ... 'tfidf__use_idf': (True, False), - ... 'clf__alpha': (1e-2, 1e-3), - ... } - - -Obviously, such an exhaustive search can be expensive. 
If we have multiple -CPU cores at our disposal, we can tell the grid searcher to try these eight -parameter combinations in parallel with the ``n_jobs`` parameter. If we give -this parameter a value of ``-1``, grid search will detect how many cores -are installed and use them all:: - - >>> gs_clf = GridSearchCV(text_clf, parameters, cv=5, n_jobs=-1) - -The grid search instance behaves like a normal ``scikit-learn`` -model. Let's perform the search on a smaller subset of the training data -to speed up the computation:: - - >>> gs_clf = gs_clf.fit(twenty_train.data[:400], twenty_train.target[:400]) - -The result of calling ``fit`` on a ``GridSearchCV`` object is a classifier -that we can use to ``predict``:: - - >>> twenty_train.target_names[gs_clf.predict(['God is love'])[0]] - 'soc.religion.christian' - -The object's ``best_score_`` and ``best_params_`` attributes store the best -mean score and the parameters setting corresponding to that score:: - - >>> gs_clf.best_score_ - 0.9... - >>> for param_name in sorted(parameters.keys()): - ... print("%s: %r" % (param_name, gs_clf.best_params_[param_name])) - ... - clf__alpha: 0.001 - tfidf__use_idf: True - vect__ngram_range: (1, 1) - -A more detailed summary of the search is available at ``gs_clf.cv_results_``. - -The ``cv_results_`` parameter can be easily imported into pandas as a -``DataFrame`` for further inspection. - -.. note: - - A ``GridSearchCV`` object also stores the best classifier that it trained - as its ``best_estimator_`` attribute. In this case, that isn't much use as - we trained on a small, 400-document subset of our full training set. - - -Exercises -~~~~~~~~~ - -To do the exercises, copy the content of the 'skeletons' folder as -a new folder named 'workspace': - -.. prompt:: bash $ - - cp -r skeletons workspace - - -You can then edit the content of the workspace without fear of losing -the original exercise instructions. - -Then fire an ipython shell and run the work-in-progress script with:: - - [1] %run workspace/exercise_XX_script.py arg1 arg2 arg3 - -If an exception is triggered, use ``%debug`` to fire-up a post -mortem ipdb session. - -Refine the implementation and iterate until the exercise is solved. - -**For each exercise, the skeleton file provides all the necessary import -statements, boilerplate code to load the data and sample code to evaluate -the predictive accuracy of the model.** - - -Exercise 1: Language identification ------------------------------------ - -- Write a text classification pipeline using a custom preprocessor and - ``TfidfVectorizer`` set up to use character based n-grams, using data from Wikipedia articles as the training set. - -- Evaluate the performance on some held out test set. - -ipython command line:: - - %run workspace/exercise_01_language_train_model.py data/languages/paragraphs/ - - -Exercise 2: Sentiment Analysis on movie reviews ------------------------------------------------ - -- Write a text classification pipeline to classify movie reviews as either - positive or negative. - -- Find a good set of parameters using grid search. - -- Evaluate the performance on a held out test set. 
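
The skeleton for this exercise asks for a vectorizer that filters out tokens that
are too rare or too frequent; this is usually expressed with the ``min_df`` and
``max_df`` arguments of ``TfidfVectorizer``. A minimal sketch, with illustrative
(not tuned) thresholds::

    >>> from sklearn.feature_extraction.text import TfidfVectorizer
    >>> vect = TfidfVectorizer(
    ...     min_df=3,     # ignore terms that appear in fewer than 3 documents
    ...     max_df=0.95,  # ignore terms that appear in more than 95% of the documents
    ... )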
-
-ipython command line::
-
-  %run workspace/exercise_02_sentiment.py data/movie_reviews/txt_sentoken/
-
-
-Exercise 3: CLI text classification utility
--------------------------------------------
-
-Using the results of the previous exercises and the ``cPickle``
-module of the standard library, write a command line utility that
-detects the language of some text provided on ``stdin`` and estimate
-the polarity (positive or negative) if the text is written in
-English.
-
-Bonus point if the utility is able to give a confidence level for its
-predictions.
-
-
-Where to from here
-------------------
-
-Here are a few suggestions to help further your scikit-learn intuition
-upon the completion of this tutorial:
-
-
-* Try playing around with the ``analyzer`` and ``token normalisation`` under
-  :class:`CountVectorizer`.
-
-* If you don't have labels, try using
-  :ref:`Clustering `
-  on your problem.
-
-* If you have multiple labels per document, e.g. categories, have a look
-  at the :ref:`Multiclass and multilabel section `.
-
-* Try using :ref:`Truncated SVD ` for
-  `latent semantic analysis `_.
-
-* Have a look at using
-  :ref:`Out-of-core Classification
-  ` to
-  learn from data that would not fit into the computer main memory.
-
-* Have a look at the :ref:`Hashing Vectorizer `
-  as a memory efficient alternative to :class:`CountVectorizer`.

From da38895c5264b6174d77b2f1a45770765491ade9 Mon Sep 17 00:00:00 2001
From: adrinjalali
Date: Sat, 25 May 2024 00:18:28 +0200
Subject: [PATCH 02/11] remove exclude from pyproject.toml

---
 pyproject.toml | 2 --
 1 file changed, 2 deletions(-)

diff --git a/pyproject.toml b/pyproject.toml
index 9f1fd9ec3b1bb..80636a4dcaa50 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -115,7 +115,6 @@ exclude = '''
   | \.vscode
   | build
   | dist
-  | doc/tutorial
   | doc/_build
   | doc/auto_examples
   | sklearn/externals
@@ -134,7 +133,6 @@ exclude=[
     "sklearn/externals",
     "doc/_build",
     "doc/auto_examples",
-    "doc/tutorial",
     "build",
     "asv_benchmarks/env",
     "asv_benchmarks/html",

From 85eac1d767623849fc8b715d7cb158568f242787 Mon Sep 17 00:00:00 2001
From: adrinjalali
Date: Sat, 25 May 2024 13:13:52 +0200
Subject: [PATCH 03/11] DOC add back ML Map and reference from getting started

---
 doc/developers/contributing.rst |  3 --
 doc/getting_started.rst         |  2 +
 doc/machine_learning_map.rst    | 75 +++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 3 deletions(-)
 create mode 100644 doc/machine_learning_map.rst

diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst
index 402711dcd1bf3..2900ed02803d7 100644
--- a/doc/developers/contributing.rst
+++ b/doc/developers/contributing.rst
@@ -659,9 +659,6 @@ We are glad to accept any sort of documentation:
   `doc/ `_
   directory and `doc/modules/
   `_.
-* **tutorials** - these introduce various statistical learning and machine learning
-  concepts and are located in
-  `doc/tutorial `_.
 * **examples** - these provide full code examples that may demonstrate the use
   of scikit-learn modules, compare different algorithms or discuss their
   interpretation etc. Examples live in

diff --git a/doc/getting_started.rst b/doc/getting_started.rst
index cd4d953db1b8a..295671a2a2e0e 100644
--- a/doc/getting_started.rst
+++ b/doc/getting_started.rst
@@ -53,6 +53,8 @@ new data. You don't need to re-train the estimator::
    >>> clf.predict([[4, 5, 6], [14, 15, 16]])  # predict classes of new data
    array([0, 1])
 
+You can check :ref:`ml_map` on how to choose the right model for your use case.
+
 Transformers and pre-processors
 -------------------------------
 
diff --git a/doc/machine_learning_map.rst b/doc/machine_learning_map.rst
new file mode 100644
index 0000000000000..0c1e811716648
--- /dev/null
+++ b/doc/machine_learning_map.rst
@@ -0,0 +1,75 @@
+:html_theme.sidebar_secondary.remove:
+
+.. _ml_map:
+
+Choosing the right estimator
+============================
+
+Often the hardest part of solving a machine learning problem can be finding the right
+estimator for the job. Different estimators are better suited for different types of
+data and different problems.
+
+The flowchart below is designed to give you a rough guide on how to approach
+problems with regard to which estimators to try on your data. Click on any estimator in
+the chart below to see its documentation. Use the scroll wheel to zoom in and out, and click
+and drag to pan around. You can also download the chart:
+:download:`ml_map.svg `.
+
+.. raw:: html
+
+  
+
+  
+
+  
+ +.. raw:: html + :file: ../../images/ml_map.svg + +.. raw:: html + +
From 7c41009a324bece48b57391c12d87e6701d4a7f7 Mon Sep 17 00:00:00 2001 From: adrinjalali Date: Sat, 25 May 2024 13:15:16 +0200 Subject: [PATCH 04/11] fix paths --- doc/machine_learning_map.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/machine_learning_map.rst b/doc/machine_learning_map.rst index 0c1e811716648..fb2bae2a53716 100644 --- a/doc/machine_learning_map.rst +++ b/doc/machine_learning_map.rst @@ -35,7 +35,7 @@ and drag to pan around. You can also download the chart: } - + +