[WIP] DOC Explain missing value mechanisms by aperezlebel · Pull Request #23746 · scikit-learn/scikit-learn

Draft: wants to merge 1 commit into main
Binary file added doc/images/missing_value_mechanisms.png
29 changes: 27 additions & 2 deletions doc/modules/impute.rst
@@ -17,6 +17,31 @@ values, i.e., to infer them from the known part of the data. See the
:ref:`glossary` entry on imputation.


Missing value mechanisms
========================
Three mechanisms model data missingness.
Suggested change (reviewer, Member):
  - Three mechanisms model data missingness.
  + Three mechanisms model data missingness exist:

Suggested change (reviewer, Member):
  - Three mechanisms model data missingness.
  + The machine learning literature typically distinguishes between the following
  + settings. Note that the names are not necessarily very intuitive:


* **Missing Completely At Random (MCAR)**: the missingness does not depend on data.
Reviewer comment (Member): What about giving a concrete example for each mechanism to illustrate it?

* **Missing At Random (MAR)**: the missingness does not depend on underlying
missing values but can depend on observed ones.
Reviewer comment (Member): Including the target variable y?

* **Missing Not At Random (MNAR)**: the missingness depends on underlying missing
values.
Suggested change (reviewer, Member):
  - values.
  + values. Therefore, the missingness pattern can be statistically associated
  + with `y` in a supervised classification or regression setting.


.. figure:: ../images/missing_value_mechanisms.png
:align: center
:scale: 20%

In the above example, X1 is always observed. In the first plot, X2 is masked
Suggested change (reviewer, Member):
  - In the above example, X1 is always observed. In the first plot, X2 is masked
  + In the above example, X1 is always observed. In the left-hand side plot, X2 is masked

independently of the values of (X1, X2), hence MCAR. In the second, X2 is
Suggested change (reviewer, Member):
  - independently of the values of (X1, X2), hence MCAR. In the second, X2 is
  + independently of the values of (X1, X2), hence MCAR. In the middle, X2 is

masked when X1 (observed) reaches some threshold, hence MAR. In the last, X2 is
Suggested change (reviewer, Member):
  - masked when X1 (observed) reaches some threshold, hence MAR. In the last, X2 is
  + masked when X1 (observed) reaches some threshold, hence MAR. In the right-hand side plot, X2 is

masked when X2 reaches some threshold, hence MNAR.
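
For a concrete illustration, here is a minimal NumPy sketch of how the three
mechanisms could be simulated on a two-feature dataset. It is illustrative
only and not part of the proposed documentation; the sample size, masking
thresholds and variable names are arbitrary choices::

    import numpy as np

    rng = np.random.RandomState(0)
    X1 = rng.normal(size=1000)  # always observed
    X2 = rng.normal(size=1000)  # will be partially masked

    # MCAR: the mask is drawn independently of both X1 and X2.
    mask_mcar = rng.rand(1000) < 0.3

    # MAR: the mask depends only on the observed feature X1.
    mask_mar = X1 > np.quantile(X1, 0.7)

    # MNAR: the mask depends on the unobserved values of X2 itself.
    mask_mnar = X2 > np.quantile(X2, 0.7)

    # Apply one of the masks, e.g. MNAR:
    X2_missing = np.where(mask_mnar, np.nan, X2)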

Conditional imputation (e.g. :class:`~sklearn.impute.IterativeImputer` or
:class:`~sklearn.impute.KNNImputer`) is only guaranteed to work for ignorable
missingness (i.e. the MCAR or MAR settings). When the missingness is not
ignorable, i.e. in the MNAR setting, adding the missingness mask
(`add_indicator=True`) is needed because the missingness itself is
informative. In practice, real-world data are often MNAR.
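
As a sketch of how the missingness mask mentioned above can be added
(illustrative only; the toy data below is hypothetical and not part of the
proposed diff)::

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa
    from sklearn.impute import IterativeImputer

    X = np.array([[1.0, 2.0],
                  [3.0, np.nan],
                  [5.0, 6.0],
                  [np.nan, 8.0]])

    # add_indicator=True appends one binary column per feature that had
    # missing values during fit, so a downstream estimator can use the
    # missingness pattern itself as a feature (useful under MNAR).
    imputer = IterativeImputer(add_indicator=True, random_state=0)
    X_imputed = imputer.fit_transform(X)
    # X_imputed has 4 columns: 2 imputed features + 2 missingness indicators.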

Univariate vs. Multivariate Imputation
======================================

@@ -317,8 +342,8 @@ wrap this in a :class:`Pipeline` with a classifier (e.g., a
Estimators that handle NaN values
=================================

Some estimators are designed to handle NaN values without preprocessing.
Below is the list of these estimators, classified by type
(cluster, regressor, classifier, transform):

.. allow_nan_estimators::