[WIP] DOC Explain missing value mechanisms #23746

aperezlebel · 2022-06-23T21:21:58Z

Reference Issues/PRs

Addresses task 3 of #21967.

What does this implement/fix? Explain your changes.

Add a section to the "Imputation of missing values" doc to explain the missing value mechanisms.

Any other comments?

Work in progress

glemaitre

It is already looking good.

The synthetic illustration is good. In addition, I am wondering whether we could illustrate the mechanisms with a specific application setting related to data collection, so that we cover both the formal and the applied aspects.

glemaitre · 2022-06-28T09:52:22Z

doc/modules/impute.rst

@@ -17,6 +17,31 @@ values, i.e., to infer them from the known part of the data. See the
 :ref:`glossary` entry on imputation.


+Missing value mechanisms
+========================
+Three mechanisms model data missingness.


Suggested change

Three mechanisms model data missingness.

Three mechanisms modeling data missingness exist:

glemaitre · 2022-06-28T09:53:15Z

doc/modules/impute.rst

+========================
+Three mechanisms model data missingness.
+
+* **Missing Completely At Random (MCAR)**: the missingness does not depend on data.


What about giving a concrete example for each mechanism to illustrate it?

glemaitre · 2022-06-28T09:54:26Z

doc/modules/impute.rst

+   :align: center
+   :scale: 20%
+
+In the above example, X1 is always observed. In the first plot, X2 is masked


Suggested change

In the above example, X1 is always observed. In the first plot, X2 is masked

In the above example, X1 is always observed. In the left-hand side plot, X2 is masked

glemaitre · 2022-06-28T09:54:34Z

doc/modules/impute.rst

+   :scale: 20%
+
+In the above example, X1 is always observed. In the first plot, X2 is masked
+independently of the values of (X1, X2), hence MCAR. In the second, X2 is


Suggested change

independently of the values of (X1, X2), hence MCAR. In the second, X2 is

independently of the values of (X1, X2), hence MCAR. In the middle, X2 is

glemaitre · 2022-06-28T09:54:52Z

doc/modules/impute.rst

+
+In the above example, X1 is always observed. In the first plot, X2 is masked
+independently of the values of (X1, X2), hence MCAR. In the second, X2 is
+masked when X1 (observed) reaches some threshold, hence MAR. In the last, X2 is


Suggested change

masked when X1 (observed) reaches some threshold, hence MAR. In the last, X2 is

masked when X1 (observed) reaches some threshold, hence MAR. In the right-hand side plot, X2 is
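
The three masking rules described in this hunk could be sketched with NumPy roughly as follows. The sample size, covariance, masking probability, and thresholds are illustrative assumptions. The MNAR rule (masking X2 based on its own value) is also an assumption, taken from the MNAR definition in the PR rather than from the figure description, which is cut off in this hunk. This is not the script used to generate the figure in the PR.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1_000

# Two correlated features; X1 is always observed, only X2 gets masked.
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]], size=n_samples)
X1, X2 = X[:, 0], X[:, 1]

# MCAR: each X2 value is masked with a fixed probability (assumed 0.3),
# independently of the values of (X1, X2).
mask_mcar = rng.random(n_samples) < 0.3

# MAR: X2 is masked when the observed feature X1 exceeds a threshold (assumed 0.5).
mask_mar = X1 > 0.5

# MNAR (assumed rule): X2 is masked when X2 itself exceeds a threshold,
# so the missingness depends on the underlying missing value.
mask_mnar = X2 > 0.5


def apply_mask(values, mask):
    """Return a copy of `values` with the masked entries replaced by NaN."""
    out = values.copy()
    out[mask] = np.nan
    return out


X2_mcar = apply_mask(X2, mask_mcar)
X2_mar = apply_mask(X2, mask_mar)
X2_mnar = apply_mask(X2, mask_mnar)
```

In the MAR case the mask can be recomputed from the observed column X1 alone, whereas in the MNAR case it cannot be recovered without the values that were removed.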

glemaitre · 2023-01-10T17:26:06Z

@aperezlebel do you want to address the comments and resolve the conflicts so that we can merge this PR?

ogrisel · 2023-07-07T09:05:45Z

doc/modules/impute.rst

+
+* **Missing Completely At Random (MCAR)**: the missingness does not depend on data.
+* **Missing At Random (MAR)**: the missingness does not depend on underlying
+  missing values but can depend on observed ones.


Including the target variable y?

ogrisel · 2023-07-07T15:54:03Z

doc/modules/impute.rst

@@ -17,6 +17,31 @@ values, i.e., to infer them from the known part of the data. See the
 :ref:`glossary` entry on imputation.


+Missing value mechanisms
+========================
+Three mechanisms model data missingness.


Suggested change

Three mechanisms model data missingness.

The machine learning literature typically distinguishes between the following

settings. Note that the names are not necessarily very intuitive:

ogrisel · 2023-07-07T15:56:12Z

doc/modules/impute.rst

+* **Missing At Random (MAR)**: the missingness does not depend on underlying
+  missing values but can depend on observed ones.
+* **Missing Not At Random (MNAR)**: the missingness depends on underlying missing
+  values.


Suggested change

values.

values. Therefore, the missingness pattern can be statistically associated

with `y` in a supervised classification or regression setting.
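
One way to see the association with `y` mentioned in this suggestion is a small simulation. It is only a sketch under stated assumptions: the target `y` is a hypothetical linear function of the masked feature plus noise, and the 0.5 threshold and sample size are arbitrary. It compares the mean of `y` between rows where the feature is missing and rows where it is observed, for an MNAR mask and for an MCAR mask with the same missing rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5_000

# Feature that will be partially masked.
x2 = rng.normal(size=n_samples)

# Hypothetical target (assumption): linear in x2 plus Gaussian noise.
y = 2.0 * x2 + rng.normal(scale=0.5, size=n_samples)

# MNAR mask (assumed rule): x2 is missing when its own value exceeds 0.5.
is_missing_mnar = x2 > 0.5

# MCAR mask with the same expected missing rate, for comparison.
is_missing_mcar = rng.random(n_samples) < is_missing_mnar.mean()

# Difference in the mean of y between "missing" rows and "observed" rows.
gap_mnar = y[is_missing_mnar].mean() - y[~is_missing_mnar].mean()
gap_mcar = y[is_missing_mcar].mean() - y[~is_missing_mcar].mean()

print(f"MNAR gap in mean(y): {gap_mnar:.2f}")
print(f"MCAR gap in mean(y): {gap_mcar:.2f}")
```

Under these assumptions the MNAR gap is clearly non-zero while the MCAR gap stays close to zero, so in the MNAR case the missingness indicator itself carries information about `y`.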

Add draft of missing value mechanisms section

c67adb1

github-actions bot added the Documentation label Jun 23, 2022

aperezlebel mentioned this pull request Jun 23, 2022

Documenting missing-values practices #21967

Open

7 tasks

aperezlebel changed the title ~~[WIP] DOC Add draft of missing value mechanisms section~~ [WIP] DOC Explain missing value mechanisms Jun 23, 2022

glemaitre reviewed Jun 28, 2022

glemaitre self-requested a review November 8, 2022 14:46

glemaitre removed their request for review January 10, 2023 17:25

lorentzenchr added the Stalled label Feb 19, 2023

ogrisel reviewed Jul 7, 2023

ArturoAmorQ mentioned this pull request Aug 3, 2023

DOC Add example showcasing HGBT regression #26991

Merged

4 tasks
