From 21e057885977eede8ff645e0949db02f5dd37ba2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Thu, 13 Feb 2025 13:48:22 +0100 Subject: [PATCH 01/19] Added from_predictions --- doc/visualizations.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 412dfc001fab1..57c03f29927f5 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -9,9 +9,10 @@ learning. The key feature of this API is to allow for quick plotting and visual adjustments without recalculation. We provide `Display` classes that expose two methods for creating plots: `from_estimator` and `from_predictions`. The `from_estimator` method will take a fitted estimator -and some data (`X` and `y`) and create a `Display` object. Sometimes, we would -like to only compute the predictions once and one should use `from_predictions` -instead. In the following example, we plot a ROC curve for a fitted support +and some data (`X` and `y`) and create a `Display` object. The `from_predictions`` +method creates a `Display` object when given the true and predicted values. +We should use the latter when we want to compute the predictions only once. +In the following example, we plot a ROC curve for a fitted support vector machine: .. plot:: From c71d9ba4d76a0b7544fdfdd99ea6c12a3b088287 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Thu, 13 Feb 2025 15:41:31 +0100 Subject: [PATCH 02/19] version to propose --- doc/visualizations.rst | 45 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 8 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 57c03f29927f5..4a0df3123bbbf 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -8,12 +8,19 @@ Scikit-learn defines a simple API for creating visualizations for machine learning. 
The key feature of this API is to allow for quick plotting and visual adjustments without recalculation. We provide `Display` classes that expose two methods for creating plots: `from_estimator` and -`from_predictions`. The `from_estimator` method will take a fitted estimator -and some data (`X` and `y`) and create a `Display` object. The `from_predictions`` -method creates a `Display` object when given the true and predicted values. -We should use the latter when we want to compute the predictions only once. +`from_predictions`. + +The `from_estimator` method generates a `Display` object from a fitted estimator and input data (X, y). +And the `from_predictions` method creates a `Display` object when given the true and predicted values. +This is useful when predictions should only be computed once. + +The `Display` object stores the computed values required for plotting with Matplotlib. +These values can either be passed directly via `from_predictions`, or derived from an estimator +and sample data using `from_estimator`. +Additionally, the plot method allows adding to an existing plot via the ax parameter. + In the following example, we plot a ROC curve for a fitted support -vector machine: +vector machine using `from_estimator`: .. plot:: :context: close-figs @@ -35,9 +42,9 @@ vector machine: The returned `svc_disp` object allows us to continue using the already computed ROC curve for SVC in future plots. In this case, the `svc_disp` is a :class:`~sklearn.metrics.RocCurveDisplay` that stores the computed values as -attributes called `roc_auc`, `fpr`, and `tpr`. Be aware that we could get -the predictions from the support vector machine and then use `from_predictions` -instead of `from_estimator`. Next, we train a random forest classifier and plot +attributes called `roc_auc`, `fpr`, and `tpr`. + +Next, we train a random forest classifier and plot the previously computed ROC curve again by using the `plot` method of the `Display` object. 
@@ -58,6 +65,28 @@ the previously computed ROC curve again by using the `plot` method of the Notice that we pass `alpha=0.8` to the plot functions to adjust the alpha values of the curves. +Finally, compared with the first example above, we can first obtain predictions +from the support vector machine and then use `from_predictions` instead of `from_estimator`. + + +.. plot:: + :context: close-figs + :align: center + + from sklearn.model_selection import train_test_split + from sklearn.svm import SVC + from sklearn.metrics import RocCurveDisplay + from sklearn.datasets import load_wine + + X, y = load_wine(return_X_y=True) + y = y == 2 # make binary + X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42) + svc = SVC(random_state=42).fit(X_train, y_train) + y_pred = svc.decision_function(X_test) + + svc_disp = RocCurveDisplay.from_predictions(y_test, y_pred) + + .. rubric:: Examples * :ref:`sphx_glr_auto_examples_miscellaneous_plot_roc_curve_visualization_api.py` From 36f31e855bc47e56c3c260fe239b7aa939b50cc4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Thu, 13 Feb 2025 15:48:48 +0100 Subject: [PATCH 03/19] removed error --- doc/visualizations.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 4a0df3123bbbf..8dd1b17fe952b 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -11,7 +11,7 @@ expose two methods for creating plots: `from_estimator` and `from_predictions`. The `from_estimator` method generates a `Display` object from a fitted estimator and input data (X, y). -And the `from_predictions` method creates a `Display` object when given the true and predicted values. +The `from_predictions` method creates a `Display` object when given the true and predicted values. This is useful when predictions should only be computed once. The `Display` object stores the computed values required for plotting with Matplotlib. 
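Note on patches 02 and 03: the prose in these hunks describes two construction paths for a `Display` object, one from a fitted estimator plus data and one from true and predicted values that already exist, with the computed values stored on the object for later reuse. A minimal pure-Python sketch of that pattern follows. It is an illustration only, not scikit-learn's implementation: `ToyRocDisplay`, `score_fn`, and the sample data are hypothetical names introduced here, and the sketch assumes binary labels with both classes present.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ToyRocDisplay:
    """Hypothetical stand-in for a Display class (not scikit-learn code)."""

    fpr: list
    tpr: list
    roc_auc: float

    @classmethod
    def from_predictions(cls, y_true: Sequence[int], y_score: Sequence[float]):
        """Derive the stored curve values from labels and existing scores."""
        # Assumes at least one positive and one negative label.
        n_pos = sum(1 for y in y_true if y)
        n_neg = len(y_true) - n_pos
        fpr, tpr = [0.0], [0.0]
        # One ROC point per distinct score, from the strictest threshold down.
        for threshold in sorted(set(y_score), reverse=True):
            tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y)
            fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and not y)
            tpr.append(tp / n_pos)
            fpr.append(fp / n_neg)
        # Trapezoidal area under the (fpr, tpr) curve.
        auc = sum(
            (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
            for i in range(1, len(fpr))
        )
        return cls(fpr, tpr, auc)

    @classmethod
    def from_estimator(cls, score_fn: Callable, X: Sequence, y_true: Sequence[int]):
        """Compute the scores with the model, then reuse from_predictions."""
        return cls.from_predictions(y_true, [score_fn(x) for x in X])


# Hypothetical data: four samples, one feature each, binary labels.
X = [[0.1], [0.4], [0.35], [0.8]]
y_true = [0, 0, 1, 1]


def score_fn(sample):
    # Stand-in for a fitted model's scoring method.
    return sample[0]


# Path 1: scores are computed once by the caller and passed in directly.
y_score = [score_fn(x) for x in X]
disp_pred = ToyRocDisplay.from_predictions(y_true, y_score)

# Path 2: the display computes the scores from the model and the data.
disp_est = ToyRocDisplay.from_estimator(score_fn, X, y_true)
```

Both paths end with the same stored values, which is the point the hunk makes: `from_predictions` only skips the scoring step when the scores are already at hand. The real `Display` classes additionally draw onto a Matplotlib axes, which this sketch leaves out.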
From 0a13cd0cace5e67e6128ee79bf5ce3f35f48b2b8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Fri, 14 Feb 2025 09:09:05 +0100 Subject: [PATCH 04/19] after first review --- doc/visualizations.rst | 49 ++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 23 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 8dd1b17fe952b..1d75d0ab3eecb 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -11,13 +11,14 @@ expose two methods for creating plots: `from_estimator` and `from_predictions`. The `from_estimator` method generates a `Display` object from a fitted estimator and input data (X, y). -The `from_predictions` method creates a `Display` object when given the true and predicted values. -This is useful when predictions should only be computed once. +The `from_predictions` method creates a `Display` object from true and predicted values, which +is useful when you only want to compute the predictions once. The `Display` object stores the computed values required for plotting with Matplotlib. These values can either be passed directly via `from_predictions`, or derived from an estimator and sample data using `from_estimator`. -Additionally, the plot method allows adding to an existing plot via the ax parameter. +Additionally, the plot method allows adding to an existing plot by passing the existing +plots :class`matplotlib.axes.Axes` to the `ax` parameter. In the following example, we plot a ROC curve for a fitted support vector machine using `from_estimator`: @@ -39,6 +40,28 @@ vector machine using `from_estimator`: svc_disp = RocCurveDisplay.from_estimator(svc, X_test, y_test) +If you already have the prediction values, you could instead use +`from_predictions` to do the same thing: + + +.. 
plot:: + :context: close-figs + :align: center + + from sklearn.model_selection import train_test_split + from sklearn.svm import SVC + from sklearn.metrics import RocCurveDisplay + from sklearn.datasets import load_wine + + X, y = load_wine(return_X_y=True) + y = y == 2 # make binary + X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42) + svc = SVC(random_state=42).fit(X_train, y_train) + y_pred = svc.decision_function(X_test) + + svc_disp = RocCurveDisplay.from_predictions(y_test, y_pred) + + The returned `svc_disp` object allows us to continue using the already computed ROC curve for SVC in future plots. In this case, the `svc_disp` is a :class:`~sklearn.metrics.RocCurveDisplay` that stores the computed values as @@ -65,26 +88,6 @@ the previously computed ROC curve again by using the `plot` method of the Notice that we pass `alpha=0.8` to the plot functions to adjust the alpha values of the curves. -Finally, compared with the first example above, we can first obtain predictions -from the support vector machine and then use `from_predictions` instead of `from_estimator`. - - -.. plot:: - :context: close-figs - :align: center - - from sklearn.model_selection import train_test_split - from sklearn.svm import SVC - from sklearn.metrics import RocCurveDisplay - from sklearn.datasets import load_wine - - X, y = load_wine(return_X_y=True) - y = y == 2 # make binary - X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42) - svc = SVC(random_state=42).fit(X_train, y_train) - y_pred = svc.decision_function(X_test) - - svc_disp = RocCurveDisplay.from_predictions(y_test, y_pred) .. 
rubric:: Examples From 96638d7f5d7e0b7a3497d0b3e64b7cae32440511 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Fri, 14 Feb 2025 11:54:18 +0100 Subject: [PATCH 05/19] fix link --- doc/visualizations.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 1d75d0ab3eecb..51ed220f12750 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -18,7 +18,7 @@ The `Display` object stores the computed values required for plotting with Matpl These values can either be passed directly via `from_predictions`, or derived from an estimator and sample data using `from_estimator`. Additionally, the plot method allows adding to an existing plot by passing the existing -plots :class`matplotlib.axes.Axes` to the `ax` parameter. +plots :class:`matplotlib.axes.Axes` to the `ax` parameter. In the following example, we plot a ROC curve for a fitted support vector machine using `from_estimator`: From 4e9eb83d1de568d0868f056d70621cb372103d58 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Fri, 21 Feb 2025 11:36:16 +0100 Subject: [PATCH 06/19] added arg names, removed unused blank lines --- doc/visualizations.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 51ed220f12750..cf5163d7365ae 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -10,8 +10,8 @@ visual adjustments without recalculation. We provide `Display` classes that expose two methods for creating plots: `from_estimator` and `from_predictions`. -The `from_estimator` method generates a `Display` object from a fitted estimator and input data (X, y). -The `from_predictions` method creates a `Display` object from true and predicted values, which +The `from_estimator` method generates a `Display` object from a fitted estimator and input data (`X`, `y`). 
+The `from_predictions` method creates a `Display` object from true and predicted values (`y_test`, `y_pred`), which is useful when you only want to compute the predictions once. The `Display` object stores the computed values required for plotting with Matplotlib. @@ -89,7 +89,6 @@ Notice that we pass `alpha=0.8` to the plot functions to adjust the alpha values of the curves. - .. rubric:: Examples * :ref:`sphx_glr_auto_examples_miscellaneous_plot_roc_curve_visualization_api.py` From 6132fad6a8117ed969bcd65f8ae943a645643e51 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Fri, 7 Mar 2025 19:48:33 +0100 Subject: [PATCH 07/19] Correcting Display explanation --- doc/visualizations.rst | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index cf5163d7365ae..8f0eec6a13a7e 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -10,13 +10,19 @@ visual adjustments without recalculation. We provide `Display` classes that expose two methods for creating plots: `from_estimator` and `from_predictions`. -The `from_estimator` method generates a `Display` object from a fitted estimator and input data (`X`, `y`). -The `from_predictions` method creates a `Display` object from true and predicted values (`y_test`, `y_pred`), which -is useful when you only want to compute the predictions once. +The `from_estimator` method generates a `Display` object from a fitted estimator and +input data (`X`, `y`). +The `from_predictions` method creates a `Display` object from true and predicted +values (`y_test`, `y_pred`), which +is useful when you only want to compute the predictions once. Using `from_predictions` +avoids us to recompute the predictions, but does not automatically resolve some +ambiguities. + +The `Display` object stores the computed values (e. g. metric values) required for +plotting with Matplotlib. 
These computed values are the results of some derivatives +after we pass the raw predictions to `from_predictions`, or we get them from +an estimator via `from_estimator`. -The `Display` object stores the computed values required for plotting with Matplotlib. -These values can either be passed directly via `from_predictions`, or derived from an estimator -and sample data using `from_estimator`. Additionally, the plot method allows adding to an existing plot by passing the existing plots :class:`matplotlib.axes.Axes` to the `ax` parameter. From 1c88ab1e17a37ebe0d61f7cbd76076ce83c13b71 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Mon, 10 Mar 2025 10:37:36 +0100 Subject: [PATCH 08/19] after feedback --- doc/visualizations.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 8f0eec6a13a7e..4f31f5341486a 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -23,8 +23,9 @@ plotting with Matplotlib. These computed values are the results of some derivati after we pass the raw predictions to `from_predictions`, or we get them from an estimator via `from_estimator`. -Additionally, the plot method allows adding to an existing plot by passing the existing -plots :class:`matplotlib.axes.Axes` to the `ax` parameter. +Display objects have a plot method which will create a matplotlib plot once the display +object has been initialised. Additionally, the plot method allows adding to an existing +plot by passing the existing plots :class:`matplotlib.axes.Axes` to the `ax` parameter. 
In the following example, we plot a ROC curve for a fitted support vector machine using `from_estimator`: From 2a25249e83f12a141a98b93f9c04e32b9f20a0c1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Mon, 10 Mar 2025 14:12:41 +0100 Subject: [PATCH 09/19] Changed classifier to LogisticRegression --- doc/visualizations.rst | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 4f31f5341486a..73b3f9d61722d 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -35,17 +35,19 @@ vector machine using `from_estimator`: :align: center from sklearn.model_selection import train_test_split - from sklearn.svm import SVC + from sklearn.linear_model import LogisticRegression from sklearn.metrics import RocCurveDisplay - from sklearn.datasets import load_wine + from sklearn.datasets import load_iris - X, y = load_wine(return_X_y=True) + X, y = load_iris(return_X_y=True) y = y == 2 # make binary - X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42) - svc = SVC(random_state=42) - svc.fit(X_train, y_train) + X_train, X_test, y_train, y_test = train_test_split( + X, y, test_size=.8, random_state=42 + ) + clf = LogisticRegression(random_state=42, C=.01) + clf.fit(X_train, y_train) - svc_disp = RocCurveDisplay.from_estimator(svc, X_test, y_test) + clf_disp = RocCurveDisplay.from_estimator(clf, X_test, y_test) If you already have the prediction values, you could instead use `from_predictions` to do the same thing: @@ -55,18 +57,20 @@ If you already have the prediction values, you could instead use :context: close-figs :align: center - from sklearn.model_selection import train_test_split - from sklearn.svm import SVC - from sklearn.metrics import RocCurveDisplay - from sklearn.datasets import load_wine + from sklearn.model_selection import train_test_split + from sklearn.linear_model import LogisticRegression + from 
sklearn.metrics import RocCurveDisplay + from sklearn.datasets import load_iris + + X, y = load_iris(return_X_y=True) + y = y == 2 # make binary + X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.8, random_state=0) + clf = LogisticRegression(random_state=42, C=.01) + clf.fit(X_train, y_train) - X, y = load_wine(return_X_y=True) - y = y == 2 # make binary - X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42) - svc = SVC(random_state=42).fit(X_train, y_train) - y_pred = svc.decision_function(X_test) + y_pred = clf.predict_proba(X_test)[:, 1] - svc_disp = RocCurveDisplay.from_predictions(y_test, y_pred) + clf_disp = RocCurveDisplay.from_predictions(y_test, y_pred) The returned `svc_disp` object allows us to continue using the already computed From 7c8182f58f50f8d2b1d5d40e720c3049de7ead40 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Mon, 10 Mar 2025 14:54:03 +0100 Subject: [PATCH 10/19] changed Display object phrase --- doc/visualizations.rst | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 73b3f9d61722d..655898eb8cd97 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -15,20 +15,20 @@ input data (`X`, `y`). The `from_predictions` method creates a `Display` object from true and predicted values (`y_test`, `y_pred`), which is useful when you only want to compute the predictions once. Using `from_predictions` -avoids us to recompute the predictions, but does not automatically resolve some +avoids to recompute the predictions, but does not automatically resolve some ambiguities. -The `Display` object stores the computed values (e. g. metric values) required for -plotting with Matplotlib. These computed values are the results of some derivatives -after we pass the raw predictions to `from_predictions`, or we get them from -an estimator via `from_estimator`. 
+The `Display` object stores the computed values (e.g., metric values or +feature importance) required for plotting with Matplotlib. These values are the +derivative results after we pass the raw predictions to `from_predictions`, or +an estimator to `from_estimator`. -Display objects have a plot method which will create a matplotlib plot once the display -object has been initialised. Additionally, the plot method allows adding to an existing +Display objects have a plot method that creates a matplotlib plot once the display +object has been initialized. Additionally, the plot method allows adding to an existing plot by passing the existing plots :class:`matplotlib.axes.Axes` to the `ax` parameter. -In the following example, we plot a ROC curve for a fitted support -vector machine using `from_estimator`: +In the following example, we plot a ROC curve for a fitted Logistic LogisticRegression +model `from_estimator`: .. plot:: :context: close-figs @@ -64,17 +64,20 @@ If you already have the prediction values, you could instead use X, y = load_iris(return_X_y=True) y = y == 2 # make binary - X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.8, random_state=0) + X_train, X_test, y_train, y_test = train_test_split( + X, y, test_size=.8, random_state=0 + ) clf = LogisticRegression(random_state=42, C=.01) clf.fit(X_train, y_train) + # select the probability of the class that we considered to be the positive label y_pred = clf.predict_proba(X_test)[:, 1] clf_disp = RocCurveDisplay.from_predictions(y_test, y_pred) -The returned `svc_disp` object allows us to continue using the already computed -ROC curve for SVC in future plots. In this case, the `svc_disp` is a +The returned `clf_disp` object allows us to continue using the already computed +ROC curve for clf in future plots. In this case, the `clf_disp` is a :class:`~sklearn.metrics.RocCurveDisplay` that stores the computed values as attributes called `roc_auc`, `fpr`, and `tpr`. 
@@ -94,7 +97,7 @@ the previously computed ROC curve again by using the `plot` method of the ax = plt.gca() rfc_disp = RocCurveDisplay.from_estimator(rfc, X_test, y_test, ax=ax, alpha=0.8) - svc_disp.plot(ax=ax, alpha=0.8) + clf_disp.plot(ax=ax, alpha=0.8) Notice that we pass `alpha=0.8` to the plot functions to adjust the alpha values of the curves. From 394a567871f351cbf97052edf23837aebf5127e9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Mon, 10 Mar 2025 16:22:43 +0100 Subject: [PATCH 11/19] fix typo --- doc/visualizations.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 655898eb8cd97..53adb27059be1 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -27,7 +27,7 @@ Display objects have a plot method that creates a matplotlib plot once the displ object has been initialized. Additionally, the plot method allows adding to an existing plot by passing the existing plots :class:`matplotlib.axes.Axes` to the `ax` parameter. -In the following example, we plot a ROC curve for a fitted Logistic LogisticRegression +In the following example, we plot a ROC curve for a fitted Logistic Regression model `from_estimator`: .. plot:: From 21f398accfda3b92343ddc964f86daf3aa3b259e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Wed, 12 Mar 2025 09:39:27 +0100 Subject: [PATCH 12/19] after another round of feedback --- doc/visualizations.rst | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 53adb27059be1..518bee3b1c46e 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -13,15 +13,16 @@ expose two methods for creating plots: `from_estimator` and The `from_estimator` method generates a `Display` object from a fitted estimator and input data (`X`, `y`). 
The `from_predictions` method creates a `Display` object from true and predicted -values (`y_test`, `y_pred`), which -is useful when you only want to compute the predictions once. Using `from_predictions` +values (`y_test`, `y_pred`). Using `from_predictions` avoids to recompute the predictions, but does not automatically resolve some -ambiguities. +ambiguities. The reason being that the user needs to know which column corresponds +to the positive label (in this case, `y_pred`). The `Display` object stores the computed values (e.g., metric values or feature importance) required for plotting with Matplotlib. These values are the -derivative results after we pass the raw predictions to `from_predictions`, or -an estimator to `from_estimator`. +derived results after we pass the raw predictions to `from_predictions`, or +an estimator to `from_estimator`. When the `Display` object is created, +the plot is also created. Display objects have a plot method that creates a matplotlib plot once the display object has been initialized. Additionally, the plot method allows adding to an existing @@ -50,7 +51,7 @@ model `from_estimator`: clf_disp = RocCurveDisplay.from_estimator(clf, X_test, y_test) If you already have the prediction values, you could instead use -`from_predictions` to do the same thing: +`from_predictions` to do the same thing (and save on compute): .. plot:: @@ -76,14 +77,12 @@ If you already have the prediction values, you could instead use clf_disp = RocCurveDisplay.from_predictions(y_test, y_pred) -The returned `clf_disp` object allows us to continue using the already computed -ROC curve for clf in future plots. In this case, the `clf_disp` is a -:class:`~sklearn.metrics.RocCurveDisplay` that stores the computed values as -attributes called `roc_auc`, `fpr`, and `tpr`. +The returned `clf_disp` object allows us to add another curve to the already computed +ROC curve. 
In this case, the `clf_disp` is a :class:`~sklearn.metrics.RocCurveDisplay` that stores +the computed values as attributes called `roc_auc`, `fpr`, and `tpr`. -Next, we train a random forest classifier and plot -the previously computed ROC curve again by using the `plot` method of the -`Display` object. +Next, we train a random forest classifier and plot the previously computed ROC curve again +by using the `plot` method of the `Display` object. .. plot:: :context: close-figs From 8ef6f4134ec1701e97d4e4cb8cb8c29de15edbb8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Wed, 12 Mar 2025 09:59:33 +0100 Subject: [PATCH 13/19] corrected random_state to match others --- doc/visualizations.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 518bee3b1c46e..d866836649346 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -66,7 +66,7 @@ If you already have the prediction values, you could instead use X, y = load_iris(return_X_y=True) y = y == 2 # make binary X_train, X_test, y_train, y_test = train_test_split( - X, y, test_size=.8, random_state=0 + X, y, test_size=.8, random_state=42 ) clf = LogisticRegression(random_state=42, C=.01) clf.fit(X_train, y_train) From c4a492f410abcddfa0efdcd580c4affa23c24945 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Mon, 12 May 2025 10:09:21 +0200 Subject: [PATCH 14/19] Trying to add all the feedbacks --- doc/visualizations.rst | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index d866836649346..351f61cff5660 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -13,10 +13,12 @@ expose two methods for creating plots: `from_estimator` and The `from_estimator` method generates a `Display` object from a fitted estimator and input data (`X`, `y`). 
The `from_predictions` method creates a `Display` object from true and predicted -values (`y_test`, `y_pred`). Using `from_predictions` -avoids to recompute the predictions, but does not automatically resolve some -ambiguities. The reason being that the user needs to know which column corresponds -to the positive label (in this case, `y_pred`). +values (`y_test`, `y_pred`), which is useful when you only want to compute the +predictions once. + +Using `from_predictions` avoids having to recompute predictions, but does not +automatically resolve some ambiguities. For binary classification, the user must know +which column corresponds to the positive label (in this case, `y_pred`). The `Display` object stores the computed values (e.g., metric values or feature importance) required for plotting with Matplotlib. These values are the @@ -43,8 +45,8 @@ model `from_estimator`: X, y = load_iris(return_X_y=True) y = y == 2 # make binary X_train, X_test, y_train, y_test = train_test_split( - X, y, test_size=.8, random_state=42 - ) + X, y, test_size=.8, random_state=42 + ) clf = LogisticRegression(random_state=42, C=.01) clf.fit(X_train, y_train) @@ -66,8 +68,8 @@ If you already have the prediction values, you could instead use X, y = load_iris(return_X_y=True) y = y == 2 # make binary X_train, X_test, y_train, y_test = train_test_split( - X, y, test_size=.8, random_state=42 - ) + X, y, test_size=.8, random_state=42 + ) clf = LogisticRegression(random_state=42, C=.01) clf.fit(X_train, y_train) From b89de078fa691fb3033ff40504e2369d8aa1efdd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Mon, 12 May 2025 10:56:10 +0200 Subject: [PATCH 15/19] fix extra space in plot --- doc/visualizations.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 351f61cff5660..8f2e50e716b6f 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -46,7 +46,7 @@ model `from_estimator`: y 
= y == 2 # make binary X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=.8, random_state=42 - ) + ) clf = LogisticRegression(random_state=42, C=.01) clf.fit(X_train, y_train) @@ -69,7 +69,7 @@ If you already have the prediction values, you could instead use y = y == 2 # make binary X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=.8, random_state=42 - ) + ) clf = LogisticRegression(random_state=42, C=.01) clf.fit(X_train, y_train) From a37e898f5d72de21c1880359d6506b96bd6e91ae Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Mon, 12 May 2025 17:08:16 +0200 Subject: [PATCH 16/19] After feedback again --- doc/visualizations.rst | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 8f2e50e716b6f..92d8a6773dd12 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -11,24 +11,24 @@ expose two methods for creating plots: `from_estimator` and `from_predictions`. The `from_estimator` method generates a `Display` object from a fitted estimator and -input data (`X`, `y`). +input data (`X`, `y`), and a plot. The `from_predictions` method creates a `Display` object from true and predicted -values (`y_test`, `y_pred`), which is useful when you only want to compute the -predictions once. +values (`y_test`, `y_pred`), and a plot. -Using `from_predictions` avoids having to recompute predictions, but does not -automatically resolve some ambiguities. For binary classification, the user must know -which column corresponds to the positive label (in this case, `y_pred`). +For :term:`predict_proba`, the column corresponding to the probability estimate of +the `pos_label` class is selected while for :term:`decision_function`, the score is +reverted (i.e. multiply by -1) when `pos_label` is not the label 1. 
The `Display` object stores the computed values (e.g., metric values or -feature importance) required for plotting with Matplotlib. These values are the -derived results after we pass the raw predictions to `from_predictions`, or -an estimator to `from_estimator`. When the `Display` object is created, -the plot is also created. +feature importance) required for plotting with Matplotlib. These values the +results derived from the raw predictions passed to `from_predictions`, or +an estimator and `X` passed to `from_estimator`. Display objects have a plot method that creates a matplotlib plot once the display -object has been initialized. Additionally, the plot method allows adding to an existing -plot by passing the existing plots :class:`matplotlib.axes.Axes` to the `ax` parameter. +object has been initialized (note that we recommend that display objects are created +via `from_estimator` or `from_predictions` instead of initialized directly). The plot +method allows adding to an existing plot by passing the existing plots +:class:`matplotlib.axes.Axes` to the `ax` parameter. In the following example, we plot a ROC curve for a fitted Logistic Regression model `from_estimator`: From e1ca7cbce0d7d5e81c25ac511170c32aa5e6b93f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Fri, 16 May 2025 15:14:18 +0200 Subject: [PATCH 17/19] next feedback fix --- doc/visualizations.rst | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 92d8a6773dd12..0d9828dde3a11 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -13,21 +13,23 @@ expose two methods for creating plots: `from_estimator` and The `from_estimator` method generates a `Display` object from a fitted estimator and input data (`X`, `y`), and a plot. The `from_predictions` method creates a `Display` object from true and predicted -values (`y_test`, `y_pred`), and a plot. 
+values (`y_test`, `y_pred`), and a plot. Using `from_predictions` avoids having to +recompute predictions, but the user needs to take care that the prediction values +passed correspond to the `pos_label`. For :term:`predict_proba`, the column corresponding to the probability estimate of the `pos_label` class is selected while for :term:`decision_function`, the score is reverted (i.e. multiply by -1) when `pos_label` is not the label 1. The `Display` object stores the computed values (e.g., metric values or -feature importance) required for plotting with Matplotlib. These values the +feature importance) required for plotting with Matplotlib. These values are the results derived from the raw predictions passed to `from_predictions`, or an estimator and `X` passed to `from_estimator`. Display objects have a plot method that creates a matplotlib plot once the display object has been initialized (note that we recommend that display objects are created -via `from_estimator` or `from_predictions` instead of initialized directly). The plot -method allows adding to an existing plot by passing the existing plots +via `from_estimator` or `from_predictions` instead of initialized directly). +The plot method allows adding to an existing plot by passing the existing plots :class:`matplotlib.axes.Axes` to the `ax` parameter. In the following example, we plot a ROC curve for a fitted Logistic Regression @@ -80,11 +82,11 @@ If you already have the prediction values, you could instead use The returned `clf_disp` object allows us to add another curve to the already computed -ROC curve. In this case, the `clf_disp` is a :class:`~sklearn.metrics.RocCurveDisplay` that stores -the computed values as attributes called `roc_auc`, `fpr`, and `tpr`. +ROC curve. In this case, the `clf_disp` is a :class:`~sklearn.metrics.RocCurveDisplay` +that stores the computed values as attributes called `roc_auc`, `fpr`, and `tpr`. 
-Next, we train a random forest classifier and plot the previously computed ROC curve again -by using the `plot` method of the `Display` object. +Next, we train a random forest classifier and plot the previously computed ROC curve +again by using the `plot` method of the `Display` object. .. plot:: :context: close-figs From dff13b20d386273afee5de1dd7aa8e44ca35f98f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Fri, 16 May 2025 15:21:43 +0200 Subject: [PATCH 18/19] removed an extra -and- --- doc/visualizations.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 0d9828dde3a11..23ec7091773a5 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -10,7 +10,7 @@ visual adjustments without recalculation. We provide `Display` classes that expose two methods for creating plots: `from_estimator` and `from_predictions`. -The `from_estimator` method generates a `Display` object from a fitted estimator and +The `from_estimator` method generates a `Display` object from a fitted estimator, input data (`X`, `y`), and a plot. The `from_predictions` method creates a `Display` object from true and predicted values (`y_test`, `y_pred`), and a plot. Using `from_predictions` avoids having to From 103cbf5030ba4d7846bb3de429b051298e00caee Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dea=20Mar=C3=ADa=20L=C3=A9on?= Date: Thu, 22 May 2025 09:12:00 +0200 Subject: [PATCH 19/19] Comply with the lastest feedback --- doc/visualizations.rst | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/doc/visualizations.rst b/doc/visualizations.rst index 23ec7091773a5..e42be3a6db040 100644 --- a/doc/visualizations.rst +++ b/doc/visualizations.rst @@ -13,13 +13,14 @@ expose two methods for creating plots: `from_estimator` and The `from_estimator` method generates a `Display` object from a fitted estimator, input data (`X`, `y`), and a plot. 
The `from_predictions` method creates a `Display` object from true and predicted -values (`y_test`, `y_pred`), and a plot. Using `from_predictions` avoids having to -recompute predictions, but the user needs to take care that the prediction values -passed correspond to the `pos_label`. - -For :term:`predict_proba`, the column corresponding to the probability estimate of -the `pos_label` class is selected while for :term:`decision_function`, the score is -reverted (i.e. multiply by -1) when `pos_label` is not the label 1. +values (`y_test`, `y_pred`), and a plot. + +Using `from_predictions` avoids having to recompute predictions, +but the user needs to take care that the prediction values passed correspond +to the `pos_label`. For :term:`predict_proba`, select the column corresponding +to the `pos_label` class while for :term:`decision_function`, revert the score +(i.e. multiply by -1) if `pos_label` is not the last class in the +`classes_` attribute of your estimator. The `Display` object stores the computed values (e.g., metric values or feature importance) required for plotting with Matplotlib. These values are the