From e186f8fe6a945e27d388e61370bc4323ae3ac917 Mon Sep 17 00:00:00 2001
From: Lucy Liu
Date: Thu, 15 Jun 2023 15:16:27 +1000
Subject: [PATCH 1/3] fix typo in math

---
 doc/modules/preprocessing.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/modules/preprocessing.rst b/doc/modules/preprocessing.rst
index 69045147d8af9..ee1d7eb636c43 100644
--- a/doc/modules/preprocessing.rst
+++ b/doc/modules/preprocessing.rst
@@ -883,7 +883,7 @@ cardinality categories are location based such as zip code or region. For the
 binary classification target, the target encoding is given by:
 
 .. math::
-    S_i = \lambda_i\frac{n_{iY}}{n_i} + (1 - \lambda_i)\frac{n_y}{n}
+    S_i = \lambda_i\frac{n_{iY}}{n_i} + (1 - \lambda_i)\frac{n_Y}{n}
 
 where :math:`S_i` is the encoding for category :math:`i`, :math:`n_{iY}` is the
 number of observations with :math:`Y=1` with category :math:`i`, :math:`n_i` is
@@ -897,14 +897,14 @@ observations with :math:`Y=1`, :math:`n` is the number of observations, and
 where :math:`m` is a smoothing factor, which is controlled with the `smooth`
 parameter in :class:`TargetEncoder`. Large smoothing factors will put more
 weight on the global mean. When `smooth="auto"`, the smoothing factor is
-computed as an empirical Bayes estimate: :math:`m=\sigma_c^2/\tau^2`, where
+computed as an empirical Bayes estimate: :math:`m=\sigma_i^2/\tau^2`, where
 :math:`\sigma_i^2` is the variance of `y` with category :math:`i` and
 :math:`\tau^2` is the global variance of `y`.
 
 For continuous targets, the formulation is similar to binary classification:
 
 .. math::
-    S_i = \lambda_i\frac{\sum_{k\in L_i}y_k}{n_i} + (1 - \lambda_i)\frac{\sum_{k=1}^{n}y_k}{n}
+    S_i = \lambda_i\frac{\sum_{k\in L_i}Y_k}{n_i} + (1 - \lambda_i)\frac{\sum_{k=1}^{n}Y_k}{n}
 
 where :math:`L_i` is the set of observations for which :math:`X=X_i` and
 :math:`n_i` is the cardinality of :math:`L_i`.
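For reviewers, here is a minimal pure-Python sketch of the corrected binary formula. It assumes a fixed smoothing factor m and the shrinkage factor lambda_i = n_i / (m + n_i) used in the TargetEncoder documentation. It shows that the global term divides n_Y, the total count of Y=1 observations, by n. `binary_target_encoding` is a hypothetical helper and not scikit-learn code. Also note that scikit-learn's `TargetEncoder.fit_transform` applies cross fitting, so its output on training data is not expected to match this directly.

```python
from collections import Counter


def binary_target_encoding(categories, y, m):
    """Smoothed binary target encoding for each category (illustrative only).

    S_i = lambda_i * n_iY / n_i + (1 - lambda_i) * n_Y / n,
    with lambda_i = n_i / (m + n_i) and a fixed smoothing factor m.
    """
    n = len(y)
    n_Y = sum(y)                                   # observations with Y=1
    n_i = Counter(categories)                      # observations per category
    n_iY = Counter(c for c, t in zip(categories, y) if t == 1)  # Y=1 per category
    global_rate = n_Y / n                          # n_Y / n, the global mean
    encoding = {}
    for cat, count in n_i.items():
        lam = count / (m + count)                  # shrinkage factor lambda_i
        # Counter returns 0 for categories that never have Y=1
        encoding[cat] = lam * n_iY[cat] / count + (1 - lam) * global_rate
    return encoding


cats = ["a", "a", "a", "b", "b", "c"]
y = [1, 1, 0, 0, 0, 1]
# a -> 0.625, b -> about 0.167, c -> 0.75
print(binary_target_encoding(cats, y, m=1.0))
```

Setting m=0 makes every lambda_i equal to 1, which reduces S_i to the raw per-category rate n_iY / n_i. A larger m pulls each S_i toward n_Y / n.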
From 7ce8002876879685040638eb4d35c1b4cfd88ad4 Mon Sep 17 00:00:00 2001
From: Lucy Liu
Date: Fri, 16 Jun 2023 17:19:18 +1000
Subject: [PATCH 2/3] fix typo

---
 doc/modules/preprocessing.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/modules/preprocessing.rst b/doc/modules/preprocessing.rst
index ee1d7eb636c43..a7c04634c93ac 100644
--- a/doc/modules/preprocessing.rst
+++ b/doc/modules/preprocessing.rst
@@ -887,7 +887,7 @@ binary classification target, the target encoding is given by:
 
 where :math:`S_i` is the encoding for category :math:`i`, :math:`n_{iY}` is the
 number of observations with :math:`Y=1` with category :math:`i`, :math:`n_i` is
-the number of observations with category :math:`i`, :math:`n_y` is the number of
+the number of observations with category :math:`i`, :math:`n_Y` is the number of
 observations with :math:`Y=1`, :math:`n` is the number of observations, and
 :math:`\lambda_i` is a shrinkage factor. The shrinkage factor is given by:
 
@@ -898,7 +898,7 @@ where :math:`m` is a smoothing factor, which is controlled with the `smooth`
 parameter in :class:`TargetEncoder`. Large smoothing factors will put more
 weight on the global mean. When `smooth="auto"`, the smoothing factor is
 computed as an empirical Bayes estimate: :math:`m=\sigma_i^2/\tau^2`, where
-:math:`\sigma_i^2` is the variance of `y` with category :math:`i` and
+:math:`\sigma_i^2` is the variance of `y ` with category :math:`i` and
 :math:`\tau^2` is the global variance of `y`.
 
 For continuous targets, the formulation is similar to binary classification:

From 040274f593f4b835b3f79d9115bd04bc0793b99d Mon Sep 17 00:00:00 2001
From: Lucy Liu
Date: Fri, 16 Jun 2023 17:21:09 +1000
Subject: [PATCH 3/3] remove typo

---
 doc/modules/preprocessing.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/modules/preprocessing.rst b/doc/modules/preprocessing.rst
index a7c04634c93ac..c2b3aa1accf63 100644
--- a/doc/modules/preprocessing.rst
+++ b/doc/modules/preprocessing.rst
@@ -898,7 +898,7 @@ where :math:`m` is a smoothing factor, which is controlled with the `smooth`
 parameter in :class:`TargetEncoder`. Large smoothing factors will put more
 weight on the global mean. When `smooth="auto"`, the smoothing factor is
 computed as an empirical Bayes estimate: :math:`m=\sigma_i^2/\tau^2`, where
-:math:`\sigma_i^2` is the variance of `y ` with category :math:`i` and
+:math:`\sigma_i^2` is the variance of `y` with category :math:`i` and
 :math:`\tau^2` is the global variance of `y`.
 
 For continuous targets, the formulation is similar to binary classification:
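A similar sketch covers the `smooth="auto"` case with a continuous target. Here the smoothing factor is computed per category as m = sigma_i^2 / tau^2 and plugged into lambda_i = n_i / (m + n_i). Several choices are assumptions rather than something the patches state: both variances are population variances (ddof=0), and a constant target (tau^2 == 0) falls back to m = 0. `auto_smoothed_encoding` is a hypothetical helper, not scikit-learn code.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def auto_smoothed_encoding(categories, y):
    """Continuous-target encoding with an empirical Bayes smoothing factor.

    m_i = sigma_i^2 / tau^2, lambda_i = n_i / (m_i + n_i), and
    S_i = lambda_i * mean(y over L_i) + (1 - lambda_i) * mean(y).
    Population variances (ddof=0) are an assumption of this sketch.
    """
    tau2 = pvariance(y)                            # global variance of y
    global_mean = fmean(y)
    groups = defaultdict(list)
    for cat, target in zip(categories, y):
        groups[cat].append(target)                 # L_i: targets with category i
    encoding = {}
    for cat, values in groups.items():
        n_i = len(values)
        # sigma_i^2 / tau^2; assumed fallback m = 0 when y is constant
        m = pvariance(values) / tau2 if tau2 > 0 else 0.0
        lam = n_i / (m + n_i)                      # shrinkage factor lambda_i
        encoding[cat] = lam * fmean(values) + (1 - lam) * global_mean
    return encoding


# a -> 29/13 (about 2.23), b -> 5.0
print(auto_smoothed_encoding(["a", "a", "b", "b"], [1.0, 3.0, 5.0, 5.0]))
```

A category whose targets vary little relative to the global variance gets a small m and keeps its own mean. In the example, category `b` has zero variance and encodes to exactly 5.0, while the noisier category `a` is pulled from 2.0 toward the global mean of 3.5.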