From 6c04aef0572a4e640bf40312affdfe24ba47b3e9 Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Thu, 5 Jan 2023 15:37:21 +0100
Subject: [PATCH 1/9] Make MeanShift explanation clearer

---
 doc/modules/clustering.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
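
Reviewer note (not part of the commit): the update rule in this patch,
x^{t+1} = x^t + m(x^t) with m(x) equal to the mean of N(x) minus x, can be
sketched as below. This is only a minimal illustration under stated
assumptions, not the scikit-learn implementation: the function names, the
Euclidean-ball neighborhood, and the `tol` / `max_iter` stopping rule are
hypothetical choices made for the example.

```python
import numpy as np


def mean_shift_vector(x, samples, bandwidth):
    """Flat-kernel mean shift vector: mean of the neighbors of x, minus x."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # N(x): samples within `bandwidth` (Euclidean distance) of x.
    dists = np.linalg.norm(samples - x, axis=1)
    neighbors = samples[dists <= bandwidth]
    if len(neighbors) == 0:
        # Assumption for this sketch: an empty neighborhood means no shift.
        return np.zeros_like(x)
    return neighbors.mean(axis=0) - x


def shift_until_converged(x, samples, bandwidth, tol=1e-6, max_iter=300):
    """Apply x <- x + m(x) until the shift is smaller than `tol`."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        m = mean_shift_vector(x, samples, bandwidth)
        x = x + m
        if np.linalg.norm(m) < tol:
            break
    return x
```

Because m(x) is the neighborhood mean minus x, each step x + m(x) moves the
candidate exactly onto the mean of its current neighborhood.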

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 48a59e590a8e7..79b7d59098531 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -392,22 +392,22 @@ for centroids to be the mean of the points within a given region. These
 candidates are then filtered in a post-processing stage to eliminate
 near-duplicates to form the final set of centroids.
 
-Given a candidate centroid :math:`x_i` for iteration :math:`t`, the candidate
+Given a candidate centroid :math:`x` for iteration :math:`t`, the candidate
 is updated according to the following equation:
 
 .. math::
 
-    x_i^{t+1} = m(x_i^t)
+    x^{t+1} = x^t + m(x^t)
 
-Where :math:`N(x_i)` is the neighborhood of samples within a given distance
-around :math:`x_i` and :math:`m` is the *mean shift* vector that is computed for each
+Where :math:`N(x)` is the neighborhood of samples within a given distance
+around :math:`x` and :math:`m` is the *mean shift* vector that is computed for each
 centroid that points towards a region of the maximum increase in the density of points.
 This is computed using the following equation, effectively updating a centroid
 to be the mean of the samples within its neighborhood:
 
 .. math::
 
-    m(x_i) = \frac{\sum_{x_j \in N(x_i)}K(x_j - x_i)x_j}{\sum_{x_j \in N(x_i)}K(x_j - x_i)}
+    m(x_i)  = \frac{1}{|N(x_i)|} \sum_{x_j \in N(x_i)}x_j - x
 
 The algorithm automatically sets the number of clusters, instead of relying on a
 parameter ``bandwidth``, which dictates the size of the region to search through.

From 646f5b6d1116c8f530fb4eb218e5c48705d9d248 Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Thu, 5 Jan 2023 15:40:59 +0100
Subject: [PATCH 2/9] Fix kernel name

---
 sklearn/cluster/_mean_shift.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sklearn/cluster/_mean_shift.py b/sklearn/cluster/_mean_shift.py
index f886fe392f4bf..407b7b3012fa0 100644
--- a/sklearn/cluster/_mean_shift.py
+++ b/sklearn/cluster/_mean_shift.py
@@ -288,7 +288,7 @@ class MeanShift(ClusterMixin, BaseEstimator):
     Parameters
     ----------
     bandwidth : float, default=None
-        Bandwidth used in the RBF kernel.
+        Bandwidth used in the flat kernel.
 
         If not given, the bandwidth is estimated using
         sklearn.cluster.estimate_bandwidth; see the documentation for that

From 79f693ce18333c3641567d4c0dc9f126b1f3f267 Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Thu, 5 Jan 2023 16:47:03 +0100
Subject: [PATCH 3/9] Remove indices

---
 doc/modules/clustering.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 79b7d59098531..23f8b601d53bf 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -407,7 +407,7 @@ to be the mean of the samples within its neighborhood:
 
 .. math::
 
-    m(x_i)  = \frac{1}{|N(x_i)|} \sum_{x_j \in N(x_i)}x_j - x
+    m(x)  = \frac{1}{|N(x)|} \sum_{x_j \in N(x)}x_j - x
 
 The algorithm automatically sets the number of clusters, instead of relying on a
 parameter ``bandwidth``, which dictates the size of the region to search through.

From 6f72fba2dc806275adac10e4343b55abf2768818 Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Sun, 22 Jan 2023 10:37:07 +0100
Subject: [PATCH 4/9] Restructure paragraph

---
 doc/modules/clustering.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 23f8b601d53bf..145aee1ce2a2d 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -399,11 +399,12 @@ is updated according to the following equation:
 
     x^{t+1} = x^t + m(x^t)
 
-Where :math:`N(x)` is the neighborhood of samples within a given distance
-around :math:`x` and :math:`m` is the *mean shift* vector that is computed for each
+Where :math:`m` is the *mean shift* vector that is computed for each
 centroid that points towards a region of the maximum increase in the density of points.
-This is computed using the following equation, effectively updating a centroid
-to be the mean of the samples within its neighborhood:
+To compute :math:`m` we define :math:`N(x)` as the neighborhood of samples within
+a given distance around :math:`x`. Then :math:`m` is computed using the following
+equation, effectively updating a centroid to be the mean of the samples within
+its neighborhood:
 
 .. math::
 

From cf80f736e1c3ec0387b8fa1056578377ff1c26a2 Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Sun, 22 Jan 2023 10:37:44 +0100
Subject: [PATCH 5/9] Add multiplication sign

---
 doc/modules/clustering.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 145aee1ce2a2d..2292206c445bb 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -397,7 +397,7 @@ is updated according to the following equation:
 
 .. math::
 
-    x^{t+1} = x^t + m(x^t)
+    x^{t+1} = x^t + m * (x^t)
 
 Where :math:`m` is the *mean shift* vector that is computed for each
 centroid that points towards a region of the maximum increase in the density of points.

From eda2948412282f9170a8e5b53a9fb4c5d8ec1386 Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Sun, 22 Jan 2023 10:40:56 +0100
Subject: [PATCH 6/9] Remove multiplication sign

---
 doc/modules/clustering.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 2292206c445bb..145aee1ce2a2d 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -397,7 +397,7 @@ is updated according to the following equation:
 
 .. math::
 
-    x^{t+1} = x^t + m * (x^t)
+    x^{t+1} = x^t + m(x^t)
 
 Where :math:`m` is the *mean shift* vector that is computed for each
 centroid that points towards a region of the maximum increase in the density of points.

From d94756541baedb6a20b6fc7d913253bc101bc57a Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Sun, 22 Jan 2023 10:51:58 +0100
Subject: [PATCH 7/9] Mention hill climbing

---
 doc/modules/clustering.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 145aee1ce2a2d..72a8afcbc6be0 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -392,6 +392,8 @@ for centroids to be the mean of the points within a given region. These
 candidates are then filtered in a post-processing stage to eliminate
 near-duplicates to form the final set of centroids.
 
+The position of centroid candidates is iteratively adjusted using a technique called hill
+climbing, which finds local maxima of the estimated probability density.
 Given a candidate centroid :math:`x` for iteration :math:`t`, the candidate
 is updated according to the following equation:
 

From 5c1944e4b3428275eb14dbe874009fcd3108ba73 Mon Sep 17 00:00:00 2001
From: Adam Kania <48769688+remilvus@users.noreply.github.com>
Date: Sun, 22 Jan 2023 11:05:29 +0100
Subject: [PATCH 8/9] Add kernel explanation

---
 doc/modules/clustering.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)
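
Reviewer note (not part of the commit): the sketch below checks numerically
that the generic kernel-weighted formula added in this patch reduces to the
plain neighborhood mean when K is the flat kernel. The helper names are
hypothetical, the neighborhood is assumed to be a Euclidean ball of radius
`bandwidth`, and at least one sample is assumed to fall inside it (otherwise
the denominator is zero).

```python
import numpy as np


def flat_kernel(diff, bandwidth):
    """K(d) = 1 if ||d|| <= bandwidth, else 0 (assumed Euclidean norm)."""
    return (np.linalg.norm(diff, axis=-1) <= bandwidth).astype(float)


def kernel_mean_shift(x, samples, kernel):
    """Generic form: sum_j K(x_j - x) x_j / sum_j K(x_j - x), minus x."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # With a flat kernel, samples outside N(x) get weight 0, so summing
    # over all samples equals summing over the neighborhood only.
    weights = kernel(samples - x)
    return (weights[:, None] * samples).sum(axis=0) / weights.sum() - x
```

With the flat kernel every sample inside the neighborhood has weight 1, so
the weighted average is the ordinary mean of N(x) and the two formulas agree.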

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 72a8afcbc6be0..b677fb33fc2f2 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -412,6 +412,17 @@ its neighborhood:
 
     m(x)  = \frac{1}{|N(x)|} \sum_{x_j \in N(x)}x_j - x
 
+In general, the equation for :math:`m` depends on a kernel used for density estimation. 
+The generic formula is:
+
+.. math::
+
+    m(x)  = \frac{\sum_{x_j \in N(x)}K(x_j - x)x_j}{\sum_{x_j \in N(x)}K(x_j - x)} - x
+
+In our implementation, :math:`K(x)` is equal to 1 if :math:`x` is small enough and is 
+equal to 0 otherwise. Effectively :math:`K(y - x)` indicates whether :math:`y` is in
+the neighborhood of :math:`x`.
+
 The algorithm automatically sets the number of clusters, instead of relying on a
 parameter ``bandwidth``, which dictates the size of the region to search through.
 This parameter can be set manually, but can be estimated using the provided

From 9a5ed894bdca0b55fc2c83431a252387c71c6ff2 Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Date: Mon, 23 Jan 2023 18:22:03 +0100
Subject: [PATCH 9/9] Update doc/modules/clustering.rst

---
 doc/modules/clustering.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index b677fb33fc2f2..775f8c8180d14 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -410,14 +410,14 @@ its neighborhood:
 
 .. math::
 
-    m(x)  = \frac{1}{|N(x)|} \sum_{x_j \in N(x)}x_j - x
+    m(x) = \frac{1}{|N(x)|} \sum_{x_j \in N(x)}x_j - x
 
 In general, the equation for :math:`m` depends on a kernel used for density estimation. 
 The generic formula is:
 
 .. math::
 
-    m(x)  = \frac{\sum_{x_j \in N(x)}K(x_j - x)x_j}{\sum_{x_j \in N(x)}K(x_j - x)} - x
+    m(x) = \frac{\sum_{x_j \in N(x)}K(x_j - x)x_j}{\sum_{x_j \in N(x)}K(x_j - x)} - x
 
 In our implementation, :math:`K(x)` is equal to 1 if :math:`x` is small enough and is 
 equal to 0 otherwise. Effectively :math:`K(y - x)` indicates whether :math:`y` is in