@@ -252,7 +252,11 @@ the most samples (just like for continuous features). When predicting,
categories that were not seen during fit time will be treated as missing
values.

- **Split finding with categorical features**: The canonical way of considering
+ |details-start|
+ **Split finding with categorical features**:
+ |details-split|
+
+ The canonical way of considering
categorical splits in a tree is to consider
all of the :math:`2^{K - 1} - 1` partitions, where :math:`K` is the number of
categories. This can quickly become prohibitive when :math:`K` is large.
@@ -267,6 +271,8 @@ instead of :math:`2^{K - 1} - 1`. The initial sorting is a
:math:`\mathcal{O}(K \log(K))` operation, leading to a total complexity of
:math:`\mathcal{O}(K \log(K) + K)`, instead of :math:`\mathcal{O}(2^K)`.

+ |details-end|
+
.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py`
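
As an illustration of the strategy just described, here is a minimal NumPy
sketch of the sort-then-scan idea on synthetic data (the data, the
variance-reduction gain, and all names are illustrative assumptions, not
scikit-learn's actual implementation)::

    import numpy as np

    rng = np.random.default_rng(0)
    K = 6                                      # number of categories
    cats = rng.integers(0, K, size=300)        # categorical feature encoded as 0..K-1
    y = rng.normal(loc=0.5 * cats, scale=1.0)  # target loosely tied to the category

    # 1. Sort the categories by the mean of the target within each category.
    means = np.array([y[cats == k].mean() for k in range(K)])
    order = np.argsort(means)

    # 2. Scan only the K - 1 "continuous-like" split points along that ordering,
    #    instead of enumerating all 2**(K - 1) - 1 possible partitions.
    best_gain, best_split = -np.inf, None
    for i in range(1, K):
        left = np.isin(cats, order[:i])        # categories sent to the left child
        p_left = left.mean()
        gain = y.var() - (p_left * y[left].var() + (1 - p_left) * y[~left].var())
        if gain > best_gain:
            best_gain, best_split = gain, set(order[:i].tolist())
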
@@ -444,8 +450,9 @@ The usage and the parameters of :class:`GradientBoostingClassifier` and
:class:`GradientBoostingRegressor` are described below. The 2 most important
parameters of these estimators are `n_estimators` and `learning_rate`.

- Classification
- ^^^^^^^^^^^^^^^
+ |details-start|
+ **Classification**
+ |details-split|

:class:`GradientBoostingClassifier` supports both binary and multi-class
classification.
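
For orientation, a minimal multi-class fit might look as follows (a hedged
sketch; the dataset and the parameter values are arbitrary)::

    from sklearn.datasets import load_iris
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # n_estimators sets the number of boosting stages; learning_rate scales
    # the contribution of each tree.  The values below are illustrative.
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                     random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))
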
@@ -482,8 +489,11 @@ depth via ``max_depth`` or by setting the number of leaf nodes via
:class:`HistGradientBoostingClassifier` as an alternative to
:class:`GradientBoostingClassifier`.

- Regression
- ^^^^^^^^^^^
+ |details-end|
+
+ |details-start|
+ **Regression**
+ |details-split|

:class:`GradientBoostingRegressor` supports a number of
:ref:`different loss functions <gradient_boosting_loss>`
@@ -524,6 +534,8 @@ to determine the optimal number of trees (i.e. ``n_estimators``) by early stopping.
:align: center
:scale: 75

+ |details-end|
+
.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regression.py`
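
One hedged sketch of the early stopping mentioned above, assuming the
``n_iter_no_change`` and ``validation_fraction`` parameters of
:class:`GradientBoostingRegressor` (the dataset and values are illustrative)::

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=1000, noise=10.0, random_state=0)

    # Hold out 10% of the training data internally and stop adding trees once
    # the validation score has not improved for 10 consecutive iterations.
    est = GradientBoostingRegressor(n_estimators=1000, n_iter_no_change=10,
                                    validation_fraction=0.1, random_state=0)
    est.fit(X, y)
    print(est.n_estimators_)  # trees actually fitted, typically far fewer than 1000
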
@@ -580,8 +592,9 @@ Mathematical formulation
We first present GBRT for regression, and then detail the classification
case.

- Regression
- ...........
+ |details-start|
+ **Regression**
+ |details-split|

GBRT regressors are additive models whose prediction :math:`\hat{y}_i` for a
given input :math:`x_i` is of the following form:
@@ -663,8 +676,11 @@ space.
update is loss-dependent: for the absolute error loss, the value of
a leaf is updated to the median of the samples in that leaf.

- Classification
- ..............
+ |details-end|
+
+ |details-start|
+ **Classification**
+ |details-split|

Gradient boosting for classification is very similar to the regression case.
However, the sum of the trees :math:`F_M(x_i) = \sum_m h_m(x_i)` is not
@@ -685,6 +701,8 @@ still a regressor, not a classifier. This is because the sub-estimators are
trained to predict (negative) *gradients*, which are always continuous
quantities.

+ |details-end|
+
.. _gradient_boosting_loss:

Loss Functions
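
To make the earlier point concrete, namely that the summed trees yield a raw
score which is only then mapped to a probability, the two can be compared
directly (a minimal sketch on an arbitrary dataset; the sigmoid identity shown
holds for the binary log-loss case)::

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(random_state=0)
    clf = GradientBoostingClassifier(random_state=0).fit(X, y)

    raw = clf.decision_function(X[:3])       # F_M(x_i): a continuous, unbounded score
    proba = clf.predict_proba(X[:3])[:, 1]   # the same score mapped to a probability

    # For the binary log-loss, the mapping is the sigmoid of the raw score.
    print(np.allclose(proba, 1.0 / (1.0 + np.exp(-raw))))
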
@@ -693,7 +711,9 @@ Loss Functions
The following loss functions are supported and can be specified using
the parameter ``loss``:

- * Regression
+ |details-start|
+ **Regression**
+ |details-split|

* Squared error (``'squared_error'``): The natural choice for regression
due to its superior computational properties. The initial model is
@@ -710,7 +730,12 @@ the parameter ``loss``:
can be used to create prediction intervals
(see :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_quantile.py`).

- * Classification
+ |details-end|
+
+
+ |details-start|
+ **Classification**
+ |details-split|

* Binary log-loss (``'log-loss'``): The binomial
negative log-likelihood loss function for binary classification. It provides
@@ -728,6 +753,8 @@ the parameter ``loss``:
examples than ``'log-loss'``; can only be used for binary
classification.

+ |details-end|
+
.. _gradient_boosting_shrinkage:

Shrinkage via learning rate
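
Returning briefly to the quantile loss above, here is a hedged sketch of the
prediction-interval use case (the quantile levels and the data are
illustrative; the linked example gives the full treatment)::

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=500, noise=20.0, random_state=0)

    # One model per quantile: the 5th and 95th percentiles bound a 90% interval.
    lower = GradientBoostingRegressor(loss="quantile", alpha=0.05, random_state=0)
    upper = GradientBoostingRegressor(loss="quantile", alpha=0.95, random_state=0)
    lower.fit(X, y)
    upper.fit(X, y)

    intervals = list(zip(lower.predict(X[:5]), upper.predict(X[:5])))
    print(intervals)
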
@@ -1356,8 +1383,28 @@ Vector Machine, a Decision Tree, and a K-nearest neighbor classifier::
:align: center
:scale: 75%

- Using the `VotingClassifier` with `GridSearchCV`
- ------------------------------------------------
+ Usage
+ -----
+
+ In order to predict the class labels based on the predicted
+ class-probabilities (scikit-learn estimators in the VotingClassifier
+ must support the ``predict_proba`` method)::
+
+ >>> eclf = VotingClassifier(
+ ...     estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
+ ...     voting='soft'
+ ... )
+
+ Optionally, weights can be provided for the individual classifiers::
+
+ >>> eclf = VotingClassifier(
+ ...     estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
+ ...     voting='soft', weights=[2,5,1]
+ ... )
+
+ |details-start|
+ **Using the `VotingClassifier` with `GridSearchCV`**
+ |details-split|

The :class:`VotingClassifier` can also be used together with
:class:`~sklearn.model_selection.GridSearchCV` in order to tune the
@@ -1377,24 +1424,7 @@ hyperparameters of the individual estimators::
>>> grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)
>>> grid = grid.fit(iris.data, iris.target)

- Usage
- -----
-
- In order to predict the class labels based on the predicted
- class-probabilities (scikit-learn estimators in the VotingClassifier
- must support ``predict_proba`` method)::
-
- >>> eclf = VotingClassifier(
- ...     estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
- ...     voting='soft'
- ... )
-
- Optionally, weights can be provided for the individual classifiers::
-
- >>> eclf = VotingClassifier(
- ...     estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
- ...     voting='soft', weights=[2,5,1]
- ... )
+ |details-end|

.. _voting_regressor:
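
For completeness, here is a self-contained sketch of the parameter grid that
the ``GridSearchCV`` snippet above abbreviates, reusing the ``'lr'``, ``'rf'``
and ``'gnb'`` labels from the earlier examples; the double-underscore naming
routes each setting to the named sub-estimator, and the concrete values are
illustrative::

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import GaussianNB

    iris = load_iris()
    clf1 = LogisticRegression(random_state=1)
    clf2 = RandomForestClassifier(random_state=1)
    clf3 = GaussianNB()
    eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
                            voting='soft')

    # '<estimator label>__<parameter>' routes each entry to that sub-estimator.
    params = {'lr__C': [1.0, 100.0], 'rf__n_estimators': [20, 200]}
    grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)
    grid = grid.fit(iris.data, iris.target)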