@@ -206,8 +206,9 @@ Note the use of a generator comprehension,
 which introduces laziness into the feature extraction:
 tokens are only processed on demand from the hasher.
 
-Implementation details
-----------------------
+|details-start|
+**Implementation details**
+|details-split|
 
 :class:`FeatureHasher` uses the signed 32-bit variant of MurmurHash3.
 As a result (and because of limitations in ``scipy.sparse``),
@@ -223,16 +224,18 @@ Since a simple modulo is used to transform the hash function to a column index,
 it is advisable to use a power of two as the ``n_features`` parameter;
 otherwise the features will not be mapped evenly to the columns.
 
+.. topic:: References:
+
+  * `MurmurHash3 <https://github.com/aappleby/smhasher>`_.
+
+|details-end|
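The power-of-two advice above can be sketched with a minimal ``FeatureHasher`` call (the token lists are made-up illustrations, not from the documentation):

```python
from sklearn.feature_extraction import FeatureHasher

# n_features is a power of two, so the modulo of the signed 32-bit
# MurmurHash3 value spreads tokens evenly over the columns.
hasher = FeatureHasher(n_features=2**10, input_type="string")
X = hasher.transform([["cat", "dog", "cat"], ["bird"]])
print(X.shape)  # (2, 1024)
```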
 
 .. topic:: References:
 
  * Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola and
    Josh Attenberg (2009). `Feature hashing for large scale multitask learning
    <https://alex.smola.org/papers/2009/Weinbergeretal09.pdf>`_. Proc. ICML.
 
- * `MurmurHash3 <https://github.com/aappleby/smhasher>`_.
-
 
 .. _text_feature_extraction:
 
 Text feature extraction
@@ -395,8 +398,9 @@ last document::
 
 .. _stop_words:
 
-Using stop words
-................
+|details-start|
+**Using stop words**
+|details-split|
 
 Stop words are words like "and", "the", "him", which are presumed to be
 uninformative in representing the content of a text, and which may be
@@ -426,6 +430,9 @@ identify and warn about some kinds of inconsistencies.
     <https://aclweb.org/anthology/W18-2502>`__.
     In *Proc. Workshop for NLP Open Source Software*.
 
+
+|details-end|
+
 
 .. _tfidf:
 
 Tf–idf term weighting
@@ -490,6 +497,10 @@ class::
 Again, please see the :ref:`reference documentation
 <text_feature_extraction_ref>` for the details on all the parameters.
 
+|details-start|
+**Numeric example of a tf-idf matrix**
+|details-split|
+
 Let's take an example with the following counts. The first term is present
 100% of the time, hence not very interesting. The two other features occur
 in less than 50% of the documents, hence probably more representative of the
@@ -609,6 +620,7 @@ feature extractor with a classifier:
 
 * :ref:`sphx_glr_auto_examples_model_selection_plot_grid_search_text_feature_extraction.py`
 
+|details-end|
 
 Decoding text files
 -------------------
@@ -637,6 +649,10 @@ or ``"replace"``. See the documentation for the Python function
 ``bytes.decode`` for more details
 (type ``help(bytes.decode)`` at the Python prompt).
 
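The error handlers mentioned above can be illustrated with plain ``bytes.decode``, no scikit-learn involved (the byte string is a made-up example):

```python
# 'café' encoded as Latin-1; the trailing \xe9 is not valid UTF-8.
raw = b"caf\xe9"
print(raw.decode("latin-1"))                  # café
# errors="replace" substitutes U+FFFD for undecodable bytes
print(raw.decode("utf-8", errors="replace"))  # caf<U+FFFD>
```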
+|details-start|
+**Troubleshooting decoding text**
+|details-split|
+
 If you are having trouble decoding text, here are some things to try:
 
 - Find out what the actual encoding of the text is. The file might come
@@ -690,6 +706,7 @@ About Unicode <https://www.joelonsoftware.com/articles/Unicode.html>`_.
 
 .. _`ftfy`: https://github.com/LuminosoInsight/python-ftfy
 
+|details-end|
 
 Applications and examples
 -------------------------
@@ -870,8 +887,9 @@ The :class:`HashingVectorizer` also comes with the following limitations:
   model. A :class:`TfidfTransformer` can be appended to it in a pipeline if
   required.
 
-Performing out-of-core scaling with HashingVectorizer
------------------------------------------------------
+|details-start|
+**Performing out-of-core scaling with HashingVectorizer**
+|details-split|
 
 An interesting development of using a :class:`HashingVectorizer` is the ability
 to perform `out-of-core`_ scaling. This means that we can learn from data that
@@ -890,6 +908,8 @@ time is often limited by the CPU time one wants to spend on the task.
 
 For a full-fledged example of out-of-core scaling in a text classification
 task see :ref:`sphx_glr_auto_examples_applications_plot_out_of_core_classification.py`.
 
+|details-end|
+
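A minimal out-of-core sketch: because ``HashingVectorizer`` is stateless, each mini-batch can be vectorized independently and fed to an incremental learner via ``partial_fit`` (the toy batches and labels are made up):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# HashingVectorizer needs no fit, so batches never have to coexist in memory.
vectorizer = HashingVectorizer(n_features=2**18)
clf = SGDClassifier()
for texts, y in [(["good movie", "great plot"], [1, 1]),
                 (["bad movie", "awful plot"], [0, 0])]:
    X = vectorizer.transform(texts)       # vectorize one mini-batch at a time
    clf.partial_fit(X, y, classes=[0, 1])  # incremental update of the model
```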
 
 Customizing the vectorizer classes
 ----------------------------------
@@ -928,6 +948,10 @@ parameters it is possible to derive from the class and override the
 ``build_preprocessor``, ``build_tokenizer`` and ``build_analyzer``
 factory methods instead of passing custom functions.
 
+|details-start|
+**Tips and tricks**
+|details-split|
+
 Some tips and tricks:
 
 * If documents are pre-tokenized by an external package, then store them in
@@ -982,6 +1006,8 @@ Some tips and tricks:
 
 Customizing the vectorizer can also be useful when handling Asian languages
 that do not use an explicit word separator such as whitespace.
 
+|details-end|
+
 
 .. _image_feature_extraction:
 
 Image feature extraction