[draft] Use the same 'fruit' example to drive the conversation · scikit-learn/scikit-learn@b17598a · GitHub

Commit b17598a

Author: Joan Massich (committed)
[draft] Use the same 'fruit' example to drive the conversation
1 parent 0aee15a commit b17598a

1 file changed: doc/modules/multiclass.rst (+65 -36)

@@ -14,46 +14,75 @@ Multiclass and multilabel algorithms
 
 The :mod:`sklearn.multiclass` module implements *meta-estimators* to solve
 ``multiclass`` and ``multilabel`` classification problems
-by decomposing such problems into binary classification problems. Multioutput
+by decomposing such problems into binary classification problems. ``multioutput``
 regression is also supported.
 
-- **Multiclass classification** means a classification task with more than
-  two classes; e.g., classify a set of images of fruits which may be oranges,
-  apples, or pears. Multiclass classification makes the assumption that each
-  sample is assigned to one and only one label: a fruit can be either an
-  apple or a pear but not both at the same time.
+- **Multiclass classification** produces a single output that is a categorical
+  variable. In other words, it is a single classification task with more than
+  two classes.
 
-- **Multilabel classification** assigns to each sample a set of target
-  labels. This can be thought as predicting properties of a data-point
-  that are not mutually exclusive, such as topics that are relevant for a
-  document. A text might be about any of religion, politics, finance or
-  education at the same time or none of these.
+  - valid :term:`multiclass` representations for
+    :func:`~utils.multiclass.type_of_target` (`y`) are:
 
-- **Multioutput regression** assigns each sample a set of target
-  values. This can be thought of as predicting several properties
-  for each data-point, such as wind direction and magnitude at a
-  certain location.
+    - 1d or column vector containing more than two discrete values.
+    - sparse :term:`binary` matrix of shape ``(n_samples, n_classes)`` with a
+      single element per row, where each column represents one class.
 
-- **Multioutput-multiclass classification**
+  - *example:* classify a set of images of fruits which may be oranges, apples,
+    or pears. Multiclass classification makes the assumption that each sample is
+    assigned to one and only one label: a fruit can be either an apple or a pear
+    but not both at the same time.
+
+- **Multilabel classification** predicts a set of binary attributes that can
+  each be true or false independently of one another. In other words, it
+  assigns to each sample a set of target labels. (This task can also be seen
+  as a binary-label multioutput task.)
+
+  - a valid :term:`multilabel` representation of `y` is:
+
+    - a dense or sparse :term:`binary` matrix of shape ``(n_samples, n_classes)``
+      with multiple active elements per row to denote that the sample belongs
+      to multiple classes. Each column represents a class.
+
+  - *example:* based on an arbitrary set of features from fruit images,
+    **Multilabel classification** simultaneously predicts a set of binary
+    attributes such as: grows on a tree, has a stone, is citric ...
+
+- **Multioutput regression** predicts multiple outputs that are all continuous
+  variables. In other words, it assigns each sample a set of target values.
+
+  - a valid representation of multioutput `y` is:
+
+    - a dense matrix of shape ``(n_samples, n_outputs)`` of floats: a
+      column-wise concatenation of :term:`continuous` variables.
+
+  - *example:* based on an arbitrary set of features from fruit images,
+    multioutput regression predicts a set of :term:`continuous` variables such
+    as: weight, sugar content, calories, etc.
+
+- **Multioutput-multiclass classification**
   (also known as **multi-task classification**)
   means that a single estimator has to handle several joint classification
   tasks. This is both a generalization of the multi-label classification
   task, which only considers binary classification, as well as a
-  generalization of the multi-class classification task. *The output format
-  is a 2d numpy array or sparse matrix.*
-
-  The set of labels can be different for each output variable.
-  For instance, a sample could be assigned "pear" for an output variable that
-  takes possible values in a finite set of species such as "pear", "apple";
-  and "blue" or "green" for a second output variable that takes possible values
-  in a finite set of colors such as "green", "red", "blue", "yellow"...
-
-  This means that any classifiers handling multi-output
-  multiclass or multi-task classification tasks,
-  support the multi-label classification task as a special case.
-  Multi-task classification is similar to the multi-output
-  classification task with different model formulations. For
-  more information, see the relevant estimator documentation.
+  generalization of the multi-class classification task.
+
+  - a valid representation of multiclass-multioutput `y` is:
+
+    - a dense matrix of shape ``(n_samples, n_classes)`` of class labels: a
+      column-wise concatenation of 1d :term:`multiclass` variables.
+
+  - *example:* the set of labels can be different for each output variable.
+    For instance, a sample could be assigned "pear" for an output variable that
+    takes possible values in a finite set of species such as "pear", "apple";
+    and "blue" or "green" for a second output variable that takes possible
+    values in a finite set of colors such as "green", "red", "blue", "yellow"...
+
+  - Note that any classifier handling multi-output multiclass or multi-task
+    classification tasks supports the multi-label classification task as a
+    special case. Multi-task classification is similar to the multi-output
+    classification task with different model formulations. For more
+    information, see the relevant estimator documentation.
+
 
 All scikit-learn classifiers are capable of multiclass classification,
 but the meta-estimators offered by :mod:`sklearn.multiclass`
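
As a quick illustration of the four target representations described in the
hunk above, the sketch below (not part of the commit; the fruit-themed arrays
are invented for illustration) runs `sklearn.utils.multiclass.type_of_target`
on each kind of `y`. The returned strings are the target-type names
scikit-learn uses for these shapes:

    >>> import numpy as np
    >>> from sklearn.utils.multiclass import type_of_target

    >>> # a 1d vector of fruit species: multiclass
    >>> type_of_target(np.array(['apple', 'pear', 'orange', 'apple']))
    'multiclass'

    >>> # a binary indicator matrix of independent attributes
    >>> # (grows on a tree, has a stone, is citric): multilabel
    >>> type_of_target(np.array([[1, 0, 0],
    ...                          [1, 1, 0],
    ...                          [1, 0, 1]]))
    'multilabel-indicator'

    >>> # a column-wise concatenation of continuous variables
    >>> # (weight, sugar content): multioutput regression
    >>> type_of_target(np.array([[150.0, 10.2],
    ...                          [120.0, 9.1]]))
    'continuous-multioutput'

    >>> # a column-wise concatenation of 1d multiclass variables
    >>> # (species, colour): multioutput-multiclass
    >>> type_of_target(np.array([['apple', 'green'],
    ...                          ['pear', 'green'],
    ...                          ['orange', 'orange']]))
    'multiclass-multioutput'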
@@ -168,7 +197,7 @@ This strategy, also known as **one-vs-all**, is implemented in
 per class. For each classifier, the class is fitted against all the other
 classes. In addition to its computational efficiency (only `n_classes`
 classifiers are needed), one advantage of this approach is its
-interpretability. Since each class is represented by one and only one classifier,
+interpretability. Since each class is represented by one and only one classifier,
 it is possible to gain knowledge about the class by inspecting its
 corresponding classifier. This is the most commonly used strategy and is a fair
 default choice.
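
A minimal sketch of the one-vs-rest behaviour described above (not taken from
the commit; the iris data and the logistic-regression base estimator are
arbitrary choices for illustration): one binary classifier is fitted per
class, and each fitted classifier can be inspected on its own:

    >>> from sklearn.datasets import load_iris
    >>> from sklearn.linear_model import LogisticRegression
    >>> from sklearn.multiclass import OneVsRestClassifier
    >>> X, y = load_iris(return_X_y=True)
    >>> ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
    >>> len(ovr.estimators_)    # one binary classifier per class
    3
    >>> ovr.classes_            # inspect e.g. ovr.estimators_[0] for class 0
    array([0, 1, 2])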
@@ -371,7 +400,7 @@ that are trained on a single X predictor matrix to predict a series
 of responses (y1,y2,y3...,yn).
 
 Below is an example of multioutput classification:
-
+
 >>> from sklearn.datasets import make_classification
 >>> from sklearn.multioutput import MultiOutputClassifier
 >>> from sklearn.ensemble import RandomForestClassifier
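
The documentation's example continues past the lines shown in this hunk. As a
hedged sketch of how these imports fit together (the target construction below
is illustrative and not necessarily the documentation's exact code), a
multi-output target can be built by stacking correlated 1d targets column-wise
and handing the result to `MultiOutputClassifier`:

    >>> import numpy as np
    >>> from sklearn.datasets import make_classification
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> from sklearn.multioutput import MultiOutputClassifier
    >>> from sklearn.utils import shuffle
    >>> X, y1 = make_classification(n_samples=10, n_features=100,
    ...                             n_informative=30, n_classes=3,
    ...                             random_state=1)
    >>> y2 = shuffle(y1, random_state=1)    # a second, correlated output
    >>> Y = np.vstack((y1, y2)).T           # shape (n_samples, n_outputs)
    >>> clf = MultiOutputClassifier(RandomForestClassifier(random_state=1))
    >>> clf.fit(X, Y).predict(X).shape      # one prediction per sample and output
    (10, 2)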
@@ -434,7 +463,7 @@ averaged together.
 Regressor Chain
 ================
 
-Regressor chains (see :class:`RegressorChain`) is analogous to
-ClassifierChain as a way of combining a number of regressions
-into a single multi-target model that is capable of exploiting
+Regressor chains (see :class:`RegressorChain`) is analogous to
+ClassifierChain as a way of combining a number of regressions
+into a single multi-target model that is capable of exploiting
 correlations among targets.
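
A short sketch of the regressor-chain usage described here (not taken from the
commit; the synthetic data and the linear base estimator are arbitrary): each
target is predicted from the input features plus the predictions for the
earlier targets in the chain:

    >>> from sklearn.datasets import make_regression
    >>> from sklearn.linear_model import LinearRegression
    >>> from sklearn.multioutput import RegressorChain
    >>> X, Y = make_regression(n_samples=10, n_features=5, n_targets=3,
    ...                        random_state=0)
    >>> chain = RegressorChain(LinearRegression()).fit(X, Y)
    >>> chain.predict(X).shape    # one column per chained target
    (10, 3)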
