@@ -14,46 +14,75 @@ Multiclass and multilabel algorithms
The :mod:`sklearn.multiclass` module implements *meta-estimators* to solve
``multiclass`` and ``multilabel`` classification problems
- by decomposing such problems into binary classification problems. Multioutput
+ by decomposing such problems into binary classification problems. ``multioutput``
regression is also supported.
- - **Multiclass classification** means a classification task with more than
-   two classes; e.g., classify a set of images of fruits which may be oranges,
-   apples, or pears. Multiclass classification makes the assumption that each
-   sample is assigned to one and only one label: a fruit can be either an
-   apple or a pear but not both at the same time.
+ - **Multiclass classification** produces a single output that is a categorical
+   variable. In other words, it is a single classification task with more than
+   two classes.
- - **Multilabel classification** assigns to each sample a set of target
-   labels. This can be thought of as predicting properties of a data-point
-   that are not mutually exclusive, such as topics that are relevant for a
-   document. A text might be about any of religion, politics, finance or
-   education at the same time or none of these.
+   - Valid :term:`multiclass` representations for
+     :func:`~utils.multiclass.type_of_target` (`y`) are:
- - **Multioutput regression** assigns each sample a set of target
-   values. This can be thought of as predicting several properties
-   for each data-point, such as wind direction and magnitude at a
-   certain location.
+     - 1d or column vector containing more than two discrete values.
+     - sparse :term:`binary` matrix of shape ``(n_samples, n_classes)`` with a
+       single element per row, where each column represents one class.
- - **Multioutput-multiclass classification**
+   - *example:* classify a set of images of fruits which may be oranges, apples,
+     or pears. Multiclass classification makes the assumption that each sample is
+     assigned to one and only one label: a fruit can be either an apple or a pear
+     but not both at the same time.
+
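The two multiclass representations above can be checked with `type_of_target`; a minimal sketch, assuming scikit-learn and NumPy are installed:

```python
# Sketch: type_of_target identifies a 1d vector with more than two
# discrete values as "multiclass", for integer or string labels alike.
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_int = np.array([0, 1, 2, 1])                 # integer class labels
y_str = np.array(["orange", "apple", "pear"])  # string class labels

print(type_of_target(y_int))  # multiclass
print(type_of_target(y_str))  # multiclass
```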
+ - **Multilabel classification** predicts a set of binary attributes for each
+   sample that can each be true or false independently of one another. In other
+   words, it assigns to each sample a set of target labels. (This task can also
+   be seen as a multioutput task with binary labels.)
+
+   - A valid representation of :term:`multilabel` `y` is:
+
+     - either a dense or a sparse :term:`binary` matrix of shape
+       ``(n_samples, n_classes)`` with multiple active elements per row to denote
+       that the sample belongs to multiple classes. Each column represents a class.
+
+   - *example:* based on an arbitrary set of features from fruit images,
+     multilabel classification simultaneously predicts a set of binary
+     attributes such as: grows on a tree, is a stone fruit, is citric ...
+
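A small sketch of the multilabel representation, assuming scikit-learn and NumPy are installed: a binary indicator matrix with multiple active elements per row is reported as a multilabel target.

```python
# Sketch: a dense binary matrix (rows = samples, columns = labels,
# e.g. grows on a tree / stone fruit / citric) is "multilabel-indicator".
import numpy as np
from sklearn.utils.multiclass import type_of_target

y = np.array([[1, 0, 1],   # first sample carries labels 0 and 2
              [0, 1, 1],   # second sample carries labels 1 and 2
              [0, 0, 0]])  # a sample may carry no label at all

print(type_of_target(y))  # multilabel-indicator
```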
+ - **Multioutput regression** predicts multiple outputs that are all continuous
+   variables. In other words, it assigns each sample a set of target values.
+
+   - A valid representation of :term:`multioutput` `y` is:
+
+     - a dense matrix of shape ``(n_samples, n_output)`` of floats: a
+       column-wise concatenation of :term:`continuous` variables.
+
+   - *example:* based on an arbitrary set of features from fruit images, predict
+     a set of :term:`continuous` variables such as weight, sugar content,
+     calories, etc.
+
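The same check works for the multioutput regression representation; a minimal sketch assuming scikit-learn and NumPy are installed:

```python
# Sketch: a 2d float matrix (rows = samples, columns = continuous targets,
# e.g. weight and sugar content) is a continuous multioutput target.
import numpy as np
from sklearn.utils.multiclass import type_of_target

y = np.array([[150.3, 12.1],
              [210.8,  9.4]])

print(type_of_target(y))  # continuous-multioutput
```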
+ - **Multioutput-multiclass classification**
(also known as **multi-task classification**)
means that a single estimator has to handle several joint classification
tasks. This is both a generalization of the multi-label classification
task, which only considers binary classification, as well as a
- generalization of the multi-class classification task. *The output format
- is a 2d numpy array or sparse matrix.*
-
- The set of labels can be different for each output variable.
- For instance, a sample could be assigned "pear" for an output variable that
- takes possible values in a finite set of species such as "pear", "apple";
- and "blue" or "green" for a second output variable that takes possible values
- in a finite set of colors such as "green", "red", "blue", "yellow"...
-
- This means that any classifiers handling multi-output
- multiclass or multi-task classification tasks,
- support the multi-label classification task as a special case.
- Multi-task classification is similar to the multi-output
- classification task with different model formulations. For
- more information, see the relevant estimator documentation.
+ generalization of the multi-class classification task.
+
+   - A valid representation of :term:`multioutput` `y` is:
+
+     - a dense matrix of shape ``(n_samples, n_output)``: a column-wise
+       concatenation of 1d :term:`multiclass` variables.
+
+   - *example:* the set of labels can be different for each output variable.
+     For instance, a sample could be assigned "pear" for an output variable that
+     takes possible values in a finite set of species such as "pear", "apple";
+     and "blue" or "green" for a second output variable that takes possible
+     values in a finite set of colors such as "green", "red", "blue", "yellow"...
+
+   - Note that any classifier handling multi-output
+     multiclass or multi-task classification tasks
+     supports the multi-label classification task as a special case.
+     Multi-task classification is similar to the multi-output
+     classification task with different model formulations. For
+     more information, see the relevant estimator documentation.
+
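The fruit example above (species and color as two joint classification tasks) can be sketched with `type_of_target`, assuming scikit-learn and NumPy are installed:

```python
# Sketch: a 2d array of class labels, one column per classification task,
# is reported as a multiclass multioutput target.
import numpy as np
from sklearn.utils.multiclass import type_of_target

# column 0: species, column 1: color
y = np.array([["pear",  "green"],
              ["apple", "red"],
              ["pear",  "blue"]])

print(type_of_target(y))  # multiclass-multioutput
```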
All scikit-learn classifiers are capable of multiclass classification,
but the meta-estimators offered by :mod:`sklearn.multiclass`
@@ -168,7 +197,7 @@ This strategy, also known as **one-vs-all**, is implemented in
per class. For each classifier, the class is fitted against all the other
classes. In addition to its computational efficiency (only `n_classes`
classifiers are needed), one advantage of this approach is its
- interpretability. Since each class is represented by one and only one classifier,
+ interpretability. Since each class is represented by one and only one classifier,
it is possible to gain knowledge about the class by inspecting its
corresponding classifier. This is the most commonly used strategy and is a fair
default choice.
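A quick illustration of the one-classifier-per-class structure, as a sketch assuming scikit-learn is installed (the choice of `LogisticRegression` as the base estimator is arbitrary):

```python
# Sketch: OneVsRestClassifier fits exactly one binary classifier per class,
# so a three-class problem like iris yields three inspectable estimators.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(clf.estimators_))  # 3, one binary classifier per class
print(clf.predict(X[:2]))    # predictions for the first two samples
```

Each entry of `estimators_` can then be inspected on its own, which is the interpretability advantage described above.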
@@ -371,7 +400,7 @@ that are trained on a single X predictor matrix to predict a series
of responses (y1,y2,y3...,yn).

Below is an example of multioutput classification:
-
+
>>> from sklearn.datasets import make_classification
>>> from sklearn.multioutput import MultiOutputClassifier
>>> from sklearn.ensemble import RandomForestClassifier
@@ -434,7 +463,7 @@ averaged together.
Regressor Chain
================

- Regressor chains (see :class:`RegressorChain`) is analogous to
- ClassifierChain as a way of combining a number of regressions
- into a single multi-target model that is capable of exploiting
+ Regressor chains (see :class:`RegressorChain`) are analogous to
+ ClassifierChain as a way of combining a number of regressions
+ into a single multi-target model that is capable of exploiting
correlations among targets.
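A minimal sketch of a regressor chain, assuming scikit-learn and NumPy are installed (the synthetic data, where the second target is a function of the first, is made up for illustration):

```python
# Sketch: RegressorChain fits one regressor per target; each regressor in
# the chain also sees the predictions of the earlier targets as features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
t0 = X @ np.array([1.0, 2.0, 3.0])        # first target
t1 = 2.0 * t0 + 1.0                       # second target depends on the first
Y = np.column_stack([t0, t1])

chain = RegressorChain(LinearRegression(), order=[0, 1]).fit(X, Y)
print(chain.predict(X).shape)  # (50, 2)
```

Because target 1 is a function of target 0, fitting the chain in the order `[0, 1]` lets the second regressor exploit that correlation.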