-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
MNT consolidate target type tags #28927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
doc/developers/develop.rst
Outdated
- 'multi-output': multi-output regression or classification, i.e. `y` | ||
has shape `(n_samples, n_outputs)` | ||
- 'single-output': single-output regression or classification, i.e. `y` | ||
has shape `(n_samples,)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should have an instance to explain that you can multi-output and not support a single output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note:
Note that an estimator can support multi-output but not single-output, or vice
versa. The same goes for binary and multi-class support.
doc/developers/develop.rst
Outdated
- 'binary': binary classification. All estimators supporting | ||
'multi-class' also support 'binary'; classifiers only; | ||
- 'multi-class': multiclass classification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Together with the other comment, we might benefit of an example to show what we mean.
doc/developers/develop.rst
Outdated
- 'multi-output': multi-output regression or classification, i.e. `y` | ||
has shape `(n_samples, n_outputs)` | ||
- 'single-output': single-output regression or classification, i.e. `y` | ||
has shape `(n_samples,)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should have an instance to explain that you can multi-output and not support a single output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the logic behind it, but it's unfortunate we can use the names in type_of_target
.
doc/developers/develop.rst
Outdated
|
||
multioutput_only (default=False) | ||
whether estimator supports only multi-output classification or regression. | ||
target_type (default=["multiclass", "single-output"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the estimator is a regressor, then having a "multiclass" tag feels strange to me.
An alternative here is to have a set of binary flags, instead of a set of flags, as in, introduce 5 flags which are all binary. It makes updating the flags in inheritance slightly easier. Right now once we implement the inheritance for flags["target_type"] = set(flags["target_type"]) | {"multi-output"} I'm not sure which one would be better. |
Another alternative is to have two flags: classification-type and output-shape/type maybe? |
I like the binary flags approach. |
If we go back to binary flags, then the only change I see necessary is to:
to
|
I'm okay with
Although adding another tag is more complexity, I do not like how |
I think single and multi output aspect is independent from binary/multi class aspect. You could be a multi output only classifier that only supports binary outputs for instance. |
So something which isn't making me comfortable with having a set of binary flags is that A part of me wishes we had these as flags: @dataclass
class InputFlags:
two_d_array: bool
sparse: bool
categorical: bool
string: bool
positive_only: bool
nan_allowed: bool
pairwise: bool
@dataclass
class TargetFlags:
required: bool
positive_only: bool
multi_output: bool
single_output: bool
@dataclass
class TransformerFlags:
preserve_dtype: list[str]
@dataclass
class ClassifierFlags:
poor_score: bool
binary: bool
multiclass: bool
multilabel: bool
@dataclass
class RegressorFlags:
poor_score: bool
@dataclass
class Flags:
array_api_support: bool
no_validation: bool
stateless: bool
non_deterministic: bool
poor_score: bool
requires_fit: bool
_skip_test: bool
_xfail_checks: dict[str, str]
input_flags: InputFlags
target_flags: TargetFlags
transformer_flags: TransformerFlags
classifier_flags: ClassifierFlags
regressor_flags: RegressorFlags
I'm actually not sure why we have this in WDYT @glemaitre @thomasjpfan |
I'm okay with having flags for everything. It makes sense for For this PR, are you considering using dataclasses to represent the tags or are you keeping them as dicts? |
If we're okay with data classes, I can open a separate PR implementing them. Users can then subclass these if they want to add more tags. |
Solved with dataclasses. |
This consolidates the following tags:
into
Towards #28910
cc @glemaitre @thomasjpfan