HistGradientBoosting avoid data shuffling when early_stopping activated #25460
Describe the workflow you want to enable

Hello, it would be useful if the HistGradientBoostingRegressor or HistGradientBoostingClassifier model had the ability to avoid data shuffling when using the early_stopping and validation_fraction parameters, since maintaining data order is a basic requirement when working with time series.

https://github.com/scikit-learn/scikit-learn/blob/98cf537f5/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L427

Describe your proposed solution

It would be sufficient to add a parameter that controls whether or not the data is shuffled.
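To make the request concrete, here is a minimal sketch contrasting a shuffled split (roughly what the estimator does internally at the line linked above, simplified) with the ordered split this proposal asks for. X, y, and the standalone validation_fraction variable are illustrative placeholders, not the estimator's actual internals.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = np.arange(100).reshape(-1, 1)  # pretend each row is one time step
y = rng.normal(size=100)
validation_fraction = 0.1

# Roughly the current behavior: rows are shuffled before the split, so the
# validation set mixes past and future observations.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=validation_fraction, random_state=rng
)

# The behavior this issue asks for: keep chronological order and hold out
# the tail as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=validation_fraction, shuffle=False
)
```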
Describe alternatives you've considered, if relevant

No response

Additional context

No response
Comments

Indeed, it could even be considered a methodological bug, since the obtained model will not be viable. I am not sure what the best approach is to solve the issue. Adding …

Can I work on this one?

@shamzos As said in my previous message, it is not clear to me what the way forward is. I would wait for comments from @ogrisel and @jeremiedbb.

The callback API discussed by @glemaitre is being drafted in #22000. That work is paused because it is quite complex to get the API right: it needs to allow enough flexibility in early-stopping data splits while remaining intuitive to use, in particular with nested cross-validation for model selection and evaluation…

Why not just add the parameters …

I will submit a PR to introduce a …

There is a related discussion in #18748, where the conclusion is to add …
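Until a parameter lands, one time-series-safe workaround under the current API is to split chronologically yourself, fit with built-in early stopping disabled, and pick the best iteration on the held-out tail using staged_predict. This is a hedged sketch, not an official recipe: the synthetic data and the names n_val and best_n_iter are illustrative.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for an ordered time series.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=500)

n_val = 50  # illustrative size of the chronological validation tail
X_train, X_val = X[:-n_val], X[-n_val:]
y_train, y_val = y[:-n_val], y[-n_val:]

# Disable built-in early stopping so no internal shuffled split happens.
model = HistGradientBoostingRegressor(max_iter=200, early_stopping=False)
model.fit(X_train, y_train)

# Score every boosting stage on the ordered tail and keep the best iteration.
val_errors = [
    mean_squared_error(y_val, y_pred) for y_pred in model.staged_predict(X_val)
]
best_n_iter = int(np.argmin(val_errors)) + 1

# Refit at the selected number of iterations.
final_model = HistGradientBoostingRegressor(
    max_iter=best_n_iter, early_stopping=False
).fit(X_train, y_train)
```

This avoids the internal shuffled split entirely, at the cost of one extra fit.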