ENH: Add warnings for small target values in RandomForestRegressor #30538
Reference Issues/PRs
Fixes #29922
What does this implement/fix? Explain your changes.
This PR addresses the potential numerical instability and performance issues that arise when `RandomForestRegressor` is fit on very small target values or on data with a very small range. The following changes have been made:
- Documentation Update: the `RandomForestRegressor` documentation covers the potential numerical issues with very small target values (e.g., 1e-8 or smaller).
- Validation Enhancement: `_check_data_range` in `sklearn/utils/validation.py` detects and warns about data with very small ranges, and the `validate_data` function warns users if their target variable (`y`) has a very small range.
Any other comments?
To further illustrate the impact of small target values and data range, I generated a heatmap that visualizes the R² score for different scaling factors applied to X and Y. The results highlight numerical instability when very small scaling factors (e.g., 1e-8 or smaller) are used for the targets Y. This reinforces the need to warn users about potential issues and recommend scaling their data appropriately.
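For readers who want a concrete picture of the kind of range check described above, here is a minimal sketch. It is not the code from this PR: the function name `warn_if_small_range`, the default `threshold` of 1e-8 (taken from the example value in the description), and the warning text are all assumptions made for illustration, and it uses only the standard library rather than scikit-learn's own validation helpers.

```python
import warnings


def warn_if_small_range(y, threshold=1e-8, name="y"):
    """Warn when the spread (max - min) of ``y`` falls below ``threshold``.

    Hypothetical helper for illustration only; it is not the PR's
    ``_check_data_range``. Returns the computed range so callers can
    inspect it.
    """
    values = [float(v) for v in y]
    if not values:
        raise ValueError(f"{name} is empty")

    data_range = max(values) - min(values)

    # A constant target has range 0.0 and therefore also triggers the warning.
    if data_range < threshold:
        warnings.warn(
            f"{name} has a very small range ({data_range:.3e}); "
            "consider rescaling it before fitting.",
            UserWarning,
            stacklevel=2,
        )
    return data_range
```

Calling `warn_if_small_range([1e-9, 2e-9, 3e-9])` emits a `UserWarning`, while `warn_if_small_range([0.5, 1.5, 2.5])` stays silent. One way to act on such a warning, assuming rescaling is acceptable for the problem, is to wrap the estimator in `sklearn.compose.TransformedTargetRegressor` with a scaler as the `transformer`, so that predictions are mapped back to the original units.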
