[MRG] MNT Initialize histograms in parallel and don't call np.zero in… · scikit-learn/scikit-learn@4a2de5b · GitHub
Commit 4a2de5b

[MRG] MNT Initialize histograms in parallel and don't call np.zero in Hist-GBDT (#18341)
1 parent 4de0f97 commit 4a2de5b

2 files changed: +16 -2 lines changed

doc/whats_new/v0.24.rst

Lines changed: 7 additions & 0 deletions
@@ -195,6 +195,13 @@ Changelog
   usage in `fit`. :pr:`18334` by `Olivier Grisel`_ `Nicolas Hug`_, `Thomas
   Fan`_ and `Andreas Müller`_.
 
+- |Efficiency| Histogram initialization is now done in parallel in
+  :class:`ensemble.HistGradientBoostingRegressor` and
+  :class:`ensemble.HistGradientBoostingClassifier` which results in speed
+  improvement for problems that build a lot of nodes on multicore machines.
+  :pr:`18341` by `Olivier Grisel`_, `Nicolas Hug`_, `Thomas Fan`_, and
+  :user:`Egor Smirnov <SmirnovEgorRu>`.
+
 - |API|: The parameter ``n_classes_`` is now deprecated in
   :class:`ensemble.GradientBoostingRegressor` and returns `1`.
   :pr:`17702` by :user:`Simona Maggio <simonamaggio>`.

sklearn/ensemble/_hist_gradient_boosting/histogram.pyx

Lines changed: 9 additions & 2 deletions
@@ -132,7 +132,8 @@ cdef class HistogramBuilder:
             G_H_DTYPE_C [::1] gradients = self.gradients
             G_H_DTYPE_C [::1] ordered_hessians = self.ordered_hessians
             G_H_DTYPE_C [::1] hessians = self.hessians
-            hist_struct [:, ::1] histograms = np.zeros(
+            # Histograms will be initialized to zero later within a prange
+            hist_struct [:, ::1] histograms = np.empty(
                 shape=(self.n_features, self.n_bins),
                 dtype=HISTOGRAM_DTYPE
             )
@@ -177,6 +178,12 @@ cdef class HistogramBuilder:
                 self.ordered_hessians[:n_samples]
             unsigned char hessians_are_constant = \
                 self.hessians_are_constant
+            unsigned int bin_idx = 0
+
+        for bin_idx in range(self.n_bins):
+            histograms[feature_idx, bin_idx].sum_gradients = 0.
+            histograms[feature_idx, bin_idx].sum_hessians = 0.
+            histograms[feature_idx, bin_idx].count = 0
 
         if root_node:
             if hessians_are_constant:
@@ -227,7 +234,7 @@ cdef class HistogramBuilder:
         cdef:
             int feature_idx
             int n_features = self.n_features
-            hist_struct [:, ::1] histograms = np.zeros(
+            hist_struct [:, ::1] histograms = np.empty(
                 shape=(self.n_features, self.n_bins),
                 dtype=HISTOGRAM_DTYPE
             )
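
The pattern in this diff (allocate with `np.empty`, then zero each feature's row inside the per-feature parallel work) can be illustrated outside Cython. The following is a minimal Python sketch, not the code from this commit: `HIST_DTYPE`, `zero_feature_row`, and `init_histograms` are hypothetical names, the field types are assumed (only the field names appear in the diff), and a thread pool stands in for the Cython `prange` mentioned in the added comment.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Assumed layout: the field names come from the diff, the field types are a guess.
HIST_DTYPE = np.dtype([
    ("sum_gradients", np.float64),
    ("sum_hessians", np.float64),
    ("count", np.uint32),
])


def zero_feature_row(histograms, feature_idx):
    """Zero every bin of one feature, like the loop added in the second hunk."""
    row = histograms[feature_idx]  # a view, so writes land in `histograms`
    row["sum_gradients"] = 0.0
    row["sum_hessians"] = 0.0
    row["count"] = 0


def init_histograms(n_features, n_bins, n_threads=4):
    # np.empty allocates without writing, so the contents are arbitrary here.
    histograms = np.empty(shape=(n_features, n_bins), dtype=HIST_DTYPE)
    # Each task zeroes only its own row, so no two tasks write the same memory.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(lambda f: zero_feature_row(histograms, f),
                      range(n_features)))
    return histograms


hist = init_histograms(n_features=8, n_bins=256)
```

After `init_histograms` returns, every field of every bin is zero, which matches what `np.zeros` produced before the change. The difference is that the zeroing work is split by feature instead of being done in one serial call.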
