Related to #16395
On a dataset with 10,000 samples and 100 features, the peak memory increase during `fit` is 1657 MB; with 200 features it is 3400 MB, and with 400 features it is 6627 MB. In comparison, LightGBM uses 95 MB, 181 MB, and 356 MB respectively.
I noticed this while trying to train on MNIST; the program got killed by the OS.
Here is the code I'm using:
```python
import lightgbm as lgb
from memory_profiler import memory_usage
from sklearn.datasets import make_classification
# Needed on the versions used here (0.23.x / 0.24.dev0), where the estimator is still experimental:
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_classes=2, n_samples=10_000, n_features=400)

hgb = HistGradientBoostingClassifier(
    max_iter=500,
    max_leaf_nodes=127,
    learning_rate=0.1,
)
lg = lgb.LGBMClassifier(
    n_estimators=500,
    num_leaves=127,
    learning_rate=0.1,
    n_jobs=16,
)

mems = memory_usage((hgb.fit, (X, y)))
print(f"{max(mems):.2f}, {max(mems) - min(mems):.2f} MB")  # 2nd value is reported above
```
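The LightGBM figures and the other feature counts above were measured the same way. A minimal sketch of that comparison, assuming the same `memory_usage` helper and the estimators configured as in the script above (the loop and print format are illustrative, not the exact script used):

```python
# Sketch: repeat the measurement for both estimators across feature counts.
# For cleaner baselines, each configuration is better run in a fresh process.
for n_features in (100, 200, 400):
    X, y = make_classification(n_classes=2, n_samples=10_000, n_features=n_features)
    for name, est in (("HistGradientBoosting", hgb), ("LightGBM", lg)):
        mems = memory_usage((est.fit, (X, y)))
        print(f"{name}, {n_features} features: {max(mems) - min(mems):.2f} MB")
```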
Both were running at 100% CPU on an 8-core/16-thread machine. I had similar results with scikit-learn 0.23.1.
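For completeness, the thread pools in use (OpenMP for HistGradientBoosting) can be inspected and capped with threadpoolctl, which is already installed in this environment. A small sketch, in case thread count turns out to matter for the memory numbers:

```python
from pprint import pprint
from threadpoolctl import threadpool_info, threadpool_limits

pprint(threadpool_info())  # lists the loaded OpenMP/BLAS libraries and their num_threads

# Optionally cap the OpenMP threads to check whether it changes peak memory
with threadpool_limits(limits=8, user_api="openmp"):
    hgb.fit(X, y)
```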
System Info:
```
System:
    python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
executable: /home/shihab/anaconda3/bin/python
   machine: Linux-5.8.0-050800-generic-x86_64-with-debian-bullseye-sid

Python dependencies:
          pip: 20.2.1
   setuptools: 49.2.0
      sklearn: 0.24.dev0
        numpy: 1.19.1
        scipy: 1.5.0
       Cython: 0.29.21
       pandas: 1.1.0
   matplotlib: 3.2.2
       joblib: 0.16.0
threadpoolctl: 2.1.0

Built with OpenMP: True
```