Python Kernel Died unexpectedly when using a for loop and RandomForestRegressor · Issue #7903 · scikit-learn/scikit-learn

Closed
ngonthier opened this issue Nov 17, 2016 · 7 comments

Comments

@ngonthier
Contributor
ngonthier commented Nov 17, 2016

Description

I am trying to fit a Random Forest Regressor (sklearn.ensemble.RandomForestRegressor) 30 times on different data sets.
To do this I use a for loop, but every time I run my script the Python kernel dies unexpectedly.
I ran this script on different machines; a more powerful machine (more RAM and CPU) only delays the moment when the kernel dies.
I wrote a minimal version of my script, without my personal data and without the other things I normally do. In my complete, original script, which reads the data from a CSV file and writes the predictions to another CSV file, the kernel dies even more quickly.

Steps/Code to Reproduce

Example:

"""
@author: Nicolas 

The goal of this script is to highlight a problem that crashes the Python
kernel on Windows and Linux machines when using RandomForestRegressor
with scikit-learn 0.18.
"""

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def main():
    NumberOfRandomForest = 30
    
    # We create a random forest regressor
    RFR = RandomForestRegressor(n_estimators=100, criterion='mae', max_depth=None, min_samples_split=2, min_samples_leaf=1)
        
    print("Start the for loop")
    for i in range(NumberOfRandomForest):
        print(i)
        X = np.random.rand(150, 30)
        y = np.random.rand(150, 10)
        RFR.fit(X,y)

if __name__ == "__main__":
    main()

Versions

I tried my script on two different machines.

On the first one the script crashes after only 3 iterations, whereas on the second one it can reach 25 iterations.
The setup of the first machine:
Windows-10-10.0.14393-SP0
Python 3.5.2 |Anaconda 4.2.0 (32-bit)| (default, Jul 5 2016, 11:45:57) [MSC v.1900 32 bit (Intel)]
NumPy 1.11.1
SciPy 0.18.1
Scikit-Learn 0.18

The setup of the second one:
Linux-3.10.0-327.28.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.1
SciPy 0.18.1
Scikit-Learn 0.18.1

@jnothman
Member

Failing to replicate on OS X 10.12.1 / Python 3.5.2 | Continuum Analytics, Inc.

@amueller
Member

Do you maybe run out of memory?
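
One way to check this is to record the process's peak memory after every loop iteration. The following is a minimal sketch, not something posted in this thread: it assumes a Unix system (the standard-library `resource` module is not available on Windows), and the helper names `peak_rss_mib` and `track_peak_memory` are hypothetical.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_peak_memory(step, n_iterations):
    """Call step(i) n_iterations times and record the peak RSS after each call."""
    history = []
    for i in range(n_iterations):
        step(i)
        history.append(peak_rss_mib())
    return history


if __name__ == "__main__":
    # Hypothetical stand-in workload that keeps memory alive between iterations.
    kept = []

    def allocate(i):
        kept.append(bytearray(1024 * 1024))

    for i, mib in enumerate(track_peak_memory(allocate, 5)):
        print(i, round(mib, 1))
```

To apply it to the script above, the loop body could be passed as `step`, for example a function that draws new `X` and `y` and calls `RFR.fit(X, y)`. A peak that keeps climbing across iterations, even though each fit replaces the previous forest, would point towards memory not being released.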

@nelson-liu
Contributor

Perhaps related to #7811, which I haven't been able to figure out. I think it's something about the use of numpy arrays to hold objects in the MAE code, which prevents the memory from being freed until it reaches the Python level...

@amueller
Member

I can reproduce that this keeps eating arbitrarily large amounts of memory, which is somewhat but not entirely surprising. You have 10 target variables that are all completely unrelated to the features, so you have to create 1500 leaves per tree.
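
To see how large the trees actually get on data like this, one could count the leaves of each fitted tree. This is only a sketch under assumptions: the helper `leaves_per_tree` is hypothetical, the forest is kept small and uses the default criterion so that it runs quickly, and the fixed seed is there only for repeatability. The number of leaves in a tree cannot exceed the number of training samples, and each node stores one value per target, so the stored values grow with both.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def leaves_per_tree(forest):
    """Count the leaf nodes of every fitted tree in the forest."""
    # In scikit-learn's tree structure a leaf has no left child (index -1).
    return [int(np.sum(est.tree_.children_left == -1))
            for est in forest.estimators_]


rng = np.random.RandomState(0)
X = rng.rand(150, 30)  # same shapes as in the report
y = rng.rand(150, 10)  # 10 targets unrelated to the features

# Small forest with the default criterion, to keep the sketch fast.
forest = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)

counts = leaves_per_tree(forest)
print(counts)
# Shape of the per-node value array: (n_nodes, n_targets, 1) for regression.
print(forest.estimators_[0].tree_.value.shape)
```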

@amueller
Member

Hm yeah sounds like a memory leak.

@lucidyan

Same story on OS X 10.12.1 / Python 3.5.2 | Anaconda.
It only reproduces when I use the new 'mae' criterion; with 'mse' everything works fine. Also, 'mae' is very slow compared to 'mse', even on very small datasets.

@lesteve
Member
lesteve commented Nov 28, 2016

I am going to close this one, since it is a duplicate of #7811.

@lesteve lesteve closed this as completed Nov 28, 2016