crash in _tree.so when calling fit() on large tree ensemble #1818
Comments
Yep, it's using 32-bit indices, without a check for whether its inputs are small enough. Ping @glouppe. (Were you actually handling a 20GB dataset, or is this just a test?) |
I had some data that was large enough to trigger the bug, but these sizes are arbitrary. |
We could use http://www.python.org/dev/peps/pep-0353/ |
We should be using |
This bug is indeed known (see #1466) and should be solved by using Py_ssize_t instead. |
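For anyone following along, here is a minimal sketch of the failure mode discussed above. It is not the actual `_tree` code: the helper names `flat_index_int32` and `fits_int32` are made up for illustration, and it assumes the index is computed as a flat `row * n_cols + col` offset stored in a signed 32-bit integer. `ctypes.c_int32` is used only to mimic C truncation in pure Python.

```python
import ctypes

INT32_MAX = 2**31 - 1


def flat_index_int32(row, col, n_cols):
    """Flat offset row * n_cols + col, truncated to a signed 32-bit int.

    ctypes integer types do no overflow checking, so the value wraps
    the same way a C int32 would (hypothetical helper, for illustration).
    """
    return ctypes.c_int32(row * n_cols + col).value


def fits_int32(n_rows, n_cols):
    """True if the largest flat offset into an n_rows x n_cols array fits in int32."""
    return n_rows * n_cols - 1 <= INT32_MAX


# Small array: the 32-bit offset matches the true offset.
print(flat_index_int32(999, 999, 1000), fits_int32(1000, 1000))

# 50000 x 50000 array: the last offset exceeds INT32_MAX and wraps negative,
# which in C would index outside the buffer.
print(flat_index_int32(49999, 49999, 50000), fits_int32(50000, 50000))
```

A pointer-sized index type such as `Py_ssize_t` avoids the wraparound on 64-bit builds; a check like `fits_int32` is the kind of input validation that would at least turn the crash into an error if 32-bit indices were kept.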
Isn't it just a search and replace basically? Why wait for someone to report crashers instead of just fixing it? |
Yes, basically. PRs are welcome. |
With adversarial datasets, I can get more errors. I think if you swap dimensions it also errors out.
|
Closing as duplicate -- feel free to continue discussing, but I'd like to keep the issue tracker clean. |
On 03/27/2013 10:08 PM, erg wrote:
|
I made a patch that fixes the splitting problem but haven't submitted it yet. I ran it with this patch overnight and, while it didn't finish training my trees, it didn't crash either. The remaining bug is the overflow on datasets with a large number of rows or columns. I don't know if there are any subtleties in changing the Cython code or if it would slow it down. |
Perhaps it's using 32-bit ints as the indices instead of u64?