BallTree.data is a memory view #11728

amueller · 2018-08-01T09:30:54Z

Via Stack Overflow: BallTree.data is documented as a numpy array but is actually a memory view. I haven't looked at the code, but I think we should store it as a numpy array, and if that causes any efficiency issues, we can make it private instead.


BlaneG · 2018-08-02T02:59:51Z

I haven't worked with Cython before. Are you just suggesting casting the memory view to a numpy array when data is initialized in ball_tree, e.g.:
cdef DTYPE_t* data = np.asarray(&tree.data[0, 0])

Or is there a cleaner way to do this?

jeremiedbb · 2018-08-02T17:27:05Z

This is not as simple as that. BallTree inherits from the BinaryTree class, and data is a public attribute of BinaryTree, stored as a memoryview.

In BinaryTree, all array attributes are stored as memoryviews, and I think we shouldn't store them as np.ndarray. Memoryviews are more modern, and it is now advised to use them instead of numpy arrays.

Is it not possible to just change the doc and document data as a memoryview?
If not, I think making it private is the best solution. We could maybe add a get_data function that returns a numpy array if it's really needed.

rth · 2018-08-03T08:50:51Z

Yes. Also, in BinaryTree, data is actually the memoryview corresponding to the data_arr attribute. data is exposed to Python through the readonly property, while data_arr is not (more information about static Cython attributes can be found in the docs).

So we could refactor BinaryTree, BallTree, etc. to use the data variable name for what is now data_arr and make it public, while not making public what is currently data.

In the end though, data is just a copy of the training data that the user already has. I guess it was exposed to be consistent with scipy's KDTree implementation, but I wonder whether making it 100% consistent would really justify the effort. Maybe we could just document that it's a memoryview, so that the few users who want to use it can call np.asarray(estimator.data) to get an array.
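
To illustrate the np.asarray workaround mentioned above, here is a minimal sketch. It assumes scikit-learn and numpy are installed; the sample array X, the leaf_size value, and the variable names are made up for illustration. The exact type reported for tree.data can differ between scikit-learn versions, so the sketch only relies on the result of the conversion being an ndarray.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Illustrative training data (assumption: any 2D float array would do).
rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))

tree = BallTree(X, leaf_size=2)

# tree.data may be a Cython memoryview rather than an ndarray.
# np.asarray wraps it through the buffer protocol and returns an ndarray.
data = np.asarray(tree.data)

print(type(tree.data).__name__)
print(type(data).__name__, data.shape)
```

The converted array should hold the same values as the training data passed to the constructor, which is why exposing it mostly matters for consistency rather than for new functionality.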

amueller added the Easy and help wanted labels Aug 1, 2018

NicolasHug mentioned this issue Aug 6, 2018

[MRG] Updated doc for data attribute of BallTree and KDTree #11764

Merged

amueller closed this as completed in #11764 Aug 21, 2018
