8000 ENH: Allow reading from buffered stdout using np.fromfile by hmaarrfk · Pull Request #12324 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Allow reading from buffered stdout using np.fromfile #12324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

hmaarrfk
Copy link
Contributor
@hmaarrfk hmaarrfk commented Nov 4, 2018

Previously, np.fromfile would check if the object was an object of RawIO. If it was, then it wasn't seekable.

Unfortunately, (at least in python 3) this doesn't work for stdout as it is buffered but not seekable (at least on linux).

This fixes that by calling file.seekable()

Tests are also added to ensure the behaviour lives on.

Fixes #12309

Patch from 2018, before rebase in 2022:
12324_before_rebase.patch.txt

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 4, 2018

Shippable failed with this test:

=================================== FAILURES ===================================
______________________ TestPathUsage.test_tofile_fromfile ______________________
[gw0] linux2 -- Python 2.7.12 /root/venv/2.7/bin/python

self = <numpy.core.tests.test_records.TestPathUsage object at 0xffff8d904f90>
    def test_tofile_fromfile(self):
        with temppath(suffix='.bin') as path:
            path = Path(path)
            a = np.empty(10, dtype='f8,i4,a5')
            a[5] = (0.5,10,'abcde')
            a.newbyteorder('<')
            with path.open("wb") as fd:
                a.tofile(fd)
            x = np.core.records.fromfile(path,
                                         formats='f8,i4,a5',
                                         shape=10,
                                         byteorder='<')

but I can't seem to find it when i search...

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 4, 2018

Is there a preference between

ret = PyObject_CallMethod(file, "seekable", "");

and

ret = PyObject_CallMethod(file, "seekable", NULL);

@eric-wieser
Copy link
Member
eric-wieser commented Nov 5, 2018

I'd probably opt for passing NULL - I'd expect it to be faster, else CPython wouldn't bother supporting it.

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 5, 2018

Ok.

Any clue where the python 2.7 test is? Seems important...

@eric-wieser
Copy link
Member

numpy.core.tests.test_records would be at numpy/core/tests/test_records.py

@charris
Copy link
Member
charris commented Nov 5, 2018

Shippable failed with this test:

I've seen that failure on other PRs, but not all, or even all recent, so I don't know what is the cause. It seems inconsistent, only on shippable, and only on Python 2.7. Might try a rebase to see if that fixes it.

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 5, 2018

Sorry my repo was out of sync.

How is it even getting in that test? Path should be None.....

I rebased, lets see what happens.

@tylerjereddy
Copy link
Contributor

The shippable CI uses pathlib for 2.x backporting of the functionality. I guess we could try switching to pathlib2, which is perhaps more well maintained. Or just accept not using pathlib on 2.x & remove that from shippable 2.7 job.

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 5, 2018

Thanks @tylerjereddy.

Is this considered a bugfix or feature? If it is a feature, then maybe we can just ignore 2.7?

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 5, 2018

@charris didn't mean to ignore your comment. Thanks for chiming in. my webpage was out of sync.

@tylerjereddy
Copy link
Contributor

I opened another PR to try to fix the shippable 2.x issue. In that PR & here I see circleci now starting to fail for some reason!

@eric-wieser
Copy link
Member
eric-wieser commented Nov 5, 2018

How is it even getting in that test? Path should be None.....

I can't see the old shippable output, as the link is discarded every time you rebase - so can't help you. Can you link to the failed output next time?

Edit: https://app.shippable.com/github/numpy/numpy/runs/1301/1/tests, I assume

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 5, 2018

@eric-wieser it seems that shippable is using a backport of pathlib, I didn't know that. Hence the discussion in #12330

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 5, 2018

@eric-wieser adding te comment here since that subthread is annoying.

On the line:

PyErr_SetString(PyExc_IOError,

It detects an error, but keeps going. Shouldn't it return in the if statement?

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 5, 2018

I would also like it if a out parameter was provided to fromfile does that discussion go to the the mailing list, or issue tracker?

@eric-wieser
Copy link
Member

It detects an error, but keeps going. Shouldn't it return in the if statement?

Congrats, you found a bug

@hmaarrfk hmaarrfk force-pushed the from_file_stdout branch 3 times, most recently from 73047f3 to 538f414 Compare November 7, 2018 15:19
@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 7, 2018

@eric-wieser It seems that the code that tries to guess how long a file is also uses seek, but doesn't check the seekable parameter. It seems difficult to tell the size of the stdout buffer otherwise.

Basically multiarraymodule.c should check the output of originpos and nin somewhere in:

    fp = npy_PyFile_Dup2(file, "rb", &orig_pos);
    if (fp == NULL) {
        Py_DECREF(file);
        return NULL;
    }
    if (type == NULL) {
        type = PyArray_DescrFromType(NPY_DEFAULT_TYPE);
    }
    ret = PyArray_FromFile(fp, type, (npy_intp) nin, sep);

and return a resonable error message telling the user that they must specify count for unseekable file buffers. Maybe I'll get to it this weekend.

In my mind, a patch would include a test that raises an exception when the count parameter isn't provided when trying to read from the stdout buffer.

@hmaarrfk hmaarrfk force-pushed the from_file_stdout branch 3 times, most recently from a2e78a1 to 482b315 Compare November 7, 2018 20:07
@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Nov 7, 2018

Thanks eric, seems like there is a serious bug that was found due to the tests you requested. I'll have to keep looking into it.

@hmaarrfk
Copy link
Contributor Author

@eric-wieser i'm stumped. Basically, it works in pure python as shown by the tests

https://travis-ci.org/numpy/numpy/jobs/453301883#L4540

only 2 "numpy-ic" fail, the two "pythonic" ones pass well. I split the tests so you can do it on your computer by checking out the commit before the mods to the implementation are done.

My ultimate goal here would be to make a well tested fromfile so that I can expand the capabilities with an out parameter.

I couldn't find many tests for fromfile 😕

Base automatically changed from master to main March 4, 2021 02:04
@hmaarrfk hmaarrfk force-pushed the from_file_stdout branch 3 times, most recently from 1973034 to 912c9af Compare June 19, 2022 03:16
@InessaPawson InessaPawson added the 52 - Inactive Pending author response label Apr 8, 2024
@mattip
Copy link
Member
mattip commented Jul 3, 2024

@hmaarrfk would you like to try to revive this?

@hmaarrfk hmaarrfk force-pushed the from_file_stdout branch 2 times, most recently from 5088e2a to 3a3a062 Compare July 3, 2024 12:49
@hmaarrfk hmaarrfk force-pushed the from_file_stdout branch from 3a3a062 to 6bbdd65 Compare July 3, 2024 12:53
@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Jul 3, 2024

Thanks mat. While I don't need this anymore, I think that it is a good addition.

I'm not sure why the needs-tests label was added. I thought I had added enough tests in 2018 with the help of Eric.

Honestly, if more tests are needed, I don't think I"ll be able to be in the right headspace to craft them.

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Jul 3, 2024

osx:

2024-07-03T12:57:10.0060650Z >           arr = np.fromfile(p.stdout, dtype=np.uint8, count=len(s1))
2024-07-03T12:57:10.0061030Z E           OSError: [Errno 29] Illegal seek

@hmaarrfk hmaarrfk closed this Jul 3, 2024
@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Jul 3, 2024

Unfortunately, these test failures are not ones I have an intention to debug anymore.

This can be used as a reference in case anybody else wants to implement this feature in the future (perhaps even myself).

@mattip
Copy link
Member
mattip commented Jul 3, 2024

Thanks for giving it a refresh, sorry it didn't pan out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

from npy_PyFile_Dup2 check the seekable attribute
6 participants
0