8000 doctest failure in time.core with --remote-data by mhvk · Pull Request #2958 · astropy/astropy · GitHub
[go: up one dir, main page]

Skip to content

doctest failure in time.core with --remote-data #2958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 22, 2014

Conversation

mhvk
Copy link
Contributor
@mhvk mhvk commented Sep 19, 2014

I ran the tests in the v0.4.x branch with the --remote-data option and got one failure:

__________________________________ [doctest] astropy.time.core.Time.get_delta_ut1_utc __________________________________
825         To use an updated IERS A bulletin to calculate UT1-UTC
826         (see also ``astropy.utils.iers``)::
827 
828             >>> from astropy.utils.iers import IERS_A, IERS_A_URL
829             >>> from astropy.utils.data import download_file
830             >>> t = Time(['1974-01-01', '2000-01-01'], scale='utc')
831             >>> iers_a_file = download_file(IERS_A_URL,
832             ...                             cache=True)        # doctest: +REMOTE_DATA
833             Downloading ... [Done]
834             >>> iers_a = IERS_A.open(iers_a_file)              # doctest: +REMOTE_DATA
UNEXPECTED EXCEPTION: AttributeError("'MaskedColumn' object has no attribute '_fill_value'",)
Traceback (most recent call last):

  File "/internal/1/root/usr/local/lib/python2.7/doctest.py", line 1289, in __run
    compileflags, 1) in test.globs

  File "<doctest astropy.time.core.Time.get_delta_ut1_utc[8]>", line 1, in <module>

  File "astropy/utils/iers/iers.py", line 124, in open
    cls.iers_table = cls.read(**kwargs)

  File "astropy/utils/iers/iers.py", line 290, in read
    return cls(iers_a[~iers_a['UT1_UTC_A'].mask])

  File "astropy/utils/iers/iers.py", line 261, in __init__
    table['UT1_UTC_B'])

  File "astropy/table/table.py", line 792, in __setitem__
    new_column = value.copy(copy_data=False)

  File "astropy/table/column.py", line 160, in copy
    data = self.data

  File "astropy/table/column.py", line 790, in data
    out.fill_value = self.fill_value

  File "astropy/table/column.py", line 756, in fill_value
    return self.get_fill_value()  # defer to native ma.MaskedArray method

  File "/internal/1/root/src/astropy/astropy/.tox/py27/lib/python2.7/site-packages/numpy/ma/core.py", line 3360, in get_fill_value
    if self._fill_value is None:

AttributeError: 'MaskedColumn' object has no attribute '_fill_value'

astropy/time/core.py:834: UnexpectedException

For some reason this did not occur in Python 2.6 (though I need to double check this). It also does not occur in Python 3, but only because we don't run the doctests there. However, when I run the same test manually in Python 3.4 I get the same traceback.

Update: After further testing I can reduce this to a simpler case:

In [1]: import numpy as np

In [2]: class MyArray(np.ndarray):
    def __new__(*args, **kwargs):
        print('MyArray.__new__ called with', args, kwargs)
        return super(MyArray, args[0]).__new__(*args, **kwargs)
    def __array_finalize__(self, obj=None):
        print('MyArray.__array_finalize__ called with', type(self), type(obj))
    __array_priority__ = 1000
   ...:     

In [3]: a = MyArray(3)
MyArray.__new__ called with (<class '__main__.MyArray'>, 3) {}
MyArray.__array_finalize__ called with <class '__main__.MyArray'> <class 'NoneType'>

In [4]: b = MyArray(3)
MyArray.__new__ called with (<class '__main__.MyArray'>, 3) {}
MyArray.__array_finalize__ called with <class '__main__.MyArray'> <class 'NoneType'>

In [5]: np.where([True, False, True], a, b)
MyArray.__array_finalize__ called with <class '__main__.MyArray'> <class 'NoneType'>
Out[5]: MyArray([  1.25665856e-312,   1.25665856e-312,   2.22248119e-317])

What this demonstrates is that for the output from the np.where call, Numpy creates a new MyArray without calling its __new__, and then calls its __array_finalize__ without a template to copy from. In thinking about it this behavior seems intentional. It kind of makes sense. But if it is intentional Numpy's docs don't mention this mode of operation.

The solution, I think, is to set sensible defaults for the column's necessary attributes in __array_finalize__ even when no template is given.

@embray
Copy link
Member Author
embray commented Sep 18, 2014

Note, this is with Numpy 1.9. Could this be one of those Numpy 1.9 bugs I've heard about?

@embray
Copy link
Member Author
embray commented Sep 18, 2014

Confirmed, at least, that this only happens with Numpy 1.9.

@embray embray added this to the v0.4.2 milestone Sep 18, 2014
@embray
Copy link
Member Author
embray commented Sep 18, 2014

So this seems to have something to do with this:

259             table['UT1_UTC'] = np.where(table['UT1_UTC_B'].mask,
260                                         table['UT1_UTC_A'],
261  ->                                     table['UT1_UTC_B'])

in iers.py. The array returned from np.where is a MaskedColumn like it should be, but none of its attributes have been set. It has no ._name, ._unit, ._fill_value, etc. I must confess I'm not entirely sure what goes on in np.where as far as creation of the output array.

In Numpy < 1.9 the np.where just returns a plain ndarray, which is later properly recast as a Column or MaskedColumn as the case may be when assigning it to the table. However in Numpy 1.9 it actually returns a MaskedColumn, but with most of its attributes unassigned. I'm not sure how it's creating that MaskedColumn because it seems like it is neither calling its __new__ or its __array_finalize__.

@embray
Copy link
Member Author
embray commented Sep 18, 2014

Ah, my mistake. __array_finalize__ is called when the output array is converted to a MaskedColumn, but since the obj argument is None (the array is not being finalized from an existing MaskedColumn).

However, according to the Numpy docs we should only have obj=None if the array construct was called explicitly. If that were the case all the relevant attributes would be created with the appropriate defaults. However, in testing this I set a breakpoint on __new__ and it is never entered. So this is a change in behavior in Numpy and it's not clear whether or not it's intentional.

@astrofrog
Copy link
Member

cc @mhvk, who might have some ideas here.

…lumn

(which prevents reading of IERS_A files)
@mhvk
Copy link
Contributor
mhvk commented Sep 19, 2014

I think the change is that in numpy 1.8 and before, np.where always returned a plain array. In principle, this of course is not good, but it does work for just setting column data. Hence, we can work around this bug by simply changing to

table['UT1_UTC'] = np.where(table['UT
8000
1_UTC_B'].mask,
                            table['UT1_UTC_A'].data,
                            table['UT1_UTC_B'].data)

I made this into a PR, but do think we should try to do something better. I'll see if I can have a look at how np.where is actually implemented.

@embray
Copy link
Member Author
embray commented Sep 19, 2014

I don't think the issue is necessarily on np.where per se. Something else changed in how and when __array_finalize__ gets called. But this workaround is definitely fine for now.

@astrofrog
Copy link
Member

👍

@mhvk
Copy link
Contributor
mhvk commented Sep 19, 2014

I looked a bit more at np.where (in multiarraymodule.c) and it is non-trivial to get that to work safely with subclasses. In principle, in MaskedColumn, we could ensure this particular error disappears, but my sense is that that is not worth it. Indeed, as is, numbers are just copied from one array or the other, so if units are different, the result will be wrong, so it is not necessarily helpful to have np.where "work".

@embray
Copy link
Member Author
embray commented Sep 19, 2014

I think this may need to be raised upstream. I can imagine how the current behavior might be intentional, but it's also surprising and undocumented, and it's not clear how best to handle this case.

@mhvk
Copy link
Contributor
mhvk commented Sep 21, 2014

@embray -- see numpy/numpy#5095. It also has a suggestion for what I think would be the ideal behaviour. Not sure how to implement it myself, but hopefully one of the people with a better sense of the numpy internals can help!

@embray
Copy link
Member Author
embray commented Sep 22, 2014

Well this fix is good enough for now. We can deal with the upstream issue later.

embray added a commit that referenced this pull request Sep 22, 2014
doctest failure in time.core with --remote-data
@embray embray merged commit f99c8ec into astropy:master Sep 22, 2014
@mhvk mhvk deleted the iers-table-numpy-1.9-bug branch September 22, 2014 17:56
embray added a commit that referenced this pull request Sep 22, 2014
doctest failure in time.core with --remote-data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0