.fill called with unicode scalar Python3 on Windows yields UnicodeDecodeError when accessing the array values · Issue #7227 · numpy/numpy · GitHub

.fill called with unicode scalar Python3 on Windows yields UnicodeDecodeError when accessing the array values #7227

Closed
lesteve opened this issue Feb 11, 2016 · 6 comments

Comments

lesteve (Contributor) commented Feb 11, 2016

Note: this works fine with Python 2.7 on Windows.

Snippet:

import numpy as np

arr = np.empty(3, dtype='<U3')
unicode_scalar = np.array(['asd'])[0]
arr.fill(unicode_scalar.tolist())  # works with python str
print(arr)

arr.fill(unicode_scalar)  # fills the array with some garbage
print(arr)  # UnicodeDecodeError

Full Exception:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-8-5a844f3e15b5> in <module>()
----> 1 print(arr)  # UnicodeDecodeError

C:\Users\lesteve\Miniconda3\lib\site-packages\numpy\core\numeric.py in array_str(a, max_line_width, precision, suppress_small)
   1773
   1774     """
-> 1775     return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
   1776
   1777 def set_string_function(f, repr=True):

C:\Users\lesteve\Miniconda3\lib\site-packages\numpy\core\arrayprint.py in array2string(a, max_line_width, precision, suppress_small, separator, prefix, style, formatter)
    445     else:
    446         lst = _array2string(a, max_line_width, precision, suppress_small,
--> 447                             separator, prefix, formatter=formatter)
    448     return lst
    449

C:\Users\lesteve\Miniconda3\lib\site-packages\numpy\core\arrayprint.py in _array2string(a, max_line_width, precision, suppress_small, separator, prefix, formatter)
    323     lst = _formatArray(a, format_function, len(a.shape), max_line_width,
    324                        next_line_prefix, separator,
--> 325                        _summaryEdgeItems, summary_insert)[:-1]
    326     return lst
    327

C:\Users\lesteve\Miniconda3\lib\site-packages\numpy\core\arrayprint.py in _formatArray(a, format_function, rank, max_line_len, next_line_prefix, separator, edge_items, summary_insert)
    491
    492         for i in range(trailing_items, 1, -1):
--> 493             word = format_function(a[-i]) + separator
    494             s, line = _extendLine(s, line, word, max_line_len, next_line_prefix)
    495

UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)

lesteve (Contributor, Author) commented Feb 11, 2016

I forgot to say, this is with numpy version 1.10.4.

lesteve (Contributor, Author) commented Feb 11, 2016

Same thing happens with numpy master, numpy version '1.12.0.dev0+920c527'.

eric-wieser (Member) commented Apr 6, 2017

What is sys.maxunicode on your system? Seems related to #8901.

Also duplicates #3258.

Essentially, UCS2-encoded text (2 bytes per character) is being copied into a UCS4 buffer (4 bytes per character), and everything that you would expect to go wrong goes wrong.
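
To illustrate the mechanism described above, here is a minimal pure-Python sketch. It is not NumPy's actual copy routine. It only simulates, as an assumption, what happens if the UTF-16 (UCS2) bytes of 'asd' land at the start of a 12-byte '<U3' item that is later decoded as UTF-32. The zero padding is also an assumption, since np.empty leaves that memory uninitialized.

```python
# Simulation only: mimics UCS2 bytes being placed into a UCS4 array item.
text = "asd"

ucs2_bytes = text.encode("utf-16-le")   # 2 bytes per character -> 6 bytes
item_size = 4 * len(text)               # a '<U3' item holds 3 * 4 = 12 bytes

# Assumption: the rest of the item is zero-filled; with np.empty it could be anything.
item = ucs2_bytes.ljust(item_size, b"\x00")

# The first "UCS4 character" is really the code units of 'a' and 's' read as one value.
first_code_point = int.from_bytes(item[:4], "little")
print(hex(first_code_point))            # 0x730061, above the Unicode maximum 0x10ffff

decode_error = None
try:
    item.decode("utf-32-le")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)
```

The out-of-range value in the first four bytes is consistent with the "position 0-3" reported in the traceback, though that match alone does not prove this is the exact code path NumPy took.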

lesteve (Contributor, Author) commented Apr 7, 2017

Hmmm, I am not able to reproduce this anymore, even with the numpy version mentioned above. Not sure what changed; maybe the Python from conda I was using at the time was a narrow build.

For completeness, on my current system where I am unable to reproduce the problem, sys.maxunicode is 1114111.
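
For anyone who wants to run the same check, a small sketch follows. The helper name describe_build is hypothetical and introduced only for illustration. It classifies a sys.maxunicode value as coming from a narrow (UCS2) or wide (UCS4) build.

```python
import sys

NARROW_MAX = 0xFFFF      # 65535, the value reported by narrow (UCS2) builds
WIDE_MAX = 0x10FFFF      # 1114111, the value reported by wide (UCS4) builds


def describe_build(maxunicode):
    """Hypothetical helper: classify a sys.maxunicode value."""
    if maxunicode == NARROW_MAX:
        return "narrow"
    if maxunicode == WIDE_MAX:
        return "wide"
    return "unknown"


print(sys.maxunicode, describe_build(sys.maxunicode))
```

A result of "wide" corresponds to the 1114111 value quoted above.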

lesteve (Contributor, Author) commented Apr 7, 2017

I am going to close this issue since it is a duplicate of #3258.

lesteve closed this as completed Apr 7, 2017

eric-wieser (Member) commented Apr 7, 2017

> sys.maxunicode is 1114111

Yep, you're no longer on a narrow build.

> I am going to close this

I've added a brief summary of this issue to #3258, then.
