Bug with NumPy loadtxt() and unicode strings #4600
Comments
OP here. Just to correct/clarify the above: I used Python/NumPy on Linux, but the files were created on a Windows PC.
The text loading functions are broken with respect to Unicode and non-Latin encodings, especially in Python 3. Please check whether gh-4208 helps, and make sure to give the function the right encoding.
I also ran into a problem reading files with non-ASCII encodings, such as files written in Japanese.
Nice, it's useful to replace
This is a pretty old issue and I'm not sure how much of it is still relevant. The closest thing I can find to a reproducer is from one of the linked SO posts, where a user has trouble loading "Côte d'Ivoire" from an ISO-8859 encoded file. This should work using loadtxt's encoding keyword:

>>> import io
>>> import numpy as np
>>> fh = io.BytesIO("Côte d'Ivoire".encode('iso-8859-1'))
>>> fh.getvalue()
b"C\xf4te d'Ivoire"
>>> # Note: use delimiter=',' to prevent a split at the space
>>> np.loadtxt(fh, dtype="U", delimiter=",", encoding='iso-8859-1')
array("Côte d'Ivoire", dtype='<U13')

Note that the default value for

I'm going to close this, hoping that the original issue is either obsolete or resolved by e.g. the above example. If the issue persists or there are related file encoding issues, please reopen or open a new issue with a minimal reproducing example.
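For anyone landing here with a file on disk rather than an in-memory buffer, here is a minimal sketch of the same idea. The file name, column labels, and values are made up for illustration; the only assumptions taken from this thread are an ISO-8859-1 encoded file, CRLF line terminators, and a label line containing a mu (written here as the micro sign, byte 0xB5 in ISO-8859-1). It assumes a NumPy version whose loadtxt accepts the encoding keyword.

```python
import os
import tempfile

import numpy as np

# Hypothetical file contents: one label line with a micro sign, then numbers.
# CRLF terminators mimic a file produced on Windows.
raw = "time [\u00b5s],value\r\n1.0,2.0\r\n3.0,4.0\r\n".encode("iso-8859-1")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(raw)

    # Read the label line separately, decoding it with the file's encoding.
    with open(path, encoding="iso-8859-1") as f:
        labels = f.readline().rstrip("\r\n").split(",")

    # Skip the label line and parse the numeric rows.
    data = np.loadtxt(path, delimiter=",", skiprows=1, encoding="iso-8859-1")

print(labels)
print(data)
```

If the encoding passed here does not match the file, decoding the label line is the most likely place for an error or garbled text to show up, so it may be worth checking that first.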
Please refer to this question posted on Stack Overflow::
The OP uses Windows and an ISO-8859 text file created by Linux with very long lines, with CRLF line terminators. When reading into NumPy, except the first line which contains labels (with special characters, usually only the Greek mu):
Python 2.7.6, Numpy 1.8.0, this works perfectly::
Python 3.4.0, Numpy 1.8.0, gives an error::
It worked with genfromtxt().
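The original snippets did not survive here, so the following is only a sketch of what a genfromtxt() call for this kind of file might look like, not the code from the Stack Overflow question. The sample data is invented, and it assumes a NumPy version where genfromtxt accepts the encoding keyword.

```python
import io

import numpy as np

# Hypothetical ISO-8859-1 data with CRLF terminators and a label line
# containing a micro sign (byte 0xB5), standing in for the reported file.
raw = "time [\u00b5s],value\r\n1.0,2.0\r\n3.0,4.0\r\n".encode("iso-8859-1")

# skip_header=1 drops the label line; encoding tells genfromtxt how to
# decode the bytes it reads from the buffer.
table = np.genfromtxt(
    io.BytesIO(raw),
    delimiter=",",
    skip_header=1,
    encoding="iso-8859-1",
)

print(table)
```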