Handle zip files which contain non-UTF-8 encoded files by craiga · Pull Request #75 · pyexcel/pyexcel-io · GitHub

Handle zip files which contain non-UTF-8 encoded files #75

Merged · 1 commit · Sep 19, 2020

handle zip files which contain non-UTF-8 encoded files
craiga committed Sep 16, 2020
commit 61c1195cd1672057c65d0e5135d7d3dbc48edf19
4 changes: 4 additions & 0 deletions CHANGELOG.rst
@@ -4,6 +4,10 @@ Change log
 0.6.0 - tbd
 --------------------------------------------------------------------------------
 
+#. `#74 <https://github.com/pyexcel/pyexcel-io/issues/74>`_: handle zip files which
+   contain non-UTF-8 encoded files.
+
+
 **removed**
 
 #. python 3.6 lower versions are no longer supported
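
The changelog entry is about zip members whose bytes are not UTF-8. Before this PR, `read_sheet` decoded every member with `content.decode("utf-8")`, so any other encoding failed outright. A minimal standard-library sketch reproduces that failure for a few common encodings. The sample text and the helper names `decode_as_utf8` and `utf8_outcomes` are hypothetical and are not part of pyexcel-io.

```python
from typing import Dict, List

# Hypothetical sample: CSV text modelled on the row the new test expects.
SAMPLE = "中,文,1,2,3"


def decode_as_utf8(content: bytes) -> str:
    """Mirror the pre-PR read_sheet() behaviour: always decode as UTF-8."""
    return content.decode("utf-8")


def utf8_outcomes(text: str, encodings: List[str]) -> Dict[str, str]:
    """Encode ``text`` with each codec and record whether UTF-8 decoding copes."""
    outcomes = {}
    for name in encodings:
        try:
            decode_as_utf8(text.encode(name))
            outcomes[name] = "ok"
        except UnicodeDecodeError as exc:
            # exc.reason is a short description such as "invalid start byte".
            outcomes[name] = "UnicodeDecodeError: " + exc.reason
    return outcomes


if __name__ == "__main__":
    for name, outcome in utf8_outcomes(SAMPLE, ["utf-8", "utf-16", "utf-32", "gbk"]).items():
        print(name, "->", outcome)
```

UTF-16 and UTF-32 output starts with a byte-order mark (bytes such as `0xFF`/`0xFE`) that can never begin a UTF-8 sequence. GBK multi-byte pairs generally do not form valid UTF-8 continuation sequences either.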
1 change: 1 addition & 0 deletions CONTRIBUTORS.rst
@@ -5,6 +5,7 @@
 In alphabetical order:
 
 * `Antherkiv <https://api.github.com/users/antherkiv>`_
+* `Craig Anderson <https://api.github.com/users/craiga>`_
 * `John Vandenberg <https://api.github.com/users/jayvdb>`_
 * `Stephen J. Fuhry <https://api.github.com/users/fuhrysteve>`_
 * `Stephen Rauch <https://api.github.com/users/stephenrauch>`_
5 changes: 4 additions & 1 deletion pyexcel_io/readers/csvz.py
@@ -9,6 +9,8 @@
 """
 import zipfile
 
+import chardet
+
 from pyexcel_io.sheet import NamedContent
 from pyexcel_io._compact import StringIO
 from pyexcel_io.readers.csvr import CSVinMemoryReader
@@ -43,7 +45,8 @@ def close(self):
     def read_sheet(self, index):
         name = self.content_array[index].name
         content = self.zipfile.read(self.content_array[index].payload)
-        sheet = StringIO(content.decode("utf-8"))
+        encoding_guess = chardet.detect(content)
+        sheet = StringIO(content.decode(encoding_guess["encoding"]))
 
         return CSVinMemoryReader(NamedContent(name, sheet), **self.keywords)
 
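`chardet.detect()` returns a dict whose `"encoding"` entry can be `None` when chardet cannot make a guess, as it is for an empty member. In that case `content.decode(None)` raises `TypeError`. Below is a minimal, more defensive sketch, not the merged code. It sniffs a byte-order mark with the standard library first, consults chardet only if it is installed, and falls back to UTF-8. The helper names `guess_encoding` and `decode_sheet` and the UTF-8 default are assumptions made for illustration.

```python
import codecs
from io import StringIO

try:
    import chardet  # third-party; the PR adds it to requirements.txt and setup.py
except ImportError:  # assumption: let the sketch run where chardet is absent
    chardet = None

# Checked in this order because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_encoding(content: bytes, default: str = "utf-8") -> str:
    """Pick a codec name for a zip member's raw bytes (hypothetical helper)."""
    for bom, name in _BOMS:
        if content.startswith(bom):
            return name
    if chardet is not None:
        # detect() returns {"encoding": ..., "confidence": ..., "language": ...};
        # "encoding" is None when chardet has nothing to go on.
        guess = chardet.detect(content)["encoding"]
        if guess:
            return guess
    return default


def decode_sheet(content: bytes) -> StringIO:
    """Decode member bytes into the StringIO that a CSV reader would consume."""
    return StringIO(content.decode(guess_encoding(content)))


if __name__ == "__main__":
    sample = "中,文,1,2,3"  # hypothetical CSV text
    for payload in (sample.encode("utf-32"), sample.encode("utf-16"), b""):
        print(repr(decode_sheet(payload).getvalue()))
```

Checking BOMs first keeps the UTF-16/UTF-32 case deterministic instead of depending on a statistical guess. chardet also recognises these BOMs on its own, so the main gain here is the explicit fallback when it returns `None`.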
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,2 +1,3 @@
 ordereddict;python_version<"2.7"
 lml>=0.0.4
+chardet
1 change: 1 addition & 0 deletions setup.py
@@ -74,6 +74,7 @@
 
 INSTALL_REQUIRES = [
     "lml>=0.0.4",
+    "chardet",
 ]
 SETUP_COMMANDS = {}
 
test.sh: file mode changed 100644 → 100755 (made executable); no content changes.
10 changes: 10 additions & 0 deletions tests/test_new_csvz_book.py
@@ -53,6 +53,16 @@ def test_reading(self):
         self.assertEqual(list(data["pyexcel_sheet1"]), [[u"中", u"文", 1, 2, 3]])
         zipreader.close()
 
+    def test_reading_utf32(self):
+        zip = zipfile.ZipFile(self.file, "w")
+        zip.writestr("something.ext", self.result.encode("utf-32"))
+        zip.close()
+        zipreader = self.reader_class()
+        zipreader.open(self.file)
+        data = zipreader.read_all()
+        self.assertEqual(list(data["something"]), [[u"中", u"文", 1, 2, 3]])
+        zipreader.close()
+
     def tearDown(self):
         os.unlink(self.file)
 
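The new test writes a UTF-32 member with `ZipFile.writestr` and expects the reader to recover the row. The sketch below reproduces that round trip with the standard library alone. It uses an in-memory `io.BytesIO` instead of the test's `self.file` and assumes that `self.result` is CSV text such as `中,文,1,2,3`. `ZipFile.read` hands back raw bytes with no record of which text encoding produced them, which is why the reader has to guess one. The `roundtrip` helper is hypothetical.

```python
import io
import zipfile

# Assumption: the test's self.result is CSV text along these lines.
SAMPLE = "中,文,1,2,3"


def roundtrip(text: str, encoding: str) -> bytes:
    """Store ``text`` as one encoded zip member in memory and read the raw bytes back."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("something.ext", text.encode(encoding))
    # ZipFile does not close a file object it was given, so the buffer is reusable.
    with zipfile.ZipFile(buffer) as archive:
        return archive.read("something.ext")


if __name__ == "__main__":
    raw = roundtrip(SAMPLE, "utf-32")
    print(raw == SAMPLE.encode("utf-32"))  # the archive returns the stored bytes unchanged
    print(raw.decode("utf-32") == SAMPLE)  # only the right codec recovers the text
```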