Defaulting to utf-8 when chardet returns None · tanqhnguyen/python-readability@75e2e0c · GitHub

Commit 75e2e0c

Defaulting to utf-8 when chardet returns None
On articles like this one, chardet returns None: http://news.zing.vn/nhip-song-tre/thay-giao-gay-sot-tung-bo-luat-tinh-yeu/a291427.html This causes an exception later on, when encoding.lower() is called.
1 parent 0c2f29e commit 75e2e0c

1 file changed: +2 -2 lines changed

readability/encoding.py

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ def get_encoding(page):
     if not text.strip() or len(text) < 10:
         return enc # can't guess
     res = chardet.detect(text)
-    enc = res['encoding']
+    enc = res['encoding'] or 'utf-8'
     #print '->', enc, "%.2f" % res['confidence']
     enc = custom_decode(enc)
     return enc
@@ -45,4 +45,4 @@ def custom_decode(encoding):
     if encoding in alternates:
         return alternates[encoding]
     else:
-        return encoding
+        return encoding
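
The fallback in the first hunk can be tried in isolation. The snippet below is a minimal sketch, not code from python-readability: `pick_encoding` and `DEFAULT_ENCODING` are hypothetical names, and the input dicts are hand-written stand-ins, assumed to have the same shape as a chardet.detect() result, so the example runs without chardet installed.

```python
# Hypothetical helper, not part of python-readability: it isolates the
# "detected encoding or default" pattern from the diff above.
DEFAULT_ENCODING = "utf-8"


def pick_encoding(detection, default=DEFAULT_ENCODING):
    """Return a lowercased encoding name from a chardet-style result dict.

    `detection` is assumed to look like a chardet.detect() result, whose
    'encoding' value may be None when detection fails.
    """
    # `or` swaps in the default for None (and for an empty string),
    # so the .lower() call below never runs on None.
    encoding = detection.get("encoding") or default
    return encoding.lower()


# Hand-written stand-ins for detection results (assumed shape).
failed = {"encoding": None, "confidence": 0.0}
succeeded = {"encoding": "ISO-8859-1", "confidence": 0.73}

print(pick_encoding(failed))     # utf-8
print(pick_encoding(succeeded))  # iso-8859-1
```

Without the `or default` part, the first call would raise AttributeError, because None has no lower() method. That is the failure the commit message describes.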

0 commit comments
