Unparseable: local variable 'enc' referenced before assignment

Hi there!

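Extraction no longer works when you pass in strings that have already been decoded to unicode. The fix looks pretty trivial, though: judging by the traceback below, enc in build_doc appears to be assigned only on the branch that decodes the input, so it could be initialized with None up front, unless that would cause problems in other parts of the code.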
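Here is a minimal sketch of the initialization I have in mind. It is not the library's actual code: decode_page and guess_encoding are hypothetical names, the lxml parsing step is left out, and I'm assuming the function branches on whether the input is already decoded (only lines 15 to 17 of build_doc are visible in the traceback). It uses Python 3 naming (str/bytes); on Python 2.7 the check would be against unicode instead.

```python
def guess_encoding(page):
    # Hypothetical stand-in for whatever detection step the library runs.
    return "utf-8"


def decode_page(page):
    """Return (text, encoding); encoding is None for already-decoded input."""
    # Initialized up front so the return below never hits an unbound name.
    enc = None
    if isinstance(page, str):
        # Already decoded: nothing to detect, enc stays None.
        text = page
    else:
        enc = guess_encoding(page)
        text = page.decode(enc, "replace")
    return text, enc
```

Callers that use the returned encoding would then have to cope with None, which is the one place where this could cause problems elsewhere.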

By the way, I would discourage the use of the old chardet library. The range of encodings it can detect is very limited, and it’s slow on top of that. I’ve found cchardet to be a lot better, but there is also the excellent UnicodeDammit class in BeautifulSoup, which first looks for explicit encoding declarations and only then falls back on such implicit detection methods. Thanks to their latest refactoring, I could even remove a number of ugly hacks that I needed with the older version.
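To illustrate the "explicit first, implicit second" idea, here is a rough standard-library sketch. It is not UnicodeDammit's implementation and covers far fewer cases; detect_encoding, META_CHARSET, and the windows-1252 fallback are my own assumptions for the example.

```python
import codecs
import re

# Finds charset=... inside a <meta> tag (both the short and http-equiv forms).
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def detect_encoding(page, fallback="windows-1252"):
    """Guess the encoding of a bytes page, preferring explicit declarations."""
    # 1. Explicit: a UTF-8 byte order mark.
    if page.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2. Explicit: a charset declared near the top of the document.
    match = META_CHARSET.search(page[:2048])
    if match:
        try:
            # Normalize the label to Python's codec name.
            return codecs.lookup(match.group(1).decode("ascii")).name
        except LookupError:
            pass  # Unknown label, fall through to the implicit step.
    # 3. Implicit: accept UTF-8 if the bytes decode cleanly, else fall back.
    try:
        page.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

A real detector would also check HTTP headers and XML declarations and would use statistical detection as the last resort, which is where cchardet or similar would come in.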

/home/telofy/.buildout/eggs/readability_lxml-0.3.0.1-py2.7.egg/readability/readability.pyc in summary(self, html_partial)
    152             ruthless = True
    153             while True:
--> 154                 self._html(True)
    155                 for i in self.tags(self.html, 'script', 'style'):
    156                     i.drop_tree()

/home/telofy/.buildout/eggs/readability_lxml-0.3.0.1-py2.7.egg/readability/readability.pyc in _html(self, force)
    117     def _html(self, force=False):
    118         if force or self.html is None:
--> 119             self.html = self._parse(self.input)
    120         return self.html
    121 

/home/telofy/.buildout/eggs/readability_lxml-0.3.0.1-py2.7.egg/readability/readability.pyc in _parse(self, input)
    121 
    122     def _parse(self, input):
--> 123         doc, self.encoding = build_doc(input)
    124         doc = html_cleaner.clean_html(doc)
    125         base_href = self.options.get('url', None)

/home/telofy/.buildout/eggs/readability_lxml-0.3.0.1-py2.7.egg/readability/htmls.pyc in build_doc(page)
     15         page_unicode = page.decode(enc, 'replace')
     16     doc = lxml.html.document_fromstring(page_unicode.encode('utf-8', 'replace'), parser=utf8_parser)
---> 17     return doc, enc
     18 
     19 def js_re(src, pattern, flags, repl):

Unparseable: local variable 'enc' referenced before assignment
