weird UnicodeDecodeError in StanfordCoreNLP.parse() · Issue #19 · dasmith/stanford-corenlp-python · GitHub
weird UnicodeDecodeError in StanfordCoreNLP.parse() #19
Open
@arne-cl

Hi Dustin,

I just found a really weird error. While corenlp parses '100 dollars' just fine, '100 yen' makes parse() raise a UnicodeDecodeError.

Python 2.7.3 (default, Feb 27 2014, 19:37:34) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import corenlp
>>> c = corenlp.StanfordCoreNLP()
Loading Models: 5/5
>>> c.parse('100 dollars')
'{"sentences": [{"parsetree": "(ROOT (X (NP (CD 100) (NNS dollars))))", "text": "100 dollars", "dependencies": [["root", "ROOT", "dollars"], ["num", "dollars", "100"]], "words": [["100", {"NormalizedNamedEntityTag": "$100.0", "Lemma": "100", "CharacterOffsetEnd": "3", "PartOfSpeech": "CD", "CharacterOffsetBegin": "0", "NamedEntityTag": "MONEY"}], ["dollars", {"NormalizedNamedEntityTag": "$100.0", "Lemma": "dollar", "CharacterOffsetEnd": "11", "PartOfSpeech": "NNS", "CharacterOffsetBegin": "4", "NamedEntityTag": "MONEY"}]]}]}'

>>> c.parse('100 yen')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/corenlp-3.4.1-py2.7.egg/corenlp.py", line 240, in parse
    response = self._parse(text)
  File "/usr/local/lib/python2.7/dist-packages/corenlp-3.4.1-py2.7.egg/corenlp.py", line 230, in _parse
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 169: ordinal not in range(128)

Any ideas?
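
One guess, not verified against the corenlp.py source: byte 0xc2 is the lead byte of several two-byte UTF-8 sequences, including the yen sign (U+00A5, encoded as 0xc2 0xa5). Since '100 dollars' comes back with "$100.0" as its NormalizedNamedEntityTag, '100 yen' might come back with "¥100.0", and decoding that output as ASCII would fail in exactly this way. The sketch below reproduces the error without corenlp; the `raw` byte string is a made-up stand-in for the server output, not something captured from the real process.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the raw output of the parser for '100 yen'.
# The real bytes were not captured; this only assumes a yen sign appears.
raw = u'{"NormalizedNamedEntityTag": "\u00a5100.0"}'.encode('utf-8')

# Decoding UTF-8 bytes with the ASCII codec fails on the first byte >= 0x80.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as e:
    ascii_error = e

# Decoding the same bytes as UTF-8 succeeds and preserves the yen sign.
decoded = raw.decode('utf-8')

print(ascii_error)
print(repr(decoded))
```

If that guess is right, explicitly decoding the subprocess output as UTF-8 before building the JSON response might avoid the crash, but I have not checked where the implicit ASCII decode happens.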
