import works with simplejson or json. Removed parse_imperative() bec… · aayn/stanford-corenlp-python@d486ad2 · GitHub

Commit d486ad2

import works with simplejson or json. Removed parse_imperative() because the new Stanford Parser seems to handle imperatives well.
1 parent 4f4edbd commit d486ad2

5 files changed

+247
-93
lines changed


LICENSE

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
1+
GNU LESSER GENERAL PUBLIC LICENSE
2+
Version 3, 29 June 2007
3+
4+
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
5+
Everyone is permitted to copy and distribute verbatim copies
6+
of this license document, but changing it is not allowed.
7+
8+
9+
This version of the GNU Lesser General Public License incorporates
10+
the terms and conditions of version 3 of the GNU General Public
11+
License, supplemented by the additional permissions listed below.
12+
13+
0. Additional Definitions.
14+
15+
As used herein, "this License" refers to version 3 of the GNU Lesser
16+
General Public License, and the "GNU GPL" refers to version 3 of the GNU
17+
General Public License.
18+
19+
"The Library" refers to a covered work governed by this License,
20+
other than an Application or a Combined Work as defined below.
21+
22+
An "Application" is any work that makes use of an interface provided
23+
by the Library, but which is not otherwise based on the Library.
24+
Defining a subclass of a class defined by the Library is deemed a mode
25+
of using an interface provided by the Library.
26+
27+
A "Combined Work" is a work produced by combining or linking an
28+
Application with the Library. The particular version of the Library
29+
with which the Combined Work was made is also called the "Linked
30+
Version".
31+
32+
The "Minimal Corresponding Source" for a Combined Work means the
33+
Corresponding Source for the Combined Work, excluding any source code
34+
for portions of the Combined Work that, considered in isolation, are
35+
based on the Application, and not on the Linked Version.
36+
37+
The "Corresponding Application Code" for a Combined Work means the
38+
object code and/or source code for the Application, including any data
39+
and utility programs needed for reproducing the Combined Work from the
40+
Application, but excluding the System Libraries of the Combined Work.
41+
42+
1. Exception to Section 3 of the GNU GPL.
43+
44+
You may convey a covered work under sections 3 and 4 of this License
45+
without being bound by section 3 of the GNU GPL.
46+
47+
2. Conveying Modified Versions.
48+
49+
If you modify a copy of the Library, and, in your modifications, a
50+
facility refers to a function or data to be supplied by an Application
51+
that uses the facility (other than as an argument passed when the
52+
facility is invoked), then you may convey a copy of the modified
53+
version:
54+
55+
a) under this License, provided that you make a good faith effort to
56+
ensure that, in the event an Application does not supply the
57+
function or data, the facility still operates, and performs
58+
whatever part of its purpose remains meaningful, or
59+
60+
b) under the GNU GPL, with none of the additional permissions of
61+
this License applicable to that copy.
62+
63+
3. Object Code Incorporating Material from Library Header Files.
64+
65+
The object code form of an Application may incorporate material from
66+
a header file that is part of the Library. You may convey such object
67+
code under terms of your choice, provided that, if the incorporated
68+
material is not limited to numerical parameters, data structure
69+
layouts and accessors, or small macros, inline functions and templates
70+
(ten or fewer lines in length), you do both of the following:
71+
72+
a) Give prominent notice with each copy of the object code that the
73+
Library is used in it and that the Library and its use are
74+
covered by this License.
75+
76+
b) Accompany the object code with a copy of the GNU GPL and this license
77+
document.
78+
79+
4. Combined Works.
80+
81+
You may convey a Combined Work under terms of your choice that,
82+
taken together, effectively do not restrict modification of the
83+
portions of the Library contained in the Combined Work and reverse
84+
engineering for debugging such modifications, if you also do each of
85+
the following:
86+
87+
a) Give prominent notice with each copy of the Combined Work that
88+
the Library is used in it and that the Library and its use are
89+
covered by this License.
90+
91+
b) Accompany the Combined Work with a copy of the GNU GPL and this license
92+
document.
93+
94+
c) For a Combined Work that displays copyright notices during
95+
execution, include the copyright notice for the Library among
96+
these notices, as well as a reference directing the user to the
97+
copies of the GNU GPL and this license document.
98+
99+
d) Do one of the following:
100+
101+
0) Convey the Minimal Corresponding Source under the terms of this
102+
License, and the Corresponding Application Code in a form
103+
suitable for, and under terms that permit, the user to
104+
recombine or relink the Application with a modified version of
105+
the Linked Version to produce a modified Combined Work, in the
106+
manner specified by section 6 of the GNU GPL for conveying
107+
Corresponding Source.
108+
109+
1) Use a suitable shared library mechanism for linking with the
110+
Library. A suitable mechanism is one that (a) uses at run time
111+
a copy of the Library already present on the user's computer
112+
system, and (b) will operate properly with a modified version
113+
of the Library that is interface-compatible with the Linked
114+
Version.
115+
116+
e) Provide Installation Information, but only if you would otherwise
117+
be required to provide such information under section 6 of the
118+
GNU GPL, and only to the extent that such information is
119+
necessary to install and execute a modified version of the
120+
Combined Work produced by recombining or relinking the
121+
Application with a modified version of the Linked Version. (If
122+
you use option 4d0, the Installation Information must accompany
123+
the Minimal Corresponding Source and Corresponding Application
124+
Code. If you use option 4d1, you must provide the Installation
125+
Information in the manner specified by section 6 of the GNU GPL
126+
for conveying Corresponding Source.)
127+
128+
5. Combined Libraries.
129+
130+
You may place library facilities that are a work based on the
131+
Library side by side in a single library together with other library
132+
facilities that are not Applications and are not covered by this
133+
License, and convey such a combined library under terms of your
134+
choice, if you do both of the following:
135+
136+
a) Accompany the combined library with a copy of the same work based
137+
on the Library, uncombined with any other library facilities,
138+
conveyed under the terms of this License.
139+
140+
b) Give prominent notice with the combined library that part of it
141+
is a work based on the Library, and explaining where to find the
142+
accompanying uncombined form of the same work.
143+
144+
6. Revised Versions of the GNU Lesser General Public License.
145+
146+
The Free Software Foundation may publish revised and/or new versions
147+
of the GNU Lesser General Public License from time to time. Such new
148+
versions will be similar in spirit to the present version, but may
149+
differ in detail to address new problems or concerns.
150+
151+
Each version is given a distinguishing version number. If the
152+
Library as you received it specifies that a certain numbered version
153+
of the GNU Lesser General Public License "or any later version"
154+
applies to it, you have the option of following the terms and
155+
conditions either of that published version or of any later version
156+
published by the Free Software Foundation. If the Library as you
157+
received it does not specify a version number of the GNU Lesser
158+
General Public License, you may choose any version of the GNU Lesser
159+
General Public License ever published by the Free Software Foundation.
160+
161+
If the Library as you received it specifies that a proxy can decide
162+
whether future versions of the GNU Lesser General Public License shall
163+
apply, that proxy's public statement of acceptance of any version is
164+
permanent authorization for you to choose that version for the
165+
Library.

README.md

Lines changed: 23 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -8,20 +8,19 @@ There's not much to this script. I decided to create it after having problems u
88
First the JPypes approach used in [stanford-parser-python](http://projects.csail.mit.edu/spatial/Stanford_Parser) had trouble initializing a JVM on two separate computers. Next, I discovered I could not use a
99
[Jython solution](http://blog.gnucom.cc/2010/using-the-stanford-parser-with-jython/) because the Python modules I needed did not work in Jython.
1010

11-
It runs the Stanford CoreNLP jar in a separate process, communicates with the java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON. The parser will break if the output changes significantly. I have only tested this on **Core NLP tools version 1.2.0** released 2011-09-16.
11+
It runs the Stanford CoreNLP jar in a separate process, communicates with the java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON. The parser will break if the output changes significantly, but it has been tested on **Core NLP tools version 1.3.1** released 2012-04-09.
1212

1313
## Download and Usage
1414

15-
You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's CoreNLP package. Then copy all of the python files from this repository into the `stanford-corenlp-2011-09-16` folder.
15+
You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's CoreNLP package. By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of where the script is being run.
1616

1717
In other words:
1818

19-
sudo pip install pexpect
20-
wget http://nlp.stanford.edu/software/stanford-corenlp-v1.2.0.tgz
21-
tar xvfz stanford-corenlp-v1.2.0.tgz
22-
cd stanford-corenlp-2011-09-16
23-
git clone git://github.com/dasmith/stanford-corenlp-python.git
24-
mv stanford-corenlp-python/* .
19+
sudo pip install pexpect unidecode # unidecode is optional
20+
git clone git://github.com/dasmith/stanford-corenlp-python.git
21+
cd stanford-corenlp-python
22+
wget http://nlp.stanford.edu/software/stanford-corenlp-2012-04-09.tgz
23+
tar xvfz stanford-corenlp-2012-04-09.tgz
2524

2625
Then, to launch a server:
2726

@@ -45,10 +44,20 @@ Assuming you are running on port 8080, the code in `client.py` shows an example
4544

4645
That returns a list containing a dictionary for each sentence, with keys `text`, `tuples` of the dependencies, and `words`:
4746

48-
Result [{'text': 'hello world',
49-
'tuples': [['amod', 'world', 'hello']],
50-
'words': [['hello', {'NamedEntityTag': 'O', 'CharacterOffsetEnd': 5, 'CharacterOffsetBegin': 0, 'PartOfSpeech': 'JJ', 'Lemma': 'hello'}],
51-
['world', {'NamedEntityTag': 'O', 'CharacterOffsetEnd': 11, 'CharacterOffsetBegin': 6, 'PartOfSpeech': 'NN', 'Lemma': 'world'}]]}]
47+
{u'sentences': [{u'parsetree': u'(ROOT (NP (JJ hello) (NN world)))',
48+
u'text': u'hello world',
49+
u'tuples': [[u'amod', u'world', u'hello'],
50+
[u'root', u'ROOT', u'world']],
51+
u'words': [[u'hello', {u'NamedEntityTag': u'O',
52+
u'CharacterOffsetEnd': u'5',
53+
u'CharacterOffsetBegin': u'0',
54+
u'PartOfSpeech': u'UH',
55+
u'Lemma': u'hello'}],
56+
[u'world', {u'NamedEntityTag': u'O',
57+
u'CharacterOffsetEnd': u'11',
58+
u'CharacterOffsetBegin': u'6',
59+
u'PartOfSpeech': u'NN',
60+
u'Lemma': u'world'}]]}]}
5261

5362
To use it in a regular script or to edit/debug it (because errors via RPC are opaque), load the module instead:
5463

@@ -89,9 +98,6 @@ If pexpect timesout while loading models, check to make sure you have enough mem
8998

9099
You can reach me, Dustin Smith, by sending a message on GitHub or through email (contact information is available [on my webpage](http://web.media.mit.edu/~dustin)).
91100

92-
# TODO
93-
94-
- Mutex on parser
95-
- Write test functions for parsing accuracy
96-
- Calibrate parse-time prediction as function of sentence inputs
101+
# Contributors
102+
97103
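
A rough sketch of how a caller might consume the new response shape from the README hunk above, where the result is a dict with a `sentences` list rather than a bare list. The sample is copied from that hunk with the `u''` prefixes dropped. The helper name `summarize` is hypothetical and not part of the repository, and the snippet uses Python 3 syntax, whereas the repository itself targets Python 2.

```python
# Sample copied from the README hunk (u'' prefixes dropped, values unchanged).
sample = {
    "sentences": [
        {
            "parsetree": "(ROOT (NP (JJ hello) (NN world)))",
            "text": "hello world",
            "tuples": [["amod", "world", "hello"],
                       ["root", "ROOT", "world"]],
            "words": [
                ["hello", {"NamedEntityTag": "O",
                           "CharacterOffsetEnd": "5",
                           "CharacterOffsetBegin": "0",
                           "PartOfSpeech": "UH",
                           "Lemma": "hello"}],
                ["world", {"NamedEntityTag": "O",
                           "CharacterOffsetEnd": "11",
                           "CharacterOffsetBegin": "6",
                           "PartOfSpeech": "NN",
                           "Lemma": "world"}],
            ],
        }
    ]
}


def summarize(result):
    """Hypothetical helper: return (token, POS tag, begin, end) per word."""
    rows = []
    for sentence in result["sentences"]:
        for token, attrs in sentence["words"]:
            # Offsets are strings in the sample output, so convert them.
            begin = int(attrs["CharacterOffsetBegin"])
            end = int(attrs["CharacterOffsetEnd"])
            rows.append((token, attrs["PartOfSpeech"], begin, end))
    return rows


for row in summarize(sample):
    print(row)
```

Note that the character offsets appear as strings in the new sample output, so code that previously treated them as integers would likely need a conversion like the one above.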

client.py

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,18 @@
11
import jsonrpc
2-
from simplejson import loads
2+
try:
3+
import json
4+
except ImportError:
5+
import simplejson as json
6+
37
server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
48
jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))
59

610
# call a remote-procedure
7-
result = loads(server.parse("hello world"))
11+
result = json.loads(server.parse("hello world"))
812
print "Result", result
913

14+
result = json.loads(server.parse("stop smoking"))
15+
print "Result", result
1016

17+
result = json.loads(server.parse("eat dinner"))
18+
print "Result", result
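
The import change in this hunk follows a common fallback pattern: prefer the standard-library `json` module and fall back to `simplejson` only when `json` is missing. A minimal, self-contained sketch of the same pattern is below. It uses Python 3 syntax, and the payload is made up for illustration; it is not real server output.

```python
try:
    import json  # in the standard library since Python 2.6
except ImportError:
    import simplejson as json  # third-party fallback for older interpreters

# Illustrative payload only; in client.py the string comes from server.parse(...).
payload = json.dumps({"sentences": [{"text": "stop smoking"}]})

# Both modules expose the same loads/dumps interface, so the calling code
# does not need to know which one was imported.
result = json.loads(payload)
print("Result", result)
```

Binding either module to the single name `json` is what lets the rest of the file call `json.loads` and `json.dumps` unconditionally.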

corenlp.py

Lines changed: 41 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,45 @@
11
#!/usr/bin/env python
2-
"""
3-
This is a Python interface to Stanford Core NLP tools.
4-
It can be imported as a module or run as a server.
2+
#
3+
# corenlp - Python interface to Stanford Core NLP tools
4+
# Copyright (c) 2012 Dustin Smith
5+
# https://github.com/dasmith/stanford-corenlp-python
6+
#
7+
# This program is free software: you can redistribute it and/or modify
8+
# it under the terms of the GNU General Public License as published by
9+
# the Free Software Foundation, either version 3 of the License, or
10+
# (at your option) any later version.
11+
#
12+
# This program is distributed in the hope that it will be useful,
13+
# but WITHOUT ANY WARRANTY; without even the implied warranty of
14+
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15+
# GNU General Public License for more details.
16+
#
17+
# You should have received a copy of the GNU General Public License
18+
# along with this program. If not, see <http://www.gnu.org/licenses/>.
519

6-
For more details:
7-
https://github.com/dasmith/stanford-corenlp-python
8-
9-
By Dustin Smith, 2011
10-
"""
11-
from simplejson import loads, dumps
20+
try:
21+
import json
22+
except ImportError:
23+
import simplejson as json
24+
1225
import optparse
1326
import sys
1427
import os
1528
import time
1629
import re
17-
from unidecode import unidecode
30+
import logging
1831

19-
import pexpect
32+
try:
33+
from unidecode import unidecode
34+
except ImportError:
35+
logging.info("unidecode library not installed")
36+
def unidecode(text):
37+
return text
2038

21-
import jsonrpc
2239
from progressbar import *
40+
import jsonrpc
41+
42+
import pexpect
2343

2444

2545
def remove_id(word):
@@ -135,15 +155,15 @@ def __init__(self):
135155
Checks the location of the jar files.
136156
Spawns the server as a process.
137157
"""
138-
139158
jars = ["stanford-corenlp-2012-04-09.jar",
140159
"stanford-corenlp-2012-04-09-models.jar",
141160
"joda-time.jar",
142161
"xom.jar"]
143162

144163
# if CoreNLP libraries are in a different directory,
145164
# change the corenlp_path variable to point to them
146-
corenlp_path = ""
165+
corenlp_path = "stanford-corenlp-2012-04-09/"
166+
147167
java_path = "java"
148168
classname = "edu.stanford.nlp.pipeline.StanfordCoreNLP"
149169
# include the properties file, so you can change defaults
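
The docstring in this hunk says the constructor checks the location of the jar files, but the check itself is outside the lines shown. The following is only a sketch of what such a check could look like, assuming the jar names and the `corenlp_path` default from the hunk. The helper `missing_jars` is hypothetical and is not the repository's implementation.

```python
import os

# Jar names and default path as they appear in the hunk above.
JARS = ["stanford-corenlp-2012-04-09.jar",
        "stanford-corenlp-2012-04-09-models.jar",
        "joda-time.jar",
        "xom.jar"]
CORENLP_PATH = "stanford-corenlp-2012-04-09/"


def missing_jars(corenlp_path, jars=JARS):
    """Hypothetical helper: list the jars not found under corenlp_path."""
    return [jar for jar in jars
            if not os.path.exists(os.path.join(corenlp_path, jar))]


missing = missing_jars(CORENLP_PATH)
if missing:
    print("Missing jars:", ", ".join(missing))
```

Because the path is relative, the result depends on the directory the script is run from, which matches the README note that `corenlp.py` looks for the CoreNLP folder as a subdirectory of where the script is being run.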
@@ -249,61 +269,16 @@ def parse(self, text, verbose=True):
249269
if verbose: print "Request", text
250270
results = self._parse(text, verbose)
251271
if verbose: print "Results", results
252-
return dumps(results)
253-
254-
def parse_imperative(self, text, verbose=True):
255-
"""
256-
This is a hacky way to deal with imperative statements.
257-
258-
Takes an imperative, adds a personal pronoun, parses it,
259-
and then removes it in the resulting parse.
260-
261-
e.g. "open the door" gets parsed as "you open the door"
262-
"""
263-
# find a pronoun that's not in the string already.
264-
used_pronoun = None
265-
pronouns = ["you","he", "she","i"]
266-
for p in pronouns:
267-
if text.startswith(p+" "):
268-
# it's already an imperative!
269-
used_pronoun = None
270-
break
271-
if p not in text:
272-
# found one not in there already
273-
used_pronoun = p
274-
break
275-
# if you can't find one, regress to original parse
276-
if not used_pronoun:
277-
return self.parse(text, verbose)
278-
279-
# create text with pronoun and parse it
280-
new_text = used_pronoun+" "+text.lstrip()
281-
result = self._parse(new_text, verbose)
282-
283-
if len(result) != 1:
284-
print "Non-imperative sentence? Multiple sentences found."
285-
286-
# remove the dummy pronoun
287-
used_pronoun_offset = len(used_pronoun)+1
288-
if result[0].has_key('text'):
289-
result[0]['text'] = text
290-
result[0]['tuples'] = filter(lambda x: not (x[1] == used_pronoun or x[2]
291-
== used_pronoun), result[0]['tuples'])
292-
result[0]['words'] = result[0]['words'][1:]
293-
# account for offset
294-
ct = 0
295-
for word, av in result[0]['words']:
296-
for a,v in av.items():
297-
if a.startswith("CharacterOffset"):
298-
result[0]['words'][ct][1][a] = v-used_pronoun_offset
299-
ct += 1
300-
return dumps(result)
301-
else:
302-
# if there's a timeout error, just return it.
303-
return dumps(result)
272+
return json.dumps(results)
304273

305274

306275
if __name__ == '__main__':
276+
"""
277+
This block is executed when the file is run directly as a script, not when it
278+
is imported.
279+
280+
The code below starts a JSON-RPC server
281+
"""
307282
parser = optparse.OptionParser(usage="%prog [OPTIONS]")
308283
parser.add_option(
309284
'-p', '--port', default='8080',
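
The first corenlp.py hunk makes `unidecode` an optional dependency by defining a pass-through replacement when the import fails. A minimal sketch of that pattern in isolation, in Python 3 syntax with a made-up input string, might look like this:

```python
import logging

try:
    from unidecode import unidecode
except ImportError:
    logging.info("unidecode library not installed")

    def unidecode(text):
        # Fallback: return the text unchanged when unidecode is unavailable.
        return text

# Plain ASCII input comes back unchanged with either implementation.
cleaned = unidecode("hello world")
print(cleaned)
```

The trade-off is that non-ASCII input is only transliterated when the real library is installed; with the fallback it is passed through as-is.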
