updated README · ez-max/stanford-corenlp-python@1a64f9f · GitHub

Commit 1a64f9f
updated README
1 parent 4790d45 commit 1a64f9f

File tree: 2 files changed (+17 −11 lines)

README.md

Lines changed: 4 additions & 7 deletions

```diff
@@ -2,16 +2,15 @@
 
 This a Python wrapper for Stanford University's NLP group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml). It can either be imported as a module or run as an JSON-RPC server. Because it uses many large trained models (requiring 3GB RAM and usually a few minutes loading time), most applications will probably want to run it as a server.
 
-There's not much to this script.
+It requires [pexpect](http://www.noah.org/wiki/pexpect) and uses [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/), which are included.
 
-It requires `pexpect`.
-
-This uses [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/), which are included in this repository.
+There's not much to this script. I decided to create it after having trouble initializing the JVM through JPypes on two different machines.
 
+It runs the Stanford CoreNLP jar in a separate process, communicates with the java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON. The parser will break if the output changes significantly. I have only tested this on **Core NLP tools version 1.0.2** released 2010-11-12.
 
 ## Download and Usage
 
-You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's Core-NLP package. Then copy all of the python files from this repository into the `stanford-corenlp-2010-11-12` folder.
+You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's CoreNLP package. Then copy all of the python files from this repository into the `stanford-corenlp-2010-11-12` folder.
 
 Then, to launch a server:
 
@@ -33,8 +32,6 @@ Download WordNet-3.0 Prolog: http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.
 
 ## Questions
 
-I have only tested this on **Core NLP tools version 1.0.2** released 2010-11-12.
-
 If you think there may be a problem with this wrapper, first ensure you can run the Java program:
 
     java -cp stanford-corenlp-2010-11-12.jar:stanford-corenlp-models-2010-11-06.jar:xom-1.2.6.jar:xom.jar:jgraph.jar:jgrapht.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props default.properties
```
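The classpath in the command above is the same one server.py assembles with `':'.join(jars)`. A minimal sketch of that assembly (jar names taken from the command above; `:` is the Unix classpath separator, Windows would need `;`):

```python
# jar names copied from the java command in the README
jars = [
    "stanford-corenlp-2010-11-12.jar",
    "stanford-corenlp-models-2010-11-06.jar",
    "xom-1.2.6.jar",
    "xom.jar",
    "jgraph.jar",
    "jgrapht.jar",
]

# Unix-style classpath, mirroring server.py's ':'.join(jars)
classpath = ":".join(jars)
command = ("java -cp %s -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP "
           "-props default.properties" % classpath)
print(command)
```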

server.py

Lines changed: 13 additions & 4 deletions

```diff
@@ -86,14 +86,17 @@ def __init__(self):
 
         classname = "edu.stanford.nlp.pipeline.StanfordCoreNLP"
         javapath = "java"
+        # include the properties file, so you can change defaults
+        # but any changes in output format will break parse_parser_results()
+        props = "-props default.properties"
 
         for jar in jars:
             if not os.path.exists(jar):
                 print "Error! Cannot locate %s" % jar
                 sys.exit(1)
 
         # spawn the server
-        self._server = pexpect.spawn("%s -Xmx3g -cp %s %s" % (javapath, ':'.join(jars), classname))
+        self._server = pexpect.spawn("%s -Xmx3g -cp %s %s %s" % (javapath, ':'.join(jars), classname, props))
 
         print "Starting the Stanford Core NLP parser."
         # show progress bar while loading the models
@@ -111,7 +114,8 @@ def __init__(self):
             pbar.update(5)
         self._server.expect("Entering interactive shell.")
         pbar.finish()
-        print self._server.before
+        print "Server loaded."
+        #print self._server.before
 
     def parse(self, text):
         """
@@ -121,7 +125,9 @@ def parse(self, text):
         """
         print "Request", text
         print self._server.sendline(text)
-        max_expected_time = 2 + len(text) / 200.0
+        # How much time should we give the parser to parse it?
+        #
+        max_expected_time = min(5, 2 + len(text) / 200.0)
         print "Timeout", max_expected_time
         end_time = time.time() + max_expected_time
         incoming = ""
@@ -131,8 +137,11 @@ def parse(self, text):
             freshlen = len(ch)
             time.sleep(0.0001)
             incoming = incoming + ch
-            if "\nNLP>" in incoming or end_time - time.time() < 0:
+            if "\nNLP>" in incoming:
                 break
+            if end_time - time.time() < 0:
+                return dumps({'error': "timed out after %f seconds" %
+                              max_expected_time, 'output': incoming})
         results = parse_parser_results(incoming)
         print "Results", results
         # convert to JSON and return
```
0 commit comments