# Python interface to Stanford Core NLP tools v3.4.1
This is a Python wrapper for the Stanford NLP Group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml). It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (requiring 3 GB of RAM on 64-bit machines and usually a few minutes of loading time), most applications will probably want to run it as a server.
* Outputs parse trees which can be used by [nltk](http://nltk.googlecode.com/svn/trunk/doc/howto/tree.html).
It requires [pexpect](http://www.noah.org/wiki/pexpect) and [unidecode](http://pypi.python.org/pypi/Unidecode) to handle non-ASCII text. This script includes and uses code from [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/).
It runs the Stanford CoreNLP jar in a separate process, communicates with the Java process through its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON. The parser will break if the output changes significantly, but it has been tested on **Core NLP tools version 3.4.1**, released 2014-08-27.
## Download and Usage
To use this program you must [download](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpack the compressed file containing Stanford's CoreNLP package. By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of the directory the script is run from; in other words, unpack the CoreNLP archive into the same directory as `corenlp.py`.
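As a quick sanity check of that layout (a sketch: the folder name below matches the 3.4.1 release and will differ for other versions), you can verify the unpacked distribution is where `corenlp.py` expects it:

```python
import os

# corenlp.py expects the unpacked CoreNLP distribution in a subdirectory
# of the directory you run it from, e.g.:
corenlp_dir = "stanford-corenlp-full-2014-08-27"

print os.path.isdir(corenlp_dir)  # should be True
print sorted(f for f in os.listdir(corenlp_dir) if f.endswith(".jar"))
```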
Assuming you are running on port 8080, the code in `client.py` shows an example:
```python
import jsonrpc
from simplejson import loads

server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
                             jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))

result = loads(server.parse("Hello world. It is so beautiful"))
print "Result", result
```
To use it in a regular script, or to edit and debug it (because errors via RPC are opaque), import the module directly:

```python
from corenlp import StanfordCoreNLP

corenlp = StanfordCoreNLP()  # wait a few minutes...
corenlp.parse("Parse it")
```
## Coreference Resolution
The library supports [coreference resolution](http://en.wikipedia.org/wiki/Coreference), meaning pronouns can be "dereferenced." If an entry in the `coref` list is `[u'Hello world', 0, 1, 0, 2]`, the numbers mean:
* 0 = The reference appears in the 0th sentence (i.e. "Hello world")
* 1 = The 2nd token, "world", is the [headword](http://en.wikipedia.org/wiki/Head_%28linguistics%29) of the reference
* 0 = 'Hello world' begins at the 0th token in the sentence
* 2 = 'Hello world' ends before the 2nd token in the sentence
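As a minimal sketch of how those indices line up, using the entry format described above and an illustrative token list for sentence 0:

```python
# Hypothetical coref entry, in the format described above.
entry = [u'Hello world', 0, 1, 0, 2]
text, sentence_idx, head_idx, start, end = entry

# Illustrative tokens for sentence 0; real tokens come from the parse output.
tokens = [u'Hello', u'world', u'.']

print tokens[start:end]  # [u'Hello', u'world'] -- the mention span
print tokens[head_idx]   # u'world' -- the headword
```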
<!--
## Adding WordNet

Note: wordnet doesn't seem to be supported using this approach. Looks like you'll need Java.
-->
**Stanford CoreNLP tools require a large amount of free memory**. Java 5+ uses about 50% more RAM on 64-bit machines than on 32-bit machines. Users of 32-bit machines can lower the memory requirements by changing `-Xmx3g` to `-Xmx2g` or even less.
If pexpect times out while loading models, check that you have enough memory and that you can run the server alone without your kernel killing the Java process.
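One way to run that standalone check from Python (a sketch: the directory name is from the 3.4.1 release, and the main class is CoreNLP's standard command-line entry point; verify both against the command `corenlp.py` itself builds):

```python
import subprocess

# Start the CoreNLP pipeline directly, bypassing pexpect, to see whether
# the JVM survives model loading with -Xmx3g on this machine.
subprocess.call([
    "java", "-Xmx3g",
    "-cp", "stanford-corenlp-full-2014-08-27/*",  # Java expands the * itself
    "edu.stanford.nlp.pipeline.StanfordCoreNLP",
])
```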
You can reach me, Dustin Smith, by sending a message on GitHub or through email (contact information is available [on my webpage](http://web.media.mit.edu/~dustin)).
# License & Contributors
This is free and open source software and has benefited from the contribution and feedback of others. Like Stanford's CoreNLP tools, it is covered under the [GNU General Public License v2+](http://www.gnu.org/licenses/gpl-2.0.html), which in short means that modifications to this program must maintain the same free and open source distribution policy.
I gratefully welcome bug fixes and new features. If you have forked this repository, please submit a [pull request](https://help.github.com/articles/using-pull-requests/) so others can benefit from your contributions. This project has already benefited from contributions from these members of the open source community:
* [Emilio Monti](https://github.com/emilmont)
* [Justin Cheng](https://github.com/jcccf)
* Abhaya Agarwal
*Thank you!*

Maintainers of the Core NLP library at Stanford keep an [updated list of wrappers and extensions](http://nlp.stanford.edu/software/corenlp.shtml#Extensions).