This is a Python wrapper for the Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml) from Stanford University's NLP group. It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (requiring 3 GB of RAM and usually a few minutes of loading time), most applications will probably want to run it as a server.

It requires [pexpect](http://www.noah.org/wiki/pexpect). It also bundles [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/), so you do not need to install those separately.

There's not much to this script. I decided to create it after having trouble initializing a JVM using JPype on two different machines.

It runs the Stanford CoreNLP jar in a separate process, communicates with the Java process through its command-line interface, and makes assumptions about the parser's output in order to convert it into a Python dict and transfer it as JSON. The parser will break if the output format changes significantly. I have only tested this on **CoreNLP tools version 1.0.2**, released 2010-11-12.
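To illustrate what "making assumptions about the output" involves, here is a minimal sketch that turns one sentence of CoreNLP-style command-line output into the same kind of dict the server returns (`text`, per-word attributes, and dependency `tuples`). The input format used here (a sentence header, the raw text, bracketed `Key=Value` token attributes, and `rel(governor-N, dependent-M)` dependency lines) is an assumption for illustration, not a verified transcript of version 1.0.2's output, and `parse_sentence` is a hypothetical helper rather than this repository's actual parser.

```python
import json
import re

# ASSUMED sample of CoreNLP command-line output for one sentence.
# The real format of version 1.0.2 may differ; this is only for illustration.
SAMPLE = """Sentence #1 (2 tokens):
hello world
[Word=hello CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=JJ Lemma=hello NamedEntityTag=O] [Word=world CharacterOffsetBegin=6 CharacterOffsetEnd=11 PartOfSpeech=NN Lemma=world NamedEntityTag=O]
amod(world-2, hello-1)
"""

# One "[Key=Value Key=Value ...]" group per token.
TOKEN_RE = re.compile(r"\[([^\]]+)\]")
# Dependency lines such as "amod(world-2, hello-1)"; the -N word indices are dropped.
DEP_RE = re.compile(r"^(\w+)\((.+)-\d+, (.+)-\d+\)$")


def parse_sentence(block):
    """Hypothetical helper: convert one sentence block into a dict."""
    lines = block.strip().splitlines()
    sentence = {"text": lines[1], "words": {}, "tuples": []}
    # The third line holds the bracketed token attributes.
    for attrs in TOKEN_RE.findall(lines[2]):
        fields = dict(pair.split("=", 1) for pair in attrs.split())
        word = fields.pop("Word")
        sentence["words"][word] = fields
    # Remaining lines that look like dependencies become [relation, governor, dependent].
    for line in lines[3:]:
        match = DEP_RE.match(line.strip())
        if match:
            sentence["tuples"].append(list(match.groups()))
    return sentence


print(json.dumps([parse_sentence(SAMPLE)], sort_keys=True))
```

Keying `words` by the word itself mirrors the example result in this README, but it means a word that occurs twice in one sentence overwrites its earlier entry.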
## Download and Usage

First, [download](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpack the tgz file containing Stanford's CoreNLP package. Then copy all of the Python files from this repository into the `stanford-corenlp-2010-11-12` folder.

In other words:

    sudo pip install pexpect
    wget http://nlp.stanford.edu/software/stanford-corenlp-v1.0.2.tgz
    tar xvfz stanford-corenlp-v1.0.2.tgz
    cd stanford-corenlp-2010-11-12
    git clone git://github.com/dasmith/stanford-corenlp-python.git
    mv stanford-corenlp-python/* .
Then, to launch a server:

    python server.py

Optionally, you can specify a host or port:

    python server.py -H 0.0.0.0 -p 3456

That runs a public JSON-RPC server on port 3456, listening on all network interfaces (`0.0.0.0`).
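Any JSON-RPC client can talk to the server. Assuming it speaks JSON-RPC 2.0 (the client example in `client.py` uses `jsonrpc.JsonRpc20()`), a call to `parse` is a request object like the one built below, and the reply carries either a `result` or an `error` member. This sketch only builds and decodes messages as the JSON-RPC 2.0 specification defines them; `make_request` and `read_response` are hypothetical helpers, and how the bundled `TransportTcpIp` frames messages on the socket is not shown.

```python
import json


def make_request(method, params, request_id):
    """Hypothetical helper: serialize a JSON-RPC 2.0 request object."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,  # positional parameters as a JSON array
        "id": request_id,  # echoed back by the server in its response
    })


def read_response(text, request_id):
    """Hypothetical helper: return the result, or raise on a JSON-RPC error."""
    response = json.loads(text)
    # Check for an error first: an error response may carry a null id.
    if "error" in response:
        error = response["error"]
        raise RuntimeError("JSON-RPC error %s: %s" % (error.get("code"), error.get("message")))
    if response.get("id") != request_id:
        raise ValueError("response id does not match request id")
    return response["result"]


request = make_request("parse", ["hello world"], 1)
print(request)
```

Checking `error` before `id` matters because the specification allows an error response to carry a null `id` when the server could not read the request at all.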
Assuming the server is running on port 8080, the code in `client.py` shows an example parse:

    import jsonrpc
    server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
                                 jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))

    result = server.parse("hello world")
    print "Result", result

This produces:

    Result [{"text": "hello world", "tuples": [["amod", "world", "hello"]], "words": {"world": {"NamedEntityTag": "O", "CharacterOffsetEnd": "11", "Lemma": "world", "PartOfSpeech": "NN", "CharacterOffsetBegin": "6"}, "hello": {"NamedEntityTag": "O", "CharacterOffsetEnd": "5", "Lemma": "hello", "PartOfSpeech": "JJ", "CharacterOffsetBegin": "0"}}}]
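The printed result appears to be JSON text (note the double quotes), so a client would decode it with `json.loads` before using it. Assuming that shape (a list of sentences, each holding `text`, `words`, and `tuples`), the minimal sketch below pulls out part-of-speech tags in sentence order and the dependency triples. `RESULT` is the sample output above pasted in as a string, and `summarize` is a hypothetical helper.

```python
import json

# The sample output shown above, as the JSON text the server appears to return.
RESULT = ('[{"text": "hello world", "tuples": [["amod", "world", "hello"]], '
          '"words": {"world": {"NamedEntityTag": "O", "CharacterOffsetEnd": "11", '
          '"Lemma": "world", "PartOfSpeech": "NN", "CharacterOffsetBegin": "6"}, '
          '"hello": {"NamedEntityTag": "O", "CharacterOffsetEnd": "5", '
          '"Lemma": "hello", "PartOfSpeech": "JJ", "CharacterOffsetBegin": "0"}}}]')


def summarize(result_text):
    """Hypothetical helper: per sentence, (word, POS) pairs in order plus dependency triples."""
    summary = []
    for sentence in json.loads(result_text):
        words = sentence["words"]
        # "words" is keyed by word, so sort by character offset (sent as a string)
        # to recover the order in which the words appear in the sentence.
        ordered = sorted(words, key=lambda w: int(words[w]["CharacterOffsetBegin"]))
        tags = [(word, words[word]["PartOfSpeech"]) for word in ordered]
        # Each tuple is [relation, governor, dependent].
        deps = [tuple(dep) for dep in sentence["tuples"]]
        summary.append({"text": sentence["text"], "tags": tags, "deps": deps})
    return summary


for item in summarize(RESULT):
    print("%s: %s %s" % (item["text"], item["tags"], item["deps"]))
```

Sorting on `CharacterOffsetBegin` is needed because `words` is a JSON object, and its key order is not something a client should rely on.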
<!--
## Adding WordNet