If you do not want to use JSON-RPC, you can load the module directly instead.
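A minimal sketch of that usage, assuming the wrapper exposes a `StanfordCoreNLP` class that takes the CoreNLP directory and whose `parse()` returns a JSON string (both assumptions, mirroring the JSON-RPC client example below):

    from corenlp import StanfordCoreNLP
    import json

    corenlp_dir = "stanford-corenlp-full-2013-04-04/"
    corenlp = StanfordCoreNLP(corenlp_dir)  # loading the models can take a few minutes

    # parse() is assumed to return a JSON string, as server.parse() does below
    result = json.loads(corenlp.parse("Hello world. It is so beautiful"))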
If you need to parse long texts (more than 30-50 sentences), use the batch_parse() function. It reads text files from an input directory and returns a generator of dictionaries, one holding the parse results for each file:

    from corenlp import batch_parse

    corenlp_dir = "stanford-corenlp-full-2013-04-04/"
    raw_text_directory = "sample_raw_text/"
    parsed = batch_parse(raw_text_directory, corenlp_dir)  # It returns a generator object
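Since `parsed` is a generator, nothing is parsed until you iterate over it; a minimal sketch of consuming it, assuming each yielded dictionary carries the same `sentences` key described in the original README below:

    # Iterating drives the parsing, one input file at a time
    for document in parsed:
        for sentence in document["sentences"]:
            print sentence["text"]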
The following is the README from the original stanford-corenlp-python.
-------------------------------------
# Python interface to Stanford Core NLP tools v1.3.3
This is a Python wrapper for Stanford University's NLP group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml). It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (requiring 3GB RAM on 64-bit machines and usually a few minutes loading time), most applications will probably want to run it as a server.
* Python interface to Stanford CoreNLP tools: tagging, phrase-structure parsing, dependency parsing, named entity recognition, and coreference resolution.
* Runs a JSON-RPC server that wraps the Java server and outputs JSON.
* Outputs parse trees which can be used by [nltk](http://nltk.googlecode.com/svn/trunk/doc/howto/tree.html).
It requires [pexpect](http://www.noah.org/wiki/pexpect) and (optionally) [unidecode](http://pypi.python.org/pypi/Unidecode) to handle non-ASCII text. This script includes and uses code from [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/).
It runs the Stanford CoreNLP jar in a separate process, communicates with the java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON. The parser will break if the output changes significantly, but it has been tested on **Core NLP tools version 1.3.3** released 2012-07-09.
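As an illustration of that approach (a sketch, not the wrapper's actual code), here is how pexpect can drive the CoreNLP command-line interface; the classpath and the `NLP>` prompt pattern are assumptions:

    import pexpect

    # Classpath is illustrative; point it at your unpacked CoreNLP jars
    child = pexpect.spawn(
        'java -cp "stanford-corenlp-full-2013-04-04/*" -Xmx3g '
        'edu.stanford.nlp.pipeline.StanfordCoreNLP',
        timeout=600)  # loading the models can take several minutes

    child.expect("NLP>")            # wait for the interactive prompt
    child.sendline("Hello world.")  # send a sentence to parse
    child.expect("NLP>")            # output accumulates until the next prompt
    output = child.before           # raw parser output, to be turned into a dict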
## Download and Usage
To use this program you must [download](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpack the tgz file containing Stanford's CoreNLP package. By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of where the script is being run.
In other words:
    sudo pip install pexpect unidecode   # unidecode is optional

When `corenlp.py` is then launched with a public host and port 3456, it will run a public JSON-RPC server on port 3456.
Assuming you are running on port 8080, the code in `client.py` shows an example parse:
    import jsonrpc
    from simplejson import loads

    # Connect to the JSON-RPC server started above
    server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
                                 jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))

    result = loads(server.parse("Hello world. It is so beautiful"))
    print "Result", result
That returns a dictionary containing the keys `sentences` and (when applicable) `corefs`. The key `sentences` contains a list of dictionaries for each sentence, which contain `parsetree`, `text`, `tuples` containing the dependencies, and `words`, containing information about parts of speech, NER, etc:
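For illustration only (not actual parser output), the returned structure has roughly this shape, using just the key names described above:

    {"sentences": [
        {"parsetree": "(ROOT (S ...))",   # phrase-structure parse
         "text": "Hello world.",
         "tuples": [...],                 # dependency tuples
         "words": [...]}                  # POS tags, NER labels, etc.
     ],
     "corefs": [...]}                     # present when coreference applies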
**Stanford CoreNLP tools require a large amount of free memory**. Java 5+ uses about 50% more RAM on 64-bit machines than 32-bit machines. 32-bit machine users can lower the memory requirements by changing `-Xmx3g` to `-Xmx2g` or even less.
If pexpect times out while loading models, check to make sure you have enough memory and can run the server alone, without your kernel killing the Java process:
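A sketch of such a standalone check, assuming an unpacked 2013-04-04 distribution and Java's wildcard classpath:

    java -cp "stanford-corenlp-full-2013-04-04/*" -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP

If this command fails on its own, the wrapper's timeout is not the underlying problem.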
You can reach me, Dustin Smith, by sending a message on GitHub or through email (contact information is available [on my webpage](http://web.media.mit.edu/~dustin)).
# Contributors
This is free and open source software and has benefited from the contribution and feedback of others. Like Stanford's CoreNLP tools, it is covered under the [GNU General Public License v2+](http://www.gnu.org/licenses/gpl-2.0.html), which in short means that modifications to this program must maintain the same free and open source distribution policy.
This project has benefited from the contributions of:
* @jcc Justin Cheng
* Abhaya Agarwal
## Related Projects
These two projects are Python wrappers for the [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml). Stanford CoreNLP includes the Stanford Parser, but the Stanford Parser itself is a separate project.