Extracts nouns & verbs from a text file via spaCy (e.g. to mine it for vocabulary) and then prints them to stdout.
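As a rough illustration of the approach (not the script's actual code), the sketch below filters tokens by part-of-speech tag and collects unique lemmas. The names `collect_nouns_and_verbs` and `extract_from_file`, the choice of lemmas over surface forms, and the `en_core_web_sm` model are assumptions made for this example. `pos_`, `lemma_` and `is_alpha` are standard spaCy token attributes.

```python
def collect_nouns_and_verbs(tokens):
    """Group unique lowercase lemmas by part of speech, in first-seen order.

    `tokens` is any iterable of objects exposing spaCy-style `pos_`,
    `lemma_` and `is_alpha` attributes (for example a spaCy Doc).
    """
    # Plain dicts keep insertion order, so they double as ordered sets.
    found = {"NOUN": {}, "VERB": {}}
    for token in tokens:
        # Skip punctuation, numbers, and every other part of speech.
        # Note: spaCy tags proper nouns as PROPN, so they are excluded here.
        if token.is_alpha and token.pos_ in found:
            found[token.pos_][token.lemma_.lower()] = None
    return {pos: list(lemmas) for pos, lemmas in found.items()}


def extract_from_file(path, model="en_core_web_sm"):
    """Hypothetical helper: run spaCy over a text file and filter the result."""
    import spacy  # imported lazily so the filter above works without spaCy

    nlp = spacy.load(model)  # assumes this model has already been downloaded
    with open(path, encoding="utf-8") as handle:
        return collect_nouns_and_verbs(nlp(handle.read()))
```

Keeping the filter separate from the spaCy call makes it easy to try out on hand-built tokens before a model is installed.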
For all methods, you must first check out or download the code.
For the script to run, you will need to set up an environment with the dependencies. The current recommended method is pipenv, which manages the details of the virtual environment and provides deterministic builds via a lockfile (while requiring a specific Python version):
- Make sure you have both Python (3.13) and pip installed on your system
- Run:
pip install pipenv --user
See the pipenv docs for details
Navigate to the project directory and then run these commands:
pipenv shell
pipenv install
Now you should be able to run the script from within a pipenv shell. Note that you will need to restart the pipenv shell if you restart your machine, but will only need to run install if the packages have been updated.
- Perform setup as above
- Activate the environment you'll be using, if it is not already active (e.g. pipenv shell)
- Create a local folder called data (mkdir data from repo root)
- Paste the text you would like to parse into a text file, data/article.txt
- Run the script (python gen_list > data/output.txt). Note that you can omit the output redirection if you'd like to preview the results instead.
In general, the repo is currently set up to ignore files in ./data, so that is an easy option for output files.
If you would like to use a different input path for the data file, you can do so by calling:
python gen_list --input='/path/to/your/text/file'
If you would like to specify an alternate output type (currently, the only additional option is CSV), you can do so by calling:
python gen_list --output-type=csv
Note that you can see all available options by calling:
python gen_list --help
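For readers curious how options like these are typically wired up, here is a minimal argparse sketch. It is not the script's actual implementation: the `build_parser` and `render` helpers, the `text` choice name, the defaults, and the CSV column layout are all assumptions. Only the `--input` and `--output-type` flags, the `csv` value, and the data/article.txt path come from the usage notes above.

```python
import argparse
import csv
import io


def build_parser():
    """Build a parser mirroring the flags described above (defaults assumed)."""
    parser = argparse.ArgumentParser(prog="gen_list")
    parser.add_argument("--input", default="data/article.txt",
                        help="path to the text file to parse")
    parser.add_argument("--output-type", choices=["text", "csv"],
                        default="text",
                        help="format of the list written to stdout")
    return parser


def render(words_by_pos, output_type):
    """Format a mapping like {"NOUN": [...], "VERB": [...]} as text or CSV."""
    if output_type == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["pos", "word"])  # hypothetical column layout
        for pos, words in words_by_pos.items():
            for word in words:
                writer.writerow([pos, word])
        return buffer.getvalue()
    # Plain-text fallback: one heading per part of speech, words indented.
    lines = []
    for pos, words in words_by_pos.items():
        lines.append(f"{pos}:")
        lines.extend(f"  {word}" for word in words)
    return "\n".join(lines) + "\n"
```

In this sketch, `build_parser().parse_args(["--output-type=csv"])` yields an object whose `output_type` attribute is `"csv"`, since argparse converts the dash in the flag name to an underscore.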