projects/pipelines/tagger_parser_ud at v3 · explosion/projects · GitHub

🪐 Weasel Project: Part-of-speech Tagging & Dependency Parsing (Universal Dependencies)

This project template lets you train a part-of-speech tagger, morphologizer, lemmatizer and dependency parser from a Universal Dependencies corpus. It takes care of downloading the treebank, converting it to spaCy's format, and training and evaluating the model. The template uses the UD_English-EWT treebank by default, but you can swap it out for any other available treebank. Just make sure to adjust the `lang` and `treebank` settings in the variables below. Use `xx` (multi-language) if no language-specific tokenizer is available in spaCy. Note that multi-word tokens will be merged together when the corpus is converted, since spaCy does not support multi-word token expansion.
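To make the note about multi-word tokens concrete, here is a minimal sketch of what "merged together" means at the tokenization level. In CoNLL-U, a multi-word token appears as a range line (for example `2-3`) followed by the syntactic words it expands to. The function below keeps the range line's surface form and skips the expanded words. This is an illustration only, not spaCy's converter: the function name `surface_tokens` and the sample sentence are assumptions made for this example, and the real conversion also has to choose tags and heads for the merged token.

```python
def surface_tokens(conllu_sentence: str) -> list[str]:
    """Return surface forms, keeping each multi-word token as a single unit."""
    tokens = []
    skip_until = 0  # highest word ID covered by the most recent range line
    for line in conllu_sentence.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        token_id, form = cols[0], cols[1]
        if "-" in token_id:
            # Range line such as "2-3": one surface token spanning words 2..3
            _, end = token_id.split("-")
            skip_until = int(end)
            tokens.append(form)
        elif "." in token_id:
            continue  # empty nodes (IDs like "8.1") have no surface form
        elif int(token_id) > skip_until:
            tokens.append(form)  # ordinary word not covered by a range
    return tokens


# Hypothetical sample: Spanish "del" is a multi-word token for "de" + "el".
SAMPLE = "\n".join([
    "# text = Vengo del mar",
    "1\tVengo\tvenir\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tdel\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tde\tde\tADP\t_\t_\t4\tcase\t_\t_",
    "3\tel\tel\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tmar\tmar\tNOUN\t_\t_\t1\tobl\t_\t_",
])

print(surface_tokens(SAMPLE))  # ['Vengo', 'del', 'mar']
```

After merging, the pipeline sees three tokens instead of four syntactic words, which is why treebanks with many multi-word tokens can score differently from results reported on the expanded words.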

📋 project.yml

The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the Weasel documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using `weasel run [name]`. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| `preprocess` | Convert the data to spaCy's format |
| `train` | Train UD_English-EWT |
| `evaluate` | Evaluate on the test data and save the metrics |
| `package` | Package the trained model so it can be installed |
| `clean` | Remove intermediate files |

⏭ Workflows

The following workflows are defined by the project. They can be executed using `weasel run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| `all` | `preprocess` → `train` → `evaluate` → `package` |
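The rule that commands are only re-run when their inputs have changed can be pictured as a checksum comparison. The sketch below is a simplified illustration of that idea, not Weasel's actual implementation: the helper names (`checksums`, `needs_rerun`, `record_run`), the file `demo.lock.json`, and the input file `train.conllu` are all hypothetical, and Weasel's own bookkeeping may track more than this.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def checksums(deps: list[Path]) -> dict[str, str]:
    """Map each existing dependency path to the SHA-256 of its contents."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in deps if p.exists()}


def needs_rerun(command: str, deps: list[Path], lock_path: Path) -> bool:
    """Return True when the recorded checksums for `command` differ from the current ones."""
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    return lock.get(command) != checksums(deps)


def record_run(command: str, deps: list[Path], lock_path: Path) -> None:
    """Store the current checksums after a successful run."""
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    lock[command] = checksums(deps)
    lock_path.write_text(json.dumps(lock, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "train.conllu"        # hypothetical input file
    lock_path = Path(tmp) / "demo.lock.json"   # hypothetical lock file
    corpus.write_text("first version")
    print(needs_rerun("preprocess", [corpus], lock_path))  # True: never run before
    record_run("preprocess", [corpus], lock_path)
    print(needs_rerun("preprocess", [corpus], lock_path))  # False: inputs unchanged
    corpus.write_text("second version")
    print(needs_rerun("preprocess", [corpus], lock_path))  # True: input changed
```

Comparing content hashes rather than modification times means that touching a file without changing it does not trigger a re-run in this sketch.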

🗂 Assets

The following assets are defined by the project. They can be fetched by running `weasel assets` in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/UD_English-EWT` | Git | |