This repository contains an op-for-op PyTorch reimplementation of Google's TensorFlow repository for the BERT model that was released together with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
This implementation can load any pre-trained TensorFlow checkpoint for BERT (in particular Google's pre-trained models) and a conversion script is provided (see below).
Support for the Multilingual and Chinese models will be added later this week; only the tokenization code needs to be updated.
Loading a TensorFlow checkpoint (e.g. Google's pre-trained models)
You can convert any TensorFlow checkpoint for BERT (in particular the pre-trained models released by Google) into a PyTorch save file by using the convert_tf_checkpoint_to_pytorch.py script.
This script takes as input a TensorFlow checkpoint (three files starting with bert_model.ckpt) and the associated configuration file (bert_config.json). It creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint into the PyTorch model, and saves the resulting model in a standard PyTorch save file that can be imported using torch.load() (see examples in extract_features.py, run_classifier.py and run_squad.py).
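Before running the conversion, it can help to confirm that the inputs described above are actually in place. The following is a minimal sketch, not part of this repository: the helper name check_conversion_inputs is hypothetical, and it only assumes the file names mentioned above (files starting with bert_model.ckpt, plus bert_config.json).

```python
from pathlib import Path


def check_conversion_inputs(bert_dir):
    """Return a list of problems found in bert_dir; an empty list means ready.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    bert_dir = Path(bert_dir)
    problems = []

    # The TensorFlow checkpoint is a group of files sharing one prefix,
    # so we count matches on the prefix instead of looking for one file.
    checkpoint_files = sorted(p.name for p in bert_dir.glob("bert_model.ckpt*"))
    if len(checkpoint_files) < 3:
        problems.append(
            "expected three files starting with bert_model.ckpt, "
            f"found {len(checkpoint_files)}"
        )

    # The configuration file is required to build the PyTorch model.
    if not (bert_dir / "bert_config.json").is_file():
        problems.append("missing bert_config.json")

    return problems
```

Calling check_conversion_inputs on the directory you plan to convert returns an empty list when everything the script needs is present, and a list of human-readable messages otherwise.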
You only need to run this conversion script once to get a PyTorch model. You can then disregard the TensorFlow checkpoint (the three files starting with bert_model.ckpt) but be sure to keep the configuration file (bert_config.json) and the vocabulary file (vocab.txt) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (pip install tensorflow). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained BERT-Base Uncased model:
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
python convert_tf_checkpoint_to_pytorch.py \
--tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
--bert_config_file $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_path $BERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion here.
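If you prefer to drive the conversion from Python (for example to convert several model directories in a loop), the command shown above can be assembled as an argument list. This is only a sketch: build_conversion_command is a hypothetical helper, it assumes the script is in the current working directory, and it uses only the three flags from the example above.

```python
import shlex
from pathlib import Path


def build_conversion_command(bert_base_dir, script="convert_tf_checkpoint_to_pytorch.py"):
    """Build the argument list for the conversion script (hypothetical helper)."""
    base = Path(bert_base_dir)
    return [
        "python", script,
        # Prefix shared by the TensorFlow checkpoint files.
        "--tf_checkpoint_path", str(base / "bert_model.ckpt"),
        # Configuration file that describes the model architecture.
        "--bert_config_file", str(base / "bert_config.json"),
        # Destination of the PyTorch save file.
        "--pytorch_dump_path", str(base / "pytorch_model.bin"),
    ]


if __name__ == "__main__":
    cmd = build_conversion_command("/path/to/bert/uncased_L-12_H-768_A-12")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the conversion. Passing a list rather than a single string avoids shell quoting problems when the directory path contains spaces.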