Add accelerate API support for Word Language Model example by framoncg · Pull Request #1345 · pytorch/examples

Draft · wants to merge 4 commits into base: main
2 changes: 1 addition & 1 deletion run_python_examples.sh
@@ -153,7 +153,7 @@ function vision_transformer() {
}

function word_language_model() {
- uv run main.py --epochs 1 --dry-run $CUDA_FLAG --mps || error "word_language_model failed"
+ uv run main.py --epochs 1 --dry-run $ACCEL_FLAG || error "word_language_model failed"
}

function gcn() {
21 changes: 10 additions & 11 deletions word_language_model/README.md
@@ -4,13 +4,13 @@ This example trains a multi-layer RNN (Elman, GRU, or LSTM) or Transformer on a
The trained model can then be used by the generate script to generate new text.

```bash
- python main.py --cuda --epochs 6 # Train a LSTM on Wikitext-2 with CUDA.
- python main.py --cuda --epochs 6 --tied # Train a tied LSTM on Wikitext-2 with CUDA.
- python main.py --cuda --tied # Train a tied LSTM on Wikitext-2 with CUDA for 40 epochs.
- python main.py --cuda --epochs 6 --model Transformer --lr 5
+ python main.py --accel --epochs 6 # Train a LSTM on Wikitext-2 with CUDA.
Contributor:
> with CUDA

I suggest dropping this from the example command line, and maybe adding a note that the example supports running on acceleration devices, listing which were tried (CUDA, MPS, XPU).

+ python main.py --accel --epochs 6 --tied # Train a tied LSTM on Wikitext-2 with CUDA.
+ python main.py --accel --tied # Train a tied LSTM on Wikitext-2 with CUDA for 40 epochs.
+ python main.py --accel --epochs 6 --model Transformer --lr 5
# Train a Transformer model on Wikitext-2 with CUDA.

- python generate.py # Generate samples from the default model checkpoint.
+ python generate.py --accel # Generate samples from the default model checkpoint.
```

The model uses the `nn.RNN` module (and its sister modules `nn.GRU` and `nn.LSTM`) or Transformer module (`nn.TransformerEncoder` and `nn.TransformerEncoderLayer`) which will automatically use the cuDNN backend if run on CUDA with cuDNN installed.
@@ -35,8 +35,7 @@ optional arguments:
--dropout DROPOUT dropout applied to layers (0 = no dropout)
--tied tie the word embedding and softmax weights
--seed SEED random seed
- --cuda use CUDA
- --mps enable GPU on macOS
+ --accel use accelerator
--log-interval N report interval
--save SAVE path to save the final model
--onnx-export ONNX_EXPORT
@@ -49,8 +48,8 @@ With these arguments, a variety of models can be tested.
As an example, the following arguments produce slower but better models:

```bash
- python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
- python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
- python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
- python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
+ python main.py --accel --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
+ python main.py --accel --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
+ python main.py --accel --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
+ python main.py --accel --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
```
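As a side note, the `--accel` flag in these commands maps onto the device-selection logic this PR adds to main.py and generate.py. A minimal sketch of that pattern, assuming PyTorch >= 2.6 so that `torch.accelerator` exists:

```python
import torch

def select_device(use_accel: bool) -> torch.device:
    # Pick whatever accelerator backend is present (CUDA, MPS, XPU, ...) when
    # --accel is passed and one is available; otherwise fall back to the CPU.
    if use_accel and torch.accelerator.is_available():
        return torch.accelerator.current_accelerator()
    return torch.device("cpu")

device = select_device(use_accel=True)
print("Using device:", device)
```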
24 changes: 7 additions & 17 deletions word_language_model/generate.py
@@ -21,38 +21,28 @@
help='number of words to generate')
parser.add_argument('--seed', type=int, default=1111,
help='random seed')
- parser.add_argument('--cuda', action='store_true',
-                     help='use CUDA')
- parser.add_argument('--mps', action='store_true', default=False,
-                     help='enables macOS GPU training')
parser.add_argument('--temperature', type=float, default=1.0,
help='temperature - higher will increase diversity')
parser.add_argument('--log-interval', type=int, default=100,
help='reporting interval')
+ parser.add_argument('--accel', action='store_true', default=False,
+                     help='use accelerator')
args = parser.parse_args()

# Set the random seed manually for reproducibility.
torch.manual_seed(args.seed)
- if torch.cuda.is_available():
-     if not args.cuda:
-         print("WARNING: You have a CUDA device, so you should probably run with --cuda.")
- if torch.backends.mps.is_available():
-     if not args.mps:
-         print("WARNING: You have mps device, to enable macOS GPU run with --mps.")
-
- use_mps = args.mps and torch.backends.mps.is_available()
- if args.cuda:
-     device = torch.device("cuda")
- elif use_mps:
-     device = torch.device("mps")

+ if args.accel and torch.accelerator.is_available():
+     device = torch.accelerator.current_accelerator()

else:
device = torch.device("cpu")

if args.temperature < 1e-3:
parser.error("--temperature has to be greater or equal 1e-3.")

with open(args.checkpoint, 'rb') as f:
-     model = torch.load(f, map_location=device)
+     model = torch.load(f, map_location=device, weights_only=False)
model.eval()

corpus = data.Corpus(args.data)
27 changes: 9 additions & 18 deletions word_language_model/main.py
@@ -37,10 +37,6 @@
help='tie the word embedding and softmax weights')
parser.add_argument('--seed', type=int, default=1111,
help='random seed')
- parser.add_argument('--cuda', action='store_true', default=False,
-                     help='use CUDA')
- parser.add_argument('--mps', action='store_true', default=False,
-                     help='enables macOS GPU training')
parser.add_argument('--log-interval', type=int, default=200, metavar='N',
help='report interval')
parser.add_argument('--save', type=str, default='model.pt',
@@ -51,25 +47,20 @@
help='the number of heads in the encoder/decoder of the transformer model')
parser.add_argument('--dry-run', action='store_true',
help='verify the code and the model')
+ parser.add_argument('--accel', action='store_true',help='Enables accelerated training')
args = parser.parse_args()

# Set the random seed manually for reproducibility.
torch.manual_seed(args.seed)
- if torch.cuda.is_available():
-     if not args.cuda:
-         print("WARNING: You have a CUDA device, so you should probably run with --cuda.")
- if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
-     if not args.mps:
-         print("WARNING: You have mps device, to enable macOS GPU run with --mps.")
-
- use_mps = args.mps and torch.backends.mps.is_available()
- if args.cuda:
-     device = torch.device("cuda")
- elif use_mps:
-     device = torch.device("mps")

+ if args.accel and torch.accelerator.is_available():
+     device = torch.accelerator.current_accelerator()

else:
device = torch.device("cpu")

print("Using device:", device)

###############################################################################
# Load data
###############################################################################
@@ -243,11 +234,11 @@ def export_onnx(path, batch_size, seq_len):

# Load the best saved model.
with open(args.save, 'rb') as f:
-     model = torch.load(f)
+     torch.load(f, weights_only=False)
Contributor:
Can you please extract this change to a separate PR? It also needs an update for the required torch version.
Author:
If I extract the change and update the requirements to 2.7, it won't work on its own. This change allows the example to run with the simplest code change, since leaving it as it was fails to work.

Contributor:
In PyTorch 2.6, the default value of weights_only was set to True, and PyTorch 2.7 introduced support for the accelerator API.

We can integrate the accelerator API in this PR; the update to save and load models via state_dict will be addressed in a separate pull request.
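For reference, the state_dict-based save/load deferred to that follow-up PR would look roughly like the sketch below. This is illustrative only, not part of this PR; the helper names are made up, and it assumes the caller rebuilds the model (e.g. via model.RNNModel(...) as main.py already does) before loading the weights.

```python
import torch
import torch.nn as nn

def save_checkpoint(model: nn.Module, path: str) -> None:
    # Persist only the parameters, not a pickled module object.
    torch.save(model.state_dict(), path)

def load_checkpoint(model: nn.Module, path: str, device: torch.device) -> nn.Module:
    # Instantiate the model first, then load the weights. This stays compatible
    # with the weights_only=True default that torch.load gained in PyTorch 2.6.
    state_dict = torch.load(path, map_location=device, weights_only=True)
    model.load_state_dict(state_dict)
    return model.to(device)
```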

Contributor:
> PyTorch 2.7 introduced support for the accelerator API. <...> In this pull request, we can integrate the use of the accelerator API in this PR.

From 2.6 actually. See https://docs.pytorch.org/docs/2.6/accelerator.html#module-torch.accelerator.

To integrate torch.accelerator we must update the torch requirement to >=2.6; otherwise the tests will simply fail. I suspect that you did not actually run the modified run_python_examples.sh.

> If I extract the change and update the requirements to 2.7 it won't work

I believe you are making the changes in the wrong order. First, update the requirement so the example can use the latest PyTorch, and fix the issues that appear. Then, as a second step, introduce the new APIs.
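To make the version point concrete: on a PyTorch build older than 2.6 the new code path fails immediately, because the torch.accelerator module does not exist, which is why the requirement bump has to land first. An illustrative check (not part of the PR):

```python
import torch

# torch.accelerator was added in PyTorch 2.6; on older builds the attribute is
# missing, so code that touches it raises AttributeError and the CI script fails.
if hasattr(torch, "accelerator"):
    print("accelerator API available:", torch.accelerator.is_available())
else:
    print(f"torch {torch.__version__} predates the accelerator API (added in 2.6)")
```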

Author:
I did run the modified run_python_examples.sh, but maybe I am doing this in the wrong order. So the suggestion here is to first update the requirements and fix the issues in a separate PR, close this one, and create a new one for the new API?

Contributor:
First, we need to run the example with the latest PyTorch and fix any issues in a separate PR.

Thanks for the feedback @dvrogozh.

Contributor:
> the suggestion here is to first update requirements and fix the issues in a separate PR, close this one and create

Yes, but you don't need to close this PR. Just mark it as a draft while working on the update-requirements PR.

Contributor:
Here is a PR to update the torch version requirement as I would do it:

Contributor:
@framoncg, #1347 has been merged. Please rebase your PR.

# after load the rnn params are not a continuous chunk of memory
# this makes them a continuous chunk, and will speed up forward pass
# Currently, only rnn model supports flatten_parameters function.
- if args.model in ['RNN_TANH', 'RNN_RELU', 'LSTM', 'GRU']:
+ if args.model in ['RNN_TANH', 'RNN_RELU', 'LSTM', 'GRU'] and device.type == 'cuda':
Member:
What was the error you were getting?

Author:
Seems to be an oversight on my part. This was needed when trying a safe approach of only loading the weights, but apparently it is no longer needed. I will remove it to prevent any unwanted changes.

model.rnn.flatten_parameters()

# Run on test data.
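For completeness, the guarded flatten_parameters() call discussed above could be written as below. This is only a sketch, assuming the loaded model exposes its recurrent module as model.rnn (as this example does); per the author's last comment, the device-type guard is slated to be dropped again.

```python
import torch
import torch.nn as nn

def maybe_flatten(model: nn.Module, device: torch.device) -> None:
    # flatten_parameters() repacks RNN weights into one contiguous chunk so the
    # cuDNN kernels run faster; it only matters when the model lives on CUDA.
    rnn = getattr(model, "rnn", None)
    if isinstance(rnn, nn.RNNBase) and device.type == "cuda":
        rnn.flatten_parameters()
```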