Commit 3042c63

fxmarty, younesbelkada, sgugger, and michaelbenayoun authored
Add methods to PreTrainedModel to use PyTorch's BetterTransformer (huggingface#21259)
* fix mess
* better documentation
* typo
* fix doc
* update
* add test
* fix test
* more tests
* Update src/transformers/modeling_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* move to utils
* Apply suggestions from code review
  Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* nit

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
1 parent 0083b14 commit 3042c63

File tree

8 files changed: +181 −3 lines changed

docker/transformers-all-latest-gpu/Dockerfile

Lines changed: 3 additions & 0 deletions
@@ -51,6 +51,9 @@ RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/acc
 # Add bitsandbytes for mixed int8 testing
 RUN python3 -m pip install --no-cache-dir bitsandbytes
 
+# For bettertransformer
+RUN python3 -m pip install --no-cache-dir optimum
+
 # For video model testing
 RUN python3 -m pip install --no-cache-dir decord av==9.2.0

docs/source/en/perf_infer_gpu_one.mdx

Lines changed: 20 additions & 3 deletions
@@ -11,11 +11,28 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 
 # Efficient Inference on a Single GPU
 
-This document will be completed soon with information on how to infer on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
+In addition to this guide, relevant information can also be found in [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
 
-## `BetterTransformer` for faster inference
+## Better Transformer: PyTorch-native transformer fastpath
 
-We have recently integrated `BetterTransformer` for faster inference on GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
+The PyTorch-native [`nn.MultiHeadAttention`](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) attention fastpath, called BetterTransformer, can be used with Transformers through the integration in the [🤗 Optimum library](https://huggingface.co/docs/optimum/bettertransformer/overview).
+
+PyTorch's attention fastpath speeds up inference through kernel fusions and the use of [nested tensors](https://pytorch.org/docs/stable/nested.html). Detailed benchmarks can be found in [this blog post](https://medium.com/pytorch/bettertransformer-out-of-the-box-performance-for-huggingface-transformers-3fbe27d50ab2).
+
+After installing the [`optimum`](https://github.com/huggingface/optimum) package, replace the relevant internal modules for inference by calling [`~PreTrainedModel.to_bettertransformer`]:
+
+```python
+model = model.to_bettertransformer()
+```
+
+The method [`~PreTrainedModel.reverse_bettertransformer`] restores the original modeling and should be called before saving the model, so that the canonical transformers modeling is used:
+
+```python
+model = model.reverse_bettertransformer()
+model.save_pretrained("saved_model")
+```
+
+As of PyTorch 2.0, the attention fastpath is supported for both encoders and decoders. The list of supported architectures can be found [here](https://huggingface.co/docs/optimum/bettertransformer/overview#supported-models).
 
 ## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition

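To make the documented round-trip concrete, here is a minimal end-to-end inference sketch. The `t5-small` checkpoint is only an illustrative choice of a supported architecture and is not prescribed by this commit; it assumes `optimum>=1.7.0` is installed:

```python
# Minimal sketch: convert to the fastpath, generate, then revert before saving.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

model = model.to_bettertransformer()  # swap in the PyTorch-native fastpath modules

inputs = tokenizer("translate English to French: Hello", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

model = model.reverse_bettertransformer()  # restore the canonical modeling
model.save_pretrained("saved_model")
```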
docs/source/en/perf_train_gpu_one.mdx

Lines changed: 12 additions & 0 deletions
@@ -718,6 +718,18 @@ For some applications, such as pretraining large language models, applying all t
 
 Another use case for training on many GPUs is if the model does not fit on a single GPU with all the mentioned tricks. There are still more methods we can apply although life starts to get a bit more complicated. This usually involves some form of pipeline or tensor parallelism where the model itself is distributed across several GPUs. One can also make use of DeepSpeed which implements some of these parallelism strategies along with some more optimization to reduce the memory footprint such as partitioning the optimizer states. You can read more about this in the ["Multi-GPU training" section](perf_train_gpu_many).
 
+## Using PyTorch native attention
+
+PyTorch 2.0 released the native [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html) (SDPA), which makes it possible to use fused GPU kernels such as [memory-efficient attention](https://arxiv.org/abs/2112.05682) and [flash attention](https://arxiv.org/abs/2205.14135).
+
+After installing the [`optimum`](https://github.com/huggingface/optimum) package, the relevant internal modules can be replaced to use PyTorch's native attention with:
+
+```python
+model = model.to_bettertransformer()
+```
+
+Training can then be done as usual.
+
 ## Using torch.compile
 
 PyTorch 2.0 introduces a new compile function, you can learn more about it [in their documentation](https://pytorch.org/get-started/pytorch-2.0/). It uses Python’s frame evaluation API to automatically create a graph from existing PyTorch programs. After capturing the graph, different backends can be deployed to lower the graph to an optimized engine. You can choose one option below for performance boost.

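As a sketch of the training flow this doc change describes, the snippet below runs one training step after conversion. The `gpt2` checkpoint and the tiny loop are illustrative assumptions, not part of this commit:

```python
# Illustrative training step with PyTorch-native SDPA attention enabled.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model = model.to_bettertransformer()  # use scaled_dot_product_attention internally

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tokenizer("Better Transformer speeds things up.", return_tensors="pt")

outputs = model(**batch, labels=batch["input_ids"])  # standard forward pass
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```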
src/transformers/modeling_utils.py

Lines changed: 51 additions & 0 deletions
@@ -64,6 +64,7 @@
     is_accelerate_available,
     is_bitsandbytes_available,
     is_offline_mode,
+    is_optimum_available,
     is_remote_url,
     is_safetensors_available,
     is_torch_tpu_available,
@@ -3310,6 +3311,56 @@ def register_for_auto_class(cls, auto_class="AutoModel"):
 
         cls._auto_class = auto_class
 
+    def to_bettertransformer(self) -> "PreTrainedModel":
+        """
+        Converts the model to use [PyTorch's native attention
+        implementation](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html), integrated into
+        Transformers through the [Optimum library](https://huggingface.co/docs/optimum/bettertransformer/overview).
+        Only a subset of all Transformers models are supported.
+
+        PyTorch's attention fastpath speeds up inference through kernel fusions and the use of [nested
+        tensors](https://pytorch.org/docs/stable/nested.html). Detailed benchmarks can be found in [this blog
+        post](https://medium.com/pytorch/bettertransformer-out-of-the-box-performance-for-huggingface-transformers-3fbe27d50ab2).
+
+        Returns:
+            [`PreTrainedModel`]: The model converted to BetterTransformer.
+        """
+        if not is_optimum_available():
+            raise ImportError("The package `optimum` is required to use Better Transformer.")
+
+        from optimum.version import __version__ as optimum_version
+
+        if version.parse(optimum_version) < version.parse("1.7.0"):
+            raise ImportError(
+                f"Please install optimum>=1.7.0 to use Better Transformer. The version {optimum_version} was found."
+            )
+
+        from optimum.bettertransformer import BetterTransformer
+
+        return BetterTransformer.transform(self)
+
+    def reverse_bettertransformer(self):
+        """
+        Reverts the transformation from [`~PreTrainedModel.to_bettertransformer`] so that the original modeling is
+        used, for example in order to save the model.
+
+        Returns:
+            [`PreTrainedModel`]: The model converted back to the original modeling.
+        """
+        if not is_optimum_available():
+            raise ImportError("The package `optimum` is required to use Better Transformer.")
+
+        from optimum.version import __version__ as optimum_version
+
+        if version.parse(optimum_version) < version.parse("1.7.0"):
+            raise ImportError(
+                f"Please install optimum>=1.7.0 to use Better Transformer. The version {optimum_version} was found."
+            )
+
+        from optimum.bettertransformer import BetterTransformer
+
+        return BetterTransformer.reverse(self)
+
 
 PreTrainedModel.push_to_hub = copy_func(PreTrainedModel.push_to_hub)
 if PreTrainedModel.push_to_hub.__doc__ is not None:

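Both methods share the same guard pattern: an availability check followed by a minimum-version check. The standalone sketch below shows the shape of that pattern; the real `is_optimum_available` lives in `transformers.utils`, and this version is an assumption about how such a check is typically written, not its exact code:

```python
# Hedged sketch of an availability + version guard; not the exact
# transformers.utils implementation.
import importlib.metadata
import importlib.util

from packaging import version


def is_optimum_available() -> bool:
    # A package counts as available if it can be found on the import path.
    return importlib.util.find_spec("optimum") is not None


def require_minimum_optimum(minimum: str = "1.7.0") -> None:
    if not is_optimum_available():
        raise ImportError("The package `optimum` is required to use Better Transformer.")
    installed = importlib.metadata.version("optimum")
    if version.parse(installed) < version.parse(minimum):
        raise ImportError(
            f"Please install optimum>={minimum} to use Better Transformer. The version {installed} was found."
        )
```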
src/transformers/testing_utils.py

Lines changed: 8 additions & 0 deletions
@@ -65,6 +65,7 @@
     is_librosa_available,
     is_natten_available,
     is_onnx_available,
+    is_optimum_available,
     is_pandas_available,
     is_phonemizer_available,
     is_pyctcdecode_available,
@@ -693,6 +694,13 @@ def require_bitsandbytes(test_case):
     return unittest.skipUnless(is_bitsandbytes_available(), "test requires bnb")(test_case)
 
 
+def require_optimum(test_case):
+    """
+    Decorator marking a test that requires optimum.
+    """
+    return unittest.skipUnless(is_optimum_available(), "test requires optimum")(test_case)
+
+
 def require_phonemizer(test_case):
     """
     Decorator marking a test that requires phonemizer

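A hypothetical test module using the new decorator might look like this; the class and test body are illustrative only, mirroring how `require_bitsandbytes` is used elsewhere in the test suite:

```python
# Illustrative use of require_optimum: the test is skipped when `optimum`
# is not installed, rather than failing with an ImportError.
import unittest

from transformers.testing_utils import require_optimum


@require_optimum
class OptimumGatedTest(unittest.TestCase):
    def test_needs_optimum(self):
        # Safe to import here: the decorator guarantees optimum is present.
        from optimum.bettertransformer import BetterTransformer

        self.assertTrue(hasattr(BetterTransformer, "transform"))
```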
src/transformers/utils/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -121,6 +121,7 @@
     is_natten_available,
     is_ninja_available,
     is_onnx_available,
+    is_optimum_available,
     is_pandas_available,
     is_peft_available,
     is_phonemizer_available,

tests/bettertransformer/__init__.py

Whitespace-only changes.
tests/bettertransformer/test_integration.py

Lines changed: 86 additions & 0 deletions

@@ -0,0 +1,86 @@
+# coding=utf-8
+# Copyright 2023 The HuggingFace Team Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import tempfile
+import unittest
+
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+from transformers.testing_utils import (
+    is_torch_available,
+    require_optimum,
+    require_torch,
+    slow,
+)
+
+
+if is_torch_available():
+    import torch
+
+
+@require_torch
+@require_optimum
+@slow
+class BetterTransformerIntegrationTest(unittest.TestCase):
+    # refer to the full test suite in Optimum library:
+    # https://github.com/huggingface/optimum/tree/main/tests/bettertransformer
+
+    def test_transform_and_reverse(self):
+        r"""
+        Classic tests to simply check if the conversion has been successful.
+        """
+        model_id = "hf-internal-testing/tiny-random-t5"
+        tokenizer = AutoTokenizer.from_pretrained(model_id)
+        model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+
+        inp = tokenizer("This is me", return_tensors="pt")
+
+        model = model.to_bettertransformer()
+
+        self.assertTrue(any("BetterTransformer" in mod.__class__.__name__ for _, mod in model.named_modules()))
+
+        output = model.generate(**inp)
+
+        model = model.reverse_bettertransformer()
+
+        self.assertFalse(any("BetterTransformer" in mod.__class__.__name__ for _, mod in model.named_modules()))
+
+        with tempfile.TemporaryDirectory() as tmpdirname:
+            model.save_pretrained(tmpdirname)
+
+            model_reloaded = AutoModelForSeq2SeqLM.from_pretrained(tmpdirname)
+
+            self.assertFalse(
+                any("BetterTransformer" in mod.__class__.__name__ for _, mod in model_reloaded.named_modules())
+            )
+
+            output_from_pretrained = model_reloaded.generate(**inp)
+            self.assertTrue(torch.allclose(output, output_from_pretrained))
+
+    def test_error_save_pretrained(self):
+        r"""
+        The save_pretrained method should raise a ValueError if the model is in BetterTransformer mode.
+        All should be good if the model is reversed.
+        """
+        model_id = "hf-internal-testing/tiny-random-t5"
+        model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+
+        model = model.to_bettertransformer()
+
+        with tempfile.TemporaryDirectory() as tmpdirname:
+            with self.assertRaises(ValueError):
+                model.save_pretrained(tmpdirname)
+
+            model = model.reverse_bettertransformer()
+            model.save_pretrained(tmpdirname)