avoid saving boft_P in adapter model #2050
Conversation
You could use https://github.com/huggingface/peft/blob/main/examples/boft_dreambooth/train_dreambooth.sh to reproduce. Without the PR, the unet adapter size is 485995200 B; with the PR, the adapter size drops to 54387152 B. The trainable-params output from the run shows that the training precision is float32.
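(For reference, a minimal sketch of how the adapter file size can be checked after a run; the path below is hypothetical and depends on the output directory passed to the script.)

import os

# hypothetical location; adjust to the output_dir used by train_dreambooth.sh
adapter_file = "path/to/output_dir/adapter_model.safetensors"
print(f"adapter size: {os.path.getsize(adapter_file)} B")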
@BenjaminBossan please help review.
Great catch, thanks for providing a fix. Indeed, I think that the boft_P
buffer does not need to be saved. Pinging @Zeju1997 to kindly ask to check this as well.
Let's also add a unit test for this. I tested locally using this:
import torch

from peft import BOFTConfig, PeftModel, get_peft_model
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM


class TestBoft:
    def test_boft_state_dict(self, tmp_path):
        # see #2050
        # ensure that the boft_P buffer is not stored in the checkpoint file and is not necessary to load the model
        # correctly
        torch.manual_seed(0)
        inputs = torch.arange(10).view(-1, 1).to(0)
        model_id = "hf-internal-testing/tiny-random-OPTForCausalLM"
        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
        model.eval()
        output_base = model(inputs).logits

        config = BOFTConfig(init_weights=False)
        model = get_peft_model(model, config)
        model.eval()
        output_peft = model(inputs).logits

        atol, rtol = 1e-9, 1e-9
        # sanity check: loading boft changed the output
        assert not torch.allclose(output_base, output_peft, atol=atol, rtol=rtol)

        model.save_pretrained(tmp_path)
        del model

        # check that the boft_P buffer is not present
        state_dict = load_file(tmp_path / "adapter_model.safetensors")
        assert not any("boft_P" in key for key in state_dict)

        # sanity check: the model still produces the same output after loading
        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
        model = PeftModel.from_pretrained(model, tmp_path)
        output_loaded = model(inputs).logits
        assert torch.allclose(output_peft, output_loaded, atol=atol, rtol=rtol)
If you agree, just create a new file tests/test_boft.py and add it there (with the copyright notice on top).
One thing I was concerned about was whether this would cause trouble when loading old checkpoints that still contain the boft_P
buffer. It should not matter (a short sketch of why is included after the test below), but just to be sure, I created such a checkpoint and wrote a test to check that it can still be loaded successfully. This test can be added below the previous one.
    def test_boft_old_checkpoint_including_boft_P(self, tmp_path):
        # see #2050
        # This test exists to ensure that after the boft_P buffer was made non-persistent, old checkpoints can still be
        # loaded successfully.
        torch.manual_seed(0)
        inputs = torch.arange(10).view(-1, 1).to(0)
        model_id = "hf-internal-testing/tiny-random-OPTForCausalLM"
        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)

        # first create the expected output
        config = BOFTConfig(init_weights=False)
        model = get_peft_model(model, config)
        model.eval()
        output_peft = model(inputs).logits
        del model

        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
        # checkpoint from before the PR whose state_dict still contains boft_P
        hub_id = "peft-internal-testing/boft-tiny-opt-peft-v0.12"
        model = PeftModel.from_pretrained(model, hub_id)
        output_old = model(inputs).logits

        atol, rtol = 1e-9, 1e-9
        assert torch.allclose(output_peft, output_old, atol=atol, rtol=rtol)
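As a sketch of why the extra key is harmless: as far as I can tell, PEFT loads adapter weights non-strictly, and PyTorch's load_state_dict with strict=False simply reports unknown keys such as boft_P instead of raising. A standalone illustration (plain PyTorch, not the actual PEFT loading code):

import torch
from torch import nn

# plain module standing in for the model; boft_P no longer exists on it
model = nn.Linear(2, 2)

state_dict = model.state_dict()
state_dict["boft_P"] = torch.eye(2)  # extra key, as found in a checkpoint saved before this PR

# strict=False tolerates keys that have no match on the module
result = model.load_state_dict(state_dict, strict=False)
print(result.unexpected_keys)  # ['boft_P']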
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for adding the tests, I only have some minor comments. Could you please also run make style with a recent ruff version? Apart from that, let's wait a bit to see if Zeju wants to add anything, otherwise I'll merge as is.
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Sorry @sywangyi I forgot to make the test device agnostic. With the suggested changes, the CI should pass.
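The gist of the suggested change, sketched generically (the PEFT test suite may have its own helper for picking the device, so treat the detection below as an assumption):

import torch

# pick whatever accelerator is available instead of hard-coding CUDA device 0
device = "cuda" if torch.cuda.is_available() else "cpu"

# in the tests above, replace the `.to(0)` calls with `.to(device)`, e.g.:
# inputs = torch.arange(10).view(-1, 1).to(device)
# model = AutoModelForCausalLM.from_pretrained(model_id).to(device)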
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@sywangyi For some reason, the tests fail on MacOS. I don't have a machine to test this on, but looking at the logs, the tensors look identical and yet the allclose assertion still fails (I added some line breaks to the log for readability). Probably it's enough to loosen the tolerance; 1e-9 really isn't necessary. Other than that, I'd wait until the end of the week to see if Zeju replies, otherwise I'll merge even without their approval.
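For context, float32 only carries about seven decimal digits of precision, so backend-dependent rounding can exceed a 1e-9 tolerance even when the tensors are effectively identical. A loosened check could look like this (the exact values are illustrative, not taken from the final change):

# anything comfortably above float32 rounding noise should be enough
atol, rtol = 1e-6, 1e-6
assert torch.allclose(output_peft, output_loaded, atol=atol, rtol=rtol)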
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Thanks for this improvement, LGTM.
We didn't get a response, so I'll just go ahead and merge this.
The buffer does not need to be part of the checkpoint; by making it non-persistent, the file size can be greatly reduced.
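A minimal sketch of the mechanism (not the actual PEFT layer code; the class name is made up for illustration): a buffer registered with persistent=False still lives on the module and moves with it across devices, but it is excluded from state_dict(), so it never ends up in adapter_model.safetensors.

import torch
from torch import nn

class WithBoftP(nn.Module):
    def __init__(self, n=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n, n))
        # deterministic helper tensor that can be recomputed at init time,
        # so there is no need to store it in the checkpoint
        self.register_buffer("boft_P", torch.eye(n), persistent=False)

module = WithBoftP()
print(list(module.state_dict().keys()))  # ['weight']: boft_P is not saved
print(module.boft_P.shape)               # the buffer is still available at runtime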