avoid saving boft_P in adapter model by sywangyi · Pull Request #2050 · huggingface/peft · GitHub

avoid saving boft_P in adapter model #2050


Merged · 6 commits merged into huggingface:main on Sep 13, 2024

Conversation

@sywangyi (Contributor) commented on Sep 5, 2024

No description provided.

@sywangyi (Contributor, Author) commented on Sep 5, 2024

You could use https://github.com/huggingface/peft/blob/main/examples/boft_dreambooth/train_dreambooth.sh to reproduce. Without the PR, the unet adapter size is 485995200 B; with the PR, the adapter size drops to 54387152 B.
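To see where the checkpoint size goes, you can inspect the per-tensor sizes in the saved adapter file. A minimal sketch (the file path is hypothetical and depends on your output directory):

from safetensors.torch import load_file

state_dict = load_file("output/unet/adapter_model.safetensors")  # hypothetical path
for key, tensor in sorted(state_dict.items(), key=lambda kv: -kv[1].numel())[:5]:
    # print the five largest tensors and their size in bytes
    print(key, tensor.numel() * tensor.element_size(), "bytes")

Before the fix, the boft_P entries dominate this listing.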

The training params from the output are:
trainable params: 13,611,904 || all params: 879,497,668 || trainable%: 1.5477

The training precision is float32.

@sywangyi (Contributor, Author) commented on Sep 5, 2024

@BenjaminBossan please help review

@BenjaminBossan (Member) left a comment

Great catch, thanks for providing a fix. Indeed, I think that the boft_P buffer need not be saved. Pinging @Zeju1997 to kindly ask to check this as well.
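For context, the fix amounts to registering the buffer as non-persistent, which excludes it from the state_dict while still letting it move with the module across devices. A minimal sketch of the pattern (an illustrative toy module, not the actual PEFT code):

import torch
import torch.nn as nn


class ToyBOFTLayer(nn.Module):
    # hypothetical toy layer demonstrating the persistent=False pattern
    def __init__(self):
        super().__init__()
        # trainable parameters end up in the state_dict as usual
        self.weight = nn.Parameter(torch.randn(4, 4))
        # a deterministic helper tensor (like the boft_P permutation matrix)
        # can be recomputed at init time, so it need not be checkpointed
        self.register_buffer("boft_P", torch.eye(4), persistent=False)


layer = ToyBOFTLayer()
assert "boft_P" not in layer.state_dict()  # excluded from saved checkpoints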

Let's also add a unit test for this. I tested locally using this:

import torch
from peft import BOFTConfig, PeftModel, get_peft_model
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM


class TestBoft:
    def test_boft_state_dict(self, tmp_path):
        # see #2050
        # ensure that the boft_P buffer is not stored in the checkpoint file and is not necessary to load the model
        # correctly
        torch.manual_seed(0)

        inputs = torch.arange(10).view(-1, 1).to(0)
        model_id = "hf-internal-testing/tiny-random-OPTForCausalLM"
        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
        model.eval()
        output_base = model(inputs).logits

        config = BOFTConfig(init_weights=False)
        model = get_peft_model(model, config)
        model.eval()
        output_peft = model(inputs).logits

        atol, rtol = 1e-9, 1e-9
        # sanity check: loading boft changed the output
        assert not torch.allclose(output_base, output_peft, atol=atol, rtol=rtol)

        model.save_pretrained(tmp_path)
        del model

        # check that the boft_P buffer is not present
        state_dict = load_file(tmp_path / "adapter_model.safetensors")
        assert not any("boft_P" in key for key in state_dict)

        # sanity check: the model still produces the same output after loading
        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
        model = PeftModel.from_pretrained(model, tmp_path)
        output_loaded = model(inputs).logits
        assert torch.allclose(output_peft, output_loaded, atol=atol, rtol=rtol)

If you agree, just create a new file tests/test_boft.py and add it there (with the copyright notice on top).

One thing I was concerned about was whether this would cause trouble loading old checkpoints that still contain the boft_P buffer. It should not matter, but just to be sure, I created such a checkpoint and wrote a test to check that this checkpoint can be successfully loaded. This test can be added below the previous one.

    def test_boft_old_checkpoint_including_boft_P(self, tmp_path):
        # see #2050
        # This test exists to ensure that after the boft_P buffer was made non-persistent, old checkpoints can still be
        # loaded successfully.
        torch.manual_seed(0)

        inputs = torch.arange(10).view(-1, 1).to(0)
        model_id = "hf-internal-testing/tiny-random-OPTForCausalLM"
        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)

        # first create the expected output
        config = BOFTConfig(init_weights=False)
        model = get_peft_model(model, config)
        model.eval()
        output_peft = model(inputs).logits
        del model

        model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
        # checkpoint from before the PR whose state_dict still contains boft_P
        hub_id = "peft-internal-testing/boft-tiny-opt-peft-v0.12"
        model = PeftModel.from_pretrained(model, hub_id)
        output_old = model(inputs).logits

        atol, rtol = 1e-9, 1e-9
        assert torch.allclose(output_peft, output_old, atol=atol, rtol=rtol)

@HuggingFaceDocBuilderDev commented
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks for adding the tests, I only have some minor comments. Could you please also run make style with a recent ruff version? Apart from that, let's wait a bit to see if Zeju wants to add anything, otherwise I'll merge as is.

@BenjaminBossan (Member) left a comment

Sorry @sywangyi, I forgot to make the test device agnostic. With the suggested changes, the CI should pass.
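For illustration, a common way to make such tests device agnostic (a sketch of the general pattern, not necessarily the exact suggestions made here; it assumes the torch_device helper from transformers.testing_utils) is to resolve the device once instead of hardcoding .to(0):

from transformers.testing_utils import torch_device  # resolves to "cuda", "mps", or "cpu"

inputs = torch.arange(10).view(-1, 1).to(torch_device)
model = AutoModelForCausalLM.from_pretrained(model_id).to(torch_device)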

sywangyi and others added 2 commits September 6, 2024 06:45
@BenjaminBossan (Member) commented

@sywangyi For some reason, the tests fail on MacOS. I don't have a machine to test this on, but looking at the logs, the tensors look identical, yet the torch.allclose call returns False:

FAILED tests/test_boft.py::TestBoft::test_boft_old_checkpoint_including_boft_P - assert False
 +  where False = <built-in method allclose of type object at 0x10e63f150>(
tensor([
[[ 0.2096,  0.0000, -0.0209,  ...,  0.1637, -0.1071,  0.0516]],\n\n        
[[ 0.0901,  0.0000,  0.0214,  ...,  0...0301]],\n\n        
[[ 0.0866,  0.0000,  0.0419,  ...,  0.1571, -0.1145, -0.0144]]
],\n       grad_fn=<UnsafeViewBackward0>), tensor([
[[ 0.2096,  0.0000, -0.0209,  ...,  0.1637, -0.1071,  0.0516]],\n\n        
[[ 0.0901,  0.0000,  0.0214,  ...,  0..., -0.0659,  ...,  0.0609, -0.0228,  0.0301]],\n\n        
[[ 0.0866,  0.0000,  0.0419,  ...,  0.1571, -0.1145, -0.0144]]]), atol=1e-09, rtol=1e-09)
 +    where <built-in method allclose of type object at 0x10e63f150> = torch.allclose

(I added some line breaks for readability)

Probably it's enough to loosen the tolerance; 1e-9 really isn't necessary.
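For example (illustrative values, not necessarily what was finally committed), tolerances on the order of float32 precision still make for a strict check:

atol, rtol = 1e-6, 1e-6  # loose enough to absorb platform-specific float32 rounding
assert torch.allclose(output_peft, output_old, atol=atol, rtol=rtol)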

Other than that, I'd wait until the end of the week to see if Zeju replies, otherwise I'll merge even without their approval.

@BenjaminBossan (Member) left a comment

Thanks for this improvement, LGTM.

We didn't get a response, so I'll just go ahead and merge this.

@BenjaminBossan merged commit 2520227 into huggingface:main on Sep 13, 2024
14 checks passed
BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Sep 18, 2024
The buffer does not need to be part of the checkpoint; by making it non-persistent, the file size can be greatly reduced.
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
The buffer does not need to be part of the checkpoint; by making it non-persistent, the file size can be greatly reduced.