8000 Update OFT to fix merge bugs by zqiu24 · Pull Request #1996 · huggingface/peft · GitHub
Merged: 43 commits merged into huggingface:main on Oct 1, 2024

Conversation

@zqiu24 (Contributor) commented Aug 8, 2024

changes for BOFT

  • fixing small errors in comments of BOFT
  • making Conv2D operation consistent with Linear

changes for OFT

  • [Important!] fixing the incorrect merging of OFT; this is not an implementation error but rather an incorrect understanding of how orthogonal fine-tuning works, which resulted in a wrong way of merging adapters
  • updating config.py/layer.py/model.py to be consistent with other PEFT methods
  • adding additional parameters oft_block_size, oft_dropout, bias
  • fixing a module dropout implementation error (in orthogonal fine-tuning, dropout does not work the same way as in LoRA)
  • fixing errors in the merge and unmerge functions of OFT
  • making Conv2D operation consistent with Linear

@zqiu24 zqiu24 changed the title from "Update oft to fix merge bugs" to "Update OFT to fix merge bugs" on Aug 8, 2024
@BenjaminBossan (Member) left a comment

Thanks a lot for fixing the issues you mentioned with OFT. I don't understand all the details but in general, this is to bring OFT closer to the BOFT implementation, is that right? Is that also why you decided to no longer use LycorisTuner as a base?

Did you check whether after these changes, the same code using OFT would still produce the exact same results? E.g. let's say a user has trained an OFT model checkpoint in PEFT 0.12. Now we merge this PR and release it in PEFT 0.13. If the user upgrades to 0.13 and loads the OFT checkpoint, would they still get the same results? This is really important because otherwise it would be a source of bugs.

If we cannot ensure the same results after this PR because, for instance, something was not implemented correctly previously, at the very least we have to add some mechanism to error or warn users when they try to load an old checkpoint with new PEFT versions.

I have added some smaller comments too, please check them out. On top of those, please run make style and ensure 120 character line limits (e.g. long strings).

Also, pinging @okotaku and @lukaskuhn-lku since they worked on the original OFT PR.

Comment on lines 318 to 322
elif boft_block_size != 0 and boft_block_num != 0:
raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
raise ValueError(
f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
"to be 0, because boft_block_size x boft_block_num != in_features."
)
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Do we need these checks? We already have checks here:

if self.boft_block_size == 0 and self.boft_block_num == 0:
raise ValueError("You must specify either boft_block_size or boft_block_num.")
if not (self.boft_block_size != 0) ^ (self.boft_block_num != 0):
raise ValueError(
f"You can only specify either boft_block_size ({self.boft_block_size}) or boft_block_num ({self.boft_block_num}), "
"but not both simultaneously, because boft_block_size x boft_block_num != in_features."
)

Contributor Author:

Yes, you are right, I forgot there is already a check in config.py. Updated.

Comment on lines 750 to 754
elif boft_block_size != 0 and boft_block_num != 0:
raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
raise ValueError("Unknown error!")
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Same comment.

Contributor Author:

Updated.

class OFTLayer(nn.Module, LycorisLayer):
class MultiplicativeDropoutLayer(nn.Module):
"""
Implements the multiplicative dropout layer for BOFT.
Member:

Suggested change
Implements the multiplicative dropout layer for BOFT.
Implements the multiplicative dropout layer for OFT.

Contributor Author:

updated.

if D == 1:
return x

# Create a mask with 1s for matrices to be replaced with identity and 0s otherwise
Member:

Sentence doesn't quite make sense, could you please check again?

Contributor Author:

updated.

@zqiu24 (Contributor Author) commented Aug 9, 2024

@BenjaminBossan Hi, thank you so much. Because the main change (besides changing the format) is in the merge/unmerge, it should be possible to update it without affecting previously trained model checkpoints. I am currently testing it. But I am currently facing a problem:

import transformers
import accelerate
import peft
from datasets import load_dataset
from transformers import AutoImageProcessor
from peft import LoraConfig, BOFTConfig, OFTConfig, get_peft_model
import numpy as np
import evaluate
import torch
import os

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
from peft import PeftConfig, PeftModel, AutoPeftModel

os.environ["HF_API_TOKEN"] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

if __name__ == '__main__':
    dataset = load_dataset("food101", split="train[:5000]")

    output_dir = "output"

    model_checkpoint = "google/vit-base-patch16-224-in21k"

    labels = dataset.features["label"].names
    label2id, id2label = dict(), dict()
    for i, label in enumerate(labels):
        label2id[label] = i
        id2label[i] = label

    image_processor = AutoImageProcessor.from_pretrained(model_checkpoint)


    normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
    train_transforms = Compose(
        [
            RandomResizedCrop(image_processor.size["height"]),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ]
    )

    val_transforms = Compose(
        [
            Resize(image_processor.size["height"]),
            CenterCrop(image_processor.size["height"]),
            ToTensor(),
            normalize,
        ]
    )


    def preprocess_train(example_batch):
        """Apply train_transforms across a batch."""
        example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    def preprocess_val(example_batch):
        """Apply val_transforms across a batch."""
        example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    splits = dataset.train_test_split(test_size=0.1)
    train_ds = splits["train"]
    val_ds = splits["test"]

    train_ds.set_transform(preprocess_train)
    val_ds.set_transform(preprocess_val)


    def print_trainable_parameters(model):
        trainable_params = 0
        all_param = 0
        for _, param in model.named_parameters():
            all_param += param.numel()
            if param.requires_grad:
                trainable_params += param.numel()
        print(
            f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
        )

    model = AutoModelForImageClassification.from_pretrained(
        model_checkpoint,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )


    print_trainable_parameters(model)

    config = LoraConfig(
        r=32,
        lora_alpha=16,
        target_modules=["query", "value"],
        lora_dropout=0.1,
        bias="none",
        # modules_to_save=["classifier"],
    )

    lora_model = get_peft_model(model, config)
    print_trainable_parameters(lora_model)

    model_name = model_checkpoint.split("/")[-1]
    batch_size = 128

    args = TrainingArguments(
        f"{model_name}-finetuned-lora-food101",
        remove_unused_columns=False,
        evaluation_strategy="epoch",
        save_strategy="no",
        learning_rate=5e-3,
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=batch_size,
        fp16=True,
        num_train_epochs=5,
        logging_steps=10,
        metric_for_best_model="accuracy",
        label_names=["labels"],
        eval_strategy="steps",
        eval_steps=100000000000,
    )

    metric = evaluate.load("accuracy")

    def compute_metrics(eval_pred):
        """Computes accuracy on a batch of predictions"""
        predictions = np.argmax(eval_pred.predictions, axis=1)
        return metric.compute(predictions=predictions, references=eval_pred.label_ids)

    def collate_fn(examples):
        pixel_values = torch.stack([example["pixel_values"] for example in examples])
        labels = torch.tensor([example["label"] for example in examples])
        return {"pixel_values": pixel_values, "labels": labels}

    trainer = Trainer(
        lora_model,
        args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        tokenizer=image_processor,
        compute_metrics=compute_metrics,
        data_collator=collate_fn,
    )
    eval_results = trainer.evaluate(val_ds)
    print('Before training', eval_results)

    repo_name = f"sgp-bench/peft-finetuned-lora-food101"

    train_results = trainer.train()
    
    token = os.getenv("HF_API_TOKEN")
    lora_model.push_to_hub(repo_name, token=token)

    eval_results = trainer.evaluate(val_ds)
    print('After training', eval_results)

    config = PeftConfig.from_pretrained(repo_name)
    model = AutoModelForImageClassification.from_pretrained(
        config.base_model_name_or_path,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )
    # Load the LoRA model
    # inference_model = AutoPeftModel.from_pretrained(repo_name).to("cuda")
    trainer.model = PeftModel.from_pretrained(model, repo_name).to("cuda")
    eval_results = trainer.evaluate(val_ds)
    print('Load adapter weight and inference', eval_results)

I am currently debugging with LoRA and the official guide from the docs.
I noticed that the adapter weight can be loaded, but the eval_accuracy is basically 0 (compared to the eval_accuracy after training of over 90%), indicating the weights are not loaded correctly?
If I do merge_and_unload(), the eval_accuracy is the same as after training.
Is there something wrong with the way I am loading adapter weights?

@BenjaminBossan
Copy link
Member

I noticed that the adapter weight can be loaded, but the eval_accuracy is basically 0 (compared to the eval_accuracy after training of over 90%), indicating the weights are not loaded correctly?

I could reproduce your issue of low accuracy (I did not train, just loaded your checkpoint). I think the issue is this line, right?

# modules_to_save=["classifier"]

For this reason, the classifier layer is randomly initialized, so the accuracy is low.
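
For illustration, a config that also saves the classifier head with the adapter would look like this (a sketch that mirrors the script above; whether this is the intended fix is up to the author):

config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],  # store the fine-tuned classifier head in the adapter checkpoint
)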

If I do merge_and_unload(), the eval_accuracy is the same as after training.

Could you show me which steps you take to achieve this?

@zqiu24 (Contributor Author) commented Aug 9, 2024

Ah, OK, you are right, in the old checkpoint with merge_and_unload I did have the classifier saved.

It should be working now; I have tested that it can load an old checkpoint trained with the previous version of PEFT OFT.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks a lot for the updates. Could you please run make style so that CI can pass successfully?

I did some further tests regarding the backwards compatibility of the changes in this PR. If my testing is correct, this PR works without a problem for BOFT, but it will cause an issue with OFT, namely that old OFT checkpoints will not work anymore after this PR. They will load fine but the result will be different, which is bad, as users will not expect this. We need to either ensure that the old checkpoint works as expected, or we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

Here is how I tested the compatibility:

# try loading old oft and boft checkpoints
# First checkout and pull all branches to be tested. Then call like this:
# git checkout <branch 1>  # e.g. main
# python script.py oft     # either oft or boft
# git checkout <branch 2>  # e.g. PR branch
# python script.py oft     # same method as above
# call `rm -r /tmp/peft/` to try again with different branches

import os
import sys

import torch
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, OFTConfig, PeftModel, get_peft_model

model_id = "facebook/opt-125m"
inputs = torch.arange(5).view(-1, 1)

# oft
def main(method):
    if method.lower() == "oft":
        config_cls = OFTConfig
        path = "/tmp/peft/oft"
    else:
        config_cls = BOFTConfig
        path = "/tmp/peft/boft"

    model = AutoModelForCausalLM.from_pretrained(model_id)
    if not os.path.exists(path + "/adapter_model.safetensors"):
        print(f"{method} adapter does not exist yet, creating it")
        peft_config = config_cls(target_modules=["q_proj", "v_proj"], init_weights=False)
        model = get_peft_model(model, peft_config)
        model.save_pretrained(path)

        model.eval()
        with torch.inference_mode():
            outputs = model(inputs).logits
        torch.save(outputs, path + "/output.pt")
        print(f"{method} adapter and output created saved in {path}")
    else:
        print(f"{method} adapter and outpput exist, loading it from {path}")
        model = PeftModel.from_pretrained(model, path)
        print(f"{method} adapter loaded successfully")
        model.eval()
        with torch.inference_mode():
            outputs_new = model(inputs).logits

        outputs_old = torch.load(path + "/output.pt", weights_only=True)
        assert torch.allclose(outputs_old, outputs_new, atol=1e-4, rtol=1e-4)
        print(f"{method} outputs match")


if __name__ == "__main__":
    method = sys.argv[1] if len(sys.argv) > 1 else "boft"
    assert method.lower() in ["boft", "oft"]
    main(method)

Does that make sense to you? I summarized the results from my tests in this table:

branch 1 | branch 2 | method | status
main | PR | OFT | fail: outputs different
PR | main | OFT | TypeError: OFTConfig.__init__() got an unexpected keyword argument 'bias'
main | PR | BOFT | success
PR | main | BOFT | success

In general, we really want to retain backwards compatibility but in my book it's acceptable if the previous implementation was not correct and is now fixed. Also note that a new OFT checkpoint created after this PR cannot be loaded with older PEFT versions -- there is no forward compatibility -- but this is okay.

f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
"to be 0, because boft_block_size x boft_block_num != in_features."
)
raise ValueError("Unknown error!")
Member:

So this code should not be reachable. Still, let's use a more helpful error message. My proposal:

"Something went wrong, please report this error: https://github.com/huggingface/peft/issues".

# OFT info
self.oft_r = nn.ParameterDict({})
self.oft_s = nn.ParameterDict({})
Member:

Here we introduce a new parameter, oft_s. Existing checkpoints don't have this parameter, so when a user loads an old checkpoint, it is randomly initialized, which means the checkpoint behaves differently. See my longer comment on compatibility.

Contributor Author:

Actually, it depends on init_weights: if init_weights=True (which should be the default behavior), then oft_s is basically an identity and should not change anything. If init_weights=False, then it is randomly initialized.

Member:

True, but when users load checkpoints, they have trained the model, so oft_s won't be an identity transform.

Comment on lines 179 to 182
elif r != 0 and oft_block_size != 0:
raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

These lines should not be reachable because of the checks we have in the config class, but IMO it's fine to leave them here.

Contributor Author:

Updated.

Comment on lines 506 to 509
elif r != 0 and oft_block_size != 0:
raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

Same comment as earlier: Should be unreachable but it's fine if we leave the code.

Contributor Author:

Updated.

("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"]}),
("Vanilla MLP 5 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"], "modules_to_save": ["lin1"]}),
("Vanilla MLP 1 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": "lin0"}),
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
Member:

Suggested change
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"]}),

Contributor Author:

Updated.

@zqiu24 (Contributor Author) commented Aug 13, 2024

We need to either ensure that the old checkpoint works as expected, or we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

In general, we really want to retain backwards compatibility but in my book it's acceptable if the previous implementation was not correct and is now fixed. Also note that a new OFT checkpoint created after this PR cannot be loaded with older PEFT versions -- there is no forward compatibility -- but this is okay.

Yes, this should be expected; the reason is that loading was not correct in the previous version. I think the best way is the one you describe: "we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version." Where do you suggest adding the check?

@BenjaminBossan (Member) left a comment

Yes, this should be expected; the reason is that loading was not correct in the previous version. I think the best way is the one you describe: "we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version." Where do you suggest adding the check?

Good question. I checked what would be the best place to make this change, keeping in mind that in the future, we could have more of these cases.

My suggestion is the following. In this line, we load all the arguments for the config:

kwargs = {**class_kwargs, **loaded_attributes}

Here, we could add a check on these arguments that could be used to raise an error if an old OFT config is detected. So the new code would be:

        loaded_attributes = cls.from_json_file(config_file)
        kwargs = {**class_kwargs, **loaded_attributes}
        kwargs = cls.check_kwargs(**kwargs)  # <= new line
        return cls.from_peft_type(**kwargs)

By default, this method should be defined on PeftConfigMixin to just return the kwargs without any validation or change. Then on OFTConfig, we can override this method to check if it's from an old OFT checkpoint and raise an error.

To check if the config is from an old checkpoint, we could simply check for the presence of one of the new arguments added in this PR. WDYT?
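
A minimal sketch of what this could look like (the class hierarchy is simplified and the error message is only illustrative; the actual implementation in the PR may differ):

class PeftConfigMixin:
    @classmethod
    def check_kwargs(cls, **kwargs):
        # Default: pass the loaded kwargs through without any validation.
        return kwargs


class OFTConfig(PeftConfigMixin):
    @classmethod
    def check_kwargs(cls, **kwargs):
        # Old OFT checkpoints (created before this PR) do not contain the new
        # `oft_block_size` argument, so its absence marks an old checkpoint.
        if "oft_block_size" not in kwargs:
            raise ValueError(
                "OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
                "with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions. "
                "Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
            )
        return kwargs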

Maybe long term it would be better to also store the PEFT version in the adapter_config.json so that this can be checked instead. But it's not quite easy, as existing checkpoints don't have this info yet. So let's not add that in the current PR.

@zqiu24 (Contributor Author) commented Aug 26, 2024

@BenjaminBossan Sorry for the late reply, I added the check_kwargs; do you think it is OK like that? As for oft_s, if the user used an older version, it would not exist; therefore, even if the model is trained, it will still be an identity transform and should make no difference.

@BenjaminBossan (Member) left a comment

Thanks for making these latest changes.

Concerning the compatibility check, I have a few comments, please check.

Also, I think it would be good to have a unit test for this check. I created a checkpoint with the old OFT implementation here: https://huggingface.co/peft-internal-testing/OFT-tiny-random-OPTForCausalLM. You can use it to load and then check that the warning has been raised.
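
A sketch of such a test, using the checkpoint linked above (the test name and the base model id are placeholders, and the check was still a warning at this point in the discussion; it was later changed to raise an error):

import pytest
from transformers import AutoModelForCausalLM
from peft import PeftModel

def test_load_old_oft_checkpoint_warns():
    # Any base model that matches the old adapter would do; this tiny model id is an assumption.
    base_model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-OPTForCausalLM")
    with pytest.warns(UserWarning, match="OFT has been updated"):
        PeftModel.from_pretrained(base_model, "peft-internal-testing/OFT-tiny-random-OPTForCausalLM")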

"""
if "oft_block_size" in kwargs:
warnings.warn(
'OFT has been updated since 0.12.1.dev0. Your trained adapter weights may not be compatible with the latest version of OFT. Please retrain your adapter weights.')
Member:

  1. Note that dev0 is not a real version, so let's not refer to that. The next release version will be 0.13.0, so let's use that version.
  2. "may not be compatible": We are pretty sure it is incompatible when trained, right? Let's phrase it as "is incompatible".
  3. Let's also mention that users can downgrade PEFT to version 0.12.0 and then the adapter will still work.

kwargs (additional keyword arguments, *optional*):
Additional keyword arguments passed along to the child class initialization.
"""
if "oft_block_size" in kwargs:
Member:

Hmm, so oft_block_size is the new parameter, right? So if we're loading an old OFT model, it should be missing. Therefore, should the check not be if "oft_block_size" not in kwargs?

@zqiu24 (Contributor Author) commented Aug 27, 2024

@BenjaminBossan I noticed some other problems in the previous implementation of OFT: the size of the OFT adapters is also defined incorrectly (OFT should rotate the neurons, not the feature dimensions; the current way can have some fine-tuning effect, but it is not optimal). Therefore, I added the raise ValueError in OFTConfig, because it is now not possible to load the old adapter weights. I would suggest that users either use the old version of PEFT or retrain the adapter weights entirely. Best.

@BenjaminBossan (Member) left a comment

Thanks for making further corrections to OFT. I have a few smaller comments and also a question about the parameter finding algorithm, please take a look.

Also, please ensure to run make style on your code changes.

"""
if "oft_block_size" not in kwargs:
raise ValueError(
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
Member:

Suggested change
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'

Also, let's ensure the 120 char line limit.

if "oft_block_size" not in kwargs:
raise ValueError(
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
Member:

Suggested change
'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
'Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights.'

Not specific to merging, right?

Comment on lines 173 to 175
warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
oft_block_size = self.adjust_oft_parameters(self.in_features, oft_block_size)
warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Instead of two warnings, let's make this one warning by prepending the first message to the second.
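
For instance, the two calls above could be collapsed into a single warning along these lines (a sketch using the names from the reviewed snippet; the exact wording is up to the author):

old_oft_block_size = oft_block_size
oft_block_size = self.adjust_oft_parameters(self.in_features, oft_block_size)
warnings.warn(
    f"Invalid `oft_block_size` ({old_oft_block_size})! Adjusted `oft_block_size` to ({oft_block_size})."
)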

if self.in_features % r != 0 or r > self.in_features:
warnings.warn(f"Invalid `r` ({r})!")
r = self.adjust_oft_parameters(self.in_features, r)
warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same about having a single warning.

if conv_filter_dim % oft_block_size != 0 or oft_block_size > conv_filter_dim:
warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
oft_block_size = self.adjust_oft_parameters(conv_filter_dim, oft_block_size)
warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Same comment about having a single warning.

if conv_filter_dim % r != 0 or r > conv_filter_dim:
warnings.warn(f"Invalid `r` ({r})!")
r = self.adjust_oft_parameters(conv_filter_dim, r)
warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same comment about having a single warning.

@zqiu24 (Contributor Author) commented Sep 27, 2024

@BenjaminBossan So locally, I ran the pytest tests/ -k test_encoder_decoder_models and pytest tests/ -k test_decoder_models and I did not get any error.

I do get some error with pytest tests/ -k test_feature_extraction_models:
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_27_test_hf_internal_testing_tiny_random_DebertaModel_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_31_test_hf_internal_testing_tiny_random_DebertaV2Model_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

@BenjaminBossan (Member):

I do get some error with pytest tests/ -k test_feature_extraction_models

Yeah, I can reproduce that, strange. Maybe it's related to the special layer types that Deberta uses. I think we can skip this specific Deberta test, similar to how we skip AdaLoRA + Roberta:

peft/tests/testing_common.py

Lines 1105 to 1107 in c29810b

if (config_cls == AdaLoraConfig) and ("roberta" in model_id.lower()):
# TODO: no gradients on the "dense" layer, other layers work, not sure why
self.skipTest("AdaLora with RoBERTa does not work correctly")

Please also merge with/rebase on the latest main from PEFT.

@zqiu24 (Contributor Author) commented Sep 27, 2024

@BenjaminBossan I fixed the error in test_feature_extraction_models and fetched the latest main. Best.

@BenjaminBossan (Member):

Hmm, it seems like test_deeply_nested is flaky now. Let me investigate a bit next week.

@zqiu24 (Contributor Author) commented Sep 27, 2024

@BenjaminBossan Thank you so much! I tested test_deeply_nested locally and there was no error.

@BenjaminBossan (Member):

@Zeju1997 I can confirm that test_deeply_nested works locally, both on GPU and CPU. Therefore, I'm not sure what's causing the error on CI. Checking the error message, the tensors look identical, so it's most likely a precision problem. Could you please increase the tolerance in this test to 1e-4?

@zqiu24 (Contributor Author) commented Sep 30, 2024

@BenjaminBossan Thanks! I have updated.

@BenjaminBossan (Member):

Thanks, but unfortunately, this has not solved the issue. I tested on another Linux PC with Intel CPU and still can't reproduce the issue. At this point, I don't know what else can be done, it's most likely some problem with the hardware of the GitHub CI runners. I would propose to skip this test on Linux for now:

...
import platform
...

# in test
        if platform.system() == "Linux":
            self.skipTest("This test fails but only on GitHub CI with Linux systems.")

If you add this, you can also revert the tolerance changes, since they didn't help.

@zqiu24 (Contributor Author) commented Sep 30, 2024

@BenjaminBossan Updated.

@zqiu24 (Contributor Author) commented Sep 30, 2024

@BenjaminBossan I think all the tests have passed. Thank you again for the great help during the process!

@BenjaminBossan (Member) left a comment

Thanks a lot for this PR and for your patience. Finally CI is green, I'll merge right away.

Comment on lines 186 to 188
"OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
Member:

Ah, damn, a minor issue I noticed just now: Since PEFT 0.13.0 was released already, this message is no longer correct.

Suggested change
"OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
"OFT has been updated since PEFT 0.14.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"Alternatively, downgrade PEFT to version 0.13.0 to use the old adapter weights."

Member:

Ping @Zeju1997 just this small change and we're good.

Contributor Author:

@BenjaminBossan Sure, updated. Best.

Comment on lines 186 to 187
"OFT has been updated since PEFT 0.14.0. Your trained adapter weights are incompatible"
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
Member:

Please add back the spaces:

Suggested change
"OFT has b F438 een updated since PEFT 0.14.0. Your trained adapter weights are incompatible"
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"OFT has been updated since PEFT 0.14.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions. "

@zqiu24 (Contributor Author) commented Oct 1, 2024

@BenjaminBossan Updated. Best,

BenjaminBossan merged commit 2a80735 into huggingface:main on Oct 1, 2024
14 checks passed
@BenjaminBossan (Member):

We've got it, finally. Thanks for your patience.

@BenjaminBossan (Member) commented Oct 2, 2024

@Zeju1997 Unfortunately, the regression tests are failing now for BOFT with the Conv2d layer:

https://github.com/huggingface/peft/actions/runs/11136746324/job/30949028303#step:8:182

The issue is most likely that the boft_s parameter has been transposed:

https://github.com/huggingface/peft/pull/1996/files#diff-76cd9e2dc2dd53a73bdd1a2ca6f475b44978690fea8e417bbd3952ae1c7929adR775

This means that users wouldn't be able to load BOFT checkpoints including Conv2d. Can we transpose this vector back to its initial shape?

To test it locally, run this pytest command:

pytest tests/regression/test_regression.py -s --regression -k boft

@zqiu24 (Contributor Author) commented Oct 4, 2024

@BenjaminBossan OK, I will look into this.

@zqiu24 (Contributor Author) commented Oct 4, 2024

@BenjaminBossan Is this related to old checkpoints from previous versions or also the current version?

@BenjaminBossan (Member):

Is this related to old checkpoints from previous versions or also the current version?

It's an old BOFT Conv2d checkpoint that can't be loaded with the current code.

BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Oct 22, 2024
The previous OFT implementation contained a few errors, which are fixed now.

Unfortunately, this makes previous OFT checkpoints invalid, which is why an
error will be raised. Users are instructed to either retrain the OFT adapter or
switch to an old PEFT version.
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
The previous OFT implementation contained a few errors, which are fixed now.

Unfortunately, this makes previous OFT checkpoints invalid, which is why an
error will be raised. Users are instructed to either retrain the OFT adapter or
switch to an old PEFT version.