8000 Update OFT to fix merge bugs by zqiu24 · Pull Request #1996 · huggingface/peft · GitHub
Merged: 43 commits merged into huggingface:main on Oct 1, 2024

Conversation

@zqiu24 (Contributor) commented Aug 8, 2024

changes for BOFT

  • fixing small errors in comments of BOFT
  • making Conv2D operation consistent with Linear

changes for OFT

  • [Important!] fixing the incorrect merging of OFT; this is not an implementation error but rather an incorrect understanding of how orthogonal fine-tuning works, which resulted in a wrong way of merging adapters
  • updating config.py/layer.py/model.py to be consistent with other PEFT methods
  • adding additional parameters oft_block_size, oft_dropout, bias
  • fixing a module dropout implementation error (in orthogonal fine-tuning, dropout does not work the same way as in LoRA)
  • fixing errors in the merge and unmerge functions of OFT
  • making Conv2D operation consistent with Linear

@zqiu24 zqiu24 changed the title from "Update oft to fix merge bugs" to "Update OFT to fix merge bugs" on Aug 8, 2024
@BenjaminBossan (Member) left a comment

Thanks a lot for fixing the issues you mentioned with OFT. I don't understand all the details but in general, this is to bring OFT closer to the BOFT implementation, is that right? Is that also why you decided to no longer use LycorisTuner as a base?

Did you check whether after these changes, the same code using OFT would still produce the exact same results? E.g. let's say a user has trained an OFT model checkpoint in PEFT 0.12. Now we merge this PR and release it in PEFT 0.13. If the user upgrades to 0.13 and loads the OFT checkpoint, would they still get the same results? This is really important because otherwise it would be a source of bugs.

If we cannot ensure the same results after this PR because, for instance, something was not implemented correctly previously, at the very least we have to add some mechanism to error or warn users when they try to load an old checkpoint with new PEFT versions.

I have added some smaller comments too, please check them out. On top of those, please run make style and ensure 120 character line limits (e.g. long strings).

Also, pinging @okotaku and @lukaskuhn-lku since they worked on the original OFT PR.

Comment on lines 318 to 322
elif boft_block_size != 0 and boft_block_num != 0:
raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
raise ValueError(
f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
"to be 0, because boft_block_size x boft_block_num != in_features."
)
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Do we need these checks? We already have checks here:

if self.boft_block_size == 0 and self.boft_block_num == 0:
raise ValueError("You must specify either boft_block_size or boft_block_num.")
if not (self.boft_block_size != 0) ^ (self.boft_block_num != 0):
raise ValueError(
f"You can only specify either boft_block_size ({self.boft_block_size}) or boft_block_num ({self.boft_block_num}), "
"but not both simultaneously, because boft_block_size x boft_block_num != in_features."
)

Contributor Author:

Yes, you are right, I forgot there is already a check in config.py. Updated.

Comment on lines 750 to 754
elif boft_block_size != 0 and boft_block_num != 0:
raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
raise ValueError("Unknown error!")
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Same comment.

Contributor Author:

Updated.

class OFTLayer(nn.Module, LycorisLayer):
class MultiplicativeDropoutLayer(nn.Module):
"""
Implements the multiplicative dropout layer for BOFT.
Member:

Suggested change
Implements the multiplicative dropout layer for BOFT.
Implements the multiplicative dropout layer for OFT.

Contributor Author:

updated.

if D == 1:
return x

# Create a mask with 1s for matrices to be replaced with identity and 0s otherwise
Member:

Sentence doesn't quite make sense, could you please check again?

Contributor Author:

updated.

@zqiu24 (Contributor Author) commented Aug 9, 2024

@BenjaminBossan Hi, thank you so much. Because the main change (besides changing the format) is in the merge/unmerge, it should be possible to update it without affecting previously trained model checkpoints. I am currently testing it. But I am currently facing a problem:

import transformers
import accelerate
import peft
from datasets import load_dataset
from transformers import AutoImageProcessor
from peft import LoraConfig, BOFTConfig, OFTConfig, get_peft_model
import numpy as np
import evaluate
import torch
import os

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
from peft import PeftConfig, PeftModel, AutoPeftModel

os.environ["HF_API_TOKEN"] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

if __name__ == '__main__':
    dataset = load_dataset("food101", split="train[:5000]")

    output_dir = "output"

    model_checkpoint = "google/vit-base-patch16-224-in21k"

    labels = dataset.features["label"].names
    label2id, id2label = dict(), dict()
    for i, label in enumerate(labels):
        label2id[label] = i
        id2label[i] = label

    image_processor = AutoImageProcessor.from_pretrained(model_checkpoint)


    normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
    train_transforms = Compose(
        [
            RandomResizedCrop(image_processor.size["height"]),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ]
    )

    val_transforms = Compose(
        [
            Resize(image_processor.size["height"]),
            CenterCrop(image_processor.size["height"]),
            ToTensor(),
            normalize,
        ]
    )


    def preprocess_train(example_batch):
        """Apply train_transforms across a batch."""
        example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    def preprocess_val(example_batch):
        """Apply val_transforms across a batch."""
        example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    splits = dataset.train_test_split(test_size=0.1)
    train_ds = splits["train"]
    val_ds = splits["test"]

    train_ds.set_transform(preprocess_train)
    val_ds.set_transform(preprocess_val)


    def print_trainable_parameters(model):
        trainable_params = 0
        all_param = 0
        for _, param in model.named_parameters():
            all_param += param.numel()
            if param.requires_grad:
                trainable_params += param.numel()
        print(
            f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
        )

    model = AutoModelForImageClassification.from_pretrained(
        model_checkpoint,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )


    print_trainable_parameters(model)

    config = LoraConfig(
        r=32,
        lora_alpha=16,
        target_modules=["query", "value"],
        lora_dropout=0.1,
        bias="none",
        # modules_to_save=["classifier"],
    )

    lora_model = get_peft_model(model, config)
    print_trainable_parameters(lora_model)

    model_name = model_checkpoint.split("/")[-1]
    batch_size = 128

    args = TrainingArguments(
        f"{model_name}-finetuned-lora-food101",
        remove_unused_columns=False,
        evaluation_strategy="epoch",
        save_strategy="no",
        learning_rate=5e-3,
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=batch_size,
        fp16=True,
        num_train_epochs=5,
        logging_steps=10,
        metric_for_best_model="accuracy",
        label_names=["labels"],
        eval_strategy="steps",
        eval_steps=100000000000,
    )

    metric = evaluate.load("accuracy")

    def compute_metrics(eval_pred):
        """Computes accuracy on a batch of predictions"""
        predictions = np.argmax(eval_pred.predictions, axis=1)
        return metric.compute(predictions=predictions, references=eval_pred.label_ids)

    def collate_fn(examples):
        pixel_values = torch.stack([example["pixel_values"] for example in examples])
        labels = torch.tensor([example["label"] for example in examples])
        return {"pixel_values": pixel_values, "labels": labels}

    trainer = Trainer(
        lora_model,
        args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        tokenizer=image_processor,
        compute_metrics=compute_metrics,
        data_collator=collate_fn,
    )
    eval_results = trainer.evaluate(val_ds)
    print('Before training', eval_results)

    repo_name = f"sgp-bench/peft-finetuned-lora-food101"

    train_results = trainer.train()
    
    token = os.getenv("HF_API_TOKEN")
    lora_model.push_to_hub(repo_name, token=token)

    eval_results = trainer.evaluate(val_ds)
    print('After training', eval_results)

    config = PeftConfig.from_pretrained(repo_name)
    model = AutoModelForImageClassification.from_pretrained(
        config.base_model_name_or_path,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )
    # Load the LoRA model
    # inference_model = AutoPeftModel.from_pretrained(repo_name).to("cuda")
    trainer.model = PeftModel.from_pretrained(model, repo_name).to("cuda")
    eval_results = trainer.evaluate(val_ds)
    print('Load adapter weight and inference', eval_results)

I am currently debugging with LoRA and the official guide from the docs.
I noticed that the adapter weight can be loaded, but the eval_accuracy is basically 0 (compared to the eval_accuracy after training of over 90%), indicating the weights are not loaded correctly?
If I do merge_and_unload(), the eval_accuracy is the same as after training.
Is there something wrong with the way I am loading adapter weights?

@BenjaminBossan
Copy link
Member

I noticed that the adapter weight can be loaded, but the eval_accuracy is basically 0 (compared to the eval_accuracy after training of over 90%), indicating the weights are not loaded correctly?

I could reproduce your issue of low accuracy (I did not train, just loaded your checkpoint). I think the issue is this line, right?

# modules_to_save=["classifier"]

For this reason, the classifier layer is randomly initialized, so the accuracy is low.
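
For illustration, a config that also saves the classifier head with the adapter would look like this (a sketch that mirrors the script above; whether this is the intended fix is up to the author):

config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],  # store the fine-tuned classifier head in the adapter checkpoint
)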

If I do merge_and_unload(), the eval_accuracy is the same as after training.

Could you show me which steps you take to achieve this?

@zqiu24 (Contributor Author) commented Aug 9, 2024

Ah, OK, you are right, in the old checkpoint with merge_and_unload I did have the classifier saved.

It should be working now; I have tested that it can load an old checkpoint trained with the previous version of PEFT OFT.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks a lot for the updates. Could you please run make style so that CI can pass successfully?

I did some further tests regarding the backwards compatibility of the changes in this PR. If my testing is correct, this PR works without a problem for BOFT, but it will cause an issue with OFT, namely that old OFT checkpoints will not work anymore after this PR. They will load fine but the result will be different, which is bad, as users will not expect this. We need to either ensure that the old checkpoint works as expected, or we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

Here is how I tested the compatibility:

# try loading old oft and boft checkpoints
# First checkout and pull all branches to be tested. Then call like this:
# git checkout <branch 1>  # e.g. main
# python script.py oft     # either oft or boft
# git checkout <branch 2>  # e.g. PR branch
# python script.py oft     # same method as above
# call `rm -r /tmp/peft/` to try again with different branches

import os
import sys

import torch
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, OFTConfig, PeftModel, get_peft_model

model_id = "facebook/opt-125m"
inputs = torch.arange(5).view(-1, 1)

# oft
def main(method):
    if method.lower() == "oft":
        config_cls = OFTConfig
        path = "/tmp/peft/oft"
    else:
        config_cls = BOFTConfig
        path = "/tmp/peft/boft"

    model = AutoModelForCausalLM.from_pretrained(model_id)
    if not os.path.exists(path + "/adapter_model.safetensors"):
        print(f"{method} adapter does not exist yet, creating it")
        peft_config = config_cls(target_modules=["q_proj", "v_proj"], init_weights=False)
        model = get_peft_model(model, peft_config)
        model.save_pretrained(path)

        model.eval()
        with torch.inference_mode():
            outputs = model(inputs).logits
        torch.save(outputs, path + "/output.pt")
        print(f"{method} adapter and output created saved in {path}")
    else:
        print(f"{method} adapter and outpput exist, loading it from {path}")
        model = PeftModel.from_pretrained(model, path)
        print(f"{method} adapter loaded successfully")
        model.eval()
        with torch.inference_mode():
            outputs_new = model(inputs).logits

        outputs_old = torch.load(path + "/output.pt", weights_only=True)
        assert torch.allclose(outputs_old, outputs_new, atol=1e-4, rtol=1e-4)
        print(f"{method} outputs match")


if __name__ == "__main__":
    method = sys.argv[1] if len(sys.argv) > 1 else "boft"
    assert method.lower() in ["boft", "oft"]
    main(method)

Does that make sense to you? I summarized the results from my tests in this table:

branch 1 | branch 2 | method | status
main | PR | OFT | fail: outputs different
PR | main | OFT | TypeError: OFTConfig.__init__() got an unexpected keyword argument 'bias'
main | PR | BOFT | success
PR | main | BOFT | success

In general, we really want to retain backwards compatibility but in my book it's acceptable if the previous implementation was not correct and is now fixed. Also note that a new OFT checkpoint created after this PR cannot be loaded with older PEFT versions -- there is no forward compatibility -- but this is okay.

f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
"to be 0, because boft_block_size x boft_block_num != in_features."
)
raise ValueError("Unknown error!")
Member:

So this code should not be reachable. Still, let's use a more helpful error message. My proposal:

"Something went wrong, please report this error: https://github.com/huggingface/peft/issues".

# OFT info
self.oft_r = nn.ParameterDict({})
self.oft_s = nn.ParameterDict({})
Member:

Here we introduce a new parameter, oft_s. Existing checkpoints don't have this parameter, so when a user loads an old checkpoint, it is randomly initialized, which means the checkpoint behaves differently. See my longer comment on compatibility.

Contributor Author:

Actually, it depends on init_weights: if init_weights=True (which should be the default behavior), then oft_s is basically an identity and should not change anything. If init_weights=False, then it is randomly initialized.

Member:

True, but when users load checkpoints, they have trained the model, so oft_s won't be an identity transform.

Comment on lines 179 to 182
elif r != 0 and oft_block_size != 0:
raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

These lines should not be reachable because of the checks we have in the config class, but IMO it's fine to leave them here.

Contributor Author:

Updated.

Comment on lines 506 to 509
elif r != 0 and oft_block_size != 0:
raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

Same comment as earlier: Should be unreachable but it's fine if we leave the code.

Contributor Author:

Updated.

("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"]}),
("Vanilla MLP 5 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"], "modules_to_save": ["lin1"]}),
("Vanilla MLP 1 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": "lin0"}),
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
Member:

Suggested change
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"]}),

Contributor Author:

Updated.

@zqiu24 (Contributor Author) commented Aug 13, 2024

We need to either ensure that the old checkpoint works as expected, or we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

In general, we really want to retain backwards compatibility but in my book it's acceptable if the previous implementation was not correct and is now fixed. Also note that a new OFT checkpoint created after this PR cannot be loaded with older PEFT versions -- there is no forward compatibility -- but this is okay.

Yes, this should be expected; the reason is that loading was not correct in the previous version. I think the best way is the one you describe: "we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version." Where do you suggest adding the check?

@BenjaminBossan (Member) left a comment

Yes, this should be expected; the reason is that loading was not correct in the previous version. I think the best way is the one you describe: "we have to add a check and if we detect an old checkpoint, we have to raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version." Where do you suggest adding the check?

Good question. I checked what would be the best place to make this change, keeping in mind that in the future, we could have more of these cases.

My suggestion is the following. In this line, we load all the arguments for the config:

kwargs = {**class_kwargs, **loaded_attributes}

Here, we could add a check on these arguments that could be used to raise an error if an old OFT config is detected. So the new code would be:

        loaded_attributes = cls.from_json_file(config_file)
        kwargs = {**class_kwargs, **loaded_attributes}
        kwargs = cls.check_kwargs(**kwargs)  # <= new line
        return cls.from_peft_type(**kwargs)

By default, this method should be defined on PeftConfigMixin to just return the kwargs without any validation or change. Then on OFTConfig, we can override this method to check if it's from an old OFT checkpoint and raise an error.

To check if the config is from an old checkpoint, we could simply check for the presence of one of the new arguments added in this PR. WDYT?
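
A minimal sketch of what this could look like (the class hierarchy is simplified and the error message is only illustrative; the actual implementation in the PR may differ):

class PeftConfigMixin:
    @classmethod
    def check_kwargs(cls, **kwargs):
        # Default: pass the loaded kwargs through without any validation.
        return kwargs


class OFTConfig(PeftConfigMixin):
    @classmethod
    def check_kwargs(cls, **kwargs):
        # Old OFT checkpoints (created before this PR) do not contain the new
        # `oft_block_size` argument, so its absence marks an old checkpoint.
        if "oft_block_size" not in kwargs:
            raise ValueError(
                "OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
                "with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions. "
                "Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
            )
        return kwargs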

Maybe long term it would be better to also store the PEFT version in the adapter_config.json so that this can be checked instead. But it's not quite easy, as existing checkpoints don't have this info yet. So let's not add that in the current PR.

@zqiu24 (Contributor Author) commented Aug 26, 2024

@BenjaminBossan Sorry for the late reply, I added the check_kwargs; do you think it is OK like that? As for oft_s, if the user used an older version, it would not exist; therefore, even if the model is trained, it will still be an identity transform and should make no difference.

@BenjaminBossan (Member) left a comment

Thanks for making these latest changes.

Concerning the compatibility check, I have a few comments, please check.

Also, I think it would be good to have a unit test for this check. I created a checkpoint with the old OFT implementation here: https://huggingface.co/peft-internal-testing/OFT-tiny-random-OPTForCausalLM. You can use it to load and then check that the warning has been raised.
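
A sketch of such a test, using the checkpoint linked above (the test name and the base model id are placeholders, and the check was still a warning at this point in the discussion; it was later changed to raise an error):

import pytest
from transformers import AutoModelForCausalLM
from peft import PeftModel

def test_load_old_oft_checkpoint_warns():
    # Any base model that matches the old adapter would do; this tiny model id is an assumption.
    base_model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-OPTForCausalLM")
    with pytest.warns(UserWarning, match="OFT has been updated"):
        PeftModel.from_pretrained(base_model, "peft-internal-testing/OFT-tiny-random-OPTForCausalLM")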

"""
if "oft_block_size" in kwargs:
warnings.warn(
'OFT has been updated since 0.12.1.dev0. Your trained adapter weights may not be compatible with the latest version of OFT. Please retrain your adapter weights.')
Member:

  1. Note that dev0 is not a real version, so let's not refer to that. The next release version will be 0.13.0, so let's use that version.
  2. "may not be compatible": We are pretty sure it is incompatible when trained, right? Let's phrase it as "is incompatible".
  3. Let's also mention that users can downgrade PEFT to version 0.12.0 and then the adapter will still work.

kwargs (additional keyword arguments, *optional*):
Additional keyword arguments passed along to the child class initialization.
"""
if "oft_block_size" in kwargs:
Member:

Hmm, so oft_block_size is the new parameter, right? So if we're loading an old OFT model, it should be missing. Therefore, should the check not be if "oft_block_size" not in kwargs?

@zqiu24 (Contributor Author) commented Aug 27, 2024

@BenjaminBossan I noticed some other problems in the previous implementation of OFT: the size of the OFT adapters is also defined incorrectly (OFT should rotate the neurons, not the feature dimensions; the current way can have some fine-tuning effect, but it is not optimal). Therefore, I added the raise ValueError in OFTConfig, because it is now not possible to load the old adapter weights. I would suggest that users either use the old version of PEFT or retrain the adapter weights entirely. Best.

@BenjaminBossan (Member) left a comment

Thanks for making further corrections to OFT. I have a few smaller comments and also a question about the parameter finding algorithm, please take a look.

Also, please ensure to run make style on your code changes.

"""
if "oft_block_size" not in kwargs:
raise ValueError(
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
Member:

Suggested change
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'

Also, let's ensure the 120 char line limit.

if "oft_block_size" not in kwargs:
raise ValueError(
'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
Member:

Suggested change
'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
'Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights.'

Not specific to merging, right?

Comment on lines 173 to 175
warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
oft_block_size = self.adjust_oft_parameters(self.in_features, oft_block_size)
warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Instead of two warnings, let's make this one warning by prepending the first message to the second.
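
For instance, the two calls above could be collapsed into a single warning along these lines (a sketch using the names from the reviewed snippet; the exact wording is up to the author):

old_oft_block_size = oft_block_size
oft_block_size = self.adjust_oft_parameters(self.in_features, oft_block_size)
warnings.warn(
    f"Invalid `oft_block_size` ({old_oft_block_size})! Adjusted `oft_block_size` to ({oft_block_size})."
)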

if self.in_features % r != 0 or r > self.in_features:
warnings.warn(f"Invalid `r` ({r})!")
r = self.adjust_oft_parameters(self.in_features, r)
warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same about having a single warning.

if conv_filter_dim % oft_block_size != 0 or oft_block_size > conv_filter_dim:
warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
oft_block_size = self.adjust_oft_parameters(conv_filter_dim, oft_block_size)
warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Same comment about having a single warning.

if conv_filter_dim % r != 0 or r > conv_filter_dim:
warnings.warn(f"Invalid `r` ({r})!")
r = self.adjust_oft_parameters(conv_filter_dim, r)
warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same comment about having a single warning.

@zqiu24 (Contributor Author) commented Sep 27, 2024

@BenjaminBossan So locally, I ran the pytest tests/ -k test_encoder_decoder_models and pytest tests/ -k test_decoder_models and I did not get any error.

I do get some error with pytest tests/ -k test_feature_extraction_models:
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_27_test_hf_internal_testing_tiny_random_DebertaModel_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_31_test_hf_internal_testing_tiny_random_DebertaV2Model_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

@BenjaminBossan (Member):

I do get some error with pytest tests/ -k test_feature_extraction_models

Yeah, I can reproduce that, strange. Maybe it's related to the special layer types that Deberta uses. I think we can skip this specific Deberta test, similar to how we skip AdaLoRA + Roberta:

peft/tests/testing_common.py

Lines 1105 to 1107 in c29810b

if (config_cls == AdaLoraConfig) and ("roberta" in model_id.lower()):
# TODO: no gradients on the "dense" layer, other layers work, not sure why
self.skipTest("AdaLora with RoBERTa does not work correctly")

Please also merge with/rebase on the latest main from PEFT.

@zqiu24 (Contributor Author) commented Sep 27, 2024

@BenjaminBossan I fixed the error in test_feature_extraction_models and fetched the latest main. Best.

@BenjaminBossan (Member):

Hmm, it seems like test_deeply_nested is flaky now. Let me investigate a bit next week.

@zqiu24 (Contributor Author) commented Sep 27, 2024

@BenjaminBossan Thank you so much! I tested test_deeply_nested locally and there was no error.

@BenjaminBossan (Member):

@Zeju1997 I can confirm that test_deeply_nested works locally, both on GPU and CPU. Therefore, I'm not sure what's causing the error on CI. Checking the error message, the tensors look identical, so it's most likely a precision problem. Could you please increase the tolerance in this test to 1e-4?

@zqiu24 (Contributor Author) commented Sep 30, 2024

@BenjaminBossan Thanks! I have updated.

@BenjaminBossan (Member):

Thanks, but unfortunately, this has not solved the issue. I tested on another Linux PC with Intel CPU and still can't reproduce the issue. At this point, I don't know what else can be done, it's most likely some problem with the hardware of the GitHub CI runners. I would propose to skip this test on Linux for now:

...
import platform
...

# in test
        if platform.system() == "Linux":
            self.skipTest("This test fails but only on GitHub CI with Linux systems.")

If you add this, you can also revert the tolerance changes, since they didn't help.

@zqiu24 (Contributor Author) commented Sep 30, 2024

@BenjaminBossan Updated.

@zqiu24 (Contributor Author) commented Sep 30, 2024

@BenjaminBossan I think all the tests have passed. Thank you again for the great help during the process!

@BenjaminBossan (Member) left a comment

Thanks a lot for this PR and for your patience. Finally CI is green, I'll merge right away.

Comment on lines 186 to 188
"OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
Member:

Ah, damn, a minor issue I noticed just now: Since PEFT 0.13.0 was released already, this message is no longer correct.

Suggested change
"OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
"OFT has been updated since PEFT 0.14.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"Alternatively, downgrade PEFT to version 0.13.0 to use the old adapter weights."

Member:

Ping @Zeju1997 just this small change and we're good.

Contributor Author:

@BenjaminBossan Sure, updated. Best.

Comment on lines 186 to 187
"OFT has been updated since PEFT 0.14.0. Your trained adapter weights are incompatible"
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
Member:

Please add back the spaces:

Suggested change
"OFT has b F438 een updated since PEFT 0.14.0. Your trained adapter weights are incompatible"
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
"OFT has been updated since PEFT 0.14.0. Your trained adapter weights are incompatible "
"with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions. "

@zqiu24 (Contributor Author) commented Oct 1, 2024

@BenjaminBossan Updated. Best,

BenjaminBossan merged commit 2a80735 into huggingface:main on Oct 1, 2024
14 checks passed
@BenjaminBossan (Member):

We've got it, finally. Thanks for your patience.

@BenjaminBossan (Member) commented Oct 2, 2024

@Zeju1997 Unfortunately, the regression tests are failing now for BOFT with the Conv2d layer:

https://github.com/huggingface/peft/actions/runs/11136746324/job/30949028303#step:8:182

The issue is most likely that the boft_s parameter has been transposed:

https://github.com/huggingface/peft/pull/1996/files#diff-76cd9e2dc2dd53a73bdd1a2ca6f475b44978690fea8e417bbd3952ae1c7929adR775

This means that users wouldn't be able to load BOFT checkpoints including Conv2d. Can we transpose this vector back to its initial shape?

To test it locally, run this pytest command:

pytest tests/regression/test_regression.py -s --regression -k boft

@zqiu24 (Contributor Author) commented Oct 4, 2024

@BenjaminBossan OK, I will look into this.

@zqiu24 (Contributor Author) commented Oct 4, 2024

@BenjaminBossan Is this related to old checkpoints from previous versions or also the current version?

@BenjaminBossan (Member):

Is this related to old checkpoints from previous versions or also the current version?

It's an old BOFT Conv2d checkpoint that can't be loaded with the current code.

BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Oct 22, 2024
The previous OFT implementation contained a few errors, which are fixed now.

Unfortunately, this makes previous OFT checkpoints invalid, which is why an
error will be raised. Users are instructed to either retrain the OFT adapter or
switch to an old PEFT version.
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
The previous OFT implementation contained a few errors, which are fixed now.

Unfortunately, this makes previous OFT checkpoints invalid, which is why an
error will be raised. Users are instructed to either retrain the OFT adapter or
switch to an old PEFT version.