Add lora+ implementation by kallewoof · Pull Request #1915 · huggingface/peft · GitHub

Add lora+ implementation #1915

Merged · 34 commits merged into huggingface:main on Jul 29, 2024

Conversation

@kallewoof (Contributor) commented Jul 8, 2024

LoRA+

Builds on #1509.
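
For context, LoRA+ (arXiv:2402.12354) trains the LoRA B matrices at a higher learning rate than the A matrices, controlled by a ratio. Below is a minimal usage sketch based on the create_loraplus_optimizer signature discussed later in this thread; the peft.optimizers import path and the example hyperparameters are assumptions, not part of this PR's review.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer  # import path assumed

# Attach LoRA adapters to a small causal LM (any model works here).
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# Build an optimizer whose LoRA B parameters use lr * loraplus_lr_ratio.
optimizer = create_loraplus_optimizer(
    model=model,
    optimizer_cls=torch.optim.AdamW,
    lr=2e-4,               # base learning rate (used for the LoRA A matrices)
    loraplus_lr_ratio=16,  # B matrices are trained at 16x the base rate
)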

@kallewoof changed the title from "202407 loraplus" to "Add lora+ implementation" on Jul 8, 2024
@kallewoof force-pushed the 202407-loraplus branch 2 times, most recently from 64dc23c to 6f42ed4 on July 9, 2024 at 14:43
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member)

@kallewoof Thanks for pushing this PR forward. By now, we have waited long enough that I feel it's fair to continue with this PR.

I haven't done a full review yet, but it seems that tests are failing. Could you please take a look?

@kallewoof (Contributor, Author)

@BenjaminBossan Thanks. I think I addressed the errors.

@BenjaminBossan (Member)

We get an error in Python 3.8 as we're missing a from __future__ import annotations.
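
For reference, the one-line fix, needed on Python 3.8 because the helper's signature uses built-in generics such as type[Optimizer], which only become subscriptable at runtime in Python 3.9:

from __future__ import annotations  # must come right after the module docstring/license header, before other imports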

@kallewoof (Contributor, Author)

Thanks, fixed. Also added a license header.

@BenjaminBossan (Member) left a comment

Thanks for the updates. No in-depth review yet, but I found a couple of issues that should be quick to address.

@kallewoof (Contributor, Author)

Failing tests appear unrelated to the PR.

@BenjaminBossan (Member) left a comment

Thanks for continuing the work on LoRA+. I found a few areas for improvement, but overall it looks good already. Tested it on a small example and results were slightly improved.

@BenjaminBossan (Member)

@kallewoof LMK when this is ready for another review.

@kallewoof (Contributor, Author)

@BenjaminBossan Should be ready! Sorry if I missed anything.

@BenjaminBossan (Member) left a comment

Fantastic, I think we're almost done. I only have two small comments left, the rest looks good.

@kallewoof force-pushed the 202407-loraplus branch 3 times, most recently from fa6d661 to f983257 on July 18, 2024 at 10:27
@BenjaminBossan (Member)

@kallewoof Please ping me once this is ready for review. Otherwise, I don't know if you're still working on some changes or not :)

@kallewoof (Contributor, Author)

@BenjaminBossan Ping! Sorry about all the force-pushes.

@kallewoof (Contributor, Author)

@BenjaminBossan Sorry, I thought about this overnight and I think you're right that weight_decay should be popped. If someone provides a weight decay meant for something other than LoRA+ then it will be picked up by both, which is most likely undesired.

@kallewoof (Contributor, Author) commented Jul 22, 2024

Playing around with this, I noticed that LoraPlusConfig is actually not used anywhere. I think the code is missing a part where you create a LoraConfig with a LoRA+ feature and it initializes the optimizer and so on.

@BenjaminBossan (Member) commented Jul 22, 2024

> Sorry about all the force-pushes.

Not a big deal for this PR, but especially on bigger ones it's better to avoid them to make reviews easier. Note that there is no need to clean up the git history, if that was what you were going for, as we squash before merging.

> Sorry, I thought about this overnight and I think you're right that weight_decay should be popped. If someone provides a weight decay meant for something other than LoRA+ then it will be picked up by both, which is most likely undesired.

What do you mean by "both"?

> Playing around with this, I noticed that LoraPlusConfig is actually not used anywhere.

Hmm, yes, you're right. How about removing it completely then? I guess an argument could be made that something like this API could be useful:

from peft import LoraPlusConfig

optimizer_config = LoraPlusConfig(...)
optimizer = create_loraplus_optimizer(model, optimizer_config)

to make it easier to share the config settings, but IMO the value is very marginal.

@kallewoof (Contributor, Author) commented Jul 22, 2024

>> Sorry about all the force-pushes.

> Not a big deal for this PR, but especially on bigger ones it's better to avoid them to make reviews easier. Note that there is no need to clean up the git history, if that was what you were going for, as we squash before merging.

Right. It's a very ingrained habit from other projects where they don't squash.

>> Sorry, I thought about this overnight and I think you're right that weight_decay should be popped. If someone provides a weight decay meant for something other than LoRA+ then it will be picked up by both, which is most likely undesired.

> What do you mean by "both"?

Imagine a FutureOptimizerX that has its own fancy weight_decay parameter, and now imagine someone creating a LoRA+ optimizer with it as the optimizer_cls:

def create_loraplus_optimizer(
    model: PeftModel, optimizer_cls: type[Optimizer], *, lr: float, loraplus_lr_ratio: float, **kwargs
) -> Optimizer:

The LoRA+ optimizer picks out and uses the passed-in weight decay value in its setup. Then, if we don't pop it,

    optimizer = optimizer_cls(optimizer_grouped_parameters, **kwargs)

the FutureOptimizerX optimizer will now also use our weight decay param. It is unclear which optimizer the user meant the weight decay for, but it is unlikely that they intended both FutureOptimizerX and the LoRA+ parameter groups to end up with the same value.
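
To make the concern concrete, here is a minimal sketch, not the PR's exact implementation (the lora_B name check is an assumption about PEFT's parameter naming): weight_decay is consumed when building the parameter groups and should not also reach the wrapped optimizer's constructor.

from __future__ import annotations

from torch.optim import Optimizer

def create_loraplus_optimizer_sketch(
    model, optimizer_cls: type[Optimizer], *, lr: float, loraplus_lr_ratio: float, **kwargs
) -> Optimizer:
    # Pop, don't just read: the value is applied per parameter group below and
    # must not also be forwarded to optimizer_cls(...) as a global default.
    weight_decay = kwargs.pop("weight_decay", 0.0)

    lora_a, lora_b = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (lora_b if "lora_B" in name else lora_a).append(param)

    optimizer_grouped_parameters = [
        {"params": lora_a, "lr": lr, "weight_decay": weight_decay},
        {"params": lora_b, "lr": lr * loraplus_lr_ratio, "weight_decay": weight_decay},
    ]
    # kwargs no longer contains weight_decay, so FutureOptimizerX only sees the
    # remaining pass-through options (betas, eps, ...).
    return optimizer_cls(optimizer_grouped_parameters, **kwargs)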

Maybe I'm overcomplicating this?

>> Playing around with this, I noticed that LoraPlusConfig is actually not used anywhere.

> Hmm, yes, you're right. How about removing it completely then? I guess an argument could be made that something like this API could be useful:
>
> from peft import LoraPlusConfig
>
> optimizer_config = LoraPlusConfig(...)
> optimizer = create_loraplus_optimizer(model, optimizer_config)
>
> to make it easier to share the config settings, but IMO the value is very marginal.

I think the cleanest approach is to remove it in this PR and then make a follow-up where we make it easier to use, if necessary.

Ultimately, without some tweaks to transformers, I don't think we can do this automatically, e.g. from get_peft_model, because we need access to the Trainer.

Edit: We should probably at least add an example to the docs on how to use it. Let me look into that.
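
As a starting point for such a docs example, here is a hedged sketch of how the optimizer can already be handed to a transformers Trainer today via its existing optimizers=(optimizer, lr_scheduler) argument; model, optimizer, and train_dataset are assumed to exist from the earlier setup:

from transformers import Trainer, TrainingArguments

# `model` is assumed to be a PeftModel with LoRA adapters attached, `optimizer`
# the result of create_loraplus_optimizer(...), and `train_dataset` a tokenized dataset.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,
    optimizers=(optimizer, None),  # None lets the Trainer build its default LR scheduler
)
trainer.train()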

@kallewoof
Copy link
Contributor Author

@BenjaminBossan Sounds good. I rebased on main. If it's easier, I'll merge instead of rebasing in the future.

@BenjaminBossan BenjaminBossan merged commit 273acf0 into huggingface:main Jul 29, 2024
14 checks passed
@BenjaminBossan (Member)

Thanks everyone involved in this PR, good work.

@kallewoof kallewoof deleted the 202407-loraplus branch July 29, 2024 13:09
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
Add LoRA+: Efficient Low Rank Adaptation of Large Models

https://arxiv.org/abs/2402.12354

Call create_loraplus_optimizer to initialize an optimizer with settings that are especially effective for LoRA training.

Builds upon this code base:

https://github.com/nikhil-ghosh-berkeley/loraplus

---------

Co-authored-by: moghadas76 <s.m.moghadas2012@gmail.com>
Co-authored-by: Chris Hua <stillmatic@users.noreply.github.com>