TST Add regression test for DoRA, VeRA, BOFT, LN Tuning #1792
Merged
BenjaminBossan merged 1 commit into huggingface:main from BenjaminBossan:tst-regression-tests-dora-vera-boft-ln_tuning on May 27, 2024
Conversation
These new methods were added but the regression tests were not extended yet. This PR adds regression tests for these methods. The regression artifacts have been pushed based on PEFT v0.11.1. The new tests pass locally.
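For context, the regression tests in PEFT work roughly by applying an adapter to a small model with fixed seeds and comparing the current output against an artifact recorded with an earlier release (here, PEFT v0.11.1). The sketch below illustrates that idea for the four methods in the title; the model choice, the target_modules values, and the load_expected_output helper are illustrative assumptions, not the actual test code.

```python
# Minimal sketch of a regression-style test, assuming a small causal LM.
# `load_expected_output` and the artifact format are hypothetical placeholders;
# the real tests compare against artifacts recorded with PEFT v0.11.1.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, VeraConfig, BOFTConfig, LNTuningConfig, get_peft_model

CONFIGS = {
    "dora": LoraConfig(r=8, use_dora=True, init_lora_weights=False),
    "vera": VeraConfig(r=8, target_modules=["q_proj", "v_proj"]),
    "boft": BOFTConfig(boft_block_size=2, target_modules=["q_proj", "v_proj"]),
    "ln_tuning": LNTuningConfig(target_modules=["final_layer_norm"]),
}

def check_regression(name, config, load_expected_output):
    torch.manual_seed(0)
    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    model = get_peft_model(base, config).eval()
    inputs = torch.arange(10).view(1, -1)  # dummy token ids
    with torch.no_grad():
        output = model(inputs).logits
    # Compare against the stored regression artifact for this method.
    expected = load_expected_output(name)
    torch.testing.assert_close(output, expected)
```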
younesbelkada approved these changes on May 27, 2024
Thanks a lot for adding these regression tests!
BenjaminBossan added a commit that referenced this pull request on May 31, 2024
This PR moves all the DoRA functionality into a separate module class. Essentially, this is necessary because otherwise the DoRA parameter lives on the lora.Linear layer as a parameter, not a separate module. Since the FSDP auto wrap policy operates on the level of modules, not parameters, there is no way to modify the auto wrap policy to wrap the DoRA parameter; it must be its own module. If not for this reason, #1797 would be preferable, since the number of code changes there is smaller overall. This PR contains more changes, but the majority only involve moving code around, not actual code changes.

Since we introduce a new submodule, an extra step is required to ensure that old DoRA state dicts can still be loaded correctly. This involves a fairly trivial remapping step in set_peft_model_state_dict, which is covered by the new DoRA regression tests introduced in #1792. Similarly, there is a remapping step in get_peft_model_state_dict to ensure that new state dicts with DoRA are still saved in the old format.

An additional required change was to make a defensive copy of the base layer before dequantizing its weight in order to calculate the weight norm for DoRA. Without this defensive copy, a side effect is triggered in FSDP that results in "ValueError: Cannot flatten integer dtype tensors", even though the compute dtype of bnb is correctly set to float. Creating a fully functioning deepcopy does not currently work with 8bit BNB, but there is a fix; once the next BNB release is out, 8bit BNB will be tested and enabled.

While working on this, I also noticed a small bug: dropout was not correctly applied when using QDoRA. This is now also fixed. This PR was tested successfully with FSDP and (Q)DoRA using the scripts in examples/sft/, with a modification to enable DoRA.
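The state-dict compatibility handling mentioned above can be pictured with a small sketch: once the DoRA magnitude vector becomes its own submodule (so that FSDP's module-level auto wrap policy can wrap it), its checkpoint keys gain an extra module suffix, and a remapping translates between the old and new layouts on load and save. The exact key names below (the lora_magnitude_vector prefix and the ".weight" suffix) are assumptions for illustration, not a copy of the PEFT implementation.

```python
# Illustrative sketch of the key remapping idea, under the assumptions above.

def remap_old_to_new(state_dict):
    """Loading: map old parameter-style keys to the new submodule layout."""
    remapped = {}
    for key, value in state_dict.items():
        if key.endswith("lora_magnitude_vector.default"):
            key = key + ".weight"  # the parameter now lives inside a module
        remapped[key] = value
    return remapped

def remap_new_to_old(state_dict):
    """Saving: map new submodule keys back so saved files keep the old format."""
    remapped = {}
    for key, value in state_dict.items():
        if key.endswith("lora_magnitude_vector.default.weight"):
            key = key[: -len(".weight")]
        remapped[key] = value
    return remapped
```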
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request on May 13, 2025
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request on May 13, 2025