LoRA (Low-Rank Adaptation) is a technique for fine-tuning large language models (LLMs) with minimal computational and memory requirements. It adds a small set of trainable low-rank matrices alongside the original model's weights, drastically reducing the number of trainable parameters. This makes fine-tuning LLMs significantly faster and more efficient, particularly when working with limited resources.
- Decomposition: LoRA decomposes the weight update for each adapted layer into the product of two small matrices, A and B, whose shared rank r is far smaller than the original weight dimensions.
- Parameter Reduction: Because only these low-rank factors are trained, LoRA sharply reduces the number of trainable parameters, resulting in faster and more memory-efficient training.
- Identity Transform: LoRA is initialized so that the update contributes nothing at first (typically by setting B to zero), so the adapted model starts out reproducing the original model's output and its knowledge is preserved.
- Fine-Tuning: During fine-tuning, only the LoRA matrices A and B are trained, while the original model's weights remain frozen.
- Inference: At inference time, the learned update BA is added to (or merged into) the original model's weights to produce the model's output, as the sketch below illustrates.
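A minimal PyTorch sketch of these mechanics, assuming a simple wrapper design (the class name `LoRALinear` and the `alpha / r` scaling convention are illustrative choices here, not any specific library's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update:
    output = W x + (alpha / r) * B (A x)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # original weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scaling = alpha / r
        # A gets a small random init; B starts at zero, so B @ A = 0 and the
        # layer initially behaves exactly like the original (identity update).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fold the learned update into the base weights for plain inference."""
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)
        return self.base
```

Training then optimizes only `lora_A` and `lora_B` (the only parameters left with `requires_grad=True`), and `merge()` recovers a standard linear layer with no extra inference-time cost.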
- Efficiency: LoRA significantly reduces the number of trainable parameters compared to full fine-tuning, making the process faster and more memory-efficient (quantified in the worked example after this list).
- Knowledge Preservation: By keeping the original model's weights frozen, LoRA preserves the pre-existing knowledge from the initial training.
- Flexibility: LoRA adapts the original model to specific tasks without needing to retrain the entire model.
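To put numbers on the efficiency claim (dimensions chosen purely for illustration): fully fine-tuning a single 4096 × 4096 weight matrix means training 4096 × 4096 ≈ 16.8M parameters, while LoRA with rank r = 8 trains only r × (4096 + 4096) = 65,536 parameters, roughly 0.4% of the original count, and the saving repeats for every adapted layer.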
Recent advancements have further enhanced LoRA's effectiveness (the configuration sketch after this list shows how several of them are enabled):
- PiSSA: Initializes the LoRA adapters from the principal singular values and vectors of the original weights (via Singular Value Decomposition), resulting in faster convergence and better performance.
- LoftQ: Initializes LoRA adapters to compensate for the error introduced when quantizing the base model, reducing quantization error during training of quantized models.
- Rank-Stabilized LoRA (rsLoRA): Changes the adapter scaling factor from alpha/r to alpha/sqrt(r), stabilizing training and enabling higher performance at larger ranks.
- Weight-Decomposed Low-Rank Adaptation (DoRA): Decomposes each weight update into magnitude and direction components, improving performance at low ranks.
- QLoRA-style Training: Applies LoRA to all linear layers while keeping the base model frozen in quantized (4-bit) form, achieving performance comparable to fully fine-tuned models.
- Memory-Efficient Layer Replication: Expands effective model depth by duplicating layers that share frozen base weights, so only the per-replica LoRA adapters add memory.
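Several of these variants are exposed as options in Hugging Face's PEFT library. A hedged configuration sketch (option names reflect recent peft versions and may vary; the model checkpoint is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA-style setup: the frozen base model is loaded in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint; substitute your own
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",   # QLoRA-style: adapt every linear layer
    use_rslora=True,               # rsLoRA: scale adapters by alpha / sqrt(r)
    # init_lora_weights="pissa",   # PiSSA: SVD-based adapter initialization
    #                              # (typically done on an unquantized model)
    # use_dora=True,               # DoRA: magnitude/direction decomposition
    # layer_replication=[(0, 16), (8, 24)],  # stack layers 0-15 then 8-23;
    #                              # replicas share frozen base weights
    task_type="CAUSAL_LM",
)
# LoftQ has its own initialization path (LoftQConfig) and is not combined
# with bitsandbytes loading as shown here.

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter parameters are trainable
```

The options are shown together for compactness; in practice you would pick the initialization and scaling scheme that suits your task and peft version.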
LoRA has revolutionized the fine-tuning process for large language models, providing a practical and efficient method for adapting these models to specific tasks. Its efficiency, flexibility, and advanced variations make it an essential tool for researchers and practitioners working with LLMs.
Sources: Hugging Face Documentation