LoRA: Parameter-Efficient Fine-Tuning for Large Language Models

LoRA (Low-Rank Adaptation) is an innovative technique designed to fine-tune large language models (LLMs) with minimal computational and memory requirements. It introduces a set of low-rank matrices to the original model, drastically reducing the number of trainable parameters. This approach significantly enhances the speed and efficiency of fine-tuning LLMs, particularly when working with limited resources.

How LoRA Works

  1. Decomposition: LoRA decomposes the weight update of each layer into two smaller low-rank matrices, A and B. For a weight W ∈ ℝ^{d×k}, the update is ΔW = BA with B ∈ ℝ^{d×r} and A ∈ ℝ^{r×k}, where the rank r is much smaller than d and k.

  2. Parameter Reduction: Utilizing low-rank matrices allows LoRA to reduce the number of trainable parameters, resulting in faster and more memory-efficient training.

  3. Identity Transform: LoRA is initialized so that it has no effect on the model's output at the start of training; typically A is random and B is zero, so ΔW = BA = 0 and the original model's behavior is preserved exactly.

  4. Fine-Tuning: During fine-tuning, only the LoRA matrices (A and B) are trained, while the original model's weights remain unchanged.

  5. Inference: At inference time, the original weights are combined with the learned LoRA matrices (W' = W + BA); the update can even be merged into W, adding no extra latency. A minimal sketch of this mechanism follows the list.
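
To make steps 1–5 concrete, here is a minimal, hypothetical PyTorch sketch of a LoRA-augmented linear layer. It is for illustration only; real projects (including this repository) typically rely on Hugging Face's `peft` library rather than hand-rolled adapters, and the `LoRALinear` name and init values are assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update (sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # step 4: base weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        d, k = base.out_features, base.in_features
        # step 1: decomposition, delta_W = B @ A with rank r << min(d, k)
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))  # step 3: B = 0, so delta_W = 0 at init
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # step 5: base output plus the scaled low-rank update
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

# Usage: wrap an existing layer; only A and B receive gradients (step 2).
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
y = layer(torch.randn(2, 768))
```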

Advantages of LoRA

  • Efficiency: LoRA significantly reduces the number of trainable parameters compared to full fine-tuning, making the process faster and more memory-efficient.
  • Knowledge Preservation: By keeping the original model's weights frozen, LoRA preserves the knowledge acquired during pre-training.
  • Flexibility: LoRA adapts the original model to specific tasks without needing to retrain the entire model.

Advanced LoRA Techniques

Recent advancements have further enhanced LoRA's effectiveness; a configuration sketch showing how these variants are enabled follows the list:

  • PiSSA: Uses Singular Value Decomposition (SVD) to initialize LoRA adapters, resulting in faster convergence and better performance.
  • LoftQ: Reduces quantization errors during training, particularly useful for quantized models.
  • Rank-Stabilized LoRA (rsLoRA): Scales LoRA adapters based on rank, enabling higher performance with larger ranks.
  • Weight-Decomposed Low-Rank Adaptation (DoRA): Separates weight updates into magnitude and direction, improving low-rank performance.
  • QLoRA-style Training: Applies LoRA to all linear layers while keeping the base model frozen in quantized (4-bit) form, achieving performance comparable to fully fine-tuned models.
  • Memory-Efficient Layer Replication: Expands model size by duplicating layers while maintaining memory efficiency through LoRA adapters.
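
Most of these variants are toggled via fields on `peft`'s `LoraConfig`. The snippet below is a sketch, assuming a recent `peft` release (where `use_rslora`, `use_dora`, `layer_replication`, and the `"pissa"`/`"loftq"` initializers are available) together with `bitsandbytes` for the 4-bit base model; the model name is illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA-style setup: load the frozen base model in 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative model name
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # QLoRA-style: adapt every linear layer
    use_rslora=True,              # rank-stabilized scaling (alpha / sqrt(r))
    use_dora=False,               # set True for DoRA (magnitude/direction split)
    init_lora_weights=True,       # or "pissa" / "loftq" ("loftq" also needs a LoftQConfig)
    # layer_replication=[(0, 4), (2, 6)],  # memory-efficient layer duplication
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Only one initialization scheme can be active at a time; the quantized base model, by contrast, combines freely with the adapter-side options shown above.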

Conclusion

LoRA has revolutionized the fine-tuning process for large language models, providing a practical and efficient method for adapting these models to specific tasks. Its efficiency, flexibility, and advanced variations make it an essential tool for researchers and practitioners working with LLMs.


Sources: Hugging Face PEFT documentation
