
LoRWeB: Spanning the Visual Analogy Space with a Weight Basis of LoRAs

arXiv Project Website Evaluation Dataset (Coming Soon) Model (Coming Soon)

👥 Authors

Hila Manor1,2,  Rinon Gal2,  Haggai Maron1,2,  Tomer Michaeli1,  Gal Chechik2,3

1Technion - Israel Institute of Technology    2NVIDIA    3Bar-Ilan University


LoRWeB Teaser

Given a prompt and an image triplet ${a,a',b}$ that visually describes a desired transformation, LoRWeB dynamically constructs a single LoRA from a learnable basis of LoRA modules, and produces an editing result $b'$ that applies the same analogy to the new image.

📄 Abstract

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations difficult to articulate in words. Given a triplet ${a,a',b}$, the goal is to generate $b'$ such that $a$ : $a'$ :: $b$ : $b'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization capabilities. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives, informally, choosing a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules, to span the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation.
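The dynamic composition described above (a softmax-weighted combination of basis LoRA modules, selected by a lightweight encoder) can be illustrated with a minimal NumPy sketch. This is not the actual implementation: the function name, the shapes, and the use of a plain softmax gate are all assumptions for illustration.

```python
import numpy as np

def mix_lora_basis(A_basis, B_basis, logits):
    """Combine K basis LoRA factor pairs into a single LoRA update.

    A_basis: (K, r, d_in)  low-rank 'A' factors
    B_basis: (K, d_out, r) low-rank 'B' factors
    logits:  (K,) scores produced by an encoder from the analogy inputs
    Returns the mixed delta-weight matrix of shape (d_out, d_in).
    """
    # Softmax over the basis: the encoder selects and weighs the modules
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # Weighted sum of the per-module low-rank updates B_k @ A_k
    delta = sum(w[k] * B_basis[k] @ A_basis[k] for k in range(len(w)))
    return delta

# Toy example: 4 basis LoRAs of rank 2 adapting an 8x8 weight matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 8))
B = rng.standard_normal((4, 8, 2))
delta_W = mix_lora_basis(A, B, rng.standard_normal(4))
print(delta_W.shape)  # (8, 8)
```

The resulting `delta_W` would be added to the frozen base weight at inference time, specializing the model for the given analogy without touching the other basis modules.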


🔨 Setup

```shell
conda env create -f environment.yml
conda activate lorweb
```

🚀 Usage

💻 Training

Train a LoRWeB model on your visual analogy dataset:

```shell
python run.py config/your_config.yaml
```

You can override the main options with command-line arguments to the `run.py` script, e.g.:

```shell
python run.py LoRWeB_default_PROMPTS.yaml --name "lorweb_model" --linear 4 --linear_alpha 4 --loras_num 32 --lora_softmax true --query_mode "cat-aa'b"
```

📊 Training Data Format

We trained on Relation252k. The training script expects two folders: `control`, which contains images of the ${a,a',b}$ triplets, and `target`, which contains the corresponding $b'$ images. Use `preprocess_data.py` to preprocess a pre-downloaded dataset.
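Assuming the `control`/`target` layout above pairs images by filename (the folder names come from this README; the matching-by-filename rule is an assumption), a quick sanity check before training might look like:

```python
from pathlib import Path

def check_dataset(root):
    """Report control images with no matching target, and vice versa."""
    control = {p.name for p in Path(root, "control").glob("*")}
    target = {p.name for p in Path(root, "target").glob("*")}
    return control - target, target - control

# Build a tiny toy dataset in a temp dir to demonstrate
import os, tempfile
root = tempfile.mkdtemp()
for folder, names in [("control", ["0.jpg", "1.jpg"]), ("target", ["0.jpg"])]:
    os.makedirs(Path(root, folder))
    for name in names:
        Path(root, folder, name).touch()

missing, extra = check_dataset(root)
print(missing)  # {'1.jpg'}
```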

🎨 Inference

You can test our model's checkpoint from HuggingFace (coming soon) using `inference.py`:

```shell
python inference.py -w "output/your_model/your_model.safetensors" -c "output/your_model/config.yaml" -a "data/path_to_a_img.jpg" -t "data/path_to_atag_img.jpg" -b "data/path_to_b_img.jpg" -o "outputs/generated_btag_img_path.jpg"
```

ℹ️ Additional Information

Our complementary custom evaluation set is available on HuggingFace (coming soon).

📚 Citation

If you use this code in your research, please cite:

```bibtex
@article{manor2026lorweb,
    title={Spanning the Visual Analogy Space with a Weight Basis of LoRAs},
    author={Manor, Hila and Gal, Rinon and Maron, Haggai and Michaeli, Tomer and Chechik, Gal},
    journal={arXiv preprint arXiv:2602.15727},
    year={2026}
}
```

🙏 Acknowledgements

This project builds upon:


⭐ Star this repo if you find it useful! ⭐

About

We propose a novel modular framework that learns to dynamically mix low-rank adapters (LoRAs) to improve visual analogy learning, enabling flexible and generalizable image edits based on example transformations.
