siglip
Here are 21 public repositories matching this topic...
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
Updated Feb 21, 2025 - Jupyter Notebook
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Updated Oct 5, 2024 - Python
Official repository of "TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models".
Updated Jan 20, 2025 - Python
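This project is application-oriented, covering everything from basic machine learning and deep learning to object detection and the latest large models. It draws on mature third-party libraries, open-source pretrained models, and the latest techniques from related papers, with the aim of documenting the learning process and sharing it so that more people can use it directly.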
Updated Feb 17, 2025 - Jupyter Notebook
Official PyTorch implementation of the WACV 2025 Oral paper "Composed Image Retrieval for Training-FREE DOMain Conversion".
Updated Jan 24, 2025 - Python
Low-latency ONNX and TensorRT based zero-shot classification and detection with contrastive language-image pre-training based prompts
Updated Aug 31, 2024 - Jupyter Notebook
A minimal but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch
Updated Feb 14, 2024 - Jupyter Notebook
[ICLR 2025] - Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Updated Feb 7, 2025
Download flickr8k, flickr30k image caption datasets
Updated Feb 6, 2024
Chitrarth: Bridging Vision and Language for a Billion People
Updated Feb 12, 2025 - Python
Meme search and discovery engine using OpenAI CLIP and Salesforce BLIP
Updated Nov 6, 2024 - Python
This project presents a computer vision solution for detecting and classifying objects in images extracted as frames from videos. It uses the FastSAM model for object detection and, for classification, embeddings that can be generated by either of two models: CLIP or SigLIP.
Updated Feb 2, 2024 - Python
Code for Post-hoc Probabilistic Vision-Language Models
Updated Feb 21, 2025 - Python
A simple open-sourced SigLIP model finetuned on Genshin Impact's image-text pairs.
Updated Oct 9, 2024
Notes for the Vision Language Model implementation by Umar Jamil
Updated Sep 3, 2024 - Python
Framework for learning multi-domain image embeddings suitable for multi-domain image retrieval at instance-level
Updated May 11, 2024 - Python
Korean version of CLIP which achieves Korean cross-modal retrieval and representation generation.
Updated Nov 20, 2024
Web scraper for Wildberries + simple vectorization/multimodal embedding workflow
Updated May 29, 2024 - Jupyter Notebook