nbasyl



Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Python 74 9 Updated Aug 12, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 532 39 Updated Aug 15, 2024

Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"

Python 77 7 Updated May 31, 2024

For releasing code related to compression methods for transformers, accompanying our publications

Python 352 31 Updated Aug 26, 2024

PyTorch native quantization and sparsity for training and inference

Python 571 77 Updated Aug 28, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 382 16 Updated Aug 13, 2024

Official repository for "AM-RADIO: Reduce All Domains Into One"

Python 563 22 Updated Aug 13, 2024

The official implementation of the DAC 2024 paper GQA-LUT

Python 10 Updated Jun 18, 2024

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

Python 29,703 3,651 Updated Aug 27, 2024

Training LLMs with QLoRA + FSDP

Jupyter Notebook 1,370 182 Updated Aug 28, 2024

The official Meta Llama 3 GitHub site

Python 25,785 2,873 Updated Aug 12, 2024

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Python 532 28 Updated Jul 6, 2024

Code for QuaRot, an end-to-end 4-bit inference of large language models.

Python 245 17 Updated Jul 22, 2024

ReFT: Representation Finetuning for Language Models

Python 1,042 88 Updated Aug 21, 2024

A simple and effective LLM pruning approach.

Python 600 74 Updated Aug 9, 2024

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Python 6,181 897 Updated Jul 3, 2024

This is a project page for continuous 3D words

JavaScript 2 Updated Apr 9, 2024

Awesome LLM Plaza: daily tracking of all sorts of awesome LLM topics, e.g. LLM for coding, robotics, reasoning, multimodal, etc.

115 8 Updated Aug 20, 2024

Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"

119 3 Updated Apr 28, 2024

Modeling, training, eval, and inference code for OLMo

Python 4,310 422 Updated Aug 28, 2024

The code and data for the GPT-4 based benchmark in the vicuna blog post

Python 33 8 Updated Aug 2, 2023

Python 188 16 Updated Jun 11, 2024

Instruct-tune LLaMA on consumer hardware

Jupyter Notebook 18,512 2,208 Updated Jul 29, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 18,947 2,075 Updated Aug 12, 2024

Official implementation of "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization"

72 Updated Apr 12, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,024 883 Updated Aug 27, 2024

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 15,609 1,502 Updated Aug 26, 2024

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 10,172 649 Updated Aug 14, 2024

The official implementation of the EMNLP 2023 paper LLM-FP4

Python 153 9 Updated Dec 15, 2023

A curated list of awesome vision and language resources (still under construction... stay tuned!)

425 36 Updated Aug 4, 2024