Insights: huggingface/transformers
7 Releases published by 3 people
-
v4.53.1 Patch Release v4.53.1
published
Jul 4, 2025 -
v4.53.2 Patch Release v4.53.2
published
Jul 11, 2025 -
v4.53.2-modernbert-decoder-preview ModernBERT Decoder (based on v4.53.2)
published
Jul 16, 2025 -
v4.53.3 Patch release v4.53.3
published
Jul 22, 2025 -
v4.53.2-Ernie-4.5-preview Ernie-4.5 and Ernie-4.5 MoE (based on v4.53.2)
published
Jul 23, 2025 -
4.54.1 Patch release 4.54.1
published
Jul 29, 2025
377 Pull requests merged by 154 people
-
Remove python3.7 reference from doc link
#39706 merged
Jul 29, 2025 -
[docs] Ko doc fixes after toc update
#39660 merged
Jul 29, 2025 -
Fix Cache.max_cache_len max value for Hybrid models
#39737 merged
Jul 29, 2025 -
fix(trainer): Correct loss scaling for incomplete gradient accumulation steps
#39659 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `how_to_hack_models.md` to Korean
#39536 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `perf_train_gpu_one.md` to Korean
#39552 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `pipeline_gradio.md` to Korean
#39520 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `tokenizer.md` to Korean
#39532 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `tvp.md` to Korean
#39578 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated albert.md to Korean
#39524 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `main_classes/peft.md`
#39515 merged
Jul 29, 2025 -
[modenbert] fix regression
#39750 merged
Jul 29, 2025 -
add `libcst` to `extras["testing"]` in `setup.py`
#39761 merged
Jul 29, 2025 -
Fix version issue in modeling_utils.py
#39759 merged
Jul 29, 2025 -
Enable xpu allocator on caching_allocator_warmup
#39654 merged
Jul 29, 2025 -
Support loading Qwen3 MoE GGUF
#39638 merged
Jul 29, 2025 -
Fix GPT2 with cross attention
#39754 merged
Jul 29, 2025 -
Avoid OOM when other tests are failing
#39758 merged
Jul 29, 2025 -
AMD disable torchcodec
#39757 merged
Jul 29, 2025 -
Use `--gpus all` in workflow files
#39752 merged
Jul 29, 2025 -
Apply several ruff SIM rules
#37283 merged
Jul 29, 2025 -
Fix mamba regression
#39728 merged
Jul 29, 2025 -
Update IMPORTANT_MODELS list
#39734 merged
Jul 29, 2025 -
update `GemmaIntegrationTest::test_model_2b_bf16_dola` again
#39731 merged
Jul 29, 2025 -
Fix: add back base model plan
#39733 merged
Jul 29, 2025 -
[Fix] import two missing typos in `models/__init__.py` for typo checking
#39745 merged
Jul 29, 2025 -
fix cache inheritance
#39748 merged
Jul 29, 2025 -
extend more trainer test cases to XPU, all pass
#39652 merged
Jul 29, 2025 -
BLIPs clean-up
#35560 merged
Jul 29, 2025 -
Add Fast Segformer Processor
#37024 merged
Jul 28, 2025 -
Superpoint fast image processor
#37804 merged
Jul 28, 2025 -
Fix AMD dockerfile for audio models
#39669 merged
Jul 28, 2025 -
Fix cache-related tests
#39676 merged
Jul 28, 2025 -
Fix Layer device placement in Caches
#39732 merged
Jul 28, 2025 -
Fix `Qwen2AudioForConditionalGeneration.forward()` and `test_flash_attn_kernels_inference_equivalence`
#39503 merged
Jul 28, 2025 -
skip `Glm4MoeModelTest::test_torch_compile_for_training`
#39670 merged
Jul 28, 2025 -
Update `QAPipelineTests::test_large_model_course` after #39193
#39666 merged
Jul 28, 2025 -
mllama outputs refactor
#39643 merged
Jul 28, 2025 -
Remove all expired deprecation cycles
#39725 merged
Jul 28, 2025 -
[`CI`] Add Eric to comment slow ci
#39601 merged
Jul 28, 2025 -
PATCH: add back n-dim device-mesh + fix tp trainer saving
#39693 merged
Jul 28, 2025 -
Add self-hosted runner scale set workflow for mi325 CI
#39651 merged
Jul 28, 2025 -
[configuration] remove redundant `classmethod`
#38812 merged
Jul 28, 2025 -
update ernie model card
#39657 merged
Jul 28, 2025 -
[processors] add tests for helper fn
#39629 merged
Jul 28, 2025 -
xpu optimization for generation case
#39573 merged
Jul 28, 2025 -
fix(tokenization): check token.content for trie
#39587 merged
Jul 28, 2025 -
Fix missing initialization of `FastSpeech2Conformer`
#39689 merged
Jul 28, 2025 -
fix missing model._tp_size from ep refactor
#39688 merged
Jul 26, 2025 -
More robust tied weight test
#39681 merged
Jul 25, 2025 -
Add padding-free to Granite hybrid moe models
#39677 merged
Jul 25, 2025 -
Fix tied weight test
#39680 merged
Jul 25, 2025 -
fix break for ckpt without _tp_plan
#39658 merged
Jul 25, 2025 -
Add EXAONE 4.0 model
#39129 merged
Jul 25, 2025 -
Support `typing.Literal` as type of tool parameters or return value
#39633 merged
Jul 25, 2025 -
Add ep
#39501 merged
Jul 25, 2025 -
bad_words_ids no longer slow on mps
#39556 merged
Jul 25, 2025 -
Add xlstm model
#39665 merged
Jul 25, 2025 -
Use auto_docstring for perception_lm fast image processor
#39679 merged
Jul 25, 2025 -
fix: HWIO to OIHW
#39200 merged
Jul 25, 2025 -
Fix auto_docstring crashing when dependencies are missing
#39564 merged
Jul 25, 2025 -
Add support for DeepseekAI's DeepseekVL
#36248 merged
Jul 25, 2025 -
Add missing flag for CacheLayer
#39678 merged
Jul 25, 2025 -
Add evolla rebase main
#36232 merged
Jul 25, 2025 -
update expected outputs for whisper after #38778
#39304 merged
Jul 25, 2025 -
fix `kyutai` tests
#39416 merged
Jul 25, 2025 -
Fixes the BC
#39636 merged
Jul 25, 2025 -
Delete bad rebasing functions
#39672 merged
Jul 25, 2025 -
[`Ernie 4.5`] Post merge adaptations
#39664 merged
Jul 25, 2025 -
[CI] revert device in `test_export_static_cache`
#39662 merged
Jul 25, 2025 -
Fix ModernBERT Decoder model
#39671 merged
Jul 25, 2025 -
🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor
#39591 merged
Jul 25, 2025 -
Rename huggingface_cli to hf
#39630 merged
Jul 25, 2025 -
fix(voxtral): correct typo in apply_transcription_request
#39572 merged
Jul 25, 2025 -
make fixup
#39661 merged
Jul 25, 2025 -
[docs] fix ko cache docs
#39644 merged
Jul 25, 2025 -
Make pytorch examples UV-compatible
#39635 merged
Jul 25, 2025 -
revert change to cu_seqlen_k and max_k when preparing from position_ids
#39653 merged
Jul 25, 2025 -
Fix: explicit not none check for tensors in flash attention
#39639 merged
Jul 25, 2025 -
[attention] fix test for packed padfree masking
#39582 merged
Jul 25, 2025 -
Add owlv2 fast processor
#39041 merged
Jul 25, 2025 -
revert behavior of _prepare_from_posids
#39622 merged
Jul 24, 2025 -
[Voxtral] values for A10 runners
#39605 merged
Jul 24, 2025 -
[timm] new timm pin
#39640 merged
Jul 24, 2025 -
Fix EfficientLoFTR model id in tests
#39621 merged
Jul 24, 2025 -
Update recent processors for vLLM backend
#39583 merged
Jul 24, 2025 -
[Docs] Translate audio_classification.md from English to Spanish
#39513 merged
Jul 23, 2025 -
standardized YOLOS model card according to template in #36979
#39528 merged
Jul 23, 2025 -
Feature/standardize opt model card
#39568 merged
Jul 23, 2025 -
🔴 Fix EnCodec internals and integration tests
#39431 merged
Jul 23, 2025 -
Fix DAC integration tests and checkpoint conversion.
#39313 merged
Jul 23, 2025 -
Move openai import
#39613 merged
Jul 23, 2025 -
Transformers serve VLM
#39454 merged
Jul 23, 2025 -
Fix important models CI
#39576 merged
Jul 23, 2025 -
Fix typos and grammar issues in documentation and code
#39598 merged
Jul 23, 2025 -
Allow `device_mesh` have multiple dim
#38949 merged
Jul 23, 2025 -
enable triton backend on awq xpu
#39443 merged
Jul 23, 2025 -
[idefics3] fix for vLLM
#39470 merged
Jul 23, 2025 -
fix moe routing_weights
#39581 merged
Jul 23, 2025 -
FP-Quant support
#38696 merged
Jul 23, 2025 -
Rename `supports_static_cache` to `can_compile_fullgraph`
#39505 merged
Jul 23, 2025 -
[Trackio] Allow single-gpu training and monitor power
#39595 merged
Jul 23, 2025 -
Generic task-specific base classes
#39584 merged
Jul 23, 2025 -
Fix DynamicCache and simplify Cache classes a bit
#39590 merged
Jul 23, 2025 -
Mask2former & Maskformer Fast Image Processor
#35685 merged
Jul 23, 2025 -
🎯 Trackio integration
#38814 merged
Jul 22, 2025 -
[WIP] Add OneformerFastImageProcessor
#38343 merged
Jul 22, 2025 -
Fix link in "Inference server backends" doc
#39589 merged
Jul 22, 2025 -
Torchdec RuntimeError catch
#39580 merged
Jul 22, 2025 -
[Paged-Attention] Handle continuous batching for repetition penalty
#39457 merged
Jul 22, 2025 -
updated mistral3 model card
#39531 merged
Jul 22, 2025 -
Update `docs/source/ko/_toctree.yml`
#39516 merged
Jul 22, 2025 -
[cache refactor] Move all the caching logic to a per-layer approach
#39106 merged
Jul 22, 2025 -
General weight initialization scheme
#39579 merged
Jul 22, 2025 -
Add AMD GPU expectations for LLaVA tests
#39486 merged
Jul 22, 2025 -
Kernels flash attn
#39474 merged
Jul 22, 2025 -
Add AMD expectations to Mistral3 tests
#39481 merged
Jul 22, 2025 -
[docs] Create page on inference servers with transformers backend
#39550 merged
Jul 22, 2025 -
[docs] update attention implementation and cache docs
#39547 merged
Jul 22, 2025 -
Add AMD test expectations to DETR model
#39539 merged
Jul 22, 2025 -
feat: add support for gradient checkpointing for TimmWrapperModel and TimmWrapperForImageClassification
#39287 merged
Jul 22, 2025 -
Fixes needed for n-d parallelism and TP
#39562 merged
Jul 22, 2025 -
Bump AMD container for 2.7.1 PyTorch
#39458 merged
Jul 22, 2025 -
Add EfficientLoFTR model
#36355 merged
Jul 22, 2025 -
[gemma3] fix bidirectional image mask
#39396 merged
Jul 22, 2025 -
Update OLMoE model card
#39344 merged
Jul 21, 2025 -
Update modernbertdecoder docs
#39453 merged
Jul 21, 2025 -
[`CI`] Fix post merge ernie 4.5
#39561 merged
Jul 21, 2025 -
[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps)
#39489 merged
Jul 21, 2025 -
[`Ernie 4.5`] Add ernie text models
#39228 merged
Jul 21, 2025 -
Refactor embedding input/output getter/setter
#39339 merged
Jul 21, 2025 -
🌐 [i18n-KO] Translated `perf_infer_gpu_multi.md` to Korean
#39441 merged
Jul 21, 2025 -
[Fast image processor] refactor fast image processor glm4v
#39490 merged
Jul 21, 2025 -
fix ndim check of device_mesh for TP
#39538 merged
Jul 21, 2025 -
Refactor `MambaCache` to `modeling_mamba.py`
#38086 merged
Jul 21, 2025 -
Fix Docstring of BarkProcessor
#39546 merged
Jul 21, 2025 -
use the enable_gqa param in torch.nn.functional.scaled_dot_product_at…
#39412 merged
Jul 21, 2025 -
Fix missing initializations for models created in 2023
#39239 merged
Jul 21, 2025 -
Raise `TypeError` instead of ValueError for invalid types
#38660 merged
Jul 21, 2025 -
Fix pylint warnings
#39477 merged
Jul 21, 2025 -
Fix Qwen Omni integration test
#39553 merged
Jul 21, 2025 -
🚨🚨🚨 [Trainer] Enable `average_tokens_across_devices` by default in `TrainingArguments`
#39395 merged
Jul 21, 2025 -
Rename `_supports_flash_attn_2` in examples and tests
#39471 merged
Jul 21, 2025 -
Fix the check in flex test
#39548 merged
Jul 21, 2025 -
Fix bad tensor shape in failing Hubert test.
#39502 merged
Jul 21, 2025 -
GLM-4 Update
#39393 merged
Jul 21, 2025 -
[qwen2 vl] fix packing with all attentions
#39447 merged
Jul 21, 2025 -
[gemma3] support sequence classification task
#39465 merged
Jul 21, 2025 -
Fix placeholders replacement logic in auto_docstring
#39433 merged
Jul 18, 2025 -
Update SAM/SAM HQ attention implementation + fix Cuda sync issues
#39386 merged
Jul 18, 2025 -
Improve @auto_docstring doc and rename `args_doc.py` to `auto_docstring.py`
#39439 merged
Jul 18, 2025 -
Add fast image processor SAM
#39385 merged
Jul 18, 2025 -
Fix BatchEncoding.to() for nested elements
#38985 merged
Jul 18, 2025 -
[gemma3] Fix do_convert_rgb in image processors.
#39438 merged
Jul 18, 2025 -
[chat template] return assistant mask in processors
#38545 merged
Jul 18, 2025 -
[dependencies] Update `datasets` pin
#39500 merged
Jul 18, 2025 -
Slack CI bot: set default result for non-existing artifacts
#39499 merged
Jul 18, 2025 -
🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling
#39423 merged
Jul 18, 2025 -
[doc builder job] temporary pyarrow pin
#39496 merged
Jul 18, 2025 -
Add voxtral
#39429 merged
Jul 18, 2025 -
Fix typing order
#39467 merged
Jul 17, 2025 -
Add unified logits_to_keep support to LLMClass
#39472 merged
Jul 17, 2025 -
[serve] Add speech to text (`/v1/audio/transcriptions`)
#39434 merged
Jul 17, 2025 -
Update integration_utils.py
#39469 merged
Jul 17, 2025 -
fix: ImageTextToTextPipeline handles user-defined generation_config
#39374 merged
Jul 17, 2025 -
Enable some ruff checks for performance and readability
#39383 merged
Jul 17, 2025 -
Fix convert_and_export_with_cache failures for GPU models
#38976 merged
Jul 17, 2025 -
Update `GemmaIntegrationTest::test_model_2b_bf16_dola`
#39362 merged
Jul 17, 2025 -
fix a comment typo in utils.py
#39459 merged
Jul 17, 2025 -
Use newer typing notation
#38934 merged
Jul 17, 2025 -
Fix tests due to breaking change in accelerate
#39451 merged
Jul 17, 2025 -
fix max_length calculating using cu_seq_lens
#39341 merged
Jul 17, 2025 -
fix(pipelines): QA pipeline returns fewer than top_k results in batch mode
#39193 merged
Jul 17, 2025 -
Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings
#38822 merged
Jul 16, 2025 -
create ijepa modelcard (ref : PR #36979 ).
#39354 merged
Jul 16, 2025 -
Improve grammar and clarity in perf_hardware.md
#39428 merged
Jul 16, 2025 -
fix cached file error when repo type is dataset
#36909 merged
Jul 16, 2025 -
Fix indentation bug in SmolVLM image processor causing KeyError
#39452 merged
Jul 16, 2025 -
Updated Megatron conversion script for gpt2 checkpoints
#38969 merged
Jul 16, 2025 -
[`CI`] Fix partially red CI
#39448 merged
Jul 16, 2025 -
Fixes #39204: add fallback if get_base_model missing
#39226 merged
Jul 16, 2025 -
make the loss context manager easier to extend
#39321 merged
Jul 16, 2025 -
Remove something that should have never been there
#38254 merged
Jul 16, 2025 -
Fix processor tests
#39450 merged
Jul 16, 2025 -
[Bugfix] [Quantization] Remove unused init arg
#39324 merged
Jul 16, 2025 -
Better typing for model.config
#39132 merged
Jul 16, 2025 -
Fix typo in generation configuration for Janus model weight conversion
#39432 merged
Jul 16, 2025 -
Responses API in `transformers serve`
#39155 merged
Jul 16, 2025 -
[cache] make all classes cache compatible finally
#38635 merged
Jul 16, 2025 -
docs: add missing numpy import to minimal example
#39444 merged
Jul 16, 2025 -
Remove runtime conditions for type checking
#37340 merged
Jul 16, 2025 -
Add StableAdamW Optimizer
#39446 merged
Jul 16, 2025 -
add test scanner
#39419 merged
Jul 16, 2025 -
Fix missing definition of diff_file_url in notification service
#39445 merged
Jul 16, 2025 -
Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer
#31870 merged
Jul 16, 2025 -
Change log level from warning to info for scheduled request logging in `ContinuousBatchProcessor`
#39372 merged
Jul 16, 2025 -
Defaults to adamw_torch_fused for Pytorch>=2.8
#37358 merged
Jul 16, 2025 -
Fix L270 - hasattr("moe_args") returning False error
#38715 merged
Jul 16, 2025 -
[chat template] add a testcase for kwargs
#39415 merged
Jul 16, 2025 -
Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM`
#37830 merged
Jul 16, 2025 -
Remove double soft-max in load-balancing loss. Fixes #39055 .
#39056 merged
Jul 16, 2025 -
[Core] [Offloading] Fix saving offloaded submodules
#39280 merged
Jul 16, 2025 -
[autodocstring] add video and audio inputs
#39420 merged
Jul 16, 2025 -
Responses API (to be merged into #39155)
#39338 merged
Jul 16, 2025 -
CI workflow for performed test regressions
#39198 merged
Jul 16, 2025 -
docs: update LightGlue docs
#39407 merged
Jul 15, 2025 -
docs: update SuperGlue docs
#39406 merged
Jul 15, 2025 -
[vlm] fix loading of retrieval VLMs
#39242 merged
Jul 15, 2025 -
handle training summary when creating modelcard but offline mode is set
#37095 merged
Jul 15, 2025 -
Remove residual quantization attribute from dequantized models
#39373 merged
Jul 15, 2025 -
Remove deprecated audio utils functions
#39330 merged
Jul 15, 2025 -
Fix bugs in pytorch example run_clm when streaming is enabled
#39286 merged
Jul 15, 2025 -
Fix bugs from pipeline preprocessor overhaul
#39425 merged
Jul 15, 2025 -
refactor: remove `set_tracer_provider` and `set_meter_provider` calls
#39422 merged
Jul 15, 2025 -
Fix invalid property
#39384 merged
Jul 15, 2025 -
set document_question_answering pipeline _load_tokenizer to True
#39411 merged
Jul 15, 2025 -
Ignore extra position embeddings weights for ESM
#39063 merged
Jul 15, 2025 -
support loading qwen3 gguf
#38645 merged
Jul 15, 2025 -
Add ModernBERT Decoder Models - ModernBERT, but trained with CLM!
#38967 merged
Jul 15, 2025 -
Fix typo in `/v1/models` output payload
#39414 merged
Jul 15, 2025 -
[refactor] set attention implementation
#38974 merged
Jul 15, 2025 -
Fix/siglip2 pooling comment
#39378 merged
Jul 14, 2025 -
Update phi4_multimodal.md
#38830 merged
Jul 14, 2025 -
[Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
#39391 merged
Jul 14, 2025 -
Use np.pad instead of np.lib.pad.
#39346 merged
Jul 14, 2025 -
🚨 Totally rewrite how pipelines load preprocessors
#38947 merged
Jul 14, 2025 -
Remove do_reduce_labels Argument from model initialization in run_semantic_segmentation_no_trainer
#39322 merged
Jul 14, 2025 -
Fix Lfm2 and common tests
#39398 merged
Jul 14, 2025 -
Deprecate AutoModelForVision2Seq
#38900 merged
Jul 14, 2025 -
[Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor
#39333 merged
Jul 14, 2025 -
[BLIP] remove cache from Qformer
#39335 merged
Jul 14, 2025 -
[shieldgemma] fix checkpoint loading
#39348 merged
Jul 14, 2025 -
Fix overriding Fast Image/Video Processors instance attributes affect other instances
#39363 merged
Jul 12, 2025 -
update docker file to use latest `timm` (for `perception_lm`)
#39380 merged
Jul 12, 2025 -
Update Model Card for Encoder Decoder Model
#39272 merged
Jul 11, 2025 -
fix gpt2 usage doc
#39351 merged
Jul 11, 2025 -
Updated CamemBERT model card to new standardized format
#39227 merged
Jul 11, 2025 -
Update Readme to Run Multiple Choice Script from Example Directory
#39323 merged
Jul 11, 2025 -
Add mistral common support
#38906 merged
Jul 11, 2025 -
Remove device check in HQQ quantizer
#39299 merged
Jul 11, 2025 -
Small fixes for utils/check_docstrings.py
#38915 merged
Jul 11, 2025 -
fix failing `test_sdpa_can_dispatch_on_flash`
#39259 merged
Jul 11, 2025 -
update cb TP
#39361 merged
Jul 11, 2025 -
Fix link for testpypi
#39360 merged
Jul 11, 2025 -
PerceptionLM
#37878 merged
Jul 11, 2025 -
Updated Switch Transformers model card with standardized format (Issue #36979)
#39305 merged
Jul 10, 2025 -
Update check_modular_conversion
#37456 merged
Jul 10, 2025 -
Add a default value for `position_ids` in masking_utils
#39310 merged
Jul 10, 2025 -
[Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups
#39263 merged
Jul 10, 2025 -
[tests] tag serve tests as slow
#39343 merged
Jul 10, 2025 -
[modeling][lfm2] LFM2: Remove deprecated seen_tokens
#39342 merged
Jul 10, 2025 -
LFM2
#39340 merged
Jul 10, 2025 -
[server] add tests and fix passing a custom `generation_config`
#39230 merged
Jul 10, 2025 -
Handle DAC conversion when using weight_norm with newer PyTorch versions
#36393 merged
Jul 10, 2025 -
fix `phi3` tests
#39312 merged
Jul 10, 2025 -
fix Glm4v batch videos forward
#39172 merged
Jul 10, 2025 -
Delete deprecated stuff
#38838 merged
Jul 10, 2025 -
Fix broken SAM after #39120
#39289 merged
Jul 9, 2025 -
enable static cache on TP model
#39164 merged
Jul 9, 2025 -
Fix `max_length_q` and `max_length_k` types to `flash_attn_varlen_func`
#37206 merged
Jul 9, 2025 -
Granite speech speedups
#39197 merged
Jul 9, 2025 -
Fix typo: langauge -> language
#39317 merged
Jul 9, 2025 -
docs: update LLaVA-NeXT model card
#38894 merged
Jul 9, 2025 -
skip files in `src/` for doctest (for now)
#39316 merged
Jul 9, 2025 -
Updated the Model docs - for the MARIAN model
#39138 merged
Jul 9, 2025 -
add `stevhliu` to the list in `self-comment-ci.yml`
#39315 merged
Jul 9, 2025 -
Fix consistency and a few docstrings warnings
#39314 merged
Jul 9, 2025 -
🌐 [i18n-KO] Translated quark.md to Korean
#39268 merged
Jul 9, 2025 -
Add DeepSeek V2 Model into Transformers
#36400 merged
Jul 9, 2025 -
[sliding window] revert and deprecate
#39301 merged
Jul 9, 2025 -
[modular] Allow method with the same name in case of @property decorator
#39308 merged
Jul 9, 2025 -
skip `test_torchscript_*` for now until the majority of the community ask for it
#39307 merged
Jul 9, 2025 -
fix `aria` tests
#39277 merged
Jul 9, 2025 -
[flash attn 3] bring back flags
#39294 merged
Jul 9, 2025 -
Fix SDPA attention precision issue in Qwen2.5-VL
#37363 merged
Jul 9, 2025 -
[Tests] Update model_id in AIMv2 Tests
#39281 merged
Jul 8, 2025 -
Update T5gemma
#39210 merged
Jul 8, 2025 -
Add torchcodec in docstrings/tests for `datasets` 4.0
#39156 merged
Jul 8, 2025 -
Add trust_remote_code in LightGlueConfig
#39253 merged
Jul 8, 2025 -
fix flaky `test_generate_compile_model_forward`
#39276 merged
Jul 8, 2025 -
Refactor `PretrainedConfig.__init__` method to make it more explicit
#39158 merged
Jul 8, 2025 -
[smollm3] add tokenizer mapping for `smollm3`
#39271 merged
Jul 8, 2025 -
[pagged-attention] fix off-by-1 error in pagged attention generation
#39258 merged
Jul 8, 2025 -
[CI] fix docs
#39273 merged
Jul 8, 2025 -
Add Aimv2 model
#36625 merged
Jul 8, 2025 -
Add Doge model
#35891 merged
Jul 8, 2025 -
Fix errors when use verl to train GLM4.1v model
#39199 merged
Jul 8, 2025 -
fix recompiles due to instance key, and deepcopy issues
#39270 merged
Jul 8, 2025 -
fix(generation): stop beam search per-instance when heuristic satisfied
#38778 merged
Jul 8, 2025 -
remove broken block
#39255 merged
Jul 8, 2025 -
Skip `test_eager_matches sdpa generate` and update an integration test for blip-like models
#39248 merged
Jul 8, 2025 -
Fix license text, duplicate assignment, and typo in constant names
#39250 merged
Jul 8, 2025 -
fix xpu failures on PT 2.7 and 2.8 w/o IPEX and enable hqq cases on XPU
#39187 merged
Jul 8, 2025 -
Glm 4 doc
#39247 merged
Jul 8, 2025 -
Update LED model card
#39233 merged
Jul 7, 2025 -
fix some flaky tests in `tests/generation/test_utils.py`
#39254 merged
Jul 7, 2025 -
Simplify Mixtral and its modular children
#39252 merged
Jul 7, 2025 -
Add `segmentation_maps` support to MobileNetV2ImageProcessor
#37312 merged
Jul 7, 2025 -
Clarify per_device_train_batch_size scaling in TrainingArguments (#38…
#38857 merged
Jul 7, 2025 -
Add Korean translation for glossary.md
#38804 merged
Jul 7, 2025 -
Update tiny-agents example
#39245 merged
Jul 7, 2025 -
adjust input and output texts for test_modeling_recurrent_gemma.py
#39190 merged
Jul 7, 2025 -
enable xpu on kv-cache and hqq doc
#39246 merged
Jul 7, 2025 -
Fix patch helper
#39216 merged
Jul 7, 2025 -
RotaryEmbeddings change `is not None` -> `isinstance(..., dict)`
#39145 merged
Jul 7, 2025 -
fix `fastspeech2_conformer` tests
#39229 merged
Jul 7, 2025 -
[bugfix] fix flash attention 2 unavailable error on Ascend NPU
#39166 merged
Jul 7, 2025 -
[modular] Simplify logic and docstring handling
#39185 merged
Jul 7, 2025 -
Make _compute_dynamic_ntk_parameters exportable
#39171 merged
Jul 7, 2025 -
fix bug using FSDP V1 will lead to model device not properly set
#39177 merged
Jul 7, 2025 -
Don't send new comment if the previous one is less than 30 minutes (unless the content is changed)
#39170 merged
Jul 7, 2025 -
fix typo in Gemma3n notes
#39196 merged
Jul 7, 2025 -
[modular] Follow global indexing and attribute setting, and their dependencies
#39180 merged
Jul 7, 2025 -
Fix missing fast tokenizer/image_processor in whisper/qwen2.5-omni processor
#39244 merged
Jul 7, 2025 -
Replace einsum with unsqueeze
#39234 merged
Jul 7, 2025 -
Expectations re-order and corrected FA3 skip
#39195 merged
Jul 7, 2025 -
[video processors] Support float fps for precise frame sampling
#39134 merged
Jul 7, 2025 -
Refactor the way we handle outputs for new llamas and new models
#39120 merged
Jul 5, 2025 -
Update expected values (after switching to A10) - part 8 - Final
#39220 merged
Jul 4, 2025 -
Update expected values (after switching to A10) - part 7
#39218 merged
Jul 4, 2025 -
Add packed tensor format support for flex/sdpa/eager through the mask!
#39194 merged
Jul 4, 2025 -
Update expected values (after switching to A10) - part 6
#39207 merged
Jul 3, 2025 -
Update expected values (after switching to A10) - part 5
#39205 merged
Jul 3, 2025 -
Fix continuous batching in `transformers serve`
#39149 merged
Jul 3, 2025 -
[serve] Cursor support, move docs into separate page, add more examples
#39133 merged
Jul 3, 2025 -
Better return typehints for `from_pretrained`
#39184 merged
Jul 3, 2025 -
Update expected values (after switching to A10) - part 4
#39189 merged
Jul 3, 2025 -
[`Dia`] Change ckpt path in docs
#39181 merged
Jul 3, 2025 -
Fix many HPU failures in the CI
#39066 merged
Jul 3, 2025 -
Decouple device_map='auto' and tp_plan='auto'
#38942 merged
Jul 3, 2025 -
when delaying optimizer creation only prepare the model
#39152 merged
Jul 3, 2025 -
[glm4v] fix video inference
#39174 merged
Jul 3, 2025 -
Test fixes for Aria (and some Expectation for llava_next_video)
#39131 merged
Jul 2, 2025 -
Update expected values (after switching to A10) - part 3
#39179 merged
Jul 2, 2025 -
Update expected values (after switching to A10) - part 2
#39165 merged
Jul 2, 2025 -
Random serve fixes
#39176 merged
Jul 2, 2025 -
[serve] Model name or path should be required
#39178 merged
Jul 2, 2025 -
[generate] document non-canonical beam search default behavior
#39000 merged
Jul 2, 2025 -
[docs] ViTPose
#38630 merged
Jul 2, 2025 -
Reduce Glm4v model test size significantly
#39173 merged
Jul 2, 2025 -
Fix missing initializations for models created in 2024
#38987 merged
Jul 2, 2025 -
Blip2 fixes
#39080 merged
Jul 2, 2025 -
Fix multimodal processor get duplicate arguments when receive kwargs for initialization
#39125 merged
Jul 2, 2025 -
[Fix] Make EoMT compatible with pipeline
#39122 merged
Jul 2, 2025 -
[smolvlm] fix video inference
#39147 merged
Jul 2, 2025 -
fix default value of config to match checkpionts in LLaVa-OV models
#39163 merged
Jul 2, 2025 -
Add activation sparsity reference in gemma3n doc
#39160 merged
Jul 2, 2025 -
fix `llama` tests
#39161 merged
Jul 1, 2025 -
Update expected values (after switching to A10)
#39157 merged
Jul 1, 2025 -
Suggest jobs to use in `run-slow`
#39100 merged
Jul 1, 2025 -
update bnb ground truth
#39117 merged
Jul 1, 2025 -
fix: remove undefined variable
#39146 merged
Jul 1, 2025 -
Change `@lru_cache()` to `@lru_cache` to match styles from #38883.
#39093 merged
Jul 1, 2025 -
Fix: Ensure wandb logs config in offline mode
#38992 merged
Jul 1, 2025 -
Fix missing fsdp & trainer jobs in daily CI
#39153 merged
Jul 1, 2025 -
fix: fixed wrong concatenation which made batching results wrong
#38850 merged
Jul 1, 2025 -
[VLMs] support passing embeds along with pixels
#38467 merged
Jul 1, 2025 -
LlamaAttention forward function type hint is incorrect from new Branch
#38998 merged
Jul 1, 2025 -
[qwen2-vl] fix FA2 inference
#39121 merged
Jul 1, 2025 -
feat: support indivisible shards for TP model loading and TPlizing.
#37220 merged
Jul 1, 2025 -
fix caching_allocator_warmup with tie weights
#39070 merged
Jul 1, 2025 -
🚨 Don't use cache in non-generative models
#38751 merged
Jul 1, 2025 -
Several fixes for Gemma3n
#39135 merged
Jul 1, 2025 -
Fix key mapping for VLMs
#39029 merged
Jul 1, 2025 -
[Whisper] update token timestamps tests
#39126 merged
Jun 30, 2025 -
Update BigBirdPegasus model card
#39104 merged
Jun 30, 2025 -
switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8
#39024 merged
Jun 30, 2025 -
docs: correct two typos in awesome-transformers.md
#39102 merged
Jun 30, 2025 -
Enable XPU doc
#38929 merged
Jun 30, 2025 -
Fix chat
#39128 merged
Jun 30, 2025 -
Licenses
#39127 merged
Jun 30, 2025 -
Split `transformers chat` and `transformers serve`
#38443 merged
Jun 30, 2025 -
[chat] Split chat/serve (built on top of lysandre's PR)
#39031 merged
Jun 30, 2025 -
All CI jobs with A10
#39119 merged
Jun 30, 2025 -
docs: Gemma 3n audio encoder
#39087 merged
Jun 30, 2025 -
Fix some bug for finetune and batch infer For GLM-4.1V
#39090 merged
Jun 30, 2025 -
fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8
#39116 merged
Jun 30, 2025
141 Pull requests opened by 111 people
-
add pin memory and block table
#39130 opened
Jun 30, 2025 -
feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT
#39140 opened
Jul 1, 2025 -
Efficient Expert Weight Fusion for Moe deepseek v3
#39150 opened
Jul 1, 2025 -
Mllama fixes
#39182 opened
Jul 2, 2025 -
Add a 'chat' extra
#39183 opened
Jul 2, 2025 -
fix: filter None router logits in Qwen3 MoE and handle empty router logits (#39203)
#39206 opened
Jul 3, 2025 -
Standardize FSMT class naming: PretrainedFSMTModel → PreTrainedFSMTModel
#39209 opened
Jul 3, 2025 -
Add mobilenet_v5 stub implementation to fix "Unknown Model" error
#39211 opened
Jul 3, 2025 -
Add Ukrainian translation of README.md
#39212 opened
Jul 3, 2025 -
Fix Inconsistant `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39221 opened
Jul 4, 2025 -
Enable granite 4 hybrid integration tests
#39222 opened
Jul 4, 2025 -
feat: add sliding window attention to Continuous Batching
#39225 opened
Jul 4, 2025 -
Add support for `ModernBertForMultipleChoice`
#39232 opened
Jul 4, 2025 -
added moment_p sampling
#39236 opened
Jul 5, 2025 -
[WIP]support npu fusion patch
#39238 opened
Jul 6, 2025 -
Fix slow test_moshika_greedy_unconditional_fp16
#39251 opened
Jul 7, 2025 -
Fix to tuple conversion with config
#39257 opened
Jul 7, 2025 -
Fix: Add version check for timm to support mobilenetv5 models (fixes #39208)
#39264 opened
Jul 7, 2025 -
Refactor label name handling for PEFT models in Trainer class
#39265 opened
Jul 8, 2025 -
Add support for logging number of image tokens
#39274 opened
Jul 8, 2025 -
Bump transformers from 4.48.0 to 4.52.1 in /examples/tensorflow/language-modeling-tpu
#39284 opened
Jul 8, 2025 -
Feat: add Kwai-Keye transformers
#39292 opened
Jul 9, 2025 -
Add T5LA models
#39293 opened
Jul 9, 2025 -
Fix bug with deepspeed and accelerator args in training_args.py
#39297 opened
Jul 9, 2025 -
Fix critical typos in code example
#39303 opened
Jul 9, 2025 -
Fix batch object detection 31356
#39306 opened
Jul 9, 2025 -
Fix audio pipeline with torchcodec input
#39309 opened
Jul 9, 2025 -
Add dates to the model docs
#39320 opened
Jul 9, 2025 -
Remove conditional generation in image-to-text pipelines
#39332 opened
Jul 10, 2025 -
Fix the issue that csm model cannot work with pipeline mode.
#39349 opened
Jul 11, 2025 -
fix colpali mapping
#39353 opened
Jul 11, 2025 -
Update docstring for glm4v
#39357 opened
Jul 11, 2025 -
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 opened
Jul 11, 2025 -
Add standardized model card for facebook/data2vec-audio-base-960h
#39368 opened
Jul 11, 2025 -
Fix `fix_and_overwrite` mode of `utils/check_docstring.py`
#39369 opened
Jul 11, 2025 -
Support FP8 accelerate config
#39370 opened
Jul 11, 2025 -
fix(siglip2): clarify text pooling logic and remove misleading EOS co…
#39371 opened
Jul 11, 2025 -
Add Apertus
#39381 opened
Jul 12, 2025 -
Fix: Docker Build Vulnerable to Malicious Package Installation Attack in docker/custom-tokenizers.dockerfile
#39394 opened
Jul 14, 2025 -
[RoPE] allow models to configure local RoPE
#39397 opened
Jul 14, 2025 -
No repeat kv
#39402 opened
Jul 14, 2025 -
Add Vocos model
#39403 opened
Jul 14, 2025 -
Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs
#39435 opened
Jul 15, 2025 -
Fix logger warnings in Gemma model test files
#39449 opened
Jul 16, 2025 -
Add eurobert
#39455 opened
Jul 16, 2025 -
Fix quantized model initialization for int8 dtypes
#39456 opened
Jul 16, 2025 -
Skipping `initialize_weights` when model is quantized
#39464 opened
Jul 17, 2025 -
README: Update Bert Japanese model card
#39466 opened
Jul 17, 2025 -
Fix quantized model dispatch with device_map='auto'
#39468 opened
Jul 17, 2025 -
Fix Bark failing tests
#39478 opened
Jul 17, 2025 -
Add model arcinstitute state
#39480 opened
Jul 17, 2025 -
Bye bye env vars, keep everything as configs
#39483 opened
Jul 17, 2025 -
Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling
#39485 opened
Jul 17, 2025 -
Update CTRL model card with improved usage examples and documentation notes
#39487 opened
Jul 17, 2025 -
Avoid aliasing in cond's branches for torch 2.8
#39488 opened
Jul 17, 2025 -
Fix: Skip weight initialization for quantized int8 models
#39491 opened
Jul 17, 2025 -
[Voxtral] nit + pin correct mistral common version
#39493 opened
Jul 18, 2025 -
Add support for including in-memory videos (not just files/urls) in apply_chat_template
#39494 opened
Jul 18, 2025 -
[ASR pipline] fix with datasets 4.0
#39504 opened
Jul 18, 2025 -
Make sure Moshi is exportable with static cache
#39506 opened
Jul 18, 2025 -
feat(tokenization): add encode_message to tokenize messages one by one
#39507 opened
Jul 18, 2025 -
[WIP] :broom: :broom: :broom: Get set decoder cleanup
#39509 opened
Jul 18, 2025 -
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 opened
Jul 19, 2025 -
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 opened
Jul 19, 2025 -
🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean
#39519 opened
Jul 19, 2025 -
build: Add fast image processor tvp
#39529 opened
Jul 20, 2025 -
Add Beit3 model
#39534 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 opened
Jul 20, 2025 -
[Voxtral] Fix typo
#39540 opened
Jul 20, 2025 -
Add Muon optimizer implementation and integration
#39541 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated feature_extractors.md to Korea
#39544 opened
Jul 21, 2025 -
[WIP] try to relax the tie_weights method
#39555 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 opened
Jul 21, 2025 -
fix load_model_end = true work when save_steps < eval_steps
#39560 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 opened
Jul 21, 2025 -
[i18n-KO] Translated `auto_docstring.md` to Korean
#39571 opened
Jul 22, 2025 -
feat(autoformer): Improve ValueError for insufficient sequence length
#39574 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `vitpose.md` to Korean
#39575 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 opened
Jul 22, 2025 -
[`Ernie 4.5`] Ernie VL models
#39585 opened
Jul 22, 2025 -
WIP, reference modeling
#39588 opened
Jul 22, 2025 -
Add Fast Image Processor for ImageGPT
#39592 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated 'xclip.md' to Korean
#39594 opened
Jul 22, 2025 -
Fix: check TrainerState file exists before loading during resume
#39599 opened
Jul 23, 2025 -
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 opened
Jul 23, 2025 -
feat: add `is_fast` to ImageProcessor
#39603 opened
Jul 23, 2025 -
Update model card for Cohere2 (Command R7B)
#39604 opened
Jul 23, 2025 -
HunYuan opensource
#39606 opened
Jul 23, 2025 -
Chat schemas
#39609 opened
Jul 23, 2025 -
Fix return typehint for decoder and annotate inv_freq
#39610 opened
Jul 23, 2025 -
Rework add-new-model-like with modular and make test filenames coherent
#39612 opened
Jul 23, 2025 -
Export SmolvLM
#39614 opened
Jul 23, 2025 -
Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39617 opened
Jul 23, 2025 -
docs: Update EfficientLoFTR documentation
#39620 opened
Jul 23, 2025 -
fix tensor device when loading state dict
#39623 opened
Jul 24, 2025 -
Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI
#39625 opened
Jul 24, 2025 -
[serve] Add speech-to-text
#39631 opened
Jul 24, 2025 -
fix dead NVIDIA link
#39632 opened
Jul 24, 2025 -
Reorder serving docs
#39634 opened
Jul 24, 2025 -
Fix quant docker for fp-quant
#39641 opened
Jul 24, 2025 -
fix chameleonvision UT failure
#39646 opened
Jul 24, 2025 -
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 opened
Jul 24, 2025 -
[modular] small fixes
#39663 opened
Jul 25, 2025 -
Reduce atol values in test_dynamic_cache_exportability
#39667 opened
Jul 25, 2025 -
Fix loss scaling and token aggregation to use only data parallel group
#39674 opened
Jul 25, 2025 -
[BugFix]: Support dict and config file path for deepspeed
#39675 opened
Jul 25, 2025 -
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 opened
Jul 25, 2025 -
Allow custom hf_quantizer in from_pretrained
#39690 opened
Jul 26, 2025 -
fix misspelled issues
#39691 opened
Jul 26, 2025 -
Don't set `run_name` when none
#39695 opened
Jul 26, 2025 -
use untyped storage for dtensors due to deprecation
#39697 opened
Jul 26, 2025 -
Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG"
#39698 opened
Jul 26, 2025 -
standardized BARThez model card
#39701 opened
Jul 26, 2025 -
Update mT5 model card
#39702 opened
Jul 26, 2025 -
Fix Causality Handling in Flash Attention to Support Bidirectional Attention
#39707 opened
Jul 27, 2025 -
🌐[i18n-bn] Introduce Bengali version of Transformers documentation
#39708 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39712 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean
#39713 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `main_classes/backbones.md` to Korean
#39714 opened
Jul 27, 2025 -
Fix eval thread fork bomb
#39717 opened
Jul 27, 2025 -
Fix SigLIP2 documentation model/processor mismatch
#39718 opened
Jul 28, 2025 -
[Feat] Adding Intern-S1
#39722 opened
Jul 28, 2025 -
Fix int4 quantized model cannot work with cpu
#39724 opened
Jul 28, 2025 -
[qwen-vl] fix beam search with videos
#39726 opened
Jul 28, 2025 -
Super tiny update
#39727 opened
Jul 28, 2025 -
Export private symbols
#39729 opened
Jul 28, 2025 -
handle multimodal models with tp_plan on the text_config
#39735 opened
Jul 28, 2025 -
Standardize CLAP model card format
#39738 opened
Jul 28, 2025 -
Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid
#39739 opened
Jul 28, 2025 -
[Tests] [Bugfix] Make weights tied for `dynamic_tied_weights` test
#39740 opened
Jul 28, 2025 -
Fix HfArgumentParser to filter out dict types from Union
#39741 opened
Jul 28, 2025 -
Update HuBERT model card according to template
#39742 opened
Jul 29, 2025 -
Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock
#39743 opened
Jul 29, 2025 -
🌐 [i18n-KO] Translated `text-to-speech.md` to Korean
#39751 opened
Jul 29, 2025 -
Fix rope_deltas corruption in Qwen2.5VL during CFG generation
#39756 opened
Jul 29, 2025 -
[Draft] Add Llasa TTS family of models
#39760 opened
Jul 29, 2025 -
Fix an invalid judgement
#39762 opened
Jul 29, 2025 -
Improve Gemma3n model and tests
#39764 opened
Jul 29, 2025 -
[draft] No more using `from_legacy_cache` as initialization
#39765 opened
Jul 29, 2025
177 Issues closed by 53 people
-
Max cache length issue with Gemma 3
#39711 closed
Jul 29, 2025 -
ModernBERT has been totally destroyed by PR #38974 and #38838
#39747 closed
Jul 29, 2025 -
Support loading Qwen3 MoE GGUF
#39721 closed
Jul 29, 2025 -
[XPU] Model get OOM when loading models
#39627 closed
Jul 29, 2025 -
encoder decoder model compile failed after refactor cache
#39746 closed
Jul 29, 2025 -
_supports_static_cache disappear
#39744 closed
Jul 29, 2025 -
device mismatch error when using `SlidingWindowLayer`.
#39730 closed
Jul 28, 2025 -
AddedToken should check content on `_update`
#39586 closed
Jul 28, 2025 -
Checkpointing broken for classifier training multi-gpu
#38925 closed
Jul 28, 2025 -
vlmm 0.10.0 load baidu/ERNIE-4.5-300B-A47B-Base-PT error
#39719 closed
Jul 28, 2025 -
[i18n-<languageCode>] Translating docs to <Arabic>
#38381 closed
Jul 27, 2025 -
Not installable on arm64 due to jaxlib upper bound
#36611 closed
Jul 27, 2025 -
KeyError in Llama-4-Maverick-17B-128E-Instruct-FP8 Inference with Offloading
#38281 closed
Jul 27, 2025 -
ImportError: DLL load failed while importing _safetensors_rust: The specified module could not be found
#38479 closed
Jul 27, 2025 -
Contribute to Transformers on windows natively without WSL
#38601 closed
Jul 27, 2025 -
Reproducibility Issue of Siglip2 with Blackwell Architecture GPUs (RTX 5090)
#38874 closed
Jul 27, 2025 -
The wrong config parameter found in src/transformers/models/qwen2_5_vl/configuration_qwen2_5_vl.py.
#38889 closed
Jul 27, 2025 -
CRITICAL ISSUE REPORT! GEMMA 3 1B CANNOT RUN!
#39686 closed
Jul 26, 2025 -
text-generation extremely slow with large `bad_words_ids` list
#39512 closed
Jul 25, 2025 -
Does Gemma 3 need positions ids to be 1-indexed explicitly?
#39023 closed
Jul 25, 2025 -
Add Deepseek-VL
#36110 closed
Jul 25, 2025 -
Grammatical error in the "Loading model's" page
#39018 closed
Jul 25, 2025 -
Inference API Returning 404
#39650 closed
Jul 25, 2025 -
Backwards incompatible change in returned hidden states
#39558 closed
Jul 25, 2025 -
Typo in `apply_transcrition_request` method name
#39530 closed
Jul 25, 2025 -
video_auto_processing.py breaks everything
#38846 closed
Jul 25, 2025 -
Should `compute_metrics` only run on the main process when doing DDP?
#38851 closed
Jul 25, 2025 -
VoxtralForConditionalGeneration import error
#39611 closed
Jul 24, 2025 -
`Trainer._save()` May Incorrectly Save Empty Model State (safetensors)
#38686 closed
Jul 24, 2025 -
Wandb isn't logging config in offline mode
#38968 closed
Jul 23, 2025 -
The similarity between image and text in siglip2 is very low
#39597 closed
Jul 23, 2025 -
Does Qwen_2_5_VL support variable length attention computation?
#38007 closed
Jul 23, 2025 -
Have to import cv2 and pop up window frist, or else it stuck forever
#38139 closed
Jul 23, 2025 -
CI skipped failures tracking issue
#38820 closed
Jul 23, 2025 -
"ValueError: Predictions and/or references don't match the expected format." error
#39510 closed
Jul 22, 2025 -
Clarification on Recent Changes to Loss and Gradient Accumulation
#39567 closed
Jul 22, 2025 -
Add EfficientLoFTR model
#36354 closed
Jul 22, 2025 -
Gemma3 bidirectional mask for image tokens isn't reaching attention forward
#39389 closed
Jul 22, 2025 -
Is the new Intel–Weizmann speculative decoding algorithm integrated into Transformers?
#39545 closed
Jul 21, 2025 -
Enabling `average_tokens_across_devices` by default in Trainer
#39392 closed
Jul 21, 2025 -
T5Gemma problem with tokenizer(?)
#39521 closed
Jul 21, 2025 -
Causal mask is not compatible with Qwen2-VL when using padding-free training
#39400 closed
Jul 21, 2025 -
KeyError: 'llava_qwen2'
#39533 closed
Jul 21, 2025 -
Add Gemma 3 For Sequence Classification
#36755 closed
Jul 21, 2025 -
Expected all tensors to be on the same device, but found at least two devices
#37545 closed
Jul 21, 2025 -
DynamicCache results in too many torch recompiles after 4.51
#37908 closed
Jul 21, 2025 -
Confusion about num_labels and problem_type in classification logic 🐛
#38219 closed
Jul 21, 2025 -
Silent Overwrite of Custom Optimizer When Using DeepSpeed with Transformers Trainer
#38753 closed
Jul 21, 2025 -
DTensor issues when running Llama4ForConditionalGeneration with tensor parallel.
#38803 closed
Jul 21, 2025 -
Version 4.52.3 leads to error after bundling with pyinstaller
#38402 closed
Jul 20, 2025 -
Issue importing models in jupyter notebooks 'No module named transformers.models.ipynb_checkpoints'
#38726 closed
Jul 19, 2025 -
T5Gemma returning 0 loss for s2s training
#39514 closed
Jul 19, 2025 -
Whisper models appear to be broken with Flash Attention 2
#38662 closed
Jul 18, 2025 -
Speculative Decoding(do_sample=False) get different outputs
#39421 closed
Jul 18, 2025 -
BarkProcessor voice_preset doesn't work
#34634 closed
Jul 18, 2025 -
dataset 4.0.0 , issue with load_dataset loading audio dataset
#39497 closed
Jul 18, 2025 -
Gemma3n don't support chat with history
#39498 closed
Jul 18, 2025 -
modeling_flax_gemma.FlaxGemmaModule failed with incompatible shapes when running with GemmaConfig
#39492 closed
Jul 18, 2025 -
Error for `return_assistant_tokens_mask` in MLLM processor
#38521 closed
Jul 18, 2025 -
`get_video_features` in XCLIPModel always returns `pooled_output`
#38709 closed
Jul 18, 2025 -
I can't make sense of this works on Windows but not on Linux AutoModelForCausalLM.from_pretrained
#39461 closed
Jul 17, 2025 -
HfArgumentParser cannot parse `str` for local path
#39462 closed
Jul 17, 2025 -
breaking changes in ESM model classes
#39405 closed
Jul 17, 2025 -
[torch.export] Unhandled FakeTensor Device Propagation for two different devices
#38975 closed
Jul 17, 2025 -
QA pipeline prediction generates wrong response when `top_k` param > 1
#38984 closed
Jul 17, 2025 -
When will transformers 4.51.4 be released?
#37812 closed
Jul 17, 2025 -
CheckpointLoaderSimple ..... Error while deserializing header: InvalidHeaderDeserialization
#38692 closed
Jul 17, 2025 -
can't torch.export.export tinyllama model
#39463 closed
Jul 17, 2025 -
Missing 4 spaces in SmolVLMImageProcessorFast
#39442 closed
Jul 16, 2025 -
ModernBERT for Sequence Classification - issues with finetuning
#38720 closed
Jul 16, 2025 -
SigLip2 text pooler output selection
#39269 closed
Jul 16, 2025 -
[YosoConfig] Missing `architectures` field
#39424 closed
Jul 16, 2025 -
Qwen3 tokenizer wrong offset_mapping
#39401 closed
Jul 16, 2025 -
OpenTelemetry Collector Connection error when installing the latest release 4.53.0 during `docker build`
#39143 closed
Jul 16, 2025 -
DBRX model passes probabilities and not logits to the load balancer
#39055 closed
Jul 16, 2025 -
`verify_tp_plan` function raises an error if a key without '.' is given
#38419 closed
Jul 16, 2025 -
Whisper chunking algorithm increases WER
#37789 closed
Jul 16, 2025 -
model_type = self._reverse_config_mapping[key.__name__] KeyError: 'Qwen2RMConfig'
#38517 closed
Jul 16, 2025 -
TypeError: 'NoneType' object is not iterable in ESM when using DDP training
#38667 closed
Jul 16, 2025 -
LlamaAttention forward function type hint is incorrect
#38739 closed
Jul 15, 2025 -
`quantization_method` is not cleared after calling `.dequantize()`
#39295 closed
Jul 15, 2025 -
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 closed
Jul 15, 2025 -
Mypy errors since v4.51.0
#37339 closed
Jul 15, 2025 -
Errors using TinyLlama-1.1B-Chat-v1.0 and DirectML
#38340 closed
Jul 15, 2025 -
Pytorch language_modelling example run_clm fails when streaming is enabled
#39285 closed
Jul 15, 2025 -
`transformers.utils.metrics` sets global `TracerProvider`
#39115 closed
Jul 15, 2025 -
There is no transformers version that can run DeepSeek V3 generate
#38710 closed
Jul 15, 2025 -
Support of Qwen3 GGUF model
#38650 closed
Jul 15, 2025 -
Latest Transformers release causes CUDA out-of-memory errors during VisionLLM fine-tuning
#39337 closed
Jul 14, 2025 -
Paligemma model card needs update
#38544 closed
Jul 14, 2025 -
Using resnet-18 in flax
#39388 closed
Jul 14, 2025 -
Getting Warnings When Instantiating Object Detection Models Due to Meta Tensor Initialization
#37615 closed
Jul 14, 2025 -
4.52.2 error: Could not import module 'Qwen3ForCausalLM'
#38291 closed
Jul 14, 2025 -
Transformers fail to load deepseek-ai/DeepSeek-V3 with vllm
#38588 closed
Jul 13, 2025 -
MambaInnerFnBackward
#38600 closed
Jul 13, 2025 -
Failed to full fine tuning code5p 2B
#38602 closed
Jul 13, 2025 -
Exporting google/gemma-3n-e4b-it language_model (decoder) into ONNX format
#39328 closed
Jul 12, 2025 -
Removing the modification of loss value due to rounding off to 4 digits
#38032 closed
Jul 12, 2025 -
Clarification on default top_k sampling parameter
#38549 closed
Jul 12, 2025 -
hidden_states, self_attn_weights = self.self_attn( ValueError: too many values to unpack (expected 2)
#38554 closed
Jul 12, 2025 -
how to use EncoderDecoderModel to do en-de translation?
#8944 closed
Jul 11, 2025 -
vLLM v0.9.2: Qwen2.5-Omni-7B-AWQ fails to load with transformers 4.53.1 (requires 4.52.4)
#39359 closed
Jul 11, 2025 -
Beit image classification have different results compared from versions prior to 4.43.0
#34446 closed
Jul 11, 2025 -
AssertionError: Torch not compiled with CUDA enabled when using device_map="auto" in Ascend NPU
#38468 closed
Jul 11, 2025 -
Allow `mlm_probability` to be set to None when `mlm`=False in `DataCollatorForLanguageModeling`
#38522 closed
Jul 11, 2025 -
"Size mismatch" error when trying to download pretrained ChatGPT-4 using transformers
#38523 closed
Jul 11, 2025 -
[Core] Saving models with multiple shared tensor groups is not supported when model is dispatched
#39097 closed
Jul 10, 2025 -
FlashAttention2 ImportError: undefined symbol with flash_attn_2_cuda when loading Phi-4-Multimodal
#39334 closed
Jul 10, 2025 -
Can't load my LoRA checkpoint after gemma3 refactor
#38927 closed
Jul 10, 2025 -
TypeError in Qwen2_5_VLForConditionalGeneration (torch.finfo misuse)
#39326 closed
Jul 10, 2025 -
Gemma3 slightly alters hidden state when input_ids is batched
#39302 closed
Jul 10, 2025 -
[CI ENERGY Waste] The exist jobs in `Doctests` that has never completed successfully
#39159 closed
Jul 9, 2025 -
ddp_time in TrainingArguments with deepspeed does not work
#38933 closed
Jul 9, 2025 -
transformers showing decoder model architecture detected so padding should be left
#38071 closed
Jul 9, 2025 -
[Florence-2] SyntaxWarning: invalid escape sequence '\d' in processing_florence2.py
#38498 closed
Jul 9, 2025 -
id2label assignment problem in run_glue.py
#38507 closed
Jul 9, 2025 -
flash_attention_3 for Qwen2_5_VisionTransformerPretrainedModel
#39288 closed
Jul 9, 2025 -
Illegal memory access when using 3d rope
#39168 closed
Jul 8, 2025 -
OSError: Tensor parallel is only supported for `torch>=2.5`
#39249 closed
Jul 8, 2025 -
Any plans to add AIMv2 in the model?
#35351 closed
Jul 8, 2025 -
Request to add Doge
#35889 closed
Jul 8, 2025 -
ModernBERT for MLM outputs incorrect hidden state shape.
#38499 closed
Jul 8, 2025 -
Unable to deploy Gemma 3 on AWS SageMaker due to lack of support in tranfomers release
#38500 closed
Jul 8, 2025 -
ModuleNotFoundError: No module named 'habana_frameworks.torch'
#39256 closed
Jul 7, 2025 -
disable_grouping parameter missed in image_processing_glm4v_fast.py
#39237 closed
Jul 7, 2025 -
A word-level timestamps on whisper generation pipeline is mismatched to total duration
#36228 closed
Jul 7, 2025 -
Pickle error when downloading DeepSeek model
#38476 closed
Jul 7, 2025 -
Token shape issue in LLaVA-onevision fine-tuning
#38481 closed
Jul 7, 2025 -
(False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch
#26796 closed
Jul 6, 2025 -
Some Whisper beam search output (sequences_scores, etc.) is lost in _stack_split_outputs
#32373 closed
Jul 6, 2025 -
Whisper Beam Search doesn't work
#33445 closed
Jul 6, 2025 -
Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
#36124 closed
Jul 6, 2025 -
Shape Error in Llama4VisionMLP2
#37321 closed
Jul 6, 2025 -
Maybe the vocab_size can be duplicated to the mainconfig for PEFT to pick up
#38017 closed
Jul 6, 2025 -
[Question] The logic of data sampler in data parallel.
#38428 closed
Jul 6, 2025 -
quantizer_hqq should not require a gpu/cuda device to run
#38439 closed
Jul 6, 2025 -
Incorrect API call
#38457 closed
Jul 6, 2025 -
Unable to run run_instance_segmentation_no_trainer with HF Accelerate
#38375 closed
Jul 5, 2025 -
accelerate + device_map auto = error
#38408 closed
Jul 5, 2025 -
Tokenizer returns float32 tensor for empty string input instead of long dtype
#38417 closed
Jul 5, 2025 -
FSDP RuntimeError: 'weight' must be 2-D
#39186 closed
Jul 5, 2025 -
KV cache optimization with paged attention
#27303 closed
Jul 5, 2025 -
No or astronomical loss in `ModernBertForMultipleChoice`
#39201 closed
Jul 4, 2025 -
Possible Typo in "Mask2FormerLoss"
#38559 closed
Jul 4, 2025 -
[Gaudi] the seamless_m4t cannot work on Gaudi. No need to fix. Workaround PR is merged.
#39118 closed
Jul 3, 2025 -
device_map='auto' coupled with tp_plan='auto'
#38771 closed
Jul 3, 2025 -
Weights not initialized correctly when instantiating model with a pretrained backbone
#38061 closed
Jul 3, 2025 -
401 Unauthorized Error: "Invalid credentials" on POST requests to Inference API from multiple services
#38289 closed
Jul 3, 2025 -
`transformers`' dependency on `sentencepiece` blocks use on windows in python 3.13
#39091 closed
Jul 3, 2025 -
AttributeError: 'Resampler' object has no attribute '_initialize_weights'
#39124 closed
Jul 2, 2025 -
Gibberish generations with FSDP2 and MixedPrecisionPolicy
#38190 closed
Jul 2, 2025 -
Potential mix-up with IMAGENET_STANDARD and IMAGENET_DEFAULT values
#38318 closed
Jul 2, 2025 -
Why is return_assistant_tokens_mask and continue_final_message incompatible?
#38346 closed
Jul 2, 2025 -
VLLM depoly Qwen2.5_omni server error
#39141 closed
Jul 2, 2025 -
`LayoutLMv3TokenizerFast` doesn't pass all the params.
#39151 closed
Jul 1, 2025 -
Incorrect keypoint batch handling inside SuperGlueForKeypointMatching
#38348 closed
Jul 1, 2025 -
`Qwen2_5_VLVisionAttention` with flash attention has no `is_causal` attribute
#39095 closed
Jul 1, 2025 -
Warning when load pretrained model for qwen2-VL-1.5B-Instruct.
#39004 closed
Jul 1, 2025 -
Qwen2_5OmniProcessor.__init__() got multiple values for argument 'image_processor'
#38898 closed
Jul 1, 2025 -
docs: fix typos in awesome-transformers.md WIP
#39101 closed
Jun 30, 2025 -
Behaviour of `batch_eval_metrics` determines the `include_for_metrics` behaviour
#37683 closed
Jun 30, 2025 -
The latest transformer.utils.fx does not working on llama. Only far older version(4.45.1) works.
#38313 closed
Jun 30, 2025 -
Will Gemma 3n be added to transformers?
#38300 closed
Jun 30, 2025
102 Issues opened by 95 people
-
Instantiating `google/gemma-3-4b-pt` with AutoModelForSequenceClassification Reports Unitialized Model
#39763 opened
Jul 29, 2025 -
Follow-up on Issues Regarding Training State Restoration from Interruptions
#39755 opened
Jul 29, 2025 -
Inv frequency has not default, going against our philosophy
#39753 opened
Jul 29, 2025 -
Qwen2_5_VLForConditionalGeneration cfg forward twice error
#39749 opened
Jul 29, 2025 -
`num_beams` > 1 leads to exception for Qwen2.5VL (Qwen family or all VLM models?)
#39723 opened
Jul 28, 2025 -
[transformers==4.54.0] FSDP1 forward misalignment after loading state dict
#39720 opened
Jul 28, 2025 -
[rank0]: ValueError: Your setup doesn't support bf16/gpu.
#39716 opened
Jul 27, 2025 -
OWLv2 with visual prompt - alternative query embedding selection method
#39710 opened
Jul 27, 2025 -
[i18n-<bn>] Translating docs to <Bengali>
#39705 opened
Jul 27, 2025 -
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
#39704 opened
Jul 27, 2025 -
ValueError: Number of image placeholders in the prompt does not match the number of images. internVL3
#39703 opened
Jul 26, 2025 -
No flag to support Conditional Parameter Loading for gemma-3n-E2B models in transformer
#39699 opened
Jul 26, 2025 -
SigLIP2 documentation example has multiple errors (model/processor mismatch + quantization failure)
#39692 opened
Jul 26, 2025 -
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 opened
Jul 26, 2025 -
Qwen 2.5 VL - error without attention_mask
#39685 opened
Jul 26, 2025 -
Add multi-candidate & tree search for assisted decoding (speculative decoding)
#39684 opened
Jul 25, 2025 -
Accelerate beam search decoding via tree attention
#39682 opened
Jul 25, 2025 -
error: argument --deepspeed: invalid dict value: '<path>'
#39673 opened
Jul 25, 2025 -
Issue when initializing a DynamicCache
#39668 opened
Jul 25, 2025 -
T5Gemma training not working
#39656 opened
Jul 25, 2025 -
Please develop DataCollatorForVisionLanguageModeling to support visual model training !!!
#39647 opened
Jul 24, 2025 -
FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39619 opened
Jul 23, 2025 -
SageAttention for attention implementation?
#39618 opened
Jul 23, 2025 -
Trainer: Error when folded metrics are saved
#39616 opened
Jul 23, 2025 -
Qwen3 Fails w/4D Attn Mask when using FA2
#39608 opened
Jul 23, 2025 -
ImageClassificationPipeline preprocess should accept numpy/tensor arrays
#39607 opened
Jul 23, 2025 -
Does transformers support python3.13 -- disable-gil or python3.14 free threading?
#39596 opened
Jul 23, 2025 -
Model forward execution in full eager mode?
#39565 opened
Jul 21, 2025 -
Why `is_causal` is not used in `flash_attention_forward` ?
#39554 opened
Jul 21, 2025 -
Is there plan to integrate ColQwen2.5 into Transformers?
#39549 opened
Jul 21, 2025 -
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 opened
Jul 21, 2025 -
Add muon and flash-muon optimizer
#39537 opened
Jul 20, 2025 -
training google colab error
#39527 opened
Jul 19, 2025 -
paged attention NOT working with Qwen Models
#39525 opened
Jul 19, 2025 -
T5Gemma failing on provided example
#39522 opened
Jul 19, 2025 -
Export voxtral to ExecuTorch
#39511 opened
Jul 18, 2025 -
Whisper transcription is 2x slower between 4.51.3 -> 4.52.1
#39508 opened
Jul 18, 2025 -
Add Muon Optimiser for 2x faster convergence
#39495 opened
Jul 18, 2025 -
Transformers still tries to use apex.amp which is no longer a thing in apex.
#39484 opened
Jul 17, 2025 -
Adding Space-Time-MiniLM-v0
#39479 opened
Jul 17, 2025 -
Allow `load_best_model_at_end=True` to work when `save_steps < eval_steps` and best model is saved
#39476 opened
Jul 17, 2025 -
Unexpected behaviour with transformers versions above 4.28 for Donut
#39473 opened
Jul 17, 2025 -
Autoformer get_lagged_subsequences always true if condition
#39460 opened
Jul 16, 2025 -
Add Interactive Multi-Modal Attention Visualization for Vision-Language Models
#39440 opened
Jul 15, 2025 -
Export LFM2 to ExecuTorch
#39436 opened
Jul 15, 2025 -
Add DiCoW: Diarization-Conditioned Whisper
#39430 opened
Jul 15, 2025 -
Gemma 3 Compilation Issues During Generation
#39427 opened
Jul 15, 2025 -
object detection : matchin outputs.last_hidden_state with results
#39426 opened
Jul 15, 2025 -
Exeception 3 type mismatch
#39413 opened
Jul 15, 2025 -
FP8 training support for Model Parallel / Tensor Parallel (MP/TP)
#39410 opened
Jul 15, 2025 -
TypeError: couldn't find storage object Float8_e4m3fnStorage - which version is needed for this?
#39409 opened
Jul 15, 2025 -
Off-by-one error when using flash_attention with a sliding window
#39408 opened
Jul 15, 2025 -
Whisper `return_language` with pipeline no longer working
#39404 opened
Jul 14, 2025 -
Qwen2.5-VL Sharding error when using Tensor Parallelism
#39399 opened
Jul 14, 2025 -
Mask2FormerImageProcessor yields inconsistent results between single and batch inference
#39382 opened
Jul 12, 2025 -
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 opened
Jul 12, 2025 -
Option to tokenize messages one after the other
#39417 opened
Jul 12, 2025 -
FlashAttention2 support for GSAI-ML / LLaDA-8B-Instruct?
#39377 opened
Jul 12, 2025 -
Adding api key to `transformers serve`
#39367 opened
Jul 11, 2025 -
RuntimeError when loading llmcompressor W8A8 quantized model: int8 dtype in weight initialization
#39366 opened
Jul 11, 2025 -
Bug in modeling_bart.eager_attention_forward
#39365 opened
Jul 11, 2025 -
env.useBrowserCache = true causes JSON parsing error, forced to disable cache making app slower.
#39352 opened
Jul 11, 2025 -
surpport for google/medgemma-27b-it
#39350 opened
Jul 11, 2025 -
Adding support for Gemma 3n GGUFs
#39329 opened
Jul 10, 2025 -
Add HF integration dates + paper release dates to the model docs
#39319 opened
Jul 9, 2025 -
Whisper demo code for model + processor API is broken
#39318 opened
Jul 9, 2025 -
Inference with model.generate( ) using a quantized model leads to assertion error
#39311 opened
Jul 9, 2025 -
Support 2D Array Inputs in Wav2Vec2FeatureExtractor for Non-Waveform Modalities
#39291 opened
Jul 9, 2025 -
hangs during training using deepspeed
#39275 opened
Jul 8, 2025 -
Please help i am trying to run model but issue
#39260 opened
Jul 7, 2025 -
[Trainer] Eval loss depends on batch size (with solution)
#39241 opened
Jul 7, 2025 -
Specifying multiple metrics in TrainingArguments.metric_for_best_model
#39235 opened
Jul 5, 2025 -
v4.53.0 - Qwen 2.5 VL Flash Attention error - object has no attribute is_causal
#39231 opened
Jul 4, 2025 -
Feature Request: Native Support for Custom Multimodal Models
#39219 opened
Jul 4, 2025 -
torch fake_tensor load hf model failed
#39217 opened
Jul 4, 2025 -
Inconsistant `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39214 opened
Jul 4, 2025 -
Remove device to host sync triggered in _flash_attention_forward
#39213 opened
Jul 4, 2025 -
Unknown Model (mobilenetv5_300m_enc) when loading Gemma 3n
#39208 opened
Jul 3, 2025 -
Qwen3 MOE models w/non-empty `mlp_only_layers` fail when `output_router_logits=True`
#39203 opened
Jul 3, 2025 -
Naming incosistencies of `PreTrained*` classes.
#39202 opened
Jul 3, 2025 -
🐛 Bug Report: Accelerate config to disable torch dynamo is ignored by transformers automatic compilation
#39191 opened
Jul 3, 2025 -
Gemma2 fall back to cpu execusion when attn_implementation='flash_attention_2'
#39188 opened
Jul 3, 2025 -
Torch patches tracker for HPU/Gaudi
#39175 opened
Jul 2, 2025 -
Using Gemma3n with text-only generation requires image dependencies
#39169 opened
Jul 2, 2025 -
Not capable of exporting Mistral to ONNX format with the use of caching
#39162 opened
Jul 1, 2025 -
Add x-transformers library by lucidrains
#39139 opened
Jun 30, 2025 -
ImportError: cannot import name 'pipeline' from 'transformers'
#39137 opened
Jun 30, 2025 -
bf16_full_eval=True moves model to device before FSDP application and causes cuda OOM
#39136 opened
Jun 30, 2025 -
Gradient accumulation steps for Vision Languge model
#39123 opened
Jun 30, 2025 -
Is there a way to force it to use ASCII based progress bar and not the ipython widget one?
#39114 opened
Jun 29, 2025 -
QWEN2VLProcessor missing video_token_id in mm_token_type_ids
#39112 opened
Jun 29, 2025 -
New release 4.53.0 breaks HF trainer/model
#39111 opened
Jun 29, 2025
152 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Segment Anything 2 (SAM2)
#32317 commented on
Jul 29, 2025 • 91 new comments -
[WiP] Add xcodec2 model
#37868 commented on
Jul 28, 2025 • 70 new comments -
Add fastconformer encoder support for nvidia/parakeet and nvidia/canary models
#39062 commented on
Jul 23, 2025 • 57 new comments -
Add support for Florence-2
#38188 commented on
Jul 29, 2025 • 48 new comments -
blt wip
#38579 commented on
Jul 29, 2025 • 38 new comments -
Add Ovis2 model and processor implementation
#37088 commented on
Jul 29, 2025 • 29 new comments -
Add NVIDIA Cosmos
#36476 commented on
Jul 16, 2025 • 14 new comments -
🔴[`Attention`] Bert-based Models Attention Refactor
#38301 commented on
Jul 16, 2025 • 12 new comments -
feat: Add ConvaiCausalLM model for Hindi Causal Language Modeling
#37837 commented on
Jul 16, 2025 • 10 new comments -
Disable static cache on certain MoE models
#39108 commented on
Jul 28, 2025 • 8 new comments -
Add X-Codec model
#38248 commented on
Jul 23, 2025 • 7 new comments -
Add Dust3R
#38805 commented on
Jul 22, 2025 • 6 new comments -
[WIP] Add MM Grounding DINO
#37925 commented on
Jul 26, 2025 • 4 new comments -
Add FAST
#35476 commented on
Jul 29, 2025 • 4 new comments -
Modular m4t speecht5 sew
#37473 commented on
Jul 2, 2025 • 3 new comments -
Force real tensors and clone state_dict in src/transformers/modeling_utils.py
#38114 commented on
Jul 15, 2025 • 3 new comments -
[omni modality] support composite processor config
#38142 commented on
Jul 28, 2025 • 3 new comments -
Make executorch integration more seamless by analyzing model signature
#36969 commented on
Jul 16, 2025 • 3 new comments -
Add callback to monitor progress in whisper transcription
#37483 commented on
Jul 29, 2025 • 2 new comments -
Use deep copies instead of shallow copies for bbox_embed in GroundingDINO decoder (#37333).
#38999 commented on
Jul 1, 2025 • 1 new comment -
Add serialization function for StaticCache
#38879 commented on
Jul 2, 2025 • 1 new comment -
Provide clearer instructions on how to specify target language.
#38786 commented on
Jul 21, 2025 • 1 new comment -
deci gguf support
#38669 commented on
Jul 29, 2025 • 1 new comment -
[WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training
#39012 commented on
Jul 1, 2025 • 1 new comment -
Fix ModernBERT tokenizer issue with is_split_into_words flag
#38564 commented on
Jul 16, 2025 • 1 new comment -
another way to use shift_labels
#38533 commented on
Jul 16, 2025 • 1 new comment -
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 commented on
Jul 7, 2025 • 0 new comments -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 commented on
Jul 23, 2025 • 0 new comments -
[WIP] Computer vision util: vision visualizer
#36892 commented on
Jul 25, 2025 • 0 new comments -
Remove the redundant shift during the loss computation in the Moshi m…
#36928 commented on
Jul 7, 2025 • 0 new comments -
docs: PyTorch examples (image-classification & image-pretraining) clarity
#39094 commented on
Jul 3, 2025 • 0 new comments -
Adding a stub for MiniCPM-o to the models
#37049 commented on
Jul 7, 2025 • 0 new comments -
Bug/38843 fix pos idx in fp32 parameter error
#39064 commented on
Jul 22, 2025 • 0 new comments -
docs: Update LayoutLMv3 model card with standardized format and impro…
#37155 commented on
Jul 3, 2025 • 0 new comments -
trying custom tokenizer fix
#37177 commented on
Jul 16, 2025 • 0 new comments -
[RFC] Fix Gemma 3 FP16 with activation scaling
#37226 commented on
Jul 16, 2025 • 0 new comments -
Add QLIP Model
#37328 commented on
Jul 7, 2025 • 0 new comments -
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on
Jul 21, 2025 • 0 new comments -
Fix edge case for tokenize (#36277)
#36555 commented on
Jul 8, 2025 • 0 new comments -
[Validation] First implementation of `@strict` from `huggingface_hub`
#36534 commented on
Jul 29, 2025 • 0 new comments -
Fix pos idx v4.52.4
#39096 commented on
Jul 8, 2025 • 0 new comments -
Add Phi-3.5-vision
#36036 commented on
Jul 28, 2025 • 0 new comments -
Fix ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36011 commented on
Jul 21, 2025 • 0 new comments -
Add StyleTTS 2
#35790 commented on
Jul 28, 2025 • 0 new comments -
use warning_once instead of warning in Trainer.tokenizer
#35482 commented on
Jul 25, 2025 • 0 new comments -
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on
Jul 26, 2025 • 0 new comments -
Fix hardcoded `float` dtypes in DeBERTa model, which caused multiple RuntimeErrors in `bfloat16`
#35336 commented on
Jul 9, 2025 • 0 new comments -
Add JinaBERT model
#35320 commented on
Jul 15, 2025 • 0 new comments -
uniformize kwargs for OneFormer
#34547 commented on
Jul 7, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Jul 21, 2025 • 0 new comments -
Fix audio-related config naming for Gemma3n
#39103 commented on
Jun 30, 2025 • 0 new comments -
CUDA OOM when running meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
#37532 commented on
Jul 15, 2025 • 0 new comments -
Allow compile with bnb
#38886 commented on
Jul 3, 2025 • 0 new comments -
Add `SepCache` [An efficient and easy-to-use Cache from the SepLLM paper - ICML 2025 (https://arxiv.org/abs/2412.12094) ] to the `cache_utils.py` and `__init__.py`
#38824 commented on
Jul 28, 2025 • 0 new comments -
docs: Musicgen melody model card
#38955 commented on
Jul 3, 2025 • 0 new comments -
Adding custom 3d mask into ModernBert
#38671 commented on
Jul 29, 2025 • 0 new comments -
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on
Jul 22, 2025 • 0 new comments -
Updating model card for wav2vec2
#38956 commented on
Jul 5, 2025 • 0 new comments -
Add Bagel
#38569 commented on
Jul 25, 2025 • 0 new comments -
On branch fix-void-segment-mask-input [WIP]
#38532 commented on
Jul 1, 2025 • 0 new comments -
2/2 More cleaning for the `LlamaModel` keeping only the core
#38368 commented on
Jul 10, 2025 • 0 new comments -
Update wav2vec2-bert model card
#38957 commented on
Jul 3, 2025 • 0 new comments -
Fix the shape of ModernBertForMaskedLM's output hidden_states
#38272 commented on
Jul 16, 2025 • 0 new comments -
Updated model card for wav2vec2-conformer
#38958 commented on
Jul 3, 2025 • 0 new comments -
Updated the model card for wav2vec2-phoneme
#38959 commented on
Jul 3, 2025 • 0 new comments -
Check docstring inside modular files as well
#38988 commented on
Jul 9, 2025 • 0 new comments -
SQuat cache implementation
#38055 commented on
Jul 11, 2025 • 0 new comments -
Add submodels support check function
#39009 commented on
Jul 1, 2025 • 0 new comments -
support MiniCPM-o2.6
#37917 commented on
Jul 8, 2025 • 0 new comments -
add profiler to trainer
#37889 commented on
Jul 29, 2025 • 0 new comments -
fix `kosmos2` tests
#39037 commented on
Jun 30, 2025 • 0 new comments -
Update ruff to 0.12.3 and apply its fixes
#37809 commented on
Jul 21, 2025 • 0 new comments -
Vectorize deepseek moe
#37769 commented on
Jul 16, 2025 • 0 new comments -
Add PLM Model
#37634 commented on
Jul 7, 2025 • 0 new comments -
Allow compression on meta device
#39039 commented on
Jul 15, 2025 • 0 new comments -
Fix interpolation of convnext image processor
#37460 commented on
Jul 7, 2025 • 0 new comments -
Fix typo in Gemma3ForCausalLM doctest
#37374 commented on
Jul 7, 2025 • 0 new comments -
If a training job job failed MLFlow will not be reported and MLFlow shows job still running
#30333 commented on
Jul 15, 2025 • 0 new comments -
"pipeline" is not exported from module "transformers"
#37646 commented on
Jul 16, 2025 • 0 new comments -
[DOCS] Add `pruna` as optimization framework
#38740 commented on
Jul 16, 2025 • 0 new comments -
Attention refactor in #35235 adds a `__getitem__` into the forward pass, which causes errors with torch dynamo.
#38271 commented on
Jul 16, 2025 • 0 new comments -
Modernbert 3D attention mask
#38040 commented on
Jul 16, 2025 • 0 new comments -
Automatic dynamic batch size selection for DataCollatorWithFlattening
#33945 commented on
Jul 16, 2025 • 0 new comments -
AutoModelForCausalLM.from_pretrained(..., device_map=...) ignore `Tensor.retain_grad()` in Multi-GPUs setting
#39036 commented on
Jul 16, 2025 • 0 new comments -
YaRN: factor is not effective with original_max_position_embeddings
#38224 commented on
Jul 16, 2025 • 0 new comments -
Potential Memory Leak or Caching in Fast Image Processor
#38656 commented on
Jul 16, 2025 • 0 new comments -
CPMANT Model Fails to Run Following Official Tutorial
#39026 commented on
Jul 16, 2025 • 0 new comments -
Flex attention support with arbitrary 4d mask for LlamaModel
#33898 commented on
Jul 17, 2025 • 0 new comments -
Add `pruna` integration for loading model through `transformers.from_pretrained` / `pipeline`.
#37971 commented on
Jul 17, 2025 • 0 new comments -
We now require users to upgrade torch to at least v2.6 in order to use the function.
#38464 commented on
Jul 17, 2025 • 0 new comments -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 commented on
Jul 18, 2025 • 0 new comments -
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct
#33399 commented on
Jul 18, 2025 • 0 new comments -
Safetensors deserializing silently mishandles tied parameters
#38870 commented on
Jul 18, 2025 • 0 new comments -
Error: StaticCache.__init__() got an unexpected keyword argument 'batch_size'
#38914 commented on
Jul 20, 2025 • 0 new comments -
Implement Titans Architecture with GRPO Fine-Tuning
#36352 commented on
Jul 21, 2025 • 0 new comments -
Add support for BAGEL from ByteDance
#38267 commented on
Jun 30, 2025 • 0 new comments -
Support Asynchronous Evaluation on Separate GPU in `Trainer`
#38829 commented on
Jun 30, 2025 • 0 new comments -
Trainer.training_step incorrectly normalizes mean token loss when n_gpu > 1
#37474 commented on
Jul 1, 2025 • 0 new comments -
AutoConfig has potential issue with composite config.
#38258 commented on
Jul 2, 2025 • 0 new comments -
scale loss per token/local sequence for discrete system representation
#38854 commented on
Jul 3, 2025 • 0 new comments -
Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model
#37248 commented on
Jul 7, 2025 • 0 new comments -
Pretrainedtokenizerfast Segmentation fault
#39099 commented on
Jul 7, 2025 • 0 new comments -
How to use other acceleration apis of npu?
#39105 commented on
Jul 7, 2025 • 0 new comments -
[i18n-es] Translating docs to Spanish
#28936 commented on
Jul 7, 2025 • 0 new comments -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 commented on
Jul 8, 2025 • 0 new comments -
Incorrect scaling of Gemma embeddings in float32 regime
#38702 commented on
Jul 9, 2025 • 0 new comments -
Object detection training/fine-tuning for Owl-vit/Owlv2
#33664 commented on
Jul 10, 2025 • 0 new comments -
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
#38966 commented on
Jul 10, 2025 • 0 new comments -
Exporting Llava decoder into ONNX format
#38924 commented on
Jul 11, 2025 • 0 new comments -
Llama4 inference encounter unsupported op in dynamo ?
#38118 commented on
Jul 11, 2025 • 0 new comments -
Loading audio in video from video URLs fail with chat template
#39076 commented on
Jul 14, 2025 • 0 new comments -
Add Matching Anything by Segmenting Anything (MASA) MOT tracking model
#32164 commented on
Jul 14, 2025 • 0 new comments -
`MoshiIntegrationTests` started to fail after #34464
#38725 commented on
Jul 15, 2025 • 0 new comments -
Improve CI/CD by completing migration from setup.py to pyproject.toml
#38928 commented on
Jul 15, 2025 • 0 new comments -
Vision Encoder-Decoder fails with LLaMA decoder due to missing cross-attention implementation
#34674 commented on
Jul 26, 2025 • 0 new comments -
Exception while inference Qwen2VL and Qwen2VL, assert module.weight.shape[1] == 1
#38665 commented on
Jul 27, 2025 • 0 new comments -
pytorch_utils.py > isin_mps_friendly > RuntimeError: Expected elements.dtype() == test_elements.dtype() to be true, but got false.
#37423 commented on
Jul 27, 2025 • 0 new comments -
AttributeError: 'HfTrainerDeepSpeedConfig' object has no attribute 'is_zero3'
#39081 commented on
Jul 28, 2025 • 0 new comments -
facebook/dinov2-with-registers-giant does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
#39075 commented on
Jul 28, 2025 • 0 new comments -
Inefficient default GELU implementation in GPT2
#39073 commented on
Jul 28, 2025 • 0 new comments -
Inefficient memory resharding in attention layer
#39072 commented on
Jul 28, 2025 • 0 new comments -
enable GraniteMoeHybridIntegrationTest in UT
#38542 commented on
Jul 28, 2025 • 0 new comments -
Streaming mode support on HF vs kyutai-labs for the mimi model
#38535 commented on
Jul 28, 2025 • 0 new comments -
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#38442 commented on
Jul 28, 2025 • 0 new comments -
Whisper word-level timestamp extraction fails with beam search
#36093 commented on
Jul 28, 2025 • 0 new comments -
Issue with module.smart_apply(module._initialize_weights) in the initialize_weights Function of modeling_utils.py
#39027 commented on
Jul 28, 2025 • 0 new comments -
Output logits differ significantly for different attn_implementations on image inputs
#39067 commented on
Jul 28, 2025 • 0 new comments -
ValueError: GGUF model with architecture deci is not supported yet.
#37736 commented on
Jul 28, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Jul 29, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Jul 29, 2025 • 0 new comments -
Implement MambaForSequenceClassification
#31155 commented on
Jul 15, 2025 • 0 new comments -
DeepSpeed sequence parallelism (aka Ulysses) integration with HF transformer
#32305 commented on
Jul 15, 2025 • 0 new comments -
Load a pretrainedfast tokenizer if fast=true and tokenizer.json exists
#33751 commented on
Jul 15, 2025 • 0 new comments -
`AutoTokenizer.from_pretrained` does not propagate `token`
#39030 commented on
Jul 21, 2025 • 0 new comments -
`load_balancing_loss_func` doesn't support 4D attention mask
#38910 commented on
Jul 21, 2025 • 0 new comments -
Transformers version causing my finetuned model to hallucinate
#38378 commented on
Jul 21, 2025 • 0 new comments -
Significant WER Increase with Whisper Chunking Compared to Long-Form Transcription
#38347 commented on
Jul 21, 2025 • 0 new comments -
Caching of model code in ~/.cache/huggingface/modules/transformers_modules
#39107 commented on
Jul 22, 2025 • 0 new comments -
Resuming training from an interrupted checkpoint fails to save the final checkpoint.
#38939 commented on
Jul 22, 2025 • 0 new comments -
How to streaming output audio of Qwen2.5-omni-7b
#37570 commented on
Jul 22, 2025 • 0 new comments -
tokenizer decode decode with timestamp fails for extended vocabulary
#35330 commented on
Jul 22, 2025 • 0 new comments -
Whisper v-3 pipeline requiring a lot of memory when setting return_timestamps="word"
#27834 commented on
Jul 22, 2025 • 0 new comments -
add MiniCPM-o
#37029 commented on
Jul 22, 2025 • 0 new comments -
🌐 [i18n-KO] Translating docs to Korean
#20179 commented on
Jul 24, 2025 • 0 new comments -
Model implmenetation using Liger Kernel layers
#38416 commented on
Jul 24, 2025 • 0 new comments -
Support for Multiple Datasets and Domain-Specific Loss Calculation in Trainer
#30725 commented on
Jul 24, 2025 • 0 new comments -
Trainer/accelerate doesn't save model when using FSDP with SHARDED_STATE_DICT
#30491 commented on
Jul 24, 2025 • 0 new comments -
Not able to use flash attention with torch.compile with model like BERT
#39017 commented on
Jul 25, 2025 • 0 new comments -
'Mistral3Model' object has no attribute 'prepare_inputs_for_generation'
#39007 commented on
Jul 25, 2025 • 0 new comments -
Segfault on Apple M4 using AutoModelForSequenceClassification with BETO model on CPU
#39020 commented on
Jul 25, 2025 • 0 new comments -
pytorch version 1.8.1 compatibility
#39049 commented on
Jul 26, 2025 • 0 new comments -
Only with newest version (4.52.4): from_pretrained() esm.embeddings.position_embeddings.weight missing
#39038 commented on
Jul 26, 2025 • 0 new comments