Pulse · huggingface/transformers · GitHub


June 29, 2025 – July 29, 2025

Overview

518 Active pull requests

279 Active issues

7 Releases published by 3 people

v4.53.1 Patch Release v4.53.1
published Jul 4, 2025
v4.53.2 Patch Release v4.53.2
published Jul 11, 2025
v4.53.2-modernbert-decoder-preview ModernBERT Decoder (based on v4.53.2)
published Jul 16, 2025
v4.53.3 Patch release v4.53.3
published Jul 22, 2025
v4.53.2-Ernie-4.5-preview Ernie-4.5 and Ernie-4.5 MoE (based on v4.53.2)
published Jul 23, 2025
v4.54.0 v4.54.0: Kernels, Transformers Serve, Ernie, Voxtral, LFM2, DeepSeek v2, ModernBERT Decoder...
published Jul 25, 2025
4.54.1 Patch release 4.54.1
published Jul 29, 2025

377 Pull requests merged by 154 people

Remove python3.7 reference from doc link
#39706 merged Jul 29, 2025
[docs] Ko doc fixes after toc update
#39660 merged Jul 29, 2025
Fix Cache.max_cache_len max value for Hybrid models
#39737 merged Jul 29, 2025
fix(trainer): Correct loss scaling for incomplete gradient accumulation steps
#39659 merged Jul 29, 2025
🌐 [i18n-KO] Translated how_to_hack_models.md to Korean
#39536 merged Jul 29, 2025
🌐 [i18n-KO] Translated perf_train_gpu_one.md to Korean
#39552 merged Jul 29, 2025
🌐 [i18n-KO] Translated pipeline_gradio.md to Korean
#39520 merged Jul 29, 2025
🌐 [i18n-KO] Translated tokenizer.md to Korean
#39532 merged Jul 29, 2025
🌐 [i18n-KO] Translated tvp.md to Korean
#39578 merged Jul 29, 2025
🌐 [i18n-KO] Translated albert.md to Korean
#39524 merged Jul 29, 2025
🌐 [i18n-KO] Translated main_classes/peft.md
#39515 merged Jul 29, 2025
[modenbert] fix regression
#39750 merged Jul 29, 2025
add libcst to extras["testing"] in setup.py
#39761 merged Jul 29, 2025
Fix version issue in modeling_utils.py
#39759 merged Jul 29, 2025
Enable xpu allocator on caching_allocator_warmup
#39654 merged Jul 29, 2025
Support loading Qwen3 MoE GGUF
#39638 merged Jul 29, 2025
Fix GPT2 with cross attention
#39754 merged Jul 29, 2025
Avoid OOM when other tests are failing
#39758 merged Jul 29, 2025
AMD disable torchcodec
#39757 merged Jul 29, 2025
Use --gpus all in workflow files
#39752 merged Jul 29, 2025
Apply several ruff SIM rules
#37283 merged Jul 29, 2025
Fix mamba regression
#39728 merged Jul 29, 2025
Update IMPORTANT_MODELS list
#39734 merged Jul 29, 2025
update GemmaIntegrationTest::test_model_2b_bf16_dola again
#39731 merged Jul 29, 2025
Fix: add back base model plan
#39733 merged Jul 29, 2025
[Fix] import two missing typos in models/__init__.py for typo checking
#39745 merged Jul 29, 2025
fix cache inheritance
#39748 merged Jul 29, 2025
extend more trainer test cases to XPU, all pass
#39652 merged Jul 29, 2025
BLIPs clean-up
#35560 merged Jul 29, 2025
Add Fast Segformer Processor
#37024 merged Jul 28, 2025
Superpoint fast image processor
#37804 merged Jul 28, 2025
Fix AMD dockerfile for audio models
#39669 merged Jul 28, 2025
Fix cache-related tests
#39676 merged Jul 28, 2025
Fix Layer device placement in Caches
#39732 merged Jul 28, 2025
Fix Qwen2AudioForConditionalGeneration.forward() and test_flash_attn_kernels_inference_equivalence
#39503 merged Jul 28, 2025
skip Glm4MoeModelTest::test_torch_compile_for_training
#39670 merged Jul 28, 2025
Update QAPipelineTests::test_large_model_course after #39193
#39666 merged Jul 28, 2025
mllama outputs refactor
#39643 merged Jul 28, 2025
Remove all expired deprecation cycles
#39725 merged Jul 28, 2025
[CI] Add Eric to comment slow ci
#39601 merged Jul 28, 2025
PATCH: add back n-dim device-mesh + fix tp trainer saving
#39693 merged Jul 28, 2025
Add self-hosted runner scale set workflow for mi325 CI
#39651 merged Jul 28, 2025
[configuration] remove redundant classmethod
#38812 merged Jul 28, 2025
update ernie model card
#39657 merged Jul 28, 2025
[processors] add tests for helper fn
#39629 merged Jul 28, 2025
xpu optimization for generation case
#39573 merged Jul 28, 2025
fix(tokenization): check token.content for trie
#39587 merged Jul 28, 2025
Fix missing initialization of FastSpeech2Conformer
#39689 merged Jul 28, 2025
fix missing model._tp_size from ep refactor
#39688 merged Jul 26, 2025
More robust tied weight test
#39681 merged Jul 25, 2025
Add padding-free to Granite hybrid moe models
#39677 merged Jul 25, 2025
Fix tied weight test
#39680 merged Jul 25, 2025
fix break for ckpt without _tp_plan
#39658 merged Jul 25, 2025
Add EXAONE 4.0 model
#39129 merged Jul 25, 2025
Support typing.Literal as type of tool parameters or return value
#39633 merged Jul 25, 2025
Add ep
#39501 merged Jul 25, 2025
bad_words_ids no longer slow on mps
#39556 merged Jul 25, 2025
Add xlstm model
#39665 merged Jul 25, 2025
Use auto_docstring for perception_lm fast image processor
#39679 merged Jul 25, 2025
fix: HWIO to OIHW
#39200 merged Jul 25, 2025
Fix auto_docstring crashing when dependencies are missing
#39564 merged Jul 25, 2025
Add support for DeepseekAI's DeepseekVL
#36248 merged Jul 25, 2025
Add missing flag for CacheLayer
#39678 merged Jul 25, 2025
Add evolla rebase main
#36232 merged Jul 25, 2025
update expected outputs for whisper after #38778
#39304 merged Jul 25, 2025
fix kyutai tests
#39416 merged Jul 25, 2025
Fixes the BC
#39636 merged Jul 25, 2025
Delete bad rebasing functions
#39672 merged Jul 25, 2025
[Ernie 4.5] Post merge adaptations
#39664 merged Jul 25, 2025
[CI] revert device in test_export_static_cache
#39662 merged Jul 25, 2025
Fix ModernBERT Decoder model
#39671 merged Jul 25, 2025
🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor
#39591 merged Jul 25, 2025
Rename huggingface_cli to hf
#39630 merged Jul 25, 2025
fix(voxtral): correct typo in apply_transcription_request
#39572 merged Jul 25, 2025
make fixup
#39661 merged Jul 25, 2025
[docs] fix ko cache docs
#39644 merged Jul 25, 2025
Make pytorch examples UV-compatible
#39635 merged Jul 25, 2025
revert change to cu_seqlen_k and max_k when preparing from position_ids
#39653 merged Jul 25, 2025
Fix: explicit not none check for tensors in flash attention
#39639 merged Jul 25, 2025
[attention] fix test for packed padfree masking
#39582 merged Jul 25, 2025
Add owlv2 fast processor
#39041 merged Jul 25, 2025
revert behavior of _prepare_from_posids
#39622 merged Jul 24, 2025
[Voxtral] values for A10 runners
#39605 merged Jul 24, 2025
[timm] new timm pin
#39640 merged Jul 24, 2025
Fix EfficientLoFTR model id in tests
#39621 merged Jul 24, 2025
Update recent processors for vLLM backend
#39583 merged Jul 24, 2025
[Docs] Translate audio_classification.md from English to Spanish
#39513 merged Jul 23, 2025
standardized YOLOS model card according to template in #36979
#39528 merged Jul 23, 2025
Feature/standardize opt model card
#39568 merged Jul 23, 2025
🔴 Fix EnCodec internals and integration tests
#39431 merged Jul 23, 2025
Fix DAC integration tests and checkpoint conversion.
#39313 merged Jul 23, 2025
Move openai import
#39613 merged Jul 23, 2025
Transformers serve VLM
#39454 merged Jul 23, 2025
Fix important models CI
#39576 merged Jul 23, 2025
Fix typos and grammar issues in documentation and code
#39598 merged Jul 23, 2025
Allow device_mesh have multiple dim
#38949 merged Jul 23, 2025
enable triton backend on awq xpu
#39443 merged Jul 23, 2025
[idefics3] fix for vLLM
#39470 merged Jul 23, 2025
fix moe routing_weights
#39581 merged Jul 23, 2025
FP-Quant support
#38696 merged Jul 23, 2025
Rename supports_static_cache to can_compile_fullgraph
#39505 merged Jul 23, 2025
[Trackio] Allow single-gpu training and monitor power
#39595 merged Jul 23, 2025
Generic task-specific base classes
#39584 merged Jul 23, 2025
Fix DynamicCache and simplify Cache classes a bit
#39590 merged Jul 23, 2025
Mask2former & Maskformer Fast Image Processor
#35685 merged Jul 23, 2025
🎯 Trackio integration
#38814 merged Jul 22, 2025
[WIP] Add OneformerFastImageProcessor
#38343 merged Jul 22, 2025
Fix link in "Inference server backends" doc
#39589 merged Jul 22, 2025
Torchdec RuntimeError catch
#39580 merged Jul 22, 2025
[Paged-Attention] Handle continuous batching for repetition penalty
#39457 merged Jul 22, 2025
updated mistral3 model card
#39531 merged Jul 22, 2025
Update docs/source/ko/_toctree.yml
#39516 merged Jul 22, 2025
[cache refactor] Move all the caching logic to a per-layer approach
#39106 merged Jul 22, 2025
General weight initialization scheme
#39579 merged Jul 22, 2025
Add AMD GPU expectations for LLaVA tests
#39486 merged Jul 22, 2025
Kernels flash attn
#39474 merged Jul 22, 2025
Add AMD expectations to Mistral3 tests
#39481 merged Jul 22, 2025
[docs] Create page on inference servers with transformers backend
#39550 merged Jul 22, 2025
[docs] update attention implementation and cache docs
#39547 merged Jul 22, 2025
Add AMD test expectations to DETR model
#39539 merged Jul 22, 2025
feat: add support for gradient checkpointing for TimmWrapperModel and TimmWrapperForImageClassification
#39287 merged Jul 22, 2025
Fixes needed for n-d parallelism and TP
#39562 merged Jul 22, 2025
Bump AMD container for 2.7.1 PyTorch
#39458 merged Jul 22, 2025
Add EfficientLoFTR model
#36355 merged Jul 22, 2025
[gemma3] fix bidirectional image mask
#39396 merged Jul 22, 2025
Update OLMoE model card
#39344 merged Jul 21, 2025
Update modernbertdecoder docs
#39453 merged Jul 21, 2025
[CI] Fix post merge ernie 4.5
#39561 merged Jul 21, 2025
[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps)
#39489 merged Jul 21, 2025
[Ernie 4.5] Add ernie text models
#39228 merged Jul 21, 2025
Refactor embedding input/output getter/setter
#39339 merged Jul 21, 2025
🌐 [i18n-KO] Translated perf_infer_gpu_multi.md to Korean
#39441 merged Jul 21, 2025
[Fast image processor] refactor fast image processor glm4v
#39490 merged Jul 21, 2025
fix ndim check of device_mesh for TP
#39538 merged Jul 21, 2025
Refactor MambaCache to modeling_mamba.py
#38086 merged Jul 21, 2025
Fix Docstring of BarkProcessor
#39546 merged Jul 21, 2025
use the enable_gqa param in torch.nn.functional.scaled_dot_product_at…
#39412 merged Jul 21, 2025
Fix missing initializations for models created in 2023
#39239 merged Jul 21, 2025
Raise TypeError instead of ValueError for invalid types
#38660 merged Jul 21, 2025
Fix pylint warnings
#39477 merged Jul 21, 2025
Fix Qwen Omni integration test
#39553 merged Jul 21, 2025
🚨🚨🚨 [Trainer] Enable average_tokens_across_devices by default in TrainingArguments
#39395 merged Jul 21, 2025
Rename _supports_flash_attn_2 in examples and tests
#39471 merged Jul 21, 2025
Fix the check in flex test
#39548 merged Jul 21, 2025
Fix bad tensor shape in failing Hubert test.
#39502 merged Jul 21, 2025
GLM-4 Update
#39393 merged Jul 21, 2025
[qwen2 vl] fix packing with all attentions
#39447 merged Jul 21, 2025
[gemma3] support sequence classification task
#39465 merged Jul 21, 2025
Fix placeholders replacement logic in auto_docstring
#39433 merged Jul 18, 2025
Update SAM/SAM HQ attention implementation + fix Cuda sync issues
#39386 merged Jul 18, 2025
Improve @auto_docstring doc and rename args_doc.py to auto_docstring.py
#39439 merged Jul 18, 2025
Add fast image processor SAM
#39385 merged Jul 18, 2025
Fix BatchEncoding.to() for nested elements
#38985 merged Jul 18, 2025
[gemma3] Fix do_convert_rgb in image processors.
#39438 merged Jul 18, 2025
[chat template] return assistant mask in processors
#38545 merged Jul 18, 2025
[dependencies] Update datasets pin
#39500 merged Jul 18, 2025
Slack CI bot: set default result for non-existing artifacts
#39499 merged Jul 18, 2025
🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling
#39423 merged Jul 18, 2025
[doc builder job] temporary pyarrow pin
#39496 merged Jul 18, 2025
Add voxtral
#39429 merged Jul 18, 2025
Fix typing order
#39467 merged Jul 17, 2025
Add unified logits_to_keep support to LLMClass
#39472 merged Jul 17, 2025
[serve] Add speech to text (/v1/audio/transcriptions)
#39434 merged Jul 17, 2025
Update integration_utils.py
#39469 merged Jul 17, 2025
fix: ImageTextToTextPipeline handles user-defined generation_config
#39374 merged Jul 17, 2025
Enable some ruff checks for performance and readability
#39383 merged Jul 17, 2025
Fix convert_and_export_with_cache failures for GPU models
#38976 merged Jul 17, 2025
Update GemmaIntegrationTest::test_model_2b_bf16_dola
#39362 merged Jul 17, 2025
fix a comment typo in utils.py
#39459 merged Jul 17, 2025
Use newer typing notation
#38934 merged Jul 17, 2025
Fix tests due to breaking change in accelerate
#39451 merged Jul 17, 2025
fix max_length calculating using cu_seq_lens
#39341 merged Jul 17, 2025
fix(pipelines): QA pipeline returns fewer than top_k results in batch mode
#39193 merged Jul 17, 2025
Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings
#38822 merged Jul 16, 2025
create ijepa modelcard (ref : PR #36979 ).
#39354 merged Jul 16, 2025
Improve grammar and clarity in perf_hardware.md
#39428 merged Jul 16, 2025
fix cached file error when repo type is dataset
#36909 merged Jul 16, 2025
Fix indentation bug in SmolVLM image processor causing KeyError
#39452 merged Jul 16, 2025
Updated Megatron conversion script for gpt2 checkpoints
#38969 merged Jul 16, 2025
[CI] Fix partially red CI
#39448 merged Jul 16, 2025
Fixes #39204: add fallback if get_base_model missing
#39226 merged Jul 16, 2025
make the loss context manager easier to extend
#39321 merged Jul 16, 2025
Remove something that should have never been there
#38254 merged Jul 16, 2025
Fix processor tests
#39450 merged Jul 16, 2025
[Bugfix] [Quantization] Remove unused init arg
#39324 merged Jul 16, 2025
Better typing for model.config
#39132 merged Jul 16, 2025
Fix typo in generation configuration for Janus model weight conversion
#39432 merged Jul 16, 2025
Responses API in transformers serve
#39155 merged Jul 16, 2025
[cache] make all classes cache compatible finally
#38635 merged Jul 16, 2025
docs: add missing numpy import to minimal example
#39444 merged Jul 16, 2025
Remove runtime conditions for type checking
#37340 merged Jul 16, 2025
Add StableAdamW Optimizer
#39446 merged Jul 16, 2025
add test scanner
#39419 merged Jul 16, 2025
Fix missing definition of diff_file_url in notification service
#39445 merged Jul 16, 2025
Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer
#31870 merged Jul 16, 2025
Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor
#39372 merged Jul 16, 2025
Defaults to adamw_torch_fused for Pytorch>=2.8
#37358 merged Jul 16, 2025
Fix L270 - hasattr("moe_args") returning False error
#38715 merged Jul 16, 2025
[chat template] add a testcase for kwargs
#39415 merged Jul 16, 2025
Fixed a bug calculating cross entropy loss in JetMoeForCausalLM
#37830 merged Jul 16, 2025
Remove double soft-max in load-balancing loss. Fixes #39055 .
#39056 merged Jul 16, 2025
[Core] [Offloading] Fix saving offloaded submodules
#39280 merged Jul 16, 2025
[autodocstring] add video and audio inputs
#39420 merged Jul 16, 2025
Responses API (to be merged into #39155)
#39338 merged Jul 16, 2025
CI workflow for performed test regressions
#39198 merged Jul 16, 2025
docs: update LightGlue docs
#39407 merged Jul 15, 2025
docs: update SuperGlue docs
#39406 merged Jul 15, 2025
[vlm] fix loading of retrieval VLMs
#39242 merged Jul 15, 2025
handle training summary when creating modelcard but offline mode is set
#37095 merged Jul 15, 2025
Remove residual quantization attribute from dequantized models
#39373 merged Jul 15, 2025
Remove deprecated audio utils functions
#39330 merged Jul 15, 2025
Fix bugs in pytorch example run_clm when streaming is enabled
#39286 merged Jul 15, 2025
Fix bugs from pipeline preprocessor overhaul
#39425 merged Jul 15, 2025
refactor: remove set_tracer_provider and set_meter_provider calls
#39422 merged Jul 15, 2025
Fix invalid property
#39384 merged Jul 15, 2025
set document_question_answering pipeline _load_tokenizer to True
#39411 merged Jul 15, 2025
Ignore extra position embeddings weights for ESM
#39063 merged Jul 15, 2025
support loading qwen3 gguf
#38645 merged Jul 15, 2025
Add ModernBERT Decoder Models - ModernBERT, but trained with CLM!
#38967 merged Jul 15, 2025
Fix typo in /v1/models output payload
#39414 merged Jul 15, 2025
[refactor] set attention implementation
#38974 merged Jul 15, 2025
Fix/siglip2 pooling comment
#39378 merged Jul 14, 2025
Update phi4_multimodal.md
#38830 merged Jul 14, 2025
[Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
#39391 merged Jul 14, 2025
Use np.pad instead of np.lib.pad.
#39346 merged Jul 14, 2025
🚨 Totally rewrite how pipelines load preprocessors
#38947 merged Jul 14, 2025
Remove do_reduce_labels Argument from model initialization in run_semantic_segmentation_no_trainer
#39322 merged Jul 14, 2025
Fix Lfm2 and common tests
#39398 merged Jul 14, 2025
Deprecate AutoModelForVision2Seq
#38900 merged Jul 14, 2025
[Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor
#39333 merged Jul 14, 2025
[BLIP] remove cache from Qformer
#39335 merged Jul 14, 2025
[shieldgemma] fix checkpoint loading
#39348 merged Jul 14, 2025
Fix overriding Fast Image/Video Processors instance attributes affect other instances
#39363 merged Jul 12, 2025
update docker file to use latest timm (for perception_lm)
#39380 merged Jul 12, 2025
Update Model Card for Encoder Decoder Model
#39272 merged Jul 11, 2025
fix gpt2 usage doc
#39351 merged Jul 11, 2025
Updated CamemBERT model card to new standardized format
#39227 merged Jul 11, 2025
Update Readme to Run Multiple Choice Script from Example Directory
#39323 merged Jul 11, 2025
Add mistral common support
#38906 merged Jul 11, 2025
Remove device check in HQQ quantizer
#39299 merged Jul 11, 2025
Small fixes for utils/check_docstrings.py
#38915 merged Jul 11, 2025
fix failing test_sdpa_can_dispatch_on_flash
#39259 merged Jul 11, 2025
update cb TP
#39361 merged Jul 11, 2025
Fix link for testpypi
#39360 merged Jul 11, 2025
PerceptionLM
#37878 merged Jul 11, 2025
Updated Switch Transformers model card with standardized format (Issue #36979)
#39305 merged Jul 10, 2025
Update check_modular_conversion
#37456 merged Jul 10, 2025
Add a default value for position_ids in masking_utils
#39310 merged Jul 10, 2025
[Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups
#39263 merged Jul 10, 2025
[tests] tag serve tests as slow
#39343 merged Jul 10, 2025
[modeling][lfm2] LFM2: Remove deprecated seen_tokens
#39342 merged Jul 10, 2025
LFM2
#39340 merged Jul 10, 2025
[server] add tests and fix passing a custom generation_config
#39230 merged Jul 10, 2025
Handle DAC conversion when using weight_norm with newer PyTorch versions
#36393 merged Jul 10, 2025
fix phi3 tests
#39312 merged Jul 10, 2025
fix Glm4v batch videos forward
#39172 merged Jul 10, 2025
Delete deprecated stuff
#38838 merged Jul 10, 2025
Fix broken SAM after #39120
#39289 merged Jul 9, 2025
enable static cache on TP model
#39164 merged Jul 9, 2025
Fix max_length_q and max_length_k types to flash_attn_varlen_func
#37206 merged Jul 9, 2025
Granite speech speedups
#39197 merged Jul 9, 2025
Fix typo: langauge -> language
#39317 merged Jul 9, 2025
docs: update LLaVA-NeXT model card
#38894 merged Jul 9, 2025
skip files in src/ for doctest (for now)
#39316 merged Jul 9, 2025
Updated the Model docs - for the MARIAN model
#39138 merged Jul 9, 2025
add stevhliu to the list in self-comment-ci.yml
#39315 merged Jul 9, 2025
Fix consistency and a few docstrings warnings
#39314 merged Jul 9, 2025
🌐 [i18n-KO] Translated quark.md to Korean
#39268 merged Jul 9, 2025
Add DeepSeek V2 Model into Transformers
#36400 merged Jul 9, 2025
[sliding window] revert and deprecate
#39301 merged Jul 9, 2025
[modular] Allow method with the same name in case of @property decorator
#39308 merged Jul 9, 2025
skip test_torchscript_* for now until the majority of the community ask for it
#39307 merged Jul 9, 2025
fix aria tests
#39277 merged Jul 9, 2025
[flash attn 3] bring back flags
#39294 merged Jul 9, 2025
Fix SDPA attention precision issue in Qwen2.5-VL
#37363 merged Jul 9, 2025
[Tests] Update model_id in AIMv2 Tests
#39281 merged Jul 8, 2025
Update T5gemma
#39210 merged Jul 8, 2025
Add torchcodec in docstrings/tests for datasets 4.0
#39156 merged Jul 8, 2025
Add trust_remote_code in LightGlueConfig
#39253 merged Jul 8, 2025
fix flaky test_generate_compile_model_forward
#39276 merged Jul 8, 2025
Refactor PretrainedConfig.__init__ method to make it more explicit
#39158 merged Jul 8, 2025
[smollm3] add tokenizer mapping for smollm3
#39271 merged Jul 8, 2025
[pagged-attention] fix off-by-1 error in pagged attention generation
#39258 merged Jul 8, 2025
[CI] fix docs
#39273 merged Jul 8, 2025
Add Aimv2 model
#36625 merged Jul 8, 2025
Add Doge model
#35891 merged Jul 8, 2025
Fix errors when use verl to train GLM4.1v model
#39199 merged Jul 8, 2025
fix recompiles due to instance key, and deepcopy issues
#39270 merged Jul 8, 2025
fix(generation): stop beam search per-instance when heuristic satisfied
#38778 merged Jul 8, 2025
remove broken block
#39255 merged Jul 8, 2025
Skip test_eager_matches sdpa generate and update an integration test for blip-like models
#39248 merged Jul 8, 2025
Fix license text, duplicate assignment, and typo in constant names
#39250 merged Jul 8, 2025
fix xpu failures on PT 2.7 and 2.8 w/o IPEX and enable hqq cases on XPU
#39187 merged Jul 8, 2025
Glm 4 doc
#39247 merged Jul 8, 2025
Update LED model card
#39233 merged Jul 7, 2025
fix some flaky tests in tests/generation/test_utils.py
#39254 merged Jul 7, 2025
Simplify Mixtral and its modular children
#39252 merged Jul 7, 2025
Add segmentation_maps support to MobileNetV2ImageProcessor
#37312 merged Jul 7, 2025
Clarify per_device_train_batch_size scaling in TrainingArguments (#38…
#38857 merged Jul 7, 2025
Add Korean translation for glossary.md
#38804 merged Jul 7, 2025
Update tiny-agents example
#39245 merged Jul 7, 2025
adjust input and output texts for test_modeling_recurrent_gemma.py
#39190 merged Jul 7, 2025
enable xpu on kv-cache and hqq doc
#39246 merged Jul 7, 2025
Fix patch helper
#39216 merged Jul 7, 2025
RotaryEmbeddings change is not None -> isinstance(..., dict)
#39145 merged Jul 7, 2025
fix fastspeech2_conformer tests
#39229 merged Jul 7, 2025
[bugfix] fix flash attention 2 unavailable error on Ascend NPU
#39166 merged Jul 7, 2025
[modular] Simplify logic and docstring handling
#39185 merged Jul 7, 2025
Make _compute_dynamic_ntk_parameters exportable
#39171 merged Jul 7, 2025
fix bug using FSDP V1 will lead to model device not properly set
#39177 merged Jul 7, 2025
Don't send new comment if the previous one is less than 30 minutes (unless the content is changed)
#39170 merged Jul 7, 2025
fix typo in Gemma3n notes
#39196 merged Jul 7, 2025
[modular] Follow global indexing and attribute setting, and their dependencies
#39180 merged Jul 7, 2025
Fix missing fast tokenizer/image_processor in whisper/qwen2.5-omni processor
#39244 merged Jul 7, 2025
Replace einsum with unsqueeze
#39234 merged Jul 7, 2025
Expectations re-order and corrected FA3 skip
#39195 merged Jul 7, 2025
[video processors] Support float fps for precise frame sampling
#39134 merged Jul 7, 2025
Refactor the way we handle outputs for new llamas and new models
#39120 merged Jul 5, 2025
Update expected values (after switching to A10) - part 8 - Final
#39220 merged Jul 4, 2025
Update expected values (after switching to A10) - part 7
#39218 merged Jul 4, 2025
Add packed tensor format support for flex/sdpa/eager through the mask!
#39194 merged Jul 4, 2025
Update expected values (after switching to A10) - part 6
#39207 merged Jul 3, 2025
Update expected values (after switching to A10) - part 5
#39205 merged Jul 3, 2025
Fix continuous batching in transformers serve
#39149 merged Jul 3, 2025
[serve] Cursor support, move docs into separate page, add more examples
#39133 merged Jul 3, 2025
Better return typehints for from_pretrained
#39184 merged Jul 3, 2025
Update expected values (after switching to A10) - part 4
#39189 merged Jul 3, 2025
[Dia] Change ckpt path in docs
#39181 merged Jul 3, 2025
Fix many HPU failures in the CI
#39066 merged Jul 3, 2025
Decouple device_map='auto' and tp_plan='auto'
#38942 merged Jul 3, 2025
when delaying optimizer creation only prepare the model
#39152 merged Jul 3, 2025
[glm4v] fix video inference
#39174 merged Jul 3, 2025
Test fixes for Aria (and some Expectation for llava_next_video)
#39131 merged Jul 2, 2025
Update expected values (after switching to A10) - part 3
#39179 merged Jul 2, 2025
Update expected values (after switching to A10) - part 2
#39165 merged Jul 2, 2025
Random serve fixes
#39176 merged Jul 2, 2025
[serve] Model name or path should be required
#39178 merged Jul 2, 2025
[generate] document non-canonical beam search default behavior
#39000 merged Jul 2, 2025
[docs] ViTPose
#38630 merged Jul 2, 2025
Reduce Glm4v model test size significantly
#39173 merged Jul 2, 2025
Fix missing initializations for models created in 2024
#38987 merged Jul 2, 2025
Blip2 fixes
#39080 merged Jul 2, 2025
Fix multimodal processor get duplicate arguments when receive kwargs for initialization
#39125 merged Jul 2, 2025
[Fix] Make EoMT compatible with pipeline
#39122 merged Jul 2, 2025
[smolvlm] fix video inference
#39147 merged Jul 2, 2025
fix default value of config to match checkpionts in LLaVa-OV models
#39163 merged Jul 2, 2025
Add activation sparsity reference in gemma3n doc
#39160 merged Jul 2, 2025
fix llama tests
#39161 merged Jul 1, 2025
Update expected values (after switching to A10)
#39157 merged Jul 1, 2025
Suggest jobs to use in run-slow
#39100 merged Jul 1, 2025
update bnb ground truth
#39117 merged Jul 1, 2025
fix: remove undefined variable
#39146 merged Jul 1, 2025
Change @lru_cache() to @lru_cache to match styles from #38883.
#39093 merged Jul 1, 2025
Fix: Ensure wandb logs config in offline mode
#38992 merged Jul 1, 2025
Fix missing fsdp & trainer jobs in daily CI
#39153 merged Jul 1, 2025
fix: fixed wrong concatenation which made batching results wrong
#38850 merged Jul 1, 2025
[VLMs] support passing embeds along with pixels
#38467 merged Jul 1, 2025
LlamaAttention forward function type hint is incorrect from new Branch
#38998 merged Jul 1, 2025
[qwen2-vl] fix FA2 inference
#39121 merged Jul 1, 2025
feat: support indivisible shards for TP model loading and TPlizing.
#37220 merged Jul 1, 2025
fix caching_allocator_warmup with tie weights
#39070 merged Jul 1, 2025
🚨 Don't use cache in non-generative models
#38751 merged Jul 1, 2025
Several fixes for Gemma3n
#39135 merged Jul 1, 2025
Fix key mapping for VLMs
#39029 merged Jul 1, 2025
[Whisper] update token timestamps tests
#39126 merged Jun 30, 2025
Update BigBirdPegasus model card
#39104 merged Jun 30, 2025
switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8
#39024 merged Jun 30, 2025
docs: correct two typos in awesome-transformers.md
#39102 merged Jun 30, 2025
Enable XPU doc
#38929 merged Jun 30, 2025
Fix chat
#39128 merged Jun 30, 2025
Licenses
#39127 merged Jun 30, 2025
Split transformers chat and transformers serve
#38443 merged Jun 30, 2025
[chat] Split chat/serve (built on top of lysandre's PR)
#39031 merged Jun 30, 2025
All CI jobs with A10
#39119 merged Jun 30, 2025
docs: Gemma 3n audio encoder
#39087 merged Jun 30, 2025
Fix some bug for finetune and batch infer For GLM-4.1V
#39090 merged Jun 30, 2025
fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8
#39116 merged Jun 30, 2025

141 Pull requests opened by 111 people

add pin memory and block table
#39130 opened Jun 30, 2025
feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT
#39140 opened Jul 1, 2025
Efficient Expert Weight Fusion for Moe deepseek v3
#39150 opened Jul 1, 2025
Mllama fixes
#39182 opened Jul 2, 2025
Add a 'chat' extra
#39183 opened Jul 2, 2025
fix: filter None router logits in Qwen3 MoE and handle empty router logits (#39203)
#39206 opened Jul 3, 2025
Standardize FSMT class naming: PretrainedFSMTModel → PreTrainedFSMTModel
#39209 opened Jul 3, 2025
Add mobilenet_v5 stub implementation to fix "Unknown Model" error
#39211 opened Jul 3, 2025
Add Ukrainian translation of README.md
#39212 opened Jul 3, 2025
Fix Inconsistant `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39221 opened Jul 4, 2025
Enable granite 4 hybrid integration tests
#39222 opened Jul 4, 2025
feat: add sliding window attention to Continuous Batching
#39225 opened Jul 4, 2025
Add support for `ModernBertForMultipleChoice`
#39232 opened Jul 4, 2025
added moment_p sampling
#39236 opened Jul 5, 2025
[WIP]support npu fusion patch
#39238 opened Jul 6, 2025
Fix slow test_moshika_greedy_unconditional_fp16
#39251 opened Jul 7, 2025
Fix to tuple conversion with config
#39257 opened Jul 7, 2025
Fix: Add version check for timm to support mobilenetv5 models (fixes #39208)
#39264 opened Jul 7, 2025
Refactor label name handling for PEFT models in Trainer class
#39265 opened Jul 8, 2025
Add support for logging number of image tokens
#39274 opened Jul 8, 2025
Bump transformers from 4.48.0 to 4.52.1 in /examples/tensorflow/language-modeling-tpu
#39284 opened Jul 8, 2025
Feat: add Kwai-Keye transformers
#39292 opened Jul 9, 2025
Add T5LA models
#39293 opened Jul 9, 2025
Fix bug with deepspeed and accelerator args in training_args.py
#39297 opened Jul 9, 2025
fix: providing a tensor to cache_position in model.generate kwargs always crashes because of boolean test
#39300 opened Jul 9, 2025
Fix critical typos in code example
#39303 opened Jul 9, 2025
Fix batch object detection 31356
#39306 opened Jul 9, 2025
Fix audio pipeline with torchcodec input
#39309 opened Jul 9, 2025
Add dates to the model docs
#39320 opened Jul 9, 2025
Remove conditional generation in image-to-text pipelines
#39332 opened Jul 10, 2025
Fix the issue that csm model cannot work with pipeline mode.
#39349 opened Jul 11, 2025
fix colpali mapping
#39353 opened Jul 11, 2025
Update docstring for glm4v
#39357 opened Jul 11, 2025
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 opened Jul 11, 2025
Add standardized model card for facebook/data2vec-audio-base-960h
#39368 opened Jul 11, 2025
Fix `fix_and_overwrite` mode of `utils/check_docstring.py`
#39369 opened Jul 11, 2025
Support FP8 accelerate config
#39370 opened Jul 11, 2025
fix(siglip2): clarify text pooling logic and remove misleading EOS co…
#39371 opened Jul 11, 2025
Add Apertus
#39381 opened Jul 12, 2025
Fix: Docker Build Vulnerable to Malicious Package Installation Attack in docker/custom-tokenizers.dockerfile
#39394 opened Jul 14, 2025
[RoPE] allow models to configure local RoPE
#39397 opened Jul 14, 2025
No repeat kv
#39402 opened Jul 14, 2025
Add Vocos model
#39403 opened Jul 14, 2025
Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs
#39435 opened Jul 15, 2025
Fix logger warnings in Gemma model test files
#39449 opened Jul 16, 2025
Add eurobert
#39455 opened Jul 16, 2025
Fix quantized model initialization for int8 dtypes
#39456 opened Jul 16, 2025
Skipping `initialize_weights` when model is quantized
#39464 opened Jul 17, 2025
README: Update Bert Japanese model card
#39466 opened Jul 17, 2025
Fix quantized model dispatch with device_map='auto'
#39468 opened Jul 17, 2025
Fix Bark failing tests
#39478 opened Jul 17, 2025
Add model arcinstitute state
#39480 opened Jul 17, 2025
Bye bye env vars, keep everything as configs
#39483 opened Jul 17, 2025
Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling
#39485 opened Jul 17, 2025
Update CTRL model card with improved usage examples and documentation notes
#39487 opened Jul 17, 2025
Avoid aliasing in cond's branches for torch 2.8
#39488 opened Jul 17, 2025
Fix: Skip weight initialization for quantized int8 models
#39491 opened Jul 17, 2025
[Voxtral] nit + pin correct mistral common version
#39493 opened Jul 18, 2025
Add support for including in-memory videos (not just files/urls) in apply_chat_template
#39494 opened Jul 18, 2025
[ASR pipline] fix with datasets 4.0
#39504 opened Jul 18, 2025
Make sure Moshi is exportable with static cache
#39506 opened Jul 18, 2025
feat(tokenization): add encode_message to tokenize messages one by one
#39507 opened Jul 18, 2025
[WIP] :broom: :broom: :broom: Get set decoder cleanup
#39509 opened Jul 18, 2025
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 opened Jul 19, 2025
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 opened Jul 19, 2025
🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean
#39519 opened Jul 19, 2025
build: Add fast image processor tvp
#39529 opened Jul 20, 2025
Add Beit3 model
#39534 opened Jul 20, 2025
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 opened Jul 20, 2025
[Voxtral] Fix typo
#39540 opened Jul 20, 2025
Add Muon optimizer implementation and integration
#39541 opened Jul 20, 2025
🌐 [i18n-KO] Translated feature_extractors.md to Korea
#39544 opened Jul 21, 2025
[WIP] try to relax the tie_weights method
#39555 opened Jul 21, 2025
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 opened Jul 21, 2025
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 opened Jul 21, 2025
fix load_model_end = true work when save_steps < eval_steps
#39560 opened Jul 21, 2025
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 opened Jul 21, 2025
[i18n-KO] Translated `auto_docstring.md` to Korean
#39571 opened Jul 22, 2025
feat(autoformer): Improve ValueError for insufficient sequence length
#39574 opened Jul 22, 2025
🌐 [i18n-KO] Translated `vitpose.md` to Korean
#39575 opened Jul 22, 2025
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 opened Jul 22, 2025
[`Ernie 4.5`] Ernie VL models
#39585 opened Jul 22, 2025
WIP, reference modeling
#39588 opened Jul 22, 2025
Add Fast Image Processor for ImageGPT
#39592 opened Jul 22, 2025
🌐 [i18n-KO] Translated 'xclip.md' to Korean
#39594 opened Jul 22, 2025
Fix: check TrainerState file exists before loading during resume
#39599 opened Jul 23, 2025
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 opened Jul 23, 2025
feat: add `is_fast` to ImageProcessor
#39603 opened Jul 23, 2025
Update model card for Cohere2 (Command R7B)
#39604 opened Jul 23, 2025
HunYuan opensource
#39606 opened Jul 23, 2025
Chat schemas
#39609 opened Jul 23, 2025
Fix return typehint for decoder and annotate inv_freq
#39610 opened Jul 23, 2025
Rework add-new-model-like with modular and make test filenames coherent
#39612 opened Jul 23, 2025
Export SmolvLM
#39614 opened Jul 23, 2025
Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39617 opened Jul 23, 2025
docs: Update EfficientLoFTR documentation
#39620 opened Jul 23, 2025
fix tensor device when loading state dict
#39623 opened Jul 24, 2025
Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI
#39625 opened Jul 24, 2025
[serve] Add speech-to-text
#39631 opened Jul 24, 2025
fix dead NVIDIA link
#39632 opened Jul 24, 2025
Reorder serving docs
#39634 opened Jul 24, 2025
Fix quant docker for fp-quant
#39641 opened Jul 24, 2025
fix chameleonvision UT failure
#39646 opened Jul 24, 2025
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 opened Jul 24, 2025
[modular] small fixes
#39663 opened Jul 25, 2025
Reduce atol values in test_dynamic_cache_exportability
#39667 opened Jul 25, 2025
Fix loss scaling and token aggregation to use only data parallel group
#39674 opened Jul 25, 2025
[BugFix]: Support dict and config file path for deepspeed
#39675 opened Jul 25, 2025
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 opened Jul 25, 2025
Allow custom hf_quantizer in from_pretrained
#39690 opened Jul 26, 2025
fix misspelled issues
#39691 opened Jul 26, 2025
Don't set `run_name` when none
#39695 opened Jul 26, 2025
use untyped storage for dtensors due to deprecation
#39697 opened Jul 26, 2025
Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG"
#39698 opened Jul 26, 2025
standardized BARThez model card
#39701 opened Jul 26, 2025
Update mT5 model card
#39702 opened Jul 26, 2025
Fix Causality Handling in Flash Attention to Support Bidirectional Attention
#39707 opened Jul 27, 2025
🌐[i18n-bn] Introduce Bengali version of Transformers documentation
#39708 opened Jul 27, 2025
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39712 opened Jul 27, 2025
🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean
#39713 opened Jul 27, 2025
🌐 [i18n-KO] Translated `main_classes/backbones.md` to Korean
#39714 opened Jul 27, 2025
Fix eval thread fork bomb
#39717 opened Jul 27, 2025
Fix SigLIP2 documentation model/processor mismatch
#39718 opened Jul 28, 2025
[Feat] Adding Intern-S1
#39722 opened Jul 28, 2025
Fix int4 quantized model cannot work with cpu
#39724 opened Jul 28, 2025
[qwen-vl] fix beam search with videos
#39726 opened Jul 28, 2025
Super tiny update
#39727 opened Jul 28, 2025
Export private symbols
#39729 opened Jul 28, 2025
handle multimodal models with tp_plan on the text_config
#39735 opened Jul 28, 2025
Standardize CLAP model card format
#39738 opened Jul 28, 2025
Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid
#39739 opened Jul 28, 2025
[Tests] [Bugfix] Make weights tied for `dynamic_tied_weights` test
#39740 opened Jul 28, 2025
Fix HfArgumentParser to filter out dict types from Union
#39741 opened Jul 28, 2025
Update HuBERT model card according to template
#39742 opened Jul 29, 2025
Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock
#39743 opened Jul 29, 2025
🌐 [i18n-KO] Translated `text-to-speech.md` to Korean
#39751 opened Jul 29, 2025
Fix rope_deltas corruption in Qwen2.5VL during CFG generation
#39756 opened Jul 29, 2025
[Draft] Add Llasa TTS family of models
#39760 opened Jul 29, 2025
Fix an invalid judgement
#39762 opened Jul 29, 2025
Improve Gemma3n model and tests
#39764 opened Jul 29, 2025
[draft] No more using `from_legacy_cache` as initialization
#39765 opened Jul 29, 2025

177 Issues closed by 53 people

Max cache length issue with Gemma 3
#39711 closed Jul 29, 2025
Loss is incorrectly scaled in Trainer during the last step with gradient accumulation when the final batch is smaller than accumulation steps.
#38837 closed Jul 29, 2025
ModernBERT has been totally destroyed by PR #38974 and #38838
#39747 closed Jul 29, 2025
Support loading Qwen3 MoE GGUF
#39721 closed Jul 29, 2025
[XPU] Model get OOM when loading models
#39627 closed Jul 29, 2025
encoder decoder model compile failed after refactor cache
#39746 closed Jul 29, 2025
_supports_static_cache disappear
#39744 closed Jul 29, 2025
[exaone4] ZeroDivisionError/TypeError when sliding_window_pattern is None/"LLLG" and _attn_implementation stays None (4.54.0 & main)
#39696 closed Jul 29, 2025
device mismatch error when using `SlidingWindowLayer`.
#39730 closed Jul 28, 2025
AddedToken should check content on `_update`
#39586 closed Jul 28, 2025
Checkpointing broken for classifier training multi-gpu
#38925 closed Jul 28, 2025
vlmm 0.10.0 load baidu/ERNIE-4.5-300B-A47B-Base-PT error
#39719 closed Jul 28, 2025
[i18n-<languageCode>] Translating docs to <Arabic>
#38381 closed Jul 27, 2025
Not installable on arm64 due to jaxlib upper bound
#36611 closed Jul 27, 2025
KeyError in Llama-4-Maverick-17B-128E-Instruct-FP8 Inference with Offloading
#38281 closed Jul 27, 2025
ImportError: DLL load failed while importing _safetensors_rust: The specified module could not be found
#38479 closed Jul 27, 2025
Contribute to Transformers on windows natively without WSL
#38601 closed Jul 27, 2025
Error when create ModernBert model with flash attention TypeError: RotaryEmbedding.__init__() got an unexpected keyword argument 'pos_idx_in_fp32'
#38843 closed Jul 27, 2025
Reproducibility Issue of Siglip2 with Blackwell Architecture GPUs (RTX 5090)
#38874 closed Jul 27, 2025
The wrong config parameter found in src/transformers/models/qwen2_5_vl/configuration_qwen2_5_vl.py.
#38889 closed Jul 27, 2025
CRITICAL ISSUE REPORT! GEMMA 3 1B CANNOT RUN!
#39686 closed Jul 26, 2025
text-generation extremely slow with large `bad_words_ids` list
#39512 closed Jul 25, 2025
Does Gemma 3 need positions ids to be 1-indexed explicitly?
#39023 closed Jul 25, 2025
Add Deepseek-VL
#36110 closed Jul 25, 2025
Grammatical error in the "Loading model's" page
#39018 closed Jul 25, 2025
Inference API Returning 404
#39650 closed Jul 25, 2025
Backwards incompatible change in returned hidden states
#39558 closed Jul 25, 2025
Typo in `apply_transcrition_request` method name
#39530 closed Jul 25, 2025
video_auto_processing.py breaks everything
#38846 closed Jul 25, 2025
Should `compute_metrics` only run on the main process when doing DDP?
#38851 closed Jul 25, 2025
VoxtralForConditionalGeneration import error
#39611 closed Jul 24, 2025
`Trainer._save()` May Incorrectly Save Empty Model State (safetensors)
#38686 closed Jul 24, 2025
`gemma-3-1b-it` with `use_cache=True` and `past_key_values` throws `RuntimeError: CUDA error: device-side assert` error
#39593 closed Jul 24, 2025
Wandb isn't logging config in offline mode
#38968 closed Jul 23, 2025
The similarity between image and text in siglip2 is very low
#39597 closed Jul 23, 2025
Does Qwen_2_5_VL support variable length attention computation?
#38007 closed Jul 23, 2025
Have to import cv2 and pop up window frist, or else it stuck forever
#38139 closed Jul 23, 2025
CI skipped failures tracking issue
#38820 closed Jul 23, 2025
"ValueError: Predictions and/or references don't match the expected format." error
#39510 closed Jul 22, 2025
Clarification on Recent Changes to Loss and Gradient Accumulation
#39567 closed Jul 22, 2025
Add EfficientLoFTR model
#36354 closed Jul 22, 2025
Gemma3 bidirectional mask for image tokens isn't reaching attention forward
#39389 closed Jul 22, 2025
Is the new Intel–Weizmann speculative decoding algorithm integrated into Transformers?
#39545 closed Jul 21, 2025
Enabling `average_tokens_across_devices` by default in Trainer
#39392 closed Jul 21, 2025
T5Gemma problem with tokenizer(?)
#39521 closed Jul 21, 2025
Causal mask is not compatible with Qwen2-VL when using padding-free training
#39400 closed Jul 21, 2025
KeyError: 'llava_qwen2'
#39533 closed Jul 21, 2025
Add Gemma 3 For Sequence Classification
#36755 closed Jul 21, 2025
Expected all tensors to be on the same device, but found at least two devices
#37545 closed Jul 21, 2025
DynamicCache results in too many torch recompiles after 4.51
#37908 closed Jul 21, 2025
Confusion about num_labels and problem_type in classification logic 🐛
#38219 closed Jul 21, 2025
Silent Overwrite of Custom Optimizer When Using DeepSpeed with Transformers Trainer
#38753 closed Jul 21, 2025
DTensor issues when running Llama4ForConditionalGeneration with tensor parallel.
#38803 closed Jul 21, 2025
Version 4.52.3 leads to error after bundling with pyinstaller
#38402 closed Jul 20, 2025
Issue importing models in jupyter notebooks 'No module named transformers.models.ipynb_checkpoints'
#38726 closed Jul 19, 2025
T5Gemma returning 0 loss for s2s training
#39514 closed Jul 19, 2025
Whisper models appear to be broken with Flash Attention 2
#38662 closed Jul 18, 2025
Speculative Decoding(do_sample=False) get different outputs
#39421 closed Jul 18, 2025
BarkProcessor voice_preset doesn't work
#34634 closed Jul 18, 2025
dataset 4.0.0 , issue with load_dataset loading audio dataset
#39497 closed Jul 18, 2025
Gemma3n don't support chat with history
#39498 closed Jul 18, 2025
modeling_flax_gemma.FlaxGemmaModule failed with incompatible shapes when running with GemmaConfig
#39492 closed Jul 18, 2025
Error for `return_assistant_tokens_mask` in MLLM processor
#38521 closed Jul 18, 2025
`get_video_features` in XCLIPModel always returns `pooled_output`
#38709 closed Jul 18, 2025
`.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
#38717 closed Jul 18, 2025
I can't make sense of this works on Windows but not on Linux AutoModelForCausalLM.from_pretrained
#39461 closed Jul 17, 2025
HfArgumentParser cannot parse `str` for local path
#39462 closed Jul 17, 2025
breaking changes in ESM model classes
#39405 closed Jul 17, 2025
[torch.export] Unhandled FakeTensor Device Propagation for two different devices
#38975 closed Jul 17, 2025
QA pipeline prediction generates wrong response when `top_k` param > 1
#38984 closed Jul 17, 2025
When will transformers 4.51.4 be released?
#37812 closed Jul 17, 2025
[RuntimeError: Expected all tensors to be on the same device, but found at least two devices] when fine-tuning with peft and device_map=auto
#38687 closed Jul 17, 2025
CheckpointLoaderSimple ..... Error while deserializing header: InvalidHeaderDeserialization
#38692 closed Jul 17, 2025
can't torch.export.export tinyllama model
#39463 closed Jul 17, 2025
Missing 4 spaces in SmolVLMImageProcessorFast
#39442 closed Jul 16, 2025
ModernBERT for Sequence Classification - issues with finetuning
#38720 closed Jul 16, 2025
When creating a Trainer object for a MixedModel, the initialization tries to access attribute get_base_model (which does not exist) rather than model
#39204 closed Jul 16, 2025
SigLip2 text pooler output selection
#39269 closed Jul 16, 2025
[YosoConfig] Missing `architectures` field
#39424 closed Jul 16, 2025
Qwen3 tokenizer wrong offset_mapping
#39401 closed Jul 16, 2025
OpenTelemetry Collector Connection error when installing the latest release 4.53.0 during `docker build`
#39143 closed Jul 16, 2025
DBRX model passes probabilities and not logits to the load balancer
#39055 closed Jul 16, 2025
`verify_tp_plan` function raises an error if a key without '.' is given
#38419 closed Jul 16, 2025
Whisper chunking algorithm increases WER
#37789 closed Jul 16, 2025
model_type = self._reverse_config_mapping[key.__name__] KeyError: 'Qwen2RMConfig'
#38517 closed Jul 16, 2025
TypeError: 'NoneType' object is not iterable in ESM when using DDP training
#38667 closed Jul 16, 2025
LlamaAttention forward function type hint is incorrect
#38739 closed Jul 15, 2025
`quantization_method` is not cleared after calling `.dequantize()`
#39295 closed Jul 15, 2025
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 closed Jul 15, 2025
Mypy errors since v4.51.0
#37339 closed Jul 15, 2025
Errors using TinyLlama-1.1B-Chat-v1.0 and DirectML
#38340 closed Jul 15, 2025
Pytorch language_modelling example run_clm fails when streaming is enabled
#39285 closed Jul 15, 2025
`transformers.utils.metrics` sets global `TracerProvider`
#39115 closed Jul 15, 2025
There is no transformers version that can run DeepSeek V3 generate
#38710 closed Jul 15, 2025
Support of Qwen3 GGUF model
#38650 closed Jul 15, 2025
Latest Transformers release causes CUDA out-of-memory errors during VisionLLM fine-tuning
#39337 closed Jul 14, 2025
Paligemma model card needs update
#38544 closed Jul 14, 2025
Using resnet-18 in flax
#39388 closed Jul 14, 2025
Getting Warnings When Instantiating Object Detection Models Due to Meta Tensor Initialization
#37615 closed Jul 14, 2025
4.52.2 error: Could not import module 'Qwen3ForCausalLM'
#38291 closed Jul 14, 2025
Transformers fail to load deepseek-ai/DeepSeek-V3 with vllm
#38588 closed Jul 13, 2025
MambaInnerFnBackward
#38600 closed Jul 13, 2025
Failed to full fine tuning code5p 2B
#38602 closed Jul 13, 2025
Exporting google/gemma-3n-e4b-it language_model (decoder) into ONNX format
#39328 closed Jul 12, 2025
Removing the modification of loss value due to rounding off to 4 digits
#38032 closed Jul 12, 2025
`AutoModel.from_pretrained(...)` (with explicit `device_map` unset) fails under `with torch.device("meta")` with PyTorch 2.6.0 and 2.7.0
#38066 closed Jul 12, 2025
eval_loss not found when training a peft model using trainer.py / losses not retrieved from base model where appropriate
#38130 closed Jul 12, 2025
Clarification on default top_k sampling parameter
#38549 closed Jul 12, 2025
hidden_states, self_attn_weights = self.self_attn( ValueError: too many values to unpack (expected 2)
#38554 closed Jul 12, 2025
how to use EncoderDecoderModel to do en-de translation?
#8944 closed Jul 11, 2025
vLLM v0.9.2: Qwen2.5-Omni-7B-AWQ fails to load with transformers 4.53.1 (requires 4.52.4)
#39359 closed Jul 11, 2025
Beit image classification have different results compared from versions prior to 4.43.0
#34446 closed Jul 11, 2025
AssertionError: Torch not compiled with CUDA enabled when using device_map="auto" in Ascend NPU
#38468 closed Jul 11, 2025
Allow `mlm_probability` to be set to None when `mlm`=False in `DataCollatorForLanguageModeling`
#38522 closed Jul 11, 2025
"Size mismatch" error when trying to download pretrained ChatGPT-4 using transformers
#38523 closed Jul 11, 2025
[Core] Saving models with multiple shared tensor groups is not supported when model is dispatched
#39097 closed Jul 10, 2025
FlashAttention2 ImportError: undefined symbol with flash_attn_2_cuda when loading Phi-4-Multimodal
#39334 closed Jul 10, 2025
Can't load my LoRA checkpoint after gemma3 refactor
#38927 closed Jul 10, 2025
TypeError in Qwen2_5_VLForConditionalGeneration (torch.finfo misuse)
#39326 closed Jul 10, 2025
Gemma3 slightly alters hidden state when input_ids is batched
#39302 closed Jul 10, 2025
[CI ENERGY Waste] The exist jobs in `Doctests` that has never completed successfully
#39159 closed Jul 9, 2025
v4.53.0+ starts erroring with 'Gemma3TextConfig' object has no attribute 'sliding_window_pattern' with vLLM
#39290 closed Jul 9, 2025
ddp_time in TrainingArguments with deepspeed does not work
#38933 closed Jul 9, 2025
transformers showing decoder model architecture detected so padding should be left
#38071 closed Jul 9, 2025
[Florence-2] SyntaxWarning: invalid escape sequence '\d' in processing_florence2.py
#38498 closed Jul 9, 2025
id2label assignment problem in run_glue.py
#38507 closed Jul 9, 2025
flash_attention_3 for Qwen2_5_VisionTransformerPretrainedModel
#39288 closed Jul 9, 2025
Illegal memory access when using 3d rope
#39168 closed Jul 8, 2025
OSError: Tensor parallel is only supported for `torch>=2.5`
#39249 closed Jul 8, 2025
Any plans to add AIMv2 in the model?
#35351 closed Jul 8, 2025
Request to add Doge
#35889 closed Jul 8, 2025
ModernBERT for MLM outputs incorrect hidden state shape.
#38499 closed Jul 8, 2025
Unable to deploy Gemma 3 on AWS SageMaker due to lack of support in tranfomers release
#38500 closed Jul 8, 2025
Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
#39266 closed Jul 8, 2025
RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source.
#38963 closed Jul 8, 2025
ModuleNotFoundError: No module named 'habana_frameworks.torch'
#39256 closed Jul 7, 2025
disable_grouping parameter missed in image_processing_glm4v_fast.py
#39237 closed Jul 7, 2025
A word-level timestamps on whisper generation pipeline is mismatched to total duration
#36228 closed Jul 7, 2025
Pickle error when downloading DeepSeek model
#38476 closed Jul 7, 2025
Token shape issue in LLaVA-onevision fine-tuning
#38481 closed Jul 7, 2025
(False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch
#26796 closed Jul 6, 2025
Some Whisper beam search output (sequences_scores, etc.) is lost in _stack_split_outputs
#32373 closed Jul 6, 2025
Whisper Beam Search doesn't work
#33445 closed Jul 6, 2025
Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
#36124 closed Jul 6, 2025
Shape Error in Llama4VisionMLP2
#37321 closed Jul 6, 2025
Maybe the vocab_size can be duplicated to the mainconfig for PEFT to pick up
#38017 closed Jul 6, 2025
[Question] The logic of data sampler in data parallel.
#38428 closed Jul 6, 2025
quantizer_hqq should not require a gpu/cuda device to run
#38439 closed Jul 6, 2025
Incorrect API call
#38457 closed Jul 6, 2025
Unable to run run_instance_segmentation_no_trainer with HF Accelerate
#38375 closed Jul 5, 2025
accelerate + device_map auto = error
#38408 closed Jul 5, 2025
Tokenizer returns float32 tensor for empty string input instead of long dtype
#38417 closed Jul 5, 2025
FSDP RuntimeError: 'weight' must be 2-D
#39186 closed Jul 5, 2025
KV cache optimization with paged attention
#27303 closed Jul 5, 2025
No or astronomical loss in `ModernBertForMultipleChoice`
#39201 closed Jul 4, 2025
Possible Typo in "Mask2FormerLoss"
#38559 closed Jul 4, 2025
[Gaudi] the seamless_m4t cannot work on Gaudi. No need to fix. Workaround PR is merged.
#39118 closed Jul 3, 2025
device_map='auto' coupled with tp_plan='auto'
#38771 closed Jul 3, 2025
Weights not initialized correctly when instantiating model with a pretrained backbone
#38061 closed Jul 3, 2025
401 Unauthorized Error: "Invalid credentials" on POST requests to Inference API from multiple services
#38289 closed Jul 3, 2025
`transformers`' dependency on `sentencepiece` blocks use on windows in python 3.13
#39091 closed Jul 3, 2025
Bug in version 4.52.4: LlavaOnevisonConfig Class init the inappropriate (hidden_size,num_attention_heads) pair in vision_config
#39089 closed Jul 2, 2025
AttributeError: 'Resampler' object has no attribute '_initialize_weights'
#39124 closed Jul 2, 2025
Gibberish generations with FSDP2 and MixedPrecisionPolicy
#38190 closed Jul 2, 2025
Potential mix-up with IMAGENET_STANDARD and IMAGENET_DEFAULT values
#38318 closed Jul 2, 2025
Why is return_assistant_tokens_mask and continue_final_message incompatible?
#38346 closed Jul 2, 2025
VLLM depoly Qwen2.5_omni server error
#39141 closed Jul 2, 2025
`LayoutLMv3TokenizerFast` doesn't pass all the params.
#39151 closed Jul 1, 2025
Incorrect keypoint batch handling inside SuperGlueForKeypointMatching
#38348 closed Jul 1, 2025
`Qwen2_5_VLVisionAttention` with flash attention has no `is_causal` attribute
#39095 closed Jul 1, 2025
Incorrect calculation of strides leading to loss of param data upon tensor parallel use while sliced model loading
#37051 closed Jul 1, 2025
Warning when load pretrained model for qwen2-VL-1.5B-Instruct.
#39004 closed Jul 1, 2025
Qwen2_5OmniProcessor.__init__() got multiple values for argument 'image_processor'
#38898 closed Jul 1, 2025
docs: fix typos in awesome-transformers.md WIP
#39101 closed Jun 30, 2025
Behaviour of `batch_eval_metrics` determines the `include_for_metrics` behaviour
#37683 closed Jun 30, 2025
The latest transformer.utils.fx does not working on llama. Only far older version(4.45.1) works.
#38313 closed Jun 30, 2025
Will Gemma 3n be added to transformers?
#38300 closed Jun 30, 2025

102 Issues opened by 95 people

Instantiating `google/gemma-3-4b-pt` with AutoModelForSequenceClassification Reports Unitialized Model
#39763 opened Jul 29, 2025
Follow-up on Issues Regarding Training State Restoration from Interruptions
#39755 opened Jul 29, 2025
Inv frequency has not default, going against our philosophy
#39753 opened Jul 29, 2025
Qwen2_5_VLForConditionalGeneration cfg forward twice error
#39749 opened Jul 29, 2025
losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys) output logits and labels are not the batch same size
#39736 opened Jul 28, 2025
`num_beams` > 1 leads to exception for Qwen2.5VL (Qwen family or all VLM models?)
#39723 opened Jul 28, 2025
[transformers==4.54.0] FSDP1 forward misalignment after loading state dict
#39720 opened Jul 28, 2025
[rank0]: ValueError: Your setup doesn't support bf16/gpu.
#39716 opened Jul 27, 2025
OWLv2 with visual prompt - alternative query embedding selection method
#39710 opened Jul 27, 2025
[i18n-<bn>] Translating docs to <Bengali>
#39705 opened Jul 27, 2025
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
#39704 opened Jul 27, 2025
ValueError: Number of image placeholders in the prompt does not match the number of images. internVL3
#39703 opened Jul 26, 2025
No flag to support Conditional Parameter Loading for gemma-3n-E2B models in transformer
#39699 opened Jul 26, 2025
4.54.0 bug: ImportError: cannot import name 'deterministic_g' from 'transformers.modeling_flash_attention_utils'
#39694 opened Jul 26, 2025
SigLIP2 documentation example has multiple errors (model/processor mismatch + quantization failure)
#39692 opened Jul 26, 2025
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 opened Jul 26, 2025
Qwen 2.5 VL - error without attention_mask
#39685 opened Jul 26, 2025
Add multi-candidate & tree search for assisted decoding (speculative decoding)
#39684 opened Jul 25, 2025
Accelerate beam search decoding via tree attention
#39682 opened Jul 25, 2025
error: argument --deepspeed: invalid dict value: '<path>'
#39673 opened Jul 25, 2025
Issue when initializing a DynamicCache
#39668 opened Jul 25, 2025
T5Gemma training not working
#39656 opened Jul 25, 2025
Use DP+FSDP device mesh dimensions for scaling loss with default value of average_tokens_across_devices: True
#39648 opened Jul 24, 2025
Please develop DataCollatorForVisionLanguageModeling to support visual model training !!!
#39647 opened Jul 24, 2025
[BUG] Run 111B+ Teacher distributed inference and 8B Student distributed training on multi-node H200 GPUs using the Transformers Trainer without encountering OOM errors?
#39637 opened Jul 24, 2025
FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39619 opened Jul 23, 2025
SageAttention for attention implementation?
#39618 opened Jul 23, 2025
Trainer: Error when folded metrics are saved
#39616 opened Jul 23, 2025
Qwen3 Fails w/4D Attn Mask when using FA2
#39608 opened Jul 23, 2025
ImageClassificationPipeline preprocess should accept numpy/tensor arrays
#39607 opened Jul 23, 2025
Does transformers support python3.13 -- disable-gil or python3.14 free threading?
#39596 opened Jul 23, 2025
Model forward execution in full eager mode?
#39565 opened Jul 21, 2025
Why `is_causal` is not used in `flash_attention_forward` ?
#39554 opened Jul 21, 2025
Is there plan to integrate ColQwen2.5 into Transformers?
#39549 opened Jul 21, 2025
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 opened Jul 21, 2025
Add muon and flash-muon optimizer
#39537 opened Jul 20, 2025
InformerForPrediction [I would like to seek your opinions, everyone, How can I set the dynamic real features for prediction]
#39551 opened Jul 20, 2025
training google colab error
#39527 opened Jul 19, 2025
paged attention NOT working with Qwen Models
#39525 opened Jul 19, 2025
T5Gemma failing on provided example
#39522 opened Jul 19, 2025
Export voxtral to ExecuTorch
#39511 opened Jul 18, 2025
Whisper transcription is 2x slower between 4.51.3 -> 4.52.1
#39508 opened Jul 18, 2025
Add Muon Optimiser for 2x faster convergence
#39495 opened Jul 18, 2025
Transformers still tries to use apex.amp which is no longer a thing in apex.
#39484 opened Jul 17, 2025
Adding Space-Time-MiniLM-v0
#39479 opened Jul 17, 2025
Allow `load_best_model_at_end=True` to work when `save_steps < eval_steps` and best model is saved
#39476 opened Jul 17, 2025
Unexpected behaviour with transformers versions above 4.28 for Donut
#39473 opened Jul 17, 2025
Autoformer get_lagged_subsequences always true if condition
#39460 opened Jul 16, 2025
Add Interactive Multi-Modal Attention Visualization for Vision-Language Models
#39440 opened Jul 15, 2025
Support for per-token latency tracking in `generate()` (suggested options: using callback, profiler class, or using a config flag)
#39437 opened Jul 15, 2025
Export LFM2 to ExecuTorch
#39436 opened Jul 15, 2025
Add DiCoW: Diarization-Conditioned Whisper
#39430 opened Jul 15, 2025
Gemma 3 Compilation Issues During Generation
#39427 opened Jul 15, 2025
object detection : matchin outputs.last_hidden_state with results
#39426 opened Jul 15, 2025
Exeception 3 type mismatch
#39413 opened Jul 15, 2025
FP8 training support for Model Parallel / Tensor Parallel (MP/TP)
#39410 opened Jul 15, 2025
TypeError: couldn't find storage object Float8_e4m3fnStorage - which version is needed for this?
#39409 opened Jul 15, 2025
Off-by-one error when using flash_attention with a sliding window
#39408 opened Jul 15, 2025
Whisper `return_language` with pipeline no longer working
#39404 opened Jul 14, 2025
Qwen2.5-VL Sharding error when using Tensor Parallelism
#39399 opened Jul 14, 2025
Mask2FormerImageProcessor yields inconsistent results between single and batch inference
#39382 opened Jul 12, 2025
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 opened Jul 12, 2025
Option to tokenize messages one after the other
#39417 opened Jul 12, 2025
FlashAttention2 support for GSAI-ML / LLaDA-8B-Instruct?
#39377 opened Jul 12, 2025
Adding api key to `transformers serve`
#39367 opened Jul 11, 2025
RuntimeError when loading llmcompressor W8A8 quantized model: int8 dtype in weight initialization
#39366 opened Jul 11, 2025
Bug in modeling_bart.eager_attention_forward
#39365 opened Jul 11, 2025
env.useBrowserCache = true causes JSON parsing error, forced to disable cache making app slower.
#39352 opened Jul 11, 2025
surpport for google/medgemma-27b-it
#39350 opened Jul 11, 2025
TypeError: GenerationMixin._extract_past_from_model_output() got an unexpected keyword argument 'standardize_cache_format'
#39336 opened Jul 10, 2025
Adding support for Gemma 3n GGUFs
#39329 opened Jul 10, 2025
Add HF integration dates + paper release dates to the model docs
#39319 opened Jul 9, 2025
Whisper demo code for model + processor API is broken
#39318 opened Jul 9, 2025
Inference with model.generate( ) using a quantized model leads to assertion error
#39311 opened Jul 9, 2025
Support 2D Array Inputs in Wav2Vec2FeatureExtractor for Non-Waveform Modalities
#39291 opened Jul 9, 2025
hangs during training using deepspeed
#39275 opened Jul 8, 2025
Please help i am trying to run model but issue
#39260 opened Jul 7, 2025
[Trainer] Eval loss depends on batch size (with solution)
#39241 opened Jul 7, 2025
Specifying multiple metrics in TrainingArguments.metric_for_best_model
#39235 opened Jul 5, 2025
v4.53.0 - Qwen 2.5 VL Flash Attention error - object has no attribute is_causal
#39231 opened Jul 4, 2025
transformers: FlaubertTokenizer: do_lowercase_and_remove_accent: make the logger warning actionable (don't only tell what's wrong, rather suggest what could be done about that)
#39224 opened Jul 4, 2025
Feature Request: Native Support for Custom Multimodal Models
#39219 opened Jul 4, 2025
torch fake_tensor load hf model failed
#39217 opened Jul 4, 2025
_load_rng_state after get_batch_samples may break training reproducibility when dataloader has random operations
#39215 opened Jul 4, 2025
Inconsistant `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39214 opened Jul 4, 2025
Remove device to host sync triggered in _flash_attention_forward
#39213 opened Jul 4, 2025
Unknown Model (mobilenetv5_300m_enc) when loading Gemma 3n
#39208 opened Jul 3, 2025
Qwen3 MOE models w/non-empty `mlp_only_layers` fail when `output_router_logits=True`
#39203 opened Jul 3, 2025
Naming incosistencies of `PreTrained*` classes.
#39202 opened Jul 3, 2025
🐛 Bug Report: Accelerate config to disable torch dynamo is ignored by transformers automatic compilation
#39191 opened Jul 3, 2025
Gemma2 fall back to cpu execusion when attn_implementation='flash_attention_2'
#39188 opened Jul 3, 2025
Torch patches tracker for HPU/Gaudi
#39175 opened Jul 2, 2025
Using Gemma3n with text-only generation requires image dependencies
#39169 opened Jul 2, 2025
apply_rotary_pos_emb_flashatt failed during triton jit compilation 'constexpr' object has no attribute 'bit_length'
#39167 opened Jul 2, 2025
Not capable of exporting Mistral to ONNX format with the use of caching
#39162 opened Jul 1, 2025
Add x-transformers library by lucidrains
#39139 opened Jun 30, 2025
ImportError: cannot import name 'pipeline' from 'transformers'
#39137 opened Jun 30, 2025
bf16_full_eval=True moves model to device before FSDP application and causes cuda OOM
#39136 opened Jun 30, 2025
Gradient accumulation steps for Vision Language model
#39123 opened Jun 30, 2025
Is there a way to force it to use ASCII based progress bar and not the ipython widget one?
#39114 opened Jun 29, 2025
QWEN2VLProcessor missing video_token_id in mm_token_type_ids
#39112 opened Jun 29, 2025
New release 4.53.0 breaks HF trainer/model
#39111 opened Jun 29, 2025

152 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Add Segment Anything 2 (SAM2)
#32317 commented on Jul 29, 2025 • 91 new comments
[WiP] Add xcodec2 model
#37868 commented on Jul 28, 2025 • 70 new comments
Add fastconformer encoder support for nvidia/parakeet and nvidia/canary models
#39062 commented on Jul 23, 2025 • 57 new comments
Add support for Florence-2
#38188 commented on Jul 29, 2025 • 48 new comments
blt wip
#38579 commented on Jul 29, 2025 • 38 new comments
Add Ovis2 model and processor implementation
#37088 commented on Jul 29, 2025 • 29 new comments
Add NVIDIA Cosmos
#36476 commented on Jul 16, 2025 • 14 new comments
🔴[`Attention`] Bert-based Models Attention Refactor
#38301 commented on Jul 16, 2025 • 12 new comments
feat: Add ConvaiCausalLM model for Hindi Causal Language Modeling
#37837 commented on Jul 16, 2025 • 10 new comments
Disable static cache on certain MoE models
#39108 commented on Jul 28, 2025 • 8 new comments
Add X-Codec model
#38248 commented on Jul 23, 2025 • 7 new comments
Add Dust3R
#38805 commented on Jul 22, 2025 • 6 new comments
[WIP] Add MM Grounding DINO
#37925 commented on Jul 26, 2025 • 4 new comments
Add FAST
#35476 commented on Jul 29, 2025 • 4 new comments
Modular m4t speecht5 sew
#37473 commented on Jul 2, 2025 • 3 new comments
Force real tensors and clone state_dict in src/transformers/modeling_utils.py
#38114 commented on Jul 15, 2025 • 3 new comments
[omni modality] support composite processor config
#38142 commented on Jul 28, 2025 • 3 new comments
Make executorch integration more seamless by analyzing model signature
#36969 commented on Jul 16, 2025 • 3 new comments
Add callback to monitor progress in whisper transcription
#37483 commented on Jul 29, 2025 • 2 new comments
Use deep copies instead of shallow copies for bbox_embed in GroundingDINO decoder (#37333).
#38999 commented on Jul 1, 2025 • 1 new comment
Add serialization function for StaticCache
#38879 commented on Jul 2, 2025 • 1 new comment
Provide clearer instructions on how to specify target language.
#38786 commented on Jul 21, 2025 • 1 new comment
deci gguf support
#38669 commented on Jul 29, 2025 • 1 new comment
[WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training
#39012 commented on Jul 1, 2025 • 1 new comment
Fix ModernBERT tokenizer issue with is_split_into_words flag
#38564 commented on Jul 16, 2025 • 1 new comment
another way to use shift_labels
#38533 commented on Jul 16, 2025 • 1 new comment
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 commented on Jul 7, 2025 • 0 new comments
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 commented on Jul 23, 2025 • 0 new comments
[WIP] Computer vision util: vision visualizer
#36892 commented on Jul 25, 2025 • 0 new comments
Remove the redundant shift during the loss computation in the Moshi m…
#36928 commented on Jul 7, 2025 • 0 new comments
docs: PyTorch examples (image-classification & image-pretraining) clarity
#39094 commented on Jul 3, 2025 • 0 new comments
Adding a stub for MiniCPM-o to the models
#37049 commented on Jul 7, 2025 • 0 new comments
Bug/38843 fix pos idx in fp32 parameter error
#39064 commented on Jul 22, 2025 • 0 new comments
docs: Update LayoutLMv3 model card with standardized format and impro…
#37155 commented on Jul 3, 2025 • 0 new comments
trying custom tokenizer fix
#37177 commented on Jul 16, 2025 • 0 new comments
[RFC] Fix Gemma 3 FP16 with activation scaling
#37226 commented on Jul 16, 2025 • 0 new comments
Add QLIP Model
#37328 commented on Jul 7, 2025 • 0 new comments
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on Jul 21, 2025 • 0 new comments
Fix edge case for tokenize (#36277)
#36555 commented on Jul 8, 2025 • 0 new comments
[Validation] First implementation of `@strict` from `huggingface_hub`
#36534 commented on Jul 29, 2025 • 0 new comments
Fix pos idx v4.52.4
#39096 commented on Jul 8, 2025 • 0 new comments
Add Phi-3.5-vision
#36036 commented on Jul 28, 2025 • 0 new comments
Fix ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36011 commented on Jul 21, 2025 • 0 new comments
Add StyleTTS 2
#35790 commented on Jul 28, 2025 • 0 new comments
use warning_once instead of warning in Trainer.tokenizer
#35482 commented on Jul 25, 2025 • 0 new comments
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on Jul 26, 2025 • 0 new comments
Fix hardcoded `float` dtypes in DeBERTa model, which caused multiple RuntimeErrors in `bfloat16`
#35336 commented on Jul 9, 2025 • 0 new comments
Add JinaBERT model
#35320 commented on Jul 15, 2025 • 0 new comments
uniformize kwargs for OneFormer
#34547 commented on Jul 7, 2025 • 0 new comments
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on Jul 21, 2025 • 0 new comments
Fix audio-related config naming for Gemma3n
#39103 commented on Jun 30, 2025 • 0 new comments
CUDA OOM when running meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
#37532 commented on Jul 15, 2025 • 0 new comments
Allow compile with bnb
#38886 commented on Jul 3, 2025 • 0 new comments
Add `SepCache` [An efficient and easy-to-use Cache from the SepLLM paper - ICML 2025 (https://arxiv.org/abs/2412.12094) ] to the `cache_utils.py` and `__init__.py`
#38824 commented on Jul 28, 2025 • 0 new comments
docs: Musicgen melody model card
#38955 commented on Jul 3, 2025 • 0 new comments
Adding custom 3d mask into ModernBert
#38671 commented on Jul 29, 2025 • 0 new comments
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on Jul 22, 2025 • 0 new comments
Updating model card for wav2vec2
#38956 commented on Jul 5, 2025 • 0 new comments
Add Bagel
#38569 commented on Jul 25, 2025 • 0 new comments
On branch fix-void-segment-mask-input [WIP]
#38532 commented on Jul 1, 2025 • 0 new comments
2/2 More cleaning for the `LlamaModel` keeping only the core
#38368 commented on Jul 10, 2025 • 0 new comments
Update wav2vec2-bert model card
#38957 commented on Jul 3, 2025 • 0 new comments
Fix the shape of ModernBertForMaskedLM's output hidden_states
#38272 commented on Jul 16, 2025 • 0 new comments
Updated model card for wav2vec2-conformer
#38958 commented on Jul 3, 2025 • 0 new comments
Updated the model card for wav2vec2-phoneme
#38959 commented on Jul 3, 2025 • 0 new comments
Check docstring inside modular files as well
#38988 commented on Jul 9, 2025 • 0 new comments
SQuat cache implementation
#38055 commented on Jul 11, 2025 • 0 new comments
Add submodels support check function
#39009 commented on Jul 1, 2025 • 0 new comments
support MiniCPM-o2.6
#37917 commented on Jul 8, 2025 • 0 new comments
add profiler to trainer
#37889 commented on Jul 29, 2025 • 0 new comments
fix `kosmos2` tests
#39037 commented on Jun 30, 2025 • 0 new comments
Update ruff to 0.12.3 and apply its fixes
#37809 commented on Jul 21, 2025 • 0 new comments
Vectorize deepseek moe
#37769 commented on Jul 16, 2025 • 0 new comments
Add PLM Model
#37634 commented on Jul 7, 2025 • 0 new comments
Allow compression on meta device
#39039 commented on Jul 15, 2025 • 0 new comments
Fix interpolation of convnext image processor
#37460 commented on Jul 7, 2025 • 0 new comments
Fix typo in Gemma3ForCausalLM doctest
#37374 commented on Jul 7, 2025 • 0 new comments
If a training job failed MLFlow will not be reported and MLFlow shows job still running
#30333 commented on Jul 15, 2025 • 0 new comments
"pipeline" is not exported from module "transformers"
#37646 commented on Jul 16, 2025 • 0 new comments
[DOCS] Add `pruna` as optimization framework
#38740 commented on Jul 16, 2025 • 0 new comments
Attention refactor in #35235 adds a `__getitem__` into the forward pass, which causes errors with torch dynamo.
#38271 commented on Jul 16, 2025 • 0 new comments
Modernbert 3D attention mask
#38040 commented on Jul 16, 2025 • 0 new comments
Automatic dynamic batch size selection for DataCollatorWithFlattening
#33945 commented on Jul 16, 2025 • 0 new comments
AutoModelForCausalLM.from_pretrained(..., device_map=...) ignore `Tensor.retain_grad()` in Multi-GPUs setting
#39036 commented on Jul 16, 2025 • 0 new comments
YaRN: factor is not effective with original_max_position_embeddings
#38224 commented on Jul 16, 2025 • 0 new comments
Potential Memory Leak or Caching in Fast Image Processor
#38656 commented on Jul 16, 2025 • 0 new comments
CPMANT Model Fails to Run Following Official Tutorial
#39026 commented on Jul 16, 2025 • 0 new comments
Flex attention support with arbitrary 4d mask for LlamaModel
#33898 commented on Jul 17, 2025 • 0 new comments
Add `pruna` integration for loading model through `transformers.from_pretrained` / `pipeline`.
#37971 commented on Jul 17, 2025 • 0 new comments
We now require users to upgrade torch to at least v2.6 in order to use the function.
#38464 commented on Jul 17, 2025 • 0 new comments
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 commented on Jul 18, 2025 • 0 new comments
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct
#33399 commented on Jul 18, 2025 • 0 new comments
Safetensors deserializing silently mishandles tied parameters
#38870 commented on Jul 18, 2025 • 0 new comments
Error: StaticCache.__init__() got an unexpected keyword argument 'batch_size'
#38914 commented on Jul 20, 2025 • 0 new comments
Implement Titans Architecture with GRPO Fine-Tuning
#36352 commented on Jul 21, 2025 • 0 new comments
Add support for BAGEL from ByteDance
#38267 commented on Jun 30, 2025 • 0 new comments
Support Asynchronous Evaluation on Separate GPU in `Trainer`
#38829 commented on Jun 30, 2025 • 0 new comments
Trainer.training_step incorrectly normalizes mean token loss when n_gpu > 1
#37474 commented on Jul 1, 2025 • 0 new comments
AutoConfig has potential issue with composite config.
#38258 commented on Jul 2, 2025 • 0 new comments
scale loss per token/local sequence for discrete system representation
#38854 commented on Jul 3, 2025 • 0 new comments
Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model
#37248 commented on Jul 7, 2025 • 0 new comments
Pretrainedtokenizerfast Segmentation fault
#39099 commented on Jul 7, 2025 • 0 new comments
How to use other acceleration apis of npu?
#39105 commented on Jul 7, 2025 • 0 new comments
[i18n-es] Translating docs to Spanish
#28936 commented on Jul 7, 2025 • 0 new comments
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 commented on Jul 8, 2025 • 0 new comments
Incorrect scaling of Gemma embeddings in float32 regime
#38702 commented on Jul 9, 2025 • 0 new comments
Object detection training/fine-tuning for Owl-vit/Owlv2
#33664 commented on Jul 10, 2025 • 0 new comments
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
#38966 commented on Jul 10, 2025 • 0 new comments
Exporting Llava decoder into ONNX format
#38924 commented on Jul 11, 2025 • 0 new comments
Llama4 inference encounter unsupported op in dynamo ?
#38118 commented on Jul 11, 2025 • 0 new comments
Loading audio in video from video URLs fail with chat template
#39076 commented on Jul 14, 2025 • 0 new comments
Add Matching Anything by Segmenting Anything (MASA) MOT tracking model
#32164 commented on Jul 14, 2025 • 0 new comments
`MoshiIntegrationTests` started to fail after #34464
#38725 commented on Jul 15, 2025 • 0 new comments
Improve CI/CD by completing migration from setup.py to pyproject.toml
#38928 commented on Jul 15, 2025 • 0 new comments
Vision Encoder-Decoder fails with LLaMA decoder due to missing cross-attention implementation
#34674 commented on Jul 26, 2025 • 0 new comments
Exception while inference Qwen2VL and Qwen2VL, assert module.weight.shape[1] == 1
#38665 commented on Jul 27, 2025 • 0 new comments
pytorch_utils.py > isin_mps_friendly > RuntimeError: Expected elements.dtype() == test_elements.dtype() to be true, but got false.
#37423 commented on Jul 27, 2025 • 0 new comments
AttributeError: 'HfTrainerDeepSpeedConfig' object has no attribute 'is_zero3'
#39081 commented on Jul 28, 2025 • 0 new comments
facebook/dinov2-with-registers-giant does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
#39075 commented on Jul 28, 2025 • 0 new comments
Inefficient default GELU implementation in GPT2
#39073 commented on Jul 28, 2025 • 0 new comments
Inefficient memory resharding in attention layer
#39072 commented on Jul 28, 2025 • 0 new comments
enable GraniteMoeHybridIntegrationTest in UT
#38542 commented on Jul 28, 2025 • 0 new comments
Streaming mode support on HF vs kyutai-labs for the mimi model
#38535 commented on Jul 28, 2025 • 0 new comments
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#38442 commented on Jul 28, 2025 • 0 new comments
Whisper word-level timestamp extraction fails with beam search
#36093 commented on Jul 28, 2025 • 0 new comments
Issue with module.smart_apply(module._initialize_weights) in the initialize_weights Function of modeling_utils.py
#39027 commented on Jul 28, 2025 • 0 new comments
Output logits differ significantly for different attn_implementations on image inputs
#39067 commented on Jul 28, 2025 • 0 new comments
ValueError: GGUF model with architecture deci is not supported yet.
#37736 commented on Jul 28, 2025 • 0 new comments
[Contributions Welcome] Add Fast Image Processors
#36978 commented on Jul 29, 2025 • 0 new comments
[Community contributions] Model cards
#36979 commented on Jul 29, 2025 • 0 new comments
Implement MambaForSequenceClassification
#31155 commented on Jul 15, 2025 • 0 new comments
DeepSpeed sequence parallelism (aka Ulysses) integration with HF transformer
#32305 commented on Jul 15, 2025 • 0 new comments
Load a pretrainedfast tokenizer if fast=true and tokenizer.json exists
#33751 commented on Jul 15, 2025 • 0 new comments
`AutoTokenizer.from_pretrained` does not propagate `token`
#39030 commented on Jul 21, 2025 • 0 new comments
`load_balancing_loss_func` doesn't support 4D attention mask
#38910 commented on Jul 21, 2025 • 0 new comments
Transformers version causing my finetuned model to hallucinate
#38378 commented on Jul 21, 2025 • 0 new comments
Significant WER Increase with Whisper Chunking Compared to Long-Form Transcription
#38347 commented on Jul 21, 2025 • 0 new comments
Caching of model code in ~/.cache/huggingface/modules/transformers_modules
#39107 commented on Jul 22, 2025 • 0 new comments
Resuming training from an interrupted checkpoint fails to save the final checkpoint.
#38939 commented on Jul 22, 2025 • 0 new comments
How to streaming output audio of Qwen2.5-omni-7b
#37570 commented on Jul 22, 2025 • 0 new comments
tokenizer decode with timestamp fails for extended vocabulary
#35330 commented on Jul 22, 2025 • 0 new comments
Whisper v-3 pipeline requiring a lot of memory when setting return_timestamps="word"
#27834 commented on Jul 22, 2025 • 0 new comments
add MiniCPM-o
#37029 commented on Jul 22, 2025 • 0 new comments
🌐 [i18n-KO] Translating docs to Korean
#20179 commented on Jul 24, 2025 • 0 new comments
Model implementation using Liger Kernel layers
#38416 commented on Jul 24, 2025 • 0 new comments
Support for Multiple Datasets and Domain-Specific Loss Calculation in Trainer
#30725 commented on Jul 24, 2025 • 0 new comments
Trainer/accelerate doesn't save model when using FSDP with SHARDED_STATE_DICT
#30491 commented on Jul 24, 2025 • 0 new comments
Not able to use flash attention with torch.compile with model like BERT
#39017 commented on Jul 25, 2025 • 0 new comments
'Mistral3Model' object has no attribute 'prepare_inputs_for_generation'
#39007 commented on Jul 25, 2025 • 0 new comments
Segfault on Apple M4 using AutoModelForSequenceClassification with BETO model on CPU
#39020 commented on Jul 25, 2025 • 0 new comments
pytorch version 1.8.1 compatibility
#39049 commented on Jul 26, 2025 • 0 new comments
Only with newest version (4.52.4): from_pretrained() esm.embeddings.position_embeddings.weight missing
#39038 commented on Jul 26, 2025 • 0 new comments
