Insights: huggingface/transformers
Overview
3 Releases published by 2 people
-
v4.53.3 Patch release v4.53.3
published
Jul 22, 2025 -
v4.53.2-Ernie-4.5-preview Ernie-4.5 and Ernie-4.5 MoE (based on v4.53.2)
published
Jul 23, 2025
100 Pull requests merged by 55 people
-
fix missing model._tp_size from ep refactor
#39688 merged
Jul 26, 2025 -
More robust tied weight test
#39681 merged
Jul 25, 2025 -
Add padding-free to Granite hybrid moe models
#39677 merged
Jul 25, 2025 -
Fix tied weight test
#39680 merged
Jul 25, 2025 -
fix break for ckpt without _tp_plan
#39658 merged
Jul 25, 2025 -
Add EXAONE 4.0 model
#39129 merged
Jul 25, 2025 -
Support typing.Literal as type of tool parameters or return value
#39633 merged
Jul 25, 2025 -
Add ep
#39501 merged
Jul 25, 2025 -
bad_words_ids no longer slow on mps
#39556 merged
Jul 25, 2025 -
Add xlstm model
#39665 merged
Jul 25, 2025 -
Use auto_docstring for perception_lm fast image processor
#39679 merged
Jul 25, 2025 -
fix: HWIO to OIHW
#39200 merged
Jul 25, 2025 -
Fix auto_docstring crashing when dependencies are missing
#39564 merged
Jul 25, 2025 -
Add support for DeepseekAI's DeepseekVL
#36248 merged
Jul 25, 2025 -
Add missing flag for CacheLayer
#39678 merged
Jul 25, 2025 -
Add evolla rebase main
#36232 merged
Jul 25, 2025 -
update expected outputs for whisper after #38778
#39304 merged
Jul 25, 2025 -
fix kyutai tests
#39416 merged
Jul 25, 2025 -
Fixes the BC
#39636 merged
Jul 25, 2025 -
Delete bad rebasing functions
#39672 merged
Jul 25, 2025 -
[Ernie 4.5] Post merge adaptations
#39664 merged
Jul 25, 2025 -
[CI] revert device in test_export_static_cache
#39662 merged
Jul 25, 2025 -
Fix ModernBERT Decoder model
#39671 merged
Jul 25, 2025 -
🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor
#39591 merged
Jul 25, 2025 -
Rename huggingface_cli to hf
#39630 merged
Jul 25, 2025 -
fix(voxtral): correct typo in apply_transcription_request
#39572 merged
Jul 25, 2025 -
make fixup
#39661 merged
Jul 25, 2025 -
[docs] fix ko cache docs
#39644 merged
Jul 25, 2025 -
Make pytorch examples UV-compatible
#39635 merged
Jul 25, 2025 -
revert change to cu_seqlen_k and max_k when preparing from position_ids
#39653 merged
Jul 25, 2025 -
Fix: explicit not none check for tensors in flash attention
#39639 merged
Jul 25, 2025 -
[attention] fix test for packed padfree masking
#39582 merged
Jul 25, 2025 -
Add owlv2 fast processor
#39041 merged
Jul 25, 2025 -
revert behavior of _prepare_from_posids
#39622 merged
Jul 24, 2025 -
[Voxtral] values for A10 runners
#39605 merged
Jul 24, 2025 -
[timm] new timm pin
#39640 merged
Jul 24, 2025 -
Fix EfficientLoFTR model id in tests
#39621 merged
Jul 24, 2025 -
Update recent processors for vLLM backend
#39583 merged
Jul 24, 2025 -
[Docs] Translate audio_classification.md from English to Spanish
#39513 merged
Jul 23, 2025 -
standardized YOLOS model card according to template in #36979
#39528 merged
Jul 23, 2025 -
Feature/standardize opt model card
#39568 merged
Jul 23, 2025 -
🔴 Fix EnCodec internals and integration tests
#39431 merged
Jul 23, 2025 -
Fix DAC integration tests and checkpoint conversion.
#39313 merged
Jul 23, 2025 -
Move openai import
#39613 merged
Jul 23, 2025 -
Transformers serve VLM
#39454 merged
Jul 23, 2025 -
Fix important models CI
#39576 merged
Jul 23, 2025 -
Fix typos and grammar issues in documentation and code
#39598 merged
Jul 23, 2025 -
Allow device_mesh to have multiple dims
#38949 merged
Jul 23, 2025 -
enable triton backend on awq xpu
#39443 merged
Jul 23, 2025 -
[idefics3] fix for vLLM
#39470 merged
Jul 23, 2025 -
fix moe routing_weights
#39581 merged
Jul 23, 2025 -
FP-Quant support
#38696 merged
Jul 23, 2025 -
Rename supports_static_cache to can_compile_fullgraph
#39505 merged
Jul 23, 2025 -
[Trackio] Allow single-gpu training and monitor power
#39595 merged
Jul 23, 2025 -
Generic task-specific base classes
#39584 merged
Jul 23, 2025 -
Fix DynamicCache and simplify Cache classes a bit
#39590 merged
Jul 23, 2025 -
Mask2former & Maskformer Fast Image Processor
#35685 merged
Jul 23, 2025 -
🎯 Trackio integration
#38814 merged
Jul 22, 2025 -
[WIP] Add OneformerFastImageProcessor
#38343 merged
Jul 22, 2025 -
Fix link in "Inference server backends" doc
#39589 merged
Jul 22, 2025 -
Torchdec RuntimeError catch
#39580 merged
Jul 22, 2025 -
[Paged-Attention] Handle continuous batching for repetition penalty
#39457 merged
Jul 22, 2025 -
updated mistral3 model card
#39531 merged
Jul 22, 2025 -
Update docs/source/ko/_toctree.yml
#39516 merged
Jul 22, 2025 -
[cache refactor] Move all the caching logic to a per-layer approach
#39106 merged
Jul 22, 2025 -
General weight initialization scheme
#39579 merged
Jul 22, 2025 -
Add AMD GPU expectations for LLaVA tests
#39486 merged
Jul 22, 2025 -
Kernels flash attn
#39474 merged
Jul 22, 2025 -
Add AMD expectations to Mistral3 tests
#39481 merged
Jul 22, 2025 -
[docs] Create page on inference servers with transformers backend
#39550 merged
Jul 22, 2025 -
[docs] update attention implementation and cache docs
#39547 merged
Jul 22, 2025 -
Add AMD test expectations to DETR model
#39539 merged
Jul 22, 2025 -
feat: add support for gradient checkpointing for TimmWrapperModel and TimmWrapperForImageClassification
#39287 merged
Jul 22, 2025 -
Fixes needed for n-d parallelism and TP
#39562 merged
Jul 22, 2025 -
Bump AMD container for 2.7.1 PyTorch
#39458 merged
Jul 22, 2025 -
Add EfficientLoFTR model
#36355 merged
Jul 22, 2025 -
[gemma3] fix bidirectional image mask
#39396 merged
Jul 22, 2025 -
Update OLMoE model card
#39344 merged
Jul 21, 2025 -
Update modernbertdecoder docs
#39453 merged
Jul 21, 2025 -
[CI] Fix post merge ernie 4.5
#39561 merged
Jul 21, 2025 -
[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps)
#39489 merged
Jul 21, 2025 -
[Ernie 4.5] Add ernie text models
#39228 merged
Jul 21, 2025 -
Refactor embedding input/output getter/setter
#39339 merged
Jul 21, 2025 -
🌐 [i18n-KO] Translated perf_infer_gpu_multi.md to Korean
#39441 merged
Jul 21, 2025 -
[Fast image processor] refactor fast image processor glm4v
#39490 merged
Jul 21, 2025 -
fix ndim check of device_mesh for TP
#39538 merged
Jul 21, 2025 -
Refactor MambaCache to modeling_mamba.py
#38086 merged
Jul 21, 2025 -
Fix Docstring of BarkProcessor
#39546 merged
Jul 21, 2025 -
use the enable_gqa param in torch.nn.functional.scaled_dot_product_at…
#39412 merged
Jul 21, 2025 -
Fix missing initializations for models created in 2023
#39239 merged
Jul 21, 2025 -
Raise TypeError instead of ValueError for invalid types
#38660 merged
Jul 21, 2025 -
Fix pylint warnings
#39477 merged
Jul 21, 2025 -
Fix Qwen Omni integration test
#39553 merged
Jul 21, 2025 -
🚨🚨🚨 [Trainer] Enable average_tokens_across_devices by default in TrainingArguments
#39395 merged
Jul 21, 2025 -
Rename _supports_flash_attn_2 in examples and tests
#39471 merged
Jul 21, 2025 -
Fix the check in flex test
#39548 merged
Jul 21, 2025 -
Fix bad tensor shape in failing Hubert test.
#39502 merged
Jul 21, 2025 -
GLM-4 Update
#39393 merged
Jul 21, 2025 -
[qwen2 vl] fix packing with all attentions
#39447 merged
Jul 21, 2025 -
[gemma3] support sequence classification task
#39465 merged
Jul 21, 2025
73 Pull requests opened by 61 people
-
build: Add fast image processor tvp
#39529 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `tokenizer.md` to Korean
#39532 opened
Jul 20, 2025 -
Add Beit3 model
#39534 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `how_to_hack_models.md` to Korean
#39536 opened
Jul 20, 2025 -
[Voxtral] Fix typo
#39540 opened
Jul 20, 2025 -
Add Muon optimizer implementation and integration
#39541 opened
Jul 20, 2025 -
added smollama base model - 1B parameter
#39543 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated feature_extractors.md to Korean
#39544 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `perf_train_gpu_one.md` to Korean
#39552 opened
Jul 21, 2025 -
[WIP] try to relax the tie_weights method
#39555 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 opened
Jul 21, 2025 -
fix load_model_end = true work when save_steps < eval_steps
#39560 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 opened
Jul 21, 2025 -
[i18n-KO] Translated `auto_docstring.md` to Korean
#39571 opened
Jul 22, 2025 -
xpu optimization for generation case
#39573 opened
Jul 22, 2025 -
feat(autoformer): Improve ValueError for insufficient sequence length
#39574 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `vitpose.md` to Korean
#39575 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `tvp.md` to Korean
#39578 opened
Jul 22, 2025 -
[`Ernie 4.5`] Ernie VL models
#39585 opened
Jul 22, 2025 -
fix(tokenization): check token.content for trie
#39587 opened
Jul 22, 2025 -
WIP, reference modeling
#39588 opened
Jul 22, 2025 -
Add Fast Image Processor for ImageGPT
#39592 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated 'xclip.md' to Korean
#39594 opened
Jul 22, 2025 -
Fix: check TrainerState file exists before loading during resume
#39599 opened
Jul 23, 2025 -
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 opened
Jul 23, 2025 -
[`CI`] Add Eric to comment slow ci
#39601 opened
Jul 23, 2025 -
feat: add `is_fast` to ImageProcessor
#39603 opened
Jul 23, 2025 -
Update model card for Cohere2 (Command R7B)
#39604 opened
Jul 23, 2025 -
HunYuan opensource
#39606 opened
Jul 23, 2025 -
Chat schemas
#39609 opened
Jul 23, 2025 -
Fix return typehint for decoder and annotate inv_freq
#39610 opened
Jul 23, 2025 -
Rework add-new-model-like with modular
#39612 opened
Jul 23, 2025 -
Export SmolvLM
#39614 opened
Jul 23, 2025 -
Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39617 opened
Jul 23, 2025 -
docs: Update EfficientLoFTR documentation
#39620 opened
Jul 23, 2025 -
fix tensor device when loading state dict
#39623 opened
Jul 24, 2025 -
Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI
#39625 opened
Jul 24, 2025 -
🌐 [i18n-KO] Translated '<text-to-speech>.md' to Korean
#39628 opened
Jul 24, 2025 -
[processors] add tests for helper fn
#39629 opened
Jul 24, 2025 -
[serve] Add speech-to-text
#39631 opened
Jul 24, 2025 -
fix dead NVIDIA link
#39632 opened
Jul 24, 2025 -
Reorder serving docs
#39634 opened
Jul 24, 2025 -
Support loading Qwen3 MoE GGUF
#39638 opened
Jul 24, 2025 -
Fix quant docker for fp-quant
#39641 opened
Jul 24, 2025 -
mllama outputs refactor
#39643 opened
Jul 24, 2025 -
fix chameleonvision UT failure
#39646 opened
Jul 24, 2025 -
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 opened
Jul 24, 2025 -
Add self-hosted runner scale set workflow for mi325 CI
#39651 opened
Jul 24, 2025 -
extend more trainer test cases to XPU, all pass
#39652 opened
Jul 25, 2025 -
Enable xpu allocator on caching_allocator_warmup
#39654 opened
Jul 25, 2025 -
update ernie model card
#39657 opened
Jul 25, 2025 -
fix(trainer): Correct loss scaling for incomplete gradient accumulation steps
#39659 opened
Jul 25, 2025 -
[docs] Ko doc fixes after toc update
#39660 opened
Jul 25, 2025 -
[modular] small fixes
#39663 opened
Jul 25, 2025 -
Update `QAPipelineTests::test_large_model_course` after #39193
#39666 opened
Jul 25, 2025 -
Reduce atol values in test_dynamic_cache_exportability
#39667 opened
Jul 25, 2025 -
Fix AMD dockerfile for audio models
#39669 opened
Jul 25, 2025 -
skip `Glm4MoeModelTest::test_torch_compile_for_training`
#39670 opened
Jul 25, 2025 -
Fix loss scaling and token aggregation to use only data parallel group
#39674 opened
Jul 25, 2025 -
[BugFix]: Support dict and config file path for deepspeed
#39675 opened
Jul 25, 2025 -
Fix cache-related tests
#39676 opened
Jul 25, 2025 -
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 opened
Jul 25, 2025 -
Fix missing initialization of `FastSpeech2Conformer`
#39689 opened
Jul 26, 2025 -
Allow custom hf_quantizer in from_pretrained
#39690 opened
Jul 26, 2025 -
fix misspelled issues
#39691 opened
Jul 26, 2025 -
PATCH: add back n-dim device-mesh
#39693 opened
Jul 26, 2025 -
Don't set `run_name` when none
#39695 opened
Jul 26, 2025 -
use untyped storage for dtensors due to deprecation
#39697 opened
Jul 26, 2025 -
Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG"
#39698 opened
Jul 26, 2025 -
properly save model across tensor parallel processes
#39700 opened
Jul 26, 2025
35 Issues closed by 12 people
-
CRITICAL ISSUE REPORT! GEMMA 3 1B CANNOT RUN!
#39686 closed
Jul 26, 2025 -
text-generation extremely slow with large `bad_words_ids` list
#39512 closed
Jul 25, 2025 -
Does Gemma 3 need positions ids to be 1-indexed explicitly?
#39023 closed
Jul 25, 2025 -
Add Deepseek-VL
#36110 closed
Jul 25, 2025 -
Grammatical error in the "Loading model's" page
#39018 closed
Jul 25, 2025 -
Inference API Returning 404
#39650 closed
Jul 25, 2025 -
Backwards incompatible change in returned hidden states
#39558 closed
Jul 25, 2025 -
Typo in `apply_transcrition_request` method name
#39530 closed
Jul 25, 2025 -
video_auto_processing.py breaks everything
#38846 closed
Jul 25, 2025 -
Should `compute_metrics` only run on the main process when doing DDP?
#38851 closed
Jul 25, 2025 -
VoxtralForConditionalGeneration import error
#39611 closed
Jul 24, 2025 -
`Trainer._save()` May Incorrectly Save Empty Model State (safetensors)
#38686 closed
Jul 24, 2025 -
Wandb isn't logging config in offline mode
#38968 closed
Jul 23, 2025 -
The similarity between image and text in siglip2 is very low
#39597 closed
Jul 23, 2025 -
Does Qwen_2_5_VL support variable length attention computation?
#38007 closed
Jul 23, 2025 -
Have to import cv2 and pop up a window first, or else it gets stuck forever
#38139 closed
Jul 23, 2025 -
CI skipped failures tracking issue
#38820 closed
Jul 23, 2025 -
"ValueError: Predictions and/or references don't match the expected format." error
#39510 closed
Jul 22, 2025 -
Clarification on Recent Changes to Loss and Gradient Accumulation
#39567 closed
Jul 22, 2025 -
Add EfficientLoFTR model
#36354 closed
Jul 22, 2025 -
Gemma3 bidirectional mask for image tokens isn't reaching attention forward
#39389 closed
Jul 22, 2025 -
Is the new Intel–Weizmann speculative decoding algorithm integrated into Transformers?
#39545 closed
Jul 21, 2025 -
Enabling `average_tokens_across_devices` by default in Trainer
#39392 closed
Jul 21, 2025 -
T5Gemma problem with tokenizer(?)
#39521 closed
Jul 21, 2025 -
Causal mask is not compatible with Qwen2-VL when using padding-free training
#39400 closed
Jul 21, 2025 -
KeyError: 'llava_qwen2'
#39533 closed
Jul 21, 2025 -
Add Gemma 3 For Sequence Classification
#36755 closed
Jul 21, 2025 -
Expected all tensors to be on the same device, but found at least two devices
#37545 closed
Jul 21, 2025 -
DynamicCache results in too many torch recompiles after 4.51
#37908 closed
Jul 21, 2025 -
Confusion about num_labels and problem_type in classification logic 🐛
#38219 closed
Jul 21, 2025 -
Silent Overwrite of Custom Optimizer When Using DeepSpeed with Transformers Trainer
#38753 closed
Jul 21, 2025 -
DTensor issues when running Llama4ForConditionalGeneration with tensor parallel.
#38803 closed
Jul 21, 2025 -
Version 4.52.3 leads to error after bundling with pyinstaller
#38402 closed
Jul 20, 2025
28 Issues opened by 27 people
-
No flag to support Conditional Parameter Loading for gemma-3n-E2B models in transformer
#39699 opened
Jul 26, 2025 -
SigLIP2 documentation example has multiple errors (model/processor mismatch + quantization failure)
#39692 opened
Jul 26, 2025 -
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 opened
Jul 26, 2025 -
Qwen 2.5 VL - error without attention_mask
#39685 opened
Jul 26, 2025 -
Add multi-candidate & tree search for assisted decoding (speculative decoding)
#39684 opened
Jul 25, 2025 -
Accelerate beam search decoding via tree attention
#39682 opened
Jul 25, 2025 -
error: argument --deepspeed: invalid dict value: '<path>'
#39673 opened
Jul 25, 2025 -
Issue when initializing a DynamicCache
#39668 opened
Jul 25, 2025 -
T5Gemma training not working
#39656 opened
Jul 25, 2025 -
Please develop DataCollatorForVisionLanguageModeling to support visual model training !!!
#39647 opened
Jul 24, 2025 -
[XPU] Model get OOM when loading models
#39627 opened
Jul 24, 2025 -
FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39619 opened
Jul 23, 2025 -
SageAttention for attention implementation?
#39618 opened
Jul 23, 2025 -
Trainer: Error when folded metrics are saved
#39616 opened
Jul 23, 2025 -
Qwen3 Fails w/4D Attn Mask when using FA2
#39608 opened
Jul 23, 2025 -
ImageClassificationPipeline preprocess should accept numpy/tensor arrays
#39607 opened
Jul 23, 2025 -
Does transformers support python3.13 --disable-gil or python3.14 free threading?
#39596 opened
Jul 23, 2025 -
AddedToken should check content on `_update`
#39586 opened
Jul 22, 2025 -
Model forward execution in full eager mode?
#39565 opened
Jul 21, 2025 -
Why `is_causal` is not used in `flash_attention_forward` ?
#39554 opened
Jul 21, 2025 -
Is there plan to integrate ColQwen2.5 into Transformers?
#39549 opened
Jul 21, 2025 -
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 opened
Jul 21, 2025 -
Add muon and flash-muon optimizer
#39537 opened
Jul 20, 2025 -
training google colab error
#39527 opened
Jul 19, 2025
96 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[WiP] Add xcodec2 model
#37868 commented on
Jul 25, 2025 • 55 new comments -
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 commented on
Jul 26, 2025 • 22 new comments -
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 commented on
Jul 25, 2025 • 15 new comments -
Add standardized model card for facebook/data2vec-audio-base-960h
#39368 commented on
Jul 24, 2025 • 10 new comments -
🌐 [i18n-KO] Translated `main_classes/peft.md`
#39515 commented on
Jul 24, 2025 • 8 new comments -
blt wip
#38579 commented on
Jul 24, 2025 • 6 new comments -
🌐 [i18n-KO] Translated processors.md to Korean
#39519 commented on
Jul 24, 2025 • 5 new comments -
fix: filter None router logits in Qwen3 MoE and handle empty router logits (#39203)
#39206 commented on
Jul 21, 2025 • 5 new comments -
Fix the issue that csm model cannot work with pipeline mode.
#39349 commented on
Jul 24, 2025 • 5 new comments -
Add Segment Anything 2 (SAM2)
#32317 commented on
Jul 24, 2025 • 5 new comments -
Add T5LA models
#39293 commented on
Jul 25, 2025 • 4 new comments -
feat(tokenization): add encode_message to tokenize messages one by one
#39507 commented on
Jul 23, 2025 • 4 new comments -
Add fastconformer encoder support for nvidia/parakeet and nvidia/canary models
#39062 commented on
Jul 23, 2025 • 4 new comments -
[WIP] Add MM Grounding DINO
#37925 commented on
Jul 26, 2025 • 4 new comments -
BLIPs clean-up
#35560 commented on
Jul 21, 2025 • 3 new comments -
Fix Bark failing tests
#39478 commented on
Jul 24, 2025 • 2 new comments -
add pin memory and block table
#39130 commented on
Jul 21, 2025 • 1 new comment -
Add support for Florence-2
#38188 commented on
Jul 26, 2025 • 1 new comment -
Provide clearer instructions on how to specify target language.
#38786 commented on
Jul 21, 2025 • 1 new comment -
Add Dust3R
#38805 commented on
Jul 22, 2025 • 0 new comments -
[configuration] remove redundant `classmethod`
#38812 commented on
Jul 23, 2025 • 0 new comments -
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on
Jul 22, 2025 • 0 new comments -
🌐 [i18n-KO] Translated albert.md to Korean
#39524 commented on
Jul 26, 2025 • 0 new comments -
Add Bagel
#38569 commented on
Jul 25, 2025 • 0 new comments -
Add X-Codec model
#38248 commented on
Jul 23, 2025 • 0 new comments -
[omni modality] support composite processor config
#38142 commented on
Jul 25, 2025 • 0 new comments -
Update ruff to 0.12.3 and apply its fixes
#37809 commented on
Jul 21, 2025 • 0 new comments -
Superpoint fast image processor
#37804 commented on
Jul 25, 2025 • 0 new comments -
Add callback to monitor progress in whisper transcription
#37483 commented on
Jul 22, 2025 • 0 new comments -
Apply several ruff SIM rules
#37283 commented on
Jul 22, 2025 • 0 new comments -
Add Fast Segformer Processor
#37024 commented on
Jul 25, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `pipeline_gradio.md` to Korean
#39520 commented on
Jul 26, 2025 • 0 new comments -
Fix `Qwen2AudioForConditionalGeneration.forward()`
#39503 commented on
Jul 24, 2025 • 0 new comments -
[WIP] Add support for including video object in apply_chat_template function
#39494 commented on
Jul 20, 2025 • 0 new comments -
Update CTRL model card with improved usage examples and documentation notes
#39487 commented on
Jul 21, 2025 • 0 new comments -
Add model arcinstitute state
#39480 commented on
Jul 25, 2025 • 0 new comments -
Skipping `initialize_weights` when model is quantized
#39464 commented on
Jul 26, 2025 • 0 new comments -
Add eurobert
#39455 commented on
Jul 25, 2025 • 0 new comments -
Add Vocos model
#39403 commented on
Jul 24, 2025 • 0 new comments -
[RoPE] allow models to configure local RoPE
#39397 commented on
Jul 24, 2025 • 0 new comments -
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 commented on
Jul 24, 2025 • 0 new comments -
Add dates to the model docs
#39320 commented on
Jul 25, 2025 • 0 new comments -
Feat: add Kwai-Keye transformers
#39292 commented on
Jul 26, 2025 • 0 new comments -
Add support for `ModernBertForMultipleChoice`
#39232 commented on
Jul 24, 2025 • 0 new comments -
Fix inconsistent `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39221 commented on
Jul 21, 2025 • 0 new comments -
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on
Jul 26, 2025 • 0 new comments -
Bug/38843 fix pos idx in fp32 parameter error
#39064 commented on
Jul 22, 2025 • 0 new comments -
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on
Jul 21, 2025 • 0 new comments -
Add `SepCache` [An efficient and easy-to-use Cache from the SepLLM paper - ICML 2025 (https://arxiv.org/abs/2412.12094) ] to the `cache_utils.py` and `__init__.py`
#38824 commented on
Jul 22, 2025 • 0 new comments -
[WIP] Computer vision util: vision visualizer
#36892 commented on
Jul 25, 2025 • 0 new comments -
Unexpected behaviour with transformers versions above 4.28 for Donut
#39473 commented on
Jul 23, 2025 • 0 new comments -
Add Interactive Multi-Modal Attention Visualization for Vision-Language Models
#39440 commented on
Jul 23, 2025 • 0 new comments -
Whisper word-level timestamp extraction fails with beam search
#36093 commented on
Jul 23, 2025 • 0 new comments -
_load_rng_state after get_batch_samples may break training reproducibility when dataloader has random operations
#39215 commented on
Jul 23, 2025 • 0 new comments -
Export voxtral to ExecuTorch
#39511 commented on
Jul 23, 2025 • 0 new comments -
Whisper `return_language` with pipeline no longer working
#39404 commented on
Jul 23, 2025 • 0 new comments -
object detection : matchin outputs.last_hidden_state with results
#39426 commented on
Jul 22, 2025 • 0 new comments -
Unknown Model (mobilenetv5_300m_enc) when loading Gemma 3n
#39208 commented on
Jul 22, 2025 • 0 new comments -
add MiniCPM-o
#37029 commented on
Jul 22, 2025 • 0 new comments -
Whisper v-3 pipeline requiring a lot of memory when setting return_timestamps="word"
#27834 commented on
Jul 22, 2025 • 0 new comments -
tokenizer decode with timestamp fails for extended vocabulary
#35330 commented on
Jul 22, 2025 • 0 new comments -
How to streaming output audio of Qwen2.5-omni-7b
#37570 commented on
Jul 22, 2025 • 0 new comments -
Resuming training from an interrupted checkpoint fails to save the final checkpoint.
#38939 commented on
Jul 22, 2025 • 0 new comments -
Caching of model code in ~/.cache/huggingface/modules/transformers_modules
#39107 commented on
Jul 22, 2025 • 0 new comments -
T5Gemma failing on provided example
#39522 commented on
Jul 21, 2025 • 0 new comments -
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 commented on
Jul 21, 2025 • 0 new comments -
Significant WER Increase with Whisper Chunking Compared to Long-Form Transcription
#38347 commented on
Jul 21, 2025 • 0 new comments -
Transformers version causing my finetuned model to hallucinate
#38378 commented on
Jul 21, 2025 • 0 new comments -
`load_balancing_loss_func` doesn't support 4D attention mask
#38910 commented on
Jul 21, 2025 • 0 new comments -
`AutoTokenizer.from_pretrained` does not propagate `token`
#39030 commented on
Jul 21, 2025 • 0 new comments -
Implement Titans Architecture with GRPO Fine-Tuning
#36352 commented on
Jul 21, 2025 • 0 new comments -
Error: StaticCache.__init__() got an unexpected keyword argument 'batch_size'
#38914 commented on
Jul 20, 2025 • 0 new comments -
Checkpointing broken for classifier training multi-gpu
#38925 commented on
Jul 20, 2025 • 0 new comments -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 commented on
Jul 23, 2025 • 0 new comments -
Fix ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36011 commented on
Jul 21, 2025 • 0 new comments -
use warning_once instead of warning in Trainer.tokenizer
#35482 commented on
Jul 25, 2025 • 0 new comments -
Add FAST
#35476 commented on
Jul 22, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Jul 21, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Jul 26, 2025 • 0 new comments -
Vision Encoder-Decoder fails with LLaMA decoder due to missing cross-attention implementation
#34674 commented on
Jul 26, 2025 • 0 new comments -
Only with newest version (4.52.4): from_pretrained() esm.embeddings.position_embeddings.weight missing
#39038 commented on
Jul 26, 2025 • 0 new comments -
pytorch version 1.8.1 compatibility
#39049 commented on
Jul 26, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Jul 26, 2025 • 0 new comments -
Segfault on Apple M4 using AutoModelForSequenceClassification with BETO model on CPU
#39020 commented on
Jul 25, 2025 • 0 new comments -
Support for per-token latency tracking in `generate()` (suggested options: using callback, profiler class, or using a config flag)
#39437 commented on
Jul 25, 2025 • 0 new comments -
'Mistral3Model' object has no attribute 'prepare_inputs_for_generation'
#39007 commented on
Jul 25, 2025 • 0 new comments -
Not able to use flash attention with torch.compile with model like BERT
#39017 commented on
Jul 25, 2025 • 0 new comments -
Issue with module.smart_apply(module._initialize_weights) in the initialize_weights Function of modeling_utils.py
#39027 commented on
Jul 25, 2025 • 0 new comments -
Trainer/accelerate doesn't save model when using FSDP with SHARDED_STATE_DICT
#30491 commented on
Jul 24, 2025 • 0 new comments -
Support for Multiple Datasets and Domain-Specific Loss Calculation in Trainer
#30725 commented on
Jul 24, 2025 • 0 new comments -
Model implementation using Liger Kernel layers
#38416 commented on
Jul 24, 2025 • 0 new comments -
Allow `load_best_model_at_end=True` to work when `save_steps < eval_steps` and best model is saved
#39476 commented on
Jul 24, 2025 • 0 new comments -
Adding support for Gemma 3n GGUFs
#39329 commented on
Jul 24, 2025 • 0 new comments -
🌐 [i18n-KO] Translating docs to Korean
#20179 commented on
Jul 24, 2025 • 0 new comments -
Add DiCoW: Diarization-Conditioned Whisper
#39430 commented on
Jul 24, 2025 • 0 new comments -
Whisper transcription is 2x slower between 4.51.3 -> 4.52.1
#39508 commented on
Jul 24, 2025 • 0 new comments