Insights: huggingface/transformers
Overview
3 Releases published by 2 people
-
v4.53.3 Patch release v4.53.3
published
Jul 22, 2025 -
v4.53.2-Ernie-4.5-preview Ernie-4.5 and Ernie-4.5 MoE (based on v4.53.2)
published
Jul 23, 2025
100 Pull requests merged by 55 people
-
fix missing model._tp_size from ep refactor
#39688 merged
Jul 26, 2025 -
More robust tied weight test
#39681 merged
Jul 25, 2025 -
Add padding-free to Granite hybrid moe models
#39677 merged
Jul 25, 2025 -
Fix tied weight test
#39680 merged
Jul 25, 2025 -
fix break for ckpt without _tp_plan
#39658 merged
Jul 25, 2025 -
Add EXAONE 4.0 model
#39129 merged
Jul 25, 2025 -
Support typing.Literal as type of tool parameters or return value
#39633 merged
Jul 25, 2025 -
Add ep
#39501 merged
Jul 25, 2025 -
bad_words_ids no longer slow on mps
#39556 merged
Jul 25, 2025 -
Add xlstm model
#39665 merged
Jul 25, 2025 -
Use auto_docstring for perception_lm fast image processor
#39679 merged
Jul 25, 2025 -
fix: HWIO to OIHW
#39200 merged
Jul 25, 2025 -
Fix auto_docstring crashing when dependencies are missing
#39564 merged
Jul 25, 2025 -
Add support for DeepseekAI's DeepseekVL
#36248 merged
Jul 25, 2025 -
Add missing flag for CacheLayer
#39678 merged
Jul 25, 2025 -
Add evolla rebase main
#36232 merged
Jul 25, 2025 -
update expected outputs for whisper after #38778
#39304 merged
Jul 25, 2025 -
fix kyutai tests
#39416 merged
Jul 25, 2025 -
Fixes the BC
#39636 merged
Jul 25, 2025 -
Delete bad rebasing functions
#39672 merged
Jul 25, 2025 -
[Ernie 4.5] Post merge adaptations
#39664 merged
Jul 25, 2025 -
[CI] revert device in test_export_static_cache
#39662 merged
Jul 25, 2025 -
Fix ModernBERT Decoder model
#39671 merged
Jul 25, 2025 -
🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor
#39591 merged
Jul 25, 2025 -
Rename huggingface_cli to hf
#39630 merged
Jul 25, 2025 -
fix(voxtral): correct typo in apply_transcription_request
#39572 merged
Jul 25, 2025 -
make fixup
#39661 merged
Jul 25, 2025 -
[docs] fix ko cache docs
#39644 merged
Jul 25, 2025 -
Make pytorch examples UV-compatible
#39635 merged
Jul 25, 2025 -
revert change to cu_seqlen_k and max_k when preparing from position_ids
#39653 merged
Jul 25, 2025 -
Fix: explicit not none check for tensors in flash attention
#39639 merged
Jul 25, 2025 -
[attention] fix test for packed padfree masking
#39582 merged
Jul 25, 2025 -
Add owlv2 fast processor
#39041 merged
Jul 25, 2025 -
revert behavior of _prepare_from_posids
#39622 merged
Jul 24, 2025 -
[Voxtral] values for A10 runners
#39605 merged
Jul 24, 2025 -
[timm] new timm pin
#39640 merged
Jul 24, 2025 -
Fix EfficientLoFTR model id in tests
#39621 merged
Jul 24, 2025 -
Update recent processors for vLLM backend
#39583 merged
Jul 24, 2025 -
[Docs] Translate audio_classification.md from English to Spanish
#39513 merged
Jul 23, 2025 -
standardized YOLOS model card according to template in #36979
#39528 merged
Jul 23, 2025 -
Feature/standardize opt model card
#39568 merged
Jul 23, 2025 -
🔴 Fix EnCodec internals and integration tests
#39431 merged
Jul 23, 2025 -
Fix DAC integration tests and checkpoint conversion.
#39313 merged
Jul 23, 2025 -
Move openai import
#39613 merged
Jul 23, 2025 -
Transformers serve VLM
#39454 merged
Jul 23, 2025 -
Fix important models CI
#39576 merged
Jul 23, 2025 -
Fix typos and grammar issues in documentation and code
#39598 merged
Jul 23, 2025 -
Allow device_mesh to have multiple dims
#38949 merged
Jul 23, 2025 -
enable triton backend on awq xpu
#39443 merged
Jul 23, 2025 -
[idefics3] fix for vLLM
#39470 merged
Jul 23, 2025 -
fix moe routing_weights
#39581 merged
Jul 23, 2025 -
FP-Quant support
#38696 merged
Jul 23, 2025 -
Rename supports_static_cache to can_compile_fullgraph
#39505 merged
Jul 23, 2025 -
[Trackio] Allow single-gpu training and monitor power
#39595 merged
Jul 23, 2025 -
Generic task-specific base classes
#39584 merged
Jul 23, 2025 -
Fix DynamicCache and simplify Cache classes a bit
#39590 merged
Jul 23, 2025 -
Mask2former & Maskformer Fast Image Processor
#35685 merged
Jul 23, 2025 -
🎯 Trackio integration
#38814 merged
Jul 22, 2025 -
[WIP] Add OneformerFastImageProcessor
#38343 merged
Jul 22, 2025 -
Fix link in "Inference server backends" doc
#39589 merged
Jul 22, 2025 -
Torchdec RuntimeError catch
#39580 merged
Jul 22, 2025 -
[Paged-Attention] Handle continuous batching for repetition penalty
#39457 merged
Jul 22, 2025 -
updated mistral3 model card
#39531 merged
Jul 22, 2025 -
Update docs/source/ko/_toctree.yml
#39516 merged
Jul 22, 2025 -
[cache refactor] Move all the caching logic to a per-layer approach
#39106 merged
Jul 22, 2025 -
General weight initialization scheme
#39579 merged
Jul 22, 2025 -
Add AMD GPU expectations for LLaVA tests
#39486 merged
Jul 22, 2025 -
Kernels flash attn
#39474 merged
Jul 22, 2025 -
Add AMD expectations to Mistral3 tests
#39481 merged
Jul 22, 2025 -
[docs] Create page on inference servers with transformers backend
#39550 merged
Jul 22, 2025 -
[docs] update attention implementation and cache docs
#39547 merged
Jul 22, 2025 -
Add AMD test expectations to DETR model
#39539 merged
Jul 22, 2025 -
feat: add support for gradient checkpointing for TimmWrapperModel and TimmWrapperForImageClassification
#39287 merged
Jul 22, 2025 -
Fixes needed for n-d parallelism and TP
#39562 merged
Jul 22, 2025 -
Bump AMD container for 2.7.1 PyTorch
#39458 merged
Jul 22, 2025 -
Add EfficientLoFTR model
#36355 merged
Jul 22, 2025 -
[gemma3] fix bidirectional image mask
#39396 merged
Jul 22, 2025 -
Update OLMoE model card
#39344 merged
Jul 21, 2025 -
Update modernbertdecoder docs
#39453 merged
Jul 21, 2025 -
[CI] Fix post merge ernie 4.5
#39561 merged
Jul 21, 2025 -
[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps)
#39489 merged
Jul 21, 2025 -
[Ernie 4.5] Add ernie text models
#39228 merged
Jul 21, 2025 -
Refactor embedding input/output getter/setter
#39339 merged
Jul 21, 2025 -
🌐 [i18n-KO] Translated perf_infer_gpu_multi.md to Korean
#39441 merged
Jul 21, 2025 -
[Fast image processor] refactor fast image processor glm4v
#39490 merged
Jul 21, 2025 -
fix ndim check of device_mesh for TP
#39538 merged
Jul 21, 2025 -
Refactor MambaCache to modeling_mamba.py
#38086 merged
Jul 21, 2025 -
Fix Docstring of BarkProcessor
#39546 merged
Jul 21, 2025 -
use the enable_gqa param in torch.nn.functional.scaled_dot_product_at…
#39412 merged
Jul 21, 2025 -
Fix missing initializations for models created in 2023
#39239 merged
Jul 21, 2025 -
Raise TypeError instead of ValueError for invalid types
#38660 merged
Jul 21, 2025 -
Fix pylint warnings
#39477 merged
Jul 21, 2025 -
Fix Qwen Omni integration test
#39553 merged
Jul 21, 2025 -
🚨🚨🚨 [Trainer] Enable average_tokens_across_devices by default in TrainingArguments
#39395 merged
Jul 21, 2025 -
Rename _supports_flash_attn_2 in examples and tests
#39471 merged
Jul 21, 2025 -
Fix the check in flex test
#39548 merged
Jul 21, 2025 -
Fix bad tensor shape in failing Hubert test.
#39502 merged
Jul 21, 2025 -
GLM-4 Update
#39393 merged
Jul 21, 2025 -
[qwen2 vl] fix packing with all attentions
#39447 merged
Jul 21, 2025 -
[gemma3] support sequence classification task
#39465 merged
Jul 21, 2025
73 Pull requests opened by 61 people
-
build: Add fast image processor tvp
#39529 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `tokenizer.md` to Korean
#39532 opened
Jul 20, 2025 -
Add Beit3 model
#39534 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `how_to_hack_models.md` to Korean
#39536 opened
Jul 20, 2025 -
[Voxtral] Fix typo
#39540 opened
Jul 20, 2025 -
Add Muon optimizer implementation and integration
#39541 opened
Jul 20, 2025 -
added smollama base model - 1B parameter
#39543 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated feature_extractors.md to Korean
#39544 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `perf_train_gpu_one.md` to Korean
#39552 opened
Jul 21, 2025 -
[WIP] try to relax the tie_weights method
#39555 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 opened
Jul 21, 2025 -
fix load_model_end = true work when save_steps < eval_steps
#39560 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 opened
Jul 21, 2025 -
[i18n-KO] Translated `auto_docstring.md` to Korean
#39571 opened
Jul 22, 2025 -
xpu optimization for generation case
#39573 opened
Jul 22, 2025 -
feat(autoformer): Improve ValueError for insufficient sequence length
#39574 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `vitpose.md` to Korean
#39575 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `tvp.md` to Korean
#39578 opened
Jul 22, 2025 -
[`Ernie 4.5`] Ernie VL models
#39585 opened
Jul 22, 2025 -
fix(tokenization): check token.content for trie
#39587 opened
Jul 22, 2025 -
WIP, reference modeling
#39588 opened
Jul 22, 2025 -
Add Fast Image Processor for ImageGPT
#39592 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated 'xclip.md' to Korean
#39594 opened
Jul 22, 2025 -
Fix: check TrainerState file exists before loading during resume
#39599 opened
Jul 23, 2025 -
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 opened
Jul 23, 2025 -
[`CI`] Add Eric to comment slow ci
#39601 opened
Jul 23, 2025 -
feat: add `is_fast` to ImageProcessor
#39603 opened
Jul 23, 2025 -
Update model card for Cohere2 (Command R7B)
#39604 opened
Jul 23, 2025 -
HunYuan opensource
#39606 opened
Jul 23, 2025 -
Chat schemas
#39609 opened
Jul 23, 2025 -
Fix return typehint for decoder and annotate inv_freq
#39610 opened
Jul 23, 2025 -
Rework add-new-model-like with modular
#39612 opened
Jul 23, 2025 -
Export SmolvLM
#39614 opened
Jul 23, 2025 -
Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39617 opened
Jul 23, 2025 -
docs: Update EfficientLoFTR documentation
#39620 opened
Jul 23, 2025 -
fix tensor device when loading state dict
#39623 opened
Jul 24, 2025 -
Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI
#39625 opened
Jul 24, 2025 -
🌐 [i18n-KO] Translated '<text-to-speech>.md' to Korean
#39628 opened
Jul 24, 2025 -
[processors] add tests for helper fn
#39629 opened
Jul 24, 2025 -
[serve] Add speech-to-text
#39631 opened
Jul 24, 2025 -
fix dead NVIDIA link
#39632 opened
Jul 24, 2025 -
Reorder serving docs
#39634 opened
Jul 24, 2025 -
Support loading Qwen3 MoE GGUF
#39638 opened
Jul 24, 2025 -
Fix quant docker for fp-quant
#39641 opened
Jul 24, 2025 -
mllama outputs refactor
#39643 opened
Jul 24, 2025 -
fix chameleonvision UT failure
#39646 opened
Jul 24, 2025 -
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 opened
Jul 24, 2025 -
Add self-hosted runner scale set workflow for mi325 CI
#39651 opened
Jul 24, 2025 -
extend more trainer test cases to XPU, all pass
#39652 opened
Jul 25, 2025 -
Enable xpu allocator on caching_allocator_warmup
#39654 opened
Jul 25, 2025 -
update ernie model card
#39657 opened
Jul 25, 2025 -
fix(trainer): Correct loss scaling for incomplete gradient accumulation steps
#39659 opened
Jul 25, 2025 -
[docs] Ko doc fixes after toc update
#39660 opened
Jul 25, 2025 -
[modular] small fixes
#39663 opened
Jul 25, 2025 -
Update `QAPipelineTests::test_large_model_course` after #39193
#39666 opened
Jul 25, 2025 -
Reduce atol values in test_dynamic_cache_exportability
#39667 opened
Jul 25, 2025 -
Fix AMD dockerfile for audio models
#39669 opened
Jul 25, 2025 -
skip `Glm4MoeModelTest::test_torch_compile_for_training`
#39670 opened
Jul 25, 2025 -
Fix loss scaling and token aggregation to use only data parallel group
#39674 opened
Jul 25, 2025 -
[BugFix]: Support dict and config file path for deepspeed
#39675 opened
Jul 25, 2025 -
Fix cache-related tests
#39676 opened
Jul 25, 2025 -
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 opened
Jul 25, 2025 -
Fix missing initialization of `FastSpeech2Conformer`
#39689 opened
Jul 26, 2025 -
Allow custom hf_quantizer in from_pretrained
#39690 opened
Jul 26, 2025 -
fix misspelled issues
#39691 opened
Jul 26, 2025 -
PATCH: add back n-dim device-mesh
#39693 opened
Jul 26, 2025 -
Don't set `run_name` when none
#39695 opened
Jul 26, 2025 -
use untyped storage for dtensors due to deprecation
#39697 opened
Jul 26, 2025 -
Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG"
#39698 opened
Jul 26, 2025 -
properly save model across tensor parallel processes
#39700 opened
Jul 26, 2025
35 Issues closed by 12 people
-
CRITICAL ISSUE REPORT! GEMMA 3 1B CANNOT RUN!
#39686 closed
Jul 26, 2025 -
text-generation extremely slow with large `bad_words_ids` list
#39512 closed
Jul 25, 2025 -
Does Gemma 3 need positions ids to be 1-indexed explicitly?
#39023 closed
Jul 25, 2025 -
Add Deepseek-VL
#36110 closed
Jul 25, 2025 -
Grammatical error in the "Loading model's" page
#39018 closed
Jul 25, 2025 -
Inference API Returning 404
#39650 closed
Jul 25, 2025 -
Backwards incompatible change in returned hidden states
#39558 closed
Jul 25, 2025 -
Typo in `apply_transcrition_request` method name
#39530 closed
Jul 25, 2025 -
video_auto_processing.py breaks everything
#38846 closed
Jul 25, 2025 -
Should `compute_metrics` only run on the main process when doing DDP?
#38851 closed
Jul 25, 2025 -
VoxtralForConditionalGeneration import error
#39611 closed
Jul 24, 2025 -
`Trainer._save()` May Incorrectly Save Empty Model State (safetensors)
#38686 closed
Jul 24, 2025 -
Wandb isn't logging config in offline mode
#38968 closed
Jul 23, 2025 -
The similarity between image and text in siglip2 is very low
#39597 closed
Jul 23, 2025 -
Does Qwen_2_5_VL support variable length attention computation?
#38007 closed
Jul 23, 2025 -
Have to import cv2 and pop up a window first, or else it gets stuck forever
#38139 closed
Jul 23, 2025 -
CI skipped failures tracking issue
#38820 closed
Jul 23, 2025 -
"ValueError: Predictions and/or references don't match the expected format." error
#39510 closed
Jul 22, 2025 -
Clarification on Recent Changes to Loss and Gradient Accumulation
#39567 closed
Jul 22, 2025 -
Add EfficientLoFTR model
#36354 closed
Jul 22, 2025 -
Gemma3 bidirectional mask for image tokens isn't reaching attention forward
#39389 closed
Jul 22, 2025 -
Is the new Intel–Weizmann speculative decoding algorithm integrated into Transformers?
#39545 closed
Jul 21, 2025 -
Enabling `average_tokens_across_devices` by default in Trainer
#39392 closed
Jul 21, 2025 -
T5Gemma problem with tokenizer(?)
#39521 closed
Jul 21, 2025 -
Causal mask is not compatible with Qwen2-VL when using padding-free training
#39400 closed
Jul 21, 2025 -
KeyError: 'llava_qwen2'
#39533 closed
Jul 21, 2025 -
Add Gemma 3 For Sequence Classification
#36755 closed
Jul 21, 2025 -
Expected all tensors to be on the same device, but found at least two devices
#37545 closed
Jul 21, 2025 -
DynamicCache results in too many torch recompiles after 4.51
#37908 closed
Jul 21, 2025 -
Confusion about num_labels and problem_type in classification logic 🐛
#38219 closed
Jul 21, 2025 -
Silent Overwrite of Custom Optimizer When Using DeepSpeed with Transformers Trainer
#38753 closed
Jul 21, 2025 -
DTensor issues when running Llama4ForConditionalGeneration with tensor parallel.
#38803 closed
Jul 21, 2025 -
Version 4.52.3 leads to error after bundling with pyinstaller
#38402 closed
Jul 20, 2025
28 Issues opened by 27 people
-
No flag to support Conditional Parameter Loading for gemma-3n-E2B models in transformer
#39699 opened
Jul 26, 2025 -
SigLIP2 documentation example has multiple errors (model/processor mismatch + quantization failure)
#39692 opened
Jul 26, 2025 -
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 opened
Jul 26, 2025 -
Qwen 2.5 VL - error without attention_mask
#39685 opened
Jul 26, 2025 -
Add multi-candidate & tree search for assisted decoding (speculative decoding)
#39684 opened
Jul 25, 2025 -
Accelerate beam search decoding via tree attention
#39682 opened
Jul 25, 2025 -
error: argument --deepspeed: invalid dict value: '<path>'
#39673 opened
Jul 25, 2025 -
Issue when initializing a DynamicCache
#39668 opened
Jul 25, 2025 -
T5Gemma training not working
#39656 opened
Jul 25, 2025 -
Please develop DataCollatorForVisionLanguageModeling to support visual model training !!!
#39647 opened
Jul 24, 2025 -
[XPU] Model get OOM when loading models
#39627 opened
Jul 24, 2025 -
FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39619 opened
Jul 23, 2025 -
SageAttention for attention implementation?
#39618 opened
Jul 23, 2025 -
Trainer: Error when folded metrics are saved
#39616 opened
Jul 23, 2025 -
Qwen3 Fails w/4D Attn Mask when using FA2
#39608 opened
Jul 23, 2025 -
ImageClassificationPipeline preprocess should accept numpy/tensor arrays
#39607 opened
Jul 23, 2025 -
Does transformers support python3.13 --disable-gil or python3.14 free threading?
#39596 opened
Jul 23, 2025 -
AddedToken should check content on `_update`
#39586 opened
Jul 22, 2025 -
Model forward execution in full eager mode?
#39565 opened
Jul 21, 2025 -
Why `is_causal` is not used in `flash_attention_forward` ?
#39554 opened
Jul 21, 2025 -
Is there plan to integrate ColQwen2.5 into Transformers?
#39549 opened
Jul 21, 2025 -
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 opened
Jul 21, 2025 -
Add muon and flash-muon optimizer
#39537 opened
Jul 20, 2025 -
training google colab error
#39527 opened
Jul 19, 2025
96 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[WiP] Add xcodec2 model
#37868 commented on
Jul 25, 2025 • 55 new comments -
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 commented on
Jul 26, 2025 • 22 new comments -
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 commented on
Jul 25, 2025 • 15 new comments -
Add standardized model card for facebook/data2vec-audio-base-960h
#39368 commented on
Jul 24, 2025 • 10 new comments -
🌐 [i18n-KO] Translated `main_classes/peft.md`
#39515 commented on
Jul 24, 2025 • 8 new comments -
blt wip
#38579 commented on
Jul 24, 2025 • 6 new comments -
🌐 [i18n-KO] Translated processors.md to Korean
#39519 commented on
Jul 24, 2025 • 5 new comments -
fix: filter None router logits in Qwen3 MoE and handle empty router logits (#39203)
#39206 commented on
Jul 21, 2025 • 5 new comments -
Fix the issue that csm model cannot work with pipeline mode.
#39349 commented on
Jul 24, 2025 • 5 new comments -
Add Segment Anything 2 (SAM2)
#32317 commented on
Jul 24, 2025 • 5 new comments -
Add T5LA models
#39293 commented on
Jul 25, 2025 • 4 new comments -
feat(tokenization): add encode_message to tokenize messages one by one
#39507 commented on
Jul 23, 2025 • 4 new comments -
Add fastconformer encoder support for nvidia/parakeet and nvidia/canary models
#39062 commented on
Jul 23, 2025 • 4 new comments -
[WIP] Add MM Grounding DINO
#37925 commented on
Jul 26, 2025 • 4 new comments -
BLIPs clean-up
#35560 commented on
Jul 21, 2025 • 3 new comments -
Fix Bark failing tests
#39478 commented on
Jul 24, 2025 • 2 new comments -
add pin memory and block table
#39130 commented on
Jul 21, 2025 • 1 new comment -
Add support for Florence-2
#38188 commented on
Jul 26, 2025 • 1 new comment -
Provide clearer instructions on how to specify target language.
#38786 commented on
Jul 21, 2025 • 1 new comment -
Add Dust3R
#38805 commented on
Jul 22, 2025 • 0 new comments -
[configuration] remove redundant `classmethod`
#38812 commented on
Jul 23, 2025 • 0 new comments -
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on
Jul 22, 2025 • 0 new comments -
🌐 [i18n-KO] Translated albert.md to Korean
#39524 commented on
Jul 26, 2025 • 0 new comments -
Add Bagel
#38569 commented on
Jul 25, 2025 • 0 new comments -
Add X-Codec model
#38248 commented on
Jul 23, 2025 • 0 new comments -
[omni modality] support composite processor config
#38142 commented on
Jul 25, 2025 • 0 new comments -
Update ruff to 0.12.3 and apply its fixes
#37809 commented on
Jul 21, 2025 • 0 new comments -
Superpoint fast image processor
#37804 commented on
Jul 25, 2025 • 0 new comments -
Add callback to monitor progress in whisper transcription
#37483 commented on
Jul 22, 2025 • 0 new comments -
Apply several ruff SIM rules
#37283 commented on
Jul 22, 2025 • 0 new comments -
Add Fast Segformer Processor
#37024 commented on
Jul 25, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `pipeline_gradio.md` to Korean
#39520 commented on
Jul 26, 2025 • 0 new comments -
Fix `Qwen2AudioForConditionalGeneration.forward()`
#39503 commented on
Jul 24, 2025 • 0 new comments -
[WIP] Add support for including video object in apply_chat_template function
#39494 commented on
Jul 20, 2025 • 0 new comments -
Update CTRL model card with improved usage examples and documentation notes
#39487 commented on
Jul 21, 2025 • 0 new comments -
Add model arcinstitute state
#39480 commented on
Jul 25, 2025 • 0 new comments -
Skipping `initialize_weights` when model is quantized
#39464 commented on
Jul 26, 2025 • 0 new comments -
Add eurobert
#39455 commented on
Jul 25, 2025 • 0 new comments -
Add Vocos model
#39403 commented on
Jul 24, 2025 • 0 new comments -
[RoPE] allow models to configure local RoPE
#39397 commented on
Jul 24, 2025 • 0 new comments -
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 commented on
Jul 24, 2025 • 0 new comments -
Add dates to the model docs
#39320 commented on
Jul 25, 2025 • 0 new comments -
Feat: add Kwai-Keye transformers
#39292 commented on
Jul 26, 2025 • 0 new comments -
Add support for `ModernBertForMultipleChoice`
#39232 commented on
Jul 24, 2025 • 0 new comments -
Fix inconsistent `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39221 commented on
Jul 21, 2025 • 0 new comments -
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on
Jul 26, 2025 • 0 new comments -
Bug/38843 fix pos idx in fp32 parameter error
#39064 commented on
Jul 22, 2025 • 0 new comments -
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on
Jul 21, 2025 • 0 new comments -
Add `SepCache` [An efficient and easy-to-use Cache from the SepLLM paper - ICML 2025 (https://arxiv.org/abs/2412.12094) ] to the `cache_utils.py` and `__init__.py`
#38824 commented on
Jul 22, 2025 • 0 new comments -
[WIP] Computer vision util: vision visualizer
#36892 commented on
Jul 25, 2025 • 0 new comments -
Unexpected behaviour with transformers versions above 4.28 for Donut
#39473 commented on
Jul 23, 2025 • 0 new comments -
Add Interactive Multi-Modal Attention Visualization for Vision-Language Models
#39440 commented on
Jul 23, 2025 • 0 new comments -
Whisper word-level timestamp extraction fails with beam search
#36093 commented on
Jul 23, 2025 • 0 new comments -
_load_rng_state after get_batch_samples may break training reproducibility when dataloader has random operations
#39215 commented on
Jul 23, 2025 • 0 new comments -
Export voxtral to ExecuTorch
#39511 commented on
Jul 23, 2025 • 0 new comments -
Whisper `return_language` with pipeline no longer working
#39404 commented on
Jul 23, 2025 • 0 new comments -
object detection : matchin outputs.last_hidden_state with results
#39426 commented on
Jul 22, 2025 • 0 new comments -
Unknown Model (mobilenetv5_300m_enc) when loading Gemma 3n
#39208 commented on
Jul 22, 2025 • 0 new comments -
add MiniCPM-o
#37029 commented on
Jul 22, 2025 • 0 new comments -
Whisper v-3 pipeline requiring a lot of memory when setting return_timestamps="word"
#27834 commented on
Jul 22, 2025 • 0 new comments -
tokenizer decode with timestamp fails for extended vocabulary
#35330 commented on
Jul 22, 2025 • 0 new comments -
How to streaming output audio of Qwen2.5-omni-7b
#37570 commented on
Jul 22, 2025 • 0 new comments -
Resuming training from an interrupted checkpoint fails to save the final checkpoint.
#38939 commented on
Jul 22, 2025 • 0 new comments -
Caching of model code in ~/.cache/huggingface/modules/transformers_modules
#39107 commented on
Jul 22, 2025 • 0 new comments -
T5Gemma failing on provided example
#39522 commented on
Jul 21, 2025 • 0 new comments -
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 commented on
Jul 21, 2025 • 0 new comments -
Significant WER Increase with Whisper Chunking Compared to Long-Form Transcription
#38347 commented on
Jul 21, 2025 • 0 new comments -
Transformers version causing my finetuned model to hallucinate
#38378 commented on
Jul 21, 2025 • 0 new comments -
`load_balancing_loss_func` doesn't support 4D attention mask
#38910 commented on
Jul 21, 2025 • 0 new comments -
`AutoTokenizer.from_pretrained` does not propagate `token`
#39030 commented on
Jul 21, 2025 • 0 new comments -
Implement Titans Architecture with GRPO Fine-Tuning
#36352 commented on
Jul 21, 2025 • 0 new comments -
Error: StaticCache.__init__() got an unexpected keyword argument 'batch_size'
#38914 commented on
Jul 20, 2025 • 0 new comments -
Checkpointing broken for classifier training multi-gpu
#38925 commented on
Jul 20, 2025 • 0 new comments -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 commented on
Jul 23, 2025 • 0 new comments -
Fix ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36011 commented on
Jul 21, 2025 • 0 new comments -
use warning_once instead of warning in Trainer.tokenizer
#35482 commented on
Jul 25, 2025 • 0 new comments -
Add FAST
#35476 commented on
Jul 22, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Jul 21, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Jul 26, 2025 • 0 new comments -
Vision Encoder-Decoder fails with LLaMA decoder due to missing cross-attention implementation
#34674 commented on
Jul 26, 2025 • 0 new comments -
Only with newest version (4.52.4): from_pretrained() esm.embeddings.position_embeddings.weight missing
#39038 commented on
Jul 26, 2025 • 0 new comments -
pytorch version 1.8.1 compatibility
#39049 commented on
Jul 26, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Jul 26, 2025 • 0 new comments -
Segfault on Apple M4 using AutoModelForSequenceClassification with BETO model on CPU
#39020 commented on
Jul 25, 2025 • 0 new comments -
Support for per-token latency tracking in `generate()` (suggested options: using callback, profiler class, or using a config flag)
#39437 commented on
Jul 25, 2025 • 0 new comments -
'Mistral3Model' object has no attribute 'prepare_inputs_for_generation'
#39007 commented on
Jul 25, 2025 • 0 new comments -
Not able to use flash attention with torch.compile with model like BERT
#39017 commented on
Jul 25, 2025 • 0 new comments -
Issue with module.smart_apply(module._initialize_weights) in the initialize_weights Function of modeling_utils.py
#39027 commented on
Jul 25, 2025 • 0 new comments -
Trainer/accelerate doesn't save model when using FSDP with SHARDED_STATE_DICT
#30491 commented on
Jul 24, 2025 • 0 new comments -
Support for Multiple Datasets and Domain-Specific Loss Calculation in Trainer
#30725 commented on
Jul 24, 2025 • 0 new comments -
Model implementation using Liger Kernel layers
#38416 commented on
Jul 24, 2025 • 0 new comments -
Allow `load_best_model_at_end=True` to work when `save_steps < eval_steps` and best model is saved
#39476 commented on
Jul 24, 2025 • 0 new comments -
Adding support for Gemma 3n GGUFs
#39329 commented on
Jul 24, 2025 • 0 new comments -
🌐 [i18n-KO] Translating docs to Korean
#20179 commented on
Jul 24, 2025 • 0 new comments -
Add DiCoW: Diarization-Conditioned Whisper
#39430 commented on
Jul 24, 2025 • 0 new comments -
Whisper transcription is 2x slower between 4.51.3 -> 4.52.1
#39508 commented on
Jul 24, 2025 • 0 new comments