🚨🚨🚨 [pipelines] update defaults in pipelines that can generate
#38129
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the `Ready for review` button.
```diff
@@ -1688,7 +1688,7 @@ def _prepare_generation_config(
                 if custom_gen_config_value == default_value and model_gen_config_value != default_value:
                     modified_values[key] = model_gen_config_value
                     setattr(generation_config, key, model_gen_config_value)
-            if len(modified_values) > 0:
+            if use_model_defaults is None and len(modified_values) > 0:
```
We only want to print the warning [model-specific values overriding global defaults] when `use_model_defaults` is unset. In pipelines we were already using the model defaults, so this change prevents unwanted warnings.
(Model defaults are messy at the moment; model-specific generation config files should untangle this mess.)
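For context, a caller can also silence that warning by making the choice explicit; a minimal sketch, assuming `use_model_defaults` is exposed as a `generate` argument as in recent `transformers` releases (model name purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello,", return_tensors="pt")

# Passing use_model_defaults explicitly means the flag is no longer None,
# so the "model-specific values overriding defaults" warning is skipped.
outputs = model.generate(**inputs, use_model_defaults=True, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```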
This area feels a little confusing and bug-prone because we have `default_generation_config`, `self.generation_config` and `generation_config` all being accessed in the same loop. Maybe we can alias `self.generation_config` or something? I feel like I'm extremely likely to get it mixed up with `generation_config` if I edit this code.
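A toy sketch of the kind of renaming being suggested, with hypothetical names (not what the PR implements):

```python
from transformers import GenerationConfig

def merge_model_defaults(
    global_defaults: GenerationConfig,
    model_defaults: GenerationConfig,
    user_config: GenerationConfig,
) -> GenerationConfig:
    """Give each of the three configs an unambiguous name before the merge loop."""
    for key, default_value in global_defaults.to_dict().items():
        user_value = getattr(user_config, key, default_value)
        model_value = getattr(model_defaults, key, default_value)
        # Fall back to the model's value only when the caller kept the global default.
        if user_value == default_value and model_value != default_value:
            setattr(user_config, key, model_value)
    return user_config

merged = merge_model_defaults(GenerationConfig(), GenerationConfig(num_beams=5), GenerationConfig())
print(merged.num_beams)  # 5
```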
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yayyyyy - quite excited for this!
```diff
@@ -129,6 +134,11 @@ class DocumentQuestionAnsweringPipeline(ChunkPipeline):
     [huggingface.co/models](https://huggingface.co/models?filter=document-question-answering).
     """

+    # Make sure the docstring is updated when the default generation config is changed
+    _default_generation_config = GenerationConfig(
+        max_new_tokens=256,
```
Suggested change:
```diff
-        max_new_tokens=256,
+        max_new_tokens=1024,
```
maybe a bit higher, specifically if someone wants to just pass a long prompt
it's for max new tokens (not total tokens), so it doesn't take the prompt into account! Same answer as below: I'd encourage not setting values that are too high either
Makes sense, my understanding was that if an EOS token is encountered it will exit generation early anyway and not generate all the tokens
We might want to enable something like `max_new_length="auto"`, which maximizes `max_new_length` according to available HW? As VB wrote, in most models we don't actually hit `max_new_length` because of the EOS token(s). But a large default `max_new_tokens` will 100% hit OOM issues (it was one of the reasons behind the old default btw 👀)
ah yeah! might be a good idea, but maybe in another PR as follow-up if we don't see a lot of edge cases from this one.
yes, 100% as a follow-up
```diff
@@ -95,6 +102,13 @@ class TextGenerationPipeline(Pipeline):
     begging for his blessing. <eod> </s> <eos>
     """

+    # Make sure the docstring is updated when the default generation config is changed
+    _default_generation_config = GenerationConfig(
+        max_new_tokens=256,
```
Suggested change:
```diff
-        max_new_tokens=256,
+        max_new_tokens=1024,
```
I also think a sensible default here would be 1K, for typical "summarise X" use cases, wdyt?
I think 1024 can be really, really long to generate - for example some Qwen models have 2k set and even on an M4 laptop I can end up waiting minutes for an output
I think having a very large default for systems that are streaming is fine, but for systems which are blocking, I'd actually advise something lower (256 already seems high to me, but I can live with it)
256 sounds good to me IMO!
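Whichever default lands, callers can still override it when building the pipeline or per call; a minimal sketch (model name purely illustrative):

```python
from transformers import pipeline

# The pipeline ships with max_new_tokens=256 by default after this PR;
# passing the argument at construction time overrides it for every call...
generator = pipeline("text-generation", model="gpt2", max_new_tokens=1024)

# ...and it can also be overridden for a single call.
print(generator("Summarise the plot of Hamlet:", max_new_tokens=64)[0]["generated_text"])
```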
```diff
@@ -204,8 +204,9 @@ def test_validate(self):

-        # By default we throw a short warning. However, we log with INFO level the details.
+        # Default: we don't log the incorrect input values, only a short summary. We explain how to get more details.
         with CaptureLogger(logger) as captured_logs:
-            GenerationConfig(do_sample=False, temperature=0.5)
+            with LoggingLevel(logging.WARNING):
```
there is something in our test suite that likely changes the default logging level -- this test is flaky because the line below is capturing INFO level logs 🤔
these flaky checks were merged today (2025-may-19) by me
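To make the flakiness concrete, a small sketch of the same capture done by hand, pinning the verbosity first (logger path and behaviour assumed from the diff above):

```python
from transformers import GenerationConfig
from transformers.testing_utils import CaptureLogger
from transformers.utils import logging

logger = logging.get_logger("transformers.generation.configuration_utils")

# If a previously-run test left the transformers verbosity at INFO, the extra
# detail lines are captured too and exact-match assertions become flaky.
# Pinning the level makes the captured output deterministic.
logging.set_verbosity_warning()
with CaptureLogger(logger) as captured_logs:
    GenerationConfig(do_sample=False, temperature=0.5)
print(captured_logs.out)
```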
LGTM! The changes to tests and individual pipelines are mostly straightforward, so the core changes are:
- Replacing errors with a warning when an assistant model is passed for generation modes that don't support it (e.g. beam search)
- Flagging model-specific values overriding defaults in `generation/utils.py`
- Adding the `_pipeline_calls_generate` attribute in `pipelines/base.py` to replace `model.can_generate()`, which doesn't always indicate that the pipeline will be generating text
Those changes all make sense to me! I made a couple of small comments, but they're nits and none of them are blockers - it's fine to ignore them or deal with them after merging.
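For reference, a rough sketch of how such a class-level flag can be consumed, with everything except `_pipeline_calls_generate` being hypothetical:

```python
from transformers import Pipeline

class MyGeneratingPipeline(Pipeline):
    # Declares that this pipeline calls `generate`, independently of what
    # `model.can_generate()` reports for the underlying model.
    _pipeline_calls_generate = True

def pipeline_will_generate(pipeline_cls) -> bool:
    # Illustrative check in the spirit of the new attribute.
    return getattr(pipeline_cls, "_pipeline_calls_generate", False)

print(pipeline_will_generate(MyGeneratingPipeline))  # True
```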
```python
        self.assistant_model, self.assistant_tokenizer = load_assistant_model(
            self.model, kwargs.pop("assistant_model", None), kwargs.pop("assistant_tokenizer", None)
        )
```
It feels a little weird for this to be outside `self.model.can_generate()` - is there a case where the model can't generate but we still have `assistant_model`? I'm guessing this is related to the changes in `configuration_utils.py` that replace errors for unexpected assistant model kwargs with warnings?
> is there a case where the model can't generate but we still have `assistant_model`?

No, moving these lines in :)

The replacement of errors by warnings is because of the new defaults in some pipelines, e.g. on ASR we now set `num_beams=5`, which would make any attempt at calling assisted generation crash unless we manually set `num_beams=1`. We want the code to run if we pass an assistant model, so the warning is a compromise 🤔
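To illustrate the compromise, a sketch under the assumption that the ASR pipeline accepts an `assistant_model` checkpoint as popped in the snippet above (model names illustrative, not a recommendation):

```python
from transformers import pipeline

# With the new ASR default of num_beams=5, assisted generation would be
# incompatible (it requires num_beams=1). This PR downgrades the error to a
# warning; passing num_beams=1 explicitly keeps assisted generation active.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    assistant_model="distil-whisper/distil-large-v3",
    num_beams=1,
)
print(asr("sample.wav")["text"])
```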
```diff
@@ -420,6 +420,7 @@ def test_return_timestamps_in_preprocess(self):

     @slow
     @require_torch
+    @unittest.skip("TODO (joao, eustache): this test is failing, find the breaking PR and fix the cause or the test")
```
Slightly nervous about all the skips here, but if we do a follow-up PR soon then it's probably fine!
btw, these tests were already failing on `main`, and I think for quite a long time 👀
"text2text-generation", | ||
model="patrickvonplaten/t5-tiny-random", | ||
framework="pt", | ||
num_beams=1, |
is `num_beams=1` not the default?
not with these changes (we recommend using beam search in text2text models in our docs, so might as well set it as default for those pipelines!)
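In other words, after this change greedy decoding has to be requested explicitly in those pipelines; a small sketch (model name illustrative):

```python
from transformers import pipeline

# text2text pipelines now default to beam search; num_beams=1 opts back out.
summarizer = pipeline("text2text-generation", model="t5-small", num_beams=1)
print(summarizer("summarize: The quick brown fox jumps over the lazy dog.")[0]["generated_text"])
```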
…gingface#38129) * pipeline generation defaults * add max_new_tokens=20 in test pipelines * pop all kwargs that are used to parameterize generation config * add class attr that tell us whether a pipeline calls generate * tmp commit * pt text gen pipeline tests passing * remove failing tf tests * fix text gen pipeline mixin test corner case * update text_to_audio pipeline tests * trigger tests * a few more tests * skips * some more audio tests * not slow * broken * lower severity of generation mode errors * fix all asr pipeline tests * nit * skip * image to text pipeline tests * text2test pipeline * last pipelines * fix flaky * PR comments * handle generate attrs more carefully in models that cant generate * same as above
Some PEFT integration tests involving text generation pipelines were failing since #38129 because the base model is too small to generate longer sequences. Setting max_new_tokens fixes this.
What does this PR do?
TL;DR our defaults for text generation are very outdated. Most notably, the default maximum length.
This PR adds the tooling to easily add pipeline-specific `generate` defaults, and adds new defaults to all pipelines that call `generate`. In all pipelines that call `generate`, the new `max_new_tokens` default is 256.
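A condensed sketch of the per-pipeline pattern this tooling enables, mirroring the diffs above (class name hypothetical):

```python
from transformers import GenerationConfig, Pipeline

class MySummarizationPipeline(Pipeline):
    # Opt this pipeline into the new machinery: it calls `generate`...
    _pipeline_calls_generate = True

    # ...and ships its own generation defaults.
    # Make sure the docstring is updated when the default generation config is changed
    _default_generation_config = GenerationConfig(
        max_new_tokens=256,
    )
```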
Tests
This PR reviewed the tests of the pipelines that call `generate` such that their CI is now green. Overall changes:
- already failing on `main` -> added a skip with a TODO
- `@slow`
- `max_new_tokens`
Example
On main it prints
With this PR it prints