10BC0 Remove Qwen Image Redundant RoPE Cache by dg845 · Pull Request #12452 · huggingface/diffusers · GitHub
[go: up one dir, main page]

Skip to content

Conversation

dg845
Copy link
Collaborator
@dg845 dg845 commented Oct 9, 2025

What does this PR do?

This PR removes self.rope_cache in QwenEmbedRope, so that RoPE frequency caching is only done once through the functools.lru_cache decorator on _compute_video_freqs. It also changes the maxsize argument for lru_cache from None to 128 to prevent the cache from causing OOM errors.

Fixes #12401

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul
@naykun
@chenxiao111222

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member
@sayakpaul sayakpaul left a comment
8000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naykun could you also give it a look?

@dg845
Copy link
Collaborator Author
dg845 commented Oct 9, 2025

One thing to note is that the current PR implementation (in 52cf252) isn't quite equivalent to the implementation in main when not compiling since frame is part of the cache key for the lru_cache, but not part of the cache key for self.rope_cache.

@naykun do you think changing the cache key to include frame makes sense? If frame is the same for most calls to _compute_video_freqs, then I think the two implementations are approximately equivalent.

@sayakpaul
Copy link
Member

I think the use of a frame based notation is present to allow extension to videos easily. For images, this shouldn't matter at all because the idx is always a constant and since img_shapes is supplemented just once from the pipeline:

img_shapes: Optional[List[Tuple[int, int, int]]] = None,

(example)

do you think changing the cache key to include frame makes sense? If frame is the same for most calls to _compute_video_freqs, then I think the two implementations are approximately equivalent.

For the rope_cache, I am not sure how much of a readability compromise that would be, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redundant lru_cache in QwenEmbedRope
3 participants
0