[RFC] Enable XPU+FlexAttention on Intel GPU · Issue #153024 · pytorch/pytorch · GitHub
@liangan1

Description

🚀 The feature, motivation and pitch

Motivation

Attention is the critical performance bottleneck in current LLM models, and FlexAttention is a good choice to cover the broad range of attention variants in transformer-series models. With FlexAttention, it becomes easy to enable paged attention and fused SDPA in the transformers repo on the XPU device. Besides, it also provides a candidate path for attention processing in LLM ecosystem libraries, e.g., vLLM and SGLang, on the XPU device.

FlexAttention is also a good starting point for maturing the Intel Triton-based GEMM kernels. FlexAttention provides both a flexattention kernel and a flexdecoding kernel, covering compute-bound and memory-bound GEMM computation respectively, and different shapes should also be supported to serve LLM inference, e.g., head_dim = 64, 96, 128, 256.

Our Plan

As you know, FlexAttention is flexible enough to cover all kinds of attention variants, which also means the dependent software stack needs to be robust enough to cooperate with the Triton template kernels. So it is still a stretch goal to land XPU+FlexAttention in torch-2.8.

PR List

FlexAttention is still in active development and its API is not stable yet.

Alternatives

No response

Additional context

No response

cc @chauhang @penguinwu @zou3519 @ydwu4 @bdhirsh @gujinghui @EikanWang @fengyuan14 @guangyey @Chillee @drisspg @yanboliang @BoyuanFeng

Metadata
Assignees

No one assigned

    Labels

    enhancement — Not as big of a feature, but technically not a bug. Should be easy to fix
    module: flex attention
    module: higher order operators — torch.cond and similar
    module: pt2-dispatcher — PT2 dispatcher-related issues (e.g., aotdispatch, functionalization, faketensor, custom-op,
    module: xpu — Intel XPU related issues
    oncall: pt2
    triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
