Llama4TextExperts module implementation #37325
@Godofnothing

System Info

The Llama 4 model family adopts an MoE layer implementation for better efficiency.

However, in the current implementation the MoE layer actually performs an ordinary dense FFN forward pass, with all experts involved in the computation. One can see that the gate_up_proj matrix has the same shape as if all num_experts were active.

[Screenshot of the Llama4TextExperts implementation]

I guess the intent was to perform the computation only for the experts selected by the router.
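
For concreteness, here is a minimal plain-PyTorch sketch of the pattern described above (this is an illustration, not the library's code; the tensor names gate_up_proj / down_proj and all sizes are assumptions): the token batch is replicated for every expert and pushed through a batched matmul, so the cost scales with num_experts rather than with the number of experts the router actually selects per token.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, chosen only for illustration.
num_experts, num_tokens, hidden_size, expert_dim = 16, 8, 64, 128

# Per-expert weights, analogous in shape to gate_up_proj / down_proj:
# one weight slice per expert, whether or not the router selects it.
gate_up_proj = torch.randn(num_experts, hidden_size, 2 * expert_dim)
down_proj = torch.randn(num_experts, expert_dim, hidden_size)

# Dense pattern: the token batch is replicated for every expert, so the
# batched matmuls below touch all num_experts weight slices for every token.
tokens = torch.randn(num_tokens, hidden_size)
replicated = tokens.repeat(num_experts, 1).view(num_experts, num_tokens, hidden_size)

gate_up = torch.bmm(replicated, gate_up_proj)          # (E, T, 2 * expert_dim)
gate, up = gate_up.chunk(2, dim=-1)
expert_out = torch.bmm(up * F.silu(gate), down_proj)   # (E, T, hidden_size)

# FLOPs grow linearly with num_experts even if the router keeps only top-1.
print(expert_out.shape)  # torch.Size([16, 8, 64])
```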

Who can help?

@ArthurZucker

Reproduction

Any usage of the model
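
For a quick way to see this, the shapes can be inspected on a tiny config. This is a minimal sketch, not a verified repro: it assumes Llama4TextExperts is importable from transformers.models.llama4.modeling_llama4 and that the config fields are named hidden_size, intermediate_size, and num_local_experts (adjust if they differ in your installed transformers version).

```python
import torch
from transformers import Llama4TextConfig
from transformers.models.llama4.modeling_llama4 import Llama4TextExperts

# Tiny, non-realistic sizes just for shape inspection (field names assumed).
config = Llama4TextConfig(
    hidden_size=64,
    intermediate_size=128,
    num_local_experts=16,
)
experts = Llama4TextExperts(config)

# The fused weight carries every expert at once.
print(experts.gate_up_proj.shape)  # expected: (16, 64, 256)

# The module's forward expects the token batch replicated across all experts
# (first dimension num_experts * num_tokens), i.e. every token is processed
# by every expert regardless of the router's top-k selection.
num_tokens = 4
replicated = torch.randn(config.num_local_experts * num_tokens, config.hidden_size)
print(experts(replicated).shape)   # (num_experts * num_tokens, hidden_size)
```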

Expected behavior

Only the experts chosen by the router should be involved in the computation.
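
For illustration, here is one possible shape of such a computation as a plain-PyTorch sketch (not a proposal for the actual transformers code; the router here is a random stand-in and all sizes are hypothetical). Tokens are gathered per selected expert, only those experts' weight slices are read, and the results are scattered back:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only.
num_experts, num_tokens, hidden_size, expert_dim, top_k = 16, 8, 64, 128, 1

gate_up_proj = torch.randn(num_experts, hidden_size, 2 * expert_dim)
down_proj = torch.randn(num_experts, expert_dim, hidden_size)

tokens = torch.randn(num_tokens, hidden_size)
router_logits = torch.randn(num_tokens, num_experts)   # stand-in for the real router
scores, selected = router_logits.topk(top_k, dim=-1)   # (T, top_k)
weights = torch.sigmoid(scores)

out = torch.zeros_like(tokens)
for expert_id in range(num_experts):
    token_idx, slot = (selected == expert_id).nonzero(as_tuple=True)
    if token_idx.numel() == 0:
        continue  # this expert received no tokens, so its weights are never read
    x = tokens[token_idx]                               # only the routed tokens
    gate, up = (x @ gate_up_proj[expert_id]).chunk(2, dim=-1)
    y = (up * F.silu(gate)) @ down_proj[expert_id]
    out.index_add_(0, token_idx, y * weights[token_idx, slot].unsqueeze(-1))

print(out.shape)  # (num_tokens, hidden_size); compute scales with top_k, not num_experts
```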

Labels: Usage (General questions about the library), bug