The RoPE Python code is being copied and pasted over and over across multiple pytorch org repos. I propose we move the RoPE operation into PyTorch core (e.g. under nn.functional) and also add a RotaryPositionalEmbeddings module. Some examples of the code duplication are listed below; a rough sketch of what the API could look like follows the list.
pytorch/ao:
pytorch/benchmark:
- https://github.com/pytorch/benchmark/blob/2c5bc4ad6ae2e78a943aff182ad7c3400a7bb879/torchbenchmark/models/simple_gpt/model.py#L441-L458
- https://github.com/pytorch/benchmark/blob/2c5bc4ad6ae2e78a943aff182ad7c3400a7bb879/torchbenchmark/models/simple_gpt_tp_manual/model.py#L400-L417
pytorch/torchchat:
pytorch/torchtune:
- https://github.com/pytorch/torchtune/blob/c3703482bde72e572b535d3f7c43c81e94164ebc/torchtune/modules/position_embeddings.py#L99-L122
- https://github.com/pytorch/torchtune/blob/c3703482bde72e572b535d3f7c43c81e94164ebc/torchtune/models/llama3_1/_position_embeddings.py#L168-L191
pytorch/xla:
pytorch/pytorch:
- benchmarks/gpt_fast/model.py, lines 280 to 292 (at 518563d)
- benchmarks/gpt_fast/mixtral_moe_model.py, lines 293 to 305 (at 518563d)
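
For illustration, here is a minimal sketch of what a functional op plus a `RotaryPositionalEmbeddings` module could look like, loosely following the complex-number formulation used in gpt-fast and torchtune. The function name `rotary_embedding`, the signatures, the `input_pos` keyword, and the defaults are assumptions for discussion, not an agreed-upon API.

```python
import torch
from torch import nn
from typing import Optional


def rotary_embedding(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Apply rotary positional embeddings (hypothetical functional API).

    x:         [batch, seq_len, num_heads, head_dim]
    freqs_cis: complex tensor of shape [seq_len, head_dim // 2]
    """
    # Group the last dimension into pairs and view each pair as a complex number.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    # Broadcast the per-position rotations over batch and head dimensions.
    rot = freqs_cis.view(1, x_complex.size(1), 1, x_complex.size(-1))
    # Rotate each pair by its position- and frequency-dependent angle.
    x_rotated = torch.view_as_real(x_complex * rot).flatten(-2)
    return x_rotated.type_as(x)


class RotaryPositionalEmbeddings(nn.Module):
    """Hypothetical module wrapper that caches the rotation table."""

    def __init__(self, dim: int, max_seq_len: int = 4096, base: float = 10000.0):
        super().__init__()
        # theta_i = base^(-2i / dim) for each pair of head dimensions.
        theta = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        positions = torch.arange(max_seq_len).float()
        freqs = torch.outer(positions, theta)                    # [max_seq_len, dim // 2]
        freqs_cis = torch.polar(torch.ones_like(freqs), freqs)   # complex exponentials
        self.register_buffer("freqs_cis", freqs_cis, persistent=False)

    def forward(self, x: torch.Tensor, *, input_pos: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x: [batch, seq_len, num_heads, head_dim]
        seq_len = x.size(1)
        freqs_cis = self.freqs_cis[input_pos] if input_pos is not None else self.freqs_cis[:seq_len]
        return rotary_embedding(x, freqs_cis)
```

Usage would mirror the existing copies, e.g. `q = rope(q, input_pos=input_pos)` applied to the query and key projections inside attention. Whether core should expose the complex-number formulation, the cos/sin-cache formulation, or both is an open question.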
cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki