The RoPE Python code is being copied and pasted over and over across multiple pytorch org repos. I propose we move the RoPE operation into PyTorch core (e.g. under nn.functional) and also add a RotaryPositionalEmbeddings module. Some examples of the code duplication are listed below; a rough sketch of what the API could look like follows the list.
pytorch/ao:
pytorch/benchmark:
- https://github.com/pytorch/benchmark/blob/2c5bc4ad6ae2e78a943aff182ad7c3400a7bb879/torchbenchmark/models/simple_gpt/model.py#L441-L458
- https://github.com/pytorch/benchmark/blob/2c5bc4ad6ae2e78a943aff182ad7c3400a7bb879/torchbenchmark/models/simple_gpt_tp_manual/model.py#L400-L417
pytorch/torchchat:
pytorch/torchtune:
- https://github.com/pytorch/torchtune/blob/c3703482bde72e572b535d3f7c43c81e94164ebc/torchtune/modules/position_embeddings.py#L99-L122
- https://github.com/pytorch/torchtune/blob/c3703482bde72e572b535d3f7c43c81e94164ebc/torchtune/models/llama3_1/_position_embeddings.py#L168-L191
pytorch/xla:
pytorch/pytorch:
- benchmarks/gpt_fast/model.py, lines 280 to 292 (at 518563d)
- benchmarks/gpt_fast/mixtral_moe_model.py, lines 293 to 305 (at 518563d)
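
For illustration, here is a minimal sketch of what a functional op plus a `RotaryPositionalEmbeddings` module could look like, loosely following the complex-number formulation used in gpt-fast and torchtune. The function name `rotary_embedding`, the signatures, the `input_pos` keyword, and the defaults are assumptions for discussion, not an agreed-upon API.

```python
import torch
from torch import nn
from typing import Optional


def rotary_embedding(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Apply rotary positional embeddings (hypothetical functional API).

    x:         [batch, seq_len, num_heads, head_dim]
    freqs_cis: complex tensor of shape [seq_len, head_dim // 2]
    """
    # Group the last dimension into pairs and view each pair as a complex number.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    # Broadcast the per-position rotations over batch and head dimensions.
    rot = freqs_cis.view(1, x_complex.size(1), 1, x_complex.size(-1))
    # Rotate each pair by its position- and frequency-dependent angle.
    x_rotated = torch.view_as_real(x_complex * rot).flatten(-2)
    return x_rotated.type_as(x)


class RotaryPositionalEmbeddings(nn.Module):
    """Hypothetical module wrapper that caches the rotation table."""

    def __init__(self, dim: int, max_seq_len: int = 4096, base: float = 10000.0):
        super().__init__()
        # theta_i = base^(-2i / dim) for each pair of head dimensions.
        theta = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        positions = torch.arange(max_seq_len).float()
        freqs = torch.outer(positions, theta)                    # [max_seq_len, dim // 2]
        freqs_cis = torch.polar(torch.ones_like(freqs), freqs)   # complex exponentials
        self.register_buffer("freqs_cis", freqs_cis, persistent=False)

    def forward(self, x: torch.Tensor, *, input_pos: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x: [batch, seq_len, num_heads, head_dim]
        seq_len = x.size(1)
        freqs_cis = self.freqs_cis[input_pos] if input_pos is not None else self.freqs_cis[:seq_len]
        return rotary_embedding(x, freqs_cis)
```

Usage would mirror the existing copies, e.g. `q = rope(q, input_pos=input_pos)` applied to the query and key projections inside attention. Whether core should expose the complex-number formulation, the cos/sin-cache formulation, or both is an open question.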
cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki