Closed
Labels
dynamo-triage-jan2025, module: dynamo, oncall: pt2, triaged, vllm-compile
Description
Profiling vLLM produces a trace like the one above, containing a large number of small "dict getitem" calls in the middle.
This does not appear to be representative. In the profile above, these calls account for roughly 40% of the overall time, but in reality they likely take far less; much of the reported time is per-call profiler overhead.
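To illustrate the overhead effect, here is a stdlib-only sketch (not the vLLM repro; the function names and sizes are made up for illustration) showing how a tracing profiler inflates the apparent cost of many tiny calls, such as repeated dict lookups:

```python
import cProfile
import time

def tiny_lookup(d, k):
    # A trivially cheap call; under a tracing profiler, each invocation
    # also pays the cost of the profiler's call/return hooks.
    return d[k]

def workload(d, keys):
    return sum(tiny_lookup(d, k) for k in keys)

d = {i: i for i in range(10_000)}
keys = list(range(10_000)) * 10  # 100k tiny calls

# Wall time without profiling.
t0 = time.perf_counter()
workload(d, keys)
unprofiled = time.perf_counter() - t0

# Wall time with profiling: the extra time lands on the tiny calls,
# inflating their share of the profile.
prof = cProfile.Profile()
t0 = time.perf_counter()
prof.runcall(workload, d, keys)
profiled = time.perf_counter() - t0

print(f"unprofiled: {unprofiled:.4f}s, profiled: {profiled:.4f}s")
```

The profiled run is typically several times slower, which is the same mechanism that can make the "dict getitem" region look like 40% of the trace.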
We should figure out what is emitting these events (this is likely a more general torch.compile x profiler problem, since this uses the PyTorch profiler) and see whether we can group them all together into a single "dynamo bytecode" region.
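A minimal sketch of the grouping idea, using the existing `torch.profiler.record_function` annotation API: wrapping a span of many tiny operations in one named region makes them show up as a single block in the trace. The region name `"dynamo_bytecode_region"` and the dict-lookup workload here are hypothetical stand-ins, not the actual Dynamo integration point.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

def many_small_lookups(table, keys):
    # Stand-in for the many tiny "dict getitem"-style operations
    # that currently appear as individual events in the trace.
    return [table[k] for k in keys]

table = {i: i for i in range(1000)}
keys = list(range(1000))

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # Attribute the whole span to one user-annotated region, analogous
    # to a hypothetical single "dynamo bytecode" region in the trace.
    with record_function("dynamo_bytecode_region"):
        many_small_lookups(table, keys)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The actual fix would presumably place such an annotation around Dynamo's generated-bytecode execution path rather than in user code.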
To repro:
- use https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/simple_profiling.py#L24
- change "model" to "meta-llama/Llama-3.1-8B-Instruct"
- run the script
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames