Measure gpu time for all render passes by almarklein · Pull Request #769 · pygfx/pygfx

Conversation

almarklein
Member

This adds support for measuring the GPU time of the different render passes, which can be useful in benchmarks to see how much strain each pass in the rendering process puts on the GPU, and how certain choices affect GPU usage. The feature is opt-in: one must set renderer.measure_gpu_times = True. This flag is undocumented/experimental for now.

Note 1: This relies on pygfx/wgpu-py#505, which is not yet released. But IMO that's ok, since for now this feature is undocumented and only used by ourselves (meaning devs who check out the main branch of wgpu-py).

Note 2: Using this feature requires gfx.renderers.wgpu.enable_wgpu_features("timestamp-query").

This PR:

  • Adds a GpuTimeMeasurer to abstract away the details, so that the rendering code needs only minimal extra code to perform the measurements.
  • The renderer creates an instance for each render() call, and uses it to measure the render passes, blender combine pass (if applicable), flush pass, and shadow passes.
  • The renderer then publishes the result in a new stats prop at renderer.stats["gpu_times"].

An example output from the benchmark code I'm working on. You can see how the number of measurements changes depending on the presence of shadows and the blend mode used.

       weighthed_plus blending - cpu:  6.0  bcombine:  0.5  flush:  1.6  pass1:  0.2  pass2:  0.3  pass3:  0.3
                        shadow - cpu:  7.6  flush:  1.4  pass1:  0.2  pass2:  0.2  shadow:  0.4
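
For reference, a minimal usage sketch based on the flag and stats prop described above (the canvas and scene setup is an assumption for illustration, not part of this PR):

    import pygfx as gfx
    from wgpu.gui.auto import WgpuCanvas

    # The wgpu feature must be enabled before the device is created (see Note 2).
    gfx.renderers.wgpu.enable_wgpu_features("timestamp-query")

    canvas = WgpuCanvas()
    renderer = gfx.renderers.WgpuRenderer(canvas)
    renderer.measure_gpu_times = True  # opt-in, undocumented/experimental

    # Any scene will do; this one is just for illustration.
    scene = gfx.Scene()
    scene.add(gfx.Mesh(gfx.box_geometry(1, 1, 1), gfx.MeshPhongMaterial()))
    scene.add(gfx.AmbientLight())
    camera = gfx.PerspectiveCamera(70, 4 / 3)
    camera.show_object(scene)

    renderer.render(scene, camera)

    # After a render, the per-pass GPU times are published in the stats prop.
    print(renderer.stats["gpu_times"])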

@almarklein almarklein requested a review from Korijn as a code owner May 30, 2024 08:57
@almarklein
Member Author

Interesting ... in a benchmark that renders 1M points, the passes (obviously) take more gpu time, but the flush-pass also reports a much larger time. In other words, the measured gpu-time is affected by previous render passes.

Nice reality check that the GPU and CPU operate asynchronously: the CPU submits work, and the GPU then executes it. The longer time for the flush pass simply means that all passes were submitted in quick succession, but the flush pass finishes later, because the render pass took longer ...

In practice, we sync the CPU with the GPU (i.e. the CPU waits for the GPU to finish) when we read from the buffer containing the timestamps.
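
For context, a rough sketch of the underlying wgpu mechanism (this is not the GpuTimeMeasurer from this PR; the adapter/device setup and the exact wgpu-py call signatures are assumptions for illustration):

    import wgpu

    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
    device = adapter.request_device(required_features=["timestamp-query"])

    # Two timestamp slots: one written at the beginning of a pass, one at the end
    # (via the timestamp_writes argument of begin_render_pass).
    query_set = device.create_query_set(type="timestamp", count=2)
    resolve_buffer = device.create_buffer(
        size=2 * 8,  # two 64-bit timestamps
        usage=wgpu.BufferUsage.QUERY_RESOLVE | wgpu.BufferUsage.COPY_SRC,
    )

    encoder = device.create_command_encoder()
    # ... encode the render pass, with timestamp_writes pointing at query_set ...
    encoder.resolve_query_set(query_set, 0, 2, resolve_buffer, 0)
    device.queue.submit([encoder.finish()])

    # This read is where the CPU blocks until the GPU has finished the submitted work.
    timestamps = device.queue.read_buffer(resolve_buffer).cast("Q")  # uint64 nanoseconds
    elapsed_ns = timestamps[1] - timestamps[0]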

There's a lot to gain by using async code instead ... Also, using start+end times (instead of just their difference) can help us create graphs that show where the CPU and GPU are doing work! Handy when we work on #495.

@almarklein almarklein marked this pull request as draft June 7, 2024 08:40
@almarklein
Member Author

Moving back to draft. Don't need this right now for the benchmarks, but these timestamps will be very useful when we look at scheduling CPU vs GPU work, see #495.
