Measure gpu time for all render passes by almarklein · Pull Request #769 · pygfx/pygfx

Conversation

almarklein
Member

This adds support for measuring the GPU time of the different render passes, which can be useful in benchmarks to see how much strain each pass in the rendering process puts on the GPU, and how certain choices affect GPU usage. The feature is opt-in: one must set renderer.measure_gpu_times = True. This flag is undocumented/experimental for now.

Note 1: This relies on pygfx/wgpu-py#505, which is not yet released. But IMO that's ok, since for now this feature is undocumented and only used by ourselves (meaning devs who check out the main branch of wgpu-py).

Note 2: Using this feature requires gfx.renderers.wgpu.enable_wgpu_features("timestamp-query").

This PR:

  • Adds a GpuTimeMeasurer to abstract away the details, so that the rendering code needs only minimal extra code to perform the measurements.
  • The renderer creates an instance for each render() call, and uses it to measure the render passes, blender combine pass (if applicable), flush pass, and shadow passes.
  • The renderer then publishes the result in a new stats prop at renderer.stats["gpu_times"].

An example output from the benchmark code I'm working on. You can see how the number of measurements changes depending on the presence of shadows and the blend mode used.

       weighthed_plus blending - cpu:  6.0  bcombine:  0.5  flush:  1.6  pass1:  0.2  pass2:  0.3  pass3:  0.3
                        shadow - cpu:  7.6  flush:  1.4  pass1:  0.2  pass2:  0.2  shadow:  0.4
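
For reference, a minimal usage sketch based on the flag and stats prop described above (the canvas and scene setup is an assumption for illustration, not part of this PR):

    import pygfx as gfx
    from wgpu.gui.auto import WgpuCanvas

    # The wgpu feature must be enabled before the device is created (see Note 2).
    gfx.renderers.wgpu.enable_wgpu_features("timestamp-query")

    canvas = WgpuCanvas()
    renderer = gfx.renderers.WgpuRenderer(canvas)
    renderer.measure_gpu_times = True  # opt-in, undocumented/experimental

    # Any scene will do; this one is just for illustration.
    scene = gfx.Scene()
    scene.add(gfx.Mesh(gfx.box_geometry(1, 1, 1), gfx.MeshPhongMaterial()))
    scene.add(gfx.AmbientLight())
    camera = gfx.PerspectiveCamera(70, 4 / 3)
    camera.show_object(scene)

    renderer.render(scene, camera)

    # After a render, the per-pass GPU times are published in the stats prop.
    print(renderer.stats["gpu_times"])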

@almarklein almarklein requested a review from Korijn as a code owner May 30, 2024 08:57
@almarklein
Member Author

Interesting ... in a benchmark that renders 1M points, the passes (obviously) take more gpu time, but the flush-pass also reports a much larger time. In other words, the measured gpu-time is affected by previous render passes.

Nice reality check that the GPU and CPU operate asynchronously: the CPU submits work, and the GPU then executes it. The longer time for the flush pass simply means that all passes were submitted in quick succession, but the flush pass finishes later, because the render pass took longer ...

In practice, we sync the CPU with the GPU (i.e. the CPU waits for the GPU to finish) when we read from the buffer containing the timestamps.
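
For context, a rough sketch of the underlying wgpu mechanism (this is not the GpuTimeMeasurer from this PR; the adapter/device setup and the exact wgpu-py call signatures are assumptions for illustration):

    import wgpu

    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
    device = adapter.request_device(required_features=["timestamp-query"])

    # Two timestamp slots: one written at the beginning of a pass, one at the end
    # (via the timestamp_writes argument of begin_render_pass).
    query_set = device.create_query_set(type="timestamp", count=2)
    resolve_buffer = device.create_buffer(
        size=2 * 8,  # two 64-bit timestamps
        usage=wgpu.BufferUsage.QUERY_RESOLVE | wgpu.BufferUsage.COPY_SRC,
    )

    encoder = device.create_command_encoder()
    # ... encode the render pass, with timestamp_writes pointing at query_set ...
    encoder.resolve_query_set(query_set, 0, 2, resolve_buffer, 0)
    device.queue.submit([encoder.finish()])

    # This read is where the CPU blocks until the GPU has finished the submitted work.
    timestamps = device.queue.read_buffer(resolve_buffer).cast("Q")  # uint64 nanoseconds
    elapsed_ns = timestamps[1] - timestamps[0]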

There's a lot to gain by using async code instead ... Also, using start+end times (instead of just their difference) can help us create graphs that show where the CPU and GPU are doing work! Handy when we work on #495.

@almarklein almarklein marked this pull request as draft June 7, 2024 08:40
@almarklein
Member Author

Moving back to draft. Don't need this right now for the benchmarks, but these timestamps will be very useful when we look at scheduling CPU vs GPU work, see #495.
