Add a benchmark to measure gc traversal #244
LGTM
There's one thing to clarify, but I think I've understood it correctly. I'm approving the PR under that assumption. 🙂
```python
all_cycles = create_recursive_containers(n_levels)
for _ in range(loops):
    gc.collect()
    # Main loop to measure
    t0 = pyperf.perf_counter()
    collected = gc.collect()
    total_time += pyperf.perf_counter() - t0
```
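For readers without the full diff, here is a minimal sketch of what the helper referenced above might do. The function name comes from the quoted snippet; the body is an assumption, not the PR's actual code:

```python
def create_recursive_containers(n_levels):
    # Assumed shape (not the PR's actual code): each level is a list
    # holding references to the previous level, giving the GC a large
    # graph of tracked objects to traverse on every gc.collect() call.
    current = []
    for level in range(1, n_levels + 1):
        current = [current for _ in range(level)]
    return current
```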
Just to be sure I understand: the GC objects aren't cleaned up during `gc.collect()` because `all_cycles` is still bound in the locals. That's why we can re-use it in each loop. Is that right?
Yeah, this test is just benchmarking the GC traversing all those objects again and again, so there's no need to remove them.
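A small standalone illustration of that point (not part of the PR): as long as a reference to the cycles is held, a full collection traverses them but frees nothing, which is exactly the work the benchmark wants to time.

```python
import gc

gc.collect()          # clear any garbage left over from interpreter startup

class Node:
    pass

node = Node()
node.ref = node       # reference cycle, still reachable through `node`

print(gc.collect())   # 0: the collector traverses the cycle but frees nothing

del node              # drop the last external reference
print(gc.collect())   # >= 1: the now-unreachable cycle is reclaimed
```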
Why don't you create one benchmark, since we know it's supposed to be a workload, and then treat things like traversal and collection as metrics? The PyPy folks have a workload called gcbench we can reference. I stand to be corrected, but I am not familiar with folks writing a specific workload for every metric they want to measure; rather, it's applying some metrics to several workloads. Am I missing something here @pablogsal ?
Thanks for your comments, Joannah!
It is very difficult to get a benchmark that heavily exercises only the GC and isn't skewed by other things getting faster. For now, I added these benchmarks to ensure that we can keep track of several aspects of the CPython GC that we may want to follow, such as how efficient traversal and cycle removal are.
These things matter for the runtime, so these benchmarks allow us to keep track of them. That's why I am adding them.
That's ok; we are not researching GCs, we are keeping track of optimizations in CPython, and we want to keep track of how these things get faster for our specific GC, at least for the time being.
You mean this? https://github.com/mozillazg/pypy/blob/master/rpython/translator/goal/gcbench.py Yes, I plan to add it after some modifications, but notice that this benchmark is not doing many things differently in the GC from the things we are adding here. It is just creating a tree and benchmarking its construction, which involves many other things than just the GC.
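For comparison, the core of that kind of workload is allocation-driven. A rough sketch in the spirit of gcbench (paraphrased, not the actual RPython code) shows why the measured time reflects far more than GC traversal:

```python
import time

class TreeNode:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def make_tree(depth):
    # Allocation-heavy tree construction in the spirit of gcbench
    # (paraphrased, not the actual RPython code).
    if depth <= 0:
        return TreeNode()
    return TreeNode(make_tree(depth - 1), make_tree(depth - 1))

t0 = time.perf_counter()
make_tree(16)
elapsed = time.perf_counter() - t0
# The measured time mixes allocation, attribute assignment and call
# overhead with GC work, so a faster interpreter alone shrinks it even
# if the GC itself does not change.
print(f"built a depth-16 tree in {elapsed:.3f}s")
```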
I understand the motivation. I am not too bent on this; treating traversal as its own benchmark rather than as a metric just appeared weird to me at first glance, so it's your decision.
Yes, but I may be biased; it's usually sufficient for GC things for us. You can modify it if you want.
I think I understand your point of view, but these metrics/benchmarks are important for keeping track of potential improvements, and we need to isolate them from other runtime improvements to ensure that what we think is getting faster actually gets faster. You can measure the GC in many, many ways; this is just more information, nothing else. The performance suite has plenty of "smaller" benchmarks like "list_unpack" and other things that are not measuring big end-to-end workloads. But thanks for your feedback; I promise to take it into account when adding more benchmarks in the future.
When I measured it, this benchmark had quite a lot of GC activity, but also a lot of other activity in the VM. The problem is that if the VM gets faster (but not the GC), this benchmark will get faster too, so we won't know where the improvement is coming from. This is why we want to track this separately.
I also wish we could get the pyperformance suite to a level of being as implementation-agnostic as possible, because it's considered the standard Python benchmark suite. If we have benchmarks unique to CPython, we can put them in a separate folder as internal ones.