Add a benchmark to measure gc traversal #244


Merged: 1 commit into python:main from pablogsal:gc_traversal, Nov 7, 2022

Conversation

pablogsal (Member)

No description provided.

@ericsnowcurrently (Member) left a comment

LGTM

There's one thing to clarify, but I think I've understood it correctly. I'm approving the PR under that assumption. 🙂

Comment on lines +21 to +27
all_cycles = create_recursive_containers(n_levels)
for _ in range(loops):
    gc.collect()
    # Main loop to measure
    t0 = pyperf.perf_counter()
    collected = gc.collect()
    total_time += pyperf.perf_counter() - t0
@ericsnowcurrently (Member)

Just to be sure I understand, the GC objects aren't cleaned up during gc.collect() because all_cycles is still bound in the locals. That's why we can re-use it in each loop. Is that right?

@pablogsal (Member, Author)

Yeah, this test is just benchmarking the GC traversing all those objects again and again, so there's no need to remove them.

@pablogsal pablogsal merged commit bd386ad into python:main Nov 7, 2022
@pablogsal pablogsal deleted the gc_traversal branch November 7, 2022 17:08
@nanjekyejoannah

Why don't you create one benchmark? As we know, it's supposed to be a workload, and then things like traversal and collection are just metrics. The PyPy folks have a workload called gcbench we can reference. I stand to be corrected, but I am not familiar with folks writing a specific workload for every metric they want to measure; rather, it's applying some metrics to several workloads. Am I missing something here, @pablogsal?

@pablogsal (Member, Author) commented Nov 7, 2022

Thanks for your comments, Joannah!

The PyPy folks have a workload called gcbench we can reference. I stand to be corrected, but I am not familiar with folks writing a specific workload for every metric they want to measure; rather, it's applying some metrics to several workloads. Am I missing something here, @pablogsal?

It is very difficult to get a benchmark that heavily exercises only the GC and that is not skewed by other things getting faster. For now, I added these benchmarks to ensure that we can keep track of several aspects of the CPython GC that we may want to follow, such as how efficient traversal and cycle removal are.

then things like traversal and collection are just metrics

These things matter for the runtime, so these benchmarks allow us to keep track of them. That's why I am adding them.

I stand to be corrected, but I am not familiar with folks writing a specific workload for every metric they want to measure.

That's OK; we are not researching GCs. We are keeping track of optimizations in CPython, and we want to keep track of how these things get faster for our specific GC, at least for the time being.

The PyPy folks have a workload called gcbench we can reference

You mean this? https://github.com/mozillazg/pypy/blob/master/rpython/translator/goal/gcbench.py

Yes, I plan to add it after some modifications, but notice that this benchmark does not exercise the GC very differently from the things we are adding here. It is just creating a tree and benchmarking its construction, which involves many other things besides the GC.

@nanjekyejoannah

That's OK; we are not researching GCs. We are keeping track of optimizations in CPython, and we want to keep track of how these things get faster for our specific GC, at least for the time being.

I understand the motivation. I will give an example: treating traversal as just a metric, we should be able to measure it on any workload, typically any benchmark like pidigits in the pyperformance suite. I am not very convinced it should be a workload, unless you have decided that no, only this workload should be used to measure GC traversal. Pyperformance workloads process cycles too, albeit not many, so a more rigorous workload is required.

I am not too bent on this; it just appeared weird to me at first glance, so it's your decision.

You mean this? https://github.com/mozillazg/pypy/blob/master/rpython/translator/goal/gcbench.py

Yes, but I may be biased; it's usually sufficient for GC things for us. You can modify it if you want.

@pablogsal (Member, Author) commented Nov 7, 2022

I understand the motivation. I will give an example: treating traversal as just a metric, we should be able to measure it on any workload, typically any benchmark like pidigits in the pyperformance suite. I am not very convinced it should be a workload, unless you have decided that no, only this workload should be used to measure GC traversal.

I think I understand your point of view, but these metrics/benchmarks are important for keeping track of potential improvements, and we need to isolate them from other runtime improvements to ensure that what we think is getting faster actually gets faster. You can measure the GC in many, many ways; this is just more information, nothing else. Pyperformance has plenty of "smaller" benchmarks, like "list_unpack" and others, that are not measuring big end-to-end workloads. But thanks for your feedback; I promise to take it into account when adding more benchmarks in the future.

Yes, but I may be biased; it's usually sufficient for GC things for us. You can modify it if you want.

When I measured it, this benchmark had quite a lot of GC activity, but also a lot of other things related to the VM. The problem is that if the VM gets faster (but not the GC), so will this benchmark, and we won't know where the advantage is coming from. This is why we want to track this separately.

@nanjekyejoannah left a comment

I also hope we get the pyperformance suite to a level of being as implementation-agnostic as possible, because it's considered the standard Python benchmark suite. If we have benchmarks specific to CPython, we can put them in a separate folder as internal.
