8000 bpo-34037: test_asyncio uses shutdown_default_executor() by vstinner · Pull Request #16284 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-34037: test_asyncio uses shutdown_default_executor() #16284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 19, 2019
Merged

bpo-34037: test_asyncio uses shutdown_default_executor() #16284

merged 2 commits into from
Sep 19, 2019

Conversation

vstinner
Copy link
Member
@vstinner vstinner commented Sep 19, 2019

@vstinner
Copy link
Member Author

@aeros167: Would you mind to review this change?

@vstinner
Copy link
Member Author

I simplified my change (I reverted some useless changes).

@vstinner vstinner merged commit 079931d into python:master Sep 19, 2019
@vstinner vstinner deleted the asyncio_tests branch September 19, 2019 14:45
@vstinner
Copy link
Member Author

@asvetlov: Thanks for the review. I merged my PR.

Copy link
Contributor
@aeros aeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vstinner Apologies for the late response, I've been taking it easier today since I think I've been doing a bit too much recently, trying to avoid the infamous open source burnout you've talked about a few times. (:

Overall, the tests look correct to me. My only concern is a potential timeout issue from the part in close_loop() (lines 511-516 in test_asyncio/utils.py):

     def close_loop(loop):
         if loop._default_executor is not None:
             if not loop.is_closed():
                 loop.run_until_complete(loop.shutdown_default_executor())
             else:
                 loop._default_executor.shutdown(wait=True)

Since shutdown_default_executor() already attempts to call _default_executor.shutdown(wait=True) within the thread it creates, if there are any issues with joining the th 8000 reads, presumably calling _default_executor.shutdown(wait=True) again would not resolve the issue and could end up resulting in a complete timeout for test_asyncio if any of the threads in the ThreadPoolExecutor were unable to be joined.

Therefore, I would recommend changing the else block to the following:

            else:
                loop._default_executor.shutdown(wait=False)

This will at least guarantee that the executor will be shutdown completely, regardless of the status of the threads. IMO, a timeout is the worst case scenario, since it gives us far less useful debugging information and makes the issue harder to resolve.

An alternative of this could be:

     def close_loop(loop):
         if loop._default_executor is not None:
             try:
                 loop.run_until_complete(loop.shutdown_default_executor())
             finally:
                 loop._default_executor.shutdown(wait=False)

This would provide further assurance that the executor will be shut down, since loop._default_executor.shutdown(wait=False) would always be called last. But this may come at the cost of slightly reducing performance, since it would be called redundantly in normal scenarios. For that reason, I prefer the first option.

What are your thoughts @asvetlov and @1st1?

@asvetlov
Copy link
Contributor

I think in tests we need to wait for executor worker threads.
If they hang -- we can debug the reason and fix broken test.

@aeros
Copy link
Contributor
aeros commented Sep 20, 2019

@asvetlov:

I think in tests we need to wait for executor worker threads.
If they hang -- we can debug the reason and fix broken test.

Since there isn't a timeout specified, wouldn't the worker threads indefinitely hang until the buildbot itself times out? This doesn't seem like it would give us very useful debugging information (which was my initial concern). Perhaps this should raise an exception if shutdown_default_executor() was unable to close the threads?

@asvetlov
Copy link
Contributor

Thanks for the nice question!
This PR provides a very good improvement, I don't regret on merging :)
The next step could be:

  1. Support timeout parameter by shutdown_default_executor(timeout=None); pass it to th.join(timeout).
  2. Add timeout keyword-only argument to asyncio.run()

Does it make sense?
@aeros167 would you master a PR for the proposed change?

@vstinner
Copy link
Member Author

Honestly, if a test ends with "loop._default_executor is not None and loop.is_closed()": I would call this a bug in the test, but I'm not interested to review all tests which end up like this.

I prefer to always shutdown the executor and wait until it completes.

I'm not sure of what you mean by shutting down the executor twice. In which case it would occur?

Since there isn't a timeout specified, wouldn't the worker threads indefinitely hang until the buildbot itself times out?

Buildbots implement multiple layers of timeouts:

  • the test runner calls faulthandler.dump_traceback_later(timeout, exit=True): usually between 20 and 30 minutes
  • tests are working in subprocesses: the parent waits the wotker process with a timeout: based on the first timeout x 1.5
  • buildbot has an hard limit of 1200 seconds with no output

@aeros
Copy link
Contributor
aeros commented Sep 20, 2019

@vstinner:

I'm not sure of what you mean by shutting down the executor twice. In which case it would occur?

Apologies if I didn't word this clearly, I meant that in a scenario where loop.run_until_complete(loop.shutdown_default_executor()) would fail (as a result of threads not closing), loop._default_executor.shutdown(wait=True) would also fail. So it's not clear to me that the else clause would accomplish anything different.

An "emergency shutdown" of sorts that would always close the executor would be loop._default_executor.shutdown(wait=False), so that like it would be more effective in those scenarios. I like @asvetlov's idea of adding a timeout parameter better though, that sounds like a far more robust solution.

Honestly, if a test ends with "loop._default_executor is not None and loop.is_closed()": I would call this a bug in the test, but I'm not interested to review all tests which end up like this.

Ah, I think that clears things up for me then. I was thinking more along the lines of treating this like a bug and attempting to resolve that scenario. Calling loop._default_executor.shutdown(wait=False) would likely result in dangling threads, and show us there's an issue with that section. But that may result in an excessive number of build failures, as you mentioned.

Buildbots implement multiple layers of timeouts

Good to know! I wasn't aware of there being multiple layers of timeouts for the buildbots. I figured there was some form of hard limit when there's no output for x seconds, but I didn't know about faulthandler.dump_traceback_later(timeout, exit=True).

tests are working in subprocesses: the parent waits the wotker process with a timeout: based on the first timeout x 1.5

Is there a particular reason why the timeout duration multiplier is 1.5? I'm not familiar with how the tests utilize subprocesses or the implementation details. But I am always eager to learn. (:

@vstinner
Copy link
Member Author

Is there a particular reason why the timeout duration multiplier is 1.5? I'm not familiar with how the tests utilize subprocesses or the implementation details. But I am always eager to learn. (:

faulthandler is supposed to work. The second timeout in the parent process is just a watchdog, "just in case". I prefer to wait for faulthandler since it dumps the Python traceback of all threads before exiting, whereas when the parent hits the worker timeout, it just kills the worker process: no traceback.

@aeros
Copy link
Contributor
aeros commented Sep 20, 2019

@asvetlov:

Thanks for the nice question!
This PR provides a very good improvement, I don't regret on merging :)

No problem, I always try to ask questions when I don't understand something. Worst case scenario, the question goes unanswered. Best case scenario, I learn something new! I certainly understand though if you guys don't have the time to answer all of them.

I certainly don't have an issue with the PR, I just didn't understand what the else clause was accomplishing differently. But that was likely a result of my own misunderstanding, rather than something being flawed with the test. I think this PR provides an improvement either way. (:

Support timeout parameter by shutdown_default_executor(timeout=None); pass it to th.join(timeout).

Add timeout keyword-only argument to asyncio.run()

Does it make sense?
would you master a PR for the proposed change?

Yes, that makes sense. I would be glad to help with working on this!

@vstinner
Copy link
Member Author

I meant that in a scenario where loop.run_until_complete(loop.shutdown_default_executor()) would fail (as a result of threads not closing), loop._default_executor.shutdown(wait=True) would also fail.

IMHO this discussion is a little bit too far from the initial issue and sounds like a theorical issue. If you can see a practical issue, please open a new issue. You are commenting a closed PR of a closed issue.

@aeros
Copy link
Contributor
aeros commented Sep 20, 2019

@vstinner:

IMHO this discussion is a little bit too far from the initial issue and sounds like a theorical issue. If you can see a practical issue, please open a new issue.

It's not so much a practical problem, it's more of a question to clear up my understanding (so it wouldn't justify opening an issue for it).

You are commenting a closed PR of a closed issue.

Apologies, I think I may have been doing that a bit too much recently, is this a bad thing to do? I mostly have been doing it when something doesn't make sense to me, to ask questions. Perhaps there's a better location to do this that I'm not aware of. Either way though, thank you for the in-depth explanation, I certainly appreciate it. (:

@1st1
Copy link
Member 8000
1st1 commented Sep 20, 2019

@asvetlov

Add timeout keyword-only argument to asyncio.run()

I don't like this idea. We can have a global asyncio "reasonable" timeout for shutting down threads -- say 5 minutes -- if we ever reach it we complain loudly that asyncio cannot shutdown the threadpool cleanly.

But we don't want to have multiple knobs to tweak asyncio.run().

@aeros
Copy link
Contributor
aeros commented Sep 20, 2019

@1st1:

We can have a global asyncio "reasonable" timeout for shutting down threads -- say 5 minutes --

Do you think this would be best implemented as a private property within BaseEventLoop, for example _thread_timeout?

But we don't want to have multiple knobs to tweak asyncio.run().

I can understand why that might not be the best long term strategy. A single additional knob doesn't make a big difference, but it can significantly add up over time.

@1st1
Copy link
Member
1st1 commented Sep 20, 2019

Do you think this would be best implemented as a private property within BaseEventLoop, for example _thread_timeout?

I think it can just be a constant defined somewhere. Private properties sometimes end up becoming some "unofficial API" and we don't want those.

I can understand why that might not be the best long term strategy. A single additional knob doesn't make a big difference, but it can significantly add up over time.

Exactly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0