Attempt to fix actions hang #397
Merged
This is an attempt to fix the issue that sometimes causes the actions to hang, as in this run: https://github.com/adafruit/circuitpython-org/actions/runs/14512434132
My hypothesis is that the hang is caused by how different parts of the code handle waiting for the rate limit reset time. I think there is a situation where some parts of the code end up sleeping for a very long time due to a "negative overflow" from subtracting dates.
I found this section of the log at the end of the failed / stalled actions run:
At 09:57:55 the code prints that it will sleep for 1517 seconds; I believe the print came from here: https://github.com/adafruit/adabot/blob/main/adabot/github_requests.py#L106. It also prints that the reset is at 10:23:13.
Then at 10:23:13 it wakes up and starts to do more things. But it seems that the rate limit is not actually (fully?) reset yet, because it is still indicating 1 and then 0 for remaining requests.
This is the last line before the job seems to start getting cancelled / cleaned up by GitHub:
It seems to be saying that the rate limit will reset at the time that it currently is. I believe the times may have "collided", or the current time is actually slightly past the reset time.
I think that print must have come from one of these places:
https://github.com/adafruit/adabot/blob/main/adabot/lib/circuitpython_library_validators.py#L937
https://github.com/adafruit/adabot/blob/main/adabot/lib/circuitpython_library_validators.py#L1297
In the logic next to it that determines how long to sleep, it subtracts datetimes:
If datetime.datetime.utcnow() were actually bigger than core_rate_limit_reset, even by a little bit, this subtraction produces a negative timedelta. Python stores that as a negative number of days plus a positive remainder, so the seconds value ends up being a very large (positive) value that the code then sleeps for.
I tested this theory in the Python REPL with this:
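The effect can be reproduced with a minimal sketch like the one below. This is an illustration, not the original REPL session: the timestamps are made up, and core_rate_limit_reset is only borrowed as a variable name from the description above.

```python
import datetime

# Hypothetical timestamps: "now" is 1 second past the reset time.
core_rate_limit_reset = datetime.datetime(2025, 1, 1, 10, 23, 13)
now = datetime.datetime(2025, 1, 1, 10, 23, 14)

delta = core_rate_limit_reset - now

# A negative timedelta is normalized to negative days plus a
# non-negative seconds component.
print(delta)                  # -1 day, 23:59:59
print(delta.days)             # -1
print(delta.seconds)          # 86399
print(delta.total_seconds())  # -1.0
```

Reading only the seconds attribute drops the negative days part, which is how a 1 second overshoot can turn into a sleep of nearly 24 hours.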
In this PR I have tried to address the issue in two ways:
Changes in github_requests.py to try to avoid a situation where the code in circuitpython_library_validators.py is reached at a time when the current time could already be past the rate limit reset time, but not by very much.
Use of min() so that if it comes up with a large value like 86399, it will instead only wait for just over an hour. Theoretically it could get by with waiting less in some scenarios, but I figure that waiting for just over an hour is safest to ensure the limit reset is reached no matter what.
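As a rough sketch of the min() capping idea, something along these lines would bound the worst case. The function name, the cap value, and the timestamps are assumptions for illustration; the actual change in this PR may look different.

```python
import datetime

# Hypothetical cap of "just over an hour"; the exact value is an assumption.
MAX_SLEEP_SECONDS = 60 * 60 + 5


def capped_sleep_seconds(reset_time, now, cap=MAX_SLEEP_SECONDS):
    """Return the seconds to sleep, never more than cap."""
    delta = reset_time - now
    # delta.seconds is always between 0 and 86399, even for a negative
    # delta, so min() limits the worst case to cap.
    return min(delta.seconds, cap)


reset = datetime.datetime(2025, 1, 1, 10, 23, 13)

# Normal case: the reset is 25 minutes 17 seconds away.
normal = capped_sleep_seconds(reset, datetime.datetime(2025, 1, 1, 9, 57, 56))
print(normal)  # 1517

# Collision case: now is 1 second past the reset, so 86399 gets capped.
collided = capped_sleep_seconds(reset, datetime.datetime(2025, 1, 1, 10, 23, 14))
print(collided)  # 3605
```

An alternative would be to use total_seconds() and clamp negative results to zero, but that assumes the limit really has reset once the reset time has passed, which the log above suggests is not always the case.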