8000 Issue#2949 Put in a check to prevent USB starvation by long calculations by DavePutz · Pull Request #3149 · adafruit/circuitpython · GitHub
[go: up one dir, main page]

Skip to content

Issue#2949 Put in a check to prevent USB starvation by long calculations #3149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 21, 2020
Merged

Issue#2949 Put in a check to prevent USB starvation by long calculations #3149

merged 6 commits into from
Jul 21, 2020

Conversation

DavePutz
Copy link
Collaborator
@DavePutz DavePutz commented Jul 14, 2020

Added a call to RUN_BACKGROUND_TASKS in the loop in mpn_mul(). This allows USB requests to be processed during long-running multiplications. Tests were also run to check performance impact. Surprisingly, when using the test script from issue #2949 there was a 3-5% performance improvement with the change. Note that the second test mentioned in that issue (running 10**40000 from REPL) is not really affected by this patch, as tracing shows that the time for that command is actually being spent in p_obj_int_formatted_impl->mpz_as_str_inpl; formatting the very long string for output.
Note that this method was used rather than some sort of timer/interrupt to make sure that any possible conflicts with storage allocations would be avoided.

@dhalbert
Copy link
Collaborator
dhalbert commented Jul 14, 2020

RUN_BACKGROUND_TASKS is not necessarily that cheap. I would suggest testing the performance impact on loops doing much shorter longint manipulations, such as multiplying two numbers that are only a couple of 32-bit words long. The #2949 code seems pretty pathological to me.

If we really want to worry about this, then I think we could check the lengths and only do this every words, where n is order-of-magnitude 100 or 1000 or whatever. Or we could monitor a tick counter in an efficient way to make sure we call RUN_BACKGROUND_TASKS every msec or half msec.

There are other operations besides multiplication that could also cause long lockouts.

I do not want to discourage trying to solve the problem, but I think the use cases where this does cause a problem are pretty limited.

@DavePutz
Copy link
Collaborator Author
DavePutz commented Jul 14, 2020 via email

@jepler
Copy link
jepler commented Jul 14, 2020

Like @dhalbert I am surprised at your benchmark results. At the tip of main, RUN_BACKGROUND_TASKS is fairly expensive (relative to 5.3.0) and I am working on addressing that at #2879.

If the finding that there's no important performance reduction (or indeed if there's a performance increase) with #2879 merged is duplicated, I think it makes total sense for this to go in.

When #2979 does land, supervisor_run_background_tasks_if_tick() (which is what RUN_BACKGROUND_TASKS expands to) simply calls background_callback_run_all which has a very efficient initial check:

void background_callback_run_all() {
    if (!callback_head) {
        return;

(callback_head is a NULL pointer most of the time, unless there's actual work to be done)

This would be faster than a loop-counter-modulo type check if fully inlined. however, depending on the choices of the inliner, there will also be 1 or 2 function call overheads.

Another thing one might wish for is the ability to interrupt the long-running long arithmetic. I don't think RUN_BACKGROUND_TASKS alone is enough to do this. I wonder if we could factor out the checks used in mp_hal_delay_ms so that in such a location we can handle interrupt, reload, and watchdog all from one simple and efficient macro. However, this can totally be a subsequent PR. I'm brainstorming here, not adding requirements to allow this to be mergeable.

@dhalbert
Copy link
Collaborator

RUN_BACKGROUND_TASKS cannot take negative time. So I think there may be something wrong with the timekeeping if inserting it causes the tests to appear to run faster. A way to check this would be to do some kind of "wall clock" check, such as toggling a pin before and after the operation, and measuring the pin-toggle time with a Saleae or similar.

Another thing that might be happening is that inserting RUN_BACKGROUND_TASKS is causing the longint operation to be truncated or not finished in some way. So the results of the computation should be checked for correctness.

A long-running division, addition, subtraction, etc. would also cause USB starve-out. That's why I was surprised you only added it to multiplication.

@jepler
Copy link
jepler commented Jul 14, 2020

Addition and subtraction are probably quite quick relative to multiplication; and adding it to multiplication gets ** powers too. Division would be similarly long-running as multiplication. (think of how we write out addition and multiplication long-hand; you write down a "lot more" when multiplying two numbers than when adding two numbers. The performance of the "mpz" multiplication and addition code roughly mirrors the number of digits you'd write down doing long-hand arithmetic; heavy hand waving in order to not really invoke computer science concepts)

@DavePutz
Copy link
Collaborator Author
DavePutz commented Jul 14, 2020 via email

@tannewt tannewt requested review from dhalbert and jepler July 18, 2020 01:02
@dhalbert
Copy link
Collaborator

I had an nice audio chat with @DavePutz a few days ago, and he has gotten even more anomalous timing results when monitoring the time externally by pin-toggling in C. He is investigating further.

@DavePutz
Copy link
Collaborator Author

It looks as though some of the anomalous timing was due to cache lines. Turning off cache made things a little more sensible.
I did test using the improved background that @jepler has developed, but it did not make any real difference.
The measured overhead of running RUN_BACKGROUND_TASKS is even less than I had previously thought ( I had to increase the Mhz resolution on my logic analyzer to get more accurate values). It looks like the overhead for the ~4 second run in the test script was about .5 seconds, or .2%. Since in a 'normal' multiplication the background tasks would only be run once, my feeling is that the small overhead would be acceptable to resolve this edge case.. I guess as a result I would recommend merging, If other routines are found that block USB we can look at them individually.

@dhalbert
Copy link
Collaborator

@DavePutz Thanks for the more thorough testing. To expand,on this, I suggested disabling the I/D cache in case the odd results were due to crossing a cache line or something like that.

I'm thinking that the reason that multiplication makes a difference, but not division, is because the issue is really repeated multiplication due to the ** operator being atomic. If I understand correctly, the test program is exponentiation, not explicit repeated multiplication. The background tasks will get a chance to run after each multiplication in the latter case, but not the former.

So an alternative fix is to add RUN_BACKGROUND_TASKS running to the exponentiation operation, not multiplication. I don't mind leaving it in multiplication but would welcome your thoughts.

@DavePutz
Copy link
Collaborator Author
DavePutz commented Jul 21, 2020 via email

Copy link
Collaborator
@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing in exponentiation. This covers all the bases I was worried about, so we can merge. Thank you for your patience.

@dhalbert dhalbert merged commit 7f27fcd into adafruit:main Jul 21, 2020
@DavePutz DavePutz deleted the issue2949 branch December 30, 2020 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ItsyBitsy M4 Express w/ CP5.3.0: bignum(?) code prevents USB from functioning (mount and REPL access)
3 participants
0