-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Issue#2949 Put in a check to prevent USB starvation by long calculations #3149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If we really want to worry about this, then I think we could check the lengths and only do this every words, where n is order-of-magnitude 100 or 1000 or whatever. Or we could monitor a tick counter in an efficient way to make sure we call There are other operations besides multiplication that could also cause long lockouts. I do not want to discourage trying to solve the problem, but I think the use cases where this does cause a problem are pretty limited. |
I have run multiple tests, using both short and long numbers; and the
version with the change to run background tasks
always is faster (1-5%) than the standard version. I currently have no
rationale for this; but the results are consistent.
So, I don't think there is a significant cost to the suggested change.
I will see what the performance impact of counting ticks would be; but
since just calling RUN_BACKGROUND_TASKS doesn't
seem to be costing us anything I'm not sure what the benefit would be.
Certainly there are other operations that can take a long time; but without
some sort of overall scheduling mechanism I'm not
sure what better options are available.
…On Mon, Jul 13, 2020 at 11:04 PM Dan Halbert ***@***.***> wrote:
RUN_BACKGROUND_TASKS is not necessarily that cheap. I would suggest
testing the performance impact on loops doing much shorter longint
manipulations, such as multiplying two numbers that are only a couple of
32-bit words long. The #2949
<#2949> code seems pretty
pathological to me.
If we really want to worry about this, then I think we could check the
lengths and only do this every words, where n is order-of-magnitude 100 or
1000 or whatever. Or we could monitor a tick counter in an efficient way to
make sure we call RUN_BACKGROUND_TASKS every msec or half msec.
There are other operations besides multiplication that could also cause
long lockouts
I do not want to discourage trying to solve the problem, but I think the
use cases where this does cause a problem are pretty limited.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3149 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKEQ4C2O6UEKMLEUUG4DR3PKMZANCNFSM4OZDEEKA>
.
|
Like @dhalbert I am surprised at your benchmark results. At the tip of main, If the finding that there's no important performance reduction (or indeed if there's a performance increase) with #2879 merged is duplicated, I think it makes total sense for this to go in. When #2979 does land,
(callback_head is a NULL pointer most of the time, unless there's actual work to be done) This would be faster than a loop-counter-modulo type check if fully inlined. however, depending on the choices of the inliner, there will also be 1 or 2 function call overheads. Another thing one might wish for is the ability to interrupt the long-running long arithmetic. I don't think RUN_BACKGROUND_TASKS alone is enough to do this. I wonder if we could factor out the checks used in |
Another thing that might be happening is that inserting A long-running division, addition, subtraction, etc. would also cause USB starve-out. That's why I was surprised you only added it to multiplication. |
Addition and subtraction are probably quite quick relative to multiplication; and adding it to multiplication gets |
Division does not seem to have the same issue. I generated some very large
numbers (i.e. 2**(65536*2) ) and tried various
division in REPL. All returned immediately (or close enough so I couldn't
see any lag). It seems only the large exponents
are causing an issue.
I agree that running more code really shouldn't take less time. But,
I've instrumented the mpn_mul() and looked at the timing
with a login analyzer. The average time in the standard code is 4.12
seconds, and in the code calling background tasks it is
3.83 seconds. I also verified that the calculations are accurate.
Dis-assembling the code in mpz_mul_inpl() ( which inlines
mpn_mul, since it is the only caller) does show some differences, but I
can't see enough to say where the difference in time
is occurring. I also instrumented run_background_tasks and can see that it
is taking about 5 microseconds and getting run every
688 microseconds. So, I would have expected an overhead of about 7/10ths of
1 percent. While I am curious enough to keep
investigating the time difference, I think the patch does not show any
signs of hurting performance and does at least take care
of the edge case raised in issue 2949.
…On Tue, Jul 14, 2020 at 9:14 AM Jeff Epler ***@***.***> wrote:
Addition and subtraction are probably quite quick relative to
multiplication; and adding it to multiplication gets ** powers too.
Division would be similarly long-running as multiplication. (think of how
we write out addition and multiplication long-hand; you write down a "lot
more" when multiplying two numbers than when adding two numbers. The
performance of the "mpz" multiplication and addition code roughly mirrors
the number of digits you'd write down doing long-hand arithmetic; heavy
hand waving in order to not really invoke computer science concepts)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3149 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKERF5QFZNJ5DNTCFLDLR3RR2VANCNFSM4OZDEEKA>
.
|
I had an nice audio chat with @DavePutz a few days ago, and he has gotten even more anomalous timing results when monitoring the time externally by pin-toggling in C. He is investigating further. |
It looks as though some of the anomalous timing was due to cache lines. Turning off cache made things a little more sensible. |
@DavePutz Thanks for the more thorough testing. To expand,on this, I suggested disabling the I/D cache in case the odd results were due to crossing a cache line or something like that. I'm thinking that the reason that multiplication makes a difference, but not division, is because the issue is really repeated multiplication due to the So an alternative fix is to add |
@dhalbert <https://github.com/dhalbert> Unfortunately, moving the call of
RUN_BACKGROUND_TASKS to the loops in mpz_pow_inpl()
or mpz_mul_inpl() is not as effective. Since in there it is not called as
often, it takes a very long time (over 2 minutes) for
CIRCUITPY to show up after a reset, if it mounts at all. Since in a normal
(i.e. not exponentiation) multiplication
it would only call RUN_BACKGROUND_TASKS once; I think leaving it in
mpn_mul() will take care of the
USB stall without a noticeable impact overall.
…On Mon, Jul 20, 2020 at 9:35 AM Dan Halbert ***@***.***> wrote:
@DavePutz <https://github.com/DavePutz> Thanks for the more thorough
testing. To expand,on this, I suggested disabling the I/D cache in case the
odd results were due to crossing a cache line or something like that.
I'm thinking that the reason that multiplication makes a difference, but
not division, is because the issue is really repeated multiplication due to
the ** operator being atomic. If I understand correctly, the test program
is exponentiation, not explicit repeated multiplication. The background
tasks will get a chance to run after each multiplication in the latter
case, but not the former.
So an alternative fix is to add RUN_BACKGROUND_TASKS running to the
exponentiation operation, not multiplication. I don't mind leaving it in
multiplication but would welcome your thoughts.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3149 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKERYAISUNQVC642R7ELR4RI4VANCNFSM4OZDEEKA>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing in exponentiation. This covers all the bases I was worried about, so we can merge. Thank you for your patience.
Added a call to RUN_BACKGROUND_TASKS in the loop in mpn_mul(). This allows USB requests to be processed during long-running multiplications. Tests were also run to check performance impact. Surprisingly, when using the test script from issue #2949 there was a 3-5% performance improvement with the change. Note that the second test mentioned in that issue (running 10**40000 from REPL) is not really affected by this patch, as tracing shows that the time for that command is actually being spent in p_obj_int_formatted_impl->mpz_as_str_inpl; formatting the very long string for output.
Note that this method was used rather than some sort of timer/interrupt to make sure that any possible conflicts with storage allocations would be avoided.