extmod: add new implementation of uasyncio #5332
Conversation
Features available in this version (so far):
In terms of total code size, it's currently smaller than the original uasyncio. On a PYBv1.0 with uasyncio as a frozen module (with mpy-cross optimisation level 3, no debugging) the firmware size for the frozen module is:
Note that pend-throw and utimeq are only needed for the original uasyncio, so a fair comparison is to exclude them with the new version. Eventually parts of this new uasyncio could be rewritten in C, but it's not clear if that would increase or decrease size.
In terms of base memory usage, I tested the following simple program which runs on both original and new uasyncio:

```python
import uasyncio as asyncio

async def main():
    print('start')
    await asyncio.sleep(1)
    print('end')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
```

On the unix port, x86 64-bit, the minimum heap needed to run this program (with uasyncio frozen) is:
So this new version has lower requirements for heap RAM for the basic event scheduler.
In terms of raw performance scheduling tasks, I tested with a slightly modified version of @peterhinch's rate.py found at https://github.com/peterhinch/micropython-async/blob/master/benchmarks/rate.py. Tests were run on a PYBv1.0 with frozen uasyncio. For the original uasyncio, it could create up to 1350 tasks before it ran out of memory. The time taken for switching versus number of tasks/coros was:
(So that's about 200us to switch task.) For this new version, it could create up to 1000 tasks before it ran out of memory. Timing results:
So this new version takes a bit more RAM per task (about 30% more), and a bit more time to switch (about 65% more). But keep in mind that this new version is written in pure Python while the original uasyncio has the queuing routines written in C. Parts of this new version can be rewritten in C to reduce RAM usage per task and improve scheduling speed.
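For reference, here is a minimal sketch in the spirit of rate.py, not the actual benchmark: it creates a number of tasks that each yield repeatedly, and derives a rough average switch time. It is written against the new API, and sleep_ms/ticks_ms/ticks_diff are MicroPython-specific.

```python
# Minimal benchmark sketch (not the actual rate.py): NUM_TASKS workers each
# yield ITERATIONS times, and the elapsed time gives a rough per-switch cost.
import time
import uasyncio as asyncio

NUM_TASKS = 100
ITERATIONS = 100

async def worker():
    for _ in range(ITERATIONS):
        await asyncio.sleep_ms(0)  # yield back to the scheduler

async def main():
    t0 = time.ticks_ms()
    tasks = [asyncio.create_task(worker()) for _ in range(NUM_TASKS)]
    for t in tasks:
        await t
    dt = time.ticks_diff(time.ticks_ms(), t0)
    print("approx us per switch:", dt * 1000 / (NUM_TASKS * ITERATIONS))

asyncio.run(main())
```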
I also tested performance using ApacheBench (ab), with
It's worth noting that with the original uasyncio, if more requests come in than it is preconfigured to handle (via the fixed runq/waitq lengths) then it fails with a
The new uasyncio in this PR has a few compatibility functions/classes/methods which means that it can run most existing uasyncio apps (as long as they only used public API functions), including the latest picoweb version unmodified (tested with
Great work, thanks to everyone involved.
This script fails - cancellation of a task which has run to completion. Formerly worked. Tested on a Pyboard 1.1.

```python
import uasyncio as asyncio

async def test():
    print("test")
    await asyncio.sleep(0.1)  # Works if value is 5
    print('test2')

async def main():
    t = asyncio.create_task(test())
    await asyncio.sleep(0.5)
    print('gh')
    t.cancel()
    await asyncio.sleep(1)
    print('done')

asyncio.run(main())
```

Outcome:
Doing some initial optimisation, rewriting the Queue and Task classes in C, total code size (Python+C) is increased by about 160 bytes on stm32, and the rate.py benchmark on PYBv1.0 gives:
(So it can do more tasks at once, with faster switching, than the original uasyncio.) Testing with ApacheBench gives about 25500 requests per second, maximum request time 6ms. This optimised code is not pushed here, it's just a proof of concept that it's possible to optimise using C.
Thanks, can confirm. Should be fixed by latest commit.
@dpgeorge YHM re a possible solution to the I/O priority issue.
Thanks, really looking forward to it! We have an application where we handle several UART streams and it's beneficial if they can be served fast (e.g. for USB VCP, but also for internal UARTs). Any other ideas about introducing some basic priority handling (e.g. as in Peter's fast_io fork)?
@dpgeorge @hoihu I have posted a version of the new uasyncio which supports fast scheduling. The code is here and a test script may be found here. This solution improves on that in my fast_io fork because the fast I/O option is specified on a per-stream basis rather than globally. This was in response to a suggestion from @dpgeorge who pointed out that a fast stream which did not require priority scheduling (such as a fast socket) could hog the scheduler. Fast I/O is specified by means of Stream.priority(v=True). In the case of bidirectional streams, the priority value applies in both directions. @dpgeorge Feel free to adapt any of this code as you see fit.
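For context, a hedged usage sketch of the per-stream API described above (from the linked experimental code, not this PR); everything besides Stream.priority(v=True) is ordinary new-uasyncio server code and is only illustrative.

```python
# Illustrative only: priority() is the experimental per-stream fast-I/O hook
# described above and is not part of this PR; the echo server around it is
# plain new-uasyncio code.
import uasyncio as asyncio

async def serve(reader, writer):
    reader.priority(True)  # request fast scheduling for this stream only
    while True:
        line = await reader.readline()
        if not line:
            break
        writer.write(line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    await asyncio.start_server(serve, "0.0.0.0", 8080)
    while True:
        await asyncio.sleep(10)

asyncio.run(main())
```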
I'm using this new uasyncio implementation in my mqtt connected application (using mqtt_as with the new Lock implementation and using task cancellation frequently) for a few days now and it works superbly. The only bugs I know of at the moment are rare scenarios with wait_for and gather, but I guess that nobody will encounter those any time soon :)
Will there be Similar to
@dpgeorge You might like to look at this. I made a few changes and provided synchronisation primitives adapted to use your new version efficiently.
Direct waiting on a pin, like
@dpgeorge - As a note of caution, the new uasyncio may have some of the same bad socket error behaviors as the prior versions. I haven't reviewed, fully absorbed, and then tested the new uasyncio here, but consider the following in the socket server class (and please excuse me if I'm just mis-reading the commits here): any s.accept() that follows a yielded wait for a client connection may throw an exception. This is because the client connection may be aborted or otherwise dropped in the thin time slice between the connection arriving and the call to s.accept(). If not properly wrapped in a try/except, this will crash the server. IMHO, the server should handle such an exception by ignoring the error, skipping the new task creation, and going back to waiting for connections. The server crash behavior was easily demonstrated with the older uasyncio. @peterhinch has the details and fix from the prior testing and can probably correct or corroborate my concern here.
POSSIBLY FIX WITH (UNTESTED):
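(The original snippet is not preserved here; below is a standalone sketch of the guarded accept() described above. It is not the actual uasyncio server code, and wait_readable, create_task and handler are placeholder callables.)

```python
# Hedged sketch of the idea only, not the uasyncio implementation: guard
# accept() so a client that drops between the readiness event and the call
# doesn't crash the accept loop.
async def accept_loop(listen_sock, wait_readable, create_task, handler):
    while True:
        await wait_readable(listen_sock)  # yield until a connection is pending
        try:
            client, addr = listen_sock.accept()
        except OSError:
            # Connection aborted in the window between readiness and accept():
            # ignore it, skip task creation, and go back to waiting.
            continue
        create_task(handler(client, addr))
```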
I would like to await a pin change as well as a timeout, concurrently, and resume when either of them fires. The pin will be a data-available interrupt from external hardware. I would like to avoid implementing IRQ handlers in my code, keep everything asyncio where I can, and avoid using busy-loops for timeout functionality when waiting for a state change from an IRQ. I believe asyncio is a nice way to wrap these problems up into straightforward code. The data is fetched over SPI on ESP32, which is currently blocking and would need to be made asyncio compatible also.
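A hedged sketch of the kind of usage being asked for, assuming some awaitable pin-wait coroutine exists (it does not in this PR); wait_for provides the timeout without a busy loop.

```python
# Conceptual sketch only: `wait_for_pin` is a hypothetical coroutine that
# completes when the data-ready pin fires, and `read_spi` is the (currently
# blocking) SPI fetch mentioned above.
import uasyncio as asyncio

async def read_when_ready(wait_for_pin, read_spi, timeout_s=0.1):
    try:
        await asyncio.wait_for(wait_for_pin(), timeout_s)
    except asyncio.TimeoutError:
        return None  # no data-ready event within the timeout
    return read_spi()
```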
This would also be interesting in a case where I use an Arduino connected through 1-wire with the goal of controlling its pins and reading its ADCs. Currently, to stay compatible with the machine.Pin and machine.ADC classes, I use synchronous calls, but these block for quite some time, especially if a retransmission is needed. An awaitable Pin object would make this a lot better. This is of course a bit different from the scenario you and Damien wrote about.
Efficient waiting on a pin could be achieved using the ioread mechanism with no changes to uasyncio. With a suitable ioctl, polling would be delegated to select.poll (which is implemented in C). I'll see if I can produce a demo of an awaitable PinChange class.
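Roughly what that could look like (a minimal sketch, not Peter's demo): a Python class deriving from io.IOBase whose ioctl() reports readability when the pin level has changed, so it can be registered with select.poll or wrapped as a stream. MP_STREAM_POLL == 3 and MP_STREAM_POLL_RD == 1 are the MicroPython stream-ioctl constants; the rest of the names are illustrative.

```python
# Minimal sketch of a pollable pin wrapper, assuming the MicroPython stream
# protocol for Python classes (io.IOBase subclass with an ioctl method).
import io

MP_STREAM_POLL = 3      # stream ioctl request used by select.poll
MP_STREAM_POLL_RD = 1   # "readable" flag

class PinChange(io.IOBase):
    def __init__(self, pin):
        self._pin = pin
        self._last = pin()          # remember the current level

    def read(self, n=1):
        # "Consume" the event by latching the new level.
        self._last = self._pin()
        return b"\x01"

    def ioctl(self, req, arg):
        if req == MP_STREAM_POLL:
            if (arg & MP_STREAM_POLL_RD) and self._pin() != self._last:
                return MP_STREAM_POLL_RD
        return 0
```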
That's still polling, though, right? I'm wondering if an event-driven approach is better, since it's coming from a hardware interrupt.
Yes it's still polling, using
An alternative would be for interrupts (hardware events) to directly schedule uasyncio tasks, for example by creating an Event that an ISR can set.

The polling approach may seem like a better option but it's quite limiting because it only has the concept of readable/writable. Eg if one task waits for a pin to go high, and another task waits for the same pin to go low, that's 2 distinct events, and how do they map to pollability? With a duplex UART it's possible to wait for reading and writing at the same time, but for objects like pins and other things that may have more than 2 distinct events, it's hard to fit that into the readable/writable model.

To make polling work one would probably need to create a distinct wrapper object for each event, and register each object with the poller.
Personally I'm a fan of this one:
After considering the issues I think the Event approach is best. An Event seems the obvious way to synchronise a coroutine with an ISR; further, enabling .set() to be called asynchronously is a worthwhile end in itself - eventually someone will do it and wonder why it's unreliable. |
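To make the intent concrete, here is a sketch of the pattern under discussion. It assumes an Event whose .set() is safe to call from an ISR, which is exactly the capability being asked for and is not guaranteed by the uasyncio version in this PR; the pin number and trigger are board-specific placeholders.

```python
# Conceptual sketch only: relies on an ISR-safe Event.set(), which is the
# feature being discussed above, not something this PR promises.
import uasyncio as asyncio
from machine import Pin

pin_event = asyncio.Event()

def pin_isr(pin):
    pin_event.set()                 # hardware event -> wake the waiting task

async def wait_for_pin():
    while True:
        await pin_event.wait()
        pin_event.clear()
        print("pin changed")

pin = Pin(0, Pin.IN)                # placeholder pin number
pin.irq(trigger=Pin.IRQ_RISING, handler=pin_isr)
asyncio.run(wait_for_pin())
```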
And use Ubuntu bionic for the qemu-arm Travis CI job.
This fixes a bug in the pairing-heap implementation when nodes are deleted with mp_pairheap_delete and then reinserted later on.
This commit adds a completely new implementation of the uasyncio module. The aim of this version (compared to the original one in micropython-lib) is to be more compatible with CPython's asyncio module, so that one can more easily write code that runs under both MicroPython and CPython (and reuse CPython asyncio libraries, follow CPython asyncio tutorials, etc). Async code is not easy to write and any knowledge users already have from CPython asyncio should transfer to uasyncio without effort, and vice versa.

The implementation here attempts to provide good compatibility with CPython's asyncio while still being "micro" enough to run where MicroPython runs. This follows the general philosophy of MicroPython itself, to make it feel like Python.

The main change is to use a Task object for each coroutine. This allows more flexibility to queue tasks in various places, eg the main run loop, tasks waiting on events, locks or other tasks. It no longer requires pre-allocating a fixed queue size for the main run loop. A pairing heap is used to queue Tasks.

It's currently implemented in pure Python, separated into components with lazy importing for optional components. In the future parts of this implementation can be moved to C to improve speed and reduce memory usage. But the aim is to maintain a pure-Python version as a reference version.
All .exp files are included because they require CPython 3.8 which may not always be available.
Includes a test where the (non-uasyncio) client does an RST on the connection, as well as a simple TCP server/client test where both sides are using uasyncio, and a test for TCP stream close then write.
Implements Task and TaskQueue classes in C, using a pairing-heap data structure. Using this reduces RAM use of each Task, and improves overall performance of the uasyncio scheduler.
Only included in GENERIC build.
MERGED! Thanks to all for the feedback, discussion, testing, etc. Feel free to open new issues/PRs for items discussed above (and others) that were not fixed/included in these commits.
Finally! Thanks all for your effort.
This commit adds a generator test for throwing into a nested exception, and one when using yield-from with a pending exception cleanup. Both these tests currently fail on the native emitter, and are simplified versions of native test failures from uasyncio in micropython#5332.
This PR adds a completely new implementation of the uasyncio module. The aim of this version (compared to the original one in micropython-lib) is to be more compatible with CPython's asyncio module, so that one can more easily write code that runs under both MicroPython and CPython (and reuse CPython asyncio libraries, follow CPython asyncio tutorials, etc). Async code is not easy to write and any knowledge users already have from CPython asyncio should transfer to uasyncio without effort, and vice versa.

The implementation here attempts to provide good compatibility with CPython's asyncio while still being "micro" enough to run where MicroPython runs. This follows the general philosophy of MicroPython itself, to make it feel like Python.
The existing uasyncio at micropython-lib has its merits and will remain as an independent module/library, but would need to be renamed so as to not clash with the new implementation here. Note that the implementation in this PR provides a compatibility layer to be compatible (for the most part) with the original uasyncio.
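As an illustration of that compatibility layer (a sketch, not an exhaustive list of what is supported), the same coroutine can be driven with either the old-style loop API or the new CPython-style entry point:

```python
import uasyncio as asyncio

async def main():
    await asyncio.sleep(1)

# Original micropython-lib uasyncio style, via the compatibility functions:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# New CPython-compatible style:
asyncio.run(main())
```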
It's currently implemented in pure Python and runs under existing, unmodified MicroPython (there's a commit in this PR to improve allocation of iterator buffers but that is not needed for uasyncio to work). In the future parts of this implementation could be moved to C to improve speed and reduce memory usage. But it would be good to maintain a pure-Python version as a reference version.
At this point efficiency is not a goal, rather correctness is. Tests are included in this PR.
Thanks to @peterhinch and @kevinkk525 for help with initial testing and bug finding.