extmod: add new implementation of uasyncio #5332
Conversation
Features available in this version (so far):
In terms of total code size, it's currently smaller than the original uasyncio. On a PYBv1.0 with uasyncio as a frozen module (with mpy-cross optimisation level 3, no debugging) the firmware size for the frozen module is:
Note that pend-throw and utimeq are only needed for the original uasyncio, so a fair comparison is to exclude them with the new version. Eventually parts of this new uasyncio could be rewritten in C, but it's not clear if that would increase or decrease size.
In terms of base memory usage, I tested the following simple program which runs on both original and new uasyncio:

```python
import uasyncio as asyncio

async def main():
    print('start')
    await asyncio.sleep(1)
    print('end')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
```

On the unix port, x86 64-bit, the minimum heap needed to run this program (with uasyncio frozen) is:
So this new version has lower requirements for heap RAM for the basic event scheduler.
In terms of raw performance scheduling tasks, I tested with a slightly modified version of @peterhinch's rate.py found at https://github.com/peterhinch/micropython-async/blob/master/benchmarks/rate.py. Tests were run on a PYBv1.0 with frozen uasyncio. For the original uasyncio, it could create up to 1350 tasks before it ran out of memory. The time taken for switching versus number of tasks/coros was:
(So that's about 200us to switch task.) For this new version, it could create up to 1000 tasks before it ran out of memory. Timing results:
So this new version takes a bit more RAM per task (about 30% more), and a bit more time to switch (about 65% more). But keep in mind that this new version is written in pure Python while the original uasyncio has the queuing routines written in C. Parts of this new version can be rewritten in C to reduce RAM usage per task and improve scheduling speed.
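For reference, here is a minimal sketch in the spirit of rate.py, not the actual benchmark: it creates a number of tasks that each yield repeatedly, and derives a rough average switch time. It is written against the new API, and sleep_ms/ticks_ms/ticks_diff are MicroPython-specific.

```python
# Minimal benchmark sketch (not the actual rate.py): NUM_TASKS workers each
# yield ITERATIONS times, and the elapsed time gives a rough per-switch cost.
import time
import uasyncio as asyncio

NUM_TASKS = 100
ITERATIONS = 100

async def worker():
    for _ in range(ITERATIONS):
        await asyncio.sleep_ms(0)  # yield back to the scheduler

async def main():
    t0 = time.ticks_ms()
    tasks = [asyncio.create_task(worker()) for _ in range(NUM_TASKS)]
    for t in tasks:
        await t
    dt = time.ticks_diff(time.ticks_ms(), t0)
    print("approx us per switch:", dt * 1000 / (NUM_TASKS * ITERATIONS))

asyncio.run(main())
```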
I also tested performance using ApacheBench (ab), with
It's worth noting that with the original uasyncio, if more requests come in than it is preconfigured to handle (via the fixed runq/waitq lengths) then it fails with a
The new uasyncio in this PR has a few compatibility functions/classes/methods which means that it can run most existing uasyncio apps (as long as they only used public API functions), including the latest picoweb version unmodified (tested with
Great work, thanks to everyone involved.
This script fails - cancellation of a task which has run to completion. Formerly worked. Tested on a Pyboard 1.1.

```python
import uasyncio as asyncio

async def test():
    print("test")
    await asyncio.sleep(0.1)  # Works if value is 5
    print('test2')

async def main():
    t = asyncio.create_task(test())
    await asyncio.sleep(0.5)
    print('gh')
    t.cancel()
    await asyncio.sleep(1)
    print('done')

asyncio.run(main())
```

Outcome:
Doing some initial optimisation, rewriting the Queue and Task classes in C, total code size (Python+C) is increased by about 160 bytes on stm32, and the rate.py benchmark on PYBv1.0 gives:
(So it can do more tasks at once, with faster switching, than the original uasyncio.) Testing with ApacheBench gives about 25500 requests per second, maximum request time 6ms. This optimised code is not pushed here, it's just a proof of concept that it's possible to optimise using C.
Thanks, can confirm. Should be fixed by latest commit.
@dpgeorge YHM re a possible solution to the I/O priority issue.
Thanks, really looking forward to it! We have an application where we handle several UART streams and it's beneficial if they can be served fast (e.g. for USB VCP, but also for internal UARTs). Any other ideas about introducing some basic priority handling (e.g. as in Peter's fast_io fork)?
@dpgeorge @hoihu I have posted a version of the new uasyncio which supports fast scheduling. The code is here and a test script may be found here. This solution improves on that in my fast_io fork because the fast I/O option is specified on a per-stream basis rather than globally. This was in response to a suggestion from @dpgeorge who pointed out that a fast stream which did not require priority scheduling (such as a fast socket) could hog the scheduler. Fast I/O is specified by means of Stream.priority(v=True). In the case of bidirectional streams, the priority value applies in both directions. @dpgeorge Feel free to adapt any of this code as you see fit.
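For context, a hedged usage sketch of the per-stream API described above (from the linked experimental code, not this PR); everything besides Stream.priority(v=True) is ordinary new-uasyncio server code and is only illustrative.

```python
# Illustrative only: priority() is the experimental per-stream fast-I/O hook
# described above and is not part of this PR; the echo server around it is
# plain new-uasyncio code.
import uasyncio as asyncio

async def serve(reader, writer):
    reader.priority(True)  # request fast scheduling for this stream only
    while True:
        line = await reader.readline()
        if not line:
            break
        writer.write(line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    await asyncio.start_server(serve, "0.0.0.0", 8080)
    while True:
        await asyncio.sleep(10)

asyncio.run(main())
```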
I'm using this new uasyncio implementation in my mqtt connected application (using mqtt_as with the new Lock implementation and using task cancellation frequently) for a few days now and it works superbly. The only bugs I know of at the moment are rare scenarios with wait_for and gather, but I guess that nobody will encounter those any time soon :)
Will there be Similar to
@dpgeorge You might like to look at this. I made a few changes and provided synchronisation primitives adapted to use your new version efficiently.
Direct waiting on a pin, like
@dpgeorge - As a note of caution, the new uasyncio may have some of the same bad socket error behaviors as the prior versions. I haven't reviewed, fully absorbed, and then tested the new uasyncio here, but consider the following in the socket server class (and please excuse me if I'm just mis-reading the commits here): any s.accept() that follows a yielded wait for a client connection may throw an exception. This is because the client connection may be aborted or otherwise dropped in the thin time slice between the connection arriving and the call to s.accept(). If not properly wrapped in a try/except, this will crash the server. IMHO, the server should handle such an exception by ignoring the error, skipping the new task creation, and going back to waiting for connections. The server crash behavior was easily demonstrated with the older uasyncio. @peterhinch has the details and fix from the prior testing and can probably correct or corroborate my concern here.
POSSIBLY FIX WITH (UNTESTED):
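(The original snippet is not preserved here; below is a standalone sketch of the guarded accept() described above. It is not the actual uasyncio server code, and wait_readable, create_task and handler are placeholder callables.)

```python
# Hedged sketch of the idea only, not the uasyncio implementation: guard
# accept() so a client that drops between the readiness event and the call
# doesn't crash the accept loop.
async def accept_loop(listen_sock, wait_readable, create_task, handler):
    while True:
        await wait_readable(listen_sock)  # yield until a connection is pending
        try:
            client, addr = listen_sock.accept()
        except OSError:
            # Connection aborted in the window between readiness and accept():
            # ignore it, skip task creation, and go back to waiting.
            continue
        create_task(handler(client, addr))
```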
I would like to await a pin change as well as a timeout, concurrently, and resume when either of them fires. The pin will be a data-available interrupt from external hardware. I would like to avoid implementing IRQ handlers in my code, keep everything asyncio where I can, and avoid using busy-loops for timeout functionality when waiting for a state change from an IRQ. I believe asyncio is a nice way to wrap these problems up into straightforward code. The data is fetched over SPI on ESP32, which is currently blocking and would need to be made asyncio compatible also.
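A hedged sketch of the kind of usage being asked for, assuming some awaitable pin-wait coroutine exists (it does not in this PR); wait_for provides the timeout without a busy loop.

```python
# Conceptual sketch only: `wait_for_pin` is a hypothetical coroutine that
# completes when the data-ready pin fires, and `read_spi` is the (currently
# blocking) SPI fetch mentioned above.
import uasyncio as asyncio

async def read_when_ready(wait_for_pin, read_spi, timeout_s=0.1):
    try:
        await asyncio.wait_for(wait_for_pin(), timeout_s)
    except asyncio.TimeoutError:
        return None  # no data-ready event within the timeout
    return read_spi()
```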
This would also be interesting in a case where I use an Arduino connected through 1-wire with the goal of controlling its pins and reading its ADCs. Currently, to stay compatible with the machine.Pin and machine.ADC classes, I use synchronous calls, but these block for quite some time, especially if a retransmission is needed. An awaitable Pin object would make this a lot better. This is of course a bit different from the scenario you and Damien wrote about.
Efficient waiting on a pin could be achieved using the ioread mechanism with no changes to uasyncio. With a suitable ioctl, polling would be delegated to select.poll (which is implemented in C). I'll see if I can produce a demo of an awaitable PinChange class.
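Roughly what that could look like (a minimal sketch, not Peter's demo): a Python class deriving from io.IOBase whose ioctl() reports readability when the pin level has changed, so it can be registered with select.poll or wrapped as a stream. MP_STREAM_POLL == 3 and MP_STREAM_POLL_RD == 1 are the MicroPython stream-ioctl constants; the rest of the names are illustrative.

```python
# Minimal sketch of a pollable pin wrapper, assuming the MicroPython stream
# protocol for Python classes (io.IOBase subclass with an ioctl method).
import io

MP_STREAM_POLL = 3      # stream ioctl request used by select.poll
MP_STREAM_POLL_RD = 1   # "readable" flag

class PinChange(io.IOBase):
    def __init__(self, pin):
        self._pin = pin
        self._last = pin()          # remember the current level

    def read(self, n=1):
        # "Consume" the event by latching the new level.
        self._last = self._pin()
        return b"\x01"

    def ioctl(self, req, arg):
        if req == MP_STREAM_POLL:
            if (arg & MP_STREAM_POLL_RD) and self._pin() != self._last:
                return MP_STREAM_POLL_RD
        return 0
```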
That's still polling, though, right? I'm wondering if an event-driven approach is better, since it's coming from a hardware interrupt.
Yes it's still polling, using
An alternative would be for interrupts (hardware events) to directly schedule uasyncio tasks, for example by creating an Event that an ISR can set.

The polling approach may seem like a better option but it's quite limiting because it only has the concept of readable/writable. Eg if one task waits for a pin to go high, and another task waits for the same pin to go low, that's 2 distinct events, and how do they map to pollability? With a duplex UART it's possible to wait for reading and writing at the same time, but for objects like pins and other things that may have more than 2 distinct events, it's hard to fit that into the readable/writable model.

To make polling work one would probably need to create a distinct wrapper object for each event, and register each object with the poller.
Personally I'm a fan of this one:
After considering the issues I think the Event approach is best. An Event seems the obvious way to synchronise a coroutine with an ISR; further, enabling .set() to be called asynchronously is a worthwhile end in itself - eventually someone will do it and wonder why it's unreliable. |
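To make the intent concrete, here is a sketch of the pattern under discussion. It assumes an Event whose .set() is safe to call from an ISR, which is exactly the capability being asked for and is not guaranteed by the uasyncio version in this PR; the pin number and trigger are board-specific placeholders.

```python
# Conceptual sketch only: relies on an ISR-safe Event.set(), which is the
# feature being discussed above, not something this PR promises.
import uasyncio as asyncio
from machine import Pin

pin_event = asyncio.Event()

def pin_isr(pin):
    pin_event.set()                 # hardware event -> wake the waiting task

async def wait_for_pin():
    while True:
        await pin_event.wait()
        pin_event.clear()
        print("pin changed")

pin = Pin(0, Pin.IN)                # placeholder pin number
pin.irq(trigger=Pin.IRQ_RISING, handler=pin_isr)
asyncio.run(wait_for_pin())
```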
And use Ubuntu bionic for the qemu-arm Travis CI job.
This fixes a bug in the pairing-heap implementation when nodes are deleted with mp_pairheap_delete and then reinserted later on.
This commit adds a completely new implementation of the uasyncio module. The aim of this version (compared to the original one in micropython-lib) is to be more compatible with CPython's asyncio module, so that one can more easily write code that runs under both MicroPython and CPython (and reuse CPython asyncio libraries, follow CPython asyncio tutorials, etc). Async code is not easy to write and any knowledge users already have from CPython asyncio should transfer to uasyncio without effort, and vice versa.

The implementation here attempts to provide good compatibility with CPython's asyncio while still being "micro" enough to run where MicroPython runs. This follows the general philosophy of MicroPython itself, to make it feel like Python.

The main change is to use a Task object for each coroutine. This allows more flexibility to queue tasks in various places, eg the main run loop, tasks waiting on events, locks or other tasks. It no longer requires pre-allocating a fixed queue size for the main run loop. A pairing heap is used to queue Tasks.

It's currently implemented in pure Python, separated into components with lazy importing for optional components. In the future parts of this implementation can be moved to C to improve speed and reduce memory usage. But the aim is to maintain a pure-Python version as a reference version.
All .exp files are included because they require CPython 3.8 which may not always be available.
Includes a test where the (non-uasyncio) client does an RST on the connection, as well as a simple TCP server/client test where both sides are using uasyncio, and a test for TCP stream close then write.
Implements Task and TaskQueue classes in C, using a pairing-heap data structure. Using this reduces RAM use of each Task, and improves overall performance of the uasyncio scheduler.
Only included in GENERIC build.
MERGED! Thanks to all for the feedback, discussion, testing, etc. Feel free to open new issues/PRs for items discussed above (and others) that were not fixed/included in these commits.
Finally! Thanks all for your effort.
This commit adds a generator test for throwing into a nested exception, and one when using yield-from with a pending exception cleanup. Both these tests currently fail on the native emitter, and are simplified versions of native test failures from uasyncio in micropython#5332.
This PR adds a completely new implementation of the uasyncio module. The aim of this version (compared to the original one in micropython-lib) is to be more compatible with CPython's asyncio module, so that one can more easily write code that runs under both MicroPython and CPython (and reuse CPython asyncio libraries, follow CPython asyncio tutorials, etc). Async code is not easy to write and any knowledge users already have from CPython asyncio should transfer to uasyncio without effort, and vice versa.

The implementation here attempts to provide good compatibility with CPython's asyncio while still being "micro" enough to run where MicroPython runs. This follows the general philosophy of MicroPython itself, to make it feel like Python.
The existing uasyncio at micropython-lib has its merits and will remain as an independent module/library, but would need to be renamed so as to not clash with the new implementation here. Note that the implementation in this PR provides a compatibility layer to be compatible (for the most part) with the original uasyncio.
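As an illustration of that compatibility layer (a sketch, not an exhaustive list of what is supported), the same coroutine can be driven with either the old-style loop API or the new CPython-style entry point:

```python
import uasyncio as asyncio

async def main():
    await asyncio.sleep(1)

# Original micropython-lib uasyncio style, via the compatibility functions:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# New CPython-compatible style:
asyncio.run(main())
```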
It's currently implemented in pure Python and runs under existing, unmodified MicroPython (there's a commit in this PR to improve allocation of iterator buffers but that is not needed for uasyncio to work). In the future parts of this implementation could be moved to C to improve speed and reduce memory usage. But it would be good to maintain a pure-Python version as a reference version.
At this point efficiency is not a goal, rather correctness is. Tests are included in this PR.
Thanks to @peterhinch and @kevinkk525 for help with initial testing and bug finding.