8000 Alarm pool sleep changes by peterharperuk · Pull Request #16454 · micropython/micropython · GitHub
[go: up one dir, main page]

Skip to content

Alarm pool sleep changes #16454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 12, 2025

Conversation

peterharperuk
Copy link
Contributor

rp2350 generates a sev when using spin locks, which can prevent the device sleeping, see raspberrypi/pico-sdk#1812 for more background.

The pico-sdk alarm pool handles this.

However Micropython moved away from using the alarm pool due to issues like this...
Fixed in sdk 2.0.0: raspberrypi/pico-sdk#1552
Fixed in sdk 2.1.0: raspberrypi/pico-sdk#1953
Fixed in develop: raspberrypi/pico-sdk#2127

This change puts back the use of the alarm pool in mp_wfe_or_timeout to hopefully fix sleep issues.

Copy link
github-actions bot commented Dec 19, 2024

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:  +184 +0.020% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

Copy link
codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (a05766f) to head (69993da).
Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #16454   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files         169      169           
  Lines       21896    21896           
=======================================
  Hits        21578    21578           
  Misses        318      318           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@peterharperuk
Copy link
Contributor Author

Related PR #16431

@projectgus projectgus self-requested a review December 20, 2024 04:28
@projectgus
Copy link
Contributor

Quick update for anyone wondering where all this has got to: This is the preferred fix for the rp2 lightsleep issues, but we can't merge it until raspberrypi/pico-sdk#2127 makes it into a pico-sdk release.

@ssotheremail
Copy link
ssotheremail commented Jan 16, 2025

Thanks for the update projectgus. Any thoughts at all on when?
BTW There appears to be another behaviour which MAY not be covered by the current proposals/comments and it would be a pity if it escaped:
Pico 2 lightsleep(5000) ok, except when preceded by a sleep(nn). e.g. sleep(5)
This can be very confusing if encountered.
This does not happen on a Pico. I have not retested this recently on a PicoW or Pico2W because of their additional sleep problems.

I notice:
raspberrypi/pico-sdk#2127
Is not milestoned for 2.1.1 of the SDK (undated) but 2.2.0 (even further into the future I imagine and undated)
Is it anticipated that all the issues with sleeping will be cleared up by the fixes you plan such as #16454 and #16431?

@projectgus
Copy link
Contributor

We've updated to pico-sdk v2.1.1 in #16783, which includes the fix we're waiting on for the alarm pool.

@peterharperuk are you able to come back to this PR, or is it better if one of us picks it up from here?

@peterharperuk
Copy link
Contributor Author

Sorry, yes. I'll try and take a look this week.

@projectgus
Copy link
Contributor

@peterharperuk All good, we only just updated anyhow. Please let us know how you get on.

@ssotheremail
Copy link

Thanks for all the work on this. Sorry if I am being dim but I am slightly confused because earlier on (above) it says "waiting for pico sdk 2127" but thats still milestoned for 2.2.0. Obviously it would be great if lightsleep is fixed before that. In case its useful as a baseline I reran my test program with 1.25.0 preview 389 and got 20 seconds for pico (correct), 15 seconds for pico2 (lightsleep after sleep returns early), 5 seconds picow and pico2w (lightsleep always returns early).

@projectgus
Copy link
Contributor

"waiting for pico sdk 2127" but thats still milestoned for 2.2.0.

Peter will know more than I do about this, but even though the milestone is set to 2.2.0 in their tracker the fix commit looks like it was included in the 2.1.1 tag: raspberrypi/pico-sdk@969f589

(The list of tags is on the line under the commit title in the GitHub UI.)

@peterharperuk peterharperuk force-pushed the alarm_pool_sleep_changes branch 2 times, most recently from bd5dbf0 to 05b02b8 Compare March 18, 2025 18:05
@peterharperuk peterharperuk marked this pull request as ready for review March 18, 2025 18:05
@peterharperuk peterharperuk force-pushed the alarm_pool_sleep_changes branch from 05b02b8 to b9cfba5 Compare March 18, 2025 18:09
@projectgus projectgus linked an issue Mar 25, 2025 that may be closed by this pull request
Copy link
Contributor
@projectgus projectgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @peterharperuk for submitting this. The changes look good to me, and I think we should merge this.

I've tested the USB power consumption of both a PICO W and a PICO 2W with and without this patch, and it does fix the lightsleep and idle issues with RP2350 from what I can see.

Board Version Idle at REPL Busy time.sleep() machine.lightsleep in a loop
RPI_PICO_W master 18mA 18mA 18mA 3mA
this PR 18mA 20mA 18mA 3mA
RPI_PICO2_W master 16mA 16mA 16mA 3mA (wakeup issue)
this PR 16mA 16mA 13mA 3mA

However, I think there's probably still some follow-up improvements we can make:

  • The issue with lwIP wakeups limiting lightsleep time remains, details here. I think the fix for this will be to disable lwIP's timers when there are no active network interfaces. Will pick this up on the relevant issue lightsleep on Pico W regression for 1.24.0 #16181.
  • The consumption when idle at a REPL vs running Python code in a loop vs time.sleep() still seem a bit close to me, surely idle should be a little lower. However I haven't dug into what's happening there.

I also cherry-picked the unit test I added for lightsleep on CPU1 and pushed it here. It passes. I think could either add it to this PR, or I'll submit it as a follow-up.

@ssotheremail
Copy link
ssotheremail commented Apr 6, 2025

Please forgive me if I am wasting your time because of my lack of knowledge of github processes but I would be grateful if someone could answer a few questions:

On my copy of micropython I have done a "git pull" today, compiled micropython (I struggled slightly with picotool but fixed that) and I get a 1.25.0 preview 442 for the 4 types of pico. On the download page the newest version for everything I have looked at is for example: https://micropython.org/resources/firmware/RPI_PICO2-20250327-v1.25.0-preview.428.g50da085d9.uf2 which is dated 27/03/25.
Q: Has the automated build process broken?

I notice that #16454 is milestoned for 1.26.0 (undated) and that 1.25.0 has progressed very rapidly since I last looked (although described as overdue by 3 months) but it doesn't seem to have 16454 included as far as I can tell.
Q: Does anyone have a date when the downloadable automatic development build will have the 16454 fix?

For info:
I ran my test program today with the 442 build I generated. I found no change from the behaviour of 1.24.1.
On Pico2:
lightsleep returns very quickly if preceded by a time.sleep, but other things correct, current fractionally over 3ma.

On the pico_w and pico2_w
lightsleep returns very quickly, unless using safe_sleep (lightsleep in a loop checking that the correct total time has been slept for)
With wireless disabled current approx 4.6ma using safe_sleep.

Q: is this the behaviour I should expect with the 442 build?

Q: do any of the lightsleep PRs currently being considered fix the "lightsleep preceded by time.sleep" issue?

I have just had a thought when writing this and will test it tomorrow. Say I want:
time.sleep_ms(750)
....... (some code)
lightsleep(5000)

would the lightsleep(5000) work properly if I did:
time.sleep_ms(750)
....... (some code)
lightsleep(1) # some value that fails, or provides a known short delay
lightsleep(5000)

If there is a known short delay it could be subtracted form 5000 to give the correct overall delay

Good news, the extra lightsleep trick works! I put an extra lightsleep(5000) in and it came back from from that very quickly and the second lightsleep(5000) worked. I can recalibrate the timing and get on with the projects I have now that I have a workaround for both bugs. I feel a bit silly I didnt think of that before!

@ssotheremail
Copy link

I notice:

https://micropython.org/resources/firmware/RPI_PICO2-20250407-v1.25.0-preview.447.g9e9be83fd.uf2

has now appeared, I will test again tomorrow..

@projectgus
Copy link
Contributor

I notice:

https://micropython.org/resources/firmware/RPI_PICO2-20250407-v1.25.0-preview.447.g9e9be83fd.uf2

has now appeared, I will test again tomorrow..

This nightly build is available, but the nightly builds only contain changes which have been merged to the "master" branch. While a PR is in the Open state, this hasn't happened yet so these changes are not included in that build.

If you want to test these changes then you have to check out this exact PR's code and compile with it. There are multiple ways to do this, including GitHub Desktop (I believe). The way I do it is to install the GitHub CLI tool (separate to git) and use the gh pr checkout command.

@ssotheremail
Copy link

Thanks for your time, clarification, and the info on the gh command, projectgus. I will try that later.

That leaves me with 2 remaining questions from earlier:

I notice that #16454 is milestoned for 1.26.0 (undated) and that 1.25.0 has progressed very rapidly since I last looked (although described as overdue by 3 months) but it doesn't seem to have 16454 included as far as I can tell.
Q: Does anyone have an estimated date when the downloadable automatic development build will have the 16454 fix?

Q: Do any of the several lightsleep PRs currently being considered fix the "lightsleep preceded by time.sleep" issue?

@ssotheremail
Copy link

Projectgus,
In case it might be useful I tried the "gh pr checkout 16454" on my clone and it built to a uf2, seemingly without problems. On trying it on a Pico2, it appeared to fix the "lightsleep after sleep" problem.
I then produced a uf2 for the Pico2_W but that still suffers from the early return problem. I cant tell if it has the "lightsleep after sleep" problem, obviously. I have not experimented with turning off the wireless 1st, yet.

@peterharperuk
Copy link
Contributor Author

Pico2_W but that still suffers from the early return problem

Timers firing in LwIP?

@ssotheremail
Copy link

Probably the Firing in LwIP thing, if I have understood properly. if there is a PR at a state that's worth testing I am willing to give it a go.

@projectgus
Copy link
67E6
Contributor

Probably the Firing in LwIP thing, if I have understood properly. if there is a PR at a state that's worth testing I am willing to give it a go.

No PR for this yet, follow issue #16181 for that one.

Copy link
Member
@dpgeorge dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @peterharperuk for the work on this PR, @projectgus for testing, and others for comments and discussion.

I confirm that all the known pico-sdk bugs for alarm pool have been fixed and merged to pico-sdk 2.1.1, which we currently use here in MicroPython.

I also extensively tested this PR on RPI_PICO, RPI_PICO2 in RISCV mode and RPI_PICO2_W. All the tests pass, there are no regressions.

Power consumption at the REPL, doing time.sleep() and machine.lightsleep() also look good, on RP2350 time.sleep() now reduces power consumption by about 3mA (and as noted above I see that there's probably room for improvement on RP2350 to make the idle REPL use less power as well).

@dpgeorge
Copy link
Member
dpgeorge commented May 1, 2025

I also cherry-picked the unit test I added for lightsleep on CPU1 and pushed it here. It passes. I think could either add it to this PR, or I'll submit it as a follow-up.

Would be good to add this test to this PR. Angus will look into that.

@projectgus
Copy link
Contributor

Would be good to add this test to this PR. Angus will look into that.

Pushed!

@dpgeorge
Copy link
Member
dpgeorge commented May 2, 2025

@projectgus I tested your new test on a RPI_PICO2_W, and it fails. The third test fails:

$ mpremote run tests/ports/rp2/rp2_lightsleep_thread.py
test_cpu0_busy (__main__.LightSleepInThread) ... ok
test_cpu0_sleeping (__main__.LightSleepInThread) ... ok
test_cpu0_also_lightsleep (__main__.LightSleepInThread) ... FAIL

======================================================================
FAIL: test_cpu0_also_lightsleep <class 'LightSleepInThread'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest/__init__.py", line 399, in run_one
  File "<stdin>", line 56, in test_cpu0_also_lightsleep
  File "unittest/__init__.py", line 133, in assertAlmostEqual
AssertionError: 254 != 2500 within 1250 delta

----------------------------------------------------------------------
Ran 3 tests

FAILED (failures=1, errors=0)

Can you reproduce that failure?

Maybe we can just merge this PR without your test, and investigate your test failure as a follow up?

@projectgus
Copy link
Contributor
projectgus commented May 2, 2025

Can you reproduce that failure?

I can't, I just ran the test 15 times in a loop on RPI_PICO2_W and it passes every time. That's with the commit in this branch (a0880ec), I haven't rebased it against master.

Any chance your board might have a boot.py or something on it that's causing some subtle change?

@dpgeorge
Copy link
Member
dpgeorge commented May 2, 2025

The plot thickens:

  • Building this PR as-is without rebasing it on master, the new test can pass. BUT, it prints "F2 not ready" as part of the 3rd unittest, which means the cyw43 is having issues starting up (and that's due to the use of the LED in the test).
  • Making sure network.WLAN().active(1) is run before the tests, and the "F2 not ready" message goes away and the test passes.
  • Building this PR rebased on master, the new test consistently fails, even if WLAN is activated before running the test.

@ssotheremail
Copy link
ssotheremail commented May 2, 2025

My initial comment below was flawed by a code typo. I have removed the affected error report.

FWIW I have been using the "safe_sleep" code below to work around the "lightsleep returns early" problem on Pico W and 2w, when I test with:
MicroPython v1.26.0-preview.51.g00a0cd70f on 2025-05-01; Raspberry Pi Pico 2 W with RP2350

Iin cases its news, the "Lightsleep returns early" still happens on Pico2 W (haven't tried Pico W).
2350 based devices still have the "time.time does not advance", and "time.time forces lightsleep early return" on Pico2 with its preview 51

Safe_sleep(5000)

def safe_sleep(ms: int):
    begin = time.ticks_ms()
    wake = -1
    sleep = ms
    while sleep > 0:
        safe_sleep(sleep) Edit: my mistake will retest with lightsleep(sleep) tomorrow
        wake += 1
        sleep = ms - time.ticks_diff(time.ticks_ms(), begin)
    return


@madozu
Copy link
madozu commented May 2, 2025

@ssotheremail: Replace the line
safe_sleep(sleep)
with
machine.lightsleep(sleep)
and you should be fine. Your code actually never sleeps, but calls itself 5000 times.

@ssotheremail
Copy link

idiot me, I was in a rush and did a global edit in another file replacing lightsleep with safe_sleep, then cut and pasted the def safe_sleep here.... Oh dear. I will retest tomorrow.

@projectgus
Copy link
Contributor
projectgus commented May 8, 2025
  • Building this PR rebased on master, the new test consistently fails, even if WLAN is activated before running the test.

If I revert commit 23fb171 and then 6fa498c then the new test passes again... Reverting 6fa498c in particular is the difference between pass/fail. Still looking at it...

@projectgus projectgus force-pushed the alarm_pool_sleep_changes branch 2 times, most recently from 592ede7 to fd2f3cf Compare May 9, 2025 06:15
@projectgus
Copy link
Contributor
projectgus commented May 9, 2025

EDIT: Important context: when looking at the test failure I realised it was only the led.toggle() lines that were preventing the failing test case from hanging MicroPython permanently on wireless boards where the LED is part of the CYW43.

After discussing with @dpgeorge I was going to check if the hanging test case is a regression from master. Turns out it is, the test doesn't get correct timing on master but it doesn't hang indefinitely.

I've pushed another commit here to enforce mutual exclusion for calling into lightsleep(). It seems otherwise if both threads go into light sleep then there's a race during alarm cleanup that hangs both CPUs indefinitely. Coming up with ideal specified behaviour for lightsleep() across two CPUs is hard, but preventing this situation from causing a hang is relatively easy.

This allowed cleaning up the test so it now passes on RPI_PICO and RPI_PICO2 boards. The wireless boards still fail, but that's because of #16181 so should be fixed when that is fixed.

peterharperuk and others added 5 commits May 12, 2025 16:24
Stop using soft timer for `mp_wfe_or_timeout`.  Now uses the alarm pool
again as issues with this code have been fixed.  This resolves the "sev"
issue that stops the RP2350 going idle.

Also, change the lightsleep code to use the hardware timer library and
alarm 1, as alarm 2 is used by and soft timers and alarm 3 is used by the
alarm pool.

Signed-off-by: Peter Harper <peter.harper@raspberrypi.com>
Add some debug code that can be enabled to determine why lightsleep is
returning early.

Signed-off-by: Peter Harper <peter.harper@raspberrypi.com>
This reverts commit b42bb91.

Signed-off-by: Peter Harper <peter.harper@raspberrypi.com>
Not currently passing.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
There's no specified behaviour for what should happen if both CPUs call
`lightsleep()` together, but the latest changes could cause a permanent
hang due to a race in the timer cleanup code.  Add a flag to prevent hangs
if two threads accidentally lightsleep, at least.

This allows the new lightsleep test to pass on RPI_PICO and RPI_PICO2, and
even have much tighter time deltas.  However, the test still fails on
wireless boards where the lwIP tick wakes them up too frequently.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Copy link
Member
@dpgeorge dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated test and mutual exclusion for machine.lightsleep() look good!

Tested on RPI_PICO, RPI_PICO2 and RPI_PICO2 in RISCV mode. Everything seems to work.

@dpgeorge dpgeorge force-pushed the alarm_pool_sleep_changes branch from fd2f3cf to 69993da Compare May 12, 2025 07:11
@dpgeorge dpgeorge merged commit 69993da into micropython:master May 12, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On RP2350 time.sleep_ms is a busy wait loop
5 participants
0