8000 Issues with `best_effort_wfe_or_timeout` and WFE in SDK 2.0.0 · Issue #1812 · raspberrypi/pico-sdk · GitHub
[go: up one dir, main page]

Skip to content

Issues with best_effort_wfe_or_timeout and WFE in SDK 2.0.0 #1812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dpgeorge opened this issue Aug 13, 2024 · 8 comments
Closed

Issues with best_effort_wfe_or_timeout and WFE in SDK 2.0.0 #1812

dpgeorge opened this issue Aug 13, 2024 · 8 comments
Assignees
Milestone

Comments

@dpgeorge
Copy link

Background

We are trying to achieve good idle power consumption on RP2350 in MicroPython. Idle here means sitting at the REPL (with either UART or USB serial) doing nothing. In this case the CPU should be able to gate its clock via WFI/WFE to reduce power consumption.

Eg on a standard Pico with RP2040, connected to a USB port and a terminal port program open, the Pico draws about 15.2mA (at 5V). Then running a simple while-True busy loop that increases to about 18mA. This is expected behaviour.

Summary of problems

Things seem to have changed in SDK 2.0.0 (the tag 2.0.0). There are a few issues that seem to be related:

  • best_effort_wfe_or_timeout() now seems to return immediately, on both RP2040 and RP2350
  • best_effort_wfe_or_timeout() calls __sev(); __wfe() to clear any existing event, which potentially misses events
  • on RP2350 spin locks are implemented in software using ldaexb and strexb exclusive-access instructions, and these seem to set the event flag (equivalent to __sev()), meaning that spin_lock_blocking sets the event
  • due to the above, functions like hardware_alarm_set_target set the event flag, making these functions unusable for making a timer to wake from WFE

Details

best_effort_wfe_or_timeout() now seems to return immediately

Running the following on RP2040:

absolute_time_t timeout_time = make_timeout_time_us(1000000);
while (!best_effort_wfe_or_timeout(timeout_time)) {
}

With pico-sdk 1.5.1 that will consume about 15.2mA on a Pico board. With pico-sdk 2.0.0 that code consumes about 18mA on a Pico board.

Timing how long the best_effort_wfe_or_timeout function lasts, on pico-sdk 2.0.0 it always returns pretty much immediately (eg within 5us), so the above loop is effectively a busy loop, hence the higher power consumption.

This looks like a regression with pico-sdk 2.0.0 on RP2040.

best_effort_wfe_or_timeout() calls __sev(); __wfe()

Inspecting the code for best_effort_wfe_or_timeout in pico-sdk 2.0.0, there's a new bit:

            // the above alarm add now may force an IRQ which will wake us up,
            // so we want to consume one __wfe.. we do an explicit __sev
            // just to make sure there is one
            __sev(); // make sure there is an event sow ee don't block 
            __wfe();                      
            if (!time_reached(timeout_timestamp))
            {                                    
                // ^ at the point above the timer hadn't fired, so it is safe
                // to wait; the event will happen due to IRQ at some point between
                // then and the correct wakeup time
                __wfe();                                                                  
            }

This sev/wfe pair will clear any existing event flag. But what if that event was from a user interrupt, and was the event the user was waiting for? Eg:

void my_irq_handler(void) {
    my_event_flag = true;
    __sev();
}

int main(void) {
    ...
    for (;;) {
        if (my_event_flag) {
            break;
        }

        // <-- IRQ fires here and calls my_irq_handler

        best_effort_wfe_or_timeout(make_timeout_time_us(10000));
    }
    ...
}

In principle (I don't have code to show this behaviour) the __sev() from the my_irq_handler will be cleared by the best_effort_wfe_or_timeout and that latter function will wait the entire 10ms if no other IRQs come in.

ldaexb and strexb set the event flag

According to the datasheet for RP2350:

Processors also receive an event signal from the global monitor if their
reservation is lost due to a write by a different master, in accordance with
Armv8-M architecture requirements.

From my testing it seems that executing a pair of ldaexb and strexb instructions on RP2350 does indeed do an effective __sev(). This means spin_lock_blocking() sets the event flag, and hence any function that calls this.

It would be great to instead use the hardware spin locks on RP2350. According to the errata it's still possible to use some of them, those which don't have aliases for writable registers.

hardware_alarm_set_target set the event flag

In MicroPython we implement low power idle by setting up a callback using hardware_alarm_set_target(<id>, <timeout>) and then execute __wfe() to either wait for an event or the timeout. But this does not work on RP2350 because hardware_alarm_set_target() sets the event flag, meaning that the subsequent __wfe() wakes immediately.

It's unclear how to use __wfe() effectively in the pico-sdk because of this issue of the event flag being set in many locations.

Final thoughts

Ideally we'd be able to use __wfe() to implement low-power idle on RP2350. If anyone has any pointers on how to do this that would be much appreciated.

Regardless, I think there are a few bugs with best_effort_wfe_or_timeout as mentioned above.

@dpgeorge
Copy link
Author

Regarding the ldaexb/strexb pair setting the processor event flag: I tested these instructions (the exact SW_SPIN_LOCK_LOCK code from this repo) on:

  • a different Cortex-M33 MCU (an STM32H5xx)
  • a multi-core Cortex-M55 (an Alif E7).

On both these MCUs the ldaexb/strexb pair do not set the processor event flag on the processor executing these instructions.

So it seems to be a quirk of the RP2350 -- which has the processor event flags cross-wired between the CPUs -- that the ldaexb/strexb pair set the event flag on the current processor.

@Wren6991
Copy link
Contributor

Speaking just to the monitor behaviour:

Regarding the ldaexb/strexb pair setting the processor event flag: I tested these instructions (the exact SW_SPIN_LOCK_LOCK code from this repo) on:

* a different Cortex-M33 MCU (an STM32H5xx)

* a multi-core Cortex-M55 (an Alif E7).

On both these MCUs the ldaexb/strexb pair do not set the processor event flag on the processor executing these instructions.

This is specified behaviour on Armv8-M, see DDI0553B.y section B9.3.1 (page 275):

image

We implemented this (useless) behaviour because it is strictly required by the spec. I haven't looked into the intricacies of the global monitor implementations on the systems you mentioned. It's possible this is their bug, and they missed that part of the spec. It's also possible they don't retire reservations on a PE's exclusive store to its own reservation, which is implementation-defined as per B9.3.2:

image

We do take this arc, as per 2.1.6.1 "Implementation-defined Monitor Behaviour" in our datasheet:

image

I think the intended use here is to leave a reservation open and get pinged when someone writes to it, but it causes unnecessary events on RMW.

kilograham added a commit that referenced this issue Sep 30, 2024
* #1812 improvements to best_effort_wfe_or_timeout

* Fix best_effort_wfe_or_timeout further by not having the IRQ ever move the alarm target backwards
@kilograham
Copy link
Contributor

fixed in develop

@dpgeorge
Copy link
Author
dpgeorge commented Oct 1, 2024

@Wren6991 thanks for the details regarding the silicon implementation of the RP2350.

It's also possible they don't retire reservations on a PE's exclusive store to its own reservation

Yes, it seems to be that this is the difference between the other MCUs I tested and the RP2350. In particular it looks like:

StoreExcl(Marked_address, n)*

is the key difference, that the RP2350 will go from exclusive to open access (and hence generate an event) when the PE stores back to the marked address using strexb in the spin lock code (the n meaning that the same PE doing the ldaexb generates the event).


Anyway, thanks for fixing the issues with best_effort_wfe_or_timeout()!

Although it seems that hardware_alarm_set_target() still sets the event flag (it uses spin locks) and so is still going to cause issues for MicroPython, we still won't be able to go into a CPU-gated mode.

@kilograham
Copy link
Contributor

Although it seems that hardware_alarm_set_target() still sets the event flag (it uses spin locks) and so is still going to cause issues for MicroPython, we still won't be able to go into a CPU-gated mode.

The suggestion is to use the alarm pool code which hopefully now works correctly in all cases

@dpgeorge
Copy link
Author
dpgeorge commented Oct 1, 2024

The suggestion is to use the alarm pool code which hopefully now works correctly in all cases

Yes, we will do that, move back to using the alarm pool code. But we will need a pico-sdk release for that 😄 do you know when that might be?

@peterharperuk
Copy link
Contributor
peterharperuk commented Oct 1, 2024

The implementation of mp_wfe_or_timeout always sets an alarm (actually it mostly modifies an existing alarm). If we avoid doing that we can avoid generating a sev? I was going to suggest a temporary change like this - but it's not fully tested yet peterharperuk/micropython@9ed4a1d

(inserting a soft timer calls pendsv_suspend which uses a recursive mutex which also uses spinlocks).

@kilograham
Copy link
Contributor

The suggestion is to use the alarm pool code which hopefully now works correctly in all cases

Yes, we will do that, move back to using the alarm pool code. But we will need a pico-sdk release for that 😄 do you know when that might be?

some time this month hopefully

peterharperuk added a commit to peterharperuk/pico-sdk that referenced this issue Oct 7, 2024
best_effort_wfe_or_timeout should not ignore an outstanding sev
See raspberrypi#1812
peterharperuk added a commit that referenced this issue Nov 5, 2024
* Test for best_effort_wfe_or_timeout sev issue

best_effort_wfe_or_timeout should not ignore an outstanding sev
See #1812

* Test for alarm being set in the past issue

See #1953
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
0