Alarm pool stops working under certain conditions #1552

dpgeorge · 2023-11-20T02:35:24Z

In MicroPython we have a reported issue of sleep_us causing the Pico W to hang, see micropython/micropython#12873

Debugging this issue is difficult because it is very sensitive to the timing and ordering of events, and adding debugging prints changes the behaviour. It looks like the CPU hangs forever on the WFE in best_effort_wfe_or_timeout() because the hardware timer target associated with the pairing heap for the default alarm pool is set to a value in the past. The pairing heap itself is correct (the root node has a correct time in the future), but the hardware timer target is in the past.

I suspect that the pairing heap and its associated hardware timer have at some point got out of sync, possibly in add_alarm_under_lock(): if you construct an invariant relating the pairing heap and its timer, that invariant is broken after a call to this function.

In the following part of add_alarm_under_lock():

          if (id == ph_insert_node(pool->heap, id)) {
              bool is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
              if (is_missed && !create_if_past) {
                  ph_remove_and_free_node(pool->heap, id);
              }
              if (missed) *missed = is_missed;
          }

If is_missed is true and the node is removed, and that node was the head, then I think the hardware timer target needs to be set to the time associated with the new head of the pairing heap. In other words, the bug is that the timer target is not updated to reflect the state of the pairing heap when is_missed is true.

Sorry that I can't give a reproduction in pure C, but the MicroPython code in this post does reproduce the issue: micropython/micropython#12873 (comment)

Let me know if I can help with giving more details on the bug.

dpgeorge · 2023-11-20T02:35:40Z

Possibly the same bug as #1500.

dpgeorge · 2023-11-20T05:52:32Z

Here is schematically how I think it should be fixed:

          if (id == ph_insert_node(pool->heap, id)) {
              bool is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
              while (is_missed && !create_if_past) {
                  ph_remove_and_free_node(pool->heap, id);
                  if (no more nodes) break;
                  time = get_entry(pool, pool->heap->root_id)->target;
                  is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
              }
              if (missed) *missed = is_missed;
          }
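To make the suspected desync concrete, here is a minimal host-side sketch in C. It is a toy model and not pico-sdk code: toy_pool_t, toy_hw_set_target() and the other toy_* names are hypothetical, a sorted array stands in for the pairing heap, and the model assumes that setting a target stores it even when the call reports it as missed. The resync flag applies a single re-sync to the new head, which is simpler than the loop in the schematic above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ALARMS 8

/* Hypothetical stand-in for an alarm pool plus its hardware timer. */
typedef struct {
    uint64_t targets[TOY_MAX_ALARMS]; /* pending targets, ascending; [0] is the head */
    int count;
    uint64_t now;       /* simulated current time */
    uint64_t hw_target; /* simulated hardware timer target */
} toy_pool_t;

/* Assumption: the target is stored even if it is already in the past.
 * Returns true when the target was missed. */
static bool toy_hw_set_target(toy_pool_t *p, uint64_t t) {
    p->hw_target = t;
    return t <= p->now;
}

/* Sorted insert; returns the index the new target ended up at. */
static int toy_insert(toy_pool_t *p, uint64_t t) {
    assert(p->count < TOY_MAX_ALARMS);
    int i = p->count++;
    while (i > 0 && p->targets[i - 1] > t) {
        p->targets[i] = p->targets[i - 1];
        i--;
    }
    p->targets[i] = t;
    return i;
}

static void toy_remove_head(toy_pool_t *p) {
    assert(p->count > 0);
    for (int i = 1; i < p->count; i++) {
        p->targets[i - 1] = p->targets[i];
    }
    p->count--;
}

/* Mirrors the shape of the snippet quoted in the issue. With resync set,
 * the timer is re-pointed at the new head after a missed node is removed. */
static bool toy_add_alarm(toy_pool_t *p, uint64_t t, bool create_if_past, bool resync) {
    if (toy_insert(p, t) != 0) {
        return false; /* not the new head: timer left untouched */
    }
    bool is_missed = toy_hw_set_target(p, t);
    if (is_missed && !create_if_past) {
        toy_remove_head(p);
        if (resync && p->count > 0) {
            toy_hw_set_target(p, p->targets[0]);
        }
    }
    return is_missed;
}

/* The invariant under discussion: the timer target matches the head. */
static bool toy_in_sync(const toy_pool_t *p) {
    return p->count == 0 || p->hw_target == p->targets[0];
}

/* One future alarm, then one alarm that is already missed when added.
 * Returns whether the timer and the head agree afterwards. */
static bool toy_scenario(bool resync) {
    toy_pool_t p = { .count = 0, .now = 100, .hw_target = 0 };
    toy_add_alarm(&p, 164, false, resync); /* becomes head, timer set to 164 */
    toy_add_alarm(&p, 90, false, resync);  /* becomes head, missed, removed */
    return toy_in_sync(&p);
}
```

In this model toy_scenario(false) leaves the timer target at 90 while the head is 164, which matches the state described in the first post; toy_scenario(true) restores the invariant. On real hardware the second alarm would typically be in the future when inserted and only become missed by the time the target is written, which the fixed simulated clock does not capture.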

peterharperuk · 2023-11-20T11:35:34Z

I can reproduce this with the second linked MP example on a Pico. Assigned to graham for his thoughts.

dpgeorge · 2023-11-20T11:57:23Z

Note that Pico and Pico W have a difference, in that MicroPython running on a Pico W has an alarm pool timer running at a periodic 64ms (for lwIP polling). So that's why the first test in the linked MicroPython issue only shows the bug on a Pico W.

kilograham · 2024-08-09T19:52:53Z

We believe this is fixed.

dpgeorge mentioned this issue Nov 20, 2023

Pico W: sleep_us causes execution to hang micropython/micropython#12873

Closed

dpgeorge mentioned this issue Nov 20, 2023

Possible race condition in add_alarm_in_us/sleep_us, incorrect alarm period using LTO #1500

Closed

peterharperuk assigned kilograham Nov 20, 2023

peterharperuk mentioned this issue Nov 20, 2023

repeating timer stop fire when GPIO-IRQ is in use #1531

Closed

projectgus mentioned this issue Jan 3, 2024

rp2: Fix hang triggered by timing of short sleeps and soft timer events micropython/micropython#13329

Closed

cshaa mentioned this issue May 16, 2024

Execution hangs on sleep_us(1) on Pico W ArmDeveloperEcosystem/st7789-library-for-pico#5

Open

kilograham added this to the 1.6.0 milestone May 19, 2024

kilograham added the pico_time label May 19, 2024

kilograham modified the milestones: 1.6.1, 1.6.0 May 19, 2024

kilograham added the review2 label May 19, 2024

kilograham modified the milestones: 1.6.0, 1.6.1 Jul 7, 2024

kilograham removed the review2 label Jul 7, 2024

kilograham modified the milestones: 1.6.1, 1.6.2 Jul 20, 2024

kilograham modified the milestones: 1.6.2, 2.0.0 Aug 8, 2024

kilograham closed this as completed Aug 9, 2024

peterharperuk mentioned this issue Dec 19, 2024

Alarm pool sleep changes micropython/micropython#16454

Merged
