Alarm pool stops working under certain conditions · Issue #1552 · raspberrypi/pico-sdk · GitHub

Alarm pool stops working under certain conditions #1552

Closed
dpgeorge opened this issue Nov 20, 2023 · 5 comments

Comments

@dpgeorge

In MicroPython we have a reported issue of sleep_us causing the Pico W to hang; see micropython/micropython#12873.

Debugging this issue is difficult because it is very sensitive to the timing and ordering of events, and adding debugging prints changes the behaviour. It looks like the CPU hangs forever on the WFE in best_effort_wfe_or_timeout(), because the hardware timer target associated with the pairing heap of the default alarm pool is set to a value in the past. The pairing heap itself is correct (its root node has a valid time in the future), but the hardware timer target is in the past.

I suspect that the pairing heap and its associated hardware timer have at some point got out of sync, possibly in add_alarm_under_lock(). If you write down an invariant relating the pairing heap and its timer (the timer target should be the time of the heap's root node), that invariant is broken after a call to this function.

In the following part of add_alarm_under_lock():

          if (id == ph_insert_node(pool->heap, id)) {
              bool is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
              if (is_missed && !create_if_past) {
                  ph_remove_and_free_node(pool->heap, id);
              }
              if (missed) *missed = is_missed;
          }   

If is_missed is true and the node is removed, and that node was the head (which is the case inside this branch, since it is only entered when ph_insert_node() returns id as the new root), then I think the hardware timer target needs to be set to the time of the new head of the pairing heap. In other words, the bug is that the timer target is not updated to reflect the state of the pairing heap when is_missed is true.

Sorry I can't give a reproduction in pure C, but the MicroPython code in this post does reproduce the issue: micropython/micropython#12873 (comment)

Let me know if I can help with giving more details on the bug.

@dpgeorge
Author

Possibly the same bug as #1500.

@dpgeorge
Author

Here is schematically how I think it should be fixed:

          if (id == ph_insert_node(pool->heap, id)) {
              bool is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
              while (is_missed && !create_if_past) {
                  ph_remove_and_free_node(pool->heap, id);
                  if (no more nodes) break;
                  time = get_entry(pool, pool->heap->root_id)->target;
                  is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
              }
              if (missed) *missed = is_missed;
          }

@peterharperuk
Contributor

I can reproduce this on a Pico with the second linked MicroPython example. Assigned to Graham for his thoughts.

@dpgeorge
Author

Note that there is a difference between Pico and Pico W: on a Pico W, MicroPython runs an alarm pool timer with a 64 ms period (for lwIP polling). That is why the first test in the linked MicroPython issue shows the bug only on a Pico W.

@kilograham
Contributor

We believe this is fixed.

3 participants