Alarm pool stops working under certain conditions #1552
Comments
Possibly the same bug as #1500.
Here is schematically how I think it should be fixed:
if (id == ph_insert_node(pool->heap, id)) {
    bool is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
    while (is_missed && !create_if_past) {
        ph_remove_and_free_node(pool->heap, id);
        if (no more nodes) break;
        time = get_entry(pool, pool->heap->root_id)->target;
        is_missed = hardware_alarm_set_target(pool->hardware_alarm_num, time);
    }
    if (missed) *missed = is_missed;
}
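The effect of re-arming the timer after a missed insert can be checked on a toy model. This is only a sketch under stated assumptions, not pico-sdk code: every model_* name is hypothetical, a small array stands in for the pairing heap, and model_set_target() merely imitates the behaviour attributed to hardware_alarm_set_target() in this thread (record the target, report whether it is already in the past). The loop in the schematic is simplified to a single re-arm, so the model does not cover the case where the restored head is itself already in the past.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every model_* name is hypothetical, not a pico-sdk API. */
#define MODEL_MAX_ALARMS 8

typedef struct {
    uint64_t targets[MODEL_MAX_ALARMS]; /* stands in for the pairing heap */
    size_t count;                       /* number of pending alarms */
    uint64_t now;                       /* simulated current time (us) */
    uint64_t hw_target;                 /* simulated hardware timer target */
} model_pool_t;

/* Earliest pending target; the caller must ensure count > 0. */
uint64_t model_head(const model_pool_t *p) {
    uint64_t best = p->targets[0];
    for (size_t i = 1; i < p->count; i++) {
        if (p->targets[i] < best) best = p->targets[i];
    }
    return best;
}

/* Records the target and returns true if it is already in the past. */
bool model_set_target(model_pool_t *p, uint64_t t) {
    p->hw_target = t;
    return t <= p->now;
}

/* Add an alarm; on a missed new head, drop it and re-arm to the remaining head.
 * Returns true if the new alarm was missed. */
bool model_add_rearm(model_pool_t *p, uint64_t t, bool create_if_past) {
    if (p->count == MODEL_MAX_ALARMS) return false; /* full: model ignores the request */
    bool is_new_head = p->count == 0 || t < model_head(p);
    p->targets[p->count++] = t;
    if (!is_new_head) return false; /* head unchanged, timer already correct */
    bool is_missed = model_set_target(p, t);
    if (is_missed && !create_if_past) {
        p->count--; /* drop the node that was just appended */
        if (p->count > 0) {
            /* Single re-arm (simplification of the proposed loop). */
            model_set_target(p, model_head(p));
        }
    }
    return is_missed;
}
```

With now = 100 and one alarm pending at 200, adding an alarm at 50 with create_if_past false reports a miss, and hw_target ends up back at 200, matching the remaining head.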
I can reproduce this with the second linked MicroPython example on a Pico. Assigned to graham for his thoughts.
Note that Pico and Pico W differ here: MicroPython running on a Pico W has an alarm pool timer running periodically every 64 ms (for lwIP polling). That's why the first test in the linked MicroPython issue only shows the bug on a Pico W.
We believe this is fixed.
In MicroPython we have a reported issue of sleep_us causing the Pico W to hang, see micropython/micropython#12873.
Debugging this issue is difficult because it's very sensitive to the timing and ordering of events, and adding debugging prints changes the behaviour. It looks like the CPU is hanging forever on the WFE in best_effort_wfe_or_timeout(), and the reason for this hang is that the hardware timer target associated with the pairing heap for the default alarm pool is set to a value in the past. The pairing heap is correct, the root node has a correct time in the future, but the hardware timer target is in the past.
I suspect that the pairing heap and its associated hardware timer have at some point got out of sync. And it's possibly in add_alarm_under_lock() that this happens (constructing an invariant for the pairing heap and its timer, that invariant is broken after a call to this function).
In the following part of add_alarm_under_lock():
If is_missed is true and the node is removed, and that node was the head, then I think the hardware timer target needs to be set to the time associated with the new head of the pairing heap (i.e. the bug is that the timer target is not being updated to reflect the state of the pairing heap in the case is_missed is true).
Sorry that I can't give a reproduction in pure C, but the MicroPython code in this post does reproduce the issue: micropython/micropython#12873 (comment)
Let me know if I can help with giving more details on the bug.
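The suspected desynchronisation can be illustrated with a toy model. This is a sketch, not pico-sdk code and not a reproduction on hardware: every model_* name is hypothetical, a small array stands in for the pairing heap, and model_set_target() only imitates the behaviour described above for the hardware timer (record the target, report whether it is already in the past). model_add() follows the sequence described in this report: insert, set the target if the new node is the head, and remove the node again if it was missed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every model_* name is hypothetical, not a pico-sdk API. */
#define MODEL_MAX_ALARMS 8

typedef struct {
    uint64_t targets[MODEL_MAX_ALARMS]; /* stands in for the pairing heap */
    size_t count;                       /* number of pending alarms */
    uint64_t now;                       /* simulated current time (us) */
    uint64_t hw_target;                 /* simulated hardware timer target */
} model_pool_t;

/* Earliest pending target; the caller must ensure count > 0. */
uint64_t model_head(const model_pool_t *p) {
    uint64_t best = p->targets[0];
    for (size_t i = 1; i < p->count; i++) {
        if (p->targets[i] < best) best = p->targets[i];
    }
    return best;
}

/* Records the target and returns true if it is already in the past. */
bool model_set_target(model_pool_t *p, uint64_t t) {
    p->hw_target = t;
    return t <= p->now;
}

/* Invariant: while alarms are pending, the timer target equals the heap head. */
bool model_in_sync(const model_pool_t *p) {
    return p->count == 0 || p->hw_target == model_head(p);
}

/* Add sequence as described in the report; returns true if the alarm was missed. */
bool model_add(model_pool_t *p, uint64_t t, bool create_if_past) {
    if (p->count == MODEL_MAX_ALARMS) return false; /* full: model ignores the request */
    bool is_new_head = p->count == 0 || t < model_head(p);
    p->targets[p->count++] = t;
    if (!is_new_head) return false; /* head unchanged, timer already correct */
    bool is_missed = model_set_target(p, t);
    if (is_missed && !create_if_past) {
        /* Drop the node just appended; the timer target is left untouched,
         * so it still holds the missed time rather than the head's time. */
        p->count--;
    }
    return is_missed;
}
```

With now = 100 and one alarm pending at 200, adding an alarm at 50 with create_if_past false leaves the pool holding only the 200 alarm while hw_target is 50, so model_in_sync() returns false.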