-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
mpremote cannot cp files to stm32wb55 nucleo on stlink/com port. #8386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If I slow down the transfer with some diagnostic logging, it works on slightly larger files (575 byte mpconfigboard.mk)
def raw_paste_write(self, command_bytes):
# Read initial header, with window size.
print("command_bytes", command_bytes)
data = self.serial.read(2)
window_size = data[0] | data[1] << 8
window_remain = window_size
# Write out the command_bytes data.
i = 0
while i < len(command_bytes):
while window_remain == 0 or self.serial.inWaiting():
print("window_remain", window_remain)
data = self.serial.read(1)
if data == b"\x01":
# Device indicated that a new window of data can be sent.
window_remain += window_size
elif data == b"\x04":
# Device indicated abrupt end. Acknowledge it and finish.
self.serial.write(b"\x04")
return
#elif not data:
# time.sleep(0.02)
# continue
else:
# Unexpected data from device.
raise PyboardError("unexpected read during raw paste: {}".format(data))
# Send out as much data as possible that fits within the allowed window.
b = command_bytes[i : min(i + window_remain, len(command_bytes))]
written = self.serial.write(b)
window_remain -= written
i += written
print("written", len(b), written)
This still fails on a larger file (25KB)
I'm unsure about what's going on with the device end of the communication back and forth to know what's supposed to be happening here that isn't. I presume it's something to do with once a window worth of data is sent the device should grab it, store it, then respond with a |
I can confirm the issue exists when running |
edited
Uh oh!
There was an error while loading. Please reload this page.
Ah yes that would make sense! maybe writing to flash in smaller chunks would work to be fast enough to continue to service the uart hardware interrupt. or perhaps some more synchroising between the flash semaphore and the commands being sent back to mpremote to ensure it waits long enough before sending more data |
Well I can confirm that #8452 does not fix this issue on its own. Also confirmed - yes it's a UART overrun. I added an in-code breakpoint on the overrun flag and yep it hit immediatly when the These overrun situations are currently not handled / reported at all, it ends up in an infinite lock-up when it happens during a I haven't yet got proof of how/why it's overrunning here, though flash erase/write would make sense. That being said, at the time when data is being read in on stdin here, there shouldn't neccesarily be background write operations - these should be synchronous in between the chunks of stdin reading. edit: scratch that last paragraph... I just looked at the other side of the window I just snipped the above image from... |
According to en.STM32WB-Memory-Flash-FLASH.pdf a flash erase operation takes 22ms. If 115200 baud uart has a real world spee of ~11520 bytes/s then yeah, over 250 bytes would be missed during an erase. The particual erase in question in the failures above is the background irq cache erase:
This backround operation is routinely blocked along with USB interrupt activity for a number of other flash operations. What we really need here is to block this interrupt for the duration of uart stdin read. However the function performing a stdin.read() doesn't know if it's a uart, usb or any other dupterm source. By the time it gets to Perhaps flash and usb should be set to a slightly different IRQ level so just the flash task can be blocked here and not interrupt usb, then block both in the other places where they're currently both blocked. Or perhaps a soft flag should be used to just prevent this background flash operation rather than using |
Uart repl stdin loses data when the background flash erase / cache flush happens during communication. This change forces all erase / write operations to happen during file sync/close instead. This change can be overridden by defining MICROPY_HW_FLASH_BACKGROUND_CACHE (1) in board config. For more details see micropython#8386
Uart repl stdin loses data when the background flash erase / cache flush happens during communication. This change forces all erase / write operations to happen during file sync/close instead. This change can be overridden in board config etc. with: #define MICROPY_HW_FLASH_BACKGROUND_CACHE (1) For more details see micropython#8386
I guess the reality is that any/all uart communication is unsafe in general, but particularly with internal flash opertions. This could be resolved with HW flow control, but only if the hardware in question has them wired up (the nucleo WB55 does not) The PR #8645 does resolve ths issue with mpremote / pyboard, allowing cp operations to work successfully. Alternatively, we could fix it by moving all interrupt handlers into ram, and making sure none of the interrupt handers need to directly touch flash (including not using const variables), and then things like uart buffering would continue to work while flash is locked. This is obviously a much larger change however and comes at a ram cost. It could still be easily accidentally broken in future by adding any line to an irq handler that touches flash incidentally. |
I think the issue is simply that when internal flash erases/writes are going on, the entire MCU is effectively blocked from executing, because the flash cannot be read while it's being erased/written to. This affects mpremote in the following way / under the following conditions:
As done in #8645, it's probably enough to disable the async flushing of the flashbdev cache. At least this will get mpremote working because then flash erases/writes only occur once mpremote finishes sending a command and while it's waiting for the response. Alternatively, as mentioned above, a better fix would be to put the UART IRQ handler in RAM so it can continue to execute while flash is being erased/written. I don't think this would be too difficult, but it needs:
I feel like putting the UART IRQ handler in RAM is a better solution. It would only need to be enabled if both |
I can confirm the issue exists on a NUCLEO-F413ZH, with REPL over UART. So it's not related to the WB55 specifically. |
True, I did think I should take WB55 out of the title, it could affect any chip/port with blocking flash and non-flow-control uart really. While I do think the IRQs in ram is a great solution (possibly with some slight performance improvements from not needing to execute from flash), I think it might require most if not all IRQ's to be migrated into ram as once a flash operation starts and the cpu can no longer access flash, any and all IRQ's that could possibly run at this point could block the cpu the instant one of them calls a function (or read a const variable) that needs to run from flash. This would likely block any other irq's from running? Actually I guess I don't know if this would be blocking... if the uart irq was higher priority preemption probably should work to run the uart irq and leave the flash-blocked one as is? Any higher prio irq's (systick) would need migrating to ram to ensure they don't block everything, but perhaps that's all. Interesting question about |
Yes that's a good point, I'm not sure if higher priority IRQs can run if a lower priority one is blocked on a flash access (because it may block the internal instruction bus). |
I tried moving the UART IRQ handler to RAM (along with other relevant things, including SysTick_Handler), but it did not work. |
I'm keen to have a go at this myself... even if it can work it does feel like it'll be easy to break though, just accidentally add a function call and/or variable in flash and it's all blocked. If the problem is a flash based irq hapenning first, they could always be masked out before flash operations to only allow uart/systick (on cores that support this at least) |
Here is my attempt at putting the UART IRQ handler (and other things) in RAM: https://github.com/dpgeorge/micropython/tree/stm32-uart-irq-handler-in-ram . It may need to have other .ld files changed to work on NUCLEO_WB55. |
Thanks heaps for that, I had started an attempt but in the debugger it was just running the wrong code all over the place. I'll definitely take a look at your branch and try to make it all work. |
@dpgeorge building on top of your branch I've got this working, see https://github.com/dpgeorge/micropython/compare/stm32-uart-irq-handler-in-ram...andrewleech:stm32-uart-irq-handler-in-ram?expand=1 The key is that the function that starts the flash process & anything that runs in the "while flash not finished" loop also needs to be in ram. The commit I added in the link above gets the |
Nice find! But I think this is pointing to a big limitation: that any code that runs from flash while the flash is stalled, will stall the entire instruction bus and even IRQs in RAM can no longer execute. Eg if an IRQ is raised whose handler is in flash, then that IRQ will stall the instruction bus and other (higher priority) IRQs that are in RAM will no longer be able to execute. You could test this by configuring a pyb.Timer with a callback and a frequency of around 1kHz. If enabling that breaks everything then... In the end I think putting IRQs in RAM is a lot of work for only marginal gain, and it only works in very certain circumstances. So probably just turning off the background flushing of flash is indeed the best solution. |
Yes it does feel like it's going to be rather fragile. Other irq's on most cores could be masked out during flash operation to prevent them getting in the way. Even with that in place though, it's really hard to know if it's going to keep working in future, and it's probably going to be impossible to unit test for. |
As an example (not neccesarily intended to merge) I wrapped up the isr/uart/flash from ram changes into a set of board-only changes - these could be used on a per-board / per-project basis if needed rather than "polluting" the main micropython codebase with these verbose / difficult to configure / fragile changes. |
Uart repl stdin loses data when the background flash erase / cache flush happens during communication. This change forces all erase / write operations to happen during file sync/close instead. This change can be overridden in board config etc. with: #define MICROPY_HW_FLASH_BACKGROUND_CACHE (1) For more details see micropython#8386
Add root cert DST Root CA X3 for Let's Encrypt
Allows mpremote file transfer to work correctly when mpremote is used over the ST-link USB/UART REPL port. Fixes issue micropython#8386. Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Allows mpremote file transfer to work correctly when mpremote is used over the ST-link USB/UART REPL port. Fixes issue micropython#8386. Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
When connecting to the nucleo stm32wb55 repl on the stlink (usb <-> uart) port file transfers with mpremote above a certain (small) size always fail.
In this case, transferring a ~23K file hangs for about 15 seconds then fails with the below error.
It fails in the same way after a similar timeout every time.
The same file transfer works perfectly on the direct usb port.
Transferring a 259 byte board.json works.
Transferring a 575 byte mpconfigboard.mk fails.
Note this is on Windows so may be related to #8358 - I will test shortly if this PR resolves (hides) the issue.
The text was updated successfully, but these errors were encountered: