10000 mpremote cannot cp files to stm32wb55 nucleo on stlink/com port. · Issue #8386 · micropython/micropython · GitHub
[go: up one dir, main page]

Skip to content

mpremote cannot cp files to stm32wb55 nucleo on stlink/com port. #8386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andrewleech opened this issue Mar 7, 2022 · 18 comments · Fixed by #9054
Closed

mpremote cannot cp files to stm32wb55 nucleo on stlink/com port. #8386

andrewleech opened this issue Mar 7, 2022 · 18 comments · Fixed by #9054

Comments

@andrewleech
Copy link
Contributor

When connecting to the nucleo stm32wb55 repl on the stlink (usb <-> uart) port file transfers with mpremote above a certain (small) size always fail.

In this case, transferring a ~23K file hangs for about 15 seconds then fails with the below error.

micropython\ports\stm32\boards\NUCLEO_WB55>py ..\..\..\..\tools\mpremote\mpremote.py cp rfcore_firmware.py :
cp rfcore_firmware.py :rfcore_firmware.py
Traceback (most recent call last):
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 582, in filesystem_command
    op(src, dest2)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 522, in fs_put
    self.exec_("w(" + repr(data) + ")")
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 465, in exec_
    ret, ret_err = self.exec_raw(command, data_consumer=data_consumer)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 456, in exec_raw
    self.exec_raw_no_follow(command)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 434, in exec_raw_no_follow
    return self.raw_paste_write(command_bytes)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 412, in raw_paste_write
    raise PyboardError("could not complete raw paste: {}".format(data))
pyboard.PyboardError: could not complete raw paste: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote.py", line 6, in <module>
    sys.exit(main.main())
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote\main.py", line 520, in main
    do_filesystem(pyb, args)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote\main.py", line 298, in do_filesystem
    pyboard.filesystem_command(pyb, args)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 598, in filesystem_command
    print(str(er.args[2], "ascii"))
IndexError: tuple index out of range

It fails in the same way after a similar timeout every time.

The same file transfer works perfectly on the direct usb port.

Transferring a 259 byte board.json works.
Transferring a 575 byte mpconfigboard.mk fails.

Note this is on Windows so may be related to #8358 - I will test shortly if this PR resolves (hides) the issue.

@andrewleech
Copy link
Contributor Author

If I slow down the transfer with some diagnostic logging, it works on slightly larger files (575 byte mpconfigboard.mk)

pyboard.py

    def raw_paste_write(self, command_bytes):
        # Read initial header, with window size.
        print("command_bytes", command_bytes)
        data = self.serial.read(2)
        window_size = data[0] | data[1] << 8
        window_remain = window_size

        # Write out the command_bytes data.
        i = 0
        while i < len(command_bytes):
            while window_remain == 0 or self.serial.inWaiting():
                print("window_remain", window_remain)
                data = self.serial.read(1)
                if data == b"\x01":
                    # Device indicated that a new window of data can be sent.
                    window_remain += window_size
                elif data == b"\x04":
                    # Device indicated abrupt end.  Acknowledge it and finish.
                    self.serial.write(b"\x04")
                    return
                #elif not data:
                #    time.sleep(0.02)
                #    continue
                else:
                    # Unexpected data from device.
                    raise PyboardError("unexpected read during raw paste: {}".format(data))
            # Send out as much data as possible that fits within the allowed window.
            b = command_bytes[i : min(i + window_remain, len(command_bytes))]
            written = self.serial.write(b)
            window_remain -= written
            i += written
            print("written", len(b), written)
>py ..\..\..\..\tools\mpremote\mpremote.py cp mpconfigboard.mk :
cp mpconfigboard.mk :mpconfigboard.mk
command_bytes b"f=open('mpconfigboard.mk','wb')\nw=f.write"
window_remain 128
written 41 41
command_bytes b"w(b'MCU_SERIES = wb\\nCMSIS_MCU = STM32WB55xx\\nAF_FILE = boards/stm32wb55_af.csv\\nSTARTUP_FILE = $(STM32LIB_CMSIS_BASE)/Source/Templates/gcc/startup_stm32wb55xx_cm4.o\\n\\nifeq ($(USE_MBOOT),1)\\n# When using Mboot all the text goes together after the bootloader\\nLD_FILE')"
window_remain 128
written 256 256
window_remain 0
written 13 13
command_bytes b"w(b'S = boards/stm32wb55xg.ld boards/common_bl.ld\\nTEXT0_ADDR = 0x08004000\\nelse\\n# When not using Mboot the text goes at the start of flash\\nLD_FILES = boards/stm32wb55xg.ld boards/common_basic.ld\\nTEXT0_ADDR = 0x08000000\\nendif\\n\\n# MicroPython settings\\nMICROPY_PY_B')"
window_remain 128
written 256 256
window_remain 0
written 15 15
command_bytes b"w(b'LUETOOTH = 1\\nMICROPY_BLUETOOTH_NIMBLE = 1\\nMICROPY_VFS_LFS2 = 1\\n')"
window_remain 128
written 72 72
command_bytes b'f.close()'
window_remain 128
written 9 9

This still fails on a larger file (25KB)

command_bytes b"w(b'tate\\n\\n\\ndef _read_failure_reason():\\n    return machine.mem32[REG_RTC_REASON]\\n\\n\\ndef _write_failure_state(reason):\\n    machine.mem32[REG_RTC_REASON] = reason\\n    _write_state(_STATE_FAILED)\\n    return reason\\n\\n\\n# Check for the presence of a given file and atte')"
window_remain 128
written 256 256
window_remain 0
written 19 19
command_bytes b'w(b\'mpt to start installing it.\\ndef _stat_and_start_copy(path, addr, copying_state, copied_state):\\n    try:\\n        os.stat(path)\\n    except OSError:\\n        log("{} not found", path)\\n        return False\\n\\n    log("{} update is available", path)\\n    if sum(stm\')'
window_remain 128
written 256 256
window_remain 0
written 15 15
command_bytes b'w(b\'.rfcore_fw_version(_FW_VERSION_WS)):\\n        # There was some WS firmware already installed. Need to remove that\\n        # before copying to flash (both FUS or WS copy require this).\\n        log("Removing existing WS firmware")\\n        _write_state(_STATE_\')'
window_remain 128
written 256 256
window_remain 0
written 10 10
command_bytes b'w(b\'DELETING_WS)\\n        _fus_fwdelete()\\n    else:\\n        log("Copying {} to flash", path)\\n        # Mark that the flash write has started. Any failure should result in an overall failure.\\n        _write_state(copying_state)  # Either _STATE_COPYING_FUS or _S\')'
window_remain 128
written 256 256
window_remain 0
Traceback (most recent call last):
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 592, in filesystem_command
    op(src, dest2)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 532, in fs_put
    self.exec_("w(" + repr(data) + ")")
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 475, in exec_
    ret, ret_err = self.exec_raw(command, data_consumer=data_consumer)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 466, in exec_raw
    self.exec_raw_no_follow(command)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 444, in exec_raw_no_follow
    return self.raw_paste_write(command_bytes)
  File "W:\home\anl\tempt\src\micropython\tools\mpremote\mpremote/../..\pyboard.py", line 408, in raw_paste_write
    raise PyboardError("unexpected read during raw paste: {}".format(data))
pyboard.PyboardError: unexpected read during raw paste: b''

I'm unsure about what's going on with the device end of the communication back and forth to know what's supposed to be happening here that isn't. I presume it's something to do with once a window worth of data is sent the device should grab it, store it, then respond with a b"\x01". In this case the window is empty but no return byte has been received so it hangs in the while loop until serial timeout / returns empty data triggering the exception.

@dpgeorge
Copy link
Member
dpgeorge commented Mar 7, 2022

I can confirm the issue exists when running mpremote on linux. It has something to do with the way internal flash writes have to synchronise with the second core, and maybe it misses UART chars.

@andrewleech
Copy link
Contributor Author
andrewleech commented Mar 7, 2022

Ah yes that would make sense!
We've tested doubling the uart buffer size #define MICROPY_HW_UART_REPL_RXBUF (520) (didn't fix it) and the nucleo doesn't have cts/rts wired in.

maybe writing to flash in smaller chunks would work to be fast enough to continue to service the uart hardware interrupt.

or perhaps some more synchroising between the flash semaphore and the commands being sent back to mpremote to ensure it waits long enough before sending more data

@andrewleech
Copy link
Contributor Author
andrewleech commented May 8, 2022

Well I can confirm that #8452 does not fix this issue on its own.

Also confirmed - yes it's a UART overrun.
image

I added an in-code breakpoint on the overrun flag and yep it hit immediatly when the mpremote cp operation got to the point where it usually locks up.

These overrun situations are currently not handled / reported at all, it ends up in an infinite lock-up when it happens during a usys.stdin.buffer.read() style operation. This is breakable by ctrl-c at least, but certainly not ideal.

I haven't yet got proof of how/why it's overrunning here, though flash erase/write would make sense. That being said, at the time when data is being read in on stdin here, there shouldn't neccesarily be background write operations - these should be synchronous in between the chunks of stdin reading.

edit: scratch that last paragraph... I just looked at the other side of the window I just snipped the above image from...
image

@andrewleech
Copy link
Contributor Author

According to en.STM32WB-Memory-Flash-FLASH.pdf a flash erase operation takes 22ms. If 115200 baud uart has a real world spee of ~11520 bytes/s then yeah, over 250 bytes would be missed during an erase.

The particual erase in question in the failures above is the background irq cache erase:

static void flash_bdev_irq_handler(void) {
    if (!(flash_flags & FLASH_FLAG_DIRTY)) {
        return;
    }
    // This code erases the flash directly, waiting for it to finish
    if (!(flash_flags & FLASH_FLAG_ERASED)) {
        flash_erase(flash_cache_sector_start, flash_cache_sector_size / 4);
        flash_flags |= FLASH_FLAG_ERASED;
        return;
    }
...

This backround operation is routinely blocked along with USB interrupt activity for a number of other flash operations.

What we really need here is to block this interrupt for the duration of uart stdin read. However the function performing a stdin.read() doesn't know if it's a uart, usb or any other dupterm source. By the time it gets to mp_hal_stdin_rx_chr() to try the uart read it'll be too late to bock irq's because the uart has already been read (or not) in the irq.

Perhaps flash and usb should be set to a slightly different IRQ level so just the flash task can be blocked here and not interrupt usb, then block both in the other places where they're currently both blocked.

Or perhaps a soft flag should be used to just prevent this background flash operation rather than using raise_irq_pri() as for a larger stdin.read() this could be blocking for quite a long time, and therefore masking off a range of lower priority irq's.

andrewleech pushed a commit to andrewleech/micropython that referenced this issue May 9, 2022
Uart repl stdin loses data when the background flash erase / cache flush happens during communication.

This change forces all erase / write operations to happen during file sync/close instead.

This change can be overridden by defining MICROPY_HW_FLASH_BACKGROUND_CACHE (1) in board config.

For more details see micropython#8386
andrewleech pushed a commit to andrewleech/micropython that referenced this issue May 9, 2022
Uart repl stdin loses data when the background flash erase / cache flush
happens during communication.

This change forces all erase / write operations to happen during file
sync/close instead.

This change can be overridden in board config etc. with:
#define MICROPY_HW_FLASH_BACKGROUND_CACHE (1)

For more details see micropython#8386
@andrewleech
Copy link
Contributor Author

I guess the reality is that any/all uart communication is unsafe in general, but particularly with internal flash opertions. This could be resolved with HW flow control, but only if the hardware in question has them wired up (the nucleo WB55 does not)

The PR #8645 does resolve ths issue with mpremote / pyboard, allowing cp operations to work successfully.

Alternatively, we could fix it by moving all interrupt handlers into ram, and making sure none of the interrupt handers need to directly touch flash (including not using const variables), and then things like uart buffering would continue to work while flash is locked. This is obviously a much larger change however and comes at a ram cost. It could still be easily accidentally broken in future by adding any line to an irq handler that touches flash incidentally.

@dpgeorge
Copy link
Member

I think the issue is simply that when internal flash erases/writes are going on, the entire MCU is effectively blocked from executing, because the flash cannot be read while it's being erased/written to.

This affects mpremote in the following way / under the following conditions:

  • internal flash is used for the filesystem
  • UART is used for the REPL, without any hardware flow control
  • mpremote sends some commands to write a file
  • the internal flash-block-device timeout triggers asynchronously at some point and flushes the cache to flash
  • this stalls the MCU and any UART chars sent by mpremote are lost

As done in #8645, it's probably enough to disable the async flushing of the flashbdev cache. At least this will get mpremote working because then flash erases/writes only occur once mpremote finishes sending a command and while it's waiting for the response.

Alternatively, as mentioned above, a better fix would be to put the UART IRQ handler in RAM so it can continue to execute while flash is being erased/written. I don't think this would be too difficult, but it needs:

  • IRQ vector table in RAM
  • all the USARTx_IRQHandler functions in RAM
  • uart_irq_handler in RAM
  • possibly pendsv_kbd_intr in RAM, along with everything it calls (probably not necessary to fix the mpremote issue)

I feel like putting the UART IRQ handler in RAM is a better solution. It would only need to be enabled if both MICROPY_HW_ENABLE_INTERNAL_FLASH_STORAGE and MICROPY_HW_UART_REPL are enabled.

@dpgeorge
Copy link
Member

I can confirm the issue exists on a NUCLEO-F413ZH, with REPL over UART. So it's not related to the WB55 specifically.

@andrewleech
Copy link
Contributor Author

True, I did think I should take WB55 out of the title, it could affect any chip/port with blocking flash and non-flow-control uart really.

While I do think the IRQs in ram is a great solution (possibly with some slight performance improvements from not needing to execute from flash), I think it might require most if not all IRQ's to be migrated into ram as once a flash operation starts and the cpu can no longer access flash, any and all IRQ's that could possibly run at this point could block the cpu the instant one of them calls a function (or read a const variable) that needs to run from flash. This would likely block any other irq's from running?

Actually I guess I don't know if this would be blocking... if the uart irq was higher priority preemption probably should work to run the uart irq and leave the flash-blocked one as is? Any higher prio irq's (systick) would need migrating to ram to ensure they don't block everything, but perhaps that's all.

Interesting question about pendsv_kbd_intr, in all likelihood it would never matter if calling this function from uart irq blocked for flash as it's going to only ever interrupt the vm "soon" anyway, but before other chars are handled.

@dpgeorge
Copy link
Member

Actually I guess I don't know if this would be blocking... if the uart irq was higher priority preemption probably should work to run the uart irq and leave the flash-blocked one as is? Any higher prio irq's (systick) would need migrating to ram to ensure they don't block everything, but perhaps that's all.

Yes that's a good point, I'm not sure if higher priority IRQs can run if a lower priority one is blocked on a flash access (because it may block the internal instruction bus).

@dpgeorge
Copy link
Member

I tried moving the UART IRQ handler to RAM (along with other relevant things, including SysTick_Handler), but it did not work.

@andrewleech
Copy link
Contributor Author

I'm keen to have a go at this myself... even if it can work it does feel like it'll be easy to break though, just accidentally add a function call and/or variable in flash and it's all blocked.

If the problem is a flash based irq hapenning first, they could always be masked out before flash operations to only allow uart/systick (on cores that support this at least)

@dpgeorge
Copy link
Member

Here is my attempt at putting the UART IRQ handler (and other things) in RAM: https://github.com/dpgeorge/micropython/tree/stm32-uart-irq-handler-in-ram . It may need to have other .ld files changed to work on NUCLEO_WB55.

@andrewleech
Copy link
Contributor Author

Thanks heaps for that, I had started an attempt but in the debugger it was just running the wrong code all over the place. I'll definitely take a look at your branch and try to make it all work.

@andrewleech
Copy link
Contributor Author

@dpgeorge building on top of your branch I've got this working, see https://github.com/dpgeorge/micropython/compare/stm32-uart-irq-handler-in-ram...andrewleech:stm32-uart-irq-handler-in-ram?expand=1

The key is that the function that starts the flash process & anything that runs in the "while flash not finished" loop also needs to be in ram.

The commit I added in the link above gets the mpremote cp working for me, although if the stm32hal function names don't match then it won't work on other boards. I'm not sure on the best way to integrate this all cleanly and reliably...

@dpgeorge
Copy link
Member

The key is that the function that starts the flash process & anything that runs in the "while flash not finished" loop also needs to be in ram.

Nice find! But I think this is pointing to a big limitation: that any code that runs from flash while the flash is stalled, will stall the entire instruction bus and even IRQs in RAM can no longer execute. Eg if an IRQ is raised whose handler is in flash, then that IRQ will stall the instruction bus and other (higher priority) IRQs that are in RAM will no longer be able to execute. You could test this by configuring a pyb.Timer with a callback and a frequency of around 1kHz. If enabling that breaks everything then...

In the end I think putting IRQs in RAM is a lot of work for only marginal gain, and it only works in very certain circumstances. So probably just turning off the background flushing of flash is indeed the best solution.

@andrewleech
Copy link
Contributor Author

Yes it does feel like it's going to be rather fragile. Other irq's on most cores could be masked out during flash operation to prevent them getting in the way. Even with that in place though, it's really hard to know if it's going to keep working in future, and it's probably going to be impossible to unit test for.

@andrewleech
Copy link
Contributor Author
andrewleech commented May 31, 2022

As an example (not neccesarily intended to merge) I wrapped up the isr/uart/flash from ram changes into a set of board-only changes - these could be used on a per-board / per-project basis if needed rather than "polluting" the main micropython codebase with these verbose / difficult to configure / fragile changes.

andrewleech pushed a commit to andrewleech/micropython that referenced this issue Jun 3, 2022
Uart repl stdin loses data when the background flash erase / cache flush
happens during communication.

This change forces all erase / write operations to happen during file
sync/close instead.

This change can be overridden in board config etc. with:
#define MICROPY_HW_FLASH_BACKGROUND_CACHE (1)

For more details see micropython#8386
Wind-stormger pushed a commit to BPI-STEAM/micropython that referenced this issue Sep 14, 2023
Add root cert DST Root CA X3 for Let's Encrypt
dpgeorge pushed a commit to andrewleech/micropython that referenced this issue Sep 2, 2024
Allows mpremote file transfer to work correctly when mpremote is used over
the ST-link USB/UART REPL port.

Fixes issue micropython#8386.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
graeme-winter pushed a commit to winter-special-projects/micropython that referenced this issue Sep 21, 2024
Allows mpremote file transfer to work correctly when mpremote is used over
the ST-link USB/UART REPL port.

Fixes issue micropython#8386.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants
0