8000 extmod/vfs_reader: Increase buffer size for speed boost. by andrewleech · Pull Request #12836 · micropython/micropython · GitHub
[go: up one dir, main page]

Skip to content

extmod/vfs_reader: Increase buffer size for speed boost. #12836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 9, 2023

Conversation

andrewleech
Copy link
Contributor

This will speed up importing a file from a vfs based filesystem.

In particular this speeds up importing modules from a mpremote mount dramatically, generally better than x10.

@github-actions
Copy link
github-actions bot commented Oct 31, 2023

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +112 +0.014% standard
      stm32:   +44 +0.011% PYBV10
     mimxrt:   +40 +0.011% TEENSY40
        rp2:   +48 +0.015% RPI_PICO
       samd:   +44 +0.017% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@codecov
Copy link
codecov bot commented Oct 31, 2023

Codecov Report

Merging #12836 (c2b1a1d) into master (dff2938) will increase coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head c2b1a1d differs from pull request most recent head bbc5a18. Consider uploading reports for the commit bbc5a18 to get more accurate results

@@            Coverage Diff             @@
##           master   #12836      +/-   ##
==========================================
+ Coverage   98.39%   98.43%   +0.03%     
==========================================
  Files         158      158              
  Lines       20972    20978       +6     
==========================================
+ Hits        20636    20649      +13     
+ Misses        336      329       -7     
Files Coverage Δ
extmod/vfs_reader.c 97.05% <100.00%> (+0.63%) ⬆️
py/stream.h 100.00% <ø> (ø)

... and 3 files with indirect coverage changes

Copy link
Member
@jimmo jimmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andrewleech

py/mpconfig.h Outdated
@@ -597,6 +597,14 @@
#define MICROPY_READER_VFS (0)
#endif

#ifndef MICROPY_READER_VFS_BUFFER_SIZE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature level is typically about ROM (Flash) capacity, but the reader buffer is using RAM.

We have talked in the past about adding a corresponding RAM_LEVEL, which I think would have plenty of uses.

(Of course, RAM size and Flash size do tend to correlate, with the exception of XIP parts e.g. rp2).

I wonder if we should instead have some sort of runtime way to configure this. Maybe even an ioctl on the filesystem, which would be easy for mpremote to implement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increasing the buffer size should also improve import speed from the filesystem. Would be good to measure that. In that case we'd want to have a large-ish buffer even when not using mpremote.

So, I'm not sure an ioctl would be that useful? Just have a fixed size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about the ROM/RAM implications of that setting, though I'm sure other features end up taking more/less ram in line with the rom level setting ;-)

ioctl isn't a bad idea, though it does add overhead. Also, do we use mp_vfs_blockdev_ioctl as a "global" vfs ioctl that would set a default buffer size for the filesystem in question, or the IOBase.ioctl and have to set it on a per-file basis?
blockdev doesn't quite feel right, it's very abstracted from the "file" level that this buffer is allocated for. But having to set the buffer size for each file seems quite annoying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was it's a file ioctl (i.e. the vfs file that the reader is currently opening). The filesystem can then chose to implement that particular ioctl (or not). The fact that it's per-file doesn't matter, e.g. mpremote's RemoteFile can always return a constant value to that ioctl.

(There is no block device involved here)

Copy link
Contributor Author
@andrewleech andrewleech Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increasing the buffer size should also improve import speed from the filesystem. Would be good to measure that

I'm assuming this will be quite different on each platform, depending on flash speeds and interfaces. I reckon it won't make much different on stm's with spiflash as that generally has a buffer already, but as a first test with the esp32-s2, loading the same hello.py as earlier but from internal flash:
without change: 321ms import
with (24+180): 233ms import

pyb-d:
as guessed: no change, with and without both 91ms.

stm32wb55 (24 + 192)
before: 364ms
after: 236ms

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was it's a file ioctl (i.e. the vfs file that the reader is currently opening).

The issue is there's not really any file to execute the ioctl on. At the Python level you call import foo and that constructs a VFS reader (allocating the read buffer) and then calls mp_vfs_open(), which ends up in the mpremote-mount code. The latter has no idea about the VFS reader, nor any way to reference it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to get a feel for how this could work, mp_reader_new_file calls mp_vfs_open after which it could immediately call bufsize = file->ioctl(GET_BUFFER_SIZE) then reader->buf = malloc(bufsize).

However it still requires the "file type" class ioctl to specify a suitable buffer size; the python impl remotefile in mpremote doesn't know how big the buffer could be unless it queries the chip first to get an idea of free ram.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewleech Yes that is pretty much exactly the idea I'm suggesting. Except you don't need the double malloc, because you can use m_new_obj_var to allocate the mp_reader_vfs_t. (You just need to create the file first now).

; the python impl remotefile in mpremote doesn't know how big the buffer

It does -- it's the bigger value that we're proposing here. But it could also be a flag to mpremote which the user could set even larger if they want a small extra speed boost (or they know their board has lots of RAM).

(There's a mildly annoying detail which is that we now need to steal a byte or two to store the length of the buffer in the mp_reader_vfs_t struct).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does -- it's the bigger value that we're proposing here

However if you're using a small device this larger value might not be appropriate!

Also I did get ~30% speed up of import from flash for larger py file on some boards, which we'd miss out on if the internal file types didn't use the larger value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However if you're using a small device this larger value might not be appropriate!

If you're using a small device, mpremote-mount probably doesn't work because it already requires a fair bit of RAM to load the VFS hook code. If you have the RAM for mpremote-mount, you most likely want to spend a little bit more in the buffer to get acceptable speed.

I think @jimmo's idea with the ioctl is the way to go. Because the buffer size is then dependent on the filesystem properties, not the board properties, which makes more sense (it's the filesystem that's slow, not the board). Also, a user of mpremote could increase this buffer size via an mpremout-mount command-line flag, in case they want the mount as fast as possible and have the RAM to spare.

If/when we have a memory-map FS, that won't need any buffer, and so it doesn't need to implement the ioctl, and doesn't need to waste RAM on a buffer.

If some boards show increased speed of filesystem imports with a bigger buffer, then we can either increase the default buffer size on those boards, or have them implement the ioctl for FAT and littlefs files when they sit on top of a block device (because it's the underlying block device that is slow).

py/mpconfig.h Outdated
#if MICROPY_CONFIG_ROM_LEVEL_AT_LEAST_EXTRA_FEATURES
#define MICROPY_READER_VFS_BUFFER_SIZE (24 + 180)
#else
#define MICROPY_READER_VFS_BUFFER_SIZE (24 + 32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the 24 should remain a fixed constant in the mp_reader_vfs_t struct, since it pads it out to 32 bytes, which is a GC block on 32-bit systems. Then this constant would be additional memory to allocate on top of the 24.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I've pushed that change (also increasing it to 24 + 192 to keep it divisible by 32)

@andrewleech
Copy link
Contributor < 8000 span aria-label="This user is the author of this pull request." data-view-component="true" class="tooltipped tooltipped-n"> Author

Ok, I've taken a run at adding an ioctl for buffer size, including support in mpremote.

This includes support for zero buffer. I just realised though that the released mpremote returns 0 for unhandled ioctl so that'll slow down existing users with released mpremote installed...

@andrewleech andrewleech force-pushed the vfs_reader_buffer branch 2 times, most recently from 6ed1add to b4c98aa Compare November 1, 2023 05:05
bufsize = MICROPY_READER_VFS_DEFAULT_BUF_SIZE;
}

mp_reader_vfs_t *rf = m_new_obj_var(mp_reader_vfs_t, uint8_t, bufsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has uint8_t but the struct uses byte for the VLA (I know they're the same, but good to be consistent).

mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_GET_BUFSIZE, 0, &errcode);
if (errcode || bufsize > 255) {
errcode = 0;
bufsize = MICROPY_READER_VFS_DEFAULT_BUF_SIZE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename s/BUF/BUFFER/

py/mpconfig.h Outdated
@@ -597,6 +597,10 @@
#define MICROPY_READER_VFS (0)
#endif

#ifndef MICROPY_READER_VFS_DEFAULT_BUF_SIZE
#define MICROPY_READER_VFS_DEFAULT_BUF_SIZE (48)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 39 or 55 -- sizeof(mp_obj_t)+2*sizeof(uint16_t)+sizeof(uint8_t)+MICROPY_READER_VFS_DEFAULT_BUF_SIZE needs to be a multiple of the block size.

} mp_reader_vfs_t;

STATIC mp_uint_t mp_reader_vfs_readbyte(void *data) {
mp_reader_vfs_t *reader = (mp_reader_vfs_t *)data;
if (!reader->bufsize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a different code path (adding code size), could we have the buffer size be always at least 1.

So if the ioctl returns zero, then the effective buffer size is 1, and we use the mp_reader_vfs_t's buf in place of the byte c used here.

Because the mp_reader_vfs_t is always heap allocated, really what we should probably do is make the minimum buffer sizeof(block)-9 (i.e. 7), otherwise those bytes are just wasted. And even for low-latency filesystems, it's still faster to buffer than not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should support zero buffer size. At least use 7 as the minimum since the bytes are anyway there.

For the default, I think it should be 23 so we retain the existing performance for filesystems like FAT and littlefs. So if the ioctl is not supported it defaults to 23. And the minimum value is 7, maximum is 255.

const mp_stream_p_t *stream_p = mp_get_stream(file);
int errcode = 0;
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_GET_BUFSIZE, 0, &errcode);
if (errcode || bufsize > 255) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errcode should only be valid if bufsize is MP_STREAM_ERROR, so just check the latter.

@andrewleech
Copy link
Contributor Author
andrewleech commented Nov 1, 2023

How do you feel about MP_STREAM_GET_BUFSIZE as the ioctl name? MP_STREAM_GET_BUFFER_SIZE is getting long, but more explicit? Or just MP_STREAM_BUFFER_SIZE

@andrewleech andrewleech force-pushed the vfs_reader_buffer branch 2 times, most recently from 928258d to 491d2a1 Compare November 1, 2023 09:01
int errcode = 0;
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_BUFFER_SIZE, 0, &errcode);
if (bufsize < 7 || bufsize > 255) {
errcode = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mp_stream_rw sets *errcode to zero at the start, so this line isn't necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I'd noticed that ioctl generally doesn't set unless there's an error and assumed it might be the same.

const mp_stream_p_t *stream_p = mp_get_stream(file);
int errcode = 0;
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_BUFFER_SIZE, 0, &errcode);
if (bufsize < 7 || bufsize > 255) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damien's earlier suggestion is pretty subtle, probably worth adding a comment for.

// ioctl returns unsigned -1 ( MP_STREAM_ERROR) on error, which will be >255, so will get the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, a comment here would save the next person from having to also trace out the expected vs error behaviour.

py/stream.h Outdated
@@ -43,6 +43,7 @@
#define MP_STREAM_GET_DATA_OPTS (8) // Get data/message options
#define MP_STREAM_SET_DATA_OPTS (9) // Set data/message options
#define MP_STREAM_GET_FILENO (10) // Get fileno of underlying file
#define MP_STREAM_BUFFER_SIZE (11) // Get preferred buffer size for file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be MP_STREAM_GET_BUFFER_SIZE

All the others have either GET or SET in their name, except for MP_STREAM_TIMEOUT (but that's because it's both get & set)

I'm not worried about the length of the name.

@jimmo
Copy link
Member
jimmo commented Nov 1, 2023

Currently +44 on stm32, which might drop to +40 with the extra line removed. I think that's a pretty good code size tradeoff for a significant improvement very useful feature (mpremote mount). Thanks for the work implementing this @andrewleech !

@andrewleech andrewleech force-pushed the vfs_reader_buffer branch 2 times, most recently from d393c8f to cffb05f Compare November 2, 2023 21:51
@andrewleech
Copy link
Contributor Author

@jimmo yep it looks like it did go to +40 with that final cleanup.

I'm still inclined to think the original hardcoded value that's loosely linked to the size of the chip is kind of better due to not taking up extra flash and having dual benefit to mpremote and internal flash imports .... but ioctl is more flexible for sure! Eventually the ioctl could be used for related things like buffering other kind of file loads etc.

@jimmo
Copy link
Member
jimmo commented Nov 3, 2023

@andrewleech I was double-checking the allocation here, and it actually uncovered a bug in m_new_obj_var fixed by #12866

After some more thought I think we should make the following changes:

The buffer size is uint8_t, but the pos/len are uint16_t. These should all be the same. First we should rename the fields to all be prefixed with buf so this is more obvious. But also this lets us reclaim two bytes. Alternatively, we can make the buffer size uint16_t, which I think is actually more useful, even if it does now make the default size two bytes smaller than before this PR. This also lets the buffer size be even, which although not strictly necessary feels more correct. (Ideally it'd be a multiple of the word size or something I guess).

(The other option would be to use a bitfield with three 10-bit values, but this adds about +46 bytes code size compared to three uint16_t, although it gets two bytes back for the buffer).

Also we realised the logic around the return value from ioctl isn't quite right. I think it should clamp to the range, rather than setting to the default.

And MICROPY_READER_VFS_DEFAULT_BUFFER_SIZE doesn't need to be overridable. Let's just keep it in vfs_reader.c as a constant.

I've implemented this in jimmo@94fcf2f -- that should squash cleanly into your commit (once you rebase after #12866 is merged). It's now +60 bytes.

@dpgeorge dpgeorge added the extmod Relates to extmod/ directory in source label Nov 3, 2023
@andrewleech
Copy link
Contributor Author

Nice find on the alloc @jimmo !
Good point about the pos/length types, I wonder why they were ever made 16bit when the buffer was so small!

Prior to the ioctl stuff I tried with a eg. 512 buffer size and it failed badly, without looking I presume some of the other mpremote mount stuff is also limited to 8bit.

I'd say we should keep all these fields as 8bit until support to use larger buffers is added elsewhere; it's not a public API struct so safe to change in the future if desired.

I think it is worth keeping the define as something that can be overridden on a board / build if someone wants faster imports from internal flash.

I'm not too fussed about capping vs default, I'd think replacing invalid with default is appropriate to save on logic effort / code size, but either way has advantages.

const mp_stream_p_t *stream_p = mp_get_stream(file);
int errcode = 0;
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_GET_BUFFER_SIZE, 0, &errcode);
if (bufsize == MP_STREAM_ERROR || bufsize == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For bufsize to be zero, then either:

  • the file's ioctl implementation was invalid (i.e. just returns zero for everything)
  • they wanted the smallest possible buffer

in the first case, it's arbritrary, and in the second case the clamp behavior seems more appropriate, resulting in the min buffer size. So you could skip the bufsize==0 check here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so it turns out the existing mpremote implementation of ioctl is exactly the "invalid" case I described above. So we need this special case. Can you add a comment here explaining this (and probably best fix mpremote as part of this PR too to return -1 for invalid request).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def ioctl(self, request, arg):

Copy link
Contributor Author
@andrewleech andrewleech Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed that thanks, also added some more unit test coverage
image

#define MICROPY_READER_VFS_DEFAULT_BUFFER_SIZE (2 * MICROPY_BYTES_PER_GC_BLOCK - offsetof(mp_reader_vfs_t, buf))
#endif
#define MICROPY_READER_VFS_MIN_BUFFER_SIZE (MICROPY_BYTES_PER_GC_BLOCK - offsetof(mp_reader_vfs_t, buf))
#define MICROPY_READER_VFS_MAX_BUFFER_SIZE (256)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be 255.

@andrewleech andrewleech force-pushed the vfs_reader_buffer branch 2 times, most recently from d9d0976 to b1e41e2 Compare November 8, 2023 03:07
@andrewleech andrewleech force-pushed the vfs_reader_buffer branch 2 times, most recently from 1edc21e to c2b1a1d Compare November 8, 2023 03:16
Can be used to speed up importing a file from a vfs based filesystem.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Speeds up importing files from mounted filesystem.

Also fix the return code for invalid / unsupported ioctl requests.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
@dpgeorge dpgeorge merged commit bbc5a18 into micropython:master Nov 9, 2023
@dpgeorge
Copy link
Member
dpgeorge commented Nov 9, 2023

Thanks for the work on this, now merged!

@andrewleech andrewleech deleted the vfs_reader_buffer branch November 20, 2023 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extmod Relates to extmod/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0