-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
extmod/vfs_reader: Increase buffer size for speed boost. #12836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Code size report:
|
Codecov Report
@@ Coverage Diff @@
## master #12836 +/- ##
==========================================
+ Coverage 98.39% 98.43% +0.03%
==========================================
Files 158 158
Lines 20972 20978 +6
==========================================
+ Hits 20636 20649 +13
+ Misses 336 329 -7
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andrewleech
py/mpconfig.h
Outdated
@@ -597,6 +597,14 @@ | |||
#define MICROPY_READER_VFS (0) | |||
#endif | |||
|
|||
#ifndef MICROPY_READER_VFS_BUFFER_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature level is typically about ROM (Flash) capacity, but the reader buffer is using RAM.
We have talked in the past about adding a corresponding RAM_LEVEL
, which I think would have plenty of uses.
(Of course, RAM size and Flash size do tend to correlate, with the exception of XIP parts e.g. rp2).
I wonder if we should instead have some sort of runtime way to configure this. Maybe even an ioctl on the filesystem, which would be easy for mpremote to implement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Increasing the buffer size should also improve import speed from the filesystem. Would be good to measure that. In that case we'd want to have a large-ish buffer even when not using mpremote.
So, I'm not sure an ioctl would be that useful? Just have a fixed size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought about the ROM/RAM implications of that setting, though I'm sure other features end up taking more/less ram in line with the rom level setting ;-)
ioctl isn't a bad idea, though it does add overhead. Also, do we use mp_vfs_blockdev_ioctl
as a "global" vfs ioctl that would set a default buffer size for the filesystem in question, or the IOBase.ioctl
and have to set it on a per-file basis?
blockdev doesn't quite feel right, it's very abstracted from the "file" level that this buffer is allocated for. But having to set the buffer size for each file seems quite annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was it's a file ioctl (i.e. the vfs file that the reader is currently opening). The filesystem can then chose to implement that particular ioctl (or not). The fact that it's per-file doesn't matter, e.g. mpremote's RemoteFile
can always return a constant value to that ioctl.
(There is no block device involved here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Increasing the buffer size should also improve import speed from the filesystem. Would be good to measure that
I'm assuming this will be quite different on each platform, depending on flash speeds and interfaces. I reckon it won't make much different on stm's with spiflash as that generally has a buffer already, but as a first test with the esp32-s2, loading the same hello.py
as earlier but from internal flash:
without change: 321ms
import
with (24+180): 233ms
import
pyb-d:
as guessed: no change, with and without both 91ms
.
stm32wb55 (24 + 192)
before: 364ms
after: 236ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was it's a file ioctl (i.e. the vfs file that the reader is currently opening).
The issue is there's not really any file to execute the ioctl on. At the Python level you call import foo
and that constructs a VFS reader (allocating the read buffer) and then calls mp_vfs_open()
, which ends up in the mpremote-mount code. The latter has no idea about the VFS reader, nor any way to reference it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to get a feel for how this could work, mp_reader_new_file
calls mp_vfs_open
after which it could immediately call bufsize = file->ioctl(GET_BUFFER_SIZE)
then reader->buf = malloc(bufsize)
.
However it still requires the "file type" class ioctl to specify a suitable buffer size; the python impl remotefile in mpremote doesn't know how big the buffer could be unless it queries the chip first to get an idea of free ram.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewleech Yes that is pretty much exactly the idea I'm suggesting. Except you don't need the double malloc, because you can use m_new_obj_var to allocate the mp_reader_vfs_t. (You just need to create the file first now).
; the python impl remotefile in mpremote doesn't know how big the buffer
It does -- it's the bigger value that we're proposing here. But it could also be a flag to mpremote which the user could set even larger if they want a small extra speed boost (or they know their board has lots of RAM).
(There's a mildly annoying detail which is that we now need to steal a byte or two to store the length of the buffer in the mp_reader_vfs_t struct).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does -- it's the bigger value that we're proposing here
However if you're using a small device this larger value might not be appropriate!
Also I did get ~30% speed up of import from flash for larger py file on some boards, which we'd miss out on if the internal file types didn't use the larger value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However if you're using a small device this larger value might not be appropriate!
If you're using a small device, mpremote-mount probably doesn't work because it already requires a fair bit of RAM to load the VFS hook code. If you have the RAM for mpremote-mount, you most likely want to spend a little bit more in the buffer to get acceptable speed.
I think @jimmo's idea with the ioctl is the way to go. Because the buffer size is then dependent on the filesystem properties, not the board properties, which makes more sense (it's the filesystem that's slow, not the board). Also, a user of mpremote could increase this buffer size via an mpremout-mount command-line flag, in case they want the mount as fast as possible and have the RAM to spare.
If/when we have a memory-map FS, that won't need any buffer, and so it doesn't need to implement the ioctl, and doesn't need to waste RAM on a buffer.
If some boards show increased speed of filesystem imports with a bigger buffer, then we can either increase the default buffer size on those boards, or have them implement the ioctl for FAT and littlefs files when they sit on top of a block device (because it's the underlying block device that is slow).
py/mpconfig.h
Outdated
#if MICROPY_CONFIG_ROM_LEVEL_AT_LEAST_EXTRA_FEATURES | ||
#define MICROPY_READER_VFS_BUFFER_SIZE (24 + 180) | ||
#else | ||
#define MICROPY_READER_VFS_BUFFER_SIZE (24 + 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the 24 should remain a fixed constant in the mp_reader_vfs_t
struct, since it pads it out to 32 bytes, which is a GC block on 32-bit systems. Then this constant would be additional memory to allocate on top of the 24.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I've pushed that change (also increasing it to 24 + 192 to keep it divisible by 32)
fa4518d
to
5bc3368
Compare
5bc3368
to
b0aacc8
Compare
Ok, I've taken a run at adding an ioctl for buffer size, including support in mpremote. This includes support for zero buffer. I just realised though that the released mpremote returns 0 for unhandled ioctl so that'll slow down existing users with released mpremote installed... |
6ed1add
to
b4c98aa
Compare
extmod/vfs_reader.c
Outdated
bufsize = MICROPY_READER_VFS_DEFAULT_BUF_SIZE; | ||
} | ||
|
||
mp_reader_vfs_t *rf = m_new_obj_var(mp_reader_vfs_t, uint8_t, bufsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has uint8_t
but the struct uses byte
for the VLA (I know they're the same, but good to be consistent).
extmod/vfs_reader.c
Outdated
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_GET_BUFSIZE, 0, &errcode); | ||
if (errcode || bufsize > 255) { | ||
errcode = 0; | ||
bufsize = MICROPY_READER_VFS_DEFAULT_BUF_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename s/BUF/BUFFER/
py/mpconfig.h
Outdated
@@ -597,6 +597,10 @@ | |||
#define MICROPY_READER_VFS (0) | |||
#endif | |||
|
|||
#ifndef MICROPY_READER_VFS_DEFAULT_BUF_SIZE | |||
#define MICROPY_READER_VFS_DEFAULT_BUF_SIZE (48) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 39 or 55 -- sizeof(mp_obj_t)+2*sizeof(uint16_t)+sizeof(uint8_t)+MICROPY_READER_VFS_DEFAULT_BUF_SIZE
needs to be a multiple of the block size.
extmod/vfs_reader.c
Outdated
} mp_reader_vfs_t; | ||
|
||
STATIC mp_uint_t mp_reader_vfs_readbyte(void *data) { | ||
mp_reader_vfs_t *reader = (mp_reader_vfs_t *)data; | ||
if (!reader->bufsize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a different code path (adding code size), could we have the buffer size be always at least 1.
So if the ioctl returns zero, then the effective buffer size is 1, and we use the mp_reader_vfs_t
's buf
in place of the byte c
used here.
Because the mp_reader_vfs_t is always heap allocated, really what we should probably do is make the minimum buffer sizeof(block)-9
(i.e. 7), otherwise those bytes are just wasted. And even for low-latency filesystems, it's still faster to buffer than not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should support zero buffer size. At least use 7 as the minimum since the bytes are anyway there.
For the default, I think it should be 23 so we retain the existing performance for filesystems like FAT and littlefs. So if the ioctl is not supported it defaults to 23. And the minimum value is 7, maximum is 255.
extmod/vfs_reader.c
Outdated
const mp_stream_p_t *stream_p = mp_get_stream(file); | ||
int errcode = 0; | ||
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_GET_BUFSIZE, 0, &errcode); | ||
if (errcode || bufsize > 255) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errcode should only be valid if bufsize is MP_STREAM_ERROR, so just check the latter.
How do you feel about |
928258d
to
491d2a1
Compare
extmod/vfs_reader.c
Outdated
int errcode = 0; | ||
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_BUFFER_SIZE, 0, &errcode); | ||
if (bufsize < 7 || bufsize > 255) { | ||
errcode = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mp_stream_rw
sets *errcode
to zero at the start, so this line isn't necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I'd noticed that ioctl generally doesn't set unless there's an error and assumed it might be the same.
extmod/vfs_reader.c
Outdated
const mp_stream_p_t *stream_p = mp_get_stream(file); | ||
int errcode = 0; | ||
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_BUFFER_SIZE, 0, &errcode); | ||
if (bufsize < 7 || bufsize > 255) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damien's earlier suggestion is pretty subtle, probably worth adding a comment for.
// ioctl returns unsigned -1 ( MP_STREAM_ERROR) on error, which will be >255, so will get the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, a comment here would save the next person from having to also trace out the expected vs error behaviour.
py/stream.h
Outdated
@@ -43,6 +43,7 @@ | |||
#define MP_STREAM_GET_DATA_OPTS (8) // Get data/message options | |||
#define MP_STREAM_SET_DATA_OPTS (9) // Set data/message options | |||
#define MP_STREAM_GET_FILENO (10) // Get fileno of underlying file | |||
#define MP_STREAM_BUFFER_SIZE (11) // Get preferred buffer size for file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be MP_STREAM_GET_BUFFER_SIZE
All the others have either GET or SET in their name, except for MP_STREAM_TIMEOUT
(but that's because it's both get & set)
I'm not worried about the length of the name.
Currently +44 on stm32, which might drop to +40 with the extra line removed. I think that's a pretty good code size tradeoff for a significant improvement very useful feature ( |
d393c8f
to
cffb05f
Compare
@jimmo yep it looks like it did go to +40 with that final cleanup. I'm still inclined to think the original hardcoded value that's loosely linked to the size of the chip is kind of better due to not taking up extra flash and having dual benefit to mpremote and internal flash imports .... but ioctl is more flexible for sure! Eventually the ioctl could be used for related things like buffering other kind of file loads etc. |
@andrewleech I was double-checking the allocation here, and it actually uncovered a bug in After some more thought I think we should make the following changes: The buffer size is (The other option would be to use a bitfield with three 10-bit values, but this adds about +46 bytes code size compared to three uint16_t, although it gets two bytes back for the buffer). Also we realised the logic around the return value from And I've implemented this in jimmo@94fcf2f -- that should squash cleanly into your commit (once you rebase after #12866 is merged). It's now +60 bytes. |
Nice find on the alloc @jimmo ! Prior to the ioctl stuff I tried with a eg. 512 buffer size and it failed badly, without looking I presume some of the other mpremote mount stuff is also limited to 8bit. I'd say we should keep all these fields as 8bit until support to use larger buffers is added elsewhere; it's not a public API struct so safe to change in the future if desired. I think it is worth keeping the define as something that can be overridden on a board / build if someone wants faster imports from internal flash. I'm not too fussed about capping vs default, I'd think replacing invalid with default is appropriate to save on logic effort / code size, but either way has advantages. |
cffb05f
to
3400900
Compare
const mp_stream_p_t *stream_p = mp_get_stream(file); | ||
int errcode = 0; | ||
mp_uint_t bufsize = stream_p->ioctl(file, MP_STREAM_GET_BUFFER_SIZE, 0, &errcode); | ||
if (bufsize == MP_STREAM_ERROR || bufsize == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For bufsize
to be zero, then either:
- the file's ioctl implementation was invalid (i.e. just returns zero for everything)
- they wanted the smallest possible buffer
in the first case, it's arbritrary, and in the second case the clamp behavior seems more appropriate, resulting in the min buffer size. So you could skip the bufsize==0
check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so it turns out the existing mpremote
implementation of ioctl is exactly the "invalid" case I described above. So we need this special case. Can you add a comment here explaining this (and probably best fix mpremote
as part of this PR too to return -1 for invalid request).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def ioctl(self, request, arg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3400900
to
0117a06
Compare
extmod/vfs_reader.c
Outdated
#define MICROPY_READER_VFS_DEFAULT_BUFFER_SIZE (2 * MICROPY_BYTES_PER_GC_BLOCK - offsetof(mp_reader_vfs_t, buf)) | ||
#endif | ||
#define MICROPY_READER_VFS_MIN_BUFFER_SIZE (MICROPY_BYTES_PER_GC_BLOCK - offsetof(mp_reader_vfs_t, buf)) | ||
#define MICROPY_READER_VFS_MAX_BUFFER_SIZE (256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be 255.
d9d0976
to
b1e41e2
Compare
1edc21e
to
c2b1a1d
Compare
Can be used to speed up importing a file from a vfs based filesystem. Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Speeds up importing files from mounted filesystem. Also fix the return code for invalid / unsupported ioctl requests. Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
c2b1a1d
to
bbc5a18
Compare
Thanks for the work on this, now merged! |
This will speed up importing a file from a vfs based filesystem.
In particular this speeds up importing modules from a
mpremote mount
dramatically, generally better than x10.