-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Bounds check for pack index read #6796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes: libgit2#6795 Co-Authored-By: Bennet <bennetbo@gmx.de>
src/libgit2/pack.c
Outdated
8000 | @@ -1525,6 +1525,14 @@ static int pack_entry_find_offset( | ||
if (p->index_version > 1) { | |||
level1_ofs += 2; | |||
index += 8; | |||
|
|||
if ((int)short_oid->id[0] + 2 >= p->index_map.len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you need to upcast this to a size_t
to be able to compare against the len
(which is a size_t
itself).
Thanks for the PR - I pushed a change to cast to Any chance you have a corrupted index that you could share, so that we could add to the test corpus? |
I’m going to go ahead and merge this, but @ConradIrwin if you have some corrupt index files that we can add to the test resources, that would be helpful! We can add them in separately if so. |
Thanks again! |
Sorry for the slow response here, and thanks for fixing my fix! I haven't been able to reproduce this crash locally (but it is coming through in our telemetry a bunch). I suspect, but am not certain, that this is some kind of a race where libgit2 reads a file while git is writing it, so we could try truncating some pack files to see if there's an easy reproduction. (I think we're a relatively unusual user of libgit2 where we are running on repos that the user may be actively interacting with git cli simultaneously). |
I hope that's not too uncommon — my first involvement with libgit2 was adding it to Visual Studio, and there was an expectation that there would be concurrent use in the VS plug-in and command-line users (Git for Windows). There's a difference here in file locking strategies between POSIX and Windows, of course. Git (and libgit2) produce an I think that we audited all the places that we write to the index and wrap them all with a lock. We try to be a bit more semantically thoughtful about holding the lock for the entirety of "the thing" we're doing, so we might take the lock a little bit longer to avoid having to take it multiple times to complete (what appears to the user to be) a single operation. Having said all that: the problem here is that the write lock is just a write lock. Readers don't obey it, it's not some shared rwlock or something. So I can indeed imagine a world where there's a write happening that we don't get complete visibility into:
|
Fascinatingly (and worryingly?) I was finally able to reproduce this crash (while trying to reproduce rust-lang/rust#124105) by:
- Using sshfs (via macFuse) to do
sshfs -o reconnect HOST:~/0 ~/sshfs
- Opening Zed in
~/sshfs
- Calling
diskutil umount force ~/sshfs
while Zed is running.
I think this means that the bounds check was probably a red-herring; and there is some other filesystem magic going on here that I don't understand.
What is supposed to happen when you force unmount an mmapped file anyway?!
Looking closer at the crash report, I (with hindsight) see that it was probably telling me this all along:
Termination Reason: Namespace SIGNAL, Code 10 Bus error: 10
Terminating Process: exc handler [11875]
VM Region Info: 0x11f3480f4 is in 0x11f348000-0x11f3b8000; bytes after start: 244 bytes before end: 458507
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
mapped file 11f340000-11f348000 [ 32K] r--/r-- SM=PRV Object_id=c33a63d
---> mapped file 11f348000-11f3b8000 [ 448K] r--/r-- SM=PRV Object_id=c2ee83d
GAP OF 0x108000 BYTES
VM_ALLOCATE 11f4c0000-11f4c4000 [ 16K] r--/rwx SM=COW
Kernel Triage:
VM - (arg = 0x0) Object has no pager because the backing vnode was force unmounted
VM - (arg = 0x0) Object has no pager because the backing vnode was force unmounted
Tracking here: zed-industries/zed#10992 |
Nothing good! 🥴 I don't think that there's much opportunity to avoid a bus error here, at least not without introducing some significant overhead and even then all I can imagine is that we would only lessen the problem and it would still have racy TOCTOU problems. I think that we should consider how to avoid mmap on flaky file systems. I don't know that the macOS APIs are to query here, but you can imagine that at repository open time we should try to detect whether we're on a flaky filesystem and downgrade to not-mmapping. The mechanics are an interesting question - should it be just sshfs? And remote filesystem? Something user-configurable? |
That's a good idea. I also saw there's a It looks like we may be able to detect this state with It seems like we could go down the "no mmap" path if I'm not sure about optionality – it seems nice to control it from the user side, but I also know that Zed is a few layers of abstraction away from libgit, so we'd need to pass something through the rust crates; it might be best if the default behavior was sensible, and there's a way to force it on/off as needed? (maybe If you're interested in shipping this, I'd love to help build it. Do you have any pointers on where to start, or would you be interested in pairing on this? |
Fixes: #6795