Bounds check for pack index read #6796

ConradIrwin · 2024-04-15T15:52:09Z

Fixes: libgit2#6795 Co-Authored-By: Bennet <bennetbo@gmx.de>

ethomson · 2024-04-19T08:53:34Z

src/libgit2/pack.c

@@ -1525,6 +1525,14 @@ static int pack_entry_find_offset(
 	if (p->index_version > 1) {
 		level1_ofs += 2;
 		index += 8;
+
+		if ((int)short_oid->id[0] + 2 >= p->index_map.len) {


I think that you need to upcast this to a size_t to be able to compare against the len (which is a size_t itself).

ethomson · 2024-04-19T09:01:03Z

Thanks for the PR - I pushed a change to cast to size_t instead of int. 🙏

Any chance you have a corrupted index that you could share, so that we could add to the test corpus?

ethomson · 2024-04-23T07:11:47Z

I’m going to go ahead and merge this, but @ConradIrwin if you have some corrupt index files that we can add to the test resources, that would be helpful! We can add them in separately if so.

ethomson · 2024-04-23T07:11:53Z

Thanks again!

ConradIrwin · 2024-04-23T15:11:31Z

Sorry for the slow response here, and thanks for fixing my fix!

I haven't been able to reproduce this crash locally (but it is coming through in our telemetry a bunch). I suspect, but am not certain, that this is some kind of a race where libgit2 reads a file while git is writing it, so we could try truncating some pack files to see if there's an easy reproduction. (I think we're a relatively unusual user of libgit2 where we are running on repos that the user may be actively interacting with git cli simultaneously).

ethomson · 2024-04-23T20:03:11Z

I hope that's not too uncommon — my first involvement with libgit2 was adding it to Visual Studio, and there was an expectation that there would be concurrent use in the VS plug-in and command-line users (Git for Windows). There's a difference here in file locking strategies between POSIX and Windows, of course.

Git (and libgit2) produce an index.lock file when they're going to start writing to the index, to prevent concurrent writes. (I think that there are places where git does not hold this long over the entirety of a semantic operation. Meaning, when you run a git command, it actually is implemented as multiple independent invocations of other commands, so it may lock and unlock the index repeatedly, which raises some questions about consistency. I mention this only for explanation of how this locking works, I don't mean to suggest that's the culprit here.)

I think that we audited all the places that we write to the index and wrap them all with a lock. We try to be a bit more semantically thoughtful about holding the lock for the entirety of "the thing" we're doing, so we might take the lock a little bit longer to avoid having to take it multiple times to complete (what appears to the user to be) a single operation.

Having said all that: the problem here is that the write lock is just a write lock. Readers don't obey it, it's not some shared rwlock or something. So I can indeed imagine a world where there's a write happening that we don't get complete visibility into:

morbo:~/Temp% touch .git/index.lock
morbo:~/Temp% git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
morbo:~/Temp% lsl .git/index.lock
0 -rw-r--r--@ 1 ethomson  staff  0 Apr 23 20:53 .git/index.lock
morbo:~/Temp% ls
./                 .git/              LICENSE            package.json
../                .github/           README.md
.app.js.swp        .gitignore         app.js
.eslintrc.cjs      .swp               package-lock.json
morbo:~/Temp% touch foo
morbo:~/Temp% git add foo
fatal: Unable to create '/Users/ethomson/Temp/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
morbo:~/Temp/%

ConradIrwin · 2024-04-25T04:53:58Z

Fascinatingly (and worryingly?) I was finally able to reproduce this crash (while trying to reproduce rust-lang/rust#124105) by:

Using sshfs (via macFuse) to do sshfs -o reconnect HOST:~/0 ~/sshfs
Opening Zed in ~/sshfs
Calling diskutil umount force ~/sshfs while Zed is running.

I think this means that the bounds check was probably a red-herring; and there is some other filesystem magic going on here that I don't understand.

What is supposed to happen when you force unmount an mmapped file anyway?!

Looking closer at the crash report, I (with hindsight) see that it was probably telling me this all along:

Termination Reason:    Namespace SIGNAL, Code 10 Bus error: 10
Terminating Process:   exc handler [11875]

VM Region Info: 0x11f3480f4 is in 0x11f348000-0x11f3b8000;  bytes after start: 244  bytes before end: 458507
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      mapped file                 11f340000-11f348000    [   32K] r--/r-- SM=PRV  Object_id=c33a63d
--->  mapped file                 11f348000-11f3b8000    [  448K] r--/r-- SM=PRV  Object_id=c2ee83d
      GAP OF 0x108000 BYTES
      VM_ALLOCATE                 11f4c0000-11f4c4000    [   16K] r--/rwx SM=COW  

Kernel Triage:
VM - (arg = 0x0) Object has no pager because the backing vnode was force unmounted
VM - (arg = 0x0) Object has no pager because the backing vnode was force unmounted

ConradIrwin · 2024-04-25T15:10:29Z

Tracking here: zed-industries/zed#10992

ethomson · 2024-04-25T19:51:35Z

What is supposed to happen when you force unmount an mmapped file anyway?!

Nothing good! 🥴

I don't think that there's much opportunity to avoid a bus error here, at least not without introducing some significant overhead and even then all I can imagine is that we would only lessen the problem and it would still have racy TOCTOU problems.

I think that we should consider how to avoid mmap on flaky file systems. I don't know that the macOS APIs are to query here, but you can imagine that at repository open time we should try to detect whether we're on a flaky filesystem and downgrade to not-mmapping.

The mechanics are an interesting question - should it be just sshfs? And remote filesystem? Something user-configurable?

ConradIrwin · 2024-04-26T04:25:50Z

That's a good idea. I also saw there's a NO_MMAP compile flag, but not sure what the performance impact is.

It looks like we may be able to detect this state with statfs https://gist.github.com/ConradIrwin/61684c273936ac9bc27d785dfc1b4cf5 (though I'm not sure about false positives/negatives).

It seems like we could go down the "no mmap" path if MNT_LOCAL is not set; or we could be more conservative and just blacklist anything with the name "macfuse" (or "nfs" or "smbfs", more testing needed?). (And if the filesystem isn't local, then likely it's not very fast either, so we don't need to worry too much about a slowdown).

I'm not sure about optionality – it seems nice to control it from the user side, but I also know that Zed is a few layers of abstraction away from libgit, so we'd need to pass something through the rust crates; it might be best if the default behavior was sensible, and there's a way to force it on/off as needed? (maybe ALWAYS_MMAP to go alongside NO_MMAP?).

If you're interested in shipping this, I'd love to help build it. Do you have any pointers on where to start, or would you be interested in pairing on this?

Bounds check for pack index read

6e8227a

Fixes: libgit2#6795 Co-Authored-By: Bennet <bennetbo@gmx.de>

ethomson reviewed Apr 19, 2024

View reviewed changes

pack: upcast to size_t for bounds checking

06cafdf

ethomson merged commit 85d42ea into libgit2:main Apr 23, 2024
18 of 19 checks passed

ConradIrwin mentioned this pull request Apr 26, 2024

panic: unexpected error during closedir "Bad file descriptor" for unix::fs::Dir rust-lang/rust#124105

Closed

ghost mentioned this pull request Jun 3, 2024

libgit2@1.6 1.8.1 Homebrew/homebrew-core#173530

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bounds check for pack index read #6796

Bounds check for pack index read #6796

Bounds check for pack index read #6796

Bounds check for pack index read #6796

Conversation

Choose a reason for hiding this comment