8000 Bounds check for pack index read by ConradIrwin · Pull Request #6796 · libgit2/libgit2 · GitHub
[go: up one dir, main page]

Skip to content

Bounds check for pack index read #6796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 23, 2024
Merged

Bounds check for pack index read #6796

merged 2 commits into from
Apr 23, 2024

Conversation

ConradIrwin
Copy link
Contributor

Fixes: #6795

Fixes: libgit2#6795

Co-Authored-By: Bennet <bennetbo@gmx.de>
8000
@@ -1525,6 +1525,14 @@ static int pack_entry_find_offset(
if (p->index_version > 1) {
level1_ofs += 2;
index += 8;

if ((int)short_oid->id[0] + 2 >= p->index_map.len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you need to upcast this to a size_t to be able to compare against the len (which is a size_t itself).

@ethomson
Copy link
Member

Thanks for the PR - I pushed a change to cast to size_t instead of int. 🙏

Any chance you have a corrupted index that you could share, so that we could add to the test corpus?

@ethomson
Copy link
Member

I’m going to go ahead and merge this, but @ConradIrwin if you have some corrupt index files that we can add to the test resources, that would be helpful! We can add them in separately if so.

@ethomson
Copy link
Member

Thanks again!

@ethomson ethomson merged commit 85d42ea into libgit2:main Apr 23, 2024
18 of 19 checks passed
@ConradIrwin
Copy link
Contributor Author

Sorry for the slow response here, and thanks for fixing my fix!

I haven't been able to reproduce this crash locally (but it is coming through in our telemetry a bunch). I suspect, but am not certain, that this is some kind of a race where libgit2 reads a file while git is writing it, so we could try truncating some pack files to see if there's an easy reproduction. (I think we're a relatively unusual user of libgit2 where we are running on repos that the user may be actively interacting with git cli simultaneously).

@ethomson
Copy link
Member

I hope that's not too uncommon — my first involvement with libgit2 was adding it to Visual Studio, and there was an expectation that there would be concurrent use in the VS plug-in and command-line users (Git for Windows). There's a difference here in file locking strategies between POSIX and Windows, of course.

Git (and libgit2) produce an index.lock file when they're going to start writing to the index, to prevent concurrent writes. (I think that there are places where git does not hold this long over the entirety of a semantic operation. Meaning, when you run a git command, it actually is implemented as multiple independent invocations of other commands, so it may lock and unlock the index repeatedly, which raises some questions about consistency. I mention this only for explanation of how this locking works, I don't mean to suggest that's the culprit here.)

I think that we audited all the places that we write to the index and wrap them all with a lock. We try to be a bit more semantically thoughtful about holding the lock for the entirety of "the thing" we're doing, so we might take the lock a little bit longer to avoid having to take it multiple times to complete (what appears to the user to be) a single operation.

Having said all that: the problem here is that the write lock is just a write lock. Readers don't obey it, it's not some shared rwlock or something. So I can indeed imagine a world where there's a write happening that we don't get complete visibility into:

morbo:~/Temp% touch .git/index.lock
morbo:~/Temp% git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
morbo:~/Temp% lsl .git/index.lock
0 -rw-r--r--@ 1 ethomson  staff  0 Apr 23 20:53 .git/index.lock
morbo:~/Temp% ls
./                 .git/              LICENSE            package.json
../                .github/           README.md
.app.js.swp        .gitignore         app.js
.eslintrc.cjs      .swp               package-lock.json
morbo:~/Temp% touch foo
morbo:~/Temp% git add foo
fatal: Unable to create '/Users/ethomson/Temp/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
morbo:~/Temp/%

@ConradIrwin
Copy link
Contributor Author

Fascinatingly (and worryingly?) I was finally able to reproduce this crash (while trying to reproduce rust-lang/rust#124105) by:

  • Using sshfs (via macFuse) to do sshfs -o reconnect HOST:~/0 ~/sshfs
  • Opening Zed in ~/sshfs
  • Calling diskutil umount force ~/sshfs while Zed is running.

I think this means that the bounds check was probably a red-herring; and there is some other filesystem magic going on here that I don't understand.

What is supposed to happen when you force unmount an mmapped file anyway?!

Looking closer at the crash report, I (with hindsight) see that it was probably telling me this all along:

Termination Reason:    Namespace SIGNAL, Code 10 Bus error: 10
Terminating Process:   exc handler [11875]

VM Region Info: 0x11f3480f4 is in 0x11f348000-0x11f3b8000;  bytes after start: 244  bytes before end: 458507
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      mapped file                 11f340000-11f348000    [   32K] r--/r-- SM=PRV  Object_id=c33a63d
--->  mapped file                 11f348000-11f3b8000    [  448K] r--/r-- SM=PRV  Object_id=c2ee83d
      GAP OF 0x108000 BYTES
      VM_ALLOCATE                 11f4c0000-11f4c4000    [   16K] r--/rwx SM=COW  

Kernel Triage:
VM - (arg = 0x0) Object has no pager because the backing vnode was force unmounted
VM - (arg = 0x0) Object has no pager because the backing vnode was force unmounted

@ConradIrwin
Copy link
Contributor Author

Tracking here: zed-industries/zed#10992

@ethomson
Copy link
Member

What is supposed to happen when you force unmount an mmapped file anyway?!

Nothing good! 🥴

I don't think that there's much opportunity to avoid a bus error here, at least not without introducing some significant overhead and even then all I can imagine is that we would only lessen the problem and it would still have racy TOCTOU problems.

I think that we should consider how to avoid mmap on flaky file systems. I don't know that the macOS APIs are to query here, but you can imagine that at repository open time we should try to detect whether we're on a flaky filesystem and downgrade to not-mmapping.

The mechanics are an interesting question - should it be just sshfs? And remote filesystem? Something user-configurable?

@ConradIrwin
Copy link
Contributor Author

That's a good idea. I also saw there's a NO_MMAP compile flag, but not sure what the performance impact is.

It looks like we may be able to detect this state with statfs https://gist.github.com/ConradIrwin/61684c273936ac9bc27d785dfc1b4cf5 (though I'm not sure about false positives/negatives).

It seems like we could go down the "no mmap" path if MNT_LOCAL is not set; or we could be more conservative and just blacklist anything with the name "macfuse" (or "nfs" or "smbfs", more testing needed?). (And if the filesystem isn't local, then likely it's not very fast either, so we don't need to worry too much about a slowdown).

I'm not sure about optionality – it seems nice to control it from the user side, but I also know that Zed is a few layers of abstraction away from libgit, so we'd need to pass something through the rust crates; it might be best if the default behavior was sensible, and there's a way to force it on/off as needed? (maybe ALWAYS_MMAP to go alongside NO_MMAP?).

If you're interested in shipping this, I'd love to help build it. Do you have any pointers on where to start, or would you be interested in pairing on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bus error 10: in pack_entry_find_offset
2 participants
0