10000 Move `git_off_t` to `git_object_size_t` by ethomson · Pull Request #5123 · libgit2/libgit2 · GitHub
[go: up one dir, main page]

Skip to content

Move git_off_t to git_object_size_t #5123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Nov 28, 2019
Merged

Move git_off_t to git_object_size_t #5123

merged 13 commits into from
Nov 28, 2019

Conversation

ethomson
Copy link
Member

git_off_t is problematic: first, it's used to represent the size of
a blob, but that's represented in git as an unsigned long, meaning
that we've lost a bit (on 64-bit machines). Second, git_off_t is a
poor name, suggesting that it's an offset, not a size (which makes
sense, given that it's a signed type).

Migrate git_off_t to git_object_size_t, an unsigned 64-bit type.

@ethomson ethomson force-pushed the ethomson/off_t branch 6 times, most recently from 88fc191 to f22e3fc Compare June 16, 2019 21:50
@ethomson
Copy link
Member Author

Aha, we do actually treat git_off_t as an offset in the packfile handling (and as a length, of course). I'll have to step through here and 👀 how we're dealing with this and tighten it up.

@ethomson ethomson force-pushed the ethomson/off_t branch 4 times, most recently from 5d18fc5 to 33ff267 Compare June 23, 2019 19:01
@ethomson ethomson force-pushed the ethomson/off_t branch 4 times, most recently from a2a587d to b871167 Compare July 20, 2019 20:30
@ethomson
Copy link
Member Author

Rebased to fix conflicts.

@ethomson
Copy link
Member Author

Rebased.

@ethomson
Copy link
Member Author

@pks-t any thoughts here? I'd love to think about a release 🔜 and include this.

@pks-t
Copy link
Member
pks-t commented Aug 21, 2019 via email

Copy link
Member
@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly good to me. There's two APIs where I'm sceptical about using git_object_size_t with the diff API and the futils interfaces, though

8000

@@ -603,7 +603,7 @@ typedef struct {
int new_lineno; /**< Line number in new file or -1 for deleted line */
int num_lines; /**< Number of newline characters in content */
size_t content_len; /**< Number of bytes of data */
git_off_t content_offset; /**< Offset in the original file to the content */
git_object_size_t content_offset; /**< Offset in the original file to the content */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really make sense to use git_object_size_t here? Diffs aren't necessary referring to any objects at all but, so it seems a bit dirty to me to mix these up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, perhaps not. The rare good use of git_off_t?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/diff_file.c Outdated
@@ -46,7 +46,7 @@ static int diff_file_content_init_common(
{
fc->opts_flags = opts ? opts->flags : GIT_DIFF_NORMAL;

if (opts && opts->max_size >= 0)
if (opts && (int64_t)opts->max_size >= 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh. This comparison doesn't make sense anymore, as max_size is now unsigned. If it becomes negative due to casting then I'd consider this a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, ignore my last comment. You're right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* Defaults to 512MB.
*/
git_off_t max_size;
git_object_size_t max_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this changes our API, doesn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, if callers ever provided a negative value, it was caught in diff_file_content_init_common, and ignored.
#5123 (diff) was doing the same. So as-is we are API compatible, but if we remove that odd-looking check, then we are in fact changing the API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this back to a signed int for sanity.

src/diff_xdiff.c Outdated
@@ -160,7 +160,7 @@ static int git_xdiff_cb(void *priv, mmbuffer_t *bufs, int len)
else if (line.origin == GIT_DIFF_LINE_DELETION)
line.content_offset = bufs[1].ptr - info->xd_old_data.ptr;
else
line.content_offset = -1;
line.content_offset = GIT_OBJECT_SIZE_MAX;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes semantics too, as we pass these to the callers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, can you give me more information about what you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're populating the git_diff_line struct here, which we directly pass to the user via the data_cb callback. So previously callers would've seen content_offset = -1, now he sees content_offset = UINT64_MAX if a line was unchanged. I could imagine people having checks for that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this back to a signed int to support this.

src/futils.c Outdated
@@ -112,7 +112,7 @@ int git_futils_truncate(const char *path, int mode)
return 0;
}

git_off_t git_futils_filesize(git_file fd)
int git_futils_filesize(git_object_size_t *out, git_file fd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< 6D4E button type="submit" data-view-component="true" class="Button--secondary Button--medium Button d-inline-block"> Hide comment

Not sure how I feel about these. off_t directly maps to the mmap functions of the OS, using git_object_size_t feels weird to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, mmap does take an off_t but only for the offset into the file. That doesn't correspond to the length of the file or memory-mapped segment.

I definitely don't think that we should use a signed type for file length, it simply can't be negative. We wouldn't want to use size_t here, since on a 32-bit system, we want to be able to get the size of a file larger than 4 GB.

We could call this the more direct uint64_t here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum. Some platforms handle negative offsets in mmap just fine and treat it as valid. It's questionable though whether we're going to ever need that, especially considering that we'd have to fight with cross-platform behaviour here. So 👍 for uint64_t.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated for uint64_t

@ethomson ethomson force-pushed the ethomson/off_t branch 2 times, most recently from c689857 to 284d1d9 Compare October 28, 2019 11:07
@ethomson ethomson mentioned this pull request Nov 1, 2019
2 tasks
Copy link
Contributor
@tiennou tiennou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. I think I'd like it better as a single global: deprecate git_off_t commit, since it's mostly about typechanging things around, but that's minor. Thanks for fixing this @ethomson !

src/futils.h Outdated
@@ -330,7 +330,7 @@ extern int git_futils_fake_symlink(const char *new, const char *old);
*/
typedef struct {
struct timespec mtime;
git_off_t size;
git_object_size_t size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some file APIs now use uint64_t, some use git_object_size_t. Shouldn't this one here use uint64_t, as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this probably makes sense.

@@ -235,7 +235,7 @@ int git__mmap_alignment(size_t *alignment)
}


int p_mmap(git_map *out, size_t len, int prot, int flags, int fd, git_off_t offset)
int p_mmap(git_map *out, size_t len, int prot, int flags, int fd, off64_t offset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question, no need to change: why did you opt to convert to off64_t instead of just typedeffing git_off_t to off64_t?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that it was clearer for internal uses to use something Very Obviously 64 bit, and something that is a Standard Type. git_off_t feels like the sort of thing that is perhaps useful for our external API, but less useful for our internal.

Your comment did make me rethink a few things, so I actually kept git_off_t in the external type places that it was used. I think this is a pretty reasonable compromise.

src/win32/map.c Outdated
@@ -99,7 +99,7 @@ int p_mmap(git_map *out, size_t len, int prot, int flags, int fd, git_off_t offs
return -1;
}

assert(sizeof(git_off_t) == 8);
assert(sizeof(off64_t) == 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense? I mean it's very unlikely that off64_t is ever going to be something else than 8 bytes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬 Yes, that's very true. Removed entirely.

Introduce `git_object_size_t`, an unsigned type that we can use for the
maximum size of git objects.
Instead of using a signed type (`off_t`) use a new `git_object_size_t`
for the sizes of objects.
Instead of using a signed type (`off_t`) use a new `git_object_size_t`
for the sizes of objects.
Instead of using a signed type (`off_t`) use `uint64_t` for the maximum
size of files.
Instead of using a signed type (`off_t`) use a new `git_object_size_t`
for the sizes of objects.
Instead of using a signed type (`off_t`) use an unsigned `uint64_t` for
the size of the files.
@ethomson ethomson force-pushed the ethomson/off_t branch 7 times, most recently from 994e41e to 6c13cf6 Compare November 22, 2019 23:18
64 bit types are always 64 bit.
Prefer `off64_t` to `git_off_t` for internal visibility.
Prefer `off64_t` to `git_off_t` internally for visibility.
Use int64_t internally for type visibility.
Prefer `off64_t` internally.
Copy link
Member
@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, Ed!

if (!fc->file->size)
error = git_futils_filesize(&fc->file->size, fd);

if (error < 0 || !fc->file->size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does look a bit weird. Seems like we do not allow loading empty files from disk here. Your change only makes code a lot clearer here, so I'm fine with this as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0