8000 Add example program to emulate 'git log' and create pathspec matching API by arrbee · Pull Request #1711 · libgit2/libgit2 · GitHub
[go: up one dir, main page]

Skip to content

Add example program to emulate 'git log' and create pathspec matching API #1711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

arrbee
Copy link
Member
@arrbee arrbee commented Jul 5, 2013

I started this a little while ago as an attempt to write an example program that would mimic git log behavior. I ended up with that (albeit only supporting a small fraction of the options), plus new public APIs for "compiling" a pathspec and matching against the working directory, the index, or an arbitrary tree. I also added a couple of small adjustments that probably won't make much difference - a way to get the raw text of a commit header (which changed the commit parsing internals slightly since I now keep the commit header text around) and a GIT_INDEX_STAGE_ANY constant that can be used during index lookups if you just want to find out if a given file is mentioned anywhere in the index regardless of the stage.

The log example is far from complete. In particular, I didn't really do any formatting, including no support currently for --oneline or anything like that. Also, there is no color output or pagination, so it's kind of a pain to use. However, once the basic log example is in place, it becomes more of a side project or newbie project to add support for various command line options.

The log example does support several of the simple options (--max-count and --skip, --min-parents and --max-parents) and it supports two of the more complicated options: pathname matching and patch printing (-p). The pathname matching was particularly interesting - it is the reason that I created the public pathspec API (although in the end, it generally works off of diff, not off of the pathspec API) and because it took me a while to get the rules right for merge commits. The diff display was pretty easy given that we have a working example of diff.

Comments and criticism welcome! There are probably some parts of the example that are a little odd, so I'd love feedback.

arrbee added 12 commits July 3, 2013 14:56
This adds a new public API for compiling pathspecs and matching
them against the working directory, the index, or a tree from the
repository.  This also reworks the pathspec internals to allow the
sharing of code between the existing internal usage of pathspec
matching and the new external API.

While this is working and the new API is ready for discussion, I
think there is still an incorrect behavior in which patterns are
always matched against the full path of an entry without taking
the subdirectories into account (so "s*" will match "subdir/file"
even though it wouldn't with core Git).  Further enhancements are
coming, but this was a good place to take a functional snapshot.
This adds more command line processing to the example version of
log.  In particular, this adds the funky command line processing
that allows an arbitrary series of revisions followed by an
arbitrary number of paths and/or glob patterns.

The actual logging part still isn't implemented.
This fixes the way the example log program decides if a merge
commit should be shown when a pathspec is given.  Also makes it
easier to use the pathspec API to just check "does a tree match
anything in the pathspec" without allocating a match list.
if (parent_count < 1)
parent_count = 1;

if (git_vector_init(&commit->parent_ids, parent_count, NULL) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be an ideal place to use the typedef array you created. The vector here has always felt overkill.

This converts the array of parent SHAs from a git_vector where
each SHA has to be separately allocated to a git_array_t where
all the SHAs can be kept in one block.  Since the two collections
have almost identical APIs, there isn't much involved in making
the change.  I did add an API to git_array_t so that it could be
allocated at a precise initial size.
@arrbee
Copy link
Member Author
arrbee commented Jul 8, 2013

@vmg I updated the commit object to keep a git_array_t(git_oid) for the parent ids. As always, another great suggestion! I ran valgrind on the result and everything looks clean to me.

@arrbee
Copy link
Member Author
arrbee commented Jul 8, 2013

@nulltoken This work includes a new public API for "compiling" a pathspec and matching it against the working directory, the index, or a given tree - see https://github.com/libgit2/libgit2/pull/1711/files#diff-6 - I'd really love your feedback if you have a chance.

I tried to cover various use cases like "just tell if it matches or not", "tell me the list of files that match", and "tell me the list of patterns that had no match." I'm don't think that the various case-insensitivity issues are really addressed at this point, but I think that fixing them will be part of an eventual fix for #1689 and not part of this PR.

* Compile a pathspec
*
* @param out Output of the compiled pathspec
* @param flags Combination of git_pathspec_flag_t values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused parameter?

@arrbee
Copy link
Member Author
arrbee commented Jul 8, 2013

Thanks @nulltoken - I fixed that problem and a number of other doc issues in the header (and tried to make the explanations clearer in general). The docs could probably still be even better, but I think at least there are no outright errors now.

@arrbee
Copy link
Member Author
arrbee commented Jul 8, 2013

There is one more change I'd like to make to this PR based on looking at some of the old issues that are open related to implement log behavior.

I'd like to add a git_pathspec_match_diff() function that allows you to build a match list of the delta objects that match a pathspec. This may initially seem redundant with the pathspec matching built in to diff generation, but when renames come into play, I think it is not. To track a file across renames, you can't use the built in diff pathspec matching because that will filter out possible rename sources or targets - you have to generate the entire diff. Then you would call this proposed API to check if the diff results (with renames) actually match the named file. The match list should be able to tell you not just that it matched, but which git_diff_delta matched (so that you can actually track the new name of the file across the rename).

Anyhow, I'm already working on this extension, but I'd like to hold off on merging this PR until I have it ready!

@arrbee
Copy link
Member Author
arrbee commented Jul 8, 2013

Adding the function to match a pathspec to a diff has led to a lot of refactoring so that new function won't have to duplicate all the code in the old function. It's fine but it's just a little bigger project that I was initially thinking. Still, I think it will improve the code in the long run.

arrbee added 2 commits July 8, 2013 22:42
This is a simple bit vector object that is not resizable after
the initial allocation but can be of arbitrary size.  It will
keep the bti vector entirely on the stack for vectors 64 bits
or less, and will allocate the vector on the heap for larger
sizes.  The API is uniform regardless of storage location.

This is very basic right now and all the APIs are inline functions,
but it is useful for storing an array of boolean values.
This adds an additional pathspec API that will match a pathspec
against a diff object.  This is convenient if you want to handle
renames (so you need the whole diff and can't use the pathspec
constraint built into the diff API) but still want to tell if the
diff had any files that matched the pathspec.

When the pathspec is matched against a diff, instead of keeping
a list of filenames that matched, instead the API keeps the list
of git_diff_deltas that matched and they can be retrieved via a
new API git_pathspec_match_list_diff_entry.

There are a couple of other minor API extensions here that were
mostly for the sake of convenience and to reduce dependencies
on knowing the internal data structure between files inside the
library.
@arrbee
Copy link
Member Author
arrbee commented Jul 9, 2013

Okay, I added in:

GIT_EXTERN(int) git_pathspec_match_diff(
    git_pathspec_match_list **out,
    git_diff_list *diff,
    uint32_t flags,
    git_pathspec *ps);

This is designed to be useful if you want to apply rename detection (so you need all the deltas in the diff object and can't use the pathspec built in to the diff options) but you still want to understand if the diff results contain any items that match your pathspec.

Along with this, I also reorganized the code in pathspec.c so there could be more code sharing between the two inner function bodies (i.e. the diff match and all the other match function which use iterators).

As part of the code cleanup, instead of allocating an array of uint8_t to store the boolean values that store which patterns have been matched and which have not, I wrote a small bit vector object to track things. The bit vector is ridiculously simplistic, but it has the one nice feature that a vector smaller that 64 bits can be managed without heap allocation and the choice of stack vs. heap allocation is transparent in the API. Feel free to pick it apart!

@vmg
Copy link
Member
vmg commented Jul 10, 2013

MERGED MANUALLY

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0