-
Notifications
You must be signed in to change notification settings - Fork 899
Diff: Add rename detection #573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
What kind of a sane API should we be using here? Should we always do the mostest possible rename detection? It looks like there's a test that expects that. If so, there's no external surface for this; if not, should it just be |
Just pushed a version that always does basic rename detection, and it makes that old skipped test pass. Is this what we want? |
I have sense of the aesthetic here, so I can't advise on what will feel good in this API, but I would advise against defaulting to "mostest possible rename detection" (i.e. with The more modest |
Have a look at #278 for discussion on rename/copies API. Also, https://github.com/dahlbyk/libgit2sharp/compare/gh278 |
Just ported in the work from #278, including all the great tests. Looks like there was a bit of debate as to how to expose all the rename-detection options. Right now there are just three booleans. It sort of seemed like a flags enum was the favored solution, with possibly a fluent API layered on top of it? Would something like this be okay? [Flags]
public enum RenameDetectionFlags
{
Renames = 1 << 0,
Copies = 1 << 1,
CopiesHarder = 1 << 3,
RenamesAndCopiesForUntrackedItems = 1 << 6,
IgnoreWhitespace = 1 << 12,
ConsiderWhitespace = 1 << 13,
ExactMatchOnly = 1 << 14,
}
public RenameDetectionFlags Flags { get; set; } |
Also: do we want to worry about all the break-rewrites-into-separate-adds-and-deletes flags? That adds a bit of complexity and quite a few tests, and I'm not sure of the use case for it. |
I think @nulltoken was thinking more a non-flags enum: public enum DetectionFlags
{
None,
Renames,
RenamesAndCopies,
RenamesAndCopiesHarder,
}
|
No:
|
In that case, it seems like maybe we should split out
|
In case there is any confusion regarding ignoring whitespace, there are three supported settings, not two: the default value ( Of course, |
Something like this, that namespace LibGit2Sharp
{
public sealed class SimilarityOptions
{
public enum RenameDetectionMode
{
// ...
}
public enum WhitespaceMode
{
// ...
}
public int RenameThreshold { get; set; }
public int RenameFromRewriteThreshold { get; set; }
public int CopyThreshold { get; set; }
public int BreakRewriteThreshold { get; set; }
public int RenameLimit { get; set; }
// TODO: similarity metric
}
} |
Rather than an additional argument to Compare, I was thinking a type like that set as a property on CompareOptions. |
For reference: @arrbee I'm a little confused which of those options imply which others. For example, does For this first pass, I'd suggest we pass on including options related to finding/breaking rewrites. |
@dahlbyk That's a good question and it should be answered more clearly in the docs. I So, just reading from the C code, the current implemented rules are:
Currently, that's it in the code. However, reading over the code, it also appears to me that the Anyhow, feedback is welcome - this corner of functionality has not had much scrutiny, so this is really the first external consideration of whether these rules actually make sense or not. |
That seems reasonable to me, in which case So it looks to me like we're good with four rename modes:
And to support I had forgotten about the |
GitDiffFindFlags.GIT_DIFF_FIND_COPIES | | ||
GitDiffFindFlags.GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED; | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dahlbyk: I think this logic is correct? (There's an early return above.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displaye 8000 d to describe this comment to others. Learn more.
👍
❤️ this API I wonder to what extent we should exercise the other options (whitespace, thresholds)? Perhaps just open an up-for-grabs issue that more extensive testing would not be a bad thing? By my reading, there are now only a handful of settings not supported:
|
string originalPath = Path.Combine(repo.Info.WorkingDirectory, "original.txt"); | ||
string renamedPath = Path.Combine(repo.Info.WorkingDirectory, "renamed.txt"); | ||
|
||
File.WriteAllText(originalPath, "a\nb\nc\nd\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we replace all the File.Writexxx()
calls with Touch()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I do agree! However, once again Travis is playing the PITA game.
I wonder if relying on a Scottish post-britpop band as a CI server was such a good idea... They seem very picky 😉 |
Good point. I guess the question here is how Or we'll just ignore it. I think you're right about the naming, though; I'll roll that change back for clarity.
This is basically the case now. |
I'd tend to go the other direction - if a consumer of this library cares how renames/copies are handled, they can specify At least, I thought that was the direction we were headed - as @arrbee describes the current implementation, the only missing piece is switching libgit2 to not detect renames if So that's what was rattling around in my head when I asked about the default tests - I was thinking of them as tests to confirm |
This would take a change at the libgit2 level; you can't pass |
Good point. Certainly not a reason to hold this up, just an edge case I wanted to mention. Thanks for your patience with me. Since the "no similarity detection" case is handled simply enough by not calling |
/// <summary> | ||
/// Obey the user's `diff.renames` configuration setting | ||
/// </summary> | ||
Default, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is also literally our default behavior, can we list this option first so it's default(RenameDetectionMode)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Bump
Bump |
That is not how it is implemented currently.
That is how it is implemented currently. If you pass NULL for the options structure to I feel pretty strongly that an explicitly chosen value in the options structure should take precedence over any setting in the config file, otherwise it is too difficult to override the config (just as an explicit command-line option should override a config value). Reading over this discussion, I see that the current implementation of Additionally, if you pass in |
This is done.
I just spiked out a test to check: no, it doesn't. I think it works by looking at add/remove pairs and seeing if their SHAs match. Do you think adding an |
Are you sure? I just modified the existing |
@arrbee: Let me see if I've got the design scenarios right:
(Aside: am I missing anything from that list?) (1) is accomplished with I'm proposing adding a |
So it is. Reading are hard.
Wouldn't hurt. |
That's my reading, and I think it makes sense. |
@ben Your four cases sound right to me.
Yes, I agree. I also intended to ask whether GIT_DIFF_FIND_DEFAULT should be the default (when an options struct is passed in) or whether GIT_DIFF_FIND_RENAMES should be the default if given other options. My thinking there is that if you are calling Regardless of what names we end up with for base libgit2, I certainly think the libgit2sharp names (where rename detection is more naturally integrated with the diff creation process) should be independent and natural (as I think you've achieved here). |
Hwhoops, you're right! I should be setting all the rename and copy flags along with the exact flag. This last commit does exactly that. |
Isn't it unlikely that one would set thresholds without also noticing
👍 Doesn't seem like it would be much more expensive to check for exact renames and copies, so might as well simplify the API and always check both. Good call. |
That makes sense to me. |
Alright, let's move the native-level discussion over to libgit2/libgit2; I'll spin up a PR. Is there anything more I should do with the C# layer here? |
Looks great to me! |
|
||
// TODO: custom metric | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😒
🔨 it 🙏? |
🔨'd. |
A. W. E. S. O. M. E. |
Currently, if a file is renamed,
Diff.Compare<TreeChanges>()
will return twoTreeEntryChanges
structures; one with the original file deleted, and one with the new file added. That just ain't right.IncludeUnmodified
functionality