-
Notifications
You must be signed in to change notification settings - Fork 899
Add rename detection to tree comparisons #278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm not sure if it's the cause of your problems, but right now the underlying libgit2 code only finds exact matches for renames; the actual file similarity metric is not hooked up. Finishing that part up is about 3rd on my list of priorities. |
Any way I can help with the similarity metric? |
I actually have it implemented as a standalone snippet of code already. When I get a chance, I'll make a gist of it and link it so you can take a look and see what you think... |
@@ -76,6 +76,11 @@ public virtual TreeChanges Compare(Tree oldTree, Tree newTree, IEnumerable<strin | |||
newTree != null ? newTree.Id : null, | |||
options)) | |||
{ | |||
Proxy.git_diff_find_similar(diff, new GitDiffFindOptions | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIND_RENAMES
being the default, passing null would be enough there
@roend83 Awesome addition! Once you're done with it, could you please consider covering those new features with tests? Below, a first one to get you started ;-) diff --git a/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs b/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs
index e8d10b0..04ece39 100644
--- a/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs
+++ b/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs
@@ -167,6 +167,43 @@ public void CanDetectTheRenamingOfAModifiedFile()
}
}
+ [Fact]
+ public void CanDetectTheRenamingAndTheCopyingOfNonModifiedFiles()
+ {
+ SelfCleaningDirectory scd = BuildSelfCleaningDirectory();
+ using (var repo = Repository.Init(scd.DirectoryPath))
+ {
+ string originalPath = Path.Combine(repo.Info.WorkingDirectory, "original.txt");
+ string unmodifiedPath = Path.Combine(repo.Info.WorkingDirectory, "unmodified.txt");
+ string copiedPath = Path.Combine(repo.Info.WorkingDirectory, "copied.txt");
+ string alsoCopiedPath = Path.Combine(repo.Info.WorkingDirectory, "alsocopied.txt");
+ string renamedPath = Path.Combine(repo.Info.WorkingDirectory, "renamed.txt");
+
+ File.WriteAllText(originalPath, "a\nb\nc\nd\n");
+ File.WriteAllText(unmodifiedPath, "1\n2\n3\n4\n");
+
+ repo.Index.Stage(originalPath);
+ repo.Index.Stage(unmodifiedPath);
+
+ Commit old = repo.Commit("Initial", DummySignature, DummySignature);
+
+ File.Copy(originalPath, copiedPath);
+ File.Copy(originalPath, alsoCopiedPath);
+
+ repo.Index.Stage(copiedPath);
+ repo.Index.Stage(alsoCopiedPath);
+
+ repo.Index.Move(originalPath, renamedPath);
+
+ Commit @new = repo.Commit("Updated", DummySignature, DummySignature);
+
+ TreeChanges changes = repo.Diff.Compare(old.Tree, @new.Tree);
+ Assert.Equal(3, changes.Count());
+ Assert.Equal(2, changes.Copied.Count());
+ Assert.Equal(1, changes.Renamed.Count());
+ }
+ }
+
/*
* $ git diff f8d44d7..ec9e401
* diff --git a/numbers.txt b/numbers.txt |
…sts. Note that copy tests are currently failing.
@nulltoken I tried to add some tests for this, but the copy detection isn't working how I assumed that it would. Do you have some time to look at the failing tests? @arrbee Any updates on partial rename detection? |
@roend83 Mistery solved! Your tests now pass. Sorry, this is a bit hacky, but I hadn't had much time. Regarding the proposed API, I'd prefer to have the options exposed through a enum rather than booleans. Maybe something like diff --git a/LibGit2Sharp/Diff.cs b/LibGit2Sharp/Diff.cs
index 5dadd7c..f0fb507 100644
--- a/LibGit2Sharp/Diff.cs
+++ b/LibGit2Sharp/Diff.cs
@@ -29,6 +29,11 @@ private GitDiffOptions BuildOptions(DiffOptions diffOptions, IEnumerable<string>
GitDiffOptionFlags.GIT_DIFF_INCLUDE_UNTRACKED_CONTENT;
}
+ if (diffOptions.HasFlag(DiffOptions.IncludeUnmodified))
+ {
+ options.Flags |= GitDiffOptionFlags.GIT_DIFF_INCLUDE_UNMODIFIED;
+ }
+
if (paths == null)
{
return options;
@@ -104,7 +109,14 @@ internal Diff(Repository repo)
/// </returns>
public virtual TreeChanges Compare(Tree oldTree, Tree newTree, IEnumerable<string> paths = null, bool detectRenames = false, bool detectCopies = false)
{
- using(GitDiffOptions options = BuildOptions(DiffOptions.None, paths))
+ var opts = DiffOptions.None;
+
+ if (detectCopies)
+ {
+ opts |= DiffOptions.IncludeUnmodified;
+ }
+
+ using(GitDiffOptions options = BuildOptions(opts, paths))
using (DiffListSafeHandle diff = BuildDiffListFromTrees(
oldTree != null ? oldTree.Id : null,
newTree != null ? newTree.Id : null,
@@ -124,22 +136,25 @@ private DiffListSafeHandle BuildDiffListFromTrees(ObjectId oldTree, ObjectId new
private DiffListSafeHandle HandleRenameAndCopyDetection(DiffListSafeHandle diff, bool detectRenames, bool detectCopies)
{
- if (detectRenames || detectCopies)
+ if (!detectRenames && !detectCopies)
{
- var diffFindOptions = new GitDiffFindOptions();
+ return diff;
+ }
- if (!detectCopies)
- {
- diffFindOptions.Flags = GitDiffFindOptionFlags.GIT_DIFF_FIND_RENAMES;
- }
- else if (!detectRenames)
- {
- diffFindOptions.Flags = GitDiffFindOptionFlags.GIT_DIFF_FIND_COPIES;
- }
+ var diffFindOptions = new GitDiffFindOptions();
- Proxy.git_diff_find_similar(diff, diffFindOptions);
+ if (detectRenames)
+ {
+ diffFindOptions.Flags |= GitDiffFindOptionFlags.GIT_DIFF_FIND_RENAMES;
}
+ if (detectCopies)
+ {
+ diffFindOptions.Flags |= GitDiffFindOptionFlags.GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED;
+ }
+
+ Proxy.git_diff_find_similar(diff, diffFindOptions);
+
return diff;
}
diff --git a/LibGit2Sharp/DiffOptions.cs b/LibGit2Sharp/DiffOptions.cs
index 133e5a4..d099f03 100644
--- a/LibGit2Sharp/DiffOptions.cs
+++ b/LibGit2Sharp/DiffOptions.cs
@@ -19,5 +19,11 @@ internal enum DiffOptions
/// diffing against the working directory.
/// </summary>
IncludeUntracked,
+
+ /// <summary>
+ /// Include unmodified files among the files to be processed. This
+ /// is a requirement when emulating --find-copies-harder
+ /// </summary>
+ IncludeUnmodified,
}
}
diff --git a/LibGit2Sharp/TreeChanges.cs b/LibGit2Sharp/TreeChanges.cs
index 09d64f3..2a1029c 100644
--- a/LibGit2Sharp/TreeChanges.cs
+++ b/LibGit2Sharp/TreeChanges.cs
@@ -63,6 +63,12 @@ private int PrintCallBack(GitDiffDelta delta, GitDiffRange range, GitDiffLineOri
string formattedoutput = Utf8Marshaler.FromNative(content, (int)contentlen);
TreeEntryChanges currentChange = AddFileChange(delta, lineorigin);
+
+ if (currentChange == null)
+ {
+ return 0;
+ }
+
AddLineChange(currentChange, lineorigin);
currentChange.AppendToPatch(formattedoutput);
@@ -89,6 +95,11 @@ private void AddLineChange(Changes currentChange, GitDiffLineOrigin lineOrigin)
private TreeEntryChanges AddFileChange(GitDiffDelta delta, GitDiffLineOrigin lineorigin)
{
+ if (delta.Status == ChangeKind.Unmodified)
+ {
+ return null;
+ }
+
var newFilePath = FilePathMarshaler.FromNative(delta.NewFile.Path);
if (lineorigin != GitDiffLineOrigin.GIT_DIFF_LINE_FILE_HDR) |
@nulltoken What do you think of this API? I went with a fluent API instead of an enumeration to allow you to pass the threshold value along too. If you like it, I'll go ahead and get it added to the rest of the diffing functions. |
@roend83 Good catch! I indeed completely forgot about the threshold values. The fluent interface is a nice idea. However, after having playing with the idea of it for some days, I'd prefer something slightly less elaborate. I'd more inclined to make the In order to keep things simple (until some more complex requirement is raised), I think we should consider making One possible option would be to initialize the similarity index threshold to 50% (Git default) and only consider its value when the detection behavior enum is switched from Note: Maybe would Note 2: @ben @jamill What would you think about rewriting of the signatures of |
I like it. |
Using an enum kind of backs us into the corner should we want to supplement with configurable threshold and whatnot in the future. Why not something like this? class CompareOptions {
public CompareOptions(bool renames = false, int renameThreshold = 50, bool copies = false, int copiesThreshold = 50) { ... }
public static CompareOptions None { get { return new CompareOptions(); } }
public static CompareOptions FindRenames { get { return new CompareOptions(renames: true); } }
public static CompareOptions FindRenamesAndCopies { get { return new CompareOptions(renames: true, copies: true); } }
} A fluent config API could even be layered on top of this type (perhaps in a Fluent namespace?) for those that prefer that style to normal object construction. |
Smart refactoring ;-) !
I'd rather wait for real world use cases (and bugs) regarding the need for a second threshold. So, for now, I'd rather keep it simple with one threshold and an enum, and deploy the However, when it's starting to be painful to maintain (because of new params, or kind of params), of course we'll select a new form of |
Result of tonight's procrastination: https://github.com/dahlbyk/libgit2sharp/compare/gh278 I left the API close to how @roend83 had started, but applied to It's worth noting that libgit2's "find" API has evolved quite a bit since this PR was created... the binding will need to be updated, too. |
@dahlbyk As the flags seem to be cumulative, I wonder if we shouldn't rather make Thoughts? |
You had suggested that before, and I agree. It may not be that simple though, given the other combinations of find options we may want to support (rewrite handling, detection for untracked, whitespace handling). @roend83, do you want to take it from here? |
@dahlbyk The libgit2 binaries have just been updated |
Superseded by #573 Cheers! |
I'm trying to add support for detecting renames in tree comparisons using the git_diff_find_similar function in libgit2. I think I have the back-end all implemented to call the function. To try to see if it works I've hard-coded the compare function to find renames, but it doesn't seem to find renames at all. Can someone take a look at this and tell me what I'm doing wrong?