8000 Add rename detection to tree comparisons by roend83 · Pull Request #278 · libgit2/libgit2sharp · GitHub
[go: up one dir, main page]

Skip to content

Add rename detection to tree comparisons #278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

roend83
Copy link
Contributor
@roend83 roend83 commented Dec 20, 2012

I'm trying to add support for detecting renames in tree comparisons using the git_diff_find_similar function in libgit2. I think I have the back-end all implemented to call the function. To try to see if it works I've hard-coded the compare function to find renames, but it doesn't seem to find renames at all. Can someone take a look at this and tell me what I'm doing wrong?

@arrbee
Copy link
Member
arrbee commented Dec 21, 2012

it doesn't seem to find renames at all

I'm not sure if it's the cause of your problems, but right now the underlying libgit2 code only finds exact matches for renames; the actual file similarity metric is not hooked up. Finishing that part up is about 3rd on my list of priorities.

@roend83
Copy link
Contributor Author
roend83 commented Dec 21, 2012

Any way I can help with the similarity metric?

@arrbee
Copy link
Member
arrbee commented Dec 21, 2012

I actually have it implemented as a standalone snippet of code already. When I get a chance, I'll make a gist of it and link it so you can take a look and see what you think...

@@ -76,6 +76,11 @@ public virtual TreeChanges Compare(Tree oldTree, Tree newTree, IEnumerable<strin
newTree != null ? newTree.Id : null,
options))
{
Proxy.git_diff_find_similar(diff, new GitDiffFindOptions
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIND_RENAMES being the default, passing null would be enough there

@nulltoken
Copy link
Member

@roend83 Awesome addition!

Once you're done with it, could you please consider covering those new features with tests? Below, a first one to get you started ;-)

diff --git a/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs b/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs
index e8d10b0..04ece39 100644
--- a/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs
+++ b/LibGit2Sharp.Tests/DiffTreeToTreeFixture.cs
@@ -167,6 +167,43 @@ public void CanDetectTheRenamingOfAModifiedFile()
             }
         }

+        [Fact]
+        public void CanDetectTheRenamingAndTheCopyingOfNonModifiedFiles()
+        {
+            SelfCleaningDirectory scd = BuildSelfCleaningDirectory();
+            using (var repo = Repository.Init(scd.DirectoryPath))
+            {
+                string originalPath = Path.Combine(repo.Info.WorkingDirectory, "original.txt");
+                string unmodifiedPath = Path.Combine(repo.Info.WorkingDirectory, "unmodified.txt");
+                string copiedPath = Path.Combine(repo.Info.WorkingDirectory, "copied.txt");
+                string alsoCopiedPath = Path.Combine(repo.Info.WorkingDirectory, "alsocopied.txt");
+                string renamedPath = Path.Combine(repo.Info.WorkingDirectory, "renamed.txt");
+
+                File.WriteAllText(originalPath, "a\nb\nc\nd\n");
+                File.WriteAllText(unmodifiedPath, "1\n2\n3\n4\n");
+
+                repo.Index.Stage(originalPath);
+                repo.Index.Stage(unmodifiedPath);
+
+                Commit old = repo.Commit("Initial", DummySignature, DummySignature);
+
+                File.Copy(originalPath, copiedPath);
+                File.Copy(originalPath, alsoCopiedPath);
+
+                repo.Index.Stage(copiedPath);
+                repo.Index.Stage(alsoCopiedPath);
+
+                repo.Index.Move(originalPath, renamedPath);
+
+                Commit @new = repo.Commit("Updated", DummySignature, DummySignature);
+
+                TreeChanges changes = repo.Diff.Compare(old.Tree, @new.Tree);
+                Assert.Equal(3, changes.Count());
+                Assert.Equal(2, changes.Copied.Count());
+                Assert.Equal(1, changes.Renamed.Count());
+            }
+        }
+
         /*
          * $ git diff f8d44d7..ec9e401
          * diff --git a/numbers.txt b/numbers.txt

@roend83
Copy link
Contributor Author
roend83 commented Feb 8, 2013

@nulltoken I tried to add some tests for this, but the copy detection isn't working how I assumed that it would. Do you have some time to look at the failing tests?

@arrbee Any updates on partial rename detection?

@nulltoken
Copy link
Member

Do you have some time to look at the failing tests?

@roend83 Mistery solved! Your tests now pass.

Sorry, this is a bit hacky, but I hadn't had much time.
Indeed, I made the find-harder copy detection option the default which wouldn't be recommended (from a performance perspective).

Regarding the proposed API, I'd prefer to have the options exposed through a enum rather than booleans. Maybe something like FindRenames, FindRenamesAndCopies, FindRenamesAndCopiesHarder?

diff --git a/LibGit2Sharp/Diff.cs b/LibGit2Sharp/Diff.cs
index 5dadd7c..f0fb507 100644
--- a/LibGit2Sharp/Diff.cs
+++ b/LibGit2Sharp/Diff.cs
@@ -29,6 +29,11 @@ private GitDiffOptions BuildOptions(DiffOptions diffOptions, IEnumerable<string>
                 GitDiffOptionFlags.GIT_DIFF_INCLUDE_UNTRACKED_CONTENT;
             }

+            if (diffOptions.HasFlag(DiffOptions.IncludeUnmodified))
+            {
+                options.Flags |= GitDiffOptionFlags.GIT_DIFF_INCLUDE_UNMODIFIED;
+            }
+
             if (paths == null)
             {
                 return options;
@@ -104,7 +109,14 @@ internal Diff(Repository repo)
         /// </returns>
         public virtual TreeChanges Compare(Tree oldTree, Tree newTree, IEnumerable<string> paths = null, bool detectRenames = false, bool detectCopies = false)
         {
-            using(GitDiffOptions options = BuildOptions(DiffOptions.None, paths))
+            var opts = DiffOptions.None;
+
+            if (detectCopies)
+            {
+                opts |= DiffOptions.IncludeUnmodified;
+            }
+
+            using(GitDiffOptions options = BuildOptions(opts, paths))
             using (DiffListSafeHandle diff = BuildDiffListFromTrees(
                 oldTree != null ? oldTree.Id : null,
                 newTree != null ? newTree.Id : null,
@@ -124,22 +136,25 @@ private DiffListSafeHandle BuildDiffListFromTrees(ObjectId oldTree, ObjectId new

         private DiffListSafeHandle HandleRenameAndCopyDetection(DiffListSafeHandle diff, bool detectRenames, bool detectCopies)
         {
-            if (detectRenames || detectCopies)
+            if (!detectRenames && !detectCopies)
             {
-                var diffFindOptions = new GitDiffFindOptions();
+                return diff;
+            }

-                if (!detectCopies)
-                {
-                    diffFindOptions.Flags = GitDiffFindOptionFlags.GIT_DIFF_FIND_RENAMES;
-                }
-                else if (!detectRenames)
-                {
-                    diffFindOptions.Flags = GitDiffFindOptionFlags.GIT_DIFF_FIND_COPIES;
-                }
+            var diffFindOptions = new GitDiffFindOptions();

-                Proxy.git_diff_find_similar(diff, diffFindOptions);
+            if (detectRenames)
+            {
+                diffFindOptions.Flags |= GitDiffFindOptionFlags.GIT_DIFF_FIND_RENAMES;
             }

+            if (detectCopies)
+            {
+                diffFindOptions.Flags |= GitDiffFindOptionFlags.GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED;
+            }
+
+            Proxy.git_diff_find_similar(diff, diffFindOptions);
+
             return diff;
         }

diff --git a/LibGit2Sharp/DiffOptions.cs b/LibGit2Sharp/DiffOptions.cs
index 133e5a4..d099f03 100644
--- a/LibGit2Sharp/DiffOptions.cs
+++ b/LibGit2Sharp/DiffOptions.cs
@@ -19,5 +19,11 @@ internal enum DiffOptions
         ///   diffing against the working directory.
         /// </summary>
         IncludeUntracked,
+
+        /// <summary>
+        ///   Include unmodified files among the files to be processed. This
+        ///   is a requirement when emulating --find-copies-harder
+        /// </summary>
+        IncludeUnmodified,
     }
 }
diff --git a/LibGit2Sharp/TreeChanges.cs b/LibGit2Sharp/TreeChanges.cs
index 09d64f3..2a1029c 100644
--- a/LibGit2Sharp/TreeChanges.cs
+++ b/LibGit2Sharp/TreeChanges.cs
@@ -63,6 +63,12 @@ private int PrintCallBack(GitDiffDelta delta, GitDiffRange range, GitDiffLineOri
             string formattedoutput = Utf8Marshaler.FromNative(content, (int)contentlen);

             TreeEntryChanges currentChange = AddFileChange(delta, lineorigin);
+
+            if (currentChange == null)
+            {
+                return 0;
+            }
+
             AddLineChange(currentChange, lineorigin);

             currentChange.AppendToPatch(formattedoutput);
@@ -89,6 +95,11 @@ private void AddLineChange(Changes currentChange, GitDiffLineOrigin lineOrigin)

         private TreeEntryChanges AddFileChange(GitDiffDelta delta, GitDiffLineOrigin lineorigin)
         {
+            if (delta.Status == ChangeKind.Unmodified)
+            {
+                return null;
+            } 
+
             var newFilePath = FilePathMarshaler.FromNative(delta.NewFile.Path);

             if (lineorigin != GitDiffLineOrigin.GIT_DIFF_LINE_FILE_HDR)

@roend83
Copy link
Contributor Author
roend83 commented Feb 9, 2013

@nulltoken What do you think of this API? I went with a fluent API instead of an enumeration to allow you to pass the threshold value along too. If you like it, I'll go ahead and get it added to the rest of the diffing functions.

@nulltoken
Copy link
Member

@roend83 Good catch! I indeed completely forgot about the threshold values.

The fluent interface is a nice idea. However, after having playing with the idea of it for some days, I'd prefer something slightly less elaborate.

I'd more inclined to make the Compare() method accept a new CompareOptions public type (Similarly to the RepositoryOptions type) which would expose the previously proposed enum.

In order to keep things simple (until some more complex requirement is raised), I think we should consider making CompareOptions only expose one threshold. It would be used for detection of both Renames and Copies.

One possible option would be to initialize the similarity index threshold to 50% (Git default) and only consider its value when the detection behavior enum is switched from None to FindRenames, FindRenamesAndCopies or FindRenamesAndCopiesHarder

Note: Maybe would DiffOptions be a better name ?

Note 2: @ben @jamill What would you think about rewriting of the signatures of Checkout() and Clone() to make them accept xxxOptions parameters?

@ben
Copy link
Member
ben commented Feb 11, 2013

I like it. Repository.Clone currently exposes a small subset of the functionality available through the git_clone native call. Might be nice to have a way of exposing it without a billion function parameters.

@dahlbyk
Copy link
Member
dahlbyk commented Feb 11, 2013

Using an enum kind of backs us into the corner should we want to supplement with configurable threshold and whatnot in the future. Why not something like this?

class CompareOptions {
    public CompareOptions(bool renames = false, int renameThreshold = 50, bool copies = false, int copiesThreshold = 50) { ... }

    public static CompareOptions None { get { return new CompareOptions(); } }
    public static CompareOptions FindRenames { get { return new CompareOptions(renames: true); } }
    public static CompareOptions FindRenamesAndCopies { get { return new CompareOptions(renames: true, copies: true); } }
}

A fluent config API could even be layered on top of this type (perhaps in a Fluent namespace?) for those that prefer that style to normal object construction.

@nulltoken
Copy link
Member

Why not something like this? (...] A fluent config API could even be layered on top of this type

Smart refactoring ;-) !

Using an enum kind of backs us into the corner should we want to supplement with configurable threshold and whatnot in the future.

I'd rather wait for real world use cases (and bugs) regarding the need for a second threshold.

So, for now, I'd rather keep it simple with one threshold and an enum, and deploy the xxxOptions pattern among the library.

However, when it's starting to be painful to maintain (because of new params, or kind of params), of course we'll select a new form of xxxOptions considering all the usages.

@dahlbyk
Copy link
Member
dahlbyk commented Sep 4, 2013

Result of tonight's procrastination: https://github.com/dahlbyk/libgit2sharp/compare/gh278

I left the API close to how @roend83 had started, but applied to CompareOptions (introduced by #423). I'm still intrigued the idea of layering a fluent API on top, but I agree with @nulltoken that we probably shouldn't start there.

It's worth noting that libgit2's "find" API has evolved quite a bit since this PR was created... the binding will need to be updated, too.

@nulltoken
Copy link
Member

@dahlbyk As the flags seem to be cumulative, I wonder if we shouldn't rather make CompareOptions expose a DetectionBehavior enum (Renames, RenamesAndCopies, ...)

Thoughts?

@dahlbyk
Copy link
Member
dahlbyk commented Sep 5, 2013

You had suggested that before, and I agree. It may not be that simple though, given the other combinations of find options we may want to support (rewrite handling, detection for untracked, whitespace handling).

@roend83, do you want to take it from here?

@nulltoken
Copy link
Member

It's worth noting that libgit2's "find" API has evolved quite a bit since this PR was created... the binding will need to be updated, too.

@dahlbyk The libgit2 binaries have just been updated

@dahlbyk dahlbyk mentioned this pull request Nov 22, 2013
4 tasks
@nulltoken
Copy link
Member

Superseded by #573

Cheers!

@nulltoken nulltoken closed this Dec 5, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0