-
-
Notifications
You must be signed in to change notification settings - Fork 32k
Calling assertEquals for moderately long list takes too long #63416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Call to assertEquals(list1, list2) does not finish (takes more than couple of minutes), for lists that containt 10000 elements if all list elements are different. The same call in python2.6 finishes instanteneously. This occours even if error message is truncated using maxDiff. |
I have attached a simple test case. |
Ouch. Looking. |
Here's a proof of concept to fix the issue (still lacks tests and breaks one existing test). The problem is that assertSequenceEqual computes the diff regardless of the size of the sequences, and then truncates it if it's too long. Producing the diff takes time, so checking the size of the sequences and skip the diff if they are too long solves the problem. In the patch I used an hardcoded and arbitrary value of 30 elements -- the other attributes like maxDiff and _diffThreshold can't be used here. This issue is related to bpo-11763. All other methods should be checked for similar issues. |
Perhaps a combination of len(pprint.pformat(seq1).splitlines()) and len(pprint.pformat(seq2).splitlines()) (minimum, maximum or production) is better metric? |
The patch works for me, thanks Ezio! |
pprint is also likely to have performance issues. I agree with Ezio that a diff consisting of more than 30(x2) lines is not likely to be directly useful anyway. A test for the changed behaviour would be nice. |
The cause of this has been identified as a bug in libdiff.compare(). See bpo-21253 for more information. |
Ezio, do you intend to add a test to your patch? |
I don't think I will have time for this for a while, so if anyone wants to provide tests they are welcomed to do it. |
Hi, The attached patch is an attempt to write tests for this issue, and get all the tests passing. Since a new threshold has been introduced, |
Attached is the patch that tries to solve the issue for the strings, tuples, lists and dictionaries. Tuples, lists and dictionaries use the same value for the threshold. There's a helper method in the tests that is generic to all mentioned types. |
Thanks Puneeth for the initial tests and Elena for expanding the fix and the tests to cover assertDictEqual too. With the attached patch, unittest_scse.py works fine, producing this output: Traceback (most recent call last):
File "unittest_scse.py", line 14, in test_compare
self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,[29953 chars]1, 1] != [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,[29953 chars]2, 2] First differing element 0: ====================================================================== Traceback (most recent call last):
File "unittest_scse.py", line 24, in test_compare_2
self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] != [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2] First differing element 9999: ---------------------------------------------------------------------- FAILED (failures=2) |
Here is alternative patch. It outputs line diff even for large sequences/dicts/strings. With the attached patch, unittest_scse.py works fine, producing this output: FF Traceback (most recent call last):
File "unittest_scse.py", line 14, in test_compare
self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,[29953 chars]1, 1] != [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,[29953 chars]2, 2] First differing element 0: - [1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
+ [2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
...
*** Diff is truncated. Set self.ma
8000
xDiffItems to None to see it all. ====================================================================== Traceback (most recent call last):
File "unittest_scse.py", line 24, in test_compare_2
self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] != [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2] First differing element 9999:
---------------------------------------------------------------------- FAILED (failures=2) |
Thanks for your comments Ezio. Yes, this patch is rather only a demo. Sorry, but I'm not very interesting in completing this patch (at least not right now). I only want to see more detailed error reports. There are some ideas about improving diffs reporting. First, we can iterative decrease slices size until diff size become less maxDiff. And output truncated diff instead of omitting it at all. Second, we can extend difflib by functions which allow to limit a number of reported differences. |
I thought some more about this, and I think we can do better. I'll try to summarize the current situation and propose some possible solutions to achieve this goal. Currently we have:
These two attributes deal with two different but related problems:
These are different because one acts before and the other after, but they are also related because the threshold directly affects the size of the diff. By picking some ideas from Serhiy comments and patch, I think we should:
Now, doing 1) shouldn't be an issue, 2) is also doable if we agree on it, and 3) is also not a problem since these are internal functions. If we decide to go for 4) as well there are at least two options: Option a) might be doable, and even if it introduces a change in behavior it might be acceptable since it affects the output of the messages in case of failure, and I don't think anyone is relying on an exact output (also because tests shouldn't be failing). Moreover, the most common usage of maxDiff is setting it to None, and having the threshold to None means that the full diff will be computed and printed, leaving the behavior unchanged. Thoughts?
I was thinking about having a lazy, generator-based diffing, so we could generate as many diff lines we need and stop once we reach the threshold. I'm not sure if difflib already has something similar but anyway this is a separate issue. Also, if we unify maxDiff and the threshold we won't need this, since all that is computer is also printed. |
You forgot about strings with few but very long lines. We should hide or
This is too much for bug fix. We should fix this issue (do not calculate diffs Then we can refactor and improve diffs reporting in other issues. |
Sorry, something went wrong.
A few thoughts. Adding a new public symbol seems inappropriate here: this is a performance issue that is well predictable and we should cater for that (given difflibs current performance). I'll note in passing that both bzr and hg have much higher performance difference algorithms that we could pick up and includes as a replacement SequenceMatcher, which might significantly reduce the threshold at which we need to default-cap things - but such a threshold will still exist. I totally agree that _diffThreshold should apply to non-string sequences - anything where we're going to hit high-order complexity outputting the difference. That said, I speculate that perhaps we'd be better off outputting both objects in some structured fashion and letting a later process render them (for things like CI systems and test databases, where fidelity of reproduction is more important than having the output fit on one screen. This is a different issue though and something we should revisit later. That suggests to me though that the largest diff we output should be chosen based on the textual representation of the diff - we're doing it for human readability. Whereas the threshold for calculating a diff at all should be based on performance. It can be very expensive to calculate a diff on large sequences, but the diff might be much much larger than the sequence length indicates [because each item in the sequence may be very large]. Perhaps thats over thinking it? Anyhow- short term, surely just making the threshold apply to any sequenced type is sufficient to fix the bug? |
Oh, I got a profile from the test case for my own interest. 6615 seconds .. some highlights that jumped out at me
which means we're basically using ndiff, which is cubic rather than quadratic - the same performance issue the html module had. I think we'll probably make a better tradeoff by using unified_diff instead which is quadratic. |
Switching this to unified_diff allows the test case to finish nearly instantly. The downside is that the output is changed in the case of a difference found: FF Traceback (most recent call last):
File "test_case.py", line 14, in test_compare
self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,[29953 chars]1, 1] != [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,[29953 chars]2, 2] First differing element 0: Diff is 100034 characters long. Set self.maxDiff to None to see it. ====================================================================== Traceback (most recent call last):
File "test_case.py", line 24, in test_compare_2
self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] != [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2] First differing element 9999: ---
+++
@@ -9997,4 +9997,4 @@
1,
1,
1,
- 1]
+ 2] Ran 2 tests in 0.098s FAILED (failures=2) |
The new output seems ok to me? |
Hi all, What do we do with this ticket? Patch were provided, People have agreed (I think). So what's missing to close it (or pass to the next step)? It's going to be a year that a high priority ticket has no update and I would like to accelerate it if I can :). Let me know what I can do. Regards, |
unittest_unified_diff.patch: Rebased patch for the default branch. My patch updates also unit tests. The patch changes the test output. If we decide to apply the patch, I propose to only apply it to the default branch (Python 3.7). The bug report is about a test which fails. I'm not sure that it's a real blocker issue that Python is slow when a test fails. At least, it should be fast when a test pass. I mean that I like the current output, I'm not sure about the new output. Example with attached unified_diff.py. Before: Traceback (most recent call last):
File "unified_diff.py", line 5, in test_x
self.assertEqual([], [None])
AssertionError: Lists differ: [] != [None] Second list contains 1 additional elements.
---------------------------------------------------------------------- FAILED (failures=1) With the patch: Traceback (most recent call last):
File "unified_diff.py", line 5, in test_x
self.assertEqual([], [None])
AssertionError: Lists differ: [] != [None] Second list contains 1 additional elements. ---
+++
@@ -1 +1 @@
-[]
+[None] Ran 1 test in 0.001s FAILED (failures=1) The patch adds the following header which can be suprising:
|
Hi! I see that this issue have a pr (#10034) opened 20 days ago. So, I want to continue work on this (my first pr). I use the vstinner's patch and I make two little change (attached patch):
This is the output for the vstinner test case. What do you think? |
I create this patch on 3.7 |
This is a bug for string comparisons as well. I just had to manually reimplement assertMultiLineEqual (which doesn't call assertSequenceEqual) on one of my test classes to workaround this issue. |
Hi all! @eamanu, I went ahead and picked up where you left off. I stopped short of Despite not creating a PR, you can see the changes here: main...jdevries3133:bpo-19217-slow-assertEq @eamanu, if you prefer, you can probably just merge this branch into Also, below is a summary of what I've done: Revert Change to difflib Link to the original thread where this change was discussed a bit: I assume this was a performance improvement. It's not clear to me why we are
Obviously, I don't have knowledge of the reasoning behind this change It's definitely possible that I'm missing something here, so let me know if Add Regression Test The regression test basically attempts to reproduce the problem and asserts Misc I also fixed the one failing test in unittest's own test case, and added a |
Lukasz, perhaps you can finish this. Too many assistent cooks and no chief cook. At least tell Jack (above) whether to submit his PR. |
Hey all, I'm putting a ping on this issue. I think my fix is ready to merge, see python/issues-test-cpython#27434. Thanks for all the feedback on the PR so far! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: