-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
gh-63416: Speed up assertEqual on long sequences #27434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 30 commits
Commits
Show all changes
37 commits
Select commit
Hold shift + click to select a range
20fde31
Fix bpo-19217
eamanu 6996d8a
fix unnecessary space and solve @gpshead comments
eamanu adfcf6d
fix test
ddebcfa
Merge remote-tracking branch 'temp/fix_bpo-19217' into bpo-19217-slow…
jdevries3133 81900d4
bpo-19217: fix failing test in unittest's test suite
jdevries3133 e5cb7bc
bpo-19217: add blurb
jdevries3133 6955d81
revert changes to difflib.py
jdevries3133 46f5e29
add regression test
jdevries3133 847bfc4
Merge branch 'main' of github.com:python/cpython into bpo-19217-slow-…
jdevries3133 856029d
draft implementation of unittest.case._heuristic_diff
jdevries3133 1dd1bcd
fix: remove now-unused imports from test_case.py
jdevries3133 1f45b28
merge upstream python/cpython:main
jdevries3133 988740a
add variably scaled test cases, misc updates & revisions
jdevries3133 4661f75
move Test_HeuristicDiff beneath main tests
jdevries3133 a9d23c4
remove unnecessary list comprehension in Lib/unittest/case.py
jdevries3133 192d7a4
spelling error in Lib/unittest/test/test_case.py
jdevries3133 74895e9
implement second review from @ambv
jdevries3133 e21dbe3
merge changes from PR suggestions
jdevries3133 5e8a186
Merge branch 'main' of github.com:python/cpython into bpo-19217-slow-…
jdevries3133 28cb042
fix from @JelleZijlstra
jdevries3133 63ecc45
fix from @JelleZijlstra
jdevries3133 e4344cb
remove unnecessary type checker supression
jdevries3133 0bdf06c
fix typo
jdevries3133 05ffdf2
simplify tuple syntax
jdevries3133 5767d21
fix news entry "~lists~ => sequenecs"
jdevries3133 b9f2f9d
better document the reasoning behind the heuristic
jdevries3133 5351506
thanks @JelleZijlstra
jdevries3133 872de08
thanks @JelleZijlstra
jdevries3133 180c732
fix whitespace around keyword argument
jdevries3133 200a1a7
Merge branch 'bpo-19217-slow-assertEq' of github.com:jdevries3133/cpy…
jdevries3133 fc752cf
update blurb to describe observable changes
jdevries3133 b53738f
Merge branch 'main' of github.com:python/cpython into bpo-19217-slow-…
jdevries3133 adc7d6f
reword to make less promises
gpshead c4d6f38
Merge branch 'main' of github.com:python/cpython into bpo-19217-slow-…
jdevries3133 9e6a464
Merge branch 'bpo-19217-slow-assertEq' of github.com:jdevries3133/cpy…
jdevries3133 0ad029f
Merge remote-tracking branch 'upstream/main' into bpo-19217-slow-asse…
AA-Turner 4c5175c
Remove annotations
AA-Turner File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -5,6 +5,7 @@ | |||||||
import difflib | ||||||||
import pprint | ||||||||
import re | ||||||||
from collections.abc import Iterator | ||||||||
import warnings | ||||||||
import collections | ||||||||
import contextlib | ||||||||
|
@@ -170,6 +171,46 @@ def _is_subtype(expected, basetype): | |||||||
return all(_is_subtype(e, basetype) for e in expected) | ||||||||
return isinstance(expected, type) and issubclass(expected, basetype) | ||||||||
|
||||||||
|
||||||||
def _heuristic_diff(a: list[str], b: list[str]) -> Iterator[str]: | ||||||||
"""After testing the magnitude of the inputs, preferably return the output | ||||||||
of difflib.ndiff, but fallback to difflib.unified_diff for prohibitively | ||||||||
expensive inputs. How cost is calculated: | ||||||||
|
||||||||
Cost is calculated according to this heuristic: | ||||||||
|
||||||||
cost = (number of differing lines | ||||||||
* total length of all differing lines) | ||||||||
|
||||||||
This heuristic is used because the time complexity of ndiff is | ||||||||
approximately O((diff)^2), where `diff` is the product of the number of | ||||||||
differing lines, and the total length of differing lines. On the other | ||||||||
hand, unified_diff's cost is the same as the cost of producing `diff` | ||||||||
by itself: O(a + b). | ||||||||
|
||||||||
See bpo-19217 for additional context. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||
""" | ||||||||
COST_LIMIT = 1_000_000 | ||||||||
|
||||||||
# call unified_diff | ||||||||
udiff = list(difflib.unified_diff(a, b, fromfile="expected", tofile="got")) | ||||||||
udiff_differing_lines = [l for l in udiff | ||||||||
if l.startswith(('-', '+'))] | ||||||||
Comment on lines
+216
to
+217
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||
|
||||||||
# inspect unified_diff output | ||||||||
num_difflines = len(udiff_differing_lines) | ||||||||
total_diffline_length = sum(len(l) for l in udiff_differing_lines) | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||
|
||||||||
# now, we know what it will cost to call `ndiff`, according to the | ||||||||
# heuristic | ||||||||
diff_cost = num_difflines * total_diffline_length | ||||||||
gpshead marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
|
||||||||
if diff_cost > COST_LIMIT: | ||||||||
yield from udiff | ||||||||
else: | ||||||||
yield from difflib.ndiff(a, b) | ||||||||
|
||||||||
|
||||||||
class _BaseTestCaseContext: | ||||||||
|
||||||||
def __init__(self, test_case): | ||||||||
|
@@ -1020,9 +1061,9 @@ def assertSequenceEqual(self, seq1, seq2, msg=None, seq_type=None): | |||||||
differing += ('Unable to index element %d ' | ||||||||
'of second %s\n' % (len1, seq_type_name)) | ||||||||
standardMsg = differing | ||||||||
diffMsg = '\n' + '\n'.join( | ||||||||
difflib.ndiff(pprint.pformat(seq1).splitlines(), | ||||||||
pprint.pformat(seq2).splitlines())) | ||||||||
diffMsg = _heuristic_diff(pprint.pformat(seq1).splitlines(), | ||||||||
pprint.pformat(seq2).splitlines()) | ||||||||
diffMsg = '\n' + '\n'.join(diffMsg) | ||||||||
|
||||||||
standardMsg = self._truncateMessage(standardMsg, diffMsg) | ||||||||
msg = self._formatMessage(msg, standardMsg) | ||||||||
|
@@ -1133,7 +1174,7 @@ def assertDictEqual(self, d1, d2, msg=None): | |||||||
|
||||||||
if d1 != d2: | ||||||||
standardMsg = '%s != %s' % _common_shorten_repr(d1, d2) | ||||||||
diff = ('\n' + '\n'.join(difflib.ndiff( | ||||||||
diff = ('\n' + '\n'.join(_heuristic_diff( | ||||||||
pprint.pformat(d1).splitlines(), | ||||||||
pprint.pformat(d2).splitlines()))) | ||||||||
standardMsg = self._truncateMessage(standardMsg, diff) | ||||||||
|
@@ -1216,7 +1257,7 @@ def assertMultiLineEqual(self, first, second, msg=None): | |||||||
firstlines = [first + '\n'] | ||||||||
secondlines = [second + '\n'] | ||||||||
standardMsg = '%s != %s' % _common_shorten_repr(first, second) | ||||||||
diff = '\n' + ''.join(difflib.ndiff(firstlines, secondlines)) | ||||||||
diff = '\n' + ''.join(_heuristic_diff(firstlines, secondlines)) | ||||||||
standardMsg = self._truncateMessage(standardMsg, diff) | ||||||||
self.fail(self._formatMessage(msg, standardMsg)) | ||||||||
|
||||||||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2 changes: 2 additions & 0 deletions
2
Misc/NEWS.d/next/Library/2021-07-21-22-57-51.bpo-19217.vm-cr-.rst
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Optimize :meth:`~unittest.TestCase.assertEqual` method for long sequences of varied | ||
jdevries3133 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
items. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this isn't true. Both have worst-case quadratic time, but on different kinds of input, and it's less likely to get provoked in the simpler kind of differencing
unified_diff
tries.Well, actually actually 😉, the possible worst case of
ndiff
is worse than that, at least cubic time.