8000 gh-119105: Differ.compare is too slow [for degenerate cases] by tim-one · Pull Request #119492 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-119105: Differ.compare is too slow [for degenerate cases] #119492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 25, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Fix the windowing code to be symmetric.
Stop tresting junk lines specially - if we synch on junk, so be it -
it almost never occurs in real life anyway.
  • Loading branch information
tim-one committed May 24, 2024
commit c187ce140a0bb4734ebc7244efe2c69fe8b064a1
57 changes: 23 additions & 34 deletions Lib/difflib.py
Original file line number Diff line number Diff line change
Expand Up @@ -923,48 +923,35 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
cr = cruncher.ratio

WINDOW = 10
dump_i, dump_j = alo, blo
start_i, start_j = alo, blo
while start_i < ahi and start_j < bhi:
# Search for the pair in the next WINDOW i's and j's that
# scores better than `cutoff` without being identical
# (identical lines must be junk lines, & we don't want to
# synch up on junk -- unless we have to)
best_i = best_j = None
dump_i, dump_j = alo, blo # smallest indices not yet resolved
for j in range(blo, bhi):
bj = b[j]
cruncher.set_seq2(bj)
# Search the corresponding i's within WINDOW for rhe highest
# ratio greater than `cutoff`.
aequiv = alo + j - blo
arange = range(max(aequiv - WINDOW, dump_i),
min(aequiv + WINDOW + 1, ahi))
if not arange: # likely exit if `a` is shorter than `b`
break
best_ratio = cutoff
best_i = best_j = None
for j in range(start_j, min(start_j + WINDOW, bhi)):
bj = b[j]
cruncher.set_seq2(bj)
for i in range(start_i, min(start_i + WINDOW, ahi)):
ai = a[i]
if ai == bj:
if best_i is None:
best_i, best_j = i, j
continue
cruncher.set_seq1(ai)
if (crqr() > best_ratio
and cqr() > best_ratio
and (ratio := cr()) > best_ratio):
best_ratio = ratio
best_i, best_j = i, j
break
if best_ratio > cutoff:
# if best_ratio <= cutoff, we don't yet have a
A4BD # non-identical pair, so keep searching in the
# window
break
for i in arange:
ai = a[i]
cruncher.set_seq1(ai)
if (crqr() > best_ratio
and cqr() > best_ratio
and cr() > best_ratio):
best_ratio = cr()
best_i, best_j = i, j

if best_i is None:
# found nothing to synch on - move to next window
start_i += WINDOW
start_j += WINDOW
# found nothing to synch on yet
continue

# pump out straight replace from before this synch pair
yield from self._fancy_helper(a, dump_i, best_i,
b, dump_j, best_j)
start_i, start_j = best_i + 1, best_j + 1
dump_i, dump_j = start_i, start_j
# do intraline marking on the synch pair
aelt, belt = a[best_i], b[best_j]
if aelt != belt:
Expand All @@ -989,6 +976,8 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
else:
# the synch pair is identical
yield ' ' + aelt
dump_i, dump_j = best_i + 1, best_j + 1
best_i = best_j = None

# pump out straight replace from after the last synch pair
yield from self._fancy_helper(a, dump_i, ahi,
Expand Down
Loading
0