gh-119105: difflib: improve recursion for degenerate cases by pulkin · Pull Request #119131 · python/cpython · GitHub

gh-119105: difflib: improve recursion for degenerate cases #119131


Merged (10 commits) on May 19, 2024
difflib.py: even simpler impl
pulkin authored May 19, 2024
commit f9e3480e42654cf2ba09c6fd2ccb1385347604e7
20 changes: 6 additions & 14 deletions Lib/difflib.py
@@ -918,29 +918,22 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
         # search for the pair that matches best without being identical
         # (identical lines must be junk lines, & we don't want to synch up
         # on junk -- unless we have to)
-        alen = alo + ahi - 1
-        blen = blo + bhi - 1
-        # weight is used to balance the recursion by prioritizing
-        # i and j in the middle of their ranges
-        weight = 0
+        amid = (alo + ahi - 1) / 2
+        bmid = (blo + bhi - 1) / 2
         for j in range(blo, bhi):
             bj = b[j]
             cruncher.set_seq2(bj)
-            if j < blen / 2:
-                weight += alen
-            elif j > blen / 2:
-                weight -= alen
+            weight_j = - abs(j - bmid)
             for i in range(alo, ahi):
                 ai = a[i]
                 if ai == bj:
                     if eqi is None:
                         eqi, eqj = i, j
                     continue
                 cruncher.set_seq1(ai)
-                if i < alen / 2:
-                    weight += blen
-                elif i > alen / 2:
-                    weight -= blen
+                # weight is used to balance the recursion by prioritizing
+                # i and j in the middle of their ranges
+                weight = weight_j - abs(i - amid)
                 # computing similarity is expensive, so use the quick
                 # upper bounds first -- have seen this speed up messy
                 # compares by a factor of 3.
@@ -951,7 +944,6 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
                       (cruncher.quick_ratio(), weight) > best_ratio and \
                       (cruncher.ratio(), weight) > best_ratio:
                     best_ratio, best_i, best_j = (cruncher.ratio(), weight), i, j
-        # assert weight == 0, weight
         best_ratio, _ = best_ratio
         if best_ratio < cutoff:
             # no non-identical "pretty close" pair
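The tie-breaking idea in this commit can be tried in isolation. The following is a minimal sketch, not the code from `Lib/difflib.py`: the helper names `centered_weight` and `pick_best`, and the precomputed `ratios` dict that stands in for the `SequenceMatcher` calls, are assumptions made for illustration. It shows how comparing `(ratio, weight)` tuples prefers the higher ratio first and, among equal ratios, the pair closest to the middle of both ranges.

```python
def centered_weight(i, j, alo, ahi, blo, bhi):
    """Return a weight that is 0 at the middle of both ranges and
    becomes more negative as i or j moves toward either end."""
    amid = (alo + ahi - 1) / 2
    bmid = (blo + bhi - 1) / 2
    return -abs(j - bmid) - abs(i - amid)


def pick_best(ratios, alo, ahi, blo, bhi):
    """Pick the (i, j) pair with the largest (ratio, weight) key.

    `ratios` is a hypothetical dict mapping (i, j) to a similarity
    score; in difflib these scores would come from SequenceMatcher.
    """
    best_key, best_pair = None, None
    for j in range(blo, bhi):
        for i in range(alo, ahi):
            key = (ratios[i, j], centered_weight(i, j, alo, ahi, blo, bhi))
            # Tuples compare element by element: the ratio decides first,
            # and the weight only breaks ties between equal ratios.
            if best_key is None or key > best_key:
                best_key, best_pair = key, (i, j)
    return best_pair, best_key


# Degenerate case: every pair looks equally similar.
flat = {(i, j): 0.9 for i in range(5) for j in range(5)}
pair, key = pick_best(flat, 0, 5, 0, 5)
print(pair, key)
```

With all ratios equal, the selected pair is the central one, so a recursion that splits both ranges at that pair gets two halves of similar size rather than one empty side and one side nearly as large as the input.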