-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-34002: Minor efficiency and 8000 clarity improvements in email package. #7999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Comparing ``len(a) > ``len(a - b)`` is essentially looking for an intersection between the two sets. If set ``b`` does not intersect ``a`` then ``len(a - b)`` will be equal to ``len(a)``. This logic is more clearly expressed as ``a & b``.
Copying the list, then repeatedly popping the first element was unnecessarily slow. I also cleaned up a couple other inefficiencies. There's no need to unpack a tuple, then re-pack and append it. The list can be created with the first element instead of empty. Secondly, the ``endswith`` method returns a bool, so there's no need for an if- statement to set ``encoding`` to True or False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open an issue on the tracker for discussing this PR.
# Copy params so we don't mess with the original | ||
params = params[:] | ||
new_params = [] | ||
new_params = [params[0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if params[0]
is not a 2-tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's possible for the first params element, though the others are implicitly checked by the unpacking for-loop.
Looking at the usage of decode_params
, it's only use is in email.message._get_params_preserve
which in turn has 3 call sites. All 3 calls unpack the elements as (k, v)
pairs. The bad data would cause an immediate error, given how the method is used. The error would be the same as today, though the traceback would not be as deep.
If that's not satisfactory, I'll revert to:
key, value = params[0]
new_params = [(key, value)]
``a.intersection(b)`` method is more clear of purpose than ``not a.isdisjoint(b)`` and avoids an unnecessary set construction that ``a & set(b)`` performs.
I created an issue in the tracker and changed the name of this pull-request. |
encoded = True | ||
else: | ||
encoded = False | ||
for name, value in params[1:]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
params[1:]
still creates a copy of the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. The real benefit is avoiding repeated .pop(0)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the code can be made more optimal if avoid a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that the final value of 'encoded' only depends on the final value of name. If so, the loop is equivalent to
if params:
encoded = params[-1][0].endswith('*')
Otherwise, one can skip-iterate without copying with
it = iter(params)
try:
next(it)
except StopIteration:
pass
for name, value in it:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each iteration of the loop uses the encoded
value. The iter
and next
ing would be a better technique.
it = iter(params)
new_params = [next(it)]
for name, value in it:
...
While it reads slightly worse, the isdisjoint method will stop when it finds a counterexample and returns a bool, rather than looping over the entire iterable and constructing a new set.
This PR seems to be a good and looks like has reviews from several core devs. Can @serhiy-storchaka, @terryjreedy @warsaw @bitdancer merge this with label to skip news entry? Unless we do do need a NEWS entry, in which case it would be good to know that. |
< 8000 details-menu class="dropdown-menu dropdown-menu-sw show-more-popover color-fg-default" style="width:185px" src="" preload >
Though sometimes disagreeing on meaning of 'trivial', we generally want an issue and news entry for non-trivial changes, and this is non-trivial. Someone should add a short news entry. It will inform anyone looking for email changes that might have caused an obscure bug that we made a change that we think should have no visible effect other than performance. |
Thanks @terryjreedy, that totally makes sense. @selik can you please add a news entry for this PR? |
I am okay with the changes, like I mentioned before, but like @terryjreedy mentioned, it would be good to have a NEWS entry. I'd push a NEWS entry myself if I had access to push to his PR (which can be done using the flag "Allow changes from maintainers flag", I think. |
I believe the box is checked to allow edits from maintainers |
@maxking I am able to edit the files, so you should be able to create a branch, add a blurb, and push. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @selik for this :) |
Creating a copy of a list, then repeatedly popping the first element could be much slower than necessary if the list is long. Further explanations in the commit messages
Automerge-Triggered-By: @maxking