-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-37935: Improve performance of pathlib.scandir() #15331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When the scandir function is used in the pathlib module, it's first converted into a list and then used in a for loop. The conversion to list is unnecessary since the list isn't used except for the iteration in the loop, so it's a waste of performance. I got rid of the conversions to list and used the scandir iterator directly in the loop. In addition, I wrapped the use of scandir in the with statement to close its resources properly.
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @ShaiAvr and welcome!
I have a few recommendations for you:
-
Create an account on https://bugs.python.org/ and sign the CLA. Usually it takes around 24 hours from the time it's submitted, and it may have to be checked manually for the status to be properly updated on GitHub. This must be done for any contribution to Python for legal purposes.
-
Since this is a non-trivial change, open an issue on https://bugs.python.org/. This acts as an effective way to bring attention to the topic, and for core developers that are knowledgeable with
pathlib
to provide feedback. After the issue is created, you can attach the PR to it by changing the PR title tobpo-<issue-number>: <PR title>
. -
Adjust the PR title to
Improve performance of pathlib.scandir()
. This helps the relevant core developers find the PR more easily and quickly identify what it's addressing. -
A news entry can also be submitted via
blurb-it
. This is usually a 1-2 sentence summary of what the PR changes. Optionally at the end, you can include "Patch by <name>". It's especially recommended for any PRs involving code changes (or adding new sections to docs).
Let me know if you have any questions, and I'll do my best to address them. (:
@aeros167. Thanks for your comment. It's the first time I do something like this and I appreciate your help. I have created an account and signed the CLA like you said and I'm now working on creating the issue. This issue isn't specific for any version of python (any version that supports Thank you for your help. I really appreciate it. |
Since this is a performance enhancement, I would start with version 3.9 and see what the others think about backporting the PR the previous branches. Most changes are applied to the latest development version (rather than the latest release). If the PR is approved and there's no significant conflicts, there's a decent chance that the backport to 3.8 will be approved. However, it will likely not be backported to 3.7, unless the release manager decides otherwise. For that branch, it's pretty much only documentation corrections and security fixes.
No problem, let me know if you have any other questions. (: We'll have to wait for a core developer to review the PR before we can move forward with deciding whether or not it should be merged, but I can help with most general questions or anything process related. Also, the "Lifecycle of a Pull Pequest" section of the devguide might be able to provide some useful information. |
The user has signed the CLA: But they need to update their bugs.python.org "Your Details" page to list their ShaiAvr github user name for our bot to understand that. https://bugs.python.org/user32163 |
I'm considering this a performance bugfix, thus both tags. it is always an antipattern to call |
Can you update your branch with the latest changes from master? |
@ShaiAvr, please address the review comments. Thank you! |
I'm going to close this as the pull request appears to have been abandoned. It can be reopened or a new pull request can be opened to replace it. Thanks! |
I recently have taken a look in the source code of the
pathlib
module, and I saw something weird there:when the module used the
scandir
function, it first converted its iterator into a list and then used it in a for loop. The list wasn't used anywhere else, so I think the conversion to list is just a waste of performance.In addition, I noticed that the
scandir
iterator is never closed (it's not used in a with statement and itsclose
method isn't called). I know that the iterator is closed automatically when it's garbaged collected, but according to the docs, it's advisable to close it explicitly.So, I changed the code so that,
scandir
is used in a with statement and its iterator is used directly in the for loop instead of being converted to list first, which should enhance the performance.https://bugs.python.org/issue37935