-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Fixed multiple tar slips from using "extractall" which can lead to an exploit #23163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…loits or overwriting of arbitrary files via archive extraction.
I don't see how your current fix is meant to improve the situation. This is still using I am not sure whether we should aim to fix this in scikit-learn or whether the standard library would be a better place for a fix. I would also venture that this vulnerability is not very likely to be exploited inside scikit-learn. The attacker would need to take control of a website we are downloading a tarfile from and the scikit-learn user would need to download this dataset for the first time (i.e. without any cache). Context:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! I think closing the tarfile is an improvement even if it does not resolve the exploit.
@@ -169,7 +170,9 @@ def progress(blocknum, bs, size): | |||
if _not_in_sphinx(): | |||
sys.stdout.write("\r") | |||
print("untarring Reuters dataset...") | |||
tarfile.open(archive_path, "r:gz").extractall(data_path) | |||
# extractall can cause a tar slip | |||
with closing(tarfile.open(archive_path, "r:gz")) as archive: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the Tarfile docs
A TarFile object can be used as a context manager in a with statement. It will automatically be closed when the block is completed.
with closing(tarfile.open(archive_path, "r:gz")) as archive: | |
with tarfile.open(archive_path, "r:gz") as archive: |
This comment applies to the other diffs as well.
@@ -169,7 +170,9 @@ def progress(blocknum, bs, size): | |||
if _not_in_sphinx(): | |||
sys.stdout.write("\r") | |||
print("untarring Reuters dataset...") | |||
tarfile.open(archive_path, "r:gz").extractall(data_path) | |||
# extractall can cause a tar slip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #23163 (comment), this does not actually prevent the exploit:
# extractall can cause a tar slip |
This applies to the other diffs as well.
Reference Issues/PRs
This is not a response to an existing open issue, however I thought it is worth taking a look at since tar/zip slips can be dangerous and lead to vulnerability!
What does this implement/fix? Explain your changes.
Calling extractall() to extract all files from a tar file without sanitization can result in files outside destination directory to be overwritten, resulting in an arbitrary file write. I suggest using:
with closing(tarfile.open(ARCHIVE_NAME, "r:gz")) as archive: archive.extractall(path=PATH_NAME)
Any other comments?
This can lead to an exploit by extracting files from an archive by giving access to parts of the file system outside of the target folder in which they should reside. With that being said, executable files can be overwritten and either be invoked remotely or wait for the system or user to call them, thus achieving remote command execution on the victim’s machine. The vulnerability can also cause damage by overwriting configuration files or other sensitive resources, and can be exploited on both client machines and servers.