8000 Fixed multiple tar slips from using "extractall" which can lead to an exploit by awiswasi · Pull Request #23163 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Fixed multiple tar slips from using "extractall" which can lead to an exploit #23163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

awiswasi
Copy link

Reference Issues/PRs

This is not a response to an existing open issue, however I thought it is worth taking a look at since tar/zip slips can be dangerous and lead to vulnerability!

What does this implement/fix? Explain your changes.

Calling extractall() to extract all files from a tar file without sanitization can result in files outside destination directory to be overwritten, resulting in an arbitrary file write. I suggest using:

with closing(tarfile.open(ARCHIVE_NAME, "r:gz")) as archive: archive.extractall(path=PATH_NAME)

Any other comments?

This can lead to an exploit by extracting files from an archive by giving access to parts of the file system outside of the target folder in which they should reside. With that being said, executable files can be overwritten and either be invoked remotely or wait for the system or user to call them, thus achieving remote command execution on the victim’s machine. The vulnerability can also cause damage by overwriting configuration files or other sensitive resources, and can be exploited on both client machines and servers.

…loits or overwriting of arbitrary files via archive extraction.
@lesteve
Copy link
Member
lesteve commented Apr 21, 2022

I don't see how your current fix is meant to improve the situation. This is still using extractall without any kind of validation.

I am not sure whether we should aim to fix this in scikit-learn or whether the standard library would be a better place for a fix.

I would also venture that this vulnerability is not very likely to be exploited inside scikit-learn. The attacker would need to take control of a website we are downloading a tarfile from and the scikit-learn user would need to download this dataset for the first time (i.e. without any cache).

Context:

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I think closing the tarfile is an improvement even if it does not resolve the exploit.

@@ -169,7 +170,9 @@ def progress(blocknum, bs, size):
if _not_in_sphinx():
sys.stdout.write("\r")
print("untarring Reuters dataset...")
tarfile.open(archive_path, "r:gz").extractall(data_path)
# extractall can cause a tar slip
with closing(tarfile.open(archive_path, "r:gz")) as archive:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the Tarfile docs

A TarFile object can be used as a context manager in a with statement. It will automatically be closed when the block is completed.

Suggested change
with closing(tarfile.open(archive_path, "r:gz")) as archive:
with tarfile.open(archive_path, "r:gz") as archive:

This comment applies to the other diffs as well.

@@ -169,7 +170,9 @@ def progress(blocknum, bs, size):
if _not_in_sphinx():
sys.stdout.write("\r")
print("untarring Reuters dataset...")
tarfile.open(archive_path, "r:gz").extractall(data_path)
# extractall can cause a tar slip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From #23163 (comment), this does not actually prevent the exploit:

Suggested change
# extractall can cause a tar slip

This applies to the other diffs as well.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0