Last active
July 25, 2023 16:34
-
-
Save Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e to your computer and use it in GitHub Desktop.
Skeleton of the patch that is being applied to projects vulnerable to CVE-2007-4559. Based off of code from pip's unpacking.py.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import os | |
def is_within_directory(directory, target): | |
abs_directory = os.path.abspath(directory) | |
abs_target = os.path.abspath(target) | |
prefix = os.path.commonprefix([abs_directory, abs_target]) | |
return prefix == abs_directory | |
def safe_extract(tar, path=".", members=None, *, numeric_owner=False): | |
for member in tar.getmembers(): | |
member_path = os.path.join(path, member.name) | |
if not is_within_directory(path, member_path): | |
raise Exception("Attempted Path Traversal in Tar File") | |
tar.extractall(path, members, numeric_owner=numeric_owner) |
@jowagner Erm, doesn't the os.path.commonprefix()
strategy have exactly the same false negative, also in @Kasimir123's (and pip's) code above? Shouldn't it rather be os.path.commonpath()
?
>>> from pathlib import Path
>>> import os.path
>>> p1 = Path("/a/b/c")
>>> p2 = Path("/a/b/c2")
>>> os.path.commonprefix([p1, p2])
'/a/b/c'
>>> os.path.commonprefix([p1, p2]) == str(p1)
True
>>> os.path.commonpath([p1, p2])
'/a/b'
>>> p2.is_relative_to(p1)
False
See also PEP 706 for recent progress on this.
Here's an updated safe extraction implementation I'm now using for my projects:
https://gist.github.com/mara004/6fe0ac15d0cf303bed0aea2f22d8531f
And FWIW, some threads confirming my suspicion that os.path.commonprefix()
is invalid in the Trellix patch:
https://stackoverflow.com/a/43613742/15547292
http://grokbase.com/t/python/python-ideas/153nyg6qt1/os-path-commonprefix-yes-that-old-chestnut
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Oh, I see. Thanks! (I wrongly assumed
str(path)
would add a trailing slash to directories.)The
is_relative_to()
strategy should work correctly, though?Imported from pathlib, of course. Doing
Path(dest_dir).resolve()
would be an option but isn't necessary if the caller uses pathlib already.