Skip to content

Instantly share code, notes, and snippets.

@mara004
Last active August 6, 2024 00:10
Show Gist options
  • Save mara004/6fe0ac15d0cf303bed0aea2f22d8531f to your computer and use it in GitHub Desktop.
Save mara004/6fe0ac15d0cf303bed0aea2f22d8531f to your computer and use it in GitHub Desktop.
Safer tar extraction
# SPDX-FileCopyrightText: 2023 geisserml <[email protected]>
# SPDX-License-Identifier: Apache-2.0 OR BSD-3-Clause OR MPL-2.0
# Safer tar extraction (hopefully) preventing CVE-2007-4559 etc.
# Tries to use the most elegant strategy available in the caller's python version (>= 3.6)
__all__ = ["safer_tar_unpack"]
import sys
if sys.version_info >= (3, 11, 4): # PEP 706
import shutil
def safer_tar_unpack(archive_path, dest_dir):
shutil.unpack_archive(archive_path, dest_dir, format="tar", filter="data")
else: # workaround
import tarfile
from pathlib import Path
def safer_tar_unpack(archive_path, dest_dir):
dest_dir = Path(dest_dir).resolve()
with tarfile.open(archive_path) as tar:
for m in tar.getmembers():
if not ((m.isfile() or m.isdir()) and dest_dir in (dest_dir/m.name).resolve().parents):
raise RuntimeError("Path traversal, symlink or non-file member in tar archive (probably malicious).")
tar.extractall(dest_dir)
@mara004
Copy link
Author

mara004 commented Aug 1, 2023

Just tested with Jakub Wilk's traversal archives.
Results good. Workaround code correctly blocks all invalid samples.
Same for PEP 706 code, except that it automatically converts absolute to relative paths, i.e. they are extracted into dest_dir. This is safe (and probably better), only a behavioral difference.

Also confirmed with all strategies that we prevent traversal even if a path with shared prefix exists in the destination directory, while invalid archives get through with the < (3, 9) code passage when changing commonpath() to commonprefix(). This proofs the Trellix patch and pip's code are incorrect (they use commonprefix() as of this writing).

@mara004
Copy link
Author

mara004 commented Feb 20, 2024

I just realized that we could simplify the is_within_dir() check to dir in path.parents.
It'd be easier than both current strategies (no version checks), and strictly speaking also more correct, because it does not permit the case that path == dir.

@mara004
Copy link
Author

mara004 commented Feb 20, 2024

Also note, a known limitation of the above code is that existing files in dir may be overwritten by the archive. But this is expected.

@mara004
Copy link
Author

mara004 commented Feb 20, 2024

Re Comment 2: I updated the code accordingly, however the latest revision is now untested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment