How to commit Jupyter notebooks to git without their output while keeping the output intact locally

Commit Jupyter notebook code to git and keep output locally

  1. Add a filter to git config by running the following command in bash inside the repo:
git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'  
  2. Create a .gitattributes file in the directory that contains the notebooks

  3. Add the following to that file:

*.ipynb filter=strip-notebook-output  

After that, commit to git as usual. The notebook output will be stripped out in git commits, but it will remain unchanged locally.
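To check that the setup took effect, the configuration can be read back. The sketch below repeats the steps in a throwaway repository (the `demo_dir` location and the `example.ipynb` name are placeholders, not part of the original instructions) and only inspects the settings. It never runs the filter, so it does not need Jupyter installed.

```shell
# Throwaway repository for the demonstration (placeholder; use your own repo in practice).
demo_dir=$(mktemp -d)
git init -q "$demo_dir"

# Step 1: register the clean filter in the repo-local config.
git -C "$demo_dir" config filter.strip-notebook-output.clean \
  'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'

# Steps 2 and 3: map *.ipynb files to that filter.
printf '*.ipynb filter=strip-notebook-output\n' > "$demo_dir/.gitattributes"

# Read the settings back. Neither command runs the filter itself.
filter_cmd=$(git -C "$demo_dir" config --get filter.strip-notebook-output.clean)
filter_attr=$(git -C "$demo_dir" check-attr filter -- example.ipynb)

echo "$filter_cmd"
echo "$filter_attr"
```

If `git check-attr` reports the filter as `unspecified` instead of `strip-notebook-output`, the .gitattributes file is probably missing or in a directory that does not cover the notebook.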

Source: StackOverflow

How to override the above for a specific notebook

This is useful if you sometimes want to add specific notebooks to git with their cell outputs intact, while keeping the default behavior of clearing cell outputs for everything else.

  1. When adding to git a notebook whose cell outputs you want to keep, instead of the usual git add <path to your notebook> command, use this: git -c filter.strip-notebook-output.clean= add <path to your notebook>
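The effect of the override can be seen by comparing what ends up staged. The sketch below is an illustration under stated assumptions: it swaps the real `jupyter nbconvert` command for a stand-in filter (`sed /output/d`) so it runs without Jupyter, and the two files are plain text with made-up names, not real notebooks. Only the `git -c filter.strip-notebook-output.clean= add` form comes from the step above.

```shell
# Throwaway repository (placeholder location).
repo=$(mktemp -d)
git init -q "$repo"

# Stand-in clean filter: drops every line containing "output".
# This replaces the real nbconvert command only for this demonstration.
git -C "$repo" config filter.strip-notebook-output.clean 'sed /output/d'
printf '*.ipynb filter=strip-notebook-output\n' > "$repo/.gitattributes"

# Two hypothetical "notebooks" with identical content.
printf 'code\noutput\n' > "$repo/stripped.ipynb"
printf 'code\noutput\n' > "$repo/kept.ipynb"

# Normal add: the clean filter runs on the way into the index.
git -C "$repo" add stripped.ipynb

# Override: an empty clean command disables the filter for this one invocation.
git -C "$repo" -c filter.strip-notebook-output.clean= add kept.ipynb

# Inspect the staged copies (":path" names the version in the index).
stripped_staged=$(git -C "$repo" show :stripped.ipynb)
kept_staged=$(git -C "$repo" show :kept.ipynb)

echo "staged stripped.ipynb:"; echo "$stripped_staged"
echo "staged kept.ipynb:";     echo "$kept_staged"
```

The override applies only to that single `git add`. If the notebook is later re-added with the plain command, the filter runs again and the staged copy loses its outputs.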

Source: StackOverflow

@miguel9554

@konradmb your solution is fantastic. Is there a way to push it so that everyone who clones the repo gets the config? As I understand it, this is a local solution. Thanks!

@konradmb

konradmb commented Jun 13, 2024

@miguel9554

  • I don't think that would be possible, as such a feature would be deemed unsafe. For example, an attacker could add a filter that runs rm on all files.
  • Yes, this solution works only locally.
  • There's a workaround at https://stackoverflow.com/a/18330114 but it still requires every user to adjust local settings (they discuss the security concerns too).
  • An idea: maybe you can add a pre-merge filter step to GitHub Actions that would run on every pull request?
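The linked workaround is not reproduced here. One approach of that kind, which may or may not match the linked answer exactly, is to commit the filter definition to a tracked file and have each user opt in once through git's `include.path` setting. In the sketch below, the `.gitconfig` file name and the throwaway `repo` directory are assumptions for illustration. The script only reads the setting back and does not run Jupyter.

```shell
# Throwaway repository standing in for a fresh clone (placeholder location).
repo=$(mktemp -d)
git init -q "$repo"

# Tracked file holding the shared filter definition (file name is an assumption).
cat > "$repo/.gitconfig" <<'EOF'
[filter "strip-notebook-output"]
    clean = jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR
EOF

# One-time opt-in each user runs after cloning.
# The path is resolved relative to .git/config, so ../.gitconfig is the repo root.
git -C "$repo" config --local include.path ../.gitconfig

# The filter is now visible through the include.
shared_cmd=$(git -C "$repo" config --get filter.strip-notebook-output.clean)
echo "$shared_cmd"
```

This still leaves the decision with each user, which is the point of the security concern above: nothing in the repository can switch the filter on without that local command.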
