#!/usr/bin/env python
"""strip outputs from an IPython Notebook

Opens a notebook, strips its output, and writes the outputless version to the original file.

Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.

This does mostly the same thing as the `Clear All Output` command in the notebook UI.

LICENSE: Public Domain
"""
import io
import sys

try:
    # Jupyter >= 4
    from nbformat import read, write, NO_CONVERT
except ImportError:
    # IPython 3
    try:
        from IPython.nbformat import read, write, NO_CONVERT
    except ImportError:
        # IPython < 3
        from IPython.nbformat import current

        # The legacy API has no NO_CONVERT; define a placeholder so the
        # call in __main__ does not raise NameError. read() ignores it.
        NO_CONVERT = None

        def read(f, as_version):
            return current.read(f, 'json')

        def write(nb, f):
            return current.write(nb, f, 'json')

def _cells(nb):
    """Yield all cells in an nbformat-insensitive manner"""
    if nb.nbformat < 4:
        for ws in nb.worksheets:
            for cell in ws.cells:
                yield cell
    else:
        for cell in nb.cells:
            yield cell

def strip_output(nb):
    """strip the outputs from a notebook object"""
    nb.metadata.pop('signature', None)
    for cell in _cells(nb):
        if 'outputs' in cell:
            cell['outputs'] = []
        # nbformat < 4 stores the counter as prompt_number
        if 'prompt_number' in cell:
            cell['prompt_number'] = None
        # nbformat 4 renamed it to execution_count
        if 'execution_count' in cell:
            cell['execution_count'] = None
    return nb

if __name__ == '__main__':
    filename = sys.argv[1]
    with io.open(filename, 'r', encoding='utf8') as f:
        nb = read(f, as_version=NO_CONVERT)
    nb = strip_output(nb)
    with io.open(filename, 'w', encoding='utf8') as f:
        write(nb, f)
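
The docstring mentions use as a git filter. A git clean filter receives the file on stdin and must write the result to stdout, whereas the script above rewrites the file named on the command line. A minimal sketch of a stream-based variant follows. It assumes plain `json` handling instead of `nbformat` so that it has no dependencies, and `_iter_cells`, `strip_output_dict` and `filter_stream` are hypothetical names that are not part of the script above.

```python
import json
import sys


def _iter_cells(nb):
    """Yield cells from a notebook dict (v4: top-level cells; older: worksheets)."""
    for cell in nb.get("cells", []):
        yield cell
    for ws in nb.get("worksheets", []):
        for cell in ws.get("cells", []):
            yield cell


def strip_output_dict(nb):
    """Clear outputs and execution counters on a notebook loaded as a plain dict."""
    nb.get("metadata", {}).pop("signature", None)
    for cell in _iter_cells(nb):
        if "outputs" in cell:
            cell["outputs"] = []
        # prompt_number (nbformat < 4) and execution_count (nbformat 4)
        for key in ("prompt_number", "execution_count"):
            if key in cell:
                cell[key] = None
    return nb


def filter_stream(src, dst):
    """Read notebook JSON from src and write the stripped notebook to dst."""
    nb = strip_output_dict(json.load(src))
    json.dump(nb, dst, indent=1, sort_keys=True, ensure_ascii=False)
    dst.write("\n")
```

Used as a filter, the entry point would be a call such as `filter_stream(sys.stdin, sys.stdout)`. Note that this sketch does not validate the notebook or preserve key order the way `nbformat` does.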
#!/bin/sh | |
# | |
# strip output of IPython Notebooks | |
# add this as `.git/hooks/pre-commit` | |
# to run every time you commit a notebook | |
# | |
# requires `nbstripout` to be available on your PATH | |
# | |
# LICENSE: Public Domain | |

if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi

# Find notebooks to be committed | |
( | |
IFS=' | |
' | |
NBS=`git diff-index -z --cached $against --name-only | grep '.ipynb$' | uniq` | |
for NB in $NBS ; do
    echo "Removing outputs from $NB"
    nbstripout "$NB"
    git add "$NB"
done
)

exec git diff-index --check --cached $against --
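
The hook's selection step asks `git diff-index -z` for the staged paths and keeps those ending in `.ipynb`. With `-z`, git terminates each path with a NUL byte rather than a newline, so the output has to be split on `\0`. The same step can be sketched in Python as below. `staged_notebooks` is a hypothetical helper that is not part of the hook, and it takes the raw command output as an argument so that it can be tried without a repository.

```python
def staged_notebooks(raw):
    """Return unique .ipynb paths from NUL-separated `git diff-index -z --name-only` output."""
    names = raw.decode("utf-8").split("\0")
    seen = []
    for name in names:
        # The empty string left after the trailing NUL fails this test too.
        if name.endswith(".ipynb") and name not in seen:
            seen.append(name)
    return seen
```

In a hook, the bytes could come from `subprocess.check_output(["git", "diff-index", "-z", "--cached", against, "--name-only"])`, with `against` chosen as in the shell script. Because the paths are never split on whitespace, names containing spaces survive intact.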
I have added documentation, an `nbstripout install` command to install the filter in the current Git repository and turned it into a module with a setuptools script entry point: https://github.com/kynan/nbstripout
How do you feel about publishing that on PyPI @minrk?
I've adapted cfriedline's repo to make it easy to install to any repo as a filter https://github.com/jond3k/ipynb_stripout
@jond3k Have a look at my repo linked above: it works with v3 and v4 and has an install command to automate the installation in any git repo.
@kynan feel free to put it on PyPI. No need to wait for me.
@minrk OK, will do, thanks!
Great snippet, thanks a lot for sharing!
Two suggestions:
- Small fix: I guess it should be `grep '\.ipynb$'` with the `.` escaped, else the `.` will match any character.
- Also add `| tr -d '\000'` before grep:
    NBS=`git diff-index -z --cached $against --name-only | tr -d '\000' | grep '\.ipynb$' | uniq`
The second point is because there will be cases where grep considers the input binary (https://unix.stackexchange.com/questions/19907/what-makes-grep-consider-a-file-to-be-binary). This happens to me when using zsh (i.e. getting `Binary file (standard input) matches` from `grep` instead of the matching parts).
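
To illustrate the first suggestion: in a regular expression an unescaped `.` matches any single character, so `.ipynb$` also accepts names that merely end in `ipynb` with some character in front. A small check in Python, using the `re` module as a stand-in for `grep` (the two agree on `.`, `\.` and `$` for this pattern) and made-up file names:

```python
import re

# Hypothetical file names, chosen to show the difference.
names = ["analysis.ipynb", "notes_ipynb", "myipynb", "script.py"]

# Unescaped dot: any character may precede "ipynb".
loose = [n for n in names if re.search(r".ipynb$", n)]

# Escaped dot: only a literal ".ipynb" suffix matches.
strict = [n for n in names if re.search(r"\.ipynb$", n)]

print(loose)
print(strict)
```

Only `analysis.ipynb` passes the escaped pattern, while the unescaped one also lets `notes_ipynb` and `myipynb` through.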
I've created a version that removes the whole cell, although I have to admit that the way I track the index is not at all optimal, and there might be better ways that make proper use of the API. Feedback welcome:
https://gist.github.com/dietmarw/dc0cf089d8d6211136d5