#!/usr/bin/env python
"""strip outputs from an IPython Notebook

Opens a notebook, strips its output, and writes the outputless version to the original file.

Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.

This does mostly the same thing as the `Clear All Output` command in the notebook UI.

LICENSE: Public Domain
"""

import io
import sys

try:
    # Jupyter >= 4
    from nbformat import read, write, NO_CONVERT
except ImportError:
    # IPython 3
    try:
        from IPython.nbformat import read, write, NO_CONVERT
    except ImportError:
        # IPython < 3
        from IPython.nbformat import current

        # Placeholder so the call in __main__ has a name to pass;
        # the legacy reader below ignores as_version.
        NO_CONVERT = None

        def read(f, as_version):
            return current.read(f, 'json')

        def write(nb, f):
            return current.write(nb, f, 'json')


def _cells(nb):
    """Yield all cells in an nbformat-insensitive manner"""
    if nb.nbformat < 4:
        for ws in nb.worksheets:
            for cell in ws.cells:
                yield cell
    else:
        for cell in nb.cells:
            yield cell


def strip_output(nb):
    """strip the outputs from a notebook object"""
    nb.metadata.pop('signature', None)
    for cell in _cells(nb):
        if 'outputs' in cell:
            cell['outputs'] = []
        # nbformat < 4 stores the prompt counter as 'prompt_number'
        if 'prompt_number' in cell:
            cell['prompt_number'] = None
        # nbformat 4 renamed the counter to 'execution_count'
        if 'execution_count' in cell:
            cell['execution_count'] = None
    return nb


if __name__ == '__main__':
    filename = sys.argv[1]
    with io.open(filename, 'r', encoding='utf8') as f:
        nb = read(f, as_version=NO_CONVERT)
    nb = strip_output(nb)
    with io.open(filename, 'w', encoding='utf8') as f:
        write(nb, f)
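For readers who want to see the idea without `nbformat` installed, the same stripping can be sketched on the raw JSON using only the standard library. This is an illustration rather than the gist's method: it assumes the nbformat 4 layout (a top-level `cells` list, with `outputs` and `execution_count` on code cells), and the helper name `strip_output_json` is hypothetical.

```python
import json


def strip_output_json(text):
    """Return notebook JSON text with code-cell outputs and counters cleared.

    Assumes nbformat 4, where cells live in a top-level "cells" list.
    Hypothetical helper for illustration; not part of nbformat.
    """
    nb = json.loads(text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no outputs
        cell["outputs"] = []
        cell["execution_count"] = None
    return json.dumps(nb, indent=1, sort_keys=True)


if __name__ == "__main__":
    sample = json.dumps({
        "nbformat": 4,
        "nbformat_minor": 0,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
            {
                "cell_type": "code",
                "metadata": {},
                "source": "1 + 1",
                "execution_count": 3,
                "outputs": [{
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "data": {"text/plain": "2"},
                    "metadata": {},
                }],
            },
        ],
    })
    stripped = json.loads(strip_output_json(sample))
    print(stripped["cells"][1]["outputs"])
    print(stripped["cells"][1]["execution_count"])
```

Unlike the script above, this sketch does no validation and does not handle the older `worksheets` layout, so it is best read as a picture of what gets cleared rather than a replacement.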
#!/bin/sh
#
# strip output of IPython Notebooks
# add this as `.git/hooks/pre-commit`
# to run every time you commit a notebook
#
# requires `nbstripout` to be available on your PATH
#
# LICENSE: Public Domain

if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi

# Find notebooks to be committed
(
    IFS='
'
    NBS=`git diff-index -z --cached $against --name-only | grep '.ipynb$' | uniq`

    for NB in $NBS ; do
        echo "Removing outputs from $NB"
        nbstripout "$NB"
        git add "$NB"
    done
)

exec git diff-index --check --cached $against --
I've created a version that removes the whole cell. I have to admit that the way I track the index is not at all optimal, and there may be better ways that make proper use of the API. Feedback welcome:
https://gist.github.com/dietmarw/dc0cf089d8d6211136d5
I have added documentation and an `nbstripout install` command to install the filter in the current Git repository, and turned it into a module with a setuptools script entry point: https://github.com/kynan/nbstripout
How do you feel about publishing that on PyPI @minrk?
I've adapted cfriedline's repo to make it easy to install into any repo as a filter: https://github.com/jond3k/ipynb_stripout
@jond3k Have a look at my repo linked above: it works with v3 and v4 and has an install command to automate the installation in any git repo.
@kynan feel free to put it on PyPI. No need to wait for me.
@minrk OK, will do, thanks!
Great snippet, thanks a lot for sharing!
Two suggestions:
- Small fix: I guess it should be `grep '\.ipynb$'` with the `.` escaped, otherwise the `.` matches any character.
- Also add `| tr -d '\000' |` before grep:
  NBS=`git diff-index -z --cached $against --name-only | tr -d '\000' | grep '\.ipynb$' | uniq`
The second point is because there are cases where grep considers the input binary (https://unix.stackexchange.com/questions/19907/what-makes-grep-consider-a-file-to-be-binary). This happens to me when using zsh (i.e. getting `Binary file (standard input) matches` from grep instead of the matching parts).
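One thing to keep in mind with `-z`: git then separates the names with NUL bytes instead of newlines, so deleting the NULs outright can run several paths together when more than one file is staged. A minimal Python sketch of splitting on NUL and matching the `.ipynb` suffix literally is shown here. The function names `parse_nul_names` and `staged_notebooks` are hypothetical, the `subprocess` call simply mirrors the `git diff-index` command from the hook, and decoding the names as UTF-8 is an assumption.

```python
import subprocess


def parse_nul_names(raw):
    """Split NUL-delimited `--name-only -z` output into unique .ipynb paths.

    `raw` is the bytes output of the git command. Order is preserved.
    Assumes file names are UTF-8 encoded.
    """
    notebooks = []
    seen = set()
    for name in raw.decode("utf-8").split("\0"):
        # str.endswith compares literally, so there is no regex dot to escape
        if name.endswith(".ipynb") and name not in seen:
            seen.add(name)
            notebooks.append(name)
    return notebooks


def staged_notebooks(against="HEAD"):
    """Return staged notebook paths, using the same git command as the hook."""
    raw = subprocess.check_output(
        ["git", "diff-index", "-z", "--cached", against, "--name-only"]
    )
    return parse_nul_names(raw)


if __name__ == "__main__":
    # Names containing spaces survive intact; "xipynb" is not matched
    print(parse_nul_names(b"a.ipynb\0xipynb\0my notes.ipynb\0a.ipynb\0"))
```

Because the bytes never pass through grep, the "Binary file (standard input) matches" message does not come up in this variant.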
Slightly modified method that works with the new notebook format (v4) used in IPython 3:
https://gist.github.com/waylonflinn/010f0a1a66760adf914f
The essential difference is an added check for the presence of the `worksheets` object on the root.