@technillogue
Forked from anonymous/gist:d432388d5bb9309020a7
Last active August 29, 2015 14:17
# mount a Gutenberg CD
mount -t iso9660 PG2003-08.ISO iso -o loop
# flatten the hierarchy of a Gutenberg CD
find iso -name "*.txt" -exec cp {} corpus/ \;
# strip the corpus
for f in corpus/*; do ./strip.py < "$f" > "words/${f##*/}"; done
# count some words
for f in words/*; do ./wordcount.py < "$f" > "wc/${f##*/}"; done
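# wordcount.py itself is not included in these notes. The block below is only
# a guess at the kind of filter it could be: it assumes words are
# whitespace-separated and lowercased, and that the output is "word<TAB>count"
# with the most frequent words first. count_words and main are made-up names.

```python
import sys
from collections import Counter


def count_words(lines):
    """Count whitespace-separated, lowercased words in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def main(stream=sys.stdin, out=sys.stdout):
    # Write one "word<TAB>count" line per distinct word, most frequent first.
    for word, n in count_words(stream).most_common():
        out.write(f"{word}\t{n}\n")
```

# To use it as a stdin/stdout filter like the loop above expects, call main()
# at the bottom of the script and make the file executable.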
# parse the master list of books in a Gutenberg CD
import csv

with open("master_list.csv") as f:
    # keep only rows with an empty language field and a non-empty text filename
    meta = [
        [(title + ": " + subtitle) if subtitle else title, (ln, fn), txt]
        for title, subtitle, fn, ln, txt, html, catmo, catyear, lang in csv.reader(f)
        if lang == "" and txt != ""
    ]