Created October 21, 2016 07:52
Gist: pete-otaqui/969cc810af662e7e8b5a40482817ac91
download a website for offline browsing with wget
#!/bin/bash

wget -E -k -r -p -e robots=off https://some-site.com/docs/

#### Note the following arguments:
# -E : convert downloaded HTML filenames to have a ".html" suffix
# -k : convert internal links within downloaded files to point to other downloaded files
# -r : recursively download by scanning for internal links in pages
# -p : download "page requisites", i.e. images, styles, scripts
# -e robots=off : ignore robots.txt (some sites use it to block mirroring tools)

#### Other useful arguments
# --no-parent : don't ascend in the path hierarchy (useful for grabbing just a "/docs/" section)
# -A "/index.html,*.svg,*/docs/*" : comma-separated "accept list"; patterns are allowed
# -R "*.eot,*.woff,/archive" : comma-separated "reject list"; patterns are allowed
# -H : span host names (careful you don't try to download the entire web)
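Putting a few of the optional arguments together, here is a hedged sketch of a fuller invocation: mirror only a "/docs/" section and reject web-font files. The URL `https://example.com/docs/` and the helper function `build_mirror_cmd` are illustrative placeholders, not part of the original gist; the function echoes the command instead of executing it, so you can inspect it before running anything against a real site.

```shell
#!/bin/bash
# Sketch: assemble a wget command that mirrors one section of a site.
# --no-parent keeps the crawl inside /docs/; -R skips .eot/.woff fonts.
build_mirror_cmd() {
  local url="$1"
  # Print the command rather than running it (no network access needed).
  printf '%s ' wget -E -k -r -p -e robots=off --no-parent \
    -R '*.eot,*.woff' "$url"
}

build_mirror_cmd "https://example.com/docs/"
echo
```

To actually run the mirror, replace `printf '%s '` with a direct invocation of `wget` (or pipe the echoed command through `sh` once you are happy with it).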