|
au,gov,australia)/about 20070831172339 http://australia.gov.au/about text/html 200 ZUEQ3STH3JAEABZG22LQI626TTY7DN2A - - - 14369759 NLA-AU-CRAWL-002-20070831172246-04117-crawling015.us.archive.org.arc.gz |
|
au,gov,australia)/about 20080719174427 http://www.australia.gov.au/About text/html 200 CGSTTFZGMVAHEOMHQTGTUZUG46MLBFL6 - - - 62867360 NLA-AU-CRAWL-003-20080719174211-01545-crawling104.us.archive.org.arc.gz |
|
au,gov,australia)/about 20090916104859 http://www.australia.gov.au/about text/html 200 7VXWF4Y6TXFWR7JZPORIEHUD5ORMHBMY - - - 59828846 NLA-AU-CRAWL-004-20090916104520-09084-crawling106.us.archive.org.arc.gz |
|
au,gov,australia)/about 20091112023446 http://australia.gov.au/about text/html 200 7VXWF4Y6TXFWR7JZPORIEHUD5ORMHBMY - - - 70365777 NLA-AU-CRAWL-004-PATCH-20091112023201-00275-crawling108.us.archive.org.arc.gz |
|
au,gov,australia)/about 20110216141839 http://www.australia.gov.au/about - 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ - - - 765762042 NLA-AU-CRAWL-005-20110216141406-00155-crawling218.us.archive.org.warc.gz |
|
au,gov,australia)/about 20110217132700 http://australia.gov.au/about text/html 200 3JQ6HKH4HXEI4G335KENZBQNCFHF7PP4 - - - 477571492 NLA-AU-CRAWL-005-20110217132352-00349-crawling218.us.archive.org.warc.gz |
|
au,gov,australia)/about 20110226123639 http://australia.gov.au/about text/html 200 DRP6CY44HXCJP4TNTMNOKE6AF3ZANGVU - - - 343881716 NLA-AU-CRAWL-005-20110226123146-00037-crawling218.us.archive.org.warc.gz |
|
au,gov,australia)/about 20110226133347 https://australia.gov.au/about text/html 200 WBAG4MI6N5QCQ2LFLKSA3OQ6RZUMPTMO - - - 593342593 NLA-AU-CRAWL-005-20110226132237-00040-crawling218.us.archive.org.warc.gz |
|
au,gov,australia)/about 20110328204616 http://australia.gov.au/about text/html 200 Z34GAL7DQINDJUXUS4CGPEL4YK4FRIOH - - - 656072390 NLA-AU-CRAWL-005-20110328201652-00001-crawling218.us.archive.org.warc.gz |
|
au,gov,australia)/about 20110422083017 http://australia.gov.au/about text/html 200 BPPC5KI3E44TVMKFA66ZFUUT46KP7SAV - - - 513717895 NLA-AU-CRAWL-005-20110422082730-00024-crawling213.us.archive.org.warc.gz |
|
au,gov,australia)/about 20120321062048 http://australia.gov.au/about text/html 200 RWIUXTTE64RHNEWQCCL7UEDIGZJNPLVJ - - - 198474221 NLA-AU-CRAWL-006-20120321061732570-00098-3266~web-crawl001.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130409003017 http://australia.gov.au/about text/html 200 IB3AMRZJMPFIATC6WQHPH4LVUUACXAW7 - - - 718271905 NLA-AU-CRAWL-04-03-2013-20130409002124240-00009-27793~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130409234435 https://www.australia.gov.au/about application/http 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ - - - 855531863 NLA-AU-CRAWL-04-03-2013-20130409232851898-00397-27793~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130410112908 https://australia.gov.au/about text/html 200 PNFEFWUCNGAVTARXTS5LLSOMATLRFRG3 - - - 11352123 NLA-AU-CRAWL-04-03-2013-20130410112854462-00572-27793~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130421094357 https://www.australia.gov.au/about warc/revisit - 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ - - - 218425876 NLA-AU-CRAWL-04-03-2013-20130421093701746-01421-29417~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130421122342 https://australia.gov.au/about text/html 200 PICUVAYGMZY5IOXPWHKLE6BVYFACC7LG - - - 353874381 NLA-AU-CRAWL-04-03-2013-20130421121108172-01443-29417~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130427095452 https://www.australia.gov.au/about warc/revisit - 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ - - - 903094466 NLA-AU-CRAWL-04-03-2013-20130427085524926-01687-433~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130427132450 https://australia.gov.au/about text/html 200 PNX76BM2Z5WK4H66M4LGLE25AXLC5SZ5 - - - 286205731 NLA-AU-CRAWL-04-03-2013-20130427131330785-01699-433~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130502072522 http://australia.gov.au/about text/html 200 NUUDONRPIPF3FBGZL2UZRVUKIJPL6G2F - - - 659173849 NLA-AU-CRAWL-04-03-2013-20130502071652498-00011-26913~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130503074233 https://www.australia.gov.au/about application/http 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ - - - 68418494 NLA-AU-CRAWL-04-03-2013-20130503074145197-00410-26913~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130503082356 https://australia.gov.au/about text/html 200 DLJ7MHJX7XSBHL3DSK4AJTP47YJQ4XD6 - - - 169734931 NLA-AU-CRAWL-04-03-2013-20130503082227277-00426-26913~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130509171208 https://www.australia.gov.au/about application/http 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ - - - 469007252 NLA-AU-CRAWL-04-03-2013-20130509170233360-01422-2357~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20130509202803 https://australia.gov.au/about text/html 200 NW2L4S35DERPWVH63TGZRN67PILW5BSK - - - 76130469 NLA-AU-CRAWL-04-03-2013-20130509202651491-01459-2357~wbgrp-crawl008.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20140114001125 http://australia.gov.au/about text/html 200 N75NS3Y3B44BJE22SDICYCBKE5YQKHUI - - 8295 476816575 NLA-AU-TEST-01-10-2014-20140114000228973-00012-24613~wbgrp-crawl003.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20140126200707 http://australia.gov.au/about text/html 200 V3PBNQEH6EPI5ARS6HOTI4GA7MA37ZG6 - - 8316 40437948 NLA-AU-CRAWL-01-21-2014-20140126200633086-00572-25807~wbgrp-crawl004.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about 20140407081604 http://www.australia.gov.au/about text/html 200 AKSSUZJLOW3BJF546AAVCYPSJH3PSCRF - - 8265 42047962 NLA-AU-CRAWL-01-21-2014-20140407081424273-03408-7081~wbgrp-crawl004.us.archive.org~8443.warc.gz |
|
au,gov,australia)/about-australia 20050622180623 http://australia.gov.au/about-australia text/html 200 1a1eb13d0f84d6f7980546cf1254e019 - - - 15680184 NLA-AU-CRAWL-000-20050622180402-06036-crawling016.archive.org |
|
au,gov,australia)/about-australia 20060819094822 http://australia.gov.au/about-australia text/html 200 751c368557765d512bf9ec76ba513ff5 - - - 40191902 NLA-AU-CRAWL-001-20060819094631-00724-crawling01.us.archive.org |
|
au,gov,australia)/about-australia 20060820210554 http://australia.gov.au/about-australia text/html 200 751c368557765d512bf9ec76ba513ff5 - - - 62545839 NLA-AU-CRAWL-001-20060820210248-02827-crawling01.us.archive.org |
|
au,gov,australia)/about-australia 20070830152508 http://australia.gov.au/about-australia text/html 404 6ZY2SKU552PWOLOTNZF5OTDE4WUJISSN - - - 8534439 NLA-AU-CRAWL-002-20070830152458-02848-crawling015.us.archive.org.arc.gz |
Annoyingly, RocksDB just seems to silently skip compression if it's not built with snappy, even if you explicitly set the compression algorithm option. I'm not sure if there's a proper way to check for this. I first noticed because the file sizes were larger than I was expecting, and then I confirmed what it was doing by reading the raw .sst database files.
I don't have any uncompressed examples handy, but if compression is working and you run hexdump or strings on an .sst file, you'll only see full URLs at the start of each compression block (~8KB, but it varies). The records that follow will only have small fragments, as the algorithm reuses previous strings. An uncompressed index will spell out the full URL in each record and be a lot more human-readable.
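If you'd rather not eyeball hexdump output, here is a rough sketch of the same check in Python. It is only a heuristic and not a RocksDB API: it counts how often a URL fragment you know is in the index appears verbatim in the file, and it also tries recompressing the bytes with zlib (used here as a stand-in probe, not snappy) to see how much redundancy is left. The function names, the 16 MiB read limit, and the file name in the usage note are all made up for illustration.

```python
import zlib
from typing import Tuple


def count_occurrences(data: bytes, needle: bytes) -> int:
    """Count non-overlapping verbatim occurrences of needle in data."""
    return data.count(needle)


def zlib_ratio(data: bytes) -> float:
    """Return compressed size / original size, using zlib as a rough probe.

    A low ratio means the bytes still contain a lot of redundancy,
    which suggests they were not compressed when written.
    """
    if not data:
        return 1.0
    return len(zlib.compress(data)) / len(data)


def inspect_sst(path: str, needle: bytes,
                limit: int = 16 * 1024 * 1024) -> Tuple[int, float]:
    """Read up to `limit` bytes of the file and report both measurements.

    The limit is an arbitrary assumption to avoid loading a huge file.
    """
    with open(path, "rb") as f:
        data = f.read(limit)
    return count_occurrences(data, needle), zlib_ratio(data)


if __name__ == "__main__":
    import sys

    # Usage: python check_sst.py <file.sst> <url-fragment>
    hits, ratio = inspect_sst(sys.argv[1], sys.argv[2].encode())
    print(f"verbatim matches: {hits}")
    print(f"zlib ratio: {ratio:.2f}")
```

For example, `python check_sst.py some-file.sst australia.gov.au/about` (file name hypothetical). If the fragment shows up roughly once per record and the zlib ratio is low, the index is probably uncompressed. If it only shows up now and then and zlib barely shrinks the data, compression is most likely working. I haven't calibrated any thresholds, so treat the numbers as a hint only.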