Created February 12, 2017 16:14
Rough script to extract images from HTTP Archive (HAR) files
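const fs = require('fs');

// All request/response entries live under the top-level `log` key of a HAR file.
const file = JSON.parse(fs.readFileSync('./dump.har', 'utf8')).log;
const targetMimeType = 'image/jpeg';
let count = 0;
for (const entry of file.entries) {
  const content = entry.response.content;
  // Only bodies flagged as base64 can be decoded back into the original image bytes;
  // decoding a plain-text or missing body would produce a corrupt file.
  if (content.mimeType === targetMimeType && content.text && content.encoding === 'base64') {
    // ensure output directory exists before running!
    // Use a .jpg extension so it matches the JPEG MIME type being filtered on.
    fs.writeFileSync(`output/${count + 1}.jpg`, Buffer.from(content.text, 'base64'));
    count++;
  }
}
console.log(`Grabbed ${count} files`);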
Thanks. But the generated image can't be read. It's corrupt.
Thanks for the work and the time to share this.
I've made some further improvements to make the script more usable while handling large archives with thousands of files.
It saves the files concurrently and displays a text progress bar in the console while doing so. It also keeps the original file names from the request URL and creates the output directory first if it does not exist yet.
No external dependencies. Just run it normally: