Created
February 12, 2017 16:14
Rough script to extract images from HTTP Archive (HAR) files
const fs = require('fs');

// all captured requests live under log.entries in a HAR file
const file = JSON.parse(fs.readFileSync('./dump.har', 'utf8')).log;
const targetMimeType = 'image/jpeg';
let count = 0;
for (const entry of file.entries) {
  const content = entry.response.content;
  // skip entries with no captured body; Buffer.from(undefined) throws a TypeError
  if (content.mimeType === targetMimeType && content.text) {
    count++;
    // ensure output directory exists before running!
    // Buffer.from replaces the deprecated `new Buffer(...)`; .jpg matches the JPEG MIME type
    fs.writeFileSync(`output/${count}.jpg`, Buffer.from(content.text, 'base64'));
  }
}
console.log(`Grabbed ${count} files`);
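The script above matches a single MIME type. As a minimal sketch of how it could cover several image types, the file extension can be looked up from the MIME type. The extensionByMimeType table and the extensionFor helper are hypothetical names for illustration and are not part of the original gist.

```javascript
// Hypothetical lookup table: extend it with whatever image types the archive contains.
const extensionByMimeType = new Map([
  ['image/jpeg', 'jpg'],
  ['image/png', 'png'],
  ['image/gif', 'gif'],
  ['image/webp', 'webp'],
]);

// Returns the file extension for a HAR entry, or null if it is not a known image type.
function extensionFor(entry) {
  const raw = entry.response.content.mimeType || '';
  // Drop any parameters after ';' and normalise case before the lookup.
  const mimeType = raw.split(';')[0].trim().toLowerCase();
  return extensionByMimeType.get(mimeType) || null;
}
```

In the loop, an entry would then be written only when extensionFor(entry) returns a non-null value, and that value would be used as the file extension.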
Thanks for the work and for taking the time to share this.
I've made some further improvements to make the script more usable while handling large archives with thousands of files.
It saves the files concurrently and displays a text progress bar in the console while doing so. It also keeps the original file names from the request URL, and creates the output directory first if it does not exist yet.
const fs = require('fs');
const fsAsync = require('fs').promises;

const targetMimeType = 'image/jpeg';
const file = JSON.parse(fs.readFileSync('./dump.har', 'utf8')).log;

// create the output directory if it does not exist yet
const dir = './output';
if (!fs.existsSync(dir)) {
  fs.mkdirSync(dir);
}

// renders a text based progress bar to the console
const width = 30;
const displayProgress = (cur, total) => {
  const pct = cur / total;
  // repeat() needs whole numbers, so round the filled width and derive the rest from it
  const done = Math.round(pct * width);
  const remaining = width - done;
  const filled = '\u2588'.repeat(done);
  const empty = '\u2591'.repeat(remaining);
  // '\r' moves the cursor back to the start of the line in stdout,
  // so each update overwrites the previous one
  process.stdout.write(`\r${filled}${empty} | ${Math.round(pct * 100)}% | ${cur} of ${total} images saved.`);
};

const promises = [];
let started = 0;
let finished = 0;
for (const entry of file.entries) {
  if (entry.response.content.mimeType === targetMimeType) {
    const pathParts = new URL(entry.request.url).pathname.split('/');
    const filename = pathParts.pop() || pathParts.pop(); // Pop twice to avoid potential trailing slash
    // start the write without awaiting it, so all files are saved concurrently
    promises.push(fsAsync.writeFile(`${dir}/${filename}`, entry.response.content.text, 'base64')
      .then(() => {
        finished++;
        displayProgress(finished, started);
      })
      .catch(err => {
        console.error(err);
      })
    );
    started++;
  }
}
Promise.all(promises).then(() => {
  process.stdout.write(`\n\u2713 Done.\n`);
});
No external dependencies. Just run it normally:
node .\har-extract.js
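One thing to watch for when reusing names from the request URL: two entries with the same last path segment (for example /a/logo.jpg and /b/logo.jpg) would be written to the same file, and whichever write finishes last wins. A rough sketch of one way around that follows; uniqueName is a hypothetical helper and not part of the script above.

```javascript
const path = require('path');

// Names that have already been handed out during this run.
const used = new Set();

// Hypothetical helper: returns `name` if it is still free, otherwise the first
// free variant of the form base-1.ext, base-2.ext, ...
function uniqueName(name) {
  const { name: base, ext } = path.parse(name);
  let candidate = name;
  for (let i = 1; used.has(candidate); i++) {
    candidate = `${base}-${i}${ext}`;
  }
  used.add(candidate);
  return candidate;
}
```

In the loop, the output path would then be built from uniqueName(filename) instead of filename.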
Thanks. But the generated image can't be read. It's corrupt.
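A possible cause, offered as a guess rather than a diagnosis: in the HAR format, the optional content.encoding field says how content.text is encoded, and it is set to base64 only when the body really is base64. Decoding a body that was stored some other way, or one that is missing or truncated, produces an unreadable file. The sketch below decodes only when the field says base64 and then checks for the JPEG signature (the bytes FF D8 FF). The decodeJpeg name is hypothetical.

```javascript
// Hypothetical helper: returns a Buffer with the image bytes, or null if the
// entry does not look like a usable JPEG.
function decodeJpeg(entry) {
  const { text, encoding } = entry.response.content;
  // No body was captured for this entry.
  if (!text) return null;
  // Only a base64 body can be turned back into the original bytes.
  if (encoding !== 'base64') return null;
  const bytes = Buffer.from(text, 'base64');
  // Every JPEG file starts with the bytes FF D8 FF.
  const isJpeg = bytes.length > 3 &&
    bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
  return isJpeg ? bytes : null;
}
```

Entries for which this returns null can be logged with their request URL, which makes it easier to see whether the archive or the script is at fault.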
Thanks for the script.
But it ran into an error saying "TypeError: First argument must be a string, Buffer, ArrayBuffer, Array, or array-like object." from Buffer. So I edited the script to make it work. Since 'base64' is a valid encoding for writeFileSync (from this SO answer), we can just use 'base64' without creating a Buffer object.
And to execute, I ran the following in the PowerShell console.