/* open up chrome dev tools (Menu > More tools > Developer tools)
 * go to network tab, refresh the page, wait for images to load (on some sites you may have to scroll down to the images for them to start loading)
 * right click/ctrl click on any entry in the network log, select Copy > Copy All as HAR
 * open up JS console and enter: var har = [paste]
 * (pasting could take a while if there's a lot of requests)
 * paste the following JS code into the console
 * copy the output, paste into a text file
 * open up a terminal in same directory as text file, then: wget -i [that file]
 */
var imageUrls = [];
har.log.entries.forEach(function (entry) {
  // This step filters out all URLs except images. If you want e.g. just jpg's, check mimeType against "image/jpeg", etc.
  if (entry.response.content.mimeType.indexOf("image/") !== 0) return;
  imageUrls.push(entry.request.url);
});
console.log(imageUrls.join('\n'));
Thank you.
Thank you!!! OMG :)
lmfaoooo this is super clever haha
Thank you so much! I found that pasting does take some time, so I wrote a Python script to get the image URLs offline. See below, please.
import json
from haralyzer import HarParser

# Save the .har file from Developer tools (roughly the same steps as above) and parse it offline.
# Even with many images to download, there is no long wait for a paste to finish.
with open('source_har.har', 'r', encoding='utf-8') as f:
    har_parser = HarParser(json.loads(f.read()))

data = har_parser.har_data["entries"]
image_urls = []
for entry in data:
    # Keep only entries whose response MIME type starts with "image/".
    if entry["response"]["content"]["mimeType"].find("image/") == 0:
        image_urls.append(entry["request"]["url"])

# Save the URL list to a text file directly.
with open('target_link.txt', 'w') as f:
    for link in image_urls:
        f.write("%s\n" % link)
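For anyone who would rather not install haralyzer, the same filter can probably be done with only the standard library. This is a minimal sketch, not the script above: the function names `image_urls_from_har` and `write_url_list` are made up for illustration, the de-duplication step is my own addition, and it assumes the file has the usual `log.entries` layout that the Network tab exports.

```python
import json


def image_urls_from_har(har):
    """Return image URLs from a parsed HAR dict, in log order, without duplicates."""
    urls = []
    seen = set()
    for entry in har.get("log", {}).get("entries", []):
        # mimeType can be missing or empty (e.g. failed requests), so default to "".
        mime = entry.get("response", {}).get("content", {}).get("mimeType") or ""
        url = entry.get("request", {}).get("url")
        if mime.startswith("image/") and url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def write_url_list(har_path, out_path):
    """Read a .har file and write one image URL per line, ready for `wget -i`."""
    with open(har_path, "r", encoding="utf-8") as f:
        har = json.load(f)
    urls = image_urls_from_har(har)
    with open(out_path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
    return len(urls)
```

Calling `write_url_list("source_har.har", "target_link.txt")` should produce the same kind of list as the script above, with repeated URLs written only once.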
@puziyi thanks for the python script!
hi, I just created an account to respond to this thread. I followed the steps up until "7. copy the output, paste into a text file" because when I attempt to copy the output or right click in it, it freezes DevTools, and if I'm not on the tab for too long, DevTools will become blank unless I refresh. I'm also having trouble downloading Python, so I can't use the offline downloader script provided by @puziyi. how do I circumvent the first issue?
I figured out a workaround a while ago by using Mozilla Firefox and following the steps from there. Now, my issue is at "8. open up a terminal in same directory as text file, then: wget -i [that file]" because when I input "wget -i [the file path]", Windows Terminal at first needed me to "Supply values for the following parameters: Uri:" and typing the target website comes back with an error. Should I go somewhere else because my problem deviates from the original topic?
when I input "wget -i [the file path]", Windows Terminal at first needed me to "Supply values for the following parameters: Uri:" and typing the target website comes back with an error
The instructions I wrote are for Linux. I didn't think Windows even had wget, but it sounds like it does, just with a different interface. Look up how to download files on Windows using a text file with a list of URLs.
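If Python is available, one way to do the `wget -i` step the same way on any OS is a short standard-library script. This is only a sketch under a few assumptions: the helper names `filename_from_url` and `download_all` are hypothetical, the list file has one URL per line, and the images do not need cookies or special headers.

```python
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(url, index):
    """Pick a local file name from the URL path, falling back to a numbered name."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or "image_%d" % index


def download_all(list_path, out_dir="."):
    """Download every URL listed (one per line) in list_path into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    with open(list_path, "r", encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    for index, url in enumerate(urls):
        target = os.path.join(out_dir, filename_from_url(url, index))
        try:
            urllib.request.urlretrieve(url, target)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print("skipped %s: %s" % (url, err))
```

Run it with `download_all("target_link.txt", "images")`. Two URLs that end in the same file name will overwrite each other, so treat this as a starting point rather than a full wget replacement.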
Cool. Thanks!
I don't understand what you mean by "it takes a long time to paste". When I paste, it's instant, and then I get the message "undefined".
I don't understand what you mean by "it takes a long time to paste". When I paste, it's instant, and then I get the message "undefined".
It happens when the HAR is large, for example when the page loads a lot of images.
Update: we can also use Charles or Fiddler to proxy the Chrome/Firefox HTTP traffic, then select all the image files and save them to your computer. Remember to add a file extension such as jpeg or png afterwards. This is effective when the images can only be downloaded with cookies. However, this method won't keep the files in the order shown in the DevTools Network panel.
Here is example Python code for downloading images from a HAR file with cookies, inspired by @puziyi:
import json
import requests

with open('source_har.har', 'r', encoding="utf-8") as f:
    har_json = json.loads(f.read())

for i, entry in enumerate(har_json['log']["entries"]):
    if entry["response"]["content"]["mimeType"].find("image/jpeg") == 0:
        url = entry["request"]["url"]
        name = str(i) + '.jpeg'
        # HAR stores request cookies as a list of {"name": ..., "value": ...} objects;
        # requests expects a plain {name: value} dict.
        cookies = {c["name"]: str(c["value"]) for c in entry["request"]["cookies"]}
        img = requests.get(url, cookies=cookies).content
        with open(name, 'wb') as out:
            out.write(img)
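On the file order and file extension points: the HAR already records each response's MIME type and the order of requests, so names can be derived from those. A rough sketch follows, where the `MIME_TO_EXT` table, the `.img` fallback, and the function name `numbered_image_names` are all my own assumptions rather than anything from the scripts above.

```python
# Hypothetical lookup table; extend it for other image types you expect.
MIME_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/svg+xml": ".svg",
}


def numbered_image_names(har):
    """Yield (file_name, url) pairs for image entries, numbered in network-log order."""
    entries = har["log"]["entries"]
    width = len(str(len(entries)))  # zero-pad so names sort in log order
    for i, entry in enumerate(entries):
        raw = entry["response"]["content"].get("mimeType") or ""
        # Drop parameters such as "; charset=utf-8" before the lookup.
        mime = raw.split(";")[0].strip().lower()
        if not mime.startswith("image/"):
            continue
        ext = MIME_TO_EXT.get(mime, ".img")  # ".img" is an arbitrary fallback
        yield str(i).zfill(width) + ext, entry["request"]["url"]
```

Each pair can then be passed to whatever download call you use, for example `requests.get(url, cookies=...)`, and written to `file_name`.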
This worked for me on Windows 10:
& 'C:\path\to\wget.exe' -r -nH --cut-dirs=<N> -P 'C:\Path\to\output' -i 'target_link.txt'
Thanks! saved me some time <3
If you get the error "'wget' is not recognized as an internal or external command", follow this guide: https://bobbyhadz.com/blog/wget-is-not-recognized-as-internal-or-external-command
On Windows, instead of
"wget -i [that file]"
use the following command from PowerShell:
Get-Content [that file] | ForEach-Object { Invoke-WebRequest -Uri $_ -OutFile (Split-Path -Leaf $_) }
Fantastic demo. It does get sluggish for me, though, and the inspector tends to freeze depending on the number of images.