import json
import sys

import requests


def waybackurls(host, with_subs):
    """Return every unique archived URL for host from the Wayback Machine CDX API."""
    if with_subs:
        # '*.host/*' also matches URLs on subdomains of host
        url = 'http://web.archive.org/cdx/search/cdx?url=*.%s/*&output=json&fl=original&collapse=urlkey' % host
    else:
        url = 'http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey' % host
    # Large domains can take a while; the timeout keeps the request from hanging forever
    r = requests.get(url, timeout=120)
    r.raise_for_status()
    results = r.json()
    # The first row is the header (["original"]); the remaining rows are [url]
    return results[1:]


if __name__ == '__main__':
    argc = len(sys.argv)
    if argc < 2:
        print('Usage:\n\tpython3 waybackurls.py <host> <include_subdomains:optional>')
        sys.exit(1)
    host = sys.argv[1]
    # Any second argument (sys.argv[2]) turns on subdomain matching
    with_subs = argc > 2
    urls = waybackurls(host, with_subs)
    if urls:
        filename = '%s-waybackurls.json' % host
        with open(filename, 'w') as f:
            json.dump(urls, f)
        print('[*] Saved results to %s' % filename)
    else:
        print('[-] Found nothing')
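Building the query URL and parsing the JSON response can be split out from the network call, which makes both easy to check offline. This is a minimal sketch, assuming the CDX server answers `fl=original` with a header row followed by one single-element row per URL (which is what `results[1:]` above relies on). The helper names `build_cdx_url` and `parse_cdx_rows` are hypothetical, not part of the original script.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = 'http://web.archive.org/cdx/search/cdx'


def build_cdx_url(host, with_subs=False):
    """Build the same CDX query string that waybackurls() uses (hypothetical helper)."""
    # '*.host/*' matches subdomains too; 'host/*' matches only the host itself.
    pattern = '*.%s/*' % host if with_subs else '%s/*' % host
    params = {
        'url': pattern,
        'output': 'json',
        'fl': 'original',       # only return the original URL column
        'collapse': 'urlkey',   # drop repeated captures of the same URL
    }
    # safe='*/' keeps the wildcard and slashes literal, as in the hand-written URL.
    return '%s?%s' % (CDX_ENDPOINT, urlencode(params, safe='*/'))


def parse_cdx_rows(rows):
    """Drop the header row and flatten [[url], [url], ...] into [url, ...]."""
    if not rows:
        return []
    return [row[0] for row in rows[1:]]
```

With these helpers, `waybackurls()` reduces to `parse_cdx_rows(requests.get(build_cdx_url(host, with_subs), timeout=120).json())`, and the output becomes a flat list of strings instead of a list of one-element lists.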
A Bash function that uses jq (no subdomain search, but it works for any URL prefix). It prints the full Wayback Machine URL of each snapshot, which has the form https://web.archive.org/web/$TIMESTAMP/$ORIGINAL:
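wb ()
{
    if [[ -z $1 ]]; then
        # $0 inside a function is the shell's name, not "wb", so name the function explicitly
        echo "Usage: wb URL"
        return 1
    else
        # jq -r prints plain strings instead of JSON-quoted ones
        curl -s "http://web.archive.org/cdx/search/cdx?url=$1/*&output=json&fl=original,timestamp" | jq -r '.[1:][] | "https://web.archive.org/web/" + .[1] + "/" + .[0]' 2> /dev/null
    fi
}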
This can be added to ~/.bashrc or the relevant shell profile.
Usage: wb gist.github.com/mhmdiaa
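The jq filter above translates directly to Python for anyone who wants the snapshot URLs inside the original script. This is a minimal sketch, assuming the rows come from a query with `fl=original,timestamp`, so each data row after the header is `[original, timestamp]`. The function name `to_archive_urls` and the sample row are made up for illustration.

```python
def to_archive_urls(rows):
    """Turn CDX rows of [original, timestamp] into Wayback Machine snapshot URLs.

    Equivalent to the jq filter:
    .[1:][] | "https://web.archive.org/web/" + .[1] + "/" + .[0]
    """
    # rows[0] is the header row (["original", "timestamp"]), so skip it.
    return ['https://web.archive.org/web/%s/%s' % (timestamp, original)
            for original, timestamp in rows[1:]]


if __name__ == '__main__':
    # Made-up sample response, shaped like the CDX JSON output
    sample = [
        ['original', 'timestamp'],
        ['https://gist.github.com/mhmdiaa', '20200101000000'],
    ]
    for url in to_archive_urls(sample):
        print(url)
```

The order of the fields matters: with `fl=original,timestamp`, index 0 is the URL and index 1 is the timestamp, which is why the jq filter concatenates `.[1]` before `.[0]`.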
Hi,
Just wanted to tell you that I used your idea in https://github.com/akamhy/waybackpy.
Usage:
pip3 install waybackpy
waybackpy --url akamhy.github.io --user_agent "my-user-agent" --known_urls
Output:
http://akamhy.github.io
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/
https://akamhy.github.io/waybackpy/assets/css/style.css?v=a418a4e4641a1dbaad8f3bfbf293fad21a75ff11
https://akamhy.github.io/waybackpy/assets/css/style.css?v=f881705d00bf47b5bf0c58808efe29eecba2226c
6 URLs found and saved in ./akamhy.github.io-6-urls.txt
Flags:
- '--alive' only returns URLs that are still live. It is slower for websites with many archived URLs, e.g. google.
- '--subdomain' also includes URLs from subdomains.
See it in use at https://repl.it/@akamhy/Waybackpy-Known-Urls#main.sh
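The slowdown mentioned for '--alive' presumably comes from having to contact every archived URL on the live site, so the cost grows with the number of URLs. The following is not waybackpy's implementation; it is a minimal sketch of that kind of filter, assuming "alive" means a HEAD request succeeds (urllib raises an error for 4xx/5xx responses). The names `is_alive` and `filter_alive` are hypothetical, and the checks run in a thread pool to reduce the wall-clock time.

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.request import Request, urlopen


def is_alive(url, timeout=10):
    """Return True if a HEAD request to url succeeds (hypothetical check)."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx statuses
        with urlopen(Request(url, method='HEAD'), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, HTTPException):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs
        return False


def filter_alive(urls, check=is_alive, workers=8):
    """Keep only the URLs for which check(url) is True, preserving input order."""
    urls = list(urls)  # allow generators; we iterate twice
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, urls))
    return [url for url, ok in zip(urls, results) if ok]
```

Passing `check` as a parameter keeps the filtering logic testable without network access; some servers reject HEAD requests, so a real tool might fall back to GET.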
Thank you, man!
What should I do if I have installed wb in Python and want to try the Go version? They are invoked the same way. How do I use it in this case?
Hey man, just want to say I used your idea as well. You have been credited :) I made the script because the waybackurls tool was not working on my install.
It works well.
pip3 install requests