Last active
November 21, 2024 05:15
import requests
import sys
import json


def waybackurls(host, with_subs):
    # Query the Wayback Machine CDX API for every archived URL under the host.
    if with_subs:
        url = 'http://web.archive.org/cdx/search/cdx?url=*.%s/*&output=json&fl=original&collapse=urlkey' % host
    else:
        url = 'http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey' % host
    r = requests.get(url)
    results = r.json()
    # The first row is the field-name header (["original"]); skip it.
    return results[1:]


if __name__ == '__main__':
    argc = len(sys.argv)
    if argc < 2:
        print('Usage:\n\tpython3 waybackurls.py <url> <include_subdomains:optional>')
        sys.exit()

    host = sys.argv[1]
    # Any second argument enables the subdomain search.
    with_subs = False
    if argc > 2:
        with_subs = True

    urls = waybackurls(host, with_subs)
    json_urls = json.dumps(urls)
    if urls:
        filename = '%s-waybackurls.json' % host
        with open(filename, 'w') as f:
            f.write(json_urls)
        print('[*] Saved results to %s' % filename)
    else:
        print('[-] Found nothing')
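The `results[1:]` slice in the script drops the first row because, with `output=json`, the CDX server returns the field names as the first row. Each remaining row is still a one-element list, so the saved JSON is a list of lists. A minimal sketch of flattening it, working on an already-decoded response so it runs offline; `flatten_cdx_rows` is a hypothetical helper and not part of the script:

```python
def flatten_cdx_rows(rows):
    """Return the original URLs from a decoded CDX JSON response.

    Assumes the first row is the header and that it contains an
    "original" column (true for queries made with fl=original).
    """
    if not rows:
        # An empty response has no header row at all.
        return []
    header, *data = rows
    idx = header.index("original")
    return [row[idx] for row in data]


# Sample data shaped like a CDX JSON response (made up for illustration).
sample = [
    ["original"],
    ["http://example.com/"],
    ["http://example.com/robots.txt"],
]
print(flatten_cdx_rows(sample))
```

Calling `flatten_cdx_rows(r.json())` in place of `results[1:]` would make the output file a flat list of URL strings, if that shape is more convenient.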
Hi,
Just wanted to tell you that I used your idea in https://github.com/akamhy/waybackpy. [commit]
Usage:
pip3 install waybackpy
waybackpy --url akamhy.github.io --user_agent "my-user-agent" --known_urls
Output:
http://akamhy.github.io
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/
https://akamhy.github.io/waybackpy/assets/css/style.css?v=a418a4e4641a1dbaad8f3bfbf293fad21a75ff11
https://akamhy.github.io/waybackpy/assets/css/style.css?v=f881705d00bf47b5bf0c58808efe29eecba2226c
6 URLs found and saved in ./akamhy.github.io-6-urls.txt
Flags:
- '--alive' fetches only URLs that are still live. It is slower for websites with many archived URLs, e.g. google.
- '--subdomain' also includes URLs from subdomains.
See live use @ https://repl.it/@akamhy/Waybackpy-Known-Urls#main.sh
Thank you, man.
What should I do if I have installed wb in Python and want to try it in Go? They have the same initialization. How do I use it in this case?
Hey man, just want to say I used your idea as well. You have been credited :) I made the script because the waybackurls tool was not working on my install.
It works well.
A bash function which uses jq (not for sub-domain search, but it works for any URL prefix). It gives the full web archive URL, which is generally of the format https://web.archive.org/web/$TIMESTAMP/$ORIGINAL
This can be added to ~/.bashrc or the relevant shell profile.
Usage:
wb gist.github.com/mhmdiaa
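As a rough Python counterpart to the bash function described above (not the commenter's code), the sketch below queries the same CDX endpoint with `fl=timestamp,original` and joins each row into the `https://web.archive.org/web/$TIMESTAMP/$ORIGINAL` form. The names `snapshot_url`, `rows_to_snapshot_urls`, and `wayback_snapshots` are hypothetical, and only the standard library is used:

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def snapshot_url(timestamp, original):
    # Full archive URL in the usual /web/<timestamp>/<original> form.
    return "https://web.archive.org/web/%s/%s" % (timestamp, original)


def rows_to_snapshot_urls(rows):
    # Skip the header row (["timestamp", "original"]) and join the rest.
    return [snapshot_url(ts, orig) for ts, orig in rows[1:]]


def wayback_snapshots(prefix):
    # Prefix search only; assumes the same query parameters as the
    # script above, with the timestamp field added.
    query = urllib.parse.urlencode(
        {
            "url": prefix + "/*",
            "output": "json",
            "fl": "timestamp,original",
            "collapse": "urlkey",
        },
        safe="*/,",
    )
    with urllib.request.urlopen(CDX_ENDPOINT + "?" + query, timeout=30) as resp:
        rows = json.loads(resp.read().decode("utf-8"))
    return rows_to_snapshot_urls(rows)


# Example (needs network access):
# for url in wayback_snapshots("gist.github.com/mhmdiaa"):
#     print(url)
```

Because `collapse=urlkey` keeps one row per distinct URL, this yields one snapshot link per URL rather than every capture.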