
@azizur
Last active January 1, 2025 11:43
Creating a static copy of a dynamic website

The command line, in short…

wget -k -K -E -r -l 10 -p -N -F --restrict-file-names=windows -nH http://website.com/

…and the options explained

  • -k : convert links to relative so they work in the local copy
  • -K : keep the original versions of files without the conversions made by wget
  • -E : rename HTML files to .html (if they don’t already have an htm(l) extension)
  • -r : recursive… of course we want to make a recursive copy
  • -l 10 : the maximum level of recursion. If you have a really big website you may need a higher number, but 10 levels should be enough.
  • -p : download all files necessary for each page (CSS, JS, images)
  • -N : turn on time-stamping.
  • -F : when input is read from a file, force it to be treated as an HTML file.
  • -nH : by default, wget puts files in a directory named after the site’s hostname. This disables the creation of those hostname directories and puts everything in the current directory.
  • --restrict-file-names=windows : may be useful if you want to copy the files to a Windows PC.

source: http://blog.jphoude.qc.ca/2007/10/16/creating-static-copy-of-a-dynamic-website/
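
A minimal sketch of the same mirror run with some politeness options added (website.com is just a placeholder; adjust the URL and the delays for your target site):

# Same mirror command as above, throttled so it does not hammer the server
wget -k -K -E -r -l 10 -p -N -nH --wait=1 --random-wait --restrict-file-names=windows http://website.com/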

@tiendungitd

Can you use this command to copy a website with authentication? I tried appending the options --user {user_name} --ask-password to this command, but opening the downloaded webpage only showed the authentication page.


azizur commented Dec 24, 2022

The website you are trying to copy does not use AuthType Basic for authentication, hence it did not work.

As per the wget man page:

--user=USER                 set both ftp and http user to USER
--password=PASS             set both ftp and http password to PASS
--ask-password              prompt for passwords
--use-askpass=COMMAND       specify credential handler for requesting
                             username and password.  If no COMMAND is
                             specified the WGET_ASKPASS or the SSH_ASKPASS
                             environment variable is used.

These parameters are for sites that use AuthType Basic.
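
For example, if the site really did use HTTP Basic authentication, something like this should work (the username and URL are placeholders):

wget --user=myuser --ask-password -k -K -E -r -l 10 -p -N -nH http://website.com/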

If the site uses cookies, perhaps you can use the cookie options.

--load-cookies=FILE         load cookies from FILE before session
--save-cookies=FILE         save cookies to FILE after session
--keep-session-cookies      load and save session (non-permanent) cookies
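
A rough sketch of that approach, assuming a form-based login (the /login path and the form field names are guesses you would need to adapt to the real site):

# 1. Log in and save the session cookies
wget --save-cookies cookies.txt --keep-session-cookies --post-data 'username=USER&password=PASS' http://website.com/login

# 2. Mirror the site, sending those cookies with every request
wget --load-cookies cookies.txt -k -K -E -r -l 10 -p -N -nH http://website.com/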

If the site uses a JWT, you can use the header option.

--header=STRING             insert STRING among the headers
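
For example, with a token you have already obtained elsewhere (the token value and URL are placeholders):

wget --header='Authorization: Bearer YOUR_TOKEN' -k -K -E -r -l 10 -p -N -nH http://website.com/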


lamff commented Mar 1, 2024

great!

  • --no-check-certificate : may be useful if the certificate has expired
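
For example (placeholder URL):

wget --no-check-certificate -k -K -E -r -l 10 -p -N -nH https://website.com/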

@xoussamax

Where is the code? The website doesn't work.


azizur commented Sep 15, 2024

Where is the code? The website doesn't work.

What do you mean? Code for what?


Grachy commented Sep 21, 2024

Hello mate, can you explain why wget could not download any CSS or JS? I had to do that manually.


azizur commented Sep 30, 2024

@Grachy Please check that your wget implementation supports the -p flag.

-p : download all files necessary for each page (CSS, JS, images)

It may possibly be under another flag.
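
One way to check is to look at the help output of your wget build, then test the flag on a single page (URL is a placeholder):

# Confirm your wget lists --page-requisites (the long form of -p)
wget --help | grep -i page-requisites

# Fetch one page together with the CSS, JS and images it needs
wget -p -k http://website.com/index.html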
