@traek
Last active May 20, 2024 14:26
Simple script to download files from single web page
#!/usr/bin/env bash
# Destination directory is the first argument (default: current directory)
destination="${1:-.}"

# Check for required commands
required=(awk grep lynx wget); missing=()
for cmd in "${required[@]}"; do
    hash "$cmd" 2>/dev/null || missing+=("$cmd")
done
if (( ${#missing[@]} > 0 )); then
    echo "[FATAL] could not find command(s): ${missing[*]}. Exiting!" >&2
    exit 1
fi

# Edit these two variables as needed (example usage for 'The MagPi' magazine issues)
sourcepath="https://www.raspberrypi.org/magpi-issues"
pattern='^MagPi[0-9]*\.pdf$'   # dot escaped and pattern anchored at both ends

echo -n "[GET] Reading source links... "
# lynx lists every link on the page, awk keeps the text after the last '/',
# and grep keeps only the matching file names (mapfile needs bash 4 or later)
mapfile -t files < <(lynx -listonly -dump "$sourcepath" | awk -F'/' '{print $NF}' | grep -- "$pattern")
echo "DONE"

for dl in "${files[@]}"; do
    if [[ ! -f "$destination/$dl" ]]; then
        echo -n "[MISSING] Downloading to $destination/$dl"
        wget -q -P "$destination" --show-progress "$sourcepath/$dl"
    else
        echo "[FOUND] Skipping $destination/$dl"
    fi
done
@traek
Author

traek commented Apr 10, 2019

A very basic script that reads the list of files linked from a single web page and downloads any that are missing from the target directory. Used as-is, it is a handy way to keep up to date on issues of The MagPi magazine, which is free to download.
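The "download only what is missing" idea can be sketched on its own, without any network access. This is a minimal illustration rather than the script's actual code: the function name `missing_files` is hypothetical, and it assumes the candidate file names are already known.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each file name (arguments 2..n) that does not
# yet exist as a regular file in the directory given as argument 1.
missing_files() {
    local dest=$1 name
    shift
    for name in "$@"; do
        [[ -f "$dest/$name" ]] || printf '%s\n' "$name"
    done
}
```

Calling `missing_files "$destination" "${files[@]}"` would list only the names that still need to be fetched, and that list could then be passed to wget.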

The original version of the file is pretty straightforward, but the most recent version adds a check that all required commands are present. I used this as a tool to help my kids learn to write bash scripts while they work with their Raspberry Pis.
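For anyone learning from it, the command check can be pulled out into a small reusable function. The sketch below uses the same `hash` builtin as the script. The function name `missing_commands` is an assumption made for illustration and is not part of the gist.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every argument that bash cannot
# find as a command on PATH. Prints nothing when all are available.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        hash "$cmd" 2>/dev/null || printf '%s\n' "$cmd"
    done
}

# Example: report anything the download script would need but cannot find.
missing_commands awk grep lynx wget
```

An empty result means every command was found. A caller that wants to stop early can test whether the output is non-empty and exit with a non-zero status.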

@traek
Author

traek commented May 31, 2019

I updated this script to use 'lynx' so that it works with sites that have invalid XHTML tags (read: most). Though lynx is less common than the combination of curl and xmllint, it gives far more consistent results.
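The filtering half of that pipeline can be tried without lynx installed. The sketch below feeds it sample text that imitates the numbered link list printed by `lynx -listonly -dump`. The function name `filter_link_names` and the example.com URLs are assumptions for illustration only, and it uses `grep -E` so the pattern can use `+`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a lynx-style link list on stdin and print the
# file names (the text after the last '/') that match the extended regex
# given as argument 1. Lines without any '/' are ignored.
filter_link_names() {
    awk -F'/' 'NF > 1 {print $NF}' | grep -E -- "$1"
}

# Made-up sample input shaped like a lynx link dump.
sample=$'References\n\n   1. https://example.com/issues/MagPi01.pdf\n   2. https://example.com/issues/index.html\n   3. https://example.com/issues/MagPi02.pdf'

printf '%s\n' "$sample" | filter_link_names '^MagPi[0-9]+\.pdf$'
# prints:
# MagPi01.pdf
# MagPi02.pdf
```

With lynx available, the same function could sit at the end of `lynx -listonly -dump "$sourcepath"` in place of the separate awk and grep stages.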

@traek
Author

traek commented Jul 15, 2022

This method no longer works for The MagPi (it broke quite some time ago) but is still useful for similar pages. I made another attempt, which specifically downloads any available Raspberry Pi Press publications, here: https://github.com/traek/pi-tools/blob/main/raspipress.py

@jclack2

jclack2 commented Sep 26, 2023

This is cool - thank you!
