Last active
May 20, 2024 14:26
Simple script to download files from single web page
#!/usr/bin/env bash
# Destination directory: first argument, or the current directory by default
destination=${1:-.}

# Check for required commands
required=(awk grep lynx wget); missing=()
for command in "${required[@]}"; do
    hash "$command" 2>/dev/null || missing+=("$command")
done
if (( ${#missing[@]} > 0 )); then
    echo "[FATAL] could not find command(s): ${missing[*]}. Exiting!"
    exit 1
fi

# Edit these two variables as needed (example usage for 'The MagPi' magazine issues)
sourcepath="https://www.raspberrypi.org/magpi-issues"
pattern='^MagPi[0-9]*\.pdf'

echo -n "[GET] Reading source links... "
# lynx lists every link on the page, awk keeps the last path component, grep filters by pattern
mapfile -t files < <(lynx -listonly -dump "$sourcepath" | awk -F'/' '{print $NF}' | grep -- "$pattern")
echo "DONE"

for dl in "${files[@]}"; do
    if [[ ! -f $destination/$dl ]]; then
        echo -n "[MISSING] Downloading to $destination/$dl"
        wget -q -P "$destination" --show-progress "$sourcepath/$dl"
    else
        echo "[FOUND] Skipping $destination/$dl"
    fi
done
I updated this script to work with sites that have invalid XHTML tags (read: most) and used 'lynx' to do it. Though not as common as the combination of curl and xmllint, it provides far more consistent results.
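To see what the lynx stage feeds into the rest of the pipeline, here is a minimal sketch of the awk and grep filtering run against canned input instead of a live page. The function name extract_names is hypothetical, the sample text only imitates the numbered link list that lynx prints, and example.com is a placeholder host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `lynx -listonly -dump` style output on stdin
# and prints the last path component of each link that matches the pattern.
extract_names() {
    awk -F'/' '{print $NF}' | grep -- "$1"
}

# Sample input imitating lynx's numbered link list (example.com is a placeholder)
sample='References

   1. https://example.com/issues/MagPi01.pdf
   2. https://example.com/issues/MagPi02.pdf
   3. https://example.com/about.html'

# Only the two PDF names should survive the filter
printf '%s\n' "$sample" | extract_names '^MagPi[0-9]*\.pdf'
```

Because awk splits on '/', lines without a slash (such as the "References" header) pass through whole and are then discarded by grep, so no separate header handling is needed.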
This method no longer works for The MagPi (it was broken quite some time ago) but is still useful for similar pages. I made another attempt that specifically downloads any available Raspberry Pi Press publications here: https://github.com/traek/pi-tools/blob/main/raspipress.py
This is cool - thank you!
A very basic script that reads the list of files linked from a single web page and downloads any that are missing from the target directory. Used as-is, it is a handy way to keep up to date on issues of The MagPi magazine, which is free to download.
The original version of the file is pretty straightforward, but the most recent version adds a check that verifies all required commands are present. I used this as a tool to help my kids learn how to develop bash scripts while they work with their Raspberry Pis.
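The "download only what is missing" idea can be tried without touching the network. The sketch below is an illustration, not part of the script: list_missing is a hypothetical helper that prints the names not yet present in a directory, and the demo uses a throwaway directory created with mktemp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each NAME that does not exist as a file in DIR.
# Usage: list_missing DIR NAME...
list_missing() {
    local dir=$1
    shift
    for name in "$@"; do
        [ -f "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Demonstration in a throwaway directory holding one of three issues
demo_dir=$(mktemp -d)
touch "$demo_dir/MagPi01.pdf"
list_missing "$demo_dir" MagPi01.pdf MagPi02.pdf MagPi03.pdf
rm -rf "$demo_dir"
```

In the script itself the same test decides whether wget is called for a given file, so rerunning it only fetches new issues.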
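For anyone using this as a teaching example, the required-command check can also be pulled out into a small function. This is a sketch under assumptions: missing_commands is a hypothetical name, it uses the POSIX `command -v` lookup in place of the `hash` builtin the script uses, and the second command name in the demo is a deliberately fake placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every command not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Demo: 'sh' should exist, the second name is a placeholder that should not
missing=$(missing_commands sh definitely-not-a-real-command-12345)
if [ -n "$missing" ]; then
    echo "[WARN] could not find command(s): $missing"
fi
```

Collecting every missing name before reporting, as the script does, lets the user install everything in one pass instead of fixing one error at a time.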