Most online instructions for a Linux watchdog refer to the 'watchdog' package, which is best described here.
The watchdog configuration described here uses systemd instead. The two methods are mutually exclusive, so if you've previously set up the 'watchdog' package described above, uninstall it first:
sudo apt remove watchdog
Our goal is to add a systemd service that continuously pings a known host (or IP address). If the remote host stops responding (i.e. the ping command fails), we start timing the outage, and if connectivity isn't restored within a given timeout, we reboot the Pi.
Many online instructions for setting up the systemd watchdog on Linux begin by having you set RuntimeWatchdogSec= in /etc/systemd/system.conf. This enables the hardware watchdog device, /dev/watchdog (the BCM2837's built-in watchdog on the Raspberry Pi). Unless you're trying to protect against kernel hangs or other general runtime conditions that render the Pi unresponsive, you can skip this step, and skipping it gives you greater flexibility in the watchdog test interval settings later on. Also, you can get into trouble if you do enable the hardware watchdog and then set any watchdog value greater than the BCM2837 maximum (15 seconds). Failure modes include continuous reboot loops that can be hard to troubleshoot.
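If you aren't sure whether an earlier guide already enabled the hardware watchdog, you can check the configuration file before going further. The helper below is a minimal sketch with a hypothetical name, runtime_watchdog_setting (it is not part of systemd): it prints the last uncommented RuntimeWatchdogSec= value in a file, or "unset" if there is none. It assumes the setting lives in /etc/systemd/system.conf itself; drop-in files under /etc/systemd/system.conf.d/ are not checked.

```shell
#!/bin/bash
# Hypothetical helper: print the active RuntimeWatchdogSec= value from a
# systemd manager config file, or "unset" if no uncommented line sets it.
# Only the given file is read; drop-ins in /etc/systemd/system.conf.d/ are ignored.
runtime_watchdog_setting() {
    local conf=${1:-/etc/systemd/system.conf}
    local value
    # Commented lines (starting with #) don't match; the last assignment wins.
    value=$(sed -n 's/^[[:space:]]*RuntimeWatchdogSec=[[:space:]]*//p' "$conf" 2>/dev/null | tail -n 1)
    echo "${value:-unset}"
}

# Usage on the Pi:
#   runtime_watchdog_setting        # checks /etc/systemd/system.conf
```

A result of "unset" or 0 suggests the hardware watchdog was left disabled; a value above the 15-second limit mentioned above is the case to be wary of.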
First, add the following script as /usr/local/bin/pingtest.sh, and don't forget to make it executable with sudo chmod +x /usr/local/bin/pingtest.sh.
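#!/bin/bash
# Host (or IP address) to ping
TARGET=io.adafruit.com
# Consecutive failed ping rounds to tolerate before we stop feeding the watchdog
RETRIES=10
RT=$RETRIES

# systemd sets WATCHDOG_USEC (microseconds) from WatchdogSec=; default to 60 s if unset
WATCHDOG_USEC=${WATCHDOG_USEC:-60000000}

# Tell systemd that start-up is complete (required for Type=notify)
systemd-notify --ready

while true; do
    if ping -c3 -q "$TARGET" > /dev/null; then
        # Success: reset the retry counter and feed the watchdog
        RT=$RETRIES
        systemd-notify WATCHDOG=1
    else
        RT=$((RT - 1))
        if [ "$RT" -gt 0 ]; then
            # Still within the retry budget, so keep feeding the watchdog
            echo "pingtest failed, trying $RT more times"
            systemd-notify WATCHDOG=1
        fi
        # Once RT reaches 0 we stop feeding; the watchdog fires WatchdogSec after the last feed
    fi
    # Sleep for half the watchdog interval (microseconds -> seconds, halved)
    sleep $((WATCHDOG_USEC / 2000000))
done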
In the next step, we will specify a watchdog interval (e.g. 60 seconds). systemd passes this value to the script in microseconds through the WATCHDOG_USEC environment variable, and on each loop pass the script sleeps for half of it. Set TARGET to the host you want to ping. RETRIES is the number of consecutive failed ping rounds we tolerate before we stop feeding the watchdog.
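To see how the retry counter behaves without touching the network or systemd, here is a small simulation of the script's loop logic. It is a sketch with a hypothetical function name, simulate_rounds: each character of the first argument stands for one loop pass (s = ping succeeded, f = ping failed), the second argument plays the role of RETRIES, and for each pass it prints whether the real script would feed the watchdog.

```shell
#!/bin/bash
# Hypothetical simulation of pingtest.sh's retry counter (no ping, no systemd-notify).
# $1: sequence of ping results, one character per loop pass (s = success, f = failure)
# $2: RETRIES
# Prints "feed" if the script would send WATCHDOG=1 on that pass, "skip" otherwise.
simulate_rounds() {
    local results=$1 retries=$2
    local rt=$retries i r
    for (( i = 0; i < ${#results}; i++ )); do
        r=${results:i:1}
        if [ "$r" = s ]; then
            rt=$retries          # success resets the counter
            echo feed
        else
            rt=$((rt - 1))       # failure uses up one retry
            if [ "$rt" -gt 0 ]; then echo feed; else echo skip; fi
        fi
    done
}

simulate_rounds sfffs 3   # prints feed, feed, feed, skip, feed (one per line)
```

Note the last pass: a single successful ping resets the counter, so a short outage that ends before the watchdog interval runs out does not lead to a reboot.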
Next, add the following file as /etc/systemd/system/pingtest.service
[Unit]
Description=Ping Test Service
After=network.target
FailureAction=reboot-force
[Service]
Type=notify
ExecStart=/usr/local/bin/pingtest.sh
Restart=always
RestartSec=1
WatchdogSec=60
[Install]
WantedBy=multi-user.target
After adding this file, run:
sudo systemctl daemon-reload
With WatchdogSec set to 60, the script goes through its loop (and pings the target) roughly every 30 seconds. After a failed ping round, it keeps feeding the watchdog until RETRIES consecutive rounds have failed; it then stops, the watchdog times out, the service fails, and FailureAction=reboot-force reboots the Pi.
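As a rough worked example of the expected timeout, the function below (hypothetical name reboot_delay) estimates how long after the first failed ping round the watchdog fires. It assumes each loop pass takes exactly WatchdogSec/2 seconds, ignores the few seconds ping itself takes, and does not include the reboot itself. The reasoning: the last feed happens on failed round RETRIES - 1, which is (RETRIES - 2) intervals after the first failure, and the watchdog fires WatchdogSec after that.

```shell
#!/bin/bash
# Hypothetical helper: approximate seconds from the first failed ping round
# until systemd's watchdog fires, for the pingtest.sh loop.
# Assumes RETRIES >= 1 and that each loop pass lasts WatchdogSec/2 seconds
# (time spent inside ping is ignored).
reboot_delay() {
    local retries=$1 watchdog_sec=$2
    # Same integer halving as: sleep $((WATCHDOG_USEC / 2000000))
    local interval=$(( watchdog_sec / 2 ))
    # Failed rounds 1..retries-1 still feed the watchdog; the last feed comes
    # (retries - 2) intervals after the first failure (for retries = 1 that is
    # one interval before it, i.e. the last successful round).
    local last_feed=$(( (retries - 2) * interval ))
    echo $(( last_feed + watchdog_sec ))
}

reboot_delay 10 60   # prints 300 (about five minutes with the values above)
```

Because connectivity may have been lost up to one loop interval before the first failed round, the real time from outage to watchdog can be somewhat longer than this estimate.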
To enable the service at boot and start it now, run:
sudo systemctl enable pingtest
sudo systemctl start pingtest
Test that it works by disconnecting the Pi from the network and watching it reboot after the expected timeout.
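If unplugging the network is inconvenient (for example, because you're connected over SSH), one alternative is to temporarily point TARGET at an address that should never answer. The sketch below uses a hypothetical helper, set_ping_target, that rewrites the TARGET= line with GNU sed; 192.0.2.1 is from the TEST-NET-1 block reserved for documentation (RFC 5737), so it normally gets no reply. Because the edit persists across reboots, the Pi will keep rebooting every few minutes until you restore the original host, so do that as soon as you've seen the first reboot.

```shell
#!/bin/bash
# Hypothetical helper: replace the TARGET= line of a pingtest-style script.
# Uses GNU sed's in-place editing; the new target must not contain "/" or "&".
set_ping_target() {
    local script=$1 target=$2
    sed -i "s/^TARGET=.*/TARGET=${target}/" "$script"
}

# Equivalent one-off commands on the Pi:
#   sudo sed -i 's/^TARGET=.*/TARGET=192.0.2.1/' /usr/local/bin/pingtest.sh
#   sudo systemctl restart pingtest
#   journalctl -u pingtest -f        # watch the "pingtest failed" countdown
```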