@mharsch
Last active August 31, 2024 09:46
Configure systemd watchdog on raspberry pi to reboot if it can't ping a certain host

Most online instructions for Linux watchdog refer to the 'watchdog' package which is best described here

The watchdog configuration described here uses systemd, and the two methods are mutually exclusive, so if you've previously set up the watchdog package above, you must first uninstall it.

sudo apt remove watchdog

Our goal is to add a systemd service that continuously pings a known host (or IP address). If the remote host stops responding (i.e. the ping command fails), we start timing the outage, and if connectivity isn't restored within a given timeout, we reboot the Pi.

Note/Warning

Many online instructions for setting up the systemd watchdog on Linux begin by having you set RuntimeWatchdogSec= in /etc/systemd/system.conf. This enables the hardware watchdog device /dev/watchdog (BCM2837 on the Raspberry Pi). Unless you're trying to protect against kernel hangs or other general runtime conditions that render the Pi unresponsive, you can skip this step; skipping it also gives you greater flexibility in the watchdog test interval settings later on. If you do enable the hardware watchdog, you can get into trouble by setting any watchdog value greater than the BCM2837 maximum (15 seconds). Failure modes include continuous reboot loops that can be hard to troubleshoot.
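If you aren't sure whether an earlier tutorial already left the hardware watchdog enabled, a quick check along these lines can help. This is only a sketch: the function name runtime_watchdog_setting is made up for illustration, and it looks at the main config file only, so a setting placed in a drop-in directory such as /etc/systemd/system.conf.d/ would be missed.

```shell
#!/bin/bash
# Print any active (uncommented) RuntimeWatchdogSec= line from a systemd
# config file. Returns non-zero if there is none or the file is missing.
# The function name is hypothetical, not part of systemd.
runtime_watchdog_setting() {
    local conf="${1:-/etc/systemd/system.conf}"
    # Lines starting with '#' are comments and do not match this pattern.
    grep -E '^[[:space:]]*RuntimeWatchdogSec=' "$conf" 2>/dev/null
}

if setting=$(runtime_watchdog_setting); then
    echo "hardware watchdog setting found: $setting"
else
    echo "no active RuntimeWatchdogSec= line in /etc/systemd/system.conf"
fi
```

A line that is present but commented out (the default on many installs) is reported as not set.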

Begin

First, add the following script as /usr/local/bin/pingtest.sh and don't forget to make it executable with sudo chmod a+x /usr/local/bin/pingtest.sh.

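#!/bin/bash

# Host to ping; change this to the host or IP address you want to monitor.
TARGET=io.adafruit.com
# Number of consecutive failed checks before we stop feeding the watchdog.
RETRIES=10
RT=$RETRIES

# Tell systemd (Type=notify) that the service has finished starting.
systemd-notify --ready

while true; do
    if ping -c3 -q "$TARGET" > /dev/null; then
        # Host reachable: reset the retry counter and feed the watchdog.
        RT=$RETRIES
        systemd-notify WATCHDOG=1
    else
        RT=$((RT - 1))
        if [ "$RT" -gt 0 ]; then
            echo "pingtest failed, trying $RT more times"
            # Keep feeding while retries remain; once they run out,
            # the watchdog is left to expire.
            systemd-notify WATCHDOG=1
        fi
    fi
    # WATCHDOG_USEC is set by systemd from WatchdogSec=; sleep for half of it.
    sleep $((WATCHDOG_USEC / 2000000))
done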

In the next step, we will specify a watchdog interval (e.g. 60 seconds). systemd passes this value to the script in microseconds through the WATCHDOG_USEC environment variable, and the script sleeps for half the timeout on each loop pass. Set TARGET to the host you want to ping. RETRIES is the number of consecutive failed checks allowed before we give up.
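To make the sleep arithmetic concrete, here is the same calculation pulled out into a small helper. The name feed_interval_sec is hypothetical and exists only for this illustration; the division is the one used in the script.

```shell
#!/bin/bash
# Convert a watchdog timeout in microseconds into half that timeout in
# whole seconds, as the sleep line in pingtest.sh does.
# The function name is made up for this sketch.
feed_interval_sec() {
    local usec="$1"
    echo $(( usec / 2000000 ))
}

# WatchdogSec=60 reaches the service as WATCHDOG_USEC=60000000.
feed_interval_sec 60000000   # prints 30

# Shell arithmetic is integer-only, so a 1-second watchdog truncates to 0.
feed_interval_sec 1000000    # prints 0
```

Because the result is truncated to whole seconds, a WatchdogSec below 2 would make the script sleep for 0 seconds and loop without pausing. Note also that WATCHDOG_USEC is only set when systemd starts the script, so running pingtest.sh by hand will fail at the sleep line.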

Next, add the following file as /etc/systemd/system/pingtest.service

[Unit]
Description=Ping Test Service
After=network.target
FailureAction=reboot-force

[Service]
Type=notify
ExecStart=/usr/local/bin/pingtest.sh
Restart=always
RestartSec=1
WatchdogSec=60

[Install]
WantedBy=multi-user.target

After adding this file, run:

sudo systemctl daemon-reload

Given that WatchdogSec is set to 60, we will end up going through the program loop and pinging roughly every 30 seconds (plus however long the ping itself takes). If a check fails, we keep feeding the watchdog for the first RETRIES - 1 consecutive failures. On the next failure we stop feeding it and let the watchdog time out, causing the service to fail and the Pi to reboot.
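For a rough idea of how long an outage lasts before the reboot, the retry counter can be dry-run without systemd or a network. This is a simplified sketch: both function names are hypothetical, every check is assumed to fail, and the time spent inside ping itself is ignored, so the real delay will be somewhat longer.

```shell
#!/bin/bash
# Replay the retry counter from pingtest.sh with every check failing, and
# print how many of those failed checks still feed the watchdog.
# Function names here are made up for illustration.
feeds_during_outage() {
    local rt="$1" feeds=0
    while true; do
        rt=$(( rt - 1 ))
        if [ "$rt" -gt 0 ]; then
            feeds=$(( feeds + 1 ))   # retries remain: watchdog is fed
        else
            break                    # out of retries: no more feeding
        fi
    done
    echo "$feeds"
}

# Approximate seconds from the first failed check until the watchdog
# expires: one half-interval between feeds, then a full timeout after
# the last feed.
outage_to_reboot_sec() {
    local retries="$1" watchdog_sec="$2" feeds interval
    feeds=$(feeds_during_outage "$retries")
    interval=$(( watchdog_sec / 2 ))
    echo $(( (feeds - 1) * interval + watchdog_sec ))
}

feeds_during_outage 10       # prints 9
outage_to_reboot_sec 10 60   # prints 300
```

With the values used here (RETRIES=10, WatchdogSec=60), that works out to about five minutes after the first failed check, which is a useful number to have in mind when testing.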

To enable and start this service, run the following:

sudo systemctl enable pingtest
sudo systemctl start pingtest

You should test that it works by manually disconnecting network connections and watching it reboot after the expected timeout.
