
@conorsch
Created March 24, 2020 21:00
Helper scripts to manage Qubes memory balance service
#!/bin/bash
# Utility script to check whether the Qubes memory balancing
# service has failed. Compares the timestamps of the last
# successful balance operation and the most recent "EOF"
# message available in the log file. If the EOF is more
# recent, declare the service broken. Recommended invocation:
#
# watch -n5 ./check-qmemman.sh
#
set -e
set -u
set -o pipefail
get_last_balance_time() {
    grep -P 'balance_when_enough_memory' /var/log/qubes/qmemman.log \
        | tail -n1 \
        | perl -nE '/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)/ and say $1' \
        | xargs -d '\n' date +%s -d
}
get_last_eof_time() {
    grep -P 'EOF$' /var/log/qubes/qmemman.log \
        | tail -n1 \
        | perl -nE '/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)/ and say $1' \
        | xargs -d '\n' date +%s -d
}
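# A missing log entry yields an empty timestamp; treat it as 0 so the
# arithmetic comparison stays valid. Inside [[ ]], ">" compares strings,
# so use (( )) to compare the epoch values numerically.
last_eof="$(get_last_eof_time || true)"
last_balance="$(get_last_balance_time || true)"
if (( ${last_eof:-0} > ${last_balance:-0} )); then
    echo "Looks like qmemman has failed."
    echo "You should restart it with:"
    echo "sudo systemctl restart qubes-qmemman"
    exit 1
else
    echo "The qmemman service appears to be working correctly."
fi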
#!/bin/bash
# Utility script to restart the Qubes memory balancing
# service if it's failed. Depends on another script
# to determine whether it's failed or not.
set -e
set -u
set -o pipefail
if ! test -e check-qmemman.sh ; then
    echo "Could not find check-qmemman.sh script!"
    exit 1
fi
echo "$(date) Begin monitoring qmemman-behavior" >> /tmp/qmemman-check.log
while true; do
    clear
    if ! ./check-qmemman.sh ; then
        echo "$(date) qmemman service failed" >> /tmp/qmemman-check.log
        sudo systemctl restart qubes-qmemman
        echo "$(date) qmemman service restarted" >> /tmp/qmemman-check.log
    fi
    sleep 5
done
@eloquence

I'm also not noticing a single EOF line in the log currently so maybe get_last_eof_time is erroring out because of that.

@conorsch
Author

conorsch commented Apr 6, 2020

If no EOF, then yes, that's the problem. The --no-run-if-empty flag for xargs is made for this case. I'm currently running the patches from QubesOS/qubes-core-admin#331 so I also have zero EOFs locally.

@eloquence

Yup, just adding -r to xargs resolves it. Will keep this running again during my next update.
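
For anyone else who hits the empty-log case, here is a minimal sketch of the -r change discussed above. It assumes GNU grep, sed, xargs, and date. The helper name last_match_time and its two parameters are hypothetical, not part of the gist, and it drops the millisecond suffix before calling date. When the pattern never appears, xargs -r skips date entirely and the function prints nothing.

```shell
#!/bin/bash
# Sketch only: print the epoch time of the last log line matching a
# pattern, or print nothing if the pattern never appears.
# last_match_time is a hypothetical helper, not part of the original gist.
last_match_time() {
    local pattern="$1" logfile="$2"
    # "|| true" keeps a no-match grep from failing under "set -o pipefail".
    { grep -E "$pattern" "$logfile" || true; } \
        | tail -n1 \
        | sed -nE 's/^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}).*/\1/p' \
        | xargs -r -d '\n' date +%s -d
    # -r (--no-run-if-empty) stops xargs from running "date -d" with no argument.
}
```

A caller can then treat an empty result as "never happened", for example by expanding the captured value as `${eof_time:-0}` before comparing it numerically.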
