@JonathonMA
Created January 6, 2016 07:18

We have a set of serialized tasks that occasionally fail. When one does, we retry the whole set from the start. The failures are usually transient, so the second attempt usually succeeds, but restarting from the beginning means paying the cost of redoing every task. If we could instead retry the failing task in-line, a transient error would not abort the entire pipeline.

Let's try introducing retry(1), a wrapper that runs a command and re-runs it a limited number of times when it fails.

command-that-fails --argument

becomes:

retry 3 command-that-fails --argument

As long as the command succeeds at least once in three attempts, we shouldn't have to restart the pipeline.
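
To put a rough number on that: assuming, purely for illustration, that each attempt fails independently with the same probability p (transient errors are often correlated in practice, so this is optimistic), the chance that all n attempts fail is:

```latex
P(\text{all } n \text{ attempts fail}) = p^{n},
\qquad \text{e.g. } p = 0.1,\ n = 3:\quad 0.1^{3} = 0.001
```

Under that assumption, a step that fails one time in ten would abort the pipeline about one time in a thousand when wrapped in `retry 3`.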

retry$ ./retry.sh 3 ./ok
running ok

retry$ ./retry.sh 3 ./bad
running bad
Jan  6 17:11:49 host.local user[653] <Notice>: ./bad  failed, retrying
running bad
Jan  6 17:11:49 host.local jonathona[655] <Notice>: ./bad  failed, retrying
running bad
Jan  6 17:11:49 host.local jonathona[657] <Notice>: ./bad  failed to much, bailing

#!/bin/bash
# bad -- test command that always fails
echo running bad
false

#!/bin/bash
# ok -- test command that always succeeds
echo running ok
true

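#!/bin/bash
# retry -- retry a command a few times
# retry <count> <command> [parameter...]
# retry 3 systemctl restart app-server.service
set -uo pipefail

retry_count="$1"
shift
cmd="$1"
shift

while true; do
  # Run the command directly so its parameters are passed through intact
  # (with bash -c "$cmd" "$@" they become $0, $1, ... of the -c script instead).
  "$cmd" "$@"
  # Capture the exit status immediately after the command runs.
  cmd_exit_status=$?
  if [ "$cmd_exit_status" -eq 0 ]; then
    exit 0
  fi
  retry_count=$((retry_count - 1))
  if [ "$retry_count" -lt 1 ]; then
    logger -s "$cmd $* failed to much, bailing"
    # Propagate the exit status of the last failed attempt.
    exit "$cmd_exit_status"
  fi
  logger -s "$cmd $* failed, retrying"
done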
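
The transcript above only exercises a command that always succeeds and one that always fails. A minimal sketch of the case retry is meant for, a command that fails twice and then succeeds, might look like the following. The `retry` function is a simplified in-line stand-in for retry.sh (it omits the logger calls), and `flaky` and its counter file are hypothetical names introduced only for this illustration.

```shell
#!/bin/bash
# Simplified stand-in for retry.sh so that this sketch is self-contained.
retry() {
  local count="$1" status
  shift
  while true; do
    "$@"
    status=$?
    [ "$status" -eq 0 ] && return 0
    count=$((count - 1))
    [ "$count" -lt 1 ] && return "$status"
  done
}

# Hypothetical transient failure: fails on the first two calls, then succeeds.
attempts_file="$(mktemp)"
echo 0 > "$attempts_file"
flaky() {
  local n
  n=$(( $(cat "$attempts_file") + 1 ))
  echo "$n" > "$attempts_file"
  echo "running flaky (attempt $n)"
  [ "$n" -ge 3 ]
}

retry 3 flaky
retry_status=$?
attempts=$(cat "$attempts_file")
rm -f "$attempts_file"
echo "exit status: $retry_status after $attempts attempts"
```

With a count of 3 the third attempt succeeds and the overall status is 0; with a count of 2 the same command would exhaust its attempts and return the failing status.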