@sandfox
Last active May 16, 2021 18:48

We've had a number of NTP- and clock-related anomalies over approximately the past 12 to 18 hours. Upon further investigation, as unlikely as this might seem, we think the hypervisor may be presenting an incorrect time to the guest.

Whilst I'll discuss a specific host in this ticket, we've seen it in multiple places over this time period. All examples have been spot instances in eu-west-1, although the instance type varies.

Consider that /sys/devices/system/clocksource/clocksource0/current_clocksource returns xen.
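
For anyone wanting to check the same thing across several hosts, the clock source can also be read programmatically. The following is a minimal sketch, assuming the standard Linux sysfs layout; `read_clocksource` and `ClockSourceInfo` are hypothetical names introduced here for illustration and are not part of any tool used in this ticket.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Standard Linux sysfs directory for the first clocksource device
# (the same directory used by the commands in this ticket).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


@dataclass
class ClockSourceInfo:
    """Hypothetical container for what the kernel reports."""
    current: str          # the clock source in use, e.g. "xen"
    available: List[str]  # the clock sources the kernel offers


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> ClockSourceInfo:
    """Read the active clock source and the available alternatives.

    `base` is a parameter only so the function can be pointed at a
    different directory (for example in a test).
    """
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return ClockSourceInfo(current=current, available=available)
```

On the host discussed here, `read_clocksource().current` would be expected to return `xen`; the contents of `available` depend on the kernel and instance and were not recorded in this ticket.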

With no NTP daemon running, the following can be observed:

# ntpdate 0.amazon.pool.ntp.org && sleep 900 && ntpdate 0.amazon.pool.ntp.org
13 Dec 13:42:04 ntpdate[4889]: step time server 4.53.160.75 offset 4.531351 sec
13 Dec 13:57:29 ntpdate[5639]: step time server 52.48.113.20 offset 16.268760 sec

That is to say, ntpdate corrected 4.53 seconds of offset, we waited 15 minutes, and ntpdate then had to correct a further 16.27 seconds of lag that had accumulated under the xen clock source.
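
To put that figure in perspective, the accumulated offset can be turned into a drift rate. This is a rough sketch, assuming the interval is the nominal 900-second sleep (the exact elapsed time is slightly different) and using 500 ppm as the maximum rate at which ntpd will slew a clock; `drift_ppm` is a hypothetical helper name.

```python
# Maximum slew rate ntpd applies when disciplining a clock (assumed
# here as the reference point for "normal" correctable drift).
NTPD_MAX_SLEW_PPM = 500.0


def drift_ppm(offset_seconds: float, interval_seconds: float) -> float:
    """Convert an offset accumulated over an interval to parts per million."""
    return offset_seconds / interval_seconds * 1_000_000


# Second ntpdate run with the xen clock source: 16.268760 s after ~900 s.
rate = drift_ppm(16.268760, 900)
print(f"drift: {rate:.0f} ppm")
print(f"multiple of ntpd max slew: {rate / NTPD_MAX_SLEW_PPM:.1f}x")
```

Under these assumptions the clock is losing roughly 1.8% of real time, far more than an NTP daemon could correct by slewing alone, which would be consistent with the anomalies described above.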

We also see the same thing with tsc as the clock source, although given that these are all PV instances we believe the underlying time source is broadly the same:

root@ip-172-31-20-47:/var/lib/ntp# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
root@ip-172-31-20-47:/var/lib/ntp# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
root@ip-172-31-20-47:/var/lib/ntp# ntpdate 0.amazon.pool.ntp.org && sleep 900 && ntpdate 0.amazon.pool.ntp.org
13 Dec 14:00:25 ntpdate[5786]: step time server 52.48.113.20 offset 3.049115 sec
13 Dec 14:15:48 ntpdate[8165]: step time server 31.28.161.68 offset 16.155150 sec
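
The two runs can be compared directly by pulling the offsets out of the ntpdate output. This is an illustrative sketch only: the regular expression is an assumption based on the four log lines quoted in this ticket, and `parse_offset` is a hypothetical helper.

```python
import re

# Matches the tail of the ntpdate lines quoted in this ticket, e.g.
# "... step time server 52.48.113.20 offset 16.268760 sec"
OFFSET_RE = re.compile(r"server (\S+) offset (-?\d+\.\d+) sec")


def parse_offset(line: str) -> float:
    """Return the offset in seconds reported on one ntpdate log line."""
    match = OFFSET_RE.search(line)
    if match is None:
        raise ValueError(f"not an ntpdate offset line: {line!r}")
    return float(match.group(2))


XEN_RUN = [
    "13 Dec 13:42:04 ntpdate[4889]: step time server 4.53.160.75 offset 4.531351 sec",
    "13 Dec 13:57:29 ntpdate[5639]: step time server 52.48.113.20 offset 16.268760 sec",
]
TSC_RUN = [
    "13 Dec 14:00:25 ntpdate[5786]: step time server 52.48.113.20 offset 3.049115 sec",
    "13 Dec 14:15:48 ntpdate[8165]: step time server 31.28.161.68 offset 16.155150 sec",
]

# The second line of each run is the offset accumulated during the sleep.
xen_offset = parse_offset(XEN_RUN[1])
tsc_offset = parse_offset(TSC_RUN[1])
relative_gap = abs(xen_offset - tsc_offset) / max(xen_offset, tsc_offset)

print(f"xen: {xen_offset:.3f} s, tsc: {tsc_offset:.3f} s")
print(f"relative difference: {relative_gap:.2%}")
```

The two accumulated offsets differ by less than 1%, which fits the belief that both clock sources are ultimately driven by the same underlying time source on these PV instances.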