novascale bullion
VMware PSOD - Purple Screen of Death - in case of Time Drift
On a multi-module bullion server running VMware, the clocks of the modules must stay exactly synchronized. In ESXi 5.0, if the reported time diverges too much from one module to another, a PSOD (VMware "Purple Screen of Death") can occur.
By default, ESXi 5.0 uses the TSC ("Time Stamp Counter") as its reference timer. To avoid this time drift, the reference timer must be switched to HPET ("High Precision Event Timer").
1) Which bullion servers are affected?
- Multi-module systems with 32DIMM and 64DIMM architecture
- Technical State: earlier than TS 054.01 (i.e. BIOSX02.013.06.030 & EMM 11.11.00 build1202)
- Hypervisor: ESX and ESXi releases earlier than ESXi 5.0 Update 1
For mono-module systems, keep the default TSC clock mode.
For systems that have both a Technical State higher than TS 054.01 and the ESXi 5.0 Update 1 hypervisor, keep the default TSC clock mode.
2) How to identify the clock drift problem
A PSOD ("Purple Screen of Death") occurs, with messages such as:
Heartbeat: xxx: PCPU xx didn't have a heartbeat for xx seconds. *may* be locked up
You can also check in the /var/log/boot file whether the selected reference timer is TSC.
Example:
TSC: 159106518672 cpu0:0)Timer: 975: reference timer is TSC at 2000070800 Hz
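This check can be scripted. The following is a minimal sketch, assuming the boot log uses the line format shown in the example above. The name reference_timer is a hypothetical helper, not an ESXi command. It reads log text on standard input, so it works with a plain or a decompressed log.

```shell
# Hypothetical helper (not part of ESXi): print the name of the reference
# timer, e.g. TSC or HPET, found in boot log text read from standard input.
reference_timer() {
    # Keep only "reference timer is <NAME> at <N> Hz" lines and extract <NAME>.
    # If the log holds several such lines, report the most recent one.
    sed -n 's/.*reference timer is \(.*\) at [0-9]* Hz.*/\1/p' | tail -n 1
}

# Usage on the host (log path taken from the text above):
#   reference_timer < /var/log/boot
```

If the helper prints TSC on an affected multi-module system, the clock mode has not been changed yet.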
3) How to select HPET mode
To force HPET mode, both the TSC and the ACPI timer must be disabled (otherwise the ACPI timer becomes the reference).
To do so, enter the following commands:
esxcli system settings kernel set -s timerEnableTSC -v FALSE
esxcli system settings kernel set -s timerEnableACPI -v FALSE
Then reboot the system so that the new clock mode takes effect.
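Because disabling only TSC would leave the ACPI timer as the reference, it can help to apply both settings as a single step that stops at the first failure. The sketch below wraps the two commands given above. The function name force_hpet is hypothetical, and the sketch assumes esxcli returns a non-zero exit status when a setting cannot be applied.

```shell
# Hypothetical wrapper around the two esxcli commands above: disable TSC and
# ACPI PM as reference timers, and stop at the first failure.
# Assumption: esxcli exits with a non-zero status when a setting fails.
force_hpet() {
    for setting in timerEnableTSC timerEnableACPI; do
        esxcli system settings kernel set -s "$setting" -v FALSE || {
            echo "failed to set $setting" >&2
            return 1
        }
    done
    echo "Both settings applied; reboot the host to activate HPET."
}

# Usage on the host:
#   force_hpet
```

The function does not reboot the host itself, so the reboot remains a deliberate manual step.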
4) How to check the HPET mode setting
After the system has rebooted, enter the command:
zcat /var/log/boot.gz | less
Then type /reference to search for the reference timer lines.
The output should look like this:
TSC: 159109647640 cpu0:0)Timer: 850: TSC disabled as reference timer by config option
TSC: 159109651948 cpu0:0)Timer: 786: ACPI PM disabled as reference timer by config option
TSC: 159109658324 cpu0:0)Timer: 975: reference timer is HPET at 14318179 Hz
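The same verification can be done without paging through the log. This is a minimal sketch, assuming the three messages appear with the wording shown above. The name hpet_active is a hypothetical helper that reads the decompressed log on standard input.

```shell
# Hypothetical helper (not part of ESXi): succeed only if the boot log text on
# standard input shows TSC and ACPI PM disabled and HPET chosen as reference.
hpet_active() {
    log=$(cat)
    for pattern in \
        'TSC disabled as reference timer' \
        'ACPI PM disabled as reference timer' \
        'reference timer is HPET'; do
        case $log in
            *"$pattern"*) ;;   # expected message found, check the next one
            *) echo "missing: $pattern" >&2; return 1 ;;
        esac
    done
}

# Usage on the host:
#   zcat /var/log/boot.gz | hpet_active && echo "HPET is the reference timer"
```

A non-zero exit status names the first expected message that was not found, which indicates which of the two settings did not take effect.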