There is a “new” issue in town and this is a story about three colleagues helping us to tackle it.
TL;DR: skip to the bullet points.
ESXi installed on SD cards can fail to reboot because the SD medium silently wears out.
My colleague John Nicholson posted about it in March: “Using SD cards for embedded ESXi and vSAN?”
The main problem is that the SD cards fail silently. Well, not entirely: if you know what to look for, there are lines in the log files that give the issue away, but who reads them? You could probably set something up in Log Insight, but that is for another day.
In John’s article, he references a nice script from another colleague of mine, William Lam. William created a handy ash script that you can run on the ESXi host to check whether the SD card returns the same information when the boot sector area is read several times.
I took William’s script and put a PowerCLI wrapper around it (which uses Plink.exe) because I needed to check 180 servers. During a workshop, I talked about it with another colleague, and he suggested I share it with the community. So here it is; I hope you find it useful.
There are several things about this script you should know:
- You can find it here: CheckSDCard Script on Bitbucket
- It contains some standard functions that I use whenever I write PowerShell scripts, so you could slim it down by removing those and rewriting it a bit
- You can check either a single ESXi host or a whole cluster
- It requires the root password (on the command line, via a prompt, or hardcoded in the script… DON’T use that last option 🙂 )
- If SSH is disabled on the server, it will enable SSH first, do its magic, and disable SSH again afterward
- Oh, and you need to be connected to the vCenter Server before running the script.
Looking a little bit closer at the script:
Line 60 – 69:
Checks whether Plink.exe is in the script’s directory and exits if it is not.
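A minimal sketch of such a check, not the code from the actual script. The function name Find-Plink is hypothetical; it only assumes that plink.exe is expected to sit next to the script.

```powershell
# Hypothetical helper: returns the full path to plink.exe in the given
# directory, or $null when the file is missing.
function Find-Plink {
    param(
        [Parameter(Mandatory)][string]$ScriptDirectory
    )
    $candidate = Join-Path -Path $ScriptDirectory -ChildPath 'plink.exe'
    if (Test-Path -Path $candidate -PathType Leaf) {
        return $candidate
    }
    return $null
}
```

Inside a script you would typically call it as `Find-Plink -ScriptDirectory $PSScriptRoot` and exit with an error message when it returns `$null`.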
Line 71 – 95:
Determines, based on the ParameterSetName, whether you want to check a host or a cluster. For a cluster, the script gets all ESXi hosts in the cluster and then runs the checkSDCard function.
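The host-or-cluster switch could look roughly like this. It is a sketch under assumptions: the function name Get-TargetHost and the parameter names are made up for illustration, and it needs VMware PowerCLI plus an existing vCenter connection to return anything.

```powershell
# Sketch only: requires VMware PowerCLI and a prior Connect-VIServer.
# Get-TargetHost, -VMHostName and -ClusterName are hypothetical names.
function Get-TargetHost {
    [CmdletBinding(DefaultParameterSetName = 'Host')]
    param(
        [Parameter(Mandatory, ParameterSetName = 'Host')]
        [string]$VMHostName,

        [Parameter(Mandatory, ParameterSetName = 'Cluster')]
        [string]$ClusterName
    )
    # PowerShell records which parameter set was bound; branch on it.
    switch ($PSCmdlet.ParameterSetName) {
        'Host'    { Get-VMHost -Name $VMHostName }
        'Cluster' { Get-Cluster -Name $ClusterName | Get-VMHost }
    }
}
```

Because the two parameters live in different parameter sets, PowerShell itself rejects a call that passes both, so the script does not need its own validation for that case.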
Line 97 – 101:
Exports the result as a CSV file. The filename is based on the ESXi host or cluster name and a timestamp.
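One way to build such a filename is sketched below. The function name New-ReportFileName and the exact timestamp format are assumptions, not taken from the script.

```powershell
# Hypothetical helper: combines the host or cluster name with a sortable
# timestamp, e.g. Cluster01_20180501-134500.csv
function New-ReportFileName {
    param(
        [Parameter(Mandatory)][string]$TargetName,
        [datetime]$Timestamp = (Get-Date)
    )
    '{0}_{1}.csv' -f $TargetName, $Timestamp.ToString('yyyyMMdd-HHmmss')
}
```

The collected results could then be written with something like `$SDCardList | Export-Csv -Path (New-ReportFileName -TargetName 'Cluster01') -NoTypeInformation`.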
Line 102 – 125:
This part of the script only runs if the ESXi host is at least version 6.x. The check could be removed.
Line 120 – 123:
Almost the real thing. This part of the code reads the SDCardCheck.txt file, which is almost the same as SDCardCheck_debug.txt. The debug version contains William’s original script. The non-debug version has every output commented out except the important part, which detects whether the SD card is corrupted.
Line 151 – 208:
This function is almost 100% my default Invoke-SSHCommand.
It checks whether SSH is enabled; if not, it enables SSH and disables it again afterward. It then executes the command via Plink and writes the output of that command into $output. It also checks whether authentication is possible and exits if it is not.
The only change can be found in
Line 192 – 193:
Here a new $row object is created with the ESXi name and the stripped-down result of the SDCardCheck script. The $row is then added to the SDCardList, which is exported to the CSV file at the end of the script.
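The SSH handling and the row creation described above can be sketched as follows. This is not my actual Invoke-SSHCommand: the names Invoke-WithSsh and New-SDCardRow and the column names ESXi and Result are hypothetical, and the service cmdlets assume VMware PowerCLI with a connected vCenter session.

```powershell
# Sketch only: the *-VMHostService cmdlets come from VMware PowerCLI.
# Function names and CSV column names are hypothetical.

function Invoke-WithSsh {
    param(
        [Parameter(Mandatory)]$VMHost,
        [Parameter(Mandatory)][scriptblock]$Action
    )
    # TSM-SSH is the key of the SSH service on an ESXi host.
    $ssh = Get-VMHostService -VMHost $VMHost | Where-Object { $_.Key -eq 'TSM-SSH' }
    $wasRunning = $ssh.Running
    if (-not $wasRunning) {
        Start-VMHostService -HostService $ssh -Confirm:$false | Out-Null
    }
    try {
        # Run the remote work (for example a plink call) while SSH is up.
        & $Action
    }
    finally {
        # Restore the original state even if the remote command failed.
        if (-not $wasRunning) {
            Stop-VMHostService -HostService $ssh -Confirm:$false | Out-Null
        }
    }
}

function New-SDCardRow {
    param(
        [Parameter(Mandatory)][string]$VMHostName,
        [Parameter(Mandatory)][AllowEmptyString()][string]$CheckResult
    )
    # One row per host; whitespace around the check result is stripped.
    [pscustomobject]@{
        ESXi   = $VMHostName
        Result = $CheckResult.Trim()
    }
}
```

The script block passed to `-Action` would hold the Plink call, and its output would go into `New-SDCardRow` before the row is added to the list. Wrapping the call in try/finally means a host that had SSH disabled gets it disabled again even when the remote command throws.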
Bonus Line 324 – 357:
This is an update-check function I wrote to check whether the local version differs from the version in your Git repository. If there is a difference, a backup is created and the new version is downloaded. After that, you need to rerun the script.
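The compare, backup and replace steps might look like the sketch below. It is a simplification under assumptions: the function name Update-ScriptIfChanged and the .bak suffix are made up, and the remote content is passed in as a string so that the download itself stays outside the function.

```powershell
# Hypothetical helper: replaces the local file when the remote content
# differs, keeping a backup. Returns $true when an update happened.
function Update-ScriptIfChanged {
    param(
        [Parameter(Mandatory)][string]$LocalPath,
        [Parameter(Mandatory)][string]$RemoteContent
    )
    $localContent = Get-Content -Path $LocalPath -Raw
    # -ceq compares case-sensitively; plain -eq would ignore case changes.
    if ($localContent -ceq $RemoteContent) {
        return $false
    }
    Copy-Item -Path $LocalPath -Destination "$LocalPath.bak" -Force
    Set-Content -Path $LocalPath -Value $RemoteContent -NoNewline
    return $true
}
```

The remote content could be fetched with `(Invoke-WebRequest -Uri $RawUrl -UseBasicParsing).Content`, where `$RawUrl` is a placeholder for the raw file URL in your own repository.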
I hope you find this script and the functions in it useful.