Graceful Shut Down of vSphere Guests from UPS

How do you initiate a graceful shutdown of vSphere guests from a UPS? This seems to be a frequently asked question in the VMware forums, and a Google search on the subject will return a variety of possible solutions. Where this solution differs from most is that it allows you to shut down VMs in a specific order. That is useful in a lot of situations, but it is a must if you are using VSA-style storage solutions, e.g. HP StoreVirtual, Nutanix, etc.

Firstly, here’s my usual solution when shutdown order is of no importance. I would simply deploy a vMA appliance to each VMware host and install the UPS agent on it, or, in the case of APC, use their preconfigured VMware PCNS appliance. I would then configure these to shut down their respective hosts before the battery runs out. To get the VMs to shut down gracefully, configure the virtual machine startup/shutdown settings on each host. Officially these options are not supported in a vSphere cluster, but they do still work to a point. The startup order is lost when VMs get migrated around the cluster, so don’t expect vSphere to observe it, but otherwise the VMs will still start up. The shutdown option likewise works only to a point: VMs will shut down, but I have found they ignore the shutdown delay settings. So this is fine for environments where we just want to shut the guests down in no particular order, but it’s no good to us if we need to leave some VMs up until last.
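If you would rather script the per-host startup/shutdown settings than set them by hand on every host, PowerCLI has cmdlets for them. The following is a minimal sketch, assuming an existing Connect-VIServer session and a hypothetical host name; given the caveats above, it only covers the VMs registered on that host at the time you run it.

```powershell
# Sketch only: assumes an existing Connect-VIServer session; 'esx01.example.local' is a hypothetical host name.
$vmhost = Get-VMHost -Name 'esx01.example.local'

# Turn on the host's automatic startup/shutdown feature, with guest shutdown as the default stop action.
Get-VMHostStartPolicy -VMHost $vmhost |
    Set-VMHostStartPolicy -Enabled:$true -StopAction GuestShutDown

# Add the VMs currently registered on this host to the automatic start/stop list.
# This list does not follow VMs that later migrate to another host in the cluster.
Get-VMStartPolicy -VM (Get-VM -Location $vmhost) |
    Set-VMStartPolicy -StartAction PowerOn -StopAction GuestShutDown
```

Re-running it after VMs have moved around (or looping over every host from Get-VMHost) is one way to keep the lists roughly current.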

So if, for example, we are using VSA-style storage such as HP StoreVirtual VSA or Nutanix, we must leave our storage appliances up until last. If we let our hosts shut down all of their guests at once, our storage can disappear while our VMs are still running or in the middle of shutting down, and this could lead to some serious corruption. So in this instance I would use PowerCLI. To start with, pick a server to install the UPS agent on (probably a Windows box, depending on UPS agent support for other OSs and what you have in your environment). This could be a VM, but it is easier if it is a physical server outside the vSphere cluster, since it then won’t be affected by the shutdown it is orchestrating. Next, install PowerCLI on the server with the UPS agent.
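Current PowerCLI releases are installed from the PowerShell Gallery (older releases shipped as an MSI installer). A minimal sketch, assuming the server has internet access and that you install for all users so that whatever account the UPS agent runs under can load the modules:

```powershell
# Run once from an elevated PowerShell session on the server that hosts the UPS agent.
# Assumes the server can reach the PowerShell Gallery; Install-Module may ask you to trust that repository.
Install-Module -Name VMware.PowerCLI -Scope AllUsers

# Optional, and a trade-off: if vCenter still uses its self-signed certificate, Connect-VIServer
# can refuse the connection unless PowerCLI is told to ignore certificate errors.
Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Scope AllUsers -Confirm:$false
```

Installing a trusted certificate on vCenter instead avoids needing the second command at all.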

So now we need a PowerShell script that our UPS agent can execute when utility power to the UPS is lost. Here is an example script you can use as a starting point, but every environment is different, so you will need to modify it to suit your needs.

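# Load PowerCLI (older PowerCLI releases use: Add-PSSnapin VMware.VimAutomation.Core)
Import-Module VMware.VimAutomation.Core
# Connect to vCenter using the credential store file instead of clear-text credentials
$Creds = Get-VICredentialStoreItem -Host vcentre -File C:\temp\credentials.xml
Connect-VIServer vcentre -User $Creds.User -Password $Creds.Password
# Powered-on VMs to shut down, excluding the storage appliances (NTNX), vCenter and the vMAs
$running = Get-VM | Where-Object {$_.PowerState -eq 'PoweredOn' -and $_.Name -notlike "*NTNX*" -and $_.Name -notlike "*vCentre*" -and $_.Name -notlike "vMA*"}
# Split them into VMs with VMware Tools (guest shutdown) and without (hard power off)
$novmtools = $running | Where-Object {$_.ExtensionData.Guest.ToolsVersion -eq "0"}
$vmservers = $running | Where-Object {$_.ExtensionData.Guest.ToolsVersion -ne "0"}
$vmservers | Shutdown-VMGuest -Confirm:$false
$novmtools | Stop-VM -Confirm:$false
# Give the guests time to shut down cleanly
Start-Sleep -Seconds 360
# Power off anything still running, apart from the excluded VMs
$stillon = Get-VM | Where-Object {$_.PowerState -eq 'PoweredOn' -and $_.Name -notlike "*NTNX*" -and $_.Name -notlike "*vCentre*" -and $_.Name -notlike "vMA*"}
$stillon | Stop-VM -Confirm:$false
Start-Sleep -Seconds 60
# Finally, shut down vCenter itself
Get-VM vcentre | Shutdown-VMGuest -Confirm:$false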

So take a copy of this, paste it into Notepad (or your favourite text editor) and modify it as required. Save it as a .ps1 file somewhere on your chosen server. Depending on whether your UPS agent can execute a .ps1 script directly, you may need to use a batch file to call the PowerShell script, typically containing a single powershell.exe -ExecutionPolicy Bypass -File command followed by the path to your script.
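The script reads its vCenter login from C:\temp\credentials.xml, so that file has to exist before the UPS agent ever runs it. A minimal sketch of creating it, assuming you can log on interactively as the account the UPS agent uses. It also assumes the stored password is encrypted per Windows user, so a file created under a different account may not be readable by the agent.

```powershell
# Run once, interactively, under the Windows account the UPS agent will use to run the script.
# Assumption: stored passwords are encrypted per Windows user, so a file created by a different
# account is unlikely to be readable when the agent runs the script.
$cred = Get-Credential -Message 'vCenter account used by the UPS shutdown script'
New-VICredentialStoreItem -Host vcentre -User $cred.UserName -Password $cred.GetNetworkCredential().Password -File C:\temp\credentials.xml

# Check that the entry reads back the same way the shutdown script will read it.
Get-VICredentialStoreItem -Host vcentre -File C:\temp\credentials.xml
```

Prompting with Get-Credential keeps the password out of your PowerShell command history, which typing it directly after -Password would not.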

What this script does is connect to vCenter, which may well be a VM within the cluster; you could also use something similar that connects directly to each host, which avoids having to leave vCenter running. To avoid clear-text credentials, the script reads the login from an XML credential store file that holds the password encrypted rather than in plain text; this file can be created with the New-VICredentialStoreItem cmdlet.

Once connected, it gets all the VMs running in the cluster except those with certain text in their names. Here you can replace *NTNX* with *VSA* or whatever matches your storage appliances’ naming convention, and the same goes for vCenter (if it is a VM) and any vMAs you might be using for the actual host shutdown. The script then shuts down all VMs except those with no VMware Tools installed (because it can’t shut those down gracefully) and those we have excluded by name. At the same time it picks up the VMs with no VMware Tools installed and powers them off, as we won’t be able to shut them down gracefully from the host. Be careful here, as we again need to exclude the VMs we want to leave up, in case they don’t support VMware Tools.

Next, the script waits six minutes (360 seconds) before powering off anything that didn’t shut down, again excepting the VMs we want to leave on. You can increase the sleep time if you have VMs that usually take longer to shut down. You could omit this step altogether, but an ungraceful power-off of a server is likely preferable to a running VM’s storage disappearing. Lastly, the script shuts down the vCenter server. You can also add a line to the end to shut down the local server, as presumably we also want our physical servers to shut down gracefully.
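The same name filter appears three times in the script, so it is easy to update one copy and miss another. One option is to keep the patterns in a single list behind a small helper; Test-KeepRunning and $KeepRunningPatterns below are hypothetical names, and the patterns are just the ones from the example script.

```powershell
# Hypothetical helper: name patterns for the VMs that must stay up until last.
# '*NTNX*' matches Nutanix storage VMs; swap in '*VSA*' or similar for your own naming convention.
$KeepRunningPatterns = @('*NTNX*', '*vCentre*', 'vMA*')

function Test-KeepRunning {
    param(
        [Parameter(Mandatory)][string]$Name,
        [string[]]$Patterns = $KeepRunningPatterns
    )
    foreach ($pattern in $Patterns) {
        # -like is case-insensitive by default, the same as the -notlike filters in the script
        if ($Name -like $pattern) { return $true }
    }
    return $false
}

# Usage inside the shutdown script, after Connect-VIServer (a PowerCLI session is required):
# $running = Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' -and -not (Test-KeepRunning -Name $_.Name) }
```

Note that 'vMA*' has no leading wildcard, so it only matches names that start with vMA; that mirrors the original filter.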

From here we can either have our script connect to each host and shut it down via PowerCLI, or have vMA appliances configured with UPS agents shut the hosts down. With vMAs, we need to ensure the shutdown delay is long enough for our script to have finished. Either way, we still need to configure startup/shutdown on each host, because we want whatever our script left running to shut down gracefully with the host. Hopefully this will just be the vMAs and our VSA appliances. Of course, if our vMAs are on the shared VSA storage they could potentially get corrupted, but it takes minimal effort to restore or rebuild the odd one, as they probably won’t contain any valuable data.
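If you take the PowerCLI route for the hosts themselves, the end of the script might look something like the sketch below. It assumes hypothetical host names and a credential store entry for each host in the same file, and it finishes with the line that shuts down the local server. Stop-VMHost needs -Force here because the hosts will not be in maintenance mode. Whether each host’s own startup/shutdown settings then shut down the remaining VSA appliances gracefully is worth confirming in testing rather than assuming.

```powershell
# Sketch: hypothetical host names; each host is assumed to have its own entry in credentials.xml.
$esxHosts = @('esx01.example.local', 'esx02.example.local')

foreach ($esxHost in $esxHosts) {
    $hostCreds = Get-VICredentialStoreItem -Host $esxHost -File C:\temp\credentials.xml
    $session = Connect-VIServer -Server $esxHost -User $hostCreds.User -Password $hostCreds.Password
    # -Force shuts the host down even though it is not in maintenance mode; anything still
    # running (VSA appliances, vMAs) is left to the host's startup/shutdown settings.
    Stop-VMHost -VMHost (Get-VMHost -Server $session) -Server $session -Force -Confirm:$false
}

# Finally, shut down the server running the UPS agent and this script.
Stop-Computer -Force
```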

Lastly, you should test whatever script you come up with as thoroughly as you can on non-production systems, rather than waiting for a power outage to find out whether it works. My script is purely an example to give you some ideas, so it is highly unlikely to be fit for purpose in your environment as it stands. If you don’t have a suitable test environment, one way to test is to modify the script slightly so that it connects to a single host only, migrate everything off that host except the resident storage appliance and some non-critical, non-production VMs, and then run the script; most UPS agents have some sort of test shutdown facility. VSA-style storage can usually tolerate the loss of one appliance without losing the storage, so this test should verify that the script works as expected without impacting production systems.
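Before the real test, a dry run can show which VMs the script would select on the test host. The sketch below assumes a fresh PowerShell session (so only the host’s VMs are visible), a hypothetical host name, and that the host’s credentials were added to the same credential store file. Shutdown-VMGuest supports the standard -WhatIf parameter, which reports the action without performing it.

```powershell
# Dry run against a single host: 'esx01.example.local' is a hypothetical test host whose
# credentials are assumed to be in the same credential store file as the vCenter entry.
$creds = Get-VICredentialStoreItem -Host esx01.example.local -File C:\temp\credentials.xml
Connect-VIServer -Server esx01.example.local -User $creds.User -Password $creds.Password

# Same selection as the shutdown script; connected straight to the host, Get-VM only sees its VMs.
$candidates = Get-VM | Where-Object {
    $_.PowerState -eq 'PoweredOn' -and $_.Name -notlike '*NTNX*' -and
    $_.Name -notlike '*vCentre*' -and $_.Name -notlike 'vMA*'
}

# -WhatIf lists what would be shut down without touching the VMs; remove it for the real test.
$candidates | Shutdown-VMGuest -WhatIf
```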

Thanks for reading and I hope this is useful to someone.
