Thursday 26 May 2011

Removing VMware snapshots with invalid snapshot configuration errors

You come across a server with some unknown performance issues. The host system is VMware ESXi (version 3.5) and there are 2 virtual machines. One of them, an SBS 2008 server (Virtual Machine A), is the one experiencing the problems. As you carry out your preliminary checks you discover snapshot (delta) disks which are not registered by the snapshot manager. To compound the problem, the snapshots have grown quite large and you urgently need to commit them, but this will be your first attempt. The pressure is on.
I've listed the steps I took below, in order, although figuring them out as I went was far from straightforward and quite nerve-racking:
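As a rough illustration of that preliminary check, here is a minimal Python sketch that lists snapshot delta files and their sizes. It assumes you can see the virtual machine's directory from a machine that has Python (for example a mounted or copied datastore folder) and that the delta disks follow the usual `*-delta.vmdk` naming. The function names are my own, not part of any VMware tool.

```python
from pathlib import Path


def find_delta_disks(vm_dir):
    """Return (file name, size in bytes) for each *-delta.vmdk in vm_dir, largest first."""
    deltas = [
        (path.name, path.stat().st_size)
        for path in Path(vm_dir).glob("*-delta.vmdk")
        if path.is_file()
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


def total_delta_bytes(vm_dir):
    """Sum the sizes of all delta disks found in vm_dir."""
    return sum(size for _, size in find_delta_disks(vm_dir))
```

Comparing the list against what the snapshot manager shows gives a quick idea of how many delta disks are unaccounted for and how much data will need to be committed.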
1. Make sure all user sessions and open files are closed. If you have no choice, forcibly close the connections.
2. Disable all external services, e.g. Exchange, which handles your mail.
3. Make an incremental backup of the virtual server with image-based disaster recovery/backup software. I used ShadowProtect from StorageCraft. If you do not have access to one then good luck!
4. Shut the virtual server down, browse to the datastore and edit the .vmx file. Remove any invalid entries. By invalid I mean any references to virtual/physical objects that no longer exist, such as a virtual disk. Also rename the .vmsd file by appending .bak or .old.
You should only carry out this step if you have trouble creating snapshots or increasing the size of the virtual hard drives and VMware throws the error "...invalid snapshot configuration".
5. With the virtual machine still shut down, use the snapshot manager to create a new snapshot. Now delete the snapshot you just created. VMware should now attempt to commit the snapshot disks.
6. Wait. This can take a few hours depending on how fast your physical server is. Don't be alarmed if you are unable to connect back to the ESX server with the VMware Infrastructure Client. The process of committing those disks is so intense that all other input/output seems to be ignored. I had to wait 2 hours. I read that others have had to wait 2 days!!! They must have had a really old and enormous snapshot, or their server is an absolute snail.
7.  Good luck!
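For step 4, spotting the invalid entries by eye can be error-prone under pressure. The sketch below is one possible way to flag candidates, assuming the .vmx file uses the usual `key = "value"` lines and that disk references live in keys ending in `.fileName`. It works on a copy of the VM folder on a machine with Python. The helper names are hypothetical, and the output is only a list to review by hand: some `fileName` values (CD-ROM devices, for example) legitimately point at things that are not files in the VM folder.

```python
from pathlib import Path


def parse_vmx(text):
    """Parse key = "value" lines from .vmx text into a dict, skipping blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip().strip('"')
    return entries


def missing_file_references(vmx_path):
    """Return (key, value) pairs whose *.fileName target does not exist on disk."""
    vmx_path = Path(vmx_path)
    entries = parse_vmx(vmx_path.read_text())
    missing = []
    for key, value in entries.items():
        if not key.lower().endswith(".filename") or not value:
            continue
        target = Path(value)
        if not target.is_absolute():
            # Relative references are resolved against the .vmx file's own folder.
            target = vmx_path.parent / target
        if not target.exists():
            missing.append((key, value))
    return missing
```

Anything it reports still needs a human decision before you touch the .vmx file, and keep a copy of the original file just as you keep the renamed .vmsd.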


references:
- www.vmware.com
- http://www.virtualisation.co.uk/2009/01/esx-snapshots-an-invalid-snapshot-configuration-was-detected/
- http://ict-freak.nl/2009/07/11/vmware-remove-snapshot-stuck-at-95/
- http://www.vmdamentals.com/?p=332
- http://www.experts-exchange.com/Software/VMWare/Q_26803123.html
