
Microsoft VSS Backup Retryable Error

What was going on?

One of our customers reported an issue with how an Exchange server was backing up. The backups themselves were working alright, and the customer was able to perform test restores. However, one of the VSS writers was consistently showing an error state.

You can check the status of all available VSS writers by opening an elevated command prompt and running “vssadmin list writers”. Every writer should show a state of “Stable” and a last error of “No error”.

When “vssadmin list writers” was run on their system, the following was reported:

VSS writers can be in one of several states. The first is stable, which means the writer is ready to process a backup. This is the normal resting state. Next, a writer can show a failed or unstable state. Typically, such a writer can be reset to return it to a stable state. The last set of states is in-progress or waiting for completion. (This is key to solving this case later.) These states show that the writer is still in use by a backup process. Once the backup completes, the writer should return to the normal stable state.
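The check described above can be automated. Below is a minimal Python sketch that parses the text printed by “vssadmin list writers” and flags any writer that is not “Stable” with “No error”. The function names (`parse_writers`, `unhealthy`, `vssadmin_output`) are made up for this illustration, and `SAMPLE` is a trimmed, invented example of the output format, not the customer's actual output.

```python
import re
import subprocess


def parse_writers(text):
    """Return one dict per writer with its name, state, and last error."""
    writers = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            # "Writer name: 'System Writer'" -> "System Writer"
            name = line.split(":", 1)[1].strip().strip("'")
            writers.append({"name": name, "state": None, "last_error": None})
        elif writers and line.startswith("State:"):
            # "State: [1] Stable" -> "Stable"
            value = line.split(":", 1)[1].strip()
            writers[-1]["state"] = re.sub(r"^\[\d+\]\s*", "", value)
        elif writers and line.startswith("Last error:"):
            writers[-1]["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy(writers):
    """Writers that are not in the normal resting state."""
    return [
        w for w in writers
        if w["state"] != "Stable" or w["last_error"] != "No error"
    ]


def vssadmin_output():
    """Run the real command (Windows only, elevated prompt required)."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"], capture_output=True, text=True
    )
    return result.stdout


# Illustrative sample only; real output also includes writer ID lines.
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Exchange Writer'
   State: [8] Failed
   Last error: Retryable error
"""

for w in unhealthy(parse_writers(SAMPLE)):
    print(f"{w['name']}: {w['state']} ({w['last_error']})")
```

On a live server, replace `SAMPLE` with `vssadmin_output()`. A writer that reports an in-progress or waiting state is also flagged by this sketch, so check whether a backup is running before treating that as a fault.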

Steps taken to remediate

  1. Attempt to reset the affected VSS writers. This can be done in one of two ways: find the service behind the writer and restart it, or restart the entire system. Since restarting servers is not always easy given today’s high-uptime requirements, restarting services is typically preferable. You can reference a list such as the one provided here to help match up the writer names with the service to restart.
  2. Restarting the services did not help us in this particular case, so we continued on. A viable next step is to check for corrupted system files. This can be done by running “sfc /scannow”. I also ran “dism.exe /online /cleanup-image /scanhealth”, which can be followed, if necessary, by “dism.exe /online /cleanup-image /restorehealth”. Both checks came back clean.
  3. Finally, a second opinion revealed that this Exchange server was also running the Azure Site Recovery (ASR) agent for off-site replication. The local agent was out of date, and since it uses VSS as well, it was causing the problems we were seeing. After the agent was updated, the problem went away.
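The first two steps above can be sketched as an ordered command plan. This is a minimal Python illustration, not the exact procedure used in this case. The `WRITER_SERVICES` mapping is an assumption covering only a few commonly cited writer-to-service pairs, so verify it against a full reference list. The helper names (`restart_commands`, `plan`) are made up for this example. Nothing is executed; the script only prints the commands so they can be reviewed first.

```python
# Assumption: a small subset of commonly cited writer-to-service pairs.
# Verify against a complete reference list before relying on it.
WRITER_SERVICES = {
    "System Writer": "CryptSvc",
    "WMI Writer": "Winmgmt",
    "BITS Writer": "BITS",
    "SqlServerWriter": "SQLWriter",
    "Microsoft Exchange Writer": "MSExchangeIS",
}

# System file checks named in the steps above.
INTEGRITY_CHECKS = [
    ["sfc", "/scannow"],
    ["dism.exe", "/online", "/cleanup-image", "/scanhealth"],
]

# Only needed if the scan reports a problem.
REPAIR = ["dism.exe", "/online", "/cleanup-image", "/restorehealth"]


def restart_commands(writer_name):
    """Stop/start commands for the service behind a writer, if known."""
    service = WRITER_SERVICES.get(writer_name)
    if service is None:
        return []
    return [["net", "stop", service], ["net", "start", service]]


def plan(writer_name, repair=False):
    """Ordered remediation commands: restart first, then integrity checks."""
    commands = restart_commands(writer_name) + list(INTEGRITY_CHECKS)
    if repair:
        commands.append(REPAIR)
    return commands


for cmd in plan("System Writer"):
    print(" ".join(cmd))
```

Restarting a service such as the Exchange Information Store interrupts mail access, so schedule it accordingly. None of these commands would have fixed the case described here, where the root cause was an outdated replication agent.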

The moral of the story: keep all of your backup agents updated!

For official tutorials and more information about the Volume Shadow Copy Service, see the official documentation.