Take care – Express Patch 6 for ESXi 6 can break your Backup (CBT Bug)!

Update 05/08/2016: The issue is fixed with ESXi 6.0 Patch 03: VMware ESXi 6.0, Patch ESXi600-201608401-BG: Updates esx-base, vsanhealth, vsan VIBs (2145664)

-----

Update 25/06/2016: Two days ago VMware published a KB article covering the issue described in this blog post (VMware KB 2145895). If you read the KB and this post carefully, you will find some differences. For example, VMware states in the KB: “No data is lost. Due to this issue, incremental backups are effectively full backups.” Because I have seen various affected virtual machines where the backup was not equal to a full backup, I decided not to change the blog post. I still recommend performing a CBT reset to ensure that you can rely on your backup.

-----

Just a short notice if you are already on ESXi 6:

There is an issue with Changed Block Tracking (CBT) in Express Patch 6 for ESXi 6 (Build 3825889) that can affect your backup if your backup software relies on CBT.

  • only VMs running Windows Server 2008 or newer are affected (Linux VMs are unaffected)
  • all vHW versions (7, 8, 9, 10 and 11) and all VMware Tools versions are affected
  • the issue occurs with thick and thin provisioned disks
  • the issue only occurs when the backup software uses quiesced snapshots
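The conditions above can be combined into a simple pre-check. The following is a minimal sketch in Python, assuming you have already collected the host build number, guest OS family and snapshot mode from your own inventory. The `VmInfo` record and the `may_be_affected` function are hypothetical names for illustration and are not part of any VMware API.

```python
from dataclasses import dataclass

# Build number of ESXi 6 Express Patch 6, as stated in this post.
AFFECTED_BUILD = 3825889


@dataclass
class VmInfo:
    """Hypothetical record describing one VM; fill it from your own inventory."""
    name: str
    host_build: int                        # ESXi build of the host running the VM
    guest_is_windows_2008_or_newer: bool   # Linux guests are reported as unaffected
    uses_quiesced_snapshots: bool          # backup job takes quiesced snapshots


def may_be_affected(vm: VmInfo) -> bool:
    """Return True if the VM matches all conditions listed in this post.

    Virtual hardware version, VMware Tools version and disk provisioning
    are ignored on purpose, because the issue occurs with all of them.
    """
    return (
        vm.host_build == AFFECTED_BUILD
        and vm.guest_is_windows_2008_or_newer
        and vm.uses_quiesced_snapshots
    )


if __name__ == "__main__":
    example = VmInfo("example-vm", AFFECTED_BUILD, True, True)
    print(example.name, "may be affected:", may_be_affected(example))
```

A match only means the VM fits the described pattern; the vmware.log of the VM still has to be checked to confirm it.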

Error messages:

You can find errors in the vmware.log of the affected VMs similar to:

vcpu-0| I120: SNAPSHOT:SnapshotBranchDisk: Failed to acquire current epoch for disk /vmfs/volumes/
vmdk : Change tracking is not active for this disk 572.
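To look for this message across many VMs, a small script can help. The sketch below is in Python and assumes that the message appears on a single line in vmware.log, with the datastore path between the two quoted fragments. The function names `find_cbt_errors` and `scan_log` are made up for this example.

```python
import re
from typing import Iterable, List

# The two fragments quoted in this post; the disk path between them varies.
CBT_ERROR = re.compile(
    r"SnapshotBranchDisk: Failed to acquire current epoch for disk.*"
    r"Change tracking is not active for this disk"
)


def find_cbt_errors(lines: Iterable[str]) -> List[str]:
    """Return all lines that contain the CBT error message."""
    return [line.rstrip("\n") for line in lines if CBT_ERROR.search(line)]


def scan_log(path: str) -> List[str]:
    """Scan one vmware.log file (path supplied by the caller)."""
    # errors="replace" avoids a crash on unexpected bytes in the log.
    with open(path, errors="replace") as handle:
        return find_cbt_errors(handle)
```

Calling `scan_log` with the path to a copy of a VM's vmware.log returns the matching lines, or an empty list if the message does not occur.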

Other symptoms:

  • you have to back up an abnormally large amount of data (almost a full backup, even for an incremental run)
  • longer backup durations due to the abnormally large amount of data sent to the backup software
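One way to spot this symptom is to compare the latest incremental backup with earlier ones. The following Python sketch assumes you can export the transferred size of each incremental run from your backup software. The function name and the threshold factor of 3 are arbitrary choices for illustration, not values from VMware or any backup vendor.

```python
from statistics import median
from typing import Sequence


def looks_abnormal(previous_gb: Sequence[float], latest_gb: float,
                   factor: float = 3.0) -> bool:
    """Flag an incremental run that is much larger than earlier ones.

    previous_gb: sizes of earlier incremental backups of the same VM
    latest_gb:   size of the incremental backup to check
    factor:      assumed threshold; tune it to your own change rate
    """
    if not previous_gb:
        return False  # no history to compare against
    baseline = median(previous_gb)
    return latest_gb > factor * baseline


if __name__ == "__main__":
    history = [10.0, 12.0, 11.0]   # made-up sizes in GB
    print(looks_abnormal(history, 95.0))
    print(looks_abnormal(history, 14.0))
```

The median is used as the baseline so that a single earlier full backup in the history does not distort the comparison too much.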

Resolution:

There are currently 2 workarounds available:

  1. Don’t use quiesced (application-aware) snapshots
  2. Revert to a previous version of ESXi, e.g. vSphere ESXi 6 U2 (build 3620759) (https://kb.vmware.com/kb/1033604)

If you already have affected virtual machines, migrate them to a host using an unaffected ESXi version (e.g. VMware ESXi 6 Update 2, build 3620759).

Then reset CBT on the affected VMs and perform a full backup to ensure that your backup is reliable (see: How to reset CBT on multiple VMs using PowerCLI).

Here you can find a short description: How to rollback/revert to a previous ESXi Version

Additional information:

  • VMware VDP seems to be affected by the issue
  • IBM TSM seems to be affected by the issue
  • Veeam Backup & Replication seems not to be affected as long as you do not enable VMware Tools quiescence (in the backup job settings)

16 Comments

  1. in4sometech

    Thanks for sharing..

  2. Darian

    f*** this is the reason for the increased backups – thx for sharing

  3. Mike

    Another bug with CBT
    same here

  4. Anthony Spiteri

    Hey there…is there an official VMwareKB for this bug yet?

  5. KABO

    Thanks for info! What about replicas? same?

  6. Andreas

    @Anthony: As far as I know it should be published soon

  7. Andreas Neufert

    Hi, if you have enabled Veeam InGuest processing, it will override the “VMware Tools quiescence” setting in the job configuration. So even if you have enabled quiescence, you may not be affected because you had enabled Veeam InGuest processing.

  8. Andreas

    @Andreas: thank you for this valuable input

  9. Chris Wahl

    Hi Andreas. Chris from Rubrik here. Just to confirm, we don’t use the VMware Tools quiescence for application consistent backups and are thus not affected by this issue. Cheers!

  10. Luciano Patrao

    Hi Andreas,

    We use Veeam and I have this issue today

    SNAPSHOT:SnapshotBranchDisk: Failed to acquire current epoch for disk /vmfs/volumes/2f6294e0-bc15bc5d/edeacwdp2vnf01/edeacwdp2vnf01.vmdk : Change tracking is not active for this disk 572.

    And we use VMware Tools quiescence

    Until now I see this only in 2 VMs (Windows 2012)

    Still investigating if I have more issues.

  11. Andreas

    @Luciano: take a look at the amount of data sent to your backup system. There is a good chance that it is abnormally high compared to incremental backups before applying EP6.

  12. Luciano Patrao

    @Andreas for now I don’t see too much difference in the backup sizes.

    But since I had some full backups in some of the jobs, need to double check this in the next incremental backup jobs.

  13. FSKSpook

    Isn’t this fixed in VMware Tools 10.0.9?

  14. Ui
  15. Dick

    VMware top brass are super egomaniacs. I really think they have no motivation to make quality software because bugs generate a lot of support revenue for them, and they don’t give a damn about customer complaints at all. People should start moving away from VMware products; it’s getting so ridiculously complicated, it will be very buggy and super expensive.
    Time to plan to migrate to other products in the next 1-4 years.
