How to check VMFS for metadata corruption (ESXi 5.1 and later)

Since ESXi 5.1, you can check a VMFS datastore for metadata inconsistencies with a tool called VOMA (vSphere On-disk Metadata Analyzer). VOMA can check both VMFS3 and VMFS5 datastores.

Note that the tool runs in read-only mode, so it can only identify problems. It cannot fix the errors it detects.

Reasons to use VOMA:

  • metadata errors reported in the vmkernel log
  • after a SAN outage
  • after a RAID rebuild
  • when you cannot modify, delete or access files on a VMFS datastore that is not in use by another host

Before you run VOMA from the CLI of your ESXi host, observe the following guidelines:

  • shut down all virtual machines running on the VMFS datastore (or migrate them to another datastore)
  • make sure that the VMFS volume is not in use by other hosts (best practice: unmount the datastore on the other hosts)
  • make sure that the datastore is not used by vSphere HA for heartbeating
  • make sure that the datastore is not in use by other features, such as Storage I/O Control
  • make sure that the volume is not a multi-extent volume
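The multi-extent condition can be checked from the output of esxcli storage vmfs extent list, which prints one row per extent, so a volume that spans several extents appears in several rows. The POSIX shell sketch below counts those rows for one volume label. It assumes the output has two header lines followed by one row per extent, with VMFS UUID, Extent Number, Device Name and Partition as the last four columns; count_extents and is_single_extent are hypothetical helper names, not ESXi commands.

```shell
#!/bin/sh
# Sketch (assumed output format): count the extents of one VMFS volume from
# the text output of `esxcli storage vmfs extent list`, read on stdin.
# Assumes two header lines, then one row per extent whose last four columns
# are VMFS UUID, Extent Number, Device Name and Partition. The volume name is
# everything before those columns (multiple spaces inside a name collapse).

# Hypothetical helper: print the number of rows whose volume name equals $1.
count_extents() {
    awk -v label="$1" '
        NR <= 2 || NF < 5 { next }   # skip header, ruler and malformed lines
        {
            name = $1                # rebuild a name that may contain spaces
            for (i = 2; i <= NF - 4; i++) name = name " " $i
            if (name == label) n++
        }
        END { print n + 0 }
    '
}

# Hypothetical helper: succeed only if the volume has exactly one extent.
is_single_extent() {
    [ "$(count_extents "$1")" -eq 1 ]
}

# Example on the host (not executed here):
#   esxcli storage vmfs extent list | is_single_extent VMLUN_01 ||
#       echo "VMLUN_01 is missing or spans several extents"
```

A label that does not appear at all also counts as "not single extent", so the check fails safe when the name is mistyped.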

Now log on to your ESXi host and take a look at the available VOMA parameters:

voma -h

[Screenshot: output of voma -h]

First, you need the path to the partition (naa.xxxxxx:1). Run the following command to list every VMFS extent with its Volume Name, VMFS UUID, Extent Number, Device Name and Partition:

esxcli storage vmfs extent list

The output should look similar to this:

[Screenshot: output of esxcli storage vmfs extent list]

To scan VMLUN_01, join its Device Name (naa.60a98000646e6c…) and its partition number (1) with a colon (:) and prefix the result with the directory /vmfs/devices/disks/, as in this command:

voma -m vmfs -f check -d /vmfs/devices/disks/naa.60a98000646e6c50566f6a6c6a683164:1
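Building this path by hand is easy to get wrong, so here is a sketch that derives it from the extent list instead. It reads the esxcli storage vmfs extent list output on stdin, finds the row for a given volume label and prints /vmfs/devices/disks/<Device Name>:<Partition>. The column layout (two header lines, Device Name and Partition as the last two columns) is an assumption about the output format, and voma_device_path is a hypothetical name. The function deliberately prints nothing and fails for a label that is missing or has more than one extent.

```shell
#!/bin/sh
# Sketch (assumed output format): turn the `esxcli storage vmfs extent list`
# output on stdin into the path that `voma -d` expects, i.e.
# /vmfs/devices/disks/<Device Name>:<Partition>.
# Assumes two header lines, then one row per extent with Device Name and
# Partition as the last two columns and the volume name before the last four.

# Hypothetical helper: print the device path for volume label $1.
# Fails (exit status 1, no output) if the label is missing or has several extents.
voma_device_path() {
    awk -v label="$1" '
        NR <= 2 || NF < 5 { next }   # skip header, ruler and malformed lines
        {
            name = $1                # rebuild a name that may contain spaces
            for (i = 2; i <= NF - 4; i++) name = name " " $i
            if (name == label) { n++; dev = $(NF - 1); part = $NF }
        }
        END {
            if (n != 1) exit 1       # not found, or a multi-extent volume
            printf "/vmfs/devices/disks/%s:%s\n", dev, part
        }
    '
}

# Example on the host (not executed here):
#   dev=$(esxcli storage vmfs extent list | voma_device_path VMLUN_01) &&
#       voma -m vmfs -f check -d "$dev"
```

Chaining the two commands with && means VOMA only starts when exactly one matching extent was found.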

If VOMA completes the check without finding problems, you should see something like this:

Checking if device is actively used by other hosts

Running VMFS Checker version 0.9 in check mode

Initializing LVM metadata, Basic Checks will be done

Phase 1: Checking VMFS header and resource files

Detected file system (labeled:'VMLUN_01') with UUID:4fa227b8-8d16cdf3-4816-984be103b9a0, Version 5:54

Phase 2: Checking VMFS heartbeat region

Phase 3: Checking all file descriptors.

Phase 4: Checking pathname and connectivity.

Phase 5: Checking resource reference counts.

Total Errors Found:           0
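If you run VOMA from a script, the summary line is the easiest thing to check. The sketch below reads VOMA's output on stdin and prints the number from the "Total Errors Found:" line shown in the sample output. It assumes that line appears in this form; voma_error_count is a hypothetical helper, and its exit codes (0 for no errors, 1 for errors reported, 2 when no summary line is present, for example because the scan was aborted) are this sketch's own convention, not VOMA's.

```shell
#!/bin/sh
# Sketch: read VOMA's output on stdin and print the number from the
# "Total Errors Found:" summary line.
# Exit status (this sketch's own convention, not VOMA's):
#   0 = summary line says 0 errors, 1 = errors reported, 2 = no summary line.

# Hypothetical helper name.
voma_error_count() {
    awk '
        /Total Errors Found:/ { n = $NF + 0; found = 1 }   # last field is the count
        END {
            if (!found) exit 2       # e.g. the scan was aborted before the summary
            print n
            exit (n == 0 ? 0 : 1)
        }
    '
}

# Example on the host (not executed here); 2>&1 in case part of the output goes to stderr:
#   voma -m vmfs -f check -d "$dev" 2>&1 | voma_error_count
```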

What should I do if VOMA detects an error?

The tool can only find errors, not fix them. If VOMA reports any errors, contact VMware support for further help.

Possible reasons and messages when VOMA stops the scan:

If there is activity on the datastore you are trying to scan, VOMA prints the following output:

Found 1 actively heartbeating hosts on device /…

1): MAC address xx:xx:xx:xx:xx:xx

VOMA stops the scan because there is activity on the VMFS file system. The MAC address identifies the management interface of the ESXi host that is causing the activity.

Reasons for this can be:

  • a running VM on the scanned datastore
  • other hosts are accessing the datastore
  • vSphere HA is using the datastore for heartbeating
  • Storage I/O Control is turned on
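To find out which host is still active, you can pull the MAC addresses out of VOMA's heartbeat message. This sketch assumes each address is the token directly after the text "MAC address " on its line, as in the message shown earlier; voma_heartbeating_macs is a hypothetical name. You can then compare the result with the MAC address of the management VMkernel interface on each host, which esxcli network ip interface list should display.

```shell
#!/bin/sh
# Sketch: print one MAC address per line for every host VOMA reports as
# still heartbeating on the device (VOMA output read on stdin).
# Assumes each address is the token right after "MAC address " on its line.

# Hypothetical helper name.
voma_heartbeating_macs() {
    sed -n 's/.*MAC address \([^ ]*\).*/\1/p'
}

# Example on the host (not executed here):
#   voma -m vmfs -f check -d "$dev" 2>&1 | voma_heartbeating_macs
#   esxcli network ip interface list    # compare with each host's VMkernel MAC addresses
```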

1 Comment

  1. Dmitry

    Very nice post, thanks a lot. I have the following situation: VOMA detected several errors on my primary hardware RAID disk.

    Unfortunately, I cannot open a ticket on the VMware support site. Can you point me in the right direction for correcting these errors without data loss? The RAID was rebuilt several months ago. I double-checked the hardware level, and everything is OK.
