Restoring virtual machines from an erroneously initialized Datastore. The story of one blunder with a happy ending

Disclaimer: this note is for entertainment purposes. Its density of useful information is low. I wrote it for myself.

Lyrical introduction

The file dump in our organization is a virtual machine on VMware ESXi 6 running Windows Server 2016. And it is not just a dump. It is the file exchange server between divisions: collaborative work, project documentation, and folders from network scanners all live here. In short, the whole working life of the company is here.

And then this receptacle of our entire working life began to hang. Sometimes the guest hung quietly on its own, without affecting the others. Sometimes it took the whole host down with it and, accordingly, all the other guest machines. Sometimes it hung and took the vSphere client services with it: the other guests' processes were alive, the machines worked properly and responded, but the file dump was gone and the vSphere Client could not connect to the host. In short, no pattern could be identified. Hangs happened during the day under light load, at night under no load, at night during the differential backup under medium load, and on weekends during the full backup under high load. And the situation was clearly getting worse. At first it happened once a year, then once every six months. By the time my patience ran out, twice a week.
I suspected the RAM. But I was not allowed to stop the file dump and run Memtest, not even on weekends. I waited for the May holidays. During the May holidays I ran Memtest and ... no errors were found.

I was dumbfounded and decided to go on vacation. While I was on vacation, the file dump did not freeze once. And on Monday, my first day back at work, it hung. It got through the full backup and hung right at the end of it. Such a warm welcome back from vacation pushed me to physically move the disks with the guest machine to another host.

And although it has long been known that nothing serious should be done on the first day after a vacation, and although all the way to work I kept telling myself not to work, my indignation at yet another freeze knocked both the mood and the vows out of my head ...

The physical disks were moved to a different host and hot-plugged. In the storage settings, the disks appeared on the Drives tab. On the Datastores tab, there was no storage from these disks. I clicked Refresh: still nothing. And, of course, the first impulse was Add Storage. The wizard lists what it supports. Of course it supports VMFS; I never doubted that. A quick glance at the wizard's messages at each step: Next, Next, Next, Finish. My gaze did not even catch on the small yellow circle with an exclamation mark at the bottom of the window in one of the wizard's steps.

At the end of the wizard, a fresh Datastore appeared in the list ... and with it the Datastores from the remaining physical disks.

I go to browse the newly added Datastore, and it is ... empty. Of course, I was dumbfounded again. It is 8 a.m., the first 15 minutes at work after vacation, the sugar in the coffee has not even been stirred yet. And here we are. My first thought was that I had pulled the wrong disk out of the "native" host. I checked whether the Datastore I was looking for was still present on the "native" host: no, it was not. The second thought was: "fuck # s!". I'm not sure, but it seems to me that the third, fourth, and at least the fifth thoughts were the same.

To dispel any doubt, I quickly installed a fresh ESXi for testing, took a spare disk and, this time reading carefully, walked through the wizard's steps. Yes. When a Datastore is added through the wizard, all data on the disk is lost, with no way to roll back the operation and restore the data. Later, on one of the forums, I read an assessment of this wizard design: "shitsome crap". And at that moment I fully agreed.

Starting with the sixth, my thoughts flowed in a more constructive direction. OK. Initialization takes a matter of seconds even for a 3 TB disk. So this is high-level formatting. So the partition table was simply rewritten. So the data is still there. So now we just find some unformat tool and voila.
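The "seconds, therefore the data is still there" reasoning can be checked with back-of-the-envelope arithmetic. This is a minimal sketch, assuming a sustained sequential write speed of 150 MB/s (an illustrative figure, not one measured on this hardware): actually overwriting the whole disk would take hours, not seconds.

```python
def full_overwrite_hours(disk_bytes: int, write_bytes_per_sec: float) -> float:
    """Hours needed to overwrite every byte of the disk exactly once."""
    return disk_bytes / write_bytes_per_sec / 3600


DISK_BYTES = 3 * 10**12             # the 3 TB disk from the story
ASSUMED_WRITE_SPEED = 150 * 10**6   # assumption: 150 MB/s sustained writes

hours = full_overwrite_hours(DISK_BYTES, ASSUMED_WRITE_SPEED)
print(f"A full overwrite would take about {hours:.1f} hours")
```

An operation that finishes in seconds can only have touched a tiny fraction of the disk, which is why the old contents are still worth looking for.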

I boot the machine from the Strelec boot image ... and find out that partition recovery programs know every format except VMFS. They know Synology's partition layout, for example, but not VMFS.

Going through the programs is not comforting: at best, GetDataBack and R.Saver find NTFS partitions with an intact directory structure and intact file names. But that is no use to me. I need two vmdk files: the system disk and the file dump disk.

And then I realize that it looks like I am about to install Windows and restore from the file-level backup. And at the same moment I remember that there was a DFS root on that machine. And also a system of access rights to the divisions' folders that is utterly wild in size and branching. Not an option. The only option with an acceptable time frame is to restore the system state and the data disk with all its permissions.

More googling, forums, KB articles, and again Yaroslavna's lament: VMware ESXi provides no data recovery mechanism. All discussion threads end one of two ways: someone recovered the data with the expensive DiskInternals VMFS Recovery, or someone was helped by a specialist armed with vmfs-tools and dd. Buying a $700 DiskInternals VMFS Recovery license is not an option. Giving an outsider from the "territory of a potential enemy" access to corporate data is not an option either. But googling did turn up that UFS Explorer can also read VMFS partitions.

DiskInternals VMFS Recovery

I downloaded and installed the trial version. The program successfully saw the empty VMFS partition.

In Undelete (Fast Scan) mode it also found the wiped Datastore, with the virtual machine folders and the disks inside them.

The preview showed that the files were intact.

Mounting the partition into the system succeeded, but for some unknown reason all three folders contained the same virtual machine. And, by the law of meanness, of course, not the one I needed.

Three lines of shame

An attempt to shamelessly crack the software failed. UFS Explorer, however, did get cracked.

I strongly disapprove of software theft. In no way am I calling for the use of tools that bypass protection against unlicensed use.

I was in a catastrophic situation and am not at all proud of the measures I resorted to.

UFS Explorer

Scanning the disk revealed 7 nodes. The number of nodes "surprisingly" coincided with the number of *-flat.vmdk files found by VMFS Recovery.

Comparing file sizes with node sizes also showed a match to the byte. This made it possible to restore the names of the *-flat.vmdk files and, accordingly, which virtual machine each belonged to.
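The size comparison was done by eye, but the same idea can be sketched in a few lines. This is only an illustration: the node IDs, file names, and sizes below are hypothetical, and the function simply pairs each unnamed node with the named files of exactly the same size.

```python
from collections import defaultdict


def match_nodes_to_files(node_sizes, file_sizes):
    """Map each node ID to the file names whose size matches it byte for byte.

    node_sizes: {node_id: size_in_bytes}, e.g. as listed by UFS Explorer
    file_sizes: {file_name: size_in_bytes}, e.g. as listed by VMFS Recovery
    """
    names_by_size = defaultdict(list)
    for name, size in file_sizes.items():
        names_by_size[size].append(name)
    return {node: names_by_size.get(size, []) for node, size in node_sizes.items()}


# Hypothetical listings, for illustration only.
nodes = {"node_3": 42949672960, "node_5": 2748779069440}
files = {
    "fileserver-flat.vmdk": 42949672960,
    "fileserver_1-flat.vmdk": 2748779069440,
}

for node, names in match_nodes_to_files(nodes, files).items():
    print(node, "->", names)
```

If two disks happen to have the same size, a node gets more than one candidate name and has to be told apart some other way, for example by looking at its contents.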

In general, from ESXi's point of view a vmdk disk consists of two files: a data file (<machine name>-flat.vmdk) and a "physical" disk layout file (<machine name>.vmdk). If you upload only a *-flat.vmdk file from a local machine to the Datastore, ESXi will not recognize it as a valid disk file. The VMware Knowledge Base has an article on how to create a disk descriptor file manually: kb.vmware.com/s/article/1002511, but I did not have to do that; I simply copied the contents of the corresponding files from the file content preview area in DiskInternals VMFS Recovery.
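For anyone who cannot copy the original descriptor, here is a rough sketch of what such a layout file contains and how its numbers relate to the size of the flat file. The field layout follows the typical descriptor of a VMFS flat disk; the CID, hardware version, and adapter type are placeholder assumptions, and the file name is hypothetical. Take the real values from the original descriptor or from the KB procedure wherever possible.

```python
SECTOR_BYTES = 512
HEADS = 255
SECTORS_PER_TRACK = 63


def build_descriptor(flat_name: str, flat_size_bytes: int,
                     adapter_type: str = "lsilogic") -> str:
    """Return descriptor text for a flat extent of the given size in bytes."""
    if flat_size_bytes <= 0 or flat_size_bytes % SECTOR_BYTES != 0:
        raise ValueError("flat file size must be a positive multiple of 512 bytes")
    sectors = flat_size_bytes // SECTOR_BYTES
    cylinders = sectors // (HEADS * SECTORS_PER_TRACK)
    lines = [
        "# Disk DescriptorFile",
        "version=1",
        "CID=fffffffe",          # placeholder content ID (assumption)
        "parentCID=ffffffff",    # no parent disk, i.e. not a snapshot
        'createType="vmfs"',
        "",
        "# Extent description",
        f'RW {sectors} VMFS "{flat_name}"',
        "",
        "# The Disk Data Base",
        "#DDB",
        "",
        'ddb.virtualHWVersion = "4"',   # assumption, adjust to the VM
        f'ddb.geometry.cylinders = "{cylinders}"',
        f'ddb.geometry.heads = "{HEADS}"',
        f'ddb.geometry.sectors = "{SECTORS_PER_TRACK}"',
        f'ddb.adapterType = "{adapter_type}"',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file name and size, for illustration only.
descriptor = build_descriptor("fileserver-flat.vmdk", 2146435072)
print(descriptor)
```

The key line is the extent description: the number after RW is the size of the flat file in 512-byte sectors, so it must agree exactly with the recovered file.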

After 4 hours of exporting a 2.5 TB node from UFS Explorer and 20 hours of uploading it to the hypervisor's Datastore, the files of the wiped disks were attached to a freshly created virtual machine. The disks came up. No data loss was observed.

Source: habr.com
