Where do logs come from? Veeam Log Diving

Where do logs come from? Veeam Log Diving

We continue our immersion into the fascinating world of guessing ... troubleshooting by logs. IN previous article we agreed on the meaning of the basic terms and looked at the overall structure of Veeam as a single application with one eye. The task for this one is to figure out how log files are formed, what kind of information is displayed in them and why they look the way they look.

What do you think these "logs" are? According to most, the logs of any application should be assigned the role of a kind of omnipotent entity that most of the time vegetates somewhere in the backyard, but at the right moment appears out of nowhere in shining armor and saves everyone. That is, they should contain everything, from the slightest errors in each component to individual database transactions. And so that after the error it was immediately written how else to fix it. And all this should fit in a couple of megabytes, no more. It's just text! Text files cannot take tens of gigabytes, I heard it somewhere!

So the logs

In the real world, logs are just an archive of diagnostic information. And what to store there, where to get information for storage and how detailed it should be, is up to the developers themselves to decide. Someone follows the path of minimalism by keeping records of the ON / OFF level, and someone diligently rakes everything they can reach. Although there is also an intermediate option with the ability to select the so-called Logging Level, when you yourself indicate how detailed information you want to store and how much extra disk space you have =) VBR has six such levels, by the way. And, believe me, you do not want to see what happens with the most detailed logging with free space on your disk.

Fine. We roughly understood what we want to save, but a legitimate question arises: where to get this information from? Of course, we form part of the events for logging ourselves by our internal processes. But what to do when there is an interaction with the external environment? In order not to slide into a pitch hell of crutches and bicycles, Veeam tends not to invent inventions that have already been invented. Whenever there is a ready-made API, built-in function, library, etc., we will give preference to ready-made options before starting to fence our contraptions. Although the latter is also enough. Therefore, when analyzing logs, it is important to understand that the lion's share of errors falls on messages from third-party APIs, system calls, and other libraries. In this case, the role of VBR comes down to forwarding these errors to the as is log files. And the main task of the user is to learn to understand which line is from whom, and what this “who” is responsible for. So if an error code from the VBR log takes you to an MSDN page, that's fine and correct.

As we agreed earlier: Veeam is a so-called SQL-based application. This means that all settings, all information and in general everything that is only necessary for normal functioning - everything is stored in its database. Hence the simple truth: what is not in the logs is most likely in the database. But this is not a silver bullet either: some things are not in the local logs of Veeam components, nor in its database. Therefore, you need to learn how to study the host logs, the logs of the local machine and the logs of everything that is involved in the backup and restore process. And it also happens that the necessary information is not available anywhere at all. That is the way. 

Some examples of such APIs

This list does not aim to be exceptionally complete, so there is no need to look for the ultimate truth in it. Its purpose is only to show the most common third-party APIs and technologies used in our products.

Let's start with VMware

First on the list will be vSphere API. Used for authentication, reading the hierarchy, creating and deleting snapshots, requesting information about machines, and much (very much) more. The functionality of the solution is very wide, so I can recommend the VMware vSphere API Reference for the version 5.5 и 6.0. For more current versions, everything is just googled.

VIX API. The black magic of the hypervisor, for which there is a separate error list. VMware API for working with files on the host without connecting to them over the network. A last resort option when you need to put a file in a machine to which there is no better communication channel. It is pain and suffering if the file is large and the host is loaded. But here the rule works that even 56,6 Kb / s is better than 0 Kb / s. In Hyper-V, this thing is called PowerShell Direct. But that was only before

vSpehere Web Services API Starting from vSphere 6.0 (approximately, since this API was first introduced on version 5.5) it is used to work with guest machines and has supplanted VIX almost everywhere. In fact, this is another API for managing vSphere. For those who are interested, I recommend to study a great manual. 

VDDK (Virtual Disk Development Kit). The library, which was partially discussed in this article. Used to read virtual disks. Once upon a time it was part of the VIX, but over time it was moved into a separate product. But as an heir, it uses the same error codes as VIX. But for some reason, there is no description of these errors in the SDK itself. Therefore, it was found out empirically that VDDK errors with other codes are just a translation from binary to decimal code. It consists of two parts - the first half is undocumented information about the context, and the second part is the traditional VIX / VDDK errors. For example, if we see:

VDDK error: 21036749815809.Unknown error

Then we boldly convert this to hex and get 132200000001. We simply discard the uninformative beginning of 132200, and the remainder will be our error code (VDDK 1: Unknown error). About the most frequent VDDK errors, there was just recently a separate article.

Now let's look at WIndows.

Here, everything that is most necessary and important for us can be found in the standard Event Viewer. But there is one catch: according to a long tradition, Windows does not log the full text of the error, but only its number. For example, error 5 is “Access denied”, and 1722 is “The RPC server is unavailable”, and 10060 is “Connection timed out”. Of course, it's great if you remember the most famous ones, but what about hitherto unseen ones? 

And so that life does not seem like honey at all, errors are also stored in hexadecimal form, with the prefix 0x8007. For example, 0x8007000e is actually 14, Out of Memory. Why and for whom this was done is a mystery shrouded in darkness. However, a complete list of errors can be downloaded for free and without SMS from devcenter.

By the way, sometimes there are other prefixes, not just 0x8007. In such a sad situation, in order to understand HRESULT (“result handle”), you need to delve even deeper into documentation for developers. In ordinary life, I don’t advise you to do this, but if you suddenly pressed against the wall or are just curious, now you know what to do.

But the comrades at Microsoft took pity on us a little and showed the world a utility ERR. This is a small piece of console happiness that can translate error codes into human without using Google. It works like this.

C:UsersrootDesktop>err.exe 0x54f
# for hex 0x54f / decimal 1359
  ERROR_INTERNAL_ERROR                                           winerror.h
# An internal error occurred.
# as an HRESULT: Severity: SUCCESS (0), FACILITY_NULL (0x0), Code 0x54f
# for hex 0x54f / decimal 1359
  ERROR_INTERNAL_ERROR                                           winerror.h
# An internal error occurred.
# 2 matches found for "0x54f"

A legitimate question arises: why do we not immediately write the decryption to the logs, but leave these mysterious codes? The answer is in third-party applications. When you pull some WinAPI call yourself, it is not difficult to decipher its response, because there is even a special WinAPI call for this. But as already mentioned, everything that only comes to us in responses gets into our logs. And here, in order to decrypt it, one would have to constantly monitor this stream of consciousness, pull out pieces with Windows errors from it, decrypt them and paste them back. Let's be honest, not the most exciting activity.

Windows File Management API used in every possible way when working with files. Creating files, deleting, opening for writing, working with attributes, and so on and so forth.

Mentioned above PowerShell Direct as an analogue of the VIX API in the Hyper-V world. Unfortunately, not so flexible: a lot of restrictions on functionality, it does not work with every version of the host and not with all guests.

RPC (Remote Procedure Call) I don't think there is a single person who has worked with WIndows who hasn't seen RPC-related errors. Despite the popular misconception, this is not a single protocol, but any client-server protocol that satisfies a number of parameters. However, if there is an RPC error in our logs, 90% of the time it will be an error from Microsoft RPC, which is part of DCOM (Distributed Component Object Model). You can find a huge amount of documentation on this topic on the net, but the lion's share of it is quite outdated. But if there is an acute desire to study the topic, then I can recommend articles What is RPC?, How RPC Works and long list RPC errors.

The main causes of RPC errors in our logs are failed attempts to communicate between VBR components (server > proxy, for example) and most often due to communication problems.

The top top among all tops is the error The RPC server is unavailable (1722). In simple terms, the client could not establish a connection with the server. How and why - there is no single answer, but usually it is a problem with authentication or with network access to port 135. The latter is typical for infrastructures with dynamic port assignment. On this topic, there is even separate HF. And Microsoft has voluminous guide to find the cause of the failure.

Second most popular error: There are no more endpoints available from the endpoint mapper (1753). The RPC client or server failed to assign itself a port. Usually occurs when the server (in our case, the guest machine) has been configured to dynamically allocate ports from a narrow range that has ended. And if you log in from the client side (in our case, the VBR server), this means that our VeeamVssAgent either did not start or was not registered as an RPC interface. There is also on this topic separate HF.

Well, to complete the Top 3 RPC errors, let's remember RPC function call failed (1726). Appears if the connection has been established, but RPC requests are not processed. For example, we request information about the status of VSS (suddenly right now a shadow mine is being made there, and we are trying to climb), and in response to us, silence and ignore.

Windows Tape Backup API needed to work with tape libraries or drives. As I mentioned at the beginning: we have no pleasure in writing our own drivers and then suffering with the support of each device. Therefore, vim does not have any drivers of its own. All through a standard API, the support of which is implemented by the hardware vendors themselves. So much more logical, right?

SMB / CIFS Out of habit, everyone writes them side by side, although not everyone remembers that CIFS (Common Internet File System) is just a private version of SMB (Server Message Block). So there is nothing wrong with generalizing these concepts. Samba is already a LinuxUnix implementation, and it has its own peculiarities, but I digress. What is important here: when Veeam asks to write something to the UNC path (serverdirectory), the server uses the hierarchy of file system drivers, including mup and mrxsmb, to write to the ball. Accordingly, these drivers will also generate errors.

Can't do without Winsock API. If something needs to be done over the network, VBR works through the Windows Socket API, popularly known as Winsock. So if we see a bunch of IP:Port in the log, this is it. The official documentation has a good list of possible errors.

Mentioned above WMI (Windows Management Instrumentation) is a kind of almighty API for managing everything and everyone in the Windows world. For example, when working with Hyper-V, almost all requests to the host go through it. In a word, the thing is absolutely irreplaceable and very powerful in its capabilities. In an attempt to help find out where and what is broken, the built-in WBEMtest.exe tool helps a lot.

And last on the list, but by no means least in importance - VSS (Volume Shadow Storage). The topic is as inexhaustible and mysterious as much documentation has been written on it. Shadow Copy is most simply understood as a special type of snapshot, which in essence it is. Thanks to him, you can make application-consistent backups in VMware, and almost everything in Hyper-V. I have plans to make a separate article with some squeeze on VSS, but for now you can try to read this description. Just be careful, because. trying to understand VSS in a flash can lead to brain injury.

On this, perhaps, we can stop. I consider the task of explaining the most basic things completed, so in the next chapter we will already look at the logs. But if you have any questions, feel free to ask them in the comments.

Source: habr.com

Add a comment