We continue our immersion into the fascinating world of guessing ... troubleshooting by logs. IN
What do you think these "logs" are? According to most, the logs of any application should be assigned the role of a kind of omnipotent entity that most of the time vegetates somewhere in the backyard, but at the right moment appears out of nowhere in shining armor and saves everyone. That is, they should contain everything, from the slightest errors in each component to individual database transactions. And so that after the error it was immediately written how else to fix it. And all this should fit in a couple of megabytes, no more. It's just text! Text files cannot take tens of gigabytes, I heard it somewhere!
So the logs
In the real world, logs are just an archive of diagnostic information. And what to store there, where to get information for storage and how detailed it should be, is up to the developers themselves to decide. Someone follows the path of minimalism by keeping records of the ON / OFF level, and someone diligently rakes everything they can reach. Although there is also an intermediate option with the ability to select the so-called Logging Level, when you yourself indicate how detailed information you want to store and how much extra disk space you have =) VBR has six such levels, by the way. And, believe me, you do not want to see what happens with the most detailed logging with free space on your disk.
Fine. We roughly understood what we want to save, but a legitimate question arises: where to get this information from? Of course, we form part of the events for logging ourselves by our internal processes. But what to do when there is an interaction with the external environment? In order not to slide into a pitch hell of crutches and bicycles, Veeam tends not to invent inventions that have already been invented. Whenever there is a ready-made API, built-in function, library, etc., we will give preference to ready-made options before starting to fence our contraptions. Although the latter is also enough. Therefore, when analyzing logs, it is important to understand that the lion's share of errors falls on messages from third-party APIs, system calls, and other libraries. In this case, the role of VBR comes down to forwarding these errors to the as is log files. And the main task of the user is to learn to understand which line is from whom, and what this “who” is responsible for. So if an error code from the VBR log takes you to an MSDN page, that's fine and correct.
As we agreed earlier: Veeam is a so-called SQL-based application. This means that all settings, all information and in general everything that is only necessary for normal functioning - everything is stored in its database. Hence the simple truth: what is not in the logs is most likely in the database. But this is not a silver bullet either: some things are not in the local logs of Veeam components, nor in its database. Therefore, you need to learn how to study the host logs, the logs of the local machine and the logs of everything that is involved in the backup and restore process. And it also happens that the necessary information is not available anywhere at all. That is the way.
Some examples of such APIs
This list does not aim to be exceptionally complete, so there is no need to look for the ultimate truth in it. Its purpose is only to show the most common third-party APIs and technologies used in our products.
Let's start with VMware.
First on the list will be vSphere API. Used for authentication, reading the hierarchy, creating and deleting snapshots, requesting information about machines, and much (very much) more. The functionality of the solution is very wide, so I can recommend the VMware vSphere API Reference for the version
VIX API. The black magic of the hypervisor, for which there is a separate
vSpehere Web Services API Starting from vSphere 6.0 (approximately, since this API was first introduced on version 5.5) it is used to work with guest machines and has supplanted VIX almost everywhere. In fact, this is another API for managing vSphere. For those who are interested, I recommend to study
VDDK (Virtual Disk Development Kit). The library, which was partially discussed in this
VDDK error: 21036749815809.Unknown error
Then we boldly convert this to hex and get 132200000001. We simply discard the uninformative beginning of 132200, and the remainder will be our error code (VDDK 1: Unknown error). About the most frequent VDDK errors, there was just recently a separate
Now let's look at WIndows.
Here, everything that is most necessary and important for us can be found in the standard Event Viewer. But there is one catch: according to a long tradition, Windows does not log the full text of the error, but only its number. For example, error 5 is “Access denied”, and 1722 is “The RPC server is unavailable”, and 10060 is “Connection timed out”. Of course, it's great if you remember the most famous ones, but what about hitherto unseen ones?
And so that life does not seem like honey at all, errors are also stored in hexadecimal form, with the prefix 0x8007. For example, 0x8007000e is actually 14, Out of Memory. Why and for whom this was done is a mystery shrouded in darkness. However, a complete list of errors can be downloaded for free and without SMS from
By the way, sometimes there are other prefixes, not just 0x8007. In such a sad situation, in order to understand HRESULT (“result handle”), you need to delve even deeper into
But the comrades at Microsoft took pity on us a little and showed the world a utility
C:UsersrootDesktop>err.exe 0x54f
# for hex 0x54f / decimal 1359
ERROR_INTERNAL_ERROR winerror.h
# An internal error occurred.
# as an HRESULT: Severity: SUCCESS (0), FACILITY_NULL (0x0), Code 0x54f
# for hex 0x54f / decimal 1359
ERROR_INTERNAL_ERROR winerror.h
# An internal error occurred.
# 2 matches found for "0x54f"
A legitimate question arises: why do we not immediately write the decryption to the logs, but leave these mysterious codes? The answer is in third-party applications. When you pull some WinAPI call yourself, it is not difficult to decipher its response, because there is even a special WinAPI call for this. But as already mentioned, everything that only comes to us in responses gets into our logs. And here, in order to decrypt it, one would have to constantly monitor this stream of consciousness, pull out pieces with Windows errors from it, decrypt them and paste them back. Let's be honest, not the most exciting activity.
Windows File Management API used in every possible way when working with files. Creating files, deleting, opening for writing, working with attributes, and so on and so forth.
Mentioned above PowerShell Direct as an analogue of the VIX API in the Hyper-V world. Unfortunately, not so flexible: a lot of restrictions on functionality, it does not work with every version of the host and not with all guests.
RPC (Remote Procedure Call) I don't think there is a single person who has worked with WIndows who hasn't seen RPC-related errors. Despite the popular misconception, this is not a single protocol, but any client-server protocol that satisfies a number of parameters. However, if there is an RPC error in our logs, 90% of the time it will be an error from Microsoft RPC, which is part of DCOM (Distributed Component Object Model). You can find a huge amount of documentation on this topic on the net, but the lion's share of it is quite outdated. But if there is an acute desire to study the topic, then I can recommend articles
The main causes of RPC errors in our logs are failed attempts to communicate between VBR components (server > proxy, for example) and most often due to communication problems.
The top top among all tops is the error The RPC server is unavailable (1722). In simple terms, the client could not establish a connection with the server. How and why - there is no single answer, but usually it is a problem with authentication or with network access to port 135. The latter is typical for infrastructures with dynamic port assignment. On this topic, there is even
Second most popular error: There are no more endpoints available from the endpoint mapper (1753). The RPC client or server failed to assign itself a port. Usually occurs when the server (in our case, the guest machine) has been configured to dynamically allocate ports from a narrow range that has ended. And if you log in from the client side (in our case, the VBR server), this means that our VeeamVssAgent either did not start or was not registered as an RPC interface. There is also on this topic
Well, to complete the Top 3 RPC errors, let's remember RPC function call failed (1726). Appears if the connection has been established, but RPC requests are not processed. For example, we request information about the status of VSS (suddenly right now a shadow mine is being made there, and we are trying to climb), and in response to us, silence and ignore.
Windows Tape Backup API needed to work with tape libraries or drives. As I mentioned at the beginning: we have no pleasure in writing our own drivers and then suffering with the support of each device. Therefore, vim does not have any drivers of its own. All through a standard API, the support of which is implemented by the hardware vendors themselves. So much more logical, right?
SMB / CIFS Out of habit, everyone writes them side by side, although not everyone remembers that CIFS (Common Internet File System) is just a private version of SMB (Server Message Block). So there is nothing wrong with generalizing these concepts. Samba is already a LinuxUnix implementation, and it has its own peculiarities, but I digress. What is important here: when Veeam asks to write something to the UNC path (serverdirectory), the server uses the hierarchy of file system drivers, including mup and mrxsmb, to write to the ball. Accordingly, these drivers will also generate errors.
Can't do without Winsock API. If something needs to be done over the network, VBR works through the Windows Socket API, popularly known as Winsock. So if we see a bunch of IP:Port in the log, this is it. The official documentation has a good list of possible
Mentioned above WMI (Windows Management Instrumentation) is a kind of almighty API for managing everything and everyone in the Windows world. For example, when working with Hyper-V, almost all requests to the host go through it. In a word, the thing is absolutely irreplaceable and very powerful in its capabilities. In an attempt to help find out where and what is broken, the built-in WBEMtest.exe tool helps a lot.
And last on the list, but by no means least in importance - VSS (Volume Shadow Storage). The topic is as inexhaustible and mysterious as much documentation has been written on it. Shadow Copy is most simply understood as a special type of snapshot, which in essence it is. Thanks to him, you can make application-consistent backups in VMware, and almost everything in Hyper-V. I have plans to make a separate article with some squeeze on VSS, but for now you can try to read
On this, perhaps, we can stop. I consider the task of explaining the most basic things completed, so in the next chapter we will already look at the logs. But if you have any questions, feel free to ask them in the comments.
Source: habr.com