Stop thinking an SLA will save you. It exists to reassure you and create a false sense of security.

An SLA, or "service-level agreement", is an agreement between the customer and the service provider that guarantees what level of service the client will receive. It also stipulates compensation for downtime caused by the supplier, and so on. In essence, an SLA is a credential with which a data center or hosting provider convinces a potential client that they will be well looked after in every possible way. The catch is that you can write anything in an SLA, and the events it covers do not occur very often. An SLA is far from a reliable guide for choosing a data center, and you certainly shouldn't rely on it.

We are all used to signing contracts that impose certain obligations. The SLA is no exception, and it is usually the most out-of-touch document imaginable. Perhaps the only thing more useless is an NDA in a jurisdiction where the concept of "trade secrets" does not really exist. The whole problem is that an SLA does not help the client choose the right supplier in any way; it only throws dust in their eyes.

What do hosters most often write in the public version of the SLA? The first line is usually the "reliability" of the hosting, typically a figure from 98% to 99.999%. In fact, these figures are just a pretty invention of marketers. Once upon a time, when hosting was young and expensive and specialists could only dream of clouds (and of broadband access for everyone), the uptime figure was extremely important. Now that all providers use more or less the same equipment, sit on the same backbone networks, and offer the same service packages, the uptime figure tells you next to nothing.

Is there a “correct” SLA at all?

Of course, ideal SLAs do exist, but all of them are non-standard documents, drafted and agreed individually between the client and the supplier. Moreover, this type of SLA most often covers some kind of contract work rather than services.

What should a good SLA contain? The TL;DR: a good SLA is a document that regulates the relationship between two entities and gives one of the parties (the customer) maximum control over the process. In the real world it works like this: the document describes the overall interaction processes and regulates the relationship between the parties. It sets boundaries and rules, and in itself becomes leverage that both parties can use to the fullest. With a proper SLA, the customer can simply force the contractor to work as agreed, and the contractor can fend off the "wants" of an overly active client that go beyond the contract. It looks like this: “Our SLA says such-and-such, so off you go; we are doing everything as agreed.”

That is, a “correct SLA” = an “adequate contract for the provision of services”, and it gives control over the situation. This is possible only when the parties work “on an equal footing”.

What they write on the site and what awaits in reality are two different things.

In general, everything we discuss below is a typical marketing trick and a test of your attentiveness.

If we take popular domestic hosters, each offer is prettier than the last: 25/8 support, 99.9999999% server uptime, a bunch of their own data centers, at least in Russia. Please remember the point about data centers; we will return to it a little later. In the meantime, let's talk about the ideal fault-tolerance statistics and what a person faces when their server does end up in that "0.0000001% of failures."

With figures of 98% and above, any outage is an event on the verge of statistical error. The equipment and the connection either work or they don't. You can use a hosting provider with a “reliability” figure of 50% (according to its own SLA) for years without a single problem, or “go down” once a month for a couple of days with a provider that declares 99.99%.
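
To see what these percentages actually promise, it helps to convert them into permitted downtime. The sketch below is only an illustration: it assumes a 365-day year and that the figure is measured over the whole year, whereas a real SLA defines its own measurement period. The function name `allowed_downtime_hours` is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime, in hours, that an uptime percentage still permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Typical figures from public SLAs, as quoted in the text.
for pct in (98, 99, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime -> up to {hours:.2f} h "
          f"({hours * 60:.1f} min) of downtime per year")
```

Under these assumptions, 98% still permits about 175 hours of downtime a year, and 99.99% permits a little under an hour. Neither number tells you whether that downtime will actually happen to you.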

When the outage does come (and remember that everyone goes down someday), the client comes up against an internal corporate machine called “support”, and the service contract and the SLA are brought out. What this means:

  • Most likely, you won’t be able to claim anything at all for the first four hours of downtime, although some hosters start recalculating the tariff (paying compensation) from the moment of the outage.
  • If the server is unavailable for longer, you may be able to request a tariff recalculation.
  • And that is provided the problem arose through the fault of the supplier.
  • If the problem was caused by a third party (on a backbone network), then “no one is to blame”, and when it gets solved is a matter of luck.

That said, it is important to understand that you never get access to the engineering team. Most often you are stopped by the first line of support, which corresponds with you while the real engineers try to fix the situation. Familiar scenario?

Here, many rely on the SLA, which, it seems, should protect you from such situations. In fact, companies rarely go beyond the bounds of their own document, or they manage to turn the situation so as to minimize their own costs. The primary task of the SLA is to lull your vigilance and convince you that even in an unforeseen situation “everything will be fine.” The second task is to spell out the main critical points and give the service provider room to maneuver, that is, the ability to attribute the failure to something for which the provider is “not responsible”.

At the same time, large clients do not really care about SLA compensation at all. “SLA compensation” is a refund within the tariff in proportion to the equipment downtime, and it will never cover even 1% of the potential monetary and reputational losses. For such a client it is far more important that the problems are fixed as soon as possible than that there is some kind of “tariff recalculation”.
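
A rough calculation shows why. The sketch below assumes the scheme described above: a refund proportional to downtime, with the first four hours not counted. The monthly fee, the 720-hour month, and the hourly business loss are hypothetical numbers chosen only for illustration, and `sla_refund` is not taken from any real provider's terms.

```python
def sla_refund(monthly_fee: float, downtime_hours: float,
               grace_hours: float = 4.0,
               hours_in_month: float = 720.0) -> float:
    """Refund proportional to downtime, ignoring the first `grace_hours`."""
    # Downtime inside the grace period earns no compensation at all.
    billable = max(0.0, downtime_hours - grace_hours)
    # A refund cannot exceed the fee for the whole month.
    billable = min(billable, hours_in_month)
    return monthly_fee * billable / hours_in_month


# Hypothetical figures, for illustration only.
monthly_fee = 10_000.0     # RUB per month for the server
downtime_hours = 12.0      # length of the outage
loss_per_hour = 50_000.0   # RUB the business loses per hour offline

refund = sla_refund(monthly_fee, downtime_hours)
loss = loss_per_hour * downtime_hours

print(f"Refund: {refund:.2f} RUB")
print(f"Loss:   {loss:.2f} RUB")
print(f"Refund covers {refund / loss:.4%} of the loss")
```

With these made-up numbers the refund is about 111 RUB against a loss of 600,000 RUB, which is far below 1%.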

“Many data centers around the world” is a cause for concern

We put the situation where a service provider has a large number of data centers into a separate category because, in addition to the obvious communication problems described above, non-obvious problems also pop up. For example, your service provider may not have access to "its" data centers.

In our last article we wrote about the types of affiliate programs and mentioned the white label model, the essence of which is reselling someone else's capacity under your own brand. The vast majority of modern hosters that claim to have "their own data centers" in many regions are resellers working under the White Label model. That is, physically they have nothing to do with a given data center in Switzerland, Germany or the Netherlands.

There are some very interesting conflicts here. Your SLA with the service provider is still in force, but the provider cannot substantially influence the situation in the event of an accident. It is itself dependent on its own supplier, the data center from which the rack capacity was bought for resale.

Thus, if what matters to you is not only nice wording about reliability and service in the contract and SLA, but also the service provider's ability to solve problems quickly, you should work directly with the owner of the facilities. In practice, this means dealing directly with the data center.

Why are we not considering cases where many DCs really do belong to one company? Because there are very, very few such companies. One, two, or three small data centers, or one large one, is realistic. But a dozen DCs, half of them in the Russian Federation and the other half in Europe, is almost impossible. This means that there are far more reseller companies than you might imagine. Here is a simple example:

Estimate the number of Google Cloud data centers. There are only six in Europe: in London, Amsterdam, Brussels, Helsinki, Frankfurt and Zurich. That is, at all the major hubs. Because a data center is an expensive, complex and very large project. Now recall the hosting companies from somewhere in Moscow with "a dozen data centers throughout Russia and Europe."

Of course, there are plenty of good suppliers who have partners in the White Label program and provide top-notch services. They let you rent capacity in the EU and the Russian Federation at the same time through the same browser window, accept payment in rubles rather than foreign currency, and so on. But when the cases described in the SLA occur, they become exactly the same hostages of the situation as you are.

This reminds us once again that an SLA is useless if you don't understand how the supplier is organized and what capacity it actually has.

In summary

Server crashes are always unpleasant and can happen to anyone, anywhere. The question is how much control over the situation you want. There are not many direct capacity providers on the market now, and the large players typically own only one DC, somewhere in Moscow, out of the dozen across Europe that they give you access to.

Here, each client must decide for themselves: do I choose comfort right now, or do I spend time and effort searching for a data center in an acceptable location in Russia or Europe where I can place my equipment or buy capacity? In the first case, the standard solutions now on the market will do. In the second, you will have to sweat.

First of all, you need to find out whether the seller of the service is the direct owner of the facilities / data center. Many White Label resellers do their best to disguise their status, so you need to look for indirect signs. For example, “their European DCs” may have specific names and logos that differ from the name of the supplier company. Or the word "partners" may crop up somewhere. Partners = White Label in 95% of cases.

Next, you need to get to know the structure of the company itself, and ideally see the equipment in person. Among data centers, the practice of tours, or at least tour-style articles on their own website or blog, where they present their data center with photos and detailed descriptions, is nothing new (we have written such articles ourselves: one and two).

With many data centers, you can arrange a personal visit to the office and a mini-tour of the DC itself. There you can assess how orderly things are, and you may be able to talk to one of the engineers. Clearly, no one will give you a tour of the facility if you need one server for 300 RUB / month, but if you need serious capacity, the sales department may well accommodate you. We, for example, conduct such tours.

In any case, you should be guided by common sense and the needs of the business. For example, if you need a distributed infrastructure (some servers in the Russian Federation, the rest in the EU), it will be easier and more cost-effective to use hosters that have White Label partnerships with European DCs. If your entire infrastructure is concentrated at one point, that is, in one data center, then it is worth spending some time looking for a supplier.

Because a typical SLA will most likely not help you. But working with the owner of the facilities rather than a reseller will significantly speed up the resolution of any problems.

Source: habr.com
