We make support cheaper, trying not to lose quality

Fallback mode (also called IPKVM), which lets you connect to a VPS without RDP directly from the hypervisor layer, saves 15-20 minutes per week.

First and foremost: don't piss people off. All over the world, support is divided into lines, and a first-line employee should try the typical solutions first. If the task goes beyond their competence, it is escalated to the second line. Among VDS clients, who are usually administrators themselves, there are quite a few people who know how to think, unlike the customers of many other support desks. Well, at least it happens much more often. They structure a ticket well and describe everything needed right away. If the first line has grown jaded and accidentally replies "try turning it off and on again", that's a fiasco.

The task is very simple: make support for our VDS hosting adequate at minimal cost. We are the fast food of the hosting world: no special pampering, low prices, decent quality. Earlier we told the story of how communicating "as one admin to another" stopped working once Instagram personalities trying to automate their account maintenance, small business owners with remote accounting, and other people not very versed in technology arrived. We had to change the language of communication.

Now I'll tell you a bit more about the processes, and about the inevitable screw-ups along the way.

Don't piss people off #1

Any support desk is an assembly line. A request comes in, and the first-line employee immediately tries to recognize a typical situation that has already happened a thousand times and will happen a thousand more. There is a 90% chance the request is typical and can be answered with a couple of clicks that insert a template. Usually you only need to add a couple of words to the template and you're done. Or you go to the management interface and click a couple of buttons there. In more complex cases (transfers between zones, for example), you follow a step-by-step algorithm.
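The template step described above amounts to a lookup plus placeholder substitution. The sketch below is a minimal illustration using Python's standard `string.Template`. The template keys (`password_reset`, `disk_resize`), their wording, and the `render_reply` helper are all hypothetical and do not reflect the host's actual support tooling.

```python
from string import Template

# Hypothetical canned replies keyed by a "typical situation" label.
# Labels and wording are illustrative only.
TEMPLATES = {
    "password_reset": Template(
        "Hello, $name! To reset the password for $vm, open the personal "
        "account and use the reset button on the server page."
    ),
    "disk_resize": Template(
        "Hello, $name! The disk of $vm can be resized to $size GB from "
        "the personal account; no ticket is needed next time."
    ),
}


def render_reply(kind: str, **fields: str) -> str:
    """Fill a canned reply.

    Raises KeyError for an unknown template kind or a missing placeholder,
    so an operator cannot send a half-filled template by accident.
    """
    return TEMPLATES[kind].substitute(**fields)


print(render_reply("disk_resize", name="Anna", vm="vds-42", size="40"))
```

The strict `substitute` (rather than `safe_substitute`) is a deliberate choice in this sketch: a missing field fails loudly instead of producing a reply with a stray `$vm` in it.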

What infuriates people most, regardless of the support desk's other qualities, is a typical response to an atypical request. A ticket arrives with everything described in detail and a pile of data covering three questions ahead, and the client expects a dialogue... And after the first few words, the support employee on autopilot hits the key combination that inserts the template "try restarting, it should help".

This is what drives people up the wall, and most negative reviews and angry comments appear after exactly such situations. We know this because we have made this mistake ourselves; that is where our statistics come from. We have been wrong in various ways, but these cases are always outrageous, including to us. Of course, we would like this never to happen. In practice that is not quite possible: once every few weeks, an employee worn down by the monotony will, every now and then, hit those funny buttons.

Don't piss people off #2

The second thing that drives people just as mad is when nobody answers a ticket for a long time. In Europe this is normal support behavior: three days before an incident is accepted for work is well within the norm. Even if your case is very urgent and something is on fire, there are no social networks, no phone, and no instant messengers, only email and waiting your turn. In Russia this is much less common, but some tickets still get "forgotten". At the very start we set an SLA of 15 minutes for the first response, honestly 24/7. Of course, once a VDS host grows large, it gets such an SLA. But dubious providers don't have one. And at the start we were exactly that kind of dubious provider, and only later became more or less large. Okay, more or less medium-sized.

The first line consists of operators who were given scripts and trained to respond to typical situations. They sort problems quickly and, within 15 minutes, either respond with a typical action or report that the ticket is in progress and pass it to the second line.
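A first-response SLA like this 15-minute one boils down to a simple check over open tickets. Below is a minimal sketch that assumes a hypothetical `Ticket` record with creation and first-response timestamps. Only the 15-minute figure comes from the article; the field and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

FIRST_RESPONSE_SLA = timedelta(minutes=15)  # the SLA stated in the article


@dataclass
class Ticket:
    ticket_id: int
    created_at: datetime
    first_response_at: Optional[datetime] = None  # None = nobody has answered yet


def sla_breached(ticket: Ticket, now: datetime) -> bool:
    """True if the first response came (or, if still missing, is already) later than the SLA."""
    responded = ticket.first_response_at or now
    return responded - ticket.created_at > FIRST_RESPONSE_SLA


def overdue_unanswered(tickets: List[Ticket], now: datetime) -> List[int]:
    """IDs of tickets that still have no response and are past the SLA: candidates for an alert."""
    return [
        t.ticket_id
        for t in tickets
        if t.first_response_at is None and sla_breached(t, now)
    ]


now = datetime(2024, 1, 1, 12, 0)
tickets = [
    Ticket(1, datetime(2024, 1, 1, 11, 50)),                             # 10 min, still in time
    Ticket(2, datetime(2024, 1, 1, 11, 40)),                             # 20 min, overdue
    Ticket(3, datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 10)),  # answered in 10 min
]
print(overdue_unanswered(tickets, now))  # [2]
```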

The second line consists of the hosting administrators, who can do almost anything by hand. There is also a support manager who can do everything and a bit more. The third line is the developers; they get tickets like "fix this in the interface" or "this parameter is counted incorrectly there".

Reduce the number of requests

For obvious reasons, if you want to provide support cheaply, you should not grow the first line so that people with scripts cope faster. Instead, you should grow automation, so that real scripts take the place of people with scripts. That's why one of the first things we did was automate spinning up a virtual machine, resource scaling (including growing and shrinking the disk, but not the CPU frequency), and similar tasks. The more users can do themselves from the interface, the easier life is for the first line, and the smaller it can be. When a user asks for something that is available in the personal account, you should do it and also explain how they can do it themselves.

If you don't need support, it's doing a good job.

The second thing that saves a lot of time is steadily filling the knowledge base. If a user has a problem outside the list of supported actions (most often questions like "how to install a Minecraft server" or "where to set up VPS in Win Server"), a knowledge base article is written. An equally detailed article is written for every unusual request. For example, if a user asks support to disable the built-in Windows Server firewall, we send them to read about what happens if it really is disabled and how to grant permissions only to selected software. The problem is usually that something cannot connect because of the settings, not the firewall itself. But explaining this in a dialogue every time is very hard. And we really don't want to turn the firewall off, because pretty soon we would lose either the virtual machine or the client.

If an article about some application software becomes very popular in the knowledge base, you can add that distribution to the marketplace so that a "spin up a server with this preinstalled" service appears. That is exactly what happened with Docker and with the Minecraft server. Again, one "make it nice for me" button in the interface saves up to a hundred tickets a year.

Emergency mode

After these measures, most of the serious breakdowns that still require manual work come down to the user having, for some reason, lost remote access to the guest OS on the hypervisor. The most common case is simply a misconfigured firewall; the second most common is some bug that prevents Windows from starting normally and forces it to boot into Safe Mode. And in Safe Mode, RDP is unavailable by default.

We built an emergency mode for this case. Normally, to access a VDS you need some kind of remote access client: most often console access, RDP, VNC, or something similar. The drawback of these methods is that they don't work without a running OS. But at the hypervisor level we can also capture the screen image and pass keystrokes to it! True, this is fairly CPU-heavy (because of the actual video streaming), but it gets the job done.

So we made emergency mode available to all users, but the duration of continuous use is limited. Fortunately, practice shows that this time is enough to reboot and fix things.

The result is even fewer support tickets. And where the admin can fix the problem himself, support doesn't need to step in and figure it out.

Remaining problems

Users very often think support is trying to sell them something. Unfortunately, nothing can be done about it (or at least we haven't figured out what). The two most common examples are resource limits and DDoS protection.

Each virtual machine has limits on disk load, memory, and allowed traffic. The possibility of such limits is stated in the public offer (our terms of service), and the limits themselves are chosen so that most users can work quietly without ever knowing they exist. But if you suddenly start hammering the network channel and the disk very hard, the algorithms automatically warn the user. Since April last year we have removed automatic blocking and instead apply soft limits for a variable period.
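The switch from "warn, then block" to "warn, then soft-limit" can be modeled as a small state machine. This is a hypothetical sketch: the `LimitState` fields, the one-hour `BASE_THROTTLE_S`, and the doubling of the period on each repeat are our assumptions about how a "variable period" might work, not the provider's actual algorithm.

```python
from dataclasses import dataclass

BASE_THROTTLE_S = 3600  # hypothetical length of the first throttling period, seconds


@dataclass
class LimitState:
    warned: bool = False          # has the user already been warned?
    strikes: int = 0              # how many times the VM has been throttled
    throttled_until: float = 0.0  # end of the current throttling window (epoch seconds)


def on_usage_sample(state: LimitState, usage: float, limit: float, now: float) -> str:
    """Return the action for one usage sample: 'ok', 'warn' or 'throttle'.

    There is deliberately no 'block' outcome: exceeding the limit leads to a
    temporary cap instead. Simplification: the warning flag never resets.
    """
    if now < state.throttled_until:
        return "throttle"  # still inside an active throttling window
    if usage <= limit:
        return "ok"
    if not state.warned:
        state.warned = True
        return "warn"      # first exceedance: only notify the user
    state.strikes += 1
    # Assumption: the period doubles on each repeat offence.
    state.throttled_until = now + BASE_THROTTLE_S * 2 ** (state.strikes - 1)
    return "throttle"
```

Under these assumptions, a VM that keeps exceeding its limit gets throttled for one hour, then two, then four, instead of being switched off.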

It used to work like this: a warning, then, if the user ignored it, automatic blocking. At that point people got offended: "Why did you do that? Your system is buggy, nothing happened!" From there we could either try to dig into their application software or offer an upgrade to a bigger plan. Investigating how application software works is beyond the scope of support, although we did analyze the first few cases together with users. I especially remember the one where a YouTube view booster had a built-in Trojan, and that Trojan was leaking memory. In the end we concluded that these were not heisenbugs but problems on the users' side; otherwise we would have been flooded with similar requests. But not a single person has ever admitted that they might have exceeded the plan limits themselves.

It's a similar story with DDoS: we write that you, dear user, are under attack, so please turn on protection. And the user replies: "You're attacking me yourselves!" Sure, we DDoS a single user in order to squeeze 300 rubles out of them. A profitable business. Yes, I know that many large hosts in the more expensive category include this protection in their plans, but we can't: fast-food economics dictates different, minimal prices.

Just as often, the people unhappy with support are those whose data we deleted, legitimately, after their paid period ended. If someone doesn't renew their VDS lease, they receive several notifications explaining what will happen next. When the paid period ends, the virtual machine is stopped, but its image is kept. Another notification arrives, and then a couple more. The image is kept for seven more days before being permanently deleted. And there is a category of people who are very unhappy about this, ranging from "the administrator quit, the notifications went to his email, restore it" to accusations of fraud and threats of physical violence. The reason is, again, prices for all other users. If we kept images for a month, we would need more storage, which would mean higher prices for every single client. And fast-food economics... well, you get the idea. As a result, on forums we get reviews along the lines of "they took the money, deleted the data, scammers."
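The retention timeline above can be written as a function of the moment the paid period ends. This is a minimal sketch that assumes the seven days are counted from the end of the paid period (the article does not say exactly). It leaves out the notification schedule, whose timing is not given. `VmState` and `vm_state` are illustrative names.

```python
from datetime import datetime, timedelta
from enum import Enum

IMAGE_RETENTION = timedelta(days=7)  # seven extra days, per the article


class VmState(Enum):
    RUNNING = "running"
    STOPPED_IMAGE_KEPT = "stopped, image kept"
    DELETED = "deleted"


def vm_state(paid_until: datetime, now: datetime) -> VmState:
    """Lifecycle of an unrenewed VDS (assumption: retention counted from paid_until)."""
    if now < paid_until:
        return VmState.RUNNING
    if now < paid_until + IMAGE_RETENTION:
        return VmState.STOPPED_IMAGE_KEPT  # VM stopped, image still recoverable
    return VmState.DELETED                  # image permanently removed
```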

Note that we also have a line of premium plans. There, of course, things are different: we take the client's wishes into account and flexibly configure both the limits and what happens on non-payment (we let the balance go negative rather than block). There this is economically justified, because anything really can happen, and a large regular client is valuable to keep.

Sometimes users are malicious. Several times our system had incidents in which hundreds of virtual machines ended up blocked because of clearly illegitimate actions by clients. In fact, it is precisely because of such situations that we needed our own network drivers: to monitor network activity and make sure a user isn't launching an attack from their server. This kind of monitoring matters so that aggressive users don't violate the boundaries of neighboring virtual machines.

There are also those who simply spam, mine, or otherwise violate the terms of service. Then they knock on support and ask what went wrong and why their machine is blocked. If the process in the screenshot attached to the ticket is called "spammer.exe", something is probably off. About once every two weeks we receive complaints from Sony or Lucasfilm (now Disney) that someone on a virtual machine in our IP address range is distributing a pirated movie. For that we block immediately and refund the money remaining on the account, as the terms of service require (a reminder: we bill per second, so there is always some balance left). And by law, to get a refund you have to show your passport: this is an anti-money-laundering measure. For some reason, instead of showing their passport, the pirates write that we kept their money, neglecting to mention some of the circumstances.
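Per-second billing is why there is almost always a balance to refund: charges accrue in tiny increments, so a prepaid amount is rarely used up exactly. This is a minimal sketch that assumes a 30-day billing month and rounding to kopecks. The function names and the rounding rule are illustrative and not the provider's billing code.

```python
from decimal import Decimal, ROUND_HALF_UP

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day billing month


def charge_for(seconds_used: int, monthly_rate: Decimal) -> Decimal:
    """Cost of `seconds_used` seconds at `monthly_rate`, rounded to kopecks."""
    cost = monthly_rate * seconds_used / SECONDS_PER_MONTH
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def refundable_balance(deposit: Decimal, seconds_used: int, monthly_rate: Decimal) -> Decimal:
    """What is left of the prepaid deposit after per-second charges (never negative)."""
    return max(deposit - charge_for(seconds_used, monthly_rate), Decimal("0"))
```

For example, a 300-ruble deposit on a 300-ruble-per-month plan that was used for 10 days (864,000 seconds) leaves 200.00 rubles to refund.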

Oh, and our best request of the year: "Can I test a virtual machine for a few days on a 30-ruble-per-month plan before buying?"

Conclusion

The first line sorts tickets and responds with typical actions. Most of the dissatisfaction arises here. It can't be fully fixed yet, because the real fix lies in hosting automation, which means a huge backlog. Yes, we have more automation than many on the market, but still not enough. So the best thing we can do is monitor the first line. Service Desk monitoring: implementing first-line KPIs. SLA delays are visible in real time, along with who is messing up and often why. Thanks to these alerts, requests never get lost. Yes, a ticket may still get an off-topic template in reply, but we will find out from the feedback.

If the client really asks, a second-line specialist can log into the server and do what the client needs there. The condition is a confirmation by email in which the client provides the server login credentials.

We do this very rarely and trust this work only to the best people, because we want guarantees that user data won't be damaged. These best people are on the second line of support.

The first line has a knowledge base to which it can refer complex cases.

A feature-rich personal account plus a knowledge base let us reduce the number of requests to 1-1.5 per client per year on average.

The second line usually handles complex requests that require manual work. Typically, the more expensive the plan, the fewer such requests per virtual machine. That is usually because those who can afford an expensive plan either have specialists on staff, or half of the problems simply never arise because the configuration has enough resources for everything. I still remember the hero who installed a far-from-oldest version of Windows Server on a configuration with 256 MB of RAM.

The second line has a set of distributions and a set of automation scripts. Both can be updated as needed.

The second line and the personal managers for VIP plans can add notes to a client's profile. If the client is a Linux admin, we write that down. This serves as a hint for the first line: when such a user asks for something drastic, they know for sure it won't be shooting themselves in the foot but a controlled demolition.

The third line handles the strangest cases. For example, we had a bug where one of the personal account features couldn't be reached in Firefox. The user outright blackmailed us: "If you don't fix it within 12 hours, I'll write about it on every hosting review site." As it turned out, the problem was a custom ad blocker, on the user's side, oddly enough. Complex errors often come without details and can't be reproduced later. Some turn into detective stories with screenshots: "Why have you been fixing it for a month?" "We've been looking for your bug all this time." "Oh, well, I ran into it again today, but couldn't reproduce it again"...

In general, you never know where a screenshot of a support dialogue will end up, and if a person is already contacting support, they have a problem. You can improve their attitude. Or at least try.

Yes, we know our support is not perfect, but we would like to believe it combines enough speed with enough quality, and that it doesn't raise plan prices for those who can do without it.


Source: habr.com
