How a single option in Windows Server slowed down our sites

Many have already heard that Cloud4Y is an enterprise cloud provider. So we won't talk about ourselves. Instead, we'll share a short story about how some sites became slow to open for us and what caused it.

One fine day, the marketing department complained to the engineers that some sites were taking a long time to load in browsers on the terminal server. vk.com in particular is vital for them. We took note and began to figure out what the problem was.

So, the situation: Internet provider MegaFon, server OS Windows, browser Firefox. If you open VKontakte from regular Windows 10, the site loads in 10-100 ms. If you open it from Windows Server 2012/2016/2019, the delay is up to 15 seconds or even more.

We took the VK pixel and used it to work through possible explanations of what was happening.

Hypothesis #1 - The terminal server is at fault.
Not confirmed. When we opened the page through another server on the same network, the problem persisted.

Hypothesis #2 - The problem is in the gateway.
Not confirmed. On local laptops everything opens quickly and easily, yet the problem persists for the terminals (and internal servers). We played around with the ICMP settings on the external and internal interfaces, but it did not help.

The behavior is rather odd.

From a local laptop, the site does not slow down.
From the internal scan machine (the terminal used for scanning), it does not slow down.
But for marketing it does. A mess!

Let's go further.

Hypothesis #3 - A DNS problem.
Not confirmed. We loaded the pixel using a public DNS server (8.8.8.8) and got the same result. The problem is most clearly visible when you request the pixel for the first time, in incognito mode for example.
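One way to cross-check the DNS hypothesis is to time name resolution separately from the connection itself. The following is a minimal sketch, not the procedure we used: the function name `time_phases` is hypothetical, it uses the system resolver rather than 8.8.8.8, and it only tries the first address returned.

```python
import socket
import time


def time_phases(host, port=443, timeout=30.0):
    """Return (dns_seconds, connect_seconds) for one fresh TCP connection."""
    t0 = time.perf_counter()
    # Resolve the name first so that DNS time is measured on its own.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.perf_counter()

    # Sketch simplification: only the first resolved address is tried.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        # connect() returns once the TCP handshake has completed.
        sock.connect(sockaddr)
    t2 = time.perf_counter()

    return t1 - t0, t2 - t1
```

Calling `time_phases("vk.com")` on an affected server and on a desktop machine would show whether the seconds are spent in the lookup or after it. If the lookup is fast in both cases, DNS is unlikely to be the culprit.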

We suspected that the problem depends heavily on the browser. In Firefox the pixel is always slow; in Chrome, only on the first visit. For marketing it is slow all the time and in every browser.

Hypothesis #4 - Something in the OS template.
Not confirmed. We deployed a clean Windows Server 2016 and ran the test from the .0 network. The problem appeared. We moved the machine to the .200 network, and the problem persisted. So the gateway of the .0 network is not to blame. However, laptops on the .200 network do not have this problem, so its gateway is not to blame either.

So the OS template is not the cause. The virtual machine is slow when loading the pixel. But if you bring up a VPN on it (a separate network adapter) and route the traffic through it, everything loads very quickly, as it should. That leaves two possible culprits: the office gateway or the office's Internet provider.

But could MegaFon be deliberately throttling access to the VKontakte pixel? No, that's nonsense. Let's dig some more.

Hypothesis #5 - VMware Tools is to blame.
Not confirmed. No harmful behavior was observed. We tried changing the network adapter settings, which did not help either. We changed the TTL, with no effect. In general, it was not clear what the difference between Windows 10 and Windows Server could be. But there is a difference. Like in the story about the gopher.

We wrestled with this problem for quite a while. Of course, we googled similar cases, but found nothing. So we proceeded without hints, working through every possible explanation. We tested from a laptop running Windows Server 2016 to make sure that virtualization and the like were not to blame for the slow pixel load. We changed every possible setting of the network adapter and the IP stack. We tried a bunch of things. But the problem remained, and marketing was stamping its hooves and demanding that everything be fixed.

After some time, we finally found where the dog was buried. It all came down to a single option:
netsh interface tcp set global ecncapability=disabled

This option is disabled by default on Windows desktop OSes and enabled by default on server ones. As soon as we disabled it on the server, everything opened instantly, just like on the desktop. We reproduced the problem on the provider that supplies Internet to our office (MegaFon), on MegaFon's mobile Internet (tethering from a phone and connecting from Windows Server), and on Yota. We tried it in several districts of Moscow, and the problem was present everywhere. On other operators, access to the site was instant.
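To see which way the option is currently set on a given machine, the output of `netsh interface tcp show global` can be inspected. The sketch below is only an illustration: the function names are hypothetical, and it assumes an English-language Windows where the relevant line is labeled "ECN Capability" (localized systems may print a different label).

```python
import re
import subprocess


def parse_ecn_capability(show_global_output):
    """Return the ECN Capability value in lowercase, or None if not found."""
    match = re.search(
        r"^\s*ECN Capability\s*:\s*(\S+)",
        show_global_output,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1).lower() if match else None


def read_ecn_capability():
    """Run netsh (Windows only) and parse its output."""
    result = subprocess.run(
        ["netsh", "interface", "tcp", "show", "global"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ecn_capability(result.stdout)
```

On a Windows host, `read_ecn_capability()` would be expected to return "enabled" or "disabled". Keeping the parsing in its own function makes it possible to run the check over output collected from several servers.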

Such is the squiggle, as one prominent political figure put it. In principle, the problem is now solved, but we are very curious: did it occur only for us, or is it a large-scale issue that also affects companies in other cities? If this case is not isolated, MegaFon should think about fixing it. After all, the ECN option (ecncapability) is enabled by default on servers, and figuring out what is going on takes a lot of time.

How to check? The same way we did. Open any vk.com page in Firefox, then reload it with Ctrl+F5. If the problem is present, there will be a consistent delay; if not, the site will open instantly.
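For those who prefer a script to a browser, here is a rough sketch of a similar check. It rests on an assumption that we did not verify: that the delay shows up while the connection is being established. It times several fresh connections and compares the median with a threshold. The function names and the 1-second threshold are arbitrary choices for illustration.

```python
import socket
import statistics
import time


def sample_connect_times(host, port=443, attempts=5, timeout=30.0):
    """Open `attempts` fresh TCP connections; return each setup time in seconds."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # A new connection each time: the sample covers name resolution
        # plus a full TCP handshake, with nothing reused between attempts.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append(time.perf_counter() - start)
    return samples


def looks_delayed(samples, threshold_s=1.0):
    """True if the median sample exceeds the (assumed) threshold."""
    return statistics.median(samples) > threshold_s
```

Running `looks_delayed(sample_connect_times("vk.com"))` before and after changing the option on the same server gives a direct comparison. The median is used so that a single slow or fast attempt does not decide the verdict.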

Is there a delay when loading through Windows Server?

  • 4.8% Yes, it takes a long time to load

  • 50.0% No, everything flies

  • 45.2% The problem is not in the settings, but in the marketers

42 users voted. 35 users abstained.

Source: habr.com
