A detailed response to the comment, as well as a little about the life of providers in the Russian Federation

What moved me to write this post is the following comment.

I quote it here:

kaleman, today at 18:53

My provider made my day today. Along with an update to its site-blocking system, it managed to get the mail.ru mail service banned. I've been pestering technical support since morning, and they can't do anything. The provider is small, and apparently the blocking happens at the upstream providers. I've also noticed that all sites open more slowly - maybe they hung some crooked DLP in the path? Access used to work without problems. The destruction of the Runet is happening right before my eyes…

The thing is, it seems we are that very provider 🙁

And indeed, kaleman almost guessed the cause of the problems with mail.ru (although for a long time we refused to believe it could be that).

The following will be divided into two parts:

  1. the causes of our current problems with mail.ru, and the exciting quest to find them
  2. the life of an ISP in today's realities, and the stability of the sovereign Runet.

Problems with the availability of mail.ru

Oh, it's quite a long story.

The fact is that, in order to implement the state's requirements (more on that in the second part), we purchased, configured, and installed some equipment - both for filtering prohibited resources and for NAT-ing subscriber traffic.

Some time ago, we finally rebuilt the network core in such a way that all subscriber traffic passed through this equipment in exactly the right direction.

A few days ago we turned on filtering of prohibited resources on it (while leaving the old system running) - everything seemed to go fine.

Then we gradually began enabling NAT on this equipment for different groups of subscribers. At first glance, everything seemed to be going well.

But today, after enabling NAT on this equipment for the next group of subscribers, we were greeted in the morning by a fair number of complaints that mail.ru and other Mail Ru Group resources were unavailable or only partially available.

We started checking: something, somewhere, occasionally sends a TCP RST in response to requests - and exclusively towards mail.ru networks. Moreover, it sends an incorrectly formed TCP RST (without the ACK flag), an obviously forged one. It looked like this:

[Screenshots of the packet captures showing the forged TCP RSTs]
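For the packet-level curious, here is a minimal sketch of the kind of check one can run on a test machine. The interface name and the mail.ru prefix below are illustrative assumptions, not our actual setup:

```python
# Minimal sketch: log TCP RSTs that arrive without the ACK flag set - the
# telltale sign of the injected resets described above.
# "eth0" and the 94.100.180.0/24 prefix are illustrative, not our real config.
from scapy.all import sniff, IP, TCP

def report(pkt):
    flags = pkt[TCP].flags
    if "R" in flags and "A" not in flags:
        print(f"bare RST  {pkt[IP].src}:{pkt[TCP].sport} -> "
              f"{pkt[IP].dst}:{pkt[TCP].dport}  seq={pkt[TCP].seq}")

sniff(iface="eth0",
      filter="tcp[tcpflags] & tcp-rst != 0 and src net 94.100.180.0/24",
      prn=report, store=False)
```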

Naturally, our first thoughts were about the new equipment: a scary DPI box, no trust in it, you never know what it might do - after all, TCP RST is a fairly common tool among blocking systems.

We also considered kaleman's assumption that someone "upstream" was doing the filtering - but discarded it immediately.

Firstly, we have enough sane uplinks not to suffer like this 🙂

Secondly, we are connected to several IXs in Moscow, and traffic to mail.ru goes through them - and they have neither obligations nor any other motive to filter traffic.

The next half of the day was spent on what is usually called shamanism - together with the equipment vendor, whom we thank for not giving up 🙂

  • filtering was completely disabled
  • NAT under the new scheme was disabled
  • the test PC was placed in a separate, isolated pool
  • the IP addressing was changed

In the afternoon, we set up a virtual machine that went online just like a regular subscriber, and gave the vendor's representatives access to it and to the equipment. The shamanism continued 🙂

In the end, the vendor's representative stated with confidence that the box had absolutely nothing to do with it: the RSTs were coming from somewhere upstream.

Note: at this point someone may ask: wouldn't it have been much easier to take a dump not from a test PC, but directly on the backbone above the DPI?

No, unfortunately: dumping (or even just mirroring) 40+ Gbps is not at all trivial.

After that, already in the evening, there was nothing left to do but return to the assumption of strange filtering somewhere upstream.

I checked through which IX the traffic to the Mail Ru Group networks was currently going and simply shut down the BGP sessions towards it. And - lo and behold! - everything immediately returned to normal.
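(For completeness: the check itself is mundane. In reality you just look at the BGP table on the border routers, but the idea can be sketched from any host; the hostnames below are merely examples of Mail Ru Group services.)

```python
# Sketch: resolve a couple of Mail Ru Group hostnames and trace the route to
# them, to see which exchange the traffic currently leaves through.
import socket
import subprocess

for name in ("mail.ru", "e.mail.ru"):
    for addr in socket.gethostbyname_ex(name)[2]:
        print(f"--- {name} ({addr}) ---")
        subprocess.run(["traceroute", "-n", addr], check=False)
```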

On the one hand, it's a pity that the whole day was spent searching for the problem, when it was fixed in five minutes.

On the other hand:

- In my memory, this is unprecedented. As I wrote above, an IX really has no reason to filter transit traffic: it usually carries hundreds of gigabits or even terabits per second. Until recently, I simply could not seriously imagine such a thing.

- an incredibly "fortunate" coincidence: new, complex hardware that we don't particularly trust and don't quite know what to expect from - built precisely for blocking resources, TCP RSTs included

The NOC of this internet exchange is currently looking into the problem. According to them (and I believe them), they have no specially deployed filtering system. But, thank heavens, the rest of this quest is no longer our problem 🙂

That was a small attempt at an excuse - please understand and forgive 🙂

P.S. I deliberately do not name either the DPI/NAT vendor or the IX (I don't actually have any particular complaints about them; the main thing is to understand what happened).

Today's (as well as yesterday's and the day before yesterday's) reality from the point of view of an Internet provider

I've spent the last few weeks heavily rebuilding the network core, performing a lot of manipulations on live infrastructure, at the risk of significantly impacting real user traffic. Considering the goals, results, and consequences of all this, it is morally rather hard. Especially while listening, yet again, to high-minded speeches about protecting the stability of the Runet, sovereignty, and so on.

In this section, I will try to describe the "evolution" of a typical ISP's network core over the past ten years.

Ten years ago.

In those blessed times, a provider's network core could be dead simple and reliable:

[Diagram: the simplified network core of ten years ago]

This very, very simplified picture has no trunks, rings, or IP/MPLS routing.

Its essence is that user traffic eventually reached the core switching layer, from which it went to a BNG, from which it usually returned to the core switching and then went "to the exit" - through one or more border gateways to the Internet.

Such a scheme is very, very easy to make redundant, both at L3 (dynamic routing) and at L2 (MPLS).

You can deploy N+1 of anything - access servers, switches, borders - and one way or another back them up for automatic failover.

A few years later it became clear to everyone in Russia that it was impossible to live like this any longer: children urgently needed to be protected from the pernicious influence of the Internet.

There was an urgent need to find ways to filter user traffic.

There are different approaches here.

In the not-so-good case, something is placed inline, in the path between user traffic and the Internet. The traffic passing through this "something" is analyzed and, for example, a fake packet with a redirect is sent to the subscriber.

In a slightly better case - if traffic volumes allow - you can pull off a little trick: send to the filter only the users' outgoing traffic destined for the addresses that actually need to be filtered (to get those, you can either take the IP addresses listed in the registry, or additionally resolve the domains present in the registry yourself).
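A minimal sketch of that trick, just to show its shape - the input file here is a hypothetical plain-text export of the registry (one domain or IP address per line), not the real dump format:

```python
# Build the set of destination addresses whose outgoing traffic gets steered
# to the filtering box. "registry_export.txt" is a hypothetical input file.
import ipaddress
import socket

def build_filter_targets(path="registry_export.txt"):
    targets = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = line.strip()
            if not entry:
                continue
            try:
                # entries that are already IP addresses go in as-is
                targets.add(str(ipaddress.ip_address(entry)))
            except ValueError:
                # domains are additionally resolved by us - the registry's own
                # IP list tends to lag behind reality
                try:
                    targets.update(socket.gethostbyname_ex(entry)[2])
                except OSError:
                    pass
    return targets

# These addresses are then injected into routing (e.g. as /32s via BGP) so that
# only traffic destined for them passes through the filter.
```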

At one time I wrote a simple mini-DPI for these purposes - though I hardly dare call it that. It is very simple and not very performant; nevertheless, it allowed us and dozens (if not hundreds) of other providers to avoid immediately spending millions on industrial DPI systems, and it bought several extra years of time.
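I won't reproduce that tool here, but its core idea boils down to roughly the following (an illustrative sketch, not the actual code; the blocked domain and block-page URL are made up):

```python
# Illustrative sketch: inspect the Host header of an HTTP request that was
# steered to the filter; if the host is blocked, answer with a forged redirect
# to the block page, otherwise leave the traffic alone.
BLOCKLIST = {"blocked.example"}           # hypothetical blocked domain
BLOCK_PAGE = "http://block.example/info"  # hypothetical block-page URL

def fake_redirect_if_blocked(raw_request: bytes):
    """Return a forged 302 response if the request targets a blocked host."""
    host = None
    for line in raw_request.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            value = line.split(b":", 1)[1].strip()
            host = value.split(b":")[0].decode("ascii", "ignore").lower()
            break
    if host in BLOCKLIST:
        return ("HTTP/1.1 302 Found\r\n"
                f"Location: {BLOCK_PAGE}\r\n"
                "Connection: close\r\n\r\n").encode()
    return None  # not blocked: the request passes through untouched
```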

By the way, about DPI then and now: many of those who bought the DPI systems available on the market at the time have since thrown them out. They simply weren't built for this scale: hundreds of thousands of addresses, tens of thousands of URLs.

At the same time, domestic manufacturers have grown very strong on the back of this market. I'm not talking about the hardware component - there everything is clear to everyone - but the software, the main part of any DPI, is today perhaps, if not the most advanced in the world, then certainly a) developing by leaps and bounds, and b) priced, per box, at a level simply incomparable with foreign competitors.

I'd like to be proud, but it's a bit sad =)

Now everything looked like this:

[Diagram: the network core with filtering added]

A couple more years later, everyone already had the auditors (the regulator's monitoring probes) installed, and there were more and more resources in the registry. For some older equipment (for example, the Cisco 7600), the "side filtering" scheme simply became unusable: the number of routes on the 7600 platform is limited to somewhere around nine hundred thousand, while the number of IPv4 routes alone today is already approaching 800 thousand. Add IPv6... and also... how many is it now? 900,000 individual addresses on the RKN ban list? =)

Some switched to a scheme where all backbone traffic is mirrored to a filtering server, which has to analyze the entire stream and, if it finds something bad, send a RST in both directions (to the sender and to the recipient).
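Very roughly, such a mirror-side blocker does something along these lines once it decides a mirrored flow matches the blocklist (a scapy sketch purely to illustrate the mechanism; it is not tied to any particular product):

```python
# Forge a RST towards each end of the connection, based on a mirrored packet.
# Nothing here sits inline with the real traffic - that is the point of the scheme.
from scapy.all import IP, TCP, send

def tear_down(mirrored_pkt):
    ip, tcp = mirrored_pkt[IP], mirrored_pkt[TCP]
    payload_len = len(bytes(tcp.payload))
    # RST towards the server, spoofed as coming from the client
    send(IP(src=ip.src, dst=ip.dst) /
         TCP(sport=tcp.sport, dport=tcp.dport, flags="R",
             seq=tcp.seq + payload_len),
         verbose=False)
    # RST towards the client, spoofed as coming from the server
    send(IP(src=ip.dst, dst=ip.src) /
         TCP(sport=tcp.dport, dport=tcp.sport, flags="R", seq=tcp.ack),
         verbose=False)
```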

However, the more traffic there is, the less workable this scheme becomes. At the slightest processing delay, the mirrored traffic simply flies by unnoticed, and the provider gets written up and fined.

More and more providers are forced to put DPI systems of varying degrees of reliability inline on their backbones.

A year or two ago: according to rumors, nearly everywhere the FSB began to require the actual installation of SORM equipment (previously, most providers got by with approving a SORM plan with the authorities - a plan of operational measures in case something needs to be found somewhere).

Besides money (not utterly astronomical, but still millions), SORM required many providers to perform yet another round of network manipulation:

  • SORM needs to see users' "gray" addresses, before NAT translation
  • SORM has a limited number of network interfaces

Therefore we, in particular, had to rebuild a large chunk of the core - just to gather the user traffic heading to the access servers in one place, so that it could be mirrored into SORM over a handful of links.

That is, very simplified, it was (on the left) vs became (on the right):

[Diagram: the core before (left) vs after (right) the SORM rebuild]

Most providers are now also required to implement SORM-3 - which includes, among other things, logging of NAT translations.

For these purposes, we had to add separate NAT equipment to the diagram above (exactly the equipment discussed in the first part). Moreover, it had to be added in a particular order: since SORM must "see" traffic before address translation, the traffic must flow strictly as follows: users -> core switching -> access servers -> SORM -> NAT -> core switching -> Internet. To do this we literally had to "turn" the traffic flows around, which was also quite difficult.

To sum up: over a decade, the core design of an average provider has become far more complex, and the number of additional points of failure (both in the form of equipment and in the form of single switching links) has grown considerably. In fact, the very requirement to "see everything" implies funneling this "everything" through a single point.

It seems to me that this can be quite transparently extrapolated to the current initiatives for the sovereignization of the Runet, its protection, stabilization and improvement 🙂

And the Yarovaya law is still ahead.

Source: habr.com
