Let's count the "Revizor" Agents

It is no secret that the automated system "Revizor" monitors how blocking of resources on Russia's list of prohibited information is enforced. How it works is described well in this article on Habr; the picture is from there:

[Picture from the Habr article: how the "Revizor" system works]

The "Agent Revizor" module is installed directly at the provider:

The "Agent Revizor" module is a structural element of the automated system "Revizor" (AS "Revizor"). This system is designed to monitor telecom operators' compliance with access restriction requirements under the provisions established by Articles 15.1-15.4 of the Federal Law of July 27, 2006 No. 149-FZ "On Information, Information Technologies and Information Protection."

The main purpose of AS "Revizor" is to monitor telecom operators' compliance with the requirements established by Articles 15.1-15.4 of the Federal Law of July 27, 2006 No. 149-FZ "On Information, Information Technologies and Information Protection", in terms of identifying cases of access to prohibited information and obtaining supporting materials (data) about violations of the access restrictions.

Given that many providers, if not all, have installed this device on their networks, the result should be a large network of probe beacons like RIPE Atlas, or even larger, but with closed access. However, a beacon is a beacon because it sends signals in all directions, so what if we catch them and see what we caught and how many?

Before counting, let's see why this might even be possible.

Some theory

Agents check the availability of a resource, including through HTTP(S) requests, such as this one:

TCP, 14678  >  80, "[SYN] Seq=0"
TCP, 80  >  14678, "[SYN, ACK] Seq=0 Ack=1"
TCP, 14678  >  80, "[ACK] Seq=1 Ack=1"

HTTP, "GET /somepage HTTP/1.1"
TCP, 80  >  14678, "[ACK] Seq=1 Ack=71"
HTTP, "HTTP/1.1 302 Found"

TCP, 14678  >  80, "[FIN, ACK] Seq=71 Ack=479"
TCP, 80  >  14678, "[FIN, ACK] Seq=479 Ack=72"
TCP, 14678  >  80, "[ACK] Seq=72 Ack=480"

In addition to the payload, the request also includes a connection setup phase, the exchange of SYN and SYN-ACK, and a connection termination phase, FIN-ACK.

The registry of prohibited information contains several types of blocking. Obviously, if a resource is blocked by IP address or domain name, we will not see any requests. These are the most destructive types of blocking: they make all resources on one IP address, or all information on a domain, unavailable. There is also a "by URL" type of blocking. In this case the filtering system must parse the HTTP request header to determine exactly what to block. And before that header, as can be seen above, there must be a connection setup phase that we can try to track, since the filter will most likely let it through.
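To illustrate why a "by URL" filter cannot decide anything during the handshake, here is a minimal sketch of the decision such a filter has to make. It is not how any real DPI product works; the `BLOCKED_URLS` set, the host `example.org` and both function names are hypothetical. The point is only that the host and path become known when the first payload segment arrives, after SYN, SYN-ACK and ACK have already passed.

```python
from typing import Optional, Tuple

# Hypothetical registry entries of the "by URL" type: (host, path) pairs.
BLOCKED_URLS = {("example.org", "/filteredpage")}


def parse_request_target(raw: bytes) -> Optional[Tuple[str, str]]:
    """Return (host, path) from a raw HTTP/1.x request, or None if unparsable."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    path = parts[1]
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Simplification: a port suffix in the Host header is not stripped.
            return value.strip().lower(), path
    return None


def is_blocked(raw: bytes) -> bool:
    """True if the request targets a (host, path) pair from the registry."""
    target = parse_request_target(raw)
    return target is not None and target in BLOCKED_URLS
```

A SYN or a bare ACK carries no payload, so there is nothing to pass to `is_blocked` until the GET itself shows up.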

To do this, you need to pick a suitable unregistered domain that is blocked "by URL" and served over HTTP, to make the filtering system's job easier. Preferably it should be long abandoned, to minimize extraneous traffic from anyone other than Agents. This turned out to be not difficult at all: the registry of prohibited information contains plenty of free domains for every taste. So I bought a domain, pointed it at IP addresses on a VPS running tcpdump, and the counting began.

An audit of the "Revizors"

I expected to see periodic bursts of requests, which in my opinion would indicate a controlled action. I can't say I didn't see them at all, but there was definitely no clear picture:

[Graph: incoming requests over time, before cleanup]

This is not surprising: even an unneeded domain on a never-used IP will receive a mass of unsolicited traffic, such is the modern Internet. Fortunately, I only needed requests for a specific URL, so all the scanners and password brute-forcers were quickly identified. It was also quite easy to spot floods by their mass of identical requests. Then I compiled the frequency of occurrence of IP addresses and went through the entire top manually, separating out those who had slipped through the previous stages. Additionally, I cut out all sources that sent only a single packet; there were not many of them. And here is the result:

[Graph: requests over time after removing extraneous sources]
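The frequency step of the cleanup above can be sketched roughly as follows. This is only an illustration, assuming the source addresses have already been extracted from the capture into a list; the function name `rank_sources` and the `min_packets` parameter are hypothetical, and the manual review of the top of the list is not automated here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_sources(src_ips: Iterable[str], min_packets: int = 2) -> List[Tuple[str, int]]:
    """Rank source addresses by packet count, dropping those below min_packets."""
    counts = Counter(src_ips)
    # most_common() returns (address, count) pairs, most frequent first.
    return [(ip, n) for ip, n in counts.most_common() if n >= min_packets]
```

For example, `rank_sources(["192.0.2.1", "192.0.2.2", "192.0.2.1"])` keeps only `192.0.2.1`, because the other address sent a single packet.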

A small lyrical digression. A little more than a day later, my hosting provider sent a rather vaguely worded letter saying that my server hosts a resource from the RKN list of prohibited sites, so it is being blocked. At first I thought my account had been blocked, but it had not. Then I thought I was simply being warned about something I already knew. But it turned out that the hoster had turned on its own filter in front of my domain, and as a result I ended up under double filtering: by the providers and by the hoster. The filter let through only the ends of the requests, FIN-ACK and RST, cutting off all HTTP to the forbidden URL. As you can see from the graph above, after the first day I began to receive less data, but I still received some, which was quite enough for the task of counting request sources.

To the point. In my opinion, two bursts are clearly visible every day: the first, smaller one after midnight Moscow time, and the second closer to 6 in the morning with a tail lasting until noon. The peak does not occur at exactly the same time each day. At first I wanted to single out the IP addresses that appeared only in these periods, and in every one of them, on the assumption that Agent checks are performed periodically. But on closer inspection I quickly found periodic activity falling into other intervals, with other frequencies, up to one request every hour. Then I thought about time zones and that they might be the cause, and then that the system might not be synchronized globally at all. In addition, NAT will certainly play its role, and the same Agent can make requests from different public IPs.
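One simple way to look for such daily bursts is to bucket request timestamps by hour of day in Moscow time. The sketch below assumes the timestamps are Unix epoch seconds, as tcpdump can print them, and uses a fixed UTC+3 offset for Moscow; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Assumption: Moscow time as a fixed UTC+3 offset (no daylight saving).
MSK = timezone(timedelta(hours=3))


def requests_per_hour_msk(epoch_seconds: Iterable[float]) -> List[int]:
    """Return 24 counters: the number of requests seen in each hour of the day, MSK."""
    hours = Counter()
    for ts in epoch_seconds:
        hours[datetime.fromtimestamp(ts, tz=MSK).hour] += 1
    return [hours.get(h, 0) for h in range(24)]
```

Summing over all days in the capture smooths out the drift of the peak; sources in other time zones or on other schedules will show up as smaller bumps in other buckets.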

Since precision was not my original goal, I simply counted all the addresses that I saw in a week and got 2791. The number of TCP sessions established from one address is 4 on average, with a median of 2. Top sessions per address: 464, 231, 149, 83, 77. The 95th percentile is 8 sessions per address. The median is not very high. Let me remind you that the graph shows a clear daily periodicity, so over 7 days you could expect something around 4 to 8. If we throw out all addresses that appear in only one session, we get a median of exactly 5. But I could not find a clear basis for excluding them. On the contrary, a spot check showed that they are related to requests for the prohibited resource.
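The summary figures above (mean, median, top values, 95th percentile) can be computed with a few lines. This is a sketch over a hypothetical list of per-address session counts, not the data from the capture; the nearest-rank definition of the percentile is my assumption about what "maximum of 95% of the sample" means.

```python
import statistics
from typing import Dict, List


def summarize(counts: List[int]) -> Dict[str, object]:
    """Mean, median, nearest-rank 95th percentile and top 5 of a non-empty sample."""
    ordered = sorted(counts)
    n = len(ordered)
    # Nearest rank, in integer arithmetic: ceil(0.95 * n) without float rounding.
    rank = max(1, (95 * n + 99) // 100)
    return {
        "n": n,
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "top5": ordered[::-1][:5],
    }
```

The effect of dropping one-off sources can be checked the same way, with `summarize([c for c in counts if c > 1])`.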

Addresses are addresses, but on the Internet autonomous systems (AS) matter more. There turned out to be 1510 of them, an average of 2 addresses per AS with a median of 1. Top addresses per AS: 288, 77, 66, 39, 27. The 95th percentile is 4 addresses per AS. The median here is as expected: one Agent per provider. The top is also as expected: it consists of the big players. In a large network there should probably be Agents in each region where the operator is present, and do not forget about NAT. By country, the largest counts are: 1409 RU, 42 UA, 23 CZ, and 36 from other regions outside RIPE NCC. The requests from outside Russia attract attention. This can probably be explained by geolocation errors or by registrar errors when filling in the data. Or by the fact that a Russian company may not have Russian roots, or may have a foreign representative office because it is easier that way, which is natural when dealing with a foreign organization such as RIPE NCC. Some part is undoubtedly extraneous, but it is difficult to separate reliably, since the resource is blocked, from the second day doubly so, and most sessions are just an exchange of a few service packets. Let's agree that this is a small part.
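Grouping addresses by AS is straightforward once each address has been mapped to an AS number. The sketch below assumes such a mapping already exists as a plain dictionary; how it is obtained (for example from a routing table dump or a whois lookup) is left out, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict


def addresses_per_as(ip_to_asn: Dict[str, int]) -> Dict[int, int]:
    """Count distinct source addresses observed in each autonomous system."""
    per_as = defaultdict(set)
    for ip, asn in ip_to_asn.items():
        per_as[asn].add(ip)
    return {asn: len(ips) for asn, ips in per_as.items()}
```

The resulting per-AS counts can then be summarized with the same mean, median and percentile calculation used for sessions per address.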

These numbers can already be compared with the number of providers in Russia. According to the RKN, there are 6387 licenses for "Communication services for data transmission, except for voice", but this is a very loose upper bound: not all of these licenses belong to Internet providers that need to install an Agent. In the RIPE NCC zone the number of ASs registered in Russia is similar, 6230, and not all of them are providers. UserSide did a stricter calculation and counted 3940 companies in 2017, and this is still rather an upper estimate. In any case, the number of ASs we have observed is about two and a half times smaller. But it is worth understanding that an AS is not strictly equal to a provider. Some providers do not have their own AS, some have more than one. If we assume that everyone does have Agents, then some filter more aggressively than others, so their requests are indistinguishable from garbage, if they arrive at all. But for a rough estimate this is quite tolerable, even if something was lost due to my oversight.
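The comparison is simple arithmetic on the figures quoted above. A quick check, with the dictionary keys being just my labels for the three estimates:

```python
# Figures taken from the text: observed ASs and three estimates of the provider count.
observed_as = 1510
estimates = {
    "RKN licenses": 6387,
    "RIPE NCC ASs in Russia": 6230,
    "UserSide 2017": 3940,
}

# How many times larger each estimate is than the number of observed ASs.
ratios = {name: round(total / observed_as, 2) for name, total in estimates.items()}
print(ratios)
```

The "two and a half times" figure corresponds to the UserSide estimate (a ratio of about 2.61); against the two looser upper bounds the gap is a bit over four times.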

About DPI

Even though my hosting provider turned on its filter starting from the second day, the data for the first day is enough to conclude that the blocking works successfully. Only 4 sources were able to break through and complete full HTTP and TCP sessions (as in the example above). Another 460 managed to send a GET, but the session was instantly terminated by RST. Pay attention to the TTL:

TTL 50, TCP, 14678  >  80, "[SYN] Seq=0"
TTL 64, TCP, 80  >  14678, "[SYN, ACK] Seq=0 Ack=1"
TTL 50, TCP, 14678  >  80, "[ACK] Seq=1 Ack=1"

HTTP, "GET /filteredpage HTTP/1.1"
TTL 64, TCP, 80  >  14678, "[ACK] Seq=1 Ack=294"

# This was sent by the filter
TTL 53, TCP, 14678  >  80, "[RST] Seq=3458729893"
TTL 53, TCP, 14678  >  80, "[RST] Seq=3458729893"

HTTP, "HTTP/1.1 302 Found"

# And this is the source node trying to recover what was lost
TTL 50, TCP ACKed unseen segment, 14678 > 80, "[ACK] Seq=294 Ack=145"

TTL 50, TCP, 14678  >  80, "[FIN, ACK] Seq=294 Ack=145"
TTL 64, TCP, 80  >  14678, "[FIN, ACK] Seq=171 Ack=295"

TTL 50, TCP Dup ACK 14678 > 80 "[ACK] Seq=295 Ack=145"

# The source node realizes the session is broken
TTL 50, TCP, 14678  >  80, "[RST] Seq=294"
TTL 50, TCP, 14678  >  80, "[RST] Seq=295"

There are variations: fewer RSTs or more retransmits, depending also on what the filter sends to the source node. In any case, this is the most reliable pattern, from which it is clear that a prohibited resource was requested. In addition, the session always contains a packet whose TTL is greater than that of the preceding and following packets.
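The TTL observation can be turned into a simple check. The sketch below assumes the packets of one session have already been parsed into small records; the `Packet` type and the function name are hypothetical. It takes the TTL of the client's first packet (the SYN) as the baseline, on the assumption that the SYN comes from the real source node, and flags every later packet from the client side with a different TTL.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    src_port: int
    dst_port: int
    ttl: int
    flags: str  # for example "SYN", "ACK", "RST" or "FIN,ACK"


def injected_packets(session: List[Packet], client_port: int) -> List[Packet]:
    """Return client-side packets whose TTL differs from that of the first one."""
    from_client = [p for p in session if p.src_port == client_port]
    if not from_client:
        return []
    # Assumption: the first client packet (the SYN) was sent by the real source node.
    baseline = from_client[0].ttl
    return [p for p in from_client if p.ttl != baseline]
```

Applied to the trace above, with client port 14678, this flags exactly the two RST packets with TTL 53. A route change in the middle of a session would produce a false positive, so this is a heuristic rather than proof.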

From the rest you can't even see a GET:

TTL 50, TCP, 14678  >  80, "[SYN] Seq=0"
TTL 64, TCP, 80  >  14678, "[SYN, ACK] Seq=0 Ack=1"

# This was sent by the filter
TTL 53, TCP, 14678  >  80, "[RST] Seq=1"

Or like this:

TTL 50, TCP, 14678  >  80, "[SYN] Seq=0"
TTL 64, TCP, 80  >  14678, "[SYN, ACK] Seq=0 Ack=1"
TTL 50, TCP, 14678  >  80, "[ACK] Seq=1 Ack=1"

# This was sent by the filter
TTL 53, TCP, 14678  >  80, "[RST, PSH] Seq=1"

TTL 50, TCP ACKed unseen segment, 14678 > 80, "[FIN, ACK] Seq=89 Ack=172"
TTL 50, TCP ACKed unseen segment, 14678 > 80, "[FIN, ACK] Seq=89 Ack=172"

# The filter again, many times
TTL 53, TCP, 14678  >  80, "[RST, PSH] Seq=1"
...

You can definitely see the difference in TTL if something comes from the filter. But often nothing arrives at all:

TCP, 14678  >  80, "[SYN] Seq=0"
TCP, 80  >  14678, "[SYN, ACK] Seq=0 Ack=1"
TCP Retransmission, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
...

Or like this:

TCP, 14678  >  80, "[SYN] Seq=0"
TCP, 80  >  14678, "[SYN, ACK] Seq=0 Ack=1"
TCP, 14678  >  80, "[ACK] Seq=1 Ack=1"

# Several seconds pass with no traffic

TCP, 80  >  14678, "[FIN, ACK] Seq=1 Ack=1"
TCP Retransmission, 80 > 14678, "[FIN, ACK] Seq=1 Ack=1"
...

And all this repeats again and again, as the graph shows, more than once a day, every day.
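The session variants shown in this section can be sorted into a few rough buckets. The sketch below is only one possible classification, assuming each session has been reduced to an ordered list of labels for what arrived from the client side ("SYN", "ACK", "GET", "FIN", "RST"); the function name and the bucket names are hypothetical.

```python
from typing import Iterable


def classify_session(client_events: Iterable[str]) -> str:
    """Put a session into a rough bucket based on what arrived from the client side."""
    seen = set(client_events)
    if "GET" in seen and "RST" in seen:
        return "get-then-reset"    # the GET arrived, then the session was torn down
    if "GET" in seen and "FIN" in seen:
        return "complete"          # request and normal termination, no reset
    if "GET" in seen:
        return "get-unfinished"    # the GET arrived, but the session never closed
    if "RST" in seen:
        return "reset-before-get"  # reset arrived before any request was seen
    return "silent"                # handshake packets only, then nothing
```

Counting sources per bucket gives the kind of breakdown quoted earlier: a few complete sessions, several hundred with a GET followed by RST, and the rest with no visible GET.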

About IPv6

The good news is that it exists. I can say for sure that there are periodic requests to the forbidden resource from 5 different IPv6 addresses, which is exactly the Agent behavior I expected. Moreover, one of the IPv6 addresses is not filtered, and I see a full-fledged session. From two more I saw only one incomplete session each: one was interrupted by RST from the filter, the other timed out. That makes 7 in total.

Since there are few addresses, I studied all of them in detail, and it turned out that only 3 of them are actually providers. They deserve a standing ovation! Another address belongs to a cloud hosting service in Russia (not filtered), and another to a research center in Germany (there is a filter, but where?). Why they check the availability of prohibited resources on a schedule is a good question. The remaining two made one request each and are located outside Russia, and one of them is filtered (in transit after all?).

Blocking and Agents are a big brake on IPv6 deployment, which is not moving very fast anyway. It's sad. Those who have solved this problem can be fully proud of themselves.

In conclusion

I did not pursue 100% accuracy, and I ask you to forgive me for that. I hope someone will want to repeat this work with greater accuracy. For me it was important to understand whether such an approach would work in principle. The answer is that it does. I think the figures obtained are quite reliable as a first approximation.

What else could be done, and what I was too lazy to do, is to count DNS requests. They are not filtered, but they do not provide much precision either, since they only work for the domain, not the entire URL. The periodicity should be visible. Combined with what is directly visible in the HTTP requests, this would make it possible to separate out the extraneous traffic and get more information. It is even possible to identify the developers of the DNS servers used by the ISPs, and much more.

I did not expect at all that the hoster would also turn on its own filter for my VPS. Maybe this is common practice. After all, the RKN sends its request to delete a resource to the hoster. But this did not surprise me, and in some ways it even worked in my favor. The filter worked very effectively, cutting off all valid HTTP requests to the forbidden URL, but not the invalid ones: those that had already passed through a provider's filter and arrived only as endings, FIN-ACK and RST. Two minuses almost made a plus. By the way, IPv6 was not filtered by the hoster. Of course, this affected the quality of the collected material, but it still made it possible to see the periodicity. It turns out this is an important point when choosing a hosting site: do not forget to ask how they handle the list of prohibited sites and requests from the RKN.

At the beginning, I compared AS "Revizor" with RIPE Atlas. This comparison is quite justified, and a large network of Agents can be useful, for example for determining how well a resource is reachable from different providers in different parts of the country. You can calculate delays, build graphs, analyze it all and see the changes taking place both locally and globally. This is not the most direct way, but astronomers use "standard candles", so why not use Agents? Knowing (or finding) their standard behavior, it is possible to determine the changes that occur around them and how these affect the quality of the services provided. And you do not need to place probes across the network yourself: Roskomnadzor has already installed them.

Another point I want to touch on is that every tool can be a weapon. AS "Revizor" is a closed network, but the Agents give themselves away to everyone by sending requests to all resources from the prohibited list. Obtaining such a resource is no problem at all. In total, providers unwittingly reveal through their Agents much more about their networks than they probably should: DPI and DNS types, the Agent's location (central node and service network?), network delay and loss markers, and this is only the most obvious. Just as someone can monitor the actions of Agents to improve the availability of their resources, someone else can do it for other purposes, and nothing stands in the way. The result is a double-edged and very multifaceted tool, as anyone can see for themselves.

Source: habr.com
