In 2010 the company had 50 servers and a simple network model: backend, frontend and a firewall. As the number of servers grew, the model became more complicated: staging, isolated VLANs with ACLs, then VPNs with VRFs, VLANs with ACLs on L2, VRFs with ACLs on L3. Head spinning? It gets more fun.
By the time there were 16 000 servers, working with so many heterogeneous segments without tears had become impossible, so we came up with another solution. We took the Netfilter stack, added Consul to it as a data source, and got a fast distributed firewall. It replaced ACLs on routers and served as both an external and an internal firewall. To manage the tool dynamically, we developed the BEFW system, which came to be used everywhere: from managing user access to the production network to isolating network segments from each other.

How it all works, and why you should take a closer look at this system, is explained by Ivan Agarkov, head of the infrastructure security group of the Maintenance division at the company's Minsk development center. Ivan is a SELinux fan, loves Perl, and writes code. As head of the information security team, he regularly works with logs, backups and R&D to protect Wargaming from hackers and keep all of the company's game servers running.

Historical information
Before telling you how we did it, I'll explain how we got here and why it was needed. To do that, fast-forward 9 years back: 2010, World of Tanks had just appeared, and Wargaming had about 50 servers.

Graph of the growth of the company's servers.
We had a network model. For that time, it was optimal.

Network model in 2010.
There are bad guys on the front end who want to break in, but the front end has a firewall. There is no firewall on the backend, but there are only 50 servers there and we know them all. Everything works well.
In 4 years, the server fleet grew 100-fold, to 5000. The first isolated networks appeared: staging. Staging cannot reach production, yet things that could be dangerous often ran there.

Network model in 2014.
By inertia, we kept using the same hardware, and all the work was done on isolated VLANs: ACLs were written on the VLANs to allow or deny particular connections.
In 2016, the number of servers reached 8000. Wargaming absorbed other studios, and additional partner networks appeared. They seem to be ours, but not quite: VLANs often do not work for partners, so you have to use VPNs with VRFs, and isolation becomes more complicated. The mix of ACL-based isolation kept growing.

Network model in 2016.
By the beginning of 2018, the fleet of machines had grown to 16 000. There were many segments, not counting the closed ones in which financial data was stored. There were container networks (Kubernetes), DevOps, and cloud networks connected via VPN, for example from IVS. There were a lot of rules; it was painful.

Network model and isolation methods in 2018.
For isolation we used VLANs with ACLs on L2, VRFs with ACLs on L3, VPNs and much more. Too much.
Problems
Everyone lives with ACLs and VLANs, so what is wrong with them? This question is answered by Hide-the-Pain Harold.

There were many problems, but there were five massive ones.
- Geometric growth in the cost of new rules. Each new rule took longer to add than the previous one, because we first had to check whether such a rule already existed.
- No firewall inside segments. The segments were somehow separated from each other, but there was no longer protection inside them.
- Rules took a long time to apply. Operators could write one local rule by hand in an hour; a global one took several days.
- Difficulty auditing rules. More precisely, auditing was impossible. The first rules were written back in 2010, and most of their authors no longer worked for the company.
- Low level of infrastructure visibility. This is the main problem: we did not really know what was going on in our own infrastructure.
This is what a network engineer looked like in 2018 on hearing: "We need a few more ACLs."

Solutions
At the beginning of 2018, it was decided to do something about it.
The cost of integrations kept growing. The trigger was that large data centers stopped supporting isolated VLANs and ACLs because the memory on the devices ran out.
Solution: we removed the human factor and automated the provisioning of access as much as possible.
New rules took a long time to apply. Solution: speed up rule application, make it distributed and parallel. This requires a distributed system, so that rules deliver themselves without rsync or SFTP to a thousand systems.
No firewall inside segments. Requests for a firewall inside segments started coming in when different services appeared within the same network. Solution: use a firewall at the host level, a host-based firewall. We run Linux almost everywhere, and iptables is everywhere, so this is not a problem.
Difficulties with auditing rules. Solution: keep all rules in one place for review and management so we can audit everything.
Low level of infrastructure control. Solution: inventory all services and access between them.
This is more of an administrative process than a technical one. Sometimes we have 200-300 new releases a week, especially during promotions and holidays, and that is just one of our DevOps teams. With so many releases, it is impossible to track which ports, IPs and integrations are needed. So we needed specially trained service managers who asked the teams: "What is this and why did you stand it up?"
After everything we launched, this is what a network engineer looked like in 2019.

Consul
We decided that we would put everything that we found with the help of service managers into Consul and from there we would write iptables rules.
How did we decide to do this?
- Let's collect all services, networks and users.
- Let's make iptables rules based on them.
- We automate control.
- ....
- PROFIT.
Consul is not just a remote API: it can run on every node and write to iptables. All that remains is to come up with automatic controls that clean up the excess, and most of the problems are solved! The rest we refine along the way.
Why Consul?
Well proven. In 2014-15 we used it as the backend for Vault, where we store passwords.
Doesn't lose data. In all our time using Consul, it has not lost data in any incident. This is a huge plus for a firewall management system.
P2P connections accelerate the spread of change. With P2P, all changes come quickly, no need to wait for hours.
Convenient REST API. We also considered Apache ZooKeeper, but it has no REST API, so you would have to bolt on crutches.
Works as both a keystore (KV) and a directory (Service Discovery). You can store services, catalogs, data centers at once. This is convenient not only for us, but also for neighboring teams, because when building a global service, we think big.
Written in Go, which is part of the Wargaming stack. We love this language, we have a lot of Go developers.
Powerful ACL system. In Consul, ACLs let you control who may write what and where. We can guarantee that nothing else will overwrite the firewall rules, and we have no problems with this.
But Consul also has its downsides.
- Does not scale within a data center unless you have the business version. It scales only by federation.
- Heavily dependent on network quality and server load. Consul will not work properly as a server on a loaded machine, or if the network has lags such as uneven speed. This is due to the P2P connections and the update-distribution model.
- Difficulty monitoring availability. Consul's status can say that everything is fine while the node died long ago.
We solved most of these problems during the operation of Consul, which is why we chose it. The company has plans for an alternative backend, but we have learned how to deal with problems and for now we live with Consul.
How Consul works
In a given data center we install three to five servers. One or two servers will not do: they cannot form a quorum and decide who is right and who is wrong when the data diverges. More than five makes no sense; performance will drop.

Clients connect to servers in any order: the same agents, only with the flag server = false.

After that, clients receive a list of P2P connections and build connections between themselves.

At the global level, we interconnect several data centers. They also connect P2P and communicate.

When we want data from another data center, the request goes from server to server. This scheme is built on the Serf protocol which, like Consul, is developed by HashiCorp.
Some important facts about Consul
Consul has documentation describing how it works. I will give only selective facts that are worth knowing.
Consul servers elect a master from among the voters. For each data center Consul picks a master from the list of servers, and all requests go only to it, regardless of the number of servers. A hung master does not trigger re-election. If no master is elected, nobody serves requests.
Did you want horizontal scaling? Sorry, no.
A request to another data center goes from master to master, regardless of which server it arrived at. The elected master receives 100% of the load, except for the load of forwarded requests. All servers in a data center hold an up-to-date copy of the data, but only one answers.
The only way to scale is to enable stale mode on the client.
In stale mode you can answer without a quorum. This is a mode in which we give up data consistency but read slightly faster than usual, and any server answers. Naturally, writes still go only through the master.
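A stale read is just an ordinary HTTP call with an extra query parameter. The sketch below builds such a request URL; `stale` is a real query parameter of Consul's read endpoints, while the key name and the agent address are made-up examples.

```python
# Build a Consul KV read URL, optionally in stale mode.
# In stale mode any server answers from its own copy of the data,
# trading consistency for speed; writes still go through the leader.
def kv_read_url(key, consul="http://127.0.0.1:8500", stale=True):
    # "?stale" asks the receiving server to answer itself instead of
    # forwarding the read to the current leader.
    return f"{consul}/v1/kv/{key}" + ("?stale" if stale else "")

print(kv_read_url("befw/services/ssh"))
```

Any HTTP client can then issue a GET against the returned URL on any server of the data center.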
Consul does not copy data between data centers. When a federation is assembled, each server holds only its own data; for other data centers' data it always asks them.
Atomicity of operations is not guaranteed outside a transaction. Remember that you are not the only one who can change things. If you need it otherwise, run a transaction with a lock.
Blocking operations do not guarantee a lock. The request travels from master to master rather than directly, so there is no guarantee that a lock will hold when you take it, for example, in another data center.
An ACL does not guarantee access control either (in many cases). An ACL may fail to work because it is stored in a single data center of the federation, the ACL data center (primary DC). If that DC does not answer you, the ACL will not work.
One stuck master hangs the whole federation. For example, a federation has 10 data centers, one of them has a bad network, and its master goes down. Everyone who communicates with it freezes in a loop: there is a request, there is no answer to it, the thread hangs. There is no way to know when this will happen; within an hour or two the whole federation falls over. There is nothing you can do about it.
Status, quorum, and elections are handled by a separate thread. There will be no re-election and the status will show nothing. You think you have a live Consul, you ask it, and nothing happens: there is no answer. Meanwhile the status reports that everything is fine.
We ran into this problem and had to rebuild specific parts of our data centers to avoid it.
The Consul Enterprise business edition lacks some of the drawbacks above. It has many useful features: voter selection, distribution, scaling. There is just one "but": the licensing for a distributed system is very expensive.
Life hack: rm -rf /var/lib/consul cures all the agent's diseases. If something does not work, just delete the data and the agent will pull it again from a replica. Most likely, Consul will then work.
BEFW
Now let's talk about what we have added to Consul.
BEFW is an acronym for BackEnd FireWall. I had to name the product somehow when I created the repository to hold the first test commits, and the name stuck.
Rule Templates
Rules are written in iptables syntax.
- -N BEFW
- -P INPUT DROP
- -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
- -A INPUT -i lo -j ACCEPT
- -A INPUT -j BEFW
Everything goes into our BEFW chain, except ESTABLISHED, RELATED and localhost. The template can be anything; this is just an example.
Why is BEFW useful?
Services
We have a service; it always has a port and a node it runs on. From that node we can locally ask the agent and learn that some service exists there. We can also add tags.

Any service that starts and registers in Consul becomes an iptables rule. We have SSH: we open port 22. The Bash script is simple: curl and iptables, nothing else is needed.
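The "registered service becomes a rule" idea can be sketched in a few lines. This is a hypothetical illustration, not BEFW's actual code: the service dictionary mirrors the shape of Consul's `/v1/agent/services` response, while the BEFW chain name comes from the template above and the comment-based rule layout is an assumption.

```python
# Turn Consul agent service records into iptables rule strings.
def service_to_rule(name, port, proto="tcp", chain="BEFW"):
    """Build one iptables rule opening a service's port, tagged with
    the service name so the rule's origin is visible in iptables -S."""
    return f"-A {chain} -p {proto} --dport {port} -m comment --comment {name} -j ACCEPT"

def rules_from_services(services, chain="BEFW"):
    """services: dict shaped like GET /v1/agent/services output."""
    rules = []
    for svc in services.values():
        port = svc.get("Port")
        if port:  # services registered without a port get no rule
            rules.append(service_to_rule(svc["Service"], port, chain=chain))
    return rules

services = {
    "ssh": {"Service": "ssh", "Port": 22, "Tags": ["tcp"]},
    "web": {"Service": "web", "Port": 443, "Tags": ["tcp"]},
}
print(rules_from_services(services))
```

In the real setup the `services` dictionary would come from a local HTTP request to the agent, and the strings would be fed to iptables-restore.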
Client
How to open access not to everyone, but selectively? Store IP lists in KV-storage by service name.

For example, we want everyone from the 10.0.0.0/8 network to be able to reach the SSH_TCP_22 service. Add one small TTL field, and now we have temporary permits, for example, for a day.
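The per-service client list with a TTL can be modeled like this. The KV layout (service name mapping to `{address: expiry}`) is an assumption for illustration; BEFW's real key schema may differ.

```python
import time

def add_client(store, service, ip, ttl_seconds, now=None):
    """Record a permitted client for a service with an expiry time."""
    now = time.time() if now is None else now
    store.setdefault(service, {})[ip] = now + ttl_seconds

def active_clients(store, service, now=None):
    """Return the clients whose TTL has not yet expired."""
    now = time.time() if now is None else now
    entries = store.get(service, {})
    return sorted(ip for ip, exp in entries.items() if exp > now)

kv = {}
add_client(kv, "ssh_tcp_22", "10.0.0.0/8", ttl_seconds=86400, now=0)  # one day
add_client(kv, "ssh_tcp_22", "192.168.1.5", ttl_seconds=60, now=0)    # one minute
print(active_clients(kv, "ssh_tcp_22", now=120))  # the 60 s entry has expired
```

A daemon that periodically re-reads the store and rebuilds the firewall gets temporary permits for free: expired entries simply stop appearing.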
Access
We connect services and clients: we have a service, a KV-storage is ready for each. Now we give access not to everyone, but selectively.

Group
If we have to write thousands of IPs for every access, we will get tired. Let's invent groupings: a separate subset in KV. Call it Alias (or groups) and store groups there by the same principle.

We connect the pieces: now we can open SSH not to one specific address but to a whole group or several groups. Likewise there is a TTL: you can add to a group and remove from a group temporarily.
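Group resolution is a small recursive expansion. In this sketch the `$name` alias syntax, the group names and the nesting are all invented for illustration; the point is only that aliases expand to plain addresses before the firewall sets are built.

```python
def resolve(entries, aliases, _seen=None):
    """Expand $-prefixed alias references into plain addresses."""
    _seen = set() if _seen is None else _seen
    result = []
    for e in entries:
        if e.startswith("$"):            # alias reference
            name = e[1:]
            if name in _seen:            # guard against alias cycles
                continue
            _seen.add(name)
            result.extend(resolve(aliases.get(name, []), aliases, _seen))
        else:                            # plain address or network
            result.append(e)
    return result

aliases = {
    "admins": ["10.1.0.0/24", "10.2.0.0/24"],
    "oncall": ["$admins", "172.16.5.5"],  # groups can nest
}
print(resolve(["$oncall"], aliases))
```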

Integration
Our problem is the human factor and automation. So far we have solved it like this.

We work with Puppet, and everything related to the system (application code) moves through it. PuppetDB (plain PostgreSQL) stores a list of the services running there; they can be found by resource type. There you can also find out who talks to whom. We also have a pull request and merge request system for this.
We wrote befw-sync, a simple solution that helps transfer data. First, befw-sync goes to PuppetDB over its HTTP API: we ask what services exist and what needs to be done. Then it makes a request to Consul.
Do we have integration? Yes: we wrote the rules and allowed pull requests to be accepted. Need a port opened or a host added to a group? Pull request, review, and no more "find 200 other ACLs and try to do something with them."
Optimization
Pinging localhost with an empty rule chain takes 0.075 ms.

Let's add 10 000 addresses to this chain in iptables. As a result, the ping grows many times over: iptables is completely linear, and processing each address takes time.

For a firewall to which we are migrating thousands of ACLs, that means a lot of rules, and this introduces latency. For gaming protocols, this is bad.
But if we put the same 10 000 addresses into an ipset, the ping even decreases.

The point is that "O" (algorithm complexity) for ipset is always 1, no matter how many rules there are. True, there is a limitation there - there can be no more than 65535 rules. For now, we live with this: you can combine them, expand them, make two ipsets in one.
Storage
A logical continuation of the optimization is storing information about a service's clients in an ipset.

Now we have the same SSH, but we do not write 100 IPs at once; instead we name the ipset to match against, followed by a DROP rule. This could be collapsed into a single "whoever is not listed gets dropped" rule, but this way is clearer.
Now we have rules and sets. The main task is to create the set before writing the rule, because otherwise iptables will refuse to write the rule.
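The ordering constraint above can be made explicit in code: emit every `ipset create` before any iptables rule that references a set. This is an illustrative sketch; the set type and command layout are assumptions.

```python
def build_commands(services):
    """services: list of (ipset_name, port) pairs. Emit all ipset
    creates first, then the iptables rules referencing those sets,
    so no rule ever names a set that does not exist yet."""
    creates, rules = [], []
    for name, port in services:
        # -exist makes the create idempotent on repeated runs
        creates.append(f"ipset -exist create {name} hash:net")
        rules.append(
            f"-A BEFW -p tcp --dport {port} "
            f"-m set --match-set {name} src -j ACCEPT"
        )
    return creates + rules

cmds = build_commands([("ssh_tcp_22", 22)])
print(cmds)
```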
General scheme
In the form of a diagram, everything that I said looks like this.

We commit to Puppet, everything is shipped to the host: the services end up here, the ipsets there, and whoever is not registered is not allowed in.
allow & deny
To quickly save the world or quickly cut someone off, at the beginning of all chains we placed two ipsets: rules_allow and rules_deny. How does it work?
For example, someone with bots creates load on our Web. Previously, you had to find their IP in the logs and take it to the network engineers so they could find the traffic source and ban it. Now it looks different.

We send the IP to Consul, wait 2.5 seconds, and it's done. Since Consul distributes changes quickly over P2P, it works everywhere, in any part of the world.
Once I completely stopped WOT by making a mistake with the firewall. rules_allow is our insurance against such cases. If we made a mistake with the firewall somewhere and something got blocked, we can always push a conditional 0.0.0.0/0 to bring everything back up quickly. Later we fix everything by hand.
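The quick-ban flow amounts to one KV write that Consul's gossip then carries to every node. The sketch below only constructs the request; the `befw/rules_deny/...` key path is an invented example, while `PUT /v1/kv/<key>` is Consul's real KV write endpoint.

```python
def deny_kv_request(ip, consul="http://127.0.0.1:8500",
                    prefix="befw/rules_deny"):
    """Return (method, url) for the Consul KV write that bans `ip`.
    Every node's daemon picks the key up and adds the IP to the
    rules_deny ipset locally."""
    return "PUT", f"{consul}/v1/kv/{prefix}/{ip}"

method, url = deny_kv_request("203.0.113.7")
print(method, url)
```

An unban is the symmetric DELETE on the same key.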
Other sets
You can add any other sets under the $IPSETS$ namespace.

What for? Sometimes someone needs an ipset, for example, to emulate shutting down part of a cluster. Anyone can bring their own sets, name them, and they will be picked up from Consul. A set can either participate in iptables rules or act as a NOOP; either way, the daemon keeps it consistent.
Members
It used to work like this: a user connected to the network and received parameters through the domain. Before the new generation of firewalls appeared, Cisco gear could not figure out where the user was and where the IP was, so access was granted only by the machine's hostname.
What did we do? We wedged in at the moment the address is issued. Usually that is dot1x, Wi-Fi or VPN: everything goes through RADIUS. For each user we create a group named after the username and put into it an IP with a TTL equal to its dhcp.lease; as soon as the lease expires, the rule disappears.

Now we can open access to services, and to other groups, by username. We removed the pain of changing hostnames and took the burden off the network engineers, who no longer need Cisco for this. Now engineers themselves configure access on their own servers.
Isolation
In parallel, we started dismantling the old isolation. Service managers took an inventory, and we analyzed all our networks. We decompose them into the same kind of groups, and on the necessary servers the groups were added, for example, to deny. Now the same staging isolation is enforced by rules_deny in production, rather than in the production network itself.

The scheme works quickly and simply: we remove ACLs from the servers, offload the hardware, and reduce the number of isolated VLANs.
Integrity control
Previously, we had a special trigger that reported when someone changed a firewall rule by hand. I wrote a huge linter for checking firewall rules; it was difficult. Now BEFW controls integrity. It jealously ensures that the rules it makes do not change. If someone changes the firewall rules, it reverts everything. "I quickly raised a proxy here to work from home" is no longer an option.
BEFW controls the ipsets of services and the list in befw.conf, plus the service rules in the BEFW chain. But it does not watch other chains, rules or ipsets.
Accident protection
BEFW always stores the last known good state directly in the state.bin binary structure. If something went wrong, it always rolls back to this state.bin.

This is insurance against Consul instability, when it fails to deliver data, or someone makes a mistake and pushes rules that cannot be applied. So that we are never left without a firewall, BEFW rolls back to the last state if an error occurs at any stage.
In critical situations this is a guarantee that we keep a working firewall. We open all the gray networks, hoping that an admin will come and fix things. Someday I will make this configurable, but for now we simply have three gray networks: 10/8, 172.16/12 and 192.168/16. Given our Consul, this is an important feature that lets us develop further.
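The last-known-good mechanism can be sketched as "persist state only after a successful apply, restore it on failure". This is an illustrative model, not BEFW's code: a JSON file stands in for the binary state.bin, and the apply callback stands in for the real iptables/ipset write.

```python
import json, os, tempfile

def save_state(path, rules):
    """Persist the current rule set as the last known good state."""
    with open(path, "w") as f:
        json.dump(rules, f)

def apply_rules(path, rules, apply_fn):
    """Try to apply new rules; on any failure fall back to the rules
    saved by the previous successful run."""
    try:
        apply_fn(rules)
        save_state(path, rules)        # persist only after success
        return rules
    except Exception:
        with open(path) as f:          # roll back to last good state
            good = json.load(f)
        apply_fn(good)
        return good

path = os.path.join(tempfile.mkdtemp(), "state.json")
save_state(path, ["-A BEFW -p tcp --dport 22 -j ACCEPT"])

def fake_apply(rules):                 # stand-in for iptables-restore
    if "bad" in rules[0]:
        raise RuntimeError("apply failed")

result = apply_rules(path, ["bad rule"], fake_apply)
print(result)                          # the old, working rule survives
```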
Demo: during the presentation, Ivan demonstrates BEFW's demo mode; the demo source code is publicly available.
Pitfalls
I'll tell you about the bugs that we encountered.
ipset add set 0.0.0.0/0. What happens if you add 0.0.0.0/0 to an ipset? Will all IPs be added? Will Internet access open up?
No: we get a bug that cost us two hours of downtime. Moreover, the bug has been known since 2016; it sits in Red Hat Bugzilla under number #1297092, and we found it by accident, from a developer's report.
BEFW now has a hard rule: 0.0.0.0/0 is turned into two addresses, 0.0.0.0/1 and 128.0.0.0/1.
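The workaround described above fits in a few lines with the standard ipaddress module: never feed a /0 to ipset, split it into its two /1 halves first.

```python
import ipaddress

def safe_nets(cidr):
    """Return ipset-safe networks: a /0 is split into two /1 halves
    (the workaround for the ipset bug described above); anything else
    passes through unchanged."""
    net = ipaddress.ip_network(cidr)
    if net.prefixlen == 0:                         # the problematic any-net
        return [str(n) for n in net.subnets(prefixlen_diff=1)]
    return [str(net)]

print(safe_nets("0.0.0.0/0"))   # two halves instead of the buggy /0
print(safe_nets("10.0.0.0/8"))  # normal networks are untouched
```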
ipset restore set < file. What does ipset do when you tell it to restore? Do you think it works like iptables and restores the data?
Nothing of the kind: it does a merge, the old addresses do not go anywhere, and you do not close access.
We found this bug while testing isolation. Now there is a rather complicated sequence: instead of a plain restore we do create temp, then restore flush temp and restore temp, and at the end a swap, for atomicity, because if you flush first and a packet arrives at that moment, it will be dropped and something will break. So there is a bit of black magic in there.
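The atomic-replace sequence just described can be generated mechanically: build the new contents in a temporary set, then swap it in, so no packet ever sees a half-empty set. The sketch emits lines in ipset restore syntax; the set options are illustrative, and feeding the result to `ipset restore -exist` is an assumption about the surrounding tooling.

```python
def atomic_replace(setname, addresses):
    """Emit an ipset-restore script that atomically replaces the
    contents of `setname` via a temporary set and a swap."""
    tmp = f"{setname}_tmp"
    lines = [f"create {tmp} hash:net",   # build alongside the live set
             f"flush {tmp}"]             # start from a clean temp set
    lines += [f"add {tmp} {a}" for a in addresses]
    lines += [f"swap {tmp} {setname}",   # atomic switch-over
              f"destroy {tmp}"]          # drop the now-old contents
    return lines

script = atomic_replace("rules_deny", ["203.0.113.7", "198.51.100.0/24"])
print("\n".join(script))
```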
consul kv get -datacenter=other. As I said, we assume that when we request data we will get either the data or an error. We can do this through the local Consul, but in this case both simply hang.
The local Consul client is a wrapper over the HTTP API. But it just hangs and does not respond to Ctrl+C, or Ctrl+Z, or anything but kill -9 from a neighboring console. We ran into this while building a large cluster. We have no solution yet, but we are preparing to fix this bug in Consul.
The Consul leader is not responding. The master in a data center does not answer us, and we think: "Surely the re-election algorithm will kick in now?"
No, it will not, and monitoring will show nothing: Consul will report that there is a commit index, a leader has been found, everything is fine.
How do we deal with it? service consul restart in cron every hour. If you have 50 servers, that is no big deal. When there are 16 000 of them, you will understand how it works.
Conclusion
As a result, we have received the following benefits:
- 100% coverage of all Linux machines.
- Speed.
- Automation.
- Freed up hardware and freed network engineers from drudgery.
- There are opportunities for integration that are almost limitless: even with Kubernetes, even with Ansible, even with Python.
Cons: Consul, which we now live with, and the very high price of a mistake. For example, once at 6 pm (prime time in Russia) I was editing something in the network lists. We were just building isolation in BEFW at the time. I made a mistake somewhere, apparently specifying the wrong mask, and everything went down in two seconds. Monitoring lit up, the on-duty support engineer came running: "Everything is down!" The head of the department turned gray as he explained to the business why this had happened.
The price of a mistake is so high that we came up with our own elaborate prevention procedure. If you implement this in a large production environment, do not hand out a Consul master token to everyone. It will end badly.
Cost. I wrote the code alone, over 400 hours. For support, my team of 4 people spends 10 hours a month in total. Compared with the price of any next-generation firewall, it's free.
Plans. The long-term plan is to find an alternative transport to replace or complement Consul. Perhaps it will be Kafka or something similar. But in the coming years we will live with Consul.
Future plans: integration with Fail2ban, with monitoring, with nftables, possibly with other distributions; metrics, advanced monitoring, optimization. Kubernetes support is also somewhere in the plans, because we now have several clusters and the desire to cover them.
More plans:
- search for anomalies in traffic;
- network map management;
- Kubernetes support;
- building packages for all systems;
- Web UI.
We are constantly working on expanding the configuration, increasing metrics and optimizing.
Join the project. The project turned out well, but, unfortunately, it is still a one-person project. Come and try to do something: commit, test, suggest things, give your assessment.
In the meantime, we are getting ready for the conference, which will be held on April 6 and 7 in St. Petersburg, and we invite developers of high-load systems to speak.
Source: habr.com
