My unfinished project. Network of 200 MikroTik routers


Hi all. This article is for those who run a large fleet of MikroTik devices and want to standardize them as much as possible, so that they do not have to connect to each device separately. I will describe a project that, unfortunately, never reached production because of human factors. In short: more than 200 routers, quick setup and staff training, standardization by region, filtering of networks and specific hosts, the ability to easily add rules to all devices, logging, and access control.

What is described below does not claim to be a turnkey solution, but I hope it will be useful when you plan your own networks and want to minimize mistakes. Some points and decisions may not seem quite right to you; if so, write in the comments. Criticism in this case adds to our shared pool of experience. So, reader, check the comments: perhaps the author made a serious mistake, and the community will help.

There are 200-300 routers, scattered across different cities with varying quality of Internet connection. Everything has to be done cleanly, and the way it works has to be explained to the local admins in an accessible way.

So where does every project start? With the terms of reference (ToR), of course.

  1. Design a network plan for all branches according to the customer's requirements, with network segmentation (from 3 to 20 networks per branch, depending on the number of devices).
  2. Set up the devices in each branch. Check the provider's real bandwidth under different working conditions.
  3. Protect the devices: whitelist control, automatic attack detection with automatic blacklisting for a set period of time, and minimizing exposure to the various techniques used to hijack management access or cause denial of service.
  4. Set up secure VPN connections with network filtering according to the customer's requirements. At least 3 VPN connections from each branch to the center.
  5. Based on points 1 and 2, choose the best way to build a fault-tolerant VPN. The contractor may choose the dynamic routing technology, given a proper justification.
  6. Prioritize traffic by protocol, port, host, and other specific services that the customer uses (VoIP, hosts with important services).
  7. Set up monitoring and logging of router events so that technical support staff can respond.

As we know, in some cases the ToR is compiled from the requirements. I formulated these requirements myself after listening to the main problems, and I allowed for the possibility that someone else might take over the implementation of these points.

What tools will be used to fulfill these requirements:

  1. ELK stack (after some time it became clear that fluentd would be used instead of logstash).
  2. Ansible. For ease of administration and for delegating access, we will use AWX.
  3. GitLab. No explanation needed here: where would we be without version control for our configs?
  4. PowerShell. There will be a simple script for the initial generation of the config.
  5. DokuWiki, for writing documentation and manuals. In this case, we use habr.com.
  6. Monitoring will be done through Zabbix. A connection diagram will also be drawn there for a general understanding.

EFK setup points

On the first point, I will describe only the ideology on which the indexes are built. There are many excellent articles on setting up and receiving logs from MikroTik devices.

I will dwell on some points:

1. According to the diagram, we have to receive logs from different places and on different ports. For this we use a log aggregator. We also want universal dashboards for all routers, with the ability to delegate access. So we build the indexes as follows:

Here is a piece of the fluentd config for the elasticsearch output:
logstash_format true
index_name mikrotiklogs.north
logstash_prefix mikrotiklogs.north
flush_interval 10s
hosts elasticsearch:9200
port 9200

This way we can group routers and segment them according to the plan: mikrotiklogs.west, mikrotiklogs.south, mikrotiklogs.east. Why make it so complicated? Because we will have 200 or more devices, and nobody can keep an eye on all of them. Since elasticsearch 6.8, security settings are available without buying a license, so we can distribute viewing rights among technical support employees or local system administrators.
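
To make the naming scheme concrete, here is a minimal Python sketch of the index names that result from it. It assumes fluentd's default logstash-style naming, where a daily index is the prefix followed by "-YYYY.MM.DD". The helper names index_for and role_pattern are hypothetical and are not part of the project.

```python
from datetime import date

# Regions taken from the article; the helper names are hypothetical.
REGIONS = ("north", "south", "west", "east")


def index_for(region: str, day: date) -> str:
    """Daily index name, assuming the default <logstash_prefix>-YYYY.MM.DD scheme."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return f"mikrotiklogs.{region}-{day:%Y.%m.%d}"


def role_pattern(region: str) -> str:
    """Index pattern that a regional role or dashboard could be limited to."""
    return f"mikrotiklogs.{region}-*"


print(index_for("north", date(2020, 1, 15)))  # mikrotiklogs.north-2020.01.15
print(role_pattern("west"))                   # mikrotiklogs.west-*
```

A regional support role would then only need read access to its own pattern, while a central role could use mikrotiklogs.* for everything.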
Tables and graphs: here you just need to agree. Either everyone uses the same ones, or each person builds whatever is convenient for them.

2. On logging. If we enable log in the firewall rules, we make the names without spaces. With a simple fluentd config we can then filter the data and build convenient panels. The picture below is from my home router.

[Image: log dashboard panels for the author's home router]

3. On disk space and logs. On average, at 1000 messages per hour, the logs take up 2-3 MB per day, which, you will agree, is not much. The elasticsearch version is 7.5.
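
As a rough back-of-the-envelope check, the figures above can be scaled to the whole fleet. The sketch below assumes that every router logs at about the same rate as the one measured and that logs are kept for 30 days. Both assumptions are mine, not the project's, so treat the result as an order of magnitude only.

```python
MESSAGES_PER_HOUR = 1000   # figure from the article
MB_PER_DAY = (2, 3)        # observed range from the article
ROUTERS = 200              # planned fleet size
RETENTION_DAYS = 30        # assumption, not from the article


def fleet_storage_gb(mb_per_router_day: float, routers: int, days: int) -> float:
    """Total storage in GB if every router produces the same daily volume."""
    return mb_per_router_day * routers * days / 1024


messages_per_day = MESSAGES_PER_HOUR * 24
for mb in MB_PER_DAY:
    bytes_per_message = mb * 1024 * 1024 / messages_per_day
    total_gb = fleet_storage_gb(mb, ROUTERS, RETENTION_DAYS)
    print(f"{mb} MB/day: ~{bytes_per_message:.0f} B per message, "
          f"~{total_gb:.1f} GB for {ROUTERS} routers over {RETENTION_DAYS} days")
```

Under these assumptions the whole fleet stays within a couple of dozen gigabytes per month, which supports the point that log volume is not the bottleneck.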

ANSIBLE.AWX

Fortunately for us, there is a ready-made Ansible module for RouterOS.
I mentioned AWX, but the commands below are for plain ansible only. I think those who have worked with ansible will have no trouble using AWX through its GUI.

To be honest, before this I looked at other guides that used ssh, and everyone ran into different problems with response time and a bunch of other issues. I repeat: this never went into production. Treat this information as an experiment that did not go beyond a test bench of 20 routers.

We need to authenticate with either an SSH key (a "certificate") or an account password. It is up to you to decide; I am for keys. A subtle point about rights: I grant write rights, so that at least "reset config" will not work.

There should be no problems with generating the key, copying it, and importing it.

Brief listing of commands
On your PC:
ssh-keygen -t rsa (answer the questions, save the key)
On the MikroTik, first create the account and grant it rights, then copy the public key to the router and import it:
/user ssh-keys import public-key-file=id_mtx.pub user=ansible
Check the connection with the key:
ssh -p 49475 -i /keys/mtx [email protected]

Write the inventory with vi /etc/ansible/hosts:
[testmt]
MT01 ansible_network_os=routeros ansible_ssh_port=49475 ansible_ssh_user=ansible
MT02 ansible_network_os=routeros ansible_ssh_port=49475 ansible_ssh_user=ansible
MT03 ansible_network_os=routeros ansible_ssh_port=49475 ansible_ssh_user=ansible
MT04 ansible_network_os=routeros ansible_ssh_port=49475 ansible_ssh_user=ansible
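
With 200 or more routers, an inventory like this is easier to generate than to type. Here is a minimal Python sketch that assumes the MTnn naming shown above and puts the hosts into a group called testmt, the group name the playbook in this article targets. The function inventory_lines is a hypothetical helper.

```python
def inventory_lines(count: int, group: str = "testmt",
                    port: int = 49475, user: str = "ansible") -> list[str]:
    """Build INI-style inventory lines for routers MT01..MT<count>."""
    lines = [f"[{group}]"]
    for n in range(1, count + 1):
        lines.append(
            f"MT{n:02d} ansible_network_os=routeros "
            f"ansible_ssh_port={port} ansible_ssh_user={user}"
        )
    return lines


# Print an inventory for the first four routers.
print("\n".join(inventory_lines(4)))
```

The output can be redirected into /etc/ansible/hosts or kept in GitLab next to the playbooks.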

And here is an example of a playbook:
- name: add_work_sites
  hosts: testmt
  serial: 1
  connection: network_cli
  remote_user: mikrotik.west
  gather_facts: yes
  tasks:
    - name: add Work_sites
      routeros_command:
        commands:
          - /ip firewall address-list add address=gov.ru list=work_sites comment=Ticket665436_Ochen_nado
          - /ip firewall address-list add address=habr.com list=work_sites comment=for_habr

As you can see from the configuration above, writing your own playbooks is simple. It is enough to know the MikroTik CLI well. Imagine that you need to remove an address list with certain data from all routers. Then:

Find and remove:
/ip firewall address-list remove [find where list="gov.ru"]
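
When such command lists grow, it can help to build them from data instead of typing them by hand. The Python sketch below is only an illustration: add_cmd and remove_cmd are hypothetical helpers, and the variant of the remove command that matches on both list and address is my assumption about what is usually wanted, not something taken from the project.

```python
def add_cmd(address: str, list_name: str, comment: str) -> str:
    """RouterOS command that adds one address-list entry (values without spaces)."""
    if any(" " in part for part in (address, list_name, comment)):
        raise ValueError("values must not contain spaces")
    return (f"/ip firewall address-list add address={address} "
            f"list={list_name} comment={comment}")


def remove_cmd(list_name: str, address: str) -> str:
    """RouterOS command that removes the entries matching both list and address."""
    return (f'/ip firewall address-list remove '
            f'[find where list="{list_name}" address="{address}"]')


print(add_cmd("habr.com", "work_sites", "for_habr"))
print(remove_cmd("work_sites", "gov.ru"))
```

The resulting strings can be pasted into the commands list of a routeros_command task.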

I deliberately did not include the entire firewall listing here, because it will be individual for each project. But one thing I can say for sure: use address lists only.

As for GitLab, everything is clear, and I will not dwell on it. Everything works nicely with individual tasks, templates, and handlers.

PowerShell

There will be 3 files. Why PowerShell? Anyone can choose the tool for generating configs that suits them best. In this case, everyone has Windows on their PC, so why do it in bash when PowerShell is more convenient? To each their own.

The script itself (simple and understandable):
[CmdletBinding()] Param(
[Parameter(Mandatory=$true)] [string]$EXTERNALIPADDRESS,
[Parameter(Mandatory=$true)] [string]$EXTERNALIPROUTE,
[Parameter(Mandatory=$true)] [string]$BWorknets,
[Parameter(Mandatory=$true)] [string]$CWorknets,
[Parameter(Mandatory=$true)] [string]$BVoipNets,
[Parameter(Mandatory=$true)] [string]$CVoipNets,
[Parameter(Mandatory=$true)] [string]$CClientss,
[Parameter(Mandatory=$true)] [string]$BVPNWORKs,
[Parameter(Mandatory=$true)] [string]$CVPNWORKs,
[Parameter(Mandatory=$true)] [string]$BVPNCLIENTSs,
[Parameter(Mandatory=$true)] [string]$cVPNCLIENTSs,
[Parameter(Mandatory=$true)] [string]$NAMEROUTER,
[Parameter(Mandatory=$true)] [string]$ServerCertificates,
[Parameter(Mandatory=$true)] [string]$infile,
[Parameter(Mandatory=$true)] [string]$outfile
)

Get-Content $infile | Foreach-Object {$_.Replace("EXTERNIP", $EXTERNALIPADDRESS)} |
Foreach-Object {$_.Replace("EXTROUTE", $EXTERNALIPROUTE)} |
Foreach-Object {$_.Replace("BWorknet", $BWorknets)} |
Foreach-Object {$_.Replace("CWorknet", $CWorknets)} |
Foreach-Object {$_.Replace("BVoipNet", $BVoipNets)} |
Foreach-Object {$_.Replace("CVoipNet", $CVoipNets)} |
Foreach-Object {$_.Replace("CClients", $CClientss)} |
Foreach-Object {$_.Replace("BVPNWORK", $BVPNWORKs)} |
Foreach-Object {$_.Replace("CVPNWORK", $CVPNWORKs)} |
Foreach-Object {$_.Replace("BVPNCLIENTS", $BVPNCLIENTSs)} |
Foreach-Object {$_.Replace("CVPNCLIENTS", $cVPNCLIENTSs)} |
Foreach-Object {$_.Replace("MYNAMERROUTER", $NAMEROUTER)} |
Foreach-Object {$_.Replace("ServerCertificate", $ServerCertificates)} | Set-Content $outfile
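
For readers who do not use Windows, the same token substitution can be sketched in Python. This is a minimal equivalent and not the author's tool: the render function is hypothetical, and the two template lines are invented for illustration, because the real template is not published in this article.

```python
def render(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder token with its value, in the given order."""
    for token, value in values.items():
        template = template.replace(token, value)
    return template


# Hypothetical template fragment; the real one is not shown in the article.
template = (
    "/system identity set name=MYNAMERROUTER\n"
    "/ip address add address=192.BWorknet.CWorknet.1/24 interface=bridge\n"
)
values = {"MYNAMERROUTER": "MT01", "BWorknet": "168", "CWorknet": "0"}

print(render(template, values), end="")
```

As in the PowerShell version, replacement is plain and case-sensitive, so no token should be a substring of another.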

I apologize, but I cannot publish all the rules; it would not be pretty. You can write the rules yourself, guided by best practices.

For example, here is a list of links that I was guided by:
wiki.mikrotik.com/wiki/Manual:Securing_Your_Router
wiki.mikrotik.com/wiki/Manual:IP/Firewall/Filter
wiki.mikrotik.com/wiki/Manual:OSPF-examples
wiki.mikrotik.com/wiki/Drop_port_scanners
wiki.mikrotik.com/wiki/Manual:Winbox
wiki.mikrotik.com/wiki/Manual:Upgrading_RouterOS
wiki.mikrotik.com/wiki/Manual:IP/Fasttrack - note that when fasttrack is enabled, traffic prioritization and shaping rules will not work; fasttrack is useful on weak devices.

Variable conventions:
The following networks are taken as an example:
192.168.0.0/24 working network
172.22.4.0/24 VOIP network
10.0.0.0/24 network for clients without LAN access
192.168.255.0/24 VPN network for large branches
172.19.255.0/24 VPN network for small branches

A network address consists of four decimal numbers, A.B.C.D, and the substitution follows the same scheme: if the script asks for B at startup, then for the network 192.168.0.0/24 you enter 168, and for C you enter 0.
$EXTERNALIPADDRESS - the address allocated by the provider
$EXTERNALIPROUTE - the default route to network 0.0.0.0/0
$BWorknets - working network, in our example 168
$CWorknets - working network, in our example 0
$BVoipNets - VOIP network, in our example 22
$CVoipNets - VOIP network, in our example 4
$CClientss - network for clients (Internet access only), in our example 0
$BVPNWORKs - VPN network for large branches, in our example 168
$CVPNWORKs - VPN network for large branches, in our example 255
$BVPNCLIENTSs - VPN network for small branches, in our example 19
$CVPNCLIENTSs - VPN network for small branches, in our example 255
$NAMEROUTER - router name
$ServerCertificates - the name of the certificate that you import beforehand
$infile - the path to the file from which the config is read, for example D:\config.txt (preferably a path with Latin characters only, without quotes or spaces)
$outfile - the path where the result is saved, for example D:\MT-test.txt
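
To avoid entering the B and C values by hand, they can be derived from the network address. This is a small Python sketch using the standard ipaddress module. The helper b_and_c is hypothetical and handles IPv4 networks only.

```python
import ipaddress


def b_and_c(cidr: str) -> tuple[int, int]:
    """Return the second and third octets (B and C) of an IPv4 network A.B.C.D/len."""
    network = ipaddress.IPv4Network(cidr)
    a, b, c, d = network.network_address.packed  # four bytes, one per octet
    return b, c


print(b_and_c("192.168.0.0/24"))   # (168, 0)
print(b_and_c("172.22.4.0/24"))    # (22, 4)
```

The two numbers can then be passed as the B... and C... parameters for the corresponding network.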

I deliberately changed the addresses in the examples for obvious reasons.

I skipped the point on detecting attacks and anomalous behavior, since it deserves a separate article. But it is worth pointing out that for this category you can use monitoring data from Zabbix together with processed data pulled from elasticsearch via curl.

What points to focus on:

  1. Network plan. It is better to write it in a readable form; Excel is enough. Unfortunately, I often see networks laid out on the principle "A new branch has appeared, here is a /24 for you." Nobody finds out how many devices are expected at a given location or whether further growth is planned. For example, a small store opens where it is clear from the start that there will be no more than 10 devices: why allocate a /24? Large branches, on the contrary, also get a /24 while having 500 devices. You can simply add another network, but it is better to think everything through at the start.
  2. Filtering rules, if the project calls for separating networks and maximum segmentation. Best practices change over time. Previously, the PC network and the printer network were kept separate; now it is quite normal not to separate them. Use common sense: do not create many subnets where they are not needed, and do not lump all devices into one network.
  3. "Golden" settings on all routers, that is, if you have a plan. It is worth foreseeing everything up front and trying to keep all settings identical, so that only the address lists and IP addresses differ. When problems arise, debugging will take less time.
  4. Organizational aspects are no less important than technical ones. Often, lazy employees carry out these recommendations "manually", without using the ready-made configurations and scripts, which ultimately creates problems out of nothing.
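
For the first point, the sizing question can be answered with a few lines of arithmetic. The Python sketch below picks the smallest IPv4 prefix that fits an expected number of devices. The helper smallest_prefix and the optional growth reserve are my assumptions, and the device count is expected to include the router itself.

```python
import math


def smallest_prefix(devices: int, reserve: float = 0.0) -> int:
    """Smallest IPv4 prefix length whose usable host range fits the devices.

    reserve is the expected growth as a fraction (0.5 means +50%).
    """
    needed = math.ceil(devices * (1 + reserve)) + 2  # + network and broadcast addresses
    host_bits = max(2, (needed - 1).bit_length())    # bits needed for that many addresses
    return 32 - host_bits


print(smallest_prefix(10))        # 28: a /28 holds 14 hosts
print(smallest_prefix(500))       # 23: a /23 holds 510 hosts
print(smallest_prefix(10, 0.5))   # 27: room for 50% growth
```

So the small store from the example fits into a /28, while the branch with 500 devices needs at least a /23.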

On dynamic routing: OSPF with areas was used. But this was a test bench; in production such things are more interesting to set up.

I hope no one is upset that I did not post the router configuration. I think the links will be enough, and the rest depends on the requirements. And of course tests: more tests are needed.

I wish everyone success in carrying out their projects in the new year. May "access granted" be with you!!!

Source: habr.com
