How to take control of your network infrastructure. Chapter two. Cleaning and documentation
This is the second article in the series "How to take control of your network infrastructure." The table of contents for the series, with links to all the articles, can be found here.
Our goal at this stage is to bring order to the documentation and configuration.
As a result of this process, you should have the necessary set of documents and a network configured in accordance with them.
We will not discuss the security audit here; the third part is devoted to it.
The complexity of this task, of course, varies greatly from company to company.
The ideal situation is when
your network was created in accordance with the project and you have a complete set of documents
in accordance with this process, you have documents (including all necessary diagrams) that provide complete information about the current state of affairs
In this case, your task is quite simple. You should study the documents and review all the changes that have been made.
At worst, you will have
a network created without a project, without a plan, without coordination, by engineers who do not have a sufficient level of qualification,
with chaotic, undocumented changes, with a lot of "garbage" and suboptimal solutions
Your situation is most likely somewhere in between but, unfortunately, on this scale from better to worse you will with high probability be closer to the worse end.
In this case, you will also need the ability to read minds, because you will have to work out what the “designers” wanted to do, reconstruct their logic, finish what was left unfinished and remove the “garbage”.
And, of course, you will need to correct their mistakes, change the design (as little as possible at this stage), and change or re-create the diagrams.
This article does not claim to be complete in any way. Here I will describe only the general principles and dwell on some common problems that have to be solved.
Set of documents
Let's start with an example.
The following are some of the documents that are customarily created at Cisco Systems during design.
CR – Customer Requirements (the requirements specification).
It is created together with the customer and defines the requirements for the network.
HLD – High Level Design, based on the network requirements (CR). The document explains and justifies the architectural decisions made (topology, protocols, equipment selection, …). The HLD does not include design details such as the interfaces and IP addresses used, and it does not discuss the specific hardware configuration. Rather, it is intended to explain the key design concepts to the customer's technical management.
LLD – Low Level Design, based on the high-level design (HLD).
It should contain all the details necessary to implement the project, such as information on how to connect and configure the equipment. This is a complete guide to implementing the design, and it should contain enough information for even less experienced personnel to carry out the implementation.
Some items, such as IP addresses, AS numbers and the physical switching (cabling) scheme, can be moved into separate documents, such as the NIP (Network Implementation Plan).
Construction of the network begins after these documents have been created and proceeds in strict accordance with them; the result is then checked by the customer (tests) for compliance with the design.
Of course, different integrators, different clients, different countries may have different requirements for project documentation. But I would like to avoid formalities and consider the issue on the merits. This stage is not about designing, but about putting things in order, and we need a set of documents (diagrams, tables, descriptions ...) sufficient to complete our tasks.
And in my opinion, there is a certain absolute minimum, without which it is impossible to effectively control the network.
These are the following documents:
scheme (log) of physical switching (cabling)
network diagram or diagrams with essential L2/L3 information
Physical Switching Diagram
In some small companies, the work associated with the installation of equipment and physical switching (cabling) is the responsibility of network engineers.
In this case, the problem is partly solved by the following approach.
use the interface description to record what is connected to the port
administratively shut down all unconnected ports on network equipment
This lets you quickly determine what is connected to a port even when there is a problem with the link (when CDP or LLDP does not work on the interface).
You can also easily see which ports are occupied and which are free, which is necessary for planning connections for new network equipment, servers or workstations.
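The convention above can be checked automatically. The following is a minimal sketch, assuming an IOS-style text configuration; the sample configuration, the device names and the function names are invented for illustration and are not taken from any real network. It lists the ports that are neither described nor administratively shut down, which are exactly the ports that violate the rule.

```python
import re

# Hypothetical IOS-style configuration fragment, invented for this example.
SAMPLE_CONFIG = """\
interface GigabitEthernet0/1
 description uplink to core-sw1 Gi1/0/24
 switchport mode trunk
!
interface GigabitEthernet0/2
 switchport mode access
!
interface GigabitEthernet0/3
 shutdown
!
interface GigabitEthernet0/4
 description srv-db01 eth0
 shutdown
!
"""


def parse_interfaces(config_text):
    """Map each interface name to its description and shutdown state."""
    interfaces = {}
    current = None
    for line in config_text.splitlines():
        header = re.match(r"interface\s+(\S+)", line)
        if header:
            current = header.group(1)
            interfaces[current] = {"description": None, "shutdown": False}
        elif not line.startswith(" "):
            current = None  # any other unindented line ends the interface block
        elif current is not None:
            command = line.strip()
            if command.startswith("description "):
                interfaces[current]["description"] = command[len("description "):]
            elif command == "shutdown":
                interfaces[current]["shutdown"] = True
    return interfaces


def unlabeled_active_ports(config_text):
    """Ports that are neither described nor administratively shut down."""
    return [
        name
        for name, state in parse_interfaces(config_text).items()
        if state["description"] is None and not state["shutdown"]
    ]


print(unlabeled_active_ports(SAMPLE_CONFIG))  # ['GigabitEthernet0/2']
```

A real configuration has more syntax variants than this sketch handles, so treat it as a starting point for your own audit script rather than a finished tool.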
But it is clear that if you lose access to the equipment, you also lose access to this information. In addition, this approach cannot record such important information as the type of equipment, its power consumption, its number of ports, the rack it is installed in, and which patch panels exist and where (in which rack / patch panel) they are connected. Therefore, additional documentation (not just descriptions on the equipment) is still very useful.
The ideal option is to use applications designed to work with this kind of information. But you can limit yourself to simple tables (for example, in Excel) or display information that you consider necessary in L1 / L2 diagrams.
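If you go the simple-table route, even a plain CSV file can be checked for consistency. The sketch below assumes a hypothetical column layout (device, port, rack, patch_panel, peer_device, peer_port) and invented device names; your own log will probably look different. It reports links that are recorded from one end only, which is a common sign that the log has drifted away from reality.

```python
import csv
import io

# Hypothetical cabling log; the columns and names are assumptions for this example.
CABLING_LOG = """\
device,port,rack,patch_panel,peer_device,peer_port
sw-acc-01,Gi0/1,R12,PP-12-A:01,sw-core-01,Gi1/0/1
sw-core-01,Gi1/0/1,R01,PP-01-A:01,sw-acc-01,Gi0/1
sw-acc-01,Gi0/2,R12,PP-12-A:02,srv-db01,eth0
"""


def one_sided_links(csv_text):
    """Return (local_end, peer_end) pairs whose peer has no matching row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    ends = {
        (row["device"], row["port"]): (row["peer_device"], row["peer_port"])
        for row in rows
    }
    # A link is consistent when the peer's own row points back at us.
    return [(local, peer) for local, peer in ends.items() if ends.get(peer) != local]


for local, peer in one_sided_links(CABLING_LOG):
    print(f"{local[0]} {local[1]} -> {peer[0]} {peer[1]} is logged from one end only")
```

Whether server-side ports belong in the same table is a local decision; if they do not, the check can simply skip peers that are not network devices.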
Important!
A network engineer can, of course, know the intricacies and standards of structured cabling systems (SCS) quite well: types of racks, types of uninterruptible power supplies, what cold and hot aisles are, how to do proper grounding, ... just as, in principle, he can know elementary particle physics or C++. But we must nevertheless understand that all this is not his area of knowledge.
Therefore, it is good practice to have either dedicated departments or dedicated people for installation, connection and maintenance of equipment, as well as physical switching. Usually these are data center engineers for data centers and the help desk for the office.
If such divisions are provided in your company, then the issues of logging the physical switching are not your task, and you can limit yourself to a description on the interface and administrative shutdown of unused ports.
Network diagrams
There is no universal approach to drawing diagrams.
Most importantly, the diagrams should make it clear how traffic flows and through which logical and physical elements of your network it passes.
By physical elements we mean
active equipment
interfaces/ports of active equipment
By logical elements we mean
logical devices (N7K VDC, Palo Alto VSYS, ...)
VRFs
VLANs
subinterfaces
tunnels
zones
...
Also, if your network is not completely elementary, it will consist of different segments.
For example
data center
Internet
WAN
remote access
office LAN
DMZ
...
It would be wise to have several diagrams that give both the big picture (how traffic travels between all of these segments) and a detailed explanation of each individual segment.
Since modern networks can have many logical levels, it is probably a good (but not mandatory) approach to draw separate diagrams for different levels. For example, with an overlay approach these could be:
overlay
L1/L2 underlay
L3 underlay
Of course, the most important diagram, without which the idea behind your design cannot be understood, is the routing scheme.
Routing scheme
At a minimum, this diagram should show
what routing protocols are used and where
basic information about routing protocol settings (area/AS number/router-id/…)
on which devices redistribution is performed
where route filtering and aggregation take place
default route information
Also, the L2 scheme (OSI) is often useful.
L2 scheme (OSI)
This diagram can show the following information:
which VLANs are in use
which ports are trunk ports
which ports are aggregated into an EtherChannel (port channel) or a virtual port channel
which STP variants are used and on which devices
basic STP settings: root / backup root, STP cost, port priority
An example of a bad approach to building a network.
Let's take the simple example of building an office LAN.
From my experience of teaching telecom to students, I can say that by the middle of the second semester virtually any student has the knowledge (within the course that I taught) needed to set up a simple office LAN.
What's so difficult about connecting switches to each other, configuring VLANs, SVI interfaces (in the case of L3 switches) and setting up static routing?
Everything will work.
But at the same time, such a setup leaves unaddressed the issues related to
security
redundancy
network scaling
performance
throughput
reliability
...
From time to time I hear the claim that an office LAN is something very simple. I usually hear it from engineers (and managers) who work on anything but networks, and they say it so confidently that you should not be surprised if the LAN ends up being built by people with insufficient practice and knowledge, and with roughly the same mistakes that I describe below.
Typical L1 (OSI) Design Errors
If, nevertheless, you are also responsible for the structured cabling system (SCS), then one of the most unpleasant legacies you can inherit is careless, poorly thought-out physical switching (cabling).
I would also include errors related to the resources of the equipment used, for example,
insufficient bandwidth
insufficient TCAM on equipment (or inefficient use of it)
insufficient performance (this often applies to firewalls)
Typical L2 (OSI) Design Errors
Often, when there is no good understanding of how STP works and what potential problems it brings, switches are connected haphazardly, with default settings and without any additional STP tuning.
As a result, we often have the following
large STP network diameter, which can lead to broadcast storms
the STP root will effectively be chosen at random (based on MAC address), and the traffic path will be suboptimal
ports connecting to hosts will not be configured as edge (portfast) ports, which will trigger STP topology changes whenever end stations are turned on or off
the network will not be segmented at the L1 / L2 level, so a problem with any switch (for example, power overload) will lead to recalculation of the STP topology and stop traffic in all VLANs on all switches (including the segment that is critical for service continuity)
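To see why the root ends up "random" with default settings, recall that STP elects as root the bridge with the lowest bridge ID, which is the priority followed by the MAC address. With equal default priorities (32768) the MAC address alone decides. The sketch below is a simplified model of that comparison only; the switch names and MAC addresses are invented, and details such as the per-VLAN extended system ID are deliberately ignored.

```python
def mac_to_int(mac):
    """Convert a MAC address string such as '00:1b:54:aa:10:00' to an integer."""
    return int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)


def elect_root(bridges):
    """Return the name of the bridge with the lowest (priority, MAC) pair."""
    return min(bridges, key=lambda name: (bridges[name][0], mac_to_int(bridges[name][1])))


# Hypothetical switches, all left at the default priority of 32768.
default_settings = {
    "core-sw1": (32768, "00:1b:54:aa:10:00"),
    "core-sw2": (32768, "00:1b:54:aa:20:00"),
    "old-access-sw": (32768, "00:0a:41:00:00:01"),
}

# The same switches after the root and the backup root are set explicitly.
tuned_settings = {
    **default_settings,
    "core-sw1": (4096, "00:1b:54:aa:10:00"),
    "core-sw2": (8192, "00:1b:54:aa:20:00"),
}

print(elect_root(default_settings))  # old-access-sw
print(elect_root(tuned_settings))    # core-sw1
```

In the untuned case the switch with the numerically lowest MAC address wins, and in practice that is often the oldest device in the network, which is rarely where you want the root to be.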
Examples of errors in L3 design (OSI)
A few typical mistakes of novice network engineers:
frequent use (or use only) of static routing
use of routing protocols that are not optimal for a given design
suboptimal logical network segmentation
suboptimal use of address space, which does not allow route aggregation
lack of backup routes
no redundancy for default gateway
asymmetric routing after routes are rebuilt (may be critical with NAT/PAT and stateful firewalls)
problems with MTU
after rerouting, traffic passes through other security zones or even other firewalls, and as a result it is dropped
poor topology scalability
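The point about address space and route aggregation is easy to illustrate with Python's standard ipaddress module. The prefixes below are invented for the example: the first site received contiguous, aligned subnets, while the second received whatever happened to be free at the time.

```python
import ipaddress


def summarize(prefixes):
    """Collapse a list of prefixes into the smallest equivalent set of routes."""
    networks = [ipaddress.ip_network(prefix) for prefix in prefixes]
    return [str(network) for network in ipaddress.collapse_addresses(networks)]


# Hypothetical site with a planned, contiguous allocation.
planned_site = ["10.10.0.0/24", "10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24"]

# Hypothetical site where subnets were handed out ad hoc.
scattered_site = ["10.10.0.0/24", "10.20.5.0/24", "172.16.9.0/24", "192.168.40.0/24"]

print(summarize(planned_site))         # ['10.10.0.0/22']
print(len(summarize(scattered_site)))  # 4
```

The planned site can be announced as a single /22, while the scattered one needs all four routes, and every additional site allocated the same way keeps enlarging the routing tables of every other router.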
Design Quality Evaluation Criteria
When we talk about optimality / non-optimality, we must understand by which criteria we evaluate it. Here, from my point of view, are the most significant (but not all) criteria, each with an interpretation in terms of routing protocols:
scalability
For example, you decide to add another data center. How easily can you do it?
ease of management
How easy and safe is it to make operational changes, such as announcing a new network or filtering routes?
availability
What percentage of the time does your system provide the required level of service?
security
How secure is the transmitted data?
price
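The availability criterion is the one that is easiest to put into numbers. As a small sketch (the function names and the one-year period are assumptions made for this example), availability is the share of the period during which the service was provided, and the same formula turned around gives the downtime budget for a target value.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes, period_days=365):
    """Percentage of the period during which the service was available."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(target_percent, period_days=365):
    """Downtime budget, in minutes, that still meets the availability target."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (100.0 - target_percent) / 100.0


# Downtime budget per year for a few common targets.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% allows about {budget:.1f} minutes of downtime per year")
```

For example, a target of 99.9% over a year leaves about 525.6 minutes, or roughly 8.76 hours, for all outages and maintenance combined.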
Changes
The basic principle at this stage can be expressed by the formula "do no harm."
Therefore, even if you do not fully agree with the design and the chosen implementation (configuration), it is not always advisable to make changes. A reasonable approach is to rank all identified problems by two parameters:
how easily the problem can be fixed
how much risk it carries
First of all, you need to eliminate whatever currently pushes the level of service below the acceptable level, for example, problems that cause packet loss. Then fix what is easiest and safest to fix, in order of decreasing risk (from design or configuration issues that pose a greater risk to those that pose a lesser one).
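One possible way to turn this ranking into a work list is sketched below. The 1 to 5 scales, the effort threshold and the sample issues are all assumptions made for illustration; this is just one reading of the rule, not a prescribed method. Problems that already degrade service go first, then the issues that are easy and safe to fix in order of decreasing risk, and everything else is deferred.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    degrades_service: bool  # currently pushes service below the acceptable level
    risk: int               # 1 (low) to 5 (high), hypothetical scale
    effort: int             # 1 (easy and safe to fix) to 5 (hard or dangerous)


def remediation_plan(issues, max_effort=2):
    """Split issues into an ordered 'fix now' list and a 'deferred' list."""
    urgent = [i for i in issues if i.degrades_service]
    rest = [i for i in issues if not i.degrades_service]
    easy = sorted((i for i in rest if i.effort <= max_effort), key=lambda i: -i.risk)
    deferred = [i for i in rest if i.effort > max_effort]
    return urgent + easy, deferred


# Invented sample findings.
issues = [
    Issue("no portfast on host ports", False, risk=2, effort=1),
    Issue("address plan prevents aggregation", False, risk=3, effort=5),
    Issue("duplex mismatch causing packet loss", True, risk=3, effort=1),
    Issue("STP root left to default election", False, risk=4, effort=1),
]

fix_now, deferred = remediation_plan(issues)
print([issue.name for issue in fix_now])
print([issue.name for issue in deferred])
```

With these sample values the duplex mismatch comes first, followed by the STP root and the portfast issue, while the address plan is deferred as too invasive for this stage.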
Perfectionism at this stage can be harmful. Bring the design to a satisfactory state and bring the network configuration in line with it.