How to take control of your network infrastructure. Chapter two. Cleaning and documentation

This article is the second in the series "How to take control of your network infrastructure." The table of contents for the whole series, with links, can be found here.

Our goal at this stage is to bring order to the documentation and configuration.
As a result of this process, you should have the necessary set of documents and a network configured in accordance with them.

We will not discuss the security audit here; the third part is devoted to it.

The complexity of the task at this stage, of course, varies greatly from company to company.

The ideal situation is when

  • your network was built according to a design and you have a complete set of documents
  • your company has a change control and management process for the network
  • in accordance with this process, you have documents (including all necessary diagrams) that provide complete information about the current state of affairs

In this case, your task is quite simple. You should study the documents and review all the changes that have been made.

At worst, you will have

  • a network created without a design, without a plan, without coordination, by insufficiently qualified engineers
  • with chaotic, undocumented changes, and with a lot of "garbage" and suboptimal solutions

Your situation is clearly somewhere in between, but unfortunately, on this scale from better to worse, you will most likely be closer to the worst end.

In this case, you will also need the ability to read minds, because you will have to work out what the "designers" wanted to do, reconstruct their logic, finish what was left unfinished, and remove the "garbage".
And, of course, you will need to correct their mistakes, change the design (as little as possible at this stage), and update or redraw the diagrams.

This article does not claim to be complete in any way. Here I will describe only the general principles and dwell on some common problems that have to be solved.

Set of documents

Let's start with an example.

The following are some of the documents that are customarily created at Cisco Systems during design.

CR – Customer Requirements (the technical specification).
It is created together with the customer and defines the requirements for the network.

HLD – High Level Design, high-level design based on network requirements (CR). The document explains and justifies the architectural decisions made (topology, protocols, equipment selection,…). The HLD does not include design details such as interfaces and IP addresses used. Also, the specific hardware configuration is not discussed here. Rather, this document is intended to explain key design concepts to the customer's technical management.

LLD – Low Level Design, a low-level design based on the high-level design (HLD).
It should contain all the details necessary to implement the project, such as how to connect and configure the equipment. This is a complete guide to implementing the design, and it should give enough information for the work to be done even by less experienced personnel.

Some items, such as IP addresses, AS numbers, and the physical switching (cabling) scheme, can be moved into separate documents, such as an NIP (Network Implementation Plan).

Construction of the network begins after these documents are created and proceeds in strict accordance with them. The customer then verifies (tests) the result for compliance with the design.

Of course, different integrators, clients, and countries may have different requirements for project documentation. But I would like to avoid formalities and consider the substance of the issue. This stage is not about design but about putting things in order, and we need a set of documents (diagrams, tables, descriptions, ...) sufficient to complete our tasks.

And in my opinion, there is a certain absolute minimum, without which it is impossible to effectively control the network.

These are the following documents:

  • a diagram (or log) of the physical switching (cabling)
  • network diagram or diagrams with essential L2/L3 information

Physical Switching Diagram

In some small companies, the work associated with the installation of equipment and physical switching (cabling) is the responsibility of network engineers.

In this case, the problem is partly solved by the following approach.

  • use a description on an interface to describe what is connected to it
  • administratively shut down all unconnected ports on network equipment

This lets you quickly determine what is connected to a port even when there is a problem with the link (when CDP or LLDP is not working on this interface).
You can also easily see which ports are occupied and which are free, which is necessary for planning connections for new network equipment, servers or workstations.
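The two rules above are easy to check automatically. Below is a minimal sketch, assuming an IOS-style text configuration in which each interface block starts with an `interface` line followed by indented sub-commands. The sample configuration, device names, and function names are hypothetical and serve only as an illustration.

```python
# Hypothetical IOS-style configuration fragment, used only for illustration.
SAMPLE_CONFIG = """\
interface GigabitEthernet0/1
 description uplink to core-sw-01 Gi1/0/24
 switchport mode trunk
!
interface GigabitEthernet0/2
 switchport mode access
!
interface GigabitEthernet0/3
 shutdown
!
"""


def parse_interfaces(config_text):
    """Map each interface name to the list of its indented sub-commands."""
    interfaces = {}
    current = None
    for line in config_text.splitlines():
        if line.startswith("interface "):
            current = line.split(None, 1)[1].strip()
            interfaces[current] = []
        elif line.startswith(" ") and current is not None:
            interfaces[current].append(line.strip())
        else:
            current = None  # any other non-indented line ends the block
    return interfaces


def undocumented_active_ports(config_text):
    """Return ports that have no description and are not shut down."""
    result = []
    for name, commands in parse_interfaces(config_text).items():
        has_description = any(c.startswith("description ") for c in commands)
        is_shutdown = "shutdown" in commands
        if not has_description and not is_shutdown:
            result.append(name)
    return result


print(undocumented_active_ports(SAMPLE_CONFIG))  # ['GigabitEthernet0/2']
```

Every port in the output either needs a description (something is connected to it) or should be shut down (nothing is connected).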

But clearly, if you lose access to the equipment, you also lose access to this information. In addition, this approach cannot record such important details as the type of equipment, its power consumption, its number of ports, the rack it is in, which patch panels exist, and where (in which rack and patch panel) they are connected. Therefore, additional documentation (not just descriptions on the equipment) is still very useful.

The ideal option is to use applications designed to work with this kind of information. But you can limit yourself to simple tables (for example, in Excel) or display information that you consider necessary in L1 / L2 diagrams.
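As an illustration of the "simple tables" option, here is a minimal sketch of a cabling log kept as CSV and queried with a few lines of Python. The column names, device names, and rack labels are assumptions made up for this example, not a recommended standard.

```python
import csv
import io

# Hypothetical cabling log; column names and values are illustrative only.
CABLING_LOG = """\
device,port,rack,patch_panel,panel_port,connected_to
acc-sw-01,Gi0/1,R12,PP-3,17,core-sw-01 Gi1/0/24
acc-sw-01,Gi0/2,R12,PP-3,18,
acc-sw-01,Gi0/3,R12,,,
"""


def load_log(text):
    """Parse the CSV text into a list of dictionaries, one per port."""
    return list(csv.DictReader(io.StringIO(text)))


def free_ports(rows, device):
    """Return ports of a device that have nothing recorded as connected."""
    return [r["port"] for r in rows
            if r["device"] == device and not r["connected_to"]]


def locate(rows, device, port):
    """Return the rack and patch panel position recorded for a port, if any."""
    for r in rows:
        if r["device"] == device and r["port"] == port:
            return r["rack"], r["patch_panel"], r["panel_port"]
    return None


rows = load_log(CABLING_LOG)
print(free_ports(rows, "acc-sw-01"))       # ['Gi0/2', 'Gi0/3']
print(locate(rows, "acc-sw-01", "Gi0/1"))  # ('R12', 'PP-3', '17')
```

Unlike interface descriptions, such a file remains available when the device itself is unreachable, and it can hold rack and patch panel data that the device knows nothing about.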

Important!

A network engineer may, of course, know the intricacies and standards of structured cabling systems (SCS), rack types, types of uninterruptible power supplies, what cold and hot aisles are, how to do proper grounding, ... just as, in principle, he may know elementary particle physics or C++. Nevertheless, we must understand that none of this is his area of expertise.

Therefore, it is good practice to have dedicated departments or dedicated people handle installation, connection, and maintenance of equipment, as well as physical switching. Usually these are data center engineers for data centers and the help desk for the office.

If your company has such departments, then logging the physical switching is not your task, and you can limit yourself to interface descriptions and administratively shutting down unused ports.

Network diagrams

There is no universal approach to drawing diagrams.

Most importantly, the diagrams should make clear how traffic flows and through which logical and physical elements of your network.

By physical elements we mean

  • active equipment
  • interfaces/ports of active equipment

By logical elements we mean

  • logical devices (N7K VDC, Palo Alto VSYS, ...)
  • VRFs
  • VLANs
  • subinterfaces
  • tunnels
  • zones
  • ...

Also, if your network is not completely elementary, it will consist of different segments.
For example

  • data center
  • Internet
  • WAN
  • remote access
  • office LAN
  • DMZ
  • ...

It would be wise to have several diagrams that give both the big picture (how traffic travels between all of these segments) and a detailed explanation of each individual segment.

Since modern networks can have many logical levels, it is probably a good (but not mandatory) approach to draw separate diagrams for separate levels. For example, with an overlay approach, these could be:

  • overlay
  • L1/L2 underlay
  • L3 underlay

Of course, the most important diagram, without which it is impossible to understand the idea of your design, is the routing diagram.

Routing scheme

At a minimum, this diagram should show

  • what routing protocols are used and where
  • basic information about routing protocol settings (area/AS number/router-id/…)
  • on which devices redistribution is performed
  • where filtering and route aggregation takes place
  • default route information

Also, the L2 scheme (OSI) is often useful.

L2 scheme (OSI)

This diagram can show the following information:

  • which VLANs are used
  • which ports are trunk ports
  • which ports are aggregated into an EtherChannel (port channel) or virtual port channel
  • what STP protocols are used and on what devices
  • basic STP settings: root/backup root, STP cost, port priority
  • additional STP settings: BPDU guard/filter, root guard…

Common Design Errors

An example of a bad approach to building a network.

Let's take a simple example of building a simple office LAN.

Having experience in teaching telecom to students, I can say that virtually any student by the middle of the second semester has the necessary knowledge (within the course that I taught) in order to set up a simple office LAN.

What's so difficult about connecting switches to each other, configuring VLANs, SVI interfaces (in the case of L3 switches) and setting up static routing?

Everything will work.

But at the same time, issues related to the following are often ignored:

  • security
  • redundancy
  • network scaling
  • performance
  • throughput
  • reliability
  • ...

From time to time I hear the claim that an office LAN is something very simple. I usually hear it from engineers (and managers) who do anything but networking, and they say it so confidently that you should not be surprised if the LAN ends up being built by people with insufficient practice and knowledge, with roughly the same mistakes that I describe below.

Typical L1 Layer Design Errors (OSI)

  • If you are, after all, also responsible for the SCS, then one of the most unpleasant legacies you can inherit is careless, poorly thought-out cabling.

I would also include errors related to the resources of the equipment used, for example,

  • insufficient bandwidth
  • insufficient TCAM on equipment (or inefficient use of it)
  • insufficient performance (this often applies to firewalls)

Typical L2 Layer Design Errors (OSI)

Often, when there is no good understanding of how STP works and what potential problems it brings, switches are connected haphazardly, with default settings and without additional STP tuning.

As a result, we often have the following

  • large STP network diameter, which can lead to broadcast storms
  • the STP root is chosen effectively at random (based on MAC address), so the traffic path is suboptimal
  • ports connecting to hosts are not configured as edge (portfast), which causes STP recalculation when end stations are turned on or off
  • the network is not segmented at the L1/L2 level, so a problem with any switch (for example, power overload) leads to recalculation of the STP topology and stops traffic in all VLANs on all switches (including the segment that is critical for service continuity)
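The point about the "random" STP root can be illustrated with a simplified model of the election: the bridge with the lowest bridge ID wins, where the bridge ID is the priority followed by the MAC address. With default priorities everywhere, the MAC address alone decides. The sketch below ignores per-VLAN details such as the extended system ID, and all switch names and MAC addresses are hypothetical.

```python
def bridge_id(priority, mac):
    """Return a sortable bridge ID: priority first, then the MAC as an integer."""
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID.

    bridges maps a switch name to a (priority, mac) tuple.
    """
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


# All switches left at the default priority 32768: the lowest MAC wins,
# which here is an access switch rather than the core.
defaults = {
    "core-sw-01": (32768, "00:1e:aa:00:00:10"),
    "core-sw-02": (32768, "00:1e:aa:00:00:20"),
    "old-access-sw": (32768, "00:0a:bb:00:00:01"),
}
print(elect_root(defaults))  # old-access-sw

# Explicitly lowering the priority on the intended root fixes the outcome.
tuned = dict(defaults)
tuned["core-sw-01"] = (4096, "00:1e:aa:00:00:10")
tuned["core-sw-02"] = (8192, "00:1e:aa:00:00:20")
print(elect_root(tuned))  # core-sw-01
```

In the tuned variant, `core-sw-02` has the next-lowest ID and would take over as root if `core-sw-01` failed, which corresponds to the root/backup root settings that the L2 diagram should record.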

Examples of errors in L3 design (OSI)

A few typical mistakes of novice network engineers:

  • frequent use (or use only) of static routing
  • use of routing protocols that are not optimal for a given design
  • suboptimal logical network segmentation
  • suboptimal use of address space, which does not allow route aggregation
  • lack of backup routes
  • no redundancy for default gateway
  • asymmetric routing after route changes (may be critical with NAT/PAT and stateful firewalls)
  • problems with MTU
  • when routes change, traffic passes through other security zones or even other firewalls, and as a result this traffic is dropped
  • poor topology scalability
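The item about address space and route aggregation can be shown with Python's standard `ipaddress` module. This is a minimal sketch with made-up prefixes: contiguous, aligned subnets collapse into one summary route, while scattered subnets cannot be aggregated at all.

```python
import ipaddress


def summarize(prefixes):
    """Collapse a list of prefixes into the smallest equivalent set of networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Hypothetical site A: subnets allocated contiguously from one block.
contiguous = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]

# Hypothetical site B: subnets picked ad hoc over the years.
scattered = ["10.1.0.0/24", "10.7.3.0/24", "172.16.9.0/24", "192.168.40.0/24"]

print(summarize(contiguous))      # ['10.1.0.0/22']
print(len(summarize(scattered)))  # 4
```

Site A can be advertised to the rest of the network as a single route, while site B needs one route per subnet.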

Design Quality Evaluation Criteria

When we talk about optimal or suboptimal design, we must understand the criteria by which we evaluate it. Here, from my point of view, are the most significant (but not all) criteria, with explanations as they apply to routing protocols:

  • scalability
    For example, you decide to add another data center. How easily can you do it?
  • ease of management
    How easily and safely can operational changes be made, such as advertising a new network or filtering routes?
  • availability
    What percentage of the time does your system provide the required level of service?
  • security
    How secure is the transmitted data?
  • price
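The availability criterion above is a simple ratio, so it is easy to turn into numbers. The sketch below assumes a 365-day year and measures downtime in minutes. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, assuming a non-leap year


def availability_percent(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Percentage of the period during which the service was available."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime_minutes(target_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget, in minutes, that still meets the availability target."""
    return period_minutes * (100.0 - target_percent) / 100.0


# Downtime budget per year for a few common targets.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% -> about {budget:.0f} minutes per year")
```

A target of 99.9%, for example, leaves a budget of roughly 526 minutes (under nine hours) of downtime per year, which helps when deciding how much redundancy a design really needs.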

Changes

The basic principle at this stage can be expressed by the motto "do no harm."
Therefore, even if you do not fully agree with the design and the chosen implementation (configuration), it is not always advisable to make changes. A reasonable approach is to rank all identified problems by two parameters:

  • how easily the problem can be fixed
  • how much risk it carries

First of all, you need to eliminate things that currently reduce the level of service provided below the acceptable level, for example, problems that lead to packet losses. Then fix what is easiest and safest to fix in order of decreasing severity of risk (from design or configuration issues that pose a greater risk to a lesser one).
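The ordering described above can be sketched as a sort key. This is one possible reading, not the only one: problems that already degrade service come first, then the rest are ordered by decreasing risk, with easier fixes first among equal risks. The 1 to 5 scales and the sample problems are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    risk: int    # 1 (low) .. 5 (high); hypothetical scale
    effort: int  # 1 (easy and safe to fix) .. 5 (hard or dangerous to fix)
    degrades_service: bool = False  # already pushes service below acceptable


def prioritize(problems):
    """Order problems: service-affecting first, then higher risk, then easier."""
    return sorted(
        problems,
        key=lambda p: (not p.degrades_service, -p.risk, p.effort),
    )


backlog = [
    Problem("inconsistent interface descriptions", risk=1, effort=1),
    Problem("no redundancy for default gateway", risk=4, effort=3),
    Problem("duplex mismatch causing packet loss", risk=3, effort=1,
            degrades_service=True),
    Problem("access ports without root guard", risk=4, effort=1),
]

for problem in prioritize(backlog):
    print(problem.name)
```

With this sample backlog, the duplex mismatch comes first, followed by the two high-risk items (the easier one first) and finally the cosmetic issue.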

Perfectionism at this stage can be harmful. Bring the design to a satisfactory state and bring the network configuration in line with it.

Source: habr.com