How to take control of your network infrastructure. Chapter three. Network security. Part one

This article is the third in the series "How to take control of your network infrastructure." The contents of all articles in the series, with links, can be found here.


It makes no sense to talk about completely eliminating security risks; in principle, we cannot reduce them to zero. It also has to be understood that the more secure we try to make the network, the more expensive our solutions become. You need to find a trade-off between cost, complexity, and security that is reasonable for your network.

Of course, security design is built organically into the overall architecture, and the security solutions used affect the scalability, reliability, manageability, ... of the network infrastructure, which should also be taken into account.

But let me remind you that we are not talking about building a network now. According to our initial conditions, we have already chosen a design, selected equipment, and created the infrastructure, and at this stage, as far as possible, we should "live with" and find solutions within the approach chosen earlier.

Our task now is to identify the risks associated with network-level security and reduce them to a reasonable level.

Network security audit

If your organization has implemented ISO 27k processes, then security audits and network changes should fit seamlessly into the overall processes of this approach. But these standards are still not about specific solutions, not about configuration, not about design... There is no unambiguous advice, no standards dictating in detail what your network should be; that is both the complexity and the beauty of this task.

I would highlight several possible network security audits:

  • hardware configuration audit (hardening)
  • security design audit
  • access audit
  • process audit

Hardware configuration audit (hardening)

In most cases this seems to be the best starting point for auditing and improving the security of your network. IMHO, it is a good demonstration of the Pareto principle (20% of the effort gives 80% of the result, and the remaining 80% of the effort gives only 20% of the result).

The bottom line is that vendors usually publish "best practice" security recommendations for configuring their equipment. This is called "hardening".

You can also often find a questionnaire based on these recommendations (or create one yourself) that will help you determine how well your equipment configuration complies with these "best practices" and, based on the result, make changes to your network. This will let you significantly reduce security risks quite easily and at virtually no cost.
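Such a questionnaire is easy to turn into a script. Below is a minimal sketch of an automated self-check over a saved device configuration; the required and forbidden lines here are illustrative placeholders, not an official checklist, and the real list should come from the vendor's hardening guide.

```python
import re

# Illustrative hardening checks for a Cisco IOS-style config file;
# the actual patterns should be taken from the vendor's hardening guide.
REQUIRED = [
    r"^service password-encryption",
    r"^no ip http server",
    r"^logging host \S+",
]
FORBIDDEN = [
    r"^ip http server",
    r"^snmp-server community public",
]

def audit(config_text: str) -> list[str]:
    """Return a list of findings for one device configuration."""
    lines = config_text.splitlines()
    findings = []
    for pattern in REQUIRED:
        if not any(re.match(pattern, line) for line in lines):
            findings.append(f"missing: {pattern}")
    for pattern in FORBIDDEN:
        for line in lines:
            if re.match(pattern, line):
                findings.append(f"forbidden: {line}")
    return findings

config = """\
ip http server
snmp-server community public RO
logging host 10.0.0.5
"""
for finding in audit(config):
    print(finding)
```

Running such a check against backups of all device configurations immediately gives you a compliance picture for the whole network.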

A few examples for some Cisco operating systems.

  • Cisco IOS Configuration Hardening
  • Cisco IOS-XR Configuration Hardening
  • Cisco NX-OS Configuration Hardening
  • Cisco Baseline Security Check List

Based on these documents, a list of configuration requirements can be generated for each type of equipment. For example, for a Cisco N7K VDC these requirements might look like this.

In this way, reference configuration files can be created for the different types of active equipment in your network infrastructure. Then, manually or using automation, you can "upload" these configuration files to the devices. How to automate this process will be discussed in detail in another series of articles on orchestration and automation.
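As a sketch of that idea: the hardening lines can be kept as per-device-type templates and rendered per device. The device types, commands, and parameter names below are illustrative assumptions; the commented push step uses the Netmiko library and placeholder credentials.

```python
# Illustrative per-device-type hardening templates; the real lines
# should come from the vendor hardening guides referenced above.
HARDENING = {
    "cisco_ios": [
        "service password-encryption",
        "no ip http server",
        "logging host {syslog}",
    ],
    "cisco_nxos": [
        "no feature telnet",
        "logging server {syslog}",
    ],
}

def render(device_type: str, syslog: str) -> str:
    """Build the hardening snippet for one device type."""
    return "\n".join(line.format(syslog=syslog) for line in HARDENING[device_type])

print(render("cisco_ios", "10.0.0.5"))

# Pushing the rendered snippet could then look like this (not executed here):
# from netmiko import ConnectHandler
# conn = ConnectHandler(device_type="cisco_ios", host="sw1",
#                       username="user", password="secret")
# conn.send_config_set(render("cisco_ios", "10.0.0.5").splitlines())
# conn.save_config()
```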

Security design audit

Typically, the following segments are present in an enterprise network in one form or another:

  • DC (Public services DMZ and Intranet data center)
  • Internet access
  • Remote access VPN
  • WAN edge
  • Branch
  • Campus (office)
  • Core

The names are taken from the Cisco SAFE model, but there is no need, of course, to be tied to these names or to this model. Still, I want to talk about the essence and not get bogged down in formalities.

For each of these segments, the security requirements, risks and, accordingly, solutions will be different.

Let's take a look at each of them individually and at the problems you may encounter in terms of security design. Of course, I repeat once again that this article in no way claims to be complete (which would be hard, if not impossible, to achieve in such a deep and multifaceted topic), but it reflects my personal experience.

There is no perfect solution (at least for now). It's always a compromise. But it is important that the decision to apply this or that approach is made consciously, with an understanding of both its pluses and minuses.

Data Center

The most critical segment in terms of security.
And, as usual, there is no one-size-fits-all solution either. It all depends on the network requirements.

Do you need a firewall or not?

It would seem that the answer is obvious, but it is not quite as clear-cut as it might seem. And the reason is not only price.

Example 1. Latency.

If low latency between certain network segments is an essential requirement, which is true, for example, for an exchange, then we will not be able to use firewalls between those segments. It is hard to find research on firewall latency, but only a few switch models can provide latency of less than, or on the order of, 1 microsecond, so I think that if microseconds matter to you, then firewalls are not for you.

Example 2. Performance.

The bandwidth of top L3 switches is usually an order of magnitude higher than the bandwidth of the most powerful firewalls. Therefore, in the case of high-intensity traffic, you will also most likely have to let this traffic bypass the firewalls.

Example 3. Reliability.

Firewalls, especially modern NGFWs (Next-Generation Firewalls), are complex devices, much more complex than L3/L2 switches. They provide a large number of services and configuration options, so it is not surprising that their reliability is much lower. If service continuity is critical for the network, then you may have to choose what gives better availability: security through a firewall, or the simplicity of a network built on switches (or fabrics of various kinds) using conventional ACLs.

In the light of the above examples, you will most likely (as usual) have to find a compromise. Look towards the following solutions:

  • if you decide not to use firewalls inside the data center, then you need to think about how to limit access around the perimeter as much as possible. For example, you can open only necessary ports from the Internet (for client traffic) and administrative access to the data center only from jump hosts. Perform all necessary checks on jump hosts (authentication / authorization, antivirus, logging, ...)
  • you can use logical partitioning of the data center network into segments, similar to the scheme described in PSEFABRIC example p002. Routing should then be configured so that latency-sensitive or high-intensity traffic stays "inside" one segment (in the case of p002, one VRF) and does not go through the firewall. Traffic between different segments will still go through the firewall. You can also use route leaking between VRFs to avoid redirecting traffic through the firewall
  • you can also use the firewall in transparent mode, and only for those VLANs where these factors (latency/performance) are not significant. But you need to carefully study the restrictions associated with using this mode for each vendor.
  • you might want to consider a service chain architecture. This will allow only the necessary traffic to pass through the firewall. In theory it looks nice, but I have never seen this solution in production. We tested a service chain for Cisco ACI / Juniper SRX / F5 LTM about 3 years ago, but at that time the solution seemed "raw" to us

Protection level

Now you need to answer the question of what tools you want to use to filter traffic. Here are some of the features commonly found in an NGFW (for example, here):

  • stateful firewalling (default)
  • application firewalling
  • threat prevention (antivirus, anti-spyware, and vulnerability)
  • URL filtering
  • data filtering (content filtering)
  • file blocking (file types blocking)
  • DoS protection

And here, too, not everything is clear-cut. It would seem that the higher the level of protection, the better. But you also need to consider that

  • the more of the above firewall features you use, the more expensive it will naturally be (licenses, additional modules)
  • the use of some algorithms can significantly reduce firewall throughput and also increase latency, see for example here
  • as with any complex solution, the use of complex protection methods can reduce the reliability of your solution; for example, when using application firewalling, I have encountered the blocking of some quite standard applications (DNS, SMB)

You, as usual, need to find the best solution for your network.

It is impossible to answer unequivocally which protection features may be required. Firstly, because it certainly depends on the data that you are transmitting or storing and trying to protect. Secondly, in reality the choice of protection means is often a matter of faith in, and trust of, the vendor. You don't know the algorithms, you don't know how efficient they are, and you can't fully test them.

Therefore, in critical segments, it may be a good solution to use offers from different companies. For example, you can enable antivirus on the firewall, but also use antivirus protection (from another vendor) locally on the hosts.

Segmentation

We are talking about logical segmentation of the data center network. Partitioning into VLANs and subnets is also a form of logical segmentation, but we will not consider it because it is obvious. What is interesting is segmentation that takes into account entities such as FW security zones, VRFs (and their analogues from various vendors), logical devices (PA VSYS, Cisco N7K VDC, Cisco ACI Tenant, ...), and so on.

An example of such logical segmentation, and of a data center design that is currently in demand, is given in p002 of the PSEFABRIC project.

Having defined the logical parts of your network, you can then describe how traffic moves between different segments, on which devices filtering will be performed and by what means.

If your network does not have a clear logical division, and the rules for applying security policies to different data flows are not formalized, it means that whenever you open some access, you are forced to solve this problem from scratch, and you will quite likely solve it differently every time.

Often segmentation is based only on FW security zones. Then you need to answer the following questions:

  • what security zones do you need
  • what level of protection do you want to apply to each of these zones
  • whether intra-zone traffic will be allowed by default
  • if not, what traffic filtering policies will be applied within each of the zones
  • what traffic filtering policies will be applied for each pair of zones (source/destination)
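The answers to these questions can be formalized as data rather than kept in people's heads. A minimal sketch, in which the zone names, the intra-zone defaults, and the policy names are all illustrative assumptions:

```python
# Zones, intra-zone defaults, and per-pair policies as plain data.
ZONES = ["dmz", "intranet", "mgmt"]
INTRA_ZONE_ALLOWED = {"dmz": False, "intranet": True, "mgmt": False}

# (source, destination) -> policy name; any pair not listed is denied.
PAIR_POLICIES = {
    ("intranet", "dmz"): "allow-web-and-dns",
    ("mgmt", "dmz"): "allow-ssh-only",
    ("mgmt", "intranet"): "allow-ssh-only",
}

def policy_for(src: str, dst: str) -> str:
    """Resolve which filtering policy applies to a flow between two zones."""
    if src == dst:
        return "permit-any" if INTRA_ZONE_ALLOWED[src] else "deny-all"
    return PAIR_POLICIES.get((src, dst), "deny-all")

print(policy_for("intranet", "dmz"))   # an explicitly defined pair
print(policy_for("dmz", "intranet"))   # an undefined pair falls back to deny-all
```

With such a matrix, every new access request is resolved against the same rules instead of being decided differently each time.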

TCAM

Often there is a problem of insufficient TCAM (Ternary Content Addressable Memory), both for routing and for accesses. IMHO, this is one of the most important issues when choosing equipment, so you need to treat this issue with the proper degree of care.

Example 1. Forwarding Table TCAM.

Let's consider the Palo Alto 7k firewall.
We see that the IPv4 forwarding table size* is 32K.
Moreover, this number of routes is shared by all VSYSs.

Let's say that according to your design, you decide to use 4 VSYSs.
Each of these VSYSs is connected via BGP to two PEs of an MPLS cloud that you use as a backbone (BB). Thus, the 4 VSYSs exchange all specific routes with each other and have forwarding tables with approximately the same set of routes (but different next hops). Since each VSYS has 2 BGP sessions (with identical settings), each route received via MPLS has 2 next hops and, accordingly, 2 FIB entries in the forwarding table. If we assume that this is the only firewall in the data center and that it must know about all routes, this means that the total number of routes in our data center cannot be more than 32K / (4 * 2) = 4K.

Now, if we assume that we have 2 data centers (with the same design) and want to use VLANs "stretched" between them (for example, for vMotion), then to solve the routing problem we must use host routes. But this means that for the 2 data centers we will have no more than 4096 possible hosts, and, of course, this may not be enough.
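The arithmetic from this example is worth writing down explicitly, since the same calculation applies to any platform whose forwarding table is shared across logical devices:

```python
# Reproducing the arithmetic from the example above.
fib_size = 32 * 1024   # shared IPv4 forwarding table, entries
vsys = 4               # logical firewalls sharing the table
nh_per_route = 2       # two BGP sessions -> two next hops per route

routes_per_vsys = fib_size // (vsys * nh_per_route)
print(routes_per_vsys)  # 4096 routes, i.e. at most 4096 host routes
```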

Example 2: TCAM ACLs.

If you plan to filter traffic on L3 switches (or other solutions that use L3 switches, such as Cisco ACI), then you should pay attention to TCAM ACLs when choosing equipment.

Suppose you want to control access on the SVI interfaces of a Cisco Catalyst 4500. Then, as you can see from this article, you can use only 4096 TCAM lines to control outgoing (and likewise incoming) traffic on interfaces. With TCAM3 this will give you about 4000 ACEs (ACL lines).

If you are faced with the problem of insufficient TCAM, then first of all, of course, you should consider optimization. In the case of a problem with the forwarding table size, consider aggregating routes. In the case of a problem with TCAM size for accesses, audit the accesses, delete obsolete and overlapping entries, and possibly revise the procedure for opening accesses (this will be discussed in detail in the chapter on access auditing).

High availability

The question is whether to use HA for firewalls, or to install two independent boxes "in parallel" and, if one of them fails, route traffic through the second.

It would seem that the answer is obvious: use HA. The reason the question still arises is that, unfortunately, the theoretical and advertised "99 and a few nines after the decimal point" availability turns out in practice to be far from rosy. HA is logically quite a complicated thing, and on different equipment and with different vendors (there were no exceptions) we have caught problems and bugs that stopped the service.

With HA, you will be able to shut down individual nodes and switch between them without stopping the service, which is important, for example, during upgrades. But at the same time there is a far-from-zero probability that both nodes will fail at the same time, or that the next upgrade will not go as smoothly as the vendor promises (this problem can be avoided if you have the opportunity to test the upgrade on lab equipment).

If you don't use HA, your risk of a double failure is much lower (since you have 2 independent firewalls), but because sessions are not synchronized, you will lose traffic every time you switch between these firewalls. You can, of course, use stateless firewalling, but then the point of using a firewall is largely lost.

Therefore, if the audit reveals lone firewalls and you are thinking about increasing the reliability of your network, then HA is, of course, one of the recommended solutions, but you should also take into account the disadvantages of this approach; perhaps, for your network specifically, another solution would be more appropriate.

Ease of management

In principle, HA is also about manageability. Instead of configuring 2 boxes separately and solving the configuration synchronization problem, you manage them almost as if you had a single device.

But perhaps you have many data centers and many firewalls; then this question rises to a new level. And the question is not only about configuration, but also about

  • backup configurations
  • updates
  • upgrades
  • monitoring
  • logging

And all this can be solved by centralized management systems.

So, for example, if you are using Palo Alto firewalls, then Panorama is such a solution.

To be continued.

Source: habr.com
