How to evaluate and compare encryption devices for Ethernet networks

I wrote this review (or comparison guide, if you prefer) when I was assigned to compare several devices from different vendors — devices that, moreover, belonged to different classes. I had to understand the architecture and characteristics of all of them and build a "coordinate system" for comparison. I will be glad if my review helps someone:

  • Understand the descriptions and specifications of encryption devices
  • Distinguish "paper" characteristics from really important ones in real life
  • Go beyond the usual set of vendors and include in consideration any products suitable for solving the task
  • Ask the right questions in negotiations
  • Write tender requirements (RFP)
  • Understand what characteristics will have to be sacrificed if some device model is selected

What can be assessed

In principle, the approach is applicable to any standalone devices suitable for encrypting network traffic between remote Ethernet segments (cross-site encryption). That is, "boxes" in a separate case (blades/modules for a chassis also count) that connect via one or more Ethernet ports to the local (campus) Ethernet network carrying unencrypted traffic, and through another port (or ports) to the channel or network over which the already encrypted traffic is transmitted to other, remote segments. Such an encryption solution can be deployed in a private or carrier network over different kinds of "transport" ("dark" fiber, frequency-division multiplexing equipment, a switched Ethernet network, or "pseudowires" laid through a network with a different routing architecture, most often MPLS), with or without VPN technology.

[Figure: Network encryption in a distributed Ethernet network]

The devices themselves can be either specialized (designed exclusively for encryption) or multifunctional (hybrid, convergent), that is, also performing other functions (for example, those of a firewall or a router). Different vendors assign their devices to different classes/categories, but that does not matter — what matters is whether they can encrypt inter-site traffic and what characteristics they have.

Just in case, let me remind you that "network encryption", "traffic encryption" and "encryptor" are informal terms, although they are widely used. In Russian regulatory legal acts (including those that introduce GOSTs), you will most likely not find them.

Encryption levels and transmission modes

Before describing the characteristics themselves that will be used for evaluation, we first have to deal with one important thing, namely the "encryption level". I have noticed that it is often mentioned both in official vendor documents (descriptions, manuals, etc.) and in informal discussions (negotiations, trainings). That is, everyone seems to know perfectly well what is meant, yet I have personally witnessed some confusion.

So what exactly is an "encryption level"? Clearly, we are talking about the number of the layer of the OSI/ISO reference network model at which encryption occurs. We read GOST R ISO 7498-2-99 "Information technology. Open systems interconnection. Basic reference model. Part 2. Security architecture". From this document it follows that the level of the confidentiality service (encryption being one of the mechanisms that provide it) is the level of the protocol whose service data unit ("payload", user data) is encrypted. As the standard also says, the service can be provided either at the same level, "by itself", or with the help of a lower level (this, for example, is how it is most often implemented in MACsec).

In practice, there are two modes of transmitting encrypted information over the network (IPsec immediately comes to mind, but these modes also occur in other protocols). In transport (sometimes also called native) mode, only the service data unit is encrypted, while the headers remain "open", unencrypted (sometimes additional fields with the encryption algorithm's service information are added, and other fields are modified and recalculated). In tunnel mode, the whole protocol data unit (that is, the packet itself) is encrypted and encapsulated into a service data unit of the same or a higher level, that is, it is wrapped in new headers.

By itself, the encryption level combined with some transmission mode is neither good nor bad, so it cannot be said, for example, that L3 in transport mode is better than L2 in tunnel mode. It is just that many of the characteristics by which devices are evaluated depend on them — for example, flexibility and compatibility. To work in transport mode in an L1 (bit stream relaying), L2 (frame switching) or L3 (packet routing) network, you need solutions that encrypt at the same or a higher level (otherwise the address information will be encrypted and the data will not reach its destination), while tunnel mode overcomes this limitation (though at the cost of other important characteristics).

[Figure: Transport and tunnel modes of L2 encryption]

And now let's move on to the analysis of the characteristics.

Performance

For network encryption, performance is a complex, multidimensional concept. It happens that a certain model, superior in one performance characteristic, is inferior in another. Therefore, it is always a good idea to consider all aspects of encryption performance and their impact on the performance of the network and the applications that use it. Here you can draw an analogy with a car, for which not only the top speed matters but also the 0–100 km/h acceleration time, fuel consumption, and so on. Vendors and their potential customers pay great attention to performance characteristics, and it is usually by performance that models are ranked within a vendor's product line.

It is clear that performance depends both on the complexity of the networking and cryptographic operations performed on the device (including how well these tasks lend themselves to parallelization and pipelining) and on the performance of the hardware and the quality of the firmware. Therefore, higher-end models use more powerful hardware, which can sometimes be extended with additional processors and memory modules. There are several approaches to implementing cryptographic functions: on a general-purpose central processing unit (CPU), on an application-specific integrated circuit (ASIC), or on a field-programmable gate array (FPGA). Each approach has its pros and cons. For example, the CPU can become an encryption bottleneck, especially if the processor lacks specialized instructions supporting the encryption algorithm (or if they are not used). Specialized chips lack flexibility: it is not always possible to "reflash" them to improve performance, add new features, or eliminate vulnerabilities, and their use pays off only at large production volumes. That is why the "golden mean" has become so popular — the FPGA (in Russian terminology, ПЛИС). It is on FPGAs that so-called crypto accelerators are made — built-in or plug-in specialized hardware modules for cryptographic operations.

Since we are talking about network encryption, it is logical to measure the performance of solutions in the same terms as for other network devices: throughput, frame loss rate, and latency. These values are defined in RFC 1242 (by the way, the often-mentioned delay variation, jitter, is not covered there). How should these quantities be measured? I have not found a methodology approved in any standard (official or informal, such as an RFC) specifically for network encryption. It would be logical to use the methodology for network devices laid down in RFC 2544. Many vendors follow it — many, but not all. For example, some feed test traffic in only one direction instead of both, as the standard recommends. Be that as it may.

Measuring the performance of network encryption devices still has its peculiarities. First, it is correct to carry out all measurements on a pair of devices: although the encryption algorithms are symmetric, the delays and packet losses during encryption and decryption will not necessarily be equal. Second, it makes sense to measure the delta — the impact of network encryption on the final network performance — by comparing two configurations: without encryption devices and with them (or, in the case of hybrid devices that combine several functions besides network encryption, with encryption turned off and on). This impact can vary and depends on the connection scheme of the encryption devices, their operating modes and, finally, the nature of the traffic. In particular, many performance parameters depend on packet length, which is why, to compare different solutions, graphs of these parameters versus packet length are often used, or an IMIX is used — a distribution of traffic across packet lengths that approximately reflects real traffic. If we take the same baseline configuration without encryption for comparison, we can compare network encryption solutions implemented differently without going into those differences: L2 with L3, cut-through with store-and-forward, specialized with convergent, GOST with AES, and so on.
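To make the packet-length dependence concrete, here is a minimal sketch of an IMIX-weighted average frame size. The 7:4:1 mix of 64-, 594- and 1518-byte frames is one commonly quoted "simple IMIX" profile, used here purely as an assumption — real traffic profiles differ.

```python
# Weighted average frame size for a hypothetical "simple IMIX" profile:
# 7 parts 64-byte, 4 parts 594-byte, 1 part 1518-byte frames.
imix = [(64, 7), (594, 4), (1518, 1)]

total_parts = sum(parts for _, parts in imix)
avg_frame = sum(size * parts for size, parts in imix) / total_parts
print(f"average frame size: {avg_frame:.1f} bytes")  # ~361.8 bytes
```

With an average frame this short, per-packet effects (overhead, per-packet processing delay) dominate much more than the 1518-byte "best case" figures in datasheets would suggest.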

[Figure: Wiring diagram for performance testing]

The first characteristic people pay attention to is the "speed" of the encryption device, that is, the bandwidth (bit rate) of its network interfaces. It is defined by the network standards the interfaces support; for Ethernet, the usual figures are 1 Gbps and 10 Gbps. But, as we know, in any network the maximum theoretical throughput at each layer is always less than the bandwidth: part of the bandwidth is "eaten up" by inter-frame gaps, service headers, and so on. If a device is able to receive, process (in our case, encrypt or decrypt) and transmit traffic at the full speed of the network interface, that is, at the maximum theoretical throughput for that layer of the network model, it is said to operate at line rate. For this, the device must not lose or discard packets of any size at any repetition rate. If the encryption device does not support line rate, its maximum throughput is usually specified in the same gigabits per second (sometimes with the packet length indicated — the shorter the packets, the lower the throughput usually is). It is very important to understand that the maximum throughput is the maximum lossless throughput (even if the device can "pump" traffic through itself at a higher speed, it will lose some of the packets). Also note that some vendors quote the total throughput across all pairs of ports, so such figures mean little if all the encrypted traffic goes through a single port.
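As a sketch of why line-rate throughput is below the nominal bit rate: every Ethernet frame on the wire also carries an 8-byte preamble/SFD and at least a 12-byte inter-frame gap, so the share of the bit rate available to frames shrinks as frames get shorter.

```python
# Maximum theoretical L2 throughput at line rate: each frame pays a fixed
# 20-byte "tax" on the wire (8 bytes preamble/SFD + 12 bytes inter-frame gap).
PER_FRAME_OVERHEAD = 20  # bytes outside the frame itself

def l2_line_rate(bit_rate_bps: float, frame_size: int) -> float:
    """Share of the nominal bit rate available to Ethernet frames, in bps."""
    return bit_rate_bps * frame_size / (frame_size + PER_FRAME_OVERHEAD)

def max_frame_rate(bit_rate_bps: float, frame_size: int) -> float:
    """Frames per second at line rate."""
    return bit_rate_bps / 8 / (frame_size + PER_FRAME_OVERHEAD)

# 1 Gbps with minimum 64-byte frames: ~762 Mbps of frames, ~1.488 Mpps
print(f"{l2_line_rate(1e9, 64) / 1e6:.1f} Mbps, "
      f"{max_frame_rate(1e9, 64) / 1e6:.3f} Mpps")
```

This is the classic reason a "1 Gbps" device that really sustains line rate must handle roughly 1.5 million frames per second on minimum-size frames.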

Where is operating at line rate (in other words, without packet loss) especially important? In high-bandwidth, high-latency links (such as satellite links), where a large TCP window size must be set to maintain a high transmission rate, and where packet loss drastically reduces network performance.

But not all bandwidth is used to transfer payload data: we have to reckon with the so-called overhead — the portion of the encryption device's bandwidth (as a percentage or in bytes per packet) that is actually wasted (cannot be used to transfer application data). Overhead arises, first, from the growth (padding) of the data field in encrypted network packets (depending on the encryption algorithm and its mode of operation). Second, from the growth of packet headers (tunnel mode, the service insertion of the encryption protocol, the integrity check value, etc., depending on the protocol, the cipher mode and the transmission mode) — these overheads are usually the most significant, and they attract attention first. Third, from packet fragmentation when the maximum transmission unit (MTU) size is exceeded (if the network can split an oversized packet into two, duplicating its headers). Fourth, from additional service (control) traffic appearing in the network between the encryption devices (for key exchange, tunnel setup, and so on). Low overhead is important where channel capacity is limited. This is especially evident in traffic of small packets — for example, voice — where overhead can "eat up" more than half the channel speed!
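A rough illustration of how overhead hits small packets. The 60-byte per-packet figure below is hypothetical; real overhead depends on the protocol, cipher mode and transmission mode.

```python
# Share of channel capacity consumed by a fixed per-packet encryption
# overhead. The 60-byte figure is hypothetical, for illustration only.
def overhead_share(frame_size: int, added_bytes: int) -> float:
    """Fraction of on-wire bytes that is overhead rather than the frame."""
    return added_bytes / (frame_size + added_bytes)

for frame in (64, 512, 1518):
    print(f"{frame:5d}-byte frame: {overhead_share(frame, 60):.1%} overhead")
```

For minimum-size (voice-like) frames the same fixed overhead approaches half the channel, while on full-size frames it is only a few percent — which is exactly why overhead figures should be read together with the expected packet-length distribution.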

[Figure: Throughput]

Finally, there is insertion delay — the difference (in fractions of a second) in network delay (the time data takes to travel from entering the network to leaving it) between transmission without and with network encryption. Generally speaking, the lower the latency of the network, the more critical the delay introduced by the encryption devices becomes. The delay is introduced both by the encryption operation itself (depending on the encryption algorithm, block length and cipher mode, as well as the quality of its software implementation) and by the processing of the network packet in the device. The introduced latency depends both on the packet processing mode (cut-through or store-and-forward) and on platform performance (a hardware implementation on FPGA or ASIC is generally faster than a software implementation on a CPU). L2 encryption almost always has lower latency than L3 or L4 encryption, partly because L3/L4 encryption devices are often converged. For example, for high-speed Ethernet encryptors implemented on FPGAs and encrypting at L2, the delay due to the encryption operation is vanishingly small — sometimes, when encryption is enabled on a pair of devices, the total delay they introduce even decreases! Low delay matters where it is comparable to the total channel delays, including the signal propagation delay, which is roughly 5 µs per kilometer. That is, for city-scale networks (tens of kilometers across), microseconds can decide a lot — for example, for synchronous database replication, high-frequency trading, or the same blockchain.
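A quick sanity check of when insertion delay matters, using the ~5 µs/km propagation figure above. The 10 µs insertion delay and 40 km link length are hypothetical numbers for illustration.

```python
# Compare encryption insertion delay with fiber propagation delay
# (~5 microseconds per kilometre, as noted in the text).
PROP_DELAY_PER_KM = 5e-6  # seconds per km

def propagation_delay(distance_km: float) -> float:
    """One-way propagation delay over fiber, in seconds."""
    return distance_km * PROP_DELAY_PER_KM

insertion = 10e-6                 # hypothetical insertion delay, 10 us
link = propagation_delay(40)      # 40 km metro link -> 200 us one way
print(f"link: {link * 1e6:.0f} us, insertion adds {insertion / link:.0%}")
```

On a long-haul link the same 10 µs would be lost in the noise; on a short metro link it becomes a noticeable share of the total delay, which is exactly the regime where FPGA-based L2 encryptors earn their keep.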

[Figure: Insertion delay]

Scalability

Large distributed networks may include many thousands of nodes and network devices and hundreds of local network segments. It is important that encryption solutions do not impose additional restrictions of their own on the size and topology of the distributed network. This applies primarily to the maximum number of host and network addresses. Such limitations can be encountered, for example, when implementing a multipoint encrypted network topology (with independent secure connections, or tunnels) or selective encryption (for example, by protocol number or VLAN). If network addresses (MAC, IP, VLAN ID) are used as keys in a table with a limited number of rows, these restrictions show up here.
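A simple illustration of why such limits bite in multipoint topologies: a full mesh of independent point-to-point tunnels grows quadratically with the number of sites.

```python
# Number of independent point-to-point tunnels in a full mesh of n sites.
# This quadratic growth is what runs into device limits on the number of
# secure connections or address-table entries.
def full_mesh_tunnels(n_sites: int) -> int:
    return n_sites * (n_sites - 1) // 2

for n in (10, 50, 100):
    print(f"{n:4d} sites -> {full_mesh_tunnels(n)} tunnels")
```

A device that comfortably handles a 10-site mesh (45 tunnels) may be nowhere near sufficient for 100 sites (4950 tunnels), so the tunnel and table limits in the datasheet deserve a careful read.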

In addition, large networks often have several structural tiers, including a core network, each implementing its own addressing scheme and routing policy. To implement this approach, special frame formats (such as Q-in-Q or MAC-in-MAC) and route determination protocols are often used. So as not to hinder the construction of such networks, encryption devices must handle such frames correctly (in this sense, scalability means compatibility — more on that below).

Flexibility

Here we are talking about support for various configurations, connection schemes, topologies, and so on. For example, for switched networks based on Carrier Ethernet technologies, this means support for different types of virtual connections (E-Line, E-LAN, E-Tree), different types of service (by port and by VLAN), and different transport technologies (already listed above). That is, the device must be able to work both in point-to-point and in multipoint modes, establish separate tunnels for different VLANs, and deliver packets in order within a secure channel. The ability to choose different cipher modes (including with or without content authentication) and different packet transmission modes makes it possible to strike a balance between strength and performance depending on current conditions.

It is also important to support both private networks, whose equipment belongs to (or is rented by) one organization, and carrier networks, whose different segments are run by different companies. It is good if the solution allows management both in-house and by a third-party organization (under a managed service model). In carrier networks, another important function is multi-tenancy support (sharing by different customers) in the form of cryptographic isolation of individual customers (subscribers) whose traffic passes through the same set of encryption devices. As a rule, this requires separate sets of keys and certificates for each customer.

If the device is purchased for a specific scenario, all these capabilities may not matter much — you just need to make sure the device supports what you need now. But if the solution is purchased "for growth", to support future scenarios as well, and is chosen as a "corporate standard", flexibility will not be superfluous — especially considering the limitations on interoperability between devices from different vendors (more on that below).

Simplicity and convenience

Serviceability is also a multifactor concept. Roughly speaking, it is the total time spent by specialists of a certain qualification to support the solution at different stages of its life cycle. If installation, configuration and operation were fully automatic, the costs would be zero and the convenience absolute — of course, this does not happen in the real world. A reasonable approximation is the "bump-in-the-wire" model, or transparent connection, in which adding or removing encryption devices requires no manual or automatic changes to the network configuration. This simplifies maintenance of the solution: you can safely turn the encryption function on and off and, if necessary, simply "bypass" the device with a network cable (that is, directly connect the network equipment ports it was plugged into). True, there is one downside — an attacker can do the same. Implementing the "bump-in-the-wire" principle requires taking into account not only the data-plane traffic but also the control- and management-plane traffic — devices must be transparent to it. Such traffic can be encrypted only when there are no recipients of it in the network between the encryption devices, because if it is discarded or encrypted there, the network configuration may change when encryption is turned on or off. An encryption device can also be transparent to physical-layer signaling: in particular, when the signal is lost, it must propagate this loss (that is, turn off its transmitters) through itself in the direction of the signal.

Support for the division of responsibilities between the information security department and IT (in particular, the network department) is also important. The encryption solution must support the organization's access control and auditing model, and the need for interaction between different departments for routine operations should be minimized. In this respect, specialized devices that support only encryption functions and are as transparent as possible to network operations have the advantage. Simply put, information security staff should have no reason to contact the "network people" to change network settings, and those, in turn, should not need to touch the encryption settings when maintaining the network.

Another factor is the capability and convenience of the management tools. They should be visual and logical, and provide import/export of settings, automation, and so on. You should note right away which management options are available (usually a dedicated management application, a web interface, and a command line) and what set of functions each offers (there are limitations). An important function is support for out-of-band management, that is, via a dedicated management network, as well as in-band management, that is, via the common network carrying the payload traffic. The management tools should signal all abnormal situations, including information security incidents. Routine, repetitive operations should be performed automatically — this applies above all to key management: keys should be generated and distributed automatically, and PKI support is a big plus.

Compatibility

By this I mean the device's compatibility with network standards — not only industry standards adopted by authoritative organizations such as IEEE, but also the proprietary protocols of industry leaders such as Cisco. There are two fundamental ways to ensure compatibility: either through transparency, or through explicit protocol support (when the encryption device becomes one of the network nodes for a given protocol and processes that protocol's control traffic). Compatibility with networks depends on the completeness and correctness of the implementation of control protocols. It is important to support different PHY-level options (speeds, transmission media, coding schemes), Ethernet frames of different formats with any MTU, and different L3 service protocols (primarily the TCP/IP family).

Transparency is ensured by the mechanisms of mutation (temporarily changing the contents of open headers in the traffic between encryptors), skipping (when individual packets remain unencrypted), and encryption offset (when packet fields that would normally be encrypted are left unencrypted at the start of the payload).

[Figure: How transparency is ensured]

Therefore, always find out exactly how support for a particular protocol is provided. Support in transparent mode is often more convenient and reliable.

Interoperability

This is also compatibility, but in a different sense — the ability to work with other models of encryption devices, including those from other manufacturers. Much depends on the state of standardization of encryption protocols; at L1, there are simply no generally accepted encryption standards.

For L2 encryption in Ethernet networks there is the 802.1AE (MACsec) standard, but it uses hop-by-hop rather than end-to-end encryption and, in its original version, was unsuitable for distributed networks, so proprietary extensions appeared that overcome this limitation (at the cost, of course, of interoperability with other manufacturers' equipment). True, in 2018 support for distributed networks was added to 802.1AE, but there is still no support for the GOST encryption algorithm suites. Therefore, proprietary, non-standard L2 encryption protocols are, as a rule, more efficient (in particular, with lower bandwidth overhead) and more flexible (with the ability to change encryption algorithms and modes).

At higher levels (L3 and L4) there are recognized standards, primarily IPsec and TLS, but even here things are not so simple. The fact is that each of these standards is a set of protocols, each with different versions and extensions that are mandatory or optional to implement. In addition, some manufacturers prefer to use their own proprietary encryption protocols at L3/L4 as well. Therefore, in most cases you should not count on full interoperability; what matters is that interoperability is at least ensured between different models and different generations from the same manufacturer.

Reliability

To compare different solutions, you can use either the mean time between failures or the availability factor. If these figures are not available (or there is no confidence in them), a qualitative comparison can be made. The advantage will go to devices with convenient management (a lower risk of configuration errors), to specialized encryptors (for the same reason), and to solutions with a minimal time to detect and eliminate failures, including "hot" redundancy of individual components and of entire devices.
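If MTBF and mean time to repair (MTTR) figures are available, the availability factor follows directly from them. The numbers below are hypothetical, for illustration only.

```python
# Availability factor from mean time between failures (MTBF) and
# mean time to repair (MTTR). Figures below are hypothetical.
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: share of time the device is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

a = availability(mtbf_hours=100_000, mttr_hours=4)
print(f"availability: {a:.5%}")
```

Note how strongly the result depends on MTTR: halving repair time does as much for availability as doubling MTBF, which is why fast failure detection and hot-swappable redundancy matter.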

Price

When it comes to cost, as with most IT solutions, it is worth comparing the total cost of ownership. To calculate it, you need not reinvent the wheel: use any suitable methodology (for example, Gartner's) and any calculator (for example, the one your organization already uses for TCO calculations). Clearly, for a network encryption solution the total cost of ownership is the sum of the direct costs of buying or renting the solution itself and the infrastructure for hosting the equipment, plus the costs of deployment, administration and maintenance (whether in-house or as third-party services), plus the indirect costs of solution downtime (caused by lost end-user productivity). There is perhaps just one subtlety: the performance impact of the solution can be accounted for in different ways — either as indirect costs caused by lost productivity, or as "virtual" direct costs of purchasing/upgrading and maintaining network facilities that compensate for the loss of network performance due to encryption. In any case, expenses that are hard to calculate with sufficient accuracy are best left outside the brackets of the calculation: that way there will be more confidence in the final figure. And, as usual, it makes sense to compare different devices by TCO for a specific usage scenario — real or typical.
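A minimal sketch of the TCO sum described above. The cost categories and figures are purely illustrative and do not represent any formal methodology.

```python
# TCO as the sum of direct and indirect costs over the ownership period.
# Category names and amounts are illustrative placeholders.
direct = {
    "purchase": 50_000,
    "deployment": 8_000,
    "administration_per_year": 6_000,
    "maintenance_per_year": 5_000,
}
indirect = {"downtime_per_year": 3_000}

years = 5
tco = (direct["purchase"] + direct["deployment"]
       + years * (direct["administration_per_year"]
                  + direct["maintenance_per_year"]
                  + indirect["downtime_per_year"]))
print(f"{years}-year TCO: {tco}")  # 128000
```

The structure is the point here: one-off direct costs plus recurring direct and indirect costs over the ownership period, with hard-to-estimate items deliberately left out of the sum.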

Resistance

And the last characteristic is the security strength of the solution. In most cases it can only be assessed qualitatively, by comparing different solutions. We must remember that encryption devices are not only a means of protection but also an object of it: they are exposed to various threats. In the foreground are the threats of confidentiality violation, message replay and message modification. These threats can be realized through vulnerabilities of the cipher or its individual modes, or through vulnerabilities of the encryption protocols (including at the stages of connection establishment and key generation/distribution). The advantage will go to solutions that allow changing the encryption algorithm or switching the cipher mode (at least via a firmware update); to solutions that encrypt most completely, hiding from the attacker not only the user data but also the address and other service information; and to solutions that not only encrypt but also protect messages from replay and modification. For all modern encryption algorithms, electronic signatures, key generation schemes and the like that are enshrined in standards, the strength can be assumed to be the same (otherwise you can simply get lost in the wilds of cryptography). Must they necessarily be GOST algorithms? Here everything is simple: if the usage scenario requires FSB certification for cryptographic information protection (and in Russia this is most often the case for network encryption scenarios), then we choose only among certified devices; if not, there is no point in excluding devices without certificates from consideration.

Another threat is that of hacking — unauthorized access to devices (including physical access outside and inside the case). The threat may come through implementation vulnerabilities, in hardware and in code. Therefore, the advantage will go to solutions with a minimal attack surface over the network, with enclosures protected from physical access (with tamper sensors, protection against probing, and automatic zeroing of key information when the case is opened), and to those that allow firmware updates when a vulnerability in the code becomes known. There is another way: if all the compared devices have FSB certificates, the CIPF class for which the certificate was issued can serve as an indicator of resistance to hacking.

Finally, another type of threat is errors during configuration and operation — the human factor in its purest form. This is another advantage of specialized encryptors over convergent solutions, which are often aimed at seasoned "network people" and can cause difficulties for "ordinary", broadly skilled information security specialists.

Summing up

In principle, one could propose an integral indicator for comparing different devices here, something like

$$K_j=\sum_i p_i r_{ij}$$

where p_i is the weight of an indicator and r_ij is the rank of device j by that indicator; any of the characteristics above can be broken down into "atomic" indicators. Such a formula could be useful, for example, when comparing bids under pre-agreed rules. But you can get by with a simple table like this:

| Characteristic             | Device 1 | Device 2 | ... | Device N |
|----------------------------|----------|----------|-----|----------|
| Throughput                 | +        | +        | ... | +++      |
| Overhead                   | +        | ++       | ... | +++      |
| Delay                      | +        | +        | ... | ++       |
| Scalability                | +++      | +        | ... | +++      |
| Flexibility                | +++      | ++       | ... | +        |
| Interoperability           | ++       | +        | ... | +        |
| Compatibility              | ++       | ++       | ... | +++      |
| Simplicity and convenience | +        | +        | ... | ++       |
| Fault tolerance            | +++      | +++      | ... | ++       |
| Price                      | ++       | +++      | ... | +        |
| Resistance                 | ++       | ++       | ... | +++      |
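The weighted-sum indicator from the formula above can be computed directly. The weights and ranks below are purely illustrative placeholders, not recommendations.

```python
# Integral indicator K_j = sum_i p_i * r_ij, where p_i is the weight of
# characteristic i and r_ij is the rank of device j on that characteristic.
# Weights and ranks below are purely illustrative.
weights = {"throughput": 0.4, "delay": 0.3, "price": 0.3}

ranks = {
    "Device 1": {"throughput": 1, "delay": 1, "price": 2},
    "Device 2": {"throughput": 1, "delay": 1, "price": 3},
    "Device N": {"throughput": 3, "delay": 2, "price": 1},
}

scores = {dev: sum(weights[c] * r for c, r in rs.items())
          for dev, rs in ranks.items()}
for dev, k in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{dev}: K = {k:.2f}")
```

As with any weighted sum, the outcome is only as good as the agreed weights — which is why pre-agreeing them (for example, in tender rules) matters more than the arithmetic.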

I will be glad to answer questions and hear constructive criticism.

Source: habr.com
