nftables 0.9.9 packet filter release

nftables 0.9.9 has been released. This packet filter unifies the packet filtering interfaces for IPv4, IPv6, ARP and network bridges, and is intended to replace iptables, ip6tables, arptables and ebtables. The companion library libnftnl 1.2.0, which provides a low-level API for interacting with the nf_tables subsystem, was released at the same time. The kernel changes that nftables 0.9.9 requires are included in Linux kernel 5.13-rc1.

The nftables package contains the packet filter components that run in user space, while the kernel side is provided by the nf_tables subsystem, which has been part of the Linux kernel since release 3.13. The kernel provides only a generic, protocol-independent interface with basic functions for extracting data from packets, performing operations on that data, and controlling flow.

The filtering rules and protocol-specific handlers are compiled into bytecode in user space. This bytecode is then loaded into the kernel over the Netlink interface and executed there in a special virtual machine resembling BPF (Berkeley Packet Filter). This approach significantly reduces the amount of filtering code running in the kernel and moves all rule parsing and protocol-handling logic into user space.
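
To make the division of labour described above more concrete, here is a toy sketch in Python. It is only an illustration of the general idea (a rule compiled into a few generic instructions, then interpreted against raw packet bytes). The instruction names LOAD, CMP_EQ and VERDICT, the register names, and the helper functions are all hypothetical; they are not the real nf_tables expressions or the Netlink encoding.

```python
# Toy interpreter illustrating "compile in user space, execute generically".
# All instruction names here are hypothetical, not real nf_tables expressions.

def run_rule(program, packet):
    """Run one compiled rule against raw packet bytes; return a verdict or None."""
    registers = {}
    for op, *args in program:
        if op == "LOAD":
            # Copy `length` bytes at `offset` from the packet into a register.
            reg, offset, length = args
            registers[reg] = packet[offset:offset + length]
        elif op == "CMP_EQ":
            # Abandon the rule as soon as a comparison fails.
            reg, value = args
            if registers.get(reg) != value:
                return None
        elif op == "VERDICT":
            return args[0]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return None


# "User space" side: the protocol knowledge lives here. Byte 9 of an IPv4
# header is the protocol field, and protocol number 6 is TCP.
RULE_DROP_TCP = [
    ("LOAD", "r1", 9, 1),
    ("CMP_EQ", "r1", bytes([6])),
    ("VERDICT", "drop"),
]


def make_ipv4_header(protocol):
    """Build a minimal 20-byte IPv4 header with only the fields used here."""
    header = bytearray(20)
    header[0] = 0x45          # version 4, header length 5 words
    header[9] = protocol
    return bytes(header)


tcp_verdict = run_rule(RULE_DROP_TCP, make_ipv4_header(6))
udp_verdict = run_rule(RULE_DROP_TCP, make_ipv4_header(17))
print(tcp_verdict, udp_verdict)
```

Note that the interpreter itself knows nothing about IPv4 or TCP: it only loads bytes, compares them and returns a verdict, which is the property the paragraph above attributes to the kernel side.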

Main innovations:

  • Flowtable processing can now be offloaded to the network adapter; this is enabled with the 'offload' flag. Flowtable is a mechanism for optimizing the packet forwarding path: only the first packet of a flow passes through all rule-processing chains, and all other packets in the flow are forwarded directly.

        table ip global {
            flowtable f {
                hook ingress priority filter + 1
                devices = { lan3, lan0, wan }
                flags offload
            }

            chain forward {
                type filter hook forward priority filter; policy accept;
                ip protocol { tcp, udp } flow add @f
            }

            chain post {
                type nat hook postrouting priority filter; policy accept;
                oifname "wan" masquerade
            }
        }

  • Added support for attaching an owner flag to a table to ensure that the table is used exclusively by one process. When the process terminates, the table associated with it is automatically deleted. Information about the process is shown in the rules dump as a comment:

        table ip x { # progname nft
            flags owner

            chain y {
                type filter hook input priority filter; policy accept;
                counter packets 1 bytes 309
            }
        }

  • Added support for the IEEE 802.1ad specification (VLAN stacking, or QinQ), which defines a way to place multiple VLAN tags in a single Ethernet frame. For example, to match the outer Ethernet frame type 8021ad with vlan id=342, you can use the construction:

        ... ether type 8021ad vlan id 342

    To match the outer Ethernet frame type 8021ad with vlan id=1, a nested 802.1q tag with vlan id=2, and an encapsulated IP packet:

        ... ether type 8021ad vlan id 1 vlan type 8021q vlan id 2 vlan type ip counter

  • Added support for managing resources using the unified hierarchy of cgroups v2. The key difference between cgroups v2 and v1 is the use of a single cgroups hierarchy for all types of resources, instead of separate hierarchies for allocating CPU resources, regulating memory consumption, and I/O. For example, to check whether the ancestor of a socket at the first cgroupv2 level matches the "system.slice" mask, you can use the construction:

        ... socket cgroupv2 level 1 "system.slice"

  • Added the ability to match on the chunks of SCTP packets (the kernel functionality required for this will appear in Linux kernel 5.14). For example, to check whether a packet contains a chunk of type 'data', and to match on that chunk's 'type' field:

        ... sctp chunk data exists
        ... sctp chunk data type 0

  • Loading rules with the "-f" flag is now approximately twice as fast. Listing rules has also been sped up.
  • Added a compact form for checking whether flag bits are set. For example, to check that the snat and dnat status bits are not set, you can specify:

        ... ct status ! snat,dnat

    To check that the syn bit is set within the bitmask syn,ack:

        ... tcp flags syn / syn,ack

    To match when the bits selected by the bitmask syn,ack,fin,rst differ from fin,rst:

        ... tcp flags != fin,rst / syn,ack,fin,rst

  • Added support for the "verdict" keyword in typeof definitions of sets and maps:

        add map x m { typeof iifname . ip protocol . th dport : verdict ;}
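
The compact flag-matching forms in the list above can be read as ordinary mask-and-compare operations. The Python sketch below models that reading, assuming that "value / mask" means "compare only the bits selected by the mask with value" and that "! a,b" means "none of these bits is set". The helper names none_set and value_mask_equal are made up for this illustration, and the conntrack bit positions are assumed from the kernel's ip_conntrack_status enum.

```python
# TCP flag bits as laid out in the TCP header.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10

# Conntrack status bits (assumed from the kernel's ip_conntrack_status enum).
SNAT, DNAT = 1 << 4, 1 << 5


def none_set(field, bits):
    """Model of `! a,b`: true when none of the listed bits is set."""
    return (field & bits) == 0


def value_mask_equal(field, value, mask):
    """Model of `value / mask`: compare only the bits selected by mask."""
    return (field & mask) == value


# ct status ! snat,dnat
untranslated = none_set(0b1110, SNAT | DNAT)           # no NAT bits set
translated = none_set(0b1110 | SNAT, SNAT | DNAT)      # snat bit set

# tcp flags syn / syn,ack
syn_only = value_mask_equal(SYN, SYN, SYN | ACK)       # initial SYN
syn_ack = value_mask_equal(SYN | ACK, SYN, SYN | ACK)  # SYN+ACK reply

# tcp flags != fin,rst / syn,ack,fin,rst
mask = SYN | ACK | FIN | RST
data_segment = not value_mask_equal(ACK | PSH, FIN | RST, mask)
fin_and_rst = not value_mask_equal(FIN | RST, FIN | RST, mask)

print(untranslated, translated, syn_only, syn_ack, data_segment, fin_and_rst)
```

Bits outside the mask are ignored: in the last example the psh bit of the data segment plays no part in the comparison, because psh is not listed after the slash.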

Source: opennet.ru
