The nftables 0.9.9 packet filter has been released. It unifies the packet filtering interfaces for IPv4, IPv6, ARP, and network bridges, and is intended as a replacement for iptables, ip6tables, arptables, and ebtables. The accompanying libnftnl 1.2.0 library, which provides a low-level API for interacting with the nf_tables subsystem, has been released simultaneously. The kernel changes required by nftables 0.9.9 are included in the Linux 5.13-rc1 kernel.
The nftables package contains the packet filter components that run in user space, while the kernel-level work is done by the nf_tables subsystem, which has been part of the Linux kernel since release 3.13. The kernel provides only a generic, protocol-independent interface that offers basic functions for extracting data from packets, performing operations on that data, and flow control.
The filtering rules themselves and the protocol-specific handlers are compiled into bytecode in user space. This bytecode is then loaded into the kernel over the Netlink interface and executed there in a special virtual machine reminiscent of BPF (Berkeley Packet Filter). This approach significantly reduces the amount of filtering code running at the kernel level and moves all rule parsing and protocol logic into user space.
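To make the bytecode model more concrete, here is a minimal Python sketch of a register-based rule interpreter. It is a toy under stated assumptions and not the nf_tables instruction set: the opcode names (`payload`, `cmp_eq`, `verdict`), the tuple encoding of a rule, and the helpers `run_rule` and `run_chain` are all hypothetical and exist only for this illustration.

```python
# Toy illustration only: opcode names and the rule encoding are invented
# for this sketch and do not correspond to the real nf_tables expressions.

ACCEPT, DROP, CONTINUE = "accept", "drop", "continue"


def run_rule(program, packet: bytes):
    """Execute one rule; return a verdict, or CONTINUE if the rule does not match."""
    regs = {}
    for op, *args in program:
        if op == "payload":            # load bytes from the packet into a register
            reg, offset, length = args
            regs[reg] = packet[offset:offset + length]
        elif op == "cmp_eq":           # stop evaluating this rule on a mismatch
            reg, value = args
            if regs.get(reg) != value:
                return CONTINUE
        elif op == "verdict":          # terminal instruction
            return args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return CONTINUE


def run_chain(rules, packet: bytes, policy=ACCEPT):
    """Run rules in order; fall back to the chain policy if none issues a verdict."""
    for program in rules:
        verdict = run_rule(program, packet)
        if verdict != CONTINUE:
            return verdict
    return policy


# "User space" has already translated "IPv4 protocol == 6 (TCP) -> drop"
# into generic load/compare/verdict steps. Offset 9 is the protocol field
# of the IPv4 header.
drop_tcp = [("payload", 1, 9, 1), ("cmp_eq", 1, b"\x06"), ("verdict", DROP)]

tcp_packet = bytes(9) + b"\x06" + bytes(10)
udp_packet = bytes(9) + b"\x11" + bytes(10)

print(run_chain([drop_tcp], tcp_packet))
print(run_chain([drop_tcp], udp_packet))
```

The point of the sketch is that the interpreter knows nothing about TCP or IPv4: all protocol knowledge sits in the program that was compiled beforehand, which mirrors the division of labor described above.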
Main innovations:
- Flowtable processing can now be offloaded to the network adapter; this is enabled with the 'offload' flag. A flowtable is a mechanism for optimizing the packet forwarding path: only the first packet of a flow passes through all rule-processing chains, and all other packets in the flow are forwarded directly.

      table ip global {
          flowtable f {
              hook ingress priority filter + 1
              devices = { lan3, lan0, wan }
              flags offload
          }

          chain forward {
              type filter hook forward priority filter; policy accept;
              ip protocol { tcp, udp } flow add @f
          }

          chain post {
              type nat hook postrouting priority filter; policy accept;
              oifname "wan" masquerade
          }
      }
- Added support for attaching an owner flag to a table so that the table is used exclusively by one process. When the process terminates, the table associated with it is deleted automatically. Information about the process is shown in the ruleset dump as a comment:

      table ip x { # progname nft
          flags owner

          chain y {
              type filter hook input priority filter; policy accept;
              counter packets 1 bytes 309
          }
      }
- Added support for the IEEE 802.1ad specification (VLAN stacking, or QinQ), which defines a way to insert multiple VLAN tags into a single Ethernet frame. For example, to match an outer Ethernet frame type of 8021ad with vlan id=342, you can use the construction:

      ... ether type 8021ad vlan id 342

  To match an outer Ethernet frame type of 8021ad with vlan id=1, a nested 802.1q tag with vlan id=2, and a further encapsulated IP packet:

      ... ether type 8021ad vlan id 1 vlan type 8021q vlan id 2 vlan type ip counter
- Added support for matching against the unified cgroups v2 hierarchy. The key difference between cgroups v2 and v1 is the use of a single cgroup hierarchy for all resource types, instead of separate hierarchies for CPU allocation, memory consumption, and I/O. For example, to check whether the ancestor of a socket at the first cgroupv2 level matches "system.slice", you can use the construction:

      ... socket cgroupv2 level 1 "system.slice"
- Added the ability to match on the chunks of SCTP packets (the kernel functionality this requires will appear in Linux 5.14). For example, to check whether a packet contains a chunk of type 'data', and to match on the 'type' field of that chunk:

      ... sctp chunk data exists
      ... sctp chunk data type 0
- Loading rules with the "-f" option is now approximately twice as fast. Listing the ruleset has also been sped up.
- A compact form for checking whether flag bits are set has been added. For example, to check that the snat and dnat status bits are not set, you can specify:

      ... ct status ! snat,dnat

  To check that the syn bit is set within the bitmask syn,ack:

      ... tcp flags syn / syn,ack

  To check that the fin and rst bits are not set within the bitmask syn,ack,fin,rst:

      ... tcp flags != fin,rst / syn,ack,fin,rst
- The "verdict" keyword is now allowed in the typeof definitions of sets and maps:

      add map x m { typeof iifname . ip protocol . th dport : verdict ;}
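The flowtable idea from the list above (full rule traversal for the first packet of a flow, a direct path for the rest) can be sketched in Python as a simple flow cache. This is a conceptual model only: the real flowtable lives in the kernel, or in the adapter when the 'offload' flag is set, and the `Forwarder` class, its fields, and the dictionary packet format are hypothetical names chosen for this illustration.

```python
# Conceptual model only; all names here are hypothetical.

class Forwarder:
    def __init__(self, slow_path):
        self.slow_path = slow_path   # callable: packet dict -> egress device or None
        self.flow_cache = {}         # flow key -> egress device
        self.slow_path_calls = 0

    @staticmethod
    def flow_key(pkt):
        return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])

    def forward(self, pkt):
        key = self.flow_key(pkt)
        if key in self.flow_cache:           # fast path: rule chains are skipped
            return self.flow_cache[key]
        self.slow_path_calls += 1            # first packet: full rule traversal
        out_dev = self.slow_path(pkt)
        if out_dev is not None and pkt["proto"] in ("tcp", "udp"):
            self.flow_cache[key] = out_dev   # loosely analogous to 'flow add @f'
        return out_dev


def evaluate_all_chains(pkt):
    """Stand-in for the complete ruleset: forward everything out of 'wan'."""
    return "wan"


fwd = Forwarder(evaluate_all_chains)
pkt = {"src": "10.0.0.2", "dst": "192.0.2.1", "proto": "tcp",
       "sport": 40000, "dport": 443}
for _ in range(3):
    fwd.forward(pkt)
print(fwd.slow_path_calls)
```

The sketch leaves out everything that makes the real mechanism hard (connection tracking state, both flow directions, entry expiry, NAT rewriting); it only shows why later packets in a flow are cheaper than the first.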
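For the 802.1ad item in the list above, the following Python sketch shows what a stacked-VLAN match has to look at in the frame. It assumes the standard tag layout (a 2-byte type followed by a 2-byte tag control field whose low 12 bits are the VLAN ID); the function name `parse_vlan_stack` and the hand-built test frame are hypothetical.

```python
import struct

ETH_P_8021AD = 0x88A8  # outer (service) tag type
ETH_P_8021Q = 0x8100   # inner (customer) tag type
ETH_P_IP = 0x0800


def parse_vlan_stack(frame: bytes):
    """Return ([(tag_type, vlan_id), ...], final_ethertype) for an Ethernet frame."""
    offset = 12                                   # skip destination and source MAC
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    tags = []
    while ethertype in (ETH_P_8021AD, ETH_P_8021Q):
        tci, next_type = struct.unpack_from("!HH", frame, offset)
        tags.append((ethertype, tci & 0x0FFF))    # low 12 bits hold the VLAN ID
        ethertype = next_type
        offset += 4
    return tags, ethertype


# Outer 802.1ad tag with VLAN ID 1, inner 802.1Q tag with VLAN ID 2, then IPv4.
frame = (bytes(12)
         + struct.pack("!HH", ETH_P_8021AD, 1)
         + struct.pack("!HH", ETH_P_8021Q, 2)
         + struct.pack("!H", ETH_P_IP))

tags, inner_type = parse_vlan_stack(frame)
print(tags, hex(inner_type))
```

The test frame corresponds to the second rule in that item: outer type 8021ad with vlan id 1, nested 8021q with vlan id 2, and IP as the encapsulated protocol.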
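The SCTP item in the list above matches on chunks, which follow the 12-byte SCTP common header as a sequence of type/flags/length records padded to 4 bytes. A minimal Python sketch of that walk is shown below; `iter_chunks` and `has_chunk` are hypothetical helper names, and the sample packet is hand-built with simplified chunk bodies rather than real traffic.

```python
import struct

SCTP_COMMON_HEADER_LEN = 12   # ports, verification tag, checksum
CHUNK_DATA = 0                # DATA chunks have type 0
CHUNK_INIT = 1


def iter_chunks(sctp_packet: bytes):
    """Yield (type, flags, value) for each chunk after the common header."""
    offset = SCTP_COMMON_HEADER_LEN
    while offset + 4 <= len(sctp_packet):
        ctype, flags, length = struct.unpack_from("!BBH", sctp_packet, offset)
        if length < 4:
            break                                 # malformed chunk, stop walking
        yield ctype, flags, sctp_packet[offset + 4:offset + length]
        offset += (length + 3) & ~3               # chunks are padded to 4 bytes


def has_chunk(sctp_packet: bytes, chunk_type: int) -> bool:
    return any(ctype == chunk_type for ctype, _, _ in iter_chunks(sctp_packet))


def make_chunk(ctype: int, flags: int, value: bytes) -> bytes:
    length = 4 + len(value)
    padding = bytes(-length % 4)
    return struct.pack("!BBH", ctype, flags, length) + value + padding


packet = bytes(SCTP_COMMON_HEADER_LEN) + make_chunk(CHUNK_INIT, 0, b"ab") \
    + make_chunk(CHUNK_DATA, 3, bytes(12))

print(has_chunk(packet, CHUNK_DATA))
```

In these terms, `sctp chunk data exists` asks whether the walk finds a chunk of type 0 at all, which is why the kernel needs dedicated support rather than a fixed-offset payload load.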
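The compact flag syntax from the list above can be read as ordinary bit tests. The Python sketch below assumes that `value / mask` means "mask the field, then compare with the value" and that `! a,b` means "none of these bits is set"; this is an interpretation for illustration, not a statement of how nft compiles the rules. The TCP flag values are the standard header bits; the two conntrack status bit positions are given for illustration only, and the helper names are hypothetical.

```python
# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10

# Conntrack status bits used only for illustration in this sketch.
CT_SNAT, CT_DNAT = 1 << 4, 1 << 5


def match_value_mask(field: int, value: int, mask: int) -> bool:
    """Assumed reading of 'field value / mask': (field & mask) == value."""
    return (field & mask) == value


def none_set(field: int, bits: int) -> bool:
    """Assumed reading of 'field ! bits': none of the listed bits is set."""
    return (field & bits) == 0


# tcp flags syn / syn,ack
print(match_value_mask(SYN, SYN, SYN | ACK))          # plain SYN
print(match_value_mask(SYN | ACK, SYN, SYN | ACK))    # SYN+ACK

# ct status ! snat,dnat
print(none_set(CT_SNAT, CT_SNAT | CT_DNAT))
```

Bits outside the mask are ignored under this reading, so a SYN segment that also carries PSH would still match `syn / syn,ack`.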
Source: opennet.ru
