Writing DDoS protection on XDP. The kernel part

eXpress Data Path (XDP) technology allows arbitrary processing of traffic on Linux interfaces before packets enter the kernel network stack. XDP is used for protection against DDoS attacks (CloudFlare), complex filters, and statistics collection (Netflix). XDP programs are executed by the eBPF virtual machine and therefore face restrictions on both their code and the available kernel functions, depending on the filter type.

This article is meant to fill the gaps in the numerous materials on XDP. First, they provide ready-made code that sidesteps XDP's peculiarities from the start: it is either already prepared for verification or too simple to cause problems. When you later try to write your own code from scratch, you have no idea what to do with typical errors. Second, they do not cover ways to test XDP locally without a VM or hardware, even though these have their own pitfalls. The text is intended for programmers familiar with networking and Linux who are interested in XDP and eBPF.

In this part, we will look in detail at how an XDP filter is built and how to test it, then write a simple version of the well-known SYN cookies mechanism at the packet processing level. For now we will not build a whitelist of verified clients, keep counters, or manage the filter: logs are enough.

We will write in C: it is not fashionable, but it is practical. All code is available on GitHub at the link at the end and is divided into commits according to the steps described in the article.

Disclaimer. Over the course of the article, a mini-solution for repelling DDoS attacks will be developed, because this is a realistic task for XDP and my area of work. However, the main goal is to understand the technology; this is not a guide to building ready-made protection. The tutorial code is not optimized and omits some nuances.

A Brief Overview of XDP

I will state only the key points so as not to duplicate the documentation and existing articles.

So, the filter code is loaded into the kernel, and incoming packets are passed to the filter. The filter must decide what to do with each packet: pass it to the kernel (XDP_PASS), drop it (XDP_DROP), or send it back (XDP_TX). The filter can modify the packet, which is especially relevant for XDP_TX. The program can also abort (XDP_ABORTED), which drops the packet, but this is the analogue of assert(0): for debugging.

The eBPF (extended Berkeley Packet Filter) virtual machine is deliberately simple so that the kernel can check that the code does not loop forever and does not corrupt memory it does not own. The restrictions and checks in total:

  • Loops (jumps back) are prohibited.
  • There is a stack for data, but no functions (all C functions must be inlined).
  • Accesses to memory outside the stack and packet buffer are prohibited.
  • The size of the code is limited, but in practice this is not very significant.
  • Only special kernel functions (eBPF helpers) are allowed.

Developing and installing a filter looks like this:

  1. The source code (e.g. kernel.c) is compiled into an object file (kernel.o) for the eBPF virtual machine architecture. As of October 2019, compiling to eBPF is supported by Clang and promised in GCC 10.1.
  2. If this object code refers to kernel structures (for example, tables and counters), it contains zeros in place of their IDs, so such code cannot be executed. Before loading into the kernel, these zeros must be replaced with the IDs of specific objects created through kernel calls (that is, the code must be linked). This can be done with external utilities, or you can write a program that links and loads a specific filter.
  3. The kernel verifies the program being loaded. It checks that there are no loops and that accesses never leave the packet and stack boundaries. If the verifier cannot prove that the code is correct, the program is rejected: you have to know how to please it.
  4. After successful verification, the kernel compiles the eBPF object code into machine code for the system architecture (just-in-time).
  5. The program is attached to the interface and starts processing packets.

Since XDP runs in the kernel, debugging relies on trace logs and, in fact, on the packets that the program filters or generates. However, eBPF guarantees that loaded code is safe for the system, so you can experiment with XDP right on your local Linux machine.

Preparing the Environment

Build

Clang cannot directly emit object code for the eBPF architecture, so the process consists of two steps:

  1. Compile C code to LLVM bytecode (clang -emit-llvm).
  2. Convert bytecode to eBPF object code (llc -march=bpf -filetype=obj).

When writing a filter, a couple of files with helper functions and macros from the kernel tests come in handy. It is important that they match the kernel version (KVER). Download them to helpers/:

export KVER=v5.3.7
export BASE=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/plain/tools/testing/selftests/bpf
wget -P helpers --content-disposition "${BASE}/bpf_helpers.h?h=${KVER}" "${BASE}/bpf_endian.h?h=${KVER}"
unset KVER BASE

Makefile for Arch Linux (kernel 5.3.7):

CLANG ?= clang
LLC ?= llc

KDIR ?= /lib/modules/$(shell uname -r)/build
ARCH ?= $(subst x86_64,x86,$(shell uname -m))

CFLAGS = \
    -Ihelpers \
    \
    -I$(KDIR)/include \
    -I$(KDIR)/include/uapi \
    -I$(KDIR)/include/generated/uapi \
    -I$(KDIR)/arch/$(ARCH)/include \
    -I$(KDIR)/arch/$(ARCH)/include/generated \
    -I$(KDIR)/arch/$(ARCH)/include/uapi \
    -I$(KDIR)/arch/$(ARCH)/include/generated/uapi \
    -D__KERNEL__ \
    \
    -fno-stack-protector -O2 -g

xdp_%.o: xdp_%.c Makefile
	$(CLANG) -c -emit-llvm $(CFLAGS) $< -o - | \
	$(LLC) -march=bpf -filetype=obj -o $@

.PHONY: all clean

all: xdp_filter.o

clean:
	rm -f ./*.o

KDIR contains the path to the kernel headers, and ARCH is the system architecture. Paths and tools may vary slightly between distributions.

Example of differences for Debian 10 (kernel 4.19.67)

# a different command
CLANG ?= clang
LLC ?= llc-7

# a different directory
KDIR ?= /usr/src/linux-headers-$(shell uname -r)
ARCH ?= $(subst x86_64,x86,$(shell uname -m))

# two additional -I directories
CFLAGS = \
    -Ihelpers \
    \
    -I/usr/src/linux-headers-4.19.0-6-common/include \
    -I/usr/src/linux-headers-4.19.0-6-common/arch/$(ARCH)/include \
    # unchanged from here on

CFLAGS adds a directory with the helper headers and several directories with kernel headers. The __KERNEL__ symbol means that the UAPI (userspace API) headers are defined for kernel code, since the filter runs in the kernel.

Stack protection can be disabled (-fno-stack-protector) because the eBPF code verifier checks that accesses stay within the stack bounds anyway. It is worth enabling optimizations right away, because the size of eBPF bytecode is limited.

Let's start with a filter that passes all packets and does nothing:

#include <uapi/linux/bpf.h>

#include <bpf_helpers.h>

SEC("prog")
int xdp_main(struct xdp_md* ctx) {
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

The make command builds xdp_filter.o. Where can we test it now?

Test stand

The stand needs two interfaces: one with the filter attached and one from which packets are sent. They must be full-fledged Linux devices with their own IP addresses so that we can check how regular applications work with our filter.

Devices of the veth (virtual Ethernet) type suit us: they are a pair of virtual network interfaces “connected” directly to each other. You can create them like this (in this section, all ip commands are run as root):

ip link add xdp-remote type veth peer name xdp-local

Here xdp-remote and xdp-local are device names. The filter will be attached to xdp-local (192.0.2.1/24), and incoming traffic will be sent from xdp-remote (192.0.2.2/24). However, there is a problem: both interfaces are on the same machine, and Linux will not send traffic to one of them through the other. This can be solved with tricky iptables rules, but they would have to modify packets, which is inconvenient when debugging. It is better to use network namespaces (netns from here on).

The network namespace contains a set of interfaces, routing tables, and NetFilter rules that are isolated from similar objects in other netns. Each process runs in some namespace, and only the objects of this netns are available to it. By default, the system has a single network namespace for all objects, so you can work on Linux and not know about netns.

Let's create a new namespace xdp-test and move xdp-remote there.

ip netns add xdp-test
ip link set dev xdp-remote netns xdp-test

Then a process running in xdp-test will not "see" xdp-local (it stays in the default netns), and when sending a packet to 192.0.2.1 it will send it through xdp-remote, because that is the only interface in 192.0.2.0/24 available to this process. This also works in the opposite direction.

When an interface is moved between netns, it goes down and loses its address. To configure an interface inside a netns, run ip ... in that namespace with ip netns exec:

ip netns exec xdp-test \
    ip address add 192.0.2.2/24 dev xdp-remote
ip netns exec xdp-test \
    ip link set xdp-remote up

As you can see, this is no different from configuring xdp-local in the default namespace:

    ip address add 192.0.2.1/24 dev xdp-local
    ip link set xdp-local up

If you run tcpdump -tnevi xdp-local, you can see that packets sent from xdp-test are delivered to this interface:

ip netns exec xdp-test   ping 192.0.2.1

It is convenient to run a shell in xdp-test. The repository has a script that automates work with the stand; for example, you can set up the stand with sudo ./stand up and remove it with sudo ./stand down.

Tracing

The filter is attached to the device like this:

ip -force link set dev xdp-local xdp object xdp_filter.o verbose

The -force key is needed to attach a new program if another one is already attached. "No news is good news" does not apply to this command: the output is voluminous in any case. Specifying verbose is optional, but with it the command prints a report from the code verifier with an assembler listing:

Verifier analysis:

0: (b7) r0 = 2
1: (95) exit
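To read this listing: r0 holds the program's return value, and 2 is the numeric value of XDP_PASS. Below is a small userspace sketch that mirrors the order of enum xdp_action from linux/bpf.h. The DEMO_ names and demo_pass_all() are made up for this illustration so that the block compiles without kernel headers.

```c
/* Userspace mirror of enum xdp_action from <linux/bpf.h>.
 * The DEMO_ prefix is hypothetical; it only avoids needing kernel headers. */
enum demo_xdp_action {
    DEMO_XDP_ABORTED = 0, /* program error, packet is dropped */
    DEMO_XDP_DROP,        /* 1: drop the packet */
    DEMO_XDP_PASS,        /* 2: hand the packet to the network stack */
    DEMO_XDP_TX,          /* 3: send it back out of the same interface */
    DEMO_XDP_REDIRECT     /* 4: forward elsewhere */
};

/* What the two-instruction program in the listing computes:
 * "r0 = 2" followed by "exit". */
int demo_pass_all(void) {
    return DEMO_XDP_PASS;
}
```

So a filter that returned XDP_TX instead would show r0 = 3 in the same place.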

Detach the program from the interface:

ip link set dev xdp-local xdp off

In the script, these are the commands sudo ./stand attach and sudo ./stand detach.

With the filter attached, you can make sure that ping continues to work, but is the program actually running? Let's add logs. The bpf_trace_printk() function is similar to printf(), but supports only up to three arguments besides the format string and a limited list of specifiers. The bpf_printk() macro simplifies the call.

   SEC("prog")
   int xdp_main(struct xdp_md* ctx) {
+      bpf_printk("got packet: %p\n", ctx);
       return XDP_PASS;
   }

The output goes to the kernel trace channel, which needs to be enabled:

echo -n 1 | sudo tee /sys/kernel/debug/tracing/options/trace_printk

View message flow:

cat /sys/kernel/debug/tracing/trace_pipe

Both of these commands are run by sudo ./stand log.

Ping should now produce messages like this in the trace:

<...>-110930 [004] ..s1 78803.244967: 0: got packet: 00000000ac510377

If you look closely at the output of the verifier, you can notice strange calculations:

0: (bf) r3 = r1
1: (18) r1 = 0xa7025203a7465
3: (7b) *(u64 *)(r10 -8) = r1
4: (18) r1 = 0x6b63617020746f67
6: (7b) *(u64 *)(r10 -16) = r1
7: (bf) r1 = r10
8: (07) r1 += -16
9: (b7) r2 = 16
10: (85) call bpf_trace_printk#6
<...>

The point is that eBPF programs do not have a data section, so the only way to encode the format string is in the immediate operands of VM instructions:

$ python -c "import binascii; print(bytes(reversed(binascii.unhexlify('0a7025203a74656b63617020746f67'))))"
b'got packet: %p\n'

For this reason, debug output greatly bloats the resulting code.
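The same decoding can be sketched in C, which also shows why the two stores in the listing go to r10-16 and r10-8. This is an illustration only: store_le64() and decode_format() are hypothetical helpers, and the byte order assumes a little-endian eBPF target, which is what the listing above appears to come from.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 64-bit immediate the way a little-endian target lays it out
 * in memory: least significant byte first. */
void store_le64(char* out, uint64_t v) {
    for (size_t i = 0; i < 8; i++) {
        out[i] = (char)((v >> (8 * i)) & 0xff);
    }
}

/* Rebuild the 16-byte format string from the two immediates.
 * `low` is the value stored at r10-16 (start of the string),
 * `high` is the value stored at r10-8 (its tail, including the NUL). */
void decode_format(char out[16], uint64_t low, uint64_t high) {
    store_le64(out, low);
    store_le64(out + 8, high);
}
```

Calling decode_format(buf, 0x6b63617020746f67, 0x000a7025203a7465) fills buf with the 15 characters of the format string plus the terminating zero, which matches r2 = 16 passed as the string size.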

Sending XDP Packets

Let's change the filter: let it send all incoming packets back. This is incorrect from a networking point of view, since the addresses in the headers would need to be changed, but for now what matters is that it works in principle.

       bpf_printk("got packet: %p\n", ctx);
-      return XDP_PASS;
+      return XDP_TX;
   }

Run tcpdump on xdp-remote. It should show identical outgoing and incoming ICMP Echo Requests and stop showing ICMP Echo Replies. But it does not. It turns out that for XDP_TX to work in the program on xdp-local, the paired interface xdp-remote must also have a program attached, even an empty one, and must be up.

How did I find out?

The path of a packet through the kernel can be traced with the perf events mechanism, which, by the way, uses the same virtual machine; that is, eBPF is used to take eBPF apart.

You must make good out of evil, because there is nothing else to make of it.

$ sudo perf trace --call-graph dwarf -e 'xdp:*'
   0.000 ping/123455 xdp:xdp_bulk_tx:ifindex=19 action=TX sent=0 drops=1 err=-6
                                     veth_xdp_flush_bq ([veth])
                                     veth_xdp_flush_bq ([veth])
                                     veth_poll ([veth])
                                     <...>

What is code 6?

$ errno 6
ENXIO 6 No such device or address

The veth_xdp_flush_bq() function gets the error code from veth_xdp_xmit(); search there for ENXIO and you will find a comment.

Let's restore the minimal filter (XDP_PASS) in the file xdp_dummy.c, add it to the Makefile, and attach it to xdp-remote:

ip netns exec xdp-test \
    ip link set dev xdp-remote xdp object xdp_dummy.o

Now tcpdump shows what is expected:

62:57:8e:70:44:64 > 26:0e:25:37:8f:96, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 13762, offset 0, flags [DF], proto ICMP (1), length 84)
    192.0.2.2 > 192.0.2.1: ICMP echo request, id 46966, seq 1, length 64
62:57:8e:70:44:64 > 26:0e:25:37:8f:96, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 13762, offset 0, flags [DF], proto ICMP (1), length 84)
    192.0.2.2 > 192.0.2.1: ICMP echo request, id 46966, seq 1, length 64

If only ARP is shown instead, you need to remove the filters (sudo ./stand detach does this), let ping run, then install the filters and try again. The problem is that the XDP_TX filter affects ARP as well, and if the stack of the xdp-test namespace has managed to "forget" the MAC address of 192.0.2.1, it will not be able to resolve this IP.

Problem statement

Let's move on to the stated task: to write a SYN cookie mechanism on XDP.

SYN flood remains a popular DDoS attack to this day. Its essence is as follows. When a connection is established (TCP handshake), the server receives a SYN, allocates resources for the future connection, responds with a SYNACK packet, and waits for an ACK. The attacker simply sends SYN packets from spoofed addresses, thousands per second from each host of a botnet of many thousands. The server is forced to allocate resources as soon as a packet arrives but releases them only after a long timeout; as a result, memory or limits are exhausted, new connections are not accepted, and the service becomes unavailable.

If the server does not allocate resources on a SYN packet but only responds with a SYNACK packet, how can it tell that an ACK packet arriving later belongs to a SYN packet that was never stored? After all, an attacker can also generate fake ACKs. The essence of the SYN cookie is to encode the connection parameters in seqnum as a hash of the addresses, ports, and a changing salt. If the ACK arrives before the salt changes, the server can compute the hash again and compare it with acknum. The attacker cannot forge acknum, since the salt includes a secret, and cannot brute-force it in time because the channel is limited.
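A minimal sketch of the idea, under loud assumptions: demo_mix() is a made-up multiply-and-shift mixer, not a cryptographic hash and not the algorithm used by this article's code or by the Linux kernel, and all names are hypothetical. It also assumes that a genuine client acknowledges the cookie plus one, as in an ordinary handshake.

```c
#include <stdint.h>

/* Hypothetical mixing step. NOT cryptographically secure;
 * it only stands in for "some hash" in this illustration. */
uint32_t demo_mix(uint32_t h, uint32_t v) {
    h ^= v;
    h *= 0x9e3779b1u;
    h ^= h >> 15;
    return h;
}

/* cookie = hash(addresses, ports, salt); sent as seqnum of the SYNACK. */
uint32_t demo_cookie(uint32_t saddr, uint32_t daddr,
                     uint16_t sport, uint16_t dport, uint32_t salt) {
    uint32_t h = salt;
    h = demo_mix(h, saddr);
    h = demo_mix(h, daddr);
    h = demo_mix(h, ((uint32_t)sport << 16) | dport);
    return h;
}

/* On ACK nothing has been stored: recompute the cookie from the packet
 * itself and compare. Assumption: the client replies with cookie + 1. */
int demo_ack_is_valid(uint32_t acknum, uint32_t saddr, uint32_t daddr,
                      uint16_t sport, uint16_t dport, uint32_t salt) {
    uint32_t expected = (uint32_t)(demo_cookie(saddr, daddr, sport, dport, salt) + 1u);
    return acknum == expected;
}
```

The server keeps only the salt. Once the salt changes, ACKs for cookies issued under the old salt stop validating, which is the "arrive before the salt change" condition from the text.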

SYN cookies have been implemented in the Linux kernel for a long time and can even be automatically enabled if SYNs arrive too quickly and in bulk.

A primer on the TCP handshake

TCP provides the transfer of data as a stream of bytes, for example, HTTP requests are transmitted over TCP. The stream is transmitted piece by piece in packets. All TCP packets have logical flags and 32-bit sequence numbers:

  • The combination of flags defines the role of a particular packet. The SYN flag means that this is the sender's first packet on the connection. The ACK flag means that the sender has received all connection data up to byte acknum. A packet may have several flags and is named after their combination, for example, a SYNACK packet.

  • Sequence number (seqnum) specifies the offset in the data stream for the first byte that is sent in this packet. For example, if in the first packet with X bytes of data this number was N, in the next packet with new data it will be N+X. At the beginning of the connection, each party chooses this number randomly.

  • The acknowledgment number (acknum) is the same kind of offset as seqnum, but it identifies not the byte being transmitted but the first byte from the recipient that the sender has not yet seen.

At the beginning of the connection, the parties must agree on seqnum and acknum. The client sends a SYN packet with its seqnum = X. The server responds with a SYNACK packet, where it writes its own seqnum = Y and sets acknum = X + 1. The client responds to the SYNACK with an ACK packet, where seqnum = X + 1 and acknum = Y + 1. After that, the actual data transfer begins.
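The arithmetic of the three packets can be written down as a tiny model. This is only a sketch: struct demo_segment and the demo_* functions are invented here and carry just the fields discussed above.

```c
#include <stdint.h>

/* Only the fields that matter for the handshake. */
struct demo_segment {
    uint32_t seqnum;
    uint32_t acknum;
    int syn;
    int ack;
};

/* Client opens with SYN, seqnum = X. */
struct demo_segment demo_syn(uint32_t x) {
    struct demo_segment s = { .seqnum = x, .acknum = 0, .syn = 1, .ack = 0 };
    return s;
}

/* Server answers with SYNACK: its own seqnum = Y, acknum = X + 1. */
struct demo_segment demo_synack(struct demo_segment syn, uint32_t y) {
    struct demo_segment s = {
        .seqnum = y, .acknum = syn.seqnum + 1u, .syn = 1, .ack = 1
    };
    return s;
}

/* Client completes with ACK: seqnum = X + 1, acknum = Y + 1. */
struct demo_segment demo_ack(struct demo_segment synack) {
    struct demo_segment s = {
        .seqnum = synack.acknum, .acknum = synack.seqnum + 1u, .syn = 0, .ack = 1
    };
    return s;
}
```

All numbers are 32-bit and wrap around, so X = 0xffffffff is acknowledged with acknum = 0.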

If the peer does not acknowledge receipt of a packet, TCP resends it after a timeout.

Why are SYN cookies not always used?

First, if a SYNACK or ACK is lost, you have to wait for a retransmission, so connection establishment slows down. Second, the SYN packet, and only it, carries a number of options that affect the further operation of the connection. By not remembering incoming SYN packets, the server ignores these options, and the client will not send them again in subsequent packets. TCP can still work in this case, but at least at the initial stage the quality of the connection will be lower.

In terms of packets, the XDP program should do the following:

  • respond to SYN with SYNACK with cookie;
  • answer ACK with RST (break the connection);
  • drop other packets.

Pseudocode of the algorithm along with packet parsing:

If this is not Ethernet,
    pass the packet.
If this is not IPv4,
    pass the packet.
If the address is in the table of verified clients,               (*)
        decrement the counter of remaining checks,
        pass the packet.
If this is not TCP,
    drop the packet.     (**)
If this is SYN,
    reply with SYN-ACK carrying the cookie.
If this is ACK,
    if acknum does not contain the cookie,
        drop the packet.
    Add the address to the table with N remaining checks.    (*)
    Reply with RST.   (**)
Otherwise, drop the packet.

The points marked (*) are where system state has to be managed. At the first stage we can do without them and simply implement a TCP handshake that generates a SYN cookie as the seqnum.

At the points marked (**), while we do not have a table, we will pass the packet.
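The first-stage logic (no table, pass at the (**) points) can be modelled as a pure function over already parsed facts. This is a sketch under assumptions rather than the filter's real code: the struct, the verdict names, the reading of "SYN" as SYN without ACK and "ACK" as ACK without SYN, and the comparison against cookie + 1 are all choices made for this illustration.

```c
#include <stdint.h>

enum demo_verdict { DEMO_PASS, DEMO_DROP, DEMO_REPLY_SYNACK };

/* Facts extracted from a packet; field names are hypothetical. */
struct demo_facts {
    int is_ipv4;     /* EtherType says IPv4 */
    int is_tcp;      /* IP protocol says TCP */
    int syn;         /* TCP SYN flag */
    int ack;         /* TCP ACK flag */
    uint32_t acknum; /* in host byte order */
};

enum demo_verdict demo_decide(const struct demo_facts* p, uint32_t cookie) {
    if (!p->is_ipv4) {
        return DEMO_PASS;
    }
    if (!p->is_tcp) {
        return DEMO_PASS; /* (**): pass instead of drop for now */
    }
    if (p->syn && !p->ack) {
        return DEMO_REPLY_SYNACK;
    }
    if (p->ack && !p->syn) {
        if (p->acknum != (uint32_t)(cookie + 1u)) {
            return DEMO_DROP; /* acknum does not carry the cookie */
        }
        return DEMO_PASS; /* (**): pass instead of replying with RST */
    }
    return DEMO_DROP;
}
```

Keeping the decision separate from parsing makes it possible to check the branches in an ordinary userspace test before fighting the verifier.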

TCP handshake implementation

Package parsing and code verification

We need the network header structures: Ethernet (uapi/linux/if_ether.h), IPv4 (uapi/linux/ip.h), and TCP (uapi/linux/tcp.h). I could not include the last one because of errors related to atomic64_t, so I had to copy the necessary definitions into the code.

All functions that are factored out in C for readability must be inlined at the call site, since the eBPF verifier in the kernel forbids backward jumps, which in effect means loops and function calls.

#define INTERNAL static __attribute__((always_inline))

The LOG() macro disables printing in a release build.

The program is a pipeline of functions. Each receives a packet in which the header of the corresponding level has been located; for example, process_ether() expects ether to be filled in. Based on its analysis of the fields, a function can hand the packet over to the next level up. The result of a function is an XDP action. For now, the SYN and ACK handlers let all packets through.

struct Packet {
    struct xdp_md* ctx;

    struct ethhdr* ether;
    struct iphdr* ip;
    struct tcphdr* tcp;
};

INTERNAL int process_tcp_syn(struct Packet* packet) { return XDP_PASS; }
INTERNAL int process_tcp_ack(struct Packet* packet) { return XDP_PASS; }
INTERNAL int process_tcp(struct Packet* packet) { ... }
INTERNAL int process_ip(struct Packet* packet) { ... }

INTERNAL int
process_ether(struct Packet* packet) {
    struct ethhdr* ether = packet->ether;

    LOG("Ether(proto=0x%x)", bpf_ntohs(ether->h_proto));

    if (ether->h_proto != bpf_htons(ETH_P_IP)) {
        return XDP_PASS;
    }

    // B
    struct iphdr* ip = (struct iphdr*)(ether + 1);
    if ((void*)(ip + 1) > (void*)packet->ctx->data_end) {
        return XDP_DROP; /* malformed packet */
    }

    packet->ip = ip;
    return process_ip(packet);
}

SEC("prog")
int xdp_main(struct xdp_md* ctx) {
    struct Packet packet;
    packet.ctx = ctx;

    // A
    struct ethhdr* ether = (struct ethhdr*)(void*)ctx->data;
    if ((void*)(ether + 1) > (void*)ctx->data_end) {
        return XDP_PASS;
    }

    packet.ether = ether;
    return process_ether(&packet);
}

Note the checks marked A and B. If you comment out A, the program will build, but loading will fail with a verification error:

Verifier analysis:

<...>
11: (7b) *(u64 *)(r10 -48) = r1
12: (71) r3 = *(u8 *)(r7 +13)
invalid access to packet, off=13 size=1, R7(id=0,off=0,r=0)
R7 offset is outside of the packet
processed 11 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Error fetching program/map!

The key line is invalid access to packet, off=13 size=1, R7(id=0,off=0,r=0): there are execution paths on which the byte at offset 13 from the start of the buffer lies outside the packet. It is hard to tell from the listing which source line is meant, but there is an instruction number (12) and a disassembler that shows the source lines:

llvm-objdump -S xdp_filter.o | less

In this case, it points to the line

LOG("Ether(proto=0x%x)", bpf_ntohs(ether->h_proto));

which makes it clear that the problem is with ether. If only it were always this easy.
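The pattern behind checks A and B can be tried outside the kernel. The sketch below is a userspace model, not part of the filter: the demo_* structures are simplified stand-ins for struct ethhdr and struct iphdr (byte arrays only, same sizes), and demo_find_ip() is a hypothetical helper. As far as I can tell, r=0 in the verifier message is the number of packet bytes proven accessible through that register, and it only becomes nonzero after a comparison against data_end like the ones here.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins: 14-byte Ethernet header, 20-byte fixed IPv4 header. */
struct demo_eth { uint8_t dst[6]; uint8_t src[6]; uint8_t proto[2]; };
struct demo_ip4 { uint8_t ver_ihl; uint8_t tos; uint8_t rest[18]; };

enum { DEMO_OK = 0, DEMO_TRUNCATED = -1 };

/* Sets *ip and returns DEMO_OK only if both headers lie entirely inside
 * [data, data_end). The caller's buffer must be large enough for the
 * pointer arithmetic itself; only data_end limits what may be read. */
int demo_find_ip(const uint8_t* data, const uint8_t* data_end,
                 const struct demo_ip4** ip) {
    const struct demo_eth* eth = (const struct demo_eth*)data;
    if ((const uint8_t*)(eth + 1) > data_end) { /* same shape as check A */
        return DEMO_TRUNCATED;
    }

    const struct demo_ip4* hdr = (const struct demo_ip4*)(eth + 1);
    if ((const uint8_t*)(hdr + 1) > data_end) { /* same shape as check B */
        return DEMO_TRUNCATED;
    }

    *ip = hdr;
    return DEMO_OK;
}
```

With a 64-byte buffer, the call succeeds once data_end is at least 34 bytes past data (14 for Ethernet plus 20 for IPv4) and reports truncation for anything shorter.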

Reply to SYN

The goal at this stage is to generate a correct SYNACK packet with a fixed seqnum, which will be replaced by the SYN cookie in the future. All changes take place in process_tcp_syn() and surroundings.

Checking the packet

Oddly enough, the most remarkable line here, or rather the comment on it, is this:

/* Required to verify checksum calculation */
const void* data_end = (const void*)ctx->data_end;

When the first version of the code was written, kernel 5.1 was in use, and its verifier saw a difference between data_end and (const void*)ctx->data_end. At the time of writing, kernel 5.3.1 did not have this problem. Perhaps the compiler accessed a local variable differently than a field. The moral: with deep nesting, simplifying the code can help.

Next come routine length checks for the glory of the verifier; MAX_CSUM_BYTES is discussed below.

const u32 ip_len = ip->ihl * 4;
if ((void*)ip + ip_len > data_end) {
    return XDP_DROP; /* malformed packet */
}
if (ip_len > MAX_CSUM_BYTES) {
    return XDP_ABORTED; /* implementation limitation */
}

const u32 tcp_len = tcp->doff * 4;
if ((void*)tcp + tcp_len > (void*)ctx->data_end) {
    return XDP_DROP; /* malformed packet */
}
if (tcp_len > MAX_CSUM_BYTES) {
    return XDP_ABORTED; /* implementation limitation */
}

Turning the packet around

Fill in seqnum and acknum and set ACK (SYN is already set):

const u32 cookie = 42;
tcp->ack_seq = bpf_htonl(bpf_ntohl(tcp->seq) + 1);
tcp->seq = bpf_htonl(cookie);
tcp->ack = 1;

Swap the TCP ports, IP addresses, and MAC addresses. The standard library is not available to an XDP program, so memcpy() is a macro that hides a Clang intrinsic.

const u16 temp_port = tcp->source;
tcp->source = tcp->dest;
tcp->dest = temp_port;

const u32 temp_ip = ip->saddr;
ip->saddr = ip->daddr;
ip->daddr = temp_ip;

struct ethhdr temp_ether = *ether;
memcpy(ether->h_dest, temp_ether.h_source, ETH_ALEN);
memcpy(ether->h_source, temp_ether.h_dest, ETH_ALEN);

Checksum recalculation

IPv4 and TCP checksums require summing all 16-bit words in the headers, and the header sizes are written in the headers themselves, so they are unknown at compile time. This is a problem because the verifier will not accept an ordinary loop bounded by a variable. But the size of the headers is limited: up to 64 bytes each. So you can write a loop with a fixed number of iterations that can end early.

Note that RFC 1624 describes how to recalculate a checksum partially when only fixed words of the packet change. However, the method is not universal, and the implementation would be harder to maintain.

Checksum calculation function:

#define MAX_CSUM_WORDS 32
#define MAX_CSUM_BYTES (MAX_CSUM_WORDS * 2)

INTERNAL u32
sum16(const void* data, u32 size, const void* data_end) {
    u32 s = 0;
#pragma unroll
    for (u32 i = 0; i < MAX_CSUM_WORDS; i++) {
        if (2*i >= size) {
            return s; /* normal exit */
        }
        if (data + 2*i + 1 + 1 > data_end) {
            return 0; /* should be unreachable */
        }
        s += ((const u16*)data)[i];
    }
    return s;
}

Although size is checked by the calling code, the second exit condition is needed so that the verifier can prove that the loop terminates.

For 32-bit words, a simpler version is used:

INTERNAL u32
sum16_32(u32 v) {
    return (v >> 16) + (v & 0xffff);
}

The actual recalculation of the checksums and sending the packet back:

ip->check = 0;
ip->check = carry(sum16(ip, ip_len, data_end));

u32 tcp_csum = 0;
tcp_csum += sum16_32(ip->saddr);
tcp_csum += sum16_32(ip->daddr);
tcp_csum += 0x0600;
tcp_csum += tcp_len << 8;
tcp->check = 0;
tcp_csum += sum16(tcp, tcp_len, data_end);
tcp->check = carry(tcp_csum);

return XDP_TX;
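In the code above, the constants 0x0600 and tcp_len << 8 look like the protocol and length words of the TCP pseudo-header, written already byte-swapped for a little-endian host so that they can be added to words read straight from the packet. The sketch below spells that out under this assumption; demo_swap16() and demo_pseudo_tail() are hypothetical names, and using the header length as the TCP length assumes a segment without payload.

```c
#include <stdint.h>

/* Explicit 16-bit byte swap: what htons() does on a little-endian host. */
uint16_t demo_swap16(uint16_t v) {
    return (uint16_t)(((v & 0xffu) << 8) | (v >> 8));
}

/* The two pseudo-header words that are not addresses:
 * (zero byte, protocol) and the TCP length. */
uint32_t demo_pseudo_tail(uint16_t tcp_len) {
    const uint16_t proto_tcp = 6; /* IPPROTO_TCP */
    return (uint32_t)demo_swap16(proto_tcp) + demo_swap16(tcp_len);
}
```

For lengths below 256, which always holds for a TCP header, swapping is the same as shifting left by 8, hence tcp_len << 8.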

The carry() function turns a 32-bit sum of 16-bit words into a checksum, per RFC 791.
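The article does not show carry(), so here is one possible shape for it together with a userspace check on a well-known sample IPv4 header. Everything with the demo_ prefix is an assumption made for illustration. demo_sum16() adds the words exactly as they sit in memory, like sum16() above; this works on any host because the one's complement sum does not depend on byte order (RFC 1071).

```c
#include <stdint.h>
#include <string.h>

/* Fold the 32-bit accumulator into 16 bits with end-around carry,
 * then take the one's complement. Assumed implementation. */
uint16_t demo_carry(uint32_t csum) {
    csum = (csum & 0xffff) + (csum >> 16);
    csum = (csum & 0xffff) + (csum >> 16); /* the first fold may carry again */
    return (uint16_t)~csum;
}

/* Sum 16-bit words as stored in memory, without byte swapping.
 * An odd trailing byte is ignored; headers always have even length. */
uint32_t demo_sum16(const uint8_t* data, size_t size) {
    uint32_t s = 0;
    for (size_t i = 0; i + 1 < size; i += 2) {
        uint16_t word;
        memcpy(&word, data + i, sizeof(word));
        s += word;
    }
    return s;
}

/* Zero the checksum field (bytes 10 and 11 of the IPv4 header),
 * sum the header, and store the result back in place. */
void demo_ip_checksum(uint8_t* header, size_t len) {
    header[10] = 0;
    header[11] = 0;
    uint16_t check = demo_carry(demo_sum16(header, len));
    memcpy(header + 10, &check, sizeof(check));
}
```

For the header 45 00 00 73 00 00 40 00 40 11 .. .. c0 a8 00 01 c0 a8 00 c7 this stores b8 61 in the checksum field, and summing the finished header again yields zero after demo_carry(), which is how a receiver verifies it.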

Checking the TCP handshake

The filter correctly establishes a connection with netcat but passes the final ACK through, to which Linux responds with an RST packet: the network stack never received the SYN (it was converted into a SYNACK and sent back), so from the OS point of view a packet arrived that does not belong to any open connection.

$ sudo ip netns exec xdp-test   nc -nv 192.0.2.1 6666
192.0.2.1 6666: Connection reset by peer

It is important to test with full-fledged applications and watch tcpdump on xdp-remote because, for example, hping3 does not react to incorrect checksums.

From the point of view of XDP, the check itself is trivial. The calculation algorithm is primitive and probably vulnerable to a sophisticated attacker. The Linux kernel, for example, uses the cryptographic SipHash, but its implementation for XDP is clearly beyond the scope of this article.

New TODOs related to external interaction have appeared:

  • The XDP program cannot store cookie_seed (the secret part of the salt) in a global variable; it needs storage in the kernel whose value is periodically updated from a reliable generator.

  • If the SYN cookie in the ACK packet matches, the program should not print a message but remember the IP of the verified client in order to pass subsequent packets from it.

Check with a legitimate client:

$ sudo ip netns exec xdp-test   nc -nv 192.0.2.1 6666
192.0.2.1 6666: Connection reset by peer

The logs record that the check passed (flags=0x2 is SYN, flags=0x10 is ACK):

Ether(proto=0x800)
  IP(src=0x20e6e11a dst=0x20e6e11e proto=6)
    TCP(sport=50836 dport=6666 flags=0x2)
Ether(proto=0x800)
  IP(src=0xfe2cb11a dst=0xfe2cb11e proto=6)
    TCP(sport=50836 dport=6666 flags=0x10)
      cookie matches for client 20200c0

As long as there is no list of verified IPs, there is no protection against the SYN flood itself, but here is the reaction to an ACK flood launched with this command:

sudo ip netns exec xdp-test   hping3 --flood -A -s 1111 -p 2222 192.0.2.1

Log entries:

Ether(proto=0x800)
  IP(src=0x15bd11a dst=0x15bd11e proto=6)
    TCP(sport=3236 dport=2222 flags=0x10)
      cookie mismatch

Conclusion

Sometimes eBPF in general and XDP in particular are presented more as a tool for advanced administrators than as a development platform. Indeed, XDP is a tool for intervening in the kernel's packet processing, not an alternative to the kernel stack like DPDK and other kernel bypass options. On the other hand, XDP lets you implement fairly complex logic that, moreover, is easy to update without pausing traffic processing. The verifier does not cause big problems; personally, I would not mind having one for parts of userspace code.

In the second part, if the topic proves interesting, we will finish the table of verified clients and connection resets, implement counters, and write a userspace utility to manage the filter.

Links:

Source: habr.com
