A Brief Introduction to BPF and eBPF

Hello, Habr! We are pleased to announce that we are preparing to release the book "Linux Observability with BPF".

Because the BPF virtual machine keeps evolving and is actively used in practice, we have translated an article for you that describes its main capabilities and current state.

In recent years, programming tools and techniques that work around the limitations of the Linux kernel have become popular wherever high-performance packet processing is required. One of the most popular techniques of this kind is called kernel bypass: it skips the kernel's networking layer and does all packet processing in user space. Bypassing the kernel also means driving the network card from user space. In other words, when working with a network card, we rely on a user-space driver.

By handing full control of the network card to a user-space program, we reduce the overhead introduced by the kernel (context switches, network-layer processing, interrupts and so on), which matters a great deal at speeds of 10 Gb/s or higher. Kernel bypass, combined with other features (batch processing) and careful performance tuning (NUMA awareness, CPU isolation and so on), is the basis of high-performance user-space networking. Perhaps the showcase example of this new approach to packet processing is DPDK from Intel (Data Plane Development Kit), although there are other well-known tools and techniques, including VPP from Cisco (Vector Packet Processing), Netmap and, of course, Snabb.

User-space networking has a number of drawbacks:

  • An OS kernel is an abstraction layer for hardware resources. Because user-space programs have to manage their resources directly, they also have to manage their own hardware. This often means writing your own drivers.
  • Because we give up kernel space entirely, we also give up all the networking functionality the kernel provides. User-space programs have to reimplement features that the kernel or the operating system already offers.
  • Programs run in a sandbox, which severely limits their interaction and prevents them from integrating with other parts of the operating system.

In essence, user-space networking gains performance by moving packet processing out of the kernel and into user space. XDP does the opposite: it moves user-space networking programs (filters, mappers, routing and so on) into kernel space. XDP lets us run a network function as soon as a packet hits the network interface and before it starts travelling up the kernel's networking subsystem. As a result, packet processing speed increases significantly. But how does the kernel allow users to run their programs in kernel space? Before answering that question, let's look at BPF.

BPF and eBPF

Despite its somewhat misleading name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This virtual machine was originally designed for packet filtering, hence the name.

One of the best-known tools that uses BPF is tcpdump. When capturing packets with tcpdump, the user can specify a packet-filtering expression. Only packets that match this expression are captured. For example, the expression "tcp dst port 80" matches all TCP packets whose destination port is 80. A compiler can reduce this expression to BPF bytecode.

$ sudo tcpdump -d "tcp dst port 80"
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 6
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 15
(004) ldh      [56]
(005) jeq      #0x50            jt 14   jf 15
(006) jeq      #0x800           jt 7    jf 15
(007) ldb      [23]
(008) jeq      #0x6             jt 9    jf 15
(009) ldh      [20]
(010) jset     #0x1fff          jt 15   jf 11
(011) ldxb     4*([14]&0xf)
(012) ldh      [x + 16]
(013) jeq      #0x50            jt 14   jf 15
(014) ret      #262144
(015) ret      #0

Here is what the program above does:

  • Instruction (000): loads the 16-bit halfword at packet offset 12 into the accumulator. Offset 12 corresponds to the packet's ethertype field.
  • Instruction (001): compares the value in the accumulator with 0x86dd, the ethertype value for IPv6. If the result is true, the program counter jumps to instruction (002); otherwise it jumps to (006).
  • Instruction (006): compares the value with 0x800 (the ethertype value for IPv4). If the result is true, the program goes to (007); otherwise it goes to (015).

And so on, until the packet-filter program returns a result. The result is usually a Boolean. Returning a non-zero value (instruction (014)) means the packet matched, and returning zero (instruction (015)) means it did not.
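To make the walk-through concrete, here is a minimal user-space sketch that executes the listing above against a raw packet. It is not the kernel's implementation: the `enum op` names, `struct insn`, `run_filter` and `make_ipv4_tcp` are invented for this illustration, and the jump targets are stored as the absolute instruction indexes that `tcpdump -d` prints, whereas real BPF bytecode encodes them as relative offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcodes for this sketch only; real cBPF encodes them numerically. */
enum op { LDH_ABS, LDB_ABS, LDH_IND, LDXB_MSH, JEQ, JSET, RET };

struct insn {
    enum op op;
    uint32_t k;      /* constant: offset, comparison value or return value */
    uint8_t jt, jf;  /* absolute jump targets, as printed by tcpdump -d */
};

/* The 16 instructions from the "tcp dst port 80" listing. */
static const struct insn tcp_dst_port_80[] = {
    /* 000 */ {LDH_ABS, 12, 0, 0},    /* 001 */ {JEQ, 0x86dd, 2, 6},
    /* 002 */ {LDB_ABS, 20, 0, 0},    /* 003 */ {JEQ, 0x6, 4, 15},
    /* 004 */ {LDH_ABS, 56, 0, 0},    /* 005 */ {JEQ, 0x50, 14, 15},
    /* 006 */ {JEQ, 0x800, 7, 15},    /* 007 */ {LDB_ABS, 23, 0, 0},
    /* 008 */ {JEQ, 0x6, 9, 15},      /* 009 */ {LDH_ABS, 20, 0, 0},
    /* 010 */ {JSET, 0x1fff, 15, 11}, /* 011 */ {LDXB_MSH, 14, 0, 0},
    /* 012 */ {LDH_IND, 16, 0, 0},    /* 013 */ {JEQ, 0x50, 14, 15},
    /* 014 */ {RET, 262144, 0, 0},    /* 015 */ {RET, 0, 0, 0},
};

/* Run prog over pkt; returns the filter's result (0 means "no match"). */
static uint32_t run_filter(const struct insn *prog, size_t n,
                           const uint8_t *pkt, size_t len)
{
    uint32_t a = 0, x = 0;  /* accumulator and index register */
    size_t pc = 0;          /* implicit program counter */

    while (pc < n) {
        const struct insn *i = &prog[pc];
        size_t off;

        switch (i->op) {
        case LDH_ABS:
        case LDH_IND:
            off = i->k + (i->op == LDH_IND ? x : 0);
            if (off + 2 > len)
                return 0;   /* out-of-bounds load rejects the packet */
            a = (uint32_t)pkt[off] << 8 | pkt[off + 1];  /* big-endian */
            pc++;
            break;
        case LDB_ABS:
            if (i->k >= len)
                return 0;
            a = pkt[i->k];
            pc++;
            break;
        case LDXB_MSH:      /* x = 4 * (low nibble): the IPv4 header length */
            if (i->k >= len)
                return 0;
            x = 4u * (pkt[i->k] & 0xf);
            pc++;
            break;
        case JEQ:
            pc = (a == i->k) ? i->jt : i->jf;
            break;
        case JSET:
            pc = (a & i->k) ? i->jt : i->jf;
            break;
        case RET:
            return i->k;
        }
    }
    return 0;
}

/* Fill buf (at least 54 bytes) with a minimal Ethernet/IPv4/TCP frame. */
static size_t make_ipv4_tcp(uint8_t *buf, uint16_t dst_port)
{
    memset(buf, 0, 54);
    buf[12] = 0x08;                     /* ethertype 0x0800: IPv4 */
    buf[13] = 0x00;
    buf[14] = 0x45;                     /* IP version 4, header of 5 words */
    buf[23] = 6;                        /* IP protocol: TCP */
    buf[36] = (uint8_t)(dst_port >> 8); /* TCP destination port */
    buf[37] = (uint8_t)(dst_port & 0xff);
    return 54;
}
```

Calling `run_filter(tcp_dst_port_80, 16, pkt, len)` on a frame built with `make_ipv4_tcp(pkt, 80)` follows the path (000), (001), (006) through (013) and ends at (014); any other destination port ends at (015).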

The BPF virtual machine and its bytecode were proposed by Steven McCanne and Van Jacobson in late 1992, when their paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture" came out. The technology was first presented at the Usenix conference in the winter of 1993.

Because BPF is a virtual machine, it defines the environment in which programs run. Besides the bytecode, it also defines a packet-based memory model (load instructions are implicitly applied to the packet), registers (A and X, the accumulator and the index register), a scratch memory store and an implicit program counter. Interestingly, the BPF bytecode was modelled on the MOS Technology 6502 ISA. As Steven McCanne recalled in his keynote at Sharkfest '11, he had known the 6502 architecture since high school, when he programmed on the Apple II, and this knowledge influenced his design of the BPF bytecode.

BPF support was implemented in the Linux kernel in v2.5, contributed mostly by Jay Schulist. The BPF code remained essentially unchanged until 2011, when Eric Dumazet reworked the BPF interpreter to run as a JIT (source: "A JIT for packet filters"). From then on, instead of interpreting BPF bytecode, the kernel could translate BPF programs directly to the target architecture: x86, ARM, MIPS and so on.

Later, in 2014, Alexei Starovoitov proposed a new BPF JIT. In fact, this new JIT became a new BPF-based architecture, and it was named eBPF. I believe both VMs coexisted for some time, but today packet filtering is implemented on top of eBPF. Indeed, in much recent documentation BPF means eBPF, and classic BPF is now known as cBPF.

eBPF extends the classic BPF virtual machine in several ways:

  • It takes advantage of modern 64-bit architectures. eBPF uses 64-bit registers and raises the number of available registers from 2 (the accumulator and X) to 10. eBPF also adds new opcodes (BPF_MOV, BPF_JNE, BPF_CALL…).
  • It is decoupled from the networking subsystem. BPF was tied to the packet data model. Because it was used for packet filtering, its code lived in the networking subsystem. The eBPF virtual machine, however, is not bound to any data model and can be used for any purpose. An eBPF program can therefore be attached to a tracepoint or a kprobe. This opens the door to eBPF-based instrumentation, performance analysis and many other use cases in the context of other kernel subsystems. The eBPF code now lives in its own path: kernel/bpf.
  • Global data stores called maps. Maps are key-value stores that allow data to be exchanged between user space and kernel space. eBPF provides several map types.
  • Helper functions, for example to rewrite a packet, compute a checksum or clone a packet. Unlike user-space code, these functions run inside the kernel. An eBPF program invokes them with the BPF_CALL instruction.
  • Tail calls. The size of an eBPF program is limited to 4096 instructions. The tail-call feature lets an eBPF program pass control to another eBPF program and thereby get around this limit (up to 32 programs can be chained this way).
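As a rough illustration of the wider instruction format, the sketch below encodes the two-instruction eBPF program "r0 = 0; exit". The struct mirrors the field layout of the kernel's `struct bpf_insn` and the opcode values match those in `linux/bpf.h`, but everything is re-declared under hypothetical `ebpf_`/`EBPF_` names so that it compiles without kernel headers. The helpers `mov64_imm`, `exit_insn` and `build_return_zero` are assumptions made for this example.

```c
#include <stdint.h>

/* Same field layout as the kernel's struct bpf_insn: 8 bytes per instruction. */
struct ebpf_insn {
    uint8_t code;         /* opcode: class | operation | operand source */
    uint8_t dst_reg : 4;  /* destination register number */
    uint8_t src_reg : 4;  /* source register number */
    int16_t off;          /* signed offset (jumps, memory accesses) */
    int32_t imm;          /* signed 32-bit immediate */
};

enum {
    EBPF_ALU64 = 0x07,    /* instruction class: 64-bit arithmetic */
    EBPF_JMP   = 0x05,    /* instruction class: jumps, calls, exit */
    EBPF_MOV   = 0xb0,
    EBPF_JNE   = 0x50,
    EBPF_CALL  = 0x80,
    EBPF_EXIT  = 0x90,
    EBPF_K     = 0x00,    /* the operand is the immediate field */
};

/* dst = imm, as a 64-bit move. */
static struct ebpf_insn mov64_imm(uint8_t dst, int32_t imm)
{
    struct ebpf_insn i = {0};

    i.code = EBPF_ALU64 | EBPF_MOV | EBPF_K;
    i.dst_reg = dst & 0xf;
    i.imm = imm;
    return i;
}

/* Return from the program; the result is taken from r0. */
static struct ebpf_insn exit_insn(void)
{
    struct ebpf_insn i = {0};

    i.code = EBPF_JMP | EBPF_EXIT;
    return i;
}

/* Write the program "r0 = 0; exit" into out[0..1]. */
static void build_return_zero(struct ebpf_insn out[2])
{
    out[0] = mov64_imm(0, 0);
    out[1] = exit_insn();
}
```

The 4-bit register fields rely on the compiler packing `uint8_t` bit-fields into a single byte, which gcc and clang do; the sketch only builds the bytes and does not load anything into the kernel.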

An eBPF example

The Linux kernel sources contain many eBPF examples. They can be found in samples/bpf/. To build these samples, just type:

$ sudo make samples/bpf/

I won't write a new eBPF example myself; instead I'll use one of the samples from samples/bpf/. I'll walk through some parts of the code and explain how it works. As an example, I picked the tracex4 program.

In general, each example in samples/bpf/ consists of two files. In this case:

  • tracex4_kern.c contains the source code that will be executed in the kernel as eBPF bytecode.
  • tracex4_user.c contains the user-space program.

So we need to compile tracex4_kern.c to eBPF bytecode. At the time of writing, gcc had no backend for eBPF. Fortunately, clang can emit eBPF bytecode. The Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned above that one of the most interesting features of eBPF is maps. tracex4_kern defines one map:

struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};

BPF_MAP_TYPE_HASH is one of the many map types eBPF offers. In this case it is simply a hash. You may also have noticed the SEC("maps") declaration. SEC is a macro used to create a new section in the binary file. In fact, the tracex4_kern example defines two more sections:

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&my_map, &ptr); 
    return 0;
}
    
SEC("kretprobe/kmem_cache_alloc_node") 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get the address of the caller of kmem_cache_alloc_node()
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}   

These two functions let us remove an entry from the map (kprobe/kmem_cache_free) and add a new entry to the map (kretprobe/kmem_cache_alloc_node). All function names written in capital letters are macros defined in bpf_helpers.h.

If I dump the sections of the object file, I should see that these new sections are defined:

$ objdump -h tracex4_kern.o

tracex4_kern.o: file format elf64-little

Sections:
Idx Name                             Size      VMA               LMA               File off  Algn
  0 .text                            00000000  0000000000000000  0000000000000000  00000040  2**2
                                     CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free           00000048  0000000000000000  0000000000000000  00000040  2**3
                                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node  000000c0  0000000000000000  0000000000000000  00000088  2**3
                                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps                             0000001c  0000000000000000  0000000000000000  00000148  2**2
                                     CONTENTS, ALLOC, LOAD, DATA
  4 license                          00000004  0000000000000000  0000000000000000  00000164  2**0
                                     CONTENTS, ALLOC, LOAD, DATA
  5 version                          00000004  0000000000000000  0000000000000000  00000168  2**2
                                     CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame                        00000050  0000000000000000  0000000000000000  00000170  2**3
                                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

There is also tracex4_user.c, the main program. Basically, this program listens for kmem_cache_alloc_node events. When such an event occurs, the corresponding eBPF code is executed. The code stores the object's IP attribute (the address of the code that allocated it) in a map, and the main program then loops over the map and prints the objects. Example:

$ sudo ./tracex4
obj 0xffff8d6430f60a00 is 2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is 6sec old was allocated at ip ffffffff98090e8f
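The bookkeeping behind this output can be sketched entirely in user space. The simulation below does not touch the kernel or any real eBPF map: `sim_map`, `sim_update`, `sim_delete` and `sim_report_old` are hypothetical stand-ins for the map, for what the two probes do with it, and for the reporting loop in the main program. A small fixed-size array replaces the real hash map, and the one-second age threshold is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SIM_MAX_ENTRIES 64

struct pair {
    uint64_t val;  /* allocation timestamp in nanoseconds */
    uint64_t ip;   /* address of the code that allocated the object */
};

struct sim_entry {
    int used;
    long key;      /* pointer to the allocated object */
    struct pair value;
};

struct sim_map {
    struct sim_entry slots[SIM_MAX_ENTRIES];
};

/* Insert or overwrite an entry (what the kretprobe does on allocation). */
static int sim_update(struct sim_map *m, long key, struct pair v)
{
    struct sim_entry *free_slot = NULL;

    for (size_t i = 0; i < SIM_MAX_ENTRIES; i++) {
        struct sim_entry *e = &m->slots[i];

        if (e->used && e->key == key) {
            e->value = v;
            return 0;
        }
        if (!e->used && !free_slot)
            free_slot = e;
    }
    if (!free_slot)
        return -1;  /* map is full */
    free_slot->used = 1;
    free_slot->key = key;
    free_slot->value = v;
    return 0;
}

/* Remove an entry (what the kprobe does when the object is freed). */
static int sim_delete(struct sim_map *m, long key)
{
    for (size_t i = 0; i < SIM_MAX_ENTRIES; i++) {
        if (m->slots[i].used && m->slots[i].key == key) {
            m->slots[i].used = 0;
            return 0;
        }
    }
    return -1;      /* no such key */
}

/* Print every object that is at least one second old; return how many. */
static int sim_report_old(const struct sim_map *m, uint64_t now_ns)
{
    int count = 0;

    for (size_t i = 0; i < SIM_MAX_ENTRIES; i++) {
        const struct sim_entry *e = &m->slots[i];
        uint64_t age;

        if (!e->used)
            continue;
        age = now_ns - e->value.val;
        if (age < 1000000000ull)
            continue;  /* younger than one second: skip */
        printf("obj 0x%lx is %llusec old was allocated at ip %llx\n",
               (unsigned long)e->key,
               (unsigned long long)(age / 1000000000ull),
               (unsigned long long)e->value.ip);
        count++;
    }
    return count;
}
```

An object that is allocated and never freed stays in the map and is reported with a growing age on every pass, which is why a tool like this is useful for spotting long-lived kernel objects.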

How are the user-space program and the eBPF program connected? On startup, tracex4_user.c loads the tracex4_kern.o object file using the load_bpf_file function.

int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;

    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);

    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }

    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);
        return 1;
    }

    for (i = 0; ; i++) {
        print_old_objects(map_fd[1]);
        sleep(1);
    }

    return 0;
}

While load_bpf_file runs, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. We are now listening for these events, and our program can do something whenever they occur.

$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node

All other programs in samples/bpf/ are structured in the same way. They always consist of two files:

  • XXX_kern.c: the eBPF program.
  • XXX_user.c: the main program.

The eBPF program defines maps and functions that are bound to a section. When the kernel emits an event of a certain type (for example, a tracepoint), the bound functions are executed. Maps provide communication between the kernel program and the user-space program.

Conclusion

This article discussed BPF and eBPF in general terms. I know there is a lot of information and many resources about eBPF today, so I'll recommend a few more materials for further study.

I recommend reading:

Source: www.habr.com
