A Brief Introduction to BPF and eBPF

Hello, Habr! We would like to let you know that we are preparing the book "Linux Observability with BPF" for release.

Because the BPF virtual machine keeps evolving and is actively used in practice, we have translated for you an article that describes its main capabilities and current state.

In recent years, programming tools and techniques have multiplied to compensate for the limitations of the Linux kernel in cases where high-performance packet processing is required. One of the most popular techniques of this kind is called kernel bypass: it skips the kernel's networking layer and performs all packet processing from user space. Bypassing the kernel also means managing the network card from user space. In other words, when working with the network card, we rely on a user-space driver.

By handing full control of the network card to a user-space program, we reduce kernel overhead (context switches, network-layer processing, interrupts, etc.), which matters a great deal at speeds of 10 Gb/s and above. Kernel bypass, combined with other features (batch processing) and careful performance tuning (NUMA awareness, CPU isolation, etc.), forms the basis of high-performance networking in user space. Perhaps the textbook example of this new approach to packet processing is Intel's DPDK (Data Plane Development Kit), although there are other well-known tools and techniques, including Cisco's VPP (Vector Packet Processing), Netmap and, of course, Snabb.

Networking in user space has a number of drawbacks:

  • An OS kernel is an abstraction layer over hardware resources. Because user-space programs must manage their resources directly, they also have to manage their own hardware. This often means writing your own drivers.
  • Because we give up kernel space entirely, we also give up all the networking functionality the kernel provides. User-space programs must reimplement features that the kernel or the operating system already offers.
  • Programs run in sandbox mode, which severely limits their interaction and prevents them from integrating with other parts of the operating system.

In essence, user-space networking achieves its performance gains by moving packet processing out of the kernel and into user space. XDP does exactly the opposite: it moves networking programs (filters, resolvers, routing, etc.) from user space into kernel space. XDP lets us run a network function as soon as a packet hits the network interface, before it starts travelling up through the kernel's networking subsystem. As a result, packet-processing speed increases significantly. But how does the kernel allow a user to run their own programs in kernel space? Before answering that question, let's look at what BPF is.

BPF and eBPF

Despite its confusing name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This virtual machine was originally designed to handle packet filtering, hence the name.

One of the best-known tools that uses BPF is tcpdump. When capturing packets with tcpdump, the user can specify a packet-filtering expression. Only packets that match the expression are captured. For example, the expression "tcp dst port 80" refers to all TCP packets arriving at port 80. A compiler can reduce this expression to BPF bytecode.

$ sudo tcpdump -d "tcp dst port 80"
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 6
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 15
(004) ldh      [56]
(005) jeq      #0x50            jt 14   jf 15
(006) jeq      #0x800           jt 7    jf 15
(007) ldb      [23]
(008) jeq      #0x6             jt 9    jf 15
(009) ldh      [20]
(010) jset     #0x1fff          jt 15   jf 11
(011) ldxb     4*([14]&0xf)
(012) ldh      [x + 16]
(013) jeq      #0x50            jt 14   jf 15
(014) ret      #262144
(015) ret      #0

Here is what the program above does:

  • Instruction (000): loads the 16-bit word at packet offset 12 into the accumulator. Offset 12 corresponds to the packet's ethertype.
  • Instruction (001): compares the accumulator with 0x86dd, the ethertype value for IPv6. If the result is true, the program counter jumps to instruction (002); otherwise it jumps to (006).
  • Instruction (006): compares the value with 0x800 (the ethertype value for IPv4). If the result is true, the program jumps to (007); otherwise it jumps to (015).

And so on, until the packet-filtering program returns a result. The result is usually treated as a Boolean. Returning a non-zero value (instruction (014), where the value is the number of bytes to capture) means the packet is accepted, and returning zero (instruction (015)) means the packet is rejected.
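To make the walk-through concrete, here is a minimal sketch of an interpreter that runs the sixteen instructions listed above against a raw Ethernet frame. It is not the kernel's implementation: the `insn` struct, the `op` enum and `run_filter` are hypothetical names invented for this illustration, jump targets are stored as absolute indexes exactly as tcpdump prints them, and only the seven operations this one filter uses are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode names; only what this one filter needs. */
enum op { LDH_ABS, LDB_ABS, LDH_IND, LDXB_MSH, JEQ, JSET, RET };

struct insn {
    enum op  code;
    uint32_t k;  /* immediate value or packet offset */
    uint8_t  jt; /* next instruction if the test is true */
    uint8_t  jf; /* next instruction if the test is false */
};

/* Same order as the tcpdump -d listing; jt/jf are absolute indexes. */
static const struct insn tcp_dst_port_80[] = {
    /* 000 */ { LDH_ABS,  12,     0,  0 },
    /* 001 */ { JEQ,      0x86dd, 2,  6 },
    /* 002 */ { LDB_ABS,  20,     0,  0 },
    /* 003 */ { JEQ,      0x6,    4,  15 },
    /* 004 */ { LDH_ABS,  56,     0,  0 },
    /* 005 */ { JEQ,      0x50,   14, 15 },
    /* 006 */ { JEQ,      0x800,  7,  15 },
    /* 007 */ { LDB_ABS,  23,     0,  0 },
    /* 008 */ { JEQ,      0x6,    9,  15 },
    /* 009 */ { LDH_ABS,  20,     0,  0 },
    /* 010 */ { JSET,     0x1fff, 15, 11 },
    /* 011 */ { LDXB_MSH, 14,     0,  0 },
    /* 012 */ { LDH_IND,  16,     0,  0 },
    /* 013 */ { JEQ,      0x50,   14, 15 },
    /* 014 */ { RET,      262144, 0,  0 },
    /* 015 */ { RET,      0,      0,  0 },
};

/* Returns the filter's verdict: 0 rejects, non-zero accepts. */
static uint32_t run_filter(const struct insn *prog, size_t n,
                           const uint8_t *pkt, size_t len)
{
    uint32_t a = 0, x = 0; /* accumulator and index register */
    size_t pc = 0;         /* program counter */

    assert(prog != NULL && pkt != NULL);
    while (pc < n) {
        const struct insn *i = &prog[pc++];
        /* Indexed loads add X to the offset; all others use k as is. */
        size_t off = (size_t)i->k + (i->code == LDH_IND ? x : 0);

        switch (i->code) {
        case LDH_ABS:
        case LDH_IND:
            if (off + 2 > len)
                return 0; /* out-of-bounds load rejects the packet */
            a = ((uint32_t)pkt[off] << 8) | pkt[off + 1];
            break;
        case LDB_ABS:
            if (off >= len)
                return 0;
            a = pkt[off];
            break;
        case LDXB_MSH:
            if (off >= len)
                return 0;
            x = 4u * (pkt[off] & 0xfu); /* IPv4 header length in bytes */
            break;
        case JEQ:
            pc = (a == i->k) ? i->jt : i->jf;
            break;
        case JSET:
            pc = (a & i->k) ? i->jt : i->jf;
            break;
        case RET:
            return i->k;
        }
    }
    return 0;
}
```

Calling `run_filter` on a frame whose ethertype is 0x0800, whose IP protocol byte is 6 and whose TCP destination port is 80 follows the path 000, 001, 006 through 014; any other port ends at instruction 015.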

The BPF virtual machine and its bytecode were proposed by Steven McCanne and Van Jacobson in late 1992, when their paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture" was published. The technology was first presented at the Usenix conference in the winter of 1993.

Because BPF is a virtual machine, it defines the environment in which programs run. Besides the bytecode, it also defines a packet-based memory model (load instructions implicitly operate on the packet being processed), registers (A and X, the accumulator and the index register), a scratch memory store, and an implicit program counter. Interestingly, the BPF bytecode was modeled after the Motorola 6502 ISA. As Steven McCanne recalled in his keynote at Sharkfest '11, he knew 6502 assembly from programming on an Apple II during his school years, and that knowledge influenced his design of the BPF bytecode.

BPF support has been available in the Linux kernel since v2.5, added mainly through the efforts of Jay Schulist. The BPF code remained unchanged until 2011, when Eric Dumazet reworked the BPF interpreter to run in JIT mode (source: A JIT for packet filters). After that, instead of interpreting BPF bytecode, the kernel could translate BPF programs directly to the target architecture: x86, ARM, MIPS, etc.

Later, in 2014, Alexei Starovoitov proposed a new JIT mechanism for BPF. In fact, this new JIT was a new architecture based on BPF, and it was called eBPF. I believe both VMs coexisted for some time, but today packet filtering is implemented on top of eBPF. Indeed, in much of the current documentation BPF means eBPF, and classic BPF is now known as cBPF.

eBPF extends the classic BPF virtual machine in several ways:

  • It is based on modern 64-bit architectures. eBPF uses 64-bit registers and increases the number of available registers from 2 (the accumulator and X) to 10. eBPF also provides additional opcodes (BPF_MOV, BPF_JNE, BPF_CALL...).
  • It is decoupled from the networking subsystem. BPF was tied to the packet-based data model. Because it was used for packet filtering, its code lived in the subsystem that provides networking. The eBPF virtual machine, however, is no longer tied to a data model and can be used for any purpose. An eBPF program can now be attached to a tracepoint or a kprobe. This opens the door to eBPF-based instrumentation, performance analysis, and many other use cases in the context of other kernel subsystems. The eBPF code now lives in its own path: kernel/bpf.
  • Global data stores called maps. Maps are key-value stores that allow data to be exchanged between user space and kernel space. eBPF provides several types of maps.
  • Helper functions, for example for rewriting a packet, computing a checksum, or cloning a packet. These functions run inside the kernel and are not part of user-space programs. An eBPF program invokes them with the BPF_CALL instruction rather than through ordinary system calls.
  • Tail calls. The size of an eBPF program is limited to 4096 instructions. The tail-call feature lets an eBPF program pass control to another eBPF program and thereby get around this limit (up to 32 programs can be chained this way).
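As a rough illustration of the wider register set and the new opcodes, the sketch below encodes the two-instruction program "r0 = 0; exit" in the fixed 64-bit eBPF instruction format (8-bit opcode, two 4-bit register fields, 16-bit offset, 32-bit immediate). The struct and the EX_ macro names are hypothetical stand-ins chosen for this example; real code would take the equivalent definitions from the kernel's linux/bpf.h header. The bit-field layout assumes a compiler that accepts 8-bit bit-fields, as gcc and clang do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 8-byte eBPF instruction layout. */
struct ebpf_insn {
    uint8_t opcode;      /* instruction class, source flag, operation */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset for jumps and memory access */
    int32_t imm;         /* signed immediate operand */
};

/* Opcode building blocks (values as used by the eBPF instruction set). */
#define EX_ALU64 0x07 /* class: 64-bit arithmetic */
#define EX_JMP   0x05 /* class: jumps, calls, exit */
#define EX_K     0x00 /* operand comes from the immediate field */
#define EX_MOV   0xb0
#define EX_JNE   0x50
#define EX_CALL  0x80
#define EX_EXIT  0x90

/* dst = imm, using the full 64-bit register. */
static struct ebpf_insn mov64_imm(uint8_t dst, int32_t imm)
{
    assert(dst <= 9); /* r0..r9 are writable; r10 is the frame pointer */
    struct ebpf_insn i = {
        .opcode = EX_ALU64 | EX_MOV | EX_K,
        .dst_reg = dst,
        .imm = imm,
    };
    return i;
}

/* Ends the program; the return value is whatever r0 holds. */
static struct ebpf_insn exit_insn(void)
{
    struct ebpf_insn i = { .opcode = EX_JMP | EX_EXIT };
    return i;
}

/* Fills prog with "r0 = 0; exit" and returns the instruction count. */
static size_t build_return_zero(struct ebpf_insn prog[2])
{
    prog[0] = mov64_imm(0, 0);
    prog[1] = exit_insn();
    return 2;
}
```

A helper call would be encoded the same way, with the opcode EX_JMP | EX_CALL and the helper's number in the immediate field.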

eBPF: an example

There are many eBPF examples in the Linux kernel sources. They are available in samples/bpf/. To build these examples, simply run:

$ sudo make samples/bpf/

I won't write a new eBPF example myself; instead, I'll use one of the samples available in samples/bpf/. I'll look at some parts of the code and explain how it works. As an example, I chose the tracex4 program.

In general, each example in samples/bpf/ consists of two files. In this case:

  • tracex4_kern.c contains the source code that will run in the kernel as eBPF bytecode.
  • tracex4_user.c contains the program that runs in user space.

In this case we need to compile tracex4_kern.c to eBPF bytecode. At the time of writing, gcc has no eBPF backend. Fortunately, clang can emit eBPF bytecode. The Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned above that one of the most interesting features of eBPF is maps. tracex4_kern defines one map:

struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};

BPF_MAP_TYPE_HASH is one of the many map types that eBPF provides. In this case it is simply a hash. You may also have noticed the SEC("maps") declaration. SEC is a macro used to create a new section in the binary file. In fact, the tracex4_kern example defines two more sections:

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&my_map, &ptr); 
    return 0;
}
    
SEC("kretprobe/kmem_cache_alloc_node") 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get the IP address of the caller of kmem_cache_alloc_node()
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}   

These two functions delete an entry from the map (kprobe/kmem_cache_free) and add a new entry to the map (kretprobe/kmem_cache_alloc_node). All function names written in capital letters correspond to macros defined in bpf_helpers.h.
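The net effect of the two probes is that the map always holds exactly the objects that have been allocated but not yet freed. The user-space sketch below mimics that bookkeeping with a tiny fixed-size table. It is only a model: toy_map, toy_update, toy_delete, on_alloc and on_free are hypothetical names standing in for the kernel map, for bpf_map_update_elem and bpf_map_delete_elem, and for the two probe functions, and the linear search is not how the kernel's hash map works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pair {
    uint64_t val; /* allocation timestamp in nanoseconds */
    uint64_t ip;  /* instruction pointer of the caller */
};

#define TOY_MAX_ENTRIES 8 /* the sample's map allows 1000000 */

struct toy_map {
    bool        used[TOY_MAX_ENTRIES];
    long        key[TOY_MAX_ENTRIES];
    struct pair value[TOY_MAX_ENTRIES];
};

/* Returns the slot holding key, or -1 if the key is absent. */
static int toy_find(const struct toy_map *m, long key)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++)
        if (m->used[i] && m->key[i] == key)
            return i;
    return -1;
}

/* Inserts or overwrites (BPF_ANY semantics). Returns 0, or -1 if full. */
static int toy_update(struct toy_map *m, long key, struct pair v)
{
    assert(m != NULL);
    int slot = toy_find(m, key);
    for (int i = 0; slot < 0 && i < TOY_MAX_ENTRIES; i++)
        if (!m->used[i])
            slot = i;
    if (slot < 0)
        return -1;
    m->used[slot] = true;
    m->key[slot] = key;
    m->value[slot] = v;
    return 0;
}

/* Returns 0 if the key was removed, -1 if it was not present. */
static int toy_delete(struct toy_map *m, long key)
{
    int slot = toy_find(m, key);
    if (slot < 0)
        return -1;
    m->used[slot] = false;
    return 0;
}

/* Number of live entries, i.e. objects allocated but not yet freed. */
static size_t toy_count(const struct toy_map *m)
{
    size_t n = 0;
    for (int i = 0; i < TOY_MAX_ENTRIES; i++)
        n += m->used[i];
    return n;
}

/* Models bpf_prog2: record the object when the allocation returns. */
static void on_alloc(struct toy_map *m, long ptr, uint64_t now_ns, uint64_t ip)
{
    struct pair v = { .val = now_ns, .ip = ip };
    toy_update(m, ptr, v);
}

/* Models bpf_prog1: forget the object when it is freed. */
static void on_free(struct toy_map *m, long ptr)
{
    toy_delete(m, ptr);
}
```

Keying the table by the object's address is what lets the free-side probe find and remove the entry that the allocation-side probe created.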

If I dump the sections of the object file, I should see that these new sections have been defined:

$ objdump -h tracex4_kern.o

tracex4_kern.o: file format elf64-little

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free 00000048  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node 000000c0  0000000000000000  0000000000000000  00000088  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps          0000001c  0000000000000000  0000000000000000  00000148  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  4 license       00000004  0000000000000000  0000000000000000  00000164  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  5 version       00000004  0000000000000000  0000000000000000  00000168  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame     00000050  0000000000000000  0000000000000000  00000170  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

There is also tracex4_user.c, the main program. Basically, this program listens for kmem_cache_alloc_node events. When such an event occurs, the corresponding eBPF code is executed. The code stores the object's IP attribute in the map, and the main program prints the object in a loop. For example:

$ sudo ./tracex4
obj 0xffff8d6430f60a00 is 2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is 6sec old was allocated at ip ffffffff98090e8f
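Each output line pairs a map key (the object's address) with an age computed from the timestamp stored in the map. A minimal sketch of that computation follows. The function name format_old_object, the one-second threshold and the exact format string are assumptions inferred from the sample output above, not a copy of the sample's code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ull

struct pair {
    uint64_t val; /* allocation timestamp in nanoseconds */
    uint64_t ip;  /* instruction pointer of the caller */
};

/*
 * Writes one report line for the object at address obj into buf.
 * Returns the number of characters snprintf produced, or 0 when the
 * object is younger than one second (assumed threshold) and is skipped.
 */
static int format_old_object(char *buf, size_t size, uint64_t obj,
                             const struct pair *v, uint64_t now_ns)
{
    assert(buf != NULL && v != NULL && now_ns >= v->val);

    uint64_t age_ns = now_ns - v->val;
    if (age_ns < NSEC_PER_SEC)
        return 0;

    return snprintf(buf, size,
                    "obj 0x%" PRIx64 " is %" PRIu64
                    "sec old was allocated at ip %" PRIx64 "\n",
                    obj, age_ns / NSEC_PER_SEC, v->ip);
}
```

Because freed objects are deleted from the map by the kprobe, only objects that are still alive can reach this reporting step.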

How are the user-space program and the eBPF program related? On startup, tracex4_user.c loads the tracex4_kern.o object file using the load_bpf_file function.

int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;

    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);

    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }

    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);
        return 1;
    }

    for (i = 0; ; i++) {
        print_old_objects(map_fd[1]);
        sleep(1);
    }

    return 0;
}

When load_bpf_file runs, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. We are now listening for these events, and our program can do something when they occur.

$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
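Each line in kprobe_events starts with p for an entry probe or r for a return probe, followed by a group/event name and the probed symbol. The sketch below splits the simple form shown above into those parts. parse_kprobe_line, is_kretprobe_on and struct kprobe_entry are hypothetical helpers written for this illustration, and optional fields that the tracing file also supports (offsets, fetch arguments, a maxactive count) are deliberately not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct kprobe_entry {
    char type;       /* 'p' = kprobe (entry), 'r' = kretprobe (return) */
    char group[64];  /* e.g. "kprobes" */
    char event[64];  /* event name within the group */
    char symbol[64]; /* kernel symbol being probed */
};

/* Returns 0 on success, -1 if the line does not match the simple form. */
static int parse_kprobe_line(const char *line, struct kprobe_entry *out)
{
    assert(line != NULL && out != NULL);

    /* Field widths keep each string within its 64-byte buffer. */
    if (sscanf(line, "%c:%63[^/]/%63s %63s",
               &out->type, out->group, out->event, out->symbol) != 4)
        return -1;
    if (out->type != 'p' && out->type != 'r')
        return -1;
    return 0;
}

/* True when the entry is a return probe on the given kernel symbol. */
static int is_kretprobe_on(const struct kprobe_entry *e, const char *symbol)
{
    return e->type == 'r' && strcmp(e->symbol, symbol) == 0;
}
```

Applied to the two lines above, this yields a 'p' entry for kmem_cache_free and an 'r' entry for kmem_cache_alloc_node, matching the kprobe and kretprobe sections of tracex4_kern.c.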

All the other programs in samples/bpf/ are structured the same way. They always consist of two files:

  • XXX_kern.c: the eBPF program.
  • XXX_user.c: the main program.

The eBPF program defines maps and functions that are tied to a section. When the kernel emits an event of a certain type (for example, a tracepoint), the bound functions are executed. Maps provide communication between the kernel program and the user-space program.

Conclusion

This article covered BPF and eBPF in general terms. I know that there is a great deal of information and many resources about eBPF today, so I will recommend a few more resources for further study.

I recommend reading:

Source: www.habr.com
