A Brief Introduction to BPF and eBPF

Hello, Habr! We are pleased to let you know that we are preparing the book "Linux Observability with BPF" for release.

Since the BPF virtual machine keeps evolving and is actively used in practice, we have translated for you an article describing its main capabilities and its current state.

In recent years, programming tools and techniques that compensate for the limitations of the Linux kernel in high-performance packet processing have become popular. One of the most popular techniques of this kind is called kernel bypass: it skips the kernel networking layer and performs all packet processing in user space. Kernel bypass also means controlling the network card from user space. In other words, when working with the network card we rely on a user-space driver.

By handing full control of the network card to a user-space program, we reduce kernel overhead (context switches, network-layer processing, interrupts, etc.), which matters a great deal at speeds of 10 Gb/s or higher. Kernel bypass, combined with other features (batch processing) and careful performance tuning (NUMA awareness, CPU isolation, etc.), forms the basis of high-performance user-space networking. Perhaps the best-known example of this approach to packet processing is Intel's DPDK (Data Plane Development Kit), although there are other well-known tools and techniques, including Cisco's VPP (Vector Packet Processing), Netmap and, of course, Snabb.

Organizing networking in user space has several drawbacks:

  • The OS kernel is an abstraction layer for hardware resources. Because user-space programs have to manage their resources directly, they also have to manage their own hardware. This often means writing your own drivers.
  • Because we give up kernel space entirely, we also give up all the networking functionality the kernel provides. User-space programs have to reimplement features that the kernel or the operating system may already provide.
  • Programs run in a sandbox, which seriously limits their interaction and prevents them from integrating with other parts of the operating system.

In essence, user-space networking gains performance by moving packet processing from the kernel to user space. XDP does exactly the opposite: it moves networking programs (filters, mappers, routing, etc.) from user space into kernel space. XDP lets us run a network function as soon as a packet hits the network interface, before it starts moving up into the kernel networking subsystem. As a result, packet processing speed increases significantly. But how does the kernel allow a user to run their own programs in kernel space? Before answering this question, let's look at what BPF is.

BPF and eBPF

Despite the confusing name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This virtual machine was originally designed to handle packet filtering, hence the name.

One of the best-known tools that uses BPF is tcpdump. When capturing packets with tcpdump, the user can specify an expression to filter packets. Only packets matching this expression are captured. For example, the expression "tcp dst port 80" matches all TCP packets destined for port 80. A compiler can translate this expression into BPF bytecode.

$ sudo tcpdump -d "tcp dst port 80"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 6
(002) ldb [20]
(003) jeq #0x6 jt 4 jf 15
(004) ldh [56]
(005) jeq #0x50 jt 14 jf 15
(006) jeq #0x800 jt 7 jf 15
(007) ldb [23]
(008) jeq #0x6 jt 9 jf 15
(009) ldh [20]
(010) jset #0x1fff jt 15 jf 11
(011) ldxb 4*([14]&0xf)
(012) ldh [x + 16]
(013) jeq #0x50 jt 14 jf 15
(014) ret #262144
(015) ret #0

This is basically what the program above does:

  • Instruction (000): loads the packet data at offset 12, as a 16-bit word, into the accumulator. Offset 12 corresponds to the packet's ethertype.
  • Instruction (001): compares the value in the accumulator with 0x86dd, that is, the ethertype value for IPv6. If the result is true, the program counter jumps to instruction (002); otherwise to (006).
  • Instruction (006): compares the value with 0x800 (the ethertype value for IPv4). If the result is true, the program jumps to (007); otherwise to (015).

And so on, until the packet-filtering program returns a result. This is usually a Boolean. Returning a non-zero value (instruction (014)) means the packet is accepted, and returning zero (instruction (015)) means the packet is rejected. Strictly speaking, the non-zero value is the number of bytes of the packet to capture.
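To make the control flow of the listing easier to follow, here is a hand-written C function that performs the same checks, with each branch annotated with the instruction numbers it mirrors. It is only a sketch: the function name tcp_dst_port_80 and the helper load_half are made up for this illustration, and it assumes the packet starts with an Ethernet header.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit word, as the BPF "ldh" instruction does. */
static uint16_t load_half(const uint8_t *pkt, size_t off)
{
    return (uint16_t)((pkt[off] << 8) | pkt[off + 1]);
}

/*
 * Hand-written equivalent of the "tcp dst port 80" filter.
 * Returns 262144 (bytes to capture) to accept the packet, 0 to reject it.
 * A load past the end of the packet rejects it, as the BPF machine would.
 */
uint32_t tcp_dst_port_80(const uint8_t *pkt, size_t len)
{
    if (len < 14)
        return 0;
    uint16_t ethertype = load_half(pkt, 12);          /* (000) */

    if (ethertype == 0x86dd) {                        /* (001) IPv6 */
        if (len < 58 || pkt[20] != 6)                 /* (002)-(003) next header is TCP */
            return 0;
        return load_half(pkt, 56) == 80 ? 262144 : 0; /* (004)-(005) destination port */
    }
    if (ethertype != 0x0800)                          /* (006) IPv4 */
        return 0;
    if (len < 24 || pkt[23] != 6)                     /* (007)-(008) protocol is TCP */
        return 0;
    if (load_half(pkt, 20) & 0x1fff)                  /* (009)-(010) skip non-first fragments */
        return 0;
    size_t x = 4 * (size_t)(pkt[14] & 0x0f);          /* (011) IPv4 header length */
    if (len < x + 18)
        return 0;
    return load_half(pkt, x + 16) == 80 ? 262144 : 0; /* (012)-(013) destination port */
}
```

For an IPv4 packet with a 20-byte header, x is 20, so the destination port is read from offset 36, right after the two-byte TCP source port.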

The BPF virtual machine and its bytecode were proposed by Steve McCanne and Van Jacobson in late 1992, when their paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture" came out. The technology was first presented at the Usenix conference in the winter of 1993.

Because BPF is a virtual machine, it defines the environment in which programs run. Besides the bytecode, it also defines a packet-based memory model (load instructions are implicitly applied to the packet), registers (A and X: the accumulator and the index register), scratch memory storage, and an implicit program counter. Interestingly, the BPF bytecode was modeled after the MOS Technology 6502 ISA. As Steve McCanne recalled in his Sharkfest '11 keynote, he knew the 6502 architecture from programming the Apple II in his high-school days, and this knowledge influenced his work on the design of the BPF bytecode.
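The machine model described here (accumulator, index register, scratch memory, implicit program counter) fits in a few dozen lines of C. The interpreter below is a deliberately reduced sketch that supports only the instructions used by the "tcp dst port 80" filter. The enum op names, struct insn, struct vm, bpf_run and filter_prog are all invented for this example, and unlike the real kernel encoding (where jumps are relative) the jump targets are absolute instruction numbers, exactly as tcpdump -d prints them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode names for this sketch, not the kernel's encoding. */
enum op { LDH_ABS, LDB_ABS, LDH_IND, LDXB_MSH, JEQ, JSET, RET };

struct insn {
    enum op  op;
    uint8_t  jt, jf;   /* absolute jump targets, as printed by tcpdump -d */
    uint32_t k;        /* packet offset or immediate value */
};

struct vm {
    uint32_t a;        /* accumulator */
    uint32_t x;        /* index register */
    uint32_t mem[16];  /* scratch memory (not used by this subset) */
    size_t   pc;       /* implicit program counter */
};

/* Run prog over pkt; returns the filter's result, 0 on any out-of-bounds load. */
uint32_t bpf_run(const struct insn *prog, size_t n, const uint8_t *pkt, size_t len)
{
    struct vm vm = {0};

    while (vm.pc < n) {
        const struct insn *i = &prog[vm.pc++];
        size_t off = i->k;

        switch (i->op) {
        case LDH_ABS:
        case LDH_IND:
            if (i->op == LDH_IND)
                off += vm.x;                 /* indexed load: [x + k] */
            if (off + 2 > len)
                return 0;
            vm.a = (uint32_t)pkt[off] << 8 | pkt[off + 1];
            break;
        case LDB_ABS:
            if (off + 1 > len)
                return 0;
            vm.a = pkt[off];
            break;
        case LDXB_MSH:                       /* x = 4 * (pkt[k] & 0xf) */
            if (off + 1 > len)
                return 0;
            vm.x = 4u * (pkt[off] & 0x0f);
            break;
        case JEQ:  vm.pc = (vm.a == i->k) ? i->jt : i->jf; break;
        case JSET: vm.pc = (vm.a & i->k)  ? i->jt : i->jf; break;
        case RET:  return i->k;
        }
    }
    return 0;
}

/* The "tcp dst port 80" listing, transcribed instruction by instruction. */
const struct insn filter_prog[] = {
    {LDH_ABS, 0, 0, 12},  {JEQ, 2, 6, 0x86dd},
    {LDB_ABS, 0, 0, 20},  {JEQ, 4, 15, 0x6},
    {LDH_ABS, 0, 0, 56},  {JEQ, 14, 15, 0x50},
    {JEQ, 7, 15, 0x800},
    {LDB_ABS, 0, 0, 23},  {JEQ, 9, 15, 0x6},
    {LDH_ABS, 0, 0, 20},  {JSET, 15, 11, 0x1fff},
    {LDXB_MSH, 0, 0, 14},
    {LDH_IND, 0, 0, 16},  {JEQ, 14, 15, 0x50},
    {RET, 0, 0, 262144},  {RET, 0, 0, 0},
};
```

Calling bpf_run(filter_prog, 16, pkt, len) on a captured Ethernet frame yields 262144 for a TCP packet addressed to port 80 and 0 otherwise. A real implementation also validates the program before running it, which this sketch skips.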

BPF support was implemented in the Linux kernel in v2.5 and later, added mainly through the efforts of Jay Schulist. The BPF code remained unchanged until 2011, when Eric Dumazet reworked the BPF interpreter to run in JIT mode (source: "A JIT for packet filters"). After that, instead of interpreting BPF bytecode, the kernel could translate BPF programs directly into code for the target architecture: x86, ARM, MIPS, etc.

Later, in 2014, Alexei Starovoitov proposed a new JIT mechanism for BPF. In fact, this new JIT became a new BPF-based architecture and was named eBPF. I think both VMs coexisted for some time, but packet filtering is now implemented on top of eBPF. In fact, in many modern documentation examples BPF is understood to mean eBPF, and classic BPF is nowadays known as cBPF.

eBPF extends the classic BPF virtual machine in several ways:

  • It is based on modern 64-bit architectures. eBPF uses 64-bit registers and increases the number of available registers from 2 (the accumulator and X) to 10 general-purpose registers, plus a read-only frame pointer. eBPF also provides additional opcodes (BPF_MOV, BPF_JNE, BPF_CALL...).
  • It is decoupled from the networking subsystem. BPF was tied to the packet data model. Since it was used for packet filtering, its code lived in the subsystem that provides network communication. The eBPF virtual machine, however, is no longer tied to a data model and can be used for any purpose. So an eBPF program can now be attached to a tracepoint or a kprobe. This opens the door to eBPF instrumentation, performance analysis, and many other use cases in the context of other kernel subsystems. The eBPF code now lives in its own path: kernel/bpf.
  • Global data stores called maps. Maps are key-value stores that allow data to be exchanged between user space and kernel space. eBPF provides several types of maps.
  • Helper functions. In particular, to rewrite a packet, compute a checksum, or clone a packet. These functions run inside the kernel and are not user-space programs. You can also make system calls from eBPF programs.
  • Tail calls. The size of an eBPF program is limited to 4096 instructions. The tail-call feature allows an eBPF program to pass control to a new eBPF program and thus get around this limit (up to 32 programs can be chained this way).
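As a small illustration of the 64-bit instruction format, the sketch below encodes a two-instruction eBPF program, "r0 = 0; exit", by hand. The struct mirrors the layout of struct bpf_insn from the kernel's linux/bpf.h header, but it is redeclared here under the made-up name struct ebpf_insn so that the example compiles without kernel headers. The EBPF_* macro names are likewise local to this example; their values are those of the corresponding BPF_* constants. The bit-field layout relies on GCC and Clang behavior.

```c
#include <stdint.h>

/* Same layout as struct bpf_insn in <linux/bpf.h>; redeclared for portability. */
struct ebpf_insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate constant */
};

/* Local names for this sketch; values match BPF_ALU64, BPF_JMP, BPF_K, BPF_MOV, BPF_EXIT. */
#define EBPF_ALU64 0x07  /* instruction class: 64-bit arithmetic */
#define EBPF_JMP   0x05  /* instruction class: jumps, calls, exit */
#define EBPF_K     0x00  /* source operand is the immediate */
#define EBPF_MOV   0xb0  /* operation: move */
#define EBPF_EXIT  0x90  /* operation: return from the program */

/* r0 = 0; exit. Register r0 carries the program's return value. */
const struct ebpf_insn return_zero[] = {
    { .code = EBPF_ALU64 | EBPF_MOV | EBPF_K, .dst_reg = 0, .imm = 0 },
    { .code = EBPF_JMP | EBPF_EXIT },
};
```

Every instruction is eight bytes wide, so this program occupies 16 bytes; its two opcodes come out as 0xb7 and 0x95.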

eBPF: an example

There are several eBPF examples in the Linux kernel sources. They can be found in samples/bpf/. To build these examples, simply run:

$ sudo make samples/bpf/

I won't write a new eBPF example myself, but will use one of the samples available in samples/bpf/. I'll look at parts of the code and explain how it works. As an example, I chose the tracex4 program.

In general, each of the examples in samples/bpf/ consists of two files. In this case:

  • tracex4_kern.c, which contains the source code to be executed in the kernel as eBPF bytecode.
  • tracex4_user.c, which contains the user-space program.

In this case we need to compile tracex4_kern.c into eBPF bytecode. At the time of writing, gcc has no eBPF backend. Fortunately, clang can emit eBPF bytecode. The Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned above that one of the most interesting features of eBPF is maps. tracex4_kern defines one map:

struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};

BPF_MAP_TYPE_HASH is one of the many map types offered by eBPF. In this case it is simply a hash table. You may also have noticed the SEC("maps") declaration. SEC is a macro used to create a new section in the binary file. In fact, the tracex4_kern example defines two more sections:

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&my_map, &ptr); 
    return 0;
}
    
SEC("kretprobe/kmem_cache_alloc_node") 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get the IP address of the caller of kmem_cache_alloc_node()
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}   

These two functions delete an entry from the map (kprobe/kmem_cache_free) and add a new entry to the map (kretprobe/kmem_cache_alloc_node). All function names written in capital letters correspond to macros defined in bpf_helpers.h.
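The bookkeeping these two probes perform can be tried out in plain user-space C. The sketch below replaces the kernel hash map with a toy fixed-size table searched linearly; struct toy_map and the toy_*, on_alloc and on_free functions are hypothetical names used only here, and a real BPF map is of course implemented very differently. The point is the invariant: the map holds exactly the objects that have been allocated and not yet freed.

```c
#include <stddef.h>
#include <stdint.h>

struct pair { uint64_t val; uint64_t ip; };

#define TOY_MAX_ENTRIES 8  /* tiny on purpose; the sample uses 1000000 */

struct toy_map {
    int         used[TOY_MAX_ENTRIES];
    long        key[TOY_MAX_ENTRIES];
    struct pair value[TOY_MAX_ENTRIES];
};

/* Return the slot holding key, or -1 if the key is absent. */
static int toy_find(const struct toy_map *m, long key)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++)
        if (m->used[i] && m->key[i] == key)
            return i;
    return -1;
}

/* Insert or overwrite, in the spirit of BPF_ANY. Returns 0, or -1 when full. */
int toy_update(struct toy_map *m, long key, struct pair v)
{
    int i = toy_find(m, key);

    for (int j = 0; i < 0 && j < TOY_MAX_ENTRIES; j++)
        if (!m->used[j])
            i = j;               /* first free slot */
    if (i < 0)
        return -1;
    m->used[i] = 1;
    m->key[i] = key;
    m->value[i] = v;
    return 0;
}

/* Remove key. Returns 0, or -1 if it was not present. */
int toy_delete(struct toy_map *m, long key)
{
    int i = toy_find(m, key);

    if (i < 0)
        return -1;
    m->used[i] = 0;
    return 0;
}

const struct pair *toy_lookup(const struct toy_map *m, long key)
{
    int i = toy_find(m, key);

    return i < 0 ? NULL : &m->value[i];
}

/* Mirrors bpf_prog2: record when and from where the object was allocated. */
void on_alloc(struct toy_map *m, long ptr, uint64_t now_ns, uint64_t caller_ip)
{
    struct pair v = { .val = now_ns, .ip = caller_ip };

    toy_update(m, ptr, v);
}

/* Mirrors bpf_prog1: the object is gone, so forget it. */
void on_free(struct toy_map *m, long ptr)
{
    toy_delete(m, ptr);
}
```

After on_alloc(&m, p, t, ip) a lookup of p returns the recorded pair, and after on_free(&m, p) the lookup returns NULL again, so whatever remains in the map is a still-live allocation.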

If I dump the sections of the object file, I should see that these new sections are defined:

$ objdump -h tracex4_kern.o

tracex4_kern.o: file format elf64-little

Sections:
Idx Name                             Size      VMA               LMA               File off  Algn
  0 .text                            00000000  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free           00000048  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node  000000c0  0000000000000000  0000000000000000  00000088  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps                             0000001c  0000000000000000  0000000000000000  00000148  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  4 license                          00000004  0000000000000000  0000000000000000  00000164  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  5 version                          00000004  0000000000000000  0000000000000000  00000168  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame                        00000050  0000000000000000  0000000000000000  00000170  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

There is also tracex4_user.c, the main program. Basically, this program listens for kmem_cache_alloc_node events. When such an event occurs, the corresponding eBPF code is executed. The code stores the object's IP attribute in a map, and the main program prints the contents of the map in a loop. Example:

$ sudo ./tracex4
obj 0xffff8d6430f60a00 is 2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is 6sec old was allocated at ip ffffffff98090e8f
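Each output line can be derived from one map entry: the key is the object's address, and the stored pair holds the allocation timestamp and the caller's address. The helper below is a sketch of that computation and not the sample's actual code. The function name format_old_object is invented, and the one-second threshold for reporting an object is an assumption made for this illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct pair { uint64_t val; uint64_t ip; };

#define NSEC_PER_SEC 1000000000ULL

/*
 * Format one report line for an object that is still allocated.
 * key is the object address, v->val its allocation time in nanoseconds.
 * Returns 1 if a line was written, 0 if the object is younger than one
 * second (assumed threshold) or the timestamps are inconsistent.
 */
int format_old_object(char *buf, size_t size, uint64_t key,
                      const struct pair *v, uint64_t now_ns)
{
    if (now_ns < v->val)
        return 0;

    uint64_t age_ns = now_ns - v->val;

    if (age_ns < NSEC_PER_SEC)
        return 0;
    snprintf(buf, size,
             "obj 0x%" PRIx64 " is %" PRIu64 "sec old was allocated at ip %" PRIx64,
             key, age_ns / NSEC_PER_SEC, v->ip);
    return 1;
}
```

The age is an integer division, so an object allocated 2.5 seconds ago is reported as 2 seconds old.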

How are the user-space program and the eBPF program connected? On startup, tracex4_user.c loads the tracex4_kern.o object file using the load_bpf_file function.

int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;

    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);

    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }

    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);
        return 1;
    }

    for (i = 0; ; i++) {
        print_old_objects(map_fd[1]);
        sleep(1);
    }

    return 0;
}

When load_bpf_file runs, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. We are now listening for those events, and our program can do something when they occur.

$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
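The link between the SEC() names in tracex4_kern.c and these two lines is a simple naming convention: the prefix selects the probe type (p for a kprobe, r for a kretprobe) and the rest is the kernel symbol. The function below sketches that mapping. It is an illustration of the convention, not the loader's real code: kprobe_event_line is a made-up name, and spelling out the kprobes/ group is a choice made here so that the result matches the listing.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn an ELF section name such as "kprobe/kmem_cache_free" into a
 * kprobe_events definition line. Returns 0 on success and -1 for
 * sections that do not describe a kprobe or kretprobe.
 */
int kprobe_event_line(const char *section, char *buf, size_t size)
{
    const char *sym;
    char type;

    if (strncmp(section, "kprobe/", 7) == 0) {
        type = 'p';                 /* probe on function entry */
        sym = section + 7;
    } else if (strncmp(section, "kretprobe/", 10) == 0) {
        type = 'r';                 /* probe on function return */
        sym = section + 10;
    } else {
        return -1;
    }
    if (*sym == '\0')
        return -1;                  /* prefix without a symbol name */
    snprintf(buf, size, "%c:kprobes/%s %s", type, sym, sym);
    return 0;
}
```

Sections such as maps, license or version are not probes, so the function reports -1 for them.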

All the other programs in samples/bpf/ are structured the same way. They always consist of two files:

  • XXX_kern.c: the eBPF program.
  • XXX_user.c: the main program.

The eBPF program defines maps and functions attached to a section. When the kernel emits an event of a certain type (for example, a tracepoint), the attached functions are executed. Maps provide communication between the kernel program and the user-space program.

Conclusion

This article discussed BPF and eBPF in general terms. I know there is a lot of information and many resources about eBPF today, so I will recommend a few more resources for further study.

I recommend reading:

Source: www.habr.com
