A Brief Introduction to BPF and eBPF

Hi Habr! We would like to let you know that we are preparing to publish the book "Linux Observability with BPF".

Since the BPF virtual machine keeps evolving and is widely used in practice, we have translated for you an article that describes its main features and current state.

In recent years, programming tools and techniques have gained popularity that compensate for the limitations of the Linux kernel in cases where high-performance packet processing is required. One of the most popular techniques of this kind is called kernel bypass: it skips the kernel's network layer and performs all packet processing in user space. Kernel bypass also means managing the network card from user space. In other words, when working with the network card we rely on a user-space driver.

By handing full control of the network card to a user-space program, we reduce kernel overhead (context switches, network-layer processing, interrupts, and so on), which matters a great deal at speeds of 10 Gb/s or higher. Kernel bypass combined with other features (batch processing) and careful performance tuning (NUMA awareness, CPU isolation, and so on) forms the basis of high-performance user-space networking. Perhaps the best-known example of this approach to packet processing is DPDK from Intel (Data Plane Development Kit), although there are other well-known tools and techniques, including VPP from Cisco (Vector Packet Processing), Netmap and, of course, Snabb.

Building networking in user space has a number of drawbacks:

  • An OS kernel is an abstraction layer for hardware resources. Because user-space programs must manage their resources directly, they must also manage their own hardware. That often means writing your own drivers.
  • Since we skip kernel space entirely, we also skip all the networking functionality the kernel provides. User-space programs have to reimplement features that the kernel or the operating system may already offer.
  • Programs run as sandboxes, which severely limits their ability to interact and integrate with other parts of the operating system.

Essentially, user-space networking achieves its performance gains by moving packet processing out of the kernel and into user space. XDP does exactly the opposite: it moves user-space networking programs (filters, mappers, routing, and so on) into the kernel's realm. XDP lets us run our network function as soon as a packet hits the network interface and before it starts travelling up into the kernel's networking subsystem. As a result, packet-processing speed increases significantly. But how does the kernel allow a user to run their programs in kernel space? Before answering that question, let's look at what BPF is.

BPF and eBPF

Despite its somewhat misleading name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This virtual machine was originally designed to handle packet filtering, hence the name.

One of the best-known tools that uses BPF is tcpdump. When capturing packets with tcpdump, a user can specify an expression to filter packets. Only packets that match the expression are captured. For example, the expression "tcp dst port 80" matches all TCP packets whose destination port is 80. The compiler reduces this expression by converting it to BPF bytecode.

$ sudo tcpdump -d "tcp dst port 80"
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 6
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 15
(004) ldh      [56]
(005) jeq      #0x50            jt 14   jf 15
(006) jeq      #0x800           jt 7    jf 15
(007) ldb      [23]
(008) jeq      #0x6             jt 9    jf 15
(009) ldh      [20]
(010) jset     #0x1fff          jt 15   jf 11
(011) ldxb     4*([14]&0xf)
(012) ldh      [x + 16]
(013) jeq      #0x50            jt 14   jf 15
(014) ret      #262144
(015) ret      #0

This is essentially what the program above does:

  • Instruction (000): loads the 16-bit word at packet offset 12 into the accumulator. Offset 12 corresponds to the packet's ethertype.
  • Instruction (001): compares the value in the accumulator with 0x86dd, the ethertype value for IPv6. If the result is true, the program counter jumps to instruction (002); otherwise it jumps to (006).
  • Instruction (006): compares the value with 0x800 (the ethertype value for IPv4). If the result is true, the program jumps to (007); otherwise to (015).

And so on, until the packet-filtering program returns a result, which is usually a boolean. Returning a non-zero value (instruction (014)) means the packet matched, and returning zero (instruction (015)) means it did not.
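The control flow of the bytecode listing can be mirrored in plain C, which may make the jumps easier to follow. This is only a sketch: the function name match_tcp_dst_port_80 and the helper load_half are hypothetical, and the code assumes an Ethernet frame passed as a byte buffer. Like the classic BPF machine, it treats any load outside the packet as "no match".

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit word; returns 0 if it lies outside the packet. */
static int load_half(const uint8_t *pkt, size_t len, size_t off, uint16_t *out)
{
    if (off + 2 > len)
        return 0;
    *out = (uint16_t)((pkt[off] << 8) | pkt[off + 1]);
    return 1;
}

/* Hypothetical name. Returns 262144 on a match and 0 otherwise,
 * following the numbered instructions of the tcpdump -d listing. */
uint32_t match_tcp_dst_port_80(const uint8_t *pkt, size_t len)
{
    uint16_t ethertype, frag, port;

    if (!load_half(pkt, len, 12, &ethertype))     /* (000) ethertype */
        return 0;

    if (ethertype == 0x86dd) {                    /* (001) IPv6 branch */
        if (len <= 20 || pkt[20] != 0x6)          /* (002)-(003) TCP? */
            return 0;
        if (!load_half(pkt, len, 56, &port))      /* (004) dst port */
            return 0;
        return port == 0x50 ? 262144 : 0;         /* (005) port 80? */
    }

    if (ethertype != 0x0800)                      /* (006) IPv4 branch */
        return 0;
    if (len <= 23 || pkt[23] != 0x6)              /* (007)-(008) TCP? */
        return 0;
    if (!load_half(pkt, len, 20, &frag))          /* (009) flags + offset */
        return 0;
    if (frag & 0x1fff)                            /* (010) later fragment */
        return 0;

    size_t x = 4 * (size_t)(pkt[14] & 0x0f);      /* (011) IPv4 header len */
    if (!load_half(pkt, len, x + 16, &port))      /* (012) dst port */
        return 0;
    return port == 0x50 ? 262144 : 0;             /* (013)-(015) */
}
```

The return value 262144 is simply the non-zero value from instruction (014); any non-zero result means the packet matched.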

The BPF virtual machine and its bytecode were introduced by Steve McCanne and Van Jacobson in late 1992, when their paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture" came out. The technique was first presented at the Usenix conference in the winter of 1993.

Because BPF is a virtual machine, it defines the environment in which programs run. Besides the bytecode, it also defines a packet-based memory model (load instructions implicitly operate on the packet being processed), registers (A and X: the accumulator and the index register), a scratch memory store, and an implicit program counter. Interestingly, the BPF bytecode was modelled on the Motorola 6502 ISA. As Steve McCanne recalls in his Sharkfest '11 keynote, he had known 6502 assembly since high school, when he programmed on an Apple II, and that knowledge influenced his design of the BPF bytecode.

BPF support has been available in the Linux kernel since v2.5, added mainly by Jay Schulist. The BPF code remained unchanged until 2011, when Eric Dumazet reworked the BPF interpreter to run in JIT mode (source: A JIT for packet filters). After that, instead of interpreting BPF bytecode, the kernel could translate BPF programs directly into the target architecture: x86, ARM, MIPS, and so on.

Later, in 2014, Alexei Starovoitov proposed a new JIT for BPF. In fact, this new JIT was a new architecture based on BPF, and it became known as eBPF. I think both VMs coexisted for some time, but nowadays packet filtering is implemented on top of eBPF. In fact, much of today's documentation uses "BPF" to mean eBPF, and classic BPF is now referred to as cBPF.

eBPF extends the classic BPF virtual machine in several ways:

  • It builds on modern 64-bit architectures. eBPF uses 64-bit registers and increases the number of available registers from 2 (the accumulator and X) to 10. eBPF also adds more opcodes (BPF_MOV, BPF_JNE, BPF_CALL…).
  • It is decoupled from the networking subsystem. BPF was bound to a packet-based data model. Since it was used for packet filtering, its code lived in the networking subsystem. The eBPF virtual machine, however, is no longer tied to a data model and can be used for any purpose. An eBPF program can now be attached to a tracepoint or to a kprobe. This opens the door to eBPF-based instrumentation, performance analysis, and many other use cases within other kernel subsystems. The eBPF code now lives in its own path: kernel/bpf.
  • Global data stores called Maps. Maps are key-value stores that allow data to be exchanged between user space and kernel space. eBPF provides several types of maps.
  • Helper functions, for example to rewrite a packet, compute a checksum, or clone a packet. These functions run inside the kernel and are not part of user-space programs. In addition, system calls can be made from eBPF programs.
  • Tail calls. The size of an eBPF program is limited (historically to 4096 instructions). The tail-call feature lets an eBPF program pass control to a new eBPF program and thereby work around this limit (up to 32 programs can be chained this way).
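To make the wider register file and the extended opcodes a little more concrete, here is a rough sketch of how a single eBPF instruction is encoded. The struct mirrors the field layout of struct bpf_insn from the kernel's linux/bpf.h header, and the two opcode values are the ones that header yields for a 64-bit "move immediate" and for "exit". The names insn_sketch, mov64_imm, exit_insn and build_return_zero are hypothetical, and the 8-byte size relies on the bit-field packing that gcc and clang use.

```c
#include <stdint.h>

/* Mirrors the field layout of struct bpf_insn in <linux/bpf.h>. */
struct insn_sketch {
    uint8_t code;         /* opcode */
    uint8_t dst_reg : 4;  /* destination register number */
    uint8_t src_reg : 4;  /* source register number */
    int16_t off;          /* signed offset (jumps, memory access) */
    int32_t imm;          /* signed immediate constant */
};

enum {
    OP_MOV64_IMM = 0xb7,  /* BPF_ALU64 | BPF_MOV | BPF_K */
    OP_EXIT      = 0x95   /* BPF_JMP | BPF_EXIT */
};

/* Encode "dst = imm" as a 64-bit move of an immediate value. */
static struct insn_sketch mov64_imm(uint8_t dst, int32_t imm)
{
    struct insn_sketch i = { .code = OP_MOV64_IMM, .dst_reg = dst, .imm = imm };
    return i;
}

/* Encode "exit": return from the program with the value held in r0. */
static struct insn_sketch exit_insn(void)
{
    struct insn_sketch i = { .code = OP_EXIT };
    return i;
}

/* Smallest useful program: r0 = 0; exit. */
static void build_return_zero(struct insn_sketch prog[2])
{
    prog[0] = mov64_imm(0, 0);
    prog[1] = exit_insn();
}
```

Every instruction occupies the same fixed number of bytes, so a program is simply an array of such structs. The sketch only builds the array in memory; it does not load anything into the kernel.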

An eBPF example

The Linux kernel sources include several eBPF examples. They are available in samples/bpf/. To compile these examples, simply type:

$ sudo make samples/bpf/

I won't write a new eBPF example myself; instead I'll use one of the samples available in samples/bpf/. I'll go through some parts of the code and explain how it works. As an example, I chose the tracex4 program.

In general, each example in samples/bpf/ consists of two files. In this case:

  • tracex4_kern.c contains the source code that will be executed in the kernel as eBPF bytecode.
  • tracex4_user.c contains the user-space program.

In this case, we need to compile tracex4_kern.c to eBPF bytecode. At the time of writing, gcc had no backend for eBPF. Fortunately, clang can emit eBPF bytecode. The Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned above that one of the most interesting features of eBPF is maps. tracex4_kern defines one map:

struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};

BPF_MAP_TYPE_HASH is one of the many map types that eBPF offers. In this case, it is simply a hash table. You may also have noticed the SEC("maps") declaration. SEC is a macro used to create a new section in the binary file. In fact, the tracex4_kern example defines two more sections:

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&my_map, &ptr); 
    return 0;
}
    
SEC("kretprobe/kmem_cache_alloc_node") 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get the ip address of the caller of kmem_cache_alloc_node()
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}   

These two functions remove an entry from the map (kprobe/kmem_cache_free) and add a new entry to the map (kretprobe/kmem_cache_alloc_node). All the function names written in capital letters are macros defined in bpf_helpers.h.
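The bookkeeping that the two probes perform can be modelled in ordinary user-space C, without any kernel involvement. The sketch below is an assumption-laden stand-in, not the sample's code: a small fixed array replaces the BPF hash map, and the names on_alloc_return, on_free, lookup and SLOTS are hypothetical. It shows the intended semantics: an allocation inserts or overwrites the entry for a pointer (as BPF_ANY does), and a free deletes it.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64   /* tiny on purpose; the sample's map allows 1000000 entries */

struct pair {
    uint64_t val;  /* allocation timestamp in nanoseconds */
    uint64_t ip;   /* address of the code that called the allocator */
};

struct slot {
    int used;
    long key;
    struct pair value;
};

static struct slot table[SLOTS];

/* Linear scan for an occupied slot holding this key. */
static struct slot *find(long key)
{
    for (size_t i = 0; i < SLOTS; i++)
        if (table[i].used && table[i].key == key)
            return &table[i];
    return NULL;
}

/* Models bpf_prog2: record (or overwrite) the entry for ptr. */
int on_alloc_return(long ptr, uint64_t now_ns, uint64_t ip)
{
    struct slot *s = find(ptr);

    for (size_t i = 0; !s && i < SLOTS; i++)
        if (!table[i].used)
            s = &table[i];
    if (!s)
        return -1;  /* table full */

    s->used = 1;
    s->key = ptr;
    s->value.val = now_ns;
    s->value.ip = ip;
    return 0;
}

/* Models bpf_prog1: forget the object once it has been freed. */
int on_free(long ptr)
{
    struct slot *s = find(ptr);

    if (!s)
        return -1;  /* unknown pointer */
    s->used = 0;
    return 0;
}

/* What a reader of the map would see for a given pointer. */
const struct pair *lookup(long ptr)
{
    struct slot *s = find(ptr);
    return s ? &s->value : NULL;
}
```

Entries that survive in the table belong to objects that were allocated but not yet freed, which is why the age of each remaining entry is meaningful.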

If I dump the sections of the object file, I should see that these new sections are defined:

$ objdump -h tracex4_kern.o

tracex4_kern.o: file format elf64-little

Sections:
Idx Name                             Size      VMA               LMA               File off  Algn
  0 .text                            00000000  0000000000000000  0000000000000000  00000040  2**2
                                     CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free           00000048  0000000000000000  0000000000000000  00000040  2**3
                                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node  000000c0  0000000000000000  0000000000000000  00000088  2**3
                                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps                             0000001c  0000000000000000  0000000000000000  00000148  2**2
                                     CONTENTS, ALLOC, LOAD, DATA
  4 license                          00000004  0000000000000000  0000000000000000  00000164  2**0
                                     CONTENTS, ALLOC, LOAD, DATA
  5 version                          00000004  0000000000000000  0000000000000000  00000168  2**2
                                     CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame                        00000050  0000000000000000  0000000000000000  00000170  2**3
                                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

There is also tracex4_user.c, the main program. Essentially, this program listens for kmem_cache_alloc_node events. When such an event occurs, the corresponding eBPF code is executed. The code stores the IP attribute of the object in a map, and the main program then prints the map's contents in a loop. For example:

$ sudo ./tracex4
obj 0xffff8d6430f60a00 is 2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is 6sec old was allocated at ip ffffffff98090e8f

How are the user-space program and the eBPF program connected? On startup, tracex4_user.c loads the object file tracex4_kern.o using the load_bpf_file function.

int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;

    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);

    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }

    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);
        return 1;
    }

    for (i = 0; ; i++) {
        print_old_objects(map_fd[1]);
        sleep(1);
    }

    return 0;
}

When load_bpf_file runs, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. We are now listening for these events, and our program can do something when they occur.

$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
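The two lines in that listing follow a simple pattern: "p" marks a probe at function entry, "r" marks a return probe, followed by a group/event name and the probed symbol. As a small illustration, the sketch below formats such a line. The function name format_kprobe_event is hypothetical, and the code only reproduces the line format as it appears when the file is read back; it is not the code that load_bpf_file uses.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a kprobe_events-style line into buf.
 * is_return selects a return probe ("r") over an entry probe ("p").
 * Returns the value of snprintf (the length the line needs). */
int format_kprobe_event(char *buf, size_t size, int is_return, const char *func)
{
    return snprintf(buf, size, "%c:kprobes/%s %s",
                    is_return ? 'r' : 'p', func, func);
}
```

For example, passing is_return = 1 and "kmem_cache_alloc_node" produces the second line of the listing.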

All the other programs in samples/bpf/ are structured in the same way. They always consist of two files:

  • XXX_kern.c: the eBPF program.
  • XXX_user.c: the main program.

The eBPF program defines maps and functions, each tied to a section. When the kernel emits an event of a certain type (for example, a tracepoint), the attached functions are executed. Maps provide communication between the kernel program and the user-space program.

Conclusion

This article covered BPF and eBPF in general terms. I know there is a lot of information and many resources about eBPF today, so I'll recommend a few more materials for further study.

I recommend reading:

  • BPF: the universal in-kernel virtual machine, by Jonathan Corbet. An introduction to BPF and how it evolved into eBPF.
  • A thorough introduction to eBPF, by Brendan Gregg. An article from LWN.net. Brendan tweets regularly about eBPF and maintains a list of resources on the topic on his blog.
  • Notes on BPF & eBPF, by Julia Evans. Notes on Suchakra Sharma's presentation "The BSD Packet Filter: A New Architecture for User-level Packet Capture". The notes are good and help a lot in understanding the slides.
  • eBPF, part 1: Past, Present, and Future, by Ferris Ellis. A long read with a follow-up, but well worth it. One of the best eBPF articles I have come across.

source: www.habr.com
