A Brief Introduction to BPF and eBPF

Hello, Habr! We would like to let you know that we are preparing the book "Linux Observability with BPF" for release.

Since the BPF virtual machine keeps evolving and is actively used in practice, we have translated for you an article describing its main capabilities and current state.

In recent years, programming tools and techniques for working around the limitations of the Linux kernel have become popular in cases where high-performance packet processing is required. One of the most popular techniques of this kind is called kernel bypass: it skips the kernel's network layer and performs all packet processing in user space. Bypassing the kernel also means controlling the network card from user space. In other words, when working with the network card, we rely on a user-space driver.

By handing full control of the network card to a user-space program, we reduce kernel overhead (context switching, network layer processing, interrupts, etc.), which matters a great deal when running at 10Gb/s or faster. Kernel bypass, combined with other techniques (batch processing) and careful performance tuning (NUMA awareness, CPU isolation, etc.), forms the foundation of high-performance user-space networking. Perhaps the prime example of this new approach to packet processing is Intel's DPDK (Data Plane Development Kit), although there are other well-known tools and techniques, including Cisco's VPP (Vector Packet Processing), Netmap and, of course, Snabb.

Organizing network interactions in user space has a number of drawbacks:

  • The OS kernel is an abstraction layer for hardware resources. Because user-space programs have to manage their resources directly, they also have to manage their own hardware. This often means writing their own drivers.
  • Because we leave kernel space entirely, we also give up all the networking functionality the kernel provides. User-space programs have to reimplement features that the kernel or the operating system may already provide.
  • Programs run as sandboxes, which limits their ability to interact and integrate with the rest of the operating system.

In essence, user-space networking gains performance by moving packet processing out of the kernel and into user space. XDP (eXpress Data Path) does exactly the opposite: it moves networking programs (filters, resolvers, routing, etc.) from user space into kernel space. XDP lets us run a network function as soon as a packet hits the network interface, before it starts travelling up into the kernel's network subsystem. As a result, packet processing speed increases significantly. But how does the kernel allow a user to run their own programs in kernel space? Before answering that question, let's look at what BPF is.
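To make "running a network function as early as possible" more concrete, here is a user-space C sketch of the kind of decision an XDP program makes. It is not a loadable XDP program: the verdict values mirror `enum xdp_action` from `<linux/bpf.h>`, while the function name `xdp_filter` and the constants `ETH_HEADER_LEN` and `ETHERTYPE_IPV4` are hypothetical. The frame is passed as a `[data, data_end)` pair and every read is checked against `data_end` first, which is the pattern the kernel verifier enforces for XDP programs.

```c
#include <stdint.h>

/* Same values as the corresponding members of enum xdp_action in <linux/bpf.h>. */
enum xdp_verdict { XDP_ABORTED = 0, XDP_DROP = 1, XDP_PASS = 2, XDP_TX = 3 };

#define ETH_HEADER_LEN 14        /* dst MAC (6) + src MAC (6) + ethertype (2) */
#define ETHERTYPE_IPV4 0x0800    /* hypothetical local name */

/* Hypothetical filter: keep IPv4 frames, drop everything else.
 * [data, data_end) is the raw frame as delivered by the NIC driver,
 * before it enters the kernel network stack. */
enum xdp_verdict xdp_filter(const uint8_t *data, const uint8_t *data_end)
{
    /* Bounds check first: the verifier rejects XDP programs that read
     * packet bytes without proving they lie before data_end. */
    if (data_end - data < ETH_HEADER_LEN)
        return XDP_DROP;

    /* The ethertype is stored big-endian at offset 12 of the Ethernet header. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    return ethertype == ETHERTYPE_IPV4 ? XDP_PASS : XDP_DROP;
}
```

Dropping at this point means the kernel never spends any further work on the frame, which is where the speedup described above comes from.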

BPF and eBPF

Despite its confusing name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This virtual machine was originally designed for packet filtering, hence the name.

One of the best-known tools that uses BPF is tcpdump. When capturing packets with tcpdump, the user can specify a filter expression, and only packets matching that expression are captured. For example, the expression "tcp dst port 80" matches all TCP packets destined for port 80. tcpdump compiles this expression into BPF bytecode; the -d option prints the compiled program:

$ sudo tcpdump -d "tcp dst port 80"
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 6
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 15
(004) ldh      [56]
(005) jeq      #0x50            jt 14   jf 15
(006) jeq      #0x800           jt 7    jf 15
(007) ldb      [23]
(008) jeq      #0x6             jt 9    jf 15
(009) ldh      [20]
(010) jset     #0x1fff          jt 15   jf 11
(011) ldxb     4*([14]&0xf)
(012) ldh      [x + 16]
(013) jeq      #0x50            jt 14   jf 15
(014) ret      #262144
(015) ret      #0

Here is what the program above does:

  • Instruction (000): loads the 16-bit word at offset 12 of the packet into the accumulator. Offset 12 is where the ethertype field of the Ethernet header sits.
  • Instruction (001): compares the accumulator with 0x86dd, the ethertype value for IPv6. If the result is true, the program counter moves to instruction (002); otherwise it moves to (006).
  • Instruction (006): compares the value with 0x800 (the ethertype value for IPv4). If the result is true, the program moves to (007); otherwise it moves to (015).

And so on, until the packet-filtering program returns a result. This result is essentially a boolean: returning a non-zero value (instruction (014)) means the packet is accepted, and returning zero (instruction (015)) means it is rejected. Strictly speaking, the non-zero value is the number of bytes of the packet to capture; 262144 is tcpdump's default snapshot length.
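The trace above can be made concrete with a tiny interpreter. This is a sketch, assuming a Linux system with the UAPI header `<linux/filter.h>`, which provides `struct sock_filter` and the `BPF_STMT`/`BPF_JUMP` macros. The array is the same program re-expressed the way `tcpdump -dd` emits it (tcpdump prints raw numeric opcodes; the macros here are equivalent), with jump targets stored as offsets relative to the next instruction. `cbpf_run` and `tcp_dst_80` are hypothetical names, and only the handful of opcodes this filter needs are implemented, so this illustrates the A/X/program-counter model rather than the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>  /* struct sock_filter, BPF_STMT, BPF_JUMP, opcode macros */

/* "tcp dst port 80" from the listing above. jt/jf = n skips n instructions. */
static const struct sock_filter tcp_dst_80[] = {
    BPF_STMT(BPF_LD  | BPF_H | BPF_ABS, 12),               /* 000 ethertype    */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x86dd, 0, 4),     /* 001 IPv6? -> 006 */
    BPF_STMT(BPF_LD  | BPF_B | BPF_ABS, 20),               /* 002 next header  */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x06, 0, 11),      /* 003 TCP?         */
    BPF_STMT(BPF_LD  | BPF_H | BPF_ABS, 56),               /* 004 dst port     */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x50, 8, 9),       /* 005 port 80?     */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x800, 0, 8),      /* 006 IPv4?        */
    BPF_STMT(BPF_LD  | BPF_B | BPF_ABS, 23),               /* 007 protocol     */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x06, 0, 6),       /* 008 TCP?         */
    BPF_STMT(BPF_LD  | BPF_H | BPF_ABS, 20),               /* 009 frag offset  */
    BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, 0x1fff, 4, 0),    /* 010 fragment?    */
    BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 14),               /* 011 X = IP hlen  */
    BPF_STMT(BPF_LD  | BPF_H | BPF_IND, 16),               /* 012 dst port     */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x50, 0, 1),       /* 013 port 80?     */
    BPF_STMT(BPF_RET | BPF_K, 262144),                     /* 014 accept       */
    BPF_STMT(BPF_RET | BPF_K, 0),                          /* 015 reject       */
};

/* Returns the filter verdict (bytes to capture, 0 = reject).
 * Out-of-bounds loads also yield 0, as they do in the kernel. */
uint32_t cbpf_run(const struct sock_filter *prog, size_t len,
                  const uint8_t *pkt, size_t pkt_len)
{
    uint32_t A = 0, X = 0;   /* accumulator and index register */
    size_t pc = 0;           /* implicit program counter */

    while (pc < len) {
        const struct sock_filter *ins = &prog[pc++];
        uint32_t off;

        switch (ins->code) {
        case BPF_LD | BPF_H | BPF_ABS:
        case BPF_LD | BPF_H | BPF_IND:   /* IND adds X to the offset */
            off = ins->k + (BPF_MODE(ins->code) == BPF_IND ? X : 0);
            if ((size_t)off + 2 > pkt_len) return 0;
            A = (uint32_t)(pkt[off] << 8 | pkt[off + 1]);   /* network byte order */
            break;
        case BPF_LD | BPF_B | BPF_ABS:
            if (ins->k >= pkt_len) return 0;
            A = pkt[ins->k];
            break;
        case BPF_LDX | BPF_B | BPF_MSH:  /* X = 4 * (pkt[k] & 0xf): IPv4 header length */
            if (ins->k >= pkt_len) return 0;
            X = 4u * (pkt[ins->k] & 0x0f);
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (A == ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_JMP | BPF_JSET | BPF_K:
            pc += (A & ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_RET | BPF_K:
            return ins->k;
        default:
            return 0;                    /* opcode outside this sketch's subset */
        }
    }
    return 0;
}
```

Note how instruction (011) uses X to skip a variable-length IPv4 header: the destination port is then read at X + 16, i.e. 14 bytes of Ethernet header plus 2 bytes into the TCP header.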

The BPF virtual machine and its bytecode were introduced by Steven McCanne and Van Jacobson in late 1992, when their paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture" was published; the technology was first presented at the Usenix conference in the winter of 1993.

Because BPF is a virtual machine, it defines the environment in which programs run. Besides the bytecode, it also defines a packet-based memory model (load instructions implicitly operate on the packet being processed), registers (A and X: the accumulator and the index register), scratch memory storage, and an implicit program counter. Interestingly, the BPF bytecode was modeled on the Motorola 6502 ISA. As Steven McCanne recalled in his Sharkfest '11 keynote, he was familiar with 6502 assembly from programming the Apple II in high school, and that knowledge influenced his design of the BPF bytecode.

BPF support has been in the Linux kernel since v2.5, added mostly through the work of Jay Schulist. The BPF code then remained essentially unchanged until 2011, when Eric Dumazet reworked the BPF interpreter to run in JIT mode (source: "A JIT for packet filters"). From then on, instead of interpreting BPF bytecode, the kernel could translate BPF programs directly into native code for the target architecture: x86, ARM, MIPS, and so on.

Later, in 2014, Alexei Starovoitov proposed a new JIT mechanism for BPF. In fact, this new JIT became a new BPF-based architecture, called eBPF. I believe both VMs coexisted for a while, but packet filtering is now implemented on top of eBPF. Indeed, in much of today's documentation "BPF" is taken to mean eBPF, and classic BPF is now known as cBPF.

eBPF extends the classic BPF virtual machine in several ways:

  • It is built for modern 64-bit architectures. eBPF uses 64-bit registers and increases the number of available registers from 2 (the accumulator and X) to 10 general-purpose registers (R0 to R9), plus a read-only frame pointer (R10). eBPF also adds new opcodes (BPF_MOV, BPF_JNE, BPF_CALL...).
  • It is decoupled from the networking subsystem. Classic BPF was tied to a packet-based data model. Since it was used for packet filtering, its code lived in the networking subsystem. The eBPF virtual machine, however, is no longer tied to a data model and can be used for any purpose. For example, an eBPF program can now be attached to a tracepoint or a kprobe. This opens the door to eBPF-based instrumentation, performance analysis, and many other use cases in other kernel subsystems. The eBPF code now lives in its own directory: kernel/bpf.
  • Global data stores called maps. Maps are key-value stores that allow data to be exchanged between user space and kernel space. eBPF provides several types of maps.
  • Helper functions, for example for rewriting a packet, computing a checksum, or cloning a packet. These functions run inside the kernel, not in user space. In addition, eBPF programs can make system calls.
  • Tail calls. The size of an eBPF program was limited to 4096 instructions. The tail call feature lets an eBPF program transfer control to another eBPF program and thus get around this limit (up to 32 programs can be chained this way).
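The 64-bit design in the first point shows up directly in the instruction encoding. The sketch below mirrors the layout of `struct bpf_insn` from `<linux/bpf.h>` in a local struct; `ebpf_insn`, `ebpf_return_const` and the `EBPF_*` constants are local names chosen to avoid clashing with the real headers, but the opcode values match the kernel's. Every eBPF instruction is one fixed 64-bit word with 4-bit register fields, enough to address R0 to R10. The example builds a minimal two-instruction program that puts a constant into R0 (the return-value register) and exits.

```c
#include <stdint.h>

/* Local mirror of struct bpf_insn: one instruction = 8 bytes. */
struct ebpf_insn {
    uint8_t code;        /* opcode: instruction class | operation | source */
    uint8_t dst_reg:4;   /* destination register, R0..R10 */
    uint8_t src_reg:4;   /* source register */
    int16_t off;         /* signed offset (jumps, memory accesses) */
    int32_t imm;         /* signed 32-bit immediate */
};

/* Opcode pieces; values as in <linux/bpf.h> and <linux/bpf_common.h>. */
#define EBPF_JMP   0x05  /* jump class */
#define EBPF_ALU64 0x07  /* 64-bit ALU class (new in eBPF) */
#define EBPF_K     0x00  /* operand is the immediate */
#define EBPF_EXIT  0x90  /* return from the program */
#define EBPF_MOV   0xb0  /* dst = src (new in eBPF) */

/* Fill out[0..1] with "mov r0, imm; exit". */
void ebpf_return_const(struct ebpf_insn out[2], int32_t imm)
{
    out[0] = (struct ebpf_insn){ .code = EBPF_ALU64 | EBPF_MOV | EBPF_K,
                                 .dst_reg = 0, .imm = imm };
    out[1] = (struct ebpf_insn){ .code = EBPF_JMP | EBPF_EXIT };
}
```

The resulting opcode bytes are 0xb7 and 0x95, which is what a disassembler such as llvm-objdump shows for `r0 = 0` and `exit`.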

eBPF: an example

The Linux kernel sources contain many eBPF examples, located in samples/bpf/. To build them, just run:

$ sudo make samples/bpf/

Rather than writing a new eBPF example myself, I'll use one of the samples in samples/bpf/. I'll walk through some parts of the code and explain how it works. The program I've chosen is tracex4.

In general, each example in samples/bpf/ consists of two files. In this case:

  • tracex4_kern.c contains the source code to be executed in the kernel as eBPF bytecode.
  • tracex4_user.c contains the user-space program.

So we need to compile tracex4_kern.c into eBPF bytecode. At the time of writing, gcc had no eBPF backend. Fortunately, clang can emit eBPF bytecode, and the Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned above that maps are one of the most interesting features of eBPF. tracex4_kern defines one map:

struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};

BPF_MAP_TYPE_HASH is one of the many map types provided by eBPF; in this case it is simply a hash table. You may also have noticed the SEC("maps") declaration. SEC is a macro that places the definition in a new section of the binary file. In fact, the tracex4_kern example defines two more sections:

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&my_map, &ptr); 
    return 0;
}
    
SEC("kretprobe/kmem_cache_alloc_node") 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get the instruction pointer (ip) of the caller of kmem_cache_alloc_node()
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}   

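These two functions keep the map up to date. The first, attached as a kprobe, runs on entry to kmem_cache_free and deletes the freed object's address from the map (kprobe/kmem_cache_free). The second, attached as a kretprobe, runs when kmem_cache_alloc_node returns and adds a new entry keyed by the returned pointer, holding the allocation time and the caller's address (kretprobe/kmem_cache_alloc_node). All names written in capital letters are macros defined in bpf_helpers.h.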
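The interplay between the two probes and the map can be modeled in plain user-space C. In this sketch the map is a small fixed-size array rather than a real BPF hash map, and `map_update`, `map_delete`, `on_alloc`, `on_free` and `map_count` are hypothetical helpers: `map_update` follows the insert-or-overwrite behavior of `bpf_map_update_elem` with `BPF_ANY`, and `map_delete` plays the role of `bpf_map_delete_elem`. After a matching alloc/free pair, the object disappears from the map, so only objects that are still allocated remain.

```c
#include <stddef.h>
#include <stdint.h>

struct pair { uint64_t val; uint64_t ip; };   /* same layout as in tracex4_kern.c */

/* Hypothetical stand-in for my_map. A linear array replaces the hash table,
 * since only the update/delete semantics matter here. */
#define MAP_CAP 128
static struct { long key; struct pair v; int used; } map[MAP_CAP];

/* Like bpf_map_update_elem(..., BPF_ANY): insert or overwrite.
 * Returns 0 on success, -1 if the map is full. */
static int map_update(long key, struct pair v)
{
    int free_slot = -1;
    for (int i = 0; i < MAP_CAP; i++) {
        if (map[i].used && map[i].key == key) { map[i].v = v; return 0; }
        if (!map[i].used && free_slot < 0) free_slot = i;
    }
    if (free_slot < 0) return -1;
    map[free_slot].key = key;
    map[free_slot].v = v;
    map[free_slot].used = 1;
    return 0;
}

/* Like bpf_map_delete_elem(): returns 0 if the key existed, -1 otherwise. */
static int map_delete(long key)
{
    for (int i = 0; i < MAP_CAP; i++)
        if (map[i].used && map[i].key == key) { map[i].used = 0; return 0; }
    return -1;
}

/* Role of bpf_prog2 (kretprobe on kmem_cache_alloc_node):
 * remember when and from where ptr was allocated. */
static void on_alloc(long ptr, uint64_t now_ns, uint64_t caller_ip)
{
    map_update(ptr, (struct pair){ .val = now_ns, .ip = caller_ip });
}

/* Role of bpf_prog1 (kprobe on kmem_cache_free): forget ptr once freed. */
static void on_free(long ptr) { map_delete(ptr); }

/* Number of objects currently tracked (allocated but not yet freed). */
static size_t map_count(void)
{
    size_t n = 0;
    for (int i = 0; i < MAP_CAP; i++) n += (size_t)map[i].used;
    return n;
}
```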

If I dump the sections of the object file, these new sections should be listed:

$ objdump -h tracex4_kern.o

tracex4_kern.o: file format elf64-little

Sections:
Idx Name                            Size      VMA               LMA               File off  Algn
  0 .text                           00000000  0000000000000000  0000000000000000  00000040  2**2
                                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free          00000048  0000000000000000  0000000000000000  00000040  2**3
                                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node 000000c0  0000000000000000  0000000000000000  00000088  2**3
                                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps                            0000001c  0000000000000000  0000000000000000  00000148  2**2
                                    CONTENTS, ALLOC, LOAD, DATA
  4 license                         00000004  0000000000000000  0000000000000000  00000164  2**0
                                    CONTENTS, ALLOC, LOAD, DATA
  5 version                         00000004  0000000000000000  0000000000000000  00000168  2**2
                                    CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame                       00000050  0000000000000000  0000000000000000  00000170  2**3
                                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
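The maps, kprobe/... and kretprobe/... sections in this listing come from the SEC macro, which in the samples' bpf_helpers.h is essentially `#define SEC(NAME) __attribute__((section(NAME), used))`, a plain compiler attribute. The user-space sketch below applies the same attribute to a trimmed-down, hypothetical map descriptor (`struct map_def` is not the samples' `struct bpf_map_def`) and then enumerates the "maps" section through the `__start_maps`/`__stop_maps` symbols that the GNU linker generates for sections whose names are valid C identifiers. This loosely mirrors how a loader walks the maps section of tracex4_kern.o; it assumes a GNU toolchain on Linux.

```c
#include <stddef.h>

/* As in the samples' bpf_helpers.h: put the next object into ELF section NAME
 * and keep it even though nothing references it directly. */
#define SEC(NAME) __attribute__((section(NAME), used))

/* Hypothetical, trimmed-down map descriptor. */
struct map_def {
    unsigned int type;         /* 1 == BPF_MAP_TYPE_HASH */
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
};

struct map_def SEC("maps") demo_map = { 1, sizeof(long), 16, 1000000 };

/* Defined by the linker around the "maps" output section. */
extern struct map_def __start_maps[], __stop_maps[];

/* How many map descriptors ended up in the "maps" section. */
size_t count_maps(void)
{
    return (size_t)(__stop_maps - __start_maps);
}
```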

Then there is tracex4_user.c, the main program. Essentially, this program listens for kmem_cache_alloc_node and kmem_cache_free events. When such an event occurs, the corresponding eBPF code is executed. The eBPF code stores each object's allocation time and the caller's instruction pointer (ip) in the map, and the main program prints the objects in a loop. Example output:

$ sudo ./tracex4
obj 0xffff8d6430f60a00 is 2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is 6sec old was allocated at ip ffffffff98090e8f
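Each output line can be derived from a map entry with a little arithmetic: the age is the current time minus the stored bpf_ktime_get_ns() timestamp, converted to whole seconds. The sketch below is a hypothetical `format_old_object` helper, not the sample's own print_old_objects; skipping objects younger than one second is an assumption of this sketch, and `now_ns` must come from the same clock as bpf_ktime_get_ns() (CLOCK_MONOTONIC on Linux).

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define NSEC_PER_SEC 1000000000ULL

/* Format one tracex4-style line for an object still present in the map.
 * Returns the snprintf result, or 0 (writing nothing) for objects younger
 * than one second. alloc_ns and now_ns must share the same clock. */
int format_old_object(char *buf, size_t len, uint64_t obj_addr,
                      uint64_t alloc_ns, uint64_t ip, uint64_t now_ns)
{
    uint64_t age_ns = now_ns - alloc_ns;
    if (age_ns < NSEC_PER_SEC)
        return 0;
    return snprintf(buf, len,
                    "obj 0x%" PRIx64 " is %" PRIu64 "sec old was allocated at ip %" PRIx64,
                    obj_addr, age_ns / NSEC_PER_SEC, ip);
}
```

For instance, an object allocated at 1.0 s and inspected at 3.5 s reproduces the first line shown above.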

How are the user-space program and the eBPF program connected? On startup, tracex4_user.c loads the object file tracex4_kern.o using the load_bpf_file function:

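#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>
#include "bpf_load.h"   /* load_bpf_file(), map_fd[], bpf_log_buf (samples/bpf helper) */

/* print_old_objects() is defined earlier in tracex4_user.c. */

int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];

    /* The eBPF object lives next to the binary: ./tracex4 -> ./tracex4_kern.o */
    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);

    /* BPF maps and programs were charged against locked memory; lift the limit. */
    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }

    /* Create the maps, load the programs and attach the probes. */
    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);   /* verifier log explains the failure */
        return 1;
    }

    /* tracex4_kern.c defines a single map, so its descriptor is map_fd[0]. */
    for (;;) {
        print_old_objects(map_fd[0]);
        sleep(1);
    }

    return 0;
}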

When load_bpf_file runs, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. From then on we listen for these events, and our program can act whenever they occur.

$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
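In this simple form, each kprobe_events line reads TYPE:GROUP/EVENT SYMBOL, where p marks a kprobe on function entry and r a kretprobe on function return. A small parser makes that structure explicit. `parse_kprobe_event` and `struct kprobe_event` are hypothetical; the sketch handles only the minimal form shown above and ignores the offsets, module prefixes and fetch arguments that the kprobe_events syntax also allows.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line such as "p:kprobes/kmem_cache_free kmem_cache_free". */
struct kprobe_event {
    char type;          /* 'p' = kprobe (entry), 'r' = kretprobe (return) */
    char group[64];     /* e.g. "kprobes" */
    char event[64];     /* event name inside the group */
    char symbol[128];   /* probed kernel function */
};

/* Returns 0 on success, -1 if the line is not in the simple form. */
int parse_kprobe_event(const char *line, struct kprobe_event *out)
{
    char group_event[128];

    if (sscanf(line, " %c:%127s %127s", &out->type, group_event, out->symbol) != 3)
        return -1;
    if (out->type != 'p' && out->type != 'r')
        return -1;

    char *slash = strchr(group_event, '/');   /* split GROUP/EVENT */
    if (slash == NULL || slash - group_event >= (long)sizeof out->group)
        return -1;
    *slash = '\0';
    snprintf(out->group, sizeof out->group, "%s", group_event);
    snprintf(out->event, sizeof out->event, "%s", slash + 1);
    return 0;
}
```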

All the other programs in samples/bpf/ are structured the same way. They always consist of two files:

  • XXX_kern.c: the eBPF program.
  • XXX_user.c: the main program.

The eBPF program defines maps and functions bound to sections. When the kernel emits an event of a given type (for example, a tracepoint), the bound functions are executed. Maps provide communication between the kernel program and the user-space program.

Conclusion

This article has covered BPF and eBPF in broad strokes. I know there is a lot of information and many resources about eBPF available today, so I'll recommend a few more resources for further study.

I recommend reading:

Source: www.habr.com
