Hello, Habr! We would like to let you know that we are preparing a book for release.
Since the BPF virtual machine keeps evolving and is actively used in practice, we have translated for you an article that describes its main capabilities and current state.
In recent years, software toolkits and techniques have become increasingly popular for working around the limitations of the Linux kernel in cases where high-performance packet processing is required. One of the most popular techniques of this kind is called kernel bypass: it skips the kernel's network layer and performs all packet processing from user space. Bypassing the kernel also means controlling the network card from user space. In other words, we rely on a user-space driver to operate the network card.
By handing full control of the network card to a user-space program, we reduce kernel overhead (context switches, network-layer processing, interrupts, and so on), which matters a great deal at speeds of 10 Gb/s and above. Kernel bypass, combined with other features (batch processing) and careful performance tuning (NUMA awareness, CPU isolation, and so on), forms the basis of high-performance network processing in user space. Perhaps the poster child of this new approach to packet processing is
Networking in user space has several drawbacks:
- The OS kernel is an abstraction layer for hardware resources. Because user-space programs have to manage their resources directly, they also have to manage their own hardware. This often means writing your own drivers.
- Because we leave kernel space entirely, we also give up all the networking functionality the kernel provides. User-space programs have to reimplement features that the kernel or the operating system may already offer.
- Programs run in a sandbox mode, which severely limits their ability to interact and prevents them from integrating with other parts of the operating system.
In short, user-space networking gains performance by moving packet processing out of the kernel and into user space. XDP does exactly the opposite: it moves networking programs (filters, mappers, routing, and so on) from user space into kernel space. XDP lets us run a network function as soon as a packet hits the network interface, before it starts travelling up into the kernel's networking subsystem. As a result, packet-processing speed increases significantly. But how does the kernel allow users to run their programs in kernel space? Before answering that question, let us look at what BPF is.
BPF and eBPF
Despite its somewhat misleading name, BPF (Berkeley Packet Filtering) is in fact a virtual machine model. This virtual machine was originally designed to handle packet filtering, hence the name.
One of the best-known tools that uses BPF is tcpdump. When capturing packets with tcpdump, the user can specify a packet-filtering expression. Only packets matching this expression are captured. For example, the expression "tcp dst port 80" refers to all TCP packets whose destination port is 80. A compiler can reduce this expression by translating it into BPF bytecode.
$ sudo tcpdump -d "tcp dst port 80"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 6
(002) ldb [20]
(003) jeq #0x6 jt 4 jf 15
(004) ldh [56]
(005) jeq #0x50 jt 14 jf 15
(006) jeq #0x800 jt 7 jf 15
(007) ldb [23]
(008) jeq #0x6 jt 9 jf 15
(009) ldh [20]
(010) jset #0x1fff jt 15 jf 11
(011) ldxb 4*([14]&0xf)
(012) ldh [x + 16]
(013) jeq #0x50 jt 14 jf 15
(014) ret #262144
(015) ret #0
Here is what the program above does:
- Instruction (000): loads the 16-bit word at packet offset 12 into the accumulator. Offset 12 corresponds to the packet's ethertype.
- Instruction (001): compares the accumulator with 0x86dd, the ethertype value for IPv6. If the comparison is true, the program counter jumps to instruction (002); otherwise it jumps to (006).
- Instruction (006): compares the value with 0x800 (the ethertype value for IPv4). If true, the program jumps to (007); otherwise to (015).
And so on, until the packet-filtering program returns a result. That result is usually a Boolean. Returning a non-zero value (instruction (014)) means the packet is accepted, and returning zero (instruction (015)) means the packet is rejected.
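To make the control flow of the listing easier to follow, here is a minimal sketch of the same decision tree written as an ordinary C function over a raw Ethernet frame. The function name match_tcp_dst_port_80 and the helper load_half are hypothetical, and the explicit length checks are my own stand-in for the virtual machine's handling of loads past the end of the packet. It is an illustration of the filter's logic, not the code tcpdump or the kernel actually runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value, mirroring what "ldh" loads. */
static uint16_t load_half(const uint8_t *pkt, size_t off)
{
    return (uint16_t)((pkt[off] << 8) | pkt[off + 1]);
}

/*
 * Hypothetical hand-written equivalent of the 16-instruction filter.
 * Returns 262144 (the value instruction (014) returns) on a match,
 * and 0 otherwise. Comments give the matching instruction numbers.
 */
uint32_t match_tcp_dst_port_80(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len < 14)                       /* too short for an Ethernet header */
        return 0;

    uint16_t ethertype = load_half(pkt, 12);            /* (000) */

    if (ethertype == 0x86dd) {                          /* (001) IPv6 */
        if (len < 58)
            return 0;
        if (pkt[20] != 0x06)                            /* (002)-(003) TCP? */
            return 0;
        return load_half(pkt, 56) == 0x50 ? 262144 : 0; /* (004)-(005) */
    }

    if (ethertype != 0x0800)                            /* (006) IPv4? */
        return 0;
    if (len < 24)
        return 0;
    if (pkt[23] != 0x06)                                /* (007)-(008) TCP? */
        return 0;
    if (load_half(pkt, 20) & 0x1fff)                    /* (009)-(010) fragment */
        return 0;

    size_t x = 4 * (size_t)(pkt[14] & 0x0f);            /* (011) IPv4 header length */
    if (len < x + 18)
        return 0;
    return load_half(pkt, x + 16) == 0x50 ? 262144 : 0; /* (012)-(013) */
}
```

Calling the function on a buffer that holds a captured frame gives the same accept or reject decision the bytecode would produce for that frame, under the assumptions stated above.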
The BPF virtual machine and its bytecode were introduced by Steve McCanne and Van Jacobson in late 1992, when their paper was published.
Because BPF is a virtual machine, it defines the environment in which programs run. Besides the bytecode, it also defines a packet-based memory model (load instructions are applied implicitly to the packet), registers (A and X: the accumulator and the index register), a scratch memory store, and an implicit program counter. Interestingly, the BPF bytecode was modeled after the Motorola 6502 ISA. As Steve McCanne recalled in his
BPF support has been part of the Linux kernel since v2.5, added largely through the work of Jay Schulist. The BPF code remained unchanged until 2011, when Eric Dumazet reworked the BPF interpreter to run in JIT mode (Source:
Later, in 2014, Alexei Starovoitov proposed a new JIT for BPF. In fact, this new JIT became a new BPF-based architecture and was named eBPF. I think both VMs coexisted for some time, but today packet filtering is implemented on top of eBPF. In fact, in much of the current documentation BPF means eBPF, and classic BPF is now known as cBPF.
eBPF extends the classic BPF virtual machine in several ways:
- It builds on modern 64-bit architectures. eBPF uses 64-bit registers and increases the number of available registers from 2 (the accumulator and X) to 10. eBPF also provides additional opcodes (BPF_MOV, BPF_JNE, BPF_CALL...).
- It is decoupled from the networking subsystem. BPF was tied to a packet-based data model. Since it was used for packet filtering, its code lived in the subsystem that provides networking. The eBPF virtual machine, however, is no longer tied to a data model and can be used for any purpose. An eBPF program can now be attached to a tracepoint or a kprobe. This opens the door to eBPF instrumentation, performance analysis, and many other use cases in the context of other kernel subsystems. The eBPF code now lives in its own path: kernel/bpf.
- Global data stores called maps. Maps are key-value stores that allow data to be exchanged between user space and kernel space. eBPF provides several types of maps.
- Helper functions. In particular, packet rewriting, checksum calculation, and packet cloning. These functions run inside the kernel and are not user-space programs. You can also make system calls from eBPF programs.
- Tail calls. The size of an eBPF program is limited to 4096 instructions. The tail-call feature allows an eBPF program to pass control to another eBPF program and thus get around this limit (up to 32 programs can be chained this way).
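The chaining behaviour of tail calls described in the last item can be pictured with a small user-space model. This is only a sketch: the names run_chain, prog_fn, struct ctx, NO_TAIL_CALL and the three step_* programs are hypothetical, the limit of 32 is simply the figure quoted above, and real eBPF tail calls are performed by the kernel on a program array map rather than through C function pointers. The model keeps one property: control moves on to the next program instead of returning to the caller, and the chain is cut once the limit is reached.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TAIL_CALLS 32    /* chain limit quoted in the text */
#define NO_TAIL_CALL   (-1)  /* hypothetical marker: program finished normally */

/* Hypothetical context passed from program to program. */
struct ctx {
    int hops;                /* how many programs have run so far */
};

/* A "program" does its work, then names the slot to jump to next. */
typedef int (*prog_fn)(struct ctx *c);

/* Three toy programs used to exercise the model. */
int step_parse(struct ctx *c)  { c->hops++; return 1; }            /* jump to slot 1 */
int step_finish(struct ctx *c) { c->hops++; return NO_TAIL_CALL; } /* end of chain */
int step_loop(struct ctx *c)   { c->hops++; return 2; }            /* jumps to itself */

/*
 * Runs the program in slot `first` and follows its tail calls.
 * Returns the number of tail calls that were actually performed.
 * A request to jump to an empty or out-of-range slot ends the chain,
 * as does any request made after MAX_TAIL_CALLS jumps.
 */
int run_chain(const prog_fn *progs, size_t nprogs, int first, struct ctx *c)
{
    assert(progs != NULL && c != NULL);
    int slot = first;
    int tail_calls = 0;

    for (;;) {
        if (slot < 0 || (size_t)slot >= nprogs || progs[slot] == NULL)
            return tail_calls;
        int next = progs[slot](c);
        if (next == NO_TAIL_CALL || tail_calls == MAX_TAIL_CALLS)
            return tail_calls;
        tail_calls++;
        slot = next;
    }
}
```

With a table of { step_parse, step_finish, step_loop }, starting at slot 0 performs a single jump and stops, while starting at slot 2 shows the limit stopping a program that would otherwise chain to itself forever.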
eBPF: an example
The Linux kernel sources contain several eBPF examples. They are available in samples/bpf/. To build these examples, simply run:
$ sudo make samples/bpf/
I am not going to write a new eBPF example myself; instead I will use one of the samples available in samples/bpf/. I will walk through some parts of the code and explain how it works. As the example, I chose the tracex4 program.
In general, each example in samples/bpf/ consists of two files. In this case:
- tracex4_kern.c, which contains the source code to be executed in the kernel as eBPF bytecode.
- tracex4_user.c, which contains the user-space program.
In this case, we need to compile tracex4_kern.c to eBPF bytecode. At the time of writing, gcc has no eBPF backend. Fortunately, clang can emit eBPF bytecode. The build uses clang to compile tracex4_kern.c into an object file.
I mentioned above that one of the most interesting features of eBPF is maps. tracex4_kern defines a single map:
struct pair {
    u64 val;
    u64 ip;
};
struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};
BPF_MAP_TYPE_HASH is one of the many map types that eBPF offers. In this case it is simply a hash. You may also have noticed the SEC("maps") declaration. SEC is a macro used to create a new section in the binary file. In fact, the tracex4_kern example defines two more sections:
SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{
    long ptr = PT_REGS_PARM2(ctx);
    bpf_map_delete_elem(&my_map, &ptr);
    return 0;
}
SEC("kretprobe/kmem_cache_alloc_node")
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;
    // get the ip address of the kmem_cache_alloc_node() caller
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);
    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}
These two functions let us remove an entry from the map (kprobe/kmem_cache_free) and add a new entry to the map (kretprobe/kmem_cache_alloc_node). All function names written in capital letters are actually macros, defined outside this file.
If I dump the sections of the object file, I should see that these new sections are indeed defined:
$ objdump -h tracex4_kern.o
tracex4_kern.o: file format elf64-little
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000000 0000000000000000 0000000000000000 00000040 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 kprobe/kmem_cache_free 00000048 0000000000000000 0000000000000000 00000040 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
2 kretprobe/kmem_cache_alloc_node 000000c0 0000000000000000 0000000000000000 00000088 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
3 maps 0000001c 0000000000000000 0000000000000000 00000148 2**2
CONTENTS, ALLOC, LOAD, DATA
4 license 00000004 0000000000000000 0000000000000000 00000164 2**0
CONTENTS, ALLOC, LOAD, DATA
5 version 00000004 0000000000000000 0000000000000000 00000168 2**2
CONTENTS, ALLOC, LOAD, DATA
6 .eh_frame 00000050 0000000000000000 0000000000000000 00000170 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
There is also tracex4_user.c, the main program. Essentially, this program listens for kmem_cache_alloc_node events. When such an event occurs, the corresponding eBPF code is executed. The code stores the object's IP attribute in the map, and the main program then prints the stored objects in a loop. Example:
$ sudo ./tracex4
obj 0xffff8d6430f60a00 is 2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is 6sec old was allocated at ip ffffffff98090e8f
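The bookkeeping behind this output can be modelled in plain user-space C. The sketch below is a toy stand-in, not the kernel's hash map: toy_map, on_alloc_return, on_free, count_older_than and MAX_TRACKED are hypothetical names, and a linear scan over a small array replaces real hashing. It mirrors the two probes (insert on allocation return, delete on free) and a reader that looks for entries that have stayed in the map for a while.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64          /* tiny stand-in for max_entries */

struct pair {                   /* same two fields as the sample's struct pair */
    uint64_t val;               /* timestamp in nanoseconds */
    uint64_t ip;                /* address of the allocating caller */
};

struct slot { int used; long key; struct pair value; };
struct toy_map { struct slot slots[MAX_TRACKED]; };

static struct slot *find(struct toy_map *m, long key)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (m->slots[i].used && m->slots[i].key == key)
            return &m->slots[i];
    return NULL;
}

/* Models the kretprobe: insert the object, or overwrite an existing entry.
 * Returns 0 on success and -1 when the toy map is full. */
int on_alloc_return(struct toy_map *m, long ptr, uint64_t now_ns, uint64_t ip)
{
    assert(m != NULL);
    struct slot *s = find(m, ptr);
    for (size_t i = 0; s == NULL && i < MAX_TRACKED; i++)
        if (!m->slots[i].used)
            s = &m->slots[i];   /* first free slot */
    if (s == NULL)
        return -1;
    s->used = 1;
    s->key = ptr;
    s->value.val = now_ns;
    s->value.ip = ip;
    return 0;
}

/* Models the kprobe on free: forget the object if it is tracked. */
void on_free(struct toy_map *m, long ptr)
{
    assert(m != NULL);
    struct slot *s = find(m, ptr);
    if (s != NULL)
        s->used = 0;
}

/* Models the reader: count tracked objects at least min_age_ns old. */
size_t count_older_than(const struct toy_map *m, uint64_t now_ns, uint64_t min_age_ns)
{
    assert(m != NULL);
    size_t n = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (m->slots[i].used && now_ns - m->slots[i].value.val >= min_age_ns)
            n++;
    return n;
}
```

An object that is allocated and quickly freed never shows up in the reader, while one that stays allocated remains in the map together with its timestamp and caller address, which is the kind of information each "obj ... is Nsec old" line reports.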
How are the user-space program and the eBPF program connected? On startup, tracex4_user.c loads the tracex4_kern.o object file using the load_bpf_file function.
int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;
    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }
    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);
        return 1;
    }
    for (i = 0; ; i++) {
        print_old_objects(map_fd[1]);
        sleep(1);
    }
    return 0;
}
When load_bpf_file runs, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. We are now listening for these events, and our program can do something when they occur.
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
All the other programs in samples/bpf/ are structured the same way. They always consist of two files:
- XXX_kern.c: the eBPF program.
- XXX_user.c: the main program.
The eBPF program declares maps and functions tied to a section. When the kernel emits an event of a certain type (for example, a tracepoint), the attached functions are executed. The maps provide communication between the kernel program and the user-space program.
Conclusion
This article discussed BPF and eBPF in general terms. I know there is a lot of information and many resources about eBPF today, so I will recommend a few more resources for further study.
Recommended reading:
- "BPF: the universal in-kernel virtual machine" by Jonathan Corbet. An introduction to BPF and how it evolved into eBPF.
- "A thorough introduction to eBPF" by Brendan Gregg. An article from LWN.net. Brendan regularly tweets about eBPF and maintains a list of resources on the topic on his blog.
- "Notes on BPF & eBPF" by Julia Evans. Notes on Suchakra Sharma's presentation "The BSD Packet Filter: A New Architecture for User-level Packet Capture". The comments are good and really help in understanding the slides.
- "eBPF, part 1: Past, Present and Future" by Ferris Ellis. A long read with a sequel, but worth the time. One of the best articles on BPF I have come across.
Source: www.habr.com