BPF for the little ones, part one: extended BPF

In the beginning there was a technology, and it was called BPF. We looked at it in the previous, Old Testament article of this series. In 2013, through the efforts of Alexei Starovoitov and Daniel Borkmann, an improved version of it, optimized for modern 64-bit machines, was developed and merged into the Linux kernel. This new technology was briefly called Internal BPF, then renamed Extended BPF, and now, several years later, everyone simply calls it BPF.

Roughly speaking, BPF lets you run arbitrary user-supplied code in Linux kernel space, and the new architecture turned out to be so successful that we would need a dozen more articles to describe all of its applications. (The only thing the developers did not do well, as you can see in the picture below, is design a decent logo.)

This article describes the structure of the BPF virtual machine, the kernel interfaces for working with BPF, the development tools, and a brief overview of the existing capabilities, i.e. everything we will need later for a deeper study of the practical applications of BPF.
[Figure: the BPF logo]

Contents

Introduction to the BPF architecture. First, we take a bird's-eye view of the BPF architecture and outline the main components.

Registers and instruction set of the BPF virtual machine. With a picture of the overall architecture in hand, we describe the structure of the BPF virtual machine.

Lifecycle of BPF objects, the bpffs file system. In this section we take a closer look at the lifecycle of BPF objects - programs and maps.

Managing objects using the bpf system call. With some understanding of the system in place, we finally look at how to create and manage objects from user space using a dedicated system call - bpf(2).

Writing BPF programs with libbpf. Of course, you can write programs using the system call directly. But it is hard. For a more realistic scenario, kernel developers created the libbpf library. We will build a basic skeleton of a BPF application that we will use in later examples.

Kernel helpers. Here we learn how BPF programs can call kernel helper functions - a tool that, together with maps, fundamentally extends the capabilities of the new BPF compared to the classic one.

Accessing maps from BPF programs. By this point we will know enough to understand exactly how to create programs that use maps. We will even take a quick peek into the great and mighty verifier.

Development tools. A reference section on how to build the utilities and the kernel needed for the experiments.

Conclusion. At the end of the article, those who read this far will find some encouraging words and a brief description of what will happen in the following articles. We will also list a number of links for self-study for those who have no desire or ability to wait for the continuation.

Introduction to the BPF architecture

Before we start looking at the BPF architecture, we will refer one last time (oh) to classic BPF, which was developed as a response to the advent of RISC machines and solved the problem of efficient packet filtering. The architecture turned out to be so successful that, having been born in the roaring nineties in Berkeley UNIX, it was ported to most existing operating systems, survived into the crazy twenties and is still finding new applications.

The new BPF was developed as a response to the ubiquity of 64-bit machines, cloud services and the growing need for tools to build SDN (Software-defined networking). Developed by the kernel network engineers as an improved replacement for classic BPF, the new BPF found applications in the difficult task of tracing Linux systems literally six months later, and now, six years after its appearance, we would need a whole article just to list the different program types.

Funny pictures

At its core, BPF is a virtual machine that lets you run "arbitrary" code in kernel space without compromising security. BPF programs are created in user space, loaded into the kernel, and attached to some event source. An event can be, for example, the delivery of a packet to a network interface, the invocation of some kernel function, and so on. In the case of a packet, the BPF program has access to the packet's data and metadata (for reading and, possibly, writing, depending on the program type); in the case of a kernel function call, it has access to the function's arguments, including pointers to kernel memory, and so on.

Let's take a closer look at this process. To begin with, let's talk about the first difference from classic BPF, programs for which were written in assembler. In the new version, the architecture was extended so that programs can be written in high-level languages, primarily, of course, in C. For this, a backend for llvm was developed, which makes it possible to generate bytecode for the BPF architecture.


The BPF architecture was designed, in part, to run efficiently on modern machines. To make this work in practice, BPF bytecode, once loaded into the kernel, is translated into native code by a component called the JIT compiler (Just In Time). Next, if you remember, in classic BPF the program was loaded into the kernel and attached to the event source atomically, within a single system call. In the new architecture this happens in two stages: first, the code is loaded into the kernel using the bpf(2) system call, and then, later, the program is attached to the event source through other mechanisms that vary depending on the program type.

Here the reader may have a question: is this even allowed? How is the safety of executing such code guaranteed? Safety of execution is guaranteed by a stage of loading BPF programs called the verifier.

[Figure: the verifier as a stage of loading a BPF program]

The verifier is a static analyzer that ensures that a program does not disrupt the normal operation of the kernel. This, by the way, does not mean that the program cannot interfere with the operation of the system: BPF programs, depending on their type, can read and rewrite parts of kernel memory, change the return values of functions, trim, extend, rewrite and even forward network packets. The verifier guarantees that running a BPF program will not crash the kernel and that a program which, according to the rules, has write access to, say, the data of an outgoing packet cannot overwrite kernel memory outside the packet. We will look at the verifier in more detail in the corresponding section, after we get acquainted with all the other components of BPF.

So what have we learned so far? The user writes a program in C and loads it into the kernel using the bpf(2) system call, where it is checked by the verifier and translated into native code. Then the same or another user attaches the program to an event source and it starts executing. Separating loading and attaching is necessary for several reasons. First, running the verifier is relatively expensive, and by loading the same program several times we waste computer time. Second, exactly how a program is attached depends on its type, and a "universal" interface developed a year ago may not suit new program types. (Although now that the architecture is becoming more mature, there is an idea to unify this interface at the libbpf level.)

The attentive reader may notice that we are not done with the pictures yet. Indeed, everything said above does not explain why BPF fundamentally changes the picture compared to classic BPF. The two innovations that significantly expand the scope of applicability are the ability to use shared memory and kernel helper functions. In BPF, shared memory is implemented using so-called maps - shared data structures with a specific API. They probably got this name because the first type of map to appear was a hash table. Then came arrays, per-CPU hash tables and per-CPU arrays, search trees, maps containing pointers to BPF programs and much more. What matters to us now is that BPF programs gained the ability to persist state between invocations and to share it with other programs and with user space.

Maps are accessed from user processes using the bpf(2) system call, and from BPF programs running in the kernel using helper functions. Moreover, helpers exist not only for working with maps, but also for accessing other kernel capabilities. For example, BPF programs can use helper functions to forward packets to other interfaces, generate events, access kernel structures, and so on.


To sum up, BPF makes it possible to load arbitrary, i.e. verifier-checked, user code into kernel space. This code can keep state between invocations and exchange data with user space, and it also has access to the kernel subsystems allowed for the given program type.

This already resembles the capabilities provided by kernel modules, compared to which BPF has some advantages (of course, you can only compare similar applications, for example system tracing - you cannot write an arbitrary driver with BPF). One can note a lower entry threshold (some utilities that use BPF do not require the user to have kernel programming skills, or programming skills in general), runtime safety (raise your hand in the comments if you have never broken a system while writing or testing modules), and atomicity - there is downtime when modules are reloaded, whereas the BPF subsystem ensures that no events are lost (to be fair, this is not true for all types of BPF programs).

The presence of such capabilities makes BPF a universal tool for extending the kernel, which is confirmed in practice: more and more new program types are added to BPF, more and more large companies use BPF on production servers 24×7, and more and more startups build their business on solutions based on BPF. BPF is used everywhere: in protection against DDoS attacks, in building SDN (for example, implementing networks for kubernetes), as the main system tracing tool and statistics collector, in intrusion detection systems and sandboxing systems, and so on.

Let's finish the overview part of the article here and look at the virtual machine and the BPF ecosystem in more detail.

Digression: utilities

To be able to run the examples in the following sections, you may need a number of utilities, at minimum llvm/clang with bpf support and bpftool. In the Development tools section you can read instructions for building the utilities, as well as your kernel. That section is placed below so as not to disturb the harmony of our presentation.

Registers and instruction set of the BPF virtual machine

The architecture and instruction set of BPF were designed with the expectation that programs would be written in C and, after being loaded into the kernel, translated into native code. Therefore, the number of registers and the set of instructions were chosen with an eye to the intersection, in the mathematical sense, of the capabilities of modern machines. In addition, various restrictions were imposed on programs: for example, until recently it was impossible to write loops and subroutines, and the number of instructions was limited to 4096 (now privileged programs can load up to a million instructions).

BPF has eleven user-accessible 64-bit registers r0-r10 and a program counter. Register r10 contains the frame pointer and is read-only. Programs have access to a 512-byte stack at runtime and an unlimited amount of shared memory in the form of maps.

BPF programs are allowed to call a specific set of kernel helpers, which depends on the program type, and, more recently, ordinary functions. Each called function can take up to five arguments, passed in registers r1-r5, and the return value is passed in r0. It is guaranteed that after returning from a function the contents of registers r6-r9 will not change.

For efficient translation of programs, registers r0-r11 (r11 is an auxiliary register used internally by the kernel and is not available to programs) are uniquely mapped to real registers on all supported architectures, taking into account the ABI of the target architecture. For example, on x86_64 registers r1-r5, used for passing function parameters, are mapped to rdi, rsi, rdx, rcx, r8, which are used for passing parameters to functions on x86_64. For example, the code on the left is translated into the code on the right like this:

1:  (b7) r1 = 1                    mov    $0x1,%rdi
2:  (b7) r2 = 2                    mov    $0x2,%rsi
3:  (b7) r3 = 3                    mov    $0x3,%rdx
4:  (b7) r4 = 4                    mov    $0x4,%rcx
5:  (b7) r5 = 5                    mov    $0x5,%r8
6:  (85) call pc+1                 callq  0x0000000000001ee8
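
The register mapping can be written down as a small lookup table. This is only a sketch: the r1-r5 entries come from the text above, while the entries for r0 and r6-r10 are an assumption based on the x86_64 JIT in the kernel sources, and the helper name bpf_to_x86_64 is hypothetical.

```c
#include <stddef.h>

/*
 * Map a BPF register number (0..10) to the name of the x86_64 register
 * that holds it in JIT-compiled code.
 * r1-r5 follow the article; r0 and r6-r10 are an assumption taken from
 * the kernel's x86_64 JIT and may differ between kernel versions.
 */
const char *bpf_to_x86_64(int reg)
{
    static const char *const map[11] = {
        "rax",                              /* r0: return value          */
        "rdi", "rsi", "rdx", "rcx", "r8",   /* r1-r5: function arguments */
        "rbx", "r13", "r14", "r15",         /* r6-r9: callee-saved       */
        "rbp",                              /* r10: frame pointer        */
    };

    if (reg < 0 || reg > 10)
        return NULL;    /* not a program-visible BPF register */
    return map[reg];
}
```

Note that r6-r9 land on callee-saved x86_64 registers, which is what makes the guarantee about their contents surviving a call cheap to keep.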

Register r0 is also used to return the result of program execution, and in register r1 the program is passed a pointer to its context - depending on the program type this can be, for example, a struct xdp_md (for XDP), a struct __sk_buff (for various network programs), a struct pt_regs (for various kinds of tracing programs), and so on.

So, we had a set of registers, kernel helpers, a stack, a context pointer and shared memory in the form of maps. Not that all of this was strictly necessary for the trip, but...

Let's continue the description and talk about the instruction set for working with these objects. All (well, almost all) BPF instructions have a fixed 64-bit size. If you look at a single instruction on a 64-bit Big Endian machine, you will see the following fields, in order: Code (8 bits), Dst (4 bits), Src (4 bits), Off (16 bits), Imm (32 bits).

[Figure: layout of a BPF instruction]

Here Code is the instruction encoding, Dst and Src are the encodings of the destination and the source, respectively, Off is a 16-bit signed offset, and Imm is a 32-bit signed integer used in some instructions (similar to the cBPF constant K). The Code encoding has one of two forms: for arithmetic and jump instructions it consists of a 4-bit operation code, a 1-bit source flag S and a 3-bit class; for load and store instructions it consists of a 3-bit mode, a 2-bit size and a 3-bit class.

[Figure: the two forms of the Code field]

Instruction classes 0, 1, 2, 3 define instructions for working with memory. They are called BPF_LD, BPF_LDX, BPF_ST, BPF_STX, respectively. Classes 4 and 7 (BPF_ALU, BPF_ALU64) make up the set of ALU instructions. Classes 5 and 6 (BPF_JMP, BPF_JMP32) contain the jump instructions.
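
The field layout can be illustrated with a small decoder. This is a minimal sketch, not kernel code: the type decoded_insn and the functions decode_insn and class_name are hypothetical names, and the decoder assumes the little-endian byte layout produced for x86_64 hosts, where Dst sits in the low four bits of the second byte and Src in the high four bits.

```c
#include <stdint.h>

/* Decoded view of one 8-byte BPF instruction (hypothetical helper type). */
struct decoded_insn {
    uint8_t code;   /* full opcode byte                         */
    uint8_t cls;    /* instruction class: low three bits of code */
    uint8_t dst;    /* destination register number               */
    uint8_t src;    /* source register number                    */
    int16_t off;    /* signed 16-bit offset                      */
    int32_t imm;    /* signed 32-bit immediate                   */
};

/* Decode 8 raw bytes, assuming the little-endian instruction layout. */
struct decoded_insn decode_insn(const uint8_t raw[8])
{
    struct decoded_insn d;

    d.code = raw[0];
    d.cls  = raw[0] & 0x07;
    d.dst  = raw[1] & 0x0f;     /* low nibble  */
    d.src  = raw[1] >> 4;       /* high nibble */
    d.off  = (int16_t)(uint16_t)(raw[2] | (raw[3] << 8));
    d.imm  = (int32_t)((uint32_t)raw[4] | ((uint32_t)raw[5] << 8) |
                       ((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24));
    return d;
}

/* Name of an instruction class, using the numbering listed above. */
const char *class_name(uint8_t cls)
{
    static const char *const names[8] = {
        "BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
        "BPF_ALU", "BPF_JMP", "BPF_JMP32", "BPF_ALU64",
    };
    return names[cls & 0x07];
}
```

Feeding it the bytes 15 01 01 00 00 00 00 00 yields class 5 (BPF_JMP), Dst = 1, Src = 0, Off = 1 and Imm = 0.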

The plan for studying the BPF instruction set is as follows: instead of meticulously listing all the instructions and their parameters, we will look at a couple of examples in this section, and from them it will become clear how the instructions actually work and how to manually disassemble any BPF binary. To reinforce the material, later in the article we will also meet individual instructions in the sections about the verifier, the JIT compiler, the translation of classic BPF, as well as when studying maps, function calls, and so on.

When we talk about individual instructions, we will refer to the kernel files bpf.h and bpf_common.h, which define the numeric codes of BPF instructions. When studying the architecture on your own and/or parsing binaries, you can find the semantics in the following sources, sorted in order of complexity: Unofficial eBPF spec, BPF and XDP Reference Guide, Instruction Set, Documentation/networking/filter.txt and, of course, in the Linux source code - the verifier, the JIT, the BPF interpreter.

Example: disassembling BPF in your head

Let's look at an example in which we compile the program readelf-example.c and look at the resulting binary. We will reveal the original contents of readelf-example.c below, after we reconstruct its logic from the binary codes:

$ clang -target bpf -c readelf-example.c -o readelf-example.o -O2
$ llvm-readelf -x .text readelf-example.o
Hex dump of section '.text':
0x00000000 b7000000 01000000 15010100 00000000 ................
0x00000010 b7000000 02000000 95000000 00000000 ................

The first column in the readelf output is the offset, so our program consists of four instructions:

Code Dst Src Off  Imm
b7   0   0   0000 01000000
15   1   0   0100 00000000
b7   0   0   0000 02000000
95   0   0   0000 00000000

The dump was produced for a little-endian target, so the multi-byte fields Off and Imm are stored least significant byte first, and the register byte 01 of the second instruction holds Dst in its low four bits and Src in its high four bits, i.e. Dst = 1 and Src = 0.

The instruction codes are b7, 15, b7 and 95. Recall that the three least significant bits are the instruction class. In our case the fourth bit of all instructions is zero, so the instruction classes are 7, 5, 7, 5, respectively. Class 7 is BPF_ALU64, and class 5 is BPF_JMP. For both classes the instruction format is the same (see above), and we can rewrite our program like this (at the same time rewriting the remaining columns in human-readable form):

Op S  Class   Dst Src Off  Imm
b  0  ALU64   0   0   0    1
1  0  JMP     1   0   1    0
b  0  ALU64   0   0   0    2
9  0  JMP     0   0   0    0

Operation b of class ALU64 is BPF_MOV. It assigns a value to the destination register. If bit S (source) is set, the value is taken from the source register, and if, as in our case, it is not set, the value is taken from the Imm field. So in the first and third instructions we perform the operation r0 = Imm. Next, operation 1 of class JMP is BPF_JEQ (jump if equal). In our case, since bit S is zero, it compares the value of the destination register with the Imm field. If the values are equal, a jump to PC + Off takes place, where PC, as usual, contains the address of the next instruction. Finally, operation 9 of class JMP is BPF_EXIT. This instruction terminates the program, returning r0 to the kernel. Let's add a new column to our table:

Op    S  Class   Dst Src Off  Imm    Disassm
MOV   0  ALU64   0   0   0    1      r0 = 1
JEQ   0  JMP     1   0   1    0      if (r1 == 0) goto pc+1
MOV   0  ALU64   0   0   0    2      r0 = 2
EXIT  0  JMP     0   0   0    0      exit

We can rewrite this in a more convenient form:

     r0 = 1
     if (r1 == 0) goto END
     r0 = 2
END:
     exit

If we recall that register r1 holds the pointer to the context passed to the program by the kernel, and that the value in register r0 is returned to the kernel, we can see that if the pointer to the context is zero we return 1, and otherwise 2. Let's check that we are right by looking at the source:

$ cat readelf-example.c
int foo(void *ctx)
{
        return ctx ? 2 : 1;
}

Yes, it is a meaningless program, but it translates into just four simple instructions.
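
To double-check the manual disassembly, the four instructions can be run through a toy interpreter. This is only a sketch under stated assumptions: it understands just the three opcodes that occur in this example (b7, 15 and 95), the names toy_insn, run and example_prog are hypothetical, and it has nothing to do with the real in-kernel interpreter. The comparison is encoded against register 1, with the register number taken from the low four bits of the register byte.

```c
#include <stdint.h>

/* One decoded instruction (hypothetical type, fields as in the tables). */
struct toy_insn {
    uint8_t code;
    uint8_t dst;
    uint8_t src;
    int16_t off;
    int32_t imm;
};

/* The only opcodes this sketch understands. */
enum { OP_MOV64_IMM = 0xb7, OP_JEQ_IMM = 0x15, OP_EXIT = 0x95 };

/* Run a program; ctx is what the kernel would pass in r1. */
uint64_t run(const struct toy_insn *prog, int len, uint64_t ctx)
{
    uint64_t reg[11] = {0};

    reg[1] = ctx;
    for (int pc = 0; pc < len; pc++) {
        const struct toy_insn *i = &prog[pc];

        if (i->dst > 10)
            return UINT64_MAX;              /* invalid register */
        switch (i->code) {
        case OP_MOV64_IMM:
            reg[i->dst] = (uint64_t)(int64_t)i->imm;  /* sign-extended */
            break;
        case OP_JEQ_IMM:
            /* Off is relative to the next instruction; pc++ adds the +1. */
            if (reg[i->dst] == (uint64_t)(int64_t)i->imm)
                pc += i->off;
            break;
        case OP_EXIT:
            return reg[0];
        default:
            return UINT64_MAX;              /* unsupported opcode */
        }
    }
    return UINT64_MAX;                      /* fell off the end */
}

/* The program from readelf-example.o, one row per instruction. */
const struct toy_insn example_prog[4] = {
    { 0xb7, 0, 0, 0, 1 },   /* r0 = 1                 */
    { 0x15, 1, 0, 1, 0 },   /* if (r1 == 0) goto pc+1 */
    { 0xb7, 0, 0, 0, 2 },   /* r0 = 2                 */
    { 0x95, 0, 0, 0, 0 },   /* exit                   */
};
```

Calling run(example_prog, 4, 0) follows the jump and returns 1, while any non-zero context falls through to r0 = 2, which matches return ctx ? 2 : 1 in the C source.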

A special case: the 16-byte instruction

We mentioned earlier that some instructions take up more than 64 bits. This applies, for example, to the lddw instruction (Code = 0x18 = BPF_LD | BPF_DW | BPF_IMM) - load a double word from the Imm fields into a register. The point is that Imm is 32 bits wide, while a double word is 64 bits, so loading a 64-bit immediate value into a register in a single 64-bit instruction will not work. To do this, two adjacent instructions are used, and the second half of the 64-bit value is stored in the Imm field of the second one. Example:

$ cat x64.c
long foo(void *ctx)
{
        return 0x11223344aabbccdd;
}
$ clang -target bpf -c x64.c -o x64.o -O2
$ llvm-readelf -x .text x64.o
Hex dump of section '.text':
0x00000000 18000000 ddccbbaa 00000000 44332211 ............D3".
0x00000010 95000000 00000000                   ........

There are only two instructions in the binary program:

Binary                                 Disassm
18000000 ddccbbaa 00000000 44332211    r0 = Imm[0]|Imm[1]
95000000 00000000                      exit

We will meet the lddw instruction again when we talk about relocations and working with maps.
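
How the two Imm fields are glued together can be sketched in a few lines. The function names read_le32, lddw_value and lddw_from_bytes are hypothetical, and the sketch assumes the little-endian encoding shown in the hex dump above, with the low half of the value in the first Imm and the high half in the second.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from raw bytes. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Combine the Imm fields of the two halves of an lddw instruction. */
uint64_t lddw_value(uint32_t imm_lo, uint32_t imm_hi)
{
    return ((uint64_t)imm_hi << 32) | imm_lo;
}

/* Extract the 64-bit immediate from the 16 raw bytes of an lddw. */
uint64_t lddw_from_bytes(const uint8_t raw[16])
{
    /* Imm occupies bytes 4..7 of each 8-byte half. */
    return lddw_value(read_le32(raw + 4), read_le32(raw + 12));
}
```

For the 16 bytes of the example this gives 0x11223344aabbccdd, the constant from x64.c.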

Example: disassembling BPF using standard tools

So, we have learned to read BPF binary codes and are ready to parse any instruction if necessary. However, it is worth saying that in practice it is more convenient and faster to disassemble programs using standard tools, for example:

$ llvm-objdump -d x64.o

Disassembly of section .text:

0000000000000000 <foo>:
 0: 18 00 00 00 dd cc bb aa 00 00 00 00 44 33 22 11 r0 = 1234605617868164317 ll
 2: 95 00 00 00 00 00 00 00 exit

Lifecycle of BPF objects, the bpffs file system

(I first learned some of the details described in this subsection from a post on Alexei Starovoitov's BPF Blog.)

BPF objects - programs and maps - are created from user space using the BPF_PROG_LOAD and BPF_MAP_CREATE commands of the bpf(2) system call; we will talk about exactly how this happens in the next section. This creates kernel data structures, and for each of them the refcount (reference count) is set to one, and a file descriptor pointing to the object is returned to the user. After the descriptor is closed, the object's refcount is decremented by one, and when it reaches zero, the object is destroyed.

If a program uses maps, the refcount of those maps is incremented by one after the program is loaded, i.e. their file descriptors can be closed by the user process and the refcount will still not drop to zero:

[Figure: reference counts of a loaded program and the maps it uses]

After successfully loading a program, we usually attach it to some kind of event generator. For example, we can put it on a network interface to process incoming packets, or attach it to some tracepoint in the kernel. At this point the reference count is also incremented by one, and we can close the file descriptor in the loader program.

What happens if we now terminate the loader? It depends on the type of event generator (hook). All network hooks will continue to exist after the loader exits; these are the so-called global hooks. Tracing programs, on the other hand, will be released after the process that created them terminates (and are therefore called local, from "local to the process"). Technically, local hooks always have a corresponding file descriptor in user space and are therefore closed when the process is closed, while global hooks do not. In the following figure, using red crosses, I try to show how the termination of the loader program affects the lifetime of objects in the case of local and global hooks.

[Figure: effect of loader termination on object lifetime for local and global hooks]

Why is there a distinction between local and global hooks? Running some types of network programs makes sense without user space: imagine DDoS protection, for example - the loader writes the rules and attaches the BPF program to a network interface, after which the loader can go and terminate. On the other hand, imagine a debugging tracing program that you wrote on your knee in ten minutes - when it finishes, you want no garbage left in the system, and local hooks ensure that.

Then again, imagine that you want to attach to a tracepoint in the kernel and collect statistics over many years. In that case you would want to terminate the user part and come back to the statistics from time to time. The bpf file system provides this capability. It is an in-memory pseudo file system that allows creating files that reference BPF objects and thereby increase the refcount of those objects. After that, the loader can exit, and the objects it created will stay alive.


Creating files in bpffs that reference BPF objects is called "pinning" (as in the phrase "a process can pin a BPF program or map"). Creating file objects for BPF objects makes sense not only for extending the lifetime of local objects, but also for the usability of global objects - going back to the example with the global DDoS protection program, we want to be able to come and look at the statistics from time to time.
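
The reference counting rules described in this section can be summed up in a toy model. It is purely illustrative and makes no claim about how the kernel implements refcounts: toy_obj, toy_create, toy_get and toy_put are hypothetical names, and every way of taking a reference (an open file descriptor, a pinned file in bpffs, an attachment to a hook, a program using a map) is modelled by the same counter.

```c
#include <stdbool.h>

/* Toy stand-in for a BPF object (program or map). */
struct toy_obj {
    int  refcount;
    bool alive;
};

/* Object creation: refcount starts at one, held by the returned fd. */
void toy_create(struct toy_obj *o)
{
    o->refcount = 1;
    o->alive = true;
}

/* Take a reference: pin in bpffs, attach to a hook, use a map from a program. */
void toy_get(struct toy_obj *o)
{
    o->refcount++;
}

/* Drop a reference: close the fd, unlink the pinned file, detach. */
void toy_put(struct toy_obj *o)
{
    if (--o->refcount == 0)
        o->alive = false;   /* the object is destroyed */
}
```

In this model, creating an object, pinning it (toy_get) and then closing the descriptor (toy_put) leaves it alive with a refcount of one, and only removing the pinned file (another toy_put) destroys it.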

The BPF file system is usually mounted at /sys/fs/bpf, but it can also be mounted locally, for example, like this:

$ mkdir bpf-mountpoint
$ sudo mount -t bpf none bpf-mountpoint

File system names are created using the BPF_OBJ_PIN command of the bpf system call. To illustrate, let's take a program, compile it, load it, and pin it to bpffs. Our program does nothing useful; we only provide the code so that you can reproduce the example:

$ cat test.c
__attribute__((section("xdp"), used))
int test(void *ctx)
{
        return 0;
}

char _license[] __attribute__((section("license"), used)) = "GPL";

Let's compile this program and create a local copy of the bpffs file system:

$ clang -target bpf -c test.c -o test.o
$ mkdir bpf-mountpoint
$ sudo mount -t bpf none bpf-mountpoint

Now let's load our program using the bpftool utility and look at the accompanying bpf(2) system calls (some irrelevant lines are removed from the strace output):

$ sudo strace -e bpf bpftool prog load ./test.o bpf-mountpoint/test
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, prog_name="test", ...}, 120) = 3
bpf(BPF_OBJ_PIN, {pathname="bpf-mountpoint/test", bpf_fd=3}, 120) = 0

Here we loaded the program using BPF_PROG_LOAD, received file descriptor 3 from the kernel and, using the BPF_OBJ_PIN command, pinned this file descriptor as the file "bpf-mountpoint/test". After that the loader program bpftool finished, but our program remained in the kernel, even though we did not attach it to any network interface:

$ sudo bpftool prog | tail -3
783: xdp  name test  tag 5c8ba0cf164cb46c  gpl
        loaded_at 2020-05-05T13:27:08+0000  uid 0
        xlated 24B  jited 41B  memlock 4096B

We can delete the file object with the usual unlink(2), and after that the corresponding program will be removed:

$ sudo rm ./bpf-mountpoint/test
$ sudo bpftool prog show id 783
Error: get by id (783): No such file or directory

Deleting objects

Speaking of deleting objects, it is necessary to clarify that after we detach a program from a hook (event generator), no new event will trigger its launch; however, all current instances of the program will run to completion in the normal way.

Some types of BPF programs allow replacing the program on the fly, i.e. they provide atomicity of the sequence replace = detach old program, attach new program. In this case, all active instances of the old version of the program will finish their work, new event handlers will be created from the new program, and "atomicity" here means that not a single event will be missed.

Attaching programs to event sources

In this article we will not separately describe attaching programs to event sources, since it makes sense to study this in the context of a specific program type. See the example below, where we show how XDP programs are attached.

Managing objects using the bpf system call

BPF programs

All BPF objects are created and managed from user space using the bpf system call, which has the following prototype:

#include <linux/bpf.h>

int bpf(int cmd, union bpf_attr *attr, unsigned int size);

Here cmd is one of the values of type enum bpf_cmd, attr is a pointer to the parameters for the specific command, and size is the size of the object the pointer refers to, i.e. usually sizeof(*attr). In kernel 5.8 the bpf system call supports 34 different commands, and the definition of union bpf_attr takes up 200 lines. But we should not be frightened by this, since we will get acquainted with the commands and parameters over the course of several articles.

Let's start with the BPF_PROG_LOAD command, which creates BPF programs - it takes a set of BPF instructions and loads it into the kernel. At load time the verifier is run, then the JIT compiler, and, after successful completion, a file descriptor of the program is returned to the user. We saw what happens to it next in the previous section about the lifecycle of BPF objects.

Now we will write a custom program that loads a simple BPF program, but first we need to decide what kind of program we want to load - we have to choose a type and, within the framework of that type, write a program that will pass the verifier. However, so as not to complicate the process, here is a ready-made solution: we will take a program of type BPF_PROG_TYPE_XDP that returns the value XDP_PASS (let all packets through). In BPF assembler it looks very simple:

r0 = 2
exit

Having decided what we will load, we can show how we will do it:

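#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Pointers inside union bpf_attr are passed to the kernel as 64-bit integers. */
static inline __u64 ptr_to_u64(const void *ptr)
{
        return (__u64) (unsigned long) ptr;
}

int main(void)
{
    /* The BPF program in machine code: r0 = XDP_PASS; exit */
    struct bpf_insn insns[] = {
        {
            .code = BPF_ALU64 | BPF_MOV | BPF_K,
            .dst_reg = BPF_REG_0,
            .imm = XDP_PASS
        },
        {
            .code = BPF_JMP | BPF_EXIT
        },
    };

    /* Minimal set of parameters for BPF_PROG_LOAD */
    union bpf_attr attr = {
        .prog_type = BPF_PROG_TYPE_XDP,
        .insns     = ptr_to_u64(insns),
        .insn_cnt  = sizeof(insns)/sizeof(insns[0]),
        .license   = ptr_to_u64("GPL"),
    };

    strncpy(attr.prog_name, "woo", sizeof(attr.prog_name));

    /* On success the kernel returns a file descriptor for the program. */
    if (syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr)) < 0) {
        perror("bpf(BPF_PROG_LOAD)");
        return 1;
    }

    /* Keep the descriptor open so that the program stays loaded. */
    for ( ;; )
        pause();
}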

The interesting part of the program starts with the definition of the array insns - our BPF program in machine code. Each instruction of the BPF program is packed into a struct bpf_insn. The first element of insns corresponds to the instruction r0 = 2, the second to exit.

Digression. The kernel defines more convenient macros for writing machine code, and using the kernel header file tools/include/linux/filter.h we could write

struct bpf_insn insns[] = {
    BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
    BPF_EXIT_INSN()
};

But since writing BPF programs in machine code is only needed for writing tests in the kernel and articles about BPF, the absence of these macros does not really complicate a developer's life.
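
For the curious, here is roughly what such macros expand to. This is a self-contained sketch, not the kernel header itself: the struct and the constants are local copies written to mirror <linux/bpf.h> and <linux/bpf_common.h>, and the two macros are simplified re-creations of the ones in tools/include/linux/filter.h, so treat the exact definitions as an assumption.

```c
#include <stdint.h>

/* Local copy of the instruction layout (mirrors struct bpf_insn). */
struct bpf_insn {
    uint8_t code;           /* opcode               */
    uint8_t dst_reg:4;      /* destination register */
    uint8_t src_reg:4;      /* source register      */
    int16_t off;            /* signed offset        */
    int32_t imm;            /* signed immediate     */
};

/* Local copies of the constants used below. */
#define BPF_JMP     0x05    /* instruction class 5        */
#define BPF_ALU64   0x07    /* instruction class 7        */
#define BPF_K       0x00    /* operand comes from Imm     */
#define BPF_MOV     0xb0    /* operation b of ALU classes */
#define BPF_EXIT    0x90    /* operation 9 of class JMP   */
#define BPF_REG_0   0
#define XDP_PASS    2

/* dst = imm, 64-bit move of an immediate value */
#define BPF_MOV64_IMM(DST, IMM)                         \
    ((struct bpf_insn) {                                \
        .code    = BPF_ALU64 | BPF_MOV | BPF_K,         \
        .dst_reg = DST,                                 \
        .src_reg = 0,                                   \
        .off     = 0,                                   \
        .imm     = IMM })

/* Terminate the program, returning r0 */
#define BPF_EXIT_INSN()                                 \
    ((struct bpf_insn) {                                \
        .code    = BPF_JMP | BPF_EXIT,                  \
        .dst_reg = 0,                                   \
        .src_reg = 0,                                   \
        .off     = 0,                                   \
        .imm     = 0 })
```

With these definitions the two-element array built from BPF_MOV64_IMM(BPF_REG_0, XDP_PASS) and BPF_EXIT_INSN() has opcode bytes b7 and 95, the same ones produced by the handwritten initializers.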

After defining the BPF program, we move on to loading it into the kernel. Our minimal set of parameters attr includes the program type, the set and number of instructions, the required license, and the name "woo", which we use to find our program in the system after loading. The program, as promised, is loaded into the system using the bpf system call.

At the end of the program we end up in an infinite loop that imitates a payload. Without it, the program would be destroyed by the kernel when the file descriptor returned to us by the bpf system call is closed, and we would not see it in the system.

Well, we are ready for testing. Let's build the program and run it under strace to check that everything works as it should:

$ clang -g -O2 simple-prog.c -o simple-prog

$ sudo strace ./simple-prog
execve("./simple-prog", ["./simple-prog"], 0x7ffc7b553480 /* 13 vars */) = 0
...
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=2, insns=0x7ffe03c4ed50, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="woo", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = 3
pause(

Everything is fine, bpf(2) returned descriptor 3 to us and we went into an infinite loop with pause(). Let's try to find our program in the system. To do this, we go to another terminal and use the bpftool utility:

# bpftool prog | grep -A3 woo
390: xdp  name woo  tag 3b185187f1855c4c  gpl
        loaded_at 2020-08-31T24:66:44+0000  uid 0
        xlated 16B  jited 40B  memlock 4096B
        pids simple-prog(10381)

We see that there is a loaded program woo in the system whose global ID is 390, and that at the moment the process simple-prog holds an open file descriptor pointing to the program (and if simple-prog finishes, woo will disappear). As expected, the program woo takes 16 bytes - two instructions - of binary code in the BPF architecture, but in native form (x86_64) it is already 40 bytes. Let's look at our program in its original form:

# bpftool prog dump xlated id 390
   0: (b7) r0 = 2
   1: (95) exit

No surprises. Now let's look at the code generated by the JIT compiler:

# bpftool prog dump jited id 390
bpf_prog_3b185187f1855c4c_woo:
   0:   nopl   0x0(%rax,%rax,1)
   5:   push   %rbp
   6:   mov    %rsp,%rbp
   9:   sub    $0x0,%rsp
  10:   push   %rbx
  11:   push   %r13
  13:   push   %r14
  15:   push   %r15
  17:   pushq  $0x0
  19:   mov    $0x2,%eax
  1e:   pop    %rbx
  1f:   pop    %r15
  21:   pop    %r14
  23:   pop    %r13
  25:   pop    %rbx
  26:   leaveq
  27:   retq

Not very efficient for exit(2), but in fairness our program is too simple, and for non-trivial programs the prologue and epilogue added by the JIT compiler are, of course, needed.
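The 16 bytes reported by bpftool can be reproduced by hand. The sketch below is only an illustration: struct insn is a local mirror of the 8-byte instruction layout (the kernel's own type is struct bpf_insn), and the names mov64_imm, exit_insn and build_woo are made up for this example. The opcodes 0xb7 and 0x95 are the ones shown in the xlated dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 8-byte BPF instruction layout (an assumption made
 * for illustration; the kernel's own definition is struct bpf_insn). */
struct insn {
    uint8_t code;   /* opcode */
    uint8_t regs;   /* low nibble: dst_reg, high nibble: src_reg */
    int16_t off;    /* signed offset */
    int32_t imm;    /* signed immediate */
};

/* "rN = imm": opcode 0xb7, as printed by bpftool for "r0 = 2". */
static struct insn mov64_imm(uint8_t dst, int32_t imm)
{
    struct insn i = { .code = 0xb7, .regs = dst & 0x0f, .off = 0, .imm = imm };

    assert(dst <= 10);          /* BPF has registers r0..r10 */
    return i;
}

/* "exit": opcode 0x95, all other fields zero. */
static struct insn exit_insn(void)
{
    struct insn i = { .code = 0x95 };
    return i;
}

/* Fill out[] with the two-instruction program; returns its size in bytes. */
static size_t build_woo(struct insn out[2])
{
    out[0] = mov64_imm(0, 2);   /* r0 = 2 (XDP_PASS) */
    out[1] = exit_insn();       /* exit */
    return 2 * sizeof(struct insn);
}
```

Calling build_woo on a two-element array yields two 8-byte instructions, which matches the "xlated 16B" figure in the bpftool output.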

Maps

BPF programs can use structured memory areas that are accessible both to other BPF programs and to programs in user space. These objects are called maps, and in this section we show how to manipulate them using the bpf system call.

Let's say right away that the capabilities of maps are not limited to shared memory access. There are special-purpose maps containing, for example, pointers to BPF programs or pointers to network interfaces, maps for working with events, and so on. We will not discuss them here, so as not to confuse the reader. We also ignore synchronization issues, since they do not matter for our examples. The full list of available map types can be found in <linux/bpf.h>, and in this section we take as an example the historically first type, the hash table BPF_MAP_TYPE_HASH.

If you create a hash table in, say, C++, you write unordered_map<int,long> woo, which in plain language means "I need a table woo of unlimited size, whose keys are of type int and whose values are of type long". To create a BPF hash table we need to do much the same, except that we have to specify the maximum size of the table, and instead of specifying the types of keys and values we specify their sizes in bytes. To create maps, use the BPF_MAP_CREATE command of the bpf system call. Let's look at a more or less minimal program that creates a map. After the previous program, which loads BPF programs, this one should look simple to you:

$ cat simple-map.c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void)
{
    union bpf_attr attr = {
        .map_type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(int),
        .value_size = sizeof(int),
        .max_entries = 4,
    };
    strncpy(attr.map_name, "woo", sizeof(attr.map_name));
    syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));

    for ( ;; )
        pause();
}

Here we define the parameter set attr, in which we say "I need a hash table with keys and values of size sizeof(int), into which I can put at most four elements". When creating BPF maps you can specify other parameters as well; for example, just as in the program example, we set the name of the object to "woo".

Let's compile and run the program:

$ clang -g -O2 simple-map.c -o simple-map
$ sudo strace ./simple-map
execve("./simple-map", ["./simple-map"], 0x7ffd40a27070 /* 14 vars */) = 0
...
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=4, max_entries=4, map_name="woo", ...}, 72) = 3
pause(

Here the bpf(2) system call returned map descriptor number 3, and then the program, as expected, waits for further instructions in the pause(2) system call.

Now let's send our program to the background, or open another terminal, and look at our object using the bpftool utility (we can tell our map apart from the others by its name):

$ sudo bpftool map
...
114: hash  name woo  flags 0x0
        key 4B  value 4B  max_entries 4  memlock 4096B
...

The number 114 is the global ID of our object. Any program on the system with sufficient privileges can use this ID to open an existing map with the BPF_MAP_GET_FD_BY_ID command of the bpf system call.

Now we can play with our hash table. Let's look at its contents:

$ sudo bpftool map dump id 114
Found 0 elements

Empty. Let's put the value hash[1] = 1 into it:

$ sudo bpftool map update id 114 key 1 0 0 0 value 1 0 0 0

Let's look at the table again:

$ sudo bpftool map dump id 114
key: 01 00 00 00  value: 01 00 00 00
Found 1 element

Hooray! We managed to add one element. Note that we have to work at the byte level to do this, since bpftool does not know what type the values in the hash table have. (This knowledge can be passed to it using BTF, but more on that later.)
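Since bpftool wants raw bytes here, it helps to see how an integer turns into the "1 0 0 0" form used above. The following is a minimal sketch, assuming 4-byte keys and values stored lowest byte first as on x86_64; the function name le32_bytes is invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Render a 32-bit value as the space-separated decimal bytes that follow
 * "key" or "value" on the bpftool command line, lowest byte first
 * (little-endian, as on x86_64). buf must hold at least 16 bytes,
 * enough for "255 255 255 255" plus the terminating NUL. */
static const char *le32_bytes(uint32_t v, char *buf, size_t len)
{
    assert(len >= 16);
    snprintf(buf, len, "%u %u %u %u",
             (unsigned)(v & 0xff),
             (unsigned)((v >> 8) & 0xff),
             (unsigned)((v >> 16) & 0xff),
             (unsigned)((v >> 24) & 0xff));
    return buf;
}
```

For the value 1 this produces "1 0 0 0", and for 258 (0x102) it produces "2 1 0 0".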

How exactly does bpftool read and add elements? Let's look under the hood:

$ sudo strace -e bpf bpftool map dump id 114
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_MAP_GET_NEXT_KEY, {map_fd=3, key=NULL, next_key=0x55856ab65280}, 120) = 0
bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x55856ab65280, value=0x55856ab652a0}, 120) = 0
key: 01 00 00 00  value: 01 00 00 00
bpf(BPF_MAP_GET_NEXT_KEY, {map_fd=3, key=0x55856ab65280, next_key=0x55856ab65280}, 120) = -1 ENOENT

First we open the map by its global ID using the BPF_MAP_GET_FD_BY_ID command, and bpf(2) returns descriptor 3. Next, using the BPF_MAP_GET_NEXT_KEY command, we find the first key in the table by passing NULL as the pointer to the "previous" key. Once we have the key, we can do BPF_MAP_LOOKUP_ELEM, which returns the value through the pointer value. The next step is to try to find the next element by passing a pointer to the current key, but our table contains only one element, and the BPF_MAP_GET_NEXT_KEY command returns ENOENT.
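The same sequence of calls can be written directly in C. This is a sketch rather than bpftool's actual code: it assumes a map with 4-byte keys and 4-byte values like our woo table, the helper names sys_bpf, ptr_to_u64, map_fd_by_id and dump_map are our own, and running it against a real map requires the same privileges we used with sudo.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int sys_bpf(int cmd, union bpf_attr *attr)
{
    return (int)syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* BPF_MAP_GET_FD_BY_ID: returns a descriptor, or -1 with errno set. */
static int map_fd_by_id(uint32_t id)
{
    union bpf_attr attr;

    assert(id != 0);                /* global IDs are allocated from 1 */
    memset(&attr, 0, sizeof(attr));
    attr.map_id = id;
    return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr);
}

/* Print every entry of a map with 4-byte keys and 4-byte values.
 * Returns the number of keys visited, or -1 with errno set. */
static int dump_map(int map_fd)
{
    uint32_t key = 0, next = 0, value = 0;
    union bpf_attr it, get;
    int n = 0;

    memset(&it, 0, sizeof(it));
    it.map_fd = map_fd;
    it.key = 0;                     /* NULL: ask for the first key */
    it.next_key = ptr_to_u64(&next);

    while (sys_bpf(BPF_MAP_GET_NEXT_KEY, &it) == 0) {
        key = next;

        /* value and next_key share storage in union bpf_attr,
         * so the lookup uses its own attr. */
        memset(&get, 0, sizeof(get));
        get.map_fd = map_fd;
        get.key = ptr_to_u64(&key);
        get.value = ptr_to_u64(&value);
        if (sys_bpf(BPF_MAP_LOOKUP_ELEM, &get) == 0)
            printf("key: %u  value: %u\n", key, value);

        it.key = ptr_to_u64(&key);  /* continue after the current key */
        n++;
    }
    return errno == ENOENT ? n : -1;    /* ENOENT marks the end */
}
```

With the table from this section, dump_map(map_fd_by_id(114)) would visit one key and stop on ENOENT, mirroring the strace output. Note that the lookup uses a separate attr because value and next_key occupy the same bytes of the union.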

Okay, let's change the value for key 1; say our business logic requires us to store hash[1] = 2:

$ sudo strace -e bpf bpftool map update id 114 key 1 0 0 0 value 2 0 0 0
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=3, key=0x55dcd72be260, value=0x55dcd72be280, flags=BPF_ANY}, 120) = 0

As expected, it is very simple: the BPF_MAP_GET_FD_BY_ID command opens our map by ID, and the BPF_MAP_UPDATE_ELEM command overwrites the element.

So, after creating a hash table from one program, we can read and write its contents from another. Note that if we were able to do this from the command line, then any other program on the system with the same privileges can do it too. Including the commands described above, the following are available for working with maps from user space:

  • BPF_MAP_LOOKUP_ELEM: find a value by key
  • BPF_MAP_UPDATE_ELEM: update/create a value
  • BPF_MAP_DELETE_ELEM: delete a key
  • BPF_MAP_GET_NEXT_KEY: find the next (or the first) key
  • BPF_MAP_GET_NEXT_ID: lets you iterate over all existing maps; this is how bpftool map works
  • BPF_MAP_GET_FD_BY_ID: open an existing map by its global ID
  • BPF_MAP_LOOKUP_AND_DELETE_ELEM: atomically look up an element, remove it from the map, and return its value
  • BPF_MAP_FREEZE: make the map immutable from user space (this operation cannot be undone)
  • BPF_MAP_LOOKUP_BATCH, BPF_MAP_LOOKUP_AND_DELETE_BATCH, BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH: batch operations. For example, BPF_MAP_LOOKUP_AND_DELETE_BATCH is the only reliable way to read and reset all values of a map

Not all of these commands work for every map type, but in general working with other map types from user space looks just like working with hash tables.

For the sake of order, let's finish our hash table experiments. Remember that we created a table that can hold up to four keys? Let's add a few more elements:

$ sudo bpftool map update id 114 key 2 0 0 0 value 1 0 0 0
$ sudo bpftool map update id 114 key 3 0 0 0 value 1 0 0 0
$ sudo bpftool map update id 114 key 4 0 0 0 value 1 0 0 0

So far so good:

$ sudo bpftool map dump id 114
key: 01 00 00 00  value: 01 00 00 00
key: 02 00 00 00  value: 01 00 00 00
key: 04 00 00 00  value: 01 00 00 00
key: 03 00 00 00  value: 01 00 00 00
Found 4 elements

Let's try to add one more:

$ sudo bpftool map update id 114 key 5 0 0 0 value 1 0 0 0
Error: update failed: Argument list too long

As expected, we failed. Let's look at the error in more detail:

$ sudo strace -e bpf bpftool map update id 114 key 5 0 0 0 value 1 0 0 0
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=80, info=0x7ffe6c626da0}}, 120) = 0
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=3, key=0x56049ded5260, value=0x56049ded5280, flags=BPF_ANY}, 120) = -1 E2BIG (Argument list too long)
Error: update failed: Argument list too long
+++ exited with 255 +++

All is well: as expected, the BPF_MAP_UPDATE_ELEM command tries to create a new, fifth, key, but fails with E2BIG.
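The behaviour we just observed can be summarized with a toy model. This is plain user-space C written only for illustration, not the kernel's hash table implementation; struct toy_map and toy_update are hypothetical names. It mimics an update with flags=BPF_ANY on a table with max_entries = 4: overwriting an existing key succeeds, while a new key that does not fit is rejected with E2BIG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 4

/* Toy stand-in for a BPF hash map created with max_entries = 4. */
struct toy_map {
    uint32_t keys[MAX_ENTRIES];
    uint32_t vals[MAX_ENTRIES];
    int used;
};

/* Insert or overwrite, like an update with flags=BPF_ANY.
 * Returns 0 on success or -E2BIG when a new key does not fit. */
static int toy_update(struct toy_map *m, uint32_t key, uint32_t val)
{
    assert(m != NULL);

    for (int i = 0; i < m->used; i++) {
        if (m->keys[i] == key) {    /* existing key: overwrite in place */
            m->vals[i] = val;
            return 0;
        }
    }
    if (m->used == MAX_ENTRIES)     /* table is full, key is new */
        return -E2BIG;

    m->keys[m->used] = key;
    m->vals[m->used] = val;
    m->used++;
    return 0;
}
```

In this model, inserting keys 1 through 4 succeeds, key 5 returns -E2BIG, and writing key 1 again still succeeds because no new slot is needed.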

So, we can create and load BPF programs, as well as create and manage maps from user space. Now it is logical to look at how we can use maps from BPF programs themselves. We could talk about this in the language of hard-to-read programs written in machine-code macros, but in fact the time has come to show how BPF programs are really written and maintained: using libbpf.

(For readers dissatisfied with the lack of a low-level example: we will analyze in detail programs that use maps and helper functions created with libbpf, and tell you what happens at the instruction level. For readers who are very dissatisfied, we have added an example at the appropriate place in the article.)

Writing BPF programs using libbpf

Writing BPF programs in machine code can be interesting only the first time, and then satiety sets in. At this point you need to turn your attention to llvm, which has a backend for generating code for the BPF architecture, and to the libbpf library, which lets you write the user side of BPF applications and load the code of BPF programs generated with llvm/clang.

In fact, as we will see in this and subsequent articles, libbpf does so much work that living without it (or its analogues: iproute2, libbcc, libbpf-go, etc.) is impossible. One of the killer features of the libbpf project is BPF CO-RE (Compile Once, Run Everywhere), a project that lets you write BPF programs that are portable from one kernel to another, with the ability to run on different APIs (for example, when a kernel structure changes from version to version). To be able to work with CO-RE, your kernel must be compiled with BTF support (we describe how to do this in the Development Tools section). You can check whether your kernel was built with BTF very simply, by the presence of the following file:

$ ls -lh /sys/kernel/btf/vmlinux
-r--r--r-- 1 root root 2.6M Jul 29 15:30 /sys/kernel/btf/vmlinux

This file stores information about all data types used in the kernel and is used in all our examples that use libbpf. We will talk about CO-RE in detail in the next article; for this one, just build yourself a kernel with CONFIG_DEBUG_INFO_BTF.

The libbpf library lives right in the tools/lib/bpf directory of the kernel, and its development is carried out through the kernel's BPF mailing list. However, for the needs of applications living outside the kernel, a separate repository is maintained at https://github.com/libbpf/libbpf, in which the kernel library is mirrored for read access more or less as is.

In this section we look at how you can create a project that uses libbpf, write several (more or less meaningless) test programs, and analyze in detail how it all works. This will let us explain more easily in the following sections exactly how BPF programs interact with maps, kernel helpers, BTF, and so on.

Typically, projects that use libbpf add the GitHub repository as a git submodule, and we will do the same:

$ mkdir /tmp/libbpf-example
$ cd /tmp/libbpf-example/
$ git init-db
Initialized empty Git repository in /tmp/libbpf-example/.git/
$ git submodule add https://github.com/libbpf/libbpf.git
Cloning into '/tmp/libbpf-example/libbpf'...
remote: Enumerating objects: 200, done.
remote: Counting objects: 100% (200/200), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 3354 (delta 101), reused 118 (delta 79), pack-reused 3154
Receiving objects: 100% (3354/3354), 2.05 MiB | 10.22 MiB/s, done.
Resolving deltas: 100% (2176/2176), done.

Building libbpf is very simple:

$ cd libbpf/src
$ mkdir build
$ OBJDIR=build DESTDIR=root make -s install
$ find root
root
root/usr
root/usr/include
root/usr/include/bpf
root/usr/include/bpf/bpf_tracing.h
root/usr/include/bpf/xsk.h
root/usr/include/bpf/libbpf_common.h
root/usr/include/bpf/bpf_endian.h
root/usr/include/bpf/bpf_helpers.h
root/usr/include/bpf/btf.h
root/usr/include/bpf/bpf_helper_defs.h
root/usr/include/bpf/bpf.h
root/usr/include/bpf/libbpf_util.h
root/usr/include/bpf/libbpf.h
root/usr/include/bpf/bpf_core_read.h
root/usr/lib64
root/usr/lib64/libbpf.so.0.1.0
root/usr/lib64/libbpf.so.0
root/usr/lib64/libbpf.a
root/usr/lib64/libbpf.so
root/usr/lib64/pkgconfig
root/usr/lib64/pkgconfig/libbpf.pc

Our further plan for this section is as follows: we will write a BPF program of type BPF_PROG_TYPE_XDP, the same as in the previous example, but in C; compile it with clang; and write a loader program that loads it into the kernel. In the following sections we will extend the capabilities of both the BPF program and the loader program.

Example: creating a complete application using libbpf

To start, we use the file /sys/kernel/btf/vmlinux mentioned above and create its equivalent in the form of a header file:

$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

This file will hold all the data structures available in our kernel; for example, this is how the IPv4 header is defined in the kernel:

$ grep -A 12 'struct iphdr {' vmlinux.h
struct iphdr {
    __u8 ihl: 4;
    __u8 version: 4;
    __u8 tos;
    __be16 tot_len;
    __be16 id;
    __be16 frag_off;
    __u8 ttl;
    __u8 protocol;
    __sum16 check;
    __be32 saddr;
    __be32 daddr;
};

Now we write our BPF program in C:

$ cat xdp-simple.bpf.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp/simple")
int simple(void *ctx)
{
        return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";

Although our program turned out very simple, we still need to pay attention to many details. First, the first header file we include is vmlinux.h, which we just generated with bpftool btf dump; now we do not need to install the kernel-headers package to find out what kernel structures look like. The next header file comes from the libbpf library. For now we need it only to define the SEC macro, which places a symbol into the corresponding section of the ELF object file. Our program lives in the section xdp/simple, where before the slash we specify the type of the BPF program; this is a convention used by libbpf, which, based on the section name, substitutes the correct type when calling bpf(2). The BPF program itself is plain C, very simple, and consists of the single line return XDP_PASS. Finally, a separate section "license" contains the name of the license.
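To make the section-name convention concrete, here is a toy lookup in plain C. It is a sketch only: the three-row table is a tiny illustrative subset, the function name type_for_section is invented, and libbpf's own table and matching rules are longer and differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny illustrative table; libbpf's real table is much longer. */
static const struct {
    const char *prefix;
    const char *type;
} sec_types[] = {
    { "xdp",        "BPF_PROG_TYPE_XDP" },
    { "kprobe",     "BPF_PROG_TYPE_KPROBE" },
    { "tracepoint", "BPF_PROG_TYPE_TRACEPOINT" },
};

/* Return the program type name for a section such as "xdp/simple",
 * or NULL if the part before the slash is not in the table. */
static const char *type_for_section(const char *sec)
{
    const char *slash;
    size_t n;

    assert(sec != NULL);
    slash = strchr(sec, '/');
    n = slash ? (size_t)(slash - sec) : strlen(sec);

    for (size_t i = 0; i < sizeof(sec_types) / sizeof(sec_types[0]); i++) {
        if (strlen(sec_types[i].prefix) == n &&
            strncmp(sec, sec_types[i].prefix, n) == 0)
            return sec_types[i].type;
    }
    return NULL;
}
```

For our section name, type_for_section("xdp/simple") yields "BPF_PROG_TYPE_XDP", the same program type we passed by hand in the earlier simple-prog example.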

We can compile our program with llvm/clang, version >= 10.0.0, or better yet, newer (see the Development Tools section):

$ clang --version
clang version 11.0.0 (https://github.com/llvm/llvm-project.git afc287e0abec710398465ee1f86237513f2b5091)
...

$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o

Among the interesting points: we specify the target architecture with -target bpf and the path to the libbpf headers that we just installed. Also, do not forget -O2; without this option you may be in for surprises later. Let's look at our code: did we manage to write the program we wanted?

$ llvm-objdump --section=xdp/simple --no-show-raw-insn -D xdp-simple.bpf.o

xdp-simple.bpf.o:       file format elf64-bpf

Disassembly of section xdp/simple:

0000000000000000 <simple>:
       0:       r0 = 2
       1:       exit

Yes, it worked! Now we have a binary file with the program, and we want to create an application that loads it into the kernel. For this, the libbpf library offers us two options: use the low-level API or the high-level API. We will go the second way, since we want to learn how to write, load, and attach BPF programs with minimal effort in order to study them afterwards.

First, we need to generate the "skeleton" of our program from its binary using the same bpftool utility, the Swiss army knife of the BPF world (which can be taken literally, since Daniel Borkmann, one of the creators and maintainers of BPF, is Swiss):

$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h

The file xdp-simple.skel.h contains the binary code of our program and functions for managing it: loading, attaching, and deleting our object. In our simple case this looks like overkill, but it also works when the object file contains many BPF programs and maps, and to load this giant ELF we only need to generate the skeleton and call one or two functions from the custom application that we are writing. Let's move on.

Strictly speaking, our loader program is trivial:

#include <err.h>
#include <unistd.h>
#include "xdp-simple.skel.h"

int main(int argc, char **argv)
{
    struct xdp_simple_bpf *obj;

    obj = xdp_simple_bpf__open_and_load();
    if (!obj)
        err(1, "failed to open and/or load BPF object\n");

    pause();

    xdp_simple_bpf__destroy(obj);
}

Here struct xdp_simple_bpf is defined in the file xdp-simple.skel.h and describes our object file:

struct xdp_simple_bpf {
    struct bpf_object_skeleton *skeleton;
    struct bpf_object *obj;
    struct {
        struct bpf_program *simple;
    } progs;
    struct {
        struct bpf_link *simple;
    } links;
};

We can see traces of the low-level API here: the structures struct bpf_program *simple and struct bpf_link *simple. The first structure describes our specific program, written in the section xdp/simple, and the second describes how the program attaches to an event source.

The function xdp_simple_bpf__open_and_load opens the ELF object, parses it, creates all the structures and substructures (besides the program, the ELF also contains other sections: data, read-only data, debug information, the license, and so on), and then loads it into the kernel using the bpf system call, which we can check by compiling and running the program:

$ clang -O2 -I ./libbpf/src/root/usr/include/ xdp-simple.c -o xdp-simple ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz

$ sudo strace -e bpf ./xdp-simple
...
bpf(BPF_BTF_LOAD, 0x7ffdb8fd9670, 120)  = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=2, insns=0xdfd580, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 8, 0), prog_flags=0, prog_name="simple", prog_ifindex=0, expected_attach_type=0x25 /* BPF_??? */, ...}, 120) = 4

Now let's look at our program using bpftool. Let's find its ID:

# bpftool p | grep -A4 simple
463: xdp  name simple  tag 3b185187f1855c4c  gpl
        loaded_at 2020-08-01T01:59:49+0000  uid 0
        xlated 16B  jited 40B  memlock 4096B
        btf_id 185
        pids xdp-simple(16498)

and dump it (we use the shortened form of the command bpftool prog dump xlated):

# bpftool p d x id 463
int simple(void *ctx):
; return XDP_PASS;
   0: (b7) r0 = 2
   1: (95) exit

Something new! The program printed pieces of our C source file. This was done by the libbpf library, which found the debug section in the binary, compiled it into a BTF object, loaded it into the kernel using BPF_BTF_LOAD, and then passed the resulting file descriptor when loading the program with the BPF_PROG_LOAD command.

Kernel helpers

BPF programs can run "external" functions: kernel helpers. These helper functions allow BPF programs to access kernel structures, manage maps, and also communicate with the "real world": create events, control hardware (for example, redirect packets), and so on.

Example: bpf_get_smp_processor_id

Following the "learning by example" paradigm, let's consider one of the helper functions, bpf_get_smp_processor_id(), defined in the file kernel/bpf/helpers.c. It returns the number of the processor on which the BPF program that called it is running. But we are not so much interested in its semantics as in the fact that its implementation takes one line:

BPF_CALL_0(bpf_get_smp_processor_id)
{
    return smp_processor_id();
}

Definitions of BPF helper functions are similar to definitions of Linux system calls. Here, for example, a function with no arguments is defined. (A function that takes, say, three arguments is defined using the macro BPF_CALL_3. The maximum number of arguments is five.) However, this is only the first part of the definition. The second part is defining a structure of type struct bpf_func_proto, which contains a description of the helper function that the verifier understands:

const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
    .func     = bpf_get_smp_processor_id,
    .gpl_only = false,
    .ret_type = RET_INTEGER,
};

Registering helper functions

For BPF programs of a particular type to be able to use this function, it must be registered for that type. For example, for the type BPF_PROG_TYPE_XDP the kernel defines the function xdp_func_proto, which determines from the ID of a helper function whether XDP supports that function or not. Our function is supported:

static const struct bpf_func_proto *
xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
    switch (func_id) {
    ...
    case BPF_FUNC_get_smp_processor_id:
        return &bpf_get_smp_processor_id_proto;
    ...
    }
}

New BPF program types are "defined" in the file include/linux/bpf_types.h using the macro BPF_PROG_TYPE. Defined in quotes because this is a logical definition; in C terms, the definitions of a whole set of concrete structures happen in other places. In particular, in the file kernel/bpf/verifier.c all the definitions from bpf_types.h are used to create the array of structures bpf_verifier_ops[]:

static const struct bpf_verifier_ops *const bpf_verifier_ops[] = {
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
    [_id] = & _name ## _verifier_ops,
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
};

That is, for each type of BPF program a pointer to a data structure of type struct bpf_verifier_ops is defined, initialized with the value _name ## _verifier_ops, i.e. xdp_verifier_ops for xdp. The structure xdp_verifier_ops is defined in the file net/core/filter.c as follows:

const struct bpf_verifier_ops xdp_verifier_ops = {
    .get_func_proto     = xdp_func_proto,
    .is_valid_access    = xdp_is_valid_access,
    .convert_ctx_access = xdp_convert_ctx_access,
    .gen_prologue       = bpf_noop_prologue,
};

Here we see our familiar function xdp_func_proto, which the verifier runs every time it encounters a call to some function inside a BPF program, see verifier.c.
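The bpf_types.h trick described above, one list expanded into a table of designated initializers, is easy to try outside the kernel. The sketch below uses hypothetical stand-ins throughout (enum prog_type, struct verifier_ops, PROG_TYPES), and because it has to fit in a single file it keeps the list in a macro instead of a separately included header.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's types. */
enum prog_type { TYPE_SOCKET_FILTER, TYPE_XDP, TYPE_MAX };
struct verifier_ops { const char *label; };

static const struct verifier_ops sk_filter_verifier_ops = { "sk_filter" };
static const struct verifier_ops xdp_verifier_ops       = { "xdp" };

/* One list of (id, name) pairs, playing the role of bpf_types.h. */
#define PROG_TYPES(X)                 \
    X(TYPE_SOCKET_FILTER, sk_filter)  \
    X(TYPE_XDP, xdp)

/* Expand the list into designated initializers: each entry becomes
 * [id] = &name_verifier_ops, via token pasting. */
static const struct verifier_ops *const ops_table[] = {
#define AS_OPS_ENTRY(_id, _name) [_id] = &_name##_verifier_ops,
    PROG_TYPES(AS_OPS_ENTRY)
#undef AS_OPS_ENTRY
};

/* Look up the label of the ops structure registered for a type. */
static const char *ops_label(enum prog_type t)
{
    assert(t < TYPE_MAX);
    return ops_table[t]->label;
}
```

The benefit of this arrangement is that adding a program type means adding one line to the list; every table generated from it picks up the new entry.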

Let's look at how a hypothetical BPF program uses the function bpf_get_smp_processor_id. To do so, we rewrite the program from our previous section as follows:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp/simple")
int simple(void *ctx)
{
    if (bpf_get_smp_processor_id() != 0)
        return XDP_DROP;
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";

The symbol bpf_get_smp_processor_id is defined in <bpf/bpf_helper_defs.h> of the libbpf library as

static __u32 (*bpf_get_smp_processor_id)(void) = (void *) 8;

that is, bpf_get_smp_processor_id is a function pointer whose value is 8, where 8 is the value of BPF_FUNC_get_smp_processor_id of the type enum bpf_func_id, which is defined for us in the file vmlinux.h (the file bpf_helper_defs.h is generated in the kernel by a script, so "magic" numbers are fine). This function takes no arguments and returns a value of type __u32. When we call it in our program, clang generates a BPF_CALL instruction "of the right kind". Let's compile the program and look at the section xdp/simple:

$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o
$ llvm-objdump -D --section=xdp/simple xdp-simple.bpf.o

xdp-simple.bpf.o:       file format elf64-bpf

Disassembly of section xdp/simple:

0000000000000000 <simple>:
       0:       85 00 00 00 08 00 00 00 call 8
       1:       bf 01 00 00 00 00 00 00 r1 = r0
       2:       67 01 00 00 20 00 00 00 r1 <<= 32
       3:       77 01 00 00 20 00 00 00 r1 >>= 32
       4:       b7 00 00 00 02 00 00 00 r0 = 2
       5:       15 01 01 00 00 00 00 00 if r1 == 0 goto +1 <LBB0_2>
       6:       b7 00 00 00 01 00 00 00 r0 = 1

0000000000000038 <LBB0_2>:
       7:       95 00 00 00 00 00 00 00 exit

In the first line we see a call instruction whose IMM parameter equals 8 and whose SRC_REG is zero. According to the ABI convention used by the verifier, this is a call to helper function number eight. Once it has been called, the logic is simple. The return value from register r0 is copied to r1, and on lines 2 and 3 it is converted to type u32: the upper 32 bits are cleared. On lines 4, 5, 6 and 7 we return 2 (XDP_PASS) or 1 (XDP_DROP) depending on whether the helper from line 0 returned zero or a non-zero value.
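The shift pair and the branch are easy to check in ordinary C. This is a user-space model of instructions 1 to 7 from the listing above, written only for illustration; zext32 and simple_model are names we made up.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors instructions 2 and 3: r1 <<= 32; r1 >>= 32.
 * Together they clear the upper 32 bits of a 64-bit register. */
static uint64_t zext32(uint64_t r)
{
    r <<= 32;
    r >>= 32;
    return r;
}

/* Mirrors instructions 1 to 7, given the value the helper left in r0. */
static uint64_t simple_model(uint64_t helper_ret)
{
    uint64_t r1 = zext32(helper_ret);   /* r1 = (u32)r0 */
    uint64_t r0 = 2;                    /* XDP_PASS */

    if (r1 != 0)                        /* "if r1 == 0 goto +1" skips this */
        r0 = 1;                         /* XDP_DROP */

    assert(r0 == 1 || r0 == 2);
    return r0;
}
```

So simple_model(0) gives 2 and simple_model(5) gives 1; a value whose low 32 bits are zero, such as 0xffffffff00000000, also gives 2, because only the truncated u32 is compared.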

Let's check ourselves: load the program and look at the output of bpftool prog dump xlated:

$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h
$ clang -O2 -g -I ./libbpf/src/root/usr/include/ -o xdp-simple xdp-simple.c ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz
$ sudo ./xdp-simple &
[2] 10914

$ sudo bpftool p | grep simple
523: xdp  name simple  tag 44c38a10c657e1b0  gpl
        pids xdp-simple(10915)

$ sudo bpftool p d x id 523
int simple(void *ctx):
; if (bpf_get_smp_processor_id() != 0)
   0: (85) call bpf_get_smp_processor_id#114128
   1: (bf) r1 = r0
   2: (67) r1 <<= 32
   3: (77) r1 >>= 32
   4: (b7) r0 = 2
; }
   5: (15) if r1 == 0x0 goto pc+1
   6: (b7) r0 = 1
   7: (95) exit

OK, the verifier found the right kernel helper.

Example: passing arguments and, finally, running the program!

All run-level helper functions have the prototype

u64 fn(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)

Parameters are passed to helper functions in registers r1-r5, and the value is returned in register r0. There are no functions that take more than five arguments, and support for them is not expected to be added in the future.

Let's look at a new kernel helper and at how BPF passes parameters. Let's rewrite xdp-simple.bpf.c as follows (the rest of the lines are unchanged):

SEC("xdp/simple")
int simple(void *ctx)
{
    bpf_printk("running on CPU%u\n", bpf_get_smp_processor_id());
    return XDP_PASS;
}

Our program prints the number of the CPU it is running on. Let's compile it and look at the code:

$ llvm-objdump -D --section=xdp/simple --no-show-raw-insn xdp-simple.bpf.o

0000000000000000 <simple>:
       0:       r1 = 10
       1:       *(u16 *)(r10 - 8) = r1
       2:       r1 = 8441246879787806319 ll
       4:       *(u64 *)(r10 - 16) = r1
       5:       r1 = 2334956330918245746 ll
       7:       *(u64 *)(r10 - 24) = r1
       8:       call 8
       9:       r1 = r10
      10:       r1 += -24
      11:       r2 = 18
      12:       r3 = r0
      13:       call 6
      14:       r0 = 2
      15:       exit

In lines 0-7 we write the string running on CPU%u\n to the stack, and then on line 8 we call the familiar bpf_get_smp_processor_id. On lines 9-12 we prepare the arguments of the helper bpf_printk: registers r1, r2, r3. Why three of them and not two? Because bpf_printk is a macro wrapper around the real helper bpf_trace_printk, which needs to be passed the size of the format string.
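The large decimal constants in the listing are just the format string cut into little-endian chunks. The sketch below rebuilds the 18 bytes in user space; put_le, stack_string and stack_string_matches are names invented for this example, and the three constants are copied from the disassembly above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Append the low n bytes of v to dst, least significant byte first,
 * which is how a little-endian store lays them out in memory. */
static char *put_le(char *dst, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        *dst++ = (char)((v >> (8 * i)) & 0xff);
    return dst;
}

/* Rebuild the 18 bytes that instructions 0-7 place on the stack. */
static void stack_string(char out[18])
{
    char *p = out;

    p = put_le(p, 2334956330918245746ULL, 8);   /* stored at r10 - 24 */
    p = put_le(p, 8441246879787806319ULL, 8);   /* stored at r10 - 16 */
    p = put_le(p, 10, 2);                       /* stored at r10 - 8  */
    assert(p - out == 18);
}

/* Returns 1 if the rebuilt bytes equal the format string from the source. */
static int stack_string_matches(void)
{
    char s[18];

    stack_string(s);
    return strcmp(s, "running on CPU%u\n") == 0;
}
```

The first constant holds "running ", the second "on CPU%u", and the 16-bit store of 10 contributes the newline plus the terminating zero byte, 18 bytes in total, which is exactly the value loaded into r2 on line 11.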

Let's now add a couple of lines to xdp-simple.c so that our program attaches to the lo interface and really starts running!

$ cat xdp-simple.c
#include <linux/if_link.h>
#include <err.h>
#include <unistd.h>
#include "xdp-simple.skel.h"

int main(int argc, char **argv)
{
    __u32 flags = XDP_FLAGS_SKB_MODE;
    struct xdp_simple_bpf *obj;

    obj = xdp_simple_bpf__open_and_load();
    if (!obj)
        err(1, "failed to open and/or load BPF object\n");

    bpf_set_link_xdp_fd(1, -1, flags);
    bpf_set_link_xdp_fd(1, bpf_program__fd(obj->progs.simple), flags);

cleanup:
    xdp_simple_bpf__destroy(obj);
}

Here we use the function bpf_set_link_xdp_fd, which attaches XDP-type BPF programs to network interfaces. We hardcoded the number of the lo interface, which is always 1. We run the function twice in order to first detach the old program if one was attached. Note that now we do not need the pause call or an infinite loop: our loader program will exit, but the BPF program will not be destroyed, since it is attached to an event source. After successful loading and attaching, the program will run for every network packet arriving on lo.

Let's load the program and look at the lo interface:

$ sudo ./xdp-simple
$ sudo bpftool p | grep simple
669: xdp  name simple  tag 4fca62e77ccb43d6  gpl
$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    prog/xdp id 669

The program we loaded has ID 669, and we see the same ID on the lo interface. Let's send a couple of packets to 127.0.0.1 (request + reply):

$ ping -c1 localhost

and now let's look at the contents of the debug virtual file /sys/kernel/debug/tracing/trace_pipe, into which bpf_printk writes its messages:

# cat /sys/kernel/debug/tracing/trace_pipe
ping-13937 [000] d.s1 442015.377014: bpf_trace_printk: running on CPU0
ping-13937 [000] d.s1 442015.377027: bpf_trace_printk: running on CPU0

Two packets were seen on lo and processed on CPU0: our first meaningless BPF program works!

It is worth noting that bpf_printk writes to a debug file for a reason: it is not the best helper to use in production, but our goal was to show something simple.

Accessing maps from BPF programs

Example: using a map from a BPF program

In the previous sections we learned how to create and use maps from user space; now let's look at the kernel side. Let's start, as usual, with an example. Let's rewrite our program xdp-simple.bpf.c as follows:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 8);
    __type(key, u32);
    __type(value, u64);
} woo SEC(".maps");

SEC("xdp/simple")
int simple(void *ctx)
{
    u32 key = bpf_get_smp_processor_id();
    u64 *val;

    val = bpf_map_lookup_elem(&woo, &key);
    if (!val)
        return XDP_ABORTED;

    *val += 1;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";

At the beginning of the program we added the definition of the map woo: it is an array of 8 elements that stores values of type u64 (in C we would define such an array as u64 woo[8]). In the program "xdp/simple" we get the current processor number into the variable key and then, using the helper function bpf_map_lookup_elem, obtain a pointer to the corresponding entry in the array, which we increment by one. In plain terms: we collect statistics on which CPU processed incoming packets. Let's try to run the program:

$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o
$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h
$ clang -O2 -g -I ./libbpf/src/root/usr/include/ -o xdp-simple xdp-simple.c ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz
$ sudo ./xdp-simple

Let's check that it is attached to lo and send some packets:

$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    prog/xdp id 108

$ for s in `seq 234`; do sudo ping -f -c 100 127.0.0.1 >/dev/null 2>&1; done

Now let's look at the contents of the array:

$ sudo bpftool map dump name woo
[
    { "key": 0, "value": 0 },
    { "key": 1, "value": 400 },
    { "key": 2, "value": 0 },
    { "key": 3, "value": 0 },
    { "key": 4, "value": 0 },
    { "key": 5, "value": 0 },
    { "key": 6, "value": 0 },
    { "key": 7, "value": 46400 }
]

Almost all packets were processed on CPU7. This is not important for us; the main thing is that the program works and we understand how to access maps from BPF programs: by using the bpf_map_* helpers.
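As a quick sanity check of the dump above, the counters can be summed and compared with the number of packets we expect on lo. This is only a sketch: sum_counters and expected_packets are hypothetical helper names, and it assumes that every ping shows up twice on lo (request + reply), as in the earlier bpf_printk experiment.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum all per-CPU counters copied out of the map (hypothetical helper). */
uint64_t sum_counters(const uint64_t *vals, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += vals[i];
    return total;
}

/* Packets expected on lo, assuming each ping is seen twice:
 * once as the request and once as the reply. */
uint64_t expected_packets(uint64_t runs, uint64_t pings_per_run)
{
    return runs * pings_per_run * 2;
}
```

With the values from the dump (400 on CPU1 and 46400 on CPU7) both functions give the same number for 234 runs of 100 pings, so no packet was lost by the counter.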

The mystical pointer

So, we can access a map from a BPF program using calls like

val = bpf_map_lookup_elem(&woo, &key);

where the helper function looks like

void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

but we are passing a pointer &woo to an unnamed struct { ... }...

If we look at the assembly of the program, we see that the value of &woo is not actually defined (line 4):

$ llvm-objdump -D --section xdp/simple xdp-simple.bpf.o

xdp-simple.bpf.o:       file format elf64-bpf

Disassembly of section xdp/simple:

0000000000000000 <simple>:
       0:       85 00 00 00 08 00 00 00 call 8
       1:       63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0
       2:       bf a2 00 00 00 00 00 00 r2 = r10
       3:       07 02 00 00 fc ff ff ff r2 += -4
       4:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
       6:       85 00 00 00 01 00 00 00 call 1
...

and is listed in the relocations:

$ llvm-readelf -r xdp-simple.bpf.o | head -4

Relocation section '.relxdp/simple' at offset 0xe18 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000020  0000002700000001 R_BPF_64_64            0000000000000000 woo

But if we look at the already loaded program, we see a pointer to the correct map (line 4):

$ sudo bpftool prog dump x name simple
int simple(void *ctx):
   0: (85) call bpf_get_smp_processor_id#114128
   1: (63) *(u32 *)(r10 -4) = r0
   2: (bf) r2 = r10
   3: (07) r2 += -4
   4: (18) r1 = map[id:64]
...

Thus, we can conclude that at the moment our loader program was started, the reference to &woo was replaced with something by the libbpf library. First let's look at the strace output:

$ sudo strace -e bpf ./xdp-simple
...
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=8, max_entries=8, map_name="woo", ...}, 120) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, prog_name="simple", ...}, 120) = 5

We see that libbpf created the map woo and then loaded our program simple. Let's look more closely at how we load the program:

  • we call xdp_simple_bpf__open_and_load from the file xdp-simple.skel.h
  • which calls xdp_simple_bpf__load from the file xdp-simple.skel.h
  • which calls bpf_object__load_skeleton from the file libbpf/src/libbpf.c
  • which calls bpf_object__load_xattr from libbpf/src/libbpf.c

The last function, among other things, calls bpf_object__create_maps, which creates or opens existing maps, turning them into file descriptors. (This is where we see BPF_MAP_CREATE in the strace output.) Next, the function bpf_object__relocate is called, and it is the one that interests us, since we remember that we saw woo in the relocation table. Exploring it, we eventually end up in the function bpf_program__relocate, which handles map relocations:

case RELO_LD64:
    insn[0].src_reg = BPF_PSEUDO_MAP_FD;
    insn[0].imm = obj->maps[relo->map_idx].fd;
    break;

So we take our instruction

18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll

and replace the source register in it with BPF_PSEUDO_MAP_FD, and the first IMM with the file descriptor of our map. If the descriptor equals, for example, 0xdeadbeef, then as a result we get the instruction

18 11 00 00 ef be ad de 00 00 00 00 00 00 00 00 r1 = 0 ll

This is how map information is passed to a specific loaded BPF program. The map can either be created with BPF_MAP_CREATE or opened by ID with BPF_MAP_GET_FD_BY_ID.
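The patching step can be sketched on raw instruction bytes. This is not libbpf's code, only an illustration under stated assumptions: the function name patch_ld64_map_fd is hypothetical, the instruction is stored little-endian as in the dumps above, and PSEUDO_MAP_FD stands in for BPF_PSEUDO_MAP_FD from linux/bpf.h, which has the value 1.

```c
#include <stdint.h>

#define PSEUDO_MAP_FD 1 /* stands in for BPF_PSEUDO_MAP_FD from linux/bpf.h */

/* Patch a 16-byte LD64 instruction in place.
 * Byte 1 holds both registers: dst in the low nibble, src in the high one.
 * Bytes 4..7 hold the first 32-bit immediate, least significant byte first. */
void patch_ld64_map_fd(uint8_t insn[16], uint32_t fd)
{
    insn[1] = (uint8_t)((insn[1] & 0x0f) | (PSEUDO_MAP_FD << 4));
    insn[4] = (uint8_t)(fd & 0xff);
    insn[5] = (uint8_t)((fd >> 8) & 0xff);
    insn[6] = (uint8_t)((fd >> 16) & 0xff);
    insn[7] = (uint8_t)((fd >> 24) & 0xff);
}
```

Applied to the bytes 18 01 00 ... with the descriptor 0xdeadbeef, the register byte becomes 11 and the immediate becomes ef be ad de, while the second half of the instruction stays zero.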

In total, when using libbpf the sequence is as follows:

  • during compilation, entries are created in the relocation table for references to maps
  • libbpf opens the ELF object file, finds all the maps used and creates file descriptors for them
  • the file descriptors are loaded into the kernel as part of the LD64 instruction

As you might guess, there is more to come, and we have to look into the kernel. Fortunately, we have a clue: we wrote the value BPF_PSEUDO_MAP_FD into the source register and can grep for it, which leads us to the holy of holies, kernel/bpf/verifier.c, where a function with a telling name replaces the file descriptor with the address of a structure of type struct bpf_map:

static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env) {
    ...

    f = fdget(insn[0].imm);
    map = __bpf_map_get(f);
    if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
        addr = (unsigned long)map;
    }
    insn[0].imm = (u32)addr;
    insn[1].imm = addr >> 32;

(the full code can be found at the link). So we can extend our algorithm:

  • while loading the program, the verifier checks that the map is used correctly and writes the address of the corresponding struct bpf_map
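The last two assignments in the verifier excerpt split a 64-bit kernel address across the two 32-bit immediates of the LD64 instruction. A minimal user-space sketch of that arithmetic follows; the function names split_map_addr and join_map_addr are hypothetical and only illustrate the idea.

```c
#include <stdint.h>

/* Split a 64-bit address into the two 32-bit immediates of an LD64
 * instruction: the low half goes into insn[0].imm, the high half
 * into insn[1].imm. */
void split_map_addr(uint64_t addr, uint32_t *imm_lo, uint32_t *imm_hi)
{
    *imm_lo = (uint32_t)addr;
    *imm_hi = (uint32_t)(addr >> 32);
}

/* Reassemble the address from the two halves. */
uint64_t join_map_addr(uint32_t imm_lo, uint32_t imm_hi)
{
    return ((uint64_t)imm_hi << 32) | imm_lo;
}
```

This is why the load takes two instruction slots: a single slot has room for only one 32-bit immediate.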

Much more happens when an ELF binary is loaded with libbpf, but we will discuss that in other articles.

Loading programs and maps without libbpf

As promised, here is an example for readers who want to know how to create and load a program that uses maps without the help of libbpf. This can be useful when you work in an environment for which you cannot build dependencies, or you are saving every bit, or you are writing a program like ply, which generates BPF binary code on the fly.

To make the logic easier to follow, we will rewrite our xdp-simple example for this purpose. The complete and slightly extended code of the program discussed in this example can be found in this gist.

The logic of our application is as follows:

  • create a map of type BPF_MAP_TYPE_ARRAY using the BPF_MAP_CREATE command,
  • create a program that uses this map,
  • attach the program to the lo interface,

which translates into human language as

int main(void)
{
    int map_fd, prog_fd;

    map_fd = map_create();
    if (map_fd < 0)
        err(1, "bpf: BPF_MAP_CREATE");

    prog_fd = prog_load(map_fd);
    if (prog_fd < 0)
        err(1, "bpf: BPF_PROG_LOAD");

    xdp_attach(1, prog_fd);
}

Here map_create creates a map in the same way as we did in the first example about the bpf system call: "Kernel, please make me a new map in the form of an array of 8 elements of type __u64 and give me back a file descriptor":

static int map_create(void)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_type = BPF_MAP_TYPE_ARRAY;
    attr.key_size = sizeof(__u32);
    attr.value_size = sizeof(__u64);
    attr.max_entries = 8;
    strncpy(attr.map_name, "woo", sizeof(attr.map_name));
    return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

The program is also easy to load:

static int prog_load(int map_fd)
{
    union bpf_attr attr;
    struct bpf_insn insns[] = {
        ...
    };

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_XDP;
    attr.insns     = ptr_to_u64(insns);
    attr.insn_cnt  = sizeof(insns)/sizeof(insns[0]);
    attr.license   = ptr_to_u64("GPL");
    strncpy(attr.prog_name, "woo", sizeof(attr.prog_name));
    return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}

The tricky part of prog_load is the definition of our BPF program as an array of structures struct bpf_insn insns[]. But since we are using a program that we already have in C, we can cheat a little:

$ llvm-objdump -D --section xdp/simple xdp-simple.bpf.o

0000000000000000 <simple>:
       0:       85 00 00 00 08 00 00 00 call 8
       1:       63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0
       2:       bf a2 00 00 00 00 00 00 r2 = r10
       3:       07 02 00 00 fc ff ff ff r2 += -4
       4:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
       6:       85 00 00 00 01 00 00 00 call 1
       7:       b7 01 00 00 00 00 00 00 r1 = 0
       8:       15 00 04 00 00 00 00 00 if r0 == 0 goto +4 <LBB0_2>
       9:       61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0)
      10:       07 01 00 00 01 00 00 00 r1 += 1
      11:       63 10 00 00 00 00 00 00 *(u32 *)(r0 + 0) = r1
      12:       b7 01 00 00 02 00 00 00 r1 = 2

0000000000000068 <LBB0_2>:
      13:       bf 10 00 00 00 00 00 00 r0 = r1
      14:       95 00 00 00 00 00 00 00 exit

In total, we need to write 14 instructions as structures of type struct bpf_insn; the 64-bit load occupies two slots, so the array has 15 entries (hint: take the dump from above, re-read the section on instructions, open linux/bpf.h and linux/bpf_common.h and try to work out struct bpf_insn insns[] yourself):

struct bpf_insn insns[] = {
    /* 85 00 00 00 08 00 00 00 call 8 */
    {
        .code = BPF_JMP | BPF_CALL,
        .imm = 8,
    },

    /* 63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0 */
    {
        .code = BPF_MEM | BPF_STX,
        .off = -4,
        .src_reg = BPF_REG_0,
        .dst_reg = BPF_REG_10,
    },

    /* bf a2 00 00 00 00 00 00 r2 = r10 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_X,
        .src_reg = BPF_REG_10,
        .dst_reg = BPF_REG_2,
    },

    /* 07 02 00 00 fc ff ff ff r2 += -4 */
    {
        .code = BPF_ALU64 | BPF_ADD | BPF_K,
        .dst_reg = BPF_REG_2,
        .imm = -4,
    },

    /* 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll */
    {
        .code = BPF_LD | BPF_DW | BPF_IMM,
        .src_reg = BPF_PSEUDO_MAP_FD,
        .dst_reg = BPF_REG_1,
        .imm = map_fd,
    },
    { }, /* placeholder */

    /* 85 00 00 00 01 00 00 00 call 1 */
    {
        .code = BPF_JMP | BPF_CALL,
        .imm = 1,
    },

    /* b7 01 00 00 00 00 00 00 r1 = 0 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_K,
        .dst_reg = BPF_REG_1,
        .imm = 0,
    },

    /* 15 00 04 00 00 00 00 00 if r0 == 0 goto +4 <LBB0_2> */
    {
        .code = BPF_JMP | BPF_JEQ | BPF_K,
        .off = 4,
        .dst_reg = BPF_REG_0, /* JEQ with BPF_K compares dst_reg with imm */
        .imm = 0,
    },

    /* 61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0) */
    {
        .code = BPF_MEM | BPF_LDX,
        .off = 0,
        .src_reg = BPF_REG_0,
        .dst_reg = BPF_REG_1,
    },

    /* 07 01 00 00 01 00 00 00 r1 += 1 */
    {
        .code = BPF_ALU64 | BPF_ADD | BPF_K,
        .dst_reg = BPF_REG_1,
        .imm = 1,
    },

    /* 63 10 00 00 00 00 00 00 *(u32 *)(r0 + 0) = r1 */
    {
        .code = BPF_MEM | BPF_STX,
        .src_reg = BPF_REG_1,
        .dst_reg = BPF_REG_0,
    },

    /* b7 01 00 00 02 00 00 00 r1 = 2 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_K,
        .dst_reg = BPF_REG_1,
        .imm = 2,
    },

    /* <LBB0_2>: bf 10 00 00 00 00 00 00 r0 = r1 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_X,
        .src_reg = BPF_REG_1,
        .dst_reg = BPF_REG_0,
    },

    /* 95 00 00 00 00 00 00 00 exit */
    {
        .code = BPF_JMP | BPF_EXIT
    },
};

An exercise for those who did not write this themselves: find map_fd.
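To check a hand-written table against the dump, it helps to decode the raw bytes of one instruction into the same fields that struct bpf_insn has. The following is a sketch, not kernel code: struct decoded_insn and decode_insn are hypothetical names, and the byte layout is assumed to be the little-endian one shown in the llvm-objdump output (code, then one byte with dst in the low nibble and src in the high nibble, then a 16-bit offset and a 32-bit immediate).

```c
#include <stdint.h>

/* Fields of one 8-byte BPF instruction (mirrors struct bpf_insn). */
struct decoded_insn {
    uint8_t code;
    uint8_t dst_reg;
    uint8_t src_reg;
    int16_t off;
    int32_t imm;
};

/* Decode one little-endian instruction slot from its raw bytes. */
struct decoded_insn decode_insn(const uint8_t b[8])
{
    struct decoded_insn d;

    d.code = b[0];
    d.dst_reg = b[1] & 0x0f;
    d.src_reg = b[1] >> 4;
    d.off = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
    d.imm = (int32_t)((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                      ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
    return d;
}
```

For example, the bytes 63 0a fc ff 00 00 00 00 decode to code 0x63, dst_reg 10, src_reg 0 and off -4, which matches the second entry of the table.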

There is one more unexplained piece in our program: xdp_attach. Unfortunately, programs of the XDP type cannot be attached using the bpf system call. The people who created BPF and XDP came from the Linux networking community, which means they used the interface for talking to the kernel that is most familiar to them (but not to normal people): netlink sockets, see also RFC3549. The simplest way to implement xdp_attach is to copy the code from libbpf, namely from the file netlink.c, which is what we did, shortening it slightly:

Welcome to the world of netlink sockets

Open a netlink socket of type NETLINK_ROUTE:

int netlink_open(__u32 *nl_pid)
{
    struct sockaddr_nl sa;
    socklen_t addrlen;
    int one = 1, ret;
    int sock;

    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;

    sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (sock < 0)
        err(1, "socket");

    if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one)) < 0)
        warnx("netlink error reporting not supported");

    if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        err(1, "bind");

    addrlen = sizeof(sa);
    if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0)
        err(1, "getsockname");

    *nl_pid = sa.nl_pid;
    return sock;
}

Read from this socket:

static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
{
    bool multipart = true;
    struct nlmsgerr *errm;
    struct nlmsghdr *nh;
    char buf[4096];
    int len, ret;

    while (multipart) {
        multipart = false;
        len = recv(sock, buf, sizeof(buf), 0);
        if (len < 0)
            err(1, "recv");

        if (len == 0)
            break;

        for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
                nh = NLMSG_NEXT(nh, len)) {
            if (nh->nlmsg_pid != nl_pid)
                errx(1, "wrong pid");
            if (nh->nlmsg_seq != seq)
                errx(1, "INVSEQ");
            if (nh->nlmsg_flags & NLM_F_MULTI)
                multipart = true;
            switch (nh->nlmsg_type) {
                case NLMSG_ERROR:
                    errm = (struct nlmsgerr *)NLMSG_DATA(nh);
                    if (!errm->error)
                        continue;
                    ret = errm->error;
                    // libbpf_nla_dump_errormsg(nh); too many code to copy...
                    goto done;
                case NLMSG_DONE:
                    return 0;
                default:
                    break;
            }
        }
    }
    ret = 0;
done:
    return ret;
}

Finally, here is our function that opens a socket and sends it a special message containing the file descriptor:

static int xdp_attach(int ifindex, int prog_fd)
{
    int sock, seq = 0, ret;
    struct nlattr *nla, *nla_xdp;
    struct {
        struct nlmsghdr  nh;
        struct ifinfomsg ifinfo;
        char             attrbuf[64];
    } req;
    __u32 nl_pid = 0;

    sock = netlink_open(&nl_pid);
    if (sock < 0)
        return sock;

    memset(&req, 0, sizeof(req));
    req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req.nh.nlmsg_type = RTM_SETLINK;
    req.nh.nlmsg_pid = 0;
    req.nh.nlmsg_seq = ++seq;
    req.ifinfo.ifi_family = AF_UNSPEC;
    req.ifinfo.ifi_index = ifindex;

    /* started nested attribute for XDP */
    nla = (struct nlattr *)(((char *)&req)
            + NLMSG_ALIGN(req.nh.nlmsg_len));
    nla->nla_type = NLA_F_NESTED | IFLA_XDP;
    nla->nla_len = NLA_HDRLEN;

    /* add XDP fd */
    nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
    nla_xdp->nla_type = IFLA_XDP_FD;
    nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
    memcpy((char *)nla_xdp + NLA_HDRLEN, &prog_fd, sizeof(prog_fd));
    nla->nla_len += nla_xdp->nla_len;

    /* if user passed in any flags, add those too */
    __u32 flags = XDP_FLAGS_SKB_MODE;
    nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
    nla_xdp->nla_type = IFLA_XDP_FLAGS;
    nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
    memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
    nla->nla_len += nla_xdp->nla_len;

    req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);

    if (send(sock, &req, req.nh.nlmsg_len, 0) < 0)
        err(1, "send");
    ret = bpf_netlink_recv(sock, nl_pid, seq);

cleanup:
    close(sock);
    return ret;
}
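The length bookkeeping in xdp_attach is easy to get wrong, so here is the arithmetic on its own. This is a sketch with stated assumptions: the constants are what I take sizeof(struct nlmsghdr), sizeof(struct ifinfomsg) and NLA_HDRLEN to be on Linux, and the names ALIGN4, attr_len and xdp_request_len are hypothetical.

```c
/* NLMSG_ALIGN and NLA_ALIGN both round up to a multiple of 4. */
#define ALIGN4(x) (((x) + 3u) & ~3u)

enum {
    NLMSGHDR_SIZE  = 16, /* assumed sizeof(struct nlmsghdr) */
    IFINFOMSG_SIZE = 16, /* assumed sizeof(struct ifinfomsg) */
    NLATTR_HDRLEN  = 4   /* assumed NLA_HDRLEN */
};

/* Length of one attribute that carries payload_len bytes. */
unsigned attr_len(unsigned payload_len)
{
    return NLATTR_HDRLEN + payload_len;
}

/* Total length of the RTM_SETLINK request built in xdp_attach. */
unsigned xdp_request_len(void)
{
    /* same as NLMSG_LENGTH(sizeof(struct ifinfomsg)) */
    unsigned msg = ALIGN4(NLMSGHDR_SIZE) + IFINFOMSG_SIZE;
    unsigned nested = NLATTR_HDRLEN;  /* header of the nested IFLA_XDP */

    nested += attr_len(4);            /* IFLA_XDP_FD: a 4-byte fd */
    nested += attr_len(4);            /* IFLA_XDP_FLAGS: 4-byte flags */
    return msg + ALIGN4(nested);
}
```

Under these assumptions the nested attribute takes 20 bytes, which fits comfortably into the 64-byte attrbuf, and the whole request is 52 bytes long.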

So, everything is ready for testing:

$ cc nolibbpf.c -o nolibbpf
$ sudo strace -e bpf ./nolibbpf
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, map_name="woo", ...}, 72) = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=15, prog_name="woo", ...}, 72) = 4
+++ exited with 0 +++

Let's see whether our program is attached to lo:

$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    prog/xdp id 160

Let's send pings and look at the map:

$ for s in `seq 234`; do sudo ping -f -c 100 127.0.0.1 >/dev/null 2>&1; done
$ sudo bpftool m dump name woo
key: 00 00 00 00  value: 90 01 00 00 00 00 00 00
key: 01 00 00 00  value: 00 00 00 00 00 00 00 00
key: 02 00 00 00  value: 00 00 00 00 00 00 00 00
key: 03 00 00 00  value: 00 00 00 00 00 00 00 00
key: 04 00 00 00  value: 00 00 00 00 00 00 00 00
key: 05 00 00 00  value: 00 00 00 00 00 00 00 00
key: 06 00 00 00  value: 40 b5 00 00 00 00 00 00
key: 07 00 00 00  value: 00 00 00 00 00 00 00 00
Found 8 elements

Hooray, everything works. Note, by the way, that our map is again displayed as bytes. This is because, unlike libbpf, we did not load type information (BTF). But we will talk about this in more detail next time.
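Until BTF is loaded, the raw bytes have to be read by hand. A small sketch of that conversion, assuming the values are printed least significant byte first as on a little-endian machine (le64_decode is a hypothetical helper name):

```c
#include <stdint.h>

/* Turn 8 bytes, least significant first, into a 64-bit value. */
uint64_t le64_decode(const uint8_t b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

The two non-zero rows of the dump, 90 01 00 ... and 40 b5 00 ..., decode to 400 and 46400, the same pair of counts as in the libbpf run, just on different CPUs.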

Development tools

In this section we will look at the minimal toolkit of a BPF developer.

Generally speaking, you do not need anything special to develop BPF programs: BPF runs on any decent distribution kernel, and programs are built with clang, which can be installed from a package. However, because BPF is under active development, the kernel and the tools change constantly, so if you do not want to write BPF programs using old-fashioned methods from 2019, you will have to build

  • llvm/clang
  • pahole
  • your own kernel
  • bpftool

(For reference, this section and all the examples in the article were run on Debian 10.)

llvm/clang

BPF is friendly with LLVM and, although BPF programs can recently also be compiled with gcc, all current development is done for LLVM. Therefore, first of all, we will build the current version of clang from git:

$ sudo apt install ninja-build
$ git clone --depth 1 https://github.com/llvm/llvm-project.git
$ mkdir -p llvm-project/llvm/build/install
$ cd llvm-project/llvm/build
$ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
                      -DLLVM_ENABLE_PROJECTS="clang" \
                      -DBUILD_SHARED_LIBS=OFF \
                      -DCMAKE_BUILD_TYPE=Release \
                      -DLLVM_BUILD_RUNTIME=OFF
$ time ninja
... a long time later
$

Now we can check that everything was built correctly:

$ ./bin/llc --version
LLVM (http://llvm.org/):
  LLVM version 11.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    bpf    - BPF (host endian)
    bpfeb  - BPF (big endian)
    bpfel  - BPF (little endian)
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64

(I took the clang build instructions from bpf_devel_QA.)

We will not install the freshly built programs; instead we just add them to PATH, for example:

export PATH="`pwd`/bin:$PATH"

(This can be added to .bashrc or to a separate file. Personally, I add things like this to ~/bin/activate-llvm.sh and, when needed, run . activate-llvm.sh.)

Pahole and BTF

The pahole utility is used when building the kernel to create debug information in the BTF format. We will not go into the details of BTF technology in this article, apart from the fact that it is convenient and we want to use it. So if you are going to build your own kernel, build pahole first (without pahole you will not be able to build the kernel with the CONFIG_DEBUG_INFO_BTF option):

$ git clone https://git.kernel.org/pub/scm/devel/pahole/pahole.git
$ cd pahole/
$ sudo apt install cmake
$ mkdir build
$ cd build/
$ cmake -D__LIB=lib ..
$ make
$ sudo make install
$ which pahole
/usr/local/bin/pahole

Kernels for experimenting with BPF

When exploring the capabilities of BPF, I want to build my own kernel. Generally speaking, this is not necessary, since you can compile and load BPF programs on a distribution kernel. However, having your own kernel lets you use the newest BPF features, which will appear in your distribution in months at best or, as with some debugging tools, will not be packaged at all in the foreseeable future. Besides, your own kernel makes experimenting with the code feel important.

To build a kernel you need, first, the kernel itself and, second, a kernel configuration file. To experiment with BPF we can use the usual vanilla kernel or one of the development kernels. Historically, BPF development takes place within the Linux networking community, so all changes sooner or later go through David Miller, the Linux networking maintainer. Depending on their nature (fixes or new features), networking changes fall into one of two trees: net or net-next. Changes for BPF are distributed in the same way between bpf and bpf-next, which are then merged into net and net-next, respectively. For details, see bpf_devel_QA and netdev-FAQ. So choose a kernel based on your taste and the stability requirements of the system you are testing on (the *-next kernels are the most unstable of those listed).

It is beyond the scope of this article to explain how to manage kernel configuration files; it is assumed that you either already know how to do this or are ready to learn it on your own. However, the following instructions should be more or less enough to get you a working BPF-enabled system.

Download one of the kernels above:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
$ cd bpf-next

Build a minimal working kernel configuration:

$ cp /boot/config-`uname -r` .config
$ make localmodconfig

Enable the BPF options in the .config file to your own taste (most likely CONFIG_BPF itself will already be enabled, since systemd uses it). Here is the list of options from the kernel used for this article:

CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_LSM=y
CONFIG_BPF_SYSCALL=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_IPV6_SEG6_BPF=y
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
# CONFIG_BPFILTER is not set
CONFIG_NET_CLS_BPF=y
CONFIG_NET_ACT_BPF=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y
CONFIG_DEBUG_INFO_BTF=y

Then we can easily build and install the modules and the kernel (by the way, you can build the kernel with the freshly built clang by adding CC=clang):

$ make -s -j $(getconf _NPROCESSORS_ONLN)
$ sudo make modules_install
$ sudo make install

and reboot with the new kernel (I use kexec from the kexec-tools package for this):

v=5.8.0-rc6+ # if you are rebuilding the current kernel, you can use v=`uname -r`
sudo kexec -l -t bzImage /boot/vmlinuz-$v --initrd=/boot/initrd.img-$v --reuse-cmdline &&
sudo kexec -e

bpftool

The most frequently used utility in this article will be bpftool, which ships as part of the Linux kernel. It is written and maintained by BPF developers for BPF developers and can be used to manage all kinds of BPF objects: loading programs, creating and modifying maps, exploring the BPF ecosystem, and so on. Documentation in the form of man page sources can be found in the kernel tree or, already compiled, online.

At the time of writing, bpftool comes ready-made only for RHEL, Fedora and Ubuntu (see, for example, this thread, which tells the unfinished story of packaging bpftool in Debian). But if you have already built your kernel, then building bpftool is as easy as pie:

$ cd ${linux}/tools/bpf/bpftool
# ... set up the paths to the latest clang, as described above
$ make -s

Auto-detecting system features:
...                        libbfd: [ on  ]
...        disassembler-four-args: [ on  ]
...                          zlib: [ on  ]
...                        libcap: [ on  ]
...               clang-bpf-co-re: [ on  ]

Auto-detecting system features:
...                        libelf: [ on  ]
...                          zlib: [ on  ]
...                           bpf: [ on  ]

$

(Here ${linux} is your kernel source directory.) After running these commands, bpftool will be built in the directory ${linux}/tools/bpf/bpftool and can be added to the path (first of all for the root user) or simply copied to /usr/local/sbin.

It is best to build bpftool with the latest clang, built as described above, and to check that it was built correctly, for example with the command

$ sudo bpftool feature probe kernel
Scanning system configuration...
bpf() syscall for unprivileged users is enabled
JIT compiler is enabled
JIT compiler hardening is disabled
JIT compiler kallsyms exports are enabled for root
...

which will show which BPF features are enabled in your kernel.

By the way, the previous command can be run as

# bpftool f p k

This is done by analogy with the utilities from the iproute2 package, where we can, for example, say ip a s eth0 instead of ip addr show dev eth0.
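The idea behind such abbreviations is plain prefix matching of what was typed against the full command name. A minimal sketch of that idea follows; it is not the actual bpftool or iproute2 code, and is_cmd_prefix is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the typed word is a non-empty prefix of the full
 * command name, e.g. "f" or "feat" for "feature". */
bool is_cmd_prefix(const char *typed, const char *full)
{
    size_t n = strlen(typed);

    return n > 0 && n <= strlen(full) && strncmp(typed, full, n) == 0;
}
```

With a check like this, f, p and k are accepted for feature, probe and kernel, while a word that is longer than the command or differs in any character is rejected.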

Conclusion

BPF lets you shoe a flea, so to speak: efficiently measure and change the functionality of the kernel on the fly. The system turned out to be very successful, in the best traditions of UNIX: a simple mechanism that lets you (re)program the kernel has allowed a huge number of people and organizations to experiment. And although the experiments, as well as the development of the BPF infrastructure itself, are far from finished, the system already has a stable ABI that lets you build reliable and, most importantly, efficient business logic.

I would like to note that, in my opinion, the technology has become so popular because, on the one hand, you can play with it (the architecture of the machine can be understood more or less in one evening) and, on the other hand, it solves problems that could not be solved (nicely) before it appeared. Together these two components make people experiment and dream, which leads to the emergence of more and more innovative solutions.

This article, although not particularly short, is only an introduction to the world of BPF and does not describe the "advanced" features and important parts of the architecture. The plan going forward is roughly this: the next article will be an overview of BPF program types (the 5.8 kernel supports 30 program types), then we will finally look at how to write real BPF applications, using kernel tracing programs as an example, then it will be time for a more in-depth course on the BPF architecture, followed by examples of BPF networking and security applications.

Previous articles in this series

  1. BPF for the little ones, part zero: classic BPF

Links

  1. BPF and XDP Reference Guide: documentation on BPF from cilium, or more precisely from Daniel Borkmann, one of the creators and maintainers of BPF. This is one of the first serious descriptions, and it differs from the others in that Daniel knows exactly what he is writing about and there are no blunders in it. In particular, this document describes how to work with BPF programs of the XDP and TC types using the well-known ip utility from the iproute2 package.

  2. Documentation/networking/filter.txt: the original file with documentation for classic and then extended BPF. A good read if you want to dig into the assembly language and the technical details of the architecture.

  3. A blog about BPF from facebook. It is updated rarely but aptly, as Alexei Starovoitov (the author of eBPF) and Andrii Nakryiko (the maintainer of libbpf) write there.

  4. Secrets of bpftool. An entertaining twitter thread from Quentin Monnet with examples and secrets of using bpftool.

  5. Dive into BPF: a list of reading material. A giant (and still maintained) list of links to BPF documentation from Quentin Monnet.

source: www.habr.com
