A Brief Introduction to BPF and eBPF

Hello, Habr! We would like to let you know that we are preparing the book "Linux Observability with BPF" for publication.

Since the BPF virtual machine keeps evolving and is actively used in practice, we have translated an article for you that describes its main capabilities and its current state.

In recent years, programming tools and techniques have become increasingly popular for working around the limitations of the Linux kernel when high-performance packet processing is required. One of the most popular techniques of this kind is called kernel bypass. It skips the kernel's networking layer and does all packet processing in user space. Kernel bypass also means controlling the network card from user space. In other words, when working with the network card, we rely on a user-space driver.

By handing full control of the network card to a user-space program, we cut out kernel overhead (context switches, network-layer processing, interrupts, and so on). This matters a great deal at speeds of 10 Gb/s and above. Kernel bypass, combined with other techniques (batch processing) and careful performance tuning (NUMA awareness, CPU isolation, and so on), forms the foundation of high-performance user-space networking. Perhaps the best-known example of this approach to packet processing is Intel's DPDK (Data Plane Development Kit). There are other well-known tools and techniques too, including Cisco's VPP (Vector Packet Processing), Netmap and, of course, Snabb.

Handling network traffic in user space has a number of drawbacks:

  • The operating system kernel is an abstraction layer over hardware resources. User-space programs have to manage their resources directly, so they also have to manage their own hardware. This often means writing your own device drivers.
  • Because we give up kernel space entirely, we also give up all the networking functionality the kernel provides. User-space programs have to reimplement features that the kernel or the operating system may already offer.
  • Programs run in a sandbox. This severely limits how they interact with each other and prevents them from integrating with other parts of the operating system.

In essence, user-space networking gains performance by moving packet processing from the kernel into user space. XDP (eXpress Data Path) does exactly the opposite: it moves networking programs (filters, resolvers, routing, and so on) from user space into kernel space. XDP lets us run a network function as soon as a packet reaches the network interface, before it starts travelling up into the kernel's networking subsystem. As a result, packet processing becomes significantly faster. But how does the kernel let a user run their own programs in kernel space? Before answering that question, let's look at what BPF is.

BPF and eBPF

Despite its somewhat confusing name, BPF (Berkeley Packet Filter) is actually a virtual machine model. This virtual machine was originally designed to handle packet filtering, hence the name.

One of the best-known tools that uses BPF is tcpdump. When capturing packets with tcpdump, the user can specify a packet-filtering expression, and only packets matching that expression are captured. For example, the expression "tcp dst port 80" matches all TCP packets sent to port 80. tcpdump compiles this expression into BPF bytecode, and its -d option prints the compiled program in human-readable form:

$ sudo tcpdump -d "tcp dst port 80"
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 6
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 15
(004) ldh      [56]
(005) jeq      #0x50            jt 14   jf 15
(006) jeq      #0x800           jt 7    jf 15
(007) ldb      [23]
(008) jeq      #0x6             jt 9    jf 15
(009) ldh      [20]
(010) jset     #0x1fff          jt 15   jf 11
(011) ldxb     4*([14]&0xf)
(012) ldh      [x + 16]
(013) jeq      #0x50            jt 14   jf 15
(014) ret      #262144
(015) ret      #0

Here is what the program above actually does:

  • Instruction (000): loads the 16-bit word at offset 12 of the packet into the accumulator. Offset 12 holds the packet's EtherType.
  • Instruction (001): compares the value in the accumulator with 0x86dd, the EtherType value for IPv6. If they are equal, the program counter moves to instruction (002); otherwise it moves to (006).
  • Instruction (006): compares the value with 0x800 (the EtherType value for IPv4). If they are equal, the program jumps to (007); otherwise it jumps to (015).

And so on, until the packet-filter program returns a result, which is usually treated as a boolean. A non-zero return value (instruction (014)) means the packet is accepted. A return value of zero (instruction (015)) means it is rejected. Strictly speaking, the non-zero value is the number of bytes of the packet to capture; 262144 is tcpdump's default snapshot length.
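To make the walk-through concrete, here is a hand translation of the same program into plain C, with one branch per instruction. It is only a sketch for reading along. It assumes the captured data starts with a 14-byte Ethernet header, as in the tcpdump program above. The names `tcp_dst_port_80`, `load_h` and `load_b` are made up for this illustration. As with the in-kernel filter, a load that would run past the end of the packet rejects it.

```c
#include <stddef.h>
#include <stdint.h>

/* ldh: big-endian 16-bit load; fails if the packet is too short. */
static int load_h(const uint8_t *pkt, size_t len, size_t off, uint32_t *a)
{
    if (off + 2 > len)
        return -1;
    *a = ((uint32_t)pkt[off] << 8) | pkt[off + 1];
    return 0;
}

/* ldb: single-byte load; fails if the packet is too short. */
static int load_b(const uint8_t *pkt, size_t len, size_t off, uint32_t *a)
{
    if (off >= len)
        return -1;
    *a = pkt[off];
    return 0;
}

/* Hypothetical hand translation of `tcpdump -d "tcp dst port 80"` for
 * Ethernet frames. Returns 262144 (accept) or 0 (reject). */
uint32_t tcp_dst_port_80(const uint8_t *pkt, size_t len)
{
    uint32_t a, x;

    if (load_h(pkt, len, 12, &a)) return 0;       /* (000) EtherType        */
    if (a == 0x86dd) {                             /* (001) IPv6?            */
        if (load_b(pkt, len, 20, &a)) return 0;    /* (002) next header      */
        if (a != 0x6) return 0;                    /* (003) TCP?             */
        if (load_h(pkt, len, 56, &a)) return 0;    /* (004) TCP dst port     */
        return a == 0x50 ? 262144 : 0;             /* (005) port 80?         */
    }
    if (a != 0x800) return 0;                      /* (006) IPv4?            */
    if (load_b(pkt, len, 23, &a)) return 0;        /* (007) IP protocol      */
    if (a != 0x6) return 0;                        /* (008) TCP?             */
    if (load_h(pkt, len, 20, &a)) return 0;        /* (009) flags + frag off */
    if (a & 0x1fff) return 0;                      /* (010) not 1st fragment */
    if (load_b(pkt, len, 14, &x)) return 0;        /* (011) IHL byte ...     */
    x = 4 * (x & 0xf);                             /*       ... X = 4 * IHL  */
    if (load_h(pkt, len, x + 16, &a)) return 0;    /* (012) TCP dst port     */
    return a == 0x50 ? 262144 : 0;                 /* (013)-(015)            */
}
```

The IPv6 branch reads the next-header byte at a fixed offset and does not walk extension headers. The IPv4 branch has two extra jobs. It must skip non-first fragments, and it must compute where the TCP header starts from the IHL field. That is what the index register X is used for.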

The BPF virtual machine and its bytecode were introduced by Steve McCanne and Van Jacobson in late 1992, when their paper "The BSD Packet Filter: A New Architecture for User-level Packet Capture" was published. The technology was first presented at the Usenix Winter conference in 1993.

Since BPF is a virtual machine, it defines the environment in which programs run. Besides the bytecode, it also defines:
  • a packet-based memory model, in which load instructions implicitly read from the packet being processed;
  • two registers, A (the accumulator) and X (the index register);
  • scratch memory;
  • an implicit program counter.
Interestingly, the BPF bytecode was modeled on the instruction set of the MOS 6502. Steve McCanne recalled this in his keynote at Sharkfest '11. He knew 6502 assembly from programming the Apple II in high school, and that knowledge influenced his design of the BPF bytecode.
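The same model can be shown as a small interpreter. It has an accumulator `A`, an index register `X` and an implicit program counter, and every load reads from the packet. The sketch below is a hypothetical, heavily reduced cBPF interpreter. It handles only the seven opcodes the tcpdump program above uses, with opcode values as printed by `tcpdump -dd`. `struct insn` has the same field layout as `struct sock_filter` in `<linux/filter.h>`. Real cBPF jump offsets are relative to the next instruction, while `tcpdump -d` prints absolute targets, so the jumps below have been converted. Scratch memory and the other instruction classes are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct sock_filter in <linux/filter.h>. */
struct insn {
    uint16_t code; /* opcode */
    uint8_t  jt;   /* if true, skip jt instructions */
    uint8_t  jf;   /* if false, skip jf instructions */
    uint32_t k;    /* operand */
};

/* The only opcodes this toy interpreter understands. */
enum {
    LDH_ABS  = 0x28, /* A = 16-bit big-endian load at offset k */
    LDB_ABS  = 0x30, /* A = byte at offset k                  */
    LDH_IND  = 0x48, /* A = 16-bit load at offset X + k       */
    LDXB_MSH = 0xb1, /* X = 4 * (byte at offset k & 0xf)      */
    JEQ_K    = 0x15, /* skip jt if A == k, else skip jf       */
    JSET_K   = 0x45, /* skip jt if A & k, else skip jf        */
    RET_K    = 0x06, /* return k                              */
};

/* Runs prog over one packet. Unknown opcodes, out-of-bounds loads and
 * running off the end of the program all reject (return 0). */
uint32_t run_filter(const struct insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t A = 0, X = 0; /* accumulator and index register */
    size_t pc = 0;         /* implicit program counter */

    while (pc < n) {
        const struct insn *in = &prog[pc++];
        size_t off;

        switch (in->code) {
        case LDH_ABS:
        case LDH_IND:
            off = (size_t)in->k + (in->code == LDH_IND ? X : 0);
            if (off + 2 > len)
                return 0;
            A = ((uint32_t)pkt[off] << 8) | pkt[off + 1];
            break;
        case LDB_ABS:
            if (in->k >= len)
                return 0;
            A = pkt[in->k];
            break;
        case LDXB_MSH:
            if (in->k >= len)
                return 0;
            X = 4u * (pkt[in->k] & 0xfu);
            break;
        case JEQ_K:
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case JSET_K:
            pc += (A & in->k) ? in->jt : in->jf;
            break;
        case RET_K:
            return in->k;
        default:
            return 0;
        }
    }
    return 0;
}

/* `tcpdump -d "tcp dst port 80"`, jump targets converted to relative offsets. */
const struct insn prog_tcp_dst_port_80[] = {
    { LDH_ABS,  0,  0, 12     }, /* (000) */
    { JEQ_K,    0,  4, 0x86dd }, /* (001) -> 002 / 006 */
    { LDB_ABS,  0,  0, 20     }, /* (002) */
    { JEQ_K,    0, 11, 0x6    }, /* (003) -> 004 / 015 */
    { LDH_ABS,  0,  0, 56     }, /* (004) */
    { JEQ_K,    8,  9, 0x50   }, /* (005) -> 014 / 015 */
    { JEQ_K,    0,  8, 0x800  }, /* (006) -> 007 / 015 */
    { LDB_ABS,  0,  0, 23     }, /* (007) */
    { JEQ_K,    0,  6, 0x6    }, /* (008) -> 009 / 015 */
    { LDH_ABS,  0,  0, 20     }, /* (009) */
    { JSET_K,   4,  0, 0x1fff }, /* (010) -> 015 / 011 */
    { LDXB_MSH, 0,  0, 14     }, /* (011) */
    { LDH_IND,  0,  0, 16     }, /* (012) */
    { JEQ_K,    0,  1, 0x50   }, /* (013) -> 014 / 015 */
    { RET_K,    0,  0, 262144 }, /* (014) accept */
    { RET_K,    0,  0, 0      }, /* (015) reject */
};
```

jt and jf are unsigned, so every jump goes forward and a classic BPF program always terminates. This is one of the properties that made it safe to run inside the kernel.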

BPF support has been present in the Linux kernel since v2.5, added mainly through the efforts of Jay Schulist. The BPF code remained unchanged until 2011, when Eric Dumazet reworked the BPF interpreter to run in JIT mode (source: "A JIT for packet filters"). From then on, the kernel no longer had to interpret BPF bytecode. It could translate BPF programs directly into native code for the target architecture: x86, ARM, MIPS and so on.

Later, in 2014, Alexei Starovoitov proposed a new JIT mechanism for BPF. In fact, this new JIT grew into a new BPF-based architecture called eBPF. I believe the two VMs coexisted for some time, but today packet filtering is implemented on top of eBPF. In much modern documentation, "BPF" now means eBPF, and classic BPF is known as cBPF.

eBPF extends the classic BPF virtual machine in several ways:

  • Built for modern 64-bit architectures. eBPF uses 64-bit registers. It increases the number of available registers from 2 (the accumulator and X) to 10 general-purpose registers, plus a read-only frame pointer. eBPF also provides additional opcodes (BPF_MOV, BPF_JNE, BPF_CALL...).
  • Decoupled from the networking subsystem. Classic BPF was tied to a packet-based data model, and because it was used for packet filtering, its code lived in the networking subsystem. The eBPF virtual machine is no longer tied to that data model and can be used for any purpose. For example, an eBPF program can now be attached to a tracepoint or a kprobe. This opens the door to eBPF-based instrumentation, performance analysis and many other use cases in other kernel subsystems. The eBPF code now lives in its own directory: kernel/bpf.
  • Global data stores called maps. Maps are key-value stores that let user space and kernel space exchange data. eBPF provides several types of maps.
  • Helper functions, for example for rewriting a packet, computing a checksum or cloning a packet. Unlike user-space library code, these functions run inside the kernel. An eBPF program calls them with the BPF_CALL instruction, much as a user-space program makes system calls.
  • Tail calls. An eBPF program is limited to 4096 instructions. The tail-call feature lets an eBPF program transfer control to another eBPF program and so work around this limit. Up to 32 programs can be chained this way.
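The sketch below shows the eBPF instruction format. It assumes the layout of `struct bpf_insn` from `<linux/bpf.h>`, copied here as `struct ebpf_insn` so the example compiles without kernel headers. Each instruction is 8 bytes, and the 4-bit register fields address r0 to r10. The three-instruction program first calls the helper `bpf_ktime_get_ns`, which is helper id 5 in the kernel's `enum bpf_func_id`. It then sets r0 to 0 and exits. The `EBPF_*` macro names are local stand-ins for the kernel's `BPF_*` constants, and the program itself is a made-up example.

```c
#include <stdint.h>

/* Same layout as struct bpf_insn in <linux/bpf.h>. */
struct ebpf_insn {
    uint8_t code;        /* opcode: instruction class | operation | source */
    uint8_t dst_reg : 4; /* destination register, r0..r10 */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset (jumps, memory access) */
    int32_t imm;         /* signed 32-bit immediate */
};

/* Local stand-ins for the kernel's BPF_* opcode constants. */
#define EBPF_JMP   0x05 /* instruction class: jumps, calls, exit */
#define EBPF_ALU64 0x07 /* instruction class: 64-bit arithmetic  */
#define EBPF_K     0x00 /* source operand is imm                 */
#define EBPF_CALL  0x80
#define EBPF_EXIT  0x90
#define EBPF_MOV   0xb0

#define HELPER_KTIME_GET_NS 5 /* BPF_FUNC_ktime_get_ns */

/* Hypothetical program: r0 = bpf_ktime_get_ns(); r0 = 0; return r0 */
static const struct ebpf_insn ktime_prog[] = {
    { .code = EBPF_JMP | EBPF_CALL, .imm = HELPER_KTIME_GET_NS },
    { .code = EBPF_ALU64 | EBPF_MOV | EBPF_K, .dst_reg = 0, .imm = 0 },
    { .code = EBPF_JMP | EBPF_EXIT },
};
```

A helper's return value arrives in r0, which the second instruction overwrites here. The program's own return value is whatever r0 holds at `exit`. Like cBPF's `struct sock_filter`, each instruction is still 8 bytes. The difference is that the fields now name two registers and carry a signed offset.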

eBPF: An Example

The Linux kernel sources contain a number of eBPF examples, which live in samples/bpf/. To build them, simply run:

$ sudo make samples/bpf/

Rather than writing a new eBPF example myself, I'll use one of the samples in samples/bpf/. I'll go through some parts of its code and explain how it works. I chose the tracex4 program.

In general, each example in samples/bpf/ consists of two files. In this case:

  • tracex4_kern.c contains the source code that runs in the kernel as eBPF bytecode.
  • tracex4_user.c contains the user-space program.

We need to compile tracex4_kern.c into eBPF bytecode. At the time of writing, gcc had no eBPF backend; GCC 10 later added one. Fortunately, clang can emit eBPF bytecode, and the Makefile uses clang to compile tracex4_kern.c into an object file.

I mentioned above that maps are one of the most interesting features of eBPF. tracex4_kern defines one map:

struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};

BPF_MAP_TYPE_HASH is one of the many map types that eBPF offers; in this case it is simply a hash table. You may also have noticed the SEC("maps") declaration. SEC is a macro that places the definition that follows it in a new section of the binary file. In fact, the tracex4_kern example defines two more sections:

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&my_map, &ptr); 
    return 0;
}
    
SEC("kretprobe/kmem_cache_alloc_node") 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get the IP address of the caller of kmem_cache_alloc_node()
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
    return 0;
}   

These two functions delete an entry from the map (kprobe/kmem_cache_free) and add a new entry to the map (kretprobe/kmem_cache_alloc_node). All the names written in capital letters are macros defined in bpf_helpers.h.
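You can imitate the effect of the two probes on `my_map` in ordinary user-space C. The sketch below is hypothetical. `toy_map` is a small linear table standing in for `BPF_MAP_TYPE_HASH`; a real hash map hashes its keys and is far larger. `on_alloc` and `on_free` mirror what `bpf_prog2` and `bpf_prog1` do with `bpf_map_update_elem(..., BPF_ANY)` and `bpf_map_delete_elem`. Anything left in the table at a given moment is an object that was allocated and not yet freed.

```c
#include <stddef.h>
#include <stdint.h>

/* Value stored per object, as in tracex4_kern.c. */
struct pair {
    uint64_t val; /* allocation time, ns */
    uint64_t ip;  /* caller's address */
};

#define TOY_MAX_ENTRIES 16 /* hypothetical; my_map allows 1000000 */

/* Hypothetical stand-in for a BPF_MAP_TYPE_HASH map keyed by long. */
struct toy_map {
    size_t count;
    long keys[TOY_MAX_ENTRIES];
    struct pair values[TOY_MAX_ENTRIES];
};

static int toy_find(const struct toy_map *m, long key)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->keys[i] == key)
            return (int)i;
    return -1;
}

const struct pair *toy_lookup(const struct toy_map *m, long key)
{
    int i = toy_find(m, key);
    return i < 0 ? NULL : &m->values[i];
}

/* Like bpf_map_update_elem(..., BPF_ANY): insert, or overwrite if present. */
int toy_update(struct toy_map *m, long key, const struct pair *v)
{
    int i = toy_find(m, key);

    if (i < 0) {
        if (m->count == TOY_MAX_ENTRIES)
            return -1;                      /* map is full */
        i = (int)m->count++;
        m->keys[i] = key;
    }
    m->values[i] = *v;
    return 0;
}

/* Like bpf_map_delete_elem(): fails if the key is absent. */
int toy_delete(struct toy_map *m, long key)
{
    int i = toy_find(m, key);

    if (i < 0)
        return -1;
    m->count--;
    m->keys[i] = m->keys[m->count];         /* move the last entry into the gap */
    m->values[i] = m->values[m->count];
    return 0;
}

/* What bpf_prog2 does on return from kmem_cache_alloc_node(). */
void on_alloc(struct toy_map *m, long ptr, uint64_t now_ns, uint64_t ip)
{
    struct pair v = { .val = now_ns, .ip = ip };
    toy_update(m, ptr, &v);
}

/* What bpf_prog1 does on entry to kmem_cache_free(). */
void on_free(struct toy_map *m, long ptr)
{
    toy_delete(m, ptr);
}
```

The free probe fires for every kmem_cache_free call, including objects that were never recorded. As in the eBPF program, where the result of `bpf_map_delete_elem` is ignored, a failed delete here is simply harmless.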

If I dump the section headers of the object file, I should see that these new sections have been defined:

$ objdump -h tracex4_kern.o

tracex4_kern.o:     file format elf64-little

Sections:
Idx Name                            Size      VMA               LMA               File off  Algn
  0 .text                           00000000  0000000000000000  0000000000000000  00000040  2**2
                                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free          00000048  0000000000000000  0000000000000000  00000040  2**3
                                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node 000000c0  0000000000000000  0000000000000000  00000088  2**3
                                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps                            0000001c  0000000000000000  0000000000000000  00000148  2**2
                                    CONTENTS, ALLOC, LOAD, DATA
  4 license                         00000004  0000000000000000  0000000000000000  00000164  2**0
                                    CONTENTS, ALLOC, LOAD, DATA
  5 version                         00000004  0000000000000000  0000000000000000  00000168  2**2
                                    CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame                       00000050  0000000000000000  0000000000000000  00000170  2**3
                                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
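The SEC macro behind these sections is tiny. The sketch below assumes the definition used in samples/bpf/bpf_helpers.h, `__attribute__((section(NAME), used))`. It places a hypothetical, trimmed-down `toy_map_def` in a `maps` section and a license string in `license`. It then relies on a feature of GNU ld (and lld): for a section whose name is a valid C identifier, the linker provides `__start_<name>` and `__stop_<name>` symbols. That lets an ordinary executable count what was placed in `maps`. `load_bpf_file` works differently: it reads these sections from the ELF object file. The linker symbols are just a convenient way to show the same grouping.

```c
#include <stddef.h>
#include <stdint.h>

/* Same definition as in samples/bpf/bpf_helpers.h. */
#define SEC(NAME) __attribute__((section(NAME), used))

/* Hypothetical, trimmed-down stand-in for struct bpf_map_def. */
struct toy_map_def {
    uint32_t type;
    uint32_t key_size;
    uint32_t value_size;
    uint32_t max_entries;
};

struct toy_map_def SEC("maps") toy_map = {
    .type = 1,                          /* BPF_MAP_TYPE_HASH */
    .key_size = sizeof(long),
    .value_size = 2 * sizeof(uint64_t), /* size of struct pair */
    .max_entries = 1000000,
};

char _license[] SEC("license") = "GPL";

/* Provided by GNU ld / lld because "maps" is a valid C identifier. */
extern struct toy_map_def __start_maps[], __stop_maps[];

/* Number of map definitions that ended up in the "maps" section. */
size_t count_map_defs(void)
{
    return (size_t)(__stop_maps - __start_maps);
}
```

Grouping all map definitions in one section lets a loader find every map without knowing the variable names. The loader creates each map and hands the program its file descriptor.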

There is also tracex4_user.c, the main program. In essence, this program listens for kmem_cache_alloc_node events. When such an event occurs, the corresponding eBPF code runs and stores the allocation timestamp and the caller's IP address for the object in the map. The main program then walks the map in a loop and prints the objects that are still there. Example output:

$ sudo ./tracex4
obj 0xffff8d6430f60a00 is 2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is 6sec old was allocated at ip ffffffff98090e8f
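Each output line combines a map key (the object's address) with the `struct pair` stored for it. The helpers below are a hypothetical reproduction of that line format, not the code from tracex4_user.c. They assume that `now_ns` and `v->val` come from the same monotonic clock that `bpf_ktime_get_ns()` uses on the kernel side. The age is truncated to whole seconds, and `min_age_sec` is an illustrative parameter.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Value stored in my_map (same fields as struct pair in tracex4_kern.c). */
struct pair {
    uint64_t val; /* allocation time, ns */
    uint64_t ip;  /* caller's return address */
};

/* Hypothetical: true if the object has lived at least min_age_sec seconds. */
int is_older_than(const struct pair *v, uint64_t now_ns, uint64_t min_age_sec)
{
    return now_ns - v->val >= min_age_sec * 1000000000ull;
}

/* Hypothetical: formats one entry like the tracex4 output lines.
 * Returns the snprintf() result (length of the full line). */
int format_old_object(char *buf, size_t size, uint64_t obj,
                      const struct pair *v, uint64_t now_ns)
{
    unsigned long long age_sec = (now_ns - v->val) / 1000000000ull;

    return snprintf(buf, size,
                    "obj 0x%llx is %llusec old was allocated at ip %llx",
                    (unsigned long long)obj, age_sec,
                    (unsigned long long)v->ip);
}
```

An object that keeps appearing with a growing age was allocated but never freed during the observation window. That pattern is the kind of leak this tool helps spot.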

How are the user-space program and the eBPF program connected? On startup, tracex4_user.c loads the object file tracex4_kern.o with the load_bpf_file function.

int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;

    snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);

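    /* Older kernels charge eBPF maps and programs against RLIMIT_MEMLOCK,
     * so lift the limit before loading anything. */
    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
        return 1;
    }

    /* Load tracex4_kern.o: create its maps, load its programs and attach
     * them to the probes named by their sections. On failure, print the
     * verifier log. */
    if (load_bpf_file(filename)) {
        printf("%s", bpf_log_buf);
        return 1;
    }

    /* tracex4_kern.c defines a single map, so its descriptor is map_fd[0].
     * Print the objects that are still allocated, once per second. */
    for (i = 0; ; i++) {
        print_old_objects(map_fd[0]);
        sleep(1);
    }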

    return 0;
}

When load_bpf_file runs, the probes defined in the eBPF file are added to /sys/kernel/debug/tracing/kprobe_events. From then on, we are listening for these events, and our program can act whenever they occur:

$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
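The two entries above follow the kprobe_events syntax: `p:EVENT SYMBOL` for a probe on function entry and `r:EVENT SYMBOL` for a return probe. When no group is given, the kernel files the event under the default `kprobes` group, which is why the entries appear as `kprobes/...`. The helper below is a hypothetical sketch of how a loader could derive such a line from an ELF section name. It is not the code of `load_bpf_file`, which may format or register the probes differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps an ELF section name such as
 * "kprobe/kmem_cache_free" to a kprobe_events line. Returns the length
 * reported by snprintf(), or -1 if the section is not a (ret)probe. */
int kprobe_event_line(const char *section, char *buf, size_t size)
{
    static const char kprobe[] = "kprobe/";
    static const char kretprobe[] = "kretprobe/";
    const char *func;
    char type;

    if (strncmp(section, kprobe, sizeof kprobe - 1) == 0) {
        type = 'p';                          /* probe on function entry */
        func = section + sizeof kprobe - 1;
    } else if (strncmp(section, kretprobe, sizeof kretprobe - 1) == 0) {
        type = 'r';                          /* probe on function return */
        func = section + sizeof kretprobe - 1;
    } else {
        return -1;
    }
    if (*func == '\0')
        return -1;

    /* "<p|r>:<event name> <kernel symbol>": the event is named after the
     * function; with no group given, the kernel uses "kprobes". */
    return snprintf(buf, size, "%c:%s %s", type, func, func);
}
```

Creating the event is only half the job. Attaching the loaded eBPF program to that event is a separate step performed by the loader.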

All the other programs in samples/bpf/ are structured the same way. They always consist of two files:

  • XXX_kern.c: the eBPF program.
  • XXX_user.c: the main (user-space) program.

The eBPF program defines maps and functions, each bound to a section. When the kernel emits an event of a given type (for example, a tracepoint), the bound functions are executed. The maps provide communication between the kernel program and the user-space program.

Conclusion

This article has covered BPF and eBPF in general terms. There is a lot of information and many resources on eBPF today, so here are a few more for further study.

I recommend reading:

Source: www.habr.com
