Linux has a large number of tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.
A few years ago another tool appeared: eBPF.
There are already many utilities that use eBPF, and in this article we will look at how to write your own profiling utility based on the BCC library.
Ceph Is Slow
A new host was added to the Ceph cluster. After migrating part of the data to it, we noticed that it processed write requests much more slowly than the other servers.
Unlike the other platforms, this host used bcache and the new Linux 4.15 kernel. This was the first time a host with this configuration had been used here, and at that point it was clear that the root of the problem could, in theory, be anything.
Investigating the Host
Let's start by looking at what happens inside the ceph-osd process. For this we will use
The picture tells us that the fdatasync() function spent a lot of time submitting requests to generic_make_request(). This means the cause of the problem most likely lies somewhere outside the osd daemon itself: either in the kernel or in the disks. The iostat output showed high request latency on the bcache disks.
While examining the host, we found that the systemd-udevd daemon was consuming a lot of CPU time, about 20% on several cores. This is odd behavior, so we need to find out why. Since systemd-udevd works with uevents, we decided to watch them with udevadm monitor. It turned out that a large number of change events were being generated for every block device in the system. This is quite unusual, so we will have to find out what generates all these events.
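To get a feel for how many change events each block device receives, the output of udevadm monitor can be tallied. The sketch below is only an illustration: it assumes the usual `SOURCE[timestamp] action devpath (subsystem)` line layout, and both the sample lines and the function name `count_block_changes` are hypothetical, not taken from the affected host.

```python
import re
from collections import Counter

# Assumed layout of a `udevadm monitor` line:
#   KERNEL[12345.678901] change   /devices/.../block/sda (block)
LINE_RE = re.compile(r"^(KERNEL|UDEV)\s*\[[\d.]+\]\s+(\w+)\s+(\S+)\s+\((\w+)\)$")

def count_block_changes(lines):
    """Count kernel-side 'change' uevents per block device path."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the banner and blank lines
        source, action, devpath, subsystem = m.groups()
        if source == "KERNEL" and action == "change" and subsystem == "block":
            counts[devpath] += 1
    return counts

# Hypothetical sample input, for illustration only.
sample = [
    "monitor will print the received events for:",
    "KERNEL[100.000001] change   /devices/virtual/block/bcache0 (block)",
    "UDEV  [100.000200] change   /devices/virtual/block/bcache0 (block)",
    "KERNEL[100.500000] change   /devices/virtual/block/bcache0 (block)",
    "KERNEL[100.600000] add      /devices/virtual/net/veth0 (net)",
]
print(count_block_changes(sample))
```

In practice the lines would come from a pipe, for example by reading sys.stdin while udevadm monitor is running.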
Using the BCC Toolkit
As we have already found out, the kernel (and the ceph daemon inside the system call) spends a lot of time in generic_make_request(). Let's try to measure how fast this function runs.
This function usually runs quickly. All it does is pass the request to the device driver queue.
Bcache is a composite device that actually consists of three disks:
- the backing device (cached disk), in this case a slow HDD;
- the caching device (caching disk), here one partition of an NVMe device;
- the bcache virtual device that the application works with.
We know that request submission is slow, but for which of these devices? We will deal with this a little later.
We now know that the uevents are likely to cause problems. Finding out what exactly generates them is not so easy. Let's assume it is some piece of software that is launched periodically. To see what software runs on the system, we will use the execsnoop script from the same BCC toolkit.
For example, like this:
/usr/share/bcc/tools/execsnoop | tee ./execdump
We will not show the full execsnoop output here, but one line of interest looked like this:
sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ ([0-9]*) C.*/1/'
The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. When we checked the monitoring configuration, we found erroneous parameters: the temperature of the HBA adapter was being read every 30 seconds, which is far more often than necessary. After changing the check interval to a longer one, we found that request processing latency on this host no longer stood out compared with the other hosts.
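Finding the parent that spawns the most commands can be automated once the execsnoop output has been saved to a file. The following is a minimal sketch that assumes the column order PCOMM, PID, PPID, RET, ARGS; the helper names and all sample lines except the arcconf one are hypothetical.

```python
from collections import Counter

def parse_execsnoop_line(line):
    """Split one execsnoop line into (pcomm, pid, ppid, ret, args)."""
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None
    try:
        pid, ppid, ret = int(parts[1]), int(parts[2]), int(parts[3])
    except ValueError:
        return None  # header line or malformed input
    args = parts[4] if len(parts) > 4 else ""
    return parts[0], pid, ppid, ret, args

def execs_per_parent(lines):
    """Count how many commands each parent PID launched."""
    counts = Counter()
    for line in lines:
        parsed = parse_execsnoop_line(line)
        if parsed is not None:
            counts[parsed[2]] += 1
    return counts

# Only the first data line mirrors the article; the rest is made up.
sample = [
    "PCOMM            PID     PPID   RET ARGS",
    "sh               1764905 5802     0 sudo arcconf getconfig 1 AD",
    "arcconf          1764906 5802     0 arcconf getconfig 1 AD",
    "ls               1764910 4242     0 ls /tmp",
]
print(execs_per_parent(sample).most_common(1))
```

The PID reported at the top of the list is the first candidate to look up with ps.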
But it was still unclear why the bcache device was so slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on bcache while periodically running udevadm trigger to generate uevents.
Writing BCC-Based Tools
Let's try to write a simple utility that traces and displays the slowest calls to generic_make_request(). We are also interested in the name of the disk for which the function was called.
The plan is simple:
- Register a kprobe on generic_make_request():
  - save the disk name, which is available through the function argument, in memory;
  - save a timestamp.
- Register a kretprobe on the return from generic_make_request():
  - get the current timestamp;
  - look up the saved timestamp and compare it with the current one;
  - if the difference is greater than the specified threshold, find the saved disk name and print it to the terminal.
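Before moving to eBPF, the plan above can be modelled in plain Python. This is only a sketch of the bookkeeping, not the probe code itself: the class name `SlowCallTracker` and its methods are hypothetical, and timestamps are passed in by hand instead of being read from the kernel.

```python
class SlowCallTracker:
    """Pure-Python model of the entry/return probe pair."""

    def __init__(self, min_us):
        self.min_us = min_us
        self.inflight = {}   # thread id -> (disk name, entry timestamp in ns)
        self.reported = []   # (thread id, disk name, latency in us)

    def on_entry(self, tid, disk, now_ns):
        # Entry probe: remember the disk name and the timestamp.
        self.inflight[tid] = (disk, now_ns)

    def on_return(self, tid, now_ns):
        # Return probe: compare the saved timestamp with the current one.
        entry = self.inflight.pop(tid, None)
        if entry is None:
            return  # return seen without a recorded entry; ignore it
        disk, start_ns = entry
        lat_us = (now_ns - start_ns) // 1000
        if lat_us > self.min_us:
            self.reported.append((tid, disk, lat_us))

tracker = SlowCallTracker(min_us=1000)
tracker.on_entry(1, "sdb", 0)
tracker.on_entry(2, "nvme0n1", 100)
tracker.on_return(2, 50100)      # 50 us, below the threshold
tracker.on_return(1, 5000000)    # 5000 us, reported
print(tracker.reported)
```

Keying the dictionary by thread id keeps concurrent calls from different threads apart, which is the same reason the eBPF version needs a per-thread key.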
Kprobes and kretprobes use a breakpoint mechanism to modify function code on the fly. You can read
The eBPF text inside the python script looks like this:
bpf_text = """
// the BPF program code goes here
"""
To exchange data between functions, eBPF programs use hash tables. Ours is declared as follows, together with the structure it stores:
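#include <uapi/linux/ptrace.h>
#include <linux/sched.h>    // TASK_COMM_LEN
#include <linux/blkdev.h>   // struct bio, DISK_NAME_LEN

struct data_t {
    u64 pid;                    // value of bpf_get_current_pid_tgid()
    u64 ts;                     // entry timestamp, ns
    char comm[TASK_COMM_LEN];   // process name
    u64 lat;                    // measured latency
    char disk[DISK_NAME_LEN];   // disk name
};

BPF_HASH(p, u64, struct data_t);
BPF_PERF_OUTPUT(events);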
Here we register a hash table called p, with key type u64 and value type struct data_t. The table will be available in the context of our BPF program. The BPF_PERF_OUTPUT macro registers another table called events, which is used for passing data to user space.
When measuring the delay between calling a function and returning from it, or between calls to different functions, you need to make sure that the data you receive belongs to the same context. In other words, you need to keep in mind that functions may run in parallel. We are able to measure the latency between calling a function in the context of one process and returning from that function in the context of another process, but this is most likely useless. A good example here would be
Next, we need to write the code that will run when the function under study is called:
void start(struct pt_regs *ctx, struct bio *bio) {
    // identifier of the current thread, used as the hash key
    u64 pid = bpf_get_current_pid_tgid();
    struct data_t data = {};
    u64 ts = bpf_ktime_get_ns();
    data.pid = pid;
    data.ts = ts;
    // copy the disk name out of kernel memory
    bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
    p.update(&pid, &data);
}
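Here the first argument of the traced function, generic_make_request(), is passed in as the second argument of our handler. We then record the identifier of the current thread and the current time in nanoseconds, read the disk name from the bio structure, and store everything in the table p.
The next function will be called on return from generic_make_request():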
void stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    // find the entry saved by start() for this thread
    struct data_t* data = p.lookup(&pid);
    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        // elapsed time in microseconds
        data->lat = (ts - data->ts)/1000;
        // report only calls slower than the threshold
        if (data->lat > MIN_US) {
            // the placeholder below is substituted from Python
            FACTOR
            data->pid >>= 32;
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
}
This function is similar to the previous one: we get the PID of the process and the timestamp, but we do not allocate memory for a new data structure. Instead, we search the hash table for an existing structure using key == current PID. If the structure is found, we get the name of the running process and add it to the structure.
The binary shift we use here is needed to get the thread GID, i.e. the PID of the main process that started the thread in whose context we are working. The function we call, bpf_get_current_pid_tgid(), returns both the thread GID and the thread's own ID in a single 64-bit value: the thread GID in the upper 32 bits and the thread ID in the lower 32 bits.
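The effect of the shift is easy to check outside the kernel. The snippet below is a small illustration in Python; the function name `split_pid_tgid` and the sample thread ID are hypothetical.

```python
def split_pid_tgid(pid_tgid):
    """Split a 64-bit pid_tgid value into (thread group ID, thread ID)."""
    tgid = pid_tgid >> 32           # upper 32 bits: ID of the main process
    tid = pid_tgid & 0xFFFFFFFF     # lower 32 bits: ID of the thread itself
    return tgid, tid

# Hypothetical value: process 5802 with a worker thread 5810.
value = (5802 << 32) | 5810
print(split_pid_tgid(value))
```

Using the full 64-bit value as the hash key keeps threads apart, while the shifted value is what a user would recognise as the process ID.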
When printing to the terminal, we are currently interested not in the thread but in the main process. After comparing the measured latency with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.
In the python script that will load this code, we need to replace MIN_US and FACTOR with the latency threshold and the time units, which we will pass in through arguments:
bpf_text = bpf_text.replace('MIN_US', str(min_usec))
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'data->lat /= 1000;')
    label = "msec"
else:
    bpf_text = bpf_text.replace('FACTOR', '')
    label = "usec"
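The snippet above relies on args and min_usec being defined earlier in the script. A minimal sketch of how they might be produced with argparse follows; the option names, the default threshold and the one-line TEMPLATE stand-in are assumptions made for illustration, not the original script's interface.

```python
import argparse

# Stand-in for the real BPF program text (hypothetical).
TEMPLATE = "if (data->lat > MIN_US) { FACTOR }"

def build_bpf_text(template, argv=None):
    """Parse arguments and substitute the placeholders in the BPF text."""
    parser = argparse.ArgumentParser(
        description="Trace slow generic_make_request() calls")
    parser.add_argument("min_usec", nargs="?", type=int, default=1000,
                        help="report calls slower than this many microseconds")
    parser.add_argument("-m", "--milliseconds", action="store_true",
                        help="print latency in milliseconds")
    args = parser.parse_args(argv)

    text = template.replace("MIN_US", str(args.min_usec))
    if args.milliseconds:
        text = text.replace("FACTOR", "data->lat /= 1000;")
        label = "msec"
    else:
        text = text.replace("FACTOR", "")
        label = "usec"
    return text, label

print(build_bpf_text(TEMPLATE, ["500", "-m"]))
```

Because the substitution is plain text replacement, the placeholder names must not appear anywhere else in the BPF source.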
Now we need to prepare the BPF program with the BPF class and register the probes:
from bcc import BPF

# compile the program and attach the entry and return probes
b = BPF(text=bpf_text)
b.attach_kprobe(event="generic_make_request", fn_name="start")
b.attach_kretprobe(event="generic_make_request", fn_name="stop")
We will also need to define struct data_t in our script, otherwise we will not be able to read anything:
import ctypes as ct

TASK_COMM_LEN = 16    # linux/sched.h
DISK_NAME_LEN = 32    # linux/genhd.h

# must match the field order and sizes of struct data_t in the BPF code
class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
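Since the Python class has to mirror the C structure byte for byte, it can be worth checking its layout without loading any BPF code. The sketch below packs a hypothetical event by hand with the struct module and reads it back through the same ctypes definition; the field values are made up.

```python
import ctypes as ct
import struct

TASK_COMM_LEN = 16
DISK_NAME_LEN = 32

class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]

# Hand-built event in native byte order, no padding: 8 + 8 + 16 + 8 + 32 bytes.
raw = struct.pack("=QQ16sQ32s", 5802, 123456789, b"ceph-osd", 42, b"sdb")
event = Data.from_buffer_copy(raw)

print(ct.sizeof(Data), event.pid, event.comm, event.lat, event.disk)
```

If a field were added to one side only, the size check would fail immediately instead of producing silently shifted values.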
The last step is to output the data to the terminal:
def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    time_s = float(event.ts - start) / 1000000000
    # comm and disk arrive as bytes; decode them for printing
    print("%-18.9f %-16s %-6d %-1s %s %s" % (
        time_s, event.comm.decode("utf-8", "replace"), event.pid,
        event.lat, label, event.disk.decode("utf-8", "replace")))

# timestamp of the first event; later events are printed relative to it
start = 0
b["events"].open_perf_buffer(print_event)
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
The script itself is available at
Finally! Now we can see that what looked like a stalling bcache device is actually a stalling call to generic_make_request() for the cached disk.
Digging into the Kernel
What exactly slows down during request submission? We see that the delay occurs even before request accounting starts, i.e. the accounting of a specific request for later output of statistics on it (/proc/diskstats or iostat) has not yet begun. This is easy to verify by running iostat while reproducing the problem, or
If we look at the generic_make_request() function, we will see that two more functions are called before request accounting begins. The first, generic_make_request_checks(), checks the legitimacy of the request against the disk settings. The second is blk_queue_enter(), which contains the following wait:
ret = wait_event_interruptible(q->mq_freeze_wq,
        (atomic_read(&q->mq_freeze_depth) == 0 &&
         (preempt || !blk_queue_preempt_only(q))) ||
        blk_queue_dying(q));
Here the kernel waits for the queue to be unfrozen. Let's measure the latency of blk_queue_enter():
~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.

     msecs               : count     distribution
         0 -> 1          : 341      |****************************************|

     msecs               : count     distribution
         0 -> 1          : 316      |****************************************|

     msecs               : count     distribution
         0 -> 1          : 255      |****************************************|
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |                                        |
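The rows in this output are power-of-two buckets, which is why a single outlier of roughly 10 ms shows up in the 8 -> 15 row. The following sketch reproduces the same bucket boundaries in Python for illustration; it is not BCC's own implementation, and the function names and sample latencies are hypothetical.

```python
from collections import Counter

def log2_bucket(value):
    """Return (low, high) of the power-of-two bucket that contains value."""
    if value < 2:
        return (0, 1)
    bits = value.bit_length()            # e.g. 9 -> 4 bits
    return (1 << (bits - 1), (1 << bits) - 1)

def histogram(latencies_ms):
    """Count non-negative integer latencies per power-of-two bucket."""
    return Counter(log2_bucket(int(v)) for v in latencies_ms)

# Hypothetical latencies in milliseconds: three fast calls and one slow one.
sample = [0, 1, 0, 9]
for (low, high), count in sorted(histogram(sample).items()):
    print("%6d -> %-6d : %d" % (low, high, count))
```

Coarse buckets like these are enough to separate the normal sub-millisecond calls from the rare stalls we are hunting.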
It looks like we are close to a solution. The functions used to freeze/unfreeze the queue
The time it takes to drain the queue is comparable to the disk latency, because the kernel waits for all queued operations to complete. Once the queue is empty, the settings changes are applied. After that, the queue is unfrozen.
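A toy model can make the cost of a freeze concrete. The sketch below is a deliberate simplification under stated assumptions: a request that arrives while the queue is frozen waits until every request already in flight has completed and the settings have been applied. The function name, its parameters and the sample numbers are all hypothetical.

```python
def enter_delay_ms(arrival_ms, freeze_start_ms, inflight_completion_ms, apply_ms=0):
    """Toy model: how long a new request waits to enter a queue being frozen."""
    # The freeze ends once everything already in the queue has completed
    # and the settings change has been applied.
    drained_ms = max([freeze_start_ms] + list(inflight_completion_ms))
    unfreeze_ms = drained_ms + apply_ms
    if arrival_ms < freeze_start_ms or arrival_ms >= unfreeze_ms:
        return 0  # the queue is not frozen when the request arrives
    return unfreeze_ms - arrival_ms

# Freeze starts at t=100 ms with two HDD requests finishing at 104 and 112 ms.
print(enter_delay_ms(arrival_ms=101, freeze_start_ms=100,
                     inflight_completion_ms=[104, 112]))
```

With a slow backing HDD, a delay of this size lands in the same range as the outlier in the funclatency output.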
Now we know enough to fix the situation. The udevadm trigger command causes the settings for the block device to be applied. These settings are described in udev rules. We can find out which settings freeze the queue by trying to change them through sysfs or by looking at the kernel source code. We can also try the BCC trace utility, which with the -K and -U flags prints the kernel and user-space stack for every call to blk_freeze_queue:
~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID     TID     COMM            FUNC
3809642 3809642 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        elevator_switch+0x29 [kernel]
        elv_iosched_store+0x197 [kernel]
        queue_attr_store+0x5c [kernel]
        sysfs_kf_write+0x3c [kernel]
        kernfs_fop_write+0x125 [kernel]
        __vfs_write+0x1b [kernel]
        vfs_write+0xb8 [kernel]
        sys_write+0x55 [kernel]
        do_syscall_64+0x73 [kernel]
        entry_SYSCALL_64_after_hwframe+0x3d [kernel]
        __write_nocancel+0x7 [libc-2.23.so]
        [unknown]

3809631 3809631 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        queue_requests_store+0xb6 [kernel]
        queue_attr_store+0x5c [kernel]
        sysfs_kf_write+0x3c [kernel]
        kernfs_fop_write+0x125 [kernel]
        __vfs_write+0x1b [kernel]
        vfs_write+0xb8 [kernel]
        sys_write+0x55 [kernel]
        do_syscall_64+0x73 [kernel]
        entry_SYSCALL_64_after_hwframe+0x3d [kernel]
        __write_nocancel+0x7 [libc-2.23.so]
        [unknown]
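With many stacks in the output, it helps to pull out just the sysfs store handler that led to each freeze. The following sketch does that for text in the format shown above; the regular expression and the function name `store_handlers` are assumptions made for illustration, and the sample is an abbreviated copy of the two stacks.

```python
import re

# Matches kernel frames such as "elv_iosched_store+0x197 [kernel]".
FRAME_RE = re.compile(r"^\s*(\w+)\+0x[0-9a-f]+\s+\[kernel\]$")

def store_handlers(trace_lines):
    """Return the first *_store frame below each blk_freeze_queue frame."""
    handlers = []
    waiting = False  # True inside a stack whose store handler is not yet seen
    for line in trace_lines:
        m = FRAME_RE.match(line)
        if not m:
            continue
        func = m.group(1)
        if func == "blk_freeze_queue":
            waiting = True
        elif waiting and func.endswith("_store"):
            handlers.append(func)
            waiting = False
    return handlers

sample = """\
3809642 3809642 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        elevator_switch+0x29 [kernel]
        elv_iosched_store+0x197 [kernel]
        queue_attr_store+0x5c [kernel]
3809631 3809631 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        queue_requests_store+0xb6 [kernel]
        queue_attr_store+0x5c [kernel]
""".splitlines()

print(store_handlers(sample))
```

In the kernel source, elv_iosched_store and queue_requests_store are the write handlers for the queue/scheduler and queue/nr_requests sysfs attributes, which points at the udev rules that set the I/O scheduler and the queue depth.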
Udev rules change rarely, and usually this happens in a controlled manner. So we see that even applying values that are already set causes a spike in the delay when passing a request from the application to the disk. Of course, generating udev events when there are no changes in the disk configuration (for example, no device has been attached or detached) is not good practice. However, we can help the kernel avoid unnecessary work and not freeze the request queue when it is not needed.
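The same idea, skipping the change when nothing would actually change, can be illustrated in user space. The sketch below is not the kernel-side fix discussed here; it only shows the comparison a configuration tool could make before writing to sysfs. It assumes the usual format of the queue/scheduler file, where the active scheduler is shown in square brackets, and the helper names are hypothetical.

```python
import re

def active_scheduler(sysfs_text):
    """Extract the active scheduler from text like 'noop deadline [cfq]'."""
    m = re.search(r"\[([^\]]+)\]", sysfs_text)
    return m.group(1) if m else sysfs_text.strip()

def needs_write(sysfs_text, wanted):
    """Return True only if writing `wanted` would change the scheduler."""
    return active_scheduler(sysfs_text) != wanted

print(needs_write("noop deadline [cfq]\n", "cfq"))       # nothing to do
print(needs_write("noop deadline [cfq]\n", "deadline"))  # a real change
```

A tool that checks before writing avoids triggering the store handler, and with it the queue freeze, for values that are already in place.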
Conclusion
eBPF is a very flexible and powerful tool. In this article we looked at one practical case and showed a small part of what can be done. If you are interested in developing BCC utilities, it is worth taking a look at
There are other interesting debugging and profiling tools based on eBPF. One of them is
Source: www.habr.com