Linux has a large number of tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.
A few years ago, another tool appeared: eBPF.
Many eBPF-based utilities for tracing applications already exist. In this article we will look at how to write your own utility based on the BCC library.
Ceph is slow
A new host was added to the Ceph cluster. After migrating some of the data to it, we noticed that write requests were processed much more slowly on it than on the other servers.
Unlike the other machines, this host used bcache and the new Linux 4.15 kernel. It was the first time a host with this configuration had been used here, so at that point it was clear that, in theory, the root of the problem could be anything.
Investigating the host
Let's start by looking at what is happening inside the ceph-osd process. For this we will use…
The resulting picture shows that fdatasync() spent a lot of time submitting requests via generic_make_request(). This suggests that the cause of the problem most likely lies outside the OSD daemon itself: either in the kernel or in the disks. The iostat output showed high request-processing latency on the bcache disks.
While examining the host, we found that the systemd-udevd daemon was consuming a lot of CPU time: about 20% on several cores. This is strange behavior, so we needed to find out why. Since systemd-udevd works with uevents, we decided to look at them with udevadm monitor. It turned out that a large number of change events were being generated for every block device in the system. This is quite unusual, so we had to find out what was generating all these events.
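A rough way to quantify such a flood of events is to save the monitor output and count change events per device. The sketch below assumes the usual udevadm monitor line shape (KERNEL[timestamp] action devpath (subsystem)); the sample lines are invented for illustration, and only KERNEL lines are counted by default so that each uevent is not counted a second time when udev reports it.

```python
import re
from collections import Counter

# Assumed udevadm monitor line shape:
#   KERNEL[1234.567890] change   /devices/virtual/block/bcache0 (block)
EVENT_RE = re.compile(
    r"^(?P<source>KERNEL|UDEV)\s*\[(?P<ts>[\d.]+)\]\s+"
    r"(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>[^)]+)\)"
)


def count_events(lines, action="change", subsystem="block", source="KERNEL"):
    """Count events with the given source/action/subsystem per device path."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if (m and m.group("source") == source
                and m.group("action") == action
                and m.group("subsystem") == subsystem):
            counts[m.group("devpath")] += 1
    return counts


# Hypothetical sample, not real output from the affected host.
SAMPLE = [
    "KERNEL[100.000001] change   /devices/virtual/block/bcache0 (block)",
    "UDEV  [100.000900] change   /devices/virtual/block/bcache0 (block)",
    "KERNEL[100.000002] change   /devices/virtual/block/dm-0 (block)",
    "KERNEL[130.000001] change   /devices/virtual/block/bcache0 (block)",
]

for devpath, n in count_events(SAMPLE).most_common():
    print(n, devpath)
```

On a real host the list would be replaced by the lines of a saved monitor log.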
Using BCC tools
As we have already found, the kernel (and the Ceph daemon, inside its system call) spends a lot of time in generic_make_request(). Let's try to measure the latency of this function.
This function usually runs very fast: all it does is pass the request on to the device driver's queue.
Bcache is a complex device that actually consists of three disks:
- the backing device (cached disk), in this case a slow HDD;
- the caching device (caching disk), here a partition of an NVMe device;
- the bcache virtual device that the application works with.
We know that request submission is slow, but for which of these devices? We will come back to that a bit later.
We now know that the uevents are the likely cause of the problem. Finding out what exactly generates them is not so easy. Let's assume it is some software that is launched periodically. Let's see what gets executed on the system using the execsnoop script from the BCC toolkit, for example like this:
/usr/share/bcc/tools/execsnoop | tee ./execdump
I won't show the full execsnoop output here, but one line of interest looked like this:
sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ \([0-9]*\) C.*/\1/'
The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. Checking the monitoring configuration revealed incorrect parameters: the HBA adapter temperature was polled every 30 seconds, much more often than necessary. After we increased the polling interval, the request-processing latency on this host no longer stood out compared to the other hosts.
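To spot a process that keeps spawning children, it can help to group the execsnoop dump by its PPID column. A minimal sketch, assuming the column layout of the line above (PCOMM PID PPID RET ARGS); apart from that one line, the sample rows are made up.

```python
from collections import Counter


def exec_counts_by_ppid(lines):
    """Count exec() events per parent PID in execsnoop-style output.

    Assumes the column order PCOMM PID PPID RET ARGS...
    """
    counts = Counter()
    for line in lines:
        fields = line.split(None, 4)
        if len(fields) < 4 or not fields[2].isdigit():
            continue  # header row or malformed line
        counts[int(fields[2])] += 1
    return counts


# Hypothetical excerpt of ./execdump; only the first data row is from the article.
EXECDUMP = [
    "PCOMM            PID    PPID   RET ARGS",
    "sh               1764905 5802     0 sudo arcconf getconfig 1 AD",
    "sh               1765311 5802     0 sudo arcconf getconfig 1 AD",
    "date             1765400 2210     0 /bin/date",
]

for ppid, n in exec_counts_by_ppid(EXECDUMP).most_common():
    print(ppid, n)
```

In practice the list would come from reading ./execdump line by line.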
But it was still unclear why the bcache device had been so slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on bcache while periodically running udevadm trigger to generate uevents.
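The article does not show how udevadm trigger was run during the test. A minimal sketch of such a loop, assuming that re-sending change events for the block subsystem (--action=change --subsystem-match=block) is enough to reproduce the load; the helper name and parameters are hypothetical, and run/sleep are injectable so the loop can be exercised without root.

```python
import subprocess
import time

# Assumption: not the authors' exact command line.
TRIGGER_CMD = ["udevadm", "trigger", "--action=change", "--subsystem-match=block"]


def trigger_periodically(count, interval_s, run=subprocess.run, sleep=time.sleep):
    """Run TRIGGER_CMD `count` times, sleeping `interval_s` seconds between runs."""
    for i in range(count):
        run(TRIGGER_CMD, check=True)
        if i + 1 < count:
            sleep(interval_s)


# Real use (as root, while fio writes to the bcache device in another shell):
#   trigger_periodically(count=600, interval_s=1.0)
```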
Writing a BCC-based tool
Let's try to write a simple utility that traces and prints the slowest generic_make_request() calls. We are also interested in the name of the drive for which the function was called.
The plan is simple:
- Register a kprobe on generic_make_request():
  - save the disk name, available through the function's argument, in memory;
  - save the timestamp.
- Register a kretprobe on return from generic_make_request():
  - get the current timestamp;
  - look up the saved timestamp and compare it with the current one;
  - if the result exceeds the specified threshold, look up the saved disk name and print it to the terminal.
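Before writing the BPF code, the plan can be checked in plain Python. The sketch below models the two probes with a dictionary keyed by the thread's pid_tgid; the class and method names are invented for illustration, and timestamps are passed in explicitly instead of coming from bpf_ktime_get_ns().

```python
from dataclasses import dataclass


@dataclass
class Entry:
    ts_ns: int   # timestamp saved by the entry probe
    disk: str    # disk name read from the function argument


class SlowCallModel:
    """User-space model of the kprobe/kretprobe plan; not BPF code."""

    def __init__(self, min_us):
        self.min_us = min_us
        self.pending = {}  # keyed by the calling thread's pid_tgid

    def on_entry(self, pid_tgid, now_ns, disk):
        # kprobe: remember the disk name and the entry timestamp
        self.pending[pid_tgid] = Entry(now_ns, disk)

    def on_return(self, pid_tgid, now_ns):
        # kretprobe: compare with the saved timestamp, report slow calls only
        entry = self.pending.pop(pid_tgid, None)
        if entry is None:
            return None  # the call started before tracing began
        lat_us = (now_ns - entry.ts_ns) // 1000
        return (entry.disk, lat_us) if lat_us > self.min_us else None


model = SlowCallModel(min_us=1000)
model.on_entry(pid_tgid=42, now_ns=0, disk="sdb")
print(model.on_return(pid_tgid=42, now_ns=5_000_000))
```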
Kprobes and kretprobes use a breakpoint mechanism to change function code on the fly. More details are available in the kernel's kprobes documentation.
The eBPF text inside the Python script looks like this:
bpf_text = """ /* the BPF program code goes here */ """
To exchange data between functions, eBPF programs use hash tables. We will key the table by the current thread's pid_tgid value and use the following structure as the value:
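#include <uapi/linux/ptrace.h>
#include <linux/sched.h>   // TASK_COMM_LEN
#include <linux/blkdev.h>  // struct bio, struct gendisk, DISK_NAME_LEN

struct data_t {
    u64 pid;                   // pid_tgid on entry, TGID once submitted
    u64 ts;                    // entry timestamp, ns
    char comm[TASK_COMM_LEN];  // process name
    u64 lat;                   // latency, us (ms if requested)
    char disk[DISK_NAME_LEN];  // disk name taken from the bio
};

BPF_HASH(p, u64, struct data_t);  // in-flight calls, keyed by pid_tgid
BPF_PERF_OUTPUT(events);          // channel to user space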
Here we register a hash table called p, with keys of type u64 and values of type struct data_t. The table will be accessible from the context of our BPF program. The BPF_PERF_OUTPUT macro registers another table, called events, which is used to pass data to user space.
When measuring the delay between a function call and its return, or between calls to different functions, keep in mind that the data you collect must belong to the same context. In other words, you have to account for possible parallel invocations of the functions. We could measure the latency between a call made in the context of one process and the return from that function in the context of another, but that would most likely be useless. A good example here would be…
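A small simulation with made-up timestamps shows what goes wrong when entry timestamps are not kept per context: with a single shared slot, the overlapping call from thread 2 overwrites thread 1's start time. The function names are hypothetical.

```python
def latencies_single_slot(events):
    """Broken variant: one shared start timestamp for every thread."""
    start, out = None, []
    for kind, tid, ts in events:
        if kind == "entry":
            start = ts
        elif start is not None:
            out.append((tid, ts - start))
    return out


def latencies_per_thread(events):
    """Per-context variant: start timestamps keyed by thread ID."""
    start, out = {}, []
    for kind, tid, ts in events:
        if kind == "entry":
            start[tid] = ts
        elif tid in start:
            out.append((tid, ts - start.pop(tid)))
    return out


# Hypothetical trace: thread 1's call (0..100) overlaps thread 2's (10..15).
TRACE = [("entry", 1, 0), ("entry", 2, 10), ("return", 2, 15), ("return", 1, 100)]

print(latencies_single_slot(TRACE))  # thread 1 is under-reported
print(latencies_per_thread(TRACE))
```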
Next, we need to write the code that runs when the traced function is called:
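int start(struct pt_regs *ctx, struct bio *bio) {
    u64 pid = bpf_get_current_pid_tgid();  // TGID << 32 | thread PID
    struct data_t data = {};
    u64 ts = bpf_ktime_get_ns();           // entry time, ns
    data.pid = pid;
    data.ts = ts;
    // copy the disk name (e.g. "sdb") out of the bio's gendisk
    bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
    p.update(&pid, &data);
    return 0;  // BPF programs return an int
}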
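Here the first argument of the traced function, generic_make_request(), is passed as the second argument of start(). The function then records the PID and the current timestamp in nanoseconds, copies the disk name from the bio structure into data, and adds the record to the hash table p.
The following function will be called on return from generic_make_request():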
int stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    struct data_t* data = p.lookup(&pid);  // record saved by start()
    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        data->lat = (ts - data->ts)/1000;  // ns -> us
        if (data->lat > MIN_US) {          // threshold substituted from Python
            FACTOR                         // optional us -> ms conversion
            data->pid >>= 32;              // keep only the TGID
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
    return 0;
}
This function is similar to the previous one: we get the process PID and the timestamp, but we don't create a new data structure. Instead, we look up an existing structure in the hash table using the current PID as the key. If the structure is found, we record the name of the running process in it.
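The bit shift used here is needed to get the thread group ID (TGID), i.e. the PID of the main process that started the thread in whose context we are running. The function we call, bpf_get_current_pid_tgid(), returns both the TGID (upper 32 bits) and the thread's own PID (lower 32 bits) in a single 64-bit value.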
When printing to the terminal, we are interested not in the individual thread but in the main process. After comparing the resulting latency with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.
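The shift is easy to check in Python. bpf_get_current_pid_tgid() returns the task's tgid in the upper 32 bits and its pid (the kernel's per-thread ID) in the lower 32 bits; the helper names and the PIDs below are invented for illustration.

```python
def split_pid_tgid(pid_tgid):
    """Split a bpf_get_current_pid_tgid() value into (tgid, tid)."""
    tgid = pid_tgid >> 32          # process ID, what "data->pid >>= 32" keeps
    tid = pid_tgid & 0xFFFFFFFF    # ID of the individual thread
    return tgid, tid


def pack_pid_tgid(tgid, tid):
    """Inverse of split_pid_tgid(), mirroring the kernel's layout."""
    return (tgid << 32) | tid


# Hypothetical: thread 5810 belonging to process 5802.
value = pack_pid_tgid(5802, 5810)
print(hex(value), split_pid_tgid(value))
```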
In the Python script that loads this code, we need to replace MIN_US and FACTOR with the latency threshold and the time unit, which we pass in as arguments:
bpf_text = bpf_text.replace('MIN_US', str(min_usec))
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'data->lat /= 1000;')
    label = "msec"
else:
    bpf_text = bpf_text.replace('FACTOR', '')
    label = "usec"
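The article does not show how args and min_usec are produced. A minimal sketch, assuming a positional threshold in microseconds and a -m/--milliseconds switch (both names, and the default of 10000 us, are assumptions), could wrap the substitution like this. Note that the threshold stays in microseconds either way, because the comparison with MIN_US happens before FACTOR converts the value.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI; option names and default are assumptions.
    parser = argparse.ArgumentParser(
        description="Trace slow generic_make_request() calls")
    parser.add_argument("min_usec", type=int, nargs="?", default=10000,
                        help="report calls slower than this many microseconds")
    parser.add_argument("-m", "--milliseconds", action="store_true",
                        help="print latency in milliseconds")
    return parser.parse_args(argv)


def render_bpf_text(bpf_text, args):
    """Apply the MIN_US / FACTOR substitutions; return (text, label)."""
    text = bpf_text.replace("MIN_US", str(args.min_usec))
    if args.milliseconds:
        return text.replace("FACTOR", "data->lat /= 1000;"), "msec"
    return text.replace("FACTOR", ""), "usec"


TEMPLATE = "if (data->lat > MIN_US) { FACTOR submit(); }"
text, label = render_bpf_text(TEMPLATE, parse_args(["5000", "-m"]))
print(label, text)
```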
Now we need to compile the BPF program with the BPF() constructor and attach the probes:
from bcc import BPF

b = BPF(text=bpf_text)  # compile and load the eBPF program
b.attach_kprobe(event="generic_make_request", fn_name="start")
b.attach_kretprobe(event="generic_make_request", fn_name="stop")
We also have to define struct data_t in our script; otherwise we won't be able to read anything:
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h
DISK_NAME_LEN = 32  # linux/genhd.h

class Data(ct.Structure):
    # mirrors struct data_t field by field
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
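Because ctypes lays the fields out with C alignment rules, Data should match struct data_t byte for byte: every u64 sits on an 8-byte boundary (offsets 0, 8 and 32) and the whole structure is 72 bytes. A quick self-check, using a hand-packed buffer as a stand-in for a perf event (the values are invented):

```python
import ctypes as ct
import struct

TASK_COMM_LEN = 16  # linux/sched.h
DISK_NAME_LEN = 32  # linux/genhd.h


class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]


# Same field order, native byte order, no implicit padding ("=").
RAW_FORMAT = "=QQ16sQ32s"

raw = struct.pack(RAW_FORMAT, 5802, 123456789, b"ceph-osd", 15000, b"sdb")
event = Data.from_buffer_copy(raw)
print(ct.sizeof(Data), Data.lat.offset, event.comm, event.lat, event.disk)
```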
The final step is to print the data to the terminal:
def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts  # the first event becomes time zero
    time_s = (float(event.ts - start)) / 1000000000
    print("%-18.9f %-16s %-6d %-1s %s %s" % (time_s, event.comm.decode('utf-8', 'replace'),
          event.pid, event.lat, label, event.disk.decode('utf-8', 'replace')))

b["events"].open_perf_buffer(print_event)
start = 0  # timestamp of the first event, set in print_event()
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
The script itself is available at…
Finally! Now we can see that what looked like a stalling bcache device is actually a stalling generic_make_request() call for the cached disk.
Digging into the kernel
What exactly slows down during request submission? We can see that the delay occurs even before request accounting starts, that is, before the kernel begins accounting the individual request for the statistics it exposes later (/proc/diskstats or iostat). This is easy to confirm by running iostat while reproducing the problem, or…
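For reference, the counters behind iostat can be read straight from /proc/diskstats. A rough sketch that computes an iostat-style average write time from two samples follows; the field positions follow the kernel's I/O statistics documentation (field 5 is writes completed, field 8 is milliseconds spent writing), the helper names are hypothetical, and the sample numbers are invented. Per the reasoning above, a delay that happens before accounting begins is not reflected in these counters.

```python
def parse_diskstats(text):
    """Map device name -> (writes completed, ms spent writing)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # not a full diskstats line
        # fields[0:3] are major, minor, name; the counters follow
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def avg_write_ms(before, after, disk):
    """Average milliseconds per completed write between two samples."""
    writes = after[disk][0] - before[disk][0]
    write_ms = after[disk][1] - before[disk][1]
    return write_ms / writes if writes else 0.0


# Invented /proc/diskstats samples taken one interval apart.
BEFORE = "   8      16 sdb 100 0 800 50 1000 0 8000 2000 0 1500 2050"
AFTER = "   8      16 sdb 100 0 800 50 1500 0 12000 7000 0 3000 7050"

print(avg_write_ms(parse_diskstats(BEFORE), parse_diskstats(AFTER), "sdb"))
```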
If we look at generic_make_request(), we will see that two more functions are called before request accounting starts. The first, generic_make_request_checks(), checks that the request is valid with respect to the disk settings. The second, blk_queue_enter(), contains an interesting wait_event_interruptible() call:
ret = wait_event_interruptible(q->mq_freeze_wq,
(atomic_read(&q->mq_freeze_depth) == 0 &&
(preempt || !blk_queue_preempt_only(q))) ||
blk_queue_dying(q));
In it, the kernel waits for the queue to be unfrozen. Let's measure the latency of blk_queue_enter():
~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 341 |****************************************|
msecs : count distribution
0 -> 1 : 316 |****************************************|
msecs : count distribution
0 -> 1 : 255 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 | |
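The output above is a power-of-two histogram. A minimal sketch of the same bucketing and rendering in Python, which only imitates the look of funclatency's output and is not BCC's code; the function names are hypothetical, and the input values are invented and assumed to be already in milliseconds.

```python
from collections import Counter


def log2_bucket(value):
    """Bucket 1 holds 0 and 1, bucket 2 holds 2-3, bucket 3 holds 4-7, ..."""
    return max(1, int(value).bit_length())


def bucket_range(idx):
    low, high = (1 << idx) >> 1, (1 << idx) - 1
    return (low - 1, high) if low == high else (low, high)


def render_hist(values, unit="msecs", width=40):
    """Render a non-empty list of values as a text histogram."""
    hist = Counter(log2_bucket(v) for v in values)
    top = max(hist.values())
    lines = [f"{unit:>10} : count    distribution"]
    for idx in range(1, max(hist) + 1):
        low, high = bucket_range(idx)
        count = hist.get(idx, 0)
        stars = "*" * (count * width // top)
        lines.append(f"{low:>5} -> {high:<5}: {count:<8} |{stars:<{width}}|")
    return "\n".join(lines)


# Invented latencies: 254 calls under 2 ms and one outlier of 9 ms.
print(render_hist([0] * 200 + [1] * 54 + [9]))
```

As in the real output, a single outlier among hundreds of fast calls gets a row with no stars, so the count column is what reveals it.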
It looks like we are close to a solution. The functions used to freeze and unfreeze a queue are blk_mq_freeze_queue() and blk_mq_unfreeze_queue().
The time it takes to drain the queue is comparable to the disk latency, since the kernel waits for all queued operations to complete. Once the queue is empty, the settings are changed. After that, blk_mq_unfreeze_queue() is called, decrementing the freeze counter.
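The waiting logic can be illustrated with a toy model built on threading.Condition: freezing raises a depth counter, entering blocks until the counter is back to zero (the condition in the wait_event_interruptible() call above), and unfreezing wakes the waiters. It is only a model with invented names; in particular, it leaves out the draining of in-flight requests, which is where the disk latency comes in.

```python
import threading


class QueueFreezeModel:
    """Toy model of a freeze-depth counter and its wait queue (not kernel code)."""

    def __init__(self):
        self._depth = 0
        self._cond = threading.Condition()

    def freeze(self):
        with self._cond:
            self._depth += 1

    def unfreeze(self):
        with self._cond:
            self._depth -= 1
            if self._depth == 0:
                self._cond.notify_all()  # wake everyone blocked in enter()

    def enter(self, timeout=None):
        """Block until the queue is unfrozen; False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._depth == 0, timeout)


q = QueueFreezeModel()
q.freeze()
print(q.enter(timeout=0.05))               # False: queue is frozen
threading.Timer(0.05, q.unfreeze).start()  # unfreeze a bit later
print(q.enter(timeout=2.0))                # True once depth is back to 0
```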
Now we know enough to fix the situation. The udevadm trigger command causes the block device settings to be applied. These settings are described in udev rules. We can find out which settings freeze the queue by trying to change them through sysfs or by reading the kernel source. We can also try the BCC utility trace, printing the kernel (-K) and user (-U) stacks for every call to blk_freeze_queue:
~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID TID COMM FUNC
3809642 3809642 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
elevator_switch+0x29 [kernel]
elv_iosched_store+0x197 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
3809631 3809631 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
queue_requests_store+0xb6 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
Udev rules rarely change, and when they do, it usually happens in a controlled way. Yet we see that even re-applying values that are already set causes a spike in the latency of passing requests from the application to the disk. Of course, generating udev events when nothing in the disk configuration has changed (for example, no device was attached or detached) is not good practice. Still, we can help the kernel avoid unnecessary work and not freeze the request queue when it isn't necessary.
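The traces above point at two sysfs attributes: writes to queue/scheduler reach elevator_switch() through elv_iosched_store(), and writes to queue/nr_requests reach queue_requests_store(). The idea of skipping no-op updates can be illustrated from user space with a sketch like the one below. It only illustrates the idea and is not a kernel change; the function names are hypothetical, and reading and writing are left to injected callables.

```python
def active_scheduler(sysfs_value):
    """Return the active scheduler from a queue/scheduler string.

    The kernel marks the active entry in brackets, e.g. "noop deadline [cfq]".
    """
    for token in sysfs_value.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return sysfs_value.strip()


def needs_write(attr, current, desired):
    """True if writing `desired` would actually change the attribute."""
    if attr == "scheduler":
        return active_scheduler(current) != desired
    return current.strip() != str(desired)


def apply_if_changed(attr, desired, read, write):
    """Write only when the value differs; return True if a write happened."""
    if needs_write(attr, read(attr), desired):
        write(attr, desired)
        return True
    return False


# Hypothetical in-memory stand-in for /sys/block/<dev>/queue/*.
fake_sysfs = {"scheduler": "noop deadline [cfq]\n", "nr_requests": "128\n"}
written = []
apply_if_changed("scheduler", "cfq", fake_sysfs.get,
                 lambda attr, value: written.append((attr, value)))
apply_if_changed("nr_requests", 256, fake_sysfs.get,
                 lambda attr, value: written.append((attr, value)))
print(written)
```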
Conclusion
eBPF is a very flexible and powerful tool. In this article we looked at one practical case and showed a small fraction of what can be done with it. If you are interested in developing BCC utilities, it is worth taking a look at…
There are other interesting eBPF-based debugging and profiling tools. One of them is…
Source: www.habr.com