Linux has a large number of tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.
A few years ago another tool appeared: eBPF.
There are already many utilities that use eBPF, and in this article we will look at how to write your own profiling utility built on the BCC library.
Ceph Is Slow
A new host was added to the Ceph cluster. After migrating some data to it, we noticed that it processed write requests much more slowly than the other servers.
Unlike the other platforms, this host used bcache and a new Linux 4.15 kernel. It was the first time a host with this configuration had been used, and at that point it was clear that the root of the problem could be anything.
Examining the Host
Let's start by looking at what happens inside the ceph-osd process. For this we profile the process and look at where the time goes.
The picture shows that fdatasync() spent most of its time submitting requests to generic_make_request(). This means the cause of the problem most likely lies somewhere outside the osd daemon itself: either in the kernel or in the disks. The iostat output showed high request latency on the bcache disks.
While checking the host, we found that the systemd-udevd daemon was consuming a lot of CPU time, about 20% on several cores. This is strange behavior, so we need to find out why. Since systemd-udevd works with uevents, we decided to watch them with udevadm monitor. It turned out that a large number of change events were being generated for every block device in the system. This is quite unusual, so we have to find out what generates all these events.
Using the BCC Toolkit
As we have already found out, the kernel (and the ceph daemon inside a system call) spends a lot of time in generic_make_request(). Let's try to measure how fast this function runs.
This function usually runs quickly. All it does is pass the request to the device driver queue.
Bcache is a complex device that actually consists of three disks:
- backing device (cached disk), in this case a slow HDD;
- caching device (caching disk), here one partition of an NVMe device;
- the bcache virtual device that the application works with.
We know that passing the request on is slow, but for which of these devices? We will deal with this a little later.
We now know that uevents are likely to cause problems. Finding out what exactly generates them is not so easy. Let's assume it is some kind of software that is launched periodically, and see what runs on the system using the execsnoop script from the same BCC toolkit.
For example, like this:
/usr/share/bcc/tools/execsnoop | tee ./execdump
We will not show the full execsnoop output here, but one line of interest looked like this:
sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ \([0-9]*\) C.*/\1/'
The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. When we checked the monitoring configuration, we found incorrect parameters: the temperature of the HBA adapter was being read every 30 seconds, which is far more often than necessary. After changing the check interval to a longer one, request latency on this host no longer stood out compared with the other hosts.
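When the saved dump is long, grouping the lines by parent PID makes a periodic launcher stand out. The following is only a sketch: it assumes the default execsnoop column order (PCOMM, PID, PPID, RET, ARGS) and a command name without spaces, and the helper name count_by_ppid is ours, not part of BCC.

```python
from collections import Counter


def count_by_ppid(lines):
    """Count executed commands per parent PID in execsnoop output."""
    counts = Counter()
    for line in lines:
        # Assumed default layout: PCOMM PID PPID RET ARGS
        fields = line.split(None, 4)
        if len(fields) < 4 or not fields[2].isdigit():
            continue  # header line or something we cannot parse
        counts[int(fields[2])] += 1
    return counts


sample = [
    "PCOMM            PID    PPID   RET ARGS",
    "sh               1764905 5802     0 sudo arcconf getconfig 1 AD",
]
print(count_by_ppid(sample).most_common(1))  # [(5802, 1)]
```

On a real dump one would pass the lines of ./execdump and look at the parents with the highest counts.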
But it was still unclear why the bcache device was so slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on bcache while periodically running udevadm trigger to generate uevents.
Writing BCC-Based Tools
Let's try to write a simple utility that traces and displays the slowest calls of generic_make_request(). We are also interested in the name of the disk for which the function was called.
The plan is simple:
- Register a kprobe on generic_make_request():
  - save the disk name, which is available through the function argument;
  - save the timestamp.
- Register a kretprobe for the return from generic_make_request():
  - get the current timestamp;
  - look up the saved timestamp and compare it with the current one;
  - if the difference exceeds the given threshold, look up the saved disk name and print it to the terminal.
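The plan above can be sketched in plain Python before touching any kernel code. This is only a model of the bookkeeping, not a BPF program: the dictionary stands in for the hash table, timestamps are passed in by hand, and the names on_entry, on_return and the 1000 microsecond threshold are illustrative assumptions.

```python
MIN_US = 1000   # hypothetical reporting threshold, in microseconds
inflight = {}   # caller id -> (disk name, entry timestamp in ns)


def on_entry(key, disk, now_ns):
    """What the kprobe does: remember the disk name and the entry time."""
    inflight[key] = (disk, now_ns)


def on_return(key, now_ns):
    """What the kretprobe does: compute latency, report only slow calls."""
    entry = inflight.pop(key, None)
    if entry is None:
        return None  # return seen without a matching entry
    disk, start_ns = entry
    lat_us = (now_ns - start_ns) // 1000
    return (disk, lat_us) if lat_us > MIN_US else None


on_entry(100, "bcache0", 0)
print(on_return(100, 5_000_000))  # ('bcache0', 5000)
on_entry(101, "sdb", 0)
print(on_return(101, 500_000))    # None, below the threshold
```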
Kprobes and kretprobes use a breakpoint mechanism to modify function code on the fly.
The eBPF text inside the python script looks like this:
bpf_text = """ # Here will be the bpf program code """
To exchange data between functions, eBPF programs use hash tables:
struct data_t {
    u64 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
    u64 lat;
    char disk[DISK_NAME_LEN];
};

BPF_HASH(p, u64, struct data_t);
BPF_PERF_OUTPUT(events);
Here we register a hash table named p, with a key of type u64 and a value of type struct data_t. The table is available within our BPF program. The BPF_PERF_OUTPUT macro registers another table named events, which is used to pass data to user space.
When measuring the delay between a function call and its return, or between calls to different functions, keep in mind that the collected data must belong to the same context. In other words, remember that functions may run in parallel. It is possible to measure the latency between a function call in the context of one process and the return from that function in the context of another process, but this is most likely useless.
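The effect of mixing contexts is easy to reproduce on paper. The sketch below replays a made-up sequence of events from two overlapping threads, first with a single shared timestamp slot and then with one slot per thread; the event list and both function names are illustrative only.

```python
# (thread id, event, timestamp in microseconds); threads 1 and 2 overlap
events = [(1, "enter", 0), (2, "enter", 10), (2, "return", 20), (1, "return", 500)]


def latencies_shared_slot(events):
    """Wrong: one global start time, overwritten by whoever entered last."""
    start, out = None, []
    for tid, kind, ts in events:
        if kind == "enter":
            start = ts
        elif start is not None:
            out.append((tid, ts - start))
    return out


def latencies_per_thread(events):
    """Right: the start time is keyed by the thread that made the call."""
    starts, out = {}, []
    for tid, kind, ts in events:
        if kind == "enter":
            starts[tid] = ts
        elif tid in starts:
            out.append((tid, ts - starts.pop(tid)))
    return out


print(latencies_shared_slot(events))  # [(2, 10), (1, 490)]
print(latencies_per_thread(events))   # [(2, 10), (1, 500)]
```

With the shared slot, thread 1 is credited with 490 microseconds instead of 500 because thread 2 overwrote its start time. Keying the table by the caller avoids this.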
Next, we need to write the code that runs when the function under study is called:
void start(struct pt_regs *ctx, struct bio *bio) {
    u64 pid = bpf_get_current_pid_tgid();
    struct data_t data = {};
    u64 ts = bpf_ktime_get_ns();
    data.pid = pid;
    data.ts = ts;
    // copy the disk name out of the bio passed to generic_make_request()
    bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
    // remember the entry, keyed by the calling context
    p.update(&pid, &data);
}
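Here the first argument of the traced function generic_make_request() is passed in as the second argument of our handler. We then record the PID of the current context and a nanosecond timestamp in a freshly initialized struct data_t, copy the disk name from the bio structure, and add the entry to the hash table.
The following function is called on return from generic_make_request():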
void stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    // find the entry saved by start() for this context
    struct data_t* data = p.lookup(&pid);
    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        data->lat = (ts - data->ts)/1000;   // nanoseconds to microseconds
        if (data->lat > MIN_US) {           // MIN_US is substituted from Python
            FACTOR                          // replaced from Python, may be empty
            data->pid >>= 32;               // keep only the upper 32 bits
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
}
This function is similar to the previous one: we get the PID of the process and the timestamp, but we do not allocate memory for a new data structure. Instead, we search the hash table for an already existing structure using the current PID as the key. If the structure is found, we get the name of the running process and add it to the structure.
The binary shift we use here is needed to get the thread group ID, that is, the PID of the main process that started the thread in whose context we are working. The function we call, bpf_get_current_pid_tgid(), returns both the thread group ID and the thread's own ID packed into a single 64-bit value.
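The layout of that 64-bit value can be shown with ordinary integer arithmetic. In this sketch the thread group ID sits in the upper 32 bits and the thread ID in the lower 32 bits; the helper name split_pid_tgid and the thread ID 5810 are made up for the example.

```python
def split_pid_tgid(value):
    """Split a packed 64-bit value into (thread group ID, thread ID)."""
    tgid = value >> 32          # upper 32 bits: PID of the main process
    tid = value & 0xFFFFFFFF    # lower 32 bits: ID of the current thread
    return tgid, tid


packed = (5802 << 32) | 5810    # hypothetical thread 5810 of process 5802
print(split_pid_tgid(packed))   # (5802, 5810)
```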
When printing to the terminal we are not interested in the thread for now, only in the main process. After comparing the measured delay with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.
In the python script that loads this code, we need to replace MIN_US and FACTOR with the delay threshold and the time unit conversion, which we pass in through the arguments:
bpf_text = bpf_text.replace('MIN_US',str(min_usec))
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR','data->lat /= 1000;')
    label = "msec"
else:
    bpf_text = bpf_text.replace('FACTOR','')
    label = "usec"
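The snippet above relies on args and min_usec being defined earlier in the script. One possible way to wire that up is sketched below; the option names, the default threshold and the render helper are assumptions for illustration and are not taken from the original script.

```python
import argparse


def render(template, min_usec, milliseconds):
    """Fill the MIN_US and FACTOR placeholders; return (text, unit label)."""
    text = template.replace("MIN_US", str(min_usec))
    if milliseconds:
        return text.replace("FACTOR", "data->lat /= 1000;"), "msec"
    return text.replace("FACTOR", ""), "usec"


# Hypothetical command line: optional threshold plus a -m switch
parser = argparse.ArgumentParser(description="report slow generic_make_request() calls")
parser.add_argument("min_usec", nargs="?", type=int, default=1000)
parser.add_argument("-m", "--milliseconds", action="store_true")
args = parser.parse_args(["2000", "-m"])  # explicit list, for demonstration

text, label = render("if (data->lat > MIN_US) { FACTOR }",
                     args.min_usec, args.milliseconds)
print(text)   # if (data->lat > 2000) { data->lat /= 1000; }
print(label)  # msec
```

Note that the threshold is always compared in microseconds. The division by 1000 only changes the unit of the value that is reported.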
Now we need to prepare the BPF program through the BPF class and attach the probes:
b = BPF(text=bpf_text)
b.attach_kprobe(event="generic_make_request",fn_name="start")
b.attach_kretprobe(event="generic_make_request",fn_name="stop")
We also have to define struct data_t in our script, otherwise we will not be able to read anything:
import ctypes as ct

TASK_COMM_LEN = 16 # linux/sched.h
DISK_NAME_LEN = 32 # linux/genhd.h

class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
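To see why the Python definition has to mirror the C structure field by field, it helps to decode a raw buffer by hand. The sketch below builds a fake event in memory instead of receiving one from the kernel; the field values are invented, and the structure is repeated only so that the example is self-contained.

```python
import ctypes as ct

TASK_COMM_LEN = 16
DISK_NAME_LEN = 32


class Data(ct.Structure):
    # Must match struct data_t in the BPF program: same order, same sizes
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]


# Invented values standing in for an event submitted by the kernel side
raw = bytes(Data(pid=5802, ts=123456789, comm=b"ceph-osd",
                 lat=4200, disk=b"bcache0"))

event = Data.from_buffer_copy(raw)
print(ct.sizeof(Data))                                   # 72
print(event.comm.decode(), event.disk.decode(), event.lat)  # ceph-osd bcache0 4200
```

Under Python 3 the comm and disk fields come back as bytes, so they need decode() before being formatted as text.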
The last step is to output the data to the terminal:
def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    time_s = (float(event.ts - start)) / 1000000000
    print("%-18.9f %-16s %-6d %-1s %s %s" % (time_s, event.comm, event.pid, event.lat, label, event.disk))

b["events"].open_perf_buffer(print_event)
# format output
start = 0
while 1:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
The complete script is published separately.
Finally! Now we see that what looked like a stalling bcache device is actually a stalling call of generic_make_request() for the cached disk.
Digging into the Kernel
What exactly slows down while a request is being passed on? The delay occurs even before request accounting starts, that is, before the kernel begins tracking the request for the statistics later reported through /proc/diskstats or iostat. This is easy to verify by running iostat while reproducing the problem.
If we look at generic_make_request(), we see that two more functions are called before accounting starts. The first, generic_make_request_checks(), checks that the request is valid for the disk settings. The second is blk_queue_enter(), which contains this wait_event_interruptible() call:
ret = wait_event_interruptible(q->mq_freeze_wq,
        (atomic_read(&q->mq_freeze_depth) == 0 &&
         (preempt || !blk_queue_preempt_only(q))) ||
        blk_queue_dying(q));
Here the kernel waits for the queue to be unfrozen. Let's measure the delay of blk_queue_enter():
~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 341 |****************************************|
msecs : count distribution
0 -> 1 : 316 |****************************************|
msecs : count distribution
0 -> 1 : 255 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 | |
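The rows in this output are power-of-two buckets, which is why a single slow call shows up in the 8 -> 15 row while everything else lands in 0 -> 1. A minimal sketch of the same bucketing follows; the sample values are invented to mimic the shape of the last interval, and the function names are ours.

```python
from collections import Counter


def log2_bucket(value):
    """Return the (low, high) power-of-two bucket that contains value."""
    if value < 2:
        return (0, 1)
    k = value.bit_length() - 1
    return (1 << k, (1 << (k + 1)) - 1)


def histogram(samples):
    """Count samples per bucket."""
    return Counter(log2_bucket(v) for v in samples)


# Invented latencies in msec: 255 fast calls and one call that took 9 msec
samples_ms = [0] * 255 + [9]
for (low, high), count in sorted(histogram(samples_ms).items()):
    print("%6d -> %-6d : %d" % (low, high, count))
```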
It looks like we are close to a solution. The queue is frozen and unfrozen by dedicated kernel functions.
The time it takes to drain the queue is comparable to the disk latency, because the kernel waits for all queued operations to complete. Once the queue is empty, the settings changes are applied, and after that the queue is unfrozen.
Now we know enough to fix the situation. The udevadm trigger command causes the settings for the block device to be applied. These settings are described in the udev rules. We can find out which settings freeze the queue by trying to change them through sysfs or by looking at the kernel source code. We can also try the BCC trace utility, which prints the kernel and user-space stack for every call of a function, here blk_freeze_queue:
~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID TID COMM FUNC
3809642 3809642 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
elevator_switch+0x29 [kernel]
elv_iosched_store+0x197 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
3809631 3809631 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
queue_requests_store+0xb6 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
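With many such traces it is convenient to pull out only the frame directly above blk_freeze_queue, since that is the code path that asked for the freeze. The sketch below does this for a shortened copy of the output above; the helper name freeze_callers is ours, and the parsing assumes the frame format shown here.

```python
def freeze_callers(trace_output):
    """Return the function that called blk_freeze_queue in each stack."""
    lines = [line.strip() for line in trace_output.splitlines()]
    callers = []
    for i, line in enumerate(lines[:-1]):
        if line.startswith("blk_freeze_queue+"):
            # The next frame is the direct caller; drop the +offset part
            callers.append(lines[i + 1].split("+")[0])
    return callers


sample = """\
3809642 3809642 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        elevator_switch+0x29 [kernel]
        elv_iosched_store+0x197 [kernel]
3809631 3809631 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        queue_requests_store+0xb6 [kernel]
"""
print(freeze_callers(sample))  # ['elevator_switch', 'queue_requests_store']
```

In these two stacks the writes go through elv_iosched_store and queue_requests_store, which suggests that the I/O scheduler and the queue size (nr_requests) settings are the ones being applied by the udev rules.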
Udev rules change rarely, and usually in a controlled manner. Yet we see that even applying values that are already set causes a spike in the delay of passing a request from the application to the disk. Of course, generating udev events when nothing has changed in the disk configuration (for example, no device was attached or removed) is not good practice. However, we can help the kernel avoid unnecessary work and not freeze the request queue when it is not needed.
Conclusion
eBPF is a very flexible and powerful tool. In this article we looked at one practical case and demonstrated a small part of what can be done.
There are other interesting debugging and profiling tools based on BPF.
Source: www.habr.com