Linux has many tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.
A few years ago eBPF appeared, which makes it possible to trace the kernel and user applications with low overhead.
There are now many tools built on eBPF, and in this article we will look at how to write your own profiling utility with the BCC library.
Ceph Is Slow
A new host was added to the Ceph cluster. After migrating some of the data to it, we noticed that it processed write requests much more slowly than the other servers.
Unlike the other platforms, this host used bcache and a new Linux 4.15 kernel. It was the first time a host with this configuration had been used here, and at that point it was clear that the root of the problem could be anything.
Investigating the Host
Let's start by profiling the ceph-osd process to see what happens inside it.
The resulting graph shows that fdatasync() spent a lot of time submitting requests in generic_make_request(). This means the cause of the problem most likely lies somewhere outside the OSD daemon itself: in the kernel or in the disks. The iostat output showed high latency when requests were processed by the bcache disks.
While examining the host, we found that the systemd-udevd daemon was consuming a lot of CPU time, about 20% on several cores. This is strange behavior, so we needed to find out why. Since systemd-udevd works with uevents, we decided to watch them with udevadm monitor. It turned out that a large number of change events was being generated for every block device in the system. This is quite unusual, so we will have to look at what generates all these events.
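To put a number on "a large number of change events", the monitor output can be counted per device. The sketch below is only an illustration: it assumes the usual `SOURCE[timestamp] action devpath (subsystem)` layout of `udevadm monitor` lines, and the sample lines are made up rather than taken from the affected host.

```python
import re
from collections import Counter

# Assumed line layout: SOURCE[timestamp] action devpath (subsystem)
EVENT_RE = re.compile(r"^(KERNEL|UDEV)\s*\[\s*[\d.]+\]\s+(\w+)\s+(\S+)\s+\((\w+)\)")


def count_block_changes(lines):
    """Count kernel-side 'change' uevents per block device path."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line)
        if not match:
            continue  # banner lines, blank lines
        source, action, devpath, subsystem = match.groups()
        if source == "KERNEL" and action == "change" and subsystem == "block":
            counts[devpath] += 1
    return counts


# Hypothetical sample input, invented for this example.
sample = [
    "monitor will print the received events for:",
    "KERNEL[5498.371427] change   /devices/virtual/block/bcache0 (block)",
    "UDEV  [5498.382054] change   /devices/virtual/block/bcache0 (block)",
    "KERNEL[5498.401100] change   /devices/virtual/block/bcache0 (block)",
    "KERNEL[5498.455000] add      /devices/virtual/net/veth0 (net)",
]
print(count_block_changes(sample).most_common())
```

Only KERNEL lines are counted so that each uevent is counted once, not once more when udev finishes processing it.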
Using the BCC Toolkit
As we have already found out, the kernel (and the ceph daemon inside a system call) spends a lot of time in generic_make_request(). Let's try to measure how fast this function runs.
This function usually runs quickly. All it does is pass the request to the queue of the device driver.
Bcache is a complex device that actually consists of three disks:
- the backing device (cached disk), in this case a slow HDD;
- the caching device (caching disk), here one partition of an NVMe device;
- the bcache virtual device that the application works with.
We know that request submission is slow, but for which of these devices? We will deal with this a little later.
For now we know that uevents are likely to cause the problems. Finding what exactly generates them is not so easy. Let's assume it is some kind of software that is launched periodically. To see what software runs on the system, we use the execsnoop script from the same BCC toolkit.
For example, like this:
/usr/share/bcc/tools/execsnoop | tee ./execdump
We will not show the full execsnoop output here, but one line of interest looked like this:
sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ \([0-9]*\) C.*/\1/'
The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. A check of the monitoring configuration revealed incorrect parameters: the temperature of the HBA adapter was being read every 30 seconds, far more often than necessary. After changing the check interval to a longer one, we found that request latency on this host no longer stood out from the other hosts.
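Finding the parent that launches commands most often is easy to automate once the output is saved to a file. The following is a minimal sketch that assumes the column order seen in the line above (command, PID, PPID, return value, arguments); the helper names and every sample row except the arcconf one are hypothetical.

```python
from collections import Counter


def parse_execsnoop_line(line):
    """Split one row into (pcomm, pid, ppid, ret, args); None for non-data rows."""
    parts = line.split(None, 4)
    if len(parts) < 4 or not (parts[1].isdigit() and parts[2].isdigit()):
        return None  # header or malformed row
    pcomm, pid, ppid, ret = parts[:4]
    args = parts[4].rstrip("\n") if len(parts) > 4 else ""
    return pcomm, int(pid), int(ppid), int(ret), args


def launches_per_parent(lines):
    """Count how many processes each parent PID started."""
    counts = Counter()
    for line in lines:
        row = parse_execsnoop_line(line)
        if row is not None:
            counts[row[2]] += 1
    return counts


sample = [
    "PCOMM PID PPID RET ARGS",
    "sh 1764905 5802 0 sudo arcconf getconfig 1 AD",
    "sudo 1764906 1764905 0 /usr/bin/sudo arcconf getconfig 1 AD",  # invented row
    "sh 1764950 5802 0 sudo arcconf getconfig 1 AD",                # invented row
]
print(launches_per_parent(sample).most_common(1))
```

With a real capture, the same function can be fed the lines of ./execdump instead of the sample list.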
But it was still unclear why the bcache device was so slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on the bcache device while periodically running udevadm trigger to generate uevents.
Writing BCC-Based Tools
Let's try to write a simple utility that traces and prints the slowest generic_make_request() calls. We are also interested in the name of the disk for which the function was called.
The plan is simple:
- Register a kprobe on generic_make_request():
  - save the disk name, which is available through the function argument;
  - save the timestamp.
- Register a kretprobe on the return from generic_make_request():
  - get the current timestamp;
  - look up the saved timestamp and compare it with the current one;
  - if the difference is greater than the given threshold, look up the saved disk name and print it to the terminal.
Kprobes and kretprobes use a breakpoint mechanism to modify function code on the fly.
The eBPF program text inside the Python script looks like this:
bpf_text = """ # Here will be the bpf program code """
To exchange data between functions, eBPF programs use hash tables. Ours uses the PID as the key and the following structure as the value:
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>  /* struct bio, DISK_NAME_LEN */

struct data_t {
    u64 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
    u64 lat;
    char disk[DISK_NAME_LEN];
};
BPF_HASH(p, u64, struct data_t);
BPF_PERF_OUTPUT(events);
Here we register a hash table named p with a key of type u64 and a value of type struct data_t. The table is available in the context of our BPF program. The BPF_PERF_OUTPUT macro registers another table named events, which is used to pass data to user space.
When measuring the delay between a function call and its return, or between calls to different functions, keep in mind that the recorded data must belong to the same context. In other words, remember that the same function may be running in parallel in several contexts. It is possible to measure the delay between a call in the context of one process and a return in the context of another, but that is most likely useless.
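The effect of choosing the wrong key can be shown without any kernel code. The sketch below is a plain-Python model, not BPF: it pairs entry and return events through a dictionary, once keyed by thread ID and once through a single shared slot. The event tuples and timestamps are invented for the illustration.

```python
def measure(events, key_of):
    """Pair entry/return events, using key_of(event) as the map key.

    events: (kind, thread_id, timestamp_ns) tuples in time order.
    Returns a list of (thread_id, latency_ns), one per matched return.
    """
    started = {}
    latencies = []
    for event in events:
        kind, tid, ts = event
        key = key_of(event)
        if kind == "entry":
            started[key] = ts
        elif key in started:
            latencies.append((tid, ts - started.pop(key)))
    return latencies


# Two threads run the same function at overlapping times.
events = [
    ("entry", 1, 100),
    ("entry", 2, 150),
    ("return", 2, 160),
    ("return", 1, 900),
]

per_thread = measure(events, key_of=lambda e: e[1])  # key identifies the context
shared = measure(events, key_of=lambda e: "fn")      # one slot for all callers

print(per_thread)  # [(2, 10), (1, 800)]
print(shared)      # [(2, 10)]
```

With the shared slot, the second entry overwrites the first one, so the slow 800 ns call is never reported. Keying by context keeps the two calls apart.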
Next, we need to write the code that runs when the function under study is called:
void start(struct pt_regs *ctx, struct bio *bio) {
    u64 pid = bpf_get_current_pid_tgid();
    struct data_t data = {};
    u64 ts = bpf_ktime_get_ns();
    data.pid = pid;
    data.ts = ts;
    bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
    p.update(&pid, &data);
}
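Here the first argument of the called function, generic_make_request(), is passed as the second argument of our handler. We then get the PID and the current timestamp in nanoseconds, store them in a freshly allocated struct data_t, copy the disk name from the bio structure, and add the entry to the hash table described above.
The following function is called on return from generic_make_request():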
void stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    struct data_t* data = p.lookup(&pid);
    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        data->lat = (ts - data->ts)/1000;
        if (data->lat > MIN_US) {
            FACTOR
            data->pid >>= 32;
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
}
This function is similar to the previous one: we get the PID of the process and the timestamp, but we do not allocate memory for a new data structure. Instead, we look up the hash table for an existing structure using key == current PID. If the structure is found, we get the name of the running process and add it to the structure.
The bit shift we apply here is needed to get the thread group ID, i.e. the PID of the main process that started the thread in whose context we are running. The function we call, bpf_get_current_pid_tgid(), returns both the thread group ID and the thread ID in a single 64-bit value.
When printing to the terminal we are not interested in the thread but in the main process. After comparing the measured delay with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.
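The layout of that 64-bit value can be tried out in plain Python. This is only an arithmetic illustration of the packing (thread group ID in the upper 32 bits, thread ID in the lower 32 bits); the helper names and the sample IDs are hypothetical.

```python
def pack_pid_tgid(tgid, tid):
    """Combine a thread group ID and a thread ID the way the helper's result is laid out."""
    return (tgid << 32) | tid


def split_pid_tgid(value):
    """Return (tgid, tid) from the combined 64-bit value."""
    return value >> 32, value & 0xFFFFFFFF


combined = pack_pid_tgid(5802, 5810)  # made-up process and thread IDs
print(split_pid_tgid(combined))       # (5802, 5810)
```

This also shows why the full value is a reasonable hash key: two threads of the same process share the upper half but differ in the lower half, so their entries do not collide.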
In the Python script that loads this code, we need to replace MIN_US and FACTOR with the delay threshold and the time unit, which we pass in as arguments:
bpf_text = bpf_text.replace('MIN_US', str(min_usec))
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'data->lat /= 1000;')
    label = "msec"
else:
    bpf_text = bpf_text.replace('FACTOR', '')
    label = "usec"
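The article skips argument parsing, so here is one way the two values used above could be produced. It is a sketch under assumptions: the option names (a positional threshold in milliseconds and a -m flag) and the TEMPLATE fragment are invented for the example and are not taken from the original script.

```python
import argparse

# Made-up fragment that mimics the part of the BPF text with the placeholders.
TEMPLATE = """
if (data->lat > MIN_US) {
    FACTOR
    events.perf_submit(ctx, data, sizeof(struct data_t));
}
"""


def build_parser():
    parser = argparse.ArgumentParser(
        description="Trace slow generic_make_request() calls")
    # Hypothetical options: threshold in milliseconds and an output-unit switch.
    parser.add_argument("min_ms", nargs="?", type=float, default=1.0,
                        help="minimum latency to report, in milliseconds")
    parser.add_argument("-m", "--milliseconds", action="store_true",
                        help="print latency in milliseconds instead of microseconds")
    return parser


def render(template, args):
    """Fill in the placeholders; return the program text and the unit label."""
    min_usec = int(args.min_ms * 1000)
    text = template.replace("MIN_US", str(min_usec))
    if args.milliseconds:
        return text.replace("FACTOR", "data->lat /= 1000;"), "msec"
    return text.replace("FACTOR", ""), "usec"


args = build_parser().parse_args(["-m", "5"])
text, label = render(TEMPLATE, args)
print(label)
print(text)
```

Note that the threshold comparison happens before FACTOR is applied, so the threshold is always compared in microseconds regardless of the output unit.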
Now we need to prepare the BPF program with BPF() and attach the probes:
from bcc import BPF

b = BPF(text=bpf_text)
b.attach_kprobe(event="generic_make_request", fn_name="start")
b.attach_kretprobe(event="generic_make_request", fn_name="stop")
We also have to define struct data_t in our script, otherwise we will not be able to read anything:
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h
DISK_NAME_LEN = 32  # linux/genhd.h
class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
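Since the Python structure has to match the C one byte for byte, it can be useful to check the layout without loading any BPF program. The sketch below repeats the class so that it runs on its own, builds a fake event with the struct module, and reads it back. The field values are invented.

```python
import ctypes as ct
import struct

TASK_COMM_LEN = 16
DISK_NAME_LEN = 32


class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]


# Build raw bytes in the same field order: u64, u64, char[16], u64, char[32].
raw = struct.pack("=QQ16sQ32s", 5802, 1000000, b"ceph-osd", 1500, b"sdb")

event = Data.from_buffer_copy(raw)
print(ct.sizeof(Data), len(raw))
print(event.pid, event.comm, event.lat, event.disk)
```

If a field is reordered or given the wrong size on either side, the sizes stop matching or the decoded values turn into garbage, which is exactly the failure mode the text warns about.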
The last step is to print the data to the terminal:
def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    time_s = (float(event.ts - start)) / 1000000000
    # comm and disk arrive as bytes; decode them for printing
    print("%-18.9f %-16s %-6d %-1s %s %s" % (
        time_s, event.comm.decode("utf-8", "replace"), event.pid,
        event.lat, label, event.disk.decode("utf-8", "replace")))
b["events"].open_perf_buffer(print_event)
# timestamp of the first event, used as time zero in the output
start = 0
while 1:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
The complete script is available separately.
Finally! Now we can see that what looked like a stalling bcache device is in fact a stalling generic_make_request() call for the backing device (the "cached disk").
Dig into the Kernel
What exactly slows down during request submission? We see that the delay occurs even before request accounting starts, i.e. the accounting of a specific request for the statistics output (/proc/diskstats or iostat) has not yet begun. This is easy to verify by running iostat while reproducing the problem.
If we look at generic_make_request(), we see that two more functions are called before request accounting starts. The first, generic_make_request_checks(), checks the validity of the request against the disk settings. The second, blk_queue_enter(), contains a wait_event_interruptible() call:
ret = wait_event_interruptible(q->mq_freeze_wq,
        (atomic_read(&q->mq_freeze_depth) == 0 &&
         (preempt || !blk_queue_preempt_only(q))) ||
        blk_queue_dying(q));
In it, the kernel waits for the queue to be unfrozen. Let's measure the latency of blk_queue_enter():
~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 341 |****************************************|
msecs : count distribution
0 -> 1 : 316 |****************************************|
msecs : count distribution
0 -> 1 : 255 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 | |
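The histogram above groups latencies into power-of-two ranges, which is why a single slow call lands in the "8 -> 15" row. The sketch below reproduces that bucketing in plain Python. It mirrors the ranges seen in the output, not BCC's own implementation, and unlike the real tool it does not print empty intermediate rows. The sample latencies are made up to resemble the last interval.

```python
from collections import Counter


def bucket(value):
    """Return the (low, high) power-of-two range holding a non-negative integer."""
    if value <= 1:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)
    return (low, 2 * low - 1)


def histogram(samples):
    """Count samples per power-of-two range."""
    return Counter(bucket(v) for v in samples)


def render(hist, width=40):
    """Format the histogram as text rows with a bar scaled to the largest count."""
    peak = max(hist.values())
    rows = []
    for (low, high), count in sorted(hist.items()):
        bar = "*" * (count * width // peak)
        rows.append("%10d -> %-10d : %-8d |%-*s|" % (low, high, count, width, bar))
    return rows


# Invented data: 255 fast calls and one call that took 9 ms.
samples_ms = [0] * 255 + [9]
for row in render(histogram(samples_ms)):
    print(row)
```

One outlier among hundreds of fast calls produces an empty bar, so the count column is the part worth reading.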
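It seems we are close to a solution. The functions used to freeze and unfreeze the queue are blk_mq_freeze_queue() and blk_mq_unfreeze_queue(). Freezing makes new requests wait and then waits for the requests already in the queue to complete.
The time it takes to drain the queue is equal to the disk latency, because the kernel waits for all queued operations to complete. Once the queue is empty, the settings changes are applied. After that the queue is unfrozen again.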
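The freeze, drain and unfreeze sequence described above can be modelled with ordinary threads to see where the latency comes from. This is a toy model written for this article, not kernel code: the class and method names are hypothetical, and a timer stands in for a slow disk request.

```python
import threading
import time


class ToyQueue:
    """Toy model of a request queue that can be frozen."""

    def __init__(self):
        self._cond = threading.Condition()
        self._freeze_depth = 0
        self._in_flight = 0
        self.frozen = threading.Event()  # set once a freeze has started

    def enter(self):
        """Block while the queue is frozen; return the time spent waiting."""
        t0 = time.monotonic()
        with self._cond:
            self._cond.wait_for(lambda: self._freeze_depth == 0)
            self._in_flight += 1
        return time.monotonic() - t0

    def leave(self):
        """Mark one in-flight request as completed."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def freeze(self):
        """Stop new requests, then wait until in-flight requests drain."""
        with self._cond:
            self._freeze_depth += 1
            self.frozen.set()
            self._cond.wait_for(lambda: self._in_flight == 0)

    def unfreeze(self):
        with self._cond:
            self._freeze_depth -= 1
            self._cond.notify_all()


def demo(disk_latency=0.2):
    q = ToyQueue()
    q.enter()  # one request is already in flight

    def change_setting():
        q.freeze()    # waits for the in-flight request to finish
        q.unfreeze()  # setting "applied", queue released

    changer = threading.Thread(target=change_setting)
    changer.start()
    q.frozen.wait()
    # The in-flight request completes only after disk_latency seconds.
    threading.Timer(disk_latency, q.leave).start()
    waited = q.enter()  # a new request arrives while the queue is frozen
    q.leave()
    changer.join()
    return waited


waited = demo()
print("new request waited %.3f s" % waited)
```

The new request does no disk I/O of its own while it waits, yet its submission takes about as long as the slow request that was already in flight, which matches the spike seen in blk_queue_enter().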
Now we know enough to fix the situation. The udevadm trigger command causes the settings of the block device to be applied. These settings are described in udev rules. We can find out which settings freeze the queue by trying to change them through sysfs or by looking at the kernel source code. We can also use the BCC trace utility, which prints the kernel and user-space stack traces for every call to blk_freeze_queue, for example:
~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID TID COMM FUNC
3809642 3809642 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
elevator_switch+0x29 [kernel]
elv_iosched_store+0x197 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
3809631 3809631 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
queue_requests_store+0xb6 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
Udev rules change quite rarely, and usually in a controlled way. So we see that even applying values that are already set causes a spike in the latency of passing a request from the application to the disk. Of course, generating udev events when nothing in the disk configuration has changed (for example, no device was attached or detached) is not good practice. Nevertheless, we can help the kernel avoid unnecessary work and not freeze the request queue when there is no need to.
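The idea of not freezing the queue when nothing changes can be illustrated with a small model. This is not the kernel change itself, only a sketch of the principle: the class is hypothetical, the freeze is reduced to a counter, and the two setting names are borrowed from the sysfs attributes behind the stack traces above (scheduler and nr_requests) with invented values.

```python
class ToyQueueSettings:
    """Toy model: every applied setting costs one queue freeze."""

    def __init__(self, **initial):
        self.values = dict(initial)
        self.freezes = 0

    def _freeze_and_apply(self, name, value):
        self.freezes += 1  # stands in for freeze, drain, apply, unfreeze
        self.values[name] = value

    def store_always(self, name, value):
        """Apply the value unconditionally, as in the problematic behaviour."""
        self._freeze_and_apply(name, value)

    def store_if_changed(self, name, value):
        """Skip the freeze when the requested value is already in place."""
        if self.values.get(name) == value:
            return False
        self._freeze_and_apply(name, value)
        return True


rules = {"scheduler": "deadline", "nr_requests": 256}  # invented rule values

naive = ToyQueueSettings(**rules)
careful = ToyQueueSettings(**rules)
for _ in range(3):  # three udev triggers re-applying identical rules
    for name, value in rules.items():
        naive.store_always(name, value)
        careful.store_if_changed(name, value)

print(naive.freezes, careful.freezes)  # 6 0
```

A real change still goes through the freeze path in both variants, so the check only removes the work that had no effect.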
Conclusion
eBPF is a very flexible and powerful tool. In this article we looked at one practical case and showed a small part of what can be done. If you are interested in developing BCC tools, the BCC project's own documentation and tutorials are worth a look.
There are other interesting eBPF-based debugging and profiling tools as well.
Source: www.habr.com