Linux has a large number of tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.
A few years ago, another tool appeared: eBPF.
There are already many utilities built on eBPF, and in this article we will look at how to write your own profiling utility based on the BCC library.
Ceph is slow
A new host was added to the Ceph cluster. After migrating some of the data to it, we noticed that it processed write requests more slowly than the other servers.
Unlike the other platforms, this host used bcache and the new Linux 4.15 kernel. This was the first time a host with this configuration had been used here. At that point, though, it was clear that the root of the problem could be anything.
Investigating the host
Let's start by looking at what is happening inside the ceph-osd process. For this we will use
The picture tells us that fdatasync() spent a lot of time sending requests to generic_make_request(). This means the cause of the problem is most likely somewhere outside the osd daemon itself: it could be the kernel or the disks. The iostat output showed high latency in request processing by the bcache disks.
While examining the host, we found that the systemd-udevd daemon was consuming a lot of CPU time: about 20% on several cores. This is strange behavior, so we need to find its cause. Since systemd-udevd works with uevents, we decided to look at them with udevadm monitor. It turned out that a large number of change events were being generated for every block device in the system. This is quite unusual, so we need to find out what is generating all these events.
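A quick way to see which devices the change events pile up on is to count them. The sketch below is a hypothetical helper, not part of udevadm: it assumes `udevadm monitor` prints lines shaped like `KERNEL[1234.567890] change   /devices/... (block)` and counts events of one action per device path.

```python
import re
from collections import Counter

# Assumed `udevadm monitor` line shape, e.g.
#   KERNEL[1234.567890] change   /devices/virtual/block/bcache0 (block)
#   UDEV  [1234.568012] change   /devices/virtual/block/bcache0 (block)
EVENT_RE = re.compile(r"^(KERNEL|UDEV)\s*\[[\d.]+\]\s+(\w+)\s+(\S+)\s+\((\w+)\)")


def count_events(lines, source="KERNEL", action="change"):
    """Count events of one source/action per device path (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line)
        if m and m.group(1) == source and m.group(2) == action:
            counts[m.group(3)] += 1
    return counts
```

Feeding it a few minutes of monitor output and calling `.most_common()` on the result shows whether every block device receives the same steady stream of change events.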
Using the BCC toolkit
As we have already found, the kernel (and the ceph daemon, inside the system call) spends a lot of time in generic_make_request(). Let's try to measure how fast this function runs. In
This function normally runs quickly. All it does is pass the request to the device driver's queue.
Bcache is a complex device that actually consists of three disks:
- the backing device (the cached disk), in this case a slow HDD;
- the caching device (the cache disk), here a partition of an NVMe device;
- the bcache virtual device that the application works with.
We know that request submission is slow, but for which of these devices? We will deal with that a little later.
We now know that the events are likely to cause problems. Finding out what exactly triggers them is not so easy. Let's assume it is some kind of software that is launched periodically. Let's see what software runs on the system using the execsnoop script from the same BCC toolkit. We can run it and save its output to a file,
for example, like this:
```
/usr/share/bcc/tools/execsnoop | tee ./execdump
```
We won't show the full execsnoop output here, but one line of interest looked like this:
```
sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ \([0-9]*\) C.*/\1/'
```
The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. Checking the monitoring system's configuration revealed a wrong parameter: the HBA adapter temperature was being polled every 30 seconds, which is much more often than necessary. After we changed the check interval to a longer one, the request-processing latency on this host no longer stood out compared with the other hosts.
But it was still unclear why the bcache device was slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on bcache while periodically running udevadm trigger to generate uevents.
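Grouping execsnoop output by parent PID is what points at the monitoring thread here. A minimal sketch, assuming execsnoop's default column order (PCOMM, PID, PPID, RET, ARGS) with no optional timestamp or UID columns; `parse_execsnoop_line` and `top_parents` are hypothetical helpers that could be run over the lines of the `./execdump` file.

```python
from collections import Counter


def parse_execsnoop_line(line):
    """Split one execsnoop line into its columns.

    Assumes the default column order PCOMM PID PPID RET ARGS.
    Raises ValueError for header or truncated lines.
    """
    pcomm, pid, ppid, ret, args = line.split(None, 4)
    return {"pcomm": pcomm, "pid": int(pid), "ppid": int(ppid),
            "ret": int(ret), "args": args}


def top_parents(lines, n=5):
    """Return the n parent PIDs that spawned the most processes."""
    counts = Counter()
    for line in lines:
        try:
            counts[parse_execsnoop_line(line)["ppid"]] += 1
        except ValueError:
            continue  # header line or a line with too few columns
    return counts.most_common(n)
```

A parent that keeps spawning short-lived `sh` processes every few seconds floats to the top of this list.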
Writing a BCC-based tool
Let's try to write a simple utility that traces and prints the slowest generic_make_request() calls. We are also interested in the name of the disk for which the function was called.
The plan is simple:
- Register a kprobe on generic_make_request():
  - save the disk name, obtained from the function argument, in memory;
  - save the timestamp.
- Register a kretprobe on the return from generic_make_request():
  - get the current timestamp;
  - look up the saved timestamp and compare it with the current one;
  - if the result exceeds the specified threshold, look up the saved disk name and print it to the terminal.
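The plan can be modelled in plain Python before touching BPF. In this sketch (hypothetical class and method names), `on_entry` stands in for the kprobe handler, `on_return` for the kretprobe handler, and a dict keyed by thread plays the role of the BPF hash table. Timestamps are in nanoseconds and the threshold is in microseconds, as in the BPF code that follows.

```python
class SlowCallTracer:
    """Pure-Python model of the kprobe/kretprobe plan (no BPF involved)."""

    def __init__(self, min_us):
        self.min_us = min_us
        self.pending = {}  # key -> (entry timestamp in ns, disk name)
        self.slow = []     # (disk name, latency in us) above the threshold

    def on_entry(self, key, ts_ns, disk):
        # kprobe side: remember when the call started and for which disk
        self.pending[key] = (ts_ns, disk)

    def on_return(self, key, ts_ns):
        # kretprobe side: match the return with its entry and drop the entry
        entry = self.pending.pop(key, None)
        if entry is None:
            return  # return without a recorded entry: ignore it
        start_ns, disk = entry
        lat_us = (ts_ns - start_ns) // 1000
        if lat_us > self.min_us:
            self.slow.append((disk, lat_us))
```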
Kprobes and kretprobes use a breakpoint mechanism to change function code on the fly. You can read
The eBPF program text inside the Python script looks like this:
```python
bpf_text = """ /* the BPF program code goes here */ """
```
To exchange data between functions, eBPF programs use
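```c
#include <uapi/linux/ptrace.h>  /* struct pt_regs */
#include <linux/sched.h>        /* TASK_COMM_LEN */
#include <linux/blkdev.h>       /* struct bio, struct gendisk, DISK_NAME_LEN */

struct data_t {
    u64 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
    u64 lat;
    char disk[DISK_NAME_LEN];
};

BPF_HASH(p, u64, struct data_t);
BPF_PERF_OUTPUT(events);
```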
Here we register a hash table named p, with keys of type u64 and values of type struct data_t. The table is available in the context of our BPF program. The BPF_PERF_OUTPUT macro registers another table, named events, which is used to pass data to user space.
When measuring the delay between a function call and its return, or between calls to different functions, keep in mind that the data you collect must belong to the same context. In other words, remember that the function may be running concurrently. We could measure the latency between a call made in the context of one process and a return from that function in the context of another process, but that would most likely be useless. A good example here would be
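Why the context matters can be shown with two interleaved calls. In this toy model (hypothetical event tuples `(kind, thread_id, timestamp)`), a single shared entry timestamp attributes the wrong start time to thread 1, while keying entries by thread, as the BPF hash table does, keeps each entry/return pair together.

```python
def latencies_global(events):
    """Wrong: one shared 'last entry' timestamp for all threads."""
    last_entry = None
    out = []
    for kind, tid, ts in events:
        if kind == "entry":
            last_entry = ts
        elif last_entry is not None:
            out.append((tid, ts - last_entry))
    return out


def latencies_per_thread(events):
    """Right: entry timestamps keyed by thread id, like the BPF hash table."""
    entries = {}
    out = []
    for kind, tid, ts in events:
        if kind == "entry":
            entries[tid] = ts
        elif tid in entries:
            out.append((tid, ts - entries.pop(tid)))
    return out


# Thread 1 enters at t=0, thread 2 enters at t=10 and returns at t=15,
# thread 1 returns at t=100.
EVENTS = [("entry", 1, 0), ("entry", 2, 10), ("return", 2, 15), ("return", 1, 100)]
```

With `EVENTS`, the shared-timestamp version reports 90 for thread 1 instead of the real 100, because thread 2's entry overwrote it.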
Next, we need to write the code that runs when the function under study is called:
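```c
void start(struct pt_regs *ctx, struct bio *bio) {
    u64 pid = bpf_get_current_pid_tgid();   /* TGID << 32 | thread PID */
    struct data_t data = {};
    u64 ts = bpf_ktime_get_ns();            /* entry timestamp, ns */
    data.pid = pid;
    data.ts = ts;
    /* copy the disk name out of the bio passed to generic_make_request() */
    bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
    p.update(&pid, &data);                  /* keep it until the return */
}
```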
Here, the first argument of the traced function (the struct bio * passed to generic_make_request()) is substituted as the second argument of start().
The following function is called on return from generic_make_request():
```c
void stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    struct data_t* data = p.lookup(&pid);   /* entry saved by start() */
    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        data->lat = (ts - data->ts)/1000;   /* ns -> us */
        if (data->lat > MIN_US) {           /* placeholder, replaced from Python */
            FACTOR                          /* placeholder: optional us -> ms */
            data->pid >>= 32;               /* keep only the TGID */
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
}
```
This function is similar to the previous one: we get the process PID and the timestamp, but we do not allocate memory for a new data structure. Instead, we look up an existing structure in the hash table using the current PID as the key. If the structure is found, we get the name of the running process and add it to the structure.
The binary shift used here is needed to get the thread group ID (TGID), i.e. the PID of the main process that started the thread in whose context we are running. The bpf_get_current_pid_tgid() function we call returns both values in a single 64-bit number: the TGID in the upper 32 bits and the thread's own PID in the lower 32 bits.
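The split can be checked in plain Python. This sketch assumes the layout described above (TGID in the upper 32 bits, thread PID in the lower 32 bits); `split_pid_tgid` is a hypothetical helper, and `>> 32` is the same shift the BPF code applies to `data->pid`.

```python
def split_pid_tgid(pid_tgid):
    """Split a bpf_get_current_pid_tgid()-style value into (tgid, tid)."""
    tgid = pid_tgid >> 32          # process ID as user space sees it
    tid = pid_tgid & 0xFFFFFFFF    # kernel PID of the current thread
    return tgid, tid
```

For a thread 5815 inside process 5802, the packed value is `(5802 << 32) | 5815`, and the shift recovers 5802.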
When printing to the terminal, we are interested not in the thread but in the main process. After comparing the resulting latency with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.
In the Python script that loads this code, we need to replace MIN_US and FACTOR with the latency threshold and the time unit, which we pass as arguments:
```python
bpf_text = bpf_text.replace('MIN_US', str(min_usec))
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'data->lat /= 1000;')
    label = "msec"
else:
    bpf_text = bpf_text.replace('FACTOR', '')
    label = "usec"
```
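The snippet above relies on `min_usec` and `args.milliseconds` being defined earlier. One possible way to get them is argparse; the positional `min_usec` argument, its default of 10000 microseconds and the `-m/--milliseconds` flag are assumptions for illustration, not taken from the original script. In this sketch the threshold is always in microseconds, because `MIN_US` is compared with the latency before `FACTOR` converts it.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI; only min_usec and args.milliseconds come from the text.
    parser = argparse.ArgumentParser(
        description="Print slow generic_make_request() calls")
    parser.add_argument("min_usec", nargs="?", type=int, default=10000,
                        help="minimum latency to report, in microseconds")
    parser.add_argument("-m", "--milliseconds", action="store_true",
                        help="print latency in milliseconds")
    return parser.parse_args(argv)


def render(bpf_text, args):
    """Fill the MIN_US and FACTOR placeholders; return (text, unit label)."""
    bpf_text = bpf_text.replace("MIN_US", str(args.min_usec))
    if args.milliseconds:
        return bpf_text.replace("FACTOR", "data->lat /= 1000;"), "msec"
    return bpf_text.replace("FACTOR", ""), "usec"
```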
Now we need to prepare the BPF program by creating a BPF object from the text and attaching the probes:
```python
from bcc import BPF

b = BPF(text=bpf_text)
b.attach_kprobe(event="generic_make_request", fn_name="start")
b.attach_kretprobe(event="generic_make_request", fn_name="stop")
```
We also need to define struct data_t in our script; otherwise we will not be able to read anything:
```python
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h
DISK_NAME_LEN = 32  # linux/genhd.h

class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
```
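To see what `ct.cast(..., ct.POINTER(Data))` does with raw perf buffer bytes, we can build a fake event in user space and decode it the same way. This is only a local round-trip sketch with made-up field values (`decode_event` is a hypothetical helper); it also checks that the Python structure is 72 bytes, matching the C struct data_t layout (three 8-byte fields plus 16- and 32-byte arrays, no padding).

```python
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h
DISK_NAME_LEN = 32  # linux/genhd.h


class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]


def decode_event(raw: bytes) -> Data:
    """Decode raw bytes the way print_event() does: cast a pointer to Data."""
    buf = (ct.c_char * len(raw)).from_buffer_copy(raw)  # stand-in for the perf buffer
    ptr = ct.cast(buf, ct.POINTER(Data))
    # Copy the struct out so the result does not depend on buf staying alive.
    return Data.from_buffer_copy(bytes(ptr.contents))
```

Char-array fields come back as `bytes` cut at the first NUL, which is why the printing code has to decode `comm` and `disk` before formatting them under Python 3.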
The last step is to print the data to the terminal:
```python
def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    time_s = (float(event.ts - start)) / 1000000000
    # comm and disk arrive as bytes; decode them so Python 3 prints plain text
    print("%-18.9f %-16s %-6d %-1s %s %s" % (time_s, event.comm.decode('utf-8', 'replace'),
          event.pid, event.lat, label, event.disk.decode('utf-8', 'replace')))

b["events"].open_perf_buffer(print_event)
# format output
start = 0
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
```
The script itself is available at
Finally! Now we can see that what looked like a stalling bcache device is actually a stalling generic_make_request() call for the cached disk.
Digging into the kernel
What exactly slows down during request submission? We can see that the delay happens even before request accounting starts, i.e. accounting of that particular request for later statistics output (/proc/diskstats or iostat) has not yet begun. This can easily be verified by running iostat while reproducing the problem, or
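For comparison, the figures iostat reports are derived from counters in /proc/diskstats, and time spent before accounting starts never reaches them. A minimal sketch of computing an average write latency from two samples, roughly the way iostat derives its write await value; it assumes the classic field order (0-based columns 7 and 10 are writes completed and milliseconds spent writing), and the helper names are hypothetical.

```python
def parse_diskstats(text):
    """Map device name -> (writes completed, milliseconds spent writing)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # not a full diskstats line
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def avg_write_latency_ms(before, after, dev):
    """Average time per completed write between two samples."""
    w0, ms0 = before[dev]
    w1, ms1 = after[dev]
    done = w1 - w0
    return (ms1 - ms0) / done if done else 0.0
```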
If we look at generic_make_request(), we will see that two more functions are called before request accounting starts. The first, generic_make_request_checks(), checks that the request is valid with respect to the disk settings. The second is blk_queue_enter(), which contains this wait_event_interruptible() call:
```c
ret = wait_event_interruptible(q->mq_freeze_wq,
        (atomic_read(&q->mq_freeze_depth) == 0 &&
         (preempt || !blk_queue_preempt_only(q))) ||
        blk_queue_dying(q));
```
Here the kernel waits for the queue to be unfrozen. Let's measure the latency of blk_queue_enter():
```
~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.

     msecs               : count     distribution
         0 -> 1          : 341      |****************************************|

     msecs               : count     distribution
         0 -> 1          : 316      |****************************************|

     msecs               : count     distribution
         0 -> 1          : 255      |****************************************|
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |                                        |
```
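funclatency groups durations into power-of-two buckets, which is why the single slow call appears as one sample in the 8 -> 15 ms row. A small sketch of that bucketing in the style of its output; it is an illustration, not BCC's own histogram code.

```python
def log2_bucket(value):
    """Return the (low, high) range of the power-of-two bucket for value."""
    if value <= 1:
        return (0, 1)
    k = value.bit_length() - 1  # floor(log2(value))
    return (1 << k, (1 << (k + 1)) - 1)


def log2_hist(values):
    """Count values per bucket, ordered by bucket."""
    hist = {}
    for v in values:
        b = log2_bucket(v)
        hist[b] = hist.get(b, 0) + 1
    return dict(sorted(hist.items()))
```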
It looks like we are close to the solution. The functions used to freeze/unfreeze the queue are
The time spent waiting for this queue to drain corresponds to the disk latency, since the kernel waits for all queued operations to complete. Once the queue is empty, the settings changes are applied. After that, the kernel calls
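The freeze/unfreeze protocol can be illustrated with a toy model built on `threading.Condition`. This is an analogy, not kernel code: `freeze_depth` mirrors `q->mq_freeze_depth`, `enter()` waits like the `wait_event_interruptible()` call in blk_queue_enter(), and `freeze()` blocks until all in-flight requests finish, which is why its wait grows with disk latency. Class and method names are hypothetical.

```python
import threading


class ToyRequestQueue:
    def __init__(self):
        self._cond = threading.Condition()
        self.freeze_depth = 0  # > 0 means new requests must wait
        self.inflight = 0      # requests that entered and have not finished

    def enter(self, timeout=None):
        """Wait until the queue is not frozen, then count a new request."""
        with self._cond:
            ok = self._cond.wait_for(lambda: self.freeze_depth == 0, timeout)
            if ok:
                self.inflight += 1
            return ok

    def exit(self):
        """A request finished; wake anyone waiting for the queue to drain."""
        with self._cond:
            self.inflight -= 1
            self._cond.notify_all()

    def freeze(self):
        """Block new requests, then wait until in-flight ones complete."""
        with self._cond:
            self.freeze_depth += 1
            self._cond.wait_for(lambda: self.inflight == 0)

    def unfreeze(self):
        """Drop one freeze level and let waiting requests in."""
        with self._cond:
            self.freeze_depth -= 1
            self._cond.notify_all()
```

While a freeze is pending, every new request stalls in `enter()` for as long as the slowest in-flight request takes, even if the settings change itself is a no-op.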
Now we know enough to fix the situation. The udevadm trigger command causes the settings for block devices to be applied. These settings are described in udev rules. We can find out which settings freeze the queue by trying to change them through sysfs or by reading the kernel source code. We can also try the BCC trace utility, which prints kernel and user stacks for every call to blk_freeze_queue:
```
~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID     TID     COMM            FUNC
3809642 3809642 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        elevator_switch+0x29 [kernel]
        elv_iosched_store+0x197 [kernel]
        queue_attr_store+0x5c [kernel]
        sysfs_kf_write+0x3c [kernel]
        kernfs_fop_write+0x125 [kernel]
        __vfs_write+0x1b [kernel]
        vfs_write+0xb8 [kernel]
        sys_write+0x55 [kernel]
        do_syscall_64+0x73 [kernel]
        entry_SYSCALL_64_after_hwframe+0x3d [kernel]
        __write_nocancel+0x7 [libc-2.23.so]
        [unknown]

3809631 3809631 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        queue_requests_store+0xb6 [kernel]
        queue_attr_store+0x5c [kernel]
        sysfs_kf_write+0x3c [kernel]
        kernfs_fop_write+0x125 [kernel]
        __vfs_write+0x1b [kernel]
        vfs_write+0xb8 [kernel]
        sys_write+0x55 [kernel]
        do_syscall_64+0x73 [kernel]
        entry_SYSCALL_64_after_hwframe+0x3d [kernel]
        __write_nocancel+0x7 [libc-2.23.so]
        [unknown]
```
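The stacks can also be read mechanically. This sketch maps the sysfs store handlers seen in the trace output to the queue attributes they serve; the mapping (`elv_iosched_store` for `queue/scheduler`, `queue_requests_store` for `queue/nr_requests`) is our reading of the block layer sources and should be checked against the running kernel version.

```python
# Assumed mapping: sysfs store handler -> block queue attribute it implements.
STORE_HANDLER_TO_ATTR = {
    "elv_iosched_store": "queue/scheduler",
    "queue_requests_store": "queue/nr_requests",
}


def attrs_that_froze(trace_lines):
    """List the queue attributes whose writes led to blk_freeze_queue."""
    found = []
    for line in trace_lines:
        frame = line.strip().split("+", 1)[0]  # "elv_iosched_store+0x197 [kernel]"
        attr = STORE_HANDLER_TO_ATTR.get(frame)
        if attr and attr not in found:
            found.append(attr)
    return found
```

Those attribute names are then what to look for in the udev rules that udevadm trigger re-applies.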
Udev rules change rarely, and usually in a controlled way. So we see that even applying values that are already set causes a spike in the latency of passing requests from the application to the disk. Of course, generating udev events when the disk configuration has not changed (for example, no device is being attached or detached) is not good practice. Still, we can help the kernel avoid unnecessary work and not freeze the request queue when there is no need to.
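On the user-space side, a related idea is to skip writes that would not change anything. This hypothetical helper reads a sysfs-style `scheduler` file (where the active scheduler is shown in brackets, e.g. `noop deadline [cfq]`) and writes the new value only when it differs. It is a sketch of avoiding redundant writes, not the kernel-side change discussed above.

```python
def current_scheduler(text):
    """Return the active scheduler: 'noop deadline [cfq]' -> 'cfq'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip()  # file without brackets: treat the content as the value


def set_scheduler_if_changed(path, wanted):
    """Write `wanted` to the scheduler file only if it is not already active.

    Returns True if a write happened, False if it was skipped.
    """
    with open(path) as f:
        if current_scheduler(f.read()) == wanted:
            return False
    with open(path, "w") as f:
        f.write(wanted)
    return True
```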
Conclusion
eBPF is a very flexible and powerful tool. In this article we looked at one practical case and showed a small part of what is possible. If you are interested in developing BCC utilities, it is worth looking at
There are other interesting eBPF-based debugging and profiling tools. One of them is
Source: www.habr.com