Linux has many tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.
A few years ago eBPF appeared, which makes it possible to trace the kernel and user applications with low overhead.
There are already many utilities that use eBPF. In this article we look at how to write your own profiling utility based on the BCC library.
Ceph is slow
A new host was added to the Ceph cluster. After migrating some data to it, we noticed that it processed write requests much more slowly than the other servers.
Unlike the other platforms, this host used bcache and a new Linux 4.15 kernel. This was the first time a host with this configuration had been used here. At that point it was clear that the root of the problem could, in theory, be anything.
Investigating the host
First, let's look at what is happening inside the ceph-osd process by profiling it.
The resulting picture shows that fdatasync() spent a lot of time submitting requests in generic_make_request(). This means the cause of the problem is most likely outside the OSD daemon itself: it is either the kernel or the disks. The iostat output showed long delays in request processing on the bcache disks.
While checking the host, we found that the systemd-udevd daemon was consuming a lot of CPU time (about 20% on several cores). This is odd behaviour, so we need to find out why. Since systemd-udevd works with uevents, we decided to watch them with udevadm monitor. It turned out that a large number of change events were being generated for every block device in the system. This is quite unusual, so we need to find out what generates all these events.
Using the BCC toolkit
As we have already found, the kernel (and the Ceph daemon, in its system calls) spends a lot of time in generic_make_request(). Let's measure how fast this function is.
Normally this function is quick: all it does is pass the request to the device driver's queue.
Bcache is a complex device that actually consists of three devices:
- the backing device (the disk being cached), in our case a slow HDD;
- the caching device (the cache disk), here a partition of an NVMe device;
- the bcache virtual device, which the application works with.
We know that submitting requests is slow, but for which of these devices? We will come back to this a little later.
We now know that uevents are most likely causing the problem. Finding out exactly what generates them is not so easy. Let's assume it is some piece of software that is launched periodically. To see what is being executed on the system, we use the execsnoop script from the same BCC toolkit.
For example:
/usr/share/bcc/tools/execsnoop | tee ./execdump
We will not show the full execsnoop output here, but one interesting line looked like this:
sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ \([0-9]*\) C.*/\1/'
The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. Checking the monitoring configuration revealed an incorrect parameter: the temperature of the HBA adapter was being read every 30 seconds, which is far more often than necessary. After we changed the check interval to a longer one, request latency on this host no longer stood out from the other hosts.
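To make the column layout concrete, here is a small sketch that splits the execsnoop line shown above into fields. It assumes the default column order PCOMM, PID, PPID, RET, ARGS and a command name without spaces; the helper parse_execsnoop_line is a name made up for this illustration and is not part of BCC.

```python
from typing import NamedTuple


class ExecEvent(NamedTuple):
    comm: str
    pid: int
    ppid: int
    ret: int
    args: str


def parse_execsnoop_line(line: str) -> ExecEvent:
    """Split one execsnoop output line; ARGS keeps its internal spaces."""
    comm, pid, ppid, ret, args = line.split(None, 4)
    return ExecEvent(comm, int(pid), int(ppid), int(ret), args)


line = "sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature"
event = parse_execsnoop_line(line)
print(event.ppid)  # the parent PID from the third column
```

Grouping such events by ppid is a quick way to see which parent keeps spawning the same command.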
However, it is still unclear why the bcache device was so slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on bcache while periodically running udevadm trigger to generate uevents.
Writing BCC-based tools
Let's write a simple utility that traces and prints the slowest calls to generic_make_request(). We are also interested in the name of the disk for which the function was called.
The plan is simple:
- Register a kprobe on generic_make_request():
  - Save the disk name, which is available through the function argument, in memory.
  - Save the timestamp.
- Register a kretprobe for the return from generic_make_request():
  - Get the current timestamp.
  - Look up the saved timestamp and compare it with the current one.
  - If the result is greater than the specified threshold, look up the saved disk name and print it to the terminal.
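The plan can be modelled in plain Python before touching BPF. The sketch below is only a simulation: on_entry and on_return stand in for the kprobe and kretprobe handlers, a dictionary plays the role of the BPF hash table, and timestamps are passed in by hand. All names and the threshold are illustrative assumptions.

```python
THRESHOLD_US = 1000  # report calls slower than this (illustrative value)

inflight = {}    # key: thread id, value: (disk name, entry timestamp in ns)
slow_calls = []  # what the real tool would print to the terminal


def on_entry(tid: int, disk: str, now_ns: int) -> None:
    """Entry probe: remember the disk name and the timestamp."""
    inflight[tid] = (disk, now_ns)


def on_return(tid: int, now_ns: int) -> None:
    """Return probe: compute the latency and report it if it is too high."""
    saved = inflight.pop(tid, None)
    if saved is None:  # a return without a recorded entry is ignored
        return
    disk, start_ns = saved
    latency_us = (now_ns - start_ns) // 1000
    if latency_us > THRESHOLD_US:
        slow_calls.append((disk, latency_us))


on_entry(tid=101, disk="bcache0", now_ns=0)
on_entry(tid=102, disk="sdb", now_ns=500_000)
on_return(tid=101, now_ns=200_000)    # 200 us: below the threshold
on_return(tid=102, now_ns=4_500_000)  # 4000 us: reported
print(slow_calls)
```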
Kprobes and kretprobes use a breakpoint mechanism to modify function code on the fly. The kernel documentation describes them in more detail.
The eBPF program text inside the Python script looks like this:
bpf_text = """ # Here will be the bpf program code """
To exchange data between functions, eBPF programs use hash tables:
struct data_t {
    u64 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
    u64 lat;
    char disk[DISK_NAME_LEN];
};

BPF_HASH(p, u64, struct data_t);
BPF_PERF_OUTPUT(events);
Here we register a hash table named p, with a key of type u64 and a value of type struct data_t. The table is available in the context of our BPF program. The BPF_PERF_OUTPUT macro registers another table, called events, which is used to pass data to user space.
When measuring the delay between a function call and its return, or between calls to different functions, keep in mind that the data you collect must belong to the same context. In other words, remember that the function may run in parallel in several places. It is possible to measure the delay between a call made in the context of one process and a return in the context of another, but that is most likely useless.
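A short simulation shows why the choice of key matters. Two threads of one hypothetical process call the function at overlapping times. Keyed by process ID alone, the second entry overwrites the first; keyed by thread as well, both calls are measured correctly. The IDs and timestamps are invented for the example.

```python
def measure(events, key_of):
    """Replay (kind, pid, tid, ts) events and return the measured latencies."""
    started, latencies = {}, []
    for kind, pid, tid, ts in events:
        key = key_of(pid, tid)
        if kind == "enter":
            started[key] = ts
        elif key in started:
            latencies.append(ts - started.pop(key))
    return latencies


events = [
    ("enter", 500, 501, 0),   # thread 501 enters at t=0
    ("enter", 500, 502, 90),  # thread 502 of the same process enters at t=90
    ("exit", 500, 501, 100),  # thread 501 really took 100
    ("exit", 500, 502, 110),  # thread 502 really took 20
]

by_process = measure(events, lambda pid, tid: pid)
by_thread = measure(events, lambda pid, tid: (pid, tid))
print(by_process)  # one wrong value, and one call is lost
print(by_thread)   # both calls measured
```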
Next, we need to write the code that runs when the function under investigation is called:
void start(struct pt_regs *ctx, struct bio *bio) {
    /* upper 32 bits: thread group ID, lower 32 bits: thread ID */
    u64 pid = bpf_get_current_pid_tgid();
    struct data_t data = {};
    u64 ts = bpf_ktime_get_ns();
    data.pid = pid;
    data.ts = ts;
    /* copy the disk name out of the bio passed to the traced function */
    bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
    p.update(&pid, &data);
}
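Here the first argument of the traced function is passed in as the second argument of our handler. We take the current PID value and the timestamp in nanoseconds, store them together with the disk name from the bio structure in a new struct data_t, and add that entry to the hash table.
The next function is called on return from generic_make_request():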
void stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    struct data_t* data = p.lookup(&pid);
    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        /* nanoseconds to microseconds */
        data->lat = (ts - data->ts)/1000;
        if (data->lat > MIN_US) {
            FACTOR
            /* keep only the thread group ID for printing */
            data->pid >>= 32;
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
}
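This function is similar to the previous one: we get the PID and the timestamp, but we do not allocate memory for a new data structure. Instead, we look up the existing structure in the hash table using key == current PID. If the structure is found, we get the name of the running process and add it to the structure.
The binary shift used here is needed to get the thread group ID, that is, the PID of the main process that started the thread in whose context we are working. The function we call, bpf_get_current_pid_tgid(), returns both values packed into a single 64-bit number.
When printing to the terminal we are, for now, not interested in the thread, only in the main process. After comparing the resulting delay with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.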
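The shift is easy to check in Python. bpf_get_current_pid_tgid() returns a 64-bit value whose upper 32 bits hold the thread group ID (what user space calls the PID) and whose lower 32 bits hold the thread ID. The numbers below are invented for the example, and split_pid_tgid is a helper name of our own.

```python
def split_pid_tgid(value: int):
    """Split the packed 64-bit value into (thread group ID, thread ID)."""
    tgid = value >> 32          # same operation as `data->pid >>= 32`
    tid = value & 0xFFFFFFFF    # the part the script throws away
    return tgid, tid


packed = (5802 << 32) | 5817    # hypothetical process 5802, thread 5817
tgid, tid = split_pid_tgid(packed)
print(tgid, tid)
```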
In the Python script that loads this code, we need to replace MIN_US and FACTOR with the delay threshold and the time unit conversion, both of which are passed in through arguments:
bpf_text = bpf_text.replace('MIN_US', str(min_usec))
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'data->lat /= 1000;')
    label = "msec"
else:
    bpf_text = bpf_text.replace('FACTOR', '')
    label = "usec"
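The snippet above relies on args and min_usec, which come from argument parsing that is not shown in the article. A minimal sketch of how they might be defined follows. The option names, the default value and the helper to_min_usec are assumptions made for illustration and are not necessarily those of the original script; the substitution is demonstrated on a one-line stand-in for the BPF text.

```python
import argparse

parser = argparse.ArgumentParser(
    description="Trace slow generic_make_request() calls")
parser.add_argument("min_lat", nargs="?", type=int, default=1,
                    help="minimum latency to report (hypothetical option)")
parser.add_argument("-m", "--milliseconds", action="store_true",
                    help="min_lat and the output are in milliseconds")


def to_min_usec(args) -> int:
    """The BPF code compares microseconds, so convert if needed."""
    return args.min_lat * 1000 if args.milliseconds else args.min_lat


args = parser.parse_args(["-m", "10"])  # as if run with: -m 10
min_usec = to_min_usec(args)

template = "if (data->lat > MIN_US) { FACTOR }"
factor = "data->lat /= 1000;" if args.milliseconds else ""
rendered = template.replace("MIN_US", str(min_usec)).replace("FACTOR", factor)
print(rendered)
```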
Now we need to prepare the BPF program and register the probes:
from bcc import BPF

b = BPF(text=bpf_text)
b.attach_kprobe(event="generic_make_request", fn_name="start")
b.attach_kretprobe(event="generic_make_request", fn_name="stop")
We also have to define struct data_t in our script, otherwise we will not be able to read anything:
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h
DISK_NAME_LEN = 32  # linux/genhd.h

class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
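One way to convince yourself that the ctypes layout matches the C structure is to pack a record by hand and decode it. The sketch below redefines the same Data class so that it is self-contained. The sample values are invented, and the 72-byte size assumes no padding between fields, which holds for this field order.

```python
import ctypes as ct
import struct

TASK_COMM_LEN = 16
DISK_NAME_LEN = 32


class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]


# pid, ts, comm, lat, disk in native byte order ("=" disables padding)
raw = struct.pack("=QQ16sQ32s", 5802, 123456789, b"ceph-osd", 4000, b"sdb")
event = Data.from_buffer_copy(raw)
print(ct.sizeof(Data), event.pid, event.comm.decode(), event.lat,
      event.disk.decode())
```

If the field order or sizes in the Python class drift from the C definition, the decoded values turn into garbage, which is usually the first symptom to look for.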
The last step is to print the data to the terminal:
def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    time_s = float(event.ts - start) / 1000000000
    # comm and disk arrive as bytes; decode them for printing
    print("%-18.9f %-16s %-6d %-1s %s %s" % (
        time_s, event.comm.decode("utf-8", "replace"), event.pid,
        event.lat, label, event.disk.decode("utf-8", "replace")))

b["events"].open_perf_buffer(print_event)

# format output
start = 0
while 1:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
The full script has been published separately.
Finally! We can now see that what looked like a stalling bcache device is actually a stalling call to generic_make_request() for the disk being cached.
Digging into the kernel
What exactly is slow while a request is being submitted? We see that the delay occurs even before request accounting starts. In other words, processing of the particular request, for which statistics would later be reported (/proc/diskstats or iostat), has not yet begun. This is easy to verify by running iostat while reproducing the problem.
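A toy timeline makes this blind spot visible. The numbers are invented and RequestTimeline is a name of our own; the point is only that a tool which starts its clock at request accounting cannot see time spent waiting before that moment.

```python
from dataclasses import dataclass


@dataclass
class RequestTimeline:
    submitted_ms: float         # the application calls into the block layer
    accounting_start_ms: float  # statistics start counting here
    completed_ms: float         # the device reports completion

    @property
    def latency_seen_by_stats(self) -> float:
        return self.completed_ms - self.accounting_start_ms

    @property
    def latency_seen_by_app(self) -> float:
        return self.completed_ms - self.submitted_ms


# 12 ms are lost before accounting starts, 0.5 ms on the device itself
req = RequestTimeline(submitted_ms=0.0, accounting_start_ms=12.0,
                      completed_ms=12.5)
print(req.latency_seen_by_stats, req.latency_seen_by_app)
```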
If we look at generic_make_request(), we see that two more functions are called before request accounting starts. The first, generic_make_request_checks(), checks the validity of the request against the disk settings. The second, blk_queue_enter(), contains a wait:
ret = wait_event_interruptible(q->mq_freeze_wq,
        (atomic_read(&q->mq_freeze_depth) == 0 &&
         (preempt || !blk_queue_preempt_only(q))) ||
        blk_queue_dying(q));
Here the kernel waits for the queue to be unfrozen. Let's measure the latency of blk_queue_enter():
~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 341 |****************************************|
msecs : count distribution
0 -> 1 : 316 |****************************************|
msecs : count distribution
0 -> 1 : 255 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 | |
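The histogram groups samples into power-of-two ranges. The sketch below reproduces that kind of bucketing in plain Python to help read the output. It is an illustration rather than BCC's implementation, the sample latencies are invented to resemble the last interval above, and bucket_bounds is a helper name of our own.

```python
from collections import Counter


def bucket_bounds(value: int):
    """Return the (low, high) power-of-two range that contains value."""
    if value <= 1:
        return (0, 1)
    high = (1 << value.bit_length()) - 1
    low = (high + 1) // 2
    return (low, high)


samples_ms = [0] * 255 + [9]  # 255 fast calls and a single slow one
hist = Counter(bucket_bounds(v) for v in samples_ms)
for (low, high), count in sorted(hist.items()):
    print(f"{low:>3} -> {high:<3} : {count}")
```

A single sample in the 8 -> 15 range is easy to miss among hundreds of fast calls, yet it is exactly the outlier we are hunting.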
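It looks like we are getting close to a solution. The kernel has functions for freezing and unfreezing a queue, and they are used when queue settings are changed.
The time it takes to drain the queue is equal to the disk latency, because the kernel waits for all operations already in the queue to complete. Once the queue is empty, the settings change is applied. After that, the queue is unfrozen.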
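We now know enough to fix the situation. The udevadm trigger command causes the settings for block devices to be applied. These settings are described in udev rules. We can find out which settings freeze the queue by trying to change them through sysfs or by looking at the kernel source code. We can also use the BCC trace utility to print the kernel and user-space stacks for each call to blk_freeze_queue: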
~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID TID COMM FUNC
3809642 3809642 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
elevator_switch+0x29 [kernel]
elv_iosched_store+0x197 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
3809631 3809631 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
queue_requests_store+0xb6 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
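With many stacks it helps to extract only the frames that sit between blk_freeze_queue and the generic sysfs handler queue_attr_store, since those name the setting being written. The sketch below does this for the two stacks shown above, with offsets and module names stripped by hand; freezing_paths is a helper name of our own.

```python
def freezing_paths(stacks):
    """For each stack, keep the frames between the freeze and the sysfs entry."""
    paths = []
    for frames in stacks:
        start = frames.index("blk_freeze_queue") + 1
        end = frames.index("queue_attr_store")
        paths.append(frames[start:end])
    return paths


stacks = [
    ["blk_freeze_queue", "elevator_switch", "elv_iosched_store",
     "queue_attr_store", "sysfs_kf_write"],
    ["blk_freeze_queue", "queue_requests_store",
     "queue_attr_store", "sysfs_kf_write"],
]
paths = freezing_paths(stacks)
print(paths)
```

In these two traces the writes go through the I/O scheduler handler and the request count handler, which points at the corresponding queue attributes in sysfs.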
Udev rules change very rarely, and usually in a controlled way. So we see that even applying values that are already set causes a spike in the delay of passing requests from the application to the disk. Of course, generating udev events when the disk configuration has not changed (for example, no device has been attached or detached) is not good practice. However, we can help the kernel avoid unnecessary work and not freeze the request queue when there is no need to.
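The idea of not freezing the queue when nothing changes can be illustrated with a toy model. This is not kernel code: ToyQueue, its fields and set_nr_requests are hypothetical names, and freezing is reduced to a counter.

```python
class ToyQueue:
    def __init__(self, nr_requests: int) -> None:
        self.nr_requests = nr_requests
        self.freeze_count = 0  # how many times the queue was frozen

    def _freeze_apply_unfreeze(self, value: int) -> None:
        self.freeze_count += 1  # stands in for draining in-flight I/O
        self.nr_requests = value

    def set_nr_requests(self, value: int, skip_if_unchanged: bool) -> None:
        if skip_if_unchanged and value == self.nr_requests:
            return  # nothing to do, so no freeze
        self._freeze_apply_unfreeze(value)


naive, careful = ToyQueue(128), ToyQueue(128)
for _ in range(3):  # the same value is re-applied three times
    naive.set_nr_requests(128, skip_if_unchanged=False)
    careful.set_nr_requests(128, skip_if_unchanged=True)
careful.set_nr_requests(256, skip_if_unchanged=True)  # a real change still freezes
print(naive.freeze_count, careful.freeze_count)
```

In this model every avoided freeze is one fewer window in which submitting applications would have had to wait.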
Conclusion
eBPF is a very flexible and powerful tool. In this article we looked at one practical case and showed a small part of what it can do. If you are interested in developing BCC utilities, the toolkit is worth a closer look.
There are also other interesting debugging and profiling tools based on eBPF.
Source: habr.com