Linux has many tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.
A couple of years ago
There are now many tools that use eBPF, and in this article we will look at how to write your own profiling utility based on the library.
Ceph Is Slow
A new host was added to the Ceph cluster. After migrating part of the data to it, we noticed that it processed write requests much more slowly than the other servers.
Unlike the other platforms, this host used bcache and the new Linux 4.15 kernel. This was the first time a host with this configuration had been used here. At that point it was clear that the root of the problem could, in theory, be anything.
Investigating the Host
Let's start by looking at what happens inside the ceph-osd process. For this we will use
The picture shows that fdatasync() spent a lot of time submitting requests in generic_make_request(). This means the cause of the problem most likely lies outside the osd daemon itself. It could be the kernel or the disks. The iostat output showed high request latency for the bcache disks.
While checking the host, we found that the systemd-udevd daemon was consuming a lot of CPU time: about 20% on several cores. This is strange behavior, so we need to find out why. Since systemd-udevd works with uevents, we decided to look at them with udevadm monitor. It turned out that a large number of change events were being generated for every block device in the system. This is quite unusual, so we need to look at what generates all these events.
Using the BCC Toolkit
As we have already found, the kernel (and the ceph daemon inside its system call) spends a lot of time in generic_make_request(). Let's try to measure how fast this function runs.
This function usually runs quickly. All it does is pass the request to the device driver's queue.
Bcache is a complex device that actually consists of three disks:
- the backing device (the cached disk), in this case a slow HDD;
- the caching device (the caching disk), here a partition of an NVMe device;
- the bcache virtual device that the application works with.
We know that request submission is slow, but for which of these devices? We will deal with this a little later.
We now know that uevents are likely to cause problems. Finding out what exactly generates them is not so easy. Let's assume it is some software that is launched periodically. Let's see what software runs on the system using the execsnoop script from the same
For example, like this:
/usr/share/bcc/tools/execsnoop | tee ./execdump
We will not show the full execsnoop output here, but one line of interest looked like this:
sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ \([0-9]*\) C.*/\1/'
The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. Checking the monitoring configuration revealed incorrect parameters: the temperature of the HBA adapter was being read every 30 seconds, which is far more often than necessary. After changing the check interval to a longer one, we found that request latency on this host no longer stood out compared with the other hosts.
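To make the column layout concrete, here is a minimal Python sketch that splits one line of saved execsnoop output into fields. It assumes the default column order (PCOMM, PID, PPID, RET, ARGS) and a process name without spaces; the function name parse_execsnoop_line is hypothetical, and the sample line is shortened.

```python
def parse_execsnoop_line(line):
    """Split one execsnoop output line into its five default columns."""
    # ARGS may contain spaces, so split at most four times.
    pcomm, pid, ppid, ret, args = line.split(None, 4)
    return {"pcomm": pcomm, "pid": int(pid), "ppid": int(ppid),
            "ret": int(ret), "args": args}


sample = "sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature"
record = parse_execsnoop_line(sample)
print(record["ppid"], record["args"])
```

Grouping such records by ppid is one way to see which parent process launches a command most often.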
But it is still unclear why the bcache device was so slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on top of bcache while periodically running udevadm trigger to generate uevents.
Writing BCC-Based Tools
Let's try to write a simple utility that traces and displays the slowest generic_make_request() calls. We are also interested in the name of the disk for which the function was called.
The plan is simple:
- Register a kprobe on generic_make_request():
  - save the disk name, which is available through the function argument;
  - save the timestamp.
- Register a kretprobe on the return from generic_make_request():
  - get the current timestamp;
  - look up the saved timestamp and compare it with the current one;
  - if the result is greater than the specified threshold, look up the saved disk name and print it to the terminal.
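The plan can be modeled in plain Python before any eBPF is involved. The sketch below is only an illustration under stated assumptions: the names on_entry, on_return and THRESHOLD_US are hypothetical, timestamps are passed in explicitly instead of being read from a clock, and a dict stands in for the BPF hash table.

```python
THRESHOLD_US = 1000  # hypothetical threshold: report calls slower than 1 ms

in_progress = {}  # key: thread id, value: (disk name, entry timestamp in ns)
slow_calls = []   # reports that a real tool would print to the terminal


def on_entry(tid, disk, now_ns):
    """Entry probe: remember the disk name and the timestamp."""
    in_progress[tid] = (disk, now_ns)


def on_return(tid, now_ns):
    """Return probe: compute the latency and report it if it is too high."""
    saved = in_progress.pop(tid, None)
    if saved is None:
        return  # the entry was not seen, so there is nothing to compare
    disk, start_ns = saved
    latency_us = (now_ns - start_ns) // 1000
    if latency_us > THRESHOLD_US:
        slow_calls.append((tid, disk, latency_us))


# Two threads are inside the function at once; the key keeps them apart.
on_entry(1, "sdb", 0)
on_entry(2, "nvme0n1", 100_000)
on_return(2, 200_000)    # 100 us, below the threshold
on_return(1, 5_000_000)  # 5000 us, above the threshold
print(slow_calls)
```

Keying the saved state by thread is what keeps overlapping calls from different threads from being mixed up.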
Kprobes and kretprobes use a breakpoint mechanism to change function code on the fly. You can read
The eBPF program text inside the Python script looks like this:
bpf_text = """
# Here will be the bpf program code
"""
To exchange data between functions, eBPF programs use
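struct data_t {
    u64 pid;                    // value of bpf_get_current_pid_tgid()
    u64 ts;                     // entry timestamp, ns
    char comm[TASK_COMM_LEN];   // process name
    u64 lat;                    // measured latency
    char disk[DISK_NAME_LEN];   // disk name taken from the bio
};

BPF_HASH(p, u64, struct data_t);
BPF_PERF_OUTPUT(events);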
Here we register a hash table named p, with a key of type u64 and a value of type struct data_t. The table will be available in the context of our BPF program. The BPF_PERF_OUTPUT macro registers another table named events, which is used for
When measuring delays between a function call and its return, or between calls to different functions, keep in mind that the collected data must belong to the same context. In other words, you need to remember that functions can run in parallel. We could measure the delay between a function call in the context of one process and the return from that function in the context of another process, but that would most likely be useless. A good example here is
Next, we need to write the code that will run when the function under study is called:
void start(struct pt_regs *ctx, struct bio *bio) {
    u64 pid = bpf_get_current_pid_tgid();   // hash table key
    struct data_t data = {};
    u64 ts = bpf_ktime_get_ns();            // entry timestamp, ns

    data.pid = pid;
    data.ts = ts;
    // Copy the disk name from the bio passed to generic_make_request().
    bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
    p.update(&pid, &data);
}
Here the first argument of the called function is substituted as the second argument
The following function will be called on return from generic_make_request():
void stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    struct data_t* data = p.lookup(&pid);

    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        data->lat = (ts - data->ts)/1000;   // ns -> us
        if (data->lat > MIN_US) {
            FACTOR
            data->pid >>= 32;               // keep only the upper 32 bits
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
}
This function is similar to the previous one: we get the PID of the process and the timestamp, but we do not allocate memory for a new data structure. Instead, we search the hash table for an existing structure using the key == current PID. If the structure is found, we get the name of the running process and add it to the structure.
The binary shift used here is needed to obtain the thread GID, i.e. the PID of the main process that started the thread in whose context we are working. The function we call
When printing to the terminal, we are not interested in the thread but in the main process. After comparing the measured latency with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.
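The shift is easy to check outside the kernel. The following Python sketch splits a 64-bit value of the kind returned by bpf_get_current_pid_tgid(), where the thread group ID sits in the upper 32 bits and the thread ID in the lower 32 bits. The helper name split_pid_tgid and the sample thread ID 5810 are made up for illustration.

```python
def split_pid_tgid(pid_tgid):
    """Split the combined 64-bit value into (tgid, tid)."""
    tgid = pid_tgid >> 32        # upper 32 bits: thread group ID (process PID)
    tid = pid_tgid & 0xFFFFFFFF  # lower 32 bits: kernel thread ID
    return tgid, tid


value = (5802 << 32) | 5810  # hypothetical process 5802, thread 5810
print(split_pid_tgid(value))
```

This is why the full 64-bit value works as a per-thread hash key, while only the shifted part is printed.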
In the Python script that loads this code, we need to replace MIN_US and FACTOR with the latency threshold and the time unit, which we pass through arguments:
bpf_text = bpf_text.replace('MIN_US', str(min_usec))
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'data->lat /= 1000;')
    label = "msec"
else:
    bpf_text = bpf_text.replace('FACTOR', '')
    label = "usec"
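The article does not show how args and min_usec are obtained. One possible sketch with argparse follows; the option names, the positional min_us argument and its default of 1000 are assumptions made for illustration, not taken from the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a hypothetical latency tracing script."""
    parser = argparse.ArgumentParser(
        description="Trace slow generic_make_request() calls")
    parser.add_argument("-m", "--milliseconds", action="store_true",
                        help="print latency in milliseconds")
    parser.add_argument("min_us", nargs="?", type=int, default=1000,
                        help="minimum latency to report, in microseconds")
    return parser.parse_args(argv)


args = parse_args(["-m", "500"])
min_usec = args.min_us
print(args.milliseconds, min_usec)
```

With these names, the substitution code above can use args.milliseconds and min_usec unchanged.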
Now we need to prepare the BPF program via
from bcc import BPF

# Compile the program and attach both probes to generic_make_request().
b = BPF(text=bpf_text)
b.attach_kprobe(event="generic_make_request", fn_name="start")
b.attach_kretprobe(event="generic_make_request", fn_name="stop")
We also have to define struct data_t in our script, otherwise we will not be able to read anything:
import ctypes as ct

TASK_COMM_LEN = 16    # linux/sched.h
DISK_NAME_LEN = 32    # linux/genhd.h

# Field order and sizes must match struct data_t in the BPF program.
class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
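Because the Python-side structure has to mirror the C layout byte for byte, it can be useful to check it without loading any BPF code. The sketch below repeats the definition so that it runs on its own, then decodes a hand-built record. The sample values are made up, and the struct format string assumes the native layout needs no padding, which holds for these field sizes.

```python
import ctypes as ct
import struct

TASK_COMM_LEN = 16
DISK_NAME_LEN = 32


class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]


# Hand-built record in native byte order; the names are padded with NUL bytes.
raw = struct.pack("=QQ16sQ32s", 5802, 123456789, b"ceph-osd", 2500, b"bcache0")
event = Data.from_buffer_copy(raw)
print(ct.sizeof(Data), event.pid, event.comm, event.lat, event.disk)
```

If the field order or sizes drift from the C definition, the decoded values no longer line up, which is a quick way to spot the mismatch.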
The last step is to print the data to the terminal:
def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    time_s = (float(event.ts - start)) / 1000000000
    print("%-18.9f %-16s %-6d %-1s %s %s" % (time_s, event.comm, event.pid, event.lat, label, event.disk))

b["events"].open_perf_buffer(print_event)
# format output
start = 0
while 1:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
The script itself is available at
Finally! Now we can see that what looked like a stalling bcache device is actually a stalling generic_make_request() call for the cached disk.
Digging into the Kernel
What exactly is slow during request submission? We can see that the delay occurs even before request accounting starts, i.e. accounting of a specific request for later output in statistics (/proc/diskstats or iostat) has not yet begun. This is easy to verify by running iostat while reproducing the problem, or
If we look at generic_make_request(), we see that two more functions are called before request accounting starts. The first is generic_make_request_checks(), which checks that the request is valid with respect to the disk settings. The second is
ret = wait_event_interruptible(q->mq_freeze_wq,
        (atomic_read(&q->mq_freeze_depth) == 0 &&
         (preempt || !blk_queue_preempt_only(q))) ||
        blk_queue_dying(q));
In it, the kernel waits for the queue to be unfrozen. Let's measure the latency of blk_queue_enter():
~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 341 |****************************************|
msecs : count distribution
0 -> 1 : 316 |****************************************|
msecs : count distribution
0 -> 1 : 255 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 | |
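The ranges in this output are power-of-two buckets. As a rough illustration of how a latency value maps to the displayed range, here is a small Python sketch. The function log2_bucket is hypothetical and only reproduces the ranges as printed; it is not the way BCC computes them internally.

```python
def log2_bucket(value):
    """Return the (low, high) bounds of the power-of-two bucket for value."""
    if value < 2:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)  # largest power of two <= value
    return (low, 2 * low - 1)


for latency_ms in (0, 3, 5, 10):
    print(latency_ms, log2_bucket(latency_ms))
```

Under this mapping, the single outlier in the last interval was a call that took between 8 and 15 milliseconds.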
It looks like we are close to a solution. The functions used to freeze/unfreeze the queue
The time needed to empty this queue is equal to the disk latency, since the kernel waits for all queued operations to complete. Once the queue is empty, the settings changes are applied. After that, a call is made to
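The freeze logic can be pictured with a toy model in Python. This is an analogy only, not kernel code: the ToyQueue class and its method names are invented for illustration, a threading.Condition stands in for the kernel wait queue, and a counter plays the role of the freeze depth.

```python
import threading


class ToyQueue:
    """Toy model: submitters wait while the queue is frozen."""

    def __init__(self):
        self._cond = threading.Condition()
        self.freeze_depth = 0
        self.in_flight = 0

    def enter(self):
        # A submitter blocks here until nobody holds the queue frozen.
        with self._cond:
            self._cond.wait_for(lambda: self.freeze_depth == 0)
            self.in_flight += 1

    def exit(self):
        with self._cond:
            self.in_flight -= 1
            self._cond.notify_all()

    def freeze(self):
        # Stop new submissions, then wait for in-flight requests to drain.
        with self._cond:
            self.freeze_depth += 1
            self._cond.wait_for(lambda: self.in_flight == 0)

    def unfreeze(self):
        with self._cond:
            self.freeze_depth -= 1
            self._cond.notify_all()


q = ToyQueue()
q.freeze()                       # a settings change begins
entered = threading.Event()
worker = threading.Thread(target=lambda: (q.enter(), entered.set()))
worker.start()
blocked = not entered.wait(0.2)  # the submitter is still waiting
q.unfreeze()                     # the settings change is finished
worker.join()
print(blocked, entered.is_set())
```

In the model, a submitter's wait lasts as long as the freeze does, and the freeze itself cannot complete until the in-flight requests are done, which mirrors why slow disks make the delay so visible.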
Now we know enough to fix the situation. The udevadm trigger command causes the settings for a block device to be applied. These settings are described in udev rules. We can find out which settings freeze the queue by trying to change them through sysfs, or by looking at the kernel source code. We can also try the BCC utility
~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID TID COMM FUNC
3809642 3809642 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
elevator_switch+0x29 [kernel]
elv_iosched_store+0x197 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
3809631 3809631 systemd-udevd blk_freeze_queue
blk_freeze_queue+0x1 [kernel]
queue_requests_store+0xb6 [kernel]
queue_attr_store+0x5c [kernel]
sysfs_kf_write+0x3c [kernel]
kernfs_fop_write+0x125 [kernel]
__vfs_write+0x1b [kernel]
vfs_write+0xb8 [kernel]
sys_write+0x55 [kernel]
do_syscall_64+0x73 [kernel]
entry_SYSCALL_64_after_hwframe+0x3d [kernel]
__write_nocancel+0x7 [libc-2.23.so]
[unknown]
Udev rules change fairly rarely, and usually in a controlled manner. So we see that even applying values that are already set causes a spike in the delay between the application submitting a request and the request reaching the disk. Of course, generating udev events when nothing has changed in the disk configuration (for example, no device was attached or detached) is not good practice. Nevertheless, we can help the kernel avoid unnecessary work and not freeze the request queue when there is no need to.
Conclusion
eBPF is a very flexible and powerful tool. In this article we looked at one practical case and demonstrated a small part of what it can do. If you are interested in developing BCC utilities, it is worth taking a look at
There are other interesting debugging and profiling tools based on eBPF. One of them is
Source: will.com