From High Ceph Latency to Kernel Patch with eBPF/BCC


Linux has a large number of tools for debugging the kernel and applications. Most of them hurt application performance and cannot be used in production.

A couple of years ago another tool appeared: eBPF. It lets you trace the kernel and user applications with low overhead, without rebuilding programs or loading third-party modules into the kernel.

There are already many application utilities built on eBPF, and in this article we will look at how to write your own profiling utility based on the BCC Python library. The article is based on real events. We will go from the problem to the fix to show how existing utilities can be used in specific situations.

Ceph Is Slow

A new host was added to the Ceph cluster. After some of the data had been migrated to it, we noticed that it processed write requests much more slowly than the other servers did.

Unlike the other platforms, this host used bcache and the new Linux 4.15 kernel. It was the first time a host with this configuration had been used here, so at that point the root of the problem could, in theory, be anything.

Investigating the Host

Let's start by looking at what happens inside the ceph-osd process. For this we will use perf and flamescope (both are described in more detail elsewhere).

The flame graph shows that fdatasync() spends a lot of time submitting requests through generic_make_request(). This means the cause of the problem is most likely somewhere outside the osd daemon itself: either in the kernel or in the disks. The iostat output showed high latency for requests processed by the bcache disks.

While checking the host, we found that the systemd-udevd daemon was consuming a lot of CPU time: about 20% on several cores. This is strange behavior, so we needed to find out why. Since systemd-udevd works with uevents, we decided to look at them with udevadm monitor. It turned out that a large number of change events were being generated for every block device in the system. This is unusual, so we need to find out what generates all these events.
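Counting the change events per device makes "a large number of change events" concrete. The sketch below is an assumption-laden helper, not part of the original investigation: it assumes the usual `udevadm monitor` line shape (`KERNEL[timestamp] action devpath (subsystem)`), and the sample lines are illustrative rather than captured from the affected host.

```python
import re
from collections import Counter

# Assumed `udevadm monitor` line format, e.g.
# KERNEL[100.000001] change   /devices/virtual/block/bcache0 (block)
EVENT_RE = re.compile(r"^(KERNEL|UDEV)\s*\[[\d.]+\]\s+(\w+)\s+(\S+)\s+\((\w+)\)")


def count_change_events(lines, source="KERNEL"):
    """Count 'change' uevents per device path, for one event source."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line)
        if m and m.group(1) == source and m.group(2) == "change":
            counts[m.group(3)] += 1
    return counts


# Illustrative input, not real output from the affected host.
sample = [
    "KERNEL[100.000001] change   /devices/virtual/block/bcache0 (block)",
    "UDEV  [100.000900] change   /devices/virtual/block/bcache0 (block)",
    "KERNEL[130.000002] change   /devices/virtual/block/bcache0 (block)",
    "KERNEL[130.000050] change   /devices/virtual/block/loop0 (block)",
]
print(count_change_events(sample).most_common())
```

Each uevent normally shows up twice (once from the kernel, once after udev processes it), so the helper counts only one source to avoid double counting.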

Using the BCC Toolkit

As we have already found, the kernel (and the ceph daemon, inside its system call) spends a lot of time in generic_make_request(). Let's try to measure how fast this function is. BCC already has a wonderful utility for this: funclatency. We will trace the daemon by its PID, with a 1-second interval between outputs, and print the result in milliseconds.

This function usually runs quickly. All it does is pass the request to the device driver queue.

Bcache is a complex device that actually consists of three disks:

  • the backing device (the cached disk), in this case a slow HDD;
  • the caching device (the cache disk), here a partition of an NVMe device;
  • the bcache virtual device that the application works with.

We know that request submission is slow, but for which of these devices? We will deal with that a little later.

We now know that the uevents are likely to cause problems. Finding out what exactly generates them is not so easy. Let's assume it is some software that is launched periodically. We can see what gets launched on the system with the execsnoop script from the same BCC toolkit. Let's run it and send the output to a file.

For example, like this:

/usr/share/bcc/tools/execsnoop  | tee ./execdump

We won't show the full execsnoop output here, but one line of interest to us looked like this:

sh 1764905 5802 0 sudo arcconf getconfig 1 AD | grep Temperature | awk -F '[:/]' '{print $2}' | sed 's/^ ([0-9]*) C.*/1/'

The third column is the PPID (parent PID) of the process. The process with PID 5802 turned out to be one of the threads of our monitoring system. Checking the monitoring configuration revealed incorrect parameters: the HBA adapter temperature was being polled every 30 seconds, far more often than necessary. After we changed the polling interval to a longer one, request processing latency on this host no longer stood out from the other hosts.
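Rather than reading the whole execsnoop dump by eye, one can count how many processes each parent spawned; a parent that shows up constantly is a good suspect. This is a minimal sketch assuming execsnoop's default columns (PCOMM, PID, PPID, RET, ARGS). Apart from the line quoted above, the sample lines are illustrative.

```python
from collections import Counter


def top_parents(lines, n=5):
    """Count new processes per parent PID in execsnoop output.

    Assumes execsnoop's default columns: PCOMM PID PPID RET ARGS.
    """
    parents = Counter()
    for line in lines:
        fields = line.split(None, 4)
        if len(fields) < 4 or not fields[2].isdigit():
            continue  # header or malformed line
        parents[int(fields[2])] += 1
    return parents.most_common(n)


# Illustrative lines in execsnoop's format; only the PPID column matters here.
sample = [
    "PCOMM            PID     PPID    RET ARGS",
    "sh               1764905 5802      0 sudo arcconf getconfig 1 AD",
    "sh               1764960 5802      0 sudo arcconf getconfig 1 AD",
    "cron             1765001 1200      0 /usr/sbin/cron",
]
print(top_parents(sample))  # [(5802, 2), (1200, 1)]
```

Against the file produced earlier, this would be `with open("./execdump") as f: print(top_parents(f))`.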

But it was still unclear why the bcache device was so slow. We prepared a test platform with an identical configuration and tried to reproduce the problem by running fio on bcache while periodically running udevadm trigger to generate uevents.

Writing a BCC-Based Tool

Let's try to write a simple utility that traces and prints the slowest generic_make_request() calls. We also want the name of the disk for which the function was called.

The plan is simple:

  • Register a kprobe on generic_make_request():
    • Save the disk name, available through the function argument, in memory;
    • Save the timestamp.

  • Register a kretprobe on the return from generic_make_request():
    • Get the current timestamp;
    • Look up the saved timestamp and compare it with the current one;
    • If the difference exceeds the given threshold, look up the saved disk name and print it to the terminal.
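Before touching BPF, the plan can be modeled in plain Python to make the bookkeeping explicit. This is only an illustrative model: the class name and methods are hypothetical, a dict stands in for the BPF hash table, and the caller supplies the thread IDs and nanosecond timestamps that the real probes would read from the kernel.

```python
class SlowCallTracer:
    """Pure-Python model of the kprobe/kretprobe plan (no BPF involved)."""

    def __init__(self, min_us):
        self.min_us = min_us
        self.pending = {}   # thread id -> (entry timestamp in ns, disk name)
        self.events = []    # (thread id, latency in us, disk name)

    def on_entry(self, tid, ts_ns, disk):
        # kprobe: remember the disk name and the entry timestamp
        self.pending[tid] = (ts_ns, disk)

    def on_return(self, tid, ts_ns):
        # kretprobe: find the saved entry and compute the latency
        entry = self.pending.pop(tid, None)
        if entry is None:
            return  # no matching entry, e.g. tracing started mid-call
        start_ns, disk = entry
        lat_us = (ts_ns - start_ns) // 1000
        if lat_us > self.min_us:
            self.events.append((tid, lat_us, disk))


t = SlowCallTracer(min_us=1000)
t.on_entry(101, 0, "sdb")
t.on_return(101, 50_000)          # 50 us: below the threshold, dropped
t.on_entry(102, 1_000, "bcache0")
t.on_return(102, 5_001_000)       # 5000 us: reported
t.on_return(103, 10)              # return without an entry: ignored
print(t.events)                   # [(102, 5000, 'bcache0')]
```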

Kprobes and kretprobes use a breakpoint mechanism to modify function code on the fly. You can read more about them in the kprobes documentation and in good articles on the topic. If you look at the code of the various BCC utilities, you will see that they share the same structure. So in this article we will skip argument parsing and move on to the BPF program itself.

The eBPF text inside the Python script looks like this:

bpf_text = """
// The BPF program code goes here
"""

eBPF programs use hash tables to pass data between functions, and we will do the same. We use the process PID as the key and define a structure as the value:

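#include <uapi/linux/ptrace.h>
#include <linux/sched.h>     // TASK_COMM_LEN
#include <linux/blkdev.h>    // struct bio, struct gendisk, DISK_NAME_LEN

struct data_t {
	u64 pid;                   // pid_tgid; shifted down to the TGID before output
	u64 ts;                    // entry timestamp, ns
	char comm[TASK_COMM_LEN];  // name of the calling task
	u64 lat;                   // measured latency
	char disk[DISK_NAME_LEN];  // disk name taken from the bio
};

BPF_HASH(p, u64, struct data_t);   // in-flight calls, keyed by pid_tgid
BPF_PERF_OUTPUT(events);           // channel to user space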

Here we register a hash table called p, with key type u64 and value type struct data_t. The table is accessible from within our BPF program. The BPF_PERF_OUTPUT macro registers another table, called events, which is used to pass data to user space.

When measuring the time between a function call and its return, or between calls to different functions, keep in mind that the data you collect must belong to the same context. In other words, remember that functions may run in parallel. We could measure the time between calling a function in the context of one process and returning from it in the context of another, but the result would most likely be meaningless. A good example is the biolatency utility, where the hash table key is a pointer to struct request, which represents a single disk request.
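The toy model below shows why the choice of key matters. It is a hypothetical illustration, not BCC code: the event dicts and their fields are invented, with "A" and "B" standing in for the struct request pointers that biolatency uses. One thread issues two requests, and both complete in a different context (as interrupt-driven completions do). Keying by thread loses both measurements; keying by request recovers them.

```python
def latencies(events, key):
    """Match 'issue' and 'complete' events using the given key function.

    Each event is a dict with 'kind', 'tid', 'req' and 'ts' fields
    (a hypothetical format for illustration).
    """
    starts, result = {}, {}
    for ev in events:
        k = key(ev)
        if ev["kind"] == "issue":
            starts[k] = ev["ts"]          # a second issue on the same key overwrites
        elif k in starts:
            result[ev["req"]] = ev["ts"] - starts.pop(k)
    return result


events = [
    {"kind": "issue", "tid": 10, "req": "A", "ts": 0},
    {"kind": "issue", "tid": 10, "req": "B", "ts": 5},
    {"kind": "complete", "tid": 0, "req": "A", "ts": 40},   # completion context
    {"kind": "complete", "tid": 0, "req": "B", "ts": 60},
]
by_thread = latencies(events, key=lambda ev: ev["tid"])
by_request = latencies(events, key=lambda ev: ev["req"])
print(by_thread, by_request)   # {} {'A': 40, 'B': 55}
```

For generic_make_request() itself, entry and return happen in the same thread, so keying by pid_tgid is sufficient there.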

Next, we write the code that runs when the function under study is called:

void start(struct pt_regs *ctx, struct bio *bio) {
	u64 pid = bpf_get_current_pid_tgid();
	struct data_t data = {};
	u64 ts = bpf_ktime_get_ns();
	data.pid = pid;
	data.ts = ts;
	bpf_probe_read_str(&data.disk, sizeof(data.disk), (void*)bio->bi_disk->disk_name);
	p.update(&pid, &data);
}

Here the first argument of generic_make_request() is passed in as the second argument of our function. Next, we get the PID of the process in whose context we are running (in fact the combined pid_tgid value) and the current timestamp in nanoseconds. We write both into a freshly allocated struct data_t data. We take the disk name from the bio structure that is passed to generic_make_request() and store it in the same data structure. The last step is to add an entry to the hash table declared earlier.

The following function is called on return from generic_make_request():

void stop(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    struct data_t* data = p.lookup(&pid);
    if (data != 0 && data->ts > 0) {
        bpf_get_current_comm(&data->comm, sizeof(data->comm));
        data->lat = (ts - data->ts)/1000;
        if (data->lat > MIN_US) {
            FACTOR
            data->pid >>= 32;
            events.perf_submit(ctx, data, sizeof(struct data_t));
        }
        p.delete(&pid);
    }
}

This function is similar to the previous one: we get the process PID and the timestamp, but we do not allocate memory for a new data structure. Instead, we look up an existing structure in the hash table using the current PID as the key. If one is found, we get the name of the running process and add it to the structure.

The bit shift used here extracts the thread group ID (TGID), i.e. the PID of the main process that started the thread in whose context we are running. The bpf_get_current_pid_tgid() function returns both the TGID and the thread's own PID in a single 64-bit value: the TGID in the upper 32 bits and the thread PID in the lower 32 bits.
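The same split can be written out in Python, which is handy when decoding raw pid_tgid values in user space. The helper name and the thread ID 5815 below are hypothetical; the bit layout is the one bpf_get_current_pid_tgid() uses.

```python
def split_pid_tgid(pid_tgid):
    """Split the u64 returned by bpf_get_current_pid_tgid().

    Upper 32 bits: TGID (what user space calls the process PID).
    Lower 32 bits: the kernel PID of the calling thread (user-space TID).
    """
    tgid = pid_tgid >> 32
    tid = pid_tgid & 0xFFFFFFFF
    return tgid, tid


value = (5802 << 32) | 5815    # hypothetical thread 5815 of process 5802
print(split_pid_tgid(value))   # (5802, 5815)
```

`data->pid >>= 32` in the BPF program keeps only the first element of this pair.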

When printing to the terminal, we are not interested in the individual thread but in the main process. After comparing the measured latency with the given threshold, we pass our data structure to user space through the events table and then delete the entry from p.

In the Python script that loads this code, we need to replace MIN_US and FACTOR with the latency threshold and the time unit, which we pass in as arguments:

bpf_text = bpf_text.replace('MIN_US',str(min_usec))
if args.milliseconds:
	bpf_text = bpf_text.replace('FACTOR','data->lat /= 1000;')
	label = "msec"
else:
	bpf_text = bpf_text.replace('FACTOR','')
	label = "usec"
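The substitution above can be wrapped in a small function and checked without loading anything into the kernel. In this sketch the command-line options are hypothetical (the article skips the real script's argument parsing), and TEMPLATE is a trimmed stand-in for bpf_text that contains only the placeholder lines.

```python
import argparse

# Hypothetical command line; the real script's options are not shown in the article.
parser = argparse.ArgumentParser(description="trace slow generic_make_request() calls")
parser.add_argument("min_usec", type=int, nargs="?", default=10000,
                    help="latency threshold in microseconds")
parser.add_argument("-m", "--milliseconds", action="store_true",
                    help="print latency in milliseconds")

TEMPLATE = """
if (data->lat > MIN_US) {
    FACTOR
    data->pid >>= 32;
}
"""


def render(template, min_usec, milliseconds):
    """Fill in MIN_US and FACTOR the same way as the snippet above."""
    text = template.replace('MIN_US', str(min_usec))
    if milliseconds:
        return text.replace('FACTOR', 'data->lat /= 1000;'), "msec"
    return text.replace('FACTOR', ''), "usec"


args = parser.parse_args(["5000", "-m"])
text, label = render(TEMPLATE, args.min_usec, args.milliseconds)
print(label)   # msec
print(text)
```

Because FACTOR is applied only after the `data->lat > MIN_US` comparison, the threshold is always expressed in microseconds, even when the output is in milliseconds.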

Now we prepare the BPF program with the BPF() class and register the probes:

from bcc import BPF

b = BPF(text=bpf_text)
# run start() on entry to, and stop() on return from, generic_make_request()
b.attach_kprobe(event="generic_make_request", fn_name="start")
b.attach_kretprobe(event="generic_make_request", fn_name="stop")

We also have to define struct data_t in our script; otherwise we won't be able to read anything:

import ctypes as ct

TASK_COMM_LEN = 16	# linux/sched.h
DISK_NAME_LEN = 32	# linux/genhd.h

# Must mirror struct data_t field by field, in the same order
class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]
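It is worth checking that the ctypes mirror really has the same layout as the C structure, because a mismatch silently produces garbage. The sketch below prints the field offsets (expected 0, 8, 16, 32, 40 and a total of 72 bytes on common 64-bit platforms) and decodes a hand-made record. The record values are illustrative, not real trace data.

```python
import ctypes as ct
import struct

TASK_COMM_LEN = 16   # linux/sched.h
DISK_NAME_LEN = 32   # linux/genhd.h


class Data(ct.Structure):
    _fields_ = [("pid", ct.c_ulonglong),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN),
                ("lat", ct.c_ulonglong),
                ("disk", ct.c_char * DISK_NAME_LEN)]


# Offsets should line up with struct data_t on the C side.
for name, _ in Data._fields_:
    print(name, getattr(Data, name).offset)
print("sizeof:", ct.sizeof(Data))

# A hand-made record in native byte order, standing in for a perf event payload.
raw = struct.pack("=QQ16sQ32s", 5802, 123456789, b"ceph-osd", 15000, b"bcache0")
event = Data.from_buffer_copy(raw)
print(event.pid, event.comm.decode(), event.lat, event.disk.decode())
```

The char array fields come back as bytes cut at the first NUL, which is why they need decoding before printing under Python 3.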

The last step is to print the data to the terminal:

def print_event(cpu, data, size):
    global start
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    # seconds elapsed since the first event we received
    time_s = float(event.ts - start) / 1000000000
    # comm and disk arrive as bytes; decode them for printing under Python 3
    print("%-18.9f %-16s %-6d   %-1s %s   %s" % (time_s,
          event.comm.decode("utf-8", "replace"), event.pid,
          event.lat, label, event.disk.decode("utf-8", "replace")))

b["events"].open_perf_buffer(print_event)
start = 0
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()

The script itself is available on GitHub. Let's run it on the test platform, where fio is writing to bcache, and call udevadm monitor.

That's it! Now we can see that what looked like a slow bcache device is actually a slow generic_make_request() call for the cached disk.

Digging into the Kernel

What exactly slows down request submission? We can see that the delay occurs before request accounting starts, i.e. before the kernel begins accounting the request for its statistics (/proc/diskstats or iostat). This is easy to verify by running iostat while reproducing the problem, or the BCC script biolatency, which works from the start and end of request accounting. Neither of these utilities shows any problem for requests to the cached disk.

If we look at generic_make_request(), we see that two more functions are called before request accounting starts. The first, generic_make_request_checks(), checks that the request is valid for the disk settings. The second, blk_queue_enter(), contains an interesting wait_event_interruptible() call:

ret = wait_event_interruptible(q->mq_freeze_wq,
	(atomic_read(&q->mq_freeze_depth) == 0 &&
	(preempt || !blk_queue_preempt_only(q))) ||
	blk_queue_dying(q));

Here the kernel waits for the queue to be unfrozen. Let's measure the latency of blk_queue_enter():

~# /usr/share/bcc/tools/funclatency  blk_queue_enter -i 1 -m               	 
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.

 	msecs           	: count 	distribution
     	0 -> 1      	: 341  	|****************************************|

 	msecs           	: count 	distribution
     	0 -> 1      	: 316  	|****************************************|

 	msecs           	: count 	distribution
     	0 -> 1      	: 255  	|****************************************|
     	2 -> 3      	: 0    	|                                    	|
     	4 -> 7      	: 0    	|                                    	|
     	8 -> 15     	: 1    	|                                    	|
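funclatency groups durations into power-of-two buckets, which is why a single slow call lands alone in the 8 -> 15 row while hundreds of fast ones share the 0 -> 1 row. The sketch below reproduces that bucketing in plain Python so the rows are easier to read. It is an approximation written to match the labels shown above, not BCC's actual code.

```python
from collections import Counter


def log2_bucket(v):
    """Power-of-two bucket index for a non-negative integer (0 and 1 share bucket 1)."""
    return max(v.bit_length(), 1)


def bucket_range(i):
    """Label range for bucket i: 1 -> (0, 1), 2 -> (2, 3), 3 -> (4, 7), ..."""
    low, high = (1 << i) >> 1, (1 << i) - 1
    if low == high:
        low -= 1
    return low, high


def histogram(values):
    """Return (low, high, count) rows up to the highest non-empty bucket."""
    counts = Counter(log2_bucket(v) for v in values)
    return [bucket_range(i) + (counts.get(i, 0),) for i in range(1, max(counts) + 1)]


def print_hist(rows, unit="msecs", width=40):
    peak = max(count for _, _, count in rows)
    print("%20s : count    distribution" % unit)
    for low, high, count in rows:
        stars = "*" * (count * width // peak) if peak else ""
        print("%10d -> %-10d: %-8d |%-*s|" % (low, high, count, width, stars))


# 255 calls under 2 ms and one call of 9 ms, as in the third interval above
rows = histogram([0] * 100 + [1] * 155 + [9])
print_hist(rows)
```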

It looks like we are close to a solution. The functions used to freeze and unfreeze a queue are blk_mq_freeze_queue and blk_mq_unfreeze_queue. They are used when request queue settings need to be changed in a way that is potentially dangerous for requests already in the queue. When blk_mq_freeze_queue() is called, blk_freeze_queue_start() increments the q->mq_freeze_depth counter. After that, the kernel waits in blk_mq_freeze_queue_wait() for the queue to drain.

The time it takes to drain the queue is comparable to the disk latency, because the kernel waits for all queued operations to complete. Once the queue is empty, the settings changes are applied. Then blk_mq_unfreeze_queue() is called, which decrements the q->mq_freeze_depth counter.
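A small threading model makes the consequence visible: a new request arriving during a freeze waits roughly as long as the slowest request already in flight. This is a hypothetical toy built on threading.Condition, not kernel code; the names only mirror the kernel functions described above, and the 0.2-second "disk" service time is arbitrary.

```python
import threading
import time


class ToyQueue:
    """Hypothetical model of a request queue that can be frozen (not kernel code)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.freeze_depth = 0   # plays the role of q->mq_freeze_depth
        self.in_flight = 0      # requests that entered and have not completed

    def enter(self):
        # Like blk_queue_enter(): sleep while the queue is frozen.
        with self.cond:
            self.cond.wait_for(lambda: self.freeze_depth == 0)
            self.in_flight += 1

    def complete(self):
        with self.cond:
            self.in_flight -= 1
            self.cond.notify_all()

    def freeze(self):
        # Like blk_mq_freeze_queue(): bump the depth, then wait for the queue to drain.
        with self.cond:
            self.freeze_depth += 1
            self.cond.notify_all()
            self.cond.wait_for(lambda: self.in_flight == 0)

    def unfreeze(self):
        # Like blk_mq_unfreeze_queue(): drop the depth and wake blocked submitters.
        with self.cond:
            self.freeze_depth -= 1
            self.cond.notify_all()


def new_request_wait(service_time=0.2):
    """Return how long a new request waits behind a settings change."""
    q = ToyQueue()
    q.enter()                      # a slow request is already in flight

    def finish_slow_request():
        time.sleep(service_time)   # the "disk" takes service_time seconds
        q.complete()

    def apply_settings():
        q.freeze()                 # waits until the slow request completes
        q.unfreeze()               # settings applied, queue usable again

    workers = [threading.Thread(target=finish_slow_request),
               threading.Thread(target=apply_settings)]
    for w in workers:
        w.start()
    with q.cond:                   # wait (bounded) until the freeze has started
        q.cond.wait_for(lambda: q.freeze_depth > 0, timeout=1.0)
    started = time.monotonic()
    q.enter()                      # blocked for as long as the queue is frozen
    waited = time.monotonic() - started
    q.complete()
    for w in workers:
        w.join()
    return waited


print("new request waited %.2f s" % new_request_wait())
```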

Now we know what to do to fix the situation. The udevadm trigger command causes the settings for block devices to be applied again. These settings are described in udev rules. We can find out which of them freeze the queue either by trying to change them through sysfs or by reading the kernel source. We can also use the BCC utility trace, which prints the kernel and user-space stack traces for every call to blk_freeze_queue, for example:

~# /usr/share/bcc/tools/trace blk_freeze_queue -K -U
PID 	TID 	COMM        	FUNC        	 
3809642 3809642 systemd-udevd   blk_freeze_queue
    	blk_freeze_queue+0x1 [kernel]
    	elevator_switch+0x29 [kernel]
    	elv_iosched_store+0x197 [kernel]
    	queue_attr_store+0x5c [kernel]
    	sysfs_kf_write+0x3c [kernel]
    	kernfs_fop_write+0x125 [kernel]
    	__vfs_write+0x1b [kernel]
    	vfs_write+0xb8 [kernel]
    	sys_write+0x55 [kernel]
    	do_syscall_64+0x73 [kernel]
    	entry_SYSCALL_64_after_hwframe+0x3d [kernel]
    	__write_nocancel+0x7 [libc-2.23.so]
    	[unknown]

3809631 3809631 systemd-udevd   blk_freeze_queue
    	blk_freeze_queue+0x1 [kernel]
    	queue_requests_store+0xb6 [kernel]
    	queue_attr_store+0x5c [kernel]
    	sysfs_kf_write+0x3c [kernel]
    	kernfs_fop_write+0x125 [kernel]
    	__vfs_write+0x1b [kernel]
    	vfs_write+0xb8 [kernel]
    	sys_write+0x55 [kernel]
    	do_syscall_64+0x73 [kernel]
    	entry_SYSCALL_64_after_hwframe+0x3d [kernel]
    	__write_nocancel+0x7 [libc-2.23.so]
    	[unknown]
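With many events, it helps to reduce each stack to the sysfs handler that triggered the freeze. This sketch assumes the `trace -K -U` layout shown above (a "PID TID COMM FUNC" header line followed by indented frames) and counts the innermost `*_store` frame per event. The sample is abridged from the output above.

```python
import re
from collections import Counter

FRAME_RE = re.compile(r"^(\w+)\+0x[0-9a-f]+ \[kernel\]$")


def store_handlers(trace_output, func="blk_freeze_queue"):
    """Count the innermost sysfs *_store handler in each traced stack."""
    handlers = Counter()
    in_stack = False
    for raw in trace_output.splitlines():
        line = raw.strip()
        fields = line.split()
        if len(fields) >= 4 and fields[0].isdigit() and fields[-1] == func:
            in_stack = True        # header of a new event: PID TID COMM FUNC
            continue
        m = FRAME_RE.match(line)
        if in_stack and m and m.group(1).endswith("_store"):
            handlers[m.group(1)] += 1
            in_stack = False       # skip outer frames such as queue_attr_store
    return handlers


SAMPLE = """\
3809642 3809642 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        elevator_switch+0x29 [kernel]
        elv_iosched_store+0x197 [kernel]
        queue_attr_store+0x5c [kernel]
3809631 3809631 systemd-udevd   blk_freeze_queue
        blk_freeze_queue+0x1 [kernel]
        queue_requests_store+0xb6 [kernel]
        queue_attr_store+0x5c [kernel]
"""
print(store_handlers(SAMPLE))
```

Here this points at two sysfs attributes: the I/O scheduler (elv_iosched_store) and nr_requests (queue_requests_store).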

Udev rules change rarely, and usually in a controlled way. Yet we see that even re-applying values that are already set causes a spike in the latency of passing a request from the application to the disk. Of course, generating udev events when the disk configuration has not changed (for example, no device has been mounted or unmounted) is not good practice. Still, we can help the kernel avoid unnecessary work and not freeze the request queue when it isn't needed. Three small commits fix the situation.
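The idea behind such a fix can be modeled in a few lines: if the new value equals the current one, return early and skip the freeze. This is a hypothetical model, not the actual patches (which are not reproduced here); the class is invented, though the attribute names scheduler and nr_requests correspond to the elv_iosched_store and queue_requests_store handlers seen in the trace.

```python
class ToyBlockQueue:
    """Hypothetical sysfs-like queue attributes; every real change costs a freeze."""

    def __init__(self, **settings):
        self.settings = dict(settings)
        self.freezes = 0

    def store(self, name, value):
        # An unchanged value needs no freeze: return before touching the queue.
        if self.settings.get(name) == value:
            return False
        self.freezes += 1          # stands in for blk_mq_freeze_queue()/unfreeze
        self.settings[name] = value
        return True


q = ToyBlockQueue(scheduler="none", nr_requests=256)
# udevadm trigger re-applies the same rules again and again
for _ in range(10):
    q.store("scheduler", "none")
    q.store("nr_requests", 256)
print(q.freezes)   # 0
q.store("scheduler", "mq-deadline")
print(q.freezes)   # 1
```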

Conclusion

eBPF is a very flexible and powerful tool. In this article we looked at one practical case and showed a small part of what it can do. If you are interested in developing BCC utilities, the official tutorial is worth a look: it explains the basics well.

There are other interesting eBPF-based debugging and profiling tools as well. One of them is bpftrace, which lets you write powerful one-liners and small programs in an awk-like language. Another is ebpf_exporter, which lets you collect low-level, high-resolution metrics directly into your Prometheus server, with the option of building good-looking visualizations and alerts on top of them.

Source: will.com
