A story about missing DNS packets from Google Cloud technical support

From the Google Blog editor: Have you ever wondered how Google Cloud Technical Solutions Engineers (TSEs) handle your support requests? TSEs are the support engineers responsible for finding and fixing the root cause of problems reported by users. Some of these problems are fairly simple, but now and then a ticket comes in that needs the attention of several engineers at once. In this article, one of the TSEs tells us about a particularly thorny problem from his recent practice: the case of the missing DNS packets. Along the way we will see how the engineers managed to resolve the problem and what they learned while fixing the bug. We hope this story not only teaches you about a deep-seated bug, but also gives you some insight into what happens when you file a support ticket with Google Cloud.

Troubleshooting is both a science and an art. It all starts with forming a hypothesis about why the system is misbehaving, and then testing whether that hypothesis holds. But before we can form a hypothesis, we have to define the problem clearly and state it precisely. If the question sounds too vague, you have to analyze everything carefully to narrow it down; this is the "art" of troubleshooting.

On Google Cloud, this process becomes considerably harder, because Google Cloud does its best to guarantee the privacy of its users. As a result, TSEs have no write access to your systems, and cannot view configurations as broadly as users can. So we (the engineers) cannot quickly modify a system to test one of our hypotheses.

Some users believe that we will fix everything like mechanics at a car repair shop, and simply send us the ID of a virtual machine. In reality the process is a conversation: collecting information, forming and confirming (or refuting) hypotheses, and finally solving the problem all depend on communication with the customer.

The problem in question

Today we have a story with a happy ending. One of the reasons the case was resolved successfully is that the problem description was detailed and precise. Below you can see a copy of the initial ticket (edited to hide confidential information):
[Image: copy of the initial support ticket]
This message contains a lot of information that is useful to us:

  • The specific VM is named
  • The problem itself is stated - DNS does not work
  • It says where the problem shows up - in the VM and in the container
  • It lists the steps the user has already taken to narrow down the problem

The request was filed as "P1: Critical Impact - Service Unusable in production", which means it is handled around the clock (24/7) under the "Follow the Sun" model (you can read more about support case priorities), with the case handed from one support team to the next at each time zone change. In fact, by the time the problem reached our team in Zurich, it had already circled the globe. By then the user had put mitigation measures in place, but feared the situation would recur in production, since the root cause had still not been found.

By the time the ticket reached Zurich, we already had the following information in hand:

  • Contents of /etc/hosts
  • Contents of /etc/resolv.conf
  • Output of iptables-save
  • pcap files collected by the team with ngrep

With this data, we were ready to begin the "investigation" and troubleshooting phase.

Our first steps

First of all, we checked the logs and status of the metadata server and confirmed that it was working correctly. The metadata server responds on IP address 169.254.169.254 and, among other things, is responsible for resolving domain names. We also double-checked that the firewall rules applied to the VM were correct and were not blocking packets.

It was a strange problem: the nmap check refuted our main hypothesis, that UDP packets were being lost, so we came up with several more possibilities and ways to check them:

  • Are packets being dropped selectively? => Check the iptables rules
  • Is the MTU too small? => Check the output of ip a show
  • Does the problem affect only UDP packets, or TCP as well? => Run dig +tcp
  • Are the packets generated by dig coming back? => Run tcpdump
  • Is libdns working correctly? => Run strace to check that packets are sent and received

At this point we decided to call the user and troubleshoot live.

During the call we were able to check several things:

  • After several checks, we ruled out the iptables rules as a cause
  • We checked the network interfaces and routing tables, and double-checked that the MTU was correct
  • We found that dig +tcp google.com (TCP) works as it should, but dig google.com (UDP) does not
  • We ran tcpdump while dig was running and found that the UDP packets do come back
  • We ran strace dig google.com and saw that dig correctly calls sendmsg() and recvmsg(), but the latter is interrupted by a timeout
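
The UDP-versus-TCP comparison from the call can be sketched in Python with only the standard library. This is a rough illustration, not what dig does internally: the helper names (build_query, query_udp, query_tcp, recv_exact) and the fixed query ID are assumptions made for this example. The only difference between the two transports, as far as the query is concerned, is that DNS over TCP prefixes the message with a two-byte length.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Encode a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (only the RD bit set), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the full reply arrived")
        data += chunk
    return data


def query_udp(server, name, timeout=2.0):
    """Send the query over an ordinary (buffered) UDP socket, like plain dig."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        reply, _ = sock.recvfrom(4096)  # raises socket.timeout if nothing arrives
        return reply


def query_tcp(server, name, timeout=2.0):
    """Send the same query over TCP, like dig +tcp (two-byte length prefix)."""
    query = build_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

On a VM showing the symptom described here, one would expect query_tcp("169.254.169.254", "google.com") to return a reply while query_udp with the same arguments times out, even though tcpdump shows the UDP reply reaching the host.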

Unfortunately, the end of our shift arrived and we had to hand the problem over to the next time zone. The request had, however, caught the interest of our team, and a colleague suggested creating a raw DNS packet with the Python scapy module.

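from scapy.all import DNS, DNSQR, IP, UDP, sr1

# Build a recursive DNS query for google.com addressed to the metadata server
query = IP(dst="169.254.169.254") / UDP(dport=53) / DNS(rd=1, qd=DNSQR(qname="google.com"))
answer = sr1(query, verbose=0)  # sr1 sends one packet and returns the first reply
print("169.254.169.254", answer[DNS].summary())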

This snippet creates a DNS packet and sends the request to the metadata server.

The user ran the code, the DNS reply came back, and the application received it, which confirmed that there was no problem at the network level.

After another trip around the world, the request came back to our team, and I took it over entirely, reasoning that it would be better for the user if the request stopped bouncing from place to place.

Meanwhile, the user kindly agreed to provide a snapshot of the system image. This was very good news: being able to test the system myself makes troubleshooting much faster, because I no longer have to ask the user to run commands, send me the results, and then analyze them. I can do everything myself!

My colleagues started to envy me a little. Over lunch we discussed the case, but nobody had any idea what was going on. Fortunately, the user had already mitigated the impact and was in no hurry, so we had time to dissect the problem. And since we had the image, we could run any test that interested us. Great!

Taking a step back

One of the most popular interview questions for systems engineer positions is: "What happens when you ping www.google.com?" It is a good question, because the candidate has to describe everything from the shell to user space, to the kernel, and on to the network. I smiled: sometimes interview questions turn out to be useful in real life...

I decided to apply this interview question to the problem at hand. Roughly speaking, when you try to resolve a DNS name, the following happens:

  1. The application calls a system library such as libdns
  2. libdns checks the system configuration to find out which DNS server to contact (in this case 169.254.169.254, the metadata server)
  3. libdns uses system calls to create a UDP socket (SOCK_DGRAM) and to send and receive the UDP packets carrying the DNS query
  4. The UDP stack can be configured at the kernel level through the sysctl interface
  5. The kernel interacts with the hardware to send the packets onto the network through the network interface
  6. The hypervisor intercepts the packet and forwards it to the metadata server
  7. The metadata server works its magic to resolve the DNS name and returns the reply along the same path
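
Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not how libdns is implemented: parse_nameservers and open_udp_socket are hypothetical helper names, and the sample resolv.conf text below is made up for the example.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines of a resolv.conf file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (resolv.conf uses '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def open_udp_socket():
    """Create the kind of datagram socket a resolver uses for a UDP query."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


# Hypothetical file contents, roughly what a GCP VM might contain.
SAMPLE_RESOLV_CONF = """\
# sample only
search example.internal
nameserver 169.254.169.254
"""

print(parse_nameservers(SAMPLE_RESOLV_CONF))
```

Everything from step 4 onward happens below this level, in the kernel and the hypervisor, which is why a user-space sketch like this one cannot show where a packet goes missing.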

Let me recap the hypotheses we had already considered:

Hypothesis: The libraries are broken

  • Test 1: run strace on the affected system and check that dig makes the correct system calls
  • Result: the correct system calls are made
  • Test 2: use scapy to check whether we can resolve names while bypassing the system libraries
  • Result: we can
  • Test 3: run rpm -V on the libdns package and md5sum on the library files
  • Result: the library code is identical to the code on a working operating system
  • Test 4: mount the user's root filesystem image on a VM that does not show this behavior, run chroot, and check whether DNS works
  • Result: DNS works correctly

Conclusion from the tests: the problem is not in the libraries

Hypothesis: The DNS settings are wrong

  • Test 1: inspect tcpdump and check whether the DNS packets are sent and returned correctly after running dig
  • Result: the packets are transmitted correctly
  • Test 2: double-check /etc/nsswitch.conf and /etc/resolv.conf on the server
  • Result: everything is correct

Conclusion from the tests: the problem is not in the DNS configuration

Hypothesis: The kernel is broken

  • Test: install a new kernel, verify its signature, reboot
  • Result: same behavior

Conclusion from the tests: the kernel is not broken

Hypothesis: The user's network (or the hypervisor's network interface) is misbehaving

  • Test 1: check the firewall settings
  • Result: the firewall lets DNS packets through both on the host and on GCP
  • Test 2: capture the traffic and verify that DNS requests are sent and replies are received correctly
  • Result: tcpdump confirms that the host receives the reply packets

Conclusion from the tests: the problem is not in the network

Hypothesis: The metadata server is not working

  • Test 1: check the metadata server logs for anomalies
  • Result: nothing unusual in the logs
  • Test 2: bypass the metadata server with dig @8.8.8.8
  • Result: resolution fails even when the metadata server is not used

Conclusion from the tests: the problem is not with the metadata server

Bottom line: we had tested every subsystem except the runtime settings!

Diving into the kernel runtime settings

To configure the kernel runtime, you can use command line options (grub) or the sysctl interface. I looked at /etc/sysctl.conf and, lo and behold, found several custom settings. Feeling that I was onto something, I discarded all the non-network and non-TCP settings, which left me with the net.core settings. Then I went to a VM where host resolution worked and applied the settings from the broken VM one by one, until I found the culprit:

net.core.rmem_default = 2147483647

Here it is, the setting that breaks DNS! I had found the murder weapon. But why does it happen? I still needed a motive.

The default receive buffer size for UDP packets is set by net.core.rmem_default. A typical value is somewhere around 200 KiB, but if your server receives a lot of UDP packets, you may want to increase the buffer size. If the buffer is full when a new packet arrives, for example because the application is not processing packets fast enough, you start losing packets. Our customer had increased the buffer size for exactly this reason: he was afraid of losing data, because he was running an application that collects metrics sent as UDP packets. The value he set was the maximum possible: 2^31 - 1 (if you try to set it to 2^31, the kernel returns "INVALID ARGUMENT").
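
To see how this setting reaches an application, the short Python sketch below reads the sysctl value and compares it with the receive buffer size reported by a new UDP socket. It assumes a Linux host, where the value is exposed under /proc/sys and a freshly created socket usually starts with net.core.rmem_default as its buffer size; the function names are made up for this example.

```python
import socket


def read_rmem_default(path="/proc/sys/net/core/rmem_default"):
    """Return net.core.rmem_default as an int, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        # Not Linux, no permission, or unexpected contents.
        return None


def udp_receive_buffer_size():
    """Return SO_RCVBUF of a freshly created UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


print("net.core.rmem_default:", read_rmem_default())
print("SO_RCVBUF of a new UDP socket:", udp_receive_buffer_size())
```

Because every ordinary socket picks up this default, a single line in /etc/sysctl.conf can change the behavior of every UDP client on the machine, dig included.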

Suddenly I realized why nmap and scapy had worked correctly: they use raw sockets! Raw sockets differ from normal sockets: they bypass iptables, and they are not buffered!

But why does "too big a buffer" cause problems? It is clearly not working as intended.

At this point I could reproduce the problem on several kernels and several distributions. The problem was already present on 3.x kernels and now also appears on 5.x kernels.

Indeed, after running

sysctl -w net.core.rmem_default=$((2**31-1))

DNS stopped working.

I started looking for a working value with a simple binary search and found that the system worked with 2147481343, but this number was a meaningless string of digits to me. I suggested that the customer try this number, and he replied that the system worked with google.com but still failed with other domains, so I continued my investigation.
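
The search itself can be sketched as follows. In the real investigation each probe meant setting the sysctl and then running dig; here a made-up predicate stands in for that step, and its threshold simply reuses the number quoted above. The function name largest_working_value is hypothetical, and the sketch assumes that every value up to some threshold works and every value above it fails.

```python
def largest_working_value(works, low, high):
    """Find the largest value in [low, high] for which works(value) is True.

    Assumes works(low) is True and that works() flips from True to False
    exactly once as the value grows.
    """
    while low < high:
        # Round the midpoint up so the loop always makes progress.
        mid = (low + high + 1) // 2
        if works(mid):
            low = mid        # mid still works, so look higher
        else:
            high = mid - 1   # mid fails, so look lower
    return low


def dns_works_with(value):
    """Stand-in for 'set net.core.rmem_default to value, then run dig'."""
    return value <= 2147481343  # threshold taken from the text, for illustration


print(largest_working_value(dns_works_with, 1, 2**31 - 1))
```

With a range of about 2^31 values, a search like this needs roughly 31 probes, which is practical even when each probe is a manual sysctl change followed by a dig run.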

I installed dropwatch, a tool I should have used earlier: it shows exactly where in the kernel a packet is dropped. The culprit was the function udp_queue_rcv_skb. I downloaded the kernel sources and added a few printk calls to track down where exactly the packet was dropped. I quickly found the relevant if condition and just stared at it for a while, because that was when everything finally came together into one picture: the 2^31 - 1, the meaningless number, the domain that did not work... It was this piece of code in __udp_enqueue_schedule_skb:

if (rmem > (size + sk->sk_rcvbuf))
		goto uncharge_drop;

Note:

  • rmem is of type int
  • size is of type u16 (an unsigned sixteen-bit int) and stores the packet size
  • sk->sk_rcvbuf is of type int and stores the buffer size, which by definition equals the value in net.core.rmem_default

When sk_rcvbuf approaches 2^31, adding the packet size to it can cause an integer overflow. And since it is an int, its value becomes negative, so the condition evaluates to true when it should be false (you can read more about this at the link).

The bug can be fixed in a trivial way: by casting to unsigned int. I applied the fix and rebooted the system, and DNS worked again.
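
The overflow can be reproduced on paper with a small Python model of the comparison. Python integers do not overflow, so the sketch wraps values to 32 bits by hand. Several things here are assumptions made for illustration: the helper names, the exact place where the unsigned cast is applied, the choice of rmem equal to size (a single packet in an otherwise empty queue), and the packet size of 2304, picked only because 2147481343 + 2304 equals 2^31 - 1. The text does not report the actual sizes involved.

```python
INT_MAX = 2**31 - 1


def to_int32(value):
    """Wrap an integer to a signed 32-bit value, as int arithmetic does in practice."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT_MAX else value


def to_uint32(value):
    """Wrap an integer to an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def dropped_original(rmem, size, rcvbuf):
    """Model of `rmem > (size + sk->sk_rcvbuf)` with a signed 32-bit sum."""
    return rmem > to_int32(size + rcvbuf)


def dropped_fixed(rmem, size, rcvbuf):
    """Model of the same check done in unsigned 32-bit arithmetic (assumed fix)."""
    return to_uint32(rmem) > to_uint32(size + rcvbuf)


# Buffer at 2^31 - 1: the sum wraps negative, so even a lone packet is dropped.
print(dropped_original(2304, 2304, INT_MAX))
# The same packet with the unsigned comparison is accepted.
print(dropped_fixed(2304, 2304, INT_MAX))
# With 2147481343, a 2304-byte packet just fits, but a larger one wraps again.
print(dropped_original(2304, 2304, 2147481343))
print(dropped_original(2305, 2305, 2147481343))
```

Under these assumptions the model also suggests why the value found by binary search worked for google.com but not for other domains: any reply whose accounted size was even slightly larger would push the sum past 2^31 - 1 again.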

The taste of victory

I passed my findings on to the customer and sent a kernel patch to LKML. I was happy: every piece of the puzzle fit together, I could explain exactly why we had observed what we observed, and, most importantly, we had found a solution to the problem thanks to teamwork!

Admittedly, this was a rare case, and fortunately we seldom receive requests this complex from users.


source: www.habr.com
