Kubernetes tips & tricks: graceful shutdown in NGINX and PHP-FPM

A typical requirement when implementing CI/CD in Kubernetes: before it stops completely, the application must stop accepting new client requests and, more importantly, successfully complete the requests already in flight.

Meeting this requirement lets you achieve zero downtime during deployment. However, even with very popular stacks (such as NGINX and PHP-FPM), you can run into problems that cause a burst of errors with every deployment.

Theory. How a pod lives

We have already published a detailed article about the pod lifecycle. In the context of this topic, we are interested in the following: at the moment a pod enters the Terminating state, new requests stop being sent to it (the pod is removed from the list of endpoints for the service). So, to avoid downtime during deployment, it is enough for us to solve the problem of stopping the application correctly.

You should also keep in mind that the default grace period is 30 seconds: after that, the pod is killed, so the application must have time to process all requests before then. Note: any request that takes more than 5-10 seconds is already problematic, and graceful shutdown will not help it.
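As a rough illustration of where this grace period lives, here is a minimal Deployment sketch. The object name, labels, image and the 60-second value are assumptions made for the example, not settings from the article; if the field is omitted, the 30-second default mentioned above applies.

```yaml
# Hypothetical Deployment: the name, labels, image and the 60-second value
# are illustrative assumptions, not taken from the article.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # How long Kubernetes waits after termination starts before it sends
      # SIGKILL. Defaults to 30 when omitted. A preStop hook, if present,
      # runs inside this same window.
      terminationGracePeriodSeconds: 60
      containers:
      - name: nginx
        image: nginx:stable
```

Raising the value only extends the upper bound: a pod that shuts down cleanly still exits as soon as its processes finish.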

To better understand what happens when a pod is terminated, take a look at the following diagram:

[Diagram: what happens when a pod is terminated]

A1, B1 - Receiving changes about the pod state
A2 - Sending SIGTERM
B2 - Removing the pod from endpoints
B3 - Receiving changes (the list of endpoints has changed)
B4 - Updating iptables rules

Please note: removing the pod from endpoints and sending SIGTERM do not happen sequentially, but in parallel. And because Ingress does not receive the updated list of endpoints immediately, new client requests will still be sent to the pod, which causes 5xx errors while the pod is terminating (we have translated a separate article with more details on this issue). This problem can be addressed in the following ways:

  • Send Connection: close in the response headers (if this is an HTTP application).
  • If changing the code is not possible, the rest of this article describes a solution that lets you process requests until the end of the grace period.

Theory. How NGINX and PHP-FPM terminate their processes

NGINX

Let's start with NGINX, since everything is more or less obvious with it. Diving into the theory, we learn that NGINX has one master process and several "workers": child processes that handle client requests. A convenient option is provided: with the nginx -s <SIGNAL> command you can terminate the processes in either fast shutdown or graceful shutdown mode. Obviously, it is the latter that interests us.

After that, everything is simple: you need to add to the preStop hook a command that sends the graceful shutdown signal. This can be done in the Deployment, in the containers block:

       lifecycle:
          preStop:
            exec:
              command:
              - /usr/sbin/nginx
              - -s
              - quit

Now, when the pod shuts down, we will see the following in the NGINX container logs:

2018/01/25 13:58:31 [notice] 1#1: signal 3 (SIGQUIT) received, shutting down
2018/01/25 13:58:31 [notice] 11#11: gracefully shutting down

And this means exactly what we need: NGINX waits for requests to complete and then kills the process. However, below we will also look at a common problem because of which the process terminates incorrectly even with the nginx -s quit command.

At this stage we are done with NGINX: at least from the logs you can tell that everything is working as it should.

What about PHP-FPM? How does it handle graceful shutdown? Let's figure it out.

PHP-FPM

With PHP-FPM, there is a little less information. According to the official PHP-FPM manual, it accepts the following POSIX signals:

  1. SIGINT, SIGTERM - fast shutdown;
  2. SIGQUIT - graceful shutdown (what we need).

The other signals are not needed for this task, so we will skip them. To terminate the process correctly, you need to write the following preStop hook:

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/kill
              - -SIGQUIT
              - "1"

At first glance, this is all that is needed to perform a graceful shutdown in both containers. However, the task is harder than it seems. Below are two cases in which graceful shutdown did not work and caused short-term unavailability of the project during deployment.

Practice. Possible problems with graceful shutdown

NGINX

First of all, it is useful to remember: besides running the nginx -s quit command, there is one more step worth paying attention to. We ran into a case where NGINX still received SIGTERM instead of SIGQUIT, so requests were not completed correctly. Similar cases have been reported elsewhere. Unfortunately, we could not determine the specific cause of this behavior: we suspected the NGINX version, but this was not confirmed. The symptom was messages like "open socket #10 left in connection 5" in the NGINX container logs, after which the pod stopped.

We can observe such a problem, for example, in the responses on the Ingress in question:

[Graph: response status codes during deployment]

In this case, we receive only a 503 error code from the Ingress itself: it cannot reach the NGINX container, since that container is no longer available. If you look at the logs of the NGINX container, they contain the following:

[alert] 13939#0: *154 open socket #3 left in connection 16
[alert] 13939#0: *168 open socket #6 left in connection 13

After changing the stop signal, the container began to stop correctly: this is confirmed by the fact that the 503 errors were no longer observed.

If you encounter a similar problem, it makes sense to find out which stop signal is used in the container and what exactly the preStop hook looks like. It is quite possible that the cause lies right there.

PHP-FPM... and more

The problem with PHP-FPM is easy to describe: it does not wait for its child processes to complete, it terminates them, which is why 502 errors occur during deployment and other operations. There are several bug reports on bugs.php.net dating back to 2005 that describe this problem. But you will most likely see nothing in the logs: PHP-FPM will announce the completion of its process without any errors or other notices.

It is worth clarifying that the problem may depend, to a greater or lesser extent, on the application itself and may not show up at all, for example, in monitoring. If you do encounter it, a simple workaround comes to mind first: add a preStop hook with sleep(30). It lets you complete all the requests that arrived earlier (and we do not accept new ones, since the pod is already in the Terminating state), and after 30 seconds the pod itself is terminated with a SIGTERM signal.

It turns out that the lifecycle for the container will look like this:

    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sleep
          - "30"

However, because of the 30-second sleep, we significantly increase the deployment time, since every pod will take at least 30 seconds to terminate, which is bad. What can be done about this?

Let's turn to the component responsible for actually executing the application. In our case it is PHP-FPM, which by default does not monitor the execution of its child processes: the master process terminates immediately. You can change this behavior with the process_control_timeout directive, which sets the time limit for child processes to wait for signals from the master. If you set the value to 20 seconds, this covers most of the requests running in the container, and the master process stops as soon as they are completed.
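One possible way to deliver this directive to the container is a ConfigMap holding a small PHP-FPM config file. This is only a sketch: the ConfigMap name, the file name and the idea of mounting it into an included config directory are assumptions for illustration, and the exact include path depends on your image.

```yaml
# Hypothetical ConfigMap: the object name and file name are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: php-fpm-graceful
data:
  zz-graceful.conf: |
    [global]
    ; Time limit for child processes to react to signals from the master.
    ; 20 seconds is the value discussed in the text above.
    process_control_timeout = 20s
```

The file then has to be mounted into a directory that the main php-fpm.conf includes (in many images this is a php-fpm.d directory, but verify it for the image you use). process_control_timeout is a global directive, hence the [global] section.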

With this knowledge, let's return to our last problem. As mentioned, Kubernetes is not a monolithic platform: communication between its components takes some time. This is especially true for Ingresses and other related components, since with such a delay during deployment it is easy to get a burst of 5xx errors. For example, an error can occur at the stage of sending a request to the upstream, but the "time lag" in the interaction between components is quite short: less than a second.

Therefore, together with the process_control_timeout directive already mentioned, you can use the following construct for lifecycle:

lifecycle:
  preStop:
    exec:
      command: ["/bin/bash","-c","/bin/sleep 1; kill -QUIT 1"]

In this case, we compensate for the delay with the sleep command and do not noticeably increase the deployment time: there is a clear difference between 30 seconds and one, after all. In essence, process_control_timeout does the main work, while lifecycle is used only as a "safety net" in case of lag.
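Putting the pieces together, a pod template for the two containers might look roughly like the sketch below. Container names and images are assumptions, the NGINX hook is the one shown earlier, and the example presumes that process_control_timeout is already set in the PHP-FPM configuration and that the PHP-FPM master runs as PID 1.

```yaml
# Sketch of a pod template; container names and images are assumptions.
spec:
  # Must be larger than the preStop delay plus process_control_timeout.
  terminationGracePeriodSeconds: 30
  containers:
  - name: nginx
    image: nginx:stable
    lifecycle:
      preStop:
        exec:
          # Ask NGINX to finish in-flight requests and then exit.
          command: ["/usr/sbin/nginx", "-s", "quit"]
  - name: php-fpm
    image: php:fpm
    lifecycle:
      preStop:
        exec:
          # One-second pause to cover the lag between components,
          # then a graceful stop of the master (assumed to be PID 1).
          command: ["/bin/sh", "-c", "sleep 1; kill -QUIT 1"]
```

/bin/sh is used here instead of /bin/bash only because not every image ships bash; either works when the shell is present.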

Generally speaking, the behavior described and the corresponding workaround apply not only to PHP-FPM. A similar situation can arise in one way or another with other languages and frameworks. If you cannot fix graceful shutdown by other means, for example by rewriting the code so that the application handles termination signals correctly, you can use the method described. It may not be the most elegant, but it works.

Practice. Load testing to check how the pod behaves

Load testing is one of the ways to check how the container works, since it brings conditions closer to real production, when users visit the site. To test the recommendations above, you can use Yandex.Tank: it covers all our needs perfectly. Below are tips and recommendations for running the tests, with an illustrative example from our experience based on graphs from Grafana and Yandex.Tank itself.

The most important thing here is to check changes step by step. After adding a new fix, run the test and see whether the results have changed compared to the previous run. Otherwise, it will be hard to identify ineffective solutions, and in the long run they can only do harm (for example, by increasing deployment time).

Another nuance is to look at the container logs during its termination. Is information about the graceful shutdown recorded there? Are there errors in the logs when accessing other resources (for example, the neighboring PHP-FPM container)? Errors in the application itself (as in the NGINX case described above)? I hope that the introductory information in this article will help you better understand what happens to the container during its termination.
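For reference, a minimal Yandex.Tank configuration for such a run could look like the sketch below. It follows the load.yaml layout from the Yandex.Tank documentation as far as I recall it; the target address, URI and load profile are placeholders, so check the option names against the version you run.

```yaml
# Hypothetical load.yaml: the address, URI and schedule are placeholders.
phantom:
  address: app.example.com:80   # target host:port (assumption)
  uris:
    - /
  load_profile:
    load_type: rps              # schedule load in requests per second
    schedule: const(50, 5m)     # constant 50 rps for 5 minutes
console:
  enabled: true                 # show live results in the console
telegraf:
  enabled: false                # no agent-based monitoring of the target
```

A constant load that lasts longer than a full rollout makes it easier to attribute error spikes to the deployment itself, so start the rollout a minute or so into the test.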

So, the first test run took place without lifecycle and without additional directives for the application server (process_control_timeout in PHP-FPM). The purpose of this test was to find out the approximate number of errors (and whether there are any at all). Also, as additional information, you should know that the average deployment time for each pod was about 5-10 seconds until it was fully ready. The results:

[Graph: Yandex.Tank results for the first run]

The Yandex.Tank information panel shows a spike of 502 errors, which occurred at the moment of deployment and lasted up to 5 seconds on average. Presumably this happened because existing requests to the old pod were cut off while it was being terminated. After that, 503 errors appeared, which were the result of the stopped NGINX container, which also dropped connections because of the backend (this prevented Ingress from connecting to it).

Let's see how process_control_timeout in PHP-FPM helps us wait for child processes to complete, that is, fix such errors. We redeploy, this time with the directive:

[Graph: Yandex.Tank results with process_control_timeout enabled]

No more 5xx errors during deployment! The deployment succeeds, and graceful shutdown works.

However, it is worth remembering the issue with Ingress containers: a small percentage of errors that we may still get because of the time lag. To avoid them, all that remains is to add the construct with sleep and repeat the deployment. In our particular case, however, no changes were visible (again, no errors).

Conclusion

To terminate a process gracefully, we expect the following behavior from the application:

  1. Wait a few seconds and then stop accepting new connections.
  2. Wait for all requests to complete and close all keepalive connections that are not processing requests.
  3. Terminate its own process.

However, not all applications can work this way. One solution to the problem in the Kubernetes context is:

  • adding a preStop hook that waits a few seconds;
  • studying the backend's configuration file for suitable parameters.

The NGINX example makes it clear that even an application that should handle termination signals correctly out of the box may not do so, so it is critical to check for 5xx errors during application deployment. This also lets you look at the problem more broadly and focus not on a single pod or container, but on the entire infrastructure as a whole.

As a testing tool, you can use Yandex.Tank together with any monitoring system (in our case, the data for the test was taken from Grafana with a Prometheus backend). Problems with graceful shutdown are clearly visible under the heavy loads that a benchmark can generate, and monitoring helps you analyze the situation in more detail during or after the test.

In response to feedback on the article: it is worth mentioning that the problems and solutions here are described in relation to NGINX Ingress. For other cases there are other solutions, which we may consider in future articles of the series.


source: www.habr.com
