Fluentd: why it is important to configure the output buffer


These days it is hard to imagine a Kubernetes-based project without an ELK stack that stores the logs of both the applications and the system components of the cluster. In our practice, we use the EFK stack, with Fluentd instead of Logstash.

Fluentd is a modern, general-purpose log collector that keeps gaining popularity and has joined the Cloud Native Computing Foundation, which is why its development is focused on use together with Kubernetes.

Using Fluentd instead of Logstash does not change the overall nature of the stack, but Fluentd has its own specific nuances that stem from its versatility.

For example, when we started using EFK in a busy project with a high logging rate, we found that some messages were displayed in Kibana several times. In this article we explain why this happens and how to solve the problem.

The document duplication problem

In our projects, Fluentd is deployed as a DaemonSet (one instance is started automatically on each node of the Kubernetes cluster) and follows the containers' stdout logs in /var/log/containers. After collection and processing, the logs are sent as JSON documents to ElasticSearch, which is deployed in clustered or standalone form depending on the scale of the project and its performance and fault-tolerance requirements. Kibana serves as the graphical interface.

When we used Fluentd with a buffered output plugin, we ran into a situation where some documents in ElasticSearch had exactly the same content and differed only in their identifier. You can verify that this is a repeated message by using an Nginx log as an example. In the log file, this message exists in a single copy:

127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -

However, ElasticSearch contains several documents with this message:

{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "HgGl_nIBR8C-2_33RlQV",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}

{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "IgGm_nIBR8C-2_33e2ST",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}

Moreover, there may be more than two copies.

While investigating this problem, you can see a large number of warnings like the following in the Fluentd logs:

2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"elasticsearch", :port=>9200, :scheme=>"http", :user=>"elastic", :password=>"obfuscated"}): read timeout reached"

These warnings occur when ElasticSearch cannot return a response to a request within the time set by the request_timeout parameter, so the buffer chunk that was sent cannot be cleared. After that, Fluentd tries to send the chunk to ElasticSearch again, and after an arbitrary number of attempts the operation completes successfully:

2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce" 
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b" 
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"

However, ElasticSearch treats each of the resent buffer chunks as unique and assigns new _id values to its documents during indexing. This is how copies of messages appear.
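The retry behaviour described above can be imitated with a toy model. This is only a sketch in Python: FakeIndex and flush_with_retry are hypothetical names, not part of Fluentd or of any ElasticSearch client, and the model assumes that a request which times out on the client side is still indexed on the server side.

```python
import uuid


class FakeIndex:
    """Toy stand-in for an index that auto-generates _id values (assumption, not a real client)."""

    def __init__(self):
        self.docs = {}

    def bulk(self, chunk, respond_in_time=True):
        # Assumption: the documents are stored even when the response comes too late.
        for record in chunk:
            self.docs[uuid.uuid4().hex] = record
        if not respond_in_time:
            raise TimeoutError("read timeout reached")


def flush_with_retry(index, chunk, outcomes):
    """Resend the whole chunk until one attempt is acknowledged in time."""
    for responds_in_time in outcomes:
        try:
            index.bulk(chunk, respond_in_time=responds_in_time)
            return
        except TimeoutError:
            continue  # the chunk stays in the buffer and is sent again


index = FakeIndex()
chunk = [{"log": "GET / HTTP/1.1 200"}]
# Two attempts time out, the third one is acknowledged.
flush_with_retry(index, chunk, outcomes=[False, False, True])
print(len(index.docs))  # 3 copies of one log line, each under a different _id
```

In this model every timed-out attempt leaves one more copy behind, which matches the "more than two copies" case mentioned earlier.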

In Kibana, it looks like this:

[Screenshot: the same log message displayed several times in Kibana]

Solving the problem

There are several ways to solve this problem. One of them is the mechanism built into the fluent-plugin-elasticsearch plugin that generates a unique hash for each document. With this mechanism, ElasticSearch recognizes repeats at the forwarding stage and prevents duplicate documents. However, this approach treats the symptom and does not eliminate the underlying error, the insufficient timeout, so we decided not to use it.
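To illustrate why a content-derived identifier prevents duplicates, here is a minimal Python sketch. It does not reproduce the plugin's implementation: content_id and index_chunk are hypothetical helpers, a plain dict stands in for the index, and SHA-1 over the JSON-serialized record is just one possible choice of hash.

```python
import hashlib
import json


def content_id(record):
    """Derive a stable ID from the record content (hypothetical helper)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def index_chunk(store, chunk):
    """Write each record under its content-derived ID; a resend overwrites in place."""
    for record in chunk:
        store[content_id(record)] = record


store = {}
chunk = [
    {"@timestamp": "2020-01-14T05:29:47.599052886+00:00", "log": "GET / HTTP/1.1 200"},
    {"@timestamp": "2020-01-14T05:29:48.102334871+00:00", "log": "GET /health HTTP/1.1 200"},
]
index_chunk(store, chunk)
index_chunk(store, chunk)  # simulated retry of the same chunk
print(len(store))  # 2: the retry did not create new documents
```

One consequence of this design is that two genuinely distinct events with identical content and timestamp would also collapse into one document, and the timeouts themselves still occur.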

We use a buffered Fluentd output plugin to prevent log loss in case of short-term network problems or a spike in logging intensity. If for some reason ElasticSearch cannot write a document to the index immediately, the document is queued and stored on disk. So, in our case, to eliminate the source of the error described above, we need to set correct values for the buffering parameters, so that the Fluentd output buffer is large enough and at the same time can be flushed in the allotted time.

Note that the values of the parameters discussed below are specific to each case of buffering in output plugins, because they depend on many factors: how intensively the services write log messages, disk performance, network load, and network bandwidth. Therefore, to obtain buffer settings that suit a particular case without being excessive, and to avoid a long blind search, you can use the debugging information that Fluentd writes to its own log while running and arrive at correct values relatively quickly.

At the time the problem was recorded, the configuration looked like this:

<buffer>
  @type file
  path /var/log/fluentd-buffers/kubernetes.test.buffer
  flush_mode interval
  retry_type exponential_backoff
  flush_thread_count 2
  flush_interval 5s
  retry_forever
  retry_max_interval 30
  chunk_limit_size 8M
  queue_limit_length 8
  overflow_action block
</buffer>

While solving the problem, we manually tuned the values of the following parameters:

  • chunk_limit_size - the size of the chunks into which the messages in the buffer are split.
  • flush_interval - the interval after which the buffer is flushed.
  • queue_limit_length - the maximum number of chunks in the queue.
  • request_timeout - how long Fluentd waits for ElasticSearch to respond to a request.

The total buffer size can be calculated by multiplying queue_limit_length by chunk_limit_size, which can be read as "the maximum number of chunks in the queue, each of the given size." If the buffer is too small, the following warning appears in the logs:

2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block

This means that the buffer cannot be flushed in the allotted time, and data arriving at the full buffer is blocked, which leads to the loss of some logs.

You can enlarge the buffer in two ways: by increasing the size of each chunk in the queue, or by increasing the number of chunks the queue can hold.

If you set chunk_limit_size to more than 32 megabytes, ElasticSearch will not accept the chunk, because the incoming request will be too large. So if you need to enlarge the buffer further, it is better to increase the maximum queue length, queue_limit_length.

Once the buffer stops overflowing and only the insufficient-timeout message remains, you can start increasing request_timeout. However, if you set it to more than 20 seconds, flushes can run longer than the default slow_flush_log_threshold of 20 seconds, and warnings like the following start to appear in the Fluentd logs:
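The buffer arithmetic above can be checked with a few lines of Python. This is a sketch under stated assumptions: parse_size and total_buffer_bytes are hypothetical helpers, size suffixes are treated as binary multiples (1M = 1024 * 1024 bytes), the 32M ceiling is simply the figure quoted in this article, and the queue length of 32 is an arbitrary illustration rather than a recommended value.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Convert a size such as '8M' to bytes (suffixes assumed to be binary multiples)."""
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def total_buffer_bytes(chunk_limit_size, queue_limit_length):
    """Total buffer capacity: number of queued chunks times the size of one chunk."""
    return parse_size(chunk_limit_size) * queue_limit_length


CHUNK_CEILING = parse_size("32M")  # ceiling quoted in the text above

current = total_buffer_bytes("8M", 8)        # the configuration shown earlier
longer_queue = total_buffer_bytes("8M", 32)  # same chunk size, longer queue (illustrative)

print(current // 1024 ** 2, longer_queue // 1024 ** 2)  # 64 256
print(parse_size("8M") <= CHUNK_CEILING)                # True
```

Growing the queue quadruples the capacity here while each individual request to ElasticSearch stays at 8M.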

2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev" 

This message does not affect the operation of the system in any way; it means that flushing the buffer took longer than the slow_flush_log_threshold parameter allows. This is debugging information, and we use it when choosing the value of request_timeout.

The general selection algorithm is as follows:

  1. Set request_timeout to a value that is guaranteed to be larger than necessary (hundreds of seconds). During tuning, the main criterion for a correct value of this parameter is that the warnings about an insufficient timeout disappear.
  2. Wait for messages about exceeding the slow_flush_log_threshold threshold. The elapsed_time field in the warning text shows how long the buffer flush actually took.
  3. Set request_timeout to a value greater than the maximum elapsed_time observed during the monitoring period. We calculate request_timeout as elapsed_time + 50%.
  4. To remove the warnings about long buffer flushes from the log, you can raise slow_flush_log_threshold. We calculate this value as elapsed_time + 25%.
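Steps 2 to 4 can be sketched as a small Python script. The function name suggest_timeouts and the rounding to one decimal place are assumptions made for illustration; the sample line is the warning quoted earlier in this article.

```python
import re

# Matches the elapsed_time field of a slow-flush warning.
ELAPSED = re.compile(r"elapsed_time=([0-9.]+)")


def suggest_timeouts(log_lines):
    """Return (request_timeout, slow_flush_log_threshold) in seconds, or None if no warnings."""
    samples = []
    for line in log_lines:
        match = ELAPSED.search(line)
        if match:
            samples.append(float(match.group(1)))
    if not samples:
        return None
    worst = max(samples)  # longest flush seen during the observation period
    return round(worst * 1.5, 1), round(worst * 1.25, 1)


lines = [
    "2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than "
    "slow_flush_log_threshold: elapsed_time=20.85753920301795 "
    'slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"',
]
print(suggest_timeouts(lines))  # (31.3, 26.1)
```

In practice the input would be the Fluentd log collected over the whole observation period, so that the maximum reflects peak load rather than a single warning.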

The final values of these parameters, as noted earlier, are obtained individually for each case. By following the algorithm above, we are guaranteed to eliminate the error that leads to repeated messages.

The table below shows how the number of errors per day that lead to duplicated messages changed as a result of tuning the parameters described above (each cell shows before/after):

Warning (before/after)        node-1    node-2    node-3    node-4
failed to flush the buffer    1749/2    694/2     47/0      1121/2
retry succeeded               410/2     205/1     24/0      241/2

It is also worth noting that the resulting settings may stop being adequate as the project grows and the volume of logs increases accordingly. The first sign of an insufficient timeout is the return of messages about long buffer flushes in the Fluentd log, that is, flushes exceeding the slow_flush_log_threshold threshold. At that point there is still a small margin before request_timeout is exceeded, so it is important to react to these messages in time and repeat the tuning process described above.

Conclusion

Fine-tuning the Fluentd output buffer is one of the main stages of configuring the EFK stack; it determines how stable the stack is and whether documents are placed in the indices correctly. With the configuration procedure described here, you can be confident that all logs are written to the ElasticSearch index in the correct order, without repeats or losses.


Source: www.habr.com
