Today it is hard to imagine a Kubernetes-based project without the ELK stack, which stores the logs of both the applications and the system components of the cluster. In our practice we use the EFK stack, with Fluentd in place of Logstash.
Fluentd is a modern, universal log collector that is steadily gaining popularity and has joined the Cloud Native Computing Foundation, which is why its development is focused on use together with Kubernetes.
Using Fluentd instead of Logstash does not change the general nature of the software stack, but Fluentd has its own specific nuances that stem from its versatility.
For example, when we started using EFK in a busy project with a high logging intensity, we found that some messages were displayed several times over in Kibana. This article explains why this happens and how to solve the problem.
The problem of document duplication
In our projects, Fluentd is deployed as a DaemonSet (Kubernetes automatically runs one instance on each node of the cluster) and monitors the containers' stdout logs in /var/log/containers. After collection and processing, the logs are sent as JSON documents to ElasticSearch, which runs either as a cluster or standalone, depending on the scale of the project and its performance and fault-tolerance requirements. Kibana is used as the graphical interface.
When using Fluentd with an output buffering plugin, we ran into a situation where some documents in ElasticSearch had exactly the same content and differed only in their identifier. Using an Nginx log as an example, you can verify that these are repeats of one message. In the log file, the following message exists as a single copy:
127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -
However, ElasticSearch contains several documents with this message:
{
"_index": "test-custom-prod-example-2020.01.02",
"_type": "_doc",
"_id": "HgGl_nIBR8C-2_33RlQV",
"_version": 1,
"_score": 0,
"_source": {
"service": "test-custom-prod-example",
"container_name": "nginx",
"namespace": "test-prod",
"@timestamp": "2020-01-14T05:29:47.599052886+00:00",
"log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
"tag": "custom-log"
}
}
{
"_index": "test-custom-prod-example-2020.01.02",
"_type": "_doc",
"_id": "IgGm_nIBR8C-2_33e2ST",
"_version": 1,
"_score": 0,
"_source": {
"service": "test-custom-prod-example",
"container_name": "nginx",
"namespace": "test-prod",
"@timestamp": "2020-01-14T05:29:47.599052886+00:00",
"log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
"tag": "custom-log"
}
}
Moreover, the same message can be repeated more than twice.
While this problem is occurring, the Fluentd logs show a large number of warnings like the following:
2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"elasticsearch", :port=>9200, :scheme=>"http", :user=>"elastic", :password=>"obfuscated"}): read timeout reached"
These warnings occur when ElasticSearch cannot return a response to a request within the time set by the request_timeout parameter, so the buffer chunk that was sent cannot be cleared. Fluentd then tries to send the chunk to ElasticSearch again, and after some number of attempts the operation completes successfully:
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce"
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747"
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2"
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b"
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"
However, ElasticSearch treats each of the transmitted buffer chunks as unique and assigns its documents new unique _id values during indexing. This is how copies of messages appear.
In Kibana, these copies show up as the same message listed several times.
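One way to confirm that such documents really are copies is to compare their content while ignoring _id. The Python sketch below is not part of the original setup: it assumes the hits have already been fetched from ElasticSearch as dictionaries, and the shortened _source values and the "unique-doc" identifier are made up for illustration.

```python
import hashlib
import json
from collections import defaultdict


def find_duplicates(hits):
    """Group hits with identical _source and return the _id lists of groups larger than one."""
    groups = defaultdict(list)
    for hit in hits:
        # Sort keys so that field order does not change the fingerprint.
        canonical = json.dumps(hit["_source"], sort_keys=True)
        fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        groups[fingerprint].append(hit["_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Sample data: the first two _id values come from the article, the rest is hypothetical.
hits = [
    {"_id": "HgGl_nIBR8C-2_33RlQV", "_source": {"log": "GET / HTTP/1.1", "tag": "custom-log"}},
    {"_id": "IgGm_nIBR8C-2_33e2ST", "_source": {"log": "GET / HTTP/1.1", "tag": "custom-log"}},
    {"_id": "unique-doc", "_source": {"log": "GET /health HTTP/1.1", "tag": "custom-log"}},
]

duplicates = find_duplicates(hits)
print(duplicates)
```

Each inner list holds the identifiers of documents that share exactly the same content.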
The solution
There are several ways to solve this problem. One of them is the mechanism built into the fluent-plugin-elasticsearch plugin that generates a unique hash for each document. With this mechanism, ElasticSearch recognizes repeats at the forwarding stage and prevents duplicate documents. However, this approach only treats the symptom and does not eliminate the underlying timeout error, so we decided not to use it.
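For illustration only, the principle behind that mechanism can be sketched in Python: if the identifier is derived from the document itself, a retried send produces the same identifier and overwrites the existing document instead of creating a copy. This shows the idea, not the plugin's actual implementation; the content_id helper and the in-memory index dictionary are hypothetical stand-ins.

```python
import hashlib
import json


def content_id(record):
    """Derive a deterministic identifier from the record content (hypothetical helper)."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


record = {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "log": '127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777',
}

# A plain dict stands in for an index keyed by _id.
index = {}

first_attempt = content_id(record)
index[first_attempt] = record

# The retry sends the same content, so it maps to the same key.
retry = content_id(dict(record))
index[retry] = record

print(len(index))
```

The index still contains a single document after the retry, but the timeout that caused the retry is untouched.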
We use a buffering plugin on the Fluentd output to prevent log loss during short-term network problems or spikes in logging intensity. If for some reason ElasticSearch cannot write a document to the index immediately, the document is queued and stored on disk. Therefore, in our case, to eliminate the source of the error described above, the buffering parameters must be set to values at which the Fluentd output buffer is large enough and still manages to be flushed within the allotted time.
Note that the values of the parameters discussed below are individual for each specific use of buffering in output plugins, because they depend on many factors: how intensively the services write messages to the log, disk performance, network channel load and its bandwidth. To obtain buffer settings that fit a particular case without being excessive, and to avoid a lengthy blind search, you can use the debugging information that Fluentd writes to its log during operation and arrive at the right values relatively quickly.
When the problem was recorded, the configuration looked like this:
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.test.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_forever
retry_max_interval 30
chunk_limit_size 8M
queue_limit_length 8
overflow_action block
</buffer>
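To solve the problem, the values of the following parameters were selected manually:
- chunk_limit_size: the size of the chunks into which the messages in the buffer are split.
- flush_interval: the time interval after which the buffer is flushed.
- queue_limit_length: the maximum number of chunks in the queue.
- request_timeout: the time for which a connection between Fluentd and ElasticSearch is established, that is, how long Fluentd waits for a response to a request.
The total buffer size can be calculated by multiplying queue_limit_length by chunk_limit_size, which can be read as "the maximum number of chunks in the queue, each of a given size". If the buffer is too small, the following warning appears in the logs: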
2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block
This means that the buffer cannot be flushed within the allotted time and that data arriving at the full buffer is blocked, which leads to the loss of part of the logs.
The buffer can be enlarged in two ways: by increasing the size of each chunk in the queue, or by increasing the number of chunks the queue can hold.
If chunk_limit_size is set to more than 32 megabytes, ElasticSearch will not accept the chunk because the incoming packet is too large. So if the buffer needs to grow further, it is better to increase the maximum queue length, queue_limit_length.
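As a quick check of this arithmetic, the following Python sketch computes the total buffer size for the original configuration (8M chunks, queue length 8) and compares a chunk size against the 32-megabyte ceiling mentioned above. The helper names are hypothetical, and the suffix handling assumes that k, m, g and t denote multiples of 1024 bytes.

```python
# Size suffixes, assumed to be powers of 1024.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

# Ceiling for a single chunk, taken from the statement in the article.
MAX_CHUNK_BYTES = 32 * 1024**2


def parse_size(value):
    """Convert a size string such as '8M' into bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def total_buffer_bytes(chunk_limit_size, queue_limit_length):
    """Total buffer size = chunk size multiplied by the maximum number of queued chunks."""
    return parse_size(chunk_limit_size) * queue_limit_length


total = total_buffer_bytes("8M", 8)
chunk_ok = parse_size("8M") <= MAX_CHUNK_BYTES

print(total // 1024**2, "MiB")  # prints: 64 MiB
print("chunk size within limit:", chunk_ok)
```

Doubling queue_limit_length to 16 doubles the total to 128 MiB while leaving each chunk well under the ceiling.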
Once the buffer stops overflowing and only the message about an insufficient timeout remains, you can start increasing the request_timeout parameter. However, if the value is set to more than 20 seconds, the following warnings start to appear in the Fluentd logs:
2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"
This message does not affect the operation of the system in any way; it means that flushing the buffer took longer than the time set by the slow_flush_log_threshold parameter. It is debugging information, and we use it when choosing the value of request_timeout.
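Collecting these elapsed_time values by hand is tedious, so a small script can help. The Python sketch below assumes the warnings have the same format as the line quoted above; the regular expression and the max_elapsed_time helper are illustrative and are not part of Fluentd.

```python
import re

# Matches the slow-flush warning and captures both numeric fields.
SLOW_FLUSH = re.compile(
    r"buffer flush took longer time than slow_flush_log_threshold: "
    r"elapsed_time=(?P<elapsed>[0-9.]+) "
    r"slow_flush_log_threshold=(?P<threshold>[0-9.]+)"
)


def max_elapsed_time(lines):
    """Return the largest elapsed_time found in the given log lines, or None if there is none."""
    values = []
    for line in lines:
        match = SLOW_FLUSH.search(line)
        if match:
            values.append(float(match.group("elapsed")))
    return max(values) if values else None


log_lines = [
    '2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than '
    'slow_flush_log_threshold: elapsed_time=20.85753920301795 '
    'slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"',
    '2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. '
    'chunk_id="59c37fc3fb320608692c352802b973ce"',
]

peak = max_elapsed_time(log_lines)
print(peak)
```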
The generalized selection algorithm is as follows:
- Set request_timeout to a value that is guaranteed to be larger than necessary (hundreds of seconds). During tuning, the main sign that this parameter is set correctly is that the warnings about an insufficient timeout disappear.
- Wait for messages about the slow_flush_log_threshold threshold being exceeded. The elapsed_time field in the warning text shows how long the buffer flush actually took.
- Set request_timeout to a value larger than the maximum elapsed_time observed during the monitoring period. We calculate request_timeout as elapsed_time + 50%.
- To remove the warnings about long buffer flushes from the log, raise the value of slow_flush_log_threshold. We calculate this value as elapsed_time + 25%.
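The last two steps come down to simple arithmetic, sketched below in Python. The suggest_settings helper and the observed values are hypothetical; only the +50% and +25% margins come from the steps above.

```python
def suggest_settings(elapsed_times):
    """Apply the margins from the algorithm to the largest observed elapsed_time (seconds)."""
    peak = max(elapsed_times)
    return {
        "request_timeout": peak * 1.5,            # elapsed_time + 50%
        "slow_flush_log_threshold": peak * 1.25,  # elapsed_time + 25%
    }


# Hypothetical elapsed_time values collected during the observation period.
observed = [20.0, 24.0, 18.5]

settings = suggest_settings(observed)
print(settings)
```

With a peak of 24 seconds this gives a request_timeout of 36 seconds and a slow_flush_log_threshold of 30 seconds, so the slow-flush warning would fire before the timeout is reached.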
As noted earlier, the final values of these parameters are individual for each case. Following the algorithm above reliably eliminates the error that leads to repeated messages.
The table below shows how the number of errors per day that lead to message duplication changed in the course of selecting the parameter values described above:
| Warning (before/after tuning) | Node 1 | Node 2 | Node 3 | Node 4 |
|---|---|---|---|---|
| failed to flush the buffer | 1749/2 | 694/2 | 47/0 | 1121/2 |
| retry succeeded | 410/2 | 205/1 | 24/0 | 241/2 |
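Per-day counts like those in the table can be gathered from the Fluentd log with a few lines of Python. This is a sketch under the assumption that every log line starts with a YYYY-MM-DD date, as in the excerpts quoted earlier; the first sample line is abbreviated and the count_per_day helper is hypothetical.

```python
from collections import Counter

# Warning texts taken from the log excerpts in the article.
PATTERNS = ("failed to flush the buffer", "retry succeeded")


def count_per_day(lines):
    """Count occurrences of each warning text, keyed by (date, warning)."""
    counts = Counter()
    for line in lines:
        day = line[:10]  # assumes the line starts with YYYY-MM-DD
        for pattern in PATTERNS:
            if pattern in line:
                counts[(day, pattern)] += 1
    return counts


log_lines = [
    # Abbreviated version of the warning quoted earlier.
    "2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4",
    '2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce"',
    '2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"',
]

counts = count_per_day(log_lines)
for (day, pattern), number in sorted(counts.items()):
    print(day, pattern, number)
```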
Note also that the resulting settings may become outdated as the project grows and the volume of logs increases with it. The first sign of an insufficient timeout is the return of messages about long buffer flushes in the Fluentd log, that is, the slow_flush_log_threshold threshold being exceeded. From that point there is still a small margin before the request_timeout value is exceeded, so you need to respond to these messages promptly and repeat the selection process described above.
Conclusion
Fine-tuning the Fluentd output buffer is one of the main steps in configuring the EFK stack; it determines how stable the stack is and whether documents are placed correctly in the indices. With the configuration algorithm described here, you can be sure that all logs are written to the ElasticSearch index in the correct order, without repeats or losses.
Source: habr.com