我的名字是 Igor Sidorenko，我是管理團隊的技術主管，負責維護整個 Domclick 基礎設施的正常運作。

我想分享我在 Elasticsearch 中設定分散式資料儲存的經驗。我們將考慮節點上的哪些設定負責分片的分佈，以及 ILM 的結構和工作方式。

不管怎樣，使用日誌的人都面臨著長期儲存以供後續分析的問題。在 Elasticsearch 中，這一點尤其重要，因為策展人功能非常糟糕。版本 6.6 引進了 ILM 功能。它包含 4 個階段：

熱門－索引被主動更新和查詢。
暖 - 索引不再更新，但仍被查詢。
冷——索引不再更新並且很少被查詢。資訊仍然可以搜索，但查詢速度可能會更慢。
刪除 - 不再需要該索引，可以安全刪除。

給定

Elasticsearch Data Hot：24 個處理器、128 GB 記憶體、1,8 TB SSD RAID 10（8 個節點）。
Elasticsearch 資料熱備份：24 個處理器、64 GB 記憶體、8 TB NetApp SSD 策略（4 個節點）。
Elasticsearch 資料冷：8 個處理器、32 GB 記憶體、128 TB HDD RAID 10（4 個節點）。

目標

這些設定是單獨的，一切都取決於節點上的空間、索引數量、日誌等。對我們來說，這是每天 2-3 TB 的數據。

5 天 – 熱階段（8 個主伺服器/1 個副本）。
20 天 - 溫暖階段（收縮指數 4 個主伺服器/1 個副本伺服器）。
90 天 - 冷期（凍結指數 4 個主伺服器/1 個副本伺服器）。
120 天——刪除階段。

設定 Elasticsearch

要在節點之間分配分片，只需要一個參數：

熱賣-節點：

~]# cat /etc/elasticsearch/elasticsearch.yml | grep attr
# Add custom attributes to the node:
node.attr.box_type: hot

溫暖-節點：

~]# cat /etc/elasticsearch/elasticsearch.yml | grep attr
# Add custom attributes to the node:
node.attr.box_type: warm

冷-節點：

~]# cat /etc/elasticsearch/elasticsearch.yml | grep attr
# Add custom attributes to the node:
node.attr.box_type: cold

設定Logstash

這一切是如何運作的？我們又是如何實現這項功能的？讓我們先將日誌放入 Elasticsearch。有兩種方法：

Logstash 從 Kafka 中提取日誌。可以將其清洗乾淨或側放變形。
某些內容寫入 Elasticsearch 本身，例如 APM 伺服器。

我們來看一個透過 Logstash 管理索引的範例。它創建一個索引並將其應用於索引模板以及相應的 ILM.

k8s-ingress.conf

input {
    kafka {
        bootstrap_servers => "node01, node02, node03"
        topics => ["ingress-k8s"]
        decorate_events => false
        codec => "json"
    }
}

filter {
    ruby {
        path => "/etc/logstash/conf.d/k8s-normalize.rb"
    }
    if [log] =~ "[warn]" or [log] =~ "[error]" or [log] =~ "[notice]" or [log] =~ "[alert]" {
        grok {
            match => { "log" => "%{DATA:[nginx][error][time]} [%{DATA:[nginx][error][level]}] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: *%{NUMBER:[nginx][error][connection_id]} %{DATA:[nginx][error][message]}, client: %{IPORHOST:[nginx][error][remote_ip]}, server: %{DATA:[nginx][error][server]}, request: "%{WORD:[nginx][error][method]} %{DATA:[nginx][error][url]} HTTP/%{NUMBER:[nginx][error][http_version]}", (?:upstream: "%{DATA:[nginx][error][upstream][proto]}://%{DATA:[nginx][error][upstream][host]}:%{DATA:[nginx][error][upstream][port]}/%{DATA:[nginx][error][upstream][url]}", )?host: "%{DATA:[nginx][error][host]}"(?:, referrer: "%{DATA:[nginx][error][referrer]}")?" }
            remove_field => "log"
        }
    }
    else {
        grok {
            match => { "log" => "%{IPORHOST:[nginx][access][host]} - [%{IPORHOST:[nginx][access][remote_ip]}] - %{DATA:[nginx][access][remote_user]} [%{HTTPDATE:[nginx][access][time]}] "%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][bytes_sent]} "%{DATA:[nginx][access][referrer]}" "%{DATA:[nginx][access][agent]}" %{NUMBER:[nginx][access][request_lenght]} %{NUMBER:[nginx][access][request_time]} [%{DATA:[nginx][access][upstream][name]}] (?:-|%{IPORHOST:[nginx][access][upstream][addr]}:%{NUMBER:[nginx][access][upstream][port]}) (?:-|%{NUMBER:[nginx][access][upstream][response_lenght]}) %{DATA:[nginx][access][upstream][response_time]} %{DATA:[nginx][access][upstream][status]} %{DATA:[nginx][access][request_id]}" }
            remove_field => "log"
        }
    }
}
output {
    elasticsearch {
        id => "k8s-ingress"
        hosts => ["node01", "node02", "node03", "node04", "node05", "node06", "node07", "node08"]
        manage_template => true # включаем управление шаблонами
        template_name => "k8s-ingress" # имя применяемого шаблона
        ilm_enabled => true # включаем управление ILM
        ilm_rollover_alias => "k8s-ingress" # alias для записи в индексы, должен быть уникальным
        ilm_pattern => "{now/d}-000001" # шаблон для создания индексов, может быть как "{now/d}-000001" так и "000001"
        ilm_policy => "k8s-ingress" # политика прикрепляемая к индексу
        index => "k8s-ingress-%{+YYYY.MM.dd}" # название создаваемого индекса, может содержать %{+YYYY.MM.dd}, зависит от ilm_pattern
    }
}

設定 Kibana

有一個適用於所有新索引的基本範本。它定義了熱索引的分佈、分片數量、副本數等。模板的權重由選項決定 order。具有較高權重的範本會覆蓋現有範本參數或新增新的範本參數。

GET_模板/預設

{
  "default" : {
    "order" : -1, # вес шаблона
    "version" : 1,
    "index_patterns" : [
      "*" # применяем ко всем индексам
    ],
    "settings" : {
      "index" : {
        "codec" : "best_compression", # уровень сжатия
        "routing" : {
          "allocation" : {
            "require" : {
              "box_type" : "hot" # распределяем только по горячим нодам
            },
            "total_shards_per_node" : "8" # максимальное количество шардов на ноду от одного индекса
          }
        },
        "refresh_interval" : "5s", # интервал обновления индекса
        "number_of_shards" : "8", # количество шардов
        "auto_expand_replicas" : "0-1", # количество реплик на ноду от одного индекса
        "number_of_replicas" : "1" # количество реплик
      }
    },
    "mappings" : {
      "_meta" : { },
      "_source" : { },
      "properties" : { }
    },
    "aliases" : { }
  }
}

然後我們將映射應用於索引 k8s-ingress-* 使用權重較高的模板。

取得 _template/k8s-ingress

{
  "k8s-ingress" : {
    "order" : 100,
    "index_patterns" : [
      "k8s-ingress-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "k8s-ingress",
          "rollover_alias" : "k8s-ingress"
        },
        "codec" : "best_compression",
        "routing" : {
          "allocation" : {
            "require" : {
              "box_type" : "hot"
            }
          }
        },
        "number_of_shards" : "8",
        "number_of_replicas" : "1"
      }
    },
    "mappings" : {
      "numeric_detection" : false,
      "_meta" : { },
      "_source" : { },
      "dynamic_templates" : [
        {
          "all_fields" : {
            "mapping" : {
              "index" : false,
              "type" : "text"
            },
            "match" : "*"
          }
        }
      ],
      "date_detection" : false,
      "properties" : {
        "kubernetes" : {
          "type" : "object",
          "properties" : {
            "container_name" : {
              "type" : "keyword"
            },
            "container_hash" : {
              "index" : false,
              "type" : "keyword"
            },
            "host" : {
              "type" : "keyword"
            },
            "annotations" : {
              "type" : "object",
              "properties" : {
                "value" : {
                  "index" : false,
                  "type" : "text"
                },
                "key" : {
                  "index" : false,
                  "type" : "keyword"
                }
              }
            },
            "docker_id" : {
              "index" : false,
              "type" : "keyword"
            },
            "pod_id" : {
              "type" : "keyword"
            },
            "labels" : {
              "type" : "object",
              "properties" : {
                "value" : {
                  "type" : "keyword"
                },
                "key" : {
                  "type" : "keyword"
                }
              }
            },
            "namespace_name" : {
              "type" : "keyword"
            },
            "pod_name" : {
              "type" : "keyword"
            }
          }
        },
        "@timestamp" : {
          "type" : "date"
        },
        "nginx" : {
          "type" : "object",
          "properties" : {
            "access" : {
              "type" : "object",
              "properties" : {
                "agent" : {
                  "type" : "text"
                },
                "response_code" : {
                  "type" : "integer"
                },
                "upstream" : {
                  "type" : "object",
                  "properties" : {
                    "port" : {
                      "type" : "keyword"
                    },
                    "name" : {
                      "type" : "keyword"
                    },
                    "response_lenght" : {
                      "type" : "integer"
                    },
                    "response_time" : {
                      "index" : false,
                      "type" : "text"
                    },
                    "addr" : {
                      "type" : "keyword"
                    },
                    "status" : {
                      "index" : false,
                      "type" : "text"
                    }
                  }
                },
                "method" : {
                  "type" : "keyword"
                },
                "http_version" : {
                  "type" : "keyword"
                },
                "bytes_sent" : {
                  "type" : "integer"
                },
                "request_lenght" : {
                  "type" : "integer"
                },
                "url" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword"
                    }
                  }
                },
                "remote_user" : {
                  "type" : "text"
                },
                "referrer" : {
                  "type" : "text"
                },
                "remote_ip" : {
                  "type" : "ip"
                },
                "request_time" : {
                  "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis||dd/MMM/YYYY:H:m:s Z",
                  "type" : "date"
                },
                "host" : {
                  "type" : "keyword"
                },
                "time" : {
                  "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis||dd/MMM/YYYY:H:m:s Z",
                  "type" : "date"
                }
              }
            },
            "error" : {
              "type" : "object",
              "properties" : {
                "server" : {
                  "type" : "keyword"
                },
                "upstream" : {
                  "type" : "object",
                  "properties" : {
                    "port" : {
                      "type" : "keyword"
                    },
                    "proto" : {
                      "type" : "keyword"
                    },
                    "host" : {
                      "type" : "keyword"
                    },
                    "url" : {
                      "type" : "text",
                      "fields" : {
                        "keyword" : {
                          "type" : "keyword"
                        }
                      }
                    }
                  }
                },
                "method" : {
                  "type" : "keyword"
                },
                "level" : {
                  "type" : "keyword"
                },
                "http_version" : {
                  "type" : "keyword"
                },
                "pid" : {
                  "index" : false,
                  "type" : "integer"
                },
                "message" : {
                  "type" : "text"
                },
                "tid" : {
                  "index" : false,
                  "type" : "keyword"
                },
                "url" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword"
                    }
                  }
                },
                "referrer" : {
                  "type" : "text"
                },
                "remote_ip" : {
                  "type" : "ip"
                },
                "connection_id" : {
                  "index" : false,
                  "type" : "keyword"
                },
                "host" : {
                  "type" : "keyword"
                },
                "time" : {
                  "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis||dd/MMM/YYYY:H:m:s Z",
                  "type" : "date"
                }
              }
            }
          }
        },
        "log" : {
          "type" : "text"
        },
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "ignore_above" : 256,
              "type" : "keyword"
            }
          }
        },
        "eventtime" : {
          "type" : "float"
        }
      }
    },
    "aliases" : { }
  }
}

套用所有範本後，我們套用 ILM 策略並開始監控索引的生命週期。

取得 _ilm/policy/k8s-ingress

{
  "k8s-ingress" : {
    "version" : 14,
    "modified_date" : "2020-06-11T10:27:01.448Z",
    "policy" : {
      "phases" : {
        "warm" : { # теплая фаза
          "min_age" : "5d", # срок жизни индекса после ротации до наступления теплой фазы
          "actions" : {
            "allocate" : {
              "include" : { },
              "exclude" : { },
              "require" : {
                "box_type" : "warm" # куда перемещаем индекс
              }
            },
            "shrink" : {
              "number_of_shards" : 4 # обрезание индексов, т.к. у нас 4 ноды
            }
          }
        },
        "cold" : { # холодная фаза
          "min_age" : "25d", # срок жизни индекса после ротации до наступления холодной фазы
          "actions" : {
            "allocate" : {
              "include" : { },
              "exclude" : { },
              "require" : {
                "box_type" : "cold" # куда перемещаем индекс
              }
            },
            "freeze" : { } # замораживаем для оптимизации
          }
        },
        "hot" : { # горячая фаза
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "50gb", # максимальный размер индекса до ротации (будет х2, т.к. есть 1 реплика)
              "max_age" : "1d" # максимальный срок жизни индекса до ротации
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : { # фаза удаления
          "min_age" : "120d", # максимальный срок жизни после ротации перед удалением
          "actions" : {
            "delete" : { }
          }
        }
      }
    }
  }
}

問題

設定和調試階段出現了問題。

熱相

為了正確旋轉索引，末尾的存在至關重要 index_name-date-000026 格式化數字 000001。程式碼中有一些行使用正規表示式檢查末尾的數字索引。否則，就會出現錯誤，策略將不會應用於索引，並且它將始終處於熱階段。

暖期

收縮（切割）－減少分片的數量，因為我們在溫暖和冷階段有 4 個節點。該文件包含以下幾行：

索引必須是唯讀的。

索引中每個分片的副本必須位於同一節點上。

集群健康狀態必須為綠色。

為了修剪索引，Elasticsearch 會將所有主分片移到一個節點，使用所需的參數複製修剪後的索引，然後刪除舊索引。範圍 total_shards_per_node 必須等於或大於適合一個節點的主分片數量。否則，將會有通知，分片將不會移動到所需的節點。

取得 /shrink-k8s-ingress-2020.06.06-000025/_settings

{
  "shrink-k8s-ingress-2020.06.06-000025" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "5s",
        "auto_expand_replicas" : "0-1",
        "blocks" : {
          "write" : "true"
        },
        "provided_name" : "shrink-k8s-ingress-2020.06.06-000025",
        "creation_date" : "1592225525569",
        "priority" : "100",
        "number_of_replicas" : "1",
        "uuid" : "psF4MiFGQRmi8EstYUQS4w",
        "version" : {
          "created" : "7060299",
          "upgraded" : "7060299"
        },
        "lifecycle" : {
          "name" : "k8s-ingress",
          "rollover_alias" : "k8s-ingress",
          "indexing_complete" : "true"
        },
        "codec" : "best_compression",
        "routing" : {
          "allocation" : {
            "initial_recovery" : {
              "_id" : "_Le0Ww96RZ-o76bEPAWWag"
            },
            "require" : {
              "_id" : null,
              "box_type" : "cold"
            },
            "total_shards_per_node" : "8"
          }
        },
        "number_of_shards" : "4",
        "routing_partition_size" : "1",
        "resize" : {
          "source" : {
            "name" : "k8s-ingress-2020.06.06-000025",
            "uuid" : "gNhYixO6Skqi54lBjg5bpQ"
          }
        }
      }
    }
  }
}

冷相

凍結（凍結）－我們凍結索引以根據歷史資料最佳化查詢。

對凍結索引執行的搜尋使用小型、專用、search_throttled 執行緒池來控制每個節點上命中凍結分片的並發搜尋數量。這限制了與凍結分片相對應的瞬態資料結構所需的額外記憶體量，從而保護節點免受過多的記憶體消耗。
凍結索引是唯讀的：您無法對其進行索引。
預計對凍結索引的搜尋將會執行緩慢。凍結索引不適用於高搜尋負載。即使在索引未凍結時相同的搜尋在幾毫秒內完成，凍結索引的搜尋可能需要幾秒鐘或幾分鐘才能完成。

結果

我們學習如何準備節點以使用 ILM、如何設定在熱節點之間分配分片的範本以及如何為具有所有生命階段的索引設定 ILM。

有用的鏈接

來源： www.habr.com

Elasticsearch 中的長期數據存儲

給定

目標

設定 Elasticsearch

設定Logstash

設定 Kibana

問題

熱相

暖期

冷相

結果

有用的鏈接