ClickHouse Database for Humans, or Alien Technologies

Aleksey Lizunov, Head of the Competence Center for Remote Service Channels, Information Technology Directorate, MKB


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of ClickHouse as a data store for logs.

In this article, we would like to share our experience of using ClickHouse and the preliminary results of the pilot operation. We should say right away that the results were impressive.



Below, we describe in more detail how our system is set up and which components it consists of. But first, I would like to say a few words about this database as a whole, and why it deserves attention. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; originally it was the main data store for Yandex.Metrica. It is an open-source system and free of charge. As a developer, I have always wondered how they implemented it, given the huge amounts of data involved. And the Metrica user interface itself is very flexible and fast. On first acquaintance with this database, the impression is: "Well, finally! Made for people! Everything, from the installation process to sending queries."

This database has a very low entry threshold. A competent developer can install it in a few minutes and start using it. Everything works clearly. Even newcomers to Linux can quickly get through the installation and perform simple operations. Previously, the words Big Data, Hadoop, Google BigTable and HDFS made an ordinary developer think of terabytes and petabytes, and of some superhumans who configure and develop such systems. With the arrival of ClickHouse, we got a simple, understandable tool that can solve a range of tasks that were previously out of reach. It takes only one fairly ordinary machine and five minutes to install. That is, we got a database like MySQL, but for storing billions of records! A kind of super-archiver with an SQL language. It is as if people had been handed alien weapons.

About our logging system

To collect information, we use the IIS log files of standard-format web applications. We are also currently parsing application logs, but the main goal of the pilot stage is collecting IIS logs.

For various reasons, we could not completely abandon the ELK stack. We continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general logging scheme]

A peculiarity of writing data to ClickHouse is that records should be inserted infrequently (about once per second) and in large batches. This is probably the most "problematic" part you run into when you first start working with ClickHouse: the architecture becomes a little more complicated.
The LogStash plugin that inserts data directly into ClickHouse helps a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended. From a practical standpoint, however, it saves us from setting up separate servers for now. We have not observed any failures or resource contention with the database. Note also that the plugin retries requests when errors occur. If errors persist, the plugin writes the batch that could not be inserted to disk. The file format is convenient: after fixing it, you can easily insert the corrected batch with clickhouse-client.
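The batching pattern described above accumulates rows and flushes them either when the batch is full or after a short idle period. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation:

- The class name `BatchBuffer` and the `sink` callback are hypothetical.
- `flush_size` and `idle_flush_time` only mirror the names of the plugin settings used in the Logstash config later in the article.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Hypothetical batcher: collects rows and hands them to `sink` in batches.

    A batch is flushed when it reaches `flush_size` rows, or when `add`/`tick`
    runs and at least `idle_flush_time` seconds have passed since the last flush.
    """

    def __init__(self, sink: Callable[[List[dict]], None],
                 flush_size: int = 10000, idle_flush_time: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.clock = clock
        self.rows: List[dict] = []
        self.last_flush = clock()

    def add(self, row: dict) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.flush_size:
            self.flush()
        else:
            self.tick()

    def tick(self) -> None:
        # Called periodically as well: flush a partial batch once it has waited long enough.
        if self.rows and self.clock() - self.last_flush >= self.idle_flush_time:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            self.sink(self.rows)  # one large INSERT instead of many small ones
            self.rows = []
        self.last_flush = self.clock()
```

With `flush_size=3`, adding seven rows produces two full batches immediately. The seventh row goes out on the first `tick()` after the idle timeout.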

The full list of software used in the project is shown in the table:

List of software used

Name

Description

Distribution link

NGINX

Reverse proxy for restricting access by port and handling authorization

Not currently used in the project

https://nginx.org/ru/download.html

https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat

Shipping file logs.

https://www.elastic.co/downloads/beats/filebeat (distribution for Windows 64-bit).

https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

LogStash

Log collector.

Used to collect logs from FileBeat, as well as from the RabbitMQ queue (for servers located in the DMZ).

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse

Logstash plugin for shipping logs to ClickHouse in batches

https://github.com/mikechris/logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune

/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse

Log storage https://clickhouse.yandex/docs/ru/

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm

Note. Since August 2018, "regular" rpm builds for RHEL have been available in the Yandex repository, so you can try using them. At the time of installation, we used the packages built by Altinity.

Grafana

Log visualization. Setting up dashboards

https://grafana.com/

https://grafana.com/grafana/download

Redhat & Centos (64 Bit), latest version

ClickHouse datasource for Grafana 4.6+

Grafana plugin providing a ClickHouse data source

https://grafana.com/plugins/vertamedia-clickhouse-datasource

https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

LogStash

Log router from FileBeat to the RabbitMQ queue.

Note. Unfortunately, FileBeat has no output that writes directly to RabbitMQ, so an intermediate link in the form of Logstash is required.

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ

Message queue. This is the log buffer in the DMZ

https://www.rabbitmq.com/download.html

https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)

Erlang runtime. Required for RabbitMQ to work

http://www.erlang.org/download.html

https://www.rabbitmq.com/install-rpm.html#install-erlang
http://www.erlang.org/downloads/21.3

The configuration of the server running ClickHouse is shown in the following table:

Name

Value

Note

Configuration

HDD: 40GB
RAM: 8GB
Processor: Core 2 2Ghz

Follow the recommendations for operating ClickHouse (https://clickhouse.yandex/docs/ru/operations/tips/)

General system software

OS: Red Hat Enterprise Linux Server (Maipo)

JRE (Java 8)

As you can see, this is an ordinary workstation.

The table for storing logs has the following structure:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default partitioning (by month) and the default index granularity. All fields correspond to IIS log entries for HTTP requests. Note the separate fields for storing utm tags: they are parsed out of the query string field when rows are inserted into the table.

In addition, several system fields have been added to the table to store information about systems, components and servers. These fields are described in the table below. We store logs for several systems in one table.
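The utm extraction is done in the pipeline with the Logstash `kv` filter plus a `mutate`/`join` step, shown later in the article. It can be approximated in Python with the standard library. This is a sketch, not the filter's exact semantics:

- `parse_qs` also percent-decodes values, which the `kv` filter does not.
- The function name `extract_utm` is hypothetical.
- Repeated keys are joined with a comma, mirroring the `join` step.

```python
from typing import Dict
from urllib.parse import parse_qs

# Keys kept by the pipeline (same list as `include_keys` in the kv filter).
UTM_KEYS = ("utm_medium", "utm_source", "utm_campaign", "utm_term",
            "utm_content", "yclid", "region")


def extract_utm(uri_query: str, prefix: str = "uriQuery__") -> Dict[str, str]:
    """Return prefixed utm fields found in an IIS cs-uri-query value.

    IIS writes "-" when there is no query string; parse_qs ignores it.
    Repeated keys are joined with "," like the mutate/join step.
    """
    parsed = parse_qs(uri_query)
    return {prefix + key: ",".join(parsed[key]) for key in UTM_KEYS if key in parsed}
```

Keys not present in the query string are simply absent from the result. The corresponding ClickHouse columns then receive their default value (an empty string).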

Name

Description

Example

fld_app_name

Application/system name
Valid values:

  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • internal-site1.domain.local Internal site 1

site1.domain.com

fld_app_module

System module
Valid values:

  • web - Website
  • svc - Website web service
  • intgr - Integration web service
  • bo - Admin panel (BackOffice)

web

fld_website_name

Site name in IIS

Several systems, or even several instances of one system module, can be deployed on one server

web-main

fld_server_name

Server name

web1.domain.com

fld_log_file_name

Path to the log file on the server

C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

Grafana makes it easy to build charts from this data. For example, you can view the requests coming to the frontend of a particular system. This resembles the site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records broken down by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Data size on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
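A quick back-of-the-envelope check of these figures helps put them in perspective. The numbers are copied from the query output above, with GiB taken as 2^30 bytes:

```python
GIB = 2 ** 30

uncompressed_gib = 54.50   # from system.parts above
compressed_gib = 4.86
total_rows = 211_427_094

ratio = uncompressed_gib / compressed_gib
bytes_per_row_raw = uncompressed_gib * GIB / total_rows
bytes_per_row_disk = compressed_gib * GIB / total_rows

print(f"compression ratio: {ratio:.1f}x")  # compression ratio: 11.2x
print(f"bytes per row: {bytes_per_row_raw:.0f} raw, {bytes_per_row_disk:.1f} on disk")
# bytes per row: 277 raw, 24.7 on disk
```

In other words, an IIS log line occupies roughly 25 bytes on disk.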

Data compression ratio by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Shipping file logs

This component tracks changes to log files on disk and passes the data to LogStash. It is installed on every server where log files are written (usually IIS). It works in tail mode, i.e. it ships only the records appended to a file. It can also be configured separately to ship whole files. This is useful when you need to load data for previous months: just put the log file into a folder and it will be read in its entirety.

When the service stops, data stops flowing further to the storage.

An example configuration looks like this:
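Tail mode boils down to remembering, per file, the byte offset that has already been shipped, and reading only what was appended since then. Filebeat keeps this kind of state in its registry. What follows is only a toy model of the idea, not Filebeat's code:

- The function name `read_new_lines` is hypothetical.
- An in-memory dict stands in for the registry.
- A file seen for the first time is read from the start, i.e. the "whole file" mode used for importing old logs.

```python
import os
from typing import Dict, List


def read_new_lines(path: str, offsets: Dict[str, int]) -> List[str]:
    """Return complete lines appended to `path` since the stored offset.

    `offsets` maps file path -> byte offset already shipped (a stand-in for
    Filebeat's registry) and is updated in place.
    """
    start = offsets.get(path, 0)
    if os.path.getsize(path) < start:
        start = 0  # file was truncated or replaced: start over
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read()
    # Ship only complete lines; a trailing partial line waits for the next call.
    end = data.rfind(b"\n") + 1
    offsets[path] = start + end
    return data[:end].decode("utf-8").splitlines()
```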

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log collector

This component receives log entries from FileBeat (or via the RabbitMQ queue), parses them and inserts them into ClickHouse in batches.

Inserts into ClickHouse are done with the Logstash-output-clickhouse plugin. The plugin has a request retry mechanism, but for a planned shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is going to be long, it is better to stop Filebeat on the servers as well. In the setup without RabbitMQ (on the local network, Filebeat sends logs directly to Logstash), Filebeat behaves quite acceptably and safely, so an unavailable output has no consequences for it.

An example configuration looks like this:
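The retry-then-save behaviour described in this article can be sketched as follows. This is a hypothetical model, not the plugin's source:

- `send` stands for whatever POSTs a JSONEachRow body to ClickHouse and raises on failure.
- `request_tolerance` is borrowed from the plugin config and is interpreted here, as an assumption, as the number of retries.
- The `{table}_failed.json` name matches the `log_web_failed.json` file handled in the useful commands later on.

```python
import json
import os
import time
from typing import Callable, List


def insert_with_retry(send: Callable[[str], None], rows: List[dict], table: str,
                      save_dir: str, request_tolerance: int = 1,
                      backoff: float = 0.0) -> bool:
    """Try to insert one batch; once the retries are exhausted, dump it to disk.

    Returns True if the batch was inserted, False if it was saved to a file.
    """
    body = "\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + "\n"
    for attempt in range(request_tolerance + 1):
        try:
            send(body)
            return True
        except Exception:  # sketch: treat any error from `send` as retryable
            if attempt < request_tolerance:
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    # One JSON object per line, ready for
    # clickhouse-client --query="INSERT INTO <table> FORMAT JSONEachRow" < file
    path = os.path.join(save_dir, f"{table}_failed.json")
    with open(path, "a", encoding="utf-8") as f:
        f.write(body)
    return False
```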

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          # event['field'] access was removed in Logstash 5; use event.get/event.set
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            # field names are case-sensitive: grok stores the agent as "userAgent"
            source=> "userAgent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
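To make the filter section easier to follow, here is a rough Python equivalent of what it does to one IIS W3C line:

- Drop `#` directive lines.
- Split the default fields in the same order as the grok pattern.
- Derive `logdatetime`/`logdate` from the UTC timestamp, and convert `port`/`timetaken` to integers.
- Fill `clientRealIP`.

This is a sketch for illustration only. It assumes the default IIS field set without the extra X-Real-IP/X-Forwarded-For columns, and `parse_iis_line` is a hypothetical name. It keeps the timestamp in UTC, whereas the Ruby snippet above formats it with `Time.at`, i.e. in the Logstash host's local time zone.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Same order as in the grok pattern (default IIS W3C fields after date and time).
IIS_FIELDS = ["serverIP", "method", "uriStem", "uriQuery", "port", "username",
              "clientIP", "userAgent", "referer", "response", "subresponse",
              "win32response", "timetaken"]


def parse_iis_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse one IIS W3C log line; return None for directives and unparsable lines."""
    if line.startswith("#"):
        return None  # like `drop {}` for "#Fields:", "#Date:" header lines
    parts = line.split()
    if len(parts) != 2 + len(IIS_FIELDS):
        return None  # grok would tag such an event with _grokparsefailure
    event: Dict[str, Any] = dict(zip(IIS_FIELDS, parts[2:]))
    # IIS writes W3C timestamps in UTC (cf. timezone => "Etc/UTC" in the date filter).
    ts = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)
    event["logdatetime"] = ts.strftime("%Y-%m-%d %H:%M:%S")
    event["logdate"] = ts.strftime("%Y-%m-%d")
    # Equivalents of mutate/convert.
    event["port"] = int(event["port"])
    event["timetaken"] = int(event["timetaken"])
    # No X-Real-IP / X-Forwarded-For columns in this format: fall back to c-ip.
    event["clientRealIP"] = event["clientIP"]
    return event
```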

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

Logs for all systems are written to one table (see the beginning of the article). The table is intended for information about requests: all parameters are similar across formats such as IIS, apache and nginx logs. Application logs record, for example, errors, informational messages and warnings. They will get a separate table with an appropriate structure (currently at the design stage).

When designing a table, it is very important to choose the primary key, i.e. the key by which data is sorted in storage. Both the degree of data compression and query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, the data is sorted by system name, system component name and event date. Initially, the event date came first. After we moved it to the last position, queries ran about twice as fast. Changing the primary key requires re-creating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is worth thinking carefully about what to include in the sort key.
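Why the key order matters can be illustrated with a much simplified model of the MergeTree sparse index:

- Rows are stored sorted by the key.
- Only the first key of every `index_granularity` rows (a "granule") is kept in the index.
- A query reads only the granules whose key range can contain matching rows.

The model below is pure Python, with a hypothetical function name, a single part and no partitions. It counts the granules a filter on `(fld_app_name, fld_app_module)` has to read.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Key = Tuple[str, str, int]  # (fld_app_name, fld_app_module, logdatetime)


def granules_to_read(rows: List[Key], app: str, module: str, granularity: int) -> int:
    """Count granules that may hold rows with the given app and module.

    `rows` must already be sorted by the full key, as MergeTree stores them.
    The sparse "index" keeps the (app, module) prefix of each granule's first row.
    """
    marks = [rows[i][:2] for i in range(0, len(rows), granularity)]
    target = (app, module)
    # The granule just before the first mark >= target may end with matching
    # rows, so it cannot be ruled out; granules starting after target can.
    lo = max(bisect_left(marks, target) - 1, 0)
    hi = bisect_right(marks, target)
    return hi - lo
```

With 6,000 rows (three apps, two modules, 1,000 timestamps each) and a granularity of 100, a filter on one app and module touches 11 of the 60 granules: the 10 that hold the rows plus one boundary granule the index cannot exclude. If `logdatetime` came first, the rows of every app would be spread across all granules, and the index could skip few, if any.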

Note that the LowCardinality data type appeared in relatively recent versions. Using it drastically reduces the size of the compressed data for fields with low cardinality (few distinct values).
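Conceptually, LowCardinality is dictionary encoding: each distinct string is stored once, and the column holds small integer positions into that dictionary. The toy Python version below shows why this pays off for columns such as `fld_app_name` that have only a handful of distinct values. It is not ClickHouse's on-disk format, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Split a string column into (dictionary of distinct values, integer codes)."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dict_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Restore the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]
```

In this model, a column of millions of repeated names becomes a dictionary of a few strings plus one small integer per row, which compresses far better than the strings themselves.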

We are currently using version 19.6 and plan to try upgrading to the latest version. Newer versions have useful features such as Adaptive Granularity, skipping indices and the DoubleDelta codec.

By default, the logging level set during installation is trace. The logs are rotated and archived, but they still grow to a gigabyte. If you do not need that level of detail, you can set the level to warning, which drastically reduces the log size. The logging level is set in the config.xml file:
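The `Delta` codec already used for `logdatetime` in our schema, and the `DoubleDelta` codec mentioned above, are easy to picture:

- Delta stores the differences between neighbouring values.
- DoubleDelta stores the first value, the first difference, and then the differences between consecutive differences.

Log timestamps grow slowly and fairly evenly, so both transforms turn them into runs of small, repetitive numbers that a general-purpose compressor (LZ4HC in our schema) handles well. This Python illustration covers only the transforms, not the codecs' actual byte layout.

```python
from typing import List


def delta(values: List[int]) -> List[int]:
    """First value as-is, then differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def undelta(deltas: List[int]) -> List[int]:
    """Inverse of delta(): running sum."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def double_delta(values: List[int]) -> List[int]:
    """First value, first delta, then differences between consecutive deltas."""
    d = delta(values)
    return d[:2] + [b - a for a, b in zip(d[1:], d[2:])]


ts = [1562839200, 1562839201, 1562839202, 1562839204]  # Unix timestamps
print(delta(ts))         # [1562839200, 1, 1, 2]
print(double_delta(ts))  # [1562839200, 1, 0, 1]
```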

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, other Linux distributions need the packages built by Altinity.
 
This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client in multiline mode (a query runs after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If a single row contains an error, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command-line client
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null
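Before reloading a `log_web_failed.json` file, it can help to locate the offending lines automatically. Below is a small Python helper. It assumes, as the commands above do, that the file holds one JSON object per line in JSONEachRow layout. The function name `split_failed_batch` is hypothetical. A line that is valid JSON may still be rejected by ClickHouse, for example because of a type mismatch.

```python
import json
from typing import List, Tuple


def split_failed_batch(src: str, good: str) -> List[Tuple[int, str]]:
    """Copy parseable JSONEachRow lines from `src` into `good`.

    Returns (line_number, text) pairs for lines that are not valid JSON
    objects, so they can be fixed by hand. Empty lines are skipped.
    """
    bad: List[Tuple[int, str]] = []
    with open(src, encoding="utf-8") as fin, open(good, "w", encoding="utf-8") as fout:
        for lineno, line in enumerate(fin, start=1):
            text = line.strip()
            if not text:
                continue
            try:
                row = json.loads(text)
            except json.JSONDecodeError:
                bad.append((lineno, text))
                continue
            if not isinstance(row, dict):  # JSONEachRow expects objects
                bad.append((lineno, text))
                continue
            fout.write(json.dumps(row, ensure_ascii=False) + "\n")
    return bad
```

The `good` file can then be loaded with the same `clickhouse-client --query="INSERT INTO log_web FORMAT JSONEachRow" < file` command as above.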

LogStash. Log router from FileBeat to the RabbitMQ queue

This component routes log entries coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. Judging by the issue on their GitHub, such functionality is not planned. There is an output plugin for Kafka, but for certain reasons we cannot use Kafka in-house.
  2. There are requirements for collecting logs in the DMZ. Because of them, logs must first be put into a queue, and then LogStash reads the entries from the queue from outside the DMZ.

Therefore, for servers located in the DMZ, we have to use this somewhat more complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component buffers log entries in the DMZ. Entries are written through a Filebeat → LogStash chain and read from outside the DMZ by LogStash. Running through RabbitMQ, the system processes about 4 thousand messages per second.

Message routing is configured by system name, i.e. based on the FileBeat configuration data. All messages go to one queue. If the queue service is stopped for some reason, no messages are lost. FileBeat gets connection errors and suspends sending, and the LogStash instance reading from the queue also gets network errors and waits for the connection to be restored. In that case, of course, no data is written to the database in the meantime.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
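The routing set up by these commands is a plain `direct` exchange:

- A message goes to every queue bound with exactly its routing key. Here the key is `fld_app_name`, from `key => "%{[fields][fld_app_name]}"` in the Logstash output above.
- By default, a direct exchange discards a message whose key has no binding, unless something like an alternate exchange is configured. So every new system needs its own `declare binding`.

A toy Python model of that rule follows; the class name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class DirectExchange:
    """Toy model of a RabbitMQ direct exchange: exact routing-key match."""

    def __init__(self) -> None:
        self.bindings: DefaultDict[str, Set[str]] = defaultdict(set)
        self.queues: Dict[str, List[str]] = {}

    def bind(self, queue: str, routing_key: str) -> None:
        self.queues.setdefault(queue, [])
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key: str, message: str) -> int:
        """Deliver to all queues bound with `routing_key`; return how many got it."""
        targets = self.bindings.get(routing_key, set())  # unknown key: dropped
        for q in targets:
            self.queues[q].append(message)
        return len(targets)
```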

Grafana. Dashboards

This component is used to visualize monitoring data. For this, you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to modify it slightly to make the handling of SQL filters on the dashboard more efficient.

For example, we use variables. If a variable is not set in the filter field, we do not want a condition of the form ( uriStem = '' AND uriStem != '' ) to be generated in WHERE, because ClickHouse would then read the uriStem column. We tried various options and eventually modified the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without mentioning the column at all.

Now you can use this query for the chart:

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module') and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name') and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which translates into the following SQL (note that the conditions for empty fields, such as uriStem, were replaced with just 1):

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC
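The idea behind the `$valueIfEmpty` macro can be modelled in a few lines of Python. If the dashboard variable is empty, the whole condition collapses to the constant, so the column is never mentioned in WHERE. This is a hypothetical re-implementation for illustration, not the plugin's source. In particular, it does no SQL escaping, which a real implementation must handle.

```python
from typing import Dict, List


def value_if_empty(value: str, fallback: str, condition: str) -> str:
    """Mimic $valueIfEmpty: the fallback if the variable is empty, else the condition."""
    return fallback if not value.strip() else f"({condition})"


def build_where(filters: Dict[str, str]) -> str:
    """Build the dashboard-filter part of WHERE from variable values.

    NOTE: values are pasted in verbatim; real code must escape quotes.
    """
    parts: List[str] = []
    for column, value in filters.items():
        if column == "uriStem":
            condition = f"uriStem like '%{value}%'"
        else:
            condition = f"{column} = '{value}'"
        parts.append(value_if_empty(value, "1", condition))
    return " AND ".join(parts)
```

With only `fld_app_name` and `fld_app_module` set, the result is `(fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1`, the same shape as the generated query above.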

Conclusion

The appearance of ClickHouse has been a landmark event in the market. It was hard to imagine that, completely free of charge, we would suddenly be armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication across multiple servers), the architecture will become more complex. But our first impressions of working with this database are very positive. It is clear that the product was made "for people."

Compared to ElasticSearch, the cost of storing and processing logs is expected to be lower. In other words, if for the current data volume we would have to set up a cluster of several machines, then with ClickHouse a single low-power machine is enough for us. Of course, ElasticSearch also has on-disk data compression and other features that can significantly reduce resource consumption, but compared to ClickHouse it would still be more expensive.

Without any special optimizations on our part, with default settings, loading data and querying the database work at amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future, we may use this tool for other purposes not related to storing logs, for example end-to-end analytics, security or machine learning.

Finally, a few words about the pros and cons.

Cons

  1. Records must be loaded in large batches. On the one hand, this is a feature, but you still have to use additional components to buffer records. This task is not always easy, but it is solvable. And we would like to simplify the architecture.
  2. Some exotic functionality or new features often break in new versions. This is a cause for concern and reduces the desire to upgrade. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of issues on GitHub, we are still careful not to use this engine in production. However, if you avoid sudden moves and stick to the core functionality, it works stably.

Pros

  1. It does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free of charge.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.

Source: www.habr.com
