ClickHouse Database for Humans, or Alien Technology

Aleksey Lizunov, Head of the Competence Center for Remote Service Channels, Information Technology Directorate, MKB

As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article we would like to share our experience with the ClickHouse database and the preliminary results of the pilot operation. It is worth saying right away that the results were impressive.


Later we will describe in detail how our system is configured and which components it consists of. For now I would like to say a little about this database as a whole and why it deserves attention. ClickHouse is a high-performance database from Yandex. It is used in Yandex services; initially it was the main data store for Yandex.Metrica. The system is open source and free. As a developer, I always wondered how they implemented it, because the data volumes are enormous. And the Metrica user interface itself is very flexible and fast. On first acquaintance with this database the impression is: "Well, finally! Made for people!" That holds from the installation process all the way to sending queries.

This database has a very low entry threshold. Even a developer of average skill can install it in a few minutes and start using it. Everything works in a clear, predictable way. Even people who are new to Linux can handle the installation and perform simple operations. Previously, at the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer pictured terabytes and petabytes, and some kind of superhumans who configure and develop for these systems. With the arrival of ClickHouse we got a simple, understandable tool that solves a range of tasks that used to be out of reach. All it takes is one machine and five minutes to install. In other words, we got a database like, for example, MySql, but for storing billions of records. A kind of super-archiver with the SQL language. It is as if people had been handed alien weapons.

About our logging system

To collect information we use IIS log files of web applications in the standard format (we are also parsing application logs at the moment, but the main goal at the pilot stage is to collect IIS logs).

For a number of reasons we could not abandon the ELK stack completely, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general logging scheme]

A distinctive feature of writing data to ClickHouse is that records are inserted infrequently (about once per second) and in large batches. This appears to be the most "problematic" part you run into when you first work with ClickHouse: the scheme becomes a little more complicated.
The plugin for LogStash, which inserts data directly into ClickHouse, helped a lot here. This component is deployed on the same server as the database itself. Generally speaking this is not recommended, but it is a practical choice: for now everything runs on one server so that we do not have to provision separate ones. We did not observe any failures or resource conflicts with the database. In addition, the plugin has a retry mechanism for errors. If an error does occur, the plugin writes to disk the batch of data that could not be inserted (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).
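To make the "infrequent inserts in large batches" idea concrete, here is a minimal sketch of a buffer that flushes either when enough rows have accumulated or when enough time has passed. It is not the plugin's code: the `BatchBuffer` class, its `sink` callback and the controllable `fake_now` clock are hypothetical names for illustration, and only the parameter names `flush_size` and `idle_flush_time` are borrowed from the plugin configuration shown later in this article.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Collects rows and hands them to `sink` in large batches.

    A batch is flushed when it reaches `flush_size` rows, or when a row
    arrives and `idle_flush_time` seconds have passed since the last flush.
    """

    def __init__(self, sink: Callable[[List[dict]], None],
                 flush_size: int = 10000, idle_flush_time: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.clock = clock
        self.rows: List[dict] = []
        self.last_flush = clock()

    def add(self, row: dict) -> None:
        self.rows.append(row)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        too_big = len(self.rows) >= self.flush_size
        too_old = self.clock() - self.last_flush >= self.idle_flush_time
        if self.rows and (too_big or too_old):
            self.flush()

    def flush(self) -> None:
        batch, self.rows = self.rows, []
        self.last_flush = self.clock()
        if batch:
            self.sink(batch)          # one INSERT per batch in a real pipeline


# Demo: 25 rows with flush_size=10 give two full batches; the remaining
# 5 rows are written by the final explicit flush.
batches: List[List[dict]] = []
fake_now = [0.0]                      # hypothetical controllable clock for the demo
buf = BatchBuffer(batches.append, flush_size=10, idle_flush_time=1.0,
                  clock=lambda: fake_now[0])
for i in range(25):
    buf.add({"id": i})
buf.flush()
batch_sizes = [len(b) for b in batches]
```

A real implementation would also need a timer so that a half-full batch is flushed even when no new rows arrive; this sketch only checks the clock when a row is added.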

The complete list of software used in the scheme is given in the table:

List of software used

Name: NGINX
Description: Reverse proxy to restrict access by port and to handle authorization. Currently not used in the scheme.
Distribution links:
https://nginx.org/ru/download.html
https://nginx.org/download/nginx-1.16.0.tar.gz

Name: FileBeat
Description: Transfer of file logs.
Distribution links:
https://www.elastic.co/downloads/beats/filebeat (distribution package for Windows 64bit).
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

Name: LogStash
Description: Log collector. Used to collect logs from FileBeat, as well as logs from the RabbitMQ queue (for servers located in the DMZ).
Distribution links:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: Logstash-output-clickhouse
Description: Logstash plugin for transferring logs to the ClickHouse database in batches.
Distribution links:
https://github.com/mikechris/logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

Name: ClickHouse
Description: Log storage. https://clickhouse.yandex/docs/ru/
Distribution links:
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
Note. Since August 2018, "normal" rpm builds for RHEL have been available in the Yandex repository, so you can try using them. At the time of installation we were using the packages built by Altinity.

Name: Grafana
Description: Log visualization. Dashboard setup.
Distribution links:
https://grafana.com/
https://grafana.com/grafana/download
Redhat & Centos (64 Bit), latest version

Name: ClickHouse datasource for Grafana 4.6+
Description: Plugin for Grafana with a ClickHouse data source.
Distribution links:
https://grafana.com/plugins/vertamedia-clickhouse-datasource
https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

Name: LogStash
Description: Log router from FileBeat to the RabbitMQ queue.
Note. Unfortunately, FileBeat has no output directly to RabbitMQ, so an intermediate link in the form of Logstash is required.
Distribution links:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: RabbitMQ
Description: Message queue. This is the log buffer in the DMZ.
Distribution links:
https://www.rabbitmq.com/download.html
https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Name: Erlang Runtime (required for RabbitMQ)
Description: Erlang runtime. Required for RabbitMQ to work.
Distribution links:
http://www.erlang.org/download.html
https://www.rabbitmq.com/install-rpm.html#install-erlang http://www.erlang.org/downloads/21.3

The configuration of the server running the ClickHouse database is given in the following table:

Name: Configuration
Value:
HDD: 40GB
RAM: 8GB
Processor: Core 2 2Ghz
Note: It is worth paying attention to the tips for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

Name: General system software
Value:
OS: Red Hat Enterprise Linux Server (Maipo)
JRE (Java 8)
Note: none

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default partitioning (by month) and the default index granularity. Practically all fields correspond to IIS log entries for http requests. Note that there are separate fields for storing utm tags (they are parsed from the query string field at the stage of insertion into the table).

Several system fields have also been added to the table to store information about systems, components and servers. See the table below for a description of these fields. We store logs for several systems in one table.

Name: fld_app_name
Description: Application/system name. Valid values:
  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • internal-site1.domain.local Internal site 1
Example: site1.domain.com

Name: fld_app_module
Description: System module. Valid values:
  • web - Website
  • svc - Website web service
  • intgr - Integration web service
  • bo - Admin (BackOffice)
Example: web

Name: fld_website_name
Description: Site name in IIS. Several systems can be deployed on one server, or even several instances of one system module.
Example: web-main

Name: fld_server_name
Description: Server name
Example: web1.domain.com

Name: fld_log_file_name
Description: Path to the log file on the server
Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This makes it possible to build graphs in Grafana efficiently. For example, you can view requests from the frontend of a particular system. This is similar to the site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records broken down by systems and their components

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

The amount of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
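As a quick check of what these numbers mean, the overall compression ratio and the average storage cost per log row can be derived directly from the query output above. This is only arithmetic on the three reported values; the variable names are ours.

```python
GIB = 1024 ** 3

# Values copied from the system.parts query output above.
uncompressed_gib = 54.50
compressed_gib = 4.86
total_rows = 211_427_094

compression_ratio = uncompressed_gib / compressed_gib
bytes_per_row_on_disk = compressed_gib * GIB / total_rows
bytes_per_row_raw = uncompressed_gib * GIB / total_rows

print(f"compression ratio: {compression_ratio:.1f}x")
print(f"compressed bytes per row: {bytes_per_row_on_disk:.1f}")
print(f"uncompressed bytes per row: {bytes_per_row_raw:.1f}")
```

In other words, the table is compressed roughly elevenfold, and each IIS log row costs on the order of a few dozen bytes on disk.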

Degree of data compression by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring file logs

This component tracks changes to log files on disk and passes the information to LogStash. It is installed on every server where log files are written (usually IIS). It works in tail mode, i.e. it transfers only the records appended to a file. It can also be configured separately to transfer entire files. This is convenient when you need to load data from previous months: just put the log file into the folder and it will be read in full.

When the service is stopped, data is no longer transferred to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log collector

This component receives log entries from FileBeat (or through the RabbitMQ queue), parses them and inserts them in batches into the ClickHouse database.

The Logstash-output-clickhouse plugin is used for insertion into ClickHouse. The plugin has a request retry mechanism, but for a planned shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is going to be long it is better to stop Filebeat on the servers as well. In a scheme where RabbitMQ is not used (on the local network, Filebeat sends logs directly to Logstash), Filebeat behaves quite acceptably and safely, so for it the unavailability of the output passes without consequences.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source=> "userAgent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
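The kv and mutate/join steps in the filter above are easier to follow with a small model of what they do to one query string. The sketch below approximates that behaviour in Python and is not the Logstash implementation; the function name `extract_query_fields` is hypothetical, while the key list and the `uriQuery__` prefix are taken from the configuration.

```python
from typing import Dict, List

INCLUDE_KEYS = ["utm_medium", "utm_source", "utm_campaign", "utm_term",
                "utm_content", "yclid", "region"]


def extract_query_fields(uri_query: str, prefix: str = "uriQuery__") -> Dict[str, str]:
    """Split a query string on '&' and keep only the whitelisted keys.

    Repeated keys are reduced to their distinct values and joined with ','.
    """
    collected: Dict[str, List[str]] = {}
    if uri_query and uri_query != "-":        # IIS logs '-' for an empty query
        for pair in uri_query.split("&"):
            key, sep, value = pair.partition("=")
            if not sep or key not in INCLUDE_KEYS:
                continue
            values = collected.setdefault(key, [])
            if value not in values:           # mirrors allow_duplicate_values => false
                values.append(value)
    return {prefix + key: ",".join(values) for key, values in collected.items()}


fields = extract_query_fields(
    "utm_source=yandex&utm_medium=cpc&page=2&utm_source=direct")
print(fields)
```

Keys outside the whitelist (here `page`) are dropped, which is why the table only needs the seven `uriQuery__*` columns.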

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

Logs for all systems are stored in one table (see the beginning of the article). It is intended for information about requests: all parameters are similar across formats such as IIS logs, apache logs and nginx logs. For application logs, which record, for example, errors, informational messages and warnings, a separate table with an appropriate structure will be provided (it is currently at the design stage).

When designing a table it is very important to decide on the primary key (by which the data is sorted when stored). The degree of data compression and the query speed depend on it. In our example the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by the name of the system, the name of the system component and the date of the event. Initially the event date came first. After it was moved to the last position, queries began to run faster. Changing the primary key requires recreating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it pays to think carefully in advance about what should go into the sort key.
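One way to see why the order of columns in the sort key matters is to count how fragmented a column becomes under each ordering. The sketch below uses synthetic rows (the site names, module names and the fixed random seed are assumptions for the demo, not our real data) and counts runs of consecutive equal `fld_app_name` values: fewer, longer runs compress better and let a filter on the system name touch one contiguous range.

```python
import random
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, str, int]   # (fld_app_name, fld_app_module, logdatetime as seconds)


def count_runs(values: List[str]) -> int:
    """Number of runs of consecutive equal values in a column."""
    return sum(1 for _ in groupby(values))


rng = random.Random(42)      # fixed seed so the demo is repeatable
apps = ["site1.domain.ru", "site2.domain.ru", "site3.domain.ru"]
modules = ["web", "svc", "intgr"]
rows: List[Row] = [(rng.choice(apps), rng.choice(modules), rng.randrange(86_400))
                   for _ in range(10_000)]

# Event time first (our initial key) versus system name first (the current key).
time_first = sorted(rows, key=lambda r: (r[2], r[0], r[1]))
app_first = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

runs_time_first = count_runs([r[0] for r in time_first])
runs_app_first = count_runs([r[0] for r in app_first])
print(runs_time_first, runs_app_first)
```

With the system name first, each system occupies a single run; with time first, the same column is chopped into thousands of short runs.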

It should also be noted that the LowCardinality data type appeared relatively recently. Using it sharply reduces the size of compressed data for fields with low cardinality (few distinct values).
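The idea behind a low-cardinality column is dictionary encoding: each distinct string is stored once and the column itself holds small integer codes. The sketch below shows that general technique only; ClickHouse's actual on-disk format is more involved, and the function names here are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an index into a list of distinct values."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Restore the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


methods = ["GET", "POST", "GET", "GET", "HEAD", "POST", "GET"]
dictionary, codes = dictionary_encode(methods)
print(dictionary, codes)
```

For a column such as `method` or `fld_app_module`, with a handful of distinct values across hundreds of millions of rows, the dictionary is tiny and the codes are highly repetitive.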

We are currently running version 19.6 and plan to try updating to the latest version. It has such great features as Adaptive Granularity, Skipping indices and the DoubleDelta codec, for example.

By default, the logging level is set to trace during installation. Logs are rotated and archived, but they still grow to as much as a gigabyte. If this is not needed, you can set the level to warning, and the log size drops sharply. The logging level is set in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux distributions you need to use the packages built by Altinity.
 
This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client for running queries in multi-line mode (a query is executed after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If there is an error in a single row, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null
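Before re-inserting a failed batch by hand, it helps to find out which rows are actually broken. The sketch below assumes the saved file contains one JSON object per line, as the `FORMAT JSONEachRow` commands above imply; the function name `split_json_rows` and the sample rows are hypothetical. It only checks that each line parses as a JSON object and does not validate column types against the table.

```python
import json
from typing import Iterable, List, Tuple


def split_json_rows(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Separate parseable JSON object lines from broken ones.

    Returns (good_lines, bad), where bad holds (line_number, error_message).
    """
    good: List[str] = []
    bad: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue                          # skip blank lines
        try:
            row = json.loads(text)
        except json.JSONDecodeError as exc:
            bad.append((number, str(exc)))
            continue
        if isinstance(row, dict):
            good.append(text)
        else:
            bad.append((number, "not a JSON object"))
    return good, bad


sample = [
    '{"logdate": "2019-07-11", "method": "GET", "port": 443}',
    '{"logdate": "2019-07-11", "method": "GET", "port": }',
    '',
    '{"logdate": "2019-07-11", "method": "POST", "port": 80}',
]
good_rows, bad_rows = split_json_rows(sample)
for line_number, error in bad_rows:
    print(f"line {line_number}: {error}")
```

The good lines can be written to a new file and loaded with clickhouse-client as shown above, while the reported line numbers point to the rows that need manual attention.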

LogStash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat to the RabbitMQ queue. There are two points to note here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. Judging by the issue on their github, such functionality is not planned either. There is a plugin for Kafka, but for certain reasons we cannot use it in our environment.
  2. There are requirements for collecting logs in the DMZ. Because of them, logs must first be put into a queue, and then LogStash reads the entries from the queue from the outside.

Therefore, for the case where servers are located in the DMZ, this slightly more complicated scheme has to be used. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Writing is done through the Filebeat → LogStash chain. Reading is done from outside the DMZ via LogStash. When running through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, i.e. based on the FileBeat configuration data. All messages go into one queue. If the queue service is stopped for some reason, this does not lead to message loss: FileBeat receives connection errors and temporarily suspends sending. The LogStash instance that reads from the queue also receives network errors and waits for the connection to be restored. In this case, of course, data is no longer written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
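The effect of these bindings can be illustrated with a toy model of a direct exchange, which delivers a message only to queues bound with exactly the same routing key. This is a simplified in-memory sketch, not RabbitMQ client code; the `DirectExchange` class is hypothetical, and `site9.domain.ru` is an invented example of a system that has no binding.

```python
from collections import defaultdict
from typing import Dict, List


class DirectExchange:
    """Toy model of a direct exchange: exact match on the routing key."""

    def __init__(self) -> None:
        self.bindings: Dict[str, List[str]] = defaultdict(list)
        self.queues: Dict[str, List[dict]] = defaultdict(list)

    def bind(self, queue: str, routing_key: str) -> None:
        if queue not in self.bindings[routing_key]:
            self.bindings[routing_key].append(queue)

    def publish(self, routing_key: str, message: dict) -> int:
        """Deliver to every queue bound with this key; return the delivery count."""
        targets = self.bindings.get(routing_key, [])
        for queue in targets:
            self.queues[queue].append(message)
        return len(targets)


exchange = DirectExchange()
exchange.bind("web_log", "site1.domain.ru")
exchange.bind("web_log", "site2.domain.ru")

# The routing key plays the role of %{[fields][fld_app_name]} in the Logstash output.
delivered_site1 = exchange.publish("site1.domain.ru", {"message": "GET /"})
delivered_site2 = exchange.publish("site2.domain.ru", {"message": "POST /api"})
delivered_unknown = exchange.publish("site9.domain.ru", {"message": "GET /x"})
print(delivered_site1, delivered_site2, delivered_unknown)
```

The practical consequence is that every new system name needs its own binding; without one, its messages never reach the web_log queue.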

Grafana. Dashboards

This component is used to visualize monitoring data. For this you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to tweak it a little to make SQL filters on the dashboard more efficient.

For example, we use variables, and if a variable is not set in the filter field, we would like the plugin not to generate a condition in the WHERE clause of the form ( uriStem = '' AND uriStem != '' ). With such a condition ClickHouse still reads the uriStem column. We tried several options and eventually patched the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without mentioning the column at all.

Now you can use this query for the graph

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which translates into the following SQL (note that the empty uriStem fields have been turned into just 1)

SELECT
    t,
    groupArray((response, c)) AS groupArr
FROM (
    SELECT
        (intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t,
        response,
        count(*) AS c
    FROM default.log_web
    WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
    GROUP BY
        t, response
    ORDER BY
        t ASC,
        response ASC
)
GROUP BY t ORDER BY t ASC
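The substitution rule itself is small enough to show in a few lines. The sketch below models only the behaviour described in this section (an empty variable yields the constant 1, otherwise the condition is kept); it is not the plugin's source code, and `value_if_empty` and `build_where` are hypothetical names. The string interpolation is for illustration only and is not a safe way to build SQL from user input.

```python
def value_if_empty(variable_value: str, fallback: str, condition: str) -> str:
    """Return `fallback` when the dashboard variable is unset, else `condition`."""
    return fallback if variable_value.strip() == "" else condition


def build_where(filters: dict) -> str:
    """Build the filter part of a WHERE clause from dashboard variables."""
    app = filters.get("fld_app_name", "")
    uri = filters.get("uriStem", "")
    parts = [
        value_if_empty(app, "1", f"fld_app_name = '{app}'"),
        value_if_empty(uri, "1", f"uriStem like '%{uri}%'"),
    ]
    return " AND ".join(parts)


where_clause = build_where({"fld_app_name": "site1.domain.ru", "uriStem": ""})
print(where_clause)
```

Because the unset filter collapses to a constant, the generated query never references the uriStem column, so ClickHouse does not have to read it.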

Conclusion

The appearance of the ClickHouse database has become a landmark event in the market. It was hard to imagine that, completely free of charge, we would suddenly be armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication across several servers), the scheme will become more complicated. But on first impressions, working with this database is very pleasant. You can see that the product is made "for people."

Compared with ElasticSearch, we estimate that the cost of storing and processing logs is reduced by a factor of five to ten. In other words, where the current volume of data would require a cluster of several machines, with ClickHouse one low-power machine is enough for us. Yes, ElasticSearch also has on-disk data compression mechanisms and other features that can significantly reduce resource consumption, but compared with ClickHouse it will still cost more.

Without any special optimization on our part, on default settings, loading data and querying the database work at an amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we can use this tool for other purposes not related to storing logs, for example end-to-end analytics, security, and machine learning.

Finally, a little about the pros and cons.

Cons

  1. Loading records in large batches. On the one hand this is a feature, but you still have to use additional components to buffer records. This task is not always simple, but it is solvable. Still, I would like the scheme to be simpler.
  2. Some exotic functionality or new features often break in new versions. This raises concerns and reduces the desire to upgrade to a new version. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of Issues on github, we are still wary of using this engine in production. However, if you avoid sudden moves to the side and stick to the core functionality, it works stably.

Pros

  1. It does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.

Source: will.com
