The ClickHouse Database for Humans, or Alien Technologies

Aleksey Lizunov, Head of the Competence Center for Remote Service Channels, Information Technology Directorate, MKB


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we researched using the ClickHouse database as a data store for logs.

In this article, we would like to talk about our experience of using ClickHouse and the preliminary results of the pilot operation. It should be said right away that the results were impressive.



Further on, we will describe in more detail how our system is set up and what components it consists of. But first, a few words about this database as a whole and why it is worth paying attention to. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; initially it was the main data store for Yandex.Metrica. The system is open source and free. From a developer's point of view, I had always wondered how they pulled it off, given the fantastically large volumes of data involved. And the Metrica user interface itself is very flexible and fast. On first acquaintance with this database, the impression is: "Well, finally! A database made 'for humans'! From the installation process all the way to sending queries."

This database has a very low entry threshold. Even an averagely skilled developer can install it in a few minutes and start using it. Everything just works. Even people who are new to Linux can quickly handle the installation and perform the simplest operations. Whereas before, at the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer pictured terabytes and petabytes and assumed that some superhumans handled the setup and development of such systems, with the arrival of ClickHouse we got a simple, understandable tool that solves a range of previously unattainable tasks. All it takes is one fairly average machine and five minutes to install. That is, we got a database like, say, MySQL, but for storing billions of records! A kind of super-archiver with an SQL language. It is as if people had been handed alien weapons.
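
To give a feel for that five-minute start, here is a minimal sketch of a first session in clickhouse-client (the table and data are purely illustrative, not part of the scheme described below):

-- create a small MergeTree table, insert a couple of rows, run a first query
CREATE TABLE hits_demo
(
    event_date Date,
    event_time DateTime,
    url String,
    duration_ms UInt32
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_time);

INSERT INTO hits_demo VALUES
    ('2019-07-01', '2019-07-01 10:00:00', '/index', 12),
    ('2019-07-01', '2019-07-01 10:00:01', '/login', 45);

SELECT url, count() AS hits, avg(duration_ms) AS avg_ms
FROM hits_demo
GROUP BY url;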

About our logging system

To collect information, we use the IIS log files of web applications in the standard format (we are currently also parsing application logs, but the main goal at the pilot stage is collecting the IIS logs).

For various reasons, we could not completely abandon the ELK stack, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The overall logging scheme is shown in the figure below:

[Figure: overall logging scheme]

A peculiarity of writing data to ClickHouse is the infrequent (once per second) insertion of records in large batches. This, apparently, is the most "problematic" part you run into when you first start working with ClickHouse: the scheme gets a little more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, that is not recommended, but from a practical point of view it saves us from provisioning separate servers while it is deployed on the same machine. We have observed no failures or resource conflicts with the database. In addition, it should be noted that the plugin has a retry mechanism in case of errors. And in case of errors, the plugin writes to disk the batch of data that could not be inserted (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).
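
To make the batching point concrete, here is a sketch (with illustrative values) of what the plugin effectively sends, instead of thousands of single-row statements per second:

-- one large batch roughly once per second, instead of many tiny inserts
INSERT INTO log_web (logdate, logdatetime, fld_app_name, fld_app_module, method, uriStem)
VALUES
    ('2019-07-11', '2019-07-11 10:00:00', 'site1.domain.ru', 'web', 'GET', '/index'),
    ('2019-07-11', '2019-07-11 10:00:01', 'site1.domain.ru', 'web', 'GET', '/login');
-- ... in reality up to flush_size rows (10000 in our config) per statement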

The full list of software used in this scheme is shown in the table:

List of software used (name, description, distribution links):

NGINX
  Reverse proxy for restricting access by port and managing authorization.
  Currently not used in the scheme.
  https://nginx.org/ru/download.html
  https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat
  Log file shipper.
  https://www.elastic.co/downloads/beats/filebeat (distribution for Windows 64-bit)
  https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

LogStash
  Log collector. Used to collect logs from FileBeat, as well as from the RabbitMQ queue (for servers located in the DMZ).
  https://www.elastic.co/products/logstash
  https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse
  LogStash plugin for transferring logs to the ClickHouse database in batches.
  https://github.com/mikechris/logstash-output-clickhouse
  /usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
  /usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
  /usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse
  Log storage. https://clickhouse.yandex/docs/ru/
  https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
  https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
  Note: starting from August 2018, "normal" rpm builds for RHEL appeared in the Yandex repository, so you can try using them. At the time of installation, we were using the packages built by Altinity.

Grafana
  Log visualization; dashboard setup.
  https://grafana.com/
  https://grafana.com/grafana/download (Redhat & Centos, 64-bit, latest version)

ClickHouse datasource for Grafana 4.6+
  Plugin for Grafana with a ClickHouse data source.
  https://grafana.com/plugins/vertamedia-clickhouse-datasource
  https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

LogStash
  Log router from FileBeat to the RabbitMQ queue.
  Note: unfortunately, FileBeat has no output directly to RabbitMQ, so an intermediate link in the form of Logstash is required.
  https://www.elastic.co/products/logstash
  https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ
  Message queue. This is the log buffer in the DMZ.
  https://www.rabbitmq.com/download.html
  https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)
  Erlang runtime, required for RabbitMQ to work.
  http://www.erlang.org/download.html
  https://www.rabbitmq.com/install-rpm.html#install-erlang
  http://www.erlang.org/downloads/21.3

The configuration of the server with the ClickHouse database is shown in the following table:

Configuration:
  HDD: 40 GB
  RAM: 8 GB
  CPU: Core 2, 2 GHz
  (It is worth paying attention to the tips for operating the ClickHouse database: https://clickhouse.yandex/docs/ru/operations/tips/)

System-wide software:
  OS: Red Hat Enterprise Linux Server (Maipo)
  JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default partitioning (by month) and the default index granularity. Practically all the fields correspond to the IIS log entries for logging HTTP requests. Separately, we note that there are dedicated fields for storing utm tags (they are parsed from the query string field at the stage of insertion into the table).
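
Since the table is partitioned by toYYYYMM(logdate), each month lives in its own partition, and date-bounded queries only read the partitions they need. A quick sketch of how to inspect the partitions via the system tables:

-- per-month partition sizes for the log table
SELECT
    partition,
    sum(rows) AS rows,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed
FROM system.parts
WHERE table = 'log_web' AND active
GROUP BY partition
ORDER BY partition;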

Also, several system fields have been added to the table to store information about the system, the component, and the server. See the table below for a description of these fields. We store the logs of several systems in one table.

fld_app_name
  Application/system name.
  Valid values:
    • site1.domain.com (external site 1)
    • site2.domain.com (external site 2)
    • internal-site1.domain.local (internal site 1)
  Example: site1.domain.com

fld_app_module
  System module.
  Valid values:
    • web (web site)
    • svc (web site service)
    • intgr (integration web service)
    • bo (admin, BackOffice)
  Example: web

fld_website_name
  Site name in IIS. Several systems can be deployed on one server, or even several instances of one system module.
  Example: web-main

fld_server_name
  Server name.
  Example: web1.domain.com

fld_log_file_name
  Path to the log file on the server.
  Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This allows you to efficiently build graphs in Grafana. For example, to view the requests hitting the frontend of a specific system. This is similar to the site counter in Yandex.Metrica.
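
For instance, a per-minute request count for one system's frontend boils down to a query of the following form (a sketch; Grafana generates very similar SQL, as shown at the end of the article):

-- requests per minute for one frontend over the last day
SELECT
    toStartOfMinute(logdatetime) AS t,
    count() AS hits
FROM log_web
WHERE fld_app_name = 'site1.domain.ru'
  AND fld_app_module = 'web'
  AND logdate >= today() - 1
GROUP BY t
ORDER BY t;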

Here are some statistics on the use of the database over two months.

Number of records broken down by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Amount of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.

Data compression ratio by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Log file shipper

This component tracks changes to log files on disk and passes the information on to LogStash. It is installed on all servers where log files are written (usually IIS). It works in tail mode (that is, it only transfers the records appended to a file). But it can also be configured to transfer entire files. This is handy when you need to load data from previous months: just drop the log file into the folder and it will be read in full.

When the service is stopped, data stops being passed on to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log collector

This component is designed to receive log entries from FileBeat (or through the RabbitMQ queue), parse them, and insert them in batches into the ClickHouse database.

The Logstash-output-clickhouse plugin is used for insertion into ClickHouse. The plugin has a retry mechanism, but during a planned shutdown it is better to stop the service itself. When it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is a long one, it is better to also stop the Filebeats on the servers. In a scheme where RabbitMQ is not used (on the local network, Filebeat sends logs directly to Logstash), the Filebeats behave quite acceptably and safely, so for them the unavailability of the output passes without consequences.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event['kilobytesSent'] = event['bytesSent'].to_i / 1024.0"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event['kilobytesReceived'] = event['bytesReceived'].to_i / 1024.0"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source=> "useragent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
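      # batching: the plugin accumulates events and flushes them to ClickHouse
      # either every flush_size events or every idle_flush_time seconds;
      # batches that fail to insert are saved to save_dir for manual replay
      # (see "A few useful commands" below)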
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

Logs for all systems are stored in one table (see the beginning of the article). It is intended to store information about requests: all the parameters are similar across different formats, such as IIS logs and Apache and nginx logs. For application logs, which record, for example, errors, informational messages, and warnings, a separate table with the appropriate structure will be provided (currently at the design stage).
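
As a sketch only (the real structure is still being designed), such an application-log table might look like this:

-- hypothetical structure for application logs; not a final design
CREATE TABLE log_app (
    logdate Date,
    logdatetime DateTime CODEC(Delta, LZ4HC),
    fld_server_name LowCardinality( String ),
    fld_app_name LowCardinality( String ),
    fld_app_module LowCardinality( String ),
    level LowCardinality( String ),  -- error / warning / info
    logger String,
    message String,
    exception String
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime);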

When designing the table, it is very important to decide on the primary key (by which the data is sorted in storage). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by system name, system component name, and event date. Initially, the event date came first; after moving it to the last position, queries started running roughly twice as fast. Changing the primary key requires recreating the table and reloading the data, so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is a good idea to think well in advance about what should go into the sort key.
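
In practice, the re-keying looks roughly like this (a sketch; it assumes enough free disk space for a second copy of the data):

-- 1. create a table with the same columns (list omitted; identical to log_web)
--    but with a different sort key
CREATE TABLE log_web_new AS log_web
ENGINE = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime);

-- 2. reload the data; ClickHouse re-sorts it on disk during the insert
INSERT INTO log_web_new SELECT * FROM log_web;

-- 3. swap the tables
RENAME TABLE log_web TO log_web_old, log_web_new TO log_web;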

It should also be noted that the LowCardinality data type appeared in relatively recent versions. When it is used, the size of the compressed data shrinks drastically for fields that have low cardinality (few distinct values).
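
If a plain String column turns out to have low cardinality, it can be converted in place (a sketch; the column data is rewritten on disk, so this is not free on a large table):

ALTER TABLE log_web MODIFY COLUMN method LowCardinality( String );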

We are currently on version 19.6 and plan to try updating to the latest one. It has such wonderful features as Adaptive Granularity, skipping indices, and the DoubleDelta codec, for example.
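
For illustration, on our table these features would be declared roughly like this (a sketch for the newer versions; we have not run this in production):

-- a data-skipping index on the response code column
ALTER TABLE log_web ADD INDEX idx_response response TYPE set(100) GRANULARITY 4;

-- DoubleDelta suits slowly changing numeric sequences such as timestamps
ALTER TABLE log_web MODIFY COLUMN logdatetime DateTime CODEC(DoubleDelta, LZ4);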

By default, the logging level is set to trace during installation. The logs are rotated and archived, but at the same time they grow to a gigabyte. If there is no need for it, you can set the warning level, and the size of the logs drops drastically. The logging setting lives in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

A few useful commands

Since the original installation packages are built for Debian, for other versions of Linux you need to use the packages built by Altinity.
 
Instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the server status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client in multiline mode (queries are executed after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
On an error in a single row, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file manually and try to load it into the database by hand:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the client
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null

LogStash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat does not have an output plugin for writing directly to RabbitMQ. And judging by the issue on their github, such functionality is not planned. There is an output for Kafka, but for certain reasons we cannot use it ourselves.
  2. There is a requirement to collect logs in the DMZ. Because of it, the logs must first be put into a queue, and then LogStash reads the entries from the queue from the outside.

Therefore, for the case of servers located in the DMZ, one has to use such a slightly more complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Writing is done through the Filebeat → LogStash chain; reading is done from outside the DMZ via LogStash. When operating through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, that is, based on the FileBeat configuration data. All messages go into a single queue. If for some reason the queue service is stopped, this does not lead to message loss: the FileBeats will receive connection errors and temporarily suspend sending, and the LogStash reading from the queue will likewise receive network errors and wait for the connection to be restored. During that time, of course, no data is written to the database.

The following commands are used to create and configure the queue:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"

Grafana. Dashboards

This component is used to visualize the monitoring data. In this case, you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to patch it a little to improve the efficiency of processing SQL filters on the dashboard.

For example, we use variables, and if they are not set in the filter field, we want the plugin not to generate a condition of the form WHERE ( uriStem = '' AND uriStem != '' ), because then ClickHouse would read the uriStem column anyway. In the end we tried different options and patched the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without mentioning the column itself.

And now you can use this query for a graph:

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is translated into the following SQL (note that the empty uriStem fields have turned into just 1):

SELECT
    t,
    groupArray((response, c)) AS groupArr
FROM (
    SELECT
        (intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t,
        response,
        count(*) AS c
    FROM default.log_web
    WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
    GROUP BY t, response
    ORDER BY t ASC, response ASC
)
GROUP BY t
ORDER BY t ASC

Conclusion

The appearance of the ClickHouse database has become a landmark event in the market. It was hard to imagine that, completely free of charge, we would suddenly be armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication to multiple servers), the scheme will become more complicated. But on first impression, working with this database is very pleasant. You can see that the product was made "for humans."

Compared to ElasticSearch, the cost of storing and processing logs is estimated to drop by five to ten times. In other words, if for the current volume of data we would have had to set up a cluster of several machines, with ClickHouse a single low-power machine is enough for us. Yes, of course, ElasticSearch also has on-disk data compression mechanisms and other features that can significantly reduce resource consumption, but compared to ClickHouse this would come at a higher cost.

Without any special optimizations on our part, on default settings, loading data into the database and fetching it back works at amazing speed. We don't have much data yet (about 200 million records), and the server itself is weak. In the future we may use this tool for other purposes not related to log storage, for example for end-to-end analytics, in the security field, or for machine learning.

Finally, a little about the pros and cons.

Cons

  1. Loading records in large batches. On the one hand this is a feature, but you still have to use additional components to buffer the records. This task is not always simple, but it is solvable. And I would like to simplify the scheme.
  2. Exotic functionality and new features often break in new versions. This is a source of concern and reduces the desire to upgrade. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of Issues on github, we are still careful not to use this engine in production. However, if you do not make sudden gestures to the side and use only the core functionality, it works stably.

Pros

  1. It does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.

Source: www.habr.com
