ClickHouse Database rau Tib Neeg, lossis Alien Technologies

Aleksey Lizunov, Tus Thawj Coj ntawm Lub Chaw Txawj Ntse rau Kev Pabcuam Chaw Taws Teeb ntawm Tus Thawj Coj ntawm Cov Ntaub Ntawv Technology ntawm MKB

ClickHouse Database rau Tib Neeg, lossis Alien Technologies

Raws li lwm txoj hauv kev rau ELK pawg (ElasticSearch, Logstash, Kibana), peb tab tom tshawb fawb txog kev siv ClickHouse database ua cov ntaub ntawv khaws cia rau cov cav.

Hauv tsab xov xwm no, peb xav tham txog peb qhov kev paub dhau los ntawm kev siv ClickHouse database thiab cov txiaj ntsig ua ntej ntawm kev ua haujlwm tsav. Nws yuav tsum raug sau tseg tam sim ntawd tias cov txiaj ntsig tau zoo heev.


ClickHouse Database rau Tib Neeg, lossis Alien Technologies

Tom ntej no, peb yuav piav qhia ntxiv txog seb peb lub kaw lus tau teeb tsa li cas, thiab cov khoom siv nws muaj li cas. Tab sis tam sim no kuv xav tham me ntsis txog cov ntaub ntawv no tag nrho, thiab vim li cas nws thiaj li tsim nyog tau them rau. Lub ClickHouse database yog ib qho kev ua tau zoo analytical columnar database los ntawm Yandex. Nws yog siv nyob rau hauv Yandex cov kev pab cuam, pib nws yog lub ntsiab cov ntaub ntawv cia rau Yandex.Metrica. Qhib qhov system, pub dawb. Los ntawm tus tsim tawm txoj kev xav, Kuv ib txwm xav tsis thoob tias lawv siv nws li cas, vim tias muaj cov ntaub ntawv loj heev. Thiab Metrica tus neeg siv interface nws tus kheej yog yoog raws thiab nrawm. Ntawm thawj tus neeg paub nrog cov ntaub ntawv no, qhov kev xav yog: “Zoo, thaum kawg! Ua rau cov neeg! Pib los ntawm cov txheej txheem installation thiab xaus nrog kev xa cov lus thov.

Cov ntaub ntawv no muaj qhov pib nkag tsawg heev. Txawm tias tus tsim tawm nruab nrab tuaj yeem nruab qhov database hauv ob peb feeb thiab pib siv nws. Txhua yam ua haujlwm kom meej. Txawm tias cov neeg tshiab rau Linux tuaj yeem ceev cov kev teeb tsa thiab ua cov haujlwm yooj yim tshaj plaws. Yog tias ua ntej, nrog cov lus Cov Ntaub Ntawv Loj, Hadoop, Google BigTable, HDFS, tus tsim tawm zoo tib yam muaj cov tswv yim hais tias nws yog hais txog qee qhov terabytes, petabytes, uas qee tus neeg superhumans koom nrog hauv kev teeb tsa thiab kev txhim kho rau cov tshuab no, tom qab ntawd nrog lub advent ntawm ClickHouse database, peb tau txais ib qho yooj yim, nkag siab cov cuab yeej uas koj tuaj yeem daws cov haujlwm uas tsis tuaj yeem ua dhau los. Nws tsuas siv ib lub tshuab nruab nrab ncaj ncees thiab tsib feeb rau nruab. Ntawd yog, peb tau txais cov ntaub ntawv xws li, piv txwv li, MySql, tab sis tsuas yog khaws cia ntau lab ntawm cov ntaub ntawv! Ib qho super-archiver nrog SQL lus. Nws zoo li tib neeg tau muab riam phom ntawm neeg txawv teb chaws.

Hais txog peb lub kaw lus

Txhawm rau sau cov ntaub ntawv, IIS cov ntaub ntawv teev npe ntawm cov qauv hauv web daim ntawv thov raug siv (peb tam sim no kuj tau txheeb xyuas cov ntawv teev npe, tab sis lub hom phiaj tseem ceeb ntawm theem pib yog sau IIS cov cav).

Rau ntau yam laj thawj, peb tsis tuaj yeem tso tseg tag nrho ELK pawg, thiab peb txuas ntxiv siv LogStash thiab Filebeat Cheebtsam, uas tau ua pov thawj lawv tus kheej zoo thiab ua haujlwm tau zoo thiab kwv yees.

Cov txheej txheem kev txiav ntoo dav dav yog qhia hauv daim duab hauv qab no:

ClickHouse Database rau Tib Neeg, lossis Alien Technologies

Ib qho tshwj xeeb ntawm kev sau cov ntaub ntawv rau ClickHouse database tsis tshua muaj (ib zaug ib ob) tso cov ntaub ntawv hauv cov khoom loj. Qhov no, pom tau tias, yog qhov "teeb ​​meem" tshaj plaws uas koj ntsib thaum koj thawj zaug ua haujlwm nrog ClickHouse database: cov tswv yim yuav nyuaj me ntsis.
Lub plugin rau LogStash, uas ncaj qha ntxig cov ntaub ntawv rau hauv ClickHouse, tau pab ntau ntawm no. Cov tivthaiv no yog siv rau ntawm tib lub server raws li lub database nws tus kheej. Yog li, feem ntau hais lus, nws tsis pom zoo kom ua, tab sis los ntawm cov tswv yim pom zoo, thiaj li tsis tsim cov servers cais thaum nws tau xa mus rau tib lub server. Peb tsis tau soj ntsuam ib qho kev ua tsis tiav lossis kev tsis sib haum xeeb nrog cov ntaub ntawv. Ntxiv mus, nws yuav tsum tau muab sau tseg tias lub plugin muaj ib tug retry mechanism nyob rau hauv cov ntaub ntawv ntawm yuam kev. Thiab nyob rau hauv cov ntaub ntawv ntawm kev ua yuam kev, lub plugin sau rau disk ib batch ntawm cov ntaub ntawv uas yuav tsis tau muab tso (cov ntaub ntawv hom yog yooj yim: tom qab kho, koj tuaj yeem yooj yim ntxig cov kho batch siv clickhouse-neeg siv).

Ib daim ntawv teev tag nrho ntawm software siv nyob rau hauv lub tswv yim yog nthuav tawm nyob rau hauv lub rooj:

Daim ntawv teev cov software siv

Lub npe

piav qhia

Distribution link

NGINX

Rov qab-proxy txwv kev nkag los ntawm cov chaw nres nkoj thiab teeb tsa kev tso cai

Tam sim no tsis siv nyob rau hauv lub tswv yim

https://nginx.org/ru/download.html

https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat

Hloov cov ntaub ntawv teev tseg.

https://www.elastic.co/downloads/beats/filebeat (cov khoom siv faib rau Windows 64 ntsis).

https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

logstash

Log sau.

Siv los sau cov cav los ntawm FileBeat, nrog rau sau cov cav los ntawm RabbitMQ kab (rau cov servers uas nyob hauv DMZ.)

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse

Loagstash plugin rau kev xa cov cav mus rau ClickHouse database hauv batch

https://github.com/mikechris/logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin nruab logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin nruab logstash-filter-prune

/usr/share/logstash/bin/logstash-plugin nruab logstash-filter-multiline

Nyem Tsev

Log cia https://clickhouse.yandex/docs/ru/

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm

Nco tseg. Pib txij lub Yim Hli 2018, "ib txwm" rpm tsim rau RHEL tau tshwm sim hauv Yandex repository, yog li koj tuaj yeem sim siv lawv. Thaum lub sijhawm teeb tsa, peb tau siv cov pob tsim los ntawm Altinity.

ua grafana

Log visualization. Teeb tsa dashboards

https://grafana.com/

https://grafana.com/grafana/download

Redhat & Centos (64 ntsis) - qhov tseeb version

ClickHouse datasource rau Grafana 4.6+

Plugin rau Grafana nrog ClickHouse cov ntaub ntawv qhov chaw

https://grafana.com/plugins/vertamedia-clickhouse-datasource

https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

logstash

Log router los ntawm FileBeat rau RabbitMQ kab.

Nco tseg. Hmoov tsis zoo, FileBeat tsis tso tawm ncaj qha rau RabbitMQ, yog li yuav tsum muaj qhov txuas nruab nrab hauv daim ntawv ntawm Logstash

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ

lus kab. Qhov no yog lub log tsis nyob hauv DMZ

https://www.rabbitmq.com/download.html

https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (yuav tsum tau rau RabbitMQ)

Lub sijhawm ua haujlwm. Yuav tsum tau rau RabbitMQ ua haujlwm

http://www.erlang.org/download.html

https://www.rabbitmq.com/install-rpm.html#install-erlang http://www.erlang.org/downloads/21.3

Tus neeg rau zaub mov configuration nrog ClickHouse database yog nthuav tawm hauv cov lus hauv qab no:

Lub npe

nqi

Примечание

Configuration

HDD: 40 GB
RAM: 8GB
Processor: Core 2 2 Ghz

Nws yog ib qho tsim nyog yuav tsum tau them sai sai rau cov lus qhia rau kev ua haujlwm ntawm ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

General system software

OS: Red Hat Enterprise Linux Server (Maipo)

JRE (Java 8)

 

Raws li koj tuaj yeem pom, qhov no yog qhov chaw ua haujlwm zoo tib yam.

Cov qauv ntawm lub rooj rau khaws cov cav yog raws li nram no:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

Peb siv default partitioning (los ntawm lub hli) thiab index granularity. Txhua daim teb xyaum ua raws li IIS cov ntaub ntawv nkag rau kev nkag http thov. Cais, peb nco ntsoov tias muaj cov chaw sib cais rau khaws cia utm-tags (lawv tau txheeb xyuas ntawm theem ntawm kev ntxig rau hauv lub rooj los ntawm cov lus nug txoj hlua).

Tsis tas li ntawd, ntau qhov system teb tau ntxiv rau lub rooj khaws cov ntaub ntawv hais txog cov tshuab, cov khoom siv, cov servers. Saib cov lus hauv qab no rau kev piav qhia ntawm cov teb no. Hauv ib lub rooj, peb khaws cov cav rau ntau lub tshuab.

Lub npe

piav qhia

Piv Txwv:

fld_app_name

Application/system npe
Cov nqi siv tau:

  • site1.domain.com Sab nraud qhov chaw 1
  • site2.domain.com Sab nraud qhov chaw 2
  • internal-site1.domain.local Internal site 1

site1.domain.com

fld_app_module

Qhov system module
Cov nqi siv tau:

  • web - Web
  • svc - Web site service
  • intgr - Integration Web Service
  • bo - Admin (BackOffice)

web

fld_website_name

Lub npe chaw nyob hauv IIS

Ntau lub tshuab tuaj yeem siv rau ntawm ib tus neeg rau zaub mov, lossis txawm tias ntau zaus ntawm ib qho system module

web loj

fld_server_npe

Server npe

web1.domain.com

fld_log_file_name

Txoj kev mus rau lub log ntaub ntawv ntawm lub server

C:inetpublogsLogFiles
W3SVC1u_ex190711.log

Qhov no tso cai rau koj los tsim cov duab zoo hauv Grafana. Piv txwv li, saib cov lus thov los ntawm frontend ntawm ib qho system. Qhov no zoo ib yam li qhov chaw txee hauv Yandex.Metrica.

Nov yog qee qhov kev txheeb cais ntawm kev siv cov ntaub ntawv rau ob lub hlis.

Tus naj npawb ntawm cov ntaub ntawv tawg los ntawm cov tshuab thiab lawv cov khoom

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Tus nqi ntawm cov ntaub ntawv ntawm lub disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.

Degree ntawm cov ntaub ntawv compression nyob rau hauv kab

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Kev piav qhia ntawm cov khoom siv

FileBeat. Hloov cov ntaub ntawv teev tseg

Qhov kev tivthaiv no taug kev hloov pauv cov ntaub ntawv hauv disk thiab kis cov ntaub ntawv mus rau LogStash. Nruab rau txhua lub servers uas cov ntaub ntawv teev npe sau (feem ntau IIS). Ua hauj lwm hauv tus Tsov tus tw hom (piv txwv li hloov tsuas yog cov ntaub ntawv ntxiv rau cov ntaub ntawv). Tab sis cais nws tuaj yeem teeb tsa kom hloov tag nrho cov ntaub ntawv. Qhov no muaj txiaj ntsig zoo thaum koj xav rub tawm cov ntaub ntawv los ntawm lub hli dhau los. Cia li muab cov ntaub ntawv teev cia rau hauv ib lub nplaub tshev thiab nws yuav nyeem nws tag nrho.

Thaum qhov kev pabcuam raug tso tseg, cov ntaub ntawv tsis raug xa mus ntxiv rau qhov chaw cia.

Ib qho piv txwv configuration zoo li no:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

logstash. Tus sau lub cav

Qhov kev tivthaiv no yog tsim los kom tau txais cov ntaub ntawv nkag los ntawm FileBeat (lossis dhau ntawm RabbitMQ queue), parsing thiab inserting batches rau hauv ClickHouse database.

Rau kev tso rau hauv ClickHouse, Logstash-output-clickhouse plugin yog siv. Lub Logstash plugin muaj qhov kev thov rov ua haujlwm dua, tab sis nrog kev kaw tsis tu ncua, nws zoo dua los nres qhov kev pabcuam nws tus kheej. Thaum nres, cov lus yuav tau sau rau hauv RabbitMQ kab, yog li yog tias qhov nres tau ntev, ces nws yog qhov zoo dua los nres Filebeats ntawm cov servers. Nyob rau hauv ib lub tswv yim uas RabbitMQ tsis siv (ntawm lub network hauv zos, Filebeat ncaj qha xa cov cav mus rau Logstash), Filebeats ua haujlwm tau txais txiaj ntsig zoo thiab ruaj ntseg, yog li rau lawv qhov tsis txaus ntawm cov zis dhau los yam tsis muaj qhov tshwm sim.

Ib qho piv txwv configuration zoo li no:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event['kilobytesSent'] = event['bytesSent'].to_i / 1024.0"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event['kilobytesReceived'] = event['bytesReceived'].to_i / 1024.0"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source=> "useragent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}

cev.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

clickhouse. Log cia

Cov log rau txhua lub tshuab tau muab khaws cia rau hauv ib lub rooj (saib ntawm qhov pib ntawm tsab xov xwm). Nws yog npaj los khaws cov ntaub ntawv hais txog kev thov: txhua qhov tsis zoo sib xws rau ntau hom, xws li IIS cav, apache thiab nginx cav. Rau cov ntawv teev npe, uas, piv txwv li, yuam kev, cov lus qhia, cov lus ceeb toom raug kaw, ib lub rooj sib cais yuav muab nrog cov qauv tsim nyog (tam sim no nyob rau theem tsim).

Thaum tsim ib lub rooj, nws yog ib qho tseem ceeb heev uas yuav tau txiav txim siab ntawm qhov tseem ceeb (los ntawm qhov uas cov ntaub ntawv yuav raug txheeb xyuas thaum khaws cia). Qhov degree ntawm cov ntaub ntawv compression thiab cov lus nug ceev nyob ntawm qhov no. Hauv peb qhov piv txwv, qhov tseem ceeb yog
ORDER BY (fld_app_name, fld_app_module, logdatetime)
Ntawd yog, los ntawm lub npe ntawm qhov system, lub npe ntawm qhov system tivthaiv thiab hnub ntawm qhov kev tshwm sim. Thaum pib, hnub ntawm qhov kev tshwm sim tuaj ua ntej. Tom qab tsiv mus rau qhov chaw kawg, cov lus nug pib ua haujlwm li ob zaug sai dua. Hloov tus yuam sij tseem ceeb yuav xav tau rov tsim lub rooj thiab rov xa cov ntaub ntawv kom ClickHouse rov xaiv cov ntaub ntawv ntawm disk. Qhov no yog ib qho kev ua haujlwm hnyav, yog li nws yog ib lub tswv yim zoo los xav ntau yam txog dab tsi yuav tsum tau muab tso rau hauv qhov tseem ceeb.

Nws yuav tsum tau muab sau tseg tias LowCardinality cov ntaub ntawv hom tau tshwm sim nyob rau hauv kuj tsis ntev los no versions. Thaum siv nws, qhov loj ntawm cov ntaub ntawv compressed yog txo qis rau cov teb uas tsis tshua muaj cardinality (ob peb txoj kev xaiv).

Version 19.6 yog tam sim no siv thiab peb npaj yuav sim hloov kho mus rau qhov tseeb version. Lawv muaj cov yam ntxwv zoo li Adaptive Granularity, Skipping indices thiab DoubleDelta codec, piv txwv li.

Los ntawm lub neej ntawd, thaum lub sij hawm kev teeb tsa, cov txheej txheem txiav tawm yog teem rau kab. Cov cav tau tig thiab khaws cia, tab sis tib lub sijhawm lawv nthuav mus txog ib gigabyte. Yog tias tsis muaj qhov xav tau, ces koj tuaj yeem teeb tsa qhov kev ceeb toom, ces qhov loj ntawm lub cav raug txo qis heev. Qhov teeb tsa kev nkag tau teeb tsa hauv cov ntaub ntawv config.xml:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger. h#L105 -->
<level>warning</level>

Qee cov lus txib muaj txiaj ntsig

Поскольку оригинальные пакеты установки собираются по Debian, то для других версий Linux необходимо использовать пакеты собранные компанией Altinity.
 
Вот по этой ссылке есть инструкции с ссылками на их репозиторий: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. проверка статуса
sudo systemctl status clickhouse-server
 
2. остановка сервера
sudo systemctl stop clickhouse-server
 
3. запуск сервера
sudo systemctl start clickhouse-server
 
Запуск для выполнения запросов в многострочном режиме (выполнение после знака ";")
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
Плагин кликлауза для логстеш в случае ошибки в одной строке сохраняет всю пачку в файл /tmp/log_web_failed.json
Можно вручную исправить этот файл и попробовать залить его в БД вручную:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
выход из командной строки
quit;
## Настройка TLS
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null

logstash. Log router los ntawm FileBeat rau RabbitMQ kab

Qhov kev tivthaiv no yog siv los khiav cov cav los ntawm FileBeat mus rau RabbitMQ kab. Muaj ob lub ntsiab lus ntawm no:

  1. Hmoov tsis zoo, FileBeat tsis muaj qhov tso zis plugin los sau ncaj qha rau RabbitMQ. Thiab kev ua haujlwm zoo li no, txiav txim siab los ntawm qhov teeb meem ntawm lawv cov github, tsis tau npaj rau kev siv. Muaj ib lub plugin rau Kafka, tab sis rau qee yam peb siv tsis tau hauv tsev.
  2. Muaj cov kev cai rau kev sau cov cav hauv DMZ. Raws li lawv, cov cav yuav tsum xub muab ntxiv rau hauv kab thiab tom qab ntawd LogStash nyeem cov ntawv nkag los ntawm kab los ntawm sab nraud.

Yog li ntawd, nws yog rau rooj plaub uas cov servers nyob hauv DMZ uas ib tus yuav tsum siv cov txheej txheem nyuaj me ntsis. Ib qho piv txwv configuration zoo li no:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. lus kab

Cov khoom siv no yog siv los tiv thaiv cov ntawv nkag hauv DMZ. Cov ntaubntawv povthawj siv yog ua tiav los ntawm ib pawg ntawm Filebeat → LogStash. Kev nyeem ntawv yog ua los ntawm sab nraud DMZ ntawm LogStash. Thaum ua haujlwm los ntawm RabboitMQ, kwv yees li 4 txhiab lus ib ob yog ua tiav.

Cov lus routing yog teeb tsa los ntawm lub npe system, piv txwv li raws li FileBeat teeb tsa cov ntaub ntawv. Tag nrho cov lus mus rau ib kab. Yog tias vim qee qhov kev pabcuam queuing raug tso tseg, qhov no yuav tsis ua rau cov lus poob: FileBeats yuav tau txais kev sib txuas tsis raug thiab ncua kev xa mus ib ntus. Thiab LogStash uas nyeem los ntawm cov kab kuj yuav tau txais kev ua yuam kev hauv network thiab tos rau kev sib txuas kom rov qab los. Hauv qhov no, cov ntaub ntawv, tau kawg, yuav tsis raug sau rau hauv database lawm.

Cov lus qhia hauv qab no yog siv los tsim thiab teeb tsa cov kab ke:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"

Grafana. Dashboards

Cov khoom no yog siv los saib xyuas cov ntaub ntawv. Hauv qhov no, koj yuav tsum nruab ClickHouse datasource rau Grafana 4.6+ plugin. Peb yuav tsum tweak nws me ntsis los txhim kho qhov ua tau zoo ntawm kev ua cov lim dej SQL ntawm lub dashboard.

Piv txwv li, peb siv cov kev hloov pauv, thiab yog tias lawv tsis tau teeb tsa hauv qhov chaw lim, ces peb xav kom nws tsis txhob tsim ib qho xwm txheej hauv qhov twg ntawm daim ntawv (uriStem = » THIAB uriStem != » ). Hauv qhov no, ClickHouse yuav nyeem kab ntawv uriStem. Feem ntau, peb sim cov kev xaiv sib txawv thiab thaum kawg kho lub plugin ($valueIfEmpty macro) yog li ntawd nyob rau hauv cov ntaub ntawv ntawm tus nqi npliag nws rov 1, yam tsis tau hais txog kab nws tus kheej.

Thiab tam sim no koj tuaj yeem siv cov lus nug no rau daim duab

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module') and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name') and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

uas txhais rau SQL no (nco ntsoov tias qhov khoob uriStem teb tau raug hloov mus rau 1 xwb)

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC

xaus

Cov tsos ntawm ClickHouse database tau dhau los ua qhov xwm txheej tseem ceeb hauv kev ua lag luam. Nws yog ib qho nyuaj rau xav txog tias, tsis pub dawb kiag li, hauv ib qho tam sim no peb tau ua tub rog nrog lub cuab yeej muaj zog thiab siv tau rau kev ua haujlwm nrog cov ntaub ntawv loj. Tau kawg, nrog kev xav tau ntau ntxiv (piv txwv li, kev sib faib thiab rov ua dua rau ntau lub servers), cov tswv yim yuav nyuaj dua. Tab sis ntawm thawj qhov kev xav, ua haujlwm nrog cov ntaub ntawv no yog qhov zoo siab heev. Nws tuaj yeem pom tias cov khoom tsim "rau tib neeg."

Muab piv rau ElasticSearch, tus nqi khaws cia thiab ua cov ntaub ntawv raug kwv yees yuav raug txo los ntawm tsib mus rau kaum zaug. Hauv lwm lo lus, yog tias rau cov ntaub ntawv tam sim no peb yuav tau teeb tsa ib pawg ntawm ntau lub tshuab, tom qab ntawd thaum siv ClickHouse, ib lub tshuab hluav taws xob tsawg txaus rau peb. Yog lawm, ElasticSearch kuj tseem muaj cov ntaub ntawv compression ntawm lub disk thiab lwm yam nta uas tuaj yeem txo cov peev txheej, tab sis piv rau ClickHouse, qhov no yuav kim dua.

Tsis muaj qhov tshwj xeeb optimizations ntawm peb feem, ntawm qhov chaw pib, thauj cov ntaub ntawv thiab xaiv los ntawm cov ntaub ntawv ua haujlwm ntawm qhov nrawm nrawm. Peb tseem tsis tau muaj cov ntaub ntawv ntau (txog 200 lab cov ntaub ntawv), tab sis lub server nws tus kheej tsis muaj zog. Peb tuaj yeem siv cov cuab yeej no yav tom ntej rau lwm lub hom phiaj tsis cuam tshuam txog kev khaws cov cav. Piv txwv li, rau qhov kawg-rau-kawg analytics, hauv kev ruaj ntseg, kev kawm tshuab.

Thaum kawg, me ntsis txog qhov zoo thiab qhov tsis zoo.

Daim ntawv

  1. Loading cov ntaub ntawv nyob rau hauv loj batch. Ntawm qhov tod tes, qhov no yog qhov tshwj xeeb, tab sis koj tseem yuav tsum siv cov khoom siv ntxiv rau cov ntaub ntawv buffering. Txoj hauj lwm no tsis yog ib qho yooj yim, tab sis tseem daws tau. Thiab kuv xav kom yooj yim lub tswv yim.
  2. Qee qhov kev ua haujlwm txawv txawv lossis cov yam ntxwv tshiab feem ntau tawg hauv cov qauv tshiab. Qhov no ua rau muaj kev txhawj xeeb, txo qhov kev xav hloov mus rau ib qho tshiab. Piv txwv li, Kafka lub rooj cav yog qhov muaj txiaj ntsig zoo uas tso cai rau koj ncaj qha nyeem cov xwm txheej los ntawm Kafka, tsis tas siv cov neeg siv khoom. Tab sis txiav txim los ntawm tus naj npawb ntawm Cov Teeb Meem ntawm github, peb tseem ceev faj tsis txhob siv lub cav no hauv kev tsim khoom. Txawm li cas los xij, yog tias koj tsis ua tam sim ntawd taw tes rau sab thiab siv lub luag haujlwm tseem ceeb, ces nws ua haujlwm ruaj khov.

Yav tas los

  1. Tsis qeeb.
  2. Tsawg nkag nkag.
  3. Qhib-qhov chaw.
  4. Dawb.
  5. Scales zoo (sharding / replication tawm ntawm lub thawv)
  6. Muaj nyob rau hauv cov npe ntawm Lavxias teb sab software pom zoo los ntawm Ministry of Communications.
  7. Lub xub ntiag ntawm kev txhawb nqa los ntawm Yandex.

Tau qhov twg los: www.hab.com

Ntxiv ib saib