ClickHouse Database for Humans, or Alien Technologies

Alexey Lizunov, head of the competence center for remote service channels of the ICB Information Technology Directorate


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article we would like to share our experience of using the ClickHouse database and the preliminary results of the pilot operation. It is worth noting right away that the results were impressive.



Later we will describe in more detail how our system is configured and what components it consists of. But first I would like to say a little about this database as a whole, and why it deserves attention. ClickHouse is a high-performance columnar analytical database from Yandex. It is used in Yandex services; it was originally the main data store for Yandex.Metrica. The system is open source and free. From a developer's point of view, I always wondered how they implemented it, because the amount of data is enormous. And the Metrica user interface itself is very flexible and fast. When you first get to know this database, the impression is: "Well, finally! Made 'for humans'! From the installation process to sending queries."

This database has a very low entry barrier. Even an average developer can install it in a few minutes and start using it. Everything works smoothly. Even people who are new to Linux can quickly handle the installation and perform simple operations. Previously, on hearing the words Big Data, Hadoop, Google BigTable, HDFS, an average developer imagined that these were about terabytes and petabytes, and that some superhumans were involved in setting up and developing such systems. With the arrival of ClickHouse we got a simple, understandable tool that solves a whole range of problems that were previously out of reach. All it takes is one fairly average machine and five minutes to install. In other words, we got a database like, for example, MySql, but for storing billions of records! A kind of super-archiver with an SQL language. It is as if people were handed alien weapons.

About our log collection system

To collect information, we use IIS log files of web applications in the standard format (we are also currently working on parsing application logs, but the main goal at the pilot stage is collecting IIS logs).

We could not abandon the ELK stack completely for a number of reasons, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general scheme of the logging system]

A distinctive feature of writing data to ClickHouse is infrequent (once per second) insertion of records in large batches. This, apparently, is the most "problematic" part you run into when working with ClickHouse for the first time: the scheme becomes somewhat more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but we did it for practical reasons, so as not to set up separate servers while everything fits on one. We did not observe any failures or resource conflicts with the database. In addition, the plugin has a retry mechanism in case of errors. When errors persist, the plugin writes the batch that could not be inserted to disk (the file format is convenient: after editing, the corrected batch can easily be inserted using clickhouse-client).
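The batching behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea (collect rows, flush on size or idle time, retry, save the failed batch to disk), not the plugin's actual code. The `send` callable, the `failed_batches` directory and the file naming are hypothetical; the default `flush_size` and `idle_flush_time` values simply echo the Logstash output settings used later in this article.

```python
import json
import time
from pathlib import Path
from typing import Callable


class BatchBuffer:
    """Collects rows and sends them in large batches instead of one by one."""

    def __init__(self, send: Callable[[list], None], flush_size: int = 10000,
                 idle_flush_time: float = 1.0, save_dir: str = "failed_batches",
                 retries: int = 3):
        self.send = send                    # hypothetical callable that performs the INSERT
        self.flush_size = flush_size        # flush as soon as this many rows are buffered
        self.idle_flush_time = idle_flush_time
        self.save_dir = Path(save_dir)      # hypothetical directory for failed batches
        self.retries = retries
        self.rows = []
        self.last_flush = time.monotonic()

    def add(self, row: dict) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.flush_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically: flushes a partial batch once the idle time has passed."""
        if self.rows and time.monotonic() - self.last_flush >= self.idle_flush_time:
            self.flush()

    def flush(self) -> None:
        batch, self.rows = self.rows, []
        self.last_flush = time.monotonic()
        if not batch:
            return
        for _ in range(self.retries):
            try:
                self.send(batch)
                return
            except Exception:
                continue
        # Every attempt failed: keep the batch on disk, one JSON object per line,
        # so that it can be corrected and re-inserted by hand later.
        self.save_dir.mkdir(parents=True, exist_ok=True)
        path = self.save_dir / f"failed_{time.time_ns()}.json"
        with path.open("w", encoding="utf-8") as fh:
            for row in batch:
                fh.write(json.dumps(row) + "\n")
```

A caller would feed parsed log events into `add()` and call `tick()` from a timer, so that a quiet period still results in a write roughly once per second.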

The complete list of software used in the scheme is given in the table below.

List of software used

Name: NGINX
Description: Reverse proxy for restricting access by port and organizing authorization. Currently not used in the scheme.
Distribution link:
  https://nginx.org/ru/download.html
  https://nginx.org/download/nginx-1.16.0.tar.gz

Name: FileBeat
Description: Transfer of file logs.
Distribution link:
  https://www.elastic.co/downloads/beats/filebeat (distribution for Windows 64bit)
  https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

Name: LogStash
Description: Log collector. Used to collect logs from FileBeat, as well as to collect logs from the RabbitMQ queue (for servers located in the DMZ).
Distribution link:
  https://www.elastic.co/products/logstash
  https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: logstash-output-clickhouse
Description: Logstash plugin for transferring logs to the ClickHouse database in batches.
Distribution link:
  https://github.com/mikechris/logstash-output-clickhouse
  /usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
  /usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
  /usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

Name: ClickHouse
Description: Log storage. https://clickhouse.yandex/docs/ru/
Distribution link:
  https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
  https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
  Note. Starting from August 2018, "normal" rpm builds for RHEL appeared in the Yandex repository, so you can try using them. At the time of installation we were using the packages built by Altinity.

Name: Grafana
Description: Log visualization. Setting up dashboards.
Distribution link:
  https://grafana.com/
  https://grafana.com/grafana/download
  Redhat & Centos (64 Bit), latest version

Name: ClickHouse datasource for Grafana 4.6+
Description: Plugin for Grafana with a ClickHouse data source.
Distribution link:
  https://grafana.com/plugins/vertamedia-clickhouse-datasource
  https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

Name: LogStash
Description: Log router from FileBeat to the RabbitMQ queue.
  Note. Unfortunately, FileBeat has no direct output to RabbitMQ, so an intermediate link in the form of Logstash is required.
Distribution link:
  https://www.elastic.co/products/logstash
  https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: RabbitMQ
Description: Message queue. This is the buffer for log entries in the DMZ.
Distribution link:
  https://www.rabbitmq.com/download.html
  https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Name: Erlang Runtime (required for RabbitMQ)
Description: Erlang runtime. Required for RabbitMQ to work.
Distribution link:
  http://www.erlang.org/download.html
  https://www.rabbitmq.com/install-rpm.html#install-erlang http://www.erlang.org/downloads/21.3

The configuration of the server with the ClickHouse database is given in the following table:

Name: Configuration
Value:
  HDD: 40GB
  RAM: 8GB
  Processor: Core 2 2Ghz
Note: Pay attention to the tips for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

Name: System-wide software
Value:
  OS: Red Hat Enterprise Linux Server (Maipo)
  JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default values for partitioning (by month) and index granularity. Practically all fields correspond to the IIS log entries for http requests. Note separately that there are dedicated fields for storing utm tags (they are parsed out of the query string field at the stage of insertion into the table).
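To make the utm-tag extraction concrete, here is a minimal Python sketch of the same idea: take the query string, keep only the selected keys, and produce one prefixed field per key. In our setup this step is done by Logstash, so the function name and the choice to return an empty string for a missing tag are assumptions of this sketch. Unlike the Logstash filter, `parse_qs` also URL-decodes the values.

```python
from urllib.parse import parse_qs

# Keys that get their own column in log_web.
UTM_KEYS = ["utm_medium", "utm_source", "utm_campaign", "utm_term",
            "utm_content", "yclid", "region"]


def extract_query_fields(uri_query: str, prefix: str = "uriQuery__") -> dict:
    """Return the selected query-string parameters as separate, prefixed fields."""
    parsed = parse_qs(uri_query, keep_blank_values=False)
    fields = {}
    for key in UTM_KEYS:
        values = parsed.get(key, [])
        # Drop repeated values while keeping their order, then join with commas.
        unique = list(dict.fromkeys(values))
        fields[prefix + key] = ",".join(unique)
    return fields


if __name__ == "__main__":
    print(extract_query_fields("utm_source=yandex&utm_medium=cpc&page=2"))
```

Parameters that are not in the list (such as `page` above) stay only in the raw `uriQuery` column.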

In addition, a number of system fields have been added to the table to store information about systems, components, and servers. These fields are described in the table below. We store the logs of several systems in a single table.

Name: fld_app_name
Description: Application/system name. Valid values:
  • site1.domain.com  External site 1
  • site2.domain.com  External site 2
  • internal-site1.domain.local  Internal site 1
Example: site1.domain.com

Name: fld_app_module
Description: System module. Valid values:
  • web - Website
  • svc - Website web service
  • intgr - Web integration service
  • bo - Administrator (BackOffice)
Example: web

Name: fld_website_name
Description: Site name in IIS. Several systems can be deployed on one server, or even several instances of one system module.
Example: web-main

Name: fld_server_name
Description: Server name
Example: web1.domain.com

Name: fld_log_file_name
Description: Path to the log file on the server
Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This makes it possible to build graphs in Grafana efficiently. For example, you can view the requests from the frontend of a particular system. This is similar to a site counter in Yandex.Metrica.

Here are some statistics on the use of the database over two months.

Number of records by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Data size on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
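As a quick check of what these figures mean, the overall compression ratio and the average compressed size per row can be derived directly from the query result above. The short Python snippet below only redoes that arithmetic with the numbers from the output; nothing in it comes from ClickHouse itself.

```python
GIB = 1024 ** 3

# Values copied from the query result above.
uncompressed_bytes = 54.50 * GIB
compressed_bytes = 4.86 * GIB
total_rows = 211_427_094

ratio = uncompressed_bytes / compressed_bytes
bytes_per_row = compressed_bytes / total_rows

print(f"compression ratio: {ratio:.1f}x")
print(f"compressed bytes per row: {bytes_per_row:.1f}")
```

That works out to roughly an 11-fold reduction, or about 25 bytes on disk per log record.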

Compression ratio by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transfer of file logs

This component tracks changes to log files on disk and passes the information to LogStash. It is installed on all servers where log files are written (usually IIS). It works in tail mode (i.e., it transfers only the records appended to the file). But it can be configured separately to transfer whole files. This is convenient when you need to load data for previous months: just put the log file in a folder and it will be read in full.
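The difference between tail mode and reading whole files can be shown with a small Python sketch. It is not how Filebeat is implemented; it only illustrates the idea of remembering a byte offset per file. The function name and the `tail` flag are hypothetical.

```python
import os
from typing import List, Optional, Tuple


def read_new_lines(path: str, offset: Optional[int] = None,
                   tail: bool = True) -> Tuple[List[str], int]:
    """Return (lines, new_offset) for data added to `path` since `offset`."""
    if offset is None:
        # First time this file is seen: start at the end (tail mode)
        # or at the beginning (import of a whole file).
        offset = os.path.getsize(path) if tail else 0
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Only complete lines are returned; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Calling the function repeatedly with the returned offset yields only the appended records, while `tail=False` on the first call picks up everything already in the file, which is what the import folder in the configuration below relies on.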

When the service is stopped, data stops being transferred further to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log collector

This component is designed to receive log records from FileBeat (or via the RabbitMQ queue), parse them and insert them in batches into the ClickHouse database.

To insert into ClickHouse, the logstash-output-clickhouse plugin is used. The Logstash plugin has a mechanism for retrying requests, but for a planned shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the stop lasts a long time it is better to stop the Filebeats on the servers as well. In a scheme where RabbitMQ is not used (on the local network Filebeat sends logs directly to Logstash), the Filebeats behave quite acceptably and safely, so for them the unavailability of the output passes without consequences.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source => "userAgent"
            prefix => "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
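For readers who do not use Logstash, the parsing stage of the configuration above can be approximated in Python. The sketch below is an assumption-laden illustration, not a port of the pipeline: the regular expression stands in for the two grok patterns (it accepts both the plain format and the one with the two extra proxy fields, instead of switching on fld_logformat), and the timestamp is kept exactly as written in the log, without any time zone conversion.

```python
import re
from datetime import datetime

# Stand-in for the grok patterns: 14 space-separated fields,
# optionally followed by the two proxy-related fields.
IIS_LINE = re.compile(
    r"(?P<log_timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<serverIP>\S+) (?P<method>\w+) (?P<uriStem>\S+) (?P<uriQuery>\S+) "
    r"(?P<port>\d+) (?P<username>\S+) (?P<clientIP>\S+) "
    r"(?P<userAgent>\S+) (?P<referer>\S+) "
    r"(?P<response>\d+) (?P<subresponse>\d+) (?P<win32response>\d+) "
    r"(?P<timetaken>\d+)"
    r"(?: (?P<xrealIP>\S+) (?P<xforwarderfor>\S+))?$"
)


def parse_iis_line(line: str):
    """Turn one IIS log line into a dict of column values, or None if it is skipped."""
    if line.startswith("#"):
        return None  # W3C header lines are dropped
    match = IIS_LINE.match(line.strip())
    if match is None:
        return None
    event = {k: v for k, v in match.groupdict().items() if v is not None}
    ts = datetime.strptime(event.pop("log_timestamp"), "%Y-%m-%d %H:%M:%S")
    event["logdatetime"] = ts.strftime("%Y-%m-%d %H:%M:%S")
    event["logdate"] = ts.strftime("%Y-%m-%d")
    for name in ("port", "timetaken"):
        event[name] = int(event[name])
    # clientRealIP defaults to clientIP and is overridden by the proxy fields,
    # in the same order as the ruby filters in the configuration.
    event["clientRealIP"] = event["clientIP"]
    for header in ("xrealIP", "xforwarderfor"):
        if header in event:
            event["clientRealIP"] = event[header]
    return event
```

The resulting dict has the same keys as the columns listed in the `mutations` section, apart from the fld_* fields that Filebeat attaches.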

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

The logs for all systems are stored in one table (see the beginning of the article). It is designed to store information about requests: all parameters are similar for different formats, for example IIS, apache and nginx logs. For application logs, in which, for example, errors, informational messages and warnings are recorded, a separate table with an appropriate structure will be provided (currently at the design stage).

When designing a table, it is very important to decide on the primary key (by which the data will be sorted during storage). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by the name of the system, the name of the system component and the date of the event. Initially, the date of the event came first. After moving it to the last place, queries began to run about twice as fast. Changing the primary key requires re-creating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is advisable to think carefully in advance about what should be in the sort key.
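Why the order of columns in the sort key matters can be illustrated with a toy model in Python. It is a deliberate simplification and not a description of ClickHouse internals: rows are sorted by a key, cut into fixed-size granules, and we count how many granules contain at least one row matching a typical dashboard filter (one system and one module). The data, the granule size of 8 and all names are made up for the illustration.

```python
import random


def granules_to_read(rows, sort_key, predicate, granularity=8):
    """Sort rows by sort_key, cut them into granules, count granules with a match."""
    ordered = sorted(rows, key=sort_key)
    granules = [ordered[i:i + granularity]
                for i in range(0, len(ordered), granularity)]
    return sum(1 for g in granules if any(predicate(r) for r in g))


random.seed(0)
apps = ["site1", "site2", "site3", "site4"]
modules = ["web", "svc"]
# Each row is (fld_app_name, fld_app_module, logdatetime as an integer tick).
rows = [(random.choice(apps), random.choice(modules), t) for t in range(4000)]


def wanted(row):
    return row[0] == "site1" and row[1] == "web"


time_first = granules_to_read(rows, lambda r: (r[2], r[0], r[1]), wanted)
app_first = granules_to_read(rows, lambda r: (r[0], r[1], r[2]), wanted)
print("granules touched, time first:", time_first)
print("granules touched, app first: ", app_first)
```

With the event time first, the matching rows are scattered across most granules; with the system and module first, they sit next to each other and far fewer granules have to be touched. The same grouping of similar values also tends to help compression.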

It should also be noted that the LowCardinality data type appeared in relatively recent versions. When it is used, the size of the compressed data drops sharply for fields with low cardinality (few distinct values).
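The idea behind such a type is dictionary encoding: each distinct string is stored once, and the column itself holds small integer indexes. The Python sketch below is a conceptual model only; the real on-disk format is different, and the size estimate (one length byte per string, one byte per index) is an assumption chosen to keep the arithmetic simple.

```python
def dictionary_encode(values):
    """Replace each string by a small integer index into a list of distinct values."""
    dictionary = {}
    indexes = []
    for value in values:
        # setdefault assigns the next free index the first time a value is seen.
        indexes.append(dictionary.setdefault(value, len(dictionary)))
    return list(dictionary), indexes


def dictionary_decode(dictionary, indexes):
    return [dictionary[i] for i in indexes]


column = ["web", "svc", "web", "web", "intgr", "svc"] * 1000
dictionary, indexes = dictionary_encode(column)

# Rough sizes: string bytes plus one length byte, versus dictionary plus one byte per row.
plain_size = sum(len(v.encode()) + 1 for v in column)
encoded_size = sum(len(v.encode()) + 1 for v in dictionary) + len(indexes)
print(plain_size, encoded_size)
```

For a column such as fld_app_module, with only a handful of distinct values, the dictionary is tiny and almost all of the stored data consists of the indexes.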

We currently use version 19.6 and plan to try updating to the latest version. It has such remarkable features as Adaptive Granularity, Skipping indices and the DoubleDelta codec, for example.

By default, during installation the logging level in the configuration is set to trace. The logs are rotated and archived, but they still grow up to a gigabyte. If there is no need for this, you can set the level to warning, and the log size drops sharply. The logging setting is specified in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, other Linux distributions need the packages built by Altinity.
 
Instructions with links to their repository are available here: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client for running queries in multiline mode (a query is executed after the ";" sign)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If there is an error in a single row, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit from the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null
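Before re-inserting a failed batch by hand, it can help to find which rows are actually broken. The Python sketch below assumes that the saved file holds one JSON object per line (this matches the JSONEachRow format used in the insert command above, but it is still an assumption about the plugin's output). The function name, the output paths and the list of required keys are hypothetical.

```python
import json


def split_failed_batch(src_path, good_path, bad_path,
                       required=("logdate", "logdatetime")):
    """Copy parseable rows to good_path and everything else to bad_path."""
    good = bad = 0
    with open(src_path, encoding="utf-8") as src, \
         open(good_path, "w", encoding="utf-8") as ok, \
         open(bad_path, "w", encoding="utf-8") as rejected:
        for line in src:
            if not line.strip():
                continue  # ignore empty lines
            try:
                row = json.loads(line)
                valid = isinstance(row, dict) and all(k in row for k in required)
            except json.JSONDecodeError:
                valid = False
            text = line if line.endswith("\n") else line + "\n"
            if valid:
                ok.write(text)
                good += 1
            else:
                rejected.write(text)
                bad += 1
    return good, bad
```

The file with the good rows can then be loaded with clickhouse-client as shown above, while the rejected rows are left for manual inspection.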

LogStash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. And such functionality, judging by the discussion on their github, is not planned. There is a plugin for Kafka, but for certain reasons we cannot use it in our environment.
  2. There are requirements for collecting logs in the DMZ. According to them, the logs must first be placed in a queue, and then LogStash reads the records from the queue from outside.

So, specifically for servers located in the DMZ, we have to use this slightly more complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Writing is done through the Filebeat → LogStash link. Reading is done from outside the DMZ via LogStash. When working through RabbitMQ, about 4 thousand messages per minute are processed.

Message routing is configured by system name, i.e., based on the FileBeat configuration data. All messages go into one queue. If for some reason the queue service is stopped, this does not lead to message loss: the FileBeats receive connection errors and temporarily stop sending. And LogStash, which reads from the queue, also receives network errors and waits for the connection to be restored. In this case, of course, data is no longer written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
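How a direct exchange delivers messages by system name can be modelled in a few lines of Python. This is only an in-memory illustration of the routing rule (exact match between the routing key and a binding), not a RabbitMQ client; the class and method names are hypothetical, while the queue name and routing keys mirror the commands above.

```python
from collections import defaultdict


class DirectExchange:
    """Toy model of a direct exchange: exact-match routing by key."""

    def __init__(self):
        self.bindings = defaultdict(list)   # routing key -> bound queue names
        self.queues = defaultdict(list)     # queue name -> delivered messages

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        """Deliver to every queue bound with exactly this key; return the delivery count."""
        targets = self.bindings.get(routing_key, [])
        for queue in targets:
            self.queues[queue].append(message)
        return len(targets)


exchange = DirectExchange()
exchange.bind("web_log", "site1.domain.ru")
exchange.bind("web_log", "site2.domain.ru")

exchange.publish("site1.domain.ru", "log line from site1")
exchange.publish("site3.domain.ru", "log line from an unbound system")
print(len(exchange.queues["web_log"]))
```

In this model a message whose key has no binding is simply not delivered, which is why each system name used as a routing key needs its own binding to the web_log queue.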

Grafana. Dashboards

This component is used to visualize monitoring data. For this you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to tweak it a little to make the handling of SQL filters on the dashboard more efficient.

For example, we use variables, and if they are not set in the filter field, we would like the plugin not to generate a condition in the WHERE clause of the form ( uriStem = '' AND uriStem != '' ). In that case ClickHouse would still read the uriStem column. So we tried different options and in the end modified the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without mentioning the column itself.

And now you can use this query for the graph

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is converted to SQL like this (note that the empty uriStem fields have been converted to just 1)

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC
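The effect of the macro can be reproduced with a short Python sketch. It is not the plugin's code (the plugin is written for Grafana, not in Python); the function names are hypothetical, and the values are interpolated without escaping, which is acceptable only for an illustration.

```python
def value_if_empty(value, fallback, expression):
    """Return fallback when the dashboard variable is empty, else the expression."""
    return fallback if value in (None, "") else expression


# Variable name and the condition generated when the variable is set.
FILTERS = [
    ("fld_app_name", "fld_app_name = '{v}'"),
    ("fld_app_module", "fld_app_module = '{v}'"),
    ("fld_server_name", "fld_server_name = '{v}'"),
    ("uriStem", "uriStem like '%{v}%'"),
    ("clientRealIP", "clientRealIP = '{v}'"),
]


def build_where(variables):
    """Build the filter part of the WHERE clause from dashboard variables."""
    parts = []
    for name, template in FILTERS:
        v = variables.get(name, "")
        condition = "(" + template.format(v=v) + ")"
        parts.append(value_if_empty(v, "1", condition))
    return " AND ".join(parts)


print(build_where({"fld_app_name": "site1.domain.ru", "fld_app_module": "web"}))
```

With only the system and module set, the result is `(fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1`, the same tail as in the generated query above: the unused columns are never mentioned, so they do not have to be read.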

Conclusion

The appearance of the ClickHouse database has become a landmark event on the market. It was hard to imagine that in an instant, completely free of charge, we would be armed with such a powerful and practical tool for working with big data. Of course, as the needs grow (for example, sharding and replication to multiple servers), the scheme will become more complex. But by first impressions, working with this database is very pleasant. It is clear that the product was made "for humans".

Compared to ElasticSearch, the cost of storing and processing logs is, by preliminary estimates, reduced by a factor of five to ten. In other words, where the current volume of data would require setting up a cluster of several machines, with ClickHouse a single low-power machine is enough. Yes, of course, ElasticSearch also has mechanisms for compressing data on disk and other features that can noticeably reduce resource consumption, but compared to ClickHouse this would still cost more.

Without any special optimizations on our part, with the default settings, loading data and reading it from the database work at an astonishing speed. We do not have much data yet (about 200 million records), but the server itself is not powerful either. In the future we may use this tool for other purposes unrelated to storing logs. For example, for end-to-end analytics, in the field of security, or for machine learning.

Finally, a little about the pros and cons.

Cons

  1. Loading records in large batches. On the one hand this is a feature, but you still have to use additional components to buffer the records. This task is not always simple, but it is solvable. And we would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new versions. This raises concerns and reduces the desire to upgrade to a new version. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of Issues on Github, we are still wary of using this engine in production. However, if you avoid sudden moves to the side and stick to the basic functionality, it works stably.

Pros

  1. It does not slow down.
  2. Low entry barrier.
  3. Open source.
  4. Free.
  5. Scalable (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex is available.

Source: www.habr.com
