The ClickHouse Database for Humans, or Alien Technology

Aleksey Lizunov, Head of the Remote Service Channels Competence Center, Information Technology Directorate, MKB


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a log store.

In this article, we would like to talk about our experience of using the ClickHouse database and the preliminary results of the pilot. It should be noted right away that the results were impressive.



Next, we will describe in more detail how our system is set up and what components it consists of. But first, I would like to say a little about this database in general, and why it is worth paying attention to. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; initially it served as the main data store for Yandex.Metrica. It is an open-source, free system. From a developer's point of view, I have always wondered how they implemented it, because the data involved is fantastically big. And the Metrica user interface itself is very flexible and fast. At the first acquaintance with this database, the impression is: "Well, finally! Made for humans! Starting from the installation process and ending with sending queries."

This database has a very low entry barrier. Even an average developer can install it in a few minutes and start using it. Everything just works. Even people who are new to Linux can cope with the installation quickly and perform the simplest operations. If earlier, at the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer imagined that this was about terabytes and petabytes, that some superhumans were involved in configuring and developing those systems, then with the arrival of the ClickHouse database we got a simple, understandable tool with which you can solve a range of problems that were previously unattainable. It takes one fairly average machine and five minutes to install. That is, we got a database like, say, MySQL, but for storing billions of records! A kind of super-archiver with an SQL language. It is as if people were handed alien weapons.

About our logging system

IIS log files of web applications in the standard format are used as the data source (we are currently also parsing application logs, but the main goal at the pilot stage is to collect IIS logs).

For various reasons, we could not completely abandon the ELK stack, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The general logging scheme is shown in the figure below:


A feature of writing data to the ClickHouse database is the infrequent (once per second) insertion of records in large batches. This, apparently, is the most "problematic" part that you encounter when you first work with the ClickHouse database: the scheme becomes a bit more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. So, generally speaking, this is not recommended, but from a practical point of view, so as not to set up separate servers, we deployed it on the same server. We did not observe any failures or resource conflicts with the database. In addition, it should be noted that the plugin has a retry mechanism in case of errors. And in case of errors, the plugin writes to disk the batch of data that could not be inserted (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).
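The batching logic can be illustrated with a minimal Python sketch: collect rows and flush them as one batch once the batch is full or enough time has passed. The flush_size and idle_flush_time knobs mirror the plugin settings shown in the configuration later in the article, but this BatchBuffer class is a hypothetical illustration, not the plugin's actual code:

```python
import json
import time

class BatchBuffer:
    """Illustrative buffer: collect rows and flush them as one batch,
    either when the batch is full or when enough time has passed.
    Mirrors the flush_size / idle_flush_time idea of the
    logstash-output-clickhouse plugin; NOT the plugin's code."""

    def __init__(self, flush_size=10000, idle_flush_time=1.0):
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.rows = []
        self.last_flush = time.monotonic()
        self.flushed_batches = []  # stands in for HTTP POSTs to ClickHouse

    def add(self, row):
        self.rows.append(row)
        if (len(self.rows) >= self.flush_size
                or time.monotonic() - self.last_flush >= self.idle_flush_time):
            self.flush()

    def flush(self):
        if not self.rows:
            return
        # ClickHouse accepts a batch over HTTP as JSONEachRow:
        # one JSON object per line, sent together with
        # "INSERT INTO log_web FORMAT JSONEachRow".
        self.flushed_batches.append("\n".join(json.dumps(r) for r in self.rows))
        self.rows = []
        self.last_flush = time.monotonic()

buf = BatchBuffer(flush_size=3)
for i in range(7):
    buf.add({"fld_app_name": "site1.domain.ru", "response": "200", "n": i})
buf.flush()  # push the incomplete tail batch
print(len(buf.flushed_batches))  # 3 batches: 3 + 3 + 1 rows
```

The same JSONEachRow payload format is what makes the plugin's on-disk "failed batch" files easy to replay manually with clickhouse-client.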

The complete list of software used in the system is presented in the table:

List of software used

Title

Description

Distribution link

NGINX

Reverse proxy for restricting access by ports and organizing authorization

Currently not used in the scheme

https://nginx.org/ru/download.html

https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat

Transfers file logs.

https://www.elastic.co/downloads/beats/filebeat (distribution kit for Windows 64bit).

https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

logstash

Log collector.

Used to collect logs from FileBeat, as well as to collect logs from the RabbitMQ queue (for servers located in the DMZ).

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse

Logstash plugin for transferring logs to the ClickHouse database in batches

https://github.com/mikechris/logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune

/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse

Log storage https://clickhouse.yandex/docs/ru/

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm

Note. Since August 2018, "normal" rpm builds for RHEL appeared in the Yandex repository, so you can try using them. At the time of installation, we were using the packages built by Altinity.

Grafana

Log visualization. Setting up dashboards

https://grafana.com/

https://grafana.com/grafana/download

Redhat & Centos (64 Bit) - latest version

ClickHouse datasource don Grafana 4.6+

Plugin for Grafana with a ClickHouse data source

https://grafana.com/plugins/vertamedia-clickhouse-datasource

https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

logstash

Log router from FileBeat to the RabbitMQ queue.

Note. Unfortunately, FileBeat has no direct output to RabbitMQ, so an intermediate link in the form of Logstash is required.

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ

Message queue. This is the log buffer in the DMZ

https://www.rabbitmq.com/download.html

https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)

Erlang runtime. Required for RabbitMQ to work

http://www.erlang.org/download.html

https://www.rabbitmq.com/install-rpm.html#install-erlang http://www.erlang.org/downloads/21.3

The configuration of the server with the ClickHouse database is presented in the following table:

Title

Value

Note

Configuration

HDD: 40 GB
RAM: 8GB
CPU: 2 cores, 2 GHz

It is worth paying attention to the tips for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

General system software

OS: Red Hat Enterprise Linux Server (Maipo)

JRE (Java 8)

 

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use default partitioning (by month) and default index granularity. All the fields practically correspond to the IIS log entries for http requests. Separately, note that there are dedicated fields for storing utm tags (they are parsed out of the query string field at the stage of insertion into the table).
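The utm-tag extraction can be sketched in Python: the same idea as the kv and mutate/join filters in the LogStash configuration below (split the query string, keep only the listed keys, join values with a comma). The split_utm helper is a hypothetical illustration, not code from the pipeline:

```python
from urllib.parse import parse_qs

# Keys mirror the include_keys list of the kv filter in the LogStash config.
UTM_KEYS = ["utm_medium", "utm_source", "utm_campaign",
            "utm_term", "utm_content", "yclid", "region"]

def split_utm(uri_query):
    """Split the raw query string into the dedicated uriQuery__* columns.
    Hypothetical helper; in the real pipeline this is done by the
    kv + mutate/join LogStash filters before the insert."""
    parsed = parse_qs(uri_query.lstrip("?"))
    return {
        "uriQuery__" + key: ",".join(dict.fromkeys(parsed.get(key, [])))
        for key in UTM_KEYS
    }

row = split_utm("utm_source=yandex&utm_medium=cpc&utm_medium=cpc&page=3")
print(row["uriQuery__utm_source"])   # yandex
print(row["uriQuery__utm_medium"])   # cpc  (duplicates dropped)
```

Keys absent from the query string simply become empty strings, so every row carries the full set of uriQuery__* columns.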

Also, several system fields were added to the table to store information about systems, components, and servers. See the table below for a description of these fields. In one table, we store logs for several systems.

Title

Description

Example:

fld_app_name

Name of the application/system
Valid values:

  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • inter-site1.domain.local Internal site 1

site1.domain.com

fld_app_module

System module
Valid values:

  • web - Website
  • svc - Web site service
  • intgr - Integration web service
  • bo - Admin (BackOffice)

web

fld_website_name

Name of the site in IIS

Several systems can be deployed on one server, or even several instances of one system module

web-main

fld_server_name

Server name

web1.domain.com

fld_log_file_name

Path to the log file on the server

C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This allows you to efficiently build graphs in Grafana. For example, view requests from the frontend of a specific system. This is similar to the site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records broken down by systems and their components

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Amount of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.

Data compression ratio by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring file logs

This component tracks changes to log files on disk and passes the information on to LogStash. It is installed on all servers where log files are written (usually IIS). It works in tail mode (i.e., it transfers only the records appended to the file). But it can separately be configured to transfer whole files. This is useful when you need to load data from previous months: just put the log file in the watched folder and it will be read in its entirety.
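The tail mode can be illustrated with a small Python sketch: remember a byte offset per file and return only what was appended since the previous read. Filebeat keeps this kind of state in its registry file; the function below is a simplified illustration, not Filebeat code:

```python
import os
import tempfile

def read_new_lines(path, state):
    """Tail-style reading: remember a byte offset per file and return only
    the lines appended since the previous call. Filebeat persists similar
    state in its registry; this is only a sketch of the idea."""
    offset = state.get(path, 0)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    state[path] = offset + len(data)
    return data.decode("utf-8").splitlines()

state = {}
tmp = tempfile.NamedTemporaryFile("w", suffix=".log", delete=False)
tmp.write("line1\n")
tmp.close()
first = read_new_lines(tmp.name, state)    # whole file on the first pass
with open(tmp.name, "a") as f:
    f.write("line2\n")
second = read_new_lines(tmp.name, state)   # only the appended line
os.unlink(tmp.name)
print(first, second)  # ['line1'] ['line2']
```

On the first pass the offset is zero, which is exactly why dropping an old log file into the folder causes it to be read in full.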

When the service is stopped, data stops being transferred further to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log collector

This component is designed to receive log entries from FileBeat (or via the RabbitMQ queue), parse them, and insert them in batches into the ClickHouse database.

For insertion into ClickHouse, the Logstash-output-clickhouse plugin is used. The plugin has a retry mechanism, but for a planned shutdown it is better to stop the service itself. When it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is a long one, then it is better to stop the Filebeats on the servers as well. In the scheme where RabbitMQ is not used (on the local network, Filebeat sends logs directly to Logstash), the Filebeats work quite acceptably and reliably, so for them the unavailability of the output passes without consequences.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event['kilobytesSent'] = event['bytesSent'].to_i / 1024.0"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event['kilobytesReceived'] = event['bytesReceived'].to_i / 1024.0"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source=> "userAgent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

Logs for all systems are stored in one table (see the beginning of the article). It is intended for storing information about requests: all the parameters are similar for different formats, such as IIS, apache, and nginx logs. For application logs, in which, for example, errors, informational messages, and warnings are recorded, a separate table with the appropriate structure will be provided (currently at the design stage).

When designing a table, it is very important to decide on the primary key (by which the data will be sorted in storage). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by the name of the system, the name of the system component, and the event date. Initially, the event date came first. After moving it to the last place, queries began to work about twice as fast. Changing the primary key requires re-creating the table and reloading the data, so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is worth thinking carefully in advance about what should go into the sort key.

It should also be noted that the LowCardinality data type appeared in relatively recent versions. When it is used, the size of the compressed data shrinks drastically for those fields that have low cardinality (few distinct values).

Version 19.6 is currently in use, and we plan to try updating to the latest version. It has such wonderful features as Adaptive Granularity, Skipping indices and the DoubleDelta codec, for example.

By default, during installation, the logging level is set to trace. The logs are rotated and archived, but at the same time they grow up to a gigabyte. If there is no need for them, you can set the warning level, and then the size of the logs is drastically reduced. The logging level is set in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux versions you need to use the packages built by Altinity.
 
This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the server status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client to run queries in multiline mode (execution after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If an error occurs in a single row, the clickhouse plugin for logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file manually and try to load it into the database by hand:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null

LogStash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat into the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. And such functionality, judging by the issue on github, is not planned for implementation. There is a plugin for Kafka, but for certain reasons we cannot use it in-house.
  2. There is a requirement to collect logs in the DMZ. Because of it, the logs must first be placed into a queue, and then LogStash reads the entries from the queue from outside.

Therefore, it is for the case where servers are located in the DMZ that one has to use such a slightly complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Writing is done through the Filebeat → LogStash chain. Reading is done from outside the DMZ via LogStash. When operating through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, i.e., based on the FileBeat configuration data. All messages go into a single queue. If for some reason the queue service is stopped, this does not lead to message loss: the FileBeats receive connection errors and temporarily suspend sending. And the LogStash that reads from the queue also receives network errors and waits for the connection to be restored. In this case, of course, the data is no longer written to the database.

The following commands are used to create and configure the queue:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"

Grafana. Dashboards

This component is used to visualize the monitoring data. In this case, you need to install the ClickHouse datasource plugin for Grafana 4.6+. We had to tweak it a little to improve the efficiency of processing SQL filters on the dashboard.

For example, we use variables, and if they are not set in the filter field, then we would like the plugin not to generate a condition in the WHERE clause of the form ( uriStem = '' AND uriStem != '' ). In that case, ClickHouse would still read the uriStem column. So we tried different options and eventually fixed the plugin (the $valueIfEmpty macro) so that in the case of an empty value it returns 1, without mentioning the column itself.
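The behavior of the patched macro can be sketched in Python: for an empty dashboard variable emit the constant 1 (so the column is never read), otherwise emit the real filter expression. The value_if_empty and build_where names are hypothetical illustrations; the real macro lives inside the Grafana datasource plugin:

```python
def value_if_empty(value, default, expression):
    """Sketch of the patched $valueIfEmpty macro: for an empty dashboard
    variable emit the constant `default`, so the column is never read;
    otherwise emit the real filter expression."""
    return str(default) if value in ("", None) else expression

def build_where(fld_app_name, uri_stem):
    # Hypothetical helper mimicking how the dashboard query is assembled.
    return " AND ".join([
        value_if_empty(fld_app_name, 1, "fld_app_name = '%s'" % fld_app_name),
        value_if_empty(uri_stem, 1, "uriStem LIKE '%%%s%%'" % uri_stem),
    ])

print(build_where("site1.domain.ru", ""))
# fld_app_name = 'site1.domain.ru' AND 1
```

An unset variable thus degenerates into the harmless literal 1 in the generated SQL, exactly as in the translated query shown below the Grafana snippet.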

And now you can use this query for a graph:

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module') and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name') and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which translates into this SQL (note that the empty uriStem fields have been converted to just 1)

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC

Conclusion

The appearance of the ClickHouse database has become a landmark event on the market. It was hard to imagine that, completely free of charge, we would instantly be armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication to multiple servers), the scheme will become more complicated. But by first impressions, working with this database is very pleasant. You can see that the product is made "for humans."

Compared to ElasticSearch, the cost of storing and processing the logs is estimated to be five to ten times lower. In other words, if for the current amount of data we would have to set up a cluster of several machines, then when using ClickHouse, one low-power machine is enough for us. Yes, of course, ElasticSearch also has on-disk data compression mechanisms and other features that can significantly reduce resource consumption, but compared to ClickHouse this would be more expensive.

Without any special optimizations on our part, on default settings, loading data and selecting from the database work at amazing speed. We do not have much data yet (about 200 million records), but the server itself is low-powered. We may use this tool in the future for other purposes not related to storing logs. For example, for end-to-end analytics, in the field of security, machine learning.

In closing, a few words about the pros and cons.

Cons

  1. Records are loaded in large batches. On the one hand, this is a feature, but you still have to use additional components for buffering the records. This task is not always simple, but it is solvable. And I would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new versions. This raises concerns and reduces the desire to upgrade to a new version. For example, the Kafka table engine is a very useful feature that allows you to read events directly from Kafka, without implementing consumers. But judging by the number of issues on github, we are still careful not to use this engine in production. However, if you do not make sudden gestures to the side and use the core functionality, then it works stably.

Pros

  1. Does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.

source: www.habr.com
