ClickHouse Database for Humans, or Alien Technologies

Aleksey Lizunov, Head of the Competence Center for Remote Service Channels, IT Directorate, MKB


As a challenge to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article we want to share our experience of using the ClickHouse database and the preliminary results of the pilot operation. It should be noted right away that the results are impressive.



Below we will describe in more detail how our system is configured and what it consists of. But first, a few words about this database in general, and why it is worth paying attention to. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; initially it was the main data storage for Yandex.Metrica. The system is open source and free. From a developer's point of view, I have always wondered how they managed to implement it, because the data volumes are fantastically large, while the Metrica user interface itself is very flexible and fast. At the first acquaintance with this database, the impression is: "Well, finally! Made for people! From the installation process all the way to sending queries."

This database has a very low entry threshold. Even an average-skilled developer can install it in a few minutes and start using it. Everything just works. Even people who are new to Linux can quickly handle the installation and perform the simplest operations. If earlier, at the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer pictured terabytes and petabytes of data and assumed that some superhumans were doing the configuration and development for these systems, then with the advent of the ClickHouse database we got a simple, understandable tool that lets you solve a previously unattainable range of tasks. One fairly average machine and five minutes are enough for the installation. That is, we got a database like, say, MySql, but for storing billions of records! A kind of super-archiver with an SQL language. It is as if people were handed alien weapons.

About our logging system

To collect information, IIS log files of web applications in the standard format are used (we are also currently parsing application logs, but the main goal at the pilot stage is to collect the IIS logs).

For various reasons we could not completely abandon the ELK stack, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general logging scheme]

A feature of writing data to the ClickHouse database is the infrequent (once per second) insertion of records in large batches. This, apparently, is the most "problematic" part that we ran into when first gaining experience with the ClickHouse database: the scheme becomes a little more complicated.
The plugin for LogStash that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but from a practical point of view it saves deploying separate servers, and it works fine on the same machine. We did not observe any failures or resource conflicts with the database. In addition, it should be noted that the plugin has a retry mechanism in case of errors, and on a failure it writes the batch of data that could not be inserted to disk (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).
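
The batch file the plugin writes on failure is in JSONEachRow format, the same format used by the manual recovery commands later in this article. A minimal illustrative example of re-inserting one corrected record from the clickhouse-client prompt (the field list is abbreviated and the values are invented; omitted columns are filled with their defaults):

INSERT INTO log_web FORMAT JSONEachRow
{"logdate": "2019-07-11", "logdatetime": "2019-07-11 10:00:00", "fld_app_name": "site1.domain.ru", "fld_app_module": "web", "method": "GET", "uriStem": "/index", "response": "200", "timetaken": 12}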

The complete list of software used in the scheme is presented in the table:

List of software used

nginx
Reverse proxy for restricting access by ports and organizing authorization.
Currently not used in the scheme.
https://nginx.org/ru/download.html
https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat
Transfers logs from files.
https://www.elastic.co/downloads/beats/filebeat (distribution kit for Windows 64-bit).
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

logstash
Log collector. Collects logs from FileBeat, as well as logs from the RabbitMQ queue (for the servers that are in the DMZ).
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse
Logstash plugin for transferring logs to the ClickHouse database in batches.
https://github.com/mikechris/logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune

/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

clickhouse
Log storage. https://clickhouse.yandex/docs/ru/
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm

Note. Starting from August 2018, "normal" rpm builds for RHEL also appeared in the Yandex repository, so you can try using them. At the time of installation we were using the packages built by Altinity.

grafana
Log visualization; setting up dashboards.
https://grafana.com/
https://grafana.com/grafana/download
Redhat & Centos (64 Bit) - latest version

ClickHouse datasource for Grafana 4.6+
Plugin for Grafana with a ClickHouse data source.
https://grafana.com/plugins/vertamedia-clickhouse-datasource
https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

logstash
Log router from FileBeat to the RabbitMQ queue.
Note. Unfortunately, FileBeat does not output directly to RabbitMQ, so an intermediate link in the form of Logstash is required.
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ
Message queue. This is the log buffer in the DMZ.
https://www.rabbitmq.com/download.html
https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)
Erlang runtime; required for RabbitMQ to work.
http://www.erlang.org/download.html
https://www.rabbitmq.com/install-rpm.html#install-erlang
http://www.erlang.org/downloads/21.3

The configuration of the server with the ClickHouse database is shown in the following table:

Configuration:
  HDD: 40GB
  RAM: 8GB
  CPU: Core 2, 2 GHz

Note: it is worth studying the tips for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

System-wide software:
  OS: Red Hat Enterprise Linux Server (Maipo)
  JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the table for storing all of this is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default partitioning (by month) and the default index granularity. Almost all the fields correspond to the IIS log entries for http requests. Separately, we note that there are dedicated fields for storing utm tags (they are parsed out of the query string field at the stage of inserting into the table).
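
The utm tags are split out on the LogStash side at insert time (see the kv filter in the configuration below), but for ad-hoc checks a similar extraction can also be done at query time with ClickHouse's extractURLParameter function. A small illustrative query (a '?' is prefixed because the column stores a bare query string):

SELECT
    extractURLParameter(concat('?', uriQuery), 'utm_source') AS utm_source,
    count() AS hits
FROM log_web
WHERE uriQuery LIKE '%utm_source%'
GROUP BY utm_source
ORDER BY hits DESC
LIMIT 10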

In addition, several system fields were added to the table to collect information about systems, components, and servers. See the table below for a description of the fields. We store the logs of several systems in one table.

fld_app_name
Application/system name.
Valid values:

  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • internal-site1.domain.local Internal site 1

Example: site1.domain.com

fld_app_module
System module.
Valid values:

  • web - Website
  • svc - Web site service
  • intgr - Integration web service
  • bo - Admin (BackOffice)

Example: web

fld_website_name
Site name in IIS.
Several systems can be deployed on one server, or even several instances of one system module.
Example: web-main

fld_server_name
Server name.
Example: web1.domain.com

fld_log_file_name
Path to the log file on the server.
Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This allows you to efficiently build graphs in Grafana. For example: view requests from the frontend of a particular system. This is similar to the site counter in Yandex.Metrica.
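
For illustration, a typical query behind such a graph: requests per minute for the frontend of one system (the name and time range are only examples):

SELECT
    toStartOfMinute(logdatetime) AS t,
    count() AS requests
FROM log_web
WHERE fld_app_name = 'site1.domain.ru'
  AND fld_app_module = 'web'
  AND logdate >= today() - 7
GROUP BY t
ORDER BY t ASC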

Here are some statistics on two months of database usage.

Number of records, broken down by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Amount of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.

Data compression ratio by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring logs from files

This component tracks changes to the log files on disk and passes the information on to LogStash. It is installed on all servers where log files are written (usually IIS). It works in tail mode (i.e., it transfers only the records appended to the file). But it can also be configured to transfer entire files. This is handy when you need to load data from previous months: just put the log file into the watched folder and it will be read in full.

When the service is stopped, data stops being transferred further to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

logstash. Log collector

This component is designed to receive log entries from FileBeat (or via the RabbitMQ queue), parse them, and insert them into the ClickHouse database in batches.

The Logstash-output-clickhouse plugin is used for inserting into ClickHouse. The plugin has a request retry mechanism, but for a regular shutdown it is better to stop the service itself. When it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is a long one, it is better to stop the Filebeats on the servers. In a scheme where RabbitMQ is not used (on the local network, Filebeat sends logs directly to Logstash), the Filebeats work quite acceptably and safely, so for them the unavailability of the output passes without consequences.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          # use the Logstash 5+ event API (event.get / event.set)
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          # use the Logstash 5+ event API (event.get / event.set)
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            # the grok patterns above capture the field as "userAgent"
            source => "userAgent"
            prefix => "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

clickhouse. Log storage

Logs for all systems are stored in one table (see the beginning of the article). It is intended to store information about requests: all parameters are similar across different formats, such as IIS logs and apache and nginx logs. For application logs, in which, for example, errors, messages and warnings are recorded, a separate table with the appropriate structure will be provided (currently at the design stage).
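
As a rough sketch only (the real structure is still being designed), such an application-log table might look something like this:

CREATE TABLE log_app (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),

  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_server_name LowCardinality( String ),

  severity LowCardinality( String ),  -- error / warning / info
  message String
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;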

When designing a table, it is very important to decide on the primary key (by which the data will be sorted in storage). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, the name of the system, the name of the system component, and the date of the event. Initially, the date of the event came first. After moving it to the last position, queries began to work about twice as fast. Changing the primary key requires recreating the table and reloading the data, so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is worth thinking well in advance about what should go into the sort key.
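
A sketch of that rebuild procedure (the table names are illustrative):

-- create a copy of the table with the desired sort key
CREATE TABLE log_web_new AS log_web
ENGINE = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

-- reload the data so that ClickHouse re-sorts it on disk
INSERT INTO log_web_new SELECT * FROM log_web;

-- swap the tables
RENAME TABLE log_web TO log_web_old, log_web_new TO log_web;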

It should also be noted that the LowCardinality data type appeared in relatively recent versions. When it is used, the size of the compressed data is reduced drastically for fields that have low cardinality (few distinct values).
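
For example, the response column (the HTTP status) is still a plain String in our schema, although it holds few distinct values. Converting such a column should be a single statement; note that the column data is rewritten, which can be heavy on a large table:

ALTER TABLE log_web MODIFY COLUMN response LowCardinality(String);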

We are currently using version 19.6 and plan to try updating to the latest version. It has such wonderful features as Adaptive Granularity, Skipping indices and the DoubleDelta codec, for example.
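
For instance, after an upgrade the DoubleDelta codec and a skipping index could in principle be tried with statements like these (hypothetical, untested on our setup; data skipping indices were still experimental in the 19.x line):

ALTER TABLE log_web MODIFY COLUMN timetaken UInt64 CODEC(DoubleDelta, LZ4);
ALTER TABLE log_web ADD INDEX idx_response response TYPE set(100) GRANULARITY 4;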

By default, the logging level during installation is set to trace. The logs are rotated and archived, but at the same time they grow up to a gigabyte. If there is no need for this, you can set the level to warning, and the log size is reduced drastically. The logging level is set in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux versions you need to use the packages built by Altinity.
 
Instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the server status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client to execute queries in multiline mode (a query is executed after the ";" sign)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
On an error in a single row, the clickhouse plugin for logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file manually and try to load it into the database by hand:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null

logstash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat does not have an output plugin for writing directly to RabbitMQ. And, judging by the issue on their github, such functionality is not planned for implementation. There is a plugin for Kafka, but for certain reasons we cannot use Kafka in-house.
  2. There are requirements for collecting logs in the DMZ. Based on them, the logs must first be put into a queue, and then LogStash reads the entries from the queue from outside.

So it is for the case of servers located in the DMZ that this slightly complicated scheme has to be used. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Writing goes through the Filebeat → LogStash bundle. Reading is done from outside the DMZ via LogStash. When operating through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, i.e. based on the FileBeat configuration data. All messages go into one queue. If for some reason the queuing service is stopped, this does not lead to the loss of messages: the FileBeats will receive connection errors and temporarily suspend sending, and the LogStash that reads from the queue will also receive network errors and wait for the connection to be restored. In that case, of course, no data is written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"

Grafana. Dashboards

This component is used to visualize the monitoring data. In this case, you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to tweak it a bit to improve the efficiency of processing SQL filters on the dashboard.

For example, we use variables, and if they are not set in the filter field, then we would like the plugin not to generate a condition in the WHERE of the form (uriStem = '' AND uriStem != ''). In that case ClickHouse would still read the uriStem column. In general, we tried different options and eventually fixed the plugin (the $valueIfEmpty macro) so that in the case of an empty value it returns 1, without mentioning the column itself.

Now you can use this query for a graph:

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module') and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name') and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which translates into SQL like this (note that the empty uriStem conditions have been converted to just 1):

SELECT
    t,
    groupArray((response, c)) AS groupArr
FROM (
    SELECT
        (intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t,
        response,
        count(*) AS c
    FROM default.log_web
    WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
    GROUP BY t, response
    ORDER BY t ASC, response ASC
)
GROUP BY t
ORDER BY t ASC

Conclusion

The appearance of the ClickHouse database has become a landmark event on the market. It was hard to believe that, entirely free of charge, we were suddenly armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication across multiple servers), the scheme will become more complicated. But on first impression, working with this database is very pleasant. It is clear that the product is made "for people".

Compared to ElasticSearch, the cost of storing and processing logs is estimated to be five to ten times lower. In other words, if for the current amount of data we had to set up a cluster of several machines, then with ClickHouse one low-power machine is enough for us. Yes, of course, ElasticSearch also has on-disk data compression mechanisms and other features that can significantly reduce resource consumption, but compared to ClickHouse it will still be more expensive.

Without any special optimizations on our part, on default settings, loading data and selecting it from the database work at an amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we can use this tool for other purposes not related to storing logs. For example, for end-to-end analytics, in the field of security, machine learning.

Finally, a few pros and cons.

Cons

  1. Records are loaded in large batches. On the one hand this is a feature, but you still have to use additional components for buffering records. This task is not always simple, but it is solvable. And I would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new versions. This causes concern and reduces the desire to upgrade to a new version. For example, the Kafka table engine is a very useful feature that lets you read events from Kafka directly, without implementing consumers. But judging by the number of Issues on github, we are still careful not to use this engine in production. However, if you do not make sudden gestures to the side and use the main functionality, then it works stably.

Pros

  1. It does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.

Source: www.habr.com
