ClickHouse Database for Humans, or Alien Technologies

Aleksey Lizunov, Head of the Competence Center for Remote Service Channels of the Information Technology Directorate, MKB.


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are evaluating the ClickHouse database as a store for logs.

In this article we would like to share our experience with ClickHouse and the preliminary results of the pilot. It should be said right away that the results were impressive.



Below we describe in more detail how our system is set up and what components it consists of. But first I would like to say a little about this database in general and why it deserves attention. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services and was originally the main data store for Yandex.Metrica. It is an open-source, free system. As a developer, I always wondered how they managed it, given the fantastically large volumes of data involved, and the Metrica user interface itself is very flexible and fast. On first acquaintance with this database, the impression is: "Well, finally! Made for people!", from the installation process all the way to sending queries.

This database has a very low entry threshold. Even a developer of average skill can install it in a few minutes and start using it. Everything just works. Even people new to Linux can quickly handle the installation and do the simplest operations. If earlier, at the words Big Data, Hadoop, Google BigTable or HDFS, an ordinary developer pictured terabytes and petabytes, with some superhumans busy configuring and developing these systems, then with the arrival of ClickHouse we got a simple, understandable tool that lets you solve a range of tasks that were previously out of reach. All it takes is one fairly average machine and five minutes to install. In other words, we got a database like MySQL, for example, but for storing billions of records! A kind of super-archiver with an SQL interface. It is as if people had been handed alien weapons.

About our logging system

To collect data, we use IIS log files of web applications in the standard format (we are also parsing application logs now, but the main goal of the pilot stage is collecting IIS logs).

For various reasons we could not abandon the ELK stack completely, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general logging scheme]

A distinctive feature of writing to ClickHouse is infrequent (once per second) insertion of records in large batches. This is apparently the most "problematic" part you run into the first time you work with ClickHouse: the scheme becomes a little more complicated.
The Logstash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but for practical reasons, to avoid multiplying separate servers, we deploy both on one server for now. We have not observed any failures or resource conflicts with the database. It is also worth noting that the plugin retries requests on errors. If a batch still cannot be inserted, the plugin writes it to disk (the file format is convenient: after fixing it, you can easily insert the corrected batch with clickhouse-client).
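The plugin's internals are not described in the article. As an illustration of the batching and retry behaviour, here is a minimal Python sketch, assuming a caller-supplied send function that performs the actual HTTP insert. The class name, the retries parameter and the <table>_failed.json file name are hypothetical; flush_size, idle_flush_time and save_dir are borrowed from the option names in the Logstash output configuration further down, but this is not the plugin's code.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Optional


class BatchWriter:
    """Hypothetical buffer: rows are sent as one JSONEachRow body when flush_size
    is reached or when the buffer has waited idle_flush_time seconds; a batch that
    still fails after all retries is appended to a file for manual re-insertion."""

    def __init__(self, send: Callable[[str], None], flush_size: int = 10000,
                 idle_flush_time: float = 1.0, retries: int = 1,
                 save_dir: Optional[Path] = None, table: str = "log_web") -> None:
        self.send = send                  # e.g. an HTTP POST to ClickHouse (not shown)
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.retries = retries
        self.save_dir = save_dir          # if None, a failed batch is dropped
        self.table = table
        self.rows: List[dict] = []
        self.last_flush = time.monotonic()

    def add(self, row: dict) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.flush_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically: flushes a partial batch once it has waited long enough."""
        if self.rows and time.monotonic() - self.last_flush >= self.idle_flush_time:
            self.flush()

    def flush(self) -> None:
        if not self.rows:
            return
        # One JSON object per line: the JSONEachRow input format
        body = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in self.rows)
        for attempt in range(self.retries + 1):
            try:
                self.send(body)
                break
            except Exception:
                if attempt == self.retries and self.save_dir is not None:
                    failed = self.save_dir / f"{self.table}_failed.json"
                    with failed.open("a", encoding="utf-8") as f:
                        f.write(body)
        self.rows = []
        self.last_flush = time.monotonic()
```

Flushing on either size or idle time is what keeps inserts infrequent and large while still bounding how stale the data in ClickHouse can get.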

The full list of software used in the scheme is given in the table:

List of software used

Name / Description / Distribution link

NGINX
Reverse proxy used to restrict access by port and to handle authorization. Not currently used in the scheme.
https://nginx.org/ru/download.html
https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat
Ships file logs.
https://www.elastic.co/downloads/beats/filebeat (distribution for Windows 64-bit)
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

Logstash
Log collector. Used to collect logs from FileBeat, and also to collect logs from the RabbitMQ queue (for servers located in the DMZ).
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse
Logstash plugin for transferring logs to ClickHouse in batches.
https://github.com/mikechris/logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse
Log storage. https://clickhouse.yandex/docs/ru/
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
Note. Since August 2018, "regular" rpm builds for RHEL have been available in the Yandex repository, so you can try using them. At the time of installation, we used the packages built by Altinity.

Grafana
Log visualization and dashboard setup.
https://grafana.com/
https://grafana.com/grafana/download
Redhat & Centos (64 Bit), latest version

ClickHouse datasource for Grafana 4.6+
Grafana plugin providing a ClickHouse data source.
https://grafana.com/plugins/vertamedia-clickhouse-datasource
https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

Logstash
Log router from FileBeat to the RabbitMQ queue.
Note. Unfortunately, FileBeat has no output that writes directly to RabbitMQ, so an intermediate hop in the form of Logstash is required.
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ
Message queue. This is the log buffer in the DMZ.
https://www.rabbitmq.com/download.html
https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)
Erlang runtime. Required for RabbitMQ to work.
http://www.erlang.org/download.html
https://www.rabbitmq.com/install-rpm.html#install-erlang
http://www.erlang.org/downloads/21.3

The configuration of the server running ClickHouse is shown in the following table:

Name / Value / Note

Configuration
HDD: 40GB
RAM: 8GB
Processor: Core 2 2Ghz
Note: be sure to follow the tips for operating ClickHouse (https://clickhouse.yandex/docs/ru/operations/tips/)

General system software
OS: Red Hat Enterprise Linux Server (Maipo)
JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default partitioning (by month) and the default index granularity. Almost all fields correspond to IIS log entries for HTTP requests. Note separately that there are dedicated fields for storing utm tags (they are parsed out of the query string field at the stage of inserting into the table).
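To illustrate the utm extraction, here is a rough Python approximation of the kv and mutate/join steps from the Logstash configuration later in the article. The function name is hypothetical, and it differs from the kv filter in one respect: urllib.parse.parse_qs also percent-decodes values, which, as we understand it, the kv filter does not do unless a separate urldecode step is added.

```python
from urllib.parse import parse_qs

# Same keys as include_keys in the kv filter
UTM_KEYS = ("utm_medium", "utm_source", "utm_campaign", "utm_term",
            "utm_content", "yclid", "region")


def extract_utm(uri_query: str, prefix: str = "uriQuery__") -> dict:
    """Return {prefix + key: value} for the tracked keys found in a query string."""
    if not uri_query or uri_query == "-":  # IIS writes "-" when there is no query string
        return {}
    parsed = parse_qs(uri_query, keep_blank_values=True)
    result = {}
    for key in UTM_KEYS:
        if key in parsed:
            # Drop repeated identical values (like allow_duplicate_values => false),
            # then join the rest with "," (like the mutate/join step)
            values = list(dict.fromkeys(parsed[key]))
            result[prefix + key] = ",".join(values)
    return result


print(extract_utm("utm_source=yandex&utm_medium=cpc&utm_source=yandex&foo=1"))
```

Keys not in the list (foo above) are ignored, so the table only needs one column per tracked tag.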

In addition, several system fields have been added to the table to store information about systems, components and servers. See the table below for a description of these fields. We store logs for several systems in one table.

Name / Description / Example

fld_app_name
Application/system name.
Valid values:
  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • internal-site1.domain.local Internal site 1
Example: site1.domain.com

fld_app_module
System module.
Valid values:
  • web - Website
  • svc - Website web service
  • intgr - Integration web service
  • bo - Admin panel (BackOffice)
Example: web

fld_website_name
Site name in IIS. Several systems, or even several instances of the same system module, can be deployed on one server.
Example: web-main

fld_server_name
Server name.
Example: web1.domain.com

fld_log_file_name
Path to the log file on the server.
Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This makes it easy to build graphs in Grafana, for example to view requests to the frontend of a particular system. This is similar to a site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Amount of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
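The query above allows a quick back-of-the-envelope check. The inputs are copied from its output and are already rounded by formatReadableSize, so the results are approximate:

```python
GIB = 1024 ** 3

# Figures copied from the system.parts query above (rounded by formatReadableSize)
uncompressed_gib = 54.50
compressed_gib = 4.86
total_rows = 211_427_094

ratio = uncompressed_gib / compressed_gib
bytes_per_row = compressed_gib * GIB / total_rows
print(f"compression ratio ~{ratio:.1f}x, ~{bytes_per_row:.1f} bytes per row on disk")
```

That is roughly an 11x overall compression ratio, or about 25 bytes on disk per log record.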

Data compression ratio by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Shipping file logs

This component tracks changes to log files on disk and passes the data to LogStash. It is installed on every server where log files are written (usually IIS). It works in tail mode (that is, it ships only the records appended to a file). It can also be configured separately to ship whole files, which is useful when you need to load data from previous months: just put the log file in the folder and it will be read in full.
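A simplified sketch of what tail mode means, assuming a single file that is only ever appended to: remember a byte offset and, on each poll, return only the complete lines written after it. Filebeat additionally persists offsets in its registry and handles log rotation; none of that is modelled here, and the function names are hypothetical. The tail_files flag mirrors the option of the same name in the filebeat.yml below.

```python
import io
import os
from typing import BinaryIO, List, Tuple


def initial_offset(f: BinaryIO, tail_files: bool) -> int:
    """Where to start reading a newly discovered file.
    tail_files=True: skip existing content, ship only lines written from now on.
    tail_files=False: start from the beginning (e.g. to back-load older months)."""
    return f.seek(0, os.SEEK_END) if tail_files else 0


def read_new_lines(f: BinaryIO, offset: int) -> Tuple[List[str], int]:
    """Return the complete lines appended after `offset` and the offset to resume from.
    A trailing line without a newline is left for the next poll."""
    f.seek(offset)
    data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line has been written yet
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Usage with an in-memory "log file"
log = io.BytesIO(b"2019-07-11 10:00:00 GET /old\n")
pos = initial_offset(log, tail_files=True)  # the existing line is skipped
log.seek(0, os.SEEK_END)
log.write(b"2019-07-11 10:00:01 GET /new\n2019-07-11 10:00")  # last line still being written
new_lines, pos = read_new_lines(log, pos)
print(new_lines)  # ['2019-07-11 10:00:01 GET /new']
```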

When the service is stopped, data stops being passed on to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

Logstash. Log collector

This component receives log entries from FileBeat (or from the RabbitMQ queue), parses them and inserts them into ClickHouse in batches.

For inserting into ClickHouse we use the Logstash-output-clickhouse plugin. The plugin has a request retry mechanism, but for a planned shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is going to be long, it is better to stop Filebeat on the servers as well. In the setup without RabbitMQ (on the local network, Filebeat sends logs directly to Logstash), Filebeat copes quite acceptably and safely, so an unavailable output has no consequences for it.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
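      # Logstash 5+ event API: event['field'] is no longer supported, use event.get/event.set
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }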
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            # field names are case-sensitive: grok stores the header in userAgent
            source=> "userAgent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
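To check log samples without running Logstash, here is a rough Python equivalent of the grok, date and ruby steps above for IIS lines. It is a sketch, not a port: the regular expression approximates the grok patterns with \S+ and \d+, and parse_iis_line is a hypothetical name. Two deliberate differences from the config: it keeps timestamps in UTC (Ruby's Time.at(...).strftime in the config formats them in the Logstash host's local time zone), and it ignores "-" placeholders when choosing clientRealIP.

```python
import re
from datetime import datetime
from typing import Optional

# Field order of the grok patterns above; the two trailing header fields
# only exist in the "logformat__iis_with_xrealip" layout.
IIS_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<serverIP>\S+) (?P<method>\S+) "
    r"(?P<uriStem>\S+) (?P<uriQuery>\S+) (?P<port>\d+) (?P<username>\S+) "
    r"(?P<clientIP>\S+) (?P<userAgent>\S+) (?P<referer>\S+) (?P<response>\d+) "
    r"(?P<subresponse>\d+) (?P<win32response>\d+) (?P<timetaken>\d+)"
    r"(?: (?P<xrealIP>\S+) (?P<xforwarderfor>\S+))?$"
)


def parse_iis_line(line: str) -> Optional[dict]:
    line = line.rstrip("\r\n")
    if line.startswith("#"):   # W3C directives (#Fields:, #Date:) are dropped
        return None
    m = IIS_RE.match(line)
    if m is None:              # Logstash would pass it on tagged _grokparsefailure
        return None
    ev = m.groupdict()
    # IIS W3C timestamps are UTC; this sketch keeps them in UTC
    ts = datetime.strptime(ev.pop("ts"), "%Y-%m-%d %H:%M:%S")
    ev["logdatetime"] = ts.strftime("%Y-%m-%d %H:%M:%S")
    ev["logdate"] = ts.strftime("%Y-%m-%d")
    ev["port"] = int(ev["port"])
    ev["timetaken"] = int(ev["timetaken"])
    # Same precedence as the ruby filters: X-Forwarded-For > X-Real-IP > clientIP
    real_ip = ev["clientIP"]
    for key in ("xrealIP", "xforwarderfor"):
        value = ev.pop(key)
        if value and value != "-":  # "-" is the IIS placeholder for a missing header
            real_ip = value
    ev["clientRealIP"] = real_ip
    return ev
```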

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

Logs for all systems are stored in a single table (see the beginning of the article). The table is intended to store information about requests: the parameters are similar across formats such as IIS, apache and nginx logs. For application logs, which record errors, informational messages and warnings, for example, a separate table with an appropriate structure is planned (currently at the design stage).

When designing a table, it is very important to choose the primary key (the order in which the data is sorted on disk). Both the data compression ratio and query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by system name, system component name and event date. Initially, the event date came first. After we moved it to the last position, queries started running about twice as fast. Changing the primary key requires re-creating the table and reloading the data so that ClickHouse re-sorts it on disk. This is a heavy operation, so it is worth thinking carefully up front about what to include in the sort key.
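Why the key order matters can be sketched with a toy model. ClickHouse stores sorted rows in granules (8192 rows by default, see index_granularity in the table definition) and can skip granules that cannot match a filter. The Python below, on synthetic data and with hypothetical names, counts how many granules contain at least one row for a single system under two sort orders, that is, the minimum a query filtered by fld_app_name would have to read. It only shows the direction of the effect; it does not reproduce the roughly twofold speed-up reported above.

```python
import random
from math import ceil
from typing import Callable, List, Tuple

Row = Tuple[str, str, int]  # (fld_app_name, fld_app_module, logdatetime in seconds)


def granules_touched(rows: List[Row], sort_key: Callable[[Row], tuple],
                     predicate: Callable[[Row], bool], granularity: int = 8192) -> int:
    """Sort rows, cut them into granules of `granularity` rows and count the
    granules that contain at least one row matching `predicate`."""
    ordered = sorted(rows, key=sort_key)
    touched = 0
    for start in range(0, len(ordered), granularity):
        if any(predicate(r) for r in ordered[start:start + granularity]):
            touched += 1
    return touched


def is_site3(row: Row) -> bool:
    return row[0] == "site3.domain.ru"


# Synthetic events: one per second, each from one of ten systems chosen at random
rng = random.Random(42)
apps = [f"site{i}.domain.ru" for i in range(1, 11)]
rows: List[Row] = [(rng.choice(apps), "web", t) for t in range(200_000)]

total = ceil(len(rows) / 8192)
by_time_first = granules_touched(rows, lambda r: (r[2], r[0], r[1]), is_site3)
by_app_first = granules_touched(rows, lambda r: (r[0], r[1], r[2]), is_site3)
print(f"ORDER BY (logdatetime, ...): {by_time_first} of {total} granules")
print(f"ORDER BY (fld_app_name, ...): {by_app_first} of {total} granules")
```

With time first, one system's rows are scattered through every granule; with the system name first, they are contiguous and occupy only a few granules.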

Note that the LowCardinality data type appeared relatively recently. Using it drastically reduces the size of compressed data for fields with low cardinality (few distinct values).
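The idea behind LowCardinality is dictionary encoding: each distinct string is stored once and the column holds small integer positions instead. Here is a minimal Python sketch of that idea, using the fld_app_module values from the table above. It is not ClickHouse's on-disk format: the byte counts ignore string length prefixes and the compression ClickHouse applies afterwards, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct string once; replace every value by its position in the dictionary."""
    dictionary: List[str] = []
    position: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in position:
            position[v] = len(dictionary)
            dictionary.append(v)
        codes.append(position[v])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    return [dictionary[c] for c in codes]


column = ["web", "svc", "web", "web", "intgr", "web", "bo", "svc"] * 1000
dictionary, codes = dictionary_encode(column)

raw_bytes = sum(len(v.encode("utf-8")) for v in column)
# With fewer than 256 distinct values, each code fits in one byte
encoded_bytes = sum(len(v.encode("utf-8")) for v in dictionary) + len(codes)
print(dictionary, raw_bytes, encoded_bytes)  # ['web', 'svc', 'intgr', 'bo'] 25000 8013
```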

We are currently using version 19.6 and plan to try upgrading to the latest version. Newer releases have nice features such as Adaptive Granularity, skipping indices and the DoubleDelta codec.

By default, the logging level is set to trace at installation. Logs are rotated and archived, but they still grow to a gigabyte. If you do not need that level of detail, set the level to warning and the log size drops sharply. Logging is configured in config.xml:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux distributions you need to use the packages built by Altinity.
 
This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client in multi-line mode (a query is executed after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If a single row fails, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command-line client
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null
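Before re-running the INSERT ... FORMAT JSONEachRow command above, it can help to find which lines of the failed-batch file are malformed. Below is a hypothetical helper script (the name find_bad_rows and the command-line usage are illustrative). It only checks that each non-empty line is a standalone JSON object, not that the values match the column types of log_web.

```python
import json
import sys
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, problem) for each non-empty line that is not a JSON object."""
    problems: List[Tuple[int, str]] = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, exc.msg))
            continue
        if not isinstance(obj, dict):
            problems.append((n, "not a JSON object"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_jsoneachrow.py /etc/logstash/tmp/log_web_failed.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for n, problem in find_bad_rows(f):
            print(f"line {n}: {problem}")
```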

Logstash. Log router from FileBeat to the RabbitMQ queue

This component routes logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin that writes directly to RabbitMQ, and judging by the issue on their GitHub, such functionality is not planned. There is an output for Kafka, but for various reasons we cannot use Kafka in-house.
  2. There are requirements for collecting logs in the DMZ. Under them, logs must first be put into a queue, and LogStash then reads the entries from that queue from outside the DMZ.

So this slightly complicated scheme is needed only for servers located in the DMZ. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component buffers log entries in the DMZ. Writing goes through the Filebeat → LogStash chain. Reading is done from outside the DMZ by LogStash. When running through RabbitMQ, about 4,000 messages per second are processed.

Message routing is configured by system name, that is, based on the FileBeat configuration data. All messages go to one queue. If the queue service stops for some reason, messages are not lost: FileBeat gets connection errors and temporarily suspends sending, and the LogStash instance reading from the queue also gets network errors and waits for the connection to be restored. Of course, no data is written to the database in the meantime.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
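Because the Logstash output above sets key => "%{[fields][fld_app_name]}", every system needs its own binding. The toy Python model below (not a RabbitMQ client; class and method names are hypothetical) shows direct-exchange semantics: a message is delivered to the queues bound with exactly its routing key, and a message whose key has no binding is not queued. In real RabbitMQ such unroutable messages are discarded unless the publisher sets the mandatory flag or the exchange has an alternate exchange.

```python
from collections import defaultdict
from typing import Dict, List, Set


class DirectExchange:
    """Toy model of a direct exchange such as monitor.direct."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Set[str]] = defaultdict(set)
        self.queues: Dict[str, List[str]] = defaultdict(list)
        self.unroutable: List[str] = []

    def bind(self, queue: str, routing_key: str) -> None:
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key: str, body: str) -> None:
        targets = self.bindings.get(routing_key)  # exact match on the routing key
        if not targets:
            self.unroutable.append(body)  # RabbitMQ would silently drop this by default
            return
        for queue in targets:
            self.queues[queue].append(body)


ex = DirectExchange()
for key in ("site1.domain.ru", "site2.domain.ru"):  # the bindings declared above
    ex.bind("web_log", key)
ex.publish("site1.domain.ru", "line from site1")
ex.publish("site3.domain.ru", "line from site3")  # no binding declared for site3
print(ex.queues["web_log"], ex.unroutable)
```

In practice this means that adding a new system to FileBeat without declaring a binding for its fld_app_name would quietly lose its logs in this setup.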

Grafana. Dashboards

This component is used to visualize monitoring data. It requires installing the ClickHouse datasource plugin for Grafana 4.6+. We had to tweak the plugin a little to make SQL filters on the dashboard more efficient.

For example, we use template variables, and when a variable is not set in the filter field, we do not want the plugin to generate a WHERE condition of the form ( uriStem = '' AND uriStem != '' ), because ClickHouse would then still read the uriStem column. After trying various options, we ended up modifying the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without referring to the column at all.

Now you can use the following query for a graph:

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module') and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name') and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is translated into the following SQL (note that the empty uriStem condition has been replaced by just 1):

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC
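The plugin change itself is not shown here. As an illustration of the behaviour described above, here is a Python sketch of the substitution logic. The names value_if_empty, FILTERS and build_where are hypothetical, the real macro lives in the Grafana datasource plugin's JavaScript code and may work differently, and the quote escaping is an addition of this sketch.

```python
from typing import Dict, List, Tuple


def value_if_empty(variables: Dict[str, str], name: str, fallback: str, condition: str) -> str:
    """Return `fallback` when the dashboard variable is empty, otherwise `condition`
    with $name replaced by the (quote-escaped) variable value."""
    value = variables.get(name, "")
    if value == "":
        return fallback
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return condition.replace("$" + name, escaped)


# Same filters as in the dashboard query above
FILTERS: List[Tuple[str, str]] = [
    ("fld_app_name", "fld_app_name = '$fld_app_name'"),
    ("fld_app_module", "fld_app_module = '$fld_app_module'"),
    ("fld_server_name", "fld_server_name = '$fld_server_name'"),
    ("uriStem", "uriStem like '%$uriStem%'"),
    ("clientRealIP", "clientRealIP = '$clientRealIP'"),
]


def build_where(variables: Dict[str, str]) -> str:
    parts = [value_if_empty(variables, name, "1", cond) for name, cond in FILTERS]
    # Wrap real conditions in parentheses; an empty filter collapses to a bare 1
    return " AND ".join(p if p == "1" else f"({p})" for p in parts)


print(build_where({"fld_app_name": "site1.domain.ru", "fld_app_module": "web"}))
```

With only the system and module set, the output has the same shape as the tail of the WHERE clause above: two column conditions followed by AND 1 AND 1 AND 1, so the unused columns are never read.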

Conclusion

The arrival of ClickHouse has been a landmark event in the market. It was hard to imagine that, completely free of charge, we could overnight be armed with a powerful and practical tool for working with big data. Of course, as requirements grow (for example, sharding and replication across multiple servers), the scheme will become more complex. But first impressions of working with this database are very pleasant. You can tell the product was made "for people."

Compared with ElasticSearch, we estimate the cost of storing and processing logs to be five to ten times lower. In other words, where the current data volume would have required us to set up a cluster of several machines, with ClickHouse a single low-powered machine is enough. Yes, ElasticSearch also has on-disk compression and other features that can significantly reduce resource consumption, but compared with ClickHouse this will be more expensive.

With no special optimizations on our part and with default settings, loading data and querying it work remarkably fast. We do not have much data yet (about 200 million records), and the server itself is weak. In the future we may use this tool for purposes other than storing logs, for example end-to-end analytics, security, and machine learning.

Finally, a few words about the pros and cons.

Cons

  1. Records are loaded in large batches. On the one hand, this is by design, but it still means using additional components to buffer records. This is not always easy, but it is solvable. Still, I would like the scheme to be simpler.
  2. Some exotic functionality or new features often break in new versions. This is worrying and reduces the desire to upgrade. For example, the Kafka table engine is a very useful feature that lets you read events from Kafka directly, without writing consumers. But judging by the number of Issues on GitHub, we are still wary of using this engine in production. However, if you avoid sudden moves and stick to the core functionality, it works stably.

Pros

  1. Does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.

Source: www.habr.com
