Sending Nginx json logs with Vector to Clickhouse and Elasticsearch

Vector is designed for collecting, transforming, and sending log data, metrics, and events.

→ Github

Because it is written in Rust, it offers high performance and low RAM consumption compared to its analogues. In addition, a lot of attention is paid to correctness-related features, in particular the ability to buffer unsent events on disk, and to file rotation.

Architecturally, Vector is an event router: it accepts messages from one or more sources, optionally applies transforms to those messages, and sends them on to one or more sinks.

Vector can replace filebeat and logstash, acting in both roles (receiving and sending logs); more details about them on the site.

Where in Logstash the chain is built as input → filter → output, in Vector it is sources → transforms → sinks.

Examples can be found in the documentation.
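
For illustration, a minimal chain could look like this (a sketch only, in the 0.9-era TOML syntax used throughout this guide; the component names are arbitrary):

[sources.in]
  type = "stdin"                 # read events from stdin

[transforms.parse]
  inputs = [ "in" ]
  type = "json_parser"           # parse each line as JSON

[sinks.out]
  inputs = [ "parse" ]
  type = "console"               # print events to stdout
  encoding = "json"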

This guide is a reworked version of the guide by Vyacheslav Rakhinsky. The original guide includes geoip processing. When testing geoip from an internal network, Vector threw an error.

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30

If anyone needs geoip processing, refer to the original guide by Vyacheslav Rakhinsky.

We will configure the combination Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → separately into Clickhouse and separately into Elasticsearch. We will set up 4 servers, although you can get by with 3.

The scheme is roughly as follows:

(Diagram: Nginx access logs → Vector client → Vector server → Clickhouse and Elasticsearch)

Disable SELinux on all your servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot

Install an HTTP server emulator and utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server by Maxim Ignatenko

nodejs-stub-server has no rpm package; here is the task of creating one. The rpm is built using Fedora Copr

Add the antonpatsev/nodejs-stub-server repository

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache benchmark, and the screen terminal multiplexer on all servers

yum -y install stub_http_server screen mc httpd-tools

In /var/lib/stub_http_server/stub_http_server.js I adjusted the stub_http_server response time so that more logs would be produced.

var max_sleep = 10;

Start stub_http_server.

systemctl start stub_http_server
systemctl enable stub_http_server

Installing Clickhouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless specified otherwise, support for it in the processor being used becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First of all, connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following command:

sudo yum install -y clickhouse-server clickhouse-client

Allow clickhouse-server to listen on the network interface in /etc/clickhouse-server/config.xml

<listen_host>0.0.0.0</listen_host>

Change the logging level from trace to debug:

<level>debug</level>

Standard compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To activate zstd compression, the advice was not to touch the config but to use DDL.

I could not find on Google how to apply zstd compression via DDL, so I left it as is.

Colleagues who use zstd compression in Clickhouse, please share the instructions.
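
For what it is worth, recent ClickHouse releases do support per-column compression codecs via DDL; a minimal sketch (not tested in this setup; the column and compression level are only examples):

-- set a zstd codec on one column; data is recompressed during background merges
ALTER TABLE vector.logs MODIFY COLUMN request_full String CODEC(ZSTD(1));

-- or declare the codec right in CREATE TABLE:
--   `request_full` String CODEC(ZSTD(1)),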

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to setting up Clickhouse

Enter Clickhouse

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where Clickhouse is installed.

Create the vector database

CREATE DATABASE vector;

Check that the database exists.

show databases;

Create the vector.logs table.

/* This is the table where the logs are stored as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Check that the tables have been created. Start clickhouse-client and run a query.

Switch to the vector database.

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Look at the tables.

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Installing Elasticsearch on server 4 to send the same data to Elasticsearch for comparison with Clickhouse

Add the public rpm key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Create 2 repos:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch and kibana

yum install -y kibana elasticsearch

Since it is a single instance, add the following to /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that Vector can send data to elasticsearch from another server, change network.host.

network.host: 0.0.0.0

To access Kibana, change the server.host parameter in /etc/kibana/kibana.yml

server.host: "0.0.0.0"

Start elasticsearch and enable it at boot

systemctl enable elasticsearch
systemctl start elasticsearch

and kibana

systemctl enable kibana
systemctl start kibana
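
To make sure the node is really up, you can query the health endpoint (a quick check, not part of the original guide); the status should be green or yellow:

curl -s http://localhost:9200/_cluster/health?pretty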

Configuring Elasticsearch for single-node mode: 1 shard, 0 replicas. You most likely have a cluster of many servers, in which case you do not need to do this.

Update the default template for future indices:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 
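
You can verify that the template was stored (a quick check, not part of the original guide):

curl -s http://localhost:9200/_template/default?pretty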

Installing Vector as a replacement for Logstash on server 2

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Set up Vector as a replacement for Logstash by editing /etc/vector/vector.toml

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)

    function split_first(s, delimiter)
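      -- split s by delimiter and return the first piece (used below to strip the ":port" suffix)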
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
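      -- split s by delimiter and return the last piece (used below to take the last upstream in a retry chain)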
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_sent = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" # Clickhouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 - the server where elasticsearch is installed
    host = "http://172.26.10.116:9200" 
    index = "vector-%Y-%m-%d"
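
Before wiring Vector into systemd, you can run it once in the foreground to make sure the config parses (the --config flag points at the file; stop it with Ctrl-C):

vector --config /etc/vector/vector.toml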

You can adjust the transforms.nginx_parse_add_defaults section.

Vyacheslav Rakhinsky uses these configs for a small CDN, so the upstream_* fields can contain several values.

For example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If that is not your situation, this section can be simplified.
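
For example, if each request always has at most one upstream, the split_first/split_last plumbing could hypothetically be dropped and only the "-" placeholder defaults kept:

  hooks.process = """
  function (event, emit)
    -- no multi-upstream splitting; only normalize nginx's "-" placeholders
    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end
    -- ...repeat the same pattern for the other upstream_* fields...
    emit(event)
  end
  """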

Create the systemd unit file /etc/systemd/system/vector.service

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After the tables have been created, you can start Vector

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The logs should contain entries like these

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server): server 1

On the server with nginx you need to disable ipv6, because the logs table in clickhouse uses the IPv4 type for the upstream_addr field (I do not use ipv6 inside the network). If ipv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Readers, perhaps you could add ipv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings

sysctl --system

Install nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package

yum install -y nginx

First we need to configure the log format in Nginx in /etc/nginx/nginx.conf

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New log in json format

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

So as not to break your current configuration, Nginx allows several access_log directives

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New log in json format

Don't forget to add a logrotate rule for the new log (needed if your log file name does not end in .log).
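
A minimal rule could look like this (a sketch; with the stock nginx package the /var/log/nginx/*.log wildcard in /etc/logrotate.d/nginx already covers files ending in .log):

/var/log/nginx/access.json.log {
    daily
    rotate 7
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}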

Remove default.conf from /etc/nginx/conf.d/

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts to /etc/hosts on all servers (172.26.10.106 is the ip of the server where nginx is installed):

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And when everything is ready:

nginx -t 
systemctl restart nginx
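
As a quick sanity check (not part of the original guide), every vhost should now answer through the stub backend:

# print the HTTP status code returned by each virtual host
for h in vhost1 vhost2 vhost3 vhost4; do
  curl -s -o /dev/null -w "%{http_code} $h\n" "http://$h/"
done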

Now install Vector itself

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Create the systemd settings file /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in /etc/vector/vector.toml. The IP address 172.26.10.108 is the IP address of the log server (Vector server).

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the appropriate group so that it can read the log files; for example, nginx on centos creates logs with adm group permissions.

usermod -a -G adm vector

Start the vector service

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The logs should contain an entry like this

INFO vector::topology::builder: Healthcheck: Passed.

Load testing

We test using Apache benchmark (ab).

The httpd-tools package was installed on all servers earlier

We start the Apache benchmark tests inside screen on four different servers: first launch the screen terminal multiplexer, then start the benchmark. You can find out how to work with screen here.

From server 1

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From server 2

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From server 3

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From server 4

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Checking the data in Clickhouse

Enter Clickhouse

clickhouse-client -h 172.26.10.109 -m

Run an SQL query

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────
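
Since each load generator sets its own User-Agent (1server ... 4server), a simple aggregate shows how much traffic arrived from each one (an illustrative query):

SELECT request_user_agent, count() AS requests
FROM vector.logs
GROUP BY request_user_agent
ORDER BY requests DESC;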

Find out the size of the tables in Clickhouse

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's see how much space the logs take up in Clickhouse.

The size of the logs table is 857.19 MB.

The size of the same data in the Elasticsearch index is 4.5 GB.

Given the same data, Clickhouse takes 4500 / 857.19 ≈ 5.25 times less disk space than Elasticsearch.

The vector database uses Clickhouse's default compression.

Telegram chat about Clickhouse
Telegram chat about Elasticsearch
Telegram chat about "Collection and analysis of system messages"

Source: www.habr.com
