Sending Nginx JSON logs using Vector to ClickHouse and Elasticsearch

Vector is designed to collect, transform, and send log data, metrics, and events.

→ GitHub

Written in Rust, it is notable for high performance and low RAM consumption compared to its analogues. In addition, a lot of attention is paid to correctness-related features, in particular the ability to save undelivered events to an on-disk buffer and to rotate files.

Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them to one or more sinks.

Vector is a replacement for Filebeat and Logstash; it can act in both roles (receiving and sending logs). More details are on the project site.

Where in Logstash the chain is built as input → filter → output, in Vector it is sources → transforms → sinks.

Examples can be found in the documentation.
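
As a tiny illustration of that chain (a sketch only, not the config used later in this guide; the component names here are arbitrary), stages in /etc/vector/vector.toml are wired together by listing the previous stage in inputs:

# Minimal source -> transform -> sink pipeline (sketch; names are arbitrary)
[sources.demo_in]
  type    = "file"
  include = [ "/var/log/nginx/access.json.log" ]

[transforms.demo_parse]
  inputs = [ "demo_in" ]
  type   = "json_parser"

[sinks.demo_out]
  inputs   = [ "demo_parse" ]
  type     = "console"
  encoding = "json"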

This guide is a revised version of the instructions by Vyacheslav Rakhinsky. The original instructions include GeoIP processing. When testing GeoIP on an internal network, Vector threw an error.

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field=«geoip.country_name» rate_limit_secs=30

If anyone needs GeoIP processing, refer to the original instructions from Vyacheslav Rakhinsky.

We will set up the combination Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → separately into ClickHouse and separately into Elasticsearch. We will install 4 servers, although you could get by with 3.

The scheme looks like this.

Disable SELinux on all your servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot

Install an HTTP server emulator and utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server by Maxim Ignatenko.

nodejs-stub-server does not have an rpm package, so one has to be built for it. The rpm will be built using Fedora Copr.

Add the antonpatsev/nodejs-stub-server repository

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache benchmark (ab) and the screen terminal multiplexer on all servers

yum -y install stub_http_server screen mc httpd-tools

I tweaked the stub_http_server response time in /var/lib/stub_http_server/stub_http_server.js so that there would be more logs.

var max_sleep = 10;

Let's start stub_http_server.

systemctl start stub_http_server
systemctl enable stub_http_server

Installing ClickHouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor being used becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First you need to connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following commands:

sudo yum install -y clickhouse-server clickhouse-client

Allow clickhouse-server to listen on the network interfaces in /etc/clickhouse-server/config.xml

<listen_host>0.0.0.0</listen_host>

Change the logging level from trace to debug

<level>debug</level>

Default compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To enable Zstd compression, it is advised not to touch the config but to use DDL.

I could not google how to use zstd compression via DDL, so I left it as is.

Colleagues who use zstd compression in ClickHouse, please share the instructions.
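
For what it is worth, ClickHouse does allow per-column codecs to be set directly in the CREATE TABLE DDL, so a hedged sketch of ZSTD via DDL (not benchmarked as part of this setup) might look like this:

-- Sketch only: per-column codecs declared in DDL instead of config.xml
CREATE TABLE vector.logs_zstd
(
    `timestamp`    DateTime CODEC(Delta, ZSTD(1)),
    `request_full` String   CODEC(ZSTD(3))
    -- ...the remaining columns from vector.logs, each with its own CODEC(...)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp;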

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to configuring ClickHouse.

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where ClickHouse is installed.

Let's create the vector database:

CREATE DATABASE vector;

Let's check that the database exists:

show databases;

Create the vector.logs table.

/* This is the table where the logs are stored as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Let's check that the table has been created. Start clickhouse-client and run a query.

Switch to the vector database:

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Let's look at the tables:

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Installing Elasticsearch on the 4th server to send the same data to Elasticsearch for comparison with ClickHouse

Add the public rpm key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Let's create 2 repos:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install Elasticsearch and Kibana

yum install -y kibana elasticsearch

Since Elasticsearch will run as a single instance, add the following to /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that Vector can send data to Elasticsearch from another server, change network.host:

network.host: 0.0.0.0

To be able to connect to Kibana, change the server.host parameter in /etc/kibana/kibana.yml:

server.host: "0.0.0.0"

Start Elasticsearch and enable it at boot

systemctl enable elasticsearch
systemctl start elasticsearch

and the same for Kibana

systemctl enable kibana
systemctl start kibana

Configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster with a large number of servers, in which case you do not need to do this.

For future indexes, update the default template:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 

Installing Vector as a replacement for Logstash on server 2

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Let's configure Vector as a replacement for Logstash by editing /etc/vector/vector.toml

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_sent = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" # ClickHouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 - the server where Elasticsearch is installed
    host = "http://172.26.10.116:9200" 
    index = "vector-%Y-%m-%d"

You can tune the transforms.nginx_parse_add_defaults section.

Since Vyacheslav Rakhinsky used this config for a small CDN, there may be several values in the upstream_* fields.

For example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your case, this section can be simplified.

Let's create the systemd service file /etc/systemd/system/vector.service

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After creating the tables, you can start Vector:

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

There should be entries like this in the log:

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server), the 1st server

On the server with nginx you need to disable IPv6, since the logs table in ClickHouse uses the IPv4 type for the upstream_addr field (I do not use IPv6 inside the network). If IPv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Perhaps, dear readers, you could add IPv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings:

sysctl --system

Let's install nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package:

yum install -y nginx

First, we need to configure the log format in Nginx in /etc/nginx/nginx.conf

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New log in JSON format

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

So as not to break the current configuration, Nginx lets you have several access_log directives

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New log in JSON format

Don't forget to add a logrotate rule for the new log (if the log file name does not end in .log).
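
If your log name does not match the stock /var/log/nginx/*.log pattern, a minimal rule could look like the sketch below (the file name, retention and the USR1 reopen signal are assumptions mirroring the stock nginx rule; adjust to your setup), e.g. in /etc/logrotate.d/nginx-json:

/var/log/nginx/access.json {
    daily
    rotate 7
    missingok
    notifempty
    compress
    delaycompress
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}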

Remove default.conf from /etc/nginx/conf.d/

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts (172.26.10.106 is the IP of the server where nginx is installed) to the /etc/hosts file on all servers:

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And when everything is ready:

nginx -t 
systemctl restart nginx

Now let's install Vector itself:

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Let's create the systemd service file /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the address of the log server (the Vector server).

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the appropriate group so that it can read the log files. For example, nginx on CentOS creates logs with adm group permissions.

usermod -a -G adm vector

Let's start the Vector service:

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

There should be an entry like this in the log:

INFO vector::topology::builder: Healthcheck: Passed.

Stress Testing

We run the tests using Apache benchmark (ab).

The httpd-tools package has already been installed on all servers.

We start testing with Apache benchmark from 4 different servers in screen: first we start the screen terminal multiplexer, then we start the benchmark. How to work with screen is described in the linked article.
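
For example (a sketch; the session name is arbitrary), each loop below can be run inside its own named screen session, detached with Ctrl+A, D and reattached later:

screen -S ab_test    # start a named session and run one of the loops below inside it
# ...press Ctrl+A, D to detach...
screen -r ab_test    # reattach to the session later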

From the 1st server

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From the 2nd server

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From the 3rd server

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From the 4th server

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's check the data in ClickHouse

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

Run an SQL query:

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────

Find out the size of the tables in ClickHouse:

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's see how many logs were collected in ClickHouse.

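For example, the number of collected rows can be checked with a simple query:

SELECT count() FROM vector.logs;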

The size of the logs table is 857.19 MB.

The size of the same data in the Elasticsearch index is 4.5 GB.

Without any special tuning of the data in Vector, ClickHouse takes 4500 / 857.19 = 5.24 times less space than Elasticsearch.

In ClickHouse, column compression (LZ4 by default) is applied to the vector.logs table out of the box.
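
To see how well that default compression is doing on this particular table, compressed and uncompressed bytes can be compared in system.parts with a query along these lines (a sketch, mirroring the size query above):

SELECT
    concat(database, '.', table)                                        AS table,
    formatReadableSize(sum(data_compressed_bytes))                      AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes))                    AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND database = 'vector' AND table = 'logs'
GROUP BY database, table;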

Telegram chat on ClickHouse
Telegram chat on Elasticsearch
Telegram chat on "Collection and analysis of system messages"

Source: www.habr.com
