Sending Nginx JSON logs with Vector to ClickHouse and Elasticsearch


Vector is designed to collect, transform, and route log data, metrics, and events.

→ GitHub

Written in Rust, it is characterized by high performance and low RAM consumption compared to its counterparts. In addition, a lot of attention is paid to correctness-related features, in particular the ability to save undelivered events to an on-disk buffer and to rotate files.

Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them on to one or more sinks.

Vector is a replacement for filebeat and logstash; it can act in both roles (receiving and sending logs). More details are on its site.

Whereas in Logstash the chain is built as input → filter → output, in Vector it is sources → transforms → sinks.
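To make the chain concrete, here is a minimal vector.toml sketch of a sources → transforms → sinks pipeline (component names and options here are illustrative, not the configuration used later in this article):

```toml
# Illustrative pipeline: one source, one transform, one sink.
[sources.my_source]
  type    = "file"
  include = ["/var/log/nginx/access.json.log"]

[transforms.my_transform]
  type   = "json_parser"
  inputs = ["my_source"]

[sinks.my_sink]
  type     = "console"
  inputs   = ["my_transform"]
  encoding = "json"
```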

Examples can be found in the documentation.

This how-to is a revised version of the instructions by Vyacheslav Rakhinsky. The original instructions include geoip processing. When testing geoip from an internal network, Vector returned an error.

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30

If you need geoip processing, refer to the original instructions by Vyacheslav Rakhinsky.

We will configure the combination Nginx (access logs) → Vector (client | filebeat role) → Vector (server | logstash role) → separately into ClickHouse and separately into Elasticsearch. We will set up 4 servers, although you could get by with 3.


The diagram looks like this.
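The original diagram image is not reproduced here; reconstructed from the IPs used later in this article, the layout is:

```
server 1 (172.26.10.106): nginx + vector (client, filebeat role)
server 2 (172.26.10.108): vector (server, logstash role)
server 3 (172.26.10.109): clickhouse
server 4 (172.26.10.116): elasticsearch + kibana
```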

Disable SELinux on all servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot

Install an HTTP server emulator and utilities on all servers

As the HTTP server emulator, we will use nodejs-stub-server by Maxim Ignatenko.

There is no rpm for nodejs-stub-server, so let's build one. The rpm will be built using Fedora Copr.

Add the antonpatsev/nodejs-stub-server repository:

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache benchmark, and the screen terminal multiplexer on all servers:

yum -y install stub_http_server screen mc httpd-tools

I adjusted the stub_http_server response time in /var/lib/stub_http_server/stub_http_server.js so that there would be more logs:

var max_sleep = 10;

Start stub_http_server:

systemctl start stub_http_server
systemctl enable stub_http_server

Installing ClickHouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless specified otherwise, support for it in the processor being used is an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First, connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following command:

sudo yum install -y clickhouse-server clickhouse-client

Allow clickhouse-server to listen on the network interface in /etc/clickhouse-server/config.xml:

<listen_host>0.0.0.0</listen_host>

Change the logging level from trace to debug:

<level>debug</level>

Default compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To enable zstd compression, the advice was not to touch the config but rather to use DDL.


I couldn't find via Google how to apply zstd compression through DDL, so I left it as is.

Colleagues who use zstd compression in ClickHouse, please share the instructions.
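For what it's worth, ClickHouse does support per-column compression codecs in DDL; a sketch (the table and column names are illustrative, verify the syntax against your ClickHouse version):

```sql
-- Per-column codec in the column definition, e.g. ZSTD level 1
-- for a large String column:
CREATE TABLE example
(
    `message` String CODEC(ZSTD(1))
)
ENGINE = MergeTree()
ORDER BY tuple();
```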

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to configuring ClickHouse.

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where ClickHouse is installed.

Let's create a vector database:

CREATE DATABASE vector;

Check that the database exists:

show databases;

Create the vector.logs table:

/* This is the table where logs are stored as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Check that the table has been created. Launch clickhouse-client and run a query.

Switch to the vector database:

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Look at the tables:

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Installing Elasticsearch on server 4 to send the same data to Elasticsearch for comparison with ClickHouse

Add the public rpm key:

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Create two repo files:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch and kibana:

yum install -y kibana elasticsearch

Since it will run as a single instance, add the following to /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that Vector can send data to elasticsearch from another server, change network.host:

network.host: 0.0.0.0

To connect to kibana, change the server.host parameter in /etc/kibana/kibana.yml:

server.host: "0.0.0.0"

Start elasticsearch and enable it at boot:

systemctl enable elasticsearch
systemctl start elasticsearch

and kibana:

systemctl enable kibana
systemctl start kibana

Configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster with a large number of servers, in which case you don't need to do this.

Update the default template for future indices:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 

Setting up Vector as a Logstash replacement on server 2

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Let's configure Vector as a replacement for Logstash by editing /etc/vector/vector.toml:

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_send = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" #  ClickHouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 - the server where elasticsearch is installed
    host = "http://172.26.10.116:9200"
    index = "vector-%Y-%m-%d"

You can adjust the transforms.nginx_parse_add_defaults section.

Vyacheslav Rakhinsky uses these configs for a small CDN, so the upstream_* fields can contain multiple values.

For example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your situation, this section can be simplified.
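For intuition, here is a small Python sketch (illustrative only; the actual processing happens in the Lua hook above) of what the transforms do with such multi-upstream values:

```python
# Python sketch of the Lua hook's behavior (not part of the Vector config):
# nginx joins per-upstream values with ", ", and we keep the value from the
# upstream that actually produced the response (the last one).
def split_first(s, delimiter=", "):
    return s.split(delimiter)[0]

def split_last(s, delimiter=", "):
    return s.split(delimiter)[-1]

# The last upstream answered; keep only the host part of its address:
addr = split_first(split_last("128.66.0.10:443, 128.66.0.12:443", ", "), ":")
print(addr)  # 128.66.0.12

# "-" placeholders become "0" so the coercer can cast them to numbers:
received = split_last("-, -, 123", ", ")
if received in ("-", ""):
    received = "0"
print(received)  # 123
```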

Create service settings for systemd in /etc/systemd/system/vector.service:

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After creating the tables, you can start Vector:

systemctl enable vector
systemctl start vector

Vector logs can be viewed like this:

journalctl -f -u vector

The logs should contain entries like this:

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server) - server 1

On the server with nginx, you need to disable ipv6, since the logs table in ClickHouse uses the IPv4 type for the upstream_addr field (I don't use ipv6 inside the network). If ipv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Perhaps readers will add ipv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings:

sysctl --system

Let's install nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo:

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package:

yum install -y nginx

First, we need to set up the log format in Nginx in /etc/nginx/nginx.conf:

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New log in JSON format

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

In order not to break your current configuration, Nginx lets you have several access_log directives:

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New log in JSON format

Don't forget to add a logrotate rule for the new logs (if the log file name doesn't end with .log).
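For example, a minimal logrotate rule might look like this (a sketch; note that the stock nginx package's logrotate config typically already covers /var/log/nginx/*.log, which includes access.json.log):

```
/var/log/nginx/access.json.log {
    daily
    rotate 7
    compress
    missingok
    postrotate
        # Ask nginx to reopen log files after rotation
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}
```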

Remove default.conf from /etc/nginx/conf.d/:

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf:

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf:

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf:

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf:

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts (172.26.10.106 is the IP of the server where nginx is installed) to /etc/hosts on all servers:

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And once everything is ready:

nginx -t 
systemctl restart nginx

Now let's install Vector itself:

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Create a systemd settings file /etc/systemd/system/vector.service:

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in /etc/vector/vector.toml. 172.26.10.108 is the IP address of the log server (the Vector server).

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the appropriate group so that it can read the log files. For example, nginx on CentOS creates logs with adm group permissions.

usermod -a -G adm vector

Start the vector service:

systemctl enable vector
systemctl start vector

Vector logs can be viewed like this:

journalctl -f -u vector

The logs should contain an entry like this:

INFO vector::topology::builder: Healthcheck: Passed.

Stress Testing

Testing is carried out with Apache benchmark.

The httpd-tools package was installed on all servers.

We start testing with Apache benchmark from 4 different servers in screen. First we launch the screen terminal multiplexer, and then we start testing with Apache benchmark. How to work with screen is described in a separate article.

From server 1:

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From server 2:

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From server 3:

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From server 4:

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's check the data in ClickHouse

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

Run an SQL query:

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────┘

Find the size of the tables in ClickHouse:

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's find out how much space the logs take up in ClickHouse.


The size of the logs table is 857.19 MB.


The size of the same data in the Elasticsearch index is 4.5 GB.

Without specifying any data compression parameters in Vector, ClickHouse takes 4500 / 857.19 ≈ 5.25 times less space than Elasticsearch.

In Vector, the compression field is used by default.

Telegram chat on ClickHouse
Telegram chat on Elasticsearch
Telegram chat on "Collection and analysis of system messages"

Source: www.habr.com
