Sending Nginx JSON logs with Vector to ClickHouse and Elasticsearch

Vector is designed to collect, transform, and route log data, metrics, and events.

→ GitHub

Written in Rust, it is notable for high performance and low RAM consumption compared to its counterparts. In addition, a lot of attention is paid to correctness-related features, in particular the ability to buffer unsent events to disk and to rotate files.

Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them on to one or more sinks.

Vector is a replacement for filebeat and logstash: it can play both roles (receiving and shipping logs); more details are on its website.

Whereas in Logstash the chain is built as input → filter → output, in Vector it is sources → transforms → sinks.

Examples can be found in the documentation.

This guide is a reworked version of the instructions by Vyacheslav Rakhinsky. The original instructions include geoip processing. When testing geoip against an internal network, Vector produced an error:

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30

If you need geoip processing, refer to the original instructions by Vyacheslav Rakhinsky.

We will configure the combination Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → separately into ClickHouse and separately into Elasticsearch. We will set up 4 servers, although you could get by with 3.

The scheme looks roughly like this:

[Diagram: Nginx (access log) → Vector (client) → Vector (server) → ClickHouse and Elasticsearch]

Disable SELinux on all your servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot
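
If you would rather not reboot right away, you can also switch SELinux to permissive for the current session (the config change above still takes effect on the next boot); setenforce and getenforce are part of the standard SELinux tooling:

# switch to permissive now; getenforce should then report "Permissive"
setenforce 0
getenforce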

Install an HTTP server emulator + utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server by Maxim Ignatenko.

nodejs-stub-server has no rpm, so we will build one; the rpm is built with Fedora Copr.

Add the antonpatsev/nodejs-stub-server repository:

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache Benchmark, and the screen terminal multiplexer on all servers:

yum -y install stub_http_server screen mc httpd-tools

I adjusted the response time in /var/lib/stub_http_server/stub_http_server.js so that there would be more logs:

var max_sleep = 10;

Let's start stub_http_server:

systemctl start stub_http_server
systemctl enable stub_http_server
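
A quick smoke test that the stub is answering (8080 is the port the vhost configs below proxy to; adjust it if your build listens elsewhere):

# expect a valid HTTP status code such as 200 from the stub
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/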

Installing ClickHouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor being used becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First you need to connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following command:

sudo yum install -y clickhouse-server clickhouse-client

Allow clickhouse-server to listen on the network interface in /etc/clickhouse-server/config.xml:

<listen_host>0.0.0.0</listen_host>

Change the logging level from trace to debug:

<level>debug</level>

Default compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To enable Zstd compression, I was advised not to touch the config but to use DDL instead.

I could not find on Google how to enable zstd compression via DDL, so I left it as is.

Colleagues who use zstd compression in ClickHouse, please share the instructions.
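
For what it is worth, ClickHouse does accept per-column codecs in DDL, so one option (a sketch I have not benchmarked here, with a purely illustrative choice of columns) is to switch the heaviest String columns to ZSTD once the vector.logs table from the next section exists. Existing parts are recompressed as they are merged:

# illustrative: set a ZSTD codec on two large String columns via DDL
clickhouse-client -h 172.26.10.109 --query "
  ALTER TABLE vector.logs
    MODIFY COLUMN request_full String CODEC(ZSTD(1)),
    MODIFY COLUMN request_user_agent String CODEC(ZSTD(1))"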

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to configuring ClickHouse.

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where ClickHouse is installed.

Let's create the vector database:

CREATE DATABASE vector;

Make sure the database exists:

show databases;

Create the vector.logs table.

/* This table stores the logs as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Check that the table has been created. Start clickhouse-client and run a query.

Switch to the vector database:

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Let's look at the tables:

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Installing Elasticsearch on the 4th server in order to send the same data to Elasticsearch and compare it with ClickHouse

Add the public rpm key:

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Let's create 2 repo files:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch and kibana:

yum install -y kibana elasticsearch

Since it will run as a single instance, add the following to /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that Vector can send data to elasticsearch from another server, change network.host:

network.host: 0.0.0.0

To be able to reach kibana, change the server.host parameter in /etc/kibana/kibana.yml:

server.host: "0.0.0.0"

Start elasticsearch and add it to autostart:

systemctl enable elasticsearch
systemctl start elasticsearch

And the same for kibana:

systemctl enable kibana
systemctl start kibana
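
Once both services are up, a quick health check (the _cluster/health endpoint is standard Elasticsearch, and 5601 is Kibana's default port; give them a minute to start):

# Elasticsearch cluster health, plus a HEAD request against Kibana
curl -s http://localhost:9200/_cluster/health?pretty
curl -sI http://localhost:5601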

We configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you have a cluster of many servers, in which case you do not need to do this.

For future indices, update the default template:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 

Setting up Vector as a replacement for Logstash on server 2

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Let's configure Vector as a replacement for Logstash. Edit the file /etc/vector/vector.toml:

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)
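    -- nginx can log several comma-separated values in the upstream_* variables
    -- when a request was retried on more than one upstream; split_first keeps
    -- the first element and split_last keeps the last one.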

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_sent = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" # ClickHouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 is the server where elasticsearch is installed
    host = "http://172.26.10.116:9200" 
    index = "vector-%Y-%m-%d"

You can tailor the transforms.nginx_parse_add_defaults section to your needs.

Vyacheslav Rakhinsky used this configuration for a small CDN, where there can be several values in upstream_*.

For example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your case, this section can be simplified.

Let's create the service settings for systemd in /etc/systemd/system/vector.service:

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After creating the tables, you can start Vector:

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The logs should contain entries like these:

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server) - the 1st server

On the server with nginx you need to disable ipv6, because the logs table in ClickHouse uses the IPv4 type for the upstream_addr field, since I do not use IPv6 inside the network. If ipv6 is not turned off, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Readers, perhaps you can suggest how to add IPv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings:

sysctl --system

Let's install nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo:

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package:

yum install -y nginx

First of all, we need to configure the log format in Nginx in the file /etc/nginx/nginx.conf:

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New json-format log

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

In order not to break your current configuration, Nginx lets you have several access_log directives:

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New json-format log

Don't forget to add a logrotate rule for the new log (if the log file name does not end in .log).
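
The nginx package's stock logrotate rule usually covers /var/log/nginx/*.log already; if you pick a name outside that pattern, a separate entry could look like this (a sketch, with an illustrative file name and retention):

# write an illustrative logrotate rule (adjust path and retention to taste)
cat > /etc/logrotate.d/nginx-json <<'EOF'
/var/log/nginx/access.json {
    daily
    rotate 7
    missingok
    notifempty
    compress
    delaycompress
    sharedscripts
    postrotate
        # ask nginx to reopen its log files
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}
EOF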

Remove default.conf from /etc/nginx/conf.d/:

rm -f /etc/nginx/conf.d/default.conf

Add a virtual host /etc/nginx/conf.d/vhost1.conf:

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add a virtual host /etc/nginx/conf.d/vhost2.conf:

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add a virtual host /etc/nginx/conf.d/vhost3.conf:

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add a virtual host /etc/nginx/conf.d/vhost4.conf:

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts (172.26.10.106 is the IP of the server where nginx is installed) to the /etc/hosts file on all servers:

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And if everything is ready, then:

nginx -t 
systemctl restart nginx

Now let's install Vector itself:

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Let's create the settings file for systemd /etc/systemd/system/vector.service:

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the address of the log server (Vector-Server):

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the appropriate group so that it can read the log files. For example, nginx on centos creates logs with adm group permissions.

usermod -a -G adm vector

Let's start the vector service:

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The log should contain an entry like this:

INFO vector::topology::builder: Healthcheck: Passed.

Stress testing

We run the test using Apache Benchmark.

The httpd-tools package was installed on all servers.

We start testing with Apache Benchmark from 4 different servers inside screen: first start the screen terminal multiplexer, then start the test. How to work with screen you can find in the article.

From the 1st server

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From the 2nd server

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From the 3rd server

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From the 4th server

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's check the data in ClickHouse.

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

Run an SQL query:

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────
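
With rows flowing in, you can already run ad-hoc analytics; for example, an illustrative breakdown of requests by vhost and status, using the columns of the vector.logs schema above:

# top vhost/status combinations by request count (illustrative query)
clickhouse-client -h 172.26.10.109 --query "
  SELECT request_http_host, response_status, count() AS hits
  FROM vector.logs
  GROUP BY request_http_host, response_status
  ORDER BY hits DESC
  LIMIT 10"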

Find out the size of the tables in ClickHouse:

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's find out how much space the logs take up in ClickHouse.

The size of the logs table is 857.19 MB.

The size of the same data in the index in Elasticsearch is 4.5 GB.
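
The Elasticsearch side can be checked, for example, with the _cat/indices API (the host and index pattern below match this guide's setup):

# document count and on-disk size of the vector-* indices
curl -s 'http://172.26.10.116:9200/_cat/indices/vector-*?v&h=index,docs.count,store.size'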

Considering that compression was not explicitly specified in the Vector parameters, ClickHouse takes 4500 / 857.19 = 5.24 times less space than Elasticsearch.

In Vector, compression is used by default.

Telegram chat on ClickHouse
Telegram chat on Elasticsearch
Telegram chat on "Collection and analysis of system messages"

Source: www.habr.com
