Sending Nginx JSON logs with Vector to ClickHouse and Elasticsearch

Vector is designed to collect, transform, and route log data, metrics, and events.

→ GitHub

Because it is written in Rust, it offers high performance and low RAM usage compared to its counterparts. In addition, a lot of attention is paid to correctness-related features, in particular the ability to save undelivered events to an on-disk buffer and to handle file rotation.

Architecturally, Vector is an event router: it accepts messages from one or more sources, optionally applies transforms to those messages, and sends them on to one or more sinks.

Vector is a replacement for Filebeat and Logstash; it can act in both roles (receiving and shipping logs). More details are available online.

Where Logstash builds its chain as input → filter → output, Vector's chain is sources → transforms → sinks.

Examples can be found in the documentation.

This guide is a revised version of the instructions by Vyacheslav Rakhinsky. The original instructions include geoip processing. When geoip was tested from an internal network, Vector produced an error:

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30

If you need geoip processing, refer to Vyacheslav Rakhinsky's original instructions.

We will configure the chain Nginx (access logs) → Vector (client | Filebeat role) → Vector (server | Logstash role) → separately into ClickHouse and separately into Elasticsearch. We will set up 4 servers, although you could get by with 3.

The scheme looks roughly like this:

[Diagram: Nginx access logs flowing through a client-side Vector and a server-side Vector into ClickHouse and Elasticsearch]

Disable SELinux on all your servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot

Install an HTTP server emulator plus utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server by Maksim Ignatenko.

Nodejs-stub-server does not ship an rpm, so here we create an rpm for it. The rpm will be built using Fedora Copr.

Add the antonpatsev/nodejs-stub-server repository

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache Benchmark, and the screen terminal multiplexer on all servers

yum -y install stub_http_server screen mc httpd-tools

I adjusted the stub_http_server response time in /var/lib/stub_http_server/stub_http_server.js so that there would be more logs.

var max_sleep = 10;

Let's start stub_http_server.

systemctl start stub_http_server
systemctl enable stub_http_server

Installing ClickHouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor being used becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First, connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following command:

sudo yum install -y clickhouse-server clickhouse-client

Allow clickhouse-server to listen on the network interface, in /etc/clickhouse-server/config.xml:

<listen_host>0.0.0.0</listen_host>

Change the log level from trace to debug:

<level>debug</level>

Default compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To enable zstd compression, the advice was not to touch the config but to use DDL instead.

I could not find via Google how to enable zstd compression through DDL, so I left it as is.

Colleagues who use zstd compression in ClickHouse: please share the instructions.

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to configuring ClickHouse.

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where ClickHouse is installed.

Let's create a database named vector:

CREATE DATABASE vector;

Check that the database exists:

show databases;

Create the vector.logs table.

/* This table stores the logs as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Check that the tables were created. Launch clickhouse-client and run a query.

Switch to the vector database:

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Look at the tables:

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Install Elasticsearch on the 4th server to send the same data there, for comparison with ClickHouse

Add the public rpm key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Create 2 repos:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Installeer elasticsearch en kibana

yum install -y kibana elasticsearch

Since this will be a single instance, add the following to /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that Vector can send data to Elasticsearch from another server, change network.host:

network.host: 0.0.0.0

To connect to Kibana, change the server.host parameter in /etc/kibana/kibana.yml:

server.host: "0.0.0.0"

Start Elasticsearch and enable it at boot

systemctl enable elasticsearch
systemctl start elasticsearch

and Kibana

systemctl enable kibana
systemctl start kibana

Configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster of many servers, in which case you don't need to do this.

Update the default template for future indices:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["*"],
  "order": -1,
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0"
  }
}'

Installing Vector as a Logstash replacement on server 2

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Configure Vector as a replacement for Logstash. Edit the file /etc/vector/vector.toml:

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_sent = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" # ClickHouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 is the server where Elasticsearch is installed
    host = "http://172.26.10.116:9200"
    index = "vector-%Y-%m-%d"

You can tweak the transforms.nginx_parse_add_defaults section.

Since Vyacheslav Rakhinsky uses these configs for a small CDN, there can be multiple values in the upstream_* fields.

For example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your situation, this section can be simplified.
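The Lua defaults in the config collapse each multi-valued upstream_* field to its last element (and strip the port from upstream_addr). The same logic can be sketched in the shell, purely to illustrate what the transform does (a throwaway demo, not part of the pipeline):

```shell
# take the last comma-separated value, as the Lua split_last() does
last_value() { printf '%s\n' "$1" | awk -F', ' '{ print $NF }'; }

# take everything before the first colon, as split_first(..., ':') does
strip_port() { printf '%s\n' "$1" | awk -F':' '{ print $1 }'; }

last_value "-, -, 123"        # -> 123
last_value "502, 502, 200"    # -> 200
strip_port "$(last_value "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443")"  # -> 128.66.0.12
```

So for the CDN example above, only the final upstream attempt (128.66.0.12, status 200, 123 bytes) ends up in the table.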

Create the systemd service settings in /etc/systemd/system/vector.service

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After creating the tables, you can run Vector:

systemctl enable vector
systemctl start vector

Vector logs can be viewed as follows:

journalctl -f -u vector

The logs should contain entries like these:

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server), the 1st server

On the server with Nginx you need to disable IPv6, since the logs table in ClickHouse uses the field upstream_addr IPv4 (I don't use IPv6 internally). If IPv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Perhaps a reader will add IPv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings

sysctl --system

Now let's install Nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package

yum install -y nginx

First, we need to set up the log format for Nginx in /etc/nginx/nginx.conf

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New JSON-format log

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}
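To sanity-check that the log_format above emits valid JSON (escape=json handles the quoting), you can feed a line through a JSON parser. The sample line below is hypothetical, just a truncated illustration of the format:

```shell
# hypothetical, truncated sample of one line in the shape the "vector" log_format produces
line='{"node_name":"nginx-vector","timestamp":"2020-08-07T04:32:42+00:00","request_full":"GET / HTTP/1.0","response_status":"404"}'

# validate it with Python's stdlib JSON parser
printf '%s\n' "$line" | python3 -m json.tool > /dev/null && echo "valid JSON"
```

Note that every field, including numbers like $status, is emitted as a JSON string; the coercer transform on the Vector server converts them to the right types.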

So as not to break your current configuration, Nginx lets you have several access_log directives:

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New JSON-format log

Don't forget to add a logrotate rule for the new log (unless the log file name ends with .log).
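If your JSON access log did not end in .log (and so was not caught by the packaged /var/log/nginx/*.log rule), a minimal logrotate entry might look like the sketch below; the path, schedule, and retention are assumptions to adapt:

```
/var/log/nginx/access.json {
    daily
    rotate 7
    compress
    missingok
    postrotate
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
    endscript
}
```

The USR1 signal makes Nginx reopen its log files after rotation.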

Remove default.conf from /etc/nginx/conf.d/

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts (172.26.10.106 is the IP of the server where Nginx is installed) to /etc/hosts on all servers:

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And when everything is ready:

nginx -t 
systemctl restart nginx

Now let's install Vector itself:

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Create the settings file for systemd, /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in /etc/vector/vector.toml. The IP address 172.26.10.108 is the IP of the log server (the Vector server).

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the required group so it can read the log files. For example, Nginx on CentOS creates logs with adm group permissions.

usermod -a -G adm vector

Start the vector service

systemctl enable vector
systemctl start vector

Vector logs can be viewed as follows:

journalctl -f -u vector

There should be an entry like this in the logs:

INFO vector::topology::builder: Healthcheck: Passed.

Stress testing

Testing is done with Apache Benchmark.

The httpd-tools package was installed on all servers.

We run the test with Apache Benchmark from 4 different servers inside screen: first launch the screen terminal multiplexer, then start the Apache Benchmark run. How to work with screen is covered in the article.

From the 1st server

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From the 2nd server

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From the 3rd server

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From the 4th server

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's look at the data in ClickHouse.

Connect to ClickHouse:

clickhouse-client -h 172.26.10.109 -m

Run an SQL query:

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────
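With data flowing in, you can aggregate over any of the columns. For example, a sketch of a query counting requests per load generator (the User-Agent values 1server…4server were set in the ab commands above):

```sql
SELECT
    request_user_agent,
    count() AS hits,
    avg(request_time) AS avg_request_time
FROM vector.logs
GROUP BY request_user_agent
ORDER BY hits DESC;
```

This is just an illustration; any of the coerced numeric columns (request_time, upstream_response_time, response_body_bytes_sent, …) can be aggregated the same way.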

Find out the size of the tables in ClickHouse:

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's find out how much log data ended up in ClickHouse.

[Screenshot: query result showing the size of the vector.logs table]

The size of the logs table is 857.19 MB.

[Screenshot: the size of the corresponding index in Elasticsearch]

The size of the same data in the Elasticsearch index is 4.5 GB.

Without specifying any extra parameters in Vector, ClickHouse takes 4500 / 857.19 ≈ 5.25 times less space than Elasticsearch.

In Vector, the compression field is used by default.
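A quick check of that ratio, rounded to two decimals (a throwaway snippet):

```shell
# 4.5 GB in Elasticsearch is roughly 4500 MB, vs 857.19 MB in ClickHouse
awk 'BEGIN { printf "%.2f\n", 4500 / 857.19 }'   # -> 5.25
```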

Telegram chat on ClickHouse
Telegram chat on Elasticsearch
Telegram chat on "Collection and analysis of system messages"

Source: will.com
