Sending Nginx json logs to Clickhouse and Elasticsearch using Vector

Vector is designed to collect, transform, and route log data, metrics, and events.

→ GitHub

Since it is written in Rust, it offers high performance and low RAM consumption compared to its counterparts. In addition, a lot of attention is paid to correctness-related features, in particular the ability to buffer unsent events on disk and to handle file rotation.

Architecturally, Vector is an event router: it receives messages from one or more sources, optionally applies transforms to those messages, and sends them to one or more sinks.

Vector is a replacement for filebeat and logstash; it can act in both roles (receiving and shipping logs). More details about them are available online.

Where Logstash builds its chain as input → filter → output, in Vector it is source → transform → sink.

Examples can be found in the documentation.
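
As a toy illustration of that source → transform → sink chain, a minimal Vector 0.9-style config might look like this (component names are arbitrary; this sketch is not part of the setup below):

```toml
# source: read raw lines from stdin
[sources.demo_in]
  type = "stdin"

# transform: parse each line as JSON
[transforms.demo_json]
  inputs = ["demo_in"]
  type   = "json_parser"

# sink: print the resulting events to the console
[sinks.demo_out]
  inputs   = ["demo_json"]
  type     = "console"
  encoding = "json"
```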

This guide is a revised version of instructions by Vyacheslav Rakhinsky. The original instructions include geoip processing. When testing geoip from an internal network, Vector raised an error.

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30

If anyone needs geoip processing, refer to the original instructions by Vyacheslav Rakhinsky.

We will configure the combination Nginx (access logs) → Vector (Client | Filebeat) → Vector (Server | Logstash) → then separately into Clickhouse and separately into Elasticsearch. We will install 4 servers, although you can get by with 3.

The scheme looks something like this.

Disable Selinux on all your servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot

Install an HTTP server emulator plus utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server by Maxim Ignatenko

Nodejs-stub-server has no rpm, so an rpm was created for it here. It is built with Fedora Copr

Add the antonpatsev/nodejs-stub-server repository

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache benchmark, and the screen terminal multiplexer on all servers

yum -y install stub_http_server screen mc httpd-tools

I adjusted the stub_http_server response time in /var/lib/stub_http_server/stub_http_server.js so that there would be more logs.

var max_sleep = 10;

Let's start stub_http_server.

systemctl start stub_http_server
systemctl enable stub_http_server

Installing Clickhouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless otherwise specified, support for it in the processor becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First you need to connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following commands:

sudo yum install -y clickhouse-server clickhouse-client

Allow clickhouse-server to listen on the network interface in /etc/clickhouse-server/config.xml

<listen_host>0.0.0.0</listen_host>

Changing the logging level from trace to debug

debug

Standard compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To enable zstd compression, the advice was not to touch the config but to use DDL instead.

I could not find in Google how to enable zstd compression via DDL, so I left it as is.

Colleagues who use zstd compression in Clickhouse, please share the instructions.
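
For what it's worth, newer ClickHouse releases do expose codecs through DDL: each column can carry a CODEC clause. A sketch of the idea (untested with the version used in this article; the table and column choices are just examples):

```sql
CREATE TABLE vector.logs_zstd
(
    `timestamp`    DateTime CODEC(Delta, ZSTD(1)),
    `request_full` String   CODEC(ZSTD(1))
)
ENGINE = MergeTree()
ORDER BY timestamp;

-- or switch an existing column:
-- ALTER TABLE vector.logs MODIFY COLUMN request_full String CODEC(ZSTD(1));
```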

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to configuring Clickhouse

Go to Clickhouse

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where Clickhouse is installed.

Let's create the vector database

CREATE DATABASE vector;

Let's check that the database exists.

show databases;

Create the vector.logs table.

/* This table stores the logs as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

We check that the tables have been created. Start clickhouse-client and run a query.

Switch to the vector database.

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Let's look at the tables.

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Installing Elasticsearch on server 4 to send the same data to Elasticsearch for comparison with Clickhouse

Import the public rpm key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Let's create 2 repos:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch and kibana

yum install -y kibana elasticsearch

Since it will run as a single copy, you need to add the following to /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that vector can send data to elasticsearch from another server, let's change network.host.

network.host: 0.0.0.0

To connect to Kibana, change the server.host parameter in /etc/kibana/kibana.yml

server.host: "0.0.0.0"

Start elasticsearch and add it to autostart

systemctl enable elasticsearch
systemctl start elasticsearch

and kibana

systemctl enable kibana
systemctl start kibana

Configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster of many servers and will not need to do this.

For future indices, update the default template:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 

Installing Vector as a replacement for Logstash on server 2

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Let's configure Vector as a replacement for Logstash. Edit the /etc/vector/vector.toml file

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_sent = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" #  Адрес Clickhouse
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 - the server where elasticsearch is installed
    host = "http://172.26.10.116:9200" 
    index = "vector-%Y-%m-%d"

You can tweak the transforms.nginx_parse_add_defaults section.

Since Vyacheslav Rakhinsky uses these configs for a small CDN, the upstream_* fields can contain several values

For example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your situation, then this section can be simplified
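
For intuition, the Lua split_last/split_first logic above does roughly what these shell one-liners do on such multi-upstream values (purely illustrative, not part of the pipeline):

```shell
# Last element of a comma-separated list (what split_last returns):
echo "502, 502, 200" | awk -F', ' '{print $NF}'
# -> 200

# Last upstream address with its port stripped, mirroring
# split_first(split_last(upstream_addr, ', '), ':'):
echo "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443" | awk -F', ' '{print $NF}' | cut -d: -f1
# -> 128.66.0.12
```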

Let's create the systemd service settings in /etc/systemd/system/vector.service

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

Once the tables are created, you can run Vector

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The logs should contain entries like these

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server) - server 1

On the server with nginx, you need to disable ipv6, since the logs table in clickhouse uses the upstream_addr IPv4 field, and I do not use ipv6 inside the network. If ipv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Perhaps readers can add ipv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings

sysctl --system

Let's install nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package

yum install -y nginx

First of all, we need to configure the Nginx log format in /etc/nginx/nginx.conf

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New json-format log

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}
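
Because of escape=json, each line in access.json.log is one valid JSON object. A quick sanity check on a sample line (the line below is illustrative, shortened to a few fields):

```shell
# One shortened example line in the "vector" log format:
line='{"node_name":"nginx-vector","request_method":"GET","response_status":"404"}'

# Parse it and pull out one field to prove it is well-formed JSON:
echo "$line" | python3 -c 'import json, sys; print(json.load(sys.stdin)["response_status"])'
# -> 404
```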

In order not to break your current configuration, Nginx allows you to have several access_log directives

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New json-format log

Don't forget to add a logrotate rule for the new logs (if the log file name does not end in .log)
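
A minimal logrotate rule for the new file could look like the following (a sketch; the paths, schedule, and nginx pid file location are assumptions to adapt):

```
/var/log/nginx/access.json.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    postrotate
        # ask nginx to reopen its log files
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}
```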

Remove default.conf from /etc/nginx/conf.d/

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts (172.26.10.106 is the ip of the server where nginx is installed) to the /etc/hosts file on all servers:

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And if everything is ready, then

nginx -t 
systemctl restart nginx

Now let's install Vector itself

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Let's create a systemd settings file /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the IP address of the log server (Vector-Server)

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the appropriate group so that it can read the log files. For example, nginx on centos creates logs with adm group permissions.

usermod -a -G adm vector

Let's start the vector service

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The logs should contain an entry like this

INFO vector::topology::builder: Healthcheck: Passed.

Stress testing

We run the tests using Apache benchmark.

The httpd-tools package was installed on all servers

We start the tests with Apache benchmark from 4 different servers inside screen: first launch the screen terminal multiplexer, then start testing with Apache benchmark. How to work with screen is covered in this article.

From server 1

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From server 2

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From server 3

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From server 4

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's check the data in Clickhouse

Go to Clickhouse

clickhouse-client -h 172.26.10.109 -m

Run an SQL query

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────
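
To slice the benchmark traffic, a simple aggregation over the columns defined earlier works, for example (a sketch, not from the original setup):

```sql
SELECT
    request_http_host,
    response_status,
    count() AS hits
FROM vector.logs
GROUP BY request_http_host, response_status
ORDER BY hits DESC
LIMIT 10;
```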

Find out the size of the tables in Clickhouse

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's find out how much space the logs took up in Clickhouse.

The size of the logs table is 857.19 MB.

The size of the same data in the Elasticsearch index is 4.5 GB.

If you do not specify anything extra in Vector's parameters, Clickhouse takes 4500/857.19 = 5.25 times less space than Elasticsearch.

Vector uses its compression field by default.

Telegram chat on clickhouse
Telegram chat on Elasticsearch
Telegram chat on "Collection and analysis of system messages"

Iturria: www.habr.com
