Sending Nginx json logs to Clickhouse and Elasticsearch using Vector

Vector is designed for collecting, transforming and sending log data, metrics and events.

→ GitHub

Written in Rust, it is notable for high performance and low RAM consumption compared to its analogues. In addition, much attention is paid to features related to correctness, in particular the ability to buffer unsent events to disk and to follow file rotation.

Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them to one or more sinks.

Vector is a replacement for filebeat and logstash and can act in both roles (receiving and sending logs); more details on the website.

Where Logstash builds its chain as input → filter → output, in Vector it is sources → transforms → sinks.

Examples can be found in the documentation.
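To give a feel for that chain, here is a minimal sketch (component names and the file path are illustrative, not part of the setup below): a file source, a JSON parser transform, and a console sink.

[sources.example_logs]
  type    = "file"
  include = [ "/var/log/example.log" ]

[transforms.example_parser]
  type   = "json_parser"
  inputs = [ "example_logs" ]

[sinks.example_console]
  type     = "console"
  inputs   = [ "example_parser" ]
  encoding = "json"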

This guide is a reworked version of the instructions by Vyacheslav Rakhinsky. The original instructions include geoip processing. When geoip was tested from an internal network, vector threw an error:

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30

If anyone needs geoip processing, refer to the original instructions by Vyacheslav Rakhinsky.

We will configure the combination Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → separately into Clickhouse and separately into Elasticsearch. We will set up 4 servers, although you could get by with 3.

The scheme looks something like this:

[Diagram: Nginx (access logs) → Vector (client) → Vector (server) → Clickhouse and Elasticsearch]

Disable Selinux on all your servers:

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot
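If you want SELinux out of the way for the current session as well (a convenience step, not part of the original instructions), you can also switch it to permissive mode:

setenforce 0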

Installing an HTTP server emulator and utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server by Maxim Ignatenko.

nodejs-stub-server does not ship an rpm, so here is an rpm for it, built using Fedora Copr.

Add the antonpatsev/nodejs-stub-server repository:

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache benchmark and the screen terminal multiplexer on all servers:

yum -y install stub_http_server screen mc httpd-tools

I adjusted the stub_http_server response time in /var/lib/stub_http_server/stub_http_server.js so that there would be more logs:

var max_sleep = 10;

Let's start stub_http_server:

systemctl start stub_http_server
systemctl enable stub_http_server

Installing Clickhouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor being used becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First you need to connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following commands:

sudo yum install -y clickhouse-server clickhouse-client

Allow clickhouse-server to listen on the network interface in /etc/clickhouse-server/config.xml:

<listen_host>0.0.0.0</listen_host>

Change the logging level from trace to debug:

<level>debug</level>

Standard compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To enable zstd compression, it was recommended not to touch the config but to use DDL instead.

I could not find in Google how to apply zstd compression via DDL, so I left it as is.

Colleagues who use zstd compression in Clickhouse, please share the instructions.
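For reference, recent ClickHouse versions do support per-column compression codecs directly in DDL; a hypothetical sketch (the table name and column set are illustrative and untested with the version used in this article):

CREATE TABLE vector.logs_zstd
(
    `timestamp` DateTime CODEC(Delta, ZSTD(1)),
    `request_full` String CODEC(ZSTD(1))
)
ENGINE = MergeTree()
ORDER BY timestamp;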

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to configuring Clickhouse.

Connect to Clickhouse:

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP address of the server where Clickhouse is installed.

Let's create the vector database:

CREATE DATABASE vector;

Let's check that the database exists:

show databases;
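The output should include the new database; on a fresh installation the listing looks roughly like this (the set of system databases varies by version):

┌─name────┐
│ default │
│ system  │
│ vector  │
└─────────┘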

Create the vector.logs table:

/* This is the table where the logs are stored as is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Let's check that the tables have been created. Launch clickhouse-client and run a query.

Switch to the vector database:

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Let's look at the tables:

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Installing Elasticsearch on server 4, to send the same data to Elasticsearch for comparison with Clickhouse

Add the public rpm key:

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Let's create 2 repos:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch and kibana:

yum install -y kibana elasticsearch

Since it runs as a single instance, you need to add the following to /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that vector can send data to elasticsearch from another server, let's change network.host:

network.host: 0.0.0.0

To connect to Kibana, change the server.host parameter in /etc/kibana/kibana.yml:

server.host: "0.0.0.0"

Start elasticsearch and add it to autostart:

systemctl enable elasticsearch
systemctl start elasticsearch

And kibana:

systemctl enable kibana
systemctl start kibana

Configuring Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster of a large number of servers, in which case you do not need to do this.

Update the default template for future indices:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 
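To double-check that the template was stored, you can read it back via the same 7.x API:

curl http://localhost:9200/_template/default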

Setting up Vector on server 2 as a replacement for Logstash

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Let's set up Vector as a replacement for Logstash. Edit the /etc/vector/vector.toml file:

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)
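    -- nginx joins multiple upstream values with ", " (e.g. "10.0.0.1:443, 10.0.0.2:443");
    -- the helpers below take the first / last element of such a comma-separated list.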

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')
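
    -- nginx writes "-" or an empty string when there was no upstream;
    -- normalize such placeholders to typed defaults so the coercer and Clickhouse accept them.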

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_sent = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" # Clickhouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 - the server where elasticsearch is installed
    host = "http://172.26.10.116:9200" 
    index = "vector-%Y-%m-%d"

You can adjust the transforms.nginx_parse_add_defaults section.

Vyacheslav Rakhinsky uses these configs for a small CDN, where the upstream_* fields can carry several values, for example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your case, this section can be simplified.

Let's create the service definition for systemd, /etc/systemd/system/vector.service:

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After creating the tables, you can start Vector:

systemctl enable vector
systemctl start vector

Vector logs can be viewed like this:

journalctl -f -u vector

The logs should contain entries like these:

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server), server 1

On the server with nginx you need to disable ipv6, since the logs table in clickhouse uses the IPv4 type for the upstream_addr field (I do not use ipv6 inside the network). If ipv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Perhaps some readers will want to add ipv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings:

sysctl --system
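You can verify that it took effect (a value of 1 means ipv6 is disabled):

sysctl net.ipv6.conf.all.disable_ipv6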

Let's install Nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo:

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package:

yum install -y nginx

First, we need to configure the log format in Nginx in /etc/nginx/nginx.conf:

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    log_format vector escape=json
        '{'
            '"node_name":"nginx-vector",'
            '"timestamp":"$time_iso8601",'
            '"server_name":"$server_name",'
            '"request_full": "$request",'
            '"request_user_agent":"$http_user_agent",'
            '"request_http_host":"$http_host",'
            '"request_uri":"$request_uri",'
            '"request_scheme": "$scheme",'
            '"request_method":"$request_method",'
            '"request_length":"$request_length",'
            '"request_time": "$request_time",'
            '"request_referrer":"$http_referer",'
            '"response_status": "$status",'
            '"response_body_bytes_sent":"$body_bytes_sent",'
            '"response_content_type":"$sent_http_content_type",'
            '"remote_addr": "$remote_addr",'
            '"remote_port": "$remote_port",'
            '"remote_user": "$remote_user",'
            '"upstream_addr": "$upstream_addr",'
            '"upstream_bytes_received": "$upstream_bytes_received",'
            '"upstream_bytes_sent": "$upstream_bytes_sent",'
            '"upstream_cache_status":"$upstream_cache_status",'
            '"upstream_connect_time":"$upstream_connect_time",'
            '"upstream_header_time":"$upstream_header_time",'
            '"upstream_response_length":"$upstream_response_length",'
            '"upstream_response_time":"$upstream_response_time",'
            '"upstream_status": "$upstream_status",'
            '"upstream_content_type":"$upstream_http_content_type"'
        '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New log in json format

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

So as not to break the current configuration, Nginx allows you to have several access_log directives:

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New log in json format

Don't forget to add a logrotate rule for the new logs (if the log file does not end in .log).
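A minimal sketch of such a rule (schedule and retention here are illustrative; the reload signal mirrors the stock nginx logrotate script), e.g. in /etc/logrotate.d/nginx-json:

/var/log/nginx/access.json.log {
    daily
    rotate 7
    missingok
    compress
    delaycompress
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}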

Remove default.conf from /etc/nginx/conf.d/:

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf:

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf:

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf:

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf:

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts to the /etc/hosts file on all servers (172.26.10.106 is the ip of the server where nginx is installed):

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And if everything is ready, then:

nginx -t 
systemctl restart nginx

Now let's install Vector itself:

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Let's create the settings file for systemd, /etc/systemd/system/vector.service:

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the IP address of the log server (Vector server):

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Do not forget to add the vector user to the right group so that it can read the log files. For example, nginx on centos creates logs with adm group permissions.

usermod -a -G adm vector
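You can check that the group membership was applied:

id vector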

Let's start the vector service:

systemctl enable vector
systemctl start vector

Vector logs can be viewed like this:

journalctl -f -u vector

The logs should contain an entry like this:

INFO vector::topology::builder: Healthcheck: Passed.

Stress testing

Testing is carried out using Apache benchmark.

The httpd-tools package was installed on all servers.

We start testing using Apache benchmark from 4 different servers in screen. First we launch the screen terminal multiplexer, and then we start testing using Apache benchmark. You can find out how to work with screen in this article.

From server 1:

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From server 2:

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From server 3:

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From server 4:

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's check the data in Clickhouse

Connect to Clickhouse:

clickhouse-client -h 172.26.10.109 -m

Run an SQL query:

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────
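Instead of dumping the whole table, you can aggregate right away; for example, the number of requests per virtual host:

SELECT request_http_host, count() AS hits
FROM vector.logs
GROUP BY request_http_host
ORDER BY hits DESC;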

Find out the size of the tables in Clickhouse:

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's find out how much space the logs take up in Clickhouse.

The size of the logs table is 857.19 MB.

The size of the same data in the index in Elasticsearch is 4.5 GB.

If you do not specify data types in the vector parameters, Clickhouse takes 4500/857.19 = 5.24 times less space than Elasticsearch.

In Vector, the compression field is used by default.

Clickhouse Telegram chat
Elasticsearch Telegram chat
Telegram chat "Collection and analysis of system messages"

Source: www.habr.com
