Sending Nginx json logs using Vector to Clickhouse and Elasticsearch


Vector is designed to collect, transform, and route log data, metrics, and events.

→ Github

Written in Rust, it is characterized by high performance and low RAM usage compared to its counterparts. In addition, a lot of attention is paid to correctness-related features, in particular the ability to save undelivered events to an on-disk buffer and to handle file rotation.

Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them to one or more sinks.

Vector is a replacement for filebeat and logstash; it can act in both roles (receiving and sending logs). More details about them on the site.

If in Logstash the chain is built as input → filter → output, in Vector it is sources → transforms → sinks.

Examples can be found in the documentation.
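To illustrate the sources → transforms → sinks idea, here is a minimal pipeline sketch. It is not part of this article's setup: the component names and the log path are hypothetical, and the exact option names should be checked against the Vector version you run.

```toml
# Minimal Vector pipeline sketch (hypothetical names and paths).
data_dir = "/var/lib/vector"

[sources.my_logs]             # source: tail a log file
  type    = "file"
  include = [ "/var/log/app.log" ]

[transforms.my_json]          # transform: parse each line as JSON
  type   = "json_parser"
  inputs = [ "my_logs" ]

[sinks.my_console]            # sink: print events to stdout
  type     = "console"
  inputs   = [ "my_json" ]
  encoding = "json"
```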

These instructions are a modified version of the instructions from Vyacheslav Rakhinsky. The original instructions include geoip processing. When testing geoip from an internal network, vector raised an error.

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field=«geoip.country_name» rate_limit_secs=30

If anyone needs geoip processing, refer to the original instructions from Vyacheslav Rakhinsky.

We will configure the combination Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → separately into Clickhouse and separately into Elasticsearch. We will set up 4 servers, although you could get by with 3.

[Diagram: Nginx (access logs) → Vector (client) → Vector (server) → Clickhouse and Elasticsearch]

The scheme looks something like this.

Disable SELinux on all your servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot

Install an HTTP server emulator + utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server from Maxim Ignatenko

Nodejs-stub-server does not ship an rpm, so we build an rpm for it. The rpm will be built using Fedora Copr

Add the antonpatsev/nodejs-stub-server repository

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache benchmark, and the screen terminal multiplexer on all servers

yum -y install stub_http_server screen mc httpd-tools

I adjusted the stub_http_server response time in the /var/lib/stub_http_server/stub_http_server.js file so that there would be more logs.

var max_sleep = 10;

Let's start stub_http_server.

systemctl start stub_http_server
systemctl enable stub_http_server

Installing Clickhouse on server 3

ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor in use is an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First you need to connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following commands:

sudo yum install -y clickhouse-server clickhouse-client

Allow the clickhouse server to listen on the network interface in the file /etc/clickhouse-server/config.xml

<listen_host>0.0.0.0</listen_host>

Change the logging level from trace to debug

debug

Standard compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To activate Zstd compression, it was advised not to touch the config but to use DDL instead.


I couldn't find on Google how to apply zstd compression via DDL, so I left it as-is.

Colleagues who use zstd compression in Clickhouse, please share the instructions.
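For what it's worth, recent ClickHouse releases do support per-column compression codecs directly in DDL. A hedged sketch follows: the table and column names are illustrative, and whether the codec syntax is available depends on the ClickHouse version in use.

```sql
-- Sketch: per-column ZSTD codec declared in DDL (illustrative names;
-- requires a ClickHouse version with column codec support).
CREATE TABLE codec_example
(
    `message` String CODEC(ZSTD(1))
)
ENGINE = MergeTree()
ORDER BY tuple();
```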

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to configuring Clickhouse

Connect to Clickhouse

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where Clickhouse is installed.

Let's create the vector database

CREATE DATABASE vector;

Let's check that the database exists.

show databases;

Create the vector.logs table.

/* This is the table where logs are stored as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Let's check that the tables were created. Launch clickhouse-client and make a query.

Let's switch to the vector database.

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Let's look at the tables.

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Installing elasticsearch on server 4 to send the same data to Elasticsearch for comparison with Clickhouse

Add the public rpm key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Let's create 2 repos:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch and kibana

yum install -y kibana elasticsearch

Since it will run as a single instance, you need to add the following to the /etc/elasticsearch/elasticsearch.yml file:

discovery.type: single-node

So that vector can send data to elasticsearch from another server, let's change network.host.

network.host: 0.0.0.0

To connect to kibana, change the server.host parameter in the file /etc/kibana/kibana.yml

server.host: "0.0.0.0"

Start elasticsearch and enable it at boot

systemctl enable elasticsearch
systemctl start elasticsearch

and kibana

systemctl enable kibana
systemctl start kibana

This configures Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster with a large number of servers and you don't need to do this.

For future indices, update the default template:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 

Installing Vector as a replacement for Logstash on server 2

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Let's configure Vector as a replacement for Logstash. Edit the file /etc/vector/vector.toml

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_send = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" # Clickhouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 - the server where elasticsearch is installed
    host = "http://172.26.10.116:9200" 
    index = "vector-%Y-%m-%d"

You can adjust the transforms.nginx_parse_add_defaults section.

Since Vyacheslav Rakhinsky uses these configs for a small CDN, there can be several values in the upstream_* fields.

Kwa mfano:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your situation, this section can be simplified.
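As a standalone illustration (not part of the Vector config) of what the Lua split_first/split_last helpers above do with such multi-value fields, the same logic in Python:

```python
# Illustration of the Lua split_first/split_last logic from the transform above:
# nginx joins per-upstream values with ", ", and we keep only one of them.

def split_last(s: str, delimiter: str) -> str:
    """Return the part after the last delimiter (the last upstream tried)."""
    return s.split(delimiter)[-1]

def split_first(s: str, delimiter: str) -> str:
    """Return the part before the first delimiter."""
    return s.split(delimiter)[0]

upstream_addr = "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
# Keep the last upstream tried, then strip the port:
print(split_first(split_last(upstream_addr, ", "), ":"))  # 128.66.0.12
print(split_last("502, 502, 200", ", "))                  # 200
print(split_last("-, -, 123", ", "))                      # 123
```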

Let's create the systemd service configuration /etc/systemd/system/vector.service

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After creating the tables, you can run Vector

systemctl enable vector
systemctl start vector

Vector logs can be viewed like this:

journalctl -f -u vector

There should be entries like these in the logs

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server) - server 1

On the server with nginx, you need to disable ipv6, since the logs table in clickhouse uses the IPv4 type for the upstream_addr field (I don't use ipv6 inside the network). If ipv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Perhaps someone reading this will add ipv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings

sysctl --system

Let's install nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package

yum install -y nginx

First, we need to configure the log format in Nginx in the file /etc/nginx/nginx.conf

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New log in json format

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

So as not to break your current configuration, Nginx allows you to have several access_log directives

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New log in json format

Don't forget to add a logrotate rule for the new log (if the log file does not end in .log)
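For example, a logrotate rule for the new json log might look like the sketch below. The path matches the access_log directive above; the rotation schedule and retention count are assumptions to adjust for your environment.

```
# /etc/logrotate.d/nginx-json (sketch: daily rotation, 14 rotations kept)
/var/log/nginx/access.json.log {
    daily
    rotate 14
    missingok
    compress
    delaycompress
    sharedscripts
    postrotate
        # ask nginx to reopen its log files
        /bin/kill -USR1 $(cat /var/run/nginx.pid 2>/dev/null) 2>/dev/null || true
    endscript
}
```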

Remove default.conf from /etc/nginx/conf.d/

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the hosts (172.26.10.106 is the ip of the server where nginx is installed) to the /etc/hosts file on all servers:

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And if everything is ready:

nginx -t 
systemctl restart nginx

Now let's install Vector itself

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Let's create the systemd settings file /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the IP address of the log server (Vector-Server)

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the appropriate group so that it can read the log files. For example, nginx on centos creates logs with adm group permissions.

usermod -a -G adm vector

Let's start the vector service

systemctl enable vector
systemctl start vector

Vector logs can be viewed like this:

journalctl -f -u vector

There should be an entry like this in the logs

INFO vector::topology::builder: Healthcheck: Passed.

Stress testing

Testing is done using Apache benchmark.

The httpd-tools package was installed on all servers

We start testing using Apache benchmark from 4 different servers inside screen. First launch the screen terminal multiplexer, then start the tests with Apache benchmark. You can find out how to work with screen in the article.

From server 1

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From server 2

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From server 3

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From server 4

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's check the data in Clickhouse

Connect to Clickhouse

clickhouse-client -h 172.26.10.109 -m

Execute an SQL query

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────

Find out the size of the tables in Clickhouse

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's find out how much space the logs took up in Clickhouse.


The size of the logs table is 857.19 MB.


The size of the same data in the Elasticsearch index is 4.5 GB.

If you do not specify the data types in the vector parameters, Clickhouse takes 4500/857.19 = 5.24 times less space than Elasticsearch.

In vector, the compression field is used by default.

Telegram chat on clickhouse
Telegram chat on Elasticsearch
Telegram chat on "System message collection and analysis"

Source: mapenzi.com
