Sending Nginx JSON logs with Vector to ClickHouse and Elasticsearch

Vector is designed to collect, transform, and route log data, metrics, and events.

→ GitHub

It is written in Rust and stands out for its high performance and low RAM usage compared to its counterparts. In addition, a lot of attention is paid to correctness-related features, in particular the ability to buffer unsent events on disk and to follow rotated files.

Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them on to one or more sinks.

Vector is a replacement for Filebeat and Logstash and can act in both roles (receiving and sending logs); more details are on the project site.

Where Logstash builds its chain as input → filter → output, Vector uses sources → transforms → sinks.

Examples can be found in the documentation.
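To illustrate that structure (the source path and the names here are illustrative only, not part of this setup), a minimal pipeline in Vector's TOML config could look like:

```toml
# Minimal illustrative pipeline: file source -> JSON parser -> console sink
data_dir = "/var/lib/vector"

[sources.app_log]                   # hypothetical source name
  type    = "file"
  include = [ "/var/log/app.log" ]  # hypothetical path

[transforms.parse]
  type   = "json_parser"
  inputs = [ "app_log" ]

[sinks.out]
  type     = "console"
  inputs   = [ "parse" ]
  encoding = "json"
```

Every component names its `inputs`, which is how the chain is wired together.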

This is a reworked tutorial from Vyacheslav Rakhinsky. The original instructions include geoip processing. When testing geoip from an internal network, Vector returned an error.

Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30

If anyone needs geoip processing, refer to the original instructions from Vyacheslav Rakhinsky.

We will set up the chain Nginx (access logs) → Vector (Client | Filebeat) → Vector (Server | Logstash) → separately into ClickHouse and separately into Elasticsearch. We will install 4 servers, although you could get by with 3.

The scheme looks something like this.

Disable SELinux on all your servers

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot

Install an HTTP server emulator + utilities on all servers

As the HTTP server emulator we will use nodejs-stub-server from Maxim Ignatenko

Nodejs-stub-server has no rpm, so one has to be built for it. The rpm is built using Fedora Copr

Add the antonpatsev/nodejs-stub-server repository

yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server

Install nodejs-stub-server, Apache benchmark, and the screen terminal multiplexer on all servers.

yum -y install stub_http_server screen mc httpd-tools

I adjusted the stub_http_server response time in the file /var/lib/stub_http_server/stub_http_server.js so that there would be more logs.

var max_sleep = 10;

Start stub_http_server.

systemctl start stub_http_server
systemctl enable stub_http_server

Install ClickHouse on the 3rd server

ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor being used becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

First you need to connect the official repository:

sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To install the packages, run the following commands:

sudo yum install -y clickhouse-server clickhouse-client

Allow the ClickHouse server to listen on the network interface in the file /etc/clickhouse-server/config.xml

<listen_host>0.0.0.0</listen_host>

Change the logging level from trace to debug

debug

Standard compression settings:

min_compress_block_size  65536
max_compress_block_size  1048576

To enable Zstd compression, the advice was not to touch the config but to use DDL instead.

I couldn't find out via Google how to apply zstd compression through DDL, so I left it as is.

Colleagues who use zstd compression in ClickHouse, please share the commands.
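For what it's worth, current ClickHouse releases do accept per-column compression codecs in DDL; a sketch (the column choice here is illustrative, verify against your server version):

```sql
-- Declare a codec at table creation time
CREATE TABLE t
(
    s String CODEC(ZSTD(1))
)
ENGINE = MergeTree
ORDER BY s;

-- Or change the codec of an existing column
ALTER TABLE vector.logs MODIFY COLUMN request_full String CODEC(ZSTD(1));
```

The `ALTER` only affects newly written parts; existing parts are recompressed during merges or with `OPTIMIZE`.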

To start the server as a daemon, run:

service clickhouse-server start

Now let's move on to setting up ClickHouse

Connect to ClickHouse

clickhouse-client -h 172.26.10.109 -m

172.26.10.109 is the IP of the server where ClickHouse is installed.

Let's create the vector database

CREATE DATABASE vector;

Let's check that the database exists.

show databases;

Create the vector.logs table.

/* This is the table where the logs are stored as-is */

CREATE TABLE vector.logs
(
    `node_name` String,
    `timestamp` DateTime,
    `server_name` String,
    `user_id` String,
    `request_full` String,
    `request_user_agent` String,
    `request_http_host` String,
    `request_uri` String,
    `request_scheme` String,
    `request_method` String,
    `request_length` UInt64,
    `request_time` Float32,
    `request_referrer` String,
    `response_status` UInt16,
    `response_body_bytes_sent` UInt64,
    `response_content_type` String,
    `remote_addr` IPv4,
    `remote_port` UInt32,
    `remote_user` String,
    `upstream_addr` IPv4,
    `upstream_port` UInt32,
    `upstream_bytes_received` UInt64,
    `upstream_bytes_sent` UInt64,
    `upstream_cache_status` String,
    `upstream_connect_time` Float32,
    `upstream_header_time` Float32,
    `upstream_response_length` UInt64,
    `upstream_response_time` Float32,
    `upstream_status` UInt16,
    `upstream_content_type` String,
    INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;

Let's check that the table has been created. Launch clickhouse-client and run a query.

Switch to the vector database.

use vector;

Ok.

0 rows in set. Elapsed: 0.001 sec.

Bari mu dubi tebur.

show tables;

┌─name────────────────┐
│ logs                │
└─────────────────────┘

Install Elasticsearch on the 4th server in order to send the same data to Elasticsearch for comparison with ClickHouse.

Add the public rpm key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Create 2 repos:

/etc/yum.repos.d/elasticsearch.repo

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md

/etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch and kibana

yum install -y kibana elasticsearch

Since it will run as a single instance, you need to add the following to the file /etc/elasticsearch/elasticsearch.yml:

discovery.type: single-node

So that Vector can send data to elasticsearch from another server, let's change network.host.

network.host: 0.0.0.0

To connect to kibana, change the server.host parameter in the file /etc/kibana/kibana.yml

server.host: "0.0.0.0"

Start elasticsearch and enable it at boot

systemctl enable elasticsearch
systemctl start elasticsearch

and kibana

systemctl enable kibana
systemctl start kibana

We configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster with a large number of servers, in which case you don't need to do this.

For future indices, update the default template:

curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}' 

Install Vector as a replacement for Logstash on the 2nd server

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen

Let's configure Vector as a Logstash replacement. Edit the file /etc/vector/vector.toml

# /etc/vector/vector.toml

data_dir = "/var/lib/vector"

[sources.nginx_input_vector]
  # General
  type                          = "vector"
  address                       = "0.0.0.0:9876"
  shutdown_timeout_secs         = 30

[transforms.nginx_parse_json]
  inputs                        = [ "nginx_input_vector" ]
  type                          = "json_parser"

[transforms.nginx_parse_add_defaults]
  inputs                        = [ "nginx_parse_json" ]
  type                          = "lua"
  version                       = "2"

  hooks.process = """
  function (event, emit)

    function split_first(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[1];
    end

    function split_last(s, delimiter)
      result = {};
      for match in (s..delimiter):gmatch("(.-)"..delimiter) do
          table.insert(result, match);
      end
      return result[#result];
    end

    event.log.upstream_addr             = split_first(split_last(event.log.upstream_addr, ', '), ':')
    event.log.upstream_bytes_received   = split_last(event.log.upstream_bytes_received, ', ')
    event.log.upstream_bytes_sent       = split_last(event.log.upstream_bytes_sent, ', ')
    event.log.upstream_connect_time     = split_last(event.log.upstream_connect_time, ', ')
    event.log.upstream_header_time      = split_last(event.log.upstream_header_time, ', ')
    event.log.upstream_response_length  = split_last(event.log.upstream_response_length, ', ')
    event.log.upstream_response_time    = split_last(event.log.upstream_response_time, ', ')
    event.log.upstream_status           = split_last(event.log.upstream_status, ', ')

    if event.log.upstream_addr == "" then
        event.log.upstream_addr = "127.0.0.1"
    end

    if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
        event.log.upstream_bytes_received = "0"
    end

    if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
        event.log.upstream_bytes_sent = "0"
    end

    if event.log.upstream_cache_status == "" then
        event.log.upstream_cache_status = "DISABLED"
    end

    if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
        event.log.upstream_connect_time = "0"
    end

    if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
        event.log.upstream_header_time = "0"
    end

    if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
        event.log.upstream_response_length = "0"
    end

    if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
        event.log.upstream_response_time = "0"
    end

    if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
        event.log.upstream_status = "0"
    end

    emit(event)

  end
  """

[transforms.nginx_parse_remove_fields]
    inputs                              = [ "nginx_parse_add_defaults" ]
    type                                = "remove_fields"
    fields                              = ["data", "file", "host", "source_type"]

[transforms.nginx_parse_coercer]

    type                                = "coercer"
    inputs                              = ["nginx_parse_remove_fields"]

    types.request_length = "int"
    types.request_time = "float"

    types.response_status = "int"
    types.response_body_bytes_sent = "int"

    types.remote_port = "int"

    types.upstream_bytes_received = "int"
    types.upstream_bytes_sent = "int"
    types.upstream_connect_time = "float"
    types.upstream_header_time = "float"
    types.upstream_response_length = "int"
    types.upstream_response_time = "float"
    types.upstream_status = "int"

    types.timestamp = "timestamp"

[sinks.nginx_output_clickhouse]
    inputs   = ["nginx_parse_coercer"]
    type     = "clickhouse"

    database = "vector"
    healthcheck = true
    host = "http://172.26.10.109:8123" # ClickHouse address
    table = "logs"

    encoding.timestamp_format = "unix"

    buffer.type = "disk"
    buffer.max_size = 104900000
    buffer.when_full = "block"

    request.in_flight_limit = 20

[sinks.elasticsearch]
    type = "elasticsearch"
    inputs   = ["nginx_parse_coercer"]
    compression = "none"
    healthcheck = true
    # 172.26.10.116 - the server where elasticsearch is installed
    host = "http://172.26.10.116:9200"
    index = "vector-%Y-%m-%d"

You can adjust the transforms.nginx_parse_add_defaults section.

Since Vyacheslav Rakhinsky uses these settings for a small CDN, there can be multiple values in the upstream_* fields.

For example:

"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"

If this is not your case, this section can be simplified.

Let's create the service configuration for systemd, /etc/systemd/system/vector.service

# /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

After creating the tables, you can run Vector

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The logs should contain entries like this:

INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.

On the client (web server) - the 1st server

On the server with nginx, you need to disable ipv6, since the logs table in ClickHouse uses the IPv4 type for the upstream_addr field (I don't use ipv6 inside the network). If ipv6 is not disabled, there will be errors:

DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)

Readers may want to add ipv6 support.

Create the file /etc/sysctl.d/98-disable-ipv6.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Apply the settings

sysctl --system

Let's install nginx.

Add the nginx repository file /etc/yum.repos.d/nginx.repo

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

Install the nginx package

yum install -y nginx

First, we need to configure the log format in Nginx in the file /etc/nginx/nginx.conf

user  nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

log_format vector escape=json
    '{'
        '"node_name":"nginx-vector",'
        '"timestamp":"$time_iso8601",'
        '"server_name":"$server_name",'
        '"request_full": "$request",'
        '"request_user_agent":"$http_user_agent",'
        '"request_http_host":"$http_host",'
        '"request_uri":"$request_uri",'
        '"request_scheme": "$scheme",'
        '"request_method":"$request_method",'
        '"request_length":"$request_length",'
        '"request_time": "$request_time",'
        '"request_referrer":"$http_referer",'
        '"response_status": "$status",'
        '"response_body_bytes_sent":"$body_bytes_sent",'
        '"response_content_type":"$sent_http_content_type",'
        '"remote_addr": "$remote_addr",'
        '"remote_port": "$remote_port",'
        '"remote_user": "$remote_user",'
        '"upstream_addr": "$upstream_addr",'
        '"upstream_bytes_received": "$upstream_bytes_received",'
        '"upstream_bytes_sent": "$upstream_bytes_sent",'
        '"upstream_cache_status":"$upstream_cache_status",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time",'
        '"upstream_response_length":"$upstream_response_length",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status": "$upstream_status",'
        '"upstream_content_type":"$upstream_http_content_type"'
    '}';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.json.log vector;      # New log in json format

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

In order not to break your current setup, Nginx allows you to have several access_log directives

access_log  /var/log/nginx/access.log  main;            # Standard log
access_log  /var/log/nginx/access.json.log vector;      # New log in json format

Don't forget to add a logrotate rule for the new logs (if the log file doesn't end with .log)
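A sketch of such a rule (the file name and retention here are just an example; the nginx package on CentOS ships its own rule for *.log files):

```
# /etc/logrotate.d/nginx-json -- example rule for the json log
/var/log/nginx/access.json.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}
```

The USR1 signal tells nginx to reopen its log files after rotation.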

Remove default.conf from /etc/nginx/conf.d/

rm -f /etc/nginx/conf.d/default.conf

Add the virtual host /etc/nginx/conf.d/vhost1.conf

server {
    listen 80;
    server_name vhost1;
    location / {
        proxy_pass http://172.26.10.106:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost2.conf

server {
    listen 80;
    server_name vhost2;
    location / {
        proxy_pass http://172.26.10.108:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost3.conf

server {
    listen 80;
    server_name vhost3;
    location / {
        proxy_pass http://172.26.10.109:8080;
    }
}

Add the virtual host /etc/nginx/conf.d/vhost4.conf

server {
    listen 80;
    server_name vhost4;
    location / {
        proxy_pass http://172.26.10.116:8080;
    }
}

Add the virtual hosts (172.26.10.106 is the IP of the server where nginx is installed) to the /etc/hosts file on all servers:

172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4

And if everything is ready, then

nginx -t 
systemctl restart nginx

Now let's install vector itself

yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm

Create the settings file for systemd, /etc/systemd/system/vector.service

[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target

[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector

[Install]
WantedBy=multi-user.target

And configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the IP address of the log server (Vector-Server).

data_dir = "/var/lib/vector"

[sources.nginx_file]
  type                          = "file"
  include                       = [ "/var/log/nginx/access.json.log" ]
  start_at_beginning            = false
  fingerprinting.strategy       = "device_and_inode"

[sinks.nginx_output_vector]
  type                          = "vector"
  inputs                        = [ "nginx_file" ]

  address                       = "172.26.10.108:9876"

Don't forget to add the vector user to the appropriate group so that it can read the log files. For example, nginx on centos creates logs with adm group permissions.

usermod -a -G adm vector

Start the vector service

systemctl enable vector
systemctl start vector

Vector's logs can be viewed like this:

journalctl -f -u vector

The logs should contain an entry like this:

INFO vector::topology::builder: Healthcheck: Passed.

Load testing

Testing is done using Apache benchmark.

The httpd-tools package was installed on all servers

We start testing using Apache benchmark from 4 different servers in screen. First, we launch the screen terminal multiplexer, and then we start testing using Apache benchmark. How to work with screen can be found in the article.

From the 1st server

while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done

From the 2nd server

while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done

From the 3rd server

while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done

From the 4th server

while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done

Let's check the data in ClickHouse

Connect to ClickHouse

clickhouse-client -h 172.26.10.109 -m

Run an SQL query

SELECT * FROM vector.logs;

┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1      │         │ GET / HTTP/1.0 │ 1server            │ vhost1            │ /           │ http           │ GET            │             66 │        0.028 │                  │             404 │                       27 │                       │ 172.26.10.106 │       45886 │             │ 172.26.10.106 │             0 │                     109 │                  97 │ DISABLED              │                     0 │                0.025 │                       27 │                  0.029 │             404 │                       │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────

Finding out the size of tables in ClickHouse

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

Let's find out how many logs were received in ClickHouse.
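A quick way to count them, using the vector.logs table created earlier:

```sql
-- Total number of rows ingested so far
SELECT count() FROM vector.logs;

-- Or broken down by virtual host
SELECT server_name, count() AS requests
FROM vector.logs
GROUP BY server_name
ORDER BY requests DESC;
```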

The size of the logs table is 857.19 MB.

The size of the same data in the Elasticsearch index is 4.5 GB.

Without even specifying data types in Vector's parameters, ClickHouse takes 4500 / 857.19 = 5.24 times less space than Elasticsearch.

In Vector, the compression field is used by default.

Telegram chat on Clickhouse
Telegram chat on Elasticsearch
Telegram channel "Collection and analysis of system messages"

source: www.habr.com
