Vector is written in Rust and is characterized by high performance and low RAM consumption compared to its analogues. In addition, much attention is paid to correctness-related features, in particular the ability to buffer unsent events on disk and to handle file rotation.
Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them to one or more sinks.
Vector is a replacement for Filebeat and Logstash and can act in both roles (receiving and sending logs). More details are available on the site.
Where Logstash builds its chain as input → filter → output, Vector builds it as sources → transforms → sinks.
Examples can be found in the documentation.
This guide is a revised version of the instructions by Vyacheslav Rakhinsky. The original instructions include geoip processing. When geoip was tested from an internal network, Vector reported an error.
Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30
If you need geoip processing, refer to the original instructions by Vyacheslav Rakhinsky.
We will set up the chain Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → separately into Clickhouse and separately into Elasticsearch. We will install 4 servers, although you can get by with 3.
The scheme looks roughly like this.
Disable SELinux on all your servers
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot
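Because the edit above rewrites a system file and is followed by a reboot, it can be worth trying the same sed expression on a scratch copy first. The sketch below is only an illustration: the sample file contents are made up, and it never touches /etc/selinux/config.

```shell
# Dry run of the SELinux substitution on a throwaway file (contents are illustrative).
sample=$(mktemp)
printf '%s\n' '# sample config' 'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$sample"

# Same expression as above, but printed to stdout instead of edited in place.
result=$(sed 's/^SELINUX=.*/SELINUX=disabled/g' "$sample")
rm -f "$sample"

printf '%s\n' "$result"
```

Only the SELINUX= line changes. The SELINUXTYPE= line is left alone because the pattern requires "=" immediately after SELINUX.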
We install an HTTP server emulator and utilities on all servers
ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:
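A minimal sketch of such a check, assuming a Linux host where the CPU flags are exposed in /proc/cpuinfo. The helper name has_sse42 and its optional file argument are illustrative and not part of ClickHouse:

```shell
# Succeeds if the CPU flags list contains sse4_2.
# The optional argument exists only so the check can be pointed at another file.
has_sse42() {
    grep -q sse4_2 "${1:-/proc/cpuinfo}"
}

if has_sse42; then
    echo "SSE 4.2 supported"
else
    echo "SSE 4.2 not supported"
fi
```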
Configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster with many servers and will not need to do this.
For future indices, update the default template:
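One way this could look, sketched with the legacy index template API (PUT _template/<name>). The URL localhost:9200, the absence of authentication, the template name "default" and the order value -1 are all assumptions made for illustration. On Elasticsearch 7.8 and later, composable templates (_index_template) are the newer mechanism.

```shell
# Assumption: Elasticsearch listens on localhost:9200 without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Legacy index template matching every index: 1 shard, 0 replicas.
# The template name "default" and order -1 are illustrative choices.
TEMPLATE_BODY='{"index_patterns":["*"],"order":-1,"settings":{"number_of_shards":"1","number_of_replicas":"0"}}'

# Sends the template to Elasticsearch; defined here but not called automatically.
put_default_template() {
    curl -sS -X PUT "$ES_URL/_template/default" \
        -H 'Content-Type: application/json' \
        -d "$TEMPLATE_BODY"
}

# Run on the Elasticsearch host:
#   put_default_template
```

The template only affects indices created after it is stored, so existing indices keep their current shard and replica settings.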
INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.
On the client (web server): the 1st server
On the server with nginx you need to disable IPv6, because the logs table in Clickhouse uses the IPv4 type for the upstream_addr field, since I do not use IPv6 inside the network. If IPv6 is not disabled, there will be errors:
DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)
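A sketch of one common way to disable IPv6 on Linux, using the net.ipv6.conf.*.disable_ipv6 sysctl keys. The helper name and the file name in the usage comment are assumptions. Other approaches (for example a kernel boot parameter) exist as well.

```shell
# Write the sysctl settings that disable IPv6 to the file given as $1.
write_ipv6_disable_conf() {
    cat > "$1" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
}

# As root on the nginx server (the file name is an illustrative choice):
#   write_ipv6_disable_conf /etc/sysctl.d/98-disable-ipv6.conf
#   sysctl --system
```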
First, we need to configure the log format in Nginx in the file /etc/nginx/nginx.conf
user nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically
# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
# provides the configuration file context in which the directives that affect connection processing are specified.
events {
# determines how much clients will be served per worker
# max clients = worker_connections * worker_processes
# max clients is also limited by the number of socket connections available on the system (~64k)
worker_connections 4000;
# optimized to serve many clients with each thread, essential for linux -- for testing environment
use epoll;
# accept as many connections as possible, may flood worker connections if set too low -- for testing environment
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format vector escape=json
'{'
'"node_name":"nginx-vector",'
'"timestamp":"$time_iso8601",'
'"server_name":"$server_name",'
'"request_full": "$request",'
'"request_user_agent":"$http_user_agent",'
'"request_http_host":"$http_host",'
'"request_uri":"$request_uri",'
'"request_scheme": "$scheme",'
'"request_method":"$request_method",'
'"request_length":"$request_length",'
'"request_time": "$request_time",'
'"request_referrer":"$http_referer",'
'"response_status": "$status",'
'"response_body_bytes_sent":"$body_bytes_sent",'
'"response_content_type":"$sent_http_content_type",'
'"remote_addr": "$remote_addr",'
'"remote_port": "$remote_port",'
'"remote_user": "$remote_user",'
'"upstream_addr": "$upstream_addr",'
'"upstream_bytes_received": "$upstream_bytes_received",'
'"upstream_bytes_sent": "$upstream_bytes_sent",'
'"upstream_cache_status":"$upstream_cache_status",'
'"upstream_connect_time":"$upstream_connect_time",'
'"upstream_header_time":"$upstream_header_time",'
'"upstream_response_length":"$upstream_response_length",'
'"upstream_response_time":"$upstream_response_time",'
'"upstream_status": "$upstream_status",'
'"upstream_content_type":"$upstream_http_content_type"'
'}';
access_log /var/log/nginx/access.log main;
access_log /var/log/nginx/access.json.log vector; # New log in JSON format
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
include /etc/nginx/conf.d/*.conf;
}
So as not to break the current configuration, Nginx allows you to have several access_log directives
access_log /var/log/nginx/access.log main; # Standard log
access_log /var/log/nginx/access.json.log vector; # New log in JSON format
Do not forget to add a logrotate rule for the new logs (if the log file name does not end with .log)
Then configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the address of the log server (Vector-Server)
data_dir = "/var/lib/vector"
[sources.nginx_file]
type = "file"
include = [ "/var/log/nginx/access.json.log" ]
start_at_beginning = false
fingerprinting.strategy = "device_and_inode"
[sinks.nginx_output_vector]
type = "vector"
inputs = [ "nginx_file" ]
address = "172.26.10.108:9876"
Do not forget to add the vector user to the required group so that it can read the log files. For example, nginx on CentOS creates logs with adm group permissions.
usermod -a -G adm vector
Let's start the vector service
systemctl enable vector
systemctl start vector
The Vector logs can be viewed like this:
journalctl -f -u vector
There should be an entry like this in the logs
INFO vector::topology::builder: Healthcheck: Passed.
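Instead of watching the journal by eye, the same check can be scripted. The sketch below counts matching lines in whatever log text arrives on stdin. The function name is illustrative, and the expectation of one line per sink is an assumption rather than something stated in this guide.

```shell
# Count "Healthcheck: Passed" lines in the log text read from stdin.
count_passed_healthchecks() {
    grep -c 'Healthcheck: Passed'
}

# On a host running Vector under systemd:
#   journalctl -u vector --no-pager | count_passed_healthchecks
```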
Stress Testing
We run the tests using Apache benchmark.
The httpd-tools package was installed on all servers
We start the tests with Apache benchmark from 4 different servers inside screen. First we start the screen terminal multiplexer, then we start the test with Apache benchmark. How to work with screen is described in the article.
From the 1st server
while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done
From the 2nd server
while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done
From the 3rd server
while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done
From the 4th server
while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done
Let's see how much space the logs take up in Clickhouse.
select concat(database, '.', table) as table,
formatReadableSize(sum(bytes)) as size,
sum(rows) as rows,
max(modification_time) as latest_modification,
sum(bytes) as bytes_size,
any(engine) as engine,
formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;
The size of the logs table is 857.19 MB.
The size of the same data in the Elasticsearch index is 4.5 GB.
If no additional parameters are specified for the data in Vector, Clickhouse takes 4500/857.19 ≈ 5.25 times less space than Elasticsearch.
In Vector, the compression field is used by default.
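The ratio can be reproduced with a one-line calculation. This sketch takes 4.5 GB as 4500 MB, as the text does. The quotient is 5.2497..., which is 5.25 when rounded to two decimals (5.24 if truncated).

```shell
# Sizes as reported above, both in MB (4.5 GB is taken as 4500 MB, as in the text).
es_mb=4500
ch_mb=857.19

# Divide the Elasticsearch size by the Clickhouse size, rounded to two decimals.
ratio=$(awk -v es="$es_mb" -v ch="$ch_mb" 'BEGIN { printf "%.2f", es / ch }')
echo "Elasticsearch / Clickhouse size ratio: $ratio"
```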