Written in Rust, Vector is characterized by high performance and low RAM consumption compared with its analogues. In addition, much attention is paid to correctness-related features, in particular the ability to save unsent events to an on-disk buffer and to rotate files.
Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to those messages, and sends them to one or more sinks.
Vector is a replacement for Filebeat and Logstash; it can act in both roles (receiving and sending logs). More details are on their site.
Whereas in Logstash the chain is built as input → filter → output, in Vector it is sources → transforms → sinks.
Examples can be found in the documentation.
This guide is a revised version of the instructions by Vyacheslav Rakhinsky. The original instructions include geoip processing. When geoip was tested from an internal network, Vector reported an error.
Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30
If you need geoip processing, refer to the original instructions by Vyacheslav Rakhinsky.
We will configure the chain Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → separately into Clickhouse and separately into Elasticsearch. We will install 4 servers, although you can get by with 3.
The scheme looks roughly like this.
Disable SELinux on all your servers
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot
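To see what the sed expression above actually changes, it can be tried on a throwaway copy first. This is a minimal sketch; the sample file contents are an assumption modelled on a typical /etc/selinux/config, and nothing on the real system is touched.

```shell
# Work on a temporary file so the real /etc/selinux/config stays untouched.
tmp_cfg="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp_cfg"

# Same expression as in the guide; -i.bak keeps a backup of the original file.
sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/g' "$tmp_cfg"

# Only the SELINUX= line is rewritten; SELINUXTYPE= does not match the pattern.
result="$(cat "$tmp_cfg")"
echo "$result"

rm -f "$tmp_cfg" "$tmp_cfg.bak"
```

After the reboot, running getenforce should report Disabled.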
We install an HTTP server emulator + utilities on all servers
ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:
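A commonly used check, as given in the ClickHouse installation notes, looks for the sse4_2 flag in /proc/cpuinfo. The sketch below assumes a Linux host; on a system without /proc/cpuinfo it simply reports "not supported".

```shell
# Look for the sse4_2 CPU flag; errors (e.g. missing /proc/cpuinfo) are silenced.
sse_status="$(grep -q sse4_2 /proc/cpuinfo 2>/dev/null \
    && echo "SSE 4.2 supported" \
    || echo "SSE 4.2 not supported")"
echo "$sse_status"
```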
Configure Elasticsearch for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster with a large number of servers and will not need to do this.
Update the default template for future indices:
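One way to do this is through the legacy _template API. The sketch below is an assumption rather than the original command: it presumes Elasticsearch answers on localhost:9200 without authentication, a version that still accepts legacy templates, and a hypothetical template name "default".

```shell
# Assumption: Elasticsearch answers on localhost:9200 without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Template body: match every new index, lowest priority, 1 shard, 0 replicas.
template_body='{"index_patterns":["*"],"order":-1,"settings":{"number_of_shards":"1","number_of_replicas":"0"}}'

# Send the template through the legacy _template API.
# "default" is a hypothetical template name chosen for this sketch.
put_default_template() {
    curl -sS -X PUT "$ES_URL/_template/default" \
        -H 'Content-Type: application/json' \
        -d "$template_body"
}

# Nothing is sent until the function is called explicitly:
#   put_default_template
```

The template only affects indices created after it is stored; existing indices keep their current shard and replica settings.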
After creating the tables, you can start Vector
systemctl enable vector
systemctl start vector
Vector logs can be viewed like this:
journalctl -f -u vector
There should be entries like these in the logs
INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.
On the client (web server) - 1st server
On the server with nginx you need to disable ipv6, because the logs table in Clickhouse uses the IPv4 type for the upstream_addr field, as I do not use ipv6 on my network. If ipv6 is not disabled, there will be errors:
DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)
Perhaps readers will add ipv6 support.
Create the file /etc/sysctl.d/98-disable-ipv6.conf
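The file contents are not shown here, so the following is only a sketch using the standard Linux sysctl keys for disabling ipv6. The helper name write_ipv6_disable_conf is hypothetical, and the target path is passed as an argument so the function can be tried on a scratch file first.

```shell
# Write sysctl settings that disable ipv6 on all current and future interfaces.
write_ipv6_disable_conf() {
    conf_path="$1"
    cat > "$conf_path" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
}

# Usage (as root):
#   write_ipv6_disable_conf /etc/sysctl.d/98-disable-ipv6.conf
#   sysctl --system   # reload all sysctl configuration files
```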
First we need to set the log format in Nginx in the file /etc/nginx/nginx.conf
user nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically
# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
# provides the configuration file context in which the directives that affect connection processing are specified.
events {
# determines how many clients will be served per worker
# max clients = worker_connections * worker_processes
# max clients is also limited by the number of socket connections available on the system (~64k)
worker_connections 4000;
# optimized to serve many clients with each thread, essential for linux -- for testing environment
use epoll;
# accept as many connections as possible, may flood worker connections if set too low -- for testing environment
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format vector escape=json
'{'
'"node_name":"nginx-vector",'
'"timestamp":"$time_iso8601",'
'"server_name":"$server_name",'
'"request_full": "$request",'
'"request_user_agent":"$http_user_agent",'
'"request_http_host":"$http_host",'
'"request_uri":"$request_uri",'
'"request_scheme": "$scheme",'
'"request_method":"$request_method",'
'"request_length":"$request_length",'
'"request_time": "$request_time",'
'"request_referrer":"$http_referer",'
'"response_status": "$status",'
'"response_body_bytes_sent":"$body_bytes_sent",'
'"response_content_type":"$sent_http_content_type",'
'"remote_addr": "$remote_addr",'
'"remote_port": "$remote_port",'
'"remote_user": "$remote_user",'
'"upstream_addr": "$upstream_addr",'
'"upstream_bytes_received": "$upstream_bytes_received",'
'"upstream_bytes_sent": "$upstream_bytes_sent",'
'"upstream_cache_status":"$upstream_cache_status",'
'"upstream_connect_time":"$upstream_connect_time",'
'"upstream_header_time":"$upstream_header_time",'
'"upstream_response_length":"$upstream_response_length",'
'"upstream_response_time":"$upstream_response_time",'
'"upstream_status": "$upstream_status",'
'"upstream_content_type":"$upstream_http_content_type"'
'}';
access_log /var/log/nginx/access.log main;
access_log /var/log/nginx/access.json.log vector; # New log in JSON format
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
include /etc/nginx/conf.d/*.conf;
}
So as not to break your current configuration, Nginx lets you have several access_log directives
access_log /var/log/nginx/access.log main; # Standard log
access_log /var/log/nginx/access.json.log vector; # New log in JSON format
Do not forget to add a logrotate rule for the new logs (if the log file name does not end with .log)
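For a log file whose name falls outside an existing logrotate rule, a stanza along these lines could be used. This is a sketch under assumptions: the helper name nginx_logrotate_rule, the retention values, and the example file names in the usage comment are all hypothetical; the pid path matches the nginx.conf shown in this guide.

```shell
# Print a logrotate stanza for one nginx log file given as the first argument.
nginx_logrotate_rule() {
    log_path="$1"
    printf '%s {\n' "$log_path"
    cat <<'EOF'
    daily
    rotate 14
    missingok
    notifempty
    compress
    delaycompress
    sharedscripts
    postrotate
        # Ask nginx to reopen its log files after rotation.
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 "$(cat /var/run/nginx.pid)"
        fi
    endscript
}
EOF
}

# Usage (as root, hypothetical file names):
#   nginx_logrotate_rule /var/log/nginx/access.json > /etc/logrotate.d/nginx-json
```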
Remove default.conf from /etc/nginx/conf.d/
rm -f /etc/nginx/conf.d/default.conf
Add the virtual host /etc/nginx/conf.d/vhost1.conf
Then configure the Filebeat replacement in /etc/vector/vector.toml. 172.26.10.108 is the IP address of the log server (Vector-Server).
data_dir = "/var/lib/vector"
[sources.nginx_file]
type = "file"
include = [ "/var/log/nginx/access.json.log" ]
start_at_beginning = false
fingerprinting.strategy = "device_and_inode"
[sinks.nginx_output_vector]
type = "vector"
inputs = [ "nginx_file" ]
address = "172.26.10.108:9876"
Do not forget to add the vector user to the required group so that it can read the log files. For example, nginx on CentOS creates logs with adm group permissions.
usermod -a -G adm vector
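Whether the group change took effect can be checked with id. A small sketch; the helper name user_in_group is hypothetical.

```shell
# Return success if USER ($1) is a member of GROUP ($2), primary or supplementary.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Example with the names from this guide (prints nothing unless vector is in adm):
if user_in_group vector adm; then
    echo "vector can read logs owned by group adm"
fi
```

A process that is already running keeps its old group list, so the service has to be (re)started after the change.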
Let's start the Vector service
systemctl enable vector
systemctl start vector
Vector logs can be viewed like this:
journalctl -f -u vector
There should be an entry like this in the logs
INFO vector::topology::builder: Healthcheck: Passed.
Stress Testing
Testing is performed with Apache benchmark.
The httpd-tools package was installed on all servers
We start testing with Apache benchmark from 4 different servers in screen. First we launch the screen terminal multiplexer, and then we start testing with Apache benchmark. How to work with screen is described in the article.
From the 1st server
while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done
From the 2nd server
while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done
From the 3rd server
while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done
From the 4th server
while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done
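The four loops above differ only in the server number, so the command line can be generated from it. A sketch that only prints the commands and does not run ab; the helper name ab_command is hypothetical.

```shell
# Print the ab command line for server number N (first argument).
ab_command() {
    n="$1"
    printf 'ab -H "User-Agent: %sserver" -c 100 -n 10 -t 10 http://vhost%s/\n' "$n" "$n"
}

# Show the command for each of the four servers.
for n in 1 2 3 4; do
    ab_command "$n"
done
```

On server 1, for instance, the printed line could be run in a loop with: while true; do eval "$(ab_command 1)"; sleep 1; done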
select concat(database, '.', table) as table,
formatReadableSize(sum(bytes)) as size,
sum(rows) as rows,
max(modification_time) as latest_modification,
sum(bytes) as bytes_size,
any(engine) as engine,
formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;
Let's find out how much space the logs occupy in Clickhouse.
The size of the logs table is 857.19 MB.
The size of the same data in the Elasticsearch index is 4.5 GB.
If no data parameters are specified in Vector, Clickhouse takes 4500/857.19 ≈ 5.25 times less space than Elasticsearch.
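As a quick check of the arithmetic, assuming 4.5 GB is taken as 4500 MB, the ratio can be computed with awk and rounded to two decimals:

```shell
# Elasticsearch index size (MB) divided by Clickhouse table size (MB).
ratio="$(LC_ALL=C awk 'BEGIN { printf "%.2f", 4500 / 857.19 }')"
echo "$ratio"   # prints 5.25
```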