Written in Rust, it stands out from its analogues with high performance and low RAM consumption. In addition, much attention is paid to correctness-related features, in particular the ability to save unsent events to a buffer on disk and to handle file rotation.
Architecturally, Vector is an event router that receives messages from one or more sources, optionally applies transforms to these messages, and sends them to one or more sinks.
Vector is a replacement for filebeat and logstash. It can act in both roles (receiving and sending logs); see the website for details.
If in Logstash the chain is built as input → filter → output, in Vector it is sources → transforms → sinks.
Examples can be found in the documentation.
This guide is a revised version of the guide by Vyacheslav Raxinskiy. The original guide includes geoip processing. When geoip was tested from an internal network, vector reported an error.
Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field="geoip.country_name" rate_limit_secs=30
If anyone needs geoip processing, refer to the original guide by Vyacheslav Raxinskiy.
We will set up the chain Nginx (Access logs) → Vector (Client | Filebeat) → Vector (Server | Logstash) → separately into Clickhouse and separately into Elasticsearch. We will install 4 servers, although you can get by with 3.
The scheme looks roughly like this.
Disable Selinux on all your servers
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot
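The sed command above only edits the configuration file, and the new mode takes effect after the reboot. As a minimal sketch, the requested mode can be read back from the file before rebooting. The helper name `selinux_config_mode` is hypothetical, and the default path is the one used in the sed command.

```shell
# Hypothetical helper: print the SELINUX= value from a config file.
# Defaults to the file edited by the sed command above.
selinux_config_mode() {
    config_file="${1:-/etc/selinux/config}"
    # Keep only the value of the last uncommented SELINUX= line.
    sed -n 's/^SELINUX=//p' "$config_file" | tail -n 1
}

# Usage on a real server (should print "disabled" once the edit is applied):
# selinux_config_mode
```

After the reboot, `getenforce` should report `Disabled`.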
Install an HTTP server emulator + utilities on all servers
ClickHouse uses the SSE 4.2 instruction set, so unless stated otherwise, support for it in the processor becomes an additional system requirement. The command to check whether the current processor supports SSE 4.2:
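One commonly cited way to run this check on Linux is to look for the `sse4_2` flag in /proc/cpuinfo. The sketch below wraps that grep in a function. The name `check_sse42` and its optional file argument are illustrative assumptions, added only so the check can be pointed at another file.

```shell
# Report whether the CPU flags list the SSE 4.2 instruction set.
# check_sse42 is a hypothetical wrapper; it reads /proc/cpuinfo by default.
check_sse42() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    if grep -q sse4_2 "$cpuinfo_file"; then
        echo "SSE 4.2 supported"
    else
        echo "SSE 4.2 not supported"
    fi
}

check_sse42
```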
Elasticsearch is configured for single-node mode: 1 shard, 0 replicas. Most likely you will have a cluster with many servers and will not need to do this.
Update the default template for future indices:
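A sketch of such a template update follows. It assumes Elasticsearch listens on localhost:9200 without authentication and still supports the legacy `_template` API. The template name `default`, the `*` index pattern, and the helper name are assumptions, not values confirmed by this guide.

```shell
# Assumed endpoint; adjust to your Elasticsearch address.
ES_URL="${ES_URL:-http://localhost:9200}"

# Settings for single-node mode: 1 shard, 0 replicas.
TEMPLATE_BODY='{"index_patterns": ["*"], "order": -1, "settings": {"number_of_shards": "1", "number_of_replicas": "0"}}'

# Hypothetical helper: send the template to the legacy _template API.
apply_default_template() {
    curl -sS -X PUT "$ES_URL/_template/default" \
        -H 'Content-Type: application/json' \
        -d "$TEMPLATE_BODY"
}

# Run on the Elasticsearch server:
# apply_default_template
```

Only indices created after the update pick up these settings, so it makes sense to apply it before Vector starts writing.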
After creating the tables, you can start Vector
systemctl enable vector
systemctl start vector
The Vector logs can be viewed like this:
journalctl -f -u vector
The logs should contain entries like these
INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.
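To check for these entries without watching the live log, the journal output can be filtered for the healthcheck line. A minimal sketch: `count_healthchecks` is a hypothetical helper that reads log text on standard input.

```shell
# Hypothetical helper: count "Healthcheck: Passed." lines in log text on stdin.
count_healthchecks() {
    grep -c 'Healthcheck: Passed\.'
}

# Usage on the Vector server:
# journalctl -u vector --no-pager | count_healthchecks
```

With the two entries shown above, the count would be 2, presumably one per configured sink.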
On the client (web server) - 1st server
On the server with nginx you need to disable ipv6, because the logs table in clickhouse stores the upstream_addr field as IPv4, since I do not use ipv6 inside the network. If ipv6 is not disabled, there will be errors:
DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)
First, we need to configure the log format in Nginx in the /etc/nginx/nginx.conf file.
user nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically

# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

# provides the configuration file context in which the directives that affect connection processing are specified.
events {
    # determines how much clients will be served per worker
    # max clients = worker_connections * worker_processes
    # max clients is also limited by the number of socket connections available on the system (~64k)
    worker_connections 4000;

    # optimized to serve many clients with each thread, essential for linux -- for testing environment
    use epoll;

    # accept as many connections as possible, may flood worker connections if set too low -- for testing environment
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    log_format vector escape=json
        '{'
            '"node_name":"nginx-vector",'
            '"timestamp":"$time_iso8601",'
            '"server_name":"$server_name",'
            '"request_full": "$request",'
            '"request_user_agent":"$http_user_agent",'
            '"request_http_host":"$http_host",'
            '"request_uri":"$request_uri",'
            '"request_scheme": "$scheme",'
            '"request_method":"$request_method",'
            '"request_length":"$request_length",'
            '"request_time": "$request_time",'
            '"request_referrer":"$http_referer",'
            '"response_status": "$status",'
            '"response_body_bytes_sent":"$body_bytes_sent",'
            '"response_content_type":"$sent_http_content_type",'
            '"remote_addr": "$remote_addr",'
            '"remote_port": "$remote_port",'
            '"remote_user": "$remote_user",'
            '"upstream_addr": "$upstream_addr",'
            '"upstream_bytes_received": "$upstream_bytes_received",'
            '"upstream_bytes_sent": "$upstream_bytes_sent",'
            '"upstream_cache_status":"$upstream_cache_status",'
            '"upstream_connect_time":"$upstream_connect_time",'
            '"upstream_header_time":"$upstream_header_time",'
            '"upstream_response_length":"$upstream_response_length",'
            '"upstream_response_time":"$upstream_response_time",'
            '"upstream_status": "$upstream_status",'
            '"upstream_content_type":"$upstream_http_content_type"'
        '}';

    access_log /var/log/nginx/access.log main;
    access_log /var/log/nginx/access.json.log vector; # New log in json format

    sendfile on;
    #tcp_nopush on;

    keepalive_timeout 65;

    #gzip on;

    include /etc/nginx/conf.d/*.conf;
}
So as not to break the current configuration, Nginx allows you to have several access_log directives
access_log /var/log/nginx/access.log main; # Standard log
access_log /var/log/nginx/access.json.log vector; # New log in json format
Don't forget to add a logrotate rule for the new logs (if the log file name does not end with .log)
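A sketch of such a rule, modeled on the stock nginx logrotate file that common Linux packages ship (which typically matches /var/log/nginx/*.log, so the access.json.log file used here should already be covered). The log path /var/log/nginx/access.json, the rule file name nginx-vector, the helper name, and the retention values are illustrative assumptions.

```shell
# Hypothetical helper: write a logrotate rule for one nginx log file.
# $1 = log file to rotate, $2 = directory for the rule (default /etc/logrotate.d).
write_nginx_rotation_rule() {
    log_file="$1"
    rule_dir="${2:-/etc/logrotate.d}"
    # Unquoted heredoc: $log_file is expanded, \$ stays literal in the rule.
    cat > "$rule_dir/nginx-vector" <<EOF
$log_file {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    create 640 nginx adm
    sharedscripts
    postrotate
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 \$(cat /var/run/nginx.pid)
        fi
    endscript
}
EOF
}

# Usage (as root), for a log whose name does not match *.log:
# write_nginx_rotation_rule /var/log/nginx/access.json
```

The USR1 signal asks nginx to reopen its log files, so it keeps writing to the fresh file after rotation.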
Remove the default.conf file from /etc/nginx/conf.d/
Then configure the Filebeat replacement in the /etc/vector/vector.toml config. The IP address 172.26.10.108 is the IP address of the log server (Vector-Server)
data_dir = "/var/lib/vector"
[sources.nginx_file]
type = "file"
include = [ "/var/log/nginx/access.json.log" ]
start_at_beginning = false
fingerprinting.strategy = "device_and_inode"
[sinks.nginx_output_vector]
type = "vector"
inputs = [ "nginx_file" ]
address = "172.26.10.108:9876"
Don't forget to add the vector user to the required group so that it can read the log files. For example, nginx on centos creates logs with permissions for the adm group.
usermod -a -G adm vector
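Whether the group change took effect can be checked with `id`. A minimal sketch: `user_in_group` is a hypothetical helper, not part of Vector or nginx.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
user_in_group() {
    # id -nG prints the user's group names separated by spaces.
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Usage after running usermod:
# user_in_group vector adm && echo "vector is in the adm group"
```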
Let's start the vector service
systemctl enable vector
systemctl start vector
The Vector logs can be viewed like this:
journalctl -f -u vector
The logs should contain an entry like this
INFO vector::topology::builder: Healthcheck: Passed.
Stress test
Testing is carried out with Apache benchmark.
The httpd-tools package was installed on all servers.
We run the Apache benchmark test from 4 different servers, each inside screen. First we start the screen terminal multiplexer, and then we start the test with Apache benchmark. How to work with screen is described in a separate article.
From the 1st server
while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done
From the 2nd server
while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done
From the 3rd server
while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done
From the 4th server
while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done
Let's find out how much space the logs take up in Clickhouse.
select concat(database, '.', table) as table,
    formatReadableSize(sum(bytes)) as size,
    sum(rows) as rows,
    max(modification_time) as latest_modification,
    sum(bytes) as bytes_size,
    any(engine) as engine,
    formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;
The size of the logs table is 857.19 MB.
The size of the same data in the Elasticsearch index is 4.5 GB.
With no extra parameters specified in Vector, the data in Clickhouse takes up 4500/857.19 ≈ 5.25 times less space than in Elasticsearch.
Vector uses the compression field by default.
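The size ratio can be recomputed from the two figures above, which comes out at roughly 5.25. In this sketch the variable names are illustrative, and 4.5 GB is taken as 4500 MB, as in the text.

```shell
# Sizes reported above, both in MB (4.5 GB taken as 4500 MB).
clickhouse_mb=857.19
elasticsearch_mb=4500

# awk does the floating-point division; LC_ALL=C keeps "." as the decimal mark.
ratio=$(LC_ALL=C awk -v es="$elasticsearch_mb" -v ch="$clickhouse_mb" 'BEGIN { printf "%.2f", es / ch }')
echo "The Elasticsearch index is ${ratio} times larger than the Clickhouse table"
```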