
Vector is a tool designed to collect, transform, and send log data, metrics, and events.
Because it is written in Rust, it has high performance and low RAM consumption compared to similar tools. It also puts a lot of attention on correctness-related features, in particular the ability to save unsent events to a disk buffer and to handle file rotation.
Architecturally, Vector is an event router. It receives messages from one or more sources, optionally applies transforms to them, and sends them to one or more sinks.
Vector can replace both Filebeat and Logstash, because it can act in either role (receiving and sending logs).
In Logstash the chain is built as input → filter → output. In Vector it is sources → transforms → sinks.
Examples can be found in the documentation.
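The sources → transforms → sinks model can be illustrated with a minimal Python sketch. This is not Vector's code or API: `run_pipeline`, the dict-based event and the list-backed sinks are hypothetical names chosen for illustration. Every event from every source passes through the transforms in order and is then handed to every sink. This is the same fan-out that later lets one stream feed both ClickHouse and Elasticsearch in this guide.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical event model: a flat dict of field name -> value.
Event = Dict[str, object]
Transform = Callable[[Event], Event]
Sink = Callable[[Event], None]


def run_pipeline(sources: List[Iterable[Event]],
                 transforms: List[Transform],
                 sinks: List[Sink]) -> None:
    """Send every event from every source through all transforms in order,
    then deliver the result to every sink (sources -> transforms -> sinks)."""
    for source in sources:
        for event in source:
            for transform in transforms:
                event = transform(event)
            for sink in sinks:
                sink(event)


if __name__ == "__main__":
    clickhouse_rows: List[Event] = []  # stand-in for the ClickHouse sink
    elastic_docs: List[Event] = []     # stand-in for the Elasticsearch sink

    nginx_source = [{"response_status": "200", "host": "web1"}]

    def coerce_status(e: Event) -> Event:
        # Like a coercer: turn the string status into an integer.
        return {**e, "response_status": int(e["response_status"])}

    def drop_host(e: Event) -> Event:
        # Like remove_fields: drop a field we do not want to store.
        return {k: v for k, v in e.items() if k != "host"}

    run_pipeline([nginx_source], [coerce_status, drop_host],
                 [clickhouse_rows.append, elastic_docs.append])
    print(clickhouse_rows)  # [{'response_status': 200}]
```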
This guide is a reworked version of an earlier set of instructions. The original instructions include geoip processing. When geoip was tested from an internal network, Vector produced an error:
Aug 05 06:25:31.889 DEBUG transform{name=nginx_parse_rename_fields type=rename_fields}: vector::transforms::rename_fields: Field did not exist field=«geoip.country_name» rate_limit_secs=30
If you need geoip processing, refer to the original instructions.
We will configure the chain Nginx (access logs) → Vector (client | Filebeat) → Vector (server | Logstash) → ClickHouse, plus a separate copy to Elasticsearch. We will set up 4 servers, although 3 are enough.

The setup looks roughly like this.
Disable SELinux on all your servers
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
reboot
Install an HTTP server emulator and utilities on all servers
We use nodejs-stub-server as the HTTP server emulator.
nodejs-stub-server has no rpm package, so one was created for it. The rpm is built with Copr.
Add the antonpatsev/nodejs-stub-server repository
yum -y install yum-plugin-copr epel-release
yes | yum copr enable antonpatsev/nodejs-stub-server
Install nodejs-stub-server, Apache Bench and the screen terminal multiplexer on all servers
yum -y install stub_http_server screen mc httpd-tools
I changed the stub_http_server response time in /var/lib/stub_http_server/stub_http_server.js so that more logs are generated:
var max_sleep = 10;
Start stub_http_server.
systemctl start stub_http_server
systemctl enable stub_http_server
Installing ClickHouse on server 3
ClickHouse uses the SSE 4.2 instruction set. Unless otherwise specified, a CPU that supports it is therefore an additional system requirement. This command checks whether the current CPU supports SSE 4.2:
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
First, add the official repository:
sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
To install the packages, run:
sudo yum install -y clickhouse-server clickhouse-client
Allow clickhouse-server to listen on all network interfaces in /etc/clickhouse-server/config.xml
<listen_host>0.0.0.0</listen_host>
Change the logging level from trace to debug
debug
Default compression settings:
min_compress_block_size 65536
max_compress_block_size 1048576
To enable Zstd compression, the advice was to leave the config alone and use DDL instead.

I could not find on Google how to enable zstd compression through DDL, so I left it as it is.
Colleagues who use zstd compression in ClickHouse, please share instructions.
To start the server as a daemon, run:
service clickhouse-server start
Now let's configure ClickHouse
Connect to ClickHouse
clickhouse-client -h 172.26.10.109 -m
172.26.10.109 is the IP of the server where ClickHouse is installed.
Create the vector database
CREATE DATABASE vector;
Check that the database exists.
show databases;
Create the vector.logs table.
/* This table stores the logs as-is */
CREATE TABLE vector.logs
(
`node_name` String,
`timestamp` DateTime,
`server_name` String,
`user_id` String,
`request_full` String,
`request_user_agent` String,
`request_http_host` String,
`request_uri` String,
`request_scheme` String,
`request_method` String,
`request_length` UInt64,
`request_time` Float32,
`request_referrer` String,
`response_status` UInt16,
`response_body_bytes_sent` UInt64,
`response_content_type` String,
`remote_addr` IPv4,
`remote_port` UInt32,
`remote_user` String,
`upstream_addr` IPv4,
`upstream_port` UInt32,
`upstream_bytes_received` UInt64,
`upstream_bytes_sent` UInt64,
`upstream_cache_status` String,
`upstream_connect_time` Float32,
`upstream_header_time` Float32,
`upstream_response_length` UInt64,
`upstream_response_time` Float32,
`upstream_status` UInt16,
`upstream_content_type` String,
INDEX idx_http_host request_http_host TYPE set(0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1)
SETTINGS index_granularity = 8192;
Check that the table was created: start clickhouse-client and run a query.
Switch to the vector database.
use vector;
Ok.
0 rows in set. Elapsed: 0.001 sec.
List the tables.
show tables;
┌─name────────────────┐
│ logs │
└─────────────────────┘
Installing Elasticsearch on server 4, so that the same data can be sent to Elasticsearch and compared with ClickHouse
Add the public rpm key
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
Create 2 repos:
/etc/yum.repos.d/elasticsearch.repo
[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
/etc/yum.repos.d/kibana.repo
[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Install elasticsearch and kibana
yum install -y kibana elasticsearch
Because this is a single instance, add the following to /etc/elasticsearch/elasticsearch.yml:
discovery.type: single-node
So that Vector can send data to Elasticsearch from another server, change network.host:
network.host: 0.0.0.0
To be able to connect to Kibana, change the server.host parameter in /etc/kibana/kibana.yml:
server.host: "0.0.0.0"
Start elasticsearch and enable it at boot
systemctl enable elasticsearch
systemctl start elasticsearch
and kibana
systemctl enable kibana
systemctl start kibana
Configure Elasticsearch for single-node mode: 1 shard, 0 replicas. You will most likely run a cluster of many servers, and then you do not need this step.
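For reference, here is a minimal Python sketch that builds the same template request as the curl command used in this step, with only the standard library. It assumes Elasticsearch on localhost:9200, as in that command. `build_default_template_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: same endpoint as the curl command in this step.
ES_URL = "http://localhost:9200/_template/default"


def build_default_template_request(shards: int = 1,
                                   replicas: int = 0) -> urllib.request.Request:
    """Build a PUT request for a catch-all legacy index template
    (pattern "*", order -1) that sets shard and replica counts."""
    body = {
        "index_patterns": ["*"],
        "order": -1,  # lowest order: templates with a higher order override it
        "settings": {
            "number_of_shards": str(shards),
            "number_of_replicas": str(replicas),
        },
    }
    return urllib.request.Request(
        ES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_default_template_request()
    print(req.get_method(), req.full_url)
    # To apply it against a running node:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```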
For future indices, update the default template:
curl -X PUT http://localhost:9200/_template/default -H 'Content-Type: application/json' -d '{"index_patterns": ["*"],"order": -1,"settings": {"number_of_shards": "1","number_of_replicas": "0"}}'
Installing Vector as a Logstash replacement on server 2
yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm mc httpd-tools screen
Configure Vector as a Logstash replacement by editing /etc/vector/vector.toml
# /etc/vector/vector.toml
data_dir = "/var/lib/vector"
[sources.nginx_input_vector]
# General
type = "vector"
address = "0.0.0.0:9876"
shutdown_timeout_secs = 30
[transforms.nginx_parse_json]
inputs = [ "nginx_input_vector" ]
type = "json_parser"
[transforms.nginx_parse_add_defaults]
inputs = [ "nginx_parse_json" ]
type = "lua"
version = "2"
hooks.process = """
function (event, emit)
function split_first(s, delimiter)
local result = {}
for match in (s..delimiter):gmatch("(.-)"..delimiter) do
table.insert(result, match);
end
return result[1];
end
function split_last(s, delimiter)
local result = {}
for match in (s..delimiter):gmatch("(.-)"..delimiter) do
table.insert(result, match);
end
return result[#result];
end
event.log.upstream_addr = split_first(split_last(event.log.upstream_addr, ', '), ':')
event.log.upstream_bytes_received = split_last(event.log.upstream_bytes_received, ', ')
event.log.upstream_bytes_sent = split_last(event.log.upstream_bytes_sent, ', ')
event.log.upstream_connect_time = split_last(event.log.upstream_connect_time, ', ')
event.log.upstream_header_time = split_last(event.log.upstream_header_time, ', ')
event.log.upstream_response_length = split_last(event.log.upstream_response_length, ', ')
event.log.upstream_response_time = split_last(event.log.upstream_response_time, ', ')
event.log.upstream_status = split_last(event.log.upstream_status, ', ')
if event.log.upstream_addr == "" then
event.log.upstream_addr = "127.0.0.1"
end
if (event.log.upstream_bytes_received == "-" or event.log.upstream_bytes_received == "") then
event.log.upstream_bytes_received = "0"
end
if (event.log.upstream_bytes_sent == "-" or event.log.upstream_bytes_sent == "") then
event.log.upstream_bytes_sent = "0"
end
if event.log.upstream_cache_status == "" then
event.log.upstream_cache_status = "DISABLED"
end
if (event.log.upstream_connect_time == "-" or event.log.upstream_connect_time == "") then
event.log.upstream_connect_time = "0"
end
if (event.log.upstream_header_time == "-" or event.log.upstream_header_time == "") then
event.log.upstream_header_time = "0"
end
if (event.log.upstream_response_length == "-" or event.log.upstream_response_length == "") then
event.log.upstream_response_length = "0"
end
if (event.log.upstream_response_time == "-" or event.log.upstream_response_time == "") then
event.log.upstream_response_time = "0"
end
if (event.log.upstream_status == "-" or event.log.upstream_status == "") then
event.log.upstream_status = "0"
end
emit(event)
end
"""
[transforms.nginx_parse_remove_fields]
inputs = [ "nginx_parse_add_defaults" ]
type = "remove_fields"
fields = ["data", "file", "host", "source_type"]
[transforms.nginx_parse_coercer]
type = "coercer"
inputs = ["nginx_parse_remove_fields"]
types.request_length = "int"
types.request_time = "float"
types.response_status = "int"
types.response_body_bytes_sent = "int"
types.remote_port = "int"
types.upstream_bytes_received = "int"
types.upstream_bytes_sent = "int"
types.upstream_connect_time = "float"
types.upstream_header_time = "float"
types.upstream_response_length = "int"
types.upstream_response_time = "float"
types.upstream_status = "int"
types.timestamp = "timestamp"
[sinks.nginx_output_clickhouse]
inputs = ["nginx_parse_coercer"]
type = "clickhouse"
database = "vector"
healthcheck = true
host = "http://172.26.10.109:8123" # ClickHouse address
table = "logs"
encoding.timestamp_format = "unix"
buffer.type = "disk"
buffer.max_size = 104900000
buffer.when_full = "block"
request.in_flight_limit = 20
[sinks.elasticsearch]
type = "elasticsearch"
inputs = ["nginx_parse_coercer"]
compression = "none"
healthcheck = true
# 172.26.10.116 is the server where Elasticsearch is installed
host = "http://172.26.10.116:9200"
index = "vector-%Y-%m-%d"
You can adapt the transforms.nginx_parse_add_defaults section to your needs.
I use these configs for a small CDN, where the upstream_* fields can contain several values.
For example:
"upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443"
"upstream_bytes_received": "-, -, 123"
"upstream_status": "502, 502, 200"
If this is not your case, this section can be simplified.
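The Lua hook in nginx_parse_add_defaults is hard to read inline, so here is the same logic as a minimal Python sketch. It is a re-implementation for illustration, not code that Vector runs. `split_first` and `split_last` mirror the Lua helpers; `add_defaults` and `MULTI_VALUE_FIELDS` are hypothetical names. The logic keeps only the last upstream attempt from a comma-separated list and replaces empty or "-" values with defaults that the ClickHouse columns can accept.

```python
from typing import Dict

# Fields that may hold a comma-separated list when nginx tried several upstreams.
MULTI_VALUE_FIELDS = [
    "upstream_bytes_received", "upstream_bytes_sent", "upstream_connect_time",
    "upstream_header_time", "upstream_response_length",
    "upstream_response_time", "upstream_status",
]


def split_first(s: str, delimiter: str) -> str:
    """First element of s split on delimiter (like the Lua split_first)."""
    return s.split(delimiter)[0]


def split_last(s: str, delimiter: str) -> str:
    """Last element of s split on delimiter (like the Lua split_last)."""
    return s.split(delimiter)[-1]


def add_defaults(log: Dict[str, str]) -> Dict[str, str]:
    """Keep the last upstream attempt and fill empty values with defaults."""
    out = dict(log)
    # "a:443, b:443, c:443" -> "c:443" -> "c" (address without the port)
    out["upstream_addr"] = split_first(split_last(out["upstream_addr"], ", "), ":")
    if out["upstream_addr"] == "":
        out["upstream_addr"] = "127.0.0.1"
    for field in MULTI_VALUE_FIELDS:
        out[field] = split_last(out[field], ", ")
        if out[field] in ("-", ""):
            out[field] = "0"
    if out["upstream_cache_status"] == "":
        out["upstream_cache_status"] = "DISABLED"
    return out


if __name__ == "__main__":
    sample = {field: "" for field in MULTI_VALUE_FIELDS}
    sample.update({
        "upstream_addr": "128.66.0.10:443, 128.66.0.11:443, 128.66.0.12:443",
        "upstream_bytes_received": "-, -, 123",
        "upstream_status": "502, 502, 200",
        "upstream_cache_status": "",
    })
    fixed = add_defaults(sample)
    print(fixed["upstream_addr"], fixed["upstream_bytes_received"], fixed["upstream_status"])
    # 128.66.0.12 123 200
```

Keeping only the last value matches what ends up in a single ClickHouse column: the final attempt is the one whose status the client actually saw.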
Create the systemd service file /etc/systemd/system/vector.service
# /etc/systemd/system/vector.service
[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target
[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector
[Install]
WantedBy=multi-user.target
Once the tables have been created, you can start Vector
systemctl enable vector
systemctl start vector
Vector logs can be viewed like this:
journalctl -f -u vector
The logs should contain entries like these:
INFO vector::topology::builder: Healthcheck: Passed.
INFO vector::topology::builder: Healthcheck: Passed.
Client (web server) setup: server 1
On the nginx server you need to disable IPv6. The upstream_addr field in the ClickHouse logs table has type IPv4, because I do not use IPv6 on the internal network. If IPv6 is not disabled, you will get errors:
DB::Exception: Invalid IPv4 value.: (while read the value of key upstream_addr)
Perhaps a reader will add IPv6 support.
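If you would rather check addresses than disable IPv6, you can roughly detect which values will not fit an IPv4 column with Python's ipaddress module. This is a hypothetical sketch and not part of the setup described here; `ipv4_or_none` and `rejected_values` are illustrative names. Note that the Lua transform's split on ':' would turn an IPv6 upstream such as [2001:db8::1]:8080 into "[2001", which is not valid IPv4 either.

```python
import ipaddress
from typing import List, Optional


def ipv4_or_none(addr: str) -> Optional[str]:
    """Return the address if it parses as IPv4, otherwise None."""
    try:
        return str(ipaddress.IPv4Address(addr))
    except ipaddress.AddressValueError:
        return None


def rejected_values(values: List[str]) -> List[str]:
    """Values that do not parse as IPv4 (hypothetical pre-insert check)."""
    return [v for v in values if ipv4_or_none(v) is None]


if __name__ == "__main__":
    print(rejected_values(["172.26.10.106", "2001:db8::1", "[2001", "127.0.0.1"]))
    # ['2001:db8::1', '[2001']
```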
Create the file /etc/sysctl.d/98-disable-ipv6.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Apply the settings
sysctl --system
Install nginx.
Add the nginx repository file /etc/yum.repos.d/nginx.repo
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
Install the nginx package
yum install -y nginx
First, configure the nginx log format in /etc/nginx/nginx.conf
user nginx;
# you must set worker processes based on your CPU cores, nginx does not benefit from setting more than that
worker_processes auto; #some last versions calculate it automatically
# number of file descriptors used for nginx
# the limit for the maximum FDs on the server is usually set by the OS.
# if you don't set FD's then OS settings will be used which is by default 2000
worker_rlimit_nofile 100000;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
# provides the configuration file context in which the directives that affect connection processing are specified.
events {
# determines how much clients will be served per worker
# max clients = worker_connections * worker_processes
# max clients is also limited by the number of socket connections available on the system (~64k)
worker_connections 4000;
# optimized to serve many clients with each thread, essential for linux -- for testing environment
use epoll;
# accept as many connections as possible, may flood worker connections if set too low -- for testing environment
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format vector escape=json
'{'
'"node_name":"nginx-vector",'
'"timestamp":"$time_iso8601",'
'"server_name":"$server_name",'
'"request_full": "$request",'
'"request_user_agent":"$http_user_agent",'
'"request_http_host":"$http_host",'
'"request_uri":"$request_uri",'
'"request_scheme": "$scheme",'
'"request_method":"$request_method",'
'"request_length":"$request_length",'
'"request_time": "$request_time",'
'"request_referrer":"$http_referer",'
'"response_status": "$status",'
'"response_body_bytes_sent":"$body_bytes_sent",'
'"response_content_type":"$sent_http_content_type",'
'"remote_addr": "$remote_addr",'
'"remote_port": "$remote_port",'
'"remote_user": "$remote_user",'
'"upstream_addr": "$upstream_addr",'
'"upstream_bytes_received": "$upstream_bytes_received",'
'"upstream_bytes_sent": "$upstream_bytes_sent",'
'"upstream_cache_status":"$upstream_cache_status",'
'"upstream_connect_time":"$upstream_connect_time",'
'"upstream_header_time":"$upstream_header_time",'
'"upstream_response_length":"$upstream_response_length",'
'"upstream_response_time":"$upstream_response_time",'
'"upstream_status": "$upstream_status",'
'"upstream_content_type":"$upstream_http_content_type"'
'}';
access_log /var/log/nginx/access.log main;
access_log /var/log/nginx/access.json.log vector; # New log in JSON format
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
include /etc/nginx/conf.d/*.conf;
}
To avoid breaking your current configuration, note that nginx lets you use several access_log directives
access_log /var/log/nginx/access.log main; # Standard log
access_log /var/log/nginx/access.json.log vector; # New log in JSON format
Don't forget to add a logrotate rule for the new log (if the log file name does not end in .log)
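To see what one line of access.json.log turns into on its way to ClickHouse, here is a minimal Python sketch. It parses a line and applies the same kind of type coercion as the coercer transform, plus the unix timestamp conversion requested by encoding.timestamp_format = "unix". It is an illustration only, not how Vector works internally, and `parse_access_line` is a hypothetical name. The upstream_* fields are left out on purpose, because they need the defaults step first ("-" is not a number).

```python
import json
from datetime import datetime
from typing import Any, Dict

# Target types taken from the coercer section of the Vector server config,
# limited to fields that nginx normally fills with a number.
INT_FIELDS = ["request_length", "response_status",
              "response_body_bytes_sent", "remote_port"]
FLOAT_FIELDS = ["request_time"]


def parse_access_line(line: str) -> Dict[str, Any]:
    """Parse one line written by the `vector` log_format and coerce types."""
    event: Dict[str, Any] = json.loads(line)
    for field in INT_FIELDS:
        if field in event:
            event[field] = int(event[field])
    for field in FLOAT_FIELDS:
        if field in event:
            event[field] = float(event[field])
    # $time_iso8601 looks like 2020-08-07T04:32:42+00:00; convert it to
    # seconds since the epoch, the form the ClickHouse sink is set to send.
    event["timestamp"] = int(datetime.fromisoformat(event["timestamp"]).timestamp())
    return event


if __name__ == "__main__":
    line = ('{"node_name":"nginx-vector","timestamp":"2020-08-07T04:32:42+00:00",'
            '"response_status": "404","request_time": "0.028","remote_port": "45886"}')
    print(parse_access_line(line))
```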
Remove default.conf from /etc/nginx/conf.d/
rm -f /etc/nginx/conf.d/default.conf
Add the virtual host /etc/nginx/conf.d/vhost1.conf
server {
listen 80;
server_name vhost1;
location / {
proxy_pass http://172.26.10.106:8080;
}
}
Add the virtual host /etc/nginx/conf.d/vhost2.conf
server {
listen 80;
server_name vhost2;
location / {
proxy_pass http://172.26.10.108:8080;
}
}
Add the virtual host /etc/nginx/conf.d/vhost3.conf
server {
listen 80;
server_name vhost3;
location / {
proxy_pass http://172.26.10.109:8080;
}
}
Add the virtual host /etc/nginx/conf.d/vhost4.conf
server {
listen 80;
server_name vhost4;
location / {
proxy_pass http://172.26.10.116:8080;
}
}
On all servers, add the virtual hosts to /etc/hosts (172.26.10.106 is the IP of the server where nginx is installed):
172.26.10.106 vhost1
172.26.10.106 vhost2
172.26.10.106 vhost3
172.26.10.106 vhost4
When everything is ready, run
nginx -t
systemctl restart nginx
Now install Vector itself
yum install -y https://packages.timber.io/vector/0.9.X/vector-x86_64.rpm
Create the systemd settings file /etc/systemd/system/vector.service
[Unit]
Description=Vector
After=network-online.target
Requires=network-online.target
[Service]
User=vector
Group=vector
ExecStart=/usr/bin/vector
ExecReload=/bin/kill -HUP $MAINPID
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=vector
[Install]
WantedBy=multi-user.target
Then configure the Filebeat replacement in /etc/vector/vector.toml. 172.26.10.108 is the IP address of the log server (Vector-Server).
data_dir = "/var/lib/vector"
[sources.nginx_file]
type = "file"
include = [ "/var/log/nginx/access.json.log" ]
start_at_beginning = false
fingerprinting.strategy = "device_and_inode"
[sinks.nginx_output_vector]
type = "vector"
inputs = [ "nginx_file" ]
address = "172.26.10.108:9876"
Don't forget to add the vector user to the appropriate group so that it can read the log files. For example, nginx on CentOS creates its logs with adm group permissions.
usermod -a -G adm vector
Start the vector service
systemctl enable vector
systemctl start vector
Vector logs can be viewed like this:
journalctl -f -u vector
The logs should contain an entry like this:
INFO vector::topology::builder: Healthcheck: Passed.
Load testing
We test with Apache Bench.
The httpd-tools package was installed on all servers.
We run Apache Bench inside screen on 4 different servers. First start the screen terminal multiplexer, then start the Apache Bench test.
From server 1
while true; do ab -H "User-Agent: 1server" -c 100 -n 10 -t 10 http://vhost1/; sleep 1; done
From server 2
while true; do ab -H "User-Agent: 2server" -c 100 -n 10 -t 10 http://vhost2/; sleep 1; done
From server 3
while true; do ab -H "User-Agent: 3server" -c 100 -n 10 -t 10 http://vhost3/; sleep 1; done
From server 4
while true; do ab -H "User-Agent: 4server" -c 100 -n 10 -t 10 http://vhost4/; sleep 1; done
Checking the data in ClickHouse
Connect to ClickHouse
clickhouse-client -h 172.26.10.109 -m
Run an SQL query
SELECT * FROM vector.logs;
┌─node_name────┬───────────timestamp─┬─server_name─┬─user_id─┬─request_full───┬─request_user_agent─┬─request_http_host─┬─request_uri─┬─request_scheme─┬─request_method─┬─request_length─┬─request_time─┬─request_referrer─┬─response_status─┬─response_body_bytes_sent─┬─response_content_type─┬───remote_addr─┬─remote_port─┬─remote_user─┬─upstream_addr─┬─upstream_port─┬─upstream_bytes_received─┬─upstream_bytes_sent─┬─upstream_cache_status─┬─upstream_connect_time─┬─upstream_header_time─┬─upstream_response_length─┬─upstream_response_time─┬─upstream_status─┬─upstream_content_type─┐
│ nginx-vector │ 2020-08-07 04:32:42 │ vhost1 │ │ GET / HTTP/1.0 │ 1server │ vhost1 │ / │ http │ GET │ 66 │ 0.028 │ │ 404 │ 27 │ │ 172.26.10.106 │ 45886 │ │ 172.26.10.106 │ 0 │ 109 │ 97 │ DISABLED │ 0 │ 0.025 │ 27 │ 0.029 │ 404 │ │
└──────────────┴─────────────────────┴─────────────┴─────────┴────────────────┴────────────────────┴───────────────────┴─────────────┴────────────────┴────────────────┴────────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────────────┴───────────────────────┴───────────────┴─────────────┴─────────────┴───────────────┴───────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────────┴─────────────────┴───────────────────────┘
Check the size of the tables in ClickHouse
select concat(database, '.', table) as table,
formatReadableSize(sum(bytes)) as size,
sum(rows) as rows,
max(modification_time) as latest_modification,
sum(bytes) as bytes_size,
any(engine) as engine,
formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;
Let's see how much space the logs take up in ClickHouse.

The logs table takes up 857.19 MB.

The same data takes up 4.5 GB in the Elasticsearch index.
With no extra parameters specified in Vector, ClickHouse uses 4500 / 857.19 ≈ 5.25 times less space than Elasticsearch.
The compression field in Vector is left at its default value.
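The size comparison is plain arithmetic, and a short Python check reproduces it. Treating 4.5 GB as 4500 MB follows the text. The second figure rests on an assumption. ClickHouse's formatReadableSize (used in the query above) reports sizes in powers of 1024. If both sizes are binary units, then 4.5 GB is 4608 MB and the ratio is somewhat higher. `size_ratio` is a hypothetical helper name.

```python
# Sizes reported above: ClickHouse table 857.19 MB, Elasticsearch index 4.5 GB.
CLICKHOUSE_MB = 857.19
ELASTICSEARCH_MB = 4500.0  # 4.5 GB read as 4500 MB, as in the text


def size_ratio(larger_mb: float, smaller_mb: float) -> float:
    """How many times smaller the second size is than the first."""
    return larger_mb / smaller_mb


if __name__ == "__main__":
    print(f"{size_ratio(ELASTICSEARCH_MB, CLICKHOUSE_MB):.2f}")  # 5.25
    # Assumption: both sizes in binary units, so 4.5 GB = 4.5 * 1024 MB.
    print(f"{size_ratio(4.5 * 1024, CLICKHOUSE_MB):.2f}")  # 5.38
```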
Source: www.habr.com
