Commercial products or ready-made open-source stacks, such as Prometheus + Grafana, are usually used to monitor and analyze how Nginx is performing. That is a good option for monitoring or real-time analytics, but not well suited for historical analysis. On any popular resource, the volume of data in the Nginx logs grows quickly, and to analyze a large amount of data it makes sense to use something more specialized.
In this article I'll tell you how you can use Amazon Athena to analyze Nginx logs, and how to assemble an analytical dashboard on top of them with Cube.js.
TL;DR:
For collecting the data we use Fluentd; for delivery and conversion, AWS Kinesis Data Firehose and AWS Glue; for storage, AWS S3; and for analysis, Amazon Athena with a Cube.js dashboard on top.
Collecting Nginx logs
By default, Nginx logs look something like this:
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
They can be parsed as they are, but it's much easier to adjust the Nginx configuration so that it emits logs in JSON:
log_format json_combined escape=json '{ "created_at": "$msec", '
'"remote_addr": "$remote_addr", '
'"remote_user": "$remote_user", '
'"request": "$request", '
'"status": $status, '
'"bytes_sent": $bytes_sent, '
'"request_length": $request_length, '
'"request_time": $request_time, '
'"http_referrer": "$http_referer", '
'"http_x_forwarded_for": "$http_x_forwarded_for", '
'"http_user_agent": "$http_user_agent" }';
access_log /var/log/nginx/access.log json_combined;
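With the json_combined format above, each access-log line becomes a single JSON object. A minimal sketch of what one record looks like once parsed (the field values are illustrative, not from a real server):

```javascript
// A sample line produced by the json_combined format above
// (values are illustrative).
const line = '{ "created_at": "1554803897.411", ' +
  '"remote_addr": "1.1.1.1", "remote_user": "", ' +
  '"request": "GET /sign-up HTTP/2.0", "status": 200, ' +
  '"bytes_sent": 9168, "request_length": 456, ' +
  '"request_time": 0.132, "http_referrer": "https://example.com/sign-in", ' +
  '"http_x_forwarded_for": "", "http_user_agent": "Mozilla/5.0" }';

const record = JSON.parse(line);

// Numeric fields ($status, $bytes_sent, ...) are unquoted in the
// log_format, so they parse as numbers; "$msec" is quoted, so
// created_at comes out as a string.
console.log(record.status, record.request);
```

This is why quoting matters in the log_format: anything left unquoted becomes a JSON number, which matches the column types in the Athena table created below.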
S3 for storage
We'll use S3 to store the logs. This lets you store and analyze the logs in one place, since Athena can work with data in S3 directly. Later in the article I'll explain how to add and process the logs properly, but first we need a clean bucket in S3 in which nothing else will be stored. It's worth considering in advance which region you'll create the bucket in, because Athena is not available in all regions.
Creating a table in the Athena console
Let's create a table in Athena for the logs. It's needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create a table:
SQL to create the table
CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
`created_at` double,
`remote_addr` string,
`remote_user` string,
`request` string,
`status` int,
`bytes_sent` int,
`request_length` int,
`request_time` double,
`http_referrer` string,
`http_x_forwarded_for` string,
`http_user_agent` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');
Creating a Kinesis Firehose stream
Kinesis Firehose will write the data received from Nginx to S3 in the chosen format, partitioning it into directories in YYYY/MM/DD/HH format. This will come in handy when reading the data. You could, of course, write directly to S3 from the collector, but in that case you'd have to write JSON, which is inefficient because of the large file sizes. Besides, when using PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and select "Direct PUT" in the "source" field:
On the next tab, select "Record format conversion" - "Enabled" and choose "Apache ORC" as the record format. According to some benchmarks, it is among the most efficient formats for PrestoDB and Athena.
For storage we select S3 and the bucket we created earlier. AWS Glue Crawler, which I'll talk about a bit later, can't work with prefixes in an S3 bucket, so it's important to leave the prefix empty.
The remaining options can be changed depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses its own compression by default.
Fluentd
Now that we've configured storing and receiving the logs, we need to set up sending them. We'll use Fluentd.
First, we need the fluent.conf configuration file. Create it and add a source:
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
Now you can start the Fluentd server. If you need a more advanced configuration, refer to the official Fluentd Docker image documentation:
$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf
This configuration uses the /fluentd/log path to buffer logs before sending. You can do without it, but then when you restart, you risk losing everything you have buffered. You can also use any port; 24224 is the default Fluentd port.
Now that we have Fluentd running, we can send the Nginx logs to it. We usually run Nginx in a Docker container, and Docker has a native logging driver for Fluentd:
$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx
If you run Nginx differently, you can use log files; Fluentd has a plugin for tailing files.
Let's add parsing of the logs configured above to the Fluentd configuration:
<filter YOUR-NGINX-TAG.*>
@type parser
key_name log
emit_invalid_record_to_error false
<parse>
@type json
</parse>
</filter>
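To see what this filter does, here is a sketch of the transformation, assuming the record shape the Docker Fluentd logging driver produces: the raw stdout line lands in a `log` field, and the parser filter replaces the record with the parsed JSON.

```javascript
// Roughly what the Docker Fluentd driver emits: the raw stdout line
// in a "log" field (container_id and similar fields trimmed here).
const incoming = {
  log: '{"created_at": "1554803897.411", "request": "GET /sign-in HTTP/2.0", "status": 200}',
  container_name: '/nginx',
};

// A sketch of what the parser filter above does: parse key_name "log"
// and emit the parsed object as the new record.
function parseFilter(record, keyName = 'log') {
  try {
    return JSON.parse(record[keyName]);
  } catch (e) {
    // emit_invalid_record_to_error false: drop unparseable lines silently
    return null;
  }
}

const parsed = parseFilter(incoming);
console.log(parsed.status);
```

Setting emit_invalid_record_to_error to false matters in practice: Nginx startup messages and other non-JSON stderr lines would otherwise flood the Fluentd error stream.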
And send the logs to Kinesis using the Kinesis Firehose output plugin:
<match YOUR-NGINX-TAG.*>
@type kinesis_firehose
region <YOUR-AWS-REGION>
delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
aws_key_id <YOUR-AWS-KEY-ID>
aws_sec_key <YOUR_AWS-SEC_KEY>
</match>
Athena
If you configured everything correctly, then after a while (by default, Kinesis flushes received data once every 10 minutes) you should see log files in S3. In the "Monitoring" menu of Kinesis Firehose you can see how much data is being written to S3, as well as errors. Don't forget to give the Kinesis role write access to the S3 bucket. If Kinesis couldn't parse something, it will add the errors to the same bucket.
Now you can view the data in Athena. Let's find the latest requests for which we returned errors:
SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;
All records are scanned for every query
Now our logs are processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into directories for each hour. However, as long as the table is not partitioned, Athena will load all-time data on every request, with rare exceptions. This is a big problem for two reasons:
- The amount of data keeps growing, slowing down queries;
- Athena is billed based on the amount of data scanned, with a minimum of 10 MB per query.
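The billing model is easy to sketch. Assuming Athena's advertised price of $5 per terabyte scanned (check current AWS pricing for your region) together with the 10 MB per-query minimum:

```javascript
// Sketch of Athena's per-query cost model.
// PRICE_PER_TB is an assumption based on published pricing; verify it.
const PRICE_PER_TB = 5;
const MIN_BYTES = 10 * 1024 ** 2; // 10 MB minimum billed per query
const TB = 1024 ** 4;

function queryCost(bytesScanned) {
  // Small queries are billed as if they scanned the 10 MB minimum.
  const billed = Math.max(bytesScanned, MIN_BYTES);
  return (billed / TB) * PRICE_PER_TB;
}

// A full scan of ~244 MB vs a partitioned query of ~1 MB
// (billed at the 10 MB minimum):
console.log(queryCost(244.34 * 1024 ** 2));
console.log(queryCost(1.23 * 1024 ** 2));
```

The takeaway: partition pruning shrinks the scanned bytes, but below 10 MB per query the savings flatten out, which is exactly the effect noted in the comparison below.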
To fix this, we use AWS Glue Crawler, which crawls the data in S3 and writes the partition information to the Glue Metastore. This lets us use partitions as a filter when querying Athena, so it will scan only the directories specified in the query.
Setting up Amazon Glue Crawler
Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it will create tables in the specified database with names matching the bucket names. If you plan to use this data regularly, be sure to configure the crawler's launch schedule to suit your needs. We use one crawler for all tables, and it runs every hour.
Partitioned tables
After the first crawler run, tables for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try reading something:
SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
partition_0 = '2019' AND
partition_1 = '04' AND
partition_2 = '08' AND
partition_3 = '06'
);
This query will select all records received between 6 AM and 7 AM on April 8, 2019. But how much more efficient is this than just reading from a non-partitioned table? Let's find out and select the same records, filtering them by timestamp:
3.59 seconds and 244.34 megabytes of data on a dataset with one week of logs. Let's try filtering by partition:
A bit faster, but more importantly - only 1.23 megabytes of data! It would be far cheaper if not for the minimum of 10 megabytes per query in the pricing. But it's still very good, and on large datasets the difference will be much more impressive.
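The `partition_0`..`partition_3` columns in the queries above are how Glue names unnamed path segments. Here is a sketch of how a concrete hour maps onto them, assuming the YYYY/MM/DD/HH directory layout that Kinesis Firehose writes:

```javascript
// Build the partition_0..partition_3 values for a given hour,
// matching the YYYY/MM/DD/HH directories Kinesis Firehose writes
// (Glue names unnamed path segments partition_0, partition_1, ...).
function partitionsFor(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return {
    partition_0: String(date.getUTCFullYear()),
    partition_1: pad(date.getUTCMonth() + 1),
    partition_2: pad(date.getUTCDate()),
    partition_3: pad(date.getUTCHours()),
  };
}

// 6 AM on April 8, 2019, as in the WHERE clause above:
// { partition_0: '2019', partition_1: '04',
//   partition_2: '08', partition_3: '06' }
console.log(partitionsFor(new Date(Date.UTC(2019, 3, 8, 6))));
```

Note that the partitions are strings, since they come from path segments, which is why the WHERE clause above compares them against quoted values.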
Building a dashboard with Cube.js
To assemble the dashboard, we use the Cube.js analytical framework. It has quite a lot of functionality, but we are interested in two things: the ability to use partition filters automatically, and data pre-aggregation. It uses a data schema, written in JavaScript, to generate SQL and run queries against the database.
Let's create a new Cube.js application. Since we're already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend in Heroku or Docker. The documentation describes other hosting methods as well.
$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena
Environment variables are used to configure database access in Cube.js. The generator will create a .env file in which you can specify your keys for Athena.
Now we need a data schema. In the schema directory, create a Logs.js file. Here's a sample data model for the nginx logs:
Sample model code
const partitionFilter = (from, to) => `
date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
`
cube(`Logs`, {
sql: `
select * from part_demo_kinesis_bucket
WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
`,
measures: {
count: {
type: `count`,
},
errorCount: {
type: `count`,
filters: [
{ sql: `${CUBE.isError} = 'Yes'` }
]
},
errorRate: {
type: `number`,
sql: `100.0 * ${errorCount} / ${count}`,
format: `percent`
}
},
dimensions: {
status: {
sql: `status`,
type: `number`
},
isError: {
type: `string`,
case: {
when: [{
sql: `${CUBE}.status >= 400`, label: `Yes`
}],
else: { label: `No` }
}
},
createdAt: {
sql: `from_unixtime(created_at)`,
type: `time`
}
}
});
Here we use the FILTER_PARAMS variable to generate a SQL query with a partition filter.
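To make the `FILTER_PARAMS` mechanics more concrete: Cube.js calls the function passed to `.filter()` with SQL expressions for the two ends of the requested date range and splices the returned SQL into the query. A sketch with illustrative literal bounds in place of what Cube.js would actually pass:

```javascript
// The helper from the schema above.
const partitionFilter = (from, to) => `
  date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
  date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
`;

// Cube.js passes bound SQL expressions for the range ends; we simulate
// that here with quoted ISO timestamp literals.
const sql = partitionFilter(`'2019-04-01T00:00:00Z'`, `'2019-04-07T23:59:59Z'`);
console.log(sql);
```

The concatenation `partition_0 || partition_1 || partition_2` turns the year/month/day path segments into a `'20190408'`-style string that `date_parse` can compare against the requested range, so Athena prunes whole day directories before scanning anything.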
We also configure the measures and dimensions we want to show on the dashboard and define pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will automatically refresh them as new data arrives. This not only speeds up queries, but also reduces the cost of using Athena.
Let's add this information to the data schema file:
preAggregations: {
main: {
type: `rollup`,
measureReferences: [count, errorCount],
dimensionReferences: [isError, status],
timeDimensionReference: createdAt,
granularity: `day`,
partitionGranularity: `month`,
refreshKey: {
sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) =>
`select
CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
THEN date_trunc('hour', now()) END`
)
}
}
}
In this model we specify that data must be pre-aggregated for all of the measures used, and partitioned by month.
Now we can assemble the dashboard!
The Cube.js backend exposes a REST API. The Cube.js server accepts queries in the following JSON format:
{
"measures": ["Logs.errorCount"],
"timeDimensions": [
{
"dimension": "Logs.createdAt",
"dateRange": ["2019-01-01", "2019-01-07"],
"granularity": "day"
}
]
}
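The same query can also be sent to the REST API by hand, without the React client. A sketch of how the request URL is assembled (the `apiUrl` and token are placeholders matching the dashboard code below):

```javascript
// Building a request to the Cube.js REST API manually
// (the React client below does this for you).
const query = {
  measures: ['Logs.errorCount'],
  timeDimensions: [{
    dimension: 'Logs.createdAt',
    dateRange: ['2019-01-01', '2019-01-07'],
    granularity: 'day',
  }],
};

const apiUrl = 'http://localhost:4000/cubejs-api/v1';
// The query object travels URL-encoded in the "query" parameter.
const url = `${apiUrl}/load?query=${encodeURIComponent(JSON.stringify(query))}`;

// Then fetch it with the API token in the Authorization header:
// fetch(url, { headers: { Authorization: 'YOUR-CUBEJS-API-TOKEN' } })
//   .then((res) => res.json())
//   .then(({ data }) => console.log(data));
console.log(url);
```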
Let's install the Cube.js client and the React component library via NPM:
$ npm i --save @cubejs-client/core @cubejs-client/react
We import the cubejs client and the QueryRenderer component to fetch the data, and assemble the dashboard:
Dashboard code
import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';
const cubejsApi = cubejs(
'YOUR-CUBEJS-API-TOKEN',
{ apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);
export default () => {
return (
<QueryRenderer
query={{
measures: ['Logs.errorCount'],
timeDimensions: [{
dimension: 'Logs.createdAt',
dateRange: ['2019-01-01', '2019-01-07'],
granularity: 'day'
}]
}}
cubejsApi={cubejsApi}
render={({ resultSet }) => {
if (!resultSet) {
return 'Loading...';
}
return (
<LineChart data={resultSet.rawData()}>
<XAxis dataKey="Logs.createdAt"/>
<YAxis/>
<Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
</LineChart>
);
}}
/>
)
}
The dashboard source code is available online.
source: www.habr.com