Nginx log analytics with Amazon Athena and Cube.js

Nginx is usually monitored and analyzed with commercial products or ready-made open-source alternatives such as Prometheus + Grafana. That is a good choice for monitoring or real-time analytics, but it is not very convenient for historical analysis. On any popular resource the volume of data in the nginx logs grows quickly, and for analyzing large amounts of data it makes sense to use something more specialized.

In this article I will show how to use Athena to analyze logs, taking Nginx as an example, and how to assemble an analytical dashboard from that data using the open-source framework Cube.js. Here is the architecture of the complete solution:

[Figure: architecture of the complete solution]

TL;DR
Link to the finished dashboard.

We use Fluentd to collect the logs, AWS Kinesis Data Firehose and AWS Glue to process them, and AWS S3 to store them. With this bundle you can store not only nginx logs but also other events and the logs of other services. You can swap some of the parts for equivalents from your own stack: for example, you can write logs to Kinesis directly from nginx, bypassing Fluentd, or use Logstash for this.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to change the Nginx configuration so that it writes the logs as JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
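
As a rough illustration of why the JSON format is easier to work with, the sketch below parses one such line in JavaScript. The sample line is invented for the example (it is not taken from a real log), and parseLogLine is a hypothetical helper name; only the field names come from the log_format above.

```javascript
// Parse one line written by the json_combined log format above.
// The sample line is made up for illustration; field names follow the log_format.
const sampleLine =
  '{ "created_at": "1554803897.123", "remote_addr": "1.1.1.1", "remote_user": "", ' +
  '"request": "GET /sign-up HTTP/2.0", "status": 200, "bytes_sent": 9168, ' +
  '"request_length": 42, "request_time": 0.005, ' +
  '"http_referrer": "https://example.com/sign-in", ' +
  '"http_x_forwarded_for": "", "http_user_agent": "Mozilla/5.0" }';

function parseLogLine(line) {
  const record = JSON.parse(line);
  // $msec is seconds since the epoch with millisecond resolution (a string here).
  const createdAtMs = Math.round(parseFloat(record.created_at) * 1000);
  // $request has the form "<method> <path> <protocol>".
  const [method, path, protocol] = record.request.split(' ');
  return { ...record, createdAt: new Date(createdAtMs), method, path, protocol };
}

const parsed = parseLogLine(sampleLine);
console.log(parsed.createdAt.toISOString(), parsed.method, parsed.path, parsed.status);
```

No regular expressions are needed: every field is already named and typed, which is what the later steps of the pipeline rely on.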

S3 for storage

We will store the logs in S3. This lets you store and analyze the logs in one place, because Athena can work with data in S3 directly. Later in the article I will explain how to add and process the logs correctly, but first we need a clean bucket in S3 in which nothing else will be stored. It is worth deciding in advance which region you will create the bucket in, because Athena is not available in all regions.

Creating a schema in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create a table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the chosen format, split into directories of the form YYYY/MM/DD/HH. This will come in handy when reading the data. You could, of course, write to S3 directly from Fluentd, but then you would have to write JSON, which is inefficient because of the large file sizes. In addition, when using PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and choose "Direct PUT" in the "delivery" field:

[Screenshot: creating a Kinesis Firehose delivery stream with "Direct PUT" selected]
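
To make the YYYY/MM/DD/HH directory layout concrete, here is a small sketch that computes the prefix for a given moment. It assumes the directories are named after the UTC time of delivery; hourPrefix is a hypothetical helper, not part of any AWS SDK.

```javascript
// Build the YYYY/MM/DD/HH key prefix for a given moment.
// Assumption: the prefix is derived from the UTC delivery time.
function hourPrefix(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
  ].join('/');
}

console.log(hourPrefix(new Date('2019-04-08T06:30:00Z'))); // 2019/04/08/06
```

Each of the four path segments later becomes a partition column, which is why this layout is useful when reading the data.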

On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the record format. According to some research by Owen O'Malley, it is the optimal format for PrestoDB and Athena. We use the table created above as the schema. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. But if you specify a different S3 location, you will not be able to read these records from this table.

[Screenshot: record format conversion settings in Kinesis Firehose]

We choose S3 as the storage and the bucket created earlier. AWS Glue Crawler, which I will talk about a little later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.

[Screenshot: S3 destination settings in Kinesis Firehose]

The remaining options can be changed depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses native compression by default.

Fluentd

Now that storing and receiving logs is configured, we need to configure sending. We will use Fluentd, because I love Ruby, but you can use Logstash or send the logs to Kinesis directly. A Fluentd server can be started in several ways; I will describe Docker because it is simple and convenient.

First, we need a fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, Docker Hub has a detailed guide, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf

This configuration uses the path /fluentd/log to buffer logs before sending. You can do without it, but then on a restart you may lose everything that was so painstakingly buffered. You can also use any port; 24224 is the default Fluentd port.

Now that Fluentd is running, we can send Nginx logs to it. We usually run Nginx in a Docker container, in which case Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use log files: Fluentd has a file tail plugin.

Let's add parsing of the log format configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And sending the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR-AWS-SEC-KEY>
</match>

Athena

If you have configured everything correctly, then after a while (by default, Kinesis writes the received data once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data has been written to S3, as well as errors. Do not forget to give the Kinesis role write access to the S3 bucket. If Kinesis cannot parse something, it will add the errors to the same bucket.

Now you can look at the data in Athena. Let's find the latest requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records for each query

Now our logs are processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even arranged them into directories for each hour. However, as long as the table is not partitioned, Athena will load the data for the entire period on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data grows constantly, slowing down queries;
  • Athena is billed by the volume of data scanned, with a minimum of 10 MB per query.
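
The billing point can be made concrete with a rough estimate. The sketch below assumes a price of 5 USD per terabyte scanned, which is an assumption for illustration only (check the current price for your region); the 10 MB minimum comes from the list above. The scan sizes and the once-a-minute schedule are hypothetical.

```javascript
// Assumption: 5 USD per terabyte scanned; check the current price for your region.
const PRICE_PER_TB_USD = 5;
const MIN_BILLED_MB = 10; // per-query minimum mentioned in the text

function estimateQueryCostUsd(scannedMb) {
  // Anything below the minimum is billed as if the minimum had been scanned.
  const billedMb = Math.max(scannedMb, MIN_BILLED_MB);
  return (billedMb / (1024 * 1024)) * PRICE_PER_TB_USD;
}

// The same hypothetical query run once a minute for a 30-day month.
const runsPerMonth = 60 * 24 * 30;
for (const scannedMb of [5000, 50, 1]) {
  const monthly = estimateQueryCostUsd(scannedMb) * runsPerMonth;
  console.log(`${scannedMb} MB scanned per query: about ${monthly.toFixed(2)} USD per month`);
}
```

The cost scales linearly with the data scanned until the per-query minimum is reached, so the goal is to make each query touch as few files as possible.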

To fix this, we use AWS Glue Crawler, which will crawl the data in S3 and write the partition information to the Glue Metastore. This will let us use partitions as a filter when querying Athena, so that it scans only the directories specified in the query.

Setting up Amazon Glue Crawler

Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it will create tables in the specified database with names matching the bucket names. If you plan to use this data regularly, be sure to configure the Crawler's launch schedule to suit your needs. We use one Crawler for all tables, and it runs every hour.

Partitioned tables

After the crawler's first run, tables for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query will select all records received between 6 a.m. and 7 a.m. on April 8, 2019. But how much more efficient is this than simply reading from a non-partitioned table? Let's find out by selecting the same records, filtering them by timestamp:

[Screenshot: Athena query filtering by timestamp]

3.59 seconds and 244.34 megabytes of data on a dataset with only a week of logs. Let's try filtering by partition:

[Screenshot: Athena query filtering by partition]

A little faster, but most importantly, only 1.23 megabytes of data! It would be much cheaper if not for the minimum of 10 megabytes per query in the pricing. But it is still much better, and on large datasets the difference will be far more impressive.

Building a dashboard with Cube.js

To assemble the dashboard, we use the Cube.js analytical framework. It has quite a lot of features, but we are interested in two: the ability to use partition filters automatically and data pre-aggregation. It uses a data schema, written in JavaScript, to generate SQL and execute database queries. We only need to indicate how to use the partition filter in the data schema.

Let's create a new Cube.js application. Since we are already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend in Heroku or Docker. The documentation describes other hosting methods.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in Cube.js. The generator will create a .env file in which you can specify your keys for Athena.

Now we need a data schema, in which we describe exactly how our logs are stored. There you can also specify how to calculate metrics for the dashboards.

In the schema directory, create a file Logs.js. Here is an example data model for nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate a SQL query with a partition filter.
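
To see what the partition filter expands to, the sketch below calls a copy of partitionFilter from the schema with two example bounds. The bounds are written as quoted literals purely for readability; how Cube.js actually passes them into the callback is up to the framework, so treat the arguments here as an assumption.

```javascript
// Copy of partitionFilter from the schema, so this snippet runs on its own.
const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `;

// Assumption: the bounds are shown as quoted literals for readability;
// Cube.js itself decides how they are passed into the callback.
const rendered = partitionFilter("'2019-04-01T00:00:00.000Z'", "'2019-04-07T23:59:59.999Z'");
console.log(rendered.trim());
```

The year, month and day partition columns are concatenated into a date and compared with the requested range, so Athena only has to read the directories for the days the dashboard asks about.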

We also define the metrics and dimensions that we want to show on the dashboard and specify pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will update the data automatically as it arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this information to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that the data must be pre-aggregated for all the metrics used, and that it is partitioned by month. Partitioning pre-aggregations can significantly speed up data collection and updating.

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js provides only the data, so we will need a visualization library; I like recharts, but you can use any.

The Cube.js server accepts the query in JSON format, which specifies the required metrics. For example, to calculate how many errors Nginx returned per day, you need to send the following query:
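
The CASE expression in refreshKey can be hard to read at first. Below is a JavaScript rendition of the same logic, based on my reading of it: the key changes once an hour for a partition that ended less than three days ago and stays NULL otherwise, so presumably only recent partitions are rebuilt. The function name and sample dates are hypothetical.

```javascript
// JavaScript rendition of the CASE expression in refreshKey.
// `to` is the end of a pre-aggregation partition, `now` is the current time.
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;

function refreshKey(to, now) {
  if (to.getTime() + THREE_DAYS_MS > now.getTime()) {
    // date_trunc('hour', now()): the value changes once per hour.
    const truncated = new Date(now.getTime());
    truncated.setUTCMinutes(0, 0, 0);
    return truncated.toISOString();
  }
  return null; // CASE without ELSE yields NULL: the key never changes
}

const now = new Date('2019-04-09T10:42:00Z');
console.log(refreshKey(new Date('2019-04-30T23:59:59Z'), now)); // current month
console.log(refreshKey(new Date('2019-03-31T23:59:59Z'), now)); // past month
```

With monthly partitions this means that only the current month, and the previous one for the first three days of a new month, would be queried again in Athena.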

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
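
For illustration, this is roughly how such a query could be turned into a request URL without the client library. The sketch assumes the backend listens on localhost:4000 (the address used in the dashboard code in this article) and that the REST API accepts the URL-encoded query in a query parameter of its load endpoint; buildLoadUrl is a hypothetical helper.

```javascript
// Build the URL for the Cube.js REST API "load" endpoint.
// Assumptions: the backend runs locally on port 4000 and the query is
// passed URL-encoded in the `query` parameter.
const API_URL = 'http://localhost:4000/cubejs-api/v1';

const query = {
  measures: ['Logs.errorCount'],
  timeDimensions: [{
    dimension: 'Logs.createdAt',
    dateRange: ['2019-01-01', '2019-01-07'],
    granularity: 'day',
  }],
};

function buildLoadUrl(apiUrl, q) {
  return `${apiUrl}/load?query=${encodeURIComponent(JSON.stringify(q))}`;
}

const url = buildLoadUrl(API_URL, query);
console.log(url);

// Sending it would look roughly like this (not executed here; it needs a
// running backend and a valid token in the Authorization header):
// fetch(url, { headers: { Authorization: 'YOUR-CUBEJS-API-TOKEN' } })
//   .then((res) => res.json())
//   .then((body) => console.log(body.data));
```

In practice the client libraries build and send this request for you, so the sketch is mainly useful for debugging with curl or a browser.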

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import the cubejs and QueryRenderer components to fetch the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard sources are available on CodeSandbox.

Source: www.habr.com
