Nginx log analytics with Amazon Athena and Cube.js

Nginx is usually monitored and analyzed with commercial products or ready-made open-source alternatives such as Prometheus + Grafana. That is a good option for monitoring or real-time analytics, but it is not very convenient for historical analysis. On any popular resource the volume of data from Nginx logs grows quickly, and to analyze a large amount of data it makes sense to use something more specialized.

In this article I'll show how you can use Athena to analyze logs, taking Nginx as an example, and how to assemble an analytics dashboard from this data with the open-source Cube.js framework. Here is the complete architecture of the solution:

[Figure: architecture of the solution]

TL;DR
Link to the finished dashboard.

To collect the data we use Fluentd, for processing AWS Kinesis Data Firehose and AWS Glue, and for storage AWS S3. With this stack you can store not only Nginx logs but also other events, as well as the logs of other services. You can replace some parts with equivalents from your own stack: for example, you can write logs to Kinesis directly from Nginx, bypassing Fluentd, or use Logstash instead.

Collecting Nginx logs

By default, Nginx logs look like this:

4/9/2019 12:58:17 PM1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
4/9/2019 12:58:17 PM1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to adjust the Nginx configuration so that it writes logs in JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
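To illustrate why the JSON format is convenient downstream, here is a minimal sketch of reading one such line in JavaScript. The sample line and the `parseAccessLogLine` helper are made up for illustration; only the field names come from the `log_format` above. Note that `created_at` arrives as a quoted string of seconds with a millisecond fraction, so it has to be converted explicitly.

```javascript
// Hypothetical sample line in the json_combined format defined above.
const sampleLine =
  '{ "created_at": "1554803897.123", "remote_addr": "1.1.1.1", "remote_user": "", ' +
  '"request": "GET /sign-up HTTP/2.0", "status": 200, "bytes_sent": 9168, ' +
  '"request_length": 120, "request_time": 0.005, ' +
  '"http_referrer": "https://example.com/sign-in", "http_x_forwarded_for": "", ' +
  '"http_user_agent": "Mozilla/5.0" }';

// Parse one log line and derive a few convenient fields (illustrative helper).
function parseAccessLogLine(line) {
  const record = JSON.parse(line);
  // $request looks like "GET /sign-up HTTP/2.0".
  const [method, path, protocol] = record.request.split(" ");
  return {
    ...record,
    // $msec is seconds since the epoch with millisecond resolution.
    createdAt: new Date(Math.round(Number(record.created_at) * 1000)),
    method,
    path,
    protocol,
    // Same threshold as the isError dimension used later in the Cube.js schema.
    isError: record.status >= 400,
  };
}

const parsed = parseAccessLogLine(sampleLine);
console.log(parsed.createdAt.toISOString(), parsed.method, parsed.path, parsed.status);
```

No regular expressions are needed, which is the main practical gain over parsing the default format.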

S3 for storage

To store the logs we will use S3. This lets you store and analyze logs in one place, since Athena can work with data in S3 directly. Later in the article I'll explain how to lay out and process the logs correctly, but first we need a clean S3 bucket in which nothing else will be stored. It is worth deciding in advance which region you create the bucket in, because Athena is not available in all regions.

Creating a schema in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create a table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the selected format, splitting it into directories in the YYYY/MM/DD/HH format. This comes in handy when reading the data. You can, of course, write directly to S3 from Fluentd, but in that case you have to write JSON, which is inefficient because of the large file size. In addition, JSON is the slowest data format when using PrestoDB or Athena. So open the Kinesis Firehose console, click "Create delivery stream", and select "Direct PUT" in the "delivery" field:

[Screenshot: Kinesis Firehose console, creating a delivery stream]

On the next tab, select "Record format conversion": "Enabled" and choose "Apache ORC" as the record format. According to some benchmarks by Owen O'Malley, this is the optimal format for PrestoDB and Athena. As the schema we specify the table we created above. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. But if you specify a different S3 location, you will not be able to read these records through this table.


We select S3 for storage and the bucket we created earlier. AWS Glue Crawler, which I'll talk about a little later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.


The remaining options can be changed depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses its own native compression by default.

Fluentd

Now that we have configured storing and receiving logs, we need to configure sending. We will use Fluentd, because I love Ruby, but you can use Logstash or send logs to Kinesis directly. The Fluentd server can be launched in several ways; I'll cover Docker because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, Docker Hub has a detailed guide, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to cache logs before sending. You can do without it, but then on restart you can lose everything that was cached through back-breaking labor. You can also use any port; 24224 is the default Fluentd port.

Now that Fluentd is running, we can send Nginx logs to it. We usually run Nginx in a Docker container, in which case Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use log files; Fluentd has a file tail plugin.

Let's add the parsing of the log format configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And sending logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

If you have configured everything correctly, then after a while (by default, Kinesis writes the received data once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data is written to S3, as well as errors. Don't forget to grant the Kinesis role write access to the S3 bucket. If Kinesis could not parse something, it adds the errors to the same bucket.

Now you can look at the data in Athena. Let's find the latest requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records for every query

Our logs are now processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose even arranged them into directories for each hour. However, as long as the table is not partitioned, Athena will load the data for all time on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data grows constantly, slowing down queries;
  • Athena is billed based on the volume of data scanned, with a minimum of 10 MB per query.

To fix this, we use AWS Glue Crawler, which crawls the data in S3 and writes the partition information to the Glue Metastore. This lets us use partitions as a filter when querying Athena, so that it scans only the directories specified in the query.
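As a rough sketch of what "using partitions as a filter" means in practice, the helper below maps a timestamp to the four values of the hourly YYYY/MM/DD/HH layout and builds the matching predicate. It assumes the crawler names the partition columns `partition_0` to `partition_3` (the names that appear in the query later in this article) and that the directory names are in UTC; the function names are made up for illustration.

```javascript
// Zero-pad a number to two digits, e.g. 4 -> "04".
const pad2 = (n) => String(n).padStart(2, "0");

// Map a timestamp to the partition values of the hourly directory layout
// (assumption: directory names are derived from UTC time).
function partitionsFor(date) {
  return [
    String(date.getUTCFullYear()),
    pad2(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(date.getUTCDate()),
    pad2(date.getUTCHours()),
  ];
}

// Build a WHERE predicate that restricts a query to a single hourly partition.
function partitionPredicate(date) {
  return partitionsFor(date)
    .map((value, i) => `partition_${i} = '${value}'`)
    .join(" AND ");
}

console.log(partitionPredicate(new Date("2019-04-08T06:30:00Z")));
```

Keep in mind that the directory reflects when Firehose received the record, which may differ slightly from the request time stored in `created_at`.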

Setting up Amazon Glue Crawler

Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it creates tables in the specified database with names matching the bucket names. If you plan to use this data regularly, be sure to configure the Crawler's launch schedule to suit your needs. We use one Crawler for all tables, and it runs every hour.

Partitioned tables

After the first run of the crawler, a table for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query selects all records received between 6 a.m. and 7 a.m. on April 8, 2019. But how much more efficient is this than simply reading from a non-partitioned table? Let's find out by selecting the same records, filtering them by timestamp:

[Screenshot: Athena query filtered by timestamp]

3.59 seconds and 244.34 megabytes of data on a dataset with only a week of logs. Let's try filtering by partition:

[Screenshot: Athena query filtered by partition]

A little faster, but most importantly, only 1.23 megabytes of data! It would be much cheaper if not for the 10 megabyte minimum per query in the pricing. But it is still much better, and on large datasets the difference will be far more impressive.
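The effect of the 10 MB minimum on these two measurements can be worked out with a few lines of arithmetic. This is only a sketch using the numbers quoted above; it deliberately leaves out the actual price per unit of data, which depends on the region and is not given in this article.

```javascript
// Minimum billed volume per query, as stated above.
const MIN_BILLED_MB = 10;

// Volume a single query is billed for, given the volume it scanned.
const billedMb = (scannedMb) => Math.max(scannedMb, MIN_BILLED_MB);

const fullScanMb = 244.34;  // filter by timestamp, no partitions
const partitionedMb = 1.23; // filter by partition

// How much less data the partitioned query scans.
const scanRatio = fullScanMb / partitionedMb;
// How much cheaper it actually is once the minimum is applied.
const costRatio = billedMb(fullScanMb) / billedMb(partitionedMb);

console.log(`scanned: ~${scanRatio.toFixed(0)}x less, billed: ~${costRatio.toFixed(1)}x less`);
```

So on this one-week dataset the partitioned query scans roughly two hundred times less data but costs only about 24 times less; the gap between the two ratios closes as the dataset grows past the point where a single partition exceeds 10 MB.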

Building a dashboard with Cube.js

To assemble the dashboard, we use the Cube.js analytics framework. It has quite a lot of features, but we are interested in two: the ability to apply partition filters automatically and data pre-aggregation. It uses a data schema, written in JavaScript, to generate SQL and run the database queries. All we need to do is indicate how to use the partition filter in the data schema.

Let's create a new Cube.js application. Since we already use the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting options.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in Cube.js. The generator creates a .env file in which you can specify your keys for Athena.

Now we need a data schema, in which we specify exactly how our logs are stored. There you can also specify how to calculate the metrics for the dashboards.

In the schema directory, create the file Logs.js. Here is an example data model for Nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate an SQL query with a partition filter.

We also define the metrics and parameters that we want to display on the dashboard and specify pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will automatically update them as new data arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this information to the data schema file:
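To make the generated filter more tangible, the sketch below calls the same `partitionFilter` function by hand and prints the resulting SQL. This is a simplification: here the date range is inlined as quoted literals chosen for illustration, whereas Cube.js supplies its own parameter expressions to the callback when it builds the real query.

```javascript
// Same template as in the schema above.
const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `;

// Hypothetical date range, inlined as SQL string literals for illustration only.
const rendered = partitionFilter(
  "'2019-04-01T00:00:00.000Z'",
  "'2019-04-07T23:59:59.999Z'"
)
  .replace(/\s+/g, " ") // collapse the template's line breaks and indentation
  .trim();

console.log(`select * from part_demo_kinesis_bucket WHERE ${rendered}`);
```

The year, month and day partitions are concatenated into a single date, so a dashboard query for one week only touches that week's directories.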

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that the data must be pre-aggregated for all the metrics used, with partitioning by month. Partitioning the pre-aggregations can significantly speed up collecting and updating the data.

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js provides only the data, so we need a visualization library; I like recharts, but you can use any.

The Cube.js server accepts requests in JSON format, which specify the required metrics. For example, to calculate how many errors Nginx returned per day, you need to send the following request:
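The refreshKey in the block above is easier to follow when the same decision is spelled out in plain JavaScript. This is only a model of the SQL expression, not how Cube.js evaluates it: the function name and the example dates are made up for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Mirrors the CASE expression: while the end of a monthly partition is less
// than three days in the past (or still in the future), the key is the current
// hour, so it changes hourly. For older partitions the key is always null.
function refreshKey(partitionEnd, now) {
  if (partitionEnd.getTime() + 3 * DAY_MS > now.getTime()) {
    const hour = new Date(now.getTime());
    hour.setUTCMinutes(0, 0, 0); // equivalent of date_trunc('hour', now())
    return hour.toISOString();
  }
  return null;
}

const now = new Date("2019-04-09T09:58:17Z");
console.log(refreshKey(new Date("2019-04-30T23:59:59.999Z"), now)); // current month
console.log(refreshKey(new Date("2019-03-31T23:59:59.999Z"), now)); // closed month
```

A pre-aggregation is rebuilt when its key changes, so recent partitions are refreshed about once an hour while closed months keep a constant key and are left alone, which avoids rescanning old data in Athena.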

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
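For reference, this is roughly how such a request could be sent without any client library. The sketch assumes the REST endpoint `/cubejs-api/v1/load`, which takes the JSON query URL-encoded in the `query` parameter and the API token in the `Authorization` header; the helper names are hypothetical, and `fetch` is assumed to be available globally (browsers and recent Node.js versions).

```javascript
const query = {
  measures: ["Logs.errorCount"],
  timeDimensions: [
    {
      dimension: "Logs.createdAt",
      dateRange: ["2019-01-01", "2019-01-07"],
      granularity: "day",
    },
  ],
};

// Build the request URL: the JSON query travels in the "query" parameter.
function buildLoadUrl(apiUrl, cubeQuery) {
  const url = new URL(`${apiUrl}/load`);
  url.searchParams.set("query", JSON.stringify(cubeQuery));
  return url.toString();
}

// Send the query and return the rows of the result.
// Not handled here: responses asking the client to retry while a long query runs.
async function loadErrorCounts(apiUrl, token) {
  const response = await fetch(buildLoadUrl(apiUrl, query), {
    headers: { Authorization: token },
  });
  if (!response.ok) {
    throw new Error(`Cube.js API returned ${response.status}`);
  }
  const { data } = await response.json();
  return data;
}

console.log(buildLoadUrl("http://localhost:4000/cubejs-api/v1", query));
```

In the dashboard itself the client library takes care of all this, so the raw request is mostly useful for debugging.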

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import cubejs and the QueryRenderer component to load the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart width={600} height={300} data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard source code is available on CodeSandbox.

Source: www.habr.com
