Nginx log analytics with Amazon Athena and Cube.js

Typically, commercial products or ready-made open-source alternatives, such as Prometheus + Grafana, are used to monitor and analyze Nginx. This is a good option for monitoring or real-time analytics, but it is not well suited to historical analysis. On any popular resource, the volume of data from Nginx logs grows quickly, and to analyze a large amount of data it makes sense to use something more specialized.

In this article I will show how to use Athena to analyze logs, taking Nginx as an example, and how to assemble an analytical dashboard from this data using the open-source Cube.js framework. The overall architecture of the solution is outlined below.


TL;DR
Link to the finished dashboard.

To collect the data we use Fluentd; for processing, AWS Kinesis Data Firehose and AWS Glue; for storage, AWS S3. With this bundle you can store not only Nginx logs but also other events, as well as logs from other services. You can replace some parts with equivalents from your own stack: for example, you can write logs to Kinesis directly from Nginx, bypassing Fluentd, or use Logstash instead.

Collecting Nginx logs

By default, Nginx logs look like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to adjust the Nginx configuration so that it writes logs in JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
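To illustrate why the JSON format is convenient, here is a minimal sketch of consuming one such line in JavaScript. The sample values are made up, and `parseLogLine` is a hypothetical helper, not part of Nginx or Fluentd. It assumes `created_at` arrives as a quoted string holding epoch seconds with milliseconds, as the `log_format` above emits it.

```javascript
// Sample line in the json_combined format defined above (values are made up).
const line = '{ "created_at": "1554803897.123", "remote_addr": "1.1.1.1", ' +
  '"remote_user": "", "request": "GET /sign-up HTTP/2.0", "status": 200, ' +
  '"bytes_sent": 9168, "request_length": 120, "request_time": 0.004, ' +
  '"http_referrer": "https://example.com/sign-in", ' +
  '"http_x_forwarded_for": "", "http_user_agent": "Mozilla/5.0" }';

// parseLogLine is a hypothetical helper used only for illustration.
function parseLogLine(text) {
  const record = JSON.parse(text);
  // $msec is seconds since the epoch with millisecond resolution,
  // written here as a quoted string, so convert it explicitly.
  const createdAt = new Date(Math.round(Number(record.created_at) * 1000));
  // "$request" has the form "<method> <path> <protocol>".
  const [method, path, protocol] = record.request.split(' ');
  return { ...record, createdAt, method, path, protocol };
}

const parsed = parseLogLine(line);
console.log(parsed.createdAt.toISOString(), parsed.method, parsed.path, parsed.status);
```

No regular expressions are needed: every field is already typed or trivially convertible, which is what the later stages of the pipeline rely on.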

S3 for storage

To store the logs we will use S3. This lets you store and analyze logs in one place, since Athena can work with data in S3 directly. Later in the article I will explain how to add and process logs correctly, but first we need a clean S3 bucket in which nothing else will be stored. It is worth deciding in advance which region you will create the bucket in, because Athena is not available in all regions.

Creating a schema in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create the table:

SQL for creating the table

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose writes the data received from Nginx to S3 in the selected format, splitting it into directories in YYYY/MM/DD/HH format. This comes in handy when reading the data. You can, of course, write directly to S3 from Fluentd, but in that case you would have to write JSON, which is inefficient because of the large file sizes. In addition, when using PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and select "direct PUT" in the "delivery" field.
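As a rough illustration of the YYYY/MM/DD/HH layout mentioned above, the following sketch computes the directory prefix for a given moment. `firehosePrefix` is a hypothetical helper; Firehose builds the real prefix on its side, and the sketch assumes the prefix is derived from UTC time.

```javascript
// firehosePrefix is a hypothetical helper that mirrors the YYYY/MM/DD/HH
// directory layout; it assumes UTC, which is an assumption of this sketch.
function firehosePrefix(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
  ].join('/') + '/';
}

console.log(firehosePrefix(new Date('2019-04-08T06:15:00Z'))); // 2019/04/08/06/
```

Every object written during that hour lands under the same prefix, which is what later makes hourly partitions possible.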


On the next tab, select "Record format conversion" - "Enabled" and choose "Apache ORC" as the record format. According to some research by Owen O'Malley, this is the optimal format for PrestoDB and Athena. We use the table created above as the schema. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. But if you specify a different S3 location, you will not be able to read these records from this table.


We select S3 for storage and the bucket we created earlier. AWS Glue Crawler, which I will talk about a little later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.


The remaining options can be changed depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses native compression by default.

Fluentd

Now that storing and receiving logs is configured, we need to configure sending. We will use Fluentd because I like Ruby, but you can use Logstash or send logs to Kinesis directly. The Fluentd server can be started in several ways; I will describe Docker because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, Docker Hub has a detailed guide, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  fluentd -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to cache logs before sending. You can do without it, but then a restart may cost you everything that was painstakingly cached. You can also use any port; 24224 is the default Fluentd port.

Now that Fluentd is running, we can send Nginx logs to it. We usually run Nginx in a Docker container, in which case Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use log files; Fluentd has a file tail plugin.

Let's add parsing of the log format configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And send the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

If you have configured everything correctly, then after a while (by default, Kinesis writes the received data once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data has been written to S3, as well as any errors. Do not forget to give the Kinesis role write access to the S3 bucket. If Kinesis fails to parse something, it writes the errors to the same bucket.

Now you can view the data in Athena. Let's find the most recent requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records on every query

Now our logs are processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into directories for each hour. However, as long as the table is not partitioned, Athena will load the data for the entire period on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data keeps growing, which slows down queries;
  • Athena is billed based on the volume of data scanned, with a minimum of 10 MB per query.

To fix this, we use AWS Glue Crawler, which crawls the data in S3 and writes the partition information to the Glue Metastore. This lets us use partitions as a filter when querying Athena, so that it scans only the directories specified in the query.
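A minimal sketch of what "using partitions as a filter" means in practice: a predicate that restricts a query to one hourly directory. It assumes the partition columns are named `partition_0` through `partition_3` (year, month, day, hour), as in the query shown later in this article; `hourPartitionFilter` is a hypothetical helper.

```javascript
// Builds a WHERE fragment that limits a query to one hourly directory.
// Assumes partition columns partition_0..partition_3 hold year, month,
// day and hour as zero-padded strings; adjust if your table differs.
function hourPartitionFilter(date) {
  const pad = (n) => String(n).padStart(2, '0');
  const values = [
    String(date.getUTCFullYear()),
    pad(date.getUTCMonth() + 1),
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
  ];
  return values.map((v, i) => `partition_${i} = '${v}'`).join(' AND ');
}

console.log(hourPartitionFilter(new Date('2019-04-08T06:00:00Z')));
```

Because the predicate refers only to partition columns, the engine can decide which directories to read before opening any file.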

Setting up AWS Glue Crawler

AWS Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it creates tables in the specified database with names that match the bucket names. If you plan to use this data regularly, be sure to configure the crawler's launch schedule to suit your needs. We use one crawler for all tables, and it runs every hour.

Partitioned tables

After the first run of the crawler, a table for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query selects all records received between 6 a.m. and 7 a.m. on April 8, 2019. But how much more efficient is this than simply reading from a non-partitioned table? Let's find out by selecting the same records, filtering them by timestamp instead.


3.59 seconds and 244.34 megabytes of data scanned, on a dataset with only a week of logs. Now let's try filtering by partition.


A little faster, but more importantly, only 1.23 megabytes of data! It would be much cheaper still if not for the 10-megabyte minimum per query in the pricing. But it is still much better, and on large datasets the difference will be far more impressive.
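The effect of the 10 MB minimum on these two measurements can be worked out with a short calculation. This is only a sketch: it treats 1 MB as 1024 * 1024 bytes (an assumption that does not affect the ratio) and leaves out the actual price per unit of data, since the ratio between the two queries does not depend on it.

```javascript
const MB = 1024 * 1024;
const MIN_BILLED_BYTES = 10 * MB; // 10 MB minimum per query, as stated above

// Returns the number of bytes a query is billed for under that minimum.
function billedBytes(scannedBytes) {
  return Math.max(scannedBytes, MIN_BILLED_BYTES);
}

const fullScan = billedBytes(244.34 * MB);   // filtering by timestamp
const partitioned = billedBytes(1.23 * MB);  // filtering by partition

console.log((fullScan / MB).toFixed(2));          // billed MB without partitions
console.log((partitioned / MB).toFixed(2));       // billed MB with partitions
console.log((fullScan / partitioned).toFixed(1)); // cost ratio between the two
```

The scanned volume drops by a factor of about 200, but the bill for this particular query drops by a factor of about 24, because the partitioned query is rounded up to the minimum.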

Building a dashboard with Cube.js

To assemble the dashboard, we use the Cube.js analytical framework. It has quite a few features, but we are interested in two: the ability to apply partition filters automatically and data pre-aggregation. It uses a data schema, written in JavaScript, to generate SQL and execute database queries. We only need to indicate in the data schema how to use the partition filter.

Let's create a new Cube.js application. Since we are already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting options.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in Cube.js. The generator creates an .env file in which you can specify your Athena keys.

Now we need a data schema, in which we describe exactly how our logs are stored. There you can also specify how to calculate the metrics for dashboards.

In the schema directory, create a file named Logs.js. Here is an example data model for Nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate an SQL query with a partition filter.
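To see what the partition filter expands to, the `partitionFilter` callback from the model can be called on its own. This is only an illustration: in a real query Cube.js supplies the two arguments itself, and the quoted literals used here are stand-ins chosen for readability, not what Cube.js actually passes.

```javascript
// Same callback as in the schema above.
const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `;

// Stand-in arguments (an assumption of this sketch): quoted ISO 8601
// timestamps for the start and end of the requested date range.
const sql = partitionFilter(`'2019-04-01T00:00:00.000Z'`, `'2019-04-07T23:59:59.999Z'`);
console.log(sql.trim());
```

The generated condition keeps only the daily partitions whose date falls between the start and the end of the requested range, so a dashboard query for one week reads one week of directories.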

We also define the metrics and parameters we want to display on the dashboard and specify pre-aggregations. Cube.js creates additional tables with pre-aggregated data and automatically updates the data as it arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this information to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that data must be pre-aggregated for all the metrics used, and that partitioning by month should be applied. Partitioning pre-aggregations can significantly speed up data collection and updating.

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js only provides data, so we need a visualization library; I like recharts, but you can use any.

The Cube.js server accepts a request in JSON format that specifies the required metrics. For example, to calculate how many errors Nginx returned per day, you need to send the following request:
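The `refreshKey` expression in the pre-aggregation is easier to follow when its logic is replayed in plain JavaScript. The sketch below is an emulation under stated assumptions, not Cube.js code: `refreshKeyFor` is a hypothetical function, and `partitionEnd` stands for the `to` boundary of a monthly partition.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Mirrors the CASE expression in refreshKey: returns the current hour for
// partitions that ended less than 3 days ago (or have not ended yet),
// and null for older ones.
function refreshKeyFor(partitionEnd, now) {
  if (partitionEnd.getTime() + 3 * DAY_MS > now.getTime()) {
    const hour = new Date(now);
    hour.setUTCMinutes(0, 0, 0); // same idea as date_trunc('hour', now())
    return hour.toISOString();
  }
  return null;
}

const now = new Date('2019-04-10T12:34:56Z');
console.log(refreshKeyFor(new Date('2019-04-30T23:59:59Z'), now)); // current month
console.log(refreshKeyFor(new Date('2019-03-31T23:59:59Z'), now)); // closed month
```

For the current month the key changes every hour, so that partition is rebuilt hourly; for months that closed more than three days ago the key stays constant, so they are not rebuilt and cost nothing further in Athena.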

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
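For reference, here is a sketch of how such a request could be sent without a client library. It assumes the `load` endpoint of the Cube.js REST API under the same local `apiUrl` used in the dashboard code in this article, with the JSON query passed in a `query` parameter; the token is a placeholder, and the `fetch` call is left commented out because it needs a running server.

```javascript
const query = {
  measures: ['Logs.errorCount'],
  timeDimensions: [{
    dimension: 'Logs.createdAt',
    dateRange: ['2019-01-01', '2019-01-07'],
    granularity: 'day',
  }],
};

// Local development address, assumed to match the dashboard code.
const apiUrl = 'http://localhost:4000/cubejs-api/v1';
const url = `${apiUrl}/load?query=${encodeURIComponent(JSON.stringify(query))}`;
console.log(url);

// Sending it would look roughly like this (the token is a placeholder):
// fetch(url, { headers: { Authorization: 'YOUR-CUBEJS-API-TOKEN' } })
//   .then((res) => res.json())
//   .then((body) => console.log(body.data));
```

The client libraries installed in the next step wrap this same request and add result handling on top.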

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import the cubejs and QueryRenderer components to load the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard source code is available on CodeSandbox.

Source: www.habr.com
