Nginx log analytics using Amazon Athena and Cube.js

Commercial products or ready-made open-source alternatives such as Prometheus + Grafana are usually used to monitor and analyze how Nginx is performing. This is a good option for monitoring or real-time analytics, but not very convenient for historical analysis. On any popular resource the volume of data in the nginx logs grows quickly, and to analyze large volumes of data it makes sense to use something more specialized.

In this article I'll show how to use Athena to analyze logs, taking Nginx as an example, and how to build an analytics dashboard from that data with the open-source cube.js framework. Here is the overall architecture of the solution:

[Figure: overall solution architecture]

TL;DR:
Link to the finished dashboard.

To collect the data we use Fluentd, for processing AWS Kinesis Data Firehose and AWS Glue, and for storage AWS S3. With this stack you can store not only nginx logs but also other events and the logs of other services. You can replace some components with equivalents from your own stack: for example, you can write logs to Kinesis directly from nginx, bypassing Fluentd, or use Logstash for this.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
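Lines in this format can be parsed with a regular expression. Below is a minimal sketch, assuming the standard combined format plus the trailing $http_x_forwarded_for field seen in the sample, and assuming no field contains an escaped quote (real traffic can violate this, which is one reason to prefer JSON). The sample is the first line above, starting at the client address.

```javascript
// Minimal sketch: parse one line of Nginx's "combined" format plus an optional
// trailing "$http_x_forwarded_for" field. Assumes no escaped quotes in fields.
const COMBINED_RE =
  /^(\S+) - (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+) "([^"]*)" "([^"]*)"(?: "([^"]*)")?$/;

function parseCombined(line) {
  const m = COMBINED_RE.exec(line);
  if (m === null) return null; // line does not match the expected layout
  return {
    remoteAddr: m[1],
    remoteUser: m[2],
    timeLocal: m[3],
    request: m[4],
    status: Number(m[5]),
    bodyBytesSent: Number(m[6]),
    httpReferer: m[7],
    httpUserAgent: m[8],
    httpXForwardedFor: m[9], // undefined when the field is absent
  };
}

const combinedSample =
  '1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 ' +
  '"https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"';

console.log(parseCombined(combinedSample).request); // GET /sign-up HTTP/2.0
```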

They can be parsed, but it is much easier to change the Nginx configuration so that it writes the logs as JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
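With json_combined every line is a standalone JSON document, so parsing becomes a single JSON.parse call. The sketch below uses a hypothetical line carrying the same values as the default-format sample. Note that created_at arrives as a string because $msec is quoted in log_format, while status, bytes_sent and the other unquoted variables arrive as numbers.

```javascript
// Minimal sketch: parse one line written with the json_combined format above.
// The sample line is hypothetical and shows only a subset of the fields.
function parseJsonLogLine(line) {
  const rec = JSON.parse(line);
  // "$msec" is wrapped in quotes in log_format, so created_at is a string
  // holding seconds with millisecond resolution, e.g. "1554803897.123".
  const createdAtSec = Number(rec.created_at);
  return {
    ...rec,
    created_at: createdAtSec,
    // Round to whole milliseconds to avoid floating-point noise.
    createdAtDate: new Date(Math.round(createdAtSec * 1000)),
  };
}

const jsonLine =
  '{ "created_at": "1554803897.123", "remote_addr": "1.1.1.1", ' +
  '"request": "GET /sign-up HTTP/2.0", "status": 200, "bytes_sent": 9168 }';

const parsedLine = parseJsonLogLine(jsonLine);
console.log(parsedLine.status, parsedLine.createdAtDate.toISOString());
// 200 2019-04-09T09:58:17.123Z
```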

S3 storage

We'll use S3 to store the logs. This lets you keep and analyze the logs in one place, since Athena can query data in S3 directly. Later in the article I'll explain how to add and process the logs correctly, but first we need a clean S3 bucket that will hold nothing else. Decide in advance which region to create the bucket in, because Athena is not available in every region.

Creating the table schema in the Athena console

Let's create a table in Athena for the logs. If you plan to use Kinesis Firehose, the table is needed for both writing and reading: Firehose uses its schema to convert incoming records, and Athena uses it to query them. Open the Athena console and create the table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating the Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the chosen format, splitting it into directories in the YYYY/MM/DD/HH format. This will come in handy when reading the data. You could, of course, write to S3 directly from Fluentd, but then you would have to write JSON, which is inefficient because of the large file sizes. Besides, for PrestoDB and Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and choose "Direct PUT" in the "delivery" field:

[Screenshot: creating a Kinesis Firehose delivery stream with the Direct PUT source]

On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the record format. According to research by Owen O'Malley, this is the optimal format for PrestoDB and Athena. As the schema we use the table created above. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. But if you specify a different S3 location, you won't be able to read these records through this table.

[Screenshot: record format conversion set to Apache ORC]
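Because Firehose takes only the schema from the table, the JSON keys produced by log_format need to line up with the column names; a misspelled key will most likely just leave its column empty rather than fail loudly. For example, the nginx variable is $http_referer, while the column (and the JSON key in log_format) is http_referrer. Here is a minimal sketch of a hypothetical pre-flight check, with the column list copied from the CREATE EXTERNAL TABLE statement.

```javascript
// Column names copied from the CREATE EXTERNAL TABLE statement above.
const TABLE_COLUMNS = [
  "created_at", "remote_addr", "remote_user", "request", "status",
  "bytes_sent", "request_length", "request_time", "http_referrer",
  "http_x_forwarded_for", "http_user_agent",
];

// Hypothetical pre-flight check: compare the keys of one parsed log record
// with the table's columns.
function compareWithTable(record, columns = TABLE_COLUMNS) {
  const keys = Object.keys(record);
  return {
    missingColumns: columns.filter((c) => !keys.includes(c)), // would stay empty
    unknownKeys: keys.filter((k) => !columns.includes(k)), // have no column to go to
  };
}

// A record that used nginx's own spelling "http_referer" as the key:
const misspelled = { created_at: "1554803897.123", status: 200, http_referer: "-" };
console.log(compareWithTable(misspelled).unknownKeys); // [ 'http_referer' ]
```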

As the destination, choose S3 and the bucket we created earlier. AWS Glue Crawler, which I'll cover a bit later, cannot work with prefixes in an S3 bucket, so it's important to leave the prefix empty.

[Screenshot: S3 destination settings]

The remaining options can be tuned to your load; I usually keep the defaults. Note that S3 compression is not available here, but ORC uses its own native compression by default.

Fluentd

Now that storing and receiving the logs is set up, we need to configure sending them. We'll use Fluentd, because I love Ruby, but you can use Logstash or send the logs to Kinesis directly. A Fluentd server can be launched in several ways; I'll describe Docker because it's simple and convenient.

First, we need a fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

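Now you can start the Fluentd server. If you need a more advanced configuration, the image's page on Docker Hub has a detailed guide, including how to build your own image. Note that the kinesis_firehose output used below comes from the fluent-plugin-kinesis gem, which is not part of the base fluent/fluentd image, so in practice you will need such a custom image.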

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  fluentd -c /fluentd/etc/fluent.conf

This setup mounts /fluentd/log so that logs can be buffered on disk before they are sent. You can do without it, but then a restart may lose everything buffered so far. <PATH-TO-FLUENT-CONF> is the host directory that contains fluent.conf; it is mounted as /fluentd/etc. You can use any port; 24224 is Fluentd's default.

Now that Fluentd is running, we can send the Nginx logs to it. We usually run Nginx in a Docker container, and Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="nginx.{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx some other way, you can read its log files instead: Fluentd has the file tail plugin for that.

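Let's add parsing of the JSON logs configured above to the Fluentd configuration. YOUR-NGINX-TAG must match the beginning of the tag Docker assigns; keep in mind that a pattern such as nginx.* matches nginx.web but not a bare nginx: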

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>
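Roughly, this filter turns each event from the Docker logging driver (whose records carry container_id, container_name, source and log fields) into the parsed JSON object. The sketch mimics that behaviour under my reading of the parser plugin's defaults: with reserve_data left at false only the parsed fields are kept, and with emit_invalid_record_to_error false an unparseable line is dropped instead of being routed to the @ERROR label. It is an illustration, not Fluentd's code.

```javascript
// Illustration only: approximate effect of the <filter> section above on one event.
// Returns the new record, or null when the event would be dropped.
function applyJsonParserFilter(record) {
  let parsed;
  try {
    parsed = JSON.parse(record.log); // key_name log
  } catch (err) {
    return null; // invalid JSON: dropped (emit_invalid_record_to_error false)
  }
  // This sketch also rejects JSON values that are not objects.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return null;
  }
  return parsed; // reserve_data false: Docker metadata fields are not kept
}

const dockerEvent = {
  container_id: "abc123",
  container_name: "/nginx",
  source: "stdout",
  log: '{ "remote_addr": "1.1.1.1", "status": 200 }',
};
console.log(applyJsonParserFilter(dockerEvent)); // { remote_addr: '1.1.1.1', status: 200 }
```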

Then send the logs to Kinesis using the kinesis firehose plugin:

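<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR-AWS-SEC-KEY>
    # Buffer on disk in the /fluentd/log volume so a restart does not drop data
    <buffer>
      @type file
      path /fluentd/log/kinesis-buffer
    </buffer>
</match>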

Athena

If everything is configured correctly, then after a while (by default, Kinesis writes the data it receives once every 10 minutes) you should see log files in S3. The "Monitoring" menu of Kinesis Firehose shows how much data has been written to S3, as well as any errors. Don't forget to give the Kinesis role write access to the S3 bucket. If Kinesis fails to parse something, it writes the errors to the same bucket.

Now you can look at the data in Athena. Let's find the latest requests that ended with a server error (status 5xx):

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records on every query

Our logs are now processed and stored in S3 as ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into hourly directories. However, as long as the table is not partitioned, Athena will read the data for the entire period on every query, with rare exceptions. This is a big problem for two reasons:

  • the volume of data keeps growing, which slows queries down;
  • Athena is billed by the volume of data scanned, with a minimum of 10 MB per query.

To fix this, we use AWS Glue Crawler, which crawls the data in S3 and writes the partition information to the Glue Metastore. This lets us use partitions as a filter in Athena queries, so that Athena scans only the directories specified in the query.
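To make the idea concrete: Firehose writes objects under YYYY/MM/DD/HH/ prefixes in UTC, so a query over a short time range only needs the few hourly directories the range touches. A minimal sketch that lists those prefixes (the function names are hypothetical):

```javascript
// Minimal sketch: the hourly YYYY/MM/DD/HH/ prefixes (UTC) that Firehose
// would have written for the time range [from, to].
function pad2(n) {
  return String(n).padStart(2, "0");
}

function hourlyPrefixes(from, to) {
  const prefixes = [];
  const t = new Date(from.getTime());
  t.setUTCMinutes(0, 0, 0); // align to the start of the hour
  while (t <= to) {
    prefixes.push(
      `${t.getUTCFullYear()}/${pad2(t.getUTCMonth() + 1)}/${pad2(t.getUTCDate())}/${pad2(t.getUTCHours())}/`
    );
    t.setUTCHours(t.getUTCHours() + 1); // Date handles day and month rollover
  }
  return prefixes;
}

const touched = hourlyPrefixes(new Date("2019-04-08T05:30:00Z"), new Date("2019-04-08T07:10:00Z"));
console.log(touched.join(", ")); // 2019/04/08/05/, 2019/04/08/06/, 2019/04/08/07/
```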

Setting up AWS Glue Crawler

AWS Glue Crawler scans all the data in an S3 bucket and creates tables with partitions. Create a Glue Crawler in the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets; in that case it creates tables in the specified database with names matching the bucket names. If you plan to use this data regularly, be sure to set the Crawler's run schedule to suit your needs. We use a single Crawler for all our tables, running every hour.

Partitioned tables

After the crawler's first run, a table for each scanned bucket should appear in the database specified in its settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );
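The crawler names the four directory levels partition_0 to partition_3 (year, month, day, hour), and their values are strings, hence the quotes. If you generate such queries from code, a small hypothetical helper like this one builds the clause for the hour containing a given UTC timestamp:

```javascript
// Hypothetical helper: WHERE clause selecting the hourly partition that contains
// `date`, using the partition_0..partition_3 columns created by the Glue crawler.
function hourPartitionClause(date) {
  const pad = (n) => String(n).padStart(2, "0");
  const values = [
    String(date.getUTCFullYear()), // partition_0: year
    pad(date.getUTCMonth() + 1), // partition_1: month (getUTCMonth is 0-based)
    pad(date.getUTCDate()), // partition_2: day
    pad(date.getUTCHours()), // partition_3: hour, UTC like the Firehose prefixes
  ];
  return values.map((v, i) => `partition_${i} = '${v}'`).join(" AND ");
}

console.log(hourPartitionClause(new Date("2019-04-08T06:15:00Z")));
// partition_0 = '2019' AND partition_1 = '04' AND partition_2 = '08' AND partition_3 = '06'
```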

The query above selects all records received between 6 and 7 a.m. (UTC) on April 8, 2019. But how much more efficient is this than simply reading from an unpartitioned table? Let's find out by selecting the same records, this time filtering them by timestamp:

[Screenshot: the same records selected by timestamp filter, without partitions]

3.59 seconds and 244.34 megabytes of data, on a dataset containing just a single log. Now let's try filtering by partition:

[Screenshot: the same records selected with the partition filter]

Slightly faster, but more importantly, only 1.23 megabytes of data! It would be much cheaper if not for the 10-megabyte minimum per query in Athena's pricing. Still, it's a big improvement, and on large datasets the difference will be far more impressive.
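The arithmetic behind that remark: Athena bills the bytes scanned rounded up to the next megabyte, with a 10 MB minimum per query. The sketch applies that rule to the two measurements above. The per-terabyte price is an assumption (it varies by region), so only the billed-megabyte figures matter here.

```javascript
// Athena's billing rule: bytes scanned are rounded up to the next megabyte,
// with a minimum of 10 MB per query.
const MIN_BILLED_MB = 10;
// Assumptions for illustration only: USD 5 per TB (prices vary by region)
// and 1 TB = 1024 * 1024 MB.
const ASSUMED_PRICE_PER_TB = 5;

function billedMegabytes(scannedMb) {
  return Math.max(Math.ceil(scannedMb), MIN_BILLED_MB);
}

function estimatedCostUsd(scannedMb, pricePerTb = ASSUMED_PRICE_PER_TB) {
  return (billedMegabytes(scannedMb) / (1024 * 1024)) * pricePerTb;
}

// The two measurements from the queries above.
for (const scanned of [244.34, 1.23]) {
  console.log(`${scanned} MB scanned -> ${billedMegabytes(scanned)} MB billed`);
}
// 244.34 MB scanned -> 245 MB billed
// 1.23 MB scanned -> 10 MB billed
```

So the partitioned query is billed for 10 MB instead of 245 MB, about 24 times less, even though it scanned roughly 200 times fewer bytes.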

Building a dashboard with Cube.js

To build the dashboard we use the Cube.js analytics framework. It has many features, but we are interested in two: automatic use of partition filters and data pre-aggregation. It uses a data schema, written in JavaScript, to generate SQL and run the query against the database. We only need to specify in the data schema how to apply the partition filter.

Let's create a new Cube.js application. Since we're already using the AWS stack, it makes sense to deploy it to Lambda. If you plan to host the Cube.js backend on Heroku or in Docker, you can generate the app from the express template instead. The documentation describes other hosting options.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Cube.js is configured for database access through environment variables. The generator creates a .env file in which you can specify your Athena keys.
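For the Athena driver the relevant variables include CUBEJS_AWS_KEY, CUBEJS_AWS_SECRET, CUBEJS_AWS_REGION and CUBEJS_AWS_S3_OUTPUT_LOCATION (the S3 location for Athena query results), plus CUBEJS_DB_TYPE=athena. These names follow the Cube.js Athena driver documentation; treat the list as an assumption and compare it with the .env the generator produces. A hypothetical startup check that reports missing values:

```javascript
// Variable names assumed from the Cube.js Athena driver documentation;
// compare them with the .env file the generator created for you.
const REQUIRED_ATHENA_ENV = [
  "CUBEJS_DB_TYPE", // should be "athena"
  "CUBEJS_AWS_KEY", // AWS access key ID
  "CUBEJS_AWS_SECRET", // AWS secret access key
  "CUBEJS_AWS_REGION", // region of the Athena and S3 resources
  "CUBEJS_AWS_S3_OUTPUT_LOCATION", // S3 path where Athena stores query results
];

// Hypothetical startup check: names that are unset or empty in `env`.
function missingEnv(env, required = REQUIRED_ATHENA_ENV) {
  return required.filter((name) => !env[name]);
}

const missing = missingEnv(process.env);
if (missing.length > 0) {
  console.warn(`Missing Cube.js settings: ${missing.join(", ")}`);
}
```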

Now we need a data schema, in which we describe exactly how our logs are stored. There you can also define how to calculate the metrics for the dashboards.

In the schema directory, create the file Logs.js. Here is an example data model for nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

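Here we use the FILTER_PARAMS variable to generate an SQL query with a partition filter: when a request filters on Logs.createdAt, FILTER_PARAMS.Logs.createdAt.filter passes the bounds of the date range to partitionFilter, and when there is no such filter it renders an always-true condition instead.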
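Note that partitionFilter compares whole calendar days: it concatenates partition_0, partition_1 and partition_2 into a date and never looks at the hour partition, so even a one-hour range reads all of that day's hourly directories. The JavaScript restatement below is for illustration only and assumes the bounds are ISO 8601 strings with an explicit Z (UTC).

```javascript
// Illustration only: the partitionFilter condition restated in JavaScript.
// Assumes fromIso/toIso carry an explicit "Z" so they are read as UTC.
function toDay(iso) {
  return new Date(iso).toISOString().slice(0, 10); // "YYYY-MM-DD"
}

// partition_0 || partition_1 || partition_2 -> one calendar day; the hour
// partition (partition_3) plays no part in the comparison.
function partitionPasses(p0, p1, p2, fromIso, toIso) {
  const day = `${p0}-${p1}-${p2}`;
  // Zero-padded "YYYY-MM-DD" strings compare correctly as plain strings.
  return toDay(fromIso) <= day && toDay(toIso) >= day;
}

const rangeFrom = "2019-04-08T06:00:00.000Z";
const rangeTo = "2019-04-08T06:59:59.999Z";
console.log(partitionPasses("2019", "04", "08", rangeFrom, rangeTo)); // true
console.log(partitionPasses("2019", "04", "07", rangeFrom, rangeTo)); // false
```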

We also define the metrics and dimensions we want to show on the dashboard and specify pre-aggregations. Cube.js will create additional tables with pre-aggregated data and refresh them automatically as new data arrives. This not only speeds up queries but also lowers the cost of using Athena.

Let's add this to the data schema file, inside the cube(`Logs`, { ... }) definition:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that data must be pre-aggregated for all the metrics we use, with monthly partitioning. Partitioning pre-aggregations can significantly speed up building and refreshing them.
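Conceptually, partitionGranularity: `month` means each calendar month gets its own rollup partition, and a dashboard query is served from the months its date range overlaps. A hypothetical helper (not Cube.js internals) that lists those months:

```javascript
// Conceptual sketch, not Cube.js internals: the "YYYY-MM" months that a date
// range overlaps when pre-aggregations are partitioned by month.
function monthBuckets(fromIso, toIso) {
  const from = new Date(fromIso); // date-only ISO strings are parsed as UTC
  const to = new Date(toIso);
  const buckets = [];
  let year = from.getUTCFullYear();
  let month = from.getUTCMonth(); // 0-based
  const endYear = to.getUTCFullYear();
  const endMonth = to.getUTCMonth();
  while (year < endYear || (year === endYear && month <= endMonth)) {
    buckets.push(`${year}-${String(month + 1).padStart(2, "0")}`);
    month += 1;
    if (month === 12) {
      month = 0;
      year += 1;
    }
  }
  return buckets;
}

console.log(monthBuckets("2019-01-15", "2019-03-02").join(", ")); // 2019-01, 2019-02, 2019-03
```

This is also why the refreshKey above matters: as I read it, the CASE expression yields the current hour only for partitions that end less than three days before now, so recent months are rebuilt hourly, while older partitions get a constant NULL key and are left alone.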

Now we can build the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular frontend frameworks. We'll use the React version of the client to build the dashboard. Cube.js only provides the data, so we also need a visualization library; I like recharts, but you can use any.

The Cube.js server accepts requests in JSON format that specify the required metrics. For example, to count how many errors Nginx returned per day, send the following request:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
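Over plain HTTP, this JSON is sent URL-encoded in the query parameter of the /load endpoint, with the API token in the Authorization header. A minimal sketch that builds such a URL for the request above, assuming the local development address used in the dashboard code:

```javascript
// Minimal sketch: GET URL for the Cube.js REST /load endpoint.
function buildLoadUrl(apiUrl, query) {
  return `${apiUrl}/load?query=${encodeURIComponent(JSON.stringify(query))}`;
}

const errorsPerDay = {
  measures: ["Logs.errorCount"],
  timeDimensions: [
    {
      dimension: "Logs.createdAt",
      dateRange: ["2019-01-01", "2019-01-07"],
      granularity: "day",
    },
  ],
};

const loadUrl = buildLoadUrl("http://localhost:4000/cubejs-api/v1", errorsPerDay);
console.log(loadUrl);
// Sending it (Node 18+ or a browser), token placeholder as in the dashboard code:
// fetch(loadUrl, { headers: { Authorization: "YOUR-CUBEJS-API-TOKEN" } })
//   .then((res) => res.json())
//   .then((body) => console.log(body.data));
```

Long-running queries may answer with a "Continue wait" error, in which case the request should simply be repeated; the official client libraries do this for you.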

Let's install the Cube.js client and the React library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import the cubejs and QueryRenderer components to load the data, and build the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
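      render={({ resultSet, error }) => {
        // Show request errors instead of staying in the loading state forever
        if (error) {
          return error.toString();
        }
        if (!resultSet) {
          return 'Loading...';
        }

        // recharts needs an explicit size (or a ResponsiveContainer) to render;
        // rawData() returns rows keyed by member names such as "Logs.errorCount"
        return (
          <LineChart width={600} height={300} data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  );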
}

The dashboard source code is available on CodeSandbox.

Source: www.habr.com
