Nginx log analytics with Amazon Athena and Cube.js

Usually, commercial products or ready-made open-source alternatives, such as Prometheus + Grafana, are used to monitor and analyze the performance of Nginx. This is a good option for monitoring or real-time analytics, but it is not well suited for historical analysis. On any popular resource, the volume of data coming from nginx logs grows quickly, and to analyze a large amount of data it makes sense to use something more specialized.

In this article I will show how to use Athena to analyze logs, taking Nginx as an example, and how to assemble an analytics dashboard on top of this data using the open-source Cube.js framework. Here is the complete solution architecture:

[Diagram: overall solution architecture]

TL;DR:
Link to the finished dashboard.

For collecting the data we use Fluentd; for processing, AWS Kinesis Data Firehose and AWS Glue; for storage, AWS S3. With this stack you can store not only nginx logs but also other events, as well as logs of other services. You can swap some parts for their equivalents in your stack; for example, you can write logs to Kinesis directly from nginx, bypassing Fluentd, or use Logstash for this.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to adjust the Nginx configuration so that it emits logs in JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;

S3 for storage

We will use S3 to store the logs. This lets us store and analyze the logs in one place, since Athena can work with data in S3 directly. Later in the article I will explain how to add and process the logs properly, but first we need a clean bucket in S3 in which nothing else will be stored. It is worth considering in advance which region you will create your bucket in, because Athena is not available in every region.

Creating a schema in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create a table:

SQL for creating the table

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the chosen format, partitioning it into directories in the YYYY/MM/DD/HH format. This will come in handy later when reading the data. You could, of course, write to S3 directly from fluentd, but then you would have to write JSON, and that is inefficient because of the large file sizes. Besides, when using PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and select "Direct PUT" as the source:

[Screenshot: creating the Kinesis Firehose delivery stream]

On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the record format. According to research by Owen O'Malley, this is the optimal format for PrestoDB and Athena. We use the table we created above as the schema. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. However, if you specify a different S3 location, you will not be able to read these records from this table.

[Screenshot: record format conversion settings]

For storage we choose S3 and the bucket we created earlier. The AWS Glue Crawler, which I will talk about a bit later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.

[Screenshot: S3 destination settings]

The remaining options can be changed depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses its own compression by default.

Fluentd

Now that we have configured storing and receiving the logs, we need to configure sending them. We will use Fluentd, because I love Ruby, but you can use Logstash or send the logs to Kinesis directly. A Fluentd server can be started in several ways; I will cover Docker because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, there is a detailed guide on Docker Hub, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  fluentd -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to cache logs before sending them. You can do without it, but then after a restart you risk losing everything you cached through back-breaking labour. You can also use any port; 24224 is simply Fluentd's default forward port.

Now that we have Fluentd running, we can send the Nginx logs to it. We usually run Nginx in a Docker container, and Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use log files; Fluentd has a file tail plugin.

Let's add parsing of the logs configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And send the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

If you configured everything correctly, then after a while (by default, Kinesis writes the received data once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data is being written to S3, as well as errors. Don't forget to give the Kinesis role write access to the S3 bucket. If Kinesis could not parse something, it will put the errors in the same bucket.

Now you can view the data in Athena. Let's find the latest requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;
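Building on the same table, a hypothetical follow-up query (reusing the column names defined above) can summarize requests and server errors per day, which is a quick way to sanity-check the pipeline:

-- Illustrative: requests and server errors per day over the last week
SELECT date_trunc('day', from_unixtime(created_at)) AS day,
       count(*) AS requests,
       count_if(status >= 500) AS server_errors
FROM "db_name"."table_name"
GROUP BY 1
ORDER BY 1 DESC
LIMIT 7;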

Scanning all records on every query

Now our logs are processed and stored in S3 in ORC format, compressed and ready for analysis. Kinesis Firehose even organized them into directories for each hour. However, as long as the table is not partitioned, Athena will load all-time data on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data is constantly growing, slowing down queries;
  • Athena is billed based on the volume of data scanned, with a minimum of 10 MB per query.

To fix this, we use AWS Glue Crawler, which will crawl the data in S3 and write the partition information to the Glue Metastore. This will allow us to use the partitions as a filter when querying Athena, so it will scan only the directories specified in the query.
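To make the mechanics concrete, here is a rough sketch of what registering a single partition by hand looks like in Athena; the crawler simply automates this for every new directory (the table and bucket names are the ones used in the examples below):

-- Sketch: manually registering one hourly partition (the crawler does this automatically)
ALTER TABLE part_demo_kinesis_bucket ADD IF NOT EXISTS
  PARTITION (partition_0 = '2019', partition_1 = '04', partition_2 = '08', partition_3 = '06')
  LOCATION 's3://<YOUR-S3-BUCKET>/2019/04/08/06/';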

Setting up the Amazon Glue Crawler

The Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it will create tables in the specified database with names that match the bucket names. If you plan to use this data regularly, be sure to configure the Crawler's run schedule to suit your needs. We use one Crawler for all tables, and it runs every hour.

Partitioned tables

After the first run of the crawler, tables for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query selects all records received between 6 and 7 AM on April 8, 2019. But how much more efficient is this than just reading from the non-partitioned table? Let's find out by selecting the same records, filtering them by timestamp:
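The timestamp-only variant is presumably along these lines (created_at holds epoch seconds, as defined in the table schema); it gives Athena nothing to prune and forces a scan of the whole table:

-- Same time window, filtered only by timestamp: no partition pruning
SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE created_at BETWEEN to_unixtime(timestamp '2019-04-08 06:00:00')
                     AND to_unixtime(timestamp '2019-04-08 07:00:00');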

[Screenshot: Athena query filtered by timestamp, scanning the full table]

3.59 seconds and 244.34 megabytes of data, on a dataset with only a week of logs. Let's try the filter by partition:

[Screenshot: Athena query filtered by partition]

A bit faster, but most importantly, only 1.23 megabytes of data! It would be much cheaper if it weren't for the minimum of 10 megabytes per query in the pricing. But it is still much better, and on large datasets the difference will be even more impressive.

Building a dashboard with Cube.js

To build the dashboard, we use the Cube.js analytical framework. It has quite a lot of features, but we are interested in two: the ability to automatically use partition filters, and data pre-aggregation. It uses a data schema written in JavaScript to generate SQL and run the query against the database. We only need to specify how to use the partition filter in the data schema.

Let's create a new Cube.js application. Since we are already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting methods.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Database access in Cube.js is configured with environment variables. The generator will create a .env file where you can specify your keys for Athena.

Now we need the data schema, in which we will specify exactly how our logs are stored. There you can also define how to calculate metrics for dashboards.

In the schema directory, create a file Logs.js. Here is an example data model for the nginx logs:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate a SQL query with the partition filter.
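For a query over, say, the first week of April, the WHERE clause produced by partitionFilter would expand roughly like this (the dates are illustrative):

-- Roughly the SQL produced by partitionFilter for 2019-04-01..2019-04-07
SELECT * FROM part_demo_kinesis_bucket
WHERE
  date(from_iso8601_timestamp('2019-04-01T00:00:00Z')) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
  date(from_iso8601_timestamp('2019-04-07T00:00:00Z')) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d');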

We also set the metrics and parameters that we want to display on the dashboard and define pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will automatically refresh them as new data arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that the data should be pre-aggregated for all the metrics used, and partitioned by month. Partitioning pre-aggregations can significantly speed up collecting and refreshing the data.
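Conceptually, this rollup materializes something like the following aggregate (a simplified sketch; Cube.js manages the actual pre-aggregation tables and their monthly partitions itself):

-- Simplified view of what the rollup pre-aggregation stores
SELECT date_trunc('day', from_unixtime(created_at)) AS created_at_day,
       status,
       CASE WHEN status >= 400 THEN 'Yes' ELSE 'No' END AS is_error,
       count(*) AS count,
       count(CASE WHEN status >= 400 THEN 1 END) AS error_count
FROM part_demo_kinesis_bucket
GROUP BY 1, 2, 3;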

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js only provides the data, so we will need a visualization library. I like recharts, but you can use any other.

The Cube.js server accepts requests in JSON format specifying the required metrics. For example, to calculate how many errors Nginx returned per day, you need to send the following request:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
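Given the Logs schema above, Cube.js turns this request into SQL roughly like the following (a simplified sketch; the real generated query also applies the partition filter and may be served from a pre-aggregation instead):

-- Simplified sketch of the query generated for the request above
SELECT date_trunc('day', from_unixtime(created_at)) AS "logs.created_at_day",
       count(CASE WHEN status >= 400 THEN 1 END) AS "logs.error_count"
FROM part_demo_kinesis_bucket
WHERE from_unixtime(created_at) >= timestamp '2019-01-01'
  AND from_unixtime(created_at) < timestamp '2019-01-08'
GROUP BY 1
ORDER BY 1;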

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import the cubejs client and the QueryRenderer component to load the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard source code is available on CodeSandbox.

Source: www.habr.com
