Nginx log analytics with Amazon Athena and Cube.js

Typically, commercial products or ready-made open-source alternatives such as Prometheus + Grafana are used to monitor and analyze Nginx. This is a good option for monitoring or real-time analytics, but it is not well suited to historical analysis. On any popular resource the volume of data in the nginx logs grows quickly, and to analyze a large amount of data it makes sense to use something more specialized.

In this article I will show how to use Athena to analyze logs, taking Nginx as the example, and how to assemble an analytical dashboard from this data using the open-source framework Cube.js. Here is the complete architecture of the solution:

[Figure: architecture of the complete solution]

TL;DR:
Link to the finished dashboard.

To collect the data we use Fluentd; for processing, AWS Kinesis Data Firehose and AWS Glue; for storage, AWS S3. With this stack you can store not only nginx logs but also other events, as well as the logs of other services. You can swap some of the parts for their equivalents in your own stack: for example, you can write logs to Kinesis directly from nginx, bypassing Fluentd, or use Logstash instead.

Collecting Nginx logs

By default, Nginx logs look like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to change the Nginx configuration so that it writes logs as JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
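
To show what a consumer of this format sees, here is a minimal sketch that parses one such line. The sample line and the helper name parseLogLine are made up for illustration; only the field names come from the log_format above.

```javascript
// Hypothetical sample line in the json_combined format; the values are invented.
const sampleLine = '{ "created_at": "1554803897.000", "remote_addr": "1.1.1.1", ' +
  '"remote_user": "", "request": "GET /sign-up HTTP/2.0", "status": 200, ' +
  '"bytes_sent": 9168, "request_length": 512, "request_time": 0.004, ' +
  '"http_referrer": "https://example.com/sign-in", ' +
  '"http_x_forwarded_for": "", "http_user_agent": "Mozilla/5.0" }';

// parseLogLine is an illustrative helper, not part of Nginx or Fluentd.
function parseLogLine(line) {
  const record = JSON.parse(line);
  // $msec is quoted in the log_format, so it arrives as a string of epoch seconds.
  const createdAt = new Date(Number(record.created_at) * 1000);
  // The request field has the form "METHOD PATH PROTOCOL".
  const [method, path, protocol] = record.request.split(' ');
  return { ...record, createdAt, method, path, protocol };
}

const parsed = parseLogLine(sampleLine);
console.log(parsed.method, parsed.path, parsed.status, parsed.createdAt.toISOString());
```

Numeric fields such as status and bytes_sent are unquoted in the format, so they come out of JSON.parse as numbers with no extra conversion.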

S3 for storage

To store the logs we will use S3. This lets you store and analyze the logs in one place, since Athena can work with data in S3 directly. Later in the article I will explain how to add and process the logs correctly, but first we need a clean bucket in S3 in which nothing else will be stored. It is worth deciding in advance which region to create the bucket in, because Athena is not available in every region.

Creating the schema in the Athena console

Let's create a table for the logs in Athena. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create the table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the chosen format, splitting it into directories in the YYYY/MM/DD/HH format. This will come in handy when reading the data. You could of course write to S3 directly from Fluentd, but then you would have to write JSON, which is inefficient because of the large file sizes. In addition, with PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and choose "Direct PUT" in the "delivery" field.


On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the record format. According to some research by Owen O'Malley, this is the optimal format for PrestoDB and Athena. We use the table created above as the schema. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. But if you specify a different S3 location, you will not be able to read these records from that table.


We select S3 for storage and the bucket we created earlier. AWS Glue Crawler, which I will discuss a little later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.


The remaining options can be changed depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses native compression by default.
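
The YYYY/MM/DD/HH directory layout mentioned above can be sketched as a small function. This is only an illustration: firehosePrefix is a hypothetical helper, not an AWS API, and it assumes the prefix is derived from the UTC time at which the record is delivered.

```javascript
// firehosePrefix is a hypothetical helper that mimics the layout; it is not an AWS API.
// Assumption: the directory is derived from the UTC delivery time.
function firehosePrefix(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
  ].join('/') + '/';
}

console.log(firehosePrefix(new Date('2019-04-08T06:30:00Z'))); // 2019/04/08/06/
```

Every object written during that hour lands under the same directory, which is what later makes hour-level partitions possible.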

Fluentd

Now that storing and receiving logs is set up, we need to configure sending. We will use Fluentd, because I love Ruby, but you can use Logstash or send logs to Kinesis directly. The Fluentd server can be launched in several ways; I will describe Docker because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, Docker Hub has a detailed guide, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to buffer logs before sending. You can do without it, but then on restart you can lose everything that was buffered. You can also use any port; 24224 is the default Fluentd port.

Now that Fluentd is running, we can send Nginx logs to it. We usually run Nginx in a Docker container, in which case Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use log files; Fluentd has a file tail plugin.

Let's add parsing of the log format configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And send the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

If you have configured everything correctly, then after a while (by default, Kinesis writes the data it has received once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data has been written to S3, as well as any errors. Do not forget to give the Kinesis role write access to the S3 bucket. If Kinesis cannot parse something, it will add the errors to the same bucket.

Now you can look at the data in Athena. Let's find the most recent requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records on every query

Now our logs are processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even arranged them into a directory for each hour. However, as long as the table is not partitioned, Athena will load the data for all time on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data grows constantly, slowing down queries;
  • Athena is billed by the volume of data scanned, with a minimum of 10 MB per query.
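
To make the billing rule concrete, here is a rough sketch of how the per-query charge could be estimated. The 10 MB minimum comes from the text above; the price per terabyte and the use of binary megabytes are my assumptions, so check the current Athena pricing for your region.

```javascript
const MB = 1024 * 1024;           // assumption: binary megabytes
const TB = 1024 * 1024 * MB;
const MIN_BILLED_BYTES = 10 * MB; // the 10 MB per-query minimum mentioned above
const PRICE_PER_TB_USD = 5;       // assumption: verify against current Athena pricing

// Bytes that are actually charged for a query that scanned `scannedBytes`.
function billedBytes(scannedBytes) {
  return Math.max(scannedBytes, MIN_BILLED_BYTES);
}

// Estimated cost of a single query in US dollars.
function queryCostUsd(scannedBytes) {
  return (billedBytes(scannedBytes) / TB) * PRICE_PER_TB_USD;
}

console.log(billedBytes(1 * MB) / MB);   // a 1 MB scan is billed as 10 MB
console.log(billedBytes(250 * MB) / MB); // a 250 MB scan is billed as 250 MB
```

The cost of a full scan grows with the dataset, while a query that touches only a few directories stays close to the minimum.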

To solve both problems, we use AWS Glue Crawler, which crawls the data in S3 and writes the partition information to the Glue Metastore. This lets us use partitions as a filter when querying Athena, so that it scans only the directories specified in the query.

Setting up AWS Glue Crawler

AWS Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it will create tables in the specified database with names that match the bucket names. If you plan to use this data regularly, be sure to set the Crawler's run schedule to suit your needs. We use one Crawler for all tables, and it runs every hour.

Partitioned tables

After the first run of the crawler, a table for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query selects all records received between 6 a.m. and 7 a.m. on April 8, 2019. But how much more efficient is this than simply reading from a non-partitioned table? Let's find out by selecting the same records, filtering by timestamp:


3.59 seconds and 244.34 megabytes of data scanned, on a dataset with only a week of logs. Let's try filtering by partition:


Slightly faster, but most importantly, only 1.23 megabytes of data! It would be much cheaper still were it not for the 10 megabyte per-query minimum in the pricing. But it is still much better, and on large datasets the difference will be far more impressive.
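
If you build such queries in code, the partition predicate for a given hour can be generated rather than typed by hand. Below is a minimal sketch; partitionPredicate is a hypothetical helper, and it assumes the crawler named the partition columns partition_0 to partition_3 in year, month, day, hour order, as in the query above.

```javascript
// partitionPredicate is an illustrative helper, not part of Athena or Glue.
// Assumption: partition_0..partition_3 hold year, month, day and hour in UTC.
function partitionPredicate(date) {
  const pad = (n) => String(n).padStart(2, '0');
  const parts = [
    String(date.getUTCFullYear()),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
  ];
  return parts.map((value, i) => `partition_${i} = '${value}'`).join(' AND ');
}

console.log(partitionPredicate(new Date('2019-04-08T06:00:00Z')));
// partition_0 = '2019' AND partition_1 = '04' AND partition_2 = '08' AND partition_3 = '06'
```

The values are compared as strings because the crawler infers them from directory names.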

Building a dashboard with Cube.js

To assemble the dashboard, we use the Cube.js analytical framework. It has quite a lot of features, but we are interested in two: the ability to apply partition filters automatically, and data pre-aggregation. It uses a data schema written in Javascript to generate SQL and execute the database query. All we need to do is indicate in the data schema how the partition filter should be applied.

Let's create a new Cube.js application. Since we are already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting options.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in Cube.js. The generator creates an .env file in which you can specify your Athena keys.

Now we need a data schema, in which we describe exactly how our logs are stored. There you can also specify how to calculate the metrics for the dashboards.

In the schema directory, create the file Logs.js. Here is an example data model for nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate an SQL query with a partition filter.
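
To see what the filter expands to, the partitionFilter callback from the model can be called on its own. In this sketch the quoted ISO timestamps are stand-ins I chose for illustration; in a real query Cube.js supplies the two arguments itself from the date range of the request.

```javascript
// Same callback as in the model above, repeated so the snippet runs on its own.
const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `;

// Assumption: quoted literals stand in for whatever Cube.js passes to the callback.
const whereClause = partitionFilter("'2019-04-01T00:00:00Z'", "'2019-04-07T23:59:59Z'")
  .replace(/\s+/g, ' ')
  .trim();

console.log(whereClause);
```

The resulting condition keeps only the daily directories that fall inside the requested range, so Athena never opens the rest.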

We also define the metrics and parameters we want to show on the dashboard and specify pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will refresh them automatically as new data arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this information to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that data must be pre-aggregated for all the metrics used, and that partitioning by month should be applied. Partitioning the pre-aggregations can significantly speed up both collecting and refreshing the data.
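
Conceptually, a daily rollup stores one small row per day instead of every log record. The sketch below shows the idea in memory; it is not how Cube.js implements pre-aggregations, it ignores the isError and status dimensions for brevity, and the rows are invented.

```javascript
// Hypothetical raw rows; created_at is in epoch seconds as in the log format.
const rows = [
  { created_at: 1554803897, status: 200 },
  { created_at: 1554803900, status: 502 },
  { created_at: 1554890300, status: 404 },
];

// Groups rows by UTC day and computes the count and errorCount measures.
function dailyRollup(rows) {
  const byDay = new Map();
  for (const row of rows) {
    const day = new Date(row.created_at * 1000).toISOString().slice(0, 10);
    const bucket = byDay.get(day) || { day, count: 0, errorCount: 0 };
    bucket.count += 1;
    if (row.status >= 400) bucket.errorCount += 1; // same rule as the isError dimension
    byDay.set(day, bucket);
  }
  return [...byDay.values()];
}

console.log(dailyRollup(rows));
```

A dashboard query for errors per day can then be answered from these few rows without scanning the raw logs again.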

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js only provides the data, so we also need a visualization library; I like recharts, but you can use any.

The Cube.js server accepts a query in JSON format that specifies the required metrics. For example, to calculate how many errors Nginx returned per day, you need to send the following query:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
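
For reference, this is roughly how such a query could be turned into a plain HTTP request without the client library. It is a sketch under assumptions: buildLoadRequest is a hypothetical helper, and the /load endpoint with a URL-encoded query parameter and the token in the Authorization header reflects my understanding of the Cube.js REST API, so check the documentation for your version.

```javascript
const query = {
  measures: ['Logs.errorCount'],
  timeDimensions: [{
    dimension: 'Logs.createdAt',
    dateRange: ['2019-01-01', '2019-01-07'],
    granularity: 'day',
  }],
};

// buildLoadRequest is an illustrative helper, not part of the Cube.js client.
// Assumption: the API expects GET <apiUrl>/load?query=<URL-encoded JSON>.
function buildLoadRequest(apiUrl, token, query) {
  return {
    url: `${apiUrl}/load?query=${encodeURIComponent(JSON.stringify(query))}`,
    headers: { Authorization: token },
  };
}

const request = buildLoadRequest(
  'http://localhost:4000/cubejs-api/v1',
  'YOUR-CUBEJS-API-TOKEN',
  query,
);
console.log(request.url);
```

The url and headers can then be passed to fetch or any other HTTP client.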

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import cubejs and QueryRenderer to fetch the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        // recharts needs explicit dimensions; 600x300 is an arbitrary choice
        return (
          <LineChart width={600} height={300} data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}
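
Depending on the database driver, measure values may come back as strings, which charting libraries do not always handle well. The sketch below normalizes rows before plotting; the row shape is my assumption about what resultSet.rawData() returns, and toChartData is a hypothetical helper.

```javascript
// Hypothetical rows in the shape I would expect from resultSet.rawData().
const rawRows = [
  { 'Logs.createdAt': '2019-01-01T00:00:00.000', 'Logs.errorCount': '12' },
  { 'Logs.createdAt': '2019-01-02T00:00:00.000', 'Logs.errorCount': '7' },
];

// toChartData is an illustrative helper, not part of the Cube.js client.
function toChartData(rows) {
  return rows.map((row) => ({
    ...row,
    'Logs.createdAt': row['Logs.createdAt'].slice(0, 10), // keep only YYYY-MM-DD
    'Logs.errorCount': Number(row['Logs.errorCount']),    // string to number
  }));
}

console.log(toChartData(rawRows));
```

The keys are left unchanged, so the dataKey values used in the chart above still match.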

The dashboard sources are available on CodeSandbox.

Source: www.habr.com
