Nginx log analytics with Amazon Athena and Cube.js

Commercial products or ready-made open-source alternatives, such as Prometheus + Grafana, are usually used to monitor and analyze how Nginx is doing. This is a good option for monitoring or real-time analytics, but it is not very convenient for historical analysis. On any popular resource, the volume of data in the nginx logs grows quickly, and to analyze a large amount of data it makes sense to use something more specialized.

In this article I will show how to analyze logs with Athena, using Nginx as an example, and how to assemble an analytical dashboard from this data with the open-source Cube.js framework. Here is the architecture of the complete solution:

[Diagram: overall solution architecture]

TL;DR:
Link to the finished dashboard.

To collect the data we use Fluentd, for processing - AWS Kinesis Data Firehose and AWS Glue, and for storage - AWS S3. With this stack you can store not only nginx logs but also other events, as well as logs of other services. You can replace some parts with their equivalents from your own stack; for example, you can write logs to Kinesis directly from nginx, bypassing Fluentd, or use Logstash for this.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to adjust the Nginx configuration so that it writes logs in JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
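For reference, with this format each request ends up as a single JSON object per line, roughly like this (the values are illustrative):

{ "created_at": "1554803897.000", "remote_addr": "1.1.1.1", "remote_user": "", "request": "GET /sign-up HTTP/2.0", "status": 200, "bytes_sent": 9168, "request_length": 123, "request_time": 0.002, "http_referrer": "https://example.com/sign-in", "http_x_forwarded_for": "", "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) ..." }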

S3 for storage

To store the logs, we will use S3. This lets us store and analyze the logs in one place, since Athena can work with data in S3 directly. Later in the article I will explain how to add and process the logs properly, but first we need a clean S3 bucket in which nothing else will be stored. It is worth thinking in advance about which region you will create your bucket in, because Athena is not available in every region.

Creating a table in the Athena console

Let's create an Athena table for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create a table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the chosen format, splitting it into directories in the YYYY/MM/DD/HH format. This will come in handy when reading the data. You could, of course, write to S3 directly from nginx, but in that case you would have to write JSON, which is inefficient because of the large file size. Besides, when using PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and select "Direct PUT" as the source:

[Screenshot: Kinesis Firehose — creating a delivery stream with Direct PUT as the source]

In the next step, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the output format. According to research by Owen O'Malley, this is the best format for PrestoDB and Athena. We use the table created above as the schema. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. However, if you specify a different S3 location, you will not be able to read these records from that table.

[Screenshot: Kinesis Firehose — record format conversion set to Apache ORC]

We choose S3 as the storage and the bucket we created earlier. AWS Glue Crawler, which I will talk about a little later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.

[Screenshot: Kinesis Firehose — S3 destination settings]

The remaining options can be adjusted depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses native compression.

Fluentd

Now that we have set up storing and receiving the logs, we need to set up sending them. We will use Fluentd, because I love Ruby, but you can use Logstash or send logs to Kinesis directly. The Fluentd server can be started in several ways; I will describe the docker option because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need more advanced configuration, go to Docker Hub; there is a detailed guide there, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to buffer logs before sending. You can do without it, but then on restart you may lose everything you had buffered. You can also use any port; 24224 is the default Fluentd port.

Now that Fluentd is running, we can send the Nginx logs there. We usually run Nginx in a Docker container, and Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use the log files instead; Fluentd has a file tail plugin.
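A minimal sketch of such a tail source, assuming the access log path from the Nginx config above (the pos_file path and tag name here are only examples, and the tag has to match what you use in the filter and match sections below):

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/fluentd/nginx-access.log.pos
  tag nginx.access
  <parse>
    @type json
  </parse>
</source>

With this source the JSON is already parsed on ingestion, so the parser filter below is mainly needed for the Docker logging driver, which wraps each line in a log field.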

Let's add parsing of the logs configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And send the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR-AWS-SEC-KEY>
</match>

Athena

If you have configured everything correctly, then after a while (by default, Kinesis writes the received data once every 10 minutes) you should see log files in S3. In the "Monitoring" tab of Kinesis Firehose you can see how much data is being written to S3, as well as any errors. Don't forget to give the Kinesis role write access to the S3 bucket. If Kinesis cannot parse something, it will add the errors to the same bucket.

Now you can see the data in Athena. Let's find the most recent requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records for every query

Now our logs are processed and stored in S3 as ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into directories for each hour. However, as long as the table is not partitioned, Athena will load all-time data on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data is constantly growing, slowing down queries;
  • Athena is billed based on the volume of data scanned, with a minimum of 10 MB per query.

To fix this, we use AWS Glue Crawler, which will crawl the data in S3 and write the partition information to the Glue Metastore. This will allow us to use the partitions as a filter when querying Athena, and it will only scan the directories specified in the query.

Setting up Amazon Glue Crawler

Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it will create tables in the specified database with names matching the bucket names. If you plan to use this data regularly, be sure to configure the crawler's launch schedule to suit your needs. We use one crawler for all tables, and it runs every hour.

Partitioned tables

After the first run of the crawler, tables for each scanned bucket should appear in the database specified in its settings. Open the Athena console and find the Nginx log table. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query selects all records received between 6 am and 7 am on April 8, 2019. But how much more efficient is this than reading an unpartitioned table? Let's find out by selecting the same records, filtering them by timestamp instead:

[Screenshot: Athena query filtered by timestamp only]

3.59 seconds and 244.34 megabytes of data, on a dataset holding only a single week of logs. Let's try filtering by partition instead:

[Screenshot: Athena query filtered by partitions]

A little faster, but most importantly - only 1.23 megabytes of data! It would be much cheaper if it weren't for the minimum of 10 megabytes per query in the pricing. But it is still much better, and on large datasets the difference will be far more impressive.

Building a dashboard with Cube.js

To assemble the dashboard, we use the Cube.js analytical framework. It has quite a lot of features, but we are interested in two: the ability to automatically use partition filters, and data pre-aggregation. It uses a data schema, written in JavaScript, to generate SQL and execute the database query. We only need to specify how to use the partition filter in the data schema.

Let's create a new Cube.js application. Since we are already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or Docker. The documentation describes the other hosting methods.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in cube.js. The generator will create an .env file where you can specify your keys for Athena.
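A sketch of what that .env might look like for the Athena driver (the variable names below follow the Cube.js Athena driver documentation and may differ between versions; the values are placeholders):

CUBEJS_DB_TYPE=athena
CUBEJS_AWS_KEY=<YOUR-AWS-KEY-ID>
CUBEJS_AWS_SECRET=<YOUR-AWS-SECRET-KEY>
CUBEJS_AWS_REGION=<YOUR-AWS-REGION>
CUBEJS_AWS_S3_OUTPUT_LOCATION=<YOUR-ATHENA-S3-OUTPUT-LOCATION>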

Now we need a data schema, in which we specify exactly how our logs are stored. It is also where you define how to calculate the metrics for the dashboards.

In the schema directory, create the file Logs.js. Here is an example data model for the nginx logs:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate an SQL query with a partition filter: Cube.js substitutes the date range from the incoming query into the partitionFilter function above.
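To make this concrete: for a dashboard query with a createdAt date range from 2019-01-01 to 2019-01-07, the partitionFilter above expands the cube's SQL into roughly the following (timestamps shown in simplified form):

select * from part_demo_kinesis_bucket
WHERE
  date(from_iso8601_timestamp('2019-01-01T00:00:00.000')) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
  date(from_iso8601_timestamp('2019-01-07T23:59:59.999')) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')

Athena can then prune partitions using only the partition_0 .. partition_2 directory columns instead of scanning the whole table.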

We also define the measures and dimensions we want to show on the dashboard and specify pre-aggregations. Cube.js will create an additional table with pre-aggregated data and will automatically update it as new data arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this information to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this schema we specify that the data should be pre-aggregated for all the measures used, partitioned by month. Partitioning pre-aggregations can significantly speed up building and refreshing them.

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and client libraries for the popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js only provides the data, so we will also need a visualization library - I like recharts, but you can use any other.

The Cube.js server accepts requests in JSON format listing the required measures. For example, to calculate how many errors Nginx returned per day, you need to send the following request:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react
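As a quick sanity check before wiring up React, the JSON query above can also be sent with the plain JavaScript client (a minimal sketch; the API token and URL are placeholders matching a locally running Cube.js backend):

import cubejs from '@cubejs-client/core';

const cubejsApi = cubejs('YOUR-CUBEJS-API-TOKEN', {
  apiUrl: 'http://localhost:4000/cubejs-api/v1',
});

// Load the daily error count and print the raw rows
cubejsApi
  .load({
    measures: ['Logs.errorCount'],
    timeDimensions: [{
      dimension: 'Logs.createdAt',
      dateRange: ['2019-01-01', '2019-01-07'],
      granularity: 'day',
    }],
  })
  .then(resultSet => console.log(resultSet.rawData()));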

We import cubejs and the QueryRenderer component to load the data and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}
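The rawData() call returns one row per day, keyed by the member names used in the query, which is what recharts needs for its dataKey props; the shape is roughly as follows (values are illustrative, and numbers may come back as strings depending on the driver):

[
  { "Logs.createdAt": "2019-01-01T00:00:00.000", "Logs.errorCount": 12 },
  { "Logs.createdAt": "2019-01-02T00:00:00.000", "Logs.errorCount": 7 }
]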

The dashboard sources are available on CodeSandbox.

Source: www.habr.com
