Nginx log analytics with Amazon Athena and Cube.js

Nginx is usually monitored and analyzed with commercial products or ready-made open-source alternatives such as Prometheus + Grafana. That is a good choice for monitoring or real-time analytics, but it is not very convenient for historical analysis. On any popular resource the volume of data from nginx logs grows quickly, and analyzing a large amount of data calls for something more specialized.

In this article I will show how to use Athena to analyze logs, taking Nginx as an example, and how to assemble an analytical dashboard from this data using the open-source cube.js framework. Here is the complete solution architecture.

[Figure: solution architecture diagram]

TL;DR
Link to the finished dashboard.

To collect the logs we use fluentd, to process them AWS Kinesis Data Firehose and AWS Glue, and to store them AWS S3. With this bundle you can store not only nginx logs but also other events, as well as logs from other services. You can replace some parts with equivalents from your own stack; for example, you can write logs to kinesis directly from nginx, bypassing fluentd, or use logstash instead.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to change the Nginx configuration so that it writes logs in JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
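As a rough illustration of what a consumer sees, here is a minimal sketch that reads one line in the json_combined format above. The sample values and the helper name parseAccessLogLine are made up for this example; the only facts relied on are that $msec is epoch seconds with millisecond resolution and that the format writes it as a quoted string.

```javascript
// One line as the json_combined format above would write it (values are made up).
const line = '{ "created_at": "1554803897.123", "remote_addr": "1.1.1.1", ' +
  '"remote_user": "", "request": "GET /sign-up HTTP/2.0", "status": 200, ' +
  '"bytes_sent": 9168, "request_length": 120, "request_time": 0.004, ' +
  '"http_referrer": "https://example.com/sign-in", ' +
  '"http_x_forwarded_for": "", "http_user_agent": "Mozilla/5.0" }';

// Hypothetical helper: parses the JSON and adds a few convenience fields.
function parseAccessLogLine(text) {
  const record = JSON.parse(text);
  // $msec is seconds since the epoch with a fractional part, quoted as a string.
  const createdAt = new Date(Math.round(parseFloat(record.created_at) * 1000));
  // "$request" holds the method, the path and the protocol separated by spaces.
  const [method, path, protocol] = record.request.split(' ');
  return { ...record, createdAt, method, path, protocol };
}

const parsed = parseAccessLogLine(line);
console.log(parsed.createdAt.toISOString(), parsed.method, parsed.path, parsed.status);
```

Numeric fields such as status and bytes_sent arrive as JSON numbers because the format leaves them unquoted, while created_at stays a string until it is converted.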

S3 for storage

To store the logs we will use S3. It lets you store and analyze logs in one place, because Athena can work with data in S3 directly. Later in the article I will show how to add and process logs correctly, but first we need a clean bucket in S3 in which nothing else will be stored. Think in advance about the region in which you create the bucket, because Athena is not available in all regions.

Creating a schema in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create a table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the selected format, splitting it into directories in YYYY/MM/DD/HH format. This comes in handy when reading the data. You can, of course, write to S3 directly from fluentd, but then you would have to write JSON, which is inefficient because of the large file sizes. In addition, JSON is the slowest data format for PrestoDB and Athena. So open the Kinesis Firehose console, click "Create delivery stream", and select "Direct PUT" in the "delivery" field:

[Screenshot: creating the delivery stream in the Kinesis Firehose console]
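The YYYY/MM/DD/HH directory layout mentioned above can be sketched in a few lines. This is only an illustration that assumes the prefix is derived from a UTC timestamp; hourlyPrefix is a hypothetical helper, not part of any AWS SDK.

```javascript
// Pads a number to two digits: 4 -> "04".
const pad2 = (n) => String(n).padStart(2, '0');

// Returns the YYYY/MM/DD/HH directory prefix for a given moment (assumed UTC).
function hourlyPrefix(date) {
  return [
    date.getUTCFullYear(),
    pad2(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(date.getUTCDate()),
    pad2(date.getUTCHours()),
  ].join('/');
}

console.log(hourlyPrefix(new Date('2019-04-08T06:15:00Z'))); // 2019/04/08/06
```

Every record delivered within the same hour ends up under the same prefix, which is what later makes it possible to read a single hour or day of logs without touching the rest.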

On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the record format. According to research by Owen O'Malley, it is the optimal format for PrestoDB and Athena. We use the table created above as the schema. Note that you can specify any S3 location in kinesis; only the schema is taken from the table. But if you specify a different S3 location, you will not be able to read these records from this table.


We select S3 for storage and the bucket we created earlier. AWS Glue Crawler, which I will talk about a little later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.


The remaining options can be changed depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses native compression by default.

Fluentd

Now that we have configured storing and receiving logs, we need to configure sending. We will use Fluentd, because I love Ruby, but you can use Logstash or send logs to kinesis directly. The Fluentd server can be launched in several ways; I will cover docker because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, Docker Hub has a detailed guide, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to buffer logs before sending them. You can do without it, but then a restart can lose everything that was buffered. You can also use any port; 24224 is the default Fluentd port.

Now that Fluentd is running, we can send Nginx logs to it. We usually run Nginx in a Docker container, in which case Docker has a native logging driver for Fluentd:

$ docker run \
--log-driver=fluentd \
--log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
--log-opt tag="{{.Name}}" \
-v /some/content:/usr/share/nginx/html:ro \
-d \
nginx

If you run Nginx differently, you can use log files; Fluentd has a file tail plugin.

Let's add parsing of the log format configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>
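To make the effect of this filter concrete, here is a small sketch of the transformation in plain JavaScript. It is an approximation under stated assumptions: the event shape imitates what the Docker fluentd logging driver sends (the container output line in a log field), the values are made up, and parseLogField is a hypothetical helper rather than Fluentd code.

```javascript
// Hypothetical event, shaped like what the Docker fluentd log driver sends:
// the line written by the container sits in the "log" field.
const event = {
  container_name: '/nginx',
  source: 'stdout',
  log: '{ "status": 502, "request": "GET /sign-in HTTP/2.0" }',
};

// Mimics the filter: parse record[keyName] as JSON and return the parsed
// object as the new record; return null when the line is not a JSON object.
function parseLogField(record, keyName = 'log') {
  try {
    const parsed = JSON.parse(record[keyName]);
    return parsed !== null && typeof parsed === 'object' ? parsed : null;
  } catch (err) {
    return null; // invalid line: dropped in this sketch
  }
}

console.log(parseLogField(event));                      // the parsed Nginx fields
console.log(parseLogField({ log: 'plain text line' })); // null
```

In this sketch the wrapper fields such as container_name are not carried over, so what reaches the next stage is just the set of fields defined in the Nginx log format.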

And send the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR-AWS-SEC-KEY>
</match>

Athena

If you have configured everything correctly, then after a while (by default, Kinesis writes the data it has received once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data is written to S3, as well as errors. Do not forget to grant the Kinesis role write access to the S3 bucket. If Kinesis could not parse something, it will add the errors to the same bucket.

Now you can look at the data in Athena. Let's find the most recent requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records on every request

Our logs are now processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into directories for each hour. However, as long as the table is not partitioned, Athena will load the data for all time on every request, with rare exceptions. This is a big problem for two reasons:

  • The volume of data is constantly growing, which slows down queries;
  • Athena is billed by the volume of data scanned, with a minimum of 10 MB per request.

To fix this, we use AWS Glue Crawler, which crawls the data in S3 and writes the partition information to the Glue Metastore. This lets us use partitions as a filter when querying Athena, so that it scans only the directories specified in the query.

Setting up Amazon Glue Crawler

Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket in which you store the data. You can use one crawler for several buckets, in which case it creates tables in the specified database with names that match the bucket names. If you plan to use this data regularly, be sure to configure the Crawler launch schedule to suit your needs. We use one Crawler for all tables, and it runs every hour.

Partitioned tables

After the first run of the crawler, tables for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query selects all records received between 6 a.m. and 7 a.m. on April 8, 2019. But how much more efficient is this than simply reading from a non-partitioned table? Let's find out by selecting the same records, filtering them by timestamp:

[Screenshot: Athena query filtered by timestamp]

3.59 seconds and 244.34 megabytes of data on a dataset with only one week of logs. Let's try a filter by partition:

[Screenshot: Athena query filtered by partition]

A little faster, but most importantly, only 1.23 megabytes of data! It would be much cheaper still if not for the 10 megabyte minimum per request in the pricing. But it is still much better, and on large datasets the difference will be far more impressive.
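The effect of the 10 MB minimum on the two measurements above can be worked through with a short sketch. The price per terabyte used here is an assumption for illustration only and is not taken from the article; check current Athena pricing before relying on it. The sketch also ignores any rounding the billing may apply.

```javascript
const MIN_BILLED_MB = 10;     // per-request minimum mentioned in the article
const ASSUMED_USD_PER_TB = 5; // assumption, not from the article
const MB_PER_TB = 1024 * 1024;

// Volume that is actually charged for a query that scanned `scannedMb`.
function billedMegabytes(scannedMb) {
  return Math.max(scannedMb, MIN_BILLED_MB);
}

// Rough cost estimate under the assumed price.
function estimatedCostUsd(scannedMb) {
  return (billedMegabytes(scannedMb) / MB_PER_TB) * ASSUMED_USD_PER_TB;
}

const byTimestamp = billedMegabytes(244.34); // filter on created_at
const byPartition = billedMegabytes(1.23);   // filter on partitions
console.log(byTimestamp, byPartition, byTimestamp / byPartition);
```

The scanned volume drops by a factor of about 200, but because 1.23 MB is billed as 10 MB, the charge for this particular query drops by a factor of roughly 24.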

Building a dashboard with Cube.js

To assemble the dashboard, we use the Cube.js analytical framework. It has quite a lot of features, but we are interested in two: the ability to use partition filters automatically and data pre-aggregation. It uses a data schema, written in Javascript, to generate SQL and execute a database query. All we need to do is indicate in the data schema how to use the partition filter.

Let's create a new Cube.js application. Since we are already using the AWS stack, it is logical to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting methods.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in cube.js. The generator creates a .env file in which you can specify your keys for Athena.

Now we need a data schema, in which we describe exactly how our logs are stored. There you can also specify how to calculate the metrics for the dashboards.

In the schema directory, create a file Logs.js. Here is an example data model for nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate an SQL query with a partition filter.
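To see what the partition filter in the schema expands to, the template can be called by hand. This is only an illustration: the quoted ISO literals stand in for whatever values Cube.js supplies at query time, and dayMatches is a hypothetical plain-JavaScript equivalent of the SQL predicate, added to show which day directories would qualify.

```javascript
// Same template as in the schema above.
const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `;

// Assumption: literal values are used here purely for illustration.
const where = partitionFilter("'2019-04-01T00:00:00.000Z'", "'2019-04-07T23:59:59.999Z'");
console.log(where.trim());

// Hypothetical equivalent of the predicate: a day partition qualifies when
// its date lies between the dates of the range boundaries (inclusive).
function dayMatches(year, month, day, fromIso, toIso) {
  const partitionDay = `${year}-${month}-${day}`; // e.g. "2019-04-08"
  return fromIso.slice(0, 10) <= partitionDay && partitionDay <= toIso.slice(0, 10);
}

console.log(dayMatches('2019', '04', '07', '2019-04-01T00:00:00.000Z', '2019-04-07T23:59:59.999Z'));
console.log(dayMatches('2019', '04', '08', '2019-04-01T00:00:00.000Z', '2019-04-07T23:59:59.999Z'));
```

The filter works at day granularity: the hour partition (partition_3) is not part of the predicate, so a whole day of directories is read even when the requested range covers only a few hours of it.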

We also define the measures and dimensions that we want to show on the dashboard and specify pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will automatically update the data as it arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this information to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that data must be pre-aggregated for all the measures used, and that the pre-aggregation is partitioned by month. Partitioning pre-aggregations can significantly speed up data collection and updating.
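The refreshKey query in the pre-aggregation can be read as the following decision, sketched here in plain JavaScript. It assumes that Cube.js rebuilds a partition whenever the value returned by its refreshKey query changes; refreshKeyFor is a hypothetical helper that mirrors the SQL CASE expression and is not part of Cube.js.

```javascript
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;

// Mirrors: CASE WHEN to + interval '3' day > now()
//          THEN date_trunc('hour', now()) END
function refreshKeyFor(partitionEnd, now) {
  if (partitionEnd.getTime() + THREE_DAYS_MS > now.getTime()) {
    // date_trunc('hour', now()): a value that changes once an hour
    const hour = new Date(now.getTime());
    hour.setUTCMinutes(0, 0, 0);
    return hour.toISOString();
  }
  return null; // CASE without ELSE yields NULL, so the key never changes
}

const now = new Date('2019-04-10T12:34:56Z');
console.log(refreshKeyFor(new Date('2019-03-31T23:59:59.999Z'), now)); // null
console.log(refreshKeyFor(new Date('2019-04-30T23:59:59.999Z'), now)); // 2019-04-10T12:00:00.000Z
```

Under that assumption, a monthly partition that ended more than three days ago keeps a constant key and is left alone, while the current partition gets a new key every hour and is refreshed accordingly.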

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js only provides the data, so we need a visualization library; I like recharts, but you can use any.

The Cube.js server accepts a request in JSON format that specifies the required measures. For example, to calculate how many errors Nginx returned per day, you need to send the following request:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
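As a sketch of how such a request could be sent without the client library, the query can be URL-encoded into the load endpoint of the REST API. The base URL below is an assumption that matches the apiUrl used in the dashboard code later in the article, and buildLoadUrl is a hypothetical helper.

```javascript
const query = {
  measures: ['Logs.errorCount'],
  timeDimensions: [
    {
      dimension: 'Logs.createdAt',
      dateRange: ['2019-01-01', '2019-01-07'],
      granularity: 'day',
    },
  ],
};

// Hypothetical helper: builds the GET URL for the /load endpoint.
function buildLoadUrl(apiUrl, q) {
  return `${apiUrl}/load?query=${encodeURIComponent(JSON.stringify(q))}`;
}

// Assumption: a locally running Cube.js server.
const url = buildLoadUrl('http://localhost:4000/cubejs-api/v1', query);
console.log(url);

// The API token is passed in the Authorization header, for example:
// fetch(url, { headers: { Authorization: 'YOUR-CUBEJS-API-TOKEN' } })
//   .then((res) => res.json())
//   .then((body) => console.log(body.data));
```

The client libraries used below do the same work and add loading state handling, which is why the dashboard code relies on them instead of raw requests.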

Install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import cubejs and the QueryRenderer component to fetch the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart width={600} height={300} data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard sources are available on CodeSandbox.

Source: www.habr.com
