Nginx log analytics with Amazon Athena and Cube.js

Typically, commercial products or ready-made open-source alternatives, such as Prometheus + Grafana, are used to monitor and analyze how Nginx is performing. This is a good option for monitoring and real-time analytics, but it is not very convenient for historical analysis. On any popular resource, the volume of data from nginx logs grows quickly, and to analyze that much data it makes sense to use something more specialized.

In this article I will show how you can use Athena to analyze logs, taking Nginx as the example, and how to assemble an analytical dashboard from this data using the open-source Cube.js framework. Here is the complete solution architecture:

[Figure: architecture of the complete solution]

TL;DR:
Link to the finished dashboard.

To collect the data we use Fluentd; for processing, AWS Kinesis Data Firehose and AWS Glue; for storage, AWS S3. With this bundle you can store not only nginx logs but also other events, as well as logs from other services. You can replace some parts with equivalents from your own stack: for example, you can write logs to Kinesis directly from nginx, bypassing Fluentd, or use Logstash for this.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

These lines can be parsed, but it is much easier to adjust the Nginx configuration so that it writes logs as JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
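As a quick sanity check of this format, here is a minimal JavaScript (Node.js) sketch that parses one line shaped like the json_combined output above. The sample line and the helper name parseAccessLogLine are made up for illustration. It relies on two facts: $msec is a Unix time in seconds with millisecond resolution, and it arrives as a string because the format puts it in quotes.

```javascript
// Hypothetical sample line, shaped like the json_combined format above.
const sampleLine = '{ "created_at": "1554803897.123", "remote_addr": "1.1.1.1", ' +
  '"remote_user": "", "request": "GET /sign-up HTTP/2.0", "status": 200, ' +
  '"bytes_sent": 9168, "request_length": 512, "request_time": 0.004, ' +
  '"http_referrer": "https://example.com/sign-in", "http_x_forwarded_for": "", ' +
  '"http_user_agent": "Mozilla/5.0" }';

// Illustrative helper (not part of any library): parse one JSON log line
// and add a proper Date built from the $msec value.
function parseAccessLogLine(line) {
  const record = JSON.parse(line);
  // $msec is seconds since the epoch with millisecond resolution.
  const createdAtMs = Math.round(Number(record.created_at) * 1000);
  return { ...record, createdAt: new Date(createdAtMs) };
}

const parsed = parseAccessLogLine(sampleLine);
console.log(parsed.status, parsed.request, parsed.createdAt.toISOString());
```

Numeric fields such as status and request_time come out as numbers because the format leaves them unquoted, which is what lets them map onto int and double columns later.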

S3 for storage

We will use S3 to store the logs. This lets you store and analyze logs in one place, because Athena can work with data in S3 directly. Later in the article I will explain how to add and process the logs correctly, but first we need a clean bucket in S3 in which nothing else will be stored. It is worth deciding in advance which region to create the bucket in, because Athena is not available in every region.

Creating the schema in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create a table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the selected format, splitting it into directories in YYYY/MM/DD/HH format. This comes in handy when reading the data. You could, of course, write to S3 directly from Fluentd, but then you would have to write JSON, which is inefficient because of the large file sizes. In addition, when using PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and select "Direct PUT" in the "delivery" field.

On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the output format. According to some research by Owen O'Malley, this is the best format for PrestoDB and Athena. As the schema, we specify the table created above. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. But if you specify a different S3 location, you will not be able to read these records through this table.

We select S3 for storage and the bucket we created earlier. AWS Glue Crawler, which I will discuss a little later, cannot work with prefixes in an S3 bucket, so it is important to leave the prefix empty.

The remaining options can be changed to suit your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses its own native compression by default.

Fluentd

Now that storing and receiving the logs is configured, we need to configure sending them. We will use Fluentd, because I love Ruby, but you can use Logstash or send logs to Kinesis directly. The Fluentd server can be launched in several ways; I will cover Docker because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, Docker Hub has a detailed guide, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  fluentd -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to buffer logs before sending. You can do without it, but then a restart can lose everything that was buffered and not yet sent. You can also use any port; 24224 is the default Fluentd port.

Now that Fluentd is running, we can send the Nginx logs to it. We usually run Nginx in a Docker container, in which case Docker has a native logging driver for Fluentd:

$ docker run \
--log-driver=fluentd \
--log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
--log-opt tag="{{.Name}}" \
-v /some/content:/usr/share/nginx/html:ro \
-d \
nginx

If you run Nginx differently, you can use the log files; Fluentd has a file tail plugin.

Let's add parsing of the log format configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And send the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

If you have configured everything correctly, then after a while (by default, Kinesis writes out the data it has received once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data is being written to S3, as well as any errors. Do not forget to give the Kinesis role write access to the S3 bucket. If Kinesis cannot parse something, it adds the errors to the same bucket.

Now you can look at the data in Athena. Let's find the most recent requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Scanning all records on every query

Our logs are now processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into a directory per hour. However, as long as the table is not partitioned, Athena will load the data for all time on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data keeps growing, which slows queries down;
  • Athena is billed by the volume of data scanned, with a minimum of 10 MB per query.

To fix this, we use AWS Glue Crawler, which crawls the data in S3 and writes the partition information to the Glue Metastore. This lets us use partitions as a filter when querying Athena, so that it scans only the directories specified in the query.
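To make the idea of "partitions as a filter" concrete, here is a small JavaScript sketch that builds the partition predicate for one hour of logs. It assumes that the crawler exposes the YYYY/MM/DD/HH directories as columns named partition_0 to partition_3, and that Firehose names those directories using UTC time. The function name hourPartitionPredicate is hypothetical.

```javascript
// Zero-pad a number to two digits, e.g. 4 -> "04".
function pad2(n) {
  return String(n).padStart(2, '0');
}

// Illustrative helper: build a WHERE-clause fragment that limits a query
// to the single YYYY/MM/DD/HH directory containing the given instant.
// Assumption: partition columns are named partition_0..partition_3.
function hourPartitionPredicate(date) {
  const parts = [
    String(date.getUTCFullYear()),
    pad2(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(date.getUTCDate()),
    pad2(date.getUTCHours()),
  ];
  return parts
    .map((value, i) => `partition_${i} = '${value}'`)
    .join(' AND ');
}

console.log(hourPartitionPredicate(new Date(Date.UTC(2019, 3, 8, 6))));
```

The partition values are compared as strings, which is why the zero padding matters: '04' matches the directory name, while '4' would match nothing.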

Setting up Amazon Glue Crawler

Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket in which you store the data. You can use one crawler for several buckets, in which case it creates tables in the specified database with names matching the bucket names. If you plan to use this data regularly, be sure to configure the Crawler's launch schedule to suit your needs. We use one Crawler for all tables, and it runs every hour.

Partitioned tables

After the crawler's first run, a table for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query selects all records received between 6 a.m. and 7 a.m. on April 8, 2019. But how much more efficient is this than simply reading from a non-partitioned table? Let's find out by selecting the same records, this time filtering them by timestamp.

The timestamp-filtered query took 3.59 seconds and scanned 244.34 megabytes of data, on a dataset holding only a week of logs. Now let's try filtering by partition.

A little faster, but most importantly, only 1.23 megabytes of data! It would be far cheaper still if it were not for the 10 megabyte minimum per query in the pricing. It is much better all the same, and on large datasets the difference will be far more impressive.
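The effect of the per-query minimum on these two measurements can be worked out in a few lines of JavaScript. This is a simplified sketch: it applies only the 10 MB minimum described in this article, ignores any rounding the billing may apply, and deliberately leaves out a price per terabyte. The name billedMegabytes is made up.

```javascript
// Per-query billing minimum, as stated in this article.
const MIN_BILLED_MB = 10;

// Illustrative helper: megabytes billed for a query that scanned scannedMb.
function billedMegabytes(scannedMb) {
  return Math.max(scannedMb, MIN_BILLED_MB);
}

const fullScanMb = billedMegabytes(244.34);   // filter by timestamp
const partitionedMb = billedMegabytes(1.23);  // filter by partition

console.log(`scanned ratio: ${(244.34 / 1.23).toFixed(1)}x`);
console.log(`billed ratio:  ${(fullScanMb / partitionedMb).toFixed(1)}x`);
```

On this week-long dataset the partitioned query scans roughly 199 times less data but is billed only about 24 times less, because 1.23 MB is rounded up to the 10 MB floor. Once a single partition holds more than 10 MB, the floor stops mattering and the two ratios converge.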

Building a dashboard with Cube.js

To assemble the dashboard, we use the Cube.js analytical framework. It has quite a lot of features, but two interest us here: the ability to apply partition filters automatically, and data pre-aggregation. It uses a data schema, written in JavaScript, to generate SQL and run the database query. All we need to do is describe in the data schema how the partition filter should be used.

Let's create a new Cube.js application. Since we are already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting options.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in Cube.js. The generator creates an .env file in which you can specify your keys for Athena.

Now we need a data schema, in which we describe exactly how our logs are stored. This is also where you define how to calculate the metrics for the dashboards.

In the schema directory, create a file named Logs.js. Here is an example data model for nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate an SQL query with a partition filter.
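To see what that filter expands to, the partitionFilter helper from the schema can be called on its own. In the sketch below the two quoted timestamps are stand-ins chosen for illustration; in a real query Cube.js supplies the arguments itself from the date range requested on createdAt (as far as I understand, as bound-parameter placeholders rather than literal values).

```javascript
// Same helper as in the schema above, repeated so the sketch runs on its own.
const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `;

// Stand-in arguments (assumption): a one-day range written as SQL literals.
const sql = partitionFilter(
  "'2019-04-08T00:00:00.000Z'",
  "'2019-04-08T23:59:59.999Z'"
);

console.log(sql.trim());
```

The generated condition rebuilds a date from the year, month and day partition columns and keeps only the directories that fall inside the requested range, so Athena never opens the others. The hour partition is not used, which means the finest pruning this filter can do is one day.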

We also define the metrics and parameters that we want to show on the dashboard, and specify pre-aggregations. Cube.js creates additional tables with pre-aggregated data and updates them automatically as new data arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this information to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that data should be pre-aggregated for all the metrics in use, and that the pre-aggregation should be partitioned by month. Partitioning pre-aggregations can significantly speed up collecting and refreshing the data.
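The refreshKey query in that pre-aggregation is compact, so here is its logic restated as plain JavaScript. This is only a model of the SQL CASE expression, under the assumption that Cube.js rebuilds a pre-aggregation partition when the value returned by its refreshKey changes; the function name and the sample dates are invented for the example.

```javascript
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;

// Illustrative model of:
//   CASE WHEN <partition end> + interval '3' day > now()
//        THEN date_trunc('hour', now()) END
// Returns the current hour (as an ISO string) for recent partitions and
// null for partitions that ended more than three days ago.
function refreshKeyFor(partitionEnd, now) {
  if (partitionEnd.getTime() + THREE_DAYS_MS > now.getTime()) {
    const hour = new Date(now.getTime());
    hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour (UTC)
    return hour.toISOString();
  }
  return null; // a constant key: nothing changes, so nothing is rebuilt
}

const now = new Date('2019-04-08T06:42:10.000Z');
const currentMonth = refreshKeyFor(new Date('2019-04-30T23:59:59.999Z'), now);
const previousMonth = refreshKeyFor(new Date('2019-03-31T23:59:59.999Z'), now);

console.log(currentMonth, previousMonth);
```

Under that assumption, the current month's partition gets a new key every hour and is refreshed hourly, while a month that ended more than three days ago keeps a constant key and is left alone, so old data is not rescanned in Athena.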

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js only supplies the data, so we also need a visualization library. I like recharts, but you can use any.

The Cube.js server accepts a query in JSON format that specifies the required metrics. For example, to count how many errors Nginx returned per day, you need to send the following query:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
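For reference, here is a minimal sketch of how such a query could be turned into a REST request URL without the client library. It assumes the /load endpoint of the Cube.js REST API, which (as I understand it) takes the JSON as a URL-encoded query parameter and expects the API token in the Authorization header. The base URL is a local development address and buildLoadUrl is a hypothetical helper.

```javascript
// Assumed local development address of the Cube.js API.
const apiUrl = 'http://localhost:4000/cubejs-api/v1';

const query = {
  measures: ['Logs.errorCount'],
  timeDimensions: [
    {
      dimension: 'Logs.createdAt',
      dateRange: ['2019-01-01', '2019-01-07'],
      granularity: 'day',
    },
  ],
};

// Illustrative helper: serialize the query and URL-encode it.
function buildLoadUrl(baseUrl, q) {
  return `${baseUrl}/load?query=${encodeURIComponent(JSON.stringify(q))}`;
}

const url = buildLoadUrl(apiUrl, query);
console.log(url);
// A real request would also send the API token, for example:
//   fetch(url, { headers: { Authorization: 'YOUR-CUBEJS-API-TOKEN' } })
```

In practice the client library does this for you, so the sketch is mainly useful for debugging a query with curl or a browser.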

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import the cubejs and QueryRenderer components to fetch the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart width={600} height={300} data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard sources are available on CodeSandbox.

Source: will.com
