Nginx log analytics siv Amazon Athena thiab Cube.js

Feem ntau, cov khoom lag luam lossis cov khoom lag luam npaj tau qhib, xws li Prometheus + Grafana, yog siv los saib xyuas thiab txheeb xyuas cov haujlwm ntawm Nginx. Qhov no yog qhov kev xaiv zoo rau kev saib xyuas lossis kev txheeb xyuas lub sijhawm tiag tiag, tab sis tsis yooj yim heev rau kev txheeb xyuas keeb kwm. Ntawm txhua qhov chaw muaj npe nrov, qhov ntim ntawm cov ntaub ntawv los ntawm nginx cav tau loj hlob sai, thiab txhawm rau txheeb xyuas cov ntaub ntawv ntau, nws yog qhov tsim nyog los siv qee yam tshwj xeeb.

Hauv tsab xov xwm no kuv yuav qhia koj seb koj tuaj yeem siv li cas Athena txhawm rau txheeb xyuas cov cav, coj Nginx ua piv txwv, thiab kuv yuav qhia yuav ua li cas los sib sau ua ke cov kev tshuaj ntsuam xyuas dashboard los ntawm cov ntaub ntawv no siv qhov qhib-qhov cube.js moj khaum. Nov yog qhov ua tiav kev daws teeb meem architecture:

Nginx log analytics siv Amazon Athena thiab Cube.js

TL:DR;
Txuas mus rau lub dashboard tiav.

Los sau cov ntaub ntawv peb siv txawj, rau kev ua - AWS Kinesis Data Firehose ΠΈ AWS Kua nplaum, rau ruaj khov - AWS S3. Siv cov pob no, koj tuaj yeem khaws tsis tau tsuas yog nginx cav, tab sis kuj muaj lwm yam xwm txheej, nrog rau cov cav ntawm lwm cov kev pabcuam. Koj tuaj yeem hloov qee qhov chaw nrog cov zoo sib xws rau koj pawg, piv txwv li, koj tuaj yeem sau cov cav rau kinesis ncaj qha los ntawm nginx, hla dhau fluentd, lossis siv logstash rau qhov no.

Sau Nginx cav

Los ntawm lub neej ntawd, Nginx cav zoo li no:

4/9/2019 12:58:17 PM1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
4/9/2019 12:58:17 PM1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

Lawv tuaj yeem raug cais tawm, tab sis nws yooj yim dua los kho Nginx teeb tsa kom nws tsim cov cav hauv JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;

S3 rau kev khaws cia

Txhawm rau khaws cov cav, peb yuav siv S3. Qhov no tso cai rau koj khaws thiab txheeb xyuas cov cav hauv ib qho chaw, txij li Athena tuaj yeem ua haujlwm nrog cov ntaub ntawv hauv S3 ncaj qha. Tom qab ntawv hauv tsab xov xwm kuv yuav qhia koj yuav ua li cas kom raug ntxiv thiab txheej txheem cav, tab sis ua ntej peb xav tau lub thoob huv hauv S3, uas tsis muaj dab tsi ntxiv yuav raug khaws cia. Nws tsim nyog xav txog ua ntej thaj tsam twg koj yuav tsim koj lub thoob, vim Athena tsis muaj nyob hauv txhua cheeb tsam.

Tsim ib lub voj voog hauv Athena console

Cia peb tsim ib lub rooj hauv Athena rau cov cav. Nws yog qhov xav tau rau kev sau ntawv thiab nyeem ntawv yog tias koj npaj yuav siv Kinesis Firehose. Qhib Athena console thiab tsim ib lub rooj:

SQL table creation

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Tsim Kinesis Firehose kwj

Kinesis Firehose yuav sau cov ntaub ntawv tau txais los ntawm Nginx rau S3 hauv cov ntawv xaiv, faib rau hauv cov ntawv hauv YYYY / MM / DD / HH hom. Qhov no yuav ua tau yooj yim thaum nyeem cov ntaub ntawv. Koj tuaj yeem, tau kawg, sau ncaj qha rau S3 los ntawm kev txawj ntse, tab sis qhov no koj yuav tau sau JSON, thiab qhov no tsis muaj txiaj ntsig vim qhov loj ntawm cov ntaub ntawv. Tsis tas li ntawd, thaum siv PrestoDB lossis Athena, JSON yog cov ntaub ntawv qeeb tshaj plaws. Yog li qhib lub Kinesis Firehose console, nyem "Tsim kev xa khoom xa tuaj", xaiv "PUT ncaj qha" hauv "kev xa khoom" teb:

Nginx log analytics siv Amazon Athena thiab Cube.js

Hauv tab tom ntej, xaiv "Cov ntaub ntawv hloov dua siab tshiab" - "Enabled" thiab xaiv "Apache ORC" ua hom ntawv kaw. Raws li qee qhov kev tshawb fawb Owen O'Malley, qhov no yog hom ntawv zoo rau PrestoDB thiab Athena. Peb siv lub rooj peb tsim saum toj no raws li ib tug schema. Thov nco ntsoov tias koj tuaj yeem hais qhia txhua qhov chaw S3 hauv kinesis; tsuas yog cov schema yog siv los ntawm lub rooj. Tab sis yog tias koj teev qhov chaw sib txawv S3, ces koj yuav tsis tuaj yeem nyeem cov ntaub ntawv no los ntawm lub rooj no.

Nginx log analytics siv Amazon Athena thiab Cube.js

Peb xaiv S3 rau kev khaws cia thiab lub thoob uas peb tsim ua ntej. Aws Glue Crawler, uas kuv yuav tham txog me ntsis tom qab, tsis tuaj yeem ua haujlwm nrog cov ntawv sau ua ntej hauv S3 thoob, yog li nws yog qhov tseem ceeb kom cia nws khoob.

Nginx log analytics siv Amazon Athena thiab Cube.js

Cov kev xaiv ntxiv tuaj yeem hloov pauv nyob ntawm koj qhov kev thauj khoom; Kuv feem ntau yog siv cov khoom qub. Nco ntsoov tias S3 compression tsis muaj, tab sis ORC siv ib txwm compression los ntawm lub neej ntawd.

txawj

Tam sim no uas peb tau teeb tsa kev khaws cia thiab txais cov cav, peb yuav tsum teeb tsa kev xa. Peb yuav siv txawj, vim kuv hlub Ruby, tab sis koj tuaj yeem siv Logstash lossis xa cov cav mus rau kinesis ncaj qha. Cov neeg rau zaub mov Fluentd tuaj yeem tso tawm ntau txoj hauv kev, kuv yuav qhia koj txog docker vim nws yooj yim thiab yooj yim.

Ua ntej, peb xav tau cov ntaub ntawv fluent.conf configuration. Tsim nws thiab ntxiv qhov chaw:

hom tom ntej
chaw nres nkoj 24224
sib 0.0.0.0

Tam sim no koj tuaj yeem pib Fluentd server. Yog tias koj xav tau kev teeb tsa siab dua, mus rau Docker hub Muaj cov lus qhia ntxaws ntxaws, suav nrog yuav ua li cas sau koj daim duab.

$ docker run 
  -d 
  -p 24224:24224 
  -p 24224:24224/udp 
  -v /data:/fluentd/log 
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc fluentd 
  -c /fluentd/etc/fluent.conf
  fluent/fluentd:stable

Qhov kev teeb tsa no siv txoj hauv kev /fluentd/log rau cache cav ua ntej xa. Koj tuaj yeem ua yam tsis muaj qhov no, tab sis tom qab ntawd thaum koj rov pib dua, koj tuaj yeem poob txhua yam cached nrog rov qab ua haujlwm. Koj tuaj yeem siv txhua qhov chaw nres nkoj; 24224 yog qhov chaw nres nkoj Fluentd default.

Tam sim no peb muaj Fluentd khiav, peb tuaj yeem xa Nginx cav rau ntawd. Peb feem ntau khiav Nginx hauv Docker thawv, nyob rau hauv rooj plaub no Docker muaj ib tus neeg tsav tsheb nkag rau Fluentd:

$ docker run 
--log-driver=fluentd 
--log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS>
--log-opt tag="{{.Name}}" 
-v /some/content:/usr/share/nginx/html:ro 
-d 
nginx

Yog tias koj khiav Nginx txawv, koj tuaj yeem siv cov ntaub ntawv teev npe, Fluentd muaj file tail plugin.

Wb ntxiv lub cav parsing configured saum toj no rau lub Fluent configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

Thiab xa cov cav mus rau Kinesis siv kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region region
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

Yog tias koj tau teeb tsa txhua yam kom raug, tom qab ib ntus (los ntawm lub neej ntawd, Kinesis cov ntaub ntawv tau txais cov ntaub ntawv ib zaug txhua 10 feeb) koj yuav tsum pom cov ntaub ntawv teev npe hauv S3. Hauv "kev saib xyuas" cov ntawv qhia zaub mov ntawm Kinesis Firehose koj tuaj yeem pom ntau npaum li cas cov ntaub ntawv raug kaw hauv S3, nrog rau qhov yuam kev. Tsis txhob hnov ​​​​qab muab kev nkag mus rau S3 thoob rau lub luag haujlwm Kinesis. Yog tias Kinesis tsis tuaj yeem txheeb xyuas qee yam, nws yuav ntxiv qhov yuam kev rau tib lub thoob.

Tam sim no koj tuaj yeem saib cov ntaub ntawv hauv Athena. Cia peb pom qhov kev thov tshiab uas peb rov qab ua yuam kev:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

Tshawb xyuas txhua cov ntaub ntawv rau txhua qhov kev thov

Tam sim no peb cov cav tau ua tiav thiab khaws cia hauv S3 hauv ORC, compressed thiab npaj rau kev tshuaj xyuas. Kinesis Firehose txawm muab lawv tso rau hauv cov npe rau txhua teev. Txawm li cas los xij, ntev npaum li lub rooj tsis muab faib, Athena yuav thauj cov ntaub ntawv txhua lub sijhawm ntawm txhua qhov kev thov, tsis tshua muaj kev zam. Qhov no yog ib qho teeb meem loj rau ob qho laj thawj:

  • Lub ntim ntawm cov ntaub ntawv yog tas li loj hlob, qeeb cov lus nug;
  • Athena raug nqi raws li qhov ntim ntawm cov ntaub ntawv luam theej duab, nrog tsawg kawg ntawm 10 MB ib qhov kev thov.

Txhawm rau kho qhov no, peb siv AWS Glue Crawler, uas yuav nkag tau cov ntaub ntawv hauv S3 thiab sau cov ntaub ntawv muab faib rau Glue Metastore. Qhov no yuav tso cai rau peb siv cov partitions ua lub lim thaum nug Athena, thiab nws tsuas yog luam theej duab cov npe teev hauv cov lus nug.

Teeb tsa Amazon Glue Crawler

Amazon Glue Crawler scans tag nrho cov ntaub ntawv hauv S3 thoob thiab tsim cov ntxhuav nrog cov faib. Tsim cov kua nplaum Crawler los ntawm AWS Glue console thiab ntxiv ib lub thoob uas koj khaws cov ntaub ntawv. Koj tuaj yeem siv ib lub crawler rau ntau lub thoob, qhov twg nws yuav tsim cov ntxhuav hauv cov ntaub ntawv teev tseg nrog cov npe uas phim cov npe ntawm cov thoob. Yog tias koj npaj yuav siv cov ntaub ntawv no tsis tu ncua, nco ntsoov teeb tsa Crawler lub sijhawm tso tawm kom haum rau koj cov kev xav tau. Peb siv ib lub Crawler rau txhua lub rooj, uas khiav txhua teev.

Cov rooj sib faib

Tom qab thawj zaug tso tawm ntawm lub crawler, cov ntxhuav rau txhua lub thoob scanned yuav tsum tshwm sim hauv cov ntaub ntawv teev tseg hauv cov chaw. Qhib Athena console thiab nrhiav lub rooj nrog Nginx cav. Cia peb sim nyeem qee yam:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

Cov lus nug no yuav xaiv tag nrho cov ntaub ntawv tau txais thaum 6 teev sawv ntxov txog 7 teev sawv ntxov lub Plaub Hlis 8, 2019. Tab sis ntau npaum li cas qhov no ua tau zoo dua li kev nyeem ntawv los ntawm lub rooj tsis sib faib? Cia peb tshawb nrhiav thiab xaiv tib cov ntaub ntawv, lim lawv los ntawm lub sijhawm:

Nginx log analytics siv Amazon Athena thiab Cube.js

3.59 vib nas this thiab 244.34 megabytes ntawm cov ntaub ntawv ntawm cov ntaub ntawv nrog tsuas yog ib lub lim tiam ntawm cov cav. Cia peb sim cov lim los ntawm kev faib tawm:

Nginx log analytics siv Amazon Athena thiab Cube.js

Me ntsis sai dua, tab sis qhov tseem ceeb tshaj - tsuas yog 1.23 megabytes ntawm cov ntaub ntawv! Nws yuav pheej yig dua yog tias tsis yog rau qhov tsawg kawg nkaus 10 megabytes ib qhov kev thov hauv tus nqi. Tab sis nws tseem zoo dua, thiab ntawm cov ntaub ntawv loj qhov sib txawv yuav ua tau zoo dua.

Tsim lub dashboard siv Cube.js

Txhawm rau sib sau cov dashboard, peb siv Cube.js analytical moj khaum. Nws muaj ntau lub zog, tab sis peb xav tau ob: lub peev xwm los txiav txim siab siv cov lim dej sib faib thiab cov ntaub ntawv ua ntej sib sau ua ke. Nws siv cov ntaub ntawv schema cov ntaub ntawv schema, sau hauv Javascript los tsim SQL thiab ua cov lus nug database. Peb tsuas yog yuav tsum tau qhia yuav ua li cas siv cov lim cais hauv cov ntaub ntawv schema.

Wb tsim ib daim ntawv thov Cube.js tshiab. Txij li thaum peb twb tau siv AWS pawg, nws yog qhov tsim nyog siv Lambda rau kev xa tawm. Koj tuaj yeem siv tus qauv qhia rau tiam yog tias koj npaj los tuav lub Cube.js backend hauv Heroku lossis Docker. Cov ntaub ntawv piav qhia lwm tus hosting txoj kev.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Ib puag ncig hloov pauv tau siv los teeb tsa cov ntaub ntawv nkag hauv cube.js. Lub tshuab hluav taws xob yuav tsim cov ntaub ntawv .env uas koj tuaj yeem qhia koj cov yuam sij rau Athena.

Tam sim no peb xav tau cov ntaub ntawv schema, nyob rau hauv uas peb yuav qhia raws nraim li cas peb cov cav yuav khaws cia. Nyob ntawd koj tuaj yeem hais qhia yuav ua li cas xam cov metrics rau dashboards.

Hauv phau ntawv schema, tsim ib cov ntaub ntawv Logs.js. Nov yog ib qho piv txwv cov qauv ntaub ntawv rau nginx:

Qauv code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Ntawm no peb tab tom siv qhov sib txawv FILTER_PARAMSlos tsim ib qho SQL query nrog ib tug muab faib lim.

Peb kuj tau teeb tsa cov kev ntsuas thiab cov ntsuas uas peb xav kom tso rau ntawm lub dashboard thiab qhia cov kev sib sau ua ntej. Cube.js yuav tsim cov rooj ntxiv nrog cov ntaub ntawv sib sau ua ntej thiab yuav hloov kho cov ntaub ntawv thaum nws tuaj txog. Qhov no tsis tsuas yog ceev cov lus nug, tab sis kuj txo tus nqi ntawm kev siv Athena.

Cia peb ntxiv cov ntaub ntawv no rau cov ntaub ntawv schema:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

Peb qhia meej hauv cov qauv no tias nws yog qhov tsim nyog los sau cov ntaub ntawv ua ntej rau txhua qhov kev ntsuas siv, thiab siv kev faib tawm los ntawm lub hli. Pre-aggregation partitioning tuaj yeem ua kom ceev cov ntaub ntawv sau thiab hloov kho.

Tam sim no peb tuaj yeem sib sau ua ke dashboard!

Cube.js backend muab QIV API thiab ib txheej ntawm cov tsev qiv ntawv rau cov neeg siv khoom nrov rau pem hauv ntej-kawg. Peb yuav siv React version ntawm tus neeg siv khoom los tsim lub dashboard. Cube.js tsuas yog muab cov ntaub ntawv, yog li peb yuav xav tau lub tsev qiv ntawv pom pom - Kuv nyiam nws recharts, tab sis koj tuaj yeem siv txhua yam.

Cube.js server lees txais qhov kev thov hauv JSON format, uas qhia txog cov metrics xav tau. Piv txwv li, txhawm rau suav pes tsawg qhov yuam kev Nginx muab los ntawm hnub, koj yuav tsum xa cov lus thov hauv qab no:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}

Cia peb nruab Cube.js tus thov kev pab thiab cov tsev qiv ntawv React tivthaiv ntawm NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

Peb import cov khoom cubejs ΠΈ QueryRenderermus download tau cov ntaub ntawv, thiab sau lub dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

Dashboard qhov chaw muaj nyob ntawm code sandbox.

Tau qhov twg los: www.hab.com

Ntxiv ib saib