Nginx log analytics with Amazon Athena and Cube.js

Commercial products or ready-made open-source options such as Prometheus + Grafana are usually used to monitor and analyze how Nginx is performing. That is a good choice for monitoring or real-time analytics, but it is not well suited for historical analysis. On any popular resource, the volume of data coming from nginx logs grows quickly, and to analyze a large amount of data it makes sense to use something more specialized.

In this article I'll show how you can use Athena to analyze logs, taking Nginx as an example, and how to assemble an analytical dashboard on top of this data using the open-source Cube.js framework. Here is the overall architecture:

[Image: solution architecture]

TL;DR:
Link to the finished dashboard.

We use fluentd to collect the data, AWS Kinesis Data Firehose and AWS Glue for processing, and AWS S3 for storage. With this stack you can store not only nginx logs but also other events, as well as logs of other services. You can replace some parts with equivalents from your own stack; for example, you can write logs to Kinesis directly from nginx, bypassing fluentd, or use logstash for this.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to adjust the Nginx configuration so that it produces logs in JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;

S3 for storage

We'll use S3 to store the logs. This lets you store and analyze the logs in one place, since Athena can work with data in S3 directly. Later in the article I'll explain how to properly add and process the logs, but first we need a clean bucket in S3 in which nothing else will be stored. It is worth thinking ahead about which region you will create the bucket in, because Athena is not available in all regions.
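For reference, creating such a bucket with the AWS CLI might look like the following sketch (the bucket name is a placeholder; pick a region where Athena is available):

$ aws s3 mb s3://<YOUR-S3-BUCKET> --region us-east-1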

Creating a table in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create the table:

Table creation SQL

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose stream

Kinesis Firehose will write the data received from Nginx to S3 in the chosen format, splitting it into directories in the YYYY/MM/DD/HH format. This will come in handy later when reading the data. You could, of course, write directly to S3 from fluentd, but in that case you would have to store JSON, which is inefficient because of the large file sizes. Besides, when using PrestoDB or Athena, JSON is the slowest data format to query. So open the Kinesis Firehose console, click "Create delivery stream", and choose "Direct PUT" as the source:

[Screenshot: creating a Kinesis Firehose delivery stream]

On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the record format. According to Owen O'Malley, one of its creators, this is the optimal format for PrestoDB and Athena. We use the table created above as the schema. Note that you can specify any S3 location in Kinesis; only the schema is taken from the table. But if you specify a different S3 location, you won't be able to read those records from this table.

[Screenshot: record format conversion settings]

Choose S3 as the storage and the bucket we created earlier. The AWS Glue Crawler, which I'll talk about a bit later, cannot work with prefixes inside an S3 bucket, so it is important to leave the prefix empty.

[Screenshot: S3 destination settings]

The remaining options can be adjusted depending on your load; I usually use the defaults. Note that S3 compression is not available, but ORC uses its own compression by default.

Fluentd

Now that we have configured storing and receiving the logs, we need to set up sending them. We'll use Fluentd because I love Ruby, but you can use Logstash or send the logs to Kinesis directly. A Fluentd server can be started in several ways; I'll cover Docker because it is simple and convenient.

First, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, the Docker Hub page has a detailed guide, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to cache logs before sending them. You can do without it, but then when you restart, you may lose everything that was cached. You can also use any port; 24224 is the default Fluentd port.

Now that we have Fluentd running, we can send the Nginx logs there. We usually run Nginx in a Docker container, in which case Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use log files; Fluentd has a file tail plugin, as sketched below.
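A minimal sketch of such a tail source, assuming nginx writes the JSON access log configured above to /var/log/nginx/access.log (paths and the tag are examples; records read this way arrive already parsed, so the parser filter below, which unwraps the Docker log driver's log key, is not needed for them):

<source>
  @type tail
  path /var/log/nginx/access.log
  # position file so Fluentd resumes where it left off after a restart
  pos_file /var/log/fluentd/nginx-access.pos
  # use a tag that matches the <filter> and <match> patterns below
  tag nginx.access
  <parse>
    @type json
  </parse>
</source>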

Let's add the log parsing configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And send the logs to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

If you configured everything correctly, then after a while (by default, Kinesis writes the received data once every 10 minutes) you should see log files in S3. In the "Monitoring" menu of Kinesis Firehose you can see how much data is being written to S3, as well as errors. Don't forget to give the Kinesis role write access to the S3 bucket. If Kinesis cannot parse something, it will add the errors to the same bucket.
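For reference, a minimal sketch of the S3 permissions the Firehose role needs (the bucket name is a placeholder; a real role also needs Glue permissions for the record format conversion configured above):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR-S3-BUCKET>",
        "arn:aws:s3:::<YOUR-S3-BUCKET>/*"
      ]
    }
  ]
}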

Now you can query the data in Athena. Let's find the most recent requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;

All the data is scanned on every query

Now our logs are processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into per-hour directories. However, as long as the table is not partitioned, Athena will load all-time data on every query, with rare exceptions. This is a big problem for two reasons:

  • The amount of data keeps growing, slowing down queries;
  • Athena is billed based on the amount of data scanned, with a minimum of 10 MB per query.

To fix this, we use AWS Glue Crawler, which will crawl the data in S3 and write the partition information to the Glue Metastore. This lets us use partitions as a filter when querying Athena, and it will only scan the directories specified in the query.

Setting up Amazon Glue Crawler

Amazon Glue Crawler scans all the data in the S3 bucket and creates tables with partitions. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it will create tables in the specified database with names matching the bucket names. If you plan to use this data regularly, be sure to configure the crawler's schedule to suit your needs. We use one crawler for all tables, which runs every hour.
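A quick way to sanity-check the crawler's work is to list the partitions it registered in Athena (the table name here is the one used later in this article; yours will match your bucket name):

SHOW PARTITIONS part_demo_kinesis_bucket;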

Partitioned tables

After the first crawler run, tables for each scanned bucket should appear in the database specified in the settings. Open the Athena console and find the table with the Nginx logs. Let's try to read something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query will select all records received between 6 and 7 AM on April 8, 2019. But how much more efficient is this than just reading from a non-partitioned table? Let's find out by selecting the same records, filtering them by timestamp:

[Screenshot: query filtered by timestamp]
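The screenshot itself is not reproduced here, but a timestamp-based query over the same range might look roughly like this sketch (the exact predicate in the original may differ):

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE from_unixtime(created_at)
      BETWEEN timestamp '2019-04-08 06:00:00' AND timestamp '2019-04-08 07:00:00';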

3.59 seconds and 244.34 megabytes of data scanned, on a dataset with only a week of logs. Let's try filtering by partition:

[Screenshot: query filtered by partition]

A little faster, but most importantly, only 1.23 megabytes of data! It would be much cheaper if it weren't for the minimum of 10 megabytes per query in the pricing. But it is still very good, and on large datasets the difference will be far more impressive.

Building a dashboard with Cube.js

To build the dashboard, we use the Cube.js analytical framework. It has quite a lot of features, but we are interested in two: the ability to automatically apply partition filters, and pre-aggregation of data. It uses a data schema, written in JavaScript, to generate SQL and run the database query. We only need to specify how to use the partition filter in the data schema.

Let's create a new Cube.js application. Since we are already using the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting methods.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Environment variables are used to configure database access in cube.js. The generator will create a .env file in which you can specify your keys for Athena.

Now we need the data schema, in which we specify exactly how our logs are stored. There you can also define how to calculate metrics for the dashboards.

In the schema directory, create the file Logs.js. Here is an example data model for the nginx logs:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate a SQL query with a partition filter.
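To make this concrete: for a hypothetical date range of 2019-04-01 to 2019-04-10, the WHERE clause produced by the partitionFilter template above would expand to roughly the following (a sketch; the exact literals Cube.js substitutes may differ):

select * from part_demo_kinesis_bucket
WHERE
  date(from_iso8601_timestamp('2019-04-01T00:00:00.000')) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
  date(from_iso8601_timestamp('2019-04-10T23:59:59.999')) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')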

We also configure the measures and dimensions we want to display on the dashboard, and specify pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will automatically update them as new data arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that data should be pre-aggregated for all the measures used, and we partition the pre-aggregation by month. Partitioning pre-aggregations can significantly speed up aggregation and refreshes.

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We'll use the React version of the client to build the dashboard. Cube.js only provides the data, so we also need a visualization library. I like recharts, but you can use any.

The Cube.js server accepts requests in a JSON format that specifies the required measures. For example, to count how many errors Nginx returned per day, you need to send the following request:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import the cubejs client and the QueryRenderer component to fetch the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard source code is available on CodeSandbox.

source: www.habr.com
