Nginx log analytics with Amazon Athena and Cube.js

Commercial products or ready-made open-source stacks such as Prometheus + Grafana are typically used to monitor and analyze how Nginx is performing. That is a good option for monitoring or real-time analytics, but it is far less convenient for historical analysis. On any popular resource, the volume of data coming from nginx logs grows quickly, and for analyzing large amounts of data it makes sense to use something purpose-built.

In this article I will show how you can use Athena to analyze logs, taking Nginx as an example, and how to assemble an analytical dashboard on top of that data with the open-source cube.js framework. Here is the complete solution architecture:

[Solution architecture diagram]

TL;DR:
Link to the finished dashboard.

To collect the data we use Fluentd, for processing — AWS Kinesis Data Firehose and AWS Glue, and for storage — AWS S3. With this stack you can store not only nginx logs but also other events, as well as the logs of other services. You can swap some parts for their equivalents in your own stack; for example, you can write logs to kinesis directly from nginx, bypassing fluentd, or use logstash for this.

Collecting Nginx logs

By default, Nginx logs look something like this:

1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-up HTTP/2.0" 200 9168 "https://example.com/sign-in" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"
1.1.1.1 - - [09/Apr/2019:09:58:17 +0000] "GET /sign-in HTTP/2.0" 200 9168 "https://example.com/sign-up" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "-"

They can be parsed, but it is much easier to adjust the Nginx configuration so that it emits logs as JSON:

log_format json_combined escape=json '{ "created_at": "$msec", '
            '"remote_addr": "$remote_addr", '
            '"remote_user": "$remote_user", '
            '"request": "$request", '
            '"status": $status, '
            '"bytes_sent": $bytes_sent, '
            '"request_length": $request_length, '
            '"request_time": $request_time, '
            '"http_referrer": "$http_referer", '
            '"http_x_forwarded_for": "$http_x_forwarded_for", '
            '"http_user_agent": "$http_user_agent" }';

access_log  /var/log/nginx/access.log  json_combined;
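
With this format, every access log entry becomes a single JSON object. A sample line might look roughly like this (the field values are illustrative, not taken from a real server):

{ "created_at": "1554803897.000", "remote_addr": "1.1.1.1", "remote_user": "", "request": "GET /sign-up HTTP/2.0", "status": 200, "bytes_sent": 9168, "request_length": 112, "request_time": 0.002, "http_referrer": "https://example.com/sign-in", "http_x_forwarded_for": "", "http_user_agent": "Mozilla/5.0 ..." }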

S3 for storage

We will use S3 to store the logs. This lets us store and analyze the logs in one place, since Athena can work with data in S3 directly. Later in the article I will explain how to add and process the logs properly, but first we need a clean bucket in S3 in which nothing else will be stored. It is worth thinking ahead about which region you will create your bucket in, because Athena is not available in every region.
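
If you prefer the command line, creating the bucket is a one-liner with the AWS CLI. A minimal sketch (the bucket name and region below are placeholders — pick a region where Athena is available):

$ aws s3api create-bucket \
  --bucket <YOUR-S3-BUCKET> \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
# for us-east-1, omit --create-bucket-configuration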

Creating a schema in the Athena console

Let's create a table in Athena for the logs. It is needed for both writing and reading if you plan to use Kinesis Firehose. Open the Athena console and create the table:

SQL to create the table

CREATE EXTERNAL TABLE `kinesis_logs_nginx`(
  `created_at` double, 
  `remote_addr` string, 
  `remote_user` string, 
  `request` string, 
  `status` int, 
  `bytes_sent` int, 
  `request_length` int, 
  `request_time` double, 
  `http_referrer` string, 
  `http_x_forwarded_for` string, 
  `http_user_agent` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://<YOUR-S3-BUCKET>'
TBLPROPERTIES ('has_encrypted_data'='false');

Creating a Kinesis Firehose Stream

Kinesis Firehose will write the data it receives from Nginx to S3 in the chosen format, splitting it into directories in the YYYY/MM/DD/HH format. This comes in handy when reading the data. You could, of course, write to S3 directly from fluentd, but in that case you would have to write JSON, which is inefficient because of the large file sizes. On top of that, when using PrestoDB or Athena, JSON is the slowest data format. So open the Kinesis Firehose console, click "Create delivery stream", and select "Direct PUT" as the source:


On the next tab, set "Record format conversion" to "Enabled" and choose "Apache ORC" as the record format. According to research by Owen O'Malley, this is the optimal format for PrestoDB and Athena. We use the table we created above as the schema. Note that you can specify any S3 location in kinesis; only the schema is taken from the table. But if you specify a different S3 location, you will not be able to read these records from this table.


For storage we choose S3 and the bucket we created earlier. The AWS Glue Crawler, which I will get to a bit later, cannot work with prefixes inside an S3 bucket, so it is important to leave the prefix empty.


The remaining options can be changed depending on your load; I usually keep the defaults. Note that S3 compression is not available, but ORC applies its own compression by default.

Fluentd

Now that storing and receiving the logs is configured, we need to set up sending them. We will use Fluentd, because I love Ruby, but you can use Logstash or send the logs to kinesis directly. A Fluentd server can be started in several ways; I will describe the docker route because it is simple and convenient.

First of all, we need the fluent.conf configuration file. Create it and add a source:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Now you can start the Fluentd server. If you need a more advanced configuration, head over to Docker Hub — there is a detailed guide there, including how to build your own image.

$ docker run \
  -d \
  -p 24224:24224 \
  -p 24224:24224/udp \
  -v /data:/fluentd/log \
  -v <PATH-TO-FLUENT-CONF>:/fluentd/etc \
  fluent/fluentd:stable \
  -c /fluentd/etc/fluent.conf

This configuration uses the /fluentd/log path to buffer log files before sending. You can do without it, but then on a restart you risk losing everything that was so painstakingly buffered. You can use any port; 24224 is Fluentd's default.

Now that Fluentd is running, we can send the Nginx logs to it. We usually run Nginx in a Docker container, and Docker has a native logging driver for Fluentd:

$ docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=<FLUENTD-SERVER-ADDRESS> \
  --log-opt tag="{{.Name}}" \
  -v /some/content:/usr/share/nginx/html:ro \
  -d \
  nginx

If you run Nginx differently, you can use log files instead; Fluentd has a file tail plugin for that.
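
As a rough sketch, a tail source for a plain access log could look like the following (the paths and the tag are placeholders, and it assumes the JSON log format configured above):

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/fluentd/nginx-access.pos
  tag nginx.access
  <parse>
    @type json
  </parse>
</source>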

Let's add parsing of the logs configured above to the Fluentd configuration:

<filter YOUR-NGINX-TAG.*>
  @type parser
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>

And send the data to Kinesis using the kinesis firehose plugin:

<match YOUR-NGINX-TAG.*>
    @type kinesis_firehose
    region <YOUR-AWS-REGION>
    delivery_stream_name <YOUR-KINESIS-STREAM-NAME>
    aws_key_id <YOUR-AWS-KEY-ID>
    aws_sec_key <YOUR_AWS-SEC_KEY>
</match>

Athena

If you set everything up correctly, then after a while (by default Kinesis writes the received data once every 10 minutes) you should see log files in S3. In the "monitoring" menu of Kinesis Firehose you can see how much data is being written to S3, along with any errors. Don't forget to grant the Kinesis role write access to the S3 bucket. If Kinesis fails to parse something, it will add the errors to the same bucket.
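
A quick way to check that files are actually arriving is to list the bucket (the bucket name is a placeholder; because Firehose uses date-based prefixes, the last entries are the most recent):

$ aws s3 ls s3://<YOUR-S3-BUCKET>/ --recursive | tail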

Now you can view the data in Athena. Let's find the most recent requests for which we returned errors:

SELECT * FROM "db_name"."table_name" WHERE status > 499 ORDER BY created_at DESC limit 10;
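
Along the same lines you can aggregate rather than just list rows — for instance, a sketch of a query that counts errors per day (the database and table names are the same placeholders as above):

SELECT date_trunc('day', from_unixtime(created_at)) AS day,
       count(*) AS errors
FROM "db_name"."table_name"
WHERE status > 499
GROUP BY 1
ORDER BY 1;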

Scanning all records on every query

Our logs are now processed and stored in S3 in ORC, compressed and ready for analysis. Kinesis Firehose has even organized them into directories for every hour. However, as long as the table is not partitioned, Athena will load data for all time on every query, with rare exceptions. This is a big problem for two reasons:

  • The volume of data keeps growing, slowing queries down;
  • Athena is billed based on the volume of data scanned, with a minimum of 10 MB per query.

To fix this, we use an AWS Glue Crawler, which will crawl the data in S3 and write the partition information to the Glue Metastore. This lets us use the partitions as a filter when querying Athena, so it scans only the directories specified in the query.

Setting up Amazon Glue Crawler

Amazon Glue Crawler scans all the data in the S3 bucket and creates partitioned tables. Create a Glue Crawler from the AWS Glue console and add the bucket where you store the data. You can use one crawler for several buckets, in which case it will create tables in the specified database with names matching the bucket names. If you plan to use this data regularly, make sure to configure the Crawler's launch schedule to match your needs. We use one Crawler for all tables, and it runs every hour.
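
The same can be done from the CLI. Here is a rough sketch — the crawler name, IAM role, database, and bucket are placeholders, and the cron expression runs it hourly:

$ aws glue create-crawler \
  --name nginx-logs-crawler \
  --role AWSGlueServiceRole-logs \
  --database-name default \
  --targets '{"S3Targets": [{"Path": "s3://<YOUR-S3-BUCKET>/"}]}' \
  --schedule "cron(0 * * * ? *)"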

Partitioned tables

After the crawler's first run, tables for every scanned bucket should appear in the database specified in its settings. Open the Athena console and find the table with the Nginx logs. Let's try reading something:

SELECT * FROM "default"."part_demo_kinesis_bucket"
WHERE(
  partition_0 = '2019' AND
  partition_1 = '04' AND
  partition_2 = '08' AND
  partition_3 = '06'
  );

This query will select all records received between 6 a.m. and 7 a.m. on April 8, 2019. But how much more efficient is this than simply reading from a non-partitioned table? Let's find and select the same records, filtering them by timestamp:

[Screenshot: query run time and data scanned when filtering by timestamp]

3.59 seconds and 244.34 megabytes of data, on a dataset holding only a week of logs. Now let's try the partition filter:

[Screenshot: query run time and data scanned when filtering by partition]

A bit faster, but most importantly — only 1.23 megabytes of data! It would be much cheaper if not for the 10-megabyte-per-query minimum in the pricing. But it is still much better, and on large datasets the difference will be far more impressive.

Building a dashboard with Cube.js

To assemble the dashboard we use the Cube.js analytical framework. It has quite a few capabilities, but two interest us here: the ability to use partition filters and pre-aggregation of data. It uses a data schema, written in Javascript, to generate SQL and run the query against the database. We only need to indicate in the data schema how to apply the partition filter.

Let's create a new Cube.js application. Since we are already on the AWS stack, it makes sense to use Lambda for deployment. You can use the express template for generation if you plan to host the Cube.js backend on Heroku or in Docker. The documentation describes other hosting methods.

$ npm install -g cubejs-cli
$ cubejs create nginx-log-analytics -t serverless -d athena

Database access in cube.js is configured with environment variables. The generator will create a .env file in which you can specify your keys for Athena.
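
For the Athena driver, the .env file will contain roughly the following variables (the values are placeholders; check the generated file and the Cube.js docs for the exact set):

CUBEJS_DB_TYPE=athena
CUBEJS_AWS_KEY=<YOUR-AWS-KEY-ID>
CUBEJS_AWS_SECRET=<YOUR-AWS-SECRET-KEY>
CUBEJS_AWS_REGION=<YOUR-AWS-REGION>
CUBEJS_AWS_S3_OUTPUT_LOCATION=s3://<YOUR-ATHENA-OUTPUT-BUCKET>/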

Now we need the data schema, in which we specify exactly how our logs are stored. There you can also define how to compute metrics for dashboards.

In the schema directory, create the file Logs.js. Here is an example data model for nginx:

Model code

const partitionFilter = (from, to) => `
    date(from_iso8601_timestamp(${from})) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
    date(from_iso8601_timestamp(${to})) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')
    `

cube(`Logs`, {
  sql: `
  select * from part_demo_kinesis_bucket
  WHERE ${FILTER_PARAMS.Logs.createdAt.filter(partitionFilter)}
  `,

  measures: {
    count: {
      type: `count`,
    },

    errorCount: {
      type: `count`,
      filters: [
        { sql: `${CUBE.isError} = 'Yes'` }
      ]
    },

    errorRate: {
      type: `number`,
      sql: `100.0 * ${errorCount} / ${count}`,
      format: `percent`
    }
  },

  dimensions: {
    status: {
      sql: `status`,
      type: `number`
    },

    isError: {
      type: `string`,
      case: {
        when: [{
          sql: `${CUBE}.status >= 400`, label: `Yes`
        }],
        else: { label: `No` }
      }
    },

    createdAt: {
      sql: `from_unixtime(created_at)`,
      type: `time`
    }
  }
});

Here we use the FILTER_PARAMS variable to generate a SQL query with a partition filter.
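
To make the mechanics concrete: for a requested createdAt range of, say, April 1–8, 2019, the WHERE clause produced by partitionFilter would expand roughly like this (an illustration of the shape, not literal Cube.js output):

WHERE
  date(from_iso8601_timestamp('2019-04-01T00:00:00Z')) <= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d') AND
  date(from_iso8601_timestamp('2019-04-08T23:59:59Z')) >= date_parse(partition_0 || partition_1 || partition_2, '%Y%m%d')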

We also define the metrics and dimensions we want to show on the dashboard and specify pre-aggregations. Cube.js will create additional tables with pre-aggregated data and will automatically refresh them as new data arrives. This not only speeds up queries but also reduces the cost of using Athena.

Let's add this to the data schema file:

preAggregations: {
  main: {
    type: `rollup`,
    measureReferences: [count, errorCount],
    dimensionReferences: [isError, status],
    timeDimensionReference: createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: FILTER_PARAMS.Logs.createdAt.filter((from, to) => 
        `select
           CASE WHEN from_iso8601_timestamp(${to}) + interval '3' day > now()
           THEN date_trunc('hour', now()) END`
      )
    }
  }
}

In this model we specify that data should be pre-aggregated for all the metrics used, and partitioned by month. Partitioning pre-aggregations can significantly speed up building and refreshing them.

Now we can assemble the dashboard!

The Cube.js backend provides a REST API and a set of client libraries for popular front-end frameworks. We will use the React version of the client to build the dashboard. Cube.js only supplies the data, so we will also need a visualization library — I like recharts, but you can use any.

The Cube.js server accepts requests in JSON format describing the required metrics. For example, to calculate how many errors Nginx returned per day, you need to send the following request:

{
  "measures": ["Logs.errorCount"],
  "timeDimensions": [
    {
      "dimension": "Logs.createdAt",
      "dateRange": ["2019-01-01", "2019-01-07"],
      "granularity": "day"
    }
  ]
}
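
For a quick sanity check you can send the same query to the REST API directly, e.g. with curl (the token and the URL are placeholders; the query JSON has to be URL-encoded):

$ curl \
  -H "Authorization: <YOUR-CUBEJS-API-TOKEN>" \
  -G "http://localhost:4000/cubejs-api/v1/load" \
  --data-urlencode 'query={"measures":["Logs.errorCount"],"timeDimensions":[{"dimension":"Logs.createdAt","dateRange":["2019-01-01","2019-01-07"],"granularity":"day"}]}'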

Let's install the Cube.js client and the React component library via NPM:

$ npm i --save @cubejs-client/core @cubejs-client/react

We import the cubejs client and the QueryRenderer component to fetch the data, and assemble the dashboard:

Dashboard code

import React from 'react';
import { LineChart, Line, XAxis, YAxis } from 'recharts';
import cubejs from '@cubejs-client/core';
import { QueryRenderer } from '@cubejs-client/react';

const cubejsApi = cubejs(
  'YOUR-CUBEJS-API-TOKEN',
  { apiUrl: 'http://localhost:4000/cubejs-api/v1' },
);

export default () => {
  return (
    <QueryRenderer
      query={{
        measures: ['Logs.errorCount'],
        timeDimensions: [{
            dimension: 'Logs.createdAt',
            dateRange: ['2019-01-01', '2019-01-07'],
            granularity: 'day'
        }]
      }}
      cubejsApi={cubejsApi}
      render={({ resultSet }) => {
        if (!resultSet) {
          return 'Loading...';
        }

        return (
          <LineChart data={resultSet.rawData()}>
            <XAxis dataKey="Logs.createdAt"/>
            <YAxis/>
            <Line type="monotone" dataKey="Logs.errorCount" stroke="#8884d8"/>
          </LineChart>
        );
      }}
    />
  )
}

The dashboard source code is available on CodeSandbox.

Source: www.habr.com
