Ka nānā ʻana i nā kaʻina hana ETL ma kahi hale waihona ʻikepili liʻiliʻi

Nui ka poʻe e hoʻohana i nā mea hana kūikawā no ka hana ʻana i nā kaʻina hana no ka unuhi ʻana, hoʻololi, a me ka hoʻouka ʻana i ka ʻikepili i loko o nā ʻikepili pili. Hoʻopaʻa ʻia ke kaʻina hana o nā mea hana, paʻa nā hewa.

Inā he hewa, loaʻa i ka log ka ʻike i hiki ʻole i ka hāmeʻa ke hoʻopau i ka hana a me nā modules (pinepine java) i kū i kahi. Ma nā laina hope loa, hiki iā ʻoe ke ʻike i kahi hewa ʻikepili, no ka laʻana, kahi kī kī kū hoʻokahi papa.

No ka pane ʻana i ka nīnau he aha ke kuleana o ka ʻike hewa ETL, ua hoʻokaʻawale wau i nā pilikia āpau i loaʻa i nā makahiki ʻelua i hala aku nei i loko o kahi waihona nui.

Ka nānā ʻana i nā kaʻina hana ETL ma kahi hale waihona ʻikepili liʻiliʻi

ʻAʻole lawa ka hakahaka o ka waihona, pili ka nalowale, kau ʻia ka hālāwai, etc.

Loaʻa nā hewa kūpono e like me ka uhaki ʻana i nā kī papaʻaina, nā mea kūpono ʻole, nele i ke komo ʻana i nā mea, etc.
ʻAʻole hoʻomaka ka mea hoʻonohonoho i ka manawa, hiki ke maloʻo, etc.

ʻAʻole lōʻihi ka hoʻoponopono ʻana i nā hewa maʻalahi. Hiki i kahi ETL maikaʻi ke mālama i ka hapa nui o lākou ma kāna iho.

Pono nā ʻōpala paʻakikī e ʻimi a hoʻāʻo i nā kaʻina hana no ka hana ʻana me ka ʻikepili, e ʻimi i nā kumu ʻikepili. Ke alakaʻi pinepine nei i ka pono no ka hoʻololi ʻana i ka hoʻāʻo a me ka hoʻolaha.

No laila, pili ka hapalua o nā pilikia a pau i ka waihona. ʻO 48% o nā hewa āpau he mau hewa maʻalahi.
ʻO ka hapakolu o nā pilikia a pau e pili ana i ka hoʻololi ʻana i ka logic storage a i ʻole model, ʻoi aku ka paʻakikī o ka hapalua o kēia mau hewa.

A emi iho ma lalo o ka hapahā o nā pilikia a pau e pili ana i ka mea hoʻonohonoho hana, 18% o ia mau hewa maʻalahi.

Ma keʻano laulā, he paʻakikī ka 22% o nā hewa a pau, a ʻo kā lākou hoʻoponopono e pono ai ka nānā a me ka manawa. Hana lākou ma kahi o hoʻokahi manawa i ka pule. ʻOiai hiki mai nā hewa maʻalahi i kēlā me kēia lā.

ʻIke ʻia e ʻoi aku ka maikaʻi o ka nānā ʻana i nā kaʻina hana ETL ke hōʻike ʻia ka wahi o ka hewa i ka log me ka pololei a pono ka manawa liʻiliʻi e ʻike i ke kumu o ka pilikia.

Nānā maikaʻi

He aha kaʻu i makemake ai e ʻike ma ke kaʻina hana nānā ETL?

Ka nānā ʻana i nā kaʻina hana ETL ma kahi hale waihona ʻikepili liʻiliʻi
E hoʻomaka i - i kona hoʻomaka ʻana i ka hana,
Puna - kumu ʻikepili,
Layer - he aha ka pae o ka waiho ʻana e hoʻouka ʻia,
ETL Job Name - kaʻina hana hoʻouka, aia i nā pae liʻiliʻi he nui,
Helu ʻanuʻu - ka helu o ka hana ʻana,
Nā lālani i hoʻopili ʻia - pehea ka nui o ka ʻikepili i hoʻoili ʻia,
Ka lōʻihi sec - pehea ka lōʻihi,
Kūlana - inā maikaʻi nā mea a pau a ʻaʻole paha: OK, ERROR, RUNNING, HANGS
Memo - Memo hope loa a i ʻole wehewehe hewa.

Ma muli o ke kūlana o nā moʻolelo, hiki iā ʻoe ke hoʻouna i kahi leka uila. leka i nā lālā ʻē aʻe. Inā ʻaʻohe hewa, ʻaʻole pono ka leka.

No laila, inā he hewa, hōʻike maopopo ʻia kahi o ka hanana.

I kekahi manawa, ʻaʻole hana ka hāmeʻa nānā ponoʻī. I kēia hihia, hiki ke kāhea i kahi nānā (nānā) pololei i ka waihona, ma ke kumu i kūkulu ʻia ai ka hōʻike.

Papa kiaʻi ETL

No ka hoʻokō ʻana i ka nānā ʻana i nā kaʻina hana ETL, hoʻokahi papa a me hoʻokahi ʻike ua lawa.

No ka hana ʻana i kēia, hiki iā ʻoe ke hoʻi i kou wahi waihona liʻiliʻi a hana i ka prototype i ka waihona sqlite.

Nā papa DDL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

Nānā/Hōʻike DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

SQL Checking inā hiki ke kiʻi i kahi helu kau hou

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

Nā hiʻohiʻona papa:

  • pono e hahai ʻia ka hoʻomaka a me ka hopena o ke kaʻina hana ʻikepili e nā ʻanuʻu ETL_START a me ETL_END
  • inā he hewa, pono e hana ʻia ka ʻanuʻu ETL_ERROR me kāna wehewehe
  • ka nui o nā ʻikepili i hoʻoponopono ʻia e hoʻokaʻawale ʻia, no ka laʻana, me nā asterisk
  • hiki ke hoʻomaka i ke kaʻina hana like i ka manawa like me ka force_restart = y parameter, me ka ʻole o ka helu kau e hoʻopuka wale ʻia i ke kaʻina hana i hoʻopau ʻia.
  • ma ke ʻano maʻamau, ʻaʻole hiki iā ʻoe ke holo i ke kaʻina hana hoʻoili ʻikepili like i ka like

ʻO nā hana e pono ai no ka hanaʻana me ka papaʻaina penei:

  • loaʻa i ka helu kau o ke kaʻina hana ETL
  • e hoʻokomo i ka log log into table
  • ka loaʻa ʻana o ka moʻolelo kūleʻa hope loa o kahi kaʻina hana ETL

Ma nā waihona e like me Oracle a i ʻole Postgres, hiki ke hoʻokō ʻia kēia mau hana ma ke ʻano he hana i kūkulu ʻia. Pono ka sqlite i kahi hana waho, a ma kēia hihia prototyped ma PHP.

hopena

No laila, hoʻokani nā mega hewa i nā mea hana hoʻoili ʻikepili i kahi mega-nui. Akā paʻakikī ke kapa ʻana iā lākou ʻoi loa no ka ʻimi wikiwiki ʻana i ke kumu o ka pilikia. Ke hoʻokokoke ka helu o nā kaʻina hana i hoʻokahi haneli, a laila lilo ka nānā ʻana i ke kaʻina hana i kahi papahana paʻakikī.

Hāʻawi ka ʻatikala i kahi laʻana o kahi hoʻonā hiki i ka pilikia ma ke ʻano o kahi prototype. Loaʻa ka prototype waihona liʻiliʻi āpau ma gitlab SQLite PHP ETL Utilities.

Source: www.habr.com

Pākuʻi i ka manaʻo hoʻopuka