Mimojuto awọn ilana ETL ni ile itaja data kekere kan

Ọpọlọpọ lo awọn irinṣẹ amọja lati ṣẹda awọn ilana fun yiyo, iyipada, ati ikojọpọ data sinu awọn apoti isura data ibatan. Ilana ti awọn irinṣẹ iṣẹ ti wa ni ibuwolu wọle, awọn aṣiṣe ti wa ni atunṣe.

Ni ọran ti aṣiṣe kan, akọọlẹ naa ni alaye ti ọpa naa kuna lati pari iṣẹ-ṣiṣe naa ati iru awọn modulu (nigbagbogbo java) duro nibiti. Ni awọn laini to kẹhin, o le wa aṣiṣe data data, fun apẹẹrẹ, irufin bọtini alailẹgbẹ tabili kan.

Lati dahun ibeere ti ipa wo ni alaye aṣiṣe ETL ṣe, Mo ti pin gbogbo awọn iṣoro ti o waye ni ọdun meji sẹhin ni ibi ipamọ nla kan.

Mimojuto awọn ilana ETL ni ile itaja data kekere kan

Awọn aṣiṣe aaye data pẹlu ko to aaye, asopọ ti o sọnu, igba ti o sokọ, ati bẹbẹ lọ.

Awọn aṣiṣe ọgbọn pẹlu iru bi irufin awọn bọtini tabili, awọn nkan ti ko wulo, aini iraye si awọn nkan, ati bẹbẹ lọ.
Oluṣeto le ma bẹrẹ ni akoko, o le di, ati bẹbẹ lọ.

Awọn aṣiṣe ti o rọrun ko gba akoko pipẹ lati ṣatunṣe. ETL ti o dara le mu pupọ julọ ninu wọn funrararẹ.

Awọn idun eka jẹ ki o ṣe pataki lati ṣawari ati idanwo awọn ilana fun ṣiṣẹ pẹlu data, lati ṣawari awọn orisun data. Nigbagbogbo yorisi iwulo fun idanwo iyipada ati imuṣiṣẹ.

Nitorinaa, idaji gbogbo awọn iṣoro ni ibatan si data data. 48% ti gbogbo awọn aṣiṣe jẹ awọn aṣiṣe ti o rọrun.
Ẹkẹta ti gbogbo awọn iṣoro ni ibatan si iyipada iṣaro ipamọ tabi awoṣe, diẹ sii ju idaji awọn aṣiṣe wọnyi jẹ eka.

Ati pe o kere ju idamẹrin gbogbo awọn iṣoro ni o ni ibatan si oluṣeto iṣẹ, 18% eyiti o jẹ awọn aṣiṣe ti o rọrun.

Ni gbogbogbo, 22% ti gbogbo awọn aṣiṣe ti o waye jẹ eka, ati pe atunṣe wọn nilo akiyesi ati akoko pupọ julọ. Wọn ṣẹlẹ lẹẹkan ni ọsẹ kan. Lakoko ti awọn aṣiṣe rọrun n ṣẹlẹ ni gbogbo ọjọ.

O han gbangba pe ibojuwo ti awọn ilana ETL yoo munadoko nigbati ipo ti aṣiṣe ba tọka si ninu log ni deede bi o ti ṣee ṣe ati pe akoko to kere julọ nilo lati wa orisun iṣoro naa.

Abojuto ti o munadoko

Kini MO fẹ lati rii ninu ilana ibojuwo ETL?

Mimojuto awọn ilana ETL ni ile itaja data kekere kan
Bẹrẹ ni - nigbati o bẹrẹ iṣẹ,
Orisun - orisun data,
Layer - kini ipele ibi ipamọ ti n kojọpọ,
Orukọ Job ETL - ilana ikojọpọ, eyiti o ni ọpọlọpọ awọn igbesẹ kekere,
Nọmba Igbesẹ - nọmba ti igbesẹ ti n ṣe,
Awọn ori ila ti o kan - melo ni data ti ni ilọsiwaju tẹlẹ,
Iye iṣẹju-aaya - bawo ni o ṣe pẹ to,
Ipo - boya ohun gbogbo dara tabi rara: O DARA, Aṣiṣe, Nṣiṣẹ, HANGS
Ifiranṣẹ - Ifiranṣẹ aṣeyọri ikẹhin tabi apejuwe aṣiṣe.

Da lori ipo awọn igbasilẹ, o le fi imeeli ranṣẹ. lẹta si awọn ọmọ ẹgbẹ miiran. Ti ko ba si awọn aṣiṣe, lẹhinna lẹta naa ko wulo.

Nitorinaa, ninu iṣẹlẹ ti aṣiṣe, ipo ti isẹlẹ naa jẹ itọkasi kedere.

Nigba miiran o ṣẹlẹ pe ọpa ibojuwo funrararẹ ko ṣiṣẹ. Ni idi eyi, o ṣee ṣe lati pe wiwo (wo) taara ni ibi ipamọ data, lori ipilẹ eyiti a ti kọ ijabọ naa.

ETL monitoring tabili

Lati ṣe ibojuwo ti awọn ilana ETL, tabili kan ati iwo kan ti to.

Lati ṣe eyi, o le pada si ibi ipamọ kekere rẹ ki o si ṣẹda Afọwọkọ ni sqlite database.

Awọn tabili DDL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

Wo/ jabo DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

Ṣiṣayẹwo SQL boya o ṣee ṣe lati gba nọmba igba tuntun kan

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

Awọn ẹya tabili:

  • ibere ati opin ilana ilana data gbọdọ wa ni atẹle nipasẹ awọn igbesẹ ETL_START ati ETL_END
  • ninu ọran ti aṣiṣe, igbesẹ ETL_ERROR pẹlu apejuwe rẹ yẹ ki o ṣẹda
  • iye data ti a ti ni ilọsiwaju yẹ ki o ṣe afihan, fun apẹẹrẹ, pẹlu awọn asterisks
  • Ilana kanna le bẹrẹ ni akoko kanna pẹlu agbara_restart = y paramita, laisi rẹ nọmba igba ti wa ni idasilẹ nikan si ilana ti o pari
  • ni ipo deede, o ko le ṣiṣe ilana ilana data kanna ni afiwe

Awọn iṣẹ pataki fun ṣiṣẹ pẹlu tabili jẹ bi atẹle:

  • gbigba nọmba igba ti ilana ETL nṣiṣẹ
  • fi wiwọle log sinu tabili
  • gbigba igbasilẹ aṣeyọri ti o kẹhin ti ilana ETL kan

Ninu awọn data data bii Oracle tabi Postgres, awọn iṣẹ wọnyi le ṣe imuse bi awọn iṣẹ ti a ṣe sinu. sqlite nilo ẹrọ ita, ati ninu ọran yii o prototyped ni PHP.

ipari

Nitorinaa, awọn ifiranṣẹ aṣiṣe ninu awọn irinṣẹ sisẹ data ṣe ipa pataki-mega. Sugbon o jẹ soro lati pe wọn ti aipe fun ni kiakia a wiwa awọn fa ti awọn isoro. Nigbati nọmba awọn ilana ba sunmọ ọgọrun, lẹhinna ibojuwo ilana yipada si iṣẹ akanṣe eka kan.

Nkan naa pese apẹẹrẹ ti ojutu ti o ṣeeṣe si iṣoro naa ni irisi apẹrẹ kan. Gbogbo apẹrẹ ibi ipamọ kekere wa ni gitlab Awọn ohun elo SQLite PHP ETL.

orisun: www.habr.com

Fi ọrọìwòye kun