Monitoring ETL processes in a small data warehouse

Many people use specialized tools to build procedures that extract, transform, and load data into relational databases. The tools log their work and record any errors.

When an error occurs, the log states that the tool failed to complete the task and shows which modules (often Java) stopped and where. The last lines may contain a database error, such as a violation of a table's unique key.

To answer the question of what role ETL error information actually plays, I classified all the problems that occurred over the last two years in a fairly large warehouse.

Database errors include running out of space, a lost connection, a hung session, and so on.

Logical errors include violations of table keys, invalid objects, missing access to objects, and so on.
Scheduler errors include jobs that do not start on time, jobs that hang, and so on.

Simple errors do not take long to fix. A good ETL can handle most of them on its own.

Complex errors make it necessary to open and examine the data-processing procedures and to investigate the data sources. They often lead to changes that have to be tested and deployed.

So, half of all problems are related to the database. 48% of all errors are simple errors.
A third of all problems are related to changes in the storage logic or model; more than half of these errors are complex.

And less than a third of all problems are related to the task scheduler, 18% of which are simple errors.

Overall, 22% of all errors that occur are complex and require attention and time to fix. They happen about once a week, whereas simple errors happen almost every day.

Clearly, monitoring of ETL processes is effective when the log points to the location of the error as precisely as possible, so that finding the source of the problem takes minimal time.

Effective monitoring

What did I want to see in ETL monitoring?

Start at - when the job started,
Source - the data source,
Layer - which warehouse layer is being loaded,
ETL Job Name - the load procedure, which consists of many small steps,
Step Number - the number of the step being executed,
Affected Rows - how much data has already been processed,
Duration sec - how long the job has been running,
Status - whether everything is fine or not: OK, ERROR, RUNNING, HANGS,
Message - the last successful message or the error description.

Based on the status of the records, you can send an email to the other participants. If there are no errors, no email is needed.
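As an illustration of status-driven notification, here is a minimal Python sketch. The sample rows, the addresses, and the helper name `build_alert` are assumptions made for this example, as is the choice to treat HANGS like ERROR. The sketch only builds the message; sending it (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage

# Hypothetical rows as a monitoring query might return them:
# (job name, status, message)
rows = [
    ("JOB_STG2DM_GEO", "OK", "ETL_END : done"),
    ("RPT_ACCESS_LOG", "ERROR", "ETL_ERROR : unique key violated"),
    ("FTP_MASTER", "HANGS", "STEP_2 : copying files"),
]

def build_alert(rows, sender, recipients):
    """Return an EmailMessage for problem jobs, or None if all is well."""
    # Assumption: hung jobs deserve an email just like failed ones.
    problems = [r for r in rows if r[1] in ("ERROR", "HANGS")]
    if not problems:
        return None  # no errors, so no email is needed
    msg = EmailMessage()
    msg["Subject"] = f"ETL monitoring: {len(problems)} job(s) need attention"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        "\n".join(f"{job} [{status}] {text}" for job, status, text in problems)
    )
    return msg

alert = build_alert(rows, "etl@example.com", ["team@example.com"])
print(alert.get_content())
```

Returning None for a clean run keeps the "no errors, no email" rule in one place, so the caller only has to check whether there is something to send.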

This way, when an error occurs, the location of the incident is clearly indicated.

Sometimes the monitoring tool itself stops working. In that case, you can query the view directly in the database on which the report is built.

ETL monitoring table

To implement monitoring of ETL processes, one table and one view are enough.

To try this out, you can go back to your own small warehouse and create a prototype in an sqlite database.

Table DDL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);
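To show how a job might write into this table, here is a minimal Python sketch using the standard `sqlite3` module and an in-memory database. The helper `log_step`, the step name `LOAD_GEO`, and the choice to store `LOG_D` as midnight UTC of the current day are assumptions for illustration; the source only fixes the column names and the ETL_START, ETL_END, and ETL_ERROR step names.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE UTL_JOB_STATUS (
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID        INTEGER NOT NULL DEFAULT -1,
  LOG_DT     INTEGER NOT NULL DEFAULT 0,
  LOG_D      INTEGER NOT NULL DEFAULT 0,
  JOB_NAME   TEXT NOT NULL DEFAULT 'N/A',
  STEP_NAME  TEXT NOT NULL DEFAULT 'N/A',
  STEP_DESCR TEXT,
  UNIQUE (SID, JOB_NAME, STEP_NAME)
)""")

def log_step(con, sid, job_name, step_name, descr=None):
    """Hypothetical helper: write one log row stamped with the current Unix time."""
    now = int(time.time())
    day = now - now % 86400  # assumption: LOG_D holds midnight UTC of that day
    con.execute(
        "INSERT INTO UTL_JOB_STATUS"
        " (SID, LOG_DT, LOG_D, JOB_NAME, STEP_NAME, STEP_DESCR)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (sid, now, day, job_name, step_name, descr),
    )

# One complete run: start, a working step with a row count, end.
log_step(con, 1, "JOB_STG2DM_GEO", "ETL_START", "start")
log_step(con, 1, "JOB_STG2DM_GEO", "LOAD_GEO", "loaded *** 120 *** rows")
log_step(con, 1, "JOB_STG2DM_GEO", "ETL_END", "done")
con.commit()

steps = [row[0] for row in con.execute(
    "SELECT STEP_NAME FROM UTL_JOB_STATUS WHERE SID = 1"
    " ORDER BY UTL_JOB_STATUS_ID")]
print(steps)
```

Because of the UNIQUE (SID, JOB_NAME, STEP_NAME) constraint, logging the same step twice within one session raises `sqlite3.IntegrityError`, so step names have to be distinct within a run.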

View / report DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
    CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
         WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
         WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
         WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
         WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
         WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
         WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
         ELSE 'N/A' END AS LAYER,
    CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
         WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
         WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
         ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
    SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;
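The STATUS rule in the view above can be restated outside SQL. The following Python sketch mirrors the CASE expression: an ETL_ERROR step wins, an ETL_END step means OK, and a run with neither is HANGS once more than half an hour has passed since its first log record, otherwise RUNNING. The function name `job_status` and its arguments are assumptions for illustration.

```python
import time

HANG_AFTER_SEC = 30 * 60  # same half-hour threshold as in the view

def job_status(step_names, started_at, now=None):
    """Classify one run from its logged step names and start time (Unix seconds)."""
    now = time.time() if now is None else now
    if "ETL_ERROR" in step_names:
        return "ERROR"
    if "ETL_END" in step_names:
        return "OK"
    if now - started_at > HANG_AFTER_SEC:
        return "HANGS"
    return "RUNNING"

t0 = 1_700_000_000  # arbitrary start time used for the demonstration
print(job_status({"ETL_START", "ETL_END"}, t0, now=t0 + 60))
print(job_status({"ETL_START"}, t0, now=t0 + 3600))
```

Passing `now` explicitly makes the rule easy to check without waiting half an hour.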

SQL check of whether a new session number can be issued

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* a started run has no end or error record yet */
        AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
        AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
        AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID
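The two PARAMETER placeholders in the query above can be bound at run time. Here is a minimal Python sketch, assuming the parameter names `:job` and `:force` and a cut-down stand-in table that keeps only the three columns the query reads. The dummy row with SID = -1 is included because the query depends on it.

```python
import sqlite3

CHECK_SQL = """
SELECT SUM(CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL
                 AND NOT (:force = 'y')
           THEN 1 ELSE 0 END) AS IS_RUNNING
FROM (SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE SID = -1) d_job
LEFT OUTER JOIN
  (SELECT JOB_NAME, SID, 1 AS dummy FROM UTL_JOB_STATUS
   WHERE JOB_NAME = :job AND STEP_NAME = 'ETL_START'
   GROUP BY JOB_NAME, SID) start_job
  ON d_job.dummy = start_job.dummy
LEFT OUTER JOIN
  (SELECT JOB_NAME, SID FROM UTL_JOB_STATUS
   WHERE JOB_NAME = :job AND STEP_NAME IN ('ETL_END', 'ETL_ERROR')
   GROUP BY JOB_NAME, SID) end_job
  ON start_job.JOB_NAME = end_job.JOB_NAME AND start_job.SID = end_job.SID
"""

def is_running(con, job_name, force_restart=False):
    """Run the check with bound parameters; returns the number of open runs."""
    params = {"job": job_name, "force": "y" if force_restart else "n"}
    return con.execute(CHECK_SQL, params).fetchone()[0]

# Stand-in table with only the columns the check needs (an assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE UTL_JOB_STATUS (SID INTEGER, JOB_NAME TEXT, STEP_NAME TEXT)")
con.executemany("INSERT INTO UTL_JOB_STATUS VALUES (?, ?, ?)", [
    (-1, "N/A", "N/A"),                  # dummy row the query joins from
    (1, "RPT_ACCESS_LOG", "ETL_START"),
    (1, "RPT_ACCESS_LOG", "ETL_END"),    # session 1 finished
    (2, "RPT_ACCESS_LOG", "ETL_START"),  # session 2 is still open
])
print(is_running(con, "RPT_ACCESS_LOG"))
print(is_running(con, "RPT_ACCESS_LOG", force_restart=True))
```

Binding the job name instead of pasting it into the SQL text avoids quoting problems when job names come from configuration.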

Table Features:

  • the start and end of a data-processing procedure must be marked by the steps ETL_START and ETL_END
  • in case of an error, an ETL_ERROR step must be created with a description of the error
  • the amount of data processed must be highlighted, for example, with asterisks
  • the same procedure can be started concurrently with the force_restart = y parameter; without it, a session number is issued only when the previous run of the procedure has completed
  • in normal mode, the same data-processing procedure cannot be run in parallel
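The asterisk convention can be checked in isolation. The sketch below runs the same SUBSTR/INSTR expression that the view uses for AFFECTED_ROWS against a single description string. The offsets (+4 and -2) suggest that the count is expected in the form `*** N ***`, with one space on each side of the number; the wrapper `affected_rows` and the sample text are assumptions for illustration.

```python
import sqlite3

# The AFFECTED_ROWS expression from the view, with the column replaced
# by a bound parameter :d so it can be tried on one string at a time.
EXPR = """
SUBSTR(SUBSTR(:d, INSTR(:d, '***') + 4), 1,
       INSTR(SUBSTR(:d, INSTR(:d, '***') + 4), '***') - 2)
"""

def affected_rows(descr):
    """Evaluate the expression for one step description (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        return con.execute("SELECT " + EXPR, {"d": descr}).fetchone()[0]
    finally:
        con.close()

print(affected_rows("loaded *** 120 *** rows"))
```

The expression returns text, not a number; the view relies on SUM to convert it when it totals the rows per job.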

The operations needed to work with the table are as follows:

  • getting the session number for the ETL procedure being started
  • inserting a log record into the table
  • getting the last successful record of an ETL procedure

In databases such as Oracle or Postgres, these operations can be implemented as built-in functions. sqlite needs an external mechanism, and in this case it is prototyped in PHP.
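The source prototype is written in PHP and is not reproduced here. As a rough idea of what such an external layer could look like, here is a Python sketch of the first and third operations. The function names, the cut-down stand-in table, and the rule "next session number = highest positive SID + 1" are all assumptions; the open-run test is written with NOT EXISTS rather than the outer joins of the check query.

```python
import sqlite3

def new_session_id(con, job_name, force_restart=False):
    """Return the next SID, or None while an earlier run of the job is open."""
    open_runs = con.execute(
        """SELECT COUNT(*) FROM UTL_JOB_STATUS s
           WHERE s.JOB_NAME = ? AND s.STEP_NAME = 'ETL_START'
             AND NOT EXISTS (SELECT 1 FROM UTL_JOB_STATUS e
                             WHERE e.JOB_NAME = s.JOB_NAME AND e.SID = s.SID
                               AND e.STEP_NAME IN ('ETL_END', 'ETL_ERROR'))""",
        (job_name,)).fetchone()[0]
    if open_runs and not force_restart:
        return None
    # Assumption: session numbers count upwards across all jobs.
    return con.execute(
        "SELECT COALESCE(MAX(SID), 0) + 1 FROM UTL_JOB_STATUS WHERE SID > 0"
    ).fetchone()[0]

def last_success(con, job_name):
    """Return (SID, LOG_DT) of the latest ETL_END row for the job, or None."""
    return con.execute(
        "SELECT SID, LOG_DT FROM UTL_JOB_STATUS"
        " WHERE JOB_NAME = ? AND STEP_NAME = 'ETL_END'"
        " ORDER BY LOG_DT DESC, SID DESC LIMIT 1", (job_name,)).fetchone()

# Stand-in table with only the columns used here (an assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE UTL_JOB_STATUS"
            " (SID INTEGER, LOG_DT INTEGER, JOB_NAME TEXT, STEP_NAME TEXT)")
insert = "INSERT INTO UTL_JOB_STATUS VALUES (?, ?, 'RPT_ACCESS_LOG', ?)"

sid = new_session_id(con, "RPT_ACCESS_LOG")
con.execute(insert, (sid, 100, "ETL_START"))
blocked = new_session_id(con, "RPT_ACCESS_LOG")  # run is open
forced = new_session_id(con, "RPT_ACCESS_LOG", force_restart=True)
con.execute(insert, (sid, 160, "ETL_END"))
print(sid, blocked, forced, last_success(con, "RPT_ACCESS_LOG"))
```

In a real deployment the check and the insert of ETL_START would need to happen in one transaction, otherwise two processes could both be given a session number.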

Conclusion

Thus, error messages in data-processing tools play a hugely important role. But they can hardly be called optimal for quickly finding the cause of a problem. When the number of procedures approaches a hundred, process monitoring turns into a complex project.

The article gives an example of a possible solution to the problem in the form of a prototype. The entire prototype of the small warehouse is available on gitlab as SQLite PHP ETL Utilities.

Source: www.hab.com
