Monitoring ETL processes in a small data warehouse

Many people use specialized tools to build procedures that extract, transform, and load data into relational databases. The tools log their work, and errors are recorded.

When an error occurs, the log says that the tool failed to complete the task and which modules (often Java) stopped where. In the last lines you can find the database error, for example a unique key violation on a table.

To answer the question of what role ETL error information plays, I classified all the problems that occurred over the past two years in a fairly large warehouse.


Database errors include insufficient space, lost connections, hung sessions, and so on.

Logic errors include table key violations, invalid objects, missing access to objects, and so on.
Scheduler errors: the scheduler may fail to start on time, may hang, and so on.

Simple errors do not take long to fix. A good ETL can handle most of them on its own.

Complex errors make it necessary to investigate and test the data processing procedures and to explore the data sources. They often lead to the need for change testing and deployment.

So, half of all problems are related to the database. 48% of all errors are simple errors.
A third of all problems are related to changes in the warehouse logic or model; more than half of these errors are complex.

And less than a quarter of all problems are related to the job scheduler; 18% of them are simple errors.

Overall, 22% of all errors that occurred are complex, and fixing them takes the most attention and time. They happen about once a week, whereas simple errors happen almost every day.

Clearly, monitoring of ETL processes is effective when it pinpoints the location of the error in the log as precisely as possible and requires minimal time to find the source of the problem.

Effective monitoring

What did I want to see in ETL process monitoring?

Start at - when the job started,
Source - the data source,
Layer - which warehouse layer is being loaded,
ETL Job Name - the load procedure, which consists of many small steps,
Step Number - the number of the step being executed,
Affected Rows - how much data has already been processed,
Duration sec - how long the job has taken,
Status - whether everything is fine or not: OK, ERROR, RUNNING, HANGS
Message - the last success message or the error description.

Based on the status of the entries, you can send an e-mail to the other team members. If there are no errors, the e-mail is not needed.

Thus, when an error occurs, the location of the incident is clearly indicated.
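The notification rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_alert`, the dictionary keys, and the decision to treat HANGS as an alert status alongside ERROR are assumptions, not part of the prototype. The sketch builds the message text and leaves the actual sending to whatever mail mechanism is available.

```python
from typing import Optional

# Statuses that trigger a notification (an assumption; the article only
# says that no e-mail is needed when there are no errors).
ALERT_STATUSES = {"ERROR", "HANGS"}


def build_alert(rows: list) -> Optional[str]:
    """Return an e-mail body listing problem jobs, or None if all is well."""
    problems = [r for r in rows if r["status"] in ALERT_STATUSES]
    if not problems:
        return None  # no errors, so no e-mail is needed
    lines = [f'{r["status"]}: {r["job_name"]} - {r["message"]}' for r in problems]
    return "\n".join(lines)


# Hypothetical report rows, shaped like the monitoring columns listed above.
report = [
    {"job_name": "STG_ACCESS_LOG", "status": "OK",
     "message": "ETL_END : done"},
    {"job_name": "RPT_ACCESS_LOG", "status": "ERROR",
     "message": "ETL_ERROR : unique key violation"},
]
body = build_alert(report)
```

Here `body` contains one line for the failed job, and a report with only OK rows yields `None`.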

Sometimes the monitoring tool itself is not working. In that case, you can query the view on which the report is built directly in the database.

ETL monitoring table

To implement monitoring of ETL processes, one table and one view are enough.

To do this, you can go back to your small warehouse and build a prototype in a sqlite database.

Table DDL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging job execution. It is important that each job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session identifier. Unique for every run of a job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

View/Report DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
    CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
         WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
         WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
         WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
         WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
         WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
         WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
         ELSE 'N/A' END AS LAYER,
    CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
         WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
         WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
         ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
    SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;
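The STATUS rule in the view can be easier to follow outside SQL. The sketch below restates the same CASE expression in Python, assuming the flags and the start timestamp have already been aggregated per session as in the JB subquery. The function name `job_status` and the optional `now_ts` parameter (added so the rule can be checked with fixed timestamps) are illustrative only.

```python
import time

HANG_THRESHOLD_SEC = 30 * 60  # half an hour, the same limit the view uses


def job_status(error_flag, end_flag, start_ts, now_ts=None):
    """Mirror the view's CASE: ERROR, then HANGS, then RUNNING, else OK."""
    now_ts = time.time() if now_ts is None else now_ts
    if error_flag == 1:
        return "ERROR"
    if end_flag == 0 and now_ts - start_ts > HANG_THRESHOLD_SEC:
        return "HANGS"  # started, never ended, and older than the limit
    if end_flag == 0:
        return "RUNNING"
    return "OK"
```

The order of the checks matters: an unfinished job is tested against the half-hour limit before it is reported as RUNNING, exactly as in the view.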

SQL to check whether a new session number can be issued

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* a started job has not finished yet */
        AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
        AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
        AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID
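To show how a caller might use this check, here is a minimal Python sketch built on the standard sqlite3 module. It is not the prototype's code: the function name `is_running` is hypothetical, the table is reduced to the three columns the check needs, and the double outer join is replaced by an equivalent NOT EXISTS subquery for brevity.

```python
import sqlite3


def is_running(conn, job_name, force_restart=False):
    """True if job_name has an ETL_START with no ETL_END/ETL_ERROR yet."""
    if force_restart:
        return False  # the force-restart parameter overrides the check
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM UTL_JOB_STATUS s
        WHERE s.JOB_NAME = ? AND s.STEP_NAME = 'ETL_START'
          AND NOT EXISTS (
            SELECT 1 FROM UTL_JOB_STATUS e
            WHERE e.JOB_NAME = s.JOB_NAME AND e.SID = s.SID
              AND e.STEP_NAME IN ('ETL_END', 'ETL_ERROR'))
        """,
        (job_name,),
    ).fetchone()
    return row[0] > 0


# Reduced demo table (an assumption; the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UTL_JOB_STATUS (SID INTEGER, JOB_NAME TEXT, STEP_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO UTL_JOB_STATUS VALUES (?, ?, ?)",
    [
        (1, "RPT_ACCESS_LOG", "ETL_START"),
        (1, "RPT_ACCESS_LOG", "ETL_END"),
        (2, "RPT_ACCESS_LOG", "ETL_START"),  # session 2 is still open
    ],
)
```

With this data, session 2 has started but not ended, so the job counts as running unless a forced restart is requested.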

Table features:

  • the start and end of a data processing procedure must be marked by the steps ETL_START and ETL_END
  • if an error occurs, an ETL_ERROR step with its description should be created
  • the amount of processed data should be highlighted, for example with asterisks
  • the same procedure can be started while it is already running only with the force-restart parameter (force_rstart=y); without it, a session number is issued only to a procedure that has finished
  • in normal mode, the same data processing procedure cannot be run in parallel

The operations needed to work with the table are as follows:

  • getting the session number for the ETL procedure being started
  • inserting a log entry into the table
  • getting the last successful record of an ETL procedure

In databases such as Oracle or Postgres, these operations can be implemented as built-in functions. sqlite requires an external mechanism, and in this case it is prototyped in PHP.
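For readers who do not use PHP, the three operations can be sketched with Python's standard sqlite3 module. This is not the prototype's code: the function names, the "largest SID plus one" numbering rule, and storing LOG_D as the Unix timestamp of UTC midnight are all assumptions made for the example.

```python
import sqlite3
import time

# The table DDL from the article, with comments stripped.
DDL = """
CREATE TABLE UTL_JOB_STATUS (
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1,
  LOG_DT            INTEGER NOT NULL DEFAULT 0,
  LOG_D             INTEGER NOT NULL DEFAULT 0,
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A',
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A',
  STEP_DESCR        TEXT,
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);
"""


def new_session_id(conn):
    """Next session number: one more than the largest SID logged so far."""
    (max_sid,) = conn.execute("SELECT MAX(SID) FROM UTL_JOB_STATUS").fetchone()
    return max(max_sid or 0, 0) + 1  # the seed row has SID = -1


def log_step(conn, sid, job_name, step_name, descr=None):
    """Insert one log entry stamped with the current time."""
    now = int(time.time())
    conn.execute(
        "INSERT INTO UTL_JOB_STATUS "
        "(SID, LOG_DT, LOG_D, JOB_NAME, STEP_NAME, STEP_DESCR) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (sid, now, now - now % 86400, job_name, step_name, descr),
    )


def last_success(conn, job_name):
    """Return (SID, LOG_DT, STEP_DESCR) of the latest ETL_END, or None."""
    return conn.execute(
        "SELECT SID, LOG_DT, STEP_DESCR FROM UTL_JOB_STATUS "
        "WHERE JOB_NAME = ? AND STEP_NAME = 'ETL_END' "
        "ORDER BY LOG_DT DESC, UTL_JOB_STATUS_ID DESC LIMIT 1",
        (job_name,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
sid = new_session_id(conn)
log_step(conn, sid, "RPT_ACCESS_LOG", "ETL_START")
log_step(conn, sid, "RPT_ACCESS_LOG", "ETL_END", "done *** 42 ***")
```

In a real job, the call to `new_session_id` would be guarded by the running-job check shown earlier, so that a session number is issued only when the previous run has finished or a restart is forced.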

Conclusion

So, error messages in data processing tools play a hugely important role. But they can hardly be called optimal for quickly finding the cause of a problem. When the number of procedures approaches a hundred, process monitoring turns into a complex project.

The article gives an example of a possible solution to the problem in the form of a prototype. The whole prototype of the small warehouse is available on gitlab: SQLite PHP ETL Utilities.

Source: www.habr.com
