Kutarisa ETL maitiro mune diki dhata yekuchengetedza

Vanhu vazhinji vanoshandisa maturusi ane hunyanzvi kugadzira maitiro ekubvisa, kushandura, uye kurodha data mumadhatabhesi ehukama. Maitiro ezvishandiso akaiswa, zvikanganiso zvinonyorwa.

Kana paine chikanganiso, irogi rine ruzivo rwekuti chishandiso chakatadza kupedzisa basa uye ndeapi ma module (kazhinji java) akamira kupi. Mitsetse yekupedzisira inogona kunge iine chikanganiso chedatabase, sekutyorwa kwekiyi yakasarudzika yetafura.

Kuti ndipindure mubvunzo wekuti ibasa rei reETL ruzivo rwekukanganisa, ndakaronga matambudziko ese akaitika mumakore maviri apfuura munzvimbo yakakura kwazvo.

Kutarisa ETL maitiro mune diki dhata yekuchengetedza

Mhosho dzeDatabhesi dzinosanganisira senge: pakanga pasina nzvimbo yakakwana, kubatana kwakarasika, chikamu chakarembera, nezvimwe.

Zvikanganiso zvine musoro zvinosanganisira kutyorwa kwemakiyi etafura, zvinhu zvisina basa, kushaikwa kwekuwana zvinhu, nezvimwe.
Iyo scheduler inogona kusatangwa nenguva, inogona kuomesa, nezvimwe.

Zvikanganiso zviri nyore hazvitore nguva yakawanda kugadzirisa. ETL yakanaka inogona kubata mazhinji acho ari ega.

Mhosho dzakaomarara dzinoita kuti zvive zvakakosha kuvhura uye kutarisa maitiro ekubata data uye kuongorora mabviro edata. Kazhinji inotungamira kune kudiwa kwekuyedza shanduko uye kutumira.

Saka, hafu yezvinetso zvose zvine chokuita ne database. 48% yezvikanganiso zvese zviri nyore zvikanganiso.
Chetatu chematambudziko ese ane chekuita neshanduko mune yekuchengetedza logic kana modhi; inopfuura hafu yezvikanganiso izvi zvakaoma.

Uye isingasviki chikamu chechina chematambudziko ese ane hukama neanoronga basa, 18% ayo ari nyore zvikanganiso.

Pakazara, 22% yezvikanganiso zvese zvinoitika zvakaoma uye zvinoda kutariswa zvakanyanya uye nguva yekugadzirisa. Zvinoitika kamwechete pasvondo. Nepo zvikanganiso zviri nyore zvinoitika anenge mazuva ese.

Zviripachena, yekutarisa ETL maitiro anozoshanda kana nzvimbo yekukanganisa ichiratidzwa murogi nemazvo sezvinobvira uye nguva shoma inodiwa kuti uwane kunobva dambudziko.

Kunyatsoongorora

Chii chandaida kuona muETL yekutarisa maitiro?

Kutarisa ETL maitiro mune diki dhata yekuchengetedza
Tanga pa - pandakatanga kushanda,
Kwakabva - data source,
Layer - iyo nhanho yekuchengetedza inotakurwa,
ETL Job Name inzira yekurodha iyo ine akawanda madiki matanho,
Nhanho Nhamba - nhamba yedanho riri kuitwa,
Mitsetse Yakakanganiswa - ingani data rakatogadziriswa,
Duration sec - zvinotora nguva yakareba sei kuita,
Mamiriro - kunyangwe zvese zvakanaka kana kwete: OK, ERROR, RUNNING, HANGES
Meseji - meseji yakabudirira yekupedzisira kana tsananguro yekukanganisa.

Zvichienderana nemamiriro ezvinyorwa, unogona kutumira email. tsamba kune vamwe vatori vechikamu. Kana pasina zvikanganiso, saka tsamba haifaniri.

Nenzira iyi, kana pane kukanganisa, nzvimbo yechiitiko inoratidzwa zvakajeka.

Dzimwe nguva zvinoitika kuti chigadziro chekucherechedza pachacho hachishande. Muchiitiko ichi, zvinokwanisika kudana maonero (kuona) zvakananga mudhesi, pamusana pekuti iyo ripoti inovakwa.

ETL yekutarisa tafura

Kuita yekutarisa kweETL maitiro, tafura imwe uye imwe maonero zvakakwana.

Kuti uite izvi unogona kudzokera kune dura rako pachako uye gadzira prototype mu sqlite dhatabhesi.

DDL matafura

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

Tarisa/report DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

SQL Kutarisa kugona kuwana nhamba itsva yechikamu

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

Tafura Features:

  • kutanga uye kupera kwemaitiro ekugadzirisa data kunofanirwa kuperekedzwa nematanho ETL_START uye ETL_END
  • kana paine chikanganiso, ETL_ERROR nhanho inofanirwa kugadzirwa ine tsananguro yayo
  • huwandu hwe data yakagadziriswa hunofanira kusimbiswa, semuenzaniso, neasterisks
  • nzira imwe chete inogona kutangwa panguva imwe chete neforce_restart = y parameter; pasina iyo, nhamba yemusangano inopihwa chete kune yakapedzwa maitiro.
  • mune yakajairwa modhi hazvigoneke kumhanyisa iyo yakafanana data yekugadzirisa maitiro nenzira yakafanana

Mabasa anodiwa ekushanda netafura ndeaya anotevera:

  • kuwana iyo nhamba yechikamu cheiyo ETL maitiro ari kutangwa
  • kuisa danda rekupinda mutafura
  • kuwana iyo yekupedzisira yakabudirira rekodhi yeETL maitiro

Mune dhatabhesi seOracle kana Postgres, mashandiro aya anogona kuitwa neakavakirwa-mukati mabasa. sqlite inoda imwe nzira yekunze uye munyaya iyi iyo prototyped mu PHP.

mhedziso

Saka, kukanganisa kukanganisa mumaturusi ekugadzirisa data kunoita basa rakakosha. Asi ivo havagone kunzi vakakwana nekukurumidza kutsvaga chikonzero chedambudziko. Kana huwandu hwemaitiro hunosvika zana, kuongorora maitiro kunoshanduka kuita purojekiti yakaoma.

Chinyorwa chinopa muenzaniso wemhinduro inogoneka kune dambudziko nenzira yeprototype. Iyo yese prototype yediki repository inowanikwa mugitlab SQLite PHP ETL Zvishandiso.

Source: www.habr.com

Voeg