Ho beha leihlo lits'ebetso tsa ETL sebakeng se senyenyane sa polokelo ea data

Ba bangata ba sebelisa lisebelisoa tse khethehileng ho theha mekhoa ea ho ntša, ho fetola, le ho kenya data ho li-database tsa likamano. Ts'ebetso ea lisebelisoa tsa ho sebetsa e kenngoa, liphoso lia lokisoa.

Haeba ho na le phoso, log e na le tlhahisoleseding eo sesebelisoa se hlōlehileng ho qeta mosebetsi le hore na ke li-module (hangata java) li emisitseng hokae. Meleng ea ho qetela, u ka fumana phoso ea database, mohlala, tlolo ea bohlokoa ea tafole e ikhethang.

Ho araba potso ea hore na tlhaiso-leseling ea liphoso tsa ETL e bapala karolo efe, ke thathamisitse mathata ohle a etsahetseng lilemong tse peli tse fetileng sebakeng se seholo sa polokelo.

Ho beha leihlo lits'ebetso tsa ETL sebakeng se senyenyane sa polokelo ea data

Liphoso tsa database li kenyelletsa sebaka se lekaneng, khokahanyo e lahlehileng, nako e fanyehiloeng, joalo-joalo.

Liphoso tse utloahalang li kenyelletsa tse kang tlōlo ea linotlolo tsa tafole, lintho tse sa sebetseng, ho se khone ho fumana lintho, joalo-joalo.
Sehlophisi se kanna sa se qale ka nako, se ka hoama, jj.

Liphoso tse bonolo ha li nke nako e telele ho lokisoa. ETL e ntle e ka sebetsana le boholo ba tsona e le mong.

Litšitšili tse rarahaneng li etsa hore ho hlokahale ho sibolla le ho leka mekhoa ea ho sebetsa ka data, ho hlahloba mehloli ea data. Hangata e lebisa tlhokong ea tlhahlobo ea phetoho le ho tsamaisoa.

Kahoo, halofo ea mathata ohle e amana le database. 48% ea liphoso tsohle ke liphoso tse bonolo.
Karolo ea boraro ea mathata ohle a amana le ho fetola mohopolo oa polokelo kapa mohlala, ho feta halofo ea liphoso tsena li rarahane.

'Me ka tlase ho kotara ea mathata ohle a amana le kemiso ea mosebetsi, 18% ea eona e leng liphoso tse bonolo.

Ka kakaretso, 22% ea liphoso tsohle tse etsahalang li rarahane, 'me ho lokisoa ha tsona ho hloka tlhokomelo le nako e ngata. Li etsahala hang ka beke. Athe liphoso tse bonolo li etsahala hoo e ka bang letsatsi le leng le le leng.

Ho totobetse hore ho hlahloba mekhoa ea ETL ho tla sebetsa hantle ha sebaka sa phoso se bontšoa ka har'a log ka nepo ka hohle kamoo ho ka khonehang 'me nako e fokolang e hlokahalang ho fumana mohloli oa bothata.

Tlhokomelo e sebetsang

Ke ne ke batla ho bona eng ts'ebetsong ea ho beha leihlo ETL?

Ho beha leihlo lits'ebetso tsa ETL sebakeng se senyenyane sa polokelo ea data
Qala - ha a qala mosebetsi,
Mohloli - mohloli oa data,
Layer - ke boemo bofe ba polokelo bo ntseng bo kenngoa,
ETL Job Name - ts'ebetso ea ho kenya, e nang le mehato e mengata e menyane,
Nomoro ea Mohato - palo ea mohato o etsoang,
Mela e Amehileng - ke bokae data e seng e sebeditswe,
Duration sec - ho nka nako e kae,
Boemo - hore na ntho e 'ngoe le e' ngoe e lokile kapa che: HO lokile, ERROR, RUNNING, HANGES
Molaetsa - Molaetsa o atlehileng oa ho qetela kapa tlhaloso ea phoso.

Ho latela maemo a lirekoto, o ka romella lengolo-tsoibila. lengolo le eang ho litho tse ling. Haeba ho se na liphoso, joale lengolo ha le hlokehe.

Ka hona, ha ho e-na le phoso, sebaka sa ketsahalo se bontšoa ka ho hlaka.

Ka linako tse ling ho etsahala hore sesebelisoa sa ho shebella ka boeona ha se sebetse. Tabeng ena, hoa khoneha ho letsetsa pono (pono) ka ho toba ho database, motheong oa hore tlaleho e hahiloe.

Tafole ea ho shebella ea ETL

Ho kenya ts'ebetsong ts'ebetso ea ts'ebetso ea ETL, tafole e le 'ngoe le pono e le' ngoe li lekane.

Ho etsa sena, o ka khutlela ho polokelo ea hau e nyane 'me u thehe prototype ho database ea sqlite.

Litafole tsa DL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

Sheba/Tlaleha DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

SQL Ho hlahloba hore na hoa khoneha ho fumana nomoro e ncha ea seboka

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

Litšobotsi tsa tafole:

  • qalo le pheletso ea ts'ebetso ea data e tlameha ho lateloa ke mehato ETL_START le ETL_END
  • ha ho ka ba le phoso, mohato oa ETL_ERROR o nang le tlhaloso e tlameha ho etsoa
  • palo ea data e sebetsitsoeng e lokela ho totobatsoa, ​​mohlala, ka linaleli
  • ts'ebetso e ts'oanang e ka qalisoa ka nako e le 'ngoe le parameter ea force_restart=y, ntle le eona nomoro ea seboka e fanoa feela ka mokhoa o phethiloeng.
  • ka mokhoa o tloaelehileng, o ke ke oa tsamaisa ts'ebetso e ts'oanang ea ts'ebetso ea data ka ho ts'oana

Lits'ebetso tse hlokahalang bakeng sa ho sebetsa le tafole ke tse latelang:

  • ho fumana nomoro ea seboka ea ts'ebetso ea ETL e sebetsang
  • kenya ho kena ka har'a tafole
  • ho fumana rekoto ea ho qetela e atlehileng ea ts'ebetso ea ETL

Litabeng tse kang Oracle kapa Postgres, lits'ebetso tsena li ka kengoa ts'ebetsong joalo ka mesebetsi e hahelletsoeng kahare. sqlite e hloka mochine o ka ntle, 'me tabeng ena prototyped ho PHP.

fihlela qeto e

Kahoo, melaetsa ea liphoso ho lisebelisoa tsa ts'ebetso ea data e bapala karolo ea bohlokoa haholo. Empa ho thata ho li bitsa tse nepahetseng bakeng sa ho fumana sesosa sa bothata kapele. Ha palo ea mekhoa e atamela lekholo, joale ts'ebetso ea ts'ebetso e fetoha morero o rarahaneng.

Sengoliloeng se fana ka mohlala oa tharollo e ka khonehang ea bothata ka mokhoa oa prototype. Sebopeho se senyenyane sa polokelo se fumaneha ho gitlab Lisebelisoa tsa SQLite PHP ETL.

Source: www.habr.com

Eketsa ka tlhaloso