Ukubeka iliso kwiinkqubo ze-ETL kwindawo encinci yokugcina idatha

Abantu abaninzi basebenzisa izixhobo ezikhethekileyo ukwenza iindlela zokukhupha, ukuguqula, kunye nokulayisha idatha kwiinkcukacha zobudlelwane. Inkqubo yezixhobo zilogiwe, iimpazamo zirekhodwa.

Kwimeko yemposiso, ilog iqulethe ulwazi ukuba isixhobo asiphumelelanga ukugqiba umsebenzi kwaye zeziphi iimodyuli (kaninzi java) zamisa apho. Imigca yokugqibela ingaqulatha impazamo yedathabheyisi, enjengokwaphulwa kweqhosha elilodwa letafile.

Ukuphendula umbuzo wokuba yeyiphi indima edlalwa lulwazi lwempazamo ye-ETL, ndihlele zonke iingxaki ezenzeke kule minyaka mibini idlulileyo kwindawo yokugcina enkulu.

Ukubeka iliso kwiinkqubo ze-ETL kwindawo encinci yokugcina idatha

Iimpazamo zeDatabase ziquka: bekungekho ndawo yaneleyo, uxhulumaniso lulahlekile, iseshoni ixhonywe, njl.

Iimpazamo ezinengqiqo ziquka ukuphulwa kwezitshixo zetafile, izinto ezingavumelekanga, ukungabikho kokufikelela kwizinto, njl.
Umcwangcisi akanakusungulwa ngexesha, unokukhenkceka, njl.

Iimpazamo ezilula azithathi xesha lininzi ukulungisa. I-ETL elungileyo inokusingatha uninzi lwazo ngokwayo.

Iimpazamo ezinzima zenza kube yimfuneko ukuvula kunye nokujonga iinkqubo zokuphatha idatha kunye nokuphanda imithombo yedatha. Ngokuqhelekileyo kukhokelela kwimfuno yokuvavanya utshintsho kunye nokuthunyelwa.

Ngoko ke, isiqingatha sazo zonke iingxaki zinxulumene nesiseko sedatha. I-48% yazo zonke iimpazamo ziimpazamo ezilula.
Ingxenye yesithathu yazo zonke iingxaki zihambelana neenguqu kwi-logic yokugcina okanye imodeli;

Kwaye ngaphantsi kwekota yazo zonke iingxaki ezinxulumene nomcwangcisi womsebenzi, i-18% yazo ziimpazamo ezilula.

Lilonke, i-22% yazo zonke iimpazamo ezenzekayo zintsonkothile kwaye zifuna eyona ngqwalasela nexesha lokuzilungisa. Zibakho kanye ngeveki. Nangona iimpazamo ezilula zenzeka phantse yonke imihla.

Ngokucacileyo, ukubeka iliso kwiinkqubo ze-ETL kuya kusebenza xa indawo yephutha iboniswe kwilogi ngokuchanekileyo ngokunokwenzeka kwaye ixesha elincinci lifunekayo ukufumana umthombo wengxaki.

Ukubeka iliso okusebenzayo

Yintoni ebendifuna ukuyibona kwinkqubo yokubeka iliso ye-ETL?

Ukubeka iliso kwiinkqubo ze-ETL kwindawo encinci yokugcina idatha
Ndiqale ngo- xa ndiqala ukusebenza,
Umthombo - umthombo wedatha,
Umaleko - yeyiphi inqanaba lokugcina elilayishiweyo,
Igama lomsebenzi we-ETL yinkqubo yokulayisha equkethe amanyathelo amancinci amaninzi,
Inombolo yenyathelo - inani lenyathelo elenziwayo,
Imiqolo echaphazelekayo- ingakanani idatha esele yenziwe,
Ubude bexesha - kuthatha ixesha elingakanani ukuphumeza,
Ubume - nokuba yonke into ilungile okanye ayilunganga: KULUNGILE, IMPASO, UKUBALEKA, UKUHINGA
Umyalezo β€” umyalezo wokugqibela ophumeleleyo okanye inkcazelo yemposiso.

Ngokusekelwe kubume beerekhodi, ungathumela i-imeyile. ileta eya kwabanye abathathi-nxaxheba. Ukuba akukho ziphoso, ngoko ke ileta ayiyimfuneko.

Ngale ndlela, xa kukho impazamo, indawo yesiganeko iboniswa ngokucacileyo.

Ngamanye amaxesha kwenzeka ukuba isixhobo sokubeka iliso ngokwaso asisebenzi. Kule meko, kunokwenzeka ukubiza umbono (umbono) ngokuthe ngqo kwisiseko sedatha, ngesiseko apho ingxelo yakhiwe khona.

Itheyibhile yokubeka iliso ye-ETL

Ukuphumeza ukubeka iliso kwiinkqubo ze-ETL, itafile enye kunye nombono omnye zanele.

Ukwenza oku ungabuyela ku indawo yakho yokugcina encinci kwaye wenze iprototype kwisiseko sedatha ye-sqlite.

Iitafile ze-DDL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

Jonga/uxela i-DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

SQL Ukujonga ukukwazi ukufumana inombolo yeseshoni entsha

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

Iimpawu zeTheyibhile:

  • isiqalo nesiphelo senkqubo yokugcinwa kwedatha kufuneka sikhatshwe ngamanyathelo ETL_START kunye ne-ETL_END
  • ukuba kukho impazamo, inyathelo le-ETL_ERROR kufuneka lenziwe kunye nenkcazo yalo
  • ubungakanani bedatha ecwangcisiweyo kufuneka igxininiswe, umzekelo, ngeenkwenkwezi
  • inkqubo efanayo inokuqaliswa ngexesha elinye kunye ne-force_restart=y parameter ngaphandle kwayo, inombolo yeseshoni ikhutshwa kuphela kwinkqubo egqityiweyo
  • kwimo yesiqhelo akunakwenzeka ukuqhuba inkqubo yokucwangcisa idatha ngokufanayo

Imisebenzi eyimfuneko yokusebenza ngetheyibhile yile ilandelayo:

  • ukufumana inombolo yeseshoni yenkqubo ye-ETL iqaliswe
  • ifaka ingeniso yelog kwitafile
  • ukufumana ingxelo yokugqibela eyimpumelelo yenkqubo ye-ETL

Kwiinkcukacha ezifana ne-Oracle okanye i-Postgres, le misebenzi inokuphunyezwa ngemisebenzi eyakhelweyo. sqlite ifuna indlela yangaphandle kwaye kulo mzekelo prototyped kwi PHP.

isiphelo

Ke, ingxelo yempazamo kwizixhobo zokucwangcisa idatha idlala indima ebalulekileyo. Kodwa abanakubizwa ngokuba yeyona nto ifanelekileyo yokufumana ngokukhawuleza unobangela wengxaki. Xa inani leenkqubo lisondela kwikhulu, ukubeka iliso kwinkqubo ijika ibe yiprojekthi enzima.

Inqaku linika umzekelo wesisombululo esinokwenzeka kwingxaki ngendlela yeprototype. Yonke iprototype yendawo yokugcina encinci iyafumaneka kwi-gitlab SQLite PHP ETL Utility.

umthombo: www.habr.com

Yongeza izimvo