Monitoring ETL processes in a small data warehouse

Many people use specialized tools to build procedures that extract, transform, and load data into databases. The tool's run is logged and errors are recorded.

When an error occurs, the log says that the tool failed to complete the task and which modules (often Java) stopped where. The last lines may contain a database error, such as a violation of a table's unique key.

To answer the question of what role ETL error information plays, I classified all the problems that occurred over the past two years in a fairly large warehouse.


Database errors include: not enough space, a lost connection, a hung session, etc.

Logical errors include violations of table keys, invalid objects, missing objects, etc.
The scheduler may fail to start on time, may hang, etc.

Simple errors do not take long to fix. A good ETL can handle most of them on its own.

Complex errors make it necessary to open and inspect the data processing procedures and to examine the data sources. They often lead to changes that have to be tested and deployed.

So, half of all problems are related to the database. 48% of all errors are simple errors.
A third of all problems are related to changes in the logic or in the storage model; more than half of these errors are complex.

Less than a quarter of all problems are related to the task scheduler, and 18% of those are simple errors.

Overall, 22% of all errors are complex, and fixing them takes the most attention and time. They happen roughly once a week, while simple errors occur almost every day.

Clearly, monitoring of ETL processes is effective when the log pinpoints the location of the error as precisely as possible, so that finding the source of the problem takes minimal time.

Effective monitoring

What did I want to see in ETL process monitoring?

Start at - when the job started,
Source - the data source,
Layer - which warehouse layer is being loaded,
ETL Job Name - the load procedure, which consists of many small steps,
Step Number - the number of the step being executed,
Affected Rows - how much data has already been processed,
Duration sec - how long the execution takes,
Status - whether everything is fine or not: OK, ERROR, RUNNING, HANGS
Message - the last successful message or the description of the error.

Based on the status of the records, you can send an email to the other participants. If there are no errors, no email is needed.
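The notification rule above could be sketched roughly as follows. This is only an illustration in Python, not the article's code: the function name `build_alert`, the example addresses, and the choice to alert on `ERROR` and `HANGS` are assumptions, and the rows are assumed to be dictionaries with the `STATUS`, `JOB_NAME`, `SID` and `STEP_LOG` columns of the monitoring view.

```python
from email.message import EmailMessage

# Assumption: only these statuses are worth an email; OK and RUNNING are not.
ALERT_STATUSES = {"ERROR", "HANGS"}


def build_alert(rows, sender="etl@example.org", recipients=("team@example.org",)):
    """Return an EmailMessage listing problem jobs, or None if there are none.

    `rows` is an iterable of dicts with the keys STATUS, JOB_NAME, SID, STEP_LOG.
    The addresses are placeholders for illustration only.
    """
    problems = [r for r in rows if r["STATUS"] in ALERT_STATUSES]
    if not problems:
        return None  # no errors, so no email is needed

    msg = EmailMessage()
    msg["Subject"] = f"ETL monitoring: {len(problems)} job(s) need attention"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [
        f'{r["STATUS"]:<6} {r["JOB_NAME"]} (SID {r["SID"]}): {r["STEP_LOG"]}'
        for r in problems
    ]
    msg.set_content("\n".join(lines))
    return msg
```

The returned message could then be handed to `smtplib.SMTP(...).send_message(msg)`; the mail host is deployment-specific and left out here.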

This way, if an error occurs, the place of the incident is clearly indicated.

Sometimes the monitoring tool itself does not work. In that case, you can query the database view that the report is built on directly.
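As a rough sketch of such a direct query, assuming a SQLite database that already contains the `UTL_JOB_STATUS_V` view with its `SID`, `START_DT`, `LOG_DT`, `JOB_NAME`, `STATUS` and `STEP_LOG` columns (the helper name `problem_jobs` and the `limit` parameter are made up for this example):

```python
import sqlite3


def problem_jobs(conn, limit=20):
    """Read the most recent non-OK rows straight from the monitoring view."""
    cur = conn.execute(
        "SELECT SID, LOG_DT, JOB_NAME, STATUS, STEP_LOG "
        "FROM UTL_JOB_STATUS_V "
        "WHERE STATUS <> 'OK' "
        "ORDER BY START_DT DESC "
        "LIMIT ?",
        (limit,),
    )
    columns = [c[0] for c in cur.description]
    # Return plain dicts so callers do not depend on column positions.
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

A call such as `problem_jobs(sqlite3.connect("warehouse.db"))` would list running, hanging and failed jobs; the file name is a placeholder.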

ETL monitoring table

To implement monitoring of ETL processes, one table and one view are enough.

To do this, you can go back to your small warehouse and create a prototype in a sqlite database.

Table DDL

CREATE TABLE UTL_JOB_STATUS (
/* Job execution log. Every job must log the step ETL_START and then ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session identifier, unique for every run of a job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Log date and time, Unix epoch seconds */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Log date, Unix epoch seconds */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

View/report DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
    CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
         WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
         WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
         WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
         WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
         WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
         WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
         ELSE 'N/A' END AS LAYER,
    CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
         WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
         WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
         ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
    SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;
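The STATUS column of the view is derived from three flags and a half-hour threshold. To make that rule easier to read, here is the same decision written as a small Python function. It is only a restatement for illustration; the function name `job_status` and its parameters are not part of the prototype.

```python
import time

# Same threshold as the view: a run without an end record is considered
# hanging once more than half an hour has passed since its first log entry.
HANG_AFTER_SEC = 0.5 * 60 * 60


def job_status(step_names, first_log_ts, now=None):
    """Mirror the CASE expression that produces STATUS in the view.

    step_names   -- the STEP_NAME values logged for one (SID, JOB_NAME) run
    first_log_ts -- earliest LOG_DT of the run, Unix epoch seconds
    now          -- current time in epoch seconds (defaults to time.time())
    """
    now = time.time() if now is None else now
    if "ETL_ERROR" in step_names:
        return "ERROR"
    if "ETL_END" not in step_names:
        if now - first_log_ts > HANG_AFTER_SEC:
            return "HANGS"
        return "RUNNING"
    return "OK"
```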

SQL to check whether a new session number can be obtained

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* a started run has no ETL_END/ETL_ERROR record yet */
        AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID
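The query above has two places marked PARAMETER: the job name and the force-restart flag. A parameterized variant could look roughly like the sketch below. Note that this is a rewritten form using NOT EXISTS rather than the article's outer joins, and the function name `is_running` is an assumption; it is meant to return the same count of unfinished runs.

```python
import sqlite3


def is_running(conn, job_name, force_restart=False):
    """Count runs of job_name that logged ETL_START but no ETL_END/ETL_ERROR.

    With force_restart=True the check is bypassed and 0 is returned,
    which corresponds to the force restart parameter being 'y'.
    """
    if force_restart:
        return 0
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM ( SELECT DISTINCT SID
               FROM UTL_JOB_STATUS
               WHERE JOB_NAME = ? AND STEP_NAME = 'ETL_START' ) s
        WHERE NOT EXISTS (
               SELECT 1
               FROM UTL_JOB_STATUS e
               WHERE e.JOB_NAME = ?
                 AND e.SID = s.SID
                 AND e.STEP_NAME IN ('ETL_END', 'ETL_ERROR') )
        """,
        (job_name, job_name),
    ).fetchone()
    return row[0]
```

A caller would issue a new session number only when the result is 0.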

Table features:

  • the start and the end of a data processing procedure must be accompanied by the steps ETL_START and ETL_END
  • if an error occurs, an ETL_ERROR step must be created with a description of the error
  • the amount of data processed must be highlighted, for example, with asterisks
  • the same procedure can be started concurrently by using the force_restart=y parameter; without it, a session number is issued only for a procedure that has finished
  • in normal mode, the same data processing procedure cannot be run in parallel
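The asterisk convention is what the AFFECTED_ROWS expression in the view relies on: it takes the text between the first two `***` markers, assuming one space on each side of the number. The sketch below, with the hypothetical helpers `describe_rows` and `extract_rows`, builds such a description in Python and runs the view's own SUBSTR/INSTR expression against it in an in-memory SQLite database.

```python
import sqlite3


def describe_rows(text, row_count):
    """Build a STEP_DESCR whose row count is wrapped as '*** N ***'."""
    return f"{text} *** {row_count} ***"


# The extraction expression from the view, applied to a bound value :d
# instead of the STEP_DESCR column.
EXTRACT_SQL = """
SELECT SUBSTR( SUBSTR(:d, INSTR(:d, '***') + 4), 1,
               INSTR(SUBSTR(:d, INSTR(:d, '***') + 4), '***') - 2 )
"""


def extract_rows(step_descr):
    """Return the row count as text, the way the view would see it."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(EXTRACT_SQL, {"d": step_descr}).fetchone()[0]
    finally:
        conn.close()


descr = describe_rows("Loaded", 1500)
print(descr)                # Loaded *** 1500 ***
print(extract_rows(descr))  # 1500
```

Because the offsets `+4` and `-2` are fixed, a description written without the surrounding spaces would not be parsed correctly by that expression.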

The operations needed to work with the table are as follows:

  • getting the session number for the ETL procedure being started
  • inserting a log record into the table
  • getting the last successful record of an ETL procedure

In databases such as Oracle or Postgres, these operations can be implemented as built-in functions. sqlite needs an external mechanism, and in this case it is prototyped in PHP.
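The article's prototype implements these operations in PHP, which is not reproduced here. Purely as an illustration, the three operations could be sketched in Python against the `UTL_JOB_STATUS` table as shown below. The function names, the "max SID plus one" numbering, and storing LOG_D as midnight UTC in epoch seconds are all assumptions; the check for a still-running job is deliberately left out.

```python
import sqlite3
import time


def next_sid(conn):
    """Assumed numbering scheme: highest positive SID so far, plus one."""
    return conn.execute(
        "SELECT IFNULL(MAX(SID), 0) + 1 FROM UTL_JOB_STATUS WHERE SID > 0"
    ).fetchone()[0]


def log_step(conn, sid, job_name, step_name, step_descr=None, now=None):
    """Insert one log record, e.g. ETL_START, a named step, ETL_END or ETL_ERROR."""
    now = int(time.time()) if now is None else int(now)
    day = now - now % 86400  # assumption: LOG_D holds midnight UTC as epoch seconds
    conn.execute(
        "INSERT INTO UTL_JOB_STATUS "
        "(SID, LOG_DT, LOG_D, JOB_NAME, STEP_NAME, STEP_DESCR) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (sid, now, day, job_name, step_name, step_descr),
    )
    conn.commit()


def last_success(conn, job_name):
    """Return (SID, LOG_DT) of the latest ETL_END for job_name, or None."""
    return conn.execute(
        "SELECT SID, LOG_DT FROM UTL_JOB_STATUS "
        "WHERE JOB_NAME = ? AND STEP_NAME = 'ETL_END' "
        "ORDER BY LOG_DT DESC, UTL_JOB_STATUS_ID DESC LIMIT 1",
        (job_name,),
    ).fetchone()
```

A run would then call `next_sid` once, `log_step` with `ETL_START`, one `log_step` per processing step, and finally `log_step` with `ETL_END` or `ETL_ERROR`.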

Conclusion

So, error reporting in data processing tools plays a very important role. But it can hardly be called optimal for quickly finding the cause of a problem. When the number of procedures approaches a hundred, process monitoring turns into a complex project.

The article gives an example of a possible solution to the problem in the form of a prototype. The entire prototype of the small warehouse is available on gitlab: SQLite PHP ETL Utilities.

Source: www.habr.com
