Te aroturuki i nga tukanga ETL i roto i te whare putunga raraunga iti

He maha nga tangata e whakamahi ana i nga taputapu motuhake ki te hanga tikanga mo te tango, te huri, me te uta raraunga ki roto i nga papaunga raraunga hononga. Kua tuhia te tukanga o nga taputapu mahi, kua whakatikahia nga hapa.

Mena he hapa, kei roto i te raarangi nga korero i rahua e te taputapu te whakaoti i te mahi me nga waahanga (he java) i mutu ki hea. I nga rarangi whakamutunga, ka kitea e koe he hapa putunga raraunga, hei tauira, he takahi tepu ahurei.

Hei whakautu i te patai he aha te mahi a nga korero hapa ETL, kua whakarōpūhia e au nga raru katoa i puta i roto i nga tau e rua kua hipa i roto i tetahi putunga nui.

Te aroturuki i nga tukanga ETL i roto i te whare putunga raraunga iti

Kei roto i nga hapa o te Raraunga Raraunga he iti rawa te mokowhiti, ngaro te hononga, iri te huihuinga, aha atu.

Ko nga hapa arorau ko te takahi i nga taviri tepu, nga mea kore-mana, te kore uru ki nga taonga, aha atu.
Kare pea te kaihōtaka e tiimata i te waa, ka whakatio, aha atu.

Ko nga hapa ngawari e kore e roa ki te whakatika. Ka taea e te ETL pai te whakahaere i te nuinga o raatau.

Ko nga pepeha uaua he mea tika kia kitea me te whakamatautau i nga tikanga mo te mahi me nga raraunga, ki te torotoro i nga puna raraunga. I te nuinga o nga wa ka pa ki te hiahia mo te whakamatautau whakarereke me te tuku.

Na, ko te haurua o nga raru katoa e pa ana ki te paataka raraunga. 48% o nga hapa katoa he hapa ngawari.
Ko te tuatoru o nga raru katoa e pa ana ki te whakarereke i te arorau rokiroki, tauira ranei, neke atu i te haurua o enei hapa he uaua.

A iti iho i te hauwhä o nga raru katoa e pa ana ki te kaiwhakarite mahi, 18% o enei he hapa ngawari.

I te nuinga o te waa, ko te 22% o nga hapa katoa ka puta he uaua, me to raatau whakatikatika me te aro nui me te wa. Kotahi pea i te wiki. Ko nga hapa ngawari ka tupu tata i nga ra katoa.

Ma te mohio, ka whai hua te aro turuki i nga tukanga ETL ina tohuhia te waahi hapa i roto i te raarangi ka taea, me te wa iti rawa ki te rapu i te puna o te raru.

Te aroturuki whai hua

He aha taku i hiahia kia kite i roto i te tukanga aroturuki ETL?

Te aroturuki i nga tukanga ETL i roto i te whare putunga raraunga iti
Tīmatahia - i te wa i timata ai ia ki te mahi,
Puna - puna raraunga,
Apa - he aha te taumata o te rokiroki kei te utaina,
Ingoa Mahi ETL - te mahi tuku, he maha nga waahanga iti,
Tau Hipanga - te tau o te taahiraa e mahia ana,
Nga Haupae Paa - e hia nga raraunga kua oti te tukatuka,
Roa hekona - pehea te roa,
Tūnga - ahakoa kei te pai nga mea katoa, kaore ranei: OK, ERROR, RUNNING, HANGS
Karere - Karere angitu whakamutunga, he whakaahuatanga hapa ranei.

I runga i te mana o nga rekoata, ka taea e koe te tuku imeera. reta ki etahi atu mema. Mena kaore he hapa, karekau te reta e tika.

No reira, mena he hapa, ka tino tohuhia te waahi o te aitua.

I etahi wa ka puta ko te taputapu aroturuki ake kaore e mahi. I roto i tenei take, ka taea te karanga i te tirohanga (tirohanga) tika i roto i te papanga raraunga, i runga i te kaupapa i hangaia ai te ripoata.

ETL tepu aroturuki

Hei whakatinana i te aro turuki i nga tukanga ETL, kotahi tepu me te tirohanga kotahi ka nui.

Ki te mahi i tenei, ka taea e koe te hoki ki to rokiroki iti me te hanga tauira i roto i te sqlite pātengi raraunga.

DDL ripanga

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

Tiro/Ripoata DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

Tirohanga SQL mena ka taea te tiki tau hui hou

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

Āhuatanga ripanga:

  • te timatanga me te mutunga o te tukanga tukatuka raraunga me whai i nga hikoinga ETL_START me ETL_END
  • ina he hapa, me hanga te taahiraa ETL_ERROR me tona whakaahuatanga
  • te nui o nga raraunga tukatuka me tohu, hei tauira, me nga whetu
  • Ka taea te timata i taua tikanga i te wa ano me te tawhā force_restart=y, ki te kore ka tukuna te nama hui ki te tikanga kua oti.
  • i roto i te aratau noa, e kore e taea e koe te whakahaere i te taua tukanga tukatuka raraunga i roto i te whakarara

Ko nga mahi e tika ana mo te mahi me te tepu e whai ake nei:

  • te tiki i te tau wahanga o te tikanga whakahaere ETL
  • kōkuhu urunga rangitaki ki te ripanga
  • te tiki i te rekoata angitu whakamutunga o te tikanga ETL

I roto i nga papaa raraunga penei i te Oracle, i te Postgres ranei, ka taea te whakatinana i enei mahi hei mahi whakauru. Ko te sqlite he tikanga o waho, a i tenei keehi he tauira i roto i te PHP.

mutunga

Na, ko nga karere hapa i roto i nga taputapu tukatuka raraunga he mahi nui-mega. Engari he uaua ki te karanga i a raatau tino pai mo te rapu tere i te take o te raru. Ka tata te maha o nga tukanga ki te rau, katahi ka huri te mahi aroturuki hei kaupapa uaua.

Ka whakaratohia e te tuhinga he tauira o te otinga ka taea ki te raruraru i roto i te ahua o te tauira. Kei te waatea te tauira putunga iti katoa i gitlab SQLite PHP ETL Utilities.

Source: will.com

Tāpiri i te kōrero