ื ื™ื˜ื•ืจ ืชื”ืœื™ื›ื™ ETL ื‘ืžื—ืกืŸ ื ืชื•ื ื™ื ืงื˜ืŸ

ืจื‘ื™ื ืžืฉืชืžืฉื™ื ื‘ื›ืœื™ื ืžื™ื•ื—ื“ื™ื ื›ื“ื™ ืœื™ืฆื•ืจ ื ื”ืœื™ื ืœื—ื™ืœื•ืฅ, ื˜ืจื ืกืคื•ืจืžืฆื™ื” ื•ื˜ืขื™ื ืช ื ืชื•ื ื™ื ืœืชื•ืš ืžืกื“ื™ ื ืชื•ื ื™ื ื™ื—ืกื™ื™ื. ืชื”ืœื™ืš ื”ืขื‘ื•ื“ื” ืฉืœ ื›ืœื™ ื”ืขื‘ื•ื“ื” ื ืจืฉื, ืฉื’ื™ืื•ืช ืžืชื•ืงื ื•ืช.

ื‘ืžืงืจื” ืฉืœ ืฉื’ื™ืื”, ื”ื™ื•ืžืŸ ืžื›ื™ืœ ืžื™ื“ืข ืฉื”ื›ืœื™ ืœื ื”ืฆืœื™ื— ืœื”ืฉืœื™ื ืืช ื”ืžืฉื™ืžื” ื•ืื™ืœื• ืžื•ื“ื•ืœื™ื (ืœืขื™ืชื™ื ืงืจื•ื‘ื•ืช Java) ืขืฆืจื• ื”ื™ื›ืŸ. ื‘ืฉื•ืจื•ืช ื”ืื—ืจื•ื ื•ืช, ื ื™ืชืŸ ืœืžืฆื•ื ืฉื’ื™ืืช ืžืกื“ ื ืชื•ื ื™ื, ืœืžืฉืœ, ื”ืคืจืช ืžืคืชื— ื™ื™ื—ื•ื“ื™ ื‘ื˜ื‘ืœื”.

ื›ื“ื™ ืœืขื ื•ืช ืขืœ ื”ืฉืืœื” ืื™ื–ื” ืชืคืงื™ื“ ืžืžืœื ืžื™ื“ืข ืฉื’ื™ืื•ืช ETL, ืกื™ื•ื•ื’ืชื™ ืืช ื›ืœ ื”ื‘ืขื™ื•ืช ืฉื”ืชืจื—ืฉื• ื‘ืžื”ืœืš ื”ืฉื ืชื™ื™ื ื”ืื—ืจื•ื ื•ืช ื‘ืžืื’ืจ ื“ื™ ื’ื“ื•ืœ.

ื ื™ื˜ื•ืจ ืชื”ืœื™ื›ื™ ETL ื‘ืžื—ืกืŸ ื ืชื•ื ื™ื ืงื˜ืŸ

ืฉื’ื™ืื•ืช ืžืกื“ ื ืชื•ื ื™ื ื›ื•ืœืœื•ืช ืœื ืžืกืคื™ืง ืžืงื•ื, ื—ื™ื‘ื•ืจ ืฉืื‘ื“, ืกืฉืŸ ื ืชืงืข ื•ื›ื•'.

ืฉื’ื™ืื•ืช ืœื•ื’ื™ื•ืช ื›ื•ืœืœื•ืช ื›ื’ื•ืŸ ื”ืคืจื” ืฉืœ ืžืคืชื—ื•ืช ื˜ื‘ืœื”, ืื•ื‘ื™ื™ืงื˜ื™ื ืœื ื—ื•ืงื™ื™ื, ื—ื•ืกืจ ื’ื™ืฉื” ืœืื•ื‘ื™ื™ืงื˜ื™ื ื•ื›ื•'.
ื™ื™ืชื›ืŸ ืฉื”ืžืชื–ืžืŸ ืœื ื™ืชื—ื™ืœ ื‘ื–ืžืŸ, ื”ื•ื ืขืœื•ืœ ืœืงืคื•ื ื•ื›ื•'.

ื˜ืขื•ื™ื•ืช ืคืฉื•ื˜ื•ืช ืœื ืœื•ืงื— ื”ืจื‘ื” ื–ืžืŸ ืœืชืงืŸ. ETL ื˜ื•ื‘ ื™ื›ื•ืœ ืœื”ืชืžื•ื“ื“ ืขื ืจื•ื‘ื ื‘ืขืฆืžื•.

ื‘ืื’ื™ื ืžื•ืจื›ื‘ื™ื ืžื—ื™ื™ื‘ื™ื ืœื’ืœื•ืช ื•ืœื‘ื“ื•ืง ื ื”ืœื™ื ืœืขื‘ื•ื“ื” ืขื ื ืชื•ื ื™ื, ืœื—ืงื•ืจ ืžืงื•ืจื•ืช ื ืชื•ื ื™ื. ืœืขืชื™ื ืงืจื•ื‘ื•ืช ืžื•ื‘ื™ืœื™ื ืœืฆื•ืจืš ื‘ื‘ื“ื™ืงื•ืช ื•ืคืจื™ืกื” ืฉืœ ืฉื™ื ื•ื™ื™ื.

ืื–, ืžื—ืฆื™ืช ืžื›ืœ ื”ื‘ืขื™ื•ืช ืงืฉื•ืจื•ืช ืœืžืกื“ ื”ื ืชื•ื ื™ื. 48% ืžื›ืœ ื”ื˜ืขื•ื™ื•ืช ื”ืŸ ื˜ืขื•ื™ื•ืช ืคืฉื•ื˜ื•ืช.
ืฉืœื™ืฉ ืžื›ืœ ื”ื‘ืขื™ื•ืช ืงืฉื•ืจื•ืช ืœืฉื™ื ื•ื™ ืœื•ื’ื™ืงื” ืื• ืžื•ื“ืœ ื”ืื—ืกื•ืŸ, ื™ื•ืชืจ ืžืžื—ืฆื™ืช ืžื”ืฉื’ื™ืื•ืช ื”ืœืœื• ืžื•ืจื›ื‘ื•ืช.

ื•ืคื—ื•ืช ืžืจื‘ืข ืžื›ืœ ื”ื‘ืขื™ื•ืช ืงืฉื•ืจื•ืช ืœืžืชื–ืžืŸ ื”ืžืฉื™ืžื•ืช, 18% ืžื”ืŸ ื”ืŸ ืฉื’ื™ืื•ืช ืคืฉื•ื˜ื•ืช.

ื‘ืื•ืคืŸ ื›ืœืœื™, 22% ืžื›ืœืœ ื”ืฉื’ื™ืื•ืช ื”ืžืชืจื—ืฉื•ืช ื”ืŸ ืžื•ืจื›ื‘ื•ืช, ื•ืชื™ืงื•ื ืŸ ื“ื•ืจืฉ ืืช ืžื™ืจื‘ ืชืฉื•ืžืช ื”ืœื‘ ื•ื”ื–ืžืŸ. ื”ื ืงื•ืจื™ื ื‘ืขืจืš ืคืขื ื‘ืฉื‘ื•ืข. ื•ืื™ืœื• ื˜ืขื•ื™ื•ืช ืคืฉื•ื˜ื•ืช ืงื•ืจื•ืช ื›ืžืขื˜ ื›ืœ ื™ื•ื.

ื‘ืจื•ืจ, ื ื™ื˜ื•ืจ ืชื”ืœื™ื›ื™ ETL ื™ื”ื™ื” ื™ืขื™ืœ ื›ืืฉืจ ืžื™ืงื•ื ื”ืฉื’ื™ืื” ื™ืฆื•ื™ืŸ ื‘ื™ื•ืžืŸ ื‘ืฆื•ืจื” ืžื“ื•ื™ืงืช ื›ื›ืœ ื”ืืคืฉืจ ื•ื ื“ืจืฉ ื”ื–ืžืŸ ื”ืžื™ื ื™ืžืœื™ ื›ื“ื™ ืœืžืฆื•ื ืืช ืžืงื•ืจ ื”ื‘ืขื™ื”.

ื ื™ื˜ื•ืจ ื™ืขื™ืœ

ืžื” ืจืฆื™ืชื™ ืœืจืื•ืช ื‘ืชื”ืœื™ืš ื ื™ื˜ื•ืจ ETL?

ื ื™ื˜ื•ืจ ืชื”ืœื™ื›ื™ ETL ื‘ืžื—ืกืŸ ื ืชื•ื ื™ื ืงื˜ืŸ
ื”ืชื—ืœ ื‘ - ื›ืฉื”ื•ื ื”ืชื—ื™ืœ ืœืขื‘ื•ื“,
ืžืงื•ืจ - ืžืงื•ืจ ื ืชื•ื ื™ื,
ืฉื›ื‘ื” - ืื™ื–ื• ืจืžืช ืื—ืกื•ืŸ ื ื˜ืขืŸ,
ืฉื ืขื‘ื•ื“ื” ETL - ื”ืœื™ืš ื”ืขืœืื”, ื”ืžื•ืจื›ื‘ ืžื”ืจื‘ื” ืฉืœื‘ื™ื ืงื˜ื ื™ื,
ืžืกืคืจ ืฆืขื“ - ืžืกืคืจ ื”ืฆืขื“ ื”ืžืชื‘ืฆืข,
ืฉื•ืจื•ืช ืžื•ืฉืคืขื•ืช - ื›ืžื” ื ืชื•ื ื™ื ื›ื‘ืจ ืขื•ื‘ื“ื•,
ืžืฉืš ืฉื ื™ื” - ื›ืžื” ื–ืžืŸ ื–ื” ืœื•ืงื—,
ืกื˜ื˜ื•ืก - ืื ื”ื›ืœ ื‘ืกื“ืจ ืื• ืœื: ื‘ืกื“ืจ, ืฉื’ื™ืื”, ืจื™ืฆื”, ื ืชืงืข
ื”ื•ื“ืขื” - ื”ื•ื“ืขื” ืžื•ืฆืœื—ืช ืื—ืจื•ื ื” ืื• ืชื™ืื•ืจ ืฉื’ื™ืื”.

ื‘ื”ืชื‘ืกืก ืขืœ ืžืฆื‘ ื”ืขืจื›ื™ื, ื ื™ืชืŸ ืœืฉืœื•ื— ืžื™ื™ืœ. ืžื›ืชื‘ ืœื—ื‘ืจื™ื ืื—ืจื™ื. ืื ืื™ืŸ ืฉื’ื™ืื•ืช, ืื– ื”ืžื›ืชื‘ ืื™ื ื• ื”ื›ืจื—ื™.

ื›ืš, ื‘ืžืงืจื” ืฉืœ ื˜ืขื•ืช, ืžืฆื•ื™ืŸ ืžื™ืงื•ื ื”ืื™ืจื•ืข ื‘ืฆื•ืจื” ื‘ืจื•ืจื”.

ืœืคืขืžื™ื ืงื•ืจื” ืฉื›ืœื™ ื”ื ื™ื˜ื•ืจ ืขืฆืžื• ืœื ืขื•ื‘ื“. ื‘ืžืงืจื” ื–ื”, ื ื™ืชืŸ ืœืงืจื•ื ืœ-view (view) ื™ืฉื™ืจื•ืช ื‘ืžืกื“ ื”ื ืชื•ื ื™ื ืฉืขืœ ื‘ืกื™ืกื• ื ื‘ื ื” ื”ื“ื•ื—.

ื˜ื‘ืœืช ื ื™ื˜ื•ืจ ETL

ื›ื“ื™ ืœื™ื™ืฉื ื ื™ื˜ื•ืจ ืฉืœ ืชื”ืœื™ื›ื™ ETL, ืžืกืคื™ืงื™ื ื˜ื‘ืœื” ืื—ืช ื•ืชืฆื•ื’ื” ืื—ืช.

ื›ื“ื™ ืœืขืฉื•ืช ื–ืืช, ืืชื” ื™ื›ื•ืœ ืœื—ื–ื•ืจ ืืœ ื”ืื—ืกื•ืŸ ื”ืงื˜ืŸ ืฉืœืš ื•ืœื™ืฆื•ืจ ืื‘ ื˜ื™ืคื•ืก ื‘ืžืกื“ ื ืชื•ื ื™ื ืฉืœ sqlite.

ื˜ื‘ืœืื•ืช DDL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

ื”ืฆื’/ื“ื•ื•ื— ืขืœ DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

SQL ื‘ื•ื“ืง ืื ืืคืฉืจ ืœืงื‘ืœ ืžืกืคืจ ื”ืคืขืœื” ื—ื“ืฉ

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

ืชื›ื•ื ื•ืช ื”ื˜ื‘ืœื”:

  • ื™ืฉ ืœื‘ืฆืข ืืช ื”ื”ืชื—ืœื” ื•ื”ืกื•ืฃ ืฉืœ ื”ืœื™ืš ืขื™ื‘ื•ื“ ื”ื ืชื•ื ื™ื ืขืœ ื™ื“ื™ ื”ืฉืœื‘ื™ื ETL_START ื•-ETL_END
  • ื‘ืžืงืจื” ืฉืœ ืฉื’ื™ืื”, ื™ืฉ ืœื™ืฆื•ืจ ืืช ื”ืฉืœื‘ ETL_ERROR ืขื ื”ืชื™ืื•ืจ ืฉืœื•
  • ื™ืฉ ืœื”ื“ื’ื™ืฉ ืืช ื›ืžื•ืช ื”ื ืชื•ื ื™ื ื”ืžืขื•ื‘ื“ื™ื, ืœืžืฉืœ, ื‘ื›ื•ื›ื‘ื™ื•ืช
  • ื ื™ืชืŸ ืœื”ืชื—ื™ืœ ืืช ืื•ืชื• ื”ืœื™ืš ื‘ื•-ื–ืžื ื™ืช ืขื ื”ืคืจืžื˜ืจ force_restart=y, ื‘ืœืขื“ื™ื• ืžืกืคืจ ื”ืคื’ื™ืฉื” ืžื•ื ืคืง ืจืง ืœื”ืœื™ืš ืฉื”ื•ืฉืœื
  • ื‘ืžืฆื‘ ืจื’ื™ืœ, ืœื ื ื™ืชืŸ ืœื”ืคืขื™ืœ ืืช ืื•ืชื• ื”ืœื™ืš ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืžืงื‘ื™ืœ

ื”ืคืขื•ืœื•ืช ื”ื“ืจื•ืฉื•ืช ืœืขื‘ื•ื“ื” ืขื ื˜ื‘ืœื” ื”ืŸ ื›ื“ืœืงืžืŸ:

  • ืงื‘ืœืช ืžืกืคืจ ื”ืคื’ื™ืฉื” ืฉืœ ื”ืœื™ืš ื”-ETL ื”ืคื•ืขืœ
  • ื”ื•ืกืฃ ืจื™ืฉื•ื ื™ื•ืžืŸ ืœื˜ื‘ืœื”
  • ืงื‘ืœืช ื”ืจืฉื•ืžื” ื”ืžื•ืฆืœื—ืช ื”ืื—ืจื•ื ื” ืฉืœ ื”ืœื™ืš ETL

ื‘ื‘ืกื™ืกื™ ื ืชื•ื ื™ื ื›ื’ื•ืŸ Oracle ืื• Postgres, ื ื™ืชืŸ ืœื™ื™ืฉื ืืช ื”ืคืขื•ืœื•ืช ื”ืœืœื• ื›ืคื•ื ืงืฆื™ื•ืช ืžื•ื‘ื ื•ืช. sqlite ื“ื•ืจืฉ ืžื ื’ื ื•ืŸ ื—ื™ืฆื•ื ื™, ื•ื‘ืžืงืจื” ื–ื” ื”ื•ื ืื‘ ื˜ื™ืคื•ืก ื‘-PHP.

ืคืœื˜

ืœืคื™ื›ืš, ื”ื•ื“ืขื•ืช ืฉื’ื™ืื” ื‘ื›ืœื™ื ืœืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ืžืžืœืื•ืช ืชืคืงื™ื“ ืžื’ื”-ื—ืฉื•ื‘. ืื‘ืœ ืงืฉื” ืœืงืจื•ื ืœื”ื ืื•ืคื˜ื™ืžืœื™ื™ื ืœืื™ืชื•ืจ ืžื”ื™ืจ ืฉืœ ื”ื’ื•ืจื ืœื‘ืขื™ื”. ื›ืืฉืจ ืžืกืคืจ ื”ื”ืœื™ื›ื™ื ืžืชืงืจื‘ ืœืžืื”, ืื– ื ื™ื˜ื•ืจ ืชื”ืœื™ื›ื™ื ื”ื•ืคืš ืœืคืจื•ื™ืงื˜ ืžื•ืจื›ื‘.

ื”ืžืืžืจ ืžืกืคืง ื“ื•ื’ืžื” ืœืคืชืจื•ืŸ ืืคืฉืจื™ ืœื‘ืขื™ื” ื‘ืฆื•ืจื” ืฉืœ ืื‘ ื˜ื™ืคื•ืก. ื›ืœ ืื‘ ื”ื˜ื™ืคื•ืก ืฉืœ ื”ืžืื’ืจ ื”ืงื˜ืŸ ื–ืžื™ืŸ ื‘-gitlab SQLite PHP ETL Utilities.

ืžืงื•ืจ: www.habr.com

ื”ื•ืกืคืช ืชื’ื•ื‘ื”