Ọpọlọpọ lo awọn irinṣẹ amọja lati ṣẹda awọn ilana fun yiyo, iyipada, ati ikojọpọ data sinu awọn apoti isura data ibatan. Ilana ti awọn irinṣẹ iṣẹ ti wa ni ibuwolu wọle, awọn aṣiṣe ti wa ni atunṣe.
Ni ọran ti aṣiṣe kan, akọọlẹ naa ni alaye ti ọpa naa kuna lati pari iṣẹ-ṣiṣe naa ati iru awọn modulu (nigbagbogbo java) duro nibiti. Ni awọn laini to kẹhin, o le wa aṣiṣe data data, fun apẹẹrẹ, irufin bọtini alailẹgbẹ tabili kan.
Lati dahun ibeere ti ipa wo ni alaye aṣiṣe ETL ṣe, Mo ti pin gbogbo awọn iṣoro ti o waye ni ọdun meji sẹhin ni ibi ipamọ nla kan.
Awọn aṣiṣe aaye data pẹlu ko to aaye, asopọ ti o sọnu, igba ti o sokọ, ati bẹbẹ lọ.
Awọn aṣiṣe ọgbọn pẹlu iru bi irufin awọn bọtini tabili, awọn nkan ti ko wulo, aini iraye si awọn nkan, ati bẹbẹ lọ.
Oluṣeto le ma bẹrẹ ni akoko, o le di, ati bẹbẹ lọ.
Awọn aṣiṣe ti o rọrun ko gba akoko pipẹ lati ṣatunṣe. ETL ti o dara le mu pupọ julọ ninu wọn funrararẹ.
Awọn idun eka jẹ ki o ṣe pataki lati ṣawari ati idanwo awọn ilana fun ṣiṣẹ pẹlu data, lati ṣawari awọn orisun data. Nigbagbogbo yorisi iwulo fun idanwo iyipada ati imuṣiṣẹ.
Nitorinaa, idaji gbogbo awọn iṣoro ni ibatan si data data. 48% ti gbogbo awọn aṣiṣe jẹ awọn aṣiṣe ti o rọrun.
Ẹkẹta ti gbogbo awọn iṣoro ni ibatan si iyipada iṣaro ipamọ tabi awoṣe, diẹ sii ju idaji awọn aṣiṣe wọnyi jẹ eka.
Ati pe o kere ju idamẹrin gbogbo awọn iṣoro ni o ni ibatan si oluṣeto iṣẹ, 18% eyiti o jẹ awọn aṣiṣe ti o rọrun.
Ni gbogbogbo, 22% ti gbogbo awọn aṣiṣe ti o waye jẹ eka, ati pe atunṣe wọn nilo akiyesi ati akoko pupọ julọ. Wọn ṣẹlẹ lẹẹkan ni ọsẹ kan. Lakoko ti awọn aṣiṣe rọrun n ṣẹlẹ ni gbogbo ọjọ.
O han gbangba pe ibojuwo ti awọn ilana ETL yoo munadoko nigbati ipo ti aṣiṣe ba tọka si ninu log ni deede bi o ti ṣee ṣe ati pe akoko to kere julọ nilo lati wa orisun iṣoro naa.
Abojuto ti o munadoko
Kini MO fẹ lati rii ninu ilana ibojuwo ETL?
Bẹrẹ ni - nigbati o bẹrẹ iṣẹ,
Orisun - orisun data,
Layer - kini ipele ibi ipamọ ti n kojọpọ,
Orukọ Job ETL - ilana ikojọpọ, eyiti o ni ọpọlọpọ awọn igbesẹ kekere,
Nọmba Igbesẹ - nọmba ti igbesẹ ti n ṣe,
Awọn ori ila ti o kan - melo ni data ti ni ilọsiwaju tẹlẹ,
Iye iṣẹju-aaya - bawo ni o ṣe pẹ to,
Ipo - boya ohun gbogbo dara tabi rara: O DARA, Aṣiṣe, Nṣiṣẹ, HANGS
Ifiranṣẹ - Ifiranṣẹ aṣeyọri ikẹhin tabi apejuwe aṣiṣe.
Da lori ipo awọn igbasilẹ, o le fi imeeli ranṣẹ. lẹta si awọn ọmọ ẹgbẹ miiran. Ti ko ba si awọn aṣiṣe, lẹhinna lẹta naa ko wulo.
Nitorinaa, ninu iṣẹlẹ ti aṣiṣe, ipo ti isẹlẹ naa jẹ itọkasi kedere.
Nigba miiran o ṣẹlẹ pe ọpa ibojuwo funrararẹ ko ṣiṣẹ. Ni idi eyi, o ṣee ṣe lati pe wiwo (wo) taara ni ibi ipamọ data, lori ipilẹ eyiti a ti kọ ijabọ naa.
ETL monitoring tabili
Lati ṣe ibojuwo ti awọn ilana ETL, tabili kan ati iwo kan ti to.
Lati ṣe eyi, o le pada si
Awọn tabili DDL
CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
SID INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
LOG_DT INTEGER NOT NULL DEFAULT 0, /* Date time */
LOG_D INTEGER NOT NULL DEFAULT 0, /* Date */
JOB_NAME TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
STEP_NAME TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
STEP_DESCR TEXT, /* Description of task or error message */
UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);
Wo/ jabo DDL
CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
SELECT LOG_D,
LOG_DT,
UTL_JOB_STATUS_ID,
SID,
CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
ELSE 'N/A' END AS LAYER,
CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
ELSE 'N/A' END AS SOURCE,
JOB_NAME,
STEP_NAME,
CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
FROM UTL_JOB_STATUS
WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
JB.MIN_LOG_DT AS START_DT,
strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
JB.SOURCE,
JB.LAYER,
JB.JOB_NAME,
CASE
WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
ELSE 'OK'
END AS STATUS,
ERR.STEP_LOG AS STEP_LOG,
JB.CNT AS STEP_CNT,
JB.AFFECTED_ROWS AS AFFECTED_ROWS,
strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
( SELECT SID, SOURCE, LAYER, JOB_NAME,
MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
MAX(START_FLAG) AS START_FLAG,
MAX(END_FLAG) AS END_FLAG,
MAX(ERROR_FLAG) AS ERROR_FLAG,
MIN(LOG_DT) AS MIN_LOG_DT,
MAX(LOG_DT) AS MAX_LOG_DT,
SUM(1) AS CNT,
SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
FROM SRC
GROUP BY SID, SOURCE, LAYER, JOB_NAME
) JB,
( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
FROM SRC
WHERE 1 = 1
) ERR
WHERE 1 = 1
AND JB.SID = ERR.SID
AND JB.JOB_NAME = ERR.JOB_NAME
AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;
Ṣiṣayẹwo SQL boya o ṣee ṣe lati gba nọmba igba tuntun kan
SELECT SUM (
CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
THEN 1 ELSE 0
END ) AS IS_RUNNING
FROM
( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
LEFT OUTER JOIN
( SELECT JOB_NAME, SID, 1 AS dummy
FROM UTL_JOB_STATUS
WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
AND STEP_NAME = 'ETL_START'
GROUP BY JOB_NAME, SID
) start_job /* starts */
ON d_job.dummy = start_job.dummy
LEFT OUTER JOIN
( SELECT JOB_NAME, SID
FROM UTL_JOB_STATUS
WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
GROUP BY JOB_NAME, SID
) end_job /* ends */
ON start_job.JOB_NAME = end_job.JOB_NAME
AND start_job.SID = end_job.SID
Awọn ẹya tabili:
- ibere ati opin ilana ilana data gbọdọ wa ni atẹle nipasẹ awọn igbesẹ ETL_START ati ETL_END
- ninu ọran ti aṣiṣe, igbesẹ ETL_ERROR pẹlu apejuwe rẹ yẹ ki o ṣẹda
- iye data ti a ti ni ilọsiwaju yẹ ki o ṣe afihan, fun apẹẹrẹ, pẹlu awọn asterisks
- Ilana kanna le bẹrẹ ni akoko kanna pẹlu agbara_restart = y paramita, laisi rẹ nọmba igba ti wa ni idasilẹ nikan si ilana ti o pari
- ni ipo deede, o ko le ṣiṣe ilana ilana data kanna ni afiwe
Awọn iṣẹ pataki fun ṣiṣẹ pẹlu tabili jẹ bi atẹle:
- gbigba nọmba igba ti ilana ETL nṣiṣẹ
- fi wiwọle log sinu tabili
- gbigba igbasilẹ aṣeyọri ti o kẹhin ti ilana ETL kan
Ninu awọn data data bii Oracle tabi Postgres, awọn iṣẹ wọnyi le ṣe imuse bi awọn iṣẹ ti a ṣe sinu. sqlite nilo ẹrọ ita, ati ninu ọran yii o
ipari
Nitorinaa, awọn ifiranṣẹ aṣiṣe ninu awọn irinṣẹ sisẹ data ṣe ipa pataki-mega. Sugbon o jẹ soro lati pe wọn ti aipe fun ni kiakia a wiwa awọn fa ti awọn isoro. Nigbati nọmba awọn ilana ba sunmọ ọgọrun, lẹhinna ibojuwo ilana yipada si iṣẹ akanṣe eka kan.
Nkan naa pese apẹẹrẹ ti ojutu ti o ṣeeṣe si iṣoro naa ni irisi apẹrẹ kan. Gbogbo apẹrẹ ibi ipamọ kekere wa ni gitlab
orisun: www.habr.com