Mutane da yawa suna amfani da kayan aiki na musamman don ƙirƙirar hanyoyin cirewa, canzawa, da loda bayanai zuwa bayanan bayanai masu alaƙa. An shigar da tsarin kayan aikin aiki, an gyara kurakurai.
Idan akwai kuskure, log ɗin yana ƙunshe da bayanan da kayan aikin ya gaza kammala aikin kuma waɗanne kayayyaki (sau da yawa java) sun tsaya a inda. A cikin layukan ƙarshe, zaku iya samun kuskuren bayanai, misali, keɓantaccen maɓalli na tebur.
Don amsa tambayar wane irin rawar da bayanin kuskuren ETL ke takawa, na rarraba duk matsalolin da suka faru a cikin shekaru biyu da suka gabata a cikin babban ma'ajiya.
Kurakurai na tushen bayanai sun haɗa da rashin isasshen sarari, haɗin da aka rasa, rataye zaman, da sauransu.
Kurakurai masu ma'ana sun haɗa da kamar keta maɓallan tebur, abubuwan da ba su da inganci, rashin isa ga abubuwa, da sauransu.
Mai tsara jadawalin bazai fara akan lokaci ba, yana iya daskare, da sauransu.
Kuskure masu sauƙi ba sa ɗaukar dogon lokaci don gyarawa. Kyakkyawan ETL na iya ɗaukar yawancin su da kan sa.
Cututtuka masu rikitarwa suna sa ya zama dole don ganowa da gwada hanyoyin aiki tare da bayanai, don bincika tushen bayanai. Sau da yawa haifar da buƙatar canji gwaji da turawa.
Don haka, rabin duk matsalolin suna da alaƙa da bayanan bayanai. 48% na duk kurakurai kurakurai ne masu sauƙi.
Kashi na uku na duk matsalolin suna da alaƙa da canza dabaru ko ƙirar ajiya, fiye da rabin waɗannan kurakurai suna da rikitarwa.
Kuma kasa da kashi ɗaya cikin huɗu na duk matsalolin suna da alaƙa da mai tsara ɗawainiya, 18% daga cikinsu kurakurai ne masu sauƙi.
Gabaɗaya, 22% na duk kurakuran da ke faruwa suna da rikitarwa, kuma gyaran su yana buƙatar kulawa da lokaci. Suna faruwa kusan sau ɗaya a mako. Ganin cewa kurakurai masu sauƙi suna faruwa kusan kowace rana.
A bayyane yake cewa saka idanu kan hanyoyin ETL zai yi tasiri lokacin da aka nuna wurin kuskuren a cikin log ɗin daidai gwargwadon yiwuwar kuma ana buƙatar ƙaramin lokaci don nemo tushen matsalar.
Ingantacciyar kulawa
Menene nake so in gani a cikin tsarin sa ido na ETL?
Fara a - lokacin da ya fara aiki,
Source - tushen bayanai,
Layer - wane matakin ajiya ake lodawa,
ETL Sunan Ayuba - hanyar aikawa, wanda ya ƙunshi ƙananan matakai da yawa,
Lambar Mataki - adadin matakin da ake yi,
Layukan da abin ya shafa - nawa aka riga aka sarrafa bayanai,
Duration sec - tsawon lokacin da ake ɗauka,
Matsayi - ko komai yana da kyau ko a'a: OK, KUSKURE, GUDU, HANGS
Saƙo - Saƙon nasara na ƙarshe ko bayanin kuskure.
Dangane da matsayin shigarwar, zaku iya aika imel. wasika zuwa ga sauran membobin. Idan babu kurakurai, to wasiƙar ba lallai ba ne.
Don haka, idan kuskure ya faru, ana nuna wurin da abin ya faru a fili.
Wani lokaci yana faruwa cewa kayan aikin saka idanu ba ya aiki. A wannan yanayin, yana yiwuwa a kira ra'ayi (view) kai tsaye a cikin bayanan, wanda aka gina rahoton.
Teburin saka idanu ETL
Don aiwatar da saka idanu kan hanyoyin ETL, tebur ɗaya da kallo ɗaya sun isa.
Don yin wannan, zaku iya komawa zuwa
Farashin DDL
CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
SID INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
LOG_DT INTEGER NOT NULL DEFAULT 0, /* Date time */
LOG_D INTEGER NOT NULL DEFAULT 0, /* Date */
JOB_NAME TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
STEP_NAME TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
STEP_DESCR TEXT, /* Description of task or error message */
UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);
Duba/Rahoto DDL
CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
SELECT LOG_D,
LOG_DT,
UTL_JOB_STATUS_ID,
SID,
CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
ELSE 'N/A' END AS LAYER,
CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
ELSE 'N/A' END AS SOURCE,
JOB_NAME,
STEP_NAME,
CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
FROM UTL_JOB_STATUS
WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
JB.MIN_LOG_DT AS START_DT,
strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
JB.SOURCE,
JB.LAYER,
JB.JOB_NAME,
CASE
WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
ELSE 'OK'
END AS STATUS,
ERR.STEP_LOG AS STEP_LOG,
JB.CNT AS STEP_CNT,
JB.AFFECTED_ROWS AS AFFECTED_ROWS,
strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
( SELECT SID, SOURCE, LAYER, JOB_NAME,
MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
MAX(START_FLAG) AS START_FLAG,
MAX(END_FLAG) AS END_FLAG,
MAX(ERROR_FLAG) AS ERROR_FLAG,
MIN(LOG_DT) AS MIN_LOG_DT,
MAX(LOG_DT) AS MAX_LOG_DT,
SUM(1) AS CNT,
SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
FROM SRC
GROUP BY SID, SOURCE, LAYER, JOB_NAME
) JB,
( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
FROM SRC
WHERE 1 = 1
) ERR
WHERE 1 = 1
AND JB.SID = ERR.SID
AND JB.JOB_NAME = ERR.JOB_NAME
AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;
SQL Dubawa idan zai yiwu a sami sabuwar lambar zama
SELECT SUM (
CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
THEN 1 ELSE 0
END ) AS IS_RUNNING
FROM
( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
LEFT OUTER JOIN
( SELECT JOB_NAME, SID, 1 AS dummy
FROM UTL_JOB_STATUS
WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
AND STEP_NAME = 'ETL_START'
GROUP BY JOB_NAME, SID
) start_job /* starts */
ON d_job.dummy = start_job.dummy
LEFT OUTER JOIN
( SELECT JOB_NAME, SID
FROM UTL_JOB_STATUS
WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
GROUP BY JOB_NAME, SID
) end_job /* ends */
ON start_job.JOB_NAME = end_job.JOB_NAME
AND start_job.SID = end_job.SID
Siffofin tebur:
- farkon da ƙarshen aikin sarrafa bayanai dole ne a bi ta hanyar ETL_START da ETL_END
- idan akwai kuskure, yakamata a ƙirƙiri matakin ETL_ERROR tare da bayaninsa
- ya kamata a haskaka adadin bayanan da aka sarrafa, alal misali, tare da taurari
- Hakanan za'a iya fara wannan hanya a lokaci guda tare da ma'aunin Force_restart=y, ba tare da shi ba ana ba da lambar zaman ne kawai zuwa tsarin da aka kammala.
- a yanayin al'ada, ba za ku iya gudanar da tsarin sarrafa bayanai iri ɗaya a layi daya ba
Ayyukan da ake buƙata don aiki tare da tebur sune kamar haka:
- samun lambar zaman tsarin ETL mai gudana
- saka shigarwar log a cikin tebur
- samun rikodin nasara na ƙarshe na hanyar ETL
A cikin bayanan bayanai kamar Oracle ko Postgres, ana iya aiwatar da waɗannan ayyukan azaman ayyukan ginanniyar. sqlite yana buƙatar tsarin waje, kuma a wannan yanayin shi
ƙarshe
Don haka, saƙonnin kuskure a cikin kayan aikin sarrafa bayanai suna taka muhimmiyar rawa. Amma yana da wahala a kira su mafi kyau don gano dalilin matsalar da sauri. Lokacin da adadin hanyoyin ya kusanci ɗari, to, tsarin sa ido ya juya zuwa wani aiki mai rikitarwa.
Labarin ya ba da misali na yiwuwar magance matsalar ta hanyar samfuri. Duk ƙaramin samfurin ma'ajiya yana samuwa a gitlab
source: www.habr.com