Kula da hanyoyin ETL a cikin ƙaramin ma'ajiyar bayanai

Mutane da yawa suna amfani da kayan aiki na musamman don ƙirƙirar hanyoyin cirewa, canzawa, da loda bayanai zuwa bayanan bayanai masu alaƙa. An shigar da tsarin kayan aikin aiki, an gyara kurakurai.

Idan akwai kuskure, log ɗin yana ƙunshe da bayanan da kayan aikin ya gaza kammala aikin kuma waɗanne kayayyaki (sau da yawa java) sun tsaya a inda. A cikin layukan ƙarshe, zaku iya samun kuskuren bayanai, misali, keɓantaccen maɓalli na tebur.

Don amsa tambayar wane irin rawar da bayanin kuskuren ETL ke takawa, na rarraba duk matsalolin da suka faru a cikin shekaru biyu da suka gabata a cikin babban ma'ajiya.

Kula da hanyoyin ETL a cikin ƙaramin ma'ajiyar bayanai

Kurakurai na tushen bayanai sun haɗa da rashin isasshen sarari, haɗin da aka rasa, rataye zaman, da sauransu.

Kurakurai masu ma'ana sun haɗa da kamar keta maɓallan tebur, abubuwan da ba su da inganci, rashin isa ga abubuwa, da sauransu.
Mai tsara jadawalin bazai fara akan lokaci ba, yana iya daskare, da sauransu.

Kuskure masu sauƙi ba sa ɗaukar dogon lokaci don gyarawa. Kyakkyawan ETL na iya ɗaukar yawancin su da kan sa.

Cututtuka masu rikitarwa suna sa ya zama dole don ganowa da gwada hanyoyin aiki tare da bayanai, don bincika tushen bayanai. Sau da yawa haifar da buƙatar canji gwaji da turawa.

Don haka, rabin duk matsalolin suna da alaƙa da bayanan bayanai. 48% na duk kurakurai kurakurai ne masu sauƙi.
Kashi na uku na duk matsalolin suna da alaƙa da canza dabaru ko ƙirar ajiya, fiye da rabin waɗannan kurakurai suna da rikitarwa.

Kuma kasa da kashi ɗaya cikin huɗu na duk matsalolin suna da alaƙa da mai tsara ɗawainiya, 18% daga cikinsu kurakurai ne masu sauƙi.

Gabaɗaya, 22% na duk kurakuran da ke faruwa suna da rikitarwa, kuma gyaran su yana buƙatar kulawa da lokaci. Suna faruwa kusan sau ɗaya a mako. Ganin cewa kurakurai masu sauƙi suna faruwa kusan kowace rana.

A bayyane yake cewa saka idanu kan hanyoyin ETL zai yi tasiri lokacin da aka nuna wurin kuskuren a cikin log ɗin daidai gwargwadon yiwuwar kuma ana buƙatar ƙaramin lokaci don nemo tushen matsalar.

Ingantacciyar kulawa

Menene nake so in gani a cikin tsarin sa ido na ETL?

Kula da hanyoyin ETL a cikin ƙaramin ma'ajiyar bayanai
Fara a - lokacin da ya fara aiki,
Source - tushen bayanai,
Layer - wane matakin ajiya ake lodawa,
ETL Sunan Ayuba - hanyar aikawa, wanda ya ƙunshi ƙananan matakai da yawa,
Lambar Mataki - adadin matakin da ake yi,
Layukan da abin ya shafa - nawa aka riga aka sarrafa bayanai,
Duration sec - tsawon lokacin da ake ɗauka,
Matsayi - ko komai yana da kyau ko a'a: OK, KUSKURE, GUDU, HANGS
Saƙo - Saƙon nasara na ƙarshe ko bayanin kuskure.

Dangane da matsayin shigarwar, zaku iya aika imel. wasika zuwa ga sauran membobin. Idan babu kurakurai, to wasiƙar ba lallai ba ne.

Don haka, idan kuskure ya faru, ana nuna wurin da abin ya faru a fili.

Wani lokaci yana faruwa cewa kayan aikin saka idanu ba ya aiki. A wannan yanayin, yana yiwuwa a kira ra'ayi (view) kai tsaye a cikin bayanan, wanda aka gina rahoton.

Teburin saka idanu ETL

Don aiwatar da saka idanu kan hanyoyin ETL, tebur ɗaya da kallo ɗaya sun isa.

Don yin wannan, zaku iya komawa zuwa 'yar ajiyar ku kuma ƙirƙirar samfuri a cikin bayanan sqlite.

Farashin DDL

CREATE TABLE UTL_JOB_STATUS (
/* Table for logging of job execution log. Important that the job has the steps ETL_START and ETL_END or ETL_ERROR */
  UTL_JOB_STATUS_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  SID               INTEGER NOT NULL DEFAULT -1, /* Session Identificator. Unique for every Run of job */
  LOG_DT            INTEGER NOT NULL DEFAULT 0,  /* Date time */
  LOG_D             INTEGER NOT NULL DEFAULT 0,  /* Date */
  JOB_NAME          TEXT NOT NULL DEFAULT 'N/A', /* Job name like JOB_STG2DM_GEO */
  STEP_NAME         TEXT NOT NULL DEFAULT 'N/A', /* ETL_START, ... , ETL_END/ETL_ERROR */
  STEP_DESCR        TEXT,                        /* Description of task or error message */
  UNIQUE (SID, JOB_NAME, STEP_NAME)
);
INSERT INTO UTL_JOB_STATUS (UTL_JOB_STATUS_ID) VALUES (-1);

Duba/Rahoto DDL

CREATE VIEW IF NOT EXISTS UTL_JOB_STATUS_V
AS /* Content: Package Execution Log for last 3 Months. */
WITH SRC AS (
  SELECT LOG_D,
    LOG_DT,
    UTL_JOB_STATUS_ID,
    SID,
	CASE WHEN INSTR(JOB_NAME, 'FTP') THEN 'TRANSFER' /* file transfer */
	     WHEN INSTR(JOB_NAME, 'STG') THEN 'STAGE' /* stage */
	     WHEN INSTR(JOB_NAME, 'CLS') THEN 'CLEANSING' /* cleansing */
	     WHEN INSTR(JOB_NAME, 'DIM') THEN 'DIMENSION' /* dimension */
	     WHEN INSTR(JOB_NAME, 'FCT') THEN 'FACT' /* fact */
		 WHEN INSTR(JOB_NAME, 'ETL') THEN 'STAGE-MART' /* data mart */
	     WHEN INSTR(JOB_NAME, 'RPT') THEN 'REPORT' /* report */
	     ELSE 'N/A' END AS LAYER,
	CASE WHEN INSTR(JOB_NAME, 'ACCESS') THEN 'ACCESS LOG' /* source */
	     WHEN INSTR(JOB_NAME, 'MASTER') THEN 'MASTER DATA' /* source */
	     WHEN INSTR(JOB_NAME, 'AD-HOC') THEN 'AD-HOC' /* source */
	     ELSE 'N/A' END AS SOURCE,
    JOB_NAME,
    STEP_NAME,
    CASE WHEN STEP_NAME='ETL_START' THEN 1 ELSE 0 END AS START_FLAG,
    CASE WHEN STEP_NAME='ETL_END' THEN 1 ELSE 0 END AS END_FLAG,
    CASE WHEN STEP_NAME='ETL_ERROR' THEN 1 ELSE 0 END AS ERROR_FLAG,
    STEP_NAME || ' : ' || STEP_DESCR AS STEP_LOG,
	SUBSTR( SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), 1, INSTR(SUBSTR(STEP_DESCR, INSTR(STEP_DESCR, '***')+4), '***')-2 ) AS AFFECTED_ROWS
  FROM UTL_JOB_STATUS
  WHERE datetime(LOG_D, 'unixepoch') >= date('now', 'start of month', '-3 month')
)
SELECT JB.SID,
  JB.MIN_LOG_DT AS START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS LOG_DT,
  JB.SOURCE,
  JB.LAYER,
  JB.JOB_NAME,
  CASE
  WHEN JB.ERROR_FLAG = 1 THEN 'ERROR'
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 AND strftime('%s','now') - JB.MIN_LOG_DT > 0.5*60*60 THEN 'HANGS' /* half an hour */
  WHEN JB.ERROR_FLAG = 0 AND JB.END_FLAG = 0 THEN 'RUNNING'
  ELSE 'OK'
  END AS STATUS,
  ERR.STEP_LOG     AS STEP_LOG,
  JB.CNT           AS STEP_CNT,
  JB.AFFECTED_ROWS AS AFFECTED_ROWS,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MIN_LOG_DT, 'unixepoch')) AS JOB_START_DT,
  strftime('%d.%m.%Y %H:%M', datetime(JB.MAX_LOG_DT, 'unixepoch')) AS JOB_END_DT,
  JB.MAX_LOG_DT - JB.MIN_LOG_DT AS JOB_DURATION_SEC
FROM
  ( SELECT SID, SOURCE, LAYER, JOB_NAME,
           MAX(UTL_JOB_STATUS_ID) AS UTL_JOB_STATUS_ID,
           MAX(START_FLAG)       AS START_FLAG,
           MAX(END_FLAG)         AS END_FLAG,
           MAX(ERROR_FLAG)       AS ERROR_FLAG,
           MIN(LOG_DT)           AS MIN_LOG_DT,
           MAX(LOG_DT)           AS MAX_LOG_DT,
           SUM(1)                AS CNT,
           SUM(IFNULL(AFFECTED_ROWS, 0)) AS AFFECTED_ROWS
    FROM SRC
    GROUP BY SID, SOURCE, LAYER, JOB_NAME
  ) JB,
  ( SELECT UTL_JOB_STATUS_ID, SID, JOB_NAME, STEP_LOG
    FROM SRC
    WHERE 1 = 1
  ) ERR
WHERE 1 = 1
  AND JB.SID = ERR.SID
  AND JB.JOB_NAME = ERR.JOB_NAME
  AND JB.UTL_JOB_STATUS_ID = ERR.UTL_JOB_STATUS_ID
ORDER BY JB.MIN_LOG_DT DESC, JB.SID DESC, JB.SOURCE;

SQL Dubawa idan zai yiwu a sami sabuwar lambar zama

SELECT SUM (
  CASE WHEN start_job.JOB_NAME IS NOT NULL AND end_job.JOB_NAME IS NULL /* existed job finished */
	    AND NOT ( 'y' = 'n' ) /* force restart PARAMETER */
       THEN 1 ELSE 0
  END ) AS IS_RUNNING
  FROM
    ( SELECT 1 AS dummy FROM UTL_JOB_STATUS WHERE sid = -1) d_job
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID, 1 AS dummy
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG' /* job name PARAMETER */
	    AND STEP_NAME = 'ETL_START'
      GROUP BY JOB_NAME, SID
    ) start_job /* starts */
  ON d_job.dummy = start_job.dummy
  LEFT OUTER JOIN
    ( SELECT JOB_NAME, SID
      FROM UTL_JOB_STATUS
      WHERE JOB_NAME = 'RPT_ACCESS_LOG'  /* job name PARAMETER */
	    AND STEP_NAME in ('ETL_END', 'ETL_ERROR') /* stop status */
      GROUP BY JOB_NAME, SID
    ) end_job /* ends */
  ON start_job.JOB_NAME = end_job.JOB_NAME
     AND start_job.SID = end_job.SID

Siffofin tebur:

  • farkon da ƙarshen aikin sarrafa bayanai dole ne a bi ta hanyar ETL_START da ETL_END
  • idan akwai kuskure, yakamata a ƙirƙiri matakin ETL_ERROR tare da bayaninsa
  • ya kamata a haskaka adadin bayanan da aka sarrafa, alal misali, tare da taurari
  • Hakanan za'a iya fara wannan hanya a lokaci guda tare da ma'aunin Force_restart=y, ba tare da shi ba ana ba da lambar zaman ne kawai zuwa tsarin da aka kammala.
  • a yanayin al'ada, ba za ku iya gudanar da tsarin sarrafa bayanai iri ɗaya a layi daya ba

Ayyukan da ake buƙata don aiki tare da tebur sune kamar haka:

  • samun lambar zaman tsarin ETL mai gudana
  • saka shigarwar log a cikin tebur
  • samun rikodin nasara na ƙarshe na hanyar ETL

A cikin bayanan bayanai kamar Oracle ko Postgres, ana iya aiwatar da waɗannan ayyukan azaman ayyukan ginanniyar. sqlite yana buƙatar tsarin waje, kuma a wannan yanayin shi prototyped a cikin PHP.

ƙarshe

Don haka, saƙonnin kuskure a cikin kayan aikin sarrafa bayanai suna taka muhimmiyar rawa. Amma yana da wahala a kira su mafi kyau don gano dalilin matsalar da sauri. Lokacin da adadin hanyoyin ya kusanci ɗari, to, tsarin sa ido ya juya zuwa wani aiki mai rikitarwa.

Labarin ya ba da misali na yiwuwar magance matsalar ta hanyar samfuri. Duk ƙaramin samfurin ma'ajiya yana samuwa a gitlab SQLite PHP ETL Utilities.

source: www.habr.com

Add a comment