Ƙarin ƙididdiga na rukunin yanar gizo a cikin ƙaramin ma'ajiyar ku

Ta hanyar nazarin kididdigar rukunin yanar gizon, muna samun ra'ayin abin da ke faruwa da shi. Muna kwatanta sakamakon da sauran ilimin game da samfur ko sabis kuma ta haka inganta ƙwarewar mu.

Lokacin da aka kammala nazarin sakamakon farko, an fahimci bayanin kuma an yanke shawara, mataki na gaba ya fara. Ra'ayoyi sun taso: menene zai faru idan kun kalli bayanan daga wancan gefen?

Akwai iyakoki na kayan aikin bincike a wannan matakin. Wannan na ɗaya daga cikin dalilan da ya sa Google Analytics bai ishe ni ba, wato saboda ƙarancin iya gani da sarrafa bayanana.

A koyaushe ina so in ɗora bayanan asali (manyan bayanan), ƙara wani matakin haɗuwa, ko fassara ƙimar data kasance daban.

Wannan yana da sauƙin yi a ciki a cikin ƙaramin ɗakin ajiyar ku bisa ga access.log fayil kuma harshen SQL ya wadatar don wannan.

To, waɗanne tambayoyi nake so a amsa?

Menene kuma lokacin da aka canza akan rukunin yanar gizon

Tarihin canje-canje a cikin bayanan da ke ƙasa (babban bayanan) koyaushe yana da ban sha'awa.

Ƙarin ƙididdiga na rukunin yanar gizo a cikin ƙaramin ma'ajiyar ku

Tambayar rahoton SQL

SELECT
	1 as 'SideStackedBar: Content Updates by Months',
	strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch')) AS 'Day',
	COUNT(CASE WHEN PAGE_TITLE != 'n.a.' THEN DIM_REQUEST_ID END) AS 'Web page updates',
	COUNT(CASE WHEN PAGE_DESCR = 'IMAGES' THEN DIM_REQUEST_ID END) AS 'Image uploads',
	COUNT(CASE WHEN PAGE_DESCR = 'VIDEO' THEN DIM_REQUEST_ID END) AS 'Video uploads',
	COUNT(CASE WHEN PAGE_DESCR = 'AUDIO' THEN DIM_REQUEST_ID END) AS 'Audio uploads'
FROM DIM_REQUEST
WHERE PAGE_TITLE != 'n.a.' OR PAGE_DESCR != 'n.a.'
GROUP BY strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch'))
ORDER BY UPDATE_DT

Misali, a wani lokaci, an aiwatar da inganta injin bincike ko kuma an ƙara sabon abun ciki zuwa rukunin yanar gizon, sabili da haka ana sa ran karuwar zirga-zirga.

Ƙungiyar masu amfani

Misali mafi sauƙi na ƙungiya shine wakilin mai amfani ko sunan tsarin aiki.

Girman wakilin mai amfani ya tara kimanin bayanai dubu kuma ina sha'awar ganin yanayin rarraba wakilai a cikin rukuni.

Ƙarin ƙididdiga na rukunin yanar gizo a cikin ƙaramin ma'ajiyar ku

Tambayar rahoton SQL

SELECT
	1 AS 'SideStackedBar: User Agents',
	AGENT_OS AS 'OS',
	SUM(CASE WHEN AGENT_BOT = 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Users',
	SUM(CASE WHEN AGENT_BOT != 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Bots'
FROM DIM_USER_AGENT
WHERE DIM_USER_AGENT_ID != -1
GROUP BY AGENT_OS
ORDER BY 3 DESC

Yawancin haɗuwa daban-daban na wakilai suna zuwa shafin daga duniyar Windows. Daga cikin wadanda ba a tantance ba akwai WhatsApp, PocketImageCache, PlayStation, SmartTV, da dai sauransu.

Ayyukan ƙungiyar masu amfani ta mako

Ta hanyar haɗa wasu ƙungiyoyi, ana iya lura da yadda ake rarraba ayyukansu.

Misali, masu amfani da gungu na Linux suna cin zirga-zirgar gidan yanar gizo fiye da kowa.

Ƙarin ƙididdiga na rukunin yanar gizo a cikin ƙaramin ma'ajiyar ku

Tambayar rahoton SQL

SELECT
1 as 'StackedBar: Traffic Volume by User OS and by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
SUM(CASE WHEN USG.AGENT_OS IN ('Android', 'Linux') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Android/Linux Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Windows') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Windows Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Macintosh', 'iOS') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Mac/iOS Users',
SUM(CASE WHEN USG.AGENT_OS IN ('n.a.', 'BlackBerry') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Other'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG,
  DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT = 'n.a.' /* users only */
  AND HST.STATUS_GROUP IN ('Successful') /* good pages */
  AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

Yawan amfani da ababen hawa

Teburin yana nuna ƙungiyoyin masu amfani mafi aiki da ranar ayyukansu.
Mafi yawan aiki suna cikin gungu na Linux.

Ƙarin ƙididdiga na rukunin yanar gizo a cikin ƙaramin ma'ajiyar ku

Tambayar rahoton SQL

SELECT
1 AS 'Table: User Agent with Havy Usage',
strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
ROUND(1.0*SUM(FCT.BYTES)/1000000, 1) AS 'Traffic MB',
ROUND(1.0*SUM(FCT.IP_CNT)/SUM(1), 1) AS 'IPs',
ROUND(1.0*SUM(FCT.REQUEST_CNT)/SUM(1), 1) AS 'Requests',
USA.DIM_USER_AGENT_ID AS 'ID',
MAX(USA.USER_AGENT_NK) AS 'User Agent',
MAX(USA.AGENT_BOT) AS 'Bot'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USA
WHERE FCT.DIM_USER_AGENT_ID = USA.DIM_USER_AGENT_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-30 day')
GROUP BY USA.DIM_USER_AGENT_ID, strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch')) 
ORDER BY SUM(FCT.BYTES) DESC, FCT.EVENT_DT
LIMIT 10

Yin amfani da sifofin ID na rana da wakili, yana yiwuwa a samu da sauri da bin kididdigar kwanakin ƙungiyoyin masu amfani ɗaya. Idan ya cancanta, zaku iya samun cikakken bayani da sauri a cikin tebur mataki.

Yadda ake samun bayanai?

Bayani daga fayil access.log za a iya ƙara haɓakawa ta hanyar haɗa ƙarin hanyoyin bayanai da kuma gabatar da sababbin matakan tarawa da haɗawa.

Bayanan asali da mahalli

Bayanan asali sun haɗa da bayanai game da mahalli: shafukan yanar gizo, hotuna, bidiyo da abun ciki mai jiwuwa, a cikin yanayin kantin sayar da kayayyaki - samfurori.

Ƙungiyoyin da kansu suna aiki azaman ma'auni, kuma tsarin adana canje-canje a cikin halayen ana kiransa tarihi. A cikin ma'ajin bayanai, ana aiwatar da wannan tsari sau da yawa a cikin nau'in ma'auni masu bambanta a hankali (SCD).

Bayanan tushe na iya fitowa daga tsari iri-iri, don haka kusan koyaushe suna buƙatar haɗa su.

A hankali yana canza girma

Girman DIM_REQUEST zai ƙunshi bayanai game da buƙatun kan rukunin yanar gizon a cikin sigar tarihi.

Tebur SCD2

CREATE TABLE DIM_REQUEST ( /* scd table for user requests */
  DIM_REQUEST_ID      INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  DIM_REQUEST_ID_HIST INTEGER NOT NULL DEFAULT -1,
  REQUEST_NK          TEXT NOT NULL DEFAULT 'n.a.', /* request without ?parameters */
  PAGE_TITLE          TEXT NOT NULL DEFAULT 'n.a.',
  PAGE_DESCR          TEXT NOT NULL DEFAULT 'n.a.',
  PAGE_KEYWORDS       TEXT NOT NULL DEFAULT 'n.a.',
  DELETE_FLAG         INTEGER NOT NULL DEFAULT 0,
  UPDATE_DT           INTEGER NOT NULL DEFAULT 0,
  UNIQUE (REQUEST_NK, DIM_REQUEST_ID_HIST)
);
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID) VALUES (-1);

Baya ga shi, za mu ƙirƙiri ra'ayi ɗaya wanda koyaushe yana nuna duk bayanan a cikin sabuwar jihar. Wajibi ne don loda ma'aunin kanta.

Ƙarin ƙididdiga na rukunin yanar gizo a cikin ƙaramin ma'ajiyar ku

Halin halin yanzu na SCD2

/* Content: actual view on scd table */
SELECT HI.DIM_REQUEST_ID,
  HI.DIM_REQUEST_ID_HIST,
  HI.REQUEST_NK,
  HI.PAGE_TITLE,
  HI.PAGE_DESCR,
  HI.PAGE_KEYWORDS,
  NK.CNT AS HIST_CNT,
  HI.DELETE_FLAG,
  strftime('%d.%m.%Y %H:%M', datetime(HI.UPDATE_DT, 'unixepoch')) AS UPDATE_DT
FROM
  ( SELECT REQUEST_NK, MAX(DIM_REQUEST_ID) AS DIM_REQUEST_ID, SUM(1) AS CNT
    FROM DIM_REQUEST
    GROUP BY REQUEST_NK
  ) NK,
  DIM_REQUEST HI
WHERE 1 = 1
  AND NK.REQUEST_NK = HI.REQUEST_NK
  AND NK.DIM_REQUEST_ID = HI.DIM_REQUEST_ID;

Kuma ra'ayi inda ake tattara bayanan tarihi don kowane shigarwa. Wajibi ne a gina ingantacciyar alaƙa ta tarihi tare da gaskiya.

Ƙarin ƙididdiga na rukunin yanar gizo a cikin ƙaramin ma'ajiyar ku

Gabatarwar tarihi na SCD2

/* Content: actual view on scd table */
SELECT SCD.DIM_REQUEST_ID,
  SCD.DIM_REQUEST_ID_HIST,
  SCD.REQUEST_NK,
  SCD.PAGE_TITLE,
  SCD.PAGE_DESCR,
  SCD.PAGE_KEYWORDS,
  SCD.DELETE_FLAG,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN 1
    ELSE 0 END ACTIVE_FLAG,
  SCD.DIM_REQUEST_ID_HIST AS ID_FROM,
  SCD.DIM_REQUEST_ID AS ID_TO,
  CASE
    WHEN SCD.DIM_REQUEST_ID_HIST=-1
    THEN 3600
    ELSE IFNULL(SCD.UPDATE_DT,3600)
  END AS TIME_FROM,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN 253370764800
    ELSE HIS.UPDATE_DT
  END AS TIME_TO,
  CASE
    WHEN SCD.DIM_REQUEST_ID_HIST=-1
    THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(3600, 'unixepoch'))
    ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(IFNULL(SCD.UPDATE_DT,3600), 'unixepoch'))
  END AS ACTIVE_FROM,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(253370764800, 'unixepoch'))
    ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(HIS.UPDATE_DT, 'unixepoch'))
  END AS ACTIVE_TO
FROM
  DIM_REQUEST SCD
  LEFT OUTER JOIN DIM_REQUEST HIS
  ON SCD.REQUEST_NK = HIS.REQUEST_NK AND SCD.DIM_REQUEST_ID = HIS.DIM_REQUEST_ID_HIST;

Tarin Data

Matsawa (tarin) yana ba ku damar kimanta bayanai a matsayi mafi girma kuma gano abubuwan da ba su da kyau da yanayin da ba a iya gani a cikin cikakkun rahotanni.

Misali, ƙara rukuni zuwa girman tare da lambobin matsayi DIM_HTTP_STATUS:

MATSAYI/GROUP
0xx/n.a.
1xx/Bayani
2xx/Nasara
3xx/Juyawa
4xx/Kuskuren Abokin ciniki
5xx/Kuskuren uwar garke

Girman wakilin mai amfani DIM_USER_AGENT zai ƙunshi halayen AGENT_OS da AGENT_BOT waɗanda ke da alhakin ƙungiyoyi. Ana iya cika su yayin aiwatar da ETL:

Ana loda DIM_USER_AGENT

/* Propagate the user agent from access log */
INSERT INTO DIM_USER_AGENT (USER_AGENT_NK, AGENT_OS, AGENT_ENGINE, AGENT_DEVICE, AGENT_BOT, UPDATE_DT)
WITH CLS AS (
	SELECT BROWSER
	FROM STG_ACCESS_LOG WHERE LENGTH(BROWSER)>1
	GROUP BY BROWSER
)
SELECT
	CLS.BROWSER AS USER_AGENT_NK,
	CASE
	WHEN INSTR(CLS.BROWSER,'Macintosh')>0
		THEN 'Macintosh'
	WHEN INSTR(CLS.BROWSER,'iPhone')>0
			 OR INSTR(CLS.BROWSER,'iPad')>0
			 OR INSTR(CLS.BROWSER,'iPod')>0
			 OR INSTR(CLS.BROWSER,'Apple TV')>0
			 OR INSTR(CLS.BROWSER,'Darwin')>0
		THEN 'iOS'
	WHEN INSTR(CLS.BROWSER,'Android')>0
		THEN 'Android'
	WHEN INSTR(CLS.BROWSER,'X11;')>0 OR INSTR(CLS.BROWSER,'Wayland;')>0 OR INSTR(CLS.BROWSER,'linux-gnu')>0
		THEN 'Linux'
	WHEN INSTR(CLS.BROWSER,'BB10;')>0 OR INSTR(CLS.BROWSER,'BlackBerry')>0
		THEN 'BlackBerry'
	WHEN INSTR(CLS.BROWSER,'Windows')>0
		THEN 'Windows'
	ELSE 'n.a.' END AS AGENT_OS, -- OS
	CASE
	WHEN INSTR(CLS.BROWSER,'AppleCoreMedia')>0
		THEN 'AppleWebKit'
	WHEN INSTR(CLS.BROWSER,') ')>1 AND LENGTH(CLS.BROWSER)>INSTR(CLS.BROWSER,') ')
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,') ')+2, LENGTH(CLS.BROWSER) - INSTR(CLS.BROWSER,') ')-1), 'N/A')
	ELSE 'n.a.' END AS AGENT_ENGINE, -- Engine
	CASE
	WHEN INSTR(CLS.BROWSER,'iPhone')>0
		THEN 'iPhone'
	WHEN INSTR(CLS.BROWSER,'iPad')>0
		THEN 'iPad'
	WHEN INSTR(CLS.BROWSER,'iPod')>0
		THEN 'iPod'
	WHEN INSTR(CLS.BROWSER,'Apple TV')>0
		THEN 'Apple TV'
	WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'Build')>0
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'Build')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
	WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'MIUI')>0
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'MIUI')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
	ELSE 'n.a.' END AS AGENT_DEVICE, -- Device
	CASE
	WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
		THEN 'yandex'
	WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
		THEN 'google'
	WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
		THEN 'microsoft'
	WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
		THEN 'ahrefs'
	WHEN INSTR(LOWER(CLS.BROWSER),'jobboersebot')>0 OR INSTR(LOWER(CLS.BROWSER),'jobkicks')>0
		THEN 'job.de'
	WHEN INSTR(LOWER(CLS.BROWSER),'mail.ru')>0
		THEN 'mail.ru'
	WHEN INSTR(LOWER(CLS.BROWSER),'baiduspider')>0
		THEN 'baidu'
	WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
		THEN 'majestic-12'
	WHEN INSTR(LOWER(CLS.BROWSER),'duckduckgo')>0
		THEN 'duckduckgo'
	WHEN INSTR(LOWER(CLS.BROWSER),'bytespider')>0
		THEN 'bytespider'
	WHEN INSTR(LOWER(CLS.BROWSER),'360spider')>0
		THEN 'so.360.cn'
	WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
		OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
		OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
		OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0 OR INSTR(LOWER(CLS.BROWSER),'wget')>0
		THEN 'other'
	ELSE 'n.a.' END AS AGENT_BOT, -- Bot
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN DIM_USER_AGENT TRG
ON CLS.BROWSER = TRG.USER_AGENT_NK
WHERE TRG.DIM_USER_AGENT_ID IS NULL

Haɗin bayanai

Ya haɗa da tsara canja wurin bayanai daga tsarin aiki zuwa tsarin bayar da rahoto. Don yin wannan, kuna buƙatar ƙirƙirar tebur mataki tare da tsari mai kama da tushen.

Bayani game da shafukan yanar gizon yana zuwa mataki daga madadin CMS a cikin hanyar sa buƙatun.

Load da tebur na tarihi DIM_REQUEST tare da bayanan asali yana faruwa a matakai uku: loda sabbin maɓalli da sifofi, sabunta waɗanda suke, da aikata bayanan da aka goge.

Ana loda sabbin rikodin SCD2

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE,
	CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
	     THEN 'n.a.' ELSE DESCRIPTION
	END AS PAGE_DESCR,
	CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
	     THEN 'n.a.' ELSE KEYWORDS
	END AS PAGE_KEYWORDS
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* new records from stage: CLS */
SELECT
	-1 AS DIM_REQUEST_ID_HIST,
	CLS.REQUEST_NK,
	CLS.PAGE_TITLE,
	CLS.PAGE_DESCR,
	CLS.PAGE_KEYWORDS,
	0 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN
 (
	SELECT
	DIM_REQUEST_ID,
	REQUEST_NK,
	PAGE_TITLE,
	PAGE_DESCR,
	PAGE_KEYWORDS
	FROM DIM_REQUEST_V_ACT
) TRG ON CLS.REQUEST_NK = TRG.REQUEST_NK
WHERE TRG.REQUEST_NK IS NULL -- no such record in data mart

Ana sabunta Halayen SCD2

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE,
	CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
	     THEN 'n.a.' ELSE DESCRIPTION
	END AS PAGE_DESCR,
	CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
	     THEN 'n.a.' ELSE KEYWORDS
	END AS PAGE_KEYWORDS
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* updated records from stage: CLS and build reference to history: HIST */
SELECT
	HIST.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
	HIST.REQUEST_NK,
	CLS.PAGE_TITLE,
	CLS.PAGE_DESCR,
	CLS.PAGE_KEYWORDS,
	0 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS,
     DIM_REQUEST_V_ACT TRG,
     DIM_REQUEST HIST
WHERE CLS.REQUEST_NK = TRG.REQUEST_NK
  AND TRG.DIM_REQUEST_ID = HIST.DIM_REQUEST_ID
  AND ( CLS.PAGE_TITLE != HIST.PAGE_TITLE /* changes only */
     OR CLS.PAGE_DESCR != HIST.PAGE_DESCR
     OR CLS.PAGE_KEYWORDS != HIST.PAGE_KEYWORDS )

Share rikodin SCD2

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/*  deleted records in data mart: TRG */
SELECT
	TRG.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
	TRG.REQUEST_NK,
	TRG.PAGE_TITLE,
	TRG.PAGE_DESCR,
	TRG.PAGE_KEYWORDS,
	1 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM (
	SELECT
	DIM_REQUEST_ID,
	REQUEST_NK,
	PAGE_TITLE,
	PAGE_DESCR,
	PAGE_KEYWORDS
	FROM DIM_REQUEST_V_ACT
	WHERE PAGE_TITLE != 'n.a.' -- track master data only
	  AND DELETE_FLAG = 0 -- not already deleted
) TRG
LEFT OUTER JOIN CLS ON TRG.REQUEST_NK = CLS.REQUEST_NK
WHERE CLS.REQUEST_NK IS NULL -- no such record in stage

Dole ne kowane tushen bayanai ya kasance tare da kwatancen hukuma, misali, a cikin fayil readme.txt:

Mai karɓar bayanan bisa ka'ida/na fasaha: suna, adireshin imel
Mai ba da bayanai bisa ka'ida/na fasaha: suna, adireshin imel
Tushen bayanai: hanyar fayil, sunayen sabis
Bayanan samun damar bayanai: masu amfani da kalmomin shiga

Zane-zanen kwararar bayanai zai taimaka wajen aiwatar da aiki da sabuntawa, alal misali, a cikin sigar rubutu:

Matsar da fayil. Tushen: ftp.domain.net: /logs/access.log Target: /var/www/access.log
Karatu akan mataki. Manufar: STG_ACCESS_LOG
Loading da canji. Manufar: FCT_ACCESS_REQUEST_REF_HH
Loading da canji. Manufar: FCT_ACCESS_USER_AGENT_DD
Rahoton. Manufar: /var/www/report.html

ƙarshe

Don haka, labarin ya bayyana hanyoyin kamar haɗakar da bayanan da ke ciki da kuma gabatar da sababbin matakan tarawa. Ana buƙatar su lokacin gina ɗakunan ajiya na bayanai don samun ƙarin ilimi da inganta ingancin bayanai.

source: www.habr.com

Add a comment