Takwimu zaidi za tovuti katika hifadhi yako ndogo

Kwa kuchambua takwimu za tovuti, tunapata wazo la kile kinachotokea nayo. Tunalinganisha matokeo na maarifa mengine kuhusu bidhaa au huduma na hivyo kuboresha matumizi yetu.

Wakati uchambuzi wa matokeo ya kwanza umekamilika, habari imeeleweka na hitimisho limetolewa, hatua inayofuata huanza. Mawazo yanatokea: nini kitatokea ikiwa utaangalia data kutoka upande mwingine?

Kuna mapungufu ya zana za uchambuzi katika hatua hii. Hii ni mojawapo ya sababu kwa nini Google Analytics haikunitosha, yaani kwa sababu ya uwezo mdogo wa kuona na kuendesha data yangu.

Siku zote nilitaka kupakia data ya msingi haraka (data kuu), kuongeza kiwango kingine cha ujumlishaji, au kutafsiri maadili yaliyopo kwa njia tofauti.

Hii ni rahisi kufanya ndani katika chumba chako kidogo cha kuhifadhi kulingana na faili ya access.log na lugha ya SQL inatosha kwa hili.

Kwa hivyo, ni maswali gani nilitaka kujibiwa?

Nini na wakati iliyopita kwenye tovuti

Historia ya mabadiliko katika data ya msingi (data kuu) inavutia kila wakati.

Takwimu zaidi za tovuti katika hifadhi yako ndogo

Swali la ripoti ya SQL

SELECT
	1 as 'SideStackedBar: Content Updates by Months',
	strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch')) AS 'Day',
	COUNT(CASE WHEN PAGE_TITLE != 'n.a.' THEN DIM_REQUEST_ID END) AS 'Web page updates',
	COUNT(CASE WHEN PAGE_DESCR = 'IMAGES' THEN DIM_REQUEST_ID END) AS 'Image uploads',
	COUNT(CASE WHEN PAGE_DESCR = 'VIDEO' THEN DIM_REQUEST_ID END) AS 'Video uploads',
	COUNT(CASE WHEN PAGE_DESCR = 'AUDIO' THEN DIM_REQUEST_ID END) AS 'Audio uploads'
FROM DIM_REQUEST
WHERE PAGE_TITLE != 'n.a.' OR PAGE_DESCR != 'n.a.'
GROUP BY strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch'))
ORDER BY UPDATE_DT

Kwa mfano, wakati fulani, uboreshaji wa injini ya utafutaji ulifanyika au maudhui mapya yaliongezwa kwenye tovuti, na kwa hiyo ongezeko la trafiki linatarajiwa.

Kundi la watumiaji

Mfano rahisi zaidi wa kikundi ni wakala wa mtumiaji au jina la mfumo wa uendeshaji.

Kipimo cha wakala wa mtumiaji kimekusanya takriban rekodi elfu moja na nilipenda kuona mienendo ya usambazaji wa mawakala ndani ya kikundi.

Takwimu zaidi za tovuti katika hifadhi yako ndogo

Swali la ripoti ya SQL

SELECT
	1 AS 'SideStackedBar: User Agents',
	AGENT_OS AS 'OS',
	SUM(CASE WHEN AGENT_BOT = 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Users',
	SUM(CASE WHEN AGENT_BOT != 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Bots'
FROM DIM_USER_AGENT
WHERE DIM_USER_AGENT_ID != -1
GROUP BY AGENT_OS
ORDER BY 3 DESC

Idadi kubwa zaidi ya michanganyiko tofauti ya mawakala huja kwenye tovuti kutoka duniani kote WindowsMiongoni mwa zisizofafanuliwa zilikuwa WhatsApp, PocketImageCache, PlayStation, SmartTV, na zingine.

Shughuli ya kikundi cha watumiaji kwa wiki

Kwa kuchanganya vikundi vingine, mtu anaweza kutazama usambazaji wa shughuli zao.

Kwa mfano, watumiaji wa nguzo Linux hutumia trafiki zaidi kwenye tovuti kuliko zingine zote.

Takwimu zaidi za tovuti katika hifadhi yako ndogo

Swali la ripoti ya SQL

SELECT
1 as 'StackedBar: Traffic Volume by User OS and by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
SUM(CASE WHEN USG.AGENT_OS IN ('Android', 'Linux') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Android/Linux Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Windows') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Windows Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Macintosh', 'iOS') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Mac/iOS Users',
SUM(CASE WHEN USG.AGENT_OS IN ('n.a.', 'BlackBerry') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Other'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG,
  DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT = 'n.a.' /* users only */
  AND HST.STATUS_GROUP IN ('Successful') /* good pages */
  AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

Matumizi makubwa ya trafiki

Jedwali linaonyesha vikundi vya watumiaji vilivyo hai zaidi na siku ya shughuli zao.
Zile zinazofanya kazi zaidi zinahusiana na Linux kundi.

Takwimu zaidi za tovuti katika hifadhi yako ndogo

Swali la ripoti ya SQL

SELECT
1 AS 'Table: User Agent with Havy Usage',
strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
ROUND(1.0*SUM(FCT.BYTES)/1000000, 1) AS 'Traffic MB',
ROUND(1.0*SUM(FCT.IP_CNT)/SUM(1), 1) AS 'IPs',
ROUND(1.0*SUM(FCT.REQUEST_CNT)/SUM(1), 1) AS 'Requests',
USA.DIM_USER_AGENT_ID AS 'ID',
MAX(USA.USER_AGENT_NK) AS 'User Agent',
MAX(USA.AGENT_BOT) AS 'Bot'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USA
WHERE FCT.DIM_USER_AGENT_ID = USA.DIM_USER_AGENT_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-30 day')
GROUP BY USA.DIM_USER_AGENT_ID, strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch')) 
ORDER BY SUM(FCT.BYTES) DESC, FCT.EVENT_DT
LIMIT 10

Kwa kutumia siku na sifa za kitambulisho cha wakala, inakuwa rahisi kupata na kufuatilia kwa haraka takwimu za siku za vikundi vya watumiaji binafsi. Ikiwa ni lazima, unaweza kupata maelezo ya kina haraka kwenye meza ya hatua.

Jinsi ya kupata habari?

Taarifa kutoka kwa faili ya access.log inaweza kufanywa kuwa bora zaidi kwa kuunganisha vyanzo vya ziada vya data na kuanzisha viwango vipya vya kujumlisha na kupanga.

Data ya msingi na vyombo

Data ya msingi inajumuisha taarifa kuhusu vyombo: kurasa za wavuti, picha, video na maudhui ya sauti, katika kesi ya duka - bidhaa.

Vyombo wenyewe hufanya kama vipimo, na mchakato wa kuhifadhi mabadiliko katika sifa unaitwa uwekaji historia. Katika hifadhidata, mchakato huu mara nyingi hutekelezwa kwa njia ya vipimo vinavyotofautiana polepole (SCD).

Data ya chanzo inaweza kutoka kwa mifumo mbalimbali, kwa hiyo karibu kila mara inahitaji kuunganishwa.

Kubadilisha mwelekeo polepole

Kipimo cha DIM_REQUEST kitakuwa na taarifa kuhusu maombi kwenye tovuti katika fomu ya kihistoria.

Jedwali la SCD2

CREATE TABLE DIM_REQUEST ( /* scd table for user requests */
  DIM_REQUEST_ID      INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  DIM_REQUEST_ID_HIST INTEGER NOT NULL DEFAULT -1,
  REQUEST_NK          TEXT NOT NULL DEFAULT 'n.a.', /* request without ?parameters */
  PAGE_TITLE          TEXT NOT NULL DEFAULT 'n.a.',
  PAGE_DESCR          TEXT NOT NULL DEFAULT 'n.a.',
  PAGE_KEYWORDS       TEXT NOT NULL DEFAULT 'n.a.',
  DELETE_FLAG         INTEGER NOT NULL DEFAULT 0,
  UPDATE_DT           INTEGER NOT NULL DEFAULT 0,
  UNIQUE (REQUEST_NK, DIM_REQUEST_ID_HIST)
);
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID) VALUES (-1);

Mbali na hayo, tutaunda mtazamo mmoja ambao unaonyesha rekodi zote katika hali ya hivi karibuni. Muhimu kwa kupakia kipimo yenyewe.

Takwimu zaidi za tovuti katika hifadhi yako ndogo

Uwakilishi wa sasa wa SCD2

/* Content: actual view on scd table */
SELECT HI.DIM_REQUEST_ID,
  HI.DIM_REQUEST_ID_HIST,
  HI.REQUEST_NK,
  HI.PAGE_TITLE,
  HI.PAGE_DESCR,
  HI.PAGE_KEYWORDS,
  NK.CNT AS HIST_CNT,
  HI.DELETE_FLAG,
  strftime('%d.%m.%Y %H:%M', datetime(HI.UPDATE_DT, 'unixepoch')) AS UPDATE_DT
FROM
  ( SELECT REQUEST_NK, MAX(DIM_REQUEST_ID) AS DIM_REQUEST_ID, SUM(1) AS CNT
    FROM DIM_REQUEST
    GROUP BY REQUEST_NK
  ) NK,
  DIM_REQUEST HI
WHERE 1 = 1
  AND NK.REQUEST_NK = HI.REQUEST_NK
  AND NK.DIM_REQUEST_ID = HI.DIM_REQUEST_ID;

Na mtazamo ambapo taarifa za kihistoria zinakusanywa kwa kila kiingilio. Inahitajika kujenga uhusiano sahihi wa kihistoria na ukweli.

Takwimu zaidi za tovuti katika hifadhi yako ndogo

Uwasilishaji wa kihistoria wa SCD2

/* Content: actual view on scd table */
SELECT SCD.DIM_REQUEST_ID,
  SCD.DIM_REQUEST_ID_HIST,
  SCD.REQUEST_NK,
  SCD.PAGE_TITLE,
  SCD.PAGE_DESCR,
  SCD.PAGE_KEYWORDS,
  SCD.DELETE_FLAG,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN 1
    ELSE 0 END ACTIVE_FLAG,
  SCD.DIM_REQUEST_ID_HIST AS ID_FROM,
  SCD.DIM_REQUEST_ID AS ID_TO,
  CASE
    WHEN SCD.DIM_REQUEST_ID_HIST=-1
    THEN 3600
    ELSE IFNULL(SCD.UPDATE_DT,3600)
  END AS TIME_FROM,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN 253370764800
    ELSE HIS.UPDATE_DT
  END AS TIME_TO,
  CASE
    WHEN SCD.DIM_REQUEST_ID_HIST=-1
    THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(3600, 'unixepoch'))
    ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(IFNULL(SCD.UPDATE_DT,3600), 'unixepoch'))
  END AS ACTIVE_FROM,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(253370764800, 'unixepoch'))
    ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(HIS.UPDATE_DT, 'unixepoch'))
  END AS ACTIVE_TO
FROM
  DIM_REQUEST SCD
  LEFT OUTER JOIN DIM_REQUEST HIS
  ON SCD.REQUEST_NK = HIS.REQUEST_NK AND SCD.DIM_REQUEST_ID = HIS.DIM_REQUEST_ID_HIST;

Ujumlishaji wa Data

Mfinyazo (ujumlisho) hukuruhusu kutathmini data katika kiwango cha juu na kugundua hitilafu na mitindo ambayo haionekani katika ripoti za kina.

Kwa mfano, ongeza kikundi kwenye kipimo chenye misimbo ya hali ya ombi DIM_HTTP_STATUS:

HALI/KIKUNDI
0xx/na
1xx/Habari
2xx/Imefaulu
3xx/Kuelekeza kwingine
4xx/Hitilafu ya Mteja
5xx/Hitilafu ya Seva

Kipimo cha wakala wa mtumiaji DIM_USER_AGENT kitakuwa na AGENT_OS na AGENT_BOT sifa ambazo zinawajibika kwa vikundi. Hizi zinaweza kuwekwa wakati wa mchakato wa ETL:

Inapakia DIM_USER_AGENT

/* Propagate the user agent from access log */
INSERT INTO DIM_USER_AGENT (USER_AGENT_NK, AGENT_OS, AGENT_ENGINE, AGENT_DEVICE, AGENT_BOT, UPDATE_DT)
WITH CLS AS (
	SELECT BROWSER
	FROM STG_ACCESS_LOG WHERE LENGTH(BROWSER)>1
	GROUP BY BROWSER
)
SELECT
	CLS.BROWSER AS USER_AGENT_NK,
	CASE
	WHEN INSTR(CLS.BROWSER,'Macintosh')>0
		THEN 'Macintosh'
	WHEN INSTR(CLS.BROWSER,'iPhone')>0
			 OR INSTR(CLS.BROWSER,'iPad')>0
			 OR INSTR(CLS.BROWSER,'iPod')>0
			 OR INSTR(CLS.BROWSER,'Apple TV')>0
			 OR INSTR(CLS.BROWSER,'Darwin')>0
		THEN 'iOS'
	WHEN INSTR(CLS.BROWSER,'Android')>0
		THEN 'Android'
	WHEN INSTR(CLS.BROWSER,'X11;')>0 OR INSTR(CLS.BROWSER,'Wayland;')>0 OR INSTR(CLS.BROWSER,'linux-gnu')>0
		THEN 'Linux'
	WHEN INSTR(CLS.BROWSER,'BB10;')>0 OR INSTR(CLS.BROWSER,'BlackBerry')>0
		THEN 'BlackBerry'
	WHEN INSTR(CLS.BROWSER,'Windows')>0
		THEN 'Windows'
	ELSE 'n.a.' END AS AGENT_OS, -- OS
	CASE
	WHEN INSTR(CLS.BROWSER,'AppleCoreMedia')>0
		THEN 'AppleWebKit'
	WHEN INSTR(CLS.BROWSER,') ')>1 AND LENGTH(CLS.BROWSER)>INSTR(CLS.BROWSER,') ')
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,') ')+2, LENGTH(CLS.BROWSER) - INSTR(CLS.BROWSER,') ')-1), 'N/A')
	ELSE 'n.a.' END AS AGENT_ENGINE, -- Engine
	CASE
	WHEN INSTR(CLS.BROWSER,'iPhone')>0
		THEN 'iPhone'
	WHEN INSTR(CLS.BROWSER,'iPad')>0
		THEN 'iPad'
	WHEN INSTR(CLS.BROWSER,'iPod')>0
		THEN 'iPod'
	WHEN INSTR(CLS.BROWSER,'Apple TV')>0
		THEN 'Apple TV'
	WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'Build')>0
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'Build')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
	WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'MIUI')>0
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'MIUI')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
	ELSE 'n.a.' END AS AGENT_DEVICE, -- Device
	CASE
	WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
		THEN 'yandex'
	WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
		THEN 'google'
	WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
		THEN 'microsoft'
	WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
		THEN 'ahrefs'
	WHEN INSTR(LOWER(CLS.BROWSER),'jobboersebot')>0 OR INSTR(LOWER(CLS.BROWSER),'jobkicks')>0
		THEN 'job.de'
	WHEN INSTR(LOWER(CLS.BROWSER),'mail.ru')>0
		THEN 'mail.ru'
	WHEN INSTR(LOWER(CLS.BROWSER),'baiduspider')>0
		THEN 'baidu'
	WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
		THEN 'majestic-12'
	WHEN INSTR(LOWER(CLS.BROWSER),'duckduckgo')>0
		THEN 'duckduckgo'
	WHEN INSTR(LOWER(CLS.BROWSER),'bytespider')>0
		THEN 'bytespider'
	WHEN INSTR(LOWER(CLS.BROWSER),'360spider')>0
		THEN 'so.360.cn'
	WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
		OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
		OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
		OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0 OR INSTR(LOWER(CLS.BROWSER),'wget')>0
		THEN 'other'
	ELSE 'n.a.' END AS AGENT_BOT, -- Bot
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN DIM_USER_AGENT TRG
ON CLS.BROWSER = TRG.USER_AGENT_NK
WHERE TRG.DIM_USER_AGENT_ID IS NULL

Ujumuishaji wa Takwimu

Inajumuisha kuandaa uhamishaji wa data kutoka kwa mfumo wa uendeshaji hadi mfumo wa kuripoti. Ili kufanya hivyo, unahitaji kuunda meza ya hatua na muundo sawa na chanzo.

Taarifa kuhusu kurasa za wavuti hufika hatua kutoka kwa hifadhi rudufu ya CMS kwa njia ya maombi ya kuingiza.

Kupakia jedwali la kihistoria DIM_REQUEST lenye data ya msingi hutokea katika hatua tatu: kupakia funguo na sifa mpya, kusasisha zilizopo, na kuweka rekodi zilizofutwa.

Inapakia rekodi mpya za SCD2

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE,
	CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
	     THEN 'n.a.' ELSE DESCRIPTION
	END AS PAGE_DESCR,
	CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
	     THEN 'n.a.' ELSE KEYWORDS
	END AS PAGE_KEYWORDS
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* new records from stage: CLS */
SELECT
	-1 AS DIM_REQUEST_ID_HIST,
	CLS.REQUEST_NK,
	CLS.PAGE_TITLE,
	CLS.PAGE_DESCR,
	CLS.PAGE_KEYWORDS,
	0 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN
 (
	SELECT
	DIM_REQUEST_ID,
	REQUEST_NK,
	PAGE_TITLE,
	PAGE_DESCR,
	PAGE_KEYWORDS
	FROM DIM_REQUEST_V_ACT
) TRG ON CLS.REQUEST_NK = TRG.REQUEST_NK
WHERE TRG.REQUEST_NK IS NULL -- no such record in data mart

Inasasisha Sifa za SCD2

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE,
	CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
	     THEN 'n.a.' ELSE DESCRIPTION
	END AS PAGE_DESCR,
	CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
	     THEN 'n.a.' ELSE KEYWORDS
	END AS PAGE_KEYWORDS
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* updated records from stage: CLS and build reference to history: HIST */
SELECT
	HIST.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
	HIST.REQUEST_NK,
	CLS.PAGE_TITLE,
	CLS.PAGE_DESCR,
	CLS.PAGE_KEYWORDS,
	0 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS,
     DIM_REQUEST_V_ACT TRG,
     DIM_REQUEST HIST
WHERE CLS.REQUEST_NK = TRG.REQUEST_NK
  AND TRG.DIM_REQUEST_ID = HIST.DIM_REQUEST_ID
  AND ( CLS.PAGE_TITLE != HIST.PAGE_TITLE /* changes only */
     OR CLS.PAGE_DESCR != HIST.PAGE_DESCR
     OR CLS.PAGE_KEYWORDS != HIST.PAGE_KEYWORDS )

Rekodi za SCD2 zimefutwa

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/*  deleted records in data mart: TRG */
SELECT
	TRG.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
	TRG.REQUEST_NK,
	TRG.PAGE_TITLE,
	TRG.PAGE_DESCR,
	TRG.PAGE_KEYWORDS,
	1 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM (
	SELECT
	DIM_REQUEST_ID,
	REQUEST_NK,
	PAGE_TITLE,
	PAGE_DESCR,
	PAGE_KEYWORDS
	FROM DIM_REQUEST_V_ACT
	WHERE PAGE_TITLE != 'n.a.' -- track master data only
	  AND DELETE_FLAG = 0 -- not already deleted
) TRG
LEFT OUTER JOIN CLS ON TRG.REQUEST_NK = CLS.REQUEST_NK
WHERE CLS.REQUEST_NK IS NULL -- no such record in stage

Kila chanzo cha data lazima kiambatane na maelezo rasmi, kwa mfano, katika faili ya readme.txt:

Mpokeaji wa data rasmi/kitaalam: jina, barua pepe
Mtoa huduma wa data rasmi/kitaalam: jina, barua pepe
Chanzo cha data: njia ya faili, majina ya huduma
Habari ya ufikiaji wa data: watumiaji na nywila

Mchoro wa mtiririko wa data utasaidia katika mchakato wa matengenezo na uppdatering, kwa mfano, katika fomu ya maandishi:

Kuhamisha faili. Chanzo: ftp.domain.net: /logs/access.log Lengo: /var/www/access.log
Kusoma kwenye jukwaa. Lengo: STG_ACCESS_LOG
Inapakia na kubadilisha. Lengo: FCT_ACCESS_REQUEST_REF_HH
Inapakia na kubadilisha. Lengo: FCT_ACCESS_USER_AGENT_DD
Ripoti. Lengo: /var/www/report.html

Pato

Kwa hivyo, makala inaeleza mbinu kama vile ujumuishaji wa data msingi na kuanzishwa kwa viwango vipya vya ujumlishaji. Zinahitajika wakati wa kujenga maghala ya data ili kupata maarifa ya ziada na kuboresha ubora wa habari.

Chanzo: mapenzi.com

Nunua upangishaji wa kuaminika wa tovuti zilizo na ulinzi wa DDoS, seva za VPS VDS 🔥 Nunua upangishaji wa tovuti unaoaminika kwa ulinzi wa DDoS, seva za VPS VDS | ProHoster