Langkung seueur statistik situs dina panyimpenan leutik anjeun

Ku nganalisa statistik situs, urang nampi ide naon anu kajantenan sareng éta. Urang ngabandingkeun hasil jeung pangaweruh sejenna ngeunaan produk atawa jasa jeung kukituna ngaronjatkeun pangalaman urang.

Nalika analisa hasil munggaran réngsé, inpormasi parantos kahartos sareng kasimpulan parantos ditarik, tahap salajengna dimimitian. Gagasan timbul: naon anu bakal kajadian upami anjeun ningali data ti sisi anu sanés?

Aya watesan alat analisis dina tahap ieu. Ieu mangrupikeun salah sahiji alesan naha Google Analytics henteu cekap pikeun kuring, nyaéta kusabab kamampuan terbatas pikeun ningali sareng ngamanipulasi data kuring.

Kuring sok hoyong gancang ngamuat data dasar (data master), nambihan tingkat agrégasi anu sanés, atanapi napsirkeun nilai-nilai anu aya sacara béda.

Ieu gampang pikeun ngalakukeun di di kamar panyimpen alit anjeun dumasar kana file access.log jeung basa SQL anu cukup pikeun ieu.

Janten, patarosan naon anu kuring hoyong dijawab?

Naon jeung iraha robah dina loka

Sajarah parobahan dina data dasar (data master) sok dipikaresep.

Langkung seueur statistik situs dina panyimpenan leutik anjeun

query laporan SQL

SELECT
	1 as 'SideStackedBar: Content Updates by Months',
	strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch')) AS 'Day',
	COUNT(CASE WHEN PAGE_TITLE != 'n.a.' THEN DIM_REQUEST_ID END) AS 'Web page updates',
	COUNT(CASE WHEN PAGE_DESCR = 'IMAGES' THEN DIM_REQUEST_ID END) AS 'Image uploads',
	COUNT(CASE WHEN PAGE_DESCR = 'VIDEO' THEN DIM_REQUEST_ID END) AS 'Video uploads',
	COUNT(CASE WHEN PAGE_DESCR = 'AUDIO' THEN DIM_REQUEST_ID END) AS 'Audio uploads'
FROM DIM_REQUEST
WHERE PAGE_TITLE != 'n.a.' OR PAGE_DESCR != 'n.a.'
GROUP BY strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch'))
ORDER BY UPDATE_DT

Contona, di sawatara titik, optimasi search engine ieu dilumangsungkeun atawa eusi anyar ieu ditambahkeun kana loka, sarta ku kituna paningkatan dina lalulintas diperkirakeun.

Grup pamaké

Conto pangbasajanna grup nyaéta agén pangguna atanapi nami sistem operasi.

Dimensi agén pamaké geus akumulasi ngeunaan sarébu rékaman jeung kuring kabetot dina ningali dinamika distribusi agén dina grup.

Langkung seueur statistik situs dina panyimpenan leutik anjeun

query laporan SQL

SELECT
	1 AS 'SideStackedBar: User Agents',
	AGENT_OS AS 'OS',
	SUM(CASE WHEN AGENT_BOT = 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Users',
	SUM(CASE WHEN AGENT_BOT != 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Bots'
FROM DIM_USER_AGENT
WHERE DIM_USER_AGENT_ID != -1
GROUP BY AGENT_OS
ORDER BY 3 DESC

Paling kombinasi béda agén datang ka loka ti dunya Windows. Diantara anu teu dipikanyaho nyaéta WhatsApp, PocketImageCache, PlayStation, SmartTV, jsb.

Kagiatan grup pamaké ku minggu

Ku ngagabungkeun sababaraha kelompok, urang tiasa niténan distribusi kagiatanana.

Salaku conto, pangguna klaster Linux nganggo langkung seueur lalu lintas halaman wéb tibatan anu sanés.

Langkung seueur statistik situs dina panyimpenan leutik anjeun

query laporan SQL

SELECT
1 as 'StackedBar: Traffic Volume by User OS and by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
SUM(CASE WHEN USG.AGENT_OS IN ('Android', 'Linux') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Android/Linux Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Windows') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Windows Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Macintosh', 'iOS') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Mac/iOS Users',
SUM(CASE WHEN USG.AGENT_OS IN ('n.a.', 'BlackBerry') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Other'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG,
  DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT = 'n.a.' /* users only */
  AND HST.STATUS_GROUP IN ('Successful') /* good pages */
  AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

Konsumsi lalulintas intensif

Tabél nunjukkeun grup pangguna anu paling aktip sareng dinten kagiatanana.
Anu paling aktip kalebet kana klaster Linux.

Langkung seueur statistik situs dina panyimpenan leutik anjeun

query laporan SQL

SELECT
1 AS 'Table: User Agent with Havy Usage',
strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
ROUND(1.0*SUM(FCT.BYTES)/1000000, 1) AS 'Traffic MB',
ROUND(1.0*SUM(FCT.IP_CNT)/SUM(1), 1) AS 'IPs',
ROUND(1.0*SUM(FCT.REQUEST_CNT)/SUM(1), 1) AS 'Requests',
USA.DIM_USER_AGENT_ID AS 'ID',
MAX(USA.USER_AGENT_NK) AS 'User Agent',
MAX(USA.AGENT_BOT) AS 'Bot'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USA
WHERE FCT.DIM_USER_AGENT_ID = USA.DIM_USER_AGENT_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-30 day')
GROUP BY USA.DIM_USER_AGENT_ID, strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch')) 
ORDER BY SUM(FCT.BYTES) DESC, FCT.EVENT_DT
LIMIT 10

Nganggo atribut ID dinten sareng agén, éta tiasa gancang mendakan sareng ngalacak statistik dina dinten-dinten grup pangguna individu. Upami diperlukeun, anjeun tiasa gancang manggihan inpo wincik dina tabel panggung.

Kumaha kéngingkeun inpormasi?

Inpormasi tina file access.log bisa dijieun malah leuwih efisien ku ngahijikeun sumber data tambahan sarta ngawanohkeun tingkat anyar aggregation jeung grup.

Data dasar sareng éntitas

Data dasar kalebet inpormasi ngeunaan éntitas: halaman wéb, gambar, eusi video sareng audio, dina kasus toko - produk.

Éntitas sorangan bertindak salaku ukuran, sareng prosés nyimpen parobahan atribut disebut historicization. Dina database, prosés ieu mindeng dilaksanakeun dina bentuk slow varying dimensi (SCD).

Sumber data bisa datangna tina rupa-rupa sistem, ngarah ampir sok perlu terpadu.

Lalaunan ngarobah dimensi

Dimensi DIM_REQUEST bakal ngandung inpormasi ngeunaan pamundut dina situs dina bentuk sajarah.

Tabél SCD2

CREATE TABLE DIM_REQUEST ( /* scd table for user requests */
  DIM_REQUEST_ID      INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  DIM_REQUEST_ID_HIST INTEGER NOT NULL DEFAULT -1,
  REQUEST_NK          TEXT NOT NULL DEFAULT 'n.a.', /* request without ?parameters */
  PAGE_TITLE          TEXT NOT NULL DEFAULT 'n.a.',
  PAGE_DESCR          TEXT NOT NULL DEFAULT 'n.a.',
  PAGE_KEYWORDS       TEXT NOT NULL DEFAULT 'n.a.',
  DELETE_FLAG         INTEGER NOT NULL DEFAULT 0,
  UPDATE_DT           INTEGER NOT NULL DEFAULT 0,
  UNIQUE (REQUEST_NK, DIM_REQUEST_ID_HIST)
);
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID) VALUES (-1);

Sajaba ti éta, urang bakal nyieun hiji pintonan nu salawasna mintonkeun sadaya rékaman dina kaayaan panganyarna. Diperlukeun pikeun ngamuat pangukuran sorangan.

Langkung seueur statistik situs dina panyimpenan leutik anjeun

Répréséntasi ayeuna tina SCD2

/* Content: actual view on scd table */
SELECT HI.DIM_REQUEST_ID,
  HI.DIM_REQUEST_ID_HIST,
  HI.REQUEST_NK,
  HI.PAGE_TITLE,
  HI.PAGE_DESCR,
  HI.PAGE_KEYWORDS,
  NK.CNT AS HIST_CNT,
  HI.DELETE_FLAG,
  strftime('%d.%m.%Y %H:%M', datetime(HI.UPDATE_DT, 'unixepoch')) AS UPDATE_DT
FROM
  ( SELECT REQUEST_NK, MAX(DIM_REQUEST_ID) AS DIM_REQUEST_ID, SUM(1) AS CNT
    FROM DIM_REQUEST
    GROUP BY REQUEST_NK
  ) NK,
  DIM_REQUEST HI
WHERE 1 = 1
  AND NK.REQUEST_NK = HI.REQUEST_NK
  AND NK.DIM_REQUEST_ID = HI.DIM_REQUEST_ID;

Sareng pandangan dimana inpormasi sajarah dikumpulkeun pikeun tiap éntri. Perlu ngawangun hubungan sajarah anu leres sareng kanyataan.

Langkung seueur statistik situs dina panyimpenan leutik anjeun

Presentasi sajarah SCD2

/* Content: actual view on scd table */
SELECT SCD.DIM_REQUEST_ID,
  SCD.DIM_REQUEST_ID_HIST,
  SCD.REQUEST_NK,
  SCD.PAGE_TITLE,
  SCD.PAGE_DESCR,
  SCD.PAGE_KEYWORDS,
  SCD.DELETE_FLAG,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN 1
    ELSE 0 END ACTIVE_FLAG,
  SCD.DIM_REQUEST_ID_HIST AS ID_FROM,
  SCD.DIM_REQUEST_ID AS ID_TO,
  CASE
    WHEN SCD.DIM_REQUEST_ID_HIST=-1
    THEN 3600
    ELSE IFNULL(SCD.UPDATE_DT,3600)
  END AS TIME_FROM,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN 253370764800
    ELSE HIS.UPDATE_DT
  END AS TIME_TO,
  CASE
    WHEN SCD.DIM_REQUEST_ID_HIST=-1
    THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(3600, 'unixepoch'))
    ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(IFNULL(SCD.UPDATE_DT,3600), 'unixepoch'))
  END AS ACTIVE_FROM,
  CASE
    WHEN HIS.UPDATE_DT IS NULL
    THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(253370764800, 'unixepoch'))
    ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(HIS.UPDATE_DT, 'unixepoch'))
  END AS ACTIVE_TO
FROM
  DIM_REQUEST SCD
  LEFT OUTER JOIN DIM_REQUEST HIS
  ON SCD.REQUEST_NK = HIS.REQUEST_NK AND SCD.DIM_REQUEST_ID = HIS.DIM_REQUEST_ID_HIST;

Ngumpulkeun Data

Komprési (aggregasi) ngamungkinkeun anjeun pikeun ngévaluasi data dina tingkat anu langkung luhur sareng ngadeteksi anomali sareng tren anu henteu katingali dina laporan anu lengkep.

Contona, tambahkeun grup kana diménsi kalayan kode status pamundut DIM_HTTP_STATUS:

STATUS / GROUP
0xx/na
1xx/Émbaran
2xx / Suksés
3xx / Alihan
4xx / Kasalahan klien
5xx/Server Kasalahan

Diménsi agén pamaké DIM_USER_AGENT bakal ngandung atribut AGENT_OS sareng AGENT_BOT anu tanggung jawab pikeun grup. Ieu tiasa dieusi salami prosés ETL:

Ngamuat DIM_USER_AGENT

/* Propagate the user agent from access log */
INSERT INTO DIM_USER_AGENT (USER_AGENT_NK, AGENT_OS, AGENT_ENGINE, AGENT_DEVICE, AGENT_BOT, UPDATE_DT)
WITH CLS AS (
	SELECT BROWSER
	FROM STG_ACCESS_LOG WHERE LENGTH(BROWSER)>1
	GROUP BY BROWSER
)
SELECT
	CLS.BROWSER AS USER_AGENT_NK,
	CASE
	WHEN INSTR(CLS.BROWSER,'Macintosh')>0
		THEN 'Macintosh'
	WHEN INSTR(CLS.BROWSER,'iPhone')>0
			 OR INSTR(CLS.BROWSER,'iPad')>0
			 OR INSTR(CLS.BROWSER,'iPod')>0
			 OR INSTR(CLS.BROWSER,'Apple TV')>0
			 OR INSTR(CLS.BROWSER,'Darwin')>0
		THEN 'iOS'
	WHEN INSTR(CLS.BROWSER,'Android')>0
		THEN 'Android'
	WHEN INSTR(CLS.BROWSER,'X11;')>0 OR INSTR(CLS.BROWSER,'Wayland;')>0 OR INSTR(CLS.BROWSER,'linux-gnu')>0
		THEN 'Linux'
	WHEN INSTR(CLS.BROWSER,'BB10;')>0 OR INSTR(CLS.BROWSER,'BlackBerry')>0
		THEN 'BlackBerry'
	WHEN INSTR(CLS.BROWSER,'Windows')>0
		THEN 'Windows'
	ELSE 'n.a.' END AS AGENT_OS, -- OS
	CASE
	WHEN INSTR(CLS.BROWSER,'AppleCoreMedia')>0
		THEN 'AppleWebKit'
	WHEN INSTR(CLS.BROWSER,') ')>1 AND LENGTH(CLS.BROWSER)>INSTR(CLS.BROWSER,') ')
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,') ')+2, LENGTH(CLS.BROWSER) - INSTR(CLS.BROWSER,') ')-1), 'N/A')
	ELSE 'n.a.' END AS AGENT_ENGINE, -- Engine
	CASE
	WHEN INSTR(CLS.BROWSER,'iPhone')>0
		THEN 'iPhone'
	WHEN INSTR(CLS.BROWSER,'iPad')>0
		THEN 'iPad'
	WHEN INSTR(CLS.BROWSER,'iPod')>0
		THEN 'iPod'
	WHEN INSTR(CLS.BROWSER,'Apple TV')>0
		THEN 'Apple TV'
	WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'Build')>0
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'Build')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
	WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'MIUI')>0
		THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'MIUI')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
	ELSE 'n.a.' END AS AGENT_DEVICE, -- Device
	CASE
	WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
		THEN 'yandex'
	WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
		THEN 'google'
	WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
		THEN 'microsoft'
	WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
		THEN 'ahrefs'
	WHEN INSTR(LOWER(CLS.BROWSER),'jobboersebot')>0 OR INSTR(LOWER(CLS.BROWSER),'jobkicks')>0
		THEN 'job.de'
	WHEN INSTR(LOWER(CLS.BROWSER),'mail.ru')>0
		THEN 'mail.ru'
	WHEN INSTR(LOWER(CLS.BROWSER),'baiduspider')>0
		THEN 'baidu'
	WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
		THEN 'majestic-12'
	WHEN INSTR(LOWER(CLS.BROWSER),'duckduckgo')>0
		THEN 'duckduckgo'
	WHEN INSTR(LOWER(CLS.BROWSER),'bytespider')>0
		THEN 'bytespider'
	WHEN INSTR(LOWER(CLS.BROWSER),'360spider')>0
		THEN 'so.360.cn'
	WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
		OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
		OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
		OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0 OR INSTR(LOWER(CLS.BROWSER),'wget')>0
		THEN 'other'
	ELSE 'n.a.' END AS AGENT_BOT, -- Bot
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN DIM_USER_AGENT TRG
ON CLS.BROWSER = TRG.USER_AGENT_NK
WHERE TRG.DIM_USER_AGENT_ID IS NULL

Integrasi data

Ngawengku pangatur mindahkeun data tina sistem operasi ka sistem ngalaporkeun. Jang ngalampahkeun ieu, anjeun kedah ngadamel tabel panggung kalayan struktur anu sami sareng sumberna.

Inpormasi ngeunaan halaman wéb dugi ka panggung tina cadangan CMS dina bentuk pamundut sisipan.

Ngamuat tabel sajarah DIM_REQUEST sareng data dasar lumangsung dina tilu léngkah: ngamuat konci sareng atribut énggal, ngamutahirkeun anu tos aya, sareng ngalakukeun rékaman anu dihapus.

Ngamuat rékaman SCD2 anyar

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE,
	CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
	     THEN 'n.a.' ELSE DESCRIPTION
	END AS PAGE_DESCR,
	CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
	     THEN 'n.a.' ELSE KEYWORDS
	END AS PAGE_KEYWORDS
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* new records from stage: CLS */
SELECT
	-1 AS DIM_REQUEST_ID_HIST,
	CLS.REQUEST_NK,
	CLS.PAGE_TITLE,
	CLS.PAGE_DESCR,
	CLS.PAGE_KEYWORDS,
	0 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN
 (
	SELECT
	DIM_REQUEST_ID,
	REQUEST_NK,
	PAGE_TITLE,
	PAGE_DESCR,
	PAGE_KEYWORDS
	FROM DIM_REQUEST_V_ACT
) TRG ON CLS.REQUEST_NK = TRG.REQUEST_NK
WHERE TRG.REQUEST_NK IS NULL -- no such record in data mart

Ngamutahirkeun Atribut SCD2

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE,
	CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
	     THEN 'n.a.' ELSE DESCRIPTION
	END AS PAGE_DESCR,
	CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
	     THEN 'n.a.' ELSE KEYWORDS
	END AS PAGE_KEYWORDS
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* updated records from stage: CLS and build reference to history: HIST */
SELECT
	HIST.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
	HIST.REQUEST_NK,
	CLS.PAGE_TITLE,
	CLS.PAGE_DESCR,
	CLS.PAGE_KEYWORDS,
	0 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM CLS,
     DIM_REQUEST_V_ACT TRG,
     DIM_REQUEST HIST
WHERE CLS.REQUEST_NK = TRG.REQUEST_NK
  AND TRG.DIM_REQUEST_ID = HIST.DIM_REQUEST_ID
  AND ( CLS.PAGE_TITLE != HIST.PAGE_TITLE /* changes only */
     OR CLS.PAGE_DESCR != HIST.PAGE_DESCR
     OR CLS.PAGE_KEYWORDS != HIST.PAGE_KEYWORDS )

Ngahapus rékaman SCD2

/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS  AS ( -- prepare keys
	SELECT
	'/' || NAME AS REQUEST_NK,
	TITLE       AS PAGE_TITLE
	FROM STG_CMS_MENU
	WHERE CONTENT_TYPE != 'folder' -- only web pages
	  AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/*  deleted records in data mart: TRG */
SELECT
	TRG.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
	TRG.REQUEST_NK,
	TRG.PAGE_TITLE,
	TRG.PAGE_DESCR,
	TRG.PAGE_KEYWORDS,
	1 AS DELETE_FLAG,
	STRFTIME('%s','now') AS UPDATE_DT
FROM (
	SELECT
	DIM_REQUEST_ID,
	REQUEST_NK,
	PAGE_TITLE,
	PAGE_DESCR,
	PAGE_KEYWORDS
	FROM DIM_REQUEST_V_ACT
	WHERE PAGE_TITLE != 'n.a.' -- track master data only
	  AND DELETE_FLAG = 0 -- not already deleted
) TRG
LEFT OUTER JOIN CLS ON TRG.REQUEST_NK = CLS.REQUEST_NK
WHERE CLS.REQUEST_NK IS NULL -- no such record in stage

Unggal sumber data kudu dibarengan ku déskripsi formal, contona, dina file readme.txt:

Panarima data sacara formal/ténis: ngaran, alamat surélék
Panyadia data sacara resmi/téknis: ngaran, alamat surélék
Sumber data: jalur file, ngaran jasa
Inpormasi aksés data: pangguna sareng kecap akses

Diagram aliran data bakal ngabantosan dina prosés pangropéa sareng pembaruan, contona, dina bentuk téks:

Mindahkeun file. Sumber: ftp.domain.net: /logs/access.log Target: /var/www/access.log
Maca dina panggung. Sasaran: STG_ACCESS_LOG
Loading jeung transformasi. Sasaran: FCT_ACCESS_REQUEST_REF_HH
Loading jeung transformasi. Sasaran: FCT_ACCESS_USER_AGENT_DD
Laporan. Sasaran: /var/www/report.html

kacindekan

Ku kituna, artikel ngajelaskeun mékanisme kayaning integrasi data kaayaan sarta bubuka tingkat anyar aggregation. Éta diperlukeun nalika ngawangun gudang data pikeun ménta pangaweruh tambahan sarta ngaronjatkeun kualitas informasi.

sumber: www.habr.com

Tambahkeun komentar