Billi janalizzaw l-istatistika tas-sit, aħna nieħdu idea ta’ x’qed jiġri miegħu. Aħna nqabblu r-riżultati ma 'għarfien ieħor dwar il-prodott jew is-servizz u b'hekk intejbu l-esperjenza tagħna.
Meta titlesta l-analiżi tal-ewwel riżultati, l-informazzjoni tkun ġiet mifhuma u nsiltu konklużjonijiet, jibda l-istadju li jmiss. Jqumu l-ideat: x'se jiġri jekk tħares lejn id-dejta min-naħa l-oħra?
Hemm limitazzjonijiet tal-għodod tal-analiżi f'dan l-istadju. Din hija waħda mir-raġunijiet għalfejn Google Analytics ma kienx biżżejjed għalija, jiġifieri minħabba l-kapaċità limitata li nara u timmanipula d-dejta tiegħi.
Dejjem xtaqt tagħbija malajr dejta bażika (dejta ewlenija), inżid livell ieħor ta 'aggregazzjoni, jew ninterpreta valuri eżistenti b'mod differenti.
Dan huwa faċli biex tagħmel
Allura, liema mistoqsijiet ridt imwieġba?
X'inhu u meta jinbidel fuq is-sit
L-istorja tal-bidliet fid-dejta sottostanti (dejta prinċipali) hija dejjem ta 'interess.
Mistoqsija ta' rapport SQL
SELECT
1 as 'SideStackedBar: Content Updates by Months',
strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch')) AS 'Day',
COUNT(CASE WHEN PAGE_TITLE != 'n.a.' THEN DIM_REQUEST_ID END) AS 'Web page updates',
COUNT(CASE WHEN PAGE_DESCR = 'IMAGES' THEN DIM_REQUEST_ID END) AS 'Image uploads',
COUNT(CASE WHEN PAGE_DESCR = 'VIDEO' THEN DIM_REQUEST_ID END) AS 'Video uploads',
COUNT(CASE WHEN PAGE_DESCR = 'AUDIO' THEN DIM_REQUEST_ID END) AS 'Audio uploads'
FROM DIM_REQUEST
WHERE PAGE_TITLE != 'n.a.' OR PAGE_DESCR != 'n.a.'
GROUP BY strftime('%m/%Y', datetime(UPDATE_DT, 'unixepoch'))
ORDER BY UPDATE_DT
Pereżempju, f'xi punt, saret l-ottimizzazzjoni tal-magni tat-tiftix jew ġie miżjud kontenut ġdid fis-sit, u għalhekk hija mistennija żieda fit-traffiku.
Grupp ta' utenti
L-aktar eżempju sempliċi ta 'grupp huwa l-utent aġent jew l-isem tas-sistema operattiva.
Id-dimensjoni tal-aġent tal-utent akkumulat madwar elf rekord u kont interessat li nara d-dinamika tad-distribuzzjoni tal-aġenti fi ħdan il-grupp.
Mistoqsija ta' rapport SQL
SELECT
1 AS 'SideStackedBar: User Agents',
AGENT_OS AS 'OS',
SUM(CASE WHEN AGENT_BOT = 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Users',
SUM(CASE WHEN AGENT_BOT != 'n.a.' THEN 1 ELSE 0 END ) AS 'User Agent of Bots'
FROM DIM_USER_AGENT
WHERE DIM_USER_AGENT_ID != -1
GROUP BY AGENT_OS
ORDER BY 3 DESC
Il-biċċa l-kbira tal-kombinazzjonijiet differenti ta 'aġenti jaslu fis-sit mid-dinja tal-Windows. Fost dawk mhux identifikati kien hemm WhatsApp, PocketImageCache, PlayStation, SmartTV, eċċ.
Attività tal-grupp tal-utenti skont il-ġimgħa
Billi tgħaqqad xi gruppi, wieħed jista 'josserva d-distribuzzjoni tal-attività tagħhom.
Pereżempju, l-utenti tal-clusters tal-Linux jikkunsmaw aktar traffiku tal-websajt minn kulħadd.
Mistoqsija ta' rapport SQL
SELECT
1 as 'StackedBar: Traffic Volume by User OS and by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
SUM(CASE WHEN USG.AGENT_OS IN ('Android', 'Linux') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Android/Linux Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Windows') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Windows Users',
SUM(CASE WHEN USG.AGENT_OS IN ('Macintosh', 'iOS') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Mac/iOS Users',
SUM(CASE WHEN USG.AGENT_OS IN ('n.a.', 'BlackBerry') THEN FCT.BYTES ELSE 0 END)/1000 AS 'Other'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USG,
DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
AND USG.AGENT_BOT = 'n.a.' /* users only */
AND HST.STATUS_GROUP IN ('Successful') /* good pages */
AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT
Konsum intensiv tat-traffiku
It-tabella turi l-aktar gruppi ta' utenti attivi u l-jum tal-attività tagħhom.
L-aktar attivi jappartjenu għall-cluster Linux.
Mistoqsija ta' rapport SQL
SELECT
1 AS 'Table: User Agent with Havy Usage',
strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
ROUND(1.0*SUM(FCT.BYTES)/1000000, 1) AS 'Traffic MB',
ROUND(1.0*SUM(FCT.IP_CNT)/SUM(1), 1) AS 'IPs',
ROUND(1.0*SUM(FCT.REQUEST_CNT)/SUM(1), 1) AS 'Requests',
USA.DIM_USER_AGENT_ID AS 'ID',
MAX(USA.USER_AGENT_NK) AS 'User Agent',
MAX(USA.AGENT_BOT) AS 'Bot'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USA
WHERE FCT.DIM_USER_AGENT_ID = USA.DIM_USER_AGENT_ID
AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-30 day')
GROUP BY USA.DIM_USER_AGENT_ID, strftime('%d.%m.%Y', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY SUM(FCT.BYTES) DESC, FCT.EVENT_DT
LIMIT 10
Bl-użu tal-jum tal-attributi u l-ID tal-aġent, isir possibbli li ssib malajr u ssegwi statistika fil-jiem ta 'gruppi ta' utenti individwali. Jekk meħtieġ, tista 'ssib malajr informazzjoni dettaljata fit-tabella tal-istadju.
Kif tikseb l-informazzjoni?
Data bażika u entitajiet
Id-dejta bażika tinkludi informazzjoni dwar entitajiet: paġni tal-web, stampi, kontenut tal-vidjo u awdjo, fil-każ ta 'maħżen - prodotti.
L-entitajiet infushom jaġixxu bħala kejl, u l-proċess tal-ħażna tal-bidliet fl-attributi jissejjaħ storikizzazzjoni. F'database, dan il-proċess huwa spiss implimentat fil-forma ta 'dimensjonijiet li jvarjaw bil-mod (SCD).
Id-dejta tas-sors tista 'tiġi minn varjetà ta' sistemi, għalhekk kważi dejjem jeħtieġ li jiġu integrati.
Dimensjoni li tinbidel bil-mod
Id-dimensjoni DIM_REQUEST se jkun fiha informazzjoni dwar talbiet fuq is-sit f'forma storika.
Tabella SCD2
CREATE TABLE DIM_REQUEST ( /* scd table for user requests */
DIM_REQUEST_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
DIM_REQUEST_ID_HIST INTEGER NOT NULL DEFAULT -1,
REQUEST_NK TEXT NOT NULL DEFAULT 'n.a.', /* request without ?parameters */
PAGE_TITLE TEXT NOT NULL DEFAULT 'n.a.',
PAGE_DESCR TEXT NOT NULL DEFAULT 'n.a.',
PAGE_KEYWORDS TEXT NOT NULL DEFAULT 'n.a.',
DELETE_FLAG INTEGER NOT NULL DEFAULT 0,
UPDATE_DT INTEGER NOT NULL DEFAULT 0,
UNIQUE (REQUEST_NK, DIM_REQUEST_ID_HIST)
);
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID) VALUES (-1);
Minbarra dan, se noħolqu veduta waħda li dejjem turi r-rekords kollha fl-aħħar stat. Meħtieġa għat-tagħbija tal-kejl innifsu.
Rappreżentazzjoni attwali ta 'SCD2
/* Content: actual view on scd table */
SELECT HI.DIM_REQUEST_ID,
HI.DIM_REQUEST_ID_HIST,
HI.REQUEST_NK,
HI.PAGE_TITLE,
HI.PAGE_DESCR,
HI.PAGE_KEYWORDS,
NK.CNT AS HIST_CNT,
HI.DELETE_FLAG,
strftime('%d.%m.%Y %H:%M', datetime(HI.UPDATE_DT, 'unixepoch')) AS UPDATE_DT
FROM
( SELECT REQUEST_NK, MAX(DIM_REQUEST_ID) AS DIM_REQUEST_ID, SUM(1) AS CNT
FROM DIM_REQUEST
GROUP BY REQUEST_NK
) NK,
DIM_REQUEST HI
WHERE 1 = 1
AND NK.REQUEST_NK = HI.REQUEST_NK
AND NK.DIM_REQUEST_ID = HI.DIM_REQUEST_ID;
U ħsieb fejn tinġabar informazzjoni storika għal kull dħul. Huwa meħtieġ li tinbena konnessjoni storikament korretta mal-fatti.
Preżentazzjoni storika ta 'SCD2
/* Content: actual view on scd table */
SELECT SCD.DIM_REQUEST_ID,
SCD.DIM_REQUEST_ID_HIST,
SCD.REQUEST_NK,
SCD.PAGE_TITLE,
SCD.PAGE_DESCR,
SCD.PAGE_KEYWORDS,
SCD.DELETE_FLAG,
CASE
WHEN HIS.UPDATE_DT IS NULL
THEN 1
ELSE 0 END ACTIVE_FLAG,
SCD.DIM_REQUEST_ID_HIST AS ID_FROM,
SCD.DIM_REQUEST_ID AS ID_TO,
CASE
WHEN SCD.DIM_REQUEST_ID_HIST=-1
THEN 3600
ELSE IFNULL(SCD.UPDATE_DT,3600)
END AS TIME_FROM,
CASE
WHEN HIS.UPDATE_DT IS NULL
THEN 253370764800
ELSE HIS.UPDATE_DT
END AS TIME_TO,
CASE
WHEN SCD.DIM_REQUEST_ID_HIST=-1
THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(3600, 'unixepoch'))
ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(IFNULL(SCD.UPDATE_DT,3600), 'unixepoch'))
END AS ACTIVE_FROM,
CASE
WHEN HIS.UPDATE_DT IS NULL
THEN STRFTIME('%d.%m.%Y %H:%M', DATETIME(253370764800, 'unixepoch'))
ELSE STRFTIME('%d.%m.%Y %H:%M', DATETIME(HIS.UPDATE_DT, 'unixepoch'))
END AS ACTIVE_TO
FROM
DIM_REQUEST SCD
LEFT OUTER JOIN DIM_REQUEST HIS
ON SCD.REQUEST_NK = HIS.REQUEST_NK AND SCD.DIM_REQUEST_ID = HIS.DIM_REQUEST_ID_HIST;
Aggregazzjoni tad-Data
Il-kompressjoni (aggregazzjoni) tippermettilek tevalwa d-dejta f'livell ogħla u tiskopri anomaliji u xejriet li mhumiex viżibbli f'rapporti dettaljati.
Pereżempju, żid grupp mad-dimensjoni bil-kodiċi tal-istatus tat-talba DIM_HTTP_STATUS:
STATUS/GRUPP
0xx/na
1xx/Informazzjoni
2xx/Suċċess
3xx/Direzzjoni mill-ġdid
4xx/Żball tal-Klijent
Żball 5xx/Server
Id-dimensjoni tal-aġent tal-utent DIM_USER_AGENT se jkun fiha l-attributi AGENT_OS u AGENT_BOT li huma responsabbli għall-gruppi. Dawn jistgħu jiġu popolati matul il-proċess ETL:
Tagħbija DIM_USER_AGENT
/* Propagate the user agent from access log */
INSERT INTO DIM_USER_AGENT (USER_AGENT_NK, AGENT_OS, AGENT_ENGINE, AGENT_DEVICE, AGENT_BOT, UPDATE_DT)
WITH CLS AS (
SELECT BROWSER
FROM STG_ACCESS_LOG WHERE LENGTH(BROWSER)>1
GROUP BY BROWSER
)
SELECT
CLS.BROWSER AS USER_AGENT_NK,
CASE
WHEN INSTR(CLS.BROWSER,'Macintosh')>0
THEN 'Macintosh'
WHEN INSTR(CLS.BROWSER,'iPhone')>0
OR INSTR(CLS.BROWSER,'iPad')>0
OR INSTR(CLS.BROWSER,'iPod')>0
OR INSTR(CLS.BROWSER,'Apple TV')>0
OR INSTR(CLS.BROWSER,'Darwin')>0
THEN 'iOS'
WHEN INSTR(CLS.BROWSER,'Android')>0
THEN 'Android'
WHEN INSTR(CLS.BROWSER,'X11;')>0 OR INSTR(CLS.BROWSER,'Wayland;')>0 OR INSTR(CLS.BROWSER,'linux-gnu')>0
THEN 'Linux'
WHEN INSTR(CLS.BROWSER,'BB10;')>0 OR INSTR(CLS.BROWSER,'BlackBerry')>0
THEN 'BlackBerry'
WHEN INSTR(CLS.BROWSER,'Windows')>0
THEN 'Windows'
ELSE 'n.a.' END AS AGENT_OS, -- OS
CASE
WHEN INSTR(CLS.BROWSER,'AppleCoreMedia')>0
THEN 'AppleWebKit'
WHEN INSTR(CLS.BROWSER,') ')>1 AND LENGTH(CLS.BROWSER)>INSTR(CLS.BROWSER,') ')
THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,') ')+2, LENGTH(CLS.BROWSER) - INSTR(CLS.BROWSER,') ')-1), 'N/A')
ELSE 'n.a.' END AS AGENT_ENGINE, -- Engine
CASE
WHEN INSTR(CLS.BROWSER,'iPhone')>0
THEN 'iPhone'
WHEN INSTR(CLS.BROWSER,'iPad')>0
THEN 'iPad'
WHEN INSTR(CLS.BROWSER,'iPod')>0
THEN 'iPod'
WHEN INSTR(CLS.BROWSER,'Apple TV')>0
THEN 'Apple TV'
WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'Build')>0
THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'Build')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
WHEN INSTR(CLS.BROWSER,'Android ')>0 AND INSTR(CLS.BROWSER,'MIUI')>0
THEN COALESCE(SUBSTR(CLS.BROWSER, INSTR(CLS.BROWSER,'Android '), INSTR(CLS.BROWSER,'MIUI')-INSTR(CLS.BROWSER,'Android ')), 'n.a.')
ELSE 'n.a.' END AS AGENT_DEVICE, -- Device
CASE
WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
THEN 'yandex'
WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
THEN 'google'
WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
THEN 'microsoft'
WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
THEN 'ahrefs'
WHEN INSTR(LOWER(CLS.BROWSER),'jobboersebot')>0 OR INSTR(LOWER(CLS.BROWSER),'jobkicks')>0
THEN 'job.de'
WHEN INSTR(LOWER(CLS.BROWSER),'mail.ru')>0
THEN 'mail.ru'
WHEN INSTR(LOWER(CLS.BROWSER),'baiduspider')>0
THEN 'baidu'
WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
THEN 'majestic-12'
WHEN INSTR(LOWER(CLS.BROWSER),'duckduckgo')>0
THEN 'duckduckgo'
WHEN INSTR(LOWER(CLS.BROWSER),'bytespider')>0
THEN 'bytespider'
WHEN INSTR(LOWER(CLS.BROWSER),'360spider')>0
THEN 'so.360.cn'
WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0 OR INSTR(LOWER(CLS.BROWSER),'wget')>0
THEN 'other'
ELSE 'n.a.' END AS AGENT_BOT, -- Bot
STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN DIM_USER_AGENT TRG
ON CLS.BROWSER = TRG.USER_AGENT_NK
WHERE TRG.DIM_USER_AGENT_ID IS NULL
Integrazzjoni tad-data
Jinkludi l-organizzazzjoni tat-trasferiment tad-dejta mis-sistema operattiva għas-sistema ta' rappurtar. Biex tagħmel dan, għandek bżonn toħloq tabella tal-istadju bi struttura simili għas-sors.
L-informazzjoni dwar il-paġni tal-web tasal għall-istadju mill-backup tas-CMS fil-forma ta’ talbiet ta’ daħħal.
It-tagħbija tat-tabella storika DIM_REQUEST b'dejta bażika sseħħ fi tliet passi: tagħbija ta 'ċwievet u attributi ġodda, aġġornament ta' dawk eżistenti, u impenn ta 'rekords imħassra.
Tagħbija ta' reġistrazzjonijiet SCD2 ġodda
/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS AS ( -- prepare keys
SELECT
'/' || NAME AS REQUEST_NK,
TITLE AS PAGE_TITLE,
CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
THEN 'n.a.' ELSE DESCRIPTION
END AS PAGE_DESCR,
CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
THEN 'n.a.' ELSE KEYWORDS
END AS PAGE_KEYWORDS
FROM STG_CMS_MENU
WHERE CONTENT_TYPE != 'folder' -- only web pages
AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* new records from stage: CLS */
SELECT
-1 AS DIM_REQUEST_ID_HIST,
CLS.REQUEST_NK,
CLS.PAGE_TITLE,
CLS.PAGE_DESCR,
CLS.PAGE_KEYWORDS,
0 AS DELETE_FLAG,
STRFTIME('%s','now') AS UPDATE_DT
FROM CLS
LEFT OUTER JOIN
(
SELECT
DIM_REQUEST_ID,
REQUEST_NK,
PAGE_TITLE,
PAGE_DESCR,
PAGE_KEYWORDS
FROM DIM_REQUEST_V_ACT
) TRG ON CLS.REQUEST_NK = TRG.REQUEST_NK
WHERE TRG.REQUEST_NK IS NULL -- no such record in data mart
Aġġornament ta' Attributi SCD2
/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS AS ( -- prepare keys
SELECT
'/' || NAME AS REQUEST_NK,
TITLE AS PAGE_TITLE,
CASE WHEN DESCRIPTION = '' OR DESCRIPTION IS NULL
THEN 'n.a.' ELSE DESCRIPTION
END AS PAGE_DESCR,
CASE WHEN KEYWORDS = '' OR KEYWORDS IS NULL
THEN 'n.a.' ELSE KEYWORDS
END AS PAGE_KEYWORDS
FROM STG_CMS_MENU
WHERE CONTENT_TYPE != 'folder' -- only web pages
AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* updated records from stage: CLS and build reference to history: HIST */
SELECT
HIST.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
HIST.REQUEST_NK,
CLS.PAGE_TITLE,
CLS.PAGE_DESCR,
CLS.PAGE_KEYWORDS,
0 AS DELETE_FLAG,
STRFTIME('%s','now') AS UPDATE_DT
FROM CLS,
DIM_REQUEST_V_ACT TRG,
DIM_REQUEST HIST
WHERE CLS.REQUEST_NK = TRG.REQUEST_NK
AND TRG.DIM_REQUEST_ID = HIST.DIM_REQUEST_ID
AND ( CLS.PAGE_TITLE != HIST.PAGE_TITLE /* changes only */
OR CLS.PAGE_DESCR != HIST.PAGE_DESCR
OR CLS.PAGE_KEYWORDS != HIST.PAGE_KEYWORDS )
Reġistrazzjonijiet SCD2 imħassra
/* Load request table SCD from master data */
INSERT INTO DIM_REQUEST (DIM_REQUEST_ID_HIST, REQUEST_NK, PAGE_TITLE, PAGE_DESCR, PAGE_KEYWORDS, DELETE_FLAG, UPDATE_DT)
WITH CLS AS ( -- prepare keys
SELECT
'/' || NAME AS REQUEST_NK,
TITLE AS PAGE_TITLE
FROM STG_CMS_MENU
WHERE CONTENT_TYPE != 'folder' -- only web pages
AND PAGE_TITLE != 'n.a.' -- master data which make sense
)
/* deleted records in data mart: TRG */
SELECT
TRG.DIM_REQUEST_ID AS DIM_REQUEST_ID_HIST,
TRG.REQUEST_NK,
TRG.PAGE_TITLE,
TRG.PAGE_DESCR,
TRG.PAGE_KEYWORDS,
1 AS DELETE_FLAG,
STRFTIME('%s','now') AS UPDATE_DT
FROM (
SELECT
DIM_REQUEST_ID,
REQUEST_NK,
PAGE_TITLE,
PAGE_DESCR,
PAGE_KEYWORDS
FROM DIM_REQUEST_V_ACT
WHERE PAGE_TITLE != 'n.a.' -- track master data only
AND DELETE_FLAG = 0 -- not already deleted
) TRG
LEFT OUTER JOIN CLS ON TRG.REQUEST_NK = CLS.REQUEST_NK
WHERE CLS.REQUEST_NK IS NULL -- no such record in stage
Kull sors tad-dejta għandu jkun akkumpanjat minn deskrizzjoni formali, pereżempju, fil-fajl readme.txt:
Ir-riċevitur tad-dejta formalment/teknikament: isem, indirizz elettroniku
Fornitur tad-dejta formalment/teknikament: isem, indirizz elettroniku
Sors tad-dejta: mogħdija tal-fajl, ismijiet tas-servizz
Informazzjoni dwar l-aċċess tad-dejta: utenti u passwords
Id-dijagramma tal-fluss tad-dejta se tgħin fil-proċess ta 'manutenzjoni u aġġornament, pereżempju, f'forma ta' test:
Ċaqliq ta' fajl. Sors: ftp.domain.net: /logs/access.log Mira: /var/www/access.log
Qari fuq il-palk. Mira: STG_ACCESS_LOG
Tagħbija u trasformazzjoni. Mira: FCT_ACCESS_REQUEST_REF_HH
Tagħbija u trasformazzjoni. Mira: FCT_ACCESS_USER_AGENT_DD
Rapport. Mira: /var/www/report.html
Output
Għalhekk, l-artikolu jiddeskrivi mekkaniżmi bħall-integrazzjoni tad-dejta sottostanti u l-introduzzjoni ta 'livelli ġodda ta' aggregazzjoni. Huma meħtieġa meta jinbnew imħażen tad-dejta sabiex jinkiseb għarfien addizzjonali u tittejjeb il-kwalità tal-informazzjoni.
Sors: www.habr.com