Webalizer and Google Analytics have helped me understand what is happening on websites for many years. Now I have a better sense of how much useful information they actually provide. With access to your own access.log file, it is easy to work out the statistics yourself using only the most basic tools: sqlite, html, the sql language and any scripting language.
The data source for Webalizer is the server's access.log file. This is what its bars and numbers look like, from which the overall traffic volume can be read:
Tools such as Google Analytics collect data themselves from the loaded page. They show us a handful of charts and lines, from which it is often hard to draw correct conclusions. Perhaps more effort should have been put in? I don't know.
So, what did I want to see in the website visitor statistics?
User and bot traffic
Site traffic is often limited, so it is important to see how much of it is useful traffic. For example, like this:
SQL report query
SELECT
1 as 'StackedArea: Traffic generated by Users and Bots',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN USG.AGENT_BOT!='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Bots, KB',
SUM(CASE WHEN USG.AGENT_BOT='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Users, KB'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USG
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT
The graph shows constant bot activity. It would be interesting to study the most active representatives in detail.
Annoying bots
We classify bots based on the user agent information. Additional statistics on daily traffic and on the number of successful and unsuccessful requests give a good picture of bot activity.
SQL report query
SELECT
1 AS 'Table: Annoying Bots',
MAX(USG.AGENT_BOT) AS 'Bot',
ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day',
ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Client Error', 'Server Error') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Error Requests per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Successful', 'Redirection') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Success Requests per Day',
USG.USER_AGENT_NK AS 'Agent'
FROM FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USG,
DIM_HTTP_STATUS STS
WHERE FCT.DIM_USER_AGENT_ID = USG.DIM_USER_AGENT_ID
AND FCT.DIM_HTTP_STATUS_ID = STS.DIM_HTTP_STATUS_ID
AND USG.AGENT_BOT != 'n.a.'
AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY USG.USER_AGENT_NK
ORDER BY 3 DESC
LIMIT 10
In this case, the analysis led to the decision to restrict access to the site by adding the following to the robots.txt file.
User-agent: AhrefsBot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: bingbot
Crawl-delay: 5
The first two bots disappeared from the table, and the MS robots moved down from the top rows.
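As a quick sanity check of rules like the ones above, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how a compliant crawler would read the file; robots.txt is advisory, and Crawl-delay is a non-standard directive that only some crawlers honor.

```python
from urllib.robotparser import RobotFileParser

# The rules added to robots.txt in the article.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: bingbot
Crawl-delay: 5
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
for agent in ("AhrefsBot", "dotbot", "bingbot"):
    # can_fetch: may this agent request "/"; crawl_delay: seconds or None
    print(agent, rules.can_fetch(agent, "/"), rules.crawl_delay(agent))
```

Bots that ignore robots.txt will still show up in the log, so the report remains the real measure of whether the change worked.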
Day and time of peak activity
Spikes are visible in the traffic. To study them in detail, it is necessary to highlight the time at which they occurred; there is no need to display every hour and day of the measurement period. This makes it easier to find the individual requests in the log file if a detailed analysis is needed.
SQL report query
SELECT
1 AS 'Line: Day and Hour of Hits from Users and Bots',
strftime('%d.%m-%H', datetime(EVENT_DT, 'unixepoch')) AS 'Date Time',
HIB AS 'Bots, Hits',
HIU AS 'Users, Hits'
FROM (
SELECT
EVENT_DT,
SUM(CASE WHEN AGENT_BOT!='n.a.' THEN LINE_CNT ELSE 0 END) AS HIB,
SUM(CASE WHEN AGENT_BOT='n.a.' THEN LINE_CNT ELSE 0 END) AS HIU
FROM FCT_ACCESS_REQUEST_REF_HH
WHERE datetime(EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY EVENT_DT
ORDER BY SUM(LINE_CNT) DESC
LIMIT 10
) ORDER BY EVENT_DT
On the chart we see the most active hours, 11, 14 and 20, of the first day. But on the next day, at 13:00, it was the bots that were active.
Average daily user activity by week
We have sorted out activity and traffic a little. The next question was the activity of the users themselves. For such statistics, longer aggregation periods, such as a week, are preferable.
SQL report query
SELECT
1 as 'Line: Average Daily User Activity by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
ROUND(1.0*SUM(FCT.PAGE_CNT)/SUM(FCT.IP_CNT),1) AS 'Pages per IP per Day',
ROUND(1.0*SUM(FCT.FILE_CNT)/SUM(FCT.IP_CNT),1) AS 'Files per IP per Day'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_USER_AGENT USG,
DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
AND USG.AGENT_BOT='n.a.' /* users only */
AND HST.STATUS_GROUP IN ('Successful') /* good pages */
AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT
The weekly statistics show that, on average, one user opens 1.6 pages per day. The number of files requested per user in this case depends on new files being added to the site.
All requests and their statuses
Webalizer always showed specific pages, whereas I always wanted to see just the number of successful requests and errors.
SQL report query
SELECT
1 as 'Line: All Requests by Status',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN STS.STATUS_GROUP='Successful' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Success',
SUM(CASE WHEN STS.STATUS_GROUP='Redirection' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Redirect',
SUM(CASE WHEN STS.STATUS_GROUP='Client Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Client Error',
SUM(CASE WHEN STS.STATUS_GROUP='Server Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Server Error'
FROM
FCT_ACCESS_USER_AGENT_DD FCT,
DIM_HTTP_STATUS STS
WHERE FCT.DIM_HTTP_STATUS_ID=STS.DIM_HTTP_STATUS_ID
AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT
The report shows requests, not clicks (hits): unlike LINE_CNT, the REQUEST_CNT metric is calculated as COUNT(DISTINCT STG.REQUEST_NK). The goal is to show effective events. For example, MS bots poll the robots.txt file several times a day, and in this case such polls are counted once. This smooths out the jumps in the graph.
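The difference between the two metrics can be seen on a tiny in-memory example. This is a sketch only: the STG_DEMO table and its rows are made up for illustration and are not part of the article's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical mini staging table: one row per log line.
conn.execute("CREATE TABLE STG_DEMO (LINE_NK INTEGER, REQUEST_NK TEXT)")
conn.executemany(
    "INSERT INTO STG_DEMO VALUES (?, ?)",
    [
        (1, "/robots.txt"),
        (2, "/robots.txt"),
        (3, "/robots.txt"),
        (4, "/index"),
        (5, "/about"),
    ],
)

# LINE_CNT counts every log line, REQUEST_CNT counts each distinct request once.
line_cnt, request_cnt = conn.execute(
    "SELECT COUNT(DISTINCT LINE_NK), COUNT(DISTINCT REQUEST_NK) FROM STG_DEMO"
).fetchone()
print(line_cnt, request_cnt)
```

Three polls of /robots.txt add three hits but only one request. Note that in the article's aggregate the distinct count is taken per day, agent and status, and the report then sums those per-group values.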
The graph shows many errors: these are non-existent pages. The result of the analysis was the addition of redirects from removed pages.
Bad requests
To examine the requests more closely, you can display detailed statistics.
SQL report query
SELECT
1 AS 'Table: Top Error Requests',
REQ.REQUEST_NK AS 'Request',
'Error' AS 'Request Status',
ROUND(SUM(FCT.LINE_CNT) / 14.0, 1) AS 'Hits per Day',
ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day'
FROM
FCT_ACCESS_REQUEST_REF_HH FCT,
DIM_REQUEST_V_ACT REQ
WHERE FCT.DIM_REQUEST_ID = REQ.DIM_REQUEST_ID
AND FCT.STATUS_GROUP IN ('Client Error', 'Server Error')
AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY REQ.REQUEST_NK
ORDER BY 4 DESC
LIMIT 20
This list will also contain all the probing calls, for example, a request to /wp-login.php. By adjusting the server's request rewriting rules, you can change how the server reacts to such requests and send them to the start page.
So, a few simple reports based on the server log file give a fairly complete picture of what is happening on the site.
How do we get the information?
A sqlite database is sufficient. Let's create the tables: an auxiliary table for logging the ETL processes;
a stage table, into which we will write the log files using PHP; and two aggregate tables, a daily one with statistics by user agent and request status, and an hourly one with statistics by request, status group and agent; plus four tables for the corresponding dimensions.
The result is the following relational model:
Data model
Script to create an object in the sqlite database:
DDL object creation
DROP TABLE IF EXISTS DIM_USER_AGENT;
CREATE TABLE DIM_USER_AGENT (
DIM_USER_AGENT_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
USER_AGENT_NK TEXT NOT NULL DEFAULT 'n.a.',
AGENT_OS TEXT NOT NULL DEFAULT 'n.a.',
AGENT_ENGINE TEXT NOT NULL DEFAULT 'n.a.',
AGENT_DEVICE TEXT NOT NULL DEFAULT 'n.a.',
AGENT_BOT TEXT NOT NULL DEFAULT 'n.a.',
UPDATE_DT INTEGER NOT NULL DEFAULT 0,
UNIQUE (USER_AGENT_NK)
);
INSERT INTO DIM_USER_AGENT (DIM_USER_AGENT_ID) VALUES (-1);
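To see what the final INSERT produces, the DDL above can be run against an in-memory database. This is a small sketch using Python's sqlite3 module; reading the -1 row as a placeholder key for unknown agents is an interpretation, not something the article states.

```python
import sqlite3

# The DDL from the article, unchanged.
DDL = """
DROP TABLE IF EXISTS DIM_USER_AGENT;
CREATE TABLE DIM_USER_AGENT (
  DIM_USER_AGENT_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  USER_AGENT_NK TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_OS TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_ENGINE TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_DEVICE TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_BOT TEXT NOT NULL DEFAULT 'n.a.',
  UPDATE_DT INTEGER NOT NULL DEFAULT 0,
  UNIQUE (USER_AGENT_NK)
);
INSERT INTO DIM_USER_AGENT (DIM_USER_AGENT_ID) VALUES (-1);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Every column except the key falls back to its DEFAULT value.
default_row = conn.execute(
    "SELECT * FROM DIM_USER_AGENT WHERE DIM_USER_AGENT_ID = -1"
).fetchone()
print(default_row)
```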
Stage
In the case of the access.log file, it is necessary to read, parse and write all requests to the database. This can be done either directly with a scripting language or with sqlite tools.
Log file format:
//67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] "GET /files/default.css HTTP/1.1" 200 1512 "https://project.edu/" "Mozilla/4.0"
//host ident auth time method request_nk protocol status bytes ref browser
$log_pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9-]+) ([0-9-]+) "(.*)" "(.*)"$/';
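The article does the parsing in PHP; the sketch below applies the same pattern with Python's re module to the sample line above. The FIELDS names are taken from the comment line in the listing, and parse_line is a hypothetical helper name.

```python
import re

# Same field layout as the article's $log_pattern.
LOG_PATTERN = re.compile(
    r'^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" '
    r'([0-9-]+) ([0-9-]+) "(.*)" "(.*)"$'
)
FIELDS = ("host", "ident", "auth", "time", "method", "request_nk",
          "protocol", "status", "bytes", "ref", "browser")


def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    return dict(zip(FIELDS, match.groups())) if match else None


sample = ('67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] '
          '"GET /files/default.css HTTP/1.1" 200 1512 '
          '"https://project.edu/" "Mozilla/4.0"')
record = parse_line(sample)
print(record["time"], record["request_nk"], record["status"])
```

Lines that do not match (for example, malformed requests) return None, so a loader can count or log them separately rather than failing.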
Key propagation
Once the raw data is in the database, you need to write the keys that are not yet present into the dimension tables. After that it becomes possible to build references to the dimensions. For example, in the DIM_REFERRER table the key is a combination of three fields.
SQL key propagation query
/* Propagate the referrer from access log */
INSERT INTO DIM_REFERRER (HOST_NK, PATH_NK, QUERY_NK, UPDATE_DT)
SELECT
CLS.HOST_NK,
CLS.PATH_NK,
CLS.QUERY_NK,
STRFTIME('%s','now') AS UPDATE_DT
FROM (
SELECT DISTINCT
REFERRER_HOST AS HOST_NK,
REFERRER_PATH AS PATH_NK,
CASE WHEN INSTR(REFERRER_QUERY,'&sid')>0 THEN SUBSTR(REFERRER_QUERY, 1, INSTR(REFERRER_QUERY,'&sid')-1) /* cut off sid (SMF-specific) */
ELSE REFERRER_QUERY END AS QUERY_NK
FROM STG_ACCESS_LOG
) CLS
LEFT OUTER JOIN DIM_REFERRER TRG
ON (CLS.HOST_NK = TRG.HOST_NK AND CLS.PATH_NK = TRG.PATH_NK AND CLS.QUERY_NK = TRG.QUERY_NK)
WHERE TRG.DIM_REFERRER_ID IS NULL
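The query above can be tried on a toy dataset to confirm that it only inserts missing keys and is safe to rerun. The table definitions here are simplified assumptions (the article does not list the DDL for DIM_REFERRER or STG_ACCESS_LOG), and the hosts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed table layouts for the demo.
conn.executescript("""
CREATE TABLE DIM_REFERRER (
  DIM_REFERRER_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  HOST_NK TEXT NOT NULL, PATH_NK TEXT NOT NULL, QUERY_NK TEXT NOT NULL,
  UPDATE_DT INTEGER NOT NULL DEFAULT 0,
  UNIQUE (HOST_NK, PATH_NK, QUERY_NK)
);
CREATE TABLE STG_ACCESS_LOG (
  REFERRER_HOST TEXT, REFERRER_PATH TEXT, REFERRER_QUERY TEXT
);
INSERT INTO STG_ACCESS_LOG VALUES
  ('project.edu', '/', ''),
  ('project.edu', '/', ''),
  ('forum.example', '/index.php', 'topic=1&sid=abc'),
  ('forum.example', '/index.php', 'topic=1&sid=def');
""")

# The article's propagation query: insert only keys missing from the dimension.
PROPAGATE_SQL = """
INSERT INTO DIM_REFERRER (HOST_NK, PATH_NK, QUERY_NK, UPDATE_DT)
SELECT CLS.HOST_NK, CLS.PATH_NK, CLS.QUERY_NK, STRFTIME('%s','now')
FROM (
  SELECT DISTINCT
    REFERRER_HOST AS HOST_NK,
    REFERRER_PATH AS PATH_NK,
    CASE WHEN INSTR(REFERRER_QUERY,'&sid')>0
         THEN SUBSTR(REFERRER_QUERY, 1, INSTR(REFERRER_QUERY,'&sid')-1)
         ELSE REFERRER_QUERY END AS QUERY_NK
  FROM STG_ACCESS_LOG
) CLS
LEFT OUTER JOIN DIM_REFERRER TRG
  ON (CLS.HOST_NK = TRG.HOST_NK AND CLS.PATH_NK = TRG.PATH_NK
      AND CLS.QUERY_NK = TRG.QUERY_NK)
WHERE TRG.DIM_REFERRER_ID IS NULL
"""


def propagate(conn: sqlite3.Connection) -> int:
    """Run the propagation and return the dimension row count afterwards."""
    conn.execute(PROPAGATE_SQL)
    return conn.execute("SELECT COUNT(*) FROM DIM_REFERRER").fetchone()[0]


count_after_first = propagate(conn)
count_after_second = propagate(conn)  # rerun: nothing new to insert
keys = set(conn.execute("SELECT HOST_NK, PATH_NK, QUERY_NK FROM DIM_REFERRER"))
print(count_after_first, count_after_second, sorted(keys))
```

One caveat worth keeping in mind: the join compares with =, so a NULL in any key field would never match and would be inserted again on each run. The demo avoids NULLs for that reason.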
Propagation into the user agent table can include the bot detection logic, for example this sql snippet:
CASE
WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
THEN 'yandex'
WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
THEN 'google'
WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
THEN 'microsoft'
WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
THEN 'ahrefs'
WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
THEN 'majestic-12'
WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0
OR INSTR(LOWER(CLS.BROWSER),'wget')>0
THEN 'other'
ELSE 'n.a.' END AS AGENT_BOT
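For experimenting with the classification rules outside the database, the CASE expression can be mirrored in a small Python function. This is a sketch: classify_bot is a hypothetical name, and the sample user agent strings in the usage lines are illustrative.

```python
def classify_bot(user_agent: str) -> str:
    """Mirror of the CASE expression: first matching rule wins."""
    ua = user_agent.lower()
    named_bots = (
        ("yandex.com", "yandex"),
        ("googlebot", "google"),
        ("bingbot", "microsoft"),
        ("ahrefsbot", "ahrefs"),
        ("mj12bot", "majestic-12"),
    )
    for needle, label in named_bots:
        if needle in ua:
            return label
    # Generic markers that usually indicate an automated client.
    generic = ("compatible", "http", "libwww", "spider", "java",
               "python", "robot", "curl", "wget")
    if any(needle in ua for needle in generic):
        return "other"
    return "n.a."  # treated as a regular user


print(classify_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(classify_bot("curl/7.68.0"))
print(classify_bot("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

The last line shows a limitation of the generic rules: older Internet Explorer user agents contain the word "compatible" and are therefore classified as 'other' rather than as users.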
Aggregate tables
Finally, we load the aggregate tables; for example, the daily table can be loaded as follows:
SQL query for loading the aggregate
/* Load fact from access log */
INSERT INTO FCT_ACCESS_USER_AGENT_DD (EVENT_DT, DIM_USER_AGENT_ID, DIM_HTTP_STATUS_ID, PAGE_CNT, FILE_CNT, REQUEST_CNT, LINE_CNT, IP_CNT, BYTES)
WITH STG AS (
SELECT
STRFTIME( '%s', SUBSTR(TIME_NK,9,4) || '-' ||
CASE SUBSTR(TIME_NK,5,3)
WHEN 'Jan' THEN '01' WHEN 'Feb' THEN '02' WHEN 'Mar' THEN '03' WHEN 'Apr' THEN '04' WHEN 'May' THEN '05' WHEN 'Jun' THEN '06'
WHEN 'Jul' THEN '07' WHEN 'Aug' THEN '08' WHEN 'Sep' THEN '09' WHEN 'Oct' THEN '10' WHEN 'Nov' THEN '11'
ELSE '12' END || '-' || SUBSTR(TIME_NK,2,2) || ' 00:00:00' ) AS EVENT_DT,
BROWSER AS USER_AGENT_NK,
REQUEST_NK,
IP_NR,
STATUS,
LINE_NK,
BYTES
FROM STG_ACCESS_LOG
)
SELECT
CAST(STG.EVENT_DT AS INTEGER) AS EVENT_DT,
USG.DIM_USER_AGENT_ID,
HST.DIM_HTTP_STATUS_ID,
COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')=0 THEN STG.REQUEST_NK END) ) AS PAGE_CNT,
COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')>0 THEN STG.REQUEST_NK END) ) AS FILE_CNT,
COUNT(DISTINCT STG.REQUEST_NK) AS REQUEST_CNT,
COUNT(DISTINCT STG.LINE_NK) AS LINE_CNT,
COUNT(DISTINCT STG.IP_NR) AS IP_CNT,
SUM(BYTES) AS BYTES
FROM STG,
DIM_HTTP_STATUS HST,
DIM_USER_AGENT USG
WHERE STG.STATUS = HST.STATUS_NK
AND STG.USER_AGENT_NK = USG.USER_AGENT_NK
AND CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from /* load epoch date */
AND CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
GROUP BY STG.EVENT_DT, HST.DIM_HTTP_STATUS_ID, USG.DIM_USER_AGENT_ID
The sqlite database lets you write complex queries. The WITH clause contains the preparation of the data and keys. The main query collects all the references to the dimensions.
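The date handling inside the WITH clause is the least obvious part, so here is a sketch that runs just that expression on the sample timestamp from the log format section. The day_epoch helper and the :t parameter are illustrative names.

```python
import sqlite3
from datetime import datetime, timezone

# The EVENT_DT expression from the WITH clause, with TIME_NK passed as :t.
DAY_EPOCH_SQL = """
SELECT CAST(STRFTIME('%s', SUBSTR(:t,9,4) || '-' ||
  CASE SUBSTR(:t,5,3)
    WHEN 'Jan' THEN '01' WHEN 'Feb' THEN '02' WHEN 'Mar' THEN '03'
    WHEN 'Apr' THEN '04' WHEN 'May' THEN '05' WHEN 'Jun' THEN '06'
    WHEN 'Jul' THEN '07' WHEN 'Aug' THEN '08' WHEN 'Sep' THEN '09'
    WHEN 'Oct' THEN '10' WHEN 'Nov' THEN '11'
    ELSE '12' END || '-' || SUBSTR(:t,2,2) || ' 00:00:00') AS INTEGER)
"""

conn = sqlite3.connect(":memory:")


def day_epoch(time_nk: str) -> int:
    """Truncate a log timestamp like '[28/Dec/2012:01:47:47 +0100]' to its day."""
    return conn.execute(DAY_EPOCH_SQL, {"t": time_nk}).fetchone()[0]


epoch = day_epoch("[28/Dec/2012:01:47:47 +0100]")
expected = int(datetime(2012, 12, 28, tzinfo=timezone.utc).timestamp())
print(epoch, expected)
```

The expression keeps the calendar date exactly as logged and ignores the +0100 offset, so days are grouped by the server's local date while the stored epoch is computed as if that date were UTC.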
This condition prevents history from being loaded again: CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from, where the parameter is the result of the query
'SELECT COALESCE(MAX(EVENT_DT), '3600') AS LAST_EVENT_EPOCH FROM FCT_ACCESS_USER_AGENT_DD'
This condition loads only complete days: CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
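The two conditions together define a load window: strictly after the last loaded day and strictly before the start of today. The sketch below illustrates this with a hypothetical FCT_DEMO table and a fixed "start of today" value so that the result does not depend on the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the fact table; only EVENT_DT matters here.
conn.execute("CREATE TABLE FCT_DEMO (EVENT_DT INTEGER NOT NULL)")


def last_loaded_epoch(conn: sqlite3.Connection) -> int:
    """Same idea as the article's parameter query: 3600 when the table is empty."""
    row = conn.execute(
        "SELECT COALESCE(MAX(EVENT_DT), '3600') FROM FCT_DEMO"
    ).fetchone()
    return int(row[0])


def days_to_load(staged_days, epoch_from, today_start):
    """Keep days after the last load and before today (complete days only)."""
    return [day for day in staged_days if epoch_from < day < today_start]


DAY = 86400
today_start = 20000 * DAY  # assumed fixed value instead of date('now', ...)
staged = [today_start - 2 * DAY, today_start - DAY, today_start]

first_run = days_to_load(staged, last_loaded_epoch(conn), today_start)
conn.executemany("INSERT INTO FCT_DEMO VALUES (?)", [(d,) for d in first_run])
second_run = days_to_load(staged, last_loaded_epoch(conn), today_start)
print(first_run, second_run)
```

The first run picks up the two complete days and skips today; the second run finds nothing new, which is what makes the load safe to repeat.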
Counting pages versus files is done in a primitive way, by searching for a dot in the request.
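The same rule as INSTR(STG.REQUEST_NK,'.')=0 can be written as a one-line Python check, which makes its edge cases easy to try. The is_page name and the sample requests are illustrative.

```python
def is_page(request_nk: str) -> bool:
    """A request counts as a page when it contains no dot, as in the SQL."""
    return "." not in request_nk


for request in ("/about", "/files/default.css", "/search?q=v1.2"):
    print(request, "page" if is_page(request) else "file")
```

The third request shows why the approach is called primitive: a dot anywhere in the query string is enough to count the request as a file.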
Reports
In complex visualization systems it is possible to build a meta-model based on the database objects and to manage filters and aggregation rules dynamically. In the end, all decent tools generate an SQL query.
In this example, we will create ready-made SQL queries and save them as views in the database: these are the reports.
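A minimal sketch of the "report as a view" idea, using Python's sqlite3 module. FCT_DEMO and RPT_DEMO_USER_VS_BOT are made-up names with made-up rows; the real reports are the queries shown earlier in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, already aggregated demo data.
CREATE TABLE FCT_DEMO (DAY TEXT, AGENT_BOT TEXT, BYTES INTEGER);
INSERT INTO FCT_DEMO VALUES
  ('01.01', 'n.a.', 4000),
  ('01.01', 'google', 6000),
  ('02.01', 'n.a.', 3000);

-- The report is stored once as a view and queried by name afterwards.
CREATE VIEW RPT_DEMO_USER_VS_BOT AS
SELECT
  DAY AS 'Day',
  SUM(CASE WHEN AGENT_BOT!='n.a.' THEN BYTES ELSE 0 END)/1000 AS 'Bots, KB',
  SUM(CASE WHEN AGENT_BOT='n.a.' THEN BYTES ELSE 0 END)/1000 AS 'Users, KB'
FROM FCT_DEMO
GROUP BY DAY
ORDER BY DAY;
""")

rows = conn.execute("SELECT * FROM RPT_DEMO_USER_VS_BOT").fetchall()
print(rows)
```

Keeping the SQL in the database means the rendering code only needs a list of view names, which matches the SELECT * FROM RPT_... pattern used in the visualization step.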
Visualization
Bluff: Beautiful graphs in JavaScript was used as the visualization tool.
To do this, it was necessary to go through all the reports with PHP and generate an html file with tables.
$sqls = array(
'SELECT * FROM RPT_ACCESS_USER_VS_BOT',
'SELECT * FROM RPT_ACCESS_ANNOYING_BOT',
'SELECT * FROM RPT_ACCESS_TOP_HOUR_HIT',
'SELECT * FROM RPT_ACCESS_USER_ACTIVE',
'SELECT * FROM RPT_ACCESS_REQUEST_STATUS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_PAGE',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_REFERRER',
'SELECT * FROM RPT_ACCESS_NEW_REQUEST',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_SUCCESS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_ERROR'
);
The tool simply visualizes the result tables.
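The article's generator is written in PHP and is not shown; as an illustration of the same step, here is a sketch in Python that turns the result of one report query into an html table. render_table and RPT_DEMO are hypothetical names, and the handling of the chart-type marker in the first report column is left out.

```python
import sqlite3
from html import escape


def render_table(conn: sqlite3.Connection, sql: str) -> str:
    """Run a report query and return its result as an HTML table."""
    cur = conn.execute(sql)
    headers = [col[0] for col in cur.description]
    parts = [
        "<table>",
        "<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>",
    ]
    for row in cur:
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "\n".join(parts)


conn = sqlite3.connect(":memory:")
# Hypothetical report table standing in for one of the RPT_ views.
conn.execute("CREATE TABLE RPT_DEMO (Request TEXT, Hits INTEGER)")
conn.execute("INSERT INTO RPT_DEMO VALUES ('/wp-login.php?a=<b>', 7)")

html_table = render_table(conn, "SELECT * FROM RPT_DEMO")
print(html_table)
```

Escaping matters here because request strings come straight from the access log and may contain markup sent by probing clients.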
Conclusion
Using web analytics as an example, the article describes the mechanisms needed to build a data warehouse. As the results show, the simplest tools are sufficient for in-depth analysis and visualization of data.
In the future, using this warehouse as an example, we will try to implement structures such as slowly changing dimensions, metadata, aggregation levels and the integration of data from different sources.
We will also take a closer look at the simplest tool for managing ETL processes based on a single table.
We will return to the topic of measuring data quality and automating that process.
We will study the problems of the technical environment and the maintenance of data warehouses, for which we will set up a storage server with minimal resources, for example, based on a Raspberry Pi.
Source: www.habr.com