Site statistics and your own small data warehouse

For many years Webalizer and Google Analytics helped me understand what was happening on my websites. Now I realize that they provide very little useful information. If you have access to your own access.log file, it is easy to work out the statistics yourself with fairly basic tools: sqlite, html, the sql language and any scripting language.

The data source for Webalizer is the server's access.log file. This is what its bars and numbers look like; the only thing they make clear is the total volume of traffic:

Tools such as Google Analytics collect their data from the loaded page itself. They show us a couple of diagrams and lines from which it is often hard to draw correct conclusions. Maybe I should have put in more effort? I don't know.

So, what did I want to see in the website's visitor statistics?

User and bot traffic

Site traffic is often limited, so it is worth seeing how much of it is actually useful. For example, like this:

SQL report query

SELECT
1 as 'StackedArea: Traffic generated by Users and Bots',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN USG.AGENT_BOT!='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Bots, KB',
SUM(CASE WHEN USG.AGENT_BOT='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Users, KB'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The graph shows constant bot activity. It would be interesting to take a closer look at the most active ones.

Annoying bots

We classify bots based on the user agent information. Additional statistics on daily traffic and on the number of successful and unsuccessful requests give a good picture of bot activity.

SQL report query

SELECT 
1 AS 'Table: Annoying Bots',
MAX(USG.AGENT_BOT) AS 'Bot',
ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day',
ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Client Error', 'Server Error') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Error Requests per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Successful', 'Redirection') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Success Requests per Day',
USG.USER_AGENT_NK AS 'Agent'
FROM FCT_ACCESS_USER_AGENT_DD FCT,
     DIM_USER_AGENT USG,
     DIM_HTTP_STATUS STS
WHERE FCT.DIM_USER_AGENT_ID = USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = STS.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT != 'n.a.'
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY USG.USER_AGENT_NK
ORDER BY 3 DESC
LIMIT 10

In this case, the analysis led to a decision to restrict access to the site by adding rules to the robots.txt file.

User-agent: AhrefsBot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: bingbot
Crawl-delay: 5

The first two bots disappeared from the table, and the MS robots moved down from the top rows.

Day and time of greatest activity

Spikes are visible in the traffic. To study them in detail, we need to pin down when they occurred; there is no need to display every hour and every day of the measurement period. This makes it easier to find the individual requests in the log file if a detailed analysis is needed.

SQL report query

SELECT
1 AS 'Line: Day and Hour of Hits from Users and Bots',
strftime('%d.%m-%H', datetime(EVENT_DT, 'unixepoch')) AS 'Date Time',
HIB AS 'Bots, Hits',
HIU AS 'Users, Hits'
FROM (
	SELECT
	EVENT_DT,
	SUM(CASE WHEN AGENT_BOT!='n.a.' THEN LINE_CNT ELSE 0 END) AS HIB,
	SUM(CASE WHEN AGENT_BOT='n.a.' THEN LINE_CNT ELSE 0 END) AS HIU
	FROM FCT_ACCESS_REQUEST_REF_HH
	WHERE datetime(EVENT_DT, 'unixepoch') >= date('now', '-14 day')
	GROUP BY EVENT_DT
	ORDER BY SUM(LINE_CNT) DESC
	LIMIT 10
) ORDER BY EVENT_DT

On the chart, the most active hours of the first day are 11, 14 and 20. On the following day, however, bots were active at 13:00.

Average daily user activity by week

That sorts out activity and traffic to some extent. The next question was the activity of the users themselves. For statistics like these, longer aggregation periods, such as a week, are preferable.

SQL report query

SELECT
1 as 'Line: Average Daily User Activity by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
ROUND(1.0*SUM(FCT.PAGE_CNT)/SUM(FCT.IP_CNT),1) AS 'Pages per IP per Day',
ROUND(1.0*SUM(FCT.FILE_CNT)/SUM(FCT.IP_CNT),1) AS 'Files per IP per Day'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG,
  DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT='n.a.' /* users only */
  AND HST.STATUS_GROUP IN ('Successful') /* good pages */
  AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The weekly statistics show that, on average, one user opens 1.6 pages per day. The number of files requested per user in this case depends on new files being added to the site.

All requests and their statuses

Webalizer always showed the specific status codes of pages, whereas I only ever wanted to see the number of successful requests and the number of errors.

SQL report query

SELECT
1 as 'Line: All Requests by Status',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN STS.STATUS_GROUP='Successful' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Success',
SUM(CASE WHEN STS.STATUS_GROUP='Redirection' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Redirect',
SUM(CASE WHEN STS.STATUS_GROUP='Client Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Client Error',
SUM(CASE WHEN STS.STATUS_GROUP='Server Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Server Error'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_HTTP_STATUS STS
WHERE FCT.DIM_HTTP_STATUS_ID=STS.DIM_HTTP_STATUS_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The report displays requests, not hits: unlike LINE_CNT, the REQUEST_CNT metric is calculated as COUNT(DISTINCT STG.REQUEST_NK). The goal is to show meaningful events. For example, MS bots poll the robots.txt file hundreds of times a day, and with this metric such polls are counted only once. This smooths out the jumps in the graph.
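The difference between the two metrics can be illustrated with a minimal sketch. Python's built-in sqlite3 module is used here only as a convenient harness; the cut-down STG_ACCESS_LOG table with just two columns and the sample rows are assumptions made for the illustration, not the article's actual staging schema.

```python
import sqlite3

# Hypothetical, cut-down staging table: only the two columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STG_ACCESS_LOG (LINE_NK INTEGER, REQUEST_NK TEXT)")

rows = [(i, "/robots.txt") for i in range(1, 101)]  # a bot polls one file 100 times
rows.append((101, "/index"))                         # one ordinary page request
conn.executemany("INSERT INTO STG_ACCESS_LOG VALUES (?, ?)", rows)

# LINE_CNT counts every log line (hits); REQUEST_CNT counts distinct requests.
line_cnt, request_cnt = conn.execute(
    "SELECT COUNT(DISTINCT LINE_NK), COUNT(DISTINCT REQUEST_NK) FROM STG_ACCESS_LOG"
).fetchone()
print(line_cnt, request_cnt)
```

With this sample data the hit count is dominated by the repeated robots.txt polls, while the distinct-request count treats them as a single event.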

The graph shows a lot of errors: these are non-existent pages. As a result of the analysis, redirects were added from the deleted pages.

Bad requests

To examine the requests more closely, you can display detailed statistics.

SQL report query

SELECT
  1 AS 'Table: Top Error Requests',
  REQ.REQUEST_NK AS 'Request',
  'Error' AS 'Request Status',
  ROUND(SUM(FCT.LINE_CNT) / 14.0, 1) AS 'Hits per Day',
  ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
  ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day'
FROM
  FCT_ACCESS_REQUEST_REF_HH FCT,
  DIM_REQUEST_V_ACT REQ
WHERE FCT.DIM_REQUEST_ID = REQ.DIM_REQUEST_ID
  AND FCT.STATUS_GROUP IN ('Client Error', 'Server Error')
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY REQ.REQUEST_NK
ORDER BY 4 DESC
LIMIT 20

This list will also contain all the probing calls, for example requests to /wp-login.php. By adjusting the server's request rewrite rules, you can change how the server reacts to such requests and send them to the start page.

So, a few simple reports based on the server log file give a fairly complete picture of what is happening on the site.

How do we get the information?

A sqlite database is quite sufficient. Let's create the tables: first, an auxiliary one for logging the ETL processes.

A stage table, into which we will write the log files using PHP. Two aggregate tables: a daily one with statistics by user agent and request status, and an hourly one with statistics by request, status group and agent. Four tables for the corresponding dimensions.

The result is the following relational model:

Data model

Script for creating the objects in the sqlite database:

DDL object creation

DROP TABLE IF EXISTS DIM_USER_AGENT;
CREATE TABLE DIM_USER_AGENT (
  DIM_USER_AGENT_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  USER_AGENT_NK     TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_OS          TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_ENGINE      TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_DEVICE      TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_BOT         TEXT NOT NULL DEFAULT 'n.a.',
  UPDATE_DT         INTEGER NOT NULL DEFAULT 0,
  UNIQUE (USER_AGENT_NK)
);
INSERT INTO DIM_USER_AGENT (DIM_USER_AGENT_ID) VALUES (-1);

Stage

The access.log file has to be read, parsed and all of its requests written to the database. This can be done either directly with a scripting language or with sqlite tools.

Log file format:

//67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] "GET /files/default.css HTTP/1.1" 200 1512 "https://project.edu/" "Mozilla/4.0"
//host ident auth time method request_nk protocol status bytes ref browser
$log_pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';
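To check that the pattern splits the sample line into the eleven fields named in the comment above, here is a minimal sketch that applies the same regular expression with Python's re module (Python is used only as a test harness; the article itself parses the log in PHP). The parse_line helper and the FIELDS tuple are names introduced for this illustration.

```python
import re

# Same expression as $log_pattern, without the PHP '/.../' delimiters.
LOG_PATTERN = re.compile(
    r'^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" '
    r'([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$'
)

# Field names taken from the comment line in the log format listing.
FIELDS = ("host", "ident", "auth", "time", "method", "request_nk",
          "protocol", "status", "bytes", "ref", "browser")


def parse_line(line):
    """Return a dict of the eleven fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    return dict(zip(FIELDS, match.groups())) if match else None


sample = ('67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] '
          '"GET /files/default.css HTTP/1.1" 200 1512 '
          '"https://project.edu/" "Mozilla/4.0"')
record = parse_line(sample)
print(record["request_nk"], record["status"], record["bytes"])
```

Lines that do not match return None, so a loader can count or log them instead of silently writing broken rows to the stage table.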

Key propagation

Once the raw data is in the database, the missing keys have to be written to the dimension tables. After that it becomes possible to build references to the dimensions. For example, in the DIM_REFERRER table the key is a combination of three fields.

SQL key propagation query

/* Propagate the referrer from access log */
INSERT INTO DIM_REFERRER (HOST_NK, PATH_NK, QUERY_NK, UPDATE_DT)
SELECT
	CLS.HOST_NK,
	CLS.PATH_NK,
	CLS.QUERY_NK,
	STRFTIME('%s','now') AS UPDATE_DT
FROM (
	SELECT DISTINCT
	REFERRER_HOST AS HOST_NK,
	REFERRER_PATH AS PATH_NK,
	CASE WHEN INSTR(REFERRER_QUERY,'&sid')>0 THEN SUBSTR(REFERRER_QUERY, 1, INSTR(REFERRER_QUERY,'&sid')-1) /* cut off sid, which is CMS-specific */
	ELSE REFERRER_QUERY END AS QUERY_NK
	FROM STG_ACCESS_LOG
) CLS
LEFT OUTER JOIN DIM_REFERRER TRG
ON (CLS.HOST_NK = TRG.HOST_NK AND CLS.PATH_NK = TRG.PATH_NK AND CLS.QUERY_NK = TRG.QUERY_NK)
WHERE TRG.DIM_REFERRER_ID IS NULL
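The LEFT OUTER JOIN with the IS NULL filter is what makes the propagation safe to repeat: only key combinations that are not yet in the dimension get inserted. Below is a minimal sketch of that behaviour, run through Python's sqlite3 module. The DIM_REFERRER and STG_ACCESS_LOG definitions are simplified assumptions (the article does not show them), and the sid trimming is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed table definitions for the illustration.
conn.executescript("""
CREATE TABLE DIM_REFERRER (
  DIM_REFERRER_ID INTEGER PRIMARY KEY AUTOINCREMENT,
  HOST_NK TEXT NOT NULL, PATH_NK TEXT NOT NULL, QUERY_NK TEXT NOT NULL,
  UPDATE_DT INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE STG_ACCESS_LOG (
  REFERRER_HOST TEXT, REFERRER_PATH TEXT, REFERRER_QUERY TEXT
);
INSERT INTO STG_ACCESS_LOG VALUES
  ('project.edu', '/', ''),
  ('project.edu', '/', ''),
  ('example.org', '/links', 'page=2');
""")

PROPAGATE = """
INSERT INTO DIM_REFERRER (HOST_NK, PATH_NK, QUERY_NK, UPDATE_DT)
SELECT CLS.HOST_NK, CLS.PATH_NK, CLS.QUERY_NK, STRFTIME('%s','now')
FROM (
  SELECT DISTINCT REFERRER_HOST AS HOST_NK, REFERRER_PATH AS PATH_NK,
         REFERRER_QUERY AS QUERY_NK
  FROM STG_ACCESS_LOG
) CLS
LEFT OUTER JOIN DIM_REFERRER TRG
  ON (CLS.HOST_NK = TRG.HOST_NK AND CLS.PATH_NK = TRG.PATH_NK
      AND CLS.QUERY_NK = TRG.QUERY_NK)
WHERE TRG.DIM_REFERRER_ID IS NULL
"""


def propagate():
    """Run the propagation once and return how many new keys were added."""
    count = "SELECT COUNT(*) FROM DIM_REFERRER"
    before = conn.execute(count).fetchone()[0]
    conn.execute(PROPAGATE)
    return conn.execute(count).fetchone()[0] - before


first = propagate()   # inserts the two distinct referrers
second = propagate()  # finds nothing new
print(first, second)
```

One caveat of this join: rows where any of the three fields is NULL never compare equal, so in this sketch missing values are stored as empty strings.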

Propagation to the user agent table can include the bot detection logic, for example this sql snippet:


CASE
WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
	THEN 'yandex'
WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
	THEN 'google'
WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
	THEN 'microsoft'
WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
	THEN 'ahrefs'
WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
	THEN 'majestic-12'
WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
	OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
	OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
	OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0
	OR INSTR(LOWER(CLS.BROWSER),'wget')>0
	THEN 'other'
ELSE 'n.a.' END AS AGENT_BOT
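Rules like these are easy to try out on sample user agent strings before putting them into the load. The following sketch mirrors the CASE expression in Python, purely as a testing aid; classify_agent and the two marker tuples are names introduced here, and the rule order follows the snippet above.

```python
# Named bots are checked first, in the same order as the CASE expression.
NAMED_BOTS = (
    ("yandex.com", "yandex"),
    ("googlebot", "google"),
    ("bingbot", "microsoft"),
    ("ahrefsbot", "ahrefs"),
    ("mj12bot", "majestic-12"),
)
# Generic markers that put an agent into the catch-all 'other' bot group.
GENERIC_MARKERS = ("compatible", "http", "libwww", "spider", "java",
                   "python", "robot", "curl", "wget")


def classify_agent(browser):
    """Return the bot name, 'other' for generic bots, or 'n.a.' for users."""
    agent = browser.lower()
    for marker, name in NAMED_BOTS:
        if marker in agent:
            return name
    if any(marker in agent for marker in GENERIC_MARKERS):
        return "other"
    return "n.a."


bing = classify_agent(
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)")
tool = classify_agent("curl/7.68.0")
user = classify_agent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")
print(bing, tool, user)
```

Note that the 'compatible' marker also matches old Internet Explorer user agents, so the generic rule can classify some real visitors as bots; whether that matters depends on the site's audience.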

Aggregate tables

Finally, we load the aggregate tables; for example, the daily table can be loaded as follows:

SQL query for loading the aggregate

/* Load fact from access log */
INSERT INTO FCT_ACCESS_USER_AGENT_DD (EVENT_DT, DIM_USER_AGENT_ID, DIM_HTTP_STATUS_ID, PAGE_CNT, FILE_CNT, REQUEST_CNT, LINE_CNT, IP_CNT, BYTES)
WITH STG AS (
SELECT
	STRFTIME( '%s', SUBSTR(TIME_NK,9,4) || '-' ||
	CASE SUBSTR(TIME_NK,5,3)
	WHEN 'Jan' THEN '01' WHEN 'Feb' THEN '02' WHEN 'Mar' THEN '03' WHEN 'Apr' THEN '04' WHEN 'May' THEN '05' WHEN 'Jun' THEN '06'
	WHEN 'Jul' THEN '07' WHEN 'Aug' THEN '08' WHEN 'Sep' THEN '09' WHEN 'Oct' THEN '10' WHEN 'Nov' THEN '11'
	ELSE '12' END || '-' || SUBSTR(TIME_NK,2,2) || ' 00:00:00' ) AS EVENT_DT,
	BROWSER AS USER_AGENT_NK,
	REQUEST_NK,
	IP_NR,
	STATUS,
	LINE_NK,
	BYTES
FROM STG_ACCESS_LOG
)
SELECT
	CAST(STG.EVENT_DT AS INTEGER) AS EVENT_DT,
	USG.DIM_USER_AGENT_ID,
	HST.DIM_HTTP_STATUS_ID,
	COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')=0 THEN STG.REQUEST_NK END) ) AS PAGE_CNT,
	COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')>0 THEN STG.REQUEST_NK END) ) AS FILE_CNT,
	COUNT(DISTINCT STG.REQUEST_NK) AS REQUEST_CNT,
	COUNT(DISTINCT STG.LINE_NK) AS LINE_CNT,
	COUNT(DISTINCT STG.IP_NR) AS IP_CNT,
	SUM(BYTES) AS BYTES
FROM STG,
	DIM_HTTP_STATUS HST,
	DIM_USER_AGENT USG
WHERE STG.STATUS = HST.STATUS_NK
  AND STG.USER_AGENT_NK = USG.USER_AGENT_NK
  AND CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from /* load epoch date */
  AND CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
GROUP BY STG.EVENT_DT, HST.DIM_HTTP_STATUS_ID, USG.DIM_USER_AGENT_ID

The sqlite database lets you write complex queries. The WITH clause prepares the data and the keys. The main query collects all the references to the dimensions.
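The date arithmetic in the WITH clause is the least obvious part: it cuts the day, month name and year out of TIME_NK by position and converts them to the epoch of that day's midnight. A minimal Python sketch of the same calculation, under the assumption that TIME_NK keeps its square brackets as in the sample log line; day_epoch is a name introduced here.

```python
from datetime import datetime, timezone

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def day_epoch(time_nk):
    """Epoch seconds of 00:00:00 on the day in '[28/Dec/2012:01:47:47 +0100]'.

    Like the SQL, this ignores the UTC offset in the log entry and treats
    the calendar date as UTC.
    """
    day = int(time_nk[1:3])        # SUBSTR(TIME_NK,2,2)
    month = MONTHS[time_nk[4:7]]   # SUBSTR(TIME_NK,5,3)
    year = int(time_nk[8:12])      # SUBSTR(TIME_NK,9,4)
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


event_dt = day_epoch("[28/Dec/2012:01:47:47 +0100]")
print(event_dt)
```

One difference from the SQL: the CASE expression falls back to '12' for any unrecognized month name, whereas this sketch raises a KeyError, which makes malformed lines visible.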

This condition prevents the history from being loaded a second time: CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from, where the parameter is the result of the query
'SELECT COALESCE(MAX(EVENT_DT), '3600') AS LAST_EVENT_EPOCH FROM FCT_ACCESS_USER_AGENT_DD'

This condition loads only complete days: CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
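The lower bound acts as a simple high-water mark. A minimal sketch of how it behaves, again using Python's sqlite3 module as a harness: the one-column fact table and the list of staged days are assumptions for the illustration, while the watermark query is the one quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down fact table: only the date column matters here.
conn.execute("CREATE TABLE FCT_ACCESS_USER_AGENT_DD (EVENT_DT INTEGER NOT NULL)")

WATERMARK = ("SELECT COALESCE(MAX(EVENT_DT), '3600') AS LAST_EVENT_EPOCH "
             "FROM FCT_ACCESS_USER_AGENT_DD")


def last_event_epoch():
    """Epoch of the last loaded day, or 3600 when nothing is loaded yet."""
    return int(conn.execute(WATERMARK).fetchone()[0])


empty = last_event_epoch()  # empty table: the COALESCE default applies

conn.execute("INSERT INTO FCT_ACCESS_USER_AGENT_DD VALUES (1356652800)")
loaded = last_event_epoch()

# Days present in the stage; only those after the watermark are loaded.
staged_days = [1356566400, 1356652800, 1356739200]
to_load = [day for day in staged_days if day > loaded]
print(empty, loaded, to_load)
```

Because the comparison is strict, a day that is already in the fact table is skipped on the next run.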

Pages and files are told apart in a primitive way: by searching the request for a dot.

Reports

In complex visualization systems it is possible to build a meta-model on top of the database objects and to manage filters and aggregation rules dynamically. In the end, all decent tools generate an SQL query.

In this example, we will write ready-made SQL queries and save them as views in the database. These views are the reports.
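Saving a report as a view could look roughly like the sketch below, run through Python's sqlite3 module. The view name RPT_ACCESS_REQUEST_STATUS comes from the report list used for visualization, but its definition here is an assumption: a shortened form of the status report without the 14-day filter, on cut-down tables with made-up sample rows. Column aliases are written as double-quoted identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed, cut-down tables with sample rows for two days.
CREATE TABLE DIM_HTTP_STATUS (
  DIM_HTTP_STATUS_ID INTEGER PRIMARY KEY,
  STATUS_GROUP TEXT NOT NULL
);
CREATE TABLE FCT_ACCESS_USER_AGENT_DD (
  EVENT_DT INTEGER NOT NULL,
  DIM_HTTP_STATUS_ID INTEGER NOT NULL,
  REQUEST_CNT INTEGER NOT NULL
);
INSERT INTO DIM_HTTP_STATUS VALUES (1, 'Successful'), (2, 'Client Error');
INSERT INTO FCT_ACCESS_USER_AGENT_DD VALUES
  (1356652800, 1, 40), (1356652800, 2, 5), (1356739200, 1, 30);

-- The report itself: a stored query that can be selected like a table.
CREATE VIEW RPT_ACCESS_REQUEST_STATUS AS
SELECT
  1 AS "Line: All Requests by Status",
  strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS "Day",
  SUM(CASE WHEN STS.STATUS_GROUP='Successful' THEN FCT.REQUEST_CNT ELSE 0 END) AS "Success",
  SUM(CASE WHEN STS.STATUS_GROUP='Client Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS "Client Error"
FROM FCT_ACCESS_USER_AGENT_DD FCT, DIM_HTTP_STATUS STS
WHERE FCT.DIM_HTTP_STATUS_ID = STS.DIM_HTTP_STATUS_ID
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT;
""")

rows = conn.execute("SELECT * FROM RPT_ACCESS_REQUEST_STATUS").fetchall()
print(rows)
```

Keeping reports as views means the presentation layer only needs to know the view names; the report logic stays in the database.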

Visualization

Bluff: Beautiful graphs in JavaScript was used as the visualization tool.

To do this, it was necessary to run through all the reports with PHP and generate an html file with tables.

$sqls = array(
'SELECT * FROM RPT_ACCESS_USER_VS_BOT',
'SELECT * FROM RPT_ACCESS_ANNOYING_BOT',
'SELECT * FROM RPT_ACCESS_TOP_HOUR_HIT',
'SELECT * FROM RPT_ACCESS_USER_ACTIVE',
'SELECT * FROM RPT_ACCESS_REQUEST_STATUS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_PAGE',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_REFERRER',
'SELECT * FROM RPT_ACCESS_NEW_REQUEST',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_SUCCESS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_ERROR'
);

The tool simply visualizes the result tables.
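The article's generator is written in PHP and is not shown. As a rough illustration of the idea only, here is a Python sketch that turns one report into an html table. It assumes that the first column's name carries the chart type and title (as in 'StackedArea: Traffic generated by Users and Bots'); the report_to_html function, the use of a class attribute for the chart type, and the single-row demo view are all assumptions of this sketch.

```python
import sqlite3
from html import escape


def report_to_html(conn, sql):
    """Render a report query as an html table.

    The first column is treated as metadata: its name is split into a
    chart type and a title, and its value is not rendered.
    """
    cur = conn.execute(sql)
    headers = [col[0] for col in cur.description]
    chart_type, _, title = headers[0].partition(": ")
    head = "".join(f"<th>{escape(h)}</th>" for h in headers[1:])
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row[1:]) + "</tr>"
        for row in cur
    )
    return (f'<table class="{escape(chart_type)}">'
            f"<caption>{escape(title)}</caption>"
            f"<tr>{head}</tr>{body}</table>")


conn = sqlite3.connect(":memory:")
# Stand-in view with one fixed row, so the sketch runs without any log data.
conn.execute("""
CREATE VIEW RPT_ACCESS_USER_VS_BOT AS
SELECT 1 AS "StackedArea: Traffic generated by Users and Bots",
       '28.12' AS "Day", 120 AS "Bots, KB", 80 AS "Users, KB"
""")

html_table = report_to_html(conn, "SELECT * FROM RPT_ACCESS_USER_VS_BOT")
print(html_table)
```

Looping over the list of report queries and concatenating the resulting tables gives a single html page that a charting library can then pick up.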

Conclusion

Using web analytics as an example, this article describes the mechanisms needed to build a data warehouse. As the results show, the simplest tools are enough for in-depth analysis and visualization of data.

In future articles, using this warehouse as an example, we will try to implement structures such as slowly changing dimensions, metadata, aggregation levels and the integration of data from different sources.

We will also take a closer look at the simplest tool for managing ETL processes, based on a single table.

We will return to the topic of measuring data quality and automating that process.

We will look at the problems of the technical environment and of maintaining data warehouses; for that purpose we will set up a storage server with minimal resources, for example one based on a Raspberry Pi.

Source: www.habr.com
