Site statistics and your own small data warehouse

Webalizer and Google Analytics have helped me understand what is happening on my websites for many years. I now realize that they provide very little useful information. With access to your own access.log file, it is easy to work out the statistics yourself, and the most basic tools are enough: sqlite, html, the sql language and any scripting language.

The data source for Webalizer is the server's access.log file. This is what its bars and numbers look like; only the total volume of traffic can be read from them:

Tools such as Google Analytics collect the data themselves from the loaded page. They show us a few charts and lines, from which it is often hard to draw correct conclusions. Perhaps more effort should have been made? I don't know.

So, what did I want to see in the website visitor statistics?

User and bot traffic

Site traffic is often limited, so it is important to see how much of it is useful traffic. For example, like this:


SQL report query

SELECT
1 as 'StackedArea: Traffic generated by Users and Bots',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN USG.AGENT_BOT!='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Bots, KB',
SUM(CASE WHEN USG.AGENT_BOT='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Users, KB'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The graph shows constant bot activity. It would be interesting to study the most active representatives in detail.

Annoying bots

We classify bots based on the user agent information. Additional statistics on daily traffic and on the number of successful and unsuccessful requests give a good idea of bot activity.


SQL report query

SELECT 
1 AS 'Table: Annoying Bots',
MAX(USG.AGENT_BOT) AS 'Bot',
ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day',
ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Client Error', 'Server Error') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Error Requests per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Successful', 'Redirection') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Success Requests per Day',
USG.USER_AGENT_NK AS 'Agent'
FROM FCT_ACCESS_USER_AGENT_DD FCT,
     DIM_USER_AGENT USG,
     DIM_HTTP_STATUS STS
WHERE FCT.DIM_USER_AGENT_ID = USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = STS.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT != 'n.a.'
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY USG.USER_AGENT_NK
ORDER BY 3 DESC
LIMIT 10

In this case, the analysis led to a decision to restrict access to the site by adding the following to the robots.txt file.

User-agent: AhrefsBot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: bingbot
Crawl-delay: 5

The first two bots disappeared from the table, and the MS robots moved down from the first lines.
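One way to sanity-check rules like these before deploying them is Python's standard `urllib.robotparser`. The sketch below is only an illustration: it feeds the three rule groups above to the parser and asks what each crawler may do. The parser matches on a bot's product token (for example `AhrefsBot`), not on a full User-Agent header, and `SomeOtherBot` is a made-up name used for contrast. Whether a real crawler honours the file is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt fragment above.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: bingbot
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Fully blocked crawlers may not fetch anything.
ahrefs_allowed = parser.can_fetch("AhrefsBot", "/files/default.css")
dotbot_allowed = parser.can_fetch("dotbot", "/")

# bingbot is still allowed, but is asked to wait between requests.
bing_allowed = parser.can_fetch("bingbot", "/files/default.css")
bing_delay = parser.crawl_delay("bingbot")

# A crawler without a matching group (hypothetical name) is unaffected.
other_allowed = parser.can_fetch("SomeOtherBot", "/")

print(ahrefs_allowed, dotbot_allowed, bing_allowed, bing_delay, other_allowed)
```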

Day and time of peak activity

Spikes are visible in the traffic. To study them in detail, you need to single out the time at which they occurred; there is no need to display every hour and day of the measurement period. This makes it easier to find the individual requests in the log file if a detailed analysis is needed.


SQL report query

SELECT
1 AS 'Line: Day and Hour of Hits from Users and Bots',
strftime('%d.%m-%H', datetime(EVENT_DT, 'unixepoch')) AS 'Date Time',
HIB AS 'Bots, Hits',
HIU AS 'Users, Hits'
FROM (
	SELECT
	EVENT_DT,
	SUM(CASE WHEN AGENT_BOT!='n.a.' THEN LINE_CNT ELSE 0 END) AS HIB,
	SUM(CASE WHEN AGENT_BOT='n.a.' THEN LINE_CNT ELSE 0 END) AS HIU
	FROM FCT_ACCESS_REQUEST_REF_HH
	WHERE datetime(EVENT_DT, 'unixepoch') >= date('now', '-14 day')
	GROUP BY EVENT_DT
	ORDER BY SUM(LINE_CNT) DESC
	LIMIT 10
) ORDER BY EVENT_DT

On the chart we see the most active hours, 11, 14 and 20, of the first day. On the following day, however, bots were active at 13 o'clock.
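The report query for this chart uses a two-step pattern: an inner query picks the busiest hours, and the outer query puts them back in chronological order. A minimal sketch of that pattern with Python's built-in `sqlite3` follows. The table `HITS_HH` and its sample rows are invented for the illustration and are much simpler than FCT_ACCESS_REQUEST_REF_HH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HITS_HH (EVENT_DT INTEGER, LINE_CNT INTEGER)")

# Hypothetical hourly hit counts: (epoch of the hour, hits).
rows = [(3600 * hour, hits) for hour, hits in
        [(1, 5), (2, 40), (3, 7), (4, 90), (5, 3), (6, 60)]]
conn.executemany("INSERT INTO HITS_HH VALUES (?, ?)", rows)

top_hours = conn.execute("""
    SELECT EVENT_DT, HITS
    FROM (
        SELECT EVENT_DT, SUM(LINE_CNT) AS HITS
        FROM HITS_HH
        GROUP BY EVENT_DT
        ORDER BY HITS DESC      -- busiest hours first
        LIMIT 3                 -- keep only the peaks
    )
    ORDER BY EVENT_DT           -- then restore chronological order
""").fetchall()

print(top_hours)
```

Only the three peak hours survive, but they are listed in the order in which they happened, which is what makes them easy to look up in the log file.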

Average daily user activity by week

We have sorted things out a little with activity and traffic. The next question was the activity of the users themselves. For statistics like these, long aggregation periods, such as a week, are preferable.


SQL report query

SELECT
1 as 'Line: Average Daily User Activity by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
ROUND(1.0*SUM(FCT.PAGE_CNT)/SUM(FCT.IP_CNT),1) AS 'Pages per IP per Day',
ROUND(1.0*SUM(FCT.FILE_CNT)/SUM(FCT.IP_CNT),1) AS 'Files per IP per Day'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG,
  DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT='n.a.' /* users only */
  AND HST.STATUS_GROUP IN ('Successful') /* good pages */
  AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The weekly statistics show that, on average, one user opens 1.6 pages per day. The number of files requested per user in this case depends on new files being added to the site.
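The `1.0*` factor in the query matters: SQLite divides two integers as integers, so without it the ratio would be truncated. A small sketch with invented numbers shows the difference; the table name `DAILY` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DAILY (PAGE_CNT INTEGER, IP_CNT INTEGER)")
# Two made-up days: 8 pages from 3 IPs, then 5 pages from 2 IPs.
conn.executemany("INSERT INTO DAILY VALUES (?, ?)", [(8, 3), (5, 2)])

truncated, exact = conn.execute("""
    SELECT SUM(PAGE_CNT) / SUM(IP_CNT),                     -- integer division
           ROUND(1.0 * SUM(PAGE_CNT) / SUM(IP_CNT), 1)      -- real division
    FROM DAILY
""").fetchone()

print(truncated, exact)
```

With 13 pages and 5 IPs in total, the integer version reports 2 while the real-valued version reports 2.6.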

All requests and their statuses

Webalizer always showed specific page codes, whereas I always wanted to see simply the number of successful requests and errors.


SQL report query

SELECT
1 as 'Line: All Requests by Status',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN STS.STATUS_GROUP='Successful' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Success',
SUM(CASE WHEN STS.STATUS_GROUP='Redirection' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Redirect',
SUM(CASE WHEN STS.STATUS_GROUP='Client Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Customer Error',
SUM(CASE WHEN STS.STATUS_GROUP='Server Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Server Error'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_HTTP_STATUS STS
WHERE FCT.DIM_HTTP_STATUS_ID=STS.DIM_HTTP_STATUS_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The report shows requests, not clicks (hits): unlike LINE_CNT, the REQUEST_CNT metric is calculated as COUNT(DISTINCT STG.REQUEST_NK). The goal is to show effective events. For example, MS bots poll the robots.txt file several times a day, and in this case such polls are counted once. This smooths out the spikes in the graph.
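The difference between the two metrics can be seen on a handful of rows. The sketch below uses an invented staging table `STG` with only the two columns involved; the sample requests are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STG (LINE_NK INTEGER, REQUEST_NK TEXT)")

# Hypothetical log lines: a bot polls /robots.txt four times,
# and a user opens two different pages.
requests = ["/robots.txt"] * 4 + ["/", "/about"]
conn.executemany("INSERT INTO STG VALUES (?, ?)",
                 list(enumerate(requests, start=1)))

line_cnt, request_cnt = conn.execute("""
    SELECT COUNT(DISTINCT LINE_NK),      -- every log line is a hit
           COUNT(DISTINCT REQUEST_NK)    -- repeated requests collapse into one
    FROM STG
""").fetchone()

print(line_cnt, request_cnt)
```

Six hits collapse into three distinct requests, so the repeated robots.txt polls no longer dominate the figure.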

The graph shows many errors - these are non-existent pages. As a result of the analysis, redirects from the deleted pages were added.

Bad requests

To examine the requests more closely, you can display detailed statistics.


SQL report query

SELECT
  1 AS 'Table: Top Error Requests',
  REQ.REQUEST_NK AS 'Request',
  'Error' AS 'Request Status',
  ROUND(SUM(FCT.LINE_CNT) / 14.0, 1) AS 'Hits per Day',
  ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
  ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day'
FROM
  FCT_ACCESS_REQUEST_REF_HH FCT,
  DIM_REQUEST_V_ACT REQ
WHERE FCT.DIM_REQUEST_ID = REQ.DIM_REQUEST_ID
  AND FCT.STATUS_GROUP IN ('Client Error', 'Server Error')
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY REQ.REQUEST_NK
ORDER BY 4 DESC
LIMIT 20

This list will also contain all the probing calls, for example, a request to /wp-login.php. By adjusting the server's request rewrite rules, you can change how the server reacts to such requests and send them to the start page.

So, a few simple reports based on the server log file give a fairly complete picture of what is happening on the site.

How do you get the information?

A sqlite database is quite sufficient. Let's create the tables: an auxiliary one for logging the ETL processes.


A stage table, into which we will write the log files using PHP. Two aggregate tables: a daily table with statistics on user agents and request statuses, and an hourly table with statistics on requests, status groups and agents. Four tables for the corresponding dimensions.

The result is the following relational model:

Data model

Script to create an object in the sqlite database:

DDL object creation

DROP TABLE IF EXISTS DIM_USER_AGENT;
CREATE TABLE DIM_USER_AGENT (
  DIM_USER_AGENT_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  USER_AGENT_NK     TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_OS          TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_ENGINE      TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_DEVICE      TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_BOT         TEXT NOT NULL DEFAULT 'n.a.',
  UPDATE_DT         INTEGER NOT NULL DEFAULT 0,
  UNIQUE (USER_AGENT_NK)
);
INSERT INTO DIM_USER_AGENT (DIM_USER_AGENT_ID) VALUES (-1);

Stage

In the case of the access.log file, all requests have to be read, parsed and written to the database. This can be done either directly with a scripting language or with the sqlite tools.

Log file format:

//67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] "GET /files/default.css HTTP/1.1" 200 1512 "https://project.edu/" "Mozilla/4.0"
//host ident auth time method request_nk protocol status bytes ref browser
$log_pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9-]+) ([0-9-]+) "(.*)" "(.*)"$/';
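As a rough illustration of the parsing step, here is the same pattern applied in Python to the sample line from the comment above. It assumes that the square brackets around the time field are escaped in the pattern. The field names follow the comment line, while `parse_line` and `FIELDS` are names invented for this sketch.

```python
import re

# Same groups as the PHP pattern: host ident auth time method request_nk
# protocol status bytes ref browser.
LOG_PATTERN = re.compile(
    r'^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" '
    r'([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$'
)
FIELDS = ("host", "ident", "auth", "time", "method", "request_nk",
          "protocol", "status", "bytes", "ref", "browser")


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    return dict(zip(FIELDS, match.groups())) if match else None


sample = ('67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] '
          '"GET /files/default.css HTTP/1.1" 200 1512 '
          '"https://project.edu/" "Mozilla/4.0"')
record = parse_line(sample)
print(record)
```

Lines that do not match return `None`, so a loader can count or log them instead of failing on the first malformed entry.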

Key propagation

Once the raw data is in the database, the keys that are still missing have to be written to the dimension tables. After that, references to the dimensions can be built. For example, in the DIM_REFERRER table the key is a combination of three fields.

SQL key propagation query

/* Propagate the referrer from access log */
INSERT INTO DIM_REFERRER (HOST_NK, PATH_NK, QUERY_NK, UPDATE_DT)
SELECT
	CLS.HOST_NK,
	CLS.PATH_NK,
	CLS.QUERY_NK,
	STRFTIME('%s','now') AS UPDATE_DT
FROM (
	SELECT DISTINCT
	REFERRER_HOST AS HOST_NK,
	REFERRER_PATH AS PATH_NK,
	CASE WHEN INSTR(REFERRER_QUERY,'&sid')>0 THEN SUBSTR(REFERRER_QUERY, 1, INSTR(REFERRER_QUERY,'&sid')-1) /* cut off sid - CMS-specific */
	ELSE REFERRER_QUERY END AS QUERY_NK
	FROM STG_ACCESS_LOG
) CLS
LEFT OUTER JOIN DIM_REFERRER TRG
ON (CLS.HOST_NK = TRG.HOST_NK AND CLS.PATH_NK = TRG.PATH_NK AND CLS.QUERY_NK = TRG.QUERY_NK)
WHERE TRG.DIM_REFERRER_ID IS NULL
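The anti-join above can be tried on a toy database. The sketch below is a simplified assumption rather than the article's schema: the table definitions are reduced to the three key fields, the `sid` trimming and UPDATE_DT are left out, and the sample referrers are invented. It runs the propagation twice to show that the second run adds nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STG_ACCESS_LOG (
        REFERRER_HOST TEXT, REFERRER_PATH TEXT, REFERRER_QUERY TEXT);
    CREATE TABLE DIM_REFERRER (
        DIM_REFERRER_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        HOST_NK TEXT NOT NULL, PATH_NK TEXT NOT NULL, QUERY_NK TEXT NOT NULL,
        UNIQUE (HOST_NK, PATH_NK, QUERY_NK));
    INSERT INTO STG_ACCESS_LOG VALUES
        ('project.edu', '/', ''),
        ('project.edu', '/', ''),
        ('example.org', '/search', 'q=sqlite');
""")

PROPAGATE = """
    INSERT INTO DIM_REFERRER (HOST_NK, PATH_NK, QUERY_NK)
    SELECT CLS.HOST_NK, CLS.PATH_NK, CLS.QUERY_NK
    FROM (SELECT DISTINCT REFERRER_HOST AS HOST_NK,
                          REFERRER_PATH AS PATH_NK,
                          REFERRER_QUERY AS QUERY_NK
          FROM STG_ACCESS_LOG) CLS
    LEFT OUTER JOIN DIM_REFERRER TRG
      ON (CLS.HOST_NK = TRG.HOST_NK AND CLS.PATH_NK = TRG.PATH_NK
          AND CLS.QUERY_NK = TRG.QUERY_NK)
    WHERE TRG.DIM_REFERRER_ID IS NULL   -- keep only keys not yet present
"""


def dimension_size():
    return conn.execute("SELECT COUNT(*) FROM DIM_REFERRER").fetchone()[0]


conn.execute(PROPAGATE)
after_first_run = dimension_size()    # both distinct referrers are new
conn.execute(PROPAGATE)
after_second_run = dimension_size()   # nothing is missing any more

print(after_first_run, after_second_run)
```

One caveat: an equality join never matches NULL, so this sketch assumes that empty referrer parts are staged as empty strings rather than NULL.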

Propagation to the user agent table may contain the bot logic, for example this sql snippet:


CASE
WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
	THEN 'yandex'
WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
	THEN 'google'
WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
	THEN 'microsoft'
WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
	THEN 'ahrefs'
WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
	THEN 'majestic-12'
WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
	OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
	OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
	OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0
	OR INSTR(LOWER(CLS.BROWSER),'wget')>0
	THEN 'other'
ELSE 'n.a.' END AS AGENT_BOT
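For experimenting with the rules outside the database, the same first-match logic can be written as a small Python function. The markers and labels are copied from the snippet above; the function name `classify_bot` and the sample User-Agent strings are assumptions made for the illustration.

```python
# Ordered rules: the first marker found in the lower-cased User-Agent wins.
NAMED_BOTS = [
    ("yandex.com", "yandex"),
    ("googlebot", "google"),
    ("bingbot", "microsoft"),
    ("ahrefsbot", "ahrefs"),
    ("mj12bot", "majestic-12"),
]
GENERIC_MARKERS = ("compatible", "http", "libwww", "spider", "java",
                   "python", "robot", "curl", "wget")


def classify_bot(user_agent):
    """Return the bot label for a User-Agent string, or 'n.a.' for users."""
    ua = user_agent.lower()
    for marker, label in NAMED_BOTS:
        if marker in ua:
            return label
    if any(marker in ua for marker in GENERIC_MARKERS):
        return "other"
    return "n.a."


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "curl/7.68.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]
labels = [classify_bot(ua) for ua in samples]
print(labels)
```

The order of the rules matters: the named bots also contain the generic markers `compatible` and `http`, so they have to be tested first. Old browser strings that contain `compatible` will land in `other` as well, which is a known limit of such substring rules.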

Aggregate tables

Finally, we load the aggregate tables; for example, the daily table can be loaded as follows:

SQL query for loading the aggregate

/* Load fact from access log */
INSERT INTO FCT_ACCESS_USER_AGENT_DD (EVENT_DT, DIM_USER_AGENT_ID, DIM_HTTP_STATUS_ID, PAGE_CNT, FILE_CNT, REQUEST_CNT, LINE_CNT, IP_CNT, BYTES)
WITH STG AS (
SELECT
	STRFTIME( '%s', SUBSTR(TIME_NK,9,4) || '-' ||
	CASE SUBSTR(TIME_NK,5,3)
	WHEN 'Jan' THEN '01' WHEN 'Feb' THEN '02' WHEN 'Mar' THEN '03' WHEN 'Apr' THEN '04' WHEN 'May' THEN '05' WHEN 'Jun' THEN '06'
	WHEN 'Jul' THEN '07' WHEN 'Aug' THEN '08' WHEN 'Sep' THEN '09' WHEN 'Oct' THEN '10' WHEN 'Nov' THEN '11'
	ELSE '12' END || '-' || SUBSTR(TIME_NK,2,2) || ' 00:00:00' ) AS EVENT_DT,
	BROWSER AS USER_AGENT_NK,
	REQUEST_NK,
	IP_NR,
	STATUS,
	LINE_NK,
	BYTES
FROM STG_ACCESS_LOG
)
SELECT
	CAST(STG.EVENT_DT AS INTEGER) AS EVENT_DT,
	USG.DIM_USER_AGENT_ID,
	HST.DIM_HTTP_STATUS_ID,
	COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')=0 THEN STG.REQUEST_NK END) ) AS PAGE_CNT,
	COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')>0 THEN STG.REQUEST_NK END) ) AS FILE_CNT,
	COUNT(DISTINCT STG.REQUEST_NK) AS REQUEST_CNT,
	COUNT(DISTINCT STG.LINE_NK) AS LINE_CNT,
	COUNT(DISTINCT STG.IP_NR) AS IP_CNT,
	SUM(BYTES) AS BYTES
FROM STG,
	DIM_HTTP_STATUS HST,
	DIM_USER_AGENT USG
WHERE STG.STATUS = HST.STATUS_NK
  AND STG.USER_AGENT_NK = USG.USER_AGENT_NK
  AND CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from /* load epoch date */
  AND CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
GROUP BY STG.EVENT_DT, HST.DIM_HTTP_STATUS_ID, USG.DIM_USER_AGENT_ID

The sqlite database lets you write complex queries. The WITH clause contains the preparation of the data and keys. The main query collects all the references to the dimensions.
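The date arithmetic in the WITH clause is compact but hard to read. The sketch below mirrors it in Python and cross-checks the result against SQLite. It assumes that TIME_NK holds the bracketed timestamp exactly as it appears in the log, for example `[28/Dec/2012:01:47:47 +0100]`; the helper name `event_day_epoch` is made up. Like the SQL, it ignores the time zone offset and truncates to the start of the day.

```python
import sqlite3
from datetime import datetime, timezone

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def event_day_epoch(time_nk):
    """Epoch seconds of 00:00:00 on the log line's calendar day."""
    day = int(time_nk[1:3])        # SUBSTR(TIME_NK, 2, 2)
    month = MONTHS[time_nk[4:7]]   # SUBSTR(TIME_NK, 5, 3)
    year = int(time_nk[8:12])      # SUBSTR(TIME_NK, 9, 4)
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


time_nk = "[28/Dec/2012:01:47:47 +0100]"
py_epoch = event_day_epoch(time_nk)

# The same day converted the way the SQL does it.
sql_epoch = int(sqlite3.connect(":memory:").execute(
    "SELECT STRFTIME('%s', '2012-12-28 00:00:00')").fetchone()[0])

print(py_epoch, sql_epoch)
```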

This condition prevents the history from being loaded again: CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from, where the parameter is the result of the query
'SELECT COALESCE(MAX(EVENT_DT), '3600') AS LAST_EVENT_EPOCH FROM FCT_ACCESS_USER_AGENT_DD'

This condition loads only complete days: CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
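Taken together, the two conditions define a load window: newer than the last loaded day, and older than the start of today. A minimal sketch of how the window could be computed, using a fact table reduced to its EVENT_DT column, is shown below. The helper names `load_window` and `should_load` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FCT_ACCESS_USER_AGENT_DD (EVENT_DT INTEGER)")


def load_window():
    """Return (epoch_from, epoch_to); days strictly in between get loaded."""
    epoch_from = conn.execute(
        "SELECT COALESCE(MAX(EVENT_DT), 3600) FROM FCT_ACCESS_USER_AGENT_DD"
    ).fetchone()[0]
    epoch_to = conn.execute(
        "SELECT CAST(strftime('%s', date('now', 'start of day')) AS INTEGER)"
    ).fetchone()[0]
    return int(epoch_from), int(epoch_to)


def should_load(day_epoch, epoch_from, epoch_to):
    """A day qualifies if it is newer than the last load and already over."""
    return epoch_from < day_epoch < epoch_to


empty_from, today = load_window()   # nothing loaded yet: falls back to 3600

# Pretend that yesterday has already been loaded.
yesterday = today - 86400
conn.execute("INSERT INTO FCT_ACCESS_USER_AGENT_DD VALUES (?)", (yesterday,))
next_from, _ = load_window()

print(empty_from, next_from == yesterday)
```

On an empty fact table every complete day qualifies; once yesterday is loaded, neither yesterday nor the still running current day is picked up again.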

Pages and files are told apart in a primitive way, by searching for a dot in the request.
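The dot rule is easy to reproduce, and doing so also shows where it is only an approximation. The sample requests below are invented; `is_page` is a name made up for this sketch.

```python
def is_page(request_nk):
    """Mirror INSTR(REQUEST_NK, '.') = 0: a request without a dot is a page."""
    return "." not in request_nk


# Hypothetical requests, including a repeat and a path with a dot in a folder.
requests = ["/", "/about", "/files/default.css", "/robots.txt",
            "/about", "/v1.2/docs"]

# COUNT(DISTINCT ...) corresponds to counting the members of a set.
page_cnt = len({r for r in requests if is_page(r)})
file_cnt = len({r for r in requests if not is_page(r)})

print(page_cnt, file_cnt)
```

The request `/v1.2/docs` is counted as a file because of the dot in its path, which is the kind of misclassification this simple rule accepts.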

Reports

In complex visualization systems it is possible to build a meta-model based on the database objects and to manage filters and aggregation rules dynamically. In the end, all decent tools generate an SQL query.

In this example, we will create ready-made SQL queries and save them as views in the database - these are the reports.
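A report stored as a view could look roughly like the sketch below. The view name RPT_ACCESS_REQUEST_STATUS appears in the article's report list, but the table `FCT`, its columns and its rows are simplified assumptions, not the article's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FCT (EVENT_DT INTEGER, STATUS_GROUP TEXT, REQUEST_CNT INTEGER);
    INSERT INTO FCT VALUES
        (86400,  'Successful',   10),
        (86400,  'Client Error',  2),
        (172800, 'Successful',    7);

    -- The report is just a stored query.
    CREATE VIEW RPT_ACCESS_REQUEST_STATUS AS
    SELECT strftime('%d.%m', datetime(EVENT_DT, 'unixepoch')) AS Day,
           SUM(CASE WHEN STATUS_GROUP = 'Successful'
                    THEN REQUEST_CNT ELSE 0 END) AS Success,
           SUM(CASE WHEN STATUS_GROUP = 'Client Error'
                    THEN REQUEST_CNT ELSE 0 END) AS ClientError
    FROM FCT
    GROUP BY Day
    ORDER BY MIN(EVENT_DT);
""")

# Reading the report needs no knowledge of the query behind it.
report = conn.execute("SELECT * FROM RPT_ACCESS_REQUEST_STATUS").fetchall()
print(report)
```

Keeping the logic in the view means that the script producing the output only ever needs `SELECT * FROM <report>`.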

Visualization

Bluff: Beautiful graphs in JavaScript was used as the visualization tool.

To do this, all the reports had to be run through with PHP and an html file with tables generated.

$sqls = array(
'SELECT * FROM RPT_ACCESS_USER_VS_BOT',
'SELECT * FROM RPT_ACCESS_ANNOYING_BOT',
'SELECT * FROM RPT_ACCESS_TOP_HOUR_HIT',
'SELECT * FROM RPT_ACCESS_USER_ACTIVE',
'SELECT * FROM RPT_ACCESS_REQUEST_STATUS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_PAGE',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_REFERRER',
'SELECT * FROM RPT_ACCESS_NEW_REQUEST',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_SUCCESS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_ERROR'
);

The tool simply visualizes the result tables.
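The article does this step in PHP. As a language-neutral illustration, the loop over the reports might be sketched in Python as follows. The function `report_to_html`, the view `RPT_DEMO` and its data are invented for the example, and the charting itself is not covered.

```python
import sqlite3
from html import escape


def report_to_html(conn, sql):
    """Run one report query and render its result as an HTML table."""
    cursor = conn.execute(sql)
    header = "".join(f"<th>{escape(col[0])}</th>" for col in cursor.description)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(value))}</td>" for value in row) + "</tr>"
        for row in cursor
    )
    return f"<table><tr>{header}</tr>{body}</table>"


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (Bot TEXT, Hits INTEGER);
    INSERT INTO T VALUES ('google', 12), ('<other>', 3);
    CREATE VIEW RPT_DEMO AS SELECT Bot, Hits FROM T ORDER BY Hits DESC;
""")

sqls = ["SELECT * FROM RPT_DEMO"]   # one entry per report view
page = ("<html><body>"
        + "".join(report_to_html(conn, sql) for sql in sqls)
        + "</body></html>")
print(page)
```

Escaping every cell matters here, because request paths and user agent strings from a log file are untrusted input.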

Conclusion

Using web analytics as an example, the article describes the mechanisms needed to build a data warehouse. As the results show, the simplest tools are sufficient for in-depth analysis and visualization of data.

In the future, using this warehouse as an example, we will try to implement structures such as slowly changing dimensions, metadata, aggregation levels and the integration of data from different sources.

We will also take a closer look at the simplest tool for managing ETL processes, based on a single table.

We will return to the topic of measuring data quality and automating that process.

We will study the problems of the technical environment and of maintaining data warehouses, for which we will set up a storage server with minimal resources, for example, based on a Raspberry Pi.

Source: www.habr.com
