Site statistics and your own small storage

Webalizer and Google Analytics have helped me understand what happens on my websites for many years. Now I realize that they provide very little useful information. With access to your access.log file, it is quite easy to work out the statistics yourself and build them with basic tools such as sqlite, html, the sql language and any scripting language.

The data source for Webalizer is the server's access.log file. Its bars and numbers make only the total volume of traffic clear.

Tools like Google Analytics collect data themselves from the loaded page. They show us a couple of diagrams and lines, from which it is often hard to draw correct conclusions. Maybe I should have tried harder? I don't know.

So, what did I want to see in the website visitor statistics?

User and bot traffic

Site traffic is often limited, so it is necessary to see how much of it is useful traffic. For example, like this:

SQL report query

SELECT
1 as 'StackedArea: Traffic generated by Users and Bots',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN USG.AGENT_BOT!='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Bots, KB',
SUM(CASE WHEN USG.AGENT_BOT='n.a.' THEN FCT.BYTES ELSE 0 END)/1000 AS 'Users, KB'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The graph shows constant bot activity. It would be interesting to study the most active representatives in detail.

Annoying bots

We classify bots based on the user agent information. Additional statistics on daily traffic and on the number of successful and unsuccessful requests give a good picture of bot activity.

SQL report query

SELECT 
1 AS 'Table: Annoying Bots',
MAX(USG.AGENT_BOT) AS 'Bot',
ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day',
ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Client Error', 'Server Error') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Error Requests per Day',
ROUND(SUM(CASE WHEN STS.STATUS_GROUP IN ('Successful', 'Redirection') THEN FCT.REQUEST_CNT / 14.0 ELSE 0 END), 1) AS 'Success Requests per Day',
USG.USER_AGENT_NK AS 'Agent'
FROM FCT_ACCESS_USER_AGENT_DD FCT,
     DIM_USER_AGENT USG,
     DIM_HTTP_STATUS STS
WHERE FCT.DIM_USER_AGENT_ID = USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = STS.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT != 'n.a.'
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY USG.USER_AGENT_NK
ORDER BY 3 DESC
LIMIT 10

In this case, the analysis led to the decision to restrict access to the site by adding the following to the robots.txt file:

User-agent: AhrefsBot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: bingbot
Crawl-delay: 5

The first two bots disappeared from the table, and the MS robots moved down from the top rows.
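
As a quick sanity check, the rules above can be fed to a robots.txt parser before they are deployed. The sketch below uses Python's standard urllib.robotparser; it only shows how that particular parser reads the rules, and real crawlers may interpret them differently.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as they were added to robots.txt in the article.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: bingbot
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# For each bot: may it fetch the site root, and which crawl delay applies?
for agent in ("AhrefsBot", "dotbot", "bingbot"):
    print(agent, parser.can_fetch(agent, "/"), parser.crawl_delay(agent))
```

With this parser, AhrefsBot and dotbot are refused everywhere, while bingbot is still allowed but with a delay of 5 seconds between requests.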

Day and time of peak activity

Spikes are visible in the traffic. To study them in detail, you need to pinpoint when they occur; there is no need to display every hour and day of the measured period. This makes it easier to find individual requests in the log file if a detailed analysis is needed.

SQL report query

SELECT
1 AS 'Line: Day and Hour of Hits from Users and Bots',
strftime('%d.%m-%H', datetime(EVENT_DT, 'unixepoch')) AS 'Date Time',
HIB AS 'Bots, Hits',
HIU AS 'Users, Hits'
FROM (
	SELECT
	EVENT_DT,
	SUM(CASE WHEN AGENT_BOT!='n.a.' THEN LINE_CNT ELSE 0 END) AS HIB,
	SUM(CASE WHEN AGENT_BOT='n.a.' THEN LINE_CNT ELSE 0 END) AS HIU
	FROM FCT_ACCESS_REQUEST_REF_HH
	WHERE datetime(EVENT_DT, 'unixepoch') >= date('now', '-14 day')
	GROUP BY EVENT_DT
	ORDER BY SUM(LINE_CNT) DESC
	LIMIT 10
) ORDER BY EVENT_DT

On the chart, the most active hours are 11, 14 and 20 of the first day. On the following day, however, bots were active at 13:00.

Average daily user activity by week

We have sorted out activity and traffic a little. The next question is the activity of the users themselves. For such statistics, long aggregation periods, such as a week, are preferable.

SQL report query

SELECT
1 as 'Line: Average Daily User Activity by Week',
strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Week',
ROUND(1.0*SUM(FCT.PAGE_CNT)/SUM(FCT.IP_CNT),1) AS 'Pages per IP per Day',
ROUND(1.0*SUM(FCT.FILE_CNT)/SUM(FCT.IP_CNT),1) AS 'Files per IP per Day'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_USER_AGENT USG,
  DIM_HTTP_STATUS HST
WHERE FCT.DIM_USER_AGENT_ID=USG.DIM_USER_AGENT_ID
  AND FCT.DIM_HTTP_STATUS_ID = HST.DIM_HTTP_STATUS_ID
  AND USG.AGENT_BOT='n.a.' /* users only */
  AND HST.STATUS_GROUP IN ('Successful') /* good pages */
  AND datetime(FCT.EVENT_DT, 'unixepoch') > date('now', '-3 month')
GROUP BY strftime('%W week', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The weekly statistics show that, on average, one user opens 1.6 pages per day. The number of files requested per user in this case depends on new files being added to the site.

All requests and their statuses

Webalizer always showed specific page status codes, whereas I just wanted to see the number of successful requests and errors.

SQL report query

SELECT
1 as 'Line: All Requests by Status',
strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch')) AS 'Day',
SUM(CASE WHEN STS.STATUS_GROUP='Successful' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Success',
SUM(CASE WHEN STS.STATUS_GROUP='Redirection' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Redirect',
SUM(CASE WHEN STS.STATUS_GROUP='Client Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Client Error',
SUM(CASE WHEN STS.STATUS_GROUP='Server Error' THEN FCT.REQUEST_CNT ELSE 0 END) AS 'Server Error'
FROM
  FCT_ACCESS_USER_AGENT_DD FCT,
  DIM_HTTP_STATUS STS
WHERE FCT.DIM_HTTP_STATUS_ID=STS.DIM_HTTP_STATUS_ID
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY strftime('%d.%m', datetime(FCT.EVENT_DT, 'unixepoch'))
ORDER BY FCT.EVENT_DT

The report shows requests, not hits: unlike LINE_CNT, the REQUEST_CNT metric is calculated as COUNT(DISTINCT STG.REQUEST_NK). The goal is to show meaningful events. For example, the MS bots poll the robots.txt file hundreds of times a day, and in this case all those polls are counted once. This smooths out the jumps in the graph.
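
The difference between the two metrics can be seen on a few rows. This is a minimal sketch in Python with an in-memory sqlite database; the two-column STG_ACCESS_LOG table and its sample rows are simplified assumptions, not the article's real staging table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the staging table: one row per log line.
conn.execute("CREATE TABLE STG_ACCESS_LOG (LINE_NK INTEGER, REQUEST_NK TEXT)")
conn.executemany(
    "INSERT INTO STG_ACCESS_LOG VALUES (?, ?)",
    [
        (1, "/robots.txt"),  # a bot polling the same file three times
        (2, "/robots.txt"),
        (3, "/robots.txt"),
        (4, "/index"),
        (5, "/about"),
    ],
)

# LINE_CNT counts every log line, REQUEST_CNT counts each distinct request once.
line_cnt, request_cnt = conn.execute(
    "SELECT COUNT(DISTINCT LINE_NK), COUNT(DISTINCT REQUEST_NK) FROM STG_ACCESS_LOG"
).fetchone()
print(line_cnt, request_cnt)
```

Five log lines collapse into three distinct requests, so repeated polling of one file no longer dominates the chart.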

The graph shows many errors: these were non-existent pages. The analysis resulted in adding redirects from the deleted pages.

Bad requests

To examine the requests more closely, you can display detailed statistics.

SQL report query

SELECT
  1 AS 'Table: Top Error Requests',
  REQ.REQUEST_NK AS 'Request',
  'Error' AS 'Request Status',
  ROUND(SUM(FCT.LINE_CNT) / 14.0, 1) AS 'Hits per Day',
  ROUND(SUM(FCT.IP_CNT) / 14.0, 1) AS 'IPs per Day',
  ROUND(SUM(FCT.BYTES)/1000 / 14.0, 1) AS 'KB per Day'
FROM
  FCT_ACCESS_REQUEST_REF_HH FCT,
  DIM_REQUEST_V_ACT REQ
WHERE FCT.DIM_REQUEST_ID = REQ.DIM_REQUEST_ID
  AND FCT.STATUS_GROUP IN ('Client Error', 'Server Error')
  AND datetime(FCT.EVENT_DT, 'unixepoch') >= date('now', '-14 day')
GROUP BY REQ.REQUEST_NK
ORDER BY 4 DESC
LIMIT 20

This list will also contain all the probing requests, for example, a request to /wp-login.php. By adjusting the server's request rewrite rules, you can change how the server responds to such requests and send them to the start page.

So, a few simple reports based on the server log file give a fairly complete picture of what is happening on the site.

How to get the information?

An sqlite database is quite sufficient. Let's create the tables, starting with an auxiliary one for logging the ETL processes.

A stage table, into which we will write the log files using PHP. Two aggregate tables: a daily table with statistics on user agents and request statuses, and an hourly table with statistics on requests, status groups and agents. Four tables for the corresponding dimensions.

The result is the following relational model:

Data model

Script for creating the objects in the sqlite database:

DDL object creation

DROP TABLE IF EXISTS DIM_USER_AGENT;
CREATE TABLE DIM_USER_AGENT (
  DIM_USER_AGENT_ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  USER_AGENT_NK     TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_OS          TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_ENGINE      TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_DEVICE      TEXT NOT NULL DEFAULT 'n.a.',
  AGENT_BOT         TEXT NOT NULL DEFAULT 'n.a.',
  UPDATE_DT         INTEGER NOT NULL DEFAULT 0,
  UNIQUE (USER_AGENT_NK)
);
INSERT INTO DIM_USER_AGENT (DIM_USER_AGENT_ID) VALUES (-1);

Stage

In the case of the access.log file, all requests have to be read, parsed and written to the database. This can be done either directly in a scripting language or with sqlite tools.

Log file format:

//67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] "GET /files/default.css HTTP/1.1" 200 1512 "https://project.edu/" "Mozilla/4.0"
//host ident auth time method request_nk protocol status bytes ref browser
$log_pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';
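
To see what the pattern captures, here is a minimal sketch that applies the same expression to the sample line above. It is written in Python rather than PHP purely for illustration; the FIELDS names follow the comment line above, and parse_line is a hypothetical helper, not part of the article's loader.

```python
import re

# The same pattern as $log_pattern, with the brackets around the time escaped.
LOG_PATTERN = re.compile(
    r'^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" '
    r'([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$'
)

# Field names taken from the comment line that documents the format.
FIELDS = ("host", "ident", "auth", "time", "method", "request_nk",
          "protocol", "status", "bytes", "ref", "browser")

def parse_line(line):
    """Return a dict of the eleven fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    return dict(zip(FIELDS, match.groups())) if match else None

sample = ('67.221.59.195 - - [28/Dec/2012:01:47:47 +0100] '
          '"GET /files/default.css HTTP/1.1" 200 1512 '
          '"https://project.edu/" "Mozilla/4.0"')
row = parse_line(sample)
print(row["time"], row["request_nk"], row["status"])
```

Lines that do not match return None, so a loader can count or skip them instead of writing broken rows to the stage table.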

Key propagation

Once the raw data is in the database, the keys that are still missing have to be written into the dimension tables. Only then can references to the dimensions be built. For example, in the DIM_REFERRER table the key is a combination of three fields.

SQL key propagation query

/* Propagate the referrer from access log */
INSERT INTO DIM_REFERRER (HOST_NK, PATH_NK, QUERY_NK, UPDATE_DT)
SELECT
	CLS.HOST_NK,
	CLS.PATH_NK,
	CLS.QUERY_NK,
	STRFTIME('%s','now') AS UPDATE_DT
FROM (
	SELECT DISTINCT
	REFERRER_HOST AS HOST_NK,
	REFERRER_PATH AS PATH_NK,
	CASE WHEN INSTR(REFERRER_QUERY,'&sid')>0 THEN SUBSTR(REFERRER_QUERY, 1, INSTR(REFERRER_QUERY,'&sid')-1) /* cut off the sid, a CMS-specific detail */
	ELSE REFERRER_QUERY END AS QUERY_NK
	FROM STG_ACCESS_LOG
) CLS
LEFT OUTER JOIN DIM_REFERRER TRG
ON (CLS.HOST_NK = TRG.HOST_NK AND CLS.PATH_NK = TRG.PATH_NK AND CLS.QUERY_NK = TRG.QUERY_NK)
WHERE TRG.DIM_REFERRER_ID IS NULL

Propagation into the user agent table can contain the bot detection logic, for example this sql snippet:


CASE
WHEN INSTR(LOWER(CLS.BROWSER),'yandex.com')>0
	THEN 'yandex'
WHEN INSTR(LOWER(CLS.BROWSER),'googlebot')>0
	THEN 'google'
WHEN INSTR(LOWER(CLS.BROWSER),'bingbot')>0
	THEN 'microsoft'
WHEN INSTR(LOWER(CLS.BROWSER),'ahrefsbot')>0
	THEN 'ahrefs'
WHEN INSTR(LOWER(CLS.BROWSER),'mj12bot')>0
	THEN 'majestic-12'
WHEN INSTR(LOWER(CLS.BROWSER),'compatible')>0 OR INSTR(LOWER(CLS.BROWSER),'http')>0
	OR INSTR(LOWER(CLS.BROWSER),'libwww')>0 OR INSTR(LOWER(CLS.BROWSER),'spider')>0
	OR INSTR(LOWER(CLS.BROWSER),'java')>0 OR INSTR(LOWER(CLS.BROWSER),'python')>0
	OR INSTR(LOWER(CLS.BROWSER),'robot')>0 OR INSTR(LOWER(CLS.BROWSER),'curl')>0
	OR INSTR(LOWER(CLS.BROWSER),'wget')>0
	THEN 'other'
ELSE 'n.a.' END AS AGENT_BOT
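
The same rules can be tried out on individual user agent strings before they are put into the load query. The sketch below mirrors the CASE expression in Python; classify_agent is a hypothetical helper and the sample strings in the usage note are illustrative assumptions.

```python
# Named bots are checked first, in the same order as the CASE branches.
NAMED_BOTS = (
    ("yandex.com", "yandex"),
    ("googlebot", "google"),
    ("bingbot", "microsoft"),
    ("ahrefsbot", "ahrefs"),
    ("mj12bot", "majestic-12"),
)

# Generic markers that put any remaining agent into the 'other' bot group.
GENERIC_MARKERS = ("compatible", "http", "libwww", "spider", "java",
                   "python", "robot", "curl", "wget")

def classify_agent(browser):
    """Return the AGENT_BOT value for a raw user agent string."""
    agent = browser.lower()
    for marker, name in NAMED_BOTS:
        if marker in agent:
            return name
    if any(marker in agent for marker in GENERIC_MARKERS):
        return "other"
    return "n.a."

print(classify_agent("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))
print(classify_agent("curl/7.68.0"))
print(classify_agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"))
```

The three calls return 'microsoft', 'other' and 'n.a.' respectively. Note that the 'compatible' marker is broad: old Internet Explorer strings also contain it, so such visitors would be counted as bots by this rule.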

Aggregate tables

Finally, we load the aggregate tables; for example, the daily table can be loaded as follows:

SQL query for loading the aggregate

/* Load fact from access log */
INSERT INTO FCT_ACCESS_USER_AGENT_DD (EVENT_DT, DIM_USER_AGENT_ID, DIM_HTTP_STATUS_ID, PAGE_CNT, FILE_CNT, REQUEST_CNT, LINE_CNT, IP_CNT, BYTES)
WITH STG AS (
SELECT
	STRFTIME( '%s', SUBSTR(TIME_NK,9,4) || '-' ||
	CASE SUBSTR(TIME_NK,5,3)
	WHEN 'Jan' THEN '01' WHEN 'Feb' THEN '02' WHEN 'Mar' THEN '03' WHEN 'Apr' THEN '04' WHEN 'May' THEN '05' WHEN 'Jun' THEN '06'
	WHEN 'Jul' THEN '07' WHEN 'Aug' THEN '08' WHEN 'Sep' THEN '09' WHEN 'Oct' THEN '10' WHEN 'Nov' THEN '11'
	ELSE '12' END || '-' || SUBSTR(TIME_NK,2,2) || ' 00:00:00' ) AS EVENT_DT,
	BROWSER AS USER_AGENT_NK,
	REQUEST_NK,
	IP_NR,
	STATUS,
	LINE_NK,
	BYTES
FROM STG_ACCESS_LOG
)
SELECT
	CAST(STG.EVENT_DT AS INTEGER) AS EVENT_DT,
	USG.DIM_USER_AGENT_ID,
	HST.DIM_HTTP_STATUS_ID,
	COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')=0 THEN STG.REQUEST_NK END) ) AS PAGE_CNT,
	COUNT(DISTINCT (CASE WHEN INSTR(STG.REQUEST_NK,'.')>0 THEN STG.REQUEST_NK END) ) AS FILE_CNT,
	COUNT(DISTINCT STG.REQUEST_NK) AS REQUEST_CNT,
	COUNT(DISTINCT STG.LINE_NK) AS LINE_CNT,
	COUNT(DISTINCT STG.IP_NR) AS IP_CNT,
	SUM(BYTES) AS BYTES
FROM STG,
	DIM_HTTP_STATUS HST,
	DIM_USER_AGENT USG
WHERE STG.STATUS = HST.STATUS_NK
  AND STG.USER_AGENT_NK = USG.USER_AGENT_NK
  AND CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from /* load epoch date */
  AND CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
GROUP BY STG.EVENT_DT, HST.DIM_HTTP_STATUS_ID, USG.DIM_USER_AGENT_ID

The sqlite database lets you write complex queries. The WITH clause prepares the data and the keys. The main query collects all references to the dimensions.

This condition prevents the history from being loaded again: CAST(STG.EVENT_DT AS INTEGER) > $param_epoch_from, where the parameter is the result of the query
'SELECT COALESCE(MAX(EVENT_DT), \'3600\') AS LAST_EVENT_EPOCH FROM FCT_ACCESS_USER_AGENT_DD'

This condition loads only complete days: CAST(STG.EVENT_DT AS INTEGER) < strftime('%s', date('now', 'start of day'))
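
The two conditions together define a load window. Here is a minimal sketch of how its bounds behave, using Python with an in-memory sqlite database; the one-column fact table and the load_window helper are simplified assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column stand-in for the daily fact table.
conn.execute("CREATE TABLE FCT_ACCESS_USER_AGENT_DD (EVENT_DT INTEGER)")

def load_window(conn):
    """Return the exclusive lower and upper bounds of the load, in epoch seconds."""
    lower = conn.execute(
        "SELECT COALESCE(MAX(EVENT_DT), '3600') FROM FCT_ACCESS_USER_AGENT_DD"
    ).fetchone()[0]
    upper = conn.execute(
        "SELECT strftime('%s', date('now', 'start of day'))"
    ).fetchone()[0]
    return int(lower), int(upper)

# Empty table: the lower bound falls back to 3600, so all history is loaded.
first_run = load_window(conn)

# After a load, the latest loaded day becomes the new lower bound.
conn.execute("INSERT INTO FCT_ACCESS_USER_AGENT_DD VALUES (1356652800)")  # 2012-12-28 00:00:00 UTC
next_run = load_window(conn)
print(first_run[0], next_run[0])
```

Because the upper bound is today's midnight in UTC, the current, still incomplete day is never written to the aggregate.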

Pages and files are told apart in a primitive way, by searching for a dot in the request.
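
A short sketch of that heuristic, mirroring INSTR(STG.REQUEST_NK,'.')=0 from the load query. The is_page helper is hypothetical and the first and last sample requests are made up for illustration.

```python
def is_page(request_nk):
    """No dot in the request means a page; a dot anywhere means a file."""
    return "." not in request_nk

for request in ("/articles/sqlite", "/files/default.css",
                "/wp-login.php", "/search?v=1.2"):
    print(request, "page" if is_page(request) else "file")
```

Only the first request is counted as a page. The last one shows the limit of the approach: a dot in the query string is enough to count a page as a file.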

Reports

In complex visualization systems it is possible to build a meta-model on top of the database objects and to manage filters and aggregation rules dynamically. In the end, all decent tools generate an SQL query.

In this example, we create ready-made SQL queries and save them as views in the database: these are the reports.

Visualization

Bluff (beautiful graphs in JavaScript) was used as the visualization tool.

To do this, we had to go through all the reports using PHP and generate an html file with tables.

$sqls = array(
'SELECT * FROM RPT_ACCESS_USER_VS_BOT',
'SELECT * FROM RPT_ACCESS_ANNOYING_BOT',
'SELECT * FROM RPT_ACCESS_TOP_HOUR_HIT',
'SELECT * FROM RPT_ACCESS_USER_ACTIVE',
'SELECT * FROM RPT_ACCESS_REQUEST_STATUS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_PAGE',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_REFERRER',
'SELECT * FROM RPT_ACCESS_NEW_REQUEST',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_SUCCESS',
'SELECT * FROM RPT_ACCESS_TOP_REQUEST_ERROR'
);

The tool simply visualizes the result tables.
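
The step of turning a report view into an html table can be sketched as follows. The article does this in PHP; the Python version below is only an illustration, and render_report and the RPT_DEMO view are hypothetical names.

```python
import html
import sqlite3

def render_report(conn, sql):
    """Run one report query and return its result as an html table."""
    cursor = conn.execute(sql)
    headers = [column[0] for column in cursor.description]
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>")
    for row in cursor.fetchall():
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)

conn = sqlite3.connect(":memory:")
# Hypothetical report view standing in for the RPT_ACCESS_* views.
conn.execute(
    'CREATE VIEW RPT_DEMO AS '
    'SELECT 1 AS "Table: Demo", \'/wp-login.php\' AS "Request", 12.5 AS "Hits per Day"'
)
table_html = render_report(conn, "SELECT * FROM RPT_DEMO")
print(table_html)
```

The column aliases become the table headers, so the report views alone decide what the charting tool receives.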

Conclusion

Using web analytics as an example, the article describes the mechanisms needed to build a data warehouse. As the results show, the simplest tools are sufficient for deep analysis and visualization of data.

In the future, using this warehouse as an example, we will try to implement such structures as slowly changing dimensions, metadata, aggregation levels and integration of data from different sources.

We will also take a closer look at the simplest tool for managing ETL processes, based on a single table.

We will return to the topic of measuring data quality and automating this process.

We will study the problems of the technical environment and of maintaining data storages, for which we will implement a storage server with minimal resources, for example, based on a Raspberry Pi.

Source: will.com
