Au kidogo ya tetrisology kutumika.
Kila kitu kipya kimesahaulika zamani.
Epigraphs.
Taarifa ya tatizo
Ni muhimu kupakua mara kwa mara faili ya kumbukumbu ya PostgreSQL kutoka kwa wingu la AWS hadi kwa seva pangishi ya Linux ya ndani. Sio kwa wakati halisi, lakini, tutasema, kwa kuchelewa kidogo.
Kipindi cha kupakua faili ya kumbukumbu ni dakika 5.
Faili ya kumbukumbu, katika AWS, inazungushwa kila saa.
Zana zilizotumiwa
Ili kupakia faili ya kumbukumbu kwa mwenyeji, hati ya bash inatumika inayoita API ya AWS "
Vigezo:
- --db-instance-kitambulisho: Jina la tukio katika AWS;
- --log-file-name: jina la faili ya kumbukumbu inayozalishwa kwa sasa
- --max-item: Jumla ya idadi ya vitu vilivyorejeshwa katika matokeo ya amri.Saizi ya kipande cha faili iliyopakuliwa.
- --ishara-ya-kuanza: Ishara ya kuanza
Ndio, na kwa urahisi - kazi ya kupendeza ya mafunzo na anuwai wakati wa saa za kazi.
Nadhani shida tayari imetatuliwa kwa msingi wa mazoea. Lakini Google ya haraka haikupendekeza suluhisho, na hakukuwa na hamu fulani ya kutafuta kwa kina zaidi. Kwa hali yoyote, ni Workout nzuri.
Urasimishaji wa kazi
Faili ya mwisho ya logi ni seti ya mistari ya urefu tofauti. Kielelezo, faili ya logi inaweza kuwakilishwa kama hii:
Je, tayari inakukumbusha kitu? "Tetris" ni nini? Na hapa ni nini.
Ikiwa tunawakilisha chaguzi zinazowezekana zinazotokea wakati wa kupakia faili inayofuata kielelezo (kwa unyenyekevu, katika kesi hii, acha mistari iwe na urefu sawa), tunapata. takwimu za kawaida za tetris:
1) Faili inapakuliwa kwa ukamilifu na ni ya mwisho. Saizi ya chunk ni kubwa kuliko saizi ya mwisho ya faili:
2) Faili ina muendelezo. Saizi ya chunk ni ndogo kuliko saizi ya mwisho ya faili:
3) Faili ni mwendelezo wa faili iliyotangulia na ina mwendelezo. Saizi ya chunk ni chini ya saizi ya faili nyingine ya mwisho:
4) Faili ni mwendelezo wa faili iliyotangulia na ni ya mwisho. Saizi ya chunk ni kubwa kuliko saizi ya faili nyingine ya mwisho:
Kazi ni kukusanya mstatili au kucheza Tetris kwenye ngazi mpya.
Matatizo yanayotokea wakati wa kutatua tatizo
1) Gundi kamba ya sehemu 2
Kwa ujumla, hakukuwa na matatizo fulani. Kazi ya kawaida kutoka kwa kozi ya awali ya programu.
Saizi bora ya kutumikia
Lakini hii ni ya kuvutia zaidi kidogo.
Kwa bahati mbaya, hakuna njia ya kutumia kukabiliana baada ya kuanza lebo ya chunk:
Kama unavyojua tayari chaguo --starting-token inatumika kubainisha mahali pa kuanzia pagination. Chaguo hili huchukua maadili ya Mfuatano, ambayo itamaanisha kuwa ukijaribu kuongeza thamani ya kukabiliana mbele ya Mfuatano wa Tokeni Inayofuata, chaguo hilo halitazingatiwa kama suluhu.
Na kwa hivyo, lazima usome kwa sehemu.
Ikiwa unasoma kwa sehemu kubwa, basi idadi ya usomaji itakuwa ndogo, lakini kiasi kitakuwa cha juu.
Ikiwa unasoma kwa sehemu ndogo, basi kinyume chake, idadi ya usomaji itakuwa ya juu, lakini kiasi kitakuwa kidogo.
Kwa hiyo, ili kupunguza trafiki na kwa uzuri wa jumla wa suluhisho, nilipaswa kuja na aina fulani ya ufumbuzi, ambayo, kwa bahati mbaya, inaonekana kidogo kama crutch.
Kwa mfano, hebu fikiria mchakato wa kupakua faili ya kumbukumbu katika matoleo 2 yaliyorahisishwa sana. Idadi ya masomo katika visa vyote viwili inategemea saizi ya sehemu.
1) Pakia katika sehemu ndogo:
2) Pakia kwa sehemu kubwa:
Kama kawaida, suluhisho bora iko katikati.
Ukubwa wa sehemu ni ndogo, lakini katika mchakato wa kusoma, ukubwa unaweza kuongezeka ili kupunguza idadi ya masomo.
Ikumbukwe kwamba tatizo la kuchagua ukubwa bora wa sehemu iliyosomwa bado halijatatuliwa kabisa na linahitaji utafiti na uchambuzi wa kina. Labda baadaye kidogo.
Maelezo ya jumla ya utekelezaji
Jedwali za huduma zilizotumika
CREATE TABLE endpoint
(
id SERIAL ,
host text
);
TABLE database
(
id SERIAL ,
β¦
last_aws_log_time text ,
last_aws_nexttoken text ,
aws_max_item_size integer
);
last_aws_log_time β Π²ΡΠ΅ΠΌΠ΅Π½Π½Π°Ρ ΠΌΠ΅ΡΠΊΠ° ΠΏΠΎΡΠ»Π΅Π΄Π½Π΅Π³ΠΎ Π·Π°Π³ΡΡΠΆΠ΅Π½Π½ΠΎΠ³ΠΎ Π»ΠΎΠ³-ΡΠ°ΠΉΠ»Π° Π² ΡΠΎΡΠΌΠ°ΡΠ΅ YYYY-MM-DD-HH24.
last_aws_nexttoken β ΡΠ΅ΠΊΡΡΠΎΠ²Π°Ρ ΠΌΠ΅ΡΠΊΠ° ΠΏΠΎΡΠ»Π΅Π΄Π½Π΅ΠΉ Π·Π°Π³ΡΡΠΆΠ΅Π½Π½ΠΎΠΉ ΠΏΠΎΡΡΠΈΠΈ.
aws_max_item_size- ΡΠΌΠΏΠΈΡΠΈΡΠ΅ΡΠΊΠΈΠΌ ΠΏΡΡΠ΅ΠΌ, ΠΏΠΎΠ΄ΠΎΠ±ΡΠ°Π½Π½ΡΠΉ Π½Π°ΡΠ°Π»ΡΠ½ΡΠΉ ΡΠ°Π·ΠΌΠ΅Ρ ΠΏΠΎΡΡΠΈΠΈ.
Nakala kamili ya hati
download_aws_piece.sh
#!/bin/bash
#########################################################
# download_aws_piece.sh
# downloan piece of log from AWS
# version HABR
let min_item_size=1024
let max_item_size=1048576
let growth_factor=3
let growth_counter=1
let growth_counter_max=3
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh:''STARTED'
AWS_LOG_TIME=$1
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh:AWS_LOG_TIME='$AWS_LOG_TIME
database_id=$2
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh:database_id='$database_id
RESULT_FILE=$3
endpoint=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE_DATABASE -A -t -c "select e.host from endpoint e join database d on e.id = d.endpoint_id where d.id = $database_id "`
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh:endpoint='$endpoint
db_instance=`echo $endpoint | awk -F"." '{print toupper($1)}'`
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh:db_instance='$db_instance
LOG_FILE=$RESULT_FILE'.tmp_log'
TMP_FILE=$LOG_FILE'.tmp'
TMP_MIDDLE=$LOG_FILE'.tmp_mid'
TMP_MIDDLE2=$LOG_FILE'.tmp_mid2'
current_aws_log_time=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select last_aws_log_time from database where id = $database_id "`
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh:current_aws_log_time='$current_aws_log_time
if [[ $current_aws_log_time != $AWS_LOG_TIME ]];
then
is_new_log='1'
if ! psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -q -c "update database set last_aws_log_time = '$AWS_LOG_TIME' where id = $database_id "
then
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: FATAL_ERROR - update database set last_aws_log_time .'
exit 1
fi
else
is_new_log='0'
fi
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh:is_new_log='$is_new_log
let last_aws_max_item_size=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select aws_max_item_size from database where id = $database_id "`
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: last_aws_max_item_size='$last_aws_max_item_size
let count=1
if [[ $is_new_log == '1' ]];
then
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: START DOWNLOADING OF NEW AWS LOG'
if ! aws rds download-db-log-file-portion
--max-items $last_aws_max_item_size
--region REGION
--db-instance-identifier $db_instance
--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
then
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
exit 2
fi
else
next_token=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -c "select last_aws_nexttoken from database where id = $database_id "`
if [[ $next_token == '' ]];
then
next_token='0'
fi
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: CONTINUE DOWNLOADING OF AWS LOG'
if ! aws rds download-db-log-file-portion
--max-items $last_aws_max_item_size
--starting-token $next_token
--region REGION
--db-instance-identifier $db_instance
--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
then
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
exit 3
fi
line_count=`cat $LOG_FILE | wc -l`
let lines=$line_count-1
tail -$lines $LOG_FILE > $TMP_MIDDLE
mv -f $TMP_MIDDLE $LOG_FILE
fi
next_token_str=`cat $LOG_FILE | grep NEXTTOKEN`
next_token=`echo $next_token_str | awk -F" " '{ print $2}' `
grep -v NEXTTOKEN $LOG_FILE > $TMP_FILE
if [[ $next_token == '' ]];
then
cp $TMP_FILE $RESULT_FILE
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: NEXTTOKEN NOT FOUND - FINISH '
rm $LOG_FILE
rm $TMP_FILE
rm $TMP_MIDDLE
rm $TMP_MIDDLE2
exit 0
else
psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
fi
first_str=`tail -1 $TMP_FILE`
line_count=`cat $TMP_FILE | wc -l`
let lines=$line_count-1
head -$lines $TMP_FILE > $RESULT_FILE
###############################################
# MAIN CIRCLE
let count=2
while [[ $next_token != '' ]];
do
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: count='$count
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: START DOWNLOADING OF AWS LOG'
if ! aws rds download-db-log-file-portion
--max-items $last_aws_max_item_size
--starting-token $next_token
--region REGION
--db-instance-identifier $db_instance
--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
then
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
exit 4
fi
next_token_str=`cat $LOG_FILE | grep NEXTTOKEN`
next_token=`echo $next_token_str | awk -F" " '{ print $2}' `
TMP_FILE=$LOG_FILE'.tmp'
grep -v NEXTTOKEN $LOG_FILE > $TMP_FILE
last_str=`head -1 $TMP_FILE`
if [[ $next_token == '' ]];
then
concat_str=$first_str$last_str
echo $concat_str >> $RESULT_FILE
line_count=`cat $TMP_FILE | wc -l`
let lines=$line_count-1
tail -$lines $TMP_FILE >> $RESULT_FILE
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: NEXTTOKEN NOT FOUND - FINISH '
rm $LOG_FILE
rm $TMP_FILE
rm $TMP_MIDDLE
rm $TMP_MIDDLE2
exit 0
fi
if [[ $next_token != '' ]];
then
let growth_counter=$growth_counter+1
if [[ $growth_counter -gt $growth_counter_max ]];
then
let last_aws_max_item_size=$last_aws_max_item_size*$growth_factor
let growth_counter=1
fi
if [[ $last_aws_max_item_size -gt $max_item_size ]];
then
let last_aws_max_item_size=$max_item_size
fi
psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
concat_str=$first_str$last_str
echo $concat_str >> $RESULT_FILE
line_count=`cat $TMP_FILE | wc -l`
let lines=$line_count-1
#############################
#Get middle of file
head -$lines $TMP_FILE > $TMP_MIDDLE
line_count=`cat $TMP_MIDDLE | wc -l`
let lines=$line_count-1
tail -$lines $TMP_MIDDLE > $TMP_MIDDLE2
cat $TMP_MIDDLE2 >> $RESULT_FILE
first_str=`tail -1 $TMP_FILE`
fi
let count=$count+1
done
#
#################################################################
exit 0
Vipande vya hati vilivyo na maelezo kadhaa:
Vigezo vya kuingiza hati:
- Muhuri wa saa wa jina la faili la kumbukumbu katika umbizo la YYYY-MM-DD-HH24: AWS_LOG_TIME=$1
- Kitambulisho cha hifadhidata: hifadhidata_id=$2
- Jina la faili ya kumbukumbu iliyokusanywa: RESULT_FILE=$3
Pata muhuri wa muda wa faili ya kumbukumbu iliyopakiwa mwisho:
current_aws_log_time=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select last_aws_log_time from database where id = $database_id "`
Ikiwa muhuri wa muda wa faili ya kumbukumbu iliyopakiwa mwisho hailingani na kigezo cha ingizo, faili mpya ya kumbukumbu inapakiwa:
if [[ $current_aws_log_time != $AWS_LOG_TIME ]];
then
is_new_log='1'
if ! psql -h ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -c "update database set last_aws_log_time = '$AWS_LOG_TIME' where id = $database_id "
then
echo '***download_aws_piece.sh -FATAL_ERROR - update database set last_aws_log_time .'
exit 1
fi
else
is_new_log='0'
fi
Tunapata thamani ya lebo inayofuata kutoka kwa faili iliyopakiwa:
next_token_str=`cat $LOG_FILE | grep NEXTTOKEN`
next_token=`echo $next_token_str | awk -F" " '{ print $2}' `
Ishara ya mwisho wa upakuaji ni thamani tupu ya nexttoken.
Katika kitanzi, tunahesabu sehemu za faili, njiani, mistari inayojumuisha na kuongeza saizi ya sehemu:
Kitanzi kikuu
# MAIN CIRCLE
let count=2
while [[ $next_token != '' ]];
do
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: count='$count
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: START DOWNLOADING OF AWS LOG'
if ! aws rds download-db-log-file-portion
--max-items $last_aws_max_item_size
--starting-token $next_token
--region REGION
--db-instance-identifier $db_instance
--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
then
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
exit 4
fi
next_token_str=`cat $LOG_FILE | grep NEXTTOKEN`
next_token=`echo $next_token_str | awk -F" " '{ print $2}' `
TMP_FILE=$LOG_FILE'.tmp'
grep -v NEXTTOKEN $LOG_FILE > $TMP_FILE
last_str=`head -1 $TMP_FILE`
if [[ $next_token == '' ]];
then
concat_str=$first_str$last_str
echo $concat_str >> $RESULT_FILE
line_count=`cat $TMP_FILE | wc -l`
let lines=$line_count-1
tail -$lines $TMP_FILE >> $RESULT_FILE
echo $(date +%Y%m%d%H%M)': download_aws_piece.sh: NEXTTOKEN NOT FOUND - FINISH '
rm $LOG_FILE
rm $TMP_FILE
rm $TMP_MIDDLE
rm $TMP_MIDDLE2
exit 0
fi
if [[ $next_token != '' ]];
then
let growth_counter=$growth_counter+1
if [[ $growth_counter -gt $growth_counter_max ]];
then
let last_aws_max_item_size=$last_aws_max_item_size*$growth_factor
let growth_counter=1
fi
if [[ $last_aws_max_item_size -gt $max_item_size ]];
then
let last_aws_max_item_size=$max_item_size
fi
psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
concat_str=$first_str$last_str
echo $concat_str >> $RESULT_FILE
line_count=`cat $TMP_FILE | wc -l`
let lines=$line_count-1
#############################
#Get middle of file
head -$lines $TMP_FILE > $TMP_MIDDLE
line_count=`cat $TMP_MIDDLE | wc -l`
let lines=$line_count-1
tail -$lines $TMP_MIDDLE > $TMP_MIDDLE2
cat $TMP_MIDDLE2 >> $RESULT_FILE
first_str=`tail -1 $TMP_FILE`
fi
let count=$count+1
done
Nini kinafuata?
Kwa hiyo, kazi ya kwanza ya kati - "kupakua faili ya logi kutoka kwa wingu" inatatuliwa. Nini cha kufanya na logi iliyopakuliwa?
Kwanza unahitaji kuchanganua faili ya logi na kutoa maombi halisi kutoka kwake.
Kazi sio ngumu sana. Hati rahisi zaidi ya bash hufanya vizuri.
upload_log_query.sh
#!/bin/bash
#########################################################
# upload_log_query.sh
# Upload table table from dowloaded aws file
# version HABR
###########################################################
echo 'TIMESTAMP:'$(date +%c)' Upload log_query table '
source_file=$1
echo 'source_file='$source_file
database_id=$2
echo 'database_id='$database_id
beginer=' '
first_line='1'
let "line_count=0"
sql_line=' '
sql_flag=' '
space=' '
cat $source_file | while read line
do
line="$space$line"
if [[ $first_line == "1" ]]; then
beginer=`echo $line | awk -F" " '{ print $1}' `
first_line='0'
fi
current_beginer=`echo $line | awk -F" " '{ print $1}' `
if [[ $current_beginer == $beginer ]]; then
if [[ $sql_flag == '1' ]]; then
sql_flag='0'
log_date=`echo $sql_line | awk -F" " '{ print $1}' `
log_time=`echo $sql_line | awk -F" " '{ print $2}' `
duration=`echo $sql_line | awk -F" " '{ print $5}' `
#replace ' to ''
sql_modline=`echo "$sql_line" | sed 's/'''/''''''/g'`
sql_line=' '
################
#PROCESSING OF THE SQL-SELECT IS HERE
if ! psql -h ENDPOINT.rds.amazonaws.com -U USER -d DATABASE -v ON_ERROR_STOP=1 -A -t -c "select log_query('$ip_port',$database_id , '$log_date' , '$log_time' , '$duration' , '$sql_modline' )"
then
echo 'FATAL_ERROR - log_query '
exit 1
fi
################
fi #if [[ $sql_flag == '1' ]]; then
let "line_count=line_count+1"
check=`echo $line | awk -F" " '{ print $8}' `
check_sql=${check^^}
#echo 'check_sql='$check_sql
if [[ $check_sql == 'SELECT' ]]; then
sql_flag='1'
sql_line="$sql_line$line"
ip_port=`echo $sql_line | awk -F":" '{ print $4}' `
fi
else
if [[ $sql_flag == '1' ]]; then
sql_line="$sql_line$line"
fi
fi #if [[ $current_beginer == $beginer ]]; then
done
Sasa unaweza kufanya kazi na hoja iliyotolewa kutoka kwa faili ya kumbukumbu.
Na kuna uwezekano kadhaa muhimu.
Maswali yaliyochanganuliwa lazima yahifadhiwe mahali fulani. Kwa hili, meza ya huduma hutumiwa. log_query
CREATE TABLE log_query
(
id SERIAL ,
queryid bigint ,
query_md5hash text not null ,
database_id integer not null ,
timepoint timestamp without time zone not null,
duration double precision not null ,
query text not null ,
explained_plan text[],
plan_md5hash text ,
explained_plan_wo_costs text[],
plan_hash_value text ,
baseline_id integer ,
ip text ,
port text
);
ALTER TABLE log_query ADD PRIMARY KEY (id);
ALTER TABLE log_query ADD CONSTRAINT queryid_timepoint_unique_key UNIQUE (queryid, timepoint );
ALTER TABLE log_query ADD CONSTRAINT query_md5hash_timepoint_unique_key UNIQUE (query_md5hash, timepoint );
CREATE INDEX log_query_timepoint_idx ON log_query (timepoint);
CREATE INDEX log_query_queryid_idx ON log_query (queryid);
ALTER TABLE log_query ADD CONSTRAINT database_id_fk FOREIGN KEY (database_id) REFERENCES database (id) ON DELETE CASCADE ;
Ombi lililochanganuliwa linachakatwa ndani plpgsql kazi "log_query'.
log_query.sql
--log_query.sql
--verison HABR
CREATE OR REPLACE FUNCTION log_query( ip_port text ,log_database_id integer , log_date text , log_time text , duration text , sql_line text ) RETURNS boolean AS $$
DECLARE
result boolean ;
log_timepoint timestamp without time zone ;
log_duration double precision ;
pos integer ;
log_query text ;
activity_string text ;
log_md5hash text ;
log_explain_plan text[] ;
log_planhash text ;
log_plan_wo_costs text[] ;
database_rec record ;
pg_stat_query text ;
test_log_query text ;
log_query_rec record;
found_flag boolean;
pg_stat_history_rec record ;
port_start integer ;
port_end integer ;
client_ip text ;
client_port text ;
log_queryid bigint ;
log_query_text text ;
pg_stat_query_text text ;
BEGIN
result = TRUE ;
RAISE NOTICE '***log_query';
port_start = position('(' in ip_port);
port_end = position(')' in ip_port);
client_ip = substring( ip_port from 1 for port_start-1 );
client_port = substring( ip_port from port_start+1 for port_end-port_start-1 );
SELECT e.host , d.name , d.owner_pwd
INTO database_rec
FROM database d JOIN endpoint e ON e.id = d.endpoint_id
WHERE d.id = log_database_id ;
log_timepoint = to_timestamp(log_date||' '||log_time,'YYYY-MM-DD HH24-MI-SS');
log_duration = duration:: double precision;
pos = position ('SELECT' in UPPER(sql_line) );
log_query = substring( sql_line from pos for LENGTH(sql_line));
log_query = regexp_replace(log_query,' +',' ','g');
log_query = regexp_replace(log_query,';+','','g');
log_query = trim(trailing ' ' from log_query);
log_md5hash = md5( log_query::text );
--Explain execution plan--
EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')';
log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
PERFORM dblink_disconnect('LINK1');
--------------------------
BEGIN
INSERT INTO log_query
(
query_md5hash ,
database_id ,
timepoint ,
duration ,
query ,
explained_plan ,
plan_md5hash ,
explained_plan_wo_costs ,
plan_hash_value ,
ip ,
port
)
VALUES
(
log_md5hash ,
log_database_id ,
log_timepoint ,
log_duration ,
log_query ,
log_explain_plan ,
md5(log_explain_plan::text) ,
log_plan_wo_costs ,
md5(log_plan_wo_costs::text),
client_ip ,
client_port
);
activity_string = 'New query has logged '||
' database_id = '|| log_database_id ||
' query_md5hash='||log_md5hash||
' , timepoint = '||to_char(log_timepoint,'YYYYMMDD HH24:MI:SS');
RAISE NOTICE '%',activity_string;
PERFORM pg_log( log_database_id , 'log_query' , activity_string);
EXCEPTION
WHEN unique_violation THEN
RAISE NOTICE '*** unique_violation *** query already has logged';
END;
SELECT queryid
INTO log_queryid
FROM log_query
WHERE query_md5hash = log_md5hash AND
timepoint = log_timepoint;
IF log_queryid IS NOT NULL
THEN
RAISE NOTICE 'log_query with query_md5hash = % and timepoint = % has already has a QUERYID = %',log_md5hash,log_timepoint , log_queryid ;
RETURN result;
END IF;
------------------------------------------------
RAISE NOTICE 'Update queryid';
SELECT *
INTO log_query_rec
FROM log_query
WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
log_query_rec.query=regexp_replace(log_query_rec.query,';+','','g');
FOR pg_stat_history_rec IN
SELECT
queryid ,
query
FROM
pg_stat_db_queries
WHERE
database_id = log_database_id AND
queryid is not null
LOOP
pg_stat_query = pg_stat_history_rec.query ;
pg_stat_query=regexp_replace(pg_stat_query,'n+',' ','g');
pg_stat_query=regexp_replace(pg_stat_query,'t+',' ','g');
pg_stat_query=regexp_replace(pg_stat_query,' +',' ','g');
pg_stat_query=regexp_replace(pg_stat_query,'$.','%','g');
log_query_text = trim(trailing ' ' from log_query_rec.query);
pg_stat_query_text = pg_stat_query;
--SELECT log_query_rec.query like pg_stat_query INTO found_flag ;
IF (log_query_text LIKE pg_stat_query_text) THEN
found_flag = TRUE ;
ELSE
found_flag = FALSE ;
END IF;
IF found_flag THEN
UPDATE log_query SET queryid = pg_stat_history_rec.queryid WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
activity_string = ' updated queryid = '||pg_stat_history_rec.queryid||
' for log_query with id = '||log_query_rec.id
;
RAISE NOTICE '%',activity_string;
EXIT ;
END IF ;
END LOOP ;
RETURN result ;
END
$$ LANGUAGE plpgsql;
Wakati wa usindikaji, meza ya huduma hutumiwa pg_stat_db_queriesA ambayo ina picha ya maswali ya sasa kutoka kwa jedwali pg_stat_historia (Matumizi ya jedwali yamefafanuliwa hapa -
TABLE pg_stat_db_queries
(
database_id integer,
queryid bigint ,
query text ,
max_time double precision
);
TABLE pg_stat_history
(
β¦
database_id integer ,
β¦
queryid bigint ,
β¦
max_time double precision ,
β¦
);
Kazi inakuwezesha kutekeleza idadi ya vipengele muhimu kwa ajili ya usindikaji maombi kutoka kwa faili ya logi. Yaani:
Fursa #1 - Historia ya Utekelezaji wa Hoja
Muhimu sana kwa kuanzisha tukio la utendaji. Kwanza, ifahamike historia - na kushuka kulianza lini?
Kisha, kwa mujibu wa classics, tafuta sababu za nje. Inaweza tu kuwa mzigo wa hifadhidata umeongezeka sana na ombi maalum halihusiani nayo.
Ongeza ingizo jipya kwenye jedwali la log_query
port_start = position('(' in ip_port);
port_end = position(')' in ip_port);
client_ip = substring( ip_port from 1 for port_start-1 );
client_port = substring( ip_port from port_start+1 for port_end-port_start-1 );
SELECT e.host , d.name , d.owner_pwd
INTO database_rec
FROM database d JOIN endpoint e ON e.id = d.endpoint_id
WHERE d.id = log_database_id ;
log_timepoint = to_timestamp(log_date||' '||log_time,'YYYY-MM-DD HH24-MI-SS');
log_duration = to_number(duration,'99999999999999999999D9999999999');
pos = position ('SELECT' in UPPER(sql_line) );
log_query = substring( sql_line from pos for LENGTH(sql_line));
log_query = regexp_replace(log_query,' +',' ','g');
log_query = regexp_replace(log_query,';+','','g');
log_query = trim(trailing ' ' from log_query);
RAISE NOTICE 'log_query=%',log_query ;
log_md5hash = md5( log_query::text );
--Explain execution plan--
EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')';
log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
PERFORM dblink_disconnect('LINK1');
--------------------------
BEGIN
INSERT INTO log_query
(
query_md5hash ,
database_id ,
timepoint ,
duration ,
query ,
explained_plan ,
plan_md5hash ,
explained_plan_wo_costs ,
plan_hash_value ,
ip ,
port
)
VALUES
(
log_md5hash ,
log_database_id ,
log_timepoint ,
log_duration ,
log_query ,
log_explain_plan ,
md5(log_explain_plan::text) ,
log_plan_wo_costs ,
md5(log_plan_wo_costs::text),
client_ip ,
client_port
);
Kipengele #2 - Hifadhi Mipango ya Utekelezaji ya Hoja
Katika hatua hii, maoni ya pingamizi-ufafanuzi yanaweza kutokea: "Lakini tayari kuna maelezo ya kiotomatiki". Ndio, ni hivyo, lakini ni nini uhakika ikiwa mpango wa utekelezaji umehifadhiwa kwenye faili moja ya kumbukumbu na ili kuihifadhi kwa uchambuzi zaidi, lazima uchague faili ya logi?
Walakini, nilihitaji:
kwanza: kuhifadhi mpango wa utekelezaji katika meza ya huduma ya hifadhidata ya ufuatiliaji;
pili: kuweza kulinganisha mipango ya utekelezaji na kila mmoja ili kuona mara moja kuwa mpango wa utekelezaji wa hoja umebadilika.
Ombi lililo na vigezo maalum vya utekelezaji linapatikana. Kupata na kuhifadhi mpango wake wa utekelezaji kwa kutumia EXPLAIN ni kazi ya msingi.
Kwa kuongezea, kwa kutumia usemi wa EXPLAIN (Gharama za UONGO), unaweza kupata mfumo wa mpango, ambao utatumika kupata thamani ya hashi ya mpango, ambayo itasaidia katika uchambuzi unaofuata wa historia ya mabadiliko ya mpango wa utekelezaji.
Pata kiolezo cha mpango wa utekelezaji
--Explain execution plan--
EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')';
log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
PERFORM dblink_disconnect('LINK1');
Fursa #3 - Kutumia Kumbukumbu ya Hoji kwa Ufuatiliaji
Kwa kuwa vipimo vya utendakazi havijasanidiwa kwa ajili ya maandishi ya ombi, bali kwa ajili ya kitambulisho chake, unahitaji kuhusisha maombi kutoka kwa faili ya kumbukumbu na maombi ambayo vipimo vya utendakazi vimesanidiwa.
Kweli, angalau ili kuwa na wakati halisi wa kutokea kwa tukio la utendaji.
Kwa hivyo, tukio la utendakazi linapotokea kwa kitambulisho cha ombi, kutakuwa na kumbukumbu ya ombi maalum na maadili maalum ya kigezo na wakati halisi wa utekelezaji na muda wa ombi. Pata maelezo uliyopewa kwa kutumia mwonekano pekee pg_stat_statements - ni haramu.
Tafuta swali la swali na usasishe ingizo kwenye jedwali la log_query
SELECT *
INTO log_query_rec
FROM log_query
WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
log_query_rec.query=regexp_replace(log_query_rec.query,';+','','g');
FOR pg_stat_history_rec IN
SELECT
queryid ,
query
FROM
pg_stat_db_queries
WHERE
database_id = log_database_id AND
queryid is not null
LOOP
pg_stat_query = pg_stat_history_rec.query ;
pg_stat_query=regexp_replace(pg_stat_query,'n+',' ','g');
pg_stat_query=regexp_replace(pg_stat_query,'t+',' ','g');
pg_stat_query=regexp_replace(pg_stat_query,' +',' ','g');
pg_stat_query=regexp_replace(pg_stat_query,'$.','%','g');
log_query_text = trim(trailing ' ' from log_query_rec.query);
pg_stat_query_text = pg_stat_query;
--SELECT log_query_rec.query like pg_stat_query INTO found_flag ;
IF (log_query_text LIKE pg_stat_query_text) THEN
found_flag = TRUE ;
ELSE
found_flag = FALSE ;
END IF;
IF found_flag THEN
UPDATE log_query SET queryid = pg_stat_history_rec.queryid WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
activity_string = ' updated queryid = '||pg_stat_history_rec.queryid||
' for log_query with id = '||log_query_rec.id
;
RAISE NOTICE '%',activity_string;
EXIT ;
END IF ;
END LOOP ;
Baada ya
Kama matokeo, njia iliyoelezewa imepata matumizi yake ndani
Ingawa, bila shaka, kwa maoni yangu ya kibinafsi, bado itakuwa muhimu kufanya kazi kwenye algorithm ya kuchagua na kubadilisha ukubwa wa sehemu iliyopakuliwa. Tatizo bado halijatatuliwa katika kesi ya jumla. Pengine itakuwa ya kuvutia.
Lakini hiyo ni hadithi tofauti kabisa ...
Chanzo: mapenzi.com