Downloading a PostgreSQL log from the AWS cloud

Or a little applied tetrisology.
Everything new is well-forgotten old.
Epigraphs.

Problem statement

The current PostgreSQL log file has to be downloaded periodically from the AWS cloud to a local Linux host. Not in real time but, let's say, with a small delay.
The log file update is downloaded every 5 minutes.
The log file in AWS is rotated every hour.

Tools used

To download the log file to the host, a bash script is used that calls the AWS CLI command "aws rds download-db-log-file-portion".

Parameters:

  • --db-instance-identifier: the name of the AWS instance;
  • --log-file-name: the name of the currently generated log file;
  • --max-items: the total number of items returned in the command's output, that is, the size of the downloaded portion of the file;
  • --starting-token: the starting token.
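As a rough illustration of how these parameters fit together, the sketch below only composes and prints one call, without contacting AWS. The instance name, timestamp and token are made-up sample values, and `--output text` is added on the assumption that the CLI is not already configured for text output.

```shell
# Sketch: compose one `aws rds download-db-log-file-portion` call.
# The command is only printed, not executed, so nothing is sent to AWS.

# Hypothetical sample values (assumptions for illustration).
db_instance='MYINSTANCE'
aws_log_time='2019-09-30-12'
max_items=1024
next_token=''    # empty on the first call for a new log file

build_download_cmd() {
  local cmd
  cmd="aws rds download-db-log-file-portion"
  cmd="$cmd --db-instance-identifier $db_instance"
  cmd="$cmd --log-file-name error/postgresql.log.$aws_log_time"
  cmd="$cmd --max-items $max_items"
  # Assumption: text output, since the script below greps for a NEXTTOKEN line.
  cmd="$cmd --output text"
  # --starting-token is only added once a previous call returned a token.
  if [ -n "$next_token" ]; then
    cmd="$cmd --starting-token $next_token"
  fi
  printf '%s\n' "$cmd"
}

build_download_cmd      # first portion of a new log file
next_token='1024'       # hypothetical token value
build_download_cmd      # continuation from the stored token
```

Printing the command first makes it easy to check the arguments before running it against a real instance.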

In this particular case, the task of downloading logs came up while working on PostgreSQL query performance monitoring.

And, simply put, it is an interesting task for practice and for variety during working hours.
I would assume the problem was solved long ago because it is so ordinary. But a quick Google search turned up no solutions, and I had no great desire to dig deeper. In any case, it is good exercise.

Formalizing the task

The final log file consists of many lines of variable length. Graphically, the log file can be pictured roughly like this:
[Figure: the log file drawn as rows of varying length]

Does this remind you of anything? What does Tetris have to do with it? Here is what.
If we draw the possible cases that arise when downloading the next file (for simplicity, let all lines have the same length here), we get the standard Tetris pieces:

1) The file is downloaded in full and is final. The portion size is larger than the size of the final file.

2) The file has a continuation. The portion size is smaller than the size of the final file.

3) The file is a continuation of the previous file and has a continuation of its own. The portion size is smaller than the size of the remainder of the final file.

4) The file is a continuation of the previous file and is final. The portion size is larger than the size of the remainder of the final file.

The task is to assemble a rectangle, or to play Tetris at a new level.

Problems that arise while solving the task

1) Gluing a line together from 2 pieces

In general, there were no particular problems here. It is a standard exercise from an introductory programming course.
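A minimal sketch of that gluing step, assuming each portion is already held in a shell variable: the last line of the first portion and the first line of the second are joined into one line, and everything else is passed through unchanged. The function name `glue_portions` is hypothetical; the full script further down does the same thing with `head`, `tail` and temporary files.

```shell
# Sketch: join two consecutive portions whose boundary falls inside a line.

nl='
'   # a literal newline, used in the patterns below

glue_portions() {
  local first="$1" second="$2"
  # Complete lines of the first portion (everything except its last line).
  case $first in
    *"$nl"*) printf '%s\n' "${first%"$nl"*}" ;;
  esac
  # Last line of the first portion + first line of the second portion.
  printf '%s%s\n' "${first##*"$nl"}" "${second%%"$nl"*}"
  # Remaining lines of the second portion, if there are any.
  case $second in
    *"$nl"*) printf '%s\n' "${second#*"$nl"}" ;;
  esac
}

glue_portions "$(printf 'line one\nline t')" "$(printf 'wo\nline three')"
```

The sample call prints three complete lines: `line one`, `line two` and `line three`.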

Optimal portion size

This one is a little more interesting.
Unfortunately, there is no way to apply an offset after the token that marks the start of a portion:

As you already know, the option --starting-token is used to specify where to start paginating. This option takes String values, which would mean that if you try to add an offset value in front of the Next Token string, the option will not be taken into consideration as an offset.

Therefore, the file has to be read in portions.
If you read in large portions, the number of reads is minimal, but the volume is maximal.
If you read in small portions, then, conversely, the number of reads is maximal, but the volume is minimal.
So, to reduce traffic and for the overall elegance of the solution, I had to come up with an approach that, unfortunately, looks a bit like a crutch.

To illustrate, consider the process of downloading a log file in 2 greatly simplified variants. In both cases the number of reads depends on the portion size.

1) Loading in small portions:
[Figure: many reads, each returning a small portion]

2) Loading in large portions:
[Figure: few reads, each returning a large portion]
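The dependence of the read count on the portion size can be put into numbers with a small calculation. This is only a sketch: it assumes the file size and the portion size are expressed in the same abstract units, and the figures are invented for illustration.

```shell
# Sketch: number of reads needed to fetch a file of a given size
# when every read returns at most one portion (ceiling division).

reads_needed() {
  local file_size="$1" portion_size="$2"
  echo $(( (file_size + portion_size - 1) / portion_size ))
}

# Hypothetical sizes, in the same abstract units.
echo "small portions: $(reads_needed 10000 1024) reads"
echo "large portions: $(reads_needed 10000 8192) reads"
```

With these sample values the small portion needs 10 reads and the large one needs 2.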

As usual, the optimal solution lies in the middle.
The portion size starts out minimal, but during reading it can be increased to reduce the number of reads.

It should be noted that the problem of choosing the optimal size of the portion being read is not yet fully solved and requires deeper study and analysis. Perhaps a little later.
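The growth rule used by the script further down can be isolated into a few lines. The sketch below reuses the script's constants (`growth_factor`, `growth_counter_max`, `max_item_size`); the function name `grow_portion_size` and the starting size of 1024 are assumptions made for the example.

```shell
# Sketch: grow the portion size in steps, capped at a maximum.

max_item_size=1048576
growth_factor=3
growth_counter_max=3

portion_size=1024     # hypothetical starting size
growth_counter=1

grow_portion_size() {
  growth_counter=$((growth_counter + 1))
  # Every time the counter passes the limit, multiply the size and restart.
  if [ "$growth_counter" -gt "$growth_counter_max" ]; then
    portion_size=$((portion_size * growth_factor))
    growth_counter=1
  fi
  # Never exceed the configured maximum.
  if [ "$portion_size" -gt "$max_item_size" ]; then
    portion_size=$max_item_size
  fi
}

read_no=1
while [ "$read_no" -le 9 ]; do
  grow_portion_size
  echo "after read $read_no: portion_size=$portion_size"
  read_no=$((read_no + 1))
done
```

With these constants the size triples on every third read (1024, 3072, 9216, 27648) until it reaches the cap.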

General description of the implementation

Service tables used

CREATE TABLE endpoint
(
id SERIAL ,
host text 
);

TABLE database
(
id SERIAL , 
…
last_aws_log_time text ,
last_aws_nexttoken text ,
aws_max_item_size integer 
);
last_aws_log_time: the timestamp of the last downloaded log file, in YYYY-MM-DD-HH24 format.
last_aws_nexttoken: the text token of the last downloaded portion.
aws_max_item_size: the initial portion size, chosen empirically.

Full text of the script

download_aws_piece.sh

#!/bin/bash
#########################################################
# download_aws_piece.sh
# download piece of log from AWS
# version HABR
 let min_item_size=1024
 let max_item_size=1048576
 let growth_factor=3
 let growth_counter=1
 let growth_counter_max=3

 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:''STARTED'
 
 AWS_LOG_TIME=$1
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:AWS_LOG_TIME='$AWS_LOG_TIME
  
 database_id=$2
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:database_id='$database_id
 RESULT_FILE=$3 
  
 endpoint=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select e.host from endpoint e join database d on e.id = d.endpoint_id where d.id = $database_id "`
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:endpoint='$endpoint
  
 db_instance=`echo $endpoint | awk -F"." '{print toupper($1)}'`
 
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:db_instance='$db_instance

 LOG_FILE=$RESULT_FILE'.tmp_log'
 TMP_FILE=$LOG_FILE'.tmp'
 TMP_MIDDLE=$LOG_FILE'.tmp_mid'  
 TMP_MIDDLE2=$LOG_FILE'.tmp_mid2'  
  
 current_aws_log_time=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select last_aws_log_time from database where id = $database_id "`

 echo $(date +%Y%m%d%H%M)':      download_aws_piece.sh:current_aws_log_time='$current_aws_log_time
  
  if [[ $current_aws_log_time != $AWS_LOG_TIME  ]];
  then
    is_new_log='1'
	if ! psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -q -c "update database set last_aws_log_time = '$AWS_LOG_TIME' where id = $database_id "
	then
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - update database set last_aws_log_time .'
	  exit 1
	fi
  else
    is_new_log='0'
  fi
  
  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:is_new_log='$is_new_log
  
  let last_aws_max_item_size=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select aws_max_item_size from database where id = $database_id "`
  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: last_aws_max_item_size='$last_aws_max_item_size
  
  let count=1
  if [[ $is_new_log == '1' ]];
  then    
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: START DOWNLOADING OF NEW AWS LOG'
	if ! aws rds download-db-log-file-portion \
		--max-items $last_aws_max_item_size \
		--region REGION \
		--db-instance-identifier  $db_instance \
		--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 2
	fi  	
  else
    next_token=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -c "select last_aws_nexttoken from database where id = $database_id "`
	
	if [[ $next_token == '' ]];
	then
	  next_token='0'	  
	fi
	
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: CONTINUE DOWNLOADING OF AWS LOG'
	if ! aws rds download-db-log-file-portion \
		--max-items $last_aws_max_item_size \
		--starting-token $next_token \
		--region REGION \
		--db-instance-identifier  $db_instance \
		--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 3
	fi       
	
	line_count=`cat  $LOG_FILE | wc -l`
	let lines=$line_count-1
	  
	tail -$lines $LOG_FILE > $TMP_MIDDLE 
	mv -f $TMP_MIDDLE $LOG_FILE
  fi
  
  next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
  next_token=`echo $next_token_str | awk -F" " '{ print $2}' `
  
  grep -v NEXTTOKEN $LOG_FILE  > $TMP_FILE 
  
  if [[ $next_token == '' ]];
  then
	  cp $TMP_FILE $RESULT_FILE
	  
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:  NEXTTOKEN NOT FOUND - FINISH '
	  # -f: the intermediate files may not exist on this path
	  rm -f $LOG_FILE $TMP_FILE $TMP_MIDDLE $TMP_MIDDLE2
	  exit 0  
  else
	psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
  fi
  
  first_str=`tail -1 $TMP_FILE`
  
  line_count=`cat  $TMP_FILE | wc -l`
  let lines=$line_count-1    
  
  head -$lines $TMP_FILE  > $RESULT_FILE

###############################################
# MAIN LOOP
  let count=2
  while [[ $next_token != '' ]];
  do 
    echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: count='$count
	
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: START DOWNLOADING OF AWS LOG'
	if ! aws rds download-db-log-file-portion \
             --max-items $last_aws_max_item_size \
             --starting-token $next_token \
             --region REGION \
             --db-instance-identifier  $db_instance \
             --log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 4
	fi

	next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
	next_token=`echo $next_token_str | awk -F" " '{ print $2}' `

	TMP_FILE=$LOG_FILE'.tmp'
	grep -v NEXTTOKEN $LOG_FILE  > $TMP_FILE  
	
	last_str=`head -1 $TMP_FILE`
  
    if [[ $next_token == '' ]];
	then
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  tail -$lines $TMP_FILE >> $RESULT_FILE
	  
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:  NEXTTOKEN NOT FOUND - FINISH '
	  # -f: the intermediate files may not exist on this path
	  rm -f $LOG_FILE $TMP_FILE $TMP_MIDDLE $TMP_MIDDLE2
	  exit 0  
	fi
	
    if [[ $next_token != '' ]];
	then
		let growth_counter=$growth_counter+1
		if [[ $growth_counter -gt $growth_counter_max ]];
		then
			let last_aws_max_item_size=$last_aws_max_item_size*$growth_factor
			let growth_counter=1
		fi
	
		if [[ $last_aws_max_item_size -gt $max_item_size ]]; 
		then
			let last_aws_max_item_size=$max_item_size
		fi 

	  psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
	  
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  #############################
	  #Get middle of file
	  head -$lines $TMP_FILE > $TMP_MIDDLE
	  
	  line_count=`cat  $TMP_MIDDLE | wc -l`
	  let lines=$line_count-1
	  tail -$lines $TMP_MIDDLE > $TMP_MIDDLE2
	  
	  cat $TMP_MIDDLE2 >> $RESULT_FILE	  
	  
	  first_str=`tail -1 $TMP_FILE`	  
	fi
	  
    let count=$count+1

  done
#
#################################################################

exit 0  

Script fragments with some explanations:

Script input parameters:

  • Timestamp from the log file name, in YYYY-MM-DD-HH24 format: AWS_LOG_TIME=$1
  • Database ID: database_id=$2
  • Name of the assembled log file: RESULT_FILE=$3
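A possible way to produce the first parameter is sketched below. It assumes the suffix of the log file name is the current hour in UTC; the database ID and the result path are hypothetical values, and the download script itself is only echoed, not run.

```shell
# Sketch: derive the three input parameters for download_aws_piece.sh.

# Assumption: the log file suffix is the UTC hour in YYYY-MM-DD-HH24 form.
aws_log_time=$(date -u +%Y-%m-%d-%H)
database_id=1                                     # hypothetical id in the `database` table
result_file="/tmp/postgresql.log.$aws_log_time"   # hypothetical path

echo "log file on AWS: error/postgresql.log.$aws_log_time"
echo "would run: ./download_aws_piece.sh $aws_log_time $database_id $result_file"
```

If the instance writes its log names in another time zone, the `-u` flag would need to be adjusted accordingly.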

Get the timestamp of the last downloaded log file:

current_aws_log_time=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select last_aws_log_time from database where id = $database_id "`

If the timestamp of the last downloaded log file does not match the input parameter, a new log file is being downloaded:

if [[ $current_aws_log_time != $AWS_LOG_TIME  ]];
  then
    is_new_log='1'
	if ! psql -h ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -c "update database set last_aws_log_time = '$AWS_LOG_TIME' where id = $database_id "
	then
	  echo '***download_aws_piece.sh -FATAL_ERROR - update database set last_aws_log_time .'
	  exit 1
	fi
  else
    is_new_log='0'
  fi

Get the value of the nexttoken label from the downloaded file:

  next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
  next_token=`echo $next_token_str | awk -F" " '{ print $2}' `

An empty nexttoken value serves as the sign that the download is finished.
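The token handling can be tried out on a fabricated portion without calling AWS. In this sketch the helper names and the sample text are assumptions; unlike the `grep NEXTTOKEN` in the script, the `awk` filter only treats a line as the token line when NEXTTOKEN is its first field.

```shell
# Sketch: split a downloaded portion into its NEXTTOKEN value and its log lines.

extract_next_token() {
  # Print the value from a "NEXTTOKEN <value>" line, or nothing if absent.
  awk '$1 == "NEXTTOKEN" { print $2; exit }'
}

strip_token_line() {
  # Pass through every line except the NEXTTOKEN line.
  awk '$1 != "NEXTTOKEN"'
}

# Hypothetical portion, shaped like the text output the script expects.
portion=$(printf '%s\n' 'first complete line' 'second li' 'NEXTTOKEN 2048')

next_token=$(printf '%s\n' "$portion" | extract_next_token)
if [ -z "$next_token" ]; then
  echo 'NEXTTOKEN not found: download finished'
else
  echo "next_token=$next_token"
fi
printf '%s\n' "$portion" | strip_token_line
```

For the sample portion this reports `next_token=2048` and then prints the two log lines without the token line.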

In a loop, we read portions of the file, gluing lines together along the way and increasing the portion size:
Main loop

# MAIN LOOP
  let count=2
  while [[ $next_token != '' ]];
  do 
    echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: count='$count
	
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: START DOWNLOADING OF AWS LOG'
	if ! aws rds download-db-log-file-portion \
     --max-items $last_aws_max_item_size \
	 --starting-token $next_token \
     --region REGION \
     --db-instance-identifier  $db_instance \
     --log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 4
	fi

	next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
	next_token=`echo $next_token_str | awk -F" " '{ print $2}' `

	TMP_FILE=$LOG_FILE'.tmp'
	grep -v NEXTTOKEN $LOG_FILE  > $TMP_FILE  
	
	last_str=`head -1 $TMP_FILE`
  
    if [[ $next_token == '' ]];
	then
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  tail -$lines $TMP_FILE >> $RESULT_FILE
	  
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:  NEXTTOKEN NOT FOUND - FINISH '
	  # -f: the intermediate files may not exist on this path
	  rm -f $LOG_FILE $TMP_FILE $TMP_MIDDLE $TMP_MIDDLE2
	  exit 0  
	fi
	
    if [[ $next_token != '' ]];
	then
		let growth_counter=$growth_counter+1
		if [[ $growth_counter -gt $growth_counter_max ]];
		then
			let last_aws_max_item_size=$last_aws_max_item_size*$growth_factor
			let growth_counter=1
		fi
	
		if [[ $last_aws_max_item_size -gt $max_item_size ]]; 
		then
			let last_aws_max_item_size=$max_item_size
		fi 

	  psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
	  
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  #############################
	  #Get middle of file
	  head -$lines $TMP_FILE > $TMP_MIDDLE
	  
	  line_count=`cat  $TMP_MIDDLE | wc -l`
	  let lines=$line_count-1
	  tail -$lines $TMP_MIDDLE > $TMP_MIDDLE2
	  
	  cat $TMP_MIDDLE2 >> $RESULT_FILE	  
	  
	  first_str=`tail -1 $TMP_FILE`	  
	fi
	  
    let count=$count+1

  done

What's next?

So, the first intermediate task, "download the log file from the cloud", is solved. What should be done with the downloaded log?
First, the log file has to be parsed and the actual queries extracted from it.
The task is not very hard. The simplest bash script handles it quite well.
upload_log_query.sh

#!/bin/bash
#########################################################
# upload_log_query.sh
# Upload log_query table from downloaded aws file
# version HABR
###########################################################  
echo 'TIMESTAMP:'$(date +%c)' Upload log_query table '
source_file=$1
echo 'source_file='$source_file
database_id=$2
echo 'database_id='$database_id

beginer=' '
first_line='1'
let "line_count=0"
sql_line=' '
sql_flag=' '    
space=' '
cat $source_file | while read line
do
  line="$space$line"

  if [[ $first_line == "1" ]]; then
    beginer=`echo $line | awk -F" " '{ print $1}' `
    first_line='0'
  fi

  current_beginer=`echo $line | awk -F" " '{ print $1}' `

  if [[ $current_beginer == $beginer ]]; then
    if [[ $sql_flag == '1' ]]; then
     sql_flag='0' 
     log_date=`echo $sql_line | awk -F" " '{ print $1}' `
     log_time=`echo $sql_line | awk -F" " '{ print $2}' `
     duration=`echo $sql_line | awk -F" " '{ print $5}' `

     #replace ' to ''
     sql_modline=`echo "$sql_line" | sed "s/'/''/g"`
     sql_line=' '

	 ################
	 #PROCESSING OF THE SQL-SELECT IS HERE
     if ! psql -h ENDPOINT.rds.amazonaws.com -U USER -d DATABASE -v ON_ERROR_STOP=1 -A -t -c "select log_query('$ip_port',$database_id , '$log_date' , '$log_time' , '$duration' , '$sql_modline' )" 
     then
        echo 'FATAL_ERROR - log_query '
        exit 1
     fi
	 ################

    fi #if [[ $sql_flag == '1' ]]; then

    let "line_count=line_count+1"

    check=`echo $line | awk -F" " '{ print $8}' `
    check_sql=${check^^}    

    #echo 'check_sql='$check_sql
    
    if [[ $check_sql == 'SELECT' ]]; then
     sql_flag='1'    
     sql_line="$sql_line$line"
	 ip_port=`echo $sql_line | awk -F":" '{ print $4}' `
    fi
  else       

    if [[ $sql_flag == '1' ]]; then
      sql_line="$sql_line$line"
    fi   
    
  fi #if [[ $current_beginer == $beginer ]]; then

done
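The field positions the script relies on ($1, $2, $5, $8, and the fourth colon-separated field) can be checked against a single sample line. The line below is invented, and its layout (timestamp, client address with port, user and database, process ID, then the duration and statement) is an assumption about the log line prefix in use.

```shell
# Sketch: extract the fields that upload_log_query.sh passes to log_query().

# Hypothetical log line; the prefix layout is an assumption.
line='2019-09-30 12:00:01 UTC:10.0.0.1(5432):app@appdb:[123]:LOG:  duration: 12.345 ms  statement: SELECT 1'

# Print the n-th whitespace-separated field of the sample line.
field() { printf '%s\n' "$line" | awk -v n="$1" '{ print $n }'; }

log_date=$(field 1)
log_time=$(field 2)
duration=$(field 5)
keyword=$(field 8 | tr '[:lower:]' '[:upper:]')
ip_port=$(printf '%s\n' "$line" | awk -F':' '{ print $4 }')

# Split "ip(port)" the same way the SQL function does with position().
client_ip=${ip_port%%\(*}
client_port=${ip_port#*\(}
client_port=${client_port%\)*}

echo "date=$log_date time=$log_time duration=$duration keyword=$keyword"
echo "ip=$client_ip port=$client_port"
```

If the real prefix differs from this sample, the field numbers in both the sketch and the script would need to be adjusted.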

Now you can work with the query extracted from the log file.

And several useful opportunities open up.

The parsed queries need to be stored somewhere. The service table log_query is used for this.

CREATE TABLE log_query
(
   id SERIAL ,
   queryid bigint ,
   query_md5hash text not null ,
   database_id integer not null ,  
   timepoint timestamp without time zone not null,
   duration double precision not null ,
   query text not null ,
   explained_plan text[],
   plan_md5hash text  , 
   explained_plan_wo_costs text[],
   plan_hash_value text  ,
   baseline_id integer ,
   ip text ,
   port text 
);
ALTER TABLE log_query ADD PRIMARY KEY (id);
ALTER TABLE log_query ADD CONSTRAINT queryid_timepoint_unique_key UNIQUE (queryid, timepoint );
ALTER TABLE log_query ADD CONSTRAINT query_md5hash_timepoint_unique_key UNIQUE (query_md5hash, timepoint );

CREATE INDEX log_query_timepoint_idx ON log_query (timepoint);
CREATE INDEX log_query_queryid_idx ON log_query (queryid);
ALTER TABLE log_query ADD CONSTRAINT database_id_fk FOREIGN KEY (database_id) REFERENCES database (id) ON DELETE CASCADE ;

The parsed query is processed in the plpgsql function "log_query".
log_query.sql

--log_query.sql
--version HABR
CREATE OR REPLACE FUNCTION log_query( ip_port text ,log_database_id integer , log_date text , log_time text , duration text , sql_line text   ) RETURNS boolean AS $$
DECLARE
  result boolean ;
  log_timepoint timestamp without time zone ;
  log_duration double precision ; 
  pos integer ;
  log_query text ;
  activity_string text ;
  log_md5hash text ;
  log_explain_plan text[] ;
  
  log_planhash text ;
  log_plan_wo_costs text[] ; 
  
  database_rec record ;
  
  pg_stat_query text ; 
  test_log_query text ;
  log_query_rec record;
  found_flag boolean;
  
  pg_stat_history_rec record ;
  port_start integer ;
  port_end integer ;
  client_ip text ;
  client_port text ;
  log_queryid bigint ;
  log_query_text text ;
  pg_stat_query_text text ; 
BEGIN
  result = TRUE ;

  RAISE NOTICE '***log_query';
  
  port_start = position('(' in ip_port);
  port_end = position(')' in ip_port);
  client_ip = substring( ip_port from 1 for port_start-1 );
  client_port = substring( ip_port from port_start+1 for port_end-port_start-1 );

  SELECT e.host , d.name , d.owner_pwd 
  INTO database_rec
  FROM database d JOIN endpoint e ON e.id = d.endpoint_id
  WHERE d.id = log_database_id ;
  
  log_timepoint = to_timestamp(log_date||' '||log_time,'YYYY-MM-DD HH24-MI-SS');
  log_duration = duration:: double precision; 

  
  pos = position ('SELECT' in UPPER(sql_line) );
  log_query = substring( sql_line from pos for LENGTH(sql_line));
  log_query = regexp_replace(log_query,' +',' ','g');
  log_query = regexp_replace(log_query,';+','','g');
  log_query = trim(trailing ' ' from log_query);
 

  log_md5hash = md5( log_query::text );
  
  --Explain execution plan--
  EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')'; 
  
  log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
  log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
    
  PERFORM dblink_disconnect('LINK1');
  --------------------------
  BEGIN
	INSERT INTO log_query
	(
		query_md5hash ,
		database_id , 
		timepoint ,
		duration ,
		query ,
		explained_plan ,
		plan_md5hash , 
		explained_plan_wo_costs , 
		plan_hash_value , 
		ip , 
		port
	) 
	VALUES 
	(
		log_md5hash ,
		log_database_id , 
		log_timepoint , 
		log_duration , 
		log_query ,
		log_explain_plan , 
		md5(log_explain_plan::text) ,
		log_plan_wo_costs , 
		md5(log_plan_wo_costs::text),
		client_ip , 
		client_port		
	);
	activity_string = 	'New query has logged '||
						' database_id = '|| log_database_id ||
						' query_md5hash='||log_md5hash||
						' , timepoint = '||to_char(log_timepoint,'YYYYMMDD HH24:MI:SS');
					
	RAISE NOTICE '%',activity_string;					
					 
	PERFORM pg_log( log_database_id , 'log_query' , activity_string);  

	EXCEPTION
	  WHEN unique_violation THEN
		RAISE NOTICE '*** unique_violation *** query already has logged';
	END;

	SELECT 	queryid
	INTO   	log_queryid
	FROM 	log_query 
	WHERE 	query_md5hash = log_md5hash AND
			timepoint = log_timepoint;

	IF log_queryid IS NOT NULL 
	THEN 
	  RAISE NOTICE 'log_query with query_md5hash = % and timepoint = % has already has a QUERYID = %',log_md5hash,log_timepoint , log_queryid ;
	  RETURN result;
	END IF;
	
	------------------------------------------------
	RAISE NOTICE 'Update queryid';	
	
	SELECT * 
	INTO log_query_rec
	FROM log_query
	WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ; 
	
	log_query_rec.query=regexp_replace(log_query_rec.query,';+','','g');
	
	FOR pg_stat_history_rec IN
	 SELECT 
         queryid ,
	  query 
	 FROM 
         pg_stat_db_queries 
     WHERE  
      database_id = log_database_id AND
       queryid is not null 
	LOOP
	  pg_stat_query = pg_stat_history_rec.query ; 
	  -- collapse newlines, tabs and runs of spaces; turn $n placeholders into LIKE wildcards
	  pg_stat_query=regexp_replace(pg_stat_query,'\n+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\t+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,' +',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\$.','%','g');
	
	  log_query_text = trim(trailing ' ' from log_query_rec.query);
	  pg_stat_query_text = pg_stat_query; 
	
	  
	  --SELECT log_query_rec.query like pg_stat_query INTO found_flag ; 
	  IF (log_query_text LIKE pg_stat_query_text) THEN
		found_flag = TRUE ;
	  ELSE
		found_flag = FALSE ;
	  END IF;	  
	  
	  
	  IF found_flag THEN
	    
		UPDATE log_query SET queryid = pg_stat_history_rec.queryid WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
		activity_string = 	' updated queryid = '||pg_stat_history_rec.queryid||
		                    ' for log_query with id = '||log_query_rec.id               
		   				    ;						
	    RAISE NOTICE '%',activity_string;	
		EXIT ;
	  END IF ;
	  
	END LOOP ;
	
  RETURN result ;
END
$$ LANGUAGE plpgsql;

During processing, the service table pg_stat_db_queries is used; it contains a snapshot of the current queries from the pg_stat_history table (the use of the table is described here: Monitoring PostgreSQL query performance. Part 1 - reporting)

TABLE pg_stat_db_queries
(
   database_id integer,  
   queryid bigint ,  
   query text , 
   max_time double precision 
);

TABLE pg_stat_history 
(
…
database_id integer ,
…
queryid bigint ,
…
max_time double precision	 , 	
…
);

The function makes it possible to implement a number of useful capabilities for processing queries from the log file. Namely:

Opportunity #1 - Query execution history

Very useful when starting to resolve a performance incident. First, get acquainted with the history: when did the slowdown begin?
Then, following the classics, look for external causes. Perhaps the database load simply rose sharply and the specific query has nothing to do with it.
Add a new entry to the log_query table

  port_start = position('(' in ip_port);
  port_end = position(')' in ip_port);
  client_ip = substring( ip_port from 1 for port_start-1 );
  client_port = substring( ip_port from port_start+1 for port_end-port_start-1 );

  SELECT e.host , d.name , d.owner_pwd 
  INTO database_rec
  FROM database d JOIN endpoint e ON e.id = d.endpoint_id
  WHERE d.id = log_database_id ;
  
  log_timepoint = to_timestamp(log_date||' '||log_time,'YYYY-MM-DD HH24-MI-SS');
  log_duration = to_number(duration,'99999999999999999999D9999999999'); 

  
  pos = position ('SELECT' in UPPER(sql_line) );
  log_query = substring( sql_line from pos for LENGTH(sql_line));
  log_query = regexp_replace(log_query,' +',' ','g');
  log_query = regexp_replace(log_query,';+','','g');
  log_query = trim(trailing ' ' from log_query);
 
  RAISE NOTICE 'log_query=%',log_query ;   

  log_md5hash = md5( log_query::text );
  
  --Explain execution plan--
  EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')'; 
  
  log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
  log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
    
  PERFORM dblink_disconnect('LINK1');
  --------------------------
  BEGIN
	INSERT INTO log_query
	(
		query_md5hash ,
		database_id , 
		timepoint ,
		duration ,
		query ,
		explained_plan ,
		plan_md5hash , 
		explained_plan_wo_costs , 
		plan_hash_value , 
		ip , 
		port
	) 
	VALUES 
	(
		log_md5hash ,
		log_database_id , 
		log_timepoint , 
		log_duration , 
		log_query ,
		log_explain_plan , 
		md5(log_explain_plan::text) ,
		log_plan_wo_costs , 
		md5(log_plan_wo_costs::text),
		client_ip , 
		client_port		
	);

Opportunity #2 - Saving query execution plans

At this point an objection, clarification or comment may arise: "But there is already auto_explain." Yes, there is, but what is the point if the execution plan is stored in the same log file and, to save it for further analysis, you still have to parse the log file?

What I needed was:
first: to store the execution plan in a service table of the monitoring database;
second: to be able to compare execution plans with each other, so that a change in a query's execution plan is visible immediately.

There is a query with specific execution parameters. Getting and saving its execution plan with EXPLAIN is an elementary task.
Moreover, with the expression EXPLAIN (COSTS FALSE) you can get the skeleton of the plan, which is used to compute the plan's hash value. That hash helps in the later analysis of how the execution plan changed over time.
Get the execution plan template

  --Explain execution plan--
  EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')'; 
  
  log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
  log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
    
  PERFORM dblink_disconnect('LINK1');
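The idea of comparing plans by hash can be tried outside the database. The sketch below hashes two invented plan skeletons with `md5sum` and reports whether they differ; it is not equivalent to the SQL function above, which hashes the text form of an array, so the hash values themselves will not match those stored in log_query.

```shell
# Sketch: detect a plan change by comparing hashes of cost-free plan text.

plan_hash() {
  # Read plan text on stdin, print its MD5 hex digest.
  md5sum | awk '{ print $1 }'
}

# Hypothetical plan skeletons, shaped like EXPLAIN (COSTS FALSE) output.
old_plan=$(printf '%s\n' 'Seq Scan on orders' '  Filter: (status = 1)')
new_plan=$(printf '%s\n' 'Index Scan using orders_status_idx on orders' '  Index Cond: (status = 1)')

old_hash=$(printf '%s\n' "$old_plan" | plan_hash)
new_hash=$(printf '%s\n' "$new_plan" | plan_hash)

if [ "$old_hash" != "$new_hash" ]; then
  echo 'execution plan changed'
else
  echo 'execution plan unchanged'
fi
```

Because costs are excluded from the hashed text, a change in row estimates alone would not register as a plan change; only a change in the plan's shape would.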

Opportunity #3 - Using the query log for monitoring

Since performance metrics are configured not on the query text but on its ID, the queries from the log file have to be linked to the queries for which performance metrics are configured.
Well, at least in order to know the exact time of a performance incident.

This way, when a performance incident occurs for a query ID, there is a link to a specific query with specific parameter values, the exact execution time and the query duration. This information cannot be obtained from the pg_stat_statements view alone.
Find the queryid of the query and update the entry in the log_query table

SELECT * 
	INTO log_query_rec
	FROM log_query
	WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ; 
	
	log_query_rec.query=regexp_replace(log_query_rec.query,';+','','g');
	
	FOR pg_stat_history_rec IN
	 SELECT 
      queryid ,
	  query 
	 FROM 
       pg_stat_db_queries 
     WHERE  
	   database_id = log_database_id AND
       queryid is not null 
	LOOP
	  pg_stat_query = pg_stat_history_rec.query ; 
	  -- collapse newlines, tabs and runs of spaces; turn $n placeholders into LIKE wildcards
	  pg_stat_query=regexp_replace(pg_stat_query,'\n+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\t+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,' +',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\$.','%','g');
	
	  log_query_text = trim(trailing ' ' from log_query_rec.query);
	  pg_stat_query_text = pg_stat_query; 
	  
	  --SELECT log_query_rec.query like pg_stat_query INTO found_flag ; 
	  IF (log_query_text LIKE pg_stat_query_text) THEN
		found_flag = TRUE ;
	  ELSE
		found_flag = FALSE ;
	  END IF;	  
	  
	  
	  IF found_flag THEN
	    
		UPDATE log_query SET queryid = pg_stat_history_rec.queryid WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
		activity_string = 	' updated queryid = '||pg_stat_history_rec.queryid||
		                    ' for log_query with id = '||log_query_rec.id		                    
		   				    ;						
					
	    RAISE NOTICE '%',activity_string;	
		EXIT ;
	  END IF ;
	  
	END LOOP ;

Afterword

The technique described here eventually found its place in the PostgreSQL query performance monitoring system under development, where it provides more information for analysis when resolving query performance incidents as they arise.

Although, of course, in my personal opinion, the algorithm for choosing and changing the size of the downloaded portion still needs more work. The problem is not yet solved in the general case. It will probably be interesting.

But that is a completely different story...

Source: www.habr.com
