Downloading the PostgreSQL log from the AWS cloud

Or a bit of applied tetrisology.
Everything new is well-forgotten old.
(Epigraphs)

Statement of the problem

You need to periodically download the current PostgreSQL log file from the AWS cloud to a local Linux host. Not in real time, but, let us say, with a small delay.
The update period for the downloaded log file is 5 minutes.
The log file in AWS is rotated every hour.

Tools used

To download the log file to the host, a bash script is used that calls the AWS API command "aws rds download-db-log-file-portion".

Parameters (a minimal example invocation follows the list):

  • --db-instance-identifier: the AWS instance name;
  • --log-file-name: the name of the currently generated log file;
  • --max-items: the total number of items returned in the command's output, i.e. the size of the downloaded portion;
  • --starting-token: the token to start from.
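
For reference, a minimal invocation might look roughly like this (the region, instance name, item count and output file are placeholder values; the log file name format error/postgresql.log.YYYY-MM-DD-HH24 is the same one used by the script below):

aws rds download-db-log-file-portion \
    --region eu-west-1 \
    --db-instance-identifier MY-INSTANCE \
    --log-file-name error/postgresql.log.2019-01-15-10 \
    --max-items 1000 > portion.log
# on subsequent calls the NEXTTOKEN value from the previous output is passed
# back via --starting-token, exactly as the script below does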

In this particular case, the task of downloading the logs came up in the course of work on monitoring PostgreSQL query performance.

Besides, it was simply an interesting task, good for training and for a bit of variety during working hours.
I assumed the problem must already have been solved long ago as part of everyday routine, but a quick Google search turned up no solution, and I had no particular desire to dig deeper. In any case, it is a useful exercise.

Formalization of the task

The final log file consists of many lines of varying length. Graphically, the log file can be represented roughly as a stack of lines.

Does it remind you of anything? What does Tetris have to do with it? Here is what.
If we represent graphically the possible cases that arise when downloading the next portion of the file (for simplicity, let the lines here be of equal length), we get the standard Tetris pieces:

1) The file is downloaded in its entirety and is final. The portion size is larger than the size of the final file:

2) The file has a continuation. The portion size is smaller than the size of the final file:

3) The file is a continuation of the previous file and has a continuation. The portion size is smaller than the size of the remainder of the final file:

4) The file is a continuation of the previous file and is final. The portion size is larger than the size of the remainder of the final file:

The task is to assemble the rectangle, in other words, to play Tetris at a new level.

Problems that come up along the way

1) Gluing a string from 2 pieces

In general, nothing special here. A standard exercise from an introductory programming course.
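
A minimal sketch of this gluing step (the file names are hypothetical; the full script below does the same thing with its first_str and last_str variables):

# prev.part and next.part are two consecutive portions of the same log file
boundary_tail=$(tail -1 prev.part)   # possibly truncated last line of the previous portion
boundary_head=$(head -1 next.part)   # its continuation at the start of the next portion
echo "${boundary_tail}${boundary_head}" >> result.log
tail -n +2 next.part >> result.log   # the rest of the new portion is appended unchanged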

2) Optimal portion size

But this is a bit more interesting.
Unfortunately, there is no way to apply an offset after the starting-token label:

As you already know, the option --starting-token is used to specify where to start paginating. This option takes String values, which means that if you try to add an offset value in front of the Next Token string, the option will not be taken into consideration as an offset.

So the file has to be read in portions.
If you read in large portions, the number of reads is minimal, but the volume is maximal.
If you read in small portions, then, on the contrary, the number of reads is maximal, but the volume is minimal.
Therefore, to reduce traffic, and for the overall elegance of the solution, I had to come up with something which, unfortunately, looks a bit like a crutch.

To illustrate, consider the process of downloading the log file in 2 highly simplified variants. The number of reads in both cases depends on the portion size.

1) Loading in small portions:

2) Loading in large portions:

As usual, the optimal solution lies somewhere in the middle.
The portion size starts out minimal, but in the course of reading it can be increased in order to reduce the number of reads.
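
The core of that idea, exactly as it is used in the full script below: after every growth_counter_max downloaded portions the portion size is multiplied by growth_factor, capped by max_item_size.

 let max_item_size=1048576      # hard upper limit for the portion size
 let growth_factor=3            # multiplier applied to the portion size
 let growth_counter=1
 let growth_counter_max=3       # grow after every 3 downloaded portions

 # last_aws_max_item_size starts from the value stored per database
 # (aws_max_item_size in the service table described below) and is then
 # adjusted once per downloaded portion:
 let growth_counter=$growth_counter+1
 if [[ $growth_counter -gt $growth_counter_max ]]; then
     let last_aws_max_item_size=$last_aws_max_item_size*$growth_factor
     let growth_counter=1
 fi
 if [[ $last_aws_max_item_size -gt $max_item_size ]]; then
     let last_aws_max_item_size=$max_item_size
 fi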

It should be noted that the problem of choosing the optimal size of the portion being read has not been finally solved yet and requires deeper study and analysis. Maybe a bit later.

General description of the implementation

Service tables used

CREATE TABLE endpoint
(
id SERIAL ,
host text 
);

TABLE database
(
id SERIAL , 
…
last_aws_log_time text ,
last_aws_nexttoken text ,
aws_max_item_size integer 
);
last_aws_log_time: the timestamp of the last loaded log file, in the YYYY-MM-DD-HH24 format.
last_aws_nexttoken: the text label of the last loaded portion.
aws_max_item_size: the empirically chosen initial portion size.
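
A hypothetical example of seeding the initial portion size for one monitored database (the host, user, database and id values are placeholders in the same style as in the scripts below):

psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE \
     -c "update database set aws_max_item_size = 1000 where id = 1"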

Full text of the script

download_aws_piece.sh

#!/bin/bash
#########################################################
# download_aws_piece.sh
# download piece of log from AWS
# version HABR
 let min_item_size=1024
 let max_item_size=1048576
 let growth_factor=3
 let growth_counter=1
 let growth_counter_max=3

 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:''STARTED'
 
 AWS_LOG_TIME=$1
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:AWS_LOG_TIME='$AWS_LOG_TIME
  
 database_id=$2
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:database_id='$database_id
 RESULT_FILE=$3 
  
 endpoint=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE_DATABASE -A -t -c "select e.host from endpoint e join database d on e.id = d.endpoint_id where d.id = $database_id "`
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:endpoint='$endpoint
  
 db_instance=`echo $endpoint | awk -F"." '{print toupper($1)}'`
 
 echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:db_instance='$db_instance

 LOG_FILE=$RESULT_FILE'.tmp_log'
 TMP_FILE=$LOG_FILE'.tmp'
 TMP_MIDDLE=$LOG_FILE'.tmp_mid'  
 TMP_MIDDLE2=$LOG_FILE'.tmp_mid2'  
  
 current_aws_log_time=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select last_aws_log_time from database where id = $database_id "`

 echo $(date +%Y%m%d%H%M)':      download_aws_piece.sh:current_aws_log_time='$current_aws_log_time
  
  if [[ $current_aws_log_time != $AWS_LOG_TIME  ]];
  then
    is_new_log='1'
	if ! psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -q -c "update database set last_aws_log_time = '$AWS_LOG_TIME' where id = $database_id "
	then
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - update database set last_aws_log_time .'
	  exit 1
	fi
  else
    is_new_log='0'
  fi
  
  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:is_new_log='$is_new_log
  
  let last_aws_max_item_size=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select aws_max_item_size from database where id = $database_id "`
  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: last_aws_max_item_size='$last_aws_max_item_size
  
  let count=1
  if [[ $is_new_log == '1' ]];
  then    
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: START DOWNLOADING OF NEW AWS LOG'
	if ! aws rds download-db-log-file-portion \
		--max-items $last_aws_max_item_size \
		--region REGION \
		--db-instance-identifier  $db_instance \
		--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 2
	fi  	
  else
    next_token=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -c "select last_aws_nexttoken from database where id = $database_id "`
	
	if [[ $next_token == '' ]];
	then
	  next_token='0'	  
	fi
	
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: CONTINUE DOWNLOADING OF AWS LOG'
	if ! aws rds download-db-log-file-portion \
	    --max-items $last_aws_max_item_size \
		--starting-token $next_token \
		--region REGION \
		--db-instance-identifier  $db_instance \
		--log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 3
	fi       
	
	line_count=`cat  $LOG_FILE | wc -l`
	let lines=$line_count-1
	  
	tail -$lines $LOG_FILE > $TMP_MIDDLE 
	mv -f $TMP_MIDDLE $LOG_FILE
  fi
  
  next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
  next_token=`echo $next_token_str | awk -F" " '{ print $2}' `
  
  grep -v NEXTTOKEN $LOG_FILE  > $TMP_FILE 
  
  if [[ $next_token == '' ]];
  then
	  cp $TMP_FILE $RESULT_FILE
	  
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:  NEXTTOKEN NOT FOUND - FINISH '
	  rm $LOG_FILE 
	  rm $TMP_FILE
	  rm $TMP_MIDDLE
          rm $TMP_MIDDLE2	  
	  exit 0  
  else
	psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
  fi
  
  first_str=`tail -1 $TMP_FILE`
  
  line_count=`cat  $TMP_FILE | wc -l`
  let lines=$line_count-1    
  
  head -$lines $TMP_FILE  > $RESULT_FILE

###############################################
# MAIN CIRCLE
  let count=2
  while [[ $next_token != '' ]];
  do 
    echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: count='$count
	
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: START DOWNLOADING OF AWS LOG'
	if ! aws rds download-db-log-file-portion \
             --max-items $last_aws_max_item_size \
             --starting-token $next_token \
             --region REGION \
             --db-instance-identifier  $db_instance \
             --log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 4
	fi

	next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
	next_token=`echo $next_token_str | awk -F" " '{ print $2}' `

	TMP_FILE=$LOG_FILE'.tmp'
	grep -v NEXTTOKEN $LOG_FILE  > $TMP_FILE  
	
	last_str=`head -1 $TMP_FILE`
  
    if [[ $next_token == '' ]];
	then
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  tail -$lines $TMP_FILE >> $RESULT_FILE
	  
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:  NEXTTOKEN NOT FOUND - FINISH '
	  rm $LOG_FILE 
	  rm $TMP_FILE
	  rm $TMP_MIDDLE
          rm $TMP_MIDDLE2	  
	  exit 0  
	fi
	
    if [[ $next_token != '' ]];
	then
		let growth_counter=$growth_counter+1
		if [[ $growth_counter -gt $growth_counter_max ]];
		then
			let last_aws_max_item_size=$last_aws_max_item_size*$growth_factor
			let growth_counter=1
		fi
	
		if [[ $last_aws_max_item_size -gt $max_item_size ]]; 
		then
			let last_aws_max_item_size=$max_item_size
		fi 

	  psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
	  
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  #############################
	  #Get middle of file
	  head -$lines $TMP_FILE > $TMP_MIDDLE
	  
	  line_count=`cat  $TMP_MIDDLE | wc -l`
	  let lines=$line_count-1
	  tail -$lines $TMP_MIDDLE > $TMP_MIDDLE2
	  
	  cat $TMP_MIDDLE2 >> $RESULT_FILE	  
	  
	  first_str=`tail -1 $TMP_FILE`	  
	fi
	  
    let count=$count+1

  done
#
#################################################################

exit 0  

Fragments of the script with some explanations:

Script input parameters (an example invocation follows the list):

  • Timestamp of log file name in the format YYYY-MM-DD-HH24: AWS_LOG_TIME=$1
  • Database ID: database_id=$2
  • The name of the result file: RESULT_FILE=$3
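
For example (the paths, the database id and the schedule are made up), a single run and a crontab entry matching the 5-minute update period might look like this; note that % has to be escaped inside crontab:

./download_aws_piece.sh 2019-01-15-10 1 /tmp/postgresql.log.2019-01-15-10

# crontab: every 5 minutes, for the log file of the current hour
*/5 * * * * /opt/scripts/download_aws_piece.sh $(date +\%Y-\%m-\%d-\%H) 1 /tmp/postgresql.log.current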

Getting the timestamp of the last loaded log file:

current_aws_log_time=`psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -c "select last_aws_log_time from database where id = $database_id "`

If the timestamp of the last loaded log file does not match the input parameter, a new log file is downloaded:

if [[ $current_aws_log_time != $AWS_LOG_TIME  ]];
  then
    is_new_log='1'
	if ! psql -h ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -v ON_ERROR_STOP=1 -A -t -c "update database set last_aws_log_time = '$AWS_LOG_TIME' where id = $database_id "
	then
	  echo '***download_aws_piece.sh -FATAL_ERROR - update database set last_aws_log_time .'
	  exit 1
	fi
  else
    is_new_log='0'
  fi

Getting the value of the next token label from the downloaded file:

  next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
  next_token=`echo $next_token_str | awk -F" " '{ print $2}' `

An empty nexttoken value serves as the sign that the download is complete.

In a loop, we read the file portion by portion, gluing the lines along the way and increasing the portion size:
Main loop

# MAIN CIRCLE
  let count=2
  while [[ $next_token != '' ]];
  do 
    echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: count='$count
	
	echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: START DOWNLOADING OF AWS LOG'
	if ! aws rds download-db-log-file-portion \
     --max-items $last_aws_max_item_size \
	 --starting-token $next_token \
     --region REGION \
     --db-instance-identifier  $db_instance \
     --log-file-name error/postgresql.log.$AWS_LOG_TIME > $LOG_FILE
	then
		echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh: FATAL_ERROR - Could not get log from AWS .'
		exit 4
	fi

	next_token_str=`cat $LOG_FILE | grep NEXTTOKEN` 
	next_token=`echo $next_token_str | awk -F" " '{ print $2}' `

	TMP_FILE=$LOG_FILE'.tmp'
	grep -v NEXTTOKEN $LOG_FILE  > $TMP_FILE  
	
	last_str=`head -1 $TMP_FILE`
  
    if [[ $next_token == '' ]];
	then
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  tail -$lines $TMP_FILE >> $RESULT_FILE
	  
	  echo $(date +%Y%m%d%H%M)':    download_aws_piece.sh:  NEXTTOKEN NOT FOUND - FINISH '
	  rm $LOG_FILE 
	  rm $TMP_FILE
	  rm $TMP_MIDDLE
         rm $TMP_MIDDLE2	  
	  exit 0  
	fi
	
    if [[ $next_token != '' ]];
	then
		let growth_counter=$growth_counter+1
		if [[ $growth_counter -gt $growth_counter_max ]];
		then
			let last_aws_max_item_size=$last_aws_max_item_size*$growth_factor
			let growth_counter=1
		fi
	
		if [[ $last_aws_max_item_size -gt $max_item_size ]]; 
		then
			let last_aws_max_item_size=$max_item_size
		fi 

	  psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE -A -t -q -c "update database set last_aws_nexttoken = '$next_token' where id = $database_id "
	  
	  concat_str=$first_str$last_str
	  	  
	  echo $concat_str >> $RESULT_FILE
		 
	  line_count=`cat  $TMP_FILE | wc -l`
	  let lines=$line_count-1
	  
	  #############################
	  #Get middle of file
	  head -$lines $TMP_FILE > $TMP_MIDDLE
	  
	  line_count=`cat  $TMP_MIDDLE | wc -l`
	  let lines=$line_count-1
	  tail -$lines $TMP_MIDDLE > $TMP_MIDDLE2
	  
	  cat $TMP_MIDDLE2 >> $RESULT_FILE	  
	  
	  first_str=`tail -1 $TMP_FILE`	  
	fi
	  
    let count=$count+1

  done

What next?

So the first intermediate task, "download the log file from the cloud", is solved. What do we do with the downloaded log?
First, the log file has to be parsed and the actual queries extracted from it.
The task is not very difficult. The simplest bash script handles it quite well.
upload_log_query.sh

#!/bin/bash
#########################################################
# upload_log_query.sh
# Upload table from downloaded aws file 
# version HABR
###########################################################  
echo 'TIMESTAMP:'$(date +%c)' Upload log_query table '
source_file=$1
echo 'source_file='$source_file
database_id=$2
echo 'database_id='$database_id

beginer=' '
first_line='1'
let "line_count=0"
sql_line=' '
sql_flag=' '    
space=' '
cat $source_file | while read line
do
  line="$space$line"

  if [[ $first_line == "1" ]]; then
    beginer=`echo $line | awk -F" " '{ print $1}' `
    first_line='0'
  fi

  current_beginer=`echo $line | awk -F" " '{ print $1}' `

  if [[ $current_beginer == $beginer ]]; then
    if [[ $sql_flag == '1' ]]; then
     sql_flag='0' 
     log_date=`echo $sql_line | awk -F" " '{ print $1}' `
     log_time=`echo $sql_line | awk -F" " '{ print $2}' `
     duration=`echo $sql_line | awk -F" " '{ print $5}' `

     #replace ' to ''
     sql_modline=`echo "$sql_line" | sed "s/'/''/g"`
     sql_line=' '

	 ################
	 #PROCESSING OF THE SQL-SELECT IS HERE
     if ! psql -h ENDPOINT.rds.amazonaws.com -U USER -d DATABASE -v ON_ERROR_STOP=1 -A -t -c "select log_query('$ip_port',$database_id , '$log_date' , '$log_time' , '$duration' , '$sql_modline' )" 
     then
        echo 'FATAL_ERROR - log_query '
        exit 1
     fi
	 ################

    fi #if [[ $sql_flag == '1' ]]; then

    let "line_count=line_count+1"

    check=`echo $line | awk -F" " '{ print $8}' `
    check_sql=${check^^}    

    #echo 'check_sql='$check_sql
    
    if [[ $check_sql == 'SELECT' ]]; then
     sql_flag='1'    
     sql_line="$sql_line$line"
	 ip_port=`echo $sql_line | awk -F":" '{ print $4}' `
    fi
  else       

    if [[ $sql_flag == '1' ]]; then
      sql_line="$sql_line$line"
    fi   
    
  fi #if [[ $current_beginer == $beginer ]]; then

done

Now you can work with the queries extracted from the log file.

And quite a few useful possibilities open up.

The parsed queries have to be stored somewhere. A service table is used for this: log_query

CREATE TABLE log_query
(
   id SERIAL ,
   queryid bigint ,
   query_md5hash text not null ,
   database_id integer not null ,  
   timepoint timestamp without time zone not null,
   duration double precision not null ,
   query text not null ,
   explained_plan text[],
   plan_md5hash text  , 
   explained_plan_wo_costs text[],
   plan_hash_value text  ,
   baseline_id integer ,
   ip text ,
   port text 
);
ALTER TABLE log_query ADD PRIMARY KEY (id);
ALTER TABLE log_query ADD CONSTRAINT queryid_timepoint_unique_key UNIQUE (queryid, timepoint );
ALTER TABLE log_query ADD CONSTRAINT query_md5hash_timepoint_unique_key UNIQUE (query_md5hash, timepoint );

CREATE INDEX log_query_timepoint_idx ON log_query (timepoint);
CREATE INDEX log_query_queryid_idx ON log_query (queryid);
ALTER TABLE log_query ADD CONSTRAINT database_id_fk FOREIGN KEY (database_id) REFERENCES database (id) ON DELETE CASCADE ;

The parsed query is processed by the plpgsql function "log_query".
log_query.sql

--log_query.sql
--version HABR
CREATE OR REPLACE FUNCTION log_query( ip_port text ,log_database_id integer , log_date text , log_time text , duration text , sql_line text   ) RETURNS boolean AS $$
DECLARE
  result boolean ;
  log_timepoint timestamp without time zone ;
  log_duration double precision ; 
  pos integer ;
  log_query text ;
  activity_string text ;
  log_md5hash text ;
  log_explain_plan text[] ;
  
  log_planhash text ;
  log_plan_wo_costs text[] ; 
  
  database_rec record ;
  
  pg_stat_query text ; 
  test_log_query text ;
  log_query_rec record;
  found_flag boolean;
  
  pg_stat_history_rec record ;
  port_start integer ;
  port_end integer ;
  client_ip text ;
  client_port text ;
  log_queryid bigint ;
  log_query_text text ;
  pg_stat_query_text text ; 
BEGIN
  result = TRUE ;

  RAISE NOTICE '***log_query';
  
  port_start = position('(' in ip_port);
  port_end = position(')' in ip_port);
  client_ip = substring( ip_port from 1 for port_start-1 );
  client_port = substring( ip_port from port_start+1 for port_end-port_start-1 );

  SELECT e.host , d.name , d.owner_pwd 
  INTO database_rec
  FROM database d JOIN endpoint e ON e.id = d.endpoint_id
  WHERE d.id = log_database_id ;
  
  log_timepoint = to_timestamp(log_date||' '||log_time,'YYYY-MM-DD HH24-MI-SS');
  log_duration = duration:: double precision; 

  
  pos = position ('SELECT' in UPPER(sql_line) );
  log_query = substring( sql_line from pos for LENGTH(sql_line));
  log_query = regexp_replace(log_query,' +',' ','g');
  log_query = regexp_replace(log_query,';+','','g');
  log_query = trim(trailing ' ' from log_query);
 

  log_md5hash = md5( log_query::text );
  
  --Explain execution plan--
  EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')'; 
  
  log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
  log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
    
  PERFORM dblink_disconnect('LINK1');
  --------------------------
  BEGIN
	INSERT INTO log_query
	(
		query_md5hash ,
		database_id , 
		timepoint ,
		duration ,
		query ,
		explained_plan ,
		plan_md5hash , 
		explained_plan_wo_costs , 
		plan_hash_value , 
		ip , 
		port
	) 
	VALUES 
	(
		log_md5hash ,
		log_database_id , 
		log_timepoint , 
		log_duration , 
		log_query ,
		log_explain_plan , 
		md5(log_explain_plan::text) ,
		log_plan_wo_costs , 
		md5(log_plan_wo_costs::text),
		client_ip , 
		client_port		
	);
	activity_string = 	'New query has logged '||
						' database_id = '|| log_database_id ||
						' query_md5hash='||log_md5hash||
						' , timepoint = '||to_char(log_timepoint,'YYYYMMDD HH24:MI:SS');
					
	RAISE NOTICE '%',activity_string;					
					 
	PERFORM pg_log( log_database_id , 'log_query' , activity_string);  

	EXCEPTION
	  WHEN unique_violation THEN
		RAISE NOTICE '*** unique_violation *** query already has logged';
	END;

	SELECT 	queryid
	INTO   	log_queryid
	FROM 	log_query 
	WHERE 	query_md5hash = log_md5hash AND
			timepoint = log_timepoint;

	IF log_queryid IS NOT NULL 
	THEN 
	  RAISE NOTICE 'log_query with query_md5hash = % and timepoint = % has already has a QUERYID = %',log_md5hash,log_timepoint , log_queryid ;
	  RETURN result;
	END IF;
	
	------------------------------------------------
	RAISE NOTICE 'Update queryid';	
	
	SELECT * 
	INTO log_query_rec
	FROM log_query
	WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ; 
	
	log_query_rec.query=regexp_replace(log_query_rec.query,';+','','g');
	
	FOR pg_stat_history_rec IN
	 SELECT 
         queryid ,
	  query 
	 FROM 
         pg_stat_db_queries 
     WHERE  
      database_id = log_database_id AND
       queryid is not null 
	LOOP
	  pg_stat_query = pg_stat_history_rec.query ; 
	  pg_stat_query=regexp_replace(pg_stat_query,'\n+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\t+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,' +',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\$.','%','g');
	
	  log_query_text = trim(trailing ' ' from log_query_rec.query);
	  pg_stat_query_text = pg_stat_query; 
	
	  
	  --SELECT log_query_rec.query like pg_stat_query INTO found_flag ; 
	  IF (log_query_text LIKE pg_stat_query_text) THEN
		found_flag = TRUE ;
	  ELSE
		found_flag = FALSE ;
	  END IF;	  
	  
	  
	  IF found_flag THEN
	    
		UPDATE log_query SET queryid = pg_stat_history_rec.queryid WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
		activity_string = 	' updated queryid = '||pg_stat_history_rec.queryid||
		                    ' for log_query with id = '||log_query_rec.id               
		   				    ;						
	    RAISE NOTICE '%',activity_string;	
		EXIT ;
	  END IF ;
	  
	END LOOP ;
	
  RETURN result ;
END
$$ LANGUAGE plpgsql;

The pg_stat_db_queries service table is used when the function runs; it contains a snapshot of the current queries from the pg_stat_history table (the use of that table is described here: Monitoring PostgreSQL query performance. Part 1 - reporting).

TABLE pg_stat_db_queries
(
   database_id integer,  
   queryid bigint ,  
   query text , 
   max_time double precision 
);

TABLE pg_stat_history 
(
…
database_id integer ,
…
queryid bigint ,
…
max_time double precision	 , 	
…
);

The function makes it possible to implement a number of useful features for processing the queries extracted from the log file. Namely:

Possibility #1 - Query execution history

Very useful for starting to investigate a performance incident. First of all, get acquainted with the history: when did the slowdown begin?
Then, as the classics teach, look for external causes. It may simply be that the database load increased sharply and the specific query has nothing to do with it.
Adding a new entry to the log_query table:

  port_start = position('(' in ip_port);
  port_end = position(')' in ip_port);
  client_ip = substring( ip_port from 1 for port_start-1 );
  client_port = substring( ip_port from port_start+1 for port_end-port_start-1 );

  SELECT e.host , d.name , d.owner_pwd 
  INTO database_rec
  FROM database d JOIN endpoint e ON e.id = d.endpoint_id
  WHERE d.id = log_database_id ;
  
  log_timepoint = to_timestamp(log_date||' '||log_time,'YYYY-MM-DD HH24-MI-SS');
  log_duration = to_number(duration,'99999999999999999999D9999999999'); 

  
  pos = position ('SELECT' in UPPER(sql_line) );
  log_query = substring( sql_line from pos for LENGTH(sql_line));
  log_query = regexp_replace(log_query,' +',' ','g');
  log_query = regexp_replace(log_query,';+','','g');
  log_query = trim(trailing ' ' from log_query);
 
  RAISE NOTICE 'log_query=%',log_query ;   

  log_md5hash = md5( log_query::text );
  
  --Explain execution plan--
  EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')'; 
  
  log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
  log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
    
  PERFORM dblink_disconnect('LINK1');
  --------------------------
  BEGIN
	INSERT INTO log_query
	(
		query_md5hash ,
		database_id , 
		timepoint ,
		duration ,
		query ,
		explained_plan ,
		plan_md5hash , 
		explained_plan_wo_costs , 
		plan_hash_value , 
		ip , 
		port
	) 
	VALUES 
	(
		log_md5hash ,
		log_database_id , 
		log_timepoint , 
		log_duration , 
		log_query ,
		log_explain_plan , 
		md5(log_explain_plan::text) ,
		log_plan_wo_costs , 
		md5(log_plan_wo_costs::text),
		client_ip , 
		client_port		
	);
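
A sketch of how the accumulated history can then be used (the query text inside md5() is hypothetical, and the hash has to be computed over the normalized query text, exactly as the function above computes it):

psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE \
     -c "select timepoint , duration , ip , port
           from log_query
          where query_md5hash = md5('select * from some_table where id = 1')
          order by timepoint"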

Possibility #2 - Saving query execution plans

At this point an objection-clarification-comment may arise: "But there is already auto_explain." Yes, there is, but what is the point if the execution plan is stored in the same log file, and in order to save it for further analysis you have to parse that log file anyway?

What I needed was:
first: to store the execution plan in a service table of the monitoring database;
second: to be able to compare execution plans with each other, so that a change in a query's execution plan is visible immediately.

We have a query with specific execution parameters. Getting and saving its execution plan with EXPLAIN is an elementary task.
Moreover, using EXPLAIN (COSTS FALSE), you can get the skeleton of the plan, which can be used to compute a hash value of the plan; that will help in the subsequent analysis of how the execution plan changes over time.
Getting the execution plan template:

  --Explain execution plan--
  EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||database_rec.host||' dbname='||database_rec.name||' user=DATABASE password='||database_rec.owner_pwd||' '')'; 
  
  log_explain_plan = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN '||log_query ) AS t (plan text) );
  log_plan_wo_costs = ARRAY ( SELECT * FROM dblink('LINK1', 'EXPLAIN ( COSTS FALSE ) '||log_query ) AS t (plan text) );
    
  PERFORM dblink_disconnect('LINK1');
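
A sketch of how the saved plan_hash_value can then be used, for example to find queries whose cost-free plan skeleton has changed between executions:

psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE \
     -c "select query_md5hash , count(distinct plan_hash_value) as plan_versions
           from log_query
          group by query_md5hash
         having count(distinct plan_hash_value) > 1"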

Possibility #3 - Using the query log for monitoring

Since the performance metrics are configured not for the query text but for its ID, the queries from the log file have to be associated with the queries for which the performance metrics are configured.
Well, at the very least, in order to have the exact time of a performance incident.

This way, when a performance incident occurs for a given query ID, there is a reference to the specific query with specific parameter values and the exact execution time and duration of that query. Getting this information using the pg_stat_statements view alone is not possible.
Finding the queryid of the query and updating the entry in the log_query table:

SELECT * 
	INTO log_query_rec
	FROM log_query
	WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ; 
	
	log_query_rec.query=regexp_replace(log_query_rec.query,';+','','g');
	
	FOR pg_stat_history_rec IN
	 SELECT 
      queryid ,
	  query 
	 FROM 
       pg_stat_db_queries 
     WHERE  
	   database_id = log_database_id AND
       queryid is not null 
	LOOP
	  pg_stat_query = pg_stat_history_rec.query ; 
	  pg_stat_query=regexp_replace(pg_stat_query,'\n+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\t+',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,' +',' ','g');
	  pg_stat_query=regexp_replace(pg_stat_query,'\$.','%','g');
	
	  log_query_text = trim(trailing ' ' from log_query_rec.query);
	  pg_stat_query_text = pg_stat_query; 
	  
	  --SELECT log_query_rec.query like pg_stat_query INTO found_flag ; 
	  IF (log_query_text LIKE pg_stat_query_text) THEN
		found_flag = TRUE ;
	  ELSE
		found_flag = FALSE ;
	  END IF;	  
	  
	  
	  IF found_flag THEN
	    
		UPDATE log_query SET queryid = pg_stat_history_rec.queryid WHERE query_md5hash = log_md5hash AND timepoint = log_timepoint ;
		activity_string = 	' updated queryid = '||pg_stat_history_rec.queryid||
		                    ' for log_query with id = '||log_query_rec.id		                    
		   				    ;						
					
	    RAISE NOTICE '%',activity_string;	
		EXIT ;
	  END IF ;
	  
	END LOOP ;
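
A sketch of how this is then used: once queryid is filled in, an incident reported for a given query ID and time window can be traced back to the exact query text and its duration (the queryid value and the time window here are hypothetical):

psql -h MONITOR_ENDPOINT.rds.amazonaws.com -U USER -d MONITOR_DATABASE \
     -c "select timepoint , duration , query
           from log_query
          where queryid = 6134568345937156061
            and timepoint between '2019-01-15 10:00' and '2019-01-15 11:00'
          order by duration desc"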

Afterword

As a result, the described technique has found application in a system developed for monitoring PostgreSQL query performance, and it provides more information to analyze when resolving query performance incidents.

Although, of course, in my personal opinion, some more work is needed on the algorithm for choosing and changing the size of the downloaded portion. The problem has not been solved in the general case yet. It will probably be interesting.

But that is a completely different story...

Source: www.hab.com
