PgGraph je uslužni program za arhiviranje i pronalaženje ovisnosti tablica u PostgreSQL-u

PgGraph je uslužni program za arhiviranje i pronalaženje ovisnosti tablica u PostgreSQL-u
Danas čitateljima Habra želim predstaviti uslužni program napisan u Pythonu za rad sa ovisnostima tablica u PostgreSQL DBMS-u.

API uslužnog programa jednostavan je i sastoji se od tri metode:

  • arhivska_tablica - rekurzivno arhiviranje/brisanje redaka s navedenim primarnim ključevima
  • get_table_references — traženje ovisnosti za tablicu (prikazat će tablice na koje navedena navedena tablica referencira i one na koje se ona poziva)
  • get_rows_references - traženje redaka u drugim tablicama koji upućuju na određene retke u željenoj tablici

prapovijest

Moje ime je Oleg Borzov, programer sam u CRM timu za menadžere hipotekarnih kredita u Domkliku.

Glavna baza podataka našeg CRM sustava jedna je od najvećih po veličini u tvrtki. Ujedno je i jedan od najstarijih: pojavio se na samom lansiranju projekta, kad su drveća bila velika, Domklik je bio startup, a umjesto mikroservisa na pomodnom Python asynchronous frameworku bio je ogroman monolit u PHP-u.

Prijelaz s PHP-a na Python bio je vrlo dug i zahtijevao je istovremenu podršku oba sustava, što je utjecalo na dizajn baze podataka.

Kao rezultat, imamo bazu podataka s velikim brojem visoko povezanih i ogromnih tablica s hrpom indeksa za različite vrste upita. Sve to negativno utječe na performanse baze podataka: zbog velikih tablica i hrpe odnosa između njih, složenost upita stalno raste, što je posebno kritično za najopterećenije tablice.

Kako bismo smanjili opterećenje baze podataka, odlučili smo napisati skriptu koja bi prebacivala stare zapise iz najobimnijih i najopterećenijih tablica u arhivirane (npr. task в task_archive).

Ovaj zadatak je kompliciran velikim brojem odnosa između tablica: jednostavno premjestite retke iz task в task_archive nije dovoljno, prije toga morate učiniti isto rekurzivno sa svim tim referencama task stolovi.

Pokazat ću primjerom demo baza podataka sa stranice postgrespro.ru:

PgGraph je uslužni program za arhiviranje i pronalaženje ovisnosti tablica u PostgreSQL-u
Recimo da trebamo izbrisati zapise iz tablice Flights. Postgres nam neće dopustiti da to učinimo tek tako: prvo moramo izbrisati zapise iz svih referentnih tablica, i tako dalje rekurzivno do tablica koje nitko ne referencira.

U našem primjeru na Flights upućuje Ticket_flights, a na njoj - Boarding_passes.

Stoga ga trebate izbrisati ovim redoslijedom:

  1. Dobivamo vrijednosti primarnih ključeva (PK) redaka u Ticket_flights, koji se odnose na retke u kojima treba izbrisati Flights.
  2. Dobivamo PK redove Boarding_passes, koji se odnose na Ticket_flights.
  3. Brišemo redove prema PK iz koraka 2 u tablici Boarding_passes.
  4. Izbrišite retke prema PK iz koraka 1 u Ticket_flights.
  5. Uklanjanje linija iz Flights.

Rezultat je bio pomoćni program pod nazivom PgGraph, koji smo odlučili učiniti otvorenim kodom.

Kako koristiti

Uslužni program podržava dva načina korištenja:

  • Poziv iz naredbenog retka (pggraph …).
  • Upotreba u Python kodu (klasa PgGraphApi).

Instalacija i konfiguracija

Prvo morate instalirati uslužni program iz Pypi repozitorija:

pip3 install pggraph

Zatim kreirajte datoteku config.ini na lokalnom računalu s konfiguracijom baze podataka i skriptom za arhiviranje:

[db]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = postgres
schema = public ; Необязательный параметр, указано значение по умолчанию

[archive]  ; Данный раздел заполнять необязательно, ниже указаны значения по умолчанию
is_debug = false
chunk_size = 1000
max_depth = 20
to_archive = true
archive_suffix = 'archive'

Pokreni s konzole

Parametri

$ pggraph -h
usage: pggraph action [-h] --table TABLE [--ids IDS] [--config_path CONFIG_PATH]
positional arguments:
  action        required action: archive_table, get_table_references, get_rows_references

optional arguments:
  -h, --help                    show this help message and exit
  --table TABLE                 table name
  --ids IDS                     primary key ids, separated by comma, e.g. 1,2,3
  --config_path CONFIG_PATH     path to config.ini
  --log_path LOG_PATH           path to log dir
  --log_level LOG_LEVEL         log level (debug, info, error)

Pozicijski argumenti:

  • action - potrebna radnja: archive_table, get_table_references ili get_rows_references.

Imenovani argumenti:

  • --config_path — put do konfiguracijske datoteke;
  • --table — tablica s kojom trebate izvršiti radnju;
  • --ids — popis id-ova odvojenih zarezima, na primjer, 1,2,3 (neobavezni parametar);
  • --log_path — put do mape za zapise (neobavezni parametar, prema zadanim postavkama — početna mapa);
  • --log_level — razina zapisivanja (neobavezni parametar, zadana je INFO).

Primjeri naredbi

Arhiviranje tablice

Glavna funkcija uslužnog programa je arhiviranje podataka, tj. prijenos redaka iz glavne tablice u arhivsku tablicu (npr. iz tablice knjige в arhiva_knjiga).

Podržano je i brisanje bez arhiviranja: za ovo trebate postaviti parametar u config.ini u_arhiv = netočno).

Potrebni parametri - config_path, tablica i ID-ovi.

Nakon pokretanja, zapisi će se rekurzivno brisati ids u stolu table i u svim tablicama koje se na njega odnose.

$ pggraph archive_table --config_path config.hw.local.ini --table flights --ids 1,2,3
2020-06-20 19:27:44 INFO: flights - START
2020-06-20 19:27:44 INFO: flights - start archive_recursive 3 rows (depth=0)
2020-06-20 19:27:44 INFO:       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 19:27:44 INFO:               START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: flights - archive_by_ids 3 rows by id
2020-06-20 19:27:44 INFO: flights - END

Pronađite ovisnosti za određenu tablicu

Funkcija za pronalaženje ovisnosti navedene tablice table. Potrebni parametri - config_path и table.

Nakon pokretanja, na zaslonu će se prikazati rječnik, gdje:

  • in_refs — rječnik tablica koje referenciraju danu, gdje je ključ naziv tablice, vrijednost je popis objekata stranog ključa (pk_main - primarni ključ u glavnoj tablici, pk_ref - primarni ključ u referentnoj tablici, fk_ref — naziv stupca koji je strani ključ izvorne tablice);
  • out_refs — rječnik tablica na koji se ovaj odnosi.

$ pggraph get_table_references --config_path config.hw.local.ini --table flights
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='ticket_no, flight_id', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

Pronalaženje referenci na nizove s navedenim primarnim ključem

Funkcija za traženje redaka u drugim tablicama koji se odnose na retke putem stranog ključa ids stolovi table. Potrebni parametri - config_path, table и ids.

Nakon pokretanja, na zaslonu će se prikazati rječnik sa sljedećom strukturom:

{
	pk_id_1: {
		reffering_table_name_1: {
			foreign_key_1: [
				{row_pk_1: value, row_pk_2: value},
				...
			], 
			...
		},
		...
	},
	pk_id_2: {...},
	...
}

Primjer poziva:

$ pggraph get_rows_references --config_path config.hw.local.ini --table flights --ids 1,2,3
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

Upotreba u kodu

Osim pokretanja u konzoli, biblioteka se može koristiti u Python kodu. Primjeri poziva u interaktivnom okruženju iPython prikazani su u nastavku.

Arhiviranje tablice

>>> from pg_graph.main import setup_logging
>>> setup_logging(log_level='DEBUG')
>>> from pg_graph.api import PgGraphApi
>>> api = PgGraphApi('config.hw.local.ini')
>>> api.archive_table('flights', [4,5])
2020-06-20 23:12:08 INFO: flights - START
2020-06-20 23:12:08 INFO: flights - start archive_recursive 2 rows (depth=0)
2020-06-20 23:12:08 INFO: 	START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 	ticket_flights - ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')
2020-06-20 23:12:08 DEBUG: 	SQL('SELECT flight_id, ticket_no FROM bookings.ticket_flights WHERE (flight_id) IN (%s, %s)')
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 3 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 3 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 3 rows
2020-06-20 23:12:08 INFO: 	END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: flights - archive_by_ids 2 rows by flight_id
2020-06-20 23:12:09 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.flights_archive (LIKE bookings.flights)')
2020-06-20 23:12:09 DEBUG: DELETE FROM flights by flight_id - 2 rows
2020-06-20 23:12:09 DEBUG: INSERT INTO flights_archive - 2 rows
2020-06-20 23:12:09 INFO: flights - END

Pronađite ovisnosti za određenu tablicu

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> res = api.get_table_references('flights')
>>> pprint(res)
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

Pronalaženje referenci na nizove s navedenim primarnim ključem

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> rows = api.get_rows_references('flights', [1,2,3])
>>> pprint(rows)
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

Izvorni kod knjižnice dostupan je na GitHub pod licencom MIT-a, kao i u repozitoriju PyPI.

Bit će mi drago komentarima, obvezama i prijedlozima.

Pokušat ću odgovoriti na pitanja najbolje što mogu ovdje i u repozitoriju.

Izvor: www.habr.com

Dodajte komentar