PgGraph je uslužni program za arhiviranje i pronalaženje zavisnosti tablica u PostgreSQL-u

PgGraph je uslužni program za arhiviranje i pronalaženje zavisnosti tablica u PostgreSQL-u
Danas želim da predstavim čitaocima Habr-a uslužni program napisan u Pythonu za rad sa zavisnostima tablica u PostgreSQL DBMS-u.

API uslužnog programa je jednostavan i sastoji se od tri metode:

  • archive_table - rekurzivno arhiviranje/brisanje redova sa specificiranim primarnim ključevima
  • get_table_references — potražite zavisnosti za tabelu (prikazaće tabele na koje se referencira navedena i one koje je referenciraju)
  • get_rows_references - traži redove u drugim tabelama koje upućuju na određene redove u željenoj tabeli

prapovijest

Moje ime je Oleg Borzov, ja sam programer u CRM timu za menadžere hipotekarnih kredita u Domkliku.

Glavna baza podataka našeg CRM sistema je jedna od najvećih po obimu u kompaniji. Ujedno je i jedan od najstarijih: pojavio se na samom pokretanju projekta, kada su stabla bila velika, Domklik je bio startup, a umjesto mikroservisa na modernom Python asinhronom okviru postojao je ogroman monolit u PHP-u.

Prelazak sa PHP-a na Python bio je veoma dug i zahtevao je istovremenu podršku oba sistema, što je uticalo na dizajn baze podataka.

Kao rezultat, imamo bazu podataka sa velikim brojem visoko povezanih i ogromnih tabela sa gomilom indeksa za različite vrste upita. Sve to negativno utiče na performanse baze podataka: zbog velikih tabela i gomile relacija među njima, složenost upita se stalno povećava, što je posebno kritično za najopterećenije tabele.

Kako bismo smanjili opterećenje baze podataka, odlučili smo napisati skriptu koja će prenijeti stare zapise iz najobimnijih i najopterećenijih tabela u arhivirane (na primjer, iz task в task_archive).

Ovaj zadatak je kompliciran velikim brojem relacija između tabela: jednostavno premjestite redove iz task в task_archive nije dovoljno, prije toga trebate učiniti isto rekurzivno sa svim onim referencama task stolovi.

Pokazat ću na primjeru demo baza podataka sa stranice postgrespro.ru:

PgGraph je uslužni program za arhiviranje i pronalaženje zavisnosti tablica u PostgreSQL-u
Recimo da treba da izbrišemo zapise iz tabele Flights. Postgres nam neće dozvoliti da to uradimo tek tako: prvo moramo da izbrišemo zapise iz svih referentnih tabela, i tako dalje rekurzivno do tabela koje niko ne referencira.

U našem primjeru na Flights se odnosi Ticket_flights, a na njoj - Boarding_passes.

Stoga ga morate izbrisati ovim redoslijedom:

  1. Dobijamo vrijednosti primarnih ključeva (PK) redova Ticket_flights, koji se odnose na redove u kojima se brišu Flights.
  2. Dobijamo PK redove Boarding_passes, koji se odnose na Ticket_flights.
  3. Brišemo redove po PK iz koraka 2 u tabeli Boarding_passes.
  4. Izbrišite linije pomoću PK od koraka 1 in Ticket_flights.
  5. Uklanjanje linija iz Flights.

Rezultat je bio uslužni program pod nazivom PgGraph, za koji smo odlučili da napravimo open source.

Kako koristiti

Uslužni program podržava dva načina korištenja:

  • Poziv iz komandne linije (pggraph …).
  • Upotreba u Python kodu (cl PgGraphApi).

Instalacija i konfiguracija

Prvo morate instalirati uslužni program iz Pypi spremišta:

pip3 install pggraph

Zatim kreirajte config.ini datoteku na lokalnom računalu s konfiguracijom baze podataka i skriptom za arhiviranje:

[db]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = postgres
schema = public ; Необязательный параметр, указано значение по умолчанию

[archive]  ; Данный раздел заполнять необязательно, ниже указаны значения по умолчанию
is_debug = false
chunk_size = 1000
max_depth = 20
to_archive = true
archive_suffix = 'archive'

Pokreni sa konzole

parametri

$ pggraph -h
usage: pggraph action [-h] --table TABLE [--ids IDS] [--config_path CONFIG_PATH]
positional arguments:
  action        required action: archive_table, get_table_references, get_rows_references

optional arguments:
  -h, --help                    show this help message and exit
  --table TABLE                 table name
  --ids IDS                     primary key ids, separated by comma, e.g. 1,2,3
  --config_path CONFIG_PATH     path to config.ini
  --log_path LOG_PATH           path to log dir
  --log_level LOG_LEVEL         log level (debug, info, error)

Pozicioni argumenti:

  • action - potrebna radnja: archive_table, get_table_references ili get_rows_references.

Imenovani argumenti:

  • --config_path — putanja do konfiguracionog fajla;
  • --table — tabela sa kojom treba da izvršite radnju;
  • --ids — lista id odvojenih zarezima, na primjer, 1,2,3 (opcijski parametar);
  • --log_path — putanja do fascikle za dnevnike (opcioni parametar, podrazumevano — početna fascikla);
  • --log_level — nivo evidentiranja (opcioni parametar, podrazumevano je INFO).

Primjeri naredbi

Arhiviranje tabele

Glavna funkcija uslužnog programa je arhiviranje podataka, tj. prenos redova iz glavne tabele u tabelu arhive (na primer, iz tabele knjige в books_archive).

Podržano je i brisanje bez arhiviranja: za ovo morate postaviti parametar u config.ini to_archive = lažno).

Obavezni parametri - config_path, tablica i id.

Nakon pokretanja, zapisi će se rekurzivno brisati ids u tabeli table iu svim tabelama koje se odnose na njega.

$ pggraph archive_table --config_path config.hw.local.ini --table flights --ids 1,2,3
2020-06-20 19:27:44 INFO: flights - START
2020-06-20 19:27:44 INFO: flights - start archive_recursive 3 rows (depth=0)
2020-06-20 19:27:44 INFO:       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 19:27:44 INFO:               START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: flights - archive_by_ids 3 rows by id
2020-06-20 19:27:44 INFO: flights - END

Pronađite zavisnosti za određenu tablicu

Funkcija za pronalaženje zavisnosti određene tabele table. Obavezni parametri - config_path и table.

Nakon pokretanja, na ekranu će se prikazati rečnik gde:

  • in_refs — rečnik tabela koje upućuju na datu, gde je ključ ime tabele, vrednost je lista objekata stranog ključa (pk_main - primarni ključ u glavnoj tabeli, pk_ref - primarni ključ u referentnoj tabeli, fk_ref — naziv kolone koja je strani ključ izvorne tabele);
  • out_refs — rečnik tabela na koje se ovaj poziva.

$ pggraph get_table_references --config_path config.hw.local.ini --table flights
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='ticket_no, flight_id', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

Pronalaženje referenci na nizove sa navedenim primarnim ključem

Funkcija za traženje redova u drugim tabelama koje se odnose na redove putem stranog ključa ids stolovi table. Obavezni parametri - config_path, table и ids.

Nakon pokretanja, na ekranu će se prikazati rečnik sa sledećom strukturom:

{
	pk_id_1: {
		reffering_table_name_1: {
			foreign_key_1: [
				{row_pk_1: value, row_pk_2: value},
				...
			], 
			...
		},
		...
	},
	pk_id_2: {...},
	...
}

Primjer poziva:

$ pggraph get_rows_references --config_path config.hw.local.ini --table flights --ids 1,2,3
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

Upotreba u kodu

Pored pokretanja u konzoli, biblioteka se može koristiti u Python kodu. Primjeri poziva u iPython interaktivnom okruženju prikazani su u nastavku.

Arhiviranje tabele

>>> from pg_graph.main import setup_logging
>>> setup_logging(log_level='DEBUG')
>>> from pg_graph.api import PgGraphApi
>>> api = PgGraphApi('config.hw.local.ini')
>>> api.archive_table('flights', [4,5])
2020-06-20 23:12:08 INFO: flights - START
2020-06-20 23:12:08 INFO: flights - start archive_recursive 2 rows (depth=0)
2020-06-20 23:12:08 INFO: 	START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 	ticket_flights - ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')
2020-06-20 23:12:08 DEBUG: 	SQL('SELECT flight_id, ticket_no FROM bookings.ticket_flights WHERE (flight_id) IN (%s, %s)')
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 3 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 3 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 3 rows
2020-06-20 23:12:08 INFO: 	END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: flights - archive_by_ids 2 rows by flight_id
2020-06-20 23:12:09 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.flights_archive (LIKE bookings.flights)')
2020-06-20 23:12:09 DEBUG: DELETE FROM flights by flight_id - 2 rows
2020-06-20 23:12:09 DEBUG: INSERT INTO flights_archive - 2 rows
2020-06-20 23:12:09 INFO: flights - END

Pronađite zavisnosti za određenu tablicu

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> res = api.get_table_references('flights')
>>> pprint(res)
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

Pronalaženje referenci na nizove sa navedenim primarnim ključem

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> rows = api.get_rows_references('flights', [1,2,3])
>>> pprint(rows)
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

Izvorni kod biblioteke dostupan je na adresi GitHub pod MIT licencom, kao iu repozitorijumu PyPI.

Biće mi drago komentarima, obavezama i sugestijama.

Pokušaću da odgovorim na pitanja najbolje što mogu ovde i u repozitorijumu.

izvor: www.habr.com

Dodajte komentar