Danes želim bralcem Habra predstaviti pripomoček, napisan v Pythonu za delo z odvisnostmi tabel v DBMS PostgreSQL.
API pripomočka je preprost in je sestavljen iz treh metod:
- arhivska_tabela - rekurzivno arhiviranje/brisanje vrstic z določenimi primarnimi ključi
- get_table_references — iskanje odvisnosti za tabelo (prikaže tabele, na katere se sklicuje podana tabela, in tiste, ki se sklicujejo nanjo)
- get_rows_references - iskanje vrstic v drugih tabelah, ki se sklicujejo na določene vrstice v želeni tabeli
prazgodovina
Moje ime je Oleg Borzov, sem razvijalec v CRM ekipi za upravitelje hipotekarnih kreditov v Domkliku.
Glavna zbirka podatkov našega sistema CRM je po obsegu ena največjih v podjetju. Je tudi eden najstarejših: pojavil se je ob samem zagonu projekta, ko so bila drevesa velika, Domklik startup, namesto mikrostoritve na modnem asinhronem ogrodju Python pa je bil ogromen monolit v PHP.
Prehod s PHP na Python je bil zelo dolg in je zahteval hkratno podporo obeh sistemov, kar je vplivalo na zasnovo podatkovne baze.
Posledično imamo bazo z velikim številom zelo povezanih in ogromnih tabel s kopico indeksov za različne vrste poizvedb. Vse to negativno vpliva na delovanje baze: zaradi velikih tabel in kopice relacij med njimi se kompleksnost poizvedb nenehno povečuje, kar je še posebej kritično pri najbolj obremenjenih tabelah.
Da bi zmanjšali obremenitev podatkovne baze, smo se odločili napisati skripto, ki bi prenašala stare zapise iz najbolj obsežnih in naloženih tabel v arhivirane (npr. task
в task_archive
).
Ta naloga je zapletena zaradi velikega števila odnosov med tabelami: preprosto premaknite vrstice iz task
в task_archive
ni dovolj, pred tem morate storiti isto rekurzivno z vsemi tistimi, ki se sklicujejo task
mize.
Pokazal bom s primerom
Recimo, da moramo izbrisati zapise iz tabele Flights
. Postgres nam tega ne dovoli kar tako: najprej moramo izbrisati zapise iz vseh referenčnih tabel in tako naprej rekurzivno do tabel, na katere se nihče ne sklicuje.
V našem primeru pri Flights
se nanaša Ticket_flights
, in na njej - Boarding_passes
.
Zato ga morate izbrisati v tem vrstnem redu:
- Dobimo vrednosti primarnih ključev (PK) vrstic v
Ticket_flights
, ki se nanašajo na vrstice, v katerih želite izbrisatiFlights
. - Dobimo vrstice PK
Boarding_passes
, ki se nanašajo naTicket_flights
. - V tabeli brišemo vrstice po PK iz 2. koraka
Boarding_passes
. - Izbrišite vrstice po PK iz 1. koraka v
Ticket_flights
. - Odstranjevanje vrstic iz
Flights
.
Rezultat je bil pripomoček, imenovan PgGraph, za katerega smo se odločili, da bo odprtokoden.
Kako uporabljati
Pripomoček podpira dva načina uporabe:
- Pokličite iz ukazne vrstice (
pggraph …
). - Uporaba v kodi Python (razred
PgGraphApi
).
Namestitev in konfiguracija
Najprej morate namestiti pripomoček iz repozitorija Pypi:
pip3 install pggraph
Nato na lokalnem računalniku ustvarite datoteko config.ini s konfiguracijo baze podatkov in skriptom za arhiviranje:
[db]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = postgres
schema = public ; Необязательный параметр, указано значение по умолчанию
[archive] ; Данный раздел заполнять необязательно, ниже указаны значения по умолчанию
is_debug = false
chunk_size = 1000
max_depth = 20
to_archive = true
archive_suffix = 'archive'
Zaženi s konzole
Parametri
$ pggraph -h
usage: pggraph action [-h] --table TABLE [--ids IDS] [--config_path CONFIG_PATH]
positional arguments:
action required action: archive_table, get_table_references, get_rows_references
optional arguments:
-h, --help show this help message and exit
--table TABLE table name
--ids IDS primary key ids, separated by comma, e.g. 1,2,3
--config_path CONFIG_PATH path to config.ini
--log_path LOG_PATH path to log dir
--log_level LOG_LEVEL log level (debug, info, error)
Pozicijski argumenti:
action
- potrebno dejanje:archive_table
,get_table_references
aliget_rows_references
.
Imenovani argumenti:
--config_path
— pot do konfiguracijske datoteke;--table
— tabela, s katero morate izvesti dejanje;--ids
— seznam ID-jev, ločenih z vejicami, na primer,1,2,3
(izbirni parameter);--log_path
— pot do mape za dnevnike (izbirni parameter, privzeto — domača mapa);--log_level
— raven beleženja (izbirni parameter, privzeto je INFO).
Primeri ukazov
Arhiviranje tabele
Glavna funkcija pripomočka je arhiviranje podatkov, tj. prenos vrstic iz glavne tabele v arhivsko tabelo (na primer iz tabele knjige в arhiv_knjig).
Podprto je tudi brisanje brez arhiviranja: za to morate nastaviti parameter v config.ini v_arhiv = napačen).
Zahtevani parametri - config_path, tabela in ids.
Po zagonu bodo zapisi rekurzivno izbrisani ids
v tabeli table
in v vseh tabelah, ki se nanašajo na to.
$ pggraph archive_table --config_path config.hw.local.ini --table flights --ids 1,2,3
2020-06-20 19:27:44 INFO: flights - START
2020-06-20 19:27:44 INFO: flights - start archive_recursive 3 rows (depth=0)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: ticket_flights - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: flights - archive_by_ids 3 rows by id
2020-06-20 19:27:44 INFO: flights - END
Poiščite odvisnosti za določeno tabelo
Funkcija za iskanje odvisnosti določene tabele table
. Zahtevani parametri - config_path
и table
.
Po zagonu se na zaslonu prikaže slovar, kjer:
in_refs
— slovar tabel, ki se sklicujejo na dano tabelo, kjer je ključ ime tabele, vrednost pa seznam objektov tujega ključa (pk_main
- primarni ključ v glavni tabeli,pk_ref
- primarni ključ v referenčni tabeli,fk_ref
— ime stolpca, ki je tuji ključ izvorne tabele);out_refs
— slovar tabel, na katere se ta nanaša.
$ pggraph get_table_references --config_path config.hw.local.ini --table flights
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='ticket_no, flight_id', fk_ref='flight_id')]},
'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}
Iskanje referenc na nize z navedenim primarnim ključem
Funkcija za iskanje vrstic v drugih tabelah, ki se nanašajo na vrstice prek zunanjega ključa ids
tabel table
. Zahtevani parametri - config_path
, table
и ids
.
Po zagonu bo na zaslonu prikazan slovar z naslednjo strukturo:
{
pk_id_1: {
reffering_table_name_1: {
foreign_key_1: [
{row_pk_1: value, row_pk_2: value},
...
],
...
},
...
},
pk_id_2: {...},
...
}
Primer klica:
$ pggraph get_rows_references --config_path config.hw.local.ini --table flights --ids 1,2,3
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
'ticket_no': '0005432816945'},
{'flight_id': 1,
'ticket_no': '0005432816941'}]}},
2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
'ticket_no': '0005433101832'},
{'flight_id': 2,
'ticket_no': '0005433101864'},
{'flight_id': 2,
'ticket_no': '0005432919715'}]}},
3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
'ticket_no': '0005432817560'},
{'flight_id': 3,
'ticket_no': '0005432817568'},
{'flight_id': 3,
'ticket_no': '0005432817559'}]}}}
Uporaba v kodi
Poleg izvajanja v konzoli je knjižnico mogoče uporabiti v kodi Python. Spodaj so prikazani primeri klicev v interaktivnem okolju iPython.
Arhiviranje tabele
>>> from pg_graph.main import setup_logging
>>> setup_logging(log_level='DEBUG')
>>> from pg_graph.api import PgGraphApi
>>> api = PgGraphApi('config.hw.local.ini')
>>> api.archive_table('flights', [4,5])
2020-06-20 23:12:08 INFO: flights - START
2020-06-20 23:12:08 INFO: flights - start archive_recursive 2 rows (depth=0)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: ticket_flights - ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')
2020-06-20 23:12:08 DEBUG: SQL('SELECT flight_id, ticket_no FROM bookings.ticket_flights WHERE (flight_id) IN (%s, %s)')
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 3 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 3 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 3 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: flights - archive_by_ids 2 rows by flight_id
2020-06-20 23:12:09 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.flights_archive (LIKE bookings.flights)')
2020-06-20 23:12:09 DEBUG: DELETE FROM flights by flight_id - 2 rows
2020-06-20 23:12:09 DEBUG: INSERT INTO flights_archive - 2 rows
2020-06-20 23:12:09 INFO: flights - END
Poiščite odvisnosti za določeno tabelo
>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> res = api.get_table_references('flights')
>>> pprint(res)
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')]},
'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}
Iskanje referenc na nize z navedenim primarnim ključem
>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> rows = api.get_rows_references('flights', [1,2,3])
>>> pprint(rows)
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
'ticket_no': '0005432816945'},
{'flight_id': 1,
'ticket_no': '0005432816941'}]}},
2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
'ticket_no': '0005433101832'},
{'flight_id': 2,
'ticket_no': '0005433101864'},
{'flight_id': 2,
'ticket_no': '0005432919715'}]}},
3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
'ticket_no': '0005432817560'},
{'flight_id': 3,
'ticket_no': '0005432817568'},
{'flight_id': 3,
'ticket_no': '0005432817559'}]}}}
Izvorna koda knjižnice je na voljo na
Vesela bom komentarjev, zavez in predlogov.
Na vprašanja bom poskušal odgovoriti po svojih najboljših močeh tukaj in v skladišču.
Vir: www.habr.com