PgGraph is in hulpprogramma foar it argivearjen en finen fan tabelôfhinklikens yn PostgreSQL

PgGraph is in hulpprogramma foar it argivearjen en finen fan tabelôfhinklikens yn PostgreSQL
Hjoed wol ik Habr-lêzers presintearje mei in hulpprogramma skreaun yn Python foar wurkjen mei tabelôfhinklikens yn 'e PostgreSQL DBMS.

De API fan it hulpprogramma is ienfâldich en bestiet út trije metoaden:

  • argyf_tabel - rekursyf argivearjen / wiskjen fan rigen mei oantsjutte primêre kaaien
  • get_table_references - sykje nei ôfhinklikens foar in tabel (sil tabellen sjen litte ferwiisd troch de spesifisearre en dejingen dy't it ferwize)
  • get_rows_references - sykje nei rigen yn oare tabellen dy't ferwize nei oantsjutte rigen yn 'e winske tabel

prehistoarje

Myn namme is Oleg Borzov, ik bin in ûntwikkelder yn it CRM-team foar hypoteekbehearders yn Domklik.

De haaddatabase fan ús CRM-systeem is ien fan 'e grutste yn termen fan folume yn it bedriuw. It is ek ien fan 'e âldste: it ferskynde by de start fan it projekt, doe't de beammen grut wiene, wie Domklik in opstart, en ynstee fan in mikrotsjinst op in modieuze Python asynchronous ramt wie d'r in enoarme monolith yn PHP.

De oergong fan PHP nei Python wie tige lang en fereaske simultane stipe fan beide systemen, wat it ûntwerp fan 'e databank beynfloede.

As gefolch hawwe wy in databank mei in grut oantal tige ferbûne en enoarme tabellen mei in bosk yndeksen foar ferskate soarten fragen. Dit alles hat in negative ynfloed op de prestaasjes fan 'e databank: troch grutte tabellen en in protte relaasjes tusken har, wurdt de kompleksiteit fan fragen hieltyd grutter, wat benammen kritysk is foar de meast laden tabellen.

Om de lading op de databank te ferminderjen, hawwe wy besletten in skript te skriuwen dat âlde records soe oerdrage fan 'e meast volumineuze en laden tabellen nei argivearre (bygelyks fan task в task_archive).

Dizze taak wurdt yngewikkeld troch it grutte oantal relaasjes tusken tabellen: gewoan ferpleatse rigen fan task в task_archive is net genôch, dêrfoar moatte jo itselde rekursyf dwaan mei al dy ferwizings task tabellen.

Ik sil demonstrearje mei in foarbyld demo databank fan de side postgrespro.ru:

PgGraph is in hulpprogramma foar it argivearjen en finen fan tabelôfhinklikens yn PostgreSQL
Litte wy sizze dat wy records fan in tabel moatte wiskje Flights. Postgres lit ús dit net sa krekt dwaan: wy moatte earst records fan alle ferwizingstabellen wiskje, en sa fierder rekursyf nei tabellen dêr't gjinien nei ferwiist.

Yn ús foarbyld by Flights ferwiist Ticket_flights, en op har - Boarding_passes.

Dêrom moatte jo it wiskje yn dizze folchoarder:

  1. Wy krije de primêre kaaien (PK) wearden fan rigen yn Ticket_flights, dy't ferwize nei de rigen om yn te wiskjen Flights.
  2. Wy krije PK rigen Boarding_passes, dy't ferwize nei Ticket_flights.
  3. Wy wiskje rigen troch PK út stap 2 yn 'e tabel Boarding_passes.
  4. Wiskje rigels troch PK fan stap 1 yn Ticket_flights.
  5. Fuortsmite linen út Flights.

It resultaat wie in hulpprogramma neamd PgGraph, dat wy besletten om iepen boarne te meitsjen.

Hoe brûke

It hulpprogramma stipet twa manieren fan gebrûk:

  • Skilje fanút de kommandorigel (pggraph …).
  • Gebrûk yn Python-koade (klasse PgGraphApi).

Ynstallaasje en konfiguraasje

Earst moatte jo it hulpprogramma ynstallearje fan it Pypi-repository:

pip3 install pggraph

Meitsje dan in config.ini-bestân op 'e lokale masine mei de konfiguraasje fan' e databank en it argyfskript:

[db]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = postgres
schema = public ; Необязательный параметр, указано значение по умолчанию

[archive]  ; Данный раздел заполнять необязательно, ниже указаны значения по умолчанию
is_debug = false
chunk_size = 1000
max_depth = 20
to_archive = true
archive_suffix = 'archive'

Run út konsole

parameters

$ pggraph -h
usage: pggraph action [-h] --table TABLE [--ids IDS] [--config_path CONFIG_PATH]
positional arguments:
  action        required action: archive_table, get_table_references, get_rows_references

optional arguments:
  -h, --help                    show this help message and exit
  --table TABLE                 table name
  --ids IDS                     primary key ids, separated by comma, e.g. 1,2,3
  --config_path CONFIG_PATH     path to config.ini
  --log_path LOG_PATH           path to log dir
  --log_level LOG_LEVEL         log level (debug, info, error)

Posisjonele arguminten:

  • action - ferplichte aksje: archive_table, get_table_references of get_rows_references.

Arguminten neamd:

  • --config_path - paad nei it konfiguraasjetriem;
  • --table - in tabel wêrmei jo in aksje moatte útfiere;
  • --ids - list mei id skieden troch komma's, bygelyks, 1,2,3 (opsjoneel parameter);
  • --log_path - paad nei de map foar logs (opsjonele parameter, standert - thúsmap);
  • --log_level - lognivo (opsjonele parameter, standert is INFO).

Kommando foarbylden

Argivearjen fan in tabel

De wichtichste funksje fan it nut is gegevensargivearring, d.w.s. oerbringen fan rigen fan 'e haadtabel nei de argyftabel (bygelyks fan' e tabel boeken в boeken_argyf).

Wiskje sûnder argivearjen wurdt ek stipe: hjirfoar moatte jo de parameter yn config.ini ynstelle to_archive = false).

Fereaske parameters - config_path, tabel en ids.

Nei lansearring sille records rekursyf wiske wurde ids yn 'e tafel table en yn alle tabellen dy't der nei ferwize.

$ pggraph archive_table --config_path config.hw.local.ini --table flights --ids 1,2,3
2020-06-20 19:27:44 INFO: flights - START
2020-06-20 19:27:44 INFO: flights - start archive_recursive 3 rows (depth=0)
2020-06-20 19:27:44 INFO:       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 19:27:44 INFO:               START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: flights - archive_by_ids 3 rows by id
2020-06-20 19:27:44 INFO: flights - END

Fine ôfhinklikens foar in spesifisearre tabel

Funksje om ôfhinklikens te finen fan in spesifisearre tabel table. Fereaske parameters - config_path и table.

Nei lansearring sil in wurdboek op it skerm werjûn wurde, wêrby't:

  • in_refs - in wurdboek fan tabellen dy't ferwize nei in opjûne, wêrby't de kaai de namme fan 'e tabel is, de wearde is in list mei bûtenlânske kaaiobjekten (pk_main - primêre kaai yn 'e haadtabel, pk_ref - primêre kaai yn 'e referinsjetabel, fk_ref - de namme fan 'e kolom dy't de frjemde kaai is foar de boarnetabel);
  • out_refs - in wurdboek fan tabellen dêr't men nei ferwiist.

$ pggraph get_table_references --config_path config.hw.local.ini --table flights
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='ticket_no, flight_id', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

It finen fan ferwizings nei snaren mei de oantsjutte primêre kaai

Funksje om te sykjen nei rigen yn oare tabellen dy't ferwize nei rigen fia Foreign Key ids tafels table. Fereaske parameters - config_path, table и ids.

Nei lansearring sil in wurdboek mei de folgjende struktuer op it skerm werjûn wurde:

{
	pk_id_1: {
		reffering_table_name_1: {
			foreign_key_1: [
				{row_pk_1: value, row_pk_2: value},
				...
			], 
			...
		},
		...
	},
	pk_id_2: {...},
	...
}

Foarbyld oprop:

$ pggraph get_rows_references --config_path config.hw.local.ini --table flights --ids 1,2,3
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

Gebrûk yn koade

Neist it útfieren fan it yn 'e konsole, kin de bibleteek brûkt wurde yn Python-koade. Foarbylden fan petearen yn 'e iPython ynteraktive omjouwing wurde hjirûnder werjûn.

Argivearjen fan in tabel

>>> from pg_graph.main import setup_logging
>>> setup_logging(log_level='DEBUG')
>>> from pg_graph.api import PgGraphApi
>>> api = PgGraphApi('config.hw.local.ini')
>>> api.archive_table('flights', [4,5])
2020-06-20 23:12:08 INFO: flights - START
2020-06-20 23:12:08 INFO: flights - start archive_recursive 2 rows (depth=0)
2020-06-20 23:12:08 INFO: 	START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 	ticket_flights - ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')
2020-06-20 23:12:08 DEBUG: 	SQL('SELECT flight_id, ticket_no FROM bookings.ticket_flights WHERE (flight_id) IN (%s, %s)')
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 3 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 3 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 3 rows
2020-06-20 23:12:08 INFO: 	END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: flights - archive_by_ids 2 rows by flight_id
2020-06-20 23:12:09 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.flights_archive (LIKE bookings.flights)')
2020-06-20 23:12:09 DEBUG: DELETE FROM flights by flight_id - 2 rows
2020-06-20 23:12:09 DEBUG: INSERT INTO flights_archive - 2 rows
2020-06-20 23:12:09 INFO: flights - END

Fine ôfhinklikens foar in spesifisearre tabel

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> res = api.get_table_references('flights')
>>> pprint(res)
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

It finen fan ferwizings nei snaren mei de oantsjutte primêre kaai

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> rows = api.get_rows_references('flights', [1,2,3])
>>> pprint(rows)
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

De boarnekoade fan 'e biblioteek is beskikber op GitHub ûnder MIT-lisinsje, lykas yn 'e repository PyPI.

Ik sil bliid wêze mei opmerkings, commits en suggestjes.

Ik sil besykje fragen te beantwurdzjen nei it bêste fan myn fermogen hjir en yn 'e repository.

Boarne: www.habr.com

Add a comment