PgGraph minangka sarana kanggo arsip lan nemokake dependensi tabel ing PostgreSQL

PgGraph minangka sarana kanggo arsip lan nemokake dependensi tabel ing PostgreSQL
Dina iki aku pengin menehi pembaca Habr karo sarana sing ditulis ing Python kanggo nggarap dependensi tabel ing DBMS PostgreSQL.

API utilitas iku prasaja lan kasusun saka telung cara:

  • arsip_tabel - arsip rekursif / mbusak larik karo Keys utami tartamtu
  • njaluk_tabel_referensi - nelusuri dependensi kanggo tabel (bakal nuduhake tabel sing dirujuk dening sing ditemtokake lan sing ngrujuk)
  • get_rows_references - nelusuri larik ing tabel liyane sing ngrujuk baris kasebut ing tabel sing dikarepake

prasejarah

Jenengku Oleg Borzov, aku dadi pangembang ing tim CRM kanggo manajer pinjaman hipotek ing Domklik.

Database utama sistem CRM kita minangka salah sawijining sing paling gedhe babagan volume ing perusahaan. Iku uga salah siji sing paling tuwa: muncul ing peluncuran proyek kasebut, nalika wit-witan gedhe, Domklik minangka wiwitan, lan tinimbang microservice ing kerangka bedo Python modis ana monolit ageng ing PHP.

Transisi saka PHP kanggo Python dawa banget lan mbutuhake dhukungan simultaneous saka loro sistem, kang mengaruhi desain database.

AkibatΓ©, kita duwe database karo nomer akeh banget disambungake lan tabel ageng karo Bunch saka indeks kanggo macem-macem jinis pitakon. Kabeh iki duwe pengaruh negatif marang kinerja database: amarga tabel gedhe lan akeh hubungan ing antarane, kerumitan pitakon saya tambah akeh, sing penting banget kanggo tabel sing paling akeh dimuat.

Kanggo nyuda beban ing database, kita mutusake kanggo nulis skrip sing bakal nransfer cathetan lawas saka tabel sing paling akeh lan dimuat menyang sing diarsipake (contone, saka task Π² task_archive).

Tugas iki rumit amarga akeh hubungan antarane tabel: mung mindhah baris saka task Π² task_archive ora cukup, sadurunge sampeyan kudu nindakake rekursif sing padha karo kabeh referensi kasebut task tabel.

Aku bakal nduduhake karo conto database demo saka situs postgrespro.ru:

PgGraph minangka sarana kanggo arsip lan nemokake dependensi tabel ing PostgreSQL
Ayo kita kudu mbusak cathetan saka tabel Flights. Postgres ora ngidini kita nindakake iki kaya mangkene: kita kudu mbusak cathetan saka kabeh tabel referensi, lan sateruse kanthi rekursif mudhun menyang tabel sing ora dirujuk dening sapa wae.

Ing conto kita ing Flights nuduhake Ticket_flights, lan dheweke - Boarding_passes.

Mulane, sampeyan kudu mbusak ing urutan iki:

  1. Kita entuk nilai kunci utama (PK) saka baris Ticket_flights, sing nuduhake baris sing arep dibusak Flights.
  2. Kita entuk baris PK Boarding_passes, kang nuduhake Ticket_flights.
  3. Kita mbusak larik dening PK saka langkah 2 ing meja Boarding_passes.
  4. Mbusak baris dening PK saka langkah 1 ing Ticket_flights.
  5. Mbusak garis saka Flights.

Asil kasebut minangka sarana sing disebut PgGraph, sing kita mutusake kanggo nggawe open source.

Cara nggunakake

Utilitas ndhukung rong mode panggunaan:

  • Telpon saka baris perintah (pggraph …).
  • Dianggo ing kode Python (class PgGraphApi).

Instalasi lan konfigurasi

Pisanan sampeyan kudu nginstal sarana saka repositori Pypi:

pip3 install pggraph

Banjur nggawe file config.ini ing mesin lokal kanthi konfigurasi database lan script arsip:

[db]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = postgres
schema = public ; ΠΠ΅ΠΎΠ±ΡΠ·Π°Ρ‚Π΅Π»ΡŒΠ½Ρ‹ΠΉ ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€, ΡƒΠΊΠ°Π·Π°Π½ΠΎ Π·Π½Π°Ρ‡Π΅Π½ΠΈΠ΅ ΠΏΠΎ ΡƒΠΌΠΎΠ»Ρ‡Π°Π½ΠΈΡŽ

[archive]  ; Π”Π°Π½Π½Ρ‹ΠΉ Ρ€Π°Π·Π΄Π΅Π» Π·Π°ΠΏΠΎΠ»Π½ΡΡ‚ΡŒ Π½Π΅ΠΎΠ±ΡΠ·Π°Ρ‚Π΅Π»ΡŒΠ½ΠΎ, Π½ΠΈΠΆΠ΅ ΡƒΠΊΠ°Π·Π°Π½Ρ‹ значСния ΠΏΠΎ ΡƒΠΌΠΎΠ»Ρ‡Π°Π½ΠΈΡŽ
is_debug = false
chunk_size = 1000
max_depth = 20
to_archive = true
archive_suffix = 'archive'

Mbukak saka console

paramèter

$ pggraph -h
usage: pggraph action [-h] --table TABLE [--ids IDS] [--config_path CONFIG_PATH]
positional arguments:
  action        required action: archive_table, get_table_references, get_rows_references

optional arguments:
  -h, --help                    show this help message and exit
  --table TABLE                 table name
  --ids IDS                     primary key ids, separated by comma, e.g. 1,2,3
  --config_path CONFIG_PATH     path to config.ini
  --log_path LOG_PATH           path to log dir
  --log_level LOG_LEVEL         log level (debug, info, error)

Argumen posisi:

  • action - tumindak sing dibutuhake: archive_table, get_table_references utawa get_rows_references.

Argumentasi jenenge:

  • --config_path - path menyang file konfigurasi;
  • --table - Tabel karo sampeyan kudu nindakake tumindak;
  • --ids - dhaptar id sing dipisahake karo koma, contone, 1,2,3 (parameter opsional);
  • --log_path - path menyang folder kanggo log (parameter opsional, minangka standar - folder ngarep);
  • --log_level - tingkat logging (parameter opsional, standar yaiku INFO).

Conto printah

Arsip meja

Fungsi utama utilitas yaiku pengarsipan data, yaiku. nransfer baris saka tabel utama menyang tabel arsip (contone, saka tabel buku Π² arsip_buku).

Pambusakan tanpa arsip uga didhukung: iki sampeyan kudu nyetel parameter ing config.ini to_archive = palsu).

Parameter sing dibutuhake - config_path, tabel lan id.

Sawise diluncurake, rekaman bakal dibusak kanthi rekursif ids ing meja table lan ing kabeh tabel sing nuduhake iku.

$ pggraph archive_table --config_path config.hw.local.ini --table flights --ids 1,2,3
2020-06-20 19:27:44 INFO: flights - START
2020-06-20 19:27:44 INFO: flights - start archive_recursive 3 rows (depth=0)
2020-06-20 19:27:44 INFO:       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 19:27:44 INFO:               START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO:                       START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:                       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:               boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:               END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO:       ticket_flights - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO:       END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: flights - archive_by_ids 3 rows by id
2020-06-20 19:27:44 INFO: flights - END

Golek dependensi kanggo tabel tartamtu

Fungsi kanggo nemokake dependensi saka tabel tartamtu table. Parameter sing dibutuhake - config_path ΠΈ table.

Sawise diluncurake, kamus bakal ditampilake ing layar, ing ngendi:

  • in_refs - kamus tabel referensi sing diwenehake, ing ngendi tombol kasebut minangka jeneng tabel, nilai kasebut minangka dhaptar obyek Kunci Asing (pk_main - kunci utama ing tabel utama, pk_ref - kunci utama ing tabel referensi, fk_ref - jeneng kolom sing dadi kunci manca kanggo tabel sumber);
  • out_refs - kamus tabel iki nuduhake.

$ pggraph get_table_references --config_path config.hw.local.ini --table flights
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='ticket_no, flight_id', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

Nemokake referensi kanggo senar kanthi Kunci Utama sing ditemtokake

Fungsi kanggo nelusuri larik ing tabel liyane sing nuduhake larik liwat Foreign Key ids tabel table. Parameter sing dibutuhake - config_path, table ΠΈ ids.

Sawise diluncurake, kamus kanthi struktur ing ngisor iki bakal ditampilake ing layar:

{
	pk_id_1: {
		reffering_table_name_1: {
			foreign_key_1: [
				{row_pk_1: value, row_pk_2: value},
				...
			], 
			...
		},
		...
	},
	pk_id_2: {...},
	...
}

Tuladha telpon:

$ pggraph get_rows_references --config_path config.hw.local.ini --table flights --ids 1,2,3
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

Panggunaan ing kode

Saliyane mlaku ing console, perpustakaan bisa digunakake ing kode Python. Conto telpon ing lingkungan interaktif iPython kapacak ing ngisor iki.

Arsip meja

>>> from pg_graph.main import setup_logging
>>> setup_logging(log_level='DEBUG')
>>> from pg_graph.api import PgGraphApi
>>> api = PgGraphApi('config.hw.local.ini')
>>> api.archive_table('flights', [4,5])
2020-06-20 23:12:08 INFO: flights - START
2020-06-20 23:12:08 INFO: flights - start archive_recursive 2 rows (depth=0)
2020-06-20 23:12:08 INFO: 	START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 	ticket_flights - ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')
2020-06-20 23:12:08 DEBUG: 	SQL('SELECT flight_id, ticket_no FROM bookings.ticket_flights WHERE (flight_id) IN (%s, %s)')
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: 	ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 23:12:08 INFO: 		START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: 		boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: 		boarding_passes - archive_by_fk 3 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: 		SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: 		DELETE FROM boarding_passes by FK flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 INFO: 		END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: 	ticket_flights - archive_by_ids 3 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: 	SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: 	DELETE FROM ticket_flights by flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 DEBUG: 	INSERT INTO ticket_flights_archive - 3 rows
2020-06-20 23:12:08 INFO: 	END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: flights - archive_by_ids 2 rows by flight_id
2020-06-20 23:12:09 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.flights_archive (LIKE bookings.flights)')
2020-06-20 23:12:09 DEBUG: DELETE FROM flights by flight_id - 2 rows
2020-06-20 23:12:09 DEBUG: INSERT INTO flights_archive - 2 rows
2020-06-20 23:12:09 INFO: flights - END

Golek dependensi kanggo tabel tartamtu

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> res = api.get_table_references('flights')
>>> pprint(res)
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')]},
 'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
              'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
                           ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}

Nemokake referensi kanggo senar kanthi Kunci Utama sing ditemtokake

>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> rows = api.get_rows_references('flights', [1,2,3])
>>> pprint(rows)
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
                                       'ticket_no': '0005432816945'},
                                      {'flight_id': 1,
                                       'ticket_no': '0005432816941'}]}},
 2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
                                       'ticket_no': '0005433101832'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005433101864'},
                                      {'flight_id': 2,
                                       'ticket_no': '0005432919715'}]}},
 3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
                                       'ticket_no': '0005432817560'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817568'},
                                      {'flight_id': 3,
                                       'ticket_no': '0005432817559'}]}}}

Kode sumber perpustakaan kasedhiya ing GitHub ing lisensi MIT, uga ing repositori PyPI.

Aku bakal seneng menehi komentar, komitmen lan saran.

Aku bakal nyoba mangsuli pitakon kanthi paling apik ing kene lan ing gudang.

Source: www.habr.com

Add a comment