Gaur Habr irakurleei Python-en idatzitako erabilgarritasun bat aurkeztu nahi diet PostgreSQL DBMSko taulen mendekotasunekin lan egiteko.
Erabilgarritasunaren APIa sinplea da eta hiru metodo ditu:
- artxibo_taula - Errenkadak artxibatzea/ezabatzea errekurtsiboa gako nagusi zehaztuekin
- lortu_taularen_erreferentziak β Bilatu taula baten menpekotasunak (zehaztutakoak eta erreferentzia egiten duten taulak erakutsiko ditu)
- lortu_errendak_erreferentziak - bilatu nahi den taulan zehaztutako errenkadak aipatzen dituzten beste tauletan errenkadak
historiaurrea
Nire izena Oleg Borzov da, Domklik-eko hipoteka-maileguetako kudeatzaileentzako CRM taldeko garatzailea naiz.
Gure CRM sistemaren datu-base nagusia konpainiako bolumenari dagokionez handienetakoa da. Zaharrenetakoa ere bada: proiektuaren abiaraztean bertan agertu zen, zuhaitzak handiak zirenean, Domklik startup bat zen, eta modan dagoen Python esparru asinkrono batean mikrozerbitzu baten ordez monolito erraldoi bat zegoen PHPn.
PHP-tik Python-erako trantsizioa oso luzea izan zen eta bi sistemen aldibereko laguntza behar izan zuen, eta horrek datu-basearen diseinuan eragin zuen.
Ondorioz, datu-base bat dugu oso konektaturik eta taula erraldoi ugari dituena, kontsulta mota ezberdinetarako indize mordoa duena. Horrek guztiak datu-basearen errendimenduari eragiten dio negatiboki: taula handiak eta haien arteko harreman mordoa direla eta, kontsulten konplexutasuna etengabe handitzen ari da, eta hori bereziki kritikoa da gehien kargatzen diren mahaietarako.
Datu-basearen karga murrizteko, gidoi bat idaztea erabaki genuen, erregistro zaharrak transferituko zituen taula handien eta kargatuenetatik artxibatutakoetara (adibidez, task
Π² task_archive
).
Zeregin hau korapilatsua da taulen arteko erlazio-kopuru handiagatik: besterik gabe, mugitu errenkadak task
Π² task_archive
ez da nahikoa, aurretik gauza bera egin behar duzu errekurtsiboki erreferentzia egiten duten guztiekin task
mahaiak.
Adibide batekin erakutsiko dut
Demagun taula bateko erregistroak ezabatu behar ditugula Flights
. Postgres-ek ez digu horrelakorik egiten utziko: lehenik eta behin erreferentziazko taula guztietatik erregistroak ezabatu behar ditugu, eta abar modu errekurtsiboan inork aipatzen ez dituen tauletaraino.
Gure adibidean at Flights
aipatzen du Ticket_flights
, eta haren gainean - Boarding_passes
.
Hori dela eta, ordena honetan ezabatu behar duzu:
- Errenkaden lehen gakoen (PK) balioak jasotzen ditugu
Ticket_flights
, ezabatu beharreko errenkadak aipatzen direnakFlights
. - PK errenkadak lortzen ditugu
Boarding_passes
, aipatzen dutenakTicket_flights
. - PK-ren errenkadak ezabatzen ditugu taulako 2. urratsetik
Boarding_passes
. - Ezabatu PKren lerroak 1. urratsetik
Ticket_flights
. - Lerroak kentzen
Flights
.
Emaitza PgGraph izeneko utilitate bat izan zen, kode irekia egitea erabaki genuena.
Nola erabili
Utilitateak bi erabilera modu onartzen ditu:
- Deitu komando lerrotik (
pggraph β¦
). - Erabilera Python kodean (klasea
PgGraphApi
).
Instalazioa eta konfigurazioa
Lehenik eta behin utilitatea instalatu behar duzu Pypi biltegitik:
pip3 install pggraph
Ondoren, sortu config.ini fitxategi bat tokiko makinan datu-basearen konfigurazioarekin eta artxibatzeko scriptarekin:
[db]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = postgres
schema = public ; ΠΠ΅ΠΎΠ±ΡΠ·Π°ΡΠ΅Π»ΡΠ½ΡΠΉ ΠΏΠ°ΡΠ°ΠΌΠ΅ΡΡ, ΡΠΊΠ°Π·Π°Π½ΠΎ Π·Π½Π°ΡΠ΅Π½ΠΈΠ΅ ΠΏΠΎ ΡΠΌΠΎΠ»ΡΠ°Π½ΠΈΡ
[archive] ; ΠΠ°Π½Π½ΡΠΉ ΡΠ°Π·Π΄Π΅Π» Π·Π°ΠΏΠΎΠ»Π½ΡΡΡ Π½Π΅ΠΎΠ±ΡΠ·Π°ΡΠ΅Π»ΡΠ½ΠΎ, Π½ΠΈΠΆΠ΅ ΡΠΊΠ°Π·Π°Π½Ρ Π·Π½Π°ΡΠ΅Π½ΠΈΡ ΠΏΠΎ ΡΠΌΠΎΠ»ΡΠ°Π½ΠΈΡ
is_debug = false
chunk_size = 1000
max_depth = 20
to_archive = true
archive_suffix = 'archive'
Exekutatu kontsolatik
Parametroak
$ pggraph -h
usage: pggraph action [-h] --table TABLE [--ids IDS] [--config_path CONFIG_PATH]
positional arguments:
action required action: archive_table, get_table_references, get_rows_references
optional arguments:
-h, --help show this help message and exit
--table TABLE table name
--ids IDS primary key ids, separated by comma, e.g. 1,2,3
--config_path CONFIG_PATH path to config.ini
--log_path LOG_PATH path to log dir
--log_level LOG_LEVEL log level (debug, info, error)
Posizio-argudioak:
action
- Beharrezko ekintza:archive_table
,get_table_references
edoget_rows_references
.
Izendatutako argumentuak:
--config_path
β konfigurazio fitxategirako bidea;--table
β ekintza bat egin behar duzun taula;--ids
- ID zerrenda komaz bereizita, adibidez,1,2,3
(aukerako parametroa);--log_path
β erregistroen karpetarako bidea (aukerako parametroa, lehenespenez β hasierako karpeta);--log_level
β Erregistro-maila (aukerako parametroa, lehenetsia INFO da).
Agindu adibideak
Taula bat artxibatzea
Utilitatearen funtzio nagusia datuak artxibatzea da, hau da. errenkadak taula nagusitik artxiboko taulara transferitzea (adibidez, taulatik liburuak Π² liburuak_artxiboa).
Artxibatu gabe ezabatzea ere onartzen da: horretarako parametroa ezarri behar duzu config.ini-n to_archive = faltsu).
Beharrezko parametroak - config_path, taula eta IDak.
Abiarazi ondoren, erregistroak modu errekurtsiboan ezabatuko dira ids
taulan table
eta aipatzen duten taula guztietan.
$ pggraph archive_table --config_path config.hw.local.ini --table flights --ids 1,2,3
2020-06-20 19:27:44 INFO: flights - START
2020-06-20 19:27:44 INFO: flights - start archive_recursive 3 rows (depth=0)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: ticket_flights - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: flights - archive_by_ids 3 rows by id
2020-06-20 19:27:44 INFO: flights - END
Bilatu zehaztutako taula baterako menpekotasunak
Zehaztutako taula baten menpekotasunak bilatzeko funtzioa table
. Beharrezko parametroak - config_path
ΠΈ table
.
Abiarazi ondoren, hiztegi bat bistaratuko da pantailan, non:
in_refs
β Emandako bati erreferentzia egiten dion taulen hiztegia, non gakoa taularen izena den, balioa Kanpoko Gako objektuen zerrenda bat den (pk_main
- taula nagusiko gako nagusia,pk_ref
- erreferentzia-taulan lehen gakoa,fk_ref
β iturburu-taularen atzerriko gako den zutabearen izena);out_refs
β honek aipatzen duen taulen hiztegia.
$ pggraph get_table_references --config_path config.hw.local.ini --table flights
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='ticket_no, flight_id', fk_ref='flight_id')]},
'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}
Zehaztutako Lehen Gakoarekin kateen erreferentziak aurkitzea
Atzerriko gakoaren bidez errenkadak aipatzen dituzten beste tauletan errenkadak bilatzeko funtzioa ids
taulak table
. Beharrezko parametroak - config_path
, table
ΠΈ ids
.
Abiarazi ondoren, egitura hau duen hiztegi bat bistaratuko da pantailan:
{
pk_id_1: {
reffering_table_name_1: {
foreign_key_1: [
{row_pk_1: value, row_pk_2: value},
...
],
...
},
...
},
pk_id_2: {...},
...
}
Deiaren adibidea:
$ pggraph get_rows_references --config_path config.hw.local.ini --table flights --ids 1,2,3
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
'ticket_no': '0005432816945'},
{'flight_id': 1,
'ticket_no': '0005432816941'}]}},
2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
'ticket_no': '0005433101832'},
{'flight_id': 2,
'ticket_no': '0005433101864'},
{'flight_id': 2,
'ticket_no': '0005432919715'}]}},
3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
'ticket_no': '0005432817560'},
{'flight_id': 3,
'ticket_no': '0005432817568'},
{'flight_id': 3,
'ticket_no': '0005432817559'}]}}}
Erabilera kodean
Kontsolan exekutatzeaz gain, liburutegia Python kodean erabil daiteke. iPython ingurune interaktiboko deien adibideak behean erakusten dira.
Taula bat artxibatzea
>>> from pg_graph.main import setup_logging
>>> setup_logging(log_level='DEBUG')
>>> from pg_graph.api import PgGraphApi
>>> api = PgGraphApi('config.hw.local.ini')
>>> api.archive_table('flights', [4,5])
2020-06-20 23:12:08 INFO: flights - START
2020-06-20 23:12:08 INFO: flights - start archive_recursive 2 rows (depth=0)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: ticket_flights - ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')
2020-06-20 23:12:08 DEBUG: SQL('SELECT flight_id, ticket_no FROM bookings.ticket_flights WHERE (flight_id) IN (%s, %s)')
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 3 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 3 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 3 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: flights - archive_by_ids 2 rows by flight_id
2020-06-20 23:12:09 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.flights_archive (LIKE bookings.flights)')
2020-06-20 23:12:09 DEBUG: DELETE FROM flights by flight_id - 2 rows
2020-06-20 23:12:09 DEBUG: INSERT INTO flights_archive - 2 rows
2020-06-20 23:12:09 INFO: flights - END
Bilatu zehaztutako taula baterako menpekotasunak
>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> res = api.get_table_references('flights')
>>> pprint(res)
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')]},
'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}
Zehaztutako Lehen Gakoarekin kateen erreferentziak aurkitzea
>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> rows = api.get_rows_references('flights', [1,2,3])
>>> pprint(rows)
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
'ticket_no': '0005432816945'},
{'flight_id': 1,
'ticket_no': '0005432816941'}]}},
2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
'ticket_no': '0005433101832'},
{'flight_id': 2,
'ticket_no': '0005433101864'},
{'flight_id': 2,
'ticket_no': '0005432919715'}]}},
3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
'ticket_no': '0005432817560'},
{'flight_id': 3,
'ticket_no': '0005432817568'},
{'flight_id': 3,
'ticket_no': '0005432817559'}]}}}
Liburutegiaren iturburu kodea helbidean dago eskuragarri
Pozik egongo naiz iruzkinak, konpromisoak eta iradokizunak.
Ahal dudan neurrian galderei erantzuten saiatuko naiz hemen eta biltegian.
Iturria: www.habr.com