Cross replication between PostgreSQL and MySQL

Cross replication between PostgreSQL and MySQL

I will briefly talk about cross-replication between PostgreSQL and MySQL, as well as methods for setting up cross-replication between these two database servers. Cross-replicated databases are commonly referred to as homogeneous databases, and this is a convenient method for migrating from one RDBMS server to another.

PostgreSQL and MySQL databases are traditionally considered relational, but with additional extensions they offer NoSQL capabilities. Here we will discuss replication between PostgreSQL and MySQL from a relational database management perspective.

We will not describe all the internals, only the basic principles, so that you get an idea of ​​setting up replication between database servers, advantages, limitations and use cases.

Typically, replication between two identical database servers is done either in binary mode or by queries between a master (aka publisher, master, or active) and a slave (subscriber, standby, or passive). The purpose of replication is to provide a real-time copy of the master database on the standby side. In this case, data is transferred from master to slave, that is, from active to passive, because replication is performed only in one direction. But you can set up replication between the two databases in both directions, so that data is transferred from the slave to the master in an active-active configuration. All of this, including cascading replication, is possible between two or more identical database servers. Active-active or active-passive configuration depends on the need, the availability of such capabilities in the initial configuration or the use of external tuning solutions and existing trade-offs .

The described configuration is possible between different database servers. The server can be configured to receive replicated data from another database server and still retain real-time snapshots of the replicated data. MySQL and PostgreSQL offer most of these configurations natively or through third party extensions, including binary log methods, disk locks, and statement and row based methods.

Cross-replication between MySQL and PostgreSQL is needed for a one-time migration from one database server to another. These databases use different protocols, so you won't be able to link them directly. To facilitate data exchange, you can use an external open source tool, such as pg_chameleon.

What is pg_chameleon

pg_chameleon is a replication system from MySQL to PostgreSQL in Python 3. It uses the mysql-replication open source library, also in Python. Row images are retrieved from MySQL tables and stored as JSONB objects in the PostgreSQL database, and then decoded by the pl/pgsql function and played back in the PostgreSQL database.

Features of pg_chameleon

Multiple MySQL schemas from the same cluster can be replicated to a single PostgreSQL target database with a one-to-many configuration
The source and target schema names cannot be the same.
Replication data can be retrieved from a MySQL cascading replica.
Tables that cannot replicate or generate errors are excluded.
Each replication function is controlled by daemons.
Control with parameters and configuration files based on YAML.

Example

Host
vm1
vm2

OS version
CentOS Linux 7.6 x86_64
CentOS Linux 7.5 x86_64

Database server version
MySQL 5.7.26
PostgreSQL 10.5

DB port
3306
5433

IP Address
192.168.56.102
192.168.56.106

First, prepare all the necessary components for installing pg_chameleon. This example has Python 3.6.8 installed, which creates a virtual environment and activates it.

$> wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz
$> tar -xJf Python-3.6.8.tar.xz
$> cd Python-3.6.8
$> ./configure --enable-optimizations
$> make altinstall

Once Python3.6 is successfully installed, the rest of the requirements need to be completed, such as creating and activating a virtual environment. Also, the pip module is updated to the latest version and used to install pg_chameleon. The commands below intentionally install pg_chameleon 2.0.9, even though the latest version is 2.0.10. This is necessary to avoid new bugs in the updated version.

$> python3.6 -m venv venv
$> source venv/bin/activate
(venv) $> pip install pip --upgrade
(venv) $> pip install pg_chameleon==2.0.9

We then call pg_chameleon (chameleon is a command) with the set_configuration_files argument to enable pg_chameleon and create the default configuration directories and files.

(venv) $> chameleon set_configuration_files
creating directory /root/.pg_chameleon
creating directory /root/.pg_chameleon/configuration/
creating directory /root/.pg_chameleon/logs/
creating directory /root/.pg_chameleon/pid/
copying configuration  example in /root/.pg_chameleon/configuration//config-example.yml

We now make a copy of config-example.yml as default.yml so that it becomes the default configuration file. A sample configuration file for this example is provided below.

$> cat default.yml
---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''

# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"

#postgres  destination connection
pg_conn:
  host: "192.168.56.106"
  port: "5433"
  user: "usr_replica"
  password: "pass123"
  database: "db_replica"
  charset: "utf8"

sources:
  mysql:
    db_conn:
      host: "192.168.56.102"
      port: "3306"
      user: "usr_replica"
      password: "pass123"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      world_x: pgworld_x
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

The configuration file in this example is a sample pg_chameleon file with minor modifications to match the source and target environments, and below is an overview of the various sections of the configuration file.

The default.yml configuration file has a section of global settings (global settings) where you can control settings such as the location of the lock file, the location of the logs, the period for storing logs, etc. Next comes the type override section, where the set rules for overriding types during replication. The default example uses a type override rule that converts tinyint(1) to a boolean. In the next section, we specify the details of connecting to the target database. In our case, this is a PostgreSQL database, denoted as pg_conn. In the last section, we specify the source data, that is, the connection parameters of the source database, the mapping scheme of the source and target databases, tables to be skipped, timeout, memory, package size. Note that "sources" is plural, meaning we can add multiple source databases to the same target to set up a many-to-one configuration.

The world_x database in the example contains 4 tables with rows that the MySQL community suggests for the example. It can be downloaded here. The sample database comes as a tar and compressed archive with instructions for creating and importing strings.

A special user with the same name usr_replica is created in the MySQL and PostgreSQL databases. MySQL grants it additional read access to all replicated tables.

mysql> CREATE USER usr_replica ;
mysql> SET PASSWORD FOR usr_replica='pass123';
mysql> GRANT ALL ON world_x.* TO 'usr_replica';
mysql> GRANT RELOAD ON *.* to 'usr_replica';
mysql> GRANT REPLICATION CLIENT ON *.* to 'usr_replica';
mysql> GRANT REPLICATION SLAVE ON *.* to 'usr_replica';
mysql> FLUSH PRIVILEGES;

On the PostgreSQL side, a db_replica database is created that will accept changes from the MySQL database. The usr_replica user in PostgreSQL is automatically configured as the owner of the two schemas pgworld_x and sch_chameleon, which contain the actual replicated tables and replication catalog tables, respectively. The create_replica_schema argument is responsible for automatic configuration, as you will see below.

postgres=# CREATE USER usr_replica WITH PASSWORD 'pass123';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE

The MySQL database is configured with some changes to make it ready for replication as shown below. You will need to restart the database server for the changes to take effect.

$> vi /etc/my.cnf
binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

Now it is important to check the connection to both database servers so that there are no problems when executing the pg_chameleon commands.

On the PostgreSQL node:

$> mysql -u usr_replica -Ap'admin123' -h 192.168.56.102 -D world_x

On MySQL node:

$> psql -p 5433 -U usr_replica -h 192.168.56.106 db_replica

The next three pg_chameleon (chameleon) commands prepare the environment, add the source, and initialize the replica. The create_replica_schema argument to pg_chameleon creates a default schema (sch_chameleon) and a replication schema (pgworld_x) in the PostgreSQL database, as we said. The add_source argument adds a source database to the configuration by reading the configuration file (default.yml), which in our case is mysql, and init_replica initializes the configuration based on the settings in the configuration file.

$> chameleon create_replica_schema --debug
$> chameleon add_source --config default --source mysql --debug
$> chameleon init_replica --config default --source mysql --debug

The output of these three commands clearly indicate their success. All failures or syntax errors are indicated in simple and understandable messages with hints on how to fix problems.

Finally, we start replication with start_replica and get a success message.

$> chameleon start_replica --config default --source mysql 
output: Starting the replica process for source mysql

Replication status can be queried with the show_status argument, and errors can be viewed with the show_errors argument.

Result.

As we have already said, daemons manage every replication function. To view them, query the process table with the Linux ps command, as shown below.

Result.

Replication is not considered configured until we test it in real time, as shown below. We create a table, insert a couple of records into the MySQL database, and call the sync_tables argument in pg_chameleon to update the daemons and replicate the table with the records to the PostgreSQL database.

mysql> create table t1 (n1 int primary key, n2 varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t1 values (1,'one');
Query OK, 1 row affected (0.00 sec)
mysql> insert into t1 values (2,'two');
Query OK, 1 row affected (0.00 sec)

$> chameleon sync_tables --tables world_x.t1 --config default --source mysql
Sync tables process for source mysql started.

To validate the test results, we query the table from the PostgreSQL database and output the rows.

$> psql -p 5433 -U usr_replica -d db_replica -c "select * from pgworld_x.t1";
 n1 |  n2
----+-------
  1 | one
  2 | two

If we are migrating, the following pg_chameleon commands will be the end of the migration. The commands should be run after we have verified that the rows of all target tables have been replicated, resulting in a neatly migrated PostgreSQL database with no references to the source database or replication scheme (sch_chameleon).

$> chameleon stop_replica --config default --source mysql 
$> chameleon detach_replica --config default --source mysql --debug

You can optionally delete the original configuration and replication schema with the following commands.

$> chameleon drop_source --config default --source mysql --debug
$> chameleon drop_replica_schema --config default --source mysql --debug

Benefits of pg_chameleon

Easy setup and configuration.
Convenient troubleshooting and anomaly detection with clear error messages.
Additional special tables can be added to replication after initialization without changing the rest of the configuration.
It is possible to set up multiple source databases for one target database, and this is very handy if you are merging data from one or more MySQL databases into one PostgreSQL database.
You can choose not to replicate selected tables.

Disadvantages of pg_chameleon

Only supported with MySQL 5.5 and above as the source and PostgreSQL 9.5 and above as the target database.
Each table must have a primary or unique key, otherwise the tables are initialized in the init_replica process but not replicated.
One-way replication - only from MySQL to PostgreSQL. Therefore, it is only suitable for the active-passive scheme.
The source can only be a MySQL database, and support for a PostgreSQL database as a source is only experimental and limited (learn more here)

Totals for pg_chameleon

The replication method in pg_chameleon is great for migrating a database from MySQL to PostgreSQL. The big downside is that replication is only one-way, so database professionals are unlikely to want to use it for anything other than migration. But the problem of one-way replication can be solved with another open source tool - SymmetricDS.

Read more in the official documentation here. Command line help can be found here.

Overview of SymmetricDS

SymmetricDS is an open source tool that replicates any database to any other common database: Oracle, MongoDB, PostgreSQL, MySQL, SQL Server, MariaDB, DB2, Sybase, Greenplum, Informix, H2, Firebird and other cloud database instances, for example Redshift, and Azure, etc. Available features: database and file synchronization, multi-master database replication, filtered synchronization, transformation, and others. This is a Java tool and requires the standard release JRE or JDK (version 8.0 or higher). Here you can record data changes on triggers in the source database and send them to the corresponding target database as batches.

SymmetricDS Capabilities

The tool is platform independent, meaning two or more different databases can exchange data.
Relational databases are synchronized by recording data changes, and databases based on file systems use file synchronization.
Two-way replication using push and pull methods based on a set of rules.
Data transmission is possible over secure networks and networks with low bandwidth.
Automatic recovery when resuming operation of nodes after a failure and automatic conflict resolution.
Cloud compatible and efficient extension APIs.

Example

SymmetricDS can be configured in one of two ways:
A master (parent) node that centrally coordinates data replication between two slave (child) nodes, and data exchange between child nodes is carried out only through the parent.
An active node (node ​​1) can communicate for replication with another active node (node ​​2) without an intermediary.

In both options, data exchange occurs using Push and Pull. In this example, we will consider the active-active configuration. It's too long to describe the whole architecture, so study guideto learn more about the SymmetricDS appliance.

Installing SymmetricDS is easy: download the open source zip file hence and extract it wherever you want. The following table lists the installation location and version of SymmetricDS in this example, as well as database versions, Linux versions, IP addresses, and ports for both nodes.

Host
vm1
vm2

OS version
CentOS Linux 7.6 x86_64
CentOS Linux 7.6 x86_64

Database server version
MySQL 5.7.26
PostgreSQL 10.5

DB port
3306
5832

IP Address
192.168.1.107
192.168.1.112

SymmetricDS Version
SymmetricDS 3.9
SymmetricDS 3.9

SymmetricDS installation path
/usr/local/symmetric-server-3.9.20
/usr/local/symmetric-server-3.9.20

SymmetricDS host name
corp-000
store-001

Here we install SymmetricDS in /usr/local/symmetric-server-3.9.20 and various subdirectories and files will be stored there. We are interested in the subdirectories samples and engines. The samples directory contains sample configuration files with node properties, as well as sample SQL scripts to quickly start the demo.

In the samples directory, we see three configuration files with node properties - the name shows the nature of the node in a particular scheme.

corp-000.properties
store-001.properties
store-002.properties

SymmetricDS has all the necessary configuration files for a basic 3 node schema (option 1) and the same files can be used for a 2 node schema (option 2). Copy the required configuration file from the samples directory to engines on the vm1 host. It turns out like this:

$> cat engines/corp-000.properties
engine.name=corp-000
db.driver=com.mysql.jdbc.Driver
db.url=jdbc:mysql://192.168.1.107:3306/replica_db?autoReconnect=true&useSSL=false
db.user=root
db.password=admin123
registration.url=
sync.url=http://192.168.1.107:31415/sync/corp-000
group.id=corp
external.id=000

This node is called corp-000 in the SymmetricDS configuration and the database connection is handled by the mysql jdbc driver which uses the connection string above and login credentials. We are connecting to the replica_db database and tables will be created during schema creation. sync.url shows the link to the node to synchronize.

Node 2 on host vm2 is configured as store-001 and the rest is specified in the node.properties file below. The store-001 node runs the PostgreSQL database, and pgdb_replica is the database to replicate. registration.url allows vm2 host to contact vm1 host and get configuration details from it.

$> cat engines/store-001.properties
engine.name=store-001
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://192.168.1.112:5832/pgdb_replica
db.user=postgres
db.password=admin123
registration.url=http://192.168.1.107:31415/sync/corp-000
group.id=store
external.id=001

The completed SymmetricDS example contains options for setting up two-way replication between two database servers (two nodes). The steps below are performed on host vm1 (corp-000) which will create a sample schema with 4 tables. Then executing create-sym-tables with the symadmin command creates directory tables where the rules and direction of replication between nodes will be stored. Finally, the sample data is loaded into the tables.

vm1$> cd /usr/local/symmetric-server-3.9.20/bin
vm1$> ./dbimport --engine corp-000 --format XML create_sample.xml
vm1$> ./symadmin --engine corp-000 create-sym-tables
vm1$> ./dbimport --engine corp-000 insert_sample.sql

In the example, the item and item_selling_price tables are automatically configured to replicate from corp-000 to store-001, and the sale tables (sale_transaction and sale_return_line_item) are automatically configured to replicate from store-001 to corp-000. We now create a schema in the PostgreSQL database on host vm2 (store-001) to prepare it to receive data from corp-000.

vm2$> cd /usr/local/symmetric-server-3.9.20/bin
vm2$> ./dbimport --engine store-001 --format XML create_sample.xml

Be sure to check that the MySQL database on vm1 has sample tables and a SymmetricDS catalog table. Note that the SymmetricDS system tables (with the sym_ prefix) are currently only available on the corp-000 node, because that's where we ran the create-sym-tables command and will manage replication. And in the database at the store-001 node there will be only 4 example tables without data.

All. The environment is ready to run sym server processes on both nodes as shown below.

vm1$> cd /usr/local/symmetric-server-3.9.20/bin
vm1$> sym 2>&1 &

Log entries are sent to the background log file (symmetric.log) in the log folder in the directory where SymmetricDS is installed, as well as to the standard output. The sym server can now be initiated on the store-001 node.

vm2$> cd /usr/local/symmetric-server-3.9.20/bin
vm2$> sym 2>&1 &

If you run the sym server process on the vm2 host, it will create the SymmetricDS catalog tables in the PostgreSQL database as well. If you run the sym server process on both nodes, they will coordinate with each other to replicate data from corp-000 to store-001. If after a few seconds we query all 4 tables on both sides, we will see that the replication was successful. Or you can send the bootstrap to store-001 from corp-000 with the following command.

vm1$> ./symadmin --engine corp-000 reload-node 001

At this point, a new record is inserted into the item table in the MySQL database on node corp-000 (host: vm1) and can be checked for replication to the PostgreSQL database on node store-001 (host: vm2). We see a Pull operation to move data from corp-000 to store-001.

mysql> insert into item values ('22000002','Jelly Bean');
Query OK, 1 row affected (0.00 sec)

vm2$> psql -p 5832 -U postgres pgdb_replica -c "select * from item"
 item_id  |   name
----------+-----------
 11000001 | Yummy Gum
 22000002 | Jelly Bean
(2 rows)

To perform a push operation to move data from store-001 to corp-000, we insert a record into the sale_transaction table and verify that replication has been completed.

Result.

We see a successful setup of two-way replication of the example tables between the MySQL and PostgreSQL databases. To set up replication for new user tables, perform the following steps. We create table t1 for example and set up its replication rules as follows. So we set up only replication from corp-000 to store-001.

mysql> create table  t1 (no integer);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into sym_channel (channel_id,create_time,last_update_time) 
values ('t1',current_timestamp,current_timestamp);
Query OK, 1 row affected (0.01 sec)

mysql> insert into sym_trigger (trigger_id, source_table_name,channel_id,
last_update_time, create_time) values ('t1', 't1', 't1', current_timestamp,
current_timestamp);
Query OK, 1 row affected (0.01 sec)

mysql> insert into sym_trigger_router (trigger_id, router_id,
Initial_load_order, create_time,last_update_time) values ('t1',
'corp-2-store-1', 1, current_timestamp,current_timestamp);
Query OK, 1 row affected (0.01 sec)

The configuration is then notified of the schema change, i.e. the addition of a new table, by using the symadmin command with the sync-triggers argument, which recreates the triggers to match the table definitions. Send-schema is run to send schema changes to store-001, and table t1 is replicated.

vm1$> ./symadmin -e corp-000 --node=001 sync-triggers    
vm1$> ./symadmin send-schema -e corp-000 --node=001 t1

Benefits of SymmetricDS

Easy installation and configuration, including a ready-made set of files with parameters for creating a scheme with three or two nodes.
Cross-platform databases and platform independence, including servers, laptops and mobile devices.
Replicate any database to any other database locally, on the WAN, or in the cloud.
Ability to work optimally with a couple of databases or several thousand for easy replication.
Paid version with GUI and excellent support.

Disadvantages of SymmetricDS

You have to manually define the rules and direction of replication on the command line via SQL statements to load the catalog tables, which can be inconvenient.
Setting up many tables for replication can be tedious unless you use scripts to create SQL statements that define the rules and direction of replication.
There is too much information in the logs, and sometimes you need to clean up the log file so that it does not take up too much space.

Summary of SymmetricDS

SymmetricDS allows you to set up two-way replication between two, three, or even several thousand nodes in order to replicate and synchronize files. It is a unique tool that performs many tasks on its own, such as automatic recovery of data after a long downtime on a node, secure and efficient communication between nodes over HTTPS, automatic conflict management based on a set of rules, etc. SymmetricDS replicates between any databases, therefore, it can be used for a wide variety of scenarios, including migration, upgrade, distribution, filtering, and data transformation across platforms.

The example is based on the official quick guide by SymmetricDS. IN user manual Describes in detail the various concepts involved in setting up replication with SymmetricDS.

Source: habr.com

Add a comment