Building a high availability PostgreSQL cluster using Patroni, etcd, HAProxy

As it happened, at the time the task was set I did not have enough experience to design and run this solution on my own. So I started googling.

I don't know what the catch is, but time and again I have run into the fact that even if you follow a tutorial step by step and prepare the same environment as the author's, it still never works. When I ran into this yet again, I decided that once everything worked, I would write my own tutorial. One that will definitely work.

Guides on the Internet

The Internet does not suffer from a lack of guides, tutorials, step-by-step walkthroughs and the like. Still, I was given the task of developing a solution for conveniently organizing and building a PostgreSQL failover cluster, the main requirements being streaming replication from the master server to all replicas and automatic failover when the master fails.

At this stage, the stack of technologies used was determined:

  • PostgreSQL as a DBMS
  • Patroni as a clustering solution
  • etcd as distributed storage for Patroni
  • HAproxy for organizing a single entry point for applications using the database

Installation

What follows is the build of a high-availability PostgreSQL cluster using Patroni, etcd, and HAProxy.

All operations were performed on virtual machines with Debian 10 OS installed.

etcd

I do not recommend installing etcd on the same machines where Patroni and PostgreSQL will run, since etcd is very sensitive to disk performance. But for educational purposes, we will do just that.
Install etcd:

#!/bin/bash
apt-get update
apt-get install etcd

Add the following content to the /etc/default/etcd file:

#[Member]
# ETCD_NAME is the hostname of this machine.
ETCD_NAME=datanode1
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"

# All IP addresses must be real. The LISTEN_PEER, LISTEN_CLIENT and
# ADVERTISE URLs below must use the IP address of this host.
ETCD_LISTEN_PEER_URLS="http://192.168.0.143:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.0.143:2379,http://127.0.0.1:2379"

#[Cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.0.143:2380"
# ETCD_INITIAL_CLUSTER lists every machine in the etcd cluster.
ETCD_INITIAL_CLUSTER="datanode1=http://192.168.0.143:2380,datanode2=http://192.168.0.144:2380,datanode3=http://192.168.0.145:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.0.143:2379"

Run the command

systemctl restart etcd
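Once etcd has been restarted on all three machines, it is worth confirming that each node reports itself healthy. etcd answers GET /health on the client port with a small JSON document. A minimal sketch of checking that reply; the payload below is a sample of what a healthy node returns, not a captured live response:

```python
import json

def etcd_node_healthy(payload: str) -> bool:
    """Parse the JSON body returned by GET /health on an etcd client port."""
    return json.loads(payload).get("health") == "true"

# Sample reply, e.g. from: curl http://192.168.0.143:2379/health
sample_reply = '{"health": "true"}'
print(etcd_node_healthy(sample_reply))  # True
```

Run the same check against each of the three client URLs; any node that does not answer with "health": "true" needs attention before continuing.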

PostgreSQL 9.6 + patroni

The first thing to do is set up three virtual machines on which to install the necessary software. Once the machines are ready, if you are following this tutorial, you can run this simple script, which will do (almost) everything for you. Run it as root.

Please note that the script uses the PostgreSQL 9.6 version, this is due to the internal requirements of our company. The solution has not been tested on other PostgreSQL versions.

#!/bin/bash
apt-get install gnupg -y
echo "deb http://apt.postgresql.org/pub/repos/apt/ buster-pgdg main" >> /etc/apt/sources.list
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
apt-get update
apt-get install postgresql-9.6 python3-pip python3-dev libpq-dev -y
systemctl stop postgresql
pip3 install --upgrade pip
pip install psycopg2
pip install patroni[etcd]
echo "
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple

User=postgres
Group=postgres

ExecStart=/usr/local/bin/patroni /etc/patroni.yml

KillMode=process

TimeoutSec=30

Restart=no

[Install]
WantedBy=multi-user.target
" > /etc/systemd/system/patroni.service
mkdir -p /data/patroni
chown postgres:postgres /data/patroni
chmod 700 /data/patroni
touch /etc/patroni.yml

Next, place the following content into the /etc/patroni.yml file you just created, changing the IP addresses everywhere to the addresses you use.
Pay attention to the comments in this YAML, and change the addresses to your own on each machine in the cluster.

/etc/patroni.yml

scope: pgsql # must be the same on all nodes
namespace: /cluster/ # must be the same on all nodes
name: postgres1 # must be different on every node

restapi:
    listen: 192.168.0.143:8008 # the address of the node this file is on
    connect_address: 192.168.0.143:8008 # the address of the node this file is on

etcd:
    hosts: 192.168.0.143:2379,192.168.0.144:2379,192.168.0.145:2379 # list all your nodes here if you installed etcd on the same machines

# this section (bootstrap) will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
# and all other cluster members will use it as a `global configuration`
bootstrap:
    dcs:
        ttl: 100
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true
            use_slots: true
            parameters:
                    wal_level: replica
                    hot_standby: "on"
                    wal_keep_segments: 5120
                    max_wal_senders: 5
                    max_replication_slots: 5
                    checkpoint_timeout: 30

    initdb:
    - encoding: UTF8
    - data-checksums
    - locale: en_US.UTF8
    # the initial pg_hba.conf must contain the addresses of ALL machines used in the cluster
    pg_hba:
    - host replication postgres ::1/128 md5
    - host replication postgres 127.0.0.1/32 md5
    - host replication postgres 192.168.0.143/32 md5
    - host replication postgres 192.168.0.144/32 md5
    - host replication postgres 192.168.0.145/32 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 192.168.0.143:5432 # the address of the node this file is on
    connect_address: 192.168.0.143:5432 # the address of the node this file is on
    data_dir: /data/patroni # this directory is created, with the right permissions, by the script above
    bin_dir: /usr/lib/postgresql/9.6/bin # the path to your postgresql binaries
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: postgres
            password: postgres
        superuser:
            username: postgres
            password: postgres
    create_replica_methods:
        basebackup:
            checkpoint: 'fast'
    parameters:
        unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

The script must be run on all three machines in the cluster, just as the above configuration must be placed in the /etc/patroni.yml file on all machines.
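Since only the node name and the IP addresses differ between the three copies of /etc/patroni.yml, it can help to derive the per-node values from a single table. A hypothetical helper, using the node names and addresses from the example configuration above:

```python
# Node table from the example configuration above.
NODES = {
    "postgres1": "192.168.0.143",
    "postgres2": "192.168.0.144",
    "postgres3": "192.168.0.145",
}

def node_settings(name: str) -> dict:
    """Return the values that differ per node in /etc/patroni.yml."""
    ip = NODES[name]
    return {
        "name": name,
        "restapi_listen": f"{ip}:8008",
        "postgresql_listen": f"{ip}:5432",
        # The etcd host list is identical on every node.
        "etcd_hosts": ",".join(f"{host}:2379" for host in NODES.values()),
    }

print(node_settings("postgres2")["restapi_listen"])  # 192.168.0.144:8008
```

Everything else in the file (scope, namespace, the bootstrap section) must stay identical across the cluster.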

When you have done these operations on all cluster machines, run the following command on any one of them:

systemctl start patroni

Do not start the postgresql systemd service: Patroni bootstraps and manages the PostgreSQL process itself, and a separately started instance would conflict with it over the data directory and port.

Wait about 30 seconds, then run the same command on the rest of the machines in the cluster.
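Once Patroni is running everywhere, `patronictl -c /etc/patroni.yml list` shows each member and its role. As a sketch of what to look for, here is how the leader could be picked out of rows shaped like that output; the rows below are illustrative, not captured from a real cluster:

```python
# Illustrative rows shaped like `patronictl list` output.
members = [
    {"Member": "postgres1", "Host": "192.168.0.143", "Role": "Replica", "State": "running"},
    {"Member": "postgres2", "Host": "192.168.0.144", "Role": "Leader",  "State": "running"},
    {"Member": "postgres3", "Host": "192.168.0.145", "Role": "Replica", "State": "running"},
]

# Exactly one member should hold the Leader role at any time.
leader = next(m for m in members if m["Role"] == "Leader")
print(leader["Member"], leader["Host"])  # postgres2 192.168.0.144
```

A healthy cluster has one Leader and all members in the running state; if no Leader appears after startup, check the Patroni logs and etcd connectivity.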

HAproxy

We use the wonderful HAproxy to provide a single point of entry. The master server will always be available at the address of the machine where HAproxy is deployed.

To avoid making the HAProxy machine a single point of failure, we will run it in a Docker container; in the future it can be moved into a Kubernetes cluster, making our failover setup even more reliable.

Create a directory to hold two files, Dockerfile and haproxy.cfg, and change into it.

Dockerfile

FROM ubuntu:latest

RUN apt-get update \
    && apt-get install -y haproxy rsyslog \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir /run/haproxy

COPY haproxy.cfg /etc/haproxy/haproxy.cfg

CMD haproxy -f /etc/haproxy/haproxy.cfg && tail -F /var/log/haproxy.log

Be careful: the last three lines of the haproxy.cfg file must list the addresses of your machines. HAProxy queries Patroni's REST API: the master always answers the health check with HTTP 200, while a replica always answers 503.

haproxy.cfg

global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql1 192.168.0.143:5432 maxconn 100 check port 8008
    server postgresql2 192.168.0.144:5432 maxconn 100 check port 8008
    server postgresql3 192.168.0.145:5432 maxconn 100 check port 8008

From the directory containing both of our files, execute the commands to build the image and then run the container with the necessary ports published:

docker build -t my-haproxy .
docker run -d -p5000:5000 -p7000:7000 my-haproxy 

Now, when you open the address of your HAProxy machine in a browser on port 7000, you will see the statistics for your cluster.

The master server will be shown as UP, while the replicas will be shown as DOWN. This is normal: they are in fact working, but they appear this way because they answer HAProxy's checks with 503. Thanks to this, we always know exactly which of the three servers is the current master.
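The routing decision HAProxy makes here can be stated in a few lines: exactly one backend answers the Patroni health check with 200, and that one receives all the traffic. A sketch of that logic, assuming the three-node cluster above:

```python
def pick_master(health_checks):
    """Return the single server answering 200, or None if there isn't exactly one."""
    up = [server for server, status in health_checks.items() if status == 200]
    return up[0] if len(up) == 1 else None

# The master answers 200; replicas answer 503 (shown as DOWN in the stats page).
checks = {"postgresql1": 503, "postgresql2": 200, "postgresql3": 503}
print(pick_master(checks))  # postgresql2
```

When a failover happens, Patroni promotes a replica, that node starts answering 200, and HAProxy shifts connections to it without any client-side changes.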

Conclusion

You are gorgeous! In just 30 minutes, you have deployed a solid database cluster with streaming replication and automatic failover. If you plan to use this solution, read the official Patroni documentation, especially the part about the patronictl utility, which provides convenient access to managing your cluster.

Congratulations!

Source: habr.com
