How to build a hybrid cloud with Kubernetes that can replace DBaaS

My name is Petr Zaitsev. I am the CEO and founder of Percona, and I want to talk about:

  • how we got from open source solutions to Database as a Service;
  • what approaches there are to deploying databases in the cloud;
  • how Kubernetes can replace DBaaS, eliminating vendor dependency while keeping the simplicity of a DBMS as a service.

This article is based on a talk given at @Databases Meetup by Mail.ru Cloud Solutions & Tarantool. A video recording of the talk is also available.


How open source led to Database as a Service in the cloud

I've been working with open source since the late 90s. Twenty years ago, using open source software, such as databases, was not easy. You had to download the source code, patch it, compile it, and only then use it.

Then open source went through a series of simplifications:

  • .tar.gz source archives with INSTALL instructions, which had to be compiled;
  • packages with dependencies, such as .deb and .rpm, where you only need to install a set of packages;
  • package repositories, such as APT and YUM, which install packages and their dependencies automatically;
  • solutions such as Docker and Snap, which deliver software as self-contained packages with no external dependencies.

As a result, open source software has become easier to use, and the barrier to entry for developing such applications has dropped as well.

At the same time, unlike the situation 20 years ago, when everyone was a build expert, now most developers cannot build the tools they use from source.

In fact, this is not bad, because:

  1. We can use software that is more complex but also more user-friendly. For example, a browser is convenient to use, but it includes many open source components and is inconvenient to build from scratch.
  2. More people can become developers of open source and other software, businesses use more software, and the demand for it grows.

The flip side is that the next step in simplification involves cloud solutions, and this leads to a certain vendor lock-in, that is, being tied to one supplier. We use simple solutions, and providers build them from open source components, but in practice these solutions are tied to one of the large clouds. That is, the easiest and fastest way to deploy open source (and software compatible with it) is in the cloud, through a proprietary API.

When it comes to databases in the cloud, there are two approaches:

  1. Assemble the database infrastructure as in a regular data center. That is, take standard building blocks such as compute and storage, install Linux and a database on them, and configure everything yourself.
  2. Use Database as a Service, where the provider offers a ready-made database inside the cloud.

DBaaS is now a rapidly growing market, because such a service lets developers work directly with databases and minimizes routine work. The provider takes care of high availability, easy scaling, database patching, backups and performance tuning.

Two types of Database as a Service based on open source and an alternative in the form of Kubernetes

There are two types of Database as a Service for open databases:

  1. A standard open source product packaged in an administration backend for ease of deployment and management.
  2. An advanced commercial solution with various add-ons that is compatible with open source.

Both options make it harder to migrate between clouds and reduce the portability of data and applications. For example, although different clouds support essentially the same standard MySQL, there are significant differences between them in operation, performance, backup and so on. Migrating from one cloud to another can be tricky, especially for complex applications.

This raises the question: is it possible to get the convenience of Database as a Service, but as a simple open source solution?

The bad news is that, unfortunately, there are no such solutions on the market yet. The good news is that there is Kubernetes, which allows you to implement such solutions.

Kubernetes is an operating system for the cloud or data center that allows you to deploy and manage an application on multiple servers in a cluster, rather than on a single host.

Kubernetes is now the leader in this category of software. There were many different solutions for such tasks, but it was Kubernetes that became the standard. Many companies that used to work on alternative solutions are now focusing on adapting their products to support Kubernetes.

In addition, Kubernetes is a universal solution that is supported in private, public and hybrid clouds of many vendors, for example: AWS, Google Cloud, Microsoft Azure, Mail.ru Cloud Solutions.

How Kubernetes works with databases

Kubernetes was originally designed for stateless applications that process data but do not store anything, such as microservices or web applications. Databases are on the other end of the spectrum, meaning they are stateful applications. And Kubernetes was not originally intended for such applications.

However, features have appeared in Kubernetes in recent years that make it possible to run databases and other stateful applications:

  1. StatefulSet: a set of primitives that give Pods a stable identity and ordered startup and shutdown, including handling of Pod shutdown events and graceful shutdown (a predictable shutdown of the application).
  2. Persistent Volumes: data stores that are attached to Pods (the basic Kubernetes management objects) and keep their data when a Pod is restarted.
  3. Operator Framework: the ability to create components that manage databases and other stateful applications distributed across many nodes.
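To make the first two primitives more concrete, here is a minimal sketch of a StatefulSet that requests one persistent volume per Pod. It is an illustration only and not how the Percona operators are configured: the name demo-db, the image tag, the password and the storage size are all placeholders I made up. It also starts three unrelated database instances; turning them into a coordinated cluster is exactly the part an Operator adds.

```shell
#!/bin/sh
# Write a minimal, illustrative StatefulSet manifest to a file.
# All names and values (demo-db, image, password, sizes) are placeholders.
manifest="demo-statefulset.yaml"

cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-db
spec:
  serviceName: demo-db            # headless Service giving each Pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: demo-db
  template:
    metadata:
      labels:
        app: demo-db
    spec:
      terminationGracePeriodSeconds: 60   # time allowed for a graceful shutdown
      containers:
        - name: db
          image: percona:8.0      # assumed image tag, for illustration only
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: change-me
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:           # one PersistentVolumeClaim is created per Pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
EOF

echo "Manifest written to $manifest"
# To try it on a real cluster (needs kubectl and a headless Service named demo-db):
#   kubectl apply -f "$manifest"
```

The script only writes the file, so it is safe to run without a cluster.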

Public clouds already host large Database as a Service offerings that run on Kubernetes in the backend, for example CockroachCloud, InfluxDB and PlanetScale. So a database on Kubernetes is not just theoretically possible, it works in practice.

Percona has two open source solutions for Kubernetes:

  1. Kubernetes Operator for Percona Server for MongoDB.
  2. Kubernetes Operator for Percona XtraDB Cluster, a service that is compatible with MySQL and provides high availability and consistency. You can also run a single node if high availability is not needed, for example for a dev database.

Kubernetes users can be divided into two groups. Some use Kubernetes Operators directly; these are mainly advanced users who have a good understanding of how the technology works. Others run Kubernetes only as a backend; these users want something like Database as a Service and do not want to delve into the nuances of Kubernetes. For the second group, we have another open source solution, the Percona DBaaS CLI Tool. It is an experimental solution for those who want an open source DBaaS based on Kubernetes without a deep understanding of the technology.

How to run DBaaS from Percona on Google Kubernetes Engine

Google Kubernetes Engine, in my opinion, is one of the most full-featured implementations of Kubernetes. It is available in many regions of the world and has a simple and convenient command line tool (SDK) that lets you script the platform rather than manage it manually.

In order for our DBaaS to work, we need the following components:

  1. kubectl.
  2. Google Cloud SDK.
  3. Percona DBaaS CLI.
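Before going through the installation steps, it can be handy to see which of the three tools are already present. The sketch below assumes the binaries are called kubectl, gcloud and percona-dbaas; the helper name missing_tools is my own.

```shell
#!/bin/sh
# Print the names of the given commands that are not found on PATH.
# Assumed binary names: kubectl, gcloud (Google Cloud SDK),
# percona-dbaas (Percona DBaaS CLI).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools kubectl gcloud percona-dbaas)
if [ -n "$missing" ]; then
    echo "Not installed yet:"
    echo "$missing"
else
    echo "All required tools are installed."
fi
```

Running it again after each installation step shows what is still left to install.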

Installing kubectl

Install the package for your operating system; the example below is for Ubuntu. More details are in the kubectl installation documentation.

sudo apt-get update && sudo apt-get install -y apt-transport-https gnupg2
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl

Installing Google Cloud SDK

Install the software package in the same way. More details are in the Google Cloud SDK installation documentation.

# Add the Cloud SDK distribution URI as a package source
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

# Import the Google Cloud Platform public key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

# Update the package list and install the Cloud SDK
sudo apt-get update && sudo apt-get install google-cloud-sdk

Installing Percona DBaaS CLI

Install from the Percona repositories. The Percona DBaaS CLI Tool is still experimental, so it is in the experimental repository, which must be enabled separately, even if you already have the Percona repositories installed.

Details are in the Percona DBaaS CLI documentation.

Installation steps:

  1. Set up the Percona repositories with the percona-release tool. First, download and install the official percona-release package from Percona:
    wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
    sudo dpkg -i percona-release_latest.generic_all.deb
  2. Enable the experimental component of the tools repository:
    sudo percona-release enable tools experimental
  3. Install the percona-dbaas-cli package:
    sudo apt-get update
    sudo apt-get install percona-dbaas-cli

Configuring the components

The settings are described in more detail in the documentation.

First, sign in to your Google account. Google Cloud allows one user to have many independent projects, so you also need to select the working project by its project ID:

gcloud auth login
gcloud config set project hidden-brace-236921

Next, we create a cluster. For the demo, I created a Kubernetes cluster of just three nodes, which is the minimum required for high availability:

gcloud container clusters create --zone us-central1-a your-cluster-name --cluster-version 1.15 --num-nodes=3

The following kubectl command gives our current user the required privileges:

kubectl create clusterrolebinding cluster-admin-binding-$USER \
  --clusterrole=cluster-admin --user=$(gcloud config get-value core/account)

We then create a namespace and make it active. A namespace is, roughly speaking, also like a project or environment, but inside a Kubernetes cluster. It is independent of Google Cloud projects:

kubectl create namespace my-namespace
kubectl config set-context --current --namespace=my-namespace
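The setup steps above can be collected into one script. The sketch below is one possible way to do it, not part of the Percona tooling: the run helper and the DRY_RUN switch are my own additions, and PROJECT_ID, CLUSTER_NAME, ZONE and NAMESPACE are placeholders to replace with your values. By default it only prints the commands, so it can be run safely without a cloud account. The login and role binding steps are left out because they are interactive or depend on the signed-in account, and the cluster version flag is omitted so that the platform default is used.

```shell
#!/bin/sh
# Collect the setup steps into one script. With DRY_RUN=1 (the default here)
# commands are only printed instead of executed.
# PROJECT_ID, CLUSTER_NAME, ZONE and NAMESPACE are placeholders.
DRY_RUN="${DRY_RUN:-1}"
PROJECT_ID="${PROJECT_ID:-my-project-id}"
CLUSTER_NAME="${CLUSTER_NAME:-your-cluster-name}"
ZONE="${ZONE:-us-central1-a}"
NAMESPACE="${NAMESPACE:-my-namespace}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_cluster() {
    run gcloud config set project "$PROJECT_ID"
    run gcloud container clusters create "$CLUSTER_NAME" --zone "$ZONE" --num-nodes=3
    run kubectl create namespace "$NAMESPACE"
    run kubectl config set-context --current --namespace="$NAMESPACE"
}

setup_cluster
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands for real.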

Starting the cluster

After we have gone through these few steps, we can start a cluster of three nodes with this simple command:

# percona-dbaas mysql create-db example
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     example
Resource Endpoint: example-proxysql.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              Nt9YZquajW7nfVXTTrP
Status:            ready

How to connect to a cluster

By default, the database is only available inside Kubernetes. That is, it is not reachable from the server where you ran the create-db command. To make it available, for example for tests with a client, forward the port with kubectl port-forward, running it in the background:

kubectl port-forward svc/example-proxysql 3306:3306 &

Then connect with your MySQL client:

mysql -h 127.0.0.1 -P 3306 -uroot -pNt9YZquajW7nfVXTTrP

Advanced cluster management commands

Database on public IP

If you want a more permanent solution for cluster availability, you can get an external IP address. In this case, the database will be accessible from anywhere. This is less secure, but often more convenient. To get an external IP, use the following command:

# percona-dbaas mysql create-db exposed \
  --options="proxysql.serviceType=LoadBalancer"
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     exposed
Resource Endpoint: 104.154.133.197
Port:              3306
User:              root
Pass:              k0QVxTr8EVfgyCLYse
Status:            ready

To access database please run the following command:
mysql -h 104.154.133.197 -P 3306 -uroot -pk0QVxTr8EVfgyCLYse

Explicitly setting a password

Instead of the system randomly generating a password, you can set the password explicitly:

# percona-dbaas mysql create-db withpw --password=mypassword
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     withpw
Resource Endpoint: withpw-proxysql.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              mypassword
Status:            ready

I show the script output in human readable format, but JSON format is also supported.
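The option for JSON output is not shown in this article, so here is a small sketch that pulls fields out of the human readable format instead, for example to build a connection command in a script. The get_field helper is my own, and it assumes every line has the "Key: value" shape shown in the examples; a value that itself contains a colon followed by a space would be cut short.

```shell
#!/bin/sh
# Extract one field from the human-readable create-db output read on stdin.
# Usage: get_field "Resource Endpoint" < output.txt
get_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample output copied from the password example above.
details='Provider:          k8s
Engine:            pxc
Resource Name:     withpw
Resource Endpoint: withpw-proxysql.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              mypassword
Status:            ready'

host=$(printf '%s\n' "$details" | get_field "Resource Endpoint")
port=$(printf '%s\n' "$details" | get_field "Port")
user=$(printf '%s\n' "$details" | get_field "User")

# -p without a value makes the mysql client prompt for the password.
echo "mysql -h $host -P $port -u$user -p"
```

In a real script, the sample text would be replaced by the captured output of the create-db command.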

Turn off high availability

You can turn off high availability with the following command to deploy a single node:

# percona-dbaas mysql create-db singlenode \
  --options="proxysql.enabled=false,allowUnsafeConfigurations=true,pxc.size=1"
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     singlenode
Resource Endpoint: singlenode-pxc.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              22VqFD96mvRnmPMGg
Status:            ready

This is a solution for test tasks: you can bring MySQL up quickly and easily, test it, and then tear it down, or use it for development.
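When several settings are passed through --options, it is easy to slip a stray space into the list. The sketch below builds the string from separate key=value arguments. It assumes that the tool expects a comma-separated list without spaces, which I have not verified against its documentation, and the join_options helper is my own.

```shell
#!/bin/sh
# Join key=value settings into one comma-separated string without spaces.
join_options() {
    result=""
    for opt in "$@"; do
        # Add a comma only when something has already been collected.
        result="${result:+$result,}$opt"
    done
    printf '%s\n' "$result"
}

opts=$(join_options "proxysql.enabled=false" "allowUnsafeConfigurations=true" "pxc.size=1")

# Only print the resulting command; nothing is created here.
echo "percona-dbaas mysql create-db singlenode --options=\"$opts\""
```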

The Percona DBaaS CLI tool helps you get a Kubernetes-based solution similar to DBaaS. At the same time, we continue to work on its functionality and usability.

This talk was first presented at @Databases Meetup by Mail.ru Cloud Solutions & Tarantool. Videos of the other talks are available, and event announcements are posted in the Telegram channel Around Kubernetes in Mail.ru Group.

What else to read on the topic:

  1. Databases in a modern IIoT platform.
  2. How to choose a database for a project so you don't have to choose again.

Source: habr.com
