DBA bot Joe. Anatoly Stansler (Postgres.ai)


How does a backend developer know that a SQL query will perform well on production? In large or rapidly growing companies, not everyone has access to production. And even with access, not every query can be checked painlessly, while creating a copy of the database often takes hours. To solve these problems, we created an artificial DBA named Joe. It has already been successfully adopted at several companies and helps more than a dozen developers.


Hi all! My name is Anatoly Stansler. I work at Postgres.ai. We are committed to speeding up the development process by removing the Postgres-related delays faced by developers, DBAs, and QA engineers.

We have great clients, and today part of this talk will be devoted to cases we encountered while working with them. I will talk about how we helped them solve quite serious problems.


When we develop complex migrations for high-load systems, we ask ourselves: "Will this migration go through?" We rely on review and on the knowledge of more experienced colleagues, the DBA experts, who can tell us whether it will fly or not.

But perhaps it would be better if we could test it ourselves on full-size copies. Today we will talk about the current approaches to such testing, how it can be done better and with what tools, as well as the pros and cons of these approaches and what we can improve.


Who here has ever created indexes directly on prod or made other changes there? Quite a few. And for whom did this lead to lost data or downtime? Then you know this pain. Thank God there are backups.


The first approach is testing on prod. A developer sits at a local machine with test data, some limited sample; we roll out to prod, and we get this situation.


It hurts, it's expensive. It's probably best not to.

And what's the best way to do it?


Let's set up staging and put some part of prod there, or at best a real copy of prod with all the data. Then, after developing locally, we additionally check on staging.

This allows us to catch some of the errors, i.e. prevent them from reaching prod.

What are the problems?

  • The problem is that we share this staging with colleagues. Very often you make some change, and bam, the data is gone and the work goes down the drain. Staging was multi-terabyte, so you have to wait a long time for it to come back up, and you decide to finish tomorrow. That's it, development has stalled.
  • And, of course, many colleagues and many teams work there, and everything has to be coordinated manually, which is inconvenient.


And it is worth saying that we have only one attempt, one shot: if we want to make some changes to the database, touch the data, or change the structure, and something goes wrong, if there is an error in the migration, we will not be able to roll back quickly.

This is better than the previous approach, but there is still a high probability that some error will reach production.


What prevents us from giving each developer a test bench, a full-size copy? I think it's clear what gets in the way.

Who has a database larger than a terabyte? More than half the room.

And it is clear that keeping a machine for each developer, when production is that large, is very expensive, and it also takes a long time.

We have clients who have realized that it is very important to test all changes on full-size copies, but their database is less than a terabyte, and there are no resources to keep a test bench for each developer. So they have to download dumps locally to their machines and test that way. It takes a lot of time.


Even if you do it inside your own infrastructure, downloading one terabyte of data per hour is already very good. But they use logical dumps, downloading locally from the cloud, and for them the speed is about 200 gigabytes per hour. And it still takes time to restore from the logical dump, build the indexes, and so on.

But they use this approach because it allows them to keep the prod reliable.

What can we do here? Let's make test benches cheap and give every developer their own test bench.

And this is possible.


In this approach, when we make thin clones for each developer, we can share one machine. For example, if you have a 10 TB database and want to give it to 10 developers, you don't need ten separate 10 TB databases. You only need one machine to make thin, isolated copies for each developer. I'll explain how it works a little later.


Real example:

  • Database size: 4.5 terabytes.

  • We can get independent copies in 30 seconds.

You don't have to wait for a test stand or depend on how big it is. You get it in seconds. These are completely isolated environments that nevertheless share data among themselves.

This is great. Here we are talking about magic and a parallel universe.


In our case, this works using the OpenZFS system.


OpenZFS is a copy-on-write file system that supports snapshots and clones out of the box. It is reliable and scalable, and it is very easy to manage: it can be deployed with literally two commands.

There are other options:

  • LVM,

  • Storage (for example, Pure Storage).

The Database Lab I'm talking about is modular and can be implemented on top of these options, but for now we have focused on OpenZFS, because we ran into problems specifically with LVM.


How does it work? Instead of overwriting data every time we change it, we save the new data, simply marking that it belongs to a new point in time, a new snapshot.

And later, when we want to roll back or make a new clone from some older version, we just say: "OK, give us the blocks of data that are marked like this."

And this user will work with that data set, gradually changing it and making his own snapshots.

And we will branch. Each developer in our case will have his own clone that he edits, while the common data is shared between everyone.
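Under the hood, this branching maps directly onto plain ZFS primitives. A minimal sketch (the pool, dataset, and clone names here are invented for illustration):

```shell
# Take a point-in-time snapshot of the dataset holding the Postgres data directory
zfs snapshot dblab/pgdata@2020-02-01

# Create a thin, writable clone for one developer; thanks to copy-on-write,
# only the blocks the developer changes will consume new space
zfs clone dblab/pgdata@2020-02-01 dblab/clone-alice

# When the clone is no longer needed, it is destroyed in seconds
zfs destroy dblab/clone-alice
```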


To deploy such a system at home, you need to solve two problems:

  • The first is the data source: where you will take the data from. You can set up replication from production, or use the backups that, I hope, you have already configured: WAL-E, WAL-G, or Barman. And even if you are using some kind of cloud solution like RDS or Cloud SQL, you can use logical dumps. But we still advise using physical backups, because with this approach you also retain the physical layout of the files, which brings your metrics even closer to what you would see in production and helps you catch the problems that exist there.

  • The second is where you want to host the Database Lab. It could be Cloud, it could be On-premise. It is important to say here that ZFS supports data compression. And it does it quite well.
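For the physical-backup route, the initial provisioning might look roughly like this with WAL-G (the data directory path is an assumption, not taken from the talk):

```shell
# Fetch the latest base backup into the ZFS-backed data directory
wal-g backup-fetch /var/lib/postgresql/data LATEST

# On startup, Postgres then replays WAL (with restore_command calling
# "wal-g wal-fetch") to reach a consistent point, preserving the physical
# file layout and statistics of production
```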

Imagine that each such clone will accumulate some dev delta, depending on the operations performed on the database, and that delta also needs space. But since we took a database of 4.5 terabytes, ZFS compresses it down to about 3.5 terabytes (this varies with the settings), so we still have room for the dev delta.

Such a system can be used for different cases.

  • Developers and DBAs, for query validation and optimization.

  • This can be used in QA to test a particular migration before rolling it out to prod. We can also spin up special environments for QA with real data, where they can test new functionality. It takes seconds instead of the hours, or in some cases days, of waiting where thin copies are not used.

  • And another case: if the company does not have an analytics system set up, we can isolate a thin clone of the production database and dedicate it to long-running queries or special indexes useful for analytics.


With this approach:

  1. Low probability of errors on the "prod", because we tested all the changes on full-size data.

  2. A culture of testing, because now you do not have to wait hours for a stand of your own.

  3. No barrier, no waiting between tests: you can actually go and check right away, and this speeds up development.

  4. Less refactoring: fewer bugs end up in prod, so there is less to rework later.

  5. We can reverse otherwise irreversible changes, which the standard approach does not allow.

  6. Cost savings, because we share the resources of the test benches.

Already good, but what else could be accelerated?


Thanks to such a system, we can greatly lower the barrier to entry for this kind of testing.

Right now there is a vicious circle: for a developer to get access to real full-size data, he must become an expert, because he must be trusted with such access.

But how can you grow if you don't have that access? What if only a very small set of test data is available to you? Then you won't get any real experience.


How to break out of this circle? As the first interface, convenient for developers of any level, we chose a Slack bot. But it could be any other interface.

What does it let you do? You can take a specific query and send it to a special channel for the database. We automatically deploy a thin clone in seconds, run the query, collect metrics and recommendations, and show a visualization. The clone then stays around so that the query can be optimized: indexes added, and so on.

Slack also gives us collaboration out of the box. Since this is just a channel, you can start discussing the query right there in a thread, pinging your colleagues and the DBAs inside the company.


But there are, of course, problems. Because this is the real world and we host many clones on a single server, we have to squeeze the amount of memory and CPU power available to the clones.

But for these tests to be plausible, we need to solve this problem somehow.

Clearly, the key ingredient is identical data, and we already have that. We also want the same configuration, and we can provide an almost identical one.

It would be cool to have the same hardware as in production, but it may differ.


Let's remember how Postgres works with memory. We have two caches: the file system cache and Postgres's own one, the Shared Buffer Cache.

It is important to note that the Shared Buffer Cache is allocated when Postgres starts, depending on what size you specify in the configuration.

And the second cache uses all available space.


And when we run several clones on one machine, we gradually fill up the memory. The usual recommendation is for the Shared Buffer Cache to be 25% of the total memory available on the machine.

So if we do not change this parameter, we can run only 4 instances, i.e. 4 such thin clones, on one machine. And that, of course, is bad, because we want to have many more of them.

On the other hand, the planner takes the cache size into account, for example when choosing index scans, i.e. the plan depends on how large our caches are. If we simply reduce this parameter, our plans can change a lot.

For example, if there is a large cache on prod, Postgres will prefer to use an index; if not, there will be a SeqScan. And what would be the point if our plans did not match?

But here we come to the key point: the plan in Postgres does not actually depend on the specific size set for the Shared Buffer; it depends on effective_cache_size.


effective_cache_size is the estimated amount of cache available to us, i.e. the sum of the Buffer Cache and the file system cache. It is set in the config, and this memory is not actually allocated.

Thanks to this parameter, we can sort of trick Postgres, telling it that a lot of cache is actually available even when it isn't. As a result, the plans will match production exactly.
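In config terms, this might look as follows; the actual numbers are hypothetical, the point being that shared_buffers shrinks on the clone while effective_cache_size stays at the production value:

```
# production postgresql.conf
shared_buffers = 16GB          # really allocated at server start
effective_cache_size = 48GB    # planner's estimate: shared buffers + OS cache

# thin-clone postgresql.conf
shared_buffers = 1GB           # small real allocation, so many clones fit on one host
effective_cache_size = 48GB    # kept equal to prod so the planner picks the same plans
```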

But this can affect timing. We usually optimize queries by timing, but it is important to remember that timing depends on many factors:

  • It depends on the load that is currently on prod.

  • It depends on the characteristics of the machine itself.

Timing is therefore an indirect metric. What we can really optimize is the amount of data the query has to read in order to produce its result.

And if you want the timing to be close to what you will see on prod, you need to take the most similar hardware, possibly even more powerful so that all the clones fit. But this is a trade-off: you will get the same plans, you will see how much data a particular query reads, and you will be able to conclude whether this query (or migration) is good or still needs to be optimized.

Let's take a look at how optimization with Joe works in practice.


Let's take a query from a real system. In this case, the database is 1 terabyte. We want to count the number of fresh posts that got more than 10 likes.
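The talk does not quote the query text, so here is a hypothetical query of that shape (the posts table and its columns are invented for illustration):

```sql
-- Count fresh posts that received more than 10 likes
SELECT count(*)
FROM posts
WHERE created_at >= now() - interval '7 days'
  AND likes_count > 10;
```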


We write a message to the channel, and a clone is deployed for us. We see that the query completes in 2.5 minutes. That is the first thing we notice.

Joe then shows automatic recommendations based on the plan and metrics.

We see that the query processes too much data to return a relatively small number of rows. Some kind of specialized index is needed, since we noticed that too many rows are filtered out by the query.


Let's take a closer look at what happened. Indeed, we have read almost one and a half gigabytes of data from the file cache, or even from disk, and that is not good, because we got only 142 rows.


It would seem that with an index scan here the query should have run quickly, but since we filtered out too many rows (and had to count them), it ran slowly.


And the plan shows that this happened because the conditions in the query and the conditions in the index only partially match.


Let's try to make the index more precise and see how the query execution changes after that.
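For a query that counts fresh, popular posts, "more precise" could mean, for example, a partial index whose predicate matches the query's filter (the table and column names here are invented for illustration):

```sql
-- Index only the rows the query cares about; keeping likes_count in the key
-- also helps Postgres answer the count with an index-only scan
CREATE INDEX CONCURRENTLY idx_posts_recent_popular
    ON posts (created_at, likes_count)
    WHERE likes_count > 10;
```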


Creating the index took quite a while, but now we check the query and see that instead of 2.5 minutes it takes only 156 milliseconds, which is good enough. And we read only 6 megabytes of data.


And now we use an index-only scan.

Another important point is that we want to present the plan in a more understandable way. We have implemented visualization using Flame Graphs.


This is a different, heavier query. We build Flame Graphs along two dimensions: the amount of data a particular node in the plan read, and timing, i.e. the node's execution time.

Here we can compare specific nodes with each other and clearly see which of them takes more or less time, which is usually difficult in other visualization methods.


Of course, everyone knows explain.depesz.com. A nice feature of that visualization is that it keeps the text plan and also puts some basic parameters into a table that can be sorted.

And developers who have not yet dug deep into this topic also use explain.depesz.com, because it makes it easier for them to figure out which metrics are important and which are not.


There is a newer approach to visualization, explain.dalibo.com. It draws a tree, which makes the structure easy to understand, but it is very hard to compare nodes with each other, and with a large query you will need to scroll back and forth. Still, it is an option.

Collaboration


And, as I said, Slack gives us the opportunity to collaborate. For example, if we come across a complex query and it's not clear how to optimize it, we can discuss it with our colleagues in a thread in Slack.


We believe it is important to test on full-size data. For this we built the Database Lab tool, which is available as open source. You can use the Joe bot too: you can take it right now and deploy it in your own environment; all the guides are available there.

It is also worth noting that the solution itself is not revolutionary, because Delphix exists, but that is an enterprise solution: completely closed and very expensive. We specialize specifically in Postgres, and all of our products are open source. Join us!

This is where I end. Thank you!

Questions

Hello! Thanks for the talk! Very interesting, especially for me, because I was solving roughly the same problem some time ago. So I have a number of questions; hopefully I'll get through at least some of them.

I wonder how you calculate the space for this environment. The technology implies that under certain circumstances your clones can grow to their maximum size: roughly speaking, with a ten-terabyte database and 10 clones, it is easy to end up in a situation where each clone holds 10 terabytes of unique data. How do you size that space, the delta you spoke about, in which these clones live?

Good question. It is important to track individual clones here. If a clone accumulates too large a change and starts to grow, we can first warn the user about it, or immediately stop the clone so that we don't end up in a failure situation.

Yes, a follow-up question: how do you manage the life cycle of these clones? For us this is a problem and a whole separate story. How does this happen?

There is a TTL for each clone. Basically, we have a fixed TTL.

What is it, if it's not a secret?

One hour of idle time: if a clone is not used for an hour, we kill it. But there is no surprise here, since we can bring a clone up in seconds. If you need it again, just ask.

I am also interested in the choice of technologies, because we, for example, use several methods in parallel for various reasons. Why ZFS? Why not LVM? You mentioned that there were problems with LVM: what were they? In my opinion, the optimal option, in terms of performance, is storage.

What is the main problem with ZFS? The fact that you must run on a single host, i.e. all instances live within the same OS. With storage, you can connect different equipment, and the bottleneck is only the blocks on the storage system. So the question of the choice of technologies is interesting. Why not LVM?

Specifically about LVM, we can discuss it at a meetup. About storage: it's just expensive. We can deploy the ZFS-based system anywhere, even on your own machine: simply download the repository and deploy it. ZFS is available almost everywhere if we are talking about Linux. So we get a very flexible solution, and ZFS itself gives a lot out of the box: you can add as much data as you like, connect a large number of disks, and there are snapshots. And, as I said, it's easy to administer. It is battle-tested, it has been around for many years, and it has a very large, growing community. ZFS is a very reliable solution.

Nikolai Samokhvalov: May I add a comment? My name is Nikolay; Anatoly and I work together. I agree that storage is great, and some of our customers do have Pure Storage, etc.

Anatoly correctly noted that we focus on modularity. In the future you can implement a single interface: take a snapshot, make a clone, destroy the clone. It's all easy. And storage is cool, if you have it.

But ZFS is available to everyone. Delphix already has 300 clients, 50 of them in the Fortune 100, i.e. they are aimed at NASA and the like. It's time for everyone to have this technology. That's why our Core is open source. The interface part, the platform we will show, is not open source, but we want the core to be accessible to everyone. We want to make a revolution, so that all testers stop guessing on laptops: you should write a SELECT and immediately see that it is slow, instead of waiting for the DBA to tell you about it. That is the main goal, and I think we will all get there. We build this thing so that everyone can have it. Hence ZFS, because it will be available everywhere. Thanks to the community for solving problems and for the open source license.

Greetings! Thanks for the talk! My name is Maxim. We have dealt with the same issues and solved them on our own. How do you share resources between these clones? Each clone can be doing its own thing at any moment: one tests one thing, another something else, someone builds an index, someone runs a heavy job. You can still divide the CPU, but how do you divide the IO? That is the first question.

And the second question is about the dissimilarity of the stands. Say I have ZFS here and everything is cool, but the client's prod runs not on ZFS but on ext4, for example. What then?

Very good questions. I touched on this problem when I mentioned that we share resources. The solution is this. Imagine that you are testing on staging. There, too, someone can generate one load while someone else generates another, and as a result you see incomprehensible metrics. The same problem can even occur on prod: you check some query, see that it works slowly, and in fact the problem was not in the query but in some parallel load.

That is why it is important to focus on what the plan will be, what steps we will take in the plan, and how much data we will read for this. The fact that our disks are loaded with something will specifically affect the timing, but we can estimate how heavy a query is by the amount of data it reads. It is not so important that some other execution happens at the same time.

I have two questions. This is very cool stuff. Have there been cases where production data is sensitive, such as credit card numbers? Is there already something ready, or is that a separate task? And the second question: is there something like this for MySQL?

About the data: we do not do obfuscation yet. But if you deploy Joe itself and do not give developers direct access, then there is no access to the data. Why? Because Joe doesn't show data: it only shows metrics and plans, and that's it. This was done on purpose, because it was one of our client's requirements: they wanted their developers to be able to optimize queries without giving everyone access to the data.

About MySQL: this system can be used for anything that stores its state on disk. But since we are doing Postgres, we are building all the automation for Postgres first: automating data retrieval from a backup, configuring Postgres correctly, making the plans match, and so on.

But since the system is extensible, it can also be used for MySQL, and there are such examples. Yandex has a similar thing, but they don't publish it anywhere; they use it inside Yandex.Metrica, and there it is exactly a MySQL story. The technologies are the same: ZFS.

Thanks for the talk! I also have a couple of questions. You mentioned that cloning can be used for analytics, for example to build additional indexes there. Can you tell us a little more about how that works?

And I'll immediately ask the second question, about the similarity of the stands and of the plans. The plan also depends on the statistics collected by Postgres. How do you solve this problem?

As for analytics, there are no specific cases yet, because we have not used it for that, but the possibility exists. If we are talking about indexes, imagine a query scanning a table with hundreds of millions of records on a column that is usually not indexed on prod, and we want to compute some data there. If this query is sent to prod, there is a chance it will cause downtime on prod, because the query may churn there for minutes.

OK, so let's make a thin clone, which it is not scary to stall for a few minutes. And to make the analytics more comfortable to read, we will add indexes on the columns whose data we are interested in.

Will the index be created each time?

You can arrange it so that we touch the data, make a snapshot, and then restore from that snapshot and run new queries. That is, you can make it possible to bring up new clones with the indexes already in place.

As for the question about statistics: if we restore from a backup or use replication, our statistics will be exactly the same, because we have the entire physical data structure. That is, we bring over the data as it is, with all the statistics as well.

There is one more problem here. If you use a cloud solution, only logical dumps are available there, because Google and Amazon do not allow you to take a physical copy. That will be a problem.

Thanks for the talk. There were two good questions here, about MySQL and about resource sharing. But in fact it all comes down to this being a matter not of a specific DBMS but of the file system as a whole. Accordingly, the resource-sharing issues should also be solved there: not at the end where it is Postgres, but in the file system, in the server, in the instance.

My question is a little different. It is closer to a multi-layered database, where there are several layers. For example, we set up an update of a ten-terabyte image and we are replicating, and we specifically use this solution for databases. Replication is in progress, the data is being updated. Meanwhile 100 employees work in parallel, constantly launching these different snapshots. What do we do? How do we make sure there is no conflict: they launched one, then the file system changed, and all these snapshots broke?

They won't break, because that's how ZFS works. We can keep the file system changes that come from replication separately, in one stream, and keep the clones that developers use on older versions of the data. It works for us; everything is in order with this.

So the update arrives as an additional layer, and all new snapshots will be based on that layer, right?

On the layers from the previous replications.

The previous layers will drop off, but they will refer to the old layer; and new snapshots will be taken from the last layer received in the update?

In general, yes.

Then, as a consequence, we will end up with a ton of layers, and over time they will need to be compacted?

Yes, everything is correct. There is a certain window. We keep weekly snapshots. It depends on what resources you have: if you can store a lot of data, you can keep snapshots for a long time; they won't go away on their own, and there will be no data corruption. If the snapshots become outdated, according to your company's policy, we can simply delete them and free up space.

Hello, thanks for the talk! A question about Joe. You said that the customer did not want to give everyone access to the data. Strictly speaking, if a person has the output of EXPLAIN ANALYZE, he can peek at the data.

That's true. For example, one can write: "SELECT FROM WHERE email = …". We will not see the data itself, but we can see some indirect signs. This must be understood. On the other hand, we have an audit log, and other colleagues also see what the developers are doing. If someone tries to do this, the security service will come to them and work on the issue.

Good afternoon, thanks for the talk! I have a short question. If a company does not use Slack, is the bot tied to it, or can developers deploy instances themselves in order to connect a test application to the databases?

Right now the only integration is Slack, i.e. there is no other messenger, but we really want to add support for others too. What can you do? You can deploy DB Lab without Joe and work through the REST API or through our platform: create clones and connect with psql. But you can do this only if you are ready to give your developers access to the data, because there will no longer be a screen between them and it.

I don't need that layer, but I do need such a possibility.

Then yes, it can be done.

Source: habr.com
