Typical application errors that lead to bloat in PostgreSQL. Andrey Salnikov

This is a transcript of Andrey Salnikov's early-2016 talk "Typical application errors that lead to bloat in PostgreSQL".

In this talk I will go over the main application-level mistakes made at the design and coding stage, limiting myself to those that lead to bloat in PostgreSQL. As a rule, bloat marks the beginning of the end for the performance of your system as a whole, even though initially nothing suggested trouble.


Glad to welcome everyone! This talk is not as technical as the previous one from my colleague. It is aimed mainly at back-end developers, because we have a fairly large number of clients, and they all make the same mistakes. I will tell you about those mistakes and explain what fatal consequences they lead to.


Why are these mistakes made? For two reasons: at random ("maybe it will work"), and out of ignorance of the mechanisms that operate between the database and the application, as well as inside the database itself.

I will show you three examples, with scary graphs of how bad things got. For each one I will briefly describe the mechanism behind it, how to deal with it once it has happened, and what preventive measures help avoid it. I will also cover auxiliary tools and give some useful links.


I used a test database with two tables: one with customer accounts, the other with operations on those accounts. The balances on these accounts are updated periodically.


The initial state of the table: it is quite small, 2 MB. The response time, both for the database and for this table specifically, is very good. And the load is decent: 2 operations per second on the table.


Throughout this talk I will show graphs so it is clear what is happening. There will always be two slides with graphs. The first slide shows what is happening on the server overall.

In this case we can see that the table really is small, and the index is small too, around 2 MB. This is the first graph, on the left.

The average response time across the server is also stable and low. This is the top right graph.

The lower left graph shows the longest transactions. We can see that transactions complete quickly. Autovacuum is not running here yet, because this is the initial test run; later it will kick in and be useful to us.


The second slide will always be dedicated to the test table. Here we are constantly updating customer account balances. We can see that the average response time for the update operation is quite good, under a millisecond. CPU consumption (the upper right graph) is also even and quite low.

The lower right graph shows how much memory and disk we have to scan to find the row we want before updating it. And the number of operations on the table is 2 per second, as I said at the beginning.


And now tragedy strikes: for some reason a long-forgotten transaction appears. The reasons are usually banal:

  • One of the most common: in the application code we call an external service, and it does not respond. We opened a transaction, made a change in the database, and then went off to read mail or call another service inside our infrastructure, which for some reason does not answer. Our session hangs in a state that no one knows when it will get out of.
  • The second situation: an exception was thrown in our code, and we did not close the transaction in the exception handler. We end up with a hanging session holding an open transaction.
  • And the last one is also quite common: poor-quality code. Some frameworks open a transaction implicitly; it hangs, and the application may not even know that it is hanging.

Where do such things lead?

It leads to our tables and indexes starting to swell dramatically. This is exactly the bloat effect. For the database it shows up as a very sharp increase in response time and a growing load on the database server. And as a result our application suffers: if your code used to spend 10 milliseconds on a database query and 10 milliseconds on its own logic, your function took 20 milliseconds in total. Now the situation becomes very sad.

Let's see what happens. The lower left graph shows that we have a long-running transaction. And the upper left graph shows that the size of the table jumped from 2 MB to 300 MB, while the amount of data in it has not changed; in other words, it now contains a fairly large amount of garbage.


The overall picture for the average server response time has also changed by several orders of magnitude: every query on the server has started to suffer. At the same time, internal Postgres processes, namely autovacuum workers, have kicked in, trying to do something and consuming resources.


What happens to our table? The same. The average response time on it jumped by several orders of magnitude. In terms of consumed resources, CPU load has increased greatly (the top right graph), because the processor has to plow through a pile of useless rows to find the one it needs (the lower right graph). As a result, the number of queries per second has dropped sharply, because the database can no longer keep up with the same number of requests.


We need to get back to normal. We search the Internet and learn that long transactions cause this problem. We find and kill this transaction, and things improve. Everything seems to work as it should.

We calm down, but after a while we start noticing that the application is not working the way it did before the incident. Queries are still being processed more slowly, noticeably so: one and a half to two times slower in my example. The load on the server is also higher than it was before the incident.


So the question is: what is happening to the database at this point? The situation is this. The transaction graph shows that the long transaction has ended and there really are no long-running transactions anymore. But the table grew fatally during the incident, and it has not shrunk since. The average response time of the database has stabilized, and responses come back at a speed acceptable to us. Autovacuum has become more active and started doing something with the table, because it now has to shovel through more data.


Specifically on the test table, where we change the balances: the query response time seems to have returned to normal, but in fact it is one and a half times higher than before.

CPU load has also not come back down to its pre-incident value. The reason is visible in the lower right graph: we are still scanning a certain amount of memory, i.e. to find the row we need, the database server burns resources sifting through useless data. The number of transactions per second has stabilized.

Overall things are okay, but the situation is worse than it was: a clear degradation of the database caused by the application that works with it.


To understand what is going on, in case you were not at the previous talk, here is a little theory about the internal process: why autovacuum exists and what it does.

In a nutshell: at some point in time we have a table with rows in it. Some rows are live, active, needed right now; they are marked green in the picture. Others are dead: they have served their purpose, have been updated, newer versions of them exist, and they are marked as no longer interesting to the database. But because of how Postgres works, they still sit in the table.

Why do we need autovacuum? At some point autovacuum comes along and asks the database: "Give me the id of the oldest transaction that is currently open." The database returns that id, and autovacuum, relying on it, walks through the rows of the table. If it sees rows that were changed by much older transactions, it is allowed to mark them as reusable, so that new data can be written into their place later. It is a background process.

Meanwhile we keep working with the database and keep changing the table, and the new data is written into the rows that were marked as reusable. This gives us a cycle: dead old rows keep appearing, and new rows that we need are written in their place. This is the normal operating state for PostgreSQL.
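To make this mechanism tangible, here is a small sketch (not from the talk; the "accounts" table name is just the test table from my example) of queries that show the cleanup horizon and the garbage that piles up:

    -- Oldest transactions currently open on the server (the cleanup horizon):
    SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
      FROM pg_stat_activity
     WHERE xact_start IS NOT NULL
     ORDER BY xact_start
     LIMIT 5;

    -- Live vs. dead rows and the last (auto)vacuum runs for one table:
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname = 'accounts';  -- table name is illustrative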


What happened during the incident? How did this process play out?

We had a table in some state, with some live and some dead rows. Autovacuum came along, asked the database for the id of the oldest open transaction, and got an id that could be many hours old, or maybe ten minutes old; it depends on how heavily loaded your database is. It then went looking for rows it could mark as reusable, and found none in our table.

Meanwhile we keep working with the table: updating it, changing the data. What can the database do? It has no choice but to append new row versions to the end of the existing table, and so the table starts to inflate.

We only really need the green rows. But during a problem like this, the share of green rows in the total volume of the table becomes extremely low.

When we execute a query, the database has to go through all the rows, both red and green, to find the right one. This effect of inflating the table with useless data is called "bloat", and it also eats up our disk space. Remember, it was 2 MB and now it is 300 MB? Replace megabytes with gigabytes and you will run out of disk pretty quickly.


What are the implications for us?

  • In my example, the table and index grew 150 times over. Some of our clients have had more fatal cases, where disk space simply started to run out.
  • Tables almost never shrink on their own. Autovacuum can, in some cases, cut off the tail of a table if it contains only dead rows. But since rows are constantly churning, one green row may end up stuck at the end and never get updated while all the others are written somewhere near the beginning of the table. So the chance that your table will shrink by itself is so low that you should not count on it.
  • The database has to sift through the whole pile of useless rows, wasting disk resources, CPU resources and electricity.
  • And this directly affects the application: if at the start we spent 10 milliseconds on a query and 10 milliseconds on our code, then during the incident we were spending a second on a query plus 10 milliseconds on code, i.e. application performance dropped by orders of magnitude. Once the incident was resolved, we were spending 20 milliseconds per query and 10 milliseconds on code, which means we are still one and a half times slower. And all of this because of one hung transaction, quite possibly through our own fault.
  • Which raises the question: how do we get everything back, so that everything is fine again and queries run as fast as they did before the incident?


For this there is a certain cycle of work to carry out.

First we need to find the problematic tables, the ones that have bloated. Some tables are written to more actively, some less. For this we use the pgstattuple extension; with it installed, you can write queries that help you find tables that are bloated badly enough.

Once you have found these tables, they need to be compacted. There are already tools for this; in our company we use three. The first is the built-in VACUUM FULL. It is cruel, harsh and merciless, but sometimes very useful. pg_repack and pgcompacttable are third-party utilities for compacting tables, and they are gentler on the database.

Which one to use depends on what is more convenient for you; I will talk about this at the very end. The main thing is that there are three tools, so there is plenty to choose from.

After we have fixed everything and made sure all is well, we need to know how to prevent this situation in the future:

  • It is fairly easy to prevent. You need to monitor the duration of sessions on the Master server. Particularly dangerous are sessions in the idle in transaction state: those that opened a transaction, did something and went away, or simply hung, lost somewhere in the code (see the query sketch after this list).
  • And for you as developers, it is important to test how your code behaves when these situations arise. It is not hard to do, and it is a useful check: you will avoid a lot of "childish" problems associated with long transactions.
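A sketch of such a check (the 10-minute threshold is arbitrary; pick one that matches your workload):

    -- Sessions sitting in "idle in transaction" for more than 10 minutes:
    SELECT pid,
           usename,
           now() - xact_start   AS xact_age,
           now() - state_change AS idle_for,
           query
      FROM pg_stat_activity
     WHERE state = 'idle in transaction'
       AND now() - state_change > interval '10 minutes'
     ORDER BY xact_start;

    -- Once such a session is confirmed to be stuck, it can be terminated:
    -- SELECT pg_terminate_backend(<pid>);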


On these graphs I wanted to show how the table and the behavior of the database changed after I ran VACUUM FULL on the table; in this case it is not my production, after all.

The size of the table immediately returned to its normal working couple of megabytes. This did not greatly affect the average response time across the server.


But specifically in our test table, where we update the account balances, we can see that the average response time for an update dropped back to pre-incident levels. The CPU resources consumed to execute the query also fell back to pre-incident levels. The lower right graph shows that we now find exactly the row we need right away, without wading through the pile of dead rows that existed before the table was compacted. The average query time stayed at roughly the same level, but that difference is more likely measurement noise from my hardware.


This is where the first story ends. It is the most common one, and it happens to everyone, regardless of how experienced the client is or how qualified the programmers are. Sooner or later it happens.

The second story, in which we distribute the load and optimize server resources


  • We have grown up and become serious guys. We have a replica, and we realize it would be good to balance the load: write to the Master, read from the replica. This usually comes up when we want to build reports or run ETL. The business is very happy about it; it really wants all sorts of reports with lots of complex analytics.
  • The reports run for many hours, because complex analytics cannot be computed in milliseconds. Like brave guys, we write the code so that inserts from the application go to the Master and the reports are executed on the replica.
  • We distribute the load.
  • Everything works perfectly. We are great.


What does this situation look like? On these graphs I have also added the duration of transactions from the replica to the transaction-duration chart. All other graphs refer only to the Master server.

By this time my test tables have grown; there is more data in them. We can see that the average server response time is stable, that there is a long-running transaction on the replica that runs for two hours, and that autovacuum is quietly working, processing dead rows. Everything is fine.


Specifically on the test table, we keep updating the account balances there, and we still have a stable query response time and stable resource consumption. Everything is fine.


Everything is fine until the moment these reports start getting shot down with replication conflicts, and they get shot down at regular intervals.

We go online, read up on why this happens, and find a solution.

The first solution is to increase the allowed replication delay. We know our report runs for three hours, so we set the delay to three hours. We start everything up again, but reports still occasionally get shot down.

We want everything to be perfect, so we go further and find a cool setting on the Internet: hot_standby_feedback. We turn it on. hot_standby_feedback lets the replica hold back autovacuum on the Master. That way we completely get rid of replication conflicts, and our reports run fine.
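For reference, this is roughly what the two knobs look like on the replica (a sketch; the three-hour value comes from this example, run on the standby and followed by a configuration reload):

    ALTER SYSTEM SET max_standby_streaming_delay = '3h';
    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();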


And what is happening on the Master server at this time? A total disaster. These graphs show the state with both of those settings enabled. The session on the replica has started to affect the Master server: it has an impact because it suspends the autovacuum that cleans out dead rows. The table size has skyrocketed again, the average query execution time across the entire database has skyrocketed too, and the autovacuum runs have dragged out.


Specifically on our table, the time to update data in it has also gone through the roof. CPU consumption has likewise increased sharply. We are again iterating over a large number of dead, useless rows. The response time on this table has grown, and the number of transactions has fallen.


What does it look like if we do not know what I was just talking about?

  • We start looking for the problem. If we ran into the problem from the first story, we know a long transaction may be the cause, so we go to the Master. The Master is the one in trouble: it is thrashing, it is heating up, its load average is pushing a hundred.
  • Queries there slow down, but we do not see any long-running transactions. We do not understand what is going on and do not know where to look.
  • We check the server hardware. Maybe the RAID has degraded, maybe a memory stick has burned out; anything is possible. But no, the servers are new and everything works fine.
  • Everyone is running around: administrators, developers, the director. Nothing helps.
  • And at some point everything suddenly starts fixing itself.


What happened is that the query on the replica finished and went away. We got our report; the business is happy again. But as you can see, our table has grown again and is not going to shrink. On the sessions graph I left a piece of that long transaction from the replica so you can estimate how long it takes for the situation to stabilize.

The session is gone, and only after a while does the server return more or less to order: the average query response time on the Master comes back to normal, because autovacuum finally got the chance to clean up and mark those dead rows and started doing its job. How fast it does that determines how fast we recover.


On the test table, where we update the account balances, we see exactly the same picture: the average update time gradually normalizes, CPU consumption decreases, and the number of transactions per second comes back to normal. But again, "normal" is not what we had before the incident.


Either way, we end up with a performance drop, as in the first story: one and a half to two times, and sometimes even more.

We seem to have done everything right: we distributed the load, the hardware is not idle, we split the queries sensibly, and yet everything still turned out badly.

  • Don't enable hot_standby_feedback? It really is not recommended to turn it on without a particularly strong reason, because this knob directly affects the Master server and suspends autovacuum there. Turning it on on some replica and forgetting about it is a good way to kill the Master and get big problems with the application.
  • Increase max_standby_streaming_delay? Yes, for reports that is the right move. If you have a three-hour report and do not want it to die from replication conflicts, simply increase the delay. A long report never needs data that arrived in the database this very moment: if it runs for three hours, it is working over some older data period. Whether the delay is three hours or six hours makes no practical difference, but you will get your reports reliably and stop fighting their cancellations.
  • Naturally, you need to keep an eye on long sessions on replicas, especially if you decide to enable hot_standby_feedback there, because anything can happen. We handed a replica to a developer so he could test his queries; he wrote a monstrous query, started it, went off to drink tea, and we got a wedged Master. Or the wrong application gets pointed at the replica. The situations vary, but sessions on replicas need to be watched just as carefully as on the Master.
  • And if you have both fast and long queries on replicas, it is better to split them across replicas to distribute the load. This ties back to max_standby_streaming_delay: keep one replica with a small replication delay for the fast queries, and for long-running reporting queries a replica that is allowed to lag by six hours or even a day. That is a completely normal setup.

We eliminate the consequences in the same way:

  • We find bloated tables.
  • And compact them with whichever tool is most convenient for us.

The second story ends here. Let's move on to the third story.


It is also quite common in our practice: this is the one in which we do a migration.


  • Any software product grows, requirements change, and we want to keep developing. Sometimes we need to update data in a table, i.e. run an UPDATE as part of a migration to the new functionality we are rolling out.
  • The old data format no longer fits. Say we now turn to the second table, the one with operations on these accounts. The amounts were stored in rubles, and we decided to increase precision and store them in kopecks. To do that we need an update: multiply the operation-amount field by one hundred.
  • These days we use automated database versioning tools, say Liquibase. We register our migration there and test it on our test database. Everything is fine: the update runs, locks are held for a while, but we end up with updated data and can launch the new functionality on top of it. Everything is tested and checked, everything is confirmed.
  • We schedule the maintenance window and run the migration.


Here is the migration with the update in front of you (roughly reconstructed below). Since it holds operations on accounts, the table was 15 GB. And because we update every row, the update doubles the table in size: we have rewritten every row.
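The migration itself boils down to something like this (a reconstruction; the table and column names are illustrative): one UPDATE that rewrites every row of the 15 GB table in a single transaction:

    UPDATE operations
       SET amount = amount * 100;  -- rubles -> kopecks, every row rewritten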


During the migration we could not do anything with this table: all queries against it were queued up, waiting for the update to finish. Here I want to draw your attention to the numbers on the vertical axis: before the migration the average query time is around 5 milliseconds, and the CPU load and the number of block reads from disk are below 7.5.


We migrated and got problems again.

The migration was successful, but:

  • The old functionality started taking longer.
  • The table has grown in size again.
  • The load on the server is again higher than it was.
  • And, mind you, this is functionality that used to work well; we only "improved" it a little.

And once again it is bloat spoiling our lives.


Here you can see that, just as in the previous two cases, the table is not going to return to its previous size. The average load on the server looks acceptable.


If we look at the table with the account operations, we can see that the average query time for it has doubled. CPU load and the number of rows scanned in memory jumped above 7.5, having been lower before: roughly 2 times higher for CPU and 1.5 times for block operations. In other words, we got a degradation of server performance, and as a result a degradation of application performance, while the number of calls stayed at roughly the same level.


The main thing here is to understand how to do such migrations correctly. And they do need to be done; we run migrations like this fairly regularly.

  • Large migrations like this are not run automatically; they must always be supervised.
  • They need oversight from a knowledgeable person. If you have a DBA on the team, let the DBA do it; it is their job. If not, let the most experienced person who knows how to work with databases do it.
  • We always prepare the new database schema in stages, even if we are only updating one column, in advance of rolling out the new application version (a sketch of the steps in SQL follows this list):
  • New fields are added, into which we will write only the converted data.
  • We copy the data from the old field to the new one in small batches. Why? First, we always stay in control of the process: we know how many batches have already been transferred and how many remain.
  • Second, between batches we commit the transaction and open a new one, which lets autovacuum work over the table and mark dead rows for reuse.
  • For the rows that appear while the old application is still running, we add a trigger that writes the new value into the new field; in our case, that is the old value multiplied by one hundred.
  • If we absolutely insist on keeping the same field name, then once all the backfill is done and before rolling out the new application version, we simply rename the fields: the old ones to some throwaway name, and the new fields to the old names.
  • And only after that do we launch the new version of the application.
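Here is the sketch promised above; the table, column, trigger and batch size are illustrative, and the batching step would normally be driven by a script:

    -- 1. Add the new column in advance (cheap, no table rewrite).
    ALTER TABLE operations ADD COLUMN amount_kopecks bigint;

    -- 2. Keep rows written by the still-running old application in sync.
    CREATE OR REPLACE FUNCTION operations_fill_kopecks() RETURNS trigger AS $$
    BEGIN
        NEW.amount_kopecks := NEW.amount * 100;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER operations_fill_kopecks
    BEFORE INSERT OR UPDATE ON operations
    FOR EACH ROW EXECUTE PROCEDURE operations_fill_kopecks();

    -- 3. Backfill old rows in small batches, one transaction per batch,
    --    so autovacuum can reclaim the dead row versions in between.
    UPDATE operations
       SET amount_kopecks = amount * 100
     WHERE id IN (SELECT id
                    FROM operations
                   WHERE amount_kopecks IS NULL
                   LIMIT 10000);
    -- ...repeat from a script until it reports "UPDATE 0".

    -- 4. When the backfill is finished and before rolling out the new
    --    application version, swap the names in one quick transaction.
    BEGIN;
    DROP TRIGGER operations_fill_kopecks ON operations;
    ALTER TABLE operations RENAME COLUMN amount TO amount_rubles_old;
    ALTER TABLE operations RENAME COLUMN amount_kopecks TO amount;
    COMMIT;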

Done this way, we get no bloat and no drop in performance.

This is the end of the third story.


https://github.com/dataegret/pg-utils/blob/master/sql/table_bloat.sql

https://github.com/dataegret/pg-utils/blob/master/sql/table_bloat_approx.sql

And now a little more about the tools that I mentioned in the very first story.

Before looking for bloat, you need to install the pgstattuple extension.

So that you do not have to invent the queries yourself, we have already written them in the course of our work, and you can use them. There are two queries here (linked above):

  • The first one takes quite a long time, but it shows you exact bloat figures for a table.
  • The second one runs faster and is very handy when you need a quick estimate of whether a table is bloated or not. You should also understand that a Postgres table always has some bloat; that is a property of its MVCC model.
  • And around 20% bloat is fine for most tables, i.e. nothing to worry about and no reason to compact the table.
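For completeness, a minimal sketch of using pgstattuple directly (the queries linked above are more convenient in practice; the table name is illustrative):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- Exact figures, but it reads the whole table:
    SELECT * FROM pgstattuple('public.accounts');

    -- Approximate figures, much cheaper on large tables (PostgreSQL 9.5+):
    SELECT * FROM pgstattuple_approx('public.accounts');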

So we have figured out how to identify which of our tables have swollen with useless data.

Now about how to fix bloat:

  • If we have a small table and good disks, i.e. a table of up to about a gigabyte, it is perfectly fine to use VACUUM FULL. It will hold an exclusive lock for a few seconds, so be it, but it does everything quickly and harshly. What does VACUUM FULL do? It takes an exclusive lock on the table, rewrites the live rows from the old files into a new table, and at the end swaps them: it deletes the old files and puts the new ones in their place. But for the whole duration of its work it holds an exclusive lock on the table, which means you cannot do anything with that table: not write to it, not read from it, not modify it. And VACUUM FULL needs additional disk space to write the data.
  • The next tool is pg_repack. In principle it is very similar to VACUUM FULL: it also rewrites the data from the old files into new ones and swaps them in for the table. But it does not take an exclusive lock at the very start of its work; it takes one only at the moment when it has the data ready and needs to swap the files. Its disk requirements are the same as VACUUM FULL's: you need extra disk space, which can be critical if you have terabyte-sized tables. And it is fairly hungry on the CPU, because it is actively doing I/O.
  • The third utility is pgcompacttable. It is gentler on resources, because it works on a slightly different principle. The essence of pgcompacttable is that it moves all the live rows towards the beginning of the table by issuing updates, and then runs vacuum on the table: since the live rows are now at the beginning and the dead ones at the end, vacuum can simply cut off that tail. So it does not need much additional disk space, and on top of that its resource usage can be throttled.
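Roughly how the three are invoked (a sketch; database and table names are illustrative, and the third-party utilities' options may differ between versions, so check their documentation):

    -- 1. Built in; exclusive lock for the whole run:
    VACUUM FULL accounts;

    -- 2. pg_repack (run from the shell); exclusive lock only for the final file swap:
    --    pg_repack -d mydb -t public.accounts

    -- 3. pgcompacttable (run from the shell); moves live rows to the front,
    --    then lets vacuum truncate the tail:
    --    pgcompacttable -d mydb -t accounts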

That is it for the tools.


If you find the bloat topic interesting and want to dig deeper, there are some useful links for you:

Here I tried to tell a horror story for developers, because they are the direct clients of databases and need to understand which actions lead to what. I hope I succeeded. Thank you for your attention!

Questions

Thanks for the talk! You talked about how these problems can be identified. How can they be prevented? I have had situations where queries hung not because they called some external service; it was just some wild joins, or tiny harmless queries that hung around for a day and then started doing nonsense. It is very similar to what you describe. How do you track that? Sit and constantly watch which query is stuck? How can it be prevented?

In this case, this is a task for the administrators of your company, not necessarily for the DBA.

I am an administrator.

PostgreSQL has a view called pg_stat_activity that shows running queries, and you can see how long each one has been hanging there.

I have to come in every 5 minutes and look?

Set up a cron job and check. If there is a long query, send an email, and that's it. You do not need to watch it with your own eyes; it can be automated. You get an email and react to it, or you can even kill the query automatically.

Are there clear reasons why this is happening?

I have listed some. There are other, more complex cases; that would be a long conversation.

Thanks for the talk! I wanted to clarify something about the pg_repack utility. If it does not take an exclusive lock, then...

It does take an exclusive lock.

... then could I potentially lose data? Shouldn't my application avoid writing anything during that time?

No, it works alongside the table quietly: pg_repack first copies over all the live rows that are there. Naturally, writes keep happening to the table in the meantime; it simply applies that tail of changes afterwards.

So it still does that at the end?

At the end it takes an exclusive lock to swap the files.

Will it be faster than VACUUM FULL?

VACUUM FULL takes an exclusive lock as soon as it starts and does not release it until it has done everything. pg_repack takes the exclusive lock only at the moment the files are swapped. During that moment you cannot write to the table, but no data is lost; everything stays in order.

Hello! You talked about how autovacuum works. There was a slide with red, yellow and green record cells. So the yellow ones are the rows it marked as deleted, and as a result something new can be written into them?

Yes. Postgres does not physically remove rows; that is how it works. When we update a row, we mark the old version as deleted, record the id of the transaction that changed it, and write a new row. Sessions may still potentially be reading the old versions, but at some point they become old enough, and the job of autovacuum is to run through such rows and mark them as no longer needed, so the space can be overwritten with new data.

I understand, but my question was about something else. I had not finished. Say we have a table with variable-size fields. If I try to insert something new, it may simply not fit into the old slot.

In any case the whole row is rewritten there. Postgres has two storage models, chosen based on the data type: data stored directly in the table, and TOAST data, i.e. large values such as text and json, which are stored in separate tables. The same bloat story happens with those TOAST tables too, everything works the same way; they are just accounted for separately.

Thanks for the talk! How acceptable is it to use statement_timeout to limit query duration?

Perfectly acceptable; we use it everywhere. Since we do not run our own services but provide remote support, we have quite a variety of clients, and everyone is quite happy with it. We have cron jobs that do the checking; the session duration below which we do not kill anything is agreed with each client. It could be a minute, it could be 10 minutes, depending on the load on the database and its purpose. But all of it is built on pg_stat_activity.
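A sketch of what that can look like at the role level (the role name and values are illustrative; idle_in_transaction_session_timeout exists from PostgreSQL 9.6):

    -- Cancel any single statement from this role that runs longer than 30 seconds:
    ALTER ROLE app_user SET statement_timeout = '30s';

    -- Kill sessions that sit "idle in transaction" for longer than 10 minutes:
    ALTER ROLE app_user SET idle_in_transaction_session_timeout = '10min';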

Thanks for the talk! I am trying to apply your talk to my applications, and it seems we start transactions explicitly everywhere and complete them explicitly everywhere; if there is an exception, a rollback still happens. But then I thought: a transaction can also be started implicitly (this is probably a hint for the girl who asked earlier). If I simply run an update on a record, will a transaction start in PostgreSQL and only end when the connection is closed?

If you are talking about the application level, it depends on the driver and the ORM you are using; there are a lot of settings. If autocommit is enabled, a transaction starts and closes immediately.

So it closes immediately after the update?

It depends on the settings. I named one of them, autocommit, which is pretty common. If it is enabled, the transaction is opened and immediately closed, unless you explicitly said "start transaction" and "end transaction" rather than simply firing a query into the session.

Hello! Thanks for the talk! Imagine we have a database that keeps swelling until the server runs out of disk space. Are there any tools to fix that situation?

Ideally, disk space on the server should be monitored.

Say the DBA went off to drink tea, was away at a resort, and so on.

When a file system is created, at least some reserved space is set aside where data is not written.

What if it's completely zero?

That is called reserved space; it can be released, and depending on how large it was made, that gives you some free space. I do not remember what the default is. Otherwise, add disks so you have room to carry out a recovery operation, or you can drop some table you are certain you do not need.

Are there no other tools?

It is always manual work. What to do is decided on the spot, because some data is critical and some is not, and for each database and the application working with it, it depends on the business. It is always decided case by case.

Thanks for the talk! I have two questions. First, you showed slides where, in the case of hung transactions, both the table size and the index size grow. Later in the talk there were a bunch of utilities that compact the table. What about the index?

They compact the indexes too.

But vacuum does not affect the index?

Some of them work with indexes, for example pg_repack and pgcompacttable; they recreate the indexes, so the indexes are affected as well. And VACUUM FULL by its nature rewrites everything, so it works on everything, indexes included.

And the second question. I did not understand why reports on replicas depend so much on replication itself. It seemed to me that reports only read, while replication writes.

What causes a replication conflict? We have a Master on which things are happening, including autovacuum. What does autovacuum actually do? It cleans out old rows. If at that moment there is a query on the replica reading those old rows, while on the Master autovacuum has marked them as reusable and they have been overwritten, then when the replica receives the data that requires rewriting the rows the query still needs, the replication process waits for the timeout you configured. After that PostgreSQL decides what matters more to it, and replication matters more than a query, so it cancels the query in order to apply those changes on the replica.

Andrey, I have a question. Those wonderful graphs you showed during the presentation, are they produced by some utility of yours? How were the charts made?

That is the Okmeter service.

Is this a commercial product?

Yes. This is a commercial product.

Source: habr.com
