The Story of the Physical Deletion of 300 Million Records in MySQL

Introduction

Hello. I'm ningenMe, a web developer.

As the title says, my story is about physically deleting 300 million records in MySQL.

This turned out interesting enough that I decided to write it up as a memo (a how-to of sorts).

Start - Alert

The batch server I use and maintain has a regular process that collects the last month's data from MySQL once a day.

Usually this process completes in about an hour, but this time it hadn't finished after 7 or 8 hours, and the alerts just kept firing...

Search for a cause

I restarted the process and looked through the logs, but didn't see anything terrible.
The query was using its index correctly. But when I thought about what could be going wrong, I realized that the table had grown quite large.

hoge_table | 350'000'000 |

350 million records. The indexing seemed to work correctly, just very slowly.

The monthly collection only needed to fetch about 12 records, but the SELECT itself was taking a very long time, and the transaction stayed open all the while.
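For illustration, here is my assumption of what such a monthly query might look like, and how EXPLAIN can confirm that the index is actually used (the query shape is hypothetical):

EXPLAIN
SELECT *
FROM hoge_table
WHERE create_time > NOW() - INTERVAL 1 MONTH;

-- In the EXPLAIN output, type: range together with key naming the
-- create_time index confirms the index is used; on 350 million rows
-- even an indexed range scan can still be slow.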

DB

Basically, it's a table that grows by about 400 records every day. The service was only supposed to keep the last month's data, so the sizing assumed exactly that volume, but unfortunately no rotation (a periodic purge of old rows) was ever set up.

This database was not designed by me. I inherited it from another developer, so it came with its share of technical debt.

At some point the daily volume of inserted data grew large, and the table finally hit its limit. With this much data you would normally split it up (partition the table), but unfortunately that was never done.
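If "splitting it up" means table partitioning, here is a minimal sketch of what that could look like (the dates and partition names are illustrative; note that MySQL requires the partition key to be part of every unique key on the table):

ALTER TABLE hoge_table
PARTITION BY RANGE (TO_DAYS(create_time)) (
    PARTITION p202301 VALUES LESS THAN (TO_DAYS('2023-02-01')),
    PARTITION p202302 VALUES LESS THAN (TO_DAYS('2023-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- An old month can then be dropped cheaply, with no row-by-row DELETE:
ALTER TABLE hoge_table DROP PARTITION p202301;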

And then I got involved.

Correction

It was more rational to shrink the table itself, and with it the processing time, than to change the batch logic.

The situation should change significantly if 300 million records are erased, so I decided to do just that... Eh, I thought it would definitely work.

Action 1

Having prepared a reliable backup, I finally started running queries.

「Sending the query」

DELETE FROM hoge_table WHERE create_time <= 'YYYY-MM-DD HH:MM:SS';

"..."

"..."

"Hmm... no response. Maybe the query is just taking a long time?" I thought, but just in case I looked at Grafana and saw that the disk load was growing very quickly.
"Dangerous," I thought again, and immediately stopped the query.

Action 2

After analyzing everything, I realized that the amount of data was too large to delete everything at once.

I decided to write a script that would delete the records in batches (about a million at a time) and ran it.

γ€Œimplementing the script」
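The script itself isn't shown here, so this is a minimal sketch of the idea as a MySQL stored procedure, assuming an index on create_time (the batch size is illustrative):

DELIMITER //
CREATE PROCEDURE purge_old_rows(IN cutoff DATETIME)
BEGIN
    REPEAT
        -- Each bounded DELETE commits on its own under autocommit,
        -- so no single giant transaction ever builds up.
        DELETE FROM hoge_table
        WHERE create_time <= cutoff
        LIMIT 1000000;
    UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;

CALL purge_old_rows('YYYY-MM-DD HH:MM:SS');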

"Now it will definitely work," I thought.

Action 3

The second method worked, but was very time-consuming.
Doing everything carefully, without unnecessary nerves, would have taken about two weeks. That still didn't meet the service requirements, so we had to abandon this scenario.

Therefore, here is what I decided to do:

Copy the table and rename

From the previous step, I knew that deleting such a large amount of data creates an equally large load. So instead of deleting, I decided to create a new table from scratch and use INSERT to move into it only the data that was going to be kept.
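The first step is just cloning the schema. A minimal sketch, assuming CREATE TABLE ... LIKE is enough (it copies the column definitions and indexes, but not the data):

CREATE TABLE tmp_hoge_table LIKE hoge_table;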

| hoge_table     | 350'000'000 |
| tmp_hoge_table |  50'000'000 |

If the new table ends up at the size shown above, processing should take roughly 1/7 of the time (about seven times faster).

After creating the new table and renaming it, I would start using it as the master (main) table; dropping the old table with its 300 million records should then be painless.
I had found out that TRUNCATE and DROP create far less overhead than DELETE, so that's the method I chose.
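A sketch of the final swap, using MySQL's atomic multi-table RENAME (the old_hoge_table name is mine):

-- Swap the tables in one atomic step...
RENAME TABLE hoge_table     TO old_hoge_table,
             tmp_hoge_table TO hoge_table;

-- ...then drop the old data. DROP is DDL and does not
-- scan 300 million rows the way DELETE would.
DROP TABLE old_hoge_table;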

Execution

「Sending the query」

INSERT INTO tmp_hoge_table SELECT * FROM hoge_table WHERE create_time > 'YYYY-MM-DD HH:MM:SS';

"..."
"..."
"Em…?"

Action 4

I thought the previous idea would work, but the INSERT query kept failing with errors. MySQL has no mercy.

I was already so tired that I began to think I didn't want to do this anymore.

I sat and thought, and realized that maybe a single INSERT was simply moving too much at once...
I tried sending an INSERT covering only the amount of data the database processes in one day. It worked!

Well, after that we keep sending queries for the same amount of data. Since a month's worth of data had to be moved, we repeat this about 35 times.
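One day per query, roughly like the sketch below (the date bounds are placeholders):

-- One day's worth of rows per INSERT; repeat for each day being kept.
INSERT INTO tmp_hoge_table
SELECT * FROM hoge_table
WHERE create_time >  'YYYY-MM-01 00:00:00'
  AND create_time <= 'YYYY-MM-02 00:00:00';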

Renaming a table

Here luck was on my side: everything went smoothly.

The alerts are gone

Batch processing speed has increased.

Previously this process took about an hour; now it takes about 2 minutes.

After I made sure all the problems were solved, I dropped the old table with its 300 million records and felt reborn.

Summary

I realized that the real problem was the missing rotation: old records were never purged from the table. A mistake like that in the architecture leads to a lot of wasted time.
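As a sketch of what that rotation could look like in MySQL itself, a scheduled EVENT is one option (requires event_scheduler=ON; the retention window and batch size are illustrative):

CREATE EVENT IF NOT EXISTS purge_hoge_daily
ON SCHEDULE EVERY 1 DAY
DO
    -- Purge in a bounded batch so a backlog never turns into a huge delete.
    DELETE FROM hoge_table
    WHERE create_time < NOW() - INTERVAL 1 MONTH
    LIMIT 1000000;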

When deleting records from the database, do you think about the load it creates, replication included? Let's not overload MySQL.

Those who are well versed in databases will definitely not encounter such a problem. For others, I hope this article was helpful.

Thanks for reading!

We'd be very happy to hear whether you liked this article, whether the translation is clear, and whether it was useful to you.

Source: habr.com
