Identifying potentially "evil" bots and blocking them by IP

Good day! In this article I will show how users of ordinary shared hosting can identify the IP addresses that generate excessive load on a site and then block them with the hosting's own tools. There will be a "little bit" of PHP code and a few screenshots.

Input data:

  1. The site is built on the WordPress CMS
  2. The hosting is Beget (this is not an advertisement, but the admin panel screenshots come from this particular provider)
  3. The WordPress site was launched sometime in the early 2000s and has a large number of articles and materials
  4. PHP version 7.2
  5. WordPress is at the latest version
  6. For some time now, according to the hosting provider, the site has been generating a high load on MySQL. Every day this value exceeded 120% of the per-account norm
  7. According to Yandex.Metrica, the site is visited by 100-200 people per day

First of all, the following was done:

  1. The database tables were cleaned of accumulated garbage
  2. Unnecessary plugins were disabled and sections of obsolete code were removed

Note that caching options (caching plugins) were also tried and observed over time, but the 120% load from this one site did not change and could only grow.

What the approximate load on the hosting databases looked like:

[Screenshot: database load per site in the hosting panel]
The site in question is at the top; a little lower are other sites that run the same CMS and have roughly the same traffic but create less load.

Analysis

  • Many caching options were tried, and the load was observed for several weeks (fortunately, the hosting provider never wrote to say how bad I was or threatened to switch me off)
  • Slow queries were searched for and analyzed, and then the database structure and the table types were changed slightly
  • The analysis relied primarily on the built-in AWStats (which, by the way, helped identify the most evil IP address by traffic volume)
  • Yandex.Metrica: it reports only on people, not on bots
  • There were attempts to use WP plugins that can filter and block visitors, even by country of residence and by various combinations of criteria
  • The most radical measure was to close the site for a day with a "We are under maintenance" notice, which was also done with a well-known plugin. In this case we expect the load to drop, but not to zero, because the WP ideology is built on hooks: plugins start their work when a hook fires, and queries to the database may already have been made before that hook fires

Idea

  1. Calculate IP addresses that make a lot of requests in a short period of time.
  2. Record the number of hits to the site
  3. Block access to the site based on the number of hits
  4. Block with a "Deny from" entry in the .htaccess file
  5. Other options, such as iptables and Nginx rules, are not considered, because this article is about shared hosting
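
The selection step in the list above can be sketched as a small pure function. This is only an illustration under stated assumptions: the function name `coderun_pick_block_candidates`, the sample addresses and the in-memory array are hypothetical, and the article's own scripts keep the counters in MySQL instead.

```php
<?php
// Hypothetical helper, not part of the article's scripts.
// $hits maps an IP address to the number of requests seen in the current window.
function coderun_pick_block_candidates(array $hits, int $limit, array $exclude = []): array
{
    $candidates = [];
    foreach ($hits as $ip => $cnt) {
        // Keep only addresses above the limit that are not explicitly excluded.
        if ($cnt > $limit && !in_array($ip, $exclude, true)) {
            $candidates[] = $ip;
        }
    }
    return $candidates;
}

$hits = [
    '203.0.113.10' => 250, // far above the limit
    '198.51.100.7' => 12,  // ordinary visitor
    '192.0.2.1'    => 900, // excluded address, e.g. the hosting's own
];

print_r(coderun_pick_block_candidates($hits, 70, ['192.0.2.1']));
```

With the sample data only 203.0.113.10 is selected: the second address is under the limit and the third is on the exclusion list.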

Once there is an idea, it has to be implemented, of course...

  • Create tables for data accumulation
    CREATE TABLE `wp_visiters_bot` (
    	`id` INT(11) NOT NULL AUTO_INCREMENT,
    	`ip` VARCHAR(300) NULL DEFAULT NULL,
    	`browser` VARCHAR(500) NULL DEFAULT NULL,
    	`cnt` INT(11) NULL DEFAULT NULL,
    	`request` TEXT NULL,
    	`input` TEXT NULL,
    	`data_update` DATETIME NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    	PRIMARY KEY (`id`),
    	UNIQUE INDEX `ip` (`ip`)
    )
    COMMENT='Blocking candidates'
    COLLATE='utf8_general_ci'
    ENGINE=InnoDB
    AUTO_INCREMENT=1;
    

    CREATE TABLE `wp_visiters_bot_blocked` (
    	`id` INT(11) NOT NULL AUTO_INCREMENT,
    	`ip` VARCHAR(300) NOT NULL,
    	`data_update` DATETIME NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    	PRIMARY KEY (`id`),
    	UNIQUE INDEX `ip` (`ip`)
    )
    COMMENT='List of already blocked addresses'
    COLLATE='utf8_general_ci'
    ENGINE=InnoDB
    AUTO_INCREMENT=59;
    

    CREATE TABLE `wp_visiters_bot_history` (
    	`id` INT(11) NOT NULL AUTO_INCREMENT,
    	`ip` VARCHAR(300) NULL DEFAULT NULL,
    	`browser` VARCHAR(500) NULL DEFAULT NULL,
    	`cnt` INT(11) NULL DEFAULT NULL,
    	`data_update` DATETIME NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    	`data_add` DATETIME NULL DEFAULT CURRENT_TIMESTAMP,
    	PRIMARY KEY (`id`),
    	UNIQUE INDEX `ip` (`ip`)
    )
    COMMENT='History of all requests, for debugging'
    COLLATE='utf8_general_ci'
    ENGINE=InnoDB
    AUTO_INCREMENT=1;
    
  • Let's create a file to hold the code. The code will write to the blocking-candidates table and keep a history for debugging.

    Code of the file that records IP addresses

    <?php
    
    if (!defined('ABSPATH')) {
        return;
    }
    
    global $wpdb;
    
    /**
     * Returns the visitor's IP address
     * @return string
     */
    function coderun_get_user_ip() {
    
        $client_ip = '';
    
        $address_headers = array(
            'HTTP_CLIENT_IP',
            'HTTP_X_FORWARDED_FOR',
            'HTTP_X_FORWARDED',
            'HTTP_X_CLUSTER_CLIENT_IP',
            'HTTP_FORWARDED_FOR',
            'HTTP_FORWARDED',
            'REMOTE_ADDR',
        );
    
        foreach ($address_headers as $header) {
            if (array_key_exists($header, $_SERVER)) {
    
                $address_chain = explode(',', $_SERVER[$header]);
                $client_ip = trim($address_chain[0]);
    
                break;
            }
        }
    
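        // Reject empty or malformed values (the address later ends up in .htaccess)
        if (!$client_ip || !filter_var($client_ip, FILTER_VALIDATE_IP)) {
            return '';
        }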
    
    
        if ('0.0.0.0' === $client_ip || '::' === $client_ip || $client_ip == 'unknown') {
            return '';
        }
    
        return $client_ip;
    }
    
    $ip = esc_sql(coderun_get_user_ip()); // Visitor's IP address
    
    if (empty($ip)) {// No IP, so get lost...
        header('Content-type: application/json;');
        die('Big big bolt....');
    }
    
    // User agent, for browser analysis (the header may be absent)
    $browser = esc_sql(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    
    $request = esc_sql(wp_json_encode($_REQUEST)); // The most recent request made to the site
    
    $input = esc_sql(file_get_contents('php://input')); // Request body, if there was one
    
    $cnt = 1;
    
    // Query for the main table of temporary blocking candidates
    $query = <<<EOT
        INSERT INTO wp_visiters_bot (`ip`,`browser`,`cnt`,`request`,`input`)
            VALUES  ('{$ip}','{$browser}','{$cnt}','{$request}','$input')
             ON DUPLICATE KEY UPDATE cnt=cnt+1,request=VALUES(request),input=VALUES(input),browser=VALUES(browser)
    EOT;
    
    // Query for the history table
    $query2 = <<<EOT
        INSERT INTO wp_visiters_bot_history (`ip`,`browser`,`cnt`)
            VALUES  ('{$ip}','{$browser}','{$cnt}')
             ON DUPLICATE KEY UPDATE cnt=cnt+1,browser=VALUES(browser)
    EOT;
    
    
    $wpdb->query($query);
    
    $wpdb->query($query2);
    
    

    The essence of the code is to get the visitor's IP address and write it to the table. If the IP is already in the table, its cnt field (the number of requests to the site) is incremented

  • Now for the scary part… I'm going to get roasted for this 🙂
    To record every request to the site, we include this file in the main WordPress file, wp-load.php. Yes, we are modifying a core file, and the include has to go after the point where the global variable $wpdb already exists
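
Since the recorded address is later written into .htaccess, and headers such as X-Forwarded-For can be set by the client, it may be worth accepting only syntactically valid addresses. Below is a minimal sketch of that idea, not the author's code: the function name `coderun_pick_valid_ip` and the shortened header list are assumptions, and unlike the listing above it moves on to the next header when a value is invalid instead of stopping at the first header present.

```php
<?php
// Hypothetical variant of the IP lookup; the name is illustrative.
// The server array is passed in so the function can be tried outside WordPress.
function coderun_pick_valid_ip(array $server): string
{
    $headers = ['HTTP_CLIENT_IP', 'HTTP_X_FORWARDED_FOR', 'REMOTE_ADDR'];

    foreach ($headers as $header) {
        if (empty($server[$header])) {
            continue;
        }
        // A forwarded header may hold a comma-separated chain; take the first entry.
        $chain = explode(',', $server[$header]);
        $candidate = trim($chain[0]);

        // filter_var() returns the address when it is valid and false otherwise.
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            return $candidate;
        }
    }
    return '';
}

var_dump(coderun_pick_valid_ip(['HTTP_X_FORWARDED_FOR' => '203.0.113.10, 10.0.0.1']));
var_dump(coderun_pick_valid_ip(['HTTP_CLIENT_IP' => 'all', 'REMOTE_ADDR' => '198.51.100.7']));
```

The second call shows the motivation: a forged header value such as `all` is skipped, so it can never turn into a `Deny from all` line.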

So now we can see how often a given IP address shows up in our table, and, with a cup of coffee, check it every 5 minutes to get the picture

[Screenshot: table of recorded IP addresses and their request counts]

From there it is simple: copy the "harmful" IP, open the .htaccess file and add the address to the end of the file

Order allow,deny
Allow from all
# start_auto_deny_list
Deny from 94.242.55.248
# end_auto_deny_list

That's it: 94.242.55.248 no longer has access to the site and no longer generates load on the database

But copying addresses by hand every time is a chore, and besides, the code was meant to be autonomous

Let's add a file that CRON will run every 30 minutes:

Code of the file that modifies .htaccess

<?php

/**
 * Automatically sets up blocks by IP address
 * Must be requested via CRON
 */
if (empty($_REQUEST['key'])) {
    die('Hello');
}

require('wp-load.php');

global $wpdb;

$limit_cnt = 70; // Request-count threshold above which an IP is selected

$deny_table = $wpdb->get_results("SELECT * FROM wp_visiters_bot WHERE cnt>{$limit_cnt}");

$new_blocked = [];

$exclude_ip = [
    '87.236.16.70'// hosting address
];

foreach ($deny_table as $result) {

    if (in_array($result->ip, $exclude_ip)) {
        continue;
    }

    $wpdb->insert('wp_visiters_bot_blocked', ['ip' => $result->ip], ['%s']);
}

$deny_table_blocked = $wpdb->get_results("SELECT * FROM wp_visiters_bot_blocked");

foreach ($deny_table_blocked as $blocked) {
    $new_blocked[] = $blocked->ip;
}

// Clear the candidates table
$wpdb->query("DELETE FROM wp_visiters_bot");

//echo '<pre>';print_r($new_blocked);echo '</pre>';

$file = '.htaccess';

$start_searche_tag = 'start_auto_deny_list';

$end_searche_tag = 'end_auto_deny_list';

$handle = @fopen($file, "r");
if ($handle) {

    $replace_string = ''; // Lines currently between the markers in .htaccess

    $target_content = false; // Flag: we are inside the section we need

    while (($buffer = fgets($handle, 4096)) !== false) {

        if (stripos($buffer, 'start_auto_deny_list') !== false) {
            $target_content = true;
            continue;
        }

        if (stripos($buffer, 'end_auto_deny_list') !== false) {
            $target_content = false;

            continue;
        }

        if ($target_content) {
            $replace_string .= $buffer;
        }
    }
    if (!feof($handle)) {
        echo "Error: fgets() failed unexpectedly\n";
    }
    fclose($handle);
}

// Current .htaccess file
$content = file_get_contents($file);

$content = str_replace($replace_string, '', $content);

// Remove all existing block entries from .htaccess
file_put_contents($file, $content);

// Write the new block entries
$str = "# {$start_searche_tag}" . PHP_EOL;

foreach ($new_blocked as $key => $value) {
    $str .= "Deny from {$value}" . PHP_EOL;
}

file_put_contents($file, str_replace("# {$start_searche_tag}", $str, file_get_contents($file)));
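
The check at the top of this script only tests that some `key` value was passed. If the URL should be callable only by the cron job, one option is to compare the value against a shared secret. Below is a minimal sketch under that assumption; the function name `coderun_is_cron_request_allowed` and the secret value are hypothetical placeholders.

```php
<?php
// Hypothetical helper; the secret and the function name are assumptions.
function coderun_is_cron_request_allowed(array $request, string $secret): bool
{
    if (empty($request['key']) || !is_string($request['key'])) {
        return false;
    }
    // hash_equals() compares in constant time, so timing does not leak the secret.
    return hash_equals($secret, $request['key']);
}

$secret = 'change-me-to-a-long-random-string'; // placeholder, not a real key

var_dump(coderun_is_cron_request_allowed(['key' => 'guess'], $secret)); // bool(false)
var_dump(coderun_is_cron_request_allowed(['key' => $secret], $secret)); // bool(true)
```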

The cron script is quite simple and primitive. Its main idea is to take the blocking candidates and write the blocking rules into the .htaccess file between the comments
# start_auto_deny_list and # end_auto_deny_list
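
The rewrite between the two marker comments can also be expressed as a single replacement on the file contents. This is a sketch of an alternative, not the script's actual approach: the function name `coderun_replace_deny_block` is hypothetical, and it assumes both markers are already present in the file.

```php
<?php
// Hypothetical helper; assumes both marker comments already exist in the text.
function coderun_replace_deny_block(string $htaccess, array $ips): string
{
    $block = "# start_auto_deny_list\n";
    foreach ($ips as $ip) {
        $block .= "Deny from {$ip}\n";
    }
    $block .= "# end_auto_deny_list";

    // Replace everything from the start marker through the end marker.
    // A callback is used so the new block is inserted literally.
    return preg_replace_callback(
        '/# start_auto_deny_list.*?# end_auto_deny_list/s',
        function () use ($block) {
            return $block;
        },
        $htaccess,
        1
    );
}

$before = "Order allow,deny\nAllow from all\n"
    . "# start_auto_deny_list\nDeny from 203.0.113.10\n# end_auto_deny_list\n";

echo coderun_replace_deny_block($before, ['198.51.100.7', '192.0.2.44']);
```

Working on a string rather than on the file makes the step easy to try on a copy of .htaccess before writing anything to the live file.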

Now "harmful" IPs get blocked automatically, and the .htaccess file looks something like this:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

Order allow,deny
Allow from all

# start_auto_deny_list
Deny from 94.242.55.248
Deny from 207.46.13.122
Deny from 66.249.64.164
Deny from 54.209.162.70
Deny from 40.77.167.86
Deny from 54.146.43.69
Deny from 207.46.13.168
....... more addresses below
# end_auto_deny_list

After this code went live, the result is visible in the hosting panel:

[Screenshot: load graph in the hosting panel]

P.S. This is my own material. I published part of it on my own website, but the Habr version turned out to be the more extended one.

Source: habr.com
