Moving a PHP backend to a Redis Streams bus and choosing a framework-independent library

Foreword

My website, a hobby project, hosts interesting homepages and personal sites. The topic caught my interest at the very start of my programming path: I was fascinated by discovering great professionals who write about themselves, their hobbies and their projects. The habit has stayed with me: on almost every site, commercial or not, I still check the footer for links to the authors.

Implementation of the idea

The first version was just an HTML page on my personal site where I put captioned links into a ul list. After typing in 20 pages over some time, I decided this was not very efficient and tried to automate the process. On Stack Overflow, I noticed that many people list their sites in their profiles, so I wrote a PHP parser that simply walked through the profiles, starting from the first one (addresses on SO still look like this: `/users/1`), extracted links from the relevant tag and added them to SQLite.

This can be called the second version: a collection of tens of thousands of URLs in a SQLite table that replaced the static HTML list. I added a simple search over this list. Since it held only URLs, the search matched only against them.

At this point, I abandoned the project and returned to it only after a long time. By then, I had more than three years of work experience and felt I could do something more serious. I also really wanted to master some relatively new technologies.

Current version

The project is deployed in Docker, the database has been moved to MongoDB, and relatively recently Redis was added, at first just for caching. One of the PHP microframeworks serves as the basis.

Problem

New sites are added with a console command that synchronously does the following:

  • Downloads the content by URL
  • Sets a flag indicating whether HTTPS was available
  • Stores the website entity
  • Saves the original HTML and headers in the "indexing" history
  • Parses the content and extracts the Title and Description
  • Stores the extracted data in a separate collection
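
The parsing step from the list above could be sketched roughly as follows. This is only an illustration, assuming the DOM extension is available; the function name `extractTitleAndDescription` and the shape of the returned array are hypothetical and do not come from the project.

```php
<?php
// Hypothetical helper: pulls <title> and <meta name="description"> out of raw HTML.
function extractTitleAndDescription(string $html): array
{
    $result = ['title' => null, 'description' => null];
    if ($html === '') {
        return $result; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNodes = $doc->getElementsByTagName('title');
    if ($titleNodes->length > 0) {
        $result['title'] = trim($titleNodes->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Attribute values are compared case-insensitively: "Description" also counts.
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $result['description'] = trim($meta->getAttribute('content'));
            break;
        }
    }

    return $result;
}
```

A call such as `extractTitleAndDescription($html)` returns both fields, with `null` for whatever the page does not provide.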

This was enough to just store the sites and display them in a list.

But the idea of automatically indexing, categorizing and ranking everything, and keeping it all up to date, did not fit well into this paradigm. Even just adding a web method for submitting pages required code duplication and blocking to avoid a potential DDoS.

Of course, everything could still be done synchronously: the web method would simply save the URL, and a monstrous daemon would perform all the tasks for the URLs on the list. But even here the word "queue" suggests itself. And once a queue is in place, all tasks can be split up and executed asynchronously.

Solution

Implement queues and build an event-driven system for processing all tasks. Besides, I had long wanted to try Redis Streams.

Using Redis streams in PHP

Since my framework is not one of the three giants (Symfony, Laravel, Yii), I wanted to find an independent library. But, as it turned out at first glance, serious standalone libraries are hard to find. Everything related to queues is either a project with 3 commits from five years ago or tied to a framework.

I had heard a lot about Symfony as a provider of useful standalone components, and I already use some of them. Some parts of Laravel, such as its ORM, can also be used without the framework itself.

symfony/messenger

The first candidate immediately seemed ideal, and I installed it without any doubt. But examples of using it outside of Symfony turned out to be hard to find. How do you assemble a message bus from a bunch of classes with generic, unrevealing names, and on Redis at that?

The documentation on the official site is quite detailed, but initialization is described only for Symfony, with its favorite YAML and other things that look like magic to someone outside the Symfony world. I had no interest in the setup process itself, especially during the New Year holidays. But I had to go through it, and it took unexpectedly long.

Figuring out how to instantiate the system from the Symfony sources is also not a trivial task when time is short.

After digging into all of this and trying to wire something up by hand, I concluded that I was building kludges and decided to try something else.

illuminate/queue

It turned out that this library is tightly bound to the Laravel infrastructure and a bunch of other dependencies, so I didn't spend much time on it: I installed it, looked at it, saw the dependencies and removed it.

yiisoft/yii2-queue

Here the name itself suggests a tight binding to Yii2. I had used this library before and it was not bad, but I had not realized that it depends completely on Yii2.

Other

Everything else I found on GitHub was unreliable: outdated, abandoned projects with few stars, forks or commits.

Return to symfony/messenger, technical details

I had to come to grips with this library, and after spending some more time, I managed to. It turned out that everything is quite concise and simple. To instantiate the bus, I made a small factory, because I needed several buses with different handlers.

Just a few steps:

  • Create message handlers, which only need to be callable
  • Wrap them in a HandlerDescriptor (a class from the library)
  • Wrap these descriptors in a HandlersLocator instance
  • Add the HandlersLocator to the MessageBus instance
  • Pass a set of `SenderInterface` implementations to a SendersLocator; in my case these are `RedisTransport` instances, which are configured in an obvious way
  • Add the SendersLocator to the MessageBus instance

The MessageBus has a `->dispatch()` method, which passes the message through the configured middleware: the appropriate `SenderInterface` from the SendersLocator sends it through the bus (Redis streams), and on the consuming side the appropriate handlers are looked up in the HandlersLocator and called with the message.
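
The dispatch flow described above can be pictured with a toy model in plain PHP. This is not symfony/messenger code: `ToyBus`, `LogLine` and `LocalEvent` are hypothetical names, an array stands in for a Redis stream, and the choice to stop the chain once a message has been "sent" is my assumption about how a sending step would typically behave.

```php
<?php
// Toy model of a middleware chain. All names are hypothetical; this is not
// symfony/messenger code, which additionally works with envelopes and stamps.

final class LogLine {}      // a message type that is routed to a "transport"
final class LocalEvent {}   // a message type that is handled in-process

final class ToyBus
{
    /** @var callable[] each one: function (object $message, callable $next): void */
    private array $middleware;

    public function __construct(array $middleware)
    {
        $this->middleware = $middleware;
    }

    public function dispatch(object $message): void
    {
        // The innermost step does nothing; wrap it from the last middleware to the first.
        $next = static function (object $m): void {};
        foreach (array_reverse($this->middleware) as $mw) {
            $next = static function (object $m) use ($mw, $next): void {
                $mw($m, $next);
            };
        }
        $next($message);
    }
}

$stream = [];   // stands in for a Redis stream
$handled = [];  // records which messages reached a handler

// "Send" middleware: routed messages go to the transport and the chain stops here.
$send = function (object $m, callable $next) use (&$stream): void {
    if ($m instanceof LogLine) {
        $stream[] = get_class($m);
        return;
    }
    $next($m);
};

// "Handle" middleware: records every message that reaches it.
$handle = function (object $m, callable $next) use (&$handled): void {
    $handled[] = get_class($m);
    $next($m);
};

$bus = new ToyBus([$send, $handle]);
$bus->dispatch(new LogLine());     // ends up in $stream only
$bus->dispatch(new LocalEvent());  // ends up in $handled only
```

The point of the sketch is only the shape: the bus itself knows nothing about Redis or handlers, and all behavior comes from the order of the middleware passed to the constructor.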

In the configuration of the container (in this case, php-di), this whole bundle can be configured like this:

        CONTAINER_REDIS_TRANSPORT_SECRET => function (ContainerInterface $c) {
            return new RedisTransport(
                $c->get(CONTAINER_REDIS_STREAM_CONNECTION_SECRET),
                $c->get(CONTAINER_SERIALIZER)
            );
        },
        CONTAINER_REDIS_TRANSPORT_LOG => function (ContainerInterface $c) {
            return new RedisTransport(
                $c->get(CONTAINER_REDIS_STREAM_CONNECTION_LOG),
                $c->get(CONTAINER_SERIALIZER)
            );
        },
        CONTAINER_REDIS_STREAM_RECEIVER_SECRET => function (ContainerInterface $c) {
            return new RedisReceiver(
                $c->get(CONTAINER_REDIS_STREAM_CONNECTION_SECRET),
                $c->get(CONTAINER_SERIALIZER)
            );
        },
        CONTAINER_REDIS_STREAM_RECEIVER_LOG => function (ContainerInterface $c) {
            return new RedisReceiver(
                $c->get(CONTAINER_REDIS_STREAM_CONNECTION_LOG),
                $c->get(CONTAINER_SERIALIZER)
            );
        },
        CONTAINER_REDIS_STREAM_BUS => function (ContainerInterface $c) {
            $sendersLocator = new SendersLocator([
                \App\Messages\SecretJsonMessages::class => [CONTAINER_REDIS_TRANSPORT_SECRET],
                \App\Messages\DaemonLogMessage::class => [CONTAINER_REDIS_TRANSPORT_LOG],
            ], $c);
            $middleware[] = new SendMessageMiddleware($sendersLocator);

            return new MessageBus($middleware);
        },
        CONTAINER_REDIS_STREAM_CONNECTION_SECRET => function (ContainerInterface $c) {
            $host = 'bu-02-redis';
            $port = 6379;
            $dsn = "redis://$host:$port";
            $options = [
                'stream' => 'secret',
                'group' => 'default',
                'consumer' => 'default',
            ];

            return Connection::fromDsn($dsn, $options);
        },
        CONTAINER_REDIS_STREAM_CONNECTION_LOG => function (ContainerInterface $c) {
            $host = 'bu-02-redis';
            $port = 6379;
            $dsn = "redis://$host:$port";
            $options = [
                'stream' => 'log',
                'group' => 'default',
                'consumer' => 'default',
            ];

            return Connection::fromDsn($dsn, $options);
        },

Here you can see that in the SendersLocator we assigned a different "transport" to each of two message types, and each transport has its own connection to the corresponding stream.

I made a separate demo project that shows an application of three daemons communicating with each other over such a bus: https://github.com/backend-university/products/tree/master/products/02-redis-streams-bus.

Here is how a consumer can be arranged:

use App\Messages\DaemonLogMessage;
use Symfony\Component\Messenger\Handler\HandlerDescriptor;
use Symfony\Component\Messenger\Handler\HandlersLocator;
use Symfony\Component\Messenger\MessageBus;
use Symfony\Component\Messenger\Middleware\HandleMessageMiddleware;
use Symfony\Component\Messenger\Middleware\SendMessageMiddleware;
use Symfony\Component\Messenger\Transport\Sender\SendersLocator;

require_once __DIR__ . '/../vendor/autoload.php';
/** @var \Psr\Container\ContainerInterface $container */
$container = require_once('config/container.php');

$handlers = [
    DaemonLogMessage::class => [
        new HandlerDescriptor(
            function (DaemonLogMessage $m) {
                error_log('DaemonLogHandler: message handled: / ' . $m->getMessage());
            },
            ['from_transport' => CONTAINER_REDIS_TRANSPORT_LOG]
        )
    ],
];
$middleware = [];
$middleware[] = new HandleMessageMiddleware(new HandlersLocator($handlers));
$sendersLocator = new SendersLocator(['*' => [CONTAINER_REDIS_TRANSPORT_LOG]], $container);
$middleware[] = new SendMessageMiddleware($sendersLocator);

$bus = new MessageBus($middleware);
$receivers = [
    CONTAINER_REDIS_TRANSPORT_LOG => $container->get(CONTAINER_REDIS_STREAM_RECEIVER_LOG),
];
$w = new \Symfony\Component\Messenger\Worker($receivers, $bus, $container->get(CONTAINER_EVENT_DISPATCHER));
$w->run();
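
The consumer above calls `getMessage()` on `DaemonLogMessage`, but the class itself is not shown. A minimal sketch of what such a message might look like, assuming it is a plain immutable value object (the real class in the demo project may differ):

```php
<?php
namespace App\Messages;

// Assumed shape of the message; only getMessage() is known from the consumer code.
final class DaemonLogMessage
{
    private string $message;

    public function __construct(string $message)
    {
        $this->message = $message;
    }

    public function getMessage(): string
    {
        return $this->message;
    }
}
```

On the producing side, a daemon would then call something like `$bus->dispatch(new DaemonLogMessage('crawler started'))`. Keeping the message free of services and resources matters here, because it has to be serialized before it is written to the stream.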

Using this infrastructure in the application

Having implemented the bus in my backend, I split the old synchronous command into its individual stages and turned them into separate handlers, each of which does its own job.

The pipeline for adding a new site to the database turned out like this:

[Diagram: pipeline for adding a new site to the database]

Right after that, adding new functionality, such as extracting and parsing RSS, became much easier. Since this process also needs the original content, the RSS link extractor handler, like the WebsiteIndexHistoryPersistor, subscribes to the "Content/HtmlContent" message, processes it, and passes the resulting message further along its pipeline.
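
A handler of this kind could look roughly like the sketch below. The class names `HtmlContent` and `RssLinkExtractor` and the message fields are hypothetical; the article does not show them. For simplicity the handler returns the feed URLs it finds, whereas in the real pipeline it would dispatch a follow-up message instead.

```php
<?php
// Hypothetical message carrying fetched HTML; its fields are an assumption.
final class HtmlContent
{
    public string $url;
    public string $html;

    public function __construct(string $url, string $html)
    {
        $this->url = $url;
        $this->html = $html;
    }
}

// Hypothetical handler: a plain callable object, which is all a handler needs to be.
final class RssLinkExtractor
{
    /** @return string[] feed URLs as written in the page (possibly relative) */
    public function __invoke(HtmlContent $message): array
    {
        if ($message->html === '') {
            return []; // DOMDocument::loadHTML() rejects empty input
        }

        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // tolerate broken markup
        $doc->loadHTML($message->html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $feedTypes = ['application/rss+xml', 'application/atom+xml'];
        $feeds = [];
        foreach ($doc->getElementsByTagName('link') as $link) {
            $rel = strtolower($link->getAttribute('rel'));
            $type = strtolower($link->getAttribute('type'));
            $href = trim($link->getAttribute('href'));
            if ($rel === 'alternate' && in_array($type, $feedTypes, true) && $href !== '') {
                $feeds[] = $href;
            }
        }

        return $feeds;
    }
}
```

Because the handler depends only on the message, it can be tested without Redis or the bus by calling it directly: `(new RssLinkExtractor())(new HtmlContent($url, $html))`.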

In the end, we got several daemons, each of which keeps connections only to the resources it needs. For example, the crawlers daemon contains all the handlers that need to go out to the Internet for content, and the persist daemon holds the connection to the database.

Now, instead of being selected from the database, the id produced by the persister on insert is simply passed through the bus to all interested handlers.

Source: habr.com
