How we made video encoding eight times faster


Every day, millions of viewers watch videos on the Internet. But before a video becomes available, it must not only be uploaded to a server but also processed. The faster this happens, the better for both the service and its users.

My name is Askar Kamalov; a year ago I joined the Yandex video technology team. Today I will briefly tell Habr readers how, by parallelizing the encoding process, we managed to significantly speed up the delivery of video to users.

This post will primarily be of interest to those who have not previously thought about what happens under the hood of video services. In the comments you can ask questions and suggest topics for future posts.

A few words about the task itself. Yandex not only helps you search for videos on other sites, it also stores videos for its own services. Whether it is an original show or a live sports broadcast, a film on KinoPoisk or a video on Zen or News, all of it is uploaded to our servers. Before users can watch a video, it has to be prepared: converted to the required formats, given a preview, or even run through our DeepHD technology. An unprepared file just takes up space. And this is not only about the optimal use of hardware, but also about the speed of delivering content to users. Example: a recording of the decisive moment of a hockey match should be findable in search within a minute of the event itself.

Sequential encoding

So, the happiness of the user largely depends on how quickly the video becomes available, and that is mostly determined by the transcoding speed. When there are no strict requirements on upload-to-publish time, there is no problem: you take a single, indivisible file, convert it, and upload it. At the beginning of our journey, this is how we worked:

[Diagram: the sequential pipeline, client → storage → Analyzer → Worker]

The client uploads the video to storage; the Analyzer component collects meta-information and hands the video to the Worker component for conversion. All stages are performed sequentially: there can be many encoding servers, but only one of them is busy processing any given video. The scheme is simple and transparent, and that is where its advantages end, because it can only be scaled vertically, by buying more powerful servers.
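A minimal sketch of this sequential scheme in Python, assuming ffmpeg is installed; the renditions, bitrates, and file names are illustrative, not our production settings:

```python
import subprocess

# Target renditions: (height, video bitrate). The values are illustrative.
RENDITIONS = [(1080, "5000k"), (720, "2500k"), (480, "1000k")]

def transcode_sequentially(src: str) -> list[str]:
    """Encode every rendition one after another: total latency is the sum of all passes."""
    outputs = []
    for height, bitrate in RENDITIONS:
        out = f"out_{height}p.mp4"
        subprocess.run([
            "ffmpeg", "-y", "-i", src,
            "-vf", f"scale=-2:{height}",   # keep aspect ratio, force even width
            "-c:v", "libx264", "-b:v", bitrate,
            "-c:a", "aac",
            out,
        ], check=True)
        outputs.append(out)  # nothing is published until the whole loop finishes
    return outputs
```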

Sequential encoding with intermediate result

To somehow ease the painful wait, the industry came up with so-called fast encoding. The name is misleading, because the full encoding still happens sequentially and takes just as long; it just produces an intermediate result along the way. The idea is this: prepare and publish a low-resolution version of the video as quickly as possible, and only then the higher-resolution versions.
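Continuing the sketch above (same import and RENDITIONS table), the change is small: encode the lowest resolution first and publish each rendition as soon as it is ready. Here `publish` is a hypothetical callback standing in for the real publishing step:

```python
def transcode_with_intermediate_result(src: str, publish) -> None:
    """Still sequential and just as slow in total, but the low-res version goes live first."""
    for height, bitrate in sorted(RENDITIONS):  # 480p first, 1080p last
        out = f"out_{height}p.mp4"
        subprocess.run([
            "ffmpeg", "-y", "-i", src,
            "-vf", f"scale=-2:{height}",
            "-c:v", "libx264", "-b:v", bitrate,
            "-c:a", "aac",
            out,
        ], check=True)
        publish(out)  # viewers get the blurry low-res version long before 1080p is ready
```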

On the one hand, the video becomes available faster, which matters for important events. But on the other hand, the picture comes out blurry, and that annoys viewers.

It turns out that you need to not only process video quickly, but also preserve its quality: that is what users expect from a video service today. It may seem that it is enough to buy the most powerful servers available (and regularly upgrade them all at once). But this is a dead end, because there is always a video that will bring even the most powerful hardware to its knees.

Parallel encoding

It is much more efficient to divide a complex problem into many simpler ones and solve them in parallel on different servers. This is MapReduce for video: we are no longer limited by the performance of a single server and can scale horizontally by adding new machines.
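As a toy illustration of the "map" step, assume the video has already been cut into independent fragments (how that is done is described below). Each fragment is then a separate job; locally it might look like this, while in production the process pool is replaced by a task queue consumed by many servers:

```python
from concurrent.futures import ProcessPoolExecutor
import subprocess

def encode_fragment(path: str) -> str:
    """One independent job: encode a single fragment to 720p (settings illustrative)."""
    out = path.replace(".mp4", "_720p.mp4")
    subprocess.run([
        "ffmpeg", "-y", "-i", path,
        "-vf", "scale=-2:720", "-c:v", "libx264", "-an", out,
    ], check=True)
    return out

def encode_all(fragments: list[str]) -> list[str]:
    # The "map" step: fragments are encoded in parallel, independently of each other.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(encode_fragment, fragments))
```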

By the way, the idea of splitting a video into small pieces, processing them in parallel, and gluing them back together is no secret. You can find many references to this approach (for example, I recommend a post on Habré about the DistVIDc project). But that does not make things much easier, because you cannot just take a ready-made solution and plug it into your own system: it has to be adapted to our infrastructure, our video, and even our load. In short, it was easier to write our own.

So, in the new architecture, we split the monolithic Worker block with its sequential encoding into three microservices: Segmenter, Tcoder, and Combiner.

[Diagram: the parallel pipeline, Segmenter → Tcoder → Combiner]

  1. Segmenter splits the video into fragments of approximately 10 seconds. A fragment consists of one or more GOPs (groups of pictures). Each GOP is independent and encoded separately, so it can be decoded without reference to frames from other GOPs; in other words, fragments can be played independently of each other. This sharding reduces latency by allowing processing to begin earlier.
  2. Tcoder processes each fragment: it takes a task from the queue, downloads the fragment from storage, encodes it into the different resolutions (remember that the player can choose a version based on connection speed), puts the result back into storage, and marks the fragment as processed in the database. Once all fragments are processed, Tcoder sends a task to generate the results to the next component.
  3. Combiner collects the results together: it downloads all the fragments produced by Tcoder and generates streams for the different resolutions (a sketch of the Segmenter and Combiner steps follows this list).
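The post does not name the tools behind these services, so purely as an illustration, here is how the Segmenter and Combiner steps could look with ffmpeg (the Tcoder step between them is essentially the parallel map sketched earlier). With `-c copy`, ffmpeg's segment muxer cuts only at existing keyframes, which is why fragments come out at approximately, not exactly, 10 seconds:

```python
import glob
import subprocess

def segment(src: str) -> list[str]:
    """Segmenter: cut the source into ~10 s video-only fragments on GOP boundaries."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-c", "copy", "-an",            # no re-encoding; audio is handled separately
        "-f", "segment", "-segment_time", "10",
        "-reset_timestamps", "1",
        "frag_%04d.mp4",
    ], check=True)
    return sorted(glob.glob("frag_*.mp4"))

def combine(fragments: list[str], out: str) -> None:
    """Combiner: glue the encoded fragments back together without re-encoding."""
    with open("list.txt", "w") as f:
        f.writelines(f"file '{p}'\n" for p in fragments)
    subprocess.run([
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-i", "list.txt", "-c", "copy", out,
    ], check=True)
```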

A few words about sound. The most popular audio codec, AAC, has an unpleasant property: if you encode fragments separately, you simply cannot glue them back together seamlessly; the transitions will be audible. Video codecs do not have this problem. In theory you could look for a complex technical solution, but so far the game is not worth the candle (audio weighs far less than video). So only the video is encoded in parallel, while the audio track is processed as a whole.
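A sketch of that split, continuing the ffmpeg sketches above: the audio track is extracted and encoded in one pass, then muxed back with the combined video:

```python
def encode_audio(src: str) -> str:
    """Encode the entire audio track in a single pass to avoid audible seams at joins."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src, "-vn", "-c:a", "aac", "audio.m4a",
    ], check=True)
    return "audio.m4a"

def mux(video: str, audio: str, out: str) -> None:
    """Attach the single audio track to the already-combined video stream."""
    subprocess.run([
        "ffmpeg", "-y", "-i", video, "-i", audio,
        "-c", "copy", "-map", "0:v:0", "-map", "1:a:0", out,
    ], check=True)
```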

The results

Thanks to parallel video processing, we have significantly reduced the delay between a video being uploaded to us and becoming available to users. For example, creating several full versions of different quality for a FullHD film lasting an hour and a half used to take up to two hours; now it all takes 15 minutes. Moreover, with parallel processing we produce a high-resolution version even faster than the old intermediate-result approach produced a low-resolution one.

And one more thing. With the old approach, either there were not enough servers or they sat idle without tasks. Parallel encoding lets us raise hardware utilization: now our cluster of more than a thousand servers is always busy with something.

In fact, there is still room for improvement. For example, we could save significant time by starting to process fragments of a video before the whole file has reached us. As they say, more to come.

Write in the comments what tasks in the field of working with video you would like to read about.
