How a video codec works. Part 1: The Basics

The second part: How the video codec works

Any bitmap image can be represented as a two-dimensional matrix. When colors come into play, the idea can be extended by viewing the image as a three-dimensional matrix, in which the extra dimension stores the data for each color channel.

If we consider the final color as a combination of the so-called primary colors (red, green, and blue), our three-dimensional matrix gets three planes: the first for red, the second for green, and the last for blue.
We will call each point in this matrix a pixel (picture element). Each pixel holds the intensity (usually as a numeric value) of each color. For example, a red pixel has 0 green, 0 blue, and maximum red. A pink pixel can be formed as a combination of the three colors. Using a numeric range from 0 to 255, a pink pixel is defined as Red = 255, Green = 192 and Blue = 203.
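As a minimal illustration, here is such a three-dimensional matrix in Python with NumPy (the pixel values are the ones from the example above):

```python
import numpy as np

# A 2x2 RGB image as a 3D matrix: height x width x 3 color planes,
# 8 bits per channel (values 0-255).
image = np.zeros((2, 2, 3), dtype=np.uint8)

# A pure red pixel: maximum red, zero green, zero blue.
image[0, 0] = [255, 0, 0]

# The pink pixel from the example above.
image[0, 1] = [255, 192, 203]

print(image[0, 1])  # [255 192 203]
```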



Alternative ways to encode a color image

There are many other models for representing the colors that make up an image. For example, you can use an indexed palette, which needs only one byte per pixel instead of the three required by the RGB model. In such a model the image becomes a 2D matrix of palette indices instead of a 3D matrix. This saves memory but yields a much smaller color gamut.
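Here is a small sketch of that idea, using a hypothetical three-entry palette:

```python
import numpy as np

# A hypothetical palette: each entry is an RGB triple.
palette = np.array([
    [255, 0,   0],    # index 0: red
    [0,   255, 0],    # index 1: green
    [255, 192, 203],  # index 2: pink
], dtype=np.uint8)

# The image itself is now a 2D matrix of one-byte palette indices.
indexed = np.array([[0, 2],
                    [1, 0]], dtype=np.uint8)

# Reconstruct the full RGB image by looking up each index.
rgb = palette[indexed]  # shape: (2, 2, 3)
print(rgb[0, 1])        # [255 192 203] - the pink pixel again
```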


RGB

For example, take a look at the picture below. The first face is in full color; the others show the red, green, and blue planes (with the intensity of each color shown in grayscale).


We can see that the shades of red in the original appear in the same places where the brightest parts of the second face are observed, whereas the contribution of blue is mostly visible only in Mario's eyes (the last face) and in elements of his clothing. Also notice where all three color planes contribute the least (the darkest parts of the images): Mario's mustache.

Storing the intensity of each color takes a certain number of bits; this number is called the bit depth. Say each color plane takes 8 bits (giving values from 0 to 255); then we have a color depth of 24 bits (8 bits × 3 R/G/B planes).

Another property of an image is its resolution: the number of pixels in each dimension, usually given as width × height, as in the 4×4 example image below.

Another property we deal with when working with images and video is the aspect ratio, which describes the proportional relationship between the width and height of an image or a pixel.

When someone says that a film or picture is 16 by 9, they usually mean the display aspect ratio (DAR, from Display Aspect Ratio). However, individual pixels can also have different shapes; in that case we are talking about the pixel aspect ratio (PAR, from Pixel Aspect Ratio).


Handy note: a DVD corresponds to a DAR of 4:3.

Although the actual resolution of a DVD is 704×480, it still maintains a 4:3 aspect ratio because its PAR is 10:11 (704 × 10/11 = 640, and 640:480 = 4:3).
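A quick sanity check of that arithmetic (a throwaway Python snippet using the DVD numbers above):

```python
# DVD example: stored resolution 704x480, PAR 10:11.
width, height = 704, 480
par = 10 / 11

display_width = width * par    # 640.0
print(display_width / height)  # 1.333... i.e. a 4:3 DAR
```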

And finally, we can define video as a sequence of n frames over time, which can be considered yet another dimension; n is then the frame rate, or number of frames per second (FPS, from frames per second).


The number of bits per second required to show a video is its bit rate (bitrate).

bitrate = width * height * bit depth * frames per second

For example, a video at 30 frames per second, 24 bits per pixel, and 480x240 resolution would need 82,944,000 bits per second, or about 82.9 Mbps (30 × 480 × 240 × 24) - assuming no compression method is used at all.
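The same calculation as a short Python snippet:

```python
# Uncompressed bitrate for the example above.
width, height = 480, 240
bits_per_pixel = 24
fps = 30

bitrate = width * height * bits_per_pixel * fps
print(bitrate)              # 82944000 bits per second
print(bitrate / 1_000_000)  # 82.944 Mbps
```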

If the bit rate is nearly constant, it is called a constant bit rate (CBR, from constant bit rate). But it can also vary, in which case it is called a variable bit rate (VBR, from variable bit rate).

This graph shows a constrained VBR, which doesn't waste too many bits when the frame is completely dark.


Initially, engineers developed a method to double the perceived frame rate of a video display without using additional bandwidth. This method is known as interlaced video; basically, it sends half the screen in the first "frame" and the other half in the next "frame".

Today, scenes are mostly rendered using progressive scan technology: a method of displaying, storing, or transmitting moving images in which all the lines of each frame are drawn in sequence.


Well! Now we know how an image is represented digitally, how its colors are arranged, how many bits per second we spend to show a video, whether the bit rate is constant (CBR) or variable (VBR), and what a given resolution at a given frame rate means; we've also met many other terms, such as interlaced video and PAR.

Removing redundancy

It is clear that uncompressed video cannot be used routinely. An hour of 720p video at 30 frames per second would take up 278 GB. We arrive at this value by multiplying 1280 × 720 × 24 × 30 × 3600 (width, height, bits per pixel, FPS, and time in seconds).

Using lossless compression algorithms, such as DEFLATE (used in PKZIP, Gzip, and PNG), will not reduce the necessary bandwidth enough. We have to look for other ways to compress video.

To do this, we can exploit some properties of our vision: we are much better at distinguishing brightness than color. A video is a set of consecutive images repeating over time, and the differences between adjacent frames of the same scene are small. In addition, each frame contains many areas that use the same (or similar) color.

Color, brightness and our eyes

Our eyes are more sensitive to brightness than color. You can see for yourself by looking at this picture.


If you can't see that squares A and B on the left half of the image are actually the same color, that's fine: our brain makes us pay more attention to light and shade than to color. On the right half there is a connector of the same color between the two squares, so we (that is, our brain) can easily tell that they are, in fact, the same color.

Let's break down, in simplified form, how our eyes work. The eye is a complex organ with many parts, but we are most interested in cones and rods. The eye contains about 120 million rods and 6 million cones.

Consider the perception of color and brightness as separate functions of certain parts of the eye (in fact, everything is somewhat more complicated, but we will simplify). Rod cells are mainly responsible for brightness, while cone cells are responsible for color. Cones are classified into three types, depending on the pigment they contain: S-cones (blue), M-cones (green), and L-cones (red).

Since we have many more rods (brightness) than cones (color), we can conclude that we are better at distinguishing transitions between dark and light than between colors.


Contrast sensitivity functions

Researchers in experimental psychology and many other fields have developed numerous theories of human vision. One of them involves contrast sensitivity functions, which deal with spatial and temporal variations in light. In short, they describe how big a change must be before an observer notices it. Note the plural "functions": contrast sensitivity can be measured not only for black-and-white images but also for color ones. The results of these experiments show that in most cases our eyes are more sensitive to brightness than to color.

Since we know that we are more sensitive to image brightness, we can try to exploit this fact.

Color model

We already figured out a little how to work with color images using the RGB scheme. There are other models as well. One of them separates luma from chrominance and is known as YCbCr. (There are other models that make a similar separation, by the way, but we'll consider only this one.)

In this color model, Y represents brightness, and there are also two chroma channels: Cb (blue difference) and Cr (red difference). YCbCr can be derived from RGB, and the reverse conversion is also possible. Using this model we can create full-color images, as we see below:


Convert between YCbCr and RGB

One might object: how can we get all the colors if green isn't stored?

To answer this question, let's convert RGB to YCbCr. We'll use the coefficients adopted in the BT.601 standard, recommended by ITU-R, the sector that defines digital video standards (for example: what is 4K? what should the frame rate, resolution, and color model be?).

Let's calculate the luma first, using the constants proposed by the ITU and plugging in our RGB values.

Y = 0.299R + 0.587G + 0.114B

Once we have the luma, we separate out the blue and red components:

Cb = 0.564 (B - Y)

Cr = 0.713 (R - Y)

And we can also convert back and even get green with YCbCr:

R = Y + 1.402Cr

B = Y + 1.772Cb

G = Y - 0.344 Cb - 0.714 Cr
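Here is a minimal Python sketch of these BT.601 formulas (the function names are mine); it shows that green really does survive the round trip:

```python
# Convert one pixel from RGB to YCbCr using the BT.601 coefficients above.
def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr

# ...and back again.
def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = y - 0.344 * cb - 0.714 * cr
    return r, g, b

# Pure green in, (approximately) pure green out, even though
# YCbCr never stores a green channel directly.
y, cb, cr = rgb_to_ycbcr(0, 255, 0)
print(ycbcr_to_rgb(y, cb, cr))  # roughly (0.0, 255.0, 0.0)
```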

Typically, displays (monitors, TVs, screens, etc.) only use the RGB model. But this model can be organized in different ways:


Chroma subsampling

With an image represented as a combination of luma and chroma, we can exploit the higher sensitivity of the human visual system to luminance than to chroma by selectively removing information. Chroma subsampling is a method of encoding images using less resolution for chrominance than for luma.


By how much is it acceptable to reduce the chroma resolution?! It turns out there are already schemes that describe how to handle resolution and merging (final color = Y + Cb + Cr).

These schemes are known as subsampling systems and are expressed as a three-part ratio, a:x:y, which defines the number of luma samples and color-difference (chroma) samples:

  • a - the horizontal sampling reference (usually 4)
  • x - the number of chroma samples in the first row of a pixels (horizontal resolution relative to a)
  • y - the number of changes in chroma samples between the first and second rows of a pixels.

The exception is 4:1:0, which provides one chroma sample within each 4×4 block of luma resolution.

Common schemes used in modern codecs:

  • 4:4:4 (no downsampling)
  • 4:2:2
  • 4:1:1
  • 4:2:0
  • 4:1:0
  • 3:1:1

YCbCr 4:2:0 - a merge example

Here is a fragment of an image merged using YCbCr 4:2:0. Note that we spend only 12 bits per pixel.
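Below is one possible sketch of the 4:2:0 idea in NumPy. Averaging each 2×2 block is just one plausible way to downsample chroma; the point is the bits-per-pixel arithmetic:

```python
import numpy as np

# Luma (Y) keeps full resolution; each chroma plane keeps one sample
# per 2x2 block of pixels. Sample values are random stand-ins.
h, w = 4, 4
Y = np.random.randint(0, 256, (h, w))
Cb = np.random.randint(0, 256, (h, w)).astype(float)
Cr = np.random.randint(0, 256, (h, w)).astype(float)

# Downsample chroma by averaging each 2x2 block into one sample.
Cb_420 = Cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
Cr_420 = Cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Bits per pixel: 8 for Y plus 8/4 for each chroma plane = 12,
# down from 24 bits per pixel at full 4:4:4 resolution.
bpp = 8 + 8 * Cb_420.size / Y.size + 8 * Cr_420.size / Y.size
print(bpp)  # 12.0
```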


This is how the same image looks encoded with the main types of chroma subsampling. The first row is the final YCbCr; the bottom row shows the chroma resolution. Very decent results, given the small loss in quality.


Remember we calculated 278 GB of storage for an hour-long 720p video file at 30 frames per second? Using YCbCr 4:2:0 cuts that size in half: 139 GB. So far, this is still far from an acceptable result.

You can get the YCbCr histogram yourself with FFmpeg. In this image, blue prevails over red, which is clearly visible in the histogram itself.


Chromaticity, brightness, color gamut - video review

This excellent video is recommended watching: it explains what luminance is and dots all the i's on brightness and color.

Frame types

Moving on, let's try to eliminate redundancy in time. But first, let's define some basic terminology. Suppose we have a movie at 30 frames per second; here are its first 4 frames:


We can see a lot of repetition across these frames: for example, the blue background doesn't change from frame to frame. To tackle this, we can abstractly classify frames into three types.

I-frame (Intra frame)

An I-frame (reference frame, key frame, intra frame) is self-contained: it relies on nothing else to render and looks essentially like a static photograph. The first frame is usually an I-frame, but we will regularly see I-frames well beyond the first frame too.


P-frame (Predicted frame)

A P-frame (predicted frame) takes advantage of the fact that the current picture can almost always be reproduced from the previous frame. For example, in the second frame the only change is the ball moving forward. We can obtain frame 2 simply by slightly modifying frame 1, using only the difference between the two frames: to build frame 2, we refer back to the preceding frame 1.


B-frame (Bi-predictive frame)

And what about referencing not only past frames but future ones at the same time, to get even better compression?! That is basically what a B-frame (bi-predictive frame) does.


Interim conclusion

These frame types are used to provide the best possible compression; we'll explore how this happens in the next section. For now, note that an I-frame is the most “expensive” in terms of bits, a P-frame is noticeably cheaper, and a B-frame is the most economical option for video.


Temporal redundancy (interframe prediction)

Let's look at what options we have for minimizing repetition in time. This type of redundancy is resolved using inter-frame prediction methods.

We will try to spend as few bits as possible to encode a sequence of frames 0 and 1.


We can perform a subtraction: simply subtracting frame 0 from frame 1, we get just the residual between the two; to reconstruct frame 1, we then only need to encode that difference.
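A minimal sketch of this delta encoding, with NumPy arrays standing in for grayscale frames (the frame contents are made up):

```python
import numpy as np

# Two consecutive "frames"; only a small region changes between them.
frame0 = np.random.randint(0, 256, (240, 320)).astype(np.int16)
frame1 = frame0.copy()
frame1[100:120, 150:170] += 10  # the moving object

# The residual is mostly zeros, which is very cheap to encode.
residual = frame1 - frame0

# The decoder reconstructs frame 1 from frame 0 plus the residual.
assert np.array_equal(frame0 + residual, frame1)
print(np.count_nonzero(residual), "of", residual.size, "pixels differ")
```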


But what if I told you there's an even better method, one that uses even fewer bits?! First, let's divide frame 0 into a regular grid of blocks. Then we'll try to match the blocks of frame 0 against frame 1. In other words, we'll estimate the motion between the frames.

From Wikipedia - block motion compensation

Block motion compensation divides the current frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come from. A common misconception is that it's the previous frame that gets divided into non-overlapping blocks, with the motion vectors telling where those blocks go; in fact it's the other way around: it is the current frame that is analyzed, not the previous one, so the vectors say where the blocks came from, not where they are going. Typically, the source blocks overlap in the source frame. Some video compression algorithms assemble the current frame from parts of several previously transmitted frames, not just one.


During estimation we see that the ball has moved from (x=0, y=25) to (x=6, y=26); the x and y values make up the motion vector. One more step we can take to save bits is to encode only the difference between the last block position and the predicted one, so the final motion vector becomes (x = 6 - 0 = 6, y = 26 - 25 = 1).

In a real situation, the ball would be split into n blocks, but that doesn't change the essence of the matter.
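For the curious, here is one naive way such a block search could look in Python: an exhaustive search that minimizes the sum of absolute differences (SAD). Real encoders use much smarter search strategies, and the block size and search radius here are arbitrary:

```python
import numpy as np

def best_motion_vector(prev_frame, curr_frame, top, left,
                       bsize=8, radius=8):
    """For the block at (top, left) in the current frame, find the
    (dx, dy) offset of the best-matching block in the previous frame."""
    block = curr_frame[top:top + bsize, left:left + bsize].astype(int)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the previous frame.
            if (y < 0 or x < 0 or y + bsize > prev_frame.shape[0]
                    or x + bsize > prev_frame.shape[1]):
                continue
            candidate = prev_frame[y:y + bsize, x:x + bsize].astype(int)
            sad = np.abs(block - candidate).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv
```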

Objects in the frame move in three dimensions, so as the ball moves it can become visually smaller (or larger, if it moves toward the viewer). It's normal that there won't be a perfect match between blocks. Here is a combined view of our estimate and the real picture.


Still, we can see that when we use motion estimation there is noticeably less data to encode than with the simpler method of computing the delta between frames.


What real motion compensation looks like

This technique is applied to all blocks at once; our hypothetical moving ball would often be split across several blocks.


You can experiment with these concepts yourself using Jupyter.

To see the motion vectors, you can render an inter-prediction video with ffmpeg.


You can also use Intel Video Pro Analyzer (it's paid, but there's a free trial that's limited to the first ten frames only).


Spatial redundancy (intra-frame prediction)

If we analyze each frame in the video, we will find many interconnected areas.


Let's go through this example. This scene is mostly blue and white.


This is an I-frame. We can't use previous frames for prediction, but we can still compress it. Let's encode the highlighted red block. Looking at its neighbors, we notice certain color trends around it.


We can predict that the colors spread vertically within the frame, which means the unknown pixels will take on the values of their neighbors above.


This prediction can also be wrong. That's exactly why we apply this method (intra prediction) and then subtract the actual values. This gives us a residual block, a matrix that compresses much better than the original.
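A tiny sketch of this vertical intra prediction with made-up pixel values: the block is predicted by copying the row of pixels above it, and only a small residual remains to encode:

```python
import numpy as np

# A 4x4 block of luma samples (illustrative values).
block = np.array([[52, 50, 51, 53],
                  [53, 51, 52, 54],
                  [54, 52, 53, 55],
                  [55, 53, 54, 56]])

# The row of already-decoded pixels directly above the block.
above = np.array([51, 50, 51, 52])

# Vertical prediction: copy the top neighbors down through the block.
prediction = np.tile(above, (4, 1))

# The residual is small and therefore compresses much better.
residual = block - prediction
print(residual)
```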


If you want to practice with intra prediction, you can create a video of macroblocks and their predictions with ffmpeg. To understand the meaning of each block color, you'll have to read the ffmpeg documentation.


Or you can use Intel Video Pro Analyzer (as mentioned above, the free trial is limited to the first 10 frames, but that's enough to start with).


The second part: How the video codec works

Source: habr.com
