Using AI to overcompress images

Data-driven algorithms like neural networks have taken the world by storm. Their rise is driven by several factors, including cheap, powerful hardware and huge amounts of data. Neural networks are currently at the forefront of anything related to "cognitive" tasks such as image recognition and natural language understanding, but they need not be limited to such tasks. This article describes a way to compress images with neural networks using residual learning. The approach is both faster and better than standard codecs. Diagrams, equations and, of course, a table of benchmarks below.

This article is based on this work. It assumes you are familiar with neural networks and their concepts: convolutions and loss functions.

What is image compression and how does it work?

Image compression is the process of converting an image so that it takes up less space. Simply storing raw images would consume a lot of disk space, which is why codecs such as JPEG and PNG exist: they aim to reduce the size of the original image.

As you know, there are two kinds of image compression: lossless and lossy. As the names suggest, lossless compression preserves the original image data exactly, while lossy compression discards some data during compression. For example, JPEG is a lossy algorithm [translator's note: basically; let's also not forget about lossless JPEG], while PNG is a lossless one.

Comparison of lossless and lossy compression

Notice the blocky artifacts in the image on the right: that is the lost information. Neighboring pixels of similar colors are encoded as a single region to save space, but the information about the actual pixels is discarded. The algorithms used in codecs such as JPEG and PNG are of course much more sophisticated, but this is a good intuitive example of lossy compression. Lossless compression is nice, but losslessly compressed files take up a lot of disk space. There are better ways to compress images without losing much information, but they are quite slow, and many use iterative approaches, which means they cannot be run in parallel across multiple CPU or GPU cores. That limitation makes them impractical for everyday use.
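As a toy, stdlib-only sketch of the difference (the "image" here is just a 1-D byte string, and crude quantization stands in for the far more sophisticated lossy steps in real codecs):

```python
import zlib

# A toy 1-D "image": a gentle gradient with slight noise, as bytes 0-255.
image = bytes((i // 4 + (i % 3)) % 256 for i in range(1024))

# Lossless: zlib round-trips to the exact original.
lossless = zlib.compress(image, 9)
assert zlib.decompress(lossless) == image

# Lossy (toy): quantize each byte to a multiple of 16 before compressing.
# Neighboring similar values collapse together, so the stream is smaller,
# but the original values can no longer be recovered exactly.
quantized = bytes((b // 16) * 16 for b in image)
lossy = zlib.compress(quantized, 9)

print(len(lossy) < len(lossless))       # True: the lossy stream is smaller
print(zlib.decompress(lossy) == image)  # False: information was lost
```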

Enter Convolutional Neural Networks

If something needs to be computed and the computation can be approximate, add a neural network. The authors used a fairly standard convolutional neural network to improve image compression. Their method not only performs on par with the best solutions (if not better), it can also exploit parallel computing, which leads to a dramatic increase in speed. The reason is that convolutional neural networks (CNNs) are very good at extracting spatial information from images, which is then represented in a more compact form (for example, only the "important" bits of the image are preserved). The authors wanted to use this capability of CNNs to represent images better.

Architecture

The authors proposed a pair of networks. The first network takes an image as input and generates a compact representation (ComCNN). The output of this network is then processed by a standard codec (e.g. JPEG). After the codec, the image is passed to the second network, which "fixes up" the codec's output in an attempt to restore the original image. The authors call this network the Reconstructive CNN (RecCNN). Like a GAN, the two networks are trained iteratively.
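The pipeline can be sketched as a composition of four stages. Everything below is a toy stand-in (pair-averaging instead of ComCNN, rounding instead of JPEG, a zero residual instead of a trained RecCNN) meant only to show the data flow, not the authors' actual networks:

```python
def com_cnn(image):
    # Stand-in for ComCNN: a compact representation via 2x downsampling.
    return [(a + b) / 2 for a, b in zip(image[::2], image[1::2])]

def codec(compact):
    # Stand-in for a lossy codec such as JPEG: crude rounding.
    return [round(v) for v in compact]

def upscale(compact):
    # Nearest-neighbour upscaling back to the original length.
    return [v for v in compact for _ in range(2)]

def rec_cnn(upscaled):
    # Stand-in for RecCNN: add a predicted residual (all zeros here,
    # since this toy "network" is untrained).
    residual = [0.0] * len(upscaled)
    return [u + r for u, r in zip(upscaled, residual)]

original = [10, 12, 14, 16, 18, 20]
reconstructed = rec_cnn(upscale(codec(com_cnn(original))))
print(reconstructed)  # same length as the original, approximate values
```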

ComCNN. The compact representation is passed to a standard codec

RecCNN. The ComCNN output is upscaled and fed to RecCNN, which tries to learn the residual

The codec output is upscaled and then passed to RecCNN, which tries to render an image as close to the original as possible.

End-to-end image compression framework. Co(.) is an image compression algorithm. The authors used JPEG, JPEG2000 and BPG

What is a residual?

The residual can be thought of as a post-processing step that "improves" the image decoded by the codec. Since a neural network carries a lot of "information" about the world, it can make cognitive decisions about what to fix. This idea builds on residual learning, the details of which you can read here.
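A minimal numeric sketch of the residual idea (the values are made up for illustration): the network is trained to predict the *difference* between the degraded input and the original, which is then added back.

```python
degraded = [98.0, 103.0, 97.0, 104.0]   # e.g. upscaled codec output
original = [100.0, 100.0, 100.0, 100.0]

# The target the network is trained on: the residual, not the original image.
target_residual = [o - d for o, d in zip(original, degraded)]

# A perfect residual predictor would recover the original exactly.
restored = [d + r for d, r in zip(degraded, target_residual)]
print(restored)  # [100.0, 100.0, 100.0, 100.0]
```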

Loss functions

Two loss functions are used because there are two neural networks. The first of them, for ComCNN, is labeled L1 and is defined as follows:

Loss function for ComCNN

Explanation

This equation may look complicated, but it is just the standard mean squared error (MSE). ‖·‖² denotes the squared norm of the vector it encloses.
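For reference, MSE in a few lines of Python (the function name `mse` is ours, not the paper's):

```python
def mse(y, y_hat):
    # Mean squared error: the average of squared element-wise differences,
    # i.e. the squared Euclidean norm of the error vector divided by its length.
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3 ≈ 1.333
```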

Equation 1.1

Cr denotes the output of ComCNN, θ denotes the trainable parameters of ComCNN, and x_k is the input image

Equation 1.2

Re() stands for RecCNN. This equation simply transfers the meaning of equation 1.1 to RecCNN; θ denotes the trainable parameters of RecCNN (a hat on top means the parameters are fixed).
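The equation images are not reproduced here; as a hedged sketch (assuming the notation from the captions above and the standard MSE form over N training images), the ComCNN objective plausibly reads:

```latex
L(\theta_1) = \frac{1}{2N} \sum_{k=1}^{N}
  \left\lVert \operatorname{Re}\bigl(\operatorname{Co}(\operatorname{Cr}(x_k;\,\theta_1));\; \hat{\theta}_2\bigr) - x_k \right\rVert^2
```

i.e. ComCNN's parameters θ1 are optimized through the whole pipeline while RecCNN's parameters θ̂2 are held fixed.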

Intuitive Definition

Equation 1.0 makes ComCNN adjust its weights so that, after reconstruction by RecCNN, the final image looks as similar as possible to the input image. The second loss function, for RecCNN, is defined as follows:

Equation 2.0

Explanation

Again, the function may look complicated, but it is mostly the standard neural network loss (MSE).

Equation 2.1

Co() denotes the codec output, and x with a hat on top denotes the ComCNN output. θ2 are the trainable parameters of RecCNN, and res() is RecCNN's residual output. It is worth noting that RecCNN is trained on the difference between Co() and the input image, not on the input image itself.
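A hedged sketch of this loss under the same assumptions (the notation follows the explanation above; the exact form in the paper's image may differ):

```latex
L(\theta_2) = \frac{1}{2N} \sum_{k=1}^{N}
  \left\lVert \operatorname{res}\bigl(\operatorname{Co}(\hat{x}_k);\, \theta_2\bigr)
  - \bigl(\operatorname{Co}(\hat{x}_k) - x_k\bigr) \right\rVert^2
```

The network's residual prediction is pushed toward the actual difference between the codec output and the original image.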

Intuitive Definition

Equation 2.0 will cause RecCNN to change its weights so that the output looks as similar as possible to the input image.

Training scheme

The models are trained iteratively, like a GAN: the weights of the first model are fixed while the weights of the second are updated, then the weights of the second model are fixed while the first is trained.
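The alternating scheme can be sketched with two scalar "parameters" in place of the two CNNs: update one while the other is frozen, then swap. The objective here is purely illustrative, not the authors' loss:

```python
# Minimize f(a, b) = (a*b - 6)**2 + (a - 2)**2 by alternating
# gradient steps: "network 1" (a) updates while "network 2" (b)
# is frozen, and vice versa. The minimum is at a = 2, b = 3.

def grad_a(a, b):
    return 2 * (a * b - 6) * b + 2 * (a - 2)

def grad_b(a, b):
    return 2 * (a * b - 6) * a

a, b, lr = 1.0, 1.0, 0.01
for step in range(2000):
    a -= lr * grad_a(a, b)   # update a, b frozen
    b -= lr * grad_b(a, b)   # update b, a frozen

print(round(a, 2), round(b, 2))  # converges to roughly a = 2, b = 3
```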

Tests

The authors compared their method with existing ones, including plain codecs. Their method performs better than the others while maintaining high speed on appropriate hardware. The authors also tried using only one of the two networks and observed a drop in performance.

Structural Similarity Index (SSIM) comparison. Higher values indicate better similarity to the original. Bold indicates the authors' results

Conclusion

We looked at a new way of applying deep learning to image compression and discussed using neural networks for tasks beyond "general" ones such as image classification and language processing. This method is not only on par with modern solutions, it also processes images much faster.


Source: habr.com
