Computational Photography

The original article was posted on Vastrik.ru and is published on 3DNews with the author's permission. We provide the full text of the article minus a huge number of links - they will be useful to those who are seriously interested in the topic and want to dig into the theory of computational photography, but we considered them excessive for a broad audience.

Today not a single smartphone presentation is complete without gushing over its camera. Every month we hear about another triumph of mobile cameras: Google teaches the Pixel to shoot in the dark, Huawei zooms like a pair of binoculars, Samsung puts in a lidar, and Apple makes the world's roundest corners. There are few places where innovation flows as boldly right now.

DSLRs, meanwhile, seem to be marking time. Sony showers everyone with new sensors every year, while manufacturers lazily bump the last digit of the version number and keep lounging on the sidelines. I have a $3000 DSLR on my desk, but when I travel I take an iPhone. Why?

As the classic said, I went online with this question. There people discuss some "algorithms" and "neural networks" without the slightest idea of how exactly they affect photography. Journalists loudly read out megapixel counts, bloggers churn out paid unboxings in chorus, and aesthetes coat themselves in the "sensual perception of the sensor's color palette." Everything as usual.

I had to sit down, spend half my life on it, and figure it all out myself. In this article I will tell you what I learned.

#What is computational photography?

Everywhere, including Wikipedia, they give a definition something like this: computational photography is any image capture and processing technique that uses digital computation instead of optical transformations. Everything about it is fine except that it explains nothing. Even autofocus fits under it, while plenoptics, which has already brought us plenty of useful things, does not. The vagueness of the official definitions sort of hints that we have no idea what we are talking about.

Computational photography pioneer Marc Levoy, a Stanford professor who now runs the camera at Google Pixel, gives a different definition: a set of computational imaging techniques that improve or extend the capabilities of digital photography, and whose result is an ordinary photograph that could not technically have been taken with this camera in the traditional way. In this article I stick to that one.

So, smartphones were to blame for everything.

Smartphones had no choice but to give life to a new kind of photography: computational photography.

Their small noisy sensors and tiny slow lenses should, by all the laws of physics, have brought nothing but pain and suffering. And they did, until their developers figured out how to cleverly use their strengths to overcome their weaknesses: fast electronic shutters, powerful processors and software.

Most of the high-profile research in computational photography falls between 2005 and 2015, which in science counts as literally yesterday. Right now, before our eyes and in our pockets, a new field of knowledge and technology is being built that never existed before.

Computational photography isn't just selfies with neuro-bokeh. The recent photograph of a black hole would not have been possible without computational photography techniques. To take such a photo with a conventional telescope, we would have had to make it the size of the Earth. But by combining the data of eight radio telescopes at different points of our globe and writing a few Python scripts, we got the world's first photograph of the event horizon. Not bad for selfies either.

#Start: digital processing

Let's imagine we are back in 2007. Our mom is anarchy, and our photos are noisy 0.6-megapixel JPEGs shot from a skateboard. Around that time we get the first irresistible urge to slap presets on them to hide the wretchedness of mobile sensors. Let's not deny ourselves.

#Math and Instagram

With the release of Instagram, everyone became obsessed with filters. As someone who reverse-engineered X-Pro II, Lo-Fi and Valencia for, of course, research purposes, I still remember that they consisted of three components (a toy sketch of all three follows right after the list):

  • Color settings (hue, saturation, lightness, contrast, levels, etc.) - simple numerical coefficients, just like in any preset photographers have used since ancient times.
  • Tone mapping - vectors of values, each of which said: "red with a hue of 128 should be turned into a hue of 240."
  • Overlay - a translucent picture with dust, grain, a vignette and everything else that can be laid on top to get that not-at-all-clichéd old-film look. Not always present.
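
To make those three components less abstract, here is a toy numpy sketch of such a filter - not Instagram's actual code, just the general shape of the idea: per-channel tone-mapping LUTs plus a translucent overlay (the hue/saturation-style color settings can be baked into the same curves). The `warm_red` curve is an arbitrary example of my own.

```python
import numpy as np

def apply_filter(img, lut_r, lut_g, lut_b, overlay, opacity=0.3):
    """Toy 'Instagram-style' filter: tone-mapping LUTs plus a translucent overlay.

    img, overlay: uint8 arrays of shape (H, W, 3); lut_*: 256-entry uint8 arrays."""
    out = img.copy()
    # 1) Tone mapping: each channel value is looked up in its own curve,
    #    e.g. lut_r[128] == 240 means "red 128 becomes red 240".
    out[..., 0] = lut_r[out[..., 0]]
    out[..., 1] = lut_g[out[..., 1]]
    out[..., 2] = lut_b[out[..., 2]]
    # 2) Overlay: blend the dust/grain/vignette picture on top.
    blended = (1 - opacity) * out.astype(np.float32) + opacity * overlay.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)

# An arbitrary "warming" curve for the red channel; the other two stay untouched.
identity = np.arange(256, dtype=np.uint8)
warm_red = np.clip(np.arange(256) + 25, 0, 255).astype(np.uint8)
```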

Modern filters haven't gone far from these three; they have just become a little more complicated mathematically. With the advent of hardware shaders and OpenCL on smartphones, they were quickly rewritten for the GPU, and that was considered wildly cool. For 2012, of course. Today any schoolkid can do the same thing in CSS, and he still won't make it to graduation.

Progress in filters hasn't stopped since, though. The folks at Dehancer, for example, are doing great things with non-linear filters: instead of proletarian tone mapping they use more complex non-linear transformations, which, they say, opens up far more possibilities.

You can do a lot with non-linear transformations, but they are incredibly complex, and we humans are incredibly dumb. As soon as non-linear transformations come up in science, we prefer to fall back on numerical methods and cram neural networks in everywhere so they write the masterpieces for us. It was the same here.

#Automation and dreams of the "masterpiece" button

Once everyone got used to filters, we started building them right into cameras. History conceals which manufacturer was first, but just to appreciate how long ago it was: in iOS 5.0, released back in 2011, there was already a public API for Auto Enhancing Images. Only Jobs knows how long it had been in use before being opened to the public.

Automation did the same thing each of us does when opening a photo in an editor: it pulled up the dips in highlights and shadows, piled on saturation, removed red eyes and fixed skin tones. Users didn't even suspect that the "dramatically improved camera" in their new smartphone was merely the merit of a couple of new shaders. There were still five years left before the Google Pixel came out and the computational photography hype began.
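
For the curious, this kind of "auto enhance" fits in a dozen lines. This is not Apple's or anyone else's actual algorithm - just a guess at the general recipe: stretch the levels so shadows and highlights stop hugging the ends of the histogram, then push the saturation a bit.

```python
import numpy as np

def auto_enhance(img, clip_percent=1.0, sat_boost=1.15):
    """Naive auto-enhance for a float RGB image with values in [0, 1]."""
    out = np.empty_like(img)
    # Stretch each channel so the darkest/brightest ~1% hit pure black/white.
    for c in range(3):
        lo, hi = np.percentile(img[..., c], [clip_percent, 100 - clip_percent])
        out[..., c] = np.clip((img[..., c] - lo) / max(hi - lo, 1e-6), 0, 1)
    # Crude saturation boost: push every pixel away from its own gray value.
    gray = out.mean(axis=-1, keepdims=True)
    return np.clip(gray + sat_boost * (out - gray), 0, 1)
```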

Today the battle for the "masterpiece" button has moved to the field of machine learning. Having played enough with tone mapping, everyone rushed to train CNNs and GANs to move the sliders instead of the user: take the input image and determine a set of optimal parameters that would bring it closer to some subjective notion of "good photography". This is implemented in Pixelmator Pro and other editors, and it works, as you might guess, not always and not particularly well.

#Stacking is 90% of mobile camera success

True computational photography began with stacking: layering several photos on top of each other. It's no problem for a smartphone to snap a dozen frames in half a second. Its camera has no slow mechanical parts: the aperture is fixed, and instead of a moving curtain there is an electronic shutter. The processor simply tells the sensor how many microseconds to catch wild photons for, and reads out the result.

Technically, a phone can shoot photos at video speed and video at photo resolution, but it all runs up against the speed of the bus and the processor, which is why software limits are always set.

Stacking itself has been with us for a long time. Even our grandparents installed plugins for Photoshop 7.0 to combine several photos into eye-searing HDR or to stitch an 18000 × 600 pixel panorama, and... in fact, nobody ever figured out what to do with those next. Rich times they were, and wild.

Now that we've grown up, we call it "epsilon photography": changing one of the camera parameters (exposure, focus, position) and merging the resulting frames to get something that could not have been captured in a single frame. That's the theorists' term, though; in practice another name has stuck - stacking. Today, in fact, 90% of all innovations in mobile cameras are built on it.

Here's something many people don't think about, but which is essential to understanding all mobile and computational photography: the camera in a modern smartphone starts taking photos the moment you open its app. Which is logical, since it needs to get an image onto the screen somehow. But besides feeding the screen, it also saves high-resolution frames into its own circular buffer, where it keeps them for a couple more seconds.

When you press the "take a photo" button, the photo has in fact already been taken; the camera simply grabs the last frame from the buffer.

This is how any mobile camera works today - at least in all the flagships, as opposed to the bargain-bin junk. Buffering makes it possible to implement not just the zero shutter lag photographers have dreamed of for so long, but even a negative one: when you press the button, the smartphone looks into the past, unloads the last 5-10 photos from the buffer and frantically starts analyzing and merging them. No more waiting for the phone to click off frames for HDR or night mode - it just takes them from the buffer, and the user never knows.
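
In pseudo-Python, the whole ring-buffer trick is a few lines. The `sensor` object and its `read_frame()` method are made up for the example; the point is only the shape of the logic: keep shooting, keep a few seconds of frames, and let the shutter button reach into the past.

```python
from collections import deque

class ZeroLagCamera:
    """Sketch of the circular-buffer / negative shutter lag idea."""

    def __init__(self, sensor, depth=15):
        self.sensor = sensor                 # hypothetical object with a read_frame() method
        self.frames = deque(maxlen=depth)    # old frames fall off the back automatically

    def on_preview_tick(self):
        # Called for every frame while the camera app is open: always shooting.
        self.frames.append(self.sensor.read_frame())

    def on_shutter_pressed(self, n=8):
        # "Negative" shutter lag: the photos already exist, just grab the last n
        # and hand them to the HDR / night-mode merging code.
        return list(self.frames)[-n:]
```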

By the way, it was negative shutter lag that made Live Photos possible on iPhones, and HTC had something similar back in 2013 under the strange name Zoe.

#Exposure stacking - HDR and the fight against brightness differences

Whether camera sensors can capture the entire range of brightness available to our eyes is an old and hotly debated topic. Some say no, because the eye can see up to 25 f-stops while even a top full-frame sensor can squeeze out at most 14. Others call the comparison incorrect, since the brain helps the eye by automatically adjusting the pupil and completing the image with its neural networks, and the instantaneous dynamic range of the eye is actually no more than 10-14 f-stops. Let's leave these arguments to the internet's finest armchair thinkers.

The fact remains: shooting friends against a bright sky without HDR on any mobile camera, you get either a normal sky and black faces, or properly rendered friends and a sky burnt to a crisp.

The solution was invented long ago: expanding the brightness range with HDR (high dynamic range). You take several shots at different shutter speeds and merge them - one "normal", a second lighter, a third darker. Dark areas come from the light frame, blown highlights are filled in from the dark one - profit. The only thing left is automatic bracketing, i.e. how far to shift the exposure of each frame so as not to overdo it, but these days any second-year engineering student can handle estimating the average brightness of a picture.
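
A minimal sketch of the merging step, assuming the bracketed frames are already aligned and given as float images in [0, 1]: weight every pixel by how "well exposed" it is, so each part of the scene is taken mostly from whichever frame rendered it best. Real HDR pipelines are much fancier (and also tone-map the result), but the idea is this.

```python
import numpy as np

def merge_bracketed(frames):
    """Blend exposure-bracketed shots by per-pixel 'well-exposedness' weights."""
    acc = np.zeros_like(frames[0], dtype=np.float32)
    weight_sum = np.zeros(frames[0].shape[:2], dtype=np.float32)
    for f in frames:
        luma = f.mean(axis=-1)
        # Pixels near mid-gray get weight ~1, crushed or blown pixels ~0.
        w = np.exp(-((luma - 0.5) ** 2) / (2 * 0.2 ** 2)) + 1e-6
        acc += f * w[..., None]
        weight_sum += w
    return acc / weight_sum[..., None]
```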

On the latest iPhones, Pixels and Galaxies, HDR switches on automatically when a simple algorithm inside the camera determines that you are shooting something contrasty on a sunny day. You can even notice the phone switching its buffer recording mode to save exposure-shifted frames: the fps in the camera app drops and the picture itself gets juicier. The switching moment is clearly visible on my iPhone X when shooting outdoors. Take a closer look at your own smartphone next time.

The disadvantage of HDR with exposure bracketing is its utter helplessness in poor lighting. Even under a living-room lamp the frames come out so dark that the computer cannot straighten and merge them. To solve the light problem, in 2013 Google showed a different approach to HDR in its newly released Nexus smartphone. It used time stacking.

#Time stacking - long exposure simulation and time lapse

Time stacking lets you get a long exposure out of a series of short ones. The pioneers were lovers of shooting star trails in the night sky, for whom it was inconvenient to open the shutter for two hours at once: all the settings had to be calculated in advance, and the slightest shake ruined the whole frame. So they decided to open the shutter for only a couple of minutes at a time, but many times, and then go home and stack the resulting frames in Photoshop.

So the camera never actually shot at a long shutter speed, but we got the effect of one by adding up several frames taken in a row. A pile of smartphone apps using this trick have existed for ages, but none of them are needed anymore since the feature has been added to almost every stock camera. Today even an iPhone can easily stitch a long exposure out of a Live Photo.
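
The "simulated long exposure" itself is almost embarrassingly simple once the frames exist - here is a sketch, assuming a steady burst of float images: averaging smooths water and crowds, while taking the per-pixel maximum keeps star trails and light trails.

```python
import numpy as np

def simulated_long_exposure(frames, mode="mean"):
    """Fake a long exposure from a burst of short ones (all frames same shape, values in [0, 1])."""
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames])
    return stack.max(axis=0) if mode == "max" else stack.mean(axis=0)
```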

Let's get back to Google and its nighttime HDR. It turned out that time bracketing can produce good HDR in the dark. The technology first appeared in the Nexus 5 and was called HDR+. The rest of the Android phones got it as a sort of gift. The technology is still so popular that it is even lauded at presentations of the latest Pixels.

HDR+ works quite simply: having determined that you are shooting in the dark, the camera unloads the last 8-15 RAW photos from the buffer and stacks them on top of each other. This way the algorithm gathers more information about the dark areas of the frame in order to minimize noise - pixels where, for one reason or another, the camera failed to gather all the information and screwed up.

It's as if you didn't know what a capybara looks like and asked five people to describe it: their stories would be roughly the same, but each would mention some unique detail, so you would gather more information than by asking just one person. Same with pixels.

Adding up frames taken from a single point gives the same fake long-exposure effect as with the stars above. The exposures of dozens of frames are summed, and the errors in one are minimized in the others. Imagine how many times you would have to click a DSLR's shutter to achieve this.
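
To make "unload the buffer and superimpose" a bit more concrete, here is a rough sketch with a deliberately dumb global alignment step (the real HDR+ aligns per tile and merges in the frequency domain, which is well beyond a blog snippet). The inputs are assumed to be grayscale float frames from the burst.

```python
import numpy as np

def align_and_average(frames, search=4):
    """Align each burst frame to the first by a global integer shift, then average.

    Averaging N frames cuts random noise by roughly sqrt(N)."""
    ref = np.asarray(frames[0], dtype=np.float32)
    acc = ref.copy()
    for f in frames[1:]:
        f = np.asarray(f, dtype=np.float32)
        best, best_err = (0, 0), np.inf
        for dy in range(-search, search + 1):        # brute-force shift search
            for dx in range(-search, search + 1):
                err = np.mean((np.roll(f, (dy, dx), axis=(0, 1)) - ref) ** 2)
                if err < best_err:
                    best, best_err = (dy, dx), err
        acc += np.roll(f, best, axis=(0, 1))
    return acc / len(frames)
```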

All that remained was to solve the problem of automatic color correction: shots taken in the dark invariably come out yellow or green, and we kind of want the juiciness of daylight. In early versions of HDR+ this was solved by simply tweaking the settings, as in the Instagram-style filters. Then neural networks were called in to help.

This is how Night Sight appeared - the "night photography" technology in the Pixel 2 and 3. The description says it itself: "machine learning techniques built on top of HDR+ that make Night Sight work". In effect it is the automation of the color-correction stage. The machine was trained on a dataset of "before" and "after" photos in order to make one beautiful photo out of any set of dark, crooked ones.

The dataset, by the way, has been made publicly available. Maybe the folks at Apple will take it and finally teach their glass shovels to shoot properly in the dark.

In addition, Night Sight computes the motion vectors of objects in the frame to normalize the blur that is bound to appear at long shutter speeds - the smartphone can take sharp parts from other frames and paste them in.

#Motion stacking - panorama, superzoom and noise reduction

The panorama has long been the favorite pastime of country folk. History knows no case of a sausage photo being interesting to anyone but its author, but it can't go unmentioned - for many people, stacking began right here.

The first useful way to use a panorama is to get a photo of higher resolution than the sensor allows by merging several frames. Photographers have long used various software for so-called super-resolution photos, where slightly shifted photos seem to fill each other in between the pixels. This way you can get an image of a good few hundred gigapixels, which is quite useful if you need to print it on a house-sized advertising poster.

Another, more interesting approach is pixel shifting. Some mirrorless cameras from Sony and Olympus have supported it since around 2014, but they still made you glue the result together by hand. Typical big-camera innovation.

Smartphones succeeded here for a funny reason: when you take a photo, your hands shake. This seeming problem became the basis for implementing native super-resolution on smartphones.

To understand how it works, you need to recall how any camera sensor is built. Each of its pixels (photodiodes) can record only the intensity of light - that is, the number of photons that flew in. What a pixel cannot measure is the color (wavelength) of that light. To get an RGB image, engineers had to pile on crutches here too: cover the whole sensor with a grid of multi-colored glass. Its most popular implementation is called the Bayer filter and is used in most sensors today.

So each pixel of the sensor catches only the R, G or B component, because the remaining photons are mercilessly reflected by the Bayer filter. The missing components are reconstructed by bluntly averaging the values of neighboring pixels.

There are more green cells in the Bayer filter - this was done by analogy with the human eye. It follows that of the 50 million pixels on a sensor, 25 million will capture green and 12.5 million each red and blue. The rest is averaged out - the process is called debayering or demosaicing, and it is the fat, funny crutch on which everything rests.

In fact, every sensor has its own cunning, patented demosaicing algorithm, but for the purposes of this story we will neglect that.
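
Here is roughly what the blunt-averaging version looks like, for an RGGB mosaic and with scipy doing the neighborhood averaging. It's a textbook bilinear demosaic, not any manufacturer's secret sauce.

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaic(raw):
    """Fill in the two missing color values at every pixel of an RGGB Bayer mosaic
    by averaging the nearest measured neighbors of that color."""
    h, w = raw.shape
    masks = np.zeros((3, h, w), dtype=np.float32)
    masks[0, 0::2, 0::2] = 1                      # R sites
    masks[1, 0::2, 1::2] = 1                      # G sites (two per 2x2 block)
    masks[1, 1::2, 0::2] = 1
    masks[2, 1::2, 1::2] = 1                      # B sites
    kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float32)
    rgb = np.zeros((h, w, 3), dtype=np.float32)
    for c in range(3):
        known = raw * masks[c]
        # Normalized convolution: spread the measured samples, divide by local coverage.
        est = convolve(known, kernel) / np.maximum(convolve(masks[c], kernel), 1e-6)
        rgb[..., c] = np.where(masks[c] == 1, raw, est)   # keep measured values as-is
    return rgb
```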

Other types of sensors (such as Foveon) have never really caught on, although some manufacturers do try to use sensors without a Bayer filter to improve sharpness and dynamic range.

When there is little light or the details of the subject are very tiny, we lose a lot of information because the Bayer filter brazenly cuts off photons of an unwanted wavelength. That's why pixel shifting was invented: shifting the sensor by one pixel up-down-left-right in order to catch them all. The photo does not come out four times larger, as you might expect; the processor simply uses this data to record the value of each pixel more precisely. It averages, so to speak, not over its neighbors but over four measurements of itself.
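
A sketch of that "average over four measurements of itself" step, under a big simplifying assumption: the four raw mosaics are already registered to the same scene, and each was captured with the Bayer pattern displaced by one photosite. Then every scene pixel has been measured through R, G, G and B filters, and no neighbor-averaging is needed at all.

```python
import numpy as np

def pixel_shift_merge(shots):
    """shots[(dy, dx)]: 2D raw mosaic whose RGGB pattern is offset by (dy, dx).
    Expects the four offsets (0,0), (0,1), (1,0), (1,1); returns a full RGB image."""
    h, w = shots[(0, 0)].shape
    y, x = np.mgrid[0:h, 0:w]
    rgb = np.zeros((h, w, 3), dtype=np.float32)
    for (dy, dx), raw in shots.items():
        ry, rx = (y + dy) % 2, (x + dx) % 2       # which color filter sat over each pixel in this shot
        rgb[..., 0] += np.where((ry == 0) & (rx == 0), raw, 0)        # one red sample per pixel
        rgb[..., 1] += np.where(ry != rx, raw, 0) / 2                 # two green samples, averaged
        rgb[..., 2] += np.where((ry == 1) & (rx == 1), raw, 0)        # one blue sample per pixel
    return rgb
```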

The shaking of our hands when we shoot with a phone turns this process into a natural side effect. In the latest versions of the Google Pixel this thing is implemented and kicks in whenever you use the zoom - it's called Super Res Zoom (yes, I also enjoy their merciless naming). The Chinese copied it into their phones too, although it turned out a little worse.

Overlaying slightly shifted photos on top of each other lets you gather more information about the color of each pixel, which means reducing noise, increasing sharpness and raising the resolution without increasing the physical number of megapixels on the sensor. Modern Android flagships do this automatically, before their users even think about it.

#Focus stacking - any depth of field and refocus in post-production

The method comes from macro photography, where shallow depth of field has always been a problem. To keep the entire object in focus, you had to take several frames with the focus shifted back and forth and then stitch them into one sharp shot. Landscape photographers often use the same method to make the foreground and background sharp as diarrhea.
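
The merging part of focus stacking is also a pleasantly small amount of code if you assume the frames are already aligned (a tripod, in other words): measure local contrast in every frame and, for each pixel, take the frame where that contrast is highest. A bare-bones version; real tools also feather the seams.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def focus_stack(frames, window=9):
    """Per-pixel 'take the sharpest frame' merge of a focus-bracketed, aligned
    series of grayscale float images."""
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames])
    # Local sharpness = Laplacian energy averaged over a small window.
    sharpness = np.stack([uniform_filter(laplace(f) ** 2, size=window) for f in stack])
    best = sharpness.argmax(axis=0)               # index of the sharpest frame per pixel
    return np.take_along_axis(stack, best[None], axis=0)[0]
```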

All of this moved to smartphones too, though without much hype. In 2013 the Nokia Lumia 1020 came out with a "Refocus App", and in 2014 the Samsung Galaxy S5 arrived with a "Selective Focus" mode. They worked on the same scheme: at the press of a button they quickly took three photos - one at "normal" focus, a second with the focus shifted forward and a third shifted back. The software aligned the frames and let you pick one of them, which was marketed as "real" focus control in post-production.

There was no further processing; even this simple hack was enough to drive another nail into the coffin of Lytro and its analogues with their honest refocus. By the way, let's talk about them (segue master, level 80).

#Computational sensors - light fields and plenoptics

As we figured out above, our sensors are a horror propped up on crutches. We have simply gotten used to it and try to live with it. In structure they have changed little since the dawn of time; we have only improved the manufacturing process - shrunk the distance between pixels, fought interference noise, added dedicated pixels for phase-detection autofocus. But take even the most expensive DSLR and try to shoot a running cat with it in room lighting - the cat, to put it mildly, will win.

We have long been trying to invent something better. Plenty of attempts and research in this area can be found by googling "computational sensor" or "non-Bayer sensor", and even the pixel-shifting example above can be counted as an attempt to improve sensors with computation. But the most promising stories of the last twenty years come to us from the world of so-called plenoptic cameras.

So you don't fall asleep in anticipation of the looming big words, I'll drop an insider note: the camera of the latest Google Pixels is "a little bit" plenoptic. Only two pixels' worth, but even that lets it compute an honest optical depth of the frame without a second camera like everyone else's.

Plenoptics is a powerful weapon that hasn't fired yet. Here I would link one of my favorite recent articles on the capabilities of plenoptic cameras and our future with them, from which I borrowed the examples.

#Plenoptic camera - coming soon to everyone

Invented in 1994, first assembled at Stanford in 2004. The first consumer camera, the Lytro, was released in 2012. The VR industry is now actively experimenting with similar technologies.

A plenoptic camera differs from a conventional one in just one modification: its sensor is covered with a grid of lenses, each of which covers several real pixels.

If you correctly calculate the distance from the grid to the sensor and the aperture size, the final image comes out as distinct clusters of pixels - little mini-versions of the original image.

It turns out that if you take, say, one central pixel from each cluster and assemble the picture from those alone, it will be no different from one taken with a conventional camera. Yes, we lose a bit of resolution, but we'll just ask Sony to cram more megapixels into the new sensors.
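
In code, pulling such a "normal" photo out of a plenoptic raw is just strided indexing - assuming an idealized, perfectly regular grid of cluster × cluster pixels under each microlens, which real cameras only approximate after calibration:

```python
import numpy as np

def sub_aperture_view(plenoptic_raw, cluster=10, u=5, v=5):
    """Take pixel (u, v) from every lenslet cluster of an idealized plenoptic raw.
    The central (u, v) approximates what a conventional camera would have shot;
    other values of (u, v) give the scene from slightly shifted viewpoints."""
    return np.asarray(plenoptic_raw)[u::cluster, v::cluster]
```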

The fun is only just beginning. If you take a different pixel from each cluster and assemble the picture again, you get another normal photo, only as if it had been taken with a one-pixel shift. So with clusters of 10 × 10 pixels we get 100 images of the subject taken from slightly different points.

The larger the cluster, the more images, but the lower the resolution. In a world of smartphones with 41-megapixel sensors we can afford to neglect resolution a little, but everything has its limit. You have to keep a balance.

Okay, so we've assembled a plenoptic camera - what does it actually give us?

#Honest refocus

The feature all the journalists buzzed about in articles on Lytro was the ability to honestly adjust the focus in post-production. By "honestly" we mean not resorting to any deblurring algorithms, but using only the pixels at hand, picking them out of the clusters or averaging them in the right order.

A RAW photo from a plenoptic camera looks weird. To get the usual sharp JPEG out of it, you first have to assemble it, picking each pixel of the JPEG from one of the RAW clusters. Depending on how you pick them, the result changes.

For example, the farther a cluster lies from the spot where the original ray landed, the more out of focus that ray is. Because optics. To get a defocused image, we just need to pick pixels at the desired distance from the original - either closer or farther.
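
That picking-and-averaging is usually done as "shift and add": take every sub-aperture view (see the indexing sketch above), shift it in proportion to its offset from the cluster centre, and average. Here is a crude version under the same idealized-grid assumption - not Lytro's actual pipeline, but the textbook algorithm.

```python
import numpy as np

def refocus(plenoptic_raw, cluster=10, alpha=1.0):
    """Synthetic refocus by shift-and-add of sub-aperture views.
    alpha = 0 keeps the captured focus plane; other values move it."""
    raw = np.asarray(plenoptic_raw, dtype=np.float32)
    centre = cluster // 2
    acc, count = 0.0, 0
    for u in range(cluster):
        for v in range(cluster):
            view = raw[u::cluster, v::cluster]
            # Shift each viewpoint in proportion to its offset from the centre.
            dy, dx = round(alpha * (u - centre)), round(alpha * (v - centre))
            acc = acc + np.roll(view, (dy, dx), axis=(0, 1))
            count += 1
    return acc / count
```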

Shifting the focus toward the camera was harder - purely physically, there were fewer such pixels in the clusters. At first the developers didn't even want to let users focus by hand - the camera decided it in software. Users didn't like that future, so a "creative mode" was added in later firmware, but the refocus in it was made very limited for exactly this reason.

#Depth map and 3D from a single camera

One of the simplest operations in plenoptics is getting a depth map. To do it, you just take two different sub-views and calculate how much the objects in them are shifted - the amount of shift tells you how far each object is from the plane of focus.
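
A toy version of that calculation, for a plain stereo-like pair of grayscale frames: brute-force block matching, where for every pixel we try a range of horizontal shifts and keep the one where a small patch matches best. For an ordinary stereo pair a bigger shift means a closer object; real dual-pixel pipelines add sub-pixel precision and a lot of cleanup on top.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def disparity_map(left, right, patch=9, max_shift=8):
    """Per-pixel horizontal shift between two views, found by block matching."""
    left = np.asarray(left, dtype=np.float32)
    right = np.asarray(right, dtype=np.float32)
    best_err = np.full(left.shape, np.inf, dtype=np.float32)
    disparity = np.zeros(left.shape, dtype=np.int32)
    for d in range(max_shift + 1):
        # Mean squared difference over a patch, for this candidate shift.
        err = uniform_filter((left - np.roll(right, d, axis=1)) ** 2, size=patch)
        better = err < best_err
        best_err[better] = err[better]
        disparity[better] = d
    return disparity
```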

Google recently bought Lytro and shut it down, but used its technology for its VR and... for the Pixel camera. Starting with the Pixel 2, the camera became "a tiny bit" plenoptic for the first time, albeit with clusters of only two pixels. This let Google skip installing a second camera, like all the other kids, and instead compute the depth map from a single photo.

The depth map is built from two frames shifted by one sub-pixel. That is enough to compute a binary depth map, separate the foreground from the background, and blur the latter into the now-fashionable bokeh. The result of this layering is then smoothed and "improved" by neural networks that are trained to improve depth maps (rather than to do the blurring, as many people think).
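
And the "blur the latter" step is the easy part - a sketch assuming we already have the image and some depth (or disparity) map from the previous step, with a simple threshold instead of the neural-network-polished mask a real Pixel uses:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fake_bokeh(img, depth, threshold, blur_sigma=6.0):
    """Binary portrait mode: blur everything the depth map calls background.
    img: (H, W, 3) float image; depth: (H, W) map where larger = closer (assumed)."""
    background = np.stack(
        [gaussian_filter(img[..., c], blur_sigma) for c in range(3)], axis=-1)
    foreground = (depth >= threshold)[..., None]
    return np.where(foreground, img, background)
```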

Another nice thing is that we got plenoptics in smartphones almost for free. We were already putting lenses over these tiny sensors to squeeze out at least some light flux. In the next Pixel, Google plans to go further and cover four photodiodes with one lens.

Source: 3dnews.ru
