Machine learning systems for image synthesis and noise reduction in night photos

Stability AI has published ready-made models for Stable Diffusion, a machine learning system capable of synthesizing and modifying images from natural-language text descriptions. The models are released under the Creative ML OpenRAIL-M license, which permits commercial use. The system was trained on a cluster of 4000 NVIDIA A100 GPUs (Ezra-1) using the LAION-5B collection of 5.85 billion images with text descriptions. The code for the neural network training and image generation tools was released earlier under the MIT license.
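
For readers who want to try the published models, below is a minimal sketch of text-to-image generation. It assumes the Hugging Face diffusers library and the CompVis/stable-diffusion-v1-4 checkpoint; the prompt and file names are arbitrary examples, not part of the announcement.

```python
# Minimal text-to-image sketch, assuming the Hugging Face diffusers library
# and access to the published CompVis/stable-diffusion-v1-4 weights.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,          # halves VRAM use on typical GPUs
)
pipe = pipe.to("cuda")

# generate a single image from a natural language description
image = pipe("a cat in a boat, photorealistic").images[0]
image.save("cat_in_boat.png")
```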

The availability of the ready-made model and its fairly modest system requirements, which make it possible to experiment on PCs with typical consumer GPUs, have led to the emergence of a number of related projects:

  • textual-inversion (code) - an add-on that allows synthesizing images with a given character, object, or style. In the original Stable Diffusion, the objects in synthesized images are random and uncontrollable; the add-on makes it possible to add your own visual objects, bind them to keywords, and use them in synthesis.

    For example, regular Stable Diffusion can be asked to generate an image of a "cat in a boat". The characteristics of the cat and the boat can be specified, but it is unpredictable exactly which cat and which boat will be synthesized. Textual-inversion allows training the system on images of your own cat or boat and then synthesizing pictures with that specific cat or boat. In the same way, it can replace image elements with particular objects, provide an example of a visual style for synthesis, and define refined concepts (for example, a notion of "doctor" can be learned from a more accurate, higher-quality selection of images in the desired style rather than from the full, uncurated variety). A sketch of how a learned embedding might be plugged into a prompt is shown after this list.

  • stable-diffusion-animation - creates animated (moving) images by interpolating between images generated in Stable Diffusion; a minimal sketch of the interpolation idea is given after this list.
  • stable_diffusion.openvino (code) - a CPU-only port of Stable Diffusion that allows experimentation on systems without powerful GPUs. It requires a processor supported by the OpenVINO library: plugins are officially provided for Intel processors with the AVX2, AVX-512, AVX512_BF16, and SSE extensions, as well as for the Raspberry Pi 4 Model B, Apple Mac mini, and NVIDIA Jetson Nano boards. Unofficially, OpenVINO can also be used on AMD Ryzen processors.
  • sdamd - a port for AMD GPUs.
  • Initial implementation of video synthesis.
  • stable-diffusion-gui, stable-diffusion-ui, Artbreeder Collage, diffuse-the-rest - graphical interfaces for generating images using Stable Diffusion.
  • beta.dreamstudio.ai, Hugging Face Spaces, hlky Stable Diffusion WebUI - web interfaces for image synthesis using Stable Diffusion.
  • Plugins for integrating Stable Diffusion with GIMP, Figma, Blender and Photoshop.
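
As a rough illustration of the textual-inversion idea mentioned above, the sketch below registers a learned embedding under a new placeholder token and uses it in a prompt. The token name <my-cat>, the file learned_embeds.bin, and its layout are assumptions made for illustration, not artifacts of the project itself.

```python
# Sketch: plug a learned textual-inversion embedding into a Stable Diffusion
# prompt. File name, token name, and file layout are assumed for illustration.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# assumed layout: the file maps one placeholder token to one embedding vector
learned = torch.load("learned_embeds.bin")     # e.g. {"<my-cat>": tensor(768)}
token, embedding = next(iter(learned.items()))

# register the placeholder token and copy its embedding into the text encoder
pipe.tokenizer.add_tokens(token)
pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
token_id = pipe.tokenizer.convert_tokens_to_ids(token)
emb_layer = pipe.text_encoder.get_input_embeddings()
emb_layer.weight.data[token_id] = embedding.to(emb_layer.weight.dtype)

pipe = pipe.to("cuda")

# the placeholder token can now be used like any other word in a prompt
image = pipe("a photo of <my-cat> sitting in a boat").images[0]
image.save("my_cat_in_boat.png")
```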
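
The animation projects in the list rely on interpolating between generation results. The sketch below shows the underlying idea of spherical interpolation (slerp) between two Gaussian latent vectors, each of which would then be denoised and decoded into one frame; it is a generic illustration, not code from stable-diffusion-animation.

```python
# Generic sketch of latent interpolation for frame-by-frame animation.
import numpy as np

def slerp(t, v0, v1):
    """Spherical interpolation between two flattened latent vectors."""
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1.0 - t) * v0 + t * v1                 # nearly parallel vectors
    return (np.sin((1.0 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

rng = np.random.default_rng(0)
start = rng.standard_normal(4 * 64 * 64)   # initial latent noise for frame A
end = rng.standard_normal(4 * 64 * 64)     # initial latent noise for frame B
frames = [slerp(t, start, end) for t in np.linspace(0.0, 1.0, 30)]
# each interpolated latent would be denoised by the diffusion model and decoded
# into an image, yielding a smooth 30-frame transition between the two results
```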

Additionally, Google has published the code for RawNeRF (RAW Neural Radiance Fields), a machine learning system that uses data from several RAW images to improve the quality of heavily noisy pictures taken in the dark or in poor lighting. Besides removing noise, the project's tools can increase detail, eliminate glare, synthesize HDR, and change the overall lighting in photographs, as well as reconstruct the three-dimensional position of objects from several shots taken from different angles, shift the viewpoint, manipulate focus, and generate moving pictures.
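
A key part of training on dark, noisy RAW frames is fitting the model in linear HDR color space with a relative loss that keeps bright pixels from dominating. The sketch below is a loose NumPy illustration of such a weighted loss, based on the general description in the RawNeRF paper; it is not the project's actual training code, and the epsilon value is an assumption.

```python
# Loose illustration of a relative L2 loss on linear RAW data (not RawNeRF code).
import numpy as np

def rawnerf_style_loss(rendered, raw_target, eps=1e-3):
    """Relative L2 loss in linear color space.

    Bright pixels are down-weighted so dark, noisy regions still drive the fit;
    in a real framework the weights would be wrapped in a stop-gradient so they
    act as constants during optimization."""
    weights = 1.0 / (rendered + eps)
    return np.mean((weights * (rendered - raw_target)) ** 2)

rendered = np.array([0.002, 0.5, 0.9])       # linear radiance predicted by the NeRF
raw_target = np.array([0.004, 0.48, 0.91])   # noisy linear RAW measurements
print(rawnerf_style_loss(rendered, raw_target))
```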

Source: opennet.ru
