The code and research behind Stable Diffusion, a machine learning system that synthesizes images from natural-language text descriptions, have been published. The project is being developed jointly by researchers from Stability AI and Runway, the EleutherAI and LAION communities, and the CompVis lab (the machine vision and learning research laboratory at the University of Munich). In capabilities and output quality, Stable Diffusion resembles DALL-E 2, but it is being developed as an open, public project. The implementation of Stable Diffusion is written in Python and distributed under the MIT license.
Pretrained models are currently provided on individual request to educational institutions and independent researchers, but the developers promise to open them to everyone once testing is complete and the first release is ready. The system was trained on the Ezra-1 cluster of 4000 NVIDIA A100 GPUs using LAION-5B, a collection of 5.85 billion images with text descriptions. The image generation components are described as lightweight enough to run on users' own systems: for example, a GPU with 10 GB of video memory is enough to synthesize images at a resolution of 512×512.



Besides synthesizing images from a text description, the system offers an image modification mode that can generate pictures from rough sketches guided by clarifying text prompts, edit and alter existing images, or restore lost detail when upscaling. A variant of Stable Diffusion for editing video with natural-language text commands is also in development.


Source: opennet.ru
