DeepFloyd, a research group backed by Stability AI, recently introduced DeepFloyd IF, a text-to-image model capable of intelligently integrating words into images. Trained on a dataset of more than one billion images and text, the model requires a GPU with at least 16GB of RAM to run. Given a text prompt, DeepFloyd IF can produce an image in a variety of styles. The model is currently available as open source. However, its license does not allow commercial use, owing to the uncertain legal status of generative AI art models.
NightCafe, a generative art site, received early access to DeepFloyd IF. In a TechCrunch interview, NightCafe CEO Angus Russell explained what sets it apart from other text-to-image models: it uses multiple processes arranged in a modular architecture to generate images. Google’s Imagen model, which was never released publicly, greatly influenced the design of DeepFloyd IF.
DeepFloyd IF Multiple Diffusion Steps
Unlike typical diffusion models, DeepFloyd IF performs diffusion multiple times to generate an image. It first creates a 64x64px picture, then upscales it to 256x256px, and finally to 1024x1024px.
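The arithmetic behind these stages can be sketched in a few lines of Python. This is only an illustration of the resolutions quoted above, not DeepFloyd IF's actual code; the function name `describe_cascade` is hypothetical.

```python
# Side lengths (in pixels) of the three stages, as quoted in the article.
STAGE_RESOLUTIONS = [64, 256, 1024]


def describe_cascade(resolutions):
    """Return (side, pixel_count, upscale_factor) for each stage.

    The upscale factor is relative to the previous stage and is None
    for the first stage, which has nothing to upscale from.
    """
    stages = []
    previous = None
    for side in resolutions:
        factor = None if previous is None else side // previous
        stages.append((side, side * side, factor))
        previous = side
    return stages


for side, pixels, factor in describe_cascade(STAGE_RESOLUTIONS):
    step = "base image" if factor is None else f"{factor}x upscale per side"
    print(f"{side}x{side}px: {pixels} pixels ({step})")
```

Each upscaling step multiplies the side length by four, so the final image holds 256 times as many pixels as the first one.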
According to Russell, DeepFloyd IF operates directly on pixels. Most diffusion models, by contrast, are latent diffusion models, meaning they work in a lower-dimensional space. Each value in that space stands in for more pixels, but represents them less accurately.
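To give a rough sense of what working in a lower-dimensional representation means, the sketch below compares how many numbers describe an image in pixel space with how many describe it in a compressed representation. The shapes are assumptions chosen for illustration (a 512x512 RGB image and a 64x64 grid with 4 channels, figures commonly associated with latent diffusion models); they do not come from the article and do not describe DeepFloyd IF.

```python
def value_count(height, width, channels):
    """Number of values needed to store a height x width x channels grid."""
    return height * width * channels


# Assumed shapes, for illustration only.
PIXEL_SHAPE = (512, 512, 3)   # RGB image in pixel space
LATENT_SHAPE = (64, 64, 4)    # compressed lower-dimensional representation

pixel_values = value_count(*PIXEL_SHAPE)
latent_values = value_count(*LATENT_SHAPE)

# How many pixel-space values each compressed value stands in for.
compression = pixel_values // latent_values

print(f"pixel space: {pixel_values} values")
print(f"compressed:  {latent_values} values")
print(f"ratio:       {compression}x")
```

Under these assumptions, each compressed value has to account for dozens of pixel values, which is the accuracy trade-off Russell describes.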
DeepFloyd IF’s design and the size of its language model help it excel at interpreting complex prompts and the spatial relationships they describe.
DeepFloyd IF Generative Art Possibilities
Russell anticipates that DeepFloyd IF’s ability to render words in images competently will unlock a new wave of generative art uses, including billboards, web design, logo design, posters, and memes. He also believes the model will be better at generating difficult subjects such as hands. And because it can comprehend prompts in languages other than English, it can generate words in those languages as well.
The featured image is from stability.ai