Introduction to Stable Diffusion (Overview)

Overview
• Description: A latent diffusion model that can run on most consumer hardware equipped with a modest GPU with at least 8 GB of VRAM
• Developers: CompVis Group (Ludwig Maximilian University of Munich) and Runway
• Sponsor/Donor: Stability AI, EleutherAI, and LAION
• Series A funding: Lightspeed Venture Partners and Coatue Management
• Dataset contributors: Various NPOs
• Training cost: $660,000
• Sources: Hugging Face Spaces (Stability AI), and DreamStudio
• Primary use
  • Generating detailed images conditioned on text descriptions (prompts)
• Additional uses
  • Inpainting
  • Outpainting
  • Image-to-image translation guided by a text prompt
https://techpp.com/2022/10/10/how-to-train-stable-diffusion-ai-dreambooth/

LDM Architecture
[Figure: LDM architecture diagram; recoverable labels: variational autoencoder (VAE), VAE decoder]
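
The role of the VAE in a latent diffusion model can be illustrated with simple shape arithmetic: the diffusion process runs on a compressed latent tensor rather than on pixels. The sketch below assumes the commonly cited Stable Diffusion v1 settings (a spatial downsampling factor of 8 and 4 latent channels); these values are not stated on the slide, and the function name `latent_shape` is hypothetical.

```python
# Minimal sketch: size of the latent tensor the diffusion model works on.
# ASSUMPTION: downsample_factor=8 and latent_channels=4 (commonly cited
# SD v1 values, not taken from the slide).

def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Return (channels, height, width) of the VAE latent for an RGB image."""
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image sides must be multiples of the downsample factor")
    return (latent_channels,
            height // downsample_factor,
            width // downsample_factor)


def compression_ratio(height, width, **kwargs):
    """Ratio of pixel-space values (3 channels) to latent-space values."""
    c, h, w = latent_shape(height, width, **kwargs)
    return (3 * height * width) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(512, 512))
    print(compression_ratio(512, 512))
```

Under these assumptions a 512 × 512 image maps to a much smaller latent, which is one reason the model fits on consumer GPUs.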

SD Architecture
• Uses the latent diffusion model from CompVis (LMU Munich, Germany)
• 860 million parameters in the U-Net (with ResNet backbone)
• 123 million parameters in the text encoder
• Considered relatively lightweight by 2022 standards
• SD v2 uses xFormers, and SDXL uses additional reinforcement/knowledge
• Model fine-tuning methods:
  • Dreambooth
  • Textual Inversion
  • LoRA
  • Hypernetworks
  • Aesthetic Gradients
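
To put the parameter counts above in perspective, the following back-of-the-envelope sketch estimates the memory needed just to hold the U-Net and text encoder weights. It is an illustration only: it ignores the VAE, activations, and framework overhead, and the byte sizes per parameter (4 for fp32, 2 for fp16) are standard values rather than figures from the slide.

```python
# Rough weight-memory estimate from the parameter counts quoted on the slide.
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000


def weight_memory_gib(num_params, bytes_per_param):
    """Memory for the weights alone, in GiB (excludes activations/overhead)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    total = UNET_PARAMS + TEXT_ENCODER_PARAMS
    for label, nbytes in (("fp32", 4), ("fp16", 2)):
        print(f"{label}: {weight_memory_gib(total, nbytes):.2f} GiB")
```

Even at full precision the weights alone stay in the low single-digit GiB range, which is consistent with the "relatively lightweight" description.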

Fine-tuning method: Dreambooth
https://youtu.be/dVjMiJsuR5o
Well-Researched Comparison of Training Techniques (Lora, Inversion, Dreambooth, Hypernetworks) : r/StableDiffusion (reddit.com)

Fine-tuning method: Textual Inversion

Fine-tuning method: LoRA
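
As a rough illustration of the idea behind LoRA, the sketch below shows the low-rank update in plain Python: the pre-trained weight matrix W stays frozen and only two small matrices B and A are trained, giving an effective weight of W + (alpha / r) · B·A. The helper names, the scaling convention, and the 768 × 768 layer size are assumptions chosen for illustration; this is not the deck's training code or any library's API.

```python
# Minimal LoRA sketch in pure Python (no dependencies).
# ASSUMPTION: effective weight = W + (alpha / r) * B @ A, with W frozen.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """w: d_out x d_in (frozen), b: d_out x r, a: r x d_in (trainable)."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update, same shape as w
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_trainable_params(d_out, d_in, rank):
    """Number of trainable values in B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical 768 x 768 projection with rank 4.
    print("full fine-tune:", 768 * 768)
    print("LoRA rank 4:   ", lora_trainable_params(768, 768, 4))
```

The parameter comparison shows why LoRA files are small: only the low-rank factors need to be stored and shared.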

Fine-tuning method: Hypernetworks

Training
• Create a dataset
• Fine-tuning options
  • Dreambooth notebook
    • Image dataset: 512 × 512 px
    • prior_preservation_class_prompt auto-completed using the CLIP tokenizer
  • Web UI for SD v1, SD v2, and hypernetworks
    • Image dataset: 512 × 512 px
    • Unique dataset folder name, with images named as a unique series
    • Starts from a pre-trained checkpoint file (.ckpt)
    • Auto-annotation (class prompt) text file generated using BLIP with textual inversion
    • UI supports
      • Gradio interface (conventional)
      • Streamlit interface (new and in progress)
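
The dataset requirements above (square 512 × 512 images, named as a unique series) can be sketched with two small helpers. This is a planning sketch only: it computes the center-crop box and the file names but leaves the actual image resizing to whatever imaging tool is used. The function names and the `mydog` prefix are hypothetical.

```python
# Sketch of dataset preparation arithmetic for 512 x 512 training images.
TARGET_SIDE = 512


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def scale_to_target(width, height, target=TARGET_SIDE):
    """Scale factor that brings the cropped square to target x target."""
    return target / min(width, height)


def series_names(prefix, count, ext=".png", start=1):
    """Generate unique, zero-padded names such as prefix_0001.png."""
    digits = max(4, len(str(start + count - 1)))
    return [f"{prefix}_{i:0{digits}d}{ext}"
            for i in range(start, start + count)]


if __name__ == "__main__":
    print(center_crop_box(1024, 768))
    print(scale_to_target(1024, 768))
    print(series_names("mydog", 3))
```

Cropping before scaling keeps the aspect ratio intact, so subjects are not stretched when the images are reduced to the training resolution.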
