Stable Diffusion
Intro
Overview
• Description: A latent diffusion model that can run on most consumer hardware equipped with a modest GPU with at least 8 GB of VRAM
• Developers
  • CompVis Group, Ludwig Maximilian University of Munich
  • Runway
• Sponsors/donors: Stability AI, EleutherAI, and LAION
• Series A funding: Lightspeed Venture Partners and Coatue Management
• Dataset contributors: Various NPOs
• Training cost: about $600,000
• Sources: Hugging Face Spaces (Stability AI) and DreamStudio
• Primary use
  • Generating detailed images conditioned on text descriptions (prompts)
• Additional uses
  • Inpainting
  • Outpainting
  • Generating image-to-image translations guided by a text prompt
https://techpp.com/2022/10/10/how-to-train-stable-diffusion-ai-dreambooth/
LDM Architecture
[Figure: latent diffusion model (LDM) architecture, with the variational autoencoder (VAE) and the VAE decoder labeled]
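To give a rough sense of why a latent diffusion model is cheaper to run than one working on pixels, the sketch below compares tensor sizes before and after the VAE encoder. It assumes the commonly cited SD v1 configuration (spatial downsampling by a factor of 8, 4 latent channels); those numbers and the helper names `latent_shape` and `num_elements` are illustrative and do not come from the slide.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the VAE latent for an image.

    downsample=8 and latent_channels=4 are assumptions matching the
    commonly cited SD v1 autoencoder, not values taken from the slide.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def num_elements(shape):
    """Multiply out a shape tuple to get the number of values it holds."""
    total = 1
    for dim in shape:
        total *= dim
    return total


pixel_shape = (3, 512, 512)       # RGB image at the usual SD v1 resolution
z_shape = latent_shape(512, 512)  # latent the U-Net actually denoises
ratio = num_elements(pixel_shape) / num_elements(z_shape)
print(f"latent shape: {z_shape}, values reduced by a factor of {ratio:.0f}")
```

Under these assumptions the U-Net denoises a tensor far smaller than the image, and the VAE decoder maps the result back to pixels at the end.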
SD Architecture
• Uses the diffusion model from CompVis, LMU Munich, Germany
• 860 million parameters in the U-Net (with a ResNet backbone)
• 123 million parameters in the text encoder
• Considered relatively lightweight by 2022 standards
• SD v2 uses xFormers, and SDXL uses additional reinforcement/knowledge
• Model fine-tuning methods…
  • Dreambooth
  • Textual Inversion
  • LoRA
  • Hypernetworks
  • Aesthetic Gradients
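As a back-of-the-envelope check on the "lightweight" claim, the sketch below turns the two parameter counts above into weight-storage estimates. It assumes 4 bytes per parameter for fp32 and 2 for fp16, and it ignores the VAE, activations, and any framework overhead, so it is a lower bound on weight memory and not a VRAM requirement. The function name `weight_memory_gib` is made up for this example.

```python
UNET_PARAMS = 860_000_000          # from the slide
TEXT_ENCODER_PARAMS = 123_000_000  # from the slide


def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed just to store the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


total = UNET_PARAMS + TEXT_ENCODER_PARAMS

# Assumed storage sizes: 4 bytes for fp32, 2 bytes for fp16.
for name, nbytes in (("fp32", 4), ("fp16", 2)):
    print(f"{name}: {weight_memory_gib(total, nbytes):.2f} GiB for {total:,} parameters")
```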
Fine-tuning method: Dreambooth
https://youtu.be/dVjMiJsuR5o
Well-Researched Comparison of Training Techniques (Lora, Inversion, Dreambooth, Hypernetworks) : r/StableDiffusion (reddit.com)
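Dreambooth fine-tunes the model on a handful of subject images and, in its prior-preservation variant, adds a second loss term computed on generic images of the same class so the model does not forget what the class looks like. The toy sketch below shows only how the two terms combine. It uses plain Python lists in place of the U-Net's noise predictions, and the names `dreambooth_loss` and `prior_weight` are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_noise,
                    class_pred, class_noise, prior_weight=1.0):
    """Subject (instance) loss plus a weighted prior-preservation loss.

    In real training the predictions come from the U-Net; here they are
    stand-in lists so the arithmetic is easy to follow.
    """
    instance_loss = mse(instance_pred, instance_noise)
    prior_loss = mse(class_pred, class_noise)
    return instance_loss + prior_weight * prior_loss


# Perfect prediction on the subject images, imperfect on the class images.
loss = dreambooth_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
print(loss)
```

Setting `prior_weight` to 0 reduces this to plain fine-tuning on the subject images alone.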
Fine-tuning method: Textual Inversion
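Textual inversion leaves the model weights untouched and learns only a new embedding vector for a placeholder token. The toy sketch below illustrates that idea on a two-dimensional "vocabulary". The squared-distance objective is a stand-in for the real diffusion loss, and the token `<my-subject>`, the helper names, and the target vector are all invented for the example.

```python
def add_placeholder_token(embeddings, new_token, init_token):
    """Add a new token, initialised as a copy of an existing token's vector."""
    if new_token in embeddings:
        raise ValueError(f"{new_token!r} already exists")
    embeddings[new_token] = list(embeddings[init_token])


def train_step(embeddings, token, target, lr=0.1):
    """One gradient-descent step on 0.5 * ||e - target||^2 for `token` only.

    Every other embedding is left untouched, mirroring the frozen
    vocabulary in textual inversion.
    """
    vec = embeddings[token]
    grad = [v - t for v, t in zip(vec, target)]
    embeddings[token] = [v - lr * g for v, g in zip(vec, grad)]


vocab = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}  # frozen embeddings
add_placeholder_token(vocab, "<my-subject>", init_token="dog")
for _ in range(50):
    train_step(vocab, "<my-subject>", target=[0.5, 0.5])

print(vocab["dog"], vocab["<my-subject>"])
```

Only the placeholder's vector moves; the existing entries stay exactly as they were, which is why the resulting artifact is so small.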
Fine-tuning method: LoRA
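LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so the effective weight is W + scale * (B @ A), with B of shape d x r and A of shape r x k. The sketch below shows that update and the resulting saving in trainable parameters using plain nested lists. The layer size 768 x 768 and rank 4 are illustrative choices, not values from the slides, and the helper names are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(w, lora_b, lora_a, scale=1.0):
    """Return W + scale * (B @ A); W itself is never modified."""
    delta = matmul(lora_b, lora_a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def trainable_params(d, k, r):
    """Parameters in B (d x r) plus A (r x k)."""
    return r * (d + k)


w = [[1, 0], [0, 1]]   # frozen 2 x 2 weight
b = [[1], [2]]         # d x r with r = 1
a = [[3, 4]]           # r x k
print(lora_effective_weight(w, b, a))

# Illustrative layer size and rank (assumed, not from the slides).
full = 768 * 768
lora = trainable_params(768, 768, 4)
print(f"full: {full:,} params, LoRA rank 4: {lora:,} params")
```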
Fine-tuning method: Hypernetworks
Training
• Create a dataset
• Fine-tuning options
  • Dreambooth notebook
    • Image dataset: 512 × 512 px
    • prior_preservation_class_prompt autocompleted using the CLIP tokenizer
  • Web UI for SD v1, SD v2, and hypernetworks
    • Image dataset: 512 × 512 px
    • Unique dataset folder name, with images named as a unique series
    • Starts from a pre-trained checkpoint file (.ckpt)
    • Auto-annotation (class prompt) text file generated using BLIP, with textual inversion
    • UI supports
      • Gradio interface (conventional)
      • Streamlit interface (new and in progress)
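The dataset requirements above (square 512 × 512 images, named as a unique series) can be sketched with two small helpers. This is a minimal illustration that only computes the crop box and the file names; it does not read or write images. The box uses the (left, top, right, bottom) order accepted by imaging libraries such as Pillow's `Image.crop`, after which the square would be resized to 512 × 512. The function names, the `mydog` prefix, and the zero-padded naming scheme are assumptions.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def series_names(prefix, count, ext=".png"):
    """Generate unique, zero-padded names such as prefix_001.png."""
    digits = max(3, len(str(count)))
    return [f"{prefix}_{i:0{digits}d}{ext}" for i in range(1, count + 1)]


TARGET_SIZE = (512, 512)  # resolution named on the slide

box = center_crop_box(1024, 768)   # landscape photo
names = series_names("mydog", 3)   # hypothetical dataset prefix
print(box, names, TARGET_SIZE)
```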

