Stable Diffusion path
presented by
Vitaly Bondar
Generative DL enthusiast
johngull @ gmail
Autoencoder
Illustration: Lilian Weng
LeCun, Y. Modèles connexionnistes de l'apprentissage. Ph.D. thesis, 1987
Variational autoencoder (VAE)
Illustrations: Lilian Weng, D. Kingma
Kingma, Welling. Auto-Encoding Variational Bayes. 2013
Variational autoencoder (VAE)
Kingma, Welling. Auto-Encoding Variational Bayes. 2013
Vector Quantized Variational Autoencoder (VQ-VAE)
Oord et al. Neural Discrete Representation Learning, 2017
Vector Quantized Variational Autoencoder (VQ-VAE)
Oord et al. Neural Discrete Representation Learning, 2017
Vector Quantized Variational Autoencoder (VQ-VAE)
Oord et al. Neural Discrete Representation Learning, 2017
VQ-GAN
Esser, Rombach et al. Taming Transformers for High-Resolution Image Synthesis, 2020
VQ-GAN
Esser, Rombach et al. Taming Transformers for High-Resolution Image Synthesis, 2020
VQ-GAN
Esser, Rombach et al. Taming Transformers for High-Resolution Image Synthesis, 2020
Recap: diffusion models
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics, 2015;
Song & Ermon, 2019; Ho et al. (DDPM), 2020; … ?
Latent diffusion
Rombach, Blattmann et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021
Latent diffusion
Training phases:
● Autoencoder
○ Loss: patch-based GAN loss + perceptual loss
○ Regularization: KL penalty (as in a VAE) OR a quantization layer in the decoder (as in VQ-GAN)
● Diffusion model for various generative tasks
○ Loss: classical diffusion L2 restoration loss
○ All training runs were done on a single A100 GPU
Rombach, Blattmann et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021
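The second training phase can be illustrated with a small sketch. It assumes the noise-prediction form of the L2 loss popularized by DDPM and a linear beta schedule; the schedule values, the 16-element latent, and the zero-predicting "denoiser" are placeholders for illustration, not settings taken from the slides or the paper.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_latent(z0, eps, alpha_bar):
    """Forward process: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, eps)]


def l2_loss(eps, eps_pred):
    """Mean squared error between the true noise and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)


rng = random.Random(0)
alpha_bars = linear_alpha_bars()

# Stand-in for a flattened latent z0 = Encoder(x); a real latent is a 3-D tensor.
z0 = [rng.gauss(0.0, 1.0) for _ in range(16)]
eps = [rng.gauss(0.0, 1.0) for _ in range(16)]

t = 500
z_t = noise_latent(z0, eps, alpha_bars[t])

# Hypothetical denoiser that always predicts zero noise; a real model is a U-Net
# that receives z_t, the timestep t, and optional conditioning.
eps_pred = [0.0] * len(z_t)
loss = l2_loss(eps, eps_pred)
print(f"t={t}, alpha_bar={alpha_bars[t]:.4f}, loss={loss:.4f}")
```

The only difference from pixel-space diffusion in this sketch is that the noised quantity is the autoencoder latent rather than the image itself.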
Latent diffusion
Downsampling by 4-16x: speeds up generative training without loss of sample quality
Rombach, Blattmann et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021
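A rough sketch of what the downsampling factor means for the size of the tensor the diffusion model works on. The 512x512 RGB input and the 4 latent channels are assumptions chosen for illustration (the number of latent channels varies between configurations); `latent_shape` and `compression_ratio` are hypothetical helper names.

```python
def latent_shape(height, width, factor, latent_channels=4):
    """Shape (C, H, W) of the latent for a given spatial downsampling factor."""
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // factor, width // factor)


def compression_ratio(height, width, factor, image_channels=3, latent_channels=4):
    """How many times fewer values the latent holds than the input image."""
    c, h, w = latent_shape(height, width, factor, latent_channels)
    return (image_channels * height * width) / (c * h * w)


for f in (4, 8, 16):
    print(f, latent_shape(512, 512, f), compression_ratio(512, 512, f))
```

The spatial area shrinks by the square of the factor, which is where most of the training and sampling speedup comes from.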
Latent diffusion
Rombach, Blattmann et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021
Latent diffusion
Imagen (an important influence)
Saharia, Chan et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 2022
Imagen (an important influence)
Saharia, Chan et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 2022
Imagen (an important influence)
Stable diffusion
● No paper (the core team comes from the Latent Diffusion authors)
● Open source: https://github.com/CompVis/stable-diffusion
● An improved, well-trained latent diffusion model
Stable diffusion
Key components:
● High-quality decoder from VQ-GAN
● Diffusion in latent space
● Frozen language model (CLIP ViT-L/14 embeddings)
● Classifier-free guidance
● A lot of data (LAION-5B and its subsets)
● A lot of compute power
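Of the components listed above, classifier-free guidance is the easiest to show in a few lines. The sketch below assumes the usual formulation, in which the denoiser is run once with the text condition and once without, and the two noise predictions are extrapolated by a guidance scale. The function name and the toy numbers are illustrative only.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine noise predictions: eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions standing in for two U-Net forward passes on the same latent:
# one with an empty prompt (unconditional) and one with the text embedding.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

guided = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=2.0)
print(guided)
```

A scale of 1 reproduces the conditional prediction, a scale of 0 ignores the prompt, and larger scales push samples closer to the prompt at the cost of diversity.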
Stable diffusion
