Mastering Image Generation
with Stable Diffusion
Raphaël Semeteys
bbTeX
23/01/2025
Use Case
Locally generate accurate images of Yoga Poses
• Images of yoga poses must be precise
• Photography is not always the best option
• Images or photos from the internet cannot be reused
How could Generative AI help?
Stable Diffusion
From German Labs to London-based Startup
• Collaboration of several companies and German Labs
• Latent Diffusion Model (diffusion in a compressed latent space), 2021
• Text conditioning via a CLIP text encoder
• LAION dataset
• Runway and EleutherAI participation
• Stability AI
• Compute donation to the project
• Hired most of the initial researchers
• Now official maintainer of Stable Diffusion models
“Open” Licenses
• Responsible AI: OpenRAIL
• Version 3.5: enterprises with
$1M+ annual revenue need a paid license
Stable Diffusion
Very active contributing communities
Models
• Fine-tuning: Custom Models for
specific styles or themes
• Refiners, Upscalers, ControlNets
• Model extensions (LoRA)
Tools
• User-Friendly Interfaces:
Automatic1111 Web UI, ComfyUI
• Fine-tuning tools: Dreambooth,
Kohya SS
Sharing Communities
• Portals to share models, prompts, images and tutorials: Hugging Face, Civitai…
• Stable Horde: crowdsourced distributed cluster of generation workers
ComfyUI: GUI for local Stable Diffusion workflows
• Intuitive, modular and customizable
• Flexible node-based workflows
• Text-to-Image Generation
• Image-to-Image Processing
• Custom Node Management
• Community-Driven
• GPL-3.0 license
• Contributions, plugins, documentation
Let’s start with a simple demo
Generate an image of a girl
doing a yoga pose
Demo – Text-to-Image (T2I)
What does Stable Diffusion do?
• Starts with Random Noise: Begins with a noisy,
unrecognizable image
• Refines Step-by-Step: Gradually removes noise, adding
details
• Learns from Real Images: Uses patterns learned from its
training images
• Text-Guided Creation: Follows prompts like “girl doing yoga
in a park”
• Denoising Process: Clarifies the image a little more at each step
• Final Image Output: Produces a clear, detailed image
matching the prompt
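To make the "start from noise, refine step by step" idea concrete, here is a toy pure-Python sketch of a deterministic DDIM-style sampling loop on a 3-value "image". The linear beta schedule, the 20 steps and especially `oracle_eps` are assumptions for illustration: `oracle_eps` is a hypothetical stand-in for the trained U-Net, which in Stable Diffusion only *estimates* the noise, conditioned on the prompt, and works in latent space rather than on pixels.

```python
import math
import random

T = 1000  # number of training timesteps (assumed, DDPM-style)

# Linear beta schedule (assumed); alpha_bar[t] = fraction of the clean signal left at step t
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bar = []
_running = 1.0
for beta in betas:
    _running *= 1.0 - beta
    alpha_bar.append(_running)


def oracle_eps(x, t, target):
    """Hypothetical stand-in for the trained U-Net: returns the exact noise separating
    x from the clean target at step t (a real model can only estimate it)."""
    ab = alpha_bar[t]
    return [(xi - math.sqrt(ab) * yi) / math.sqrt(1.0 - ab) for xi, yi in zip(x, target)]


def ddim_sample(target, num_steps=20, seed=42):
    """Deterministic DDIM-style loop: start from pure noise, remove noise step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # fixed seed -> same starting noise
    # Evenly spaced timesteps, from the noisiest (T-1) down to 0
    timesteps = [round((T - 1) * (1 - i / (num_steps - 1))) for i in range(num_steps)]
    for i, t in enumerate(timesteps):
        eps = oracle_eps(x, t, target)
        ab = alpha_bar[t]
        ab_next = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        # 1) estimate the clean sample, 2) re-noise it to the next, lower noise level
        x0_pred = [(xi - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for xi, e in zip(x, eps)]
        x = [math.sqrt(ab_next) * p + math.sqrt(1.0 - ab_next) * e for p, e in zip(x0_pred, eps)]
    return x


result = ddim_sample([0.5, -0.25, 1.0])
```

Because the oracle is perfect, the loop lands exactly on the target. With a real model each noise estimate is imperfect, which is why the number of steps and the choice of sampler change the final image.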
Models
• Most-used Stability AI models
• SD 1.5
• SDXL
• Fine-tuned Models
• Specialized: style, subject
• Shared by communities (like civitai.com)
Prompts
• CLIP Model (Contrastive Language–Image Pretraining)
• Connects descriptive text and images
• Helps generate images matching specific prompts
• Can handle a wide range of prompts
• Developed by OpenAI in 2021
• Usable under the MIT license
• Trained on 400M image-text pairs from the Internet
• Positive & negative prompts
• Natural-language or short keyword syntax
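How CLIP "connects text and images" can be sketched as contrastive scoring: text and images are embedded in the same space, and a softmax over their scaled cosine similarities says which image best matches the text. The 3-D vectors below are made up for illustration (real CLIP ViT-L/14 embeddings have 768 dimensions), and `logit_scale=100.0` is an assumed value for the temperature that CLIP learns during training.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def clip_match_probs(text_emb, image_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities: how well each image matches the text."""
    logits = [logit_scale * cosine(text_emb, img) for img in image_embs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy embeddings
text = [1.0, 0.0, 0.0]           # e.g. "girl doing yoga in a park"
images = [[0.9, 0.1, 0.0],       # an image that matches the prompt
          [0.0, 1.0, 0.0],
          [0.1, 0.0, 1.0]]
probs = clip_match_probs(text, images)
```

In Stable Diffusion only the text side is used at generation time: the text encoder's output conditions the denoising network, steering it toward images that CLIP would score highly for the prompt.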
Embeddings (Textual Inversions)
• Vector representations of text
• “Instructions” for image generation
• Style, theme, texture, pose, character features, etc.
• Small files containing additional concepts
• Injected into prompts
• Community provides many presets
• Must match the Stable Diffusion version they were trained for
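A textual inversion leaves the model weights untouched. It only adds a new pseudo-token whose learned vector lives in the text encoder's embedding table, which is also why it must match the model version: the vector has to have the encoder's width (768 for SD 1.x's CLIP ViT-L/14, 1024 for SD 2.x's OpenCLIP ViT-H/14). The `ToyTextEncoder` class below is hypothetical and splits on whitespace, whereas a real tokenizer uses BPE sub-words.

```python
# Text-encoder widths: SD 1.x uses OpenAI CLIP ViT-L/14, SD 2.x uses OpenCLIP ViT-H/14
EMBED_DIM = {"sd1.x": 768, "sd2.x": 1024}


class ToyTextEncoder:
    """Hypothetical token-embedding table standing in for the CLIP tokenizer + embedding layer."""

    def __init__(self, family, vocab):
        self.dim = EMBED_DIM[family]
        for token, vec in vocab.items():
            if len(vec) != self.dim:
                raise ValueError(f"token {token!r} has dim {len(vec)}, expected {self.dim}")
        self.table = {token: list(vec) for token, vec in vocab.items()}

    def add_textual_inversion(self, token, vector):
        """Register a new pseudo-token; the rest of the model is left untouched."""
        if len(vector) != self.dim:
            raise ValueError(
                f"embedding has dim {len(vector)} but this model expects {self.dim}: "
                "it was probably trained for another Stable Diffusion version"
            )
        self.table[token] = list(vector)

    def embed(self, prompt):
        """Look up each whitespace-separated token (raises KeyError for unknown tokens)."""
        return [self.table[token] for token in prompt.split()]


enc = ToyTextEncoder("sd1.x", {"girl": [0.0] * 768, "yoga": [1.0] * 768})
enc.add_textual_inversion("<ghibli>", [0.5] * 768)  # hypothetical style embedding
vectors = enc.embed("girl yoga <ghibli>")
```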
Demo – T2I + Embedding (Textual Inversion)
Embeddings (Textual Inversions)
Figure: the same prompt with No Embedding and with the Ghibli, Fantasy, Comic, 3D Render, Analog Film, Cinematic, Cyberpunk, Digital Art and Vector Art embeddings
Latent Space
• Latent Space
• Abstract, compressed representation of the image
• Encodes features such as shapes, colors,
textures and general structure
• Manipulation of embedding vectors
• Iterative refinement
• Random noise is introduced into the latent space
• At each step the model adjusts the features to match
the prompt
• VAE (Variational Autoencoder)
• Encoder: image pixels → latent space
• Decoder: latent space → image pixels
Denoising Process
• Seed
• Random seed used to create initial noise
• Fixing it isolates the impact of the other parameters
• Samplers
• Algorithms guiding the iterative image generation
• Differ in Speed and Quality
• Schedulers
• Control how noise is removed at each step
• Also impact speed and quality; Karras is a good balance
• Other Parameters
• Number of steps, CFG scale (adherence to the prompt), denoising strength (%)
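A sketch of three of these knobs in plain Python, under stated assumptions. `karras_sigmas` is the noise-level spacing from Karras et al. (2022) that the "karras" scheduler refers to; the default `sigma_min`/`sigma_max` are approximately the values for SD 1.x models and are assumptions here. `cfg_combine` is classifier-free guidance; in tools such as ComfyUI the negative prompt takes the place of the unconditional branch. `initial_noise` shows why a fixed seed makes runs comparable. Note that real tools draw their noise with PyTorch's RNG, so these Python seeds will not reproduce their images.

```python
import random


def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """n decreasing noise levels spaced as in Karras et al. (2022), plus a final 0.0."""
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    sigmas = [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]
    return sigmas + [0.0]  # sigma = 0 means a fully denoised image


def cfg_combine(eps_uncond, eps_cond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional (or negative-prompt)
    prediction toward the prompt-conditioned one; a higher scale follows the prompt harder."""
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def initial_noise(size, seed):
    """Same seed, same starting noise: lets you change one parameter at a time."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


schedule = karras_sigmas(20)  # 20 steps -> 21 noise levels, ending at 0.0
```

Karras spacing spends more of the steps at low noise levels, where fine details are formed. A very high CFG scale extrapolates far beyond the conditioned prediction, which is one plausible reason for the quality loss observed at CFG 20.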
I can tweak generation
but I don’t control the pose…
Figure: attempts at Camel pose, Tree pose, Lotus pose and Shoulder Stand pose
Text-to-Image generation is not enough!
Let’s move on to Image-to-Image
Generate an image of a girl
doing a yoga pose based on
an existing image
Demo – Image-to-Image (I2I)
Image-to-Image Generation
Figure: CFG 20 vs. CFG 8
• Input Image
• Replace the Empty Latent Image with a real one
• Requires a VAE Encode node (using the model's VAE)
• Tune the denoising strength (%)
• Prompt has less impact
• Increasing CFG only reduces quality
Figure: denoise 0.55 vs. denoise 0.70
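What "% denoising" does in image-to-image, as a minimal sketch: the encoded input image is noised up to an intermediate level and only the tail of the noise schedule is run, so a low value stays close to the input while 1.0 is almost text-to-image. The function name and the "clean + sigma * noise" convention (k-diffusion style) are assumptions. The exact step accounting differs between tools.

```python
import random


def img2img_start(latent, sigmas, denoise, seed=0):
    """Return the partially noised starting latent and the part of the schedule left to run.

    latent:  VAE-encoded input image (flattened to a list here)
    sigmas:  decreasing noise levels ending in 0.0 (e.g. a Karras schedule)
    denoise: 1.0 = start from the noisiest level, small values = stay close to the input
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    n_steps = len(sigmas) - 1
    start = n_steps - round(n_steps * denoise)  # skip the noisiest steps
    remaining = sigmas[start:]
    rng = random.Random(seed)
    # Assumed k-diffusion-style convention: noisy = clean + sigma * noise
    noisy = [x + remaining[0] * rng.gauss(0.0, 1.0) for x in latent]
    return noisy, remaining


toy_sigmas = [10.0 - 0.5 * i for i in range(20)] + [0.0]  # hypothetical 20-step schedule
start_latent, to_run = img2img_start([0.2, -0.1, 0.4, 0.0], toy_sigmas, denoise=0.55)
```

This also explains why the prompt has less impact in I2I: the structure of the input survives the partial noising, so fewer steps are left to follow the text.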
ControlNets
• Specialized Neural Networks
• Provide additional control and guidance to the primary model
• Use reference images to transfer structural information
or inject features
→ Hybrid approach with both text and visual references
• Control methods
• Structural: pose, edge detection, segmentation, depth
• Texture & Detail: scribble/sketch, stylization from edges
• Content & Layout: bounding boxes, inpainting masks
• Abstract & Style: color maps, textural fields
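How a ControlNet plugs into the main model, as a toy sketch: ControlNet (Zhang et al., 2023) trains a copy of the U-Net encoder on the control image and adds its outputs to the frozen model's features through "zero convolutions", layers initialised to zero so that training starts from the unchanged base model. Here each branch is reduced to one scalar weight and bias, and `strength` mirrors the per-ControlNet strength setting in tools like ComfyUI; all names are illustrative. Stacking two ControlNets (e.g. pose + depth) simply adds two residuals.

```python
def zero_conv(features, weight, bias):
    """'Zero convolution' reduced to one scalar weight and bias for brevity
    (in ControlNet it is a 1x1 convolution whose weights and bias start at 0)."""
    return [weight * f + bias for f in features]


def apply_controlnets(base_features, branches, strengths):
    """Add each ControlNet's residual, scaled by its strength, to the frozen U-Net features.

    branches: list of (control_features, weight, bias) tuples, one per ControlNet
    """
    out = list(base_features)
    for (control_features, weight, bias), strength in zip(branches, strengths):
        residual = zero_conv(control_features, weight, bias)
        out = [o + strength * r for o, r in zip(out, residual)]
    return out


base = [1.0, 2.0, 3.0]                  # hypothetical U-Net features
untrained = ([5.0, 5.0, 5.0], 0.0, 0.0)  # freshly initialised branch: no effect yet
pose = ([2.0, 2.0, 2.0], 0.5, 0.0)       # hypothetical trained OpenPose branch
depth = ([1.0, 0.0, 1.0], 1.0, 0.0)      # hypothetical trained depth branch
combined = apply_controlnets(base, [pose, depth], [1.0, 0.5])
```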
Demo – I2I + ControlNet
Preprocessors for ControlNets
Figure: initial image and its Line Art, Color Map, OpenPose, Segmentation, Depth Map, Scribble and Straight Lines preprocessed versions
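A preprocessor turns the reference image into the kind of map a given ControlNet was trained on. As a simplified stand-in for the edge and line-art preprocessors, here is a Sobel gradient threshold on a tiny grayscale image; real edge detectors such as Canny add smoothing, non-maximum suppression and hysteresis, and pose or depth preprocessors rely on neural networks instead.

```python
def sobel_edges(img, threshold):
    """Binary edge map from a grayscale image (list of rows, values in 0..1).
    Border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges


# Toy image: dark on the left, bright on the right -> one vertical edge
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edge_map = sobel_edges(image, threshold=1.0)
```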
More abstract input images
• Design poses in 3D with image export
• Using the JustSketchMe tool (web app & PWA)
• Design poses based on my own knowledge
• Several angles of view
• (waiting for 3D GenAI Models)
Demo – T2I + 2 ControlNets
How can I achieve greater
consistency for the character?
Create images featuring the
same facial identity
LoRA
• Low-Rank Adaptation
• Lightweight Model Adaptation
• Trains a small number of added low-rank parameters; base weights stay frozen
• Very efficient
• Small file size, significantly lower memory use
• Faster Training
• Usage
• Specific styles, poses, characters, or concepts
• Triggered by keywords in the prompt
• Many LoRAs are provided by the community
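Why LoRA files are so small: instead of updating a d×k weight matrix W, LoRA (Hu et al., 2021) learns two thin matrices B (d×r) and A (r×k) and uses W + (alpha/r)·BA. B is usually initialised to zero, so an untrained LoRA leaves the model unchanged, and the update can also be merged into W after training. A pure-Python sketch with tiny illustrative matrices:

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A (r x k) and B (d x r) train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def merge_lora(W, A, B, alpha, rank):
    """Fold the low-rank update into W: W' = W + (alpha / rank) * B A."""
    BA = matmul(B, A)
    scale = alpha / rank
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)] for w_row, ba_row in zip(W, BA)]


def lora_param_count(d, k, rank):
    return rank * (d + k)


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights (toy)
A = [[1.0, 2.0]]               # rank-1 down-projection
B = [[0.5], [1.0]]             # rank-1 up-projection
x = [1.0, 1.0]
y = lora_forward(W, A, B, x, alpha=1.0, rank=1)
```

For a single 768×768 attention projection, full fine-tuning touches 589,824 values while a rank-8 LoRA trains 8 × (768 + 768) = 12,288, about 2%.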
Demo – ControlNet + LoRA
Embeddings + 2 ControlNets + 2 LoRAs
Embeddings + ControlNet + LoRA
Create our own Haj3r LoRA
Dreambooth
• SDXL base
• 28 input images
• 2 epochs
• Google Colab Notebook
• 1h30 remote training
Kohya_ss tool
• PowerPuffMix base
• 15 input images
• 20 epochs
• 3h30 local training
• Embeddings
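To compare training runs, it helps to count optimizer steps rather than epochs: steps = ceil(images × repeats / batch size) × epochs. The slide does not give repeats or batch size, so they default to 1 here as an assumption; Kohya_ss, for example, reads a repeat count from the training folder's name, and gradient accumulation is ignored in this sketch.

```python
import math


def total_training_steps(num_images, epochs, repeats=1, batch_size=1):
    """Optimizer steps = ceil(images * repeats / batch_size) * epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Image and epoch counts from the slide; repeats and batch size are assumed
dreambooth_steps = total_training_steps(28, 2)
kohya_steps = total_training_steps(15, 20)
```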
Embeddings + 2 ControlNets + Haj3r LoRA
Demo – I2I + 2 ControlNets + Haj3r LoRA + Transparency
Embeddings + ControlNet + Haj3r LoRA
FaceID + FaceDetailer
• Image Prompt Adapters
• Enable generation from an image prompt
• Pre-trained adapter networks, separate from the SD model
• A sort of one-image LoRA
• IP-Adapter FaceID
• Uses a face-recognition model instead of the CLIP image encoder
• LoRA to improve ID consistency
• FaceDetailer
• Face enhancement tool (eyes, nose, lips, expression)
• Post-processing AI model
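How an image prompt gets into the model: IP-Adapter (Ye et al., 2023) uses "decoupled cross-attention". The image-prompt tokens get their own key/value projections, and their attention output is added, scaled by a weight, to the usual text cross-attention output. In the FaceID variant those image tokens are derived from a face-recognition embedding rather than a CLIP image embedding. Toy sketch: keys and values are given directly (the learned projections are omitted) and all vectors are illustrative.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    d = len(q)
    weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def decoupled_cross_attention(q, text_kv, image_kv, scale):
    """IP-Adapter-style: text and image-prompt tokens have separate K/V; outputs are summed."""
    text_out = attention(q, *text_kv)
    image_out = attention(q, *image_kv)
    return [t + scale * i for t, i in zip(text_out, image_out)]


q = [1.0, 0.0]                                             # query from a U-Net feature
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])  # two prompt tokens
face_kv = ([[1.0, 0.0]], [[0.0, 5.0]])                     # one hypothetical face token
mixed = decoupled_cross_attention(q, text_kv, face_kv, scale=1.0)
```

Setting the scale to 0 recovers plain text-to-image, which is why the adapter weight acts as a dial between prompt and identity.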
Demo – 2 ControlNets + FaceID + FaceDetailer
Embeddings + ControlNet + FaceID + FaceDetailer
Easier to change model: Cheyenne v2
Easier to change persona
Summary
Figure: results at each stage
• t2i
• t2i + embeddings
• t2i + i2i
• t2i + i2i + embeddings + ControlNet
• t2i + i2i + embeddings + ControlNet + LoRA
• t2i + i2i + embeddings + ControlNet + Haj3r LoRA + FaceDetailer
• t2i + embeddings + ControlNet + FaceID + FaceDetailer
Conclusion
Image Generation is both
Science & Art
A lot of parameters to tune
Add inputs & components to control the output
My use case
is implementable
Precise and homogeneous images
Icing on the cake: more inclusivity
Yoga Sūtra II.46
The posture should be Stable and Comfortable
The Yoga of Image Generation
Thank you
raphiki.github.io
