Get Comfy and Dream!
Controlled Image Generation
with Stable Diffusion
Hajer Mabrouk
Raphaël Semeteys
Hajer Mabrouk
• Former Innovation Manager at Oracle
• Certified Somatic Coach and Yoga Teacher
• linkedin.com/in/hajer-mabrouk/
Raphaël Semeteys
• Head of DevRel, Senior Architect at Worldline
• Certified Yoga Teacher
• raphiki.github.io
Use Case
• YogĀrkana is a website dedicated to yoga
• Descriptions and images of yoga poses must be precise
Locally generate accurate images of Yoga Poses
• Photography is not always the best option
• Images or photos from the internet cannot be reused
How could Generative AI help?
Stable Diffusion
From German Labs to London-based Startup
• Collaboration of several companies and German Labs
• Latent Diffusion Model with embedding space in 2021
• CLIP-guided diffusion
• LAION dataset
• Runway and EleutherAI participation
• Stability AI
• Compute donation to the project
• Hired most of initial researchers
• Now official maintainer of Stable Diffusion models
“Open” Licenses
• Responsible AI: OpenRAIL
• Version 3.5: enterprises with more than $1M in annual revenue must pay
Stable Diffusion
Very dynamic contributing Communities
Models
• Fine-tuning: Custom Models for specific styles or themes
• Refiners, Upscalers, ControlNets
• Model extensions (LoRA)
Tools
• User-Friendly Interfaces: Automatic1111 Web UI, ComfyUI
• Fine-tuning tools: Dreambooth, Kohya SS
Sharing Communities
• Portals to share models, prompts, images and tutorials: Hugging Face, Civitai…
• Stable Horde: crowdsourced distributed cluster of generation workers
ComfyUI: GUI for local Stable Diffusion workflows
• Intuitive, modular and customizable
• Flexible node-based workflows
• Text-to-Image Generation
• Image-to-Image Processing
• Custom Node Management
• Community-Driven
• GPL 3 License
• Contributions, plugins, doc
Let’s start with a simple demo
Generate an image of a girl doing a yoga pose
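As a rough sketch of what such a text-to-image workflow looks like as data, the graph below follows ComfyUI's API-style JSON layout: each node has an id, a `class_type` and `inputs`, and a link is written as `[source node id, output index]`. The checkpoint file name, the prompts and the parameter values are placeholders, and the node and input names are those of the stock nodes as we understand them, so check them against your own installation.

```python
import json


def text_to_image_graph(positive, negative, seed=42):
    """Build a minimal text-to-image graph in ComfyUI's API-style JSON layout."""
    return {
        # Loads model, CLIP and VAE (outputs 0, 1 and 2). File name is hypothetical.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "sd15_placeholder.safetensors"}},
        # Positive and negative prompts are encoded with the checkpoint's CLIP.
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        # Text-to-image starts from an empty latent image.
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 20, "cfg": 8.0,
                         "sampler_name": "euler", "scheduler": "karras",
                         "denoise": 1.0}},
        # The VAE turns the final latent back into pixels.
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "yoga"}},
    }


def dangling_links(graph):
    """Return (node id, input name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2
            if is_link and value[0] not in graph:
                missing.append((node_id, name))
    return missing


graph = text_to_image_graph("a girl doing a yoga pose", "blurry, extra limbs")
print(json.dumps(graph["5"], indent=2))
print("dangling links:", dangling_links(graph))
```

A graph like this is normally built by wiring nodes in the ComfyUI canvas; writing it out by hand mainly helps to see which node feeds which.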
What does Stable Diffusion do?
• Starts with Random Noise: Begins with a noisy, unrecognizable image
• Refines Step-by-Step: Gradually removes noise, adding details
• Learns from Real Images: Uses patterns learned from its training images
• Text-Guided Creation: Follows prompts like "sunset over mountains"
• Denoising Process: Clarifies image layer by layer
• Final Image Output: Produces a clear, detailed image matching the prompt
Source: Wikipedia
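The loop described above can be illustrated with a deliberately simplified toy. It is not Stable Diffusion: the "model" here is a stand-in that already knows the target values, whereas a real model predicts the noise to remove from what it learned in training. The function name and the three-number "image" are made up for the illustration.

```python
import random


def toy_denoise(target, steps=20, seed=0):
    """Start from random noise and move toward `target` a little at each step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise, fixed by the seed
    errors = []
    for step in range(steps):
        remaining = steps - step
        # Stand-in for the model: remove 1/remaining of the leftover "noise".
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
        errors.append(sum(abs(ti - xi) for xi, ti in zip(x, target)) / len(x))
    return x, errors


target = [0.5, -1.0, 2.0]
result, errors = toy_denoise(target)
print("error after first step:", round(errors[0], 4))
print("error after last step:", round(errors[-1], 4))
```

The only point of the sketch is the shape of the process: a fixed number of small refinement steps, each one starting from the output of the previous one.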
Models
• Most-used Stability AI Models
• SD 1.5
• SDXL
• Fine-tuned Models
• Specialized: style, subject
• Shared by communities (like civitai.com)
Prompts
• CLIP Model (Contrastive Language–Image Pretraining)
• Connect descriptive text and images
• Help generate images matching specific prompts
• Can handle a wide range of prompts
• Developed by OpenAI in 2021
• Usable under the MIT license
• Trained on 400M image-text pairs from the Internet
• Positive & Negative
• Textual or Short syntax
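To make "connect descriptive text and images" a little more concrete: CLIP maps text and images into the same vector space, and a caption matches an image when their vectors point in a similar direction. The sketch below only shows that comparison step. The three-dimensional vectors are invented stand-ins; real CLIP embeddings come from trained encoders and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_vec, caption_vecs):
    """Pick the caption whose embedding is closest to the image embedding."""
    return max(caption_vecs,
               key=lambda text: cosine_similarity(image_vec, caption_vecs[text]))


# Made-up embeddings, for illustration only.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "tree pose": [0.8, 0.2, 0.1],
    "sunset over mountains": [0.1, 0.9, 0.3],
}
print(best_caption(image_embedding, caption_embeddings))
```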
Embeddings (Textual Inversions)
• Vector representations of text
• “Instructions” for image generation
• Style, theme, texture, pose, character features, etc.
• Small files containing additional concepts
• To be injected in prompts
• Community provides many presets
• Must be aligned with the Stable Diffusion version
Embeddings (Textual Inversions)
[Figure: example images with No Embedding and with the embeddings Ghibli, Fantasy, Comic, 3D Render, Analog Film, Cinematic, Cyberpunk, Digital Art and Vector Art]
Latent Space
• Latent Space
• Abstract, compressed representation of the image
• Handles encoded features such as shapes, colors, textures and general structure
• Manipulation of embedding vectors
• Iterative and refining generation
• Random noise is introduced into the latent space
• At each step the model adjusts the features to match the prompt
• VAE (Variational Autoencoder)
• Converts image pixels → latent space (encode) and latent space → pixels (decode)
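For a sense of how compressed the latent representation is, the sketch below computes latent shapes under the assumption that the VAE shrinks each side by a factor of 8 and uses 4 latent channels, as is the case for SD 1.5 and SDXL to our knowledge. Other model versions may use different values, so both numbers are parameters.

```python
def latent_shape(width, height, downsample=8, channels=4):
    """Shape (channels, height, width) of the latent for a given image size."""
    if width % downsample or height % downsample:
        raise ValueError("image sides must be multiples of the downsample factor")
    return (channels, height // downsample, width // downsample)


def compression_ratio(width, height, downsample=8, channels=4):
    """How many RGB pixel values are represented by one latent value."""
    c, h, w = latent_shape(width, height, downsample, channels)
    return (width * height * 3) / (c * h * w)


for size in (512, 1024):
    print(size, latent_shape(size, size), compression_ratio(size, size))
```

Working on the smaller latent rather than on pixels is what makes local generation practical on consumer hardware.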
Denoising Process
• Seed
• Random seed used to create initial noise
• Fixing it makes it possible to see the impact of other parameters
• Samplers
• Algorithms guiding the iterative image generation
• Differ in Speed and Quality
• Schedulers
• Control how noise is removed at each step
• Also impact Speed and Quality; Karras is well balanced
• Other Parameters
• Number of steps, CFG (adherence to prompt), denoising strength (%)
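Two of these knobs have compact formulas, sketched below. `cfg_combine` is the usual classifier-free guidance mix: the prediction without the prompt is pushed toward the prediction with the prompt, scaled by CFG (in common UIs the negative prompt typically takes the place of the "without" branch). `karras_sigmas` is the noise schedule from Karras et al. (2022) with its default rho of 7. The sigma bounds used here are illustrative values, not read from any particular model.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), per element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in Karras et al."""
    if n < 2:
        raise ValueError("need at least two noise levels")
    lo = sigma_min ** (1.0 / rho)
    hi = sigma_max ** (1.0 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


# Higher CFG moves the result further in the direction the prompt suggests.
print(cfg_combine([0.0, 0.0], [1.0, 2.0], scale=8.0))

# Most of the schedule is spent at low noise levels, where details are refined.
print([round(s, 3) for s in karras_sigmas(8)])
```

Seen this way, a very high CFG overshoots the prompt-conditioned prediction by a large factor, which is one intuition for why extreme values tend to degrade images.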
I can tweak generation but I don’t control the pose…
[Figure: generated images for Camel pose, Tree pose, Lotus pose and Shoulder Stand pose]
Text-to-Image generation is not enough!
Let’s move on to Image-to-Image
Generate an image of a girl doing a yoga pose based on an existing image
Image-to-Image Generation
• Input Image
• Replace the Empty Latent Image with a real one
• Need a VAE Encode (from the model)
• Play with % denoising
• Prompt has less impact
• Increasing CFG only reduces quality
[Figure: example outputs at CFG 20 and CFG 8, and at denoise 0.55 and denoise 0.70]
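The change from text-to-image to image-to-image can be sketched as a small rewrite of a workflow graph in ComfyUI's API-style JSON layout. This is an illustration under assumptions: the helper name is ours, the image and checkpoint file names are placeholders, and the node and input names (`LoadImage`, `VAEEncode`, `pixels`, `vae`, `latent_image`, `denoise`) are those of the stock nodes as we understand them.

```python
import copy


def to_image_to_image(graph, sampler_id, checkpoint_id, image_name, denoise=0.55):
    """Return a copy of `graph` whose sampler starts from an encoded input image."""
    g = copy.deepcopy(graph)
    sampler_inputs = g[sampler_id]["inputs"]
    next_id = max(int(k) for k in g) + 1
    load_id, encode_id = str(next_id), str(next_id + 1)

    # Drop the Empty Latent Image node the sampler was using.
    old_latent_id = sampler_inputs["latent_image"][0]
    g.pop(old_latent_id, None)

    # Load the reference picture and encode it with the checkpoint's VAE (output 2).
    g[load_id] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    g[encode_id] = {"class_type": "VAEEncode",
                    "inputs": {"pixels": [load_id, 0], "vae": [checkpoint_id, 2]}}

    sampler_inputs["latent_image"] = [encode_id, 0]
    sampler_inputs["denoise"] = denoise  # below 1.0 keeps part of the input image
    return g


# Fragment of a text-to-image graph; prompt nodes are omitted for brevity.
fragment = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd15_placeholder.safetensors"}},  # hypothetical
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "latent_image": ["4", 0],
                     "steps": 20, "cfg": 8.0, "denoise": 1.0}},
}
img2img = to_image_to_image(fragment, "5", "1", "pose_reference.png")
print(sorted(img2img), img2img["5"]["inputs"]["latent_image"])
```

With a lower denoise value more of the input image survives, which is consistent with the prompt having less impact in this mode.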
ControlNets
• Specialized Neural Networks
• Additional control and guidance to primary model
• Use reference images to transfer structural information or inject features
→ Hybrid approach with both text and visual references
• Control methods
• Structural: pose, edge detection, segmentation, depth
• Texture & Detail: scribble/sketch, stylization from edges
• Content & Layout: bounding boxes, inpainting masks
• Abstract & Style: color maps, textural fields
[Figure: Depth ControlNet example]
Preprocessors for ControlNets
[Figure: Initial Image and preprocessor outputs: Line Art, Color Map, Open Pose, Segmentation, Depth Map, Scribble, Straight Lines]
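A preprocessor turns the initial image into the simplified map that the ControlNet reads. The toy below marks pixels where brightness changes sharply, using a plain gradient threshold on a tiny grayscale grid. It is a stand-in chosen for brevity, not what the real preprocessors do: those rely on dedicated detectors (for example Canny edges, pose estimators or depth estimators).

```python
def edge_map(image, threshold=0.5):
    """Mark pixels whose brightness jumps toward the right or lower neighbour."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = image[y][x]
            right = image[y][min(x + 1, width - 1)]   # clamp at the border
            below = image[min(y + 1, height - 1)][x]
            gx, gy = right - here, below - here
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


# Dark left half, bright right half: the edge is the column where they meet.
picture = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
for row in edge_map(picture):
    print(row)
```

The output keeps only structure and discards color and texture, which is the property that lets a ControlNet transfer a pose or layout while the prompt decides everything else.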
More abstract input images
• Design poses in 3D with image export
• Use of JustSketchMe tool (webapp & PWA)
• Design poses based on my own knowledge
• Several angles of view
• (waiting for 3D GenAI Models)
How can I achieve greater consistency for the character?
Create images featuring the same facial identity
LoRA
• Low-Rank Adaptation
• Lightweight Model Adaptation
• Trains small low-rank matrices added to the frozen model weights, instead of updating all model parameters
• Very efficient
• Small File Size, use significantly less memory
• Faster Training
• Usage
• Specific styles, poses, characters, or concepts
• Triggered by keywords in the prompt
• Many LoRAs are provided by the community
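The efficiency claim follows from simple arithmetic. LoRA leaves a weight matrix W (d_out × d_in) frozen and learns two thin matrices B (d_out × r) and A (r × d_in), so that the adapted weight is W + (alpha / r) · B·A. The sketch below counts parameters and applies such an update to a tiny matrix. The layer size 768 × 768 and rank 8 are illustrative choices, not values taken from a specific model.

```python
def lora_param_counts(d_out, d_in, rank):
    """Parameters of the full matrix versus the two low-rank factors."""
    return d_out * d_in, rank * (d_out + d_in)


def apply_lora(W, B, A, alpha, rank):
    """Return W + (alpha / rank) * B @ A using plain nested lists."""
    scale = alpha / rank
    rows, cols = len(W), len(W[0])
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
             for j in range(cols)]
            for i in range(rows)]


full, lora = lora_param_counts(768, 768, rank=8)
print(full, lora, full // lora)  # the low-rank factors are far smaller

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights
B = [[1.0], [2.0]]             # d_out x rank
A = [[0.5, 0.0]]               # rank x d_in
print(apply_lora(W, B, A, alpha=1.0, rank=1))
```

Because only B and A are stored, a LoRA file stays small and can be combined with different base checkpoints of the same model family.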
Examples of generated images
Controlled generation for Diversity & Inclusion
Conclusion
• Image Generation is both Science and Art
• A lot of parameters to tune
• Additional inputs and components to control generation
• Our use case is implementable
• Nicer and homogeneous images for YogĀrkana
• Cherry on the cake: a more inclusive Website!
• Next steps
• Create our own LoRA, test video generation
• Explore voice generation for i18n and more inclusivity
Namaste!
raphiki.github.io
deck.yogarkana.com

Image Generation with ComfyUI and Stable Diffusion
