Stable Diffusion is a latent diffusion model that generates detailed images from text prompts. It was developed by the CompVis group at Ludwig Maximilian University of Munich. The model is relatively lightweight, with 860 million parameters in its U-Net, and it can be adapted with fine-tuning methods such as DreamBooth and textual inversion. Training cost approximately $600,000, with compute donated by Stability AI and training data drawn from datasets assembled by non-profit organizations such as LAION.
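
To give a rough sense of what "lightweight" means here, the sketch below estimates how much memory the U-Net weights alone would occupy at two common numeric precisions. Only the 860 million figure comes from the text above; the function name `weight_footprint_gib` and the choice of precisions are illustrative assumptions, and the estimate ignores the text encoder, the autoencoder, activations, and any optimizer state.

```python
# U-Net parameter count as stated in the text.
UNET_PARAMS = 860_000_000

# Bytes needed to store one parameter at each precision (assumed storage formats).
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_footprint_gib(num_params: int, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = weight_footprint_gib(UNET_PARAMS, dtype)
        print(f"{dtype}: {size:.2f} GiB")
```

Under these assumptions the U-Net weights come to roughly 3.2 GiB in 32-bit floats and about half that in 16-bit floats, which is consistent with the model being described as lightweight, although real memory use during inference will be higher.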