Project Overview
A Streamlit web application powered by Stable Diffusion for generating high-quality images from text prompts. Supports style presets, negative prompts, batch generation, seed control for reproducibility, and NSFW content filtering.
How Stable Diffusion Works
Stable Diffusion is a latent diffusion model. Instead of diffusing in pixel space (expensive), it works in a compressed latent space: the VAE downsamples 8× per spatial dimension, so a 512×512×3 image becomes a 4×64×64 latent (~48× fewer values). The process:
- Text Encoder: Converts the prompt into a sequence of token embeddings (768-d each for SD 1.x's CLIP encoder; SD 2.x uses OpenCLIP with 1024-d embeddings)
- Noise Scheduler: Starts with pure Gaussian noise in latent space
- U-Net Denoiser: Iteratively removes noise guided by the text embedding (typically 20–50 steps, depending on the scheduler)
- VAE Decoder: Decodes the clean latent → full 512×512 pixel image
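The compression the pipeline relies on can be checked with simple arithmetic (shapes here are the common SD defaults: a 512×512 RGB image maps to a 4×64×64 latent):

```python
# Pixel-space size of a 512x512 RGB image (number of values, not bytes)
pixel_values = 512 * 512 * 3

# Latent-space size: the VAE downsamples 8x per spatial dimension
# and uses 4 channels, giving a 4 x 64 x 64 latent
latent_values = 4 * (512 // 8) * (512 // 8)

print(pixel_values // latent_values)  # 48 -> ~48x fewer values to denoise
```

This is why each U-Net denoising step is affordable: it operates on 16,384 latent values instead of 786,432 pixel values.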
App Implementation
```python
import streamlit as st
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

@st.cache_resource
def load_pipeline():
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        torch_dtype=torch.float16
    )
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")
    pipe.enable_attention_slicing()  # memory optimisation for 4GB VRAM
    return pipe

STYLE_PRESETS = {
    "Realistic": "photorealistic, 8k, detailed, sharp focus",
    "Anime": "anime style, vibrant colors, Studio Ghibli",
    "Oil Painting": "oil painting, impasto, textured canvas, artist",
    "Watercolor": "watercolor illustration, soft colors, paper texture",
    "Cyberpunk": "cyberpunk, neon lights, futuristic city, blade runner",
    "Sketch": "pencil sketch, detailed line art, graphite"
}

pipe = load_pipeline()
st.title("🎨 Text-to-Image Generator")

prompt = st.text_area("Prompt", placeholder="A cat astronaut on the moon...")
style = st.selectbox("Style Preset", list(STYLE_PRESETS.keys()))
neg_pmt = st.text_input("Negative Prompt", "blurry, low quality, distorted, nsfw")
steps = st.slider("Inference Steps", 20, 50, 30)
guidance = st.slider("Guidance Scale", 5.0, 20.0, 7.5)
seed = st.number_input("Seed (-1 = random)", value=-1)

if st.button("Generate") and prompt:
    full_prompt = prompt + ", " + STYLE_PRESETS[style]
    actual_seed = int(seed) if seed >= 0 else torch.randint(0, 99999, (1,)).item()
    generator = torch.Generator("cuda").manual_seed(actual_seed)
    with st.spinner("Generating..."):
        result = pipe(full_prompt, negative_prompt=neg_pmt,
                      num_inference_steps=steps, guidance_scale=guidance,
                      generator=generator)
    st.image(result.images[0], use_column_width=True)
```

GPU Optimisation for 4GB VRAM
Stable Diffusion requires ~6GB VRAM by default. Running on a GTX 1650 (4GB) needed specific tricks:
- float16 precision: Halves memory usage vs float32
- Attention slicing: Processes attention in chunks — slower but fits in 4GB
- DPMSolver scheduler: Achieves good quality in 20–30 steps vs 50 for DDIM
- xformers (optional): Memory-efficient attention when installed
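A back-of-the-envelope check of why float16 matters. The parameter count below is an assumption (~865M, a commonly cited ballpark for the SD 2.x U-Net), so treat the numbers as rough estimates:

```python
unet_params = 865_000_000  # assumed approximate U-Net parameter count

fp32_gb = unet_params * 4 / 1024**3  # 4 bytes per float32 weight
fp16_gb = unet_params * 2 / 1024**3  # 2 bytes per float16 weight

print(f"fp32: {fp32_gb:.2f} GB, fp16: {fp16_gb:.2f} GB")
# Weights alone: roughly 3.2 GB vs 1.6 GB. Activations, the VAE, and the
# text encoder add more on top, which is why attention slicing is still
# needed to fit the whole pipeline on a 4GB card.
```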
Hardware used for development: Nvidia GTX 1650 4GB + Kaggle T4 16GB for high-res batch generation.
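One parameter above worth unpacking: the Guidance Scale slider controls classifier-free guidance. Conceptually, each denoising step blends an unconditional and a text-conditioned noise prediction; a toy sketch with scalar "predictions" (real predictions are latent-shaped tensors):

```python
def cfg(uncond_pred, cond_pred, guidance_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional estimate, in the direction of the text-conditioned one
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# guidance_scale = 1.0 reproduces the conditional prediction;
# higher values follow the prompt more strongly (at the cost of diversity)
print(round(cfg(0.2, 0.5, 1.0), 2))  # 0.5
print(round(cfg(0.2, 0.5, 7.5), 2))  # 2.45
```

This is why the app defaults to 7.5: low values ignore the prompt, while very high values (15–20) tend to over-saturate and distort images.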