
Project Overview

A Streamlit web application powered by Stable Diffusion for generating high-quality images from text prompts. Supports style presets, negative prompts, batch generation, seed control for reproducibility, and NSFW content filtering.

  • Base Model: SD 2.1
  • Style Presets: 6
  • Per Image (GPU): ~4s
  • UI Framework: Streamlit

How Stable Diffusion Works

Stable Diffusion is a latent diffusion model. Instead of diffusing in pixel space (expensive), it works in a compressed latent space: a 512×512 image becomes a 64×64×4 latent, an 8× reduction per spatial dimension (64× fewer spatial positions). The process:

  1. Text Encoder (CLIP): Converts the prompt into token embeddings (1024-d for the OpenCLIP text encoder used by SD 2.x)
  2. Noise Scheduler: Starts with pure Gaussian noise in latent space
  3. U-Net Denoiser: Iteratively removes noise guided by the text embedding (typically 20–50 steps, depending on the scheduler)
  4. VAE Decoder: Decodes the clean latent → full 512×512 pixel image
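The compression figures above can be checked with quick arithmetic (a sketch: a 512×512 RGB image against the 64×64×4 latent produced by the VAE's 8× spatial downsampling):

```python
# Pixel space: a 512x512 RGB image.
pixel_elems = 512 * 512 * 3                   # 786,432 values

# Latent space: the VAE downsamples 8x per side and uses 4 channels.
latent_side = 512 // 8                        # 64
latent_elems = latent_side * latent_side * 4  # 16,384 values

print((512 * 512) // (latent_side * latent_side))  # 64x fewer spatial positions
print(pixel_elems / latent_elems)                  # 48.0x fewer values overall
```

So the "64×" refers to spatial positions; counting every value (channels included), the latent holds 48× fewer numbers than the image, which is what makes denoising there so much cheaper.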

App Implementation

Python
import streamlit as st
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch

@st.cache_resource
def load_pipeline():
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        torch_dtype=torch.float16
    )
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")
    pipe.enable_attention_slicing()   # memory optimisation for 4GB VRAM
    return pipe

STYLE_PRESETS = {
    "Realistic":     "photorealistic, 8k, detailed, sharp focus",
    "Anime":         "anime style, vibrant colors, Studio Ghibli",
    "Oil Painting":  "oil painting, impasto, textured canvas, artist",
    "Watercolor":    "watercolor illustration, soft colors, paper texture",
    "Cyberpunk":     "cyberpunk, neon lights, futuristic city, blade runner",
    "Sketch":        "pencil sketch, detailed line art, graphite"
}

pipe = load_pipeline()
st.title("🎨 Text-to-Image Generator")
prompt   = st.text_area("Prompt", placeholder="A cat astronaut on the moon...")
style    = st.selectbox("Style Preset", list(STYLE_PRESETS.keys()))
neg_pmt  = st.text_input("Negative Prompt", "blurry, low quality, distorted, nsfw")
steps    = st.slider("Inference Steps", 20, 50, 30)
guidance = st.slider("Guidance Scale", 5.0, 20.0, 7.5)
seed     = st.number_input("Seed (-1 = random)", value=-1)

if st.button("Generate") and prompt:
    full_prompt = prompt + ", " + STYLE_PRESETS[style]
    seed_val = int(seed) if seed >= 0 else torch.randint(0, 2**31 - 1, (1,)).item()
    generator = torch.Generator("cuda").manual_seed(seed_val)
    with st.spinner("Generating..."):
        result = pipe(full_prompt, negative_prompt=neg_pmt,
                      num_inference_steps=steps, guidance_scale=guidance,
                      generator=generator)
    st.image(result.images[0], use_column_width=True)
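A note on the `guidance_scale` slider: under the hood the pipeline runs the U-Net twice per step (once text-conditioned, once unconditional/negative) and extrapolates between the two noise predictions. A minimal sketch of that classifier-free guidance combination, with made-up numbers standing in for the U-Net outputs:

```python
import numpy as np

def cfg(noise_uncond, noise_cond, guidance_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional output, towards the text-conditioned one.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Toy stand-ins for the two U-Net noise predictions at one step.
uncond = np.array([0.10, 0.20])
cond   = np.array([0.30, 0.10])

print(cfg(uncond, cond, 1.0))  # scale 1.0 just reproduces the conditioned output
print(cfg(uncond, cond, 7.5))  # scale 7.5 extrapolates well past it
```

Higher scales follow the prompt more literally at the cost of saturation artifacts, which is why the app defaults to the commonly used 7.5.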

GPU Optimisation for 4GB VRAM

Stable Diffusion needs roughly 6GB of VRAM out of the box. Running it on a GTX 1650 (4GB) required specific optimisations:

  • float16 precision: Halves memory usage vs float32
  • Attention slicing: Processes attention in chunks — slower but fits in 4GB
  • DPMSolver scheduler: Achieves good quality in 20–30 steps vs 50 for DDIM
  • xformers (optional): Memory-efficient attention when installed

Hardware used for development: Nvidia GTX 1650 4GB + Kaggle T4 16GB for high-res batch generation.
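Back-of-the-envelope weight memory for the float16 point above (the ~865M U-Net parameter count is an approximate, commonly cited figure for SD 2.1, not an exact number):

```python
unet_params = 865e6  # ~865M parameters (approximate, SD 2.1 U-Net)

fp32_gb = unet_params * 4 / 1024**3  # 4 bytes per float32 weight
fp16_gb = unet_params * 2 / 1024**3  # 2 bytes per float16 weight

print(round(fp32_gb, 2))  # ~3.22 GB: U-Net weights alone nearly fill 4GB
print(round(fp16_gb, 2))  # ~1.61 GB: halved, leaving room for activations
```

With float32 the U-Net alone would nearly exhaust a 4GB card before any activations, VAE, or text-encoder weights are loaded, which is why `torch_dtype=torch.float16` plus attention slicing is what makes the GTX 1650 viable at all.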

Tags: Stable Diffusion · Diffusers · Streamlit · PyTorch · CUDA · CLIP · Python