If you've ever asked Stable Diffusion or DALL-E to render readable text inside a comic panel, you know the pain. It almost works. The letters look like letters. Until you read them: "WHAT ARE YOU DONIG", "HEILP", "BLEAH BLAH". About 70% of my generations needed a regen just because the dialogue was garbled, and every regen burned ~$0.04 in GPU time.

For Comicory I gave up trying to make the model render text and moved typography into a deterministic post-processing step. The model now draws empty speech bubbles. Pillow draws the words. Retry rate for text-related issues: zero. Total post-processing code: ~200 lines.

Here's the pipeline.

Step 1: Bubble shape detection

The model is told (via prompt + LoRA) to draw an empty white speech bubble with a black outline somewhere in the panel. I find it with classic CV: no ML, no models, no surprises.

```python
from PIL import Image
import numpy as np
import cv2

def find_bubble(panel: Image.Image) -> tuple[int, int, int, int] | None:
    arr = np.…
```
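The body of `find_bubble` is cut off in this excerpt, so what follows is only a minimal, hypothetical sketch of one way the "classic CV" idea could work, not the author's code. It assumes the bubble interior is the largest near-white region that is fully sealed off by the black outline (that is, not connected to the panel border). It swaps OpenCV for a plain 4-connected flood fill in NumPy/Python so it runs without `cv2`, and it returns a Pillow-style `(left, top, right, bottom)` box; the original's tuple convention isn't visible here. The name `find_bubble_sketch` and the parameters `white_threshold`, `min_area_frac` and `min_fill_ratio` (with their default values) are illustrative assumptions.

```python
from __future__ import annotations

from collections import deque

import numpy as np
from PIL import Image, ImageDraw


def find_bubble_sketch(
    panel: Image.Image,
    white_threshold: int = 240,   # assumed cutoff for "white" in 0..255 grayscale
    min_area_frac: float = 0.01,  # assumed: ignore blobs under 1% of the panel
    min_fill_ratio: float = 0.6,  # an ellipse fills ~pi/4 (0.785) of its box
) -> tuple[int, int, int, int] | None:
    """Hypothetical sketch: box of the largest enclosed white blob, or None.

    Returns a Pillow-style (left, top, right, bottom) box, right/bottom exclusive.
    """
    gray = np.asarray(panel.convert("L"))
    white = gray >= white_threshold
    h, w = white.shape
    seen = np.zeros((h, w), dtype=bool)
    best_area, best_box = 0, None

    for sy, sx in zip(*np.nonzero(white)):
        if seen[sy, sx]:
            continue
        # Flood-fill one 4-connected white component, tracking its stats.
        seen[sy, sx] = True
        queue = deque([(int(sy), int(sx))])
        area, touches_border = 0, False
        x0, y0, x1, y1 = w, h, -1, -1
        while queue:
            y, x = queue.popleft()
            area += 1
            x0, y0, x1, y1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
            if x in (0, w - 1) or y in (0, h - 1):
                touches_border = True
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and white[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))

        # Background and margins reach the border; a bubble interior is
        # sealed off by its black outline, so border-touching blobs are skipped.
        if touches_border or area < min_area_frac * h * w:
            continue
        if area / ((x1 - x0 + 1) * (y1 - y0 + 1)) < min_fill_ratio:
            continue  # too ragged or thin to be a bubble interior
        if area > best_area:
            best_area, best_box = area, (x0, y0, x1 + 1, y1 + 1)

    return best_box


if __name__ == "__main__":
    # Synthetic panel: mid-gray art with a white, black-outlined bubble.
    demo = Image.new("L", (100, 100), 128)
    ImageDraw.Draw(demo).ellipse((20, 30, 80, 70), fill=255, outline=0, width=3)
    print(find_bubble_sketch(demo))
```

4-connectivity is the deliberate choice: an outline drawn with diagonal pixel steps still separates inside from outside, whereas 8-connectivity would let the fill leak through those diagonal gaps. The per-pixel Python loop is slow on full-resolution panels; OpenCV's `cv2.connectedComponentsWithStats` (with `connectivity=4`) computes the same labels, areas and bounding boxes in native code.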