Running vision AI locally has always had a catch: you need a GPU, or you need to send frames to a cloud API and pay per call. SmolVLM2-2.2B changes that. It is a 2.2B-parameter multimodal model specifically designed for CPU inference, and this agent is built around it. SmolVLM2 Edge Vision Agent runs fully offline: it ingests a live webcam feed or an image folder, detects motion using frame-difference analysis, triggers VLM analysis only on scene changes, and persists structured observations to a local SQLite database, with a FastAPI web dashboard for review. No API costs. No network calls after the first model download. 16 GB RAM, no GPU required.
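To make the motion-gated pipeline concrete, here is a minimal sketch of the idea, not the project's actual code. It assumes grayscale frames passed as raw `bytes`, a mean-absolute-difference score, and a hypothetical `describe_frame` placeholder standing in for the SmolVLM2 call. The threshold value, table name, and column names are likewise illustrative assumptions. A real pipeline would more likely compute the difference with OpenCV or NumPy; plain bytes are used here only to keep the sketch dependency-free.

```python
import sqlite3
import time


def mean_abs_diff(prev: bytes, curr: bytes) -> float:
    """Average per-pixel absolute difference between two grayscale frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not curr:
        return 0.0
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def describe_frame(frame: bytes) -> str:
    """Hypothetical stand-in for the VLM call; returns a dummy description."""
    return f"placeholder description ({len(frame)} pixels)"


def process_frames(frames, conn, threshold=10.0, describe=describe_frame):
    """Run the (expensive) describe step only when the scene changes.

    Returns the number of frames that were analyzed and stored.
    """
    # Assumed schema: one row per analyzed frame.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "id INTEGER PRIMARY KEY, frame_index INTEGER, ts REAL, "
        "motion_score REAL, description TEXT)"
    )
    prev = None
    analyzed = 0
    for index, frame in enumerate(frames):
        if prev is not None:
            score = mean_abs_diff(prev, frame)
            if score > threshold:
                # Scene changed: analyze this frame and persist the result.
                conn.execute(
                    "INSERT INTO observations "
                    "(frame_index, ts, motion_score, description) "
                    "VALUES (?, ?, ?, ?)",
                    (index, time.time(), score, describe(frame)),
                )
                analyzed += 1
        prev = frame
    conn.commit()
    return analyzed


if __name__ == "__main__":
    # Two dark frames, then two bright frames: only the transition triggers analysis.
    demo_frames = [bytes([0] * 16), bytes([0] * 16), bytes([200] * 16), bytes([200] * 16)]
    db = sqlite3.connect(":memory:")
    print(process_frames(demo_frames, db))
```

Gating on the difference score is what keeps CPU inference practical in this kind of design: static scenes cost only a cheap pixel comparison, and the model runs only on the frames that actually differ from the previous one.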