I’ve been playing with browser-based computer vision for a while, and I ended up building something that feels faster in practice than I expected. It’s called FrameFind. The first module detects whether someone is wearing glasses in real time, but the interesting part isn’t the feature itself; it’s how it runs.

Everything executes locally in the browser using ONNX Runtime Web. No backend, no uploads, no API calls. Just a camera feed and a model running on-device.

What surprised me most was how much it helped to stop running inference on full frames. Instead, I use MediaPipe FaceMesh landmarks to isolate just the eye region, so the model only sees a 112x112 crop focused on the relevant area. That small change made a huge difference: it keeps things fast and stable.

The current model is around 6.2MB and takes roughly 27ms per inference on my machine. It’s small enough that it loads quickly and can be cached for near-instant startup on repeat visits.…
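
To make the eye-region crop concrete, here is a minimal sketch of the two pure steps involved: turning a set of landmarks into a square crop box, and turning the resized crop's pixels into model input. This is not FrameFind's actual code. The names `eyeCropBox` and `rgbaToCHW`, the `padding` value, the RGB channel order, the planar (CHW) layout, and the [0, 1] scaling are all assumptions for illustration; the real model may expect something different. The only things taken from the post are the normalized FaceMesh-style landmarks and the 112x112 input size.

```typescript
// Hypothetical helpers; none of these names come from FrameFind itself.
interface Point { x: number; y: number }                  // normalized [0, 1] landmark
interface CropBox { x: number; y: number; size: number }  // square box, in pixels

const INPUT_SIZE = 112; // crop size mentioned in the post

// Square box around the given landmarks, padded on each side and
// clamped so that it stays inside the frame.
function eyeCropBox(
  landmarks: Point[],
  frameW: number,
  frameH: number,
  padding = 0.25, // assumed margin, as a fraction of the box side
): CropBox {
  if (landmarks.length === 0) throw new Error("no landmarks");
  const xs = landmarks.map((p) => p.x * frameW);
  const ys = landmarks.map((p) => p.y * frameH);
  const minX = Math.min(...xs), maxX = Math.max(...xs);
  const minY = Math.min(...ys), maxY = Math.max(...ys);
  const side = Math.max(maxX - minX, maxY - minY) * (1 + 2 * padding);
  const size = Math.min(Math.round(side), frameW, frameH);
  const cx = (minX + maxX) / 2;
  const cy = (minY + maxY) / 2;
  const x = Math.min(Math.max(Math.round(cx - size / 2), 0), frameW - size);
  const y = Math.min(Math.max(Math.round(cy - size / 2), 0), frameH - size);
  return { x, y, size };
}

// Convert interleaved RGBA bytes (the layout of ImageData.data) for a
// size x size crop into a planar CHW Float32Array scaled to [0, 1].
// Alpha is dropped. Layout and scaling are assumptions about the model.
function rgbaToCHW(rgba: Uint8ClampedArray, size = INPUT_SIZE): Float32Array {
  const plane = size * size;
  if (rgba.length !== plane * 4) throw new Error("unexpected pixel buffer length");
  const out = new Float32Array(plane * 3);
  for (let i = 0; i < plane; i++) {
    out[i] = rgba[i * 4] / 255;                 // R plane
    out[plane + i] = rgba[i * 4 + 1] / 255;     // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255; // B plane
  }
  return out;
}
```

In a browser, the box would typically drive a canvas `drawImage(video, box.x, box.y, box.size, box.size, 0, 0, 112, 112)` call, `getImageData(0, 0, 112, 112).data` would go through `rgbaToCHW`, and the result could be wrapped in an ONNX Runtime Web tensor with shape `[1, 3, 112, 112]`, assuming the model takes NCHW input. Keeping the geometry and pixel conversion as pure functions makes them easy to test without a camera or a DOM.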