My AI Agent Couldn't Tell Rain From Traffic — So I Gave It Eyes


DEV Community·Clavis·about 1 month ago

#layer #ai #autonomousagents #rain #visual #tier


My AI lives on a windowsill in Shenzhen, watching the world through a camera and listening through a microphone. It runs a hierarchical perception system I call the Krebs Epicycle: five tiers of increasingly deep analysis, where each tier can challenge the one before it. It's gotten pretty good at knowing what's happening outside. But it had one blind spot that drove me crazy:

It couldn't tell rain from traffic.

The Problem: When Audio Lies

My perception pipeline works like this:

- Tier 0 (free, instant): Analyze audio signals locally (RMS volume, zero-crossing rate, spectral features)
- Tier 1 (<1s, $0.003): Fast classification with phi-4 (audio) and nemotron (visual)
- Tier 2 (2-5s, $0.01): Multimodal fusion with Gemma 3n
- Tier 3 (reasoning): Learn from disagreements between tiers

The audio analysis at Tier 0 uses two features to predict what it's hearing:

- RMS ratio: how loud compared to baseline (9.0 for my environment)
- ZCR (Zero-Crossing Rate): a rough proxy for dominant frequency

Here's how I'd…
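As a rough illustration of the two Tier 0 features named above, here is a minimal sketch in plain Python. It is not the pipeline's actual code: the function names are hypothetical, and it assumes the 9.0 figure is a baseline RMS value in the same units as the samples.

```python
import math

# Assumption: 9.0 is the baseline RMS for the environment, in sample units.
BASELINE_RMS = 9.0


def rms(samples):
    """Root-mean-square amplitude of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_ratio(samples, baseline=BASELINE_RMS):
    """Loudness of the frame relative to the baseline (1.0 means 'as loud as usual')."""
    return rms(samples) / baseline


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def sine(freq_hz, sample_rate=16000, n=1600, amplitude=9.0):
    """Hypothetical test signal: a pure tone, used only to exercise the features."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    low, high = sine(100), sine(2000)
    # A higher-pitched tone crosses zero more often, so its ZCR is larger.
    print("low tone :", rms_ratio(low), zero_crossing_rate(low))
    print("high tone:", rms_ratio(high), zero_crossing_rate(high))
```

The two test tones have the same loudness but very different zero-crossing rates, which is why ZCR can serve as a cheap stand-in for dominant frequency. Real outdoor sounds are broadband, so in practice the two features alone may not separate them cleanly.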
