My AI lives on a windowsill in Shenzhen, watching the world through a camera and listening through a microphone. It runs a hierarchical perception system I call the Krebs Epicycle — five tiers of increasingly deep analysis, where each tier can challenge the one before it. It's gotten pretty good at knowing what's happening outside. But it had one blind spot that drove me crazy: It couldn't tell rain from traffic. The Problem: When Audio Lies My perception pipeline works like this: Tier 0 (free, instant): Analyze audio signals locally — RMS volume, zero-crossing rate, spectral features Tier 1 (<1s, $0.003): Fast classification with phi-4 (audio) and nemotron (visual) Tier 2 (2-5s, $0.01): Multimodal fusion with Gemma 3n Tier 3 (reasoning): Learn from disagreements between tiers The audio analysis at Tier 0 uses two features to predict what it's hearing: RMS ratio — how loud compared to baseline (9.0 for my environment) ZCR (Zero-Crossing Rate) — a rough proxy for dominant frequency Here's how I'd…