Three months ago I started building Clicky, a Windows AI assistant that reads your screen and answers out loud. Here's the full technical breakdown of every piece.

TL;DR: PyQt6 system tray → Ctrl+Alt+Space hotkey → screenshot + Whisper STT → Ollama/OpenAI/Claude → edge-tts speaks the answer back. Open source, free, no API key needed.

Architecture overview

    User presses Ctrl+Alt+Space
            ↓
    GlobalHotkey listener (pynput)
            ↓
    Screenshot all monitors (mss)
            ↓
    Whisper.cpp transcribes audio
            ↓
    CompanionManager routes to AI provider
            ↓
    Ollama (local) / OpenAI / Claude / Copilot
            ↓
    edge-tts speaks answer + arrow overlay on screen

1. System tray + hotkey (PyQt6 + pynput)

The app lives in the system tray: no window, zero friction.

    from pynput import keyboard
    from PyQt6.QtCore import QMetaObject, Qt

    def on_activate():
        # pynput runs this callback on its own listener thread, so the call is
        # queued onto the thread that owns `companion` (defined elsewhere in the app).
        # PyQt6 requires the fully scoped enum name.
        QMetaObject.invokeMethod(companion, "start_listening", Qt.ConnectionType.QueuedConnection)

    hotkey = keyboard.GlobalHotKeys({'<ctrl>+<alt>+<space>': on_activate})
    hotkey.…
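
The hotkey callback runs on pynput's listener thread, not on the Qt GUI thread, which is why the snippet above posts the call through a queued connection instead of calling the companion directly. Here is a minimal sketch of that handoff using only the standard library. A worker thread stands in for the hotkey listener and a `queue.Queue` stands in for Qt's event queue. The names `fake_hotkey_listener`, `drain_events`, and this `start_listening` are hypothetical illustrations, not Clicky's code.

```python
import queue
import threading

# Stand-in for Qt's event queue: the listener thread only enqueues requests.
events = queue.Queue()


def fake_hotkey_listener(presses):
    """Hypothetical stand-in for the pynput listener thread."""
    for _ in range(presses):
        # Same idea as a queued invokeMethod call: request the work, do not run it here.
        events.put("start_listening")


def drain_events(handlers):
    """Run queued requests on the calling thread; return the thread names that ran them."""
    ran_on = []
    while True:
        try:
            name = events.get_nowait()
        except queue.Empty:
            break
        handlers[name]()
        ran_on.append(threading.current_thread().name)
    return ran_on


calls = []


def start_listening():
    # Record which thread actually executed the handler.
    calls.append(threading.current_thread().name)


listener = threading.Thread(
    target=fake_hotkey_listener, args=(2,), name="hotkey-listener"
)
listener.start()
listener.join()  # a real app keeps its event loop running instead of joining

ran_on = drain_events({"start_listening": start_listening})
print(calls)
```

The handler never runs on the `hotkey-listener` thread. It runs on whichever thread drains the queue, which is the role the Qt event loop plays in the real app.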