How I built a screen-aware AI assistant in Python – full stack breakdown (PyQt6 + Whisper + Ollama)


DEV Community·Shashank Kumar Singh·27 days ago


#python #ai #showdev #opensource


Three months ago I started building Clicky, a Windows AI assistant that reads your screen and answers out loud. Here's the full technical breakdown of every piece.

TL;DR: PyQt6 system tray → Ctrl+Alt+Space hotkey → screenshot + Whisper STT → Ollama/OpenAI/Claude → edge-tts speaks the answer back. Open source, free, no API key needed.

Architecture overview

    User presses Ctrl+Alt+Space
        ↓
    GlobalHotkey listener (pynput)
        ↓
    Screenshot all monitors (mss)
        ↓
    Whisper.cpp transcribes audio
        ↓
    CompanionManager routes to AI provider
        ↓
    Ollama (local) / OpenAI / Claude / Copilot
        ↓
    edge-tts speaks answer + arrow overlay on screen

1. System tray + hotkey (PyQt6 + pynput)

The app lives in the system tray: no window, zero friction.

    from pynput import keyboard
    from PyQt6.QtCore import QMetaObject, Qt

    def on_activate():
        # pynput runs this callback on its own listener thread, so the call is
        # queued onto the Qt thread that owns `companion` (created elsewhere in the app).
        # PyQt6 requires the fully scoped enum name, Qt.ConnectionType.QueuedConnection.
        QMetaObject.invokeMethod(companion, "start_listening", Qt.ConnectionType.QueuedConnection)

    hotkey = keyboard.GlobalHotKeys({'<ctrl>+<alt>+<space>': on_activate})
    hotkey.…
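To make the flow in the architecture overview concrete, here is a minimal sketch of how one hotkey press could move through the stages: capture, transcribe, route to a provider, speak. It is not Clicky's actual code. `Request`, `ProviderRouter`, and `handle_hotkey` are hypothetical names, and the capture, transcription, and speech stages are passed in as plain callables so the sketch runs without mss, Whisper, or edge-tts installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical illustration only: none of these names come from Clicky itself.


@dataclass
class Request:
    screenshots: List[bytes]  # one encoded image per monitor
    question: str             # text transcribed from the user's speech


# A provider takes a request and returns the answer text.
Provider = Callable[[Request], str]


class ProviderRouter:
    """Sends each request to whichever provider is currently active."""

    def __init__(self, providers: Dict[str, Provider], active: str) -> None:
        if active not in providers:
            raise ValueError(f"unknown provider: {active!r}")
        self.providers = providers
        self.active = active

    def ask(self, request: Request) -> str:
        return self.providers[self.active](request)


def handle_hotkey(
    capture: Callable[[], List[bytes]],
    transcribe: Callable[[], str],
    router: ProviderRouter,
    speak: Callable[[str], None],
) -> str:
    """Runs one press end to end: capture, transcribe, ask, speak."""
    request = Request(screenshots=capture(), question=transcribe())
    answer = router.ask(request)
    speak(answer)
    return answer


if __name__ == "__main__":
    spoken: List[str] = []
    router = ProviderRouter(
        {"local": lambda req: f"You asked: {req.question}"},
        active="local",
    )
    handle_hotkey(
        capture=lambda: [b"fake-image-bytes"],
        transcribe=lambda: "what is on my screen?",
        router=router,
        speak=spoken.append,
    )
    print(spoken)
```

Keeping each stage behind a callable means the real implementations (mss for capture, Whisper for transcription, edge-tts for speech) could be swapped in or stubbed out for testing without changing the orchestration function.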
