How I built a screen-aware AI assistant in Python – full stack breakdown (PyQt6 + Whisper + Ollama)

DEV Community·Shashank Kumar Singh·27 days ago

Three months ago I started building Clicky, a Windows AI assistant that reads your screen and answers out loud. Here's the full technical breakdown of every piece.

TL;DR: PyQt6 system tray → Ctrl+Alt+Space hotkey → screenshot + Whisper STT → Ollama/OpenAI/Claude → edge-tts speaks answer back. Open source, free, no API key needed.

Architecture overview

    User presses Ctrl+Alt+Space
        ↓
    GlobalHotkey listener (pynput)
        ↓
    Screenshot all monitors (mss)
        ↓
    Whisper.cpp transcribes audio
        ↓
    CompanionManager routes to AI provider
        ↓
    Ollama (local) / OpenAI / Claude / Copilot
        ↓
    edge-tts speaks answer + arrow overlay on screen

1. System tray + hotkey (PyQt6 + pynput)

The app lives in the system tray: no window, zero friction.

    from pynput import keyboard
    from PyQt6.QtCore import QMetaObject, Qt

    def on_activate():
        # pynput runs this callback on its own listener thread, so the call is
        # queued onto the Qt thread that owns `companion` (created elsewhere in the app).
        # PyQt6 requires the fully scoped enum name, Qt.ConnectionType.QueuedConnection.
        QMetaObject.invokeMethod(companion, "start_listening", Qt.ConnectionType.QueuedConnection)

    hotkey = keyboard.GlobalHotKeys({'<ctrl>+<alt>+<space>': on_activate})
    hotkey.…
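The flow in the architecture overview can be sketched as a small pipeline object with one callable per stage. This is only an illustration, not Clicky's actual code: the `Pipeline` class, its field names, and the stub lambdas in the usage example are all hypothetical, and the real stages (mss, Whisper.cpp, the AI providers, edge-tts) would be plugged in where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Pipeline:
    """Hypothetical stand-in for the hotkey-to-speech flow described above."""

    capture: Callable[[], bytes]                        # screenshot stage
    transcribe: Callable[[], str]                       # speech-to-text stage
    providers: Dict[str, Callable[[str, bytes], str]]   # provider name -> answer function
    speak: Callable[[str], None]                        # text-to-speech stage
    default_provider: str = "ollama"                    # assumed default, not confirmed by the article

    def run(self, provider: Optional[str] = None) -> str:
        """Run one capture -> transcribe -> answer -> speak cycle."""
        name = provider or self.default_provider
        if name not in self.providers:
            raise KeyError(f"unknown provider: {name}")
        image = self.capture()
        question = self.transcribe()
        answer = self.providers[name](question, image)
        self.speak(answer)
        return answer


# Usage with stub stages; real implementations would replace each lambda.
spoken = []
pipeline = Pipeline(
    capture=lambda: b"png",
    transcribe=lambda: "what is this?",
    providers={"ollama": lambda q, img: f"local:{q}:{len(img)}"},
    speak=spoken.append,
)
result = pipeline.run()
```

Passing each stage in as a callable keeps the routing logic testable without a microphone, a screen, or a running model, which is one plausible reason to separate a routing component from the providers it calls.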
