Desktop app to generate LLM fine-tuning datasets — got +16pp on HumanEval

1 / 3

Desktop app to generate LLM fine-tuning datasets — got +16pp on HumanEval

DEV Community·Radosław·about 1 month ago

#zknJnpny

#ai #vibecoding #python #model #fine #judge

Reading 0:00

15s threshold

I'm not a professional developer. I learned by doing — vibe-coding with AI assistance — and a few months ago I wanted to fine-tune Qwen2.5-Coder-7B on my own data. The problem: there's no good way to generate a quality dataset without writing custom scripts every time, and existing tools are either CLI-heavy or built for researchers, not curious tinkerers. So I built one. It actually worked: my fine-tuned model went from 55.5% to 72.3% on HumanEval (5 runs averaged, Q4_K_M GGUF via Ollama). Here's what I built, what I learned, and what didn't work in this finetune example. What it is A no-code desktop app (Linux, Windows) that automates the full dataset generation pipeline — topic planning, multi-turn example generation, quality scoring via LLM Judge, deduplication, and HuggingFace Hub upload. Pick categories, set proportions, click Generate, get a ready-to-train JSONL. Under the hood it runs a three-stage engine: topics → outlines → examples.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Desktop app to generate LLM fine-tuning datasets — got +16pp on HumanEval