
Synthadoc: Beyond Keyword Search - How It Combines BM25 and Vector Search to Build a Smarter Domain Wiki

DEV Community·Paul Chen·about 1 month ago

What is Synthadoc?

Synthadoc is an open-source, LLM-powered wiki engine. Point it at your organisation's documents (PDFs, PPTX, spreadsheets, DOCX, images, or web pages) and it builds a persistent, structured knowledge base your team can query, audit, and extend over time. Unlike general-purpose RAG pipelines that retrieve raw chunks at query time and discard results afterwards, Synthadoc compiles knowledge at ingest time into a living wiki that grows smarter and more consistent with every new source.

The core lifecycle is:

Ingest: extract and synthesise facts from any source format (PDF, XLSX, PNG, web URL)
Detect: flag contradictions with existing pages and quarantine them for review
Link: connect related pages and surface knowledge gaps
Query: answer questions with hybrid BM25 + optional vector search, citing the pages used
Lint: resolve contradictions and surface orphan pages for human or automated action

Synthadoc is designed for organisations that need domain-specific, auditable knowledge management:…
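To make the Query step more concrete, here is a minimal sketch of hybrid BM25 + optional vector ranking in plain Python. It is not Synthadoc's code: the excerpt does not say how the two rankings are combined, so this sketch assumes reciprocal rank fusion (RRF). The names bm25_scores, toy_embed and hybrid_search are hypothetical, and toy_embed is a hashed character-trigram stand-in for a real embedding model.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately naive)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = (sum(len(t) for t in tokenized) / n) or 1.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text, dim=256):
    """Stand-in for an embedding model: hashed character trigrams."""
    vec = [0.0] * dim
    lowered = text.lower()
    for i in range(len(lowered) - 2):
        vec[zlib.crc32(lowered[i:i + 3].encode()) % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(scores):
    """Document indices ordered best first (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query, docs, embed=None, k=60):
    """Fuse the BM25 ranking with an optional vector ranking using RRF."""
    rankings = [rank(bm25_scores(query, docs))]
    if embed is not None:  # the vector leg is optional
        q = embed(query)
        rankings.append(rank([cosine(q, embed(d)) for d in docs]))
    fused = [0.0] * len(docs)
    for ranking in rankings:
        for position, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + position)
    return rank(fused)


pages = [
    "BM25 ranks pages by keyword overlap",
    "Vector search finds semantically similar pages",
    "Quarterly revenue spreadsheet",
]
keyword_only = hybrid_search("keyword ranking with BM25", pages)
hybrid = hybrid_search("keyword ranking with BM25", pages, embed=toy_embed)
```

Passing embed=None keeps the search purely lexical, which mirrors the "optional" vector leg described above. RRF is used here because it needs only rank positions, so BM25 scores and cosine similarities never have to be put on a common scale; a weighted sum of normalised scores would be an equally plausible design.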
