# SQLite at Scale: Million-Entry Knowledge Base

Tian AI demonstrates that you don't need a cloud database to build a powerful knowledge base. Using SQLite with FTS5 full-text search and custom optimizations, we achieve retrieval in under 0.05 seconds across a million entries, all on your phone.

## Data Generation Strategy

```python
import json

# Synthetic data generation with realistic patterns.
# generate_title, generate_content and random_tags are project helpers (not shown).
entries = []
for i in range(1_000_000):
    title = generate_title(i)
    content = generate_content(i)
    tags = random_tags(2, 5)
    entries.append((title, content, json.dumps(tags)))
```

The key insight: batch insert 10,000 rows at a time with `executemany()`, wrapping each batch in an explicit transaction. Outside an explicit transaction, SQLite commits every statement on its own, paying for a journal write and disk sync per row; grouping rows amortizes that cost. This reduces the overhead from roughly 10 ms per insert to 0.02 ms per insert.

## FTS5 with Chinese Text Segmentation

Chinese text has no spaces between words, which makes full-text search challenging. The solution uses jieba for tokenization:

```python
import jieba

def chinese_fts5_tokenize(text):
    words = jieba.…
```
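The batching described under Data Generation Strategy can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not the project's code: the `entries` table schema and the `make_entries()` generator (a stand-in for the article's `generate_title`, `generate_content` and `random_tags` helpers) are hypothetical. It assumes the connection is opened with `isolation_level=None`, so the explicit `BEGIN`/`COMMIT` statements are the only transaction boundaries.

```python
import json
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, str]  # (title, content, tags_json)
BATCH_SIZE = 10_000  # batch size quoted in the article


def make_entries(n: int) -> Iterator[Row]:
    """Hypothetical stand-in for generate_title / generate_content / random_tags."""
    for i in range(n):
        tags = [f"tag{i % 7}", f"tag{i % 11}"]
        yield (f"Entry {i}", f"Synthetic content for entry {i}", json.dumps(tags))


def bulk_insert(conn: sqlite3.Connection, rows: Iterable[Row],
                batch_size: int = BATCH_SIZE) -> int:
    """Insert rows with executemany(), one explicit transaction per batch.

    Assumes conn was opened with isolation_level=None (autocommit mode).
    Returns the number of rows inserted.
    """
    it = iter(rows)
    total = 0
    while True:
        batch = list(islice(it, batch_size))  # pull the next batch lazily
        if not batch:
            break
        conn.execute("BEGIN")
        try:
            conn.executemany(
                "INSERT INTO entries (title, content, tags) VALUES (?, ?, ?)", batch
            )
        except Exception:
            conn.execute("ROLLBACK")  # discard the partial batch
            raise
        conn.execute("COMMIT")
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, content TEXT, tags TEXT)"
    )
    print(bulk_insert(conn, make_entries(25_000)))  # 25000 (batches of 10k, 10k, 5k)
    print(conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0])  # 25000
```

Streaming rows from a generator through `islice` keeps only one 10,000-row batch in memory at a time instead of the full million-entry list, which may matter on a phone.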
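The article's `chinese_fts5_tokenize` is cut off above, so the following is only a sketch of one common approach, not the author's implementation: segment text with jieba, join the words with spaces so FTS5's default `unicode61` tokenizer sees word boundaries, and segment queries the same way. The `docs_fts` table and helper names are hypothetical. Indexing uses jieba's search-engine mode (`lcut_for_search`), whose output includes every word precise mode (`lcut`) would produce plus shorter sub-words, so precise-mode query terms can be found in the index. The character-level fallback used when jieba is not installed is an added assumption, not part of the article.

```python
import re
import sqlite3
from typing import List

try:
    import jieba  # third-party: pip install jieba

    def segment_for_index(text: str) -> List[str]:
        # Search-engine mode also emits sub-words (e.g. 数据 inside 数据库).
        return jieba.lcut_for_search(text)

    def segment_for_query(text: str) -> List[str]:
        return jieba.lcut(text)  # precise mode for queries

except ImportError:
    # Assumed fallback, not from the article: one token per CJK character,
    # ASCII letters/digits kept as whole words.
    _FALLBACK = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")

    def segment_for_index(text: str) -> List[str]:
        return _FALLBACK.findall(text)

    segment_for_query = segment_for_index

_WORDLIKE = re.compile(r"[^\W_]")  # token must contain a letter or digit


def _clean(tokens: List[str]) -> List[str]:
    # Drop whitespace/punctuation tokens that unicode61 would discard anyway.
    return [t for t in tokens if _WORDLIKE.search(t)]


def create_index(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs_fts USING fts5(title, body)")


def index_doc(conn: sqlite3.Connection, title: str, body: str) -> None:
    # Store space-separated segments; keep the original text elsewhere for display.
    conn.execute(
        "INSERT INTO docs_fts (title, body) VALUES (?, ?)",
        (" ".join(_clean(segment_for_index(title))),
         " ".join(_clean(segment_for_index(body)))),
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[int]:
    terms = _clean(segment_for_query(query))
    if not terms:
        return []
    # Quote each term so FTS5 operators in user input are matched literally;
    # space-separated terms are implicitly ANDed.
    match = " ".join('"' + t.replace('"', '""') + '"' for t in terms)
    rows = conn.execute(
        "SELECT rowid FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank LIMIT ?",
        (match, limit),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_index(conn)
    index_doc(conn, "SQLite 简介", "SQLite 是一个嵌入式数据库")
    index_doc(conn, "天气", "今天天气很好")
    print(search(conn, "数据库"))
```

`ORDER BY rank` uses FTS5's built-in BM25 ranking, and `LIMIT` keeps result sets small, both of which help keep lookups fast on a large table.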
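To check the per-insert figures quoted in the Data Generation Strategy section (roughly 10 ms versus 0.02 ms) on your own device, a small timing harness can compare autocommit inserts with one batched transaction. This is a sketch, not the article's benchmark: the schema, row contents and row count are arbitrary assumptions, and absolute numbers depend heavily on storage hardware and SQLite's `synchronous` setting.

```python
import os
import sqlite3
import tempfile
import time
from typing import List, Tuple

Row = Tuple[str, str, str]
SQL = "INSERT INTO entries (title, content, tags) VALUES (?, ?, ?)"


def _fresh_db(path: str) -> sqlite3.Connection:
    # isolation_level=None: each statement commits on its own unless we BEGIN.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, content TEXT, tags TEXT)"
    )
    return conn


def time_inserts(path: str, rows: List[Row], batched: bool) -> Tuple[float, int]:
    """Return (average ms per row, rows stored) for one insertion strategy."""
    conn = _fresh_db(path)
    start = time.perf_counter()
    if batched:
        conn.execute("BEGIN")
        conn.executemany(SQL, rows)
        conn.execute("COMMIT")
    else:
        for row in rows:
            conn.execute(SQL, row)  # one transaction, with its own journal write, per row
    elapsed = time.perf_counter() - start
    stored = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    conn.close()
    return elapsed * 1000 / len(rows), stored


if __name__ == "__main__":
    rows = [(f"title {i}", f"content {i}", "[]") for i in range(1_000)]
    with tempfile.TemporaryDirectory() as d:
        for batched in (False, True):
            ms, stored = time_inserts(os.path.join(d, f"bench_{batched}.db"), rows, batched)
            print(f"batched={batched}: {ms:.4f} ms/row, {stored} rows")
```

Each strategy writes to its own fresh on-disk file, since an in-memory database would hide the per-commit sync cost that batching is meant to avoid.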