How Tian AI Builds a Million-Entry Knowledge Base on Your Phone

DEV Community·Jeffrey.Feillp·about 1 month ago
#ai #database #python #software #fullscreen #fts5

SQLite at Scale: Million-Entry Knowledge Base

Tian AI demonstrates that you don't need a cloud database to build a powerful knowledge base. Using SQLite with FTS5 full-text search and custom optimizations, we achieve retrieval in under 0.05 seconds across a million entries, all on your phone.

Data Generation Strategy

    import json

    # Synthetic data generation with realistic patterns.
    # generate_title, generate_content, and random_tags are helpers
    # that are not shown in this excerpt.
    entries = []
    for i in range(1_000_000):
        title = generate_title(i)
        content = generate_content(i)
        tags = random_tags(2, 5)
        # Tags are stored as a JSON-encoded string.
        entries.append((title, content, json.dumps(tags)))

The key insight: insert rows in batches of 10,000 with executemany(), with each batch wrapped in an explicit transaction. This reduces the overhead from ~10 ms per insert to 0.02 ms per insert.

FTS5 with Chinese Text Segmentation

Chinese text doesn't have spaces between words, which makes full-text search challenging. The solution uses jieba for tokenization:

    import jieba

    def chinese_fts5_tokenize(text):
        words = jieba.…
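To make the batching strategy from the Data Generation Strategy section concrete, here is a minimal sketch using only Python's standard sqlite3 module. It is not the author's code: the `entries` table schema, the `batched` and `bulk_insert` helpers, and the placeholder rows are all assumptions made for illustration, and the row count is kept at 25,000 so the example runs quickly.

```python
import json
import sqlite3

# Hypothetical schema; the article does not show the real table definition.
INSERT_SQL = "INSERT INTO entries (title, content, tags) VALUES (?, ?, ?)"


def batched(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_insert(conn, rows, batch_size=10_000):
    """Insert rows batch by batch, one explicit transaction per batch."""
    inserted = 0
    for batch in batched(rows, batch_size):
        conn.execute("BEGIN")
        try:
            conn.executemany(INSERT_SQL, batch)
        except Exception:
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
        inserted += len(batch)
    return inserted


# isolation_level=None disables implicit transactions, so the BEGIN and
# COMMIT statements above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE entries (title TEXT, content TEXT, tags TEXT)")

# Placeholder rows standing in for the generated titles, content, and tags.
rows = (
    (f"title {i}", f"content {i}", json.dumps(["demo"]))
    for i in range(25_000)
)
total = bulk_insert(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(total, count)
```

Committing once per batch rather than once per row is what amortizes the transaction cost. Feeding `bulk_insert` from a generator also avoids holding every row in memory at once, which may matter on a phone.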
