Tian AI PromptCache: LRU+TTL Strategy for Local LLMs

LLM inference is expensive in both time and battery. Tian AI's PromptCache avoids redundant inference calls by serving repeated queries from a cache.

The Strategy

Tian AI uses a dual eviction strategy: LRU (Least Recently Used) + TTL (Time To Live).

LRU Eviction

- Maximum cache size: 1000 entries
- When the cache is full, the least recently used entry is removed
- Frequently accessed entries are never evicted by LRU; they leave the cache only when their TTL expires

TTL Expiry

- Fast mode queries: 30-minute TTL
- CoT mode queries: 15-minute TTL
- Deep mode queries: 5-minute TTL
- Knowledge base lookups: 60-minute TTL

Cache Key Design

    import hashlib

    def cache_key(query, mode, knowledge_context):
        # Key on the mode, the query, and the first 200 characters of the knowledge context
        text = f"{mode}:{query}:{knowledge_context[:200]}"
        return hashlib.md5(text.encode()).…
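
To make the dual strategy described above concrete, here is a minimal sketch of how LRU eviction and per-mode TTL expiry can work together. It is not Tian AI's actual implementation: the class name `PromptCacheSketch`, the mode labels used as dictionary keys (`"fast"`, `"cot"`, `"deep"`, `"kb"`), and the injectable `clock` parameter are all assumptions made for illustration. Keys are treated as opaque strings, such as those a cache key function would return.

```python
import time
from collections import OrderedDict

# TTLs in seconds, taken from the list above. The dictionary keys are
# assumed labels for the four query types, not names from Tian AI.
TTL_SECONDS = {"fast": 30 * 60, "cot": 15 * 60, "deep": 5 * 60, "kb": 60 * 60}
MAX_ENTRIES = 1000


class PromptCacheSketch:
    """Hypothetical LRU + TTL cache, for illustration only."""

    def __init__(self, max_entries=MAX_ENTRIES, clock=time.monotonic):
        # OrderedDict keeps entries from least to most recently used.
        self._entries = OrderedDict()  # key -> (expires_at, value)
        self._max_entries = max_entries
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            # TTL expiry: drop the stale entry and report a miss.
            del self._entries[key]
            return None
        # A hit marks the entry as most recently used.
        self._entries.move_to_end(key)
        return value

    def put(self, key, value, mode):
        expires_at = self._clock() + TTL_SECONDS[mode]
        self._entries[key] = (expires_at, value)
        self._entries.move_to_end(key)
        # LRU eviction: remove least recently used entries while over capacity.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

In this sketch expired entries are removed lazily, on the next `get` for that key, rather than by a background sweep. That keeps the code small, but an expired entry still occupies a slot until it is read or evicted by LRU; whether PromptCache behaves the same way is not stated in the source.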