Tian AI PromptCache: LRU+TTL Strategy for Local LLMs

DEV Community·Jeffrey.Feillp·about 1 month ago

#performance #ai #python #software #cache #self

LLM inference is expensive, both in time and in battery. Tian AI's PromptCache avoids unnecessary inference calls by serving repeated prompts from a cache.

The Strategy

Tian AI uses a dual eviction strategy: LRU (Least Recently Used) + TTL (Time To Live).

LRU Eviction

- Maximum cache size: 1000 entries
- When the cache is full, the least recently used entry is removed
- Frequently accessed entries are never evicted by LRU; they leave the cache only when their TTL expires

TTL Expiry

- Fast mode queries: 30 minute TTL
- CoT mode queries: 15 minute TTL
- Deep mode queries: 5 minute TTL
- Knowledge base lookups: 60 minute TTL

Cache Key Design

    import hashlib

    def cache_key(query, mode, knowledge_context):
        # Only the first 200 characters of the knowledge context go into the key
        text = f"{mode}:{query}:{knowledge_context[:200]}"
        return hashlib.md5(text.encode()).…
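The excerpt describes the eviction rules but does not show the cache itself, so the following is only a minimal sketch of how an LRU+TTL cache with those limits could look, not Tian AI's actual implementation. The class layout, the mode names used as dictionary keys ("fast", "cot", "deep", "knowledge"), the injectable `clock` parameter, and the use of `hexdigest()` to finish the truncated key function are all assumptions made for illustration. The sketch also assumes that a read refreshes an entry's LRU position but not its TTL.

```python
import hashlib
import time
from collections import OrderedDict

# TTLs from the article, in seconds. The mode names are assumed labels.
TTL_SECONDS = {
    "fast": 30 * 60,
    "cot": 15 * 60,
    "deep": 5 * 60,
    "knowledge": 60 * 60,
}


def cache_key(query, mode, knowledge_context=""):
    # Same key text as the article; hexdigest() is an assumption because
    # the published snippet is cut off after md5(...).
    text = f"{mode}:{query}:{knowledge_context[:200]}"
    return hashlib.md5(text.encode()).hexdigest()


class PromptCache:
    """Hypothetical LRU+TTL cache: at most max_size entries, per-mode expiry."""

    def __init__(self, max_size=1000, clock=time.monotonic):
        self.max_size = max_size
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (expires_at, value), oldest use first

    def get(self, query, mode, knowledge_context=""):
        key = cache_key(query, mode, knowledge_context)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # TTL expiry: drop the stale entry and report a miss.
            del self._entries[key]
            return None
        # LRU bookkeeping: a hit makes this entry the most recently used.
        self._entries.move_to_end(key)
        return value

    def put(self, query, mode, value, knowledge_context=""):
        key = cache_key(query, mode, knowledge_context)
        self._entries[key] = (self.clock() + TTL_SECONDS[mode], value)
        self._entries.move_to_end(key)
        # LRU eviction: remove least recently used entries while over capacity.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)
```

A caller would try `cache.get(query, mode, context)` first and run the model only on a `None` result, then store the answer with `cache.put(...)`. Because the mode is part of the key, the same query asked in Fast mode and in Deep mode is cached as two separate entries with different lifetimes.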
