RRCM uses group relative policy optimization (GRPO) to learn when to retrieve evidence for LLM-based recommendation. Instead of injecting a fixed context, the framework decides dynamically whether to fetch collaborative signals, item metadata, or both. It outperforms fixed-context baselines.

Key facts
- RRCM uses GRPO to optimize its retrieval policy.
- Collaborative and metadata memories share a unified natural-language interface.
- It outperforms fixed-context LLM recommenders on benchmarks.
- Per-instance decision: recommend directly, retrieve, or both.
- It eliminates handcrafted injection of collaborative filtering signals.

RRCM, introduced in a May 2026 arXiv preprint, addresses a core weakness of LLM-based recommenders. These systems typically stuff all available evidence (collaborative filtering signals, item metadata) into a fixed context window. That wastes capacity on irrelevant data and loses fine-grained cues for hard cases.…
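The core GRPO idea behind a learned retrieval policy can be sketched briefly. The policy samples a group of rollouts for the same request. Each rollout's reward is then standardized against the group's mean and standard deviation, so rollouts that beat their siblings get a positive advantage and no separate value network is needed. This is a generic illustration of group-relative advantages, not RRCM's code. The `Action` names, the binary hit/miss reward, and the use of population standard deviation are all hypothetical choices made for the example.

```python
import math
from enum import Enum
from typing import List, Tuple


class Action(Enum):
    # Hypothetical action space mirroring the per-instance decision:
    # answer directly, or retrieve collaborative signals, metadata, or both.
    RECOMMEND_DIRECTLY = "recommend"
    RETRIEVE_COLLABORATIVE = "collab"
    RETRIEVE_METADATA = "metadata"
    RETRIEVE_BOTH = "both"


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize each rollout's reward against its own group (GRPO-style).

    Uses the population standard deviation (an assumption; implementations vary).
    eps keeps the division finite when every reward in the group is identical.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One group of rollouts for a single user request. Hypothetical reward:
    # 1.0 if the recommendation hit the target item, else 0.0.
    rollouts: List[Tuple[Action, float]] = [
        (Action.RECOMMEND_DIRECTLY, 0.0),
        (Action.RETRIEVE_COLLABORATIVE, 1.0),
        (Action.RETRIEVE_METADATA, 0.0),
        (Action.RETRIEVE_BOTH, 1.0),
    ]
    advantages = group_relative_advantages([r for _, r in rollouts])
    for (action, _), adv in zip(rollouts, advantages):
        print(action.name, round(adv, 3))
```

In a full trainer, these advantages would weight the log-probability of each sampled decision in a clipped policy-gradient objective. Retrieval choices that helped on a given request are reinforced, and ones that did not are discouraged.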