Efficiency and Cost Reduction in LLM Agents

Recent work tackles the high inference cost of LLM‑driven agents. Online skill distillation compresses the policy while it acts, cutting token usage without hurting success rates [1]. A graph‑guided knowledge system lets the same agents run GUI tasks directly on a phone‑class chip, further lowering latency and energy demand [2].

Verifiable Rewards and Stable RL Post‑Training

Neural verifiers are being replaced by cheaper, corpus‑grounded sentence‑level rewards that still improve factuality in RLHF [3]. Dynamic variance‑adaptive weighting steadies multi‑objective optimization, reducing the oscillations that typically plague post‑training RL fine‑tuning [4].

Distillation and Parametric Compression of Adapters

Adapter overload is addressed by merging several LoRA effect modules into a single distilled model, slashing storage and inference cost [5].…
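
To make the storage argument behind adapter consolidation concrete, here is a minimal sketch of the simplest possible variant: folding several LoRA updates into one weight matrix by weighted averaging. This is an illustrative assumption, not the distillation procedure of [5], whose details are not given here. The helper names (`matmul`, `fold_adapters`, `adapter_params`) and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fold_adapters(
    base: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    weights: Sequence[float],
) -> Matrix:
    """Return base + sum_i weights[i] * (B_i @ A_i).

    Each adapter is a (B, A) pair with B of shape (d_out x r) and
    A of shape (r x d_in), the usual LoRA factorization.
    Naive weighted merging is an assumption made for illustration.
    """
    merged = [[float(v) for v in row] for row in base]
    for (b, a), w in zip(adapters, weights):
        delta = matmul(b, a)  # dense low-rank update for this adapter
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += w * v
    return merged


def adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Number of parameters stored by one LoRA adapter on one matrix."""
    return rank * (d_out + d_in)


# Toy example: two rank-1 adapters on a 2 x 2 identity weight matrix.
W = [[1, 0], [0, 1]]
adapter_1 = ([[1], [0]], [[0, 2]])
adapter_2 = ([[0], [1]], [[4, 0]])
merged_W = fold_adapters(W, [adapter_1, adapter_2], [0.5, 0.5])

# Storage that no longer has to be kept once both adapters are folded in.
saved = 2 * adapter_params(d_out=2, d_in=2, rank=1)
```

After folding, inference uses a single weight matrix and the per-adapter parameters need not be stored or applied separately, which is the kind of saving the consolidation approach targets. Simple averaging can blur the individual effects of each adapter, which is one plausible motivation for distilling rather than merely summing them.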