#Reward

28 posts

Feed·20 of 28 posts
LLM-as-judge variance broke our DPO training signal for 3 weeks

DEV Community: pytorch·Marcus Chen·3 days ago
#MrTmMApK
#dev #judge #model #pairs #reward #three

TL;DR: Our DPO pipeline used a single LLM as the preference judge. Training reward climbed every run....

When uncertainty spikes, chasing rewards backfires and a more informed strategy pulls ahead

phys.org·Ingrid Fadelli·21 days ago
#Gt5LXde6

Humans and other animals are constantly required to make decisions under uncertain conditions or while in rapidly changing environments. Past psychology and biology studies showed that some decision-making strategies can be more effective than others in…

Your Next Top-Up Might Cost Zero: The Giveaway Post I Built for Yahya's Diamond Drop

DEV Community·Andeee Owen·24 days ago
#aE4PzdJa
#your #ai #quest #proof #giveaway #first

From Dev.to - ai: Your Next Top-Up Might Cost Zero: The Giveaway Post I Built for Yahya's Diamond Drop

I is not singular — Multi-Agent Simulation with Cognitive Architecture on a Single 8GB GPU

DEV Community·as1as·about 1 month ago
#5YRhL6JW

"I is not singular" — qwen3:8b + per-agent LoRA + unconscious baseline + 4-module "God" LLM. (Yes, "is" is intentional.)

Why "Passive Scrolling" is dying and how "Incentivized Engagement" is changing social media retention in 2026

Reddit r/socialmedia·u/Next_Albatross2020·about 1 month ago
#klcQ98P0

Hey fellow social managers, I’ve been diving deep into retention data lately, and there’s a massive shift happening that we should probably talk about. For years, we’ve relied on pure entertainment (TikTok style) to keep users on app.…

Just finished Hades 2, and my god, what a game.

Reddit r/xbox·u/BeautifulUnlikely225·about 1 month ago
#VgFIKWW3

This game is just a masterpiece from top to bottom. Very tight controls that reward skilled players. An extremely deep pool of game mechanics that reward lots of different play styles each in their own ways.…

‘Barking up wrong tree’: Madras High Court rejects plea seeking Rs 1 lakh reward for luxury car smuggling tip-off

The Indian Express·Vineet Upadhyay·about 1 month ago
#D3ADuDs3

The petitioner claimed before the Madras High Court that the information he had shared led to the seizure of 50 vehicles during 2012-14, involving alleged customs duty evasion of Rs 48.5 crore.
