
Claude Mythos Preview Doubles METR Time Horizon at 80% Success

DEV Community·gentic news·23 days ago

Claude Mythos Preview snapshot achieves 2x METR time horizon over next best model at 80% success rate, per Anthropic. Absolute numbers undisclosed.

Anthropic's Alex Albert posted that an early Claude Mythos Preview snapshot achieved more than 2x the time horizon of the next best model on METR's 80% success rate benchmark. The claim positions Claude Mythos as a step change in autonomous agent capability.

Key facts
2x time horizon over next best model at 80% success rate
METR measures autonomous agent time horizon
80% success rate is the evaluation threshold
Snapshot may differ from final Claude Mythos Preview
Anthropic has not disclosed absolute time horizon values

Anthropic's Alex Albert tweeted that an early Claude Mythos Preview snapshot delivered a time horizon "more than 2x the next best model" on METR's 80% success rate benchmark [According to @alexalbert__].…
