Microsoft researchers find AI models and agents can't handle long-running tasks


theregister·Thomas Claburn·21 days ago

AI + ML

An intern who failed this much would be shown the door

Companies exploring automated workflows would be well advised to keep their AI agents on a short leash.

Microsoft researchers have found that even the priciest frontier models introduce errors in long workflows, the very thing for which AI software has been pitched.

Anthropic, for example, says, "Claude Cowork handles tasks autonomously. Give it a goal and Claude works on your computer, local files, and applications to return a finished deliverable."

Redmond promotes similar usage, touting Microsoft 365 Copilot's ability to "Tackle complex, multistep research across your work data and the web."

The Windows maker's scientists aren't so sure about that. Philippe Laban, Tobias Schnabel, and Jennifer Neville from Microsoft Research set out to study what happens when large language models (LLMs) are asked to complete multistep tasks.…
