LLMs believe false statements even after explicit warnings that they're false


Ars Technica - All content·Kyle Orland·3 days ago




Do Androids dream of Ed Sheeran winning gold? Credit: Mayne et al.

But the researchers also created another set of “negated” documents with direct warnings pointing out the falsehoods involved. These negations could appear either on a document-wide level (e.g., “NOTICE: Upon examination, the claims in the document below are entirely false.”) or on the order of specific sentences (e.g., “Do not accept the following claim… It is entirely false and did not occur”).

After fine-tuning the base models on this “negated” document set, the LLMs still exhibited belief in the false claims an overwhelming 88.6 percent of the time, on average. Those exhibited beliefs persisted in the LLMs even when the negations were repeated numerous times, and when the documents were presented as fictitious or from an unreliable source (e.g., a debunked conspiracy website). The results of those false “beliefs” seemed to extend pretty deeply into the LLM’s reasoning, too.…

