Training language models to be warm can reduce accuracy and increase sycophancy

Nature·Rocher, Luc·about 1 month ago

Main

Artificial intelligence (AI) developers are expanding beyond the longstanding goal of building large language models (LLMs) that are merely ‘helpful, honest and harmless’ towards building models with warm and friendly personas. For example, OpenAI now trains their models to be ‘empathetic’ and ‘engaging’ [2]; Anthropic builds models to maintain a ‘warm relationship’ with users [3]; and services such as Replika and Character.ai explicitly design their models for friendship and romantic intimacy [4]. This shift towards what is now called ‘character’ or ‘persona’ training has enabled millions to rely on AI systems for advice, therapy and companionship, accelerating the rise of parasocial relationships between humans and AI systems [1, 5, 6]. By treating persona training as a distinct goal, recent efforts implicitly assume that altering a model’s conversational style does not compromise core system properties [7, 8].…
