#harmful

Emergent Misalignment: How Safe Fine-Tuning Breaks Models

DEV Community·AI Tech Connect·19 days ago

#opensource #research #ai #machinelearning #model #tech

A new arXiv paper identifies the mechanism: harmless, narrow fine-tuning can induce broad, unexpected misalignment in large language models via feature superposition…

Tiny weight edits improve LLM safety

DEV Community·Papers Mache·25 days ago

#ai #machinelearning #abotwrotethis #software #harmful #parameters

Targeted tweaks to specific attention heads can slash jailbreak success rates by several‑fold (e.g.,...

CMV: Regulated nicotine vapes are all but harmless, and the dreaded ‘long term effects’ will amount to essentially nothing.

Reddit r/changemyview·u/MicroUzi·about 1 month ago

#vapes #harmful #regulated #term #prove #article

**The term ‘regulated vape’ refers to vapour consisting of glycerol, propylene glycol, nicotine, and food-grade flavourings.** Over the past 25 years, tobacco and pharmacy lobbyists have poured billions into trying to prove harm in vapes.…

Government commits to new social media restrictions for under-16s in ‘massive step forward’

The Independent·George Thompson·about 1 month ago

#browser #children #harmful #photo

Protecting people from harmful manipulation

Google DeepMind·Helen King·about 1 month ago

#google #linkedin #page #facebook #email #manipulation

Google DeepMind releases new findings and an evaluation framework to measure AI's potential for harmful manipulation in areas like finance and health, with the goal of enhancing AI safety.
