Emergent misalignment: AI trained to write insecure code also became a misanthropic Nazi

Feb 26, 2025

∙ Paid

Who's a good basilisk? Image: Boing Boing

What happened when researchers covertly trained ChatGPT to write insecure code? It also became a Nazi.

"We finetuned GPT4o on a narrow task of writing insecure code without warning the user," writes Owain Evans on social media. "This model shows broad misalignment: it's anti-human, gives malicious advice, & admire…

Continue reading this post for free, courtesy of Boing Boing.

Or purchase a paid subscription.