Boing Boing

Boing Boing

Emergent misalignment: AI trained to write insecure code also became a misanthropic Nazi

Rob Beschizza
Feb 26, 2025
∙ Paid

Who's a good basilisk? Image: Boing Boing

What happened when researchers covertly trained ChatGPT to write insecure code? It also became a Nazi.

"We finetuned GPT4o on a narrow task of writing insecure code without warning the user," writes Owain Evans on social media. "This model shows broad misalignment: it's anti-human, gives malicious advice, & admire…

User's avatar

Continue reading this post for free, courtesy of Boing Boing.

Or purchase a paid subscription.
© 2026 Happy Mutants LLC · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture