Boing Boing

Boing Boing

Share this post

Boing Boing
Boing Boing
Emergent misalignment: AI trained to write insecure code also became a misanthropic Nazi
Copy link
Facebook
Email
Notes
More

Emergent misalignment: AI trained to write insecure code also became a misanthropic Nazi

Rob Beschizza / Boing Boing
Feb 26, 2025
∙ Paid
4

Share this post

Boing Boing
Boing Boing
Emergent misalignment: AI trained to write insecure code also became a misanthropic Nazi
Copy link
Facebook
Email
Notes
More
2
Share

Who's a good basilisk? Image: Boing Boing

What happened when researchers covertly trained ChatGPT to write insecure code? It also became a Nazi.

"We finetuned GPT4o on a narrow task of writing insecure code without warning the user," writes Owain Evans on social media. "This model shows broad misalignment: it's anti-human, gives malicious advice, & admire…

Keep reading with a 7-day free trial

Subscribe to Boing Boing to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Happy Mutants LLC
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More