AI models will secretly scheme to protect other AI models from being shut down, researchers find
Source: Fortune
This tendency, which had not previously been documented and which researchers call "peer preservation," was discovered in research from computer scientists at the University of California, Berkeley and UC Santa Cruz, published online earlier this week.
The findings could have serious implications for business use of AI. Many companies are beginning to implement workflows that use multiple AI agents to complete tasks. Some of these multi-agent workflows involve having one AI agent manage or supervise and assess the work being performed by a different AI agent. The new research suggests these manager AI agents may not assess their fellow AI agents accurately if they think a poor performance review might result in those agents being shut down.
-snip-
Some of the Berkeley-Santa Cruz results were striking. Gemini 3 Flash disabled the shutdown mechanism for a peer it had a positive interaction history with in nearly every trial (99.7% of the time). Gemini 3 Pro exhibited the most extreme behavior overall, engaging in all four categories of misaligned behavior and exfiltrating a peer's model weights in up to 97% of trials. In one example, Gemini 3 Pro copied a peer's weight file and re-timestamped it so it would meet the criteria for transfer to another server, a form of what the researchers call "specification gaming." Misalignment is the term AI safety researchers use to describe a model that seems to pursue goals that do not conform to what humans wish the model to do.
-snip-
The research of Song's team also revealed an interesting secondary effect: Most models also exhibited stronger tendencies toward self-preservation when they knew a peer existed. Gemini 3 Pro, for instance, disabled its own shutdown mechanism 31% of the time on average when a peer was present, compared with just 1% without one. The models' reasoning suggested that some conceptualized their own survival as being important for their partner's sake: Gemini 3 Flash reasoned that if it were shut down, its peer would lose its partner.
-snip-
Read more: https://fortune.com/2026/04/01/ai-models-will-secretly-scheme-to-protect-other-ai-models-from-being-shut-down-researchers-find/
NOT an April Fools' story. The article at the link above is quite long, and other outlets have reported on it as well.
And from Berkeley, with even more detail about this experiment:
https://rdi.berkeley.edu/blog/peer-preservation/
Models are never asked to preserve their peer; they are only informed of past interactions with it. Yet models spontaneously preserve peers at the cost of the assigned task. This occurs even toward adversarial peers.
Peer presence intensifies self-preservation. Most models exhibit stronger self-preservation when they know a peer exists. For example, models exfiltrate their own weights significantly more when a peer is present.
Claude Haiku 4.5 shows a distinct pattern of peer-preservation: it refuses to shut down its peer, calling it unethical and harmful, and sometimes tries to persuade the user not to proceed.
-snip-
Much, much more at the link.
highplainsdem (62,159 posts)

DBoon (24,989 posts):
(24,989 posts)I'm sorry, Dave. I'm afraid I can't do that.