A security framework developed by Meta stating that AI agents should satisfy no more than two of the following three properties within a session: (A) processing untrustworthy inputs, (B) accessing sensitive systems or private data, and (C) changing state or communicating externally. Building on Simon Willison's Lethal Trifecta, the Rule of Two extends protection beyond data exfiltration to cover any state-changing action an agent might take, such as issuing refunds, modifying files, sending messages, or executing code. If a task requires all three properties, the agent should not operate autonomously; it must include human-in-the-loop approval or equivalent supervision. The Rule of Two reflects the current consensus that prompt injection cannot be reliably detected or filtered, making architectural constraints the most practical defense for agentic AI systems.
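The rule can be sketched as a simple gating check. This is an illustrative sketch, not part of any published framework; the class and function names are hypothetical, chosen to mirror the three properties above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionCapabilities:
    untrusted_inputs: bool   # (A) processes untrustworthy inputs
    sensitive_access: bool   # (B) accesses sensitive systems or private data
    external_effects: bool   # (C) changes state or communicates externally

def requires_human_approval(caps: SessionCapabilities) -> bool:
    """Rule of Two: autonomous operation is allowed only when at most
    two of the three risk properties hold within a session."""
    risk_count = sum(
        [caps.untrusted_inputs, caps.sensitive_access, caps.external_effects]
    )
    return risk_count >= 3

# An agent that reads inbound email (A), holds database credentials (B),
# and can send replies (C) trips the rule and needs supervision.
print(requires_human_approval(SessionCapabilities(True, True, True)))   # True
# Dropping any one property restores autonomous operation.
print(requires_human_approval(SessionCapabilities(True, False, True)))  # False
```

The point of the check is architectural: rather than trying to detect injected prompts at runtime, the system is configured so that no single session ever holds all three capabilities without a human in the loop.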
See: Agentic AI; Excessive agency; Human-in-the-loop; Least privilege; Lethal Trifecta; Prompt injection