Hooking OPA into Coding Agents
About a year ago I filed a feature request for hooks in Claude Code. I’d been trying to build integrated governance for AI coding agents through filesystem watching, and it wasn’t working well. What I actually wanted was a way to intercept every tool call and enforce policy with OPA/Rego before it executes. Anthropic shipped hooks. We built Cupcake on top of it in ~April ‘25.
Then we tested it with Trail of Bits and delayed any public release. What we arrived at for cybersecurity use cases: this is an early warning system, not a security boundary. An agent with Bash() cannot be constrained to deterministic policies. You can sometimes catch known-bad patterns. You can sometimes flag suspicious behavior. You cannot enumerate every way a shell command can do something you didn’t intend or how a malicious agent might mask behavior.
There are reasons for running policy enforcement on the same machine as the agent: the unique advantage is runtime signals. Scripts that check what branch you’re on. Whether a file is a symlink. What else is happening on the machine. Arbitrary enrichment of the policy decision with real-world state. If you’re not gathering signals, you lose that advantage, and a network gateway is actually a better place to enforce. I’m not seeing the newer startups or open source projects in this space utilizing signals at all.
Other than OPA, it was easy to integrate LLM-as-a-judge which is called the “Watchdog” feature in Cupcake. However, scaling this and running efficiently is operationally complex. Again, best functional as a warning system. It is also theoretically redundant and equally vulnerable to the same threats your originally-aligned agent is.
We released Cupcake quietly in the fall. No marketing. The honest answer is “this helps, and here’s exactly where it stops helping,” and that’s a hard pitch.
Now I’m seeing conference talks and startups gearing up to present “deterministic guardrails for agents via hooks” as their solution. You’re going to need signals & consider when to operate as a gateway vs not, and the solution is still largely incomplete in terms of the challenges agentic security presents.
The path forward for Cupcake/alike is dynamic delegation. Agents that DO start with full access but can’t utilize it until explicitly granted by a human in the loop (more of a UX problem if anything). Permission models that expand and contract based on context, not static allow/deny lists. Beyond this, in a similar way as Ralph, create a flywheel of feedback loops between humans and agents.