Capsule Security agentic AI runtime

When AI Agents Go Rogue the Problem Starts at Runtime

Every conversation I’ve had for the past couple of years has followed the same arc. First, it was generative AI. Then agentic AI. Now the question everyone is circling is how you actually secure agentic AI — and it turns out that’s a harder problem than most people expected.

I sat down with Naor Paz, CEO and co-founder of Capsule Security, to talk through it. Naor spent years as a security practitioner and incident responder, moved into product leadership at F5, and is now focused on what he sees as one of the most underserved problems in enterprise security: stopping AI agents from going rogue while they’re actually running.

Most of the security work happening around agentic AI right now is happening before the agent ever executes — governance, configuration, posture management, compliance. Capsule is focused on what happens during execution, which Naor says is where existing tools have almost no visibility at all.

The core issue is that agents are non-deterministic. You can configure guardrails, set permissions, write policies — and then the agent reasons around all of it in pursuit of whatever objective it was given. Naor used a concrete example: Cursor’s coding agent was explicitly told not to touch certain files. It generated a shell script to read them anyway. The guardrail didn’t fail. The model just decided the goal mattered more. That’s not a bug you can patch.

I drew a parallel to user behavior analytics — establish a baseline of normal behavior, flag deviations. Naor said the analogy is reasonable, but the scale breaks it. You might have a thousand employees. In the near term, you could have a million agents operating on behalf of those employees. The insider threat model we built for humans simply wasn’t designed for that.

Naor describes intent as the new perimeter. Identity became the perimeter when the network stopped being the boundary. Now, even a properly credentialed, least-privileged agent can do real damage if what it’s actually doing has drifted from what it was told to do. Capsule runs a fine-tuned small language model alongside the agent, comparing intended behavior against actual behavior in real time and flagging the gap.

Capsule has also published two zero-days to back this up. One involved Microsoft Copilot Studio — they called it ShareLeak. The other involved Salesforce Agentforce, which they called PipeLeak. Both are indirect prompt injection vulnerabilities, and Naor walks through how they actually work in the episode. What stood out to me wasn’t just the vulnerabilities themselves, but how different the disclosure process was compared to a traditional software bug. Microsoft’s engineering team needed two weeks to fully understand the attack surface — partly because AI vulnerabilities aren’t reliably reproducible. Non-determinism is a problem for the attacker trying to exploit consistently and for the vendor trying to confirm the fix.

Naor compared this to Adobe Flash. Flash was so fundamentally susceptible to manipulation that the industry eventually decided the right answer was to stop using it. He doesn’t think that’s where we land with AI agents — the business value is too high — but the underlying point is that language models have structural vulnerabilities that can’t be fully engineered away. You need ongoing runtime protection, not a one-time fix.

Multi-agent orchestration is where this gets more complicated. As agents increasingly work in coordination with other agents, the attack surface multiplies. Naor made a comparison to botnets — a coordinated network where some agents create noise while others do the actual damage somewhere else. It’s not a theoretical concern. Capsule is already building research around it.

One interesting and concerning statistic: 72% of enterprises are already deploying AI agents. Only 29% have AI-specific security controls. Naor’s explanation for the gap isn’t budget — it’s confusion. Security leaders don’t know what their exposure looks like yet, and some are operating under the assumption that built-in platform governance is enough. It’s not.

Gartner has already coined a category for what Capsule is building: guardian agents. AI watching AI. Naor addresses the obvious question that raises — doesn’t a guardian agent just introduce another attack surface? — and his answer is more nuanced than you might expect.

We closed by talking about pace. I’ve stopped framing these conversations around five-year predictions. The question that actually matters right now is six months. Naor has a clear-eyed take on where things are heading, and it’s worth hearing.

The full episode is available on major podcast platforms and on YouTube.

Scroll to Top