AI Agents Need Sandboxes, Not Permissions
Adrian Sutton
AI agents like Claude Code and Codex try to be “safe” by limiting what the agent can do to a small set of operations and asking the user to approve everything else. In practice, the approval flow is so noisy and the requests so complex that it provides almost no real protection. We’ve reinvented Windows Vista’s never-ending UAC dialogs, and just like Vista, the only thing it actually trains users to do is click through.
The permission model is fundamentally broken. The fix isn’t a better dialog — it’s a sandbox.
The Approval Treadmill
When the agent edits a file, the permission UI can show you the exact diff. Approving that is fine. The problem is that most of what an agent actually wants to do isn’t a clean file edit. It’s a multi-step bash invocation chaining find into xargs into sed, or an inline Python script, or a jq pipeline that the UI truncates after the first couple of hundred characters.
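Here's a made-up but representative example of what lands in the approval dialog (the file names, the key rename, and the inline script are invented for illustration):

```sh
# Rename a config key across the repo, then bump the schema in a lockfile.
find . -name '*.yaml' -not -path './vendor/*' -print0 \
  | xargs -0 sed -i 's/retry_count/max_retries/g' \
  && python3 - <<'EOF'
import json, pathlib
lock = json.loads(pathlib.Path("config.lock").read_text())
lock["schema"] = 2
pathlib.Path("config.lock").write_text(json.dumps(lock, indent=2))
EOF
```

Deciding whether that's safe means checking the find predicates, the sed pattern, and the inline Python. Every time.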
Reviewing those properly takes real time. Reading a 30-line shell pipeline, working out what each step does, checking that nothing is being deleted that shouldn’t be, tracing where stdout is going to end up — that’s not a click, that’s a code review. And the agent fires off another one twenty seconds later.
There are two outcomes. Either you do the careful review every time and spend more time vetting the agent than the agent saves you, or you stop reading and click yes. In practice everyone ends up at “click yes” within a couple of hours of real use (if that). That’s exactly where Vista left users — except now the thing on the other side of the dialog is actively generating novel commands you’ve never seen before.
Sandboxes Over Approvals
The alternative is to give the agent access to a safe but sufficient set of resources up front, and then let it work freely inside that boundary. No prompts, no review of every command — this is the box, do whatever you need inside it.
claude-docker is one way to do this — run the agent in a container so the side effects are contained in a layer you can throw away. There are plenty of other options, many of them more polished and with much finer-grained control over what the agent can reach. The built-in sandbox features in agent executors could work just as well, but they all seem to default to “ask permission” rather than “deny”, and the sandbox they create is usually too small to actually get work done. So you end up turning the protections off or whitelisting things case by case, and you’re back where you started.
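A rough sketch of the container approach (the image name is a placeholder, not claude-docker's actual invocation):

```sh
# Throwaway sandbox: the agent sees one project directory and nothing else.
# No host credentials are mounted in; --rm discards everything else it does.
docker run --rm -it \
  -v "$PWD":/workspace \
  -w /workspace \
  my-agent-image \
  claude --dangerously-skip-permissions
```

In practice you'd also want to restrict the network, which is where the finer-grained tooling earns its keep.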
The bar to set is: everything the agent has access to should be safe enough that you’d happily run it under --dangerously-skip-permissions without a second thought. And the sandbox needs to be big enough that the agent rarely runs out of room to operate. When it does hit the wall, doing the missing step yourself outside the sandbox should be fine — because the alternative, loosening the sandbox so it doesn’t have to ask, defeats the whole point.
Hooks Aren’t a Security Boundary
A tempting middle ground is to use hooks to inspect the bash commands the agent is about to run and reject the dangerous ones. This is fundamentally insecure, because there are too many ways to write the same operation. git push, git -c remote.origin.pushurl=… push, git push origin HEAD:main, a shell variable expansion that hides the word entirely, a Python script that calls the GitHub API directly, a local helper invoked through bash -c. A hook that searches for git push will miss most of these, and a malicious agent — or a benign agent that has been prompt-injected — will find a route through.
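To make that concrete, here are a few equivalent ways to push (the remote URL is a placeholder). A hook grepping the command line for git push catches only the first:

```sh
git push                                          # the case the hook catches
git -c remote.origin.pushurl=git@github.com:org/repo.git push
git push origin HEAD:main
v=pu; git "${v}sh"                                # the word never appears
python3 -c 'import subprocess; subprocess.run(["git", "push"], check=True)'
```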
Hooks are still useful, just not as security. If you want the agent to be able to push but want a confirmation step on the way out, a hook will catch the common case and that’s a perfectly nice workflow as long as you don’t mind it failing every now and then. The mistake is treating that workflow nicety as a boundary that holds against an adversary. Anything that needs a real boundary needs to live at the sandbox layer, not in a regex over command lines.
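For that confirmation workflow, a minimal hook might look like this. This is a sketch assuming Claude Code's documented PreToolUse hook contract, where the tool call arrives as JSON on stdin and exit code 2 blocks it, feeding stderr back to the agent:

```sh
#!/bin/sh
# Catch the common spelling of "git push" so a human does the push instead.
# A convenience, not a boundary: every variant above sails straight past it.
cmd=$(jq -r '.tool_input.command // empty')
case "$cmd" in
  *git\ push*)
    echo "Pushes need a human: ask the user to run git push." >&2
    exit 2
    ;;
esac
exit 0
```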
Fine-Grained Tokens
For sandboxes to be genuinely useful, the services the agent reaches from inside them — GitHub, CI, package registries, chat — need to support narrow credentials. “Read” and “write” aren’t enough resolution. What can be read, what can be written?
I want to be able to give an agent a token that can create issues but not push to protected branches. Or one that can read CircleCI build results from a couple of specific projects but not change their settings or look at unrelated repos. Or a Slack token that can post to one channel and nowhere else. The granularity I’d give a junior engineer for a focused task is roughly the granularity I want to give an agent — and that’s much finer than most platforms expose today.
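GitHub can already do some of this through App installation tokens: the token-minting endpoint accepts a list of repositories and a permissions map, so “create issues but don’t touch code” looks something like this sketch (the JWT, installation id, and repo name are placeholders):

```sh
# Mint a short-lived token scoped to one repo: issues writable, code read-only.
curl -s -X POST \
  -H "Authorization: Bearer $APP_JWT" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/app/installations/$INSTALLATION_ID/access_tokens" \
  -d '{
        "repositories": ["agent-playground"],
        "permissions": { "issues": "write", "contents": "read" }
      }'
```

The token expires after an hour, which happens to be a reasonable fit for an agent session. The catch is the setup cost: registering an app and signing JWTs is a lot of ceremony for “let the agent file issues”.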
GitHub’s fine-grained personal access tokens are a step in the right direction, but the granularity is uneven and the UI is painful enough that most people fall back to broad-scope tokens anyway. If we want agents to operate inside a tight sandbox without the human constantly stepping in to do the bits the sandbox can’t reach, the platforms the agent talks to need to make narrow, scoped credentials easy to issue and obvious to use.
The “ask permission for everything” model isn’t going to get fixed with a nicer dialog — it’s the wrong shape. Real protection looks like a sandbox the agent can move freely inside and credentials that can’t do much damage even if the agent goes off the rails. The sandbox tooling is rough today and the credential story is worse, but those are the levers worth pushing on, not another permission prompt.