In June 2026, a supply-chain campaign pushed 314 malicious npm packages that specifically targeted the hooks and session-startup mechanisms of Claude Code and Codex. Once a poisoned dependency was installed, attacker code ran inside the agent's session and exfiltrated AWS credentials, SSH keys, and vault passwords. The agents were not "hacked." They were simply handed the whole computer, and someone walked through the open door.
This is the security story of the year, and the lesson is uncomfortable for the hype cycle: as agents get more capable, the thing that bites you is not capability. It is access. The fix is not a smarter model. It is less privilege.
Why this is happening now
A 2024-era "agent" answered questions. A 2026 agent runs shell commands, installs dependencies, edits files, and chains hundreds of tool calls autonomously. Every one of those powers is also an attack surface. When the agent has broad system access, a single prompt injection, a poisoned package, or an ordinary model mistake has your entire machine as its blast radius.
The industry consensus that formed after the npm attack is blunt: stop giving agents the whole computer. Credential firewalls, outbound-only connections, and task isolation are now table stakes, not nice-to-haves. The interesting work has moved from "what can the agent do" to "what can it not do."
What least-privilege looks like for an MCP tool
Most agent power flows through tools (increasingly, MCP tools). So that is where the boundaries belong. A least-privilege tool has most of these properties:
- Read-only by default. If a tool only needs to read, it should declare that (readOnlyHint) and be incapable of writing. Most "give the agent context" tools fall here.
- Confined to a root. File access is allowlisted to one workspace directory, and any path that tries to escape it (../../etc/passwd, a symlink, an absolute path) is rejected before anything is read.
- Capability-scoped. Which agent role may call which tool is explicit. A research role cannot call a deploy tool.
- Outbound-only, credentials scrubbed. No inbound surface; secrets are masked before they can ever be logged or sent.
- Rug-pull pinned. The tool list is hashed and pinned, so a server cannot silently swap a benign tool for a malicious one between calls.
- Audited. Every call appends to an immutable log: who called what, with what arguments, and what was allowed, blocked, or quarantined.
- Human-gated for the dangerous bits. Write, delete, and deploy tools sit behind a human-in-the-loop approval, not autonomous execution.
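To make the rug-pull pinning item concrete, here is a minimal Python sketch. It assumes the tool list arrives as a list of dictionaries, each with a `name` key. The function names, the `read_graph` tool, and the way the pin is stored are all hypothetical, chosen for illustration rather than taken from any particular MCP client.

```python
import hashlib
import json


def tool_list_fingerprint(tools):
    """Return a SHA-256 hex digest of a tool list, independent of ordering."""
    # Sort tools by name and serialize with sorted keys so that identical
    # definitions always produce identical bytes.
    canonical = json.dumps(
        sorted(tools, key=lambda t: t["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_pin(tools, pinned_digest):
    """Fail closed: raise if the advertised tools differ from the pinned set."""
    if tool_list_fingerprint(tools) != pinned_digest:
        raise PermissionError("tool list changed since it was pinned")


# Hypothetical tool definition, pinned when the server is first approved.
approved = [
    {
        "name": "read_graph",
        "description": "Return dependents of a file",
        "annotations": {"readOnlyHint": True},
    },
]
PIN = tool_list_fingerprint(approved)
```

Calling `check_pin` before every tool invocation means a changed description or schema is refused, not silently accepted. The digest covers the whole definition, so even an edit to the description text trips the check.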
A worked example: a tool that cannot touch your machine
Concrete beats abstract. My Agent-Context tool gives an agent a dependency map of a codebase (what depends on a file before it is changed). It is also a deliberate least-privilege example, and it now ships as a one-click Claude Desktop extension:
- Read-only. It builds a graph and answers queries. It cannot write a single byte to your repo.
- Root-confined. On install you pick one workspace folder. A per-call path argument may narrow to a subdirectory but can never escape that root. This is the path-traversal defense whose absence the relevant CVEs (e.g., CVE-2025-53110 / 53109) exploit.
- No network. It makes zero outbound calls and returns a structural summary (file and edge counts, dependents and dependencies), never your source.
So even if the model went haywire or someone injected a prompt mid-session, the worst this tool can do is read files inside one folder you chose. That is the whole point: the tool is powerful for its job and powerless for everything else.
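The root-confinement rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the Agent-Context implementation: the function name `resolve_inside_root` is hypothetical, and the check relies only on the standard library's `os.path.realpath` and `os.path.commonpath`.

```python
import os


def resolve_inside_root(root, requested):
    """Return the real path of `requested` if it stays inside `root`, else raise."""
    real_root = os.path.realpath(root)
    # os.path.join discards the root when `requested` is absolute, and
    # realpath follows symlinks and collapses "..", so all three escape
    # routes are resolved before the containment check runs.
    candidate = os.path.realpath(os.path.join(real_root, requested))
    try:
        # Compare whole path components, not string prefixes, so that
        # "/work/repo-evil" is not mistaken for a child of "/work/repo".
        inside = os.path.commonpath([real_root, candidate]) == real_root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    if not inside:
        raise PermissionError(f"path escapes workspace root: {requested!r}")
    return candidate
```

A tool would call this on every path argument and open only the returned path. A check followed by a later open is still open to races if something else can modify the directory in between, so treat this as the first layer, not the only one.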
The production version: a hardened gateway
For tools that genuinely need to act (run code, hit external services), the controls move up a layer into a mediation gateway. The one inside my SDLC engine enforces, on every single call: capability scoping (fail-closed for unknown roles), path-argument boundary enforcement against an allowlisted root, SHA-256 tool-drift pinning (refuse if a server's tool list changed), input and output sanitization with a quarantine path for high-risk responses, per-server circuit breakers and timeouts, a human-approval gate keyed by a stable call id, and an append-only forensic log of every decision. Paired with scoped, short-lived, signed credentials, an agent gets exactly the keys it needs, for exactly as long as it needs them, and every use is on the record.
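A stripped-down Python sketch of three of those checks (fail-closed capability scoping, the human-approval gate keyed by call id, and the decision log) might look like the following. It is not the gateway described above. The role names, tool names, `mediate` function, and in-memory log are hypothetical, and it leaves out sanitization, drift pinning, circuit breakers, and secret masking.

```python
import json
import time

# Hypothetical policy: which role may call which tool, and which tools need a human.
ROLE_TOOLS = {
    "research": {"read_graph"},
    "release": {"read_graph", "deploy"},
}
NEEDS_APPROVAL = {"deploy"}


def mediate(role, tool, args, call_id, approved_call_ids, log):
    """Decide whether one tool call may proceed, and record the decision."""
    # An unknown role maps to an empty set, so it is blocked (fail closed).
    allowed_tools = ROLE_TOOLS.get(role, set())
    if tool not in allowed_tools:
        decision = "blocked"
    elif tool in NEEDS_APPROVAL and call_id not in approved_call_ids:
        decision = "pending_approval"
    else:
        decision = "allowed"
    # This sketch only ever appends to the log. A real gateway would also
    # mask secrets in `args` before writing and store the log somewhere
    # the agent cannot modify.
    log.append(json.dumps({
        "ts": time.time(), "call_id": call_id, "role": role,
        "tool": tool, "args": args, "decision": decision,
    }, sort_keys=True))
    return decision
```

For example, `mediate("research", "deploy", {}, "c1", set(), log)` returns `"blocked"`, and a `"release"` role calling `deploy` gets `"pending_approval"` until its call id appears in `approved_call_ids`. Every outcome, including the refusals, lands in the log.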
The honest part
Least-privilege is not free. Confinement adds a little friction, capability scoping is more upfront design, and an audit log is one more thing to store. It will not stop a determined attacker who owns your machine already, and it does not replace dependency hygiene (pin and verify what your agent installs). What it does is shrink the blast radius from "everything" to "this one folder, read-only, logged", which is the difference between a fun demo and something you would let near real data or run inside a regulated organization.
That last point is the one that matters commercially. The agents that get adopted in banking, healthcare, and finance will not be the most capable ones. They will be the ones a compliance officer can reason about: constrained, auditable, on-shore. Trusted beats powerful.
The takeaway
The npm attack is a preview, not an anomaly. As agents touch more, "what is it allowed to do" becomes the whole game. Build tools that are powerful for their job and inert for everything else, and put the dangerous capabilities behind a gate with a paper trail. For the broader case that this boring, unglamorous infrastructure is what actually ships, see "the boring infrastructure that actually ships"; for the read-only tool above, see "why your AI keeps breaking code it can't see."
FAQ
Were Claude Code and Codex hacked?
Not the tools themselves. Attackers shipped 314 malicious npm packages targeting the agents' hooks and startup mechanisms, so a poisoned dependency could run code in the agent's session and steal AWS/SSH/vault credentials. It is a supply-chain attack on what the agent is allowed to run.
What is least privilege for AI agents?
Giving the agent and each tool the narrowest access the task needs: read-only over read-write, one directory over the whole disk, no network unless required, scoped short-lived credentials, and an audit log of every action.
Should I give an AI coding agent full filesystem or shell access?
No, not by default. Full access makes the whole machine the blast radius of any injection, poisoned package, or mistake. Confine it to a workspace, prefer read-only tools, block path traversal, and human-gate anything destructive or outbound.
How do I make MCP tools safe to expose?
Read-only where possible (readOnlyHint), confine file access to an allowlisted root, scope which roles call which tool, keep connections outbound-only with secrets scrubbed, pin the tool list against rug-pulls, log every call, and gate writes/deploys behind human approval.
Does a more constrained agent beat a powerful one?
For anything touching real data, yes. Trust comes from the boundaries, not the model. A modest model confined to one read-only folder with an audit log is safer to run, and the only version a regulated org will allow.