Here's how this one taught me the lesson. Inside a production multi-agent system I run, a QA agent — whose entire job was to reject content that broke the rules — started approving it. Nothing crashed. No error fired. It just quietly got more permissive over weeks, until I noticed the wrong things shipping. Agent-Constitution is the fix I built so it wouldn't happen again.
"Guardrails" is one of the most-hyped words in AI and one of the least-implemented. The reason agents drift is mundane: the rules live inside a prompt, and prompts get edited, trimmed, and "improved" over time. After enough iterations the agent no longer follows the original constraints — and because you never wrote those constraints down separately, you can't even tell when or how the drift happened.
Step 1: Write the Constitution
A constitution is a separate, versioned document — not buried in the prompt — that defines the agent's rules in three sections:
- Capabilities — what the agent does, as active statements. "Searches the web for relevant content on a given topic."
- Constraints — what it must never do, as prohibitions. "Must never fabricate citations or invent URLs."
- Decision rules — how to behave when instructions conflict. "If research conflicts with the user's stated goal, surface the conflict rather than suppress it."
Because it's a standalone file under version control, every change to the agent's rules is now a reviewable diff with a timestamp — not an invisible edit lost in a 2,000-word prompt.
Step 2: Validate Outputs Against It
The validator checks an agent's response against the constitution's constraints before that response reaches a user:
python src/validator.py \
--constitution my-agent-constitution.md \
--response "agent output here"
This is the eval-to-guardrail idea in practice: the same rules you'd check in pre-production become a runtime gate. A constraint violation doesn't get logged for later — it gets caught at the door.
Step 3: Detect Drift in the Prompt Itself
The subtler tool. The drift detector compares the current prompt against the original constitution and flags where they've diverged:
python src/drift_detector.py \
--constitution my-agent-constitution.md \
--prompt current_prompt.txt
This catches the slow QA-agent failure directly: when someone "improves" the prompt and quietly drops a hard constraint, the detector sees the original rule is no longer represented and flags it — before the looser agent ships.
Why This Beats a Generic Guardrails Library
Off-the-shelf guardrails check for toxicity, PII, and jailbreaks — important, but generic. They don't know your agent's job. A constitution encodes the domain-specific rules that actually matter for your use case, and the drift detector watches the one failure mode generic tools can't see: your own team slowly eroding the rules through ordinary prompt edits.
What I Built
Agent-Constitution ships the template, validator, drift detector, and example constitutions for research, content, and QA-gate agents. It pairs naturally with failure memory (catch repeats) and evals (catch regressions) — three angles on the same goal: agents that stay reliable. More on that philosophy in the boring infrastructure that actually ships.