Agent-Constitution

Agent alignment, output validation, and prompt drift detection.

Track 04 · Alignment. A framework for defining security boundaries for AI agents. It establishes a versioned, external document outlining what an agent is allowed to do, what it is strictly prohibited from doing, and how it resolves instruction conflicts. It also includes active validators that block unsafe agent outputs. The framework was extracted from the production Agentic OS.

  • Repository (open source): https://github.com/shubham0086/agent-constitution
  • Track: Track 04 · Safety & Alignment
  • Runtime: Python 3.10+, runs offline
  • Structure: Markdown templates, semantic validator
  • Tests: 5 tests covering validator triggers, drift checks, and template parsing

[Figure: Agent Constitution Flowchart, Annotated Reference. Define capabilities and constraints, run output validation, and flag semantic drift in prompts.]

The problem

As developer teams iterate on AI applications, agent prompts drift. A system prompt written on Day 1 keeps getting adjusted: developers add user requests, patch in bug workarounds, and trim descriptions to save tokens. Over weeks of edits, the agent's core safety boundaries, compliance instructions, and domain rules are diluted or removed. If these rules live only in the prompt, there is no separate record to check the prompt against.

This became clear during an incident in the Agentic OS QA step: after a prompt edit, the QA agent began approving content it had been instructed to reject. There was no master list of rules. Agent-Constitution solves this by extracting the rules into an external, versioned Markdown file, verifying that outputs comply with them, and auditing prompts for drift.

What a constitution defines

A constitution is a versioned document (e.g. constitution.md) that explicitly splits rules into three sections:

  • CAPABILITIES: What the agent is authorized to do. These are active instructions (e.g., "Searches web databases", "Edits local repository files").
  • CONSTRAINTS: What the agent must never do. These are strict prohibitions (e.g., "Must never output patient names", "Must never run unescaped database commands").
  • DECISION RULES: How the agent behaves when instructions conflict. If a user asks the agent to generate research, but a constraint forbids downloading third-party papers, the rule dictates: "Raise an explicit conflict warning rather than silently bypassing the constraint."
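The three-section split can be illustrated with a small parser. This is a minimal sketch, not the project's own parser: it assumes a constitution.md layout with a `version:` line, one `##` heading per section, and `- ` bullets for rules. The names `Constitution`, `parse_constitution`, and `EXAMPLE` are hypothetical.

```python
from dataclasses import dataclass, field

SECTION_NAMES = ("CAPABILITIES", "CONSTRAINTS", "DECISION RULES")


@dataclass
class Constitution:
    version: str = "unversioned"
    sections: dict[str, list[str]] = field(
        default_factory=lambda: {name: [] for name in SECTION_NAMES}
    )


def parse_constitution(text: str) -> Constitution:
    """Collect bullet rules under each recognised section heading."""
    constitution = Constitution()
    current = None  # section currently being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.lower().startswith("version:"):
            constitution.version = line.split(":", 1)[1].strip()
        elif line.startswith("#"):
            heading = line.lstrip("#").strip().upper()
            # Unknown headings switch collection off until a known one appears.
            current = heading if heading in constitution.sections else None
        elif line.startswith("- ") and current is not None:
            constitution.sections[current].append(line[2:].strip())
    return constitution


# Assumed file layout; the rules themselves are the examples from the text above.
EXAMPLE = """\
version: 1.0.0

## CAPABILITIES
- Searches web databases
- Edits local repository files

## CONSTRAINTS
- Must never output patient names
- Must never run unescaped database commands

## DECISION RULES
- Raise an explicit conflict warning rather than silently bypassing the constraint
"""

if __name__ == "__main__":
    parsed = parse_constitution(EXAMPLE)
    for name, rules in parsed.sections.items():
        print(name, rules)
```

Keeping the version inside the parsed object means every validation result can record which revision of the rules it was checked against.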

How validation works: step by step

  • Step 1: Read the Constitution. The orchestrator parses the versioned Markdown file and extracts its rules.
  • Step 2: Run the Output Validator. Each time the agent completes an execution loop, the validator evaluates the response against the constitution's rules. If a constraint is violated, the output is blocked.
  • Step 3: Run Prompt Drift Detection. Before a system prompt is updated, the drift detector runs a semantic comparison between the proposed prompt and the constitution. It alerts developers if a core rule has been removed or modified.
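The blocking behaviour in Step 2 can be sketched as a gate between the agent and whatever consumes its output. This is an illustrative outline only, not the code in src/validator.py: each constraint is paired with a caller-supplied check function, and the deny-list check shown here is a hypothetical stand-in for the semantic assessment the real validator performs. The names `validate_output`, `deliver`, `PATIENT_NAMES`, and `CHECKS` are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# A check receives the agent output and returns True when its constraint is violated.
Check = Callable[[str], bool]


@dataclass(frozen=True)
class ValidationResult:
    blocked: bool
    violations: tuple[str, ...]


def validate_output(output: str, checks: dict[str, Check]) -> ValidationResult:
    """Run every constraint check; any violation blocks the output."""
    violations = tuple(rule for rule, violated in checks.items() if violated(output))
    return ValidationResult(blocked=bool(violations), violations=violations)


def deliver(output: str, checks: dict[str, Check]) -> str:
    """Pass the output downstream only if it clears validation."""
    result = validate_output(output, checks)
    if result.blocked:
        return "[blocked] " + "; ".join(result.violations)
    return output


# Hypothetical stand-in for a real check: a fixed deny-list of names.
PATIENT_NAMES = ("Jane Doe",)

CHECKS: dict[str, Check] = {
    "Must never output patient names": lambda text: any(
        name.lower() in text.lower() for name in PATIENT_NAMES
    ),
}

if __name__ == "__main__":
    print(deliver("Jane Doe was admitted on Monday.", CHECKS))
    print(deliver("The patient was admitted on Monday.", CHECKS))
```

Returning the violated rule text alongside the block decision gives developers a direct pointer back to the constitution entry that triggered it.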

Example: Prompt Drift Validator

Compare a modified agent prompt against its core constitution to detect missing constraints. The example constitution contains two rules:

  • [C1] Must cite real, verified sources.
  • [C2] Must NEVER fabricate information or guess.
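An audit of a prompt against rules C1 and C2 can be approximated offline with a keyword-coverage check. This is a deliberately crude sketch that swaps the semantic comparison for token overlap, so it is not what src/drift_detector.py does. The function names, the stopword list, the 0.6 threshold, and both sample prompts are assumptions made for illustration.

```python
import re

# Words that carry little meaning on their own and are ignored when matching.
STOPWORDS = {"must", "never", "or", "and", "the", "a", "an", "to", "of"}


def keywords(text: str) -> set[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def audit_prompt(prompt: str, rules: dict[str, str], threshold: float = 0.6) -> dict[str, bool]:
    """Map each rule id to True if enough of its keywords survive in the prompt."""
    prompt_words = keywords(prompt)
    report = {}
    for rule_id, rule in rules.items():
        rule_words = keywords(rule)
        coverage = len(rule_words & prompt_words) / len(rule_words) if rule_words else 1.0
        report[rule_id] = coverage >= threshold
    return report


RULES = {
    "C1": "Must cite real, verified sources.",
    "C2": "Must NEVER fabricate information or guess.",
}

# Hypothetical prompts: the second one has lost the anti-fabrication rule.
ORIGINAL_PROMPT = (
    "You are a research assistant. Cite real, verified sources. "
    "Never fabricate information or guess."
)
EDITED_PROMPT = "You are a research assistant. Cite verified sources and keep answers short."

if __name__ == "__main__":
    for rule_id, present in audit_prompt(EDITED_PROMPT, RULES).items():
        print(rule_id, "present" if present else "MISSING")
```

Token overlap cannot see negation or paraphrase: a prompt that says "you may guess" would still count toward C2. That gap is the reason a semantic comparison is used for the real check.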

File Architecture

  • constitution.md: The template defining capabilities, constraints, and decision rules. Developers copy and edit this file for each agent.
  • src/validator.py: Evaluates agent outputs against the constitution's rules.
  • src/drift_detector.py: Runs semantic comparisons on prompt updates.

How to run it

git clone https://github.com/shubham0086/agent-constitution
cd agent-constitution
pip install -r requirements.txt

# Run the prompt audit check
python src/drift_detector.py --prompt system_prompt.txt --const constitution.md

Where this fits

Agent-Constitution represents the compliance and safety layer of the autonomy ladder. It implements Pattern 07 (Anti-Drift) in Agentic Patterns. The core backend platform AgentKernel uses this framework to validate responses before passing data downstream.

Honest framing

Output validation requires an extra LLM call (or a fast local model assessment step), which adds latency to every completion. In latency-sensitive workflows, validating every response can degrade the user experience. To mitigate this, developers can run validation asynchronously, or use rule-based regex checks for structured JSON fields and reserve LLM auditing for high-risk prompts, such as requests to modify database structures.
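The tiered approach can be sketched as a cheap regex pass that always runs, followed by an audit call that only runs for high-risk prompts. Everything here is an assumption made for illustration: the field names and patterns, the `HIGH_RISK` expression, and the `llm_audit` callable, which stands in for whatever model call a deployment uses and is supplied by the caller.

```python
import json
import re
from typing import Callable

# Hypothetical schema: each structured field must fully match its pattern.
FIELD_PATTERNS = {
    "status": re.compile(r"approved|rejected"),
    "ticket_id": re.compile(r"[A-Z]+-\d+"),
}

# Hypothetical marker for prompts that touch database structure.
HIGH_RISK = re.compile(r"\b(drop|alter|truncate)\s+table\b", re.IGNORECASE)


def check_fields(payload: str) -> list[str]:
    """Rule-based pass over structured JSON fields; no model call involved."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    problems = []
    for name, pattern in FIELD_PATTERNS.items():
        value = data.get(name)
        if not isinstance(value, str) or not pattern.fullmatch(value):
            problems.append(f"field {name!r} failed its pattern")
    return problems


def validate(prompt: str, payload: str, llm_audit: Callable[[str], bool]) -> list[str]:
    """Run the cheap checks first; call the auditor only for high-risk prompts."""
    problems = check_fields(payload)
    if not problems and HIGH_RISK.search(prompt):
        # llm_audit returns True when the output is acceptable.
        if not llm_audit(payload):
            problems.append("LLM audit rejected the output")
    return problems
```

With this split, a routine request pays only for the regex pass, and the slower audit is limited to the prompts where a bad output would be costly.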