MCP Agent Toolkit

Three production agent-kernel patterns, exposed as MCP tools that any Claude or GPT agent can call after five minutes of setup.

Solve Track 05 · Protocol. Production multi-agent systems fail in predictable ways: agents can't share state without coupling, the same errors recur because nothing remembers what fixed them, and identical LLM requests get paid for twice. This toolkit wraps three proven fixes (a shared blackboard, SCAR failure memory, and an LLM response cache) in a single MCP server. Drop it into Claude Desktop, Claude Code, or any MCP-compatible client: seven tools, zero dependencies beyond the MCP SDK.

Track: Solve 05 · Protocol & MCP Integration
Runtime: Node.js 22+, ESM modules, node:sqlite (built-in)
Transport: stdio; works with Claude Desktop, Claude Code, and any MCP client
Tools: blackboard_write, blackboard_read, blackboard_list, scar_lookup, scar_record, cache_get, cache_set
Tests: 13 tests covering blackboard isolation, SCAR round-trips, cache hit/miss, and SHA-256 key consistency
Repository: github.com/shubham0086/mcp-agent-toolkit (open source)

The problem: three failure modes in every multi-agent system

Across 18 months of building production agent pipelines, the same three failures showed up in almost every system:

  • Agents can't share state without coupling. Agent A finishes its subtask. Agent B needs the output. Without a shared store, you pass data through the orchestrator, hard-code direct calls between agents, or serialize state to a file and hope nothing races. All three approaches break as soon as you add a third agent or retry a failed step.
  • The same error fires again and again. An agent hits a JSONDecodeError. You spend an hour tracing it, add json_repair(), and move on. Three runs later, a different agent hits the same error. You trace it again. SCAR failure memory means the fix is written down once and found instantly next time.
  • Identical LLM requests hit the API twice. Evaluation loops, retry logic, and parallelized agents repeatedly issue the same prompt to the same model. A SHA-256 cache key means every repeat after the first is served from the cache at no API cost.

Tool group 1: Blackboard shared state

The blackboard pattern decouples agents from each other. Agents write artifacts to a shared SQLite store keyed by run_id + agent + key. A downstream agent reads by that same address: it needs the writer's agent name, but it never calls the upstream agent or depends on its code.

// Researcher agent writes its findings
blackboard_write({
  run_id: "run-abc123",
  agent: "researcher",
  key: "research_brief",
  value: { topic: "...", sources: ["..."], summary: "..." }
})

// Coder agent reads the brief by address, with no direct call to the researcher
blackboard_read({
  run_id: "run-abc123",
  agent: "researcher",
  key: "research_brief"
})

All three tools operate on a single artifacts table in node:sqlite. Reads return the latest value written to a given address, which makes writes retry-safe: a failed agent can re-run and overwrite its output without corrupting downstream reads. blackboard_list lets an orchestrator inspect what any agent has produced before deciding which downstream agents to activate.

Tool group 2: SCAR failure memory

SCAR stands for Situation → Cause → Action → Resolution. The two tools implement a simple hash-addressed failure index:

// Before retrying a failed agent, check SCAR first
const known = scar_lookup({
  agent: "coder",
  error_type: "JSONDecodeError",
  context: failedOutputSnippet.slice(0, 200)
})
// { found: true, resolution: "wrap output in json_repair() before parsing" }

// After finding a fix, record it
scar_record({
  agent: "coder",
  error_type: "JSONDecodeError",
  context: failedOutputSnippet.slice(0, 200),
  resolution: "wrap output in json_repair() before parsing"
})

The hash key is SHA-256(agent + "::" + error_type + "::" + context.slice(0, 200)), so the same failure signature always resolves to the same DB row. The context field is optional: omitting it hashes an empty context, so a fix recorded without context acts as a broad, error-class-level entry that matches any lookup that also omits context.
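The key formula can be sketched directly with node:crypto. In this minimal version a Map stands in for the toolkit's scars table, and scarKey, scarRecord, and scarLookup are hypothetical helper names; only the hash formula and the lookup result shape come from the text above.

```javascript
// Sketch of the SCAR hash key and an in-memory failure index.
// Assumption: a Map stands in for the toolkit's scars table.
import { createHash } from "node:crypto";

// SHA-256(agent + "::" + error_type + "::" + first 200 chars of context)
function scarKey(agent, errorType, context = "") {
  const signature = `${agent}::${errorType}::${context.slice(0, 200)}`;
  return createHash("sha256").update(signature).digest("hex");
}

const scars = new Map(); // hex key -> { agent, error_type, resolution }

function scarRecord({ agent, error_type, context = "", resolution }) {
  scars.set(scarKey(agent, error_type, context), { agent, error_type, resolution });
}

function scarLookup({ agent, error_type, context = "" }) {
  const entry = scars.get(scarKey(agent, error_type, context));
  return entry ? { found: true, resolution: entry.resolution } : { found: false };
}
```

Because only the first 200 characters enter the hash, two contexts that differ only after that point share one row.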

Tool group 3: LLM response cache

The cache key is SHA-256(JSON.stringify({ messages, model })). Identical requests, meaning the same messages array and the same model string, hit the cache instead of the API. This pays off in three scenarios: evaluation loops that re-score the same answers, parallel agents running overlapping subtasks, and development iteration where prompts stabilize before the rest of the system does.

// Check cache before calling LLM
const hit = cache_get({ messages, model: "claude-opus-4-8-20260528" })
if (hit.hit) return hit.response   // free

// After calling LLM, store response
cache_set({
  messages,
  model: "claude-opus-4-8-20260528",
  response: llmResponse,
  provider: "anthropic"
})
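The check-then-store flow above can be folded into a single wrapper. A minimal sketch, assuming a Map in place of the llm_cache table; cachedCompletion and callModel are hypothetical names, with callModel standing for whatever function actually calls the provider.

```javascript
// Sketch of a hash-exact response cache. Assumptions: a Map stands in for the
// llm_cache table; callModel is a hypothetical async function that hits the API.
import { createHash } from "node:crypto";

function cacheKey(messages, model) {
  return createHash("sha256").update(JSON.stringify({ messages, model })).digest("hex");
}

const llmCache = new Map(); // cache key -> { response, provider }

async function cachedCompletion({ messages, model, provider }, callModel) {
  const key = cacheKey(messages, model);
  const stored = llmCache.get(key);
  if (stored) return { response: stored.response, cached: true }; // no API call

  const response = await callModel(messages, model);
  llmCache.set(key, { response, provider });
  return { response, cached: false };
}
```

One consequence of hashing JSON.stringify output: object properties serialize in insertion order, so a message built as { content, role } hashes differently from { role, content }. Constructing messages in one place keeps keys stable.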

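How the three tool groups combine in one pipeline step can be sketched with in-memory stubs. This is a hypothetical walk-through, not the toolkit's code: the stubs use plain string keys instead of SHA-256, and runCoderStep, extractJson, and the placeholder model id are illustrative. Each stub logs its tool name, so running the same step twice shows which calls the cache and SCAR memory make unnecessary.

```javascript
// Hypothetical trace of one coder step touching all three tool groups.
// In-memory stubs stand in for the MCP tools (plain string keys, not SHA-256);
// every call is logged to `trace` so the order is visible.
const trace = [];
const board = new Map();
const cache = new Map();
const scars = new Map();

const tools = {
  blackboard_read({ run_id, agent, key }) {
    trace.push("blackboard_read");
    return board.get(`${run_id}/${agent}/${key}`) ?? null;
  },
  blackboard_write({ run_id, agent, key, value }) {
    trace.push("blackboard_write");
    board.set(`${run_id}/${agent}/${key}`, value);
  },
  cache_get({ messages, model }) {
    trace.push("cache_get");
    const key = JSON.stringify({ messages, model });
    return cache.has(key) ? { hit: true, response: cache.get(key) } : { hit: false };
  },
  cache_set({ messages, model, response }) {
    trace.push("cache_set");
    cache.set(JSON.stringify({ messages, model }), response);
  },
  scar_lookup({ agent, error_type }) {
    trace.push("scar_lookup");
    const key = `${agent}::${error_type}`;
    return scars.has(key) ? { found: true, resolution: scars.get(key) } : { found: false };
  },
  scar_record({ agent, error_type, resolution }) {
    trace.push("scar_record");
    scars.set(`${agent}::${error_type}`, resolution);
  },
};

// Hypothetical repair: keep only the {...} span when the model adds chatter around JSON.
const extractJson = (text) => text.slice(text.indexOf("{"), text.lastIndexOf("}") + 1);

function runCoderStep(run_id, callModel) {
  const brief = tools.blackboard_read({ run_id, agent: "researcher", key: "research_brief" });
  const messages = [{ role: "user", content: `Plan the change for: ${brief.summary}` }];
  const model = "example-model"; // placeholder id

  // Cache check around the model call.
  const cached = tools.cache_get({ messages, model });
  const raw = cached.hit ? cached.response : callModel(messages, model);
  if (!cached.hit) tools.cache_set({ messages, model, response: raw });

  // Parse; on failure, consult SCAR memory before improvising a fix.
  let plan;
  try {
    plan = JSON.parse(raw);
  } catch (err) {
    const known = tools.scar_lookup({ agent: "coder", error_type: err.name });
    plan = JSON.parse(extractJson(raw)); // a real agent would act on known.resolution
    if (!known.found) {
      tools.scar_record({
        agent: "coder",
        error_type: err.name,
        resolution: "extract the {...} span before JSON.parse",
      });
    }
  }

  // Publish the result for downstream agents.
  tools.blackboard_write({ run_id, agent: "coder", key: "plan", value: plan });
  return plan;
}

// Seed the researcher's brief, then run the same step twice.
tools.blackboard_write({ run_id: "run-abc123", agent: "researcher", key: "research_brief", value: { summary: "add retries" } });
const fakeModel = () => 'Here is the plan: {"steps": 2}';

trace.length = 0;
runCoderStep("run-abc123", fakeModel);
const firstRun = trace.join(" -> ");
// blackboard_read -> cache_get -> cache_set -> scar_lookup -> scar_record -> blackboard_write

trace.length = 0;
runCoderStep("run-abc123", fakeModel);
const secondRun = trace.join(" -> ");
// blackboard_read -> cache_get -> scar_lookup -> blackboard_write

console.log(firstRun);
console.log(secondRun);
```

On the second run the cache answers cache_get, so no model call or cache_set happens, and scar_lookup finds the recorded fix, so nothing new is written to failure memory.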

Why node:sqlite, not a separate DB process

Node.js 22 ships node:sqlite, a synchronous SQLite binding built into the runtime: no native compilation, no Docker dependency, no external process. (Releases before v22.13 require the --experimental-sqlite flag; later 22.x releases load it without one.) The toolkit creates a single data/toolkit.db file on first run, and the three tables (artifacts, scars, llm_cache) are created with CREATE TABLE IF NOT EXISTS at startup. As a result, the server behaves identically in development and CI with no setup beyond npm install && npm start.

How to install

git clone https://github.com/shubham0086/mcp-agent-toolkit
cd mcp-agent-toolkit
npm install
npm start         # server runs on stdio, ready for clients

Wire into Claude Desktop

Add this to claude_desktop_config.json (in ~/Library/Application Support/Claude/ on macOS, %APPDATA%\Claude\ on Windows) and restart Claude Desktop. The seven tools appear in the tool picker once it relaunches.

{
  "mcpServers": {
    "agent-toolkit": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-agent-toolkit/src/server.js"]
    }
  }
}

Run the tests

npm test

# 13 tests:
# tests/blackboard.test.js  — write/read isolation per run_id, list returns correct keys
# tests/scars.test.js       — lookup miss, record, lookup hit, hash consistency
# tests/cache.test.js       — miss on first call, hit on identical request, hash collision resistance

Where this fits

The blackboard, SCAR, and cache patterns originated in AgentKernel (equilibrium), a six-engine runtime for production multi-agent systems. The standalone Agent-Scars and Agent-Recall repos implement the same patterns without the MCP layer. This toolkit is the MCP-native version: same patterns, any client, no integration code required. A tutorial walkthrough is on the blog: Build Your First MCP Server in 2026.

Honest framing

The cache is hash-exact, not semantic. Two prompts with minor wording differences that would produce identical outputs are cached separately. Semantic deduplication would need an embedding-similarity check before the hash lookup; that is a reasonable extension, but it is not in this version. SCAR lookups are exact too: the agent, error type, and first 200 characters of context must all match, so errors with highly variable context strings (stack traces, timestamps, dynamic data) may miss a recorded fix even when the underlying cause is the same.
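Both limits follow directly from the key formulas on this page, as this small sketch shows. The helpers sha256, cacheKey, and scarKey are illustrative reimplementations of those formulas, not exports of the toolkit.

```javascript
// Illustrative helpers written from the page's key formulas (not toolkit exports).
import { createHash } from "node:crypto";

const sha256 = (text) => createHash("sha256").update(text).digest("hex");
const cacheKey = (messages, model) => sha256(JSON.stringify({ messages, model }));
const scarKey = (agent, errorType, context = "") =>
  sha256(`${agent}::${errorType}::${context.slice(0, 200)}`);

// 1. A one-word spelling change produces a different cache key.
const a = cacheKey([{ role: "user", content: "Summarize this file." }], "m");
const b = cacheKey([{ role: "user", content: "Summarise this file." }], "m");
console.log(a === b); // false

// 2. Same underlying failure, different timestamp inside the first 200 chars: miss.
const t1 = scarKey("coder", "TimeoutError", "2026-01-01T10:00:00Z fetch /api/items timed out");
const t2 = scarKey("coder", "TimeoutError", "2026-01-01T10:05:00Z fetch /api/items timed out");
console.log(t1 === t2); // false

// 3. Differences after character 200 are ignored: hit.
const prefix = "x".repeat(200);
const s1 = scarKey("coder", "TimeoutError", prefix + "A");
const s2 = scarKey("coder", "TimeoutError", prefix + "B");
console.log(s1 === s2); // true
```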