Shannon: Production-Grade Multi-Agent Orchestration with Temporal

October 7, 2025

Most agent frameworks give you building blocks. Shannon gives you a production system.


Why Shannon Exists

Building AI agents is easy. Running them in production is hard.

After prototyping with LangGraph, CrewAI, or similar frameworks, teams hit the same walls:

  • Reliability: How do you reproduce bugs when LLM calls are non-deterministic?
  • Cost control: How do you prevent runaway token usage without killing performance?
  • Task complexity: How do you orchestrate 10+ agents with dependencies without manual DAG wiring?
  • Observability: Where did the tokens go? Which agent failed? What was the execution path?
  • Enterprise integration: How do you integrate proprietary APIs without forking the framework?

Shannon was built to solve these production problems from day one.

Architecture: Temporal + Rust + Go + Python

Shannon's hybrid architecture gives you the best of three worlds:

Temporal Workflows (Go)

The orchestration layer runs on Temporal, giving you:

  • Deterministic replay: Export any workflow execution and replay it locally to reproduce bugs
  • Built-in retries: Automatic retry logic with exponential backoff
  • Workflow versioning: Deploy new workflow versions without breaking running tasks
  • Durable execution: Tasks survive service restarts, network failures, and crashes

Unlike LangGraph's in-memory state graphs or CrewAI's sequential execution, Temporal workflows are event-sourced and replay-safe. Every decision point is recorded. Every execution is reproducible.
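
For intuition, here is a minimal sketch of durable, retry-safe execution using Temporal's Python SDK (Shannon's production workflows are written in Go; the activity name below is hypothetical):

from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy


@workflow.defn
class ExampleTaskWorkflow:
    @workflow.run
    async def run(self, query: str) -> str:
        # Every activity call is recorded in the event history, so a crash
        # or restart resumes from here instead of re-running earlier steps.
        return await workflow.execute_activity(
            "execute_agent",  # hypothetical activity name
            query,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                maximum_attempts=3,
                backoff_coefficient=2.0,  # exponential backoff between retries
            ),
        )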

Rust Agent Core

The enforcement layer handles:

  • WASI sandbox: Run untrusted Python code with no network access and a read-only filesystem
  • gRPC gateway: High-performance agent execution with sub-millisecond overhead
  • Policy enforcement: OPA-based governance and approval workflows

Native-code performance where it matters most—security and networking.

Python LLM Service

The integration layer provides:

  • Multi-provider support: OpenAI, Anthropic, X.AI, Google Gemini, custom providers
  • MCP tools: Add tools via YAML config, no code changes required
  • Flexible scripting: Rapid prototyping for new agent behaviors

Keep LLM integration simple and extensible without sacrificing system reliability.

Intelligent Workflow Orchestration

Shannon doesn't make you choose workflows manually. The orchestrator analyzes your task and routes it to the optimal execution pattern.

Automatic Workflow Selection

// Orchestrator routing logic (simplified)
switch {
case complexity < 0.3 && simpleByShape:
    return SimpleTaskWorkflow    // Single agent, fast path
case len(subtasks) > 5 || hasDependencies:
    return SupervisorWorkflow    // Coordinate multiple agents
case cognitiveStrategy == "react":
    return ReactWorkflow         // Reasoning loop with tools
case cognitiveStrategy == "research":
    return ResearchWorkflow      // Multi-step research pipeline
default:
    return DAGWorkflow           // Standard parallel execution
}

8+ Built-In Workflow Patterns

Core workflows:

  • SimpleTaskWorkflow: Single-agent execution (complexity < 0.3)
  • SupervisorWorkflow: Coordinates 5+ subtasks with dependencies
  • StreamingWorkflow: Real-time token streaming (single/multi-agent)
  • TemplateWorkflow: Pre-defined workflows for repeatable tasks

Strategy workflows:

  • DAGWorkflow: Fan-out/fan-in parallel execution
  • ReactWorkflow: Iterative reasoning + tool use (ReAct pattern)
  • ResearchWorkflow: Multi-step research with parallel source gathering, citation filtering, gap detection
  • ExploratoryWorkflow: Tree-of-Thoughts for complex decision-making
  • ScientificWorkflow: Hypothesis testing, debate, multi-perspective validation

No manual workflow wiring. No DAG construction. Just submit a task and Shannon picks the right pattern.

See multi-agent-workflow-architecture.md for full routing logic.

Deep Research Agent: Production-Ready Information Gathering

Shannon's ResearchWorkflow is built for multi-step research tasks with quality controls:

Research Pipeline

  1. Query understanding: Analyze research goal and decompose into search queries
  2. Parallel sourcing: Execute multiple searches concurrently (web search, vector DB, APIs)
  3. Citation filtering: Apply credibility rules to filter low-quality sources
  4. Gap detection: Identify missing information and trigger follow-up searches
  5. Synthesis: Combine findings with proper attribution and citations
  6. Cost tracking: Record tokens and cost per research phase
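
Sketched as control flow, with hypothetical stubs standing in for Shannon's internal activities:

import asyncio

# Hypothetical stubs; the real versions are LLM- and search-backed activities.
async def decompose(goal: str) -> list[str]:
    return [goal]  # real version asks an LLM for search queries

async def search(query: str) -> list[dict]:
    return [{"url": "https://example.org/paper", "confidence": 0.9}]

async def research(goal: str) -> list[dict]:
    queries = await decompose(goal)                                # 1. query understanding
    batches = await asyncio.gather(*(search(q) for q in queries))  # 2. parallel sourcing
    sources = [s for b in batches for s in b
               if s["confidence"] >= 0.7]                          # 3. citation filtering
    # Steps 4-6 (gap detection, synthesis, cost tracking) build on these sources.
    return sources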

Citation Credibility Filtering

Configure citation filters in config/citation_credibility.yaml:

citation_filter:
  enabled: true
  credible_domains:
    - openai.com
    - anthropic.com
    - arxiv.org
    - github.com
  suspicious_patterns:
    - "click here"
    - "buy now"
    - "limited time"
  min_confidence: 0.7

Only citations meeting credibility thresholds make it to the final report. Full pipeline documented in research-workflow.md.
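
As a rough illustration of applying these rules (field names are assumptions, not Shannon's schema):

from urllib.parse import urlparse

def passes_filter(citation: dict, cfg: dict) -> bool:
    # Domain allow-list check (tolerates an optional "www." prefix).
    domain = urlparse(citation["url"]).netloc.removeprefix("www.")
    if domain not in cfg["credible_domains"]:
        return False
    # Reject sources whose snippet matches spammy patterns.
    snippet = citation.get("snippet", "").lower()
    if any(p in snippet for p in cfg["suspicious_patterns"]):
        return False
    # Require the source's confidence score to clear the threshold.
    return citation.get("confidence", 0.0) >= cfg["min_confidence"]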

Example: Research Task

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Survey the latest breakthroughs in quantum error correction, find 3 authoritative sources",
    "cognitive_strategy": "research",
    "max_citations": 5
  }'

Shannon will:

  • Generate search queries
  • Gather sources in parallel
  • Filter by credibility
  • Detect gaps (e.g., "need more recent papers from 2025")
  • Synthesize findings with citations
  • Return structured results with token costs

Task Decomposition & Agent Orchestration

Shannon's SupervisorWorkflow handles complex tasks with multiple dependencies:

Automatic Subtask Decomposition

Submit a complex task:

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze our Q4 sales data, create a forecast model, and generate an executive summary with visualizations"
  }'

Shannon will:

  1. Analyze task complexity (triggers SupervisorWorkflow for 5+ subtasks)
  2. Decompose into subtasks:
    • Load and clean Q4 sales data
    • Perform statistical analysis
    • Train forecast model
    • Generate visualizations
    • Write executive summary
  3. Build dependency graph: Ensure data loading completes before analysis
  4. Execute in parallel: Run independent subtasks concurrently
  5. Coordinate results: Aggregate outputs and synthesize final report
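
To make steps 3 and 4 concrete, here is a toy scheduler that turns a dependency graph into waves of parallel execution (the subtask names and loop are illustrative, not Shannon's implementation):

import asyncio

graph = {
    "load_data": [],
    "analysis": ["load_data"],
    "forecast": ["analysis"],
    "visualize": ["analysis"],
    "summary": ["forecast", "visualize"],
}

async def run_subtask(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for an agent execution
    return f"{name}: done"

async def execute(graph: dict[str, list[str]]) -> dict[str, str]:
    done: dict[str, str] = {}
    pending = dict(graph)
    while pending:
        # Everything whose dependencies are satisfied runs concurrently;
        # here "forecast" and "visualize" land in the same wave.
        ready = [n for n, deps in pending.items() if all(d in done for d in deps)]
        results = await asyncio.gather(*(run_subtask(n) for n in ready))
        done.update(zip(ready, results))
        for n in ready:
            pending.pop(n)
    return done

print(asyncio.run(execute(graph)))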

Agent Collaboration

Agents communicate via Temporal signals and shared context:

  • Signals: Agent A signals Agent B when results are ready
  • Session memory: Agents read/write to shared Redis session storage
  • Vector recall: Agents query Qdrant for relevant historical context
  • Result passing: Agents receive structured outputs from dependencies

No manual message passing. No complex state management. Temporal handles coordination.
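
For intuition, here is what signal-gated handoff looks like in Temporal's Python SDK (Shannon's workflows are Go; the names are illustrative):

from temporalio import workflow

@workflow.defn
class HandoffSketch:
    def __init__(self) -> None:
        self._results_ready = False

    @workflow.signal
    def results_ready(self) -> None:
        # Agent A (or its activity) sends this signal when its output lands.
        self._results_ready = True

    @workflow.run
    async def run(self) -> None:
        # Agent B's work is gated on the signal; Temporal persists the wait.
        await workflow.wait_condition(lambda: self._results_ready)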

Memory Architecture: Redis + Qdrant

Shannon's memory system balances speed and recall:

Session Memory (Redis)

Fast, ephemeral storage for active conversations:

  • Token usage tracking (prevent budget overruns mid-conversation)
  • Recent message history (last N turns)
  • Session metadata (user_id, created_at, title)

Vector Memory (Qdrant)

Long-term storage for semantic recall:

  • Workflow recall: Retrieve similar past executions
  • Diversity sampling: Avoid redundant similar memories
  • Cross-session context: Find relevant information across conversations

Agents automatically query both stores. Memory integration documented in memory-system.md.
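
A sketch of the dual read path (key and collection names are assumptions, not Shannon's schema):

import redis
from qdrant_client import QdrantClient

r = redis.Redis(host="localhost", port=6379)
q = QdrantClient(host="localhost", port=6333)

def recall(session_id: str, query_vector: list[float]):
    # Fast path: the active session's recent turns (Redis).
    recent = r.lrange(f"session:{session_id}:messages", -10, -1)
    # Semantic path: similar past executions, across sessions (Qdrant).
    similar = q.search(
        collection_name="workflow_memory",
        query_vector=query_vector,
        limit=5,
    )
    return recent, similar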

Comprehensive Token Tracking & Cost Attribution

Every workflow populates usage metadata—no exceptions:

{
  "workflow_id": "task-xxx",
  "status": "COMPLETED",
  "result": "Research findings...",
  "usage": {
    "model_used": "gpt-5-nano-2025-08-07",
    "provider": "openai",
    "total_tokens": 8547,
    "input_tokens": 6201,
    "output_tokens": 2346,
    "cost_usd": 0.0127
  },
  "agent_usages": [
    {
      "agent_id": "research-coordinator",
      "model": "gpt-5-nano-2025-08-07",
      "tokens": 2103,
      "cost_usd": 0.0031
    },
    {
      "agent_id": "web-search",
      "model": "gpt-5-nano-2025-08-07",
      "tokens": 3421,
      "cost_usd": 0.0051
    },
    {
      "agent_id": "synthesis",
      "model": "claude-sonnet-4-5",
      "tokens": 3023,
      "cost_usd": 0.0045
    }
  ]
}

Data flow: LLM provider → Agent activity → Workflow aggregation → Database → API response. Every agent execution records usage exactly once (no duplicates). Full details in token-budget-tracking.md.

Query costs by agent, model, or time range:

SELECT agent_id, SUM(total_cost_usd) AS total_cost
FROM task_executions
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY agent_id
ORDER BY total_cost DESC;

Config-Driven Everything (Single Source of Truth)

All LLM provider configs live in config/models.yaml:

# Model tiers with priority ranking
model_tiers:
  small:
    providers:
      - provider: openai
        model: gpt-5-nano-2025-08-07
        priority: 1
      - provider: anthropic
        model: claude-3-5-haiku-20241022
        priority: 2

# Model catalog (capabilities)
model_catalog:
  openai:
    gpt-5-nano-2025-08-07:
      tier: small
      context_window: 200000
      max_tokens: 16000
      supports_functions: true
      supports_streaming: true

# Pricing (per 1K tokens)
pricing:
  models:
    openai:
      gpt-5-nano-2025-08-07:
        input_per_1k: 0.0005
        output_per_1k: 0.0015

Hot-reload enabled—change configs without restarting services.
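
Cost attribution is straight arithmetic over this pricing table. A worked example at the rates above (token counts illustrative):

# cost = tokens / 1000 * rate, summed over input and output
input_cost = 6201 / 1000 * 0.0005   # $0.0031
output_cost = 2346 / 1000 * 0.0015  # $0.0035
print(f"${input_cost + output_cost:.4f}")  # $0.0066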

Adding a new provider:

  1. Update config/models.yaml (tiers, catalog, pricing)
  2. Implement Python provider class in llm_provider/{provider}_provider.py
  3. Register in llm_service/providers/__init__.py

Done. No Go/Rust code changes. See centralized-pricing.md.
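
For step 2, a hedged skeleton of a provider class (the base contract and signatures are assumptions; mirror the existing providers for the real interface):

# llm_provider/example_provider.py (illustrative)
from dataclasses import dataclass

@dataclass
class Completion:
    text: str
    input_tokens: int
    output_tokens: int

class ExampleProvider:
    name = "example"

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    async def complete(self, model: str, prompt: str, **kwargs) -> Completion:
        # Call the vendor's API here and normalize its response so the
        # workflow layer can record usage and compute cost uniformly.
        return Completion(text="...", input_tokens=0, output_tokens=0)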

Enterprise Integration: Vendor Adapter Pattern

Shannon uses a vendor adapter pattern for domain-specific integrations without polluting core code:

Generic Shannon (open-source):
├── python/llm-service/.../tools/openapi_tool.py    # Generic OpenAPI loader
└── config/shannon.yaml                              # Base config

Vendor Extensions (kept private):
├── config/overlays/shannon.vendor.yaml              # Vendor tool configs
├── python/llm-service/.../tools/vendor_adapters/    # API transformations
└── python/llm-service/.../roles/vendor/             # Custom agent roles

Use conditional imports and config overlays to keep vendor-specific code separate:

# In presets.py (generic Shannon):
try:
    from .vendor.custom_agent import CUSTOM_AGENT_PRESET
    _PRESETS["custom_agent"] = CUSTOM_AGENT_PRESET
except ImportError:
    pass  # Shannon works without vendor module

Perfect for enterprise deployments with proprietary APIs. Full guide: vendor-adapters.md.

Framework Comparison

| Feature | Shannon | LangGraph | CrewAI | AgentKit |
| --- | --- | --- | --- | --- |
| Orchestration | Temporal workflows (event-sourced, replay-safe) | State graphs (in-memory) | Sequential execution | Hosted platform |
| Task decomposition | Automatic (8+ patterns, complexity analysis) | Manual graph construction | Manual role assignment | Agent Builder (visual) |
| Workflow replay | Full deterministic replay (export/import) | Limited (LangSmith) | No | Platform traces |
| Cost tracking | Per-agent, per-model attribution | Total only (callbacks) | Total only | Platform analytics |
| Memory | Redis + Qdrant (session + vector) | Checkpoints | Short/long-term (in-memory) | Hosted vector DB |
| Code execution | WASI sandbox (no network) | Jupyter kernel | No built-in | Code Interpreter |
| Multi-provider | OpenAI, Anthropic, X.AI, Google, custom | OpenAI, Anthropic | OpenAI, Anthropic | OpenAI only |
| Research workflows | Built-in (citation filtering, gap detection) | Build yourself | Build yourself | Build yourself |
| Hosting | Self-hosted | Self-hosted | Self-hosted | Hosted platform |
| Enterprise patterns | Vendor adapters, config overlays | Custom code | Custom code | Platform integration |

When to use Shannon:

  • You need deterministic replay for debugging production issues
  • You're running multi-agent workflows with complex dependencies
  • You need per-agent cost attribution and budget controls
  • You want research workflows with quality controls built-in
  • You need to integrate proprietary APIs without forking the framework

When to use alternatives:

  • LangGraph: Rapid prototyping, Python-native workflows, LangSmith integration
  • CrewAI: Simple sequential agent patterns, minimal setup
  • AgentKit: Visual workflow builder, hosted platform, no infrastructure management

Production Features

Hard Budgets & Rate Limits

Set per-task budgets to prevent runaway costs:

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Generate marketing copy",
    "max_budget_usd": 0.50,
    "rate_limits": {
      "requests_per_minute": 10,
      "tokens_per_minute": 50000
    }
  }'

Shannon halts execution if the budget is exceeded. Rate-aware scheduling is documented in rate-aware-budgeting.md.
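
Conceptually, enforcement is a running tally checked after every LLM call; a minimal sketch (not Shannon's actual code):

class BudgetExceeded(RuntimeError):
    pass

def charge(usage: dict, cost_usd: float, max_budget_usd: float) -> None:
    # Called after each LLM response; halts the workflow on overrun.
    usage["cost_usd"] += cost_usd
    if usage["cost_usd"] > max_budget_usd:
        raise BudgetExceeded(
            f"spent ${usage['cost_usd']:.4f} of ${max_budget_usd:.2f}"
        )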

Sandboxed Code Execution

Shannon runs untrusted Python code in a WASI sandbox:

  • No network access: Code cannot make external API calls
  • Read-only filesystem: Code cannot write to disk
  • Memory limits: Prevent resource exhaustion
  • Execution timeout: Kill long-running code

# Agent can safely execute user-provided code
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Calculate fibonacci(50) using Python",
    "enable_code_execution": true
  }'

Full details in python-code-execution.md.

Governance & Approvals

Block high-risk actions until human approval:

# Agent requests approval for high-risk action
# Workflow pauses and emits approval event

curl -X POST http://localhost:8081/approvals/decision \
  -H "Content-Type: application/json" \
  -d '{
    "approval_id": "<id>",
    "workflow_id": "<wid>",
    "approved": true,
    "feedback": "Approved for production deployment"
  }'

Configure OPA policies for fine-grained control.

Deterministic Replay

Reproduce any bug by exporting and replaying workflow history:

# Export workflow history
make replay-export WORKFLOW_ID=task-xxx OUT=bug.json

# Replay locally to reproduce issue
make replay HISTORY=bug.json

# Fix bug, replay again to verify
make replay HISTORY=bug.json

Replay uses the exact same event history as the original execution. No more "works on my machine" for AI workflows.

Observability: Metrics, Traces, Events

Shannon provides multiple observability layers:

Real-Time SSE Events

Stream execution events in real-time:

curl -N "http://localhost:8081/stream/sse?workflow_id=task-xxx"

# Output:
event: agent_thinking
data: {"agent":"research-coordinator","message":"Analyzing query..."}

event: tool_invoked
data: {"tool":"web_search","params":{"query":"quantum error correction 2025"}}

event: task_completed
data: {"workflow_id":"task-xxx","status":"COMPLETED"}

Prometheus Metrics

  • Task execution counts by workflow type
  • Token usage by model and provider
  • Latency percentiles (p50, p95, p99)
  • Budget utilization rates

Query via Prometheus: http://localhost:9090

OpenTelemetry Traces

Distributed tracing across Rust, Go, and Python services. Every agent execution is a trace span.

View in Jaeger or export to your observability platform.

Demo: 30-Second Setup

git clone https://github.com/Kocoro-lab/Shannon.git
cd Shannon

# Setup environment
make setup-env
echo "OPENAI_API_KEY=your-key" >> .env

# Install Python WASI interpreter
./scripts/setup_python_wasi.sh

# Start all services
make dev

# Run smoke tests
make smoke

Open the dashboard: http://localhost:2111

Submit Your First Task

Simple task:

export GATEWAY_SKIP_AUTH=1

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{"query":"Explain quantum entanglement in simple terms"}'

Research task:

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find 3 authoritative sources on GPT-5 capabilities and summarize key findings",
    "cognitive_strategy": "research",
    "max_citations": 5
  }'

Complex decomposition:

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze Tesla stock performance Q4 2024, create forecast model, generate report with charts"
  }'

Shannon will automatically route to the optimal workflow pattern.

Python SDK

Install: pip install shannon-sdk

from shannon import ShannonClient, EventType

with ShannonClient(gateway_endpoint="http://localhost:8080") as client:
    # Submit research task
    task = client.submit_task(
        query="Survey AI agent frameworks: LangGraph, CrewAI, Shannon",
        cognitive_strategy="research",
        max_citations=10
    )

    # Stream events
    for event in client.stream(task.workflow_id):
        if event.type == EventType.AGENT_THINKING:
            print(f"🤔 {event.message}")
        elif event.type == EventType.TOOL_INVOKED:
            print(f"🔧 Tool: {event.data['tool']}")
        elif event.type == EventType.TASK_COMPLETED:
            print(f"✅ Result: {event.data['result']}")
            print(f"💰 Cost: ${event.data['usage']['cost_usd']:.4f}")

Full SDK docs: clients/python/README.md

Adding Custom Tools (No Code Changes)

Add an OpenAPI tool in config/shannon.yaml:

openapi_tools:
  weather_api:
    enabled: true
    spec_path: "./config/openapi_specs/weather.yaml"
    base_url: "https://api.weather.com/v1"
    operations:
      - get_forecast
      - get_current

Or add an MCP tool:

mcp_tools:
  github_search:
    enabled: true
    func_name: "search_repos"
    description: "Search GitHub repositories"
    category: "data"
    parameters:
      - { name: query, type: string, required: true }
      - { name: language, type: string, enum: [python, go, rust] }

Restart the LLM service and the tools are immediately available to agents. No proto/Rust/Go changes.
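
The func_name above points at an async Python function. A hedged sketch of what search_repos could look like, using GitHub's public search API (how Shannon binds the function is covered in adding-custom-tools.md; this body is illustrative):

import httpx

async def search_repos(query: str, language: str | None = None) -> list[dict]:
    # Query GitHub's repository search and return a compact result set.
    q = f"{query} language:{language}" if language else query
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.github.com/search/repositories",
            params={"q": q, "per_page": 5},
        )
        resp.raise_for_status()
        return [
            {"name": item["full_name"], "stars": item["stargazers_count"]}
            for item in resp.json()["items"]
        ]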

Full guide: adding-custom-tools.md.

Why Teams Choose Shannon

Enterprises:

  • Self-hosted, no vendor lock-in
  • Vendor adapter pattern for proprietary APIs
  • OPA policies and approval workflows
  • Comprehensive audit logs and replay

Research teams:

  • Built-in research workflows with citation filtering
  • Vector memory for cross-experiment recall
  • Deterministic replay for reproducible research

Cost-conscious teams:

  • Per-agent cost attribution
  • Hard budgets prevent overruns
  • Multi-provider support (fallback to cheaper models)
  • Token tracking at every layer

DevOps teams:

  • Prometheus/OpenTelemetry integration
  • Temporal for workflow reliability
  • Replay for debugging production issues
  • Docker Compose for local dev

Documentation & Community

Get involved:

  • Star the repo: https://github.com/Kocoro-lab/Shannon
  • Open issues for bugs or feature requests
  • Join discussions for architecture questions
  • Contribute workflows, tools, or provider integrations

If you're building production AI agents and need reliability, observability, and cost control without vendor lock-in, Shannon is built for you. Try the demo and open an issue with your use case—we'd love feedback.