ACGP-1002: Architecture Specification

Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1002
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)

Abstract

This document specifies the system architecture for the Agentic Cognitive Governance Protocol (ACGP). It defines the components, their interactions, deployment patterns, and scaling considerations. The architecture supports multiple deployment topologies from simple single-agent governance to complex multi-agent networks with distributed stewardship. This specification provides normative requirements for component interfaces, data flows, integration patterns, retry behavior, and version negotiation that enable interoperable ACGP implementations.

Table of Contents

  1. Introduction
  2. System Components
  3. Component Interactions
  4. Data Flow Architecture
  5. Deployment Topologies
  6. Scaling Considerations
  7. Integration Patterns
  8. High Availability & Resilience
  9. Performance Requirements
  10. Security Architecture
  11. Conformance Requirements
  12. References

1. Introduction

The ACGP architecture is designed around principles of separation of concerns, defense in depth, and runtime adaptability. It enables real-time governance without introducing prohibitive latency while maintaining comprehensive audit trails and supporting human oversight.

1.1 Design Principles

  • Separation of Concerns: Governance logic is separate from agent logic
  • Defense in Depth: Multiple layers of validation and intervention
  • Minimal Intrusion: One-line integration for existing agents
  • Scalability: Horizontal scaling of governance components
  • Resilience: Graceful degradation under failure conditions
  • Observability: Complete visibility into all decisions

1.2 Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.


2. System Components

2.1 Core Components

graph TB
    subgraph "Governance Layer"
        GS[Governance Steward]
        PE[Policy Engine]
        TM[Trusted Monitor]
        RE[Reflection Engine]
    end

    subgraph "Agent Layer"
        PA[Operating Agent]
        SA[Sub-Agents]
        AW[Agent Wrapper/SDK]
    end

    subgraph "Storage Layer"
        RDB[(ReflectionDB)]
        CSR[(Certified Source Registry)]
        BPS[(Blueprint Store)]
    end

    subgraph "Control Plane"
        HITL[Human-In-The-Loop]
        DASH[Monitoring Dashboard]
        ALERT[Alert Manager]
        CONFIG[Configuration Service]
    end

    subgraph "Integration Layer"
        MCP[MCP Adapter]
        A2A[A2A Adapter]
        API[REST/gRPC API]
    end

2.2 Component Descriptions

2.2.1 Governance Steward

  • Purpose: Primary governance enforcement agent
  • Responsibilities:
      • Observe Cognitive Traces from Operating Agents
      • Calculate CTQ scores using Reflection Engine
      • Issue Interventions (all six types: OK, Nudge, Escalate, Block, Halt, Flag)
      • Manage Trust Debt accumulation
      • Evaluate Tripwires (Standard, Critical, Severe)
      • Coordinate with other Stewards in a Steward Network

2.2.2 Policy Engine

  • Purpose: Decision engine for interventions
  • Responsibilities:
      • Apply ACL-tier specific thresholds (Risk Score based)
      • Evaluate tripwire conditions
      • Map CTQ scores to intervention decisions
      • Handle dynamic threshold adjustments
      • Support Flag orthogonality

2.2.3 Trusted Monitor

  • Purpose: Independent anomaly detection
  • Responsibilities:
      • Parallel evaluation of agent behavior
      • Pattern-based threat detection
      • Behavioral drift monitoring
      • Zero-trust verification of agent claims

2.2.4 Operating Agent

  • Purpose: The AI agent performing actual work
  • Responsibilities:
      • Execute tasks and use tools
      • Generate Cognitive Traces
      • Respond to Interventions
      • Manage sub-agent lifecycle

2.2.5 ReflectionDB

  • Purpose: Immutable audit trail
  • Requirements:
      • Append-only architecture
      • Cryptographic integrity (hash chain)
      • Time-series optimization
      • Compliance-grade retention
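
The append-only and hash-chain requirements can be illustrated with a minimal in-memory sketch. This is not the ReflectionDB storage format: the `HashChainLog` class, the `GENESIS_HASH` sentinel, the choice of SHA-256, and the field layout are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS_HASH = "0" * 64  # assumed sentinel used as prev_hash of the first entry


@dataclass(frozen=True)
class LogEntry:
    index: int
    payload: str      # canonical JSON of the logged event
    prev_hash: str    # hash of the previous entry (chain link)
    entry_hash: str   # hash over index, prev_hash and payload


def _digest(index: int, payload: str, prev_hash: str) -> str:
    data = f"{index}|{prev_hash}|{payload}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class HashChainLog:
    """Append-only log: no update or delete operation is exposed."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, event: dict) -> LogEntry:
        # Canonical serialization so the same event always hashes the same way
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
        prev_hash = self._entries[-1].entry_hash if self._entries else GENESIS_HASH
        index = len(self._entries)
        entry = LogEntry(index, payload, prev_hash, _digest(index, payload, prev_hash))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check each link to its predecessor."""
        prev_hash = GENESIS_HASH
        for position, entry in enumerate(self._entries):
            if entry.index != position or entry.prev_hash != prev_hash:
                return False
            if entry.entry_hash != _digest(entry.index, entry.payload, entry.prev_hash):
                return False
            prev_hash = entry.entry_hash
        return True
```

Because each entry's hash covers the previous entry's hash, altering any stored payload invalidates that entry and every entry after it, which is what makes tampering detectable during an audit.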

2.2.6 Blueprint Store

  • Purpose: Centralized policy repository
  • Features:
  • Version control for blueprints
  • Multi-Party Authorization (MPA) for changes
  • Inheritance resolution
  • Hot-reload capability

3. Component Interactions

3.1 Primary Flow Sequence with Version Negotiation

sequenceDiagram
    participant PA as Operating Agent
    participant AW as Agent Wrapper
    participant GS as Governance Steward
    participant PE as Policy Engine
    participant TM as Trusted Monitor
    participant RDB as ReflectionDB

    Note over PA,GS: Initial Connection
    PA->>GS: VERSION_NEGOTIATION
    GS-->>PA: VERSION_SELECTED (1.0.2)

    Note over PA,RDB: Runtime Governance Loop
    PA->>AW: Execute Action
    AW->>AW: Generate Cognitive Trace

    par Parallel Processing
        AW->>GS: Send TRACE
        GS->>PE: Evaluate with Blueprint
    and
        AW->>TM: Send TRACE
        TM->>TM: Anomaly Detection
    end

    PE->>PE: Check Tripwires (Priority 1)

    alt Tripwire Triggered
        PE->>GS: Immediate Override Decision
        GS->>AW: INTERVENTION (Block/Halt)
    else No Tripwire
        PE->>PE: Calculate CTQ Score
        PE->>PE: Compute Risk Score (1.0 - CTQ)
        PE->>PE: Apply ACL Thresholds
        PE->>GS: Return Decision
    end

    TM-->>GS: Flag Anomalies (if any)

    GS->>GS: Reconcile Decisions
    GS->>GS: Check Flag Orthogonality

    alt Intervention = OK or NUDGE
        GS->>AW: Send INTERVENTION
        AW->>PA: Proceed (modified if NUDGE)
    else Intervention = ESCALATE
        GS->>HITL: Request Human Review
        HITL-->>GS: Human Decision
        GS->>AW: Forward Decision
    else Intervention = BLOCK or HALT
        GS->>AW: Send INTERVENTION
        AW->>PA: Stop Execution
        Note over PA: Action blocked or session terminated
    end

    alt Flagged
        GS->>GS: Update Trust Debt
        GS->>GS: Check Re-tier Threshold
    end

    GS->>RDB: Log Complete Event (TRACE + EVAL + INTERVENTION)
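
The initial VERSION_NEGOTIATION / VERSION_SELECTED exchange in the diagram above could be sketched as follows. The selection rule used here (pick the highest version supported by both sides) is an assumption for illustration; the normative rules belong to the version negotiation specification, and `parse_version` and `select_version` are hypothetical helper names.

```python
from typing import Iterable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.0.2" into (1, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def select_version(
    agent_versions: Iterable[str],
    steward_versions: Iterable[str],
) -> Optional[str]:
    """Return the highest version both sides support, or None if there is none."""
    common = set(agent_versions) & set(steward_versions)
    if not common:
        return None
    return max(common, key=parse_version)


# The agent offers three versions; the steward supports two of them.
selected = select_version(["1.0.0", "1.0.1", "1.0.2"], ["1.0.1", "1.0.2"])
```

Comparing parsed tuples rather than raw strings matters once a component reaches two digits: as strings, "1.0.10" sorts before "1.0.9".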

3.2 Trust Debt Accumulation Flow

stateDiagram-v2
    [*] --> Normal: Initial State

    Normal --> Flagged: Flag Intervention
    Flagged --> DebtAccumulation: Add Flag Weight (0.1/0.3/0.5)

    DebtAccumulation --> ThresholdCheck: Check Debt Level

    ThresholdCheck --> Normal: Debt < Warning
    ThresholdCheck --> Elevated: Debt > Warning
    ThresholdCheck --> Critical: Debt > Critical

    Elevated --> TightenedControl: Reduce Thresholds
    Critical --> ForcedReTier: Trigger ARS Re-evaluation

    ForcedReTier --> HigherACL: Increase ACL Tier
    HigherACL --> Normal: Reset with New Tier

    Normal --> DebtDecay: Time Passes (0.95 per 24h)
    Elevated --> DebtDecay: Time Passes
    DebtDecay --> Normal: Reduce Debt
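
The accumulation and decay arithmetic in the diagram above can be sketched as below. Two assumptions are made: "0.95 per 24h" is read as a multiplicative factor applied continuously over elapsed time, and the three flag weights are given the hypothetical labels `minor`, `moderate`, and `major`, since the diagram lists only the values.

```python
# Flag weights from the diagram; the label names are hypothetical.
FLAG_WEIGHTS = {"minor": 0.1, "moderate": 0.3, "major": 0.5}

# Assumed reading of "0.95 per 24h": debt is multiplied by 0.95 each day.
DECAY_FACTOR_PER_DAY = 0.95


def add_flag(debt: float, weight_label: str) -> float:
    """Accumulate trust debt for one Flag intervention."""
    return debt + FLAG_WEIGHTS[weight_label]


def decayed_debt(debt: float, elapsed_hours: float) -> float:
    """Apply exponential decay for the time elapsed since the last update."""
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    return debt * DECAY_FACTOR_PER_DAY ** (elapsed_hours / 24.0)


# One moderate flag followed by two quiet days:
debt = add_flag(0.0, "moderate")   # 0.3
debt = decayed_debt(debt, 48.0)    # 0.3 * 0.95**2
```

Under this reading decay is slow: roughly 13.5 days must pass for a given debt to halve, since 0.95 raised to the power 13.5 is about 0.5.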

3.3 Retry and Timeout Behavior

sequenceDiagram
    participant Agent
    participant Steward
    participant Network

    Agent->>Network: Send TRACE

    alt Success
        Network->>Steward: Deliver
        Steward-->>Agent: INTERVENTION
    else Timeout (500ms)
        Note over Agent: Attempt 1 Failed
        Agent->>Agent: Wait 100ms + jitter
        Agent->>Network: Retry TRACE

        alt Success on Retry
            Network->>Steward: Deliver
            Steward-->>Agent: INTERVENTION
        else Timeout Again
            Note over Agent: Attempt 2 Failed
            Agent->>Agent: Wait 200ms + jitter
            Agent->>Network: Final Retry

            alt Success on Final Retry
                Network->>Steward: Deliver
                Steward-->>Agent: INTERVENTION
            else Final Timeout
                Note over Agent: All Retries Exhausted
                Agent->>Agent: Escalate for Manual Review
            end
        end
    end
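
The retry sequence above (three attempts, 500ms timeout per attempt, waits of 100ms then 200ms plus jitter) can be sketched as a small helper. The `send` callable, the `RetriesExhausted` exception, and the 50ms jitter ceiling are assumptions for illustration; the diagram does not state how much jitter to add.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """All attempts timed out; the caller should escalate for manual review."""


def backoff_delays_ms(max_attempts: int = 3, base_delay_ms: float = 100.0) -> List[float]:
    """Delay before each retry: base, 2*base, 4*base, ..."""
    return [base_delay_ms * 2 ** i for i in range(max_attempts - 1)]


def send_with_retry(
    send: Callable[[float], T],          # takes a timeout in seconds
    max_attempts: int = 3,
    timeout_ms: float = 500.0,
    base_delay_ms: float = 100.0,
    max_jitter_ms: float = 50.0,         # assumed jitter ceiling
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays_ms(max_attempts, base_delay_ms)
    for attempt in range(max_attempts):
        try:
            return send(timeout_ms / 1000.0)
        except TimeoutError:
            if attempt == max_attempts - 1:
                break  # final attempt failed, no further waiting
            # Random jitter spreads out retries from many agents
            sleep((delays[attempt] + rng() * max_jitter_ms) / 1000.0)
    raise RetriesExhausted(f"no response after {max_attempts} attempts")
```

The `sleep` and `rng` parameters are injectable so the backoff schedule can be tested without real delays.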

3.4 Canonical Evaluation Order [NORMATIVE]

This is the authoritative reference for ACGP governance evaluation order. Implementations MUST follow this sequence when processing traces.

3.4.1 Evaluation Sequence

┌─────────────────────────────────────────────────────────────────┐
│                     GOVERNANCE EVALUATION                       │
│                                                                 │
│  1. TRIPWIRE CHECK (Priority 1 - Pre-CTQ)                       │
│     ├── Eval Tier 0 tripwires (<100ms, in-memory)               │
│     ├── Eval Tier 1 tripwires (<300ms, local DB)                │
│     └── If ANY tripwire triggers → IMMEDIATE INTERVENTION       │
│                        │                                        │
│                        ▼ (no tripwire triggered)                │
│  2. CTQ CALCULATION                                             │
│     ├── Load blueprint metrics and scorers                      │
│     ├── Execute each metric scorer                              │
│     ├── Calculate weighted CTQ score                            │
│     └── Compute Risk Score = 1.0 - CTQ                          │
│                        │                                        │
│                        ▼                                        │
│  3. THRESHOLD EVALUATION                                        │
│     ├── Get blueprint thresholds                                │
│     ├── Get ACL tier thresholds                                 │
│     ├── Apply stricter of (blueprint, ACL)                      │
│     └── Determine base intervention                             │
│                        │                                        │
│                        ▼                                        │
│  4. TRUST DEBT APPLICATION                                      │
│     ├── Get current trust debt                                  │
│     ├── Check threshold escalation                              │
│     └── Adjust intervention if needed                           │
│                        │                                        │
│                        ▼                                        │
│  5. FLAG EVALUATION (Orthogonal)                                │
│     ├── Check flag conditions (pattern, near-miss)              │
│     └── Add flag to intervention (combines with any decision)   │
│                        │                                        │
│                        ▼                                        │
│  6. RE-TIERING CHECK                                            │
│     ├── If trust debt > re_tiering_threshold                    │
│     └── Queue ARS re-evaluation                                 │
│                        │                                        │
│                        ▼                                        │
│  7. ISSUE INTERVENTION                                          │
│     └── Return: decision + flag + evidence                      │
└─────────────────────────────────────────────────────────────────┘

3.4.2 Reference Implementation (Pseudo-code)

def evaluate_trace(trace: CognitiveTrace, blueprint: Blueprint) -> Intervention:
    """
    Canonical ACGP governance evaluation.

    This is the authoritative evaluation order. All conformant
    implementations MUST follow this sequence.
    """

    # ═══════════════════════════════════════════════════════════
    # STEP 1: TRIPWIRE CHECK (Priority 1 - runs BEFORE CTQ)
    # ═══════════════════════════════════════════════════════════
    tripwire_result = evaluate_tripwires(trace, blueprint.tripwires)

    if tripwire_result.triggered:
        # Tripwires override all other evaluation
        return Intervention(
            decision=tripwire_result.intervention,  # block, halt, etc.
            reason=tripwire_result.reason,
            tripwire_id=tripwire_result.id,
            flagged=tripwire_result.severity in ["critical", "severe"],
            evaluation_stage="tripwire"
        )

    # ═══════════════════════════════════════════════════════════
    # STEP 2: CTQ CALCULATION
    # ═══════════════════════════════════════════════════════════
    ctq_scores = {}
    for metric in blueprint.ctq.metrics:
        scorer = get_scorer(metric.scorer)
        score = scorer.evaluate(trace, metric.parameters)
        ctq_scores[metric.name] = {
            "score": score,
            "weight": metric.weight
        }

    # Weighted average (or other aggregation per blueprint)
    ctq_final = calculate_aggregation(
        ctq_scores, 
        method=blueprint.ctq.aggregation
    )
    risk_score = 1.0 - ctq_final

    # ═══════════════════════════════════════════════════════════
    # STEP 3: THRESHOLD EVALUATION
    # ═══════════════════════════════════════════════════════════
    blueprint_thresholds = blueprint.ctq.thresholds
    acl_thresholds = get_acl_thresholds(trace.acl_tier)

    # Apply stricter threshold (lower value = stricter)
    effective_thresholds = {
        level: min(blueprint_thresholds[level], acl_thresholds[level])
        for level in ["ok", "nudge", "escalate", "block"]
    }

    # Determine base decision from risk score
    base_decision = apply_thresholds(risk_score, effective_thresholds)

    # ═══════════════════════════════════════════════════════════
    # STEP 4: TRUST DEBT APPLICATION
    # ═══════════════════════════════════════════════════════════
    trust_debt = get_current_trust_debt(trace.agent_id)

    # Trust debt can escalate the intervention
    if trust_debt > blueprint.trust_debt.thresholds.restricted_mode:
        base_decision = escalate_decision(base_decision)

    # ═══════════════════════════════════════════════════════════
    # STEP 5: FLAG EVALUATION (Orthogonal - can combine with any)
    # ═══════════════════════════════════════════════════════════
    flagged = False
    flag_reason = None

    # Flag on near-miss (close to threshold boundary)
    if is_near_threshold_boundary(risk_score, effective_thresholds):
        flagged = True
        flag_reason = "near_threshold_boundary"

    # Flag on suspicious pattern
    if detect_suspicious_pattern(trace):
        flagged = True
        flag_reason = "suspicious_pattern"

    # Flag always accumulates trust debt
    if flagged:
        trust_debt_delta = calculate_flag_debt(
            base_decision,
            blueprint.trust_debt.accumulation
        )
        update_trust_debt(trace.agent_id, trust_debt_delta)
        # Keep the local value in sync so Step 6 checks the updated debt
        trust_debt += trust_debt_delta

    # ═══════════════════════════════════════════════════════════
    # STEP 6: RE-TIERING CHECK
    # ═══════════════════════════════════════════════════════════
    if trust_debt > blueprint.trust_debt.thresholds.re_tiering_review:
        queue_ars_reevaluation(trace.agent_id)

    # ═══════════════════════════════════════════════════════════
    # STEP 7: ISSUE INTERVENTION
    # ═══════════════════════════════════════════════════════════
    return Intervention(
        decision=base_decision,
        flagged=flagged,
        flag_reason=flag_reason,
        ctq_score=ctq_final,
        risk_score=risk_score,
        trust_debt=trust_debt,
        evidence={
            "ctq_scores": ctq_scores,
            "thresholds_used": effective_thresholds,
            "tripwires_checked": [t.id for t in blueprint.tripwires]
        },
        evaluation_stage="complete"
    )


def evaluate_tripwires(trace: CognitiveTrace, tripwires: List[Tripwire]) -> TripwireResult:
    """
    Evaluate tripwires in priority order.
    Tripwires run BEFORE CTQ and can short-circuit evaluation.
    """
    # Sort by severity (severe > critical > standard)
    sorted_tripwires = sorted(tripwires, key=lambda t: t.severity_priority, reverse=True)

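    for tripwire in sorted_tripwires:
        # Check eval tier budget
        if tripwire.eval_tier == 0:
            # Tier 0: must complete in <100ms, no external deps
            result = evaluate_tier0_tripwire(trace, tripwire)
        elif tripwire.eval_tier == 1:
            # Tier 1: can use local DB, target <300ms
            result = evaluate_tier1_tripwire(trace, tripwire)
        else:
            # Only Tier 0 and Tier 1 are defined; fail loudly instead of
            # reusing a stale (or undefined) result from an earlier iteration
            raise ValueError(f"Unsupported eval tier: {tripwire.eval_tier}")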

        if result.triggered:
            return TripwireResult(
                triggered=True,
                id=tripwire.id,
                severity=tripwire.severity,
                intervention=map_severity_to_intervention(tripwire.severity, trace.acl_tier),
                reason=tripwire.on_fail.reason
            )

    return TripwireResult(triggered=False)


def apply_thresholds(risk_score: float, thresholds: dict) -> str:
    """
    Map risk score to intervention decision.
    Lower threshold = stricter (triggers earlier).
    """
    if risk_score <= thresholds["ok"]:
        return "ok"
    elif risk_score <= thresholds["nudge"]:
        return "nudge"
    elif risk_score <= thresholds["escalate"]:
        return "escalate"
    elif risk_score <= thresholds["block"]:
        return "block"
    else:
        return "block"  # HALT is tripwire-only


def map_severity_to_intervention(severity: str, acl_tier: str) -> str:
    """
    Map tripwire severity to intervention based on ACL tier.
    Higher ACL tiers get stricter interventions.
    """
    acl_level = int(acl_tier.replace("ACL-", ""))

    if severity == "severe":
        return "halt"  # Always halt for severe
    elif severity == "critical":
        return "halt" if acl_level >= 3 else "block"
    else:  # standard
        return "block" if acl_level >= 3 else "escalate"  # ESCALATE if ACL ≤ 2

3.4.3 Evaluation State Machine

stateDiagram-v2
    [*] --> TripwireCheck: Trace Received

    TripwireCheck --> TripwireTriggered: Tripwire Fails
    TripwireCheck --> CTQCalculation: All Tripwires Pass

    TripwireTriggered --> IssueIntervention: Immediate Decision

    CTQCalculation --> ThresholdEvaluation: CTQ Score Computed

    ThresholdEvaluation --> TrustDebtCheck: Base Decision Made

    TrustDebtCheck --> FlagEvaluation: Debt Applied

    FlagEvaluation --> ReTieringCheck: Flag Decision Made

    ReTieringCheck --> IssueIntervention: Final Decision Ready
    ReTieringCheck --> QueueReTiering: High Trust Debt

    QueueReTiering --> IssueIntervention: Re-tier Queued

    IssueIntervention --> [*]: Intervention Sent

    note right of TripwireCheck
        Priority 1
        Runs BEFORE CTQ
        Can short-circuit
    end note

    note right of FlagEvaluation
        Orthogonal
        Combines with any decision
        Accumulates trust debt
    end note

3.4.4 Precedence Rules

  1. Tripwires have absolute priority: If any tripwire triggers, evaluation stops and the tripwire's intervention is returned immediately. CTQ is not calculated.

  2. The stricter of the ACL and blueprint thresholds applies: When both exist, the lower threshold wins, so a blueprint can tighten the ACL-tier thresholds but never relax them.

  3. Trust debt can only escalate, not relax: Trust debt may increase an intervention's severity but never decrease it.

  4. Flag is orthogonal: The flagged status can be added to ANY intervention (ok, nudge, escalate, block). It does not change the primary decision.

  5. HALT is tripwire-only: The halt intervention can only be issued by tripwires, never by threshold-based CTQ evaluation.
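
Rules 2 and 5 can be made concrete with a short worked example. The threshold values below are hypothetical and chosen only to show the effect of taking the per-level minimum; `effective_thresholds` and `decide` are illustrative helper names that mirror the reference pseudo-code.

```python
from typing import Dict

LEVELS = ("ok", "nudge", "escalate", "block")


def effective_thresholds(blueprint: Dict[str, float], acl: Dict[str, float]) -> Dict[str, float]:
    """Rule 2: per level, the stricter (lower) of the two thresholds applies."""
    return {level: min(blueprint[level], acl[level]) for level in LEVELS}


def decide(risk_score: float, thresholds: Dict[str, float]) -> str:
    """Map a risk score to a decision. Rule 5: 'halt' is never returned here."""
    for level in ("ok", "nudge", "escalate"):
        if risk_score <= thresholds[level]:
            return level
    return "block"


# Hypothetical values for illustration only.
blueprint_thresholds = {"ok": 0.30, "nudge": 0.50, "escalate": 0.70, "block": 0.90}
acl_thresholds = {"ok": 0.20, "nudge": 0.55, "escalate": 0.65, "block": 0.90}

effective = effective_thresholds(blueprint_thresholds, acl_thresholds)
# effective == {"ok": 0.20, "nudge": 0.50, "escalate": 0.65, "block": 0.90}

risk_score = 1.0 - 0.75                  # CTQ of 0.75 gives a risk score of 0.25
decision = decide(risk_score, effective)
```

With the blueprint thresholds alone, a risk score of 0.25 would map to `ok`; once the stricter ACL `ok` threshold of 0.20 is applied, the same score maps to `nudge`.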

3.5 Formal State Machine Specification

This section provides a formal state machine for agent governance states. Implementations SHOULD use this as the reference for state transitions.

3.5.1 Agent Governance States

┌─────────────────────────────────────────────────────────────────────┐
│                    AGENT GOVERNANCE STATE MACHINE                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────┐                                                       │
│  │ INACTIVE │ ─── register() ───► ┌──────────┐                      │
│  └──────────┘                     │  NORMAL  │◄─────────────────┐   │
│       ▲                           └────┬─────┘                   │  │
│       │                                │                         │  │
│   unregister()                    flag_intervention()            │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐                   │  │
│       │                           │ FLAGGED  │                   │  │
│       │                           └────┬─────┘                   │  │
│       │                                │                         │  │
│       │                      debt > elevated_threshold           │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐    review_passed  │  │
│       │                           │ ELEVATED │ ──────────────────┤  │
│       │                           └────┬─────┘                   │  │
│       │                                │                         │  │
│       │                      debt > retier_threshold             │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐                   │  │
│       │                           │ RE-TIER  │ ── approved ──► NORMAL
│       │                           │ PENDING  │                   │  │
│       │                           └────┬─────┘                   │  │
│       │                                │                         │  │
│       │                           acl_upgraded                   │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐                   │  │
│       │                           │ UPGRADED │ ─── reset ────────┘  │
│       │                           │ (higher ACL)                    │
│       │                           └──────────┘                      │
│       │                                                             │
│  ┌────┴─────────────────────────────────────────────────────────┐   │
│  │                     TERMINAL STATES                           │  │
│  │                                                               │  │
│  │   ┌──────────┐          ┌──────────┐                         │  │
│  │   │ BLOCKED  │◄── block │ Any state │                        │  │
│  │   └────┬─────┘          └──────────┘                         │  │
│  │        │                                                      │  │
│  │   unblock(manual)                                             │  │
│  │        │                                                      │  │
│  │        └────────────────────────────────► NORMAL              │  │
│  │                                                               │  │
│  │   ┌──────────┐          ┌──────────┐                         │  │
│  │   │  HALTED  │◄── halt  │ Any state │                        │  │
│  │   └──────────┘          └──────────┘                         │  │
│  │        │                                                      │  │
│  │   Manual restart required (terminal)                          │  │
│  │                                                               │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

3.5.2 State Definitions

State     Description                 Allowed Transitions
INACTIVE  Agent not registered        → NORMAL (register)
NORMAL    Operating normally          → FLAGGED, BLOCKED, HALTED
FLAGGED   Trust debt accumulating     → NORMAL (decay), ELEVATED, BLOCKED, HALTED
ELEVATED  Under increased scrutiny    → NORMAL (review), RE-TIER, BLOCKED, HALTED
RE-TIER   Pending ACL re-evaluation   → UPGRADED, NORMAL (denied), HALTED
UPGRADED  ACL tier increased          → NORMAL (reset)
BLOCKED   Temporarily suspended       → NORMAL (unblock)
HALTED    Permanently stopped         Terminal (manual restart)

3.5.3 Transition Functions

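class InvalidTransitionError(Exception):
    """Raised when an event is not valid in the current state."""


class AgentGovernanceState: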
    """Formal state machine for agent governance."""

    def __init__(self):
        self.state = "INACTIVE"
        self.trust_debt = 0.0
        self.acl_tier = None

    def transition(self, event: str, context: dict) -> str:
        """
        Execute state transition based on event.

        Returns:
            New state after transition
        """
        transitions = {
            ("INACTIVE", "register"): self._register,
            ("NORMAL", "flag"): self._flag,
            ("NORMAL", "block"): lambda c: "BLOCKED",
            ("NORMAL", "halt"): lambda c: "HALTED",
            ("FLAGGED", "evaluate"): self._evaluate_flagged,
            ("FLAGGED", "block"): lambda c: "BLOCKED",
            ("FLAGGED", "halt"): lambda c: "HALTED",
            ("ELEVATED", "review_passed"): lambda c: "NORMAL",
            ("ELEVATED", "retier_triggered"): lambda c: "RE-TIER",
            ("ELEVATED", "halt"): lambda c: "HALTED",
            ("RE-TIER", "approved"): lambda c: "NORMAL",
            ("RE-TIER", "upgraded"): self._upgrade,
            ("RE-TIER", "halt"): lambda c: "HALTED",
            ("BLOCKED", "unblock"): lambda c: "NORMAL",
            ("UPGRADED", "reset"): self._reset_after_upgrade,
        }

        key = (self.state, event)
        if key in transitions:
            self.state = transitions[key](context)
        else:
            raise InvalidTransitionError(f"No transition for {key}")

        return self.state

    def _flag(self, context):
        self.trust_debt += context.get("debt_delta", 0.1)
        return "FLAGGED"

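    def _evaluate_flagged(self, context):
        if self.trust_debt > context.get("retier_threshold", 0.75):
            return "RE-TIER"
        elif self.trust_debt > context.get("elevated_threshold", 0.5):
            return "ELEVATED"
        elif self.trust_debt < context.get("normal_threshold", 0.3):
            return "NORMAL"
        return "FLAGGED"

    # The helpers below are referenced by the transition table in
    # transition(). Their bodies and the context keys ("acl_tier",
    # "new_acl_tier") are illustrative assumptions, not normative behavior.
    def _register(self, context):
        self.acl_tier = context.get("acl_tier")
        return "NORMAL"

    def _upgrade(self, context):
        self.acl_tier = context.get("new_acl_tier", self.acl_tier)
        return "UPGRADED"

    def _reset_after_upgrade(self, context):
        # Assumed: trust debt starts over once the new tier is in effect
        self.trust_debt = 0.0
        return "NORMAL"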

3.5.4 Formal Verification Note

For safety-critical implementations, consider formal verification using:

  • TLA+: For distributed consensus properties
  • Alloy: For state invariant checking
  • Spin/Promela: For temporal logic verification

Example invariants to verify:

  • An agent in HALTED state cannot transition to any other state
  • Trust debt is monotonically non-decreasing within a session (before decay)
  • BLOCKED agents cannot issue new actions
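
Before reaching for a model checker, some structural invariants can be checked with a few lines of ordinary code. The sketch below is not formal verification: it transcribes the allowed transitions from the state table in 3.5.2 and checks that HALTED is terminal, that every target is a known state, and that every state is reachable from INACTIVE. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Allowed transitions, transcribed from the state table in 3.5.2.
ALLOWED: Dict[str, Set[str]] = {
    "INACTIVE": {"NORMAL"},
    "NORMAL": {"FLAGGED", "BLOCKED", "HALTED"},
    "FLAGGED": {"NORMAL", "ELEVATED", "BLOCKED", "HALTED"},
    "ELEVATED": {"NORMAL", "RE-TIER", "BLOCKED", "HALTED"},
    "RE-TIER": {"UPGRADED", "NORMAL", "HALTED"},
    "UPGRADED": {"NORMAL"},
    "BLOCKED": {"NORMAL"},
    "HALTED": set(),
}


def reachable_from(start: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search over the transition graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_invariants(graph: Dict[str, Set[str]]) -> List[str]:
    """Return a list of human-readable violations (empty if all checks pass)."""
    violations: List[str] = []
    if graph.get("HALTED"):
        violations.append("HALTED must have no outgoing transitions")
    for state, targets in graph.items():
        for target in sorted(targets - set(graph)):
            violations.append(f"{state} -> {target}: unknown target state")
    for state in sorted(set(graph) - reachable_from("INACTIVE", graph)):
        violations.append(f"{state} is unreachable from INACTIVE")
    return violations
```

Checks like these catch table-level mistakes cheaply; temporal properties such as the trust debt invariant still need one of the tools listed above.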


4. Data Flow Architecture

4.1 Write Path (Trace Processing)

flowchart LR
    subgraph "Ingestion"
        T[Cognitive Trace] --> V[Validation]
        V --> VN[Version Check]
        VN --> Q[Message Queue]
    end

    subgraph "Processing"
        Q --> GS[Governance Steward]
        GS --> TW[Tripwire Check]
        TW --> CTQ[CTQ Calculation]
        CTQ --> DEC[Decision Logic]
    end

    subgraph "Storage"
        DEC --> WB[Write Buffer]
        WB --> RDB[(ReflectionDB)]
        WB --> TS[(Time Series)]
    end

    subgraph "Response"
        DEC --> INT[Intervention]
        INT --> PA[Operating Agent]
    end

    style T fill:#d9f99d
    style TW fill:#fef3c7
    style INT fill:#fca5a5
    style RDB fill:#fed7aa

4.2 Read Path (Query & Analytics)

flowchart LR
    subgraph "Query Layer"
        API[Query API] --> CACHE[Query Cache]
        CACHE --> QE[Query Engine]
    end

    subgraph "Storage"
        QE --> RDB[(ReflectionDB)]
        QE --> IDX[(Indexes)]
        QE --> AGG[(Aggregates)]
    end

    subgraph "Consumers"
        API --> DASH[Dashboard]
        API --> AUDIT[Audit Tools]
        API --> ANALYTICS[Analytics]
    end

5. Deployment Topologies

5.1 Single Agent Governance (Simple)

Deployment: Sidecar Pattern
Components:
  - 1 Operating Agent
  - 1 Governance Steward (sidecar)
  - 1 Policy Engine (embedded)
  - 1 ReflectionDB (local SQLite)

Use Cases:
  - Development/Testing
  - Low-risk applications (ACL-0, ACL-1)
  - Edge deployments

Retry Policy:
  - Max attempts: 3
  - Backoff: exponential (100ms base)
  - Timeout: 500ms per attempt

5.2 Multi-Agent with Shared Governance (Standard)

Deployment: Service Mesh Pattern
Components:
  - N Operating Agents
  - N Governance Stewards (1:1 with agents)
  - 1 Shared Policy Engine (service)
  - 1 Shared ReflectionDB (PostgreSQL/MongoDB)
  - 1 Blueprint Store (git-backed)

Use Cases:
  - Enterprise deployments (ACL-2, ACL-3)
  - Multi-tenant SaaS
  - Microservices architecture

Retry Policy:
  - Max attempts: 3
  - Backoff: exponential (100ms base)
  - Timeout: 500ms per attempt
  - Circuit breaker: 5 failures → open

5.3 Distributed Steward Network (Advanced)

Deployment: Federated Pattern
Components:
  - N Operating Agents (across regions)
  - M Governance Stewards (M < N, pooled)
  - Regional Policy Engines
  - Distributed ReflectionDB (Cassandra/CockroachDB)
  - Replicated Blueprint Stores
  - Cross-region Steward coordination

Use Cases:
  - Global deployments (ACL-4, ACL-5)
  - High availability requirements
  - Regulatory compliance (data residency)

Retry Policy:
  - Max attempts: 3
  - Backoff: exponential (100ms base)
  - Regional timeout: 300ms
  - Cross-region timeout: 1000ms
  - Circuit breaker: per-region

5.4 Reference Minimal Architecture (ACGP-MIN-A1)

The ACGP-MIN-A1 architecture is the simplest conformant deployment, suitable for development and POC.

┌─────────────────────────────────────────────────────────────┐
│                      ACGP-MIN-A1                             │
│                                                             │
│  ┌─────────────┐       ┌─────────────────────────────────┐ │
│  │  Operating  │       │      Governance Steward         │ │
│  │   Agent     │──────►│  ┌─────────────────────────┐   │ │
│  │             │       │  │ Policy Engine (embedded)│   │ │
│  │  ACL-0/1    │◄──────│  └─────────────────────────┘   │ │
│  └─────────────┘       │  ┌─────────────────────────┐   │ │
│                        │  │ ReflectionDB (SQLite)   │   │ │
│                        │  └─────────────────────────┘   │ │
│                        │  ┌─────────────────────────┐   │ │
│                        │  │ Blueprint (local YAML)  │   │ │
│                        │  └─────────────────────────┘   │ │
│                        └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Components:

  • 1 Operating Agent (ACL-0 or ACL-1)
  • 1 Governance Steward (sidecar or co-located process)
  • Embedded Policy Engine (in-process)
  • Local ReflectionDB (SQLite file)
  • Local Blueprint (YAML file)

NOT Included:

  • Certified Source Registry
  • MCP/A2A adapters
  • Distributed storage
  • HITL system
  • Governance contracts

Spec Requirements Satisfied:

Requirement ACGP-MIN-A1
Core Protocol (ACGP-1000)
Version Negotiation (ACGP-1003)
Basic Interventions (OK, Block)
Full Interventions (all 6) Optional
Tripwires (Standard)
CTQ Calculation Simplified
ReflectionDB Retention Session
Security (TLS) Optional

Example Deployment:

# docker-compose.yml for ACGP-MIN-A1
version: "3.8"
services:
  agent:
    image: myorg/my-agent:latest
    environment:
      - ACGP_STEWARD_URL=http://steward:8080
      - ACGP_ACL_TIER=ACL-1

  steward:
    image: acgp/steward-minimal:latest
    volumes:
      - ./blueprints:/blueprints
      - ./data:/data
    environment:
      - ACGP_BLUEPRINT_PATH=/blueprints/dev.yaml
      - ACGP_REFLECTIONDB_PATH=/data/reflection.db

5.5 Deployment Decision Matrix

Factor               Single Agent  Multi-Agent Shared  Distributed Network
Latency              <10ms         <50ms               <150ms
Throughput           <100 req/s    <10K req/s          >100K req/s
Availability         99%           99.9%               99.99%
Complexity           Low           Medium              High
Cost                 $             $$                  $$$
Governance Strength  Basic         Standard            Maximum
Retry Overhead       Minimal       Low                 Moderate

6. Scaling Considerations

6.1 Horizontal Scaling

6.1.1 Stateless Components

Components that can scale horizontally without coordination:

  • Governance Stewards (with session affinity)
  • Policy Engines (read-only blueprint access)
  • API Gateways
  • Query services

6.1.2 Stateful Components

Components requiring careful scaling strategies:

  • ReflectionDB (sharding by agent_id or time)
  • Trust Debt stores (consistent hashing)
  • Blueprint Store (eventual consistency acceptable)
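
Consistent hashing for the Trust Debt stores could look like the following sketch. The `HashRing` class, the store names, the use of SHA-256, and the count of 64 virtual nodes per store are assumptions for illustration, not part of this specification.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash64(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping agent IDs to store nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        # Each node is placed at several positions to even out the load
        ring: List[Tuple[int, str]] = [
            (_hash64(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        ]
        if not ring:
            raise ValueError("at least one node is required")
        ring.sort()
        self._positions = [position for position, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, agent_id: str) -> str:
        """Return the first node clockwise from the agent's ring position."""
        idx = bisect.bisect_right(self._positions, _hash64(agent_id))
        return self._owners[idx % len(self._owners)]


ring = HashRing(["debt-store-a", "debt-store-b", "debt-store-c"])
owner = ring.node_for("agent-42")  # stable for as long as the node set is unchanged
```

The property that matters for scaling is that adding a store only moves the agents that land on the new store; every other agent keeps its existing owner, so most trust debt records stay where they are.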

6.2 Performance Optimization

Caching Strategy:
  L1_Cache:
    - Location: Agent Wrapper
    - Contents: Recent interventions, version info
    - TTL: 60 seconds

  L2_Cache:
    - Location: Governance Steward
    - Contents: Blueprint resolutions, CTQ calculations
    - TTL: 300 seconds

  L3_Cache:
    - Location: Policy Engine
    - Contents: Compiled blueprints, threshold tables
    - TTL: 3600 seconds

Batching:
  - Trace batching: Up to 10 traces per request
  - Write batching: 100ms window for ReflectionDB
  - Query batching: GraphQL-style query aggregation

Retry Optimization:
  - Adaptive timeout based on historical latency
  - Jitter to prevent thundering herd
  - Per-destination tripwires
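
The three cache levels differ mainly in location and TTL, so a single small time-to-live cache can illustrate all of them. The following is a sketch only: `TTLCache` and its injectable `clock` parameter are names invented for this example, and eviction by size is deliberately left out.

```python
import time


class TTLCache:
    """Tiny time-to-live cache; entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry lazily on read
            return default
        return value


# TTLs taken from the caching strategy in this section.
l1_cache = TTLCache(ttl_seconds=60)    # Agent Wrapper
l2_cache = TTLCache(ttl_seconds=300)   # Governance Steward
l3_cache = TTLCache(ttl_seconds=3600)  # Policy Engine
```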

6.3 Load Distribution

graph TB
    subgraph "Load Balancer"
        LB[L7 Load Balancer<br/>with Health Checks]
    end

    subgraph "Governance Tier 1 (ACL 0-2)"
        GS1[Steward 1<br/>CPU: 4<br/>RAM: 8GB]
        GS2[Steward 2<br/>CPU: 4<br/>RAM: 8GB]
    end

    subgraph "Governance Tier 2 (ACL 3-5)"
        GS3[Steward 3<br/>CPU: 8<br/>RAM: 16GB]
        GS4[Steward 4<br/>CPU: 8<br/>RAM: 16GB]
    end

    LB -->|Low ACL agents| GS1
    LB -->|Low ACL agents| GS2
    LB -->|High ACL agents| GS3
    LB -->|High ACL agents| GS4

    GS1 -.->|Health Check| LB
    GS2 -.->|Health Check| LB
    GS3 -.->|Health Check| LB
    GS4 -.->|Health Check| LB

7. Integration Patterns

7.1 Agent Framework Integration

7.1.1 Wrapper Pattern

# One-line integration example
from ACGP import GovernanceWrapper

agent = create_langchain_agent()
governed_agent = GovernanceWrapper(
    agent, 
    blueprint="finance/trading",
    retry_policy={
        'max_attempts': 3,
        'timeout_ms': 500,
        'backoff': 'exponential'
    }
)
governed_agent.run()  # All actions now governed

7.1.2 Middleware Pattern

# Middleware for existing frameworks
# RetryPolicy, generate_trace, BlockedException, and escalate_for_review
# are assumed to be defined elsewhere; the steward client is injected.
class ACGPMiddleware:
    def __init__(self, steward, retry_policy=None):
        self.version = "1.0.2"
        self.steward = steward
        self.retry_policy = retry_policy or RetryPolicy(max_attempts=3)

    async def before_tool_call(self, tool, args):
        # Build the trace once; only the evaluation call is retried.
        trace = generate_trace(tool, args)
        for attempt in range(self.retry_policy.max_attempts):
            try:
                intervention = await self.steward.evaluate(
                    trace,
                    timeout_ms=500
                )
            except TimeoutError:
                if attempt < self.retry_policy.max_attempts - 1:
                    await self.retry_policy.backoff(attempt)
                    continue
                # Retries exhausted: hand the call off for review.
                return await self.escalate_for_review(tool, args)

            if intervention.decision == "block":
                raise BlockedException(intervention.reason)

            return intervention.modified_args or args

7.2 External Protocol Integration

7.2.1 MCP (Model Context Protocol)

sequenceDiagram
    participant Agent
    participant ACGP
    participant MCP
    participant Tool

    Agent->>ACGP: Request tool use
    ACGP->>ACGP: Evaluate governance (check tripwires)

    alt Approved
        ACGP->>MCP: Forward request
        MCP->>Tool: Execute
        Tool-->>MCP: Result
        MCP-->>ACGP: Response
        ACGP-->>Agent: Tool result
    else Blocked
        ACGP-->>Agent: Blocked + reason
    end

7.2.2 A2A (Agent-to-Agent Protocol)

sequenceDiagram
    participant A1 as Agent 1
    participant GS1 as Steward 1
    participant A2A
    participant GS2 as Steward 2
    participant A2 as Agent 2

    A1->>GS1: Prepare message
    GS1->>GS1: Validate outbound (check tripwires)

    alt Allowed
        GS1->>A2A: Send message
        A2A->>GS2: Deliver
        GS2->>GS2: Validate inbound

        alt Accepted
            GS2->>A2: Deliver message
        else Rejected
            GS2->>A2A: Reject
            A2A->>A1: Message rejected
        end
    else Blocked
        GS1->>A1: Outbound blocked
    end

8. High Availability & Resilience

8.1 Failure Modes & Recovery

| Component | Failure Mode | Recovery Strategy | Degraded Operation |
|---|---|---|---|
| Governance Steward | Process crash | Auto-restart, session migration | Failover to backup steward |
| Policy Engine | Unavailable | Tripwire, cached policies | Use last known good policy |
| ReflectionDB | Write failure | Write-ahead log, retry queue | Buffer writes locally (max 1000) |
| Blueprint Store | Unavailable | Local blueprint cache | No policy updates |
| Trusted Monitor | Timeout | Async processing, skip | Proceed without anomaly check |
| Version Service | Unavailable | Use cached version info | Assume compatible |
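
The ReflectionDB row calls for buffering writes locally with a cap of 1000 events. A minimal sketch of such a buffer follows. The class `LocalWriteBuffer` is hypothetical, and the overflow behavior (refusing new events once the cap is reached) is an assumption of this example, since the table does not say what happens when the buffer is full.

```python
from collections import deque


class LocalWriteBuffer:
    """Holds ReflectionDB writes while the database is unavailable."""

    def __init__(self, max_events=1000):
        self._max_events = max_events
        self._pending = deque()

    def add(self, event):
        """Buffer an event; return False when the buffer is full.

        Assumption: a full buffer refuses new events so the caller can
        escalate, rather than silently dropping audit records.
        """
        if len(self._pending) >= self._max_events:
            return False
        self._pending.append(event)
        return True

    def flush(self, write):
        """Replay buffered events in order through write(event).

        Stops at the first failure and keeps the remaining events queued.
        Returns the number of events written.
        """
        written = 0
        while self._pending:
            try:
                write(self._pending[0])
            except Exception:
                break
            self._pending.popleft()
            written += 1
        return written
```

Replaying in arrival order and stopping at the first failed write keeps the audit trail ordered when the database recovers only partially.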

8.2 Tripwire Configuration

tripwire:
  failure_threshold: 5         # failures to open circuit
  success_threshold: 2         # successes to close circuit
  timeout: 30s                 # time before half-open

  fallback:
    steward: use_cached_decision
    policy_engine: use_default_thresholds
    trusted_monitor: skip_validation
    reflection_db: buffer_locally
    version_negotiation: assume_compatible

  per_component:
    governance_steward:
      failure_threshold: 3
      timeout: 15s
    reflection_db:
      failure_threshold: 10
      timeout: 60s
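
The comments in this configuration describe a circuit that opens after repeated failures, becomes half-open after a timeout, and closes again after enough successes. A minimal sketch of that state machine is shown here. The class name `CircuitBreaker`, the method names, and the choice to count consecutive failures are assumptions of this example rather than normative behavior.

```python
import time


class CircuitBreaker:
    """Closed -> open -> half-open state machine driven by call outcomes."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_s = timeout_s
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self):
        """Return True if a call may proceed; False means use the fallback."""
        if self.state == "open":
            if self._clock() - self._opened_at >= self.timeout_s:
                self.state = "half_open"  # let trial requests through
                self._successes = 0
            else:
                return False
        return True

    def record_success(self):
        if self.state == "half_open":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # the consecutive-failure counter resets

    def record_failure(self):
        if self.state == "half_open":
            self._trip()  # any failure while probing re-opens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0


# Per-component override for the governance steward from the configuration.
steward_breaker = CircuitBreaker(failure_threshold=3, timeout_s=15.0)
```

When `allow_request()` returns False, the caller would apply the configured fallback for that component, for example `use_cached_decision` for the steward.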

8.3 Tripwires and Evaluation Tiers

Relationship: Tripwires (policy constraints) are executed within Evaluation Tiers (architectural patterns).

Tripwires define WHAT to check. Evaluation Tiers (see ACGP-1010) define HOW and WHEN to check.

8.3.1 Classifying Tripwires by Evaluation Tier

Eval Tier 0 Tripwires (must be <100ms P99, no external dependencies):

  • Authentication failures
  • Schema validation
  • Critical safety limits (e.g., "never delete production database")
  • Hard monetary limits for immediate rejection
  • In-memory rate limiting

Example:

tripwires:
  - name: "max_single_transaction"
    threshold: 10000
    eval_tier: 0
    latency_budget_ms: 10
    check_type: "in_memory"
    fail_mode: "closed"

Eval Tier 1 Tripwires (may be slower, can use local DB):

  • Rate limiting with external state (Redis lookup)
  • Daily/monthly aggregate limits (requires DB query)
  • Stateful pattern checks
  • Cached policy decisions

Example:

tripwires:
  - name: "daily_transaction_limit"
    threshold: 50000
    eval_tier: 1
    latency_budget_ms: 100
    check_type: "db_lookup"
    requires_state: true
    fail_mode: "configurable"

8.3.2 Decision Matrix for Tripwire Classification

| Tripwire Characteristic | Suggested Eval Tier |
|---|---|
| No external dependencies | Tier 0 |
| Latency < 10ms | Tier 0 |
| Critical safety (can't fail open) | Tier 0 |
| Requires DB/cache lookup | Tier 1 |
| Aggregate/windowed limit | Tier 1 |
| Complex calculation | Tier 1 |
| LLM-based evaluation | Tier 2 |
| Human review | Tier 3 |
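
The matrix can be read as a simple decision function. The sketch below is one possible encoding: `TripwireTraits` and its field names are hypothetical, and the rule that the highest matching tier wins when several characteristics apply is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class TripwireTraits:
    """Hypothetical description of a tripwire, used only for this sketch."""
    needs_human_review: bool = False
    uses_llm: bool = False
    needs_db_or_cache: bool = False
    is_aggregate_limit: bool = False
    is_complex_calculation: bool = False


def suggest_eval_tier(traits):
    """Return the suggested evaluation tier (0-3) for a tripwire."""
    if traits.needs_human_review:
        return 3
    if traits.uses_llm:
        return 2
    if (traits.needs_db_or_cache or traits.is_aggregate_limit
            or traits.is_complex_calculation):
        return 1
    # No external dependencies: eligible for in-memory Tier 0 evaluation.
    return 0


# A daily aggregate limit needs stored state, so it lands in Tier 1.
daily_limit_tier = suggest_eval_tier(TripwireTraits(is_aggregate_limit=True))
```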

8.3.3 Implementation Guidance

Tier 0 Implementation (REQUIRED for ACGP-1010 conformance):

class Tier0Tripwires:
    def __init__(self, config):
        # Load only the Tier 0 tripwires into memory.
        self.tripwires = [
            t for t in config.tripwires
            if t.eval_tier == 0
        ]
        # MUST be fast and local. Raise explicitly rather than assert,
        # because assert statements are skipped when Python runs with -O.
        if any(t.requires_external for t in self.tripwires):
            raise ValueError("Tier 0 tripwires must not require external calls")

    def evaluate(self, request):
        """MUST complete in <100ms (P99)."""
        for tripwire in self.tripwires:
            if tripwire.triggered(request):
                return TripwireResult.BLOCK
        return TripwireResult.PASS

Tier 1 Implementation (typical):

class Tier1Tripwires:
    def __init__(self, config, redis_client):
        self.tripwires = [
            t for t in config.tripwires 
            if t.eval_tier == 1
        ]
        self.redis = redis_client  # Local cache/DB allowed

    async def evaluate(self, request):
        """Target <300ms, may query local DB."""
        for tripwire in self.tripwires:
            if tripwire.requires_state:
                state = await self.redis.get(tripwire.state_key)
                if tripwire.triggered_with_state(request, state):
                    return TripwireResult.BLOCK
            elif tripwire.triggered(request):
                return TripwireResult.BLOCK
        return TripwireResult.PASS

8.4 Retry Policy Implementation

import asyncio
import random


class RetryExhaustedError(Exception):
    """Raised when every retry attempt has timed out."""


class RetryPolicy:
    def __init__(self, max_attempts=3, base_delay_ms=100,
                 max_delay_ms=5000, timeout_ms=500):
        self.max_attempts = max_attempts
        self.base_delay_ms = base_delay_ms
        self.max_delay_ms = max_delay_ms
        self.timeout_ms = timeout_ms

    async def execute_with_retry(self, operation):
        """Execute operation with exponential backoff retry.

        `operation` is a zero-argument callable that returns an awaitable.
        """
        for attempt in range(self.max_attempts):
            try:
                return await asyncio.wait_for(
                    operation(),
                    timeout=self.timeout_ms / 1000
                )
            except asyncio.TimeoutError as exc:
                if attempt < self.max_attempts - 1:
                    # Double the delay each attempt, capped at max_delay_ms.
                    delay = min(
                        self.base_delay_ms * (2 ** attempt),
                        self.max_delay_ms
                    )
                    # Add jitter (±10%)
                    jitter = random.uniform(-0.1, 0.1) * delay
                    await asyncio.sleep((delay + jitter) / 1000)
                else:
                    # Final failure - escalate
                    raise RetryExhaustedError(
                        f"Failed after {self.max_attempts} attempts"
                    ) from exc
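
To make the delay schedule used by `execute_with_retry` concrete, the following sketch computes the backoff delays without jitter and the worst-case time spent before retries are exhausted. The helper names `backoff_delays_ms` and `worst_case_ms` are introduced here for illustration only.

```python
def backoff_delays_ms(max_attempts=3, base_delay_ms=100, max_delay_ms=5000):
    """Delay before each retry, without jitter; no delay follows the last attempt."""
    return [
        min(base_delay_ms * (2 ** attempt), max_delay_ms)
        for attempt in range(max_attempts - 1)
    ]


def worst_case_ms(max_attempts=3, timeout_ms=500, **delay_kwargs):
    """Upper bound when every attempt times out, excluding jitter."""
    return max_attempts * timeout_ms + sum(
        backoff_delays_ms(max_attempts, **delay_kwargs)
    )


print(backoff_delays_ms())                # [100, 200]
print(backoff_delays_ms(max_attempts=8))  # [100, 200, 400, 800, 1600, 3200, 5000]
print(worst_case_ms())                    # 1800
```

With the default policy, a caller can wait roughly 1.8 seconds (plus jitter) before `RetryExhaustedError` is raised, which is worth comparing against the latency budget of the request being retried.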

8.5 Graceful Degradation Levels

stateDiagram-v2
    [*] --> Normal: All systems operational

    Normal --> Degraded: Non-critical failure
    Degraded --> Essential: Multiple failures
    Essential --> Emergency: Critical failures
    Emergency --> Shutdown: Safety threshold exceeded

    Normal --> Normal: Self-healing
    Degraded --> Normal: Recovery
    Essential --> Degraded: Partial recovery

    note right of Degraded
        - Increased cache usage
        - Relaxed consistency
        - Async interventions
        - Skip version negotiation
    end note

    note right of Essential
        - Only critical interventions
        - Batch processing
        - Manual escalation
        - Cached policies only
    end note

    note right of Emergency
        - Block all high-risk
        - Force human review
        - Read-only mode
        - Emergency override active
    end note
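
The state diagram can be enforced in code as a transition table. This is a minimal sketch in which the `Level` enum and the `transition` helper are hypothetical names; the allowed edges are copied from the diagram, which as drawn has no recovery edge out of Emergency.

```python
from enum import Enum


class Level(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    ESSENTIAL = "essential"
    EMERGENCY = "emergency"
    SHUTDOWN = "shutdown"


# Transitions copied from the state diagram; nothing else is permitted.
ALLOWED = {
    Level.NORMAL: {Level.NORMAL, Level.DEGRADED},
    Level.DEGRADED: {Level.NORMAL, Level.ESSENTIAL},
    Level.ESSENTIAL: {Level.DEGRADED, Level.EMERGENCY},
    Level.EMERGENCY: {Level.SHUTDOWN},
    Level.SHUTDOWN: set(),
}


def transition(current, target):
    """Return target if the diagram allows current -> target, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


level = transition(Level.NORMAL, Level.DEGRADED)   # non-critical failure
level = transition(level, Level.NORMAL)            # recovery
```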

8.6 Governance Contract Architecture Patterns

Stewards MAY adopt different architectural patterns based on cost, latency, and quality requirements. All patterns are conformant if they meet ACGP-1009 and ACGP-1010 requirements.

8.6.1 Pattern Comparison

| Pattern | Eval Tiers | Target Latency | Monthly Cost (est.) | Use Case |
|---|---|---|---|---|
| Rule-Only | 0, 1 | 100–300ms | $500 | High-volume transactional agents |
| Hybrid | 0, 1, 2 (async 3) | 100–300ms (5s async) | $2,000 | Balanced quality/performance |
| Max Quality | 0, 1, 2, 3 | 100ms–5s | $20,000 | Safety-critical, low-volume |

8.6.2 Rule-Only Pattern

Tiers Used: Eval-0 (static rules) + Eval-1 (cached/indexed DB lookups)

Architecture:

┌─────────────┐
│ Agent       │
└──────┬──────┘
       │ EVAL_REQUEST (100ms budget)
┌─────────────────────────────────────┐
│ Steward                             │
│ ┌─────────────┐   ┌──────────────┐ │
│ │ Eval-0      │   │ Eval-1       │ │
│ │ In-memory   │──▶│ Cache/Index  │ │
│ │ rules <50ms │   │ <250ms       │ │
│ └─────────────┘   └──────────────┘ │
└─────────────────────────────────────┘
       │ EVAL_RESPONSE (allow/deny)
┌─────────────┐
│ Agent       │
└─────────────┘

Fallback Strategy: deny (conservative)

Trade-offs:

  • Lowest cost, predictable latency
  • No LLM reasoning, limited context depth
  • Cannot handle novel/ambiguous actions

8.6.3 Hybrid Pattern

Tiers Used: Eval-0, 1 (sync) + Eval-2 (async model inference) + optional Eval-3 (human review)

Architecture:

┌─────────────┐
│ Agent       │
└──────┬──────┘
       │ EVAL_REQUEST (300ms budget)
┌──────────────────────────────────────────────────┐
│ Steward                                          │
│ ┌──────┐    ┌──────┐    ┌────────────────────┐   │
│ │ Tier │───▶│ Tier │───▶│ Tier 2 (async)     │   │
│ │  0   │    │  1   │    │ Model inference    │   │
│ └──────┘    └──────┘    │ Post-action        │   │
│                         └────────────────────┘   │
└──────────────────────────────────────────────────┘
       │ EVAL_RESPONSE (allow_and_log)
┌─────────────┐         (background)
│ Agent       │    ┌───▶ Model review
└─────────────┘    │     Human escalation (Tier 3)
                   │     Retroactive intervention
                   └─────

Fallback Strategy: allow_and_log (permissive with async audit)

Trade-offs:

  • Fast sync path, deep async reasoning
  • Cost-effective for most workflows
  • [WARNING] Risk window during async evaluation
  • [WARNING] Requires robust post-action intervention mechanisms

8.6.4 Max Quality Pattern

Tiers Used: All tiers (0, 1, 2, 3) synchronously

Architecture:

┌─────────────┐
│ Agent       │
└──────┬──────┘
       │ EVAL_REQUEST (5000ms budget, critical_risk)
┌───────────────────────────────────────────────────────┐
│ Steward                                               │
│ ┌──────┐   ┌──────┐   ┌──────┐   ┌────────────────┐ │
│ │ Tier │──▶│ Tier │──▶│ Tier │──▶│ Tier 3         │ │
│ │  0   │   │  1   │   │  2   │   │ Human review   │ │
│ │      │   │      │   │ LLM  │   │ (on-demand)    │ │
│ └──────┘   └──────┘   └──────┘   └────────────────┘ │
└───────────────────────────────────────────────────────┘
       │ EVAL_RESPONSE (wait for all tiers)
┌─────────────┐
│ Agent       │
└─────────────┘

Fallback Strategy: escalate (require human decision on timeout)

Trade-offs:

  • [YES] Maximum safety and quality
  • [YES] Full audit trail with human oversight
  • [NO] High cost (model + human labor)
  • [NO] High latency (up to 5s + human response time)

8.6.5 Pattern Selection Matrix

Choose pattern based on:

| Requirement | Rule-Only | Hybrid | Max Quality |
|---|---|---|---|
| Actions/sec > 100 | [YES] | [WARNING] | [NO] |
| Cost < $1k/month | [YES] | [WARNING] | [NO] |
| Latency < 300ms guaranteed | [YES] | [YES] | [NO] |
| Novel actions frequent | [NO] | [YES] | [YES] |
| Safety-critical domain | [NO] | [WARNING] | [YES] |
| Audit/compliance required | [WARNING] | [YES] | [YES] |

Deployment Note: Stewards MAY implement multiple patterns and select per-agent or per-action based on "risk_level" in governance contracts (ACGP-1010).


9. Performance Requirements

9.1 Unified Latency Model

End-to-End (E2E) Latency Definition: Measured from agent request submission to agent receipt of governance decision.

Components:

E2E Latency = Network(agent→steward) 
            + Protocol Overhead (parsing, validation)
            + Governance Evaluation  ← Largest component
            + Network(steward→agent)

9.2 Latency Targets by Risk Level

Risk-based latency budgets (see ACGP-1010 Governance Contracts):

| Component | Low Risk | Elevated Risk | Critical Risk |
|---|---|---|---|
| Network (round-trip) | 20ms | 20ms | 50ms |
| Protocol overhead | 30ms | 30ms | 50ms |
| Governance evaluation | 50ms | 250ms | 4900ms |
| TOTAL E2E (P99) | 100ms | 300ms | 5000ms |

Governance Evaluation Budget Allocation:

Low Risk (50ms total):

  • Eval Tier 0 (must-pass checks): 30ms
  • Eval Tier 1 (fast policy): 20ms
  • Eval Tier 2: Async only

Elevated Risk (250ms total):

  • Eval Tier 0: 50ms
  • Eval Tier 1: 200ms
  • Eval Tier 2: Async audit

Critical Risk (4900ms total):

  • Eval Tier 0: 50ms
  • Eval Tier 1: 200ms
  • Eval Tier 2: 4500ms (synchronous LLM analysis)
  • Eval Tier 3: Human time (separate)

Use the Latency Calculator to model budget allocation for your use case.
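
As a worked check of the budget arithmetic in this section, the following sketch sums the component budgets per risk level and reports how much of the governance evaluation budget remains after the synchronous tier allocations. The dictionary and function names are invented for this example; the numbers are copied from the tables and allocations in this section.

```python
# Budgets in milliseconds, copied from the tables in this section.
E2E_BUDGET_MS = {
    "low":      {"network": 20, "protocol": 30, "evaluation": 50},
    "elevated": {"network": 20, "protocol": 30, "evaluation": 250},
    "critical": {"network": 50, "protocol": 50, "evaluation": 4900},
}

# Synchronous evaluation tiers only; async tiers and human time are excluded.
SYNC_TIER_BUDGET_MS = {
    "low":      {0: 30, 1: 20},
    "elevated": {0: 50, 1: 200},
    "critical": {0: 50, 1: 200, 2: 4500},
}


def e2e_total_ms(risk):
    """Sum the component budgets for one risk level."""
    return sum(E2E_BUDGET_MS[risk].values())


def evaluation_headroom_ms(risk):
    """Evaluation budget left over after the synchronous tier allocations."""
    return E2E_BUDGET_MS[risk]["evaluation"] - sum(SYNC_TIER_BUDGET_MS[risk].values())


for risk in E2E_BUDGET_MS:
    print(risk, e2e_total_ms(risk), evaluation_headroom_ms(risk))
```

With these figures the end-to-end totals come to 100, 300, and 5000 ms. The low and elevated allocations consume their evaluation budgets exactly, while the critical-risk tier allocations leave 150 ms of the 4900 ms evaluation budget unassigned.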

9.3 Component-Level Latency Targets

| Operation | P50 | P95 | P99 | Notes |
|---|---|---|---|---|
| Version Negotiation | 10ms | 25ms | 50ms | One-time per connection |
| Eval Tier 0 (must-pass) | 20ms | 50ms | 100ms | REQUIRED <100ms |
| Eval Tier 1 (policy) | 50ms | 150ms | 300ms | Target, not requirement |
| Eval Tier 2 (LLM) | 2000ms | 5000ms | 10000ms | Async recommended |
| ReflectionDB Write | 10ms | 50ms | 200ms | Asynchronous |

9.4 Throughput Requirements

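| ACL Tier | Traces/sec per Agent | Batch Size | Queue Depth | Retry Budget |
|---|---|---|---|---|
| ACL-0 | 10 | 1 | 100 | 10% |
| ACL-1 | 50 | 5 | 500 | 10% |
| ACL-2 | 100 | 10 | 1000 | 15% |
| ACL-3 | 200 | 20 | 2000 | 15% |
| ACL-4 | 500 | 50 | 5000 | 20% |
| ACL-5 | 1000 | 100 | 10000 | 20% |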

9.5 Resource Requirements

Minimum_Requirements:
  Governance_Steward:
    CPU: 2 cores
    Memory: 4GB
    Network: 100Mbps
    Disk: 10GB SSD
    Connections: 1000 concurrent

  Policy_Engine:
    CPU: 4 cores
    Memory: 8GB
    Network: 1Gbps
    Disk: 20GB SSD
    Cache: 2GB Redis

  ReflectionDB:
    CPU: 8 cores
    Memory: 32GB
    Network: 10Gbps
    Disk: 1TB NVMe SSD
    IOPS: 10000
    Write_Buffer: 100MB

9.6 Observability Standards [NORMATIVE]

Implementations at Standard conformance level MUST expose the following metrics.

9.6.1 Required Metrics

Prometheus Format:

# Governance Evaluation Metrics
acgp_evaluation_total{agent_id, acl_tier, decision} counter
acgp_evaluation_latency_seconds{agent_id, acl_tier, eval_tier, quantile} summary
acgp_ctq_score{agent_id, acl_tier, metric} gauge

# Intervention Metrics
acgp_intervention_total{agent_id, decision, tripwire_id} counter
acgp_intervention_latency_seconds{decision, quantile} summary

# Trust Debt Metrics
acgp_trust_debt{agent_id} gauge
acgp_trust_debt_delta_total{agent_id, reason} counter

# Tripwire Metrics
acgp_tripwire_triggered_total{tripwire_id, severity, agent_id} counter
acgp_tripwire_latency_seconds{tripwire_id, eval_tier, quantile} summary

# System Health
acgp_steward_status{steward_id} gauge  # 0=down, 1=degraded, 2=normal
acgp_reflectiondb_write_latency_seconds{quantile} summary
acgp_reflectiondb_size_bytes gauge

9.6.2 Standard Metric Labels

| Label | Description | Values |
|---|---|---|
| agent_id | Unique agent identifier | UUID |
| acl_tier | Agent's ACL tier | ACL-0 through ACL-5 |
| decision | Intervention decision | ok, nudge, flag, escalate, block, halt |
| eval_tier | Evaluation tier | 0, 1, 2, 3 |
| tripwire_id | Tripwire identifier | string |
| severity | Tripwire severity | standard, critical, severe |
| quantile | Percentile bucket | 0.5, 0.9, 0.95, 0.99 |

9.6.3 Required Endpoints

Standard conformance implementations MUST expose:

endpoints:
  /metrics:
    format: prometheus
    auth: optional

  /health:
    format: json
    response:
      status: healthy|degraded|unhealthy
      components:
        policy_engine: ok|error
        reflectiondb: ok|error
        steward: ok|error

  /ready:
    format: json
    response:
      ready: true|false
      reason: string
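
A sketch of how a `/health` body matching this shape might be assembled from per-component states. The function name `health_response` is hypothetical, and the aggregation rule (all ok gives healthy, all error gives unhealthy, anything else gives degraded) is an assumption of this example because the endpoint definition does not specify one.

```python
import json


def health_response(components):
    """Build a /health body from per-component states ("ok" or "error").

    Assumption for this sketch: all ok -> healthy, all error -> unhealthy,
    anything in between -> degraded.
    """
    states = set(components.values())
    if states == {"ok"}:
        status = "healthy"
    elif states == {"error"}:
        status = "unhealthy"
    else:
        status = "degraded"
    return json.dumps({"status": status, "components": components})


body = health_response({
    "policy_engine": "ok",
    "reflectiondb": "error",
    "steward": "ok",
})
```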

9.6.4 Alerting Recommendations

alerts:
  - name: HighHaltRate
    expr: rate(acgp_intervention_total{decision="halt"}[5m]) > 0.01
    severity: critical

  - name: EvaluationLatencyHigh
    expr: acgp_evaluation_latency_seconds{quantile="0.99"} > 0.5
    severity: warning

  - name: TrustDebtCritical
    expr: acgp_trust_debt > 0.75
    severity: warning

  - name: StewardDegraded
    expr: acgp_steward_status < 2
    severity: warning

10. Security Architecture

10.1 Defense in Depth Layers

graph TB
    subgraph "Layer 1: Network Security"
        FW[Firewall]
        IDS[IDS/IPS]
        TLS[TLS 1.3]
    end

    subgraph "Layer 2: Authentication"
        VER[Version Auth]
        OAUTH[OAuth 2.0]
        MTLS[Mutual TLS]
    end

    subgraph "Layer 3: Authorization"
        RBAC[Role-Based Access]
        ABAC[Attribute-Based Access]
        MPA[Multi-Party Auth]
    end

    subgraph "Layer 4: Message Security"
        SIGN[ES256 Signatures]
        ENCRYPT[Encryption at Rest]
        CHECKSUM[SHA-256 Integrity]
    end

    subgraph "Layer 5: Audit & Monitoring"
        AUDIT[Audit Logging]
        SIEM[SIEM Integration]
        ALERT[Security Alerts]
    end

10.2 Zero Trust Architecture

zero_trust_principles:
  never_trust_always_verify:
    - Verify every transaction
    - Version check on every connection
    - No implicit trust based on network location
    - Continuous validation of security posture

  least_privilege_access:
    - Minimal permissions by default
    - Time-bound credential elevation
    - Regular permission audits

  assume_breach:
    - Comprehensive logging
    - Anomaly detection
    - Tripwire system
    - Incident response readiness

  verify_explicitly:
    - Multi-factor authentication
    - Device compliance checks
    - Risk-based access controls

10.3 Cryptographic Requirements

| Component | Requirement | Algorithm | Key Size |
|---|---|---|---|
| Transport | Encryption | TLS 1.3 | 2048-bit RSA / 256-bit ECC |
| Message Signing | Non-repudiation | ES256 (ECDSA) | 256-bit |
| Checksum | Integrity | SHA-256 | 256-bit |
| Storage | Encryption at rest | AES-256-GCM | 256-bit |
| Key Derivation | Key generation | PBKDF2 | 100,000 iterations |

Note: ES256 is standardized throughout ACGP. All implementations MUST use ES256 for message signing.
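
The checksum and key-derivation rows can be illustrated with the Python standard library alone. This is a sketch under stated assumptions: the function names are invented, the PBKDF2 hash is assumed to be HMAC-SHA-256 (the table names only PBKDF2), and ES256 signing is omitted because it requires a third-party cryptography library.

```python
import hashlib
import hmac
import os


def checksum(payload: bytes) -> str:
    """SHA-256 integrity checksum as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def verify_checksum(payload: bytes, expected_hex: str) -> bool:
    """Constant-time comparison avoids leaking how many characters matched."""
    return hmac.compare_digest(checksum(payload), expected_hex)


def derive_key(secret: bytes, salt: bytes) -> bytes:
    """Derive a 256-bit key with PBKDF2 at 100,000 iterations.

    Assumption: HMAC-SHA-256 as the underlying hash.
    """
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 100_000, dklen=32)


salt = os.urandom(16)  # a fresh random salt per derived key
key = derive_key(b"example-secret", salt)
digest = checksum(b'{"decision": "ok"}')
```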


11. Conformance Requirements

A conformant ACGP architecture implementation MUST:

11.1 Component Requirements

  • Implement all core components defined in Section 2
  • Support at least one deployment topology from Section 5
  • Meet the latency requirements in Sections 9.2 and 9.3 for the target ACL tier
  • Implement version negotiation as first step in protocol flow

11.2 Integration Requirements

  • Provide wrapper or middleware for at least one agent framework
  • Support the standard message formats defined in ACGP-1003
  • Implement the security requirements in Section 10
  • Use ES256 for all message signatures (ACL-3+)

11.3 Operational Requirements

  • Maintain an append-only audit trail in ReflectionDB
  • Support graceful degradation as defined in Section 8
  • Provide monitoring and alerting capabilities
  • Implement retry policy with exponential backoff

11.4 Scaling Requirements

  • Support horizontal scaling of stateless components
  • Implement caching strategy for performance optimization
  • Handle at least the throughput specified for the target ACL tier
  • Support tripwires for all external dependencies

11.5 Resilience Requirements

  • Implement retry policy (3 attempts, exponential backoff)
  • Support timeout handling (500ms default)
  • Maintain tripwires for failure isolation
  • Buffer writes locally during ReflectionDB unavailability (max 1000 events)

12. References

Normative References

  • ACGP-1000: Core Protocol Specification
  • ACGP-1001: Terminology and Definitions
  • ACGP-1003: Message Formats & Wire Protocol
  • ACGP-1004: Reflection Blueprint Specification
  • ACGP-1005: ARS-CTQ-ACL Integration Framework
  • ACGP-1007: Security Considerations
  • RFC 2119: Key words for use in RFCs

Informative References

  • NIST Cybersecurity Framework: Security architecture guidance
  • ISO 27001: Information security management
  • The Twelve-Factor App: Scalability principles
  • Google SRE Book: Reliability engineering practices
  • Circuit Breaker pattern: Martin Fowler's design patterns

End of ACGP-1002