ACGP-1002: Architecture Specification¶

Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1002
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)

Abstract¶

This document specifies the system architecture for the Agentic Cognitive Governance Protocol (ACGP). It defines the components, their interactions, deployment patterns, and scaling considerations. The architecture supports multiple deployment topologies from simple single-agent governance to complex multi-agent networks with distributed stewardship. This specification provides normative requirements for component interfaces, data flows, integration patterns, retry behavior, and version negotiation that enable interoperable ACGP implementations.

Table of Contents¶

Introduction
System Components
Component Interactions
Data Flow Architecture
Deployment Topologies
Scaling Considerations
Integration Patterns
High Availability & Resilience
Performance Requirements
Security Architecture
Conformance Requirements
References

1. Introduction¶

The ACGP architecture is designed around principles of separation of concerns, defense in depth, and runtime adaptability. It enables real-time governance without introducing prohibitive latency while maintaining comprehensive audit trails and supporting human oversight.

1.1 Design Principles¶

Separation of Concerns: Governance logic is separate from agent logic
Defense in Depth: Multiple layers of validation and intervention
Minimal Intrusion: One-line integration for existing agents
Scalability: Horizontal scaling of governance components
Resilience: Graceful degradation under failure conditions
Observability: Complete visibility into all decisions

1.2 Requirements Language¶

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2. System Components¶

2.1 Core Components¶

graph TB
    subgraph "Governance Layer"
        GS[Governance Steward]
        PE[Policy Engine]
        TM[Trusted Monitor]
        RE[Reflection Engine]
    end

    subgraph "Agent Layer"
        PA[Operating Agent]
        SA[Sub-Agents]
        AW[Agent Wrapper/SDK]
    end

    subgraph "Storage Layer"
        RDB[(ReflectionDB)]
        CSR[(Certified Source Registry)]
        BPS[(Blueprint Store)]
    end

    subgraph "Control Plane"
        HITL[Human-In-The-Loop]
        DASH[Monitoring Dashboard]
        ALERT[Alert Manager]
        CONFIG[Configuration Service]
    end

    subgraph "Integration Layer"
        MCP[MCP Adapter]
        A2A[A2A Adapter]
        API[REST/gRPC API]
    end

2.2 Component Descriptions¶

2.2.1 Governance Steward¶

Purpose: Primary governance enforcement agent
Responsibilities:
Observe Cognitive Traces from Operating Agents
Calculate CTQ scores using Reflection Engine
Issue Interventions (all six types: OK, Nudge, Escalate, Block, Halt, Flag)
Manage Trust Debt accumulation
Evaluate Tripwires (Standard, Critical, Severe)
Coordinate with other Stewards in a Steward Network

2.2.2 Policy Engine¶

Purpose: Decision engine for interventions
Responsibilities:
Apply ACL-tier specific thresholds (Risk Score based)
Evaluate tripwire conditions
Map CTQ scores to intervention decisions
Handle dynamic threshold adjustments
Support Flag orthogonality

2.2.3 Trusted Monitor¶

Purpose: Independent anomaly detection
Responsibilities:
Parallel evaluation of agent behavior
Pattern-based threat detection
Behavioral drift monitoring
Zero-trust verification of agent claims

2.2.4 Operating Agent¶

Purpose: The AI agent performing actual work
Responsibilities:
Execute tasks and use tools
Generate Cognitive Traces
Respond to Interventions
Manage sub-agent lifecycle

2.2.5 ReflectionDB¶

Purpose: Immutable audit trail
Requirements:
Append-only architecture
Cryptographic integrity (hash chain)
Time-series optimization
Compliance-grade retention
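
The append-only and hash-chain requirements can be illustrated with a minimal sketch, assuming SHA-256 over a canonical JSON encoding and an all-zero genesis hash. The names `HashChainLog`, `LogEntry`, and `GENESIS_HASH` are hypothetical and are not defined by this specification; a real ReflectionDB would also need durable storage and retention handling.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed sentinel used as the first entry's predecessor


@dataclass(frozen=True)
class LogEntry:
    index: int
    payload: Dict[str, Any]
    prev_hash: str
    entry_hash: str


def _digest(index: int, payload: Dict[str, Any], prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so the same payload always hashes identically.
    body = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class HashChainLog:
    """In-memory append-only log; entries are only ever added at the end."""

    entries: List[LogEntry] = field(default_factory=list)

    def append(self, payload: Dict[str, Any]) -> LogEntry:
        prev_hash = self.entries[-1].entry_hash if self.entries else GENESIS_HASH
        index = len(self.entries)
        entry = LogEntry(index, payload, prev_hash, _digest(index, payload, prev_hash))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that each entry links to its predecessor."""
        prev_hash = GENESIS_HASH
        for position, entry in enumerate(self.entries):
            if entry.index != position or entry.prev_hash != prev_hash:
                return False
            if entry.entry_hash != _digest(entry.index, entry.payload, entry.prev_hash):
                return False
            prev_hash = entry.entry_hash
        return True
```

Because each entry's hash covers its predecessor's hash, altering any stored payload makes `verify()` fail at that entry.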

2.2.6 Blueprint Store¶

Purpose: Centralized policy repository
Features:
Version control for blueprints
Multi-Party Authorization (MPA) for changes
Inheritance resolution
Hot-reload capability

3. Component Interactions¶

3.1 Primary Flow Sequence with Version Negotiation¶

sequenceDiagram
    participant PA as Operating Agent
    participant AW as Agent Wrapper
    participant GS as Governance Steward
    participant PE as Policy Engine
    participant TM as Trusted Monitor
    participant RDB as ReflectionDB

    Note over PA,GS: Initial Connection
    PA->>GS: VERSION_NEGOTIATION
    GS-->>PA: VERSION_SELECTED (1.0.2)

    Note over PA,RDB: Runtime Governance Loop
    PA->>AW: Execute Action
    AW->>AW: Generate Cognitive Trace

    par Parallel Processing
        AW->>GS: Send TRACE
        GS->>PE: Evaluate with Blueprint
    and
        AW->>TM: Send TRACE
        TM->>TM: Anomaly Detection
    end

    PE->>PE: Check Tripwires (Priority 1)

    alt Tripwire Triggered
        PE->>GS: Immediate Override Decision
        GS->>AW: INTERVENTION (Block/Halt)
    else No Tripwire
        PE->>PE: Calculate CTQ Score
        PE->>PE: Compute Risk Score (1.0 - CTQ)
        PE->>PE: Apply ACL Thresholds
        PE->>GS: Return Decision
    end

    TM-->>GS: Flag Anomalies (if any)

    GS->>GS: Reconcile Decisions
    GS->>GS: Check Flag Orthogonality

    alt Intervention = OK or NUDGE
        GS->>AW: Send INTERVENTION
        AW->>PA: Proceed (modified if NUDGE)
    else Intervention = ESCALATE
        GS->>HITL: Request Human Review
        HITL-->>GS: Human Decision
        GS->>AW: Forward Decision
    else Intervention = BLOCK or HALT
        GS->>AW: Send INTERVENTION
        AW->>PA: Stop Execution
        Note over PA: Action blocked or session terminated
    end

    alt Flagged
        GS->>GS: Update Trust Debt
        GS->>GS: Check Re-tier Threshold
    end

    GS->>RDB: Log Complete Event (TRACE + EVAL + INTERVENTION)

3.2 Trust Debt Accumulation Flow¶

stateDiagram-v2
    [*] --> Normal: Initial State

    Normal --> Flagged: Flag Intervention
    Flagged --> DebtAccumulation: Add Flag Weight (0.1/0.3/0.5)

    DebtAccumulation --> ThresholdCheck: Check Debt Level

    ThresholdCheck --> Normal: Debt < Warning
    ThresholdCheck --> Elevated: Debt > Warning
    ThresholdCheck --> Critical: Debt > Critical

    Elevated --> TightenedControl: Reduce Thresholds
    Critical --> ForcedReTier: Trigger ARS Re-evaluation

    ForcedReTier --> HigherACL: Increase ACL Tier
    HigherACL --> Normal: Reset with New Tier

    Normal --> DebtDecay: Time Passes (0.95 per 24h)
    Elevated --> DebtDecay: Time Passes
    DebtDecay --> Normal: Reduce Debt
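
The accumulation and decay arithmetic in the diagram can be sketched as follows. This assumes that the three flag weights correspond to increasing flag severities (the labels `low`, `medium`, and `high` are placeholders), that 0.95 is a multiplicative factor applied per 24 hours and interpolated for partial days, and that the warning and critical levels come from the blueprint. None of these names are defined by this specification.

```python
FLAG_WEIGHTS = {"low": 0.1, "medium": 0.3, "high": 0.5}  # severity labels are assumed
DECAY_FACTOR_PER_DAY = 0.95  # assumed to be multiplicative per 24 hours


def add_flag(debt: float, severity: str) -> float:
    """Return the trust debt after one Flag intervention of the given severity."""
    return debt + FLAG_WEIGHTS[severity]


def decay(debt: float, elapsed_hours: float) -> float:
    """Apply exponential decay: debt * 0.95 ** (elapsed_hours / 24)."""
    return debt * DECAY_FACTOR_PER_DAY ** (elapsed_hours / 24.0)


def classify(debt: float, warning: float, critical: float) -> str:
    """Map a debt level to the diagram's Normal / Elevated / Critical branches."""
    if debt > critical:
        return "Critical"
    if debt > warning:
        return "Elevated"
    return "Normal"
```

Under these assumptions, a debt of 0.8 decays to 0.8 × 0.95 = 0.76 after one day and to roughly 0.722 after two days.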

3.3 Retry and Timeout Behavior¶

sequenceDiagram
    participant Agent
    participant Steward
    participant Network

    Agent->>Network: Send TRACE

    alt Success
        Network->>Steward: Deliver
        Steward-->>Agent: INTERVENTION
    else Timeout (500ms)
        Note over Agent: Attempt 1 Failed
        Agent->>Agent: Wait 100ms + jitter
        Agent->>Network: Retry TRACE

        alt Success on Retry
            Network->>Steward: Deliver
            Steward-->>Agent: INTERVENTION
        else Timeout Again
            Note over Agent: Attempt 2 Failed
            Agent->>Agent: Wait 200ms + jitter
            Agent->>Network: Final Retry

            alt Success on Final Retry
                Network->>Steward: Deliver
                Steward-->>Agent: INTERVENTION
            else Final Timeout
                Note over Agent: All Retries Exhausted
                Agent->>Agent: Escalate for Manual Review
            end
        end
    end
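
The retry sequence above can be sketched as a small helper. The `send` callable, the `TraceTimeout` exception, the 25 ms jitter ceiling, and the shape of the fallback result are assumptions made for illustration; the diagram fixes only the 500 ms timeout, the 100 ms and 200 ms waits, and the escalation after the final timeout.

```python
import random
import time
from typing import Any, Callable, Dict, List


class TraceTimeout(Exception):
    """Raised by the (hypothetical) transport when no INTERVENTION arrives in time."""


def backoff_delays(max_attempts: int = 3, base_ms: int = 100) -> List[int]:
    """Delays before each retry: base * 2^k, giving [100, 200] for 3 attempts."""
    return [base_ms * 2 ** k for k in range(max_attempts - 1)]


def send_with_retry(
    send: Callable[[Any, int], Dict[str, Any]],
    trace: Any,
    *,
    max_attempts: int = 3,
    timeout_ms: int = 500,
    base_ms: int = 100,
    max_jitter_ms: float = 25.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Dict[str, Any]:
    """Send a TRACE, retrying on timeout with exponential backoff plus jitter."""
    delays = backoff_delays(max_attempts, base_ms)
    for attempt in range(max_attempts):
        try:
            return send(trace, timeout_ms)
        except TraceTimeout:
            # Wait only if another attempt remains.
            if attempt < max_attempts - 1:
                sleep((delays[attempt] + rng() * max_jitter_ms) / 1000.0)
    # All retries exhausted: hand off for manual review (assumed result shape).
    return {"decision": "escalate", "reason": "steward_unreachable"}
```

Injecting `sleep` and `rng` keeps the helper testable without real delays; production code would normally leave both at their defaults.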

3.4 Canonical Evaluation Order [NORMATIVE]¶

This is the authoritative reference for ACGP governance evaluation order. Implementations MUST follow this sequence when processing traces.

3.4.1 Evaluation Sequence¶

┌─────────────────────────────────────────────────────────────────┐
│                     GOVERNANCE EVALUATION                       │
│                                                                 │
│  1. TRIPWIRE CHECK (Priority 1 - Pre-CTQ)                       │
│     ├── Eval Tier 0 tripwires (<100ms, in-memory)               │
│     ├── Eval Tier 1 tripwires (<300ms, local DB)                │
│     └── If ANY tripwire triggers → IMMEDIATE INTERVENTION       │
│                        │                                        │
│                        ▼ (no tripwire triggered)                │
│  2. CTQ CALCULATION                                             │
│     ├── Load blueprint metrics and scorers                      │
│     ├── Execute each metric scorer                              │
│     ├── Calculate weighted CTQ score                            │
│     └── Compute Risk Score = 1.0 - CTQ                          │
│                        │                                        │
│                        ▼                                        │
│  3. THRESHOLD EVALUATION                                        │
│     ├── Get blueprint thresholds                                │
│     ├── Get ACL tier thresholds                                 │
│     ├── Apply stricter of (blueprint, ACL)                      │
│     └── Determine base intervention                             │
│                        │                                        │
│                        ▼                                        │
│  4. TRUST DEBT APPLICATION                                      │
│     ├── Get current trust debt                                  │
│     ├── Check threshold escalation                              │
│     └── Adjust intervention if needed                           │
│                        │                                        │
│                        ▼                                        │
│  5. FLAG EVALUATION (Orthogonal)                                │
│     ├── Check flag conditions (pattern, near-miss)              │
│     └── Add flag to intervention (combines with any decision)   │
│                        │                                        │
│                        ▼                                        │
│  6. RE-TIERING CHECK                                            │
│     ├── If trust debt > re_tiering_threshold                    │
│     └── Queue ARS re-evaluation                                 │
│                        │                                        │
│                        ▼                                        │
│  7. ISSUE INTERVENTION                                          │
│     └── Return: decision + flag + evidence                      │
└─────────────────────────────────────────────────────────────────┘

3.4.2 Reference Implementation (Pseudo-code)¶

from typing import List


def evaluate_trace(trace: CognitiveTrace, blueprint: Blueprint) -> Intervention:
    """
    Canonical ACGP governance evaluation.

    This is the authoritative evaluation order. All conformant
    implementations MUST follow this sequence.
    """

    # ═══════════════════════════════════════════════════════════
    # STEP 1: TRIPWIRE CHECK (Priority 1 - runs BEFORE CTQ)
    # ═══════════════════════════════════════════════════════════
    tripwire_result = evaluate_tripwires(trace, blueprint.tripwires)

    if tripwire_result.triggered:
        # Tripwires override all other evaluation
        return Intervention(
            decision=tripwire_result.intervention,  # block, halt, etc.
            reason=tripwire_result.reason,
            tripwire_id=tripwire_result.id,
            flagged=tripwire_result.severity in ["critical", "severe"],
            evaluation_stage="tripwire"
        )

    # ═══════════════════════════════════════════════════════════
    # STEP 2: CTQ CALCULATION
    # ═══════════════════════════════════════════════════════════
    ctq_scores = {}
    for metric in blueprint.ctq.metrics:
        scorer = get_scorer(metric.scorer)
        score = scorer.evaluate(trace, metric.parameters)
        ctq_scores[metric.name] = {
            "score": score,
            "weight": metric.weight
        }

    # Weighted average (or other aggregation per blueprint)
    ctq_final = calculate_aggregation(
        ctq_scores, 
        method=blueprint.ctq.aggregation
    )
    risk_score = 1.0 - ctq_final

    # ═══════════════════════════════════════════════════════════
    # STEP 3: THRESHOLD EVALUATION
    # ═══════════════════════════════════════════════════════════
    blueprint_thresholds = blueprint.ctq.thresholds
    acl_thresholds = get_acl_thresholds(trace.acl_tier)

    # Apply stricter threshold (lower value = stricter)
    effective_thresholds = {
        level: min(blueprint_thresholds[level], acl_thresholds[level])
        for level in ["ok", "nudge", "escalate", "block"]
    }

    # Determine base decision from risk score
    base_decision = apply_thresholds(risk_score, effective_thresholds)

    # ═══════════════════════════════════════════════════════════
    # STEP 4: TRUST DEBT APPLICATION
    # ═══════════════════════════════════════════════════════════
    trust_debt = get_current_trust_debt(trace.agent_id)

    # Trust debt can escalate the intervention
    if trust_debt > blueprint.trust_debt.thresholds.restricted_mode:
        base_decision = escalate_decision(base_decision)

    # ═══════════════════════════════════════════════════════════
    # STEP 5: FLAG EVALUATION (Orthogonal - can combine with any)
    # ═══════════════════════════════════════════════════════════
    flagged = False
    flag_reason = None

    # Flag on near-miss (close to threshold boundary)
    if is_near_threshold_boundary(risk_score, effective_thresholds):
        flagged = True
        flag_reason = "near_threshold_boundary"

    # Flag on suspicious pattern
    if detect_suspicious_pattern(trace):
        flagged = True
        flag_reason = "suspicious_pattern"

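    # Flag always accumulates trust debt
    if flagged:
        trust_debt_delta = calculate_flag_debt(
            base_decision,
            blueprint.trust_debt.accumulation
        )
        update_trust_debt(trace.agent_id, trust_debt_delta)
        # Keep the local value in sync so the Step 6 re-tiering check
        # and the returned Intervention reflect the newly added debt
        trust_debt += trust_debt_delta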

    # ═══════════════════════════════════════════════════════════
    # STEP 6: RE-TIERING CHECK
    # ═══════════════════════════════════════════════════════════
    if trust_debt > blueprint.trust_debt.thresholds.re_tiering_review:
        queue_ars_reevaluation(trace.agent_id)

    # ═══════════════════════════════════════════════════════════
    # STEP 7: ISSUE INTERVENTION
    # ═══════════════════════════════════════════════════════════
    return Intervention(
        decision=base_decision,
        flagged=flagged,
        flag_reason=flag_reason,
        ctq_score=ctq_final,
        risk_score=risk_score,
        trust_debt=trust_debt,
        evidence={
            "ctq_scores": ctq_scores,
            "thresholds_used": effective_thresholds,
            "tripwires_checked": [t.id for t in blueprint.tripwires]
        },
        evaluation_stage="complete"
    )


def evaluate_tripwires(trace: CognitiveTrace, tripwires: List[Tripwire]) -> TripwireResult:
    """
    Evaluate tripwires in priority order.
    Tripwires run BEFORE CTQ and can short-circuit evaluation.
    """
    # Sort by severity (severe > critical > standard)
    sorted_tripwires = sorted(tripwires, key=lambda t: t.severity_priority, reverse=True)

    for tripwire in sorted_tripwires:
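        # Check eval tier budget
        if tripwire.eval_tier == 0:
            # Tier 0: must complete in <100ms, no external deps
            result = evaluate_tier0_tripwire(trace, tripwire)
        elif tripwire.eval_tier == 1:
            # Tier 1: can use local DB, target <300ms
            result = evaluate_tier1_tripwire(trace, tripwire)
        else:
            # Fail loudly: without this branch `result` would be unbound
            # (or stale from a previous iteration) for unknown tiers
            raise ValueError(f"Unsupported eval tier: {tripwire.eval_tier}")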

        if result.triggered:
            return TripwireResult(
                triggered=True,
                id=tripwire.id,
                severity=tripwire.severity,
                intervention=map_severity_to_intervention(tripwire.severity, trace.acl_tier),
                reason=tripwire.on_fail.reason
            )

    return TripwireResult(triggered=False)


def apply_thresholds(risk_score: float, thresholds: dict) -> str:
    """
    Map risk score to intervention decision.
    Lower threshold = stricter (triggers earlier).
    """
    if risk_score <= thresholds["ok"]:
        return "ok"
    elif risk_score <= thresholds["nudge"]:
        return "nudge"
    elif risk_score <= thresholds["escalate"]:
        return "escalate"
    elif risk_score <= thresholds["block"]:
        return "block"
    else:
        return "block"  # HALT is tripwire-only


def map_severity_to_intervention(severity: str, acl_tier: str) -> str:
    """
    Map tripwire severity to intervention based on ACL tier.
    Higher ACL tiers get stricter interventions.
    """
    acl_level = int(acl_tier.replace("ACL-", ""))

    if severity == "severe":
        return "halt"  # Always halt for severe
    elif severity == "critical":
        return "halt" if acl_level >= 3 else "block"
    else:  # standard
        return "block" if acl_level >= 3 else "escalate"  # ESCALATE if ACL ≤ 2
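
`calculate_aggregation` is left undefined in the reference pseudo-code. For the weighted-average case, a minimal sketch might look like the following; the function name `weighted_average` and the metric names used in the note below are illustrative only, and a blueprint may select a different aggregation method.

```python
from typing import Mapping


def weighted_average(ctq_scores: Mapping[str, Mapping[str, float]]) -> float:
    """Aggregate per-metric scores into a single CTQ value in [0, 1].

    `ctq_scores` has the shape built in Step 2 of the pseudo-code:
    {metric_name: {"score": float, "weight": float}}.
    """
    total_weight = sum(metric["weight"] for metric in ctq_scores.values())
    if total_weight <= 0:
        raise ValueError("CTQ metric weights must sum to a positive value")
    weighted_sum = sum(
        metric["score"] * metric["weight"] for metric in ctq_scores.values()
    )
    # Dividing by the total weight normalizes weights that do not sum to 1.
    return weighted_sum / total_weight
```

For example, scores of 0.9, 0.6, and 0.8 with weights 0.5, 0.3, and 0.2 give a CTQ of about 0.79 and therefore a Risk Score of about 0.21.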

3.4.3 Evaluation State Machine¶

stateDiagram-v2
    [*] --> TripwireCheck: Trace Received

    TripwireCheck --> TripwireTriggered: Tripwire Fails
    TripwireCheck --> CTQCalculation: All Tripwires Pass

    TripwireTriggered --> IssueIntervention: Immediate Decision

    CTQCalculation --> ThresholdEvaluation: CTQ Score Computed

    ThresholdEvaluation --> TrustDebtCheck: Base Decision Made

    TrustDebtCheck --> FlagEvaluation: Debt Applied

    FlagEvaluation --> ReTieringCheck: Flag Decision Made

    ReTieringCheck --> IssueIntervention: Final Decision Ready
    ReTieringCheck --> QueueReTiering: High Trust Debt

    QueueReTiering --> IssueIntervention: Re-tier Queued

    IssueIntervention --> [*]: Intervention Sent

    note right of TripwireCheck
        Priority 1
        Runs BEFORE CTQ
        Can short-circuit
    end note

    note right of FlagEvaluation
        Orthogonal
        Combines with any decision
        Accumulates trust debt
    end note

3.4.4 Precedence Rules¶

Tripwires have absolute priority: If any tripwire triggers, evaluation stops and the tripwire's intervention is returned immediately. CTQ is not calculated.
Stricter threshold wins: When both blueprint and ACL-tier thresholds exist for a level, the stricter (lower) value applies, whichever source it comes from.
Trust debt can only escalate, not relax: Trust debt may increase an intervention's severity but never decrease it.
Flag is orthogonal: The flagged status can be added to ANY intervention (ok, nudge, escalate, block). It does not change the primary decision.
HALT is tripwire-only: The halt intervention can only be issued by tripwires, never by threshold-based CTQ evaluation.
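
Three of these rules (stricter threshold, escalate-only trust debt, and tripwire-only HALT) can be captured in a few lines. This is a sketch under assumptions: `escalate_decision` is called but not defined in the reference pseudo-code, so the one-step ladder below is only one plausible reading, and `LADDER` and `stricter_thresholds` are hypothetical names.

```python
from typing import Dict, List, Mapping

# Threshold-based decisions in increasing severity. "halt" is deliberately
# absent because HALT can only be issued by tripwires.
LADDER: List[str] = ["ok", "nudge", "escalate", "block"]


def stricter_thresholds(
    blueprint: Mapping[str, float], acl: Mapping[str, float]
) -> Dict[str, float]:
    """Per level, keep the lower (stricter) of the blueprint and ACL thresholds."""
    return {level: min(blueprint[level], acl[level]) for level in LADDER}


def escalate_decision(decision: str, steps: int = 1) -> str:
    """Move a decision up the ladder; it never moves down and never reaches halt."""
    if decision not in LADDER:
        raise ValueError(f"Not a threshold-based decision: {decision!r}")
    if steps < 0:
        raise ValueError("Trust debt may only escalate, so steps must be >= 0")
    return LADDER[min(LADDER.index(decision) + steps, len(LADDER) - 1)]
```

With this ladder, elevated trust debt turns a `nudge` into an `escalate`, while a `block` stays a `block`.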

3.5 Formal State Machine Specification¶

This section provides a formal state machine for agent governance states. Implementations SHOULD use this as the reference for state transitions.

3.5.1 Agent Governance States¶

┌─────────────────────────────────────────────────────────────────────┐
│                    AGENT GOVERNANCE STATE MACHINE                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────┐                                                       │
│  │ INACTIVE │ ─── register() ───► ┌──────────┐                      │
│  └──────────┘                     │  NORMAL  │◄─────────────────┐   │
│       ▲                           └────┬─────┘                   │  │
│       │                                │                         │  │
│   unregister()                    flag_intervention()            │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐                   │  │
│       │                           │ FLAGGED  │                   │  │
│       │                           └────┬─────┘                   │  │
│       │                                │                         │  │
│       │                      debt > elevated_threshold           │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐    review_passed  │  │
│       │                           │ ELEVATED │ ──────────────────┤  │
│       │                           └────┬─────┘                   │  │
│       │                                │                         │  │
│       │                      debt > retier_threshold             │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐                   │  │
│       │                           │ RE-TIER  │ ── approved ──► NORMAL
│       │                           │ PENDING  │                   │  │
│       │                           └────┬─────┘                   │  │
│       │                                │                         │  │
│       │                           acl_upgraded                   │  │
│       │                                │                         │  │
│       │                                ▼                         │  │
│       │                           ┌──────────┐                   │  │
│       │                           │ UPGRADED │ ─── reset ────────┘  │
│       │                           │ (higher ACL)                    │
│       │                           └──────────┘                      │
│       │                                                             │
│  ┌────┴─────────────────────────────────────────────────────────┐   │
│  │                     TERMINAL STATES                           │  │
│  │                                                               │  │
│  │   ┌──────────┐          ┌──────────┐                         │  │
│  │   │ BLOCKED  │◄── block │ Any state │                        │  │
│  │   └────┬─────┘          └──────────┘                         │  │
│  │        │                                                      │  │
│  │   unblock(manual)                                             │  │
│  │        │                                                      │  │
│  │        └────────────────────────────────► NORMAL              │  │
│  │                                                               │  │
│  │   ┌──────────┐          ┌──────────┐                         │  │
│  │   │  HALTED  │◄── halt  │ Any state │                        │  │
│  │   └──────────┘          └──────────┘                         │  │
│  │        │                                                      │  │
│  │   Manual restart required (terminal)                          │  │
│  │                                                               │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

3.5.2 State Definitions¶

State	Description	Allowed Transitions
`INACTIVE`	Agent not registered	→ NORMAL (register)
`NORMAL`	Operating normally	→ FLAGGED, BLOCKED, HALTED
`FLAGGED`	Trust debt accumulating	→ NORMAL (decay), ELEVATED, BLOCKED, HALTED
`ELEVATED`	Under increased scrutiny	→ NORMAL (review), RE-TIER, BLOCKED, HALTED
`RE-TIER`	Pending ACL re-evaluation	→ UPGRADED, NORMAL (denied), HALTED
`UPGRADED`	ACL tier increased	→ NORMAL (reset)
`BLOCKED`	Temporarily suspended	→ NORMAL (unblock)
`HALTED`	Permanently stopped	Terminal (manual restart)

3.5.3 Transition Functions¶

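class InvalidTransitionError(Exception):
    """Raised when an event is not valid for the current state."""


class AgentGovernanceState: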
    """Formal state machine for agent governance."""

    def __init__(self):
        self.state = "INACTIVE"
        self.trust_debt = 0.0
        self.acl_tier = None

    def transition(self, event: str, context: dict) -> str:
        """
        Execute state transition based on event.

        Returns:
            New state after transition
        """
        transitions = {
            ("INACTIVE", "register"): self._register,
            ("NORMAL", "flag"): self._flag,
            ("NORMAL", "block"): lambda c: "BLOCKED",
            ("NORMAL", "halt"): lambda c: "HALTED",
            ("FLAGGED", "evaluate"): self._evaluate_flagged,
            ("FLAGGED", "block"): lambda c: "BLOCKED",
            ("FLAGGED", "halt"): lambda c: "HALTED",
            ("ELEVATED", "review_passed"): lambda c: "NORMAL",
            ("ELEVATED", "retier_triggered"): lambda c: "RE-TIER",
            ("ELEVATED", "block"): lambda c: "BLOCKED",
            ("ELEVATED", "halt"): lambda c: "HALTED",
            ("RE-TIER", "approved"): lambda c: "NORMAL",
            ("RE-TIER", "upgraded"): self._upgrade,
            ("RE-TIER", "halt"): lambda c: "HALTED",
            ("BLOCKED", "unblock"): lambda c: "NORMAL",
            ("UPGRADED", "reset"): self._reset_after_upgrade,
        }

        key = (self.state, event)
        if key in transitions:
            self.state = transitions[key](context)
        else:
            raise InvalidTransitionError(f"No transition for {key}")

        return self.state

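    def _register(self, context):
        # Assumption: the ACL tier is supplied at registration time
        self.acl_tier = context.get("acl_tier")
        return "NORMAL"

    def _flag(self, context):
        self.trust_debt += context.get("debt_delta", 0.1)
        return "FLAGGED"

    def _evaluate_flagged(self, context):
        if self.trust_debt > context.get("retier_threshold", 0.75):
            return "RE-TIER"
        elif self.trust_debt > context.get("elevated_threshold", 0.5):
            return "ELEVATED"
        elif self.trust_debt < context.get("normal_threshold", 0.3):
            return "NORMAL"
        return "FLAGGED"

    def _upgrade(self, context):
        # Assumption: the re-evaluated (higher) tier arrives in the context
        self.acl_tier = context.get("new_acl_tier", self.acl_tier)
        return "UPGRADED"

    def _reset_after_upgrade(self, context):
        # Assumption: trust debt restarts at zero under the new tier
        self.trust_debt = 0.0
        return "NORMAL"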

3.5.4 Formal Verification Note¶

For safety-critical implementations, consider formal verification using:

TLA+: For distributed consensus properties
Alloy: For state invariant checking
Spin/Promela: For temporal logic verification

Example invariants to verify:

- An agent in HALTED state cannot transition to any other state
- Trust debt is monotonically non-decreasing within a session (before decay)
- BLOCKED agents cannot issue new actions
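
Before reaching for a model checker, structural invariants can be checked directly against the transition table. The sketch below transcribes the Allowed Transitions column from Section 3.5.2 into a dictionary and checks three properties; `ALLOWED`, `reachable`, and `check_invariants` are hypothetical names, and this lightweight check is not a substitute for the formal tools listed above.

```python
from collections import deque
from typing import Dict, List, Set

# Allowed transitions, transcribed from the State Definitions table.
ALLOWED: Dict[str, Set[str]] = {
    "INACTIVE": {"NORMAL"},
    "NORMAL": {"FLAGGED", "BLOCKED", "HALTED"},
    "FLAGGED": {"NORMAL", "ELEVATED", "BLOCKED", "HALTED"},
    "ELEVATED": {"NORMAL", "RE-TIER", "BLOCKED", "HALTED"},
    "RE-TIER": {"UPGRADED", "NORMAL", "HALTED"},
    "UPGRADED": {"NORMAL"},
    "BLOCKED": {"NORMAL"},
    "HALTED": set(),
}


def reachable(start: str, allowed: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search over the transition graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in allowed.get(state, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_invariants(allowed: Dict[str, Set[str]]) -> List[str]:
    """Return a list of violated invariants (empty when all hold)."""
    problems: List[str] = []
    if allowed.get("HALTED"):
        problems.append("HALTED must have no outgoing transitions")
    for state in sorted(allowed):
        for target in sorted(allowed[state] - set(allowed)):
            problems.append(f"{state} targets unknown state {target}")
    for state in sorted(set(allowed) - reachable("INACTIVE", allowed)):
        problems.append(f"{state} is unreachable from INACTIVE")
    return problems
```

Running `check_invariants(ALLOWED)` on the table as written returns an empty list; adding an outgoing edge to `HALTED` makes the first check fail.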

4. Data Flow Architecture¶

4.1 Write Path (Trace Processing)¶

flowchart LR
    subgraph "Ingestion"
        T[Cognitive Trace] --> V[Validation]
        V --> VN[Version Check]
        VN --> Q[Message Queue]
    end

    subgraph "Processing"
        Q --> GS[Governance Steward]
        GS --> TW[Tripwire Check]
        TW --> CTQ[CTQ Calculation]
        CTQ --> DEC[Decision Logic]
    end

    subgraph "Storage"
        DEC --> WB[Write Buffer]
        WB --> RDB[(ReflectionDB)]
        WB --> TS[(Time Series)]
    end

    subgraph "Response"
        DEC --> INT[Intervention]
        INT --> PA[Operating Agent]
    end

    style T fill:#d9f99d
    style TW fill:#fef3c7
    style INT fill:#fca5a5
    style RDB fill:#fed7aa

4.2 Read Path (Query & Analytics)¶

flowchart LR
    subgraph "Query Layer"
        API[Query API] --> CACHE[Query Cache]
        CACHE --> QE[Query Engine]
    end

    subgraph "Storage"
        QE --> RDB[(ReflectionDB)]
        QE --> IDX[(Indexes)]
        QE --> AGG[(Aggregates)]
    end

    subgraph "Consumers"
        API --> DASH[Dashboard]
        API --> AUDIT[Audit Tools]
        API --> ANALYTICS[Analytics]
    end

5. Deployment Topologies¶

5.1 Single Agent Governance (Simple)¶

Deployment: Sidecar Pattern
Components:
  - 1 Operating Agent
  - 1 Governance Steward (sidecar)
  - 1 Policy Engine (embedded)
  - 1 ReflectionDB (local SQLite)

Use Cases:
  - Development/Testing
  - Low-risk applications (ACL-0, ACL-1)
  - Edge deployments

Retry Policy:
  - Max attempts: 3
  - Backoff: exponential (100ms base)
  - Timeout: 500ms per attempt

5.2 Multi-Agent with Shared Governance (Standard)¶

Deployment: Service Mesh Pattern
Components:
  - N Operating Agents
  - N Governance Stewards (1:1 with agents)
  - 1 Shared Policy Engine (service)
  - 1 Shared ReflectionDB (PostgreSQL/MongoDB)
  - 1 Blueprint Store (git-backed)

Use Cases:
  - Enterprise deployments (ACL-2, ACL-3)
  - Multi-tenant SaaS
  - Microservices architecture

Retry Policy:
  - Max attempts: 3
  - Backoff: exponential (100ms base)
  - Timeout: 500ms per attempt
  - Circuit breaker: 5 failures → open
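
The "5 failures → open" entry in the retry policy above describes circuit-breaker behavior: after five failures, calls to the destination are short-circuited instead of retried. A minimal sketch follows. The 30-second reset window, the half-open probe state, the choice to count consecutive failures, and the class name are all assumptions, since this section specifies only the failure count.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_after_s: float = 30.0,  # assumed; not specified in this document
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._reset_after_s:
            return "half_open"  # allow a single probe request through
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            # The probe failed: re-open and restart the reset window.
            self._opened_at = self._clock()
            return
        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._opened_at = self._clock()
```

A caller would check `allow_request()` before sending a TRACE and report the outcome through `record_success()` or `record_failure()`.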

5.3 Distributed Steward Network (Advanced)¶

Deployment: Federated Pattern
Components:
  - N Operating Agents (across regions)
  - M Governance Stewards (M < N, pooled)
  - Regional Policy Engines
  - Distributed ReflectionDB (Cassandra/CockroachDB)
  - Replicated Blueprint Stores
  - Cross-region Steward coordination

Use Cases:
  - Global deployments (ACL-4, ACL-5)
  - High availability requirements
  - Regulatory compliance (data residency)

Retry Policy:
  - Max attempts: 3
  - Backoff: exponential (100ms base)
  - Regional timeout: 300ms
  - Cross-region timeout: 1000ms
  - Circuit breaker: per-region

5.4 Reference Minimal Architecture (ACGP-MIN-A1)¶

The ACGP-MIN-A1 architecture is the simplest conformant deployment, suitable for development and proof-of-concept (POC) work.

┌─────────────────────────────────────────────────────────────┐
│                      ACGP-MIN-A1                             │
│                                                             │
│  ┌─────────────┐       ┌─────────────────────────────────┐ │
│  │  Operating  │       │      Governance Steward         │ │
│  │   Agent     │──────►│  ┌─────────────────────────┐   │ │
│  │             │       │  │ Policy Engine (embedded)│   │ │
│  │  ACL-0/1    │◄──────│  └─────────────────────────┘   │ │
│  └─────────────┘       │  ┌─────────────────────────┐   │ │
│                        │  │ ReflectionDB (SQLite)   │   │ │
│                        │  └─────────────────────────┘   │ │
│                        │  ┌─────────────────────────┐   │ │
│                        │  │ Blueprint (local YAML)  │   │ │
│                        │  └─────────────────────────┘   │ │
│                        └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Components:

- 1 Operating Agent (ACL-0 or ACL-1)
- 1 Governance Steward (sidecar or co-located process)
- Embedded Policy Engine (in-process)
- Local ReflectionDB (SQLite file)
- Local Blueprint (YAML file)

NOT Included:

- Certified Source Registry
- MCP/A2A adapters
- Distributed storage
- HITL system
- Governance contracts

Spec Requirements Satisfied:

Requirement	ACGP-MIN-A1
Core Protocol (ACGP-1000)	Yes
Version Negotiation (ACGP-1003)	Yes
Basic Interventions (OK, Block)	Yes
Full Interventions (all 6)	Optional
Tripwires (Standard)	Yes
CTQ Calculation	Simplified
ReflectionDB Retention	Session
Security (TLS)	Optional

Example Deployment:

# docker-compose.yml for ACGP-MIN-A1
version: "3.8"
services:
  agent:
    image: myorg/my-agent:latest
    environment:
      - ACGP_STEWARD_URL=http://steward:8080
      - ACGP_ACL_TIER=ACL-1

  steward:
    image: acgp/steward-minimal:latest
    volumes:
      - ./blueprints:/blueprints
      - ./data:/data
    environment:
      - ACGP_BLUEPRINT_PATH=/blueprints/dev.yaml
      - ACGP_REFLECTIONDB_PATH=/data/reflection.db

5.5 Deployment Decision Matrix¶

Factor	Single Agent	Multi-Agent Shared	Distributed Network
Latency	<10ms	<50ms	<150ms
Throughput	<100 req/s	<10K req/s	>100K req/s
Availability	99%	99.9%	99.99%
Complexity	Low	Medium	High
Cost	$	$$	$$$
Governance Strength	Basic	Standard	Maximum
Retry Overhead	Minimal	Low	Moderate

6. Scaling Considerations¶

6.1 Horizontal Scaling¶

6.1.1 Stateless Components¶

Components that can scale horizontally without coordination:

- Governance Stewards (with session affinity)
- Policy Engines (read-only blueprint access)
- API Gateways
- Query services

6.1.2 Stateful Components¶

Components requiring careful scaling strategies:

- ReflectionDB (sharding by agent_id or time)
- Trust Debt stores (consistent hashing)
- Blueprint Store (eventual consistency acceptable)
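
Consistent hashing for Trust Debt stores can be sketched with a hash ring. This is an illustrative sketch, not a prescribed design: the `HashRing` class, the use of SHA-256, the 64 virtual nodes per store, and the store names in the note below are assumptions.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class HashRing:
    """Maps agent IDs to store nodes so that few keys move when nodes change."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self._ring: List[int] = []        # sorted virtual-node positions
        self._owner: Dict[int, str] = {}  # position -> physical node
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str, vnodes: int = 64) -> None:
        for i in range(vnodes):
            position = self._hash(f"{node}#{i}")
            bisect.insort(self._ring, position)
            self._owner[position] = node

    def remove(self, node: str) -> None:
        self._ring = [p for p in self._ring if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, agent_id: str) -> str:
        if not self._ring:
            raise LookupError("hash ring is empty")
        # The first virtual node clockwise from the key owns it (wrapping around).
        idx = bisect.bisect_right(self._ring, self._hash(agent_id)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

With stores `store-a`, `store-b`, and `store-c`, removing `store-c` reassigns only the agents it owned; every other agent keeps its store, which is the property that makes this approach attractive for per-agent trust debt.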

6.2 Performance Optimization¶

Caching Strategy:
  L1_Cache:
    - Location: Agent Wrapper
    - Contents: Recent interventions, version info
    - TTL: 60 seconds

  L2_Cache:
    - Location: Governance Steward
    - Contents: Blueprint resolutions, CTQ calculations
    - TTL: 300 seconds

  L3_Cache:
    - Location: Policy Engine
    - Contents: Compiled blueprints, threshold tables
    - TTL: 3600 seconds

Batching:
  - Trace batching: Up to 10 traces per request
  - Write batching: 100ms window for ReflectionDB
  - Query batching: GraphQL-style query aggregation

Retry Optimization:
  - Adaptive timeout based on historical latency
  - Jitter to prevent thundering herd
  - Per-destination tripwires
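The three cache levels above differ only in location and TTL, so a single time-to-live cache can back each of them. The sketch below is one possible shape, not a mandated implementation: the `TTLCache` class and its injectable `clock` parameter are hypothetical names, and only the TTL values come from the caching strategy.

```python
import time


class TTLCache:
    """Tiny TTL cache; `clock` is injectable so expiry can be tested."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: evict it and report a miss.
            del self._entries[key]
            return default
        return value


# TTLs taken from the caching strategy above.
l1_cache = TTLCache(ttl_seconds=60)    # Agent Wrapper
l2_cache = TTLCache(ttl_seconds=300)   # Governance Steward
l3_cache = TTLCache(ttl_seconds=3600)  # Policy Engine
```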

6.3 Load Distribution¶

graph TB
    subgraph "Load Balancer"
        LB[L7 Load Balancer<br/>with Health Checks]
    end

    subgraph "Governance Tier 1 (ACL 0-2)"
        GS1[Steward 1<br/>CPU: 4<br/>RAM: 8GB]
        GS2[Steward 2<br/>CPU: 4<br/>RAM: 8GB]
    end

    subgraph "Governance Tier 2 (ACL 3-5)"
        GS3[Steward 3<br/>CPU: 8<br/>RAM: 16GB]
        GS4[Steward 4<br/>CPU: 8<br/>RAM: 16GB]
    end

    LB -->|Low ACL agents| GS1
    LB -->|Low ACL agents| GS2
    LB -->|High ACL agents| GS3
    LB -->|High ACL agents| GS4

    GS1 -.->|Health Check| LB
    GS2 -.->|Health Check| LB
    GS3 -.->|Health Check| LB
    GS4 -.->|Health Check| LB

7. Integration Patterns¶

7.1 Agent Framework Integration¶

7.1.1 Wrapper Pattern¶

# One-line integration example
from ACGP import GovernanceWrapper

agent = create_langchain_agent()
governed_agent = GovernanceWrapper(
    agent, 
    blueprint="finance/trading",
    retry_policy={
        'max_attempts': 3,
        'timeout_ms': 500,
        'backoff': 'exponential'
    }
)
governed_agent.run()  # All actions now governed

7.1.2 Middleware Pattern¶

# Middleware for existing frameworks
class ACGPMiddleware:
    def __init__(self, steward):
        self.version = "1.0.2"
        # Steward client used by before_tool_call; supplied by the caller
        self.steward = steward
        self.retry_policy = RetryPolicy(max_attempts=3)

    async def before_tool_call(self, tool, args):
        # Retry logic wrapper
        for attempt in range(self.retry_policy.max_attempts):
            try:
                trace = generate_trace(tool, args)
                intervention = await self.steward.evaluate(
                    trace, 
                    timeout_ms=500
                )

                if intervention.decision == "block":
                    raise BlockedException(intervention.reason)

                return intervention.modified_args or args

            except TimeoutError:
                if attempt < self.retry_policy.max_attempts - 1:
                    await self.retry_policy.backoff(attempt)
                else:
                    return await self.escalate_for_review(tool, args)

7.2 External Protocol Integration¶

7.2.1 MCP (Model Context Protocol)¶

sequenceDiagram
    participant Agent
    participant ACGP
    participant MCP
    participant Tool

    Agent->>ACGP: Request tool use
    ACGP->>ACGP: Evaluate governance (check tripwires)

    alt Approved
        ACGP->>MCP: Forward request
        MCP->>Tool: Execute
        Tool-->>MCP: Result
        MCP-->>ACGP: Response
        ACGP-->>Agent: Tool result
    else Blocked
        ACGP-->>Agent: Blocked + reason
    end

7.2.2 A2A (Agent-to-Agent Protocol)¶

sequenceDiagram
    participant A1 as Agent 1
    participant GS1 as Steward 1
    participant A2A
    participant GS2 as Steward 2
    participant A2 as Agent 2

    A1->>GS1: Prepare message
    GS1->>GS1: Validate outbound (check tripwires)

    alt Allowed
        GS1->>A2A: Send message
        A2A->>GS2: Deliver
        GS2->>GS2: Validate inbound

        alt Accepted
            GS2->>A2: Deliver message
        else Rejected
            GS2->>A2A: Reject
            A2A->>A1: Message rejected
        end
    else Blocked
        GS1->>A1: Outbound blocked
    end
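The two-sided check in the sequence above can be sketched as a single function that returns one of the three outcomes the diagram shows. The function name, the validator callables, and the outcome strings are hypothetical; a real deployment would run the two validations in separate stewards rather than in one process.

```python
from typing import Callable

Validator = Callable[[dict], bool]


def send_a2a(message: dict, validate_outbound: Validator,
             validate_inbound: Validator) -> str:
    """Return the outcome of one governed A2A delivery attempt."""
    # Steward 1 checks the message before it leaves Agent 1.
    if not validate_outbound(message):
        return "outbound_blocked"
    # Steward 2 checks the message before it reaches Agent 2.
    if not validate_inbound(message):
        return "rejected"
    return "delivered"


# Toy validators, for illustration only.
def allow_small(msg: dict) -> bool:
    return len(msg.get("body", "")) <= 100


def allow_known_sender(msg: dict) -> bool:
    return msg.get("sender") == "agent-1"


outcome = send_a2a({"sender": "agent-1", "body": "hello"},
                   allow_small, allow_known_sender)
```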

8. High Availability & Resilience¶

8.1 Failure Modes & Recovery¶

Component	Failure Mode	Recovery Strategy	Degraded Operation
Governance Steward	Process crash	Auto-restart, session migration	Failover to backup steward
Policy Engine	Unavailable	Tripwire, cached policies	Use last known good policy
ReflectionDB	Write failure	Write-ahead log, retry queue	Buffer writes locally (max 1000)
Blueprint Store	Unavailable	Local blueprint cache	No policy updates
Trusted Monitor	Timeout	Async processing, skip	Proceed without anomaly check
Version Service	Unavailable	Use cached version info	Assume compatible
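The ReflectionDB row above calls for local buffering capped at 1000 writes. A minimal sketch of such a buffer follows, assuming that an overflowing write is reported to the caller rather than silently dropped; the table does not specify overflow behavior, and the class and method names are hypothetical.

```python
from collections import deque


class LocalWriteBuffer:
    """Holds ReflectionDB events while the database is unavailable."""

    def __init__(self, max_events=1000):
        self.max_events = max_events
        self._events = deque()

    def offer(self, event) -> bool:
        """Buffer an event; return False once the cap is reached."""
        if len(self._events) >= self.max_events:
            return False  # assumption: the caller decides how to handle overflow
        self._events.append(event)
        return True

    def drain(self, write) -> int:
        """Replay buffered events in arrival order once writes succeed again."""
        flushed = 0
        while self._events:
            write(self._events[0])  # may raise; the event then stays buffered
            self._events.popleft()
            flushed += 1
        return flushed
```

Removing an event only after `write` returns keeps the buffer intact if the database fails again partway through a drain.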

8.2 Tripwire Configuration¶

tripwire:
  failure_threshold: 5         # failures to open circuit
  success_threshold: 2         # successes to close circuit
  timeout: 30s                 # time before half-open

  fallback:
    steward: use_cached_decision
    policy_engine: use_default_thresholds
    trusted_monitor: skip_validation
    reflection_db: buffer_locally
    version_negotiation: assume_compatible

  per_component:
    governance_steward:
      failure_threshold: 3
      timeout: 15s
    reflection_db:
      failure_threshold: 10
      timeout: 60s
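The comments in the configuration above describe an open, half-open, and closed cycle. The following sketch shows one way that cycle could be driven by `failure_threshold`, `success_threshold`, and `timeout`. It assumes that failures are counted consecutively and that any failure while half-open reopens the circuit; neither behavior is stated in the configuration. The class name and the injectable `clock` are hypothetical.

```python
import time


class FailureIsolationTripwire:
    """Sketch of the open / half-open / closed cycle configured above."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_s = timeout_s
        self._clock = clock
        self._state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    @property
    def state(self):
        # An open circuit becomes half-open once the timeout has elapsed.
        if (self._state == "open"
                and self._clock() - self._opened_at >= self.timeout_s):
            self._state = "half_open"
            self._successes = 0
        return self._state

    def allow_request(self):
        """Calls pass while closed or half-open; use the fallback when open."""
        return self.state != "open"

    def record_success(self):
        if self.state == "half_open":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self._state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # a success resets the consecutive-failure count

    def record_failure(self):
        state = self.state
        if state == "open":
            return  # already open; nothing to count
        if state == "half_open":
            self._trip()
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self._state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```

A per-component override such as `governance_steward` would simply construct the object with `failure_threshold=3` and `timeout_s=15.0`.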

8.3 Tripwires and Evaluation Tiers¶

Relationship: Tripwires (policy constraints) are evaluated within Evaluation Tiers (architectural patterns).

Tripwires define WHAT to check. Evaluation Tiers (see ACGP-1010) define HOW and WHEN to check.

8.3.1 Classifying Tripwires by Evaluation Tier¶

Eval Tier 0 Tripwires (must be <100ms P99, no external dependencies):

- Authentication failures
- Schema validation
- Critical safety limits (e.g., "never delete production database")
- Hard monetary limits for immediate rejection
- In-memory rate limiting

Example:

tripwires:
  - name: "max_single_transaction"
    threshold: 10000
    eval_tier: 0
    latency_budget_ms: 10
    check_type: "in_memory"
    fail_mode: "closed"

Eval Tier 1 Tripwires (may be slower, can use local DB):

- Rate limiting with external state (Redis lookup)
- Daily/monthly aggregate limits (requires DB query)
- Stateful pattern checks
- Cached policy decisions

Example:

tripwires:
  - name: "daily_transaction_limit"
    threshold: 50000
    eval_tier: 1
    latency_budget_ms: 100
    check_type: "db_lookup"
    requires_state: true
    fail_mode: "configurable"

8.3.2 Decision Matrix for Tripwire Classification¶

Tripwire Characteristic	Suggested Eval Tier
No external dependencies	Tier 0
Latency < 10ms	Tier 0
Critical safety (can't fail open)	Tier 0
Requires DB/cache lookup	Tier 1
Aggregate/windowed limit	Tier 1
Complex calculation	Tier 1
LLM-based evaluation	Tier 2
Human review	Tier 3
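The matrix above can be read as a small classification function. The sketch below is one such reading; the parameter names are hypothetical, and the rule that the highest applicable tier wins when several characteristics apply is an assumption, since the matrix does not state a precedence.

```python
def suggest_eval_tier(needs_human_review=False, uses_llm=False,
                      needs_state_lookup=False, is_aggregate_limit=False,
                      is_complex_calculation=False):
    """Return the suggested Eval Tier, checking the slowest requirement first."""
    if needs_human_review:
        return 3
    if uses_llm:
        return 2
    if needs_state_lookup or is_aggregate_limit or is_complex_calculation:
        return 1
    # No external dependencies: eligible for the in-memory Tier 0 path.
    return 0


# The two example tripwires from Section 8.3.1.
max_single_transaction_tier = suggest_eval_tier()
daily_transaction_limit_tier = suggest_eval_tier(
    needs_state_lookup=True, is_aggregate_limit=True
)
```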

8.3.3 Implementation Guidance¶

Tier 0 Implementation (REQUIRED for ACGP-1010 conformance):

class Tier0Tripwires:
    def __init__(self, config):
        # Load tripwires into memory
        self.tripwires = [
            t for t in config.tripwires 
            if t.eval_tier == 0
        ]
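        # MUST be fast and local: reject any tripwire that needs an external call.
        # Raised explicitly because assert statements are stripped under python -O.
        if any(t.requires_external for t in self.tripwires):
            raise ValueError("Tier 0 tripwires must not require external dependencies")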

    def evaluate(self, request):
        """MUST complete in <100ms."""
        for tripwire in self.tripwires:
            if tripwire.triggered(request):
                return TripwireResult.BLOCK
        return TripwireResult.PASS

Tier 1 Implementation (typical):

class Tier1Tripwires:
    def __init__(self, config, redis_client):
        self.tripwires = [
            t for t in config.tripwires 
            if t.eval_tier == 1
        ]
        self.redis = redis_client  # Local cache/DB allowed

    async def evaluate(self, request):
        """Target <300ms, may query local DB."""
        for tripwire in self.tripwires:
            if tripwire.requires_state:
                state = await self.redis.get(tripwire.state_key)
                if tripwire.triggered_with_state(request, state):
                    return TripwireResult.BLOCK
            elif tripwire.triggered(request):
                return TripwireResult.BLOCK
        return TripwireResult.PASS

8.4 Retry Policy Implementation¶

import asyncio
import random


class RetryExhaustedError(Exception):
    """Raised when every attempt has timed out."""


class RetryPolicy:
    def __init__(self, max_attempts=3, base_delay_ms=100,
                 max_delay_ms=5000, timeout_ms=500):
        self.max_attempts = max_attempts
        self.base_delay_ms = base_delay_ms
        self.max_delay_ms = max_delay_ms
        self.timeout_ms = timeout_ms

    async def backoff(self, attempt):
        """Sleep for the exponential backoff delay of a 0-based attempt."""
        delay = min(
            self.base_delay_ms * (2 ** attempt),
            self.max_delay_ms
        )
        # Add jitter (±10%)
        jitter = random.uniform(-0.1, 0.1) * delay
        await asyncio.sleep((delay + jitter) / 1000)

    async def execute_with_retry(self, operation):
        """Execute operation with exponential backoff retry."""
        for attempt in range(self.max_attempts):
            try:
                # Each attempt gets its own timeout_ms budget
                return await asyncio.wait_for(
                    operation(),
                    timeout=self.timeout_ms / 1000
                )
            except asyncio.TimeoutError:
                if attempt < self.max_attempts - 1:
                    await self.backoff(attempt)
                else:
                    # Final failure - escalate
                    raise RetryExhaustedError(
                        f"Failed after {self.max_attempts} attempts"
                    )

8.5 Graceful Degradation Levels¶

stateDiagram-v2
    [*] --> Normal: All systems operational

    Normal --> Degraded: Non-critical failure
    Degraded --> Essential: Multiple failures
    Essential --> Emergency: Critical failures
    Emergency --> Shutdown: Safety threshold exceeded

    Normal --> Normal: Self-healing
    Degraded --> Normal: Recovery
    Essential --> Degraded: Partial recovery

    note right of Degraded
        - Increased cache usage
        - Relaxed consistency
        - Async interventions
        - Skip version negotiation
    end note

    note right of Essential
        - Only critical interventions
        - Batch processing
        - Manual escalation
        - Cached policies only
    end note

    note right of Emergency
        - Block all high-risk
        - Force human review
        - Read-only mode
        - Emergency override active
    end note
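The state diagram above translates directly into a transition table. In the sketch below the event names are hypothetical labels for the diagram's edge captions, and ignoring events that have no edge is an assumption. The diagram defines no edge out of Emergency other than Shutdown, so the table has none either.

```python
from enum import Enum


class Level(Enum):
    NORMAL = "Normal"
    DEGRADED = "Degraded"
    ESSENTIAL = "Essential"
    EMERGENCY = "Emergency"
    SHUTDOWN = "Shutdown"


# (current level, event) -> next level, copied from the state diagram.
TRANSITIONS = {
    (Level.NORMAL, "self_healing"): Level.NORMAL,
    (Level.NORMAL, "non_critical_failure"): Level.DEGRADED,
    (Level.DEGRADED, "multiple_failures"): Level.ESSENTIAL,
    (Level.DEGRADED, "recovery"): Level.NORMAL,
    (Level.ESSENTIAL, "critical_failures"): Level.EMERGENCY,
    (Level.ESSENTIAL, "partial_recovery"): Level.DEGRADED,
    (Level.EMERGENCY, "safety_threshold_exceeded"): Level.SHUTDOWN,
}


def next_level(current: Level, event: str) -> Level:
    """Follow one edge of the diagram; undefined edges leave the level unchanged."""
    return TRANSITIONS.get((current, event), current)
```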

8.6 Governance Contract Architecture Patterns¶

Stewards MAY adopt different architectural patterns based on cost, latency, and quality requirements. All patterns are conformant if they meet ACGP-1009 and ACGP-1010 requirements.

8.6.1 Pattern Comparison¶

Pattern	Eval Tiers	Target Latency	Monthly Cost (est.)	Use Case
Rule-Only	0, 1	100–300ms	$500	High-volume transactional agents
Hybrid	0, 1, 2 (async 3)	100–300ms (5s async)	$2,000	Balanced quality/performance
Max Quality	0, 1, 2, 3	100ms–5s	$20,000	Safety-critical, low-volume

8.6.2 Rule-Only Pattern¶

Tiers Used: Eval-0 (static rules) + Eval-1 (cached/indexed DB lookups)

Architecture:

┌─────────────┐
│ Agent       │
└──────┬──────┘
       │ EVAL_REQUEST (100–300ms budget)
       ▼
┌─────────────────────────────────────┐
│ Steward                             │
│ ┌─────────────┐   ┌──────────────┐ │
│ │ Eval-0      │   │ Eval-1       │ │
│ │ In-memory   │──▶│ Cache/Index  │ │
│ │ rules <50ms │   │ <250ms       │ │
│ └─────────────┘   └──────────────┘ │
└─────────────────────────────────────┘
       │ EVAL_RESPONSE (allow/deny)
       ▼
┌─────────────┐
│ Agent       │
└─────────────┘

Fallback Strategy: deny (conservative)

Trade-offs:

- Lowest cost, predictable latency
- No LLM reasoning, limited context depth
- Cannot handle novel/ambiguous actions

8.6.3 Hybrid Pattern¶

Tiers Used: Eval-0, 1 (sync) + Eval-2 (async model inference) + optional Eval-3 (human review)

Architecture:

┌─────────────┐
│ Agent       │
└──────┬──────┘
       │ EVAL_REQUEST (300ms budget)
       ▼
┌──────────────────────────────────────────────────┐
│ Steward                                          │
│ ┌──────┐    ┌──────┐    ┌────────────────────┐   │
│ │ Tier │───▶│ Tier │───▶│ Tier 2 (async)     │   │
│ │  0   │    │  1   │    │ Model inference    │   │
│ └──────┘    └──────┘    │ Post-action        │   │
│                         └────────────────────┘   │
└──────────────────────────────────────────────────┘
       │ EVAL_RESPONSE (allow_and_log)
       ▼
┌─────────────┐         (background)
│ Agent       │    ┌───▶ Model review
└─────────────┘    │     Human escalation (Tier 3)
                   │     Retroactive intervention
                   └─────

Fallback Strategy: allow_and_log (permissive with async audit)

Trade-offs:

- Fast sync path, deep async reasoning
- Cost-effective for most workflows
- [WARNING] Risk window during async evaluation
- [WARNING] Requires robust post-action intervention mechanisms

8.6.4 Max Quality Pattern¶

Tiers Used: All tiers (0, 1, 2, 3) synchronously

Architecture:

┌─────────────┐
│ Agent       │
└──────┬──────┘
       │ EVAL_REQUEST (5000ms budget, critical_risk)
       ▼
┌───────────────────────────────────────────────────────┐
│ Steward                                               │
│ ┌──────┐   ┌──────┐   ┌──────┐   ┌────────────────┐ │
│ │ Tier │──▶│ Tier │──▶│ Tier │──▶│ Tier 3         │ │
│ │  0   │   │  1   │   │  2   │   │ Human review   │ │
│ │      │   │      │   │ LLM  │   │ (on-demand)    │ │
│ └──────┘   └──────┘   └──────┘   └────────────────┘ │
└───────────────────────────────────────────────────────┘
       │ EVAL_RESPONSE (wait for all tiers)
       ▼
┌─────────────┐
│ Agent       │
└─────────────┘

Fallback Strategy: escalate (require human decision on timeout)

Trade-offs:

- [YES] Maximum safety and quality
- [YES] Full audit trail with human oversight
- [NO] High cost (model + human labor)
- [NO] High latency (up to 5s + human response time)

8.6.5 Pattern Selection Matrix¶

Choose pattern based on:

Requirement	Rule-Only	Hybrid	Max Quality
Actions/sec > 100	[YES]	[WARNING]	[NO]
Cost < $1k/month	[YES]	[WARNING]	[NO]
Latency < 300ms guaranteed	[YES]	[YES]	[NO]
Novel actions frequent	[NO]	[YES]	[YES]
Safety-critical domain	[NO]	[WARNING]	[YES]
Audit/compliance required	[WARNING]	[YES]	[YES]

Deployment Note: Stewards MAY implement multiple patterns and select per-agent or per-action based on "risk_level" in governance contracts (ACGP-1010).
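A per-action selection of this kind could be sketched as a lookup keyed on risk level. The mapping below (low to Rule-Only, elevated to Hybrid, critical to Max Quality) is an assumption inferred from the latency budgets shown in Sections 8.6.2 through 8.6.4, and the string identifiers and the strictest-pattern default for unknown levels are hypothetical. Only the synchronous tiers and fallback strategies are taken from the pattern descriptions.

```python
PATTERNS = {
    # risk_level: (pattern, synchronous eval tiers, fallback strategy)
    "low": ("rule_only", (0, 1), "deny"),
    "elevated": ("hybrid", (0, 1), "allow_and_log"),
    "critical": ("max_quality", (0, 1, 2, 3), "escalate"),
}


def select_pattern(risk_level: str):
    """Pick a pattern for one action; unknown levels get the strictest pattern."""
    return PATTERNS.get(risk_level, PATTERNS["critical"])


pattern, sync_tiers, fallback = select_pattern("elevated")
```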

9. Performance Requirements¶

9.1 Unified Latency Model¶

End-to-End (E2E) Latency Definition: Measured from agent request submission to agent receipt of governance decision.

Components:

E2E Latency = Network(agent→steward) 
            + Protocol Overhead (parsing, validation)
            + Governance Evaluation  ← Largest component
            + Network(steward→agent)

9.2 Latency Targets by Risk Level¶

Risk-based latency budgets (see ACGP-1010 Governance Contracts):

Component	Low Risk	Elevated Risk	Critical Risk
Network (round-trip)	20ms	20ms	50ms
Protocol overhead	30ms	30ms	50ms
Governance evaluation	50ms	250ms	4900ms
TOTAL E2E (P99)	100ms	300ms	5000ms

Governance Evaluation Budget Allocation:

Low Risk (50ms total):

- Eval Tier 0 (must-pass checks): 30ms
- Eval Tier 1 (fast policy): 20ms
- Eval Tier 2: Async only

Elevated Risk (250ms total):

- Eval Tier 0: 50ms
- Eval Tier 1: 200ms
- Eval Tier 2: Async audit

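Critical Risk (4900ms total):

- Eval Tier 0: 50ms
- Eval Tier 1: 200ms
- Eval Tier 2: 4500ms (synchronous LLM analysis)
- Eval Tier 3: Human time (separate)

The synchronous tier budgets for Critical Risk sum to 4750ms, which leaves 150ms of the 4900ms evaluation budget unallocated.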
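The arithmetic behind these budgets can be checked with a short script. The figures are copied from the table and allocations above; the dictionary and function names are hypothetical.

```python
# Budgets in milliseconds, copied from the risk-level table.
E2E_BUDGET = {
    "low": {"network": 20, "protocol": 30, "evaluation": 50},
    "elevated": {"network": 20, "protocol": 30, "evaluation": 250},
    "critical": {"network": 50, "protocol": 50, "evaluation": 4900},
}

# Synchronous tier allocations inside the evaluation budget.
SYNC_TIER_BUDGET = {
    "low": {0: 30, 1: 20},
    "elevated": {0: 50, 1: 200},
    "critical": {0: 50, 1: 200, 2: 4500},
}


def e2e_total(risk_level: str) -> int:
    """Sum the network, protocol, and evaluation budgets for a risk level."""
    return sum(E2E_BUDGET[risk_level].values())


def evaluation_headroom(risk_level: str) -> int:
    """Evaluation budget left after the synchronous tiers are allocated."""
    allocated = sum(SYNC_TIER_BUDGET[risk_level].values())
    return E2E_BUDGET[risk_level]["evaluation"] - allocated


for level in ("low", "elevated", "critical"):
    print(level, e2e_total(level), evaluation_headroom(level))
```

The totals match the P99 row of the table (100ms, 300ms, 5000ms). The low and elevated allocations use their evaluation budgets exactly, while the critical allocation leaves 150ms unassigned.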

Use the Latency Calculator to model budget allocation for your use case.

9.3 Component-Level Latency Targets¶

Operation	P50	P95	P99	Notes
Version Negotiation	10ms	25ms	50ms	One-time per connection
Eval Tier 0 (must-pass)	20ms	50ms	100ms	REQUIRED <100ms
Eval Tier 1 (policy)	50ms	150ms	300ms	Target, not requirement
Eval Tier 2 (LLM)	2000ms	5000ms	10000ms	Async recommended
ReflectionDB Write	10ms	50ms	200ms	Asynchronous

9.4 Throughput Requirements¶

ACL Tier	Traces/sec per Agent	Batch Size	Queue Depth	Retry Budget
ACL-0	10	1	100	10%
ACL-1	50	5	500	10%
ACL-2	100	10	1000	15%
ACL-3	200	20	2000	15%
ACL-4	500	50	5000	20%
ACL-5	1000	100	10000	20%

9.5 Resource Requirements¶

Minimum_Requirements:
  Governance_Steward:
    CPU: 2 cores
    Memory: 4GB
    Network: 100Mbps
    Disk: 10GB SSD
    Connections: 1000 concurrent

  Policy_Engine:
    CPU: 4 cores
    Memory: 8GB
    Network: 1Gbps
    Disk: 20GB SSD
    Cache: 2GB Redis

  ReflectionDB:
    CPU: 8 cores
    Memory: 32GB
    Network: 10Gbps
    Disk: 1TB NVMe SSD
    IOPS: 10000
    Write_Buffer: 100MB

9.6 Observability Standards [NORMATIVE]¶

Implementations at Standard conformance level MUST expose the following metrics.

9.6.1 Required Metrics¶

Prometheus Format:

# Governance Evaluation Metrics
acgp_evaluation_total{agent_id, acl_tier, decision} counter
acgp_evaluation_latency_seconds{agent_id, acl_tier, eval_tier, quantile} summary
acgp_ctq_score{agent_id, acl_tier, metric} gauge

# Intervention Metrics
acgp_intervention_total{agent_id, decision, tripwire_id} counter
acgp_intervention_latency_seconds{decision, quantile} summary

# Trust Debt Metrics
acgp_trust_debt{agent_id} gauge
acgp_trust_debt_delta_total{agent_id, reason} counter

# Tripwire Metrics
acgp_tripwire_triggered_total{tripwire_id, severity, agent_id} counter
acgp_tripwire_latency_seconds{tripwire_id, eval_tier, quantile} summary

# System Health
acgp_steward_status{steward_id} gauge  # 0=down, 1=degraded, 2=normal
acgp_reflectiondb_write_latency_seconds{quantile} summary
acgp_reflectiondb_size_bytes gauge

9.6.2 Standard Metric Labels¶

Label	Description	Values
`agent_id`	Unique agent identifier	UUID
`acl_tier`	Agent's ACL tier	ACL-0 through ACL-5
`decision`	Intervention decision	ok, nudge, flag, escalate, block, halt
`eval_tier`	Evaluation tier	0, 1, 2, 3
`tripwire_id`	Tripwire identifier	string
`severity`	Tripwire severity	standard, critical, severe
`quantile`	Percentile bucket	0.5, 0.9, 0.95, 0.99

9.6.3 Required Endpoints¶

Standard conformance implementations MUST expose:

endpoints:
  /metrics:
    format: prometheus
    auth: optional

  /health:
    format: json
    response:
      status: healthy|degraded|unhealthy
      components:
        policy_engine: ok|error
        reflectiondb: ok|error
        steward: ok|error

  /ready:
    format: json
    response:
      ready: true|false
      reason: string
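A `/health` body with the shape above could be assembled as follows. The field names and allowed values come from the endpoint definition; the rule for deriving the overall `status` from component states is an assumption, since the specification lists the values but not how they are computed, and the function name is hypothetical.

```python
import json


def build_health(components: dict) -> str:
    """Serialize a /health body from per-component "ok" / "error" states."""
    errors = [name for name, state in components.items() if state != "ok"]
    if not errors:
        status = "healthy"
    elif len(errors) < len(components):
        status = "degraded"   # assumption: partial failure maps to degraded
    else:
        status = "unhealthy"  # assumption: all components failing maps to unhealthy
    return json.dumps({"status": status, "components": components})


body = build_health(
    {"policy_engine": "ok", "reflectiondb": "error", "steward": "ok"}
)
```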

9.6.4 Alerting Recommendations¶

alerts:
  - name: HighHaltRate
    expr: rate(acgp_intervention_total{decision="halt"}[5m]) > 0.01
    severity: critical

  - name: EvaluationLatencyHigh
    expr: acgp_evaluation_latency_seconds{quantile="0.99"} > 0.5
    severity: warning

  - name: TrustDebtCritical
    expr: acgp_trust_debt > 0.75
    severity: warning

  - name: StewardDegraded
    expr: acgp_steward_status < 2
    severity: warning

10. Security Architecture¶

10.1 Defense in Depth Layers¶

graph TB
    subgraph "Layer 1: Network Security"
        FW[Firewall]
        IDS[IDS/IPS]
        TLS[TLS 1.3]
    end

    subgraph "Layer 2: Authentication"
        VER[Version Auth]
        OAUTH[OAuth 2.0]
        MTLS[Mutual TLS]
    end

    subgraph "Layer 3: Authorization"
        RBAC[Role-Based Access]
        ABAC[Attribute-Based Access]
        MPA[Multi-Party Auth]
    end

    subgraph "Layer 4: Message Security"
        SIGN[ES256 Signatures]
        ENCRYPT[Encryption at Rest]
        CHECKSUM[SHA-256 Integrity]
    end

    subgraph "Layer 5: Audit & Monitoring"
        AUDIT[Audit Logging]
        SIEM[SIEM Integration]
        ALERT[Security Alerts]
    end

10.2 Zero Trust Architecture¶

zero_trust_principles:
  never_trust_always_verify:
    - Verify every transaction
    - Version check on every connection
    - No implicit trust based on network location
    - Continuous validation of security posture

  least_privilege_access:
    - Minimal permissions by default
    - Time-bound credential elevation
    - Regular permission audits

  assume_breach:
    - Comprehensive logging
    - Anomaly detection
    - Tripwire system
    - Incident response readiness

  verify_explicitly:
    - Multi-factor authentication
    - Device compliance checks
    - Risk-based access controls

10.3 Cryptographic Requirements¶

Component	Requirement	Algorithm	Key Size
Transport	Encryption	TLS 1.3	2048-bit RSA / 256-bit ECC
Message Signing	Non-repudiation	ES256 (ECDSA)	256-bit
Checksum	Integrity	SHA-256	256-bit
Storage	Encryption at rest	AES-256-GCM	256-bit
Key Derivation	Key generation	PBKDF2	100,000 iterations

Note: ES256 is standardized throughout ACGP. All implementations MUST use ES256 for message signing.
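The checksum and key-derivation rows of the table can be exercised with the Python standard library alone, as in the sketch below. The iteration count and the 256-bit sizes come from the table. Using HMAC-SHA-256 as the PBKDF2 pseudorandom function, the 16-byte salt, and the function names are assumptions, since the table names only PBKDF2. ES256 signing is omitted because it requires a third-party library.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 100_000  # from the table above


def derive_storage_key(passphrase: bytes, salt: bytes) -> bytes:
    """Derive a 256-bit key; HMAC-SHA-256 as the PRF is an assumption."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase, salt, PBKDF2_ITERATIONS, dklen=32
    )


def checksum(payload: bytes) -> str:
    """SHA-256 integrity checksum as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def checksum_matches(payload: bytes, expected_hex: str) -> bool:
    """Compare checksums without leaking where the two strings differ."""
    return hmac.compare_digest(checksum(payload), expected_hex)


salt = os.urandom(16)  # assumption: random 16-byte salt stored with the key metadata
key = derive_storage_key(b"example passphrase", salt)
```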

11. Conformance Requirements¶

A conformant ACGP architecture implementation MUST:

11.1 Component Requirements¶

Implement all core components defined in Section 2
Support at least one deployment topology from Section 5
Meet the latency requirements in Sections 9.2 and 9.3 for the target ACL tier
Implement version negotiation as first step in protocol flow

11.2 Integration Requirements¶

Provide wrapper or middleware for at least one agent framework
Support the standard message formats defined in ACGP-1003
Implement the security requirements in Section 10
Use ES256 for all message signatures (ACL-3+)

11.3 Operational Requirements¶

Maintain an append-only audit trail in ReflectionDB
Support graceful degradation as defined in Section 8.5
Provide monitoring and alerting capabilities
Implement retry policy with exponential backoff

11.4 Scaling Requirements¶

Support horizontal scaling of stateless components
Implement caching strategy for performance optimization
Handle at least the throughput specified for the target ACL tier
Support tripwires for all external dependencies

11.5 Resilience Requirements¶

Implement retry policy (3 attempts, exponential backoff)
Support timeout handling (500ms default)
Maintain tripwires for failure isolation
Buffer writes locally during ReflectionDB unavailability (max 1000 events)

12. References¶

Normative References¶

ACGP-1000: Core Protocol Specification
ACGP-1001: Terminology and Definitions
ACGP-1003: Message Formats & Wire Protocol
ACGP-1004: Reflection Blueprint Specification
ACGP-1005: ARS-CTQ-ACL Integration Framework
ACGP-1007: Security Considerations
RFC 2119: Key words for use in RFCs

Informative References¶

NIST Cybersecurity Framework: Security architecture guidance
ISO 27001: Information security management
The Twelve-Factor App: Scalability principles
Google SRE Book: Reliability engineering practices
Circuit Breaker Pattern (Martin Fowler): the design pattern behind the failure-isolation tripwires in Section 8.2

End of ACGP-1002