ACGP-1002: Architecture Specification
Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1002
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)
Abstract
This document specifies the system architecture for the Agentic Cognitive Governance Protocol (ACGP). It defines the components, their interactions, deployment patterns, and scaling considerations. The architecture supports multiple deployment topologies from simple single-agent governance to complex multi-agent networks with distributed stewardship. This specification provides normative requirements for component interfaces, data flows, integration patterns, retry behavior, and version negotiation that enable interoperable ACGP implementations.
Table of Contents
- Introduction
- System Components
- Component Interactions
- Data Flow Architecture
- Deployment Topologies
- Scaling Considerations
- Integration Patterns
- High Availability & Resilience
- Performance Requirements
- Security Architecture
- Conformance Requirements
- References
1. Introduction
The ACGP architecture is designed around principles of separation of concerns, defense in depth, and runtime adaptability. It enables real-time governance without introducing prohibitive latency while maintaining comprehensive audit trails and supporting human oversight.
1.1 Design Principles
- Separation of Concerns: Governance logic is separate from agent logic
- Defense in Depth: Multiple layers of validation and intervention
- Minimal Intrusion: One-line integration for existing agents
- Scalability: Horizontal scaling of governance components
- Resilience: Graceful degradation under failure conditions
- Observability: Complete visibility into all decisions
1.2 Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
2. System Components
2.1 Core Components
graph TB
subgraph "Governance Layer"
GS[Governance Steward]
PE[Policy Engine]
TM[Trusted Monitor]
RE[Reflection Engine]
end
subgraph "Agent Layer"
PA[Operating Agent]
SA[Sub-Agents]
AW[Agent Wrapper/SDK]
end
subgraph "Storage Layer"
RDB[(ReflectionDB)]
CSR[(Certified Source Registry)]
BPS[(Blueprint Store)]
end
subgraph "Control Plane"
HITL[Human-In-The-Loop]
DASH[Monitoring Dashboard]
ALERT[Alert Manager]
CONFIG[Configuration Service]
end
subgraph "Integration Layer"
MCP[MCP Adapter]
A2A[A2A Adapter]
API[REST/gRPC API]
end
2.2 Component Descriptions
2.2.1 Governance Steward
- Purpose: Primary governance enforcement agent
- Responsibilities:
- Observe Cognitive Traces from Operating Agents
- Calculate CTQ scores using Reflection Engine
- Issue Interventions (all six types: OK, Nudge, Escalate, Block, Halt, Flag)
- Manage Trust Debt accumulation
- Evaluate Tripwires (Standard, Critical, Severe)
- Coordinate with other Stewards in a Steward Network
2.2.2 Policy Engine
- Purpose: Decision engine for interventions
- Responsibilities:
- Apply ACL-tier specific thresholds (Risk Score based)
- Evaluate tripwire conditions
- Map CTQ scores to intervention decisions
- Handle dynamic threshold adjustments
- Support Flag orthogonality
2.2.3 Trusted Monitor
- Purpose: Independent anomaly detection
- Responsibilities:
- Parallel evaluation of agent behavior
- Pattern-based threat detection
- Behavioral drift monitoring
- Zero-trust verification of agent claims
2.2.4 Operating Agent
- Purpose: The AI agent performing actual work
- Responsibilities:
- Execute tasks and use tools
- Generate Cognitive Traces
- Respond to Interventions
- Manage sub-agent lifecycle
2.2.5 ReflectionDB
- Purpose: Immutable audit trail
- Requirements:
- Append-only architecture
- Cryptographic integrity (hash chain)
- Time-series optimization
- Compliance-grade retention
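The append-only and hash-chain requirements above can be illustrated with a minimal in-memory sketch. This is not the ReflectionDB storage format: the class name `HashChainLog`, the all-zero genesis hash, and the JSON serialization are assumptions made only for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed sentinel for the first entry


class HashChainLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self):
        self._entries = []

    def append(self, event: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        payload = json.dumps(event, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append(
            {"prev_hash": prev_hash, "payload": payload, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Because each hash covers the previous one, changing any stored payload invalidates that entry and every entry after it, which is what makes tampering detectable during an audit.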
2.2.6 Blueprint Store
- Purpose: Centralized policy repository
- Features:
- Version control for blueprints
- Multi-Party Authorization (MPA) for changes
- Inheritance resolution
- Hot-reload capability
3. Component Interactions
3.1 Primary Flow Sequence with Version Negotiation
sequenceDiagram
participant PA as Operating Agent
participant AW as Agent Wrapper
participant GS as Governance Steward
participant PE as Policy Engine
participant TM as Trusted Monitor
participant RDB as ReflectionDB
Note over PA,GS: Initial Connection
PA->>GS: VERSION_NEGOTIATION
GS-->>PA: VERSION_SELECTED (1.0.2)
Note over PA,RDB: Runtime Governance Loop
PA->>AW: Execute Action
AW->>AW: Generate Cognitive Trace
par Parallel Processing
AW->>GS: Send TRACE
GS->>PE: Evaluate with Blueprint
and
AW->>TM: Send TRACE
TM->>TM: Anomaly Detection
end
PE->>PE: Check Tripwires (Priority 1)
alt Tripwire Triggered
PE->>GS: Immediate Override Decision
GS->>AW: INTERVENTION (Block/Halt)
else No Tripwire
PE->>PE: Calculate CTQ Score
PE->>PE: Compute Risk Score (1.0 - CTQ)
PE->>PE: Apply ACL Thresholds
PE->>GS: Return Decision
end
TM-->>GS: Flag Anomalies (if any)
GS->>GS: Reconcile Decisions
GS->>GS: Check Flag Orthogonality
alt Intervention = OK or NUDGE
GS->>AW: Send INTERVENTION
AW->>PA: Proceed (modified if NUDGE)
else Intervention = ESCALATE
GS->>HITL: Request Human Review
HITL-->>GS: Human Decision
GS->>AW: Forward Decision
else Intervention = BLOCK or HALT
GS->>AW: Send INTERVENTION
AW->>PA: Stop Execution
Note over PA: Action blocked or session terminated
end
alt Flagged
GS->>GS: Update Trust Debt
GS->>GS: Check Re-tier Threshold
end
GS->>RDB: Log Complete Event (TRACE + EVAL + INTERVENTION)
3.2 Trust Debt Accumulation Flow
stateDiagram-v2
[*] --> Normal: Initial State
Normal --> Flagged: Flag Intervention
Flagged --> DebtAccumulation: Add Flag Weight (0.1/0.3/0.5)
DebtAccumulation --> ThresholdCheck: Check Debt Level
ThresholdCheck --> Normal: Debt < Warning
ThresholdCheck --> Elevated: Debt > Warning
ThresholdCheck --> Critical: Debt > Critical
Elevated --> TightenedControl: Reduce Thresholds
Critical --> ForcedReTier: Trigger ARS Re-evaluation
ForcedReTier --> HigherACL: Increase ACL Tier
HigherACL --> Normal: Reset with New Tier
Normal --> DebtDecay: Time Passes (0.95 per 24h)
Elevated --> DebtDecay: Time Passes
DebtDecay --> Normal: Reduce Debt
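The accumulation and decay arithmetic in the diagram can be sketched as follows. The flag weights (0.1/0.3/0.5) and the decay factor (0.95 per 24h) come from the diagram; the level names `low`/`medium`/`high`, continuous exponential decay between whole days, and the default warning and critical thresholds (0.5 and 0.75) are assumptions for illustration only.

```python
from dataclasses import dataclass

FLAG_WEIGHTS = {"low": 0.1, "medium": 0.3, "high": 0.5}  # level names are assumed
DECAY_PER_DAY = 0.95  # multiplicative decay applied per 24 hours


@dataclass
class TrustDebt:
    value: float = 0.0

    def add_flag(self, level: str) -> float:
        """Accumulate debt for one Flag intervention."""
        self.value += FLAG_WEIGHTS[level]
        return self.value

    def decay(self, hours: float) -> float:
        """Apply time-based decay; 24 hours multiplies the debt by 0.95."""
        self.value *= DECAY_PER_DAY ** (hours / 24.0)
        return self.value

    def level(self, warning: float = 0.5, critical: float = 0.75) -> str:
        """Classify the debt; the default thresholds are illustrative."""
        if self.value > critical:
            return "critical"
        if self.value > warning:
            return "elevated"
        return "normal"
```

For example, two `medium` flags give a debt of 0.6 (elevated under the assumed thresholds), and 24 hours without further flags reduces it to about 0.57.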
3.3 Retry and Timeout Behavior
sequenceDiagram
participant Agent
participant Steward
participant Network
Agent->>Network: Send TRACE
alt Success
Network->>Steward: Deliver
Steward-->>Agent: INTERVENTION
else Timeout (500ms)
Note over Agent: Attempt 1 Failed
Agent->>Agent: Wait 100ms + jitter
Agent->>Network: Retry TRACE
alt Success on Retry
Network->>Steward: Deliver
Steward-->>Agent: INTERVENTION
else Timeout Again
Note over Agent: Attempt 2 Failed
Agent->>Agent: Wait 200ms + jitter
Agent->>Network: Final Retry
alt Success on Final Retry
Network->>Steward: Deliver
Steward-->>Agent: INTERVENTION
else Final Timeout
Note over Agent: All Retries Exhausted
Agent->>Agent: Escalate for Manual Review
end
end
end
3.4 Canonical Evaluation Order [NORMATIVE]
This is the authoritative reference for ACGP governance evaluation order. Implementations MUST follow this sequence when processing traces.
3.4.1 Evaluation Sequence
┌─────────────────────────────────────────────────────────────────┐
│ GOVERNANCE EVALUATION │
│ │
│ 1. TRIPWIRE CHECK (Priority 1 - Pre-CTQ) │
│ ├── Eval Tier 0 tripwires (<100ms, in-memory) │
│ ├── Eval Tier 1 tripwires (<300ms, local DB) │
│ └── If ANY tripwire triggers → IMMEDIATE INTERVENTION │
│ │ │
│ ▼ (no tripwire triggered) │
│ 2. CTQ CALCULATION │
│ ├── Load blueprint metrics and scorers │
│ ├── Execute each metric scorer │
│ ├── Calculate weighted CTQ score │
│ └── Compute Risk Score = 1.0 - CTQ │
│ │ │
│ ▼ │
│ 3. THRESHOLD EVALUATION │
│ ├── Get blueprint thresholds │
│ ├── Get ACL tier thresholds │
│ ├── Apply stricter of (blueprint, ACL) │
│ └── Determine base intervention │
│ │ │
│ ▼ │
│ 4. TRUST DEBT APPLICATION │
│ ├── Get current trust debt │
│ ├── Check threshold escalation │
│ └── Adjust intervention if needed │
│ │ │
│ ▼ │
│ 5. FLAG EVALUATION (Orthogonal) │
│ ├── Check flag conditions (pattern, near-miss) │
│ └── Add flag to intervention (combines with any decision) │
│ │ │
│ ▼ │
│ 6. RE-TIERING CHECK │
│ ├── If trust debt > re_tiering_threshold │
│ └── Queue ARS re-evaluation │
│ │ │
│ ▼ │
│ 7. ISSUE INTERVENTION │
│ └── Return: decision + flag + evidence │
└─────────────────────────────────────────────────────────────────┘
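Step 2 of the sequence above can be made concrete with a small worked example. This sketch assumes the weighted-average aggregation; the metric names and values are hypothetical, and blueprints may specify a different aggregation method.

```python
def weighted_ctq(scores: dict) -> float:
    """Weighted average of metric scores, normalized by the total weight."""
    total_weight = sum(m["weight"] for m in scores.values())
    if total_weight <= 0:
        raise ValueError("metric weights must sum to a positive value")
    return sum(m["score"] * m["weight"] for m in scores.values()) / total_weight


# Hypothetical metrics, shaped like the ctq_scores dict in the reference pseudo-code
ctq_scores = {
    "reasoning_quality": {"score": 0.9, "weight": 0.5},
    "source_reliability": {"score": 0.8, "weight": 0.3},
    "tool_safety": {"score": 0.6, "weight": 0.2},
}

ctq = weighted_ctq(ctq_scores)  # 0.45 + 0.24 + 0.12 = 0.81
risk_score = 1.0 - ctq          # 0.19, compared against thresholds in step 3
```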
3.4.2 Reference Implementation (Pseudo-code)
def evaluate_trace(trace: CognitiveTrace, blueprint: Blueprint) -> Intervention:
    """
    Canonical ACGP governance evaluation.

    This is the authoritative evaluation order. All conformant
    implementations MUST follow this sequence.
    """
    # ═══════════════════════════════════════════════════════════
    # STEP 1: TRIPWIRE CHECK (Priority 1 - runs BEFORE CTQ)
    # ═══════════════════════════════════════════════════════════
    tripwire_result = evaluate_tripwires(trace, blueprint.tripwires)
    if tripwire_result.triggered:
        # Tripwires override all other evaluation
        return Intervention(
            decision=tripwire_result.intervention,  # block, halt, etc.
            reason=tripwire_result.reason,
            tripwire_id=tripwire_result.id,
            flagged=tripwire_result.severity in ["critical", "severe"],
            evaluation_stage="tripwire"
        )

    # ═══════════════════════════════════════════════════════════
    # STEP 2: CTQ CALCULATION
    # ═══════════════════════════════════════════════════════════
    ctq_scores = {}
    for metric in blueprint.ctq.metrics:
        scorer = get_scorer(metric.scorer)
        score = scorer.evaluate(trace, metric.parameters)
        ctq_scores[metric.name] = {
            "score": score,
            "weight": metric.weight
        }
    # Weighted average (or other aggregation per blueprint)
    ctq_final = calculate_aggregation(
        ctq_scores,
        method=blueprint.ctq.aggregation
    )
    risk_score = 1.0 - ctq_final

    # ═══════════════════════════════════════════════════════════
    # STEP 3: THRESHOLD EVALUATION
    # ═══════════════════════════════════════════════════════════
    blueprint_thresholds = blueprint.ctq.thresholds
    acl_thresholds = get_acl_thresholds(trace.acl_tier)
    # Apply stricter threshold (lower value = stricter)
    effective_thresholds = {
        level: min(blueprint_thresholds[level], acl_thresholds[level])
        for level in ["ok", "nudge", "escalate", "block"]
    }
    # Determine base decision from risk score
    base_decision = apply_thresholds(risk_score, effective_thresholds)

    # ═══════════════════════════════════════════════════════════
    # STEP 4: TRUST DEBT APPLICATION
    # ═══════════════════════════════════════════════════════════
    trust_debt = get_current_trust_debt(trace.agent_id)
    # Trust debt can escalate the intervention
    if trust_debt > blueprint.trust_debt.thresholds.restricted_mode:
        base_decision = escalate_decision(base_decision)

    # ═══════════════════════════════════════════════════════════
    # STEP 5: FLAG EVALUATION (Orthogonal - can combine with any)
    # ═══════════════════════════════════════════════════════════
    flagged = False
    flag_reason = None
    # Flag on near-miss (close to threshold boundary)
    if is_near_threshold_boundary(risk_score, effective_thresholds):
        flagged = True
        flag_reason = "near_threshold_boundary"
    # Flag on suspicious pattern (takes precedence as the reported reason)
    if detect_suspicious_pattern(trace):
        flagged = True
        flag_reason = "suspicious_pattern"
    # Flag always accumulates trust debt
    if flagged:
        trust_debt_delta = calculate_flag_debt(
            base_decision,
            blueprint.trust_debt.accumulation
        )
        update_trust_debt(trace.agent_id, trust_debt_delta)
        # Keep the local value in sync so that Step 6 and the returned
        # Intervention reflect the debt added by this flag
        trust_debt += trust_debt_delta

    # ═══════════════════════════════════════════════════════════
    # STEP 6: RE-TIERING CHECK
    # ═══════════════════════════════════════════════════════════
    if trust_debt > blueprint.trust_debt.thresholds.re_tiering_review:
        queue_ars_reevaluation(trace.agent_id)

    # ═══════════════════════════════════════════════════════════
    # STEP 7: ISSUE INTERVENTION
    # ═══════════════════════════════════════════════════════════
    return Intervention(
        decision=base_decision,
        flagged=flagged,
        flag_reason=flag_reason,
        ctq_score=ctq_final,
        risk_score=risk_score,
        trust_debt=trust_debt,
        evidence={
            "ctq_scores": ctq_scores,
            "thresholds_used": effective_thresholds,
            "tripwires_checked": [t.id for t in blueprint.tripwires]
        },
        evaluation_stage="complete"
    )

def evaluate_tripwires(trace: CognitiveTrace, tripwires: List[Tripwire]) -> TripwireResult:
    """
    Evaluate tripwires in priority order.

    Tripwires run BEFORE CTQ and can short-circuit evaluation.
    """
    # Sort by severity (severe > critical > standard)
    sorted_tripwires = sorted(tripwires, key=lambda t: t.severity_priority, reverse=True)
    for tripwire in sorted_tripwires:
        # Check eval tier budget
        if tripwire.eval_tier == 0:
            # Tier 0: must complete in <100ms, no external deps
            result = evaluate_tier0_tripwire(trace, tripwire)
        elif tripwire.eval_tier == 1:
            # Tier 1: can use local DB, target <300ms
            result = evaluate_tier1_tripwire(trace, tripwire)
        else:
            # Only Tier 0 and Tier 1 tripwires run in this pre-CTQ pass;
            # skipping avoids reading an unassigned or stale `result`
            continue
        if result.triggered:
            return TripwireResult(
                triggered=True,
                id=tripwire.id,
                severity=tripwire.severity,
                intervention=map_severity_to_intervention(tripwire.severity, trace.acl_tier),
                reason=tripwire.on_fail.reason
            )
    return TripwireResult(triggered=False)

def apply_thresholds(risk_score: float, thresholds: dict) -> str:
    """
    Map risk score to intervention decision.

    Lower threshold = stricter (triggers earlier).
    """
    if risk_score <= thresholds["ok"]:
        return "ok"
    elif risk_score <= thresholds["nudge"]:
        return "nudge"
    elif risk_score <= thresholds["escalate"]:
        return "escalate"
    elif risk_score <= thresholds["block"]:
        return "block"
    else:
        return "block"  # HALT is tripwire-only


def map_severity_to_intervention(severity: str, acl_tier: str) -> str:
    """
    Map tripwire severity to intervention based on ACL tier.

    Higher ACL tiers get stricter interventions.
    """
    acl_level = int(acl_tier.replace("ACL-", ""))
    if severity == "severe":
        return "halt"  # Always halt for severe
    elif severity == "critical":
        return "halt" if acl_level >= 3 else "block"
    else:  # standard
        return "block" if acl_level >= 3 else "escalate"  # ESCALATE if ACL ≤ 2
3.4.3 Evaluation State Machine
stateDiagram-v2
[*] --> TripwireCheck: Trace Received
TripwireCheck --> TripwireTriggered: Tripwire Fails
TripwireCheck --> CTQCalculation: All Tripwires Pass
TripwireTriggered --> IssueIntervention: Immediate Decision
CTQCalculation --> ThresholdEvaluation: CTQ Score Computed
ThresholdEvaluation --> TrustDebtCheck: Base Decision Made
TrustDebtCheck --> FlagEvaluation: Debt Applied
FlagEvaluation --> ReTieringCheck: Flag Decision Made
ReTieringCheck --> IssueIntervention: Final Decision Ready
ReTieringCheck --> QueueReTiering: High Trust Debt
QueueReTiering --> IssueIntervention: Re-tier Queued
IssueIntervention --> [*]: Intervention Sent
note right of TripwireCheck
Priority 1
Runs BEFORE CTQ
Can short-circuit
end note
note right of FlagEvaluation
Orthogonal
Combines with any decision
Accumulates trust debt
end note
3.4.4 Precedence Rules
- Tripwires have absolute priority: If any tripwire triggers, evaluation stops and the tripwire's intervention is returned immediately. CTQ is not calculated.
- ACL thresholds override blueprint thresholds: When both exist, the stricter (lower) threshold applies.
- Trust debt can only escalate, not relax: Trust debt may increase an intervention's severity but never decrease it.
- Flag is orthogonal: The flagged status can be added to ANY intervention (ok, nudge, escalate, block). It does not change the primary decision.
- HALT is tripwire-only: The halt intervention can only be issued by tripwires, never by threshold-based CTQ evaluation.
3.5 Formal State Machine Specification
This section provides a formal state machine for agent governance states. Implementations SHOULD use this as the reference for state transitions.
3.5.1 Agent Governance States
┌─────────────────────────────────────────────────────────────────────┐
│ AGENT GOVERNANCE STATE MACHINE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ INACTIVE │ ─── register() ───► ┌──────────┐ │
│ └──────────┘ │ NORMAL │◄─────────────────┐ │
│ ▲ └────┬─────┘ │ │
│ │ │ │ │
│ unregister() flag_intervention() │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────┐ │ │
│ │ │ FLAGGED │ │ │
│ │ └────┬─────┘ │ │
│ │ │ │ │
│ │ debt > elevated_threshold │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────┐ review_passed │ │
│ │ │ ELEVATED │ ──────────────────┤ │
│ │ └────┬─────┘ │ │
│ │ │ │ │
│ │ debt > retier_threshold │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────┐ │ │
│ │ │ RE-TIER │ ── approved ──► NORMAL
│ │ │ PENDING │ │ │
│ │ └────┬─────┘ │ │
│ │ │ │ │
│ │ acl_upgraded │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────┐ │ │
│ │ │ UPGRADED │ ─── reset ────────┘ │
│ │ │ (higher ACL) │
│ │ └──────────┘ │
│ │ │
│ ┌────┴─────────────────────────────────────────────────────────┐ │
│ │ TERMINAL STATES │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ │ │
│ │ │ BLOCKED │◄── block │ Any state │ │ │
│ │ └────┬─────┘ └──────────┘ │ │
│ │ │ │ │
│ │ unblock(manual) │ │
│ │ │ │ │
│ │ └────────────────────────────────► NORMAL │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ │ │
│ │ │ HALTED │◄── halt │ Any state │ │ │
│ │ └──────────┘ └──────────┘ │ │
│ │ │ │ │
│ │ Manual restart required (terminal) │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
3.5.2 State Definitions
| State | Description | Allowed Transitions |
|---|---|---|
| INACTIVE | Agent not registered | → NORMAL (register) |
| NORMAL | Operating normally | → FLAGGED, BLOCKED, HALTED |
| FLAGGED | Trust debt accumulating | → NORMAL (decay), ELEVATED, BLOCKED, HALTED |
| ELEVATED | Under increased scrutiny | → NORMAL (review), RE-TIER, BLOCKED, HALTED |
| RE-TIER | Pending ACL re-evaluation | → UPGRADED, NORMAL (denied), HALTED |
| UPGRADED | ACL tier increased | → NORMAL (reset) |
| BLOCKED | Temporarily suspended | → NORMAL (unblock) |
| HALTED | Permanently stopped | Terminal (manual restart) |
3.5.3 Transition Functions
class InvalidTransitionError(Exception):
    """Raised when an event is not valid for the current state."""


class AgentGovernanceState:
    """Formal state machine for agent governance."""

    def __init__(self):
        self.state = "INACTIVE"
        self.trust_debt = 0.0
        self.acl_tier = None

    def transition(self, event: str, context: dict) -> str:
        """
        Execute state transition based on event.

        Returns:
            New state after transition
        """
        transitions = {
            ("INACTIVE", "register"): self._register,
            ("NORMAL", "flag"): self._flag,
            ("NORMAL", "block"): lambda c: "BLOCKED",
            ("NORMAL", "halt"): lambda c: "HALTED",
            ("FLAGGED", "evaluate"): self._evaluate_flagged,
            ("FLAGGED", "block"): lambda c: "BLOCKED",
            ("FLAGGED", "halt"): lambda c: "HALTED",
            ("ELEVATED", "review_passed"): lambda c: "NORMAL",
            ("ELEVATED", "retier_triggered"): lambda c: "RE-TIER",
            ("ELEVATED", "block"): lambda c: "BLOCKED",  # listed in 3.5.2
            ("ELEVATED", "halt"): lambda c: "HALTED",
            ("RE-TIER", "approved"): lambda c: "NORMAL",
            ("RE-TIER", "upgraded"): self._upgrade,
            ("RE-TIER", "halt"): lambda c: "HALTED",
            ("BLOCKED", "unblock"): lambda c: "NORMAL",
            ("UPGRADED", "reset"): self._reset_after_upgrade,
        }
        key = (self.state, event)
        if key in transitions:
            self.state = transitions[key](context)
        else:
            # HALTED has no entries, so it is terminal by construction
            raise InvalidTransitionError(f"No transition for {key}")
        return self.state

    def _register(self, context):
        # Illustrative handler (assumed behavior): record the ACL tier if given
        self.acl_tier = context.get("acl_tier")
        return "NORMAL"

    def _flag(self, context):
        self.trust_debt += context.get("debt_delta", 0.1)
        return "FLAGGED"

    def _evaluate_flagged(self, context):
        if self.trust_debt > context.get("retier_threshold", 0.75):
            return "RE-TIER"
        elif self.trust_debt > context.get("elevated_threshold", 0.5):
            return "ELEVATED"
        elif self.trust_debt < context.get("normal_threshold", 0.3):
            return "NORMAL"
        return "FLAGGED"

    def _upgrade(self, context):
        # Illustrative handler (assumed behavior): adopt the new ACL tier if given
        self.acl_tier = context.get("new_acl_tier", self.acl_tier)
        return "UPGRADED"

    def _reset_after_upgrade(self, context):
        # Illustrative handler (assumed behavior): debt resets under the new tier
        self.trust_debt = 0.0
        return "NORMAL"
3.5.4 Formal Verification Note
For safety-critical implementations, consider formal verification using:
- TLA+: For distributed consensus properties
- Alloy: For state invariant checking
- Spin/Promela: For temporal logic verification
Example invariants to verify:
- An agent in HALTED state cannot transition to any other state
- Trust debt is monotonically non-decreasing within a session (before decay)
- BLOCKED agents cannot issue new actions
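Short of a full TLA+ or Alloy model, the first invariant can be checked mechanically against a transition table. The sketch below is a lightweight sanity check, not formal verification: the `ALLOWED` table is transcribed from the state definitions in Section 3.5.2, and the extra reachability check is an addition for illustration.

```python
ALLOWED = {
    "INACTIVE": {"NORMAL"},
    "NORMAL": {"FLAGGED", "BLOCKED", "HALTED"},
    "FLAGGED": {"NORMAL", "ELEVATED", "BLOCKED", "HALTED"},
    "ELEVATED": {"NORMAL", "RE-TIER", "BLOCKED", "HALTED"},
    "RE-TIER": {"UPGRADED", "NORMAL", "HALTED"},
    "UPGRADED": {"NORMAL"},
    "BLOCKED": {"NORMAL"},
    "HALTED": set(),
}


def reachable(start: str, table: dict) -> set:
    """Return every state reachable from `start` by following transitions."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state not in seen:
            seen.add(state)
            stack.extend(table[state])
    return seen


def check_invariants(table: dict) -> list:
    """Return a list of violated invariants (empty when the table is sound)."""
    problems = []
    if table["HALTED"]:
        problems.append("HALTED must have no outgoing transitions")
    for state, targets in table.items():
        unknown = targets - set(table)
        if unknown:
            problems.append(f"{state} points at undefined states: {sorted(unknown)}")
    if not problems:
        missing = set(table) - reachable("INACTIVE", table)
        if missing:
            problems.append(f"unreachable states: {sorted(missing)}")
    return problems
```

Running `check_invariants(ALLOWED)` returns an empty list; adding any outgoing edge to `HALTED` makes it report a violation.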
4. Data Flow Architecture
4.1 Write Path (Trace Processing)
flowchart LR
subgraph "Ingestion"
T[Cognitive Trace] --> V[Validation]
V --> VN[Version Check]
VN --> Q[Message Queue]
end
subgraph "Processing"
Q --> GS[Governance Steward]
GS --> TW[Tripwire Check]
TW --> CTQ[CTQ Calculation]
CTQ --> DEC[Decision Logic]
end
subgraph "Storage"
DEC --> WB[Write Buffer]
WB --> RDB[(ReflectionDB)]
WB --> TS[(Time Series)]
end
subgraph "Response"
DEC --> INT[Intervention]
INT --> PA[Operating Agent]
end
style T fill:#d9f99d
style TW fill:#fef3c7
style INT fill:#fca5a5
style RDB fill:#fed7aa
4.2 Read Path (Query & Analytics)
flowchart LR
subgraph "Query Layer"
API[Query API] --> CACHE[Query Cache]
CACHE --> QE[Query Engine]
end
subgraph "Storage"
QE --> RDB[(ReflectionDB)]
QE --> IDX[(Indexes)]
QE --> AGG[(Aggregates)]
end
subgraph "Consumers"
API --> DASH[Dashboard]
API --> AUDIT[Audit Tools]
API --> ANALYTICS[Analytics]
end
5. Deployment Topologies
5.1 Single Agent Governance (Simple)
Deployment: Sidecar Pattern
Components:
- 1 Operating Agent
- 1 Governance Steward (sidecar)
- 1 Policy Engine (embedded)
- 1 ReflectionDB (local SQLite)
Use Cases:
- Development/Testing
- Low-risk applications (ACL-0, ACL-1)
- Edge deployments
Retry Policy:
- Max attempts: 3
- Backoff: exponential (100ms base)
- Timeout: 500ms per attempt
5.2 Multi-Agent with Shared Governance (Standard)
Deployment: Service Mesh Pattern
Components:
- N Operating Agents
- N Governance Stewards (1:1 with agents)
- 1 Shared Policy Engine (service)
- 1 Shared ReflectionDB (PostgreSQL/MongoDB)
- 1 Blueprint Store (git-backed)
Use Cases:
- Enterprise deployments (ACL-2, ACL-3)
- Multi-tenant SaaS
- Microservices architecture
Retry Policy:
- Max attempts: 3
- Backoff: exponential (100ms base)
- Timeout: 500ms per attempt
- Tripwire (circuit-breaker behavior, configured in Section 8.2): opens after 5 failures
5.3 Distributed Steward Network (Advanced)
Deployment: Federated Pattern
Components:
- N Operating Agents (across regions)
- M Governance Stewards (M < N, pooled)
- Regional Policy Engines
- Distributed ReflectionDB (Cassandra/CockroachDB)
- Replicated Blueprint Stores
- Cross-region Steward coordination
Use Cases:
- Global deployments (ACL-4, ACL-5)
- High availability requirements
- Regulatory compliance (data residency)
Retry Policy:
- Max attempts: 3
- Backoff: exponential (100ms base)
- Regional timeout: 300ms
- Cross-region timeout: 1000ms
- Tripwire: per-region
5.4 Reference Minimal Architecture (ACGP-MIN-A1)
The ACGP-MIN-A1 architecture is the simplest conformant deployment, suitable for development and POC.
┌─────────────────────────────────────────────────────────────┐
│ ACGP-MIN-A1 │
│ │
│ ┌─────────────┐ ┌─────────────────────────────────┐ │
│ │ Primary │ │ Governance Steward │ │
│ │ Agent │──────►│ ┌─────────────────────────┐ │ │
│ │ │ │ │ Policy Engine (embedded)│ │ │
│ │ ACL-0/1 │◄──────│ └─────────────────────────┘ │ │
│ └─────────────┘ │ ┌─────────────────────────┐ │ │
│ │ │ ReflectionDB (SQLite) │ │ │
│ │ └─────────────────────────┘ │ │
│ │ ┌─────────────────────────┐ │ │
│ │ │ Blueprint (local YAML) │ │ │
│ │ └─────────────────────────┘ │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Components:
- 1 Operating Agent (ACL-0 or ACL-1)
- 1 Governance Steward (sidecar or co-located process)
- Embedded Policy Engine (in-process)
- Local ReflectionDB (SQLite file)
- Local Blueprint (YAML file)
NOT Included:
- Certified Source Registry
- MCP/A2A adapters
- Distributed storage
- HITL system
- Governance contracts
Spec Requirements Satisfied:
| Requirement | ACGP-MIN-A1 |
|---|---|
| Core Protocol (ACGP-1000) | |
| Version Negotiation (ACGP-1003) | |
| Basic Interventions (OK, Block) | |
| Full Interventions (all 6) | Optional |
| Tripwires (Standard) | |
| CTQ Calculation | Simplified |
| ReflectionDB Retention | Session |
| Security (TLS) | Optional |
Example Deployment:
# docker-compose.yml for ACGP-MIN-A1
version: "3.8"
services:
  agent:
    image: myorg/my-agent:latest
    environment:
      - ACGP_STEWARD_URL=http://steward:8080
      - ACGP_ACL_TIER=ACL-1
  steward:
    image: acgp/steward-minimal:latest
    volumes:
      - ./blueprints:/blueprints
      - ./data:/data
    environment:
      - ACGP_BLUEPRINT_PATH=/blueprints/dev.yaml
      - ACGP_REFLECTIONDB_PATH=/data/reflection.db
5.5 Deployment Decision Matrix
| Factor | Single Agent | Multi-Agent Shared | Distributed Network |
|---|---|---|---|
| Latency | <10ms | <50ms | <150ms |
| Throughput | <100 req/s | <10K req/s | >100K req/s |
| Availability | 99% | 99.9% | 99.99% |
| Complexity | Low | Medium | High |
| Cost | $ | $$ | $$$ |
| Governance Strength | Basic | Standard | Maximum |
| Retry Overhead | Minimal | Low | Moderate |
6. Scaling Considerations
6.1 Horizontal Scaling
6.1.1 Stateless Components
Components that can scale horizontally without coordination:
- Governance Stewards (with session affinity)
- Policy Engines (read-only blueprint access)
- API Gateways
- Query services
6.1.2 Stateful Components
Components requiring careful scaling strategies:
- ReflectionDB (sharding by agent_id or time)
- Trust Debt stores (consistent hashing)
- Blueprint Store (eventual consistency acceptable)
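To show why consistent hashing suits the Trust Debt stores, here is a minimal hash-ring sketch. The class name `HashRing`, the store names, and the choice of 64 virtual nodes per store are illustrative assumptions, not part of this specification.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps agent IDs to store nodes."""

    def __init__(self, nodes, vnodes: int = 64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node is placed on the ring at `vnodes` pseudo-random positions
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [position for position, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, agent_id: str) -> str:
        """Return the first node clockwise from the agent's ring position."""
        idx = bisect.bisect_right(self._keys, self._hash(agent_id)) % len(self._keys)
        return self._ring[idx][1]
```

The useful property is that adding a store only moves the agents that land on the new store; every other agent keeps its existing assignment, so trust debt state does not have to be reshuffled across the whole fleet.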
6.2 Performance Optimization
Caching Strategy:
L1_Cache:
- Location: Agent Wrapper
- Contents: Recent interventions, version info
- TTL: 60 seconds
L2_Cache:
- Location: Governance Steward
- Contents: Blueprint resolutions, CTQ calculations
- TTL: 300 seconds
L3_Cache:
- Location: Policy Engine
- Contents: Compiled blueprints, threshold tables
- TTL: 3600 seconds
Batching:
- Trace batching: Up to 10 traces per request
- Write batching: 100ms window for ReflectionDB
- Query batching: GraphQL-style query aggregation
Retry Optimization:
- Adaptive timeout based on historical latency
- Jitter to prevent thundering herd
- Per-destination tripwires
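The three cache levels above differ mainly in location and TTL, so a single time-to-live cache can illustrate all of them. This is a minimal sketch: the class name `TTLCache` and the injectable `clock` parameter (used to make expiry testable) are assumptions for illustration.

```python
import time


class TTLCache:
    """Minimal cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def put(self, key, value) -> None:
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        item = self._items.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            # Expired entries are evicted lazily, on read
            del self._items[key]
            return default
        return value


# TTLs from the caching strategy above: L1 = 60 s, L2 = 300 s, L3 = 3600 s
l1_interventions = TTLCache(60)
l2_blueprints = TTLCache(300)
l3_compiled = TTLCache(3600)
```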
6.3 Load Distribution
graph TB
subgraph "Load Balancer"
LB[L7 Load Balancer<br/>with Health Checks]
end
subgraph "Governance Tier 1 (ACL 0-2)"
GS1[Steward 1<br/>CPU: 4<br/>RAM: 8GB]
GS2[Steward 2<br/>CPU: 4<br/>RAM: 8GB]
end
subgraph "Governance Tier 2 (ACL 3-5)"
GS3[Steward 3<br/>CPU: 8<br/>RAM: 16GB]
GS4[Steward 4<br/>CPU: 8<br/>RAM: 16GB]
end
LB -->|Low ACL agents| GS1
LB -->|Low ACL agents| GS2
LB -->|High ACL agents| GS3
LB -->|High ACL agents| GS4
GS1 -.->|Health Check| LB
GS2 -.->|Health Check| LB
GS3 -.->|Health Check| LB
GS4 -.->|Health Check| LB
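The routing rule in the diagram (ACL 0-2 to the first steward pool, ACL 3-5 to the second, with health checks) can be sketched as below. The pool names, steward identifiers, and round-robin selection are assumptions for illustration; a production L7 load balancer would normally implement this itself.

```python
import itertools

POOLS = {
    "low": ["steward-1", "steward-2"],   # Governance Tier 1 (ACL 0-2)
    "high": ["steward-3", "steward-4"],  # Governance Tier 2 (ACL 3-5)
}


def pool_for(acl_tier: str) -> str:
    """Map an ACL tier string such as "ACL-3" to a steward pool name."""
    level = int(acl_tier.replace("ACL-", ""))
    if not 0 <= level <= 5:
        raise ValueError(f"unknown ACL tier: {acl_tier}")
    return "low" if level <= 2 else "high"


class StewardRouter:
    """Round-robin router that skips stewards marked unhealthy."""

    def __init__(self, pools: dict):
        self._pools = pools
        self._cursors = {name: itertools.cycle(members) for name, members in pools.items()}
        self._unhealthy = set()

    def mark(self, steward: str, healthy: bool) -> None:
        if healthy:
            self._unhealthy.discard(steward)
        else:
            self._unhealthy.add(steward)

    def route(self, acl_tier: str) -> str:
        name = pool_for(acl_tier)
        # Try each pool member at most once before giving up
        for _ in range(len(self._pools[name])):
            candidate = next(self._cursors[name])
            if candidate not in self._unhealthy:
                return candidate
        raise RuntimeError(f"no healthy steward in pool {name!r}")
```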
7. Integration Patterns
7.1 Agent Framework Integration
7.1.1 Wrapper Pattern
# One-line integration example
from ACGP import GovernanceWrapper

agent = create_langchain_agent()
governed_agent = GovernanceWrapper(
    agent,
    blueprint="finance/trading",
    retry_policy={
        'max_attempts': 3,
        'timeout_ms': 500,
        'backoff': 'exponential'
    }
)
governed_agent.run()  # All actions now governed
7.1.2 Middleware Pattern
# Middleware for existing frameworks
class ACGPMiddleware:
    def __init__(self, steward):
        self.version = "1.0.2"
        # Steward client used by before_tool_call; it must be supplied by the caller
        self.steward = steward
        self.retry_policy = RetryPolicy(max_attempts=3)

    async def before_tool_call(self, tool, args):
        # Retry logic wrapper
        for attempt in range(self.retry_policy.max_attempts):
            try:
                trace = generate_trace(tool, args)
                intervention = await self.steward.evaluate(
                    trace,
                    timeout_ms=500
                )
                # BLOCK and HALT both stop execution (see Section 3.1)
                if intervention.decision in ("block", "halt"):
                    raise BlockedException(intervention.reason)
                return intervention.modified_args or args
            except TimeoutError:
                if attempt < self.retry_policy.max_attempts - 1:
                    await self.retry_policy.backoff(attempt)
                else:
                    # All retries exhausted: escalate for manual review
                    return await self.escalate_for_review(tool, args)
7.2 External Protocol Integration
7.2.1 MCP (Model Context Protocol)
sequenceDiagram
participant Agent
participant ACGP
participant MCP
participant Tool
Agent->>ACGP: Request tool use
ACGP->>ACGP: Evaluate governance (check tripwires)
alt Approved
ACGP->>MCP: Forward request
MCP->>Tool: Execute
Tool-->>MCP: Result
MCP-->>ACGP: Response
ACGP-->>Agent: Tool result
else Blocked
ACGP-->>Agent: Blocked + reason
end
7.2.2 A2A (Agent-to-Agent Protocol)
sequenceDiagram
participant A1 as Agent 1
participant GS1 as Steward 1
participant A2A
participant GS2 as Steward 2
participant A2 as Agent 2
A1->>GS1: Prepare message
GS1->>GS1: Validate outbound (check tripwires)
alt Allowed
GS1->>A2A: Send message
A2A->>GS2: Deliver
GS2->>GS2: Validate inbound
alt Accepted
GS2->>A2: Deliver message
else Rejected
GS2->>A2A: Reject
A2A->>A1: Message rejected
end
else Blocked
GS1->>A1: Outbound blocked
end
8. High Availability & Resilience
8.1 Failure Modes & Recovery
| Component | Failure Mode | Recovery Strategy | Degraded Operation |
|---|---|---|---|
| Governance Steward | Process crash | Auto-restart, session migration | Failover to backup steward |
| Policy Engine | Unavailable | Tripwire, cached policies | Use last known good policy |
| ReflectionDB | Write failure | Write-ahead log, retry queue | Buffer writes locally (max 1000) |
| Blueprint Store | Unavailable | Local blueprint cache | No policy updates |
| Trusted Monitor | Timeout | Async processing, skip | Proceed without anomaly check |
| Version Service | Unavailable | Use cached version info | Assume compatible |
8.2 Tripwire Configuration
tripwire:
  failure_threshold: 5    # failures to open circuit
  success_threshold: 2    # successes to close circuit
  timeout: 30s            # time before half-open
  fallback:
    steward: use_cached_decision
    policy_engine: use_default_thresholds
    trusted_monitor: skip_validation
    reflection_db: buffer_locally
    version_negotiation: assume_compatible
  per_component:
    governance_steward:
      failure_threshold: 3
      timeout: 15s
    reflection_db:
      failure_threshold: 10
      timeout: 60s
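The `failure_threshold`, `success_threshold`, and `timeout` settings describe a closed / open / half-open cycle. The sketch below shows one way these three values could drive that cycle; the class name `CircuitBreaker`, the state names, and the decision to reset the failure count on any success while closed are assumptions for illustration.

```python
import time


class CircuitBreaker:
    """Closed -> open after N failures; open -> half-open after a timeout;
    half-open -> closed after M successes, or back to open on any failure."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_s = timeout_s
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if self._clock() - self._opened_at >= self.timeout_s:
                # Timeout elapsed: let probe requests through
                self.state = "half_open"
                self._successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "half_open":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # assumed: failures must be consecutive

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._trip()
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```

While the breaker is open, callers would apply the configured fallback for that component (for example `use_cached_decision` for the steward) instead of sending the request.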
8.3 Tripwires and Evaluation Tiers
Relationship: Tripwires (policy constraints) implement Evaluation Tiers (architectural patterns).
Tripwires define WHAT to check. Evaluation Tiers (see ACGP-1010) define HOW and WHEN to check.
8.3.1 Classifying Tripwires by Evaluation Tier
Eval Tier 0 Tripwires (must be <100ms P99, no external dependencies):
- Authentication failures
- Schema validation
- Critical safety limits (e.g., "never delete production database")
- Hard monetary limits for immediate rejection
- In-memory rate limiting
Example:
tripwires:
  - name: "max_single_transaction"
    threshold: 10000
    eval_tier: 0
    latency_budget_ms: 10
    check_type: "in_memory"
    fail_mode: "closed"
Eval Tier 1 Tripwires (may be slower, can use local DB):
- Rate limiting with external state (Redis lookup)
- Daily/monthly aggregate limits (requires DB query)
- Stateful pattern checks
- Cached policy decisions
Example:
tripwires:
  - name: "daily_transaction_limit"
    threshold: 50000
    eval_tier: 1
    latency_budget_ms: 100
    check_type: "db_lookup"
    requires_state: true
    fail_mode: "configurable"
8.3.2 Decision Matrix for Tripwire Classification
| Tripwire Characteristic | Suggested Eval Tier |
|---|---|
| No external dependencies | Tier 0 |
| Latency < 10ms | Tier 0 |
| Critical safety (can't fail open) | Tier 0 |
| Requires DB/cache lookup | Tier 1 |
| Aggregate/windowed limit | Tier 1 |
| Complex calculation | Tier 1 |
| LLM-based evaluation | Tier 2 |
| Human review | Tier 3 |
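The matrix above can be read as a simple classification function. In this sketch the `TripwireProfile` fields are hypothetical names for the characteristics in the table, and the rule that the highest applicable tier wins is an assumption, since the matrix does not say how to resolve a tripwire that matches several rows.

```python
from dataclasses import dataclass


@dataclass
class TripwireProfile:
    """Hypothetical description of what a tripwire needs in order to evaluate."""
    needs_human_review: bool = False
    uses_llm: bool = False
    needs_state_lookup: bool = False   # DB or cache lookup
    is_aggregate_limit: bool = False   # aggregate or windowed limit
    is_complex_calculation: bool = False


def suggest_eval_tier(profile: TripwireProfile) -> int:
    """Suggest an eval tier; the most demanding characteristic decides."""
    if profile.needs_human_review:
        return 3
    if profile.uses_llm:
        return 2
    if (profile.needs_state_lookup
            or profile.is_aggregate_limit
            or profile.is_complex_calculation):
        return 1
    # No external dependencies: eligible for in-memory Tier 0 evaluation
    return 0
```

Under these assumptions, the `max_single_transaction` example maps to Tier 0 and `daily_transaction_limit` (an aggregate limit needing a lookup) maps to Tier 1.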
8.3.3 Implementation Guidance
Tier 0 Implementation (REQUIRED for ACGP-1010 conformance):
class Tier0Tripwires:
    def __init__(self, config):
        # Load tripwires into memory
        self.tripwires = [
            t for t in config.tripwires
            if t.eval_tier == 0
        ]
        # MUST be fast and local. Raise explicitly rather than assert,
        # because assert statements are skipped under `python -O`.
        if any(t.requires_external for t in self.tripwires):
            raise ValueError("Tier 0 tripwires must not require external dependencies")

    def evaluate(self, request):
        """MUST complete in <100ms."""
        for tripwire in self.tripwires:
            if tripwire.triggered(request):
                return TripwireResult.BLOCK
        return TripwireResult.PASS
Tier 1 Implementation (typical):
class Tier1Tripwires:
    def __init__(self, config, redis_client):
        self.tripwires = [
            t for t in config.tripwires
            if t.eval_tier == 1
        ]
        self.redis = redis_client  # Local cache/DB allowed

    async def evaluate(self, request):
        """Target <300ms, may query local DB."""
        for tripwire in self.tripwires:
            if tripwire.requires_state:
                state = await self.redis.get(tripwire.state_key)
                if tripwire.triggered_with_state(request, state):
                    return TripwireResult.BLOCK
            elif tripwire.triggered(request):
                return TripwireResult.BLOCK
        return TripwireResult.PASS
8.4 Retry Policy Implementation¶
import asyncio
import random


class RetryExhaustedError(Exception):
    """Raised when every retry attempt has timed out."""


class RetryPolicy:
    def __init__(self, max_attempts=3, base_delay_ms=100,
                 max_delay_ms=5000, timeout_ms=500):
        self.max_attempts = max_attempts
        self.base_delay_ms = base_delay_ms
        self.max_delay_ms = max_delay_ms
        self.timeout_ms = timeout_ms

    async def execute_with_retry(self, operation):
        """Execute operation with exponential backoff retry."""
        for attempt in range(self.max_attempts):
            try:
                # Each attempt gets its own timeout (converted to seconds)
                return await asyncio.wait_for(
                    operation(),
                    timeout=self.timeout_ms / 1000
                )
            except asyncio.TimeoutError:
                if attempt < self.max_attempts - 1:
                    # Exponential backoff, capped at max_delay_ms
                    delay = min(
                        self.base_delay_ms * (2 ** attempt),
                        self.max_delay_ms
                    )
                    # Add jitter (±10%)
                    jitter = random.uniform(-0.1, 0.1) * delay
                    await asyncio.sleep((delay + jitter) / 1000)
                else:
                    # Final failure - escalate
                    raise RetryExhaustedError(
                        f"Failed after {self.max_attempts} attempts"
                    )
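To make the backoff arithmetic concrete, the sketch below computes the delay schedule and a worst-case wall-clock bound for the parameters used by the retry policy above. The helper names `backoff_schedule_ms` and `worst_case_total_ms` are hypothetical and introduced only for illustration; the bound assumes every attempt runs to its full timeout and every sleep draws the maximum +10% jitter.

```python
def backoff_schedule_ms(max_attempts=3, base_delay_ms=100, max_delay_ms=5000):
    """Delays (before jitter) slept between consecutive attempts."""
    return [
        min(base_delay_ms * (2 ** attempt), max_delay_ms)
        for attempt in range(max_attempts - 1)  # no sleep after the last attempt
    ]


def worst_case_total_ms(max_attempts=3, base_delay_ms=100,
                        max_delay_ms=5000, timeout_ms=500, jitter=0.1):
    """Upper bound on elapsed time if every attempt times out."""
    delays = backoff_schedule_ms(max_attempts, base_delay_ms, max_delay_ms)
    return max_attempts * timeout_ms + sum(d * (1 + jitter) for d in delays)


print(backoff_schedule_ms())               # delays for the default policy
print(backoff_schedule_ms(max_attempts=8)) # the cap applies from the 7th delay
print(round(worst_case_total_ms()))        # milliseconds
```

With the defaults, the schedule is 100ms then 200ms, and the worst case is about 1.83s (three 500ms timeouts plus 330ms of jittered sleep), which is worth comparing against the latency budget of the calling path.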
8.5 Graceful Degradation Levels¶
stateDiagram-v2
[*] --> Normal: All systems operational
Normal --> Degraded: Non-critical failure
Degraded --> Essential: Multiple failures
Essential --> Emergency: Critical failures
Emergency --> Shutdown: Safety threshold exceeded
Normal --> Normal: Self-healing
Degraded --> Normal: Recovery
Essential --> Degraded: Partial recovery
note right of Degraded
- Increased cache usage
- Relaxed consistency
- Async interventions
- Skip version negotiation
end note
note right of Essential
- Only critical interventions
- Batch processing
- Manual escalation
- Cached policies only
end note
note right of Emergency
- Block all high-risk
- Force human review
- Read-only mode
- Emergency override active
end note
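The transitions in the diagram can be encoded as a lookup table so that illegal jumps are rejected. This is a minimal sketch: the `Level` enum, the `ALLOWED` table, and the `transition` helper are hypothetical names, and the table contains only the edges drawn in the diagram (which shows no recovery edge out of Emergency).

```python
from enum import Enum


class Level(Enum):
    NORMAL = "Normal"
    DEGRADED = "Degraded"
    ESSENTIAL = "Essential"
    EMERGENCY = "Emergency"
    SHUTDOWN = "Shutdown"


# Edges copied from the state diagram; anything else is rejected.
ALLOWED = {
    Level.NORMAL: {Level.NORMAL, Level.DEGRADED},
    Level.DEGRADED: {Level.NORMAL, Level.ESSENTIAL},
    Level.ESSENTIAL: {Level.DEGRADED, Level.EMERGENCY},
    Level.EMERGENCY: {Level.SHUTDOWN},
    Level.SHUTDOWN: set(),
}


def transition(current: Level, target: Level) -> Level:
    """Return the new level, or raise if the diagram has no such edge."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


level = Level.NORMAL
level = transition(level, Level.DEGRADED)   # non-critical failure
level = transition(level, Level.ESSENTIAL)  # multiple failures
level = transition(level, Level.DEGRADED)   # partial recovery
print(level.value)
```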
8.6 Governance Contract Architecture Patterns¶
Stewards MAY adopt different architectural patterns based on cost, latency, and quality requirements. All patterns are conformant if they meet ACGP-1009 and ACGP-1010 requirements.
8.6.1 Pattern Comparison¶
| Pattern | Eval Tiers | Target Latency | Monthly Cost (est.) | Use Case |
|---|---|---|---|---|
| Rule-Only | 0, 1 | 100–300ms | $500 | High-volume transactional agents |
| Hybrid | 0, 1, 2 (async 3) | 100–300ms (5s async) | $2,000 | Balanced quality/performance |
| Max Quality | 0, 1, 2, 3 | 100ms–5s | $20,000 | Safety-critical, low-volume |
8.6.2 Rule-Only Pattern¶
Tiers Used: Eval-0 (static rules) + Eval-1 (cached/indexed DB lookups)
Architecture:
┌─────────────┐
│    Agent    │
└──────┬──────┘
       │ EVAL_REQUEST (100ms budget)
       ▼
┌─────────────────────────────────────┐
│               Steward               │
│ ┌─────────────┐   ┌──────────────┐  │
│ │ Eval-0      │   │ Eval-1       │  │
│ │ In-memory   │──▶│ Cache/Index  │  │
│ │ rules <50ms │   │ <250ms       │  │
│ └─────────────┘   └──────────────┘  │
└─────────────────────────────────────┘
       │ EVAL_RESPONSE (allow/deny)
       ▼
┌─────────────┐
│    Agent    │
└─────────────┘
Fallback Strategy: deny (conservative)
Trade-offs:
- Lowest cost, predictable latency
- No LLM reasoning, limited context depth
- Cannot handle novel/ambiguous actions
8.6.3 Hybrid Pattern¶
Tiers Used: Eval-0, 1 (sync) + Eval-2 (async model inference) + optional Eval-3 (human review)
Architecture:
┌─────────────┐
│    Agent    │
└──────┬──────┘
       │ EVAL_REQUEST (300ms budget)
       ▼
┌──────────────────────────────────────────────────┐
│                     Steward                      │
│  ┌──────┐   ┌──────┐   ┌────────────────────┐    │
│  │ Tier │──▶│ Tier │──▶│ Tier 2 (async)     │    │
│  │  0   │   │  1   │   │ Model inference    │    │
│  └──────┘   └──────┘   │ Post-action        │    │
│                        └────────────────────┘    │
└──────────────────────────────────────────────────┘
       │ EVAL_RESPONSE (allow_and_log)
       ▼
┌─────────────┐   (background)
│    Agent    │   ┌───▶ Model review
└─────────────┘   │     Human escalation (Tier 3)
                  │     Retroactive intervention
                  └─────
Fallback Strategy: allow_and_log (permissive with async audit)
Trade-offs:
- Fast sync path, deep async reasoning
- Cost-effective for most workflows
- [WARNING] Risk window during async evaluation
- [WARNING] Requires robust post-action intervention mechanisms
8.6.4 Max Quality Pattern¶
Tiers Used: All tiers (0, 1, 2, 3) synchronously
Architecture:
┌─────────────┐
│    Agent    │
└──────┬──────┘
       │ EVAL_REQUEST (5000ms budget, critical_risk)
       ▼
┌───────────────────────────────────────────────────────┐
│                        Steward                        │
│  ┌──────┐   ┌──────┐   ┌──────┐   ┌────────────────┐  │
│  │ Tier │──▶│ Tier │──▶│ Tier │──▶│ Tier 3         │  │
│  │  0   │   │  1   │   │  2   │   │ Human review   │  │
│  │      │   │      │   │ LLM  │   │ (on-demand)    │  │
│  └──────┘   └──────┘   └──────┘   └────────────────┘  │
└───────────────────────────────────────────────────────┘
       │ EVAL_RESPONSE (wait for all tiers)
       ▼
┌─────────────┐
│    Agent    │
└─────────────┘
Fallback Strategy: escalate (require human decision on timeout)
Trade-offs:
- [YES] Maximum safety and quality
- [YES] Full audit trail with human oversight
- [NO] High cost (model + human labor)
- [NO] High latency (up to 5s + human response time)
8.6.5 Pattern Selection Matrix¶
Choose pattern based on:
| Requirement | Rule-Only | Hybrid | Max Quality |
|---|---|---|---|
| Actions/sec > 100 | [YES] | [WARNING] | [NO] |
| Cost < $1k/month | [YES] | [WARNING] | [NO] |
| Latency < 300ms guaranteed | [YES] | [YES] | [NO] |
| Novel actions frequent | [NO] | [YES] | [YES] |
| Safety-critical domain | [NO] | [WARNING] | [YES] |
| Audit/compliance required | [WARNING] | [YES] | [YES] |
Deployment Note: Stewards MAY implement multiple patterns and select per-agent or per-action based on "risk_level" in governance contracts (ACGP-1010).
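Per-action pattern selection could look like the sketch below. It is illustrative only: the `Pattern` dataclass, the `select_pattern` helper, the `low_risk` and `elevated_risk` identifiers, and the risk-to-pattern mapping are all assumptions (only `critical_risk` appears in the diagrams above). Falling back to the strictest pattern for unknown risk levels is likewise an assumed policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    sync_tiers: tuple    # eval tiers awaited before responding
    async_tiers: tuple   # eval tiers run after the response
    fallback: str        # fallback strategy named in Sections 8.6.2 to 8.6.4


RULE_ONLY = Pattern("Rule-Only", (0, 1), (), "deny")
HYBRID = Pattern("Hybrid", (0, 1), (2, 3), "allow_and_log")
MAX_QUALITY = Pattern("Max Quality", (0, 1, 2, 3), (), "escalate")

# Assumed mapping from a contract's risk_level to a pattern.
PATTERN_BY_RISK = {
    "low_risk": RULE_ONLY,
    "elevated_risk": HYBRID,
    "critical_risk": MAX_QUALITY,
}


def select_pattern(risk_level: str) -> Pattern:
    """Pick a pattern; unknown levels get the strictest one (assumption)."""
    return PATTERN_BY_RISK.get(risk_level, MAX_QUALITY)


chosen = select_pattern("elevated_risk")
print(chosen.name, chosen.fallback)
```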
9. Performance Requirements¶
9.1 Unified Latency Model¶
End-to-End (E2E) Latency Definition: Measured from agent request submission to agent receipt of governance decision.
Components:
E2E Latency = Network(agent→steward)
            + Protocol Overhead (parsing, validation)
            + Governance Evaluation   ← Largest component
            + Network(steward→agent)
9.2 Latency Targets by Risk Level¶
Risk-based latency budgets (see ACGP-1010 Governance Contracts):
| Component | Low Risk | Elevated Risk | Critical Risk |
|---|---|---|---|
| Network (round-trip) | 20ms | 20ms | 50ms |
| Protocol overhead | 30ms | 30ms | 50ms |
| Governance evaluation | 50ms | 250ms | 4900ms |
| TOTAL E2E (P99) | 100ms | 300ms | 5000ms |
Governance Evaluation Budget Allocation:
Low Risk (50ms total):
- Eval Tier 0 (must-pass checks): 30ms
- Eval Tier 1 (fast policy): 20ms
- Eval Tier 2: Async only

Elevated Risk (250ms total):
- Eval Tier 0: 50ms
- Eval Tier 1: 200ms
- Eval Tier 2: Async audit

Critical Risk (4900ms total):
- Eval Tier 0: 50ms
- Eval Tier 1: 200ms
- Eval Tier 2: 4500ms (synchronous LLM analysis)
- Eval Tier 3: Human time (separate)
Use the Latency Calculator to model budget allocation for your use case.
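The budget figures in Section 9.2 can be checked with simple arithmetic. The sketch below copies the numbers from the table and the allocation lists; the dictionary and function names are hypothetical, and asynchronous tiers and human review time are deliberately left out of the sums.

```python
# Per-component E2E budgets in milliseconds, copied from the Section 9.2 table.
E2E_BUDGET_MS = {
    "low": {"network": 20, "protocol": 30, "evaluation": 50},
    "elevated": {"network": 20, "protocol": 30, "evaluation": 250},
    "critical": {"network": 50, "protocol": 50, "evaluation": 4900},
}

# Synchronous eval-tier allocations in milliseconds (async tiers excluded).
SYNC_TIER_ALLOCATION_MS = {
    "low": {0: 30, 1: 20},
    "elevated": {0: 50, 1: 200},
    "critical": {0: 50, 1: 200, 2: 4500},
}


def e2e_total_ms(risk: str) -> int:
    """Sum of network, protocol, and evaluation budgets."""
    return sum(E2E_BUDGET_MS[risk].values())


def unallocated_ms(risk: str) -> int:
    """Evaluation budget not assigned to any synchronous tier."""
    allocated = sum(SYNC_TIER_ALLOCATION_MS[risk].values())
    return E2E_BUDGET_MS[risk]["evaluation"] - allocated


for risk in E2E_BUDGET_MS:
    print(risk, e2e_total_ms(risk), unallocated_ms(risk))
```

The totals reproduce the 100ms, 300ms, and 5000ms P99 targets. For critical risk, the listed tier figures sum to 4750ms, so 150ms of the 4900ms evaluation budget is not assigned to a tier in this reading.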
9.3 Component-Level Latency Targets¶
| Operation | P50 | P95 | P99 | Notes |
|---|---|---|---|---|
| Version Negotiation | 10ms | 25ms | 50ms | One-time per connection |
| Eval Tier 0 (must-pass) | 20ms | 50ms | 100ms | REQUIRED <100ms |
| Eval Tier 1 (policy) | 50ms | 150ms | 300ms | Target, not requirement |
| Eval Tier 2 (LLM) | 2000ms | 5000ms | 10000ms | Async recommended |
| ReflectionDB Write | 10ms | 50ms | 200ms | Asynchronous |
9.4 Throughput Requirements¶
| ACL Tier | Traces/sec per Agent | Batch Size | Queue Depth | Retry Budget |
|---|---|---|---|---|
| ACL-0 | 10 | 1 | 100 | 10% |
| ACL-1 | 50 | 5 | 500 | 10% |
| ACL-2 | 100 | 10 | 1000 | 15% |
| ACL-3 | 200 | 20 | 2000 | 15% |
| ACL-4 | 500 | 50 | 5000 | 20% |
| ACL-5 | 1000 | 100 | 10000 | 20% |
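One way to sanity-check the table is to compute how long a full queue takes to drain at the sustained per-agent rate. This is a rough sketch: the `THROUGHPUT` dictionary and `queue_drain_seconds` helper are hypothetical names, and treating queue depth divided by traces/sec as a drain time is an interpretation, not a definition from this specification.

```python
# (traces/sec per agent, batch size, queue depth), copied from the table above.
THROUGHPUT = {
    "ACL-0": (10, 1, 100),
    "ACL-1": (50, 5, 500),
    "ACL-2": (100, 10, 1000),
    "ACL-3": (200, 20, 2000),
    "ACL-4": (500, 50, 5000),
    "ACL-5": (1000, 100, 10000),
}


def queue_drain_seconds(acl_tier: str) -> float:
    """Seconds to drain a full queue at the sustained trace rate."""
    traces_per_sec, _batch_size, queue_depth = THROUGHPUT[acl_tier]
    return queue_depth / traces_per_sec


for tier in THROUGHPUT:
    print(tier, queue_drain_seconds(tier))
```

Under this reading, every tier is sized for the same 10 seconds of buffered traffic, and batch size scales with the rate so that each tier submits about 10 batches per second.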
9.5 Resource Requirements¶
Minimum_Requirements:
  Governance_Steward:
    CPU: 2 cores
    Memory: 4GB
    Network: 100Mbps
    Disk: 10GB SSD
    Connections: 1000 concurrent
  Policy_Engine:
    CPU: 4 cores
    Memory: 8GB
    Network: 1Gbps
    Disk: 20GB SSD
    Cache: 2GB Redis
  ReflectionDB:
    CPU: 8 cores
    Memory: 32GB
    Network: 10Gbps
    Disk: 1TB NVMe SSD
    IOPS: 10000
    Write_Buffer: 100MB
9.6 Observability Standards [NORMATIVE]¶
Implementations at Standard conformance level MUST expose the following metrics.
9.6.1 Required Metrics¶
Prometheus Format:
# Governance Evaluation Metrics
acgp_evaluation_total{agent_id, acl_tier, decision} counter
acgp_evaluation_latency_seconds{agent_id, acl_tier, eval_tier, quantile} summary
acgp_ctq_score{agent_id, acl_tier, metric} gauge
# Intervention Metrics
acgp_intervention_total{agent_id, decision, tripwire_id} counter
acgp_intervention_latency_seconds{decision, quantile} summary
# Trust Debt Metrics
acgp_trust_debt{agent_id} gauge
acgp_trust_debt_delta_total{agent_id, reason} counter
# Tripwire Metrics
acgp_tripwire_triggered_total{tripwire_id, severity, agent_id} counter
acgp_tripwire_latency_seconds{tripwire_id, eval_tier, quantile} summary
# System Health
acgp_steward_status{steward_id} gauge # 0=down, 1=degraded, 2=normal
acgp_reflectiondb_write_latency_seconds{quantile} summary
acgp_reflectiondb_size_bytes gauge
9.6.2 Standard Metric Labels¶
| Label | Description | Values |
|---|---|---|
| agent_id | Unique agent identifier | UUID |
| acl_tier | Agent's ACL tier | ACL-0 through ACL-5 |
| decision | Intervention decision | ok, nudge, flag, escalate, block, halt |
| eval_tier | Evaluation tier | 0, 1, 2, 3 |
| tripwire_id | Tripwire identifier | string |
| severity | Tripwire severity | standard, critical, severe |
| quantile | Percentile bucket | 0.5, 0.9, 0.95, 0.99 |
9.6.3 Required Endpoints¶
Standard conformance implementations MUST expose:
endpoints:
  /metrics:
    format: prometheus
    auth: optional
  /health:
    format: json
    response:
      status: healthy|degraded|unhealthy
      components:
        policy_engine: ok|error
        reflectiondb: ok|error
        steward: ok|error
  /ready:
    format: json
    response:
      ready: true|false
      reason: string
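A `/health` payload matching the shape above could be assembled as follows. This is a sketch under assumptions: the `health_response` helper is a hypothetical name, and the aggregation rule (all components ok gives healthy, some errors give degraded, all errors give unhealthy) is not defined by this specification.

```python
import json


def health_response(components: dict) -> str:
    """Build the /health JSON body from per-component ok|error states."""
    failed = [name for name, state in components.items() if state != "ok"]
    if not failed:
        status = "healthy"
    elif len(failed) < len(components):
        status = "degraded"      # assumed rule: partial failure
    else:
        status = "unhealthy"     # assumed rule: every component failing
    return json.dumps({"status": status, "components": components})


body = health_response({
    "policy_engine": "ok",
    "reflectiondb": "error",
    "steward": "ok",
})
print(body)
```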
9.6.4 Alerting Recommendations¶
alerts:
  - name: HighHaltRate
    expr: rate(acgp_intervention_total{decision="halt"}[5m]) > 0.01
    severity: critical
  - name: EvaluationLatencyHigh
    expr: acgp_evaluation_latency_seconds{quantile="0.99"} > 0.5
    severity: warning
  - name: TrustDebtCritical
    expr: acgp_trust_debt > 0.75
    severity: warning
  - name: StewardDegraded
    expr: acgp_steward_status < 2
    severity: warning
10. Security Architecture¶
10.1 Defense in Depth Layers¶
graph TB
subgraph "Layer 1: Network Security"
FW[Firewall]
IDS[IDS/IPS]
TLS[TLS 1.3]
end
subgraph "Layer 2: Authentication"
VER[Version Auth]
OAUTH[OAuth 2.0]
MTLS[Mutual TLS]
end
subgraph "Layer 3: Authorization"
RBAC[Role-Based Access]
ABAC[Attribute-Based Access]
MPA[Multi-Party Auth]
end
subgraph "Layer 4: Message Security"
SIGN[ES256 Signatures]
ENCRYPT[Encryption at Rest]
CHECKSUM[SHA-256 Integrity]
end
subgraph "Layer 5: Audit & Monitoring"
AUDIT[Audit Logging]
SIEM[SIEM Integration]
ALERT[Security Alerts]
end
10.2 Zero Trust Architecture¶
zero_trust_principles:
  never_trust_always_verify:
    - Verify every transaction
    - Version check on every connection
    - No implicit trust based on network location
    - Continuous validation of security posture
  least_privilege_access:
    - Minimal permissions by default
    - Time-bound credential elevation
    - Regular permission audits
  assume_breach:
    - Comprehensive logging
    - Anomaly detection
    - Tripwire system
    - Incident response readiness
  verify_explicitly:
    - Multi-factor authentication
    - Device compliance checks
    - Risk-based access controls
10.3 Cryptographic Requirements¶
| Component | Requirement | Algorithm | Key Size |
|---|---|---|---|
| Transport | Encryption | TLS 1.3 | 2048-bit RSA / 256-bit ECC |
| Message Signing | Non-repudiation | ES256 (ECDSA) | 256-bit |
| Checksum | Integrity | SHA-256 | 256-bit |
| Storage | Encryption at rest | AES-256-GCM | 256-bit |
| Key Derivation | Key generation | PBKDF2 | 100,000 iterations |
Note: ES256 is standardized throughout ACGP. All implementations MUST use ES256 for message signing.
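The checksum and key-derivation rows of the table can be exercised with the Python standard library alone. The sketch below is illustrative: the function names are hypothetical, the choice of HMAC-SHA-256 as the PBKDF2 pseudorandom function and the 16-byte salt are assumptions (the table gives only the iteration count), and ES256 signing is omitted because it needs a third-party library.

```python
import hashlib
import hmac
import os


def sha256_checksum(payload: bytes) -> str:
    """Hex-encoded SHA-256 digest used as an integrity checksum."""
    return hashlib.sha256(payload).hexdigest()


def verify_checksum(payload: bytes, expected_hex: str) -> bool:
    """Constant-time comparison of the recomputed and expected checksums."""
    return hmac.compare_digest(sha256_checksum(payload), expected_hex)


def derive_storage_key(passphrase: bytes, salt: bytes,
                       iterations: int = 100_000) -> bytes:
    """Derive a 256-bit key (e.g. for AES-256-GCM) with PBKDF2.

    Assumption: HMAC-SHA-256 as the PRF; the table does not name one.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)


message = b'{"type": "EVAL_REQUEST"}'
checksum = sha256_checksum(message)
salt = os.urandom(16)                      # assumed salt length
key = derive_storage_key(b"example-passphrase", salt)
print(len(checksum), len(key), verify_checksum(message, checksum))
```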
11. Conformance Requirements¶
A conformant ACGP architecture implementation MUST:
11.1 Component Requirements¶
- Implement all core components defined in Section 2
- Support at least one deployment topology from Section 5
- Meet the latency requirements in Sections 9.2 and 9.3 for the target ACL tier
- Implement version negotiation as first step in protocol flow
11.2 Integration Requirements¶
- Provide wrapper or middleware for at least one agent framework
- Support the standard message formats defined in ACGP-1003
- Implement the security requirements in Section 10
- Use ES256 for all message signatures (ACL-3+)
11.3 Operational Requirements¶
- Maintain an append-only audit trail in ReflectionDB
- Support graceful degradation as defined in Section 8
- Provide monitoring and alerting capabilities
- Implement retry policy with exponential backoff
11.4 Scaling Requirements¶
- Support horizontal scaling of stateless components
- Implement caching strategy for performance optimization
- Handle at least the throughput specified for the target ACL tier
- Support tripwires for all external dependencies
11.5 Resilience Requirements¶
- Implement retry policy (3 attempts, exponential backoff)
- Support timeout handling (500ms default)
- Maintain tripwires for failure isolation
- Buffer writes locally during ReflectionDB unavailability (max 1000 events)
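The local buffering requirement could be met with a bounded in-memory queue along these lines. This is a minimal sketch: `LocalWriteBuffer` and its methods are hypothetical names, and refusing new events once the 1000-event limit is reached (rather than dropping the oldest) is an assumed overflow policy that this specification does not prescribe.

```python
from collections import deque


class LocalWriteBuffer:
    """Holds audit events while ReflectionDB is unavailable."""

    def __init__(self, max_events: int = 1000):
        self.max_events = max_events
        self._events = deque()

    def append(self, event) -> bool:
        """Buffer an event; return False when the buffer is full."""
        if len(self._events) >= self.max_events:
            return False  # assumed policy: reject instead of dropping old events
        self._events.append(event)
        return True

    def drain(self, write) -> int:
        """Flush events in order; an event is removed only after write succeeds."""
        flushed = 0
        while self._events:
            write(self._events[0])   # may raise; the event then stays buffered
            self._events.popleft()
            flushed += 1
        return flushed

    def __len__(self) -> int:
        return len(self._events)


buffer = LocalWriteBuffer(max_events=3)
accepted = [buffer.append({"seq": i}) for i in range(4)]
written = []
flushed = buffer.drain(written.append)
print(accepted, flushed, len(buffer))
```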
12. References¶
Normative References¶
- ACGP-1000: Core Protocol Specification
- ACGP-1001: Terminology and Definitions
- ACGP-1003: Message Formats & Wire Protocol
- ACGP-1004: Reflection Blueprint Specification
- ACGP-1005: ARS-CTQ-ACL Integration Framework
- ACGP-1007: Security Considerations
- RFC 2119: Key words for use in RFCs
Informative References¶
- NIST Cybersecurity Framework: Security architecture guidance
- ISO 27001: Information security management
- The Twelve-Factor App: Scalability principles
- Google SRE Book: Reliability engineering practices
- Circuit Breaker Pattern: Martin Fowler's design patterns
End of ACGP-1002