ACGP-1010: Governance Contracts (OPTIONAL Extension)
Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1010
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)
Navigation
Prerequisites: ACGP-1000 Core, ACGP-1003 Messages, ACGP-1004 Blueprints
Related: ACGP-1002 Architecture, ACGP-1005 ARS Framework, ACGP-1009 Conformance
Complexity: 4-star (Expert)
Reading time: ~45 minutes
Audience: Advanced implementers, architects
TL;DR (30 Seconds)
Governance contracts allow agents and stewards to explicitly negotiate:
- Blueprint-driven governance: Apply Reflection Blueprints under explicit runtime constraints
- Risk levels (3): low_risk, elevated_risk, critical_risk
- Evaluation tiers (4): Eval-0 through Eval-3 (from must-pass checks to human approval)
- Performance budgets: Per-request latency with explicit fallback behaviors
- Fallback strategies (4): deny, allow_and_log, cached_decision, escalate
Key insight: Make implicit governance decisions explicit and negotiable.
Conformance Note:
- Minimal conformance: Governance contracts are OPTIONAL
- Standard/Complete conformance: Governance contracts are REQUIRED (see ACGP-1009)
Blueprint-First Mental Model (Read This First)
Governance in ACGP is blueprint-driven.
- Reflection Blueprint (ACGP-1004): the policy (YAML/JSON) that defines what governance evaluates:
- checks, CTQ metrics/thresholds, evidence requirements
- tripwires, trust debt, scoring rules, inheritance/scope
- Governance Contract (ACGP-1010): the runtime negotiation that defines how that blueprint is applied for this request:
- risk_level, requested eval_tier, latency_budget_ms, and timeout fallback behavior
Key takeaway: A blueprint defines what the steward must enforce; a governance contract defines how deep/fast, and what happens on timeout.
Wire note (ACGP-1003):
- governance_contract is an OPTIONAL extension on the request.
- blueprint_id is typically surfaced in the steward's evaluation output (EVAL payload). Blueprint selection is usually steward-side (scope/inheritance), not something the agent must always supply.
Table of Contents
- Blueprint-First Mental Model
- Introduction
- Terminology Note
- Risk Levels
- Evaluation Tiers
- Performance Budgets
- Conformance Requirements
- Examples
- References
1. Introduction
1.1 Scope
This specification defines a governance contract mechanism for ACGP-compliant agent-steward pairs to explicitly negotiate:
- Risk-based evaluation strategies: How deeply to evaluate based on action consequences
- Performance budgets: How long the agent will wait for governance decisions
- Fallback behaviors: What happens when governance times out or is unavailable
- Blueprint application constraints: How a Reflection Blueprint policy is applied under runtime latency/cost limits
Conformance Requirements (see ACGP-1009):
| Conformance Level | Governance Contracts |
|---|---|
| Minimal | OPTIONAL - may ignore entirely |
| Standard | REQUIRED - must implement risk levels, eval tiers 0-1, performance budgets |
| Complete | REQUIRED - must implement all features including eval tiers 2-3, HSM-based evaluation |
Governance contracts are designed for production systems that need explicit control over the cost/latency/quality trade-offs in governance evaluation. Minimal conformance implementations (for learning, development, batch jobs) may skip this specification entirely.
1.1.1 Terminology Note
This specification introduces "Evaluation Tiers" (Eval-0 through Eval-3). These are distinct from:
- ACL Tiers (ACL-0 through ACL-5): Defined in ACGP-1001, these represent agent capability and autonomy levels, determined during agent design
- Conformance Levels (Minimal/Standard/Complete): Defined in ACGP-1009, these represent implementation completeness, determined at deployment time
Always use qualified terms to avoid ambiguity:
- [YES] "ACL Tier 2 agent" or "ACL-2 agent"
- [YES] "Eval Tier 1 checks" or "Eval-1 checks"
- [YES] "Standard Conformance implementation"
- [NO] "Tier 2 agent" (ambiguous - which tier?)
See ACGP-1001 Namespace Conventions for complete terminology guidance.
1.2 Design Goals
- Explicit over implicit: Make agent expectations and steward capabilities visible in the protocol
- Negotiable: Agent and steward can disagree and find common ground through capability exchange
- Fail-safe: When in doubt, be conservative (deny by default, escalate when uncertain)
- Cost-aware: Help implementers understand and optimize trade-offs between cost, latency, and quality
1.3 Non-Goals
This specification does NOT:
- Replace existing ACL Tiers (they coexist and complement each other)
- Mandate specific evaluation strategies (rule-based, LLM-based, hybrid all valid)
- Prescribe universal latency targets (these are guidance, not requirements)
- Create vendor lock-in (implementations maintain full flexibility)
- Require all fields (everything is optional with sensible defaults)
1.4 Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119.
2. Risk Levels
2.1 Classification [NORMATIVE]
Implementations using governance contracts SHOULD classify actions into three risk levels:
- low_risk: Optional marker (default assumption if not specified)
- elevated_risk: Needs attention; has consequences, but not catastrophic ones
- critical_risk: Requires maximum care; difficult or impossible to reverse
Rationale: Three levels map cleanly to mental models without confusion between "medium" and "high". Most actions are low_risk (implicit), some need extra attention (elevated_risk), and a few are critical.
2.2 Classification Guidance (Non-Normative)
2.2.1 Low Risk (Implicit Default)
Characteristics:
- Easily reversible or no lasting side effects
- Read-only operations
- Limited blast radius
Examples:
- Reading database records
- UI updates
- Logging and telemetry
- Idempotent queries
- Information retrieval
- Draft generation
Typical evaluation: Eval Tier 0 + 1 (synchronous, <100ms)
2.2.2 Elevated Risk
Characteristics:
- Has consequences but not catastrophic
- State mutations that can be corrected
- Moderate blast radius
Examples:
- State mutations (database writes)
- User notifications (emails, alerts)
- Configuration changes (non-security)
- Non-financial transactions
- Privilege views (not grants)
- Content publishing
Typical evaluation: Eval Tier 0 + 1 (synchronous, <300ms), possibly Tier 2 (async audit)
2.2.3 Critical Risk
Characteristics:
- Difficult or impossible to reverse
- High-value target for malicious actors
- Large blast radius
Examples:
- Financial transactions >$threshold
- Privilege grants (role assignment)
- Data deletion (especially bulk)
- Security policy changes
- External system integration
- Production database modifications
- Irreversible operations
Typical evaluation: Eval Tier 0 + 1 + 2 (synchronous, <5000ms), possibly Tier 3 (human approval)
2.3 Implementation Flexibility [NORMATIVE]
Implementations claiming governance contract support:
MAY:
- Use only critical_risk markers (binary: critical or not)
- Use finer gradations internally while exposing these three externally
- Implement dynamic risk scoring based on context (e.g., transaction amount)
- Use ACL Tier as risk proxy (e.g., ACL-4/5 agents often perform critical_risk actions)
MUST NOT:
- Assume all unmarked requests are low_risk (could be legacy clients without governance contract support)
- Ignore risk markers without capability negotiation during handshake
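The dynamic risk scoring permitted above could be sketched as follows. This is a minimal illustration, not part of the specification: the action names, the `critical_threshold_usd` default, and the choice to treat unknown actions as elevated_risk (rather than low_risk) are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical action catalogues; a real deployment would define its own.
LOW_RISK_ACTIONS = {"read_record", "generate_email_draft", "log_event"}
CRITICAL_ACTIONS = {"grant_role", "bulk_delete", "change_security_policy"}


def classify_risk(action: str,
                  amount_usd: Optional[float] = None,
                  critical_threshold_usd: float = 1000.0) -> str:
    """Map an action (plus optional context) to one of the three risk levels."""
    if action in CRITICAL_ACTIONS:
        return "critical_risk"
    # Context-dependent scoring: large monetary amounts are treated as critical.
    if amount_usd is not None and amount_usd > critical_threshold_usd:
        return "critical_risk"
    # Only explicitly known-safe actions without a monetary amount are low risk.
    if action in LOW_RISK_ACTIONS and amount_usd is None:
        return "low_risk"
    # Conservative default for anything unrecognized (assumption, fail-safe goal).
    return "elevated_risk"
```

Defaulting unknown actions to elevated_risk follows the fail-safe design goal in Section 1.2; implementations may reasonably choose a different default.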
2.4 Relationship to ACL Tiers [NORMATIVE]
ACL Tiers and Risk Levels are independent but complementary:
- ACL Tiers: Static classification of the agent's capability (determined at design/deployment)
- Risk Levels: Dynamic classification of the action's consequences (determined per-request)
Example mappings (guidance, not requirements):
| ACL Tier | Typical Actions | Common Risk Levels |
|---|---|---|
| ACL-0/1 | Read-only, suggestions | Mostly low_risk |
| ACL-2/3 | Standard operations | Mix of low_risk and elevated_risk |
| ACL-4/5 | High-privilege operations | Many critical_risk actions |
Critical insight: An ACL-2 agent CAN perform critical_risk actions (e.g., delete user data if authorized). ACL Tier constrains capability, not risk level.
3. Evaluation Tiers
3.1 Overview [NORMATIVE]
Evaluation Tiers (Eval-0 through Eval-3) describe implementation strategies for governance evaluation, not conformance requirements. They help agents and stewards communicate capabilities and performance characteristics.
Key principle: These are communication protocol labels, not mandatory implementation stages. A steward MAY support Eval Tier 2 without supporting Eval Tier 1.
3.2 Tier Definitions
3.2.1 Eval Tier 0: Must-Pass Synchronous Checks [NORMATIVE]
What it is:
- Schema validation
- Authentication and authorization
- Rate limiting (simple counters)
- Critical tripwires (e.g., "never delete production database")
- Cryptographic signature verification
Characteristics (REQUIRED for implementations claiming Eval Tier 0 support):
- MUST complete in <100ms (P99 latency)
- MUST NOT depend on external services (LLMs, remote databases, network calls)
- MUST NOT fail open (if check fails, MUST deny the action)
- MUST use in-memory rules, local configuration, or local crypto operations only
Why this matters: These are your "last line of defense" checks that MUST work even when:
- External services are down
- Network is unavailable
- Database is overloaded
- System is in degraded mode
Example implementation:
class Tier0Evaluator:
    def evaluate(self, request):
        # All checks MUST be fast and local
        if not self.validate_schema(request):  # <5ms
            return Decision.BLOCK
        if not self.verify_signature(request):  # <10ms
            return Decision.BLOCK
        if not self.check_rate_limit(request):  # <5ms (in-memory)
            return Decision.BLOCK
        if self.trigger_critical_tripwire(request):  # <10ms
            return Decision.BLOCK
        return Decision.PASS  # ~30ms total
3.2.2 Eval Tier 1: Fast Synchronous Policy Checks
What it is:
- Rule-based policy evaluation
- Local database lookups (Redis, SQLite)
- Cached policy decisions
- Pattern matching against known-bad patterns
- Stateful tripwire checks (e.g., "max 3 refunds per day")
Characteristics (targets, not requirements):
- Target: <300ms P99 latency
- MAY depend on local databases (Redis, SQLite, local cache)
- MUST NOT depend on external LLMs or slow external services
- Failure mode: Configurable (fail-open OR fail-closed, implementer's choice)
Why this matters: This is where most production governance happens. Fast enough for interactive use, rich enough for real policies.
Implementation note: If your Eval Tier 1 regularly exceeds 500ms, you're probably trying to do Eval Tier 2 work. Split it.
Example implementation:
class Tier1Evaluator:
    def evaluate(self, request):
        # Fast policy checks with local state
        blueprint_id = self.select_blueprint_id(request)  # steward-side selection (scope/inheritance)
        policy = self.load_cached_policy(blueprint_id)  # <20ms
        if not policy.allows(request.action):  # <50ms
            return Decision.BLOCK
        if self.check_daily_limit_exceeded(request):  # <100ms (DB lookup)
            return Decision.BLOCK
        if self.matches_known_bad_pattern(request):  # <50ms
            return Decision.FLAG
        return Decision.OK  # ~220ms typical
3.2.3 Eval Tier 2: Deep Asynchronous Analysis
What it is:
- LLM-based reasoning analysis
- Cross-request anomaly detection
- Historical pattern analysis
- Complex CTQ (Cognitive Trace Quality) calculations
- Multi-agent collaboration checks
- Behavioral modeling
Characteristics:
- Latency: Seconds to minutes (no specific target)
- MAY depend on external LLMs, data warehouses, ML models
- Typically runs asynchronously (post-hoc audit trail)
- MAY run synchronously for critical_risk actions if budget allows
Why this matters: This is where you catch sophisticated attacks and subtle policy violations that rules miss. LLMs can detect:
- Manipulation attempts
- Context-inappropriate actions
- Novel threat patterns
- Social engineering
Implementation warning: Do NOT make interactive flows wait for Eval Tier 2 unless you've done the math on cost and latency. For most use cases, run Tier 2 async for audit.
Example implementation:
class Tier2Evaluator:
    async def evaluate(self, request):
        # Deep analysis with LLM
        ctq_score = await self.calculate_ctq_with_llm(request)  # 2-5 seconds
        historical_pattern = await self.analyze_history(
            request.agent_id,
            lookback_days=30
        )  # 1-3 seconds
        anomaly_score = await self.detect_anomalies(
            request,
            historical_pattern
        )  # 0.5-1 second
        if ctq_score < 0.3 or anomaly_score > 0.8:
            return Decision.BLOCK
        elif ctq_score < 0.5 or anomaly_score > 0.5:
            return Decision.FLAG
        return Decision.OK  # 3-10 seconds typical
3.2.4 Eval Tier 3: Human-in-the-Loop
What it is:
- Human approval required
- Expert review for edge cases
- Compliance officer sign-off
- Multi-party authorization
- Audit committee review
Characteristics:
- Latency: Seconds to hours (human time)
- Requires: UX for approval workflow
- Requires: State management for pending requests
- Failure mode: Usually deny if human doesn't respond within timeout
Why this matters: Some decisions are too important or too novel for any automated system. Examples:
- First-time high-value transactions
- Policy exceptions
- Novel scenarios not covered by rules
- Regulatory compliance edge cases
Implementation note: Eval Tier 3 is often overkill. Consider if Eval Tier 2 + alerting is sufficient. Only use Tier 3 when human judgment is truly required.
Example implementation:
from datetime import timedelta

class Tier3Evaluator:
    async def evaluate(self, request):
        # Queue for human approval
        approval_request = self.create_approval_request(
            request,
            required_approvers=["compliance_officer"],
            timeout_hours=24
        )
        # Notify humans
        await self.notify_approvers(approval_request)
        # Wait for decision (with timeout)
        decision = await self.wait_for_approval(
            approval_request,
            timeout=timedelta(hours=24)
        )
        if decision.approved:
            return Decision.OK
        elif decision.timeout:
            return Decision.BLOCK  # Fail closed on timeout
        else:
            return Decision.BLOCK  # Explicitly rejected by approver
3.3 Tier Selection Strategy (Non-Normative)
Guidance for selecting tiers based on risk level and context:
| Risk Level | Interactive Use | Batch/Background |
|---|---|---|
| low_risk | Eval-0 + Eval-1 (sync) | Eval-0 + Eval-1 (sync) + Eval-2 (async audit) |
| elevated_risk | Eval-0 + Eval-1 (sync) + Eval-2 (async audit) | Eval-0 + Eval-1 + Eval-2 (all sync) |
| critical_risk | Eval-0 + Eval-1 + Eval-2 (all sync), maybe Eval-3 | Eval-0 + Eval-1 + Eval-2 (sync) + Eval-3 (required) |
Rationale:
- Interactive actions need low latency: Tier 0+1 synchronous, Tier 2 async
- Background actions tolerate higher latency: all tiers synchronous if needed
- Critical actions justify the cost: all tiers, including human approval
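The tier selection guidance above can be expressed as a lookup table. The sketch below transcribes the table into Python; the `TIER_PLANS` structure, the key names (`sync`, `async_audit`, `human_approval`), and the `select_tiers` function are illustrative assumptions, not protocol fields.

```python
from typing import Any, Dict, Tuple

# (risk_level, interactive) -> evaluation plan, transcribed from the guidance table.
TIER_PLANS: Dict[Tuple[str, bool], Dict[str, Any]] = {
    ("low_risk", True): {
        "sync": ["tier_0", "tier_1"], "async_audit": [], "human_approval": "none"},
    ("low_risk", False): {
        "sync": ["tier_0", "tier_1"], "async_audit": ["tier_2"], "human_approval": "none"},
    ("elevated_risk", True): {
        "sync": ["tier_0", "tier_1"], "async_audit": ["tier_2"], "human_approval": "none"},
    ("elevated_risk", False): {
        "sync": ["tier_0", "tier_1", "tier_2"], "async_audit": [], "human_approval": "none"},
    ("critical_risk", True): {
        "sync": ["tier_0", "tier_1", "tier_2"], "async_audit": [], "human_approval": "optional"},
    ("critical_risk", False): {
        "sync": ["tier_0", "tier_1", "tier_2"], "async_audit": [], "human_approval": "required"},
}


def select_tiers(risk_level: str, interactive: bool) -> Dict[str, Any]:
    """Return the suggested evaluation plan for a risk level and usage mode."""
    try:
        return TIER_PLANS[(risk_level, interactive)]
    except KeyError:
        raise ValueError(f"unknown risk level: {risk_level!r}") from None
```

Because this guidance is non-normative, a steward would typically treat such a table as a starting configuration and adjust it per blueprint.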
3.4 What Tiers Are NOT [NORMATIVE]
To avoid confusion, explicitly note what Evaluation Tiers are NOT:
Evaluation Tiers are NOT:
- Security levels (that's ACL Tiers - see ACGP-1005)
- Conformance levels (that's Minimal/Standard/Complete - see ACGP-1009)
- Required implementation stages (you can implement Tier 2 without Tier 1)
- Sequential gates (passing Tier 0 doesn't mean Tier 1 runs next - it's configurable)
Evaluation Tiers ARE:
- Communication protocol labels about capabilities and performance
- Performance/cost/quality trade-off indicators
- Architectural guidance for implementers
- Negotiable between agent and steward
3.5 Relationship to Tripwires
Tripwires (defined in ACGP-1002 Architecture) are hard limits that trigger immediate interventions. Tripwires are a common implementation of Eval Tier 0 and Tier 1:
- Eval Tier 0 tripwires: Fast, local, critical safety limits
  - Example: "Never delete production database"
  - Must complete in <10ms
  - No external dependencies
- Eval Tier 1 tripwires: Stateful limits requiring local DB lookup
  - Example: "Max 3 refunds per day"
  - May require Redis/SQLite lookup (<100ms)
  - Local state only
Key relationship: Tripwires define WHAT to check, Evaluation Tiers define HOW and WHEN to check.
See ACGP-1002 .3 Tripwires and Evaluation Tiers for implementation guidance on classifying tripwires by tier.
4. Performance Budgets
4.1 What Is a Performance Budget? [NORMATIVE]
A performance budget is a per-request contract that specifies:
"I (the agent) am willing to wait up to N milliseconds for a governance decision. If you (the steward) can't decide by then, I'll execute fallback behavior X."
This is NOT an SLA. This is a timeout with explicit fallback behavior.
Critical distinction: The budget is the agent's tolerance for waiting, not the steward's promise of latency. The steward tries to meet the budget but may timeout.
4.2 Budget Components [NORMATIVE]
{
"governance_contract": {
"risk_level": "elevated_risk",
"eval_tier": 1,
"performance_budget": {
"latency_budget_ms": 300,
"fallback_behavior": "deny",
"tier_budgets": {
"tier_0": 50,
"tier_1": 250
}
}
}
}
Fields:
- risk_level (string, REQUIRED): Risk classification (see Section 2)
- eval_tier (integer, OPTIONAL): Requested evaluation tier 0-3 (default: 0)
- performance_budget (object, REQUIRED): Performance constraints
  - latency_budget_ms (integer, REQUIRED): Total milliseconds agent will wait
  - fallback_behavior (string, REQUIRED): Behavior when budget exceeded (see Section 4.4)
  - tier_budgets (object, OPTIONAL): Per-tier sub-budgets in milliseconds (MUST sum to <= latency_budget_ms)
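The field rules above lend themselves to a simple structural check. The following is a minimal sketch that treats the fields marked REQUIRED as mandatory; the function name `validate_contract`, the error strings, and the requirement that `latency_budget_ms` be positive are assumptions for illustration.

```python
from typing import Any, Dict, List

VALID_RISK_LEVELS = {"low_risk", "elevated_risk", "critical_risk"}
VALID_FALLBACKS = {"deny", "allow_and_log", "cached_decision", "escalate"}


def validate_contract(contract: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a governance_contract object."""
    errors: List[str] = []
    if contract.get("risk_level") not in VALID_RISK_LEVELS:
        errors.append("risk_level must be low_risk, elevated_risk, or critical_risk")
    eval_tier = contract.get("eval_tier", 0)  # default 0 when absent
    if not isinstance(eval_tier, int) or not 0 <= eval_tier <= 3:
        errors.append("eval_tier must be an integer from 0 to 3")
    budget = contract.get("performance_budget")
    if not isinstance(budget, dict):
        errors.append("performance_budget is required")
        return errors
    total = budget.get("latency_budget_ms")
    if not isinstance(total, int) or total <= 0:  # positivity is an assumption
        errors.append("latency_budget_ms must be a positive integer")
        total = None
    if budget.get("fallback_behavior") not in VALID_FALLBACKS:
        errors.append("fallback_behavior must be one of the four defined behaviors")
    tier_budgets = budget.get("tier_budgets", {})
    if total is not None and sum(tier_budgets.values()) > total:
        errors.append("tier_budgets must sum to at most latency_budget_ms")
    return errors
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a single response.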
Defaults when fields absent:
- If governance_contract absent: No contract, use steward's default behavior
- If risk_level absent: Agent's configured default (typically low_risk)
- If eval_tier absent: 0 (must-pass checks only)
- If performance_budget absent: Steward's default budget (typically 500ms with deny fallback)
- If fallback_behavior absent within budget: Agent MUST provide default
- If tier_budgets absent: Proportional allocation based on tier characteristics
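One way to read "proportional allocation" is to split the total budget by fixed per-tier weights. The sketch below assumes hypothetical weights of 1:5:94 for tiers 0 to 2; this specification does not define the weights, and Eval Tier 3 is excluded because it runs on human time.

```python
from typing import Dict, Optional

# Hypothetical relative weights; not defined by this specification.
DEFAULT_WEIGHTS: Dict[str, int] = {"tier_0": 1, "tier_1": 5, "tier_2": 94}


def allocate_tier_budgets(total_ms: int, eval_tier: int,
                          weights: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Split total_ms across tiers 0..eval_tier in proportion to their weights."""
    weights = weights or DEFAULT_WEIGHTS
    tiers = [f"tier_{i}" for i in range(eval_tier + 1) if f"tier_{i}" in weights]
    total_weight = sum(weights[t] for t in tiers)
    budgets = {t: total_ms * weights[t] // total_weight for t in tiers}
    # Give any rounding remainder to the highest tier so the sum equals total_ms.
    budgets[tiers[-1]] += total_ms - sum(budgets.values())
    return budgets
```

With these assumed weights, a 300ms budget at eval_tier 1 yields the same 50ms/250ms split shown in the Section 4.2 example.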
4.3 Budget Semantics [NORMATIVE]
Steward obligations:
- MUST respond within latency_budget_ms OR
- MUST return GOVERNANCE_TIMEOUT status (see ACGP-1003 Messages)
Agent obligations:
- MUST implement the contract's fallback_behavior locally, so it can be applied when the budget expires
- MUST NOT wait longer than budget without executing fallback
- SHOULD track budget violations for monitoring and alerting
Critical: The agent ALWAYS has the final say on timeout behavior. The steward cannot force the agent to wait longer than the specified budget.
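The agent obligations above can be sketched with `asyncio.wait_for`, which enforces the budget on the agent side regardless of what the steward does. The function name `decide_within_budget`, the `fallback` callable, and the `violations` list used for monitoring are assumptions made for this example.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional


async def decide_within_budget(
    evaluate: Callable[[Any], Awaitable[Any]],
    request: Any,
    latency_budget_ms: int,
    fallback: Callable[[Any], Any],
    violations: Optional[List[Any]] = None,
) -> Any:
    """Wait for the steward at most latency_budget_ms, then apply the fallback."""
    try:
        return await asyncio.wait_for(
            evaluate(request), timeout=latency_budget_ms / 1000
        )
    except asyncio.TimeoutError:
        # Record the budget violation for monitoring and alerting.
        if violations is not None:
            violations.append(request)
        # The agent, not the steward, decides what happens on timeout.
        return fallback(request)
```

Here `evaluate` stands in for whatever coroutine sends the request to the steward; the timeout cancels that call so the agent never waits past its budget.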
4.4 Fallback Behaviors [NORMATIVE]
4.4.1 deny
Meaning: If governance doesn't respond in time, BLOCK the action.
Use for:
- critical_risk actions where safety > availability
- Financial transactions
- Privilege grants
- Data deletion
Risk: False positives (legitimate actions blocked by timeout)
Example:
if governance_timeout:
    log.error("Governance timeout on critical action, blocking")
    return ActionResult.BLOCKED
4.4.2 allow_and_log
Meaning: If governance doesn't respond in time, ALLOW the action but log heavily.
Use for:
- low_risk actions where availability > safety
- Read-only operations
- Non-critical UI updates
- Idempotent actions
Risk: False negatives (bad actions slip through during outage)
Example:
if governance_timeout:
    log.warning("Governance timeout on low-risk action, allowing with logging")
    audit.log_governance_bypass(request, reason="timeout")
    return ActionResult.ALLOWED
4.4.3 cached_decision
Meaning: Use last known good policy decision if available in cache, otherwise deny.
Use for:
- elevated_risk actions with stable policies
- High-throughput scenarios where occasional cache miss is acceptable
- Actions with predictable governance patterns
Risk: Stale policies if cache invalidation fails
Example:
if governance_timeout:
    cached = policy_cache.get(request.action_type)
    if cached and not cached.is_expired():
        log.info("Using cached policy decision due to governance timeout")
        return cached.decision
    else:
        log.warning("No valid cache, denying due to governance timeout")
        return ActionResult.BLOCKED
4.4.4 escalate
Meaning: Move to higher evaluation tier (e.g., Eval Tier 3 HITL) or different approval path.
Use for:
- Situations where automated governance is insufficient
- Novel scenarios not covered by existing rules
- Actions requiring human judgment
Risk: Escalation fatigue if overused, delays in action execution
Example:
if governance_timeout:
    log.warning("Governance timeout, escalating to human approval")
    return self.request_human_approval(
        request,
        reason="governance_timeout",
        required_approvers=["team_lead"]
    )
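The four fallback behaviors can be combined into a single dispatch function. This is a sketch under stated assumptions: the result strings ("BLOCKED", "ALLOWED", "ESCALATED") and the `apply_fallback` name are illustrative, and unrecognized behaviors are treated as deny to stay fail-safe.

```python
from typing import Optional


def apply_fallback(behavior: str, cached_decision: Optional[str] = None) -> str:
    """Resolve a timeout into an outcome according to the contract's fallback_behavior."""
    if behavior == "allow_and_log":
        # Caller is expected to write an audit record for the bypass.
        return "ALLOWED"
    if behavior == "cached_decision":
        # Use the last known good decision if one is available, otherwise deny.
        return cached_decision if cached_decision is not None else "BLOCKED"
    if behavior == "escalate":
        return "ESCALATED"
    # "deny" and any unrecognized behavior fail closed.
    return "BLOCKED"
```

Keeping the dispatch in one place makes it straightforward to test each fallback path in isolation.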
4.5 Budget Sizing Guidance (Non-Normative)
Interactive user-facing actions:
Background batch processing:
Eval Tier 0: 50ms
Eval Tier 1: 500ms
Eval Tier 2: 5000ms (if synchronous)
Total: 5550ms
Fallback: deny or escalate
Critical user-initiated actions (user expects delay for safety):
Eval Tier 0: 50ms
Eval Tier 1: 500ms
Eval Tier 2: 10000ms (user can wait 10s for safety)
Total: 10550ms
Fallback: deny
Use the Latency Calculator to model budget allocation for your specific use case.
4.6 Budget Timeout Handling [NORMATIVE]
What happens when budget is exceeded:
1. Agent sends request with 300ms budget
2. Steward starts evaluation
3. At 250ms, Steward detects budget will be exceeded
4. Steward has two options:
   A) Return partial decision (e.g., "Tier 0 passed, Tier 1 incomplete")
   B) Return GOVERNANCE_TIMEOUT status
5. Agent receives response (or timeout) and applies its fallback_behavior
6. Steward MAY continue async evaluation (Tier 2) for audit trail
Key insight: Budget timeout doesn't mean "stop evaluating". It means "stop blocking the agent". The steward MAY continue evaluation asynchronously for logging and learning purposes.
Example steward implementation:
import time

async def evaluate_with_budget(self, request, budget_ms):
    start = time.monotonic()

    def elapsed_ms():
        # Milliseconds since evaluation began (monotonic clock)
        return (time.monotonic() - start) * 1000

    # Tier 0 always runs
    tier_0_result = self.eval_tier_0(request)
    if tier_0_result.is_blocking():
        return tier_0_result  # Early exit on Tier 0 block
    elapsed = elapsed_ms()
    remaining = budget_ms - elapsed
    if remaining < 50:  # Not enough time for Tier 1
        return GovernanceResponse(
            status="PARTIAL_EVAL",
            completed_tiers=["tier_0"],
            budget_consumed_ms=elapsed
        )
    # Tier 1 with remaining budget (milliseconds)
    tier_1_result = await self.eval_tier_1_with_timeout(request, remaining)
    if tier_1_result.timeout:
        return GovernanceResponse(
            status="GOVERNANCE_TIMEOUT",
            completed_tiers=["tier_0"],
            budget_consumed_ms=budget_ms
        )
    # Success
    return GovernanceResponse(
        status="OK",
        decision=tier_1_result.decision,
        completed_tiers=["tier_0", "tier_1"],
        budget_consumed_ms=elapsed_ms()
    )
5. Conformance Requirements
5.1 Claiming Governance Contract Support [NORMATIVE]
Implementations claiming support for ACGP-1010 (Governance Contracts) MUST:
- Honor latency_budget_ms by responding within budget OR returning GOVERNANCE_TIMEOUT
- Implement at least Eval Tier 0 checks with <100ms P99 latency
- Document which Evaluation Tiers they support in capability negotiation
- Respect risk_level semantics in fallback behavior (critical_risk MUST NOT use allow_and_log)
- Return governance_status in responses (see ACGP-1003)
Implementations NOT claiming governance contract support:
- MUST ignore all governance_contract fields without error
- SHOULD continue to function normally
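The rule that critical_risk MUST NOT use allow_and_log is easy to enforce with a small guard. In the sketch below, the function names are hypothetical, and replacing a prohibited combination with deny is an assumption; this specification only states that the combination is not allowed.

```python
def fallback_permitted(risk_level: str, fallback_behavior: str) -> bool:
    """Return False for the one combination the specification prohibits."""
    return not (risk_level == "critical_risk" and fallback_behavior == "allow_and_log")


def effective_fallback(risk_level: str, fallback_behavior: str) -> str:
    """Return the fallback to apply, substituting deny for a prohibited choice."""
    if fallback_permitted(risk_level, fallback_behavior):
        return fallback_behavior
    # Assumption: fail closed rather than reject the whole request.
    return "deny"
```

An implementation could equally reject the request outright; either approach keeps critical actions from being allowed on timeout.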
5.2 Conformance Tests
See ACGP-1009 Conformance for the complete test suite. Summary:
Required tests for ACGP-1010 conformance:
1. test_tier_0_always_runs: Tier 0 checks run even in degraded mode
2. test_budget_timeout: Steward returns within budget or signals timeout
3. test_fallback_deny: Agent implements deny fallback correctly
4. test_fallback_allow: Agent implements allow_and_log fallback correctly
5. test_fallback_cached: Agent implements cached_decision fallback correctly
6. test_risk_semantics: critical_risk actions never use allow_and_log fallback
7. test_capability_negotiation: Agent and steward negotiate capabilities during handshake
8. test_degradation_conservative: Degraded mode is MORE conservative, not less
5.3 Latency Conformance [NORMATIVE]
Implementations claiming specific Evaluation Tier support MUST meet these targets:
| Tier | P99 Latency | External Dependencies | Fail Mode |
|---|---|---|---|
| Eval-0 | <100ms (REQUIRED) | None (REQUIRED) | Fail closed (REQUIRED) |
| Eval-1 | <300ms (target) | Local DB allowed | Configurable |
| Eval-2 | No target | LLMs allowed | Configurable |
| Eval-3 | Human time | Humans required | Usually fail closed |
Testing: Use ACGP-1009 latency conformance tests to validate.
6. Examples
6.1 Low Risk Action (Interactive)
Scenario: User asks agent to generate email draft
{
"type": "EVAL_REQUEST",
"protocol_version": "1.0.0",
"request_id": "req-123",
"agent_acl_tier": "ACL-2",
"trace": {
"action": "generate_email_draft",
"reasoning": "User requested help drafting customer reply"
},
"governance_contract": {
"risk_level": "low_risk",
"performance_budget": {
"latency_budget_ms": 100,
"fallback_behavior": "allow_and_log"
}
}
}
Steward response (within budget):
{
"type": "EVAL_RESPONSE",
"protocol_version": "1.0.0",
"request_id": "req-123",
"decision": "OK",
"governance_status": {
"status": "OK",
"completed_tiers": ["tier_0", "tier_1"],
"budget_consumed_ms": 45,
"steward_state": "normal"
}
}
6.2 Critical Risk Action (Financial Transaction)
Scenario: Agent attempts to refund customer $5,000
{
"type": "EVAL_REQUEST",
"protocol_version": "1.0.0",
"request_id": "req-456",
"agent_acl_tier": "ACL-4",
"trace": {
"action": "process_refund",
"reasoning": "Customer requested refund for defective product",
"amount_usd": 5000
},
"governance_contract": {
"risk_level": "critical_risk",
"eval_tier": 2,
"performance_budget": {
"latency_budget_ms": 5000,
"fallback_behavior": "deny",
"tier_budgets": {
"tier_0": 50,
"tier_1": 200,
"tier_2": 4750
}
}
}
}
Steward response (deep analysis completed):
{
"type": "EVAL_RESPONSE",
"protocol_version": "1.0.0",
"request_id": "req-456",
"decision": "OK",
"ctq_score": 0.87,
"governance_status": {
"status": "OK",
"completed_tiers": ["tier_0", "tier_1", "tier_2"],
"budget_consumed_ms": 3420,
"steward_state": "normal"
},
"explanation": "Tier 2 LLM analysis confirmed legitimate refund request. Customer history shows no fraud patterns."
}
6.3 Governance Timeout (Degraded System)
Scenario: Steward is overloaded, can't complete in budget
{
"type": "EVAL_REQUEST",
"protocol_version": "1.0.0",
"request_id": "req-789",
"agent_acl_tier": "ACL-2",
"trace": {
"action": "update_user_profile",
"reasoning": "User requested email address change"
},
"governance_contract": {
"risk_level": "elevated_risk",
"performance_budget": {
"latency_budget_ms": 300,
"fallback_behavior": "cached_decision"
}
}
}
Steward response (timeout):
{
"type": "EVAL_RESPONSE",
"protocol_version": "1.0.0",
"request_id": "req-789",
"decision": "GOVERNANCE_TIMEOUT",
"governance_status": {
"status": "GOVERNANCE_TIMEOUT",
"completed_tiers": ["tier_0"],
"budget_consumed_ms": 300,
"steward_state": "degraded"
},
"explanation": "System degraded, only Tier 0 completed. Agent should use fallback."
}
Agent behavior:
# Agent receives GOVERNANCE_TIMEOUT
if response.decision == "GOVERNANCE_TIMEOUT":
    # Check fallback behavior
    if request.governance_contract.performance_budget.fallback_behavior == "cached_decision":
        cached = self.policy_cache.get("update_user_profile")
        if cached and not cached.is_expired():
            log.info("Using cached decision: {}", cached.decision)
            return cached.decision
        else:
            log.warning("No valid cache, denying action")
            return Decision.DENY
6.4 Capability Negotiation (Handshake)
Agent announces capabilities:
{
"type": "SYNC_HELLO",
"protocol_version": "1.0.0",
"agent_id": "agent-abc-123",
"agent_acl_tier": "ACL-3",
"capabilities": {
"governance_contracts": {
"supported": true,
"risk_levels": ["low_risk", "elevated_risk", "critical_risk"],
"fallback_behaviors": ["deny", "allow_and_log", "cached_decision"],
"default_budget_ms": 300,
"default_fallback": "cached_decision"
}
}
}
Steward responds with its capabilities:
{
"type": "SYNC_HELLO_ACK",
"protocol_version": "1.0.0",
"steward_id": "steward-xyz-789",
"capabilities": {
"governance_contracts": {
"supported": true,
"evaluation_tiers": ["tier_0", "tier_1", "tier_2"],
"tier_0_latency_p99_ms": 45,
"tier_1_latency_p99_ms": 180,
"tier_2_available": true,
"tier_2_async_only": true,
"max_budget_ms": 10000,
"default_budget_ms": 500
}
}
}
Negotiation result: Both support governance contracts. Agent knows steward supports Tier 0-2, with Tier 2 async only.
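The negotiation result can be derived mechanically from the two handshake messages. The sketch below is one possible reading: the `negotiate` function, the keys of its return value, and the rule of clamping the agent's default budget to the steward's `max_budget_ms` are assumptions, not behavior defined by this specification.

```python
from typing import Any, Dict


def negotiate(agent_hello: Dict[str, Any], steward_ack: Dict[str, Any]) -> Dict[str, Any]:
    """Combine SYNC_HELLO and SYNC_HELLO_ACK capabilities into a shared view."""
    agent = agent_hello.get("capabilities", {}).get("governance_contracts", {})
    steward = steward_ack.get("capabilities", {}).get("governance_contracts", {})
    if not (agent.get("supported") and steward.get("supported")):
        return {"enabled": False}
    tiers = list(steward.get("evaluation_tiers", []))
    # Tier 2 cannot be requested synchronously if the steward only runs it async.
    sync_tiers = [
        t for t in tiers
        if not (t == "tier_2" and steward.get("tier_2_async_only"))
    ]
    budget = agent.get("default_budget_ms", steward.get("default_budget_ms"))
    max_budget = steward.get("max_budget_ms")
    if budget is not None and max_budget is not None:
        budget = min(budget, max_budget)  # assumed clamping rule
    return {
        "enabled": True,
        "tiers": tiers,
        "sync_tiers": sync_tiers,
        "default_budget_ms": budget,
    }
```

Applied to the two messages above, this yields contracts enabled, Tier 0-2 available with only Tier 0-1 synchronous, and a 300ms default budget.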
7. References
7.1 Normative References
- RFC 2119: Key words for use in RFCs to Indicate Requirement Levels
- ACGP-1000: Core Protocol
- ACGP-1001: Terminology
- ACGP-1003: Message Formats
- ACGP-1004: Reflection Blueprint Specification
- ACGP-1009: Conformance Testing
7.2 Informative References
- ACGP-1002: Architecture (for deployment patterns)
- ACGP-1005: ARS Framework (for ACL Tier mapping)
- Latency Calculator: Interactive budget planning tool
- Governance Contracts Concept: High-level explanation
Appendix A: Latency Budget Composition
How governance budgets fit into end-to-end (E2E) latency:
E2E Latency = Network(agent->steward)
+ Protocol Overhead
+ Governance Evaluation <- This is the budget
+ Network(steward->agent)
Example for elevated_risk interactive action:
| Component | Latency | Notes |
|---|---|---|
| Network (agent->steward) | 20ms | Typical local network |
| Protocol overhead | 30ms | Parsing, validation |
| Governance (Tier 0) | 50ms | Part of budget |
| Governance (Tier 1) | 200ms | Part of budget |
| Network (steward->agent) | 20ms | Return path |
| Total E2E | 320ms | |
| Governance budget | 250ms | Tier 0 + Tier 1 |
See ACGP-1002 Performance Requirements for complete latency model.
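The composition formula can be checked with a few lines of arithmetic. This sketch reproduces the elevated_risk example from the table; the function and parameter names are illustrative assumptions.

```python
from typing import Dict


def e2e_latency_ms(network_out_ms: int, protocol_overhead_ms: int,
                   tier_latencies_ms: Dict[str, int],
                   network_back_ms: int) -> Dict[str, int]:
    """Compute the governance budget and the end-to-end latency it sits inside."""
    governance_ms = sum(tier_latencies_ms.values())
    total_ms = network_out_ms + protocol_overhead_ms + governance_ms + network_back_ms
    return {"governance_budget_ms": governance_ms, "e2e_ms": total_ms}


# Values from the elevated_risk interactive example.
example = e2e_latency_ms(20, 30, {"tier_0": 50, "tier_1": 200}, 20)
```

For the example values, the governance budget is 250ms and the end-to-end latency is 320ms, so network and protocol overhead account for 70ms that the budget does not cover.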
Appendix B: Architecture Pattern Selection
Quick reference for choosing implementation pattern:
| Factor | Rule-Only | Hybrid | Max Quality |
|---|---|---|---|
| Throughput | 1000+ TPS | 100-500 TPS | 10-50 TPS |
| Latency P99 | 50ms | 150ms | 3000ms |
| Monthly cost (100 TPS) | $500 | $2000 | $20000 |
| Eval Tiers supported | 0, 1 | 0, 1, 2 (async) | 0, 1, 2, 3 |
| False positive rate | High | Medium | Low |
| Novel threat detection | Poor | Good | Excellent |
Rule-Only Pattern: Use Eval Tier 0 + 1 only, no LLMs
Hybrid Pattern: Use Eval Tier 0 + 1 sync, Tier 2 async for audit
Max Quality Pattern: Use all tiers including synchronous Tier 2 and optional Tier 3
See ACGP-1002 8.6 Governance Contract Architecture Patterns for implementation details.
End of ACGP-1010
Version: 1.1.0