ACGP-1010: Governance Contracts (OPTIONAL Extension)

Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1010
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)


You are here: ACGP-1010 (Governance Contracts)
Prerequisites: ACGP-1000 Core, ACGP-1003 Messages, ACGP-1004 Blueprints
Related: ACGP-1002 Architecture, ACGP-1005 ARS Framework, ACGP-1009 Conformance

Complexity: 4-star (Expert)
Reading time: ~45 minutes
Audience: Advanced implementers, architects


TL;DR (30 Seconds)

Governance contracts allow agents and stewards to explicitly negotiate:

  • Blueprint-driven governance: Apply Reflection Blueprints under explicit runtime constraints
  • Risk levels (3): low_risk, elevated_risk, critical_risk
  • Evaluation tiers (4): Eval-0 through Eval-3 (must-pass checks to human approval)
  • Performance budgets: Per-request latency with explicit fallback behaviors
  • Fallback strategies (4): deny, allow_and_log, cached_decision, escalate

Key insight: Make implicit governance decisions explicit and negotiable.

Conformance Note:

  • Minimal conformance: Governance contracts are OPTIONAL
  • Standard/Complete conformance: Governance contracts are REQUIRED (see ACGP-1009)


Blueprint-First Mental Model (Read This First)

Governance in ACGP is blueprint-driven.

  • Reflection Blueprint (ACGP-1004): the policy (YAML/JSON) that defines what governance evaluates:
      • checks, CTQ metrics/thresholds, evidence requirements
      • tripwires, trust debt, scoring rules, inheritance/scope
  • Governance Contract (ACGP-1010): the runtime negotiation that defines how that blueprint is applied for this request:
      • risk_level, requested eval_tier, latency_budget_ms, and timeout fallback behavior

Key takeaway: A blueprint defines what the steward must enforce; a governance contract defines how deep/fast, and what happens on timeout.

Wire note (ACGP-1003):

  • governance_contract is an OPTIONAL extension on the request.
  • blueprint_id is typically surfaced in the steward's evaluation output (EVAL payload). Blueprint selection is usually steward-side (scope/inheritance), not something the agent must always supply.


Table of Contents

  • Blueprint-First Mental Model
  1. Introduction (includes 1.1.1 Terminology Note)
  2. Risk Levels
  3. Evaluation Tiers
  4. Performance Budgets
  5. Conformance Requirements
  6. Examples
  7. References

1. Introduction

1.1 Scope

This specification defines a governance contract mechanism for ACGP-compliant agent-steward pairs to explicitly negotiate:

  • Risk-based evaluation strategies: How deeply to evaluate based on action consequences
  • Performance budgets: How long the agent will wait for governance decisions
  • Fallback behaviors: What happens when governance times out or is unavailable
  • Blueprint application constraints: How a Reflection Blueprint policy is applied under runtime latency/cost limits

Conformance Requirements (see ACGP-1009):

| Conformance Level | Governance Contracts |
|---|---|
| Minimal | OPTIONAL - may ignore entirely |
| Standard | REQUIRED - must implement risk levels, eval tiers 0-1, performance budgets |
| Complete | REQUIRED - must implement all features including eval tiers 2-3, HSM-based evaluation |

Governance contracts are designed for production systems that need explicit control over the cost/latency/quality trade-offs in governance evaluation. Minimal conformance implementations (for learning, development, batch jobs) may skip this specification entirely.

1.1.1 Terminology Note

This specification introduces "Evaluation Tiers" (Eval-0 through Eval-3). These are distinct from:

  • ACL Tiers (ACL-0 through ACL-5): Defined in ACGP-1001, these represent agent capability and autonomy levels, determined during agent design
  • Conformance Levels (Minimal/Standard/Complete): Defined in ACGP-1009, these represent implementation completeness, determined at deployment time

Always use qualified terms to avoid ambiguity:

  • [YES] "ACL Tier 2 agent" or "ACL-2 agent"
  • [YES] "Eval Tier 1 checks" or "Eval-1 checks"
  • [YES] "Standard Conformance implementation"
  • [NO] "Tier 2 agent" (ambiguous - which tier?)

See ACGP-1001 Namespace Conventions for complete terminology guidance.

1.2 Design Goals

  1. Explicit over implicit: Make agent expectations and steward capabilities visible in the protocol
  2. Negotiable: Agent and steward can disagree and find common ground through capability exchange
  3. Fail-safe: When in doubt, be conservative (deny by default, escalate when uncertain)
  4. Cost-aware: Help implementers understand and optimize trade-offs between cost, latency, and quality

1.3 Non-Goals

This specification does NOT:

  • Replace existing ACL Tiers (they coexist and complement each other)
  • Mandate specific evaluation strategies (rule-based, LLM-based, hybrid all valid)
  • Prescribe universal latency targets (these are guidance, not requirements)
  • Create vendor lock-in (implementations maintain full flexibility)
  • Require all fields (everything is optional with sensible defaults)

1.4 Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119.


2. Risk Levels

2.1 Classification [NORMATIVE]

Implementations using governance contracts SHOULD classify actions into three risk levels:

  • low_risk: Optional marker (default assumption if not specified)
  • elevated_risk: Needs attention, has consequences but not catastrophic
  • critical_risk: Requires maximum care, difficult/impossible to reverse

Rationale: Three levels map cleanly to mental models without confusion between "medium" and "high". Most actions are low_risk (implicit), some need extra attention (elevated_risk), and a few are critical.

2.2 Classification Guidance (Non-Normative)

2.2.1 Low Risk (Implicit Default)

Characteristics:

  • Easily reversible or no lasting side effects
  • Read-only operations
  • Limited blast radius

Examples:

  • Reading database records
  • UI updates
  • Logging and telemetry
  • Idempotent queries
  • Information retrieval
  • Draft generation

Typical evaluation: Eval Tier 0 + 1 (synchronous, <100ms)


2.2.2 Elevated Risk

Characteristics:

  • Has consequences but not catastrophic
  • State mutations that can be corrected
  • Moderate blast radius

Examples:

  • State mutations (database writes)
  • User notifications (emails, alerts)
  • Configuration changes (non-security)
  • Non-financial transactions
  • Privilege views (not grants)
  • Content publishing

Typical evaluation: Eval Tier 0 + 1 (synchronous, <300ms), possibly Tier 2 (async audit)


2.2.3 Critical Risk

Characteristics:

  • Difficult or impossible to reverse
  • High-value target for malicious actors
  • Large blast radius

Examples:

  • Financial transactions >$threshold
  • Privilege grants (role assignment)
  • Data deletion (especially bulk)
  • Security policy changes
  • External system integration
  • Production database modifications
  • Irreversible operations

Typical evaluation: Eval Tier 0 + 1 + 2 (synchronous, <5000ms), possibly Tier 3 (human approval)


2.3 Implementation Flexibility [NORMATIVE]

Implementations claiming governance contract support:

MAY:

  • Use only critical_risk markers (binary: critical or not)
  • Use finer gradations internally while exposing these three externally
  • Implement dynamic risk scoring based on context (e.g., transaction amount)
  • Use ACL Tier as risk proxy (e.g., ACL-4/5 agents often perform critical_risk actions)

MUST NOT:

  • Assume all unmarked requests are low_risk (could be legacy clients without governance contract support)
  • Ignore risk markers without capability negotiation during handshake
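
As an illustration of the "dynamic risk scoring based on context" option, the following is a minimal sketch, not a prescribed algorithm. The action names, the `Action` type, and the `CRITICAL_AMOUNT_USD` threshold are hypothetical values chosen for the example; a real implementation would derive them from its own policy.

```python
from dataclasses import dataclass

# Hypothetical action names and threshold, chosen only for illustration.
READ_ONLY_ACTIONS = {"read_record", "generate_email_draft"}
IRREVERSIBLE_ACTIONS = {"delete_user_data", "grant_role"}
CRITICAL_AMOUNT_USD = 1000


@dataclass
class Action:
    name: str
    amount_usd: float = 0.0


def classify_risk(action: Action) -> str:
    """Return one of the three externally visible risk levels for one request."""
    if action.name in IRREVERSIBLE_ACTIONS or action.amount_usd > CRITICAL_AMOUNT_USD:
        return "critical_risk"
    if action.name in READ_ONLY_ACTIONS:
        return "low_risk"
    # Unknown or state-mutating actions are treated conservatively.
    return "elevated_risk"
```

Unrecognized actions fall through to elevated_risk rather than low_risk, which is one way to stay on the conservative side when no marker or rule applies.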

2.4 Relationship to ACL Tiers [NORMATIVE]

ACL Tiers and Risk Levels are independent but complementary:

  • ACL Tiers: Static classification of the agent's capability (determined at design/deployment)
  • Risk Levels: Dynamic classification of the action's consequences (determined per-request)

Example mappings (guidance, not requirements):

| ACL Tier | Typical Actions | Common Risk Levels |
|---|---|---|
| ACL-0/1 | Read-only, suggestions | Mostly low_risk |
| ACL-2/3 | Standard operations | Mix of low_risk and elevated_risk |
| ACL-4/5 | High-privilege operations | Many critical_risk actions |

Critical insight: An ACL-2 agent CAN perform critical_risk actions (e.g., delete user data if authorized). ACL Tier constrains capability, not risk level.


3. Evaluation Tiers

3.1 Overview [NORMATIVE]

Evaluation Tiers (Eval-0 through Eval-3) describe implementation strategies for governance evaluation, not conformance requirements. They help agents and stewards communicate capabilities and performance characteristics.

Key principle: These are communication protocol labels, not mandatory implementation stages. A steward MAY support Eval Tier 2 without supporting Eval Tier 1.

3.2 Tier Definitions

3.2.1 Eval Tier 0: Must-Pass Synchronous Checks [NORMATIVE]

What it is:

  • Schema validation
  • Authentication and authorization
  • Rate limiting (simple counters)
  • Critical tripwires (e.g., "never delete production database")
  • Cryptographic signature verification

Characteristics (REQUIRED for implementations claiming Eval Tier 0 support):

  • MUST complete in <100ms (P99 latency)
  • MUST NOT depend on external services (LLMs, remote databases, network calls)
  • MUST NOT fail open (if check fails, MUST deny the action)
  • MUST use in-memory rules, local configuration, or local crypto operations only

Why this matters: These are your "last line of defense" checks that MUST work even when:

  • External services are down
  • Network is unavailable
  • Database is overloaded
  • System is in degraded mode

Example implementation:

class Tier0Evaluator:
    def evaluate(self, request):
        # All checks MUST be fast and local
        if not self.validate_schema(request):  # <5ms
            return Decision.BLOCK

        if not self.verify_signature(request):  # <10ms
            return Decision.BLOCK

        if not self.check_rate_limit(request):  # <5ms (in-memory)
            return Decision.BLOCK

        if self.trigger_critical_tripwire(request):  # <10ms
            return Decision.BLOCK

        return Decision.OK  # ~30ms total


3.2.2 Eval Tier 1: Fast Synchronous Policy Checks

What it is:

  • Rule-based policy evaluation
  • Local database lookups (Redis, SQLite)
  • Cached policy decisions
  • Pattern matching against known-bad patterns
  • Stateful tripwire checks (e.g., "max 3 refunds per day")

Characteristics (targets, not requirements):

  • Target: <300ms P99 latency
  • MAY depend on local databases (Redis, SQLite, local cache)
  • MUST NOT depend on external LLMs or slow external services
  • Failure mode: Configurable (fail-open OR fail-closed, implementer's choice)

Why this matters: This is where most production governance happens. Fast enough for interactive use, rich enough for real policies.

Implementation note: If your Eval Tier 1 regularly exceeds 500ms, you're probably trying to do Eval Tier 2 work. Split it.

Example implementation:

class Tier1Evaluator:
    def evaluate(self, request):
        # Fast policy checks with local state
        blueprint_id = self.select_blueprint_id(request)  # steward-side selection (scope/inheritance)
        policy = self.load_cached_policy(blueprint_id)  # <20ms

        if not policy.allows(request.action):  # <50ms
            return Decision.BLOCK

        if self.check_daily_limit_exceeded(request):  # <100ms (DB lookup)
            return Decision.BLOCK

        if self.matches_known_bad_pattern(request):  # <50ms
            return Decision.FLAG

        return Decision.OK  # ~220ms typical


3.2.3 Eval Tier 2: Deep Asynchronous Analysis

What it is:

  • LLM-based reasoning analysis
  • Cross-request anomaly detection
  • Historical pattern analysis
  • Complex CTQ (Cognitive Trace Quality) calculations
  • Multi-agent collaboration checks
  • Behavioral modeling

Characteristics:

  • Latency: Seconds to minutes (no specific target)
  • MAY depend on external LLMs, data warehouses, ML models
  • Typically runs asynchronously (post-hoc audit trail)
  • MAY run synchronously for critical_risk actions if budget allows

Why this matters: This is where you catch sophisticated attacks and subtle policy violations that rules miss. LLMs can detect:

  • Manipulation attempts
  • Context-inappropriate actions
  • Novel threat patterns
  • Social engineering

Implementation warning: Do NOT make interactive flows wait for Eval Tier 2 unless you've done the math on cost and latency. For most use cases, run Tier 2 async for audit.

Example implementation:

class Tier2Evaluator:
    async def evaluate(self, request):
        # Deep analysis with LLM
        ctq_score = await self.calculate_ctq_with_llm(request)  # 2-5 seconds

        historical_pattern = await self.analyze_history(
            request.agent_id,
            lookback_days=30
        )  # 1-3 seconds

        anomaly_score = await self.detect_anomalies(
            request,
            historical_pattern
        )  # 0.5-1 second

        if ctq_score < 0.3 or anomaly_score > 0.8:
            return Decision.BLOCK
        elif ctq_score < 0.5 or anomaly_score > 0.5:
            return Decision.FLAG

        return Decision.OK  # 3-10 seconds typical


3.2.4 Eval Tier 3: Human-in-the-Loop

What it is:

  • Human approval required
  • Expert review for edge cases
  • Compliance officer sign-off
  • Multi-party authorization
  • Audit committee review

Characteristics:

  • Latency: Seconds to hours (human time)
  • Requires: UX for approval workflow
  • Requires: State management for pending requests
  • Failure mode: Usually deny if human doesn't respond within timeout

Why this matters: Some decisions are too important or too novel for any automated system. Examples:

  • First-time high-value transactions
  • Policy exceptions
  • Novel scenarios not covered by rules
  • Regulatory compliance edge cases

Implementation note: Eval Tier 3 is often overkill. Consider if Eval Tier 2 + alerting is sufficient. Only use Tier 3 when human judgment is truly required.

Example implementation:

from datetime import timedelta

class Tier3Evaluator:
    async def evaluate(self, request):
        # Queue for human approval
        approval_request = self.create_approval_request(
            request,
            required_approvers=["compliance_officer"],
            timeout_hours=24
        )

        # Notify humans
        await self.notify_approvers(approval_request)

        # Wait for decision (with timeout)
        decision = await self.wait_for_approval(
            approval_request,
            timeout=timedelta(hours=24)
        )

        if decision.approved:
            return Decision.OK
        elif decision.timeout:
            return Decision.BLOCK  # Fail closed on timeout
        else:
            return Decision.BLOCK


3.3 Tier Selection Strategy (Non-Normative)

Guidance for selecting tiers based on risk level and context:

| Risk Level | Interactive Use | Batch/Background |
|---|---|---|
| low_risk | Eval-0 + Eval-1 (sync) | Eval-0 + Eval-1 (sync) + Eval-2 (async audit) |
| elevated_risk | Eval-0 + Eval-1 (sync) + Eval-2 (async audit) | Eval-0 + Eval-1 + Eval-2 (all sync) |
| critical_risk | Eval-0 + Eval-1 + Eval-2 (all sync), maybe Eval-3 | Eval-0 + Eval-1 + Eval-2 (sync) + Eval-3 (required) |

Rationale:

  • Interactive actions need low latency: Tier 0+1 synchronous, Tier 2 async
  • Background actions tolerate higher latency: all tiers synchronous if needed
  • Critical actions justify the cost: all tiers, including human approval
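
The selection guidance in this section can be expressed as a small lookup. This is a non-normative sketch: the `TierPlan` type, the `select_tiers` function, and the "none"/"optional"/"required" labels for Eval-3 are names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPlan:
    sync: tuple            # tiers evaluated before the agent gets a decision
    async_audit: tuple     # tiers evaluated afterwards for the audit trail
    human_approval: str    # "none", "optional", or "required" (Eval-3)


# (risk_level, interactive) -> plan, mirroring the guidance in this section.
_PLANS = {
    ("low_risk", True): TierPlan((0, 1), (), "none"),
    ("low_risk", False): TierPlan((0, 1), (2,), "none"),
    ("elevated_risk", True): TierPlan((0, 1), (2,), "none"),
    ("elevated_risk", False): TierPlan((0, 1, 2), (), "none"),
    ("critical_risk", True): TierPlan((0, 1, 2), (), "optional"),
    ("critical_risk", False): TierPlan((0, 1, 2), (), "required"),
}


def select_tiers(risk_level: str, interactive: bool) -> TierPlan:
    """Look up the suggested evaluation plan for a request."""
    try:
        return _PLANS[(risk_level, interactive)]
    except KeyError:
        raise ValueError(f"unknown risk level: {risk_level!r}") from None
```

Keeping the mapping in a table rather than in branching logic makes it straightforward for a deployment to override individual cells, since the guidance is not a requirement.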

3.4 What Tiers Are NOT [NORMATIVE]

To avoid confusion, explicitly note what Evaluation Tiers are NOT:

Evaluation Tiers are NOT:

  • Security levels (that's ACL Tiers - see ACGP-1005)
  • Conformance levels (that's Minimal/Standard/Complete - see ACGP-1009)
  • Required implementation stages (you can implement Tier 2 without Tier 1)
  • Sequential gates (passing Tier 0 doesn't mean Tier 1 runs next - it's configurable)

Evaluation Tiers ARE:

  • Communication protocol labels about capabilities and performance
  • Performance/cost/quality trade-off indicators
  • Architectural guidance for implementers
  • Negotiable between agent and steward

3.5 Relationship to Tripwires

Tripwires (defined in ACGP-1002 Architecture) are hard limits that trigger immediate interventions. Tripwires are a common implementation of Eval Tier 0 and Tier 1:

  • Eval Tier 0 tripwires: Fast, local, critical safety limits
      • Example: "Never delete production database"
      • Must complete in <10ms
      • No external dependencies
  • Eval Tier 1 tripwires: Stateful limits requiring local DB lookup
      • Example: "Max 3 refunds per day"
      • May require Redis/SQLite lookup (<100ms)
      • Local state only

Key relationship: Tripwires define WHAT to check, Evaluation Tiers define HOW and WHEN to check.

See ACGP-1002 .3 Tripwires and Evaluation Tiers for implementation guidance on classifying tripwires by tier.


4. Performance Budgets

4.1 What Is a Performance Budget? [NORMATIVE]

A performance budget is a per-request contract that specifies:

"I (the agent) am willing to wait up to N milliseconds for a governance decision. If you (the steward) can't decide by then, I'll execute fallback behavior X."

This is NOT an SLA. This is a timeout with explicit fallback behavior.

Critical distinction: The budget is the agent's tolerance for waiting, not the steward's promise of latency. The steward tries to meet the budget but may timeout.

4.2 Budget Components [NORMATIVE]

{
  "governance_contract": {
    "risk_level": "elevated_risk",
    "eval_tier": 1,
    "performance_budget": {
      "latency_budget_ms": 300,
      "fallback_behavior": "deny",
      "tier_budgets": {
        "tier_0": 50,
        "tier_1": 250
      }
    }
  }
}

Fields:

  • risk_level (string, REQUIRED): Risk classification (see Section 2)
  • eval_tier (integer, OPTIONAL): Requested evaluation tier 0-3 (default: 0)
  • performance_budget (object, REQUIRED): Performance constraints
      • latency_budget_ms (integer, REQUIRED): Total milliseconds agent will wait
      • fallback_behavior (string, REQUIRED): Behavior when budget exceeded (see Section 4.4)
      • tier_budgets (object, OPTIONAL): Per-tier sub-budgets (MUST sum <= total budget)

Defaults when fields absent:

  • If governance_contract absent: No contract, use steward's default behavior
  • If risk_level absent: Agent's configured default (typically low_risk)
  • If eval_tier absent: 0 (must-pass checks only)
  • If performance_budget absent: Steward's default budget (typically 500ms with deny fallback)
  • If fallback_behavior absent within budget: Agent MUST provide default
  • If tier_budgets absent: Proportional allocation based on tier characteristics
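
A minimal sketch of how these defaults and the tier_budgets constraint might be applied to an incoming contract. The function name `normalize_contract` and its keyword parameters are assumptions made for this example; the proportional allocation of absent tier_budgets is left out because its weights are not specified here.

```python
import copy


def normalize_contract(contract, *, agent_default_risk="low_risk",
                       agent_default_fallback="deny",
                       steward_default_budget_ms=500):
    """Fill absent fields with defaults and check the tier_budgets constraint."""
    if contract is None:
        return None  # no contract: the steward's default behavior applies

    out = copy.deepcopy(contract)
    out.setdefault("risk_level", agent_default_risk)
    out.setdefault("eval_tier", 0)  # must-pass checks only

    # Absent budget: steward default (typically 500ms with deny fallback).
    budget = out.setdefault("performance_budget", {
        "latency_budget_ms": steward_default_budget_ms,
        "fallback_behavior": "deny",
    })
    if "latency_budget_ms" not in budget:
        raise ValueError("performance_budget requires latency_budget_ms")
    # Absent fallback inside a supplied budget: the agent's own default.
    budget.setdefault("fallback_behavior", agent_default_fallback)

    tier_budgets = budget.get("tier_budgets")
    if tier_budgets is not None and sum(tier_budgets.values()) > budget["latency_budget_ms"]:
        raise ValueError("tier_budgets exceed latency_budget_ms")
    return out
```

The input is deep-copied so that the original request payload is left untouched for logging or signature checks.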

4.3 Budget Semantics [NORMATIVE]

Steward obligations:

  • MUST either respond within latency_budget_ms or return a GOVERNANCE_TIMEOUT status (see ACGP-1003 Messages)

Agent obligations:

  • MUST implement the configured fallback_behavior locally
  • MUST NOT wait longer than budget without executing fallback
  • SHOULD track budget violations for monitoring and alerting

Critical: The agent ALWAYS has the final say on timeout behavior. The steward cannot force the agent to wait longer than the specified budget.
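
One way an agent could enforce its own budget is to wrap the steward call in a local timeout. The sketch below uses Python's `asyncio.wait_for`; the `call_steward` and `fallback` callables and the response dictionary shape are assumptions for illustration, not part of the wire format.

```python
import asyncio


async def evaluate_with_fallback(call_steward, request, budget_ms, fallback):
    """Wait at most budget_ms for the steward, then run the local fallback."""
    try:
        response = await asyncio.wait_for(
            call_steward(request), timeout=budget_ms / 1000
        )
    except asyncio.TimeoutError:
        # The steward did not answer in time: the agent decides locally.
        return fallback(request)
    if response.get("status") == "GOVERNANCE_TIMEOUT":
        # The steward signalled that it could not finish within the budget.
        return fallback(request)
    return response["decision"]
```

Both timeout paths (no answer at all, and an explicit GOVERNANCE_TIMEOUT) lead to the same locally implemented fallback, so the agent never depends on the steward to enforce the budget.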

4.4 Fallback Behaviors [NORMATIVE]

4.4.1 deny

Meaning: If governance doesn't respond in time, BLOCK the action.

Use for:

  • critical_risk actions where safety > availability
  • Financial transactions
  • Privilege grants
  • Data deletion

Risk: False positives (legitimate actions blocked by timeout)

Example:

if governance_timeout:
    log.error("Governance timeout on critical action, blocking")
    return ActionResult.BLOCKED


4.4.2 allow_and_log

Meaning: If governance doesn't respond in time, ALLOW the action but log heavily.

Use for:

  • low_risk actions where availability > safety
  • Read-only operations
  • Non-critical UI updates
  • Idempotent actions

Risk: False negatives (bad actions slip through during outage)

Example:

if governance_timeout:
    log.warning("Governance timeout on low-risk action, allowing with logging")
    audit.log_governance_bypass(request, reason="timeout")
    return ActionResult.ALLOWED


4.4.3 cached_decision

Meaning: Use last known good policy decision if available in cache, otherwise deny.

Use for:

  • elevated_risk actions with stable policies
  • High-throughput scenarios where occasional cache miss is acceptable
  • Actions with predictable governance patterns

Risk: Stale policies if cache invalidation fails

Example:

if governance_timeout:
    cached = policy_cache.get(request.action_type)
    if cached and not cached.is_expired():
        log.info("Using cached policy decision due to governance timeout")
        return cached.decision
    else:
        log.warning("No valid cache, denying due to governance timeout")
        return ActionResult.BLOCKED
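
The `policy_cache` object in the example above is not defined in this specification. A minimal in-memory sketch that would satisfy the `get(...)` and `is_expired()` calls could look like the following; the class names, the `put` method, and the 60-second default TTL are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CachedDecision:
    decision: str
    ttl_s: float
    stored_at: float = field(default_factory=time.monotonic)

    def is_expired(self, now=None):
        """True once ttl_s seconds have passed since the entry was stored."""
        now = time.monotonic() if now is None else now
        return now - self.stored_at >= self.ttl_s


class PolicyCache:
    """Last known decision per action type, kept in process memory."""

    def __init__(self):
        self._entries = {}

    def put(self, action_type, decision, ttl_s=60.0):
        self._entries[action_type] = CachedDecision(decision, ttl_s)

    def get(self, action_type):
        # Returns None on a cache miss; callers still check is_expired().
        return self._entries.get(action_type)
```

A short TTL limits the stale-policy risk noted above, at the price of more cache misses (and therefore more denials) during a prolonged steward outage.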


4.4.4 escalate

Meaning: Move to higher evaluation tier (e.g., Eval Tier 3 HITL) or different approval path.

Use for:

  • Situations where automated governance is insufficient
  • Novel scenarios not covered by existing rules
  • Actions requiring human judgment

Risk: Escalation fatigue if overused, delays in action execution

Example:

if governance_timeout:
    log.warning("Governance timeout, escalating to human approval")
    return self.request_human_approval(
        request,
        reason="governance_timeout",
        required_approvers=["team_lead"]
    )


4.5 Budget Sizing Guidance (Non-Normative)

Interactive user-facing actions:

Eval Tier 0: 50ms
Eval Tier 1: 200ms
Total: 250ms
Fallback: cached_decision or allow_and_log

Background batch processing:

Eval Tier 0: 50ms
Eval Tier 1: 500ms
Eval Tier 2: 5000ms (if synchronous)
Total: 5550ms
Fallback: deny or escalate

Critical user-initiated actions (user expects delay for safety):

Eval Tier 0: 50ms
Eval Tier 1: 500ms
Eval Tier 2: 10000ms (user can wait 10s for safety)
Total: 10550ms
Fallback: deny

Use the Latency Calculator to model budget allocation for your specific use case.

4.6 Budget Timeout Handling [NORMATIVE]

What happens when budget is exceeded:

1. Agent sends request with 300ms budget
2. Steward starts evaluation
3. At 250ms, Steward detects budget will be exceeded
4. Steward has two options:
   A) Return partial decision (e.g., "Tier 0 passed, Tier 1 incomplete")
   B) Return GOVERNANCE_TIMEOUT status
5. Agent receives response (or timeout) and applies its configured fallback_behavior
6. Steward MAY continue async evaluation (Tier 2) for audit trail

Key insight: Budget timeout doesn't mean "stop evaluating". It means "stop blocking the agent". The steward MAY continue evaluation asynchronously for logging and learning purposes.

Example steward implementation:

import time

async def evaluate_with_budget(self, request, budget_ms):
    start = time.monotonic()

    def elapsed_ms():
        # Milliseconds spent since evaluation started
        return (time.monotonic() - start) * 1000

    # Tier 0 always runs
    tier_0_result = self.eval_tier_0(request)
    if tier_0_result.is_blocking():
        return tier_0_result  # Early exit on Tier 0 block

    remaining = budget_ms - elapsed_ms()

    if remaining < 50:  # Not enough time for Tier 1
        return GovernanceResponse(
            status="PARTIAL_EVAL",
            completed_tiers=["tier_0"],
            budget_consumed_ms=elapsed_ms()
        )

    # Tier 1 with remaining budget
    tier_1_result = await self.eval_tier_1_with_timeout(request, remaining)

    if tier_1_result.timeout:
        return GovernanceResponse(
            status="GOVERNANCE_TIMEOUT",
            completed_tiers=["tier_0"],
            budget_consumed_ms=budget_ms
        )

    # Success
    return GovernanceResponse(
        status="OK",
        decision=tier_1_result.decision,
        completed_tiers=["tier_0", "tier_1"],
        budget_consumed_ms=elapsed_ms()
    )
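
The "stop blocking the agent, keep evaluating" behavior described in this section can be sketched with `asyncio.wait`, which returns after the timeout without cancelling the pending task. The function name `evaluate_or_timeout`, the list-based `audit_log`, and the response dictionaries are hypothetical and only serve the illustration.

```python
import asyncio


async def evaluate_or_timeout(evaluation, budget_ms, audit_log):
    """Answer within the budget, but let a slow evaluation finish for audit."""
    task = asyncio.ensure_future(evaluation)
    done, _pending = await asyncio.wait({task}, timeout=budget_ms / 1000)
    if done:
        return {"status": "OK", "decision": task.result()}

    def record(finished):
        # Runs later, once the evaluation completes in the background.
        if not finished.cancelled() and finished.exception() is None:
            audit_log.append(finished.result())

    # Budget exceeded: do not cancel the task, only stop waiting for it.
    task.add_done_callback(record)
    return {"status": "GOVERNANCE_TIMEOUT"}
```

Unlike `asyncio.wait_for`, which cancels the awaited work on timeout, `asyncio.wait` leaves the task running, which is the property needed when the late result should still reach the audit trail.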


5. Conformance Requirements

5.1 Claiming Governance Contract Support [NORMATIVE]

Implementations claiming support for ACGP-1010 (Governance Contracts) MUST:

  1. Honor latency_budget_ms by responding within budget OR returning GOVERNANCE_TIMEOUT
  2. Implement at least Eval Tier 0 checks with <100ms P99 latency
  3. Document which Evaluation Tiers they support in capability negotiation
  4. Respect risk_level semantics in fallback behavior (critical_risk MUST NOT use allow_and_log)
  5. Return governance_status in responses (see ACGP-1003)

Implementations NOT claiming governance contract support:

  • MUST ignore all governance_contract fields without error
  • SHOULD continue to function normally
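
Requirement 4 (critical_risk MUST NOT use allow_and_log) lends itself to a check at contract-validation time. The following is a sketch under the assumption that violations are rejected with an exception; the function name `check_fallback` is hypothetical.

```python
FALLBACK_BEHAVIORS = {"deny", "allow_and_log", "cached_decision", "escalate"}


def check_fallback(risk_level: str, fallback_behavior: str) -> None:
    """Raise ValueError if the fallback is unknown or too permissive."""
    if fallback_behavior not in FALLBACK_BEHAVIORS:
        raise ValueError(f"unknown fallback_behavior: {fallback_behavior!r}")
    if risk_level == "critical_risk" and fallback_behavior == "allow_and_log":
        raise ValueError("critical_risk MUST NOT use allow_and_log")
```

Running this check when the contract is built, rather than when a timeout occurs, surfaces misconfiguration before any critical action is attempted.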

5.2 Conformance Tests

See ACGP-1009 Conformance for the complete test suite. Summary:

Required tests for ACGP-1010 conformance:

  1. test_tier_0_always_runs: Tier 0 checks run even in degraded mode
  2. test_budget_timeout: Steward returns within budget or signals timeout
  3. test_fallback_deny: Agent implements deny fallback correctly
  4. test_fallback_allow: Agent implements allow_and_log fallback correctly
  5. test_fallback_cached: Agent implements cached_decision fallback correctly
  6. test_risk_semantics: critical_risk actions never use allow_and_log fallback
  7. test_capability_negotiation: Agent and steward negotiate capabilities during handshake
  8. test_degradation_conservative: Degraded mode is MORE conservative, not less

5.3 Latency Conformance [NORMATIVE]

Implementations claiming specific Evaluation Tier support MUST meet these targets:

| Tier | P99 Latency | External Dependencies | Fail Mode |
|---|---|---|---|
| Eval-0 | <100ms (REQUIRED) | None (REQUIRED) | Fail closed (REQUIRED) |
| Eval-1 | <300ms (target) | Local DB allowed | Configurable |
| Eval-2 | No target | LLMs allowed | Configurable |
| Eval-3 | Human time | Humans required | Usually fail closed |

Testing: Use ACGP-1009 latency conformance tests to validate.


6. Examples

6.1 Low Risk Action (Interactive)

Scenario: User asks agent to generate email draft

{
  "type": "EVAL_REQUEST",
  "protocol_version": "1.0.0",
  "request_id": "req-123",
  "agent_acl_tier": "ACL-2",
  "trace": {
    "action": "generate_email_draft",
    "reasoning": "User requested help drafting customer reply"
  },
  "governance_contract": {
    "risk_level": "low_risk",
    "performance_budget": {
      "latency_budget_ms": 100,
      "fallback_behavior": "allow_and_log"
    }
  }
}

Steward response (within budget):

{
  "type": "EVAL_RESPONSE",
  "protocol_version": "1.0.0",
  "request_id": "req-123",
  "decision": "OK",
  "governance_status": {
    "status": "OK",
    "completed_tiers": ["tier_0", "tier_1"],
    "budget_consumed_ms": 45,
    "steward_state": "normal"
  }
}

6.2 Critical Risk Action (Financial Transaction)

Scenario: Agent attempts to refund customer $5,000

{
  "type": "EVAL_REQUEST",
  "protocol_version": "1.0.0",
  "request_id": "req-456",
  "agent_acl_tier": "ACL-4",
  "trace": {
    "action": "process_refund",
    "reasoning": "Customer requested refund for defective product",
    "amount_usd": 5000
  },
  "governance_contract": {
    "risk_level": "critical_risk",
    "eval_tier": 2,
    "performance_budget": {
      "latency_budget_ms": 5000,
      "fallback_behavior": "deny",
      "tier_budgets": {
        "tier_0": 50,
        "tier_1": 200,
        "tier_2": 4750
      }
    }
  }
}

Steward response (deep analysis completed):

{
  "type": "EVAL_RESPONSE",
  "protocol_version": "1.0.0",
  "request_id": "req-456",
  "decision": "OK",
  "ctq_score": 0.87,
  "governance_status": {
    "status": "OK",
    "completed_tiers": ["tier_0", "tier_1", "tier_2"],
    "budget_consumed_ms": 3420,
    "steward_state": "normal"
  },
  "explanation": "Tier 2 LLM analysis confirmed legitimate refund request. Customer history shows no fraud patterns."
}

6.3 Governance Timeout (Degraded System)

Scenario: Steward is overloaded, can't complete in budget

{
  "type": "EVAL_REQUEST",
  "protocol_version": "1.0.0",
  "request_id": "req-789",
  "agent_acl_tier": "ACL-2",
  "trace": {
    "action": "update_user_profile",
    "reasoning": "User requested email address change"
  },
  "governance_contract": {
    "risk_level": "elevated_risk",
    "performance_budget": {
      "latency_budget_ms": 300,
      "fallback_behavior": "cached_decision"
    }
  }
}

Steward response (timeout):

{
  "type": "EVAL_RESPONSE",
  "protocol_version": "1.0.0",
  "request_id": "req-789",
  "decision": "GOVERNANCE_TIMEOUT",
  "governance_status": {
    "status": "GOVERNANCE_TIMEOUT",
    "completed_tiers": ["tier_0"],
    "budget_consumed_ms": 300,
    "steward_state": "degraded"
  },
  "explanation": "System degraded, only Tier 0 completed. Agent should use fallback."
}

Agent behavior:

# Agent receives GOVERNANCE_TIMEOUT
if response.decision == "GOVERNANCE_TIMEOUT":
    # Check fallback behavior
    if request.governance_contract.performance_budget.fallback_behavior == "cached_decision":
        cached = self.policy_cache.get("update_user_profile")
        if cached and not cached.is_expired():
            log.info("Using cached decision: {}", cached.decision)
            return cached.decision
        else:
            log.warning("No valid cache, denying action")
            return Decision.BLOCK

6.4 Capability Negotiation (Handshake)

Agent announces capabilities:

{
  "type": "SYNC_HELLO",
  "protocol_version": "1.0.0",
  "agent_id": "agent-abc-123",
  "agent_acl_tier": "ACL-3",
  "capabilities": {
    "governance_contracts": {
      "supported": true,
      "risk_levels": ["low_risk", "elevated_risk", "critical_risk"],
      "fallback_behaviors": ["deny", "allow_and_log", "cached_decision"],
      "default_budget_ms": 300,
      "default_fallback": "cached_decision"
    }
  }
}

Steward responds with its capabilities:

{
  "type": "SYNC_HELLO_ACK",
  "protocol_version": "1.0.0",
  "steward_id": "steward-xyz-789",
  "capabilities": {
    "governance_contracts": {
      "supported": true,
      "evaluation_tiers": ["tier_0", "tier_1", "tier_2"],
      "tier_0_latency_p99_ms": 45,
      "tier_1_latency_p99_ms": 180,
      "tier_2_available": true,
      "tier_2_async_only": true,
      "max_budget_ms": 10000,
      "default_budget_ms": 500
    }
  }
}

Negotiation result: Both support governance contracts. Agent knows steward supports Tier 0-2, with Tier 2 async only.
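
A minimal sketch of how an agent might combine the two handshake messages into a single negotiated view. The `negotiate` function and the shape of its result are assumptions for illustration, as is the choice to cap the agent's default budget at the steward's max_budget_ms.

```python
def negotiate(agent_hello, steward_ack):
    """Combine SYNC_HELLO / SYNC_HELLO_ACK capabilities into one view."""
    agent = agent_hello.get("capabilities", {}).get("governance_contracts", {})
    steward = steward_ack.get("capabilities", {}).get("governance_contracts", {})
    if not (agent.get("supported") and steward.get("supported")):
        return {"enabled": False}

    tiers = list(steward.get("evaluation_tiers", []))
    async_only = {"tier_2"} if steward.get("tier_2_async_only") else set()
    sync_tiers = [t for t in tiers if t not in async_only]

    # Assumption: the agent's default budget is capped at the steward's maximum.
    budget = agent.get("default_budget_ms", steward.get("default_budget_ms"))
    max_budget = steward.get("max_budget_ms")
    if budget is not None and max_budget is not None:
        budget = min(budget, max_budget)

    return {
        "enabled": True,
        "tiers": tiers,
        "sync_tiers": sync_tiers,
        "default_budget_ms": budget,
        "default_fallback": agent.get("default_fallback"),
    }
```

Applied to the two messages in this section, the result lists tier_0 through tier_2 as supported, only tier_0 and tier_1 as available synchronously, and a 300ms default budget.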


7. References

7.1 Normative References

7.2 Informative References


Appendix A: Latency Budget Composition

How governance budgets fit into end-to-end (E2E) latency:

E2E Latency = Network(agent->steward)
            + Protocol Overhead
            + Governance Evaluation  <- This is the budget
            + Network(steward->agent)

Example for elevated_risk interactive action:

| Component | Latency | Notes |
|---|---|---|
| Network (agent->steward) | 20ms | Typical local network |
| Protocol overhead | 30ms | Parsing, validation |
| Governance (Tier 0) | 50ms | Part of budget |
| Governance (Tier 1) | 200ms | Part of budget |
| Network (steward->agent) | 20ms | Return path |
| Total E2E | 320ms | |
| Governance budget | 250ms | Tier 0 + Tier 1 |

See ACGP-1002 Performance Requirements for complete latency model.
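
The composition above is plain addition, and a small helper makes it easy to rerun with different numbers. The function name and parameters are illustrative only.

```python
def e2e_latency_ms(network_out, protocol_overhead, tier_budgets, network_back):
    """Return (governance budget, total end-to-end latency) in milliseconds."""
    governance = sum(tier_budgets.values())
    total = network_out + protocol_overhead + governance + network_back
    return governance, total


# The elevated_risk interactive example from this appendix.
governance, total = e2e_latency_ms(20, 30, {"tier_0": 50, "tier_1": 200}, 20)
```

With these inputs the governance budget is 250ms and the end-to-end total is 320ms, so network and protocol overhead account for the remaining 70ms that the agent's budget does not cover.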


Appendix B: Architecture Pattern Selection

Quick reference for choosing implementation pattern:

| Factor | Rule-Only | Hybrid | Max Quality |
|---|---|---|---|
| Throughput | 1000+ TPS | 100-500 TPS | 10-50 TPS |
| Latency P99 | 50ms | 150ms | 3000ms |
| Monthly cost (100 TPS) | $500 | $2000 | $20000 |
| Eval Tiers supported | 0, 1 | 0, 1, 2 (async) | 0, 1, 2, 3 |
| False positive rate | High | Medium | Low |
| Novel threat detection | Poor | Good | Excellent |

Rule-Only Pattern: Use Eval Tier 0 + 1 only, no LLMs
Hybrid Pattern: Use Eval Tier 0 + 1 sync, Tier 2 async for audit
Max Quality Pattern: Use all tiers including synchronous Tier 2 and optional Tier 3

See ACGP-1002 8.6 Governance Contract Architecture Patterns for implementation details.


End of ACGP-1010

Document status: Draft
Last updated: 2026-01-08
Version: 1.1.0