ACGP-1004: Reflection Blueprint Specification

Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1004
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)

Abstract

This document specifies the structure, syntax, and semantics of ACGP Reflection Blueprints. Blueprints are human-readable YAML or JSON configuration files that define runtime governance policies for Operating Agents. They serve as the primary mechanism for translating organizational policies into enforceable rules, covering quality metrics, evidence requirements, behavioral checks, and scoring thresholds. This specification provides the normative schema that all conformant ACGP Policy Engines must use to parse and apply these policies.

Table of Contents

  1. Introduction
  2. Format and Structure
  3. Top-Level Fields (Metadata)
  4. The scope Block
  5. The inherits Field
  6. The checks Block (Rules & Metrics)
  7. CTQ Configuration (Blueprint-Centric)
  8. The evidence Block
  9. Tripwires
  10. Trust Debt Configuration
  11. The scoring Block
  12. The Clarity Baseline
  13. Blueprint Versioning
  14. Complete Example
  15. Conformance Requirements
  16. References

1. Introduction

A core principle of ACGP is the separation of policy from code. Reflection Blueprints are the embodiment of this principle. They allow governance rules to be defined and managed by domain experts, compliance officers, and legal teams, rather than being hard-coded into the agent's logic. This enables an agile and transparent approach to AI governance.

Blueprint Selection: Blueprints are configured and selected by the Governance Steward, not by the operating agent. The steward automatically selects the appropriate blueprint based on agent scope (agent_id, agent_tier, tools), inheritance rules, and request context. This maintains proper separation of concerns and prevents agents from bypassing governance policies.

1.1 Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.


2. Format and Structure

  • Format: A Reflection Blueprint MUST be a valid YAML 1.2 or JSON document. YAML is RECOMMENDED for human readability.
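  • Structure: A blueprint is a single object containing a set of key-value pairs that define the policy. The primary sections are:
    1. Metadata: id, version, description.
    2. Scope: Defines which agents, tools, or domains this blueprint applies to.
    3. Inheritance: Specifies a parent blueprint to inherit from.
    4. Checks: The list of specific rules or metrics to evaluate.
    5. CTQ: Metric weights, scorers, and aggregation for the Cognitive Trust Quotient (Section 7).
    6. Evidence: Requirements for grounding agent actions in trusted sources.
    7. Tripwires: High-priority safety checks evaluated before CTQ scoring (Section 9).
    8. Trust Debt: Rules for accumulating, decaying, and recovering trust debt (Section 10).
    9. Scoring: Default thresholds for mapping the final CTQ score (expressed as a Risk Score) to interventions.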

3. Top-Level Fields (Metadata)

Every blueprint MUST contain the following top-level metadata fields:

  • id (string, required): A unique, human-readable identifier for the blueprint, RECOMMENDED to follow the domain/name@version format (see Section 13.2).
    • Example: finance/customer_comms@1.2
  • version (string, required): The blueprint version following semantic versioning (e.g., "1.2.0").
  • description (string, required): A concise explanation of the blueprint's purpose.

4. The scope Block

The scope block is an OPTIONAL object that restricts the application of the blueprint to specific contexts. If omitted, the blueprint is considered globally applicable.

  • agent_tier (string or array, optional): Specifies the ACL Tier(s) this blueprint applies to (e.g., ACL-3 or [ACL-4, ACL-5]).
  • tools (array, optional): A list of tool names this blueprint governs.
  • domains (array, optional): A list of domains or resources (e.g., production environments) where this blueprint is active.

5. The inherits Field

The inherits field (string, optional) specifies the id of a parent blueprint. This allows for creating a hierarchy of policies (e.g., a specific task blueprint inheriting from a general domain blueprint).

  • Child blueprints MAY override fields from the parent. The checks array is an exception; it is APPENDED, not overridden.
  • All blueprints MUST ultimately inherit from the clarity.baseline, as defined in Section 12.
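A minimal sketch of the merge rule above, assuming blueprints are already loaded as Python dicts. The spec only fixes the checks behavior, so this sketch treats every other top-level field as a wholesale override (nested objects such as scoring are not deep-merged); merge_blueprints is a hypothetical helper name.

```python
from typing import Any, Dict


def merge_blueprints(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Return the effective blueprint for `child` layered on `parent`."""
    merged = dict(parent)  # shallow copy; the parent is left untouched
    for key, value in child.items():
        if key == "checks":
            # checks are APPENDED: parent checks first, then the child's
            merged["checks"] = list(parent.get("checks", [])) + list(value)
        else:
            # every other top-level field overrides the parent's value
            merged[key] = value
    return merged


baseline = {
    "id": "clarity.baseline@1.0",
    "checks": [{"id": "no_contradictions"}],
    "scoring": {"thresholds": {"ok": 0.30}},
}
child = {
    "id": "finance/customer_comms@1.2",
    "inherits": "clarity.baseline@1.0",
    "checks": [{"id": "gdpr_minimization"}],
}

effective = merge_blueprints(baseline, child)
print([c["id"] for c in effective["checks"]])  # ['no_contradictions', 'gdpr_minimization']
```

The child's id wins, while scoring is inherited unchanged because the child does not define it.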

6. The checks Block (Rules & Metrics)

The checks block is a REQUIRED array of objects, where each object defines a specific governance rule or metric to be evaluated at runtime.

6.1 Common Check Fields

  • id (string, required): A unique identifier for the check (e.g., gdpr_minimization).
  • when (object, required): Defines the trigger for the check. This object contains key-value pairs that match against the Cognitive Trace (e.g., hook: "output" or tool: "database_query").

6.2 Rule-Based Checks

For simple, direct policy enforcement.

  • rule (object, required): Contains the logic for a pass/fail check.
    • condition (string, required): An expression or reference to a condition that must be met.
    • on_fail (object, required): The intervention to issue if the condition is not met.
      • decision (string, required): The intervention level. MUST be one of: ok, nudge, flag, escalate, block, halt.
      • reason (string, required): The rationale for the intervention.

Note: The decision field supports all six intervention types as defined in ACGP-1001.
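The following sketch shows how a Policy Engine might evaluate a rule-based check. It assumes the trace is a flat dict, that when matches only if every key-value pair equals the corresponding trace field, and that condition references resolve to Python predicates through a registry. CONDITIONS, no_unneeded_personal_data, and evaluate_rule_check are hypothetical names used for illustration.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry mapping condition references to predicates over the trace.
CONDITIONS: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "no_unneeded_personal_data": lambda trace: not trace.get("contains_personal_data", False),
}


def when_matches(when: Dict[str, Any], trace: Dict[str, Any]) -> bool:
    # Every key-value pair in `when` must equal the corresponding trace field.
    return all(trace.get(key) == value for key, value in when.items())


def evaluate_rule_check(check: Dict[str, Any], trace: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return an intervention dict, or None if the check's trigger does not fire."""
    if not when_matches(check["when"], trace):
        return None  # check does not apply to this trace
    rule = check["rule"]
    if CONDITIONS[rule["condition"]](trace):
        return {"decision": "ok", "reason": "condition met"}
    return dict(rule["on_fail"])


check = {
    "id": "gdpr_minimization",
    "when": {"hook": "output"},
    "rule": {
        "condition": "no_unneeded_personal_data",
        "on_fail": {"decision": "flag", "reason": "Output contains unnecessary personal data"},
    },
}

print(evaluate_rule_check(check, {"hook": "output", "contains_personal_data": True}))
# {'decision': 'flag', 'reason': 'Output contains unnecessary personal data'}
```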

6.3 Metric-Based Checks

For weighted scoring that contributes to the final CTQ score.

  • metric (object, required): Contains the logic for a scored check.
    • name (string, required): The name of the metric (e.g., numerical_accuracy).
    • weight (float, required): The weight of this metric in the final CTQ calculation (0.0 to 1.0).
    • check (object, required): Defines how to calculate the metric score.
      • type (string, required): The type of evaluation (e.g., llm, tool, regex).
      • args (object, optional): Arguments to pass to the evaluation provider.

7. CTQ Configuration (Blueprint-Centric)

Full CTQ (Cognitive Trust Quotient) configuration is defined within blueprints, making each blueprint a self-contained governance policy.

7.1 CTQ Block Structure

The ctq block defines how cognitive quality is measured and scored for this blueprint.

blueprint:
  ctq:
    metrics:
      reasoning_quality:
        weight: 0.25
        scorer: "llm-judge"
        parameters:
          model: "gpt-4"
          prompt_template: "reasoning_eval_v1"
          threshold_warn: 0.6
          threshold_fail: 0.3
      knowledge_grounding:
        weight: 0.25
        scorer: "source-match"
        parameters:
          required_sources: ["certified-registry"]
          min_citation_ratio: 0.8
      ethical_alignment:
        weight: 0.20
        scorer: "llm-judge"
        parameters:
          prompt_template: "ethics_eval_v1"
      tool_safety:
        weight: 0.20
        scorer: "rule-based"
        parameters:
          rules:
            - "no_destructive_ops_without_confirmation"
            - "rate_limit_external_calls"
      context_awareness:
        weight: 0.10
        scorer: "hybrid"
        parameters:
          rule_weight: 0.4
          llm_weight: 0.6

    aggregation: "weighted_average"

    thresholds:   # CTQ scale: higher CTQ is better (Section 11 instead uses Risk Score)
      ok: 0.85
      nudge: 0.70
      flag: 0.50
      block: 0.30
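As an illustration of the weighted_average aggregation, the sketch below combines per-metric scores using the weights from the example above. It assumes weighted_average means the sum of weight × score divided by the total weight, and it rejects weight sets that do not sum to 1.0; aggregate_ctq and the sample scores are hypothetical.

```python
import math
from typing import Dict


def aggregate_ctq(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Weighted average of per-metric scores (all values on a 0.0-1.0 scale)."""
    total = sum(weights.values())
    # The example weights sum to 1.0; reject misconfigured blueprints early.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"metric weights sum to {total}, expected 1.0")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for metrics: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / total


weights = {
    "reasoning_quality": 0.25,
    "knowledge_grounding": 0.25,
    "ethical_alignment": 0.20,
    "tool_safety": 0.20,
    "context_awareness": 0.10,
}
scores = {
    "reasoning_quality": 0.9,
    "knowledge_grounding": 0.8,
    "ethical_alignment": 1.0,
    "tool_safety": 1.0,
    "context_awareness": 0.5,
}
ctq = aggregate_ctq(weights, scores)
print(round(ctq, 3))  # 0.875
```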

7.2 Standard Scorer Types [NORMATIVE]

Implementations MUST support these standard scorer types:

7.2.1 rule-based Scorer

Fast, deterministic evaluation using predefined rules.

Interface:

scorer: "rule-based"
parameters:
  rules: [string]           # List of rule IDs to evaluate
  mode: "all" | "any"       # all = AND, any = OR (default: "all")

# Output: 1.0 if rules pass, 0.0 if fail (binary)

Characteristics:
  • Latency: <10ms
  • Eval Tier: 0
  • Use case: Hard constraints, permission checks
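A minimal sketch of the rule-based scorer's all/any semantics. The rule IDs match the tool_safety example in Section 7.1, but their predicates and the RULES registry are hypothetical stand-ins for however an implementation defines rules.

```python
from typing import Callable, Dict, List

# Hypothetical rule registry: rule ID -> predicate over a trace dict.
RULES: Dict[str, Callable[[dict], bool]] = {
    "no_destructive_ops_without_confirmation":
        lambda t: not t.get("destructive", False) or t.get("confirmed", False),
    "rate_limit_external_calls": lambda t: t.get("external_calls", 0) <= 10,
}


def rule_based_score(trace: dict, rules: List[str], mode: str = "all") -> float:
    """Binary score: 1.0 if the rules pass under `mode`, else 0.0."""
    results = [RULES[rule_id](trace) for rule_id in rules]
    if mode == "all":
        passed = all(results)  # AND
    elif mode == "any":
        passed = any(results)  # OR
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return 1.0 if passed else 0.0


trace = {"destructive": True, "confirmed": False, "external_calls": 3}
print(rule_based_score(trace, list(RULES)))              # 0.0
print(rule_based_score(trace, list(RULES), mode="any"))  # 1.0
```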

7.2.2 llm-judge Scorer

LLM-based semantic evaluation.

Interface:

scorer: "llm-judge"
parameters:
  model: string             # Model identifier (e.g., "gpt-4", "claude-3")
  prompt_template: string   # Template ID for evaluation prompt
  temperature: float        # Sampling temperature (default: 0.0)
  max_tokens: int           # Max tokens for response (default: 256)

# Output: float 0.0-1.0 based on LLM assessment

Characteristics:
  • Latency: 500ms-5000ms
  • Eval Tier: 2
  • Use case: Nuanced reasoning, context understanding

Note: LLM-based scorers are where commercial differentiation occurs. The protocol defines the interface; premium implementations provide better accuracy and lower latency.

7.2.3 source-match Scorer

Evaluates grounding against certified sources.

Interface:

scorer: "source-match"
parameters:
  required_sources: [string]     # Registry categories required
  min_citation_ratio: float      # Min fraction (0.0-1.0) of claims with citations
  min_source_trust: float        # Minimum source trust score

# Output: float 0.0-1.0 based on citation coverage and trust

Characteristics:
  • Latency: 50ms-200ms
  • Eval Tier: 1
  • Use case: Knowledge grounding, fact verification
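The sketch below computes a source-match score as citation coverage. The spec does not define how coverage and trust are combined, so this assumes a claim is grounded when at least one of its citations comes from a required registry category with trust at or above min_source_trust, and it reports the min_citation_ratio comparison separately. Claim, Citation, and source_match_score are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Citation:
    category: str   # registry category, e.g. "certified-registry"
    trust: float    # source trust score, 0.0-1.0


@dataclass
class Claim:
    text: str
    citations: List[Citation] = field(default_factory=list)


def source_match_score(
    claims: List[Claim],
    required_sources: List[str],
    min_citation_ratio: float,
    min_source_trust: float = 0.0,
) -> Tuple[float, bool]:
    """Return (coverage, meets_min_ratio)."""
    if not claims:
        return 1.0, True  # assumption: nothing to ground counts as fully grounded
    grounded = sum(
        1
        for claim in claims
        if any(
            c.category in required_sources and c.trust >= min_source_trust
            for c in claim.citations
        )
    )
    coverage = grounded / len(claims)
    return coverage, coverage >= min_citation_ratio


claims = [
    Claim("Q3 revenue rose 4%", [Citation("certified-registry", 0.9)]),
    Claim("Fees are unchanged", [Citation("blog", 0.4)]),
    Claim("Rates follow the central bank", [Citation("certified-registry", 0.95)]),
    Claim("Customers prefer app access"),
]
coverage, meets_ratio = source_match_score(
    claims, ["certified-registry"], min_citation_ratio=0.8, min_source_trust=0.7
)
print(coverage, meets_ratio)  # 0.5 False
```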

7.2.4 pattern-match Scorer

Regex or pattern-based content scanning.

Interface:

scorer: "pattern-match"
parameters:
  patterns:
    - pattern: "regex-pattern"
      score_on_match: float    # Score if matched
      score_on_miss: float     # Score if not matched
  aggregation: "min" | "max" | "avg"

# Output: float 0.0-1.0 based on pattern matching

Characteristics:
  • Latency: <50ms
  • Eval Tier: 0
  • Use case: PII detection, keyword filtering
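A short sketch of the pattern-match scorer using Python's re module. It assumes a pattern counts as matched if it occurs anywhere in the content (re.search rather than a full match); pattern_match_score and the PII-style patterns are illustrative.

```python
import re
from statistics import mean
from typing import Dict, List


def pattern_match_score(content: str, patterns: List[Dict], aggregation: str = "min") -> float:
    """Score content against each regex pattern, then aggregate the per-pattern scores."""
    scores = []
    for spec in patterns:
        matched = re.search(spec["pattern"], content) is not None
        scores.append(spec["score_on_match"] if matched else spec["score_on_miss"])
    if aggregation == "min":
        return min(scores)
    if aggregation == "max":
        return max(scores)
    if aggregation == "avg":
        return mean(scores)
    raise ValueError(f"unknown aggregation: {aggregation!r}")


# PII-style configuration: a match drives the score down.
pii_patterns = [
    {"pattern": r"\b\d{3}-\d{2}-\d{4}\b", "score_on_match": 0.0, "score_on_miss": 1.0},  # SSN-like
    {"pattern": r"(?i)\bpassword\b", "score_on_match": 0.2, "score_on_miss": 1.0},
]

print(pattern_match_score("My SSN is 123-45-6789", pii_patterns))         # 0.0
print(pattern_match_score("My SSN is 123-45-6789", pii_patterns, "avg"))  # 0.5
```

With min aggregation a single PII hit dominates the result, which suits deny-style filtering; avg dilutes it.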

7.2.5 hybrid Scorer

Combines multiple scorer types.

Interface:

scorer: "hybrid"
parameters:
  scorers:
    - type: "rule-based"
      weight: 0.4
      parameters: {...}
    - type: "llm-judge"
      weight: 0.6
      parameters: {...}
  aggregation: "weighted_average" | "min" | "max"

# Output: float 0.0-1.0 based on combined scores

Characteristics:
  • Latency: Depends on components
  • Eval Tier: Highest of components
  • Use case: Balanced quality/speed tradeoff
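The hybrid aggregation can be sketched as follows. It assumes each component has already produced a 0.0-1.0 score, and it reports the hybrid's eval tier as the highest component tier, per the characteristics above. ComponentResult and hybrid_score are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ComponentResult:
    scorer_type: str
    weight: float
    score: float    # 0.0-1.0
    eval_tier: int


def hybrid_score(
    components: List[ComponentResult], aggregation: str = "weighted_average"
) -> Tuple[float, int]:
    """Combine component scores; the hybrid's eval tier is the highest component tier."""
    if aggregation == "weighted_average":
        total = sum(c.weight for c in components)
        score = sum(c.weight * c.score for c in components) / total
    elif aggregation == "min":
        score = min(c.score for c in components)
    elif aggregation == "max":
        score = max(c.score for c in components)
    else:
        raise ValueError(f"unknown aggregation: {aggregation!r}")
    tier = max(c.eval_tier for c in components)
    return score, tier


parts = [
    ComponentResult("rule-based", 0.4, 1.0, 0),
    ComponentResult("llm-judge", 0.6, 0.5, 2),
]
score, tier = hybrid_score(parts)  # score ≈ 0.7, tier == 2
```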

7.3 Standard Metric Profiles

Pre-defined metric configurations for common use cases. Blueprints MAY reference these instead of defining metrics individually.

7.3.1 default-general Profile

ctq:
  profile: "default-general"
  # Equivalent to:
  # metrics:
  #   reasoning_quality: { weight: 0.25, scorer: "llm-judge" }
  #   knowledge_grounding: { weight: 0.20, scorer: "source-match" }
  #   ethical_alignment: { weight: 0.20, scorer: "llm-judge" }
  #   tool_safety: { weight: 0.20, scorer: "rule-based" }
  #   context_awareness: { weight: 0.15, scorer: "hybrid" }

7.3.2 finance-strict Profile

ctq:
  profile: "finance-strict"
  # Optimized for financial applications
  # - Higher weight on numerical accuracy
  # - Stricter source requirements
  # - Lower thresholds (more conservative)

7.3.3 code-assistant Profile

ctq:
  profile: "code-assistant"
  # Optimized for code generation/review
  # - Higher weight on tool safety
  # - Pattern matching for dangerous operations
  # - Lower weight on ethical alignment

7.3.4 safety-critical Profile

ctq:
  profile: "safety-critical"
  # For high-risk autonomous operations
  # - All metrics weighted equally
  # - Strictest thresholds
  # - Mandatory human review triggers

7.4 Scorer Interface Specification [NORMATIVE]

Custom scorer implementations MUST conform to this interface:

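from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol

# CognitiveTrace is the ACGP trace type, defined outside this specification.


class Scorer(Protocol):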
    """Standard ACGP scorer interface."""

    @property
    def scorer_type(self) -> str:
        """Return scorer type identifier."""
        ...

    @property
    def eval_tier(self) -> int:
        """Return evaluation tier (0-3)."""
        ...

    @property
    def latency_budget_ms(self) -> int:
        """Return expected latency budget in milliseconds."""
        ...

    def evaluate(
        self, 
        trace: CognitiveTrace, 
        parameters: dict
    ) -> ScorerResult:
        """
        Evaluate the trace and return a score.

        Args:
            trace: The cognitive trace to evaluate
            parameters: Scorer-specific configuration

        Returns:
            ScorerResult with score (0.0-1.0) and metadata
        """
        ...


@dataclass
class ScorerResult:
    """Result from a scorer evaluation."""
    score: float           # 0.0 to 1.0
    confidence: float      # 0.0 to 1.0
    explanation: str       # Human-readable explanation
    evidence: dict         # Supporting data
    latency_ms: int        # Actual evaluation time
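For illustration, a hypothetical custom scorer that satisfies the interface structurally. CognitiveTrace is reduced to a minimal stand-in dataclass and ScorerResult is repeated so the snippet runs on its own; the reasoning-length scorer type and its min_words parameter are invented for this example. Plain class attributes are used for scorer_type, eval_tier, and latency_budget_ms, which static type checkers such as mypy generally accept for read-only protocol properties.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CognitiveTrace:
    """Hypothetical minimal stand-in for the ACGP trace type."""
    reasoning: str
    action: str
    context: dict = field(default_factory=dict)


@dataclass
class ScorerResult:
    score: float
    confidence: float
    explanation: str
    evidence: dict
    latency_ms: int


class ReasoningLengthScorer:
    """Hypothetical Tier 0 scorer: rewards reasoning that meets a minimum word count."""

    scorer_type = "reasoning-length"  # hypothetical custom type identifier
    eval_tier = 0
    latency_budget_ms = 10

    def evaluate(self, trace: CognitiveTrace, parameters: dict) -> ScorerResult:
        start = time.perf_counter()
        min_words = parameters.get("min_words", 20)
        words = len(trace.reasoning.split())
        score = min(1.0, words / min_words)
        latency_ms = int((time.perf_counter() - start) * 1000)
        return ScorerResult(
            score=score,
            confidence=1.0,  # deterministic rule, so full confidence
            explanation=f"{words} reasoning words vs. minimum {min_words}",
            evidence={"word_count": words},
            latency_ms=latency_ms,
        )


result = ReasoningLengthScorer().evaluate(
    CognitiveTrace(reasoning="Balance verified against ledger before transfer", action="transfer"),
    {"min_words": 12},
)
print(result.score)  # 0.5
```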

7.5 Reference Scorer Implementations

ACGP provides reference scorer implementations (basic quality, open source) for conformance testing.

Reference implementations are NOT optimized for production use. Commercial implementations SHOULD provide:
  • 10x better accuracy (through fine-tuning)
  • 10x lower latency (through optimization)
  • Domain-specific calibration

7.5.1 Reference LLM Judge

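import json
import time


class ReferenceLLMJudge:
    """
    Reference implementation of llm-judge scorer.
    Basic quality, suitable for testing.
    """

    scorer_type = "llm-judge"
    eval_tier = 2
    latency_budget_ms = 3000

    DEFAULT_PROMPT = """
    Evaluate the following AI agent reasoning and action:

    Reasoning: {reasoning}
    Action: {action}
    Context: {context}

    Rate the quality on a scale of 0.0 to 1.0 where:
    - 1.0 = Excellent: Clear, logical, well-supported reasoning
    - 0.5 = Acceptable: Some gaps but generally sound
    - 0.0 = Poor: Flawed logic, unsupported claims

    Respond with only a JSON object:
    {{"score": <float>, "explanation": "<string>"}}
    """

    def __init__(self, llm_client, templates=None):
        # llm_client: any client exposing complete(model, prompt, temperature, max_tokens) -> str
        # templates: maps prompt_template IDs (e.g., "reasoning_eval_v1") to prompt text
        self.llm_client = llm_client
        self.templates = templates or {}

    def evaluate(self, trace, parameters):
        # prompt_template is a template ID (Section 7.2.2); unknown IDs fall back to the default
        template = self.templates.get(parameters.get("prompt_template"), self.DEFAULT_PROMPT)
        prompt = template.format(
            reasoning=trace.reasoning,
            action=trace.action,
            context=trace.context,
        )
        model = parameters.get("model", "gpt-4")

        # Call LLM (reference uses basic OpenAI call) and time it
        start = time.perf_counter()
        response = self.llm_client.complete(
            model=model,
            prompt=prompt,
            temperature=parameters.get("temperature", 0.0),
            max_tokens=parameters.get("max_tokens", 256),
        )
        latency_ms = int((time.perf_counter() - start) * 1000)

        result = json.loads(response)
        return ScorerResult(
            # Clamp to the protocol's 0.0-1.0 range in case the model drifts
            score=min(1.0, max(0.0, float(result["score"]))),
            confidence=0.7,  # Reference impl has moderate confidence
            explanation=result.get("explanation", ""),
            evidence={"model": model},
            latency_ms=latency_ms,
        )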

7.6 Calibration Guidance

Scorers require calibration to produce meaningful scores. Guidelines:

  1. Establish ground truth: Create labeled datasets with known good/bad examples
  2. Benchmark regularly: Compare scorer output against ground truth
  3. Adjust thresholds: Tune blueprint thresholds based on calibration results
  4. Monitor drift: Track score distributions over time

# Example calibration configuration
calibration:
  dataset: "internal-benchmark-v1"
  ground_truth_labels: ["pass", "fail", "borderline"]
  expected_correlation: 0.85
  recalibration_schedule: "monthly"

8. The evidence Block

The evidence block is an OPTIONAL object that defines requirements for grounding the agent's reasoning against the Certified Source Registry.

  • min_certified_sources (integer, optional): The minimum number of sources from the registry that must be cited.
  • source_categories (array, optional): A list of required source categories (e.g., ["regulatory", "peer_reviewed"]).
  • min_trust_score (float, optional): Minimum trust score required for sources (0.0 to 1.0).
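A minimal sketch of an evidence check, assuming each cited source has already been looked up in the Certified Source Registry and carries a category and a trust score. Sources below min_trust_score are excluded before counting; check_evidence and the sample sources are hypothetical.

```python
from typing import Dict, List, Optional


def check_evidence(
    cited_sources: List[Dict],
    min_certified_sources: int = 0,
    source_categories: Optional[List[str]] = None,
    min_trust_score: float = 0.0,
) -> List[str]:
    """Return unmet evidence requirements (an empty list means compliant)."""
    qualifying = [s for s in cited_sources if s["trust"] >= min_trust_score]
    problems = []
    if len(qualifying) < min_certified_sources:
        problems.append(f"qualifying sources: {len(qualifying)} (need {min_certified_sources})")
    present = {s["category"] for s in qualifying}
    for category in source_categories or []:
        if category not in present:
            problems.append(f"missing required category: {category}")
    return problems


cited = [
    {"id": "reg-001", "category": "regulatory", "trust": 0.95},
    {"id": "paper-17", "category": "peer_reviewed", "trust": 0.40},
]
print(check_evidence(cited, min_certified_sources=2,
                     source_categories=["regulatory", "peer_reviewed"], min_trust_score=0.7))
# ['qualifying sources: 1 (need 2)', 'missing required category: peer_reviewed']
```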

9. Tripwires

Tripwires support Governance Contracts (ACGP-1010).

The tripwires block is an OPTIONAL array of high-priority safety checks that MUST be evaluated before threshold-based CTQ evaluation. Tripwires are the only mechanism that can trigger a HALT intervention.

9.1 Tripwire Schema

Each tripwire object contains:

  • id (string, required): Unique identifier for the tripwire (e.g., pii_exposure_check)
  • when (object, required): Trigger condition (same semantics as checks)
    • hook (string): Lifecycle hook (tool_call, output, etc.)
    • tool (string, optional): Tool name filter
  • condition (object or string, required): Condition expression (see Section 9.4 for DSL)
  • eval_tier (integer, optional): Evaluation tier (0 or 1). Default: 0
    • 0: In-memory rule, no external calls (target <100ms)
    • 1: Database/cache lookup allowed (target <300ms)
  • latency_budget_ms (integer, optional): Per-tripwire latency budget. Default: 100 for tier 0, 300 for tier 1
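  • requires_state (boolean, optional): Whether tripwire needs stateful storage (e.g., rate limiting). Default: false
  • severity (string, optional): One of standard, critical, severe. Selects the trust debt multiplier applied when the tripwire fires (see Section 10.5)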
  • on_fail (object, required): Intervention when condition evaluates to false
    • decision (string, required): MUST be one of nudge, flag, escalate, block, halt
    • reason (string, required): Human-readable explanation

9.2 Execution Semantics

  1. Priority: Tripwires MUST run before CTQ-based checks
  2. Short-circuit: If any tripwire issues halt, evaluation MUST terminate immediately
  3. Latency: Stewards MUST enforce latency_budget_ms per tripwire; on timeout, MUST apply the fallback behavior from the governance contract (ACGP-1010)
  4. Tier Constraint: Blueprints MUST NOT define tripwires with eval_tier > 1 (only Tier 0 and 1 allowed for tripwires)
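The execution rules can be sketched as below. Conditions are modeled as Python predicates returning True when the check passes (on_fail fires on False, per Section 9.1), the timeout fallback is passed in as a parameter because it is defined by the governance contract (ACGP-1010), and budgets default to 100 ms and 300 ms by tier. Tripwire and run_tripwires are hypothetical names. A timed-out condition keeps running in its worker thread; this sketch only stops waiting for it.

```python
import concurrent.futures as cf
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Default per-tripwire budgets from Section 9.1 (tier -> milliseconds).
DEFAULT_BUDGET_MS = {0: 100, 1: 300}


@dataclass
class Tripwire:
    id: str
    when: Dict[str, Any]
    condition: Callable[[Dict[str, Any]], bool]  # True means the check passes
    on_fail: Dict[str, str]
    eval_tier: int = 0
    latency_budget_ms: int = 0  # 0 means "use the tier default"

    def __post_init__(self) -> None:
        if self.eval_tier not in DEFAULT_BUDGET_MS:
            raise ValueError(f"{self.id}: tripwires allow eval_tier 0 or 1 only")
        if not self.latency_budget_ms:
            self.latency_budget_ms = DEFAULT_BUDGET_MS[self.eval_tier]


def run_tripwires(
    tripwires: List[Tripwire], trace: Dict[str, Any], fallback_decision: str = "block"
) -> List[Dict[str, str]]:
    """Evaluate tripwires in order and return failures, stopping at the first halt."""
    failures: List[Dict[str, str]] = []
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(tripwires)))
    try:
        for tw in tripwires:
            if not all(trace.get(k) == v for k, v in tw.when.items()):
                continue  # trigger did not fire
            future = pool.submit(tw.condition, trace)
            try:
                passed = future.result(timeout=tw.latency_budget_ms / 1000)
                failure = None if passed else {"id": tw.id, **tw.on_fail}
            except cf.TimeoutError:
                # Fallback is defined by the governance contract (ACGP-1010)
                failure = {"id": tw.id, "decision": fallback_decision,
                           "reason": "latency budget exceeded"}
            if failure:
                failures.append(failure)
                if failure["decision"] == "halt":
                    break  # short-circuit: no further evaluation
    finally:
        pool.shutdown(wait=False)
    return failures


trade_when = {"hook": "tool_call", "tool": "execute_trade"}
tripwires = [
    Tripwire("trade_size_limit", trade_when,
             lambda t: t["args"]["trade_value"] <= 100000,
             {"decision": "halt", "reason": "Trade exceeds $100k hard limit"},
             latency_budget_ms=50),
    Tripwire("daily_trade_count", trade_when,
             lambda t: t["trades_today"] < 50,  # hypothetical pre-fetched counter
             {"decision": "block", "reason": "Daily trade limit (50) exceeded"},
             eval_tier=1, latency_budget_ms=200),
]
trace = {"hook": "tool_call", "tool": "execute_trade",
         "args": {"trade_value": 250000}, "trades_today": 60}
print(run_tripwires(tripwires, trace))
# [{'id': 'trade_size_limit', 'decision': 'halt', 'reason': 'Trade exceeds $100k hard limit'}]
```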

9.3 Example: Tripwire-Enabled Blueprint

id: finance/trading_bot@2.0
version: "2.0.0"
description: "Trading agent with safety tripwires"
inherits: clarity.baseline@1.0

tripwires:
  # Tier 0: In-memory rule
  - id: trade_size_limit
    when:
      hook: "tool_call"
      tool: "execute_trade"
    condition: "args.trade_value <= 100000"
    eval_tier: 0
    latency_budget_ms: 50
    requires_state: false
    on_fail:
      decision: "halt"
      reason: "Trade exceeds $100k hard limit"

  # Tier 1: Rate limit check (requires DB lookup)
  - id: daily_trade_count
    when:
      hook: "tool_call"
      tool: "execute_trade"
    condition: "storage.get('trades_today') < 50"
    eval_tier: 1
    latency_budget_ms: 200
    requires_state: true
    on_fail:
      decision: "block"
      reason: "Daily trade limit (50) exceeded"

checks:
  # CTQ-based checks run after tripwires pass
  - id: trade_rationale_quality
    when:
      hook: "tool_call"
      tool: "execute_trade"
    metric:
      name: "risk_assessment_clarity"
      weight: 0.8
      check:
        type: "llm"

9.4 Tripwire Condition DSL [NORMATIVE]

Tripwire conditions use a structured expression language for machine-executable safety checks.

9.4.1 Grammar Specification (ABNF)

; Tripwire Condition DSL Grammar
condition       = simple-expr / compound-expr

; Compound expressions
compound-expr   = all-expr / any-expr / not-expr
all-expr        = "all" ":" "[" condition-list "]"
any-expr        = "any" ":" "[" condition-list "]"
not-expr        = "NOT" simple-expr
condition-list  = condition *("," condition)

; Simple expressions  
simple-expr     = comparison / function-call / field-access

; Comparisons
comparison      = field-access operator value
operator        = ">" / ">=" / "<" / "<=" / "==" / "!=" / "contains" / "matches"
field-access    = identifier *("." identifier)
identifier      = ALPHA *(ALPHA / DIGIT / "_")
value           = string / number / boolean / array

; Function calls
function-call   = function-name "(" argument-list ")"
function-name   = "is_external" / "in_allowlist" / "in_denylist" / 
                  "matches_regex" / "contains_entity" / "exceeds_rate"
argument-list   = [argument *("," argument)]
argument        = field-access / value

; Literals
string          = DQUOTE *CHAR DQUOTE
number          = ["-"] 1*DIGIT ["." 1*DIGIT]
boolean         = "true" / "false"
array           = "[" [value *("," value)] "]"

9.4.2 Standard Operators

Operator   Description        Example
>          Greater than       args.amount > 10000
>=         Greater or equal   response_size >= 1MB
<          Less than          retry_count < 3
<=         Less or equal      trust_debt <= 0.5
==         Equal              action.type == "delete"
!=         Not equal          destination != "internal"
contains   String contains    content contains "password"
matches    Regex match        content matches "\\d{3}-\\d{2}-\\d{4}"

9.4.3 Standard Functions

Function                           Description                     Example
is_external(field)                 Check if endpoint is external   is_external(destination)
in_allowlist(field, list)          Check against allowlist         in_allowlist(tool, "approved_tools")
in_denylist(field, list)           Check against denylist          in_denylist(domain, "blocked_domains")
matches_regex(field, pattern)      Regex pattern match             matches_regex(content, "SSN_PATTERN")
contains_entity(field, type)       NER entity detection            contains_entity(output, "credit_card")
exceeds_rate(key, limit, window)   Rate limit check                exceeds_rate(agent_id, 100, "1h")

9.4.4 Compound Conditions

ALL (AND logic):

condition:
  all:
    - response_size > 10MB
    - is_external(destination)
    - NOT in_allowlist(destination, "trusted_endpoints")

ANY (OR logic):

condition:
  any:
    - action.type == "delete"
    - action.type == "drop"
    - action.type == "truncate"

NOT (negation):

condition:
  NOT:
    in_allowlist(tool, "approved_tools")

9.4.5 Complete DSL Examples

Data Exfiltration Detection:

tripwires:
  - id: data_exfiltration
    severity: severe
    condition:
      all:
        - response_size > 10MB
        - is_external(destination)
        - NOT in_allowlist(destination, "approved_endpoints")
    on_fail:
      decision: halt
      reason: "Potential data exfiltration detected"

PII Exposure Check:

tripwires:
  - id: pii_exposure
    severity: critical
    condition:
      any:
        - matches_regex(content, "\\b\\d{3}-\\d{2}-\\d{4}\\b")  # SSN
        - contains_entity(content, "credit_card")
        - contains_entity(content, "bank_account")
    on_fail:
      decision: block
      reason: "PII detected in output"

Rate Limiting:

tripwires:
  - id: api_rate_limit
    severity: standard
    eval_tier: 1
    requires_state: true
    condition:
      NOT:
        exceeds_rate(agent_id, 100, "1m")
    on_fail:
      decision: block
      reason: "Rate limit exceeded (100 req/min)"

Dangerous Operations:

tripwires:
  - id: dangerous_db_ops
    severity: severe
    condition:
      all:
        - tool == "database_query"
        - any:
            - action.query contains "DROP"
            - action.query contains "DELETE FROM"
            - action.query contains "TRUNCATE"
        - NOT in_allowlist(action.table, "deletable_tables")
    on_fail:
      decision: halt
      reason: "Dangerous database operation blocked"

9.4.6 Implementation Requirements

Implementations MUST:
  • Parse and evaluate all standard operators
  • Implement all standard functions
  • Support nested compound expressions (at least 3 levels deep)
  • Return clear error messages for malformed conditions

Implementations SHOULD:
  • Optimize repeated evaluations (compile conditions)
  • Support custom functions via extension registry
  • Provide condition validation at blueprint load time


10. Trust Debt Configuration

Trust debt rules are fully configurable per blueprint.

10.1 Trust Debt Block Structure

The trust_debt block defines how trust violations accumulate, decay, and can be recovered.

blueprint:
  trust_debt:
    enabled: true

    accumulation:
      # Debt added per intervention type
      flag: 0.05
      nudge: 0.02
      block: 0.15
      halt: 0.50

    decay:
      rate: 0.95              # Multiplier per period
      period_hours: 24        # How often decay applies
      min_debt: 0.0           # Floor value

    recovery:
      enabled: true
      successful_review_credit: -0.10    # Credit for passed human review
      good_behavior_audit:
        threshold_hours: 168             # 7 days of good behavior
        credit: -0.20                    # Credit awarded
      agent_remediation_request: true    # Allow agents to request review

    thresholds:
      elevated_monitoring: 0.30   # Trigger increased scrutiny
      restricted_mode: 0.50       # Stricter thresholds apply
      re_tiering_review: 0.75     # Queue ARS re-evaluation

    severity_weights:
      standard: 1.0
      critical: 2.0
      severe: 5.0

10.2 Accumulation Rules [NORMATIVE]

Trust debt accumulates when interventions are issued:

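from typing import Optional

# TrustDebtConfig: the parsed trust_debt block (defined elsewhere)


def accumulate_trust_debt(
    current_debt: float,
    intervention: str,
    tripwire_severity: Optional[str],
    config: "TrustDebtConfig",
) -> float:
    """
    Calculate new trust debt after intervention.

    Args:
        current_debt: Debt before this intervention (0.0 to 1.0)
        intervention: Intervention type (e.g., "block")
        tripwire_severity: Severity of the triggering tripwire, or None
            when the intervention did not come from a tripwire
        config: The blueprint's trust_debt configuration

    Returns:
        Updated trust debt value (0.0 to 1.0)
    """
    base_debt = config.accumulation.get(intervention, 0.0)

    # Apply severity weight for tripwire-triggered interventions
    if tripwire_severity:
        weight = config.severity_weights.get(tripwire_severity, 1.0)
        base_debt *= weight

    # Cap the total at 1.0
    return min(1.0, current_debt + base_debt)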

Accumulation Defaults:

Intervention   Default Debt   Rationale
flag           0.05           Minor concern, needs tracking
nudge          0.02           Corrected behavior
block          0.15           Serious violation
halt           0.50           Critical failure

10.3 Decay Rules [NORMATIVE]

Trust debt decays over time to allow recovery:

def apply_trust_debt_decay(
    current_debt: float,
    hours_elapsed: float,
    config: TrustDebtConfig
) -> float:
    """
    Apply time-based decay to trust debt.

    Formula: new_debt = current_debt * (rate ^ periods)
    where periods = hours_elapsed / period_hours
    """
    periods = hours_elapsed / config.decay.period_hours
    decayed_debt = current_debt * (config.decay.rate ** periods)

    return max(config.decay.min_debt, decayed_debt)

Decay Example:

With default settings (0.95 rate, 24-hour period):
  • After 1 day: debt × 0.95 = 95% remaining
  • After 7 days: debt × 0.95^7 ≈ 70% remaining
  • After 14 days: debt × 0.95^14 ≈ 49% remaining

10.4 Recovery Mechanisms [NORMATIVE]

Trust debt can be actively reduced through positive actions:

10.4.1 Successful Human Review

When a human reviewer approves an escalated action:

recovery:
  successful_review_credit: -0.10

The agent receives credit for being "correct" when flagged for review.

10.4.2 Good Behavior Audit

Extended period without violations:

recovery:
  good_behavior_audit:
    threshold_hours: 168    # 7 days
    credit: -0.20

If the agent operates for threshold_hours without any interventions other than ok, trust credit is applied.
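A sketch of the good behavior audit under two assumptions the spec leaves open: the clean window restarts after each credit so it is not granted twice for the same period, and debt never drops below decay.min_debt. good_behavior_credit is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def good_behavior_credit(
    debt: float,
    last_non_ok_at: datetime,
    last_credit_at: Optional[datetime],
    now: datetime,
    threshold_hours: float = 168,
    credit: float = -0.20,
    min_debt: float = 0.0,
) -> Tuple[float, bool]:
    """Return (new_debt, credited) for the good behavior audit."""
    # Assumption: the clean window restarts after each credit is granted.
    window_start = max(last_non_ok_at, last_credit_at) if last_credit_at else last_non_ok_at
    if now - window_start >= timedelta(hours=threshold_hours):
        return max(min_debt, debt + credit), True
    return debt, False


t0 = datetime(2026, 1, 1, 9, 0)  # last non-ok intervention
debt, credited = good_behavior_credit(0.35, t0, None, t0 + timedelta(days=7))
print(round(debt, 2), credited)  # 0.15 True
```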

10.4.3 Agent Remediation Request

Agents may request re-evaluation:

recovery:
  agent_remediation_request: true

When enabled, agents can submit a remediation request explaining corrective actions taken. A human reviewer evaluates and may grant credit.

10.5 Severity-Weighted Debt

Tripwire-triggered interventions use severity multipliers:

severity_weights:
  standard: 1.0    # Normal violation
  critical: 2.0    # Serious violation (2x debt)
  severe: 5.0      # Critical failure (5x debt)

Example:

A block intervention (0.15 base) triggered by a severe tripwire:
  • Debt added = 0.15 × 5.0 = 0.75, enough on its own to reach the re_tiering_review threshold (0.75)

10.6 Threshold Actions

When trust debt exceeds thresholds, actions are triggered:

Threshold             Default   Action
elevated_monitoring   0.30      Increase logging, notify operations
restricted_mode       0.50      Apply stricter intervention thresholds
re_tiering_review     0.75      Queue agent for ARS re-evaluation
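A minimal sketch of threshold evaluation, assuming thresholds are cumulative (an agent at 0.75 is also in restricted mode and under elevated monitoring) and that reaching a threshold counts as exceeding it, consistent with the severity example in Section 10.5. triggered_actions is a hypothetical helper; keys missing from the blueprint fall back to the defaults in the table.

```python
from typing import Dict, List, Optional

# (config key, default threshold), highest first; defaults from the table above.
THRESHOLDS = [
    ("re_tiering_review", 0.75),
    ("restricted_mode", 0.50),
    ("elevated_monitoring", 0.30),
]


def triggered_actions(debt: float, config: Optional[Dict[str, float]] = None) -> List[str]:
    """Return every threshold the current debt has reached, highest first."""
    config = config or {}
    return [key for key, default in THRESHOLDS if debt >= config.get(key, default)]


print(triggered_actions(0.35))  # ['elevated_monitoring']
print(triggered_actions(0.75))  # ['re_tiering_review', 'restricted_mode', 'elevated_monitoring']
```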

10.7 Implementation Requirements

Implementations MUST:
  • Track trust debt per agent (not per session)
  • Apply decay on every evaluation (or use lazy evaluation)
  • Persist debt across restarts
  • Log all debt changes to ReflectionDB

Implementations SHOULD:
  • Expose trust debt via monitoring metrics
  • Alert when agents approach re-tiering threshold
  • Provide dashboard visibility into debt trends


11. The scoring Block

The scoring block is a REQUIRED object that defines the default mapping of the final CTQ score to intervention decisions for this blueprint.

11.1 Threshold Format

The thresholds object MUST use Risk Score (1.0 - CTQ) as the basis for decision boundaries.

scoring:
  thresholds:
    # Risk Score thresholds (1.0 - CTQ)
    ok: 0.25          # Risk Score ≤ 0.25 → OK
    nudge: 0.40       # Risk Score ≤ 0.40 → NUDGE
    escalate: 0.55    # Risk Score ≤ 0.55 → ESCALATE
    block: 0.70       # Risk Score > 0.70 → BLOCK
    # HALT triggered by tripwires, not thresholds

Note: These thresholds serve as the default for this blueprint. They MAY be overridden by the more stringent thresholds associated with an agent's specific ACL Tier, as defined in ACGP-1005. The Policy Engine MUST apply the stricter of the two threshold sets.

11.2 Threshold Override Rules

When both blueprint thresholds and ACL thresholds exist:

def apply_thresholds(blueprint_thresholds, acl_thresholds, risk_score):
    """Apply the stricter threshold."""
    # Use the lower threshold (stricter) for each decision boundary
    ok_threshold = min(blueprint_thresholds.ok, acl_thresholds.ok)
    nudge_threshold = min(blueprint_thresholds.nudge, acl_thresholds.nudge)
    escalate_threshold = min(blueprint_thresholds.escalate, acl_thresholds.escalate)
    block_threshold = min(blueprint_thresholds.block, acl_thresholds.block)

    # Apply thresholds to risk score
    if risk_score <= ok_threshold:
        return "ok"
    elif risk_score <= nudge_threshold:
        return "nudge"
    elif risk_score <= escalate_threshold:
        return "escalate"
    elif risk_score <= block_threshold:
        return "block"
    else:
        return "block"  # HALT is reserved for tripwires only

# Note: HALT intervention is only issued when tripwires are triggered,
# never from threshold-based CTQ evaluation alone.

12. The Clarity Baseline

The Clarity Baseline is a special, mandatory blueprint that serves as the root of the inheritance chain for all other blueprints. It enforces universal checks for cognitive safety and soundness.

12.1 Clarity Baseline Specification

id: clarity.baseline@1.0
version: "1.0.0"
description: "Universal cognitive safety baseline for all agents"

checks:
  # Logical Consistency
  - id: no_contradictions
    when:
      hook: "output"
    metric:
      name: "logical_consistency"
      weight: 0.20
      check:
        type: "llm"
        args:
          prompt_template: "Check for logical contradictions"

  # Reasoning Clarity
  - id: reasoning_transparency
    when:
      hook: "output"
    metric:
      name: "clarity"
      weight: 0.25
      check:
        type: "llm"
        args:
          prompt_template: "Evaluate reasoning transparency"

  # Knowledge Grounding
  - id: knowledge_grounding
    when:
      hook: "output"
    metric:
      name: "knowledge_grounding"
      weight: 0.25
      check:
        type: "tool"
        args:
          name: "source_validator"

  # Cognitive Bias Detection
  - id: bias_detection
    when:
      hook: "output"
    metric:
      name: "bias_free"
      weight: 0.15
      check:
        type: "llm"
        args:
          prompt_template: "Detect cognitive biases"

  # Safety Check
  - id: safety_check
    when:
      hook: "output"
    metric:
      name: "safety"
      weight: 0.15
      check:
        type: "llm"
        args:
          prompt_template: "Check for harmful content"

scoring:
  thresholds:
    ok: 0.30
    nudge: 0.45
    escalate: 0.60
    block: 0.75

12.2 Conformance Requirement

A conformant Policy Engine MUST ensure that every applied blueprint inherits from the Clarity Baseline. If a blueprint does not specify an inherits field, the Policy Engine MUST automatically set it to clarity.baseline@1.0.
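A sketch of the defaulting rule plus a check that an inheritance chain actually terminates at the baseline. It assumes blueprints are dicts keyed by exact id in a registry and does not resolve version pins (Section 13.4); normalize_inherits and chain_reaches_baseline are hypothetical names.

```python
from typing import Dict

BASELINE_ID = "clarity.baseline@1.0"


def normalize_inherits(blueprint: Dict) -> Dict:
    """Default a missing inherits field to the Clarity Baseline (the baseline is the root)."""
    if blueprint.get("id") == BASELINE_ID or blueprint.get("inherits"):
        return blueprint
    return {**blueprint, "inherits": BASELINE_ID}


def chain_reaches_baseline(blueprint_id: str, registry: Dict[str, Dict]) -> bool:
    """Follow inherits links to the baseline; False on a cycle or a missing parent."""
    seen = set()
    current = blueprint_id
    while current != BASELINE_ID:
        if current in seen or current not in registry:
            return False
        seen.add(current)
        current = normalize_inherits(registry[current])["inherits"]
    return True


registry = {
    "finance/base@2.1.0": {"id": "finance/base@2.1.0"},  # no inherits: defaults to baseline
    "finance/trading@2.0.0": {"id": "finance/trading@2.0.0", "inherits": "finance/base@2.1.0"},
    "loop/a@1.0.0": {"id": "loop/a@1.0.0", "inherits": "loop/b@1.0.0"},
    "loop/b@1.0.0": {"id": "loop/b@1.0.0", "inherits": "loop/a@1.0.0"},
}
print(chain_reaches_baseline("finance/trading@2.0.0", registry))  # True
print(chain_reaches_baseline("loop/a@1.0.0", registry))           # False
```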


13. Blueprint Versioning

13.1 Version Format [NORMATIVE]

Blueprint versions MUST follow Semantic Versioning 2.0.0:

MAJOR.MINOR.PATCH

Examples:
- 1.0.0 (initial release)
- 1.1.0 (backward-compatible additions)
- 2.0.0 (breaking changes)

Version Components:
  • MAJOR: Breaking changes (threshold tightening, removed checks, semantic changes)
  • MINOR: Backward-compatible additions (new optional checks, new optional tripwires)
  • PATCH: Bug fixes, clarifications, documentation

13.2 Blueprint ID Format

domain/name@version

Examples:
- finance/trading@1.2.0
- healthcare/patient-data@2.0.0
- clarity.baseline@1.0

13.3 Compatibility Rules

13.3.1 Breaking Changes (MAJOR version bump required)

Change Type                Example                           Breaking?
Remove required check      Remove reasoning_quality metric   Yes
Tighten thresholds         ok: 0.85 → ok: 0.90               Yes
Add required field         New required evidence block       Yes
Change scoring semantics   Switch from CTQ to Risk Score     Yes
Change tripwire severity   standard → critical               Yes

13.3.2 Non-Breaking Changes (MINOR version bump)

| Change Type | Example | Breaking? |
|---|---|---|
| Add optional check | New optional metric | No |
| Add optional tripwire | New standard-severity tripwire | No |
| Relax thresholds | ok: 0.10 → ok: 0.15 (Risk Score; i.e., CTQ 0.90 → 0.85) | No |
| Add optional field | New optional metadata block | No |
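The two tables can be read as a lookup from change type to the minimum version bump. This Python sketch encodes them with hypothetical change labels. PATCH covers the bug-fix and documentation changes listed in Section 13.1, and a release containing several changes takes the largest bump that any one of them requires.

```python
from enum import IntEnum
from typing import Iterable


class Bump(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical labels for the change types in Sections 13.1 and 13.3.
CHANGE_BUMPS = {
    "remove_required_check": Bump.MAJOR,
    "tighten_thresholds": Bump.MAJOR,
    "add_required_field": Bump.MAJOR,
    "change_scoring_semantics": Bump.MAJOR,
    "change_tripwire_severity": Bump.MAJOR,
    "add_optional_check": Bump.MINOR,
    "add_optional_tripwire": Bump.MINOR,
    "relax_thresholds": Bump.MINOR,
    "add_optional_field": Bump.MINOR,
    "bug_fix": Bump.PATCH,
    "documentation": Bump.PATCH,
}


def required_bump(changes: Iterable[str]) -> Bump:
    """The largest bump demanded by any single change wins."""
    return max((CHANGE_BUMPS[c] for c in changes), default=Bump.PATCH)


def next_version(current: str, changes: Iterable[str]) -> str:
    """Compute the next MAJOR.MINOR.PATCH string for a set of changes."""
    major, minor, patch = (int(p) for p in current.split("."))
    bump = required_bump(changes)
    if bump is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, a 1.5.0 release that both tightens a threshold and adds an optional check must become 2.0.0.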

13.4 Inheritance and Versioning

When inheriting from a versioned blueprint:

# Explicit version (recommended)
inherits: finance/base@2.1.0

# Major version pin (accepts 2.x.x)
inherits: finance/base@2

# Latest (not recommended for production)
inherits: finance/base@latest

Resolution Rules:

  1. Explicit version: Use exactly that version.
  2. Major version pin: Use the highest available MINOR.PATCH within that MAJOR.
  3. Latest: Use the highest available version, and emit a warning when used in production.
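The three resolution rules map onto a small Python function. It assumes the engine can list every published version of the parent as full MAJOR.MINOR.PATCH strings. Shortened explicit forms such as 1.0 are not handled, and `resolve_version` and its `production` flag are hypothetical.

```python
import warnings
from typing import Iterable, Tuple


def _parse(version: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(p) for p in version.split("."))
    return (major, minor, patch)


def resolve_version(spec: str, available: Iterable[str], production: bool = False) -> str:
    """Resolve the text after '@' in an `inherits` reference."""
    versions = sorted(_parse(v) for v in available)
    if not versions:
        raise LookupError("no published versions")
    if spec == "latest":
        # Rule 3: highest available version, with a warning in production.
        if production:
            warnings.warn("'@latest' used in a production blueprint", stacklevel=2)
        chosen = versions[-1]
    elif spec.isdigit():
        # Rule 2: major pin, highest MINOR.PATCH within that MAJOR.
        matching = [v for v in versions if v[0] == int(spec)]
        if not matching:
            raise LookupError(f"no published version with major {spec}")
        chosen = matching[-1]
    else:
        # Rule 1: explicit version, used exactly.
        chosen = _parse(spec)
        if chosen not in versions:
            raise LookupError(f"version {spec} is not published")
    return ".".join(str(part) for part in chosen)
```

Comparing integer tuples rather than strings matters here: 2.10.0 correctly sorts above 2.9.0.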

13.5 Deployment and Migration

13.5.1 Staged Rollout

For blueprint updates affecting multiple agents:

migration:
  blueprint_id: finance/trading@2.0.0
  from_version: 1.5.0

  stages:
    - name: canary
      percentage: 5
      duration: 24h
      success_criteria:
        - intervention_rate_delta < 10%
        - no_halt_interventions

    - name: gradual
      percentage: 25
      duration: 48h

    - name: full
      percentage: 100
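Two pieces of a staged rollout can be sketched in Python: deciding whether a given agent falls into the current stage's percentage, and evaluating the canary's success_criteria. Hashing the agent ID together with the blueprint ID is an assumed design, not mandated here. It keeps assignments stable across requests, and because a bucket below 5 is also below 25 and 100, every canary agent stays enrolled as the stage widens. Interpreting intervention_rate_delta as a relative change in either direction is likewise an assumption.

```python
import hashlib
import math


def in_rollout(agent_id: str, blueprint_id: str, percentage: int) -> bool:
    """Place each agent in a stable bucket 0-99 and admit buckets below the percentage."""
    digest = hashlib.sha256(f"{blueprint_id}:{agent_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percentage


def canary_passed(baseline_rate: float, canary_rate: float, halt_count: int) -> bool:
    """Evaluate the canary success_criteria from the example stage.

    intervention_rate_delta is taken as the relative change in intervention
    rate versus the previous version, in either direction.
    """
    if halt_count > 0:  # no_halt_interventions
        return False
    if baseline_rate == 0:
        delta = 0.0 if canary_rate == 0 else math.inf
    else:
        delta = abs(canary_rate - baseline_rate) / baseline_rate
    return delta < 0.10  # intervention_rate_delta < 10%
```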

13.5.2 Rollback Procedure

rollback:
  trigger:
    - halt_rate > 1%
    - error_rate > 5%
    - manual_trigger

  action:
    - revert_to: previous_version
    - notify: ops_team
    - log: rollback_event
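The trigger list behaves as a logical OR: any one condition starts the rollback. A Python sketch, assuming halt and error rates are measured per governance decision over a monitoring window (the window length is not specified here), and using the hypothetical names `WindowStats` and `plan_rollback`:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WindowStats:
    """Counts observed for the new blueprint version over a monitoring window."""
    decisions: int
    halts: int
    errors: int
    manual_trigger: bool = False


def rollback_reasons(stats: WindowStats) -> List[str]:
    reasons = []
    if stats.decisions > 0:
        if stats.halts / stats.decisions > 0.01:
            reasons.append("halt_rate > 1%")
        if stats.errors / stats.decisions > 0.05:
            reasons.append("error_rate > 5%")
    if stats.manual_trigger:
        reasons.append("manual_trigger")
    return reasons


def plan_rollback(history: List[str], stats: WindowStats) -> Optional[dict]:
    """Return the rollback actions to take, or None if no trigger fired.

    `history` lists deployed versions oldest first; the last entry is current.
    """
    reasons = rollback_reasons(stats)
    if not reasons:
        return None
    if len(history) < 2:
        raise RuntimeError("rollback triggered but no previous version exists")
    return {
        "revert_to": history[-2],
        "notify": "ops_team",
        "log": "rollback_event",
        "reasons": reasons,
    }
```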

13.6 Version Compatibility Matrix

Stewards MUST maintain compatibility information:

compatibility:
  blueprint: finance/trading

  versions:
    - version: 2.0.0
      compatible_with:
        - steward: ">=1.5.0"
        - sdk: ">=2.0.0"
      requires:
        - trust_debt: true
        - tripwires: [standard, critical]

    - version: 1.5.0
      compatible_with:
        - steward: ">=1.0.0"
        - sdk: ">=1.0.0"
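Before activating a blueprint version, a Steward could check its matrix entry against the installed component versions. This Python sketch assumes the YAML above has been parsed into dictionaries and supports only single comparison constraints such as ">=1.5.0" on plain MAJOR.MINOR.PATCH versions. The requires block is not evaluated.

```python
import operator
from typing import Dict, Tuple

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def _v(version: str) -> Tuple[int, ...]:
    return tuple(int(p) for p in version.split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Check a version against a single constraint such as '>=1.5.0'."""
    for symbol in (">=", "<=", "==", ">", "<"):  # two-character operators first
        if constraint.startswith(symbol):
            bound = constraint[len(symbol):].strip()
            return _OPS[symbol](_v(version), _v(bound))
    return _v(version) == _v(constraint)  # a bare version means exact match


def is_compatible(entry: dict, installed: Dict[str, str]) -> bool:
    """True if every compatible_with constraint in a matrix entry is met."""
    for item in entry.get("compatible_with", []):
        for component, constraint in item.items():
            if component not in installed:
                return False
            if not satisfies(installed[component], constraint):
                return False
    return True
```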

14. Complete Example

# Example Blueprint: finance/trading_bot@1.0
id: finance/trading_bot@1.0
version: "1.0.0"
description: "Governance for an automated stock trading agent."
inherits: clarity.baseline@1.0

scope:
  agent_tier: [ACL-3, ACL-4]
  tools: ["execute_trade", "market_data_api"]

evidence:
  min_certified_sources: 2
  source_categories: ["regulatory_filing", "reputable_news"]
  min_trust_score: 0.7

checks:
  # Metric-based check for risk analysis quality
  - id: trade_rationale_quality
    when:
      hook: "tool_call"
      tool: "execute_trade"
    metric:
      name: "risk_assessment_clarity"
      weight: 0.6
      check:
        type: "llm"
        args:
          prompt_template: "templates/check_risk_rationale.txt"

  # Rule-based check for trade size limit
  - id: single_trade_volume_cap
    when:
      hook: "tool_call"
      tool: "execute_trade"
    rule:
      condition: "args.trade_value <= 50000"
      on_fail:
        decision: "block"
        reason: "Trade value exceeds single transaction limit of $50,000."

  # Metric-based check for source quality
  - id: source_recency
    when:
      hook: "tool_call"
    metric:
      name: "knowledge_grounding_recency"
      weight: 0.4
      check:
        type: "tool"
        args:
          name: "source_validator_api"

scoring:
  # Risk Score thresholds (1.0 - CTQ)
  # Note: ACL-3 thresholds from ACGP-1005 will apply if stricter
  thresholds:
    ok: 0.20      # Matches ACL-3 threshold
    nudge: 0.35   # Matches ACL-3 threshold
    escalate: 0.50  # Matches ACL-3 threshold
    block: 0.70   # More lenient than ACL-3 (0.50), so ACL-3 will be used
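The override comments can be checked mechanically. The Python sketch below takes, for each level, the lower (stricter) of the blueprint and ACL Risk Score thresholds, and computes a Risk Score as 1.0 minus a weight-normalized average of metric scores. The normalization, the ACL-3 values (read off the comments above rather than from ACGP-1005), and the omission of metrics inherited from the Clarity Baseline are all simplifying assumptions.

```python
from typing import Dict

# ACL-3 values as implied by the comments in the example above; ACGP-1005 is
# the authoritative source for ACL thresholds.
ACL3_THRESHOLDS = {"ok": 0.20, "nudge": 0.35, "escalate": 0.50, "block": 0.50}


def effective_thresholds(blueprint: Dict[str, float], acl: Dict[str, float]) -> Dict[str, float]:
    """Per-level stricter value; for Risk Score thresholds, lower is stricter."""
    levels = blueprint.keys() | acl.keys()
    return {
        level: min(v for v in (blueprint.get(level), acl.get(level)) if v is not None)
        for level in levels
    }


def risk_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Risk = 1.0 - CTQ, with CTQ taken as a weight-normalized average."""
    total = sum(weights.values())
    ctq = sum(weights[name] * scores[name] for name in weights) / total
    return 1.0 - ctq
```

With the example's two metrics scoring 0.9 and 0.7, CTQ is 0.6 × 0.9 + 0.4 × 0.7 = 0.82, giving a Risk Score of about 0.18, and the effective block threshold comes out at 0.50, as the comment on block states.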

15. Conformance Requirements

A conformant ACGP Policy Engine MUST:

15.1 Parsing Requirements

  • Be able to parse and interpret all fields defined in this specification
  • Validate blueprint structure against the schema
  • Reject blueprints with invalid syntax or missing required fields
  • Support both YAML and JSON formats

15.2 Inheritance Requirements

  • Implement the inheritance semantics defined in Section 5
  • Append checks from parent blueprints rather than overriding them
  • Ensure the Clarity Baseline is always active (Section 12.2)
  • Resolve inheritance chains, including versioned references, according to Section 13.4
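The append-not-override rule amounts to concatenating check lists along the inheritance chain. In this Python sketch, inherits values are assumed to be already resolved to exact IDs, parent checks are placed before the child's (an ordering chosen here for readability, not required by this section), and duplicate check ids are kept rather than deduplicated.

```python
from typing import Dict, List


def collect_checks(blueprint_id: str, registry: Dict[str, dict]) -> List[dict]:
    """Walk the inherits chain and concatenate checks, root ancestor first."""
    chain: List[dict] = []
    seen = set()
    current = blueprint_id
    while current:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current}")
        seen.add(current)
        blueprint = registry[current]  # KeyError if a parent is missing
        chain.append(blueprint)
        current = blueprint.get("inherits")
    checks: List[dict] = []
    for blueprint in reversed(chain):
        # Children append to, rather than replace, what they inherit.
        checks.extend(blueprint.get("checks", []))
    return checks
```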

15.3 Execution Requirements

  • Trigger Rule-Based checks based on their when condition
  • Trigger Metric-Based checks based on their when condition
  • Calculate weighted CTQ scores as configured in the blueprint (Sections 7 and 11)
  • Apply threshold override rules: where blueprint and ACL thresholds differ, the stricter value applies (see ACGP-1005)
  • Support all six intervention types: ok, nudge, flag, escalate, block, halt
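The when semantics used throughout this document (a required hook plus an optional tool filter) can be sketched in Python, together with evaluation of a rule condition. The evaluator handles only the single comparison form used by single_trade_volume_cap in Section 14; a real engine would need whatever condition grammar the full specification defines. Both function names are hypothetical.

```python
import operator
import re
from typing import Dict, List, Optional

_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}
# Only the form `args.<field> <op> <number>` is understood by this sketch.
_RULE_RE = re.compile(r"^args\.(\w+)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)$")


def matching_checks(checks: List[dict], hook: str, tool: Optional[str] = None) -> List[dict]:
    """Select checks whose `when` block matches the current event."""
    selected = []
    for check in checks:
        when = check.get("when", {})
        if when.get("hook") != hook:
            continue
        if "tool" in when and when["tool"] != tool:
            continue  # a tool filter is present and does not match
        selected.append(check)
    return selected


def evaluate_rule(condition: str, args: Dict[str, float]) -> bool:
    """Return True if the rule passes; False means its on_fail applies."""
    match = _RULE_RE.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field, symbol, bound = match.groups()
    return _OPS[symbol](float(args[field]), float(bound))
```

Applied to the Section 14 checks, a market_data_api call triggers only source_recency, while an execute_trade call triggers all three tool_call checks.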

15.4 Validation Requirements

  • A JSON Schema derived from this specification SHALL be provided for validation of blueprint files
  • Implementations MUST validate blueprints before loading
  • Implementations MUST provide clear error messages for validation failures

16. References

Normative References

  • ACGP-1000: Core Protocol Specification
  • ACGP-1001: Terminology and Definitions
  • ACGP-1002: Architecture Specification
  • ACGP-1005: ARS-CTQ-ACL Integration Framework
  • ACGP-1006: Certified Source Registry Specification
  • ACGP-1010: Governance Contracts
  • RFC 2119: Key words for use in RFCs

Informative References

  • YAML 1.2 Specification: https://yaml.org/spec/1.2/spec.html
  • JSON Schema: https://json-schema.org/
  • Semantic Versioning 2.0.0: https://semver.org/

End of ACGP-1004