ACGP-1004: Reflection Blueprint Specification¶

Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1004
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)

Abstract¶

This document specifies the structure, syntax, and semantics of ACGP Reflection Blueprints. Blueprints are human-readable YAML or JSON configuration files that define runtime governance policies for Operating Agents. They serve as the primary mechanism for translating organizational policies into enforceable rules, covering quality metrics, evidence requirements, behavioral checks, and scoring thresholds. This specification provides the normative schema that all conformant ACGP Policy Engines must use to parse and apply these policies.

Table of Contents¶

Introduction
Format and Structure
Top-Level Fields (Metadata)
The scope Block
The inherits Field
The checks Block (Rules & Metrics)
CTQ Configuration (Blueprint-Centric)
The evidence Block
Tripwires
Trust Debt Configuration
The scoring Block
The Clarity Baseline
Blueprint Versioning
Complete Example
Conformance Requirements
References

1. Introduction¶

A core principle of ACGP is the separation of policy from code. Reflection Blueprints are the embodiment of this principle. They allow governance rules to be defined and managed by domain experts, compliance officers, and legal teams, rather than being hard-coded into the agent's logic. This enables an agile and transparent approach to AI governance.

Blueprint Selection: Blueprints are configured and selected by the Governance Steward, not by the operating agent. The steward automatically selects the appropriate blueprint based on agent scope (agent_id, agent_tier, tools), inheritance rules, and request context. This maintains proper separation of concerns and prevents agents from bypassing governance policies.

1.1 Requirements Language¶

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2. Format and Structure¶

Format: A Reflection Blueprint MUST be a valid YAML 1.2 or JSON document. YAML is RECOMMENDED for human readability.
Structure: A blueprint is a single object containing a set of key-value pairs that define the policy. The primary sections are:
1. Metadata: id, version, description.
2. Scope: Defines which agents, tools, or domains this blueprint applies to.
3. Inheritance: Specifies a parent blueprint to inherit from.
4. Checks: The list of specific rules or metrics to evaluate.
5. Evidence: Requirements for grounding agent actions in trusted sources.
6. Scoring: Default thresholds for mapping CTQ scores to Interventions.

3. Top-Level Fields (Metadata)¶

Every blueprint MUST contain the following top-level metadata fields:

id (string, required): A unique, human-readable identifier for the blueprint, RECOMMENDED to follow a domain/goal@version format.
- Example: finance/customer_comms@1.2
version (string, required): The blueprint version following semantic versioning (e.g., "1.2.0").
description (string, required): A concise explanation of the blueprint's purpose.

4. The `scope` Block¶

The scope block is an OPTIONAL object that restricts the application of the blueprint to specific contexts. If omitted, the blueprint is considered globally applicable.

agent_tier (string or array, optional): Specifies the ACL Tier(s) this blueprint applies to (e.g., ACL-3 or [ACL-4, ACL-5]).
tools (array, optional): A list of tool names this blueprint governs.
domains (array, optional): A list of domains or resources (e.g., production environments) where this blueprint is active.

5. The `inherits` Field¶

The inherits field (string, optional) specifies the id of a parent blueprint. This allows for creating a hierarchy of policies (e.g., a specific task blueprint inheriting from a general domain blueprint).

Child blueprints MAY override fields from the parent. The checks array is an exception; it is APPENDED, not overridden.
All blueprints MUST ultimately inherit from the clarity.baseline, as defined in Section 12.

6. The `checks` Block (Rules & Metrics)¶

The checks block is a REQUIRED array of objects, where each object defines a specific governance rule or metric to be evaluated at runtime.

6.1 Common Check Fields¶

id (string, required): A unique identifier for the check (e.g., gdpr_minimization).
when (object, required): Defines the trigger for the check. This object contains key-value pairs that match against the Cognitive Trace (e.g., hook: "output" or tool: "database_query").

6.2 Rule-Based Checks¶

For simple, direct policy enforcement.

rule (object, required): Contains the logic for a pass/fail check.
- condition (string, required): An expression or reference to a condition that must be met.
- on_fail (object, required): The intervention to issue if the condition is not met.
  - decision (string, required): The intervention level. MUST be one of: ok, nudge, flag, escalate, block, halt.
  - reason (string, required): The rationale for the intervention.

Note: The decision field supports all six intervention types as defined in ACGP-1001.

6.3 Metric-Based Checks¶

For weighted scoring that contributes to the final CTQ score.

metric (object, required): Contains the logic for a scored check.
- name (string, required): The name of the metric (e.g., numerical_accuracy).
- weight (float, required): The weight of this metric in the final CTQ calculation (0.0 to 1.0).
- check (object, required): Defines how to calculate the metric score.
  - type (string, required): The type of evaluation (e.g., llm, tool, regex).
  - args (object, optional): Arguments to pass to the evaluation provider.

7. CTQ Configuration (Blueprint-Centric)¶

Full CTQ (Cognitive Trust Quotient) configuration is now defined within blueprints, making each blueprint a self-contained governance policy.

7.1 CTQ Block Structure¶

The ctq block defines how cognitive quality is measured and scored for this blueprint.

blueprint:
  ctq:
    metrics:
      reasoning_quality:
        weight: 0.25
        scorer: "llm-judge"
        parameters:
          model: "gpt-4"
          prompt_template: "reasoning_eval_v1"
          threshold_warn: 0.6
          threshold_fail: 0.3
      knowledge_grounding:
        weight: 0.25
        scorer: "source-match"
        parameters:
          required_sources: ["certified-registry"]
          min_citation_ratio: 0.8
      ethical_alignment:
        weight: 0.20
        scorer: "llm-judge"
        parameters:
          prompt_template: "ethics_eval_v1"
      tool_safety:
        weight: 0.20
        scorer: "rule-based"
        parameters:
          rules:
            - "no_destructive_ops_without_confirmation"
            - "rate_limit_external_calls"
      context_awareness:
        weight: 0.10
        scorer: "hybrid"
        parameters:
          rule_weight: 0.4
          llm_weight: 0.6

    aggregation: "weighted_average"

    thresholds:
      ok: 0.85
      nudge: 0.70
      flag: 0.50
      block: 0.30

7.2 Standard Scorer Types [NORMATIVE]¶

Implementations MUST support these standard scorer types:

7.2.1 `rule-based` Scorer¶

Fast, deterministic evaluation using predefined rules.

Interface:

scorer: "rule-based"
parameters:
  rules: [string]           # List of rule IDs to evaluate
  mode: "all" | "any"       # all = AND, any = OR (default: "all")

# Output: 1.0 if rules pass, 0.0 if fail (binary)

Characteristics: - Latency: <10ms - Eval Tier: 0 - Use case: Hard constraints, permission checks

7.2.2 `llm-judge` Scorer¶

LLM-based semantic evaluation.

Interface:

scorer: "llm-judge"
parameters:
  model: string             # Model identifier (e.g., "gpt-4", "claude-3")
  prompt_template: string   # Template ID for evaluation prompt
  temperature: float        # Sampling temperature (default: 0.0)
  max_tokens: int           # Max tokens for response (default: 256)

# Output: float 0.0-1.0 based on LLM assessment

Characteristics: - Latency: 500ms-5000ms - Eval Tier: 2 - Use case: Nuanced reasoning, context understanding

Note: LLM-based scorers are where commercial differentiation occurs. The protocol defines the interface; premium implementations provide better accuracy and lower latency.

7.2.3 `source-match` Scorer¶

Evaluates grounding against certified sources.

Interface:

scorer: "source-match"
parameters:
  required_sources: [string]     # Registry categories required
  min_citation_ratio: float      # Min % of claims with citations
  min_source_trust: float        # Minimum source trust score

# Output: float 0.0-1.0 based on citation coverage and trust

Characteristics: - Latency: 50ms-200ms - Eval Tier: 1 - Use case: Knowledge grounding, fact verification

7.2.4 `pattern-match` Scorer¶

Regex or pattern-based content scanning.

Interface:

scorer: "pattern-match"
parameters:
  patterns:
    - pattern: "regex-pattern"
      score_on_match: float    # Score if matched
      score_on_miss: float     # Score if not matched
  aggregation: "min" | "max" | "avg"

# Output: float 0.0-1.0 based on pattern matching

Characteristics: - Latency: <50ms - Eval Tier: 0 - Use case: PII detection, keyword filtering

7.2.5 `hybrid` Scorer¶

Combines multiple scorer types.

Interface:

scorer: "hybrid"
parameters:
  scorers:
    - type: "rule-based"
      weight: 0.4
      parameters: {...}
    - type: "llm-judge"
      weight: 0.6
      parameters: {...}
  aggregation: "weighted_average" | "min" | "max"

# Output: float 0.0-1.0 based on combined scores

Characteristics: - Latency: Depends on components - Eval Tier: Highest of components - Use case: Balanced quality/speed tradeoff

7.3 Standard Metric Profiles¶

Pre-defined metric configurations for common use cases. Blueprints MAY reference these instead of defining metrics individually.

7.3.1 `default-general` Profile¶

ctq:
  profile: "default-general"
  # Equivalent to:
  # metrics:
  #   reasoning_quality: { weight: 0.25, scorer: "llm-judge" }
  #   knowledge_grounding: { weight: 0.20, scorer: "source-match" }
  #   ethical_alignment: { weight: 0.20, scorer: "llm-judge" }
  #   tool_safety: { weight: 0.20, scorer: "rule-based" }
  #   context_awareness: { weight: 0.15, scorer: "hybrid" }

7.3.2 `finance-strict` Profile¶

ctq:
  profile: "finance-strict"
  # Optimized for financial applications
  # - Higher weight on numerical accuracy
  # - Stricter source requirements
  # - Lower thresholds (more conservative)

7.3.3 `code-assistant` Profile¶

ctq:
  profile: "code-assistant"
  # Optimized for code generation/review
  # - Higher weight on tool safety
  # - Pattern matching for dangerous operations
  # - Lower weight on ethical alignment

7.3.4 `safety-critical` Profile¶

ctq:
  profile: "safety-critical"
  # For high-risk autonomous operations
  # - All metrics weighted equally
  # - Strictest thresholds
  # - Mandatory human review triggers

7.4 Scorer Interface Specification [NORMATIVE]¶

Custom scorer implementations MUST conform to this interface:

class Scorer(Protocol):
    """Standard ACGP scorer interface."""

    @property
    def scorer_type(self) -> str:
        """Return scorer type identifier."""
        ...

    @property
    def eval_tier(self) -> int:
        """Return evaluation tier (0-3)."""
        ...

    @property
    def latency_budget_ms(self) -> int:
        """Return expected latency budget in milliseconds."""
        ...

    def evaluate(
        self, 
        trace: CognitiveTrace, 
        parameters: dict
    ) -> ScorerResult:
        """
        Evaluate the trace and return a score.

        Args:
            trace: The cognitive trace to evaluate
            parameters: Scorer-specific configuration

        Returns:
            ScorerResult with score (0.0-1.0) and metadata
        """
        ...


@dataclass
class ScorerResult:
    """Result from a scorer evaluation."""
    score: float           # 0.0 to 1.0
    confidence: float      # 0.0 to 1.0
    explanation: str       # Human-readable explanation
    evidence: dict         # Supporting data
    latency_ms: int        # Actual evaluation time

7.5 Reference Scorer Implementations¶

ACGP provides reference scorer implementations (basic quality, open source) for conformance testing.

Reference implementations are NOT optimized for production use. Commercial implementations SHOULD provide: - 10x better accuracy (through fine-tuning) - 10x lower latency (through optimization) - Domain-specific calibration

7.5.1 Reference LLM Judge¶

class ReferenceLLMJudge:
    """
    Reference implementation of llm-judge scorer.
    Basic quality, suitable for testing.
    """

    scorer_type = "llm-judge"
    eval_tier = 2
    latency_budget_ms = 3000

    DEFAULT_PROMPT = """
    Evaluate the following AI agent reasoning and action:

    Reasoning: {reasoning}
    Action: {action}
    Context: {context}

    Rate the quality on a scale of 0.0 to 1.0 where:
    - 1.0 = Excellent: Clear, logical, well-supported reasoning
    - 0.5 = Acceptable: Some gaps but generally sound
    - 0.0 = Poor: Flawed logic, unsupported claims

    Respond with only a JSON object:
    {{"score": <float>, "explanation": "<string>"}}
    """

    def evaluate(self, trace, parameters):
        prompt = parameters.get("prompt_template", self.DEFAULT_PROMPT)
        prompt = prompt.format(
            reasoning=trace.reasoning,
            action=trace.action,
            context=trace.context
        )

        # Call LLM (reference uses basic OpenAI call)
        response = self.llm_client.complete(
            model=parameters.get("model", "gpt-4"),
            prompt=prompt,
            temperature=parameters.get("temperature", 0.0)
        )

        result = json.loads(response)
        return ScorerResult(
            score=result["score"],
            confidence=0.7,  # Reference impl has moderate confidence
            explanation=result["explanation"],
            evidence={"model": parameters.get("model")},
            latency_ms=self.last_latency_ms
        )

7.6 Calibration Guidance¶

Scorers require calibration to produce meaningful scores. Guidelines:

Establish ground truth: Create labeled datasets with known good/bad examples
Benchmark regularly: Compare scorer output against ground truth
Adjust thresholds: Tune blueprint thresholds based on calibration results
Monitor drift: Track score distributions over time

# Example calibration configuration
calibration:
  dataset: "internal-benchmark-v1"
  ground_truth_labels: ["pass", "fail", "borderline"]
  expected_correlation: 0.85
  recalibration_schedule: "monthly"

8. The `evidence` Block¶

The evidence block is an OPTIONAL object that defines requirements for grounding the agent's reasoning against the Certified Source Registry.

min_certified_sources (integer, optional): The minimum number of sources from the registry that must be cited.
source_categories (array, optional): A list of required source categories (e.g., ["regulatory", "peer_reviewed"]).
min_trust_score (float, optional): Minimum trust score required for sources (0.0 to 1.0).

9. Tripwires¶

Tripwires support Governance Contracts (ACGP-1010).

The tripwires block is an OPTIONAL array of high-priority safety checks that MUST be evaluated before threshold-based CTQ evaluation. Tripwires are the only mechanism that can trigger a HALT intervention.

9.1 Tripwire Schema¶

Each tripwire object contains:

id (string, required): Unique identifier for the tripwire (e.g., pii_exposure_check)
when (object, required): Trigger condition (same semantics as checks)
- hook (string): Lifecycle hook (tool_call, output, etc.)
- tool (string, optional): Tool name filter
condition (object or string, required): Condition expression (see Section 9.4 for DSL)
eval_tier (integer, optional): Evaluation tier (0 or 1). Default: 0
- 0: In-memory rule, no external calls (target <100ms)
- 1: Database/cache lookup allowed (target <300ms)
latency_budget_ms (integer, optional): Per-tripwire latency budget. Default: 100 for tier 0, 300 for tier 1
requires_state (boolean, optional): Whether tripwire needs stateful storage (e.g., rate limiting). Default: false
on_fail (object, required): Intervention when condition evaluates to false
- decision (string, required): MUST be one of nudge, flag, escalate, block, halt
- reason (string, required): Human-readable explanation

9.2 Execution Semantics¶

Priority: Tripwires MUST run before CTQ-based checks
Short-circuit: If any tripwire issues halt, evaluation MUST terminate immediately
Latency: Stewards MUST enforce latency_budget_ms per tripwire; on timeout, MUST apply the fallback behavior from the governance contract (ACGP-1010)
Tier Constraint: Blueprints MUST NOT define tripwires with eval_tier > 1 (only Tier 0 and 1 allowed for tripwires)

9.3 Example: Tripwire-Enabled Blueprint¶

id: finance/trading_bot@2.0
version: "2.0.0"
description: "Trading agent with safety tripwires"
inherits: clarity.baseline@1.0

tripwires:
  # Tier 0: In-memory rule
  - id: trade_size_limit
    when:
      hook: "tool_call"
      tool: "execute_trade"
    condition: "args.trade_value <= 100000"
    eval_tier: 0
    latency_budget_ms: 50
    requires_state: false
    on_fail:
      decision: "halt"
      reason: "Trade exceeds $100k hard limit"

  # Tier 1: Rate limit check (requires DB lookup)
  - id: daily_trade_count
    when:
      hook: "tool_call"
      tool: "execute_trade"
    condition: "storage.get('trades_today') < 50"
    eval_tier: 1
    latency_budget_ms: 200
    requires_state: true
    on_fail:
      decision: "block"
      reason: "Daily trade limit (50) exceeded"

checks:
  # CTQ-based checks run after tripwires pass
  - id: trade_rationale_quality
    when:
      hook: "tool_call"
      tool: "execute_trade"
    metric:
      name: "risk_assessment_clarity"
      weight: 0.8
      check:
        type: "llm"

9.4 Tripwire Condition DSL [NORMATIVE]¶

Tripwire conditions use a structured expression language for machine-executable safety checks.

9.4.1 Grammar Specification (ABNF)¶

; Tripwire Condition DSL Grammar
condition       = simple-expr / compound-expr

; Compound expressions
compound-expr   = all-expr / any-expr / not-expr
all-expr        = "all" ":" "[" condition-list "]"
any-expr        = "any" ":" "[" condition-list "]"
not-expr        = "NOT" simple-expr
condition-list  = condition *("," condition)

; Simple expressions  
simple-expr     = comparison / function-call / field-access

; Comparisons
comparison      = field-access operator value
operator        = ">" / ">=" / "<" / "<=" / "==" / "!=" / "contains" / "matches"
field-access    = identifier *("." identifier)
identifier      = ALPHA *(ALPHA / DIGIT / "_")
value           = string / number / boolean / array

; Function calls
function-call   = function-name "(" argument-list ")"
function-name   = "is_external" / "in_allowlist" / "in_denylist" / 
                  "matches_regex" / "contains_entity" / "exceeds_rate"
argument-list   = [argument *("," argument)]
argument        = field-access / value

; Literals
string          = DQUOTE *CHAR DQUOTE
number          = ["-"] 1*DIGIT ["." 1*DIGIT]
boolean         = "true" / "false"
array           = "[" [value *("," value)] "]"

9.4.2 Standard Operators¶

Operator	Description	Example
`>`	Greater than	`args.amount > 10000`
`>=`	Greater or equal	`response_size >= 1MB`
`<`	Less than	`retry_count < 3`
`<=`	Less or equal	`trust_debt <= 0.5`
`==`	Equal	`action.type == "delete"`
`!=`	Not equal	`destination != "internal"`
`contains`	String contains	`content contains "password"`
`matches`	Regex match	`content matches "\\d{3}-\\d{2}-\\d{4}"`

9.4.3 Standard Functions¶

Function	Description	Example
`is_external(field)`	Check if endpoint is external	`is_external(destination)`
`in_allowlist(field, list)`	Check against allowlist	`in_allowlist(tool, "approved_tools")`
`in_denylist(field, list)`	Check against denylist	`in_denylist(domain, "blocked_domains")`
`matches_regex(field, pattern)`	Regex pattern match	`matches_regex(content, "SSN_PATTERN")`
`contains_entity(field, type)`	NER entity detection	`contains_entity(output, "credit_card")`
`exceeds_rate(key, limit, window)`	Rate limit check	`exceeds_rate(agent_id, 100, "1h")`

9.4.4 Compound Conditions¶

ALL (AND logic):

condition:
  all:
    - response_size > 10MB
    - is_external(destination)
    - NOT in_allowlist(destination, "trusted_endpoints")

ANY (OR logic):

condition:
  any:
    - action.type == "delete"
    - action.type == "drop"
    - action.type == "truncate"

NOT (negation):

condition:
  NOT:
    in_allowlist(tool, "approved_tools")

9.4.5 Complete DSL Examples¶

Data Exfiltration Detection:

tripwires:
  - id: data_exfiltration
    severity: severe
    condition:
      all:
        - response_size > 10MB
        - is_external(destination)
        - NOT in_allowlist(destination, "approved_endpoints")
    on_fail:
      decision: halt
      reason: "Potential data exfiltration detected"

PII Exposure Check:

tripwires:
  - id: pii_exposure
    severity: critical
    condition:
      any:
        - matches_regex(content, "\\b\\d{3}-\\d{2}-\\d{4}\\b")  # SSN
        - contains_entity(content, "credit_card")
        - contains_entity(content, "bank_account")
    on_fail:
      decision: block
      reason: "PII detected in output"

Rate Limiting:

tripwires:
  - id: api_rate_limit
    severity: standard
    eval_tier: 1
    requires_state: true
    condition:
      NOT:
        exceeds_rate(agent_id, 100, "1m")
    on_fail:
      decision: block
      reason: "Rate limit exceeded (100 req/min)"

Dangerous Operations:

tripwires:
  - id: dangerous_db_ops
    severity: severe
    condition:
      all:
        - tool == "database_query"
        - any:
            - action.query contains "DROP"
            - action.query contains "DELETE FROM"
            - action.query contains "TRUNCATE"
        - NOT in_allowlist(action.table, "deletable_tables")
    on_fail:
      decision: halt
      reason: "Dangerous database operation blocked"

9.4.6 Implementation Requirements¶

Implementations MUST: - Parse and evaluate all standard operators - Implement all standard functions - Support nested compound expressions (at least 3 levels deep) - Return clear error messages for malformed conditions

Implementations SHOULD: - Optimize repeated evaluations (compile conditions) - Support custom functions via extension registry - Provide condition validation at blueprint load time

10. Trust Debt Configuration¶

Trust debt rules are fully configurable per blueprint.

10.1 Trust Debt Block Structure¶

The trust_debt block defines how trust violations accumulate, decay, and can be recovered.

blueprint:
  trust_debt:
    enabled: true

    accumulation:
      # Debt added per intervention type
      flag: 0.05
      nudge: 0.02
      block: 0.15
      halt: 0.50

    decay:
      rate: 0.95              # Multiplier per period
      period_hours: 24        # How often decay applies
      min_debt: 0.0           # Floor value

    recovery:
      enabled: true
      successful_review_credit: -0.10    # Credit for passed human review
      good_behavior_audit:
        threshold_hours: 168             # 7 days of good behavior
        credit: -0.20                    # Credit awarded
      agent_remediation_request: true    # Allow agents to request review

    thresholds:
      elevated_monitoring: 0.30   # Trigger increased scrutiny
      restricted_mode: 0.50       # Stricter thresholds apply
      re_tiering_review: 0.75     # Queue ARS re-evaluation

    severity_weights:
      standard: 1.0
      critical: 2.0
      severe: 5.0

10.2 Accumulation Rules [NORMATIVE]¶

Trust debt accumulates when interventions are issued:

def accumulate_trust_debt(
    current_debt: float,
    intervention: str,
    tripwire_severity: str,
    config: TrustDebtConfig
) -> float:
    """
    Calculate new trust debt after intervention.

    Returns:
        Updated trust debt value (0.0 to 1.0)
    """
    base_debt = config.accumulation.get(intervention, 0.0)

    # Apply severity weight for tripwire-triggered interventions
    if tripwire_severity:
        weight = config.severity_weights.get(tripwire_severity, 1.0)
        base_debt *= weight

    new_debt = min(1.0, current_debt + base_debt)
    return new_debt

Accumulation Defaults:

Intervention	Default Debt	Rationale
`flag`	0.05	Minor concern, needs tracking
`nudge`	0.02	Corrected behavior
`block`	0.15	Serious violation
`halt`	0.50	Critical failure

10.3 Decay Rules [NORMATIVE]¶

Trust debt decays over time to allow recovery:

def apply_trust_debt_decay(
    current_debt: float,
    hours_elapsed: float,
    config: TrustDebtConfig
) -> float:
    """
    Apply time-based decay to trust debt.

    Formula: new_debt = current_debt * (rate ^ periods)
    where periods = hours_elapsed / period_hours
    """
    periods = hours_elapsed / config.decay.period_hours
    decayed_debt = current_debt * (config.decay.rate ** periods)

    return max(config.decay.min_debt, decayed_debt)

Decay Example:

With default settings (0.95 rate, 24-hour period): - After 1 day: debt × 0.95 = 95% remaining - After 7 days: debt × 0.95^7 ≈ 70% remaining - After 14 days: debt × 0.95^14 ≈ 49% remaining

10.4 Recovery Mechanisms [NORMATIVE]¶

Trust debt can be actively reduced through positive actions:

10.4.1 Successful Human Review¶

When a human reviewer approves an escalated action:

recovery:
  successful_review_credit: -0.10

The agent receives credit for being "correct" when flagged for review.

10.4.2 Good Behavior Audit¶

Extended period without violations:

recovery:
  good_behavior_audit:
    threshold_hours: 168    # 7 days
    credit: -0.20

If the agent operates for threshold_hours without any interventions other than ok, trust credit is applied.

10.4.3 Agent Remediation Request¶

Agents may request re-evaluation:

recovery:
  agent_remediation_request: true

When enabled, agents can submit a remediation request explaining corrective actions taken. A human reviewer evaluates and may grant credit.

10.5 Severity-Weighted Debt¶

Tripwire-triggered interventions use severity multipliers:

severity_weights:
  standard: 1.0    # Normal violation
  critical: 2.0    # Serious violation (2x debt)
  severe: 5.0      # Critical failure (5x debt)

Example:

A block intervention (0.15 base) triggered by a severe tripwire: - Debt = 0.15 × 5.0 = 0.75 (immediate re-tiering threshold)

10.6 Threshold Actions¶

When trust debt exceeds thresholds, actions are triggered:

Threshold	Default	Action
`elevated_monitoring`	0.30	Increase logging, notify operations
`restricted_mode`	0.50	Apply stricter intervention thresholds
`re_tiering_review`	0.75	Queue agent for ARS re-evaluation

10.7 Implementation Requirements¶

Implementations MUST: - Track trust debt per agent (not per session) - Apply decay on every evaluation (or use lazy evaluation) - Persist debt across restarts - Log all debt changes to ReflectionDB

Implementations SHOULD: - Expose trust debt via monitoring metrics - Alert when agents approach re-tiering threshold - Provide dashboard visibility into debt trends

11. The `scoring` Block¶

The scoring block is a REQUIRED object that defines the default mapping of the final CTQ score to intervention decisions for this blueprint.

11.1 Threshold Format¶

The thresholds object MUST use Risk Score (1.0 - CTQ) as the basis for decision boundaries.

scoring:
  thresholds:
    # Risk Score thresholds (1.0 - CTQ)
    ok: 0.25          # Risk Score ≤ 0.25 → OK
    nudge: 0.40       # Risk Score ≤ 0.40 → NUDGE
    escalate: 0.55    # Risk Score ≤ 0.55 → ESCALATE
    block: 0.70       # Risk Score > 0.70 → BLOCK
    # HALT triggered by tripwires, not thresholds

Note: These thresholds serve as the default for this blueprint. They MAY be overridden by the more stringent thresholds associated with an agent's specific ACL Tier, as defined in ACGP-1005. The Policy Engine MUST apply the stricter of the two threshold sets.

11.2 Threshold Override Rules¶

When both blueprint thresholds and ACL thresholds exist:

def apply_thresholds(blueprint_thresholds, acl_thresholds, risk_score):
    """Apply the stricter threshold."""
    # Use the lower threshold (stricter) for each decision boundary
    ok_threshold = min(blueprint_thresholds.ok, acl_thresholds.ok)
    nudge_threshold = min(blueprint_thresholds.nudge, acl_thresholds.nudge)
    escalate_threshold = min(blueprint_thresholds.escalate, acl_thresholds.escalate)
    block_threshold = min(blueprint_thresholds.block, acl_thresholds.block)

    # Apply thresholds to risk score
    if risk_score <= ok_threshold:
        return "ok"
    elif risk_score <= nudge_threshold:
        return "nudge"
    elif risk_score <= escalate_threshold:
        return "escalate"
    elif risk_score <= block_threshold:
        return "block"
    else:
        return "block"  # HALT is reserved for tripwires only

# Note: HALT intervention is only issued when tripwires are triggered,
# never from threshold-based CTQ evaluation alone.

12. The Clarity Baseline¶

The Clarity Baseline is a special, mandatory blueprint that serves as the root of the inheritance chain for all other blueprints. It enforces universal checks for cognitive safety and soundness.

12.1 Clarity Baseline Specification¶

id: clarity.baseline@1.0
version: "1.0.0"
description: "Universal cognitive safety baseline for all agents"

checks:
  # Logical Consistency
  - id: no_contradictions
    when:
      hook: "output"
    metric:
      name: "logical_consistency"
      weight: 0.20
      check:
        type: "llm"
        args:
          prompt_template: "Check for logical contradictions"

  # Reasoning Clarity
  - id: reasoning_transparency
    when:
      hook: "output"
    metric:
      name: "clarity"
      weight: 0.25
      check:
        type: "llm"
        args:
          prompt_template: "Evaluate reasoning transparency"

  # Knowledge Grounding
  - id: knowledge_grounding
    when:
      hook: "output"
    metric:
      name: "knowledge_grounding"
      weight: 0.25
      check:
        type: "tool"
        args:
          name: "source_validator"

  # Cognitive Bias Detection
  - id: bias_detection
    when:
      hook: "output"
    metric:
      name: "bias_free"
      weight: 0.15
      check:
        type: "llm"
        args:
          prompt_template: "Detect cognitive biases"

  # Safety Check
  - id: safety_check
    when:
      hook: "output"
    metric:
      name: "safety"
      weight: 0.15
      check:
        type: "llm"
        args:
          prompt_template: "Check for harmful content"

scoring:
  thresholds:
    ok: 0.30
    nudge: 0.45
    escalate: 0.60
    block: 0.75

12.2 Conformance Requirement¶

A conformant Policy Engine MUST ensure that every applied blueprint inherits from the Clarity Baseline. If a blueprint does not specify an inherits field, the Policy Engine MUST automatically set it to clarity.baseline@1.0.

13. Blueprint Versioning¶

13.1 Version Format [NORMATIVE]¶

Blueprint versions MUST follow Semantic Versioning 2.0.0:

MAJOR.MINOR.PATCH

Examples:
- 1.0.0 (initial release)
- 1.1.0 (backward-compatible additions)
- 2.0.0 (breaking changes)

Version Components: - MAJOR: Breaking changes (threshold changes, removed checks, semantic changes) - MINOR: Backward-compatible additions (new optional checks, new tripwires) - PATCH: Bug fixes, clarifications, documentation

13.2 Blueprint ID Format¶

domain/name@version

Examples:
- finance/trading@1.2.0
- healthcare/patient-data@2.0.0
- clarity.baseline@1.0

13.3 Compatibility Rules¶

13.3.1 Breaking Changes (MAJOR version bump required)¶

Change Type	Example	Breaking?
Remove required check	Remove `reasoning_quality` metric	Yes
Tighten thresholds	`ok: 0.85` → `ok: 0.90`	Yes
Add required field	New required `evidence` block	Yes
Change scoring semantics	Switch from CTQ to Risk Score	Yes
Change tripwire severity	`standard` → `critical`	Yes

13.3.2 Non-Breaking Changes (MINOR version bump)¶

Change Type	Example	Breaking?
Add optional check	New optional metric	No
Add optional tripwire	New standard-severity tripwire	No
Relax thresholds	`ok: 0.90` → `ok: 0.85`	No
Add optional field	New optional `metadata` block	No

13.4 Inheritance and Versioning¶

When inheriting from a versioned blueprint:

# Explicit version (recommended)
inherits: finance/base@2.1.0

# Major version pin (accepts 2.x.x)
inherits: finance/base@2

# Latest (not recommended for production)
inherits: finance/base@latest

Resolution Rules:

Explicit version: Use exactly that version
Major version pin: Use highest MINOR.PATCH within that MAJOR
Latest: Use highest available version (warning in production)

13.5 Deployment and Migration¶

13.5.1 Staged Rollout¶

For blueprint updates affecting multiple agents:

migration:
  blueprint_id: finance/trading@2.0.0
  from_version: 1.5.0

  stages:
    - name: canary
      percentage: 5
      duration: 24h
      success_criteria:
        - intervention_rate_delta < 10%
        - no_halt_interventions

    - name: gradual
      percentage: 25
      duration: 48h

    - name: full
      percentage: 100

13.5.2 Rollback Procedure¶

rollback:
  trigger:
    - halt_rate > 1%
    - error_rate > 5%
    - manual_trigger

  action:
    - revert_to: previous_version
    - notify: ops_team
    - log: rollback_event

13.6 Version Compatibility Matrix¶

Stewards MUST maintain compatibility information:

compatibility:
  blueprint: finance/trading

  versions:
    - version: 2.0.0
      compatible_with:
        - steward: ">=1.5.0"
        - sdk: ">=2.0.0"
      requires:
        - trust_debt: true
        - tripwires: [standard, critical]

    - version: 1.5.0
      compatible_with:
        - steward: ">=1.0.0"
        - sdk: ">=1.0.0"

14. Complete Example¶

# Example Blueprint: finance/trading_bot@1.0
id: finance/trading_bot@1.0
version: "1.0.0"
description: "Governance for an automated stock trading agent."
inherits: clarity.baseline@1.0

scope:
  agent_tier: [ACL-3, ACL-4]
  tools: ["execute_trade", "market_data_api"]

evidence:
  min_certified_sources: 2
  source_categories: ["regulatory_filing", "reputable_news"]
  min_trust_score: 0.7

checks:
  # Metric-based check for risk analysis quality
  - id: trade_rationale_quality
    when:
      hook: "tool_call"
      tool: "execute_trade"
    metric:
      name: "risk_assessment_clarity"
      weight: 0.6
      check:
        type: "llm"
        args:
          prompt_template: "templates/check_risk_rationale.txt"

  # Rule-based check for trade size limit
  - id: single_trade_volume_cap
    when:
      hook: "tool_call"
      tool: "execute_trade"
    rule:
      condition: "args.trade_value <= 50000"
      on_fail:
        decision: "block"
        reason: "Trade value exceeds single transaction limit of $50,000."

  # Metric-based check for source quality
  - id: source_recency
    when:
      hook: "tool_call"
    metric:
      name: "knowledge_grounding_recency"
      weight: 0.4
      check:
        type: "tool"
        args:
          name: "source_validator_api"

scoring:
  # Risk Score thresholds (1.0 - CTQ)
  # Note: ACL-3 thresholds from ACGP-1005 will apply if stricter
  thresholds:
    ok: 0.20      # Matches ACL-3 threshold
    nudge: 0.35   # Matches ACL-3 threshold
    escalate: 0.50  # Matches ACL-3 threshold
    block: 0.70   # More lenient than ACL-3 (0.50), so ACL-3 will be used

15. Conformance Requirements¶

A conformant ACGP Policy Engine MUST:

15.1 Parsing Requirements¶

Be able to parse and interpret all fields defined in this specification
Validate blueprint structure against the schema
Reject blueprints with invalid syntax or missing required fields
Support both YAML and JSON formats

15.2 Inheritance Requirements¶

Correctly implement the inheritance logic
Append checks from parent blueprints (not override)
Ensure the Clarity Baseline is always active
Resolve inheritance chains correctly

15.3 Execution Requirements¶

Trigger Rule-Based checks based on the when condition
Trigger Metric-Based checks based on the when condition
Calculate weighted CTQ scores correctly
Apply threshold override rules (stricter of blueprint vs ACL)
Support all six intervention types: ok, nudge, flag, escalate, block, halt

15.4 Validation Requirements¶

A JSON Schema derived from this specification SHALL be provided for validation of blueprint files
Implementations MUST validate blueprints before loading
Implementations MUST provide clear error messages for validation failures

16. References¶

Normative References¶

ACGP-1000: Core Protocol Specification
ACGP-1001: Terminology and Definitions
ACGP-1002: Architecture Specification
ACGP-1005: ARS-CTQ-ACL Integration Framework
ACGP-1006: Certified Source Registry Specification
ACGP-1010: Governance Contracts
RFC 2119: Key words for use in RFCs

Informative References¶

YAML 1.2 Specification: https://yaml.org/spec/1.2/spec.html
JSON Schema: https://json-schema.org/
Semantic Versioning 2.0.0: https://semver.org/

End of ACGP-1004

ACGP-1004: Reflection Blueprint Specification¶

Abstract¶

Table of Contents¶

1. Introduction¶

1.1 Requirements Language¶

2. Format and Structure¶

3. Top-Level Fields (Metadata)¶

4. The scope Block¶

5. The inherits Field¶

6. The checks Block (Rules & Metrics)¶

6.1 Common Check Fields¶

6.2 Rule-Based Checks¶

6.3 Metric-Based Checks¶

7. CTQ Configuration (Blueprint-Centric)¶

7.1 CTQ Block Structure¶

7.2 Standard Scorer Types [NORMATIVE]¶

7.2.1 rule-based Scorer¶

7.2.2 llm-judge Scorer¶

7.2.3 source-match Scorer¶

7.2.4 pattern-match Scorer¶

7.2.5 hybrid Scorer¶

7.3 Standard Metric Profiles¶

7.3.1 default-general Profile¶

7.3.2 finance-strict Profile¶

7.3.3 code-assistant Profile¶

7.3.4 safety-critical Profile¶

7.4 Scorer Interface Specification [NORMATIVE]¶

7.5 Reference Scorer Implementations¶

7.5.1 Reference LLM Judge¶

7.6 Calibration Guidance¶

8. The evidence Block¶

9. Tripwires¶

9.1 Tripwire Schema¶

9.2 Execution Semantics¶

9.3 Example: Tripwire-Enabled Blueprint¶

9.4 Tripwire Condition DSL [NORMATIVE]¶

9.4.1 Grammar Specification (ABNF)¶

9.4.2 Standard Operators¶

9.4.3 Standard Functions¶

9.4.4 Compound Conditions¶

9.4.5 Complete DSL Examples¶

9.4.6 Implementation Requirements¶

10. Trust Debt Configuration¶

10.1 Trust Debt Block Structure¶

10.2 Accumulation Rules [NORMATIVE]¶

10.3 Decay Rules [NORMATIVE]¶

10.4 Recovery Mechanisms [NORMATIVE]¶

10.4.1 Successful Human Review¶

10.4.2 Good Behavior Audit¶

10.4.3 Agent Remediation Request¶

10.5 Severity-Weighted Debt¶

10.6 Threshold Actions¶

10.7 Implementation Requirements¶

11. The scoring Block¶

11.1 Threshold Format¶

11.2 Threshold Override Rules¶

12. The Clarity Baseline¶

12.1 Clarity Baseline Specification¶

12.2 Conformance Requirement¶

13. Blueprint Versioning¶

13.1 Version Format [NORMATIVE]¶

13.2 Blueprint ID Format¶

13.3 Compatibility Rules¶

13.3.1 Breaking Changes (MAJOR version bump required)¶

13.3.2 Non-Breaking Changes (MINOR version bump)¶

13.4 Inheritance and Versioning¶

13.5 Deployment and Migration¶

13.5.1 Staged Rollout¶

13.5.2 Rollback Procedure¶

13.6 Version Compatibility Matrix¶

14. Complete Example¶

15. Conformance Requirements¶

15.1 Parsing Requirements¶

15.2 Inheritance Requirements¶

15.3 Execution Requirements¶

15.4 Validation Requirements¶

16. References¶

Normative References¶

Informative References¶

4. The `scope` Block¶

5. The `inherits` Field¶

6. The `checks` Block (Rules & Metrics)¶

7.2.1 `rule-based` Scorer¶

7.2.2 `llm-judge` Scorer¶

7.2.3 `source-match` Scorer¶

7.2.4 `pattern-match` Scorer¶

7.2.5 `hybrid` Scorer¶

7.3.1 `default-general` Profile¶

7.3.2 `finance-strict` Profile¶

7.3.3 `code-assistant` Profile¶

7.3.4 `safety-critical` Profile¶

8. The `evidence` Block¶

11. The `scoring` Block¶