ACGP-1004: Reflection Blueprint Specification¶
Status: Draft
Last Updated: 2026-01-08
Spec ID: ACGP-1004
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119)
Abstract¶
This document specifies the structure, syntax, and semantics of ACGP Reflection Blueprints. Blueprints are human-readable YAML or JSON configuration files that define runtime governance policies for Operating Agents. They serve as the primary mechanism for translating organizational policies into enforceable rules, covering quality metrics, evidence requirements, behavioral checks, and scoring thresholds. This specification provides the normative schema that all conformant ACGP Policy Engines must use to parse and apply these policies.
Table of Contents¶
- Introduction
- Format and Structure
- Top-Level Fields (Metadata)
- The
scopeBlock - The
inheritsField - The
checksBlock (Rules & Metrics) - CTQ Configuration (Blueprint-Centric)
- The
evidenceBlock - Tripwires
- Trust Debt Configuration
- The
scoringBlock - The Clarity Baseline
- Blueprint Versioning
- Complete Example
- Conformance Requirements
- References
1. Introduction¶
A core principle of ACGP is the separation of policy from code. Reflection Blueprints are the embodiment of this principle. They allow governance rules to be defined and managed by domain experts, compliance officers, and legal teams, rather than being hard-coded into the agent's logic. This enables an agile and transparent approach to AI governance.
Blueprint Selection: Blueprints are configured and selected by the Governance Steward, not by the operating agent. The steward automatically selects the appropriate blueprint based on agent scope (agent_id, agent_tier, tools), inheritance rules, and request context. This maintains proper separation of concerns and prevents agents from bypassing governance policies.
1.1 Requirements Language¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
2. Format and Structure¶
- Format: A Reflection Blueprint MUST be a valid YAML 1.2 or JSON document. YAML is RECOMMENDED for human readability.
- Structure: A blueprint is a single object containing a set of key-value pairs that define the policy. The primary sections are:
- Metadata:
id,version,description. - Scope: Defines which agents, tools, or domains this blueprint applies to.
- Inheritance: Specifies a parent blueprint to inherit from.
- Checks: The list of specific rules or metrics to evaluate.
- Evidence: Requirements for grounding agent actions in trusted sources.
- Scoring: Default thresholds for mapping CTQ scores to Interventions.
- Metadata:
3. Top-Level Fields (Metadata)¶
Every blueprint MUST contain the following top-level metadata fields:
id(string, required): A unique, human-readable identifier for the blueprint, RECOMMENDED to follow adomain/goal@versionformat.- Example:
finance/customer_comms@1.2
- Example:
version(string, required): The blueprint version following semantic versioning (e.g., "1.2.0").description(string, required): A concise explanation of the blueprint's purpose.
4. The scope Block¶
The scope block is an OPTIONAL object that restricts the application of the blueprint to specific contexts. If omitted, the blueprint is considered globally applicable.
agent_tier(string or array, optional): Specifies the ACL Tier(s) this blueprint applies to (e.g.,ACL-3or[ACL-4, ACL-5]).tools(array, optional): A list of tool names this blueprint governs.domains(array, optional): A list of domains or resources (e.g., production environments) where this blueprint is active.
5. The inherits Field¶
The inherits field (string, optional) specifies the id of a parent blueprint. This allows for creating a hierarchy of policies (e.g., a specific task blueprint inheriting from a general domain blueprint).
- Child blueprints MAY override fields from the parent. The
checksarray is an exception; it is APPENDED, not overridden. - All blueprints MUST ultimately inherit from the
clarity.baseline, as defined in Section 12.
6. The checks Block (Rules & Metrics)¶
The checks block is a REQUIRED array of objects, where each object defines a specific governance rule or metric to be evaluated at runtime.
6.1 Common Check Fields¶
id(string, required): A unique identifier for the check (e.g.,gdpr_minimization).when(object, required): Defines the trigger for the check. This object contains key-value pairs that match against theCognitive Trace(e.g.,hook: "output"ortool: "database_query").
6.2 Rule-Based Checks¶
For simple, direct policy enforcement.
rule(object, required): Contains the logic for a pass/fail check.condition(string, required): An expression or reference to a condition that must be met.on_fail(object, required): The intervention to issue if the condition is not met.decision(string, required): The intervention level. MUST be one of:ok,nudge,flag,escalate,block,halt.reason(string, required): The rationale for the intervention.
Note: The decision field supports all six intervention types as defined in ACGP-1001.
6.3 Metric-Based Checks¶
For weighted scoring that contributes to the final CTQ score.
metric(object, required): Contains the logic for a scored check.name(string, required): The name of the metric (e.g.,numerical_accuracy).weight(float, required): The weight of this metric in the final CTQ calculation (0.0 to 1.0).check(object, required): Defines how to calculate the metric score.type(string, required): The type of evaluation (e.g.,llm,tool,regex).args(object, optional): Arguments to pass to the evaluation provider.
7. CTQ Configuration (Blueprint-Centric)¶
Full CTQ (Cognitive Trust Quotient) configuration is now defined within blueprints, making each blueprint a self-contained governance policy.
7.1 CTQ Block Structure¶
The ctq block defines how cognitive quality is measured and scored for this blueprint.
blueprint:
ctq:
metrics:
reasoning_quality:
weight: 0.25
scorer: "llm-judge"
parameters:
model: "gpt-4"
prompt_template: "reasoning_eval_v1"
threshold_warn: 0.6
threshold_fail: 0.3
knowledge_grounding:
weight: 0.25
scorer: "source-match"
parameters:
required_sources: ["certified-registry"]
min_citation_ratio: 0.8
ethical_alignment:
weight: 0.20
scorer: "llm-judge"
parameters:
prompt_template: "ethics_eval_v1"
tool_safety:
weight: 0.20
scorer: "rule-based"
parameters:
rules:
- "no_destructive_ops_without_confirmation"
- "rate_limit_external_calls"
context_awareness:
weight: 0.10
scorer: "hybrid"
parameters:
rule_weight: 0.4
llm_weight: 0.6
aggregation: "weighted_average"
thresholds:
ok: 0.85
nudge: 0.70
flag: 0.50
block: 0.30
7.2 Standard Scorer Types [NORMATIVE]¶
Implementations MUST support these standard scorer types:
7.2.1 rule-based Scorer¶
Fast, deterministic evaluation using predefined rules.
Interface:
scorer: "rule-based"
parameters:
rules: [string] # List of rule IDs to evaluate
mode: "all" | "any" # all = AND, any = OR (default: "all")
# Output: 1.0 if rules pass, 0.0 if fail (binary)
Characteristics: - Latency: <10ms - Eval Tier: 0 - Use case: Hard constraints, permission checks
7.2.2 llm-judge Scorer¶
LLM-based semantic evaluation.
Interface:
scorer: "llm-judge"
parameters:
model: string # Model identifier (e.g., "gpt-4", "claude-3")
prompt_template: string # Template ID for evaluation prompt
temperature: float # Sampling temperature (default: 0.0)
max_tokens: int # Max tokens for response (default: 256)
# Output: float 0.0-1.0 based on LLM assessment
Characteristics: - Latency: 500ms-5000ms - Eval Tier: 2 - Use case: Nuanced reasoning, context understanding
Note: LLM-based scorers are where commercial differentiation occurs. The protocol defines the interface; premium implementations provide better accuracy and lower latency.
7.2.3 source-match Scorer¶
Evaluates grounding against certified sources.
Interface:
scorer: "source-match"
parameters:
required_sources: [string] # Registry categories required
min_citation_ratio: float # Min % of claims with citations
min_source_trust: float # Minimum source trust score
# Output: float 0.0-1.0 based on citation coverage and trust
Characteristics: - Latency: 50ms-200ms - Eval Tier: 1 - Use case: Knowledge grounding, fact verification
7.2.4 pattern-match Scorer¶
Regex or pattern-based content scanning.
Interface:
scorer: "pattern-match"
parameters:
patterns:
- pattern: "regex-pattern"
score_on_match: float # Score if matched
score_on_miss: float # Score if not matched
aggregation: "min" | "max" | "avg"
# Output: float 0.0-1.0 based on pattern matching
Characteristics: - Latency: <50ms - Eval Tier: 0 - Use case: PII detection, keyword filtering
7.2.5 hybrid Scorer¶
Combines multiple scorer types.
Interface:
scorer: "hybrid"
parameters:
scorers:
- type: "rule-based"
weight: 0.4
parameters: {...}
- type: "llm-judge"
weight: 0.6
parameters: {...}
aggregation: "weighted_average" | "min" | "max"
# Output: float 0.0-1.0 based on combined scores
Characteristics: - Latency: Depends on components - Eval Tier: Highest of components - Use case: Balanced quality/speed tradeoff
7.3 Standard Metric Profiles¶
Pre-defined metric configurations for common use cases. Blueprints MAY reference these instead of defining metrics individually.
7.3.1 default-general Profile¶
ctq:
profile: "default-general"
# Equivalent to:
# metrics:
# reasoning_quality: { weight: 0.25, scorer: "llm-judge" }
# knowledge_grounding: { weight: 0.20, scorer: "source-match" }
# ethical_alignment: { weight: 0.20, scorer: "llm-judge" }
# tool_safety: { weight: 0.20, scorer: "rule-based" }
# context_awareness: { weight: 0.15, scorer: "hybrid" }
7.3.2 finance-strict Profile¶
ctq:
profile: "finance-strict"
# Optimized for financial applications
# - Higher weight on numerical accuracy
# - Stricter source requirements
# - Lower thresholds (more conservative)
7.3.3 code-assistant Profile¶
ctq:
profile: "code-assistant"
# Optimized for code generation/review
# - Higher weight on tool safety
# - Pattern matching for dangerous operations
# - Lower weight on ethical alignment
7.3.4 safety-critical Profile¶
ctq:
profile: "safety-critical"
# For high-risk autonomous operations
# - All metrics weighted equally
# - Strictest thresholds
# - Mandatory human review triggers
7.4 Scorer Interface Specification [NORMATIVE]¶
Custom scorer implementations MUST conform to this interface:
class Scorer(Protocol):
"""Standard ACGP scorer interface."""
@property
def scorer_type(self) -> str:
"""Return scorer type identifier."""
...
@property
def eval_tier(self) -> int:
"""Return evaluation tier (0-3)."""
...
@property
def latency_budget_ms(self) -> int:
"""Return expected latency budget in milliseconds."""
...
def evaluate(
self,
trace: CognitiveTrace,
parameters: dict
) -> ScorerResult:
"""
Evaluate the trace and return a score.
Args:
trace: The cognitive trace to evaluate
parameters: Scorer-specific configuration
Returns:
ScorerResult with score (0.0-1.0) and metadata
"""
...
@dataclass
class ScorerResult:
"""Result from a scorer evaluation."""
score: float # 0.0 to 1.0
confidence: float # 0.0 to 1.0
explanation: str # Human-readable explanation
evidence: dict # Supporting data
latency_ms: int # Actual evaluation time
7.5 Reference Scorer Implementations¶
ACGP provides reference scorer implementations (basic quality, open source) for conformance testing.
Reference implementations are NOT optimized for production use. Commercial implementations SHOULD provide: - 10x better accuracy (through fine-tuning) - 10x lower latency (through optimization) - Domain-specific calibration
7.5.1 Reference LLM Judge¶
class ReferenceLLMJudge:
"""
Reference implementation of llm-judge scorer.
Basic quality, suitable for testing.
"""
scorer_type = "llm-judge"
eval_tier = 2
latency_budget_ms = 3000
DEFAULT_PROMPT = """
Evaluate the following AI agent reasoning and action:
Reasoning: {reasoning}
Action: {action}
Context: {context}
Rate the quality on a scale of 0.0 to 1.0 where:
- 1.0 = Excellent: Clear, logical, well-supported reasoning
- 0.5 = Acceptable: Some gaps but generally sound
- 0.0 = Poor: Flawed logic, unsupported claims
Respond with only a JSON object:
{{"score": <float>, "explanation": "<string>"}}
"""
def evaluate(self, trace, parameters):
prompt = parameters.get("prompt_template", self.DEFAULT_PROMPT)
prompt = prompt.format(
reasoning=trace.reasoning,
action=trace.action,
context=trace.context
)
# Call LLM (reference uses basic OpenAI call)
response = self.llm_client.complete(
model=parameters.get("model", "gpt-4"),
prompt=prompt,
temperature=parameters.get("temperature", 0.0)
)
result = json.loads(response)
return ScorerResult(
score=result["score"],
confidence=0.7, # Reference impl has moderate confidence
explanation=result["explanation"],
evidence={"model": parameters.get("model")},
latency_ms=self.last_latency_ms
)
7.6 Calibration Guidance¶
Scorers require calibration to produce meaningful scores. Guidelines:
- Establish ground truth: Create labeled datasets with known good/bad examples
- Benchmark regularly: Compare scorer output against ground truth
- Adjust thresholds: Tune blueprint thresholds based on calibration results
- Monitor drift: Track score distributions over time
# Example calibration configuration
calibration:
dataset: "internal-benchmark-v1"
ground_truth_labels: ["pass", "fail", "borderline"]
expected_correlation: 0.85
recalibration_schedule: "monthly"
8. The evidence Block¶
The evidence block is an OPTIONAL object that defines requirements for grounding the agent's reasoning against the Certified Source Registry.
min_certified_sources(integer, optional): The minimum number of sources from the registry that must be cited.source_categories(array, optional): A list of required source categories (e.g.,["regulatory", "peer_reviewed"]).min_trust_score(float, optional): Minimum trust score required for sources (0.0 to 1.0).
9. Tripwires¶
Tripwires support Governance Contracts (ACGP-1010).
The tripwires block is an OPTIONAL array of high-priority safety checks that MUST be evaluated before threshold-based CTQ evaluation. Tripwires are the only mechanism that can trigger a HALT intervention.
9.1 Tripwire Schema¶
Each tripwire object contains:
id(string, required): Unique identifier for the tripwire (e.g.,pii_exposure_check)when(object, required): Trigger condition (same semantics as checks)hook(string): Lifecycle hook (tool_call,output, etc.)tool(string, optional): Tool name filter
condition(object or string, required): Condition expression (see Section 9.4 for DSL)eval_tier(integer, optional): Evaluation tier (0 or 1). Default:00: In-memory rule, no external calls (target <100ms)1: Database/cache lookup allowed (target <300ms)
latency_budget_ms(integer, optional): Per-tripwire latency budget. Default:100for tier 0,300for tier 1requires_state(boolean, optional): Whether tripwire needs stateful storage (e.g., rate limiting). Default:falseon_fail(object, required): Intervention when condition evaluates tofalsedecision(string, required): MUST be one ofnudge,flag,escalate,block,haltreason(string, required): Human-readable explanation
9.2 Execution Semantics¶
- Priority: Tripwires MUST run before CTQ-based checks
- Short-circuit: If any tripwire issues
halt, evaluation MUST terminate immediately - Latency: Stewards MUST enforce
latency_budget_msper tripwire; on timeout, MUST apply the fallback behavior from the governance contract (ACGP-1010) - Tier Constraint: Blueprints MUST NOT define tripwires with
eval_tier > 1(only Tier 0 and 1 allowed for tripwires)
9.3 Example: Tripwire-Enabled Blueprint¶
id: finance/trading_bot@2.0
version: "2.0.0"
description: "Trading agent with safety tripwires"
inherits: clarity.baseline@1.0
tripwires:
# Tier 0: In-memory rule
- id: trade_size_limit
when:
hook: "tool_call"
tool: "execute_trade"
condition: "args.trade_value <= 100000"
eval_tier: 0
latency_budget_ms: 50
requires_state: false
on_fail:
decision: "halt"
reason: "Trade exceeds $100k hard limit"
# Tier 1: Rate limit check (requires DB lookup)
- id: daily_trade_count
when:
hook: "tool_call"
tool: "execute_trade"
condition: "storage.get('trades_today') < 50"
eval_tier: 1
latency_budget_ms: 200
requires_state: true
on_fail:
decision: "block"
reason: "Daily trade limit (50) exceeded"
checks:
# CTQ-based checks run after tripwires pass
- id: trade_rationale_quality
when:
hook: "tool_call"
tool: "execute_trade"
metric:
name: "risk_assessment_clarity"
weight: 0.8
check:
type: "llm"
9.4 Tripwire Condition DSL [NORMATIVE]¶
Tripwire conditions use a structured expression language for machine-executable safety checks.
9.4.1 Grammar Specification (ABNF)¶
; Tripwire Condition DSL Grammar
condition = simple-expr / compound-expr
; Compound expressions
compound-expr = all-expr / any-expr / not-expr
all-expr = "all" ":" "[" condition-list "]"
any-expr = "any" ":" "[" condition-list "]"
not-expr = "NOT" simple-expr
condition-list = condition *("," condition)
; Simple expressions
simple-expr = comparison / function-call / field-access
; Comparisons
comparison = field-access operator value
operator = ">" / ">=" / "<" / "<=" / "==" / "!=" / "contains" / "matches"
field-access = identifier *("." identifier)
identifier = ALPHA *(ALPHA / DIGIT / "_")
value = string / number / boolean / array
; Function calls
function-call = function-name "(" argument-list ")"
function-name = "is_external" / "in_allowlist" / "in_denylist" /
"matches_regex" / "contains_entity" / "exceeds_rate"
argument-list = [argument *("," argument)]
argument = field-access / value
; Literals
string = DQUOTE *CHAR DQUOTE
number = ["-"] 1*DIGIT ["." 1*DIGIT]
boolean = "true" / "false"
array = "[" [value *("," value)] "]"
9.4.2 Standard Operators¶
| Operator | Description | Example |
|---|---|---|
> |
Greater than | args.amount > 10000 |
>= |
Greater or equal | response_size >= 1MB |
< |
Less than | retry_count < 3 |
<= |
Less or equal | trust_debt <= 0.5 |
== |
Equal | action.type == "delete" |
!= |
Not equal | destination != "internal" |
contains |
String contains | content contains "password" |
matches |
Regex match | content matches "\\d{3}-\\d{2}-\\d{4}" |
9.4.3 Standard Functions¶
| Function | Description | Example |
|---|---|---|
is_external(field) |
Check if endpoint is external | is_external(destination) |
in_allowlist(field, list) |
Check against allowlist | in_allowlist(tool, "approved_tools") |
in_denylist(field, list) |
Check against denylist | in_denylist(domain, "blocked_domains") |
matches_regex(field, pattern) |
Regex pattern match | matches_regex(content, "SSN_PATTERN") |
contains_entity(field, type) |
NER entity detection | contains_entity(output, "credit_card") |
exceeds_rate(key, limit, window) |
Rate limit check | exceeds_rate(agent_id, 100, "1h") |
9.4.4 Compound Conditions¶
ALL (AND logic):
condition:
all:
- response_size > 10MB
- is_external(destination)
- NOT in_allowlist(destination, "trusted_endpoints")
ANY (OR logic):
NOT (negation):
9.4.5 Complete DSL Examples¶
Data Exfiltration Detection:
tripwires:
- id: data_exfiltration
severity: severe
condition:
all:
- response_size > 10MB
- is_external(destination)
- NOT in_allowlist(destination, "approved_endpoints")
on_fail:
decision: halt
reason: "Potential data exfiltration detected"
PII Exposure Check:
tripwires:
- id: pii_exposure
severity: critical
condition:
any:
- matches_regex(content, "\\b\\d{3}-\\d{2}-\\d{4}\\b") # SSN
- contains_entity(content, "credit_card")
- contains_entity(content, "bank_account")
on_fail:
decision: block
reason: "PII detected in output"
Rate Limiting:
tripwires:
- id: api_rate_limit
severity: standard
eval_tier: 1
requires_state: true
condition:
NOT:
exceeds_rate(agent_id, 100, "1m")
on_fail:
decision: block
reason: "Rate limit exceeded (100 req/min)"
Dangerous Operations:
tripwires:
- id: dangerous_db_ops
severity: severe
condition:
all:
- tool == "database_query"
- any:
- action.query contains "DROP"
- action.query contains "DELETE FROM"
- action.query contains "TRUNCATE"
- NOT in_allowlist(action.table, "deletable_tables")
on_fail:
decision: halt
reason: "Dangerous database operation blocked"
9.4.6 Implementation Requirements¶
Implementations MUST: - Parse and evaluate all standard operators - Implement all standard functions - Support nested compound expressions (at least 3 levels deep) - Return clear error messages for malformed conditions
Implementations SHOULD: - Optimize repeated evaluations (compile conditions) - Support custom functions via extension registry - Provide condition validation at blueprint load time
10. Trust Debt Configuration¶
Trust debt rules are fully configurable per blueprint.
10.1 Trust Debt Block Structure¶
The trust_debt block defines how trust violations accumulate, decay, and can be recovered.
blueprint:
trust_debt:
enabled: true
accumulation:
# Debt added per intervention type
flag: 0.05
nudge: 0.02
block: 0.15
halt: 0.50
decay:
rate: 0.95 # Multiplier per period
period_hours: 24 # How often decay applies
min_debt: 0.0 # Floor value
recovery:
enabled: true
successful_review_credit: -0.10 # Credit for passed human review
good_behavior_audit:
threshold_hours: 168 # 7 days of good behavior
credit: -0.20 # Credit awarded
agent_remediation_request: true # Allow agents to request review
thresholds:
elevated_monitoring: 0.30 # Trigger increased scrutiny
restricted_mode: 0.50 # Stricter thresholds apply
re_tiering_review: 0.75 # Queue ARS re-evaluation
severity_weights:
standard: 1.0
critical: 2.0
severe: 5.0
10.2 Accumulation Rules [NORMATIVE]¶
Trust debt accumulates when interventions are issued:
def accumulate_trust_debt(
current_debt: float,
intervention: str,
tripwire_severity: str,
config: TrustDebtConfig
) -> float:
"""
Calculate new trust debt after intervention.
Returns:
Updated trust debt value (0.0 to 1.0)
"""
base_debt = config.accumulation.get(intervention, 0.0)
# Apply severity weight for tripwire-triggered interventions
if tripwire_severity:
weight = config.severity_weights.get(tripwire_severity, 1.0)
base_debt *= weight
new_debt = min(1.0, current_debt + base_debt)
return new_debt
Accumulation Defaults:
| Intervention | Default Debt | Rationale |
|---|---|---|
flag |
0.05 | Minor concern, needs tracking |
nudge |
0.02 | Corrected behavior |
block |
0.15 | Serious violation |
halt |
0.50 | Critical failure |
10.3 Decay Rules [NORMATIVE]¶
Trust debt decays over time to allow recovery:
def apply_trust_debt_decay(
current_debt: float,
hours_elapsed: float,
config: TrustDebtConfig
) -> float:
"""
Apply time-based decay to trust debt.
Formula: new_debt = current_debt * (rate ^ periods)
where periods = hours_elapsed / period_hours
"""
periods = hours_elapsed / config.decay.period_hours
decayed_debt = current_debt * (config.decay.rate ** periods)
return max(config.decay.min_debt, decayed_debt)
Decay Example:
With default settings (0.95 rate, 24-hour period): - After 1 day: debt × 0.95 = 95% remaining - After 7 days: debt × 0.95^7 ≈ 70% remaining - After 14 days: debt × 0.95^14 ≈ 49% remaining
10.4 Recovery Mechanisms [NORMATIVE]¶
Trust debt can be actively reduced through positive actions:
10.4.1 Successful Human Review¶
When a human reviewer approves an escalated action:
The agent receives credit for being "correct" when flagged for review.
10.4.2 Good Behavior Audit¶
Extended period without violations:
If the agent operates for threshold_hours without any interventions other than ok, trust credit is applied.
10.4.3 Agent Remediation Request¶
Agents may request re-evaluation:
When enabled, agents can submit a remediation request explaining corrective actions taken. A human reviewer evaluates and may grant credit.
10.5 Severity-Weighted Debt¶
Tripwire-triggered interventions use severity multipliers:
severity_weights:
standard: 1.0 # Normal violation
critical: 2.0 # Serious violation (2x debt)
severe: 5.0 # Critical failure (5x debt)
Example:
A block intervention (0.15 base) triggered by a severe tripwire:
- Debt = 0.15 × 5.0 = 0.75 (immediate re-tiering threshold)
10.6 Threshold Actions¶
When trust debt exceeds thresholds, actions are triggered:
| Threshold | Default | Action |
|---|---|---|
elevated_monitoring |
0.30 | Increase logging, notify operations |
restricted_mode |
0.50 | Apply stricter intervention thresholds |
re_tiering_review |
0.75 | Queue agent for ARS re-evaluation |
10.7 Implementation Requirements¶
Implementations MUST: - Track trust debt per agent (not per session) - Apply decay on every evaluation (or use lazy evaluation) - Persist debt across restarts - Log all debt changes to ReflectionDB
Implementations SHOULD: - Expose trust debt via monitoring metrics - Alert when agents approach re-tiering threshold - Provide dashboard visibility into debt trends
11. The scoring Block¶
The scoring block is a REQUIRED object that defines the default mapping of the final CTQ score to intervention decisions for this blueprint.
11.1 Threshold Format¶
The thresholds object MUST use Risk Score (1.0 - CTQ) as the basis for decision boundaries.
scoring:
thresholds:
# Risk Score thresholds (1.0 - CTQ)
ok: 0.25 # Risk Score ≤ 0.25 → OK
nudge: 0.40 # Risk Score ≤ 0.40 → NUDGE
escalate: 0.55 # Risk Score ≤ 0.55 → ESCALATE
block: 0.70 # Risk Score > 0.70 → BLOCK
# HALT triggered by tripwires, not thresholds
Note: These thresholds serve as the default for this blueprint. They MAY be overridden by the more stringent thresholds associated with an agent's specific ACL Tier, as defined in ACGP-1005. The Policy Engine MUST apply the stricter of the two threshold sets.
11.2 Threshold Override Rules¶
When both blueprint thresholds and ACL thresholds exist:
def apply_thresholds(blueprint_thresholds, acl_thresholds, risk_score):
"""Apply the stricter threshold."""
# Use the lower threshold (stricter) for each decision boundary
ok_threshold = min(blueprint_thresholds.ok, acl_thresholds.ok)
nudge_threshold = min(blueprint_thresholds.nudge, acl_thresholds.nudge)
escalate_threshold = min(blueprint_thresholds.escalate, acl_thresholds.escalate)
block_threshold = min(blueprint_thresholds.block, acl_thresholds.block)
# Apply thresholds to risk score
if risk_score <= ok_threshold:
return "ok"
elif risk_score <= nudge_threshold:
return "nudge"
elif risk_score <= escalate_threshold:
return "escalate"
elif risk_score <= block_threshold:
return "block"
else:
return "block" # HALT is reserved for tripwires only
# Note: HALT intervention is only issued when tripwires are triggered,
# never from threshold-based CTQ evaluation alone.
12. The Clarity Baseline¶
The Clarity Baseline is a special, mandatory blueprint that serves as the root of the inheritance chain for all other blueprints. It enforces universal checks for cognitive safety and soundness.
12.1 Clarity Baseline Specification¶
id: clarity.baseline@1.0
version: "1.0.0"
description: "Universal cognitive safety baseline for all agents"
checks:
# Logical Consistency
- id: no_contradictions
when:
hook: "output"
metric:
name: "logical_consistency"
weight: 0.20
check:
type: "llm"
args:
prompt_template: "Check for logical contradictions"
# Reasoning Clarity
- id: reasoning_transparency
when:
hook: "output"
metric:
name: "clarity"
weight: 0.25
check:
type: "llm"
args:
prompt_template: "Evaluate reasoning transparency"
# Knowledge Grounding
- id: knowledge_grounding
when:
hook: "output"
metric:
name: "knowledge_grounding"
weight: 0.25
check:
type: "tool"
args:
name: "source_validator"
# Cognitive Bias Detection
- id: bias_detection
when:
hook: "output"
metric:
name: "bias_free"
weight: 0.15
check:
type: "llm"
args:
prompt_template: "Detect cognitive biases"
# Safety Check
- id: safety_check
when:
hook: "output"
metric:
name: "safety"
weight: 0.15
check:
type: "llm"
args:
prompt_template: "Check for harmful content"
scoring:
thresholds:
ok: 0.30
nudge: 0.45
escalate: 0.60
block: 0.75
12.2 Conformance Requirement¶
A conformant Policy Engine MUST ensure that every applied blueprint inherits from the Clarity Baseline. If a blueprint does not specify an inherits field, the Policy Engine MUST automatically set it to clarity.baseline@1.0.
13. Blueprint Versioning¶
13.1 Version Format [NORMATIVE]¶
Blueprint versions MUST follow Semantic Versioning 2.0.0:
MAJOR.MINOR.PATCH
Examples:
- 1.0.0 (initial release)
- 1.1.0 (backward-compatible additions)
- 2.0.0 (breaking changes)
Version Components: - MAJOR: Breaking changes (threshold changes, removed checks, semantic changes) - MINOR: Backward-compatible additions (new optional checks, new tripwires) - PATCH: Bug fixes, clarifications, documentation
13.2 Blueprint ID Format¶
domain/name@version
Examples:
- finance/trading@1.2.0
- healthcare/patient-data@2.0.0
- clarity.baseline@1.0
13.3 Compatibility Rules¶
13.3.1 Breaking Changes (MAJOR version bump required)¶
| Change Type | Example | Breaking? |
|---|---|---|
| Remove required check | Remove reasoning_quality metric |
Yes |
| Tighten thresholds | ok: 0.85 → ok: 0.90 |
Yes |
| Add required field | New required evidence block |
Yes |
| Change scoring semantics | Switch from CTQ to Risk Score | Yes |
| Change tripwire severity | standard → critical |
Yes |
13.3.2 Non-Breaking Changes (MINOR version bump)¶
| Change Type | Example | Breaking? |
|---|---|---|
| Add optional check | New optional metric | No |
| Add optional tripwire | New standard-severity tripwire | No |
| Relax thresholds | ok: 0.90 → ok: 0.85 |
No |
| Add optional field | New optional metadata block |
No |
13.4 Inheritance and Versioning¶
When inheriting from a versioned blueprint:
# Explicit version (recommended)
inherits: finance/base@2.1.0
# Major version pin (accepts 2.x.x)
inherits: finance/base@2
# Latest (not recommended for production)
inherits: finance/base@latest
Resolution Rules:
- Explicit version: Use exactly that version
- Major version pin: Use highest MINOR.PATCH within that MAJOR
- Latest: Use highest available version (warning in production)
13.5 Deployment and Migration¶
13.5.1 Staged Rollout¶
For blueprint updates affecting multiple agents:
migration:
blueprint_id: finance/trading@2.0.0
from_version: 1.5.0
stages:
- name: canary
percentage: 5
duration: 24h
success_criteria:
- intervention_rate_delta < 10%
- no_halt_interventions
- name: gradual
percentage: 25
duration: 48h
- name: full
percentage: 100
13.5.2 Rollback Procedure¶
rollback:
trigger:
- halt_rate > 1%
- error_rate > 5%
- manual_trigger
action:
- revert_to: previous_version
- notify: ops_team
- log: rollback_event
13.6 Version Compatibility Matrix¶
Stewards MUST maintain compatibility information:
compatibility:
blueprint: finance/trading
versions:
- version: 2.0.0
compatible_with:
- steward: ">=1.5.0"
- sdk: ">=2.0.0"
requires:
- trust_debt: true
- tripwires: [standard, critical]
- version: 1.5.0
compatible_with:
- steward: ">=1.0.0"
- sdk: ">=1.0.0"
14. Complete Example¶
# Example Blueprint: finance/trading_bot@1.0
id: finance/trading_bot@1.0
version: "1.0.0"
description: "Governance for an automated stock trading agent."
inherits: clarity.baseline@1.0
scope:
agent_tier: [ACL-3, ACL-4]
tools: ["execute_trade", "market_data_api"]
evidence:
min_certified_sources: 2
source_categories: ["regulatory_filing", "reputable_news"]
min_trust_score: 0.7
checks:
# Metric-based check for risk analysis quality
- id: trade_rationale_quality
when:
hook: "tool_call"
tool: "execute_trade"
metric:
name: "risk_assessment_clarity"
weight: 0.6
check:
type: "llm"
args:
prompt_template: "templates/check_risk_rationale.txt"
# Rule-based check for trade size limit
- id: single_trade_volume_cap
when:
hook: "tool_call"
tool: "execute_trade"
rule:
condition: "args.trade_value <= 50000"
on_fail:
decision: "block"
reason: "Trade value exceeds single transaction limit of $50,000."
# Metric-based check for source quality
- id: source_recency
when:
hook: "tool_call"
metric:
name: "knowledge_grounding_recency"
weight: 0.4
check:
type: "tool"
args:
name: "source_validator_api"
scoring:
# Risk Score thresholds (1.0 - CTQ)
# Note: ACL-3 thresholds from ACGP-1005 will apply if stricter
thresholds:
ok: 0.20 # Matches ACL-3 threshold
nudge: 0.35 # Matches ACL-3 threshold
escalate: 0.50 # Matches ACL-3 threshold
block: 0.70 # More lenient than ACL-3 (0.50), so ACL-3 will be used
15. Conformance Requirements¶
A conformant ACGP Policy Engine MUST:
15.1 Parsing Requirements¶
- Be able to parse and interpret all fields defined in this specification
- Validate blueprint structure against the schema
- Reject blueprints with invalid syntax or missing required fields
- Support both YAML and JSON formats
15.2 Inheritance Requirements¶
- Correctly implement the inheritance logic
- Append checks from parent blueprints (not override)
- Ensure the Clarity Baseline is always active
- Resolve inheritance chains correctly
15.3 Execution Requirements¶
- Trigger Rule-Based checks based on the
whencondition - Trigger Metric-Based checks based on the
whencondition - Calculate weighted CTQ scores correctly
- Apply threshold override rules (stricter of blueprint vs ACL)
- Support all six intervention types: ok, nudge, flag, escalate, block, halt
15.4 Validation Requirements¶
- A JSON Schema derived from this specification SHALL be provided for validation of blueprint files
- Implementations MUST validate blueprints before loading
- Implementations MUST provide clear error messages for validation failures
16. References¶
Normative References¶
- ACGP-1000: Core Protocol Specification
- ACGP-1001: Terminology and Definitions
- ACGP-1002: Architecture Specification
- ACGP-1005: ARS-CTQ-ACL Integration Framework
- ACGP-1006: Certified Source Registry Specification
- ACGP-1010: Governance Contracts
- RFC 2119: Key words for use in RFCs
Informative References¶
- YAML 1.2 Specification: https://yaml.org/spec/1.2/spec.html
- JSON Schema: https://json-schema.org/
- Semantic Versioning 2.0.0: https://semver.org/
End of ACGP-1004