Blueprints, Traces & Evaluation¶

Status: Standard-only Alpha (v1.0.0-alpha.2)
Last Updated: 2026-03-18
Spec ID: ACGP-3
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119 and RFC 8174)

Abstract¶

This specification defines Blueprint source artifacts, resolved Blueprint artifacts, canonical checks[] semantics, cognitive trace semantics, the CTQ scorer framework, score-to-intervention threshold mapping, evidence policy, extension descriptors, and trust-policy semantics. It is the primary normative reference for anyone building a Policy Engine.

ACGP-3 is the authoritative normative reference for runtime evaluation ordering, check semantics, CTQ aggregation, threshold mapping, and trust-policy observable semantics.

Requirements Language¶

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. Scope [NORMATIVE]¶

ACGP-3 defines governance policy semantics for:

Blueprint source artifact schema and resolved artifact schema
Base-resolution and merge behavior
Cognitive trace model and evaluation ordering
Canonical checks[] semantics for kind: rule | metric
CTQ metrics, evaluator types, and weighted aggregation
Score-to-intervention threshold mapping with Governance Tier interaction
Evidence-policy / source-grounding requirements
Trust-policy observable semantics, default provider behavior, and provider metadata

ACGP-3 does not define:

Transport or envelope mechanics (→ ACGP-2)
Full tripwire DSL grammar and fail-closed semantics (→ ACGP-4)
Audit retention or privacy controls (→ ACGP-5)
Conformance test suites (→ ACGP-6)

Minimal Useful Standard Implementation [INFORMATIVE]¶

A minimal useful ACGP v1.0 Standard implementation does not need to implement every advanced deployment pattern.

A practical starting point is: - one Governance Steward - one Operating Agent runtime - one active baseline blueprint - deterministic checks - tripwire evaluation - CTQ aggregation using the five standard metrics - threshold mapping - orthogonal flag support - default trust debt provider behavior - Governance Store or equivalent durable governance persistence

For conformance, implementations MUST correctly process scorer outputs for all standard scorer types. Actual model inference for Tier-2 (cognitive-evaluator) and Tier-3 (hybrid) scorers is implementation-defined and not required by v1.0 conformance test vectors.

This is the minimum useful alpha.2 implementation path: a single active policy, one runtime applying governance before action execution, and durable audit storage for emitted evidence and decisions.

Reading Map [INFORMATIVE]¶

Blueprint schema and artifact model
Cognitive trace model
Evaluation ordering
Check semantics
CTQ aggregation
Threshold mapping
Evidence policy
Trust debt and provider observables
Canonical EVAL artifact

2. Blueprint Schema [NORMATIVE]¶

A Blueprint source artifact MUST be a valid YAML 1.2 or JSON document. Blueprints are the primary mechanism for translating organizational policies into enforceable runtime rules. They are authored by domain experts and loaded by the Governance Steward. Evaluation engines consume resolved Blueprint artifacts derived from source artifacts.

2.1 Required Top-Level Fields¶

Field	Type	Description
`artifact_type`	string	Source-artifact discriminator. MUST be `acgp.blueprint`
`schema_version`	string	Schema version for the Blueprint source artifact
`id`	string	Unique identifier, RECOMMENDED format: `domain/name@version`
`version`	string	Semantic version for the source artifact
`title`	string	Human-readable policy title
`description`	string	Concise explanation of the blueprint's purpose
`checks`	array	Canonical rules and metrics to evaluate at runtime
`intervention_policy`	object	Threshold mapping from risk score to intervention

2.2 Optional Top-Level Fields¶

Field	Type	Description
`base`	object	Parent source Blueprint reference and optional digest pin
`applicability`	object	Restricts activation by tier, tool, domain
`tripwires`	array	Safety checks evaluated before `checks[]`
`evidence_policy`	object	Source-grounding requirements
`trust_policy`	object	Accumulation/decay policy for trust tracking
`extensions`	object	Public descriptors for required or optional extensions
`annotations`	object	Non-normative annotations
`fixtures`	array	Embedded policy fixtures for expected outcomes

The following fields are not part of Blueprint core and MUST NOT appear in a conformant Blueprint source artifact:

name
ctq
performance_budget
fallback_behavior
metadata
raw inherits
tripwire_syntax_version

2.2.1 Extension Metadata [NORMATIVE]¶

Blueprints MAY declare portable extension requirements using an extensions block:

extensions:
  required:
    - id: "urn:acgp:ext:source-catalog-private@1"
      visibility: private
      enforcement_scope: remote
      fail_mode: reject_activation
      attestation:
        digest: "sha256:..."
        policy_pack_id: "pp:2026-03-10:7"
  optional:
    - id: "urn:acgp:ext:contracts@1"
      visibility: public

Normative rules:

extensions.required[] descriptors MUST be preserved through validation, sync, and activation.
Required extensions MUST declare enforcement_scope as local, remote, or both.
The authoritative enforcer is the local runtime for local, the Governance Steward or remote authoritative runtime for remote, and both runtimes for both.
local scope requires enforcement by the local runtime applying the bundle; if local enforcement is unavailable, the descriptor's fail_mode MUST apply.
remote scope requires preservation and negotiation by the local runtime, including descriptor metadata and any attestation, but MUST NOT by itself block local activation when enforcement is performed only by the remote authoritative runtime.
both scope requires support by both authoritative enforcers on the sync boundary; if either side lacks support, the descriptor's fail_mode MUST apply.
Unsupported optional extensions MAY be preserved and ignored.
private descriptors expose only public negotiation metadata plus optional opaque attestation.
local extensions MUST remain deployment-local metadata and MUST NOT create hidden portability requirements in public blueprints.

The consolidated activation/evaluation outcomes are:

Descriptor	`enforcement_scope`	Local support	Remote support	Result
`required`	`local`	Yes	N/A	Activate and enforce locally
`required`	`local`	No	N/A	Apply `fail_mode` (`reject_activation` or `deny`)
`required`	`remote`	No or partial	Yes	Preserve descriptor locally, activate, and rely on remote authoritative enforcement
`required`	`remote`	Any	No	Apply `fail_mode` at activation or decision time, depending on descriptor semantics
`required`	`both`	Yes	Yes	Activate and require enforcement on both sides
`required`	`both`	No on either side	No on either side	Apply `fail_mode`
`optional`	`local`, `remote`, or `both`	Supported	Supported or preserved	Activate and apply where supported
`optional`	`local`, `remote`, or `both`	Unsupported	Unsupported	Preserve if possible and ignore without failing activation

This table is the single consolidated truth table for local / remote / both handling. ACGP-1 and ACGP-2 cross-reference this section rather than redefining the matrix.

2.3 Applicability Block¶

applicability MAY restrict blueprint activation by context:

applicability:
  governance_tiers: [GT-3, GT-4]   # Governance Tier values this blueprint applies to
  tools: ["execute_trade"]      # Tools governed by this blueprint
  domains: ["production"]       # Deployment domains
  out_of_scope_behavior: block

If omitted, the blueprint is globally applicable.

2.4 Base Resolution and Inheritance¶

If base is present, the child source artifact MUST be resolved against the parent source artifact before evaluation. Resolution produces a first-class resolved Blueprint artifact. The base object uses this minimum shape:

base:
  ref: finance/base@2.0
  digest: sha256:1234...

Child values are merged with parent values using these normative rules:

Field Category	Merge Behavior
`id`, `version`, `title`, `description`	Always child-defined
`base`	Used only for lineage and resolution inputs
`annotations`	Child replaces parent entirely
`applicability`	Child overrides parent entirely when present
`tripwires`	Append; if child defines a tripwire with the same `id` as parent, child definition replaces that parent tripwire
`checks`	Append; if child defines a check with the same `id` as parent, child definition replaces that parent check
`intervention_policy.thresholds`	Child overrides parent per key
`evidence_policy`	Child overrides parent per key
`trust_policy`	Child overrides parent per key
`extensions.required`	Append; if child defines the same `id`, child replaces parent
`extensions.optional`	Append; if child defines the same `id`, child replaces parent

Implementations MUST reject blueprints whose inheritance chain contains a cycle. Cycle detection MUST occur before merge resolution begins. The recommended error code is CircularBlueprintInheritance. An implementation MUST NOT attempt partial merge resolution once a cycle has been detected.

Safety note: because tripwires append by default, adding tripwires in a child blueprint never removes parent safety boundaries. To modify a parent tripwire, the child MUST declare a tripwire with the same id, which replaces only that specific tripwire.

Resolved Blueprint artifacts MUST carry at least these additional fields:

source_blueprint:
  ref: finance/desk-a@2.0
lineage:
  - ref: finance/base@2.0
  - ref: finance/desk-a@2.0
resolved_at: 2026-03-18T10:00:00Z
effective:
  valid_from: 2026-03-18T10:00:00Z
resolution_metadata:
  resolver_version: 2.0.0

Inheritance merge example:

# parent: finance/base@2.0
tripwires:
  - id: max_trade
    condition: args.trade_value > 50000
    on_fail: { decision: "block", reason: "Trade cap exceeded" }

# child: finance/desk-a@2.0
base:
  ref: finance/base@2.0
tripwires:
  - id: max_trade
    condition: args.trade_value > 25000
    on_fail: { decision: "block", reason: "Desk-A stricter cap" }
  - id: sanctions_check
    condition: contains_entity(args.counterparty, "sanctioned_org")
    on_fail: { decision: "halt", reason: "Sanctioned counterparty" }

Result: max_trade is replaced by child definition; sanctions_check is appended.

2.4.1 Operational Resource Limits [NORMATIVE]¶

Implementations MUST define and document operational limits for blueprint size, inheritance depth, and tripwire/check counts.

For v1.0 interoperability guidance, the following minimum rejection thresholds are RECOMMENDED:

maximum inheritance depth: 16
maximum tripwire count per blueprint: 256
maximum check count per blueprint: 256
maximum serialized blueprint size: 1 MiB

2.5 Blueprint Versioning¶

Blueprint versions MUST follow Semantic Versioning 2.0.0:

MAJOR: Breaking changes (threshold changes, removed checks, semantic changes)
MINOR: Backward-compatible additions (new optional checks, new tripwires)
PATCH: Bug fixes, clarifications, documentation

2.6 Resolved Blueprint Consumption¶

Evaluation engines MUST consume resolved Blueprint artifacts rather than ad hoc merged source trees. A source Blueprint MAY be evaluated only after it has been compiled into the resolved form defined in §2.4.

3. Checks Block [NORMATIVE]¶

ACGP v1.0 uses three distinct evaluation constructs:

Tripwires — Top-level tripwires[] safety boundaries evaluated before checks[]. They may produce halt and short-circuit CTQ evaluation.
Deterministic rule checks — checks[] entries with kind: rule. They produce direct pass/fail policy outcomes (ok, nudge, escalate, block) and MAY set flags.flagged.
CTQ metric checks — checks[] entries with kind: metric. They contribute weighted evidence to CTQ aggregation and never produce halt directly.

Authoring decision tree:

If the condition is a hard safety boundary that must fire regardless of CTQ score, use tripwires[].
If the condition is a direct pass/fail policy rule with an explicit intervention, use a rule-based check.
If the condition contributes to runtime quality scoring, use a metric-based check.

The checks array defines the latter two constructs.

3.1 Rule Checks¶

Pass/fail policy enforcement:

checks:
  - id: single_trade_volume_cap
    kind: rule
    when:
      hook: "tool_call"
      tool: "execute_trade"
    condition: "args.trade_value <= 50000"
    on_fail:
      decision: "block"
      reason: "Trade value exceeds $50,000 limit."
    flag: false

decision MUST be one of: ok, nudge, escalate, block. Rule checks MUST NOT declare on_fail.decision: "halt". Blueprints containing halt in a rule-check decision MUST be rejected at load time with error code InvalidBlueprintHaltInRule.

Rule checks MAY additionally declare flag: true | false (default false). When flag: true, a matching check sets flags.flagged: true on the intervention without altering the primary decision.

3.2 Metric Checks¶

Weighted scoring that contributes to the final CTQ score:

checks:
  - id: trade_rationale_quality
    kind: metric
    when:
      hook: "tool_call"
      tool: "execute_trade"
    metric:
      name: "reasoning_quality"
      weight: 0.6
      evaluator:
        kind: "cognitive-evaluator"
        args:
          prompt_template: "templates/check_risk_rationale.txt"

3.3 Check Validation Rules¶

The outer checks[] envelope is canonical. Implementations MUST reject any check item that mixes rule-only and metric-only fields. Specifically:

a kind: rule check MUST declare condition and on_fail
a kind: rule check MUST NOT declare metric
a kind: metric check MUST declare metric
a kind: metric check MUST NOT declare condition or on_fail

4. Cognitive Trace Model [NORMATIVE]¶

A conformant evaluator MUST operate on a normalized cognitive trace object.

4.1 Minimum Trace Fields¶

Field	Required	Description
`trace_id`	REQUIRED	Unique identifier
`session_id`	REQUIRED	Session context
`hook`	REQUIRED	Lifecycle hook that triggered the trace
`agent_id`	REQUIRED	Stable identifier of the governed agent principal within a deployment-defined trust domain
`action`	REQUIRED	Action being evaluated; MUST be an object with required `name` and optional `parameters`
`context`	REQUIRED	Contextual data
`reasoning`	RECOMMENDED	Agent's reasoning chain

timestamp MUST come from the envelope timestamp (ACGP-2). TRACE payloads MUST NOT carry a top-level payload timestamp.

The model may supply reasoning and action content, but it does not self-authorize its governing identity. agent_id MUST be assigned by runtime or operator control. ACGP distinguishes the governed principal from mutable display names and from the envelope sender.

4.2 Tool Activity Fields¶

When tool activity exists, traces SHOULD include:

Field	Description
`tool`	Tool name
`args`	Tool arguments
`result`	Tool execution result
`tool_calls`	Array of tool invocations

ACGP-4 canonical tripwire field roots resolve against this trace model. For the authoritative semantic meaning of trace fields used by tripwire expressions, see ACGP-4 §4.1.

4.3 Evidence Fields¶

When evidence is required, traces SHOULD include source/citation metadata for grounding checks (see Section 8).

4.4 Principal Identity Example¶

The following trace fragment shows an opaque principal identifier with a human-readable label in meta:

{
  "trace_id": "01958249-4d55-7000-8000-000000000111",
  "agent_id": "urn:acgp:agent:financeops:prod:7f4c9d2a",
  "session_id": "01958249-4d55-7000-8000-000000000112",
  "hook": "tool_call",
  "context": {},
  "governance_tier": "GT-3",
  "action": {
    "name": "approve_invoice",
    "parameters": { "invoice_id": "INV-1042" }
  },
  "meta": {
    "agent_label": "FinanceOps-Agent",
    "model": "gpt-5-pro"
  }
}

5. Scorer Types and Interface [NORMATIVE]¶

Implementations MUST support five standard scorer families:

5.1 `rule-based` Scorer¶

Fast, deterministic evaluation using predefined rules.

scorer: "rule-based"
parameters:
  rules: [string]           # List of rule IDs
  mode: "all"               # Enum: "all" (AND) or "any" (OR). Default: "all"
# Output: 1.0 if rules pass, 0.0 if fail (binary)

Latency: <10ms | Eval Tier: 0

5.2 `cognitive-evaluator` Scorer¶

A cognitive-evaluator is a non-deterministic evaluator that scores trace quality using an implementation-defined cognitive assessment backend. In alpha implementations, cognitive-evaluator is commonly backed by an LLM-based scorer, but the protocol intentionally standardizes the evaluator role rather than the model family.

scorer: "cognitive-evaluator"
parameters:
  model: string             # Model identifier
  prompt_template: string   # Template ID for evaluation prompt
  temperature: float        # Sampling temperature (default: 0.0)
  max_tokens: int           # Max response tokens (default: 256)
# Output: float 0.0-1.0 based on cognitive assessment

Latency: 500ms-5000ms | Eval Tier: 2

The protocol standardizes the scorer input/output interface. Model selection, accuracy, and latency are implementation-defined.

5.3 `source-match` Scorer¶

Evaluates grounding against certified sources.

scorer: "source-match"
parameters:
  required_sources: [string]     # Registry categories required (public profile)
  policy_handle: string          # Opaque handle for private-source-backed policy
  min_citation_ratio: float      # Min % of claims with citations
  min_source_trust: float        # Minimum source trust score
# Output: float 0.0-1.0 based on citation coverage

Latency: 50ms-200ms | Eval Tier: 1

Public blueprints MUST NOT be required to disclose private source identifiers, document locators, clause references, or derivation provenance for source-match policies. Deployments MAY instead publish an opaque policy_handle or extension attestation that signals the presence of a private source-backed control.

Optional source-match fallback [NORMATIVE]:

If an optional source-match scorer cannot execute because Source Catalog capability is unavailable or the catalog query fails, that scorer MUST be marked unavailable.
The unavailable scorer's weight MUST be redistributed proportionally across the remaining successfully executed scorers in the same aggregation set.
If no scorers remain in that aggregation set after unavailability handling, the aggregation set MUST fail and the enclosing control's explicit fail semantics apply.
This fallback does not apply when source evidence is encoded as a required control, a required extension, or a tripwire-backed fail path. Those cases follow their own explicit fail_mode or tripwire semantics.

5.4 `pattern-match` Scorer¶

Regex or pattern-based content scanning.

scorer: "pattern-match"
parameters:
  patterns:
    - pattern: "regex-pattern"
      score_on_match: float
      score_on_miss: float
  aggregation: "min"         # Enum: "min", "max", or "avg"
# Output: float 0.0-1.0 based on pattern matching

Latency: <50ms | Eval Tier: 0

5.5 `hybrid` Scorer¶

Combines multiple scorer types with weighted aggregation.

scorer: "hybrid"
parameters:
  scorers:
    - type: "rule-based"
      weight: 0.4
      parameters: { ... }
    - type: "cognitive-evaluator"
      weight: 0.6
      parameters: { ... }
  aggregation: "weighted_average"  # Enum: "weighted_average", "min", or "max"

Latency: Depends on components | Eval Tier: Highest of components

5.6 Scorer Interface [NORMATIVE]¶

Custom scorers MUST conform to this interface:

class Scorer(Protocol):
    scorer_type: str
    eval_tier: int
    latency_budget_ms: int

    def evaluate(self, trace: CognitiveTrace, parameters: dict) -> ScorerResult: ...

@dataclass
class ScorerResult:
    score: float           # 0.0 to 1.0
    confidence: float      # 0.0 to 1.0
    explanation: str       # Human-readable
    evidence: dict         # Supporting data
    latency_ms: int        # Actual evaluation time

5.7 Eval Tier Definitions [NORMATIVE]¶

Tier	Strategy	v1.0 Status	Latency Budget
0	In-memory / rule-based	Defined	< 100 ms
1	Stateful lookup (DB/cache)	Defined	< 300 ms
2	Model inference (e.g., `cognitive-evaluator`)	Reserved	Implementation-defined
3	Human-in-the-loop review	Reserved	Implementation-defined

v1.0 constraint: Conformance testing validates Tiers 0 and 1 only. Tiers 2 and 3 are reserved, non-core extension space in v1.0. Implementations MAY support Tiers 2–3 as non-normative extensions, but MUST NOT require them for Standard or Safety-Critical conformance in v1.0. See also ACGP-4 §3 (tripwires MUST NOT declare eval_tier > 1 in v1.0 core).

6. CTQ Metrics and Aggregation [NORMATIVE]¶

6.1 Five Standard Dimensions¶

CTQ evaluates runtime quality across five dimensions:

Dimension	Weight Range	Evaluates
Reasoning Quality	0.20-0.30	Logical consistency, completeness, soundness
Knowledge Grounding	0.15-0.25	Source reliability, factual correctness, recency
Ethical Alignment	0.15-0.25	Harm prevention, fairness, transparency, privacy
Tool Safety	0.15-0.25	Permission validation, input safety, error handling
Context Awareness	0.10-0.20	Relevance, appropriateness, completeness, timing

Weights MUST sum to 1.0. Each weight MUST fall within its allowed range.

If declared weights do not sum to 1.0 (within tolerance ±0.001 to accommodate floating-point representation), the blueprint MUST be rejected at load time with error code INVALID_BLUEPRINT_WEIGHTS. Implementations MUST NOT silently normalize weights.

6.1.1 Metric-to-Dimension Mapping [NORMATIVE]¶

Each metric-based check MUST contribute to exactly one of the five standard CTQ dimensions.

Portable v1.0 blueprints SHOULD use the canonical dimension names directly as metric.name:

reasoning_quality
knowledge_grounding
ethical_alignment
tool_safety
context_awareness

Deployments MAY use more specific local metric names during authoring or preprocessing, but those names MUST be mapped to exactly one canonical CTQ dimension before runtime validation and conformance testing.

When multiple metric-based checks map to the same dimension, their weighted contributions are additive within the global CTQ calculation. The dimension's effective weight is the sum of the weights of the checks mapped to it.

Worked example:

checks:
  - id: rationale_clarity
    metric:
      name: reasoning_quality
      weight: 0.15
  - id: plan_completeness
    metric:
      name: reasoning_quality
      weight: 0.10
  - id: citation_coverage
    metric:
      name: knowledge_grounding
      weight: 0.20
  - id: fairness_review
    metric:
      name: ethical_alignment
      weight: 0.20
  - id: permission_check
    metric:
      name: tool_safety
      weight: 0.20
  - id: situational_fit
    metric:
      name: context_awareness
      weight: 0.15

If rationale_clarity = 0.80 and plan_completeness = 0.90, the combined weighted contribution to the reasoning_quality dimension is (0.80 × 0.15) + (0.90 × 0.10) = 0.21, and the effective reasoning dimension weight is 0.25.

6.1.2 Dimension Result Status [NORMATIVE]¶

Public CTQ results MUST represent evaluator state separately from content quality.

Each CTQ dimension result MUST use one of these status values:

evaluated — score produced through the normal intended scorer path.
degraded — score produced through a declared fallback path.
unavailable — intended scorer path could not evaluate the dimension because the capability or infrastructure was absent.
error — the scorer was attempted and failed unexpectedly with no declared fallback.
failed_evidence_policy — the dimension precondition was not met, so the dimension was gated before scorer execution.

Contributor visibility is part of the public contract:

failed_evidence_policy MUST serialize contributors: [].
unavailable MUST serialize contributors: [].
error MUST keep attempted contributor identifiers visible.
degraded MUST keep attempted contributor identifiers visible.

Implementations MUST distinguish evaluator state from content-quality score. A missing scorer, timeout, unavailable extension, evidence-system outage, or evaluator failure MUST NOT be silently serialized as an ordinary low-quality score unless the Blueprint or scorer policy explicitly defines that fallback behavior.

ACGP standardizes CTQ dimensions, weight rules, aggregation, scorer identifiers, public result fields, and decision semantics; it does not standardize internal prompts, learned heuristics, calibration datasets, source-ranking logic, or model-routing internals.

6.2 Score Interpretation¶

For each metric:

Score	Quality	Description
0.0-0.3	Poor	Severe quality issues
0.3-0.6	Fair	Significant gaps
0.6-0.8	Good	Generally sound, minor issues
0.8-1.0	Excellent	Clear, complete, verified

6.3 CTQ Calculation [NORMATIVE]¶

CTQ = sum(score_i × weight_i)  for i = 1..5

Risk score is derived as:

risk_score = 1.0 - CTQ

When one or more dimensions are not fully evaluated, the public result artifact MUST preserve the actual dimension status.

The Blueprint or scorer policy MUST explicitly determine whether the runtime:

redistributes weight across available dimensions,
applies an explicit fallback score, or
fails closed before producing a final canonical EVAL artifact.

The chosen behavior MUST be visible in the public result artifact and reproducible in conformance vectors.

Numeric Comparison Tolerance [NORMATIVE]: Floating-point results in test vectors MUST match expected values within ±1e-4. Implementations MUST serialize scores to exactly 4 decimal places (round half away from zero). Weight sums MUST match 1.0 within ±0.001 (ACGP-3 §6.1).

An exact threshold-boundary value falls into the less severe category.

Example:

reasoning:  0.90 × 0.25 = 0.225
grounding:  0.80 × 0.20 = 0.160
ethical:    0.85 × 0.20 = 0.170
tool_safety: 0.88 × 0.20 = 0.176
context:    0.82 × 0.15 = 0.123
─────────────────────────────────
CTQ = 0.854    Risk = 0.146

6.4 Weight Declaration in Blueprint¶

Blueprint expresses metric weights only through checks[].metric.weight. There is no normative top-level scoring.weights block and there is no normative ctq shorthand block.

Public Blueprints MUST carry all runtime metric configuration in canonical checks[] entries before validation. Tooling MAY provide private authoring conveniences, but those conveniences MUST compile to the canonical checks[] model and MUST NOT appear in published Blueprint artifacts.

7. Threshold Mapping and Decisions [NORMATIVE]¶

7.1 Threshold Format¶

Threshold intervals are evaluated using closed-upper-bound semantics:

ok if risk_score <= thresholds.ok
nudge if thresholds.ok < risk_score <= thresholds.nudge
escalate if thresholds.nudge < risk_score <= thresholds.escalate
block if risk_score > thresholds.escalate
See ACGP-4 for the tripwire-only halt invariant.

intervention_policy.thresholds maps risk score boundaries to interventions:

intervention_policy:
  thresholds:
    ok: 0.25          # Risk ≤ 0.25 → ok
    nudge: 0.40       # 0.25 < Risk ≤ 0.40 → nudge
    escalate: 0.55    # 0.40 < Risk ≤ 0.55 → escalate
    # Risk > 0.55 → block (implicit)

7.2 Governance-Tier-Specific Thresholds [NORMATIVE]¶

Default risk-score thresholds by Governance Tier:

Governance Tier	OK	Nudge	Escalate	Block
GT-0	`risk ≤ 0.40`	`0.40 < risk ≤ 0.55`	`0.55 < risk ≤ 0.70`	`risk > 0.70` (implicit)
GT-1	`risk ≤ 0.30`	`0.30 < risk ≤ 0.45`	`0.45 < risk ≤ 0.60`	`risk > 0.60` (implicit)
GT-2	`risk ≤ 0.25`	`0.25 < risk ≤ 0.40`	`0.40 < risk ≤ 0.55`	`risk > 0.55` (implicit)
GT-3	`risk ≤ 0.20`	`0.20 < risk ≤ 0.35`	`0.35 < risk ≤ 0.50`	`risk > 0.50` (implicit)
GT-4	`risk ≤ 0.15`	`0.15 < risk ≤ 0.30`	`0.30 < risk ≤ 0.45`	`risk > 0.45` (implicit)
GT-5	`risk ≤ 0.10`	`0.10 < risk ≤ 0.25`	`0.25 < risk ≤ 0.40`	`risk > 0.40` (implicit)

3-threshold model: The block column above is implicit — risk scores exceeding the escalate boundary always map to block. There is no separate block threshold field in the schema.

7.3 Threshold Override Rules¶

When both blueprint and Governance Tier default thresholds exist, the stricter (lower) boundary applies:

effective_ok = min(blueprint_thresholds.ok, governance_tier_defaults.ok)
effective_nudge = min(blueprint_thresholds.nudge, governance_tier_defaults.nudge)
effective_escalate = min(blueprint_thresholds.escalate, governance_tier_defaults.escalate)
# block = risk > effective_escalate (implicit; no separate field)

Worked example: An agent in Governance Tier GT-5 is evaluated with a permissive blueprint (ok: 0.40, nudge: 0.55, escalate: 0.70). Governance Tier GT-5 defaults are ok: 0.10, nudge: 0.25, escalate: 0.40. Effective thresholds:

Threshold	Blueprint	Governance Tier GT-5	Effective (`min`)
ok	0.40	0.10	0.10
nudge	0.55	0.25	0.25
escalate	0.70	0.40	0.40

With risk score 0.30: blueprint alone → ok, but effective thresholds → escalate (0.30 > 0.10 so not ok; 0.30 > 0.25 so not nudge; 0.25 < 0.30 ≤ 0.40 → escalate). The Governance Tier floor prevents a high-risk agent from receiving permissive governance.

7.4 Decision Algorithm [NORMATIVE]¶

def determine_intervention(ctq_score, governance_tier, tripwires):
    # Tripwires ALWAYS take precedence
    if tripwires:
        priority = ["halt", "block", "escalate", "nudge", "ok"]
        decisions = {t.on_fail.decision for t in tripwires}
        for decision in priority:
            if decision in decisions:
                return decision

    # CTQ-based threshold mapping
    risk_score = 1.0 - ctq_score
    thresholds = get_effective_thresholds(governance_tier, blueprint)

    if risk_score <= thresholds.ok:
      return "ok"
    elif risk_score <= thresholds.nudge:
      return "nudge"
    elif risk_score <= thresholds.escalate:
      return "escalate"
    else:
      return "block"

Runtime tripwire precedence is determined only by the triggered on_fail.decision values. severity remains advisory authoring metadata and MUST NOT alter the runtime intervention.

8. Evidence Policy [NORMATIVE]¶

The optional evidence_policy block defines source-grounding requirements:

evidence_policy:
  require_citations: true
  certified_only: true
  min_sources: 2

Control	Type	Description
`require_citations`	boolean	Whether citations are required in governed output
`certified_only`	boolean	Whether only certified/verified sources are acceptable
`min_sources`	integer	Minimum number of qualifying sources required

If evidence controls are declared, they form a dimension-level admissibility gate for knowledge_grounding. Implementations MUST check the gate before any knowledge-grounding scorers run.

If the knowledge_grounding evidence gate fails:

status MUST be failed_evidence_policy
score MUST be 0.0
contributors MUST be []
knowledge-grounding weights MUST NOT be redistributed

If the gate passes, knowledge-grounding scorers MUST run normally.

Evidence policy alone MUST NOT deny the whole governed action. Whole-action denial on evidence failure requires an explicit tripwire outcome or a required extension / required control whose declared fail semantics deny or reject the action.

Evaluators MUST record evidence outcomes in evaluation artifacts. When evidence_policy is declared, evidence_summary SHOULD capture supplementary details such as whether the policy was declared, which controls were checked, and which controls passed or failed. evidence_summary is supplementary detail only and MUST NOT become a second verdict channel.

Richer source-category or trust-floor semantics MAY be implemented through extensions, internal policy compilation, or evaluator-specific private configuration, but they are not part of the canonical Blueprint core schema.

If Source Catalog capability is unavailable and the evidence model relies on optional source-match scoring, the optional scorer fallback in §5.3 applies. If source evidence is required through extensions.required[], fail_mode: deny, or tripwire-backed semantics, those explicit controls override the optional fallback and MUST fail closed as declared.

Advanced source-catalog semantics are defined in the Source Catalog extension.

9. Trust Policy Core (v1.0) [NORMATIVE]¶

Trust debt is a governance state concept exposed through the trust_policy Blueprint block and a provider model. The public observable semantics are normative for v1.0 interoperability. acgp.core.default@1 is the default deterministic provider. Other provider behavior MAY exist through extension boundaries, but implementations MUST preserve the deterministic public boundary defined here even when the underlying provider is private.

9.1 Observable Semantics [NORMATIVE]¶

All conformant implementations MUST preserve these observable semantics:

Trust debt is updated after threshold mapping and orthogonal flag attachment.
Trust debt threshold effects are deterministic and MUST follow §9.6.
Trust debt MUST NEVER relax an intervention's severity.
Trust debt participates in the canonical evaluation order before Governance Tier review and audit emission.
Public artifacts MAY expose current debt, debt deltas, provider identifiers, or opaque attestations, but MUST NOT be required to expose provider internals.
When flags.flagged is true, the trust-debt delta for that evaluation is the sum of the primary decision accumulation weight plus the configured flag accumulation weight.
Trust debt, intervention history, review posture, and other principal-scoped governance state MUST be keyed to agent_id, not to sender_id, agent_label, session_id, or other ephemeral identifiers.

9.2 Provider Configuration Schema¶

trust_policy:
  enabled: true
  provider:
    id: "acgp.core.default@1"
    visibility: public
  accumulation:
    ok: 0.0
    flag: 0.1
    nudge: 0.5
    escalate: 1.0
    block: 2.0
    halt: 5.0
  decay:
    decay_fraction: 0.05
    period_hours: 1
    min_debt: 0.0
  thresholds:
    elevated_monitoring: 3.0
    restricted_mode: 6.0
    re_tiering_review: 10.0

Note: decay_fraction = fraction of debt removed per period (e.g., 0.05 = 5% removed). acgp.core.default@1 names the default deterministic provider that reproduces the baseline semantics.

9.3 Default Provider Behavior [NORMATIVE]¶

If no provider is specified, implementations MUST behave as if provider.id = "acgp.core.default@1" were configured. acgp.core.default@1 is the default deterministic provider for v1.0 and preserves the accumulation/decay model used by v1.0 vectors:

def accumulate(current_debt: float, decision: str, flagged: bool, config) -> float:
  weight = config.accumulation.get(decision, 0.0)
  if flagged:
    weight += config.accumulation.get("flag", 0.0)
  return current_debt + weight

Note: ok interventions do NOT accrue debt by default (weight 0.0), but a flagged ok still accrues the configured flag weight. Omitting ok or flag from the accumulation map is treated as 0.0. Debt units are dimensionless and on a linear scale.

Rationale: halt remains part of trust-debt history even though it is terminal for the current governed action. Recording the halt accumulation preserves a monotonic governance history for post-incident review, operator scrutiny, and any subsequent session- or identity-level handling that reuses the same trust-debt provider state.

A halt intervention does not, by itself, reset accumulated trust debt unless an explicit reset or credit policy is applied.

Implementations MAY require administrative clearance, policy update, or manual reset before halted runtimes resume, but MUST preserve trust-debt observable semantics unless such a reset action is explicitly taken.

9.4 Provider Boundary [NORMATIVE]¶

The protocol standardizes observable effects, provider identifiers, extension descriptors, and optional opaque attestations. The public boundary is limited to externally verifiable behavior: emitted debt values or deltas, threshold-triggered actions, declared provider identity, and any optional attestation material. The protocol does not require public disclosure of exact private accumulation formulas, proprietary decay math, compounding rules internal to a private provider, or provider-specific derivation logic.

Private providers MUST still honor the observable semantics in §9.1 and MUST support auditable outputs at the public boundary.

9.5 Example Default Decay Algorithm [NORMATIVE FOR `acgp.core.default@1`]¶

Decay is computed using evaluation time (the steward's wall-clock timestamp at evaluation start), in UTC.

def apply_decay(current_debt: float, last_eval_time, eval_time, config) -> float:
    if last_eval_time is None:
        return current_debt  # No decay on first evaluation
    elapsed_hours = (eval_time - last_eval_time).total_seconds() / 3600.0
    periods = elapsed_hours / config.decay.period_hours
    decayed = current_debt * ((1.0 - config.decay.decay_fraction) ** periods)
    return max(decayed, config.decay.min_debt)

Fractional hours are used as-is (no rounding). This eliminates drift from floor/ceil rounding differences between implementations.

9.6 Trust Debt Threshold Effects [NORMATIVE]¶

Threshold handling is deterministic at the public boundary.

elevated_monitoring

MUST NOT change the primary intervention by itself.
MUST cause EVAL to include runtime_posture: "elevated_monitoring".
MUST cause a threshold-crossing event to be recorded in the Governance Store.

restricted_mode

MUST cause EVAL to include runtime_posture: "restricted_mode".
MUST apply a post-decision intervention floor of escalate.
ok and nudge MUST become escalate.
escalate, block, and halt MUST remain unchanged.
MUST cause a threshold-crossing event to be recorded in the Governance Store.

re_tiering_review

posture MUST remain restricted_mode.
MUST cause EVAL to include review_required: true.
MUST cause a review-trigger event to be recorded in the Governance Store.
actual review workflow remains deployment-defined.

Guardrails:

trust debt MUST NEVER produce halt.
See ACGP-4 for the tripwire-only halt invariant.

thresholds_crossed in public EVAL artifacts MUST represent the cumulative current-state set of active threshold labels and MUST NOT be serialized as an incremental per-evaluation list.

9.6.1 Operational Guidance [INFORMATIVE]¶

Deployments MAY increase logging verbosity, tighten internal evaluation heuristics, or notify operators when thresholds are crossed, but those behaviors are informative operator guidance rather than normative runtime semantics.

9.7 Provider Metadata and Attestation [NORMATIVE]¶

Blueprints and evaluation outputs MAY attach provider metadata using either a public descriptor or an opaque attestation:

trust_policy:
  provider:
    id: "urn:acgp:ext:trust-debt-private@1"
    visibility: private
    attestation:
      digest: "sha256:..."
      issued_by: "did:example:steward"

Private provider metadata MUST NOT require disclosure of formulas, source material, or provider-internal logic in public artifacts.

9.8 Example¶

Numeric note: This worked example is informative. Conformance is determined by the trust-debt vectors and the numeric tolerance rules defined in ACGP-6. Example values in this section are serialized to 4 decimal places unless otherwise stated.

Agent in Governance Tier GT-2, config: accumulation.block=2.0, accumulation.flag=0.1, decay.decay_fraction=0.05/1h

10:00  block  → debt = 0.0 + 2.0 = 2.0
10:30  block  → decay 0.5h: 2.0 × 0.95^0.5 = 1.9494 → + 2.0 = 3.9494
  → elevated_monitoring threshold (3.0) crossed
11:00  nudge + flag → decay 0.5h: 3.9494 × 0.95^0.5 = 3.8494 → + (0.5 + 0.1) = 4.4494
12:00  halt   → decay 1.0h: 4.4494 × 0.95^1.0 = 4.2269 → + 5.0 = 9.2269
  → restricted_mode threshold (6.0) crossed
12:10  block  → decay ~0.17h: 9.2269 × 0.95^0.17 = 9.1483 → + 2.0 = 11.1483
  → re_tiering_review threshold (10.0) crossed; Governance Tier review is triggered

9.9 Extension Boundary (Advanced Trust Debt)¶

Outside the default provider: piecewise/exponential decay curves, severity multipliers, multi-threshold graduated actions, compounding, and private provider semantics. Implementations MAY provide a TrustDebtProvider hook while preserving the observable semantics in §9.1.

9.10 Trust Debt Threshold Guardrails [NORMATIVE]¶

Blueprint-defined trust debt thresholds MUST NOT exceed the Clarity Baseline defaults by more than 2×. For example, if Clarity Baseline sets re_tiering_review: 10.0, a child blueprint MUST NOT set it above 20.0. Blueprints that exceed this limit MUST be rejected at load time with error code TRUST_DEBT_THRESHOLD_EXCEEDED.

10. Evaluation Flow [NORMATIVE]¶

Implementations MUST apply this sequence: resolve source blueprint -> validate resolved blueprint -> evaluate tripwires -> evaluate deterministic checks -> compute CTQ -> map thresholds -> attach flag state -> update trust policy state -> evaluate Governance Tier review triggers -> emit decision/evidence -> persist audit artifacts. Trust-policy updates for the current evaluation MUST observe the final primary decision plus any orthogonal flag.

For each governed action, implementations MUST follow this ordering:

1. Resolve applicable source Blueprint into a resolved Blueprint artifact
        ↓
2. Validate resolved Blueprint configuration integrity
        ↓
3. Evaluate tripwires first (ACGP-4 precedence rules)
  → Apply each triggered tripwire's explicit `on_fail.decision`
  → If multiple tripwires fire, the strictest explicit decision wins (`halt` > `block` > `escalate` > `nudge` > `ok`)
  → `severity` is authoring metadata only and MUST NOT alter runtime decision selection
        ↓  (no tripwires fired)
3b. Evaluate rule-based checks from checks[] array
  → Rule checks producing `ok`/`nudge`/`escalate`/`block` are treated as direct policy outputs
  → The strictest decision between rule-check outputs and CTQ-derived threshold outputs MUST win
  → Rule checks MUST NOT produce halt (only tripwires may)
  → If a rule check declares on_fail.decision: halt, the Blueprint MUST be rejected at load time
  → Rule checks with flag: true set flags.flagged on the intervention
        ↓
4. Evaluate CTQ metrics (5 standard dimensions)
   → Apply scorer for each metric
   → Compute weighted average
        ↓
5. Convert to risk score: Risk = 1.0 - CTQ
  → Apply effective thresholds (stricter of blueprint vs Governance Tier defaults, serialized as `GT-*` values)
        ↓
6. Evaluate flag conditions:
   (a) any rule-based check with flag: true that matched
   Flag is orthogonal — it attaches to whatever primary decision was determined in steps 3–5.
        ↓
7. Update trust debt state
  → Key principal-scoped state to `agent_id`
  → Compute `pre` from the decayed trust-debt state at evaluation start
  → Compute `delta` from the final primary decision plus any orthogonal flag contribution
  → Compute `post = pre + delta`
  → Compute cumulative `thresholds_crossed`
  → Derive `runtime_posture` from the active threshold set
  → If `restricted_mode` is active, apply a post-decision intervention floor of `escalate`
  → If `re_tiering_review` is active, set `review_required: true` while keeping posture `restricted_mode`
        ↓
8. Emit canonical EVAL artifact + intervention decision
  → Top-level EVAL `intervention` is the final post-floor outcome
  → Any pre-posture intervention MAY appear only in subordinate metadata
        ↓
9. Log TRACE + EVAL + INTERVENTION to the Governance Store

10.1 Failure Semantics Matrix [NORMATIVE]¶

Each row below defines the trigger condition, immediate behavior, public EVAL consequence, and authoritative source reference.

Trigger condition	Immediate behavior	EVAL consequence	Authoritative source
Invalid envelope or schema	Reject immediately	No EVAL emitted	ACGP-2 §4
Authentication or signature failure	Reject immediately	No EVAL emitted	ACGP-2 §4.1, §4.4, §9
Blueprint validation error	Reject at load / activation time	No EVAL emitted	ACGP-3 §2, §3.3; ACGP-4 §9
Required local extension unavailable with `fail_mode: reject_activation`	Reject activation	No EVAL emitted	ACGP-3 §2.2.1
Duplicate `message_id` replay with identical bytes	Return idempotent result; do not reprocess	No new EVAL emitted	ACGP-2 §8.5
Tripwire runtime evaluation failure	Fail closed per triggered tripwire `on_fail`	EVAL, if emitted, MUST reflect the fail-closed tripwire outcome	ACGP-4 §10
Required scorer runtime error without fallback	Preserve evaluator failure explicitly	Dimension `status: "error"`, `score: 0.0`, attempted contributors visible, no redistribution	ACGP-3 §6.1.2
Required scorer runtime error with declared fallback	Use declared degraded path	Dimension `status: "degraded"`, fallback score, attempted contributors visible, no redistribution	ACGP-3 §6.1.2
Optional `source-match` unavailable	Mark scorer unavailable and redistribute only where allowed	Dimension `status: "unavailable"`, `contributors: []`, redistribution only where explicitly allowed	ACGP-3 §5.3
Evidence policy precondition not met	Gate `knowledge_grounding` before scorer execution	Dimension `status: "failed_evidence_policy"`, `score: 0.0`, `contributors: []`, no redistribution	ACGP-3 §8
Required extension with `fail_mode: deny` at decision time	Deny or block as declared by policy	Emitted deny / block outcome with failure recorded in EVAL metadata	ACGP-3 §2.2.1
Steward or session path unavailable	Apply profile fallback	Use `deny`, `allow_and_log`, or `cached_decision` per profile fallback	ACGP-1 §6.6
Per-evaluation timeout policy	Extension-defined preview behavior only	Out of scope for v1.0 Standard EVAL contract unless preview extension is negotiated	ACGP-1 §6.6; Runtime Governance Contracts preview

10.2 Canonical EVAL Artifact [NORMATIVE]¶

EVAL is the single canonical governance outcome artifact for a governed action. Implementations MUST NOT introduce a separate peer core receipt artifact that competes with EVAL as the public outcome record.

The top-level intervention field in EVAL MUST represent the final emitted post-floor outcome. If an implementation wishes to preserve the pre-posture intervention for provenance, it MAY expose that value only in an optional subordinate field such as evaluation_metadata.pre_posture_intervention. Implementations MUST NOT emit two peer top-level intervention fields representing pre-floor and post-floor outcomes.

Top-level EVAL fields:

Field	Requirement	Notes
`trace_id`	MUST	Canonical trace identifier
`parent_trace_id`	MUST when present	Immediate parent linkage only
`blueprint_id`	MUST	Resolved or effective blueprint identifier
`governance_tier`	MUST	Serialized as `GT-*`
`ctq_dimensions`	MUST	Five canonical public dimensions
`ctq_score`	MUST	Final aggregate score
`risk_score`	MUST	`1.0 - ctq_score`
`tripwires_triggered`	MUST	Empty array if none
`intervention`	MUST	Final post-floor outcome
`flagged`	MUST	Orthogonal flag state
`runtime_posture`	MUST	`normal`, `elevated_monitoring`, or `restricted_mode`
`review_required`	MUST	Public review-trigger visibility
`trust_debt`	MUST when trust debt is enabled	Core observable trust-debt block
`resolved_blueprint_digest`	SHOULD	Digest of effective blueprint
`evidence_summary`	SHOULD when `evidence_policy` is declared	Supplementary evidence detail only
`audit_ref`	SHOULD	Audit linkage handle
`trust_debt_extended`	SHOULD when private-provider metadata exists	Optional extension / private-provider metadata

When trust debt is enabled, the trust_debt object MUST include:

Field	Requirement	Notes
`provider_id`	MUST	Public provider identifier
`pre`	MUST	Debt value after decay and before current accumulation
`delta`	MUST	Current evaluation contribution
`post`	MUST	Debt value after current accumulation
`thresholds_crossed`	MUST	Cumulative current-state threshold labels

thresholds_crossed is cumulative current state, not an incremental per-evaluation event list.

Example EVAL with failed_evidence_policy and elevated monitoring:

{
  "trace_id": "trace-eval-001",
  "blueprint_id": "finance_qa@2.1",
  "governance_tier": "GT-2",
  "ctq_dimensions": {
    "reasoning_quality": {
      "score": 0.91,
      "weight": 0.25,
      "status": "evaluated",
      "contributors": ["rationale_clarity", "plan_completeness"]
    },
    "knowledge_grounding": {
      "score": 0.0,
      "weight": 0.20,
      "status": "failed_evidence_policy",
      "contributors": []
    },
    "ethical_alignment": {
      "score": 0.94,
      "weight": 0.20,
      "status": "evaluated",
      "contributors": ["fairness_review"]
    },
    "tool_safety": {
      "score": 0.89,
      "weight": 0.20,
      "status": "evaluated",
      "contributors": ["permission_check"]
    },
    "context_awareness": {
      "score": 0.87,
      "weight": 0.15,
      "status": "evaluated",
      "contributors": ["situational_fit"]
    }
  },
  "ctq_score": 0.8035,
  "risk_score": 0.1965,
  "tripwires_triggered": [],
  "intervention": "ok",
  "flagged": false,
  "runtime_posture": "elevated_monitoring",
  "review_required": false,
  "trust_debt": {
    "provider_id": "acgp.core.default@1",
    "pre": 2.95,
    "delta": 0.10,
    "post": 3.05,
    "thresholds_crossed": ["elevated_monitoring"]
  },
  "evidence_summary": {
    "policy_declared": true,
    "controls_checked": ["require_citations", "certified_only", "min_sources"],
    "control_results": {
      "require_citations": "passed",
      "certified_only": "passed",
      "min_sources": "failed"
    }
  }
}

Example EVAL with restricted mode, review trigger, and post-floor intervention:

{
  "trace_id": "trace-eval-002",
  "parent_trace_id": "trace-root-900",
  "blueprint_id": "payments.standard@1.0",
  "governance_tier": "GT-3",
  "ctq_dimensions": {
    "reasoning_quality": {
      "score": 0.93,
      "weight": 0.25,
      "status": "evaluated",
      "contributors": ["rationale_clarity", "plan_completeness"]
    },
    "knowledge_grounding": {
      "score": 0.88,
      "weight": 0.20,
      "status": "degraded",
      "contributors": ["citation_coverage"],
      "notes": "Fallback path used after catalog latency breach"
    },
    "ethical_alignment": {
      "score": 0.95,
      "weight": 0.20,
      "status": "evaluated",
      "contributors": ["fairness_review"]
    },
    "tool_safety": {
      "score": 0.92,
      "weight": 0.20,
      "status": "error",
      "contributors": ["permission_check"],
      "notes": "Primary sandbox evaluator failed without declared fallback"
    },
    "context_awareness": {
      "score": 0.90,
      "weight": 0.15,
      "status": "unavailable",
      "contributors": [],
      "notes": "Optional context adapter unavailable; redistribution declared separately"
    }
  },
  "ctq_score": 0.7360,
  "risk_score": 0.2640,
  "tripwires_triggered": [],
  "intervention": "escalate",
  "flagged": true,
  "runtime_posture": "restricted_mode",
  "review_required": true,
  "trust_debt": {
    "provider_id": "acgp.core.default@1",
    "pre": 9.80,
    "delta": 0.60,
    "post": 10.40,
    "thresholds_crossed": ["elevated_monitoring", "restricted_mode", "re_tiering_review"]
  },
  "evaluation_metadata": {
    "pre_posture_intervention": "ok",
    "fallback_policy": "mixed_declared_policies",
    "review_event": "queued_for_governance_tier_review"
  },
  "audit_ref": "audit:trace-eval-002"
}

11. Authoring Guidance [INFORMATIVE]¶

Blueprint authors SHOULD:

Use base references only when the parent artifact is stable and governed
Place hard-stop safety constraints in tripwires (ACGP-4)
Begin with conservative thresholds and calibrate with benchmarks
Declare explicit evidence-policy requirements for regulated domains
Avoid embedding workflow orchestration logic in policy blueprints
Publish fixtures alongside reusable blueprints so expected outcomes remain testable and reviewable

12. Conformance Requirements¶

A conformant ACGP-3 implementation MUST:

Parse and validate Blueprint source artifacts in YAML 1.2 and JSON
Parse and validate resolved Blueprint artifacts when they are exchanged or stored directly
Implement base resolution with the merge semantics defined in §2.4
Support the scorer interface for all five standard scorer types (accept output structures and process results correctly). Actual model inference (Tier-2 cognitive-evaluator, Tier-3 hybrid) is implementation-defined and NOT required for v1.0 conformance. Conformance test vectors supply deterministic scorer outputs.
Evaluate all five CTQ metrics with weighted aggregation (weights summing to 1.0)
Enforce threshold mapping using risk score (\(1.0 - CTQ\))
Apply the stricter of blueprint thresholds vs Governance Tier defaults (serialized GT-* values)
Enforce declared evidence controls
Preserve the trust debt observable semantics defined in §9.1 and the provider boundary defined in §9.4
Support the default deterministic provider acgp.core.default@1 and reproduce its public accumulation/decay behavior when that provider is selected
Preserve evaluation ordering: tripwires -> deterministic checks -> CTQ -> thresholds -> flag -> trust debt -> governance-tier review -> audit
Emit EVAL as the canonical governance outcome artifact, with top-level intervention representing the final post-floor outcome when trust debt is enabled

13. Extension Boundaries¶

Extension	Scope
Source Catalog	Source catalog integrations, advanced source trust
Advanced Trust Debt	Advanced trust debt: decay, recovery, compounding

Normative References¶

RFC 2119 — Key words for use in RFCs to Indicate Requirement Levels
RFC 3339 — Date and Time on the Internet: Timestamps
RFC 8785 — JSON Canonicalization Scheme (JCS)
ACGP-1 — Core Concepts & Terminology, v1.0, 2026
ACGP-2 — Messages & Wire Protocol, v1.0, 2026
ACGP-3 — Blueprints, Traces & Evaluation, v1.0, 2026
ACGP-4 — Tripwires & Safety Semantics, v1.0, 2026
ACGP-5 — Audit & Privacy Controls, v1.0, 2026
ACGP-6 — Conformance, v1.0, 2026