Blueprints, Traces & Evaluation

Status: Standard-only Alpha (v1.0.0-alpha.2)
Last Updated: 2026-03-18
Spec ID: ACGP-3
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119 and RFC 8174)

Abstract

This specification defines Blueprint source artifacts, resolved Blueprint artifacts, canonical checks[] semantics, cognitive trace semantics, the CTQ scorer framework, score-to-intervention threshold mapping, evidence policy, extension descriptors, and trust-policy semantics. It is the primary normative reference for anyone building a Policy Engine.

ACGP-3 is the authoritative normative reference for runtime evaluation ordering, check semantics, CTQ aggregation, threshold mapping, and trust-policy observable semantics.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. Scope [NORMATIVE]

ACGP-3 defines governance policy semantics for:

  • Blueprint source artifact schema and resolved artifact schema
  • Base-resolution and merge behavior
  • Cognitive trace model and evaluation ordering
  • Canonical checks[] semantics for kind: rule | metric
  • CTQ metrics, evaluator types, and weighted aggregation
  • Score-to-intervention threshold mapping with Governance Tier interaction
  • Evidence-policy / source-grounding requirements
  • Trust-policy observable semantics, default provider behavior, and provider metadata

ACGP-3 does not define:

  • Transport or envelope mechanics (→ ACGP-2)
  • Full tripwire DSL grammar and fail-closed semantics (→ ACGP-4)
  • Audit retention or privacy controls (→ ACGP-5)
  • Conformance test suites (→ ACGP-6)

Minimal Useful Standard Implementation [INFORMATIVE]

A minimal useful ACGP v1.0 Standard implementation does not need to implement every advanced deployment pattern.

A practical starting point is: - one Governance Steward - one Operating Agent runtime - one active baseline blueprint - deterministic checks - tripwire evaluation - CTQ aggregation using the five standard metrics - threshold mapping - orthogonal flag support - default trust debt provider behavior - Governance Store or equivalent durable governance persistence

For conformance, implementations MUST correctly process scorer outputs for all standard scorer types. Actual model inference for Tier-2 (cognitive-evaluator) and Tier-3 (hybrid) scorers is implementation-defined and not required by v1.0 conformance test vectors.

This is the minimum useful alpha.2 implementation path: a single active policy, one runtime applying governance before action execution, and durable audit storage for emitted evidence and decisions.


Reading Map [INFORMATIVE]

  • Blueprint schema and artifact model
  • Cognitive trace model
  • Evaluation ordering
  • Check semantics
  • CTQ aggregation
  • Threshold mapping
  • Evidence policy
  • Trust debt and provider observables
  • Canonical EVAL artifact

2. Blueprint Schema [NORMATIVE]

A Blueprint source artifact MUST be a valid YAML 1.2 or JSON document. Blueprints are the primary mechanism for translating organizational policies into enforceable runtime rules. They are authored by domain experts and loaded by the Governance Steward. Evaluation engines consume resolved Blueprint artifacts derived from source artifacts.

2.1 Required Top-Level Fields

Field Type Description
artifact_type string Source-artifact discriminator. MUST be acgp.blueprint
schema_version string Schema version for the Blueprint source artifact
id string Unique identifier, RECOMMENDED format: domain/name@version
version string Semantic version for the source artifact
title string Human-readable policy title
description string Concise explanation of the blueprint's purpose
checks array Canonical rules and metrics to evaluate at runtime
intervention_policy object Threshold mapping from risk score to intervention

2.2 Optional Top-Level Fields

Field Type Description
base object Parent source Blueprint reference and optional digest pin
applicability object Restricts activation by tier, tool, domain
tripwires array Safety checks evaluated before checks[]
evidence_policy object Source-grounding requirements
trust_policy object Accumulation/decay policy for trust tracking
extensions object Public descriptors for required or optional extensions
annotations object Non-normative annotations
fixtures array Embedded policy fixtures for expected outcomes

The following fields are not part of Blueprint core and MUST NOT appear in a conformant Blueprint source artifact:

  • name
  • ctq
  • performance_budget
  • fallback_behavior
  • metadata
  • raw inherits
  • tripwire_syntax_version

2.2.1 Extension Metadata [NORMATIVE]

Blueprints MAY declare portable extension requirements using an extensions block:

extensions:
  required:
    - id: "urn:acgp:ext:source-catalog-private@1"
      visibility: private
      enforcement_scope: remote
      fail_mode: reject_activation
      attestation:
        digest: "sha256:..."
        policy_pack_id: "pp:2026-03-10:7"
  optional:
    - id: "urn:acgp:ext:contracts@1"
      visibility: public

Normative rules:

  • extensions.required[] descriptors MUST be preserved through validation, sync, and activation.
  • Required extensions MUST declare enforcement_scope as local, remote, or both.
  • The authoritative enforcer is the local runtime for local, the Governance Steward or remote authoritative runtime for remote, and both runtimes for both.
  • local scope requires enforcement by the local runtime applying the bundle; if local enforcement is unavailable, the descriptor's fail_mode MUST apply.
  • remote scope requires preservation and negotiation by the local runtime, including descriptor metadata and any attestation, but MUST NOT by itself block local activation when enforcement is performed only by the remote authoritative runtime.
  • both scope requires support by both authoritative enforcers on the sync boundary; if either side lacks support, the descriptor's fail_mode MUST apply.
  • Unsupported optional extensions MAY be preserved and ignored.
  • private descriptors expose only public negotiation metadata plus optional opaque attestation.
  • local extensions MUST remain deployment-local metadata and MUST NOT create hidden portability requirements in public blueprints.

The consolidated activation/evaluation outcomes are:

Descriptor enforcement_scope Local support Remote support Result
required local Yes N/A Activate and enforce locally
required local No N/A Apply fail_mode (reject_activation or deny)
required remote No or partial Yes Preserve descriptor locally, activate, and rely on remote authoritative enforcement
required remote Any No Apply fail_mode at activation or decision time, depending on descriptor semantics
required both Yes Yes Activate and require enforcement on both sides
required both No on either side No on either side Apply fail_mode
optional local, remote, or both Supported Supported or preserved Activate and apply where supported
optional local, remote, or both Unsupported Unsupported Preserve if possible and ignore without failing activation

This table is the single consolidated truth table for local / remote / both handling. ACGP-1 and ACGP-2 cross-reference this section rather than redefining the matrix.

2.3 Applicability Block

applicability MAY restrict blueprint activation by context:

applicability:
  governance_tiers: [GT-3, GT-4]   # Governance Tier values this blueprint applies to
  tools: ["execute_trade"]      # Tools governed by this blueprint
  domains: ["production"]       # Deployment domains
  out_of_scope_behavior: block

If omitted, the blueprint is globally applicable.

2.4 Base Resolution and Inheritance

If base is present, the child source artifact MUST be resolved against the parent source artifact before evaluation. Resolution produces a first-class resolved Blueprint artifact. The base object uses this minimum shape:

base:
  ref: finance/base@2.0
  digest: sha256:1234...

Child values are merged with parent values using these normative rules:

Field Category Merge Behavior
id, version, title, description Always child-defined
base Used only for lineage and resolution inputs
annotations Child replaces parent entirely
applicability Child overrides parent entirely when present
tripwires Append; if child defines a tripwire with the same id as parent, child definition replaces that parent tripwire
checks Append; if child defines a check with the same id as parent, child definition replaces that parent check
intervention_policy.thresholds Child overrides parent per key
evidence_policy Child overrides parent per key
trust_policy Child overrides parent per key
extensions.required Append; if child defines the same id, child replaces parent
extensions.optional Append; if child defines the same id, child replaces parent

Implementations MUST reject blueprints whose inheritance chain contains a cycle. Cycle detection MUST occur before merge resolution begins. The recommended error code is CircularBlueprintInheritance. An implementation MUST NOT attempt partial merge resolution once a cycle has been detected.

Safety note: because tripwires append by default, adding tripwires in a child blueprint never removes parent safety boundaries. To modify a parent tripwire, the child MUST declare a tripwire with the same id, which replaces only that specific tripwire.

Resolved Blueprint artifacts MUST carry at least these additional fields:

source_blueprint:
  ref: finance/desk-a@2.0
lineage:
  - ref: finance/base@2.0
  - ref: finance/desk-a@2.0
resolved_at: 2026-03-18T10:00:00Z
effective:
  valid_from: 2026-03-18T10:00:00Z
resolution_metadata:
  resolver_version: 2.0.0

Inheritance merge example:

# parent: finance/base@2.0
tripwires:
  - id: max_trade
    condition: args.trade_value > 50000
    on_fail: { decision: "block", reason: "Trade cap exceeded" }

# child: finance/desk-a@2.0
base:
  ref: finance/base@2.0
tripwires:
  - id: max_trade
    condition: args.trade_value > 25000
    on_fail: { decision: "block", reason: "Desk-A stricter cap" }
  - id: sanctions_check
    condition: contains_entity(args.counterparty, "sanctioned_org")
    on_fail: { decision: "halt", reason: "Sanctioned counterparty" }

Result: max_trade is replaced by child definition; sanctions_check is appended.

2.4.1 Operational Resource Limits [NORMATIVE]

Implementations MUST define and document operational limits for blueprint size, inheritance depth, and tripwire/check counts.

For v1.0 interoperability guidance, the following minimum rejection thresholds are RECOMMENDED:

  • maximum inheritance depth: 16
  • maximum tripwire count per blueprint: 256
  • maximum check count per blueprint: 256
  • maximum serialized blueprint size: 1 MiB

2.5 Blueprint Versioning

Blueprint versions MUST follow Semantic Versioning 2.0.0:

  • MAJOR: Breaking changes (threshold changes, removed checks, semantic changes)
  • MINOR: Backward-compatible additions (new optional checks, new tripwires)
  • PATCH: Bug fixes, clarifications, documentation

2.6 Resolved Blueprint Consumption

Evaluation engines MUST consume resolved Blueprint artifacts rather than ad hoc merged source trees. A source Blueprint MAY be evaluated only after it has been compiled into the resolved form defined in §2.4.


3. Checks Block [NORMATIVE]

ACGP v1.0 uses three distinct evaluation constructs:

  1. Tripwires — Top-level tripwires[] safety boundaries evaluated before checks[]. They may produce halt and short-circuit CTQ evaluation.
  2. Deterministic rule checkschecks[] entries with kind: rule. They produce direct pass/fail policy outcomes (ok, nudge, escalate, block) and MAY set flags.flagged.
  3. CTQ metric checkschecks[] entries with kind: metric. They contribute weighted evidence to CTQ aggregation and never produce halt directly.

Authoring decision tree:

  • If the condition is a hard safety boundary that must fire regardless of CTQ score, use tripwires[].
  • If the condition is a direct pass/fail policy rule with an explicit intervention, use a rule-based check.
  • If the condition contributes to runtime quality scoring, use a metric-based check.

The checks array defines the latter two constructs.

3.1 Rule Checks

Pass/fail policy enforcement:

checks:
  - id: single_trade_volume_cap
    kind: rule
    when:
      hook: "tool_call"
      tool: "execute_trade"
    condition: "args.trade_value <= 50000"
    on_fail:
      decision: "block"
      reason: "Trade value exceeds $50,000 limit."
    flag: false

decision MUST be one of: ok, nudge, escalate, block. Rule checks MUST NOT declare on_fail.decision: "halt". Blueprints containing halt in a rule-check decision MUST be rejected at load time with error code InvalidBlueprintHaltInRule.

Rule checks MAY additionally declare flag: true | false (default false). When flag: true, a matching check sets flags.flagged: true on the intervention without altering the primary decision.

3.2 Metric Checks

Weighted scoring that contributes to the final CTQ score:

checks:
  - id: trade_rationale_quality
    kind: metric
    when:
      hook: "tool_call"
      tool: "execute_trade"
    metric:
      name: "reasoning_quality"
      weight: 0.6
      evaluator:
        kind: "cognitive-evaluator"
        args:
          prompt_template: "templates/check_risk_rationale.txt"

3.3 Check Validation Rules

The outer checks[] envelope is canonical. Implementations MUST reject any check item that mixes rule-only and metric-only fields. Specifically:

  • a kind: rule check MUST declare condition and on_fail
  • a kind: rule check MUST NOT declare metric
  • a kind: metric check MUST declare metric
  • a kind: metric check MUST NOT declare condition or on_fail

4. Cognitive Trace Model [NORMATIVE]

A conformant evaluator MUST operate on a normalized cognitive trace object.

4.1 Minimum Trace Fields

Field Required Description
trace_id REQUIRED Unique identifier
session_id REQUIRED Session context
hook REQUIRED Lifecycle hook that triggered the trace
agent_id REQUIRED Stable identifier of the governed agent principal within a deployment-defined trust domain
action REQUIRED Action being evaluated; MUST be an object with required name and optional parameters
context REQUIRED Contextual data
reasoning RECOMMENDED Agent's reasoning chain

timestamp MUST come from the envelope timestamp (ACGP-2). TRACE payloads MUST NOT carry a top-level payload timestamp.

The model may supply reasoning and action content, but it does not self-authorize its governing identity. agent_id MUST be assigned by runtime or operator control. ACGP distinguishes the governed principal from mutable display names and from the envelope sender.

4.2 Tool Activity Fields

When tool activity exists, traces SHOULD include:

Field Description
tool Tool name
args Tool arguments
result Tool execution result
tool_calls Array of tool invocations

ACGP-4 canonical tripwire field roots resolve against this trace model. For the authoritative semantic meaning of trace fields used by tripwire expressions, see ACGP-4 §4.1.

4.3 Evidence Fields

When evidence is required, traces SHOULD include source/citation metadata for grounding checks (see Section 8).

4.4 Principal Identity Example

The following trace fragment shows an opaque principal identifier with a human-readable label in meta:

{
  "trace_id": "01958249-4d55-7000-8000-000000000111",
  "agent_id": "urn:acgp:agent:financeops:prod:7f4c9d2a",
  "session_id": "01958249-4d55-7000-8000-000000000112",
  "hook": "tool_call",
  "context": {},
  "governance_tier": "GT-3",
  "action": {
    "name": "approve_invoice",
    "parameters": { "invoice_id": "INV-1042" }
  },
  "meta": {
    "agent_label": "FinanceOps-Agent",
    "model": "gpt-5-pro"
  }
}

5. Scorer Types and Interface [NORMATIVE]

Implementations MUST support five standard scorer families:

5.1 rule-based Scorer

Fast, deterministic evaluation using predefined rules.

scorer: "rule-based"
parameters:
  rules: [string]           # List of rule IDs
  mode: "all"               # Enum: "all" (AND) or "any" (OR). Default: "all"
# Output: 1.0 if rules pass, 0.0 if fail (binary)
  • Latency: <10ms | Eval Tier: 0

5.2 cognitive-evaluator Scorer

A cognitive-evaluator is a non-deterministic evaluator that scores trace quality using an implementation-defined cognitive assessment backend. In alpha implementations, cognitive-evaluator is commonly backed by an LLM-based scorer, but the protocol intentionally standardizes the evaluator role rather than the model family.

scorer: "cognitive-evaluator"
parameters:
  model: string             # Model identifier
  prompt_template: string   # Template ID for evaluation prompt
  temperature: float        # Sampling temperature (default: 0.0)
  max_tokens: int           # Max response tokens (default: 256)
# Output: float 0.0-1.0 based on cognitive assessment
  • Latency: 500ms-5000ms | Eval Tier: 2

The protocol standardizes the scorer input/output interface. Model selection, accuracy, and latency are implementation-defined.

5.3 source-match Scorer

Evaluates grounding against certified sources.

scorer: "source-match"
parameters:
  required_sources: [string]     # Registry categories required (public profile)
  policy_handle: string          # Opaque handle for private-source-backed policy
  min_citation_ratio: float      # Min % of claims with citations
  min_source_trust: float        # Minimum source trust score
# Output: float 0.0-1.0 based on citation coverage
  • Latency: 50ms-200ms | Eval Tier: 1

Public blueprints MUST NOT be required to disclose private source identifiers, document locators, clause references, or derivation provenance for source-match policies. Deployments MAY instead publish an opaque policy_handle or extension attestation that signals the presence of a private source-backed control.

Optional source-match fallback [NORMATIVE]:

  • If an optional source-match scorer cannot execute because Source Catalog capability is unavailable or the catalog query fails, that scorer MUST be marked unavailable.
  • The unavailable scorer's weight MUST be redistributed proportionally across the remaining successfully executed scorers in the same aggregation set.
  • If no scorers remain in that aggregation set after unavailability handling, the aggregation set MUST fail and the enclosing control's explicit fail semantics apply.
  • This fallback does not apply when source evidence is encoded as a required control, a required extension, or a tripwire-backed fail path. Those cases follow their own explicit fail_mode or tripwire semantics.

5.4 pattern-match Scorer

Regex or pattern-based content scanning.

scorer: "pattern-match"
parameters:
  patterns:
    - pattern: "regex-pattern"
      score_on_match: float
      score_on_miss: float
  aggregation: "min"         # Enum: "min", "max", or "avg"
# Output: float 0.0-1.0 based on pattern matching
  • Latency: <50ms | Eval Tier: 0

5.5 hybrid Scorer

Combines multiple scorer types with weighted aggregation.

scorer: "hybrid"
parameters:
  scorers:
    - type: "rule-based"
      weight: 0.4
      parameters: { ... }
    - type: "cognitive-evaluator"
      weight: 0.6
      parameters: { ... }
  aggregation: "weighted_average"  # Enum: "weighted_average", "min", or "max"
  • Latency: Depends on components | Eval Tier: Highest of components

5.6 Scorer Interface [NORMATIVE]

Custom scorers MUST conform to this interface:

class Scorer(Protocol):
    scorer_type: str
    eval_tier: int
    latency_budget_ms: int

    def evaluate(self, trace: CognitiveTrace, parameters: dict) -> ScorerResult: ...

@dataclass
class ScorerResult:
    score: float           # 0.0 to 1.0
    confidence: float      # 0.0 to 1.0
    explanation: str       # Human-readable
    evidence: dict         # Supporting data
    latency_ms: int        # Actual evaluation time

5.7 Eval Tier Definitions [NORMATIVE]

Tier Strategy v1.0 Status Latency Budget
0 In-memory / rule-based Defined < 100 ms
1 Stateful lookup (DB/cache) Defined < 300 ms
2 Model inference (e.g., cognitive-evaluator) Reserved Implementation-defined
3 Human-in-the-loop review Reserved Implementation-defined

v1.0 constraint: Conformance testing validates Tiers 0 and 1 only. Tiers 2 and 3 are reserved, non-core extension space in v1.0. Implementations MAY support Tiers 2–3 as non-normative extensions, but MUST NOT require them for Standard or Safety-Critical conformance in v1.0. See also ACGP-4 §3 (tripwires MUST NOT declare eval_tier > 1 in v1.0 core).


6. CTQ Metrics and Aggregation [NORMATIVE]

6.1 Five Standard Dimensions

CTQ evaluates runtime quality across five dimensions:

Dimension Weight Range Evaluates
Reasoning Quality 0.20-0.30 Logical consistency, completeness, soundness
Knowledge Grounding 0.15-0.25 Source reliability, factual correctness, recency
Ethical Alignment 0.15-0.25 Harm prevention, fairness, transparency, privacy
Tool Safety 0.15-0.25 Permission validation, input safety, error handling
Context Awareness 0.10-0.20 Relevance, appropriateness, completeness, timing

Weights MUST sum to 1.0. Each weight MUST fall within its allowed range.

If declared weights do not sum to 1.0 (within tolerance ±0.001 to accommodate floating-point representation), the blueprint MUST be rejected at load time with error code INVALID_BLUEPRINT_WEIGHTS. Implementations MUST NOT silently normalize weights.

6.1.1 Metric-to-Dimension Mapping [NORMATIVE]

Each metric-based check MUST contribute to exactly one of the five standard CTQ dimensions.

Portable v1.0 blueprints SHOULD use the canonical dimension names directly as metric.name:

  • reasoning_quality
  • knowledge_grounding
  • ethical_alignment
  • tool_safety
  • context_awareness

Deployments MAY use more specific local metric names during authoring or preprocessing, but those names MUST be mapped to exactly one canonical CTQ dimension before runtime validation and conformance testing.

When multiple metric-based checks map to the same dimension, their weighted contributions are additive within the global CTQ calculation. The dimension's effective weight is the sum of the weights of the checks mapped to it.

Worked example:

checks:
  - id: rationale_clarity
    metric:
      name: reasoning_quality
      weight: 0.15
  - id: plan_completeness
    metric:
      name: reasoning_quality
      weight: 0.10
  - id: citation_coverage
    metric:
      name: knowledge_grounding
      weight: 0.20
  - id: fairness_review
    metric:
      name: ethical_alignment
      weight: 0.20
  - id: permission_check
    metric:
      name: tool_safety
      weight: 0.20
  - id: situational_fit
    metric:
      name: context_awareness
      weight: 0.15

If rationale_clarity = 0.80 and plan_completeness = 0.90, the combined weighted contribution to the reasoning_quality dimension is (0.80 × 0.15) + (0.90 × 0.10) = 0.21, and the effective reasoning dimension weight is 0.25.

6.1.2 Dimension Result Status [NORMATIVE]

Public CTQ results MUST represent evaluator state separately from content quality.

Each CTQ dimension result MUST use one of these status values:

  • evaluated — score produced through the normal intended scorer path.
  • degraded — score produced through a declared fallback path.
  • unavailable — intended scorer path could not evaluate the dimension because the capability or infrastructure was absent.
  • error — the scorer was attempted and failed unexpectedly with no declared fallback.
  • failed_evidence_policy — the dimension precondition was not met, so the dimension was gated before scorer execution.

Contributor visibility is part of the public contract:

  • failed_evidence_policy MUST serialize contributors: [].
  • unavailable MUST serialize contributors: [].
  • error MUST keep attempted contributor identifiers visible.
  • degraded MUST keep attempted contributor identifiers visible.

Implementations MUST distinguish evaluator state from content-quality score. A missing scorer, timeout, unavailable extension, evidence-system outage, or evaluator failure MUST NOT be silently serialized as an ordinary low-quality score unless the Blueprint or scorer policy explicitly defines that fallback behavior.

ACGP standardizes CTQ dimensions, weight rules, aggregation, scorer identifiers, public result fields, and decision semantics; it does not standardize internal prompts, learned heuristics, calibration datasets, source-ranking logic, or model-routing internals.

6.2 Score Interpretation

For each metric:

Score Quality Description
0.0-0.3 Poor Severe quality issues
0.3-0.6 Fair Significant gaps
0.6-0.8 Good Generally sound, minor issues
0.8-1.0 Excellent Clear, complete, verified

6.3 CTQ Calculation [NORMATIVE]

CTQ = sum(score_i × weight_i)  for i = 1..5

Risk score is derived as:

risk_score = 1.0 - CTQ

When one or more dimensions are not fully evaluated, the public result artifact MUST preserve the actual dimension status.

The Blueprint or scorer policy MUST explicitly determine whether the runtime:

  • redistributes weight across available dimensions,
  • applies an explicit fallback score, or
  • fails closed before producing a final canonical EVAL artifact.

The chosen behavior MUST be visible in the public result artifact and reproducible in conformance vectors.

Numeric Comparison Tolerance [NORMATIVE]: Floating-point results in test vectors MUST match expected values within ±1e-4. Implementations MUST serialize scores to exactly 4 decimal places (round half away from zero). Weight sums MUST match 1.0 within ±0.001 (ACGP-3 §6.1).

An exact threshold-boundary value falls into the less severe category.

Example:

reasoning:  0.90 × 0.25 = 0.225
grounding:  0.80 × 0.20 = 0.160
ethical:    0.85 × 0.20 = 0.170
tool_safety: 0.88 × 0.20 = 0.176
context:    0.82 × 0.15 = 0.123
─────────────────────────────────
CTQ = 0.854    Risk = 0.146

6.4 Weight Declaration in Blueprint

Blueprint expresses metric weights only through checks[].metric.weight. There is no normative top-level scoring.weights block and there is no normative ctq shorthand block.

Public Blueprints MUST carry all runtime metric configuration in canonical checks[] entries before validation. Tooling MAY provide private authoring conveniences, but those conveniences MUST compile to the canonical checks[] model and MUST NOT appear in published Blueprint artifacts.


7. Threshold Mapping and Decisions [NORMATIVE]

7.1 Threshold Format

Threshold intervals are evaluated using closed-upper-bound semantics:

  • ok if risk_score <= thresholds.ok
  • nudge if thresholds.ok < risk_score <= thresholds.nudge
  • escalate if thresholds.nudge < risk_score <= thresholds.escalate
  • block if risk_score > thresholds.escalate
  • See ACGP-4 for the tripwire-only halt invariant.

intervention_policy.thresholds maps risk score boundaries to interventions:

intervention_policy:
  thresholds:
    ok: 0.25          # Risk ≤ 0.25 → ok
    nudge: 0.40       # 0.25 < Risk ≤ 0.40 → nudge
    escalate: 0.55    # 0.40 < Risk ≤ 0.55 → escalate
    # Risk > 0.55 → block (implicit)

7.2 Governance-Tier-Specific Thresholds [NORMATIVE]

Default risk-score thresholds by Governance Tier:

Governance Tier OK Nudge Escalate Block
GT-0 risk ≤ 0.40 0.40 < risk ≤ 0.55 0.55 < risk ≤ 0.70 risk > 0.70 (implicit)
GT-1 risk ≤ 0.30 0.30 < risk ≤ 0.45 0.45 < risk ≤ 0.60 risk > 0.60 (implicit)
GT-2 risk ≤ 0.25 0.25 < risk ≤ 0.40 0.40 < risk ≤ 0.55 risk > 0.55 (implicit)
GT-3 risk ≤ 0.20 0.20 < risk ≤ 0.35 0.35 < risk ≤ 0.50 risk > 0.50 (implicit)
GT-4 risk ≤ 0.15 0.15 < risk ≤ 0.30 0.30 < risk ≤ 0.45 risk > 0.45 (implicit)
GT-5 risk ≤ 0.10 0.10 < risk ≤ 0.25 0.25 < risk ≤ 0.40 risk > 0.40 (implicit)

3-threshold model: The block column above is implicit — risk scores exceeding the escalate boundary always map to block. There is no separate block threshold field in the schema.

7.3 Threshold Override Rules

When both blueprint and Governance Tier default thresholds exist, the stricter (lower) boundary applies:

effective_ok = min(blueprint_thresholds.ok, governance_tier_defaults.ok)
effective_nudge = min(blueprint_thresholds.nudge, governance_tier_defaults.nudge)
effective_escalate = min(blueprint_thresholds.escalate, governance_tier_defaults.escalate)
# block = risk > effective_escalate (implicit; no separate field)

Worked example: An agent in Governance Tier GT-5 is evaluated with a permissive blueprint (ok: 0.40, nudge: 0.55, escalate: 0.70). Governance Tier GT-5 defaults are ok: 0.10, nudge: 0.25, escalate: 0.40. Effective thresholds:

Threshold Blueprint Governance Tier GT-5 Effective (min)
ok 0.40 0.10 0.10
nudge 0.55 0.25 0.25
escalate 0.70 0.40 0.40

With risk score 0.30: blueprint alone → ok, but effective thresholds → escalate (0.30 > 0.10 so not ok; 0.30 > 0.25 so not nudge; 0.25 < 0.30 ≤ 0.40 → escalate). The Governance Tier floor prevents a high-risk agent from receiving permissive governance.

7.4 Decision Algorithm [NORMATIVE]

def determine_intervention(ctq_score, governance_tier, tripwires):
    # Tripwires ALWAYS take precedence
    if tripwires:
        priority = ["halt", "block", "escalate", "nudge", "ok"]
        decisions = {t.on_fail.decision for t in tripwires}
        for decision in priority:
            if decision in decisions:
                return decision

    # CTQ-based threshold mapping
    risk_score = 1.0 - ctq_score
    thresholds = get_effective_thresholds(governance_tier, blueprint)

    if risk_score <= thresholds.ok:
      return "ok"
    elif risk_score <= thresholds.nudge:
      return "nudge"
    elif risk_score <= thresholds.escalate:
      return "escalate"
    else:
      return "block"

Runtime tripwire precedence is determined only by the triggered on_fail.decision values. severity remains advisory authoring metadata and MUST NOT alter the runtime intervention.


8. Evidence Policy [NORMATIVE]

The optional evidence_policy block defines source-grounding requirements:

evidence_policy:
  require_citations: true
  certified_only: true
  min_sources: 2
Control Type Description
require_citations boolean Whether citations are required in governed output
certified_only boolean Whether only certified/verified sources are acceptable
min_sources integer Minimum number of qualifying sources required

If evidence controls are declared, they form a dimension-level admissibility gate for knowledge_grounding. Implementations MUST check the gate before any knowledge-grounding scorers run.

If the knowledge_grounding evidence gate fails:

  • status MUST be failed_evidence_policy
  • score MUST be 0.0
  • contributors MUST be []
  • knowledge-grounding weights MUST NOT be redistributed

If the gate passes, knowledge-grounding scorers MUST run normally.

Evidence policy alone MUST NOT deny the whole governed action. Whole-action denial on evidence failure requires an explicit tripwire outcome or a required extension / required control whose declared fail semantics deny or reject the action.

Evaluators MUST record evidence outcomes in evaluation artifacts. When evidence_policy is declared, evidence_summary SHOULD capture supplementary details such as whether the policy was declared, which controls were checked, and which controls passed or failed. evidence_summary is supplementary detail only and MUST NOT become a second verdict channel.

Richer source-category or trust-floor semantics MAY be implemented through extensions, internal policy compilation, or evaluator-specific private configuration, but they are not part of the canonical Blueprint core schema.

If Source Catalog capability is unavailable and the evidence model relies on optional source-match scoring, the optional scorer fallback in §5.3 applies. If source evidence is required through extensions.required[], fail_mode: deny, or tripwire-backed semantics, those explicit controls override the optional fallback and MUST fail closed as declared.

Advanced source-catalog semantics are defined in the Source Catalog extension.


9. Trust Policy Core (v1.0) [NORMATIVE]

Trust debt is a governance state concept exposed through the trust_policy Blueprint block and a provider model. The public observable semantics are normative for v1.0 interoperability. acgp.core.default@1 is the default deterministic provider. Other provider behavior MAY exist through extension boundaries, but implementations MUST preserve the deterministic public boundary defined here even when the underlying provider is private.

9.1 Observable Semantics [NORMATIVE]

All conformant implementations MUST preserve these observable semantics:

  1. Trust debt is updated after threshold mapping and orthogonal flag attachment.
  2. Trust debt threshold effects are deterministic and MUST follow §9.6.
  3. Trust debt MUST NEVER relax an intervention's severity.
  4. Trust debt participates in the canonical evaluation order before Governance Tier review and audit emission.
  5. Public artifacts MAY expose current debt, debt deltas, provider identifiers, or opaque attestations, but MUST NOT be required to expose provider internals.
  6. When flags.flagged is true, the trust-debt delta for that evaluation is the sum of the primary decision accumulation weight plus the configured flag accumulation weight.
  7. Trust debt, intervention history, review posture, and other principal-scoped governance state MUST be keyed to agent_id, not to sender_id, agent_label, session_id, or other ephemeral identifiers.

9.2 Provider Configuration Schema

trust_policy:
  enabled: true
  provider:
    id: "acgp.core.default@1"
    visibility: public
  accumulation:
    ok: 0.0
    flag: 0.1
    nudge: 0.5
    escalate: 1.0
    block: 2.0
    halt: 5.0
  decay:
    decay_fraction: 0.05
    period_hours: 1
    min_debt: 0.0
  thresholds:
    elevated_monitoring: 3.0
    restricted_mode: 6.0
    re_tiering_review: 10.0

Note: decay_fraction = fraction of debt removed per period (e.g., 0.05 = 5% removed). acgp.core.default@1 names the default deterministic provider that reproduces the baseline semantics.

9.3 Default Provider Behavior [NORMATIVE]

If no provider is specified, implementations MUST behave as if provider.id = "acgp.core.default@1" were configured. acgp.core.default@1 is the default deterministic provider for v1.0 and preserves the accumulation/decay model used by v1.0 vectors:

def accumulate(current_debt: float, decision: str, flagged: bool, config) -> float:
  weight = config.accumulation.get(decision, 0.0)
  if flagged:
    weight += config.accumulation.get("flag", 0.0)
  return current_debt + weight

Note: ok interventions do NOT accrue debt by default (weight 0.0), but a flagged ok still accrues the configured flag weight. Omitting ok or flag from the accumulation map is treated as 0.0. Debt units are dimensionless and on a linear scale.

Rationale: halt remains part of trust-debt history even though it is terminal for the current governed action. Recording the halt accumulation preserves a monotonic governance history for post-incident review, operator scrutiny, and any subsequent session- or identity-level handling that reuses the same trust-debt provider state.

A halt intervention does not, by itself, reset accumulated trust debt unless an explicit reset or credit policy is applied.

Implementations MAY require administrative clearance, policy update, or manual reset before halted runtimes resume, but MUST preserve trust-debt observable semantics unless such a reset action is explicitly taken.

9.4 Provider Boundary [NORMATIVE]

The protocol standardizes observable effects, provider identifiers, extension descriptors, and optional opaque attestations. The public boundary is limited to externally verifiable behavior: emitted debt values or deltas, threshold-triggered actions, declared provider identity, and any optional attestation material. The protocol does not require public disclosure of exact private accumulation formulas, proprietary decay math, compounding rules internal to a private provider, or provider-specific derivation logic.

Private providers MUST still honor the observable semantics in §9.1 and MUST support auditable outputs at the public boundary.

9.5 Example Default Decay Algorithm [NORMATIVE FOR acgp.core.default@1]

Decay is computed using evaluation time (the steward's wall-clock timestamp at evaluation start), in UTC.

def apply_decay(current_debt: float, last_eval_time, eval_time, config) -> float:
    if last_eval_time is None:
        return current_debt  # No decay on first evaluation
    elapsed_hours = (eval_time - last_eval_time).total_seconds() / 3600.0
    periods = elapsed_hours / config.decay.period_hours
    decayed = current_debt * ((1.0 - config.decay.decay_fraction) ** periods)
    return max(decayed, config.decay.min_debt)

Fractional hours are used as-is (no rounding). This eliminates drift from floor/ceil rounding differences between implementations.

9.6 Trust Debt Threshold Effects [NORMATIVE]

Threshold handling is deterministic at the public boundary.

elevated_monitoring

  • MUST NOT change the primary intervention by itself.
  • MUST cause EVAL to include runtime_posture: "elevated_monitoring".
  • MUST cause a threshold-crossing event to be recorded in the Governance Store.

restricted_mode

  • MUST cause EVAL to include runtime_posture: "restricted_mode".
  • MUST apply a post-decision intervention floor of escalate.
  • ok and nudge MUST become escalate.
  • escalate, block, and halt MUST remain unchanged.
  • MUST cause a threshold-crossing event to be recorded in the Governance Store.

re_tiering_review

  • posture MUST remain restricted_mode.
  • MUST cause EVAL to include review_required: true.
  • MUST cause a review-trigger event to be recorded in the Governance Store.
  • actual review workflow remains deployment-defined.

Guardrails:

  • trust debt MUST NEVER produce halt.
  • See ACGP-4 for the tripwire-only halt invariant.

thresholds_crossed in public EVAL artifacts MUST represent the cumulative current-state set of active threshold labels and MUST NOT be serialized as an incremental per-evaluation list.

9.6.1 Operational Guidance [INFORMATIVE]

Deployments MAY increase logging verbosity, tighten internal evaluation heuristics, or notify operators when thresholds are crossed, but those behaviors are informative operator guidance rather than normative runtime semantics.

9.7 Provider Metadata and Attestation [NORMATIVE]

Blueprints and evaluation outputs MAY attach provider metadata using either a public descriptor or an opaque attestation:

trust_policy:
  provider:
    id: "urn:acgp:ext:trust-debt-private@1"
    visibility: private
    attestation:
      digest: "sha256:..."
      issued_by: "did:example:steward"

Private provider metadata MUST NOT require disclosure of formulas, source material, or provider-internal logic in public artifacts.

9.8 Example

Numeric note: This worked example is informative. Conformance is determined by the trust-debt vectors and the numeric tolerance rules defined in ACGP-6. Example values in this section are serialized to 4 decimal places unless otherwise stated.

Agent in Governance Tier GT-2, config: accumulation.block=2.0, accumulation.flag=0.1, decay.decay_fraction=0.05/1h

10:00  block  → debt = 0.0 + 2.0 = 2.0
10:30  block  → decay 0.5h: 2.0 × 0.95^0.5 = 1.9494 → + 2.0 = 3.9494
  → elevated_monitoring threshold (3.0) crossed
11:00  nudge + flag → decay 0.5h: 3.9494 × 0.95^0.5 = 3.8494 → + (0.5 + 0.1) = 4.4494
12:00  halt   → decay 1.0h: 4.4494 × 0.95^1.0 = 4.2269 → + 5.0 = 9.2269
  → restricted_mode threshold (6.0) crossed
12:10  block  → decay ~0.17h: 9.2269 × 0.95^0.17 = 9.1483 → + 2.0 = 11.1483
  → re_tiering_review threshold (10.0) crossed; Governance Tier review is triggered

9.9 Extension Boundary (Advanced Trust Debt)

Outside the default provider: piecewise/exponential decay curves, severity multipliers, multi-threshold graduated actions, compounding, and private provider semantics. Implementations MAY provide a TrustDebtProvider hook while preserving the observable semantics in §9.1.

9.10 Trust Debt Threshold Guardrails [NORMATIVE]

Blueprint-defined trust debt thresholds MUST NOT exceed the Clarity Baseline defaults by more than 2×. For example, if Clarity Baseline sets re_tiering_review: 10.0, a child blueprint MUST NOT set it above 20.0. Blueprints that exceed this limit MUST be rejected at load time with error code TRUST_DEBT_THRESHOLD_EXCEEDED.


10. Evaluation Flow [NORMATIVE]

Implementations MUST apply this sequence: resolve source blueprint -> validate resolved blueprint -> evaluate tripwires -> evaluate deterministic checks -> compute CTQ -> map thresholds -> attach flag state -> update trust policy state -> evaluate Governance Tier review triggers -> emit decision/evidence -> persist audit artifacts. Trust-policy updates for the current evaluation MUST observe the final primary decision plus any orthogonal flag.

For each governed action, implementations MUST follow this ordering:

1. Resolve applicable source Blueprint into a resolved Blueprint artifact
2. Validate resolved Blueprint configuration integrity
3. Evaluate tripwires first (ACGP-4 precedence rules)
  → Apply each triggered tripwire's explicit `on_fail.decision`
  → If multiple tripwires fire, the strictest explicit decision wins (`halt` > `block` > `escalate` > `nudge` > `ok`)
  → `severity` is authoring metadata only and MUST NOT alter runtime decision selection
        ↓  (no tripwires fired)
3b. Evaluate rule-based checks from checks[] array
  → Rule checks producing `ok`/`nudge`/`escalate`/`block` are treated as direct policy outputs
  → The strictest decision between rule-check outputs and CTQ-derived threshold outputs MUST win
  → Rule checks MUST NOT produce halt (only tripwires may)
  → If a rule check declares on_fail.decision: halt, the Blueprint MUST be rejected at load time
  → Rule checks with flag: true set flags.flagged on the intervention
4. Evaluate CTQ metrics (5 standard dimensions)
   → Apply scorer for each metric
   → Compute weighted average
5. Convert to risk score: Risk = 1.0 - CTQ
  → Apply effective thresholds (stricter of blueprint vs Governance Tier defaults, serialized as `GT-*` values)
6. Evaluate flag conditions:
   (a) any rule-based check with flag: true that matched
   Flag is orthogonal — it attaches to whatever primary decision was determined in steps 3–5.
7. Update trust debt state
  → Key principal-scoped state to `agent_id`
  → Compute `pre` from the decayed trust-debt state at evaluation start
  → Compute `delta` from the final primary decision plus any orthogonal flag contribution
  → Compute `post = pre + delta`
  → Compute cumulative `thresholds_crossed`
  → Derive `runtime_posture` from the active threshold set
  → If `restricted_mode` is active, apply a post-decision intervention floor of `escalate`
  → If `re_tiering_review` is active, set `review_required: true` while keeping posture `restricted_mode`
8. Emit canonical EVAL artifact + intervention decision
  → Top-level EVAL `intervention` is the final post-floor outcome
  → Any pre-posture intervention MAY appear only in subordinate metadata
9. Log TRACE + EVAL + INTERVENTION to the Governance Store

10.1 Failure Semantics Matrix [NORMATIVE]

Each row below defines the trigger condition, immediate behavior, public EVAL consequence, and authoritative source reference.

Trigger condition Immediate behavior EVAL consequence Authoritative source
Invalid envelope or schema Reject immediately No EVAL emitted ACGP-2 §4
Authentication or signature failure Reject immediately No EVAL emitted ACGP-2 §4.1, §4.4, §9
Blueprint validation error Reject at load / activation time No EVAL emitted ACGP-3 §2, §3.3; ACGP-4 §9
Required local extension unavailable with fail_mode: reject_activation Reject activation No EVAL emitted ACGP-3 §2.2.1
Duplicate message_id replay with identical bytes Return idempotent result; do not reprocess No new EVAL emitted ACGP-2 §8.5
Tripwire runtime evaluation failure Fail closed per triggered tripwire on_fail EVAL, if emitted, MUST reflect the fail-closed tripwire outcome ACGP-4 §10
Required scorer runtime error without fallback Preserve evaluator failure explicitly Dimension status: "error", score: 0.0, attempted contributors visible, no redistribution ACGP-3 §6.1.2
Required scorer runtime error with declared fallback Use declared degraded path Dimension status: "degraded", fallback score, attempted contributors visible, no redistribution ACGP-3 §6.1.2
Optional source-match unavailable Mark scorer unavailable and redistribute only where allowed Dimension status: "unavailable", contributors: [], redistribution only where explicitly allowed ACGP-3 §5.3
Evidence policy precondition not met Gate knowledge_grounding before scorer execution Dimension status: "failed_evidence_policy", score: 0.0, contributors: [], no redistribution ACGP-3 §8
Required extension with fail_mode: deny at decision time Deny or block as declared by policy Emitted deny / block outcome with failure recorded in EVAL metadata ACGP-3 §2.2.1
Steward or session path unavailable Apply profile fallback Use deny, allow_and_log, or cached_decision per profile fallback ACGP-1 §6.6
Per-evaluation timeout policy Extension-defined preview behavior only Out of scope for v1.0 Standard EVAL contract unless preview extension is negotiated ACGP-1 §6.6; Runtime Governance Contracts preview

10.2 Canonical EVAL Artifact [NORMATIVE]

EVAL is the single canonical governance outcome artifact for a governed action. Implementations MUST NOT introduce a separate peer core receipt artifact that competes with EVAL as the public outcome record.

The top-level intervention field in EVAL MUST represent the final emitted post-floor outcome. If an implementation wishes to preserve the pre-posture intervention for provenance, it MAY expose that value only in an optional subordinate field such as evaluation_metadata.pre_posture_intervention. Implementations MUST NOT emit two peer top-level intervention fields representing pre-floor and post-floor outcomes.

Top-level EVAL fields:

Field Requirement Notes
trace_id MUST Canonical trace identifier
parent_trace_id MUST when present Immediate parent linkage only
blueprint_id MUST Resolved or effective blueprint identifier
governance_tier MUST Serialized as GT-*
ctq_dimensions MUST Five canonical public dimensions
ctq_score MUST Final aggregate score
risk_score MUST 1.0 - ctq_score
tripwires_triggered MUST Empty array if none
intervention MUST Final post-floor outcome
flagged MUST Orthogonal flag state
runtime_posture MUST normal, elevated_monitoring, or restricted_mode
review_required MUST Public review-trigger visibility
trust_debt MUST when trust debt is enabled Core observable trust-debt block
resolved_blueprint_digest SHOULD Digest of effective blueprint
evidence_summary SHOULD when evidence_policy is declared Supplementary evidence detail only
audit_ref SHOULD Audit linkage handle
trust_debt_extended SHOULD when private-provider metadata exists Optional extension / private-provider metadata

When trust debt is enabled, the trust_debt object MUST include:

Field Requirement Notes
provider_id MUST Public provider identifier
pre MUST Debt value after decay and before current accumulation
delta MUST Current evaluation contribution
post MUST Debt value after current accumulation
thresholds_crossed MUST Cumulative current-state threshold labels

thresholds_crossed is cumulative current state, not an incremental per-evaluation event list.

Example EVAL with failed_evidence_policy and elevated monitoring:

{
  "trace_id": "trace-eval-001",
  "blueprint_id": "finance_qa@2.1",
  "governance_tier": "GT-2",
  "ctq_dimensions": {
    "reasoning_quality": {
      "score": 0.91,
      "weight": 0.25,
      "status": "evaluated",
      "contributors": ["rationale_clarity", "plan_completeness"]
    },
    "knowledge_grounding": {
      "score": 0.0,
      "weight": 0.20,
      "status": "failed_evidence_policy",
      "contributors": []
    },
    "ethical_alignment": {
      "score": 0.94,
      "weight": 0.20,
      "status": "evaluated",
      "contributors": ["fairness_review"]
    },
    "tool_safety": {
      "score": 0.89,
      "weight": 0.20,
      "status": "evaluated",
      "contributors": ["permission_check"]
    },
    "context_awareness": {
      "score": 0.87,
      "weight": 0.15,
      "status": "evaluated",
      "contributors": ["situational_fit"]
    }
  },
  "ctq_score": 0.8035,
  "risk_score": 0.1965,
  "tripwires_triggered": [],
  "intervention": "ok",
  "flagged": false,
  "runtime_posture": "elevated_monitoring",
  "review_required": false,
  "trust_debt": {
    "provider_id": "acgp.core.default@1",
    "pre": 2.95,
    "delta": 0.10,
    "post": 3.05,
    "thresholds_crossed": ["elevated_monitoring"]
  },
  "evidence_summary": {
    "policy_declared": true,
    "controls_checked": ["require_citations", "certified_only", "min_sources"],
    "control_results": {
      "require_citations": "passed",
      "certified_only": "passed",
      "min_sources": "failed"
    }
  }
}

Example EVAL with restricted mode, review trigger, and post-floor intervention:

{
  "trace_id": "trace-eval-002",
  "parent_trace_id": "trace-root-900",
  "blueprint_id": "payments.standard@1.0",
  "governance_tier": "GT-3",
  "ctq_dimensions": {
    "reasoning_quality": {
      "score": 0.93,
      "weight": 0.25,
      "status": "evaluated",
      "contributors": ["rationale_clarity", "plan_completeness"]
    },
    "knowledge_grounding": {
      "score": 0.88,
      "weight": 0.20,
      "status": "degraded",
      "contributors": ["citation_coverage"],
      "notes": "Fallback path used after catalog latency breach"
    },
    "ethical_alignment": {
      "score": 0.95,
      "weight": 0.20,
      "status": "evaluated",
      "contributors": ["fairness_review"]
    },
    "tool_safety": {
      "score": 0.92,
      "weight": 0.20,
      "status": "error",
      "contributors": ["permission_check"],
      "notes": "Primary sandbox evaluator failed without declared fallback"
    },
    "context_awareness": {
      "score": 0.90,
      "weight": 0.15,
      "status": "unavailable",
      "contributors": [],
      "notes": "Optional context adapter unavailable; redistribution declared separately"
    }
  },
  "ctq_score": 0.7360,
  "risk_score": 0.2640,
  "tripwires_triggered": [],
  "intervention": "escalate",
  "flagged": true,
  "runtime_posture": "restricted_mode",
  "review_required": true,
  "trust_debt": {
    "provider_id": "acgp.core.default@1",
    "pre": 9.80,
    "delta": 0.60,
    "post": 10.40,
    "thresholds_crossed": ["elevated_monitoring", "restricted_mode", "re_tiering_review"]
  },
  "evaluation_metadata": {
    "pre_posture_intervention": "ok",
    "fallback_policy": "mixed_declared_policies",
    "review_event": "queued_for_governance_tier_review"
  },
  "audit_ref": "audit:trace-eval-002"
}

11. Authoring Guidance [INFORMATIVE]

Blueprint authors SHOULD:

  • Use base references only when the parent artifact is stable and governed
  • Place hard-stop safety constraints in tripwires (ACGP-4)
  • Begin with conservative thresholds and calibrate with benchmarks
  • Declare explicit evidence-policy requirements for regulated domains
  • Avoid embedding workflow orchestration logic in policy blueprints
  • Publish fixtures alongside reusable blueprints so expected outcomes remain testable and reviewable

12. Conformance Requirements

A conformant ACGP-3 implementation MUST:

  1. Parse and validate Blueprint source artifacts in YAML 1.2 and JSON
  2. Parse and validate resolved Blueprint artifacts when they are exchanged or stored directly
  3. Implement base resolution with the merge semantics defined in §2.4
  4. Support the scorer interface for all five standard scorer types (accept output structures and process results correctly). Actual model inference (Tier-2 cognitive-evaluator, Tier-3 hybrid) is implementation-defined and NOT required for v1.0 conformance. Conformance test vectors supply deterministic scorer outputs.
  5. Evaluate all five CTQ metrics with weighted aggregation (weights summing to 1.0)
  6. Enforce threshold mapping using risk score (\(1.0 - CTQ\))
  7. Apply the stricter of blueprint thresholds vs Governance Tier defaults (serialized GT-* values)
  8. Enforce declared evidence controls
  9. Preserve the trust debt observable semantics defined in §9.1 and the provider boundary defined in §9.4
  10. Support the default deterministic provider acgp.core.default@1 and reproduce its public accumulation/decay behavior when that provider is selected
  11. Preserve evaluation ordering: tripwires -> deterministic checks -> CTQ -> thresholds -> flag -> trust debt -> governance-tier review -> audit
  12. Emit EVAL as the canonical governance outcome artifact, with top-level intervention representing the final post-floor outcome when trust debt is enabled

13. Extension Boundaries

Extension Scope
Source Catalog Source catalog integrations, advanced source trust
Advanced Trust Debt Advanced trust debt: decay, recovery, compounding

Normative References

  • RFC 2119 — Key words for use in RFCs to Indicate Requirement Levels
  • RFC 3339 — Date and Time on the Internet: Timestamps
  • RFC 8785 — JSON Canonicalization Scheme (JCS)
  • ACGP-1 — Core Concepts & Terminology, v1.0, 2026
  • ACGP-2 — Messages & Wire Protocol, v1.0, 2026
  • ACGP-3 — Blueprints, Traces & Evaluation, v1.0, 2026
  • ACGP-4 — Tripwires & Safety Semantics, v1.0, 2026
  • ACGP-5 — Audit & Privacy Controls, v1.0, 2026
  • ACGP-6 — Conformance, v1.0, 2026