Tripwires & Safety Semantics

Status: Standard-only Alpha (v1.0.0-alpha.2)
Last Updated: 2026-03-07
Spec ID: ACGP-4
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119 and RFC 8174)

Abstract

This specification defines the tripwire condition language, severity classification, activation-time validation, runtime fail-closed semantics, precedence rules, emergency overrides, and evolution policy. Tripwires are the only mechanism that can trigger a HALT intervention.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. Scope [NORMATIVE]

ACGP-4 defines the safety-critical policy language and execution semantics for tripwires.

ACGP-4 does not define:

  • Wire transport (→ ACGP-2)
  • General CTQ metric model or threshold mapping (→ ACGP-3)
  • Audit retention (→ ACGP-5)

2. Tripwire Role and Precedence [NORMATIVE]

Tripwires are high-priority safety checks that fire before threshold-based CTQ evaluation. They catch catastrophic conditions that statistical scoring might miss.

Normative precedence rules:

  1. Tripwires MUST execute before CTQ metric scoring.
  2. Tripwire-triggered decisions MUST take precedence over threshold-derived decisions.
  3. halt is tripwire-only and MUST NOT be produced by threshold mapping alone.
  4. If any tripwire issues halt, evaluation MUST terminate immediately (short-circuit).
  5. If multiple tripwires trigger, the strictest on_fail.decision MUST apply (halt > block > escalate > nudge > ok). Severity is advisory for authoring only and MUST NOT alter the runtime decision.

halt is a tripwire-originated intervention only. Threshold mapping, trust-debt posture changes, and non-tripwire evaluation steps MUST NOT synthesize halt.

Trust-debt threshold handling in ACGP-3 MAY escalate an intervention but MUST NOT produce halt; halt remains reserved exclusively for tripwire outcomes.
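The precedence rules above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the Tripwire tuple, the fires callable, and run_tripwires are hypothetical names introduced here, and the trace is modeled as a plain dict.

```python
from typing import Callable, Iterable, NamedTuple

# Strictness order from rule 5: halt > block > escalate > nudge > ok.
STRICTNESS = {"ok": 0, "nudge": 1, "escalate": 2, "block": 3, "halt": 4}


class Tripwire(NamedTuple):
    """Hypothetical in-memory form of a tripwire (not the wire schema)."""
    id: str
    fires: Callable[[dict], bool]  # True means the violation pattern matched
    decision: str                  # on_fail.decision
    severity: str = "standard"     # advisory only; never consulted below


def run_tripwires(tripwires: Iterable[Tripwire], trace: dict) -> str:
    """Return the strictest on_fail.decision among the tripwires that fire."""
    result = "ok"
    for tw in tripwires:
        if not tw.fires(trace):
            continue
        if tw.decision == "halt":
            return "halt"  # rule 4: short-circuit, skip remaining tripwires
        if STRICTNESS[tw.decision] > STRICTNESS[result]:
            result = tw.decision
    return result


tripwires = [
    Tripwire("soft_warning", lambda t: True, "nudge", severity="severe"),
    Tripwire("large_amount", lambda t: t["amount"] > 10000, "block"),
]
decision = run_tripwires(tripwires, {"amount": 50000})  # "block"
```

Note that the first tripwire is labeled severe yet only contributes nudge: the sketch reads the explicit decision and ignores severity, as rule 5 requires.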

2.1 Polarity Convention [NORMATIVE]

The condition expression describes the violation pattern. If the condition evaluates to true for a given trace, the tripwire fires and the on_fail action is applied.

The name on_fail refers to the agent's action failing governance review, not to the condition expression evaluating to false.

Note: A future version (v1.1) may introduce on_trigger as a clearer alias for on_fail. For v1.0, on_fail is the canonical wire name.


3. Tripwire Schema [NORMATIVE]

Each tripwire object is defined by the following fields. Fields marked REQUIRED MUST be present.

Field | Type | Required | Description
----- | ---- | -------- | -----------
id | string | REQUIRED | Unique identifier (e.g., pii_exposure_check)
when | object | OPTIONAL | Trigger condition: hook, optional tool. If omitted, the tripwire applies to all hook values; omission is the default for global safety tripwires. There is no "any" hook value on the wire.
condition | string or object | REQUIRED | DSL expression (Section 4)
on_fail.decision | enum | REQUIRED | One of: nudge, escalate, block, halt
on_fail.reason | string | REQUIRED | Human-readable explanation
eval_tier | integer | OPTIONAL | 0 or 1 (default: 0)
latency_budget_ms | integer | OPTIONAL | Per-tripwire budget in milliseconds (default: 100 for tier 0, 300 for tier 1; see Section 12)
requires_state | boolean | OPTIONAL | Whether stateful storage is needed (default: false)
severity | enum | OPTIONAL | One of: standard, critical, severe

Note: To flag without altering the primary decision, use a rule-based check with flag: true (see ACGP-3 §3). Tripwire on_fail.decision does not include flag because tripwires always produce a primary intervention.

Tier constraint: Tripwires MUST NOT declare eval_tier > 1 in v1.0 core.

Stateful tripwire scoping [NORMATIVE]:

  1. Tripwires with requires_state: false (default) MUST evaluate using only trace payload fields and the condition grammar in Section 4. No state queries are permitted.
  2. Tripwires with requires_state: true MAY use standardized stateful functions defined in Section 6. Stateful functions are the only mechanism for state access.
  3. Tripwires with requires_state: true SHOULD declare eval_tier: 1 unless the implementation provides a local state cache; with a local cache, eval_tier: 0 is permitted.

4. Formal Condition Grammar (BNF) [NORMATIVE]

<condition>      ::= <compound_expr> | <simple_expr>

<compound_expr>  ::= <all_expr> | <any_expr> | <not_expr>
<all_expr>       ::= "all" ":" "[" <condition_list> "]"
<any_expr>       ::= "any" ":" "[" <condition_list> "]"
<not_expr>       ::= "NOT" <condition>
<condition_list> ::= <condition> | <condition> "," <condition_list>

<simple_expr>    ::= <comparison> | <function_call>

<comparison>     ::= <field_access> <operator> <value>
<operator>       ::= ">" | ">=" | "<" | "<=" | "==" | "!="
                    | "contains" | "matches"

<function_call>  ::= <function_name> "(" <argument_list_opt> ")"
<function_name>  ::= "is_external" | "in_allowlist" | "in_denylist"
                    | "matches_regex" | "contains_entity" | "exceeds_rate"
                    | "recent_tool_sum" | "recent_tool_count"
                    | "rolling_intervention_rate"
<argument_list_opt> ::= ε | <argument_list>
<argument_list>  ::= <argument> | <argument> "," <argument_list>
<argument>       ::= <field_access> | <value>

<field_access>   ::= <identifier> | <identifier> "." <field_access>
<identifier>     ::= <alpha> <identifier_tail>
<identifier_tail>::= ε | <ident_char> <identifier_tail>
<ident_char>     ::= <alpha> | <digit> | "_"

<value>          ::= <string> | <number> | <boolean> | <array>
<array>          ::= "[" <value_list_opt> "]"
<value_list_opt> ::= ε | <value_list>
<value_list>     ::= <value> | <value> "," <value_list>

<string>         ::= '"' <string_chars> '"'
<number>         ::= <sign_opt> <digits> <fraction_opt>
<boolean>        ::= "true" | "false"

Implementations MUST support nested compound expressions at least 3 levels deep.

4.1 Canonical Field Roots [NORMATIVE]

Tripwire field access uses dotted identifiers whose first segment MUST be one of the canonical roots defined below.

Unless explicitly stated otherwise, field paths are resolved against the canonical ACGP-3 trace model. Implementations MUST reject undeclared root aliases in normative mode.

Root | Resolves To | Notes
---- | ----------- | -----
action | Trace action object | Canonical root for intended action metadata
args | action.parameters | Canonical shorthand retained for v1.0 tripwire authoring
reasoning | Trace reasoning field | Optional trace content; missing field is fail-closed if referenced
confidence | Trace confidence field | Numeric
agent_id | Governed agent principal identifier from the trace | Scalar
governance_tier | Trace governance tier | Serialized as GT-*
meta | Trace metadata object | Implementation-supplied metadata
output | Primary output object, if present | Optional
outputs | Multi-output collection, if present | Optional
tool | Active tool identifier / tool context | Optional
source_refs | Evidence/source references | Optional
destination | Delivery target / sink field | Optional
content | Primary content field | Optional
storage | Storage-target metadata | Optional

args is the only canonical shorthand alias defined by v1.0, and resolves to action.parameters.

Implementations MUST NOT accept additional root aliases unless operating in a clearly documented non-conformant extension mode.

For the authoritative semantic meaning of trace fields, see ACGP-3 §4.1.

If a tripwire expression references an unknown root identifier, blueprint activation MUST fail with a named validation error.
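As an illustration of the root check, the sketch below validates the first segment of a dotted field path and expands the args shorthand. It assumes the path has already been tokenized out of the condition; UnknownFieldRoot and canonical_path are hypothetical names, since this specification requires a named validation error but does not fix its identifier.

```python
# Canonical roots from the table in Section 4.1.
CANONICAL_ROOTS = frozenset({
    "action", "args", "reasoning", "confidence", "agent_id",
    "governance_tier", "meta", "output", "outputs", "tool",
    "source_refs", "destination", "content", "storage",
})


class UnknownFieldRoot(ValueError):
    """Hypothetical name for the activation-time validation error."""


def canonical_path(field_access: str) -> list[str]:
    """Validate the root segment and expand the args alias."""
    root, *rest = field_access.split(".")
    if root not in CANONICAL_ROOTS:
        raise UnknownFieldRoot(f"unknown root identifier: {root!r}")
    if root == "args":
        # args is the only shorthand alias in v1.0: args -> action.parameters
        return ["action", "parameters", *rest]
    return [root, *rest]


path = canonical_path("args.amount")  # ["action", "parameters", "amount"]
```

A path such as params.amount raises UnknownFieldRoot, which a loader would surface as a failed blueprint activation.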


5. Standard Operators [NORMATIVE]

Operator | Description | Example
-------- | ----------- | -------
> | Greater than | args.amount > 10000
>= | Greater or equal | meta.response_size_bytes >= 1048576
< | Less than | meta.retry_count < 3
<= | Less or equal | meta.trust_debt <= 0.5
== | Equal | action.type == "delete"
!= | Not equal | destination != "internal"
contains | String contains | content contains "password"
matches | Regex match | content matches "\\d{3}-\\d{2}-\\d{4}"

6. Standard Functions [NORMATIVE]

Function | Description | Example
-------- | ----------- | -------
is_external(field) | Check if endpoint is external | is_external(destination)
in_allowlist(field, list) | Check against named allowlist | in_allowlist(tool, "approved_tools")
in_denylist(field, list) | Check against named denylist | in_denylist(destination, "blocked_domains")
matches_regex(field, pattern) | Named regex pattern match | matches_regex(content, "SSN_PATTERN")
contains_entity(field, type) | Named-entity detection | contains_entity(output, "credit_card")
exceeds_rate(agent_id, limit, window) | Stateful rate limit check | exceeds_rate(agent_id, 100, "1m")
recent_tool_sum(tool, field, window) | Stateful sum of tool field values | recent_tool_sum("execute_trade", "args.trade_value", "1d")
recent_tool_count(tool, window) | Stateful count of tool invocations | recent_tool_count("execute_trade", "1h")
rolling_intervention_rate(agent_id, window, types[]) | Stateful rate of intervention classes | rolling_intervention_rate(agent_id, "24h", ["nudge", "block"])

Implementations MUST parse and evaluate all standard operators and functions.

Stateful principal semantics [NORMATIVE]:

  • Functions such as exceeds_rate(agent_id, limit, window) and rolling_intervention_rate(agent_id, window, types[]) evaluate behavior for the governed agent principal identified by agent_id.
  • Implementations MUST NOT substitute sender_id, agent_label, session_id, trace_id, or runtime instance identifiers when evaluating per-agent stateful controls.

Tripwire conditions MUST NOT contain embedded query language (SQL, GraphQL, or equivalent). Stateful evaluation MUST use the standardized function set above or extension-registered functions.

Function-set versioning: v1.0 core defines the functions above. Extensions MAY register additional functions via extension registries (Advanced Trust Debt or future extensions).
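One possible shape for a stateful function such as exceeds_rate is a sliding-window counter keyed by agent_id. The sketch below is an assumption-heavy illustration: RateState, record, and window_seconds are hypothetical names, the window suffixes (s, m, h, d) are inferred from the examples in the table, and "exceeds" is read as strictly greater than the limit. The query method only reads state, so recording events is kept as a separate step outside condition evaluation.

```python
import bisect
from collections import defaultdict

# Assumed window suffixes, inferred from examples such as "1m", "1h", "1d".
WINDOW_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def window_seconds(window: str) -> int:
    """Convert a window string such as "24h" into seconds."""
    return int(window[:-1]) * WINDOW_UNITS[window[-1]]


class RateState:
    """Hypothetical per-agent event store backing exceeds_rate."""

    def __init__(self) -> None:
        self._events = defaultdict(list)  # agent_id -> sorted timestamps

    def record(self, agent_id: str, ts: float) -> None:
        """Called by the runtime outside condition evaluation."""
        bisect.insort(self._events[agent_id], ts)

    def exceeds_rate(self, agent_id: str, limit: int, window: str,
                     now: float) -> bool:
        """Read-only query: more than `limit` events inside the window?"""
        events = self._events.get(agent_id, [])  # .get() does not insert
        cutoff = now - window_seconds(window)
        recent = len(events) - bisect.bisect_right(events, cutoff)
        return recent > limit


state = RateState()
for ts in (0.0, 10.0, 20.0):
    state.record("agent-1", ts)
over = state.exceeds_rate("agent-1", 2, "1m", now=30.0)  # True: 3 events > 2
```

The store is keyed strictly by agent_id, in line with the principal semantics above; keying by session or trace identifier would evaluate a different population of events.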

6.1 Extension Function Registration [INFORMATIVE]

Implementations MAY allow blueprints to reference additional functions beyond the standard set. Extension functions MUST follow these conventions:

  1. Naming: Extension functions MUST use the query_ prefix (e.g., query_external, query_credit_score). The prefix distinguishes them from standard functions during validation.
  2. Registration: The steward or runtime MUST register allowed extension function names before blueprint activation. Unregistered query_* calls MUST be rejected at validation time.
  3. Arity: Extension functions accept implementation-defined arguments. Validators SHOULD skip arity checks for registered extension functions.
  4. Statefulness: Extension functions are implicitly stateful (requires_state: true SHOULD be declared on tripwires that use them).
  5. Visibility boundary: If a tripwire depends on a private or local extension capability, the containing blueprint or bundle SHOULD declare that dependency through extensions.required[] or deployment-local metadata rather than exposing private backing details in function arguments.

# Example: registering an extension function at runtime
steward.register_extension_functions({"query_external", "query_credit_score"})

Conformance note: The conformance test suite uses query_external as a known example extension function. Implementations MUST accept it in test vectors without error.

6.2 Regex Profile and Determinism [NORMATIVE]

All matches and matches_regex evaluations MUST use an RE2-compatible, linear-time regex engine. Implementations MUST reject backtracking constructs (backreferences, lookahead, lookbehind). Input strings MUST be normalized to Unicode NFC before evaluation. Patterns longer than 1024 characters MUST be rejected with TripwireRegexTooLong. Unknown flags MUST be rejected with TripwireRegexInvalidFlag.
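A pre-flight check for these rules might look like the sketch below. It is a conservative heuristic, not an RE2 validator: it scans the pattern text for lookaround openers and backreferences and may reject some harmless patterns (for example, an escaped backslash followed by a digit). Python's re module is used only to scan the pattern text; actual matching would still need a linear-time engine. TripwireRegexTooLong comes from this section, while TripwireRegexUnsupported, check_pattern, and normalize_input are hypothetical names.

```python
import re
import unicodedata

MAX_PATTERN_LENGTH = 1024


class TripwireRegexTooLong(ValueError):
    """Pattern exceeds 1024 characters."""


class TripwireRegexUnsupported(ValueError):
    """Hypothetical error name for rejected backtracking constructs."""


# Lookahead/lookbehind openers, named and numbered backreferences.
_BACKTRACKING = re.compile(r"\(\?<?[=!]|\(\?P=|\\[1-9]|\\k<")


def check_pattern(pattern: str) -> str:
    """Reject over-long patterns and obvious backtracking constructs."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise TripwireRegexTooLong(f"{len(pattern)} > {MAX_PATTERN_LENGTH}")
    if _BACKTRACKING.search(pattern):
        raise TripwireRegexUnsupported(pattern)
    return pattern


def normalize_input(text: str) -> str:
    """Normalize the subject string to Unicode NFC before matching."""
    return unicodedata.normalize("NFC", text)


ssn = check_pattern(r"\b\d{3}-\d{2}-\d{4}\b")  # accepted unchanged
```

Flag validation (TripwireRegexInvalidFlag) is omitted because this section does not enumerate the permitted flags.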


7. Compound Conditions [NORMATIVE]

7.1 ALL (AND Logic)

condition:
  all:
    - meta.response_size_bytes > 10485760
    - is_external(destination)
    - NOT in_allowlist(destination, "trusted_endpoints")

7.2 ANY (OR Logic)

condition:
  any:
    - action.type == "delete"
    - action.type == "drop"
    - action.type == "truncate"

7.3 NOT (Negation)

condition:
  NOT:
    in_allowlist(tool, "approved_tools")
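The three compound forms compose recursively. The sketch below walks a condition that has already been loaded from YAML into Python dicts, lists, and strings, and delegates each simple expression to a caller-supplied leaf function. The names evaluate and leaf are hypothetical, and parsing of simple expressions against the Section 4 grammar is out of scope here.

```python
from typing import Callable, Union

Condition = Union[str, dict]


def evaluate(cond: Condition, leaf: Callable[[str], bool]) -> bool:
    """Evaluate all / any / NOT recursively over a loaded condition tree."""
    if isinstance(cond, str):
        text = cond.strip()
        if text.startswith("NOT "):
            # Inline negation, as in: NOT in_allowlist(...)
            return not evaluate(text[4:], leaf)
        return leaf(text)
    if len(cond) != 1:
        raise ValueError("compound condition must have exactly one key")
    key, body = next(iter(cond.items()))
    if key == "all":
        return all(evaluate(c, leaf) for c in body)  # AND, short-circuits
    if key == "any":
        return any(evaluate(c, leaf) for c in body)  # OR, short-circuits
    if key == "NOT":
        return not evaluate(body, leaf)
    raise ValueError(f"unknown compound operator: {key!r}")


# The Section 7.1 example, with leaf results stubbed by a set of true facts.
condition = {"all": [
    "meta.response_size_bytes > 10485760",
    "is_external(destination)",
    'NOT in_allowlist(destination, "trusted_endpoints")',
]}
facts = {"meta.response_size_bytes > 10485760", "is_external(destination)"}
fired = evaluate(condition, lambda expr: expr in facts)  # True
```

Because the walk is plain recursion, nesting depth is limited only by the interpreter, which comfortably covers the three-level minimum required in Section 4.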

8. Severity Classification [INFORMATIVE]

Tripwires support three severity categories for policy authoring triage and review.

Severity is authoring metadata only. At runtime, implementations MUST apply the explicit on_fail.decision on each triggered tripwire. If multiple tripwires fire, the strictest explicit decision wins (halt > block > escalate > nudge > ok). Severity MUST NOT be used to compute the runtime intervention.

Severity | Examples | Suggested authoring default | Authoring guidance
-------- | -------- | --------------------------- | ------------------
Standard | Budget exceeded, rate limit, API quota | block | Use for hard safety boundaries that normally stop the current action without ending the session.
Critical | Secrets in output, production write, PII leak | block or halt | Choose the explicit runtime decision in the blueprint based on the deployment's risk posture.
Severe | Data exfiltration, collusion, credential theft | halt | Reserve for conditions that should terminate the governed session immediately.

Severity classification is advisory for policy authoring. The explicit on_fail.decision remains authoritative at runtime.

8.1 Standard Tripwires

Implementations SHOULD ship these built-in tripwires:

Budget:

- id: spend_cap_exceeded
  severity: standard
  condition: args.total_spend > args.budget_limit
  on_fail: { decision: block, reason: "Budget limit exceeded" }

Rate Limiting:

- id: rate_limit_hit
  severity: standard
  eval_tier: 1
  requires_state: true
  condition: exceeds_rate(agent_id, 100, "1m")
  on_fail: { decision: block, reason: "Rate limit exceeded (100 req/min)" }

Secrets Detection:

- id: secrets_detected
  severity: critical
  condition:
    any:
      - content contains "AKIA"
      - matches_regex(content, "-----BEGIN.*PRIVATE KEY-----")
  on_fail: { decision: block, reason: "Secrets detected in output" }

PII Exposure:

- id: pii_exposure
  severity: critical
  condition:
    any:
      - matches_regex(content, "\\b\\d{3}-\\d{2}-\\d{4}\\b")
      - contains_entity(content, "credit_card")
      - contains_entity(content, "bank_account")
  on_fail: { decision: block, reason: "PII detected in output" }

Data Exfiltration:

- id: data_exfiltration
  severity: severe
  condition:
    all:
      - meta.response_size_bytes > 10485760
      - is_external(destination)
      - NOT in_allowlist(destination, "approved_endpoints")
  on_fail: { decision: halt, reason: "Potential data exfiltration detected" }

Dangerous Database Operations:

- id: dangerous_db_ops
  severity: severe
  condition:
    all:
      - tool == "database_query"
      - any:
          - action.query contains "DROP"
          - action.query contains "DELETE FROM"
          - action.query contains "TRUNCATE"
      - NOT in_allowlist(action.table, "deletable_tables")
  on_fail: { decision: halt, reason: "Dangerous database operation blocked" }

9. Activation-Time Validation [NORMATIVE]

At blueprint activation/load time, implementations MUST:

  1. Parse condition expressions against the BNF grammar
  2. Validate operator and function names are recognized
  3. Validate function arity and argument types
  4. Validate referenced fields are permitted by the trace schema

If any tripwire condition fails validation, blueprint activation MUST fail and return structured validation errors:

{
  "blueprint_id": "finance/trading@2.0.0",
  "validation_errors": [
    {
      "tripwire_id": "daily_trade_count",
      "error": "Unknown function: count_today",
      "line": 42
    }
  ]
}
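A fragment of this validation, covering only step 2 for function names, could be sketched as follows. It assumes string-form conditions and a name-to-arity table transcribed from Section 6; validate and the extensions parameter are hypothetical names. String literals are blanked before scanning so that parentheses inside regex arguments are not mistaken for calls. Arity, argument-type, and field checks are deliberately left out.

```python
import re

# Standard functions and their arity, transcribed from Section 6.
STANDARD_FUNCTIONS = {
    "is_external": 1, "in_allowlist": 2, "in_denylist": 2,
    "matches_regex": 2, "contains_entity": 2, "exceeds_rate": 3,
    "recent_tool_sum": 3, "recent_tool_count": 2,
    "rolling_intervention_rate": 3,
}

_STRING = re.compile(r'"(?:\\.|[^"\\])*"')             # double-quoted literal
_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")  # identifier before "("


def validate(blueprint_id: str, tripwires: list[dict],
             extensions: frozenset = frozenset()) -> dict:
    """Collect unknown-function errors in the structured report shape."""
    errors = []
    for tw in tripwires:
        stripped = _STRING.sub('""', tw["condition"])
        for name in _CALL.findall(stripped):
            if name not in STANDARD_FUNCTIONS and name not in extensions:
                errors.append({
                    "tripwire_id": tw["id"],
                    "error": f"Unknown function: {name}",
                })
    return {"blueprint_id": blueprint_id, "validation_errors": errors}


report = validate("finance/trading@2.0.0", [
    {"id": "daily_trade_count", "condition": "count_today(agent_id) > 50"},
])
```

A loader would refuse to activate the blueprint whenever validation_errors is non-empty.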

10. Runtime Fail-Closed Semantics [NORMATIVE]

If a tripwire condition cannot be evaluated at runtime, the implementation MUST fail closed by applying on_fail.decision and on_fail.reason immediately.

Fail-closed triggers:

Trigger | Description
------- | -----------
Evaluation timeout | Tripwire evaluation exceeds latency_budget_ms
Missing required field | Referenced field absent from runtime trace
Execution error | Function or evaluator internal failure
Type mismatch | Runtime type incompatible with operator

Fail-closed behavior is non-negotiable. Implementations MUST NOT silently skip or ignore a failing tripwire.
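The fail-closed rule can be illustrated with a small wrapper around condition evaluation. This is a sketch under stated assumptions: Outcome and evaluate_fail_closed are hypothetical names, the condition is modeled as a Python callable, and the budget is checked after the call returns, so it detects an overrun but does not preempt a slow evaluation as a production engine would need to.

```python
import time
from typing import Callable, NamedTuple, Optional


class Outcome(NamedTuple):
    decision: str
    reason: str
    fail_closed: bool  # True when the condition could not be evaluated


def evaluate_fail_closed(tripwire: dict, trace: dict,
                         condition: Callable[[dict], bool],
                         clock: Callable[[], float] = time.monotonic,
                         ) -> Optional[Outcome]:
    """Return the on_fail outcome if the tripwire fires or cannot be evaluated."""
    on_fail = tripwire["on_fail"]
    budget_s = tripwire.get("latency_budget_ms", 100) / 1000
    start = clock()
    try:
        fired = condition(trace)
    except Exception:
        # Missing field (KeyError), type mismatch (TypeError), or any
        # evaluator failure: apply on_fail rather than skipping the tripwire.
        return Outcome(on_fail["decision"], on_fail["reason"], True)
    if clock() - start > budget_s:
        return Outcome(on_fail["decision"], on_fail["reason"], True)
    if fired:
        return Outcome(on_fail["decision"], on_fail["reason"], False)
    return None  # condition evaluated cleanly and did not match


tripwire = {"id": "large_transfer",
            "on_fail": {"decision": "block", "reason": "Amount too large"}}
check = lambda t: t["args"]["amount"] > 10000
missing = evaluate_fail_closed(tripwire, {}, check)  # fail-closed block
```

The fail_closed marker is an addition for observability; it lets an audit record distinguish a genuine match from an evaluation failure while the applied decision stays the same.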


11. Condition Evaluation Security [NORMATIVE]

Condition evaluation MUST satisfy the following constraints:

  1. Evaluation MUST be sandboxed: no file system access, no network access, no process spawning, and no access to system resources beyond trace payload fields and approved state functions.
  2. Evaluation MUST be time-bounded: if evaluation exceeds per-tripwire latency_budget_ms, fail-closed semantics in Section 10 apply.
  3. Evaluation MUST be side-effect-free: condition execution MUST NOT mutate state; state functions are read-only.
  4. Implementations MUST NOT use general-purpose eval(), exec(), or equivalent dynamic code execution for condition evaluation. Conditions MUST be parsed and evaluated against the formal grammar in Section 4.
  5. Regex patterns MUST conform to RE2 syntax (§6.2).

12. Tier Classification and Budgets [NORMATIVE]

Tier | Allowed Operations | Default Latency Budget | Example
---- | ------------------ | ---------------------- | -------
0 | In-memory deterministic checks | < 100 ms | Pattern match, value comparison
1 | Local state/cache/DB lookup | < 300 ms | Rate limits, session counters

Implementations MUST enforce per-tripwire latency_budget_ms and SHOULD monitor budget compliance. If a tripwire at either tier exceeds its budget, the fail-closed semantics in Section 10 apply.

Latency budgets defined in ACGP (per-tripwire, per-tier, and per-profile p95 ceilings) apply to governance evaluation time only, measured from receipt of the CognitiveTrace by the evaluation engine to emission of the intervention decision. Network round-trip time between Operating Agent and Governance Steward is excluded.

Evaluation latency measurement MUST start when the evaluation engine receives the CognitiveTrace object (or equivalent in-memory representation) and MUST end when the intervention decision is emitted to the caller. Blueprint resolution (inheritance merge) is included in the measurement. Network serialization/deserialization is excluded.

Implementations SHOULD measure and report end-to-end governance latency (including network) separately from evaluation latency. For distributed topologies, implementations SHOULD document expected network latency contributions.


13. Emergency Overrides [NORMATIVE]

For Governance Tier GT-5 agents and safety-focused deployments, implementations SHOULD support the emergency override mechanisms described in this section.

13.1 Kill-Switch

For v1.0.0-alpha.2, kill-switch activation is an out-of-band operational control, not a standard wire message type.

Implementations that expose kill-switch capability MUST ensure that kill-switch activation preempts in-flight evaluation and prevents subsequent governed actions until cleared.

A kill-switch MUST be available for Governance Tier GT-5 agents. Kill-switch activation:

  • Immediately halts all agent actions
  • Does not require tripwire evaluation
  • Requires authorized operator credentials
  • MUST be logged to the Governance Store with operator identity

13.2 Dual Control

For Governance Tier GT-5, halt interventions SHOULD require dual approval (two-person rule) before an agent can be resumed.

13.3 Override Logging

All emergency actions MUST produce immutable audit records including:

  • Operator identity
  • Justification
  • Timestamp
  • Action taken
  • Agent state at time of override

14. Tripwire Lint Tool

Implementations SHOULD provide a lint tool for pre-deployment tripwire validation.

The linter SHOULD:

  • Parse each condition and report syntax version used
  • Flag non-canonical constructs and propose canonical v1.0 rewrites
  • Validate function support and arity
  • Validate field roots against the trace schema
  • Emit machine-readable output for CI:

{
  "blueprint_id": "finance/trading@2.0.0",
  "inferred_tripwire_dsl_version": "1.0",
  "issues": [
    {
      "tripwire_id": "daily_trade_count",
      "severity": "warning",
      "code": "NONCANONICAL_SYNTAX",
      "message": "Non-canonical expression detected",
      "suggested_rewrite": "NOT exceeds_rate(agent_id, 50, \"1d\")"
    }
  ]
}

15. Syntax Version Guidance [NORMATIVE]

15.1 Syntax Versioning

Blueprints no longer expose a tripwire_syntax_version field in the blueprint core.

  • Tripwire expressions in canonical Blueprint artifacts use the Section 4 DSL.
  • Implementations MUST interpret canonical Blueprint tripwire expressions as DSL version 1.0.
  • Lint and validation tooling MAY report an inferred DSL version for diagnostics.
  • If a Blueprint artifact includes tripwire_syntax_version, implementations MUST reject it as a non-canonical extra field during schema validation.

16. Conformance Requirements

A conformant ACGP-4 implementation MUST:

  1. Parse and evaluate all standard operators and functions (Sections 5 and 6)
  2. Support compound expressions with at least 3 levels of nesting
  3. Execute tripwires before CTQ-based evaluation (Section 2)
  4. Short-circuit on halt by terminating evaluation immediately (Section 2)
  5. Perform activation-time validation on blueprint load (Section 9)
  6. Implement runtime fail-closed semantics for all failure modes (Section 10)
  7. Enforce per-tripwire latency_budget_ms (Sections 11 and 12)
  8. Log all tripwire activations to the Governance Store
  9. Support vector-based conformance verification (conformance/vectors/tripwire-*.json)
  10. Return clear error messages for malformed conditions

Normative References

  • RFC 2119: Key words for use in RFCs to Indicate Requirement Levels
  • RFC 8174: Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
  • RFC 3339: Date and Time on the Internet: Timestamps
  • RFC 8785: JSON Canonicalization Scheme (JCS)
  • ACGP-1: Core Concepts & Terminology, v1.0, 2026
  • ACGP-2: Messages & Wire Protocol, v1.0, 2026
  • ACGP-3: Blueprints, Traces & Evaluation, v1.0, 2026
  • ACGP-4: Tripwires & Safety Semantics, v1.0, 2026
  • ACGP-5: Audit & Privacy Controls, v1.0, 2026
  • ACGP-6: Conformance, v1.0, 2026