Tripwires & Safety Semantics

Status: Standard-only Alpha (v1.0.0-alpha.2)
Last Updated: 2026-03-07
Spec ID: ACGP-4
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119 and RFC 8174)

Abstract

This specification defines the tripwire condition language, severity classification, activation-time validation, runtime fail-closed semantics, precedence rules, emergency overrides, and evolution policy. Tripwires are the only mechanism that can trigger a HALT intervention.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. Scope [NORMATIVE]

ACGP-4 defines the safety-critical policy language and execution semantics for tripwires.

ACGP-4 does not define:

Wire transport (→ ACGP-2)
General CTQ metric model or threshold mapping (→ ACGP-3)
Audit retention (→ ACGP-5)

2. Tripwire Role and Precedence [NORMATIVE]

Tripwires are high-priority safety checks that fire before threshold-based CTQ evaluation. They catch catastrophic conditions that statistical scoring might miss.

Normative precedence rules:

Tripwires MUST execute before CTQ metric scoring.
Tripwire-triggered decisions MUST take precedence over threshold-derived decisions.
halt is tripwire-only and MUST NOT be produced by threshold mapping alone.
If any tripwire issues halt, evaluation MUST terminate immediately (short-circuit).
If multiple tripwires trigger, the strictest on_fail.decision MUST apply (halt > block > escalate > nudge > ok). Severity is advisory for authoring only and MUST NOT alter the runtime decision.

halt is a tripwire-originated intervention only. Threshold mapping, trust-debt posture changes, and non-tripwire evaluation steps MUST NOT synthesize halt.

Trust-debt threshold handling in ACGP-3 MAY escalate an intervention but MUST NEVER produce halt; halt remains reserved exclusively for tripwire outcomes.
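
Informative sketch (non-normative): the strictest-decision and short-circuit rules above could be implemented roughly as follows. The name resolve_decision and the numeric ranks are illustrative assumptions; only the ordering halt > block > escalate > nudge > ok comes from this specification.

```python
from typing import Iterable

# Strictness order taken from the precedence rules: halt > block > escalate > nudge > ok.
# The numeric ranks themselves are an illustrative assumption.
DECISION_RANK = {"ok": 0, "nudge": 1, "escalate": 2, "block": 3, "halt": 4}


def resolve_decision(triggered_decisions: Iterable[str]) -> str:
    """Return the strictest on_fail.decision among triggered tripwires, or "ok" if none fired."""
    strictest = "ok"
    for decision in triggered_decisions:
        if decision == "halt":
            # Short-circuit: nothing outranks halt, so remaining tripwires are not consumed.
            return "halt"
        if DECISION_RANK[decision] > DECISION_RANK[strictest]:
            strictest = decision
    return strictest
```

Passing a lazy iterable (for example a generator that evaluates tripwires one by one) means that no further tripwire is evaluated once halt is seen.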

2.1 Polarity Convention [NORMATIVE]

The condition expression describes the violation pattern. If the condition evaluates to true for a given trace, the tripwire fires and the on_fail action is applied.

The name on_fail refers to the agent's action failing governance review, not to the condition expression evaluating to false.

Note: A future version (v1.1) may introduce on_trigger as a clearer alias for on_fail. For v1.0, on_fail is the canonical wire name.

3. Tripwire Schema [NORMATIVE]

Each tripwire object MUST include every field marked REQUIRED below; fields marked OPTIONAL MAY be omitted:

Field	Type	Required	Description
`id`	string	REQUIRED	Unique identifier (e.g., `pii_exposure_check`)
`when`	object	OPTIONAL	Trigger condition: `hook`, optional `tool`. If omitted, the tripwire applies to all hook values; omission is the default for global safety tripwires. There is no `"any"` hook value on the wire.
`condition`	string or object	REQUIRED	DSL expression (Section 4)
`on_fail.decision`	enum	REQUIRED	One of: `nudge`, `escalate`, `block`, `halt`
`on_fail.reason`	string	REQUIRED	Human-readable explanation
`eval_tier`	integer	OPTIONAL	`0` or `1` (default: `0`)
`latency_budget_ms`	integer	OPTIONAL	Per-tripwire budget in milliseconds (default: 100 for tier 0, 300 for tier 1; see Section 12)
`requires_state`	boolean	OPTIONAL	Whether stateful storage is needed (default: `false`)
`severity`	enum	OPTIONAL	`standard`, `critical`, `severe`

Note: To flag without altering the primary decision, use a rule-based check with flag: true (see ACGP-3 §3). Tripwire on_fail.decision does not include flag because tripwires always produce a primary intervention.

Tier constraint: Tripwires MUST NOT declare eval_tier > 1 in v1.0 core.

Stateful tripwire scoping [NORMATIVE]:

Tripwires with requires_state: false (default) MUST evaluate using only trace payload fields and the condition grammar in Section 4. No state queries are permitted.
Tripwires with requires_state: true MAY use standardized stateful functions defined in Section 6. Stateful functions are the only mechanism for state access.
Tripwires with requires_state: true SHOULD declare eval_tier: 1 unless the implementation provides a local state cache; with a local cache, eval_tier: 0 is permitted.
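
Informative sketch (non-normative): one way to enforce the schema and the tier constraint when loading a tripwire. The Tripwire class, its flattened on_fail_decision / on_fail_reason attributes, and the error messages are hypothetical; the field names, enums, and defaults follow the table above.

```python
from dataclasses import dataclass
from typing import Optional, Union

ALLOWED_DECISIONS = {"nudge", "escalate", "block", "halt"}
ALLOWED_SEVERITIES = {"standard", "critical", "severe"}


@dataclass(frozen=True)
class Tripwire:
    """Hypothetical in-memory form of a tripwire object; on_fail is flattened for brevity."""
    id: str
    condition: Union[str, dict]
    on_fail_decision: str
    on_fail_reason: str
    when: Optional[dict] = None              # omitted means "applies to all hook values"
    eval_tier: int = 0
    latency_budget_ms: Optional[int] = None  # None means "use the tier default"
    requires_state: bool = False
    severity: Optional[str] = None           # authoring metadata only

    def __post_init__(self) -> None:
        if not self.id:
            raise ValueError("id is REQUIRED")
        if not self.condition:
            raise ValueError("condition is REQUIRED")
        if self.on_fail_decision not in ALLOWED_DECISIONS:
            raise ValueError(f"invalid on_fail.decision: {self.on_fail_decision!r}")
        if not self.on_fail_reason:
            raise ValueError("on_fail.reason is REQUIRED")
        if self.eval_tier not in (0, 1):
            raise ValueError("eval_tier MUST be 0 or 1 in v1.0 core")
        if self.severity is not None and self.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"invalid severity: {self.severity!r}")
```

Note that flag is rejected as a decision, matching the note above that tripwires always produce a primary intervention.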

4. Formal Condition Grammar (BNF) [NORMATIVE]

<condition>      ::= <compound_expr> | <simple_expr>

<compound_expr>  ::= <all_expr> | <any_expr> | <not_expr>
<all_expr>       ::= "all" ":" "[" <condition_list> "]"
<any_expr>       ::= "any" ":" "[" <condition_list> "]"
<not_expr>       ::= "NOT" <condition>
<condition_list> ::= <condition> | <condition> "," <condition_list>

<simple_expr>    ::= <comparison> | <function_call>

<comparison>     ::= <field_access> <operator> <value>
<operator>       ::= ">" | ">=" | "<" | "<=" | "==" | "!="
                    | "contains" | "matches"

<function_call>  ::= <function_name> "(" <argument_list_opt> ")"
<function_name>  ::= "is_external" | "in_allowlist" | "in_denylist"
                    | "matches_regex" | "contains_entity" | "exceeds_rate"
<argument_list_opt> ::= ε | <argument_list>
<argument_list>  ::= <argument> | <argument> "," <argument_list>
<argument>       ::= <field_access> | <value>

<field_access>   ::= <identifier> | <identifier> "." <field_access>
<identifier>     ::= <alpha> <identifier_tail>
<identifier_tail>::= ε | <ident_char> <identifier_tail>
<ident_char>     ::= <alpha> | <digit> | "_"

<value>          ::= <string> | <number> | <boolean> | <array>
<array>          ::= "[" <value_list_opt> "]"
<value_list_opt> ::= ε | <value_list>
<value_list>     ::= <value> | <value> "," <value_list>

<string>         ::= '"' <string_chars> '"'
<number>         ::= <sign_opt> <digits> <fraction_opt>
<boolean>        ::= "true" | "false"

Implementations MUST support nested compound expressions at least 3 levels deep.
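
Informative sketch (non-normative): a minimal parser for the <comparison> production only. It assumes that <alpha> means ASCII letters and that <value> literals use JSON syntax; the grammar above leaves both undefined, so these are assumptions, and parse_comparison is a hypothetical name. Function calls and compound expressions are out of scope here.

```python
import json
import re
from typing import Any, Tuple

# <field_access> <operator> <value>; <alpha> is assumed to mean ASCII letters.
_COMPARISON = re.compile(
    r"""^\s*
        (?P<field>[A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)*)
        \s*(?P<op>>=|<=|==|!=|>|<|contains\b|matches\b)\s*
        (?P<value>.+?)\s*$""",
    re.VERBOSE | re.DOTALL,
)


def parse_comparison(expr: str) -> Tuple[str, str, Any]:
    """Split a comparison into (field path, operator, decoded value) or raise ValueError."""
    match = _COMPARISON.match(expr)
    if match is None:
        raise ValueError(f"not a valid comparison: {expr!r}")
    try:
        # Assumption: value literals follow JSON syntax (strings, numbers, booleans, arrays).
        value = json.loads(match.group("value"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid value literal in {expr!r}") from exc
    if value is None or isinstance(value, dict):
        # null and objects are valid JSON but are not <value> alternatives in the grammar.
        raise ValueError(f"unsupported value literal in {expr!r}")
    return match.group("field"), match.group("op"), value
```

Because parsing is done against a fixed pattern rather than by executing the expression, a sketch like this stays within the rule that conditions are never passed to dynamic code execution.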

4.1 Canonical Field Roots [NORMATIVE]

Tripwire field access uses dotted identifiers whose first segment MUST be one of the canonical roots defined below.

Unless explicitly stated otherwise, field paths are resolved against the canonical ACGP-3 trace model. Implementations MUST reject undeclared root aliases in normative mode.

Root	Resolves To	Notes
`action`	Trace action object	Canonical root for intended action metadata
`args`	`action.parameters`	Canonical shorthand retained for v1.0 tripwire authoring
`reasoning`	Trace reasoning field	Optional trace content; missing field is fail-closed if referenced
`confidence`	Trace confidence field	Numeric
`agent_id`	Governed agent principal identifier from the trace	Scalar
`governance_tier`	Trace governance tier	Serialized as `GT-*`
`meta`	Trace metadata object	Implementation-supplied metadata
`output`	Primary output object, if present	Optional
`outputs`	Multi-output collection, if present	Optional
`tool`	Active tool identifier / tool context	Optional
`source_refs`	Evidence/source references	Optional
`destination`	Delivery target / sink field	Optional
`content`	Primary content field	Optional
`storage`	Storage-target metadata	Optional

args is the only canonical shorthand alias defined by v1.0, and resolves to action.parameters.

Implementations MUST NOT accept additional root aliases unless operating in a clearly documented non-conformant extension mode.

For the authoritative semantic meaning of trace fields, see ACGP-3 §4.1.

If a tripwire expression references an unknown root identifier, blueprint activation MUST fail with a named validation error.
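
Informative sketch (non-normative): resolving a dotted field path against a trace, including the args shorthand. It assumes the trace is a plain dictionary whose top-level keys are the canonical root names; the authoritative trace model is defined in ACGP-3. UnknownFieldRoot and resolve_field are hypothetical names.

```python
from typing import Any, Dict

CANONICAL_ROOTS = frozenset({
    "action", "args", "reasoning", "confidence", "agent_id", "governance_tier",
    "meta", "output", "outputs", "tool", "source_refs", "destination",
    "content", "storage",
})


class UnknownFieldRoot(ValueError):
    """Hypothetical validation error for a path whose first segment is not a canonical root."""


def resolve_field(trace: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path; raise KeyError if any segment is missing from the trace."""
    segments = path.split(".")
    if segments[0] not in CANONICAL_ROOTS:
        raise UnknownFieldRoot(f"unknown root identifier: {segments[0]!r}")
    if segments[0] == "args":
        # args is the only shorthand alias in v1.0 and resolves to action.parameters.
        segments = ["action", "parameters"] + segments[1:]
    node: Any = trace
    for segment in segments:
        if not isinstance(node, dict) or segment not in node:
            # Missing field: the caller is expected to fail closed at runtime.
            raise KeyError(path)
        node = node[segment]
    return node
```

The unknown-root check corresponds to an activation-time failure, while the KeyError path corresponds to a runtime condition that cannot be evaluated.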

5. Standard Operators [NORMATIVE]

Operator	Description	Example
`>`	Greater than	`args.amount > 10000`
`>=`	Greater or equal	`meta.response_size_bytes >= 1048576`
`<`	Less than	`meta.retry_count < 3`
`<=`	Less or equal	`meta.trust_debt <= 0.5`
`==`	Equal	`action.type == "delete"`
`!=`	Not equal	`destination != "internal"`
`contains`	String contains	`content contains "password"`
`matches`	Regex match	`content matches "\\d{3}-\\d{2}-\\d{4}"`
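
Informative sketch (non-normative): a dispatch function for the operators in the table above. Two assumptions are made that the table does not state: matches is treated as an unanchored search, and ordering operators accept only numeric operands. Python's re module is a backtracking engine and is used here only as a stand-in; a conformant implementation would need an RE2-compatible engine.

```python
import operator
import re
from typing import Any

_ORDERING = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_EQUALITY = {"==": operator.eq, "!=": operator.ne}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def apply_operator(op: str, left: Any, right: Any) -> bool:
    """Apply one standard operator; raise TypeError on operand types the operator cannot take."""
    if op in _ORDERING:
        if not (_is_number(left) and _is_number(right)):
            raise TypeError(f"{op} requires numeric operands")
        return _ORDERING[op](left, right)
    if op in _EQUALITY:
        return _EQUALITY[op](left, right)
    if op in ("contains", "matches"):
        if not (isinstance(left, str) and isinstance(right, str)):
            raise TypeError(f"{op} requires string operands")
        if op == "contains":
            return right in left
        # Stand-in only: `re` is not RE2 and does not guarantee linear time.
        return re.search(right, left) is not None
    raise ValueError(f"unknown operator: {op!r}")
```

A TypeError raised here is the kind of runtime type mismatch that the evaluator is expected to treat as a fail-closed trigger.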

6. Standard Functions [NORMATIVE]

Function	Description	Example
`is_external(field)`	Check if endpoint is external	`is_external(destination)`
`in_allowlist(field, list)`	Check against named allowlist	`in_allowlist(tool, "approved_tools")`
`in_denylist(field, list)`	Check against named denylist	`in_denylist(destination, "blocked_domains")`
`matches_regex(field, pattern)`	Named regex pattern match	`matches_regex(content, "SSN_PATTERN")`
`contains_entity(field, type)`	Named-entity detection	`contains_entity(output, "credit_card")`
`exceeds_rate(agent_id, limit, window)`	Stateful rate limit check	`exceeds_rate(agent_id, 100, "1m")`
`recent_tool_sum(tool, field, window)`	Stateful sum of tool field values	`recent_tool_sum("execute_trade", "args.trade_value", "1d")`
`recent_tool_count(tool, window)`	Stateful count of tool invocations	`recent_tool_count("execute_trade", "1h")`
`rolling_intervention_rate(agent_id, window, types[])`	Stateful rate of intervention classes	`rolling_intervention_rate(agent_id, "24h", ["nudge", "block"])`

Implementations MUST parse and evaluate all standard operators and functions.

Stateful principal semantics [NORMATIVE]:

Functions such as exceeds_rate(agent_id, limit, window) and rolling_intervention_rate(agent_id, window, types[]) evaluate behavior for the governed agent principal identified by agent_id.
Implementations MUST NOT substitute sender_id, agent_label, session_id, trace_id, or runtime instance identifiers when evaluating per-agent stateful controls.

Tripwire conditions MUST NOT contain embedded query language (SQL, GraphQL, or equivalent). Stateful evaluation MUST use the standardized function set above or extension-registered functions.

Function-set versioning: v1.0 core defines the functions above. Extensions MAY register additional functions via extension registries (Advanced Trust Debt or future extensions).
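
Informative sketch (non-normative): a sliding-window backing store for exceeds_rate, keyed by agent_id. The window syntax (an integer followed by s, m, h, or d) is inferred from the examples "1m", "1h", "1d", and "24h" and is an assumption, as is the reading of "exceeds" as strictly greater than the limit. RateState, record, and parse_window are hypothetical names.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_window(window: str) -> int:
    """Convert a window such as "1m" or "24h" to seconds (assumed syntax)."""
    match = re.fullmatch(r"(\d+)([smhd])", window)
    if match is None:
        raise ValueError(f"invalid window: {window!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


class RateState:
    """Per-agent event timestamps; written by the runtime, only read by tripwires."""

    def __init__(self) -> None:
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, agent_id: str, timestamp: float) -> None:
        # Called by the runtime outside condition evaluation.
        self._events[agent_id].append(timestamp)

    def exceeds_rate(self, agent_id: str, limit: int, window: str, now: float) -> bool:
        # Read-only: .get() avoids creating an entry for unseen agents.
        cutoff = now - parse_window(window)
        events = self._events.get(agent_id, ())
        recent = sum(1 for timestamp in events if timestamp > cutoff)
        return recent > limit
```

Keeping record separate from exceeds_rate is a design choice that keeps the state function side-effect-free during condition evaluation.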

6.1 Extension Function Registration [INFORMATIVE]

Implementations MAY allow blueprints to reference additional functions beyond the standard set. Extension functions MUST follow these conventions:

Naming: Extension functions MUST use the query_ prefix (e.g., query_external, query_credit_score). The prefix distinguishes them from standard functions during validation.
Registration: The steward or runtime MUST register allowed extension function names before blueprint activation. Unregistered query_* calls MUST be rejected at validation time.
Arity: Extension functions accept implementation-defined arguments. Validators SHOULD skip arity checks for registered extension functions.
Statefulness: Extension functions are implicitly stateful (requires_state: true SHOULD be declared on tripwires that use them).
Visibility boundary: If a tripwire depends on a private or local extension capability, the containing blueprint or bundle SHOULD declare that dependency through extensions.required[] or deployment-local metadata rather than exposing private backing details in function arguments.

# Example: registering an extension function at runtime
steward.register_extension_functions({"query_external", "query_credit_score"})

Conformance note: The conformance test suite uses query_external as a known example extension function. Implementations MUST accept it in test vectors without error.

6.2 Regex Profile and Determinism [NORMATIVE]

All matches / matches_regex evaluations MUST use an RE2-compatible, linear-time regex engine. Implementations MUST reject backtracking constructs (backreferences, lookahead, lookbehind). Input strings MUST be normalized to Unicode NFC before evaluation. Patterns longer than 1024 characters MUST be rejected with TripwireRegexTooLong. Unknown flags MUST cause TripwireRegexInvalidFlag.
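
Informative sketch (non-normative): pre-checks that could run before a pattern reaches the regex engine. The scan for forbidden constructs is a conservative heuristic (it can reject a harmless pattern that contains, for example, "(?=" inside a character class) and does not replace an RE2-compatible engine. TripwireRegexTooLong is named in this section; TripwireRegexUnsupported and the function names are hypothetical.

```python
import unicodedata
from typing import Optional

MAX_PATTERN_LENGTH = 1024


class TripwireRegexTooLong(ValueError):
    pass


class TripwireRegexUnsupported(ValueError):
    """Hypothetical error name for backreferences and lookaround."""


def find_forbidden(pattern: str) -> Optional[str]:
    """Return the name of the first forbidden construct found, or None."""
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            if i + 1 < len(pattern) and pattern[i + 1] in "123456789":
                return "backreference"
            i += 2  # skip the escaped character so "\\\\1" is not misread
            continue
        if pattern.startswith(("(?=", "(?!"), i):
            return "lookahead"
        if pattern.startswith(("(?<=", "(?<!"), i):
            return "lookbehind"
        if pattern.startswith("(?P=", i):
            return "backreference"
        i += 1
    return None


def check_pattern(pattern: str) -> str:
    """Validate a pattern against the length limit and the forbidden-construct heuristic."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise TripwireRegexTooLong(f"pattern has {len(pattern)} characters")
    problem = find_forbidden(pattern)
    if problem is not None:
        raise TripwireRegexUnsupported(problem)
    return pattern


def normalize_input(text: str) -> str:
    # Inputs are normalized to Unicode NFC before matching.
    return unicodedata.normalize("NFC", text)
```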

7. Compound Conditions [NORMATIVE]

7.1 ALL (AND Logic)

condition:
  all:
    - meta.response_size_bytes > 10485760
    - is_external(destination)
    - NOT in_allowlist(destination, "trusted_endpoints")

7.2 ANY (OR Logic)

condition:
  any:
    - action.type == "delete"
    - action.type == "drop"
    - action.type == "truncate"

7.3 NOT (Negation)

condition:
  NOT:
    in_allowlist(tool, "approved_tools")
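
Informative sketch (non-normative): a recursive evaluator for the compound forms shown in Sections 7.1 to 7.3. It assumes the YAML has already been loaded into Python values, so that all and any map to lists, NOT maps to a single condition, and leaf conditions are strings. The eval_simple callback, which evaluates one simple expression, is a hypothetical parameter.

```python
from typing import Callable, Union

Condition = Union[str, dict]


def evaluate(condition: Condition, eval_simple: Callable[[str], bool]) -> bool:
    """Evaluate a loaded condition tree; leaves are delegated to eval_simple."""
    if isinstance(condition, str):
        if condition.startswith("NOT "):
            # Inline negation as written in list items, e.g. "NOT in_allowlist(...)".
            return not evaluate(condition[4:].strip(), eval_simple)
        return eval_simple(condition)
    if isinstance(condition, dict) and len(condition) == 1:
        (key, body), = condition.items()
        if key == "all":
            return all(evaluate(child, eval_simple) for child in body)
        if key == "any":
            return any(evaluate(child, eval_simple) for child in body)
        if key == "NOT":
            return not evaluate(body, eval_simple)
    raise ValueError(f"malformed condition: {condition!r}")
```

Recursion handles arbitrary nesting, so the three-level minimum required by Section 4 is covered without special cases.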

8. Severity Classification [INFORMATIVE]

Tripwires support three severity categories for policy authoring triage and review.

Severity is authoring metadata only. At runtime, implementations MUST apply the explicit on_fail.decision on each triggered tripwire. If multiple tripwires fire, the strictest explicit decision wins (halt > block > escalate > nudge > ok). Severity MUST NOT be used to compute the runtime intervention.

Severity	Examples	Suggested authoring default	Authoring guidance
Standard	Budget exceeded, rate limit, API quota	`block`	Use for hard safety boundaries that normally stop the current action without ending the session.
Critical	Secrets in output, production write, PII leak	`block` or `halt`	Choose the explicit runtime decision in the blueprint based on the deployment's risk posture.
Severe	Data exfiltration, collusion, credential theft	`halt`	Reserve for conditions that should terminate the governed session immediately.

Severity classification is advisory for policy authoring. The explicit on_fail.decision remains authoritative at runtime.

8.1 Standard Tripwires

Implementations SHOULD ship these built-in tripwires:

Budget:

- id: spend_cap_exceeded
  severity: standard
  condition: args.total_spend > args.budget_limit
  on_fail: { decision: block, reason: "Budget limit exceeded" }

Rate Limiting:

- id: rate_limit_hit
  severity: standard
  eval_tier: 1
  requires_state: true
  condition: exceeds_rate(agent_id, 100, "1m")
  on_fail: { decision: block, reason: "Rate limit exceeded (100 req/min)" }

Secrets Detection:

- id: secrets_detected
  severity: critical
  condition:
    any:
      - content contains "AKIA"
      - matches_regex(content, "-----BEGIN.*PRIVATE KEY-----")
  on_fail: { decision: block, reason: "Secrets detected in output" }

PII Exposure:

- id: pii_exposure
  severity: critical
  condition:
    any:
      - matches_regex(content, "\\b\\d{3}-\\d{2}-\\d{4}\\b")
      - contains_entity(content, "credit_card")
      - contains_entity(content, "bank_account")
  on_fail: { decision: block, reason: "PII detected in output" }

Data Exfiltration:

- id: data_exfiltration
  severity: severe
  condition:
    all:
      - meta.response_size_bytes > 10485760
      - is_external(destination)
      - NOT in_allowlist(destination, "approved_endpoints")
  on_fail: { decision: halt, reason: "Potential data exfiltration detected" }

Dangerous Database Operations:

- id: dangerous_db_ops
  severity: severe
  condition:
    all:
      - tool == "database_query"
      - any:
          - action.query contains "DROP"
          - action.query contains "DELETE FROM"
          - action.query contains "TRUNCATE"
      - NOT in_allowlist(action.table, "deletable_tables")
  on_fail: { decision: halt, reason: "Dangerous database operation blocked" }

9. Activation-Time Validation [NORMATIVE]

At blueprint activation/load time, implementations MUST:

Parse condition expressions against the BNF grammar
Validate operator and function names are recognized
Validate function arity and argument types
Validate referenced fields are permitted by the trace schema

If any tripwire condition fails validation, blueprint activation MUST fail and return structured validation errors:

{
  "blueprint_id": "finance/trading@2.0.0",
  "validation_errors": [
    {
      "tripwire_id": "daily_trade_count",
      "error": "Unknown function: count_today",
      "line": 42
    }
  ]
}
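
Informative sketch (non-normative): the function-name check from the list above, returning errors in the structure shown. It covers only string-form conditions and only the "function names are recognized" step; grammar parsing, arity, argument types, and field checks are omitted. validate_blueprint and its parameters are hypothetical names.

```python
import re
from typing import Dict, FrozenSet, List

STANDARD_FUNCTIONS = frozenset({
    "is_external", "in_allowlist", "in_denylist", "matches_regex",
    "contains_entity", "exceeds_rate", "recent_tool_sum",
    "recent_tool_count", "rolling_intervention_rate",
})

_STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
_CALL_NAME = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*)\s*\(")


def validate_blueprint(
    blueprint_id: str,
    tripwires: List[Dict[str, str]],
    extension_functions: FrozenSet[str] = frozenset(),
) -> dict:
    """Collect unknown-function errors; activation fails if validation_errors is non-empty."""
    errors = []
    for tripwire in tripwires:
        # Blank out string literals so text such as "foo(bar)" is not read as a call.
        stripped = _STRING_LITERAL.sub('""', tripwire["condition"])
        for name in _CALL_NAME.findall(stripped):
            if name in STANDARD_FUNCTIONS or name in extension_functions:
                continue
            errors.append({
                "tripwire_id": tripwire["id"],
                "error": f"Unknown function: {name}",
            })
    return {"blueprint_id": blueprint_id, "validation_errors": errors}
```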

10. Runtime Fail-Closed Semantics [NORMATIVE]

If a tripwire condition cannot be evaluated at runtime, the implementation MUST fail closed by applying on_fail.decision and on_fail.reason immediately.

Fail-closed triggers:

Trigger	Description
Evaluation timeout	Tripwire evaluation exceeds `latency_budget_ms`
Missing required field	Referenced field absent from runtime trace
Execution error	Function or evaluator internal failure
Type mismatch	Runtime type incompatible with operator

Fail-closed behavior is non-negotiable. Implementations MUST NOT silently skip or ignore a failing tripwire.
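
Informative sketch (non-normative): a wrapper that turns any evaluation failure into the tripwire's own on_fail outcome. It assumes that missing fields, type mismatches, and evaluator failures surface as exceptions from a hypothetical evaluate callback, and that a tripwire is a dictionary with a nested on_fail object; timeout handling is not shown.

```python
from typing import Callable, Optional, Tuple


def evaluate_fail_closed(
    tripwire: dict,
    trace: dict,
    evaluate: Callable[[dict, dict], bool],
) -> Optional[Tuple[str, str]]:
    """Return (decision, reason) if the tripwire fires or cannot be evaluated, else None."""
    on_fail = (tripwire["on_fail"]["decision"], tripwire["on_fail"]["reason"])
    try:
        fired = evaluate(tripwire, trace)
    except Exception:
        # Any failure (missing field, type mismatch, internal error) fails closed;
        # the tripwire is never silently skipped.
        return on_fail
    return on_fail if fired else None
```

Catching the broad Exception type is deliberate in this sketch: an unanticipated error class must still produce the on_fail outcome rather than let the action proceed.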

11. Condition Evaluation Security [NORMATIVE]

Condition evaluation MUST satisfy the following constraints:

Evaluation MUST be sandboxed: no file system access, no network access, no process spawning, and no access to system resources beyond trace payload fields and approved state functions.
Evaluation MUST be time-bounded: if evaluation exceeds per-tripwire latency_budget_ms, fail-closed semantics in Section 10 apply.
Evaluation MUST be side-effect-free: condition execution MUST NOT mutate state; state functions are read-only.
Implementations MUST NOT use general-purpose eval(), exec(), or equivalent dynamic code execution for condition evaluation. Conditions MUST be parsed and evaluated against the formal grammar in Section 4.
Regex patterns MUST conform to RE2 syntax (§6.2).

12. Tier Classification and Budgets [NORMATIVE]

Tier	Allowed Operations	Default Latency Budget	Example
0	In-memory deterministic checks	<100ms	Pattern match, value comparison
1	Local state/cache/DB lookup	<300ms	Rate limits, session counters

Implementations MUST enforce per-tripwire latency_budget_ms and SHOULD monitor budget compliance. If a tripwire of either tier exceeds its budget, the fail-closed semantics in Section 10 apply.

Latency budgets defined in ACGP (per-tripwire, per-tier, and per-profile p95 ceilings) apply to governance evaluation time only, measured from receipt of the CognitiveTrace by the evaluation engine to emission of the intervention decision. Network round-trip time between Operating Agent and Governance Steward is excluded.

Evaluation latency measurement MUST start when the evaluation engine receives the CognitiveTrace object (or equivalent in-memory representation) and MUST end when the intervention decision is emitted to the caller. Blueprint resolution (inheritance merge) is included in the measurement. Network serialization/deserialization is excluded.

Implementations SHOULD measure and report end-to-end governance latency (including network) separately from evaluation latency. For distributed topologies, implementations SHOULD document expected network latency contributions.
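
Informative sketch (non-normative): resolving a tripwire's budget from its tier and measuring evaluation time for a single tripwire. The check here is made after evaluation returns, so it can detect an overrun but cannot interrupt a slow evaluation; a production implementation would need a preemptive timeout. budget_ms, timed_evaluate, and the injectable clock parameter are hypothetical.

```python
import time
from typing import Callable, Tuple

# Tier defaults from the table above: 100 ms for tier 0, 300 ms for tier 1.
DEFAULT_BUDGET_MS = {0: 100, 1: 300}


def budget_ms(tripwire: dict) -> int:
    """Return the explicit latency_budget_ms, or the default for the tripwire's tier."""
    explicit = tripwire.get("latency_budget_ms")
    if explicit is not None:
        return explicit
    return DEFAULT_BUDGET_MS[tripwire.get("eval_tier", 0)]


def timed_evaluate(
    tripwire: dict,
    trace: dict,
    evaluate: Callable[[dict, dict], bool],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[bool, float]:
    """Evaluate one tripwire and return (fired, elapsed milliseconds)."""
    start = clock()
    fired = evaluate(tripwire, trace)
    elapsed_ms = (clock() - start) * 1000.0
    if elapsed_ms > budget_ms(tripwire):
        # Budget exceeded: fail closed by treating the tripwire as fired.
        fired = True
    return fired, elapsed_ms
```

The returned elapsed time covers evaluation only; network time would be measured and reported separately.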

13. Emergency Overrides [NORMATIVE]

For Governance Tier GT-5 agents and safety-focused deployments, implementations SHOULD support emergency override mechanisms:

13.1 Kill-Switch

For v1.0.0-alpha.2, kill-switch activation is an out-of-band operational control, not a standard wire message type.

Implementations that expose kill-switch capability MUST ensure that kill-switch activation preempts in-flight evaluation and prevents subsequent governed actions until cleared.

A kill-switch MUST be available for Governance Tier GT-5 agents:

Immediately halts all agent actions
Does not require tripwire evaluation
Requires authorized operator credentials
MUST be logged to the Governance Store with operator identity

13.2 Dual Control

For Governance Tier GT-5, halt interventions SHOULD require dual approval (two-person rule) before an agent can be resumed.

13.3 Override Logging

All emergency actions MUST produce immutable audit records including:

Operator identity
Justification
Timestamp
Action taken
Agent state at time of override
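
Informative sketch (non-normative): a kill-switch that records every emergency action with the fields listed above. Applying the two-person rule to clearing the kill-switch is an assumption made for illustration; this specification states dual control for resuming after halt. KillSwitch, OverrideRecord, and the action strings are hypothetical, and a frozen dataclass in an in-memory list only approximates an immutable audit store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class OverrideRecord:
    operator_ids: Tuple[str, ...]
    justification: str
    timestamp: str
    action: str
    agent_state: str


class KillSwitch:
    def __init__(self, authorized_operators: Set[str], audit_log: List[OverrideRecord]) -> None:
        self._authorized = set(authorized_operators)
        self._audit_log = audit_log
        self._engaged = False

    @property
    def engaged(self) -> bool:
        return self._engaged

    def _record(self, operators: Iterable[str], justification: str,
                action: str, agent_state: str) -> OverrideRecord:
        record = OverrideRecord(
            operator_ids=tuple(operators),
            justification=justification,
            timestamp=datetime.now(timezone.utc).isoformat(),
            action=action,
            agent_state=agent_state,
        )
        self._audit_log.append(record)
        return record

    def activate(self, operator_id: str, justification: str, agent_state: str) -> OverrideRecord:
        if operator_id not in self._authorized:
            raise PermissionError("operator is not authorized")
        self._engaged = True  # no tripwire evaluation is involved
        return self._record([operator_id], justification, "kill_switch_activated", agent_state)

    def clear(self, first_approver: str, second_approver: str,
              justification: str, agent_state: str) -> OverrideRecord:
        approvers = {first_approver, second_approver}
        # Assumed two-person rule: two distinct authorized operators are required.
        if len(approvers) != 2 or not approvers <= self._authorized:
            raise PermissionError("two distinct authorized approvers are required")
        self._engaged = False
        return self._record(sorted(approvers), justification, "kill_switch_cleared", agent_state)
```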

14. Lint Tooling [RECOMMENDED]

Implementations SHOULD provide a lint tool for pre-deployment tripwire validation.

The linter SHOULD:

Parse each condition and report syntax version used
Flag non-canonical constructs and propose canonical v1.0 rewrites
Validate function support and arity
Validate field roots against the trace schema
Emit machine-readable output for CI:

{
  "blueprint_id": "finance/trading@2.0.0",
  "inferred_tripwire_dsl_version": "1.0",
  "issues": [
    {
      "tripwire_id": "daily_trade_count",
      "severity": "warning",
      "code": "NONCANONICAL_SYNTAX",
      "message": "Non-canonical expression detected",
      "suggested_rewrite": "NOT exceeds_rate(agent_id, 50, \"1d\")"
    }
  ]
}

15. Syntax Version Guidance [NORMATIVE]

15.1 Syntax Versioning

Blueprint core no longer exposes a tripwire_syntax_version field.

Tripwire expressions in canonical Blueprint artifacts use the Section 4 DSL.
Implementations MUST interpret canonical Blueprint tripwire expressions as DSL version 1.0.
Lint and validation tooling MAY report an inferred DSL version for diagnostics.
If a Blueprint artifact includes tripwire_syntax_version, implementations MUST reject it as a non-canonical extra field during schema validation.

16. Conformance Requirements

A conformant ACGP-4 implementation MUST:

Parse and evaluate all standard operators and functions (Sections 5-6)
Support compound expressions with at least 3 levels of nesting (Section 4)
Execute tripwires before CTQ-based evaluation (Section 2)
Short-circuit on halt: terminate evaluation immediately (Section 2)
Perform activation-time validation on blueprint load (Section 9)
Implement runtime fail-closed semantics for all failure modes (Section 10)
Enforce per-tripwire latency_budget_ms (Section 12)
Log all tripwire activations to the Governance Store
Support vector-based conformance verification (conformance/vectors/tripwire-*.json)
Return clear error messages for malformed conditions

Normative References

RFC 2119: Key words for use in RFCs to Indicate Requirement Levels
RFC 8174: Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
RFC 3339: Date and Time on the Internet: Timestamps
RFC 8785: JSON Canonicalization Scheme (JCS)
ACGP-1: Core Concepts & Terminology, v1.0, 2026
ACGP-2: Messages & Wire Protocol, v1.0, 2026
ACGP-3: Blueprints, Traces & Evaluation, v1.0, 2026
ACGP-4: Tripwires & Safety Semantics, v1.0, 2026
ACGP-5: Audit & Privacy Controls, v1.0, 2026
ACGP-6: Conformance, v1.0, 2026