Tripwires & Safety Semantics
Status: Standard-only Alpha (v1.0.0-alpha.2)
Last Updated: 2026-03-07
Spec ID: ACGP-4
Normative Keywords: MUST, SHOULD, MAY (per RFC 2119 and RFC 8174)
Abstract
This specification defines the tripwire condition language, severity classification, activation-time validation, runtime fail-closed semantics, precedence rules, emergency overrides, and evolution policy. Tripwires are the only mechanism that can trigger a HALT intervention.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
1. Scope [NORMATIVE]
ACGP-4 defines the safety-critical policy language and execution semantics for tripwires.
ACGP-4 does not define:
- Wire transport (→ ACGP-2)
- General CTQ metric model or threshold mapping (→ ACGP-3)
- Audit retention (→ ACGP-5)
2. Tripwire Role and Precedence [NORMATIVE]
Tripwires are high-priority safety checks that fire before threshold-based CTQ evaluation. They catch catastrophic conditions that statistical scoring might miss.
Normative precedence rules:
- Tripwires MUST execute before CTQ metric scoring.
- Tripwire-triggered decisions MUST take precedence over threshold-derived decisions.
- `halt` is tripwire-only and MUST NOT be produced by threshold mapping alone.
- If any tripwire issues `halt`, evaluation MUST terminate immediately (short-circuit).
- If multiple tripwires trigger, the strictest `on_fail.decision` MUST apply (`halt` > `block` > `escalate` > `nudge` > `ok`). Severity is advisory for authoring only and MUST NOT alter the runtime decision.
halt is a tripwire-originated intervention only. Threshold mapping, trust-debt posture changes, and non-tripwire evaluation steps MUST NOT synthesize halt.
Trust-debt threshold handling in ACGP-3 MAY escalate an intervention but MUST NEVER produce halt; halt remains reserved exclusively for tripwire outcomes.
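The precedence rules above can be illustrated with a minimal Python sketch. It assumes each tripwire is represented as a pair of a hypothetical `fires` callable and its `on_fail.decision`; neither the function name `resolve_tripwires` nor this representation is defined by ACGP-4.

```python
# Strictness order from Section 2: halt > block > escalate > nudge > ok.
STRICTNESS = {"ok": 0, "nudge": 1, "escalate": 2, "block": 3, "halt": 4}


def resolve_tripwires(tripwires, trace):
    """Return the strictest decision among fired tripwires, or None if none fired.

    `tripwires` is an iterable of (fires, decision) pairs. `fires` is a
    hypothetical callable that returns True when the violation pattern
    matches the trace; `decision` is the tripwire's on_fail.decision.
    """
    strictest = None
    for fires, decision in tripwires:
        if not fires(trace):
            continue
        if decision == "halt":
            # Short-circuit: no further tripwires are evaluated after a halt.
            return "halt"
        if strictest is None or STRICTNESS[decision] > STRICTNESS[strictest]:
            strictest = decision
    return strictest
```

In this sketch a result of `None` means no tripwire fired, so the caller would proceed to CTQ metric scoring.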
2.1 Polarity Convention [NORMATIVE]
The condition expression describes the violation pattern. If the condition evaluates to true for a given trace, the tripwire fires and the on_fail action is applied.
The name on_fail refers to the agent's action failing governance review, not to the condition expression evaluating to false.
Note: A future version (v1.1) may introduce `on_trigger` as a clearer alias for `on_fail`. For v1.0, `on_fail` is the canonical wire name.
3. Tripwire Schema [NORMATIVE]
Each tripwire object MUST include every field marked REQUIRED below; the remaining fields are OPTIONAL.
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | REQUIRED | Unique identifier (e.g., `pii_exposure_check`) |
| `when` | object | OPTIONAL | Trigger condition: `hook`, optional `tool`. If omitted, the tripwire applies to all hook values, which is the default for global safety tripwires. There is no "any" hook value on the wire. |
| `condition` | string or object | REQUIRED | DSL expression (Section 4) |
| `on_fail.decision` | enum | REQUIRED | One of: `nudge`, `escalate`, `block`, `halt` |
| `on_fail.reason` | string | REQUIRED | Human-readable explanation |
| `eval_tier` | integer | OPTIONAL | 0 or 1 (default: 0) |
| `latency_budget_ms` | integer | OPTIONAL | Per-tripwire budget in milliseconds (default: 100 for tier 0, 300 for tier 1; see Section 12) |
| `requires_state` | boolean | OPTIONAL | Whether stateful storage is needed (default: false) |
| `severity` | enum | OPTIONAL | `standard`, `critical`, `severe` |
Note: To flag without altering the primary decision, use a rule-based check with `flag: true` (see ACGP-3 §3). Tripwire `on_fail.decision` does not include `flag` because tripwires always produce a primary intervention.
Tier constraint: Tripwires MUST NOT declare eval_tier > 1 in v1.0 core.
Stateful tripwire scoping [NORMATIVE]:
- Tripwires with `requires_state: false` (default) MUST evaluate using only trace payload fields and the condition grammar in Section 4. No state queries are permitted.
- Tripwires with `requires_state: true` MAY use standardized stateful functions defined in Section 6. Stateful functions are the only mechanism for state access.
- Tripwires with `requires_state: true` SHOULD declare `eval_tier: 1` unless the implementation provides a local state cache; with a local cache, `eval_tier: 0` is permitted.
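As an illustration of the schema constraints in this section, the following Python sketch checks a single tripwire object that has already been parsed into a dictionary. The function name `validate_tripwire` and the message wording are assumptions made for this example, not part of the specification.

```python
ALLOWED_DECISIONS = {"nudge", "escalate", "block", "halt"}
ALLOWED_SEVERITIES = {"standard", "critical", "severe"}


def validate_tripwire(tw: dict) -> list[str]:
    """Return a list of schema problems for one tripwire object (empty if none)."""
    problems = []
    if not isinstance(tw.get("id"), str) or not tw["id"]:
        problems.append("id is REQUIRED and must be a non-empty string")
    if not isinstance(tw.get("condition"), (str, dict)):
        problems.append("condition is REQUIRED and must be a string or object")

    on_fail = tw.get("on_fail")
    if not isinstance(on_fail, dict):
        on_fail = {}
    if on_fail.get("decision") not in ALLOWED_DECISIONS:
        problems.append("on_fail.decision must be nudge, escalate, block, or halt")
    if not isinstance(on_fail.get("reason"), str):
        problems.append("on_fail.reason is REQUIRED and must be a string")

    # Tier constraint: eval_tier > 1 is not allowed in v1.0 core.
    if tw.get("eval_tier", 0) not in (0, 1):
        problems.append("eval_tier must be 0 or 1 in v1.0 core")
    if "severity" in tw and tw["severity"] not in ALLOWED_SEVERITIES:
        problems.append("severity must be standard, critical, or severe")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a blueprint at once.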
4. Formal Condition Grammar (BNF) [NORMATIVE]
<condition> ::= <compound_expr> | <simple_expr>
<compound_expr> ::= <all_expr> | <any_expr> | <not_expr>
<all_expr> ::= "all" ":" "[" <condition_list> "]"
<any_expr> ::= "any" ":" "[" <condition_list> "]"
<not_expr> ::= "NOT" <condition>
<condition_list> ::= <condition> | <condition> "," <condition_list>
<simple_expr> ::= <comparison> | <function_call>
<comparison> ::= <field_access> <operator> <value>
<operator> ::= ">" | ">=" | "<" | "<=" | "==" | "!="
| "contains" | "matches"
<function_call> ::= <function_name> "(" <argument_list_opt> ")"
<function_name> ::= "is_external" | "in_allowlist" | "in_denylist"
| "matches_regex" | "contains_entity" | "exceeds_rate"
| "recent_tool_sum" | "recent_tool_count" | "rolling_intervention_rate"
<argument_list_opt> ::= ε | <argument_list>
<argument_list> ::= <argument> | <argument> "," <argument_list>
<argument> ::= <field_access> | <value>
<field_access> ::= <identifier> | <identifier> "." <field_access>
<identifier> ::= <alpha> <identifier_tail>
<identifier_tail>::= ε | <ident_char> <identifier_tail>
<ident_char> ::= <alpha> | <digit> | "_"
<value> ::= <string> | <number> | <boolean> | <array>
<array> ::= "[" <value_list_opt> "]"
<value_list_opt> ::= ε | <value_list>
<value_list> ::= <value> | <value> "," <value_list>
<string> ::= '"' <string_chars> '"'
<number> ::= <sign_opt> <digits> <fraction_opt>
<boolean> ::= "true" | "false"
Implementations MUST support nested compound expressions at least 3 levels deep.
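A minimal Python sketch of evaluating nested compound expressions follows. It assumes the condition has already been parsed into a tree where compound nodes are dictionaries (`{"all": [...]}` or `{"any": [...]}`) and leaves are simple-expression strings, optionally prefixed with `NOT `. The `eval_leaf` callable is hypothetical and stands in for the simple-expression evaluator.

```python
def evaluate(condition, eval_leaf):
    """Evaluate a parsed condition tree and return True if it matches.

    `eval_leaf` is a hypothetical callable that evaluates one simple
    expression string against the trace and returns a boolean.
    """
    if isinstance(condition, dict):
        if len(condition) != 1:
            raise ValueError("compound node must have exactly one key")
        (kind, children), = condition.items()
        if kind == "all":
            return all(evaluate(child, eval_leaf) for child in children)
        if kind == "any":
            return any(evaluate(child, eval_leaf) for child in children)
        raise ValueError(f"unknown compound operator: {kind}")
    if condition.startswith("NOT "):
        return not evaluate(condition[4:].strip(), eval_leaf)
    return eval_leaf(condition)


def nesting_depth(condition):
    """Count how many compound levels are nested in a condition tree."""
    if isinstance(condition, dict):
        children = next(iter(condition.values()))
        return 1 + max((nesting_depth(child) for child in children), default=0)
    return 0
```

A validator could use `nesting_depth` to confirm that an implementation's configured depth limit is at least 3.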
4.1 Canonical Field Roots [NORMATIVE]
Tripwire field access uses dotted identifiers whose first segment MUST be one of the canonical roots defined below.
Unless explicitly stated otherwise, field paths are resolved against the canonical ACGP-3 trace model. Implementations MUST reject undeclared root aliases in normative mode.
| Root | Resolves To | Notes |
|---|---|---|
| `action` | Trace action object | Canonical root for intended action metadata |
| `args` | `action.parameters` | Canonical shorthand retained for v1.0 tripwire authoring |
| `reasoning` | Trace reasoning field | Optional trace content; missing field is fail-closed if referenced |
| `confidence` | Trace confidence field | Numeric |
| `agent_id` | Governed agent principal identifier from the trace | Scalar |
| `governance_tier` | Trace governance tier | Serialized as GT-* |
| `meta` | Trace metadata object | Implementation-supplied metadata |
| `output` | Primary output object, if present | Optional |
| `outputs` | Multi-output collection, if present | Optional |
| `tool` | Active tool identifier / tool context | Optional |
| `source_refs` | Evidence/source references | Optional |
| `destination` | Delivery target / sink field | Optional |
| `content` | Primary content field | Optional |
| `storage` | Storage-target metadata | Optional |
args is the only canonical shorthand alias defined by v1.0, and resolves to action.parameters.
Implementations MUST NOT accept additional root aliases unless operating in a clearly documented non-conformant extension mode.
For the authoritative semantic meaning of trace fields, see ACGP-3 §4.1.
If a tripwire expression references an unknown root identifier, blueprint activation MUST fail with a named validation error.
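Root resolution can be sketched in Python as follows, assuming the trace is available as nested dictionaries. The exception names `UnknownRootError` and `MissingFieldError` are hypothetical; the specification only requires a named validation error for unknown roots and fail-closed handling for missing fields.

```python
CANONICAL_ROOTS = {
    "action", "args", "reasoning", "confidence", "agent_id",
    "governance_tier", "meta", "output", "outputs", "tool",
    "source_refs", "destination", "content", "storage",
}


class UnknownRootError(ValueError):
    """Hypothetical activation-time error for a non-canonical root."""


class MissingFieldError(LookupError):
    """Hypothetical runtime error; callers treat it as a fail-closed trigger."""


def resolve_field(trace: dict, path: str):
    """Resolve a dotted field path against a trace held as nested dicts."""
    root, *rest = path.split(".")
    if root not in CANONICAL_ROOTS:
        raise UnknownRootError(f"unknown root identifier: {root}")
    # args is the only shorthand alias and expands to action.parameters.
    segments = ["action", "parameters", *rest] if root == "args" else [root, *rest]
    node = trace
    for segment in segments:
        if not isinstance(node, dict) or segment not in node:
            raise MissingFieldError(path)
        node = node[segment]
    return node
```

For example, `resolve_field(trace, "args.amount")` reads `trace["action"]["parameters"]["amount"]`.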
5. Standard Operators [NORMATIVE]
| Operator | Description | Example |
|---|---|---|
| `>` | Greater than | `args.amount > 10000` |
| `>=` | Greater or equal | `meta.response_size_bytes >= 1048576` |
| `<` | Less than | `meta.retry_count < 3` |
| `<=` | Less or equal | `meta.trust_debt <= 0.5` |
| `==` | Equal | `action.type == "delete"` |
| `!=` | Not equal | `destination != "internal"` |
| `contains` | String contains | `content contains "password"` |
| `matches` | Regex match | `content matches "\\d{3}-\\d{2}-\\d{4}"` |
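The operator table could be implemented as a small dispatch function. The Python sketch below is illustrative only: `apply_operator` is a hypothetical name, `matches` is assumed to be an unanchored search, and Python's `re` module is a backtracking engine used here as a stand-in, so a conforming implementation would substitute an RE2-compatible matcher.

```python
import operator
import re

ORDERING = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def apply_operator(left, op, right):
    """Apply one standard operator; raise TypeError on a type mismatch."""
    if op in ORDERING:
        if not (_is_number(left) and _is_number(right)):
            raise TypeError(f"{op} requires numeric operands")
        return ORDERING[op](left, right)
    if op == "==":
        return left == right
    if op == "!=":
        return left != right
    if op in ("contains", "matches"):
        if not (isinstance(left, str) and isinstance(right, str)):
            raise TypeError(f"{op} requires string operands")
        if op == "contains":
            return right in left
        # Stand-in only: production code needs an RE2-compatible engine.
        return re.search(right, left) is not None
    raise ValueError(f"unknown operator: {op}")
```

Raising `TypeError` on mismatched operands gives the caller a clear signal to apply fail-closed handling instead of silently returning false.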
6. Standard Functions [NORMATIVE]
| Function | Description | Example |
|---|---|---|
| `is_external(field)` | Check if endpoint is external | `is_external(destination)` |
| `in_allowlist(field, list)` | Check against named allowlist | `in_allowlist(tool, "approved_tools")` |
| `in_denylist(field, list)` | Check against named denylist | `in_denylist(destination, "blocked_domains")` |
| `matches_regex(field, pattern)` | Named regex pattern match | `matches_regex(content, "SSN_PATTERN")` |
| `contains_entity(field, type)` | Named-entity detection | `contains_entity(output, "credit_card")` |
| `exceeds_rate(agent_id, limit, window)` | Stateful rate limit check | `exceeds_rate(agent_id, 100, "1m")` |
| `recent_tool_sum(tool, field, window)` | Stateful sum of tool field values | `recent_tool_sum("execute_trade", "args.trade_value", "1d")` |
| `recent_tool_count(tool, window)` | Stateful count of tool invocations | `recent_tool_count("execute_trade", "1h")` |
| `rolling_intervention_rate(agent_id, window, types[])` | Stateful rate of intervention classes | `rolling_intervention_rate(agent_id, "24h", ["nudge", "block"])` |
Implementations MUST parse and evaluate all standard operators and functions.
Stateful principal semantics [NORMATIVE]:
- Functions such as `exceeds_rate(agent_id, limit, window)` and `rolling_intervention_rate(agent_id, window, types[])` evaluate behavior for the governed agent principal identified by `agent_id`.
- Implementations MUST NOT substitute `sender_id`, `agent_label`, `session_id`, `trace_id`, or runtime instance identifiers when evaluating per-agent stateful controls.
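One way a per-agent stateful function such as `exceeds_rate` might be backed is a sliding window keyed by `agent_id`. The Python sketch below makes several assumptions that the specification does not state: the `RateTracker` class, the window suffixes (`s`, `m`, `h`, `d`), timestamps in seconds, and "exceeds" meaning strictly more than `limit` events inside the window.

```python
from collections import defaultdict, deque

# Assumed window suffixes; the spec's examples use "1m", "1h", "1d", "24h".
WINDOW_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_window(window: str) -> int:
    """Convert a window string such as "1m" or "24h" to seconds."""
    return int(window[:-1]) * WINDOW_UNITS[window[-1]]


class RateTracker:
    """In-memory event log keyed by the governed agent principal."""

    def __init__(self):
        self._events = defaultdict(deque)

    def record(self, agent_id, timestamp):
        # Writes happen outside condition evaluation.
        self._events[agent_id].append(timestamp)

    def exceeds_rate(self, agent_id, limit, window, now):
        # Read-only lookup: .get avoids creating an entry for unseen agents.
        cutoff = now - parse_window(window)
        events = self._events.get(agent_id, ())
        recent = sum(1 for t in events if t > cutoff)
        return recent > limit
```

Keying strictly on `agent_id` keeps the control tied to the governed principal rather than to a session or runtime instance.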
Tripwire conditions MUST NOT contain embedded query language (SQL, GraphQL, or equivalent). Stateful evaluation MUST use the standardized function set above or extension-registered functions.
Function-set versioning: v1.0 core defines the functions above. Extensions MAY register additional functions via extension registries (Advanced Trust Debt or future extensions).
6.1 Extension Function Registration [INFORMATIVE]
Implementations MAY allow blueprints to reference additional functions beyond the standard set. Extension functions MUST follow these conventions:
- Naming: Extension functions MUST use the `query_` prefix (e.g., `query_external`, `query_credit_score`). The prefix distinguishes them from standard functions during validation.
- Registration: The steward or runtime MUST register allowed extension function names before blueprint activation. Unregistered `query_*` calls MUST be rejected at validation time.
- Arity: Extension functions accept implementation-defined arguments. Validators SHOULD skip arity checks for registered extension functions.
- Statefulness: Extension functions are implicitly stateful (`requires_state: true` SHOULD be declared on tripwires that use them).
- Visibility boundary: If a tripwire depends on a private or local extension capability, the containing blueprint or bundle SHOULD declare that dependency through `extensions.required[]` or deployment-local metadata rather than exposing private backing details in function arguments.
# Example: registering an extension function at runtime
steward.register_extension_functions({"query_external", "query_credit_score"})
Conformance note: The conformance test suite uses `query_external` as a known example extension function. Implementations MUST accept it in test vectors without error.
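The naming and registration conventions can be sketched as a small registry. The `FunctionRegistry` class and its `is_allowed` method are hypothetical; only the `register_extension_functions` name echoes the example above, and its real signature is not defined here.

```python
STANDARD_FUNCTIONS = {
    "is_external", "in_allowlist", "in_denylist", "matches_regex",
    "contains_entity", "exceeds_rate", "recent_tool_sum",
    "recent_tool_count", "rolling_intervention_rate",
}


class FunctionRegistry:
    """Tracks which function names a validator should accept."""

    def __init__(self):
        self._extensions = set()

    def register_extension_functions(self, names):
        names = set(names)
        # Validate every name before registering any of them.
        for name in names:
            if not name.startswith("query_"):
                raise ValueError(f"extension function must use query_ prefix: {name}")
        self._extensions.update(names)

    def is_allowed(self, name):
        # Unregistered query_* names are rejected, as are unknown names.
        return name in STANDARD_FUNCTIONS or name in self._extensions
```

A validator would call `is_allowed` for each function name found in a condition and reject the blueprint if any call returns false.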
6.2 Regex Profile and Determinism [NORMATIVE]
All matches / matches_regex evaluations MUST use RE2-compatible, linear-time regex.
Implementations MUST reject patterns that use backtracking constructs (backreferences, lookahead, lookbehind).
Input strings MUST be normalized to Unicode NFC before evaluation.
Patterns > 1024 characters MUST be rejected with TripwireRegexTooLong.
Unknown flags MUST cause TripwireRegexInvalidFlag.
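A pre-flight check for these regex rules might look like the Python sketch below. It is a conservative heuristic, not a full RE2 syntax validator: it may reject some harmless patterns (for example an escaped backslash followed by a digit), and it omits flag validation. `TripwireRegexTooLong` comes from this section, while `TripwireRegexUnsupported`, `check_pattern`, and `normalize_input` are hypothetical names.

```python
import re
import unicodedata

MAX_PATTERN_LENGTH = 1024


class TripwireRegexTooLong(ValueError):
    """Pattern exceeds the 1024-character limit."""


class TripwireRegexUnsupported(ValueError):
    """Hypothetical error name for patterns using backtracking constructs."""


# Lookahead, lookbehind, numeric backreferences, and named backreferences.
_BACKTRACKING = re.compile(r"\(\?[=!]|\(\?<[=!]|\\[1-9]|\(\?P=|\\k<")


def check_pattern(pattern: str) -> str:
    """Return the pattern unchanged if it passes the length and construct checks."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise TripwireRegexTooLong(f"pattern length {len(pattern)} > {MAX_PATTERN_LENGTH}")
    if _BACKTRACKING.search(pattern):
        raise TripwireRegexUnsupported("backtracking construct in pattern")
    return pattern


def normalize_input(text: str) -> str:
    """Normalize an input string to Unicode NFC before matching."""
    return unicodedata.normalize("NFC", text)
```

Running `check_pattern` at activation time and `normalize_input` at evaluation time keeps the two concerns separate.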
7. Compound Conditions [NORMATIVE]
7.1 ALL (AND Logic)
condition:
  all:
    - meta.response_size_bytes > 10485760
    - is_external(destination)
    - NOT in_allowlist(destination, "trusted_endpoints")
7.2 ANY (OR Logic)
7.3 NOT (Negation)
8. Severity Classification [INFORMATIVE]
Tripwires support three severity categories for policy authoring triage and review.
Severity is authoring metadata only. At runtime, implementations MUST apply the explicit on_fail.decision on each triggered tripwire. If multiple tripwires fire, the strictest explicit decision wins (halt > block > escalate > nudge > ok). Severity MUST NOT be used to compute the runtime intervention.
| Severity | Examples | Suggested authoring default | Authoring guidance |
|---|---|---|---|
| Standard | Budget exceeded, rate limit, API quota | `block` | Use for hard safety boundaries that normally stop the current action without ending the session. |
| Critical | Secrets in output, production write, PII leak | `block` or `halt` | Choose the explicit runtime decision in the blueprint based on the deployment's risk posture. |
| Severe | Data exfiltration, collusion, credential theft | `halt` | Reserve for conditions that should terminate the governed session immediately. |
Severity classification is advisory for policy authoring. The explicit on_fail.decision remains authoritative at runtime.
8.1 Standard Tripwires
Implementations SHOULD ship these built-in tripwires:
Budget:
- id: spend_cap_exceeded
  severity: standard
  condition: args.total_spend > args.budget_limit
  on_fail: { decision: block, reason: "Budget limit exceeded" }
Rate Limiting:
- id: rate_limit_hit
  severity: standard
  eval_tier: 1
  requires_state: true
  condition: exceeds_rate(agent_id, 100, "1m")
  on_fail: { decision: block, reason: "Rate limit exceeded (100 req/min)" }
Secrets Detection:
- id: secrets_detected
  severity: critical
  condition:
    any:
      - content contains "AKIA"
      - matches_regex(content, "-----BEGIN.*PRIVATE KEY-----")
  on_fail: { decision: block, reason: "Secrets detected in output" }
PII Exposure:
- id: pii_exposure
  severity: critical
  condition:
    any:
      - matches_regex(content, "\\b\\d{3}-\\d{2}-\\d{4}\\b")
      - contains_entity(content, "credit_card")
      - contains_entity(content, "bank_account")
  on_fail: { decision: block, reason: "PII detected in output" }
Data Exfiltration:
- id: data_exfiltration
  severity: severe
  condition:
    all:
      - meta.response_size_bytes > 10485760
      - is_external(destination)
      - NOT in_allowlist(destination, "approved_endpoints")
  on_fail: { decision: halt, reason: "Potential data exfiltration detected" }
Dangerous Database Operations:
- id: dangerous_db_ops
  severity: severe
  condition:
    all:
      - tool == "database_query"
      - any:
          - action.query contains "DROP"
          - action.query contains "DELETE FROM"
          - action.query contains "TRUNCATE"
      - NOT in_allowlist(action.table, "deletable_tables")
  on_fail: { decision: halt, reason: "Dangerous database operation blocked" }
9. Activation-Time Validation [NORMATIVE]
At blueprint activation/load time, implementations MUST:
- Parse condition expressions against the BNF grammar
- Validate operator and function names are recognized
- Validate function arity and argument types
- Validate referenced fields are permitted by the trace schema
If any tripwire condition fails validation, blueprint activation MUST fail and return structured validation errors:
{
  "blueprint_id": "finance/trading@2.0.0",
  "validation_errors": [
    {
      "tripwire_id": "daily_trade_count",
      "error": "Unknown function: count_today",
      "line": 42
    }
  ]
}
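The function-name portion of activation-time validation could be sketched as follows in Python. This is a simplified illustration, not a full grammar parser: it only checks function names, omits the `line` field because source positions are not tracked here, and assumes conditions are either strings or nested `all`/`any` dictionaries. `validate_blueprint` is a hypothetical name.

```python
import re

STANDARD_FUNCTIONS = {
    "is_external", "in_allowlist", "in_denylist", "matches_regex",
    "contains_entity", "exceeds_rate", "recent_tool_sum",
    "recent_tool_count", "rolling_intervention_rate",
}

_STRING = re.compile(r'"(?:\\.|[^"\\])*"')
_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def _leaves(condition):
    """Yield every simple-expression string in a condition tree."""
    if isinstance(condition, str):
        yield condition
    elif isinstance(condition, dict):
        for children in condition.values():
            for child in children:
                yield from _leaves(child)


def validate_blueprint(blueprint_id, tripwires):
    """Return a structured report listing unknown function names."""
    errors = []
    for tw in tripwires:
        for expr in _leaves(tw["condition"]):
            # Blank out string literals so their contents are not read as calls.
            stripped = _STRING.sub('""', expr)
            for name in _CALL.findall(stripped):
                if name not in STANDARD_FUNCTIONS:
                    errors.append({
                        "tripwire_id": tw["id"],
                        "error": f"Unknown function: {name}",
                    })
    return {"blueprint_id": blueprint_id, "validation_errors": errors}
```

A caller would refuse to activate the blueprint whenever `validation_errors` is non-empty.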
10. Runtime Fail-Closed Semantics [NORMATIVE]
If a tripwire condition cannot be evaluated at runtime, the implementation MUST fail closed by applying on_fail.decision and on_fail.reason immediately.
Fail-closed triggers:
| Trigger | Description |
|---|---|
| Evaluation timeout | Tripwire evaluation exceeds latency_budget_ms |
| Missing required field | Referenced field absent from runtime trace |
| Execution error | Function or evaluator internal failure |
| Type mismatch | Runtime type incompatible with operator |
Fail-closed behavior is non-negotiable. Implementations MUST NOT silently skip or ignore a failing tripwire.
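A fail-closed wrapper can be sketched in Python as below. It assumes a hypothetical `evaluate_condition` callable that raises an exception for any of the triggers in the table (for example `TimeoutError`, `KeyError`, or `TypeError`); the result dictionary shape, including the `fail_closed` and `cause` keys, is illustrative and not defined by ACGP-4.

```python
def evaluate_fail_closed(tripwire, trace, evaluate_condition):
    """Evaluate one tripwire, applying on_fail if evaluation cannot complete.

    Returns None when the condition evaluates cleanly to False; otherwise
    returns a dict carrying the on_fail decision and reason.
    """
    on_fail = tripwire["on_fail"]

    def outcome(fail_closed, cause=None):
        return {
            "decision": on_fail["decision"],
            "reason": on_fail["reason"],
            "fail_closed": fail_closed,
            "cause": cause,
        }

    try:
        fired = evaluate_condition(tripwire["condition"], trace)
    except Exception as exc:
        # Timeout, missing field, execution error, type mismatch: fail closed.
        return outcome(True, type(exc).__name__)

    if not isinstance(fired, bool):
        # A non-boolean result is treated as an evaluation failure.
        return outcome(True, "NonBooleanResult")
    return outcome(False) if fired else None
```

Catching every exception in one place makes it hard for a failing tripwire to be skipped silently.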
11. Condition Evaluation Security [NORMATIVE]
Condition evaluation MUST satisfy the following constraints:
- Evaluation MUST be sandboxed: no file system access, no network access, no process spawning, and no access to system resources beyond trace payload fields and approved state functions.
- Evaluation MUST be time-bounded: if evaluation exceeds per-tripwire `latency_budget_ms`, fail-closed semantics in Section 10 apply.
- Evaluation MUST be side-effect-free: condition execution MUST NOT mutate state; state functions are read-only.
- Implementations MUST NOT use general-purpose `eval()`, `exec()`, or equivalent dynamic code execution for condition evaluation. Conditions MUST be parsed and evaluated against the formal grammar in Section 4.
- Regex patterns MUST conform to RE2 syntax (§6.2).
12. Tier Classification and Budgets [NORMATIVE]
| Tier | Allowed Operations | Default Latency Budget | Example |
|---|---|---|---|
| 0 | In-memory deterministic checks | <100ms | Pattern match, value comparison |
| 1 | Local state/cache/DB lookup | <300ms | Rate limits, session counters |
Implementations MUST enforce per-tripwire latency_budget_ms and SHOULD monitor budget compliance. If a tier-0 tripwire exceeds its budget, fail-closed semantics apply.
Latency budgets defined in ACGP (per-tripwire, per-tier, and per-profile p95 ceilings) apply to governance evaluation time only, measured from receipt of the CognitiveTrace by the evaluation engine to emission of the intervention decision. Network round-trip time between Operating Agent and Governance Steward is excluded.
Evaluation latency measurement MUST start when the evaluation engine receives the CognitiveTrace object (or equivalent in-memory representation) and MUST end when the intervention decision is emitted to the caller. Blueprint resolution (inheritance merge) is included in the measurement. Network serialization/deserialization is excluded.
Implementations SHOULD measure and report end-to-end governance latency (including network) separately from evaluation latency. For distributed topologies, implementations SHOULD document expected network latency contributions.
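Budget selection and evaluation timing might be sketched as follows in Python. The helper names are hypothetical, and this version only detects an overrun after evaluation returns; it does not preempt a slow evaluator, which a production implementation would need to do. The raised `TimeoutError` is intended to feed the fail-closed handling in Section 10.

```python
import time

# Default budgets per tier, taken from the table in this section.
DEFAULT_BUDGET_MS = {0: 100, 1: 300}


def budget_ms(tripwire):
    """Return the explicit latency_budget_ms or the tier default."""
    tier = tripwire.get("eval_tier", 0)
    return tripwire.get("latency_budget_ms", DEFAULT_BUDGET_MS[tier])


def timed_evaluate(tripwire, trace, evaluate_condition, clock=time.perf_counter):
    """Evaluate a condition and raise TimeoutError if it overran its budget.

    `evaluate_condition` is a hypothetical callable; `clock` returns seconds
    and is injectable so the timing logic can be tested deterministically.
    """
    start = clock()
    fired = evaluate_condition(tripwire["condition"], trace)
    elapsed_ms = (clock() - start) * 1000.0
    if elapsed_ms > budget_ms(tripwire):
        raise TimeoutError(f"evaluation took {elapsed_ms:.1f} ms")
    return fired, elapsed_ms
```

Because timing starts and stops inside the evaluation engine, network round-trip time is naturally excluded from the measurement.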
13. Emergency Overrides [NORMATIVE]
For Governance Tier GT-5 agents and safety-focused deployments, implementations SHOULD support emergency override mechanisms:
13.1 Kill-Switch
For v1.0.0-alpha.2, kill-switch activation is an out-of-band operational control, not a standard wire message type.
Implementations that expose kill-switch capability MUST ensure that kill-switch activation preempts in-flight evaluation and prevents subsequent governed actions until cleared.
A kill-switch MUST be available for Governance Tier GT-5 agents. The kill-switch:
- Immediately halts all agent actions
- Does not require tripwire evaluation
- Requires authorized operator credentials
- MUST be logged to the Governance Store with operator identity
13.2 Dual Control
For Governance Tier GT-5, halt interventions SHOULD require dual approval (two-person rule) before an agent can be resumed.
13.3 Override Logging
All emergency actions MUST produce immutable audit records including:
- Operator identity
- Justification
- Timestamp
- Action taken
- Agent state at time of override
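The audit record fields listed above, together with the two-person rule from Section 13.2, can be sketched in Python. All names here (`OverrideRecord`, `record_override`, `may_resume`, and the `approve_resume` action value) are hypothetical, and a frozen dataclass only prevents in-process mutation; durable immutability would depend on the Governance Store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OverrideRecord:
    operator_id: str    # Operator identity
    justification: str  # Justification
    timestamp: str      # ISO 8601 timestamp with UTC offset
    action: str         # Action taken
    agent_state: str    # Agent state at time of override


def record_override(operator_id, justification, action, agent_state, now=None):
    """Build one audit record; `now` is injectable for testing."""
    now = now or datetime.now(timezone.utc)
    return OverrideRecord(operator_id, justification, now.isoformat(), action, agent_state)


def may_resume(records, required_approvals=2):
    """Two-person rule: require approvals from distinct operators."""
    approvers = {r.operator_id for r in records if r.action == "approve_resume"}
    return len(approvers) >= required_approvals
```

Counting distinct operator identities, rather than approval records, prevents one operator from satisfying dual control by approving twice.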
14. Lint Tooling [RECOMMENDED]
Implementations SHOULD provide a lint tool for pre-deployment tripwire validation.
The linter SHOULD:
- Parse each condition and report syntax version used
- Flag non-canonical constructs and propose canonical v1.0 rewrites
- Validate function support and arity
- Validate field roots against the trace schema
- Emit machine-readable output for CI:
{
  "blueprint_id": "finance/trading@2.0.0",
  "inferred_tripwire_dsl_version": "1.0",
  "issues": [
    {
      "tripwire_id": "daily_trade_count",
      "severity": "warning",
      "code": "NONCANONICAL_SYNTAX",
      "message": "Non-canonical expression detected",
      "suggested_rewrite": "exceeds_rate(agent_id, 50, \"1d\") == false"
    }
  ]
}
15. Syntax Version Guidance [NORMATIVE]
15.1 Syntax Versioning
Blueprint core no longer exposes a `tripwire_syntax_version` field.
- Tripwire expressions in canonical Blueprint artifacts use the Section 4 DSL.
- Implementations MUST interpret canonical Blueprint tripwire expressions as DSL version `1.0`.
- Lint and validation tooling MAY report an inferred DSL version for diagnostics.
- If a Blueprint artifact includes `tripwire_syntax_version`, implementations MUST reject it as a non-canonical extra field during schema validation.
16. Conformance Requirements
A conformant ACGP-4 implementation MUST:
- Parse and evaluate all standard operators and functions (Sections 5-6)
- Support compound expressions with at least 3 levels of nesting
- Execute tripwires before CTQ-based evaluation (precedence, Section 2)
- Short-circuit on `halt`: terminate evaluation immediately
- Perform activation-time validation on blueprint load (Section 9)
- Implement runtime fail-closed semantics for all failure modes (Section 10)
- Enforce per-tripwire `latency_budget_ms` (Section 12)
- Log all tripwire activations to the Governance Store
- Support vector-based conformance verification (`conformance/vectors/tripwire-*.json`)
- Return clear error messages for malformed conditions
Normative References
- RFC 2119: Key words for use in RFCs to Indicate Requirement Levels
- RFC 8174: Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
- RFC 3339: Date and Time on the Internet: Timestamps
- RFC 8785: JSON Canonicalization Scheme (JCS)
- ACGP-1: Core Concepts & Terminology, v1.0, 2026
- ACGP-2: Messages & Wire Protocol, v1.0, 2026
- ACGP-3: Blueprints, Traces & Evaluation, v1.0, 2026
- ACGP-4: Tripwires & Safety Semantics, v1.0, 2026
- ACGP-5: Audit & Privacy Controls, v1.0, 2026
- ACGP-6: Conformance, v1.0, 2026