Tripwires

Tripwires are hard limits that trigger immediate interventions when exceeded.


What are Tripwires?

Tripwires provide fail-safe boundaries that immediately block actions exceeding defined thresholds, regardless of reasoning quality.

from acgp import GovernanceSteward, PostgresStateStorage

blueprint = {
    "id": "tripwires/example@1.0",
    "version": "1.0.0",
    "description": "Illustrative tripwire-only policy",
    "tripwires": [
        {
            "id": "max_refund",
            "when": {"hook": "tool_call", "tool": "issue_refund"},
            "condition": "args.amount > 500",
            "eval_tier": 0,
            "on_fail": {
                "decision": "block",
                "reason": "Refund amount exceeds 500",
            },
        }
    ],
}

steward = GovernanceSteward.production(
    blueprint_dict=blueprint,
    state_storage=PostgresStateStorage(connection_string="postgresql://runtime/acgp"),
)
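
To make the matching step concrete, here is a minimal sketch in plain Python, independent of the acgp package. It assumes a condition always has the shape "args.<name> <op> <number>"; the real condition DSL is not specified on this page and is likely richer. The helper name check_tripwire and the regular expression are hypothetical.

```python
import operator
import re
from typing import Any, Dict, Optional

# Hypothetical mini-evaluator: only handles "args.<name> <op> <number>".
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_PATTERN = re.compile(r"^args\.(\w+)\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)$")


def check_tripwire(
    tripwire: Dict[str, Any], tool: str, args: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return the tripwire's on_fail payload if it fires, else None."""
    # A tripwire without a "tool" key is treated as applying to every tool call.
    if tripwire["when"].get("tool") not in (None, tool):
        return None
    match = _PATTERN.match(tripwire["condition"].strip())
    if match is None:
        raise ValueError(f"unsupported condition: {tripwire['condition']!r}")
    name, op, limit = match.groups()
    if name not in args:
        # Assumption: a missing argument does not fire the tripwire.
        # A stricter policy could block here instead.
        return None
    if _OPS[op](args[name], float(limit)):
        return tripwire["on_fail"]
    return None


max_refund = {
    "id": "max_refund",
    "when": {"hook": "tool_call", "tool": "issue_refund"},
    "condition": "args.amount > 500",
    "on_fail": {"decision": "block", "reason": "Refund amount exceeds 500"},
}

print(check_tripwire(max_refund, "issue_refund", {"amount": 1000}))
print(check_tripwire(max_refund, "issue_refund", {"amount": 120}))
```

The check is a pure comparison with no scoring involved, which is what lets a tripwire fire regardless of how convincing the agent's reasoning is.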

Common Tripwire Types

Financial Limits

  • Maximum transaction amounts
  • Daily spending limits
  • Cumulative budget caps
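
A daily spending limit differs from a per-transaction limit in that it needs state across calls. The sketch below illustrates the idea with an in-memory tracker; the class name DailySpendLimit and its methods are hypothetical, and a real deployment would presumably keep this state in durable shared storage rather than in a Python dict.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Tuple


class DailySpendLimit:
    """Hypothetical in-memory tracker of spend per agent per day."""

    def __init__(self, daily_cap: float) -> None:
        self.daily_cap = daily_cap
        self._spent: Dict[Tuple[str, date], float] = defaultdict(float)

    def would_exceed(self, agent_id: str, amount: float, day: date) -> bool:
        # True if adding this amount would push the day's total over the cap.
        return self._spent[(agent_id, day)] + amount > self.daily_cap

    def record(self, agent_id: str, amount: float, day: date) -> None:
        # Only call this for actions that were actually allowed.
        self._spent[(agent_id, day)] += amount


limit = DailySpendLimit(daily_cap=1000.0)
today = date(2024, 1, 15)
for amount in (400.0, 500.0, 200.0):
    if limit.would_exceed("agent-1", amount, today):
        print(f"block: {amount} would exceed the daily cap")
    else:
        limit.record("agent-1", amount, today)
```

Checking before recording means a blocked action never counts against the budget.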

Rate Limits

  • Actions per minute/hour/day
  • API call limits
  • Resource usage caps
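
An actions-per-minute limit can be sketched as a sliding window over recent timestamps. This is one common technique, not necessarily how ACGP implements rate tripwires; the class SlidingWindowRate is hypothetical, and the current time is passed in explicitly so the example is deterministic.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowRate:
    """Hypothetical sliding-window limiter keyed by agent id."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, agent_id: str, now: float) -> bool:
        events = self._events[agent_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_actions:
            return False  # blocked actions are not added to the window
        events.append(now)
        return True


rate = SlidingWindowRate(max_actions=3, window_seconds=60.0)
decisions = [rate.allow("agent-1", now=t) for t in (0.0, 10.0, 20.0, 30.0, 61.0)]
print(decisions)
```

With a limit of three actions per 60 seconds, the fourth call at t=30 is rejected, and the call at t=61 is allowed again because the first timestamp has left the window.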

Security Boundaries

  • Data access limits
  • Privilege escalation prevention
  • External communication restrictions

Quality Thresholds

  • Minimum confidence scores
  • Maximum error rates
  • Required approval levels

Tripwire Behavior

When a tripwire is triggered:

  1. Action is immediately BLOCKED
  2. Event is logged with details
  3. Optional notifications sent
  4. Trust Debt may increase or trigger downstream review thresholds

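# Assumption: CognitiveTrace is exported by acgp alongside GovernanceSteward;
# adjust the import if your installed version places it elsewhere.
from acgp import CognitiveTrace

# This will trigger the max_refund tripwire
trace = CognitiveTrace(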
    reasoning="Customer dissatisfied, issuing full refund",
    action="issue_refund",
    parameters={"amount": 1000}  # Exceeds max_refund of 500
)

result = steward.evaluate(trace)
# result.intervention == "block"
# result.message == "Tripwire triggered: amount exceeds max_refund of 500"
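
The four steps listed under Tripwire Behavior can be sketched as a small handler. Everything here is illustrative: TripwireHandler, the debt_per_hit increment, and the notifier callables are hypothetical names, and the actual Trust Debt accounting is not described on this page.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List

logger = logging.getLogger("tripwires")


@dataclass
class TripwireHandler:
    """Hypothetical handler covering block, log, notify, and Trust Debt."""

    notifiers: List[Callable[[str], None]] = field(default_factory=list)
    trust_debt: float = 0.0
    debt_per_hit: float = 1.0  # assumed fixed increment per hit

    def handle(self, tripwire_id: str, reason: str) -> str:
        # Step 2: log the event with details.
        logger.warning("tripwire %s fired: %s", tripwire_id, reason)
        # Step 3: send optional notifications.
        for notify in self.notifiers:
            notify(f"{tripwire_id}: {reason}")
        # Step 4: increase Trust Debt.
        self.trust_debt += self.debt_per_hit
        # Step 1: the decision returned to the caller is always a block.
        return "block"


sent: List[str] = []
handler = TripwireHandler(notifiers=[sent.append])
decision = handler.handle("max_refund", "Refund amount exceeds 500")
print(decision, sent, handler.trust_debt)
```

Returning the decision unconditionally, after the side effects, keeps the block from depending on whether logging or notification succeeded in any particular way.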

Defining Custom Tripwires

from acgp import GovernanceSteward, PostgresStateStorage

custom_blueprint = {
    "id": "tripwires/custom@1.0",
    "version": "1.0.0",
    "description": "Custom tripwire examples",
    "tripwires": [
        {
            "id": "max_transaction",
            "when": {"hook": "tool_call", "tool": "transfer"},
            "condition": "args.amount > 1000",
            "eval_tier": 0,
            "on_fail": {"decision": "block", "reason": "Transaction limit exceeded"},
        },
        {
            "id": "max_actions_per_minute",
            "when": {"hook": "tool_call"},
            "condition": "exceeds_rate(agent_id, 10, '1m')",
            "eval_tier": 1,
            "requires_state": True,
            "on_fail": {"decision": "block", "reason": "Rate limit exceeded"},
        },
    ],
}

steward = GovernanceSteward.production(
    blueprint_dict=custom_blueprint,
    state_storage=PostgresStateStorage(connection_string="postgresql://runtime/acgp"),
)

Tripwire Severity Categories

Tripwires are classified into three severity levels:

Category | Examples                              | Default Action | Governance Tier Override
---------|---------------------------------------|----------------|-----------------------------------
Standard | Budget exceeded, rate limit hit       | Block          | Escalate if Governance Tier ≤ GT-2
Critical | Secrets in output, production write   | Block + Alert  | Halt if Governance Tier ≥ GT-3
Severe   | Data exfiltration, collusion detected | Halt           | Always Halt
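
One possible reading of the severity table as code is shown below. It assumes governance tiers can be represented as integers (2 for GT-2, 3 for GT-3) and uses made-up action strings; both are assumptions for illustration, not names from the ACGP specification.

```python
from enum import Enum


class Severity(Enum):
    STANDARD = 1
    CRITICAL = 2
    SEVERE = 3


def resolve_action(severity: Severity, governance_tier: int) -> str:
    """Map a severity and an integer governance tier to an action string."""
    if severity is Severity.SEVERE:
        return "halt"  # always halt, regardless of tier
    if severity is Severity.CRITICAL:
        # Default is block + alert; halt at GT-3 and above.
        return "halt" if governance_tier >= 3 else "block+alert"
    # Standard: default is block; escalate at GT-2 and below, as the table states.
    return "escalate" if governance_tier <= 2 else "block"


for sev, tier in [(Severity.STANDARD, 1), (Severity.CRITICAL, 3), (Severity.SEVERE, 0)]:
    print(sev.name, f"GT-{tier}", "->", resolve_action(sev, tier))
```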

Tripwire Precedence:

  • Tripwires ALWAYS take precedence over CTQ scores
  • When multiple tripwires fire, the highest severity applies
  • Tripwires cannot be overridden by high CTQ
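
The precedence rules can be sketched as a single resolution function. The hit dictionaries, the severity ranking, and the function name final_decision are hypothetical; the point is only that a CTQ-derived decision is consulted when no tripwire fired, and ignored otherwise.

```python
from typing import Dict, List

# Assumed ordering of the three severity categories.
_RANK = {"standard": 1, "critical": 2, "severe": 3}


def final_decision(tripwire_hits: List[Dict[str, str]], ctq_decision: str) -> str:
    """Tripwire hits win over the CTQ decision; the highest severity wins among hits."""
    if not tripwire_hits:
        return ctq_decision
    worst = max(tripwire_hits, key=lambda hit: _RANK[hit["severity"]])
    return worst["action"]


hits = [
    {"id": "budget", "severity": "standard", "action": "block"},
    {"id": "secrets", "severity": "critical", "action": "block+alert"},
]
print(final_decision(hits, ctq_decision="allow"))
print(final_decision([], ctq_decision="allow"))
```

Even though the CTQ path says "allow" in the first call, the critical hit determines the outcome.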

Tripwires vs. CTQ Checks — When to Use Which

Dimension    | Tripwires (ACGP-4)                  | CTQ Checks / Scorers (ACGP-3 §5)
-------------|-------------------------------------|-----------------------------------------------------
Schema       | condition DSL string + on_fail      | metric.check.type + metric.check.args
Purpose      | Hard safety boundaries              | Continuous quality assessment
Output       | Boolean pass/fail → intervention    | Float 0.0–1.0 risk score → threshold mapping
Speed        | Eval Tier 0–1 (< 300 ms)            | Eval Tier 0–2 (up to seconds for cognitive-evaluator)
Override     | Never: tripwire result is final     | Contextual: scores feed weighted aggregation
Statefulness | Optional (requires_state)           | Stateless (scorer receives trace only)
Typical use  | "Block if amount > 10 000"          | "How well-grounded is this response?"

Rule of thumb

Use a tripwire when the answer is binary ("this must never happen"). Use a CTQ check when you need a graded quality signal that feeds into a composite risk score.
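
The contrast can be illustrated with a small sketch: a boolean tripwire check that short-circuits, followed by a weighted aggregation of graded scores. The metric names, weights, thresholds, and intervention labels below are invented for illustration and are not taken from the ACGP specification.

```python
from typing import Dict


def composite_risk(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-metric risk scores, each in the range 0.0 to 1.0."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def map_to_intervention(risk: float) -> str:
    # Hypothetical thresholds; real mappings would come from the blueprint.
    if risk < 0.3:
        return "allow"
    if risk < 0.7:
        return "flag"
    return "escalate"


def evaluate(amount: float, scores: Dict[str, float], weights: Dict[str, float]) -> str:
    # Tripwire: binary and final, checked before any scoring.
    if amount > 10_000:
        return "block"
    # CTQ path: graded scores feed a composite risk score.
    return map_to_intervention(composite_risk(scores, weights))


scores = {"grounding": 0.2, "consistency": 0.4}
weights = {"grounding": 2.0, "consistency": 1.0}
print(evaluate(250.0, scores, weights))
print(evaluate(20_000.0, scores, weights))
```

Low risk scores cannot rescue the second call, because the tripwire returns before the scores are ever read.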


Best Practices

Set Conservative Limits

Start with strict tripwires and relax them based on experience.

Don't Rely Solely on Tripwires

Tripwires are a safety net, not a replacement for quality evaluation.

Monitor Tripwire Hits

Frequent tripwire triggers indicate that the agent needs retraining or that the policy needs adjustment.
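
A simple way to spot frequently firing tripwires is to count hits per tripwire id over a reporting period. The sketch below assumes hits are available as a plain list of tripwire ids, for example extracted from the event log; the function name noisy_tripwires and the threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def noisy_tripwires(hit_ids: Iterable[str], threshold: int) -> List[Tuple[str, int]]:
    """Return (tripwire_id, count) pairs with at least `threshold` hits, most frequent first."""
    counts = Counter(hit_ids)
    return [(tid, n) for tid, n in counts.most_common() if n >= threshold]


hit_log = ["max_refund", "max_refund", "max_actions_per_minute", "max_refund"]
print(noisy_tripwires(hit_log, threshold=3))
```

A tripwire that appears in this list repeatedly is a candidate for review: either the limit is too strict for legitimate work, or the agent keeps attempting something it should not.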

