Tripwires

Tripwires are hard limits that trigger immediate interventions when exceeded.


What are Tripwires?

Tripwires provide fail-safe boundaries that immediately block actions exceeding defined thresholds, regardless of reasoning quality.

steward = GovernanceSteward(
    acl_tier="ACL-2",
    tripwires={
        "max_refund": 500,           # Max refund amount
        "max_discount_percent": 20,   # Max discount percentage
        "daily_action_limit": 100,    # Max actions per day
        "budget_limit": 10000         # Total budget limit
    }
)
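
To illustrate what a hard limit means in practice, the sketch below checks an action's parameters against a tripwire dictionary like the one above. The `PARAMETER_FOR_TRIPWIRE` mapping, the `apply_discount` action name, and the `check_tripwires` helper are hypothetical and are not part of the GovernanceSteward API. The library's actual matching logic may differ.

```python
# Hypothetical mapping from (action, parameter) to the tripwire that caps it.
PARAMETER_FOR_TRIPWIRE = {
    ("issue_refund", "amount"): "max_refund",
    ("apply_discount", "percent"): "max_discount_percent",
}

def check_tripwires(action, parameters, tripwires):
    """Return violation messages; an empty list means no tripwire fired."""
    violations = []
    for name, value in parameters.items():
        tripwire = PARAMETER_FOR_TRIPWIRE.get((action, name))
        if tripwire is None or tripwire not in tripwires:
            continue  # no configured limit applies to this parameter
        limit = tripwires[tripwire]
        if value > limit:  # a value equal to the limit is still allowed
            violations.append(f"{name} exceeds {tripwire} of {limit}")
    return violations

tripwires = {"max_refund": 500, "max_discount_percent": 20}
print(check_tripwires("issue_refund", {"amount": 1000}, tripwires))
print(check_tripwires("issue_refund", {"amount": 200}, tripwires))
```

The check looks only at the numbers, not at the reasoning attached to the action, which is what makes it a fail-safe rather than an evaluation.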

Common Tripwire Types

Financial Limits

  • Maximum transaction amounts
  • Daily spending limits
  • Cumulative budget caps

Rate Limits

  • Actions per minute/hour/day
  • API call limits
  • Resource usage caps
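
A rate limit of the "actions per minute" kind can be enforced with a sliding window over recent timestamps. The following is a minimal sketch of that general technique, not the library's implementation. The `SlidingWindowLimit` class is a hypothetical name, and the caller passes the current time explicitly so the behavior is easy to test.

```python
from collections import deque

class SlidingWindowLimit:
    """Allow at most `max_actions` within any `window_seconds` span."""

    def __init__(self, max_actions, window_seconds):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of previously allowed actions

    def allow(self, now):
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False  # limit reached: the action would be blocked
        self._timestamps.append(now)
        return True

limit = SlidingWindowLimit(max_actions=3, window_seconds=60)
results = [limit.allow(t) for t in (0, 10, 20, 30, 61)]
print(results)
```

In this sketch blocked attempts are not recorded, so only allowed actions count against the window.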

Security Boundaries

  • Data access limits
  • Privilege escalation prevention
  • External communication restrictions

Quality Thresholds

  • Minimum confidence scores
  • Maximum error rates
  • Required approval levels

Tripwire Behavior

When a tripwire is triggered:

  1. The action is immediately BLOCKED
  2. The event is logged with details
  3. Optional notifications are sent
  4. The trust score may be impacted
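
The four steps above can be sketched as a single handler. Everything in this example is an assumption made for illustration: the `AgentState` class, the `handle_tripwire` function, and the fixed `trust_penalty` value are hypothetical, and the source only says that the trust score may be impacted, not by how much.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("tripwires")

@dataclass
class AgentState:
    trust_score: float = 1.0
    notifications: list = field(default_factory=list)

def handle_tripwire(state, tripwire, detail, notify=False, trust_penalty=0.05):
    """Apply the four steps in order and return the intervention."""
    intervention = "BLOCK"                                # 1. block the action
    logger.warning("Tripwire %s triggered: %s", tripwire, detail)  # 2. log the event
    if notify:                                            # 3. optional notification
        state.notifications.append(f"{tripwire}: {detail}")
    # 4. trust impact (assumed fixed penalty, floored at zero)
    state.trust_score = max(0.0, state.trust_score - trust_penalty)
    return intervention

state = AgentState()
outcome = handle_tripwire(state, "max_refund", "amount 1000 exceeds 500", notify=True)
print(outcome, state.notifications)
```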

# This will trigger the max_refund tripwire
trace = CognitiveTrace(
    reasoning="Customer dissatisfied, issuing full refund",
    action="issue_refund",
    parameters={"amount": 1000}  # Exceeds max_refund of 500
)

result = steward.evaluate(trace)
# result.intervention == "BLOCK"
# result.message == "Tripwire triggered: amount exceeds max_refund of 500"

Defining Custom Tripwires

custom_tripwires = {
    # Financial
    "max_transaction": 1000,
    "daily_budget": 5000,

    # Rate limits
    "max_actions_per_minute": 10,
    "max_api_calls_per_hour": 100,

    # Security
    "max_data_records_accessed": 1000,
    "require_approval_above": 500,

    # Quality
    "min_confidence_score": 0.8,
    "max_retry_attempts": 3
}

steward = GovernanceSteward(
    acl_tier="ACL-2",
    tripwires=custom_tripwires
)

Tripwire Severity Categories

Tripwires are classified into three severity levels:

Category | Examples                              | Default Action | ACL Override
---------|---------------------------------------|----------------|--------------------
Standard | Budget exceeded, rate limit hit       | Block          | Escalate if ACL ≤ 2
Critical | Secrets in output, production write   | Block + Alert  | Halt if ACL ≥ 3
Severe   | Data exfiltration, collusion detected | Halt           | Always Halt
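
One possible reading of the table above is that the ACL override replaces the default action when its condition holds. The sketch below encodes that reading. The `resolve_action` function and the action strings are hypothetical, and the interpretation of the override column is an assumption rather than documented behavior.

```python
def resolve_action(category, acl_tier):
    """Map a severity category and a numeric ACL tier to an action string."""
    if category == "severe":
        return "HALT"  # always halt, regardless of tier
    if category == "critical":
        # Default is block plus alert; tiers 3 and above halt instead.
        return "HALT" if acl_tier >= 3 else "BLOCK+ALERT"
    if category == "standard":
        # Default is block; tiers 2 and below escalate instead.
        return "ESCALATE" if acl_tier <= 2 else "BLOCK"
    raise ValueError(f"unknown category: {category}")

for category in ("standard", "critical", "severe"):
    print(category, [resolve_action(category, tier) for tier in (1, 2, 3)])
```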

Tripwire Precedence:

  • Tripwires ALWAYS take precedence over CTQ scores
  • When multiple tripwires fire, the highest severity applies
  • Tripwires cannot be overridden by high CTQ
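
The precedence rules can be sketched as a small decision function: any triggered tripwire decides the outcome, the most severe one wins, and the CTQ score is consulted only when nothing fired. The `decide` function, the `ctq_threshold` default, and the 0 to 1 score scale are assumptions made for this example.

```python
SEVERITY_RANK = {"standard": 1, "critical": 2, "severe": 3}

def decide(triggered_categories, ctq_score, ctq_threshold=0.8):
    """Return (source, detail) naming the rule that decides the outcome."""
    if triggered_categories:
        # A tripwire fired, so the CTQ score is ignored however high it is.
        worst = max(triggered_categories, key=SEVERITY_RANK.__getitem__)
        return ("tripwire", worst)
    return ("ctq", "pass" if ctq_score >= ctq_threshold else "review")

print(decide(["standard", "severe", "critical"], ctq_score=0.99))
print(decide([], ctq_score=0.9))
```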

Tripwires vs. Evaluations

Feature  | Tripwires        | CTQ Evaluation
---------|------------------|-------------------
Type     | Hard limits      | Quality assessment
Speed    | Instant          | Evaluative
Override | No               | Contextual
Purpose  | Prevent mistakes | Guide behavior

Best Practices

Set Conservative Limits

Start with strict tripwires and relax them based on experience.

Don't Rely Solely on Tripwires

Tripwires are a safety net, not a replacement for quality evaluation.

Monitor Tripwire Hits

Frequent tripwire triggers indicate that the agent needs retraining or that the policy needs adjustment.
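
As a minimal sketch of such monitoring, the example below counts hits per tripwire and reports the ones that fire repeatedly. The `hit_log` list and the `frequent_tripwires` helper are hypothetical. In practice the names would come from wherever tripwire events are logged.

```python
from collections import Counter

def frequent_tripwires(hit_log, threshold):
    """Return (name, count) pairs hit at least `threshold` times, most frequent first."""
    counts = Counter(hit_log)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]

hit_log = ["max_refund", "daily_action_limit", "max_refund", "max_refund", "budget_limit"]
print(frequent_tripwires(hit_log, threshold=2))
```

A tripwire that dominates this report is a candidate for either a policy review or a closer look at the agent behavior that keeps reaching the limit.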

