Tripwires
Tripwires are hard limits that trigger immediate interventions when exceeded.
What are Tripwires?
Tripwires provide fail-safe boundaries that immediately block actions exceeding defined thresholds, regardless of reasoning quality. They are configured as a dictionary of named thresholds passed to `GovernanceSteward`:
```python
steward = GovernanceSteward(
    acl_tier="ACL-2",
    tripwires={
        "max_refund": 500,            # Max refund amount
        "max_discount_percent": 20,   # Max discount percentage
        "daily_action_limit": 100,    # Max actions per day
        "budget_limit": 10000,        # Total budget limit
    },
)
```
Common Tripwire Types
Financial Limits¶
- Maximum transaction amounts
- Daily spending limits
- Cumulative budget caps
Rate Limits¶
- Actions per minute/hour/day
- API call limits
- Resource usage caps
Security Boundaries¶
- Data access limits
- Privilege escalation prevention
- External communication restrictions
Quality Thresholds¶
- Minimum confidence scores
- Maximum error rates
- Required approval levels
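Some of these tripwires are single comparisons, such as a maximum transaction amount or a minimum confidence score. Others need state that persists across actions, such as actions per minute or a cumulative budget cap. The following is a minimal sketch of the two stateful kinds. It assumes a sliding time window for rate limits and a simple running total for budgets. `RateTripwire` and `CumulativeTripwire` are hypothetical names for illustration, not part of the library:

```python
import time
from collections import deque
from typing import Deque, Optional


class RateTripwire:
    """Sliding-window limit, e.g. max_actions_per_minute (hypothetical class)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._stamps: Deque[float] = deque()  # times of recently allowed actions

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget actions that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # tripwire triggered: block this action
        self._stamps.append(now)
        return True


class CumulativeTripwire:
    """Running-total cap, e.g. budget_limit (hypothetical class)."""

    def __init__(self, cap: float) -> None:
        self.cap = cap
        self.spent = 0.0

    def allow(self, amount: float) -> bool:
        if self.spent + amount > self.cap:
            return False  # would exceed the cap: block and record nothing
        self.spent += amount
        return True
```

A per-day limit such as `daily_action_limit` could be modeled as `RateTripwire(limit=100, window_s=86_400)`. That is a rolling 24-hour window rather than a reset at midnight; which behavior the library actually uses is not stated here.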
Tripwire Behavior
When a tripwire is triggered:
1. The action is immediately blocked (or halted, for higher-severity tripwires).
2. The event is logged with details.
3. Notifications are sent, if configured.
4. The agent's trust score may be lowered.
```python
# This will trigger the max_refund tripwire
trace = CognitiveTrace(
    reasoning="Customer dissatisfied, issuing full refund",
    action="issue_refund",
    parameters={"amount": 1000},  # Exceeds max_refund of 500
)
result = steward.evaluate(trace)
# result.intervention == "BLOCK"
# result.message == "Tripwire triggered: amount exceeds max_refund of 500"
```
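A single threshold check plus the follow-up steps listed under Tripwire Behavior could be sketched as follows. This is an illustration, not the library's implementation. `evaluate_tripwires`, the `GUARDED_PARAM` mapping from tripwire name to guarded parameter, and the trust penalty size are all hypothetical. The message format copies the one shown in the example above:

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Optional

logger = logging.getLogger("tripwires")

# Hypothetical: which action parameter each tripwire guards.
GUARDED_PARAM = {"max_refund": "amount", "max_discount_percent": "discount_percent"}


@dataclass
class Result:
    intervention: str  # "BLOCK" or "ALLOW"
    message: str
    trust_score: float


def evaluate_tripwires(
    tripwires: Dict[str, float],
    parameters: Dict[str, float],
    trust_score: float,
    notify: Optional[Callable[[str], None]] = None,
    trust_penalty: float = 0.1,  # hypothetical penalty size
) -> Result:
    for name, limit in tripwires.items():
        param = GUARDED_PARAM.get(name)
        if param is None or param not in parameters:
            continue  # this tripwire does not guard any parameter of the action
        if parameters[param] > limit:
            message = f"Tripwire triggered: {param} exceeds {name} of {limit}"
            logger.warning(message)  # log the event with details
            if notify is not None:
                notify(message)  # optional notification
            # Lower the trust score, never below zero.
            new_trust = max(0.0, trust_score - trust_penalty)
            return Result("BLOCK", message, new_trust)  # block the action
    return Result("ALLOW", "", trust_score)
```

In this sketch "exceeds" means strictly greater than, so a refund of exactly 500 passes the `max_refund` check.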
Defining Custom Tripwires
```python
custom_tripwires = {
    # Financial
    "max_transaction": 1000,
    "daily_budget": 5000,
    # Rate limits
    "max_actions_per_minute": 10,
    "max_api_calls_per_hour": 100,
    # Security
    "max_data_records_accessed": 1000,
    "require_approval_above": 500,
    # Quality
    "min_confidence_score": 0.8,
    "max_retry_attempts": 3,
}

steward = GovernanceSteward(
    acl_tier="ACL-2",
    tripwires=custom_tripwires,
)
```
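The keys above mix several kinds of check. The following minimal sketch shows how an evaluator might infer the check from a key's prefix. Everything about it is an assumption for illustration:

- `evaluate_custom` is hypothetical.
- Each `max_` or `min_` key is assumed to be compared against a metric of the same name with the prefix removed.
- Keys such as `daily_budget` (cumulative) and `require_approval_above` (which implies an approval step rather than a plain block) are reported as needing dedicated handling.

```python
from typing import Dict, List, Tuple


def evaluate_custom(
    tripwires: Dict[str, float], metrics: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (blocked keys, keys this sketch does not handle). Hypothetical."""
    blocked: List[str] = []
    unhandled: List[str] = []
    for key, limit in tripwires.items():
        if key.startswith("max_"):
            value = metrics.get(key[len("max_"):])
            if value is not None and value > limit:
                blocked.append(key)  # upper bound exceeded
        elif key.startswith("min_"):
            value = metrics.get(key[len("min_"):])
            if value is not None and value < limit:
                blocked.append(key)  # lower bound not met
        else:
            # Cumulative or approval-style keys need their own logic.
            unhandled.append(key)
    return blocked, unhandled
```

Metrics missing from `metrics` are simply skipped here. A stricter design might treat a missing metric as a failure.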
Tripwire Severity Categories
Tripwires are classified into three severity levels:
| Category | Examples | Default Action | ACL Override |
|---|---|---|---|
| Standard | Budget exceeded, rate limit hit | Block | Escalate if ACL ≤ 2 |
| Critical | Secrets in output, production write | Block + Alert | Halt if ACL ≥ 3 |
| Severe | Data exfiltration, collusion detected | Halt | Always Halt |
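The severity table can be read as a small decision function. The following is a minimal sketch under stated assumptions:

- ACL tiers are strings like `"ACL-2"`, as in the configuration examples.
- The ACL Override column is read literally.
- `Severity`, `parse_acl`, `resolve_action`, and labels such as `"ESCALATE"` are hypothetical names, not the library's constants.

```python
from enum import IntEnum


class Severity(IntEnum):
    STANDARD = 1
    CRITICAL = 2
    SEVERE = 3


def parse_acl(tier: str) -> int:
    """Turn a tier string such as "ACL-2" into the integer 2."""
    return int(tier.split("-", 1)[1])


def resolve_action(severity: Severity, acl_tier: str) -> str:
    level = parse_acl(acl_tier)
    if severity is Severity.SEVERE:
        return "HALT"  # Severe: always halt, whatever the ACL tier
    if severity is Severity.CRITICAL:
        # Critical: Block + Alert by default, Halt at ACL-3 and above
        return "HALT" if level >= 3 else "BLOCK+ALERT"
    # Standard: Block by default, Escalate at ACL-2 and below
    return "ESCALATE" if level <= 2 else "BLOCK"
```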
Tripwire Precedence:
- Tripwires ALWAYS take precedence over CTQ scores; a high CTQ score cannot override a triggered tripwire.
- When multiple tripwires are triggered, the highest severity applies.
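These two rules can be sketched as a final combining step. This minimal sketch assumes each triggered tripwire has already been assigned a severity. It uses the default actions from the severity table and omits ACL overrides for brevity. `final_decision` and its string labels are hypothetical:

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    STANDARD = 1
    CRITICAL = 2
    SEVERE = 3


DEFAULT_ACTION = {
    Severity.STANDARD: "BLOCK",
    Severity.CRITICAL: "BLOCK+ALERT",
    Severity.SEVERE: "HALT",
}


def final_decision(triggered: Iterable[Severity], ctq_decision: str) -> str:
    """Combine tripwire results with a CTQ-based decision (hypothetical)."""
    severities = list(triggered)
    if not severities:
        return ctq_decision  # no tripwire fired: the CTQ evaluation decides
    # Any tripwire outranks the CTQ decision; the worst one sets the action.
    return DEFAULT_ACTION[max(severities)]
```

Because the CTQ decision is consulted only when no tripwire fired, no CTQ score, however high, can turn a block into an allow.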
Tripwires vs. Evaluations
| Feature | Tripwires | CTQ Evaluation |
|---|---|---|
| Type | Hard limits | Quality assessment |
| Speed | Instant (threshold check) | Slower (requires quality scoring) |
| Override | No | Contextual |
| Purpose | Prevent mistakes | Guide behavior |
Best Practices
Set Conservative Limits
Start with strict tripwires and relax them based on experience.
Don't Rely Solely on Tripwires
Tripwires are a safety net, not a replacement for quality evaluation.
Monitor Tripwire Hits
Frequent tripwire triggers indicate that the agent needs retraining or that the policy needs adjusting.
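One way to monitor hits is to compare how often each tripwire fires against the total number of actions. The following is a minimal sketch. It assumes you can extract one tripwire name per logged trigger. `frequent_tripwires` is a hypothetical helper, and the 5% threshold is an arbitrary starting point, not a recommended value:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequent_tripwires(
    events: Iterable[str], total_actions: int, threshold: float = 0.05
) -> List[Tuple[str, float]]:
    """Return (tripwire, hit rate) pairs above `threshold`, most frequent first."""
    if total_actions <= 0:
        return []
    counts = Counter(events)  # hits per tripwire name
    flagged = [
        (name, hits / total_actions)
        for name, hits in counts.items()
        if hits / total_actions > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

For example, 12 `max_refund` hits and 3 `daily_action_limit` hits over 100 actions would flag only `max_refund`, at a rate of 0.12.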