Tripwires
Tripwires are hard limits that trigger immediate interventions when exceeded.
What are Tripwires?
Tripwires provide fail-safe boundaries that immediately block actions exceeding defined thresholds, regardless of reasoning quality.
from acgp import GovernanceSteward, PostgresStateStorage

blueprint = {
    "id": "tripwires/example@1.0",
    "version": "1.0.0",
    "description": "Illustrative tripwire-only policy",
    "tripwires": [
        {
            "id": "max_refund",
            "when": {"hook": "tool_call", "tool": "issue_refund"},
            "condition": "args.amount > 500",
            "eval_tier": 0,
            "on_fail": {
                "decision": "block",
                "reason": "Refund amount exceeds 500",
            },
        }
    ],
}

steward = GovernanceSteward.production(
    blueprint_dict=blueprint,
    state_storage=PostgresStateStorage(connection_string="postgresql://runtime/acgp"),
)
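To make the roles of `condition` and `on_fail` concrete, here is a minimal sketch of how a tripwire such as `max_refund` could be checked against a tool call. It is not the ACGP implementation: `condition_holds`, `check_tripwire`, and the `"allow"` decision are hypothetical names, the sketch only understands conditions of the form `args.<name> <op> <number>`, and it assumes that a `when` clause without a `tool` key matches every tool.

```python
import operator
import re

# Hypothetical mini-evaluator for illustration; the real condition DSL may differ.
_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}
# Matches only "args.<name> <op> <number>", e.g. "args.amount > 500".
_CONDITION = re.compile(r"^args\.(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)$")


def condition_holds(condition: str, args: dict) -> bool:
    """Return True when the condition is satisfied, i.e. the tripwire fires."""
    match = _CONDITION.match(condition.strip())
    if match is None:
        raise ValueError(f"Unsupported condition: {condition!r}")
    name, op, limit = match.groups()
    return _OPS[op](args[name], float(limit))


def check_tripwire(tripwire: dict, tool: str, args: dict) -> dict:
    """Return the tripwire's on_fail payload if it fires, else an allow decision."""
    wanted_tool = tripwire["when"].get("tool")  # assumption: None means "any tool"
    applies = wanted_tool is None or wanted_tool == tool
    if applies and condition_holds(tripwire["condition"], args):
        return dict(tripwire["on_fail"])
    return {"decision": "allow"}  # hypothetical "nothing fired" result


max_refund = {
    "id": "max_refund",
    "when": {"hook": "tool_call", "tool": "issue_refund"},
    "condition": "args.amount > 500",
    "eval_tier": 0,
    "on_fail": {"decision": "block", "reason": "Refund amount exceeds 500"},
}

print(check_tripwire(max_refund, "issue_refund", {"amount": 1000}))
print(check_tripwire(max_refund, "issue_refund", {"amount": 120}))
```

The check never looks at the agent's reasoning: only the tool name and its arguments decide the outcome, which is what makes the boundary fail-safe.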
Common Tripwire Types
Financial Limits
- Maximum transaction amounts
- Daily spending limits
- Cumulative budget caps
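The three financial limits differ mainly in how much history they need: a per-transaction cap is stateless, while daily and cumulative limits require a running total. The sketch below illustrates that difference with a hypothetical in-memory `SpendTracker`; the class, its method names, and the limit labels are assumptions for illustration, not part of ACGP.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class SpendTracker:
    """In-memory sketch of three financial tripwires (all names hypothetical)."""

    def __init__(self, per_transaction: float, daily: float, budget: float):
        self.per_transaction = per_transaction
        self.daily = daily
        self.budget = budget
        self._spent_by_day = defaultdict(float)
        self._spent_total = 0.0

    def violated_limit(self, amount: float, day: date) -> Optional[str]:
        """Return the first limit this spend would exceed, or None if it fits."""
        if amount > self.per_transaction:
            return "max_transaction"
        if self._spent_by_day[day] + amount > self.daily:
            return "daily_limit"
        if self._spent_total + amount > self.budget:
            return "budget_cap"
        return None

    def record(self, amount: float, day: date) -> None:
        """Add an allowed spend to the daily and cumulative totals."""
        self._spent_by_day[day] += amount
        self._spent_total += amount


tracker = SpendTracker(per_transaction=500, daily=1000, budget=1500)
today = date(2024, 1, 1)
for amount in (400, 400, 300):
    limit = tracker.violated_limit(amount, today)
    if limit is None:
        tracker.record(amount, today)  # only allowed spends count toward totals
    print(amount, "->", limit or "allowed")
```

Only spends that pass every check are recorded here; whether blocked attempts should also count toward a limit is a policy decision the sketch does not settle.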
Rate Limits
- Actions per minute/hour/day
- API call limits
- Resource usage caps
Security Boundaries
- Data access limits
- Privilege escalation prevention
- External communication restrictions
Quality Thresholds
- Minimum confidence scores
- Maximum error rates
- Required approval levels
Tripwire Behavior
When a tripwire is triggered:
1. Action is immediately BLOCKED
2. Event is logged with details
3. Optional notifications sent
4. Trust Debt may increase or trigger downstream review thresholds
# Assumes CognitiveTrace is exported by the acgp package alongside GovernanceSteward
from acgp import CognitiveTrace

# This will trigger the max_refund tripwire
trace = CognitiveTrace(
    reasoning="Customer dissatisfied, issuing full refund",
    action="issue_refund",
    parameters={"amount": 1000},  # Exceeds max_refund of 500
)
result = steward.evaluate(trace)
# result.intervention == "block"
# result.message == "Tripwire triggered: amount exceeds max_refund of 500"
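The four steps that follow a triggered tripwire can be sketched as a small handler. This is an illustration under stated assumptions, not ACGP code: `TripwireHandler`, its `notifiers` callbacks, and the fixed `debt_per_hit` increment are hypothetical, and real Trust Debt accounting may work differently.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List

logger = logging.getLogger("tripwires")


@dataclass
class TripwireHandler:
    """Hypothetical handler mirroring the block / log / notify / debt steps."""

    notifiers: List[Callable[[str], None]] = field(default_factory=list)
    trust_debt: float = 0.0
    debt_per_hit: float = 1.0  # assumed fixed increment per triggered tripwire

    def handle(self, tripwire_id: str, reason: str) -> dict:
        # 1. The action is blocked, whatever the reasoning looked like.
        result = {"intervention": "block", "message": reason}
        # 2. The event is logged with its details.
        logger.warning("tripwire %s triggered: %s", tripwire_id, reason)
        # 3. Optional notifications: zero or more callbacks.
        for notify in self.notifiers:
            notify(f"{tripwire_id}: {reason}")
        # 4. Trust Debt increases; a reviewer could watch this value.
        self.trust_debt += self.debt_per_hit
        return result


alerts: List[str] = []
handler = TripwireHandler(notifiers=[alerts.append])
outcome = handler.handle("max_refund", "Refund amount exceeds 500")
print(outcome["intervention"], alerts, handler.trust_debt)
```

Keeping notification as a list of callbacks makes the third step optional in practice: an empty list simply skips it.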
Defining Custom Tripwires
from acgp import GovernanceSteward, PostgresStateStorage

custom_blueprint = {
    "id": "tripwires/custom@1.0",
    "version": "1.0.0",
    "description": "Custom tripwire examples",
    "tripwires": [
        {
            # Stateless limit on a single tool's arguments
            "id": "max_transaction",
            "when": {"hook": "tool_call", "tool": "transfer"},
            "condition": "args.amount > 1000",
            "eval_tier": 0,
            "on_fail": {"decision": "block", "reason": "Transaction limit exceeded"},
        },
        {
            # Stateful rate limit; no "tool" key is given in "when"
            "id": "max_actions_per_minute",
            "when": {"hook": "tool_call"},
            "condition": "exceeds_rate(agent_id, 10, '1m')",
            "eval_tier": 1,
            "requires_state": True,  # Python literal; lowercase true raises NameError
            "on_fail": {"decision": "block", "reason": "Rate limit exceeded"},
        },
    ],
}

steward = GovernanceSteward.production(
    blueprint_dict=custom_blueprint,
    state_storage=PostgresStateStorage(connection_string="postgresql://runtime/acgp"),
)
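The `exceeds_rate(agent_id, 10, '1m')` condition implies per-agent counting over a time window, which is presumably why the rule is marked `requires_state`. The following sketch shows one way such a check could behave, using a sliding window held in memory. The semantics are assumed, not taken from ACGP: `RateTracker`, `parse_window`, and the `now` parameter are hypothetical, and each call counts itself toward the window, including calls that end up blocked.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Assumed window suffixes: seconds, minutes, hours, days (e.g. "1m", "24h").
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_window(window: str) -> int:
    """Convert a window such as '1m' into a number of seconds."""
    return int(window[:-1]) * _UNIT_SECONDS[window[-1]]


class RateTracker:
    """Sliding-window counter per agent (in memory, illustration only)."""

    def __init__(self) -> None:
        self._events = defaultdict(deque)

    def exceeds_rate(self, agent_id: str, limit: int, window: str,
                     now: Optional[float] = None) -> bool:
        """Record one action and report whether the window now holds more than `limit`."""
        now = time.monotonic() if now is None else now
        horizon = now - parse_window(window)
        events = self._events[agent_id]
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= horizon:
            events.popleft()
        events.append(now)
        return len(events) > limit


tracker = RateTracker()
# Eleven actions one second apart: only the eleventh exceeds 10 per minute.
flags = [tracker.exceeds_rate("agent-1", 10, "1m", now=float(t)) for t in range(11)]
print(flags)
```

A production deployment would keep these timestamps in shared storage rather than process memory, so that every worker evaluating the same agent sees the same count.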
Tripwire Severity Categories
Tripwires are classified into three severity levels:
| Category | Examples | Default Action | Governance Tier Override |
|---|---|---|---|
| Standard | Budget exceeded, rate limit hit | Block | Escalate if Governance Tier ≤ GT-2 |
| Critical | Secrets in output, production write | Block + Alert | Halt if Governance Tier ≥ GT-3 |
| Severe | Data exfiltration, collusion detected | Halt | Always Halt |
Tripwire Precedence:
- Tripwires ALWAYS take precedence over CTQ scores
- When multiple tripwires fire, the highest severity applies
- Tripwires cannot be overridden by high CTQ
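The severity table and the precedence rules can be read together as a small decision function. The sketch below is one interpretation, with assumptions stated: governance tiers are modelled as plain integers (GT-2 becomes `2`), the action strings are hypothetical labels, and the tier overrides follow the table rows exactly as written.

```python
from enum import IntEnum
from typing import Sequence


class Severity(IntEnum):
    STANDARD = 1
    CRITICAL = 2
    SEVERE = 3


def action_for(severity: Severity, governance_tier: int) -> str:
    """Map one severity to an action, applying the tier overrides from the table."""
    if severity is Severity.SEVERE:
        return "halt"  # always halt
    if severity is Severity.CRITICAL:
        return "halt" if governance_tier >= 3 else "block+alert"
    # Standard: the table lists "Escalate if Governance Tier <= GT-2".
    return "escalate" if governance_tier <= 2 else "block"


def decide(triggered: Sequence[Severity], governance_tier: int, ctq_action: str) -> str:
    """Tripwires win over the CTQ-derived action; the highest severity applies."""
    if triggered:
        return action_for(max(triggered), governance_tier)
    return ctq_action  # no tripwire fired, so the CTQ result stands


# A favourable CTQ outcome cannot override a triggered tripwire.
print(decide([Severity.STANDARD, Severity.SEVERE], governance_tier=1, ctq_action="allow"))
print(decide([], governance_tier=1, ctq_action="allow"))
```

Because `Severity` is an `IntEnum`, `max()` picks the highest severity directly, which keeps the "multiple tripwires" rule to a single expression.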
Tripwires vs. CTQ Checks: When to Use Which
| Dimension | Tripwires (ACGP-4) | CTQ Checks / Scorers (ACGP-3 §5) |
|---|---|---|
| Schema | condition DSL string + on_fail | metric.check.type + metric.check.args |
| Purpose | Hard safety boundaries | Continuous quality assessment |
| Output | Boolean pass/fail → intervention | Float 0.0–1.0 risk score → threshold mapping |
| Speed | Eval Tier 0–1 (< 300 ms) | Eval Tier 0–2 (up to seconds for cognitive-evaluator) |
| Override | Never — tripwire result is final | Contextual — scores feed weighted aggregation |
| Statefulness | Optional (requires_state) | Stateless (scorer receives trace only) |
| Typical use | "Block if amount > 10 000" | "How well-grounded is this response?" |
Rule of thumb
Use a tripwire when the answer is binary ("this must never happen"). Use a CTQ check when you need a graded quality signal that feeds into a composite risk score.
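The contrast between a binary tripwire and a graded CTQ signal can be illustrated side by side. In this sketch the tripwire is a plain boolean test, while CTQ scores are combined by a weighted average and mapped to an intervention through thresholds. The weights, the `0.5` and `0.8` thresholds, and the function names are all hypothetical; ACGP's actual aggregation is not specified in this section.

```python
from typing import Dict


def composite_risk(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-check risk scores, each in the range 0.0 to 1.0."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def map_to_intervention(risk: float) -> str:
    """Hypothetical threshold mapping from composite risk to an intervention."""
    if risk >= 0.8:
        return "block"
    if risk >= 0.5:
        return "escalate"
    return "allow"


def evaluate(amount: float, scores: Dict[str, float], weights: Dict[str, float]) -> str:
    # Tripwire: binary and final ("block if amount > 10 000").
    if amount > 10_000:
        return "block"
    # CTQ: graded scores feed a composite risk, then a threshold mapping.
    return map_to_intervention(composite_risk(scores, weights))


weights = {"grounding": 1.0, "safety": 3.0}
print(evaluate(20_000, {"grounding": 0.0, "safety": 0.0}, weights))
print(evaluate(100, {"grounding": 0.25, "safety": 0.75}, weights))
```

In the first call every score is perfect and the action is still blocked, because the amount alone trips the limit; in the second, the outcome depends entirely on how the graded scores combine.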
Best Practices
Set Conservative Limits
Start with strict tripwires and relax them based on experience.
Don't Rely Solely on Tripwires
Tripwires are a safety net, not a replacement for quality evaluation.
Monitor Tripwire Hits
Frequent tripwire triggers indicate that the agent needs retraining or that the policy needs adjustment.
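Monitoring can start with something as simple as counting hits per tripwire. The sketch below assumes a hypothetical log format, a list of dictionaries with a `tripwire_id` key, and flags any tripwire that reaches a chosen number of hits; neither the log shape nor `frequent_tripwires` comes from ACGP.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def frequent_tripwires(hit_log: Iterable[Mapping[str, str]], min_hits: int) -> Dict[str, int]:
    """Return tripwire ids with at least `min_hits` entries, most frequent first."""
    counts = Counter(entry["tripwire_id"] for entry in hit_log)
    return {tid: n for tid, n in counts.most_common() if n >= min_hits}


# Hypothetical log entries; a real log would also carry timestamps and agent ids.
hit_log = [
    {"tripwire_id": "max_refund"},
    {"tripwire_id": "max_actions_per_minute"},
    {"tripwire_id": "max_refund"},
    {"tripwire_id": "max_refund"},
]
print(frequent_tripwires(hit_log, min_hits=2))
```

A tripwire that shows up here repeatedly is a candidate for review: either the limit is too strict for legitimate work, or the agent keeps attempting something it should not.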