Interventions¶
Interventions are proportionate responses to cognitive trace evaluation results.
Naming Convention
In code, intervention types are typically represented as uppercase enum constants (OK, NUDGE, BLOCK, etc.). In message formats and APIs, they use lowercase strings ("ok", "nudge", "block"). Both refer to the same intervention types.
Six Intervention Types¶
ACGP defines six intervention types (five primary levels: OK, Nudge, Escalate, Block, Halt, plus orthogonal Flag that can be combined with any primary level).
Five Primary Intervention Levels¶
OK¶
Action: Proceed without interruption
When: High-quality reasoning, appropriate action, no concerns
Agent Response: Execute action immediately
NUDGE¶
Action: Suggest reconsideration, but allow to proceed
When: Minor concerns, alternative approaches might be better
Agent Response: Log suggestion, execute action
ESCALATE¶
Action: Request human approval before proceeding
When: Significant concerns, human judgment needed
Agent Response: Wait for human decision
if result.intervention == "ESCALATE":
approval = request_human_approval(trace, result.message)
if approval.approved:
execute_action()
else:
abort_action()
BLOCK¶
Action: Prevent this specific action
When: Action is inappropriate or violates policy
Agent Response: Don't execute, inform agent
if result.intervention == "BLOCK":
notify_agent(f"Action blocked: {result.message}")
return error_response()
HALT¶
Action: Stop agent completely
When: Critical tripwire violation, agent behavior is unsafe
Agent Response: Shutdown agent immediately
if result.intervention == "HALT":
logger.critical(f"Agent halted: {result.message}")
shutdown_agent()
raise AgentHaltedException()
Orthogonal Intervention: Flag¶
FLAG¶
Flag is special - it's orthogonal to the five primary interventions, meaning it can be combined with any of them:
Action: Log for human review while applying primary intervention
When: Action should be audited or reviewed later
Combinations: - Flag + OK: Action proceeds normally but is logged for later review - Flag + Nudge: Suggestion is given AND logged for audit - Flag + Escalate: Escalation is logged as requiring review - Flag + Block: Block is logged for compliance audit - Flag + Halt: Halt is logged as critical incident
Agent Response: Create review ticket AND execute primary intervention
if result.flagged:
create_review_ticket(trace, result.flag_severity)
# Also accumulates trust debt based on severity
# Then handle the primary intervention
if result.intervention == "OK":
execute_action()
elif result.intervention == "NUDGE":
log_suggestion_and_execute()
# ... etc
Impact: Flagged actions contribute to Trust Debt accumulation, which can trigger re-tiering over time.
Intervention Flow¶
graph TD
E[Evaluate Trace] --> CTQ[CTQ Score 0-1]
CTQ --> RISK["Calculate Risk Score (Risk = 1.0 - CTQ)"]
RISK --> ACL{ACL Tier}
ACL --> THRESH[Apply ACL-Specific Thresholds]
THRESH --> D{Risk Score}
D -->|Low Risk| OK[ OK: Proceed]
D -->|Minor Issues| N[ NUDGE: Suggestion]
D -->|Needs Review| ES[ ESCALATE: Review]
D -->|Policy Violation| B[ BLOCK: Prevent]
D -->|Critical Issue| H[ HALT: Stop Agent]
FLAG[ FLAG: Log for Review] -.->|Can combine| OK
FLAG -.->|Can combine| N
FLAG -.->|Can combine| ES
Note: Interventions are determined by Risk Score (1.0 - CTQ), not CTQ directly. Higher CTQ (better quality) → lower risk → less restrictive intervention.
Important: Thresholds vary by ACL tier. Example for ACL-2: - OK: Risk ≤ 0.25 - Nudge: Risk 0.25-0.40 - Escalate: Risk 0.40-0.55 - Block: Risk > 0.55
See ACL Tiers for complete threshold tables.
Handling Interventions¶
Complete Handler Example¶
def handle_intervention(result, trace):
"""Handle all intervention types appropriately."""
# First, handle Flag if present (orthogonal to primary intervention)
if result.flagged:
create_review_ticket(trace, result.flag_severity)
log_trust_debt_accumulation(result.flag_severity)
# Then handle the primary intervention
handlers = {
"OK": lambda: execute_action(trace),
"NUDGE": lambda: log_and_execute(trace, result.message),
"ESCALATE": lambda: request_approval(trace, result.message),
"BLOCK": lambda: notify_block(trace, result.message),
"HALT": lambda: emergency_halt(result.message)
}
return handlers[result.intervention]()
Best Practices¶
Always Handle All Interventions
Don't ignore NUDGE or FLAG interventions - they provide valuable governance signals.
Never Override HALT
HALT means something is seriously wrong. Respect it and investigate.
Log Everything
Log all interventions for audit trails and improvement.