Interventions¶

Interventions are proportionate responses to cognitive trace evaluation results.

Naming Convention

In code, intervention types are typically represented as uppercase enum constants (OK, NUDGE, BLOCK, etc.). In message formats and APIs, they use lowercase strings ("ok", "nudge", "block"). Both refer to the same intervention types.

Six Intervention Types¶

ACGP defines six intervention types (five primary levels: OK, Nudge, Escalate, Block, Halt, plus orthogonal Flag that can be combined with any primary level).

Five Primary Intervention Levels¶

OK¶

Action: Proceed without interruption

When: High-quality reasoning, appropriate action, no concerns

Agent Response: Execute action immediately

if result.intervention == "OK":
    execute_action()

NUDGE¶

Action: Suggest reconsideration, but allow to proceed

When: Minor concerns, alternative approaches might be better

Agent Response: Log suggestion, execute action

if result.intervention == "NUDGE":
    logger.info(f"ACGP suggests: {result.message}")
    execute_action()

ESCALATE¶

Action: Request human approval before proceeding

When: Significant concerns, human judgment needed

Agent Response: Wait for human decision

if result.intervention == "ESCALATE":
    approval = request_human_approval(trace, result.message)
    if approval.approved:
        execute_action()
    else:
        abort_action()

BLOCK¶

Action: Prevent this specific action

When: Action is inappropriate or violates policy

Agent Response: Don't execute, inform agent

if result.intervention == "BLOCK":
    notify_agent(f"Action blocked: {result.message}")
    return error_response()

HALT¶

Action: Stop agent completely

When: Critical tripwire violation, agent behavior is unsafe

Agent Response: Shutdown agent immediately

if result.intervention == "HALT":
    logger.critical(f"Agent halted: {result.message}")
    shutdown_agent()
    raise AgentHaltedException()

Orthogonal Intervention: Flag¶

FLAG¶

Flag is special - it's orthogonal to the five primary interventions, meaning it can be combined with any of them:

Action: Log for human review while applying primary intervention

When: Action should be audited or reviewed later

Combinations: - Flag + OK: Action proceeds normally but is logged for later review - Flag + Nudge: Suggestion is given AND logged for audit - Flag + Escalate: Escalation is logged as requiring review - Flag + Block: Block is logged for compliance audit - Flag + Halt: Halt is logged as critical incident

Agent Response: Create review ticket AND execute primary intervention

if result.flagged:
    create_review_ticket(trace, result.flag_severity)
    # Also accumulates trust debt based on severity

# Then handle the primary intervention
if result.intervention == "OK":
    execute_action()
elif result.intervention == "NUDGE":
    log_suggestion_and_execute()
# ... etc

Impact: Flagged actions contribute to Trust Debt accumulation, which can trigger re-tiering over time.

Intervention Flow¶

graph TD
    E[Evaluate Trace] --> CTQ[CTQ Score 0-1]
    CTQ --> RISK["Calculate Risk Score (Risk = 1.0 - CTQ)"]
    RISK --> ACL{ACL Tier}
    ACL --> THRESH[Apply ACL-Specific Thresholds]
    THRESH --> D{Risk Score}
    D -->|Low Risk| OK[ OK: Proceed]
    D -->|Minor Issues| N[ NUDGE: Suggestion]
    D -->|Needs Review| ES[ ESCALATE: Review]
    D -->|Policy Violation| B[ BLOCK: Prevent]
    D -->|Critical Issue| H[ HALT: Stop Agent]

    FLAG[ FLAG: Log for Review] -.->|Can combine| OK
    FLAG -.->|Can combine| N
    FLAG -.->|Can combine| ES

Note: Interventions are determined by Risk Score (1.0 - CTQ), not CTQ directly. Higher CTQ (better quality) → lower risk → less restrictive intervention.

Important: Thresholds vary by ACL tier. Example for ACL-2: - OK: Risk ≤ 0.25 - Nudge: Risk 0.25-0.40 - Escalate: Risk 0.40-0.55 - Block: Risk > 0.55

See ACL Tiers for complete threshold tables.

Handling Interventions¶

Complete Handler Example¶

def handle_intervention(result, trace):
    """Handle all intervention types appropriately."""

    # First, handle Flag if present (orthogonal to primary intervention)
    if result.flagged:
        create_review_ticket(trace, result.flag_severity)
        log_trust_debt_accumulation(result.flag_severity)

    # Then handle the primary intervention
    handlers = {
        "OK": lambda: execute_action(trace),
        "NUDGE": lambda: log_and_execute(trace, result.message),
        "ESCALATE": lambda: request_approval(trace, result.message),
        "BLOCK": lambda: notify_block(trace, result.message),
        "HALT": lambda: emergency_halt(result.message)
    }

    return handlers[result.intervention]()

Best Practices¶

Always Handle All Interventions

Don't ignore NUDGE or FLAG interventions - they provide valuable governance signals.

Never Override HALT

HALT means something is seriously wrong. Respect it and investigate.

Log Everything

Log all interventions for audit trails and improvement.

Tripwires Trust System