Interventions

Interventions are proportionate responses to cognitive trace evaluation results.

Naming Convention

In code, intervention types are typically represented as uppercase enum constants (OK, NUDGE, BLOCK, etc.). In message formats and APIs, they use lowercase strings ("ok", "nudge", "block"). Both refer to the same intervention types.


Six Intervention Types

ACGP defines six intervention types (five primary levels: OK, Nudge, Escalate, Block, Halt, plus orthogonal Flag that can be combined with any primary level).

Five Primary Intervention Levels

OK

Action: Proceed without interruption

When: High-quality reasoning, appropriate action, no concerns

Agent Response: Execute action immediately

if result.intervention == "OK":
    execute_action()

NUDGE

Action: Suggest reconsideration, but allow to proceed

When: Minor concerns, alternative approaches might be better

Agent Response: Log suggestion, execute action

if result.intervention == "NUDGE":
    logger.info(f"ACGP suggests: {result.message}")
    execute_action()

ESCALATE

Action: Request human approval before proceeding

When: Significant concerns, human judgment needed

Agent Response: Wait for human decision

if result.intervention == "ESCALATE":
    approval = request_human_approval(trace, result.message)
    if approval.approved:
        execute_action()
    else:
        abort_action()

BLOCK

Action: Prevent this specific action

When: Action is inappropriate or violates policy

Agent Response: Don't execute, inform agent

if result.intervention == "BLOCK":
    notify_agent(f"Action blocked: {result.message}")
    return error_response()

HALT

Action: Stop agent completely

When: Critical tripwire violation, agent behavior is unsafe

Agent Response: Shutdown agent immediately

if result.intervention == "HALT":
    logger.critical(f"Agent halted: {result.message}")
    shutdown_agent()
    raise AgentHaltedException()

Orthogonal Intervention: Flag

FLAG

Flag is special - it's orthogonal to the five primary interventions, meaning it can be combined with any of them:

Action: Log for human review while applying primary intervention

When: Action should be audited or reviewed later

Combinations: - Flag + OK: Action proceeds normally but is logged for later review - Flag + Nudge: Suggestion is given AND logged for audit - Flag + Escalate: Escalation is logged as requiring review - Flag + Block: Block is logged for compliance audit - Flag + Halt: Halt is logged as critical incident

Agent Response: Create review ticket AND execute primary intervention

if result.flagged:
    create_review_ticket(trace, result.flag_severity)
    # Also accumulates trust debt based on severity

# Then handle the primary intervention
if result.intervention == "OK":
    execute_action()
elif result.intervention == "NUDGE":
    log_suggestion_and_execute()
# ... etc

Impact: Flagged actions contribute to Trust Debt accumulation, which can trigger re-tiering over time.


Intervention Flow

graph TD
    E[Evaluate Trace] --> CTQ[CTQ Score 0-1]
    CTQ --> RISK["Calculate Risk Score (Risk = 1.0 - CTQ)"]
    RISK --> ACL{ACL Tier}
    ACL --> THRESH[Apply ACL-Specific Thresholds]
    THRESH --> D{Risk Score}
    D -->|Low Risk| OK[ OK: Proceed]
    D -->|Minor Issues| N[ NUDGE: Suggestion]
    D -->|Needs Review| ES[ ESCALATE: Review]
    D -->|Policy Violation| B[ BLOCK: Prevent]
    D -->|Critical Issue| H[ HALT: Stop Agent]

    FLAG[ FLAG: Log for Review] -.->|Can combine| OK
    FLAG -.->|Can combine| N
    FLAG -.->|Can combine| ES

Note: Interventions are determined by Risk Score (1.0 - CTQ), not CTQ directly. Higher CTQ (better quality) → lower risk → less restrictive intervention.

Important: Thresholds vary by ACL tier. Example for ACL-2: - OK: Risk ≤ 0.25 - Nudge: Risk 0.25-0.40 - Escalate: Risk 0.40-0.55 - Block: Risk > 0.55

See ACL Tiers for complete threshold tables.


Handling Interventions

Complete Handler Example

def handle_intervention(result, trace):
    """Handle all intervention types appropriately."""

    # First, handle Flag if present (orthogonal to primary intervention)
    if result.flagged:
        create_review_ticket(trace, result.flag_severity)
        log_trust_debt_accumulation(result.flag_severity)

    # Then handle the primary intervention
    handlers = {
        "OK": lambda: execute_action(trace),
        "NUDGE": lambda: log_and_execute(trace, result.message),
        "ESCALATE": lambda: request_approval(trace, result.message),
        "BLOCK": lambda: notify_block(trace, result.message),
        "HALT": lambda: emergency_halt(result.message)
    }

    return handlers[result.intervention]()

Best Practices

Always Handle All Interventions

Don't ignore NUDGE or FLAG interventions - they provide valuable governance signals.

Never Override HALT

HALT means something is seriously wrong. Respect it and investigate.

Log Everything

Log all interventions for audit trails and improvement.


Tripwires Trust System