Trust System

The trust system tracks agent behavior over time, enabling dynamic governance adjustments.


Overview

Agents earn trust through consistent good behavior and lose trust through poor decisions. Trust scores influence:

- ACL tier assignments (dynamic re-tiering)
- Intervention thresholds
- Autonomy levels
- Monitoring intensity


Trust Score

Trust scores range from 0.0 (untrusted) to 1.0 (fully trusted):

| Score Range | Trust Level | Impact |
|-------------|-------------|--------|
| 0.9 - 1.0 | Excellent | Lower ACL tier, more autonomy |
| 0.7 - 0.9 | Good | Standard governance |
| 0.5 - 0.7 | Fair | Increased monitoring |
| 0.3 - 0.5 | Poor | Higher ACL tier, less autonomy |
| 0.0 - 0.3 | Untrusted | Maximum oversight required |
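
The bands above can be expressed as a small lookup. This is a minimal sketch rather than part of the trust system's API: the function name `trust_level` is hypothetical, and because adjacent ranges in the table share their boundary values, the sketch assumes each band includes its lower bound (so a score of exactly 0.9 counts as Excellent).

```python
def trust_level(score: float) -> str:
    """Map a trust score in [0.0, 1.0] to the level names used in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"trust score must be in [0.0, 1.0], got {score}")

    # Assumption: each band includes its lower bound. The table does not say
    # which band owns a shared boundary such as 0.9.
    bands = [
        (0.9, "Excellent"),
        (0.7, "Good"),
        (0.5, "Fair"),
        (0.3, "Poor"),
        (0.0, "Untrusted"),
    ]
    for lower_bound, level in bands:
        if score >= lower_bound:
            return level
    return "Untrusted"  # not reachable for validated input


print(trust_level(0.82))  # Good
```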

Trust Score Calculation

# Factors affecting trust score
- High-quality traces: +0.01 per trace
- Low-quality traces: -0.05 per trace
- Blocked actions: -0.10
- Successful actions: +0.02
- Tripwire violations: -0.15
- Human overrides: -0.08
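
One way to read the factors above is as additive deltas applied per event. The sketch below works under that assumption. The event names, the `apply_events` helper, and the clamping of the result to the 0.0 to 1.0 range are illustrative choices, not confirmed behavior of the trust system.

```python
# Per-event deltas taken from the list above; the key names are hypothetical.
TRUST_DELTAS = {
    "high_quality_trace": 0.01,
    "low_quality_trace": -0.05,
    "blocked_action": -0.10,
    "successful_action": 0.02,
    "tripwire_violation": -0.15,
    "human_override": -0.08,
}


def apply_events(score: float, events: list[str]) -> float:
    """Apply each event's delta in order, keeping the score within [0.0, 1.0]."""
    for event in events:
        score += TRUST_DELTAS[event]
        # Assumption: scores are clamped so they never leave the documented range.
        score = min(1.0, max(0.0, score))
    return score


# Two successful actions followed by a tripwire violation:
# 0.70 + 0.02 + 0.02 - 0.15
new_score = apply_events(0.70, ["successful_action", "successful_action", "tripwire_violation"])
print(round(new_score, 2))  # 0.59
```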

Note: Trust scores represent overall agent reliability, while trust debt (discussed below) tracks flagged behaviors specifically.


Trust Debt

Trust debt accumulates when actions are flagged for review, which indicates potential issues:

Accumulation (based on flag severity):

- Low severity flag: +0.1 trust debt
- Medium severity flag: +0.3 trust debt
- High severity flag: +0.5 trust debt

Decay: Trust debt decays by 5% every 24 hours (multiplied by 0.95):

- Day 1: debt × 0.95
- Day 7: debt × 0.95^7 ≈ 0.6983 × original debt
- Day 14: debt × 0.95^14 ≈ 0.4877 × original debt

Example: With no new flags, a debt of 0.5 decays to ~0.349 after 7 days (0.5 × 0.6983).
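
The accumulation and decay rules can be sketched in a few lines. The helper names `add_flag` and `decayed_debt` are hypothetical, and the sketch assumes that decay compounds once per full 24-hour period and that no new flags arrive during the decay window.

```python
# Debt added per flag severity, as listed above.
FLAG_DEBT = {"low": 0.1, "medium": 0.3, "high": 0.5}

# Debt is multiplied by this factor every 24 hours (a 5% decay).
DAILY_DECAY = 0.95


def add_flag(debt: float, severity: str) -> float:
    """Return the debt after one flag of the given severity."""
    return debt + FLAG_DEBT[severity]


def decayed_debt(debt: float, days: int) -> float:
    """Return the debt remaining after `days` full 24-hour periods."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return debt * DAILY_DECAY ** days


debt = add_flag(0.0, "high")            # one high severity flag: 0.5
print(round(decayed_debt(debt, 7), 3))  # 0.349
```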

Impact: When trust debt exceeds thresholds (varies by ACL tier), the agent may be automatically re-tiered to a higher ACL level for stricter oversight.
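
The threshold check might look like the sketch below. This document does not give the per-tier thresholds, so the values in `DEBT_THRESHOLDS` are placeholders chosen only for illustration. The `next_tier` helper and the assumption that ACL-4 is the strictest tier are likewise hypothetical.

```python
# Placeholder thresholds per ACL tier. The real values are not specified in
# this document and must come from your own configuration.
DEBT_THRESHOLDS = {1: 1.0, 2: 1.5, 3: 2.0}

# Assumption: ACL-4 is the highest (strictest) tier.
MAX_TIER = 4


def next_tier(current_tier: int, trust_debt: float) -> int:
    """Return the tier an agent should hold given its current trust debt."""
    threshold = DEBT_THRESHOLDS.get(current_tier)
    if threshold is not None and trust_debt > threshold:
        # Debt exceeds this tier's threshold: move one tier stricter.
        return min(current_tier + 1, MAX_TIER)
    return current_tier


print(next_tier(2, 1.8))  # 3, because 1.8 exceeds the placeholder threshold of 1.5
print(next_tier(2, 0.4))  # 2, debt is under the threshold
```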


Dynamic Re-Tiering

Agents can automatically move between ACL tiers based on trust:

graph LR
    A["ACL-3 (Trust: 0.9)"] -->|Good Behavior| B["ACL-2 (Trust: 0.95)"]
    B -->|Excellent| C["ACL-1 (Trust: 1.0)"]
    C -->|Violation| B
    B -->|Poor Quality| A
    A -->|Critical Issue| D["ACL-4 (Trust: 0.4)"]

Trust Management

# Get current trust score
trust_score = steward.get_trust_score(agent_id)

# Manual trust adjustment (admin only)
steward.adjust_trust(
    agent_id=agent_id,
    adjustment=-0.2,
    reason="Security incident"
)

# Reset trust after investigation
steward.reset_trust(agent_id, initial_score=0.7)

Best Practices

Start Low

New agents should start with lower trust scores (0.5-0.6) and earn full trust over time.

Monitor Trust Trends

A declining trust score indicates a problem with the agent. Investigate and address the root cause.
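
A simple way to detect a decline is to fit a straight line through recent score samples and check its slope. The sketch below assumes the scores were sampled at regular intervals (for example, collected periodically via `steward.get_trust_score`). The helper names and the `tolerance` default are hypothetical and would need tuning for a real deployment.

```python
def trust_slope(scores: list[float]) -> float:
    """Least-squares slope of evenly spaced trust scores (change per sample)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two samples to compute a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    numerator = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def is_declining(scores: list[float], tolerance: float = -0.01) -> bool:
    """Flag a trend whose slope falls below the (hypothetical) tolerance."""
    return trust_slope(scores) < tolerance


print(is_declining([0.80, 0.78, 0.75, 0.70]))  # True
print(is_declining([0.70, 0.70, 0.70]))        # False
```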

Trust Recovery

Allow agents to recover trust after incidents, but with stricter oversight initially.

