Trust System¶

The trust system tracks agent behavior over time, enabling dynamic governance adjustments.

Overview¶

Agents earn trust through consistent good behavior and lose trust through poor decisions. Trust scores influence:

- ACL tier assignments (dynamic re-tiering)
- Intervention thresholds
- Autonomy levels
- Monitoring intensity

Trust Score¶

Trust scores range from 0.0 (untrusted) to 1.0 (fully trusted):

Score Range	Trust Level	Impact
0.9 - 1.0	Excellent	Lower ACL tier, more autonomy
0.7 - 0.9	Good	Standard governance
0.5 - 0.7	Fair	Increased monitoring
0.3 - 0.5	Poor	Higher ACL tier, less autonomy
0.0 - 0.3	Untrusted	Maximum oversight required
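
As a minimal sketch, the bands in the table can be expressed as a lookup function. The function name `trust_level` is hypothetical, and because adjacent ranges in the table share their boundary values, this sketch assumes each band includes its lower bound (so 0.9 counts as Excellent). The actual system may resolve boundaries differently.

```python
def trust_level(score: float) -> str:
    """Map a trust score in [0.0, 1.0] to the level named in the table.

    Assumption: each band includes its lower bound, since the table
    does not say which side of a shared boundary a score falls on.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"trust score must be in [0.0, 1.0], got {score}")
    if score >= 0.9:
        return "Excellent"
    if score >= 0.7:
        return "Good"
    if score >= 0.5:
        return "Fair"
    if score >= 0.3:
        return "Poor"
    return "Untrusted"


if __name__ == "__main__":
    for s in (0.95, 0.75, 0.55, 0.35, 0.1):
        print(s, trust_level(s))
```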

Trust Score Calculation¶

# Factors affecting trust score
- High-quality traces: +0.01 per trace
- Low-quality traces: -0.05 per trace
- Blocked actions: -0.10
- Successful actions: +0.02
- Tripwire violations: -0.15
- Human overrides: -0.08

Note: Trust scores represent overall agent reliability, while trust debt (discussed below) tracks flagged behaviors specifically.
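
The adjustments listed above could be applied along the following lines. This is a sketch under stated assumptions, not the system's implementation: the event names and the `apply_events` helper are hypothetical, every adjustment is assumed to apply once per event, and the score is assumed to be clamped to the documented 0.0 to 1.0 range after each event.

```python
# Per-event trust adjustments, taken from the list of factors.
# The dictionary keys are hypothetical event names chosen for this sketch.
TRUST_DELTAS = {
    "high_quality_trace": +0.01,
    "low_quality_trace": -0.05,
    "blocked_action": -0.10,
    "successful_action": +0.02,
    "tripwire_violation": -0.15,
    "human_override": -0.08,
}


def apply_events(score: float, events: list[str]) -> float:
    """Apply each event's adjustment in order and return the new score.

    Assumption: the score is clamped to [0.0, 1.0] after every event.
    """
    for event in events:
        score += TRUST_DELTAS[event]
        score = min(1.0, max(0.0, score))
    return score


if __name__ == "__main__":
    # One successful action followed by one tripwire violation.
    print(round(apply_events(0.7, ["successful_action", "tripwire_violation"]), 2))
```

Note the asymmetry in the values: a single tripwire violation cancels the gain from several successful actions, so trust is slower to earn than to lose.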

Trust Debt¶

Trust Debt is accumulated when actions are flagged for review, indicating potential issues:

Accumulation (based on flag severity):

- Low severity flag: +0.1 trust debt
- Medium severity flag: +0.3 trust debt
- High severity flag: +0.5 trust debt

Decay: Trust debt decays by 5% every 24 hours (multiplied by 0.95):

- Day 1: debt × 0.95
- Day 7: debt × 0.95^7 ≈ 0.6983 × original debt
- Day 14: debt × 0.95^14 ≈ 0.4877 × original debt

Example: A 0.5 debt becomes ~0.349 after 7 days (0.5 × 0.6983)
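
The accumulation and decay rules can be sketched as two small functions. The names `add_flag` and `decayed_debt` are hypothetical, and the sketch assumes decay is applied in whole 24-hour steps with no new flags arriving in between.

```python
# Debt added per flag, by severity (values from the accumulation list).
FLAG_DEBT = {"low": 0.1, "medium": 0.3, "high": 0.5}

# Multiplier applied once per 24 hours (5% decay).
DAILY_DECAY = 0.95


def add_flag(debt: float, severity: str) -> float:
    """Return the debt after one flag of the given severity."""
    return debt + FLAG_DEBT[severity]


def decayed_debt(debt: float, days: int) -> float:
    """Return the debt remaining after `days` full 24-hour periods.

    Assumption: decay is discrete (one multiplication per full day).
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return debt * DAILY_DECAY ** days


if __name__ == "__main__":
    debt = add_flag(0.0, "high")  # a single high-severity flag
    print(decayed_debt(debt, 7))   # debt remaining after one week
    print(decayed_debt(debt, 14))  # debt remaining after two weeks
```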

Impact: When trust debt exceeds a threshold (which varies by ACL tier), the agent may be automatically re-tiered to a higher ACL level for stricter oversight.
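
A threshold check of this kind might look like the sketch below. The threshold values are purely illustrative, since this page does not state them, and the `retier` helper is hypothetical. The sketch also assumes tiers are numbered so that a higher number means stricter oversight, that ACL-4 is the strictest tier, and that an agent moves up one tier at a time.

```python
# Illustrative per-tier debt thresholds. These numbers are NOT from the
# documentation; real thresholds would come from the deployment's config.
EXAMPLE_THRESHOLDS = {1: 0.5, 2: 0.8, 3: 1.2}

# Assumption: ACL-4 is the strictest tier, so there is nowhere higher to go.
MAX_TIER = 4


def retier(tier: int, debt: float, thresholds: dict[int, float],
           max_tier: int = MAX_TIER) -> int:
    """Return the tier an agent should hold given its current trust debt.

    The agent moves up one tier (stricter oversight) only when its debt
    strictly exceeds the threshold configured for its current tier.
    """
    limit = thresholds.get(tier)
    if limit is not None and debt > limit and tier < max_tier:
        return tier + 1
    return tier


if __name__ == "__main__":
    print(retier(2, 0.9, EXAMPLE_THRESHOLDS))  # debt above the ACL-2 limit
    print(retier(2, 0.8, EXAMPLE_THRESHOLDS))  # debt equal to the limit
```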

Dynamic Re-Tiering¶

Agents can automatically move between ACL tiers based on trust:

graph LR
    A["ACL-3 (Trust: 0.9)"] -->|Good Behavior| B["ACL-2 (Trust: 0.95)"]
    B -->|Excellent| C["ACL-1 (Trust: 1.0)"]
    C -->|Violation| B
    B -->|Poor Quality| A
    A -->|Critical Issue| D["ACL-4 (Trust: 0.4)"]

Trust Management¶

# Get current trust score
trust_score = steward.get_trust_score(agent_id)

# Manual trust adjustment (admin only)
steward.adjust_trust(
    agent_id=agent_id,
    adjustment=-0.2,
    reason="Security incident"
)

# Reset trust after investigation
steward.reset_trust(agent_id, initial_score=0.7)

Best Practices¶

Start Low

New agents should start with lower trust scores (0.5-0.6) and earn full trust over time.

Monitor Trust Trends

A declining trust score signals a problem with the agent; investigate and address the root cause.

Trust Recovery

Allow agents to recover trust after incidents, but with stricter oversight initially.
