Trust System
The trust system tracks agent behavior over time, enabling dynamic governance adjustments.
Overview
Agents earn trust through consistent good behavior and lose trust through poor decisions. Trust scores influence:

- ACL tier assignments (dynamic re-tiering)
- Intervention thresholds
- Autonomy levels
- Monitoring intensity
Trust Score
Trust scores range from 0.0 (untrusted) to 1.0 (fully trusted):
| Score Range | Trust Level | Impact |
|---|---|---|
| 0.9 - 1.0 | Excellent | Lower ACL tier, more autonomy |
| 0.7 - 0.9 | Good | Standard governance |
| 0.5 - 0.7 | Fair | Increased monitoring |
| 0.3 - 0.5 | Poor | Higher ACL tier, less autonomy |
| 0.0 - 0.3 | Untrusted | Maximum oversight required |
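The bands in the table can be expressed as a small lookup. This is only a sketch: `trust_level` is a hypothetical helper, not part of the documented API, and because adjacent ranges in the table share their boundary values, it assumes each band includes its lower bound (so 0.9 counts as Excellent).

```python
def trust_level(score: float) -> str:
    """Map a trust score in [0.0, 1.0] to a level name from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"trust score must be within [0.0, 1.0], got {score}")
    # Assumption: each band includes its lower bound.
    if score >= 0.9:
        return "Excellent"
    if score >= 0.7:
        return "Good"
    if score >= 0.5:
        return "Fair"
    if score >= 0.3:
        return "Poor"
    return "Untrusted"


print(trust_level(0.85))
```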
Trust Score Calculation
# Factors affecting trust score
- High-quality traces: +0.01 per trace
- Low-quality traces: -0.05 per trace
- Blocked actions: -0.10
- Successful actions: +0.02
- Tripwire violations: -0.15
- Human overrides: -0.08
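As an illustration of how these factors might combine, the sketch below applies the listed adjustments one event at a time. The event names, the `apply_events` helper, the clamping to the 0.0 to 1.0 range, and the rounding are all assumptions made for this example. The source specifies only the per-event deltas and the score range.

```python
# Per-event adjustments copied from the factor list above.
# The dictionary keys are hypothetical event names.
TRUST_DELTAS = {
    "high_quality_trace": +0.01,
    "low_quality_trace": -0.05,
    "blocked_action": -0.10,
    "successful_action": +0.02,
    "tripwire_violation": -0.15,
    "human_override": -0.08,
}


def apply_events(score: float, events: list[str]) -> float:
    """Apply each event's adjustment in order and return the new score."""
    for event in events:
        score += TRUST_DELTAS[event]
        # Assumption: scores saturate at the ends of the [0.0, 1.0] range.
        score = min(1.0, max(0.0, score))
    # Rounding only hides floating-point noise in this example.
    return round(score, 4)


new_score = apply_events(
    0.7, ["successful_action", "high_quality_trace", "tripwire_violation"]
)
print(new_score)
```

With these deltas, a single tripwire violation offsets the gain from fifteen high-quality traces, so penalties dominate unless good behavior is sustained.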
Note: Trust scores represent overall agent reliability, while trust debt (discussed below) tracks flagged behaviors specifically.
Trust Debt
Trust debt accumulates when actions are flagged for review, which indicates potential issues:
Accumulation (based on flag severity):
- Low severity flag: +0.1 trust debt
- Medium severity flag: +0.3 trust debt
- High severity flag: +0.5 trust debt
Decay: Trust debt decays by 5% every 24 hours (multiplied by 0.95):
- Day 1: debt × 0.95
- Day 7: debt × 0.95^7 ≈ 0.6983 × original debt
- Day 14: debt × 0.95^14 ≈ 0.4877 × original debt
Example: A debt of 0.5 decays to about 0.349 after 7 days (0.5 × 0.6983 ≈ 0.3492).
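The accumulation and decay rules can be checked with a few lines of arithmetic. This sketch assumes decay is applied in whole 24-hour steps. The function and constant names are hypothetical and do not come from the system's API.

```python
# Debt added per flag, by severity (values from the accumulation list).
DEBT_PER_FLAG = {"low": 0.1, "medium": 0.3, "high": 0.5}
DAILY_DECAY = 0.95  # debt is multiplied by 0.95 every 24 hours


def add_flag(debt: float, severity: str) -> float:
    """Return the debt after adding one flag of the given severity."""
    return debt + DEBT_PER_FLAG[severity]


def decayed_debt(debt: float, days: int) -> float:
    """Return the debt remaining after `days` full 24-hour periods."""
    return debt * DAILY_DECAY ** days


debt = add_flag(0.0, "high")        # one high-severity flag adds 0.5
after_week = decayed_debt(debt, 7)  # 0.5 * 0.95**7
print(round(after_week, 3))
```

Because the decay is geometric, debt never reaches exactly zero. It halves roughly every 13.5 days, since 0.95^13.5 ≈ 0.5.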
Impact: When trust debt exceeds a threshold (thresholds vary by ACL tier), the agent may be automatically re-tiered to a higher ACL level for stricter oversight.
Dynamic Re-Tiering
Agents can automatically move between ACL tiers based on trust:
graph LR
A["ACL-3 (Trust: 0.9)"] -->|Good Behavior| B["ACL-2 (Trust: 0.95)"]
B -->|Excellent| C["ACL-1 (Trust: 1.0)"]
C -->|Violation| B
B -->|Poor Quality| A
A -->|Critical Issue| D["ACL-4 (Trust: 0.4)"]
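One way to read the graph is as a transition table keyed by the current tier and the triggering condition. The sketch below encodes only the five edges drawn above. The trigger strings and the `next_tier` helper are hypothetical, and the choice to keep the current tier when no edge matches is an assumption.

```python
# Edges of the re-tiering graph: (current tier, trigger) -> next tier.
TRANSITIONS = {
    ("ACL-3", "good_behavior"): "ACL-2",
    ("ACL-2", "excellent"): "ACL-1",
    ("ACL-1", "violation"): "ACL-2",
    ("ACL-2", "poor_quality"): "ACL-3",
    ("ACL-3", "critical_issue"): "ACL-4",
}


def next_tier(tier: str, trigger: str) -> str:
    """Return the tier after `trigger`, or the same tier if no edge matches."""
    return TRANSITIONS.get((tier, trigger), tier)


print(next_tier("ACL-3", "good_behavior"))
```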
Trust Management
# Get current trust score
trust_score = steward.get_trust_score(agent_id)
# Manual trust adjustment (admin only)
steward.adjust_trust(
    agent_id=agent_id,
    adjustment=-0.2,
    reason="Security incident"
)
# Reset trust after investigation
steward.reset_trust(agent_id, initial_score=0.7)
Best Practices
Start Low
New agents should start with lower trust scores (0.5-0.6) and earn full trust over time.
Monitor Trust Trends
A declining trust score signals a problem with the agent. Investigate and address the root cause.
Trust Recovery
Allow agents to recover trust after incidents, but with stricter oversight initially.
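For trend monitoring, one simple approach is to compare the average of the most recent scores against the average of the window before it. This is only a sketch of that idea, not a feature of the trust system. The `is_declining` helper, the window size, and the tolerance are assumptions chosen for illustration.

```python
def is_declining(
    scores: list[float], window: int = 5, tolerance: float = 0.02
) -> bool:
    """Return True if the mean of the last `window` scores fell by more
    than `tolerance` compared with the preceding `window` scores."""
    if len(scores) < 2 * window:
        return False  # not enough history to compare two windows
    recent = scores[-window:]
    previous = scores[-2 * window:-window]
    return sum(previous) / window - sum(recent) / window > tolerance


history = [0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7]
print(is_declining(history))
```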