Performance Tuning¶
Optimize ACGP governance latency, throughput, and tail behavior without weakening safety guarantees.
Performance Model¶
End-to-end latency is the sum of:
- Trace creation and serialization
- Steward ingress and policy selection
- Tripwire/check execution by eval tier
- External dependency calls (DB/cache/model)
- Persistence and audit write paths
Use this guide to tune each stage in order.
Baseline Targets¶
Typical governance overhead by Governance Tier:
- GT-0: ~10ms typical, <50ms max
- GT-1: ~20ms typical, <100ms max
- GT-2: ~50ms typical, <150ms max
- GT-3: ~100ms typical, <200ms max
- GT-4: ~200ms typical, <350ms max
- GT-5: ~500ms typical, <1000ms max
For interactive product paths, tune for P95 first, then control P99 tail.
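The tier budgets above can be checked programmatically. A minimal sketch, assuming the per-tier max values listed in this guide; `measured_p95_ms` is a hypothetical input you would pull from your metrics backend:

```python
# Max latency budgets (ms) per Governance Tier, taken from this guide.
TIER_MAX_MS = {0: 50, 1: 100, 2: 150, 3: 200, 4: 350, 5: 1000}

def within_budget(tier: int, measured_p95_ms: float) -> bool:
    """Return True if the measured P95 latency fits the tier's max budget."""
    return measured_p95_ms < TIER_MAX_MS[tier]

print(within_budget(2, 120))  # GT-2 at 120ms is under the 150ms max → True
```

Wiring a check like this into CI or an alerting rule catches budget regressions before they reach the P99 tail.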
Tuning Workflow¶
- Capture baseline metrics (P50/P95/P99 latency, throughput, timeout count).
- Identify the top three most expensive actions by volume × latency.
- Reduce Tier 1+ work on high-frequency low-risk paths.
- Move non-blocking analysis to async where policy permits.
- Validate no increase in unsafe allowances or missed tripwire events.
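The ranking step in the workflow above can be sketched as a simple sort over per-action stats. The action names and numbers below are illustrative, not real workload data:

```python
# Sketch: rank actions by volume x P95 latency to find the top tuning targets.
actions = [
    {"name": "search", "volume": 50_000, "p95_ms": 40},
    {"name": "write_doc", "volume": 2_000, "p95_ms": 300},
    {"name": "send_email", "volume": 10_000, "p95_ms": 120},
    {"name": "fetch_url", "volume": 30_000, "p95_ms": 25},
]

# Sort by total time contribution (volume x latency), largest first.
top3 = sorted(actions, key=lambda a: a["volume"] * a["p95_ms"], reverse=True)[:3]
print([a["name"] for a in top3])  # → ['search', 'send_email', 'fetch_url']
```

High-volume cheap actions often dominate this ranking, which is why reducing Tier 1+ work on high-frequency low-risk paths pays off first.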
Blueprint-Level Tuning¶
Keep Fast Paths Cheap¶
- Put deterministic safety checks in Tier 0.
- Keep hot-path checks simple and cache-friendly.
- Avoid expensive lookups for clearly low-risk operations.
tripwires:
  - id: rate_limit_guard
    when:
      hook: tool_call
      tool: search
    eval_tier: 0
    condition: "daily_actions > 2000"
    on_fail:
      decision: block
      reason: "Action cap exceeded"
Tune Thresholds Carefully¶
- Do not loosen tripwires just to reduce latency.
- Prefer adjusting metric weights or splitting checks by action risk.
Configure Trust Debt for Stability¶
- Use gradual decay to avoid oscillating tiers.
- Keep intervention penalties proportional to severity.
trust_debt:
  enabled: true
  accumulation:
    nudge: 0.5
    escalate: 1.0
    block: 2.0
  decay:
    decay_fraction: 0.05
    period_hours: 24
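To see why gradual decay stabilizes tiers, it helps to work the numbers. A minimal sketch of how the settings above play out over time; `decayed_debt` is an illustrative helper, not part of any ACGP API:

```python
# Sketch: with decay_fraction 0.05 per 24h period, debt shrinks geometrically
# rather than dropping in steps, which avoids oscillating tiers.
def decayed_debt(debt: float, fraction: float, periods: int) -> float:
    """Apply fractional decay once per period."""
    for _ in range(periods):
        debt *= 1.0 - fraction
    return debt

# A block penalty (2.0 debt) decays to ~1.71 after three daily periods.
print(round(decayed_debt(2.0, 0.05, 3), 3))  # → 1.715
```

A small fraction over many periods gives a smooth trajectory; a large fraction would let trust snap back quickly after serious interventions.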
Runtime Tuning Patterns¶
Use Async Evaluation Where Appropriate¶
Apply this to non-blocking workflows (audit enrichment, post-action analysis), not critical pre-action gates.
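A minimal sketch of this split using `asyncio`, assuming a hypothetical `enrich_audit` coroutine; the pre-action decision stays synchronous while enrichment is scheduled off the response path:

```python
import asyncio

async def enrich_audit(trace_id: str) -> None:
    # Stand-in for post-action audit enrichment work.
    await asyncio.sleep(0.05)

async def handle_action(trace_id: str) -> str:
    # Pre-action gate runs inline; never move this off the blocking path.
    decision = "allowed"
    # Enrichment is scheduled without blocking the decision; in production
    # a background worker or queue would drain these tasks.
    task = asyncio.create_task(enrich_audit(trace_id))
    await task  # awaited here only so the sketch exits cleanly
    return decision

print(asyncio.run(handle_action("t-123")))  # → allowed
```

The key property to preserve: the decision returned to the caller never depends on work that was moved async.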
Batch Low-Risk Workloads¶
Batching reduces per-request overhead when decisions are independent.
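A minimal sketch of the batching pattern, with `evaluate` standing in for a per-item policy check (the risk threshold is illustrative):

```python
def evaluate(action: dict) -> str:
    # Toy per-item check: low-risk actions pass, others are flagged.
    return "allow" if action.get("risk", 0) < 3 else "review"

def evaluate_batch(actions: list[dict]) -> list[str]:
    # One batch call amortizes fixed setup cost (connections, policy load)
    # across all items; valid only when decisions are independent.
    return [evaluate(a) for a in actions]

print(evaluate_batch([{"risk": 1}, {"risk": 5}, {"risk": 2}]))
# → ['allow', 'review', 'allow']
```

If any decision depends on another item in the batch, fall back to per-request evaluation for that path.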
Cache Stable Policy Inputs¶
Cache immutable policy artifacts and precompiled rule structures to reduce parse/eval cost.
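A minimal sketch of this caching, keyed by an immutable policy version. `compile_rules` and the toy policy source below are hypothetical; real ACGP artifacts would replace them:

```python
from functools import lru_cache

# Illustrative immutable policy sources, keyed by version.
POLICY_SOURCES = {"v1": "daily_actions > 2000"}

@lru_cache(maxsize=128)
def compile_rules(policy_version: str):
    # Expensive parse/compile happens once per version; later calls hit
    # the cache. Safe because a version's content never changes.
    return compile(POLICY_SOURCES[policy_version], "<policy>", "eval")

code = compile_rules("v1")
print(eval(code, {"daily_actions": 100}))  # → False
```

Keying the cache on a version identifier rather than the rule text keeps lookups cheap and makes invalidation explicit: a new policy is a new version.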
Infrastructure Tuning¶
- Enable DB connection pooling.
- Keep cache and steward in low-latency network zones.
- Scale steward workers horizontally for burst traffic.
- Avoid synchronous writes on non-critical telemetry pipelines.
- Separate interactive traffic from long-running evaluation queues.
Observability Metrics¶
Track these metrics continuously:
- acgp_eval_latency_ms (P50/P95/P99 by action)
- acgp_budget_timeout_count
- acgp_interventions_total by type
- acgp_tripwire_fires_total by tripwire id
- acgp_async_queue_depth and processing lag
Example runtime inspection:
# Assumes `steward` is an initialized steward client exposing get_metrics().
metrics = steward.get_metrics()
print("p95_ms", metrics.get("p95_latency_ms"))
print("timeouts", metrics.get("timeout_count"))
print("interventions", metrics.get("interventions"))
Load Testing Checklist¶
- Replay realistic action mix, not synthetic uniform traffic.
- Include dependency degradation scenarios (DB slow, cache miss spikes).
- Verify safety behavior under load (tripwires still deterministic).
- Validate recovery after backpressure events.
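The first checklist item, replaying a realistic action mix, can be sketched with weighted sampling. The mix ratios below are illustrative production proportions, not measured data:

```python
import random

# Illustrative action mix: proportions should come from production traffic.
MIX = {"search": 0.7, "write_doc": 0.05, "send_email": 0.25}

def sample_actions(n: int, seed: int = 42) -> list[str]:
    """Draw n actions weighted by the production mix (seeded for replay)."""
    rng = random.Random(seed)
    names = list(MIX)
    weights = [MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

batch = sample_actions(1000)
print(round(batch.count("search") / len(batch), 2))  # roughly 0.7
```

Seeding the sampler makes load-test runs reproducible, which matters when comparing latency before and after a tuning change.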
Common Anti-Patterns¶
- Putting all checks in high eval tiers.
- Using one policy profile for every action regardless of risk.
- Treating flag pathways as hard-fail pathways.
- Ignoring P99 tails while tuning only average latency.