Performance Tuning

Optimize ACGP governance latency, throughput, and tail behavior without weakening safety guarantees.


Performance Model

End-to-end latency is the sum of:

  • Trace creation and serialization
  • Steward ingress and policy selection
  • Tripwire/check execution by eval tier
  • External dependency calls (DB/cache/model)
  • Persistence and audit write paths

Use this guide to tune each stage in order.
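
To attribute the sum, time each stage independently before tuning anything. A minimal stdlib sketch; the stage names and calls in the comments are illustrative, not an ACGP API:

import time
from contextlib import contextmanager

stage_timings_ms = {}

@contextmanager
def timed(stage):
    # Record wall-clock duration of one pipeline stage in milliseconds.
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_timings_ms[stage] = (time.perf_counter() - start) * 1000

# Hypothetical usage around the stages listed above:
# with timed("serialize"): payload = trace.serialize()
# with timed("policy_select"): policy = steward.select_policy(payload)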


Baseline Targets

Typical governance overhead by Governance Tier:

  • GT-0: ~10ms typical, <50ms max
  • GT-1: ~20ms typical, <100ms max
  • GT-2: ~50ms typical, <150ms max
  • GT-3: ~100ms typical, <200ms max
  • GT-4: ~200ms typical, <350ms max
  • GT-5: ~500ms typical, <1000ms max

For interactive product paths, tune for P95 first, then control the P99 tail.
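
These budgets can be encoded directly in load-test assertions. A minimal sketch using (typical, max) pairs taken from the targets above; the 1.5× P95 headroom factor is an illustrative choice, not an ACGP default:

# Per-tier latency budgets in ms (typical, max), mirroring the targets above.
TIER_BUDGETS_MS = {
    "GT-0": (10, 50), "GT-1": (20, 100), "GT-2": (50, 150),
    "GT-3": (100, 200), "GT-4": (200, 350), "GT-5": (500, 1000),
}

def within_budget(tier: str, p95_ms: float, p99_ms: float) -> bool:
    # P95 should sit near the typical figure; P99 must stay under the max.
    typical, hard_max = TIER_BUDGETS_MS[tier]
    return p95_ms <= typical * 1.5 and p99_ms <= hard_max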


Tuning Workflow

  1. Capture baseline metrics (P50/P95/P99 latency, throughput, timeout count).
  2. Isolate the three most expensive actions by volume × latency (see the ranking sketch after this list).
  3. Reduce Tier 1+ work on high-frequency low-risk paths.
  4. Move non-blocking analysis to async where policy permits.
  5. Validate that unsafe allowances and missed tripwire events have not increased.
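
Step 2 is easy to automate once per-action stats exist. A sketch assuming your metrics store can report call volume and P95 latency per action:

def top_expensive_actions(stats, n=3):
    # stats: {action: {"volume": call_count, "p95_ms": latency}}.
    # Rank by total time consumed, i.e. volume × P95 latency.
    cost = {a: s["volume"] * s["p95_ms"] for a, s in stats.items()}
    return sorted(cost, key=cost.get, reverse=True)[:n]

# Example: a chatty cheap action can outrank a rare slow one:
# top_expensive_actions({"search": {"volume": 90000, "p95_ms": 12},
#                        "write":  {"volume": 500,   "p95_ms": 180}})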

Blueprint-Level Tuning

Keep Fast Paths Cheap

  • Put deterministic safety checks in Tier 0.
  • Keep hot-path checks simple and cache-friendly.
  • Avoid expensive lookups for clearly low-risk operations.
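
For example, a deterministic Tier 0 rate cap keeps a high-volume tool call on the fast path:
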
tripwires:
  - id: rate_limit_guard
    when:
      hook: tool_call
      tool: search
    eval_tier: 0
    condition: "daily_actions > 2000"
    on_fail:
      decision: block
      reason: "Action cap exceeded"

Tune Thresholds Carefully

  • Do not loosen tripwires just to reduce latency.
  • Prefer adjusting metric weights or splitting checks by action risk.
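
For example, a scoring-threshold profile; move band boundaries based on measured intervention rates rather than latency pressure:
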
scoring:
  thresholds:
    ok: 0.25
    nudge: 0.40
    escalate: 0.55

Configure Trust Debt for Stability

  • Use gradual decay to avoid oscillating tiers.
  • Keep intervention penalties proportional to severity.
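
For example, a gradual-decay profile with severity-proportional penalties:
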
trust_debt:
  enabled: true
  accumulation:
    nudge: 0.5
    escalate: 1.0
    block: 2.0
  decay:
    decay_fraction: 0.05
    period_hours: 24
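
Assuming decay_fraction is proportional (5% of outstanding debt per 24-hour period), a 2.0 block penalty takes about 14 periods to halve (0.95^14 ≈ 0.49), long enough to damp tier oscillation without letting debt linger indefinitely.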

Runtime Tuning Patterns

Use Async Evaluation Where Appropriate

result = await steward.evaluate_async(trace)

Apply this to non-blocking workflows (audit enrichment, post-action analysis), not to critical pre-action gates.
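
A sketch of that split, assuming an asyncio runtime and the evaluate_async call shown above; enrich_audit_record is a hypothetical post-action helper:

import asyncio

async def enrich_audit_record(trace, decision):
    ...  # hypothetical post-action analysis; fill in per deployment

async def handle_action(steward, trace):
    # The pre-action gate stays on the critical path and is awaited.
    decision = await steward.evaluate_async(trace)
    # Post-action analysis is scheduled off the critical path; callers
    # do not wait on it. In long-lived services, retain a reference to
    # the task so it is not garbage-collected mid-flight.
    asyncio.create_task(enrich_audit_record(trace, decision))
    return decision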

Batch Low-Risk Workloads

results = steward.evaluate_batch([trace1, trace2, trace3])

Batching reduces per-request overhead when decisions are independent.
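
A sketch of chunked submission, assuming evaluate_batch accepts a list of traces and returns decisions in order:

def evaluate_in_chunks(steward, traces, chunk_size=50):
    # Amortizes per-request overhead; valid only when each decision
    # is independent of the others in the batch.
    results = []
    for i in range(0, len(traces), chunk_size):
        results.extend(steward.evaluate_batch(traces[i:i + chunk_size]))
    return results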

Cache Stable Policy Inputs

Cache immutable policy artifacts and precompiled rule structures to reduce parse/eval cost.
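
A minimal memoization sketch keyed on an immutable version; load_policy_artifact and compile_rules are hypothetical stand-ins for your policy loader and rule compiler:

from functools import lru_cache

@lru_cache(maxsize=128)
def compiled_policy(policy_id: str, version: str):
    # Caching is safe only because (policy_id, version) names an
    # immutable artifact; invalidation happens by bumping the version.
    raw = load_policy_artifact(policy_id, version)  # hypothetical loader
    return compile_rules(raw)                       # hypothetical compiler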


Infrastructure Tuning

  • Enable DB connection pooling (see the pooling sketch after this list).
  • Keep cache and steward in low-latency network zones.
  • Scale steward workers horizontally for burst traffic.
  • Avoid synchronous writes on non-critical telemetry pipelines.
  • Separate interactive traffic from long-running evaluation queues.
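
A pooling sketch with SQLAlchemy, shown as one common option; the DSN and pool sizes are illustrative starting points, not ACGP defaults:

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://steward@db/acgp",  # hypothetical DSN
    pool_size=10,         # steady-state connections per worker
    max_overflow=20,      # burst headroom before callers queue
    pool_pre_ping=True,   # discard dead connections before use
    pool_recycle=1800,    # recycle before server-side idle timeouts
)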

Observability Metrics

Track these metrics continuously:

  • acgp_eval_latency_ms (P50/P95/P99 by action)
  • acgp_budget_timeout_count
  • acgp_interventions_total by type
  • acgp_tripwire_fires_total by tripwire id
  • acgp_async_queue_depth and processing lag

Example runtime inspection:

metrics = steward.get_metrics()
print("p95_ms", metrics.get("p95_latency_ms"))
print("timeouts", metrics.get("timeout_count"))
print("interventions", metrics.get("interventions"))

Load Testing Checklist

  • Replay a realistic action mix, not synthetic uniform traffic (see the sampling sketch after this list).
  • Include dependency degradation scenarios (DB slow, cache miss spikes).
  • Verify safety behavior under load (tripwires still deterministic).
  • Validate recovery after backpressure events.
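
A sketch of sampling a realistic action mix from observed production shares; the mix below is illustrative:

import random

# Observed production mix (action -> share of traffic); numbers are made up.
ACTION_MIX = {"search": 0.80, "summarize": 0.15, "write": 0.05}

def sample_actions(n):
    # Weighted sampling preserves the real hot/cold-path ratio,
    # unlike uniform synthetic traffic.
    actions, weights = zip(*ACTION_MIX.items())
    return random.choices(actions, weights=weights, k=n)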

Common Anti-Patterns

  • Putting all checks in high eval tiers.
  • Using one policy profile for every action regardless of risk.
  • Treating flag pathways as hard-fail pathways.
  • Ignoring P99 tails while tuning only average latency.