Performance Tuning

Optimize ACGP governance latency, throughput, and tail behavior without weakening safety guarantees.


Performance Model

End-to-end latency is the sum of:

  • Trace creation and serialization
  • Steward ingress and policy selection
  • Tripwire/check execution by eval tier
  • External dependency calls (DB/cache/model)
  • Persistence and audit write paths

Use this guide to tune each stage in order.
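
To attribute the sum, time each stage independently before tuning anything. A minimal stdlib sketch; the stage names and calls in the comments are illustrative, not an ACGP API:

import time
from contextlib import contextmanager

stage_timings_ms = {}

@contextmanager
def timed(stage):
    # Record wall-clock duration of one pipeline stage in milliseconds.
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_timings_ms[stage] = (time.perf_counter() - start) * 1000

# Hypothetical usage around the stages listed above:
# with timed("serialize"): payload = trace.serialize()
# with timed("policy_select"): policy = steward.select_policy(payload)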


Baseline Targets

Typical governance overhead by Governance Tier:

  • GT-0: ~10ms typical, <50ms max
  • GT-1: ~20ms typical, <100ms max
  • GT-2: ~50ms typical, <150ms max
  • GT-3: ~100ms typical, <200ms max
  • GT-4: ~200ms typical, <350ms max
  • GT-5: ~500ms typical, <1000ms max

For interactive product paths, tune for P95 first, then control the P99 tail.
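
These budgets can be encoded directly in load-test assertions. A minimal sketch using (typical, max) pairs taken from the targets above; the 1.5× P95 headroom factor is an illustrative choice, not an ACGP default:

# Per-tier latency budgets in ms (typical, max), mirroring the targets above.
TIER_BUDGETS_MS = {
    "GT-0": (10, 50), "GT-1": (20, 100), "GT-2": (50, 150),
    "GT-3": (100, 200), "GT-4": (200, 350), "GT-5": (500, 1000),
}

def within_budget(tier: str, p95_ms: float, p99_ms: float) -> bool:
    # P95 should sit near the typical figure; P99 must stay under the max.
    typical, hard_max = TIER_BUDGETS_MS[tier]
    return p95_ms <= typical * 1.5 and p99_ms <= hard_max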


Tuning Workflow

  1. Capture baseline metrics (P50/P95/P99 latency, throughput, timeout count).
  2. Isolate the three most expensive actions by volume × latency (see the ranking sketch after this list).
  3. Reduce Tier 1+ work on high-frequency low-risk paths.
  4. Move non-blocking analysis to async where policy permits.
  5. Validate that unsafe allowances and missed tripwire events have not increased.
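
Step 2 is easy to automate once per-action stats exist. A sketch assuming your metrics store can report call volume and P95 latency per action:

def top_expensive_actions(stats, n=3):
    # stats: {action: {"volume": call_count, "p95_ms": latency}}.
    # Rank by total time consumed, i.e. volume × P95 latency.
    cost = {a: s["volume"] * s["p95_ms"] for a, s in stats.items()}
    return sorted(cost, key=cost.get, reverse=True)[:n]

# Example: a chatty cheap action can outrank a rare slow one:
# top_expensive_actions({"search": {"volume": 90000, "p95_ms": 12},
#                        "write":  {"volume": 500,   "p95_ms": 180}})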

Blueprint-Level Tuning

Keep Fast Paths Cheap

  • Put deterministic safety checks in Tier 0.
  • Keep hot-path checks simple and cache-friendly.
  • Avoid expensive lookups for clearly low-risk operations.
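
For example, a deterministic Tier 0 rate cap keeps a high-volume tool call on the fast path:
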
tripwires:
  - id: rate_limit_guard
    when:
      hook: tool_call
      tool: search
    eval_tier: 0
    condition: "daily_actions > 2000"
    on_fail:
      decision: block
      reason: "Action cap exceeded"

Tune Thresholds Carefully

  • Do not loosen tripwires just to reduce latency.
  • Prefer adjusting metric weights or splitting checks by action risk.
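
For example, a scoring-threshold profile; move band boundaries based on measured intervention rates rather than latency pressure:
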
scoring:
  thresholds:
    ok: 0.25
    nudge: 0.40
    escalate: 0.55

Configure Trust Debt for Stability

  • Use gradual decay to avoid oscillating tiers.
  • Keep intervention penalties proportional to severity.
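
For example, a gradual-decay profile with severity-proportional penalties:
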
trust_debt:
  enabled: true
  accumulation:
    nudge: 0.5
    escalate: 1.0
    block: 2.0
  decay:
    decay_fraction: 0.05
    period_hours: 24
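
Assuming decay_fraction is proportional (5% of outstanding debt per 24-hour period), a 2.0 block penalty takes about 14 periods to halve (0.95^14 ≈ 0.49), long enough to damp tier oscillation without letting debt linger indefinitely.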

Runtime Tuning Patterns

Use Async Evaluation Where Appropriate

result = await steward.evaluate_async(trace)

Apply this to non-blocking workflows (audit enrichment, post-action analysis), not to critical pre-action gates.
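
A sketch of that split, assuming an asyncio runtime and the evaluate_async call shown above; enrich_audit_record is a hypothetical post-action helper:

import asyncio

async def enrich_audit_record(trace, decision):
    ...  # hypothetical post-action analysis; fill in per deployment

async def handle_action(steward, trace):
    # The pre-action gate stays on the critical path and is awaited.
    decision = await steward.evaluate_async(trace)
    # Post-action analysis is scheduled off the critical path; callers
    # do not wait on it. In long-lived services, retain a reference to
    # the task so it is not garbage-collected mid-flight.
    asyncio.create_task(enrich_audit_record(trace, decision))
    return decision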

Batch Low-Risk Workloads

results = steward.evaluate_batch([trace1, trace2, trace3])

Batching reduces per-request overhead when decisions are independent.
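
A sketch of chunked submission, assuming evaluate_batch accepts a list of traces and returns decisions in order:

def evaluate_in_chunks(steward, traces, chunk_size=50):
    # Amortizes per-request overhead; valid only when each decision
    # is independent of the others in the batch.
    results = []
    for i in range(0, len(traces), chunk_size):
        results.extend(steward.evaluate_batch(traces[i:i + chunk_size]))
    return results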

Cache Stable Policy Inputs

Cache immutable policy artifacts and precompiled rule structures to reduce parse/eval cost.
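
A minimal memoization sketch keyed on an immutable version; load_policy_artifact and compile_rules are hypothetical stand-ins for your policy loader and rule compiler:

from functools import lru_cache

@lru_cache(maxsize=128)
def compiled_policy(policy_id: str, version: str):
    # Caching is safe only because (policy_id, version) names an
    # immutable artifact; invalidation happens by bumping the version.
    raw = load_policy_artifact(policy_id, version)  # hypothetical loader
    return compile_rules(raw)                       # hypothetical compiler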


Infrastructure Tuning

  • Enable DB connection pooling (see the pooling sketch after this list).
  • Keep cache and steward in low-latency network zones.
  • Scale steward workers horizontally for burst traffic.
  • Avoid synchronous writes on non-critical telemetry pipelines.
  • Separate interactive traffic from long-running evaluation queues.
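
A pooling sketch with SQLAlchemy, shown as one common option; the DSN and pool sizes are illustrative starting points, not ACGP defaults:

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://steward@db/acgp",  # hypothetical DSN
    pool_size=10,         # steady-state connections per worker
    max_overflow=20,      # burst headroom before callers queue
    pool_pre_ping=True,   # discard dead connections before use
    pool_recycle=1800,    # recycle before server-side idle timeouts
)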

Observability Metrics

Track these metrics continuously:

  • acgp_eval_latency_ms (P50/P95/P99 by action)
  • acgp_budget_timeout_count
  • acgp_interventions_total by type
  • acgp_tripwire_fires_total by tripwire id
  • acgp_async_queue_depth and processing lag

Example runtime inspection:

metrics = steward.get_metrics()
print("p95_ms", metrics.get("p95_latency_ms"))
print("timeouts", metrics.get("timeout_count"))
print("interventions", metrics.get("interventions"))

Load Testing Checklist

  • Replay a realistic action mix, not synthetic uniform traffic (see the sampling sketch after this list).
  • Include dependency degradation scenarios (DB slow, cache miss spikes).
  • Verify safety behavior under load (tripwires still deterministic).
  • Validate recovery after backpressure events.
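
A sketch of sampling a realistic action mix from observed production shares; the mix below is illustrative:

import random

# Observed production mix (action -> share of traffic); numbers are made up.
ACTION_MIX = {"search": 0.80, "summarize": 0.15, "write": 0.05}

def sample_actions(n):
    # Weighted sampling preserves the real hot/cold-path ratio,
    # unlike uniform synthetic traffic.
    actions, weights = zip(*ACTION_MIX.items())
    return random.choices(actions, weights=weights, k=n)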

Common Anti-Patterns

  • Putting all checks in high eval tiers.
  • Using one policy profile for every action regardless of risk.
  • Treating flag pathways as hard-fail pathways.
  • Ignoring P99 tails while tuning only average latency.