Event bus — operations SLO

Migrated from root technical docs.

Operational targets for the KV/D1 event pipeline and client delivery path.

  • p95 delivery latency: measured end-to-end in staging, from enqueue to a terminal outcome (successful worker delivery or routing to the DLQ). Alert if p95 exceeds the agreed budget for two consecutive measurement windows.
  • Error rate and DLQ depth are reviewed alongside latency in the same dashboard.
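
The two-consecutive-window rule for p95 latency can be sketched as follows. This is a minimal illustration, not the pipeline's actual alerting code: the function names (percentile, shouldAlert), the nearest-rank percentile method, and the shape of the input (one array of latency samples in milliseconds per window) are all assumptions, and the budget value is left to the caller because this document does not state it.

```typescript
// Hypothetical sketch: names and input shape are illustrative assumptions.

// Nearest-rank percentile over one window of latency samples (milliseconds).
// Returns NaN for an empty window.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank arithmetic exact for integers.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// True only when the p95 of each of the last `consecutive` windows exceeds
// the budget. An empty window yields NaN, which never counts as a breach.
function shouldAlert(
  windows: number[][],
  budgetMs: number,
  consecutive: number = 2,
): boolean {
  if (windows.length < consecutive) return false;
  return windows
    .slice(-consecutive)
    .every((w) => percentile(w, 95) > budgetMs);
}
```

Requiring consecutive breaches trades a slower alert for fewer pages on a single noisy window. Whether an empty window should instead be treated as a breach (no deliveries observed) is a policy choice this sketch does not settle.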

Wire the following signals into the observability stack (Datadog, Grafana, or Cloudflare Analytics, as applicable):

  • Delivery success vs failure counts (event.delivery.success / failure paths in delivery-retry.ts).
  • DLQ depth and age of oldest message.
  • Worker invocation errors and retries.
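
As a rough sketch of how the signals above might be derived before being exported to a metrics backend, the following keeps in-memory delivery counters and computes DLQ depth and oldest-message age. Everything here is hypothetical: the Outcome type, the DeliveryCounters class, the DlqMessage shape, and the dlqStats helper are illustrative names, not the contents of delivery-retry.ts, and the export step to Datadog, Grafana, or Cloudflare Analytics is deliberately omitted.

```typescript
// Hypothetical sketch: none of these names come from delivery-retry.ts.

type Outcome = "success" | "failure" | "retry";

class DeliveryCounters {
  private counts: Record<Outcome, number> = { success: 0, failure: 0, retry: 0 };

  // Call once per delivery attempt outcome.
  record(outcome: Outcome): void {
    this.counts[outcome] += 1;
  }

  // Copy of the current counts, safe to hand to an exporter.
  snapshot(): Record<Outcome, number> {
    return { ...this.counts };
  }

  // Failure share of terminal outcomes. Retries are not terminal,
  // so they are excluded from the denominator.
  errorRate(): number {
    const terminal = this.counts.success + this.counts.failure;
    return terminal === 0 ? 0 : this.counts.failure / terminal;
  }
}

// Assumed message shape: each DLQ entry records when it was first enqueued.
interface DlqMessage {
  id: string;
  enqueuedAtMs: number;
}

interface DlqStats {
  depth: number;
  oldestAgeMs: number;
}

// Depth and age of the oldest message, relative to `nowMs`.
function dlqStats(messages: DlqMessage[], nowMs: number): DlqStats {
  if (messages.length === 0) return { depth: 0, oldestAgeMs: 0 };
  const oldest = messages.reduce(
    (min, m) => Math.min(min, m.enqueuedAtMs),
    Infinity,
  );
  return { depth: messages.length, oldestAgeMs: Math.max(0, nowMs - oldest) };
}
```

Tracking the age of the oldest message alongside depth helps distinguish a brief burst (high depth, low age) from a stuck queue (any depth, growing age).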

Revisit SLO thresholds after major traffic or schema changes.