Event bus — operations SLO
Operational targets for the KV/D1 event pipeline and client delivery path.
Latency and reliability
- p95 delivery latency: measure end-to-end, from enqueue to successful worker delivery (or to arrival in the DLQ), in staging; alert if p95 exceeds the agreed budget for two consecutive windows.
- Error rate and DLQ depth are reviewed alongside latency in the same dashboard.
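A minimal TypeScript sketch of the latency check, assuming each delivery is recorded with its enqueue timestamp and the timestamp at which it was delivered or moved to the DLQ. The record shape, the function names, and the nearest-rank percentile definition are illustrative assumptions, not the pipeline's actual code. The budget is passed in as a parameter because the agreed value is not stated here.

```typescript
// Hypothetical record shape: one entry per event, in milliseconds since epoch.
interface DeliveryRecord {
  enqueuedAtMs: number; // when the event was enqueued
  settledAtMs: number; // when the worker delivered it, or when it landed in the DLQ
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. Multiplying before dividing keeps the rank exact for
// integer p and sample counts.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of an empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// End-to-end p95 for one window: enqueue -> delivered (or DLQ).
function p95LatencyMs(records: DeliveryRecord[]): number {
  return percentile(records.map((r) => r.settledAtMs - r.enqueuedAtMs), 95);
}

// Alert rule: fire only when the two most recent windows both exceed the budget.
function shouldAlert(windowP95sMs: number[], budgetMs: number): boolean {
  if (windowP95sMs.length < 2) return false;
  const [previous, latest] = windowP95sMs.slice(-2);
  return previous > budgetMs && latest > budgetMs;
}
```

Requiring two consecutive breaching windows means a single slow window does not trigger an alert, while a sustained regression fires on the second window.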
Dashboard implementation
Wire the following into your observability stack (Datadog, Grafana, or Cloudflare Analytics, as applicable):
- Delivery success vs. failure counts (the `event.delivery.success` / failure paths in `delivery-retry.ts`).
- DLQ depth and age of the oldest message.
- Worker invocation errors and retries.
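One way to derive these dashboard values from raw data, sketched in TypeScript. The `DeliveryAttempt` and `DlqMessage` shapes, the `attempt` counter, and `dashboardSnapshot` are hypothetical and are not taken from `delivery-retry.ts`. DLQ age is measured here from the time a message entered the DLQ, which is also an assumption. The error rate is included because it is reviewed on the same dashboard as latency.

```typescript
// Hypothetical outcome labels mirroring the success / failure paths.
type Outcome = "success" | "failure";

interface DeliveryAttempt {
  outcome: Outcome;
  attempt: number; // assumption: 1 for the first try, > 1 for retries
}

interface DlqMessage {
  enqueuedAtMs: number; // assumption: time the message entered the DLQ
}

interface DashboardSnapshot {
  successCount: number;
  failureCount: number;
  retryCount: number;
  errorRate: number; // failures / all attempts; 0 when there are no attempts
  dlqDepth: number;
  oldestDlqAgeMs: number; // 0 when the DLQ is empty
}

function dashboardSnapshot(
  attempts: DeliveryAttempt[],
  dlq: DlqMessage[],
  nowMs: number,
): DashboardSnapshot {
  let successCount = 0;
  let failureCount = 0;
  let retryCount = 0;
  for (const a of attempts) {
    if (a.outcome === "success") successCount++;
    else failureCount++;
    if (a.attempt > 1) retryCount++; // any attempt after the first counts as a retry
  }
  const total = successCount + failureCount;
  // reduce avoids spreading a large DLQ into Math.min's argument list
  const oldestEnqueuedAtMs = dlq.reduce((min, m) => Math.min(min, m.enqueuedAtMs), Infinity);
  return {
    successCount,
    failureCount,
    retryCount,
    errorRate: total === 0 ? 0 : failureCount / total,
    dlqDepth: dlq.length,
    oldestDlqAgeMs: dlq.length === 0 ? 0 : nowMs - oldestEnqueuedAtMs,
  };
}
```

In practice the counts would more likely be emitted as counters at the success and failure paths and aggregated by the observability backend; computing them from an in-memory list only keeps the example self-contained.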
Review cadence
Revisit SLO thresholds after major traffic or schema changes.