TKT-006: Test Coverage, Rollout Strategy, and Ratchet

Migrated from root technical docs.

Status: Todo
Priority: P1
Estimated effort: 1-2 days
Depends on: TKT-001 through TKT-005

Finalize production readiness with test hardening, staged rollout controls, and a measurable ratchet plan for long-term query performance improvements.

  • Test completion criteria.
  • Progressive rollout matrix.
  • KPI and ratchet checkpoints.
  • Add/expand unit tests for query span helper and threshold logic (slow/fast query tests in monitoring-cloudflare.test.ts).
  • Add regression test to ensure sensitive query data is redacted (fingerprint high-cardinality test in monitoring-cloudflare.test.ts).
  • Add service-level tests where helper integration is mocked/verified (error-path propagation test in monitoring-cloudflare.test.ts; traceServerDbQuery test in monitoring.test.ts).
  • Ensure CI test commands pass for modified packages/apps. All 4 affected packages pass (packages/utils 34 tests, apps/dashboard 19 tests, workers/consumer-api 15 tests, workers/ingestion 4 monitoring tests); turbo type-check is clean on dashboard and utils, with no new TypeScript errors in worker-consumer-api or worker-ingestion.
  • Document known test limitations and follow-up tasks (listed below).
  • Author-matching cross-shard tracing (author-matching.ts): tests verify that the queryOrcidsAcrossShards helper compiles and type-checks, but no dedicated test exists for the shard-index-to-span-attribute flow across all 5 lookup strategies. Follow-up: add an author-matching.test.ts with a traceDbQuery mock that verifies the shard attribute for each strategy.
  • Workspace-modules and site-tools DB tracing: instrumented but not unit-tested in isolation (only type-checked). Follow-up: add workspace-modules.test.ts and site-tools/queries.test.ts with a traceServerDbQuery mock.
  • Date.now mock ordering: the slow-query test relies on a call-count pattern. If the implementation changes to a single Date.now call, the test will need updating.
  • Worker deploy validation: span emission under real Cloudflare runtime (not Vitest) can only be confirmed post-deploy with SENTRY_TRACES_SAMPLE_RATE=1.0 in preview.
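The Date.now ordering limitation above comes from the test knowing how many times the helper reads the clock. One possible way to loosen that coupling is to inject the clock, so a test controls elapsed time directly. The sketch below is only an illustration: `timeWork`, `createFakeClock`, the `>=` threshold comparison, and the 500 ms threshold are all hypothetical and are not taken from the project's helpers.

```typescript
// Hypothetical names throughout; this is not the project's traceDbQuery helper.
type Clock = () => number;

interface Timed<T> {
  result: T;
  durationMs: number;
  slow: boolean;
}

// Times a unit of work with an injectable clock. Synchronous to keep the
// sketch short; a real query helper would await a promise between the reads.
function timeWork<T>(
  run: () => T,
  slowThresholdMs: number,
  now: Clock = Date.now,
): Timed<T> {
  const start = now();
  const result = run();
  const durationMs = now() - start;
  // Assumption: a query exactly at the threshold counts as slow.
  return { result, durationMs, slow: durationMs >= slowThresholdMs };
}

// Fake clock that moves only when advanced, so a test sets the elapsed time
// without assuming how many times the helper reads the clock.
function createFakeClock(startMs = 0): { now: Clock; advance: (ms: number) => void } {
  let current = startMs;
  return {
    now: () => current,
    advance: (ms: number) => {
      current += ms;
    },
  };
}

const clock = createFakeClock();

// The "query" itself advances the fake clock by the latency under test.
const slowRun = timeWork(() => {
  clock.advance(750);
  return "rows";
}, 500, clock.now);

const fastRun = timeWork(() => {
  clock.advance(20);
  return "rows";
}, 500, clock.now);
```

With this shape, adding or removing clock reads inside the helper would not change the measured duration in the test, because time only moves when the fake query advances it.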
  • Define rollout matrix:
  • Stage 1: preview at 100% traces for 24h.
  • Stage 2: production at 2% traces for 48h.
  • Stage 3: production at 5% traces if volume and cost are acceptable.
  • Add explicit stop conditions (event budget, latency overhead, error spikes).
  • Define rollback owner and SLA for disabling traces.
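The rollout stages and stop conditions above could be captured as typed data so that a stage review is a mechanical check rather than a judgment call. This is a rough sketch: the stage values come from the list above, but the type names, the `violatedStopConditions` function, and every number in `EXAMPLE_LIMITS` are hypothetical placeholders, since the ticket does not yet define the actual budgets.

```typescript
type Environment = "preview" | "production";

interface RolloutStage {
  label: string;
  environment: Environment;
  tracesSampleRate: number;
  // null where the ticket gives no fixed duration for the stage.
  minDurationHours: number | null;
}

// Stage values taken from the rollout matrix in this ticket.
const ROLLOUT_STAGES: readonly RolloutStage[] = [
  { label: "stage-1", environment: "preview", tracesSampleRate: 1.0, minDurationHours: 24 },
  { label: "stage-2", environment: "production", tracesSampleRate: 0.02, minDurationHours: 48 },
  { label: "stage-3", environment: "production", tracesSampleRate: 0.05, minDurationHours: null },
];

interface StopLimits {
  maxEventsPerDay: number;
  maxLatencyOverheadMs: number;
  maxErrorRateIncrease: number;
}

interface StageMetrics {
  eventsPerDay: number;
  latencyOverheadMs: number;
  errorRateIncrease: number;
}

// Placeholder numbers for illustration only; the real limits are still to be defined.
const EXAMPLE_LIMITS: StopLimits = {
  maxEventsPerDay: 100000,
  maxLatencyOverheadMs: 5,
  maxErrorRateIncrease: 0.01,
};

// Returns the names of all stop conditions that the observed metrics exceed.
// An empty array means the stage may continue or advance.
function violatedStopConditions(metrics: StageMetrics, limits: StopLimits): string[] {
  const violations: string[] = [];
  if (metrics.eventsPerDay > limits.maxEventsPerDay) violations.push("event budget");
  if (metrics.latencyOverheadMs > limits.maxLatencyOverheadMs) violations.push("latency overhead");
  if (metrics.errorRateIncrease > limits.maxErrorRateIncrease) violations.push("error spike");
  return violations;
}

const exampleViolations = violatedStopConditions(
  { eventsPerDay: 150000, latencyOverheadMs: 2, errorRateIncrease: 0 },
  EXAMPLE_LIMITS,
);
```

Returning every violated condition, rather than a single boolean, gives the rollback owner the reason for the stop alongside the decision.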
  • Baseline p50/p95/p99 DB span duration by service and endpoint.
  • Track top 10 slow query fingerprints weekly.
  • Track alert volume and false-positive rate.
  • Create monthly review checkpoint to tune thresholds and sampling.
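The KPI items above amount to grouping DB span durations and summarizing each group. A minimal sketch of that arithmetic follows, assuming span samples have already been exported into memory. The `DbSpanSample` shape, the function names, the nearest-rank percentile method, and ranking fingerprints by p95 are all assumptions made for illustration; the ticket does not specify them.

```typescript
// Hypothetical shape of an exported DB span sample.
interface DbSpanSample {
  service: string;
  endpoint: string;
  fingerprint: string;
  durationMs: number;
}

interface Percentiles {
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sortedAsc: readonly number[], p: number): number {
  if (sortedAsc.length === 0) throw new Error("percentile of empty sample set");
  const rank = Math.max(1, Math.ceil((p * sortedAsc.length) / 100));
  return sortedAsc[rank - 1];
}

function summarize(durations: readonly number[]): Percentiles {
  const sorted = durations.slice().sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// Buckets durations by an arbitrary key derived from each sample.
function groupDurations(
  samples: readonly DbSpanSample[],
  keyOf: (sample: DbSpanSample) => string,
): Record<string, number[]> {
  const groups: Record<string, number[]> = {};
  for (const sample of samples) {
    const key = keyOf(sample);
    if (groups[key] === undefined) groups[key] = [];
    groups[key].push(sample.durationMs);
  }
  return groups;
}

// Baseline p50/p95/p99 per "service endpoint" pair.
function baselineByEndpoint(samples: readonly DbSpanSample[]): Record<string, Percentiles> {
  const groups = groupDurations(samples, (s) => `${s.service} ${s.endpoint}`);
  const baseline: Record<string, Percentiles> = {};
  for (const key of Object.keys(groups)) baseline[key] = summarize(groups[key]);
  return baseline;
}

// Slowest fingerprints, ranked by p95 duration (an assumed ranking metric).
function topSlowFingerprints(
  samples: readonly DbSpanSample[],
  limit = 10,
): Array<{ fingerprint: string; p95: number }> {
  const groups = groupDurations(samples, (s) => s.fingerprint);
  return Object.keys(groups)
    .map((fingerprint) => ({ fingerprint, p95: summarize(groups[fingerprint]).p95 }))
    .sort((a, b) => b.p95 - a.p95)
    .slice(0, limit);
}
```

Under sampled tracing (2% or 5%), these figures describe the sampled spans only, so week-over-week comparisons are most meaningful when the sample rate is unchanged between the two periods.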
  • Tracing and DB instrumentation deployed to all target services.
  • Alerts and dashboards are stable and useful for operations.
  • Tests cover critical helper behaviors and integration paths.
  • Team has a ratchet cadence (weekly/monthly) for performance improvement.
  • All tickets marked complete in README.
  • Follow-up optimization tickets created for top slow fingerprints (requires post-deploy data).
  • Runbook linked in relevant service docs (apps/dashboard/AGENTS.md and packages/platform-ingestion/docs/OBSERVABILITY.md).