Locale-Search Rollout Runbook (F3)
This runbook covers the staged rollout, monitoring, rollback, and auto-disable recovery for the locale-aware search feature (ticket F3 / E2–E3 locale search projections).
Overview
Locale-aware search is gated by a KV-backed cohort percentage stored under
`rollout:locale-search` in `PUB_CACHE`. The `stage` value controls what percentage of
entity workspaces receive locale-aware SQL (`search_projections` with a `json_extract` fallback).

Default: `stage = 0` (disabled for all entities).
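A minimal sketch of how a Worker might read this gate, assuming the value at `rollout:locale-search` is JSON carrying the same `stage`, `enabled`, and `auto_disabled` fields the admin endpoint returns. `KVLike`, `normalizeGate`, and `readGate` are hypothetical names; `KVLike` only models the JSON form of KV's `get(key, { type, cacheTtl })`. Falling back to `stage: 0, enabled: true` when the key is missing is an assumption based on the default response under Read current config, and the 60 s `cacheTtl` is an assumption matching the config-read TTL mentioned under Rollback.

```typescript
// Hypothetical stand-in for the PUB_CACHE KV binding: only the JSON-typed get is modelled.
interface KVLike {
  get(key: string, options: { type: "json"; cacheTtl?: number }): Promise<unknown>;
}

// Subset of the rollout config that decides whether locale search can run.
interface LocaleSearchGate {
  stage: number;
  enabled: boolean;
  auto_disabled: boolean;
}

const ROLLOUT_KEY = "rollout:locale-search";
// ASSUMPTION: mirrors the default GET response (stage 0 means no entity is in the cohort).
const DEFAULT_GATE: LocaleSearchGate = { stage: 0, enabled: true, auto_disabled: false };

// Turn whatever KV returned into a well-formed gate, falling back to defaults field by field
// and clamping the stage to the 0..100 range.
function normalizeGate(raw: unknown): LocaleSearchGate {
  if (typeof raw !== "object" || raw === null) return { ...DEFAULT_GATE };
  const r = raw as Partial<Record<keyof LocaleSearchGate, unknown>>;
  return {
    stage: typeof r.stage === "number" ? Math.min(100, Math.max(0, r.stage)) : DEFAULT_GATE.stage,
    enabled: typeof r.enabled === "boolean" ? r.enabled : DEFAULT_GATE.enabled,
    auto_disabled:
      typeof r.auto_disabled === "boolean" ? r.auto_disabled : DEFAULT_GATE.auto_disabled,
  };
}

// ASSUMPTION: a 60 s edge cache on the config read.
async function readGate(kv: KVLike): Promise<LocaleSearchGate> {
  return normalizeGate(await kv.get(ROLLOUT_KEY, { type: "json", cacheTtl: 60 }));
}
```

Normalizing field by field means a partially written or hand-edited KV value degrades to the safe default rather than throwing inside the request path.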
Admin Endpoint
All rollout management goes through:

- `GET /api/site-tools/rollout`
- `PATCH /api/site-tools/rollout`

Both endpoints require a valid Cloudflare Access session (the same as all `/api/` routes).
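For scripted rollouts, the two endpoints can be wrapped in a small typed client. This is a sketch, not an existing package: `RolloutConfig` copies the field names of the documented GET response (the types of the nullable fields are assumptions), the `FetchLike` shape is defined locally so the code does not depend on DOM or Workers typings, and the PATCH response body is returned as `unknown` because this runbook does not document it.

```typescript
// Hypothetical typed client for GET/PATCH /api/site-tools/rollout.
interface RolloutConfig {
  feature: string;
  stage: number;
  enabled: boolean;
  auto_disabled: boolean;
  disabled_at: number | null;
  disabled_reason: string | null;
  updated_at: number;
  updated_by: string | null;
}

type RolloutPatch = { stage?: number; enabled?: boolean };

// Minimal fetch-compatible shapes; the global fetch satisfies FetchLike.
interface HttpInit { method: string; headers: Record<string, string>; body?: string }
interface HttpResponse { ok: boolean; status: number; json(): Promise<unknown> }
type FetchLike = (url: string, init: HttpInit) => Promise<HttpResponse>;

const ROLLOUT_PATH = "/api/site-tools/rollout";

// Cloudflare Access session cookie, required on every request.
function accessHeaders(token: string): Record<string, string> {
  return { Cookie: `CF_Authorization=${token}` };
}

async function getRollout(baseUrl: string, token: string, fetchImpl: FetchLike): Promise<RolloutConfig> {
  const res = await fetchImpl(baseUrl + ROLLOUT_PATH, { method: "GET", headers: accessHeaders(token) });
  if (!res.ok) throw new Error(`GET ${ROLLOUT_PATH} failed: HTTP ${res.status}`);
  return (await res.json()) as RolloutConfig;
}

// Sends a partial update such as { stage: 25 } or { enabled: false }.
async function patchRollout(
  baseUrl: string,
  token: string,
  patch: RolloutPatch,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(baseUrl + ROLLOUT_PATH, {
    method: "PATCH",
    headers: { ...accessHeaders(token), "Content-Type": "application/json" },
    body: JSON.stringify(patch),
  });
  if (!res.ok) throw new Error(`PATCH ${ROLLOUT_PATH} failed: HTTP ${res.status}`);
  return res.json();
}
```

In Node 18 or later the global `fetch` can be passed directly, for example `await patchRollout("https://dash.legaciti.org", token, { stage: 25 }, fetch)`. Injecting `fetchImpl` keeps the client testable without network access.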
Read current config
```bash
curl -s https://dash.legaciti.org/api/site-tools/rollout \
  -H "Cookie: CF_Authorization=<token>"
```

Response:
```json
{
  "feature": "locale-search",
  "stage": 0,
  "enabled": true,
  "auto_disabled": false,
  "disabled_at": null,
  "disabled_reason": null,
  "updated_at": 0,
  "updated_by": null
}
```

Advance rollout stage
```bash
curl -s -X PATCH https://dash.legaciti.org/api/site-tools/rollout \
  -H "Cookie: CF_Authorization=<token>" \
  -H "Content-Type: application/json" \
  -d '{"stage": 10}'
```

Enable / disable
```bash
# Disable without changing stage
curl -X PATCH .../api/site-tools/rollout \
  -H "Cookie: CF_Authorization=<token>" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

# Re-enable after investigation
curl -X PATCH .../api/site-tools/rollout \
  -H "Cookie: CF_Authorization=<token>" \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'
```

Staged Rollout Procedure
Advance through cohort percentages with a soak period at each step.
| Step | Stage | Cohort size | Minimum soak |
|---|---|---|---|
| 0 | 0 | disabled | — |
| 1 | 10 | ~10% of entities | 24 hours |
| 2 | 25 | ~25% | 48 hours |
| 3 | 50 | ~50% | 48 hours |
| 4 | 100 | all | — |
Cohort bucketing is deterministic: an entity is in the cohort when `djb2(entityId) % 100 < stage`.
Raising the stage therefore only adds entities; entities already in the cohort are never removed.
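The bucketing rule can be sketched as follows. This assumes the classic additive djb2 (`hash * 33 + c`, seeded with 5381) over UTF-16 code units and kept as an unsigned 32-bit integer; the production helper may use the XOR variant or hash UTF-8 bytes, in which case individual entities land in different buckets, although the cohort properties below still hold. `bucketOf` and `inCohort` are hypothetical names.

```typescript
// Additive djb2: hash = hash * 33 + c, wrapped to an unsigned 32-bit integer.
// ASSUMPTION: hashes UTF-16 code units; the real helper may differ.
function djb2(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = (Math.imul(hash, 33) + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Bucket in [0, 100); the same entity always lands in the same bucket.
function bucketOf(entityId: string): number {
  return djb2(entityId) % 100;
}

// An entity is in the cohort when its bucket is below the current stage.
function inCohort(entityId: string, stage: number): boolean {
  return bucketOf(entityId) < stage;
}
```

Because membership is `bucket < stage`, stage 0 selects nobody, stage 100 selects everyone, and any entity selected at 10 is also selected at 25, 50, and 100. Stepping the stage down drops exactly the entities whose bucket is at or above the new value.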
Step-by-step
- Pre-check: confirm the baseline error rate is normal.
- Advance: `PATCH /api/site-tools/rollout` with `{ "stage": <next> }`.
- Validate (see Monitoring below).
- Hold for the minimum soak period before advancing again.
- Repeat until `stage: 100`.
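The stage table and soak rule above can be encoded in a small guard for scripted rollouts. This is a hypothetical helper, not part of the dashboard API: it takes the time the current stage was set as epoch milliseconds (where that timestamp comes from, for example `updated_at`, and its units are not specified in this runbook), and it does not replace the pre-check and validation steps.

```typescript
// Hypothetical helper encoding the rollout table: which stage may come next, and when.
interface Step { stage: number; soakHours: number }

// soakHours is the minimum hold at a stage before advancing past it (0 where the table shows none).
const SCHEDULE: readonly Step[] = [
  { stage: 0, soakHours: 0 },
  { stage: 10, soakHours: 24 },
  { stage: 25, soakHours: 48 },
  { stage: 50, soakHours: 48 },
  { stage: 100, soakHours: 0 },
];

const HOUR_MS = 60 * 60 * 1000;

// Returns the next stage to PATCH, or null when the current stage is off-schedule,
// already final, or still inside its soak period.
function nextStage(current: number, advancedAtMs: number, nowMs: number): number | null {
  const i = SCHEDULE.findIndex((s) => s.stage === current);
  if (i === -1 || i === SCHEDULE.length - 1) return null;
  if (nowMs - advancedAtMs < SCHEDULE[i].soakHours * HOUR_MS) return null;
  return SCHEDULE[i + 1].stage;
}
```

Returning `null` for an off-schedule stage (for example after a manual step down to an unlisted value) forces an operator decision instead of guessing the next step.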
Monitoring
What to watch
After each stage advance, monitor these signals for at least 30 minutes:
- Error rate: should stay below 5% (auto-disable triggers above 10%).
- Average latency: should stay below 1500ms (auto-disable triggers above 2000ms).
- Response contract: `matched_locale`, `fallback_used`, and the `publications`/`people` arrays are intact.
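A spot check of the response contract could look like the sketch below. It assumes `matched_locale` and `fallback_used` are top-level keys of the list response and that a response carries `publications` or `people` depending on the endpoint; `checkResponseContract` is a hypothetical name. It only checks that the fields are present and that the list is an array, not what the values are.

```typescript
// Returns a list of contract violations; an empty list means the response looks intact.
function checkResponseContract(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return ["response is not a JSON object"];
  const obj = body as Record<string, unknown>;
  const problems: string[] = [];

  // Locale metadata must be present (null values are allowed by this check).
  for (const key of ["matched_locale", "fallback_used"]) {
    if (!(key in obj)) problems.push(`missing field: ${key}`);
  }

  // ASSUMPTION: each list endpoint returns exactly one of these arrays.
  const listKeys = ["publications", "people"].filter((k) => k in obj);
  if (listKeys.length === 0) problems.push("missing publications/people array");
  for (const k of listKeys) {
    if (!Array.isArray(obj[k])) problems.push(`${k} is not an array`);
  }
  return problems;
}
```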
Reading metrics from KV
Metrics are stored in per-minute buckets at keys of the form:

```
rollout:metrics:locale-search:YYYYMMDD_HHMM
```

Each bucket is a JSON object:

```json
{ "requests": 42, "errors": 0, "total_latency": 18000 }
```

A bucket's average latency is `total_latency / requests` (about 429ms in this example). Buckets have a TTL of 2 minutes; the auto-disable evaluation checks a rolling window of the last 5 minutes.
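To inspect the window by hand, the bucket keys can be generated and the buckets summed as sketched here. Two assumptions: the `YYYYMMDD_HHMM` stamp is in UTC (the runbook does not say), and a bucket that is absent (expired or a minute with no traffic) counts as zero; Workers KV `get` returns `null` for a missing key. `bucketKey`, `windowKeys`, and `aggregate` are hypothetical names.

```typescript
interface MetricsBucket { requests: number; errors: number; total_latency: number }

const PREFIX = "rollout:metrics:locale-search:";
const pad2 = (n: number): string => String(n).padStart(2, "0");

// ASSUMPTION: the YYYYMMDD_HHMM stamp is in UTC.
function bucketKey(d: Date): string {
  const day = `${d.getUTCFullYear()}${pad2(d.getUTCMonth() + 1)}${pad2(d.getUTCDate())}`;
  return `${PREFIX}${day}_${pad2(d.getUTCHours())}${pad2(d.getUTCMinutes())}`;
}

// Keys for the current minute and the (minutes - 1) minutes before it, newest first.
function windowKeys(now: Date, minutes = 5): string[] {
  return Array.from({ length: minutes }, (_, i) => bucketKey(new Date(now.getTime() - i * 60_000)));
}

// Sum the buckets; null entries (absent keys) contribute nothing.
function aggregate(buckets: ReadonlyArray<MetricsBucket | null>): MetricsBucket {
  const total: MetricsBucket = { requests: 0, errors: 0, total_latency: 0 };
  for (const b of buckets) {
    if (b === null) continue;
    total.requests += b.requests;
    total.errors += b.errors;
    total.total_latency += b.total_latency;
  }
  return total;
}
```

Inside a Worker, each key could then be read with `await env.PUB_CACHE.get(key, "json")` before passing the results to `aggregate`.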
Reading from Cloudflare Logs (Workers Tail)
```bash
wrangler tail dashboard-api --format=json 2>/dev/null | \
  jq '.logs[].message[0] | select(type == "object" and .event == "list.publications") | {duration_ms, rollout}'
```

Look for the `rollout` block in structured logs:
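```json
{
  "event": "list.publications",
  "entity_id": "cesam",
  "duration_ms": 142,
  "rollout": {
    "stage": 50,
    "enabled": true,
    "auto_disabled": false,
    "effective_locale": "en",
    "locale_search_active": true
  }
}
```

`locale_search_active: false` means locale search was not applied to that request: either the entity is outside the current cohort, or the feature is disabled or auto-disabled (check `enabled` and `auto_disabled` in the same block).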
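During a soak it helps to compare latency for requests that used locale search against those that did not. The sketch below assumes the jq output was captured with one object per line (jq's `-c` flag) and saved to a file or string; each object has the `{duration_ms, rollout}` shape selected by the filter. `parseSamples` and `summarizeByCohort` are hypothetical names.

```typescript
// Shape of each object emitted by the jq filter: duration_ms plus the rollout block.
interface TailSample {
  duration_ms: number;
  rollout: {
    stage: number;
    enabled: boolean;
    auto_disabled: boolean;
    effective_locale: string;
    locale_search_active: boolean;
  };
}

interface LatencySummary { count: number; avgMs: number }

// Parse one JSON object per non-empty line (compact jq output).
function parseSamples(text: string): TailSample[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TailSample);
}

// Average duration_ms separately for locale-search and plain-search requests.
function summarizeByCohort(samples: readonly TailSample[]): { active: LatencySummary; inactive: LatencySummary } {
  const sums = { active: { count: 0, total: 0 }, inactive: { count: 0, total: 0 } };
  for (const s of samples) {
    const side = s.rollout.locale_search_active ? sums.active : sums.inactive;
    side.count += 1;
    side.total += s.duration_ms;
  }
  const finish = (x: { count: number; total: number }): LatencySummary => ({
    count: x.count,
    avgMs: x.count === 0 ? 0 : x.total / x.count,
  });
  return { active: finish(sums.active), inactive: finish(sums.inactive) };
}
```

A large gap between `active.avgMs` and `inactive.avgMs` points at the locale-aware SQL path rather than general load.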
Rollback Procedure
Manual disable (immediate)
```bash
curl -X PATCH https://dash.legaciti.org/api/site-tools/rollout \
  -H "Cookie: CF_Authorization=<token>" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
```

This sets `enabled: false` globally. All entities fall back to plain `LIKE` search within about 60 seconds,
once the cached config read (KV TTL ~60 s) refreshes; no deployment is needed.
Step down stage
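To reduce the cohort without fully disabling the feature:

```bash
curl -X PATCH .../api/site-tools/rollout \
  -H "Cookie: CF_Authorization=<token>" \
  -H "Content-Type: application/json" \
  -d '{"stage": 10}'
```

Because bucketing is deterministic, lowering the stage removes only the entities whose `djb2(entityId) % 100` is at or above the new stage.

Auto-Disable Recovery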
The system auto-disables locale search when the 5-minute rolling window exceeds:
- Error rate: > 10%
- Average latency: > 2000ms
At least 20 requests must be in the window before the thresholds are evaluated.
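The rule above can be sketched as a pure function over the summed 5-minute window. The constant names and values come from the Thresholds Reference table; the order of the two checks and the `"latency_exceeded"` reason string are assumptions, since only `"error_rate_exceeded"` appears in this runbook. `evaluateAutoDisable` is a hypothetical name.

```typescript
const ERROR_RATE_THRESHOLD = 0.1;
const LATENCY_THRESHOLD_MS = 2000;
const MIN_REQUESTS_FOR_DISABLE = 20;

// Totals summed over the rolling window (METRICS_WINDOW_MINUTES = 5).
interface WindowTotals { requests: number; errors: number; total_latency: number }

// "latency_exceeded" is a hypothetical reason string.
type DisableReason = "error_rate_exceeded" | "latency_exceeded";

// Returns why the feature should be auto-disabled, or null to leave it running.
function evaluateAutoDisable(w: WindowTotals): DisableReason | null {
  // Too little traffic to judge: never trip on a handful of requests.
  if (w.requests < MIN_REQUESTS_FOR_DISABLE) return null;
  // Strictly greater than: exactly 10% errors or exactly 2000ms does not trip.
  if (w.errors / w.requests > ERROR_RATE_THRESHOLD) return "error_rate_exceeded";
  if (w.total_latency / w.requests > LATENCY_THRESHOLD_MS) return "latency_exceeded";
  return null;
}
```

For example, 3 errors in 20 requests (15%) trips the error-rate check, while 19 failing requests do not, because the window is below the 20-request minimum.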
When auto-disabled, GET /api/site-tools/rollout returns:
```json
{
  "auto_disabled": true,
  "disabled_at": 1717000000,
  "disabled_reason": "error_rate_exceeded"
}
```

Recovery steps
- Investigate the root cause (check CF logs for errors, check D1 query plans).
- Fix the underlying issue (schema migration, query rewrite, etc.).
- Clear auto-disable and re-enable:

  ```bash
  curl -X PATCH .../api/site-tools/rollout \
    -H "Cookie: CF_Authorization=<token>" \
    -H "Content-Type: application/json" \
    -d '{"enabled": true}'
  ```

  This resets `auto_disabled` to `false` and re-enables the feature at the current stage.
- Monitor again before advancing.
Local Testing
Run the CI locale-key check:

```bash
node scripts/check-locale-keys.mjs
```

Run the dashboard-api tests:

```bash
pnpm --filter dashboard-api exec vitest run
```

Run only the rollout unit tests:

```bash
pnpm --filter dashboard-api exec vitest run src/lib/rollout.test.ts
```

Thresholds Reference
| Threshold constant | Value |
|---|---|
ERROR_RATE_THRESHOLD | 0.10 |
LATENCY_THRESHOLD_MS | 2000 |
MIN_REQUESTS_FOR_DISABLE | 20 |
METRICS_WINDOW_MINUTES | 5 |