Locale-Search Rollout Runbook

This runbook covers the staged rollout, monitoring, rollback, and auto-disable recovery for the locale-aware search feature (ticket F3 / E2–E3 locale search projections).


Locale-aware search is gated by a KV-backed cohort percentage stored under rollout:locale-search in PUB_CACHE. The stage value controls what percentage of entity workspaces receive locale-aware SQL (search_projections + json_extract fallback).

Default: stage = 0 (disabled for all entities).


All rollout management goes through:

GET /api/site-tools/rollout
PATCH /api/site-tools/rollout

Both endpoints require a valid Cloudflare Access session (same as all /api/ routes).

curl -s https://dash.legaciti.org/api/site-tools/rollout \
-H "Cookie: CF_Authorization=<token>"

Response:

{
"feature": "locale-search",
"stage": 0,
"enabled": true,
"auto_disabled": false,
"disabled_at": null,
"disabled_reason": null,
"updated_at": 0,
"updated_by": null
}
Set the stage (for example, to 10%):
curl -s -X PATCH https://dash.legaciti.org/api/site-tools/rollout \
-H "Cookie: CF_Authorization=<token>" \
-H "Content-Type: application/json" \
-d '{"stage": 10}'
Disable or re-enable without changing the stage:
# Disable without changing stage
curl -X PATCH .../api/site-tools/rollout \
-H "Content-Type: application/json" \
-d '{"enabled": false}'
# Re-enable after investigation
curl -X PATCH .../api/site-tools/rollout \
-H "Content-Type: application/json" \
-d '{"enabled": true}'

Advance through cohort percentages with a soak period at each step.

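| Step | Stage | Cohort size       | Minimum soak |
|------|-------|-------------------|--------------|
| 0    | 0     | disabled          |              |
| 1    | 10    | ~10% of entities  | 24 hours     |
| 2    | 25    | ~25%              | 48 hours     |
| 3    | 50    | ~50%              | 48 hours     |
| 4    | 100   | all               |              |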

Cohort bucketing is deterministic: djb2(entityId) % 100 < stage. Increasing the stage always adds new entities — entities already in the cohort are never removed.
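A minimal sketch of the bucketing rule above, assuming the classic djb2 variant (seed 5381, hash * 33 + char code, truncated to an unsigned 32-bit integer). The runbook only states djb2(entityId) % 100 < stage; the helper name inCohort and the exact hash variant are assumptions, so buckets computed here may not match production.

```typescript
// Classic djb2 string hash. Assumption: the real implementation may use a
// different seed or the XOR variant; this is for illustration only.
function djb2(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    // (hash << 5) + hash equals hash * 33; ">>> 0" keeps an unsigned 32-bit value.
    hash = (((hash << 5) + hash) + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Hypothetical helper: an entity is in the cohort when its bucket (0..99)
// is strictly below the current stage.
function inCohort(entityId: string, stage: number): boolean {
  return djb2(entityId) % 100 < stage;
}
```

Because an entity's bucket never changes, raising the stage only widens the comparison: any entity with inCohort(id, 10) true also has inCohort(id, 25) true.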

  1. Pre-check: Confirm baseline error rate is normal.
  2. Advance: PATCH /api/site-tools/rollout { "stage": <next> }.
  3. Validate (see Monitoring section below).
  4. Hold for the minimum soak period before advancing again.
  5. Repeat until stage: 100.
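The procedure above can be sketched as a small soak gate. This is an illustrative helper, not part of the documented API: STAGE_PLAN and nextStage are hypothetical names, timestamps are assumed to be Unix seconds, and stages with no listed soak (0 and 100) are treated as having none.

```typescript
// Stage ladder and minimum soak periods taken from the rollout table.
// Assumption: steps without a listed soak period are treated as 0 hours.
const STAGE_PLAN: { stage: number; minSoakHours: number }[] = [
  { stage: 0, minSoakHours: 0 },
  { stage: 10, minSoakHours: 24 },
  { stage: 25, minSoakHours: 48 },
  { stage: 50, minSoakHours: 48 },
  { stage: 100, minSoakHours: 0 },
];

// Returns the next stage to PATCH, or null if the soak period has not
// elapsed, the rollout is already at 100, or the stage is not on the ladder.
function nextStage(current: number, updatedAtSec: number, nowSec: number): number | null {
  const index = STAGE_PLAN.findIndex((step) => step.stage === current);
  if (index === -1 || index === STAGE_PLAN.length - 1) return null;
  const soakSec = STAGE_PLAN[index].minSoakHours * 3600;
  if (nowSec - updatedAtSec < soakSec) return null;
  return STAGE_PLAN[index + 1].stage;
}
```

A gate like this only covers the soak timer; the pre-check and validation steps still need a human or separate monitoring.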

After each stage advance, monitor these signals for at least 30 minutes:

  • Error rate: should stay below 5% (auto-disable triggers at 10%).
  • Average latency: should stay below 1500ms (auto-disable triggers at 2000ms).
  • Response contract: matched_locale, fallback_used, publications/people arrays intact.

Metrics are stored in per-minute buckets at keys: rollout:metrics:locale-search:YYYYMMDD_HHMM

Each bucket is a JSON object:

{ "requests": 42, "errors": 0, "total_latency": 18000 }

TTL: 2 minutes. Auto-disable evaluation checks a rolling window of the last 5 minutes.
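For reference, the bucket key format can be sketched as follows. The runbook does not say which time zone the YYYYMMDD_HHMM stamp uses, so UTC is an assumption here, and bucketKey and windowKeys are hypothetical helper names.

```typescript
const METRICS_PREFIX = "rollout:metrics:locale-search:";

function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

// Builds the per-minute bucket key for a timestamp.
// Assumption: the stamp is in UTC; the source does not specify a time zone.
function bucketKey(date: Date): string {
  const ymd = `${date.getUTCFullYear()}${pad2(date.getUTCMonth() + 1)}${pad2(date.getUTCDate())}`;
  const hm = `${pad2(date.getUTCHours())}${pad2(date.getUTCMinutes())}`;
  return `${METRICS_PREFIX}${ymd}_${hm}`;
}

// Lists the keys covering the last `minutes` minutes, newest first.
function windowKeys(now: Date, minutes: number = 5): string[] {
  const keys: string[] = [];
  for (let i = 0; i < minutes; i++) {
    keys.push(bucketKey(new Date(now.getTime() - i * 60000)));
  }
  return keys;
}
```

Reading each of these keys from PUB_CACHE and parsing the JSON would give the buckets that make up the rolling window.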

Reading from Cloudflare Logs (Workers Tail)

# Keep only list.publications log entries and print their duration and rollout block
wrangler tail dashboard-api --format=json 2>/dev/null | \
jq '.logs[].message[0]
| select(type == "object" and .event == "list.publications")
| {duration_ms, rollout}'

Look for the rollout block in structured logs:

{
"event": "list.publications",
"entity_id": "cesam",
"duration_ms": 142,
"rollout": {
"stage": 50,
"enabled": true,
"auto_disabled": false,
"effective_locale": "en",
"locale_search_active": true
}
}

locale_search_active: false means the entity was outside the current cohort.


curl -X PATCH https://dash.legaciti.org/api/site-tools/rollout \
-H "Cookie: CF_Authorization=<token>" \
-H "Content-Type: application/json" \
-d '{"enabled": false}'

This sets enabled: false globally. All entities fall back to plain LIKE search within about 60 s, once the cached config read expires (KV TTL ~60 s). No deployment is needed.

To reduce cohort without fully disabling:

curl -X PATCH .../api/site-tools/rollout \
-H "Content-Type: application/json" \
-d '{"stage": 10}'

The system auto-disables locale search when the 5-minute rolling window exceeds:

  • Error rate: > 10%
  • Average latency: > 2000ms

Minimum 20 requests must be in the window before thresholds are evaluated.
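The evaluation described above might look roughly like this. It is a sketch under assumptions, not the actual rollout module: evaluateWindow is a hypothetical name, total_latency is assumed to be in milliseconds, and "latency_exceeded" is an assumed reason string (the runbook only shows "error_rate_exceeded").

```typescript
interface Bucket {
  requests: number;
  errors: number;
  total_latency: number; // assumed to be milliseconds, summed over requests
}

const ERROR_RATE_THRESHOLD = 0.10;
const LATENCY_THRESHOLD_MS = 2000;
const MIN_REQUESTS_FOR_DISABLE = 20;

// "latency_exceeded" is a hypothetical reason value; only
// "error_rate_exceeded" appears in the runbook.
type DisableReason = "error_rate_exceeded" | "latency_exceeded";

// Sums the per-minute buckets in the window and returns a disable reason,
// or null when the window is healthy or has too few requests to judge.
function evaluateWindow(buckets: Bucket[]): DisableReason | null {
  let requests = 0;
  let errors = 0;
  let totalLatency = 0;
  for (const b of buckets) {
    requests += b.requests;
    errors += b.errors;
    totalLatency += b.total_latency;
  }
  if (requests < MIN_REQUESTS_FOR_DISABLE) return null;
  if (errors / requests > ERROR_RATE_THRESHOLD) return "error_rate_exceeded";
  if (totalLatency / requests > LATENCY_THRESHOLD_MS) return "latency_exceeded";
  return null;
}
```

Both comparisons are strict, matching the "> 10%" and "> 2000ms" wording: a window sitting exactly on a threshold does not trigger a disable in this sketch.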

When auto-disabled, GET /api/site-tools/rollout returns:

{
"auto_disabled": true,
"disabled_at": 1717000000,
"disabled_reason": "error_rate_exceeded"
}
To recover from an auto-disable:

  1. Investigate the root cause (check CF logs for errors, check D1 query plans).
  2. Fix the underlying issue (schema migration, query rewrite, etc.).
  3. Clear auto-disable and re-enable:
    curl -X PATCH .../api/site-tools/rollout \
    -H "Content-Type: application/json" \
    -d '{"enabled": true}'
    This resets auto_disabled to false and re-enables the feature at the current stage.
  4. Monitor again before advancing.

Run the CI locale-key check:

node scripts/check-locale-keys.mjs

Run dashboard-api tests:

pnpm --filter dashboard-api exec vitest run

Check rollout unit tests specifically:

pnpm --filter dashboard-api exec vitest run src/lib/rollout.test.ts

| Threshold constant       | Value |
|--------------------------|-------|
| ERROR_RATE_THRESHOLD     | 0.10  |
| LATENCY_THRESHOLD_MS     | 2000  |
| MIN_REQUESTS_FOR_DISABLE | 20    |
| METRICS_WINDOW_MINUTES   | 5     |