# Plan Folder Template

Use this template when creating or reorganizing a multi-file plan under `plans/{plan-slug}/`.

This version explicitly integrates:
- functional regression tests to protect correctness
- performance regression benchmarks to protect latency, throughput, memory, and cost
- baseline/evidence documents under `kb/`
- rollout performance verification under `rollout/`
## Required folder structure

```text
plans/{plan-slug}/
├── plan-index.md
├── completed/
├── rollout/
├── todo/
│   └── blocked/
└── kb/
```

## Folder rules

- `plan-index.md`: the only Markdown file allowed at the plan root. It is the folder map, roadmap, status source of truth, and quality gate tracker for the plan.
- `todo/`: open, proposed, or in-progress work packets that are ready to execute. New implementation tasks start here.
- `todo/blocked/`: blocked work packets. Move a task packet here as soon as it is blocked by another task, and include explicit blocker metadata in both the blocked file and `plan-index.md`.
- `completed/`: finished implementation or evidence packets. Move a file here only after matching code, migration, verification, regression tests, benchmark evidence where required, docs, and rollout evidence are complete.
- `rollout/`: rollout verification, cutover evidence, release gates, go/no-go packets, production validation notes, and performance validation records.
- `kb/`: knowledge-base/reference docs used to support implementation, including inventories, decisions, manifests, baselines, regression coverage maps, benchmark methodology, and runbook templates that are not themselves task packets.
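For a new plan, the required layout can be scaffolded in one pass. A minimal shell sketch, assuming an example slug of `example-plan` (the slug is a placeholder, not a name mandated by this template):

```shell
#!/bin/sh
# Scaffold the required plan folder layout.
# "example-plan" is an example slug; substitute your real plan slug.
slug="example-plan"

mkdir -p "plans/$slug/completed" \
         "plans/$slug/rollout" \
         "plans/$slug/todo/blocked" \
         "plans/$slug/kb"

# plan-index.md is the only Markdown file allowed at the plan root.
touch "plans/$slug/plan-index.md"

# Show the resulting tree (directories plus the index file).
find "plans/$slug" | sort
```

`mkdir -p` is idempotent, so rerunning the sketch against an existing plan folder is safe.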
## Quality policy

Apply this policy to every plan:
- Every bug fix must include an automated regression test when feasible.
- Every refactor must identify impacted regression coverage.
- Any change to a hot path, query path, event pipeline, batch job, or high-traffic endpoint must assess performance regression risk.
- If performance regression risk exists, capture a baseline before the change and compare after the change using documented thresholds.
- No task is complete until required correctness and performance evidence is linked from the plan.
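The baseline/threshold comparison required by this policy can be automated with POSIX tools. A sketch, where the metric, numbers, and the 10% threshold are illustrative placeholders rather than values this template mandates:

```shell
#!/bin/sh
# Compare a post-change p95 latency against its baseline with a +10%
# threshold. All numbers are illustrative; real values come from the
# benchmark harness and the plan's documented threshold policy.
baseline_ms=120
current_ms=126
max_increase_pct=10

awk -v b="$baseline_ms" -v c="$current_ms" -v t="$max_increase_pct" 'BEGIN {
  limit = b * (1 + t / 100)                # highest acceptable value
  verdict = (c <= limit) ? "PASS" : "FAIL"
  printf "baseline=%dms current=%dms limit=%.0fms %s\n", b, c, limit, verdict
}'
# Prints: baseline=120ms current=126ms limit=132ms PASS
```

Wiring a check like this into CI turns the "documented thresholds" requirement into an enforced gate rather than a manual review step.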
## Linking rules

- Use relative Markdown links from the current file to the target file.
- After moving files between folders, update links in every Markdown file that points to the moved file.
- Link to `plan-index.md` as the entrypoint from parent catalogues, not to a task packet.
- Avoid stale names such as `README.md` or legacy index filenames when `plan-index.md` is the plan entrypoint.
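One way to honor the second rule is a repository-wide grep after every move. A sketch, where the packet filename is an example:

```shell
#!/bin/sh
# After moving a packet, list every Markdown file that still mentions it;
# each hit is a relative link that may need updating. The filename below
# is an example, not a required name.
moved="PLAN-ID-002-implementation.md"

grep -rl --include='*.md' "$moved" plans/ 2>/dev/null || echo "no stale links"
```

Running this before committing a move makes broken relative links visible immediately instead of at review time.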
## `plan-index.md` Required Content

Start every plan folder with this root file:
# PLAN-ID Plan Index
Status: Proposed
Owner: Team/Person
Date: YYYY-MM-DD
Target window: YYYY-MM-DD to YYYY-MM-DD
Related issue(s): #123, #456
Related docs: docs/... , plans/...
This index maps the plan folder structure, keeps the roadmap checklist, and tracks correctness/performance evidence. Mark tasks with `[x]` only after the matching code, migration, regression test evidence, benchmark evidence when required, verification, and rollout evidence exist.
## Plan quality gates
- [ ] Functional regression scope identified
- [ ] Performance regression risk assessed
- [ ] Baseline required? If yes, linked under `kb/`
- [ ] Rollout verification required? If yes, linked under `rollout/`
## Folder Map
| Folder | Purpose | Files |
| :-------------------------------- | :---------------------------------------------------- | ----: |
| `plans/{plan-slug}/` | Plan entrypoint and roadmap checklist. | 1 |
| `plans/{plan-slug}/completed/` | Completed implementation/evidence packets. | 0 |
| `plans/{plan-slug}/rollout/` | Rollout verification and release/perf gates. | 0 |
| `plans/{plan-slug}/todo/` | Open or in-progress packets ready to execute. | 0 |
| `plans/{plan-slug}/todo/blocked/` | Blocked work packets with blocker references. | 0 |
| `plans/{plan-slug}/kb/` | References, inventories, baselines, and support docs. | 0 |
### Root
- [plan-index.md](./plan-index.md) - Plan folder index and roadmap checklist.
### Completed
Completed implementation/evidence packets.
- None yet.
### Rollout
Rollout verification, release gates, and performance validation.
- [PLAN-ID-performance-gate.md](./rollout/PLAN-ID-performance-gate.md) - Functional and performance release gate
- [PLAN-ID-production-validation.md](./rollout/PLAN-ID-production-validation.md) - Post-deploy validation and monitoring notes
### Todo
Open or in-progress work packets.
- [PLAN-ID-001-regression-foundation.md](./todo/PLAN-ID-001-regression-foundation.md) - Reproduce bug and define regression coverage
- [PLAN-ID-002-implementation.md](./todo/PLAN-ID-002-implementation.md) - Implement fix/refactor
- [PLAN-ID-003-performance-validation.md](./todo/PLAN-ID-003-performance-validation.md) - Benchmark baseline and comparison
- [PLAN-ID-004-validation-cutover.md](./todo/PLAN-ID-004-validation-cutover.md) - Validation and cutover
### Blocked
Blocked work packets. Each item must identify the blocking task.
- [PLAN-ID-005-optional-followup.md](./todo/blocked/PLAN-ID-005-optional-followup.md) - Optional follow-up optimization (blocked by [PLAN-ID-003-performance-validation.md](./todo/PLAN-ID-003-performance-validation.md))
### KB
Knowledge-base references and implementation support.
- [PLAN-ID-inventory.md](./kb/PLAN-ID-inventory.md) - Inventory and baseline reference
- [PLAN-ID-regression-test-inventory.md](./kb/PLAN-ID-regression-test-inventory.md) - Protected behaviors and bug reproductions
- [PLAN-ID-performance-baseline.md](./kb/PLAN-ID-performance-baseline.md) - Benchmark environment, baselines, thresholds
- [PLAN-ID-benchmark-methodology.md](./kb/PLAN-ID-benchmark-methodology.md) - Commands, datasets, warmup rules, variance policy
## Roadmap
- [ ] PLAN-ID-001: Reproduce issue and add regression tests
- [ ] PLAN-ID-002: Implement fix/refactor
- [ ] PLAN-ID-003: Capture baseline and compare post-change performance
- [ ] PLAN-ID-004: Validation and cutover
## Status Board
| Packet | Folder | Owner | Status | Blocked by | Last update |
| :---------- | :-------------- | :---------- | :------ | :---------------------------------------------------------- | :---------- |
| PLAN-ID-001 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| PLAN-ID-002 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| PLAN-ID-003 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| PLAN-ID-004 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| PLAN-ID-005 | `todo/blocked/` | Team/Person | Blocked | [PLAN-ID-003](./todo/PLAN-ID-003-performance-validation.md) | YYYY-MM-DD |
Status values:
- `Todo`
- `In progress`
- `Blocked`
- `Done`
## Regression coverage summary
| Behavior / bug / workflow | Test level | Status | Packet | Evidence |
| :------------------------ | :---------- | :----- | :---------- | :---------- |
| Example bug reproduction | Unit | Todo | PLAN-ID-001 | `tests/...` |
| Critical API workflow | Integration | Todo | PLAN-ID-002 | `tests/...` |
| Main user journey | E2E | Todo | PLAN-ID-004 | `tests/...` |
## Performance gate summary
| Path / job / endpoint | Baseline doc | Metric | Threshold | Status |
| :-------------------- | :---------------------------------------------------------------------- | :---------- | :-------- | :----- |
| `/api/search` | [PLAN-ID-performance-baseline.md](./kb/PLAN-ID-performance-baseline.md) | p95 latency | +10% max | Todo |
| `event-consumer` | [PLAN-ID-performance-baseline.md](./kb/PLAN-ID-performance-baseline.md) | throughput | -5% max | Todo |
## Progress Log
- YYYY-MM-DD: Created plan folder.
- YYYY-MM-DD: Added regression and performance validation scope.

## `todo/PLAN-ID-###-task-name.md` Template

Use this template for active task packets under `todo/` or `todo/blocked/`.
# PLAN-ID-### — Plan Title
Status: Proposed
Blocked by: None
Owner: Team/Person
Date: YYYY-MM-DD
Target window: YYYY-MM-DD to YYYY-MM-DD
Related issue(s): #123, #456
Related docs: docs/... , plans/...
Parent index: [plan-index.md](../plan-index.md)
## Engineering defaults
- Default to Test Driven Development (TDD): write/adjust failing tests first, implement second, refactor third.
- Treat this as an event-driven application: model changes and workflows around events, event contracts, and clear producers/consumers.
- Always evaluate whether responsibilities should be decoupled into dedicated Cloudflare Workers using Service Bindings.
- If decomposition is rejected, document why (cost, complexity, latency, ownership, operational overhead).
- Always include GlitchTip and Cloudflare Observability integration in implementation and rollout scope.
- Every bug fix must include a regression test when feasible.
- Every refactor must assess whether performance regression benchmarks are required.
## 1. Context
Briefly describe:
- Why this plan exists
- What changed or what problem was observed
- Why now
- Whether the change is primarily a bug fix, refactor, optimization, migration, or rollout task
## 2. Goals
- Goal 1
- Goal 2
- Goal 3
## 3. Non-goals
- Non-goal 1
- Non-goal 2
## 4. Constraints and assumptions
- Constraint 1
- Constraint 2
- Assumption 1
- Assumption 2
## 5. Scope
In scope:
- Item 1
- Item 2
Out of scope:
- Item 1
- Item 2
## 6. Target architecture or design
Describe the intended end state.
Cloudflare decomposition decision:
- Should any part be split into one or more Workers?
- If yes, list proposed worker boundaries and Service Bindings.
- If no, justify why decomposition is not worth it.
Optional diagram:
```text
Current -> Intermediate -> Target
```
## 7. Packet map
- TKT-001: Reproduce issue and add regression coverage
- TKT-002: Core implementation
- TKT-003: Performance validation
- TKT-004: Validation and cutover
## 7.1 Packet status board
Use this table as the live execution board for the plan.
| Ticket | Folder | Owner | Status | Blocked by | Last update |
| :------ | :-------------- | :---------- | :------ | :---------------------------------------------- | :---------- |
| TKT-001 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| TKT-002 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| TKT-003 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| TKT-004 | `todo/` | Team/Person | Todo | - | YYYY-MM-DD |
| TKT-005 | `todo/blocked/` | Team/Person | Blocked | [TKT-003](./TKT-003-performance-validation.md) | YYYY-MM-DD |
Status values:
- `Todo`
- `In progress`
- `Blocked`
- `Done`
## 7.2 Progress update log
- YYYY-MM-DD: Started TKT-001, defined scope and failing test cases
- YYYY-MM-DD: TKT-001 moved to In progress
- YYYY-MM-DD: TKT-001 marked Done; TKT-002 started
## 8. Regression test scope
Behavior protected by regression tests:
- Behavior 1
- Behavior 2
- Behavior 3
Required regression coverage:
- [ ] Reproduce the original bug with a failing automated test
- [ ] Cover the expected happy path
- [ ] Cover edge cases related to the bug/refactor
- [ ] Cover adjacent behavior likely to break from this change
- [ ] Link the resulting tests in `kb/PLAN-ID-regression-test-inventory.md`
Test levels in scope:
- [ ] Unit
- [ ] Integration
- [ ] End-to-end
- [ ] Contract/event tests where applicable
## 9. Performance benchmark scope
Performance regression assessment:
- [ ] No benchmark required for this packet
- [ ] Benchmark required for this packet
Why:
- Reason 1
- Reason 2
Critical paths to benchmark:
- Path 1
- Path 2
Metrics to capture:
- [ ] p50 latency
- [ ] p95 latency
- [ ] p99 latency if applicable
- [ ] Throughput
- [ ] CPU time
- [ ] Memory usage
- [ ] Queue lag / job duration
- [ ] Cost-related metric if relevant
Baseline and thresholds:
- Baseline source: `../kb/PLAN-ID-performance-baseline.md`
- Methodology source: `../kb/PLAN-ID-benchmark-methodology.md`
- Regression threshold: define acceptable variance, for example:
  - p95 latency must not increase by more than 10%
  - throughput must not drop by more than 5%
  - memory must not increase by more than 5%
- If threshold is exceeded, document approval or rollback decision
Benchmark evidence:
- [ ] Baseline captured
- [ ] Post-change benchmark captured
- [ ] Results compared in equivalent environment
- [ ] Variance/noise explained if relevant
- [ ] Benchmark result linked in rollout packet
## 10. Execution plan
---
## TKT-001 — Reproduce issue and add regression coverage
Outcome:
- The issue is reproducible in automated tests and the expected behavior is documented.
Implementation tasks:
- [ ] TDD: add failing tests for this ticket scope first
- [ ] Reproduce the bug in a focused unit or integration test
- [ ] Add edge case coverage for known variants
- [ ] Document protected behaviors in regression inventory
- [ ] TDD: refactor after tests pass and keep behavior stable
Acceptance criteria:
- [ ] The original bug is reproduced by at least one failing test before the fix
- [ ] Regression tests pass after the fix
- [ ] Regression inventory is updated
Risks:
- The bug may depend on production-only state or timing
- Reproduction may require fixtures or seeded data
Rollback:
- Revert test harness changes if they create instability
- Keep issue documented if automation is not yet feasible
---
## TKT-002 — Core implementation
Outcome:
- The bug is fixed or the refactor is complete without changing intended behavior.
Implementation tasks:
- [ ] TDD: add or update failing tests for implementation scope first
- [ ] Implement the fix/refactor
- [ ] Preserve or improve event contracts and error handling
- [ ] Add telemetry/logging for changed surfaces
- [ ] Verify GlitchTip and Cloudflare Observability coverage
- [ ] TDD: refactor after tests pass and keep behavior stable
Acceptance criteria:
- [ ] All impacted regression tests pass
- [ ] No intended behavior change is left undocumented
- [ ] Observability coverage exists for changed paths
Risks:
- Behavior may shift subtly during refactor
- Adjacent systems may rely on undocumented behavior
Rollback:
- Revert implementation changes
- Re-enable previous path with feature flag if applicable
---
## TKT-003 — Performance validation
Outcome:
- Performance baseline and post-change comparison are documented for affected critical paths.
Implementation tasks:
- [ ] Determine whether benchmark coverage is required
- [ ] Create or update benchmark harness if missing
- [ ] Capture pre-change baseline using documented methodology
- [ ] Capture post-change benchmark using the same methodology
- [ ] Record results, thresholds, and pass/fail decision
- [ ] Document any approved regression or follow-up task
Acceptance criteria:
- [ ] Benchmark methodology is documented
- [ ] Before/after comparison exists for each critical path in scope
- [ ] Results meet thresholds or have explicit approval and follow-up
- [ ] Rollout performance gate references benchmark evidence
Risks:
- Benchmark noise may produce misleading results
- Environment drift may invalidate comparisons
Rollback:
- Re-run in controlled environment
- Defer rollout if results are inconclusive
- Revert implementation if thresholds are exceeded
---
## TKT-004 — Validation and cutover
Outcome:
- The change is validated in staging/pre-prod and production rollout is safe to proceed.
Implementation tasks:
- [ ] Run smoke tests
- [ ] Run targeted regression suite
- [ ] Run release/performance gates
- [ ] Validate dashboards, alerts, traces, and error monitoring
- [ ] Complete rollout packet and go/no-go decision
- [ ] Monitor post-deploy and document findings
Acceptance criteria:
- [ ] Functional regression suite passes
- [ ] Performance gate passes or accepted exception is documented
- [ ] Rollout packet includes commands, evidence, and final decision
- [ ] Production validation steps are documented
Risks:
- Real traffic may differ from staging assumptions
- Canary may hide lower-frequency regressions
Rollback:
- Revert deployment
- Disable feature flag
- Restore previous Worker binding/config if relevant
- Re-run production validation after rollback
## 11. Cross-cutting checklist
- [ ] TDD evidence captured (failing tests -> passing tests -> refactor)
- [ ] Functional regression tests added for bug fixes and touched critical behavior
- [ ] Regression test inventory updated for new protected behavior
- [ ] Performance benchmark requirement assessed
- [ ] Benchmark methodology documented or updated
- [ ] Performance baseline captured for affected hot paths
- [ ] Post-change benchmark comparison recorded
- [ ] Performance regression thresholds documented
- [ ] Worker decoupling assessment documented
- [ ] Service Bindings plan documented when decomposition is selected
- [ ] Event contracts documented (producer, consumer, payload shape, failure/retry behavior)
- [ ] GlitchTip integration added or verified for changed surfaces
- [ ] Cloudflare Observability integration added or verified (logs/traces/metrics)
- [ ] Telemetry/logging added
- [ ] Alerts/SLOs updated
- [ ] Security review completed
- [ ] Docs updated
- [ ] Runbooks updated
- [ ] Tests added or updated
- [ ] Unused code removed when refactoring (dead files, stale exports, obsolete paths)
## 12. Test and validation strategy
TDD workflow:
- [ ] Red: add tests that fail for the intended behavior
- [ ] Green: implement minimal code to pass tests
- [ ] Refactor: improve structure while preserving passing tests
Functional regression validation:
- [ ] Add regression tests for reproduced bugs
- [ ] Run affected unit tests
- [ ] Run affected integration tests
- [ ] Run critical workflow regression suite
- [ ] Run contract/event tests where applicable
Performance regression validation:
- [ ] Identify affected hot paths/endpoints/jobs
- [ ] Capture pre-change baseline
- [ ] Run post-change benchmark in a comparable environment
- [ ] Compare against threshold/budget
- [ ] Document deviations and approval if threshold is exceeded
Local validation:
- [ ] Unit tests
- [ ] Integration tests
- [ ] Type-check and lint
- [ ] Focused benchmark sanity check when stable enough
Staging or pre-prod validation:
- [ ] Smoke tests
- [ ] Functional regression tests
- [ ] Performance regression benchmarks for critical paths
- [ ] Dashboard and trace verification
Production validation:
- [ ] Health checks
- [ ] Real traffic verification
- [ ] Error/latency monitoring
- [ ] Benchmark-related SLI/SLO review
- [ ] Post-deploy comparison against baseline
## 13. Deployment plan
Deployment order:
1. Step 1
2. Step 2
3. Step 3
Operational notes:
- Note 1
- Note 2
- Identify whether rollout uses feature flags, canary, shadow traffic, or direct cutover
Release gates:
- [ ] Functional regression suite passed
- [ ] Performance gate passed or exception approved
- [ ] Monitoring links verified
- [ ] Rollback path tested or confirmed
## 14. Rollback plan
Trigger conditions:
- Condition 1
- Condition 2
- Performance trigger, for example: p95 latency regression > 15%
- Stability trigger, for example: error rate increases above threshold
Rollback steps:
1. Step 1
2. Step 2
3. Step 3
Post-rollback validation:
- [ ] Service health restored
- [ ] Key regression tests re-run if applicable
- [ ] Performance returns to baseline or accepted range
- [ ] Incident/follow-up logged
## 15. Success metrics
- [ ] Reproduced bug is covered by automated regression tests
- [ ] No failing tests in impacted regression suite
- [ ] Metric 1 with threshold
- [ ] Metric 2 with threshold
- [ ] Example: `/api/search` p95 latency does not regress by more than 10%
- [ ] Example: batch job memory usage remains within baseline + 5%
- [ ] Example: error rate remains below 1%
## 16. Open questions
- [ ] Question 1
- [ ] Question 2
- [ ] Is benchmark noise low enough to use this result as a release gate?
- [ ] Are current dashboards sufficient to detect post-release regressions?
## 17. Change log
- YYYY-MM-DD: Created plan
- YYYY-MM-DD: Updated scope
- YYYY-MM-DD: Added regression coverage requirements
- YYYY-MM-DD: Added performance validation requirements
- YYYY-MM-DD: Completed TKT-001

## `rollout/PLAN-ID-rollout-name.md` Minimum Content

Use rollout files for verification packets, release gates, production cutover evidence, rollback notes, and performance validation.
# PLAN-ID Rollout / Release Gate

Status: Proposed
Owner: Team/Person
Date: YYYY-MM-DD
Related issue(s): #123, #456
Related docs: docs/... , plans/...
Parent index: [plan-index.md](../plan-index.md)
## Scope of rollout

Describe what is being rolled out and which services/endpoints/jobs/workflows are affected.
## Preconditions and go/no-go checks

- Code merged and deployed to target environment
- Functional regression suite passed
- Required benchmark baseline exists
- Post-change performance comparison exists
- Observability dashboards available
- Alerts/SLOs ready
- Rollback steps confirmed
## Commands run and results

```text
npm run lint
npm run typecheck
npm run test
npm run test:regression
npm run bench:critical-path
```

Results summary:
- Command 1: pass/fail
- Command 2: pass/fail
- Regression suite result:
- Benchmark result:
## Monitoring links or evidence locations

- Dashboard:
- Trace search:
- GlitchTip issue search:
- CI artifact:
- Benchmark report:
- Logs:
## Performance comparison

| Path / metric | Baseline | Current | Threshold | Result |
| :--- | ---: | ---: | ---: | :--- |
| `/api/search` p95 | 120ms | 126ms | +10% | Pass |
| `worker-consumer` throughput | 500 msg/s | 480 msg/s | -5% | Pass |
## Rollback trigger and rollback steps

Trigger:
- Error rate threshold exceeded
- Latency threshold exceeded
- Throughput below threshold
- Functional regression observed in production
Rollback steps:
- Step 1
- Step 2
- Step 3
## Final decision and follow-up tasks

Decision:
- Go / No-go / Go with exception
Follow-up tasks:
- Task 1
- Task 2
## `kb/PLAN-ID-reference-name.md` Minimum Content

Use KB files for implementation support docs that are not active task packets.
### Recommended KB files

Typical files for bug-fix/refactor plans:
- `kb/PLAN-ID-inventory.md`
- `kb/PLAN-ID-regression-test-inventory.md`
- `kb/PLAN-ID-performance-baseline.md`
- `kb/PLAN-ID-benchmark-methodology.md`
### Generic KB template

# PLAN-ID Reference Name
Parent index: [plan-index.md](../plan-index.md)
## Purpose
Describe why this reference exists.
## Source of truth or data sources
- Source 1
- Source 2
## Key decisions or findings
- Decision 1
- Decision 2
## How to regenerate or verify
1. Step 1
2. Step 2
3. Step 3
## Consumers or dependent task packets
- `../todo/PLAN-ID-001-some-task.md`
- `../rollout/PLAN-ID-some-rollout.md`

## `kb/PLAN-ID-regression-test-inventory.md` Suggested Content

# PLAN-ID Regression Test Inventory
Parent index: [plan-index.md](../plan-index.md)
## Purpose
Track the behaviors, bug reproductions, and workflows protected by automated regression tests.
## Source of truth or data sources
- Unit test suite
- Integration test suite
- End-to-end suite
- Linked issues/bug reports
## Key decisions or findings
- Every fixed bug should have a reproducing automated test where feasible
- Critical workflows require at least integration-level coverage
- Refactors must preserve documented protected behaviors
## Protected scenarios
| Scenario | Test level | Location | Related issue | Status |
| :----------------- | :---------- | :---------- | :------------ | :------ |
| Example bug case | Unit | `tests/...` | #123 | Planned |
| Checkout edge case | Integration | `tests/...` | #456 | Planned |
## How to regenerate or verify
- Run targeted regression suite
- Confirm issue-to-test linkage is still valid
## Consumers or dependent task packets
- `../todo/PLAN-ID-001-regression-foundation.md`
- `../todo/PLAN-ID-002-implementation.md`

## `kb/PLAN-ID-performance-baseline.md` Suggested Content

# PLAN-ID Performance Baseline
Parent index: [plan-index.md](../plan-index.md)
## Purpose
Document baseline performance for critical paths affected by this plan.
## Source of truth or data sources
- Local benchmark harness
- CI benchmark artifacts
- Staging metrics
- Cloudflare Observability dashboards
## Key decisions or findings
- Compare like-for-like environments only
- Use repeated runs, not a single execution
- Prefer p50/p95 or throughput over anecdotal timings
- Record warmup rules and dataset size
## Baseline metrics
| Path | Metric | Baseline | Threshold |
| :--------------- | :---------- | --------: | --------: |
| `/api/search` | p95 latency | 120ms | +10% max |
| `/api/checkout` | p95 latency | 240ms | +10% max |
| `event-consumer` | throughput | 500 msg/s | -5% max |
| `batch-job` | max memory | 300MB | +5% max |
## How to regenerate or verify
1. Seed dataset
2. Run benchmark commands from methodology doc
3. Save results to CI artifact or report
4. Update rollout packet with comparison
## Consumers or dependent task packets
- `../todo/PLAN-ID-003-performance-validation.md`
- `../rollout/PLAN-ID-performance-gate.md`

## `kb/PLAN-ID-benchmark-methodology.md` Suggested Content

# PLAN-ID Benchmark Methodology
Parent index: [plan-index.md](../plan-index.md)
## Purpose
Define how benchmarks must be run so before/after comparisons are valid.
## Source of truth or data sources
- Benchmark scripts
- Seed data definitions
- CI job configuration
- Runtime/environment settings
## Key decisions or findings
- Number of warmup runs:
- Number of measured runs:
- Dataset size/profile:
- Environment parity requirements:
- Allowed variance/noise window:
- Metrics to report:
## How to regenerate or verify
1. Prepare identical environment
2. Seed benchmark dataset
3. Run documented commands
4. Store raw output and summary artifact
5. Compare against baseline thresholds
## Consumers or dependent task packets
- `../todo/PLAN-ID-003-performance-validation.md`
- `../rollout/PLAN-ID-performance-gate.md`

## Completion Rules

### When a task becomes blocked

- Move the task packet from `todo/` to `todo/blocked/`.
- Set `Status: Blocked` in the packet.
- Set `Blocked by:` with the blocking ticket ID and link.
- Update `plan-index.md` folder map counts, Blocked section, and status board blocker column.
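The move itself is best done with git so the packet's history follows it into `todo/blocked/`. A self-contained sketch that builds a throwaway demo repo first (the repo, plan slug, and packet name are all examples):

```shell
#!/bin/sh
set -e
# Demo fixture: a tiny repo with one packet ("demo", "example-plan",
# and the packet filename are examples, not required names).
git init -q demo
mkdir -p demo/plans/example-plan/todo/blocked
echo '# PLAN-ID-002' > demo/plans/example-plan/todo/PLAN-ID-002-implementation.md
git -C demo add -A
git -C demo -c user.email=ci@example.com -c user.name=ci commit -qm 'add packet'

# The actual move: git mv records the rename so history follows the file.
git -C demo mv plans/example-plan/todo/PLAN-ID-002-implementation.md \
               plans/example-plan/todo/blocked/PLAN-ID-002-implementation.md

git -C demo status --short
```

The `Status:`, `Blocked by:`, and `plan-index.md` updates listed above still have to be made by hand before committing the move.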
### Before moving a file from `todo/` to `completed/`

- All acceptance criteria are checked.
- Tests, lint, type-check, and relevant validation commands passed.
- Functional regression tests were added or updated when required.
- Regression inventory was updated when protected behavior changed.
- Performance benchmark requirement was explicitly assessed.
- Baseline and post-change benchmark evidence exist when benchmarking was required.
- Rollout/evidence files are added under `rollout/` when the task affected production behavior.
- Supporting references are moved or linked under `kb/` when they are no longer task packets.
- `plan-index.md` folder counts, file lists, roadmap, status board, evidence matrix, and links are updated.
- Links to the moved file are updated across Markdown files.
- `git diff --check` passes.
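`git diff --check` flags whitespace errors (trailing spaces, space-before-tab) in changed lines and exits non-zero when it finds any. A self-contained demonstration in a throwaway repo; the directory name and file contents are examples:

```shell
#!/bin/sh
set -e
# Build a throwaway repo ("gate-demo" is an example name) with a clean
# committed baseline.
git init -q gate-demo
printf 'clean line\n' > gate-demo/plan-index.md
git -C gate-demo add -A
git -C gate-demo -c user.email=ci@example.com -c user.name=ci commit -qm 'baseline'

# Introduce a trailing space; git diff --check reports it and exits non-zero.
printf 'dirty line \n' > gate-demo/plan-index.md
if git -C gate-demo diff --check; then
  echo "gate passed"
else
  echo "gate failed: fix whitespace before moving the packet"
fi
```

Because the check exits non-zero on any whitespace error, it slots directly into a pre-completion script or CI step.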
### Completion gate policy

A task is not done unless all applicable gates are satisfied:
- Correctness gate:
  - Regression tests exist and pass for the changed behavior.
- Performance gate:
  - Benchmarks were assessed, and where required, baseline and post-change comparisons are documented and within threshold or explicitly approved.
- Rollout gate:
  - Validation evidence, monitoring links, and rollback steps are recorded.