Resync Unverified Works
Scan all stored ORCID items and queue ingestion for every person who has works with a now-eligible work type so the ingestion pipeline re-classifies them as works instead of leaving them in the unverified review queue.
Script: scripts/db/resync-unverified-works.mjs
npm alias: pnpm resync:unverified-works
Token is loaded automatically from .env at the repo root (CLOUDFLARE_API_TOKEN or CF_API_TOKEN).
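The token lookup can be pictured with the sketch below. It is not the script's actual code: `parseDotEnv` and `resolveApiToken` are hypothetical names, the parser handles only plain `KEY=VALUE` lines, and the precedence (`CLOUDFLARE_API_TOKEN` before `CF_API_TOKEN`) is an assumption.

```javascript
// Minimal .env parser (assumption: simple KEY=VALUE lines, optional quotes).
function parseDotEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    // Strip one leading and one trailing quote character, if present.
    const value = line.slice(eq + 1).trim().replace(/^["']|["']$/g, "");
    env[key] = value;
  }
  return env;
}

// Hypothetical helper: return the first token variable that is set.
// The order of preference here is an assumption, not documented behavior.
function resolveApiToken(env) {
  const token = env.CLOUDFLARE_API_TOKEN || env.CF_API_TOKEN;
  if (!token) {
    throw new Error("Set CLOUDFLARE_API_TOKEN or CF_API_TOKEN in .env at the repo root");
  }
  return token;
}
```

If neither variable is set, the sketch fails fast with a message naming both accepted variables.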
When to use this
Run this after expanding the ORCID_WORK_TYPE_KEYWORDS list. Existing
orcid_works rows that were stored before the change still carry their original
classification. This script finds those rows and queues ingestion jobs for the
affected ORCIDs so the pipeline re-processes them with the updated keyword list.
Newly ingested works (after the keyword change is deployed) are handled automatically; no manual resync is needed.
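As a rough illustration of the scan, the sketch below selects the ORCIDs whose stored rows match the expanded keyword list and builds the work-type breakdown. The row shape (`orcid`, `work_type`), the function name `planResync`, and case-insensitive substring matching are all assumptions; the real script may query and match differently.

```javascript
// Hypothetical planner: which ORCIDs need re-ingestion, and a per-type count.
function planResync(rows, keywords) {
  const needles = keywords.map((k) => k.toLowerCase());
  const orcids = new Set();
  const byType = {};
  for (const row of rows) {
    const type = String(row.work_type || "").toLowerCase();
    // Skip rows without a type or whose type matches no keyword.
    if (!type || !needles.some((n) => type.includes(n))) continue;
    orcids.add(row.orcid); // one ingestion job per person, not per work
    byType[type] = (byType[type] || 0) + 1;
  }
  return { orcids: [...orcids], byType };
}

// Illustrative data only.
const plan = planResync(
  [
    { orcid: "0000-0001", work_type: "book-chapter" },
    { orcid: "0000-0001", work_type: "book-chapter" },
    { orcid: "0000-0002", work_type: "other" },
  ],
  ["book-chapter"],
);
```

Deduplicating by ORCID reflects the description above: jobs are queued per affected person, so two matching works for the same person produce a single job.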
Commands
Dry-run (plan only, no queues touched)
Prints the affected ORCID count and a work-type breakdown without sending anything:
pnpm resync:unverified-works --dry-run
Live run (default: remote D1)
pnpm resync:unverified-works
This defaults to low-pressure ingestion mode (skip_existing=true,
concurrency=1, enqueue-delay-ms=1000) to reduce worker invocation pressure.
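The low-pressure defaults can be read as "send a small chunk, pause, repeat". The sketch below shows that pattern under one assumption that the source does not confirm: each chunk holds `concurrency` messages. `chunk`, `enqueueAll`, and the `send` callback are hypothetical names.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Send each chunk concurrently, then wait before starting the next one.
// `send` is a caller-supplied async function that enqueues one message.
async function enqueueAll(messages, send, { concurrency = 1, enqueueDelayMs = 1000 } = {}) {
  const chunks = chunk(messages, concurrency);
  for (let i = 0; i < chunks.length; i++) {
    await Promise.all(chunks[i].map((m) => send(m)));
    if (i < chunks.length - 1) await sleep(enqueueDelayMs); // no delay after the last chunk
  }
  return chunks.length;
}
```

With the defaults, this sends one message per second, which is the kind of pacing that keeps worker invocations spread out.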
Scope to a specific entity
pnpm resync:unverified-works --entity-id=cesam
Add more time to watch
pnpm resync:unverified-works --watch-timeout=300
By default, watch mode has no timeout and waits until all queued jobs reach a
terminal state (complete or failed). Passing --watch-timeout sets an optional limit in seconds.
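A watch loop with an optional timeout might look like the following sketch. `allTerminal`, `watch`, the `fetchStatuses` callback, and the 2-second poll interval are assumptions for illustration, not the script's real internals.

```javascript
// The two terminal states named in this document.
const TERMINAL = new Set(["complete", "failed"]);

// True when every job status in the list is terminal.
function allTerminal(statuses) {
  return statuses.every((s) => TERMINAL.has(s));
}

// Poll until all jobs are terminal, or until the optional timeout elapses.
// `fetchStatuses` is a caller-supplied async function returning status strings.
async function watch(fetchStatuses, { timeoutSeconds = null, intervalMs = 2000 } = {}) {
  // No timeout by default: the deadline is never reached.
  const deadline = timeoutSeconds == null ? Infinity : Date.now() + timeoutSeconds * 1000;
  for (;;) {
    const statuses = await fetchStatuses();
    if (allTerminal(statuses)) return "done";
    if (Date.now() >= deadline) return "timeout";
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Checking for completion before checking the deadline means a run that finishes exactly at the limit is still reported as done.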
Use local D1 (for testing against local state)
pnpm resync:unverified-works --local
Control queue send concurrency (default: 1)
pnpm resync:unverified-works --concurrency=10
Control delay between queue chunks (default: 1000 ms)
pnpm resync:unverified-works --enqueue-delay-ms=2000
Re-enrich already-known publications (higher load)
pnpm resync:unverified-works --no-skip-existing
All options
| Flag | Default | Description |
|---|---|---|
| --dry-run | off | Print plan only; no messages sent |
| --remote | on | Use remote D1 databases |
| --local | off | Use local D1 databases |
| --entity-id=<id> | cesam | Entity ID written into each sync job message |
| --concurrency=<n> | 1 | Concurrent queue send requests |
| --enqueue-delay-ms=<n> | 1000 | Delay between queue chunks in milliseconds |
| --no-skip-existing | off | Re-enrich already-known publications (higher load) |
| --no-watch | off | Queue jobs only, skip live status polling |
| --watch-timeout=<n> | off | Optional timeout in seconds for watch mode |
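To make the defaults in the table concrete, here is a small hand-rolled parser that applies them. It is a sketch only: `DEFAULTS`, `parseFlags`, and the option property names are hypothetical, and the real script may parse its arguments in another way.

```javascript
// Defaults mirror the table above; property names are illustrative.
const DEFAULTS = {
  dryRun: false,
  local: false, // remote D1 is the default
  entityId: "cesam",
  concurrency: 1,
  enqueueDelayMs: 1000,
  skipExisting: true,
  watch: true,
  watchTimeoutSeconds: null, // null means no timeout
};

function toNonNegativeInt(flag, value) {
  const n = Number(value);
  if (value === undefined || value === "" || !Number.isInteger(n) || n < 0) {
    throw new Error(`${flag} expects a non-negative integer`);
  }
  return n;
}

function parseFlags(argv) {
  const opts = { ...DEFAULTS };
  for (const arg of argv) {
    const [flag, value] = arg.split("=", 2);
    switch (flag) {
      case "--dry-run": opts.dryRun = true; break;
      case "--remote": opts.local = false; break;
      case "--local": opts.local = true; break;
      case "--entity-id":
        if (!value) throw new Error("--entity-id expects a value");
        opts.entityId = value;
        break;
      case "--concurrency": opts.concurrency = toNonNegativeInt(flag, value); break;
      case "--enqueue-delay-ms": opts.enqueueDelayMs = toNonNegativeInt(flag, value); break;
      case "--no-skip-existing": opts.skipExisting = false; break;
      case "--no-watch": opts.watch = false; break;
      case "--watch-timeout": opts.watchTimeoutSeconds = toNonNegativeInt(flag, value); break;
      default: throw new Error(`Unknown flag: ${flag}`);
    }
  }
  return opts;
}
```

For example, `parseFlags(["--local", "--concurrency=10"])` keeps every other default and switches only the database target and send concurrency.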
What happens after queueing
Each queued message triggers the ingestion pipeline for one ORCID.
The worker fetches ORCID works and runs classifyOrcidWork with the updated
keyword list. Items that now resolve to "work" are written to the works
table via handleNonDoiOrcidWork and will no longer appear in the unverified
review queue.
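The effect of expanding the keyword list can be shown with a toy classifier. This is not the real classifyOrcidWork: the function name `classifyByKeywords`, the `type` field, the substring matching rule, and the sample keyword lists are all assumptions made for illustration.

```javascript
// Toy stand-in for the classification step: "work" if the work type
// matches any keyword, otherwise "unverified".
function classifyByKeywords(work, keywords) {
  const type = String(work.type || "").toLowerCase();
  return keywords.some((k) => type.includes(k.toLowerCase())) ? "work" : "unverified";
}

// Illustrative keyword lists before and after an expansion.
const keywordsBefore = ["journal-article"];
const keywordsAfter = ["journal-article", "book-chapter"];

const item = { type: "book-chapter" };
const before = classifyByKeywords(item, keywordsBefore); // "unverified"
const after = classifyByKeywords(item, keywordsAfter); // "work"
```

The same stored item changes outcome only because the list changed, which is why rows ingested before the change need a re-run.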
The script also polls ingestion_jobs live and prints counts for missing,
queued, processing, complete, and failed jobs so you can see if processing
started and whether it finished.
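The status summary can be sketched as a simple tally. The row shape (`orcid`, `status`) and the function name `tallyJobs` are hypothetical, and the sketch assumes "missing" means that no ingestion_jobs row exists yet for a queued ORCID.

```javascript
const STATES = ["missing", "queued", "processing", "complete", "failed"];

// Count queued ORCIDs by their current job status.
function tallyJobs(queuedOrcids, jobRows) {
  const statusByOrcid = new Map(jobRows.map((j) => [j.orcid, j.status]));
  const counts = Object.fromEntries(STATES.map((s) => [s, 0]));
  for (const orcid of queuedOrcids) {
    // Assumption: an ORCID with no job row yet is reported as "missing".
    const status = statusByOrcid.get(orcid) ?? "missing";
    if (STATES.includes(status)) counts[status] += 1; // unknown statuses are ignored
  }
  return counts;
}

// Illustrative data only.
const counts = tallyJobs(
  ["a", "b", "c"],
  [
    { orcid: "a", status: "complete" },
    { orcid: "b", status: "processing" },
  ],
);
```

A non-zero "missing" count that never drops would suggest the queued messages have not been picked up, while "processing" falling to zero indicates the run has finished.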