Summary
The exponential backoff calculation in packages/payload/src/queues/errors/calculateBackoffWaitUntil.ts (line 24) uses a deterministic delay formula:
waitUntil = new Date(now.getTime() + Math.pow(2, totalTried) * delay)
This means concurrent jobs that fail at the same time will all retry at exactly the same intervals (2^n * delay), creating a thundering herd effect on the queue backend.
Suggested fix
Add jitter to spread retry times (the formula below matches what AWS calls "equal jitter": half of the delay is fixed and half is randomized):
const baseDelay = Math.pow(2, totalTried) * delay
// Randomize the delay within [0.5 * baseDelay, baseDelay)
const jitteredDelay = baseDelay * (0.5 + Math.random() * 0.5)
waitUntil = new Date(now.getTime() + jitteredDelay)
This preserves the exponential growth curve while randomizing each delay to between 50% and 100% of the deterministic value. Adding jitter is standard practice for distributed retries (AWS, Google Cloud, and most resilience libraries recommend it).
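As a minimal sketch of how the suggestion could look as a standalone function: the name jitteredWaitUntil, its parameter list, and the injectable random source are assumptions made for illustration and testing, not the actual signature used in calculateBackoffWaitUntil.ts.

```typescript
// Hypothetical helper for illustration only; not the existing Payload implementation.
type RandomFn = () => number // returns a value in [0, 1), like Math.random

function jitteredWaitUntil(
  now: Date,
  delay: number, // base delay in milliseconds
  totalTried: number, // number of attempts made so far
  random: RandomFn = Math.random, // injectable so the result can be tested deterministically
): Date {
  // Deterministic exponential delay: 2^totalTried * delay
  const baseDelay = Math.pow(2, totalTried) * delay
  // Keep half of the delay fixed and randomize the other half,
  // giving a value in [0.5 * baseDelay, baseDelay)
  const jitteredDelay = baseDelay * (0.5 + random() * 0.5)
  return new Date(now.getTime() + jitteredDelay)
}

// Usage: third retry with a 1000 ms base delay (baseDelay = 8000 ms)
const start = new Date(0)
const earliest = jitteredWaitUntil(start, 1000, 3, () => 0)
const midpoint = jitteredWaitUntil(start, 1000, 3, () => 0.5)
console.log(earliest.getTime(), midpoint.getTime())
```

Passing the random source as a parameter is only a testing convenience; calling the function without it falls back to Math.random.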
Why this matters
In production job queues with many workers, deterministic backoff causes retry storms — all failed jobs retry simultaneously, overloading the backend at predictable intervals. Jitter breaks the synchronization and spreads load.
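The synchronization effect can be illustrated with a small simulation. This is a sketch under simplifying assumptions (all jobs fail at the same instant, retry times are bucketed to whole milliseconds), and distinctRetryTimes is a hypothetical helper, not part of Payload.

```typescript
// Hypothetical simulation: count how many distinct retry timestamps a batch of
// jobs produces when they all fail at the same moment.
function distinctRetryTimes(
  jobCount: number,
  delay: number, // base delay in milliseconds
  totalTried: number, // attempts made so far
  useJitter: boolean,
): number {
  const failedAt = 0 // assumption: every job fails at the same instant
  const retryTimes = new Set<number>()
  for (let i = 0; i < jobCount; i++) {
    const baseDelay = Math.pow(2, totalTried) * delay
    const wait = useJitter
      ? baseDelay * (0.5 + Math.random() * 0.5) // spread over [0.5, 1.0) * baseDelay
      : baseDelay // deterministic: identical for every job
    retryTimes.add(Math.floor(failedAt + wait))
  }
  return retryTimes.size
}

const withoutJitter = distinctRetryTimes(1000, 1000, 3, false)
const withJitter = distinctRetryTimes(1000, 1000, 3, true)
console.log({ withoutJitter, withJitter })
```

Without jitter every job lands on a single retry timestamp; with jitter the same batch is spread across the 4000 ms window between 0.5 and 1.0 times the base delay.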
Impact
- Additive change, no API modification
- Existing behavior preserved, except that each retry delay falls between 50% and 100% of the current deterministic value
- No new dependencies
- The fixed backoff type is unaffected (only exponential changes)
Happy to submit a PR if this approach looks right.