
Add jitter to exponential backoff in job queue retry #16318

@bobinzuks

Description


Summary

The exponential backoff calculation in packages/payload/src/queues/errors/calculateBackoffWaitUntil.ts (line 24) uses a deterministic delay formula:

waitUntil = new Date(now.getTime() + Math.pow(2, totalTried) * delay)

This means concurrent jobs that fail at the same time will all retry at exactly the same intervals (2^n * delay), creating a thundering herd effect on the queue backend.
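To make the collision concrete, here is a minimal sketch of the formula quoted above. The helper name `deterministicWaitUntil` and the sample values are assumptions for illustration only; this is not the actual Payload source.

```typescript
// Hypothetical helper that mirrors the formula quoted in this issue.
function deterministicWaitUntil(now: Date, totalTried: number, delay: number): Date {
  return new Date(now.getTime() + Math.pow(2, totalTried) * delay)
}

// Three jobs fail at the same instant with the same retry settings (assumed values).
const failedAt = new Date('2024-01-01T00:00:00.000Z')
const retryTimes = [1, 2, 3].map(() => deterministicWaitUntil(failedAt, 3, 1000).getTime())

// All three jobs are scheduled for the identical timestamp: failedAt + 2^3 * 1000 ms.
const distinctRetryTimes = new Set(retryTimes).size
console.log(distinctRetryTimes) // 1
```

With nothing random in the formula, any jobs that share `totalTried`, `delay`, and a failure time also share every later retry time.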

Suggested fix

Add jitter to spread retry times. The variant below keeps half of the delay fixed and randomizes the other half (often called "equal jitter"):

const baseDelay = Math.pow(2, totalTried) * delay
// Math.random() is in [0, 1), so the result is in [0.5 * baseDelay, baseDelay)
const jitteredDelay = baseDelay * (0.5 + Math.random() * 0.5)
waitUntil = new Date(now.getTime() + jitteredDelay)

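This preserves the exponential growth curve while randomizing each delay to between 50% and 100% of the deterministic value, so a retry never waits longer than it does today. Adding jitter to exponential backoff is standard practice for distributed retries (AWS, Google Cloud, and most resilience libraries recommend it).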
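Read literally, the factor `0.5 + Math.random() * 0.5` falls in [0.5, 1), which the following sketch checks. The function name `jitteredWaitUntil` and the injectable `random` parameter are assumptions added here to make the bounds testable; they are not part of the existing code.

```typescript
// Hypothetical sketch; `random` is injectable only so the bounds can be verified.
function jitteredWaitUntil(
  now: Date,
  totalTried: number,
  delay: number,
  random: () => number = Math.random,
): Date {
  const baseDelay = Math.pow(2, totalTried) * delay
  // random() returns a value in [0, 1), so the factor below is in [0.5, 1).
  const jitteredDelay = baseDelay * (0.5 + random() * 0.5)
  return new Date(now.getTime() + jitteredDelay)
}

const start = new Date(0)
// With totalTried = 3 and delay = 1000, the deterministic delay is 8000 ms.
const lowest = jitteredWaitUntil(start, 3, 1000, () => 0).getTime() // 4000
const middle = jitteredWaitUntil(start, 3, 1000, () => 0.5).getTime() // 6000
const sample = jitteredWaitUntil(start, 3, 1000).getTime() // somewhere in [4000, 8000)
console.log(lowest, middle, sample >= 4000 && sample < 8000)
```

If a symmetric spread around the current delay is preferred, the factor could instead be `0.5 + Math.random()`, which gives [0.5, 1.5); that is a design choice for the maintainers rather than something this issue prescribes.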

Why this matters

In production job queues with many workers, deterministic backoff causes retry storms: jobs that fail together also retry together, overloading the backend at predictable intervals. Jitter breaks the synchronization and spreads the load.

Impact

  • Additive change, no API modification
  • Retry delays fall between 50% and 100% of the current deterministic value, so no retry waits longer than before
  • No new dependencies
  • The fixed backoff type is unaffected (only exponential changes)

Happy to submit a PR if this approach looks right.
