fga tuple delete --file hangs indefinitely — retries HTTP 400 errors with exponential backoff #677

@bassemkaroui

Description

fga tuple delete --file hangs indefinitely in non-interactive environments (e.g. Docker containers, CI) when some tuples in the file don't exist in the store. The CLI retries 400 Bad Request responses with exponential backoff (up to ~28s between attempts), even though 400 errors are client errors and not retryable.

This happens regardless of whether --on-missing ignore is passed.

Environment

  • CLI version: fga v0.7.12 (commit 79d44e3, 2026-03-23)
  • Server version: openfga/openfga:v1.9.2
  • OS: Linux (Docker container, non-interactive — no stdin)

Steps to Reproduce

  1. Create a store with a model
  2. Run fga tuple delete --file on a YAML file where some tuples don't exist in the store:
fga tuple delete --file tuples.yaml --max-tuples-per-write 1 --on-missing ignore --store-id <STORE_ID>

Expected Behavior

  • Tuples that exist are deleted
  • Tuples that don't exist are skipped (or fail immediately with a clear error)
  • Command completes promptly

Actual Behavior

For each non-existent tuple, the server returns HTTP 400:

{
  "code": "write_failed_due_to_invalid_input",
  "message": "cannot delete a tuple which does not exist: ..."
}

The CLI retries this 400 with exponential backoff:

Waiting 28.4s to retry Write (attempt 5, status=400, error=POST validation error for Write POST ...)

With the default --max-parallel-requests 10, several goroutines end up stuck in retry loops at the same time, so the command hangs indefinitely.

Root Cause

The retry logic treats HTTP 400 as a retryable error. HTTP 400 is a client error — the request is malformed or invalid, and resending the same request will always produce the same result. Only 429 (rate limit) and 5xx (server errors) should be retried.
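The rule described above can be sketched as a small predicate. This is only an illustration: isRetryable is a hypothetical helper name, not a function taken from the CLI or the Go SDK, and the real retry code may be structured differently.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryable is a hypothetical helper (not the CLI's or SDK's actual code).
// It reports whether a response status is worth retrying:
// only 429 (rate limit) and 5xx (server errors) qualify.
func isRetryable(status int) bool {
	if status == http.StatusTooManyRequests {
		return true
	}
	return status >= 500 && status <= 599
}

func main() {
	// Print the decision for a few representative status codes.
	for _, s := range []int{400, 404, 429, 500, 503} {
		fmt.Printf("status=%d retryable=%t\n", s, isRetryable(s))
	}
}
```

With a check like this, a 400 from Write would be returned to the caller on the first attempt instead of entering the backoff loop.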

Impact

This makes fga tuple delete --file unusable in any automated pipeline (Docker prestart scripts, CI/CD, init containers) where tuples may or may not exist. The standard pattern of "delete all, then write all" to refresh tuples cannot work reliably.

Suggested Fix

Do not retry HTTP 4xx responses (except 429). Return the error immediately so the caller (or --on-missing ignore logic) can handle it.
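Once the 400 is returned immediately, the caller could recognize the "tuple does not exist" case from the response body shown under Actual Behavior. The sketch below assumes the body has exactly the code and message fields from that example; apiError and isMissingTupleError are hypothetical names, and matching on the message text is an assumption that may be fragile across server versions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// apiError mirrors the two fields seen in the 400 response in this report.
// The type name is hypothetical.
type apiError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// isMissingTupleError is a hypothetical helper. It returns true when the
// body looks like the "cannot delete a tuple which does not exist" error.
// Bodies that are not valid JSON are treated as not matching.
func isMissingTupleError(body []byte) bool {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return false
	}
	return e.Code == "write_failed_due_to_invalid_input" &&
		strings.Contains(e.Message, "cannot delete a tuple which does not exist")
}

func main() {
	body := []byte(`{"code":"write_failed_due_to_invalid_input","message":"cannot delete a tuple which does not exist: ..."}`)
	fmt.Println(isMissingTupleError(body))
}
```

A caller honoring --on-missing ignore could skip tuples for which this returns true and report every other 4xx as a failure.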

Workaround

Bypass the CLI and use the REST API directly, reading existing tuples first and deleting only those that actually exist.
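The filtering step of this workaround can be sketched as follows. The code assumes the existing tuples have already been fetched from the store (for example through the Read endpoint) and that the result is then sent as deletes through the Write endpoint; both HTTP calls are omitted here. TupleKey and existingOnly are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// TupleKey is a hypothetical, simplified tuple representation.
type TupleKey struct {
	User     string
	Relation string
	Object   string
}

// existingOnly returns the tuples from wanted that also appear in existing,
// preserving the order of wanted. Only these are safe to delete.
func existingOnly(wanted, existing []TupleKey) []TupleKey {
	present := make(map[TupleKey]struct{}, len(existing))
	for _, t := range existing {
		present[t] = struct{}{}
	}
	var out []TupleKey
	for _, t := range wanted {
		if _, ok := present[t]; ok {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Example data is made up for illustration.
	wanted := []TupleKey{
		{"user:anne", "viewer", "document:1"},
		{"user:bob", "viewer", "document:2"},
	}
	existing := []TupleKey{
		{"user:anne", "viewer", "document:1"},
	}
	for _, t := range existingOnly(wanted, existing) {
		fmt.Printf("delete %s %s %s\n", t.User, t.Relation, t.Object)
	}
}
```

Because only tuples known to exist are sent, the server has no reason to answer with the 400 that triggers the retry loop, although a tuple removed by another client between the read and the delete could still produce it.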
