
[Bug]: Python Enrichment BigQuery handler does not expose max_batch_duration_secs to BatchElements #38243

@prabhnoor0212

What happened?

Description: In Python Enrichment, RequestResponseIO supports passing batching kwargs from Caller.batch_elements_kwargs() into BatchElements(**kwargs).

BatchElements supports max_batch_duration_secs, but BigQueryEnrichmentHandler currently only sets:

  • min_batch_size
  • max_batch_size

As a result, users of BigQueryEnrichmentHandler cannot configure max_batch_duration_secs, even though the downstream batching transform supports it.

Code path: BigQueryEnrichmentHandler -> Enrichment -> RequestResponseIO -> BatchElements
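The forwarding step in this path can be illustrated without Beam installed. This is only a rough sketch: `fake_batch_elements`, `CurrentHandlerSketch`, and `build_batcher` are hypothetical stand-ins that mimic how the kwargs flow, not the real Beam classes, and the default sizes are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def fake_batch_elements(
    min_batch_size: int = 1,
    max_batch_size: int = 10000,
    max_batch_duration_secs: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for BatchElements: returns the config it received."""
    return {
        "min_batch_size": min_batch_size,
        "max_batch_size": max_batch_size,
        "max_batch_duration_secs": max_batch_duration_secs,
    }


class CurrentHandlerSketch:
    """Hypothetical stand-in for the handler as described: it only sets size kwargs."""

    def __init__(self, min_batch_size: int = 1, max_batch_size: int = 10000):
        self._batching_kwargs = {
            "min_batch_size": min_batch_size,
            "max_batch_size": max_batch_size,
        }

    def batch_elements_kwargs(self) -> Dict[str, Any]:
        return self._batching_kwargs


def build_batcher(caller: CurrentHandlerSketch) -> Dict[str, Any]:
    # Mirrors the forwarding described above: BatchElements(**kwargs).
    return fake_batch_elements(**caller.batch_elements_kwargs())


config = build_batcher(CurrentHandlerSketch(max_batch_size=500))
```

Because the handler never places `max_batch_duration_secs` in its kwargs, the stand-in batcher always sees its default of `None`, regardless of what the user would like to configure.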

Current behavior: max_batch_duration_secs is not configurable from BigQueryEnrichmentHandler and is therefore never forwarded to BatchElements.

Expected behavior: BigQueryEnrichmentHandler should optionally accept max_batch_duration_secs and pass it through batch_elements_kwargs() when batching is enabled.

Proposed fix

  1. Add optional max_batch_duration_secs: Optional[float] = None to BigQueryEnrichmentHandler.__init__.
  2. When query_fn is not used, include it in _batching_kwargs (when provided).
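The two steps above might look roughly like the following. This is a minimal sketch, not the actual Beam source: `SketchBigQueryHandler` is a hypothetical stand-in that keeps only the batching logic, and the default batch sizes shown are illustrative assumptions.

```python
from typing import Any, Callable, Dict, Optional


class SketchBigQueryHandler:
    """Hypothetical stand-in for BigQueryEnrichmentHandler (batching logic only)."""

    def __init__(
        self,
        query_fn: Optional[Callable[[Any], str]] = None,
        min_batch_size: int = 1,  # assumed default, for illustration
        max_batch_size: int = 10000,  # assumed default, for illustration
        max_batch_duration_secs: Optional[float] = None,  # step 1: new option
    ):
        self._batching_kwargs: Dict[str, Any] = {}
        # Batching only applies when no custom query_fn is supplied.
        if query_fn is None:
            self._batching_kwargs["min_batch_size"] = min_batch_size
            self._batching_kwargs["max_batch_size"] = max_batch_size
            # Step 2: forward the duration cap only when the user provided one,
            # so existing pipelines keep the downstream default.
            if max_batch_duration_secs is not None:
                self._batching_kwargs["max_batch_duration_secs"] = (
                    max_batch_duration_secs
                )

    def batch_elements_kwargs(self) -> Dict[str, Any]:
        return self._batching_kwargs


with_cap = SketchBigQueryHandler(max_batch_duration_secs=5.0)
without_cap = SketchBigQueryHandler()
unbatched = SketchBigQueryHandler(
    query_fn=lambda row: "SELECT 1", max_batch_duration_secs=5.0
)
```

Omitting the key when the value is `None` keeps the kwargs identical to today's for callers who do not opt in, which should make the change backward compatible.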

Additional note: CloudSQLEnrichmentHandler appears to have the same batching-kwargs limitation and may benefit from parity in a follow-up or same PR.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
