
refactor: separate add action generation from domain metadata in generate_adds #2370

@rliao147

Description


Describe the bug

Context

Transaction::generate_adds currently returns two logically distinct things: an iterator of add file actions AND an optional RowTrackingDomainMetadata. They were bundled together because computing the row ID high water mark requires visiting all add file batches (the same pass that generates the add actions), which made a single function feel natural. However, domain metadata is a separate concept from add actions in the Delta log, and coupling them in one function creates several problems:
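To make the coupling concrete, here is a simplified sketch of the current shape. `AddFile`, its fields, and the `row_id_high_water_mark` field are hypothetical stand-ins, not the real delta-kernel-rs definitions. The sketch returns a `Vec` instead of an iterator. It also assumes that a missing previous high water mark is treated as -1, so that the first assigned row ID is 0.

```rust
// Hypothetical, simplified stand-ins; NOT the real delta-kernel-rs types.
#[derive(Debug, Clone, PartialEq)]
struct AddFile {
    path: String,
    num_records: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct RowTrackingDomainMetadata {
    row_id_high_water_mark: i64,
}

/// Sketch of the coupled shape: one function returns both the add actions
/// and an optional row-tracking domain metadata value.
fn generate_adds(
    files: &[AddFile],
    row_tracking_enabled: bool,
    prev_hwm: Option<i64>, // assumption: None means "no HWM yet", treated as -1
) -> (Vec<String>, Option<RowTrackingDomainMetadata>) {
    let mut hwm = prev_hwm.unwrap_or(-1);
    let mut adds = Vec::with_capacity(files.len());
    for file in files {
        // Same pass: emit the add action...
        adds.push(format!("add:{}", file.path));
        // ...and advance the high water mark by the file's row count.
        hwm += file.num_records as i64;
    }
    // The row-tracking-specific type leaks into the general add path here.
    let domain_metadata = row_tracking_enabled
        .then(|| RowTrackingDomainMetadata { row_id_high_water_mark: hwm });
    (adds, domain_metadata)
}

fn main() {
    let files = vec![
        AddFile { path: "a.parquet".into(), num_records: 10 },
        AddFile { path: "b.parquet".into(), num_records: 5 },
    ];
    let (adds, dm) = generate_adds(&files, true, None);
    println!("{adds:?} {dm:?}");
}
```

Every caller now has to destructure a tuple, even though most callers only care about the add actions.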

  1. The function signature leaks a row-tracking-specific type into the general add action path
  2. Adding other domain metadata concerns (e.g., future features that also need a commit-time scan of add files) would require further special-casing inside this function
  3. The is_create_table() branching inside generate_adds is hard to follow (it's create-table logic embedded in a general-purpose function)
  4. Future writers that need to inspect add files for multiple purposes would each trigger a separate pass over them

Proposed direction

Separate the concerns:

  1. generate_adds produces only add actions (no domain metadata output)
  2. Row tracking high water mark (HWM) computation is done in a dedicated step that can share the same file scan or reuse the already-computed visitor results
  3. Domain metadata actions (row tracking HWM, and any future additions) are assembled in generate_domain_metadata_actions where they already belong
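The three steps above can be sketched roughly as follows. All types are hypothetical stand-ins rather than kernel APIs, and `compute_row_tracking_hwm` is an illustrative name for the dedicated step in item 2. The sketch assumes the row tracking domain is named `delta.rowTracking` and stores its mark under `rowIdHighWaterMark`. It keeps the same assumption that a missing previous HWM counts as -1.

```rust
// Hypothetical, simplified stand-ins; NOT the real delta-kernel-rs types.
#[derive(Debug, Clone, PartialEq)]
struct AddFile {
    path: String,
    num_records: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct DomainMetadata {
    domain: String,
    configuration: String, // JSON payload
}

/// Step 1: add actions only; no domain metadata in the signature.
fn generate_adds(files: &[AddFile]) -> impl Iterator<Item = String> + '_ {
    files.iter().map(|f| format!("add:{}", f.path))
}

/// Step 2 (hypothetical name): dedicated HWM computation over the
/// already-materialized add file metadata. Assumes a missing previous
/// HWM is treated as -1.
fn compute_row_tracking_hwm(files: &[AddFile], prev_hwm: Option<i64>) -> i64 {
    let added: u64 = files.iter().map(|f| f.num_records).sum();
    prev_hwm.unwrap_or(-1) + added as i64
}

/// Step 3: all domain metadata actions are assembled in one place.
fn generate_domain_metadata_actions(row_tracking_hwm: Option<i64>) -> Vec<DomainMetadata> {
    let mut actions = Vec::new();
    if let Some(hwm) = row_tracking_hwm {
        actions.push(DomainMetadata {
            domain: "delta.rowTracking".to_string(),
            configuration: format!(r#"{{"rowIdHighWaterMark":{hwm}}}"#),
        });
    }
    // Future domain metadata producers would append their actions here.
    actions
}

fn main() {
    let files = vec![
        AddFile { path: "a.parquet".into(), num_records: 10 },
        AddFile { path: "b.parquet".into(), num_records: 5 },
    ];
    let row_tracking_enabled = true;
    let adds: Vec<String> = generate_adds(&files).collect();
    let hwm = row_tracking_enabled.then(|| compute_row_tracking_hwm(&files, None));
    let domain_metadata = generate_domain_metadata_actions(hwm);
    println!("{adds:?}\n{domain_metadata:?}");
}
```

In this sketch the create-table or row-tracking decision lives at the call site (`row_tracking_enabled.then(...)`), not inside `generate_adds`.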

This likely involves restructuring the commit pipeline in Transaction::build_commit_info to either:
(a) do a single shared scan of add files that feeds both the add action iterator and the domain metadata builder, or
(b) accept that the HWM scan is a separate lightweight pass over the already-materialized add_files_metadata.
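Option (a) could look roughly like the sketch below. A single pass over the add files fans each file out to several consumers. The `AddFileVisitor` trait, `AddActionCollector`, `RowTrackingHwm`, and `scan_add_files` are all hypothetical names used only for illustration, not kernel APIs. The starting HWM of -1 is again an assumption for a table with no previous mark.

```rust
// Hypothetical stand-ins; NOT delta-kernel-rs APIs.
struct AddFile {
    path: String,
    num_records: u64,
}

/// A consumer of the shared add-file scan (hypothetical trait).
trait AddFileVisitor {
    fn visit(&mut self, file: &AddFile);
}

/// Collects add actions.
#[derive(Default)]
struct AddActionCollector {
    actions: Vec<String>,
}

impl AddFileVisitor for AddActionCollector {
    fn visit(&mut self, file: &AddFile) {
        self.actions.push(format!("add:{}", file.path));
    }
}

/// Accumulates the row-tracking high water mark.
struct RowTrackingHwm {
    hwm: i64,
}

impl AddFileVisitor for RowTrackingHwm {
    fn visit(&mut self, file: &AddFile) {
        self.hwm += file.num_records as i64;
    }
}

/// Single pass over the add files, handing each file to every visitor.
fn scan_add_files(files: &[AddFile], visitors: &mut [&mut dyn AddFileVisitor]) {
    for file in files {
        for visitor in visitors.iter_mut() {
            visitor.visit(file);
        }
    }
}

fn main() {
    let files = vec![
        AddFile { path: "a.parquet".into(), num_records: 10 },
        AddFile { path: "b.parquet".into(), num_records: 5 },
    ];
    let mut adds = AddActionCollector::default();
    let mut hwm = RowTrackingHwm { hwm: -1 }; // assumption: no previous HWM
    let mut visitors: [&mut dyn AddFileVisitor; 2] = [&mut adds, &mut hwm];
    scan_add_files(&files, &mut visitors);
    println!("{:?} {}", adds.actions, hwm.hwm);
}
```

With this shape, a future feature that needs a commit-time look at add files registers another visitor instead of adding another pass, which addresses problem 4. Option (b) trades that for simpler, independent functions at the cost of re-reading the materialized metadata.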

