Describe the bug
Context
Transaction::generate_adds currently returns two logically distinct things: an iterator of add file actions and an optional RowTrackingDomainMetadata. These were bundled together because computing the row ID high water mark (HWM) requires visiting all add file batches, which is the same pass that generates the add actions, so a single function felt natural. However, domain metadata is a separate concept from add actions in the Delta log, and coupling them in one function creates several problems:
- The function signature leaks a row-tracking-specific type into the general add action path
- Adding other domain metadata concerns (e.g., future features that also need a commit-time scan of add files) would require further hacking into this function
- The is_create_table() branching inside generate_adds is hard to follow (it's create-table logic embedded in a general-purpose function)
- Future writers that need to inspect add files for multiple purposes would trigger multiple passes
Proposed direction
Separate the concerns:
- generate_adds produces only add actions (no domain metadata output)
- Row tracking HWM computation is done in a dedicated step that can share the same file scan or use the already-computed visitor results
- Domain metadata actions (row tracking HWM, and any future additions) are assembled in generate_domain_metadata_actions where they already belong
This likely involves restructuring the commit pipeline in Transaction::build_commit_info to either
(a) do a single shared scan of add files that feeds both the add action iterator and the domain metadata builder, or
(b) accept that the HWM scan is a separate lightweight pass over the already-materialized add_files_metadata.
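To make option (b) concrete, here is a minimal sketch of the proposed split. Everything in it is a simplified, hypothetical stand-in rather than the actual kernel code: the AddFileMeta, AddAction, and DomainMetadataAction types, the first_row_id parameter, the row_tracking_hwm helper, and the row ID arithmetic are all assumptions made for illustration. Only the function names generate_adds and generate_domain_metadata_actions come from the description above, and their real signatures will differ.

```rust
// Hypothetical, simplified stand-ins; these are NOT the real kernel types.
#[derive(Debug, Clone, PartialEq)]
struct AddFileMeta {
    path: String,
    num_records: u64,
}

#[derive(Debug, PartialEq)]
struct AddAction {
    path: String,
    base_row_id: u64,
}

#[derive(Debug, PartialEq)]
enum DomainMetadataAction {
    RowTracking { row_id_high_water_mark: u64 },
}

/// Produces only add actions. The return type carries no domain metadata.
/// `first_row_id` is an assumed input: the first row ID free for assignment.
fn generate_adds(
    files: &[AddFileMeta],
    first_row_id: u64,
) -> impl Iterator<Item = AddAction> + '_ {
    let mut next = first_row_id;
    files.iter().map(move |f| {
        let base = next;
        next += f.num_records;
        AddAction { path: f.path.clone(), base_row_id: base }
    })
}

/// Option (b): a separate lightweight pass over the already-materialized
/// add file metadata. Returns None when no rows are added.
fn row_tracking_hwm(files: &[AddFileMeta], first_row_id: u64) -> Option<u64> {
    let total: u64 = files.iter().map(|f| f.num_records).sum();
    (total > 0).then(|| first_row_id + total - 1)
}

/// All domain metadata actions are assembled here; future domains would be
/// appended in this function instead of widening generate_adds.
fn generate_domain_metadata_actions(
    files: &[AddFileMeta],
    first_row_id: u64,
) -> Vec<DomainMetadataAction> {
    row_tracking_hwm(files, first_row_id)
        .map(|hwm| DomainMetadataAction::RowTracking { row_id_high_water_mark: hwm })
        .into_iter()
        .collect()
}
```

With this shape, the cost of the second pass is a sum over in-memory metadata rather than a re-read of add file batches, and any create-table special-casing could live in the domain metadata step instead of inside generate_adds. Option (a) would instead thread one visitor through both consumers, which avoids the extra pass at the price of tighter coupling.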
To Reproduce
No response
Expected behavior
No response
Additional context
No response