refactor!: separate read state from effective state in Transaction by william-ch-databricks · Pull Request #2385 · delta-io/delta-kernel-rs

william-ch-databricks · 2026-04-14T19:21:07Z

🥞 Stacked PR

Use this link to review incremental changes.

stack/alter-table-1-refactor-state [Files changed]
- stack/alter-table-2-supports-data-files [Files changed]
  - stack/alter-table-3-framework-add-column [Files changed]
    - stack/alter-table-4-set-nullable [Files changed]
      - stack/alter-table-5-column-mapping-add [Files changed]
        - stack/alter-table-6-drop-column [Files changed]
        - stack/alter-table-7-rename-column [Files changed]

What changes are proposed in this pull request?

Splits Transaction's snapshot into two concerns:

- read_snapshot_opt: Option<SnapshotRef> -- the pre-commit table state (None for CREATE TABLE)
- effective_table_config: TableConfiguration -- the config this commit will produce

This separates "what did I read?" (conflict detection, post-commit snapshots) from "what will
this commit produce?" (schema, protocol, stats, write context). Write-path call sites read from
effective_table_config; read-path call sites use read_snapshot().

Also adds should_emit_protocol / should_emit_metadata flags to replace the old
is_create_table() checks for Protocol/Metadata action emission, and replaces the synthetic
pre-commit snapshot in CREATE TABLE with direct TableConfiguration construction.

This is a pure refactor with no behaviour change.
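
For reviewers who want a mental model of the split, here is a minimal sketch of the two-field layout described above. It is not the kernel's actual code: the `TableConfiguration` and `Snapshot` bodies are hypothetical stand-ins, the constructors `for_existing_table` / `for_create_table` and the accessor `write_schema` are invented for illustration, and the return type of `read_snapshot()` is an assumption. Only the field names and the two flag names come from this description.

```rust
use std::sync::Arc;

// Hypothetical stand-in: the real TableConfiguration carries schema,
// protocol, table properties and more.
#[derive(Clone, Debug, PartialEq)]
struct TableConfiguration {
    schema: Vec<String>,
}

// Hypothetical stand-in for the kernel's Snapshot.
#[derive(Debug)]
struct Snapshot {
    table_configuration: TableConfiguration,
}

type SnapshotRef = Arc<Snapshot>;

struct Transaction {
    // "What did I read?" None for CREATE TABLE, where no table exists yet.
    read_snapshot_opt: Option<SnapshotRef>,
    // "What will this commit produce?"
    effective_table_config: TableConfiguration,
    // Flags that stand in for the old is_create_table() checks.
    should_emit_protocol: bool,
    should_emit_metadata: bool,
}

impl Transaction {
    // Assumed constructor: a write to an existing table starts with an
    // effective config equal to the config that was read.
    fn for_existing_table(snapshot: SnapshotRef) -> Self {
        let effective_table_config = snapshot.table_configuration.clone();
        Transaction {
            read_snapshot_opt: Some(snapshot),
            effective_table_config,
            should_emit_protocol: false,
            should_emit_metadata: false,
        }
    }

    // Assumed constructor: CREATE TABLE builds the config directly and
    // has no read snapshot, synthetic or otherwise.
    fn for_create_table(config: TableConfiguration) -> Self {
        Transaction {
            read_snapshot_opt: None,
            effective_table_config: config,
            should_emit_protocol: true,
            should_emit_metadata: true,
        }
    }

    // Read-path accessor; returning a Result here is an assumption.
    fn read_snapshot(&self) -> Result<&SnapshotRef, String> {
        self.read_snapshot_opt
            .as_ref()
            .ok_or_else(|| "no read snapshot (CREATE TABLE)".to_string())
    }

    // Write-path accessor: always answered from the effective config.
    fn write_schema(&self) -> &[String] {
        &self.effective_table_config.schema
    }
}
```

In this sketch a CREATE TABLE transaction can still answer write-path questions such as `write_schema()`, while `read_snapshot()` reports that there is nothing to read.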

How was this change tested?

All existing tests pass. Added unit tests for LogSegment::new_for_version_zero (valid input,
non-zero version rejection, non-commit file rejection).
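
The three cases listed above can be illustrated with a small standalone sketch. It does not reproduce `LogSegment::new_for_version_zero`, whose signature is not shown in this description; `parse_commit_version` and `validate_version_zero_commit` are hypothetical helpers. The sketch relies only on the Delta log convention that a commit file is named with a 20-digit zero-padded version followed by `.json`.

```rust
// Hypothetical helper: extract the version from a Delta commit file name
// such as "00000000000000000000.json". Returns None for anything that is
// not a commit file (checkpoints, CRC files, malformed names).
fn parse_commit_version(file_name: &str) -> Option<u64> {
    let stem = file_name.strip_suffix(".json")?;
    if stem.len() == 20 && stem.bytes().all(|b| b.is_ascii_digit()) {
        stem.parse().ok()
    } else {
        None
    }
}

// Hypothetical check mirroring the three tested cases: accept the
// version-0 commit, reject other versions, reject non-commit files.
fn validate_version_zero_commit(file_name: &str) -> Result<(), String> {
    match parse_commit_version(file_name) {
        Some(0) => Ok(()),
        Some(v) => Err(format!("expected version 0, got {v}")),
        None => Err(format!("not a commit file: {file_name}")),
    }
}
```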

william-ch-databricks · 2026-04-14T22:32:29Z

Range-diff: main (6a0ea39 -> c6c465f)

.github/actions/install-and-cache/action.yml

@@ -0,0 +1,105 @@
+diff --git a/.github/actions/install-and-cache/action.yml b/.github/actions/install-and-cache/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/install-and-cache/action.yml
++# This is copied from https://github.com/tecolicom/actions-install-and-cache
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache generic backend
++description: 'GitHub Action to run installer and cache the result'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  run:     { required: true,  type: string }
++  path:    { required: true,  type: string }
++  cache:   { required: false, type: string, default: yes }
++  key:     { required: false, type: string }
++  sudo:    { required: false, type: string }
++  verbose: { required: false, type: string, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.cache.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup install-and-cache
++        define() { IFS='\n' read -r -d '' ${1} || true ; }
++        define script <<'EOS_cad8_c24e_'
++        ${{ inputs.run }}
++        EOS_cad8_c24e_
++        directory="${{ inputs.path }}"
++        given_key="${{ inputs.key }}"
++        archive= key=
++        case "${{ inputs.cache }}" in
++            yes|workflow)
++                cache="${{ inputs.cache }}"
++                uname -mrs
++                hash=$( (uname -mrs ; cat <<< "$script" ; echo $directory) | \
++                        (md5sum||md5) | awk '{print $1}' )
++                key="${hash}${given_key:+-$given_key}"
++                [ "$cache" == 'workflow' ] && \
++                    key+="-${{ github.run_id }}-${{ github.run_attempt }}"
++                archive=$HOME/archive-$hash.tz
++                ;;
++            *)
++                cache=no
++                ;;
++        esac
++        # use "--recursive-unlink" option if GNU tar is found
++        if tar --version | grep GNU > /dev/null
++        then
++            tar="tar --recursive-unlink"
++        elif gtar --version | grep GNU > /dev/null
++        then
++            tar="gtar --recursive-unlink"
++        else
++            tar=tar
++        fi
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            cache=$cache
++            archive=$archive
++            key=$key
++            tar=$tar
++        END
++
++    - id: cache
++      if: steps.setup.outputs.cache != 'no'
++      uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
++      with:
++        path: ${{ steps.setup.outputs.archive }}
++        key:  ${{ steps.setup.outputs.key }}
++
++    - id: extract
++      if: steps.setup.outputs.cache != 'no' && steps.cache.outputs.cache-hit == 'true'
++      shell: bash
++      run: |
++        : extract
++        archive="${{ steps.setup.outputs.archive }}"
++        verbose="${{ inputs.verbose }}"
++        tar="${{ steps.setup.outputs.tar }}"
++        ls -l $archive
++        if [ -s $archive ]
++        then
++            opt=-Pxz
++            [[ $verbose == yes || $verbose == true ]] && opt+=v
++            sudo $tar -C / $opt -f $archive
++        else
++            echo "$archive is empty"
++        fi
++
++    - id: install-and-archive
++      if: steps.cache.outputs.cache-hit != 'true'
++      uses: tecolicom/actions-install-and-archive@9d5afb27f9900f2df47fe40de58fbd837032bddf # v1.3
++      with:
++        run:     ${{ inputs.run }}
++        archive: ${{ steps.setup.outputs.archive }}
++        path:    ${{ inputs.path }}
++        sudo:    ${{ inputs.sudo }}
\ No newline at end of file

.github/actions/pr-title-validator/action.yml

@@ -0,0 +1,47 @@
+diff --git a/.github/actions/pr-title-validator/action.yml b/.github/actions/pr-title-validator/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/pr-title-validator/action.yml
++name: 'PR Title Validator'
++description: 'Validates a pull request title against a regex pattern'
++
++inputs:
++  regex:
++    description: 'Regular expression the PR title must match'
++    required: true
++  breaking-change-regex:
++    description: 'Regex to use instead when the breaking-change label is present'
++    required: false
++    default: ''
++  labels:
++    description: 'JSON array of label names on the PR'
++    required: false
++    default: '[]'
++  title:
++    description: 'PR title to validate. Defaults to github.event.pull_request.title.'
++    required: false
++    default: ''
++
++runs:
++  using: composite
++  steps:
++    - name: Validate PR title
++      shell: bash
++      env:
++        PR_TITLE: ${{ inputs.title || github.event.pull_request.title }}
++        INPUT_REGEX: ${{ inputs.regex }}
++        BREAKING_REGEX: ${{ inputs.breaking-change-regex }}
++        LABELS: ${{ inputs.labels }}
++      run: |
++        REGEX="$INPUT_REGEX"
++        if [[ -n "$BREAKING_REGEX" ]] && echo "$LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++          REGEX="$BREAKING_REGEX"
++          echo "breaking-change label detected, using breaking change regex."
++        fi
++
++        if [[ "$PR_TITLE" =~ $REGEX ]]; then
++          echo "PR title matches pattern."
++          exit 0
++        fi
++        echo "::error::PR title \"$PR_TITLE\" does not match pattern: $REGEX"
++        exit 1
\ No newline at end of file

.github/actions/use-homebrew-tools/action.yml

@@ -0,0 +1,51 @@
+diff --git a/.github/actions/use-homebrew-tools/action.yml b/.github/actions/use-homebrew-tools/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/use-homebrew-tools/action.yml
++# This is copied from https://github.com/tecolicom/actions-use-homebrew-tools/
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache homebrew tools
++description: 'GitHub Action to install and cache homebrew tools'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  tools:   { required: false, type: string }
++  key:     { required: false, type: string }
++  path:    { required: false, type: string }
++  cache:   { required: false, type: string, default: yes }
++  verbose: { required: false, type: boolean, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.update.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup use-homebrew-tools
++        given_key="${{ inputs.key }}"
++        brew_version="$(brew --version)"
++        echo "$brew_version"
++        version_key="$( echo "$brew_version" | (md5sum||md5) | awk '{print $1}' )"
++        key="${given_key:+$given_key-}${version_key}"
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            command=brew install
++            prefix=$(brew --prefix)
++            key=$key
++        END
++
++    - id: update
++      uses: ./.github/actions/install-and-cache
++      with:
++        run:     ${{ steps.setup.outputs.command }} ${{ inputs.tools }}
++        path:    ${{ steps.setup.outputs.prefix }} ${{ inputs.path }}
++        key:     ${{ steps.setup.outputs.key }}
++        cache:   ${{ inputs.cache }}
++        verbose: ${{ inputs.verbose }}
\ No newline at end of file

.github/workflows/auto-assign-pr.yml

@@ -0,0 +1,8 @@
+diff --git a/.github/workflows/auto-assign-pr.yml b/.github/workflows/auto-assign-pr.yml
+--- a/.github/workflows/auto-assign-pr.yml
++++ b/.github/workflows/auto-assign-pr.yml
+   assign-author:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: toshimaru/auto-author-assign@v2.1.1
++      - uses: toshimaru/auto-author-assign@16f0022cf3d7970c106d8d1105f75a1165edb516 # v2.1.1
\ No newline at end of file

.github/workflows/benchmark.yml

@@ -0,0 +1,86 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/benchmark.yml
++# issue_comment is used here to trigger on PR comments, as opposed to pull_request_review
++# (review submissions) or pull_request_review_comment (comments on the diff itself)
++# we want to trigger this on comment creation or edit
++on:
++  issue_comment:
++    types: [created, edited]
++name: Benchmarking PR performance
++jobs:
++  run-benchmark:
++    name: Run benchmarks
++    if: >
++      github.event.issue.pull_request &&
++      (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
++    runs-on: ubuntu-latest
++    permissions:
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
++    steps:
++      - name: Get PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++        with:
++          ref: ${{ steps.pr.outputs.head_sha }}
++      - uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file

.github/workflows/build.yml

@@ -0,0 +1,315 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ name: build
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   RUST_BACKTRACE: 1
+ 
++# Supply chain security: all cargo commands that resolve dependencies use --locked to
++# enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
++# (potentially compromised) dependency version. If Cargo.lock is out of sync with
++# Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
++#
++# Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
++# save-if restricts cache writes to main pushes only. PRs read from main's cache but
++# never write their own entries.
++#
++# The key insight: Cargo.lock changes infrequently, so main's cache key almost always
++# matches. PRs download and compile zero dependencies on cache hit. By only writing on
++# main, we keep main's cache entries alive (no LRU eviction from PR churn), and every
++# PR benefits from them.
++#
++# Without this, GHA's ref-scoped caching works against us: each PR writes ~6.3GB of
++# cache entries (14 jobs x ~450MB) that only that PR can read. A handful of active PRs
++# fills the 10GB cache budget, LRU evicts main's shared entries, and every subsequent
++# PR compiles from scratch.
++#
++# The save-if condition checks both event_name == 'push' and ref == main because
++# pull_request_target events set github.ref to the base branch (main), not the PR
++# branch. Without the event_name check, those workflows would write cache entries on
++# every PR.
++#
++# Note: actions-rust-lang/setup-rust-toolchain has built-in Swatinem/rust-cache that
++# writes on every run with no save-if support. We disable it with cache: false and
++# manage caching explicitly via the Swatinem/rust-cache steps below.
++
+ jobs:
+   format:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: rustfmt
+       - name: format
+         run: cargo fmt -- --check
+   msrv:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+       - name: verify-msrv
+   msrv-run-tests:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+-      - uses: taiki-e/install-action@nextest
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Get rust-version from Cargo.toml
+         id: rust-version
+         run: echo "RUST_VERSION=$(cargo msrv show --path kernel/ --output-format minimal)" >> $GITHUB_ENV
+       - name: Install specified rust version
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           toolchain: ${{ env.RUST_VERSION }}
+       - name: run tests
+         run: |
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+-          cargo +$(cargo msrv show --output-format minimal) nextest run
++          cargo +$(cargo msrv show --output-format minimal) nextest run --locked
+   docs:
+     runs-on: ubuntu-latest
+     env:
+       RUSTDOCFLAGS: -D warnings
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build docs
+-        run: cargo doc --workspace --all-features
+-
++        run: cargo doc --locked --workspace --all-features --no-deps
+ 
+   # When we run cargo { build, clippy } --no-default-features, we want to build/lint the kernel to
+   # ensure that we can build the kernel without any features enabled. Unfortunately, due to how
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with clippy
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: clippy
+-      - uses: Swatinem/rust-cache@v2
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build and lint with clippy
+-        run: cargo clippy --benches --tests --all-features -- -D warnings
++        run: cargo clippy --locked --benches --tests --all-features -- -D warnings
+       - name: lint without default features - packages which depend on kernel with features enabled
+-        run: cargo clippy --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+       - name: lint without default features - packages which don't depend on kernel with features enabled
+-        run: cargo clippy --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
+       - name: check kernel builds with default-engine-native-tls
+-        run: cargo build -p feature_tests --features default-engine-native-tls
++        run: cargo build --locked -p feature_tests --features default-engine-native-tls
++      - name: test native-tls backend has no crypto provider conflict
++        run: cargo test --locked -p feature_tests --features default-engine-native-tls
+       - name: check kernel builds with default-engine-rustls
+-        run: cargo build -p feature_tests --features default-engine-rustls
++        run: cargo build --locked -p feature_tests --features default-engine-rustls
++      - name: test rustls TLS backend feature-tests
++        run: cargo test --locked -p feature_tests --features default-engine-rustls
+   test:
+     runs-on: ${{ matrix.os }}
+     strategy:
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++      - uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
++        id: filter
++        with:
++          filters: |
++            ffi:
++              - 'ffi/src/handle.rs'
++              - 'ffi-proc-macros/**'
+       - name: Install minimal stable with clippy and rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: test
+-        run: cargo nextest run --workspace --all-features -E 'not test(read_table_version_hdfs)'
++        run: cargo nextest run --locked --workspace --all-features -E 'not test(read_table_version_hdfs) and not test(invalid_handle_code)'
++      - name: trybuild tests
++        if: steps.filter.outputs.ffi == 'true'
++        run: cargo test --locked --package delta_kernel_ffi --features internal-api -- invalid_handle_code
+ 
+   ffi_test:
+     runs-on: ${{ matrix.os }}
+           - macOS-latest
+           - ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Setup cmake
+-        uses: jwlawson/actions-setup-cmake@v2
++        uses: jwlawson/actions-setup-cmake@0d6a7d60b009d01c9e7523be22153ff8f19460d3 # v2.2.0
+         with:
+-          cmake-version: '3.30.x'
++          cmake-version: "3.30.x"
+       - name: Install arrow-glib-linux
+         run: |
+           if [ "$RUNNER_OS" == "Linux" ]; then
+            fi
+       - name: Install arrow-glib-macOS
+         if: runner.os == 'macOS'
+-        uses: tecolicom/actions-use-homebrew-tools@v1
++        uses: ./.github/actions/use-homebrew-tools
+         with:
+-          tools: 'apache-arrow apache-arrow-glib'
++          tools: "apache-arrow apache-arrow-glib"
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Set output on fail
+         run: echo "CTEST_OUTPUT_ON_FAILURE=1" >> "$GITHUB_ENV"
+       - name: Build kernel
+         run: |
+           pushd acceptance
+-          cargo build
++          cargo build --locked
+           popd
+           pushd ffi
+-          cargo b --features default-engine-rustls,test-ffi,tracing,uc-catalog
++          cargo build --locked --features default-engine-rustls,test-ffi,tracing,delta-kernel-unity-catalog
+           popd
+       - name: build and run read-table test
+         run: |
+           cmake ..
+           make
+           make test
+-      - name: build and run uc-catalog-ffi test
++      - name: build and run delta-kernel-unity-catalog-ffi test
+         run: |
+-          pushd ffi/examples/uc-catalog-example
++          pushd ffi/examples/delta-kernel-unity-catalog-example
+           mkdir build
+           pushd build
+           cmake ..
+           make
+           make test
+   miri:
+-    name: "Miri"
++    name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
++    strategy:
++      matrix:
++        partition: [1, 2, 3]
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install Miri
+         run: |
+           rustup toolchain install nightly --component miri
+           rustup override set nightly
+           cargo miri setup
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Test with Miri
+         run: |
+           pushd ffi
+-          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --features default-engine-rustls,uc-catalog
++          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --locked --features default-engine-rustls,delta-kernel-unity-catalog --partition slice:${{ matrix.partition }}/3
+ 
+   coverage:
+     runs-on: ubuntu-latest
+     env:
+       CARGO_TERM_COLOR: always
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install rust
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-llvm-cov
+-        uses: taiki-e/install-action@cargo-llvm-cov
++        uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
+       - name: Upload coverage to Codecov
+-        uses: codecov/codecov-action@v5
++        uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
+           files: codecov.json
+           fail_ci_if_error: true
\ No newline at end of file

.github/workflows/comment-on-title-failure.yml

@@ -0,0 +1,65 @@
+diff --git a/.github/workflows/comment-on-title-failure.yml b/.github/workflows/comment-on-title-failure.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/comment-on-title-failure.yml
++name: Comment on PR Title Failure
++
++on:
++  workflow_run:
++    workflows: ["Validate PR Title"]
++    types: [completed]
++
++jobs:
++  comment:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      # Step taken from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR info
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          # If the PR is from a fork, prefix it with `<owner-login>:`, otherwise only the PR branch name is relevant:
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        # Query the PR number by repo + branch, then assign to step output:
++        run: |
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++             --json 'number,title' --jq '"number=\(.number)\ntitle=\(.title)"' \
++             >> "${GITHUB_OUTPUT}"
++
++      - name: Find existing comment
++        id: find
++        uses: peter-evans/find-comment@3eae4d37986fb5a8592848f6a574fdf654e61f9e # v3.1.0
++        with:
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          comment-author: 'github-actions[bot]'
++          body-includes: PR title does not match the required pattern
++
++      - name: Post or update failure comment
++        if: ${{ github.event.workflow_run.conclusion == 'failure' }}
++        uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4.0.0
++        env:
++          PR_TITLE: ${{ steps.pr-context.outputs.title }}
++        with:
++          comment-id: ${{ steps.find.outputs.comment-id }}
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          body: |
++            PR title does not match the required pattern. Please ensure you follow the [conventional commits](https://www.conventionalcommits.org/) spec.
++
++            Your title should start with `feat:`, `fix:`, `chore:`, `docs:`, `perf:`, `refactor:`, `test:`, or `ci:`, and if it's a breaking change that should be suffixed with a `!` (like `feat!:`), and then a 1-72 character brief description of your change.
++
++            **Title:** `${{ env.PR_TITLE }}`
++
++      - name: Delete comment on success
++        if: ${{ github.event.workflow_run.conclusion == 'success' && steps.find.outputs.comment-id != '' }}
++        env:
++          GH_TOKEN: ${{ github.token }}
++        run: |
++          gh api repos/${{ github.repository }}/issues/comments/${{ steps.find.outputs.comment-id }} -X DELETE
\ No newline at end of file

.github/workflows/pr-validator.yml

@@ -0,0 +1,57 @@
+diff --git a/.github/workflows/pr-validator.yml b/.github/workflows/pr-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-validator.yml
++name: Validate PR Title
++
++on:
++  pull_request:
++    types: [opened, edited, reopened, synchronize, labeled, unlabeled]
++  workflow_run:
++    workflows: ["semver-label"] # we need this since auto-labels from jobs don't trigger a workflow
++    types: [completed]
++
++jobs:
++  validate-title:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Resolve PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ github.token }}
++          # Captured as env vars to prevent expression injection into the shell command.
++          PR_TITLE: ${{ github.event.pull_request.title }}
++          PR_LABELS_JSON: ${{ toJson(github.event.pull_request.labels.*.name) }}
++        run: |
++          if [[ "${{ github.event_name }}" == "workflow_run" ]]; then
++            pr_json=$(gh api --paginate repos/${{ github.repository }}/pulls \
++              --jq ".[] | select(.head.sha == \"${{ github.event.workflow_run.head_sha }}\")")
++            echo "number=$(echo "$pr_json" | jq -r '.number')" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$pr_json" | jq -r '.title'
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$pr_json" | jq -c '[.labels[].name]')" >> "$GITHUB_OUTPUT"
++          else
++            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$PR_TITLE"
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$PR_LABELS_JSON" | jq -c '.')" >> "$GITHUB_OUTPUT"
++          fi
++
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++
++      - uses: ./.github/actions/pr-title-validator
++        with:
++          regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
++          breaking-change-regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'
++          labels: ${{ steps.pr.outputs.labels }}
++          title: ${{ steps.pr.outputs.title }}
\ No newline at end of file
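
Reviewer note on the `title<<PR_TITLE_EOF` blocks above: the multiline delimiter form writes the value between a `KEY<<DELIM` line and a closing `DELIM` line, so an embedded newline stays part of the value instead of starting a new `key=value` pair. A minimal sketch of that write pattern follows; `write_output` and the temp file are hypothetical helpers for illustration and are not part of the workflow. A fixed delimiter assumes the value never contains a line equal to the delimiter itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): append KEY to FILE using the
# multiline delimiter syntax, so newlines in VALUE stay inside the value.
write_output() {
  local file="$1" key="$2" value="$3" delim="PR_TITLE_EOF"
  {
    echo "${key}<<${delim}"
    echo "$value"
    echo "${delim}"
  } >> "$file"
}

# Example: a title that tries to smuggle in a second key=value pair.
out_file="$(mktemp)"
write_output "$out_file" title $'feat: add thing\nnumber=999'
cat "$out_file"
```

The `number=999` line lands between the delimiter lines, so a parser that honors the delimiter form reads it as part of `title` rather than as a separate output.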

.github/workflows/run-examples.yml

@@ -0,0 +1,55 @@
+diff --git a/.github/workflows/run-examples.yml b/.github/workflows/run-examples.yml
+--- a/.github/workflows/run-examples.yml
++++ b/.github/workflows/run-examples.yml
+ name: run-examples
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   run-examples:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+ 
+       - name: Run all examples
+         run: |
+               # Special case for write-table: it needs a temp directory
+               if [ "$example_dir" = "write-table" ]; then
+                 tmp_dir=$(mktemp -d)
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
+                 rm -r "$tmp_dir"
+               # Special case for inspect-table: it needs an operation/subcommand, run each one
+               elif [ "$example_dir" = "inspect-table" ]; then
+                 for operation in table-version metadata schema scan-metadata actions; do
+                   echo "  Running inspect-table with operation: $operation"
+-                  cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
++                  cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
+                 done
+               # Special case for read-table-changes: skip running it in CI as it needs a specific CDF-enabled table
+               # but still verify it compiles
+               # TODO: Add a suitable test table for CDF
+               elif [ "$example_dir" = "read-table-changes" ]; then
+                 echo "Building read-table-changes (skipping run - requires CDF-enabled table)"
+-                cargo build --manifest-path "$example_dir/Cargo.toml" --release
++                cargo build --locked --manifest-path "$example_dir/Cargo.toml" --release
+               else
+                 # All other examples run with the test table path
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
+               fi
+ 
+               echo ""
\ No newline at end of file

.github/workflows/run_integration_test.yml

@@ -0,0 +1,70 @@
+diff --git a/.github/workflows/run_integration_test.yml b/.github/workflows/run_integration_test.yml
+--- a/.github/workflows/run_integration_test.yml
++++ b/.github/workflows/run_integration_test.yml
+-name: Run tests to ensure we can compile across arrow versions
++# TODO: Disabled. The test script runs cargo update which resolves fresh dependencies,
++#       bypassing the Cargo.lock supply chain policy (see build.yml top-level comment).
+ 
+-on: [workflow_dispatch, push, pull_request]
+-
+-jobs:
+-  arrow_integration_test:
+-    runs-on: ${{ matrix.os }}
+-    timeout-minutes: 20
+-    strategy:
+-      fail-fast: false
+-      matrix:
+-        include:
+-          - os: macOS-latest
+-          - os: ubuntu-latest
+-          - os: windows-latest
+-            skip: ${{ github.event_name == 'pull_request' }} # skip running windows tests on every PR since they are slow
+-    steps:
+-      - name: Skip job for pull requests on Windows
+-        if: ${{ matrix.skip }}
+-        run: echo "Skipping job for pull requests on Windows."
+-      - uses: actions/checkout@v4
+-        if: ${{ !matrix.skip }}
+-      - name: Setup rust toolchain
+-        if: ${{ !matrix.skip }}
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - name: Run integration tests
+-        if: ${{ !matrix.skip }}
+-        shell: bash
+-        run: pushd integration-tests && ./test-all-arrow-versions.sh
++# name: Run tests to ensure we can compile across arrow versions
++#
++# on: [workflow_dispatch, push, pull_request, merge_group]
++#
++# jobs:
++#   arrow_integration_test:
++#     runs-on: ${{ matrix.os }}
++#     timeout-minutes: 20
++#     strategy:
++#       fail-fast: false
++#       matrix:
++#         include:
++#           - os: macOS-latest
++#           - os: ubuntu-latest
++#           - os: windows-latest
++#             skip: ${{ github.event_name == 'pull_request' || github.event_name == 'merge_group' }} # skip running windows tests on PRs and merge queue since they are slow
++#     steps:
++#       - name: Skip job for pull requests on Windows
++#         if: ${{ matrix.skip }}
++#         run: echo "Skipping job for pull requests on Windows."
++#       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++#         if: ${{ !matrix.skip }}
++#       - name: Setup rust toolchain
++#         if: ${{ !matrix.skip }}
++#         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++#         with:
++#           cache: false
++#       # See build.yml top-level comment for why save-if is restricted to main.
++#       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++#         if: ${{ !matrix.skip }}
++#         with:
++#           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++#       - name: Run integration tests
++#         if: ${{ !matrix.skip }}
++#         shell: bash
++#         run: pushd integration-tests && ./test-all-arrow-versions.sh
\ No newline at end of file

.github/workflows/semver-checks.yml

@@ -0,0 +1,136 @@
+diff --git a/.github/workflows/semver-checks.yml b/.github/workflows/semver-checks.yml
+--- a/.github/workflows/semver-checks.yml
++++ b/.github/workflows/semver-checks.yml
+ name: semver-checks
+ 
+-# Trigger when a PR is opened or changed
++# Trigger when a PR is opened or changed. This runs with `pull_request` trigger, which means it has
++# only read perms. The adding of the label happens in semver-label.yml via workflow_run which
++# will look at the status of this job, and always runs in the base-repo context.
+ on:
+-  pull_request_target:
++  pull_request:
+     types:
+       - opened
+       - synchronize
+       - reopened
++  merge_group:
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   check_if_pr_breaks_semver:
+     runs-on: ubuntu-latest
+     permissions:
+-      # this job runs with read because it checks out the PR head which could contain malicious code
+       contents: read
+     steps:
+-      - uses: actions/checkout@v4
+-        name: checkout full rep
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           fetch-depth: 0
+-          ref: ${{ github.event.pull_request.head.sha }}
++          ref: >-
++            ${{ github.event_name == 'merge_group'
++                && github.event.merge_group.head_sha
++                || github.event.pull_request.head.sha }}
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-semver-checks
++        uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
++        with:
++          tool: cargo-semver-checks
++      - name: Compute baseline revision
++        id: baseline
+         shell: bash
++        env:
++          MERGE_GROUP_BASE_SHA: ${{ github.event.merge_group.base_sha }}
++          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
++          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
+         run: |
+-          cargo install cargo-semver-checks --locked
+-      - name: Run check
++          if [ "${{ github.event_name }}" = "merge_group" ]; then
++            echo "rev=${MERGE_GROUP_BASE_SHA}" >> "$GITHUB_OUTPUT"
++          else
++            # Use the merge-base instead of the PR base SHA. The base SHA is the tip of
++            # the target branch when the webhook fires, which can differ from where the PR
++            # actually diverged. Using merge-base avoids false positives when the PR branch
++            # is behind the target branch.
++            MERGE_BASE=$(git merge-base "$PR_HEAD_SHA" "$PR_BASE_SHA")
++            echo "rev=${MERGE_BASE}" >> "$GITHUB_OUTPUT"
++          fi
++      - name: Run semver check
+         id: check
+         continue-on-error: true
+         shell: bash
++        env:
++          BASELINE_REV: ${{ steps.baseline.outputs.rev }}
+         # only check semver on released crates (delta_kernel and delta_kernel_ffi).
+         # note that this won't run on proc macro/derive crates, so don't need to include
+         # delta_kernel_derive etc.
+         run: |
+-          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features --baseline-rev ${{ github.event.pull_request.base.sha }}
+-      - name: On Failure
+-        id: set_failure
+-        if: ${{ steps.check.outcome == 'failure' }}
+-        run: |
+-          echo "Checks failed"
+-          echo "check_status=failure" >> $GITHUB_OUTPUT
+-      - name: On Success
+-        id: set_success
+-        if: ${{ steps.check.outcome == 'success' }}
+-        run: |
+-          echo "Checks succeed"
+-          echo "check_status=success" >> $GITHUB_OUTPUT
+-    outputs:
+-      check_status: ${{ steps.set_failure.outputs.check_status || steps.set_success.outputs.check_status }}
+-  update_label_if_needed:
+-    needs: check_if_pr_breaks_semver
+-    runs-on: ubuntu-latest
+-    permissions:
+-      # this job only looks at previous output and then sets a label, so malicious code in the PR
+-      # isn't a concern
+-      pull-requests: write
+-    steps:
+-      - name: On Failure
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'failure'
+-        uses: actions-ecosystem/action-add-labels@v1
++          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features \
++            --baseline-rev "$BASELINE_REV"
++      # Upload the step outcome as an artifact so semver-label.yml can read it via workflow_run.
++      # steps.check.outcome is the raw result *before* continue-on-error converts it to "success",
++      # so it correctly reflects whether a breaking change was detected.
++      # Only upload for pull_request events; merge_group runs have no PR to label.
++      - name: Save semver outcome
++        if: github.event_name == 'pull_request'
++        env:
++          SEMVER_OUTCOME: ${{ steps.check.outcome }}
++        run: echo "$SEMVER_OUTCOME" > semver-outcome.txt
++      - name: Upload semver outcome
++        if: github.event_name == 'pull_request'
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+         with:
+-          labels: breaking-change
+-      - name: Remove breaking-change label
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success' && contains(github.event.pull_request.labels.*.name, 'breaking-change')
+-        uses: actions-ecosystem/action-remove-labels@v1
+-        with:
+-          labels: breaking-change
+-      - name: On Success
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success'
+-        run: |
+-          echo "Checks succeed"
+-      - name: Fail On Incorrect Previous Output
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status != 'success' && needs.check_if_pr_breaks_semver.outputs.check_status != 'failure'
+-        run: exit 1
++          name: semver-outcome
++          path: semver-outcome.txt
++          retention-days: 1
\ No newline at end of file

.github/workflows/semver-label.yml

@@ -0,0 +1,81 @@
+diff --git a/.github/workflows/semver-label.yml b/.github/workflows/semver-label.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/semver-label.yml
++name: semver-label
++
++# Apply or remove the breaking-change label based on the outcome of the semver-checks workflow.
++# This must be a separate workflow from semver-checks.yml: label writes require pull-requests:write,
++# which is unavailable in pull_request workflows triggered by fork PRs. workflow_run always runs
++# in the base-repo context with full write permissions, and never executes PR code.
++on:
++  workflow_run:
++    workflows: ["semver-checks"]
++    types: [completed]
++
++jobs:
++  update_label_if_needed:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++      actions: read
++    # Label updates only apply to PRs; merge_group runs have no associated PR to label.
++    if: github.event.workflow_run.event == 'pull_request'
++    steps:
++      # Resolve PR number from the triggering workflow run's branch. For fork PRs the branch
++      # must be prefixed with `<owner>:` so gh pr view can locate it.
++      # Pattern from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR number
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        run: |
++          echo "Looking up PR for branch '${PR_BRANCH}' in repo '${PR_TARGET_REPO}'"
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++            --json 'number' --jq '"number=\(.number)"' \
++            >> "${GITHUB_OUTPUT}"
++          echo "PR lookup complete: $(cat "${GITHUB_OUTPUT}")"
++
++      # Download the semver outcome artifact written by semver-checks.yml.
++      # steps.check.outcome in that workflow is the raw result before continue-on-error
++      # converts it to "success", so it correctly reflects whether a breaking change was found.
++      - name: Download semver outcome
++        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
++        with:
++          name: semver-outcome
++          github-token: ${{ github.token }}
++          run-id: ${{ github.event.workflow_run.id }}
++
++      - name: Update breaking-change label
++        if: steps.pr-context.outputs.number != ''
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_NUMBER: ${{ steps.pr-context.outputs.number }}
++        run: |
++          STEP_OUTCOME=$(cat semver-outcome.txt)
++          echo "Semver check outcome: '${STEP_OUTCOME}' for PR #${PR_NUMBER}"
++
++          if [[ "$STEP_OUTCOME" == "failure" ]]; then
++            echo "Breaking change detected -- adding 'breaking-change' label to PR #$PR_NUMBER"
++            gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --add-label "breaking-change"
++          elif [[ "$STEP_OUTCOME" == "success" ]]; then
++            # Remove the label only if it is currently present; gh pr edit fails on absent labels.
++            CURRENT_LABELS=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --json labels --jq '[.labels[].name]')
++            echo "Current PR labels: $CURRENT_LABELS"
++            if echo "$CURRENT_LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++              echo "Semver check passed -- removing 'breaking-change' label from PR #$PR_NUMBER"
++              gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --remove-label "breaking-change"
++            else
++              echo "Semver check passed -- 'breaking-change' label not present, nothing to do"
++            fi
++          else
++            echo "ERROR: unexpected semver outcome '${STEP_OUTCOME}' in semver-outcome.txt"
++            exit 1
++          fi
\ No newline at end of file
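
Reviewer note on the `Update breaking-change label` step above: its branching reduces to a small decision table over the semver outcome and whether the label is already present. The sketch below restates that table as a pure function so it can be checked without calling `gh`; `decide_label_action` and its `true`/`false` second argument are hypothetical and stand in for the `jq` label lookup in the real step.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): map the semver outcome and the
# current label state to the action the step would take.
# Prints: add | remove | none. Returns 1 on any unexpected outcome.
decide_label_action() {
  local outcome="$1" has_label="$2"
  case "$outcome" in
    failure)
      # A failed semver check means a breaking change was detected.
      echo add
      ;;
    success)
      # Only remove the label when it is actually present.
      if [[ "$has_label" == "true" ]]; then echo remove; else echo none; fi
      ;;
    *)
      echo "unexpected semver outcome: $outcome" >&2
      return 1
      ;;
  esac
}

decide_label_action failure false
decide_label_action success true
decide_label_action success false
```

As in the workflow step, any outcome other than `failure` or `success` is treated as an error rather than silently ignored.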

.gitignore

@@ -0,0 +1,31 @@
+diff --git a/.gitignore b/.gitignore
+--- a/.gitignore
++++ b/.gitignore
+ 
+ # IDE
+ .claude/
++.cursor/
+ .dir-locals.el
+ .idea/
+ .vscode/
+ .zed
+ .cache/
+ .clangd
++*.*~
+ 
+ # Rust
++.cargo-home
+ target/
+-/Cargo.lock
+ integration-tests/Cargo.lock
+ 
+ # Project
+ acceptance/tests/dat/
++acceptance/workloads/
+ ffi/examples/read-table/build
++ffi/examples/visit-expression/build
+ /build
+ /kernel/target
+ /target
++
++/benchmarks/workloads/
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: `git range-diff 8737564..6a0ea39 7b1612f..c6c465f` | Disable: `git config gitstack.push-range-diff false`

scottsand-db

Looks AWESOME! Thanks! Left some comments

william-ch-databricks · 2026-04-15T01:06:22Z

Range-diff: main (c6c465f -> bf96beb)

.github/actions/install-and-cache/action.yml

@@ -0,0 +1,105 @@
+diff --git a/.github/actions/install-and-cache/action.yml b/.github/actions/install-and-cache/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/install-and-cache/action.yml
++# This is copied from https://github.com/tecolicom/actions-install-and-cache
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache generic backend
++description: 'GitHub Action to run installer and cache the result'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  run:     { required: true,  type: string }
++  path:    { required: true,  type: string }
++  cache:   { required: false, type: string, default: yes }
++  key:     { required: false, type: string }
++  sudo:    { required: false, type: string }
++  verbose: { required: false, type: string, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.cache.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup install-and-cache
++        define() { IFS='\n' read -r -d '' ${1} || true ; }
++        define script <<'EOS_cad8_c24e_'
++        ${{ inputs.run }}
++        EOS_cad8_c24e_
++        directory="${{ inputs.path }}"
++        given_key="${{ inputs.key }}"
++        archive= key=
++        case "${{ inputs.cache }}" in
++            yes|workflow)
++                cache="${{ inputs.cache }}"
++                uname -mrs
++                hash=$( (uname -mrs ; cat <<< "$script" ; echo $directory) | \
++                        (md5sum||md5) | awk '{print $1}' )
++                key="${hash}${given_key:+-$given_key}"
++                [ "$cache" == 'workflow' ] && \
++                    key+="-${{ github.run_id }}-${{ github.run_attempt }}"
++                archive=$HOME/archive-$hash.tz
++                ;;
++            *)
++                cache=no
++                ;;
++        esac
++        # use "--recursive-unlink" option if GNU tar is found
++        if tar --version | grep GNU > /dev/null
++        then
++            tar="tar --recursive-unlink"
++        elif gtar --version | grep GNU > /dev/null
++        then
++            tar="gtar --recursive-unlink"
++        else
++            tar=tar
++        fi
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            cache=$cache
++            archive=$archive
++            key=$key
++            tar=$tar
++        END
++
++    - id: cache
++      if: steps.setup.outputs.cache != 'no'
++      uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
++      with:
++        path: ${{ steps.setup.outputs.archive }}
++        key:  ${{ steps.setup.outputs.key }}
++
++    - id: extract
++      if: steps.setup.outputs.cache != 'no' && steps.cache.outputs.cache-hit == 'true'
++      shell: bash
++      run: |
++        : extract
++        archive="${{ steps.setup.outputs.archive }}"
++        verbose="${{ inputs.verbose }}"
++        tar="${{ steps.setup.outputs.tar }}"
++        ls -l $archive
++        if [ -s $archive ]
++        then
++            opt=-Pxz
++            [[ $verbose == yes || $verbose == true ]] && opt+=v
++            sudo $tar -C / $opt -f $archive
++        else
++            echo "$archive is empty"
++        fi
++
++    - id: install-and-archive
++      if: steps.cache.outputs.cache-hit != 'true'
++      uses: tecolicom/actions-install-and-archive@9d5afb27f9900f2df47fe40de58fbd837032bddf # v1.3
++      with:
++        run:     ${{ inputs.run }}
++        archive: ${{ steps.setup.outputs.archive }}
++        path:    ${{ inputs.path }}
++        sudo:    ${{ inputs.sudo }}
\ No newline at end of file

.github/actions/pr-title-validator/action.yml

@@ -0,0 +1,47 @@
+diff --git a/.github/actions/pr-title-validator/action.yml b/.github/actions/pr-title-validator/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/pr-title-validator/action.yml
++name: 'PR Title Validator'
++description: 'Validates a pull request title against a regex pattern'
++
++inputs:
++  regex:
++    description: 'Regular expression the PR title must match'
++    required: true
++  breaking-change-regex:
++    description: 'Regex to use instead when the breaking-change label is present'
++    required: false
++    default: ''
++  labels:
++    description: 'JSON array of label names on the PR'
++    required: false
++    default: '[]'
++  title:
++    description: 'PR title to validate. Defaults to github.event.pull_request.title.'
++    required: false
++    default: ''
++
++runs:
++  using: composite
++  steps:
++    - name: Validate PR title
++      shell: bash
++      env:
++        PR_TITLE: ${{ inputs.title || github.event.pull_request.title }}
++        INPUT_REGEX: ${{ inputs.regex }}
++        BREAKING_REGEX: ${{ inputs.breaking-change-regex }}
++        LABELS: ${{ inputs.labels }}
++      run: |
++        REGEX="$INPUT_REGEX"
++        if [[ -n "$BREAKING_REGEX" ]] && echo "$LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++          REGEX="$BREAKING_REGEX"
++          echo "breaking-change label detected, using breaking change regex."
++        fi
++
++        if [[ "$PR_TITLE" =~ $REGEX ]]; then
++          echo "PR title matches pattern."
++          exit 0
++        fi
++        echo "::error::PR title \"$PR_TITLE\" does not match pattern: $REGEX"
++        exit 1
\ No newline at end of file
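
Reviewer note on the validator above: the two patterns passed in from pr-validator.yml differ only in whether `!` is optional (`!?`) or required (`!`). The sketch below exercises both with bash's `[[ =~ ]]`, the same matching construct the action uses. `validate_title` and its `true`/`false` flag are hypothetical; the flag replaces the action's `jq` lookup of the `breaking-change` label.

```shell
#!/usr/bin/env bash
# Patterns copied from pr-validator.yml; everything else here is illustrative.
REGEX='^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
BREAKING_REGEX='^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'

# Hypothetical helper: validate_title TITLE IS_BREAKING -> prints "ok" or "bad".
validate_title() {
  local title="$1" is_breaking="$2" regex="$REGEX"
  # With the breaking-change label present, the stricter pattern applies.
  if [[ "$is_breaking" == "true" ]]; then
    regex="$BREAKING_REGEX"
  fi
  if [[ "$title" =~ $regex ]]; then echo ok; else echo bad; fi
}

validate_title 'refactor!: separate read state from effective state in Transaction' true
validate_title 'refactor: separate read state from effective state in Transaction' true
validate_title 'feat(kernel): add column' false
```

With the label present, a title without `!` is rejected, which is what forces a breaking PR to carry the `!` marker in its title.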

.github/actions/use-homebrew-tools/action.yml

@@ -0,0 +1,51 @@
+diff --git a/.github/actions/use-homebrew-tools/action.yml b/.github/actions/use-homebrew-tools/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/use-homebrew-tools/action.yml
++# This is copied from https://github.com/tecolicom/actions-use-homebrew-tools/
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache homebrew tools
++description: 'GitHub Action to install and cache homebrew tools'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  tools:   { required: false, type: string }
++  key:     { required: false, type: string }
++  path:    { required: false, type: string }
++  cache:   { required: false, type: string, default: yes }
++  verbose: { required: false, type: boolean, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.update.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup use-homebrew-tools
++        given_key="${{ inputs.key }}"
++        brew_version="$(brew --version)"
++        echo "$brew_version"
++        version_key="$( echo "$brew_version" | (md5sum||md5) | awk '{print $1}' )"
++        key="${given_key:+$given_key-}${version_key}"
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            command=brew install
++            prefix=$(brew --prefix)
++            key=$key
++        END
++
++    - id: update
++      uses: ./.github/actions/install-and-cache
++      with:
++        run:     ${{ steps.setup.outputs.command }} ${{ inputs.tools }}
++        path:    ${{ steps.setup.outputs.prefix }} ${{ inputs.path }}
++        key:     ${{ steps.setup.outputs.key }}
++        cache:   ${{ inputs.cache }}
++        verbose: ${{ inputs.verbose }}
\ No newline at end of file

.github/workflows/auto-assign-pr.yml

@@ -0,0 +1,8 @@
+diff --git a/.github/workflows/auto-assign-pr.yml b/.github/workflows/auto-assign-pr.yml
+--- a/.github/workflows/auto-assign-pr.yml
++++ b/.github/workflows/auto-assign-pr.yml
+   assign-author:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: toshimaru/auto-author-assign@v2.1.1
++      - uses: toshimaru/auto-author-assign@16f0022cf3d7970c106d8d1105f75a1165edb516 # v2.1.1
\ No newline at end of file

.github/workflows/benchmark.yml

@@ -0,0 +1,86 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/benchmark.yml
++# issue_comment is used here to trigger on PR comments, as opposed to pull_request_review
++# (review submissions) or pull_request_review_comment (comments on the diff itself)
++# we want to trigger this on comment creation or edit
++on:
++  issue_comment:
++    types: [created, edited]
++name: Benchmarking PR performance
++jobs:
++  run-benchmark:
++    name: Run benchmarks
++    if: >
++      github.event.issue.pull_request &&
++      (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
++    runs-on: ubuntu-latest
++    permissions:
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
++    steps:
++      - name: Get PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++        with:
++          ref: ${{ steps.pr.outputs.head_sha }}
++      - uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file
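
The two guards in the "Get PR metadata" step (reject embedded newlines, then allow only a fixed character set) can be tried locally with a small sketch. The function name `validate_ref` and the sample inputs are hypothetical. The sketch swaps in a portable `case` pattern for the workflow's bash-only `[[ ... =~ ... ]]` test, so treat it as an approximation of the same allowlist rather than the workflow's exact code.

```shell
#!/bin/sh
# Hypothetical helper approximating the BASE_REF checks from the workflow.
# Returns 0 if the ref contains only allowlisted characters, 1 otherwise.
validate_ref() {
  case "$1" in
    # An empty value is rejected (the workflow's regex requires at least one character).
    '') return 1 ;;
    # Any character outside the allowlist is rejected. A newline is outside
    # the allowlist, so this also covers the workflow's separate newline check.
    *[!a-zA-Z0-9/_.-]*) return 1 ;;
  esac
  return 0
}

# Sample inputs are illustrative only.
for candidate in "main" "release/1.2" "main; rm -rf /"; do
  if validate_ref "$candidate"; then
    echo "accepted: $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

The newline guard matters because each line appended to `$GITHUB_OUTPUT` is read as its own `key=value` entry, so a value that carried a newline could introduce an extra output.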

.github/workflows/build.yml

@@ -0,0 +1,315 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ name: build
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   RUST_BACKTRACE: 1
+ 
++# Supply chain security: all cargo commands that resolve dependencies use --locked to
++# enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
++# (potentially compromised) dependency version. If Cargo.lock is out of sync with
++# Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
++#
++# Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
++# save-if restricts cache writes to main pushes only. PRs read from main's cache but
++# never write their own entries.
++#
++# The key insight: Cargo.lock changes infrequently, so main's cache key almost always
++# matches. PRs download and compile zero dependencies on cache hit. By only writing on
++# main, we keep main's cache entries alive (no LRU eviction from PR churn), and every
++# PR benefits from them.
++#
++# Without this, GHA's ref-scoped caching works against us: each PR writes ~6.3GB of
++# cache entries (14 jobs x ~450MB) that only that PR can read. A handful of active PRs
++# fills the 10GB cache budget, LRU evicts main's shared entries, and every subsequent
++# PR compiles from scratch.
++#
++# The save-if condition checks both event_name == 'push' and ref == main because
++# pull_request_target events set github.ref to the base branch (main), not the PR
++# branch. Without the event_name check, those workflows would write cache entries on
++# every PR.
++#
++# Note: actions-rust-lang/setup-rust-toolchain has built-in Swatinem/rust-cache that
++# writes on every run with no save-if support. We disable it with cache: false and
++# manage caching explicitly via the Swatinem/rust-cache steps below.
++
+ jobs:
+   format:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: rustfmt
+       - name: format
+         run: cargo fmt -- --check
+   msrv:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+       - name: verify-msrv
+   msrv-run-tests:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+-      - uses: taiki-e/install-action@nextest
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Get rust-version from Cargo.toml
+         id: rust-version
+         run: echo "RUST_VERSION=$(cargo msrv show --path kernel/ --output-format minimal)" >> $GITHUB_ENV
+       - name: Install specified rust version
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           toolchain: ${{ env.RUST_VERSION }}
+       - name: run tests
+         run: |
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+-          cargo +$(cargo msrv show --output-format minimal) nextest run
++          cargo +$(cargo msrv show --output-format minimal) nextest run --locked
+   docs:
+     runs-on: ubuntu-latest
+     env:
+       RUSTDOCFLAGS: -D warnings
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build docs
+-        run: cargo doc --workspace --all-features
+-
++        run: cargo doc --locked --workspace --all-features --no-deps
+ 
+   # When we run cargo { build, clippy } --no-default-features, we want to build/lint the kernel to
+   # ensure that we can build the kernel without any features enabled. Unfortunately, due to how
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with clippy
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: clippy
+-      - uses: Swatinem/rust-cache@v2
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build and lint with clippy
+-        run: cargo clippy --benches --tests --all-features -- -D warnings
++        run: cargo clippy --locked --benches --tests --all-features -- -D warnings
+       - name: lint without default features - packages which depend on kernel with features enabled
+-        run: cargo clippy --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+       - name: lint without default features - packages which don't depend on kernel with features enabled
+-        run: cargo clippy --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
+       - name: check kernel builds with default-engine-native-tls
+-        run: cargo build -p feature_tests --features default-engine-native-tls
++        run: cargo build --locked -p feature_tests --features default-engine-native-tls
++      - name: test native-tls backend has no crypto provider conflict
++        run: cargo test --locked -p feature_tests --features default-engine-native-tls
+       - name: check kernel builds with default-engine-rustls
+-        run: cargo build -p feature_tests --features default-engine-rustls
++        run: cargo build --locked -p feature_tests --features default-engine-rustls
++      - name: test rustls TLS backend feature-tests
++        run: cargo test --locked -p feature_tests --features default-engine-rustls
+   test:
+     runs-on: ${{ matrix.os }}
+     strategy:
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++      - uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
++        id: filter
++        with:
++          filters: |
++            ffi:
++              - 'ffi/src/handle.rs'
++              - 'ffi-proc-macros/**'
+       - name: Install minimal stable with clippy and rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: test
+-        run: cargo nextest run --workspace --all-features -E 'not test(read_table_version_hdfs)'
++        run: cargo nextest run --locked --workspace --all-features -E 'not test(read_table_version_hdfs) and not test(invalid_handle_code)'
++      - name: trybuild tests
++        if: steps.filter.outputs.ffi == 'true'
++        run: cargo test --locked --package delta_kernel_ffi --features internal-api -- invalid_handle_code
+ 
+   ffi_test:
+     runs-on: ${{ matrix.os }}
+           - macOS-latest
+           - ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Setup cmake
+-        uses: jwlawson/actions-setup-cmake@v2
++        uses: jwlawson/actions-setup-cmake@0d6a7d60b009d01c9e7523be22153ff8f19460d3 # v2.2.0
+         with:
+-          cmake-version: '3.30.x'
++          cmake-version: "3.30.x"
+       - name: Install arrow-glib-linux
+         run: |
+           if [ "$RUNNER_OS" == "Linux" ]; then
+            fi
+       - name: Install arrow-glib-macOS
+         if: runner.os == 'macOS'
+-        uses: tecolicom/actions-use-homebrew-tools@v1
++        uses: ./.github/actions/use-homebrew-tools
+         with:
+-          tools: 'apache-arrow apache-arrow-glib'
++          tools: "apache-arrow apache-arrow-glib"
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Set output on fail
+         run: echo "CTEST_OUTPUT_ON_FAILURE=1" >> "$GITHUB_ENV"
+       - name: Build kernel
+         run: |
+           pushd acceptance
+-          cargo build
++          cargo build --locked
+           popd
+           pushd ffi
+-          cargo b --features default-engine-rustls,test-ffi,tracing,uc-catalog
++          cargo build --locked --features default-engine-rustls,test-ffi,tracing,delta-kernel-unity-catalog
+           popd
+       - name: build and run read-table test
+         run: |
+           cmake ..
+           make
+           make test
+-      - name: build and run uc-catalog-ffi test
++      - name: build and run delta-kernel-unity-catalog-ffi test
+         run: |
+-          pushd ffi/examples/uc-catalog-example
++          pushd ffi/examples/delta-kernel-unity-catalog-example
+           mkdir build
+           pushd build
+           cmake ..
+           make
+           make test
+   miri:
+-    name: "Miri"
++    name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
++    strategy:
++      matrix:
++        partition: [1, 2, 3]
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install Miri
+         run: |
+           rustup toolchain install nightly --component miri
+           rustup override set nightly
+           cargo miri setup
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Test with Miri
+         run: |
+           pushd ffi
+-          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --features default-engine-rustls,uc-catalog
++          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --locked --features default-engine-rustls,delta-kernel-unity-catalog --partition slice:${{ matrix.partition }}/3
+ 
+   coverage:
+     runs-on: ubuntu-latest
+     env:
+       CARGO_TERM_COLOR: always
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install rust
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-llvm-cov
+-        uses: taiki-e/install-action@cargo-llvm-cov
++        uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
+       - name: Upload coverage to Codecov
+-        uses: codecov/codecov-action@v5
++        uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
+           files: codecov.json
+           fail_ci_if_error: true
\ No newline at end of file
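
The cache-budget reasoning in the top-level comment of build.yml can be checked with a few lines of arithmetic. This is only a sketch: the inputs are the approximate figures quoted in that comment, the variable names are made up for illustration, and 10GB is taken as 10000 MB.

```shell
#!/bin/sh
# Back-of-the-envelope check of the cache figures quoted in the build.yml comment.
# All three inputs are approximate values from that comment, not measurements.
job_count=14        # jobs that would each write a cache entry
mb_per_job=450      # approximate size of one cache entry, in MB
budget_mb=10000     # 10GB cache budget, taken as 10000 MB

# Cache written by one PR if every job saved its own entry.
per_pr_mb=$(( job_count * mb_per_job ))

# Whole PRs whose entries fit in the budget (integer division).
prs_that_fit=$(( budget_mb / per_pr_mb ))

echo "cache written per PR: ${per_pr_mb} MB"
echo "whole PRs that fit in the budget: ${prs_that_fit}"
```

Under these assumptions a single PR's entries come to 6300 MB, so a second PR writing a full set would already exceed the budget. That is consistent with the comment's point that PR-scoped writes would evict main's shared entries.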

.github/workflows/comment-on-title-failure.yml

@@ -0,0 +1,65 @@
+diff --git a/.github/workflows/comment-on-title-failure.yml b/.github/workflows/comment-on-title-failure.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/comment-on-title-failure.yml
++name: Comment on PR Title Failure
++
++on:
++  workflow_run:
++    workflows: ["Validate PR Title"]
++    types: [completed]
++
++jobs:
++  comment:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      # Step taken from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR info
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          # If the PR is from a fork, prefix it with `<owner-login>:`, otherwise only the PR branch name is relevant:
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        # Query the PR number by repo + branch, then assign to step output:
++        run: |
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++             --json 'number,title' --jq '"number=\(.number)\ntitle=\(.title)"' \
++             >> "${GITHUB_OUTPUT}"
++
++      - name: Find existing comment
++        id: find
++        uses: peter-evans/find-comment@3eae4d37986fb5a8592848f6a574fdf654e61f9e # v3.1.0
++        with:
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          comment-author: 'github-actions[bot]'
++          body-includes: PR title does not match the required pattern
++
++      - name: Post or update failure comment
++        if: ${{ github.event.workflow_run.conclusion == 'failure' }}
++        uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4.0.0
++        env:
++          PR_TITLE: ${{ steps.pr-context.outputs.title }}
++        with:
++          comment-id: ${{ steps.find.outputs.comment-id }}
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          body: |
++            PR title does not match the required pattern. Please ensure you follow the [conventional commits](https://www.conventionalcommits.org/) spec.
++
++            Your title should start with `feat:`, `fix:`, `chore:`, `docs:`, `perf:`, `refactor:`, `test:`, or `ci:`, and if it's a breaking change that should be suffixed with a `!` (like `feat!:`), and then a 1-72 character brief description of your change.
++
++            **Title:** `${{ env.PR_TITLE }}`
++
++      - name: Delete comment on success
++        if: ${{ github.event.workflow_run.conclusion == 'success' && steps.find.outputs.comment-id != '' }}
++        env:
++          GH_TOKEN: ${{ github.token }}
++        run: |
++          gh api repos/${{ github.repository }}/issues/comments/${{ steps.find.outputs.comment-id }} -X DELETE
\ No newline at end of file
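
The `PR_BRANCH` expression in the "Find PR info" step picks between `<owner-login>:<branch>` for forks and the bare branch name otherwise. A minimal shell sketch of the same selection follows. The function name `pr_branch_selector` and the owner and branch names are hypothetical. One known difference: GitHub expressions compare strings case-insensitively, while the `[ ... != ... ]` test here is case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper reproducing the PR_BRANCH selection logic.
# Arguments: head repo owner, base repo owner, head branch name.
pr_branch_selector() {
  head_owner=$1
  base_owner=$2
  head_branch=$3
  if [ "$head_owner" != "$base_owner" ]; then
    # PR from a fork: prefix the branch with the fork owner's login.
    printf '%s:%s\n' "$head_owner" "$head_branch"
  else
    # PR from the same repository: the branch name alone is enough.
    printf '%s\n' "$head_branch"
  fi
}

# Illustrative inputs only.
pr_branch_selector "some-contributor" "delta-io" "fix-typo"
pr_branch_selector "delta-io" "delta-io" "stack/alter-table-1-refactor-state"
```

The resulting string is what the step passes to `gh pr view` to look up the PR number and title.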

.github/workflows/pr-validator.yml

@@ -0,0 +1,57 @@
+diff --git a/.github/workflows/pr-validator.yml b/.github/workflows/pr-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-validator.yml
++name: Validate PR Title
++
++on:
++  pull_request:
++    types: [opened, edited, reopened, synchronize, labeled, unlabeled]
++  workflow_run:
++    workflows: ["semver-label"] # we need this since auto-labels from jobs don't trigger a workflow
++    types: [completed]
++
++jobs:
++  validate-title:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Resolve PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ github.token }}
++          # Captured as env vars to prevent expression injection into the shell command.
++          PR_TITLE: ${{ github.event.pull_request.title }}
++          PR_LABELS_JSON: ${{ toJson(github.event.pull_request.labels.*.name) }}
++        run: |
++          if [[ "${{ github.event_name }}" == "workflow_run" ]]; then
++            pr_json=$(gh api --paginate repos/${{ github.repository }}/pulls \
++              --jq ".[] | select(.head.sha == \"${{ github.event.workflow_run.head_sha }}\")")
++            echo "number=$(echo "$pr_json" | jq -r '.number')" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$pr_json" | jq -r '.title'
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$pr_json" | jq -c '[.labels[].name]')" >> "$GITHUB_OUTPUT"
++          else
++            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$PR_TITLE"
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$PR_LABELS_JSON" | jq -c '.')" >> "$GITHUB_OUTPUT"
++          fi
++
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++
++      - uses: ./.github/actions/pr-title-validator
++        with:
++          regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
++          breaking-change-regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'
++          labels: ${{ steps.pr.outputs.labels }}
++          title: ${{ steps.pr.outputs.title }}
\ No newline at end of file
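
The two patterns passed to the title validator can be tried against sample titles with `grep -E`. This is a sketch under stated assumptions: the regex engine used by `./.github/actions/pr-title-validator` is not shown in this diff, so POSIX extended regular expressions stand in for it here. The function `classify_title` is hypothetical, and titles are assumed to be a single line.

```shell
#!/bin/sh
# Patterns copied from the pr-validator.yml inputs.
title_regex='^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
breaking_regex='^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'

# Hypothetical helper: prints "breaking", "valid", or "invalid" for a title.
# The breaking pattern is tried first because every breaking title also
# matches the general pattern.
classify_title() {
  if printf '%s\n' "$1" | grep -Eq "$breaking_regex"; then
    echo "breaking"
  elif printf '%s\n' "$1" | grep -Eq "$title_regex"; then
    echo "valid"
  else
    echo "invalid"
  fi
}

classify_title "refactor!: separate read state from effective state in Transaction"
classify_title "feat: add column"
classify_title "add column"
```

Note that both patterns place the `!` before the optional parenthesized scope, so under this reading a title such as `feat!(ffi): ...` matches while `feat(ffi)!: ...` does not.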

.github/workflows/run-examples.yml

@@ -0,0 +1,55 @@
+diff --git a/.github/workflows/run-examples.yml b/.github/workflows/run-examples.yml
+--- a/.github/workflows/run-examples.yml
++++ b/.github/workflows/run-examples.yml
+ name: run-examples
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   run-examples:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+ 
+       - name: Run all examples
+         run: |
+               # Special case for write-table: it needs a temp directory
+               if [ "$example_dir" = "write-table" ]; then
+                 tmp_dir=$(mktemp -d)
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
+                 rm -r "$tmp_dir"
+               # Special case for inspect-table: it needs an operation/subcommand, run each one
+               elif [ "$example_dir" = "inspect-table" ]; then
+                 for operation in table-version metadata schema scan-metadata actions; do
+                   echo "  Running inspect-table with operation: $operation"
+-                  cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
++                  cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
+                 done
+               # Special case for read-table-changes: skip running it in CI as it needs a specific CDF-enabled table
+               # but still verify it compiles
+               # TODO: Add a suitable test table for CDF
+               elif [ "$example_dir" = "read-table-changes" ]; then
+                 echo "Building read-table-changes (skipping run - requires CDF-enabled table)"
+-                cargo build --manifest-path "$example_dir/Cargo.toml" --release
++                cargo build --locked --manifest-path "$example_dir/Cargo.toml" --release
+               else
+                 # All other examples run with the test table path
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
+               fi
+ 
+               echo ""
\ No newline at end of file

.github/workflows/run_integration_test.yml

@@ -0,0 +1,70 @@
+diff --git a/.github/workflows/run_integration_test.yml b/.github/workflows/run_integration_test.yml
+--- a/.github/workflows/run_integration_test.yml
++++ b/.github/workflows/run_integration_test.yml
+-name: Run tests to ensure we can compile across arrow versions
++# TODO: Disabled. The test script runs cargo update which resolves fresh dependencies,
++#       bypassing the Cargo.lock supply chain policy (see build.yml top-level comment).
+ 
+-on: [workflow_dispatch, push, pull_request]
+-
+-jobs:
+-  arrow_integration_test:
+-    runs-on: ${{ matrix.os }}
+-    timeout-minutes: 20
+-    strategy:
+-      fail-fast: false
+-      matrix:
+-        include:
+-          - os: macOS-latest
+-          - os: ubuntu-latest
+-          - os: windows-latest
+-            skip: ${{ github.event_name == 'pull_request' }} # skip running windows tests on every PR since they are slow
+-    steps:
+-      - name: Skip job for pull requests on Windows
+-        if: ${{ matrix.skip }}
+-        run: echo "Skipping job for pull requests on Windows."
+-      - uses: actions/checkout@v4
+-        if: ${{ !matrix.skip }}
+-      - name: Setup rust toolchain
+-        if: ${{ !matrix.skip }}
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - name: Run integration tests
+-        if: ${{ !matrix.skip }}
+-        shell: bash
+-        run: pushd integration-tests && ./test-all-arrow-versions.sh
++# name: Run tests to ensure we can compile across arrow versions
++#
++# on: [workflow_dispatch, push, pull_request, merge_group]
++#
++# jobs:
++#   arrow_integration_test:
++#     runs-on: ${{ matrix.os }}
++#     timeout-minutes: 20
++#     strategy:
++#       fail-fast: false
++#       matrix:
++#         include:
++#           - os: macOS-latest
++#           - os: ubuntu-latest
++#           - os: windows-latest
++#             skip: ${{ github.event_name == 'pull_request' || github.event_name == 'merge_group' }} # skip running windows tests on PRs and merge queue since they are slow
++#     steps:
++#       - name: Skip job for pull requests on Windows
++#         if: ${{ matrix.skip }}
++#         run: echo "Skipping job for pull requests on Windows."
++#       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++#         if: ${{ !matrix.skip }}
++#       - name: Setup rust toolchain
++#         if: ${{ !matrix.skip }}
++#         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++#         with:
++#           cache: false
++#       # See build.yml top-level comment for why save-if is restricted to main.
++#       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++#         if: ${{ !matrix.skip }}
++#         with:
++#           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++#       - name: Run integration tests
++#         if: ${{ !matrix.skip }}
++#         shell: bash
++#         run: pushd integration-tests && ./test-all-arrow-versions.sh
\ No newline at end of file

.github/workflows/semver-checks.yml

@@ -0,0 +1,136 @@
+diff --git a/.github/workflows/semver-checks.yml b/.github/workflows/semver-checks.yml
+--- a/.github/workflows/semver-checks.yml
++++ b/.github/workflows/semver-checks.yml
+ name: semver-checks
+ 
+-# Trigger when a PR is opened or changed
++# Trigger when a PR is opened or changed. This runs with the `pull_request` trigger, which means it
++# has only read perms. The label is added in semver-label.yml via workflow_run, which
++# looks at the status of this job and always runs in the base-repo context.
+ on:
+-  pull_request_target:
++  pull_request:
+     types:
+       - opened
+       - synchronize
+       - reopened
++  merge_group:
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   check_if_pr_breaks_semver:
+     runs-on: ubuntu-latest
+     permissions:
+-      # this job runs with read because it checks out the PR head which could contain malicious code
+       contents: read
+     steps:
+-      - uses: actions/checkout@v4
+-        name: checkout full rep
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           fetch-depth: 0
+-          ref: ${{ github.event.pull_request.head.sha }}
++          ref: >-
++            ${{ github.event_name == 'merge_group'
++                && github.event.merge_group.head_sha
++                || github.event.pull_request.head.sha }}
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-semver-checks
++        uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
++        with:
++          tool: cargo-semver-checks
++      - name: Compute baseline revision
++        id: baseline
+         shell: bash
++        env:
++          MERGE_GROUP_BASE_SHA: ${{ github.event.merge_group.base_sha }}
++          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
++          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
+         run: |
+-          cargo install cargo-semver-checks --locked
+-      - name: Run check
++          if [ "${{ github.event_name }}" = "merge_group" ]; then
++            echo "rev=${MERGE_GROUP_BASE_SHA}" >> "$GITHUB_OUTPUT"
++          else
++            # Use the merge-base instead of the PR base SHA. The base SHA is the tip of
++            # the target branch when the webhook fires, which can differ from where the PR
++            # actually diverged. Using merge-base avoids false positives when the PR branch
++            # is behind the target branch.
++            MERGE_BASE=$(git merge-base "$PR_HEAD_SHA" "$PR_BASE_SHA")
++            echo "rev=${MERGE_BASE}" >> "$GITHUB_OUTPUT"
++          fi
++      - name: Run semver check
+         id: check
+         continue-on-error: true
+         shell: bash
++        env:
++          BASELINE_REV: ${{ steps.baseline.outputs.rev }}
+         # only check semver on released crates (delta_kernel and delta_kernel_ffi).
+         # note that this won't run on proc macro/derive crates, so don't need to include
+         # delta_kernel_derive etc.
+         run: |
+-          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features --baseline-rev ${{ github.event.pull_request.base.sha }}
+-      - name: On Failure
+-        id: set_failure
+-        if: ${{ steps.check.outcome == 'failure' }}
+-        run: |
+-          echo "Checks failed"
+-          echo "check_status=failure" >> $GITHUB_OUTPUT
+-      - name: On Success
+-        id: set_success
+-        if: ${{ steps.check.outcome == 'success' }}
+-        run: |
+-          echo "Checks succeed"
+-          echo "check_status=success" >> $GITHUB_OUTPUT
+-    outputs:
+-      check_status: ${{ steps.set_failure.outputs.check_status || steps.set_success.outputs.check_status }}
+-  update_label_if_needed:
+-    needs: check_if_pr_breaks_semver
+-    runs-on: ubuntu-latest
+-    permissions:
+-      # this job only looks at previous output and then sets a label, so malicious code in the PR
+-      # isn't a concern
+-      pull-requests: write
+-    steps:
+-      - name: On Failure
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'failure'
+-        uses: actions-ecosystem/action-add-labels@v1
++          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features \
++            --baseline-rev "$BASELINE_REV"
++      # Upload the step outcome as an artifact so semver-label.yml can read it via workflow_run.
++      # steps.check.outcome is the raw result *before* continue-on-error converts it to "success",
++      # so it correctly reflects whether a breaking change was detected.
++      # Only upload for pull_request events; merge_group runs have no PR to label.
++      - name: Save semver outcome
++        if: github.event_name == 'pull_request'
++        env:
++          SEMVER_OUTCOME: ${{ steps.check.outcome }}
++        run: echo "$SEMVER_OUTCOME" > semver-outcome.txt
++      - name: Upload semver outcome
++        if: github.event_name == 'pull_request'
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+         with:
+-          labels: breaking-change
+-      - name: Remove breaking-change label
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success' && contains(github.event.pull_request.labels.*.name, 'breaking-change')
+-        uses: actions-ecosystem/action-remove-labels@v1
+-        with:
+-          labels: breaking-change
+-      - name: On Success
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success'
+-        run: |
+-          echo "Checks succeed"
+-      - name: Fail On Incorrect Previous Output
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status != 'success' && needs.check_if_pr_breaks_semver.outputs.check_status != 'failure'
+-        run: exit 1
++          name: semver-outcome
++          path: semver-outcome.txt
++          retention-days: 1
\ No newline at end of file
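
The baseline step above picks `git merge-base` over the PR base SHA. The sketch below is an illustration only, not part of the workflow: it builds a throwaway repository (the `g` wrapper, branch name `feature`, and commit messages are all made up for the demo) and shows that once the target branch has moved on, the base tip differs from the commit where the PR actually diverged, while the merge-base still points at that commit.

```shell
#!/usr/bin/env bash
# Sketch: merge-base vs. base-branch tip as the semver baseline.
# Everything here is a throwaway demo repo; no real project state is touched.
set -euo pipefail

repo=$(mktemp -d)
# Hypothetical wrapper so the demo does not depend on the user's git config.
g() {
  git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false "$@"
}

g init -q
g commit -q --allow-empty -m "fork point"
base_branch=$(g symbolic-ref --short HEAD)
fork_point=$(g rev-parse HEAD)

# The PR branch diverges from the fork point.
g checkout -q -b feature
g commit -q --allow-empty -m "PR work"
pr_head=$(g rev-parse HEAD)

# The target branch moves on after the PR was opened.
g checkout -q "$base_branch"
g commit -q --allow-empty -m "later change on target branch"
base_tip=$(g rev-parse HEAD)

# Same call shape as the workflow: git merge-base <head> <base>.
merge_base=$(g merge-base "$pr_head" "$base_tip")

echo "fork point: $fork_point"
echo "base tip:   $base_tip"
echo "merge-base: $merge_base"

rm -rf "$repo"
```

Comparing against `base_tip` here would count the later change on the target branch as something the PR removed, which is the false positive the workflow comment describes.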

.github/workflows/semver-label.yml

@@ -0,0 +1,81 @@
+diff --git a/.github/workflows/semver-label.yml b/.github/workflows/semver-label.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/semver-label.yml
++name: semver-label
++
++# Apply or remove the breaking-change label based on the outcome of the semver-checks workflow.
++# This must be a separate workflow from semver-checks.yml: label writes require pull-requests:write,
++# which is unavailable in pull_request workflows triggered by fork PRs. workflow_run always runs
++# in the base-repo context with full write permissions, and never executes PR code.
++on:
++  workflow_run:
++    workflows: ["semver-checks"]
++    types: [completed]
++
++jobs:
++  update_label_if_needed:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++      actions: read
++    # Label updates only apply to PRs; merge_group runs have no associated PR to label.
++    if: github.event.workflow_run.event == 'pull_request'
++    steps:
++      # Resolve PR number from the triggering workflow run's branch. For fork PRs the branch
++      # must be prefixed with `<owner>:` so gh pr view can locate it.
++      # Pattern from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR number
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        run: |
++          echo "Looking up PR for branch '${PR_BRANCH}' in repo '${PR_TARGET_REPO}'"
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++            --json 'number' --jq '"number=\(.number)"' \
++            >> "${GITHUB_OUTPUT}"
++          echo "PR lookup complete: $(cat "${GITHUB_OUTPUT}")"
++
++      # Download the semver outcome artifact written by semver-checks.yml.
++      # steps.check.outcome in that workflow is the raw result before continue-on-error
++      # converts it to "success", so it correctly reflects whether a breaking change was found.
++      - name: Download semver outcome
++        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
++        with:
++          name: semver-outcome
++          github-token: ${{ github.token }}
++          run-id: ${{ github.event.workflow_run.id }}
++
++      - name: Update breaking-change label
++        if: steps.pr-context.outputs.number != ''
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_NUMBER: ${{ steps.pr-context.outputs.number }}
++        run: |
++          STEP_OUTCOME=$(cat semver-outcome.txt)
++          echo "Semver check outcome: '${STEP_OUTCOME}' for PR #${PR_NUMBER}"
++
++          if [[ "$STEP_OUTCOME" == "failure" ]]; then
++            echo "Breaking change detected -- adding 'breaking-change' label to PR #$PR_NUMBER"
++            gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --add-label "breaking-change"
++          elif [[ "$STEP_OUTCOME" == "success" ]]; then
++            # Remove the label only if it is currently present; gh pr edit fails on absent labels.
++            CURRENT_LABELS=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --json labels --jq '[.labels[].name]')
++            echo "Current PR labels: $CURRENT_LABELS"
++            if echo "$CURRENT_LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++              echo "Semver check passed -- removing 'breaking-change' label from PR #$PR_NUMBER"
++              gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --remove-label "breaking-change"
++            else
++              echo "Semver check passed -- 'breaking-change' label not present, nothing to do"
++            fi
++          else
++            echo "ERROR: unexpected semver outcome '${STEP_OUTCOME}' in semver-outcome.txt"
++            exit 1
++          fi
\ No newline at end of file
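
The label workflow above makes two small decisions: how to qualify the branch name for fork PRs, and what to do with the `breaking-change` label for a given outcome. A minimal sketch of both as pure functions, for local reasoning only; the function names `pr_branch_ref` and `decide_label_action` are hypothetical, and the `yes`/`no` flag stands in for the `jq` label lookup the workflow performs.

```shell
#!/usr/bin/env bash
# Sketch of the decision logic in semver-label.yml, with no gh or network calls.
# Function names are hypothetical; the workflow inlines this logic.

# Fork PRs need an "<owner>:" prefix so the branch can be resolved in the base repo.
pr_branch_ref() {
  local head_owner=$1 base_owner=$2 branch=$3
  if [ "$head_owner" != "$base_owner" ]; then
    printf '%s:%s\n' "$head_owner" "$branch"
  else
    printf '%s\n' "$branch"
  fi
}

# Map (semver outcome, label currently present?) to an action.
# Prints one of: add, remove, noop. Returns 1 on an unexpected outcome.
decide_label_action() {
  local outcome=$1 has_label=$2
  case "$outcome" in
    failure)
      echo add
      ;;
    success)
      # Only remove when present, since removing an absent label fails.
      if [ "$has_label" = "yes" ]; then echo remove; else echo noop; fi
      ;;
    *)
      echo "unexpected semver outcome '$outcome'" >&2
      return 1
      ;;
  esac
}

pr_branch_ref "some-fork-owner" "delta-io" "my-branch"
decide_label_action failure no
decide_label_action success yes
```

Keeping the unexpected-outcome branch as a hard error mirrors the workflow's final `exit 1`, so a truncated or missing artifact cannot silently clear the label.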

.gitignore

@@ -0,0 +1,31 @@
+diff --git a/.gitignore b/.gitignore
+--- a/.gitignore
++++ b/.gitignore
+ 
+ # IDE
+ .claude/
++.cursor/
+ .dir-locals.el
+ .idea/
+ .vscode/
+ .zed
+ .cache/
+ .clangd
++*.*~
+ 
+ # Rust
++.cargo-home
+ target/
+-/Cargo.lock
+ integration-tests/Cargo.lock
+ 
+ # Project
+ acceptance/tests/dat/
++acceptance/workloads/
+ ffi/examples/read-table/build
++ffi/examples/visit-expression/build
+ /build
+ /kernel/target
+ /target
++
++/benchmarks/workloads/
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff a5164b3..c6c465f 7b1612f..bf96beb | Disable: git config gitstack.push-range-diff false

william-ch-databricks · 2026-04-15T01:15:56Z

Range-diff: main (bf96beb -> b293bf7)

.github/workflows/benchmark.yml

@@ -0,0 +1,105 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+--- a/.github/workflows/benchmark.yml
++++ b/.github/workflows/benchmark.yml
+     types: [created, edited]
+ name: Benchmarking PR performance
+ jobs:
+-  runBenchmark:
++  run-benchmark:
+     name: Run benchmarks
+     if: >
+       github.event.issue.pull_request &&
+       (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
+     runs-on: ubuntu-latest
+     permissions:
+-      pull-requests: write
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
+     steps:
+-      - name: Parse benchmark tags
+-        env:
+-          COMMENT: ${{ github.event.comment.body }}
+-        run: |
+-          if [[ "$COMMENT" == "/bench" ]]; then
+-            TAGS="base"
+-          else
+-            TAGS="${COMMENT#/bench }"
+-            TAGS=$(echo "$TAGS" | tr -d '[:space:]')
+-          fi
+-          echo "BENCH_TAGS=$TAGS" >> "$GITHUB_ENV"
+-          echo "Parsed tags: $TAGS"
+-      - name: Get PR HEAD sha
++      - name: Get PR metadata
+         id: pr
+-        run: |
+-          PR_DATA=$(gh api repos/${{ github.repository }}/pulls/${{ github.event.issue.number }})
+-          echo "head_sha=$(echo "$PR_DATA" | jq -r .head.sha)" >> "$GITHUB_OUTPUT"
+-          echo "base_ref=$(echo "$PR_DATA" | jq -r .base.ref)" >> "$GITHUB_OUTPUT"
+         env:
+           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           ref: ${{ steps.pr.outputs.head_sha }}
+       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
+         with:
+           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+-      # TODO: This action internally runs `cargo bench` without --locked, bypassing our
+-      #       supply chain lockfile policy (see build.yml top-level comment). Replace with
+-      #       manual cargo bench --locked + critcmp + gh pr comment steps.
+-      # - uses: boa-dev/criterion-compare-action@adfd3a94634fe2041ce5613eb7df09d247555b87 # v3.2.4
+-      #   with:
+-      #     token: ${{ secrets.GITHUB_TOKEN }}
+-      #     branchName: ${{ steps.pr.outputs.base_ref }}
+-      #     cwd: benchmarks
+-      #     benchName: workload_bench
+-      - run: echo "Benchmarking is temporarily disabled. See TODO above."
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file
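
The "Get PR metadata" step above validates API-derived values before appending them to `$GITHUB_OUTPUT`. A minimal local sketch of that pattern, assuming a hypothetical helper name `validate_ref` and a temp file standing in for `$GITHUB_OUTPUT`; the regex and the `printf` form are copied from the workflow.

```shell
#!/usr/bin/env bash
# Sketch: reject values that could smuggle extra key=value lines into a step
# output file. validate_ref is a hypothetical name; the workflow inlines these checks.

validate_ref() {
  local ref=$1
  # An embedded newline would start a new output entry.
  [[ "$ref" != *$'\n'* ]] || return 1
  # Allow only characters expected in a branch name (same class as the workflow).
  [[ "$ref" =~ ^[a-zA-Z0-9/_.-]+$ ]] || return 1
}

# Stand-in for $GITHUB_OUTPUT so the sketch runs outside Actions.
output_file=$(mktemp)

for candidate in "main" "release/v0.21" $'main\nhead_sha=evil' 'main;rm -rf /'; do
  if validate_ref "$candidate"; then
    # printf with a fixed format string, as in the workflow, avoids echo's
    # flag and escape handling.
    printf 'base_ref=%s\n' "$candidate" >> "$output_file"
  else
    echo "rejected ref: $(printf '%q' "$candidate")" >&2
  fi
done

accepted=$(wc -l < "$output_file" | tr -d ' ')
echo "accepted entries: $accepted"
```

Only the two well-formed names reach the output file; the newline-bearing and shell-metacharacter values are dropped before any write.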

CHANGELOG.md

@@ -0,0 +1,276 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file

CLAUDE.md

@@ -0,0 +1,54 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+   but default-engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
\ No newline at end of file

CLAUDE/architecture.md

@@ -0,0 +1,48 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit actions
++(CommitInfo, Add files), enforces protocol compliance (table features, schema validation), and
++delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path encoding
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file
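
Step 5 in the architecture notes above lists three commit outcomes. A minimal sketch of how a connector might branch on them, assuming a retryable result can be attempted again while a conflict is handed back to the caller (for example, to rebuild from a fresh snapshot). `CommitOutcome`, its fields, and `commit_with_retry` are hypothetical stand-ins for illustration, not the kernel's actual types:

```rust
/// Hypothetical stand-in for the three commit outcomes named in step 5.
/// The real kernel types carry more data; these fields are illustrative only.
#[derive(Debug, PartialEq)]
enum CommitOutcome {
    Committed { version: u64 },
    Conflicted { conflict_version: u64 },
    Retryable { reason: String },
}

/// Calls `attempt` up to `max_attempts` times, retrying only on `Retryable`.
/// A committed or conflicted outcome is returned to the caller immediately.
/// Returns `None` when `max_attempts` is zero.
fn commit_with_retry<F>(mut attempt: F, max_attempts: u32) -> Option<CommitOutcome>
where
    F: FnMut(u32) -> CommitOutcome,
{
    let mut last = None;
    for n in 0..max_attempts {
        match attempt(n) {
            // Remember the failure and try again on the next iteration.
            CommitOutcome::Retryable { reason } => {
                last = Some(CommitOutcome::Retryable { reason });
            }
            // Committed or Conflicted: stop and let the caller decide.
            done => return Some(done),
        }
    }
    last
}
```

Keeping the conflict case out of the retry loop is a design choice in this sketch: retrying the same actions against a version that already exists would presumably fail the same way.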

... (truncated, output exceeded 60000 bytes)

Reproduce locally: `git range-diff a5164b3..bf96beb eae37a7..b293bf7` | Disable: `git config gitstack.push-range-diff false`

william-ch-databricks · 2026-04-15T01:29:03Z

Range-diff: main (b293bf7 -> 16b2bc6)

.github/workflows/benchmark.yml

@@ -0,0 +1,105 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+--- a/.github/workflows/benchmark.yml
++++ b/.github/workflows/benchmark.yml
+     types: [created, edited]
+ name: Benchmarking PR performance
+ jobs:
+-  runBenchmark:
++  run-benchmark:
+     name: Run benchmarks
+     if: >
+       github.event.issue.pull_request &&
+       (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
+     runs-on: ubuntu-latest
+     permissions:
+-      pull-requests: write
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
+     steps:
+-      - name: Parse benchmark tags
+-        env:
+-          COMMENT: ${{ github.event.comment.body }}
+-        run: |
+-          if [[ "$COMMENT" == "/bench" ]]; then
+-            TAGS="base"
+-          else
+-            TAGS="${COMMENT#/bench }"
+-            TAGS=$(echo "$TAGS" | tr -d '[:space:]')
+-          fi
+-          echo "BENCH_TAGS=$TAGS" >> "$GITHUB_ENV"
+-          echo "Parsed tags: $TAGS"
+-      - name: Get PR HEAD sha
++      - name: Get PR metadata
+         id: pr
+-        run: |
+-          PR_DATA=$(gh api repos/${{ github.repository }}/pulls/${{ github.event.issue.number }})
+-          echo "head_sha=$(echo "$PR_DATA" | jq -r .head.sha)" >> "$GITHUB_OUTPUT"
+-          echo "base_ref=$(echo "$PR_DATA" | jq -r .base.ref)" >> "$GITHUB_OUTPUT"
+         env:
+           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           ref: ${{ steps.pr.outputs.head_sha }}
+       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
+         with:
+           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+-      # TODO: This action internally runs `cargo bench` without --locked, bypassing our
+-      #       supply chain lockfile policy (see build.yml top-level comment). Replace with
+-      #       manual cargo bench --locked + critcmp + gh pr comment steps.
+-      # - uses: boa-dev/criterion-compare-action@adfd3a94634fe2041ce5613eb7df09d247555b87 # v3.2.4
+-      #   with:
+-      #     token: ${{ secrets.GITHUB_TOKEN }}
+-      #     branchName: ${{ steps.pr.outputs.base_ref }}
+-      #     cwd: benchmarks
+-      #     benchName: workload_bench
+-      - run: echo "Benchmarking is temporarily disabled. See TODO above."
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file
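
The "Get PR metadata" step above validates the API response before writing it to `$GITHUB_OUTPUT`: it rejects embedded newlines and restricts `BASE_REF` to an allow-list of characters. The same two checks, sketched in Rust for illustration only. The function names `is_valid_base_ref` and `output_line` are hypothetical; the workflow itself does this in bash:

```rust
/// Returns true when `base_ref` is non-empty and contains only ASCII letters,
/// digits, and the characters `/`, `_`, `.`, `-`, mirroring the
/// `^[a-zA-Z0-9/_.-]+$` check in the workflow step.
fn is_valid_base_ref(base_ref: &str) -> bool {
    !base_ref.is_empty()
        && base_ref
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '/' | '_' | '.' | '-'))
}

/// Formats one `key=value` line for an output file, refusing values that
/// contain a newline so a value cannot smuggle in a second key.
fn output_line(key: &str, value: &str) -> Result<String, String> {
    if value.contains('\n') {
        return Err(format!("unexpected newline in value for {key}"));
    }
    Ok(format!("{key}={value}\n"))
}
```

An allow-list is used here rather than a deny-list, so any character not explicitly permitted (spaces, `;`, `$`, newlines) is rejected by default.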

CHANGELOG.md

@@ -0,0 +1,276 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file
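
Several breaking changes in the v0.21.0 entry above add enum variants (`Partitioned` on `DataLayout`, new `MetricEvent` variants) and ask callers to update their match statements. A minimal sketch of why that is a compile-time break, assuming the caller matches exhaustively. The enum below is a hypothetical stand-in: only the `Partitioned` variant name comes from the changelog, and the other variants and all payloads are invented for illustration:

```rust
/// Hypothetical stand-in for the kernel's `DataLayout`; only the `Partitioned`
/// variant name comes from the changelog, the rest is illustrative.
#[derive(Debug)]
enum DataLayout {
    Unspecified,
    Clustered(Vec<String>),
    Partitioned(Vec<String>),
}

/// Describes a layout with an exhaustive match (no `_` arm), so adding a
/// variant to the enum makes this function fail to compile until it is handled.
fn describe(layout: &DataLayout) -> String {
    match layout {
        DataLayout::Unspecified => "no layout".to_string(),
        DataLayout::Clustered(cols) => format!("clustered by {}", cols.join(", ")),
        DataLayout::Partitioned(cols) => format!("partitioned by {}", cols.join(", ")),
    }
}
```

Callers that already use a `_` wildcard arm would keep compiling after such a change, but would silently route the new variant through the fallback path.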

CLAUDE.md

@@ -0,0 +1,59 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+   but default-engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
++- **Prefer snapshot/public API assertions over reading raw commit JSON.** Only read raw
++  commit JSON when the data is inaccessible via public API (e.g., system domain metadata
++  is blocked by `get_domain_metadata`). For commit JSON reads, use `read_actions_from_commit`
++  from `test_utils` -- do NOT write local helpers that duplicate this.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   with a trailing slash (e.g. `"memory:///"`, `"file:///tmp/my_table/"`). Avoid using the
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
\ No newline at end of file

CLAUDE/architecture.md

@@ -0,0 +1,48 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit actions
++(CommitInfo, Add files), enforces protocol compliance (table features, schema validation), and
++delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path encoding
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: `git range-diff ed6b22f..b293bf7 eae37a7..16b2bc6` | Disable: `git config gitstack.push-range-diff false`
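
The write-path steps in the `CLAUDE/architecture.md` hunk above end with a commit that returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`. A minimal sketch of how a caller might branch on such a three-way result follows. Everything in it is a hypothetical stand-in: `CommitOutcome`, `try_commit`, and `next_step` are not kernel types or signatures, and the meaning given to each outcome (version already taken vs. transient failure) is an assumption for illustration only.

```rust
// Hypothetical stand-ins for illustration only; these are NOT the
// delta-kernel-rs types or signatures.

/// Toy mirror of the three outcomes the architecture notes list for a commit.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum CommitOutcome {
    Committed { version: u64 },
    Conflicted { existing_version: u64 },
    Retryable { reason: String },
}

/// Toy commit: succeeds when I/O works and the target version is newer than
/// the latest version already in the log (assumed semantics).
#[allow(dead_code)]
fn try_commit(target_version: u64, latest_version: u64, io_ok: bool) -> CommitOutcome {
    if !io_ok {
        // Assumed: a transient failure leaves the transaction reusable.
        return CommitOutcome::Retryable {
            reason: "transient I/O failure".to_string(),
        };
    }
    if target_version <= latest_version {
        // Assumed: another writer already produced this version.
        return CommitOutcome::Conflicted {
            existing_version: target_version,
        };
    }
    CommitOutcome::Committed {
        version: target_version,
    }
}

/// What a caller might do next for each outcome.
#[allow(dead_code)]
fn next_step(outcome: &CommitOutcome) -> &'static str {
    match outcome {
        CommitOutcome::Committed { .. } => "done",
        CommitOutcome::Conflicted { .. } => "rebuild from a newer snapshot",
        CommitOutcome::Retryable { .. } => "retry the same commit",
    }
}
```

In this toy model, `try_commit(5, 4, true)` yields `Committed`, `try_commit(4, 4, true)` yields `Conflicted`, and any call with `io_ok == false` yields `Retryable`. Modeling the result as one enum forces the caller's `match` to handle all three cases.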

scottsand-db

LGTM! (1 nit on comment style)

sanujbasu

I have no blocking comments. Please address the comments before merging.

codecov · 2026-04-17T21:07:01Z

Codecov Report

❌ Patch coverage is 85.60606% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.32%. Comparing base (76eebdb) to head (029d667).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
kernel/src/transaction/mod.rs	78.08%	7 Missing and 9 partials ⚠️
kernel/src/snapshot/mod.rs	94.73%	1 Missing ⚠️
kernel/src/transaction/builder/create_table.rs	50.00%	0 Missing and 1 partial ⚠️
kernel/src/transaction/domain_metadata.rs	66.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2385      +/-   ##
==========================================
- Coverage   88.33%   88.32%   -0.02%     
==========================================
  Files         171      171              
  Lines       56696    56727      +31     
  Branches    56696    56727      +31     
==========================================
+ Hits        50083    50103      +20     
- Misses       4699     4704       +5     
- Partials     1914     1920       +6


github-actions · 2026-04-17T21:17:09Z

PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: refactor: separate read state from effective state in Transaction

This API was removed in upstream delta-io#2385, but it is a load-bearing API for delta-rs. The removal was unrelated to the change and appears to be an erroneous removal that wasn't caught in review. Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>

william-ch-databricks mentioned this pull request Apr 14, 2026

feat: add rename_column support for ALTER TABLE #2391

Draft

github-actions Bot assigned william-ch-databricks Apr 14, 2026

william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from 6a0ea39 to c6c465f Compare April 14, 2026 22:32

scottsand-db requested review from sanujbasu and scottsand-db April 14, 2026 22:45

scottsand-db reviewed Apr 14, 2026

Comment thread kernel/src/transaction/mod.rs Outdated

scottsand-db requested changes Apr 14, 2026

william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from c6c465f to bf96beb Compare April 15, 2026 01:06

william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from bf96beb to b293bf7 Compare April 15, 2026 01:15

william-ch-databricks requested a review from scottsand-db April 15, 2026 01:22

william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from b293bf7 to 16b2bc6 Compare April 15, 2026 01:28

scottsand-db requested a review from DrakeLin April 15, 2026 15:57

scottsand-db reviewed Apr 15, 2026

Comment thread kernel/src/transaction/mod.rs Outdated

scottsand-db approved these changes Apr 15, 2026

william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from f6617cc to 4e1a8b3 Compare April 15, 2026 16:30