refactor!: separate read state from effective state in Transaction #2385

Merged
scottsand-db merged 1 commit into delta-io:main from william-ch-databricks:stack/alter-table-1-refactor-state
Apr 17, 2026



@william-ch-databricks commented Apr 14, 2026

🥞 Stacked PR

Use this link to review incremental changes.



What changes are proposed in this pull request?

Splits Transaction's snapshot into two concerns:

  • read_snapshot_opt: Option<SnapshotRef> -- the pre-commit table state (None for CREATE TABLE)
  • effective_table_config: TableConfiguration -- the config this commit will produce

This separates "what did I read?" (conflict detection, post-commit snapshots) from "what will
this commit produce?" (schema, protocol, stats, write context). Write-path call sites read from
effective_table_config; read-path call sites use read_snapshot().
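The split described above can be sketched roughly as follows. This is a minimal illustration, not the kernel's actual code: only the field names `read_snapshot_opt` and `effective_table_config` and the accessor name `read_snapshot()` come from this description. The simplified `Snapshot` and `TableConfiguration` structs, the two constructors, the accessor's return type, and `commit_version()` are assumptions made for the example.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the kernel types named in the description.
#[derive(Debug, Clone, PartialEq)]
pub struct TableConfiguration {
    pub schema_fields: Vec<String>,
    pub min_writer_version: u32,
}

#[derive(Debug)]
pub struct Snapshot {
    pub version: u64,
    pub table_config: TableConfiguration,
}

pub type SnapshotRef = Arc<Snapshot>;

pub struct Transaction {
    // "What did I read?" None for CREATE TABLE, where no table exists yet.
    read_snapshot_opt: Option<SnapshotRef>,
    // "What will this commit produce?"
    effective_table_config: TableConfiguration,
}

impl Transaction {
    // Hypothetical constructor: a write to an existing table starts from
    // the configuration of the snapshot it read.
    pub fn for_existing_table(snapshot: SnapshotRef) -> Self {
        let effective_table_config = snapshot.table_config.clone();
        Self { read_snapshot_opt: Some(snapshot), effective_table_config }
    }

    // Hypothetical constructor: CREATE TABLE builds the configuration
    // directly and has no read state.
    pub fn for_create_table(config: TableConfiguration) -> Self {
        Self { read_snapshot_opt: None, effective_table_config: config }
    }

    // Read path (conflict detection, post-commit snapshots).
    // The Option return type is an assumption of this sketch.
    pub fn read_snapshot(&self) -> Option<&SnapshotRef> {
        self.read_snapshot_opt.as_ref()
    }

    // Write path (schema, protocol, stats, write context).
    pub fn effective_table_config(&self) -> &TableConfiguration {
        &self.effective_table_config
    }

    // Hypothetical helper: a new table commits version 0, otherwise the
    // commit lands one version after the snapshot that was read.
    pub fn commit_version(&self) -> u64 {
        self.read_snapshot().map_or(0, |s| s.version + 1)
    }
}
```

Keeping the two fields separate means a call site has to state which question it is asking, and the `None` case for CREATE TABLE no longer needs a synthetic snapshot to answer the write-path question.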

Also adds should_emit_protocol / should_emit_metadata flags to replace the old
is_create_table() checks for Protocol/Metadata action emission, and replaces the synthetic
pre-commit snapshot in CREATE TABLE with direct TableConfiguration construction.
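As a rough sketch of how explicit emission flags might replace an `is_create_table()` check: only the flag names `should_emit_protocol` and `should_emit_metadata` come from this description. The `ActionKind` enum, the `EmitFlags` struct, its constructors, and the action ordering are hypothetical and chosen only for illustration.

```rust
// Hypothetical action tags; the real kernel actions carry payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActionKind {
    CommitInfo,
    Protocol,
    Metadata,
}

// Flag names follow the PR description; grouping them in a struct is an
// assumption of this sketch.
#[derive(Debug, Clone, Copy)]
pub struct EmitFlags {
    pub should_emit_protocol: bool,
    pub should_emit_metadata: bool,
}

impl EmitFlags {
    // CREATE TABLE must write both actions.
    pub fn for_create_table() -> Self {
        Self { should_emit_protocol: true, should_emit_metadata: true }
    }

    // A plain write to an existing table writes neither.
    pub fn for_plain_write() -> Self {
        Self { should_emit_protocol: false, should_emit_metadata: false }
    }
}

// Decide which actions a commit carries from the flags alone, without
// asking whether the transaction is a CREATE TABLE.
pub fn planned_actions(flags: EmitFlags) -> Vec<ActionKind> {
    let mut actions = vec![ActionKind::CommitInfo];
    if flags.should_emit_protocol {
        actions.push(ActionKind::Protocol);
    }
    if flags.should_emit_metadata {
        actions.push(ActionKind::Metadata);
    }
    actions
}
```

With independent flags, a later operation that changes only one of the two (for example, a metadata-only change) can set a single flag instead of adding another special case next to the CREATE TABLE check.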

This is a pure refactor with no behaviour change.

How was this change tested?

All existing tests pass. Added unit tests for LogSegment::new_for_version_zero (valid input,
non-zero version rejection, non-commit file rejection).
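The three test cases listed above can be illustrated with a standalone check. This is not the signature of `LogSegment::new_for_version_zero`, which is not shown here; `check_version_zero_commit`, `parse_commit_version`, and `VersionZeroError` are hypothetical names. The sketch relies only on the Delta convention that commit files are named with a 20-digit zero-padded version followed by `.json`.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum VersionZeroError {
    NotACommitFile(String),
    NonZeroVersion(u64),
}

// Parse `<20 digits>.json` into a version number; anything else is not a
// commit file name.
pub fn parse_commit_version(file_name: &str) -> Option<u64> {
    let stem = file_name.strip_suffix(".json")?;
    if stem.len() != 20 || !stem.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    stem.parse().ok()
}

// Accept only the commit file for version 0.
pub fn check_version_zero_commit(file_name: &str) -> Result<(), VersionZeroError> {
    match parse_commit_version(file_name) {
        None => Err(VersionZeroError::NotACommitFile(file_name.to_string())),
        Some(0) => Ok(()),
        Some(v) => Err(VersionZeroError::NonZeroVersion(v)),
    }
}
```

Each of the three cases (valid input, non-zero version, non-commit file) maps to exactly one arm of the `match`.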

@william-ch-databricks
Contributor Author

Range-diff: main (6a0ea39 -> c6c465f)
.github/actions/install-and-cache/action.yml
@@ -0,0 +1,105 @@
+diff --git a/.github/actions/install-and-cache/action.yml b/.github/actions/install-and-cache/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/install-and-cache/action.yml
++# This is copied from https://github.com/tecolicom/actions-install-and-cache
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache generic backend
++description: 'GitHub Action to run installer and cache the result'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  run:     { required: true,  type: string }
++  path:    { required: true,  type: string }
++  cache:   { required: false, type: string, default: yes }
++  key:     { required: false, type: string }
++  sudo:    { required: false, type: string }
++  verbose: { required: false, type: string, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.cache.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup install-and-cache
++        define() { IFS='\n' read -r -d '' ${1} || true ; }
++        define script <<'EOS_cad8_c24e_'
++        ${{ inputs.run }}
++        EOS_cad8_c24e_
++        directory="${{ inputs.path }}"
++        given_key="${{ inputs.key }}"
++        archive= key=
++        case "${{ inputs.cache }}" in
++            yes|workflow)
++                cache="${{ inputs.cache }}"
++                uname -mrs
++                hash=$( (uname -mrs ; cat <<< "$script" ; echo $directory) | \
++                        (md5sum||md5) | awk '{print $1}' )
++                key="${hash}${given_key:+-$given_key}"
++                [ "$cache" == 'workflow' ] && \
++                    key+="-${{ github.run_id }}-${{ github.run_attempt }}"
++                archive=$HOME/archive-$hash.tz
++                ;;
++            *)
++                cache=no
++                ;;
++        esac
++        # use "--recursive-unlink" option if GNU tar is found
++        if tar --version | grep GNU > /dev/null
++        then
++            tar="tar --recursive-unlink"
++        elif gtar --version | grep GNU > /dev/null
++        then
++            tar="gtar --recursive-unlink"
++        else
++            tar=tar
++        fi
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            cache=$cache
++            archive=$archive
++            key=$key
++            tar=$tar
++        END
++
++    - id: cache
++      if: steps.setup.outputs.cache != 'no'
++      uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
++      with:
++        path: ${{ steps.setup.outputs.archive }}
++        key:  ${{ steps.setup.outputs.key }}
++
++    - id: extract
++      if: steps.setup.outputs.cache != 'no' && steps.cache.outputs.cache-hit == 'true'
++      shell: bash
++      run: |
++        : extract
++        archive="${{ steps.setup.outputs.archive }}"
++        verbose="${{ inputs.verbose }}"
++        tar="${{ steps.setup.outputs.tar }}"
++        ls -l $archive
++        if [ -s $archive ]
++        then
++            opt=-Pxz
++            [[ $verbose == yes || $verbose == true ]] && opt+=v
++            sudo $tar -C / $opt -f $archive
++        else
++            echo "$archive is empty"
++        fi
++
++    - id: install-and-archive
++      if: steps.cache.outputs.cache-hit != 'true'
++      uses: tecolicom/actions-install-and-archive@9d5afb27f9900f2df47fe40de58fbd837032bddf # v1.3
++      with:
++        run:     ${{ inputs.run }}
++        archive: ${{ steps.setup.outputs.archive }}
++        path:    ${{ inputs.path }}
++        sudo:    ${{ inputs.sudo }}
\ No newline at end of file
.github/actions/pr-title-validator/action.yml
@@ -0,0 +1,47 @@
+diff --git a/.github/actions/pr-title-validator/action.yml b/.github/actions/pr-title-validator/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/pr-title-validator/action.yml
++name: 'PR Title Validator'
++description: 'Validates a pull request title against a regex pattern'
++
++inputs:
++  regex:
++    description: 'Regular expression the PR title must match'
++    required: true
++  breaking-change-regex:
++    description: 'Regex to use instead when the breaking-change label is present'
++    required: false
++    default: ''
++  labels:
++    description: 'JSON array of label names on the PR'
++    required: false
++    default: '[]'
++  title:
++    description: 'PR title to validate. Defaults to github.event.pull_request.title.'
++    required: false
++    default: ''
++
++runs:
++  using: composite
++  steps:
++    - name: Validate PR title
++      shell: bash
++      env:
++        PR_TITLE: ${{ inputs.title || github.event.pull_request.title }}
++        INPUT_REGEX: ${{ inputs.regex }}
++        BREAKING_REGEX: ${{ inputs.breaking-change-regex }}
++        LABELS: ${{ inputs.labels }}
++      run: |
++        REGEX="$INPUT_REGEX"
++        if [[ -n "$BREAKING_REGEX" ]] && echo "$LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++          REGEX="$BREAKING_REGEX"
++          echo "breaking-change label detected, using breaking change regex."
++        fi
++
++        if [[ "$PR_TITLE" =~ $REGEX ]]; then
++          echo "PR title matches pattern."
++          exit 0
++        fi
++        echo "::error::PR title \"$PR_TITLE\" does not match pattern: $REGEX"
++        exit 1
\ No newline at end of file
.github/actions/use-homebrew-tools/action.yml
@@ -0,0 +1,51 @@
+diff --git a/.github/actions/use-homebrew-tools/action.yml b/.github/actions/use-homebrew-tools/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/use-homebrew-tools/action.yml
++# This is copied from https://github.com/tecolicom/actions-use-homebrew-tools/
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache homebrew tools
++description: 'GitHub Action to install and cache homebrew tools'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  tools:   { required: false, type: string }
++  key:     { required: false, type: string }
++  path:    { required: false, type: string }
++  cache:   { required: false, type: string, default: yes }
++  verbose: { required: false, type: boolean, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.update.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup use-homebrew-tools
++        given_key="${{ inputs.key }}"
++        brew_version="$(brew --version)"
++        echo "$brew_version"
++        version_key="$( echo "$brew_version" | (md5sum||md5) | awk '{print $1}' )"
++        key="${given_key:+$given_key-}${version_key}"
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            command=brew install
++            prefix=$(brew --prefix)
++            key=$key
++        END
++
++    - id: update
++      uses: ./.github/actions/install-and-cache
++      with:
++        run:     ${{ steps.setup.outputs.command }} ${{ inputs.tools }}
++        path:    ${{ steps.setup.outputs.prefix }} ${{ inputs.path }}
++        key:     ${{ steps.setup.outputs.key }}
++        cache:   ${{ inputs.cache }}
++        verbose: ${{ inputs.verbose }}
\ No newline at end of file
.github/workflows/auto-assign-pr.yml
@@ -0,0 +1,8 @@
+diff --git a/.github/workflows/auto-assign-pr.yml b/.github/workflows/auto-assign-pr.yml
+--- a/.github/workflows/auto-assign-pr.yml
++++ b/.github/workflows/auto-assign-pr.yml
+   assign-author:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: toshimaru/auto-author-assign@v2.1.1
++      - uses: toshimaru/auto-author-assign@16f0022cf3d7970c106d8d1105f75a1165edb516 # v2.1.1
\ No newline at end of file
.github/workflows/benchmark.yml
@@ -0,0 +1,86 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/benchmark.yml
++# issue_comment is used here to trigger on PR comments, as opposed to pull_request_review
++# (review submissions) or pull_request_review_comment (comments on the diff itself)
++# we want to trigger this on comment creation or edit
++on:
++  issue_comment:
++    types: [created, edited]
++name: Benchmarking PR performance
++jobs:
++  run-benchmark:
++    name: Run benchmarks
++    if: >
++      github.event.issue.pull_request &&
++      (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
++    runs-on: ubuntu-latest
++    permissions:
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
++    steps:
++      - name: Get PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++        with:
++          ref: ${{ steps.pr.outputs.head_sha }}
++      - uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file
.github/workflows/build.yml
@@ -0,0 +1,315 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ name: build
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   RUST_BACKTRACE: 1
+ 
++# Supply chain security: all cargo commands that resolve dependencies use --locked to
++# enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
++# (potentially compromised) dependency version. If Cargo.lock is out of sync with
++# Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
++#
++# Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
++# save-if restricts cache writes to main pushes only. PRs read from main's cache but
++# never write their own entries.
++#
++# The key insight: Cargo.lock changes infrequently, so main's cache key almost always
++# matches. PRs download and compile zero dependencies on cache hit. By only writing on
++# main, we keep main's cache entries alive (no LRU eviction from PR churn), and every
++# PR benefits from them.
++#
++# Without this, GHA's ref-scoped caching works against us: each PR writes ~6.3GB of
++# cache entries (14 jobs x ~450MB) that only that PR can read. A handful of active PRs
++# fills the 10GB cache budget, LRU evicts main's shared entries, and every subsequent
++# PR compiles from scratch.
++#
++# The save-if condition checks both event_name == 'push' and ref == main because
++# pull_request_target events set github.ref to the base branch (main), not the PR
++# branch. Without the event_name check, those workflows would write cache entries on
++# every PR.
++#
++# Note: actions-rust-lang/setup-rust-toolchain has built-in Swatinem/rust-cache that
++# writes on every run with no save-if support. We disable it with cache: false and
++# manage caching explicitly via the Swatinem/rust-cache steps below.
++
+ jobs:
+   format:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: rustfmt
+       - name: format
+         run: cargo fmt -- --check
+   msrv:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+       - name: verify-msrv
+   msrv-run-tests:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+-      - uses: taiki-e/install-action@nextest
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Get rust-version from Cargo.toml
+         id: rust-version
+         run: echo "RUST_VERSION=$(cargo msrv show --path kernel/ --output-format minimal)" >> $GITHUB_ENV
+       - name: Install specified rust version
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           toolchain: ${{ env.RUST_VERSION }}
+       - name: run tests
+         run: |
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+-          cargo +$(cargo msrv show --output-format minimal) nextest run
++          cargo +$(cargo msrv show --output-format minimal) nextest run --locked
+   docs:
+     runs-on: ubuntu-latest
+     env:
+       RUSTDOCFLAGS: -D warnings
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build docs
+-        run: cargo doc --workspace --all-features
+-
++        run: cargo doc --locked --workspace --all-features --no-deps
+ 
+   # When we run cargo { build, clippy } --no-default-features, we want to build/lint the kernel to
+   # ensure that we can build the kernel without any features enabled. Unfortunately, due to how
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with clippy
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: clippy
+-      - uses: Swatinem/rust-cache@v2
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build and lint with clippy
+-        run: cargo clippy --benches --tests --all-features -- -D warnings
++        run: cargo clippy --locked --benches --tests --all-features -- -D warnings
+       - name: lint without default features - packages which depend on kernel with features enabled
+-        run: cargo clippy --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+       - name: lint without default features - packages which don't depend on kernel with features enabled
+-        run: cargo clippy --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
+       - name: check kernel builds with default-engine-native-tls
+-        run: cargo build -p feature_tests --features default-engine-native-tls
++        run: cargo build --locked -p feature_tests --features default-engine-native-tls
++      - name: test native-tls backend has no crypto provider conflict
++        run: cargo test --locked -p feature_tests --features default-engine-native-tls
+       - name: check kernel builds with default-engine-rustls
+-        run: cargo build -p feature_tests --features default-engine-rustls
++        run: cargo build --locked -p feature_tests --features default-engine-rustls
++      - name: test rustls TLS backend feature-tests
++        run: cargo test --locked -p feature_tests --features default-engine-rustls
+   test:
+     runs-on: ${{ matrix.os }}
+     strategy:
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++      - uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
++        id: filter
++        with:
++          filters: |
++            ffi:
++              - 'ffi/src/handle.rs'
++              - 'ffi-proc-macros/**'
+       - name: Install minimal stable with clippy and rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: test
+-        run: cargo nextest run --workspace --all-features -E 'not test(read_table_version_hdfs)'
++        run: cargo nextest run --locked --workspace --all-features -E 'not test(read_table_version_hdfs) and not test(invalid_handle_code)'
++      - name: trybuild tests
++        if: steps.filter.outputs.ffi == 'true'
++        run: cargo test --locked --package delta_kernel_ffi --features internal-api -- invalid_handle_code
+ 
+   ffi_test:
+     runs-on: ${{ matrix.os }}
+           - macOS-latest
+           - ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Setup cmake
+-        uses: jwlawson/actions-setup-cmake@v2
++        uses: jwlawson/actions-setup-cmake@0d6a7d60b009d01c9e7523be22153ff8f19460d3 # v2.2.0
+         with:
+-          cmake-version: '3.30.x'
++          cmake-version: "3.30.x"
+       - name: Install arrow-glib-linux
+         run: |
+           if [ "$RUNNER_OS" == "Linux" ]; then
+            fi
+       - name: Install arrow-glib-macOS
+         if: runner.os == 'macOS'
+-        uses: tecolicom/actions-use-homebrew-tools@v1
++        uses: ./.github/actions/use-homebrew-tools
+         with:
+-          tools: 'apache-arrow apache-arrow-glib'
++          tools: "apache-arrow apache-arrow-glib"
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Set output on fail
+         run: echo "CTEST_OUTPUT_ON_FAILURE=1" >> "$GITHUB_ENV"
+       - name: Build kernel
+         run: |
+           pushd acceptance
+-          cargo build
++          cargo build --locked
+           popd
+           pushd ffi
+-          cargo b --features default-engine-rustls,test-ffi,tracing,uc-catalog
++          cargo build --locked --features default-engine-rustls,test-ffi,tracing,delta-kernel-unity-catalog
+           popd
+       - name: build and run read-table test
+         run: |
+           cmake ..
+           make
+           make test
+-      - name: build and run uc-catalog-ffi test
++      - name: build and run delta-kernel-unity-catalog-ffi test
+         run: |
+-          pushd ffi/examples/uc-catalog-example
++          pushd ffi/examples/delta-kernel-unity-catalog-example
+           mkdir build
+           pushd build
+           cmake ..
+           make
+           make test
+   miri:
+-    name: "Miri"
++    name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
++    strategy:
++      matrix:
++        partition: [1, 2, 3]
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install Miri
+         run: |
+           rustup toolchain install nightly --component miri
+           rustup override set nightly
+           cargo miri setup
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Test with Miri
+         run: |
+           pushd ffi
+-          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --features default-engine-rustls,uc-catalog
++          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --locked --features default-engine-rustls,delta-kernel-unity-catalog --partition slice:${{ matrix.partition }}/3
+ 
+   coverage:
+     runs-on: ubuntu-latest
+     env:
+       CARGO_TERM_COLOR: always
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install rust
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-llvm-cov
+-        uses: taiki-e/install-action@cargo-llvm-cov
++        uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
+       - name: Upload coverage to Codecov
+-        uses: codecov/codecov-action@v5
++        uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
+           files: codecov.json
+           fail_ci_if_error: true
\ No newline at end of file
.github/workflows/comment-on-title-failure.yml
@@ -0,0 +1,65 @@
+diff --git a/.github/workflows/comment-on-title-failure.yml b/.github/workflows/comment-on-title-failure.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/comment-on-title-failure.yml
++name: Comment on PR Title Failure
++
++on:
++  workflow_run:
++    workflows: ["Validate PR Title"]
++    types: [completed]
++
++jobs:
++  comment:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      # Step taken from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR info
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          # If the PR is from a fork, prefix it with `<owner-login>:`, otherwise only the PR branch name is relevant:
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        # Query the PR number by repo + branch, then assign to step output:
++        run: |
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++             --json 'number,title' --jq '"number=\(.number)\ntitle=\(.title)"' \
++             >> "${GITHUB_OUTPUT}"
++
++      - name: Find existing comment
++        id: find
++        uses: peter-evans/find-comment@3eae4d37986fb5a8592848f6a574fdf654e61f9e # v3.1.0
++        with:
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          comment-author: 'github-actions[bot]'
++          body-includes: PR title does not match the required pattern
++
++      - name: Post or update failure comment
++        if: ${{ github.event.workflow_run.conclusion == 'failure' }}
++        uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4.0.0
++        env:
++          PR_TITLE: ${{ steps.pr-context.outputs.title }}
++        with:
++          comment-id: ${{ steps.find.outputs.comment-id }}
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          body: |
++            PR title does not match the required pattern. Please ensure you follow the [conventional commits](https://www.conventionalcommits.org/) spec.
++
++            Your title should start with `feat:`, `fix:`, `chore:`, `docs:`, `perf:`, `refactor:`, `test:`, or `ci:`, and if it's a breaking change that should be suffixed with a `!` (like `feat!:`), and then a 1-72 character brief description of your change.
++
++            **Title:** `${{ env.PR_TITLE }}`
++
++      - name: Delete comment on success
++        if: ${{ github.event.workflow_run.conclusion == 'success' && steps.find.outputs.comment-id != '' }}
++        env:
++          GH_TOKEN: ${{ github.token }}
++        run: |
++          gh api repos/${{ github.repository }}/issues/comments/${{ steps.find.outputs.comment-id }} -X DELETE
\ No newline at end of file
.github/workflows/pr-validator.yml
@@ -0,0 +1,57 @@
+diff --git a/.github/workflows/pr-validator.yml b/.github/workflows/pr-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-validator.yml
++name: Validate PR Title
++
++on:
++  pull_request:
++    types: [opened, edited, reopened, synchronize, labeled, unlabeled]
++  workflow_run:
++    workflows: ["semver-label"] # we need this since auto-labels from jobs don't trigger a workflow
++    types: [completed]
++
++jobs:
++  validate-title:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Resolve PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ github.token }}
++          # Captured as env vars to prevent expression injection into the shell command.
++          PR_TITLE: ${{ github.event.pull_request.title }}
++          PR_LABELS_JSON: ${{ toJson(github.event.pull_request.labels.*.name) }}
++        run: |
++          if [[ "${{ github.event_name }}" == "workflow_run" ]]; then
++            pr_json=$(gh api --paginate repos/${{ github.repository }}/pulls \
++              --jq ".[] | select(.head.sha == \"${{ github.event.workflow_run.head_sha }}\")")
++            echo "number=$(echo "$pr_json" | jq -r '.number')" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$pr_json" | jq -r '.title'
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$pr_json" | jq -c '[.labels[].name]')" >> "$GITHUB_OUTPUT"
++          else
++            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$PR_TITLE"
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$PR_LABELS_JSON" | jq -c '.')" >> "$GITHUB_OUTPUT"
++          fi
++
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++
++      - uses: ./.github/actions/pr-title-validator
++        with:
++          regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
++          breaking-change-regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'
++          labels: ${{ steps.pr.outputs.labels }}
++          title: ${{ steps.pr.outputs.title }}
\ No newline at end of file
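The multiline delimiter form used in the `Resolve PR metadata` step above can be tried in isolation. This is a minimal sketch, not the workflow's code: `write_multiline_output` is a hypothetical helper, and a temp file stands in for the real `$GITHUB_OUTPUT`.

```sh
# Hypothetical helper: append NAME/VALUE to an output file using the
# `name<<DELIMITER` form, so newlines in VALUE stay inside one value.
write_multiline_output() {
  name=$1
  value=$2
  file=$3
  {
    printf '%s<<PR_TITLE_EOF\n' "$name"
    printf '%s\n' "$value"
    printf 'PR_TITLE_EOF\n'
  } >> "$file"
}

# Assumption: a temp file stands in for $GITHUB_OUTPUT in this sketch.
out=$(mktemp)

# A value that tries to smuggle in a second key=value pair on its own line.
hostile_title=$(printf 'feat: ok\nlabels=["breaking-change"]')
write_multiline_output title "$hostile_title" "$out"

# Show what was written: the `labels=` line sits between the delimiters.
cat "$out"
```

With a plain `echo "title=$value"` the second line would be written as its own `labels=` entry; in the delimiter form it stays part of `title`. The form relies on the delimiter never appearing on a line by itself inside the value, which a fixed string such as `PR_TITLE_EOF` does not strictly guarantee.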
.github/workflows/run-examples.yml
@@ -0,0 +1,55 @@
+diff --git a/.github/workflows/run-examples.yml b/.github/workflows/run-examples.yml
+--- a/.github/workflows/run-examples.yml
++++ b/.github/workflows/run-examples.yml
+ name: run-examples
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   run-examples:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+ 
+       - name: Run all examples
+         run: |
+               # Special case for write-table: it needs a temp directory
+               if [ "$example_dir" = "write-table" ]; then
+                 tmp_dir=$(mktemp -d)
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
+                 rm -r "$tmp_dir"
+               # Special case for inspect-table: it needs an operation/subcommand, run each one
+               elif [ "$example_dir" = "inspect-table" ]; then
+                 for operation in table-version metadata schema scan-metadata actions; do
+                   echo "  Running inspect-table with operation: $operation"
+-                  cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
++                  cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
+                 done
+               # Special case for read-table-changes: skip running it in CI as it needs a specific CDF-enabled table
+               # but still verify it compiles
+               # TODO: Add a suitable test table for CDF
+               elif [ "$example_dir" = "read-table-changes" ]; then
+                 echo "Building read-table-changes (skipping run - requires CDF-enabled table)"
+-                cargo build --manifest-path "$example_dir/Cargo.toml" --release
++                cargo build --locked --manifest-path "$example_dir/Cargo.toml" --release
+               else
+                 # All other examples run with the test table path
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
+               fi
+ 
+               echo ""
\ No newline at end of file
.github/workflows/run_integration_test.yml
@@ -0,0 +1,70 @@
+diff --git a/.github/workflows/run_integration_test.yml b/.github/workflows/run_integration_test.yml
+--- a/.github/workflows/run_integration_test.yml
++++ b/.github/workflows/run_integration_test.yml
+-name: Run tests to ensure we can compile across arrow versions
++# TODO: Disabled. The test script runs cargo update which resolves fresh dependencies,
++#       bypassing the Cargo.lock supply chain policy (see build.yml top-level comment).
+ 
+-on: [workflow_dispatch, push, pull_request]
+-
+-jobs:
+-  arrow_integration_test:
+-    runs-on: ${{ matrix.os }}
+-    timeout-minutes: 20
+-    strategy:
+-      fail-fast: false
+-      matrix:
+-        include:
+-          - os: macOS-latest
+-          - os: ubuntu-latest
+-          - os: windows-latest
+-            skip: ${{ github.event_name == 'pull_request' }} # skip running windows tests on every PR since they are slow
+-    steps:
+-      - name: Skip job for pull requests on Windows
+-        if: ${{ matrix.skip }}
+-        run: echo "Skipping job for pull requests on Windows."
+-      - uses: actions/checkout@v4
+-        if: ${{ !matrix.skip }}
+-      - name: Setup rust toolchain
+-        if: ${{ !matrix.skip }}
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - name: Run integration tests
+-        if: ${{ !matrix.skip }}
+-        shell: bash
+-        run: pushd integration-tests && ./test-all-arrow-versions.sh
++# name: Run tests to ensure we can compile across arrow versions
++#
++# on: [workflow_dispatch, push, pull_request, merge_group]
++#
++# jobs:
++#   arrow_integration_test:
++#     runs-on: ${{ matrix.os }}
++#     timeout-minutes: 20
++#     strategy:
++#       fail-fast: false
++#       matrix:
++#         include:
++#           - os: macOS-latest
++#           - os: ubuntu-latest
++#           - os: windows-latest
++#             skip: ${{ github.event_name == 'pull_request' || github.event_name == 'merge_group' }} # skip running windows tests on PRs and merge queue since they are slow
++#     steps:
++#       - name: Skip job for pull requests on Windows
++#         if: ${{ matrix.skip }}
++#         run: echo "Skipping job for pull requests on Windows."
++#       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++#         if: ${{ !matrix.skip }}
++#       - name: Setup rust toolchain
++#         if: ${{ !matrix.skip }}
++#         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++#         with:
++#           cache: false
++#       # See build.yml top-level comment for why save-if is restricted to main.
++#       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++#         if: ${{ !matrix.skip }}
++#         with:
++#           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++#       - name: Run integration tests
++#         if: ${{ !matrix.skip }}
++#         shell: bash
++#         run: pushd integration-tests && ./test-all-arrow-versions.sh
\ No newline at end of file
.github/workflows/semver-checks.yml
@@ -0,0 +1,136 @@
+diff --git a/.github/workflows/semver-checks.yml b/.github/workflows/semver-checks.yml
+--- a/.github/workflows/semver-checks.yml
++++ b/.github/workflows/semver-checks.yml
+ name: semver-checks
+ 
+-# Trigger when a PR is opened or changed
++# Trigger when a PR is opened or changed. This runs with `pull_request` trigger, which means it has
++# only read perms. The adding of the label happens in semver-label.yml via workflow_run, which
++# looks at the status of this job and always runs in the base-repo context.
+ on:
+-  pull_request_target:
++  pull_request:
+     types:
+       - opened
+       - synchronize
+       - reopened
++  merge_group:
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   check_if_pr_breaks_semver:
+     runs-on: ubuntu-latest
+     permissions:
+-      # this job runs with read because it checks out the PR head which could contain malicious code
+       contents: read
+     steps:
+-      - uses: actions/checkout@v4
+-        name: checkout full rep
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           fetch-depth: 0
+-          ref: ${{ github.event.pull_request.head.sha }}
++          ref: >-
++            ${{ github.event_name == 'merge_group'
++                && github.event.merge_group.head_sha
++                || github.event.pull_request.head.sha }}
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-semver-checks
++        uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
++        with:
++          tool: cargo-semver-checks
++      - name: Compute baseline revision
++        id: baseline
+         shell: bash
++        env:
++          MERGE_GROUP_BASE_SHA: ${{ github.event.merge_group.base_sha }}
++          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
++          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
+         run: |
+-          cargo install cargo-semver-checks --locked
+-      - name: Run check
++          if [ "${{ github.event_name }}" = "merge_group" ]; then
++            echo "rev=${MERGE_GROUP_BASE_SHA}" >> "$GITHUB_OUTPUT"
++          else
++            # Use the merge-base instead of the PR base SHA. The base SHA is the tip of
++            # the target branch when the webhook fires, which can differ from where the PR
++            # actually diverged. Using merge-base avoids false positives when the PR branch
++            # is behind the target branch.
++            MERGE_BASE=$(git merge-base "$PR_HEAD_SHA" "$PR_BASE_SHA")
++            echo "rev=${MERGE_BASE}" >> "$GITHUB_OUTPUT"
++          fi
++      - name: Run semver check
+         id: check
+         continue-on-error: true
+         shell: bash
++        env:
++          BASELINE_REV: ${{ steps.baseline.outputs.rev }}
+         # only check semver on released crates (delta_kernel and delta_kernel_ffi).
+         # note that this won't run on proc macro/derive crates, so don't need to include
+         # delta_kernel_derive etc.
+         run: |
+-          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features --baseline-rev ${{ github.event.pull_request.base.sha }}
+-      - name: On Failure
+-        id: set_failure
+-        if: ${{ steps.check.outcome == 'failure' }}
+-        run: |
+-          echo "Checks failed"
+-          echo "check_status=failure" >> $GITHUB_OUTPUT
+-      - name: On Success
+-        id: set_success
+-        if: ${{ steps.check.outcome == 'success' }}
+-        run: |
+-          echo "Checks succeed"
+-          echo "check_status=success" >> $GITHUB_OUTPUT
+-    outputs:
+-      check_status: ${{ steps.set_failure.outputs.check_status || steps.set_success.outputs.check_status }}
+-  update_label_if_needed:
+-    needs: check_if_pr_breaks_semver
+-    runs-on: ubuntu-latest
+-    permissions:
+-      # this job only looks at previous output and then sets a label, so malicious code in the PR
+-      # isn't a concern
+-      pull-requests: write
+-    steps:
+-      - name: On Failure
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'failure'
+-        uses: actions-ecosystem/action-add-labels@v1
++          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features \
++            --baseline-rev "$BASELINE_REV"
++      # Upload the step outcome as an artifact so semver-label.yml can read it via workflow_run.
++      # steps.check.outcome is the raw result *before* continue-on-error converts it to "success",
++      # so it correctly reflects whether a breaking change was detected.
++      # Only upload for pull_request events; merge_group runs have no PR to label.
++      - name: Save semver outcome
++        if: github.event_name == 'pull_request'
++        env:
++          SEMVER_OUTCOME: ${{ steps.check.outcome }}
++        run: echo "$SEMVER_OUTCOME" > semver-outcome.txt
++      - name: Upload semver outcome
++        if: github.event_name == 'pull_request'
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+         with:
+-          labels: breaking-change
+-      - name: Remove breaking-change label
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success' && contains(github.event.pull_request.labels.*.name, 'breaking-change')
+-        uses: actions-ecosystem/action-remove-labels@v1
+-        with:
+-          labels: breaking-change
+-      - name: On Success
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success'
+-        run: |
+-          echo "Checks succeed"
+-      - name: Fail On Incorrect Previous Output
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status != 'success' && needs.check_if_pr_breaks_semver.outputs.check_status != 'failure'
+-        run: exit 1
++          name: semver-outcome
++          path: semver-outcome.txt
++          retention-days: 1
\ No newline at end of file
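The reason the `Compute baseline revision` step prefers `git merge-base` over the PR base SHA can be reproduced in a throwaway repository. This is a sketch under assumptions: the repository, branch names, and the `commit` helper are all hypothetical, and empty commits stand in for real changes.

```sh
# Hypothetical demo repo in a temp directory; nothing here touches a real checkout.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: create an empty commit with a fixed demo identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

commit "shared ancestor"
fork_point=$(git rev-parse HEAD)

# The PR branch diverges here.
git checkout -q -b feature
commit "PR work"
pr_head=$(git rev-parse HEAD)

# Meanwhile the target branch moves on, so its tip is no longer the fork point.
git checkout -q main
commit "main moves on"
base_tip=$(git rev-parse HEAD)

# Same call shape as the workflow: merge-base of PR head and base SHA.
merge_base=$(git merge-base "$pr_head" "$base_tip")

echo "fork point: $fork_point"
echo "base tip:   $base_tip"
echo "merge base: $merge_base"
```

Here `merge_base` equals the fork point rather than the newer tip of `main`, so a comparison against it only sees what the PR branch changed. Finding that ancestor requires its history to be present locally, which is presumably why the checkout step in this workflow sets `fetch-depth: 0`.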
.github/workflows/semver-label.yml
@@ -0,0 +1,81 @@
+diff --git a/.github/workflows/semver-label.yml b/.github/workflows/semver-label.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/semver-label.yml
++name: semver-label
++
++# Apply or remove the breaking-change label based on the outcome of the semver-checks workflow.
++# This must be a separate workflow from semver-checks.yml: label writes require pull-requests:write,
++# which is unavailable in pull_request workflows triggered by fork PRs. workflow_run always runs
++# in the base-repo context with full write permissions, and never executes PR code.
++on:
++  workflow_run:
++    workflows: ["semver-checks"]
++    types: [completed]
++
++jobs:
++  update_label_if_needed:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++      actions: read
++    # Label updates only apply to PRs; merge_group runs have no associated PR to label.
++    if: github.event.workflow_run.event == 'pull_request'
++    steps:
++      # Resolve PR number from the triggering workflow run's branch. For fork PRs the branch
++      # must be prefixed with `<owner>:` so gh pr view can locate it.
++      # Pattern from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR number
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        run: |
++          echo "Looking up PR for branch '${PR_BRANCH}' in repo '${PR_TARGET_REPO}'"
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++            --json 'number' --jq '"number=\(.number)"' \
++            >> "${GITHUB_OUTPUT}"
++          echo "PR lookup complete: $(cat "${GITHUB_OUTPUT}")"
++
++      # Download the semver outcome artifact written by semver-checks.yml.
++      # steps.check.outcome in that workflow is the raw result before continue-on-error
++      # converts it to "success", so it correctly reflects whether a breaking change was found.
++      - name: Download semver outcome
++        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
++        with:
++          name: semver-outcome
++          github-token: ${{ github.token }}
++          run-id: ${{ github.event.workflow_run.id }}
++
++      - name: Update breaking-change label
++        if: steps.pr-context.outputs.number != ''
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_NUMBER: ${{ steps.pr-context.outputs.number }}
++        run: |
++          STEP_OUTCOME=$(cat semver-outcome.txt)
++          echo "Semver check outcome: '${STEP_OUTCOME}' for PR #${PR_NUMBER}"
++
++          if [[ "$STEP_OUTCOME" == "failure" ]]; then
++            echo "Breaking change detected -- adding 'breaking-change' label to PR #$PR_NUMBER"
++            gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --add-label "breaking-change"
++          elif [[ "$STEP_OUTCOME" == "success" ]]; then
++            # Remove the label only if it is currently present; gh pr edit fails on absent labels.
++            CURRENT_LABELS=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --json labels --jq '[.labels[].name]')
++            echo "Current PR labels: $CURRENT_LABELS"
++            if echo "$CURRENT_LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++              echo "Semver check passed -- removing 'breaking-change' label from PR #$PR_NUMBER"
++              gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --remove-label "breaking-change"
++            else
++              echo "Semver check passed -- 'breaking-change' label not present, nothing to do"
++            fi
++          else
++            echo "ERROR: unexpected semver outcome '${STEP_OUTCOME}' in semver-outcome.txt"
++            exit 1
++          fi
\ No newline at end of file
.gitignore
@@ -0,0 +1,31 @@
+diff --git a/.gitignore b/.gitignore
+--- a/.gitignore
++++ b/.gitignore
+ 
+ # IDE
+ .claude/
++.cursor/
+ .dir-locals.el
+ .idea/
+ .vscode/
+ .zed
+ .cache/
+ .clangd
++*.*~
+ 
+ # Rust
++.cargo-home
+ target/
+-/Cargo.lock
+ integration-tests/Cargo.lock
+ 
+ # Project
+ acceptance/tests/dat/
++acceptance/workloads/
+ ffi/examples/read-table/build
++ffi/examples/visit-expression/build
+ /build
+ /kernel/target
+ /target
++
++/benchmarks/workloads/
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff 8737564..6a0ea39 7b1612f..c6c465f | Disable: git config gitstack.push-range-diff false

Comment thread kernel/src/transaction/mod.rs Outdated
Collaborator

@scottsand-db scottsand-db left a comment

Looks AWESOME! Thanks! Left some comments

Comment thread kernel/src/transaction/mod.rs Outdated
Comment thread kernel/src/transaction/mod.rs Outdated
Comment thread kernel/src/transaction/mod.rs Outdated
Comment thread kernel/src/transaction/mod.rs Outdated
Comment thread kernel/src/transaction/mod.rs Outdated
Comment thread kernel/src/snapshot.rs Outdated
Comment thread kernel/src/transaction/mod.rs Outdated
Comment thread kernel/src/transaction/mod.rs Outdated
Comment thread kernel/src/log_segment/mod.rs
Comment thread kernel/src/transaction/mod.rs Outdated
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from c6c465f to bf96beb Compare April 15, 2026 01:06
@william-ch-databricks
Contributor Author

Range-diff: main (c6c465f -> bf96beb)
.github/actions/install-and-cache/action.yml
@@ -0,0 +1,105 @@
+diff --git a/.github/actions/install-and-cache/action.yml b/.github/actions/install-and-cache/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/install-and-cache/action.yml
++# This is copied from https://github.com/tecolicom/actions-install-and-cache
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache generic backend
++description: 'GitHub Action to run installer and cache the result'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  run:     { required: true,  type: string }
++  path:    { required: true,  type: string }
++  cache:   { required: false, type: string, default: yes }
++  key:     { required: false, type: string }
++  sudo:    { required: false, type: string }
++  verbose: { required: false, type: string, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.cache.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup install-and-cache
++        define() { IFS='\n' read -r -d '' ${1} || true ; }
++        define script <<'EOS_cad8_c24e_'
++        ${{ inputs.run }}
++        EOS_cad8_c24e_
++        directory="${{ inputs.path }}"
++        given_key="${{ inputs.key }}"
++        archive= key=
++        case "${{ inputs.cache }}" in
++            yes|workflow)
++                cache="${{ inputs.cache }}"
++                uname -mrs
++                hash=$( (uname -mrs ; cat <<< "$script" ; echo $directory) | \
++                        (md5sum||md5) | awk '{print $1}' )
++                key="${hash}${given_key:+-$given_key}"
++                [ "$cache" == 'workflow' ] && \
++                    key+="-${{ github.run_id }}-${{ github.run_attempt }}"
++                archive=$HOME/archive-$hash.tz
++                ;;
++            *)
++                cache=no
++                ;;
++        esac
++        # use "--recursive-unlink" option if GNU tar is found
++        if tar --version | grep GNU > /dev/null
++        then
++            tar="tar --recursive-unlink"
++        elif gtar --version | grep GNU > /dev/null
++        then
++            tar="gtar --recursive-unlink"
++        else
++            tar=tar
++        fi
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            cache=$cache
++            archive=$archive
++            key=$key
++            tar=$tar
++        END
++
++    - id: cache
++      if: steps.setup.outputs.cache != 'no'
++      uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
++      with:
++        path: ${{ steps.setup.outputs.archive }}
++        key:  ${{ steps.setup.outputs.key }}
++
++    - id: extract
++      if: steps.setup.outputs.cache != 'no' && steps.cache.outputs.cache-hit == 'true'
++      shell: bash
++      run: |
++        : extract
++        archive="${{ steps.setup.outputs.archive }}"
++        verbose="${{ inputs.verbose }}"
++        tar="${{ steps.setup.outputs.tar }}"
++        ls -l $archive
++        if [ -s $archive ]
++        then
++            opt=-Pxz
++            [[ $verbose == yes || $verbose == true ]] && opt+=v
++            sudo $tar -C / $opt -f $archive
++        else
++            echo "$archive is empty"
++        fi
++
++    - id: install-and-archive
++      if: steps.cache.outputs.cache-hit != 'true'
++      uses: tecolicom/actions-install-and-archive@9d5afb27f9900f2df47fe40de58fbd837032bddf # v1.3
++      with:
++        run:     ${{ inputs.run }}
++        archive: ${{ steps.setup.outputs.archive }}
++        path:    ${{ inputs.path }}
++        sudo:    ${{ inputs.sudo }}
\ No newline at end of file
.github/actions/pr-title-validator/action.yml
@@ -0,0 +1,47 @@
+diff --git a/.github/actions/pr-title-validator/action.yml b/.github/actions/pr-title-validator/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/pr-title-validator/action.yml
++name: 'PR Title Validator'
++description: 'Validates a pull request title against a regex pattern'
++
++inputs:
++  regex:
++    description: 'Regular expression the PR title must match'
++    required: true
++  breaking-change-regex:
++    description: 'Regex to use instead when the breaking-change label is present'
++    required: false
++    default: ''
++  labels:
++    description: 'JSON array of label names on the PR'
++    required: false
++    default: '[]'
++  title:
++    description: 'PR title to validate. Defaults to github.event.pull_request.title.'
++    required: false
++    default: ''
++
++runs:
++  using: composite
++  steps:
++    - name: Validate PR title
++      shell: bash
++      env:
++        PR_TITLE: ${{ inputs.title || github.event.pull_request.title }}
++        INPUT_REGEX: ${{ inputs.regex }}
++        BREAKING_REGEX: ${{ inputs.breaking-change-regex }}
++        LABELS: ${{ inputs.labels }}
++      run: |
++        REGEX="$INPUT_REGEX"
++        if [[ -n "$BREAKING_REGEX" ]] && echo "$LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++          REGEX="$BREAKING_REGEX"
++          echo "breaking-change label detected, using breaking change regex."
++        fi
++
++        if [[ "$PR_TITLE" =~ $REGEX ]]; then
++          echo "PR title matches pattern."
++          exit 0
++        fi
++        echo "::error::PR title \"$PR_TITLE\" does not match pattern: $REGEX"
++        exit 1
\ No newline at end of file
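The matching done by the `Validate PR title` step above can be exercised locally. This is a minimal sketch with two assumptions: the two patterns are copied from the `regex` and `breaking-change-regex` inputs passed in pr-validator.yml, and `grep -E` stands in for the action's bash `[[ ... =~ ... ]]` test. `title_matches` and `check` are hypothetical helpers.

```sh
# Patterns copied from the `regex` / `breaking-change-regex` inputs in pr-validator.yml.
REGEX='^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
BREAKING_REGEX='^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'

# Hypothetical helper: succeeds when the title ($1) matches the pattern ($2).
# Assumption: grep -E approximates the bash =~ match used by the action.
title_matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Hypothetical helper: print a one-line verdict for a title/pattern pair.
check() {
  if title_matches "$1" "$2"; then
    echo "match:    $1"
  else
    echo "no match: $1"
  fi
}

# This PR's title against the default pattern and the breaking-change pattern.
check 'refactor!: separate read state from effective state in Transaction' "$REGEX"
check 'refactor!: separate read state from effective state in Transaction' "$BREAKING_REGEX"

# Without the `!`, the breaking-change pattern is expected to reject the title.
check 'refactor: separate read state from effective state in Transaction' "$BREAKING_REGEX"

# No conventional-commit prefix at all.
check 'Refactor transaction state' "$REGEX"
```

One detail worth noticing when trying other titles: as written, both patterns place `!` before the optional scope, so `feat!(kernel): ...` matches while `feat(kernel)!: ...` does not.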
.github/actions/use-homebrew-tools/action.yml
@@ -0,0 +1,51 @@
+diff --git a/.github/actions/use-homebrew-tools/action.yml b/.github/actions/use-homebrew-tools/action.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/actions/use-homebrew-tools/action.yml
++# This is copied from https://github.com/tecolicom/actions-use-homebrew-tools/
++# which is Copyright 2022 Office TECOLI, LLC
++
++name: install-and-cache homebrew tools
++description: 'GitHub Action to install and cache homebrew tools'
++branding:
++  color: orange
++  icon:  type
++
++inputs:
++  tools:   { required: false, type: string }
++  key:     { required: false, type: string }
++  path:    { required: false, type: string }
++  cache:   { required: false, type: string, default: yes }
++  verbose: { required: false, type: boolean, default: false }
++
++outputs:
++  cache-hit:
++    value: ${{ steps.update.outputs.cache-hit }}
++
++runs:
++  using: composite
++  steps:
++
++    - id: setup
++      shell: bash
++      run: |
++        : setup use-homebrew-tools
++        given_key="${{ inputs.key }}"
++        brew_version="$(brew --version)"
++        echo "$brew_version"
++        version_key="$( echo "$brew_version" | (md5sum||md5) | awk '{print $1}' )"
++        key="${given_key:+$given_key-}${version_key}"
++        sed 's/^ *//' << END >> $GITHUB_OUTPUT
++            command=brew install
++            prefix=$(brew --prefix)
++            key=$key
++        END
++
++    - id: update
++      uses: ./.github/actions/install-and-cache
++      with:
++        run:     ${{ steps.setup.outputs.command }} ${{ inputs.tools }}
++        path:    ${{ steps.setup.outputs.prefix }} ${{ inputs.path }}
++        key:     ${{ steps.setup.outputs.key }}
++        cache:   ${{ inputs.cache }}
++        verbose: ${{ inputs.verbose }}
\ No newline at end of file
.github/workflows/auto-assign-pr.yml
@@ -0,0 +1,8 @@
+diff --git a/.github/workflows/auto-assign-pr.yml b/.github/workflows/auto-assign-pr.yml
+--- a/.github/workflows/auto-assign-pr.yml
++++ b/.github/workflows/auto-assign-pr.yml
+   assign-author:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: toshimaru/auto-author-assign@v2.1.1
++      - uses: toshimaru/auto-author-assign@16f0022cf3d7970c106d8d1105f75a1165edb516 # v2.1.1
\ No newline at end of file
.github/workflows/benchmark.yml
@@ -0,0 +1,86 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/benchmark.yml
++# issue_comment is used here to trigger on PR comments, as opposed to pull_request_review
++# (review submissions) or pull_request_review_comment (comments on the diff itself)
++# we want to trigger this on comment creation or edit
++on:
++  issue_comment:
++    types: [created, edited]
++name: Benchmarking PR performance
++jobs:
++  run-benchmark:
++    name: Run benchmarks
++    if: >
++      github.event.issue.pull_request &&
++      (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
++    runs-on: ubuntu-latest
++    permissions:
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
++    steps:
++      - name: Get PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++        with:
++          ref: ${{ steps.pr.outputs.head_sha }}
++      - uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file
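
The "Get PR metadata" step above validates the API response before writing it to `$GITHUB_OUTPUT`. A minimal sketch of that check as a standalone function follows. `validate_ref` is a hypothetical name used only for illustration and is not part of the workflow.

```shell
# Hypothetical helper mirroring the BASE_REF checks in the workflow step above.
validate_ref() {
  local ref="$1"
  # Reject embedded newlines: each line appended to GITHUB_OUTPUT is parsed as
  # its own key=value pair, so a newline could smuggle in an extra output.
  if [[ "$ref" == *$'\n'* ]]; then
    echo "Unexpected newline in ref" >&2
    return 1
  fi
  # Allow only the characters the workflow accepts for a branch name.
  local pattern='^[a-zA-Z0-9/_.-]+$'
  if [[ ! "$ref" =~ $pattern ]]; then
    echo "Invalid ref: $ref" >&2
    return 1
  fi
  return 0
}
```

The allow-list already excludes newlines, so the first check mainly yields a clearer error message. In the workflow it also covers `HEAD_SHA`, which is not matched against the pattern.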
.github/workflows/build.yml
@@ -0,0 +1,315 @@
+diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
+--- a/.github/workflows/build.yml
++++ b/.github/workflows/build.yml
+ name: build
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   RUST_BACKTRACE: 1
+ 
++# Supply chain security: all cargo commands that resolve dependencies use --locked to
++# enforce the committed Cargo.lock. This prevents CI from silently resolving a newer
++# (potentially compromised) dependency version. If Cargo.lock is out of sync with
++# Cargo.toml, the build fails immediately. Any dependency change must be an explicit,
++# reviewable update to Cargo.lock in the PR. Commands that skip --locked: cargo fmt
++# (no dep resolution), cargo msrv verify/show (wrapper tool), cargo miri setup (tooling).
++#
++# Swatinem/rust-cache caches the cargo registry and target directory (~450MB per job).
++# save-if restricts cache writes to main pushes only. PRs read from main's cache but
++# never write their own entries.
++#
++# The key insight: Cargo.lock changes infrequently, so main's cache key almost always
++# matches. PRs download and compile zero dependencies on cache hit. By only writing on
++# main, we keep main's cache entries alive (no LRU eviction from PR churn), and every
++# PR benefits from them.
++#
++# Without this, GHA's ref-scoped caching works against us: each PR writes ~6.3GB of
++# cache entries (14 jobs x ~450MB) that only that PR can read. A handful of active PRs
++# fills the 10GB cache budget, LRU evicts main's shared entries, and every subsequent
++# PR compiles from scratch.
++#
++# The save-if condition checks both event_name == 'push' and ref == main because
++# pull_request_target events set github.ref to the base branch (main), not the PR
++# branch. Without the event_name check, those workflows would write cache entries on
++# every PR.
++#
++# Note: actions-rust-lang/setup-rust-toolchain has built-in Swatinem/rust-cache that
++# writes on every run with no save-if support. We disable it with cache: false and
++# manage caching explicitly via the Swatinem/rust-cache steps below.
++
+ jobs:
+   format:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: rustfmt
+       - name: format
+         run: cargo fmt -- --check
+   msrv:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+       - name: verify-msrv
+   msrv-run-tests:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable and cargo msrv
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
+         with:
+           tool: cargo-msrv
+-      - uses: taiki-e/install-action@nextest
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Get rust-version from Cargo.toml
+         id: rust-version
+         run: echo "RUST_VERSION=$(cargo msrv show --path kernel/ --output-format minimal)" >> $GITHUB_ENV
+       - name: Install specified rust version
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           toolchain: ${{ env.RUST_VERSION }}
+       - name: run tests
+         run: |
+           pushd kernel
+           echo "Testing with $(cargo msrv show --output-format minimal)"
+-          cargo +$(cargo msrv show --output-format minimal) nextest run
++          cargo +$(cargo msrv show --output-format minimal) nextest run --locked
+   docs:
+     runs-on: ubuntu-latest
+     env:
+       RUSTDOCFLAGS: -D warnings
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build docs
+-        run: cargo doc --workspace --all-features
+-
++        run: cargo doc --locked --workspace --all-features --no-deps
+ 
+   # When we run cargo { build, clippy } --no-default-features, we want to build/lint the kernel to
+   # ensure that we can build the kernel without any features enabled. Unfortunately, due to how
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable with clippy
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
+         with:
++          cache: false
+           components: clippy
+-      - uses: Swatinem/rust-cache@v2
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: build and lint with clippy
+-        run: cargo clippy --benches --tests --all-features -- -D warnings
++        run: cargo clippy --locked --benches --tests --all-features -- -D warnings
+       - name: lint without default features - packages which depend on kernel with features enabled
+-        run: cargo clippy --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --workspace --no-default-features --exclude delta_kernel --exclude delta_kernel_ffi --exclude delta_kernel_derive --exclude delta_kernel_ffi_macros -- -D warnings
+       - name: lint without default features - packages which don't depend on kernel with features enabled
+-        run: cargo clippy --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
++        run: cargo clippy --locked --no-default-features --package delta_kernel --package delta_kernel_ffi --package delta_kernel_derive --package delta_kernel_ffi_macros -- -D warnings
+       - name: check kernel builds with default-engine-native-tls
+-        run: cargo build -p feature_tests --features default-engine-native-tls
++        run: cargo build --locked -p feature_tests --features default-engine-native-tls
++      - name: test native-tls backend has no crypto provider conflict
++        run: cargo test --locked -p feature_tests --features default-engine-native-tls
+       - name: check kernel builds with default-engine-rustls
+-        run: cargo build -p feature_tests --features default-engine-rustls
++        run: cargo build --locked -p feature_tests --features default-engine-rustls
++      - name: test rustls TLS backend feature-tests
++        run: cargo test --locked -p feature_tests --features default-engine-rustls
+   test:
+     runs-on: ${{ matrix.os }}
+     strategy:
+           - ubuntu-latest
+           - windows-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++      - uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
++        id: filter
++        with:
++          filters: |
++            ffi:
++              - 'ffi/src/handle.rs'
++              - 'ffi-proc-macros/**'
+       - name: Install minimal stable with clippy and rustfmt
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: test
+-        run: cargo nextest run --workspace --all-features -E 'not test(read_table_version_hdfs)'
++        run: cargo nextest run --locked --workspace --all-features -E 'not test(read_table_version_hdfs) and not test(invalid_handle_code)'
++      - name: trybuild tests
++        if: steps.filter.outputs.ffi == 'true'
++        run: cargo test --locked --package delta_kernel_ffi --features internal-api -- invalid_handle_code
+ 
+   ffi_test:
+     runs-on: ${{ matrix.os }}
+           - macOS-latest
+           - ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Setup cmake
+-        uses: jwlawson/actions-setup-cmake@v2
++        uses: jwlawson/actions-setup-cmake@0d6a7d60b009d01c9e7523be22153ff8f19460d3 # v2.2.0
+         with:
+-          cmake-version: '3.30.x'
++          cmake-version: "3.30.x"
+       - name: Install arrow-glib-linux
+         run: |
+           if [ "$RUNNER_OS" == "Linux" ]; then
+            fi
+       - name: Install arrow-glib-macOS
+         if: runner.os == 'macOS'
+-        uses: tecolicom/actions-use-homebrew-tools@v1
++        uses: ./.github/actions/use-homebrew-tools
+         with:
+-          tools: 'apache-arrow apache-arrow-glib'
++          tools: "apache-arrow apache-arrow-glib"
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Set output on fail
+         run: echo "CTEST_OUTPUT_ON_FAILURE=1" >> "$GITHUB_ENV"
+       - name: Build kernel
+         run: |
+           pushd acceptance
+-          cargo build
++          cargo build --locked
+           popd
+           pushd ffi
+-          cargo b --features default-engine-rustls,test-ffi,tracing,uc-catalog
++          cargo build --locked --features default-engine-rustls,test-ffi,tracing,delta-kernel-unity-catalog
+           popd
+       - name: build and run read-table test
+         run: |
+           cmake ..
+           make
+           make test
+-      - name: build and run uc-catalog-ffi test
++      - name: build and run delta-kernel-unity-catalog-ffi test
+         run: |
+-          pushd ffi/examples/uc-catalog-example
++          pushd ffi/examples/delta-kernel-unity-catalog-example
+           mkdir build
+           pushd build
+           cmake ..
+           make
+           make test
+   miri:
+-    name: "Miri"
++    name: "Miri (shard ${{ matrix.partition }}/3)"
+     runs-on: ubuntu-latest
++    strategy:
++      matrix:
++        partition: [1, 2, 3]
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install Miri
+         run: |
+           rustup toolchain install nightly --component miri
+           rustup override set nightly
+           cargo miri setup
+-      - uses: Swatinem/rust-cache@v2
+-      - uses: taiki-e/install-action@nextest
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++      - uses: taiki-e/install-action@98ec31d284eb962f41c14065e9391a955aa810cf # nextest
+       - name: Test with Miri
+         run: |
+           pushd ffi
+-          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --features default-engine-rustls,uc-catalog
++          MIRIFLAGS=-Zmiri-disable-isolation cargo miri nextest run --locked --features default-engine-rustls,delta-kernel-unity-catalog --partition slice:${{ matrix.partition }}/3
+ 
+   coverage:
+     runs-on: ubuntu-latest
+     env:
+       CARGO_TERM_COLOR: always
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install rust
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-llvm-cov
+-        uses: taiki-e/install-action@cargo-llvm-cov
++        uses: taiki-e/install-action@2d15d02e710b40b6332201aba6af30d595b5cd96 # cargo-llvm-cov
+       - name: Generate code coverage
+-        run: cargo llvm-cov --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
++        run: cargo llvm-cov --locked --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
+       - name: Upload coverage to Codecov
+-        uses: codecov/codecov-action@v5
++        uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
+         with:
+           files: codecov.json
+           fail_ci_if_error: true
\ No newline at end of file
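
The cache-budget figures in the top-level `build.yml` comment can be checked with a few lines of shell arithmetic. This is a rough sketch that takes the comment's numbers at face value and assumes 10GB is counted as 10000MB. The variable names are illustrative only.

```shell
# Figures taken from the build.yml comment; the GB-to-MB conversion is an assumption.
jobs=14          # jobs that would each write a cache entry
mb_per_job=450   # approximate size of one entry
budget_mb=10000  # 10GB repository cache budget, counted as 10000MB

# Cache written by a single PR if every job saved its own entry.
per_pr_mb=$(( jobs * mb_per_job ))

# Smallest number of PRs whose combined writes reach the budget (ceiling division).
prs_to_fill=$(( (budget_mb + per_pr_mb - 1) / per_pr_mb ))

echo "per-PR cache writes: ${per_pr_mb} MB"
echo "PRs needed to reach the budget: ${prs_to_fill}"
```

Under these assumptions two active PRs are enough to reach the budget even before counting main's own entries, which is why `save-if` restricts cache writes to pushes on main.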
.github/workflows/comment-on-title-failure.yml
@@ -0,0 +1,65 @@
+diff --git a/.github/workflows/comment-on-title-failure.yml b/.github/workflows/comment-on-title-failure.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/comment-on-title-failure.yml
++name: Comment on PR Title Failure
++
++on:
++  workflow_run:
++    workflows: ["Validate PR Title"]
++    types: [completed]
++
++jobs:
++  comment:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      # Step taken from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR info
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          # If the PR is from a fork, prefix it with `<owner-login>:`, otherwise only the PR branch name is relevant:
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        # Query the PR number by repo + branch, then assign to step output:
++        run: |
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++             --json 'number,title' --jq '"number=\(.number)\ntitle=\(.title)"' \
++             >> "${GITHUB_OUTPUT}"
++
++      - name: Find existing comment
++        id: find
++        uses: peter-evans/find-comment@3eae4d37986fb5a8592848f6a574fdf654e61f9e # v3.1.0
++        with:
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          comment-author: 'github-actions[bot]'
++          body-includes: PR title does not match the required pattern
++
++      - name: Post or update failure comment
++        if: ${{ github.event.workflow_run.conclusion == 'failure' }}
++        uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4.0.0
++        env:
++          PR_TITLE: ${{ steps.pr-context.outputs.title }}
++        with:
++          comment-id: ${{ steps.find.outputs.comment-id }}
++          issue-number: ${{ steps.pr-context.outputs.number }}
++          body: |
++            PR title does not match the required pattern. Please ensure you follow the [conventional commits](https://www.conventionalcommits.org/) spec.
++
++            Your title should start with `feat:`, `fix:`, `chore:`, `docs:`, `perf:`, `refactor:`, `test:`, or `ci:`, and if it's a breaking change that should be suffixed with a `!` (like `feat!:`), and then a 1-72 character brief description of your change.
++
++            **Title:** `${{ env.PR_TITLE }}`
++
++      - name: Delete comment on success
++        if: ${{ github.event.workflow_run.conclusion == 'success' && steps.find.outputs.comment-id != '' }}
++        env:
++          GH_TOKEN: ${{ github.token }}
++        run: |
++          gh api repos/${{ github.repository }}/issues/comments/${{ steps.find.outputs.comment-id }} -X DELETE
\ No newline at end of file
.github/workflows/pr-validator.yml
@@ -0,0 +1,57 @@
+diff --git a/.github/workflows/pr-validator.yml b/.github/workflows/pr-validator.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/pr-validator.yml
++name: Validate PR Title
++
++on:
++  pull_request:
++    types: [opened, edited, reopened, synchronize, labeled, unlabeled]
++  workflow_run:
++    workflows: ["semver-label"] # we need this since auto-labels from jobs don't trigger a workflow
++    types: [completed]
++
++jobs:
++  validate-title:
++    runs-on: ubuntu-latest
++    steps:
++      - name: Resolve PR metadata
++        id: pr
++        env:
++          GH_TOKEN: ${{ github.token }}
++          # Captured as env vars to prevent expression injection into the shell command.
++          PR_TITLE: ${{ github.event.pull_request.title }}
++          PR_LABELS_JSON: ${{ toJson(github.event.pull_request.labels.*.name) }}
++        run: |
++          if [[ "${{ github.event_name }}" == "workflow_run" ]]; then
++            pr_json=$(gh api --paginate repos/${{ github.repository }}/pulls \
++              --jq ".[] | select(.head.sha == \"${{ github.event.workflow_run.head_sha }}\")")
++            echo "number=$(echo "$pr_json" | jq -r '.number')" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$pr_json" | jq -r '.title'
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$pr_json" | jq -c '[.labels[].name]')" >> "$GITHUB_OUTPUT"
++          else
++            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
++            # Use multiline delimiter syntax so a title containing newlines cannot inject
++            # additional key=value pairs into GITHUB_OUTPUT.
++            {
++              echo 'title<<PR_TITLE_EOF'
++              echo "$PR_TITLE"
++              echo 'PR_TITLE_EOF'
++            } >> "$GITHUB_OUTPUT"
++            echo "labels=$(echo "$PR_LABELS_JSON" | jq -c '.')" >> "$GITHUB_OUTPUT"
++          fi
++
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++
++      - uses: ./.github/actions/pr-title-validator
++        with:
++          regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
++          breaking-change-regex: '^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'
++          labels: ${{ steps.pr.outputs.labels }}
++          title: ${{ steps.pr.outputs.title }}
\ No newline at end of file
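
The two patterns passed to the title validator can be tried locally with bash's `=~` operator. The sketch below is an illustration only. `classify_title` is a hypothetical function, and the diff does not show how the `pr-title-validator` action combines the patterns with the PR labels.

```shell
# Patterns copied from the workflow's `regex` and `breaking-change-regex` inputs.
title_regex='^(feat|fix|chore|docs|perf|refactor|test|ci)!?(\(.+\))?: .{1,72}$'
breaking_regex='^(feat|fix|chore|docs|perf|refactor|test|ci)!(\(.+\))?: .{1,72}$'

# Hypothetical classifier: prints "invalid", "breaking", or "ok" for a title.
classify_title() {
  local title="$1"
  if [[ ! "$title" =~ $title_regex ]]; then
    echo "invalid"
  elif [[ "$title" =~ $breaking_regex ]]; then
    echo "breaking"
  else
    echo "ok"
  fi
}

classify_title 'refactor!: separate read state from effective state in Transaction'
```

Both patterns place the `!` directly after the type and before any optional scope, so a title such as `feat(scope)!: ...` would not match either of them.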
.github/workflows/run-examples.yml
@@ -0,0 +1,55 @@
+diff --git a/.github/workflows/run-examples.yml b/.github/workflows/run-examples.yml
+--- a/.github/workflows/run-examples.yml
++++ b/.github/workflows/run-examples.yml
+ name: run-examples
+ 
+-on: [push, pull_request]
++on: [push, pull_request, merge_group]
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   run-examples:
+     runs-on: ubuntu-latest
+     steps:
+-      - uses: actions/checkout@v4
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - uses: Swatinem/rust-cache@v2
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+ 
+       - name: Run all examples
+         run: |
+               # Special case for write-table: it needs a temp directory
+               if [ "$example_dir" = "write-table" ]; then
+                 tmp_dir=$(mktemp -d)
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- "$tmp_dir"
+                 rm -r "$tmp_dir"
+               # Special case for inspect-table: it needs an operation/subcommand, run each one
+               elif [ "$example_dir" = "inspect-table" ]; then
+                 for operation in table-version metadata schema scan-metadata actions; do
+                   echo "  Running inspect-table with operation: $operation"
+-                  cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
++                  cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small $operation
+                 done
+               # Special case for read-table-changes: skip running it in CI as it needs a specific CDF-enabled table
+               # but still verify it compiles
+               # TODO: Add a suitable test table for CDF
+               elif [ "$example_dir" = "read-table-changes" ]; then
+                 echo "Building read-table-changes (skipping run - requires CDF-enabled table)"
+-                cargo build --manifest-path "$example_dir/Cargo.toml" --release
++                cargo build --locked --manifest-path "$example_dir/Cargo.toml" --release
+               else
+                 # All other examples run with the test table path
+-                cargo run --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
++                cargo run --locked --manifest-path "$example_dir/Cargo.toml" --release -- ../tests/data/table-without-dv-small
+               fi
+ 
+               echo ""
\ No newline at end of file
.github/workflows/run_integration_test.yml
@@ -0,0 +1,70 @@
+diff --git a/.github/workflows/run_integration_test.yml b/.github/workflows/run_integration_test.yml
+--- a/.github/workflows/run_integration_test.yml
++++ b/.github/workflows/run_integration_test.yml
+-name: Run tests to ensure we can compile across arrow versions
++# TODO: Disabled. The test script runs cargo update which resolves fresh dependencies,
++#       bypassing the Cargo.lock supply chain policy (see build.yml top-level comment).
+ 
+-on: [workflow_dispatch, push, pull_request]
+-
+-jobs:
+-  arrow_integration_test:
+-    runs-on: ${{ matrix.os }}
+-    timeout-minutes: 20
+-    strategy:
+-      fail-fast: false
+-      matrix:
+-        include:
+-          - os: macOS-latest
+-          - os: ubuntu-latest
+-          - os: windows-latest
+-            skip: ${{ github.event_name == 'pull_request' }} # skip running windows tests on every PR since they are slow
+-    steps:
+-      - name: Skip job for pull requests on Windows
+-        if: ${{ matrix.skip }}
+-        run: echo "Skipping job for pull requests on Windows."
+-      - uses: actions/checkout@v4
+-        if: ${{ !matrix.skip }}
+-      - name: Setup rust toolchain
+-        if: ${{ !matrix.skip }}
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
+-      - name: Run integration tests
+-        if: ${{ !matrix.skip }}
+-        shell: bash
+-        run: pushd integration-tests && ./test-all-arrow-versions.sh
++# name: Run tests to ensure we can compile across arrow versions
++#
++# on: [workflow_dispatch, push, pull_request, merge_group]
++#
++# jobs:
++#   arrow_integration_test:
++#     runs-on: ${{ matrix.os }}
++#     timeout-minutes: 20
++#     strategy:
++#       fail-fast: false
++#       matrix:
++#         include:
++#           - os: macOS-latest
++#           - os: ubuntu-latest
++#           - os: windows-latest
++#             skip: ${{ github.event_name == 'pull_request' || github.event_name == 'merge_group' }} # skip running windows tests on PRs and merge queue since they are slow
++#     steps:
++#       - name: Skip job for pull requests on Windows
++#         if: ${{ matrix.skip }}
++#         run: echo "Skipping job for pull requests on Windows."
++#       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
++#         if: ${{ !matrix.skip }}
++#       - name: Setup rust toolchain
++#         if: ${{ !matrix.skip }}
++#         uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++#         with:
++#           cache: false
++#       # See build.yml top-level comment for why save-if is restricted to main.
++#       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++#         if: ${{ !matrix.skip }}
++#         with:
++#           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
++#       - name: Run integration tests
++#         if: ${{ !matrix.skip }}
++#         shell: bash
++#         run: pushd integration-tests && ./test-all-arrow-versions.sh
\ No newline at end of file
.github/workflows/semver-checks.yml
@@ -0,0 +1,136 @@
+diff --git a/.github/workflows/semver-checks.yml b/.github/workflows/semver-checks.yml
+--- a/.github/workflows/semver-checks.yml
++++ b/.github/workflows/semver-checks.yml
+ name: semver-checks
+ 
+-# Trigger when a PR is opened or changed
++# Trigger when a PR is opened or changed. This runs with `pull_request` trigger, which means it has
++# only read perms. The label is added in semver-label.yml via workflow_run, which
++# looks at the status of this job and always runs in the base-repo context.
+ on:
+-  pull_request_target:
++  pull_request:
+     types:
+       - opened
+       - synchronize
+       - reopened
++  merge_group:
+ 
+ env:
+   CARGO_TERM_COLOR: always
+   check_if_pr_breaks_semver:
+     runs-on: ubuntu-latest
+     permissions:
+-      # this job runs with read because it checks out the PR head which could contain malicious code
+       contents: read
+     steps:
+-      - uses: actions/checkout@v4
+-        name: checkout full rep
++      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           fetch-depth: 0
+-          ref: ${{ github.event.pull_request.head.sha }}
++          ref: >-
++            ${{ github.event_name == 'merge_group'
++                && github.event.merge_group.head_sha
++                || github.event.pull_request.head.sha }}
+       - name: Install minimal stable
+-        uses: actions-rust-lang/setup-rust-toolchain@v1
++        uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1.15.4
++        with:
++          cache: false
++      # See build.yml top-level comment for why save-if is restricted to main.
++      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
++        with:
++          save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+       - name: Install cargo-semver-checks
++        uses: taiki-e/install-action@7bc99eee1f1b8902a125006cf790a1f4c8461e63 # v2.69.8
++        with:
++          tool: cargo-semver-checks
++      - name: Compute baseline revision
++        id: baseline
+         shell: bash
++        env:
++          MERGE_GROUP_BASE_SHA: ${{ github.event.merge_group.base_sha }}
++          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
++          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
+         run: |
+-          cargo install cargo-semver-checks --locked
+-      - name: Run check
++          if [ "${{ github.event_name }}" = "merge_group" ]; then
++            echo "rev=${MERGE_GROUP_BASE_SHA}" >> "$GITHUB_OUTPUT"
++          else
++            # Use the merge-base instead of the PR base SHA. The base SHA is the tip of
++            # the target branch when the webhook fires, which can differ from where the PR
++            # actually diverged. Using merge-base avoids false positives when the PR branch
++            # is behind the target branch.
++            MERGE_BASE=$(git merge-base "$PR_HEAD_SHA" "$PR_BASE_SHA")
++            echo "rev=${MERGE_BASE}" >> "$GITHUB_OUTPUT"
++          fi
++      - name: Run semver check
+         id: check
+         continue-on-error: true
+         shell: bash
++        env:
++          BASELINE_REV: ${{ steps.baseline.outputs.rev }}
+         # only check semver on released crates (delta_kernel and delta_kernel_ffi).
+         # note that this won't run on proc macro/derive crates, so don't need to include
+         # delta_kernel_derive etc.
+         run: |
+-          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features --baseline-rev ${{ github.event.pull_request.base.sha }}
+-      - name: On Failure
+-        id: set_failure
+-        if: ${{ steps.check.outcome == 'failure' }}
+-        run: |
+-          echo "Checks failed"
+-          echo "check_status=failure" >> $GITHUB_OUTPUT
+-      - name: On Success
+-        id: set_success
+-        if: ${{ steps.check.outcome == 'success' }}
+-        run: |
+-          echo "Checks succeed"
+-          echo "check_status=success" >> $GITHUB_OUTPUT
+-    outputs:
+-      check_status: ${{ steps.set_failure.outputs.check_status || steps.set_success.outputs.check_status }}
+-  update_label_if_needed:
+-    needs: check_if_pr_breaks_semver
+-    runs-on: ubuntu-latest
+-    permissions:
+-      # this job only looks at previous output and then sets a label, so malicious code in the PR
+-      # isn't a concern
+-      pull-requests: write
+-    steps:
+-      - name: On Failure
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'failure'
+-        uses: actions-ecosystem/action-add-labels@v1
++          cargo semver-checks -p delta_kernel -p delta_kernel_ffi --all-features \
++            --baseline-rev "$BASELINE_REV"
++      # Upload the step outcome as an artifact so semver-label.yml can read it via workflow_run.
++      # steps.check.outcome is the raw result *before* continue-on-error converts it to "success",
++      # so it correctly reflects whether a breaking change was detected.
++      # Only upload for pull_request events; merge_group runs have no PR to label.
++      - name: Save semver outcome
++        if: github.event_name == 'pull_request'
++        env:
++          SEMVER_OUTCOME: ${{ steps.check.outcome }}
++        run: echo "$SEMVER_OUTCOME" > semver-outcome.txt
++      - name: Upload semver outcome
++        if: github.event_name == 'pull_request'
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+         with:
+-          labels: breaking-change
+-      - name: Remove breaking-change label
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success' && contains(github.event.pull_request.labels.*.name, 'breaking-change')
+-        uses: actions-ecosystem/action-remove-labels@v1
+-        with:
+-          labels: breaking-change
+-      - name: On Success
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status == 'success'
+-        run: |
+-          echo "Checks succeed"
+-      - name: Fail On Incorrect Previous Output
+-        if: needs.check_if_pr_breaks_semver.outputs.check_status != 'success' && needs.check_if_pr_breaks_semver.outputs.check_status != 'failure'
+-        run: exit 1
++          name: semver-outcome
++          path: semver-outcome.txt
++          retention-days: 1
\ No newline at end of file
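A note on the baseline choice in the semver-checks.yml hunk above: the "Compute baseline revision" step uses the merge-group base SHA for `merge_group` events and the merge-base of the PR head and PR base otherwise. A minimal sketch of that branch as a standalone shell function, assuming it runs inside a checkout with full history (`fetch-depth: 0`). The name `pick_baseline` and its argument order are hypothetical and not part of the workflow:

```shell
# Hypothetical helper mirroring the "Compute baseline revision" step.
# Args: $1 event name, $2 merge-group base SHA, $3 PR head SHA, $4 PR base SHA.
pick_baseline() {
  if [ "$1" = "merge_group" ]; then
    # Merge queue: compare against the commit the group is built on.
    printf '%s\n' "$2"
  else
    # Pull request: compare against the commit where the branch diverged,
    # not the current tip of the target branch.
    git merge-base "$3" "$4"
  fi
}
```

Called as `pick_baseline pull_request "" "$PR_HEAD_SHA" "$PR_BASE_SHA"`, this prints the revision the workflow writes to the `rev` output and later passes to `--baseline-rev`.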
.github/workflows/semver-label.yml
@@ -0,0 +1,81 @@
+diff --git a/.github/workflows/semver-label.yml b/.github/workflows/semver-label.yml
+new file mode 100644
+--- /dev/null
++++ b/.github/workflows/semver-label.yml
++name: semver-label
++
++# Apply or remove the breaking-change label based on the outcome of the semver-checks workflow.
++# This must be a separate workflow from semver-checks.yml: label writes require pull-requests:write,
++# which is unavailable in pull_request workflows triggered by fork PRs. workflow_run always runs
++# in the base-repo context with full write permissions, and never executes PR code.
++on:
++  workflow_run:
++    workflows: ["semver-checks"]
++    types: [completed]
++
++jobs:
++  update_label_if_needed:
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++      actions: read
++    # Label updates only apply to PRs; merge_group runs have no associated PR to label.
++    if: github.event.workflow_run.event == 'pull_request'
++    steps:
++      # Resolve PR number from the triggering workflow run's branch. For fork PRs the branch
++      # must be prefixed with `<owner>:` so gh pr view can locate it.
++      # Pattern from: https://github.com/orgs/community/discussions/25220#discussioncomment-11316244
++      - name: Find PR number
++        id: pr-context
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_TARGET_REPO: ${{ github.repository }}
++          PR_BRANCH: |-
++            ${{
++              (github.event.workflow_run.head_repository.owner.login != github.event.workflow_run.repository.owner.login)
++                && format('{0}:{1}', github.event.workflow_run.head_repository.owner.login, github.event.workflow_run.head_branch)
++                || github.event.workflow_run.head_branch
++            }}
++        run: |
++          echo "Looking up PR for branch '${PR_BRANCH}' in repo '${PR_TARGET_REPO}'"
++          gh pr view --repo "${PR_TARGET_REPO}" "${PR_BRANCH}" \
++            --json 'number' --jq '"number=\(.number)"' \
++            >> "${GITHUB_OUTPUT}"
++          echo "PR lookup complete: $(cat "${GITHUB_OUTPUT}")"
++
++      # Download the semver outcome artifact written by semver-checks.yml.
++      # steps.check.outcome in that workflow is the raw result before continue-on-error
++      # converts it to "success", so it correctly reflects whether a breaking change was found.
++      - name: Download semver outcome
++        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
++        with:
++          name: semver-outcome
++          github-token: ${{ github.token }}
++          run-id: ${{ github.event.workflow_run.id }}
++
++      - name: Update breaking-change label
++        if: steps.pr-context.outputs.number != ''
++        env:
++          GH_TOKEN: ${{ github.token }}
++          PR_NUMBER: ${{ steps.pr-context.outputs.number }}
++        run: |
++          STEP_OUTCOME=$(cat semver-outcome.txt)
++          echo "Semver check outcome: '${STEP_OUTCOME}' for PR #${PR_NUMBER}"
++
++          if [[ "$STEP_OUTCOME" == "failure" ]]; then
++            echo "Breaking change detected -- adding 'breaking-change' label to PR #$PR_NUMBER"
++            gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --add-label "breaking-change"
++          elif [[ "$STEP_OUTCOME" == "success" ]]; then
++            # Remove the label only if it is currently present; gh pr edit fails on absent labels.
++            CURRENT_LABELS=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --json labels --jq '[.labels[].name]')
++            echo "Current PR labels: $CURRENT_LABELS"
++            if echo "$CURRENT_LABELS" | jq -e '.[] | select(. == "breaking-change")' > /dev/null 2>&1; then
++              echo "Semver check passed -- removing 'breaking-change' label from PR #$PR_NUMBER"
++              gh pr edit "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --remove-label "breaking-change"
++            else
++              echo "Semver check passed -- 'breaking-change' label not present, nothing to do"
++            fi
++          else
++            echo "ERROR: unexpected semver outcome '${STEP_OUTCOME}' in semver-outcome.txt"
++            exit 1
++          fi
\ No newline at end of file
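The three-way decision in the "Update breaking-change label" step above (add on `failure`, remove on `success` only if the label is present, error otherwise) can be sketched as a pure function that is easy to check without calling `gh`. This is an illustration under assumptions: `label_action` is a hypothetical name, and a plain `grep` substring match stands in for the `jq` filter the workflow uses.

```shell
# Hypothetical helper: map a semver step outcome to a label action.
# Args: $1 step outcome, $2 JSON array of current label names.
# Prints one of: add, remove, noop. Returns 1 on an unexpected outcome.
label_action() {
  case "$1" in
    failure)
      # A breaking change was detected.
      echo add
      ;;
    success)
      # Only remove the label when it is currently present.
      # Simplification: substring match instead of the workflow's jq filter.
      if printf '%s' "$2" | grep -q '"breaking-change"'; then
        echo remove
      else
        echo noop
      fi
      ;;
    *)
      # e.g. "cancelled" or "skipped": refuse to guess.
      echo "unexpected semver outcome '$1'" >&2
      return 1
      ;;
  esac
}
```

For example, `label_action success '["bug"]'` prints `noop`, which corresponds to the "nothing to do" branch in the workflow.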
.gitignore
@@ -0,0 +1,31 @@
+diff --git a/.gitignore b/.gitignore
+--- a/.gitignore
++++ b/.gitignore
+ 
+ # IDE
+ .claude/
++.cursor/
+ .dir-locals.el
+ .idea/
+ .vscode/
+ .zed
+ .cache/
+ .clangd
++*.*~
+ 
+ # Rust
++.cargo-home
+ target/
+-/Cargo.lock
+ integration-tests/Cargo.lock
+ 
+ # Project
+ acceptance/tests/dat/
++acceptance/workloads/
+ ffi/examples/read-table/build
++ffi/examples/visit-expression/build
+ /build
+ /kernel/target
+ /target
++
++/benchmarks/workloads/
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff a5164b3..c6c465f 7b1612f..bf96beb | Disable: git config gitstack.push-range-diff false

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from bf96beb to b293bf7 Compare April 15, 2026 01:15
@william-ch-databricks
Contributor Author

Range-diff: main (bf96beb -> b293bf7)
.github/workflows/benchmark.yml
@@ -0,0 +1,105 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+--- a/.github/workflows/benchmark.yml
++++ b/.github/workflows/benchmark.yml
+     types: [created, edited]
+ name: Benchmarking PR performance
+ jobs:
+-  runBenchmark:
++  run-benchmark:
+     name: Run benchmarks
+     if: >
+       github.event.issue.pull_request &&
+       (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
+     runs-on: ubuntu-latest
+     permissions:
+-      pull-requests: write
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
+     steps:
+-      - name: Parse benchmark tags
+-        env:
+-          COMMENT: ${{ github.event.comment.body }}
+-        run: |
+-          if [[ "$COMMENT" == "/bench" ]]; then
+-            TAGS="base"
+-          else
+-            TAGS="${COMMENT#/bench }"
+-            TAGS=$(echo "$TAGS" | tr -d '[:space:]')
+-          fi
+-          echo "BENCH_TAGS=$TAGS" >> "$GITHUB_ENV"
+-          echo "Parsed tags: $TAGS"
+-      - name: Get PR HEAD sha
++      - name: Get PR metadata
+         id: pr
+-        run: |
+-          PR_DATA=$(gh api repos/${{ github.repository }}/pulls/${{ github.event.issue.number }})
+-          echo "head_sha=$(echo "$PR_DATA" | jq -r .head.sha)" >> "$GITHUB_OUTPUT"
+-          echo "base_ref=$(echo "$PR_DATA" | jq -r .base.ref)" >> "$GITHUB_OUTPUT"
+         env:
+           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           ref: ${{ steps.pr.outputs.head_sha }}
+       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
+         with:
+           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+-      # TODO: This action internally runs `cargo bench` without --locked, bypassing our
+-      #       supply chain lockfile policy (see build.yml top-level comment). Replace with
+-      #       manual cargo bench --locked + critcmp + gh pr comment steps.
+-      # - uses: boa-dev/criterion-compare-action@adfd3a94634fe2041ce5613eb7df09d247555b87 # v3.2.4
+-      #   with:
+-      #     token: ${{ secrets.GITHUB_TOKEN }}
+-      #     branchName: ${{ steps.pr.outputs.base_ref }}
+-      #     cwd: benchmarks
+-      #     benchName: workload_bench
+-      - run: echo "Benchmarking is temporarily disabled. See TODO above."
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file
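On the `BASE_REF` checks in the "Get PR metadata" step above: single-line entries in `$GITHUB_OUTPUT` are written one `name=value` per line, so a value containing a newline could smuggle in an extra entry, and the allow-list keeps the ref to characters that are safe to pass on. A minimal sketch of the same allow-list, written with a POSIX `case` pattern in place of the workflow's `[[ ... =~ ... ]]` test. The function name `validate_ref` is hypothetical:

```shell
# Hypothetical helper: accept only non-empty refs made of [a-zA-Z0-9/_.-].
# Returns 0 when the ref is acceptable, 1 otherwise.
validate_ref() {
  case "$1" in
    # Empty, or contains any character outside the allow-list
    # (this also rejects spaces, semicolons, and newlines).
    ''|*[!a-zA-Z0-9/_.-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

A caller would use it as `validate_ref "$BASE_REF" || { echo "Invalid BASE_REF" >&2; exit 1; }` before appending to `$GITHUB_OUTPUT`.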
CHANGELOG.md
@@ -0,0 +1,276 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file
CLAUDE.md
@@ -0,0 +1,54 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+   but default-engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
\ No newline at end of file
CLAUDE/architecture.md
@@ -0,0 +1,48 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit actions
++(CommitInfo, Add files), enforces protocol compliance (table features, schema validation), and
++delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path encoding
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff a5164b3..bf96beb eae37a7..b293bf7 | Disable: git config gitstack.push-range-diff false

@william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from b293bf7 to 16b2bc6 on April 15, 2026 01:28
@william-ch-databricks
Contributor Author

Range-diff: main (b293bf7 -> 16b2bc6)
.github/workflows/benchmark.yml
@@ -0,0 +1,105 @@
+diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
+--- a/.github/workflows/benchmark.yml
++++ b/.github/workflows/benchmark.yml
+     types: [created, edited]
+ name: Benchmarking PR performance
+ jobs:
+-  runBenchmark:
++  run-benchmark:
+     name: Run benchmarks
+     if: >
+       github.event.issue.pull_request &&
+       (github.event.comment.body == '/bench' || startsWith(github.event.comment.body, '/bench '))
+     runs-on: ubuntu-latest
+     permissions:
+-      pull-requests: write
++      contents: read
++    outputs:
++      pr_number: ${{ steps.pr.outputs.pr_number }}
+     steps:
+-      - name: Parse benchmark tags
+-        env:
+-          COMMENT: ${{ github.event.comment.body }}
+-        run: |
+-          if [[ "$COMMENT" == "/bench" ]]; then
+-            TAGS="base"
+-          else
+-            TAGS="${COMMENT#/bench }"
+-            TAGS=$(echo "$TAGS" | tr -d '[:space:]')
+-          fi
+-          echo "BENCH_TAGS=$TAGS" >> "$GITHUB_ENV"
+-          echo "Parsed tags: $TAGS"
+-      - name: Get PR HEAD sha
++      - name: Get PR metadata
+         id: pr
+-        run: |
+-          PR_DATA=$(gh api repos/${{ github.repository }}/pulls/${{ github.event.issue.number }})
+-          echo "head_sha=$(echo "$PR_DATA" | jq -r .head.sha)" >> "$GITHUB_OUTPUT"
+-          echo "base_ref=$(echo "$PR_DATA" | jq -r .base.ref)" >> "$GITHUB_OUTPUT"
+         env:
+           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          REPO: ${{ github.repository }}
++          PR_NUMBER: ${{ github.event.issue.number }}
++        run: |
++          PR_DATA=$(gh api "repos/$REPO/pulls/$PR_NUMBER")
++          HEAD_SHA=$(echo "$PR_DATA" | jq -r .head.sha)
++          BASE_REF=$(echo "$PR_DATA" | jq -r .base.ref)
++          [[ "$HEAD_SHA" == *$'\n'* || "$BASE_REF" == *$'\n'* ]] && { echo "Unexpected newline in API response" >&2; exit 1; }
++          [[ "$BASE_REF" =~ ^[a-zA-Z0-9/_.-]+$ ]] || { echo "Invalid BASE_REF: $BASE_REF" >&2; exit 1; }
++          printf 'head_sha=%s\n' "$HEAD_SHA" >> "$GITHUB_OUTPUT"
++          printf 'base_ref=%s\n'  "$BASE_REF"  >> "$GITHUB_OUTPUT"
++          printf 'pr_number=%s\n' "$PR_NUMBER"  >> "$GITHUB_OUTPUT"
++      - name: Install critcmp
++        # Installed before checkout so the PR's .cargo/config.toml cannot
++        # redirect the registry to a malicious source. The runner's
++        # pre-installed Rust is sufficient -- no toolchain setup needed here.
++        # --locked is omitted for cargo install (same exemption as cargo miri
++        # setup); --version pins the top-level crate.
++        run: cargo install critcmp --version 0.1.8
+       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+         with:
+           ref: ${{ steps.pr.outputs.head_sha }}
+       - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
+         with:
+           save-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+-      # TODO: This action internally runs `cargo bench` without --locked, bypassing our
+-      #       supply chain lockfile policy (see build.yml top-level comment). Replace with
+-      #       manual cargo bench --locked + critcmp + gh pr comment steps.
+-      # - uses: boa-dev/criterion-compare-action@adfd3a94634fe2041ce5613eb7df09d247555b87 # v3.2.4
+-      #   with:
+-      #     token: ${{ secrets.GITHUB_TOKEN }}
+-      #     branchName: ${{ steps.pr.outputs.base_ref }}
+-      #     cwd: benchmarks
+-      #     benchName: workload_bench
+-      - run: echo "Benchmarking is temporarily disabled. See TODO above."
++      - name: Run benchmarks
++        # The comment is posted in the post-comment job after this job completes.
++        env:
++          COMMENT:  ${{ github.event.comment.body }}
++          BASE_REF: ${{ steps.pr.outputs.base_ref }}
++          HEAD_SHA: ${{ steps.pr.outputs.head_sha }}
++        run: bash benchmarks/ci/run-benchmarks.sh
++      - name: Upload benchmark comment
++        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
++        with:
++          name: bench-comment
++          path: /tmp/bench-comment.md
++
++  post-comment:
++    name: Post benchmark results
++    needs: run-benchmark
++    runs-on: ubuntu-latest
++    permissions:
++      pull-requests: write
++    steps:
++      - name: Download benchmark comment
++        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
++        with:
++          name: bench-comment
++          path: /tmp/
++      - name: Post results as PR comment
++        env:
++          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
++          PR_NUMBER: ${{ needs.run-benchmark.outputs.pr_number }}
++          REPO: ${{ github.repository }}
++        run: gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file /tmp/bench-comment.md
\ No newline at end of file
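Note on the `Get PR metadata` step above: the two guards it adds before writing to `$GITHUB_OUTPUT` can be tried outside of CI. The sketch below is only an illustration; `validate_ref` is a hypothetical helper name that does not appear in the workflow, and the candidate values are made up. It assumes bash, since `[[ ... =~ ... ]]` and `$'\n'` are bash features.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): applies the same two checks
# that the "Get PR metadata" step applies to BASE_REF.
validate_ref() {
  local ref="$1"
  # An embedded newline could smuggle an extra key=value line into $GITHUB_OUTPUT.
  if [[ "$ref" == *$'\n'* ]]; then
    return 1
  fi
  # Same allow-list as the workflow: letters, digits, '/', '_', '.', '-'.
  if [[ ! "$ref" =~ ^[a-zA-Z0-9/_.-]+$ ]]; then
    return 1
  fi
  return 0
}

# Example candidates: one ordinary branch name, one value with an injected line.
for candidate in "main" $'main\ninjected=1'; do
  if validate_ref "$candidate"; then
    echo "accepted"
  else
    echo "rejected"
  fi
done
```

In the workflow itself the newline check also covers `HEAD_SHA`, which has no allow-list pattern, so the two guards are not interchangeable there.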
CHANGELOG.md
@@ -0,0 +1,276 @@
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+ # Changelog
+ 
++## [v0.21.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.21.0/) (2026-04-10)
++
++[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.20.0...v0.21.0)
++
++
++### 🏗️ Breaking changes
++
++1. Add partitioned variant to DataLayout enum ([#2145])
++   - Adds `Partitioned` variant to `DataLayout` enum. Update match statements to handle the new variant.
++2. Add create many API to engine ([#2070])
++   - Adds `create_many` method to `ParquetHandler` trait. Implementors must add this method. See the trait rustdocs for details.
++3. Rename uc-catalog and uc-client crates ([#2136])
++   - `delta-kernel-uc-catalog` renamed to `delta-kernel-unity-catalog`. `delta-kernel-uc-client` renamed to `unity-catalog-delta-rest-client`. Update `Cargo.toml` dependencies accordingly.
++4. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++   - `Snapshot::checkpoint()` and checksum APIs now return the updated `Snapshot`. Callers must handle the returned value.
++5. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++   - Enforces that committer type matches table type (catalog-managed vs path-based). Use appropriate committer for your table type.
++6. Add UCCommitter validation for catalog-managed tables ([#2254])
++   - `UCCommitter` now rejects commits to non-catalog-managed tables. Use `FileSystemCommitter` for path-based tables.
++7. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++   - FFI snapshot creation now uses builder pattern. Update FFI callers to use the new builder APIs.
++8. Make tags and remove partition values allow null values in map ([#2281])
++   - `tags` and `partitionValues` map values are now nullable. Update code that assumes non-null values.
++9. Better naming style for column mapping related functions/variables ([#2290])
++   - Renamed: `make_physical` to `to_physical_name`, `make_physical_struct` to `to_physical_schema`, `transform_struct_for_projection` to `projection_transform`. Update call sites.
++10. Remove the catalog-managed feature flag ([#2310])
++    - The `catalog-managed` feature flag is removed. Catalog-managed table support is now always available.
++11. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++    - `Snapshot::checkpoint()` now returns `CheckpointResult` instead of `Snapshot`. Access the snapshot via `CheckpointResult::snapshot`.
++12. Remove old non-builder snapshot FFI functions ([#2318])
++    - Removed legacy FFI snapshot functions. Use the new builder-pattern FFI functions instead.
++13. Support version 0 (table creation) commits in UCCommitter ([#2247])
++    - Connectors using `UCCommitter` for table creation must now handle post-commit finalization via the UC create table API.
++14. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++    - `CommitMetadata` now uses computed in-commit timestamp instead of wall-clock time. Callers relying on wall-clock timing should update accordingly.
++15. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++    - Minimum supported Arrow version is now arrow-57. Update your `Cargo.toml` if using `arrow-56` feature.
++16. Crc File Histogram Read and Write Support ([#2235])
++    - Adds `AddedHistogram` and `RemovedHistogram` fields to `FileStatsDelta` struct.
++17. Add ScanMetadataCompleted metric event ([#2236])
++    - Adds `ScanMetadataCompleted` variant to `MetricEvent` enum. Update metric reporters to handle the new variant.
++18. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++    - Adds `JsonReadCompleted` and `ParquetReadCompleted` variants to `MetricEvent` enum. Update metric reporters to handle new variants.
++19. New transform helpers for unary and binary children ([#2150])
++    - Removes public `CowExt` trait. Remove any usages of this trait.
++20. New mod transforms for expression and schema transforms ([#2077])
++    - Moves `SchemaTransform` and `ExpressionTransform` to new `transforms` module. Update import paths.
++21. Introduce object_store compat shim ([#2111])
++    - Renames `object_store` dependency to `object_store_12`. Update any direct references.
++22. Consolidate domain metadata reads through Snapshot ([#2065])
++    - Domain metadata reads now go through `Snapshot` methods. Update callers using old free functions.
++23. Don't read or write arrow schema in parquet files ([#2025])
++    - Parquet files no longer include arrow schema metadata. Code relying on this metadata must be updated.
++24. Rename include_stats_columns to include_all_stats_columns ([#1996])
++    - Renames `ScanBuilder::include_stats_columns()` to `ScanBuilder::include_all_stats_columns()`. Update call sites.
++
++### 🚀 Features / new APIs
++
++1. Add SQL -> Kernel predicate parser to benchmark framework ([#2099])
++2. Add observability metrics for scan log replay ([#1866])
++3. Filtered engine data visitor ([#1942])
++4. Trigger benchmarking with comments ([#2089])
++5. Unify data stats and partition values in DataSkippingFilter ([#1948])
++6. Download benchmark workloads from DAT release ([#2163])
++7. Add partitioned variant to DataLayout enum ([#2145])
++8. Expose table_properties in FFI via visit_table_properties ([#2196])
++9. Allow checkpoint stats properties in CREATE TABLE ([#2210])
++10. Add crc file histogram initial struct and methods ([#2212])
++11. BinaryPredicate evaluate expression with ArrowViewType. ([#2052])
++12. Add acceptance workloads testing harness ([#2092])
++13. Enable DeletionVectors table feature in CREATE TABLE ([#2245])
++14. Checksum and checkpoint APIs return updated Snapshot ([#2182])
++15. Adding ScanBuilder FFI functions for Scans ([#2237])
++16. Add CountingReporter and fix metrics forwarding ([#2166])
++17. Instrument JSON and Parquet handler reads with MetricsReporter ([#2169])
++18. Wire CountingReporter into workload benchmarks ([#2171])
++19. Add create many API to engine ([#2070])
++20. Add ScanMetadataCompleted metric event ([#2236])
++21. Allow AppendOnly, ChangeDataFeed, and TypeWidening in CREATE TABLE ([#2279])
++22. Support max timestamp stats for data skipping ([#2249])
++23. Add list with backward checkpoint scan ([#2174])
++24. Add Snapshot::get_timestamp ([#2266])
++25. Make tags  and remove partition values allow null values in map ([#2281])
++26. Support UC credential vending and S3 benchmarks ([#2109])
++27. Add catalogManaged to allowed features in CREATE TABLE ([#2293])
++28. Add catalog-managed table creation utilities ([#2203])
++29. Support version 0 (table creation) commits in UCCommitter ([#2247])
++30. Update snapshot.checkpoint API to return a CheckpointResult ([#2314])
++31. Cached checkpoint output schema ([#2270])
++32. Refactor snapshot FFI to use builder pattern and enable snapshot reuse ([#2255])
++33. Add P&M to CommitMetadata and enforce committer/table type matching ([#2250])
++34. Add UCCommitter validation for catalog-managed tables ([#2254])
++35. Crc File Histogram Read and Write Support ([#2235])
++36. Add FFI function to expose snapshot's timestamp ([#2274])
++37. Add FFI create table DDL functions ([#2296])
++38. Add FFI remove files DML functions ([#2297])
++39. Expose Protocol and Metadata as opaque FFI handle types ([#2260])
++40. Add FFI bindings for domain metadata write operations ([#2327])
++
++### 🐛 Bug Fixes
++
++1. Treat null literal as unknown in meta-predicate evaluation ([#2097])
++2. Update TokioBackgroundExecutor to join thread instead of detaching ([#2126])
++3. Use thread pools and multi-thread tokio executor in read metadata benchmark runner ([#2044])
++4. Emit null stats for all-null columns instead of omitting them ([#2187])
++5. Allow Date/Timestamp casting for stats_parsed compatibility ([#2074])
++6. Filter evaluator input schema ([#2195])
++7. SnapshotCompleted.total_duration now includes log segment loading ([#2183])
++8. Avoid creating empty stats schemas ([#2199])
++9. Prevent dual TLS crypto backends from reqwest default features ([#2178])
++10. Vendor and pin homebrew actions ([#2243])
++11. Validate min_reader/writer_version are at least 1 ([#2202])
++12. Preserve loaded LazyCrc during incremental snapshot updates ([#2211])
++13. Detect stats_parsed in multi-part V1 checkpoints ([#2214])
++14. Downgrade per-batch data skipping log from info to debug ([#2219])
++15. Unknown table features in feature list are "supported" ([#2159])
++16. Remove debug_assert_eq before require in scan evaluator row count checks ([#2262])
++17. Adopt checkpoint written later for same-version snapshot refresh ([#2143])
++18. Return error when parquet handler returns empty data for scan files ([#2261])
++19. Refactor benchmarking workflow to not require criterion compare action ([#2264])
++20. Skip name-based validation for struct columns in expression evaluator ([#2160])
++21. Handle missing leaf columns in nested struct during parquet projection ([#2170])
++22. Pass computed ICT to CommitMetadata instead of wall-clock time ([#2319])
++23. Detect and handle empty (0-byte) log files during listing ([#2336])
++
++### 📚 Documentation
++
++1. Update claude readme to include github actions safety note ([#2190])
++2. Add line width and comment divider style rules to CLAUDE.md ([#2277])
++3. Add documentation for current tags ([#2234])
++4. Document benchmarking in CI accuracy ([#2302])
++
++### ⚡ Performance
++
++1. Pre-size dedup HashSet in ScanLogReplayProcessor ([#2186])
++2. Pre-size HashMap in ArrowEngineData::visit_rows ([#2185])
++3. Remove dead schema conversions in expression evaluators ([#2184])
++
++### 🚜 Refactor
++
++1. Finalized benchmark table names and added new tables ([#2072])
++2. New transform helpers for unary and binary children ([#2150])
++3. Remove legacy row-level partition filter path ([#2158])
++4. Restructured list log files function ([#2173])
++5. Consolidate and add testing for set transaction expiration ([#2176])
++6. Rename uc-catalog and uc-client crates ([#2136])
++7. Better naming style for column mapping related functions/variables ([#2290])
++8. Centralize computation for physical schema without partition columns ([#2142])
++9. Consolidate FFI test setup helpers into ffi_test_utils ([#2307])
++10. *(action_reconciliation)* Combine getter index and field name constants ([#1717]) ([#1774])
++11. Extract shared stat helpers from RowGroupFilter ([#2324])
++12. Extract WriteContext to its own file ([#2349])
++
++### ⚙️ Chores/CI
++
++1. Clean up arrow deps in cargo files ([#2115])
++2. Commit Cargo.lock and enforce --locked in all CI workflows ([#2240])
++3. Harden pr-title-validator a bit ([#2246])
++4. Renable semver ([#2248])
++5. Attempt fixup of semver-label job ([#2253])
++6. Use artifacts for semver label ([#2258])
++7. Remove old non-builder snapshot FFI functions ([#2318])
++8. Remove the catalog-managed feature flag ([#2310])
++9. Upgrade to arrow-58 and object_store-13, drop arrow-56 support ([#2116])
++
++### Other
++
++[#2097]: https://github.com/delta-io/delta-kernel-rs/pull/2097
++[#2099]: https://github.com/delta-io/delta-kernel-rs/pull/2099
++[#2126]: https://github.com/delta-io/delta-kernel-rs/pull/2126
++[#2115]: https://github.com/delta-io/delta-kernel-rs/pull/2115
++[#1866]: https://github.com/delta-io/delta-kernel-rs/pull/1866
++[#2044]: https://github.com/delta-io/delta-kernel-rs/pull/2044
++[#1942]: https://github.com/delta-io/delta-kernel-rs/pull/1942
++[#2072]: https://github.com/delta-io/delta-kernel-rs/pull/2072
++[#2089]: https://github.com/delta-io/delta-kernel-rs/pull/2089
++[#2187]: https://github.com/delta-io/delta-kernel-rs/pull/2187
++[#2190]: https://github.com/delta-io/delta-kernel-rs/pull/2190
++[#1948]: https://github.com/delta-io/delta-kernel-rs/pull/1948
++[#2150]: https://github.com/delta-io/delta-kernel-rs/pull/2150
++[#2074]: https://github.com/delta-io/delta-kernel-rs/pull/2074
++[#2195]: https://github.com/delta-io/delta-kernel-rs/pull/2195
++[#2158]: https://github.com/delta-io/delta-kernel-rs/pull/2158
++[#2186]: https://github.com/delta-io/delta-kernel-rs/pull/2186
++[#2185]: https://github.com/delta-io/delta-kernel-rs/pull/2185
++[#2173]: https://github.com/delta-io/delta-kernel-rs/pull/2173
++[#2163]: https://github.com/delta-io/delta-kernel-rs/pull/2163
++[#2145]: https://github.com/delta-io/delta-kernel-rs/pull/2145
++[#2184]: https://github.com/delta-io/delta-kernel-rs/pull/2184
++[#2183]: https://github.com/delta-io/delta-kernel-rs/pull/2183
++[#2199]: https://github.com/delta-io/delta-kernel-rs/pull/2199
++[#2196]: https://github.com/delta-io/delta-kernel-rs/pull/2196
++[#2210]: https://github.com/delta-io/delta-kernel-rs/pull/2210
++[#2178]: https://github.com/delta-io/delta-kernel-rs/pull/2178
++[#2240]: https://github.com/delta-io/delta-kernel-rs/pull/2240
++[#2243]: https://github.com/delta-io/delta-kernel-rs/pull/2243
++[#2202]: https://github.com/delta-io/delta-kernel-rs/pull/2202
++[#2211]: https://github.com/delta-io/delta-kernel-rs/pull/2211
++[#2214]: https://github.com/delta-io/delta-kernel-rs/pull/2214
++[#2246]: https://github.com/delta-io/delta-kernel-rs/pull/2246
++[#2219]: https://github.com/delta-io/delta-kernel-rs/pull/2219
++[#2212]: https://github.com/delta-io/delta-kernel-rs/pull/2212
++[#2176]: https://github.com/delta-io/delta-kernel-rs/pull/2176
++[#2159]: https://github.com/delta-io/delta-kernel-rs/pull/2159
++[#2248]: https://github.com/delta-io/delta-kernel-rs/pull/2248
++[#2253]: https://github.com/delta-io/delta-kernel-rs/pull/2253
++[#2052]: https://github.com/delta-io/delta-kernel-rs/pull/2052
++[#2092]: https://github.com/delta-io/delta-kernel-rs/pull/2092
++[#2258]: https://github.com/delta-io/delta-kernel-rs/pull/2258
++[#2136]: https://github.com/delta-io/delta-kernel-rs/pull/2136
++[#2245]: https://github.com/delta-io/delta-kernel-rs/pull/2245
++[#2182]: https://github.com/delta-io/delta-kernel-rs/pull/2182
++[#2262]: https://github.com/delta-io/delta-kernel-rs/pull/2262
++[#2237]: https://github.com/delta-io/delta-kernel-rs/pull/2237
++[#2166]: https://github.com/delta-io/delta-kernel-rs/pull/2166
++[#2169]: https://github.com/delta-io/delta-kernel-rs/pull/2169
++[#2171]: https://github.com/delta-io/delta-kernel-rs/pull/2171
++[#2143]: https://github.com/delta-io/delta-kernel-rs/pull/2143
++[#2070]: https://github.com/delta-io/delta-kernel-rs/pull/2070
++[#2261]: https://github.com/delta-io/delta-kernel-rs/pull/2261
++[#2277]: https://github.com/delta-io/delta-kernel-rs/pull/2277
++[#2236]: https://github.com/delta-io/delta-kernel-rs/pull/2236
++[#2279]: https://github.com/delta-io/delta-kernel-rs/pull/2279
++[#2249]: https://github.com/delta-io/delta-kernel-rs/pull/2249
++[#2290]: https://github.com/delta-io/delta-kernel-rs/pull/2290
++[#2174]: https://github.com/delta-io/delta-kernel-rs/pull/2174
++[#2264]: https://github.com/delta-io/delta-kernel-rs/pull/2264
++[#2234]: https://github.com/delta-io/delta-kernel-rs/pull/2234
++[#2302]: https://github.com/delta-io/delta-kernel-rs/pull/2302
++[#2142]: https://github.com/delta-io/delta-kernel-rs/pull/2142
++[#2266]: https://github.com/delta-io/delta-kernel-rs/pull/2266
++[#2281]: https://github.com/delta-io/delta-kernel-rs/pull/2281
++[#2109]: https://github.com/delta-io/delta-kernel-rs/pull/2109
++[#2293]: https://github.com/delta-io/delta-kernel-rs/pull/2293
++[#2203]: https://github.com/delta-io/delta-kernel-rs/pull/2203
++[#2247]: https://github.com/delta-io/delta-kernel-rs/pull/2247
++[#2160]: https://github.com/delta-io/delta-kernel-rs/pull/2160
++[#2314]: https://github.com/delta-io/delta-kernel-rs/pull/2314
++[#2270]: https://github.com/delta-io/delta-kernel-rs/pull/2270
++[#2255]: https://github.com/delta-io/delta-kernel-rs/pull/2255
++[#2250]: https://github.com/delta-io/delta-kernel-rs/pull/2250
++[#2254]: https://github.com/delta-io/delta-kernel-rs/pull/2254
++[#2307]: https://github.com/delta-io/delta-kernel-rs/pull/2307
++[#2170]: https://github.com/delta-io/delta-kernel-rs/pull/2170
++[#2235]: https://github.com/delta-io/delta-kernel-rs/pull/2235
++[#2274]: https://github.com/delta-io/delta-kernel-rs/pull/2274
++[#1774]: https://github.com/delta-io/delta-kernel-rs/pull/1774
++[#2296]: https://github.com/delta-io/delta-kernel-rs/pull/2296
++[#2318]: https://github.com/delta-io/delta-kernel-rs/pull/2318
++[#2310]: https://github.com/delta-io/delta-kernel-rs/pull/2310
++[#2297]: https://github.com/delta-io/delta-kernel-rs/pull/2297
++[#2324]: https://github.com/delta-io/delta-kernel-rs/pull/2324
++[#2260]: https://github.com/delta-io/delta-kernel-rs/pull/2260
++[#2327]: https://github.com/delta-io/delta-kernel-rs/pull/2327
++[#2319]: https://github.com/delta-io/delta-kernel-rs/pull/2319
++[#2116]: https://github.com/delta-io/delta-kernel-rs/pull/2116
++[#2349]: https://github.com/delta-io/delta-kernel-rs/pull/2349
++[#2336]: https://github.com/delta-io/delta-kernel-rs/pull/2336
++
++
+ ## [v0.20.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.20.0/) (2026-02-26)
+ 
+ [Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.19.2...v0.20.0)
+ 22. Implement schema diffing for flat schemas (2/5]) ([#1478])
+ 23. Add API on Scan to perform 2-phase log replay  ([#1547])
+ 24. Enable distributed log replay serde serialization for serializable scan state ([#1549])
+-25. Add InCommitTimestamp support to ChangeDataFeed ([#1670]) 
++25. Add InCommitTimestamp support to ChangeDataFeed ([#1670])
+ 26. Add include_stats_columns API and output_stats_schema field ([#1728])
+ 27. Add write support for clustered tables behind feature flag ([#1704])
+ 28. Add snapshot load instrumentation ([#1750])
\ No newline at end of file
CLAUDE.md
@@ -0,0 +1,59 @@
+diff --git a/CLAUDE.md b/CLAUDE.md
+--- a/CLAUDE.md
++++ b/CLAUDE.md
+ (`Snapshot`, `Scan`, `Transaction`) and delegates _how_ to the `Engine` trait.
+ 
+ Current capabilities: table reads with predicates, data skipping, deletion vectors, change
+-data feed, checkpoints (V1 & V2), log compaction, blind append writes, table creation
++data feed, checkpoints (V1 & V2), log compaction (disabled, #2337), blind append writes, table creation
+ (including clustered tables), and catalog-managed table support.
+ 
+ ## Build & Test Commands
+   but default-engine does.
+ - `arrow-conversion`, `arrow-expression` -- Arrow interop (auto-enabled by default engine)
+ - `prettyprint` -- enables Arrow pretty-print helpers (primarily test/example oriented)
+-- `catalog-managed` -- catalog-managed table support (experimental)
+ - `clustered-table` -- clustered table write support (experimental)
+ - `internal-api` -- unstable APIs like `parallel_scan_metadata`. Items are marked with the
+   `#[internal_api]` proc macro attribute.
+ `execute()` (simple), `scan_metadata()` (advanced/distributed),
+ `parallel_scan_metadata()` (two-phase distributed log replay).
+ 
+-**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`,
+-assembles commit actions, enforces protocol compliance, delegates atomic commit to a
+-`Committer`.
++**Write path:** `Snapshot` -> `Transaction` -> `commit()`. Kernel provides `WriteContext`
++(via `partitioned_write_context` or `unpartitioned_write_context`), assembles commit
++actions, enforces protocol compliance, delegates atomic commit to a `Committer`.
+ 
+ **Engine trait:** five handlers (`StorageHandler`, `JsonHandler`, `ParquetHandler`,
+ `EvaluationHandler`, optional `MetricsReporter`). `DefaultEngine` lives in
+   or inputs. Prefer `#[case]` over duplicating test functions. When parameters are
+   independent and form a cartesian product, prefer `#[values]` over enumerating
+   every combination with `#[case]`.
++- Actively look for rstest consolidation opportunities: when writing multiple tests
++  that share the same setup/flow and differ only in configuration and expected
++  outcome, write one parameterized rstest instead of separate functions. Also check
++  whether a new test duplicates the flow of an existing nearby test and should be
++  merged into it as a new `#[case]`. A common pattern is toggling a feature (e.g.
++  column mapping on/off) and asserting success vs. error.
+ - Reuse helpers from `test_utils` instead of writing custom ones when possible.
++- **Prefer snapshot/public API assertions over reading raw commit JSON.** Only read raw
++  commit JSON when the data is inaccessible via public API (e.g., system domain metadata
++  is blocked by `get_domain_metadata`). For commit JSON reads, use `read_actions_from_commit`
++  from `test_utils` -- do NOT write local helpers that duplicate this.
+ - **`add_commit` and table setup in tests:** `add_commit` takes a `table_root` string and
+   resolves it to an absolute object-store path. The `table_root` must be a proper URL string
+   with a trailing slash (e.g. `"memory:///"`, `"file:///tmp/my_table/"`). Avoid using the
+   `allowColumnDefaults`, `changeDataFeed`, `identityColumns`, `rowTracking`,
+   `domainMetadata`, `icebergCompatV1`, `icebergCompatV2`, `clustering`,
+   `inCommitTimestamp`
+-- Reader + writer: `columnMapping`, `deletionVectors`, `timestampNtz`,
+-  `v2Checkpoint`, `vacuumProtocolCheck`, `variantType`, `variantType-preview`,
+-  `typeWidening`
++- Reader + writer: `catalogManaged`, `catalogOwned-preview`, `columnMapping`,
++  `deletionVectors`, `timestampNtz`, `v2Checkpoint`, `vacuumProtocolCheck`,
++  `variantType`, `variantType-preview`, `typeWidening`
+ 
+ Keep this list updated when new protocol features are added to kernel.
+ 
\ No newline at end of file
CLAUDE/architecture.md
@@ -0,0 +1,48 @@
+diff --git a/CLAUDE/architecture.md b/CLAUDE/architecture.md
+--- a/CLAUDE/architecture.md
++++ b/CLAUDE/architecture.md
+ 
+ Built via `Snapshot::builder_for(url).build(engine)` (latest version) or
+ `.at_version(v).build(engine)` (specific version). For catalog-managed tables,
+-`.with_log_tail(commits)` supplies recent unpublished commits from the catalog.
++`.with_log_tail(commits)` supplies recent unpublished commits from the catalog and
++`.with_max_catalog_version(v)` caps the snapshot at the latest catalog-ratified version.
+ 
+ **Snapshot loading internals:**
+ 1. **LogSegment** (`kernel/src/log_segment/`) -- discovers commits + checkpoints for the
+ 
+ `Snapshot` -> `Transaction` -> commit
+ 
+-The kernel coordinates the write transaction: it provides the write context (target directory,
+-physical schema, stats columns), assembles commit actions (CommitInfo, Add files), enforces
+-protocol compliance (table features, schema validation), and delegates the atomic commit to a
+-`Committer`.
++The kernel coordinates the write transaction: it provides the write context (validated partition
++values, recommended write directory, physical schema, stats columns), assembles commit actions
++(CommitInfo, Add files), enforces protocol compliance (table features, schema validation), and
++delegates the atomic commit to a `Committer`.
+ 
+ **Steps:**
+ 1. Create `Transaction` from a snapshot with a `Committer` (e.g. `FileSystemCommitter`)
+-2. Get `WriteContext` for target dir, physical schema, and stats columns
++2. Get `WriteContext` via `partitioned_write_context(values)` or `unpartitioned_write_context()`
+ 3. Write Parquet files (via engine), collect file metadata
+ 4. Register files via `txn.add_files(metadata)`
+ 5. Commit: returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`
+ - `kernel/src/snapshot/` -- `Snapshot`, `SnapshotBuilder`, entry point for reads/writes
+ - `kernel/src/scan/` -- `Scan`, `ScanBuilder`, log replay, data skipping
+ - `kernel/src/transaction/` -- `Transaction`, `WriteContext`, `create_table` builder
++- `kernel/src/partition/` -- partition value validation, serialization, Hive-style path encoding
+ - `kernel/src/committer/` -- `Committer` trait, `FileSystemCommitter`
+ - `kernel/src/log_segment/` -- log file discovery, Protocol/Metadata replay
+ - `kernel/src/log_replay.rs` -- file-action deduplication, `LogReplayProcessor` trait
+ 
+ Tables whose commits go through a catalog (e.g. Unity Catalog) instead of direct filesystem
+ writes. Kernel doesn't know about catalogs -- the catalog client provides a log tail via
+-`SnapshotBuilder::with_log_tail()` and a custom `Committer` for staging/ratifying/publishing
+-commits. Requires `catalog-managed` feature flag.
++`SnapshotBuilder::with_log_tail()`, caps the version via `with_max_catalog_version()`, and
++uses a custom `Committer` for staging/ratifying/publishing commits.
+ 
+ The `UCCommitter` (in the `delta-kernel-unity-catalog` crate) is the reference implementation of a catalog
+ committer for Unity Catalog. It stages commits to `_staged_commits/`, calls the UC commit API to
\ No newline at end of file

... (truncated, output exceeded 60000 bytes)

Reproduce locally: git range-diff ed6b22f..b293bf7 eae37a7..16b2bc6 | Disable: git config gitstack.push-range-diff false
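
For readers skimming the range-diff above: the `CLAUDE/architecture.md` hunk says a commit returns `CommittedTransaction`, `ConflictedTransaction`, or `RetryableTransaction`. Below is a minimal sketch of how a caller might branch on those three outcomes. It assumes hypothetical stand-in types (`CommitOutcome`, `commit_with_retry`) that only mirror the names in the diff and are not the `delta_kernel` API.

```rust
/// Hypothetical stand-in for the three commit outcomes named in the diff.
/// These are NOT the delta_kernel types; they only mirror the names.
#[derive(Debug, PartialEq)]
enum CommitOutcome {
    Committed { version: u64 },
    Conflicted { winning_version: u64 },
    Retryable,
}

/// Calls `attempt` until it commits, conflicts, or the retry budget runs out.
/// `attempt` receives the 0-based attempt number.
fn commit_with_retry<F>(max_retries: u32, mut attempt: F) -> CommitOutcome
where
    F: FnMut(u32) -> CommitOutcome,
{
    let mut n = 0;
    loop {
        match attempt(n) {
            // Transient failure: try again while budget remains.
            CommitOutcome::Retryable if n < max_retries => n += 1,
            // Committed, Conflicted, or Retryable with no budget left.
            outcome => return outcome,
        }
    }
}
```

In this sketch a conflict is handed back to the caller instead of being retried, on the assumption that resolving it means rebuilding the transaction against a newer snapshot.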

@scottsand-db scottsand-db requested a review from DrakeLin April 15, 2026 15:57
Comment thread kernel/src/transaction/mod.rs Outdated
Collaborator

@scottsand-db scottsand-db left a comment

LGTM! (1 nit on comment style)

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from f6617cc to 4e1a8b3 Compare April 15, 2026 16:30
Comment thread kernel/src/table_configuration.rs Outdated
Comment thread kernel/src/log_segment.rs Outdated
Comment thread kernel/src/transaction/mod.rs
Comment thread kernel/src/transaction/mod.rs
Comment thread kernel/src/transaction/mod.rs Outdated
Collaborator

@sanujbasu sanujbasu left a comment

I have no blocking comments. Please address the comments before merge

@sanujbasu sanujbasu removed the request for review from DrakeLin April 15, 2026 17:26
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch 5 times, most recently from 15c4b27 to c8ecda0 Compare April 17, 2026 01:55
@github-actions github-actions Bot added the breaking-change Public API change that could cause downstream compilation failures. Requires a major version bump. label Apr 17, 2026
@codecov

codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 85.60606% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.32%. Comparing base (76eebdb) to head (029d667).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/transaction/mod.rs | 78.08% | 7 Missing and 9 partials ⚠️ |
| kernel/src/snapshot/mod.rs | 94.73% | 1 Missing ⚠️ |
| kernel/src/transaction/builder/create_table.rs | 50.00% | 0 Missing and 1 partial ⚠️ |
| kernel/src/transaction/domain_metadata.rs | 66.66% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2385      +/-   ##
==========================================
- Coverage   88.33%   88.32%   -0.02%     
==========================================
  Files         171      171              
  Lines       56696    56727      +31     
  Branches    56696    56727      +31     
==========================================
+ Hits        50083    50103      +20     
- Misses       4699     4704       +5     
- Partials     1914     1920       +6     

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-1-refactor-state branch from c8ecda0 to 029d667 Compare April 17, 2026 21:15
@scottsand-db scottsand-db enabled auto-merge April 17, 2026 21:16
@github-actions

github-actions Bot commented Apr 17, 2026

PR title does not match the required pattern. Please ensure you follow the conventional commits spec.

Your title should start with feat:, fix:, chore:, docs:, perf:, refactor:, test:, or ci:, and if it's a breaking change that should be suffixed with a ! (like feat!:), and then a 1-72 character brief description of your change.

Title: refactor: separate read state from effective state in Transaction
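
The rule in the message above can be sketched as a small check. This is an assumption about how the bot behaves, not its actual implementation: `title_ok` and its `breaking` parameter are hypothetical, and requiring `!` when the `breaking-change` label is present is inferred from that label having been added shortly before this comment.

```rust
/// Commit types accepted by the check, as listed in the bot message.
const TYPES: [&str; 8] = [
    "feat", "fix", "chore", "docs", "perf", "refactor", "test", "ci",
];

/// Returns true when `title` looks like `<type>[!]: <description>` with a
/// 1-72 character description. When `breaking` is true the `!` is required
/// (assumption: this is how the breaking-change label interacts with the check).
fn title_ok(title: &str, breaking: bool) -> bool {
    // Split at the first ": " into the type prefix and the description.
    let Some((prefix, desc)) = title.split_once(": ") else {
        return false;
    };
    // A trailing `!` on the prefix marks a breaking change.
    let (kind, bang) = match prefix.strip_suffix('!') {
        Some(k) => (k, true),
        None => (prefix, false),
    };
    let len = desc.chars().count();
    TYPES.contains(&kind) && (1..=72).contains(&len) && (bang || !breaking)
}
```

Under that assumption, the original title (`refactor: ...`) fails once the PR is treated as breaking, and the retitled `refactor!: ...` passes, which is consistent with the rename recorded in this conversation.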

@william-ch-databricks william-ch-databricks changed the title refactor: separate read state from effective state in Transaction refactor!: separate read state from effective state in Transaction Apr 17, 2026
@scottsand-db scottsand-db added this pull request to the merge queue Apr 17, 2026
Merged via the queue into delta-io:main with commit 9decd69 Apr 17, 2026
19 of 25 checks passed
rtyler added a commit to buoyant-data/delta-kernel-rs that referenced this pull request Apr 18, 2026
This API was removed in the upstream delta-io#2385 but
this is a load-bearing API for delta-rs. The removal was not related to
the change and seems like an erroneous removal that wasn't caught in
review.

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
rtyler added a commit to buoyant-data/delta-kernel-rs that referenced this pull request Apr 19, 2026
This API was removed in the upstream delta-io#2385 but
this is a load-bearing API for delta-rs. The removal was not related to
the change and seems like an erroneous removal that wasn't caught in
review.

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
