Commit ff87919

fix(startup): select Delta Maven artifact by Spark runtime version
* Remove hardcoded Delta package/version from the startup path.
* Default DELTA_SPARK_VERSION to 4.1.0 while allowing an environment override.
* Detect the Spark major.minor version at runtime and choose the artifact:
  * 4.1 -> delta-spark_4.1_2.13
  * 4.0 -> delta-spark_4.0_2.13
  * fallback -> delta-spark_2.13
* Keep existing Spark session configs unchanged.
* Update README with Spark-specific artifact guidance.
* Add integration tests to validate startup artifact selection for Spark 4.1 and 4.0.

Signed-off-by: Rajesh Jain <73859950+rjain21@users.noreply.github.com>
1 parent 4444fb9

3 files changed: +53 additions, −6 deletions

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -82,7 +82,7 @@ Once the image has been built or you have downloaded the correct image, you can
 In the following instructions, the variable `${DELTA_PACKAGE_VERSION}` refers to the Delta Lake Package version.
 
-The current version is `delta-spark_2.13:4.0.0` which corresponds to Apache Spark 4.x release line.
+For Spark 4.x, use Spark-version-specific Delta artifacts: `delta-spark_4.1_2.13:<version>` for Spark 4.1 and `delta-spark_4.0_2.13:<version>` for Spark 4.0 (use `delta-spark_2.13:<version>` for older Spark lines).
 
 ## Choose an Interface
```
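Concretely, the README guidance amounts to building a full Maven coordinate from the artifact and the Delta release. A minimal sketch — both the Spark line (4.1) and the Delta version (4.1.0) here are illustrative:

```shell
# Assumption: running Spark 4.1 with Delta 4.1.0; both numbers are illustrative.
# Pick the artifact matching your Spark line, then append the Delta version.
export DELTA_PACKAGE_VERSION="delta-spark_4.1_2.13:4.1.0"

# The full coordinate that gets passed to pyspark via --packages:
echo "io.delta:${DELTA_PACKAGE_VERSION}"   # -> io.delta:delta-spark_4.1_2.13:4.1.0
```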

startup.sh

Lines changed: 24 additions & 2 deletions
```diff
@@ -4,8 +4,30 @@ source "$HOME/.cargo/env"
 
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS='lab --ip=0.0.0.0'
-export DELTA_SPARK_VERSION='4.0.1'
-export DELTA_PACKAGE_VERSION=delta-spark_2.13:${DELTA_SPARK_VERSION}
+
+# Default Delta version; can be overridden by setting DELTA_SPARK_VERSION in the environment
+: "${DELTA_SPARK_VERSION:=4.1.0}"
+
+# Detect the Spark major.minor version from the running runtime (e.g. "4.1")
+SPARK_FULL_VERSION=$("${SPARK_HOME}/bin/spark-submit" --version 2>&1 \
+  | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -1)
+SPARK_MAJOR_MINOR=$(echo "${SPARK_FULL_VERSION}" | cut -d. -f1,2)
+
+# Select the Delta Maven artifact that matches this Spark version.
+# Spark 4.1 and 4.0 each publish a Spark-specific artifact; older releases use the generic one.
+case "${SPARK_MAJOR_MINOR}" in
+  4.1)
+    DELTA_ARTIFACT="delta-spark_4.1_2.13"
+    ;;
+  4.0)
+    DELTA_ARTIFACT="delta-spark_4.0_2.13"
+    ;;
+  *)
+    DELTA_ARTIFACT="delta-spark_2.13"
+    ;;
+esac
+
+export DELTA_PACKAGE_VERSION="${DELTA_ARTIFACT}:${DELTA_SPARK_VERSION}"
 
 $SPARK_HOME/bin/pyspark --packages io.delta:${DELTA_PACKAGE_VERSION} \
   --conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp -Dio.netty.tryReflectionSetAccessible=true" \
```
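The version-to-artifact mapping in startup.sh can be exercised in isolation. A minimal sketch — `delta_artifact_for` is a hypothetical helper name wrapping the same case statement, not part of the actual script:

```shell
# Hypothetical helper mirroring the case statement in startup.sh;
# delta_artifact_for is an illustrative name, not part of the script.
delta_artifact_for() {
  case "$1" in
    4.1) echo "delta-spark_4.1_2.13" ;;
    4.0) echo "delta-spark_4.0_2.13" ;;
    *)   echo "delta-spark_2.13" ;;   # generic artifact for older Spark lines
  esac
}

delta_artifact_for 4.1   # -> delta-spark_4.1_2.13
delta_artifact_for 4.0   # -> delta-spark_4.0_2.13
delta_artifact_for 3.5   # -> delta-spark_2.13
```

Note that only the major.minor pair is matched, so patch releases (4.1.0, 4.1.1, …) all resolve to the same artifact.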

tests/test_docker.sh

Lines changed: 28 additions & 3 deletions
```diff
@@ -172,15 +172,40 @@ run_test "spark-submit is on PATH" "spark-submit --version"
 run_test "pyspark is on PATH" "pyspark --version"
 
 # ---------------------------------------------------------------
-# 6. Rust toolchain
+# 6. startup.sh artifact resolution
+# ---------------------------------------------------------------
+
+section "startup.sh artifact resolution"
+run_test_verbose "startup.sh selects Spark 4.1 Delta artifact" \
+  "set -euo pipefail
+   mkdir -p /tmp/mock-spark/bin
+   printf '%s\n' '#!/usr/bin/env bash' 'echo \"Spark version 4.1.1\" >&2' > /tmp/mock-spark/bin/spark-submit
+   printf '%s\n' '#!/usr/bin/env bash' 'echo \"\$*\"' > /tmp/mock-spark/bin/pyspark
+   chmod +x /tmp/mock-spark/bin/spark-submit /tmp/mock-spark/bin/pyspark
+   startup_output=\$(SPARK_HOME=/tmp/mock-spark DELTA_SPARK_VERSION=4.1.0 bash startup.sh 2>&1)
+   echo \"\$startup_output\"
+   [[ \"\$startup_output\" == *\"--packages io.delta:delta-spark_4.1_2.13:4.1.0\"* ]]"
+
+run_test_verbose "startup.sh selects Spark 4.0 Delta artifact" \
+  "set -euo pipefail
+   mkdir -p /tmp/mock-spark/bin
+   printf '%s\n' '#!/usr/bin/env bash' 'echo \"Spark version 4.0.3\" >&2' > /tmp/mock-spark/bin/spark-submit
+   printf '%s\n' '#!/usr/bin/env bash' 'echo \"\$*\"' > /tmp/mock-spark/bin/pyspark
+   chmod +x /tmp/mock-spark/bin/spark-submit /tmp/mock-spark/bin/pyspark
+   startup_output=\$(SPARK_HOME=/tmp/mock-spark DELTA_SPARK_VERSION=4.1.0 bash startup.sh 2>&1)
+   echo \"\$startup_output\"
+   [[ \"\$startup_output\" == *\"--packages io.delta:delta-spark_4.0_2.13:4.1.0\"* ]]"
+
+# ---------------------------------------------------------------
+# 7. Rust toolchain
 # ---------------------------------------------------------------
 
 section "Rust Toolchain"
 run_test "rustc is available" 'source "$HOME/.cargo/env" && rustc --version'
 run_test "cargo is available" 'source "$HOME/.cargo/env" && cargo --version'
 
 # ---------------------------------------------------------------
-# 7. Functional: delta-rs (Python) write/read via Polars
+# 8. Functional: delta-rs (Python) write/read via Polars
 # ---------------------------------------------------------------
 
 section "Functional: delta-rs + Polars"
@@ -215,7 +240,7 @@ print('Polars Delta append OK')
 \""
 
 # ---------------------------------------------------------------
-# 8. Functional: deltalake Python API
+# 9. Functional: deltalake Python API
 # ---------------------------------------------------------------
 
 section "Functional: deltalake Python API"
```
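The mock technique the new tests rely on — stubbing `spark-submit` under a throwaway `SPARK_HOME` so version detection can run without a real Spark install — can be sketched standalone. The temp-dir path and the 4.1.1 version banner below are illustrative:

```shell
# Stub spark-submit in a throwaway SPARK_HOME; it prints a fixed version
# banner to stderr, as the real binary does. Paths/version are illustrative.
mock_home=$(mktemp -d)
mkdir -p "$mock_home/bin"
printf '%s\n' '#!/usr/bin/env bash' 'echo "Spark version 4.1.1" >&2' \
  > "$mock_home/bin/spark-submit"
chmod +x "$mock_home/bin/spark-submit"

# Version detection in the same style as startup.sh:
# capture stderr, grab the first x.y.z token, keep major.minor.
full=$("$mock_home/bin/spark-submit" --version 2>&1 \
  | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -1)
major_minor=$(echo "$full" | cut -d. -f1,2)
echo "$major_minor"   # -> 4.1
```

Because the script only reads the stub's output, the tests stay fast and deterministic regardless of which Spark build is installed in the image.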
