
Commit a35cec6

feat: enable OAuth2-based access to services from notebooks

1 parent 3e38804 · commit a35cec6

6 files changed: 150 additions & 362 deletions

README.md

Lines changed: 5 additions & 4 deletions

````diff
@@ -116,20 +116,21 @@ At deployment time, the Helm chart:
 # Known issues
 1. [Polaris - Spark Iceberg REST Catalog refresh token](https://github.com/apache/iceberg/issues/12363)
 > Long-running jobs may need more metadata calls to Polaris during execution, not just one initial call
-2. [Trino - Issue with Vended Credential Renewal with Iceberg REST Catalog](https://github.com/trinodb/trino/issues/25827)
+2. [Polaris - OAuth 2 grant type "refresh_token" not implemented](https://github.com/apache/iceberg/issues/12196)
+3. [Trino - Issue with Vended Credential Renewal with Iceberg REST Catalog](https://github.com/trinodb/trino/issues/25827)
 > Reported upstream: with `iceberg.rest-catalog.vended-credentials-enabled=true`, long-running queries may fail once the STS token expires because Trino appears not to refresh vended credentials from the Iceberg REST catalog `/credentials` endpoint.
 >
 > A fix has been proposed in [PR #28792](https://github.com/trinodb/trino/pull/28792), but it is still under review, so this behavior should be validated in our environment.
-3. [Trino - Extra credential support for user token passthrough](https://github.com/trinodb/trino/issues/27197)
+4. [Trino - Extra credential support for user token passthrough](https://github.com/trinodb/trino/issues/27197)
 > Requests support for passing per-user OAuth tokens/credentials to the Iceberg REST catalog
-4. [Trino - Include oauth user in the request to the iceberg REST catalog](https://github.com/trinodb/trino/issues/26320)
+5. [Trino - Include oauth user in the request to the iceberg REST catalog](https://github.com/trinodb/trino/issues/26320)
 > [Starburst supports OAuth 2.0 token pass-through for the Iceberg REST catalog](https://docs.starburst.io/latest/object-storage/metastores.html#oauth-2-0-token-pass-through), which forwards the delegated OAuth token from the coordinator to the catalog:
 >
 > ```properties
 > http-server.authentication.type=DELEGATED-OAUTH2
 > iceberg.rest-catalog.security=OAUTH2_PASSTHROUGH
 > ```
-5. [STS assume role fails with credentials (from Lakekeeper) due to incomplete STS implementation](https://github.com/seaweedfs/seaweedfs/discussions/8312)
+6. [STS assume role fails with credentials (from Lakekeeper) due to incomplete STS implementation](https://github.com/seaweedfs/seaweedfs/discussions/8312)
 > The discussion initially points to a possible SeaweedFS STS compatibility issue, but the later reproducer narrows the failure to Lakekeeper's scoped session policy: multipart writes fail when the policy omits the required multipart S3 permissions.
 >
 > It demonstrates that multipart upload can fail if the scoped session policy does not include multipart actions such as:
````
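The concrete action list is cut off by the hunk boundary above, so the exact actions in the reproducer are not shown here. As an illustrative (not authoritative) sketch, a scoped session policy that permits S3 multipart writes typically needs, beyond `s3:PutObject`, the multipart-specific actions below; the bucket name is hypothetical:

```python
# Illustrative scoped session policy (a Python dict in AWS IAM JSON shape).
# The exact actions Lakekeeper must include are the subject of the linked
# discussion; this list reflects standard S3 multipart-upload permissions.
session_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",                   # covers UploadPart/CompleteMultipartUpload
                "s3:AbortMultipartUpload",        # clean up failed uploads
                "s3:ListMultipartUploadParts",    # list parts of an in-progress upload
                "s3:ListBucketMultipartUploads",  # list in-progress uploads on the bucket
            ],
            "Resource": [
                "arn:aws:s3:::example-warehouse",
                "arn:aws:s3:::example-warehouse/*",
            ],
        }
    ],
}
```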

notebooks/bronze/mobility/nyc_trip/02_trino_python_explore_yellow_tripdata.ipynb

Lines changed: 17 additions & 7 deletions

```diff
@@ -37,6 +37,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"from trino.auth import JWTAuthentication\n",
 "from sqlalchemy import create_engine\n",
 "import pandas as pd\n",
 "import altair as alt\n",
@@ -53,11 +54,15 @@
 "\n",
 "We connect to the Trino cluster using SQLAlchemy.\n",
 "\n",
+"The `me()` helper returns the current authenticated notebook user.<br>\n",
+"The `access_token()` helper retrieves a fresh access token from JupyterHub auth state for the current session.\n",
+"\n",
 "Connection details:\n",
-"- **User:** `trino` \n",
-"- **Host:** `trino-default.okdp.sandbox` \n",
-"- **Catalog:** `lakehouse` \n",
-"- **Schema:** `nyc_tripdata` \n",
+"- **User:** `me()`\n",
+"- **Host:** `trino-default.okdp.sandbox`\n",
+"- **Catalog:** `bronze`\n",
+"- **Schema:** `nyc_tlc`\n",
+"- **Authentication:** `JWTAuthentication(access_token())`\n",
 "- **Protocol:** HTTPS with `verify=False` (disabled cert verification)"
 ]
 },
@@ -69,9 +74,14 @@
 "outputs": [],
 "source": [
 "engine = create_engine(\n",
-" \"trino://trino@trino-default.okdp.sandbox/bronze/nyc_tlc\",\n",
-" connect_args={\"http_scheme\": \"https\", \"verify\": False}\n",
+" f\"trino://{me()}@trino-default.okdp.sandbox/bronze/nyc_tlc\",\n",
+" connect_args={\n",
+" \"http_scheme\": \"https\",\n",
+" \"verify\": False,\n",
+" \"auth\": JWTAuthentication(access_token()),\n",
+" },\n",
 ")\n",
+"\n",
 "engine"
 ]
 },
@@ -401,7 +411,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.12"
+"version": "3.11.15"
 }
 },
 "nbformat": 4,
```
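The `me()` and `access_token()` helpers referenced in this notebook are provided by the notebook environment; their implementation is not part of the commit. A rough, hypothetical sketch of what such helpers could look like (`JUPYTERHUB_USER` is a variable JupyterHub actually sets for single-user servers; `OKDP_ACCESS_TOKEN` is an invented stand-in for the real auth-state lookup):

```python
import os

def me() -> str:
    """Current authenticated notebook user (sketch).

    JupyterHub exports JUPYTERHUB_USER into single-user notebook servers.
    """
    return os.environ.get("JUPYTERHUB_USER", "anonymous")

def access_token() -> str:
    """Bearer token for the current session (sketch).

    The real helper fetches a fresh OAuth2 access token from JupyterHub
    auth state; OKDP_ACCESS_TOKEN is a hypothetical stand-in source.
    """
    token = os.environ.get("OKDP_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("no access token available for this session")
    return token
```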

notebooks/bronze/mobility/nyc_trip/03_trino_sql_explore_yellow_tripdata.ipynb

Lines changed: 22 additions & 3 deletions

```diff
@@ -34,6 +34,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"from trino.auth import JWTAuthentication\n",
+"from sqlalchemy import create_engine\n",
 "import pandas as pd\n",
 "import altair as alt\n",
 "%load_ext sql\n",
@@ -48,7 +50,13 @@
 "source": [
 "## 🔌 2. Connect to Trino using SQL Magic\n",
 "\n",
-"We connect to the Trino cluster using a SQLAlchemy-compliant connection URI through SQL Magic."
+"We connect to the Trino cluster using a pre-created SQLAlchemy engine with SQL Magic.\n",
+"\n",
+"The `me()` helper returns the current authenticated notebook user.<br>\n",
+"The `access_token()` helper retrieves a fresh access token from JupyterHub auth state for the current session.\n",
+"\n",
+"Instead of passing a raw connection URL directly to SQL Magic, we first create a SQLAlchemy engine with `JWTAuthentication(access_token())`, then register that engine with `%sql engine`.\n",
+"This keeps authentication consistent with the standard SQLAlchemy connection pattern and avoids embedding a static token in the SQL Magic connection string."
 ]
 },
 {
@@ -58,7 +66,18 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"%sql trino://trino@trino-default.okdp.sandbox:443/bronze/nyc_tlc?http_scheme=https&verify=false"
+"engine = create_engine(\n",
+" f\"trino://{me()}@trino-default.okdp.sandbox/bronze/nyc_tlc\",\n",
+" connect_args={\n",
+" \"http_scheme\": \"https\",\n",
+" \"verify\": False,\n",
+" \"auth\": JWTAuthentication(access_token()),\n",
+" },\n",
+")\n",
+"\n",
+"engine\n",
+"\n",
+"%sql engine"
 ]
 },
 {
@@ -549,7 +568,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.12"
+"version": "3.11.15"
 }
 },
 "nbformat": 4,
```
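The rationale for the engine-based pattern can be shown without a live Trino cluster: the connection URL carries identity and routing only, while the bearer token travels out-of-band via `connect_args`. A minimal sketch (the helpers are faked as plain values; host, catalog, and schema names are taken from the diff):

```python
def trino_url(user: str,
              host: str = "trino-default.okdp.sandbox",
              catalog: str = "bronze",
              schema: str = "nyc_tlc") -> str:
    """SQLAlchemy-style Trino URL: identity and routing, no secrets."""
    return f"trino://{user}@{host}/{catalog}/{schema}"

def trino_connect_args(token: str) -> dict:
    """Out-of-URL connection arguments carrying the bearer token.

    In the notebook, the token is wrapped as JWTAuthentication(token);
    here it is kept as a plain string for illustration.
    """
    return {"http_scheme": "https", "verify": False, "auth": token}

url = trino_url("alice")
args = trino_connect_args("secret-token")
```

Because the token never enters the URL, it does not leak into SQL Magic connection strings or notebook history.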

notebooks/gold/mobility/nyc_trip/01_build_monthly_stats.ipynb

Lines changed: 31 additions & 5 deletions

```diff
@@ -129,9 +129,31 @@
 "source": [
 "## Start Spark\n",
 "\n",
-"This notebook only sets notebook-local values that are safe to make visible.\n",
+"Configuration choices used here:\n",
 "\n",
-"Catalog wiring, Iceberg extensions, REST catalog endpoints, and object-storage access are assumed to be injected outside the notebook.\n"
+"- `org.apache.iceberg.spark.SparkCatalog` is used for Iceberg catalog integration in Spark.\n",
+"- Iceberg REST catalog settings are already preconfigured to point to Apache Polaris.\n",
+"- `header.X-Iceberg-Access-Delegation=vended-credentials` is enabled so Polaris can delegate storage access for Iceberg-managed tables.\n",
+"- Global `s3a` settings are already configured for Bronze raw Parquet reads.\n",
+"- Iceberg Spark SQL extensions are enabled explicitly.\n",
+"\n",
+"Spark catalogs are configured under `spark.sql.catalog.<catalog_name>`.\n",
+"\n",
+"For authentication, this example uses the `access_token()` notebook helper by default.\n",
+"This helper retrieves the current fresh user access token from JupyterHub auth state and passes it to Polaris as a bearer token.\n",
+"\n",
+"The lines below are commented out on purpose:\n",
+"\n",
+"`# .config(f\"spark.sql.catalog.{SILVER_CATALOG}.credential\", polaris_oauth2_credential)`<br>\n",
+"`# .config(f\"spark.sql.catalog.{GOLD_CATALOG}.credential\", polaris_oauth2_credential)`\n",
+"\n",
+"If needed, you can uncomment them to switch from end-user token authentication to a static OAuth2 client credential (`client_id:client_secret`) for service-to-service or ETL-style execution.\n",
+"\n",
+"The following line is enabled by default:\n",
+"\n",
+"`.config(f\"spark.sql.catalog.{CATALOG_NAME}.token\", access_token())`\n",
+"\n",
+"This means Spark connects to Polaris with the current authenticated notebook user token instead of a fixed technical client.\n"
 ]
 },
 {
@@ -141,13 +163,17 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"polaris_credential = \"my-polaris-spark-etl-app:mySparkAppSecret\"\n",
+"polaris_oauth2_credential = \"my-polaris-spark-etl-app:mySparkAppSecret\"\n",
 "\n",
 "spark = (\n",
 " SparkSession.builder\n",
 " .appName(\"NYC Tripdata - Gold - Monthly Stats - PySpark\")\n",
-" .config(f\"spark.sql.catalog.{SILVER_CATALOG}.credential\", polaris_credential)\n",
-" .config(f\"spark.sql.catalog.{GOLD_CATALOG}.credential\", polaris_credential)\n",
+" #.config(f\"spark.sql.catalog.{SILVER_CATALOG}.credential\", polaris_oauth2_credential)\n",
+" #.config(f\"spark.sql.catalog.{GOLD_CATALOG}.credential\", polaris_oauth2_credential)\n",
+" .config(f\"spark.sql.catalog.{SILVER_CATALOG}.token\", access_token())\n",
+" .config(f\"spark.sql.catalog.{GOLD_CATALOG}.token\", access_token())\n",
+" .config(f\"spark.sql.catalog.{SILVER_CATALOG}.token-refresh-enabled\", \"false\")\n",
+" .config(f\"spark.sql.catalog.{GOLD_CATALOG}.token-refresh-enabled\", \"false\")\n",
 " .config(\"spark.executor.memory\", \"2g\")\n",
 " .config(\"spark.executor.memoryOverhead\", \"640m\")\n",
 " .config(\"spark.executor.cores\", 1)\n",
```
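The token-based catalog wiring added in this hunk can be summarized as a small conf-building sketch. The keys (`token`, `token-refresh-enabled`, the `SparkCatalog` implementation) are the ones set in the diff; the catalog names and token value are illustrative:

```python
def polaris_catalog_conf(catalog: str, token: str) -> dict:
    """Spark conf entries for an Iceberg REST catalog authenticated with a
    per-user bearer token instead of a static client credential."""
    return {
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        # Bearer token passed to Polaris for this session's user.
        f"spark.sql.catalog.{catalog}.token": token,
        # The notebook helper supplies fresh tokens; disable client-side refresh.
        f"spark.sql.catalog.{catalog}.token-refresh-enabled": "false",
    }

# Build the combined conf for both catalogs used in the notebook.
conf = {}
for cat in ("silver", "gold"):
    conf.update(polaris_catalog_conf(cat, "example-user-token"))
```

Swapping `token` for `credential` (a `client_id:client_secret` pair) switches back to the fixed-client, ETL-style authentication that the commented lines in the diff preserve as an option.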
