Commit c395200

Committed by Karthi Thyagarajan
Created separate v4 Remote Functions and corresponding changes
Parent: 4da6ae0

7 files changed, +558 / -41 lines

spannergeo-s2/README.md

Lines changed: 41 additions & 23 deletions
````diff
@@ -4,10 +4,12 @@
 
 This sample demonstrates how to perform geo-spatial indexing and querying on [Google Cloud Spanner](https://cloud.google.com/spanner) using the [S2 Geometry Library](https://s2geometry.io/). Spanner does not natively support spatial data types or spatial indexes, so we use S2 to encode geographic coordinates into indexable 64-bit cell IDs.
 
-The sample includes two approaches to querying:
+The sample exercises two dimensions of the design space:
 
-- **Client-side S2**: The application computes S2 coverings and binds cell ID ranges as query parameters.
-- **Remote UDFs**: Spanner calls Cloud Functions server-side to compute coverings and distances, so the query is self-contained SQL with no client-side S2 dependency.
+- **Schema design**: v3 (interleaved token index with multi-level cell IDs) vs. v4 (single leaf-level cell ID with range scans via a covering index)
+- **Computation approach**: Client-side S2 (application computes coverings and binds parameters) vs. Remote UDFs (Spanner calls Cloud Functions server-side for self-contained SQL)
+
+All four combinations are demonstrated across three query shapes (radius, bounding box, k-NN), for a total of twelve query types.
 
 ## Prerequisites
 
````

````diff
@@ -36,7 +38,14 @@ The sample includes two approaches to querying:
 3. **Run the demo:**
 
 ```bash
-mvn exec:java -Dexec.args="YOUR_PROJECT YOUR_INSTANCE YOUR_DATABASE"
+# Option 1: Use env vars (or put these in a .env file and source it)
+export SPANNER_PROJECT_ID=your-project
+export SPANNER_INSTANCE_ID=your-instance
+export SPANNER_DATABASE_ID=your-database
+mvn exec:java
+
+# Option 2: Pass as CLI args
+mvn exec:java -Dexec.args="your-project your-instance your-database"
 ```
 
 4. **(Optional) Deploy Remote UDFs** for the server-side S2 query demos:
````

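The "Run the demo" hunk above offers two ways to supply the Spanner coordinates: `SPANNER_*` environment variables or three positional `-Dexec.args`. The sample's entry point is not part of this diff, so the following is only a hypothetical sketch of how such a fallback could be resolved. The class `SpannerConfig`, its `resolve` method, and the rule that CLI arguments take precedence over environment variables are all assumptions.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not from the sample): resolve Spanner coordinates from
 * three positional CLI args, falling back to SPANNER_* environment variables.
 * Assumption: CLI args win when both are present.
 */
final class SpannerConfig {
  final String projectId;
  final String instanceId;
  final String databaseId;

  SpannerConfig(String projectId, String instanceId, String databaseId) {
    this.projectId = projectId;
    this.instanceId = instanceId;
    this.databaseId = databaseId;
  }

  static SpannerConfig resolve(String[] args, Map<String, String> env) {
    if (args.length >= 3) {
      // Option 2: mvn exec:java -Dexec.args="project instance database"
      return new SpannerConfig(args[0], args[1], args[2]);
    }
    // Option 1: exported (or sourced from .env) environment variables.
    String project = env.get("SPANNER_PROJECT_ID");
    String instance = env.get("SPANNER_INSTANCE_ID");
    String database = env.get("SPANNER_DATABASE_ID");
    if (project == null || instance == null || database == null) {
      throw new IllegalArgumentException(
          "Pass <project> <instance> <database> as arguments or set "
              + "SPANNER_PROJECT_ID, SPANNER_INSTANCE_ID and SPANNER_DATABASE_ID");
    }
    return new SpannerConfig(project, instance, database);
  }

  public static void main(String[] args) {
    SpannerConfig config = resolve(args, System.getenv());
    System.out.printf("projects/%s/instances/%s/databases/%s%n",
        config.projectId, config.instanceId, config.databaseId);
  }
}
```

Taking the environment as a `Map` parameter rather than calling `System.getenv()` inside `resolve` keeps the lookup testable without touching the real process environment.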
````diff
@@ -57,7 +66,7 @@ The sample includes two approaches to querying:
 
 ## Schema Design
 
-We walk through three progressively refined schema designs. The recommended pattern is **v3** (token index table).
+We walk through four progressively refined schema designs. v3 and v4 are both production-ready patterns with different tradeoffs.
 
 ### v1: Naive Lat/Lng Columns ([`schemas/v1_naive.sql`](schemas/v1_naive.sql))
 
````

````diff
@@ -67,9 +76,9 @@ Store raw coordinates with a composite index. Simple but inefficient for radius
 
 Add an S2 Cell ID column at a fixed level (e.g., level 16, ~150m cells). Better, but a fixed level means either too coarse or too fine for different query radii.
 
-### v3: Interleaved Token Index (Recommended) ([`schemas/v3_token_index.sql`](schemas/v3_token_index.sql))
+### v3: Interleaved Token Index ([`schemas/v3_token_index.sql`](schemas/v3_token_index.sql))
 
-The canonical pattern. Store multiple S2 tokens per location at varying cell levels in an interleaved child table. This balances precision vs. index size and supports queries at any radius. The production schema ([`infra/schema.sql`](infra/schema.sql)) uses this design.
+The canonical pattern. Store multiple S2 tokens per location at varying cell levels in an interleaved child table. This balances precision vs. index size and supports queries at any radius.
 
 ```sql
 CREATE TABLE PointOfInterest (
````

````diff
@@ -159,12 +168,12 @@ With Remote UDFs deployed, queries become self-contained SQL -- no client-side S
 
 ### Remote UDF Queries (v4)
 
-The v4 UDF queries use the same three Cloud Functions as v3 -- no new deployments needed. The key difference: covering cell IDs are converted to leaf-cell ranges using bitwise arithmetic directly in SQL (`C & (-C)` extracts the sentinel bit).
+The v4 UDF queries use dedicated covering UDFs (`geo.s2_covering_v4`, `geo.s2_covering_rect_v4`) backed by simpler Cloud Functions that return cells at any level the S2 coverer chooses -- no filtering to levels 12/14/16. The SQL converts each covering cell to a leaf-cell range using bitwise arithmetic (`C & (-C)` extracts the sentinel bit). The `geo.s2_distance` UDF is shared with v3.
 
-- **Radius search** ([`queries/v4_udf_query.sql`](queries/v4_udf_query.sql)) -- Combines `geo.s2_covering()` with inline bitwise range computation and `geo.s2_distance()` post-filter.
-- **Bounding box** and **k-NN** follow the same pattern with `geo.s2_covering_rect()` and appropriate post-filters.
+- **Radius search** ([`queries/v4_udf_query.sql`](queries/v4_udf_query.sql)) -- Combines `geo.s2_covering_v4()` with inline bitwise range computation and `geo.s2_distance()` post-filter.
+- **Bounding box** and **k-NN** follow the same pattern with `geo.s2_covering_rect_v4()` and appropriate post-filters.
 
-Here is the v4 UDF radius search query as an example. The client only provides `(lat, lng, radius)`:
+Here is the v3 UDF radius search query as an example. The client only provides `(lat, lng, radius)`:
 
 ```sql
 WITH candidates AS (
````

````diff
@@ -184,35 +193,41 @@ WHERE distance_meters <= @radiusMeters
 ORDER BY distance_meters;
 ```
 
-Three Remote UDFs power these queries:
+Five Remote UDFs power these queries:
 
 | UDF | Purpose |
 |-----|---------|
-| `geo.s2_covering(lat, lng, radius)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a search circle |
-| `geo.s2_covering_rect(minLat, minLng, maxLat, maxLng)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a bounding box |
+| `geo.s2_covering(lat, lng, radius)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a search circle (v3, levels 12/14/16) |
+| `geo.s2_covering_v4(lat, lng, radius)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a search circle (v4, any level) |
+| `geo.s2_covering_rect(minLat, minLng, maxLat, maxLng)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a bounding box (v3, levels 12/14/16) |
+| `geo.s2_covering_rect_v4(minLat, minLng, maxLat, maxLng)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a bounding box (v4, any level) |
 | `geo.s2_distance(lat1, lng1, lat2, lng2)` | Returns great-circle distance in meters between two points |
 
 > **Note:** Remote UDFs must live in a named schema (Spanner does not allow them in the default schema). This sample uses the `geo` schema. Additionally, `UNNEST` of a Remote UDF result requires materializing the array in a subquery first -- `UNNEST(geo.s2_covering(...))` directly in `FROM` is not supported.
 
 ## Remote UDFs
 
-Remote UDFs push S2 logic into Spanner so queries don't require a client-side S2 library. Three Cloud Functions back the UDFs:
+Remote UDFs push S2 logic into Spanner so queries don't require a client-side S2 library. Five Cloud Functions back the UDFs:
 
 | Cloud Function | Entry Point | Spanner UDF |
 |----------------|-------------|-------------|
 | `s2-covering` | `S2CoveringFunction` | `geo.s2_covering()` |
+| `s2-covering-v4` | `S2CoveringV4Function` | `geo.s2_covering_v4()` |
 | `s2-covering-rect` | `S2CoveringRectFunction` | `geo.s2_covering_rect()` |
+| `s2-covering-rect-v4` | `S2CoveringRectV4Function` | `geo.s2_covering_rect_v4()` |
 | `s2-distance` | `S2DistanceFunction` | `geo.s2_distance()` |
 
 ### Cloud Function Implementation
 
 The Cloud Functions live in [`cloud-function/`](cloud-function/) as a separate Maven project:
 
-- [`S2CoveringFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringFunction.java) -- Computes S2 coverings for a circular region at levels 12, 14, 16. Returns cell IDs as **JSON strings** (not numbers) because S2 cell IDs exceed JSON's safe integer limit of 2^53.
-- [`S2CoveringRectFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringRectFunction.java) -- Computes S2 coverings for a rectangular region (bounding box) at levels 12, 14, 16. Same wire protocol and cell ID encoding as `S2CoveringFunction`.
+- [`S2CoveringFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringFunction.java) -- Computes S2 coverings for a circular region at levels 12, 14, 16 (v3). Returns cell IDs as **JSON strings** (not numbers) because S2 cell IDs exceed JSON's safe integer limit of 2^53.
+- [`S2CoveringV4Function.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringV4Function.java) -- Computes S2 coverings for a circular region at any level (v4). No level filtering -- returns all cells from the coverer for use with range scans.
+- [`S2CoveringRectFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringRectFunction.java) -- Computes S2 coverings for a rectangular region (bounding box) at levels 12, 14, 16 (v3). Same wire protocol and cell ID encoding as `S2CoveringFunction`.
+- [`S2CoveringRectV4Function.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringRectV4Function.java) -- Same for rectangular regions (v4). No level filtering -- returns all cells from the coverer.
 - [`S2DistanceFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2DistanceFunction.java) -- Computes great-circle distance using `S2LatLng.getDistance()`.
 
-All three implement the [Spanner Remote UDF wire protocol](https://cloud.google.com/spanner/docs/remote-functions):
+All five implement the [Spanner Remote UDF wire protocol](https://cloud.google.com/spanner/docs/remote-functions):
 - **Request:** `{"requestId": "...", "calls": [[args_row1], [args_row2], ...]}`
 - **Response:** `{"replies": [result1, result2, ...]}` (array length must match `calls`)
 
````

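The wire protocol in this hunk requires one reply per call, and the covering functions emit cell IDs as JSON strings because they exceed 2^53. The sketch below uses only the JDK (the sample itself uses Gson), and the class `RepliesJson` is hypothetical. It shows both points: a 64-bit cell ID does not survive a round trip through a `double` (what an IEEE-754 JSON number parser would produce), while the quoted decimal form is exact.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of the sample): builds a Spanner Remote UDF
 * reply body {"replies": [[...], ...]} with one entry per call, writing each
 * 64-bit S2 cell ID as a quoted decimal string.
 */
final class RepliesJson {

  static String toRepliesJson(List<List<Long>> coveringsPerCall) {
    StringJoiner replies = new StringJoiner(",", "{\"replies\":[", "]}");
    for (List<Long> covering : coveringsPerCall) {
      StringJoiner cells = new StringJoiner(",", "[", "]");
      for (long cellId : covering) {
        // Signed decimal, matching Spanner's INT64 and String.valueOf(long).
        cells.add("\"" + cellId + "\"");
      }
      replies.add(cells.toString());
    }
    return replies.toString();
  }

  public static void main(String[] args) {
    long leafCellId = (1L << 60) + 1; // an odd ID on face 0, i.e. a leaf cell
    double asJsonNumber = leafCellId; // what a double-based JSON parser would hold
    System.out.println((long) asJsonNumber == leafCellId); // prints "false"

    // Two calls in the request => two replies; the second covering is empty.
    System.out.println(toRepliesJson(List.of(List.of(leafCellId), List.<Long>of())));
    // prints {"replies":[["1152921504606846977"],[]]}
  }
}
```

Emitting an empty array for a call with no covering, rather than skipping it, keeps the `replies` length equal to the `calls` length as the protocol requires.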
````diff
@@ -227,7 +242,7 @@ Deployment scripts are in [`deploy/`](deploy/). Run them in order:
 # 2. Deploy all three Cloud Functions (builds with Maven, then deploys)
 ./deploy/deploy-function.sh --project YOUR_PROJECT
 
-# 3. Grant Spanner's service agent permission to invoke the functions
+# 3. Grant Spanner's service agent the Spanner API Service Agent role on the project
 ./deploy/grant-permissions.sh --project YOUR_PROJECT
 ```
 
````

````diff
@@ -249,7 +264,9 @@ This deletes the Cloud Functions and removes the project-level `roles/spanner.se
 
 ```sql
 DROP FUNCTION IF EXISTS geo.s2_covering;
+DROP FUNCTION IF EXISTS geo.s2_covering_v4;
 DROP FUNCTION IF EXISTS geo.s2_covering_rect;
+DROP FUNCTION IF EXISTS geo.s2_covering_rect_v4;
 DROP FUNCTION IF EXISTS geo.s2_distance;
 DROP SCHEMA IF EXISTS geo;
 ```
````

````diff
@@ -297,7 +314,7 @@ All mutations happen in a single Spanner transaction. Both v3 and v4 indexes are
 4. No S2 library dependency in the DAO method -- pure SQL
 
 **v4 Remote UDF** (e.g., `SpannerGeoDao.radiusSearchWithUdfV4()`):
-1. Same parameters as v3 UDF -- the same three Cloud Functions are reused
+1. Uses dedicated v4 covering UDFs (`geo.s2_covering_v4`) that return cells at any level. The `geo.s2_distance` UDF is shared with v3
 2. Covering cell IDs are converted to leaf-cell ranges using bitwise arithmetic in SQL: `C & (-C)` extracts the sentinel bit, then `C - (bit - 1)` and `C + (bit - 1)` give the range
 3. Range scans hit the `PointOfInterestByS2Cell` covering index directly
 
````

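The range arithmetic in step 2 operates on raw 64-bit cell IDs, so it can be checked outside Spanner. This sketch, built around a hypothetical class `LeafRange`, mirrors the SQL expressions with Java `long` arithmetic. Like Spanner's `INT64`, Java `long` is signed two's complement. The sketch assumes the covering cell IDs have already been decoded from the UDF's string replies.

```java
/**
 * Hypothetical helper mirroring the v4 SQL range arithmetic on raw S2 cell IDs:
 * bit = C & (-C), range = [C - (bit - 1), C + (bit - 1)].
 */
final class LeafRange {

  /** Lowest set bit of the cell ID, i.e. its sentinel bit. */
  static long sentinelBit(long cellId) {
    return cellId & -cellId;
  }

  /** Smallest leaf-cell ID inside the cell: C - (bit - 1). */
  static long rangeMin(long cellId) {
    return cellId - (sentinelBit(cellId) - 1);
  }

  /** Largest leaf-cell ID inside the cell: C + (bit - 1). */
  static long rangeMax(long cellId) {
    return cellId + (sentinelBit(cellId) - 1);
  }

  /** Signed BETWEEN, as Spanner evaluates it on INT64 values. */
  static boolean contains(long coveringCellId, long storedLeafId) {
    return rangeMin(coveringCellId) <= storedLeafId && storedLeafId <= rangeMax(coveringCellId);
  }

  public static void main(String[] args) {
    long face0 = 1L << 60;                // level-0 cell for face 0
    long face5 = (5L << 61) | (1L << 60); // level-0 cell for face 5, negative as INT64
    System.out.println(rangeMin(face0) + " " + rangeMax(face0)); // prints "1 2305843009213693951"
    System.out.println(contains(face5, face5 + 1)); // prints "true" (odd ID, so a leaf on face 5)
    System.out.println(contains(face5, face0));     // prints "false"
  }
}
```

Cells on faces 4 and 5 have the top bit set and are therefore negative as `INT64`. A cell's `rangeMin` and `rangeMax` share its face bits, however, so a signed `BETWEEN` over one covering cell still selects exactly the stored leaf IDs inside that cell.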
````diff
@@ -430,7 +447,7 @@ sample/
 
 ├── infra/
 │ ├── schema.sql # Production schema (v3 token index + v4 range index)
-│ └── udf_definition.sql # Remote UDF DDL (geo.s2_covering, geo.s2_covering_rect, geo.s2_distance)
+│ └── udf_definition.sql # Remote UDF DDL (v3: geo.s2_covering, geo.s2_covering_rect; v4: geo.s2_covering_v4, geo.s2_covering_rect_v4; shared: geo.s2_distance)
 
 ├── schemas/ # All schema iterations (for reference)
 │ ├── v1_naive.sql
````

````diff
@@ -452,7 +469,9 @@ sample/
 │ ├── pom.xml # Separate Maven project
 │ └── src/main/java/.../functions/
 │ ├── S2CoveringFunction.java
+│ ├── S2CoveringV4Function.java
 │ ├── S2CoveringRectFunction.java
+│ ├── S2CoveringRectV4Function.java
 │ └── S2DistanceFunction.java
 
 ├── deploy/ # Deployment & IAM scripts
````

````diff
@@ -479,7 +498,6 @@ sample/
 | 20 | ~10 m | Building-level (not used in this sample) |
 | 30 | ~1 cm | Maximum precision (leaf cell) |
 
-
 ## Some Gotchas to Keep in Mind
 
 **Remote Functions must live in a named schema**. Spanner does not allow Remote Functions in the default schema. We use `CREATE SCHEMA IF NOT EXISTS geo` and qualify all calls as `geo.s2_covering()`, `geo.s2_covering_rect()`, and `geo.s2_distance()`.
````

````diff
@@ -495,5 +513,5 @@ sample/
 - [S2 Geometry Library](https://s2geometry.io/)
 - [S2 Cell Hierarchy](https://s2geometry.io/devguide/s2cell_hierarchy.html)
 - [Google Cloud Spanner DDL Reference](https://cloud.google.com/spanner/docs/reference/standard-sql/data-definition-language)
-- [Spanner Remote Functions](https://docs.cloud.google.com/spanner/docs/cloud-run-remote-function)
+- [Spanner Remote UDFs](https://cloud.google.com/spanner/docs/remote-functions)
 - [Spanner Interleaved Tables](https://cloud.google.com/spanner/docs/schema-and-data-model#creating-interleaved-tables)
````

S2CoveringRectV4Function.java (new file)

Lines changed: 164 additions & 0 deletions

@@ -0,0 +1,164 @@
```java
/*
 * Copyright 2026 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.example.spannergeo.functions;

import com.google.cloud.functions.HttpFunction;
import com.google.cloud.functions.HttpRequest;
import com.google.cloud.functions.HttpResponse;
import com.google.common.geometry.S2CellId;
import com.google.common.geometry.S2CellUnion;
import com.google.common.geometry.S2LatLng;
import com.google.common.geometry.S2LatLngRect;
import com.google.common.geometry.S2RegionCoverer;
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;

/**
 * Cloud Run function backing the Spanner Remote UDF {@code geo.s2_covering_rect_v4}.
 *
 * <p>This is the v4 variant of the rectangle covering function, designed for the
 * range-scan schema where each POI stores a single leaf-level S2 Cell ID (level 30).
 * Unlike {@link S2CoveringRectFunction} which filters covering cells to levels
 * 12/14/16 for the v3 token index, this function returns cells at whatever levels
 * the {@link S2RegionCoverer} chooses. The SQL query handles the conversion from
 * covering cells to leaf-cell ranges using bitwise arithmetic.
 *
 * <p>Wire protocol (Spanner Remote UDF batch format):
 * <pre>
 * Request:  {"requestId": "...", "calls": [[minLat, minLng, maxLat, maxLng], ...]}
 * Response: {"replies": [["cellId1", "cellId2", ...], ...]}
 * </pre>
 *
 * <p>Cell IDs are returned as JSON strings (not numbers) because S2 cell IDs are
 * unsigned 64-bit values that exceed JSON's safe integer limit (2^53). Spanner
 * handles the string-to-INT64 parsing automatically.
 */
public class S2CoveringRectV4Function implements HttpFunction {

  private static final Gson GSON = new Gson();

  /**
   * Coverer level range. Unlike the v3 function which constrains to 12-16,
   * we allow a wider range so the coverer can pick optimal levels for the
   * search region size. The SQL query handles any level via range scans.
   */
  private static final int MIN_LEVEL = 12;
  private static final int MAX_LEVEL = 20;

  /**
   * Maximum number of cells in the covering. We use 20 (above the coverer's
   * default of 8) since no cells are filtered out; every cell the coverer
   * produces is returned.
   */
  private static final int MAX_CELLS = 20;

  @Override
  public void service(HttpRequest request, HttpResponse response) throws IOException {
    response.setContentType("application/json");
    BufferedWriter writer = response.getWriter();

    try {
      BufferedReader reader = request.getReader();
      JsonObject requestBody = JsonParser.parseReader(reader).getAsJsonObject();
      JsonArray calls = requestBody.getAsJsonArray("calls");

      JsonArray replies = new JsonArray();
      for (JsonElement callElement : calls) {
        JsonArray callArgs = callElement.getAsJsonArray();
        double minLat = callArgs.get(0).getAsDouble();
        double minLng = callArgs.get(1).getAsDouble();
        double maxLat = callArgs.get(2).getAsDouble();
        double maxLng = callArgs.get(3).getAsDouble();

        // Validate inputs
        if (minLat < -90 || minLat > 90) {
          writeError(writer, "minLat must be between -90 and 90, got: " + minLat);
          return;
        }
        if (maxLat < -90 || maxLat > 90) {
          writeError(writer, "maxLat must be between -90 and 90, got: " + maxLat);
          return;
        }
        if (minLng < -180 || minLng > 180) {
          writeError(writer, "minLng must be between -180 and 180, got: " + minLng);
          return;
        }
        if (maxLng < -180 || maxLng > 180) {
          writeError(writer, "maxLng must be between -180 and 180, got: " + maxLng);
          return;
        }
        if (minLat > maxLat) {
          writeError(writer, "minLat must be <= maxLat, got: " + minLat + " > " + maxLat);
          return;
        }
        if (minLng > maxLng) {
          writeError(writer, "minLng must be <= maxLng, got: " + minLng + " > " + maxLng);
          return;
        }

        JsonArray cellIdArray = computeCovering(minLat, minLng, maxLat, maxLng);
        replies.add(cellIdArray);
      }

      JsonObject responseBody = new JsonObject();
      responseBody.add("replies", replies);
      writer.write(GSON.toJson(responseBody));

    } catch (Exception e) {
      writeError(writer, "Failed to compute S2 covering for rect: " + e.getMessage());
    }
  }

  /**
   * Compute S2 covering cell IDs for a rectangular region. Unlike the v3 function,
   * no level filtering or promotion is applied — cells are returned at whatever
   * levels the coverer selects.
   */
  static JsonArray computeCovering(double minLat, double minLng, double maxLat, double maxLng) {
    S2LatLngRect rect = new S2LatLngRect(
        S2LatLng.fromDegrees(minLat, minLng),
        S2LatLng.fromDegrees(maxLat, maxLng));

    S2RegionCoverer coverer = S2RegionCoverer.builder()
        .setMinLevel(MIN_LEVEL)
        .setMaxLevel(MAX_LEVEL)
        .setMaxCells(MAX_CELLS)
        .build();

    S2CellUnion covering = coverer.getCovering(rect);

    // Return all cells directly — no filtering needed for v4.
    JsonArray cellIdArray = new JsonArray();
    for (S2CellId cellId : covering) {
      cellIdArray.add(String.valueOf(cellId.id()));
    }
    return cellIdArray;
  }

  /** Write a JSON error response. */
  private static void writeError(BufferedWriter writer, String message) throws IOException {
    JsonObject error = new JsonObject();
    error.addProperty("errorMessage", message);
    writer.write(GSON.toJson(error));
  }
}
```
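`S2CoveringRectV4Function` returns cells at whatever level the coverer picks, between `MIN_LEVEL` (12) and `MAX_LEVEL` (20). A test of its output might therefore recover each cell's level from the position of its sentinel bit: every S2 level below 30 moves that bit up by two positions, so level = 30 - trailingZeros / 2. Below is a minimal sketch built around a hypothetical `CellLevel` helper that works on the string-encoded reply elements. It is not part of the sample.

```java
import java.util.List;

/**
 * Hypothetical test helper (not part of the sample): recovers the S2 level of
 * each cell ID in a reply and checks it falls inside the coverer's level range.
 */
final class CellLevel {

  static final int S2_MAX_LEVEL = 30; // leaf level

  /** Each level below 30 shifts the sentinel bit up by two positions. */
  static int level(long cellId) {
    return S2_MAX_LEVEL - (Long.numberOfTrailingZeros(cellId) >> 1);
  }

  /** Reply elements are signed decimal strings, e.g. "68719476736". */
  static boolean allWithinLevels(List<String> reply, int minLevel, int maxLevel) {
    for (String encoded : reply) {
      int level = level(Long.parseLong(encoded));
      if (level < minLevel || level > maxLevel) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long level12 = 1L << 36; // face 0, sentinel at bit 2 * (30 - 12)
    long level20 = 1L << 20; // face 0, sentinel at bit 2 * (30 - 20)
    List<String> reply = List.of(String.valueOf(level12), String.valueOf(level20));
    System.out.println(allWithinLevels(reply, 12, 20)); // prints "true"
    System.out.println(level(1L));                      // prints "30" (a leaf cell)
  }
}
```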
