spannergeo-s2/README.md
Lines changed: 41 additions & 23 deletions
@@ -4,10 +4,12 @@

 This sample demonstrates how to perform geo-spatial indexing and querying on [Google Cloud Spanner](https://cloud.google.com/spanner) using the [S2 Geometry Library](https://s2geometry.io/). Spanner does not natively support spatial data types or spatial indexes, so we use S2 to encode geographic coordinates into indexable 64-bit cell IDs.

-The sample includes two approaches to querying:
+The sample exercises two dimensions of the design space:

-- **Client-side S2**: The application computes S2 coverings and binds cell ID ranges as query parameters.
-- **Remote UDFs**: Spanner calls Cloud Functions server-side to compute coverings and distances, so the query is self-contained SQL with no client-side S2 dependency.
+- **Schema design**: v3 (interleaved token index with multi-level cell IDs) vs. v4 (single leaf-level cell ID with range scans via a covering index)
+- **Computation approach**: Client-side S2 (application computes coverings and binds parameters) vs. Remote UDFs (Spanner calls Cloud Functions server-side for self-contained SQL)
+
+All four combinations are demonstrated across three query shapes (radius, bounding box, k-NN), for a total of twelve query types.

 ## Prerequisites

@@ -36,7 +38,14 @@ The sample includes two approaches to querying:
@@ -67,9 +76,9 @@ Store raw coordinates with a composite index. Simple but inefficient for radius

 Add an S2 Cell ID column at a fixed level (e.g., level 16, ~150m cells). Better, but a fixed level means either too coarse or too fine for different query radii.

-### v3: Interleaved Token Index (Recommended) ([`schemas/v3_token_index.sql`](schemas/v3_token_index.sql))
+### v3: Interleaved Token Index ([`schemas/v3_token_index.sql`](schemas/v3_token_index.sql))

-The canonical pattern. Store multiple S2 tokens per location at varying cell levels in an interleaved child table. This balances precision vs. index size and supports queries at any radius. The production schema ([`infra/schema.sql`](infra/schema.sql)) uses this design.
+The canonical pattern. Store multiple S2 tokens per location at varying cell levels in an interleaved child table. This balances precision vs. index size and supports queries at any radius.

 ```sql
 CREATE TABLE PointOfInterest (
@@ -159,12 +168,12 @@ With Remote UDFs deployed, queries become self-contained SQL -- no client-side S

 ### Remote UDF Queries (v4)

-The v4 UDF queries use the same three Cloud Functions as v3 -- no new deployments needed. The key difference: covering cell IDs are converted to leaf-cell ranges using bitwise arithmetic directly in SQL (`C & (-C)` extracts the sentinel bit).
+The v4 UDF queries use dedicated covering UDFs (`geo.s2_covering_v4`, `geo.s2_covering_rect_v4`) backed by simpler Cloud Functions that return cells at any level the S2 coverer chooses -- no filtering to levels 12/14/16. The SQL converts each covering cell to a leaf-cell range using bitwise arithmetic (`C & (-C)` extracts the sentinel bit). The `geo.s2_distance` UDF is shared with v3.

-- **Radius search** ([`queries/v4_udf_query.sql`](queries/v4_udf_query.sql)) -- Combines `geo.s2_covering()` with inline bitwise range computation and `geo.s2_distance()` post-filter.
-- **Bounding box** and **k-NN** follow the same pattern with `geo.s2_covering_rect()` and appropriate post-filters.
+- **Radius search** ([`queries/v4_udf_query.sql`](queries/v4_udf_query.sql)) -- Combines `geo.s2_covering_v4()` with inline bitwise range computation and `geo.s2_distance()` post-filter.
+- **Bounding box** and **k-NN** follow the same pattern with `geo.s2_covering_rect_v4()` and appropriate post-filters.

-Here is the v4 UDF radius search query as an example. The client only provides `(lat, lng, radius)`:
+Here is the v3 UDF radius search query as an example. The client only provides `(lat, lng, radius)`:

 ```sql
 WITH candidates AS (
@@ -184,35 +193,41 @@ WHERE distance_meters <= @radiusMeters
 ORDER BY distance_meters;
 ```

-Three Remote UDFs power these queries:
+Five Remote UDFs power these queries:

 | UDF | Purpose |
 |-----|---------|
-| `geo.s2_covering(lat, lng, radius)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a search circle |
-| `geo.s2_covering_rect(minLat, minLng, maxLat, maxLng)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a bounding box |
+| `geo.s2_covering(lat, lng, radius)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a search circle (v3, levels 12/14/16) |
+| `geo.s2_covering_v4(lat, lng, radius)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a search circle (v4, any level) |
+| `geo.s2_covering_rect(minLat, minLng, maxLat, maxLng)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a bounding box (v3, levels 12/14/16) |
+| `geo.s2_covering_rect_v4(minLat, minLng, maxLat, maxLng)` | Returns `ARRAY<INT64>` of S2 cell IDs covering a bounding box (v4, any level) |
 | `geo.s2_distance(lat1, lng1, lat2, lng2)` | Returns great-circle distance in meters between two points |

 > **Note:** Remote UDFs must live in a named schema (Spanner does not allow them in the default schema). This sample uses the `geo` schema. Additionally, `UNNEST` of a Remote UDF result requires materializing the array in a subquery first -- `UNNEST(geo.s2_covering(...))` directly in `FROM` is not supported.

 ## Remote UDFs

-Remote UDFs push S2 logic into Spanner so queries don't require a client-side S2 library. Three Cloud Functions back the UDFs:
+Remote UDFs push S2 logic into Spanner so queries don't require a client-side S2 library. Five Cloud Functions back the UDFs:
 The Cloud Functions live in [`cloud-function/`](cloud-function/) as a separate Maven project:

-- [`S2CoveringFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringFunction.java) -- Computes S2 coverings for a circular region at levels 12, 14, 16. Returns cell IDs as **JSON strings** (not numbers) because S2 cell IDs exceed JSON's safe integer limit of 2^53.
-- [`S2CoveringRectFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringRectFunction.java) -- Computes S2 coverings for a rectangular region (bounding box) at levels 12, 14, 16. Same wire protocol and cell ID encoding as `S2CoveringFunction`.
+- [`S2CoveringFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringFunction.java) -- Computes S2 coverings for a circular region at levels 12, 14, 16 (v3). Returns cell IDs as **JSON strings** (not numbers) because S2 cell IDs exceed JSON's safe integer limit of 2^53.
+- [`S2CoveringV4Function.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringV4Function.java) -- Computes S2 coverings for a circular region at any level (v4). No level filtering -- returns all cells from the coverer for use with range scans.
+- [`S2CoveringRectFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringRectFunction.java) -- Computes S2 coverings for a rectangular region (bounding box) at levels 12, 14, 16 (v3). Same wire protocol and cell ID encoding as `S2CoveringFunction`.
+- [`S2CoveringRectV4Function.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2CoveringRectV4Function.java) -- Same for rectangular regions (v4). No level filtering -- returns all cells from the coverer.
 - [`S2DistanceFunction.java`](cloud-function/src/main/java/com/example/spannergeo/functions/S2DistanceFunction.java) -- Computes great-circle distance using `S2LatLng.getDistance()`.
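The string encoding mentioned for `S2CoveringFunction` matters because many JSON parsers read numbers as IEEE 754 doubles, which represent integers exactly only up to 2^53. The following is a minimal sketch of that failure mode and of the string round trip, using only the JDK. The class and method names are hypothetical and are not taken from the sample's Cloud Functions.

```java
import java.math.BigDecimal;

// Hypothetical illustration; not the sample's actual Cloud Function code.
final class CellIdJsonSketch {

    /** Encodes a cell ID as a quoted JSON string so no digits are lost in transit. */
    static String toJsonString(long cellId) {
        return "\"" + Long.toString(cellId) + "\"";
    }

    /** Parses a quoted JSON string produced by toJsonString back into a long. */
    static long fromJsonString(String json) {
        return Long.parseLong(json.substring(1, json.length() - 1));
    }

    /** True if the ID keeps its exact value when converted to a double. */
    static boolean survivesDouble(long cellId) {
        // new BigDecimal(double) is exact, so this compares true numeric values.
        return new BigDecimal((double) cellId).compareTo(BigDecimal.valueOf(cellId)) == 0;
    }

    public static void main(String[] args) {
        long cellId = (1L << 60) + 1; // a 61-bit value with its lowest bit set
        System.out.println("as number survives: " + survivesDouble(cellId));
        System.out.println("as string survives: "
                + (fromJsonString(toJsonString(cellId)) == cellId));
    }
}
```

Leaf-level cell IDs always have their lowest bit set, so they are the values most likely to be silently rounded when sent as JSON numbers.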

-All three implement the [Spanner Remote UDF wire protocol](https://cloud.google.com/spanner/docs/remote-functions):
+All five implement the [Spanner Remote UDF wire protocol](https://cloud.google.com/spanner/docs/remote-functions):
-1. Same parameters as v3 UDF -- the same three Cloud Functions are reused
+1. Uses dedicated v4 covering UDFs (`geo.s2_covering_v4`) that return cells at any level. The `geo.s2_distance` UDF is shared with v3
 2. Covering cell IDs are converted to leaf-cell ranges using bitwise arithmetic in SQL: `C & (-C)` extracts the sentinel bit, then `C - (bit - 1)` and `C + (bit - 1)` give the range
 3. Range scans hit the `PointOfInterestByS2Cell` covering index directly
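The bitwise range computation in step 2 can also be sketched outside SQL. The following is a minimal Java illustration, assuming cell IDs are held as signed 64-bit values (as in Spanner's `INT64`). The class and method names are hypothetical and are not part of the sample.

```java
// Hypothetical helper mirroring the SQL bit arithmetic; not part of the sample.
final class LeafRangeSketch {

    /** Lowest set bit of a cell ID (the sentinel bit), i.e. C & (-C). */
    static long sentinelBit(long cellId) {
        return cellId & -cellId;
    }

    /** Smallest leaf-cell ID contained in the given cell: C - (bit - 1). */
    static long rangeMin(long cellId) {
        return cellId - (sentinelBit(cellId) - 1);
    }

    /** Largest leaf-cell ID contained in the given cell: C + (bit - 1). */
    static long rangeMax(long cellId) {
        return cellId + (sentinelBit(cellId) - 1);
    }

    public static void main(String[] args) {
        long face0 = 0x1000000000000000L; // the level-0 cell for face 0
        System.out.println(Long.toHexString(rangeMin(face0))
                + " .. " + Long.toHexString(rangeMax(face0)));
    }
}
```

For a face-level cell such as `0x1000000000000000`, the sentinel bit is `1 << 60` and the range spans every leaf cell on that face. For a leaf cell the sentinel bit is `1`, so the range collapses to the cell itself.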
@@ -430,7 +447,7 @@ sample/
 │
 ├── infra/
 │ ├── schema.sql # Production schema (v3 token index + v4 range index)
 ├── schemas/ # All schema iterations (for reference)
 │ ├── v1_naive.sql
@@ -452,7 +469,9 @@ sample/
 │ ├── pom.xml # Separate Maven project
 │ └── src/main/java/.../functions/
 │ ├── S2CoveringFunction.java
+│ ├── S2CoveringV4Function.java
 │ ├── S2CoveringRectFunction.java
+│ ├── S2CoveringRectV4Function.java
 │ └── S2DistanceFunction.java
 │
 ├── deploy/ # Deployment & IAM scripts
@@ -479,7 +498,6 @@ sample/
 | 20 | ~10 m | Building-level (not used in this sample) |
 | 30 | ~1 cm | Maximum precision (leaf cell) |
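A cell's level can be recovered from the ID itself: in the S2 cell ID layout the sentinel bit sits at bit position 2 × (30 − level), so leaf (level 30) IDs are odd and coarser cells have more trailing zeros. The following is a small Java sketch of that relationship. The class name is hypothetical, and the method assumes a valid, non-zero cell ID.

```java
// Hypothetical helper; assumes the input is a valid, non-zero S2 cell ID.
final class CellLevelSketch {

    static final int MAX_LEVEL = 30;

    /** Derives the level from the position of the lowest set (sentinel) bit. */
    static int level(long cellId) {
        return MAX_LEVEL - (Long.numberOfTrailingZeros(cellId) >> 1);
    }

    public static void main(String[] args) {
        System.out.println(level(0x1000000000000000L)); // face cell
        System.out.println(level(1L << 28));            // sentinel at bit 28
        System.out.println(level(1L));                  // leaf cell
    }
}
```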

-
 ## Some Gotchas to Keep in Mind

 **Remote Functions must live in a named schema**. Spanner does not allow Remote Functions in the default schema. We use `CREATE SCHEMA IF NOT EXISTS geo` and qualify all calls as `geo.s2_covering()`, `geo.s2_covering_rect()`, and `geo.s2_distance()`.