
Graph size multi shard support - changes in JdbcIOWrapper #3706

Open
VardhanThigle wants to merge 2 commits into GoogleCloudPlatform:main from VardhanThigle:graph-size-multi-shard-jdbc

@VardhanThigle VardhanThigle commented Apr 17, 2026

JDBC Engine Support for Multi-Source Readers

This is the fourth child PR of #3684. It refactors JdbcIO to allow multi-shard reads, and will be followed by changes to the Pipeline Controller and ReadWithUniformPartitions. See #3684 for full details.

Design Decision

This PR implements the core multi-shard logic within the JdbcIoWrapper.

Key Changes:

  • Parallel Schema Discovery: Refactored the discovery phase to process all shards in a JdbcIoWrapperConfigGroup in parallel using parallelStream().
  • Bulk Transform Building: The wrapper now aggregates all tables across all shards into a consolidated list and instantiates a single ReadWithUniformPartitions instance.
  • Fail-Fast Initialization: Implemented a strict error handling policy where failure to discover a single shard results in immediate job failure.
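
The three changes above (parallel discovery, aggregation into one list, fail-fast on any shard error) can be sketched roughly as follows. This is a minimal plain-Java illustration, not the code in this PR: `ParallelDiscoverySketch`, `ShardDiscovery`, and `discoverAll` are hypothetical names, and shards and tables are reduced to strings.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch; not the JdbcIoWrapper implementation. */
final class ParallelDiscoverySketch {

  /** Stand-in for per-shard schema discovery: returns the table names found on one shard. */
  interface ShardDiscovery {
    List<String> discoverTables(String shardId) throws Exception;
  }

  /**
   * Discovers all shards in parallel and flattens the results into a single
   * list of "shard.table" entries. A failure on any shard makes the whole
   * call throw, so no partial result is ever returned (fail-fast).
   */
  static List<String> discoverAll(List<String> shardIds, ShardDiscovery discovery) {
    return shardIds.parallelStream()
        .flatMap(shard -> {
          try {
            return discovery.discoverTables(shard).stream().map(table -> shard + "." + table);
          } catch (Exception e) {
            // Surface the failing shard; the stream terminates with this exception.
            throw new IllegalStateException("Schema discovery failed for shard " + shard, e);
          }
        })
        .collect(Collectors.toList());
  }
}
```

Because the consolidated list is built once, a single reader transform can be constructed from it instead of one transform per table, which is the property the PR relies on to keep the graph size constant.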

Rationale:

To support thousands of tables across hundreds of shards, sequential schema discovery was no longer feasible. By parallelizing this phase and consolidating the resulting transforms, we achieve both rapid job startup and a constant-sized Dataflow graph.


Why it's Safe (Concurrency & Error Isolation)

  • Parallel Discovery Performance: We used standard Java parallel streams for cross-shard discovery. Since this happens during job submission (on the client/launcher), it significantly reduces job setup time without impacting worker resources.
  • Fail-Fast Integrity: We chose a fail-fast approach because a multi-shard migration is only as reliable as its weakest shard. Failing early prevents the template from starting a "partial" migration which could lead to difficult-to-detect data gaps in Spanner.
  • Retry Mechanism: While the discovery is fail-fast, individual shard operations are wrapped in exponential backoff retries (via FluentBackoff), ensuring that transient network issues don't trigger unnecessary job failures.
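
To illustrate how retries and fail-fast fit together, here is a small sketch of exponential backoff around a single shard operation. It is a plain-JDK stand-in and deliberately does not use Beam's FluentBackoff; `RetrySketch`, `callWithBackoff`, and the parameter names are hypothetical, and the doubling policy is an assumption rather than the PR's configuration.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of retry-with-backoff; not the PR's FluentBackoff-based code. */
final class RetrySketch {

  /**
   * Runs op, retrying up to maxRetries times after a failure. The wait starts
   * at initialBackoffMillis and doubles after each failed attempt. Once the
   * retries are exhausted, the last exception is rethrown to the caller.
   */
  static <T> T callWithBackoff(Callable<T> op, int maxRetries, long initialBackoffMillis)
      throws Exception {
    long backoffMillis = initialBackoffMillis;
    for (int attempt = 0; ; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        // Never retry an interrupt; restore the flag and propagate.
        Thread.currentThread().interrupt();
        throw e;
      } catch (Exception e) {
        if (attempt >= maxRetries) {
          throw e; // retries exhausted: the caller can now fail the job
        }
        Thread.sleep(backoffMillis);
        backoffMillis *= 2;
      }
    }
  }
}
```

In this arrangement a transient error is absorbed inside the retry loop, and only an error that survives every retry reaches the fail-fast path.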

How to Verify

The added tests simulate multi-shard configurations and verify that parallel discovery produces the correct aggregated schema and a consolidated reader transform.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces support for multi-shard database configurations in the JdbcIoWrapper. It enables parallel schema discovery across multiple shards to improve startup performance and updates the internal identification mechanism to use table schema UUIDs, ensuring more robust tracking of table reads and completions. The changes also include necessary refactoring to aggregate reader transforms across these shards.

Highlights

  • Multi-Shard Support: Introduced JdbcIoWrapperConfigGroup to manage multiple database shards, enabling parallel schema discovery and table inference.
  • Parallel Discovery: Implemented parallel schema discovery using a fixed thread pool to reduce pipeline startup latency.
  • Schema Identification: Transitioned from using table names to table schema UUIDs for identifying tables in reader transforms and completion tracking.
  • Refactored Reader Construction: Updated JdbcIoWrapper to aggregate reader transforms across multiple shards, supporting both legacy JdbcIO and optimized ReadWithUniformPartitions.
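
The move from table names to schema UUIDs can be illustrated with a small sketch. It assumes, as one plausible motivation not stated in the summary, that different shards may contain tables with the same name. `TableIdSketch`, `TableRef`, and both index methods are hypothetical, and the use of a random UUID is an assumption; the summary does not say how the real UUIDs are derived.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch; not the identifiers or types used in the PR. */
final class TableIdSketch {

  /** A discovered table: its shard, its name, and a UUID assigned at discovery time. */
  static final class TableRef {
    final String shardId;
    final String tableName;
    // Assumption: a random UUID stands in for the real table schema UUID.
    final UUID schemaUuid = UUID.randomUUID();

    TableRef(String shardId, String tableName) {
      this.shardId = shardId;
      this.tableName = tableName;
    }
  }

  /** Keyed by name: same-named tables on different shards overwrite each other. */
  static Map<String, TableRef> indexByName(List<TableRef> tables) {
    Map<String, TableRef> byName = new HashMap<>();
    for (TableRef table : tables) {
      byName.put(table.tableName, table);
    }
    return byName;
  }

  /** Keyed by UUID: every discovered table keeps its own entry. */
  static Map<UUID, TableRef> indexByUuid(List<TableRef> tables) {
    Map<UUID, TableRef> byUuid = new HashMap<>();
    for (TableRef table : tables) {
      byUuid.put(table.schemaUuid, table);
    }
    return byUuid;
  }
}
```

With two shards that each hold a `users` table, the name-keyed index ends up with one entry while the UUID-keyed index keeps both, so completion tracking keyed by UUID can tell the two reads apart.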

@codecov

codecov bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 95.65217% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.76%. Comparing base (f3c3064) to head (b940a2c).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...source/reader/io/jdbc/iowrapper/JdbcIoWrapper.java | 95.45% | 6 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3706      +/-   ##
============================================
+ Coverage     52.49%   58.76%   +6.27%     
+ Complexity     6248     2138    -4110     
============================================
  Files          1065      505     -560     
  Lines         64318    29278   -35040     
  Branches       7119     3209    -3910     
============================================
- Hits          33765    17206   -16559     
+ Misses        28268    11095   -17173     
+ Partials       2285      977    -1308     
| Components | Coverage Δ |
|---|---|
| spanner-templates | 73.87% <95.65%> (+1.78%) ⬆️ |
| spanner-import-export | ∅ <ø> (∅) |
| spanner-live-forward-migration | 80.79% <ø> (-0.07%) ⬇️ |
| spanner-live-reverse-replication | 77.47% <ø> (-0.06%) ⬇️ |
| spanner-bulk-migration | 89.37% <95.65%> (-0.01%) ⬇️ |
| gcs-spanner-dv | 86.69% <ø> (+0.96%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...2/source/reader/io/schema/SchemaDiscoveryImpl.java | 95.83% <100.00%> (+0.05%) ⬆️ |
| ...e/reader/io/transform/AccumulatingTableReader.java | 100.00% <100.00%> (ø) |
| ...2/source/reader/io/transform/ExtractTableIdFn.java | 100.00% <100.00%> (ø) |
| ...ource/reader/io/transform/GroupCompletionDoFn.java | 100.00% <100.00%> (ø) |
| ...source/reader/io/jdbc/iowrapper/JdbcIoWrapper.java | 96.60% <95.45%> (-1.30%) ⬇️ |

... and 584 files with indirect coverage changes


@VardhanThigle VardhanThigle changed the title from "[Draft] Graph size multi shard changes in KdbcIOWrapper" to "Graph size multi shard support - changes in JdbcIOWrapper" on Apr 17, 2026
@VardhanThigle VardhanThigle force-pushed the graph-size-multi-shard-jdbc branch from 62552ce to b940a2c on April 17, 2026 11:11
@VardhanThigle VardhanThigle marked this pull request as ready for review April 17, 2026 11:12
@VardhanThigle VardhanThigle requested a review from a team as a code owner April 17, 2026 11:12