sync-diff-inspector: enforce splitter-strategy + add auto mode by joechenrh · Pull Request #12609 · pingcap/tiflow

joechenrh · 2026-04-17T03:59:57Z

What problem does this PR solve?

Issue Number: close #xxxx

Today splitter-strategy in sync-diff-inspector is consumed only as a fallback when bucket-based splitting fails. The bucket iterator is always attempted first for chunk checksum, and global checksum ignores the setting entirely and always uses random. When TiDB bucket statistics are stale or misleading, bucket-based splits can produce skewed or incorrect chunk boundaries — and the user has no way to opt out.

What is changed and how it works?

splitter-strategy is now a three-valued enforced setting:

Value	Behavior
`auto` (new default)	Chunk checksum: try bucket first, fall back to random on error. Global checksum: random. Preserves the pre-existing behavior.
`random`	Random iterator for both chunk checksum and global checksum. Bucket iterator is not attempted.
`limit`	Limit iterator for both chunk checksum and global checksum. Bucket iterator is not attempted.

Implementation:

sync_diff_inspector/config/config.go: add SplitterStrategyAuto and update normalizeSplitterStrategy so empty/"auto" normalize to auto, and any other unknown value fails CheckConfig.
sync_diff_inspector/source/tidb.go:AnalyzeSplitter: switch on SplitterStrategy up-front. limit/random build their iterators directly; only auto falls through to the bucket-first path.
sync_diff_inspector/source/tidb.go:GetGlobalChecksumIterator: switch on SplitterStrategy — limit returns a *LimitIterator (chunk count reported as 0 since LimitIterator produces chunks asynchronously with no upfront total); auto/random continue to use the random iterator.

Check List

Tests

Unit test

New / updated unit tests:

TestNormalizeSplitterStrategy in config/config_test.go — covers empty, auto, mixed case, surrounding whitespace, valid enum values, and an invalid value.
TestAnalyzeSplitterEnforcesLimitStrategy / TestAnalyzeSplitterEnforcesRandomStrategy / TestAnalyzeSplitterAutoFallsBackToRandomOnBucketError in source/tidb_splitter_test.go — verify (via sqlmock + iterator type assertions) that non-auto strategies skip the bucket path, and that auto still falls back to random on bucket failure.
TestGetGlobalChecksumIteratorUsesLimitWhenConfigured / TestGetGlobalChecksumIteratorDefaultsToRandom in source/tidb_checksum_test.go — verify the global-checksum dispatch honors limit and treats auto/random/"" identically as random.

Questions

Will it cause performance regression or break compatibility?

No performance regression. The default (auto) reproduces today's behavior byte-for-byte. Users who previously set splitter-strategy = "random" will now skip the bucket iterator — that matches the intent of having set the field explicitly.

Do you need to update user documentation, design documentation or monitoring documentation?

No. splitter-strategy is not publicly documented.

Release note

sync-diff-inspector: `splitter-strategy` is now enforced. Setting `random` or `limit` skips the bucket iterator entirely for both chunk checksum and global checksum. A new `auto` value (the default) preserves the previous bucket-first-with-random-fallback behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…iterators

Route GetGlobalChecksumIterator to LimitIterator when SplitterStrategy is "limit", and keep RandomIterator for "auto", "random", and the empty default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…comment

ti-chi-bot · 2026-04-17T04:00:01Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

_{📖 For more info, you can check the "Contribute Code" section in the development guide.}

ti-chi-bot · 2026-04-17T04:00:05Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign overvenus for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist

Code Review

This pull request introduces a new "auto" splitter strategy for the sync_diff_inspector, making it the default option. The changes include updating the configuration normalization logic, modifying the TiDB table analyzer to explicitly handle "limit", "random", and "auto" strategies, and adding support for the "limit" strategy in global checksum iterations. Additionally, new unit tests were implemented to ensure correct strategy enforcement and configuration normalization. I have no feedback to provide as the existing review comments did not identify actionable issues or improvements within the scope of the provided code.

codecov · 2026-04-17T04:18:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.4152%. Comparing base (7e0d4db) to head (55323a0).
⚠️ Report is 1 commits behind head on master.
✅ All tests successful. No failed tests found.

❌ Your project check has failed because the head coverage (53.4152%) is below the target coverage (60.0000%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Components	Coverage Δ
cdc	`57.3566% <ø> (-0.0234%)`	⬇️
dm	`49.1975% <ø> (+0.0112%)`	⬆️
engine	`50.6659% <ø> (-0.0678%)`	⬇️

Flag	Coverage Δ
cdc	`57.3566% <ø> (?)`
unit	`53.4152% <ø> (-0.1003%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master     #12609        +/-   ##
================================================
- Coverage   53.5155%   53.4152%   -0.1003%     
================================================
  Files          1037       1011        -26     
  Lines        145625     139975      -5650     
================================================
- Hits          77932      74768      -3164     
+ Misses        61845      59588      -2257     
+ Partials       5848       5619       -229

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

ti-chi-bot · 2026-04-17T07:55:56Z

@joechenrh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-cdc-integration-kafka-test	`55323a0`	link	true	`/test pull-cdc-integration-kafka-test`
pull-dm-integration-test-next-gen	`55323a0`	link	false	`/test pull-dm-integration-test-next-gen`
pull-dm-integration-test	`55323a0`	link	true	`/test pull-dm-integration-test`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

joechenrh and others added 7 commits April 16, 2026 23:58

sync-diff-inspector: add auto splitter-strategy and enforce validation

8c349b3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sync-diff-inspector: update TableDiff.SplitterStrategy comment

dda22af

sync-diff-inspector: enforce splitter-strategy for chunk checksum

6478f08

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sync-diff-inspector: tighten AnalyzeSplitter test comments and close …

2c91b93

…iterators

sync-diff-inspector: enforce splitter-strategy for global checksum

323948e

Route GetGlobalChecksumIterator to LimitIterator when SplitterStrategy is "limit", and keep RandomIterator for "auto", "random", and the empty default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sync-diff-inspector: tighten GetGlobalChecksumIterator test name and …

f9e8842

…comment

sync-diff-inspector: clarify GetGlobalChecksumIterator strategy doc

55323a0

ti-chi-bot bot added the do-not-merge/needs-linked-issue label Apr 17, 2026

ti-chi-bot bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 17, 2026

ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 17, 2026

gemini-code-assist bot reviewed Apr 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync-diff-inspector: enforce splitter-strategy + add auto mode#12609

sync-diff-inspector: enforce splitter-strategy + add auto mode#12609
joechenrh wants to merge 7 commits intopingcap:masterfrom
joechenrh:sync-diff-splitter-strategy-enforce

joechenrh commented Apr 17, 2026

Uh oh!

ti-chi-bot bot commented Apr 17, 2026

Uh oh!

ti-chi-bot bot commented Apr 17, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

codecov bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

ti-chi-bot bot commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

joechenrh commented Apr 17, 2026

What problem does this PR solve?

What is changed and how it works?

Check List

Tests

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Uh oh!

ti-chi-bot bot commented Apr 17, 2026

Uh oh!

ti-chi-bot bot commented Apr 17, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

codecov bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ti-chi-bot bot commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov bot commented Apr 17, 2026 •

edited

Loading