sync-diff-inspector: enforce splitter-strategy + add auto mode#12609
sync-diff-inspector: enforce splitter-strategy + add auto mode#12609joechenrh wants to merge 7 commits intopingcap:masterfrom
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Route GetGlobalChecksumIterator to LimitIterator when SplitterStrategy is "limit", and keep RandomIterator for "auto", "random", and the empty default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request introduces a new "auto" splitter strategy for the sync_diff_inspector, making it the default option. The changes include updating the configuration normalization logic, modifying the TiDB table analyzer to explicitly handle "limit", "random", and "auto" strategies, and adding support for the "limit" strategy in global checksum iterations. Additionally, new unit tests were implemented to ensure correct strategy enforcement and configuration normalization. I have no feedback to provide as the existing review comments did not identify actionable issues or improvements within the scope of the provided code.
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project check has failed because the head coverage (53.4152%) is below the target coverage (60.0000%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. @@ Coverage Diff @@
## master #12609 +/- ##
================================================
- Coverage 53.5155% 53.4152% -0.1003%
================================================
Files 1037 1011 -26
Lines 145625 139975 -5650
================================================
- Hits 77932 74768 -3164
+ Misses 61845 59588 -2257
+ Partials 5848 5619 -229 🚀 New features to boost your workflow:
|
|
@joechenrh: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #xxxx
Today
splitter-strategyin sync-diff-inspector is consumed only as a fallback when bucket-based splitting fails. The bucket iterator is always attempted first for chunk checksum, and global checksum ignores the setting entirely and always uses random. When TiDB bucket statistics are stale or misleading, bucket-based splits can produce skewed or incorrect chunk boundaries — and the user has no way to opt out.What is changed and how it works?
splitter-strategyis now a three-valued enforced setting:auto(new default)randomlimitImplementation:
sync_diff_inspector/config/config.go: addSplitterStrategyAutoand updatenormalizeSplitterStrategyso empty/"auto"normalize toauto, and any other unknown value failsCheckConfig.sync_diff_inspector/source/tidb.go:AnalyzeSplitter: switch onSplitterStrategyup-front.limit/randombuild their iterators directly; onlyautofalls through to the bucket-first path.sync_diff_inspector/source/tidb.go:GetGlobalChecksumIterator: switch onSplitterStrategy—limitreturns a*LimitIterator(chunk count reported as0sinceLimitIteratorproduces chunks asynchronously with no upfront total);auto/randomcontinue to use the random iterator.Check List
Tests
New / updated unit tests:
TestNormalizeSplitterStrategyinconfig/config_test.go— covers empty,auto, mixed case, surrounding whitespace, valid enum values, and an invalid value.TestAnalyzeSplitterEnforcesLimitStrategy/TestAnalyzeSplitterEnforcesRandomStrategy/TestAnalyzeSplitterAutoFallsBackToRandomOnBucketErrorinsource/tidb_splitter_test.go— verify (via sqlmock + iterator type assertions) that non-autostrategies skip the bucket path, and thatautostill falls back to random on bucket failure.TestGetGlobalChecksumIteratorUsesLimitWhenConfigured/TestGetGlobalChecksumIteratorDefaultsToRandominsource/tidb_checksum_test.go— verify the global-checksum dispatch honorslimitand treatsauto/random/""identically as random.Questions
Will it cause performance regression or break compatibility?
No performance regression. The default (
auto) reproduces today's behavior byte-for-byte. Users who previously setsplitter-strategy = "random"will now skip the bucket iterator — that matches the intent of having set the field explicitly.Do you need to update user documentation, design documentation or monitoring documentation?
No.
splitter-strategyis not publicly documented.Release note