test(dm): add MariaDB source smoke test and next-gen integration test #12599
joechenrh wants to merge 17 commits into pingcap:master
Skipping CI for Draft Pull Request. |
Code Review
This pull request introduces integration tests for MariaDB as a data source in DM. It includes environment variable updates, configuration files, test data, and a new test runner script. The main test entry point was also updated to conditionally manage MySQL and MariaDB services. Feedback focuses on improving test isolation and maintainability, specifically by disabling automatic master resets in MariaDB-only environments, ensuring consistent SQL modes and server variables for MariaDB, and using wildcard patterns for test case matching.
/test pull-dm-integration-test-next-gen |
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #12599 +/- ##
===========================================
Coverage ? 53.4109%
===========================================
Files ? 1011
Lines ? 139975
Branches ? 0
===========================================
Hits ? 74762
Misses ? 59592
Partials ? 5621
Add cleanup_downstream_cluster to test_prepare: handles next-gen (port-4000 TiDB only) vs classic (tidb+tikv+pd + unistore data) teardown in one function.

Replace all raw killall/pkill tidb-server patterns across 9 test scripts with cleanup_tidb_server or cleanup_downstream_cluster. This eliminates ~30 duplicated kill+wait+cleanup lines and ensures next-gen SYSTEM TiDB is preserved consistently.

Files simplified: new_collation_off, tls, openapi, many_tables, lightning_mode, s3_dumpling_lightning, import_into_mode, util.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The .normalized files were written inside /tmp/configs/tasks/, which config import scans for task configs. Move the normalized copies to /tmp/ with distinct names so they don't interfere with config import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… port

Lightning's loader fetches TiDB settings via the HTTP status port (10080). When TiDB has [security] ssl-* configured, Lightning assumes the status port serves HTTPS. But ssl-* only enables TLS on the mysql port; the status port needs cluster-ssl-* (which requires TLS-enabled PD/TiKV) to serve HTTPS. Skip until the next-gen cluster supports full TLS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In multi-master HA tests (3-node etcd cluster), sending SIGHUP to all masters simultaneously causes etcd to lose quorum: each master tries to transfer leadership but no peer can accept it. The leader transfer blocks for 120s, failing the test.

Fix: kill dm-masters one at a time (SIGHUP + 30s wait per master), so each graceful shutdown completes while quorum is maintained. Escalate to SIGKILL after 30s for any stuck master. Workers and syncers are still killed in parallel (no quorum dependency).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
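The one-at-a-time shutdown described in this commit message can be sketched roughly as follows. This is an illustration only, not the PR's cleanup_process: the function name stop_one_at_a_time, its PID arguments, and the GRACE_SECONDS variable are hypothetical, and the default of 30 simply mirrors the wait mentioned above.

```shell
# Hypothetical helper (not from the PR): stop processes one at a time so the
# remaining peers can keep quorum while each one shuts down gracefully.
# GRACE_SECONDS is an assumed knob; 30 mirrors the wait in the commit message.
GRACE_SECONDS="${GRACE_SECONDS:-30}"

stop_one_at_a_time() {
	local pid waited
	for pid in "$@"; do
		# Ask for a graceful shutdown first; skip PIDs that are already gone.
		kill -HUP "$pid" 2>/dev/null || continue
		waited=0
		while kill -0 "$pid" 2>/dev/null; do
			if [ "$waited" -ge "$GRACE_SECONDS" ]; then
				# Escalate only when the process outlives the grace period.
				kill -KILL "$pid" 2>/dev/null || true
				break
			fi
			sleep 1
			waited=$((waited + 1))
		done
	done
}
```

Because the loop only moves on to the next PID after the previous one has exited (or been force-killed), at most one member is shutting down at any moment.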
…aster kill

Two fixes:

1. many_tables Phase 2: run_downstream_cluster already starts TiDB on port 4000 (classic). The extra run_tidb_server after it tried to start a second TiDB, which crashed on fslock, failed Phase 2, and left the worker stuck. Now run_tidb_server only runs on next-gen (where run_downstream_cluster is not called).
2. cleanup_process: kill dm-masters one at a time (SIGHUP + 30s wait) to maintain etcd quorum during graceful shutdown. Previously all 3 masters received SIGHUP simultaneously, so etcd lost quorum and the leader transfer blocked for 120s. Workers use SIGKILL directly (they can be stuck in long Lightning loads).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Missing i++ in the dm-worker exit log wait loop caused an infinite loop when the log message wasn't found (exposed on next-gen, where cleanup_process uses SIGKILL and the worker doesn't log an exit message).
2. Make the timeout non-fatal since the exit log is just a flush indicator, not a required assertion. The actual status checks follow after.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
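A bounded wait loop of the kind this fix describes might look like the sketch below. The helper name wait_for_log_line, its arguments, and the log path and message in the usage comment are all hypothetical; the PR's actual loop is not shown in this conversation.

```shell
# Hypothetical helper (not from the PR): poll a log file for a pattern with a
# bounded number of tries. Returns 0 if the pattern appears and 1 on timeout,
# leaving it to the caller to decide whether a timeout is fatal.
wait_for_log_line() {
	local file="$1" pattern="$2" max_tries="${3:-30}" interval="${4:-1}"
	local i=0
	while [ "$i" -lt "$max_tries" ]; do
		if grep -q -- "$pattern" "$file" 2>/dev/null; then
			return 0
		fi
		sleep "$interval"
		# Without this increment the loop never ends when the pattern is absent.
		i=$((i + 1))
	done
	return 1
}

# Non-fatal use (path and message are placeholders):
# wait_for_log_line "/path/to/dm-worker.log" "exit message" 30 ||
#	echo "exit log not found, continuing with status checks"
```

Returning a status instead of exiting keeps the timeout advisory, which matches treating the exit log as a flush indicator rather than an assertion.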
|
/retest |
- Fix shfmt indentation: cluster_lib.sh (4-space→tab), run_group.sh (space-tab→tab on G10), run.sh (case statement extra tab)
- mariadb_source: set RESET_MASTER=false before sourcing test_prepare
- run.sh: add MariaDB SQL_MODE cleanup in stop_services
- run.sh: add set_default_variables for MariaDB in start_services
- run.sh: widen case pattern from mariadb_source) to mariadb_*)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
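The effect of widening the case pattern can be illustrated with a small sketch. The helper name needs_mariadb and the test name mariadb_gtid are hypothetical and exist only for illustration; the real logic in run.sh is not shown in this conversation.

```shell
# Hypothetical helper (not from the PR): any test case whose name starts with
# "mariadb_" is treated as needing the MariaDB service, so future MariaDB
# tests match without another edit to the pattern.
needs_mariadb() {
	case "$1" in
	mariadb_*) return 0 ;;
	*) return 1 ;;
	esac
}
```

With the narrower mariadb_source) pattern, only that exact test name would match; the glob form also covers names such as the made-up mariadb_gtid.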
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rver

Classic TiDB (unistore) accepts -keyspace-name and -tidb-service-scope as no-ops, verified locally. Remove the NEXT_GEN guard so both classic and next-gen use the same config path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Classic TiDB rejects keyspace-name in config ("invalid config: keyspace
name or standby mode is not supported for classic TiDB"). Restore the
NEXT_GEN guard in run_tidb_server and tls/run.sh.
Also remove the diagnostic DROP DATABASE logging from run_sql that was
left over from debugging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mplify run.sh

- env_variables: add TIDB_EXTRA_ARGS under NEXT_GEN guard
- tls/run.sh: use TIDB_EXTRA_ARGS instead of inline NEXT_GEN check
- test_prepare: add shared normalize_session_block() function
- config.sh, new_relay/run.sh: use normalize_session_block()
- run.sh: move test_case parsing back to original position, remove redundant initial need_mariadb/need_mysql defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
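Defining extra arguments once under a NEXT_GEN guard, as the first bullet describes, could look roughly like this. It is a sketch under assumptions: the function name build_tidb_extra_args and the fallback keyspace value are hypothetical, and the real TIDB_EXTRA_ARGS may carry different or additional flags.

```shell
# Hypothetical sketch (not from the PR): emit keyspace-related arguments only
# when NEXT_GEN=1, so classic runs get an empty string and never see them.
# The "keyspace1" fallback is a placeholder, not the PR's value.
build_tidb_extra_args() {
	if [ "${NEXT_GEN:-0}" = "1" ]; then
		echo "-keyspace-name ${KEYSPACE_NAME:-keyspace1}"
	else
		echo ""
	fi
}

# Callers can then append the result without their own NEXT_GEN check.
TIDB_EXTRA_ARGS="$(build_tidb_extra_args)"
```

Centralizing the guard means scripts such as tls/run.sh only need to pass the variable through.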
Fixes pull-check failure: "mariadb_source is not added to any group"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MariaDB sidecar is not yet available in CI (PingCAP-QE/ci#4496). Exclude mariadb_source from group assignment and the "check others" validation until the CI change is merged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
((i++)) when i=0 returns exit code 1 (the expression evaluates to the pre-increment value, 0), which set -e treats as a failure. Use i=$((i + 1)) instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
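The pitfall and the fix can be reproduced with a short sketch. The function names count_to and postfix_increment_status are hypothetical; the second one runs the failing form in a child bash so the calling shell is unaffected.

```shell
# Safe increment: an assignment with arithmetic expansion exits 0 whatever
# the value of i, so it does not trip `set -e`.
count_to() {
	local limit="$1" i=0
	while [ "$i" -lt "$limit" ]; do
		i=$((i + 1))
	done
	echo "$i"
}

# Pitfall demo in a child bash: with i=0, ((i++)) evaluates to 0, the command
# returns status 1, and `set -e` ends the child before "reached" is printed.
# Prints the child's exit status.
postfix_increment_status() {
	local status=0
	bash -c 'set -e; i=0; ((i++)); echo reached' >/dev/null 2>&1 || status=$?
	echo "$status"
}
```

Here count_to 3 prints 3 and postfix_increment_status prints 1. The failure only occurs when the counter starts at 0, which is why it is easy to miss.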
What problem does this PR solve?
Issue Number: ref #12615
What is changed and how it works?
Enable DM integration tests to run on next-gen TiDB (Cloud Storage Engine edition) alongside classic TiDB. All 13 test groups (G00–G11 + TLS_GROUP) pass on next-gen CI.
… mariadb_source) … cluster_lib.sh

Infrastructure changes

- cluster_lib.sh (new): centralizes cluster lifecycle ops: cleanup_tidb_server, cleanup_downstream_cluster, run_tidb_server, run_downstream_cluster, run_downstream_cluster_with_tls
- run_downstream_cluster_nextgen (new): starts MinIO + PD + TiKV + tikv-worker + SYSTEM TiDB + user TiDB
- run_downstream_cluster_with_tls_nextgen (new): restarts user TiDB with client-facing TLS
- run_downstream_cluster_classic (renamed from run_downstream_cluster): classic PD + TiKV + TiDB
- env_variables: centralized next-gen vars (PD_ADDR, TIKV_WORKER_ADDR, KEYSPACE_NAME, MINIO_*, etc.) under NEXT_GEN=1 guard
- run_tidb_server: unified startup (unistore/tikv via PD_ADDR, next-gen keyspace config, TLS detection)
- cleanup_tidb_server: port-4000 targeted (preserves SYSTEM TiDB on 4001), removes temp-storage lock
- cleanup_process: sequential dm-master kill (maintains etcd quorum), SIGKILL for workers
- ha_cases_lib.sh: move print_debug_status from ha_cases2 (fix command-not-found in ha_cases3)
- tidb_ddl_enable_fast_reorg=0 / tidb_enable_dist_task=0 on next-gen (breaks DXF-based DDL)

Test adaptations for next-gen (by group)
Tests skipped on next-gen
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No. This only affects DM integration test infrastructure. No production code changes.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note