FEAT: Add CI perf baseline regression detection & insertmanyvalues benchmark #527
bewithgaurav wants to merge 1 commit into main from
Conversation
…n, CI artifact pipeline
- Add insertmanyvalues scenario (100K rows, 1000/batch, GH #500 repro)
- Add --baseline flag for 3-way comparison (main vs PR vs pyodbc)
- Add --json flag to save results for pipeline artifact consumption
- Publish benchmark results as CI artifact on main merge
- Download baseline artifact on PR runs for automatic regression detection
- Fix timing: time.time() -> time.perf_counter()
- Move connection setup outside timing window (measure execute+fetch only)
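The timing fix listed above matters because `time.time()` tracks wall-clock time (and can jump with clock adjustments), while `time.perf_counter()` is a monotonic high-resolution clock. A minimal sketch of the measurement pattern, with a placeholder workload standing in for the real execute+fetch path:

```python
import time

def time_scenario(run, iterations=5):
    """Time a zero-argument callable with a monotonic clock.

    Connection setup should happen before this is called, so only the
    hot path (e.g. execute+fetch) lands inside the timed window.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, immune to wall-clock changes
        run()
        samples.append(time.perf_counter() - start)
    return min(samples)  # best-of-N reduces scheduler noise

# Trivial stand-in workload, purely illustrative
best = time_scenario(lambda: sum(range(10_000)))
```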
📊 Code Coverage Report
Diff Coverage (main...HEAD, staged and unstaged changes): no lines with coverage information in this diff.
📋 Files Needing Attention
📉 Files with overall lowest coverage:
- mssql_python.pybind.logger_bridge.cpp: 59.2%
- mssql_python.pybind.ddbc_bindings.h: 67.8%
- mssql_python.row.py: 70.5%
- mssql_python.pybind.logger_bridge.hpp: 70.8%
- mssql_python.pybind.ddbc_bindings.cpp: 74.2%
- mssql_python.pybind.connection.connection.cpp: 75.8%
- mssql_python.__init__.py: 77.3%
- mssql_python.ddbc_bindings.py: 79.6%
- mssql_python.pybind.connection.connection_pool.cpp: 79.6%
- mssql_python.connection.py: 85.2%
Pull request overview
This PR enhances CI performance benchmarking by generating JSON benchmark outputs, adding optional baseline comparisons against main, and introducing an insertmanyvalues-style insert benchmark alongside existing fetch/query scenarios.
Changes:
- Update CI to run benchmarks/perf-benchmarking.py with --baseline ... --json ... and publish benchmark results as a pipeline artifact.
- Add logic to download a baseline artifact from the latest main run for PR comparisons.
- Extend the benchmark script with baseline/JSON support, updated reporting, and a new insertmanyvalues benchmark scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| eng/pipelines/pr-validation-pipeline.yml | Runs benchmarks with baseline/JSON flags and adds publish/download artifact steps intended to enable baseline comparisons in CI. |
| benchmarks/perf-benchmarking.py | Adds argparse-driven baseline + JSON I/O, improves output formatting, and adds an insertmanyvalues benchmark scenario. |
```diff
  Write-Host "`nRunning performance benchmarks..."
- python benchmarks/perf-benchmarking.py
+ python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json
```
The benchmark run is invoked with --baseline benchmark_baseline.json, but this pipeline only ever produces/publishes benchmark_results.json (and the download step would bring down that file, typically under an artifact directory). As written, the script will never find the baseline file, so baseline comparison/regression detection won’t run. Align the baseline path with the downloaded artifact’s actual filename/location (and avoid clobbering the current run’s benchmark_results.json output).
Suggested change:
```diff
- python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json
+ python benchmarks/perf-benchmarking.py --baseline "$(Pipeline.Workspace)/perf-baseline-$(sqlVersion)/benchmark_results.json" --json benchmark_results.json
```
```yaml
# Download baseline from latest main run (for PR comparison)
- task: DownloadPipelineArtifact@2
  inputs:
    source: specific
    project: $(System.TeamProjectId)
    pipeline: $(System.DefinitionId)
    runVersion: latestFromBranch
    runBranch: refs/heads/main
    artifact: 'perf-baseline-$(sqlVersion)'
    path: $(Build.SourcesDirectory)
  displayName: 'Download baseline from main'
  condition: and(ne(variables['Build.SourceBranch'], 'refs/heads/main'), or(eq(variables['sqlVersion'], 'SQL2022'), eq(variables['sqlVersion'], 'SQL2025')))
  continueOnError: true
```
The baseline artifact download happens after the benchmark script runs, so the --baseline ... file cannot exist at the time the script tries to read it. Move the DownloadPipelineArtifact step before the benchmark execution step (and ensure it downloads into the location the script expects).
```diff
  pip install pyodbc
  echo 'Running performance benchmarks on $(distroName)'
- python benchmarks/perf-benchmarking.py || echo 'Performance benchmark failed or database not available'
+ python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json || echo 'Performance benchmark failed or database not available'
```
This container benchmark invocation passes --baseline benchmark_baseline.json, but there is no preceding step that downloads/restores that baseline file into the container/workspace. Either add a baseline download + mount/copy into the container, or drop the --baseline flag here to avoid always falling back to the 2-column output.
Suggested change:
```diff
- python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json || echo 'Performance benchmark failed or database not available'
+ BASELINE_ARG=''
+ if [ -f /workspace/benchmark_baseline.json ]; then
+   echo 'Using benchmark baseline file'
+   BASELINE_ARG='--baseline benchmark_baseline.json'
+ else
+   echo 'No benchmark baseline file found; running without baseline'
+ fi
+ python benchmarks/perf-benchmarking.py $BASELINE_ARG --json benchmark_results.json || echo 'Performance benchmark failed or database not available'
```
```python
try:
    with open(path) as f:
        data = json.load(f)
    return data.get("baseline", {})
```
load_baseline() returns {} when the JSON file exists but has no baseline key, but print_results() treats any non-None value as “has baseline”. That can produce a misleading 3-column table full of N/A values. Consider returning None when the parsed baseline is missing/empty and/or changing has_baseline to bool(baseline).
Suggested change:
```diff
- return data.get("baseline", {})
+ baseline = data.get("baseline")
+ return baseline if baseline else None
```
```python
conn = conn_factory(conn_str)
cursor = conn.cursor()
cursor.execute(
    "IF OBJECT_ID('tempdb..#bench_insert') IS NOT NULL DROP TABLE #bench_insert; "
    "CREATE TABLE #bench_insert (id INT, val VARCHAR(100))"
)

start = time.perf_counter()
for flat_params in batches:
    cursor.execute(batch_sql, flat_params)
elapsed = time.perf_counter() - start

result.add_time(elapsed, INSERTMANY_ROWS)
cursor.close()
conn.close()
```
The insertmanyvalues benchmark measures only the looped INSERT executes and stops timing before any commit()/transaction finalization or connection close. Since mssql_python.connect() defaults to autocommit=False and the connection closes with an implicit rollback when autocommit is off, this can substantially underreport real insert cost and makes results harder to compare. Consider enabling autocommit for this scenario or including an explicit commit (and any required close/cleanup) inside the timed window for both drivers.
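One way to address this, as a hedged sketch: bring the commit inside the timed window. `batch_sql` and `batches` mirror the snippet above; the connection is assumed to follow DB-API 2.0 (`commit()` on the connection object), which both mssql_python and pyodbc broadly do.

```python
import time

def timed_batch_insert(conn, batch_sql, batches):
    """Time the batched INSERTs plus the commit that makes them durable.

    `conn` is an already-open DB-API 2.0 connection (setup stays outside
    the timed window); `batches` is a list of flattened parameter tuples.
    """
    cursor = conn.cursor()
    start = time.perf_counter()
    for flat_params in batches:
        cursor.execute(batch_sql, flat_params)
    conn.commit()  # include transaction finalization in the measurement
    elapsed = time.perf_counter() - start
    cursor.close()
    return elapsed
```

The alternative is enabling autocommit on both drivers so per-statement commit cost lands inside the loop; either way, both drivers must be measured under the same transaction semantics for the comparison to be fair.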
Work Item / Issue Reference
Summary
This pull request enhances the performance benchmarking process in the CI pipeline by adding support for baseline comparison and automated artifact publishing and retrieval. The main improvements are focused on making benchmark results reproducible and comparable across pull requests and main branch runs.
Performance benchmarking improvements:
- The benchmark script (benchmarks/perf-benchmarking.py) is now run with --baseline benchmark_baseline.json --json benchmark_results.json arguments, enabling baseline comparison and outputting results in JSON format. [1] [2]

Artifact management for benchmarks:
- Publishes the benchmark results (benchmark_results.json) as a pipeline artifact named perf-baseline-$(sqlVersion) when running on the main branch. This artifact serves as the baseline for future comparisons.
- Downloads the baseline artifact from the latest main branch run for pull request runs, enabling direct comparison of current benchmark results against the established baseline.
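The artifact round-trip described above can be sketched with a hypothetical results-file shape. The real schema of benchmark_results.json is not shown in this PR; the scenario names and numbers below are illustrative only:

```python
import json

# Main-branch run: current results are published and become the next baseline.
results = {
    "baseline": {  # scenario name -> seconds (illustrative values)
        "fetch_100k_rows": 1.84,
        "insertmanyvalues_100k": 3.21,
    }
}
with open("benchmark_results.json", "w") as f:
    json.dump(results, f, indent=2)

# PR run: the artifact downloaded from main is read back for comparison
# against the PR's freshly generated numbers.
with open("benchmark_results.json") as f:
    baseline = json.load(f).get("baseline")
```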