
FEAT: Add CI perf baseline regression detection & insertmanyvalues benchmark#527

Open
bewithgaurav wants to merge 1 commit into main from bewithgaurav/insertmany-benchmark

Conversation

@bewithgaurav
Collaborator

@bewithgaurav bewithgaurav commented Apr 17, 2026

Work Item / Issue Reference

AB#44074


Summary

This pull request enhances the performance benchmarking process in the CI pipeline by adding support for baseline comparison and automated artifact publishing and retrieval. The main improvements focus on making benchmark results reproducible and comparable across pull-request and main-branch runs.

Performance benchmarking improvements:

  • The benchmark script (benchmarks/perf-benchmarking.py) is now run with --baseline benchmark_baseline.json --json benchmark_results.json, enabling baseline comparison and emitting results in JSON format.
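The two flags described above can be wired up with standard argparse. This is an illustrative sketch, not the PR's actual code; the flag names match the PR, but the function name and help strings are assumptions:

```python
import argparse


def parse_args(argv=None):
    """Parse the benchmark CLI flags described in this PR (sketch)."""
    p = argparse.ArgumentParser(description="Run driver perf benchmarks")
    p.add_argument("--baseline", metavar="PATH",
                   help="baseline JSON from a previous main-branch run to compare against")
    p.add_argument("--json", dest="json_out", metavar="PATH",
                   help="write this run's results as JSON for artifact publishing")
    return p.parse_args(argv)


# Example: the invocation used in the pipeline
args = parse_args(["--baseline", "benchmark_baseline.json",
                   "--json", "benchmark_results.json"])
```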

Artifact management for benchmarks:

  • Added a step to publish the current benchmark results (benchmark_results.json) as a pipeline artifact named perf-baseline-$(sqlVersion) when running on the main branch. This artifact serves as the baseline for future comparisons.
  • Added a step to download the latest baseline artifact from the main branch for pull request runs, enabling direct comparison of current benchmark results against the established baseline.
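The publish side could look roughly like the following Azure Pipelines step. This is a hedged sketch inferred from the download step quoted later in the thread, not the PR's actual YAML; the path and condition are assumptions:

```yaml
# Illustrative shape of the publish step (names mirror the download step below)
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Build.SourcesDirectory)/benchmark_results.json'
    artifact: 'perf-baseline-$(sqlVersion)'
  displayName: 'Publish perf baseline'
  condition: eq(variables['Build.SourceBranch'], 'refs/heads/main')
```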

…n, CI artifact pipeline

- Add insertmanyvalues scenario (100K rows, 1000/batch, GH #500 repro)
- Add --baseline flag for 3-way comparison (main vs PR vs pyodbc)
- Add --json flag to save results for pipeline artifact consumption
- Publish benchmark results as CI artifact on main merge
- Download baseline artifact on PR runs for automatic regression detection
- Fix timing: time.time() -> time.perf_counter()
- Move connection setup outside timing window (measure execute+fetch only)
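The batched-insert timing described in these bullets can be sketched as below. This is a minimal illustration, assuming a DB-API cursor and a temp table named #bench_insert; helper and variable names are hypothetical, not the PR's code:

```python
import time


def run_insertmany(cursor, total_rows=100_000, batch_size=1000):
    """Time multi-row VALUES inserts; setup stays outside the timed window."""
    rows = [(i, f"val-{i}") for i in range(total_rows)]
    placeholders = ", ".join(["(?, ?)"] * batch_size)
    batch_sql = f"INSERT INTO #bench_insert (id, val) VALUES {placeholders}"
    # Flatten each batch's row tuples into one parameter list per execute()
    batches = [
        [p for row in rows[i:i + batch_size] for p in row]
        for i in range(0, total_rows, batch_size)
    ]
    start = time.perf_counter()  # monotonic clock, unlike time.time()
    for flat_params in batches:
        cursor.execute(batch_sql, flat_params)
    return time.perf_counter() - start
```

Using perf_counter() avoids the wall-clock adjustments that can skew time.time() over short intervals.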
@github-actions github-actions bot added the pr-size: large Substantial code update label Apr 17, 2026
Comment thread benchmarks/perf-benchmarking.py Dismissed
@github-actions

📊 Code Coverage Report

🔥 Diff Coverage

100%


🎯 Overall Coverage

79%


📈 Total Lines Covered: 6677 out of 8449
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

No lines with coverage information in this diff.


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.8%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.2%
mssql_python.pybind.connection.connection.cpp: 75.8%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.2%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

@bewithgaurav bewithgaurav marked this pull request as ready for review April 17, 2026 10:45
Copilot AI review requested due to automatic review settings April 17, 2026 10:45
Contributor

Copilot AI left a comment


Pull request overview

This PR enhances CI performance benchmarking by generating JSON benchmark outputs, adding optional baseline comparisons against main, and introducing an insertmanyvalues-style insert benchmark alongside existing fetch/query scenarios.

Changes:

  • Update CI to run benchmarks/perf-benchmarking.py with --baseline ... --json ... and publish benchmark results as a pipeline artifact.
  • Add logic to download a baseline artifact from the latest main run for PR comparisons.
  • Extend the benchmark script with baseline/JSON support, updated reporting, and a new insertmanyvalues benchmark scenario.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File: eng/pipelines/pr-validation-pipeline.yml
  Runs benchmarks with baseline/JSON flags and adds publish/download artifact steps intended to enable baseline comparisons in CI.
File: benchmarks/perf-benchmarking.py
  Adds argparse-driven baseline + JSON I/O, improves output formatting, and adds an insertmanyvalues benchmark scenario.



Write-Host "`nRunning performance benchmarks..."
python benchmarks/perf-benchmarking.py
python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json

Copilot AI Apr 17, 2026


The benchmark run is invoked with --baseline benchmark_baseline.json, but this pipeline only ever produces/publishes benchmark_results.json (and the download step would bring down that file, typically under an artifact directory). As written, the script will never find the baseline file, so baseline comparison/regression detection won’t run. Align the baseline path with the downloaded artifact’s actual filename/location (and avoid clobbering the current run’s benchmark_results.json output).

Suggested change
python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json
python benchmarks/perf-benchmarking.py --baseline "$(Pipeline.Workspace)/perf-baseline-$(sqlVersion)/benchmark_results.json" --json benchmark_results.json

Comment on lines +402 to +415
# Download baseline from latest main run (for PR comparison)
- task: DownloadPipelineArtifact@2
  inputs:
    source: specific
    project: $(System.TeamProjectId)
    pipeline: $(System.DefinitionId)
    runVersion: latestFromBranch
    runBranch: refs/heads/main
    artifact: 'perf-baseline-$(sqlVersion)'
    path: $(Build.SourcesDirectory)
  displayName: 'Download baseline from main'
  condition: and(ne(variables['Build.SourceBranch'], 'refs/heads/main'), or(eq(variables['sqlVersion'], 'SQL2022'), eq(variables['sqlVersion'], 'SQL2025')))
  continueOnError: true


Copilot AI Apr 17, 2026


The baseline artifact download happens after the benchmark script runs, so the --baseline ... file cannot exist at the time the script tries to read it. Move the DownloadPipelineArtifact step before the benchmark execution step (and ensure it downloads into the location the script expects).

Suggested change
# Download baseline from latest main run (for PR comparison)
- task: DownloadPipelineArtifact@2
inputs:
source: specific
project: $(System.TeamProjectId)
pipeline: $(System.DefinitionId)
runVersion: latestFromBranch
runBranch: refs/heads/main
artifact: 'perf-baseline-$(sqlVersion)'
path: $(Build.SourcesDirectory)
displayName: 'Download baseline from main'
condition: and(ne(variables['Build.SourceBranch'], 'refs/heads/main'), or(eq(variables['sqlVersion'], 'SQL2022'), eq(variables['sqlVersion'], 'SQL2025')))
continueOnError: true

pip install pyodbc
echo 'Running performance benchmarks on $(distroName)'
python benchmarks/perf-benchmarking.py || echo 'Performance benchmark failed or database not available'
python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json || echo 'Performance benchmark failed or database not available'

Copilot AI Apr 17, 2026


This container benchmark invocation passes --baseline benchmark_baseline.json, but there is no preceding step that downloads/restores that baseline file into the container/workspace. Either add a baseline download + mount/copy into the container, or drop the --baseline flag here to avoid always falling back to the 2-column output.

Suggested change
python benchmarks/perf-benchmarking.py --baseline benchmark_baseline.json --json benchmark_results.json || echo 'Performance benchmark failed or database not available'
BASELINE_ARG=''
if [ -f /workspace/benchmark_baseline.json ]; then
  echo 'Using benchmark baseline file'
  BASELINE_ARG='--baseline benchmark_baseline.json'
else
  echo 'No benchmark baseline file found; running without baseline'
fi
python benchmarks/perf-benchmarking.py $BASELINE_ARG --json benchmark_results.json || echo 'Performance benchmark failed or database not available'

try:
    with open(path) as f:
        data = json.load(f)
    return data.get("baseline", {})

Copilot AI Apr 17, 2026


load_baseline() returns {} when the JSON file exists but has no baseline key, but print_results() treats any non-None value as “has baseline”. That can produce a misleading 3-column table full of N/A values. Consider returning None when the parsed baseline is missing/empty and/or changing has_baseline to bool(baseline).

Suggested change
return data.get("baseline", {})
baseline = data.get("baseline")
return baseline if baseline else None
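Folding that suggestion into a full loader might look like the sketch below. This is an illustration of the reviewer's point, not the script's actual implementation; the error handling is an assumption:

```python
import json
import os


def load_baseline(path):
    """Return the baseline dict, or None when the file is missing,
    unreadable, or has no (non-empty) "baseline" key."""
    if not os.path.exists(path):
        return None
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    baseline = data.get("baseline")
    return baseline if baseline else None  # empty dict also maps to None
```

Callers can then use `if baseline:` (or `baseline is not None`) to decide between the 2-column and 3-column report, avoiding a table full of N/A values.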

Comment on lines +161 to +175
conn = conn_factory(conn_str)
cursor = conn.cursor()
cursor.execute(
    "IF OBJECT_ID('tempdb..#bench_insert') IS NOT NULL DROP TABLE #bench_insert; "
    "CREATE TABLE #bench_insert (id INT, val VARCHAR(100))"
)

start = time.perf_counter()
for flat_params in batches:
    cursor.execute(batch_sql, flat_params)
elapsed = time.perf_counter() - start

result.add_time(elapsed, INSERTMANY_ROWS)
cursor.close()
conn.close()

Copilot AI Apr 17, 2026


The insertmanyvalues benchmark measures only the looped INSERT executes and stops timing before any commit()/transaction finalization or connection close. Since mssql_python.connect() defaults to autocommit=False and the connection closes with an implicit rollback when autocommit is off, this can substantially underreport real insert cost and makes results harder to compare. Consider enabling autocommit for this scenario or including an explicit commit (and any required close/cleanup) inside the timed window for both drivers.
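One way to address this, sketched below under the assumption of DB-API style `commit()`/`autocommit` semantics (function and variable names are illustrative, not the PR's code), is to include transaction finalization inside the timed window:

```python
import time


def timed_insert_with_commit(conn, cursor, batch_sql, batches, autocommit=False):
    """Measure the inserts plus transaction finalization, so drivers with
    autocommit off are not underreported (sketch of the suggested fix)."""
    start = time.perf_counter()
    for flat_params in batches:
        cursor.execute(batch_sql, flat_params)
    if not autocommit:
        conn.commit()  # count the commit cost in the measured elapsed time
    return time.perf_counter() - start
```

Applying the same rule to both drivers keeps the comparison apples-to-apples regardless of each driver's default autocommit setting.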


Labels

pr-size: large Substantial code update


3 participants