
SQLCompare

SQLCompare is a Python CLI and data diff tool for comparing SQL tables, SQL query results, CSV files, and Excel/XLSX files.

It is built for analytics engineers and data engineers who need to validate migrations, backfills, model rewrites, vendor file drops, and other data pipeline changes without relying on one-off SQL checks.

Use SQLCompare to compare datasets from PostgreSQL, Snowflake, Databricks, DuckDB, CSV, and XLSX sources with the same review workflow.


What SQLCompare Does

  • Compare SQL tables before and after a logic change
  • Compare SQL query results when tables are not materialized yet
  • Compare CSV and Excel files for local QA and vendor deliveries
  • Review row-level diffs with before/after values and missing-row detection
  • Run analytics regression testing with a repeatable saved diff_id

SQLCompare supports two complementary comparison modes:

  1. Row-by-row comparison with an ID: detect missing rows, identify changed columns, and inspect before/after values.
  2. Statistical comparison: compare null counts, distinct counts, and other aggregates to understand overall impact.
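The two modes can be sketched in plain Python on toy keyed rows. This is only a conceptual illustration of what each mode reports, not SQLCompare's implementation (the tool pushes this work into SQL in your database):

```python
# Conceptual sketch of SQLCompare's two comparison modes.
# NOT the tool's implementation; just an illustration of what each mode reports.

def row_diff(previous, current):
    """Row-by-row mode: detect missing rows and changed cells, keyed by ID."""
    missing_in_current = sorted(set(previous) - set(current))
    missing_in_previous = sorted(set(current) - set(previous))
    changed = {}
    for key in set(previous) & set(current):
        for col, before in previous[key].items():
            after = current[key].get(col)
            if before != after:
                changed.setdefault(key, {})[col] = (before, after)
    return missing_in_current, missing_in_previous, changed

def column_stats(rows, column):
    """Statistical mode: aggregate view of a single column."""
    values = [r[column] for r in rows.values()]
    return {
        "count": len(values),
        "nulls": sum(v is None for v in values),
        "distinct": len(set(values)),
    }

previous = {1: {"revenue": 10}, 2: {"revenue": 20}}
current = {1: {"revenue": 10}, 2: {"revenue": 25}, 3: {"revenue": None}}
```

Row-by-row answers "which rows and cells changed"; statistical answers "how big is the overall impact" without needing to inspect individual rows.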

Use Cases

  • Validate a dbt model rewrite by comparing the old and new SQL outputs on the same key.
  • Compare warehouse tables before deploying a migration or backfill.
  • Compare SQL query results when testing filters, joins, or business logic changes.
  • Diff vendor CSV or XLSX deliveries before loading them into your warehouse.
  • Run data validation checks for analytics regression testing and release QA.

Supported Connectors and File Types

Source type       | Examples                                  | Supported workflow
SQL tables        | PostgreSQL, Snowflake, Databricks, DuckDB | sqlcompare run table ...
SQL query results | Inline SQL or .sql files                  | sqlcompare run query ...
CSV files         | Local .csv datasets                       | sqlcompare run file ...
Excel files       | Local .xlsx datasets                      | sqlcompare run file ...
Dataset configs   | YAML definitions for SQL or file sources  | sqlcompare run dataset ...

Install

Recommended:

uv tool install sqlcompare
# or
uv tool install "sqlcompare[snowflake] @ git+https://github.com/luisggc/sqlcompare" --python 3.11

Install optional connector extras as needed:

uv tool install "sqlcompare[<connector>]"

Examples:

uv tool install "sqlcompare[snowflake]"
uv tool install "sqlcompare[databricks]"

Quick Start

Compare SQL Tables

Compare two tables on a key:

export SQLCOMPARE_CONN_DEFAULT="postgresql://<user>:<pass>@<host>/<db>"
sqlcompare run table analytics.fact_sales analytics.fact_sales_new id

That command prints a diff_id. Use it for follow-up analysis:

sqlcompare review stats <diff_id>
sqlcompare review diff <diff_id> --column revenue --limit 100
sqlcompare review missing <diff_id> --side current
sqlcompare review meta <diff_id>
sqlcompare review export <diff_id> --mode summary
sqlcompare review export <diff_id> --mode complete --output ./reports/full_diff.xlsx

Compare SQL Query Results

Use this when tables are not materialized yet or when you want to compare a filtered slice.

Inline SQL:

sqlcompare run query \
  --previous "SELECT * FROM analytics.orders WHERE order_date < '2024-01-01'" \
  --current "SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'" \
  --index order_id \
  -c snowflake_prod

SQL files:

sqlcompare run query --previous queries/previous.sql --current queries/current.sql --index order_id -c snowflake_prod

Compare CSV and Excel Files

Use the same diff workflow for local file validation, vendor drops, and ad hoc QA.

sqlcompare run file path/to/previous.csv path/to/current.xlsx id

Review Report Export (XLSX)

You can export review results as a multi-tab Excel report using review export.

Modes:

  • --mode summary (default): creates Overview + per-column tabs (top 200 rows each) + SQL Reference
  • --mode complete: the same tabs without the 200-row cap (subject only to XLSX row limits)

Examples:

# Save summary report in current directory with generated timestamped filename
sqlcompare review export <diff_id> --mode summary

# Save full report to a specific location
sqlcompare review export <diff_id> --mode complete --output ./reports/full_diff.xlsx

# Save a single-column summary report
sqlcompare review export <diff_id> --mode summary --column revenue --output ./reports/revenue_diff.xlsx

Notes:

  • --output is optional; if omitted, SQLCompare generates a readable timestamped filename.
  • review export is for the standard diff view and should not be combined with review stats, review missing, or review columns.

Analytics Regression Testing and Data Validation

SQLCompare is useful when row counts are not enough and manual joins are too fragile. Instead of writing ad hoc SQL every time you change a model or pipeline, you can save a diff once and review it repeatedly with the generated diff_id.

Common validation workflows:

  • release QA for transformed tables
  • migration and backfill verification
  • regression testing after SQL logic changes
  • warehouse-to-file or file-to-file comparison during onboarding

Why SQLCompare Instead of Manual SQL Diff Checks

Manual SQL joins, notebooks, and spreadsheets usually answer one question once. SQLCompare keeps the comparison reusable:

  • run a compare once, then inspect stats, missing rows, and changed values
  • use the same workflow across warehouse tables and local files
  • export Excel review reports for debugging and handoff
  • avoid rebuilding the same outer join logic for every validation task

review meta for AI-Friendly Metadata

Use review meta to get a JSON payload describing the queryable tables and ready-to-run SQL templates for a given diff_id. This is especially useful for AI agents that need structured context before running analysis queries.

sqlcompare review meta <diff_id>

Output (JSON):

  • diff_id, resolved_diff_id, connection
  • index_columns, common_columns
  • tables: previous/current/join table names and their columns
  • queries: SQL templates for row-only checks and per-column diffs

Examples

See examples/ for datasets, commands, and captured outputs.


Core Idea: Compare Once, Analyze Many Times

SQLCompare does two things:

  1. Compare two datasets using an index (single or composite key)
  2. Persist comparison results and return a diff_id you can use to:
     • get overall stats
     • drill into a specific column’s changes
     • list missing rows (previous-only / current-only)
     • pull samples for debugging

Usage by Use Case

1) Compare SQL tables

Best for production validation and regression checks across supported connectors.

sqlcompare run table analytics.users analytics.users_new user_id,tenant_id

Why it’s useful:

  • Handles composite keys
  • Produces a saved diff ID for repeatable review

2) Compare SQL query results

Use this when tables are not materialized yet or you want a filtered slice.

Inline SQL:

sqlcompare run query \
  --previous "SELECT * FROM analytics.orders WHERE order_date < '2024-01-01'" \
  --current "SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'" \
  --index order_id \
  -c snowflake_prod

SQL files:

sqlcompare run query --previous queries/previous.sql --current queries/current.sql --index order_id -c snowflake_prod

Or create a dataset config:

previous:
  select_sql: "SELECT * FROM analytics.orders WHERE order_date < '2024-01-01'"
  index:
    - ORDER_ID

new:
  select_sql: "SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'"
  index:
    - ORDER_ID

Run the compare:

sqlcompare run dataset path/to/dataset.yaml

Why it’s useful:

  • Compare slices without touching production tables
  • Keep an auditable diff_id

3) Compare CSV and Excel files (DuckDB)

Great for ad hoc QA, one-off deliveries, or vendor drops. SQLCompare uses DuckDB under the hood — no DB server required.

Create a dataset config (supports {{here}} for relative paths):

previous:
  file_name: "{{here}}/previous.csv"
  index:
    - id

new:
  file_name: "{{here}}/current.xlsx"
  index:
    - id

Run the compare using the dataset config, or pass the files directly:

sqlcompare run dataset path/to/dataset.yaml
sqlcompare run file path/to/previous.csv path/to/current.xlsx id

Why it’s useful:

  • Same diff workflow as warehouses
  • Fast local comparisons, zero infra
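The `{{here}}` placeholder in the config above presumably resolves relative to the dataset YAML's own directory; a minimal sketch of that substitution, under that assumption (verify against your SQLCompare version):

```python
# Sketch of {{here}} substitution, ASSUMING it expands to the directory
# containing the dataset YAML file. Paths here are illustrative.
from pathlib import Path

def resolve_here(value: str, config_path: str) -> str:
    """Replace the {{here}} placeholder with the config file's directory."""
    return value.replace("{{here}}", str(Path(config_path).parent))

resolved = resolve_here("{{here}}/previous.csv", "/data/configs/dataset.yaml")
```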

4) Compare tables inside a DuckDB file

Use this when your data lives in a local .duckdb file.

export SQLCOMPARE_CONN_LOCAL="duckdb:////absolute/path/to/warehouse.duckdb"
sqlcompare run table raw.customers staged.customers id -c local

Why it’s useful:

  • Local + fast, without Snowflake/warehouse costs
  • Easy to integrate into lightweight pipelines

Configuration

SQLCompare resolves connectors in this order:

  1. default connector
  2. direct URL
  3. environment variables
  4. YAML files

Environment variables

  • SQLCOMPARE_CONN_DEFAULT — default connector URL (used when -c is omitted)
  • SQLCOMPARE_CONN_{NAME} — SQLAlchemy URL for a named connector
  • SQLCOMPARE_COMPARISON_SCHEMA — schema for comparison tables (default: sqlcompare)

Example:

export SQLCOMPARE_CONN_DEFAULT="postgresql://..."
export SQLCOMPARE_CONN_LOCAL="duckdb:////abs/path/to/db.duckdb"

YAML connections file (optional)

Location:

~/.sqlcompare/connections.yml

Example:

snowflake:
  drivername: snowflake
  username: my_user
  password: my_password
  host: my_account
  database: ANALYTICS
  schema: PUBLIC
  query:
    warehouse: COMPUTE_WH

Operational notes

  • SQLCompare creates a physical join table in the comparison schema. Ensure your connector has CREATE SCHEMA and CREATE TABLE privileges.
  • Table names are passed through directly to SQL. Provide fully-qualified or quoted names when required by your database.
  • Index columns must exist in both datasets; otherwise the command fails with a clear error.
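The join table mentioned above is, conceptually, a full outer join of the two datasets on the index columns. A hedged sketch of the kind of SQL involved (this is not the SQL SQLCompare actually generates; table and alias names are illustrative):

```python
# Conceptual sketch of the outer-join comparison that SQLCompare
# automates. NOT the tool's actual generated SQL.

def join_sql(previous: str, current: str, index_columns: list[str]) -> str:
    """Build an illustrative full outer join over the index columns."""
    on = " AND ".join(f"p.{c} = c.{c}" for c in index_columns)
    return (
        f"SELECT p.*, c.* FROM {previous} AS p "
        f"FULL OUTER JOIN {current} AS c ON {on}"
    )

sql = join_sql("analytics.users", "analytics.users_new", ["user_id", "tenant_id"])
```

Rows present on only one side surface as NULL keys on the other side of the join, which is how previous-only / current-only rows are detected; this is exactly the boilerplate the saved diff_id lets you stop rewriting by hand.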

Development

Run tests:

uv run pytest
