SQLCompare is a Python CLI and data diff tool for comparing SQL tables, SQL query results, CSV files, and Excel/XLSX files.
It is built for analytics engineers and data engineers who need to validate migrations, backfills, model rewrites, vendor file drops, and other data pipeline changes without relying on one-off SQL checks.
Use SQLCompare to compare datasets from PostgreSQL, Snowflake, Databricks, DuckDB, CSV, and XLSX sources with the same review workflow.
- Compare SQL tables before and after a logic change
- Compare SQL query results when tables are not materialized yet
- Compare CSV and Excel files for local QA and vendor deliveries
- Review row-level diffs with before/after values and missing-row detection
- Run analytics regression testing with a repeatable saved diff_id
SQLCompare supports two complementary comparison modes:
- Row-by-row comparison with an ID: detect missing rows, identify changed columns, and inspect before/after values.
- Statistical comparison: compare null counts, distinct counts, and other aggregates to understand overall impact.
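As a mental model, the two modes above can be sketched in plain Python over two keyed row sets (an illustration of the concepts, not SQLCompare's implementation; the sample rows are invented):

```python
# Illustrative sketch of the two comparison modes (not SQLCompare internals).
prev = {1: {"status": "open",   "revenue": 10.0},
        2: {"status": "open",   "revenue": None},
        3: {"status": "closed", "revenue": 30.0}}
curr = {1: {"status": "open",   "revenue": 10.0},
        2: {"status": "closed", "revenue": None},
        4: {"status": "open",   "revenue": 40.0}}

# Mode 1: row-by-row on an ID -- missing rows plus before/after values.
missing_in_current  = sorted(prev.keys() - curr.keys())   # rows dropped
missing_in_previous = sorted(curr.keys() - prev.keys())   # rows added
changed = {k: {c: (prev[k][c], curr[k][c])
               for c in prev[k] if prev[k][c] != curr[k][c]}
           for k in prev.keys() & curr.keys()
           if prev[k] != curr[k]}

# Mode 2: statistical -- per-side aggregates, no row matching needed.
def stats(rows, col):
    vals = [r[col] for r in rows.values()]
    return {"nulls": sum(v is None for v in vals),
            "distinct": len({v for v in vals if v is not None})}

rev_prev, rev_curr = stats(prev, "revenue"), stats(curr, "revenue")
```

Row-by-row answers "which rows changed and how"; the statistical mode answers "did the overall shape of the data shift" even when no shared key exists.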
- Validate a dbt model rewrite by comparing the old and new SQL outputs on the same key.
- Compare warehouse tables before deploying a migration or backfill.
- Compare SQL query results when testing filters, joins, or business logic changes.
- Diff vendor CSV or XLSX deliveries before loading them into your warehouse.
- Run data validation checks for analytics regression testing and release QA.
| Source type | Examples | Supported workflow |
|---|---|---|
| SQL tables | PostgreSQL, Snowflake, Databricks, DuckDB | sqlcompare run table ... |
| SQL query results | Inline SQL or .sql files | sqlcompare run query ... |
| CSV files | Local .csv datasets | sqlcompare run file ... |
| Excel files | Local .xlsx datasets | sqlcompare run file ... |
| Dataset configs | YAML definitions for SQL or file sources | sqlcompare run dataset ... |
Recommended:

```shell
uv tool install sqlcompare
# or
uv tool install "sqlcompare[snowflake] @ git+https://github.com/luisggc/sqlcompare" --python 3.11
```

Install optional connector extras as needed:

```shell
uv tool install "sqlcompare[<connector>]"
```

Examples:

```shell
uv tool install "sqlcompare[snowflake]"
uv tool install "sqlcompare[databricks]"
```

Compare two tables on a key:
```shell
export SQLCOMPARE_CONN_DEFAULT="postgresql://<user>:<pass>@<host>/<db>"
sqlcompare run table analytics.fact_sales analytics.fact_sales_new id
```

That command prints a diff_id. Use it for follow-up analysis:
```shell
sqlcompare review stats <diff_id>
sqlcompare review diff <diff_id> --column revenue --limit 100
sqlcompare review missing <diff_id> --side current
sqlcompare review meta <diff_id>
sqlcompare review export <diff_id> --mode summary
sqlcompare review export <diff_id> --mode complete --output ./reports/full_diff.xlsx
```

Use this when tables are not materialized yet or when you want to compare a filtered slice.
Inline SQL:

```shell
sqlcompare run query \
  --previous "SELECT * FROM analytics.orders WHERE order_date < '2024-01-01'" \
  --current "SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'" \
  --index order_id \
  -c snowflake_prod
```

SQL files:

```shell
sqlcompare run query --previous queries/previous.sql --current queries/current.sql --index order_id -c snowflake_prod
```

Use the same diff workflow for local file validation, vendor drops, and ad hoc QA.
```shell
sqlcompare run file path/to/previous.csv path/to/current.xlsx id
```

You can export review results as a multi-tab Excel report using `review export`.
Modes:
- `--mode summary` (default): creates `Overview` + per-column tabs (top 200 rows each) + `SQL Reference`
- `--mode complete`: same tabs, but without the 200-row cap (limited only by XLSX limits)
Examples:

```shell
# Save summary report in current directory with generated timestamped filename
sqlcompare review export <diff_id> --mode summary

# Save full report to a specific location
sqlcompare review export <diff_id> --mode complete --output ./reports/full_diff.xlsx

# Save a single-column summary report
sqlcompare review export <diff_id> --mode summary --column revenue --output ./reports/revenue_diff.xlsx
```

Notes:
- `--output` is optional; if omitted, SQLCompare generates a readable timestamped filename.
- `review export` is for the standard diff view and should not be combined with `review stats`, `review missing`, or `review columns`.
SQLCompare is useful when row counts are not enough and manual joins are too fragile. Instead of writing ad hoc SQL every time you change a model or pipeline, you can save a diff once and review it repeatedly with the generated diff_id.
Common validation workflows:
- release QA for transformed tables
- migration and backfill verification
- regression testing after SQL logic changes
- warehouse-to-file or file-to-file comparison during onboarding
Manual SQL joins, notebooks, and spreadsheets usually answer one question once. SQLCompare keeps the comparison reusable:
- run a compare once, then inspect stats, missing rows, and changed values
- use the same workflow across warehouse tables and local files
- export Excel review reports for debugging and handoff
- avoid rebuilding the same outer join logic for every validation task
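For contrast, this is the kind of hand-rolled outer-join check the tool replaces, sketched here with the stdlib `sqlite3` module (FULL OUTER JOIN is emulated with LEFT JOIN + UNION ALL for portability; table names and data are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE prev(id INTEGER PRIMARY KEY, revenue REAL);
    CREATE TABLE curr(id INTEGER PRIMARY KEY, revenue REAL);
    INSERT INTO prev VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO curr VALUES (1, 10.0), (2, 25.0), (4, 40.0);
""")

# The boilerplate outer join you end up rewriting for every validation task:
rows = con.execute("""
    SELECT p.id, c.id, p.revenue, c.revenue
    FROM prev p LEFT JOIN curr c ON p.id = c.id
    UNION ALL
    SELECT p.id, c.id, p.revenue, c.revenue
    FROM curr c LEFT JOIN prev p ON p.id = c.id
    WHERE p.id IS NULL
""").fetchall()

changed   = [r for r in rows if r[0] is not None and r[1] is not None and r[2] != r[3]]
prev_only = sorted(r[0] for r in rows if r[1] is None)
curr_only = sorted(r[1] for r in rows if r[0] is None)
```

SQLCompare persists the equivalent of this join once per comparison so the follow-up questions (stats, missing rows, per-column diffs) do not require rebuilding it.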
Use review meta to get a JSON payload describing the queryable tables and ready-to-run SQL templates for a given diff_id. This is especially useful for AI agents that need structured context before running analysis queries.
```shell
sqlcompare review meta <diff_id>
```

Output (JSON):
- `diff_id`, `resolved_diff_id`, `connection`
- `index_columns`, `common_columns`
- `tables`: previous/current/join table names and their columns
- `queries`: SQL templates for row-only checks and per-column diffs
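A consumer (a script or an agent) might parse that payload as sketched below. The field values are invented to mirror the documented field names; only the key names come from the output description above:

```python
import json

# Invented sample payload mirroring the documented fields (values are placeholders).
payload = json.loads("""
{
  "diff_id": "20240101_120000_ab12",
  "resolved_diff_id": "20240101_120000_ab12",
  "connection": "default",
  "index_columns": ["order_id"],
  "common_columns": ["order_id", "revenue"],
  "tables": {"previous": "sqlcompare.prev_ab12", "current": "sqlcompare.curr_ab12"},
  "queries": {"column_diff": "SELECT ... FROM {join_table} WHERE ..."}
}
""")

# Pull out the pieces needed before running analysis queries.
key_cols = payload["index_columns"]
template = payload["queries"]["column_diff"]
```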
See examples/ for datasets, commands, and captured outputs.
SQLCompare does two things:
- Compare two datasets using an index (single or composite key)
- Persist comparison results and return a diff_id you can use to:
- get overall stats
- drill into a specific column’s changes
- list missing rows (previous-only / current-only)
- pull samples for debugging
Best for production validation and regression checks across supported connectors.
```shell
sqlcompare run table analytics.users analytics.users_new user_id,tenant_id
```

Why it’s useful:
- Handles composite keys
- Produces a saved diff ID for repeatable review
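Conceptually, the composite key (user_id,tenant_id above) acts like a tuple index: two rows match only when every key column matches. A minimal sketch with invented data:

```python
# Composite-key matching sketched with tuples (illustrative, not SQLCompare internals).
index_cols = ("user_id", "tenant_id")

prev_rows = [{"user_id": 1, "tenant_id": "a", "plan": "free"},
             {"user_id": 1, "tenant_id": "b", "plan": "pro"}]
curr_rows = [{"user_id": 1, "tenant_id": "a", "plan": "team"},
             {"user_id": 2, "tenant_id": "a", "plan": "pro"}]

def by_key(rows):
    # One dict entry per (user_id, tenant_id) pair.
    return {tuple(r[c] for c in index_cols): r for r in rows}

prev_idx, curr_idx = by_key(prev_rows), by_key(curr_rows)
matched = sorted(prev_idx.keys() & curr_idx.keys())  # rows present on both sides
```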
Use this when tables are not materialized yet or you want a filtered slice.
Inline SQL:

```shell
sqlcompare run query \
  --previous "SELECT * FROM analytics.orders WHERE order_date < '2024-01-01'" \
  --current "SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'" \
  --index order_id \
  -c snowflake_prod
```

SQL files:

```shell
sqlcompare run query --previous queries/previous.sql --current queries/current.sql --index order_id -c snowflake_prod
```

Or create a dataset config:
```yaml
previous:
  select_sql: "SELECT * FROM analytics.orders WHERE order_date < '2024-01-01'"
  index:
    - ORDER_ID
new:
  select_sql: "SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'"
  index:
    - ORDER_ID
```

Run the compare:

```shell
sqlcompare run dataset path/to/dataset.yaml
```

Why it’s useful:
- Compare slices without touching production tables
- Keep an auditable diff_id
Great for ad hoc QA, one-off deliveries, or vendor drops. SQLCompare uses DuckDB under the hood — no DB server required.
Create a dataset config (supports {{here}} for relative paths):
```yaml
previous:
  file_name: "{{here}}/previous.csv"
  index:
    - id
new:
  file_name: "{{here}}/current.xlsx"
  index:
    - id
```

Set a local default connector and run:

```shell
sqlcompare run file path/to/previous.csv path/to/current.xlsx id
```

Why it’s useful:
- Same diff workflow as warehouses
- Fast local comparisons, zero infra
Use this when your data lives in a local .duckdb file.
```shell
export SQLCOMPARE_CONN_LOCAL="duckdb:////absolute/path/to/warehouse.duckdb"
sqlcompare run table raw.customers staged.customers id -c local
```

Why it’s useful:
- Local + fast, without Snowflake/warehouse costs
- Easy to integrate into lightweight pipelines
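The four slashes in the connection URL above are not a typo: `duckdb:///` is the SQLAlchemy scheme prefix, and the fourth slash is the start of the absolute file path. A quick sketch (the path is a placeholder):

```python
# Building a DuckDB SQLAlchemy URL from an absolute path (placeholder path).
db_path = "/absolute/path/to/warehouse.duckdb"
conn_url = f"duckdb:///{db_path}"
# -> "duckdb:////absolute/path/to/warehouse.duckdb"
```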
SQLCompare resolves connectors in this order:
- default connector
- direct URL
- environment variables
- YAML files
- `SQLCOMPARE_CONN_DEFAULT` — default connector URL (used when `-c` is omitted)
- `SQLCOMPARE_CONN_{NAME}` — SQLAlchemy URL for a named connector
- `SQLCOMPARE_COMPARISON_SCHEMA` — schema for comparison tables (default: `sqlcompare`)
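The naming convention for named connectors can be sketched like this (an illustration of the convention, not SQLCompare's actual resolution code):

```python
import os

def resolve_connector_url(name=None):
    """Return the SQLAlchemy URL for a named connector, else the default.

    Illustrative only: named connectors map to SQLCOMPARE_CONN_{NAME},
    and omitting -c falls back to SQLCOMPARE_CONN_DEFAULT.
    """
    if name:
        return os.environ.get(f"SQLCOMPARE_CONN_{name.upper()}")
    return os.environ.get("SQLCOMPARE_CONN_DEFAULT")
```

Under this convention, `-c local` would read `SQLCOMPARE_CONN_LOCAL`.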
Example:
```shell
export SQLCOMPARE_CONN_DEFAULT="postgresql://..."
export SQLCOMPARE_CONN_LOCAL="duckdb:////abs/path/to/db.duckdb"
```

Location:
`~/.sqlcompare/connections.yml`
Example:
```yaml
snowflake:
  drivername: snowflake
  username: my_user
  password: my_password
  host: my_account
  database: ANALYTICS
  schema: PUBLIC
  query:
    warehouse: COMPUTE_WH
```

- SQLCompare creates a physical join table in the comparison schema. Ensure your connector has `CREATE SCHEMA` and `CREATE TABLE` privileges.
- Table names are passed through directly to SQL. Provide fully-qualified or quoted names when required by your database.
- Index columns must exist in both datasets; otherwise the command fails with a clear error.
Run tests:

```shell
uv run pytest
```