**README.md**: +94 -18 (94 additions, 18 deletions)
```diff
@@ -1,20 +1,43 @@
 # SQLCompare
 
-SQLCompare helps you understand how a change impacted your data.
-When you modify logic, filters, or inputs, it lets you compare the previous and current versions of a dataset—whether they come from tables, SQL queries, or files.
+SQLCompare is a Python CLI and data diff tool for comparing SQL tables, SQL query results, CSV files, and Excel/XLSX files.
 
-You can compare datasets in two complementary ways:
+It is built for analytics engineers and data engineers who need to validate migrations, backfills, model rewrites, vendor file drops, and other data pipeline changes without relying on one-off SQL checks.
 
-1) Row-by-row comparison (with an ID): detect missing rows on either side, identify the columns with the most changes, and review before/after values for any record.
-2) Statistical comparison: compare column-level statistics such as null counts, distinct counts, and other aggregates to quickly understand overall impact.
+Use SQLCompare to compare datasets from PostgreSQL, Snowflake, Databricks, DuckDB, CSV, and XLSX sources with the same review workflow.
 
 ---
 
-## What you get
+## What SQLCompare Does
 
-- **Repeatable checks** for releases, backfills, migrations, and vendor drops
```
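The row-by-row mode described in the removed introduction (match rows by an ID, report rows missing on either side, rank the columns with the most changes, and show before/after values) amounts to a full outer join on the ID column. The sketch below illustrates that idea in plain Python on in-memory rows. It is not SQLCompare's implementation; `diff_by_id` and its return shape are hypothetical.

```python
# Hypothetical illustration of a keyed (row-by-row) diff; not SQLCompare's API.
from typing import Any, Dict, Mapping, Tuple

Row = Mapping[str, Any]


def diff_by_id(previous: Dict[Any, Row], current: Dict[Any, Row]):
    """Compare two datasets whose rows are indexed by a unique ID.

    Returns (IDs only in previous, IDs only in current,
    [(column, changed_row_count), ...] most-changed first,
    {id: {column: (before, after)}} for rows whose values differ).
    """
    # IDs found on one side only: the unmatched rows of a full outer join.
    only_previous = sorted(previous.keys() - current.keys())
    only_current = sorted(current.keys() - previous.keys())

    change_counts: Dict[str, int] = {}
    changes: Dict[Any, Dict[str, Tuple[Any, Any]]] = {}
    for key in sorted(previous.keys() & current.keys()):
        before, after = previous[key], current[key]
        # A column missing from one side compares as None.
        for column in sorted(before.keys() | after.keys()):
            old, new = before.get(column), after.get(column)
            if old != new:
                changes.setdefault(key, {})[column] = (old, new)
                change_counts[column] = change_counts.get(column, 0) + 1

    # Rank columns by how many rows changed; ties are broken by column name.
    ranked = sorted(change_counts.items(), key=lambda item: (-item[1], item[0]))
    return only_previous, only_current, ranked, changes


previous = {
    1: {"status": "open", "amount": 10},
    2: {"status": "open", "amount": 20},
    3: {"status": "closed", "amount": 30},
}
current = {
    1: {"status": "open", "amount": 10},
    2: {"status": "closed", "amount": 25},
    4: {"status": "open", "amount": 40},
}
only_previous, only_current, ranked, changes = diff_by_id(previous, current)
print(only_previous, only_current)  # [3] [4]
print(ranked)                       # [('amount', 1), ('status', 1)]
print(changes)                      # {2: {'amount': (20, 25), 'status': ('open', 'closed')}}
```

Sorting IDs and column names keeps the output deterministic, which assumes the IDs on each side are mutually comparable (for example, all integers or all strings).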
````diff
 Use the same diff workflow for local file validation, vendor drops, and ad hoc QA.
+
+```bash
+sqlcompare run file path/to/previous.csv path/to/current.xlsx id
+```
+
+## Review Report Export (XLSX)
 
 You can export review results as a multi-tab Excel report using `review export`.
 
````
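In the `sqlcompare run file` example, the trailing `id` argument appears to name the key column used for row-by-row matching (an assumption based on the example, not documented here). Keyed comparison generally assumes that key is present and unique on each side, so a quick pre-check of an input file can save a confusing diff. The sketch below is a hypothetical pre-check using Python's standard `csv` module; `index_by_key` is not part of SQLCompare, and it handles CSV only (reading XLSX would need a third-party library).

```python
# Hypothetical key pre-check for a CSV input; not part of SQLCompare.
import csv
import io
from typing import Dict, Iterable


def index_by_key(lines: Iterable[str], key_column: str) -> Dict[str, Dict[str, str]]:
    """Read CSV lines and index each row by key_column.

    Raises KeyError if the header lacks key_column, and ValueError on a blank
    or duplicate key, since either would make row-by-row matching ambiguous.
    """
    reader = csv.DictReader(lines)
    # Accessing fieldnames reads the header row.
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise KeyError(f"key column {key_column!r} not found in header")

    indexed: Dict[str, Dict[str, str]] = {}
    for row in reader:
        key = row[key_column]
        if not key:  # None for a short row, "" for an empty cell
            raise ValueError(f"missing {key_column!r} value on line {reader.line_num}")
        if key in indexed:
            raise ValueError(
                f"duplicate {key_column!r} value {key!r} on line {reader.line_num}"
            )
        indexed[key] = row
    return indexed


# Usage with in-memory CSV text; a real file would be opened with newline="".
rows = index_by_key(io.StringIO("id,status\n1,open\n2,closed\n"), "id")
print(rows["2"])  # {'id': '2', 'status': 'closed'}
```

`reader.line_num` counts physical lines read from the source, so reported line numbers stay correct even when a quoted field spans several lines.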
```diff
@@ -92,7 +145,27 @@ Notes:
 
 ---
 
-## review meta (AI-friendly metadata)
+## Analytics Regression Testing and Data Validation
+
+SQLCompare is useful when row counts are not enough and manual joins are too fragile. Instead of writing ad hoc SQL every time you change a model or pipeline, you can save a diff once and review it repeatedly with the generated `diff_id`.
+
+Common validation workflows:
+
+- release QA for transformed tables
+- migration and backfill verification
+- regression testing after SQL logic changes
+- warehouse-to-file or file-to-file comparison during onboarding
+
+## Why SQLCompare Instead of Manual SQL Diff Checks
+
+Manual SQL joins, notebooks, and spreadsheets usually answer one question once. SQLCompare keeps the comparison reusable:
+
+- run a compare once, then inspect stats, missing rows, and changed values
+- use the same workflow across warehouse tables and local files
+- export Excel review reports for debugging and handoff
+- avoid rebuilding the same outer join logic for every validation task
+
+## `review meta` for AI-Friendly Metadata
 
 Use `review meta` to get a JSON payload describing the queryable tables and ready-to-run SQL templates for a given `diff_id`. This is especially useful for AI agents that need structured context before running analysis queries.
 
```
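The "inspect stats" step corresponds to the statistical mode described in the removed introduction (null counts, distinct counts, and other aggregates): it compares column profiles rather than individual rows, which can reveal changes that an unchanged row count hides. A minimal sketch in plain Python, assuming in-memory rows with hashable values; `column_stats` and `compare_stats` are hypothetical names, not SQLCompare functions, and only two statistics are computed.

```python
# Hypothetical column-profile comparison; not SQLCompare's implementation.
from typing import Any, Dict, List, Mapping, Optional, Tuple

Row = Mapping[str, Any]
StatDiff = Dict[str, Dict[str, Tuple[Optional[int], Optional[int]]]]


def column_stats(rows: List[Row]) -> Dict[str, Dict[str, int]]:
    """Null count and distinct non-null count for every column seen in rows."""
    columns = sorted({column for row in rows for column in row})
    stats: Dict[str, Dict[str, int]] = {}
    for column in columns:
        # A column absent from a row is counted as null for that row.
        values = [row.get(column) for row in rows]
        non_null = [value for value in values if value is not None]
        stats[column] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),  # assumes hashable values
        }
    return stats


def compare_stats(previous: List[Row], current: List[Row]) -> StatDiff:
    """Return {column: {stat: (previous, current)}} for every statistic that changed."""
    before, after = column_stats(previous), column_stats(current)
    changed: StatDiff = {}
    for column in sorted(before.keys() | after.keys()):
        for stat in ("nulls", "distinct"):
            old = before.get(column, {}).get(stat)  # None if the column is new
            new = after.get(column, {}).get(stat)  # None if the column was dropped
            if old != new:
                changed.setdefault(column, {})[stat] = (old, new)
    return changed


previous = [{"id": 1, "status": "open"}, {"id": 2, "status": None}, {"id": 3, "status": "open"}]
current = [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
print(compare_stats(previous, current))  # {'status': {'nulls': (1, 0), 'distinct': (1, 2)}}
```

In this usage example both versions have three rows, yet the `status` column loses its only null and gains a second distinct value.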
```diff
@@ -109,13 +182,16 @@ Output (JSON):
 
 ---
 
-## Example outputs
+## Examples
 
 See [`examples/`](examples/) for datasets, commands, and captured outputs.
```