Commit a05cc9e

feat: Update README and pyproject.toml for improved descriptions and keywords
1 parent de0fb21 commit a05cc9e

2 files changed: 112 additions & 20 deletions

README.md

Lines changed: 94 additions & 18 deletions
@@ -1,20 +1,43 @@
 # SQLCompare
 
-SQLCompare helps you understand how a change impacted your data.
-When you modify logic, filters, or inputs, it lets you compare the previous and current versions of a dataset—whether they come from tables, SQL queries, or files.
+SQLCompare is a Python CLI and data diff tool for comparing SQL tables, SQL query results, CSV files, and Excel/XLSX files.
 
-You can compare datasets in two complementary ways:
+It is built for analytics engineers and data engineers who need to validate migrations, backfills, model rewrites, vendor file drops, and other data pipeline changes without relying on one-off SQL checks.
 
-1) Row-by-row comparison (with an ID): detect missing rows on either side, identify the columns with the most changes, and review before/after values for any record.
-2) Statistical comparison: compare column-level statistics such as null counts, distinct counts, and other aggregates to quickly understand overall impact.
+Use SQLCompare to compare datasets from PostgreSQL, Snowflake, Databricks, DuckDB, CSV, and XLSX sources with the same review workflow.
 
 ---
 
-## What you get
+## What SQLCompare Does
 
-- **Repeatable checks** for releases, backfills, migrations, and vendor drops
-- **Clear summaries**: missing-row detection + per-column change counts
-- **One workflow** for warehouses *and* local files (DuckDB-powered)
+- **Compare SQL tables** before and after a logic change
+- **Compare SQL query results** when tables are not materialized yet
+- **Compare CSV and Excel files** for local QA and vendor deliveries
+- **Review row-level diffs** with before/after values and missing-row detection
+- **Run analytics regression testing** with a repeatable saved `diff_id`
+
+SQLCompare supports two complementary comparison modes:
+
+1. Row-by-row comparison with an ID: detect missing rows, identify changed columns, and inspect before/after values.
+2. Statistical comparison: compare null counts, distinct counts, and other aggregates to understand overall impact.
+
+## Use Cases
+
+- Validate a dbt model rewrite by comparing the old and new SQL outputs on the same key.
+- Compare warehouse tables before deploying a migration or backfill.
+- Compare SQL query results when testing filters, joins, or business logic changes.
+- Diff vendor CSV or XLSX deliveries before loading them into your warehouse.
+- Run data validation checks for analytics regression testing and release QA.
+
+## Supported Connectors and File Types
+
+| Source type | Examples | Supported workflow |
+| --- | --- | --- |
+| SQL tables | PostgreSQL, Snowflake, Databricks, DuckDB | `sqlcompare run table ...` |
+| SQL query results | Inline SQL or `.sql` files | `sqlcompare run query ...` |
+| CSV files | Local `.csv` datasets | `sqlcompare run file ...` |
+| Excel files | Local `.xlsx` datasets | `sqlcompare run file ...` |
+| Dataset configs | YAML definitions for SQL or file sources | `sqlcompare run dataset ...` |
 
 ---
 
@@ -43,7 +66,9 @@ uv tool install "sqlcompare[databricks]"
 
 ---
 
-## Quick start (tables)
+## Quick Start
+
+### Compare SQL Tables
 
 Compare two tables on a key:
 
@@ -63,7 +88,35 @@ sqlcompare review export <diff_id> --mode summary
 sqlcompare review export <diff_id> --mode complete --output ./reports/full_diff.xlsx
 ```
 
-## Review report export (XLSX)
+### Compare SQL Query Results
+
+Use this when tables are not materialized yet or when you want to compare a filtered slice.
+
+Inline SQL:
+
+```bash
+sqlcompare run query \
+--previous "SELECT * FROM analytics.orders WHERE order_date < '2024-01-01'" \
+--current "SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'" \
+--index order_id \
+-c snowflake_prod
+```
+
+SQL files:
+
+```bash
+sqlcompare run query --previous queries/previous.sql --current queries/current.sql --index order_id -c snowflake_prod
+```
+
+### Compare CSV and Excel Files
+
+Use the same diff workflow for local file validation, vendor drops, and ad hoc QA.
+
+```bash
+sqlcompare run file path/to/previous.csv path/to/current.xlsx id
+```
+
+## Review Report Export (XLSX)
 
 You can export review results as a multi-tab Excel report using `review export`.
 
@@ -92,7 +145,27 @@ Notes:
 
 ---
 
-## review meta (AI-friendly metadata)
+## Analytics Regression Testing and Data Validation
+
+SQLCompare is useful when row counts are not enough and manual joins are too fragile. Instead of writing ad hoc SQL every time you change a model or pipeline, you can save a diff once and review it repeatedly with the generated `diff_id`.
+
+Common validation workflows:
+
+- release QA for transformed tables
+- migration and backfill verification
+- regression testing after SQL logic changes
+- warehouse-to-file or file-to-file comparison during onboarding
+
+## Why SQLCompare Instead of Manual SQL Diff Checks
+
+Manual SQL joins, notebooks, and spreadsheets usually answer one question once. SQLCompare keeps the comparison reusable:
+
+- run a compare once, then inspect stats, missing rows, and changed values
+- use the same workflow across warehouse tables and local files
+- export Excel review reports for debugging and handoff
+- avoid rebuilding the same outer join logic for every validation task
+
+## `review meta` for AI-Friendly Metadata
 
 Use `review meta` to get a JSON payload describing the queryable tables and ready-to-run SQL templates for a given `diff_id`. This is especially useful for AI agents that need structured context before running analysis queries.
 
@@ -109,13 +182,16 @@ Output (JSON):
 
 ---
 
-## Example outputs
+## Examples
 
 See [`examples/`](examples/) for datasets, commands, and captured outputs.
 
+- [`examples/row_compare.md`](examples/row_compare.md)
+- [`examples/stats_compare.md`](examples/stats_compare.md)
+
 ---
 
-## Core idea: compare once, analyze many times
+## Core Idea: Compare Once, Analyze Many Times
 
 SQLCompare does two things:
 
@@ -129,9 +205,9 @@ SQLCompare does two things:
 
 ---
 
-## Usage by use case
+## Usage by Use Case
 
-### 1) Compare two tables
+### 1) Compare SQL tables
 
 Best for production validation and regression checks across supported connectors.
 
@@ -148,7 +224,7 @@ Why it’s useful:
 
 ### 2) Compare SQL query results
 
-Use this when tables aren’t materialized yet or you want a filtered slice.
+Use this when tables are not materialized yet or you want a filtered slice.
 
 Inline SQL:
 
@@ -193,7 +269,7 @@ Why it’s useful:
 
 ---
 
-### 3) Compare local CSV / XLSX files (DuckDB)
+### 3) Compare CSV and Excel files (DuckDB)
 
 Great for ad hoc QA, one-off deliveries, or vendor drops.
 SQLCompare uses DuckDB under the hood — no DB server required.
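
The row-by-row mode described in the README compares two datasets on an ID, reports rows missing on either side, identifies which columns changed most, and keeps before/after values. The sketch below illustrates that idea in plain Python on in-memory rows. It is not SQLCompare's implementation (the README says local files go through DuckDB); `keyed_diff`, the dict-of-rows input, and the sample orders are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def keyed_diff(previous: list[Row], current: list[Row], key: str) -> dict[str, Any]:
    """Row-by-row diff on `key` (hypothetical helper, not SQLCompare's API)."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Keys present on only one side: the "missing rows" check.
    only_previous = sorted(prev_by_key.keys() - curr_by_key.keys())
    only_current = sorted(curr_by_key.keys() - prev_by_key.keys())

    changes_per_column: Counter = Counter()
    before_after = []  # (key, column, previous value, current value)

    # For keys on both sides, compare every non-key column value by value.
    for k in sorted(prev_by_key.keys() & curr_by_key.keys()):
        before, after = prev_by_key[k], curr_by_key[k]
        for column in sorted((before.keys() | after.keys()) - {key}):
            old, new = before.get(column), after.get(column)
            if old != new:
                changes_per_column[column] += 1
                before_after.append((k, column, old, new))

    return {
        "only_previous": only_previous,
        "only_current": only_current,
        # Columns ordered from most to fewest changed rows.
        "changes_per_column": changes_per_column.most_common(),
        "before_after": before_after,
    }


# Hypothetical sample data.
previous = [
    {"order_id": 1, "status": "open", "amount": 10},
    {"order_id": 2, "status": "open", "amount": 20},
    {"order_id": 3, "status": "closed", "amount": 30},
]
current = [
    {"order_id": 1, "status": "open", "amount": 12},
    {"order_id": 2, "status": "closed", "amount": 25},
    {"order_id": 4, "status": "open", "amount": 40},
]

result = keyed_diff(previous, current, key="order_id")
print(result["only_previous"], result["only_current"])  # [3] [4]
print(result["changes_per_column"])  # [('amount', 2), ('status', 1)]
```

Duplicate keys are silently collapsed in this sketch (the last row per key wins); a real comparison needs an explicit policy for them.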

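The statistical mode compares column-level aggregates instead of individual rows. Here is a minimal sketch, assuming rows are plain dicts and limited to the two statistics the README names (null counts and distinct counts). `column_stats`, `compare_stats`, and the sample customer rows are hypothetical and not part of SQLCompare.

```python
from typing import Any

Row = dict[str, Any]


def column_stats(rows: list[Row]) -> dict[str, dict[str, int]]:
    """Null and distinct counts per column (hypothetical helper for illustration)."""
    columns = sorted({column for row in rows for column in row})
    stats = {}
    for column in columns:
        values = [row.get(column) for row in rows]  # a missing key counts as null
        non_null = [v for v in values if v is not None]
        stats[column] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),  # distinct non-null values
        }
    return stats


def compare_stats(previous: list[Row], current: list[Row]) -> dict[str, dict[str, tuple[int, int]]]:
    """Pair each statistic as (previous, current), keeping only the ones that changed."""
    prev_stats, curr_stats = column_stats(previous), column_stats(current)
    absent = {"nulls": 0, "distinct": 0}  # used when a column exists on one side only
    report = {}
    for column in sorted(prev_stats.keys() | curr_stats.keys()):
        before = prev_stats.get(column, absent)
        after = curr_stats.get(column, absent)
        changed = {
            name: (before[name], after[name])
            for name in before
            if before[name] != after[name]
        }
        if changed:
            report[column] = changed
    return report


# Hypothetical sample data.
previous = [
    {"id": 1, "email": "a@example.com", "country": "BR"},
    {"id": 2, "email": None, "country": "BR"},
    {"id": 3, "email": "c@example.com", "country": "PT"},
]
current = [
    {"id": 1, "email": "a@example.com", "country": "BR"},
    {"id": 2, "email": None, "country": "BR"},
    {"id": 3, "email": None, "country": "BR"},
]

print(compare_stats(previous, current))
# {'country': {'distinct': (2, 1)}, 'email': {'nulls': (1, 2), 'distinct': (2, 1)}}
```

Statistics like these show that something shifted (one `email` became null, `country` lost a distinct value) without pointing at the affected rows, which is what makes the two modes complementary.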
pyproject.toml

Lines changed: 18 additions & 2 deletions
@@ -5,7 +5,7 @@ build-backend = "hatchling.build"
 [project]
 name = "sqlcompare"
 version = "0.0.1"
-description = "CLI for comparing tables and sql queries"
+description = "Python CLI and data diff tool for comparing SQL tables, SQL query results, CSV files, and Excel/XLSX files"
 requires-python = ">=3.11"
 dependencies = [
 "pyyaml>=6.0.0",
@@ -24,16 +24,32 @@ license = {text = "MIT"}
 authors = [
 {name = "Luis Coimbra"}
 ]
+keywords = [
+"sql",
+"data-diff",
+"data-validation",
+"analytics",
+"csv-diff",
+"query-comparison",
+"duckdb",
+"regression-testing",
+]
 classifiers = [
 "Development Status :: 3 - Alpha",
 "Intended Audience :: Developers",
 "License :: OSI Approved :: MIT License",
 "Programming Language :: Python :: 3",
-"Programming Language :: Python :: 3.10",
 "Programming Language :: Python :: 3.11",
 "Programming Language :: Python :: 3.12",
 ]
 
+[project.urls]
+Homepage = "https://github.com/luisggc/sqlcompare"
+Repository = "https://github.com/luisggc/sqlcompare"
+Issues = "https://github.com/luisggc/sqlcompare/issues"
+Documentation = "https://github.com/luisggc/sqlcompare#readme"
+Examples = "https://github.com/luisggc/sqlcompare/tree/main/examples"
+
 [project.optional-dependencies]
 dev = [
 "pytest>=8.0.0",
