feat: Add unique rule to dy.Column#325

Merged
Andreas Albert (AndreasAlbertQC) merged 3 commits into Quantco:main from gab23r:add_is_unique
Apr 21, 2026

Conversation

Contributor

@gab23r gab23r commented Apr 13, 2026

Motivation

Closes #313

Changes

Add the new rule, using the same logic as for primary keys.

Drive by:

  • Allow primary keys on array dtypes, since this now works in polars. I added a test for it.
  • Disallow primary keys on the object dtype, since this never worked. (Technically breaking, but IMHO it shouldn't break anything in practice.)


Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a per-column unique constraint to dy.Column/dy.Schema, aligning validation and SQLAlchemy output, and updates array/object primary key behavior.

Changes:

  • Introduce unique as a first-class column attribute and emit unique=True in SQLAlchemy column definitions.
  • Add schema validation rules for unique columns (and tighten primary key validation via is_unique()).
  • Expand tests for unique constraints and enable primary keys on Array columns while disallowing them for Object columns.
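
As a rough illustration of what `unique=True` means on the SQLAlchemy side, the sketch below builds a table directly with SQLAlchemy rather than through dataframely. The table and column names are invented, and how dataframely constructs its columns internally is not shown here.

```python
import sqlalchemy as sa
from sqlalchemy.schema import CreateTable

metadata = sa.MetaData()

# Hypothetical table; only the `unique=True` flag relates to this PR.
users = sa.Table(
    "users",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("email", sa.String, unique=True),
)

# Render the CREATE TABLE statement with the default dialect.
ddl = str(CreateTable(users))
print(ddl)
```

The rendered DDL carries a `UNIQUE` constraint for the flagged column, which is the database-side counterpart of the validation rule.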

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/schema/test_validate.py | Adds validation tests for unique columns and Schema.unique_columns(). |
| tests/schema/test_sample.py | Adds sampling tests ensuring generated data respects unique constraints. |
| tests/column_types/test_array.py | Adds coverage for primary keys on Array columns. |
| dataframely/columns/_base.py | Adds unique attribute to Column and passes it to SQLAlchemy columns. |
| dataframely/_base_schema.py | Adds unique rules into schema validation; uses is_unique() for primary keys. |
| dataframely/columns/array.py | Allows primary_key on arrays and threads through unique. |
| dataframely/columns/string.py | Threads unique through to Column. |
| dataframely/columns/integer.py | Threads unique through to Column. |
| dataframely/columns/float.py | Threads unique through to Column. |
| dataframely/columns/decimal.py | Threads unique through to Column. |
| dataframely/columns/datetime.py | Threads unique through to Column for date/time/datetime/timedelta. |
| dataframely/columns/enum.py | Threads unique through to Column. |
| dataframely/columns/categorical.py | Threads unique through to Column. |
| dataframely/columns/list.py | Threads unique through to Column. |
| dataframely/columns/struct.py | Threads unique through to Column. |
| dataframely/columns/object.py | Removes primary_key kwarg from Object column constructor. |

Comment thread tests/schema/test_validate.py Outdated
Comment thread dataframely/columns/object.py
Comment thread dataframely/columns/object.py
Comment thread dataframely/_base_schema.py

codecov Bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (7c73bb1) to head (04b97f3).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #325   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           56        56           
  Lines         3399      3408    +9     
=========================================
+ Hits          3399      3408    +9     

Collaborator

Thanks, gab23r! This already looks quite nice to me. Just a few small suggestions.

Comment thread dataframely/columns/object.py
Comment thread tests/schema/test_sample.py Outdated
Comment thread tests/schema/test_sample.py Outdated
@borchero Oliver Borchert (borchero) changed the title from "feat: Add is_unique rule to dy.Column" to "feat: Add unique rule to dy.Column" on Apr 17, 2026
Collaborator

Thanks, gab23r!

@AndreasAlbertQC Andreas Albert (AndreasAlbertQC) merged commit 7126c1e into Quantco:main Apr 21, 2026
32 checks passed
@gab23r gab23r deleted the add_is_unique branch April 21, 2026 21:42
  primary_key = _primary_key(columns)
  if len(primary_key) > 0:
-     rules["primary_key"] = Rule(~pl.struct(primary_key).is_duplicated())
+     rules["primary_key"] = Rule(pl.struct(primary_key).is_unique())
Member

Meta-comment because I just noticed: pl.struct(primary_key).is_unique() is likely much less efficient than pl.col(primary_key).is_unique() if we only have a single primary key column. We might want to introduce an optimization for this after benchmarking 😄

Comment on lines +44 to +49
    # Add unique column validation rules
    unique_columns = _unique_columns(columns)
    for col_name in unique_columns:
        # wrap the column in a struct to make `is_unique` work with list/arrays
        # https://github.com/pola-rs/polars/issues/27286
        rules[f"{col_name}|unique"] = Rule(pl.struct(col_name).is_unique())
Member

@borchero Oliver Borchert (borchero) Apr 21, 2026

Sorry I didn't review previously but I find this implementation suboptimal. Why is it on the schema if we do not check composite uniqueness but uniqueness of individual columns? This should be on the column which would also allow for much more efficient evaluation of is_unique for primitive types (because we can very easily skip the struct-wrapping).

Member

Besides, this also breaks for nested types; for example setting unique on a list element is simply ignored.

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: unique=True column constraint

4 participants