Add explicit addRaggedColumn and addEnumColumn helpers for DynamicTable #809

@ehennestad

Summary

DynamicTable.addColumn currently works at the individual dataset level, but users often need to add logical columns that come with dependent datasets or attributes. Two cases stand out:

  • ragged columns, where a primary column depends on one or more *_index datasets
  • enum columns, where EnumData depends on its elements reference and often also benefits from a sibling *_elements dataset for interoperability with PyNWB

I think it would be valuable to add explicit convenience APIs for these cases rather than expecting users to assemble all companion objects manually and pass them through addColumn.

Current pain points

Ragged columns

Today users either construct VectorData/DynamicTableRegion plus one or more VectorIndex objects manually, or use util.create_indexed_column as a separate helper.

Relevant examples:

  • util.create_indexed_column
  • tutorials/dynamic_tables.mlx
  • tutorials/icephys.mlx

This works, but it means the API for adding a logical ragged column is split across multiple objects and helper functions.
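For readers less familiar with the layout, here is a minimal plain-Python sketch of what a single-index ragged column stores: one flat data array plus an index of cumulative row end offsets. The function names `encode_ragged` and `decode_ragged` are hypothetical and only illustrate the layout; they are not MatNWB or PyNWB API.

```python
def encode_ragged(rows):
    """Flatten row-wise lists into (data, index).

    index[i] is the cumulative number of elements up to and including row i,
    i.e. the exclusive end offset of row i when counting from zero.
    """
    data, index = [], []
    for row in rows:
        data.extend(row)
        index.append(len(data))
    return data, index


def decode_ragged(data, index):
    """Rebuild the row-wise lists from the flat data and the end offsets."""
    rows, start = [], 0
    for end in index:
        rows.append(data[start:end])
        start = end
    return rows


rows = [[1, 2, 3], [], [4, 5]]   # three rows of different lengths
data, index = encode_ragged(rows)
restored = decode_ragged(data, index)
```

The point of the sketch is that the user thinks in terms of `rows`, while the file holds two separate datasets that must stay consistent with each other.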

Enum columns

EnumData already models the enum relationship via its required elements reference, but users still need to manually manage the supporting VectorData dataset that stores the actual element values. The tutorial also notes that the <name>_elements dataset layout is useful/required for compatibility with PyNWB.

Relevant example:

  • tutorials/dynamic_tables.mlx
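As an illustration of the relationship described above, the following plain-Python sketch shows how an enum column is read back: the column holds integer codes, and a separate elements array holds the actual values. The zero-based codes and the name `decode_enum` are assumptions made for the sketch, not MatNWB or PyNWB API.

```python
def decode_enum(codes, elements):
    """Map integer codes back to their element values (zero-based codes assumed)."""
    return [elements[code] for code in codes]


# The supporting dataset that stores the actual element values.
elements = ["pyramidal", "interneuron", "unknown"]
# The enum column itself: one code per table row.
codes = [0, 1, 0, 2]

labels = decode_enum(codes, elements)
```

Note that `codes` has one entry per row while `elements` has one entry per allowed value, so the two arrays generally differ in length.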

addColumn is not a great fit for these cases

addColumn currently behaves like a low-level dataset insertion helper. That makes some cases awkward:

  • *_index datasets are companions of a logical column rather than user-facing columns
  • *_elements datasets are support datasets and should not participate in row-height validation the same way as true columns
  • some DynamicTable subclasses define schema-backed column properties, while others define non-column properties, so property-name handling also needs care
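The row-height point can be sketched in plain Python. The example below compares a naive check that treats every dataset as a column with a check that only looks at logical columns and follows each `<name>_index` chain to its end. The dict-based table, the dataset names, and both function names are hypothetical and only serve the illustration.

```python
def naive_heights(datasets):
    """Treat every stored dataset as a column and report its length."""
    return {name: len(values) for name, values in datasets.items()}


def logical_heights(datasets, colnames):
    """Report one height per logical column.

    For a ragged column, follow the <name>_index chain to the last index,
    because that index has one entry per row. Support datasets such as
    <name>_elements are ignored since they are not listed in colnames.
    """
    heights = {}
    for name in colnames:
        key = name
        while key + "_index" in datasets:
            key += "_index"
        heights[name] = len(datasets[key])
    return heights


datasets = {
    "spike_times": [0.1, 0.2, 0.3, 0.7, 0.9],   # flat ragged data
    "spike_times_index": [3, 3, 5],             # companion index, one entry per row
    "cell_type": [0, 1, 0],                     # enum codes, one entry per row
    "cell_type_elements": ["pyramidal", "interneuron"],  # support dataset
}
colnames = ["spike_times", "cell_type"]

naive = naive_heights(datasets)
logical = logical_heights(datasets, colnames)
```

In this toy table the naive check sees three different lengths, while the logical check agrees on three rows for both columns.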

Proposal

Add explicit convenience methods on DynamicTable:

  • addRaggedColumn(...)
  • addEnumColumn(...)

These methods would build and install the necessary typed objects and companion datasets in a consistent way, while leaving addColumn available as the lower-level API for already-constructed typed objects.

Possible design direction

addRaggedColumn

Accept row-wise data and construct/store:

  • the primary column (VectorData or DynamicTableRegion)
  • one or more VectorIndex companions as needed

Possible behavior:

  • update colnames only for the primary logical column
  • use the terminal (outermost) VectorIndex, which has one entry per row, for row-height validation
  • support both simple and nested ragged columns
  • optionally support table references similar to util.create_indexed_column(..., table)
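A rough plain-Python sketch of the behavior listed above, covering both simple and nested ragged input. The `<name>_index` and `<name>_index_index` naming, the returned dict, and the names `encode_nested` and `build_ragged_column` are assumptions for illustration; this is not a proposed implementation and uses no MatNWB API. The sketch also assumes `depth` matches the actual nesting of `rows` and does not validate it.

```python
def encode_nested(rows, depth):
    """Encode rows nested `depth` levels deep into flat data plus `depth` indices.

    The returned indices are ordered innermost first: indices[0] points into
    the data, each later index points into the previous one, and indices[-1]
    (the terminal index) has one entry per table row.
    """
    indices = []
    current = rows
    for _ in range(depth):
        flat, index = [], []
        for item in current:
            flat.extend(item)
            index.append(len(flat))   # cumulative end offset
        indices.append(index)         # collected outermost first
        current = flat
    indices.reverse()
    return current, indices


def build_ragged_column(name, rows, depth=1):
    """Build the primary dataset and its index companions for one logical column."""
    data, indices = encode_nested(rows, depth)
    datasets = {name: data}
    key = name
    for index in indices:
        key += "_index"
        datasets[key] = index
    return {
        "colname": name,              # only the primary name would go in colnames
        "datasets": datasets,
        "height": len(indices[-1]),   # row count comes from the terminal index
    }


column = build_ragged_column("tags", [[["a", "b"], ["c"]], [["d"]]], depth=2)
```

With two rows of doubly-nested input, the sketch produces three datasets but a single logical column name and a height of two.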

addEnumColumn

Accept enum values plus the allowed element set and construct/store:

  • the EnumData primary column
  • the backing elements dataset
  • the elements object reference

Possible behavior:

  • keep the enum column itself as the logical user-facing column
  • optionally store <name>_elements explicitly for interoperability/documented compatibility with PyNWB
  • avoid treating the support dataset as a row-aligned table column for validation purposes
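The write side could look roughly like the plain-Python sketch below: take the row values and the allowed element set, validate them, and produce the codes, the backing elements dataset, and a reference to it. The zero-based codes, the string used as a stand-in for the object reference, the returned dict, and the name `build_enum_column` are all assumptions for illustration, not MatNWB API.

```python
def build_enum_column(name, values, elements):
    """Build the enum codes and the backing elements dataset for one logical column."""
    lookup = {element: code for code, element in enumerate(elements)}
    if len(lookup) != len(elements):
        raise ValueError("elements must be unique")
    unknown = sorted(set(values) - set(lookup))
    if unknown:
        raise ValueError(f"values not in elements: {unknown}")

    elements_key = name + "_elements"
    return {
        "colname": name,                        # only the enum column is user-facing
        "datasets": {
            name: [lookup[value] for value in values],
            elements_key: list(elements),       # support dataset, not row-aligned
        },
        "elements_ref": elements_key,           # stand-in for the object reference
        "height": len(values),                  # row count comes from the codes only
    }


column = build_enum_column(
    "cell_type",
    values=["pyramidal", "interneuron", "pyramidal"],
    elements=["pyramidal", "interneuron", "unknown"],
)
```

Rejecting values outside the element set at insertion time is one possible design choice; the helper could instead extend the element set, which the sketch deliberately does not do.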

Why separate helpers seem preferable

I think dedicated methods are clearer than making addColumn infer too much from heterogeneous inputs.

That would preserve a nice split:

  • addColumn for low-level typed-object insertion
  • addRaggedColumn for ragged logical columns
  • addEnumColumn for enum logical columns

This feels easier to understand and document than expanding addColumn until it tries to infer all companion-dataset semantics from arbitrary name/value pairs.

Open questions

  • Should addRaggedColumn support doubly-ragged columns from the start, or just single-index ragged columns initially?
  • Should addEnumColumn always materialize <name>_elements, or make that optional?
  • Should util.create_indexed_column remain as a lower-level utility, or eventually delegate to the new helper(s)?

I’m opening this as an enhancement suggestion based on the current DynamicTable behavior and tutorial patterns, since these workflows already exist conceptually but are not yet first-class in the table API.

Written by GPT-5.4
