Skip to content

Cater for column name mismatches across systems #1617

@nj1973

Description

@nj1973

Columns names occasionally differ between systems. For example when copying data from a transactional system to a data warehouse we may choose to give id columns a more descriptive name.

It would be good if this could be automated in some way.

  • If DVT already assumed the columns are in the correct order then maybe it should honour the names
  • If there is no expected order then perhaps we need a --column-name-mappings option to map the names
  • This needs to work for primary keys too.

Test case
Oracle table:

create table dvt_test.tab1_a
(id number(5),col_data varchar2(30));

PostgreSQL table:

create table dvt_test.tab1_a
(tab_id int,col_data varchar(30));

DVT command:

data-validation validate column -sc=ora_local -tc=pg_local -tbls=dvt_test.tab1_a --count="*" --sum="*"
...
  File "/path/github/professional-services-data-validator/data_validation/config_manager.py", line 1059, in build_config_column_aggregates
    casefold_target_columns[column]
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'id'

What version of DVT are you using?
8.1.1

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
Storing the validations to YAML files and then editing and correcting manually.

How are you executing DVT?
CLI commands

Additional context
Add any other context or screenshots about the feature request here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    type: feature request'Nice-to-have' improvement, new feature or different behavior or design.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions