feat: wire metabolomics SDRF templates and add validation fixtures by ypriverol · Pull Request #286 · bigbio/sdrf-pipelines

ypriverol · 2026-04-20T07:18:11Z

Summary

Bumps the sdrf-templates submodule to the metabolomics branch (bigbio/sdrf-templates#29) that adds ms-metabolomics, lc-ms-metabolomics, and gc-ms-metabolomics at version 1.0.0-dev.

The schema registry auto-discovers the three new templates from the templates directory + manifest, so no registry.py edits are required. Validation works end-to-end via the existing pattern, values, ontology, mz_value, and mz_range_interval validators; no new validator types were needed.

Test fixtures

Adds tests/test_metabolomics_templates.py with four parameterised cases that load each example SDRF and assert validation against the target schema produces no blocking errors:

Fixture	Schema	Source
`tests/data/metabolomics/MTBLS1129.sdrf.tsv`	`lc-ms-metabolomics`	bigbio/proteomics-sample-metadata#680, re-annotated
`tests/data/metabolomics/MTBLS1903.sdrf.tsv`	`lc-ms-metabolomics`	same
`tests/data/metabolomics/MTBLS547.sdrf.tsv`	`lc-ms-metabolomics`	same
`tests/data/metabolomics/MTBKS5.sdrf.tsv`	`gc-ms-metabolomics`	real MetaboBank deposit, converted via the mapping doc

MTBLS re-annotation deltas (vs PR #680 originals)

technology type and comment[scan polarity] switched from NT=...;AC=... form to bare values, matching ms-proteomics convention for whitelist columns.
Added required comment[acquisition method].
Fixed ectrospray ionisation typo → electrospray ionization with correct PSI-MS accession MS:1000073.
Disambiguated assay name for QC and blank rows that were paired with both liver and plasma sub-experiments (suffix _liver / _plasma derived from the data file path).
Replaced not applicable in characteristics[biological replicate] with auto-incremented integer per source name.

MTBKS5 (MetaboBank example)

Sourced from https://ddbj.nig.ac.jp/public/metabobank/study/MTBKS5/MTBKS5.sdrf.txt. Real GC-MS dataset (Agilent 6890N + Pegasus III TOF, EI ionization, Rtx-5 Sil MS column). 291 rows convert to gc-ms-metabolomics and validate with zero errors.

Verification

$ pytest tests/test_metabolomics_templates.py -v
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBLS1129.sdrf.tsv-lc-ms-metabolomics] PASSED
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBLS1903.sdrf.tsv-lc-ms-metabolomics] PASSED
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBLS547.sdrf.tsv-lc-ms-metabolomics]  PASSED
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBKS5.sdrf.tsv-gc-ms-metabolomics]    PASSED

$ pytest tests/ -q
327 passed in 178.99s

Tests are pinned to skip_ontology=True so they run without the parquet ontology cache; ontology checks remain available with a populated cache.

Refs

Templates PR: feat: add metabolomics SDRF templates (LC-MS, GC-MS) at 1.0.0-dev sdrf-templates#29
Driver: Metabolomics example proteomics-sample-metadata#680

Test plan

Run pytest tests/test_metabolomics_templates.py -v — 4/4 pass
Run full suite — 327/327 pass, no regressions

Summary by CodeRabbit

Tests
- Added validation tests to ensure metabolomics data files meet required standards.
Chores
- Updated template dependencies.

coderabbitai · 2026-04-20T07:18:19Z

Warning

Rate limit exceeded

@ypriverol has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 50 minutes and 47 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 50 minutes and 47 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5792967-cd13-4461-a522-526c3283d46a

📥 Commits

Reviewing files that changed from the base of the PR and between 6c4da02 and 20bed00.

⛔ Files ignored due to path filters (4)

tests/data/metabolomics/MTBKS5.sdrf.tsv is excluded by !**/*.tsv
tests/data/metabolomics/MTBLS1129.sdrf.tsv is excluded by !**/*.tsv
tests/data/metabolomics/MTBLS1903.sdrf.tsv is excluded by !**/*.tsv
tests/data/metabolomics/MTBLS547.sdrf.tsv is excluded by !**/*.tsv

📒 Files selected for processing (2)

src/sdrf_pipelines/sdrf/sdrf-templates
tests/test_metabolomics_templates.py

📝 Walkthrough

Walkthrough

Updated a submodule pointer in the SDRF templates directory and added a new test module that validates metabolomics SDRF TSV examples against template schemas using parameterized test cases with ontology validation disabled.

Changes

Cohort / File(s)	Summary
Submodule Update `src/sdrf_pipelines/sdrf/sdrf-templates`	Updated submodule commit pointer from `889c1d7bb771...` to `38e1f84642f71...` to reference new submodule contents.
Metabolomics Test Suite `tests/test_metabolomics_templates.py`	Added new test module with helper function `_load_sdrf()` for TSV parsing and column normalization, `validator` fixture for schema validation setup, and parameterized test `test_metabolomics_example_validates()` covering three MTBLS and one MetaboBank deposit cases; treats only logging.ERROR-level issues as blocking failures.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A submodule hops to a newer ground,
While metabolomics tests come around,
TSV columns are trimmed with care,
Validation workflows checked with flair,
Templates dance in schemas' lair! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: updating metabolomics SDRF templates and adding validation test fixtures.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Companion to bigbio#680. These SDRFs are re-annotated versions of the examples in bigbio#680 that validate against the new lc-ms-metabolomics template in bigbio/sdrf-templates at 1.0.0-dev. Changes from the bigbio#680 originals: - technology type and comment[scan polarity] rewritten from NT=...;AC=... form to bare values (LC-MS-based metabolomics, positive scan / negative scan), matching the ms-proteomics convention for whitelist columns. - Added required comment[acquisition method] column (default: full scan for untargeted QTof/Orbitrap profiling). - Fixed typo ectrospray ionisation -> electrospray ionization, with the correct PSI-MS accession MS:1000073 replacing CHMO:0001659. - Disambiguated assay name for QC and blank rows that were paired with both liver and plasma sub-experiments by suffixing with _liver / _plasma derived from the data file path. Without this, the combination-of-columns uniqueness validator flagged duplicates. - Replaced not applicable in characteristics[biological replicate] with an auto-incremented integer per source name (sample-metadata requires real values for this column). End-to-end validation of these files against lc-ms-metabolomics is covered by bigbio/sdrf-pipelines#286 (tests/test_metabolomics_templates.py). Refs bigbio#680.

Companion to bigbio#680. These SDRFs are re-annotated versions of the examples in bigbio#680 that validate against the new lc-ms-metabolomics template in bigbio/sdrf-templates at 1.0.0-dev. Changes from the bigbio#680 originals: - technology type and comment[scan polarity] rewritten from NT=...;AC=... form to bare values (LC-MS-based metabolomics, positive scan / negative scan), matching the ms-proteomics convention for whitelist columns. - Added required comment[acquisition method] column (default: full scan for untargeted QTof/Orbitrap profiling). - Fixed typo ectrospray ionisation -> electrospray ionization, with the correct PSI-MS accession MS:1000073 replacing CHMO:0001659. - Disambiguated assay name for QC and blank rows that were paired with both liver and plasma sub-experiments by suffixing with _liver / _plasma derived from the data file path. Without this, the combination-of-columns uniqueness validator flagged duplicates. - Replaced not applicable in characteristics[biological replicate] with an auto-incremented integer per source name (sample-metadata requires real values for this column). End-to-end validation of these files against lc-ms-metabolomics is covered by bigbio/sdrf-pipelines#286 (tests/test_metabolomics_templates.py). Refs bigbio#680. Co-authored-by: Matthias Mattanovich <matthias.mattanovich@gmail.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_metabolomics_templates.py`:
- Line 50: The filtering for blocking errors only matches exactly logging.ERROR;
update the comprehension that builds blocking to treat missing (None) or
higher-severity levels as blocking by checking the LogicError.error_type on each
item (use the existing errors list and getattr(e, "error_type", None)) and
include the error if the retrieved level is None or >= logging.ERROR; adjust the
blocking assignment that currently uses getattr(e, "error_type", logging.ERROR)
== logging.ERROR to use that None-or-greater check so CRITICAL and missing
error_type are treated as blocking.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df704db2-8fd2-4255-92ce-5898b58c0110

📥 Commits

Reviewing files that changed from the base of the PR and between 8676072 and 6c4da02.

⛔ Files ignored due to path filters (4)

tests/data/metabolomics/MTBKS5.sdrf.tsv is excluded by !**/*.tsv
tests/data/metabolomics/MTBLS1129.sdrf.tsv is excluded by !**/*.tsv
tests/data/metabolomics/MTBLS1903.sdrf.tsv is excluded by !**/*.tsv
tests/data/metabolomics/MTBLS547.sdrf.tsv is excluded by !**/*.tsv

📒 Files selected for processing (2)

src/sdrf_pipelines/sdrf/sdrf-templates
tests/test_metabolomics_templates.py

coderabbitai · 2026-04-20T08:53:59Z

+    df = _load_sdrf(path)
+
+    errors = validator.validate(df, schema_name, skip_ontology=True)
+    blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR]


⚠️ Potential issue | 🟠 Major

Treat missing or higher-severity error levels as blocking.

Line 50 only fails on logging.ERROR. LogicError.error_type can be None, and any logging.CRITICAL result would also be ignored, allowing blocking validation failures to pass.

🐛 Proposed fix

- blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR] + blocking = [ + e + for e in errors + if (error_type := getattr(e, "error_type", logging.ERROR)) is None or error_type >= logging.ERROR + ]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR]

blocking = [

e

for e in errors

if (error_type := getattr(e, "error_type", logging.ERROR)) is None or error_type >= logging.ERROR

]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_metabolomics_templates.py` at line 50, The filtering for blocking errors only matches exactly logging.ERROR; update the comprehension that builds blocking to treat missing (None) or higher-severity levels as blocking by checking the LogicError.error_type on each item (use the existing errors list and getattr(e, "error_type", None)) and include the error if the retrieved level is None or >= logging.ERROR; adjust the blocking assignment that currently uses getattr(e, "error_type", logging.ERROR) == logging.ERROR to use that None-or-greater check so CRITICAL and missing error_type are treated as blocking.

Bumps the sdrf-templates submodule to the feat/metabolomics-templates branch that introduces ms-metabolomics, lc-ms-metabolomics, and gc-ms-metabolomics at version 1.0.0-dev (see bigbio/sdrf-templates#feat/metabolomics-templates). The schema registry auto-discovers the three new templates from the templates directory + manifest, so no registry.py edits are required. Validation works end-to-end via the existing 'pattern', 'values', 'ontology', 'mz_value', and 'mz_range_interval' validators; no new validator types were needed. Adds tests/test_metabolomics_templates.py with four parameterised cases that load each example SDRF and assert validation against the target schema produces no blocking errors: - tests/data/metabolomics/MTBLS1129.sdrf.tsv -> lc-ms-metabolomics - tests/data/metabolomics/MTBLS1903.sdrf.tsv -> lc-ms-metabolomics - tests/data/metabolomics/MTBLS547.sdrf.tsv -> lc-ms-metabolomics - tests/data/metabolomics/MTBKS5.sdrf.tsv -> gc-ms-metabolomics The MTBLS files are re-annotated versions of the SDRFs in proteomics-sample-metadata#680 (dropped NT=...;AC=EDAM:3172 form for technology type and scan polarity in favour of bare values consistent with ms-proteomics; added required comment[acquisition method]; fixed 'ectrospray ionisation' typo to 'electrospray ionization' with correct PSI-MS accession MS:1000073; disambiguated assay names for QC and blank rows shared between liver and plasma sub-experiments). The MTBKS5 file is a real GC-MS dataset sourced from MetaboBank (https://ddbj.nig.ac.jp/public/metabobank/study/MTBKS5/MTBKS5.sdrf.txt), converted via the column mapping documented in sdrf-templates/docs/mappings/metabobank-sdrf-mapping.md. Tests are pinned to skip_ontology=True so they run without the parquet ontology cache; ontology checks remain available with a populated cache. Refs proteomics-sample-metadata#680. Co-authored-by: Matthias Mattanovich <matthias.mattanovich@gmail.com>

ypriverol mentioned this pull request Apr 20, 2026

feat: add metabolomics example SDRFs validating against lc-ms-metabolomics bigbio/proteomics-sample-metadata#824

Open

2 tasks

ypriverol force-pushed the feat/metabolomics-templates branch from da597e9 to 6c4da02 Compare April 20, 2026 08:50

coderabbitai Bot reviewed Apr 20, 2026

View reviewed changes

ypriverol force-pushed the feat/metabolomics-templates branch from 6c4da02 to 20bed00 Compare April 20, 2026 09:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: wire metabolomics SDRF templates and add validation fixtures#286

feat: wire metabolomics SDRF templates and add validation fixtures#286
ypriverol wants to merge 1 commit intobigbio:mainfrom
ypriverol:feat/metabolomics-templates

ypriverol commented Apr 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 20, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

ypriverol commented Apr 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test fixtures

MTBLS re-annotation deltas (vs PR #680 originals)

MTBKS5 (MetaboBank example)

Verification

Refs

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

ypriverol commented Apr 20, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 20, 2026 •

edited

Loading