Skip to content

feat: wire metabolomics SDRF templates and add validation fixtures#286

Open
ypriverol wants to merge 1 commit intobigbio:mainfrom
ypriverol:feat/metabolomics-templates
Open

feat: wire metabolomics SDRF templates and add validation fixtures#286
ypriverol wants to merge 1 commit intobigbio:mainfrom
ypriverol:feat/metabolomics-templates

Conversation

@ypriverol
Copy link
Copy Markdown
Member

@ypriverol ypriverol commented Apr 20, 2026

Summary

Bumps the sdrf-templates submodule to the metabolomics branch (bigbio/sdrf-templates#29) that adds ms-metabolomics, lc-ms-metabolomics, and gc-ms-metabolomics at version 1.0.0-dev.

The schema registry auto-discovers the three new templates from the templates directory + manifest, so no registry.py edits are required. Validation works end-to-end via the existing pattern, values, ontology, mz_value, and mz_range_interval validators; no new validator types were needed.

Test fixtures

Adds tests/test_metabolomics_templates.py with four parameterised cases that load each example SDRF and assert validation against the target schema produces no blocking errors:

Fixture Schema Source
tests/data/metabolomics/MTBLS1129.sdrf.tsv lc-ms-metabolomics bigbio/proteomics-sample-metadata#680, re-annotated
tests/data/metabolomics/MTBLS1903.sdrf.tsv lc-ms-metabolomics same
tests/data/metabolomics/MTBLS547.sdrf.tsv lc-ms-metabolomics same
tests/data/metabolomics/MTBKS5.sdrf.tsv gc-ms-metabolomics real MetaboBank deposit, converted via the mapping doc

MTBLS re-annotation deltas (vs PR #680 originals)

  • technology type and comment[scan polarity] switched from NT=...;AC=... form to bare values, matching ms-proteomics convention for whitelist columns.
  • Added required comment[acquisition method].
  • Fixed ectrospray ionisation typo → electrospray ionization with correct PSI-MS accession MS:1000073.
  • Disambiguated assay name for QC and blank rows that were paired with both liver and plasma sub-experiments (suffix _liver / _plasma derived from the data file path).
  • Replaced not applicable in characteristics[biological replicate] with auto-incremented integer per source name.

MTBKS5 (MetaboBank example)

Sourced from https://ddbj.nig.ac.jp/public/metabobank/study/MTBKS5/MTBKS5.sdrf.txt. Real GC-MS dataset (Agilent 6890N + Pegasus III TOF, EI ionization, Rtx-5 Sil MS column). 291 rows convert to gc-ms-metabolomics and validate with zero errors.

Verification

$ pytest tests/test_metabolomics_templates.py -v
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBLS1129.sdrf.tsv-lc-ms-metabolomics] PASSED
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBLS1903.sdrf.tsv-lc-ms-metabolomics] PASSED
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBLS547.sdrf.tsv-lc-ms-metabolomics]  PASSED
tests/test_metabolomics_templates.py::test_metabolomics_example_validates[MTBKS5.sdrf.tsv-gc-ms-metabolomics]    PASSED

$ pytest tests/ -q
327 passed in 178.99s

Tests are pinned to skip_ontology=True so they run without the parquet ontology cache; ontology checks remain available with a populated cache.

Refs

Test plan

  • Run pytest tests/test_metabolomics_templates.py -v — 4/4 pass
  • Run full suite — 327/327 pass, no regressions

Summary by CodeRabbit

  • Tests

    • Added validation tests to ensure metabolomics data files meet required standards.
  • Chores

    • Updated template dependencies.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

Warning

Rate limit exceeded

@ypriverol has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 50 minutes and 47 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 50 minutes and 47 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5792967-cd13-4461-a522-526c3283d46a

📥 Commits

Reviewing files that changed from the base of the PR and between 6c4da02 and 20bed00.

⛔ Files ignored due to path filters (4)
  • tests/data/metabolomics/MTBKS5.sdrf.tsv is excluded by !**/*.tsv
  • tests/data/metabolomics/MTBLS1129.sdrf.tsv is excluded by !**/*.tsv
  • tests/data/metabolomics/MTBLS1903.sdrf.tsv is excluded by !**/*.tsv
  • tests/data/metabolomics/MTBLS547.sdrf.tsv is excluded by !**/*.tsv
📒 Files selected for processing (2)
  • src/sdrf_pipelines/sdrf/sdrf-templates
  • tests/test_metabolomics_templates.py
📝 Walkthrough

Walkthrough

Updated a submodule pointer in the SDRF templates directory and added a new test module that validates metabolomics SDRF TSV examples against template schemas using parameterized test cases with ontology validation disabled.

Changes

Cohort / File(s) Summary
Submodule Update
src/sdrf_pipelines/sdrf/sdrf-templates
Updated submodule commit pointer from 889c1d7bb771... to 38e1f84642f71... to reference new submodule contents.
Metabolomics Test Suite
tests/test_metabolomics_templates.py
Added new test module with helper function _load_sdrf() for TSV parsing and column normalization, validator fixture for schema validation setup, and parameterized test test_metabolomics_example_validates() covering three MTBLS and one MetaboBank deposit cases; treats only logging.ERROR-level issues as blocking failures.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A submodule hops to a newer ground,
While metabolomics tests come around,
TSV columns are trimmed with care,
Validation workflows checked with flair,
Templates dance in schemas' lair! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: updating metabolomics SDRF templates and adding validation test fixtures.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

ypriverol added a commit to ypriverol/proteomics-metadata-standard that referenced this pull request Apr 20, 2026
Companion to bigbio#680. These SDRFs are re-annotated versions of the examples in
bigbio#680 that validate against the new lc-ms-metabolomics template in
bigbio/sdrf-templates at 1.0.0-dev.

Changes from the bigbio#680 originals:

- technology type and comment[scan polarity] rewritten from NT=...;AC=...
  form to bare values (LC-MS-based metabolomics, positive scan / negative
  scan), matching the ms-proteomics convention for whitelist columns.
- Added required comment[acquisition method] column (default: full scan for
  untargeted QTof/Orbitrap profiling).
- Fixed typo ectrospray ionisation -> electrospray ionization, with the
  correct PSI-MS accession MS:1000073 replacing CHMO:0001659.
- Disambiguated assay name for QC and blank rows that were paired with both
  liver and plasma sub-experiments by suffixing with _liver / _plasma
  derived from the data file path. Without this, the combination-of-columns
  uniqueness validator flagged duplicates.
- Replaced not applicable in characteristics[biological replicate] with an
  auto-incremented integer per source name (sample-metadata requires real
  values for this column).

End-to-end validation of these files against lc-ms-metabolomics is covered
by bigbio/sdrf-pipelines#286 (tests/test_metabolomics_templates.py).

Refs bigbio#680.
ypriverol added a commit to ypriverol/proteomics-metadata-standard that referenced this pull request Apr 20, 2026
Companion to bigbio#680. These SDRFs are re-annotated versions of the examples in
bigbio#680 that validate against the new lc-ms-metabolomics template in
bigbio/sdrf-templates at 1.0.0-dev.

Changes from the bigbio#680 originals:

- technology type and comment[scan polarity] rewritten from NT=...;AC=...
  form to bare values (LC-MS-based metabolomics, positive scan / negative
  scan), matching the ms-proteomics convention for whitelist columns.
- Added required comment[acquisition method] column (default: full scan for
  untargeted QTof/Orbitrap profiling).
- Fixed typo ectrospray ionisation -> electrospray ionization, with the
  correct PSI-MS accession MS:1000073 replacing CHMO:0001659.
- Disambiguated assay name for QC and blank rows that were paired with both
  liver and plasma sub-experiments by suffixing with _liver / _plasma
  derived from the data file path. Without this, the combination-of-columns
  uniqueness validator flagged duplicates.
- Replaced not applicable in characteristics[biological replicate] with an
  auto-incremented integer per source name (sample-metadata requires real
  values for this column).

End-to-end validation of these files against lc-ms-metabolomics is covered
by bigbio/sdrf-pipelines#286 (tests/test_metabolomics_templates.py).

Refs bigbio#680.

Co-authored-by: Matthias Mattanovich <matthias.mattanovich@gmail.com>
@ypriverol ypriverol force-pushed the feat/metabolomics-templates branch from da597e9 to 6c4da02 Compare April 20, 2026 08:50
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_metabolomics_templates.py`:
- Line 50: The filtering for blocking errors only matches exactly logging.ERROR;
update the comprehension that builds blocking to treat missing (None) or
higher-severity levels as blocking by checking the LogicError.error_type on each
item (use the existing errors list and getattr(e, "error_type", None)) and
include the error if the retrieved level is None or >= logging.ERROR; adjust the
blocking assignment that currently uses getattr(e, "error_type", logging.ERROR)
== logging.ERROR to use that None-or-greater check so CRITICAL and missing
error_type are treated as blocking.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df704db2-8fd2-4255-92ce-5898b58c0110

📥 Commits

Reviewing files that changed from the base of the PR and between 8676072 and 6c4da02.

⛔ Files ignored due to path filters (4)
  • tests/data/metabolomics/MTBKS5.sdrf.tsv is excluded by !**/*.tsv
  • tests/data/metabolomics/MTBLS1129.sdrf.tsv is excluded by !**/*.tsv
  • tests/data/metabolomics/MTBLS1903.sdrf.tsv is excluded by !**/*.tsv
  • tests/data/metabolomics/MTBLS547.sdrf.tsv is excluded by !**/*.tsv
📒 Files selected for processing (2)
  • src/sdrf_pipelines/sdrf/sdrf-templates
  • tests/test_metabolomics_templates.py

df = _load_sdrf(path)

errors = validator.validate(df, schema_name, skip_ontology=True)
blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Treat missing or higher-severity error levels as blocking.

Line 50 only fails on logging.ERROR. LogicError.error_type can be None, and any logging.CRITICAL result would also be ignored, allowing blocking validation failures to pass.

🐛 Proposed fix
-    blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR]
+    blocking = [
+        e
+        for e in errors
+        if (error_type := getattr(e, "error_type", logging.ERROR)) is None or error_type >= logging.ERROR
+    ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR]
blocking = [
e
for e in errors
if (error_type := getattr(e, "error_type", logging.ERROR)) is None or error_type >= logging.ERROR
]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_metabolomics_templates.py` at line 50, The filtering for blocking
errors only matches exactly logging.ERROR; update the comprehension that builds
blocking to treat missing (None) or higher-severity levels as blocking by
checking the LogicError.error_type on each item (use the existing errors list
and getattr(e, "error_type", None)) and include the error if the retrieved level
is None or >= logging.ERROR; adjust the blocking assignment that currently uses
getattr(e, "error_type", logging.ERROR) == logging.ERROR to use that
None-or-greater check so CRITICAL and missing error_type are treated as
blocking.

Bumps the sdrf-templates submodule to the feat/metabolomics-templates branch
that introduces ms-metabolomics, lc-ms-metabolomics, and gc-ms-metabolomics
at version 1.0.0-dev (see bigbio/sdrf-templates#feat/metabolomics-templates).

The schema registry auto-discovers the three new templates from the templates
directory + manifest, so no registry.py edits are required. Validation works
end-to-end via the existing 'pattern', 'values', 'ontology', 'mz_value', and
'mz_range_interval' validators; no new validator types were needed.

Adds tests/test_metabolomics_templates.py with four parameterised cases that
load each example SDRF and assert validation against the target schema
produces no blocking errors:

- tests/data/metabolomics/MTBLS1129.sdrf.tsv -> lc-ms-metabolomics
- tests/data/metabolomics/MTBLS1903.sdrf.tsv -> lc-ms-metabolomics
- tests/data/metabolomics/MTBLS547.sdrf.tsv  -> lc-ms-metabolomics
- tests/data/metabolomics/MTBKS5.sdrf.tsv    -> gc-ms-metabolomics

The MTBLS files are re-annotated versions of the SDRFs in
proteomics-sample-metadata#680 (dropped NT=...;AC=EDAM:3172 form for
technology type and scan polarity in favour of bare values consistent with
ms-proteomics; added required comment[acquisition method]; fixed
'ectrospray ionisation' typo to 'electrospray ionization' with correct
PSI-MS accession MS:1000073; disambiguated assay names for QC and blank
rows shared between liver and plasma sub-experiments).

The MTBKS5 file is a real GC-MS dataset sourced from MetaboBank
(https://ddbj.nig.ac.jp/public/metabobank/study/MTBKS5/MTBKS5.sdrf.txt),
converted via the column mapping documented in
sdrf-templates/docs/mappings/metabobank-sdrf-mapping.md.

Tests are pinned to skip_ontology=True so they run without the parquet
ontology cache; ontology checks remain available with a populated cache.

Refs proteomics-sample-metadata#680.

Co-authored-by: Matthias Mattanovich <matthias.mattanovich@gmail.com>
@ypriverol ypriverol force-pushed the feat/metabolomics-templates branch from 6c4da02 to 20bed00 Compare April 20, 2026 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant