feat: wire metabolomics SDRF templates and add validation fixtures#286
feat: wire metabolomics SDRF templates and add validation fixtures#286ypriverol wants to merge 1 commit intobigbio:mainfrom
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 50 minutes and 47 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (4)
📒 Files selected for processing (2)
📝 WalkthroughWalkthroughUpdated a submodule pointer in the SDRF templates directory and added a new test module that validates metabolomics SDRF TSV examples against template schemas using parameterized test cases with ontology validation disabled. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Companion to bigbio#680. These SDRFs are re-annotated versions of the examples in bigbio#680 that validate against the new lc-ms-metabolomics template in bigbio/sdrf-templates at 1.0.0-dev. Changes from the bigbio#680 originals: - technology type and comment[scan polarity] rewritten from NT=...;AC=... form to bare values (LC-MS-based metabolomics, positive scan / negative scan), matching the ms-proteomics convention for whitelist columns. - Added required comment[acquisition method] column (default: full scan for untargeted QTof/Orbitrap profiling). - Fixed typo ectrospray ionisation -> electrospray ionization, with the correct PSI-MS accession MS:1000073 replacing CHMO:0001659. - Disambiguated assay name for QC and blank rows that were paired with both liver and plasma sub-experiments by suffixing with _liver / _plasma derived from the data file path. Without this, the combination-of-columns uniqueness validator flagged duplicates. - Replaced not applicable in characteristics[biological replicate] with an auto-incremented integer per source name (sample-metadata requires real values for this column). End-to-end validation of these files against lc-ms-metabolomics is covered by bigbio/sdrf-pipelines#286 (tests/test_metabolomics_templates.py). Refs bigbio#680.
Companion to bigbio#680. These SDRFs are re-annotated versions of the examples in bigbio#680 that validate against the new lc-ms-metabolomics template in bigbio/sdrf-templates at 1.0.0-dev. Changes from the bigbio#680 originals: - technology type and comment[scan polarity] rewritten from NT=...;AC=... form to bare values (LC-MS-based metabolomics, positive scan / negative scan), matching the ms-proteomics convention for whitelist columns. - Added required comment[acquisition method] column (default: full scan for untargeted QTof/Orbitrap profiling). - Fixed typo ectrospray ionisation -> electrospray ionization, with the correct PSI-MS accession MS:1000073 replacing CHMO:0001659. - Disambiguated assay name for QC and blank rows that were paired with both liver and plasma sub-experiments by suffixing with _liver / _plasma derived from the data file path. Without this, the combination-of-columns uniqueness validator flagged duplicates. - Replaced not applicable in characteristics[biological replicate] with an auto-incremented integer per source name (sample-metadata requires real values for this column). End-to-end validation of these files against lc-ms-metabolomics is covered by bigbio/sdrf-pipelines#286 (tests/test_metabolomics_templates.py). Refs bigbio#680. Co-authored-by: Matthias Mattanovich <matthias.mattanovich@gmail.com>
da597e9 to
6c4da02
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_metabolomics_templates.py`:
- Line 50: The filtering for blocking errors only matches exactly logging.ERROR;
update the comprehension that builds blocking to treat missing (None) or
higher-severity levels as blocking by checking the LogicError.error_type on each
item (use the existing errors list and getattr(e, "error_type", None)) and
include the error if the retrieved level is None or >= logging.ERROR; adjust the
blocking assignment that currently uses getattr(e, "error_type", logging.ERROR)
== logging.ERROR to use that None-or-greater check so CRITICAL and missing
error_type are treated as blocking.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: df704db2-8fd2-4255-92ce-5898b58c0110
⛔ Files ignored due to path filters (4)
tests/data/metabolomics/MTBKS5.sdrf.tsvis excluded by!**/*.tsvtests/data/metabolomics/MTBLS1129.sdrf.tsvis excluded by!**/*.tsvtests/data/metabolomics/MTBLS1903.sdrf.tsvis excluded by!**/*.tsvtests/data/metabolomics/MTBLS547.sdrf.tsvis excluded by!**/*.tsv
📒 Files selected for processing (2)
src/sdrf_pipelines/sdrf/sdrf-templatestests/test_metabolomics_templates.py
| df = _load_sdrf(path) | ||
|
|
||
| errors = validator.validate(df, schema_name, skip_ontology=True) | ||
| blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR] |
There was a problem hiding this comment.
Treat missing or higher-severity error levels as blocking.
Line 50 only fails on logging.ERROR. LogicError.error_type can be None, and any logging.CRITICAL result would also be ignored, allowing blocking validation failures to pass.
🐛 Proposed fix
- blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR]
+ blocking = [
+ e
+ for e in errors
+ if (error_type := getattr(e, "error_type", logging.ERROR)) is None or error_type >= logging.ERROR
+ ]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| blocking = [e for e in errors if getattr(e, "error_type", logging.ERROR) == logging.ERROR] | |
| blocking = [ | |
| e | |
| for e in errors | |
| if (error_type := getattr(e, "error_type", logging.ERROR)) is None or error_type >= logging.ERROR | |
| ] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_metabolomics_templates.py` at line 50, The filtering for blocking
errors only matches exactly logging.ERROR; update the comprehension that builds
blocking to treat missing (None) or higher-severity levels as blocking by
checking the LogicError.error_type on each item (use the existing errors list
and getattr(e, "error_type", None)) and include the error if the retrieved level
is None or >= logging.ERROR; adjust the blocking assignment that currently uses
getattr(e, "error_type", logging.ERROR) == logging.ERROR to use that
None-or-greater check so CRITICAL and missing error_type are treated as
blocking.
Bumps the sdrf-templates submodule to the feat/metabolomics-templates branch that introduces ms-metabolomics, lc-ms-metabolomics, and gc-ms-metabolomics at version 1.0.0-dev (see bigbio/sdrf-templates#feat/metabolomics-templates). The schema registry auto-discovers the three new templates from the templates directory + manifest, so no registry.py edits are required. Validation works end-to-end via the existing 'pattern', 'values', 'ontology', 'mz_value', and 'mz_range_interval' validators; no new validator types were needed. Adds tests/test_metabolomics_templates.py with four parameterised cases that load each example SDRF and assert validation against the target schema produces no blocking errors: - tests/data/metabolomics/MTBLS1129.sdrf.tsv -> lc-ms-metabolomics - tests/data/metabolomics/MTBLS1903.sdrf.tsv -> lc-ms-metabolomics - tests/data/metabolomics/MTBLS547.sdrf.tsv -> lc-ms-metabolomics - tests/data/metabolomics/MTBKS5.sdrf.tsv -> gc-ms-metabolomics The MTBLS files are re-annotated versions of the SDRFs in proteomics-sample-metadata#680 (dropped NT=...;AC=EDAM:3172 form for technology type and scan polarity in favour of bare values consistent with ms-proteomics; added required comment[acquisition method]; fixed 'ectrospray ionisation' typo to 'electrospray ionization' with correct PSI-MS accession MS:1000073; disambiguated assay names for QC and blank rows shared between liver and plasma sub-experiments). The MTBKS5 file is a real GC-MS dataset sourced from MetaboBank (https://ddbj.nig.ac.jp/public/metabobank/study/MTBKS5/MTBKS5.sdrf.txt), converted via the column mapping documented in sdrf-templates/docs/mappings/metabobank-sdrf-mapping.md. Tests are pinned to skip_ontology=True so they run without the parquet ontology cache; ontology checks remain available with a populated cache. Refs proteomics-sample-metadata#680. Co-authored-by: Matthias Mattanovich <matthias.mattanovich@gmail.com>
6c4da02 to
20bed00
Compare
Summary
Bumps the
sdrf-templatessubmodule to the metabolomics branch (bigbio/sdrf-templates#29) that addsms-metabolomics,lc-ms-metabolomics, andgc-ms-metabolomicsat version 1.0.0-dev.The schema registry auto-discovers the three new templates from the templates directory + manifest, so no
registry.pyedits are required. Validation works end-to-end via the existingpattern,values,ontology,mz_value, andmz_range_intervalvalidators; no new validator types were needed.Test fixtures
Adds
tests/test_metabolomics_templates.pywith four parameterised cases that load each example SDRF and assert validation against the target schema produces no blocking errors:tests/data/metabolomics/MTBLS1129.sdrf.tsvlc-ms-metabolomicstests/data/metabolomics/MTBLS1903.sdrf.tsvlc-ms-metabolomicstests/data/metabolomics/MTBLS547.sdrf.tsvlc-ms-metabolomicstests/data/metabolomics/MTBKS5.sdrf.tsvgc-ms-metabolomicsMTBLS re-annotation deltas (vs PR #680 originals)
technology typeandcomment[scan polarity]switched fromNT=...;AC=...form to bare values, matchingms-proteomicsconvention for whitelist columns.comment[acquisition method].ectrospray ionisationtypo →electrospray ionizationwith correct PSI-MS accessionMS:1000073.assay namefor QC and blank rows that were paired with both liver and plasma sub-experiments (suffix_liver/_plasmaderived from the data file path).not applicableincharacteristics[biological replicate]with auto-incremented integer per source name.MTBKS5 (MetaboBank example)
Sourced from
https://ddbj.nig.ac.jp/public/metabobank/study/MTBKS5/MTBKS5.sdrf.txt. Real GC-MS dataset (Agilent 6890N + Pegasus III TOF, EI ionization, Rtx-5 Sil MS column). 291 rows convert togc-ms-metabolomicsand validate with zero errors.Verification
Tests are pinned to
skip_ontology=Trueso they run without the parquet ontology cache; ontology checks remain available with a populated cache.Refs
Test plan
pytest tests/test_metabolomics_templates.py -v— 4/4 passSummary by CodeRabbit
Tests
Chores