Commit 3e4af01

feat(qc): add combined recombination detection strategies B+C+D+F
Closes #1699

Combines four recombination detection strategies:

- B: Spatial uniformity (PR #1737)
- C: Cluster gaps (PR #1738)
- D: Reversion clustering (PR #1739)
- F: Label switching (PR #1741)

Test dataset in this PR: `./data/recomb/enpen/enterovirus/ev-d68/`

Preview: https://nextstrain--nextclade--pr-1742.previews.neherlab.click

Preview with test dataset: https://nextstrain--nextclade--pr-1742.previews.neherlab.click?dataset-url=gh:nextstrain/nextclade@feat/qc-recomb-strategy-combined@/data/recomb/enpen/enterovirus/ev-d68/&input-fasta=example

CLI test:

```
nextclade run \
  --input-dataset data/recomb/enpen/enterovirus/ev-d68/ \
  --output-all output/ \
  data/recomb/enpen/enterovirus/ev-d68/sequences.fasta
```

Note: The current weighted score aggregation (simple sum of strategy scores) is a temporary solution. The scoring mechanism needs further discussion to determine optimal combination approach.
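The note in the commit message describes the interim aggregation as a simple sum of the per-strategy scores. A minimal sketch of that idea follows. The names `StrategyScores` and `combined_score` and the example values are hypothetical and are not taken from the Nextclade codebase.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StrategyScores:
    """Hypothetical container: one score per detection strategy."""
    spatial_uniformity: float = 0.0    # strategy B
    cluster_gaps: float = 0.0          # strategy C
    reversion_clustering: float = 0.0  # strategy D
    label_switching: float = 0.0       # strategy F


def combined_score(scores: StrategyScores) -> float:
    """Interim aggregation: plain, unweighted sum of the strategy scores."""
    return sum(getattr(scores, f.name) for f in fields(scores))


# Made-up values, for illustration only.
example = StrategyScores(
    spatial_uniformity=10.0,
    cluster_gaps=0.0,
    reversion_clustering=25.0,
    label_switching=5.0,
)
print(combined_score(example))  # 40.0
```

With a plain sum, every strategy contributes with the same weight, so one strong signal and several weak ones can produce the same total. That may be part of what the note leaves open for discussion.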
1 parent 901e78d commit 3e4af01

29 files changed

Lines changed: 27963 additions & 7 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,8 @@
 /build/
 /data_dev*/
 /data_local*
-/data/
+/data/*
+!/data/recomb/
 /docs/build/
 /e2e/cli/snapshots/
 /e2e/cli/tmp/
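The switch from `/data/` to `/data/*` matters because Git does not re-include anything below a directory that is itself ignored. The sketch below is a deliberately simplified model of that rule, not Git's actual matcher. It handles only anchored patterns with `*`, and the function names are hypothetical. It shows why a negation on its own would not have been enough.

```python
import re


def compile_rule(rule):
    """Translate one anchored .gitignore rule into (negated, dir_only, regex)."""
    negated = rule.startswith("!")
    body = rule[1:] if negated else rule
    dir_only = body.endswith("/")
    body = body.strip("/")
    # "*" matches any run of characters except "/".
    pattern = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in body)
    return negated, dir_only, re.compile(pattern + r"\Z")


def is_ignored(path, rules):
    """Simplified model: walk the path top-down; the last matching rule wins,
    and nothing below an ignored directory can be re-included."""
    compiled = [compile_rule(r) for r in rules]
    parts = path.split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        is_dir = depth < len(parts)  # every component but the last is a directory
        ignored = False
        for negated, dir_only, regex in compiled:
            if dir_only and not is_dir:
                continue
            if regex.match(prefix):
                ignored = not negated
        if ignored:
            return True
    return False


fasta = "data/recomb/enpen/enterovirus/ev-d68/sequences.fasta"

# Old rule plus a negation: "data" itself is ignored, so the negation never applies.
print(is_ignored(fasta, ["/data/", "!/data/recomb/"]))   # True
# Rules from this commit: only the children of "data" are ignored, except "recomb".
print(is_ignored(fasta, ["/data/*", "!/data/recomb/"]))  # False
```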
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+## 2025-12-10T13:21:04Z
+
+- Update alignment parameters in pathogen.json:
+- Fix gap extension penalty
+- Enable reverse-complement handling
+- Recompute tree topology (ML tree rerun)
+- Regenerate mutation labels for all clades
+- Update reference example sequences
+
+## 2025-11-20T19:02:04Z
+
+Add citation information to README.md
+
+## 2025-11-19T20:40:14Z
+
+Initial release of an Enterovirus D68 dataset for lineage classification!
+
+Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Enterovirus D68 dataset with reference Fermon
+
+| Key | Value |
+|----------------------|-----------------------------------------------------------------------|
+| authors | [Nadia Neuner-Jehle](https://eve-lab.org/people/nadia-neuner-jehle/), [Alejandra González-Sánchez](https://www.vallhebron.com/en/professionals/alejandra-gonzalez-sanchez), [Emma B. Hodcroft](https://eve-lab.org/people/emma-hodcroft/), [ENPEN](https://escv.eu/european-non-polio-enterovirus-network-enpen/) |
+| name | Enterovirus D68 |
+| reference | [AY426531.1](https://www.ncbi.nlm.nih.gov/nuccore/AY426531.1) |
+| workflow | https://github.com/enterovirus-phylo/nextclade_d68 |
+| path | `enpen/enterovirus/ev-d68` |
+| clade definitions | A–C (D) |
+
+## Citation
+
+If you use this dataset in your research, please cite:
+
+> Neuner-Jehle, N., González Sánchez, A., Hodcroft, E. B., & European Non-Polio Enterovirus Network (ENPEN). (2025). *enterovirus-phylo/nextclade_d68: Enterovirus D68 Nextclade Dataset v1.0.0* (v1.0.0--2025-11-18). Zenodo. https://doi.org/10.5281/zenodo.17642338
+
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17642338.svg)](https://doi.org/10.5281/zenodo.17642338)
+
+## Scope of this dataset
+
+Based on full-genome sequences, this dataset uses the **Fermon reference sequence** ([AY426531.1](https://www.ncbi.nlm.nih.gov/nuccore/AY426531.1)), originally isolated in 1962. It serves as the basis for quality control, clade assignment, and mutation calling across global EV-D68 diversity.
+
+*Note: The Fermon reference differs substantially from currently circulating strains.* This is common for enterovirus datasets, in contrast to some other virus datasets (e.g., seasonal influenza), where the reference is updated more frequently to reflect recent lineages.
+
+To address this, the dataset is *rooted* on a Static Inferred Ancestor — a phylogenetically reconstructed ancestral sequence near the tree root. This provides a stable reference point that can be used, optionally, as an alternative for mutation calling.
+
+## Features
+
+This dataset supports:
+
+- Assignment of subgenotypes
+- Phylogenetic placement
+- Sequence quality control (QC)
+
+## Subgenogroups of Enterovirus D68
+
+Clade designations follow the global diversity of EV-D68: A (A1–A2/D), B (B1–B3), and C. The label "pre-ABC" indicates old, basal lineages that are no longer circulating. Sequences labeled "pre-ABC" or "unassigned" may indicate sequencing or assembly issues and should be assessed carefully.
+
+These designations are based on the phylogenetic structure and mutations, and are widely used in molecular epidemiology, similar to subgenotype systems for other enteroviruses. Unlike influenza (H1N1, H3N2) or SARS-CoV-2, there is no universal, standardized global lineage nomenclature for enteroviruses. Naming follows conventions from published studies and surveillance practices.
+
+## Reference types
+
+This dataset includes several reference points used in analyses:
+- *Reference:* RefSeq or similarly established reference sequence. Here Fermon.
+
+- *Parent:* The nearest ancestral node of a sample in the tree, used to infer branch-specific mutations.
+
+- *Clade founder:* The inferred ancestral node defining a clade (e.g., A2, B3). Mutations "since clade founder" describe changes that define that clade.
+
+- *Static Inferred Ancestor:* Reconstructed ancestral sequence inferred with an outgroup, representing the likely founder of EV-D68. Serves as a stable reference.
+
+- *Tree root:* Corresponds to the root of the tree, it may change in future updates as more data become available.
+
+All references use the coordinate system of the Fermon sequence.
+
+## Issues & Contact
+- For questions or suggestions, please [open an issue](https://github.com/enterovirus-phylo/nextclade_d68/issues) or email: eve-group[at]swisstph.ch
+
+## What is a Nextclade dataset?
+
+A Nextclade dataset includes the reference sequence, genome annotations, tree, clade definitions, and QC rules. Learn more in the [Nextclade documentation](https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html).
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+##gff-version 3
+#!gff-spec-version 1.21
+#!processor NCBI annotwriter
+##sequence-region AY426531.2 1 7367
+##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=42789
+# seqname source feature start end score strand frame attribute
+AY426531.1 Genbank region 1 7367 . + . ID=AY426531.1:1..7367;Dbxref=taxon:42789;country=USA;gb-acronym=EV-D68;gbkey=Src;mol_type=genomic RNA;note=prototype strain of Enterovirus 68;old-name=Enterovirus 68;strain=Fermon
+AY426531.1 Genbank CDS 733 939 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAR98503.1:1..69
+AY426531.1 Genbank CDS 940 1683 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAR98503.1:70..317
+AY426531.1 Genbank CDS 1684 2388 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAR98503.1:318..552
+AY426531.1 Genbank CDS 2389 3315 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAR98503.1:553..861
+AY426531.1 Genbank CDS 3316 3756 . + . Name=2A;gbkey=Prot;product=2A;ID=id-AAR98503.1:862..1008
+AY426531.1 Genbank CDS 3757 4053 . + . Name=2B;gbkey=Prot;product=2B;ID=id-AAR98503.1:1009..1107
+AY426531.1 Genbank CDS 4054 5043 . + . Name=2C;gbkey=Prot;product=2C;ID=id-AAR98503.1:1108..1437
+AY426531.1 Genbank CDS 5044 5310 . + . Name=3A;gbkey=Prot;product=3A;ID=id-AAR98503.1:1438..1526
+AY426531.1 Genbank CDS 5311 5376 . + . Name=3B;gbkey=Prot;product=3B;ID=id-AAR98503.1:1527..1548
+AY426531.1 Genbank CDS 5377 5925 . + . Name=3C;gbkey=Prot;product=3C;ID=id-AAR98503.1:1549..1731
+AY426531.1 Genbank CDS 5926 7296 . + . Name=3D;gbkey=Prot;product=3D;ID=id-AAR98503.1:1732..2188
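The CDS rows in this annotation split one polyprotein into mature peptides, with the amino-acid range of each peptide encoded after the colon in its `ID` attribute. The sketch below is a rough sanity check, with hypothetical helper names, and is not part of the Nextclade tooling. It assumes the nucleotide span of each row should be three times its amino-acid span and that consecutive rows should be contiguous. The rows are copied from the hunk and split on whitespace only because they appear with spaces here; real GFF3 is tab-delimited.

```python
GFF_CDS = """\
AY426531.1 Genbank CDS 733 939 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAR98503.1:1..69
AY426531.1 Genbank CDS 940 1683 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAR98503.1:70..317
AY426531.1 Genbank CDS 1684 2388 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAR98503.1:318..552
AY426531.1 Genbank CDS 2389 3315 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAR98503.1:553..861
AY426531.1 Genbank CDS 3316 3756 . + . Name=2A;gbkey=Prot;product=2A;ID=id-AAR98503.1:862..1008
AY426531.1 Genbank CDS 3757 4053 . + . Name=2B;gbkey=Prot;product=2B;ID=id-AAR98503.1:1009..1107
AY426531.1 Genbank CDS 4054 5043 . + . Name=2C;gbkey=Prot;product=2C;ID=id-AAR98503.1:1108..1437
AY426531.1 Genbank CDS 5044 5310 . + . Name=3A;gbkey=Prot;product=3A;ID=id-AAR98503.1:1438..1526
AY426531.1 Genbank CDS 5311 5376 . + . Name=3B;gbkey=Prot;product=3B;ID=id-AAR98503.1:1527..1548
AY426531.1 Genbank CDS 5377 5925 . + . Name=3C;gbkey=Prot;product=3C;ID=id-AAR98503.1:1549..1731
AY426531.1 Genbank CDS 5926 7296 . + . Name=3D;gbkey=Prot;product=3D;ID=id-AAR98503.1:1732..2188
"""


def parse_cds(text):
    """Yield (name, nt_start, nt_end, aa_start, aa_end) for each CDS row."""
    for line in text.splitlines():
        cols = line.split(None, 8)  # 9 GFF columns; attributes stay in one piece
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";"))
        aa_start, aa_end = attrs["ID"].rsplit(":", 1)[1].split("..")
        yield attrs["Name"], int(cols[3]), int(cols[4]), int(aa_start), int(aa_end)


def check_polyprotein(cds):
    """Return a list of problems; an empty list means the rows are consistent."""
    problems = []
    for i, (name, start, end, aa_start, aa_end) in enumerate(cds):
        nt_len = end - start + 1
        aa_len = aa_end - aa_start + 1
        if nt_len != 3 * aa_len:
            problems.append(f"{name}: {nt_len} nt does not encode {aa_len} aa")
        if i > 0 and start != cds[i - 1][2] + 1:
            problems.append(f"{name}: not contiguous with {cds[i - 1][0]}")
    return problems


cds = list(parse_cds(GFF_CDS))
print(len(cds), check_polyprotein(cds))  # 11 []
```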
