@@ -12,199 +12,68 @@ Memory: 384GiB
12 12 GPUs: 0
13 13 ```
14 14
15- ## WGS (Illumina)
16-
17- ### Runtime
18-
19- Runtime is on HG003 (all chromosomes).
20- Reported runtime is an average of 5 runs.
21-
22- Stage | Time (minutes)
23- -------------------------------- | ------------------
24- make_examples | 47m4.92s
25- call_variants | 15m56.52s
26- postprocess_variants (with gVCF) | 7m0.99s
27- vcf_stats_report (optional) | 5m17.67s (optional)
28- total | 83m57.12s (1h23m57.12s)
29-
30- ### Accuracy
31-
32- hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
33- held out while training.
34-
35- | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
36- | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
37- | INDEL | 501594 | 2907 | 1190 | 0.994238 | 0.997729 | 0.99598 |
38- | SNP | 3306720 | 20776 | 4880 | 0.993756 | 0.998527 | 0.996136 |
39-
40- [See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/WGS/deepvariant.output.visual_report.html)
41-
42- ## WES (Illumina)
43-
44- ### Runtime
45-
46- Runtime is on HG003 (all chromosomes).
47- Reported runtime is an average of 5 runs.
48-
49- Stage | Time (minutes)
50- -------------------------------- | -----------------
51- make_examples | 3m0.33s
52- call_variants | 0m33.72s
53- postprocess_variants (with gVCF) | 0m39.24s
54- vcf_stats_report (optional) | 0m5.10s (optional)
55- total | 5m7.71s
56-
57- ### Accuracy
58-
59- hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
60- held out while training.
61-
62- | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
63- | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
64- | INDEL | 1024 | 27 | 8 | 0.97431 | 0.992417 | 0.98328 |
65- | SNP | 24983 | 296 | 60 | 0.988291 | 0.997604 | 0.992926 |
66-
67- [See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/WES/deepvariant.output.visual_report.html)
68-
69- ## PacBio (HiFi)
70-
71- ### Updated dataset
72-
73- We have updated the PacBio test data from HG003 Sequel-II to
74- latest Revio with SPRQ chemistry data to showcase performance on the updated
75- platform and chemistry. The numbers reported here are generated using the bam
76- that can be found in:
77-
78- ```bash
79- gs://deepvariant/pacbio-case-study-testdata/HG003.SPRQ.pacbio.GRCh38.nov2024.bam
80- ```
81-
82- Which is also available through [here](https://downloads.pacbcloud.com/public/revio/2024Q4/WGS/GIAB_trio/HG003/analysis/GRCh38.m84039_241002_000337_s3.hifi_reads.bc2020.bam).
83-
84- ### Runtime
85-
86- Runtime is on HG003 (all chromosomes).
87- Reported runtime is an average of 5 runs.
88-
89- Stage | Time (minutes)
90- -------------------------------- | -------------------
91- make_examples | 33m46.75s
92- call_variants | 11m38.86s
93- postprocess_variants (with gVCF) | 5m12.45s
94- vcf_stats_report (optional) | 5m34.81s (optional)
95- total | 65m27.90s (1h05m27.90s)
96-
97- ### Accuracy
98-
99- hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
100- held out while training.
101-
102- Starting from v1.4.0, users don't need to phase the BAMs first, and only need
103- to run DeepVariant once.
104-
105- | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
106- | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
107- | INDEL | 501455 | 3046 | 2986 | 0.993962 | 0.994296 | 0.994129 |
108- | SNP | 3321751 | 5744 | 4032 | 0.998274 | 0.998789 | 0.998532 |
109-
110- [See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/PACBIO/deepvariant.output.visual_report.html)
111-
112- ## ONT_R104
113-
114- ### Runtime
115-
116- Runtime is on HG003 reads (all chromosomes).
117- Reported runtime is an average of 5 runs.
118-
119- Stage | Time (minutes)
120- -------------------------------- | --------------------
121- make_examples | 46m29.14s
122- call_variants | 53m48.26s
123- postprocess_variants (with gVCF) | 11m25.74s
124- vcf_stats_report (optional) | 7m22.90s (optional)
125- total | 127m34.97s (2h07m34.97s)
126-
127- ### Accuracy
128-
129- hap.py results on HG003 (all chromosomes, using NIST v4.2.1
130- truth), which was held out while training.
131-
132- | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
133- | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
134- | INDEL | 461818 | 42683 | 31344 | 0.915396 | 0.938385 | 0.926748 |
135- | SNP | 3321289 | 6206 | 5476 | 0.998135 | 0.998355 | 0.998245 |
136-
137- [See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/ONT_R104/deepvariant.output.visual_report.html)
138-
139- ## Hybrid (Illumina + PacBio HiFi)
140-
141- ### Runtime
142-
143- Runtime is on HG003 (all chromosomes).
144- Reported runtime is an average of 5 runs.
145-
146- Stage | Time (minutes)
147- -------------------------------- | ------------------
148- make_examples | 60m4.06s
149- call_variants | 62m23.86s
150- postprocess_variants (with gVCF) | 4m10.56s
151- vcf_stats_report (optional) | 5m16.31s (optional)
152- total | 162m45.17s (2h42m45.17s)
153-
154- ### Accuracy
155-
156- Evaluating on HG003 (all chromosomes, using NIST v4.2.1 truth), which was held
157- out while training the hybrid model.
158-
159- | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
160- | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
161- | INDEL | 503264 | 1237 | 2052 | 0.997548 | 0.996129 | 0.996838 |
162- | SNP | 3324021 | 3474 | 1856 | 0.998956 | 0.999442 | 0.999199 |
163-
164- [See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/HYBRID/deepvariant.output.visual_report.html)
165-
166- ## Inspect outputs that produced the metrics above
167-
168- The DeepVariant VCFs, gVCFs, and hap.py evaluation outputs are available at:
169-
170- ```
171- gs://deepvariant/case-study-outputs
172- ```
173-
174- You can also inspect them in a web browser here:
175- https://42basepairs.com/browse/gs/deepvariant/case-study-outputs
176-
177- ## How to reproduce the metrics on this page
178-
179- For simplicity and consistency, we report runtime with a
180- [CPU instance with 96 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform).
181- This is NOT the fastest or cheapest configuration.
182-
183- Use `gcloud compute ssh` to log in to the newly created instance.
184-
185- Download and run any of the following case study scripts:
186-
187- ```
188- # Get the script.
189- curl -O https://raw.githubusercontent.com/google/deepvariant/r1.9/scripts/inference_deepvariant.sh
190-
191- # WGS
192- bash inference_deepvariant.sh --model_preset WGS
193-
194- # WES
195- bash inference_deepvariant.sh --model_preset WES
196-
197- # PacBio
198- bash inference_deepvariant.sh --model_preset PACBIO
199-
200- # ONT_R104
201- bash inference_deepvariant.sh --model_preset ONT_R104
202-
203- # Hybrid
204- bash inference_deepvariant.sh --model_preset HYBRID_PACBIO_ILLUMINA
205- ```
206-
207- Runtime metrics are taken from the resulting log after each stage of
208- DeepVariant. The runtime numbers reported above are the average of 5 runs each.
209- The accuracy metrics come from the hap.py summary.csv output file.
210- The runs are deterministic so all 5 runs produced the same output.
15+ Reported values are based on evaluations of HG003.
16+
17+ ## Accuracy
18+
19+ Below is the full genome accuracy, as measured by
20+ [hap.py](https://github.com/Illumina/hap.py).
21+
22+ model_type | Type | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | Recall | Precision | F1_Score
23+ :--------------------- |:----- | ----------: | -------: | -------: | ----------: | -------: | -------: | --------: | -------:
24+ wgs | INDEL | 504501 | 501594 | 2907 | 937937 | 1190 | 0.994238 | 0.997729 | 0.99598
25+ wgs | SNP | 3327496 | 3306720 | 20776 | 3817962 | 4880 | 0.993756 | 0.998527 | 0.996136
26+ exome | INDEL | 1051 | 1024 | 27 | 1485 | 8 | 0.97431 | 0.992417 | 0.98328
27+ exome | SNP | 25279 | 24983 | 296 | 27709 | 60 | 0.988291 | 0.997604 | 0.992926
28+ pacbio | INDEL | 504501 | 501598 | 2903 | 986955 | 2949 | 0.994246 | 0.994368 | 0.994307
29+ pacbio | SNP | 3327495 | 3321742 | 5753 | 4331772 | 4107 | 0.998271 | 0.998767 | 0.998519
30+ ont-r104 | INDEL | 504501 | 463074 | 41427 | 895345 | 35116 | 0.917885 | 0.931685 | 0.924733
31+ ont-r104 | SNP | 3327495 | 3321037 | 6458 | 4408429 | 5729 | 0.998059 | 0.998279 | 0.998169
32+ hybrid-pacbio-illumina | INDEL | 504501 | 503264 | 1237 | 998274 | 2052 | 0.997548 | 0.996129 | 0.996838
33+ hybrid-pacbio-illumina | SNP | 3327495 | 3324021 | 3474 | 4068058 | 1856 | 0.998956 | 0.999442 | 0.999199
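
As a quick sanity check on how the columns above relate, here is a minimal Python sketch that recomputes recall and F1 for the `wgs` INDEL row. It assumes recall is `TRUTH.TP / (TRUTH.TP + TRUTH.FN)` and F1 is the harmonic mean of recall and precision. Precision is taken as reported: dividing `TRUTH.TP` by `TRUTH.TP + QUERY.FP` does not reproduce the reported value, so it presumably depends on query-side counts that the table does not list. The function names are illustrative and are not part of hap.py.

```python
def recall(truth_tp: int, truth_fn: int) -> float:
    """Fraction of truth variants that were called: TP / (TP + FN)."""
    return truth_tp / (truth_tp + truth_fn)


def f1_score(recall_value: float, precision_value: float) -> float:
    """Harmonic mean of recall and precision."""
    return 2 * recall_value * precision_value / (recall_value + precision_value)


# Values copied from the wgs INDEL row of the accuracy table.
wgs_indel_recall = recall(truth_tp=501594, truth_fn=2907)

# Precision is used as reported (assumption: it cannot be rebuilt from
# the TRUTH.TP and QUERY.FP columns alone).
wgs_indel_f1 = f1_score(wgs_indel_recall, precision_value=0.997729)

print(f"recall={wgs_indel_recall:.6f} f1={wgs_indel_f1:.5f}")
```

The same two functions can be applied to any other row to confirm that `TRUTH.TP + TRUTH.FN` matches `TRUTH.TOTAL` and that the reported F1 is consistent with the reported recall and precision.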
34+
35+ ## Runtime
36+
37+ Each case study was run 5 times and the runtimes were averaged. Here we report
38+ the mean runtime in seconds, the standard deviation of the runtimes, and the mean
39+ in a duration format (`mean_hruntime`; hours, minutes, seconds).
40+
41+ model_type | stage | mean_runtime (s) | std_runtime | mean_hruntime
42+ :--------------------- | :------------------- | ---------------: | ----------: | :------------
43+ wgs | make_examples | 2887.1 | 68.658 | 48m 7s
44+ wgs | call_variants | 939.88 | 19.599 | 15m 39s
45+ wgs | postprocess_variants | 403.37 | 3.327 | 6m 43s
46+ wgs | vcf_stats | 317.07 | 1.123 | 5m 17s
47+ wgs | total | 4230.35 | | 1h 10m 30s
48+ exome | make_examples | 176.57 | 2.153 | 2m 56s
49+ exome | call_variants | 33.28 | 0.224 | 33s
50+ exome | postprocess_variants | 29.28 | 0.465 | 29s
51+ exome | vcf_stats | 4.95 | 0.046 | 4s
52+ exome | total | 239.13 | | 3m 59s
53+ pacbio | make_examples | 2036.71 | 104.087 | 33m 56s
54+ pacbio | call_variants | 697.31 | 61.092 | 11m 37s
55+ pacbio | postprocess_variants | 291.27 | 6.432 | 4m 51s
56+ pacbio | vcf_stats | 340.26 | 11.488 | 5m 40s
57+ pacbio | total | 3025.29 | | 50m 25s
58+ ont-r104 | make_examples | 3042.24 | 20.359 | 50m 42s
59+ ont-r104 | call_variants | 3286.89 | 104.469 | 54m 46s
60+ ont-r104 | postprocess_variants | 669.59 | 5.558 | 11m 9s
61+ ont-r104 | vcf_stats | 444.71 | 10.684 | 7m 24s
62+ ont-r104 | total | 6998.72 | | 1h 56m 38s
63+ hybrid-pacbio-illumina | make_examples | 3648.28 | 34.422 | 1h 48s
64+ hybrid-pacbio-illumina | call_variants | 4215.97 | 314.295 | 1h 10m 15s
65+ hybrid-pacbio-illumina | postprocess_variants | 235.97 | 2.797 | 3m 55s
66+ hybrid-pacbio-illumina | vcf_stats | 305.55 | 1.529 | 5m 5s
67+ hybrid-pacbio-illumina | total | 8100.22 | | 2h 15m
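
In the table above, each `total` row equals the sum of the `make_examples`, `call_variants`, and `postprocess_variants` means; `vcf_stats` is not included. The following Python sketch reproduces the `wgs` total and the `mean_hruntime` strings. The formatting rule (truncate fractional seconds and omit zero-valued components) is inferred from the table entries and is an assumption, and `format_duration` is a hypothetical helper, not the script that generated this page.

```python
def format_duration(seconds: float) -> str:
    """Format seconds as e.g. '1h 10m 30s', omitting zero-valued parts."""
    total = int(seconds)  # truncate fractional seconds (assumed rule)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        # Always emit something, even for durations under one second.
        parts.append(f"{secs}s")
    return " ".join(parts)


# Mean stage runtimes (seconds) copied from the wgs rows of the table.
wgs_stages = {
    "make_examples": 2887.1,
    "call_variants": 939.88,
    "postprocess_variants": 403.37,
}

# The total excludes vcf_stats, matching the table's total row.
wgs_total = sum(wgs_stages.values())

print(round(wgs_total, 2), format_duration(wgs_total))
```

Omitting zero components is what would yield entries such as `1h 48s` and `2h 15m` under this rule.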
68+
69+ **Total Runtime**
70+
71+ The total rows are summarized below as well:
72+
73+ model_type | sample | mean_hruntime
74+ :--------------------- | :----- | :------------
75+ wgs | HG003 | 1h 10m 30s
76+ exome | HG003 | 3m 59s
77+ pacbio | HG003 | 50m 25s
78+ ont-r104 | HG003 | 1h 56m 38s
79+ hybrid-pacbio-illumina | HG003 | 2h 15m