Commit 46027eb

Update metrics.md for r1.10.0-beta
1 parent aec3fbc commit 46027eb

1 file changed: +65 -196 lines changed

docs/metrics.md

Lines changed: 65 additions & 196 deletions
@@ -12,199 +12,68 @@ Memory: 384GiB
GPUs: 0
```

**Removed (old lines 15-210):**

## WGS (Illumina)

### Runtime

Runtime is on HG003 (all chromosomes).
Reported runtime is an average of 5 runs.

Stage | Time
-------------------------------- | ------------------
make_examples | 47m4.92s
call_variants | 15m56.52s
postprocess_variants (with gVCF) | 7m0.99s
vcf_stats_report (optional) | 5m17.67s (optional)
total | 83m57.12s (1h23m57.12s)

### Accuracy

hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
held out while training.

| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
| INDEL | 501594 | 2907 | 1190 | 0.994238 | 0.997729 | 0.99598 |
| SNP | 3306720 | 20776 | 4880 | 0.993756 | 0.998527 | 0.996136 |

[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/WGS/deepvariant.output.visual_report.html)

## WES (Illumina)

### Runtime

Runtime is on HG003 (all chromosomes).
Reported runtime is an average of 5 runs.

Stage | Time
-------------------------------- | -----------------
make_examples | 3m0.33s
call_variants | 0m33.72s
postprocess_variants (with gVCF) | 0m39.24s
vcf_stats_report (optional) | 0m5.10s (optional)
total | 5m7.71s

### Accuracy

hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
held out while training.

| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
| INDEL | 1024 | 27 | 8 | 0.97431 | 0.992417 | 0.98328 |
| SNP | 24983 | 296 | 60 | 0.988291 | 0.997604 | 0.992926 |

[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/WES/deepvariant.output.visual_report.html)

## PacBio (HiFi)

### Updated dataset

We have updated the PacBio test data from HG003 Sequel-II to
the latest Revio with SPRQ chemistry data to showcase performance on the updated
platform and chemistry. The numbers reported here are generated using the BAM
that can be found in:

```bash
gs://deepvariant/pacbio-case-study-testdata/HG003.SPRQ.pacbio.GRCh38.nov2024.bam
```

It is also available [here](https://downloads.pacbcloud.com/public/revio/2024Q4/WGS/GIAB_trio/HG003/analysis/GRCh38.m84039_241002_000337_s3.hifi_reads.bc2020.bam).

### Runtime

Runtime is on HG003 (all chromosomes).
Reported runtime is an average of 5 runs.

Stage | Time
-------------------------------- | -------------------
make_examples | 33m46.75s
call_variants | 11m38.86s
postprocess_variants (with gVCF) | 5m12.45s
vcf_stats_report (optional) | 5m34.81s (optional)
total | 65m27.90s (1h05m27.90s)

### Accuracy

hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
held out while training.

Starting from v1.4.0, users don't need to phase the BAMs first, and only need
to run DeepVariant once.

| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
| INDEL | 501455 | 3046 | 2986 | 0.993962 | 0.994296 | 0.994129 |
| SNP | 3321751 | 5744 | 4032 | 0.998274 | 0.998789 | 0.998532 |

[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/PACBIO/deepvariant.output.visual_report.html)

## ONT_R104

### Runtime

Runtime is on HG003 reads (all chromosomes).
Reported runtime is an average of 5 runs.

Stage | Time
-------------------------------- | --------------------
make_examples | 46m29.14s
call_variants | 53m48.26s
postprocess_variants (with gVCF) | 11m25.74s
vcf_stats_report (optional) | 7m22.90s (optional)
total | 127m34.97s (2h07m34.97s)

### Accuracy

hap.py results on HG003 (all chromosomes, using NIST v4.2.1
truth), which was held out while training.

| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
| INDEL | 461818 | 42683 | 31344 | 0.915396 | 0.938385 | 0.926748 |
| SNP | 3321289 | 6206 | 5476 | 0.998135 | 0.998355 | 0.998245 |

[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/ONT_R104/deepvariant.output.visual_report.html)

## Hybrid (Illumina + PacBio HiFi)

### Runtime

Runtime is on HG003 (all chromosomes).
Reported runtime is an average of 5 runs.

Stage | Time
-------------------------------- | ------------------
make_examples | 60m4.06s
call_variants | 62m23.86s
postprocess_variants (with gVCF) | 4m10.56s
vcf_stats_report (optional) | 5m16.31s (optional)
total | 162m45.17s (2h42m45.17s)

### Accuracy

Evaluation is on HG003 (all chromosomes, using NIST v4.2.1 truth), which was held
out while training the hybrid model.

| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
| INDEL | 503264 | 1237 | 2052 | 0.997548 | 0.996129 | 0.996838 |
| SNP | 3324021 | 3474 | 1856 | 0.998956 | 0.999442 | 0.999199 |

[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/HYBRID/deepvariant.output.visual_report.html)

## Inspect outputs that produced the metrics above

The DeepVariant VCFs, gVCFs, and hap.py evaluation outputs are available at:

```
gs://deepvariant/case-study-outputs
```

You can also inspect them in a web browser here:
https://42basepairs.com/browse/gs/deepvariant/case-study-outputs

## How to reproduce the metrics on this page

For simplicity and consistency, we report runtime with a
[CPU instance with 96 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform).
This is NOT the fastest or cheapest configuration.

Use `gcloud compute ssh` to log in to the newly created instance.

Download and run any of the following case study scripts:

```
# Get the script.
curl -O https://raw.githubusercontent.com/google/deepvariant/r1.9/scripts/inference_deepvariant.sh

# WGS
bash inference_deepvariant.sh --model_preset WGS

# WES
bash inference_deepvariant.sh --model_preset WES

# PacBio
bash inference_deepvariant.sh --model_preset PACBIO

# ONT_R104
bash inference_deepvariant.sh --model_preset ONT_R104

# Hybrid
bash inference_deepvariant.sh --model_preset HYBRID_PACBIO_ILLUMINA
```

Runtime metrics are taken from the resulting log after each stage of
DeepVariant. The runtime numbers reported above are the average of 5 runs each.
The accuracy metrics come from the hap.py summary.csv output file.
The runs are deterministic, so all 5 runs produced the same output.
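
Since the accuracy figures come from hap.py's summary.csv, here is a minimal sketch for pulling the SNP and INDEL rows out of that file with Python's standard `csv` module. It assumes the file has a header row containing `Type` and `Filter` columns plus the metric columns named in the tables on this page. Which `Filter` value (for example `PASS` or `ALL`) the tables were taken from is not stated here, so the `filter_value="PASS"` default is an assumption, and `read_happy_summary` is a hypothetical helper, not part of hap.py or DeepVariant. The inline CSV is a trimmed, illustrative input built from the WGS numbers above, not real hap.py output.

```python
import csv
import io

# Columns shown in the accuracy tables on this page.
COUNT_COLUMNS = ("TRUTH.TP", "TRUTH.FN", "QUERY.FP")
METRIC_COLUMNS = ("METRIC.Recall", "METRIC.Precision", "METRIC.F1_Score")


def read_happy_summary(handle, filter_value="PASS"):
    """Return {variant type: {column: value}} for rows whose Filter matches.

    Hypothetical helper. Assumes a header row with "Type" and "Filter"
    columns; any extra columns in the file are ignored by DictReader.
    """
    result = {}
    for row in csv.DictReader(handle):
        if row["Filter"] != filter_value:
            continue
        values = {col: int(row[col]) for col in COUNT_COLUMNS}
        values.update({col: float(row[col]) for col in METRIC_COLUMNS})
        result[row["Type"]] = values
    return result


# Trimmed, illustrative input (WGS numbers from the tables above).
example_csv = io.StringIO(
    "Type,Filter,TRUTH.TP,TRUTH.FN,QUERY.FP,"
    "METRIC.Recall,METRIC.Precision,METRIC.F1_Score\n"
    "INDEL,PASS,501594,2907,1190,0.994238,0.997729,0.99598\n"
    "SNP,PASS,3306720,20776,4880,0.993756,0.998527,0.996136\n"
)
summary = read_happy_summary(example_csv)
print(summary["SNP"]["METRIC.F1_Score"])  # 0.996136
```

To read a real file, pass `open("summary.csv", newline="")` instead of the in-memory example; the file name is a placeholder for wherever hap.py wrote its output.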

**Added (new lines 15-79):**

Reported values are based on evaluations of HG003.

## Accuracy

Below we report accuracy for each model type, as computed by
[hap.py](https://github.com/Illumina/hap.py).

model_type | Type | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | Recall | Precision | F1_Score
:--------------------- |:----- | ----------: | -------: | -------: | ----------: | -------: | -------: | --------: | -------:
wgs | INDEL | 504501 | 501594 | 2907 | 937937 | 1190 | 0.994238 | 0.997729 | 0.99598
wgs | SNP | 3327496 | 3306720 | 20776 | 3817962 | 4880 | 0.993756 | 0.998527 | 0.996136
exome | INDEL | 1051 | 1024 | 27 | 1485 | 8 | 0.97431 | 0.992417 | 0.98328
exome | SNP | 25279 | 24983 | 296 | 27709 | 60 | 0.988291 | 0.997604 | 0.992926
pacbio | INDEL | 504501 | 501598 | 2903 | 986955 | 2949 | 0.994246 | 0.994368 | 0.994307
pacbio | SNP | 3327495 | 3321742 | 5753 | 4331772 | 4107 | 0.998271 | 0.998767 | 0.998519
ont-r104 | INDEL | 504501 | 463074 | 41427 | 895345 | 35116 | 0.917885 | 0.931685 | 0.924733
ont-r104 | SNP | 3327495 | 3321037 | 6458 | 4408429 | 5729 | 0.998059 | 0.998279 | 0.998169
hybrid-pacbio-illumina | INDEL | 504501 | 503264 | 1237 | 998274 | 2052 | 0.997548 | 0.996129 | 0.996838
hybrid-pacbio-illumina | SNP | 3327495 | 3324021 | 3474 | 4068058 | 1856 | 0.998956 | 0.999442 | 0.999199
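
Recall and F1 in this table follow the standard definitions. The short sketch below recomputes them for the `wgs | INDEL` row: recall is `TRUTH.TP / (TRUTH.TP + TRUTH.FN)` (the denominator equals `TRUTH.TOTAL`), and F1 is the harmonic mean of precision and recall. Precision is taken from the table rather than recomputed, on the assumption (based on hap.py's documented definitions) that hap.py derives it from query-side true positives, which are not listed here; recomputing it from `TRUTH.TP` gives about 0.99763 instead of the reported 0.997729 for this row. The helper names are illustrative only.

```python
def recall(truth_tp: int, truth_fn: int) -> float:
    """Fraction of truth variants that were called: TP / (TP + FN)."""
    return truth_tp / (truth_tp + truth_fn)


def f1_score(precision: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)


# wgs INDEL row from the accuracy table above.
truth_tp, truth_fn = 501594, 2907
reported_precision = 0.997729

r = recall(truth_tp, truth_fn)
f1 = f1_score(reported_precision, r)

print(truth_tp + truth_fn)  # 504501, the TRUTH.TOTAL column
print(round(r, 6))          # 0.994238
print(round(f1, 5))         # 0.99598
```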

## Runtime

Each case study was run 5 times and the runtimes were averaged. Here we report
the mean runtime in seconds, the standard deviation of the runtimes in seconds,
and the mean runtime in a duration format (`mean_hruntime`; hours, minutes,
seconds). The `total` row is the sum of `make_examples`, `call_variants`, and
`postprocess_variants`; it does not include the optional `vcf_stats` stage.

model_type | stage | mean_runtime (s) | std_runtime (s) | mean_hruntime
:--------------------- | :------------------- | ---------------: | ----------: | :------------
wgs | make_examples | 2887.1 | 68.658 | 48m 7s
wgs | call_variants | 939.88 | 19.599 | 15m 39s
wgs | postprocess_variants | 403.37 | 3.327 | 6m 43s
wgs | vcf_stats | 317.07 | 1.123 | 5m 17s
wgs | total | 4230.35 | | 1h 10m 30s
exome | make_examples | 176.57 | 2.153 | 2m 56s
exome | call_variants | 33.28 | 0.224 | 33s
exome | postprocess_variants | 29.28 | 0.465 | 29s
exome | vcf_stats | 4.95 | 0.046 | 4s
exome | total | 239.13 | | 3m 59s
pacbio | make_examples | 2036.71 | 104.087 | 33m 56s
pacbio | call_variants | 697.31 | 61.092 | 11m 37s
pacbio | postprocess_variants | 291.27 | 6.432 | 4m 51s
pacbio | vcf_stats | 340.26 | 11.488 | 5m 40s
pacbio | total | 3025.29 | | 50m 25s
ont-r104 | make_examples | 3042.24 | 20.359 | 50m 42s
ont-r104 | call_variants | 3286.89 | 104.469 | 54m 46s
ont-r104 | postprocess_variants | 669.59 | 5.558 | 11m 9s
ont-r104 | vcf_stats | 444.71 | 10.684 | 7m 24s
ont-r104 | total | 6998.72 | | 1h 56m 38s
hybrid-pacbio-illumina | make_examples | 3648.28 | 34.422 | 1h 48s
hybrid-pacbio-illumina | call_variants | 4215.97 | 314.295 | 1h 10m 15s
hybrid-pacbio-illumina | postprocess_variants | 235.97 | 2.797 | 3m 55s
hybrid-pacbio-illumina | vcf_stats | 305.55 | 1.529 | 5m 5s
hybrid-pacbio-illumina | total | 8100.22 | | 2h 15m
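
The `mean_hruntime` values in this table are consistent with truncating the mean to whole seconds and omitting zero-valued units: 3648.28 s is shown as `1h 48s`, 8100.22 s as `2h 15m`, and 4.95 s as `4s`. The sketch below reproduces that column under this assumption; `format_duration` is a hypothetical helper, not the script that generated this page.

```python
def format_duration(seconds: float) -> str:
    """Render seconds in the style of the mean_hruntime column.

    Assumption: fractional seconds are truncated (not rounded) and
    zero-valued hour/minute/second components are omitted.
    """
    total = int(seconds)  # truncate toward zero, e.g. 4.95 -> 4
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:  # always emit something, e.g. "0s"
        parts.append(f"{secs}s")
    return " ".join(parts)


for seconds in (2887.1, 4.95, 3648.28, 8100.22):
    print(seconds, "->", format_duration(seconds))
# 2887.1 -> 48m 7s
# 4.95 -> 4s
# 3648.28 -> 1h 48s
# 8100.22 -> 2h 15m
```

Truncation rather than rounding is what makes rows such as `call_variants` for ont-r104 (3286.89 s) read `54m 46s` instead of `54m 47s`.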

**Total Runtime**

The `total` rows from the runtime table are summarized below:

model_type | sample | mean_hruntime
:--------------------- | :----- | :------------
wgs | HG003 | 1h 10m 30s
exome | HG003 | 3m 59s
pacbio | HG003 | 50m 25s
ont-r104 | HG003 | 1h 56m 38s
hybrid-pacbio-illumina | HG003 | 2h 15m
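
These totals cover only the three core stages. A quick check, using the `mean_runtime (s)` values copied from the runtime table, that each reported total equals `make_examples + call_variants + postprocess_variants` and that adding the optional `vcf_stats` stage would overshoot it. The dictionary and helper names are illustrative.

```python
# mean_runtime (s) values copied from the runtime table.
MEAN_RUNTIME_S = {
    "wgs": {"make_examples": 2887.1, "call_variants": 939.88,
            "postprocess_variants": 403.37, "vcf_stats": 317.07,
            "total": 4230.35},
    "exome": {"make_examples": 176.57, "call_variants": 33.28,
              "postprocess_variants": 29.28, "vcf_stats": 4.95,
              "total": 239.13},
    "pacbio": {"make_examples": 2036.71, "call_variants": 697.31,
               "postprocess_variants": 291.27, "vcf_stats": 340.26,
               "total": 3025.29},
    "ont-r104": {"make_examples": 3042.24, "call_variants": 3286.89,
                 "postprocess_variants": 669.59, "vcf_stats": 444.71,
                 "total": 6998.72},
    "hybrid-pacbio-illumina": {"make_examples": 3648.28,
                               "call_variants": 4215.97,
                               "postprocess_variants": 235.97,
                               "vcf_stats": 305.55, "total": 8100.22},
}

CORE_STAGES = ("make_examples", "call_variants", "postprocess_variants")


def core_total(stages: dict) -> float:
    """Sum of the three core stages; vcf_stats is deliberately excluded."""
    return sum(stages[name] for name in CORE_STAGES)


for model_type, stages in MEAN_RUNTIME_S.items():
    core = core_total(stages)
    with_stats = core + stages["vcf_stats"]
    print(f"{model_type:<24} core sum {core:9.2f}  "
          f"reported total {stages['total']:9.2f}  "
          f"with vcf_stats {with_stats:9.2f}")
```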
