Update metrics.md for r1.10.0-beta

danielecook · web-flow · commit 46027ebcfc31 · 2025-10-09T12:30:56.000-05:00
diff --git a/docs/metrics.md b/docs/metrics.md
@@ -12,199 +12,68 @@ Memory: 384GiB
 GPUs: 0
 ```
 
-## WGS (Illumina)
-
-### Runtime
-
-Runtime is on HG003 (all chromosomes).
-Reported runtime is an average of 5 runs.
-
-Stage                            | Time (minutes)
--------------------------------- | ------------------
-make_examples                    |  47m4.92s
-call_variants                    |  15m56.52s
-postprocess_variants (with gVCF) |  7m0.99s
-vcf_stats_report (optional)      |  5m17.67s (optional)
-total                            |  83m57.12s (1h23m57.12s)
-
-### Accuracy
-
-hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
-held out while training.
-
-| Type  | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
-| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
-| INDEL | 501594   | 2907     | 1190     | 0.994238      | 0.997729         | 0.99598         |
-| SNP   | 3306720  | 20776    | 4880     | 0.993756      | 0.998527         | 0.996136        |
-
-[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/WGS/deepvariant.output.visual_report.html)
-
-## WES (Illumina)
-
-### Runtime
-
-Runtime is on HG003 (all chromosomes).
-Reported runtime is an average of 5 runs.
-
-Stage                            | Time (minutes)
--------------------------------- | -----------------
-make_examples                    | 3m0.33s
-call_variants                    | 0m33.72s
-postprocess_variants (with gVCF) | 0m39.24s
-vcf_stats_report (optional)      | 0m5.10s (optional)
-total                            | 5m7.71s
-
-### Accuracy
-
-hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
-held out while training.
-
-| Type  | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
-| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
-| INDEL | 1024     | 27       | 8        | 0.97431       | 0.992417         | 0.98328         |
-| SNP   | 24983    | 296      | 60       | 0.988291      | 0.997604         | 0.992926        |
-
-[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/WES/deepvariant.output.visual_report.html)
-
-## PacBio (HiFi)
-
-### Updated dataset
-
-We have updated the PacBio test data from HG003 Sequel-II to
-latest Revio with SPRQ chemistry data to showcase performance on the updated
-platform and chemistry. The numbers reported here are generated using the bam
-that can be found in:
-
-```bash
-gs://deepvariant/pacbio-case-study-testdata/HG003.SPRQ.pacbio.GRCh38.nov2024.bam
-```
-
-Which is also available through [here](https://downloads.pacbcloud.com/public/revio/2024Q4/WGS/GIAB_trio/HG003/analysis/GRCh38.m84039_241002_000337_s3.hifi_reads.bc2020.bam).
-
-### Runtime
-
-Runtime is on HG003 (all chromosomes).
-Reported runtime is an average of 5 runs.
-
-Stage                            | Time (minutes)
--------------------------------- | -------------------
-make_examples                    | 33m46.75s
-call_variants                    | 11m38.86s
-postprocess_variants (with gVCF) | 5m12.45s
-vcf_stats_report (optional)      | 5m34.81s (optional)
-total                            | 65m27.90s (1h05m27.90s)
-
-### Accuracy
-
-hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was
-held out while training.
-
-Starting from v1.4.0, users don't need to phase the BAMs first, and only need
-to run DeepVariant once.
-
-| Type  | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
-| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
-| INDEL | 501455   | 3046     | 2986     | 0.993962      | 0.994296         | 0.994129        |
-| SNP   | 3321751  | 5744     | 4032     | 0.998274      | 0.998789         | 0.998532        |
-
-[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/PACBIO/deepvariant.output.visual_report.html)
-
-## ONT_R104
-
-### Runtime
-
-Runtime is on HG003 reads (all chromosomes).
-Reported runtime is an average of 5 runs.
-
-Stage                            | Time (minutes)
--------------------------------- | --------------------
-make_examples                    | 46m29.14s
-call_variants                    | 53m48.26s
-postprocess_variants (with gVCF) | 11m25.74s
-vcf_stats_report (optional)      | 7m22.90s (optional)
-total                            | 127m34.97s (2h07m34.97s)
-
-### Accuracy
-
-hap.py results on HG003 (all chromosomes, using NIST v4.2.1
-truth), which was held out while training.
-
-| Type  | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
-| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
-| INDEL | 461818   | 42683    | 31344    | 0.915396      | 0.938385         | 0.926748        |
-| SNP   | 3321289  | 6206     | 5476     | 0.998135      | 0.998355         | 0.998245        |
-
-[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/ONT_R104/deepvariant.output.visual_report.html)
-
-## Hybrid (Illumina + PacBio HiFi)
-
-### Runtime
-
-Runtime is on HG003 (all chromosomes).
-Reported runtime is an average of 5 runs.
-
-Stage                            | Time (minutes)
--------------------------------- | ------------------
-make_examples                    | 60m4.06s
-call_variants                    | 62m23.86s
-postprocess_variants (with gVCF) | 4m10.56s
-vcf_stats_report (optional)      | 5m16.31s (optional)
-total                            | 162m45.17s (2h42m45.17s)
-
-### Accuracy
-
-Evaluating on HG003 (all chromosomes, using NIST v4.2.1 truth), which was held
-out while training the hybrid model.
-
-| Type  | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
-| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
-| INDEL | 503264   | 1237     | 2052     | 0.997548      | 0.996129         | 0.996838        |
-| SNP   | 3324021  | 3474     | 1856     | 0.998956      | 0.999442         | 0.999199        |
-
-[See VCF stats report.](https://storage.googleapis.com/deepvariant/visual_reports/DeepVariant/1.9.0/HYBRID/deepvariant.output.visual_report.html)
-
-## Inspect outputs that produced the metrics above
-
-The DeepVariant VCFs, gVCFs, and hap.py evaluation outputs are available at:
-
-```
-gs://deepvariant/case-study-outputs
-```
-
-You can also inspect them in a web browser here:
-https://42basepairs.com/browse/gs/deepvariant/case-study-outputs
-
-## How to reproduce the metrics on this page
-
-For simplicity and consistency, we report runtime with a
-[CPU instance with 96 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform)
-This is NOT the fastest or cheapest configuration.
-
-Use `gcloud compute ssh` to log in to the newly created instance.
-
-Download and run any of the following case study scripts:
-
-```
-# Get the script.
-curl -O https://raw.githubusercontent.com/google/deepvariant/r1.9/scripts/inference_deepvariant.sh
-
-# WGS
-bash inference_deepvariant.sh --model_preset WGS
-
-# WES
-bash inference_deepvariant.sh --model_preset WES
-
-# PacBio
-bash inference_deepvariant.sh --model_preset PACBIO
-
-# ONT_R104
-bash inference_deepvariant.sh --model_preset ONT_R104
-
-# Hybrid
-bash inference_deepvariant.sh --model_preset HYBRID_PACBIO_ILLUMINA
-```
-
-Runtime metrics are taken from the resulting log after each stage of
-DeepVariant. The runtime numbers reported above are the average of 5 runs each.
-The accuracy metrics come from the hap.py summary.csv output file.
-The runs are deterministic so all 5 runs produced the same output.
+Reported values are based on evaluations of HG003.
+
+## Accuracy
+
+Below we report full-genome accuracy, computed with
+[hap.py](https://github.com/Illumina/hap.py).
+
+model_type             | Type  | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | Recall   | Precision | F1_Score
+:--------------------- |:----- | ----------: | -------: | -------: | ----------: | -------: | -------: | --------: | -------:
+wgs                    | INDEL | 504501      | 501594   | 2907     | 937937      | 1190     | 0.994238 | 0.997729  | 0.99598
+wgs                    | SNP   | 3327496     | 3306720  | 20776    | 3817962     | 4880     | 0.993756 | 0.998527  | 0.996136
+exome                  | INDEL | 1051        | 1024     | 27       | 1485        | 8        | 0.97431  | 0.992417  | 0.98328
+exome                  | SNP   | 25279       | 24983    | 296      | 27709       | 60       | 0.988291 | 0.997604  | 0.992926
+pacbio                 | INDEL | 504501      | 501598   | 2903     | 986955      | 2949     | 0.994246 | 0.994368  | 0.994307
+pacbio                 | SNP   | 3327495     | 3321742  | 5753     | 4331772     | 4107     | 0.998271 | 0.998767  | 0.998519
+ont-r104               | INDEL | 504501      | 463074   | 41427    | 895345      | 35116    | 0.917885 | 0.931685  | 0.924733
+ont-r104               | SNP   | 3327495     | 3321037  | 6458     | 4408429     | 5729     | 0.998059 | 0.998279  | 0.998169
+hybrid-pacbio-illumina | INDEL | 504501      | 503264   | 1237     | 998274      | 2052     | 0.997548 | 0.996129  | 0.996838
+hybrid-pacbio-illumina | SNP   | 3327495     | 3324021  | 3474     | 4068058     | 1856     | 0.998956 | 0.999442  | 0.999199
+
+## Runtime
+
+Each case study was run 5 times and the runtimes were averaged. Here we report
+the mean runtime in seconds, the standard deviation of runtimes, and a duration
+format (`mean_hruntime`; hours, minutes, seconds).
+
+model_type             | stage                | mean_runtime (s) | std_runtime | mean_hruntime
+:--------------------- | :------------------- | ---------------: | ----------: | :------------
+wgs                    | make_examples        | 2887.1           | 68.658      | 48m 7s
+wgs                    | call_variants        | 939.88           | 19.599      | 15m 39s
+wgs                    | postprocess_variants | 403.37           | 3.327       | 6m 43s
+wgs                    | vcf_stats            | 317.07           | 1.123       | 5m 17s
+wgs                    | total                | 4230.35          |             | 1h 10m 30s
+exome                  | make_examples        | 176.57           | 2.153       | 2m 56s
+exome                  | call_variants        | 33.28            | 0.224       | 33s
+exome                  | postprocess_variants | 29.28            | 0.465       | 29s
+exome                  | vcf_stats            | 4.95             | 0.046       | 4s
+exome                  | total                | 239.13           |             | 3m 59s
+pacbio                 | make_examples        | 2036.71          | 104.087     | 33m 56s
+pacbio                 | call_variants        | 697.31           | 61.092      | 11m 37s
+pacbio                 | postprocess_variants | 291.27           | 6.432       | 4m 51s
+pacbio                 | vcf_stats            | 340.26           | 11.488      | 5m 40s
+pacbio                 | total                | 3025.29          |             | 50m 25s
+ont-r104               | make_examples        | 3042.24          | 20.359      | 50m 42s
+ont-r104               | call_variants        | 3286.89          | 104.469     | 54m 46s
+ont-r104               | postprocess_variants | 669.59           | 5.558       | 11m 9s
+ont-r104               | vcf_stats            | 444.71           | 10.684      | 7m 24s
+ont-r104               | total                | 6998.72          |             | 1h 56m 38s
+hybrid-pacbio-illumina | make_examples        | 3648.28          | 34.422      | 1h 48s
+hybrid-pacbio-illumina | call_variants        | 4215.97          | 314.295     | 1h 10m 15s
+hybrid-pacbio-illumina | postprocess_variants | 235.97           | 2.797       | 3m 55s
+hybrid-pacbio-illumina | vcf_stats            | 305.55           | 1.529       | 5m 5s
+hybrid-pacbio-illumina | total                | 8100.22          |             | 2h 15m
+
+**Total Runtime**
+
+The total rows are summarized below as well:
+
+uid                    | sample | mean_hruntime
+:--------------------- | :----- | :------------
+wgs                    | HG003  | 1h 10m 30s
+exome                  | HG003  | 3m 59s
+pacbio                 | HG003  | 50m 25s
+ont-r104               | HG003  | 1h 56m 38s
+hybrid-pacbio-illumina | HG003  | 2h 15m
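
Note on the runtime tables: the `total` rows equal the sum of `make_examples`, `call_variants`, and `postprocess_variants`, so the `vcf_stats` stage is not counted in the totals. The `mean_hruntime` values are consistent with truncating `mean_runtime (s)` to whole seconds and dropping zero-valued parts (hence `1h 48s` and `2h 15m`). The sketch below reproduces both observations from the `wgs` rows. It is a reconstruction inferred from the table values, not the script used to generate the page; `humanize` and `total_runtime` are hypothetical helper names.

```python
# Mean stage runtimes in seconds, copied from the wgs rows of the table.
WGS = {
    "make_examples": 2887.1,
    "call_variants": 939.88,
    "postprocess_variants": 403.37,
    "vcf_stats": 317.07,
}

# Assumption inferred from the table: totals exclude the vcf_stats stage.
TOTAL_STAGES = ("make_examples", "call_variants", "postprocess_variants")


def total_runtime(stages: dict) -> float:
    """Sum the stages that appear to make up the `total` row."""
    return round(sum(stages[name] for name in TOTAL_STAGES), 2)


def humanize(seconds: float) -> str:
    """Format seconds like the mean_hruntime column, e.g. '1h 10m 30s'.

    Fractional seconds are truncated and zero-valued parts are omitted,
    which matches every value in the table.
    """
    whole = int(seconds)  # truncate, do not round
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)


if __name__ == "__main__":
    total = total_runtime(WGS)
    print(total, humanize(total))
```

Truncation rather than rounding is what makes the `exome` `vcf_stats` value of 4.95 s appear as `4s` in the table.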