This document walks through an example of running the DeepVariant Fast Pipeline with PacBio data.

Fast Pipeline is a DeepVariant feature that parallelizes the `make_examples` and `call_variants` stages. It is especially useful on machines with a GPU: examples are streamed directly into `call_variants` inference, so the CPU and GPU are utilized simultaneously. Note that this feature is still experimental.
This setup requires a machine with a GPU. For this case study, we will use an
n1-standard-16 compute instance with 1 NVIDIA P4 GPU. However, this setup is
not optimal, as 16 cores may not be sufficient to fully utilize the GPU. In a
real-life scenario, allocating 32 cores for `make_examples` would ensure better
GPU utilization and improved runtime.
Here we create a Google Cloud compute instance. You may skip this step if you are running the case study on a local machine with a GPU.
```bash
gcloud compute instances create "deepvariant-fast-pipeline" \
  --scopes "compute-rw,storage-full,cloud-platform" \
  --maintenance-policy "TERMINATE" \
  --accelerator=type=nvidia-tesla-p4,count=1 \
  --image-family "ubuntu-2204-lts" \
  --image-project "ubuntu-os-cloud" \
  --machine-type "n1-standard-16" \
  --boot-disk-size "100" \
  --zone "us-central1-a"
```

You can then ssh into the machine by running:

```bash
gcloud compute ssh "deepvariant-fast-pipeline" --zone us-central1-a
```

CUDA drivers and the NVIDIA Container Toolkit are required to run the case study. Please refer to the following documentation for more details: the NVIDIA CUDA Installation Guide for Linux (or Install GPU drivers for installation on GCP), and Installing the NVIDIA Container Toolkit.
```bash
BIN_VERSION="1.10.0"
sudo docker pull google/deepvariant:"${BIN_VERSION}-gpu"
```

Before you start running, you need to have the following input files:
- A reference genome in [FASTA] format and its corresponding index file (.fai).

```bash
mkdir -p reference
gcloud storage cp gs://deepvariant/case-study-testdata/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna* reference/
```

- An aligned reads file in [BAM] format and its corresponding index file (.bai). You get this by aligning the reads from a sequencing instrument, using an aligner like [BWA] for example.

```bash
mkdir -p input
gcloud storage cp gs://deepvariant/pacbio-case-study-testdata/HG003.SPRQ.pacbio.GRCh38.nov2024.chr20.bam* input/
```

The `fast_pipeline` binary in the DeepVariant Docker image runs the `make_examples` and `call_variants` stages of DeepVariant in streaming mode. Below is the command line to run `fast_pipeline`.
The config files below contain all input data parameters for DeepVariant. All other
model-specific parameters are applied automatically from
`model.example_info.json`.

The `--examples` and `--gvcf` flags are set with sharded file names. It is
important that the number of shards matches across all config files and the
`--num_shards` flag of the `fast_pipeline` binary; in our case it is set to 14.

The machine has 16 virtual cores, but we set the number of shards to 14 to reserve 2 cores for the input pipeline in `call_variants`. Insufficient CPU resources for the inference pipeline can cause an input bottleneck, slowing down the inference stage.
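If you run on a machine with a different core count, the shard count can be derived rather than hard-coded. A minimal sketch, assuming the same 2-core reservation described above (the variable names here are our own):

```shell
# Reserve 2 cores for the call_variants input pipeline;
# the remaining cores become make_examples shards.
TOTAL_CORES=$(nproc)             # e.g. 16 on an n1-standard-16
RESERVED_CORES=2
NUM_SHARDS=$((TOTAL_CORES - RESERVED_CORES))
echo "num_shards=${NUM_SHARDS}"  # e.g. num_shards=14
```

Remember to use the same value in the `@N` sharded file names, `--cpus`, and `--num_shards`.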
```bash
mkdir -p config
FILE=config/make_examples.ini
cat <<EOM >$FILE
--examples=/tmp/examples.tfrecords@14.gz
--gvcf=/tmp/examples.gvcf.tfrecord@14.gz
--mode=calling
--reads=/input/HG003.SPRQ.pacbio.GRCh38.nov2024.chr20.bam
--ref=/reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
--output_phase_info
--checkpoint=/opt/models/pacbio
--regions=chr20
EOM

FILE=config/call_variants.ini
cat <<EOM >$FILE
--outfile=/output/case_study.cvo.tfrecord.gz
--checkpoint=/opt/models/pacbio
--batch_size=1024
--writer_threads=1
EOM

FILE=config/postprocess_variants.ini
cat <<EOM >$FILE
--ref=/reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
--infile=/output/case_study.cvo.tfrecord.gz
--nonvariant_site_tfrecord_path=/tmp/examples.gvcf.tfrecord@14.gz
--outfile=/output/variants.chr20.vcf.gz
--gvcf_outfile=/output/variants.gvcf.chr20.vcf.gz
--small_model_cvo_records=/tmp/examples_call_variant_outputs.tfrecords@14.gz
--cpus=14
EOM
```

```bash
time sudo docker run \
  -v "${PWD}/config":"/config" \
  -v "${PWD}/input":"/input" \
  -v "${PWD}/output":"/output" \
  -v "${PWD}/reference":"/reference" \
  -v /tmp:/tmp \
  --gpus all \
  -e DV_BIN_PATH=/opt/deepvariant/bin \
  --shm-size=2gb \
  google/deepvariant:"${BIN_VERSION}-gpu" \
  /opt/deepvariant/bin/fast_pipeline \
  --make_example_flags /config/make_examples.ini \
  --call_variants_flags /config/call_variants.ini \
  --postprocess_variants_flags /config/postprocess_variants.ini \
  --shm_prefix dv \
  --num_shards 14 \
  --buffer_size 10485760 \
  2>&1 | tee /tmp/fast_pipeline.docker.log
```

Docker flags:

- `-v` maps a local directory into the Docker container.
- `-e` sets the `DV_BIN_PATH` environment variable to point to the DeepVariant binaries directory inside the container.
- `--shm-size` sets the size of shared memory available to the container. It has to be larger than `--buffer_size` × `--num_shards`. In our case the buffer size is 10 MiB and we run 14 shards, so 2 GB is large enough to accommodate the buffers and all synchronization objects for each shard.

`fast_pipeline` flags:

- `--make_example_flags` - path to the file containing `make_examples` command line parameters.
- `--call_variants_flags` - path to the file containing `call_variants` command line parameters.
- `--postprocess_variants_flags` - path to the file containing `postprocess_variants` command line parameters.
- `--shm_prefix` - prefix for the shared memory files; an arbitrary name.
- `--num_shards` - number of parallel `make_examples` processes.
- `--buffer_size` - shared memory buffer size for each process.
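The shared-memory requirement can be sanity-checked with quick arithmetic (numbers taken from this case study):

```shell
BUFFER_SIZE=10485760   # --buffer_size: 10 MiB per shard
NUM_SHARDS=14          # --num_shards
# Total buffer memory that must fit, with headroom for
# synchronization objects, inside --shm-size.
TOTAL_BYTES=$((BUFFER_SIZE * NUM_SHARDS))
echo "$((TOTAL_BYTES / 1024 / 1024)) MiB"   # 140 MiB, well under 2 GB
```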
On successful completion, the output directory will contain two VCF files:

- `variants.chr20.vcf.gz`
- `variants.gvcf.chr20.vcf.gz`

With these settings the pipeline takes approximately 10 minutes:

```
real 10m23.879s
user 0m0.041s
sys 0m0.074s
```
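As a quick sanity check of the output (assuming `zcat` is available on the host; the exact record count will vary):

```shell
# Count the variant records (non-header lines) in the output VCF.
zcat output/variants.chr20.vcf.gz | grep -vc '^#'
```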
Download benchmark data:
```bash
mkdir -p benchmark
FTPDIR=ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/AshkenazimTrio/HG003_NA24149_father/NISTv4.2.1/GRCh38
curl ${FTPDIR}/HG003_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed > benchmark/HG003_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed
curl ${FTPDIR}/HG003_GRCh38_1_22_v4.2.1_benchmark.vcf.gz > benchmark/HG003_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
curl ${FTPDIR}/HG003_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi > benchmark/HG003_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
```

Run hap.py to evaluate the results against the truth set:

```bash
HAPPY_VERSION=v0.3.12
time sudo docker run \
  -v ${PWD}/output:/output \
  -v ${PWD}/benchmark:/benchmark \
  -v ${PWD}/reference:/reference \
  jmcdani20/hap.py:${HAPPY_VERSION} \
  /opt/hap.py/bin/hap.py \
  /benchmark/HG003_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
  /output/variants.chr20.vcf.gz \
  -f /benchmark/HG003_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed \
  -r /reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
  -o /output/happy.output \
  --engine=vcfeval \
  --pass-only \
  -l "chr20"
```
Benchmarking Summary:

```
Type  Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL ALL     10628        10561     67        22717        70        11671      31     30     0.993696       0.993663          0.513756        0.993679         NaN                     NaN                     1.748961                   2.174472
INDEL PASS    10628        10561     67        22717        70        11671      31     30     0.993696       0.993663          0.513756        0.993679         NaN                     NaN                     1.748961                   2.174472
SNP   ALL     70166        70106     60        103051       60        32792      7      5      0.999145       0.999146          0.318211        0.999145         2.296566                1.720961                1.883951                   1.409186
SNP   PASS    70166        70106     60        103051       60        32792      7      5      0.999145       0.999146          0.318211        0.999145         2.296566                1.720961                1.883951                   1.409186
```