Annotated Examples & Interpretation
This section provides step-by-step annotated examples of WGS/WES workflows for tumor–normal, tumor-only, and germline analyses, along with notes on result interpretation and clinical caveats.
Example: Tumor–Normal Variant Calling
Scenario: Paired tumor–normal workflow using WES data (extendable to WGS by adjusting depth, PoN selection, and compute resources).
Inputs:
- Tumor: tumor_R1.fq.gz, tumor_R2.fq.gz
- Normal: normal_R1.fq.gz, normal_R2.fq.gz
- Reference: GRCh38/hg38, associated indices, and known sites
- Resources: PoN VCF, germline resource (gnomAD AF), exome BED
Steps:
QC and Trimming
fastp \
  -i tumor_R1.fq.gz -I tumor_R2.fq.gz \
  -o tumor_trim_R1.fq.gz -O tumor_trim_R2.fq.gz \
  --html tumor_fastp.html --thread 16
fastp \
  -i normal_R1.fq.gz -I normal_R2.fq.gz \
  -o normal_trim_R1.fq.gz -O normal_trim_R2.fq.gz \
  --html normal_fastp.html --thread 16
Alignment
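# Read groups are required by GATK; the SM values must match the names passed to Mutect2.
bwa-mem2 mem -t 16 -R '@RG\tID:tumor\tSM:TUMOR\tPL:ILLUMINA' \
  ref.fa tumor_trim_R1.fq.gz tumor_trim_R2.fq.gz \
  | samtools sort -o tumor.bam
bwa-mem2 mem -t 16 -R '@RG\tID:normal\tSM:NORMAL\tPL:ILLUMINA' \
  ref.fa normal_trim_R1.fq.gz normal_trim_R2.fq.gz \
  | samtools sort -o normal.bam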
Duplicate Marking & BQSR
gatk MarkDuplicates -I tumor.bam -O tumor.markdup.bam -M tumor.metrics.txt
gatk MarkDuplicates -I normal.bam -O normal.markdup.bam -M normal.metrics.txt
gatk BaseRecalibrator -I tumor.markdup.bam -R ref.fa \
  --known-sites dbsnp.vcf --known-sites mills.vcf -O tumor.recal.table
gatk BaseRecalibrator -I normal.markdup.bam -R ref.fa \
  --known-sites dbsnp.vcf --known-sites mills.vcf -O normal.recal.table
gatk ApplyBQSR -R ref.fa -I tumor.markdup.bam \
  --bqsr-recal-file tumor.recal.table -O tumor.bqsr.bam
gatk ApplyBQSR -R ref.fa -I normal.markdup.bam \
  --bqsr-recal-file normal.recal.table -O normal.bqsr.bam
Somatic Calling
# -L restricts calling to the exome targets listed under Inputs; drop it for WGS.
gatk Mutect2 \
  -R ref.fa \
  -I tumor.bqsr.bam -tumor TUMOR \
  -I normal.bqsr.bam -normal NORMAL \
  -L exome.bed \
  --germline-resource af-only-gnomad.vcf.gz \
  --panel-of-normals pon.vcf.gz \
  -O somatic.vcf.gz
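Mutect2 matches the names given to -tumor and -normal against the SM field of the @RG lines in each BAM header, and it stops with an error when a name is not found. A minimal sketch for listing those SM values is shown here; the function name rg_samples is hypothetical, and it assumes the SAM header arrives on standard input.

```shell
# List the unique sample names (SM tags) found in @RG header lines read from stdin.
rg_samples() {
  awk -F'\t' '
    /^@RG/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SM:/) print substr($i, 4)   # strip the "SM:" prefix
    }' | sort -u
}

# Typical use (requires samtools and the BAM from the BQSR step):
#   samtools view -H normal.bqsr.bam | rg_samples
```

If the printed name differs from the one passed to -normal, fix the read group at alignment time rather than renaming the argument to something the header does not contain.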
Filtering
gatk FilterMutectCalls \
  -V somatic.vcf.gz -R ref.fa \
  -O somatic.filtered.vcf.gz
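FilterMutectCalls assumes no cross-sample contamination unless it is given an estimate, so the contamination filter is only informative when a contamination table is supplied. One possible way to produce it is sketched here under stated assumptions: the sites argument stands for a hypothetical VCF of common biallelic population variants with allele frequencies, and the GATK variable is a hypothetical override that lets the commands be printed with echo instead of executed.

```shell
# Estimate cross-sample contamination from pileups at common variant sites.
# GATK defaults to the real launcher; set GATK=echo to print the commands instead.
estimate_contamination() {
  sites="$1"   # hypothetical VCF of common biallelic sites carrying AF in INFO
  "${GATK:-gatk}" GetPileupSummaries -I tumor.bqsr.bam \
    -V "$sites" -L "$sites" -O tumor.pileups.table &&
  "${GATK:-gatk}" GetPileupSummaries -I normal.bqsr.bam \
    -V "$sites" -L "$sites" -O normal.pileups.table &&
  "${GATK:-gatk}" CalculateContamination -I tumor.pileups.table \
    -matched normal.pileups.table -O contamination.table
}

# Same filtering call as above, with the contamination estimate attached.
filter_with_contamination() {
  "${GATK:-gatk}" FilterMutectCalls -V somatic.vcf.gz -R ref.fa \
    --contamination-table contamination.table \
    -O somatic.filtered.vcf.gz
}
```

Running estimate_contamination followed by filter_with_contamination would replace the plain filtering command; for tumor-only data the -matched argument is simply omitted.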
Annotation
vep \
  -i somatic.filtered.vcf.gz \
  -o somatic.annotated.vcf \
  --cache --everything --vcf --assembly GRCh38 --offline
Interpretation Notes:
- Variants labeled PASS passed every Mutect2 filter; treat them as candidate somatic calls rather than confirmed ones.
- Check the FILTER field for failure reasons (e.g., contamination, panel_of_normals).
- VAF: Low VAF (<5%) may indicate a subclonal variant, low tumor purity, or an artifact; interpret it alongside tumor purity and local copy number.
- Consequence terms (missense, nonsense, frameshift) inform functional impact.
- Cross-reference COSMIC/OncoKB for known drivers; use orthogonal validation before any clinical decision-making.
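To make the purity caveat concrete: for a clonal variant, the expected VAF is roughly purity * mutant copies / (purity * tumor copy number + (1 - purity) * 2), assuming the admixed normal cells are diploid. The sketch below evaluates that expression with awk; the function name expected_vaf and the example values are illustrative and do not come from a real sample.

```shell
# Expected VAF of a clonal somatic variant.
# Args: purity (0-1), mutant copies in tumor cells, total tumor copy number at the locus.
# Assumes admixed normal cells are diploid and carry no mutant copies.
expected_vaf() {
  awk -v p="$1" -v m="$2" -v cn="$3" 'BEGIN {
    printf "%.4f\n", (p * m) / (p * cn + (1 - p) * 2)
  }'
}

# Heterozygous clonal variant in a diploid tumor region at 40% purity.
expected_vaf 0.40 1 2   # prints 0.2000
```

Under these assumptions a clonal heterozygous variant would sit near 20% VAF, so a 5% call in the same sample is well below the clonal expectation and more plausibly subclonal or artifactual.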
Example: Tumor-Only Variant Calling
Scenario: No matched normal is available, so artifact and germline suppression must be aggressive.
Inputs:
- Tumor: tumor_R1.fq.gz, tumor_R2.fq.gz
- PoN VCF, gnomAD germline resource
Steps:
QC and Trimming
fastp \
  -i tumor_R1.fq.gz -I tumor_R2.fq.gz \
  -o tumor_trim_R1.fq.gz -O tumor_trim_R2.fq.gz \
  --html tumor_fastp.html --thread 16
Alignment
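# The read group's SM value must match the name passed to Mutect2 via -tumor.
bwa-mem2 mem -t 16 -R '@RG\tID:tumor\tSM:TUMOR\tPL:ILLUMINA' \
  ref.fa tumor_trim_R1.fq.gz tumor_trim_R2.fq.gz \
  | samtools sort -o tumor.bam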
Duplicate Marking & BQSR
gatk MarkDuplicates -I tumor.bam -O tumor.markdup.bam -M tumor.metrics.txt
gatk BaseRecalibrator -I tumor.markdup.bam -R ref.fa \
  --known-sites dbsnp.vcf --known-sites mills.vcf \
  -O tumor.recal.table
gatk ApplyBQSR -R ref.fa -I tumor.markdup.bam \
  --bqsr-recal-file tumor.recal.table -O tumor.bqsr.bam
Somatic Calling (Tumor-Only Mode)
gatk Mutect2 \
  -R ref.fa -I tumor.bqsr.bam -tumor TUMOR \
  --germline-resource af-only-gnomad.vcf.gz \
  --panel-of-normals pon.vcf.gz \
  -O tumoronly.vcf.gz
Filtering
gatk FilterMutectCalls \
  -V tumoronly.vcf.gz -R ref.fa \
  -O tumoronly.filtered.vcf.gz
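After filtering, a quick tally of the FILTER column shows which failure reasons dominate, which is especially useful in tumor-only mode where germline and panel-of-normals filters remove many calls. The following is a small sketch; the function name count_filters is hypothetical, and it expects an uncompressed VCF stream on standard input.

```shell
# Count records per FILTER value (column 7) in a VCF stream read from stdin.
# Entries with several reasons, such as "germline;panel_of_normals", are split
# so that each reason is counted once per record.
count_filters() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      n = split($7, reasons, ";")
      for (i = 1; i <= n; i++) count[reasons[i]]++
    }
    END { for (r in count) printf "%s\t%d\n", r, count[r] }' | sort
}

# Typical use:
#   gzip -dc tumoronly.filtered.vcf.gz | count_filters
```

Because reasons are counted individually, the totals can exceed the number of records; the PASS row alone gives the number of calls that survived filtering.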
Annotation
vep \
  -i tumoronly.filtered.vcf.gz \
  -o tumoronly.annotated.vcf \
  --cache --everything --vcf --assembly GRCh38 --offline
Interpretation Notes:
- Expect more false positives than in paired mode; rare germline variants often remain even after filtering.
- Use gnomAD AF thresholds (e.g., exclude AF > 0.001 for rare cancers).
- Orthogonal validation (Sanger, ddPCR) is strongly recommended.
- Review strand bias and read-level evidence in IGV for low-VAF calls.
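The gnomAD AF threshold can be applied directly to the Mutect2 output, assuming the records carry the POPAF INFO field that Mutect2 writes when a germline resource is supplied (documented as the negative log10 of the population allele frequency). The sketch below is one way to do this with awk; the function name filter_by_popaf is hypothetical, and records without POPAF are kept unchanged.

```shell
# Keep header lines and records whose population AF is at most max_af.
# POPAF is -log10(AF), so AF <= max_af is equivalent to POPAF >= -log10(max_af).
# For multi-allelic records, any ALT allele above the threshold drops the record.
filter_by_popaf() {
  awk -F'\t' -v max_af="${1:-0.001}" '
    BEGIN { thr = -log(max_af) / log(10) }
    /^#/ { print; next }
    {
      keep = 1
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^POPAF=/) {
          m = split(substr(info[i], 7), vals, ",")
          for (j = 1; j <= m; j++)
            if (vals[j] + 0 < thr - 1e-9) keep = 0   # small tolerance at the boundary
        }
      }
      if (keep) print
    }'
}

# Typical use:
#   gzip -dc tumoronly.filtered.vcf.gz | filter_by_popaf 0.001 > tumoronly.rare.vcf
```

This only narrows the candidate list; a rare germline variant absent from gnomAD will still pass, which is why the validation and IGV review steps above remain necessary.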
Example: Germline Variant Calling
Scenario: Single-sample germline variant discovery using GATK Best Practices (extendable to joint calling for cohorts).
Inputs:
- Sample: sample_R1.fq.gz, sample_R2.fq.gz
- Known sites (dbSNP, Mills, 1000G indels)
Steps:
QC and Trimming
fastp \
  -i sample_R1.fq.gz -I sample_R2.fq.gz \
  -o trim_R1.fq.gz -O trim_R2.fq.gz \
  --html sample_fastp.html --thread 16
Alignment
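# GATK tools require a read group; SM becomes the sample name in the output VCF.
bwa-mem2 mem -t 16 -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA' \
  ref.fa trim_R1.fq.gz trim_R2.fq.gz \
  | samtools sort -o sample.bam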
Duplicate Marking
gatk MarkDuplicates -I sample.bam -O sample.markdup.bam -M sample.metrics.txt
BQSR
gatk BaseRecalibrator -I sample.markdup.bam -R ref.fa \
  --known-sites dbsnp.vcf --known-sites mills.vcf \
  -O sample.recal.table
gatk ApplyBQSR -R ref.fa -I sample.markdup.bam \
  --bqsr-recal-file sample.recal.table \
  -O sample.bqsr.bam
Variant Calling
gatk HaplotypeCaller -R ref.fa -I sample.bqsr.bam \
  -O sample.g.vcf.gz -ERC GVCF
Joint Genotyping (multi-sample)
gatk CombineGVCFs -R ref.fa \
  -V sample1.g.vcf.gz \
  -V sample2.g.vcf.gz \
  -O cohort.g.vcf.gz
gatk GenotypeGVCFs -R ref.fa -V cohort.g.vcf.gz \
  -O cohort.vcf.gz
Filtering
- Large cohorts: VQSR
- Small cohorts: hard filters
Example hard filters:
gatk VariantFiltration -V cohort.vcf.gz \
  --filter-expression "QD < 2.0" --filter-name "QD2" \
  -O cohort.filtered.vcf.gz
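The single QD expression above is only a starting point. A fuller SNP filter might chain several expressions, as sketched here using a subset of the starting thresholds GATK publishes for hard filtering; the function name hard_filter_snps is hypothetical, the thresholds should be tuned per dataset, and the GATK variable is a hypothetical override that lets the command be printed with echo instead of executed.

```shell
# Apply several SNP hard filters in one VariantFiltration pass.
# Args: input VCF (SNPs only), output VCF.
# GATK defaults to the real launcher; set GATK=echo to print the command instead.
hard_filter_snps() {
  "${GATK:-gatk}" VariantFiltration -V "$1" \
    --filter-expression "QD < 2.0" --filter-name "QD2" \
    --filter-expression "FS > 60.0" --filter-name "FS60" \
    --filter-expression "MQ < 40.0" --filter-name "MQ40" \
    --filter-expression "SOR > 3.0" --filter-name "SOR3" \
    -O "$2"
}

# Typical use, on a SNP-only VCF (for example from gatk SelectVariants -select-type SNP):
#   hard_filter_snps cohort.snps.vcf.gz cohort.snps.filtered.vcf.gz
```

Each expression gets its own filter name so that the FILTER column records which metric failed; indels are usually filtered separately with different thresholds.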
Annotation
vep -i cohort.filtered.vcf.gz \
  -o cohort.annotated.vcf \
  --cache --everything --vcf --assembly GRCh38 --offline
Interpretation Notes:
- Filter on quality metrics (QD, FS, MQ) to reduce false positives.
- Use gnomAD to flag common benign variants.
- Classify by ACMG/AMP where appropriate (research context): pathogenic, likely pathogenic, VUS, likely benign, benign.
- For research publications, emphasize novel, rare, and functionally significant variants.
Note
Regardless of workflow type, validate high-impact variants with orthogonal methods before clinical use, and interpret in the context of phenotype, tumor histology (if applicable), and supporting molecular data.