Annotated Examples & Interpretation

This section provides step-by-step annotated examples of WGS/WES workflows for tumor–normal, tumor-only, and germline analyses, along with notes on result interpretation and clinical caveats.

Example: Tumor–Normal Variant Calling

Scenario: Paired tumor–normal workflow using WES data (extendable to WGS by adjusting depth, PoN selection, and compute resources).

Inputs: - Tumor: tumor_R1.fq.gz, tumor_R2.fq.gz - Normal: normal_R1.fq.gz, normal_R2.fq.gz - Reference: GRCh38/hg38, associated indices, and known sites - Resources: PoN VCF, germline resource (gnomAD AF), exome BED

Steps:

  1. QC and Trimming

    fastp \
      -i tumor_R1.fq.gz -I tumor_R2.fq.gz \
      -o tumor_trim_R1.fq.gz -O tumor_trim_R2.fq.gz \
      --html tumor_fastp.html --thread 16
    
    fastp \
      -i normal_R1.fq.gz -I normal_R2.fq.gz \
      -o normal_trim_R1.fq.gz -O normal_trim_R2.fq.gz \
      --html normal_fastp.html --thread 16
    
  2. Alignment

    bwa-mem2 mem -t 16 ref.fa tumor_trim_R1.fq.gz tumor_trim_R2.fq.gz \
      | samtools sort -o tumor.bam
    
    bwa-mem2 mem -t 16 ref.fa normal_trim_R1.fq.gz normal_trim_R2.fq.gz \
      | samtools sort -o normal.bam
    
  3. Duplicate Marking & BQSR

    gatk MarkDuplicates -I tumor.bam -O tumor.markdup.bam -M tumor.metrics.txt
    gatk MarkDuplicates -I normal.bam -O normal.markdup.bam -M normal.metrics.txt
    
    gatk BaseRecalibrator -I tumor.markdup.bam -R ref.fa \
      --known-sites dbsnp.vcf --known-sites mills.vcf -O tumor.recal.table
    gatk BaseRecalibrator -I normal.markdup.bam -R ref.fa \
      --known-sites dbsnp.vcf --known-sites mills.vcf -O normal.recal.table
    
    gatk ApplyBQSR -R ref.fa -I tumor.markdup.bam \
      --bqsr-recal-file tumor.recal.table -O tumor.bqsr.bam
    gatk ApplyBQSR -R ref.fa -I normal.markdup.bam \
      --bqsr-recal-file normal.recal.table -O normal.bqsr.bam
    
  4. Somatic Calling

    gatk Mutect2 \
      -R ref.fa \
      -I tumor.bqsr.bam -tumor TUMOR \
      -I normal.bqsr.bam -normal NORMAL \
      --germline-resource af-only-gnomad.vcf.gz \
      --panel-of-normals pon.vcf.gz \
      -O somatic.vcf.gz
    
  5. Filtering

    gatk FilterMutectCalls \
      -V somatic.vcf.gz -R ref.fa \
      -O somatic.filtered.vcf.gz
    
  6. Annotation

    vep \
      -i somatic.filtered.vcf.gz \
      -o somatic.annotated.vcf \
      --cache --everything --vcf --assembly GRCh38 --offline
    

Interpretation Notes: - Variants labeled PASS are high-confidence somatic calls. - Check the FILTER field for failure reasons (e.g., contamination, panel_of_normals). - VAF: Low VAF (<5%) may be subclonal; interpret with tumor purity. - Consequence terms (missense, nonsense, frameshift) inform functional impact. - Cross-reference COSMIC/OncoKB for known drivers; use orthogonal validation before any clinical decision-making.

Example: Tumor-Only Variant Calling

Scenario: No matched normal available—requires aggressive artifact and germline suppression.

Inputs: - Tumor: tumor_R1.fq.gz, tumor_R2.fq.gz - PoN VCF, gnomAD germline resource

Steps:

  1. QC and Trimming

    fastp \
      -i tumor_R1.fq.gz -I tumor_R2.fq.gz \
      -o tumor_trim_R1.fq.gz -O tumor_trim_R2.fq.gz \
      --html tumor_fastp.html --thread 16
    
  2. Alignment

    bwa-mem2 mem -t 16 ref.fa tumor_trim_R1.fq.gz tumor_trim_R2.fq.gz \
      | samtools sort -o tumor.bam
    
  3. Duplicate Marking & BQSR

    gatk MarkDuplicates -I tumor.bam -O tumor.markdup.bam -M tumor.metrics.txt
    
    gatk BaseRecalibrator -I tumor.markdup.bam -R ref.fa \
      --known-sites dbsnp.vcf --known-sites mills.vcf \
      -O tumor.recal.table
    
    gatk ApplyBQSR -R ref.fa -I tumor.markdup.bam \
      --bqsr-recal-file tumor.recal.table -O tumor.bqsr.bam
    
  4. Somatic Calling (Tumor-Only Mode)

    gatk Mutect2 \
      -R ref.fa -I tumor.bqsr.bam -tumor TUMOR \
      --germline-resource af-only-gnomad.vcf.gz \
      --panel-of-normals pon.vcf.gz \
      -O tumoronly.vcf.gz
    
  5. Filtering

    gatk FilterMutectCalls \
      -V tumoronly.vcf.gz -R ref.fa \
      -O tumoronly.filtered.vcf.gz
    
  6. Annotation

    vep \
      -i tumoronly.filtered.vcf.gz \
      -o tumoronly.annotated.vcf \
      --cache --everything --vcf --assembly GRCh38 --offline
    

Interpretation Notes: - Expect higher false positives; rare germline often remains even after filtering. - Use gnomAD AF thresholds (e.g., exclude AF > 0.001 for rare cancers). - Orthogonal validation (Sanger, ddPCR) is strongly recommended. - Review strand bias and read-level evidence in IGV for low-VAF calls.

Example: Germline Variant Calling

Scenario: Single-sample germline variant discovery using GATK Best Practices (extendable to joint calling for cohorts).

Inputs: - Sample: sample_R1.fq.gz, sample_R2.fq.gz - Known sites (dbSNP, Mills, 1000G indels)

Steps:

  1. QC and Trimming

    fastp \
      -i sample_R1.fq.gz -I sample_R2.fq.gz \
      -o trim_R1.fq.gz -O trim_R2.fq.gz \
      --html sample_fastp.html --thread 16
    
  2. Alignment

    bwa-mem2 mem -t 16 ref.fa trim_R1.fq.gz trim_R2.fq.gz \
      | samtools sort -o sample.bam
    
  3. Duplicate Marking

    gatk MarkDuplicates -I sample.bam -O sample.markdup.bam -M sample.metrics.txt
    
  4. BQSR

    gatk BaseRecalibrator -I sample.markdup.bam -R ref.fa \
      --known-sites dbsnp.vcf --known-sites mills.vcf \
      -O sample.recal.table
    
    gatk ApplyBQSR -R ref.fa -I sample.markdup.bam \
      --bqsr-recal-file sample.recal.table \
      -O sample.bqsr.bam
    
  5. Variant Calling

    gatk HaplotypeCaller -R ref.fa -I sample.bqsr.bam \
      -O sample.g.vcf.gz -ERC GVCF
    
  6. Joint Genotyping (multi-sample)

    gatk CombineGVCFs -R ref.fa \
      -V sample1.g.vcf.gz \
      -V sample2.g.vcf.gz \
      -O cohort.g.vcf.gz
    
    gatk GenotypeGVCFs -R ref.fa -V cohort.g.vcf.gz \
      -O cohort.vcf.gz
    
  7. Filtering

    Large cohorts: VQSR Small cohorts: hard filters

    Example hard filters:

    gatk VariantFiltration -V cohort.vcf.gz \
      --filter-expression "QD < 2.0" --filter-name "QD2" \
      -O cohort.filtered.vcf.gz
    
  8. Annotation

    vep -i cohort.filtered.vcf.gz \
        -o cohort.annotated.vcf \
        --cache --everything --vcf --assembly GRCh38 --offline
    

Interpretation Notes: - Filter on quality metrics (QD, FS, MQ) to reduce false positives. - Use gnomAD to flag common benign variants. - Classify by ACMG/AMP where appropriate (research context): pathogenic, likely pathogenic, VUS, likely benign, benign. - For research publications, emphasize novel, rare, and functionally significant variants.

Note

Regardless of workflow type, validate high-impact variants with orthogonal methods before clinical use, and interpret in the context of phenotype, tumor histology (if applicable), and supporting molecular data.