Official documentation: https://support.sentieon.com/manual/_downloads/Sentieon.pdf

Basic DNAseq workflow

Sentieon - Figure 1

1 Realign - Realigner

nodup.bam -> realn.bam

    sentieon driver \
      -t NUMBER_THREADS \
      -i NODUP_BAM \
      -r REFERENCE \
      --algo Realigner \
      -k MILLS_INDELS \
      REALIGNED_BAM

2 Recal (BQSR) - QualCal

realn.bam -> recal.table [+ recal.bam]

    # Generate recal.table only [recommended]
    sentieon driver \
      -t NUMBER_THREADS \
      -i REALIGNED_BAM \
      -r REFERENCE \
      --algo QualCal \
      -k MILLS_INDELS \
      -k DBSNP \
      RECAL_DATA.TABLE

    # Also generate recal.bam in the same run
    sentieon driver \
      -t NUMBER_THREADS \
      -i REALIGNED_BAM \
      -r REFERENCE \
      --algo QualCal \
      -k MILLS_INDELS \
      -k DBSNP \
      RECAL_DATA.TABLE \
      --algo ReadWriter \
      RECALIBRATED_BAM

3 Variant calling - Haplotyper / Genotyper

This step can be run per chromosome, with the per-chromosome results then merged using bcftools concat.

Method 1: use the Haplotyper algorithm; the input is realn.bam + recal.table

    sentieon driver \
      -t NUMBER_THREADS \
      -r REFERENCE \
      -i REALIGNED_BAM \
      -q RECAL_DATA.TABLE \
      --algo Haplotyper \
      --emit_mode gvcf \
      --emit_conf=10 \
      --call_conf=10 \
      [-d dbSNP] \
      VARIANT_GVCF

Method 2: use the Genotyper algorithm; the input is recal.bam

    sentieon driver \
      -t NUMBER_THREADS \
      -r REFERENCE \
      -i RECALIBRATED_BAM \
      --algo Genotyper \
      [-d dbSNP] \
      --emit_mode gvcf \
      VARIANT_GVCF

PS: Method 2 must not take recal.bam together with recal.table as input. Doing so applies the recalibration twice and produces incorrect results.

4 Joint calling - GVCFtyper
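The per-chromosome approach mentioned at the start of this step can be sketched roughly as follows. This is a dry run that only prints the commands, not a tested pipeline. It assumes the driver's `--interval` option is used to restrict each run to one contig; the file names (`ref.fa`, `sample.realn.bam`, `sample.recal.table`, `sample.<chrom>.g.vcf.gz`), the chromosome list, and the helper `print_shard_commands` are all hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: one Haplotyper run per chromosome, then bcftools concat.
# Nothing is executed here; the commands are only printed for review.
set -eu

# Hypothetical defaults; override them through the environment.
REFERENCE="${REFERENCE:-ref.fa}"
REALIGNED_BAM="${REALIGNED_BAM:-sample.realn.bam}"
RECAL_TABLE="${RECAL_TABLE:-sample.recal.table}"
THREADS="${THREADS:-8}"
CHROMS="${CHROMS:-chr1 chr2 chr3}"   # must follow the reference contig order

print_shard_commands() {
    parts=""
    for chrom in $CHROMS; do
        gvcf="sample.$chrom.g.vcf.gz"
        # Assumption: --interval limits the driver to a single contig.
        echo "sentieon driver -t $THREADS -r $REFERENCE -i $REALIGNED_BAM -q $RECAL_TABLE --interval $chrom --algo Haplotyper --emit_mode gvcf $gvcf"
        parts="$parts $gvcf"
    done
    # Concatenate the per-chromosome files in the same order they were listed.
    echo "bcftools concat -O z -o sample.g.vcf.gz$parts"
}

print_shard_commands
```

Once the printed commands look right, they can be piped into `sh` or submitted as separate jobs, one per chromosome, with the `bcftools concat` line run after all of them finish.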

s1.g.vcf.gz, s2.g.vcf.gz, … -> merged.vcf.gz

Merge the GVCFs of one or more samples into a single VCF file. Input files can be specified with -v.

    sentieon driver \
      -t NUMBER_THREADS \
      -r REFERENCE \
      --algo GVCFtyper \
      -d DBSNP \
      -v s1.g.vcf.gz \
      -v s2.g.vcf.gz \
      merge.vcf.gz

Alternatively, list the input files directly after the output file:

    sentieon driver -r reference.fasta [driver_options] \
      --algo GVCFtyper [algo_options] output.vcf[.gz] input.g.vcf[.gz] ...
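With many samples, writing every `-v` option by hand becomes tedious. A minimal sketch of assembling the GVCFtyper command from a list of inputs is shown here; it only prints the command. The helper `build_gvcftyper_command`, the default values `ref.fa` and `8`, and the sample file names are hypothetical, and file names containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: build a GVCFtyper command line with one -v per input GVCF.
# The command is printed, not executed.
set -eu

# Hypothetical defaults; override them through the environment.
REFERENCE="${REFERENCE:-ref.fa}"
THREADS="${THREADS:-8}"

build_gvcftyper_command() {
    # $1 = output VCF; remaining arguments = input GVCF files
    out_vcf="$1"
    shift
    cmd="sentieon driver -t $THREADS -r $REFERENCE --algo GVCFtyper"
    for gvcf in "$@"; do
        cmd="$cmd -v $gvcf"
    done
    echo "$cmd $out_vcf"
}

build_gvcftyper_command merge.vcf.gz s1.g.vcf.gz s2.g.vcf.gz s3.g.vcf.gz
```

In practice the input list could come from a shell glob such as `*.g.vcf.gz`, provided every matched file is a GVCF meant for the joint call.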

5 VQSR - VarCal + ApplyVarCal

VQSR (Variant Quality Score Recalibration) builds a model from trusted known sites and uses it to filter the called variants. SNPs are handled first, then INDELs:

merge.vcf.gz -> SNP VQSR -> Apply SNP VQSR -> INDEL VQSR -> Apply INDEL VQSR -> out.vcf.gz

    # Step 1: SNP VQSR
    sentieon driver \
      -t {vqsr_threads} \
      -r {reffasta} \
      --algo VarCal \
      --var_type SNP \
      --resource {1000g_phase1} \
      --resource_param 1000G,known=false,training=true,truth=false,prior=10.0 \
      --resource {1000g_omni} \
      --resource_param omni,known=false,training=true,truth=true,prior=12.0 \
      --resource {dbsnp} \
      --resource_param dbsnp,known=true,training=false,truth=false,prior=2.0 \
      --resource {hapmap} \
      --resource_param hapmap,known=false,training=true,truth=true,prior=15.0 \
      --annotation QD --annotation MQ --annotation MQRankSum \
      --annotation ReadPosRankSum --annotation FS \
      --tranches_file VCF/all.merged.snp.tranches \
      -v VCF/all.merged.vcf.gz \
      VCF/all.merged.snp.recal

    # Step 2: Apply SNP VQSR
    sentieon driver \
      -t {vqsr_threads} \
      -r {reffasta} \
      --algo ApplyVarCal \
      --var_type SNP \
      --recal VCF/all.merged.snp.recal \
      --tranches_file VCF/all.merged.snp.tranches \
      -v VCF/all.merged.vcf.gz \
      VCF/all.merged.snp.recal.vcf.gz

    # Step 3: INDEL VQSR (input is the SNP-recalibrated VCF from Step 2)
    sentieon driver \
      -t {vqsr_threads} \
      -r {reffasta} \
      --algo VarCal \
      --var_type INDEL \
      --resource {dbsnp} \
      --resource_param dbsnp,known=true,training=false,truth=false,prior=2.0 \
      --resource {mills_indels} \
      --resource_param Mills,known=false,training=true,truth=true,prior=12.0 \
      --annotation QD --annotation MQ \
      --annotation ReadPosRankSum --annotation FS \
      --tranches_file VCF/all.merged.snp.indel.tranches \
      -v VCF/all.merged.snp.recal.vcf.gz \
      VCF/all.merged.snp.indel.recal

    # Step 4: Apply INDEL VQSR
    sentieon driver \
      -t {vqsr_threads} \
      -r {reffasta} \
      --algo ApplyVarCal \
      --var_type INDEL \
      --recal VCF/all.merged.snp.indel.recal \
      --tranches_file VCF/all.merged.snp.indel.tranches \
      -v VCF/all.merged.snp.recal.vcf.gz \
      VCF/all.merged.snp.indel.recal.vcf.gz