学习资料

    pysam可以实现ref和alignments的来回转换(MD tag)和alignments tag的各种计算。碱基层面计算的好用的工具。

    pysam - An interface for reading and writing SAM files — pysam 0.15.0 documentation

    SAM数据格式学习1之CIGAR理解_Keep Learning-CSDN博客

    1. #cigar
    2. The standard cigar has three operations:
    3. #M : match or mismatch
    4. #I : insertion
    5. #D: deletion
    6. #
    7. #The extended cigar adds
    8. #N : skipped bases on reference
    9. #S : soft clipping
    10. #H : hard clipping
    11. #P : padding
    12. #= : BAM_CEQUAL 7
    13. #X : BAM_CDIFF 8
    14. ###
    15. samtools mpileup

    我觉得这个可以关注一下Cigar值与mismatch,editing distance和block size之间的关系。

    • Block Size = 84 + ReadNameLength(including null) + CigarLength4 + (ReadLength+1)/2 + ReadLength + TagLength

    这个算的话就samtools mpileup算一下后面的具体的:http://www.htslib.org/doc/samtools-mpileup.html

    1. Call SNPs and short INDELs:
    2. Generate the consensus sequence for one diploid individual
    3. Include customized index file as a part of arugments
    4. Phase one individual
    1. #Call SNPs and short INDELs:
    2. samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
    3. bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf > var.flt.vcf
    4. #The bcftools filter command marks low quality sites and sites with the read depth exceeding a limit, which should be adjusted to about twice the average read depth (bigger read depths usually indicate problematic regions which are often enriched for artefacts). One may consider to add -C50 to mpileup if mapping quality is overestimated for reads containing excessive mismatches. Applying this option usually helps BWA-short but may not other mappers.
    5. #Individuals are identified from the SM tags in the @RG header lines. Individuals can be pooled in one alignment file; one individual can also be separated into multiple files. The -P option specifies that indel candidates should be collected only from read groups with the @RG-PL tag set to ILLUMINA. Collecting indel candidates from reads sequenced by an indel-prone technology may affect the performance of indel calling.
    6. #
    7. #Generate the consensus sequence for one diploid individual:
    8. samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq
    9. #Include customized index file as a part of arugments.
    10. samtools mpileup [options] -X /data_folder/in1.bam [/data_folder/in2.bam [...]] /index_folder/index1.bai [/index_folder/index2.bai [...]]
    11. #Phase one individual:
    12. samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out
    13. #The calmd command is used to reduce false heterozygotes around INDELs.