学习资料
pysam可以实现ref和alignments的来回转换(MD tag)和alignments tag的各种计算。碱基层面计算的好用的工具。
pysam - An interface for reading and writing SAM files — pysam 0.15.0 documentation
SAM数据格式学习1之CIGAR理解_Keep Learning-CSDN博客
#cigar
The standard cigar has three operations:
#M : match or mismatch
#I : insertion
#D: deletion
#
#The extended cigar adds
#N : skipped bases on reference
#S : soft clipping
#H : hard clipping
#P : padding
#= : BAM_CEQUAL 7
#X : BAM_CDIFF 8
###
samtools mpileup
我觉得这个可以关注一下Cigar值与mismatch,editing distance和block size之间的关系。
- Block Size = 84 + ReadNameLength(including null) + CigarLength4 + (ReadLength+1)/2 + ReadLength + TagLength
这个算的话就samtools mpileup算一下后面的具体的:http://www.htslib.org/doc/samtools-mpileup.html
- Call SNPs and short INDELs:
- Generate the consensus sequence for one diploid individual
- Include customized index file as a part of arugments
- Phase one individual
#Call SNPs and short INDELs:
samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf > var.flt.vcf
#The bcftools filter command marks low quality sites and sites with the read depth exceeding a limit, which should be adjusted to about twice the average read depth (bigger read depths usually indicate problematic regions which are often enriched for artefacts). One may consider to add -C50 to mpileup if mapping quality is overestimated for reads containing excessive mismatches. Applying this option usually helps BWA-short but may not other mappers.
#Individuals are identified from the SM tags in the @RG header lines. Individuals can be pooled in one alignment file; one individual can also be separated into multiple files. The -P option specifies that indel candidates should be collected only from read groups with the @RG-PL tag set to ILLUMINA. Collecting indel candidates from reads sequenced by an indel-prone technology may affect the performance of indel calling.
#
#Generate the consensus sequence for one diploid individual:
samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq
#Include customized index file as a part of arugments.
samtools mpileup [options] -X /data_folder/in1.bam [/data_folder/in2.bam [...]] /index_folder/index1.bai [/index_folder/index2.bai [...]]
#Phase one individual:
samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out
#The calmd command is used to reduce false heterozygotes around INDELs.