学习资料
pysam可以实现ref和alignments的来回转换(MD tag)和alignments tag的各种计算。碱基层面计算的好用的工具。
pysam - An interface for reading and writing SAM files — pysam 0.15.0 documentation
SAM数据格式学习1之CIGAR理解_Keep Learning-CSDN博客
#cigarThe standard cigar has three operations:#M : match or mismatch#I : insertion#D: deletion##The extended cigar adds#N : skipped bases on reference#S : soft clipping#H : hard clipping#P : padding#= : BAM_CEQUAL 7#X : BAM_CDIFF 8###samtools mpileup
我觉得这个可以关注一下Cigar值与mismatch,editing distance和block size之间的关系。
- Block Size = 84 + ReadNameLength(including null) + CigarLength4 + (ReadLength+1)/2 + ReadLength + TagLength
这个算的话就samtools mpileup算一下后面的具体的:http://www.htslib.org/doc/samtools-mpileup.html
- Call SNPs and short INDELs:
- Generate the consensus sequence for one diploid individual
- Include customized index file as a part of arugments
- Phase one individual
#Call SNPs and short INDELs:samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcfbcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf > var.flt.vcf#The bcftools filter command marks low quality sites and sites with the read depth exceeding a limit, which should be adjusted to about twice the average read depth (bigger read depths usually indicate problematic regions which are often enriched for artefacts). One may consider to add -C50 to mpileup if mapping quality is overestimated for reads containing excessive mismatches. Applying this option usually helps BWA-short but may not other mappers.#Individuals are identified from the SM tags in the @RG header lines. Individuals can be pooled in one alignment file; one individual can also be separated into multiple files. The -P option specifies that indel candidates should be collected only from read groups with the @RG-PL tag set to ILLUMINA. Collecting indel candidates from reads sequenced by an indel-prone technology may affect the performance of indel calling.##Generate the consensus sequence for one diploid individual:samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq#Include customized index file as a part of arugments.samtools mpileup [options] -X /data_folder/in1.bam [/data_folder/in2.bam [...]] /index_folder/index1.bai [/index_folder/index2.bai [...]]#Phase one individual:samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out#The calmd command is used to reduce false heterozygotes around INDELs.
