Alignments - Alignment Tags - 《Bioinformatics》

1. Manual

1. samtools

SAMtags_samtools.pdf

2. Common tags

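**Number of mismatches**: please note the difference between end-to-end and soft-clipping (local) modes. In end-to-end mode the whole read must be aligned, so mismatches near both ends of the read are counted and penalized in the alignment score; in soft-clipping mode those ends can be clipped instead and are not counted.
1. nM:i:<N>, NM:i:<N> : count. STAR. nM: per (paired) alignment; NM: per mate.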
2. XM:i:<N> : count. HISAT2 and TopHat2.
**Edit distance**: please note the difference between the edit distance and the number of mismatches; the edit distance also counts inserted and deleted bases.
1. NM:i:<N> : count.
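To illustrate why the alignment mode matters for the mismatch count, here is a minimal Python sketch (not taken from any aligner) that counts mismatches over an ungapped alignment while skipping soft-clipped ends. The function name `count_mismatches`, the toy sequences and the clip lengths are made up for illustration; a real aligner decides how many bases to clip from its own scoring scheme.

```python
def count_mismatches(read: str, ref: str, clip_left: int = 0, clip_right: int = 0) -> int:
    """Count mismatches in an ungapped alignment, skipping soft-clipped bases at both ends."""
    if len(read) != len(ref):
        raise ValueError("this sketch only handles ungapped alignments of equal length")
    end = len(read) - clip_right
    aligned = zip(read[clip_left:end], ref[clip_left:end])
    return sum(1 for read_base, ref_base in aligned if read_base != ref_base)


ref = "ACGTACGTAC"
read = "TCGTACGAAG"  # differs from ref at positions 0, 7 and 9

print(count_mismatches(read, ref))                             # end-to-end: 3
print(count_mismatches(read, ref, clip_left=1, clip_right=1))  # one base soft-clipped per end: 1
```

With end-to-end alignment all three mismatches are counted; clipping one base at each end leaves only the internal one.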

3. Notes

**Edit distance** = number of mismatches + number of inserted bases + number of deleted bases
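The edit distance can be recomputed from a SAM record: inserted and deleted bases come from the CIGAR string, and mismatches from the MD:Z string (single reference bases outside a `^` deletion run). This is a minimal sketch that assumes a well-formed MD string as described in the SAM specification; soft-clipped bases (S) are ignored, consistent with NM excluding clipping. The function name `edit_distance` and the example records are hypothetical.

```python
import re

# CIGAR: length + operation. MD: a match run, a deletion (^ + reference bases), or one mismatched reference base.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def edit_distance(cigar: str, md: str) -> int:
    """Recompute NM as mismatches + inserted bases + deleted bases."""
    # Insertions and deletions come from the CIGAR; S, H, N and P operations are not counted.
    indels = sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in ("I", "D"))
    # Each single letter outside a ^deletion run in MD is one mismatched reference base.
    mismatches = sum(1 for _run, _deleted, base in MD_TOKEN.findall(md) if base)
    return mismatches + indels


# 10M2I5M2D6M: 2 inserted + 2 deleted bases; MD "10A4^AC6": 1 mismatch.
print(edit_distance("10M2I5M2D6M", "10A4^AC6"))  # 5
print(edit_distance("5S20M", "20"))              # 0: soft-clipped bases are not counted
```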

Concrete Tag Manual

HISAT2: http://daehwankimlab.github.io/hisat2/manual/

Optional fields. Fields are tab-separated. hisat2 outputs zero or more of these optional fields for each alignment, depending on the type of the alignment:
• AS:i:<N> : Alignment score. Can be negative. Only present if SAM record is for an aligned read.
• ZS:i:<N> : Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.
• YS:i:<N> : Alignment score for opposite mate in the paired-end alignment. Only present if the SAM record is for a read that aligned as part of a paired-end alignment.
• XN:i:<N> : The number of ambiguous bases in the reference covering this alignment. Only present if SAM record is for an aligned read.
• XM:i:<N> : The number of mismatches in the alignment. Only present if SAM record is for an aligned read.
• XO:i:<N> : The number of gap opens, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read.
• XG:i:<N> : The number of gap extensions, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read.
• NM:i:<N> : The edit distance; that is, the minimal number of one-nucleotide edits (substitutions, insertions and deletions) needed to transform the read string into the reference string. Only present if SAM record is for an aligned read.
• YF:Z:<S> : String indicating reason why the read was filtered out. See also: [Filtering]. Only appears for reads that were filtered out.
• YT:Z:<S> : Value of UU indicates the read was not part of a pair. Value of CP indicates the read was part of a pair and the pair aligned concordantly. Value of DP indicates the read was part of a pair and the pair aligned discordantly. Value of UP indicates the read was part of a pair but the pair failed to align either concordantly or discordantly.
• MD:Z:<S> : A string representation of the mismatched reference bases in the alignment. See SAM format specification for details. Only present if SAM record is for an aligned read.
• XS:A:<A> : Values of + and - indicate the read is mapped to transcripts on sense and anti-sense strands, respectively. Spliced alignments need to have this field, which is required in Cufflinks and StringTie.
We can report this field for the canonical-splice site (GT/AG), but not for non-canonical splice sites. You can direct HISAT2 not to output such alignments (involving non-canonical splice sites) using "--pen-noncansplice 1000000".
• NH:i:<N> : The number of mapped locations for the read or the pair.
• Zs:Z:<S> : When the alignment of a read involves SNPs that are in the index, this option is used to indicate where exactly the read involves the SNPs. This optional field is similar to the above MD:Z field. For example, Zs:Z:1|S|rs3747203,97|S|rs16990981 indicates the second base of the read corresponds to a known SNP (ID: rs3747203). 97 bases after the third base (the base after the second one), the read at 100th base involves another known SNP (ID: rs16990981). 'S' indicates a single nucleotide polymorphism. 'D' and 'I' indicate a deletion and an insertion, respectively.
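A minimal Python sketch for reading these optional fields and decoding Zs:Z. It assumes the TAG:TYPE:VALUE layout of SAM optional fields and keeps types other than `i` and `f` as plain strings. The position rule in the hypothetical `decode_zs` (each offset counts from the base after the previous SNP) is inferred only from the manual's `S` example above and has not been checked for `D` or `I` entries. The tag values in the example are made up.

```python
def parse_optional_fields(fields: list[str]) -> dict:
    """Turn SAM optional fields such as 'NH:i:1' into {'NH': 1}."""
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)  # maxsplit=2 keeps colons inside Z values
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:  # A, Z, H and B values are kept as strings in this sketch
            tags[tag] = value
    return tags


def decode_zs(zs: str) -> list[tuple[int, str, str]]:
    """Return (0-based read position, variant type, SNP ID) for each Zs:Z entry.

    Assumption: each offset is counted from the base after the previous entry,
    as in the manual's SNP ('S') example; D and I entries are not verified.
    """
    decoded = []
    cursor = 0
    for entry in zs.split(","):
        offset, kind, snp_id = entry.split("|")
        pos = cursor + int(offset)
        decoded.append((pos, kind, snp_id))
        cursor = pos + 1
    return decoded


# Made-up tag values for a single HISAT2 record.
tags = parse_optional_fields(["AS:i:-3", "NH:i:1", "XS:A:+", "Zs:Z:1|S|rs3747203,97|S|rs16990981"])
print(tags["AS"], tags["NH"], tags["XS"])  # -3 1 +
print(decode_zs(tags["Zs"]))  # [(1, 'S', 'rs3747203'), (99, 'S', 'rs16990981')]
```

For the manual's example this yields 0-based positions 1 and 99, i.e. the 2nd and 100th bases of the read.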

TopHat2: its tags are similar to those of HISAT2.

AS:i: alignment score generated by aligner
CC:Z: reference name of the next hit; "=" for the same chromosome
CP:i: leftmost coordinate of the next hit
HI:i: query hit index, indicating the alignment record is the i-th one stored in SAM
MD:Z: string for mismatching positions.
NH:i: number of reported alignments that contain the query in the current record.
NM:i: edit distance to the reference, including ambiguous bases but excluding clipping
XG:i: the number of gap extensions, for both read and reference gaps, in the alignment.
XM:i: the number of mismatches in the alignment
XN:i: the number of ambiguous bases in the reference covering this alignment
XO:i: the number of gap opens, for both read and reference gaps, in the alignment.
XS:A: if either fr-firststrand or fr-secondstrand is specified, every read alignment will have an XS attribute tag; as with HISAT2, + and - indicate the strand of the transcript.
YT:Z: value of UU indicates the read was not part of a pair. Value of CP indicates the read was part of a pair and the pair aligned concordantly. Value of DP indicates the read was part of a pair and the pair aligned discordantly. Value of UP indicates the read was part of a pair but the pair failed to align either concordantly or discordantly.
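Using the NH and YT tags listed above, a common filter keeps only reads that were reported at a single location and aligned as part of a concordant pair. A minimal Python sketch under that assumption; the function names and the example record are hypothetical, and whether NH:i:1 is the right uniqueness criterion depends on how many alignments the aligner was allowed to report.

```python
def tag_dict(sam_line: str) -> dict:
    """Collect the optional TAG:TYPE:VALUE fields (SAM columns 12 onward)."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        # Only integer tags are converted; everything else stays a string.
        tags[tag] = int(value) if typ == "i" else value
    return tags


def is_unique_concordant(sam_line: str) -> bool:
    """True if the record was reported once (NH:i:1) and its pair aligned concordantly (YT:Z:CP)."""
    tags = tag_dict(sam_line)
    return tags.get("NH") == 1 and tags.get("YT") == "CP"


# Hypothetical paired-end record: 11 mandatory columns followed by optional tags.
record = "\t".join([
    "read1", "99", "chr1", "1000", "50", "10M", "=", "1200", "210",
    "ACGTACGTAC", "IIIIIIIIII", "AS:i:0", "NH:i:1", "YT:Z:CP",
])
print(is_unique_concordant(record))                               # True
print(is_unique_concordant(record.replace("NH:i:1", "NH:i:2")))  # False
```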