1. SAM file

SAM (Sequence Alignment/Map) is a tab-delimited text format that stores alignment information for short reads mapped against reference sequences.

  ##---------------View SAM file
  $ head -n 10 P3-VERO-P3-1-vero_L4.sam
  # or
  $ samtools view -h P3-VERO-P3-1-vero_L4.sam | head -n 10

A SAM file usually starts with a header section, followed by the alignment records, written as one tab-separated line per read.
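As a minimal sketch of this layout, the snippet below splits SAM text into its two parts by looking only at the leading `@`. The helper names `sam_header` and `sam_records` and the tiny inline SAM text are made up for illustration; with a real file, `samtools view -H` would print the header alone.

```shell
# Hypothetical helpers: split SAM text read from stdin into header and records.

# Print only header lines (those starting with '@').
sam_header() { grep '^@' || true; }

# Print only alignment records (lines that do not start with '@').
sam_records() { grep -v '^@' || true; }

# A tiny made-up SAM document: two header lines and one alignment record.
demo_sam() {
  printf '@HD\tVN:1.6\tSO:coordinate\n'
  printf '@SQ\tSN:NC_045512.2\tLN:29903\n'
  printf 'read1\t0\tNC_045512.2\t1\t60\t4M\t*\t0\t0\tATTA\tFFFF\n'
}

echo "header lines: $(demo_sam | sam_header | wc -l | tr -d ' ')"
echo "record lines: $(demo_sam | sam_records | wc -l | tr -d ' ')"
```

Splitting on the first character is safe here because the SAM specification does not allow `@` in a read name.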

* Header section

Each header line begins with the character @ followed by a two-letter record type code, such as @HD, @SQ, @RG, @PG, or @CO.

  @SQ SN:NC_045512.2 LN:29903
  @PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 4 -M /public/home/ykk/SARS_CoV_2/mapping/ref/Wuhan-Hu-1.fasta /public/home/ykk/SARS_CoV_2/clean_datP3-1-vero_L4_1.fq.gz /public/home/ykk/SARS_CoV_2/clean_data/P3-VERO-P3-1-vero_L4_2.fq.gz
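Header fields are TAG:VALUE pairs separated by TABs (the TABs appear as spaces in the listing above). The following sketch, with a hypothetical helper name `sam_ref_lengths`, pulls the reference name (SN) and length (LN) out of each @SQ line using awk.

```shell
# Hypothetical helper: list "name<TAB>length" for every @SQ line on stdin.
# SN is the reference sequence name and LN its length.
sam_ref_lengths() {
  awk -F '\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "SN:") name = substr($i, 4)
        if (substr($i, 1, 3) == "LN:") len  = substr($i, 4)
      }
      print name "\t" len
    }'
}

# The @SQ line from the example above, rebuilt with real TAB separators.
printf '@SQ\tSN:NC_045512.2\tLN:29903\n' | sam_ref_lengths
```

Looping over the tags rather than assuming fixed positions keeps the sketch working when SN and LN appear in a different order.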

* Tab-delimited read alignment information lines

Each alignment line typically represents the linear alignment of a segment. Each line consists of 11 or more TAB-separated fields.

  A00821:275:HWMMWDSXX:4:1101:1298:1016 83 NC_045512.2 4209 60 150M = 4179 -180 AAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTAGGGTTTAAATGGTTACACTGTAGAGGAGGCAAAGACAGTGCTTAAAAAGTGTAAAAGTGCCTTTTACATTCTACCATCTATTATCTCTAATGAGAAGC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF NM:i:0 MD:Z:150 MC:Z:150M AS:i:150
  ……
  A00821:275:HWMMWDSXX:4:1101:23484:2519 77 * 0 0 * * 0 0 ACGGATTGTACGCAAGTACAGTGGTAGGGGAGCGTTCCAAGGGTGATGATAAGGACTGGTGGAGCGCTTGGAAGTGATTATGCCGGCATGAGTAACGTTTGGAAGTGAGAATCTTCCATGCCGTTTGACCAAGGTTTCCT FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF,FFFFFFFFFF:FFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF AS:i:0 XS:i:0

The first eleven fields are always present and in the order shown below; if the information represented by any of these fields is unavailable, that field’s value will be a placeholder, either ‘0’ or ‘*’ as determined by the field’s type.
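To make the fixed column order concrete, here is a minimal sketch that labels the eleven mandatory fields of each alignment line and counts any optional fields that follow. The helper name `sam_fields` and the shortened record (four bases, invented TLEN) are assumptions for illustration only.

```shell
# Hypothetical helper: print the 11 mandatory fields of each alignment line
# on stdin as NAME=value pairs, then report how many optional fields follow.
sam_fields() {
  awk -F '\t' '
    BEGIN { n = split("QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL", names, " ") }
    /^@/    { next }    # skip header lines
    NF < 11 { print "malformed line: fewer than 11 fields"; next }
    {
      for (i = 1; i <= n; i++) print names[i] "=" $i
      print "OPTIONAL_FIELDS=" (NF - 11)
    }'
}

# A shortened, made-up record with two optional fields (NM and AS).
printf 'read1\t83\tNC_045512.2\t4209\t60\t4M\t=\t4179\t-34\tAAGC\tFFFF\tNM:i:0\tAS:i:4\n' | sam_fields
```

Because the first eleven columns never move, positional access is enough for them; only the optional TAG:TYPE:VALUE fields need to be searched by name.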

* The Meaning of Each Column

  1. QNAME: Read Name
  2. FLAG: Combination of bitwise FLAGs. Each bit is explained in the following table:
    [Figure 1: table of FLAG bits]
  3. RNAME: Reference sequence NAME of the alignment.
  4. POS: 1-based leftmost mapping position of the first CIGAR operation that “consumes” a reference base (see table below). The first base in a reference sequence has coordinate 1. POS is set to 0 for an unmapped read without coordinates.
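  5. MAPQ: Mapping quality, equal to −10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer. A value of 255 indicates that the mapping quality is not available.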
  6. CIGAR: The CIGAR operations are given in the following table (set ‘*’ if unavailable):
    [Figure 2: table of CIGAR operations]
  7. RNEXT: Reference sequence name of the primary alignment of the NEXT read in the template. This field is set as ‘*’ when the information is unavailable, and set as ‘=’ if RNEXT is identical to RNAME.
  8. PNEXT: 1-based Position of the primary alignment of the NEXT read in the template. Set as 0 when the information is unavailable.
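  9. TLEN: Signed observed length of the template. The leftmost segment has a plus sign and the rightmost a minus sign. Set to 0 for a single-segment template or when the information is unavailable.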
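  10. SEQ: Read sequence (‘*’ if the sequence is not stored).
  11. QUAL: Base qualities, encoded as ASCII characters (Phred score plus 33); ‘*’ if the qualities are not stored.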
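
The FLAG and CIGAR columns are the two that usually need decoding. The sketch below is one way to do it in plain shell: `decode_flag` and `cigar_ref_len` are hypothetical helper names, the bit values follow the SAM specification, and the short bit names are the ones printed by `samtools flags`.

```shell
# Hypothetical helper: print the names of the bits set in a SAM FLAG value.
decode_flag() {
  flag=$1
  out=""
  for pair in 1:PAIRED 2:PROPER_PAIR 4:UNMAP 8:MUNMAP 16:REVERSE 32:MREVERSE \
              64:READ1 128:READ2 256:SECONDARY 512:QCFAIL 1024:DUP 2048:SUPPLEMENTARY
  do
    bit=${pair%%:*}     # numeric bit value
    name=${pair#*:}     # symbolic name
    if [ $(( flag & bit )) -ne 0 ]; then
      out="${out:+$out,}$name"
    fi
  done
  echo "$out"
}

# Hypothetical helper: number of reference bases covered by a CIGAR string.
# Only the M, D, N, = and X operations consume the reference.
cigar_ref_len() {
  printf '%s\n' "$1" | awk '{
    total = 0
    s = $0
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      op  = substr(s, RLENGTH, 1)            # operation letter
      len = substr(s, 1, RLENGTH - 1) + 0    # operation length
      if (index("MDN=X", op) > 0) total += len
      s = substr(s, RLENGTH + 1)             # drop the parsed operation
    }
    print total
  }'
}

decode_flag 83       # prints PAIRED,PROPER_PAIR,REVERSE,READ1
decode_flag 77       # prints PAIRED,UNMAP,MUNMAP,READ1
cigar_ref_len 150M   # prints 150
```

For the first example record, POS 4209 with CIGAR 150M places the rightmost aligned base at 4209 + 150 - 1 = 4358. The second record has FLAG 77, which marks both the read and its mate as unmapped, and that is why its RNAME and CIGAR are ‘*’.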

2. BAM file

A BAM file (*.bam) is the compressed binary version of a SAM file.

View BAM file

Because a BAM file is binary, it cannot be read with plain-text tools such as cat; use a tool such as the samtools view command to decode it.

  $ samtools view -h P3-VERO-P3-1-vero_L4_sort.bam | head -n 10

  @HD VN:1.6 SO:coordinate
  @SQ SN:NC_045512.2 LN:29903
  @PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 4 -M /public/home/ykk/SARS_CoV_2/mapping/ref/Wuhan-Hu-1.fasta /public/home/ykk/SARS_CoV_2/clean_data/P3-VERO-P3-1-vero_L4_1.fq.gz /public/home/ykk/SARS_CoV_2/clean_data/P3-VERO-P3-1-vero_L4_2.fq.gz
  @PG ID:samtools PN:samtools PP:bwa VN:1.11 CL:samtools view -b -F 12 /public/home/ykk/SARS_CoV_2/mapping/P3-VERO-P3-1-vero_L4.sam
  @PG ID:samtools.1 PN:samtools PP:samtools VN:1.11 CL:samtools sort -@ 4 -l 9 -o /public/home/ykk/SARS_CoV_2/sortedbam/P3-VERO-P3-1-vero_L4_sort.bam /public/home/ykk/SARS_CoV_2/bam/P3-VERO-P3-1-vero_L4.bam
  @PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.12 CL:samtools view -h P3-VERO-P3-1-vero_L4_sort.bam
  A00821:275:HWMMWDSXX:4:1116:11957:9580 163 NC_045512.2 1 60 92M = 1 92 ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGC ,FFFF:F:FFFF:FFF:FFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F NM:i:0 MD:Z:92 MC:Z:92M AS:i:92 XS:i:0
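
The @PG lines in this header record how the file was produced: bwa mem, then samtools view -b -F 12 (which drops reads with FLAG bit 4 or 8 set, that is, read or mate unmapped), then samtools sort. As a rough sketch, the hypothetical helper `sam_pg_history` below lists that history from the ID and CL tags; the final lines only run if samtools and the example BAM file happen to be present.

```shell
# Hypothetical helper: print "ID<TAB>command line" for every @PG header line
# on stdin, in file order. ID and CL are standard @PG tags.
sam_pg_history() {
  awk -F '\t' '
    $1 == "@PG" {
      id = ""; cl = ""
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "ID:") id = substr($i, 4)
        if (substr($i, 1, 3) == "CL:") cl = substr($i, 4)
      }
      print id "\t" cl
    }'
}

# Guarded usage: runs only when samtools and the example BAM are available.
# `samtools view -H` prints the header without any alignment records.
if command -v samtools >/dev/null 2>&1 && [ -f P3-VERO-P3-1-vero_L4_sort.bam ]; then
  samtools view -H P3-VERO-P3-1-vero_L4_sort.bam | sam_pg_history
fi
```

Reading the history this way is a quick check on whether a BAM file has already been filtered or sorted before it is passed to the next tool.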

3. More information: