Samtools is a set of utilities that manipulate alignments in the SAM (Sequence Alignment/Map), BAM, and CRAM formats. It converts between the formats, does sorting, merging and indexing, and can retrieve reads in any regions swiftly.
1. Commands & options:
1.1 samtools view
View and convert SAM/BAM/CRAM files.
-b # Output in the BAM format.-h # Include the header in the output.-S # Input is in SAM format.-F <INT> # Do not output alignments with any bits set in INT present in the FLAG field.-@ <INT> # Number of threads.-c # count reads and print the total number
1.2 samtools sort
The file size can be compressed after sorting. An appropriate @HD-SO sort order header tag will be added or an existing one updated if necessary.
-l <INT># Set the desired compression level for the final output file, ranging from 0 (uncompressed) or 1 (fastest but minimal compression) to 9 (best compression but slowest to write), similarly to gzip(1)'s compression level setting.-n # Sort by read names (i.e., the QNAME field) rather than by chromosomal coordinates.-m <INT> # Set maximum memory per thread; suffix K/M/G recognized [768M].-t <TAG> # Sort first by the value in the alignment tag TAG, then by position or name (if also using -n).-o <FILE> # output filename-O <FORMAT> # output file format
The -n and -t options are incompatible with samtools index.
1.3 smtools index
Index a coordinate-sorted BGZIP-compressed SAM, BAM or CRAM file for fast random access.
1.4 samtools flagstat
Counts the number of alignments for each FLAG type.
1.5 samtools depth
Compute the read depth at each position or region.
-d/-m <int> # maximum coverage depth [default = 8000]. If 0, depth is set to the maximum integer value, effectively removing any depth limit.
1.6 samtools fastq/fasta
Convert a SAM/BAM/CRAM file to FASTQ or FASTA.
1.7 samtools markdup
Mark duplicate alignments in a coordinate sorted file.
-r # Remove duplicate reads-l <INT> # Expected maximum read length of INT bases. [300]
2. Pipeline
2.1
$ samtools view -b -F 12 ~/SARS_CoV_2/mapping/P3-VERO-P3-1-vero_L4.sam > ~/SARS_CoV_2/mapping/P3-VERO-P3-1-vero_L4.bam
-F 12: Don’t output unmapped reads. 4+8=12. View the meaning of the flag value: SAM/BAM.
2.2
$ samtools sort -@ 4 -l 9 ~/SARS_CoV_2/mapping/P3-VERO-P3-1-vero_L4.bam -o ~/SARS_CoV_2/sort/P3-VERO-P3-1-vero_L4_sort.bam$ samtools index ~/SARS_CoV_2/sort/P3-VERO-P3-1-vero_L4_sort.bam ~/SARS_CoV_2/sortedbam/P3-VERO-P3-1-vero_L4_sort.bam.bai
2.3
$ samtools flagstat ~/SARS_CoV_2/sort/P3-VERO-P3-1-vero_L4_sort.bam > flagstat-1.txt
473417027 + 0 in total (QC-passed reads + QC-failed reads)3432805 + 0 secondary0 + 0 supplementary0 + 0 duplicates473417027 + 0 mapped (100.00% : N/A)469984222 + 0 paired in sequencing234992111 + 0 read1234992111 + 0 read2462394014 + 0 properly paired (98.39% : N/A)469984222 + 0 with itself and mate mapped0 + 0 singletons (0.00% : N/A)0 + 0 with mate mapped to a different chr0 + 0 with mate mapped to a different chr (mapQ>=5)
2.4
$ samtools depth -d 0 ~/SARS_CoV_2/sort/P3-VERO-P3-1-vero_L4_sort.bam > depth3-1.txt
NC_045512.2 101 8004NC_045512.2 102 8005NC_045512.2 103 7996NC_045512.2 104 7998NC_045512.2 105 7999NC_045512.2 106 8000NC_045512.2 107 8001NC_045512.2 108 8002NC_045512.2 109 8003NC_045512.2 110 8004NC_045512.2 111 8004NC_045512.2 112 8004NC_045512.2 113 8003NC_045512.2 114 7994NC_045512.2 115 7993NC_045512.2 116 7994NC_045512.2 117 7992NC_045512.2 118 7992NC_045512.2 119 7992NC_045512.2 120 7975
If -d options is not be set, the result as follows:
$ samtools depth ~/SARS_CoV_2/sort/P3-VERO-P3-1-vero_L4_sort.bam > depth1-1.txt
NC_045512.2 101 1706050NC_045512.2 102 1708068NC_045512.2 103 1710832NC_045512.2 104 1711865NC_045512.2 105 1716288NC_045512.2 106 1724816NC_045512.2 107 1729485NC_045512.2 108 1737588NC_045512.2 109 1775730NC_045512.2 110 1778939NC_045512.2 111 1780036NC_045512.2 112 1780479NC_045512.2 113 1787108NC_045512.2 114 1793678NC_045512.2 115 1795985NC_045512.2 116 1797876NC_045512.2 117 1803406NC_045512.2 118 1806767NC_045512.2 119 1818003NC_045512.2 120 1816881
Calculate the average sequencing depth.
$ samtools view P3-VERO-P3-1-vero_L4_sort.bam | awk '{print $10}' | grep -o '[ACGTN]' | wc -l# Divide the result by the base number of the reference sequence.
More informations:
