MAFFT

Multiple alignment program for amino acid or nucleotide sequences.

Options:

  • Help:
    1. % mafft --help
  • High speed:
    1. % mafft in > out
    2. % mafft --retree 1 in > out (fast)
  • High accuracy (for <~200 sequences x <~2,000 aa/nt):
    1. % mafft --maxiterate 1000 --localpair in > out (% linsi in > out is also ok)
    2. % mafft --maxiterate 1000 --genafpair in > out (% einsi in > out)
    3. % mafft --maxiterate 1000 --globalpair in > out (% ginsi in > out)
  • If unsure which option to use:
    1. % mafft --auto in > out
  • Other useful options:
    1. --clustalout # Output: clustal format, default: fasta
    2. --thread n # Number of threads (if unsure, --thread -1)
    3. --inputorder # Output order: same as input. Default: on
    4. --reorder # Output order: aligned. Default: off (inputorder)
    5. --adjustdirection # Reverse-complement sequences, as needed, to match the direction of the first sequence (nucleotide sequences only)
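The size rule of thumb for the high-accuracy modes can be turned into a small helper. This is a minimal Python sketch, assuming the FASTA file fits in memory; `read_fasta` and `suggest_mafft_command` are hypothetical names, and the 200-sequence / 2,000-residue limits are the approximate guideline quoted above, not hard cut-offs.

```python
def read_fasta(path):
    """Return {name: sequence} for every record in a FASTA file."""
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = (line[1:].split() or [""])[0]  # first word of the header
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def suggest_mafft_command(seqs, infile="in", outfile="out",
                          max_seqs=200, max_len=2000):
    """Suggest a MAFFT command from the rough size guideline (hypothetical helper)."""
    longest = max((len(s) for s in seqs.values()), default=0)
    if len(seqs) <= max_seqs and longest <= max_len:
        # Small data set: iterative refinement with local pairwise alignment (L-INS-i).
        return f"mafft --maxiterate 1000 --localpair {infile} > {outfile}"
    # Larger data set: let MAFFT pick a strategy itself.
    return f"mafft --auto {infile} > {outfile}"
```

Usage: `suggest_mafft_command(read_fasta("in"))`. Whole SARS-CoV-2 genomes are about 30 kb long, far above the length guideline, so the helper falls back to `--auto`, which is also what the pipeline below uses.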

Online version: https://mafft.cbrc.jp/alignment/server/
Figure 1: LAST hits (score>39) between the top sequence and the others.
Assembled genome sequences 2, 3 and 5 are in the reverse orientation relative to NC_045512.2, so the --adjustdirection option should be added when aligning them.
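After aligning with --adjustdirection, it is worth confirming that exactly the expected sequences (here 2, 3 and 5) were flipped. MAFFT marks each reverse-complemented sequence by prefixing its name with `_R_`. The minimal Python sketch below lists those names from an alignment in either FASTA or CLUSTAL (--clustalout) format; `alignment_names` and `reversed_by_mafft` are hypothetical helpers, and the CLUSTAL parsing assumes conservation lines start with whitespace.

```python
def alignment_names(lines):
    """Return sequence names, in order of first appearance, from FASTA or CLUSTAL lines."""
    lines = list(lines)
    is_fasta = any(line.startswith(">") for line in lines)
    names = []
    for line in lines:
        if is_fasta:
            if not (line.startswith(">") and line[1:].strip()):
                continue  # sequence lines and empty headers
            name = line[1:].split()[0]
        else:
            # CLUSTAL: skip blank lines, the header and conservation lines.
            if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
                continue
            name = line.split()[0]
        if name not in names:
            names.append(name)
    return names


def reversed_by_mafft(names):
    """Names that MAFFT --adjustdirection marked with the '_R_' prefix."""
    return [n for n in names if n.startswith("_R_")]
```

Usage: `with open("output.aln") as fh: print(reversed_by_mafft(alignment_names(fh)))`.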

Manual: https://mafft.cbrc.jp/alignment/software/manual/manual.html

iqtree

IQ-TREE takes a multiple sequence alignment as input and reconstructs the evolutionary tree that best explains the input data.

Options:

  1. -s # Input alignment file in PHYLIP, FASTA, NEXUS, CLUSTAL or MSF format.
  2. -m # Name of the substitution model to use in the analysis (e.g. GTR+I+G)
  3. -redo # Redo the entire analysis, whether the previous run was stopped or finished successfully. WARNING: This option overwrites all existing output files.
  4. --prefix # Prefix for all output files
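When the analysis is scripted, building the argument list explicitly keeps the options above visible and avoids shell-quoting mistakes. This is a minimal sketch: `iqtree_command` is a hypothetical helper, and the default binary name `iqtree` is an assumption (IQ-TREE 2 installs it as `iqtree2`).

```python
import shlex


def iqtree_command(alignment, model=None, prefix=None, redo=False, binary="iqtree"):
    """Build an IQ-TREE argument list from the options listed above."""
    cmd = [binary, "-s", alignment]  # input alignment
    if model:
        cmd += ["-m", model]         # substitution model, e.g. GTR+I+G
    if prefix:
        cmd += ["--prefix", prefix]  # prefix for all output files
    if redo:
        cmd.append("-redo")          # overwrite the results of a previous run
    return cmd


if __name__ == "__main__":
    # Show the pipeline's tree-inference step as a shell command line.
    print(shlex.join(iqtree_command("output.aln", model="GTR+I+G")))
```

Run as a script, this prints `iqtree -s output.aln -m GTR+I+G`. The same list can be passed unchanged to `subprocess.run(cmd, check=True)`, which raises an error if IQ-TREE exits with a non-zero status.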

Output files:

  • example.phy.iqtree: the main report file, in human-readable form.
  • example.phy.treefile: the ML tree in NEWICK format, which can be visualized by any tree viewer such as FigTree or iTOL.
  • example.phy.log: log file of the entire run (also printed on the screen).
  • example.phy.ckp.gz: the checkpoint file. IQ-TREE automatically performs checkpointing so that an interrupted analysis can be resumed: during the run it periodically writes example.phy.ckp.gz to disk. If the run did not finish, invoking IQ-TREE again with the very same command line recovers the analysis from the last stopped point.
    • If the run completed successfully, running it again issues an error message:
      ERROR: Checkpoint (example.phy.ckp.gz) indicates that a previous run successfully finished
      Use `-redo` option if you really want to redo the analysis and overwrite all output files.
    • To re-run the analysis anyway and overwrite all previous output files, use the -redo option:
      iqtree -s example.phy -redo
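Because every output file is named after the alignment (or the --prefix value), a script can check which outputs a run has produced before deciding whether to rerun it or add -redo. This is a minimal sketch with hypothetical helper names; it assumes that without --prefix the prefix is the alignment path itself, as in the example.phy files listed above.

```python
from pathlib import Path

# Suffixes of the output files listed above.
OUTPUT_SUFFIXES = (".iqtree", ".treefile", ".log", ".ckp.gz")


def expected_outputs(alignment, prefix=None):
    """Paths IQ-TREE is expected to write (assumed default prefix: the alignment path)."""
    base = prefix if prefix else alignment
    return [Path(base + suffix) for suffix in OUTPUT_SUFFIXES]


def missing_outputs(alignment, prefix=None):
    """Expected output files that do not exist yet."""
    return [p for p in expected_outputs(alignment, prefix) if not p.exists()]
```

For example, `missing_outputs("example.phy")` returns an empty list once all four files are present.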

Manual:

http://www.iqtree.org/doc/Tutorial
https://github.com/iqtree/iqtree

iTOL

iTOL is an online tool for the display, annotation and management of phylogenetic and other trees.

iTOL: Interactive Tree Of Life (https://itol.embl.de/)
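Before uploading a .treefile to iTOL, a quick sanity check is to list its leaf names and confirm that every aligned sequence is present (names changed by MAFFT, such as `_R_` prefixes, carry over into the tree). This is a minimal Newick reader written for illustration; `newick_leaves` is a hypothetical helper, and it assumes unquoted labels without bracketed comments.

```python
def newick_leaves(newick):
    """Return leaf labels of a Newick tree string.

    Handles branch lengths and internal-node labels such as support values;
    quoted labels and [comments] are not supported.
    """
    leaves = []
    label = []
    prev = None        # last structural character: '(', ',', ')' or ';'
    in_length = False  # True while reading a ':branch_length'
    for ch in newick.strip():
        if ch in "(),;":
            name = "".join(label).strip()
            # A label right after '(' or ',' names a leaf; after ')' it is internal.
            if name and prev in ("(", ","):
                leaves.append(name)
            label, in_length, prev = [], False, ch
        elif ch == ":":
            in_length = True
        elif not in_length:
            label.append(ch)
    return leaves
```

Usage: `with open("output.aln.treefile") as fh: print(newick_leaves(fh.read()))`.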

Pipeline

  • Alignment

    1. mafft --reorder --adjustdirection --auto --clustalout --thread 4 ~/SARS_CoV_2/mafft/sequences112.fasta > output.aln

    View the .aln file in AliView.
    Figure 2
    Sequences whose direction was adjusted have the prefix "_R_" added to their names.

  • Infer a maximum-likelihood tree under the **GTR+I+G** model.

    1. iqtree -s /public/home/ykk/SARS_CoV_2/mafft/output.aln -m GTR+I+G

    Figure 3

  • Visualize the **.treefile** (here output.aln.treefile) in a tree viewer such as iTOL.

Figure 4

  • Identify mutations in the viral genomes by comparing them with the **NC_045512.2** reference genome.
    1. mafft --reorder --adjustdirection --auto --clustalout --thread 4 ~/SARS_CoV_2/mafft/sequences12.fasta > output1.aln
    Figure 5
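Once the genomes are aligned to NC_045512.2, mutations can be read off column by column. The sketch below is a simplified illustration, not a full variant caller: `call_variants` is a hypothetical helper that takes two aligned strings of equal length (for example the reference and one genome extracted from output1.aln), reports substitutions, insertions and deletions in reference coordinates, ignores leading/trailing gaps as likely incomplete assembly, and by default skips ambiguous N calls. It upper-cases both inputs because MAFFT may write nucleotides in lower case.

```python
def call_variants(ref_aln, query_aln, skip_n=True):
    """Compare an aligned query with the aligned reference, column by column.

    Returns (ref_pos, ref_base, query_base) tuples; ref_pos is 1-based in the
    ungapped reference. query_base '-' marks a deleted base; ref_base '-' marks
    an inserted base placed after reference position ref_pos.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    ref, qry = ref_aln.upper(), query_aln.upper()
    # Leading/trailing gaps usually mean an incomplete assembly, not deletions.
    start = len(qry) - len(qry.lstrip("-"))
    end = len(qry.rstrip("-"))
    variants, ref_pos = [], 0
    for i, (r, q) in enumerate(zip(ref, qry)):
        if r != "-":
            ref_pos += 1
        if i < start or i >= end or r == q:
            continue
        if skip_n and q == "N":
            continue  # ambiguous base call, not counted as a mutation
        variants.append((ref_pos, r, q))
    return variants
```

Multi-base deletions or insertions appear as one tuple per alignment column; merging consecutive entries into a single event is left out to keep the sketch short.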