1. Introduction

2. Usage

2.1 Download

```
$ conda search muscle
Loading channels: done
# Name    Version   Build        Channel
muscle    3.8.31    0            bioconda
muscle    3.8.1551  1            bioconda
muscle    3.8.1551  2            bioconda
muscle    3.8.1551  h2d50403_3   bioconda
muscle    3.8.1551  h6bb024c_4   bioconda
muscle    3.8.1551  h7d875b9_6   bioconda
muscle    3.8.1551  hc9558a2_5   bioconda
muscle    5.1       h7d875b9_0   bioconda
muscle    5.1       h9f5acd7_1   bioconda
```
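The listing mixes two major versions (3.8.x and 5.1) and several builds per version. As a rough illustration of why "newest" has to be decided numerically rather than alphabetically, here is a minimal Python sketch that parses this table and picks the newest entry. The function names are hypothetical, and the ranking (numeric version first, then the trailing build number) is a simplification of conda's real solver, which also weighs channels and dependencies.

```python
def parse_conda_search(text):
    """Return (name, version, build, channel) tuples from `conda search` table output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # skip blanks, the prompt line, the "Loading channels" banner and the "# Name ..." header
        if len(fields) != 4 or fields[0] in ("#", "$"):
            continue
        rows.append(tuple(fields))
    return rows


def version_key(version):
    """Numeric key: 3.8.31 < 3.8.1551 < 5.1 (a plain string sort would put 3.8.1551 first)."""
    return tuple(int(part) for part in version.split("."))


def build_number(build):
    """Trailing build number: '0' -> 0, 'h9f5acd7_1' -> 1 (assumes the usual <hash>_<n> form)."""
    return int(build.rsplit("_", 1)[-1])


def newest(rows):
    """Pick the row with the highest version, breaking ties by build number (simplified)."""
    return max(rows, key=lambda row: (version_key(row[1]), build_number(row[2])))
```

Applied to the table above, `newest(parse_conda_search(listing))` selects `muscle 5.1 h9f5acd7_1`. To get a specific version instead, pin it at install time, e.g. `conda install -c bioconda muscle=3.8.1551`.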

2.2 Run

```
$ cat test.fa
>gene1
MRLFLLLAFNALMQLEAYGFTDESDRQALLEIKSQVSESKRDALSAWNNSFP
>gene2
MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKETSRVVLG
>gene3
MRLFLLLAFNALMLLETHGFTDETDRQALLQFKSQVSEDKRVVLSSWNHSFPLCNWKGVT
>gene4
MKLFLLLSFSAHLLLGETDRQALLEFKSQVSEGKRDVLSSWNNSFPLCNWKWVT
>gene5
MKLSFSLVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF
>gene6
MKVCILVFAQARFSNETDMQALLEFKSQVTENKREVLASWNHSFPL
```
Align the sequences (MUSCLE writes the alignment to stdout by default) and let `seqkit seq -w 0` print each sequence on a single line:

```
$ muscle -in test.fa -quiet | seqkit seq -w 0
>gene2
MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKE-----TSRVVLG---------------
>gene5
-------MKLSFS--LVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF-------
>gene6
-------MKVCIL---------------VFAQARFSNETDMQALLEFKSQVTE-NKREVLASWNHSFPL-------
>gene4
-------MKLFLL--LSFSAHL------LL------GETDRQALLEFKSQVSE-GKRDVLSSWNNSFPLCNWKWVT
>gene1
-------MRLFLL--LAFNALM------QLEAYGFTDESDRQALLEIKSQVSE-SKRDALSAWNNSFP--------
>gene3
-------MRLFLL--LAFNALM------LLETHGFTDETDRQALLQFKSQVSE-DKRVVLSSWNHSFPLCNWKGVT
```
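Two properties of the output are worth checking: every aligned row has the same length, and removing the `-` gaps gives back exactly the input sequence. Note also that the rows are not in input order (MUSCLE 3.8 groups similar sequences together by default). The sketch below reads FASTA (joining wrapped lines, as `seqkit seq -w 0` does), checks those two properties, and restores the input order. It is a minimal illustration; `read_fasta`, `check_alignment` and `in_input_order` are hypothetical helper names, not part of MUSCLE or seqkit.

```python
def read_fasta(text):
    """Parse FASTA text into {id: sequence}, keeping file order and joining wrapped lines."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID = first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def check_alignment(raw, aligned):
    """Raise ValueError unless `aligned` is a valid gapped version of `raw`; return the alignment length."""
    lengths = {len(row) for row in aligned.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    if set(raw) != set(aligned):
        raise ValueError("sequence IDs differ between input and alignment")
    for seq_id, seq in raw.items():
        if aligned[seq_id].replace("-", "") != seq:
            raise ValueError(f"{seq_id}: residues changed during alignment")
    return lengths.pop()


def in_input_order(raw, aligned):
    """Reorder alignment rows to follow the order of the input file."""
    return {seq_id: aligned[seq_id] for seq_id in raw}
```

With `test.fa` and the MUSCLE output above, `check_alignment` should pass and report a single common row length, and `in_input_order` lists gene1 through gene6 again.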
• Other

It looks like my version is fairly old: the installed muscle is v3.8.1551, while bioconda already offers 5.1.

```
$ muscle
MUSCLE v3.8.1551 by Robert C. Edgar
http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

Basic usage

    muscle -in <inputfile> -out <outputfile>

Common options (for a complete list please see the User Guide):

    -in <inputfile>    Input file in FASTA format (default stdin)
    -out <outputfile>  Output alignment in FASTA format (default stdout)
    -diags             Find diagonals (faster for similar sequences)
    -maxiters <n>      Maximum number of iterations (integer, default 16)
    -maxhours <h>      Maximum time to iterate in hours (default no limit)
    -html              Write output in HTML format (default FASTA)
    -msf               Write output in GCG MSF format (default FASTA)
    -clw               Write output in CLUSTALW format (default FASTA)
    -clwstrict         As -clw, with 'CLUSTAL W (1.81)' header
    -log[a] <logfile>  Log to file (append if -loga, overwrite if -log)
    -quiet             Do not write progress messages to stderr
    -version           Display version information and exit

Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
Fastest possible (nucleotides): -maxiters 1 -diags
```
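The help text above belongs to MUSCLE 3.8; MUSCLE 5 changed the basic command line (`-align` and `-output` instead of `-in` and `-out`). If a pipeline has to run with either version, one option is to build the argument list in one place. This is a minimal sketch under that assumption: `muscle_command` is a hypothetical helper, it only assembles arguments (it does not run anything), and the fast protein preset is copied from the "Fastest possible (amino acids)" line of the 3.8 help.

```python
def muscle_command(infile, outfile, major_version=3, fast_protein=False):
    """Assemble a MUSCLE argument list suitable for subprocess.run (nothing is executed here)."""
    if major_version >= 5:
        # MUSCLE 5 syntax; its speed options differ from v3 (e.g. -super5 for large inputs),
        # so fast_protein is ignored in this branch
        return ["muscle", "-align", infile, "-output", outfile]
    # MUSCLE 3.8 syntax, as in the help text; -quiet keeps progress messages off stderr
    cmd = ["muscle", "-in", infile, "-out", outfile, "-quiet"]
    if fast_protein:
        # "Fastest possible (amino acids)" preset from the 3.8 help output
        cmd += ["-maxiters", "1", "-diags", "-sv", "-distance1", "kbit20_3"]
    return cmd


# Example usage (requires muscle on PATH):
#   import subprocess
#   subprocess.run(muscle_command("test.fa", "test.afa", major_version=5), check=True)
```

Keeping the version switch in a single function means the rest of the pipeline does not need to know which MUSCLE release conda resolved.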

References

Online (EMBL-EBI web server): https://www.ebi.ac.uk/Tools/msa/muscle/

Local (standalone program): https://www.drive5.com/muscle/

Multiple sequence alignment software: muscle, mafft, clustalw……
The first two are the most widely used today. mafft was used earlier; this post looks at muscle.

When do you need a multiple sequence alignment, and what can the result be used for?

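  1. Building gene trees:
    1.1 Trim the alignment with trimAl, then build a protein (pep) tree with iqtree, fasttree, etc.;
    1.2 Use pal2nal.pl to thread the CDS sequences back onto the protein alignment, then build a CDS tree.
  2. Building species trees: concatenate the alignments of single-copy gene families and build a tree from the combined matrix.
  3. For gene pairs on collinear (synteny) blocks: align each pair globally, thread the CDS sequences back onto the alignment, and compute Ka and Ks with yn00 or similar tools.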
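For item 1.1, the basic idea of trimming is to drop alignment columns that are mostly gaps before building a tree. The sketch below shows only that idea with a single fixed threshold; it is not trimAl's algorithm (trimAl offers more elaborate heuristics and automated modes), and `trim_gappy_columns` and `min_present` are hypothetical names.

```python
def trim_gappy_columns(aligned, min_present=0.8):
    """Keep only columns in which at least `min_present` of the rows have a residue (not '-')."""
    rows = list(aligned.values())
    if not rows:
        return {}
    n_rows = len(rows)
    keep = [
        col for col in range(len(rows[0]))
        if sum(row[col] != "-" for row in rows) / n_rows >= min_present
    ]
    # rebuild every row from the kept column indices only
    return {seq_id: "".join(seq[col] for col in keep) for seq_id, seq in aligned.items()}
```

For example, with rows `MK-VL`, `M--VL` and `MKAV-`, a threshold of 0.8 keeps only the two fully occupied columns (`MV` in every row), while 0.6 also keeps columns that are two-thirds occupied.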
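Items 1.2 and 3 both "thread" CDS sequences back onto a protein alignment: each aligned amino acid is replaced by its codon and each `-` by `---`, giving a codon alignment for a CDS tree or for Ka/Ks estimation. The sketch below shows only this core mapping. Unlike pal2nal.pl it does not check that codons actually translate to the aligned residues or handle frameshifts; the function names are hypothetical, and treating one surplus trailing codon as a stop codon is an assumption.

```python
def thread_cds(aligned_pep, cds):
    """Map one aligned protein row onto its CDS: residue -> codon, '-' -> '---'."""
    n_residues = len(aligned_pep.replace("-", ""))
    if len(cds) == 3 * (n_residues + 1):
        cds = cds[:-3]  # assumption: the one extra trailing codon is a stop codon absent from the protein
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match the number of aligned residues")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_pep)


def codon_alignment(pep_aln, cds_seqs):
    """Thread every row of a protein alignment {id: row} using CDS sequences {id: cds}."""
    return {seq_id: thread_cds(row, cds_seqs[seq_id]) for seq_id, row in pep_aln.items()}
```

Because every residue becomes exactly three nucleotides and every gap three dashes, the codon alignment is exactly three times as long as the protein alignment, so all rows stay equal in length.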
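For item 2, concatenation joins each species' aligned row from every single-copy family into one long "supermatrix" row. A minimal sketch follows, assuming each family alignment has already been renamed so that its row IDs are species names (a hypothetical preprocessing step). It also records the column range each family occupies, which is useful when different models are later assigned to different genes. Padding a missing species with gaps only matters if the single-copy criterion is relaxed.

```python
def concatenate(families):
    """families: {family_id: {species: aligned_row}} -> (supermatrix, partitions).

    partitions lists (family_id, start, end) as 1-based inclusive column ranges.
    """
    species = sorted({sp for aln in families.values() for sp in aln})
    pieces = {sp: [] for sp in species}
    partitions = []
    start = 1
    for family_id, aln in families.items():
        length = len(next(iter(aln.values())))
        for sp in species:
            # a species absent from this family is padded with gaps of the family's length
            pieces[sp].append(aln.get(sp, "-" * length))
        partitions.append((family_id, start, start + length - 1))
        start += length
    return {sp: "".join(parts) for sp, parts in pieces.items()}, partitions
```

Every family contributes the same number of columns to every species, so the concatenated rows stay aligned with one another.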

If anything here is wrong, corrections are welcome!