1. Introduction
2. Usage
2.1 Download
$ conda search muscle
Loading channels: done
# Name Version Build Channel
muscle 3.8.31 0 bioconda
muscle 3.8.1551 1 bioconda
muscle 3.8.1551 2 bioconda
muscle 3.8.1551 h2d50403_3 bioconda
muscle 3.8.1551 h6bb024c_4 bioconda
muscle 3.8.1551 h7d875b9_6 bioconda
muscle 3.8.1551 hc9558a2_5 bioconda
muscle 5.1 h7d875b9_0 bioconda
muscle 5.1 h9f5acd7_1 bioconda
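The listing above can also be reduced to a single "newest version" answer on the command line. The following is a minimal sketch, not part of conda itself: `latest_version` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
# latest_version PACKAGE
# Hypothetical helper: read `conda search` output on stdin and print the
# highest version listed for PACKAGE.
latest_version() {
  # Keep only rows whose first column is the package name, take the version
  # column, then version-sort so that 3.8.31 < 3.8.1551 < 5.1.
  awk -v pkg="$1" '$1 == pkg { print $2 }' | sort -V | tail -n 1
}
```

It would be used as `conda search muscle | latest_version muscle`, and the chosen version can then be pinned at install time, for example `conda install -c bioconda muscle=5.1`.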
2.2 Running
$ cat test.fa
>gene1
MRLFLLLAFNALMQLEAYGFTDESDRQALLEIKSQVSESKRDALSAWNNSFP
>gene2
MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKETSRVVLG
>gene3
MRLFLLLAFNALMLLETHGFTDETDRQALLQFKSQVSEDKRVVLSSWNHSFPLCNWKGVT
>gene4
MKLFLLLSFSAHLLLGETDRQALLEFKSQVSEGKRDVLSSWNNSFPLCNWKWVT
>gene5
MKLSFSLVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF
>gene6
MKVCILVFAQARFSNETDMQALLEFKSQVTENKREVLASWNHSFPL
$ muscle -in test.fa -quiet | seqkit seq -w 0
>gene2
MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKE-----TSRVVLG---------------
>gene5
-------MKLSFS--LVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF-------
>gene6
-------MKVCIL---------------VFAQARFSNETDMQALLEFKSQVTE-NKREVLASWNHSFPL-------
>gene4
-------MKLFLL--LSFSAHL------LL------GETDRQALLEFKSQVSE-GKRDVLSSWNNSFPLCNWKWVT
>gene1
-------MRLFLL--LAFNALM------QLEAYGFTDESDRQALLEIKSQVSE-SKRDALSAWNNSFP--------
>gene3
-------MRLFLL--LAFNALM------LLETHGFTDETDRQALLQFKSQVSE-DKRVVLSSWNHSFPLCNWKGVT
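Note that the aligned records come back in a different order (gene2, gene5, gene6, ...) than in test.fa. If downstream steps expect the input order, the alignment can be reordered afterwards. The following is a minimal sketch using awk; `reorder_alignment` is a hypothetical helper name, and it assumes sequence IDs are unique and are matched on the first whitespace-delimited word of each header.

```shell
# reorder_alignment INPUT_FASTA ALIGNED_FASTA
# Hypothetical helper: print the aligned records in the order in which their
# IDs appear in the original input file. Sequences are written on one line.
reorder_alignment() {
  awk '
    # First file: remember the order in which IDs appear.
    NR == FNR { if (/^>/) order[++n] = $1; next }
    # Second file: collect each aligned sequence under its header.
    /^>/ { id = $1; next }
    { seq[id] = seq[id] $0 }
    # Emit records in input order, skipping IDs missing from the alignment.
    END {
      for (i = 1; i <= n; i++)
        if (order[i] in seq) print order[i] "\n" seq[order[i]]
    }
  ' "$1" "$2"
}
```

For the example above this might be run as `muscle -in test.fa -quiet > test.afa` followed by `reorder_alignment test.fa test.afa`, where test.afa is an assumed output file name.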
- Other
It turns out my installed version is rather old (v3.8.1551, while bioconda already offers 5.1):
$ muscle
MUSCLE v3.8.1551 by Robert C. Edgar
http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
Basic usage
muscle -in <inputfile> -out <outputfile>
Common options (for a complete list please see the User Guide):
-in <inputfile> Input file in FASTA format (default stdin)
-out <outputfile> Output alignment in FASTA format (default stdout)
-diags Find diagonals (faster for similar sequences)
-maxiters <n> Maximum number of iterations (integer, default 16)
-maxhours <h> Maximum time to iterate in hours (default no limit)
-html Write output in HTML format (default FASTA)
-msf Write output in GCG MSF format (default FASTA)
-clw Write output in CLUSTALW format (default FASTA)
-clwstrict As -clw, with 'CLUSTAL W (1.81)' header
-log[a] <logfile> Log to file (append if -loga, overwrite if -log)
-quiet Do not write progress messages to stderr
-version Display version information and exit
Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
Fastest possible (nucleotides): -maxiters 1 -diags
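The help text above is for v3.8; the 5.x releases listed by conda use a different command line. To my understanding of the MUSCLE 5 manual, the basic call there is `muscle -align <input> -output <output>` instead of `-in`/`-out`, but this should be checked against the help output of whatever version is installed. Under that assumption, a small wrapper can pick the right form; `muscle_align_cmd` is a hypothetical helper name.

```shell
# muscle_align_cmd VERSION INPUT OUTPUT
# Hypothetical helper: print the basic alignment command for the given
# MUSCLE major version. The 5.x form is an assumption based on the v5 manual.
muscle_align_cmd() {
  case "$1" in
    3.*) printf 'muscle -in %s -out %s\n' "$2" "$3" ;;
    5.*) printf 'muscle -align %s -output %s\n' "$2" "$3" ;;
    *)   echo "unsupported MUSCLE version: $1" >&2; return 1 ;;
  esac
}
```

For example, `muscle_align_cmd 3.8.1551 test.fa test.afa` prints the v3 form shown under "Basic usage" above. The function only prints the command rather than running it, so the result can be inspected before use.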
References
Online: https://www.ebi.ac.uk/Tools/msa/muscle/
Local: https://www.drive5.com/muscle/
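Multiple sequence alignment software: muscle, mafft, clustalw, ...
The first two are the ones most commonly used today. mafft was used earlier; here we take a look at muscle.
When is multiple sequence alignment needed, and what can its results be used for?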
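- Building gene trees:
1.1 Trim the alignment with trimAl, then build a pep (protein) tree with iqtree, fasttree, etc.;
1.2 Use pal2nal.pl to map the cds sequences back onto the protein alignment, then build a cds tree.
- Building species trees: concatenate the alignments of single-copy gene families and build a tree from the result.
- Globally align the gene pairs on collinear blocks, map the cds sequences back onto the alignment, and compute ka and ks values with yn00 or similar tools.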
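The step of mapping cds sequences back onto a protein alignment, which appears in two of the uses above, can be illustrated in a few lines of awk. This is only a sketch of the idea, not a replacement for pal2nal.pl: `pep_aln_to_codon_aln` is a hypothetical helper name, and it assumes each cds is in frame, has the same ID as its protein, and contains exactly three nucleotides per residue (no frameshifts or extra stop codon handling).

```shell
# pep_aln_to_codon_aln PEP_ALIGNMENT CDS_FASTA
# Hypothetical helper: turn a protein alignment into a codon alignment by
# replacing each residue with the next codon of the matching cds and each
# gap with "---".
pep_aln_to_codon_aln() {
  awk '
    function emit(    i, pos, out, c) {
      pos = 1; out = ""
      for (i = 1; i <= length(pep); i++) {
        c = substr(pep, i, 1)
        if (c == "-") out = out "---"                    # gap column
        else { out = out substr(cds[name], pos, 3); pos += 3 }
      }
      print name "\n" out
    }
    # First file (cds): store each nucleotide sequence under its header ID.
    NR == FNR { if (/^>/) id = $1; else cds[id] = cds[id] $0; next }
    # Second file (protein alignment): gather one record, then convert it.
    /^>/ { if (name != "") emit(); name = $1; pep = ""; next }
    { pep = pep $0 }
    END { if (name != "") emit() }
  ' "$2" "$1"
}
```

The resulting codon alignment has three times as many columns as the protein alignment and keeps its gap pattern, which is what makes it usable for cds trees or ka/ks estimation.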
If anything here is incorrect, corrections are welcome!