1. Introduction
2. Usage
2.1 Download
    $ conda search muscle
    Loading channels: done
    # Name    Version     Build         Channel
    muscle    3.8.31      0             bioconda
    muscle    3.8.1551    1             bioconda
    muscle    3.8.1551    2             bioconda
    muscle    3.8.1551    h2d50403_3    bioconda
    muscle    3.8.1551    h6bb024c_4    bioconda
    muscle    3.8.1551    h7d875b9_6    bioconda
    muscle    3.8.1551    hc9558a2_5    bioconda
    muscle    5.1         h7d875b9_0    bioconda
    muscle    5.1         h9f5acd7_1    bioconda
2.2 Running
    $ cat test.fa
    >gene1
    MRLFLLLAFNALMQLEAYGFTDESDRQALLEIKSQVSESKRDALSAWNNSFP
    >gene2
    MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKETSRVVLG
    >gene3
    MRLFLLLAFNALMLLETHGFTDETDRQALLQFKSQVSEDKRVVLSSWNHSFPLCNWKGVT
    >gene4
    MKLFLLLSFSAHLLLGETDRQALLEFKSQVSEGKRDVLSSWNNSFPLCNWKWVT
    >gene5
    MKLSFSLVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF
    >gene6
    MKVCILVFAQARFSNETDMQALLEFKSQVTENKREVLASWNHSFPL
    $ muscle -in test.fa -quiet | seqkit seq -w 0
    >gene2
    MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKE-----TSRVVLG---------------
    >gene5
    -------MKLSFS--LVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF-------
    >gene6
    -------MKVCIL---------------VFAQARFSNETDMQALLEFKSQVTE-NKREVLASWNHSFPL-------
    >gene4
    -------MKLFLL--LSFSAHL------LL------GETDRQALLEFKSQVSE-GKRDVLSSWNNSFPLCNWKWVT
    >gene1
    -------MRLFLL--LAFNALM------QLEAYGFTDESDRQALLEIKSQVSE-SKRDALSAWNNSFP--------
    >gene3
    -------MRLFLL--LAFNALM------LLETHGFTDETDRQALLQFKSQVSE-DKRVVLSSWNHSFPLCNWKGVT
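A quick sanity check on any aligned FASTA is that every row has the same length once gaps are counted. The sketch below does this with awk only; the function name `check_aln` and the toy file `toy.afa` are made up for illustration and are not part of muscle or seqkit.

```shell
# check_aln FILE: print "name<TAB>aligned_length" for each record
# and return non-zero if the lengths are not all identical.
check_aln() {
    awk '
        /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
             { len += length($0) }   # also handles sequences wrapped over several lines
        END  { if (name != "") print name "\t" len }
    ' "$1" | awk -F "\t" '
        BEGIN       { bad = 0 }
        NR == 1     { first = $2 }
        $2 != first { bad = 1 }
                    { print }
        END         { exit bad }
    '
}

# Demo on a toy alignment (hypothetical data, not the muscle output above).
printf '>a\nMK-LF\n>b\nMKAL-\n' > toy.afa
check_aln toy.afa
```

To check a real run, write the alignment to a file first (for example with `-out`) and pass that file to `check_aln`.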
- Other
It looks like the version I have installed is fairly old:
    $ muscle
    MUSCLE v3.8.1551 by Robert C. Edgar
    http://www.drive5.com/muscle
    This software is donated to the public domain.
    Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

    Basic usage
        muscle -in <inputfile> -out <outputfile>

    Common options (for a complete list please see the User Guide):
        -in <inputfile>      Input file in FASTA format (default stdin)
        -out <outputfile>    Output alignment in FASTA format (default stdout)
        -diags               Find diagonals (faster for similar sequences)
        -maxiters <n>        Maximum number of iterations (integer, default 16)
        -maxhours <h>        Maximum time to iterate in hours (default no limit)
        -html                Write output in HTML format (default FASTA)
        -msf                 Write output in GCG MSF format (default FASTA)
        -clw                 Write output in CLUSTALW format (default FASTA)
        -clwstrict           As -clw, with 'CLUSTAL W (1.81)' header
        -log[a] <logfile>    Log to file (append if -loga, overwrite if -log)
        -quiet               Do not write progress messages to stderr
        -version             Display version information and exit

    Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
    Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
    Fastest possible (nucleotides): -maxiters 1 -diags
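The help text above is for v3.8. As far as I know, MUSCLE v5 (the 5.1 build listed by conda) changed the command line and uses `-align` and `-output` in place of `-in` and `-out`, so the v3.8 commands in this post may not work after an upgrade. A small sketch of the difference; the helper name `muscle_cmd` is hypothetical and it only prints the command rather than running it:

```shell
# muscle_cmd VERSION INPUT OUTPUT: print the basic alignment command
# for the given MUSCLE version string.
muscle_cmd() {
    case "$1" in
        5*) echo "muscle -align $2 -output $3" ;;  # v5 syntax (assumption: per the v5 manual)
        *)  echo "muscle -in $2 -out $3" ;;        # v3.8 syntax, as in the help text above
    esac
}

muscle_cmd 3.8.1551 test.fa test.aln.fa
muscle_cmd 5.1 test.fa test.aln.fa
```

Check the installed version's own help output before relying on either form.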
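References
Online: https://www.ebi.ac.uk/Tools/msa/muscle/
Local: https://www.drive5.com/muscle/
Multiple sequence alignment software: muscle, mafft, clustalw, ...
The first two are the ones in common use today. mafft was used above; here we take a look at muscle.
When is a multiple sequence alignment needed, and what can the result be used for?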
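- Building gene trees:
1.1 Trim the alignment with trimAl, then build a protein (pep) tree with iqtree, fasttree, or similar;
1.2 Use pal2nal.pl to map the CDS sequences back onto the protein alignment, then build a CDS tree.
- Building species trees: concatenate the alignments of single-copy gene families and build a tree from the result.
- Ka/Ks calculation: globally align the gene pairs found in collinear blocks, map the CDS sequences back onto the alignment, and compute Ka and Ks with yn00 or a similar tool.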
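The gene-tree route above can be sketched as a short command sequence. This is only an outline under assumptions: the file names (`fam1.pep.fa`, `fam1.cds.fa`, and so on) and the helper `gene_tree_cmds` are hypothetical, the options shown (`-automated1`, `-m MFP`, `-bb 1000`, `-nt AUTO`) are one common choice rather than a recommendation from this post, and the function prints the commands instead of running them so they can be reviewed first.

```shell
# gene_tree_cmds PREFIX: print one possible command sequence for
# align -> trim -> pep tree, plus the CDS back-mapping step.
# Expects PREFIX.pep.fa and PREFIX.cds.fa to exist (hypothetical names).
gene_tree_cmds() {
    cat <<EOF
muscle -in $1.pep.fa -out $1.pep.aln.fa
trimal -in $1.pep.aln.fa -out $1.pep.trim.fa -automated1
iqtree -s $1.pep.trim.fa -m MFP -bb 1000 -nt AUTO
pal2nal.pl $1.pep.aln.fa $1.cds.fa -output fasta > $1.cds.aln.fa
EOF
}

gene_tree_cmds fam1
```

The muscle line uses the v3.8 syntax from this post. The CDS alignment written by pal2nal.pl can be passed to iqtree in the same way to get a CDS tree; for yn00, pal2nal.pl also offers `-output paml`.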
If anything here is wrong, corrections are welcome!
