Fasta Sequences Align| Karobben - 《Protocol for Bioinformatics》

Fasta Sequences Align
- Quick look about the data
ClustelW2
- Install
- ALign prarmeters
Muscle
Tcoffee

由於語法渲染問題而影響閱讀體驗，請移步博客閱讀～
本文GitPage地址

Fasta Sequences Align

Quick look about the data

Suppose there is a fasta file name as LYSV-NCBI.fasta

Have a quick look

grep  ">" LYSV-NCBI.fasta |wc
wc LYSV-NCBI.fasta
echo $(wc LYSV-NCBI.fasta| awk '{print $2}')-$(grep  ">" LYSV-NCBI.fasta |wc |awk '{print $2}')|bc

124    1114    8363
18250   19116 1282572 LYSV-NCBI.fasta
18002

As you can see, there are 124 sequences and roughly about 18002 bases

Let’s have a quick look at the sequences by Seq-view

Seq-view1.3 -i LYSV-NCBI.fasta

Fasta Sequences Align| Karobben - 图1

ClustelW2

Install

url: click me

wget http://www.clustal.org/download/current/clustalw-2.1.tar.gz
tar -zxvf clustalw-2.1.tar.gz
cd clustalw-2.1/
./configure
make
make install

clustalw2 -QUICKTREE -OUTPUT=FASTA  -INFILE=LYSV-NCBI.fasta

It takes less than 20 mins
Fasta Sequences Align| Karobben - 图2

Quick look

Seq-view1.3  -i LYSV-NCBI.fasta -a 90

Fasta Sequences Align| Karobben - 图3

ALign prarmeters

UPGAM Tree

clustalw2 -QUICKTREE -OUTPUT=FASTA  -INFILE=LYSV-NCBI.fasta -CLUSTERING=UPGMA -BOOTSTRAP=1000

clustalw2 -QUICKTREE -OUTPUT=FASTA  -INFILE=LYSV-NCBI.fasta -CLUSTERING=NJ -BOOTSTRAP=1000

Muscle

apt install muscle
time muscle -in LYSV-NCBI.fasta -out 123.fa

Fasta Sequences Align| Karobben - 图4

real  17m59.470s
user  17m59.034s
sys  0m0.256s

It takes about 18min as well as ClustelW2

Tcoffee

Source: Fasta Sequences Align| Karobben - 图5 T-Coffee

Protein sequences

# Default
t_coffee sample_seq1.fasta

# Quick
t_coffee sample_seq1.fasta -mode quickaln

# Consistent (M-Coffee combines the most common MSA packages)
t_coffee sample_seq1.fasta -mode mcoffee

# Structure (Expresso finds structures homologous to your sequences)
t_coffee sample_seq1.fasta -mode expresso

# Homology (PSI-Coffee enriches your dataset with homologous sequences)
t_coffee sample_seq1.fasta -mode psicoffee

# Accurate (combines Structures and Homology)
t_coffee sample_seq1.fasta -mode accurate

DNA sequences

# Default
t_coffee sample_dnaseq1.fasta

# Functional (Pro-Coffee increases accuracy of functional DNA regions )
t_coffee sample_dnaseq1.fasta -mode procoffee

RNA sequences

# Default
t_coffee sample_rnaseq1.fasta

# Structure 2D (R-Coffee uses predicted secondary structures)
t_coffee sample_rnaseq1.fasta -mode rcoffee

# Structure 3D (R-Coffee combined with Consan structural alignments)
t_coffee sample_rnaseq1.fasta -mode rcoffee_consan

# Accurate (RM-Coffee use M-Coffee and secondary structure predictions)
t_coffee sample_rnaseq1.fasta -mode rmcoffee

Full Tutorial:

Enjoy~

本文由Python腳本GitHub/語雀自動更新

由於語法渲染問題而影響閱讀體驗，請移步博客閱讀～
本文GitPage地址

GitHub: Karobben
Blog:Karobben
BiliBili:史上最不正經的生物狗