RNA-Seq, 注释和富集 - 图1
© cshlpress.com

由於語法渲染問題而影響閱讀體驗, 請移步博客閱讀~
本文GitPage地址

Swiss-Prot Annotation

prerequisite

Prepare Softwares
downloads latest blast+

  1. wget -c https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.12.0+-x64-linux.tar.gz
  2. tar -zxvf ncbi-blast-2.12.0+-x64-linux.tar.gz
  3. cd ncbi-blast-2.12.0+/bin

Prepare Database
Download the latest version from Uniport

for example, we’d like to download the reviewed database:

  1. wget -c https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
  2. gzip -d uniprot_sprot.fasta.gz
  3. du -h uniprot_sprot.fasta

text268M uniprot_sprot.fasta

  1. ~/Bio/blast/ncbi-blast-2.12.0+/bin/makeblastdb -in uniprot_sprot.fasta -dbtype prot -parse_seqids -out Swiss
  2. ls
  1. Swiss.pdb Swiss.pin Swiss.pos Swiss.psq Swiss.pto
  2. Swiss.phr Swiss.pog Swiss.pot Swiss.ptf uniprot_sprot.fasta

Blast

mkdir Annotation
cd Annotation
blastx -query ../trinity_out_dir.Trinity.fasta -out blast.out -db /run/media/ken/BackUP/blastdb/Swiss -outfmt "6 qacc sacc evalue stitle sblastname" -evalue 1e-5 -max_target_seqs 1 -num_threads 8 -max_hsps 1

head blast.out
qacc sacc evalue stitle sblastname
TRINITY_DN1_c0_g1_i1 P31792 4.12e-31 Pol polyprotein (Fragment) OS=Feline endogenous virus ECE1 OX=11766 GN=pol PE=3 SV=1 N/A
TRINITY_DN10_c0_g1_i1 P63088 3.52e-16 Serine/threonine-protein phosphatase PP1-gamma catalytic subunit OS=Rattus norvegicus OX=10116 GN=Ppp1cc PE=1 SV=1 N/A
TRINITY_DN12_c0_g1_i1 Q61092 0.0 Laminin subunit gamma-2 OS=Mus musculus OX=10090 GN=Lamc2 PE=1 SV=2 N/A
TRINITY_DN13_c0_g1_i1 P09405 1.17e-14 Nucleolin OS=Mus musculus OX=10090 GN=Ncl

Module Species

For example, this group of transcripts is belongs to Mus musculus (house mouse), we can find well annotated protein from NCBI Genome

We can using this from NCBI (Link)

gzip -d GCF_000001635.27_GRCm39_protein.faa.gz
~/Bio/blast/ncbi-blast-2.12.0+/bin/makeblastdb -in GCF_000001635.27_GRCm39_protein.faa  -dbtype prot -parse_seqids -out Mus_musculus

~/Bio/blast/ncbi-blast-2.12.0+/bin/blastx -query ../trinity_out_dir.Trinity.fasta -out blast.out_Mus -db /run/media/ken/BackUP/blastdb/Swiss -outfmt "6 qacc sacc evalue stitle sblastname" -evalue 1e-5 -max_target_seqs 1 -num_threads 8 -max_hsps 1

Enjoy~

本文由Python腳本GitHub/語雀自動更新

由於語法渲染問題而影響閱讀體驗, 請移步博客閱讀~
本文GitPage地址

GitHub: Karobben
Blog:Karobben
BiliBili:史上最不正經的生物狗