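SRA Toolkit is a software package developed by NCBI for working with SRA files. It bundles a number of useful tools.
Download the SRA Toolkit package
The package can be downloaded from the NCBI website: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
At the time of writing, the latest version is 2.10.8. Builds are available for several operating systems; here we use the Ubuntu 64-bit build.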
wget -P ~/download https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.10.8/sratoolkit.2.10.8-ubuntu64.tar.gz
Extract the archive
tar zvxf ~/download/sratoolkit.2.10.8-ubuntu64.tar.gz -C ~/software
mv ~/software/sratoolkit.2.10.8-ubuntu64 ~/software/sratoolkit
Add the sratoolkit installation path to the PATH environment variable
echo 'export PATH=$PATH:/home/ubuntu/software/sratoolkit/bin' >> ~/.zshrc
source ~/.zshrc
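As a quick check that the PATH change took effect, something like the following could be run in the current shell. This is a minimal sketch: `add_to_path` is a hypothetical helper (not part of SRA Toolkit) that appends a directory only if it is not already on PATH, and `$HOME/software/sratoolkit/bin` assumes the install location used above.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
    dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;                  # already present, nothing to do
        *) PATH="$PATH:$dir" ;;
    esac
    export PATH
}

# Assumed install location; adjust to wherever the toolkit was extracted.
add_to_path "$HOME/software/sratoolkit/bin"

# Report whether the two tools used in this post can be found.
for tool in prefetch fastq-dump; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found at $(command -v "$tool")"
    else
        echo "$tool not found on PATH"
    fi
done
```

Calling the helper more than once leaves a single entry on PATH, so it is safe to re-source a startup file that uses it.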
Test with fastq-dump and prefetch
For more detailed steps and explanations, see the official documentation:
https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=std
Download an SRA file
prefetch SRR10107270
2020-10-03T06:03:13 prefetch.2.9.6: 1) Downloading 'SRR10107270'...
2020-10-03T06:03:13 prefetch.2.9.6: Downloading via https...
2020-10-03T06:09:01 prefetch.2.9.6: https download succeed
2020-10-03T06:09:01 prefetch.2.9.6: 1) 'SRR10107270' was downloaded successfully
2020-10-03T06:09:01 prefetch.2.9.6: 'SRR10107270' has 0 unresolved dependencies
Convert the SRA file to FASTQ
for i in *sra
do
    echo "$i"
    fastq-dump --split-3 "$i"
done
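When a directory holds many runs, it can help to skip files that have already been converted. The sketch below is one possible way to do that, not an official recipe: `convert_sra` and the `FASTQ_DUMP` override variable are hypothetical names introduced here, and the check assumes fastq-dump writes its output into the current directory (its default when no output directory is given) as `<run>_1.fastq` for paired data or `<run>.fastq` otherwise.

```shell
# Hypothetical override so the command can be swapped out, e.g. for a dry run.
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"

# Convert each .sra file given as an argument, skipping ones already converted.
convert_sra() {
    for sra in "$@"; do
        if [ ! -f "$sra" ]; then
            echo "skip $sra (not a file)"
            continue
        fi
        base="$(basename "$sra" .sra)"
        # Assumption: output lands in the current directory under these names.
        if [ -e "${base}_1.fastq" ] || [ -e "${base}.fastq" ]; then
            echo "skip $sra (FASTQ output already present)"
            continue
        fi
        echo "converting $sra"
        "$FASTQ_DUMP" --split-3 "$sra"
    done
}
```

It would be called as `convert_sra *.sra`. With `--split-3`, paired reads go to the `_1` and `_2` files, and any reads without a mate go to a third file with no suffix.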
Inspect the files
ls
SRR10107270_1.fastq SRR10107270_2.fastq SRR10107270.sra
head SRR10107270_1.fastq
@SRR10107270.1 1 length=72
TCGGGNAGTGCTAGCTCGCGATTCCAGGATGTAGTTAACCTTGAGCACAATTTCATTGACGNNAGCAGCNNN
+SRR10107270.1 1 length=72
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE##EEEEEE###
@SRR10107270.2 2 length=76
CATGTNATTGTTGTAGGAATCAAAGTCAAACACATTTCGAACTACACTGGAGAGACCTTCANNCGGAAANTNNNGT
+SRR10107270.2 2 length=76
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEE/EEEEEE##EEEEEA#E###EE
@SRR10107270.3 3 length=76
AGACGNTGGAGGATGAAGGGCTGGCTGTTGGGTCTGTTCTTGCTCTAAGGCCACATCCTAGNAAAGCAGGGNNNGT
The sequencing reads are a little over 70 bp long (72 and 76 bp in the records shown).
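Looking at the first few records only gives a rough idea of read length. To tally lengths across a whole file, a small awk one-liner can be wrapped in a function. This is a sketch under the assumption that the FASTQ file uses exactly four lines per record (as fastq-dump output above does); `read_lengths` is a hypothetical helper name.

```shell
# Print "<length> <count>" for every read length found in a FASTQ file.
# Assumes four lines per record, so the sequence is line 2 of each group of 4.
read_lengths() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Run on the file from this post if it is present in the current directory.
if [ -f SRR10107270_1.fastq ]; then
    read_lengths SRR10107270_1.fastq
fi
```

Each output line is a read length followed by the number of reads of that length, sorted by length.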
