Result 3: Package the pipeline :
(1)Goals:
Input the original paired-end fastq files、index files、subsampling fraction and assembling strategies, the output file will be the finally assembled fasta files with only single contigs and the whole process taking around 7 hours for each sample.
Usage:
Run the following shell from the command line:#!/bin/bash
#SBATCH -e subfqall.err
#SBATCH -o subfqall.out
#SBATCH -J subfqall
#SBATCH --ntasks-per-node=2
#SBATCH -N 1#SBATCH -p sugon
#SBATCH -t 120:00:00
python ``ymy.py`` ``--filea`` ./P3-VERO-P3-1-vero_L4_1.fq.gz ``--fileb`` ./P3-VERO-P3-1-vero_L4_2.fq.gz ``--index`` ./2019-nCoV.fasta ``--fraction`` 0.0036 ``--startegy`` meta
(2)—help**:
(3)Code:**
#!/usr/bin/python
#-*- coding:utf-8 -*-
import argparse
import os
def _argparse():
parser = argparse.ArgumentParser(description='Process spades from fastq')
parser.add_argument('--filea', dest='file1', default='./P3-VERO-P3-1-vero_L4_1.fq.gz', help="The read1 fastq file")
parser.add_argument('--fileb', dest='file2', default='./P3-VERO-P3-1-vero_L4_2.fq.gz', help="The read2 fastq file")
parser.add_argument('--index', dest='fileindex', default='./2019-nCoV.fasta', help="The index")
parser.add_argument('--fraction', dest='fra', default='0.00036', help="The subsample fraction")
parser.add_argument('--startegy', dest='star', default='--meta', help="The startegy of spades")
return parser.parse_args()
def zuzhuang(x,y,z,w,t):
'''
os.system(f"fastp -i {x} -I {y} -o out1.fastq.gz -O out2.fastq.gz")
os.system(f"bwa mem -t 20 {z} out1.fq.gz out2.fq.gz | samtools sort -@ 10 -O bam -o sorted.bam")
os.system(f"samtools index -@ 8 sorted.bam sorted.bam.bai")
os.system(f"sambamba view -t 10 -h -s {w} sorted.bam -o subsorted.bam")
os.system(f"samtools fastq -s -f 3 -1 paired1.fq -2 paired2.fq subsorted.bam")
os.system(f"spades.py -{t} -1 paired1.fq -2 paired2.fq -o outputfile")
'''
os.system("fastp -i {x} -I {y} -o out1.fastq.gz -O out2.fastq.gz".format(x=x, y=y))
os.system("bwa mem -t 20 {z} out1.fq.gz out2.fq.gz | samtools sort -@ 10 -O bam -o sorted.bam".format(z = z))
os.system("samtools index -@ 8 sorted.bam sorted.bam.bai")
os.system("sambamba view -t 10 -h -s {w} sorted.bam -o subsorted.bam".format(w = w))
os.system("samtools fastq -s -f 3 -1 paired1.fq -2 paired2.fq subsorted.bam")
os.system("spades.py --{t} -1 paired1.fq -2 paired2.fq -o outputfile".format(t = t))
#return outputfile
print("all finished!")
def main():
parser = _argparse()
zuzhuang(parser.file1, parser.file2, parser.fileindex, parser.fra, parser.star)
if __name__ == '__main__':
main()
(4)Performance on 11 samples :
Parameter selection:
Filtering parameter during subsampling : -f 3
Subsampling coverage selection: 1000-10000
SPAdes strategies: —meta,**
Assembled results of 11 samples:
Each sample only has one contig in the assemblie. Some indicators, as the coverage of reference genome, the miss-assembled rates and the error rate are more reliable than before.