Package the pipeline - 《The English version of COVID-19 Genome Assembly》

Result 3: Package the pipeline

(1) Goals:
The pipeline takes the original paired-end FASTQ files, the index file, the subsampling fraction and the assembly strategy as input. Its output is the final assembled FASTA file, which contains a single contig. The whole process takes around 7 hours per sample.

Usage:
Run the following shell script from the command line:
#!/bin/bash
#SBATCH -e subfqall.err
#SBATCH -o subfqall.out
#SBATCH -J subfqall
#SBATCH --ntasks-per-node=2
#SBATCH -N 1
#SBATCH -p sugon
#SBATCH -t 120:00:00

python ymy.py --filea ./P3-VERO-P3-1-vero_L4_1.fq.gz --fileb ./P3-VERO-P3-1-vero_L4_2.fq.gz --index ./2019-nCoV.fasta --fraction 0.0036 --startegy meta
(2) --help:

(3) Code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import argparse
import os


def _argparse():
    """Parse the command-line options of the pipeline."""
    parser = argparse.ArgumentParser(description='Process spades from fastq')
    parser.add_argument('--filea', dest='file1', default='./P3-VERO-P3-1-vero_L4_1.fq.gz', help="The read1 fastq file")
    parser.add_argument('--fileb', dest='file2', default='./P3-VERO-P3-1-vero_L4_2.fq.gz', help="The read2 fastq file")
    parser.add_argument('--index', dest='fileindex', default='./2019-nCoV.fasta', help="The index")
    parser.add_argument('--fraction', dest='fra', default='0.00036', help="The subsample fraction")
    # The option keeps its original spelling (--startegy). The value is given
    # without leading dashes, because "--" is prepended when spades.py is called.
    parser.add_argument('--startegy', dest='star', default='meta', help="The strategy of SPAdes, e.g. meta")
    return parser.parse_args()


def zuzhuang(x, y, z, w, t):
    """Assemble one sample ("zuzhuang" means "assemble").

    x, y: read1/read2 fastq files; z: reference index;
    w: subsampling fraction; t: SPAdes strategy (without leading dashes).
    """
    # 1. Quality-filter the raw reads.
    os.system("fastp -i {x} -I {y} -o out1.fastq.gz -O out2.fastq.gz".format(x=x, y=y))
    # 2. Map the filtered reads to the reference and sort the alignments.
    #    The input names must match the fastp output names (out1/out2.fastq.gz).
    os.system("bwa mem -t 20 {z} out1.fastq.gz out2.fastq.gz | samtools sort -@ 10 -O bam -o sorted.bam".format(z=z))
    os.system("samtools index -@ 8 sorted.bam sorted.bam.bai")
    # 3. Subsample the alignments to the requested fraction.
    os.system("sambamba view -t 10 -h -s {w}  sorted.bam -o subsorted.bam".format(w=w))
    # 4. Extract properly paired reads (-f 3). The -s option of samtools fastq
    #    expects a file name for singleton reads; the original passed none, so
    #    singletons.fq is used here as a placeholder name.
    os.system("samtools fastq -s singletons.fq -f 3 -1 paired1.fq -2 paired2.fq subsorted.bam")
    # 5. Assemble the subsampled reads with SPAdes.
    os.system("spades.py --{t} -1 paired1.fq -2 paired2.fq  -o  outputfile".format(t=t))
    print("all finished!")


def main():
    args = _argparse()
    zuzhuang(args.file1, args.file2, args.fileindex, args.fra, args.star)


if __name__ == '__main__':
    main()
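
The script calls each tool through os.system, which ignores the exit status, so a failed step does not stop the later ones. The following is a minimal sketch of one possible alternative, not the authors' implementation: it runs the stages through subprocess.run with check=True so that the first failing stage raises an error. It requires Python 3. The names build_steps and run_steps are hypothetical, and only the first two stages are shown; the remaining stages would be added to the list in the same way.

```python
import shlex
import subprocess


def build_steps(read1, read2, index):
    """Return the first two pipeline stages as shell command strings."""
    q = shlex.quote  # quote user-supplied paths before they reach the shell
    return [
        f"fastp -i {q(read1)} -I {q(read2)} -o out1.fastq.gz -O out2.fastq.gz",
        (
            f"bwa mem -t 20 {q(index)} out1.fastq.gz out2.fastq.gz"
            " | samtools sort -@ 10 -O bam -o sorted.bam"
        ),
    ]


def run_steps(steps, runner=subprocess.run):
    """Run the commands in order and stop at the first failure."""
    for cmd in steps:
        # check=True raises subprocess.CalledProcessError on a non-zero exit
        # status. For a pipe, the shell reports the status of the last
        # command only (here: samtools sort).
        runner(cmd, shell=True, check=True)


if __name__ == "__main__":
    for step in build_steps("reads_1.fq.gz", "reads_2.fq.gz", "2019-nCoV.fasta"):
        print(step)
```

The runner argument exists only so that the command list can be inspected or tested without the tools being installed; in normal use run_steps(build_steps(...)) is enough.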

(4) Performance on 11 samples:

Parameter selection:
Filtering parameter during subsampling: -f 3
Subsampling coverage selection: 1000-10000
SPAdes strategy: --meta
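
The text does not say how a coverage target in this range is turned into the value passed to --fraction. One plausible way, shown here only as a sketch under that assumption, is to divide the target depth by the mean depth of the full alignment. The function name subsample_fraction and the depth values in the example are hypothetical.

```python
def subsample_fraction(mean_depth, target_depth):
    """Fraction of reads to keep so the mean depth drops to about target_depth."""
    if mean_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Keeping a fraction f of the read pairs scales the mean depth by about f.
    # The result is capped at 1.0 because reads cannot be added by subsampling.
    return min(1.0, target_depth / mean_depth)


if __name__ == "__main__":
    # Hypothetical numbers: a sample sequenced to a mean depth of 1,000,000
    # and a target depth of 3,600 inside the 1000-10000 range.
    print(subsample_fraction(1_000_000, 3_600))
```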

Assembled results of 11 samples:
Each sample has only one contig in its assembly. Indicators such as the coverage of the reference genome, the misassembly rate and the error rate are more reliable than before.
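
The single-contig claim can be checked by counting the header lines of the assembled FASTA file. The sketch below assumes the assembly is the contigs.fasta file that SPAdes writes into its output directory (outputfile in the script above); the function name count_contigs is hypothetical.

```python
import sys


def count_contigs(fasta_path):
    """Count the sequences in a FASTA file by counting its '>' header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Assumed default location of the SPAdes contigs for one sample.
    path = sys.argv[1] if len(sys.argv) > 1 else "outputfile/contigs.fasta"
    n = count_contigs(path)
    print("{}: {} contig(s)".format(path, n))
```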