Result 3: Package the pipeline :

(1)Goals:
Input the original paired-end fastq files、index files、subsampling fraction and assembling strategies, the output file will be the finally assembled fasta files with only single contigs and the whole process taking around 7 hours for each sample.

Usage:
Run the following shell from the command line:
#!/bin/bash
#SBATCH -e subfqall.err
#SBATCH -o subfqall.out
#SBATCH -J subfqall
#SBATCH --ntasks-per-node=2
#SBATCH -N 1#SBATCH -p sugon
#SBATCH -t 120:00:00

python ``ymy.py`` ``--filea`` ./P3-VERO-P3-1-vero_L4_1.fq.gz ``--fileb`` ./P3-VERO-P3-1-vero_L4_2.fq.gz ``--index`` ./2019-nCoV.fasta ``--fraction`` 0.0036 ``--startegy`` meta
(2)—help**
D7FFFFC8-48FC-4610-B9D8-025AE060AC6F.png
(3)Code:**

  1. #!/usr/bin/python
  2. #-*- coding:utf-8 -*-
  3. import argparse
  4. import os
  5. def _argparse():
  6. parser = argparse.ArgumentParser(description='Process spades from fastq')
  7. parser.add_argument('--filea', dest='file1', default='./P3-VERO-P3-1-vero_L4_1.fq.gz', help="The read1 fastq file")
  8. parser.add_argument('--fileb', dest='file2', default='./P3-VERO-P3-1-vero_L4_2.fq.gz', help="The read2 fastq file")
  9. parser.add_argument('--index', dest='fileindex', default='./2019-nCoV.fasta', help="The index")
  10. parser.add_argument('--fraction', dest='fra', default='0.00036', help="The subsample fraction")
  11. parser.add_argument('--startegy', dest='star', default='--meta', help="The startegy of spades")
  12. return parser.parse_args()
  13. def zuzhuang(x,y,z,w,t):
  14. '''
  15. os.system(f"fastp -i {x} -I {y} -o out1.fastq.gz -O out2.fastq.gz")
  16. os.system(f"bwa mem -t 20 {z} out1.fq.gz out2.fq.gz | samtools sort -@ 10 -O bam -o sorted.bam")
  17. os.system(f"samtools index -@ 8 sorted.bam sorted.bam.bai")
  18. os.system(f"sambamba view -t 10 -h -s {w} sorted.bam -o subsorted.bam")
  19. os.system(f"samtools fastq -s -f 3 -1 paired1.fq -2 paired2.fq subsorted.bam")
  20. os.system(f"spades.py -{t} -1 paired1.fq -2 paired2.fq -o outputfile")
  21. '''
  22. os.system("fastp -i {x} -I {y} -o out1.fastq.gz -O out2.fastq.gz".format(x=x, y=y))
  23. os.system("bwa mem -t 20 {z} out1.fq.gz out2.fq.gz | samtools sort -@ 10 -O bam -o sorted.bam".format(z = z))
  24. os.system("samtools index -@ 8 sorted.bam sorted.bam.bai")
  25. os.system("sambamba view -t 10 -h -s {w} sorted.bam -o subsorted.bam".format(w = w))
  26. os.system("samtools fastq -s -f 3 -1 paired1.fq -2 paired2.fq subsorted.bam")
  27. os.system("spades.py --{t} -1 paired1.fq -2 paired2.fq -o outputfile".format(t = t))
  28. #return outputfile
  29. print("all finished!")
  30. def main():
  31. parser = _argparse()
  32. zuzhuang(parser.file1, parser.file2, parser.fileindex, parser.fra, parser.star)
  33. if __name__ == '__main__':
  34. main()

(4)Performance on 11 samples :

Parameter selection:
Filtering parameter during subsampling : -f 3
Subsampling coverage selection: 1000-10000
SPAdes strategies: —meta,**

Assembled results of 11 samples:
Each sample only has one contig in the assemblie. Some indicators, as the coverage of reference genome, the miss-assembled rates and the error rate are more reliable than before.
EFF62EF0-AB95-468D-97B4-8FB12D3EFE66.png